
The difference between using AI and giving it a job.

Aman
ai-native · team · nia · ai-coworker

On working with an AI coworker. Four humans, one AI, and the math that changed when she joined.

We’re four people. We build software for insurance. On top of that: ops, support, customer onboarding, research, and writing.

That math doesn’t work for a four-person team. Or it shouldn’t. The way most teams handle this kind of workload is to hire or to do less. We did neither. We went AI-native top to bottom and built ourselves a fifth teammate: an AI coworker named Nia who doesn’t log off. The workload became possible.

In the last two months: 5,849 messages from her, 5,844 from the rest of us. 998 customer threads handled. Active in every hour of the day.

This post is about what that actually looks like. Not the pitch deck version. The receipts.


The cliché we’re not making

There’s a familiar genre of “we’re AI-native” posts where the punchline is sales credibility. We use our own product. We eat our own dogfood. If you can’t trust us to run AI on our own work, why would you trust us to run AI on yours?

That’s not the argument we’re making. We didn’t go AI-native to look credible to AI buyers, and especially not in this market. Only 10% of P&C insurers are successfully scaling AI today, while 90% of insurance executives intend to invest more in AI in 2026. The market we sell into is AI-curious but largely not yet AI-operational.

A pitch about how AI-native we are wouldn’t land with this audience. So we don’t make it.

We went AI-native because at our headcount, the alternative was to ship less, or ship worse. The sales credibility, if it comes, is downstream. It’s not the reason.

The reason is simpler: a four-person team can’t take on this scope without changing what a four-person team is capable of.


What’s actually in the workload

Kay ships software into insurance.

Coworkers runs browser automation for insurance ops: quote pulls, document filing, the long tail of UI work that hasn’t been API-fied. It has to be robust because insurance UIs are pre-Web 2.0 and brittle in surprising ways.

Copilot runs document extraction, structured outputs, and reasoning-heavy workflows. Compliance audits, proposal generation, quote comparison.

On top of all that: live customer ops, support, onboarding sessions, internal AI tooling, the research site this post lives on, and the writing itself.

Four people. One regulated, high-trust vertical. Everything else stacked on top.


The compounding effect

Going AI-native didn’t just absorb the extra workload. It reshaped the work itself.

On-call collapsed

We used to run a two-person on-call rotation: one person on customer issues full-time, one engineer shadowing in case anything needed code-level fixes. Today, neither slot is staffed. Nia handles the ops side end-to-end. On the engineering side, she takes action herself, asks for permission, or stages the fix and waits for an engineer to approve it. In the same window, she’s staged 338 such approval moments. What used to be two full-time on-call roles is now a Nia loop. Humans are reviewers, not on standby.
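The act/ask/stage split above can be sketched as a small triage function. This is a minimal illustration of the pattern, not our actual implementation; all names (`Issue`, `triage`, the risk threshold) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    ACT = auto()    # low-risk ops: Nia handles it end-to-end
    ASK = auto()    # ambiguous: Nia asks for permission first
    STAGE = auto()  # code-level: Nia stages the fix and waits for engineer approval

@dataclass
class Issue:
    kind: str   # e.g. "stuck_workflow", "customer_question", "code_fix"
    risk: float # 0.0 (routine) .. 1.0 (novel / destructive)

def triage(issue: Issue) -> Action:
    """Decide whether the AI acts, asks, or stages a fix for approval."""
    if issue.kind == "code_fix":
        return Action.STAGE  # anything that ships code needs a human approval moment
    if issue.risk < 0.3:
        return Action.ACT    # routine ops: no human in the loop
    return Action.ASK        # everything else: permission first
```

The 338 approval moments in the quote above correspond to the `STAGE` branch: the fix is ready, the human only reviews.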

Engineering attention moved up the stack

The biggest change is where engineering attention goes. A lot of it used to go toward watching the floor: operational threads, stuck workflows, customer issues — staying close enough to jump in before they escalated. Now we don’t have to actively look unless Nia flags something. The default is no longer engineer watches the floor. The default is Nia watches the floor, engineer steps in when the signal is real.

Self-found bugs ship the same night

When we hit a failure mode in our own product after hours, we ship the fix overnight — not weeks later through a support ticket. The loop went from user reports → ops triage → engineer queues fix → ships next sprint to engineer hits it → engineer + Nia fix → shipped before morning.


What AI-native actually looks like here

The dev stack

Our daily tools today are Codex and Claude Code, after a cycle through Cursor and Windsurf. The shift was from IDE assistance to terminal-native, repo-native agents we can run hard and fold into our own workflows.

To make this concrete, here’s the actual usage across the engineering team from recent May windows:

  • Aman: 4.53B tokens, 11 active days, almost entirely on Claude Code
  • Akhil: 1.87B tokens, 10 active days
  • Anton: 1.46B tokens, last 7 days, mostly on Codex

That’s roughly 8 billion tokens of AI-assisted coding across three engineers. For scale: only about half of professional developers use AI coding tools daily at all, and even among those who do, running into the billions of tokens per month is well above typical usage.

It isn’t a single-sprint spike. Anton’s last 30 days run at 5.47B tokens. My own all-time, since February 2026, sits at 19.9B tokens across 99 active days. It’s the operating baseline.

Nobody on the team is coupled to a single vendor’s bet — we pick the model that fits the job and switch when it stops fitting. The point isn’t a flex. AI-native, for us, isn’t a posture. It’s a usage volume.

Ops delegating real customer-facing work

The default frame for “AI in ops” is AI drafts, human approves. Ours is closer to AI executes within bounded scopes, human owns the boundaries.

Ops on our team isn’t owned by one person — it floats across the team. Whoever’s on it doesn’t write messages for Nia to send; they delegate the actual remediation: workflow restarts on stuck threads, answering customer questions, Canny bug filings, background nudge loops on tasks that need follow-up. Live customer workflows. Real operational consequences. The difference is whether AI is a typing assistant or a teammate.
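“AI executes within bounded scopes, human owns the boundaries” reduces to an explicit allowlist that a human edits and the AI cannot. A hypothetical sketch of that shape, with illustrative scope and action names:

```python
# The human-owned boundary: per-scope allowlists of actions the AI
# may execute without a handoff. Everything else escalates.
SCOPES = {
    "customer_ops": {"restart_workflow", "answer_question", "file_bug", "send_nudge"},
    "engineering": {"stage_fix"},  # staging only; shipping still needs approval
}

def execute(scope: str, action: str) -> str:
    """Run an action if it is inside the scope's boundary, else escalate."""
    if action in SCOPES.get(scope, set()):
        return f"done:{action}"
    return f"escalate:{action}"  # out of bounds -> goes to whoever owns the scope
```

The difference between a typing assistant and a teammate is which side of `execute` the human sits on: approving every message, or maintaining `SCOPES`.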

Engineers self-remediate without ops handoffs

Anton re-runs stuck Quote Comparison workflows directly. The recovery loop for Coworkers isn’t user reports → ops triages → engineer queues fix. It’s engineer + Nia surface the stuck state and unstick it. The ops queue exists for things that actually need a human decision.

Retrospective analysis without a standing role

We used to have a dedicated ops analyst, someone whose job was to track threads, intervention patterns, what failed silently overnight. We don’t anymore. “How many threads yesterday, where did we intervene, what failed silently?” That question gets asked when needed, and Nia answers it. The analysis is on-demand, not a standing role.
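The on-demand retrospective is, mechanically, an aggregation over a thread log. A sketch of the question as code, with a hypothetical log schema (the field names and sample rows are illustrative):

```python
from datetime import date

# Hypothetical thread log: one record per customer thread.
threads = [
    {"day": date(2026, 5, 1), "intervened": False, "silent_failure": False},
    {"day": date(2026, 5, 1), "intervened": True,  "silent_failure": False},
    {"day": date(2026, 5, 1), "intervened": False, "silent_failure": True},
]

def retrospective(log, day):
    """Answer: how many threads, where did we intervene, what failed silently?"""
    todays = [t for t in log if t["day"] == day]
    return {
        "threads": len(todays),
        "interventions": sum(t["intervened"] for t in todays),
        "silent_failures": sum(t["silent_failure"] for t in todays),
    }
```

The point of the pattern isn’t the query, it’s who runs it: on-demand by the AI when asked, rather than continuously by a standing analyst role.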


What we’d lose if we weren’t AI-native

We’d ship fewer things, slower.

We’d specialize narrower. The delegation patterns above only work because AI widens what each role can carry — without it, we’d need dedicated specialists.

We’d lose the habit of turning operational learning into public research.

We’d be a smaller company doing a smaller job. That’s the actual trade.


Close

AI-native isn’t a culture statement, and it isn’t a sales pitch. It’s a scope decision: the choice to take on more work than your headcount should be able to hold, by changing what your headcount is capable of.

You can stay small and do less. Or you can stay small and do more.

Only one of those is AI-native.

The biggest piece of our AI-native stack is the coworker we built: Nia. The next post is her origin story.


Sources