---
title: "AI Agent Company: Outcome-Based Billing with Long Reservations"
description: "How to bill for work, not seats — model an AI agent company (AI SDR, AI paralegal, AI bookkeeper) with outcome-based pricing, long-running reservations, variable cost pass-through, and per-customer spend caps."
order: 6
---

# AI Agent Company

**Pattern:** OUTCOME-BASED · LONG RESERVATIONS · SPEND CAPS

*Inspired by: AI SDR, AI bookkeeping, AI paralegal, AI support, AI recruiting, the services-dollar thesis*

> **Mental Model:** Think of this as the **shape AI-native services companies are converging on**. You sell <em>work, not seats</em>: closed tickets, reviewed contracts, sent outbound emails. Each outcome kicks off a multi-step agent run that holds credits for minutes or hours, commits actual cost on success, and releases the hold on failure so the customer is not charged.

## Quick Take

- The **outcome is the billable unit** — 1 closed ticket = N credits, tuned from real LLM cost
- **Long reservations** (minutes to hours) with generous TTL for multi-step agent workflows
- **Commit actual cost** — agent runs have 10× cost variance; price tiers or `actual_units` absorb it
- **Release on failure = "no outcome, no charge"** — a pricing-page promise customers understand
- **Per-customer spend caps** so one runaway account can't blow up your LLM bill overnight

## Diagram

Customer subscribes to a retainer plan (e.g. $499/mo, 50 outcomes). Each job: agent checks entitlement → reserves estimated credits with long TTL → runs multi-step workflow (minutes to hours) → commits actual cost on success or releases on failure. Reservation TTL + per-customer spend caps provide the cost guardrails that seat-based billing can't.

```mermaid
flowchart TD
    A[Retainer purchase] --> B[POST /v1/subscriptions<br/>Pro plan · 50 outcomes/mo]
    B --> C[Monthly grant<br/>5000 credits, P0]
    D[New job arrives] --> E[Entitlement + spend-cap check]
    E --> F[POST /v1/reservations<br/>estimated_units=1, ttl_seconds=7200]
    F --> G[Agent runs<br/>LLM + tools + HIL<br/>minutes to hours]
    G --> H{Outcome?}
    H -->|success| I[POST /v1/reservations/id/commit<br/>actual_units + cost telemetry]
    H -->|failure| J[POST /v1/reservations/id/release<br/>customer not charged]
    H -->|worker crash| K[TTL auto-release]
```

## The problem

You're building an AI-native company that sells **work, not software**. Your customers pay for *outcomes* — closed support tickets, reviewed contracts, sent outbound emails, reconciled invoices, qualified leads. Each outcome kicks off a multi-step agent workflow that takes minutes to hours, burns a variable amount of LLM cost, and may fail halfway.

Seat-based billing doesn't fit. Neither does per-token billing (your customers don't want to think in tokens). You need:

- **Outcome as the unit of value** — bill per closed ticket, not per API call
- **Long reservations** to hold estimated cost during agent runs that take minutes or hours
- **Commit the actual cost** at job completion — the agent might use 10× more or less LLM than estimated
- **Release on failure** — if the agent can't complete the outcome, the customer isn't charged
- **Per-customer spend caps** so one runaway customer can't 10× your LLM bill overnight
- **Hybrid pricing** — a monthly retainer that includes N outcomes, then metered overage beyond that

This is the pattern the next wave of AI-native companies is converging on: AI SDR, AI bookkeeping, AI paralegal, AI support, AI recruiting. Their billing shape is different enough from classic SaaS that it deserves its own walkthrough.

## Why credits are the right primitive here

You could bill directly in dollars — charge $5 per closed ticket, $2 per sent email. But you'll hit these problems fast:

- **Variable LLM cost**: a simple ticket costs $0.20 to close; a complex one costs $4. Flat per-outcome pricing either loses money on hard ones or overcharges on easy ones.
- **Multi-outcome products**: a customer wants to buy "any kind of agent work" — closed tickets, sent emails, scheduled meetings — and have a single balance that covers all of them.
- **Enterprise contracts**: one prospect wants a $10k/month commit with any mix of outcomes; another wants pure per-outcome pricing. Same product, two shapes.

Credits normalize all of this. 1 closed ticket = 100 credits. 1 sent email = 10 credits. 1 complex contract review = 1000 credits. The customer sees a single balance and a single bill; you tune the credit-per-outcome ratio based on actual LLM cost and desired margin.
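As a rough illustration of the single-balance idea, the sketch below prices a mixed month of outcomes in credits. The rates are the example figures from the paragraph above; the dictionary and function names are made up for this page and are not part of any API.

```python
# Example credit rates from the paragraph above (credits per outcome).
CREDITS_PER_OUTCOME = {
    "ticket_closed": 100,
    "email_sent": 10,
    "contract_reviewed": 1000,
}


def credits_for(outcomes: dict[str, int]) -> int:
    """Total credits burned by a mix of outcomes, e.g. {"email_sent": 40}."""
    return sum(CREDITS_PER_OUTCOME[kind] * count for kind, count in outcomes.items())


# A month with 30 closed tickets, 120 sent emails, and 1 contract review.
month = {"ticket_closed": 30, "email_sent": 120, "contract_reviewed": 1}
print(credits_for(month))  # 3000 + 1200 + 1000 = 5200
```

The customer sees one number (5,200 credits) regardless of which agents did the work.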

## Credit structure

| Block type | Priority | Expiry | Source |
|---|---|---|---|
| Retainer grant (monthly) | 0 | Cycle end | Subscription plan grant |
| Rollover of unused retainer | 0 | Next cycle | Auto at renewal (optional) |
| Overage / topup packs | 0 | 30–90 days | Topup grant after payment |
| Free trial credits | 0 | None | Topup grant on signup |

Every block shares priority 0, so burn order falls to expiry. The retainer expires at cycle end and burns first: customers use what they paid for on-plan before any rollover or overage packs kick in. Trial credits are the last resort: with no expiry, they are only touched once every expiring block is spent.
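To make the burn order concrete, here is a small in-memory sketch. It assumes that a lower priority number burns first and that, within one priority, the soonest-expiring block burns first with no-expiry blocks last. That ordering is inferred from this page, not taken from an API reference, and the `Block` type and the dates are purely illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Block:
    name: str
    priority: int
    expires_at: Optional[datetime]  # None means the block never expires
    remaining_mc: int


def burn_order(blocks: list[Block]) -> list[Block]:
    """Assumed order: lower priority number first, then soonest expiry, no-expiry last."""
    return sorted(
        blocks,
        key=lambda b: (b.priority, b.expires_at is None, b.expires_at or datetime.max),
    )


def burn(blocks: list[Block], amount_mc: int) -> int:
    """Deduct amount_mc across blocks in burn order; return the unfunded remainder."""
    for block in burn_order(blocks):
        taken = min(block.remaining_mc, amount_mc)
        block.remaining_mc -= taken
        amount_mc -= taken
        if amount_mc == 0:
            break
    return amount_mc


blocks = [
    Block("trial", 0, None, 500_000),
    Block("topup", 0, datetime(2026, 6, 30), 1_000_000),
    Block("retainer", 0, datetime(2026, 4, 30), 5_000_000),
]
print([b.name for b in burn_order(blocks)])  # ['retainer', 'topup', 'trial']
```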

## Billable metrics — one per outcome type

Define a metric for each unit of work your agent sells:

```bash
POST /v1/billable-metrics
Idempotency-Key: metric:ticket_closed

{ "key": "ticket_closed", "name": "Support ticket closed" }
```

```bash
POST /v1/billable-metrics
Idempotency-Key: metric:email_sent

{ "key": "email_sent", "name": "Outbound email sent" }
```

```bash
POST /v1/billable-metrics
Idempotency-Key: metric:contract_reviewed

{ "key": "contract_reviewed", "name": "Contract reviewed" }
```

Then a metering rule per metric with the credit cost that matches your LLM economics:

```bash
POST /v1/metering-rules
Idempotency-Key: rule:ticket_closed

{
  "billable_metric_key": "ticket_closed",
  "cost_type": "per_unit",
  "credit_cost": 100000,
  "unit_cost": 100000
}
```

1 closed ticket = 100,000mc (millicredits, 1,000 per credit) = 100 credits. Tune this number from real LLM-cost telemetry; aim for roughly 60–80% gross margin per outcome.
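One way to derive the rule's credit cost from telemetry is sketched below. It uses the usual margin identity, price = cost / (1 − margin), and the 1,000 mc per credit ratio implied by "100,000mc = 100 credits". The function name and the sample inputs ($2.50 average LLM cost, credits sold at $0.10 each) are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING

MC_PER_CREDIT = 1000  # implied by 100,000 mc = 100 credits


def credit_cost_mc(avg_llm_cost_usd: str, target_margin: str, usd_per_credit: str) -> int:
    """Millicredits to charge per outcome so the average LLM cost leaves target_margin."""
    cost = Decimal(avg_llm_cost_usd)
    margin = Decimal(target_margin)
    if not Decimal(0) <= margin < Decimal(1):
        raise ValueError("target_margin must be in [0, 1)")
    price_usd = cost / (1 - margin)                # revenue needed per outcome
    credits = price_usd / Decimal(usd_per_credit)  # same price expressed in credits
    # Round up so rounding never eats into the margin.
    return int((credits * MC_PER_CREDIT).to_integral_value(rounding=ROUND_CEILING))


# Hypothetical inputs: $2.50 average LLM cost, 75% margin, $0.10 per credit.
print(credit_cost_mc("2.50", "0.75", "0.10"))  # 100000
```

Inputs are passed as strings so `Decimal` parses them exactly instead of inheriting float rounding.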

## Subscription shape: retainer + metered overage

Define a plan variant that grants retainer credits each cycle:

```bash
POST /v1/plans/<plan_id>/variants
Idempotency-Key: variant:pro-monthly

{
  "name": "Pro — 50 tickets/mo",
  "billing_mode": "prepaid",
  "cycle_duration_days": 30,
  "recurring_grants": [
    {
      "credits": 5000000,
      "priority": 0,
      "expires_after_seconds": 2592000
    }
  ],
  "price_amount": 49900,
  "price_currency": "USD"
}
```

5,000,000mc = 5000 credits = 50 tickets at 100cr each. The 30-day expiry means retainer credits burn before no-expiry trial credits and don't accumulate forever.

When the customer exceeds the retainer, your code grants overage packs (same pattern as AI Generation App) or bills postpaid via a separate metered plan variant.

## Integration flow

### 1. Customer onboarding — start with free trial

```bash
POST /v1/customers
Idempotency-Key: signup:<your_user_id>

{ "external_id": "cus_acme_corp" }
```

```bash
POST /v1/topups/grant
Idempotency-Key: topup:trial-<your_user_id>

{
  "customer_id": "cus_...",
  "credits": 500000,
  "metadata": { "source": "free_trial", "outcomes": 5 }
}
```

500,000mc = 5 free tickets. Priority 0 (default), no expiry — but they'll burn after any expiring paid credits.

### 2. Activate subscription on purchase

After payment confirmation for the $499/mo Pro plan:

```bash
POST /v1/subscriptions
Idempotency-Key: sub:<payment_id>

{
  "customer_id": "cus_...",
  "plan_variant_id": "var_pro_monthly",
  "external_payment_id": "stripe_sub_abc"
}
```

This activates the subscription and fires the initial recurring grant — 5000 credits land with priority 0, expiring at cycle end.

### 3. Job starts — reserve with an estimate

This is the heart of the pattern. A new ticket arrives; your agent begins work.

**Step 1: Check entitlement with the estimated cost.**

```bash
GET /v1/customers/cus_.../entitlements/ticket_closed?units=1
```

```json
{
  "allowed": true,
  "balance": 5000000,
  "effective_balance": 5000000,
  "cost_per_unit": 100000,
  "cost_total": 100000,
  "affordable_units": 50
}
```

If `allowed` is false — or if you enforce a per-customer spend cap (e.g., max 100 outcomes per day) and the customer has hit it — reject the job before the agent burns LLM cost.

**Step 2: Reserve with an estimated_units that covers the worst-case LLM spend.**

```bash
POST /v1/reservations
Idempotency-Key: reservation:<job_id>

{
  "customer_id": "cus_...",
  "billable_metric_key": "ticket_closed",
  "estimated_units": 1,
  "ttl_seconds": 7200,
  "metadata": {
    "job_id": "job_abc123",
    "ticket_id": "tkt_998",
    "agent_version": "v3.2"
  }
}
```

**TTL is different from AI generation.** A generation run takes 30–90 seconds, so a 300s TTL is fine. An agent workflow may take hours. Set `ttl_seconds` to 2–3× the 95th-percentile job duration. `7200` (2 hours) is a safe default; go up to `21600` (6 hours) for long research tasks. If a worker crashes or gets stuck, the reaper auto-releases the abandoned reservation once its TTL passes.
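A minimal sketch of the "2–3× the 95th percentile" rule, assuming you keep recent job durations (in seconds) from your own job logs. The function name, the 2.5 multiplier, and the 300s floor are illustrative choices; the 2-hour fallback and the 6-hour ceiling come from the paragraph above.

```python
import math
import statistics


def recommended_ttl_seconds(
    durations_s: list[float],
    multiplier: float = 2.5,
    floor_s: int = 300,
    ceiling_s: int = 21600,
) -> int:
    """Return multiplier x p95 of observed durations, clamped to [floor_s, ceiling_s]."""
    if len(durations_s) < 2:
        return 7200  # too little data: fall back to the 2-hour default
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(durations_s, n=20)[-1]
    return max(floor_s, min(ceiling_s, math.ceil(p95 * multiplier)))


print(recommended_ttl_seconds([1000.0] * 50))  # 2500
```

Recompute this periodically per job type; a new agent version can shift the duration distribution.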

Response:
```json
{
  "id": "res_...",
  "status": "active",
  "estimated_units": 1,
  "estimated_cost": 100000,
  "expires_at": "2026-04-17T10:00:00Z",
  "effective_balance_after": 4900000
}
```

The hold drops the effective balance immediately — so a concurrent job request sees less available, and your spend-cap check stays accurate.

### 4. Agent runs — LLM calls, tool calls, human-in-the-loop

Your agent does its work. It may call the LLM 1 time or 200 times. It may need a human to approve a step. It may take 30 seconds or 6 hours. None of this touches QuotaStack — the reservation is holding the credit slot.

Track the actual LLM spend as you go. At the end, you'll know the real cost.
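One possible shape for that tracking is a small accumulator that lives for the duration of the job and emits the metadata keys used in the commit example on this page. The class name and the per-1k-token prices are assumptions; substitute your model provider's real rates.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunTelemetry:
    """Accumulates per-job cost figures for the commit metadata."""
    usd_per_1k_in: float   # assumption: your model's input-token price
    usd_per_1k_out: float  # assumption: your model's output-token price
    tokens_in: int = 0
    tokens_out: int = 0
    tool_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record_llm_call(self, tokens_in: int, tokens_out: int) -> None:
        self.tokens_in += tokens_in
        self.tokens_out += tokens_out

    def record_tool_call(self) -> None:
        self.tool_calls += 1

    @property
    def llm_cost_usd(self) -> float:
        return (self.tokens_in * self.usd_per_1k_in
                + self.tokens_out * self.usd_per_1k_out) / 1000

    def as_commit_metadata(self) -> dict:
        return {
            "llm_cost_usd": round(self.llm_cost_usd, 4),
            "llm_tokens_in": self.tokens_in,
            "llm_tokens_out": self.tokens_out,
            "tool_calls": self.tool_calls,
            "job_duration_ms": int((time.monotonic() - self.started_at) * 1000),
        }


run = RunTelemetry(usd_per_1k_in=0.01, usd_per_1k_out=0.03)
run.record_llm_call(tokens_in=18200, tokens_out=4100)
run.record_tool_call()
meta = run.as_commit_metadata()
print(meta["llm_tokens_in"], meta["tool_calls"])  # 18200 1
```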

### 5a. Job completes — commit actual cost

If the agent closed the ticket successfully:

```bash
POST /v1/reservations/res_.../commit
Idempotency-Key: commit:res_...

{
  "actual_units": 1,
  "metadata": {
    "llm_cost_usd": 0.42,
    "llm_tokens_in": 18200,
    "llm_tokens_out": 4100,
    "tool_calls": 12,
    "job_duration_ms": 340000,
    "outcome_verified_by": "resolver_v2"
  }
}
```

**Cost adjustment at commit.** If your job turned out much more expensive than estimated (say 3× tokens), you can pass `actual_units: 3` at commit — the ledger records 300,000mc burned instead of 100,000mc. Most agent businesses keep `actual_units: 1` (one outcome = one unit) and absorb the variance; sophisticated ones tier outcomes by complexity (easy/medium/hard) with different credit costs and commit to the tier that matched reality.
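If you do pass a variable `actual_units`, the mapping from measured cost to units can be a plain lookup. The tier boundaries below are invented for illustration and should come from your own telemetry; the function name is hypothetical.

```python
# Hypothetical tier boundaries in USD of LLM spend; tune from your own telemetry.
TIERS = [
    (0.50, "easy", 1),           # up to $0.50 of LLM spend bills 1 unit
    (2.00, "medium", 2),         # up to $2.00 bills 2 units
    (float("inf"), "hard", 3),   # anything above bills 3 units
]


def units_for_cost(llm_cost_usd: float) -> tuple[str, int]:
    """Return (tier_name, actual_units) for a finished job's measured LLM cost."""
    for upper_bound, name, units in TIERS:
        if llm_cost_usd <= upper_bound:
            return name, units
    raise ValueError("llm_cost_usd must be a number")


print(units_for_cost(0.42))  # ('easy', 1)
print(units_for_cost(4.00))  # ('hard', 3)
```

With the 100,000mc rule from earlier, a "hard" job committed with `actual_units: 3` burns 300,000mc.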

### 5b. Job fails — release, customer not charged

If the agent couldn't close the ticket (missing data, escalation to human, unresolvable):

```bash
POST /v1/reservations/res_.../release
Idempotency-Key: release:res_...

{}
```

The 100,000mc hold is released and the customer's effective balance goes back up. This is the core promise of outcome-based billing, **no outcome, no charge**, and you can advertise it on your pricing page.
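The reserve, commit, and release semantics from steps 3 to 5 can be modelled with a toy in-memory ledger. This is only a mental-model sketch under simplifying assumptions (one customer, one metric, no TTL); the class is not a QuotaStack client and every name in it is hypothetical.

```python
class InsufficientCredits(Exception):
    pass


class ToyLedger:
    """In-memory stand-in for one customer's balance and holds (amounts in mc)."""

    def __init__(self, balance_mc: int, cost_per_unit_mc: int):
        self.balance_mc = balance_mc
        self.cost_per_unit_mc = cost_per_unit_mc
        self.holds: dict[str, int] = {}  # reservation id -> held mc

    @property
    def effective_balance_mc(self) -> int:
        return self.balance_mc - sum(self.holds.values())

    def reserve(self, res_id: str, estimated_units: int) -> None:
        if res_id in self.holds:
            return  # repeated request for the same reservation: no second hold
        cost = estimated_units * self.cost_per_unit_mc
        if cost > self.effective_balance_mc:
            raise InsufficientCredits(res_id)
        self.holds[res_id] = cost

    def commit(self, res_id: str, actual_units: int) -> None:
        self.holds.pop(res_id)  # drop the hold, then burn the real amount
        self.balance_mc -= actual_units * self.cost_per_unit_mc

    def release(self, res_id: str) -> None:
        self.holds.pop(res_id, None)  # hold vanishes, balance untouched


ledger = ToyLedger(balance_mc=5_000_000, cost_per_unit_mc=100_000)
ledger.reserve("res_a", 1)
ledger.reserve("res_b", 1)
print(ledger.effective_balance_mc)  # 4800000 while both jobs run
ledger.commit("res_a", 1)           # success: 100,000 mc burned
ledger.release("res_b")             # failure: nothing burned
print(ledger.balance_mc, ledger.effective_balance_mc)  # 4900000 4900000
```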

### 6. Enforce per-customer spend caps

The "one customer blows up our LLM bill" nightmare. Solve it with entitlement policies at the customer level:

```bash
PATCH /v1/customers/cus_.../limits

{
  "daily_credit_cap": 1000000,
  "monthly_credit_cap": 15000000
}
```

(Or model this in your application layer by checking daily-summed usage before reserving.) Entitlement checks will now return `allowed: false` when the cap is hit, and your agent refuses to start new jobs until the next cycle or a cap lift. This keeps a rogue customer from running 10,000 tickets in an afternoon.
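For the application-layer variant, a minimal sketch of a daily cap guard follows. It keeps counters in process memory keyed by customer and UTC day, which is an assumption made for brevity; a real deployment would need a shared store and would also hand back the spend when a reservation is released. All names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Optional


class DailyCapGuard:
    """Application-layer daily spend cap, tracked in mc per customer per UTC day."""

    def __init__(self, daily_cap_mc: int):
        self.daily_cap_mc = daily_cap_mc
        self._used: dict[tuple[str, date], int] = defaultdict(int)

    def try_spend(self, customer_id: str, cost_mc: int, day: Optional[date] = None) -> bool:
        """Record cost_mc and return True if it fits under the cap, else return False."""
        key = (customer_id, day or datetime.now(timezone.utc).date())
        if self._used[key] + cost_mc > self.daily_cap_mc:
            return False
        self._used[key] += cost_mc
        return True


guard = DailyCapGuard(daily_cap_mc=1_000_000)  # 10 tickets/day at 100,000 mc each
day = date(2026, 4, 17)
results = [guard.try_spend("cus_acme", 100_000, day) for _ in range(11)]
print(results.count(True), results[-1])  # 10 False
```

Call `try_spend` before reserving, and skip the job when it returns `False`.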

## Worked example: full AI SDR job handler

```python
def handle_new_lead(customer_id, lead):
    job_id = generate_uuid()

    # 1. Entitlement — is this customer allowed to generate another outbound email?
    ent = quotastack.get_entitlement(customer_id, "email_sent", units=1)
    if not ent.allowed:
        return skip_lead(lead, reason="customer_out_of_credits")

    # 1b. Spend-cap guard — apply our own daily cap
    if daily_usage(customer_id) >= DAILY_CAP:
        return skip_lead(lead, reason="daily_cap_hit")

    # 2. Reserve for up to 1 hour (SDR job includes research + draft + review)
    try:
        reservation = quotastack.reserve(
            customer_id=customer_id,
            billable_metric_key="email_sent",
            estimated_units=1,
            ttl_seconds=3600,
            idempotency_key=f"reservation:{job_id}",
            metadata={"job_id": job_id, "lead_id": lead.id}
        )
    except InsufficientCreditsError:
        # A concurrent job took the remaining credits between the check and the reserve.
        return skip_lead(lead, reason="race_condition")

    # 3. Run the multi-step agent workflow
    try:
        research   = agent.research_company(lead.company)
        draft      = agent.draft_email(lead, research)
        approved   = agent.review_and_approve(draft)
        sent       = mail_service.send(lead.email, approved.body)
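    except AgentFailedError as e:
        # No outcome, no charge: hand the hold back immediately.
        quotastack.release(
            reservation_id=reservation.id,
            idempotency_key=f"release:{reservation.id}"
        )
        return failed(job_id, reason=str(e))
    except Exception:
        # Unexpected error (e.g. mail delivery): release now instead of
        # waiting for the TTL, then let the error propagate.
        quotastack.release(
            reservation_id=reservation.id,
            idempotency_key=f"release:{reservation.id}"
        )
        raise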

    # 4. Commit — record the real cost telemetry
    quotastack.commit(
        reservation_id=reservation.id,
        actual_units=1,
        idempotency_key=f"commit:{reservation.id}",
        metadata={
            "llm_cost_usd": agent.total_cost,
            "tool_calls": agent.tool_count,
            "duration_ms": agent.duration_ms,
        }
    )
    return completed(job_id, sent_email_id=sent.id)
```

## Pricing pages your customers can actually compare

Because your unit of value is the outcome, your pricing page can be one line:

> **Pro — $499/mo, 50 outcomes. Additional outcomes $12 each. No outcome, no charge.**

Not *"$0.02 per 1k input tokens, $0.06 per 1k output tokens, billing accurate to the penny"* — nobody outside of infra buyers reads that. Outcome pricing is what the services-dollar buyer understands. QuotaStack is the engine that makes it real behind the scenes.
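The arithmetic behind the example Pro pricing line is simple enough to sketch. The figures come from the quote above ($499/mo, 50 outcomes included, $12 per additional outcome); the function name is hypothetical.

```python
from decimal import Decimal

RETAINER_USD = Decimal("499")
INCLUDED_OUTCOMES = 50
OVERAGE_USD = Decimal("12")


def monthly_invoice_usd(successful_outcomes: int) -> Decimal:
    """Retainer plus overage; failed (released) jobs never reach this count."""
    extra = max(0, successful_outcomes - INCLUDED_OUTCOMES)
    return RETAINER_USD + extra * OVERAGE_USD


for n in (20, 50, 65):
    print(n, monthly_invoice_usd(n))  # 499, 499, then 679 for 15 extra outcomes
```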

## Tips

- **Estimate conservatively; commit accurately.** A reservation that's too small means concurrent jobs may blow past the customer's budget before the commit lands. Err high on the estimate, then return the difference at commit time by passing a lower `actual_units`.

- **Model outcome tiers explicitly.** If you know your jobs span a 20× cost range, don't average — define `ticket_closed_easy`, `ticket_closed_medium`, `ticket_closed_hard` as separate metrics with different credit costs, and let the agent classify complexity at commit time. Customers see a single "outcome" unit on their invoice; you see the variance.

- **Track LLM cost in commit metadata, not as separate billing.** The customer pays for outcomes, not tokens, but you still need the LLM cost data for gross-margin reporting. Put it in the `commit` metadata and query it later via the Audit Log.

- **Use subscriptions for retainers, not per-outcome billing.** The `plan_variant.recurring_grants` pattern gives you predictable monthly revenue + a fair rollover story. Pure per-outcome billing is fine for pay-as-you-go tiers but painful for customer forecasting.

- **Fire `reservation.expired` webhooks into your ops channel.** A reservation expiring before commit usually means an agent worker died mid-job. Worth paging on-call.

- **Rollover vs. reset is a policy knob.** Services-firm customers expect to "use what they paid for" — rollover reduces churn. AI-native challengers often don't offer it, arguing credits are cheap. Model your market and pick; QuotaStack supports both via `rollover_percentage` on the plan variant.

## Why this pattern wins over seat pricing

Seat pricing is aligned to effort (engineers hired = seats needed). **Outcome pricing is aligned to value** (tickets closed = revenue delivered). Seat pricing breaks when your customer gets more efficient: they cut seats and your revenue shrinks. Outcome pricing scales with their success: they close more tickets, and you earn more.

This is the macro shift Sequoia's services-dollar thesis names. The technical substrate that makes outcome pricing possible — reservations, commits, releases, spend caps, variable credit costs — is exactly what QuotaStack gives you.

See also: [Reservations](/docs/concepts/reservations), [Metering](/docs/concepts/metering), [Subscriptions](/docs/concepts/subscriptions), [Idempotency](/docs/concepts/idempotency).

## Concepts Used

- [Reservations](/docs/concepts/reservations)
- [Subscriptions](/docs/concepts/subscriptions)
- [Metering](/docs/concepts/metering)
- [Entitlements](/docs/concepts/entitlements)