OUTCOME-BASED · LONG RESERVATIONS · SPEND CAPS

AI Agent Company: Outcome-Based Billing with Long Reservations

How to bill for work, not seats — model an AI agent company (AI SDR, AI paralegal, AI bookkeeper) with outcome-based pricing, long-running reservations, variable cost pass-through, and per-customer spend caps.

Inspired by: AI SDR, AI bookkeeping, AI paralegal, AI support, AI recruiting, the services-dollar thesis

Mental Model

Think of this as the shape AI-native services companies are converging on: you sell work, not seats (closed tickets, reviewed contracts, sent outbound emails). Each requested outcome kicks off a multi-step agent run that holds credits for minutes or hours, commits the actual cost on success, and releases the hold on failure, so the customer is never charged for work that did not land.

Quick Take
  • The outcome is the billable unit: 1 closed ticket = N credits, tuned from real LLM cost
  • Long reservations (minutes to hours) with a generous TTL for multi-step agent workflows
  • Commit actual cost: agent runs have 10× cost variance; price tiers or actual_units absorb it
  • Release on failure = "no outcome, no charge", a pricing-page promise customers understand
  • Per-customer spend caps so one runaway account can't blow up your LLM bill overnight

AI Agent Company

Pattern: outcome-based metering + long reservations + spend caps.

The problem

You’re building an AI-native company that sells work, not software. Your customers pay for outcomes — closed support tickets, reviewed contracts, sent outbound emails, reconciled invoices, qualified leads. Each outcome kicks off a multi-step agent workflow that takes minutes to hours, burns a variable amount of LLM cost, and may fail halfway.

Seat-based billing doesn’t fit. Neither does per-token billing (your customers don’t want to think in tokens). You need:

  • Outcome as the unit of value — bill per closed ticket, not per API call
  • Long reservations to hold estimated cost during agent runs that take minutes or hours
  • Commit the actual cost at job completion — the agent might use 10× more or less LLM than estimated
  • Release on failure — if the agent can’t complete the outcome, the customer isn’t charged
  • Per-customer spend caps so one runaway customer can’t 10× your LLM bill overnight
  • Hybrid pricing — a monthly retainer that includes N outcomes, then metered overage beyond that

This is the pattern the next wave of AI-native companies is converging on: AI SDR, AI bookkeeping, AI paralegal, AI support, AI recruiting. Their billing shape is different enough from classic SaaS that it deserves its own walkthrough.

Why credits are the right primitive here

You could bill directly in dollars — charge $5 per closed ticket, $2 per sent email. But you’ll hit these problems fast:

  • Variable LLM cost: a simple ticket costs $0.20 to close; a complex one costs $4. Flat per-outcome pricing either loses money on hard ones or overcharges on easy ones.
  • Multi-outcome products: a customer wants to buy “any kind of agent work” — closed tickets, sent emails, scheduled meetings — and have a single balance that covers all of them.
  • Enterprise contracts: one prospect wants a $10k/month commit with any mix of outcomes; another wants pure per-outcome pricing. Same product, two shapes.

Credits normalize all of this. 1 closed ticket = 100 credits. 1 sent email = 10 credits. 1 complex contract review = 1000 credits. The customer sees a single balance and a single bill; you tune the credit-per-outcome ratio based on actual LLM cost and desired margin.
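To make the single-balance idea concrete, here is a minimal sketch that prices a mixed month of work in credits. The per-outcome costs are the example figures from the paragraph above; CREDITS_PER_OUTCOME and credits_for_mix are hypothetical names used for illustration only.

```python
# Credits per outcome, taken from the example figures in the text above.
CREDITS_PER_OUTCOME = {
    "ticket_closed": 100,
    "email_sent": 10,
    "contract_reviewed": 1000,
}


def credits_for_mix(outcomes: dict) -> int:
    """Total credits consumed by a mix of outcomes, keyed by outcome type."""
    return sum(CREDITS_PER_OUTCOME[kind] * count for kind, count in outcomes.items())


# 20 tickets + 150 emails + 1 contract review, all drawn from one balance.
month = {"ticket_closed": 20, "email_sent": 150, "contract_reviewed": 1}
print(credits_for_mix(month))  # 2000 + 1500 + 1000 = 4500
```

Because every outcome reduces to the same unit, a $10k/month commit and a pure per-outcome plan can share one table like this and differ only in how credits are granted.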

Credit structure

Block type                  | Priority | Expiry     | Source
Retainer grant (monthly)    | 0        | Cycle end  | Subscription plan grant
Rollover of unused retainer | 0        | Next cycle | Auto at renewal (optional)
Overage / topup packs       | 0        | 30–90 days | Topup grant after payment
Free trial credits          | 0        | None       | Topup grant on signup

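Every block shares priority 0, so the burn order described here comes from expiry. The retainer expires at cycle end and burns first: customers use what they paid for on-plan before any rollover or overage packs kick in. Trial credits never expire, which makes them the last resort (the safety net that disappears from the available balance once real credits land).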
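The burn order above can be sketched as a sort over credit blocks. This is a minimal illustration that assumes blocks are ordered by priority, then by soonest expiry, with non-expiring blocks last. That rule is inferred from this page and is not confirmed engine behavior. CreditBlock, burn_order, and burn are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CreditBlock:
    name: str
    priority: int
    expires_at: Optional[datetime]  # None means the block never expires
    remaining_mc: int               # millicredits left in the block


def burn_order(blocks):
    """Assumed rule: lowest priority number, then soonest expiry, never-expiring last."""
    return sorted(
        blocks,
        key=lambda b: (b.priority, b.expires_at is None, b.expires_at or datetime.max),
    )


def burn(blocks, amount_mc):
    """Deduct amount_mc across blocks in burn order; return (name, taken) pairs."""
    if amount_mc > sum(b.remaining_mc for b in blocks):
        raise ValueError("insufficient credits")
    taken = []
    for block in burn_order(blocks):
        if amount_mc == 0:
            break
        used = min(block.remaining_mc, amount_mc)
        if used:
            block.remaining_mc -= used
            amount_mc -= used
            taken.append((block.name, used))
    return taken


blocks = [
    CreditBlock("trial", 0, None, 500_000),
    CreditBlock("overage", 0, datetime(2026, 6, 30), 1_000_000),
    CreditBlock("retainer", 0, datetime(2026, 5, 1), 5_000_000),
]
print([b.name for b in burn_order(blocks)])
```

With these sample blocks the retainer sorts first and the trial grant last, matching the table.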

Billable metrics — one per outcome type

Define a metric for each unit of work your agent sells:

POST /v1/billable-metrics
Idempotency-Key: metric:ticket_closed

{ "key": "ticket_closed", "name": "Support ticket closed" }

POST /v1/billable-metrics
Idempotency-Key: metric:email_sent

{ "key": "email_sent", "name": "Outbound email sent" }

POST /v1/billable-metrics
Idempotency-Key: metric:contract_reviewed

{ "key": "contract_reviewed", "name": "Contract reviewed" }

Then a metering rule per metric with the credit cost that matches your LLM economics:

POST /v1/metering-rules
Idempotency-Key: rule:ticket_closed

{
  "billable_metric_key": "ticket_closed",
  "cost_type": "per_unit",
  "credit_cost": 100000,
  "unit_cost": 100000
}

1 closed ticket = 100,000mc = 100 credits. Tune this number from real LLM-cost telemetry — aim for ~60–80% gross margin per outcome.
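One way to derive that number from telemetry is to work backwards from a target margin. The sketch below assumes gross margin is defined as (price − cost) / price and that one credit is worth a fixed dollar amount. The $0.10 per credit in the example is a hypothetical round figure, and credit_cost_mc is an illustrative helper, not a QuotaStack function.

```python
from decimal import Decimal, ROUND_CEILING

MC_PER_CREDIT = 1000  # 100,000mc = 100 credits in the text above


def credit_cost_mc(avg_llm_cost_usd: str, target_margin: str, usd_per_credit: str) -> int:
    """Millicredits to charge per outcome so the target gross margin holds on average."""
    cost = Decimal(avg_llm_cost_usd)
    margin = Decimal(target_margin)
    if not Decimal("0") <= margin < Decimal("1"):
        raise ValueError("target_margin must be in [0, 1)")
    # margin = (price - cost) / price  =>  price = cost / (1 - margin)
    price_usd = cost / (Decimal("1") - margin)
    # Round up to a whole credit so rounding never eats into the margin.
    credits = (price_usd / Decimal(usd_per_credit)).to_integral_value(rounding=ROUND_CEILING)
    return int(credits) * MC_PER_CREDIT


# Average ticket costs $2.00 of LLM spend, 80% target margin, $0.10 per credit (assumed).
print(credit_cost_mc("2.00", "0.80", "0.10"))
```

Decimal inputs are passed as strings to avoid binary floating-point rounding pushing a result over a credit boundary.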

Subscription shape: retainer + metered overage

Define a plan variant that grants retainer credits each cycle:

POST /v1/plans/<plan_id>/variants
Idempotency-Key: variant:pro-monthly

{
  "name": "Pro — 50 tickets/mo",
  "billing_mode": "prepaid",
  "cycle_duration_days": 30,
  "recurring_grants": [
    {
      "credits": 5000000,
      "priority": 0,
      "expires_after_seconds": 2592000
    }
  ],
  "price_amount": 49900,
  "price_currency": "USD"
}

5,000,000mc = 5000 credits = 50 tickets at 100cr each. The 30-day expiry means retainer credits burn before no-expiry trial credits and don’t accumulate forever.
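The arithmetic above can be checked mechanically before a plan variant ships. This sketch assumes price_amount is expressed in minor units (cents), which is inferred from the $499/mo figure used elsewhere on this page. summarize is a hypothetical helper.

```python
MC_PER_CREDIT = 1000
SECONDS_PER_DAY = 86_400

variant = {
    "cycle_duration_days": 30,
    "recurring_grants": [
        {"credits": 5_000_000, "priority": 0, "expires_after_seconds": 2_592_000}
    ],
    "price_amount": 49_900,  # assumed to be cents
}


def summarize(variant, unit_cost_mc):
    """Translate a plan variant into credits, included outcomes, and price per outcome."""
    grant = variant["recurring_grants"][0]
    outcomes = grant["credits"] // unit_cost_mc
    return {
        "credits": grant["credits"] // MC_PER_CREDIT,
        "included_outcomes": outcomes,
        "expiry_days": grant["expires_after_seconds"] // SECONDS_PER_DAY,
        "usd_per_outcome": round(variant["price_amount"] / 100 / outcomes, 2),
    }


print(summarize(variant, unit_cost_mc=100_000))
```

A check like this is cheap to run in CI and catches a grant whose expiry drifts from the cycle length.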

When the customer exceeds the retainer, your code grants overage packs (same pattern as AI Generation App) or bills postpaid via a separate metered plan variant.

Integration flow

1. Customer onboarding — start with free trial

POST /v1/customers
Idempotency-Key: signup:<your_user_id>

{ "external_id": "cus_acme_corp" }

POST /v1/topups/grant
Idempotency-Key: topup:trial-<your_user_id>

{
  "customer_id": "cus_...",
  "credits": 500000,
  "metadata": { "source": "free_trial", "outcomes": 5 }
}

500,000mc = 5 free tickets. Priority 0 (default), no expiry — but they’ll burn after any expiring paid credits.

2. Activate subscription on purchase

After payment confirmation for the $499/mo Pro plan:

POST /v1/subscriptions
Idempotency-Key: sub:<payment_id>

{
  "customer_id": "cus_...",
  "plan_variant_id": "var_pro_monthly",
  "external_payment_id": "stripe_sub_abc"
}

This activates the subscription and fires the initial recurring grant — 5000 credits land with priority 0, expiring at cycle end.

3. Job starts — reserve with an estimate

This is the heart of the pattern. A new ticket arrives; your agent begins work.

Step 1: Check entitlement with the estimated cost.

GET /v1/customers/cus_.../entitlements/ticket_closed?units=1

Response:

{
  "allowed": true,
  "balance": 5000000,
  "effective_balance": 5000000,
  "cost_per_unit": 100000,
  "cost_total": 100000,
  "affordable_units": 50
}

If allowed is false — or if you enforce a per-customer spend cap (e.g., max 100 outcomes per day) and the customer has hit it — reject the job before the agent burns LLM cost.

Step 2: Reserve with an estimated_units that covers the worst-case LLM spend.

POST /v1/reservations
Idempotency-Key: reservation:<job_id>

{
  "customer_id": "cus_...",
  "billable_metric_key": "ticket_closed",
  "estimated_units": 1,
  "ttl_seconds": 7200,
  "metadata": {
    "job_id": "job_abc123",
    "ticket_id": "tkt_998",
    "agent_version": "v3.2"
  }
}

TTL sizing differs from the AI generation pattern. A generation run takes 30–90 seconds, so a 300s TTL is fine. An agent workflow may take hours. Set ttl_seconds to 2–3× the 95th-percentile job duration. 7200 (2 hours) is a safe default; go up to 21600 (6 hours) for long research tasks. The reaper catches abandoned reservations if a worker crashes or gets stuck.
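The sizing rule can be computed from observed job durations. The sketch below uses a nearest-rank percentile and clamps the result between 300 seconds and 6 hours, the two bounds mentioned above. The function names and the choice of nearest-rank are assumptions for illustration.

```python
def p95(durations_s):
    """Nearest-rank 95th percentile of observed job durations, in seconds."""
    if not durations_s:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations_s)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def ttl_seconds(durations_s, multiplier=2, floor_s=300, ceiling_s=6 * 3600):
    """TTL = multiplier x p95, clamped to [floor_s, ceiling_s]."""
    return int(min(max(multiplier * p95(durations_s), floor_s), ceiling_s))


# Twenty observed runs, from 1 minute to 20 minutes.
observed = list(range(60, 1260, 60))
print(ttl_seconds(observed))
```

Recomputing this from recent telemetry on a schedule keeps the TTL tracking reality as agent workflows get longer or shorter.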

Response:

{
  "id": "res_...",
  "status": "active",
  "estimated_units": 1,
  "estimated_cost": 100000,
  "expires_at": "2026-04-17T10:00:00Z",
  "effective_balance_after": 4900000
}

The hold drops the effective balance immediately — so a concurrent job request sees less available, and your spend-cap check stays accurate.
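The relationship between balance, holds, and effective balance can be modelled in a few lines. This is an in-memory illustration of the semantics described on this page, not QuotaStack's implementation. Account and its methods are hypothetical.

```python
class Account:
    """Effective balance = ledger balance minus the sum of active holds."""

    def __init__(self, balance_mc):
        self.balance_mc = balance_mc
        self.holds = {}  # reservation id -> held millicredits

    @property
    def effective_balance_mc(self):
        return self.balance_mc - sum(self.holds.values())

    def reserve(self, res_id, amount_mc):
        if res_id in self.holds:  # retry of the same reservation: no double hold
            return self.effective_balance_mc
        if amount_mc > self.effective_balance_mc:
            raise ValueError("insufficient effective balance")
        self.holds[res_id] = amount_mc
        return self.effective_balance_mc

    def commit(self, res_id, actual_mc):
        self.holds.pop(res_id)        # drop the hold
        self.balance_mc -= actual_mc  # burn the actual cost

    def release(self, res_id):
        self.holds.pop(res_id)        # drop the hold; nothing is burned


acct = Account(5_000_000)
print(acct.reserve("res_1", 100_000))  # effective balance after the hold
print(acct.balance_mc)                 # ledger balance is untouched until commit
```

A second job that arrives while res_1 is active sees the reduced effective balance, which is what keeps concurrent requests from overspending.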

4. Agent runs — LLM calls, tool calls, human-in-the-loop

Your agent does its work. It may call the LLM 1 time or 200 times. It may need a human to approve a step. It may take 30 seconds or 6 hours. None of this touches QuotaStack — the reservation is holding the credit slot.

Track the actual LLM spend as you go. At the end, you’ll know the real cost.
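A small accumulator is enough to track spend during the run. In this sketch the per-1k-token prices are hypothetical values supplied by the caller, and RunTelemetry is an illustrative class. The metadata keys mirror the ones used in the commit example on this page.

```python
from dataclasses import dataclass


@dataclass
class RunTelemetry:
    """Accumulates LLM and tool usage for one agent run."""
    usd_per_1k_in: float   # hypothetical model pricing, supplied by the caller
    usd_per_1k_out: float
    tokens_in: int = 0
    tokens_out: int = 0
    tool_calls: int = 0

    def record_llm_call(self, tokens_in, tokens_out):
        self.tokens_in += tokens_in
        self.tokens_out += tokens_out

    def record_tool_call(self):
        self.tool_calls += 1

    @property
    def llm_cost_usd(self):
        cost = (self.tokens_in * self.usd_per_1k_in
                + self.tokens_out * self.usd_per_1k_out) / 1000
        return round(cost, 4)

    def as_commit_metadata(self):
        return {
            "llm_cost_usd": self.llm_cost_usd,
            "llm_tokens_in": self.tokens_in,
            "llm_tokens_out": self.tokens_out,
            "tool_calls": self.tool_calls,
        }


run = RunTelemetry(usd_per_1k_in=0.01, usd_per_1k_out=0.03)
run.record_llm_call(tokens_in=18_200, tokens_out=4_100)
run.record_tool_call()
print(run.as_commit_metadata())
```

Create one instance per job and pass as_commit_metadata() along when the job finishes.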

5a. Job completes — commit actual cost

If the agent closed the ticket successfully:

POST /v1/reservations/res_.../commit
Idempotency-Key: commit:res_...

{
  "actual_units": 1,
  "metadata": {
    "llm_cost_usd": 0.42,
    "llm_tokens_in": 18200,
    "llm_tokens_out": 4100,
    "tool_calls": 12,
    "job_duration_ms": 340000,
    "outcome_verified_by": "resolver_v2"
  }
}

Cost adjustment at commit. If your job turned out much more expensive than estimated (say 3× tokens), you can pass actual_units: 3 at commit — the ledger records 300,000mc burned instead of 100,000mc. Most agent businesses keep actual_units: 1 (one outcome = one unit) and absorb the variance; sophisticated ones tier outcomes by complexity (easy/medium/hard) with different credit costs and commit to the tier that matched reality.
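A tiering scheme of this kind can be as simple as a lookup from measured LLM cost to actual_units. The dollar thresholds and unit counts below are hypothetical and should be calibrated from your own telemetry. classify and burned_mc are illustrative names.

```python
# Hypothetical tier ceilings in USD of LLM spend, with the units each tier commits.
TIERS = [
    (0.50, "easy", 1),
    (2.00, "medium", 2),
    (float("inf"), "hard", 3),
]


def classify(llm_cost_usd):
    """Return (tier_name, actual_units) for the first tier whose ceiling covers the cost."""
    for ceiling, name, units in TIERS:
        if llm_cost_usd <= ceiling:
            return name, units
    raise ValueError(f"cannot classify cost: {llm_cost_usd!r}")


def burned_mc(llm_cost_usd, unit_cost_mc=100_000):
    """Millicredits the ledger would record for a job with this LLM cost."""
    _, units = classify(llm_cost_usd)
    return units * unit_cost_mc


print(classify(0.42))
print(burned_mc(3.10))
```

If the commit can exceed one unit, size the reservation's estimated_units for the highest tier so the hold covers the worst case.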

5b. Job fails — release, customer not charged

If the agent couldn’t close the ticket (missing data, escalation to human, unresolvable):

POST /v1/reservations/res_.../release
Idempotency-Key: release:res_...

{}

The 100,000mc hold is released and the customer’s effective balance returns to what it was before the job. This is the core promise of outcome-based billing: no outcome, no charge. You can advertise it on your pricing page.

6. Enforce per-customer spend caps

This is the “one customer blows up our LLM bill” nightmare. Solve it with entitlement policies at the customer level:

PATCH /v1/customers/cus_.../limits

{
  "daily_credit_cap": 1000000,
  "monthly_credit_cap": 15000000
}

(Or model this in your application layer by checking daily-summed usage before reserving.) Entitlement checks will now return allowed: false when the cap is hit, and your agent refuses to start new jobs until the next cycle or a cap lift. This keeps a rogue customer from running 10,000 tickets in an afternoon.
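For the application-layer variant, a minimal sketch is to sum committed millicredits for the current day and month and compare against the caps before reserving. It assumes you keep your own list of (timestamp, millicredits) commit records in UTC and that the monthly window is the calendar month rather than the billing cycle. Both are simplifications, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def usage_mc(commits, now, window):
    """Sum committed millicredits in the current day or calendar month (timestamps in UTC)."""
    def in_window(ts):
        if window == "day":
            return ts.date() == now.date()
        if window == "month":
            return (ts.year, ts.month) == (now.year, now.month)
        raise ValueError(f"unknown window: {window}")
    return sum(mc for ts, mc in commits if in_window(ts))


def under_caps(commits, now, next_cost_mc,
               daily_cap_mc=1_000_000, monthly_cap_mc=15_000_000):
    """True if committing next_cost_mc would keep the customer within both caps."""
    return (
        usage_mc(commits, now, "day") + next_cost_mc <= daily_cap_mc
        and usage_mc(commits, now, "month") + next_cost_mc <= monthly_cap_mc
    )


now = datetime(2026, 4, 17, 9, 0, tzinfo=timezone.utc)
earlier_today = datetime(2026, 4, 17, 1, 0, tzinfo=timezone.utc)
commits = [(earlier_today, 100_000)] * 9
print(under_caps(commits, now, next_cost_mc=100_000))
```

This counts only committed usage. Adding active holds to the sum gives a stricter cap when many jobs run concurrently.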

Worked example: full AI SDR job handler

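import uuid

# Assumed to exist in your codebase: the quotastack client, agent, mail_service,
# daily_usage, DAILY_CAP, InsufficientCreditsError, AgentFailedError, and the
# skip_lead / failed / completed result helpers.

def handle_new_lead(customer_id, lead):
    job_id = str(uuid.uuid4())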

    # 1. Entitlement — is this customer allowed to generate another outbound email?
    ent = quotastack.get_entitlement(customer_id, "email_sent", units=1)
    if not ent.allowed:
        return skip_lead(lead, reason="customer_out_of_credits")

    # 1b. Spend-cap guard — apply our own daily cap
    if daily_usage(customer_id) >= DAILY_CAP:
        return skip_lead(lead, reason="daily_cap_hit")

    # 2. Reserve for up to 1 hour (SDR job includes research + draft + review)
    try:
        reservation = quotastack.reserve(
            customer_id=customer_id,
            billable_metric_key="email_sent",
            estimated_units=1,
            ttl_seconds=3600,
            idempotency_key=f"reservation:{job_id}",
            metadata={"job_id": job_id, "lead_id": lead.id}
        )
    except InsufficientCreditsError:
        return skip_lead(lead, reason="race_condition")

    # 3. Run the multi-step agent workflow
    try:
        research   = agent.research_company(lead.company)
        draft      = agent.draft_email(lead, research)
        approved   = agent.review_and_approve(draft)
        sent       = mail_service.send(lead.email, approved.body)
    except AgentFailedError as e:
        quotastack.release(
            reservation_id=reservation.id,
            idempotency_key=f"release:{reservation.id}"
        )
        return failed(job_id, reason=str(e))

    # 4. Commit — record the real cost telemetry
    quotastack.commit(
        reservation_id=reservation.id,
        actual_units=1,
        idempotency_key=f"commit:{reservation.id}",
        metadata={
            "llm_cost_usd": agent.total_cost,
            "tool_calls": agent.tool_count,
            "duration_ms": agent.duration_ms,
        }
    )
    return completed(job_id, sent_email_id=sent.id)
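The handler above releases the reservation only when AgentFailedError is raised. Any other exception (a mail-service timeout, a bug in a tool call) would leave the hold in place until the TTL expires. One way to close that gap is a small context manager that releases on any exception and re-raises. This is a sketch: held is a hypothetical helper, and FakeClient stands in for the client object, whose release signature is copied from the example above.

```python
from contextlib import contextmanager


@contextmanager
def held(client, reservation_id):
    """Release the reservation if the body raises; committing is left to the caller."""
    try:
        yield
    except BaseException:
        # Covers agent failures, unexpected errors, and worker interrupts alike.
        client.release(
            reservation_id=reservation_id,
            idempotency_key=f"release:{reservation_id}",
        )
        raise


class FakeClient:
    """Stand-in for the client used in the handler above, for demonstration only."""

    def __init__(self):
        self.released = []

    def release(self, reservation_id, idempotency_key):
        self.released.append(reservation_id)


client = FakeClient()
try:
    with held(client, "res_demo"):
        raise TimeoutError("mail service timed out")
except TimeoutError:
    pass
print(client.released)
```

In the handler, the agent workflow would run inside `with held(quotastack, reservation.id):`, with the commit call placed after the block.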

Pricing pages your customers can actually compare

Because your unit of value is the outcome, your pricing page can be one line:

Pro — $499/mo, 50 outcomes. Additional outcomes $12 each. No outcome, no charge.
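That one-line price translates directly into an invoice formula. The sketch below uses the figures from the pricing line above. monthly_invoice_cents is a hypothetical helper, and it counts only delivered outcomes, since released jobs are never charged.

```python
def monthly_invoice_cents(outcomes_delivered, base_cents=49_900,
                          included=50, overage_cents=1_200):
    """Retainer plus metered overage for one billing cycle, in cents."""
    if outcomes_delivered < 0:
        raise ValueError("outcomes_delivered must be non-negative")
    extra = max(0, outcomes_delivered - included)
    return base_cents + extra * overage_cents


for delivered in (40, 50, 63):
    print(delivered, monthly_invoice_cents(delivered) / 100)
```

A customer who uses fewer than 50 outcomes still pays the retainer; the overage rate applies only past the included amount.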

Not “$0.02 per 1k input tokens, $0.06 per 1k output tokens, billing accurate to the penny” — nobody outside of infra buyers reads that. Outcome pricing is what the services-dollar buyer understands. QuotaStack is the engine that makes it real behind the scenes.

Tips

  • Estimate conservatively; commit accurately. A reservation that’s too small means concurrent jobs may blow past the customer’s budget before the commit lands. Err high on the estimate, then return the difference at commit time by passing a lower actual_units.

  • Model outcome tiers explicitly. If you know your jobs span a 20× cost range, don’t average — define ticket_closed_easy, ticket_closed_medium, ticket_closed_hard as separate metrics with different credit costs, and let the agent classify complexity at commit time. Customers see a single “outcome” unit on their invoice; you see the variance.

  • Track LLM cost in reservation metadata, not as separate billing. The customer pays for outcomes, not tokens. But YOU need the LLM cost data for gross-margin reporting. Put it in the commit metadata and query it later via the Audit Log.

  • Use subscriptions for retainers, not per-outcome billing. The plan_variant.recurring_grants pattern gives you predictable monthly revenue + a fair rollover story. Pure per-outcome billing is fine for pay-as-you-go tiers but painful for customer forecasting.

  • Fire reservation.expired webhooks into your ops channel. A reservation expiring before commit usually means an agent worker died mid-job. Worth paging on-call.

  • Rollover vs. reset is a policy knob. Services-firm customers expect to “use what they paid for” — rollover reduces churn. AI-native challengers often don’t offer it, arguing credits are cheap. Model your market and pick; QuotaStack supports both via rollover_percentage on the plan variant.
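To see what the rollover knob from the last tip means in credits, here is a minimal sketch. It assumes rollover_percentage is an integer from 0 to 100 applied to the unused retainer at renewal, rounded down to a whole millicredit. The exact semantics are an assumption, and rollover_mc is an illustrative helper.

```python
def rollover_mc(unused_retainer_mc, rollover_percentage):
    """Millicredits carried into the next cycle, rounded down."""
    if not 0 <= rollover_percentage <= 100:
        raise ValueError("rollover_percentage must be between 0 and 100")
    return unused_retainer_mc * rollover_percentage // 100


# A customer ends the cycle with 12 unused tickets' worth of retainer.
unused = 12 * 100_000
for pct in (0, 50, 100):
    print(pct, rollover_mc(unused, pct))
```

Setting the percentage to 0 gives the pure reset policy; 100 carries everything forward.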

Why this pattern wins over seat pricing

Seat pricing is aligned to effort (engineers hired = seats needed). Outcome pricing is aligned to value (tickets closed = revenue delivered). Seat pricing breaks when your customer gets more efficient — they reduce seats and your revenue shrinks. Outcome pricing scales with their success — they close more tickets, you make more revenue.

This is the macro shift Sequoia’s services-dollar thesis names. The technical substrate that makes outcome pricing possible — reservations, commits, releases, spend caps, variable credit costs — is exactly what QuotaStack gives you.

See also: Reservations, Metering, Subscriptions, Idempotency.
