AI Agent Company: Outcome-Based Billing with Long Reservations

How to bill for work, not seats — model an AI agent company (AI SDR, AI paralegal, AI bookkeeper) with outcome-based pricing, long-running reservations, variable cost pass-through, and per-customer spend caps.

AI Agent Company

Pattern: outcome-based metering + long reservations + spend caps.

The problem

You’re building an AI-native company that sells work, not software. Your customers pay for outcomes — closed support tickets, reviewed contracts, sent outbound emails, reconciled invoices, qualified leads. Each outcome kicks off a multi-step agent workflow that takes minutes to hours, burns a variable amount of LLM cost, and may fail halfway.

Seat-based billing doesn’t fit. Neither does per-token billing (your customers don’t want to think in tokens). You need:

Outcome as the unit of value — bill per closed ticket, not per API call
Long reservations to hold estimated cost during agent runs that take minutes or hours
Commit the actual cost at job completion — the agent might use 10× more or less LLM than estimated
Release on failure — if the agent can’t complete the outcome, the customer isn’t charged
Per-customer spend caps so one runaway customer can’t 10× your LLM bill overnight
Hybrid pricing — a monthly retainer that includes N outcomes, then metered overage beyond that

This is the pattern the next wave of AI-native companies is converging on: AI SDR, AI bookkeeping, AI paralegal, AI support, AI recruiting. Their billing shape is different enough from classic SaaS that it deserves its own walkthrough.

Why credits are the right primitive here

You could bill directly in dollars — charge $5 per closed ticket, $2 per sent email. But you’ll hit these problems fast:

Variable LLM cost: a simple ticket costs $0.20 to close; a complex one costs $4. Flat per-outcome pricing either loses money on hard ones or overcharges on easy ones.
Multi-outcome products: a customer wants to buy “any kind of agent work” — closed tickets, sent emails, scheduled meetings — and have a single balance that covers all of them.
Enterprise contracts: one prospect wants a $10k/month commit with any mix of outcomes; another wants pure per-outcome pricing. Same product, two shapes.

Credits normalize all of this. 1 closed ticket = 100 credits. 1 sent email = 10 credits. 1 complex contract review = 1000 credits. The customer sees a single balance and a single bill; you tune the credit-per-outcome ratio based on actual LLM cost and desired margin.
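As a rough illustration of how one balance can cover several outcome types, the sketch below totals a mixed month of work using the example ratios above. The `CREDITS_PER_OUTCOME` table and the `credits_for` helper are hypothetical names used only for illustration; they are not part of any API.

```python
# Example credit prices taken from the paragraph above; tune to your own economics.
CREDITS_PER_OUTCOME = {
    "ticket_closed": 100,
    "email_sent": 10,
    "contract_reviewed": 1000,
}


def credits_for(outcomes: dict[str, int]) -> int:
    """Total credits consumed by a mix of outcomes, keyed by outcome type."""
    return sum(CREDITS_PER_OUTCOME[kind] * count for kind, count in outcomes.items())


# A hypothetical month: 14 tickets, 120 emails, 2 contract reviews.
month = {"ticket_closed": 14, "email_sent": 120, "contract_reviewed": 2}
total = credits_for(month)  # 1400 + 1200 + 2000 credits, drawn from one balance
```

The customer only ever sees `total` against a single balance; the per-outcome ratios stay a pricing decision on your side.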

Credit structure

Block type	Expiry	Source
Retainer grant (monthly)	Cycle end	Subscription plan grant
Rollover of unused retainer	Next cycle	Auto at renewal (optional)
Overage / topup packs	30–90 days	Topup grant after payment
Free trial credits	None	Topup grant on signup

Retainer burns first — customers use what they paid for on-plan before any rollover or overage packs kick in. Trial credits are the last resort (the safety net that disappears from the available balance once real credits land).
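The burn order described above can be sketched as a sort over credit blocks. This is a minimal sketch that assumes blocks burn by priority first and soonest expiry second, with never-expiring blocks last; the actual ordering rules may differ, and `CreditBlock`, `burn_order`, and `burn` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CreditBlock:
    label: str
    remaining_mc: int
    priority: int = 0
    expires_at: Optional[datetime] = None  # None means the block never expires


def burn_order(blocks):
    """Lower priority number first, then soonest expiry; never-expiring blocks last."""
    far_future = datetime.max.replace(tzinfo=timezone.utc)
    return sorted(blocks, key=lambda b: (b.priority, b.expires_at or far_future))


def burn(blocks, amount_mc):
    """Deduct amount_mc across blocks in burn order; returns (label, taken) pairs."""
    if sum(b.remaining_mc for b in blocks) < amount_mc:
        raise ValueError("insufficient credits")
    taken = []
    for block in burn_order(blocks):
        use = min(block.remaining_mc, amount_mc)
        if use > 0:
            block.remaining_mc -= use
            amount_mc -= use
            taken.append((block.label, use))
    return taken


blocks = [
    CreditBlock("trial", 500_000),  # no expiry, so it burns last
    CreditBlock("topup", 1_000_000, expires_at=datetime(2026, 7, 1, tzinfo=timezone.utc)),
    CreditBlock("retainer", 200_000, expires_at=datetime(2026, 5, 1, tzinfo=timezone.utc)),
]
spent = burn(blocks, 300_000)
```

Here the retainer is exhausted first, the topup pack covers the remainder, and the trial block is untouched.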

Billable metrics — one per outcome type

Define a metric for each unit of work your agent sells:

POST /v1/billable-metrics
Idempotency-Key: metric:ticket_closed

{ "key": "ticket_closed", "name": "Support ticket closed" }

POST /v1/billable-metrics
Idempotency-Key: metric:email_sent

{ "key": "email_sent", "name": "Outbound email sent" }

POST /v1/billable-metrics
Idempotency-Key: metric:contract_reviewed

{ "key": "contract_reviewed", "name": "Contract reviewed" }
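If you script metric creation, one way to keep it safely repeatable is to derive each Idempotency-Key from the metric key, as the requests above do. The sketch below only builds request descriptions and sends nothing; `metric_request` is a hypothetical helper, and the `Content-Type` header is an assumption.

```python
import json

METRICS = {
    "ticket_closed": "Support ticket closed",
    "email_sent": "Outbound email sent",
    "contract_reviewed": "Contract reviewed",
}


def metric_request(key: str, name: str) -> dict:
    """Describe one POST /v1/billable-metrics call without sending it."""
    return {
        "method": "POST",
        "path": "/v1/billable-metrics",
        "headers": {
            "Content-Type": "application/json",  # assumed, not confirmed by the docs
            # Deterministic key: re-running the script sends the same key again.
            "Idempotency-Key": f"metric:{key}",
        },
        "body": json.dumps({"key": key, "name": name}),
    }


requests_to_send = [metric_request(k, n) for k, n in METRICS.items()]
```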

Then a metering rule per metric with the credit cost that matches your LLM economics:

POST /v1/metering-rules
Idempotency-Key: rule:ticket_closed

{
  "billable_metric_key": "ticket_closed",
  "cost_type": "per_unit",
  "credit_cost": 100000,
  "unit_cost": 100000
}

1 closed ticket = 100,000mc = 100 credits, so 1 credit = 1,000mc. Tune this number from real LLM-cost telemetry and aim for roughly 60–80% gross margin per outcome.
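To check a credit cost against the margin target, a small calculation helps. This is a sketch with made-up numbers: the dollar price per credit and the LLM cost are hypothetical inputs, and `gross_margin` is an illustrative helper rather than anything QuotaStack provides.

```python
MC_PER_CREDIT = 1_000  # implied by the text: 100,000mc = 100 credits


def credits_to_mc(credits: int) -> int:
    return credits * MC_PER_CREDIT


def gross_margin(credit_cost_mc: int, usd_per_credit: float, llm_cost_usd: float) -> float:
    """Fraction of outcome revenue left after LLM cost."""
    revenue_usd = credit_cost_mc / MC_PER_CREDIT * usd_per_credit
    return (revenue_usd - llm_cost_usd) / revenue_usd


# Hypothetical inputs: credits sold at $0.01 each, a ticket costing $0.30 in LLM spend.
margin = gross_margin(credits_to_mc(100), usd_per_credit=0.01, llm_cost_usd=0.30)
```

With these inputs the ticket earns $1.00 and keeps about 70% after LLM cost, inside the 60–80% band. Rerun it with your measured average cost per outcome whenever you change the credit cost.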

Modeling outcome complexity

Agent jobs have wildly different costs. A password reset takes one LLM call ($0.05). A billing dispute takes 12 tool calls and 3 human-in-the-loop steps ($4). If you flat-price every ticket at 100 credits, you’ll lose money on the hard ones or overcharge on the easy ones.

Two patterns:

Option A: One metric, variable units (recommended)

Keep the single ticket_closed metric. Encode complexity as units:

Complexity	Units	Credit cost	Typical LLM cost
Simple (auto-resolved)	1	100 credits	$0.05
Standard	2	200 credits	$0.40
Complex (research-heavy)	3	300 credits	$2–4

The key insight: reserve for the worst case, commit for reality. Your agent doesn’t know how hard a ticket is until it finishes. Reserve 3 units (300 credits) at job start. If the ticket turns out simple, commit with actual_units: 1 — only 100 credits are charged, the remaining 200-credit hold releases automatically.

The customer’s invoice shows “14 tickets closed” — a single line. You see the complexity breakdown in your internal analytics via commit metadata.
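The reserve-high, commit-actual flow in Option A can be sketched with a tiny in-memory ledger. This is a simplified stand-in for illustration only; the `Ledger` class and its method names are hypothetical and do not mirror the real service's internals.

```python
class InsufficientCredits(Exception):
    pass


class Ledger:
    """In-memory stand-in for a credit ledger with holds (amounts in mc)."""

    def __init__(self, balance_mc: int, cost_per_unit_mc: int):
        self.balance_mc = balance_mc
        self.cost_per_unit_mc = cost_per_unit_mc
        self.holds = {}  # reservation id -> held amount in mc

    @property
    def effective_balance_mc(self) -> int:
        return self.balance_mc - sum(self.holds.values())

    def reserve(self, res_id: str, estimated_units: int) -> None:
        hold = estimated_units * self.cost_per_unit_mc
        if hold > self.effective_balance_mc:
            raise InsufficientCredits(res_id)
        self.holds[res_id] = hold

    def commit(self, res_id: str, actual_units: int) -> int:
        """Charge the actual cost and drop the hold; returns the amount charged."""
        self.holds.pop(res_id)
        charged = actual_units * self.cost_per_unit_mc
        self.balance_mc -= charged
        return charged

    def release(self, res_id: str) -> None:
        """Drop the hold without charging anything."""
        self.holds.pop(res_id)


ledger = Ledger(balance_mc=5_000_000, cost_per_unit_mc=100_000)
ledger.reserve("res_1", estimated_units=3)        # worst case: 300 credits held
held_balance = ledger.effective_balance_mc        # balance minus the hold
charged = ledger.commit("res_1", actual_units=1)  # ticket turned out simple
```

After the commit, only 100 credits are gone and the other 200 credits of the hold are available again.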

Option B: Separate metrics (for independent caps)

When you need per-complexity spend caps or disaggregated reporting, create separate metrics: ticket_closed_simple (80,000mc) and ticket_closed_complex (250,000mc). This lets you cap complex tickets independently (e.g., max 10 complex resolutions per day per customer) while leaving simple tickets uncapped.

The downside is operational complexity — the agent must classify the ticket before work starts, and the customer sees multiple line items. Use Option A unless you have a specific reason to cap or report complexity tiers separately.

Subscription shape: retainer + metered overage

Define a plan variant that grants retainer credits each cycle:

POST /v1/plans/<plan_id>/variants
Idempotency-Key: variant:pro-monthly

{
  "name": "Pro — 50 tickets/mo",
  "billing_mode": "prepaid",
  "cycle_duration_days": 30,
  "recurring_grants": [
    {
      "credits": 5000000,
      "priority": 0,
      "expires_after_seconds": 2592000
    }
  ],
  "price_amount": 49900,
  "price_currency": "USD"
}

5,000,000mc = 5000 credits = 50 tickets at 100cr each. The 30-day expiry means retainer credits burn before no-expiry trial credits and don’t accumulate forever.
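The arithmetic behind the variant can be checked in a few lines. This sketch copies the relevant fields from the request body above and assumes `price_amount` is expressed in cents, which matches the $499/mo price quoted elsewhere on this page.

```python
variant = {
    "cycle_duration_days": 30,
    "recurring_grants": [{"credits": 5_000_000, "expires_after_seconds": 2_592_000}],
    "price_amount": 49_900,  # assumed to be cents, i.e. $499.00
}
TICKET_COST_MC = 100_000  # from the ticket_closed metering rule

grant = variant["recurring_grants"][0]
included_tickets = grant["credits"] // TICKET_COST_MC
cycle_seconds = variant["cycle_duration_days"] * 24 * 60 * 60
price_per_ticket_usd = variant["price_amount"] / 100 / included_tickets
```

The grant covers 50 tickets, the expiry equals exactly one 30-day cycle, and each included ticket works out to about $9.98.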

When the customer exceeds the retainer, your code grants overage packs (same pattern as AI Generation App) or bills postpaid via a separate metered plan variant.

Integration flow

1. Customer onboarding — start with free trial

POST /v1/customers
Idempotency-Key: signup:<your_user_id>

{ "external_id": "cus_acme_corp" }

POST /v1/topups/grant
Idempotency-Key: topup:trial-<your_user_id>

{
  "customer_id": "cus_...",
  "credits": 500000,
  "metadata": { "source": "free_trial", "outcomes": 5 }
}

500,000mc = 5 free tickets. Priority 0 (default), no expiry — but they’ll burn after any expiring paid credits.

2. Activate subscription on purchase

After payment confirmation for the $499/mo Pro plan:

POST /v1/subscriptions
Idempotency-Key: sub:<payment_id>

{
  "customer_id": "cus_...",
  "plan_variant_id": "var_pro_monthly",
  "external_payment_id": "stripe_sub_abc"
}

This activates the subscription and fires the initial recurring grant — 5000 credits land with priority 0, expiring at cycle end.

3. Job starts — reserve with an estimate

This is the heart of the pattern. A new ticket arrives; your agent begins work.

Step 1: Check entitlement with the estimated cost.

GET /v1/customers/cus_.../entitlements/ticket_closed?units=1

{
  "allowed": true,
  "balance": 5000000,
  "effective_balance": 5000000,
  "cost_per_unit": 100000,
  "cost_total": 100000,
  "affordable_units": 50
}

If allowed is false — or if you enforce a per-customer spend cap (e.g., max 100 outcomes per day) and the customer has hit it — reject the job before the agent burns LLM cost.
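The pre-flight decision can be sketched as a pure function over the entitlement response. It assumes the response fields shown above (`allowed`, `cost_total`) and a locally tracked daily total; `can_start_job` and its reason strings are hypothetical.

```python
def can_start_job(entitlement: dict, used_today_mc: int, daily_cap_mc: int) -> tuple[bool, str]:
    """Decide whether to start a job, given an entitlement response and a local cap."""
    if not entitlement["allowed"]:
        return False, "insufficient_credits"
    if used_today_mc + entitlement["cost_total"] > daily_cap_mc:
        return False, "daily_cap_hit"
    return True, "ok"


entitlement = {"allowed": True, "cost_total": 100_000}
decision = can_start_job(entitlement, used_today_mc=950_000, daily_cap_mc=1_000_000)
```

In this example the customer has credits, but the job would push the day's usage past the cap, so it is rejected before any LLM spend.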

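Step 2: Reserve with an estimated_units value that covers the worst case. With flat one-unit pricing, as in this example, that is 1; under Option A it is the top complexity tier (3).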

POST /v1/reservations
Idempotency-Key: reservation:<job_id>

{
  "customer_id": "cus_...",
  "billable_metric_key": "ticket_closed",
  "estimated_units": 1,
  "ttl_seconds": 7200,
  "metadata": {
    "job_id": "job_abc123",
    "ticket_id": "tkt_998",
    "agent_version": "v3.2"
  }
}

TTL is different from AI generation. A generation run takes 30–90 seconds, so a 300s TTL is fine. An agent workflow may take hours. Set ttl_seconds to 2–3× the 95th-percentile job duration. 7200 (2 hours) is a safe default; go up to 21600 (6 hours) for long research tasks. The reaper catches abandoned reservations if a worker crashes or gets stuck.

Response:

{
  "id": "res_...",
  "status": "active",
  "estimated_units": 1,
  "estimated_cost": 100000,
  "expires_at": "2026-04-17T10:00:00Z",
  "effective_balance_after": 4900000
}

The hold drops the effective balance immediately — so a concurrent job request sees less available, and your spend-cap check stays accurate.
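For the TTL guidance earlier in this step (2–3× the 95th-percentile job duration), one way to derive the number from telemetry is sketched below. The nearest-rank percentile, the 2.5× multiplier, and the clamping bounds are illustrative choices, and the sample durations are made up.

```python
def p95(durations_s):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(durations_s)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def ttl_seconds(durations_s, multiplier=2.5, floor_s=300, ceiling_s=21_600):
    """TTL as a multiple of the p95 duration, clamped to a sane range."""
    return int(min(max(p95(durations_s) * multiplier, floor_s), ceiling_s))


# 20 recent job durations in seconds (made-up sample data)
recent = [120, 180, 240, 300, 360, 420, 480, 540, 600, 660,
          720, 780, 840, 900, 960, 1020, 1080, 1200, 2400, 5400]
ttl = ttl_seconds(recent)
```

Recomputing this periodically keeps the TTL in step with how long jobs really run, rather than freezing a guess made at launch.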

4. Agent runs — LLM calls, tool calls, human-in-the-loop

Your agent does its work. It may call the LLM 1 time or 200 times. It may need a human to approve a step. It may take 30 seconds or 6 hours. None of this touches QuotaStack — the reservation is holding the credit slot.

Track the actual LLM spend as you go. At the end, you’ll know the real cost.

5a. Job completes — commit actual cost

If the agent closed the ticket successfully:

POST /v1/reservations/res_.../commit
Idempotency-Key: commit:res_...

{
  "actual_units": 1,
  "metadata": {
    "llm_cost_usd": 0.42,
    "llm_tokens_in": 18200,
    "llm_tokens_out": 4100,
    "tool_calls": 12,
    "job_duration_ms": 340000,
    "outcome_verified_by": "resolver_v2"
  }
}

Cost adjustment at commit. If your job turned out much more expensive than estimated (say 3× tokens), you can pass actual_units: 3 at commit — the ledger records 300,000mc burned instead of 100,000mc. Most agent businesses keep actual_units: 1 (one outcome = one unit) and absorb the variance; sophisticated ones tier outcomes by complexity (easy/medium/hard) with different credit costs and commit to the tier that matched reality.
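If you tier by complexity, the unit count to commit can be derived from the measured LLM spend. This is a sketch only: the dollar thresholds are hypothetical values loosely based on the typical costs in the Option A table, and `units_for_cost` is an illustrative helper.

```python
# Hypothetical cost thresholds (USD) separating the tiers; derive yours from telemetry.
TIER_THRESHOLDS_USD = [(0.10, 1), (1.00, 2)]  # (max LLM cost, units to commit)
MAX_UNITS = 3


def units_for_cost(llm_cost_usd: float) -> int:
    """Map measured LLM spend to the unit tier to commit."""
    for max_cost, units in TIER_THRESHOLDS_USD:
        if llm_cost_usd <= max_cost:
            return units
    return MAX_UNITS


actual_units = units_for_cost(0.42)  # the llm_cost_usd from the commit example above
```

The result goes into `actual_units` on the commit call, while the raw cost stays in metadata for margin reporting.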

5b. Job fails — release, customer not charged

If the agent couldn’t close the ticket (missing data, escalation to human, unresolvable):

POST /v1/reservations/res_.../release
Idempotency-Key: release:res_...

{}

The 100,000mc hold is released and the customer’s effective balance returns to its pre-reservation level. This is the core promise of outcome-based billing: no outcome, no charge. It is also a feature you can advertise on your pricing page.

6. Enforce per-customer spend caps

The “one customer blows up our LLM bill” nightmare. Solve it with entitlement policies at the customer level:

PATCH /v1/customers/cus_.../limits

{
  "daily_credit_cap": 1000000,
  "monthly_credit_cap": 15000000
}

(Or model this in your application layer by checking daily-summed usage before reserving.) Entitlement checks will now return allowed: false when the cap is hit, and your agent refuses to start new jobs until the next cycle or a cap lift. This keeps a rogue customer from running 10,000 tickets in an afternoon.
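The application-layer alternative can be sketched as a small per-customer tracker. This version keeps state in memory for illustration; a real deployment would need a shared store so every worker sees the same totals. `DailyCapTracker` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date


class DailyCapTracker:
    """Application-layer daily spend cap, tracked per customer in memory."""

    def __init__(self, daily_cap_mc: int):
        self.daily_cap_mc = daily_cap_mc
        self._used = defaultdict(int)  # (customer_id, day) -> mc committed

    def would_exceed(self, customer_id: str, cost_mc: int, day: date) -> bool:
        return self._used[(customer_id, day)] + cost_mc > self.daily_cap_mc

    def record(self, customer_id: str, cost_mc: int, day: date) -> None:
        self._used[(customer_id, day)] += cost_mc


tracker = DailyCapTracker(daily_cap_mc=1_000_000)
today = date(2026, 4, 17)
for _ in range(10):  # ten tickets at 100,000mc each reach the cap exactly
    tracker.record("cus_acme_corp", 100_000, today)
blocked = tracker.would_exceed("cus_acme_corp", 100_000, today)
```

Call `would_exceed` before reserving and `record` after a successful commit, so released jobs never count against the cap.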

Worked example: full AI SDR job handler

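import uuid

# quotastack, agent, mail_service, daily_usage, DAILY_CAP, the two error classes,
# and the skip_lead / failed / completed helpers are assumed to come from your application.
def handle_new_lead(customer_id, lead):
    job_id = str(uuid.uuid4())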

    # 1. Entitlement — is this customer allowed to generate another outbound email?
    ent = quotastack.get_entitlement(customer_id, "email_sent", units=1)
    if not ent.allowed:
        return skip_lead(lead, reason="customer_out_of_credits")

    # 1b. Spend-cap guard — apply our own daily cap
    if daily_usage(customer_id) >= DAILY_CAP:
        return skip_lead(lead, reason="daily_cap_hit")

    # 2. Reserve for up to 1 hour (SDR job includes research + draft + review)
    try:
        reservation = quotastack.reserve(
            customer_id=customer_id,
            billable_metric_key="email_sent",
            estimated_units=1,
            ttl_seconds=3600,
            idempotency_key=f"reservation:{job_id}",
            metadata={"job_id": job_id, "lead_id": lead.id}
        )
    except InsufficientCreditsError:
        return skip_lead(lead, reason="race_condition")

    # 3. Run the multi-step agent workflow
    try:
        research   = agent.research_company(lead.company)
        draft      = agent.draft_email(lead, research)
        approved   = agent.review_and_approve(draft)
        sent       = mail_service.send(lead.email, approved.body)
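    except AgentFailedError as e:
        # Known agent failure: release the hold so the customer is not charged.
        # Other exception types are not caught here; their holds stay until the TTL expires.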
        quotastack.release(
            reservation_id=reservation.id,
            idempotency_key=f"release:{reservation.id}"
        )
        return failed(job_id, reason=str(e))

    # 4. Commit — record the real cost telemetry
    quotastack.commit(
        reservation_id=reservation.id,
        actual_units=1,
        idempotency_key=f"commit:{reservation.id}",
        metadata={
            "llm_cost_usd": agent.total_cost,
            "tool_calls": agent.tool_count,
            "duration_ms": agent.duration_ms,
        }
    )
    return completed(job_id, sent_email_id=sent.id)
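The handler above releases the hold only on AgentFailedError; any other exception leaves it in place until the TTL expires. One possible hardening, sketched here, is a context manager that releases on any exception. `FakeClient` is a test double that mirrors the `release` and `commit` keyword arguments used above, and `held` is a hypothetical helper.

```python
from contextlib import contextmanager


class FakeClient:
    """Records calls so the pattern can be exercised without a network."""

    def __init__(self):
        self.calls = []

    def release(self, reservation_id, idempotency_key):
        self.calls.append(("release", reservation_id, idempotency_key))

    def commit(self, reservation_id, actual_units, idempotency_key):
        self.calls.append(("commit", reservation_id, actual_units, idempotency_key))


@contextmanager
def held(client, reservation_id):
    """Release the reservation if the body raises; otherwise leave it for commit."""
    try:
        yield
    except Exception:
        client.release(reservation_id, idempotency_key=f"release:{reservation_id}")
        raise


client = FakeClient()

# Failure path: an unexpected error still releases the hold, then propagates.
try:
    with held(client, "res_1"):
        raise RuntimeError("mail service timed out")
except RuntimeError:
    pass

# Success path: nothing is released; the caller commits afterwards.
with held(client, "res_2"):
    pass  # agent work would run here
client.commit("res_2", actual_units=1, idempotency_key="commit:res_2")
```

Because the release key is derived from the reservation id, a retry after a crash repeats the same key rather than minting a new one.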

Pricing pages your customers can actually compare

Because your unit of value is the outcome, your pricing page can be one line:

Pro — $499/mo, 50 outcomes. Additional outcomes $12 each. No outcome, no charge.

Not “$0.02 per 1k input tokens, $0.06 per 1k output tokens, billing accurate to the penny” — nobody outside of infra buyers reads that. Outcome pricing is what the services-dollar buyer understands. QuotaStack is the engine that makes it real behind the scenes.

Tips

Estimate high; commit accurately. A reservation that’s too small means concurrent jobs may blow past the customer’s budget before the commit lands. Err high on the estimate, then commit the real actual_units; the unused part of the hold is released at commit time.
Model outcome tiers explicitly. If you know your jobs span a 20× cost range, don’t average. Tier outcomes by complexity (easy, medium, hard) and commit the tier that matched reality. With a single metric and variable units (Option A), the agent can classify at commit time and the customer still sees a single “outcome” unit on the invoice. Separate metrics per tier (Option B) require classifying before the reservation and show up as separate line items.
Track LLM cost in commit metadata, not as separate billing. The customer pays for outcomes, not tokens, but you need the LLM cost data for gross-margin reporting. Put it in the commit metadata and query it later via the Audit Log.
Use subscriptions for retainers, not per-outcome billing. The plan_variant.recurring_grants pattern gives you predictable monthly revenue + a fair rollover story. Pure per-outcome billing is fine for pay-as-you-go tiers but painful for customer forecasting.
Fire reservation.expired webhooks into your ops channel. A reservation expiring before commit usually means an agent worker died mid-job. Worth paging on-call.
Rollover vs. reset is a policy knob. Services-firm customers expect to “use what they paid for” — rollover reduces churn. AI-native challengers often don’t offer it, arguing credits are cheap. Model your market and pick; QuotaStack supports both via rollover_percentage on the plan variant.
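To see what a rollover policy costs you, the carried-over amount can be estimated ahead of renewal. This sketch assumes rollover_percentage means the share of unused retainer credits carried into the next cycle; that reading is an assumption, and `rollover_mc` is a hypothetical helper.

```python
def rollover_mc(unused_mc: int, rollover_percentage: int) -> int:
    """Credits carried into the next cycle, rounded down to a whole mc."""
    if not 0 <= rollover_percentage <= 100:
        raise ValueError("rollover_percentage must be between 0 and 100")
    return unused_mc * rollover_percentage // 100


# A customer ends the cycle with 12 unused tickets (1,200,000mc) on a 50% rollover plan.
carried = rollover_mc(1_200_000, 50)
```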

Why this pattern wins over seat pricing

Seat pricing is aligned to effort (engineers hired = seats needed). Outcome pricing is aligned to value (tickets closed = revenue delivered). Seat pricing breaks when your customer gets more efficient — they reduce seats and your revenue shrinks. Outcome pricing scales with their success — they close more tickets, you make more revenue.

This is the macro shift Sequoia’s services-dollar thesis names. The technical substrate that makes outcome pricing possible — reservations, commits, releases, spend caps, variable credit costs — is exactly what QuotaStack gives you.
