GenAI unit economics: a CFO/COO model to keep margins intact

A practical framework to model, forecast, and cap GenAI costs (tokens, routing, caching, quotas) so AI adoption improves productivity without creating an unbounded variable-expense line.

January 29, 2026 · Justin Musterman · Technology and Marketing Executive

Executive summary

If you adopt GenAI broadly, your cost structure changes. What looks like a “software seat” problem becomes a variable-cost problem: requests drive tokens, tokens drive compute, compute drives dollars.

That’s not bad — it just needs the same discipline you apply to cloud spend, payments, or customer support: unit economics, guardrails, and monitoring.

This post gives you a CFO/COO-friendly model you can drop into a spreadsheet, plus the three biggest margin-protection levers:

  1. Routing (send each task to the cheapest model that meets quality)
  2. Caching (don’t pay twice for the same work)
  3. Caps & quotas (budgetable AI, by team and workflow)

The core model (requests → tokens → $)

At the simplest level:

  • Requests: How many AI calls you make (per day/week/month)
  • Tokens: How “big” each call is (input + output)
  • Cost per token: What the provider charges for the model you used

A basic cost equation:

Monthly AI Cost = Requests × Avg Tokens/Request × Cost/Token

In practice, you’ll model by workflow, because workflows have different volumes and tolerance for latency/quality.
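The equation above drops straight into a spreadsheet or a few lines of Python. A minimal sketch, where the request volume and the $2-per-million-tokens price are illustrative placeholders, not any provider's actual rates:

```python
def monthly_ai_cost(requests_per_month: int,
                    avg_tokens_per_request: float,
                    cost_per_token: float) -> float:
    """Monthly AI Cost = Requests x Avg Tokens/Request x Cost/Token."""
    return requests_per_month * avg_tokens_per_request * cost_per_token

# Illustrative only: 50k requests/month, 2,000 tokens each, $2 per million tokens
cost = monthly_ai_cost(50_000, 2_000, 2 / 1_000_000)
print(f"${cost:,.2f}")  # $200.00
```

Working per token keeps the model provider-agnostic; convert published per-million-token prices once, at the edge.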

A practical worksheet structure

Create a table with rows as workflows (examples below) and columns:

  • Workflow name
  • Team owner
  • Requests/month
  • Avg input tokens
  • Avg output tokens
  • Total tokens/request
  • Model used (or route mix)
  • Effective cost/token
  • Monthly cost
  • Success metric (time saved, cycle time, error rate)

Example workflows to start with:

  • Sales: inbound lead qualification + draft reply
  • Finance: vendor invoice triage + coding suggestion
  • Ops: weekly KPI narrative + anomaly explanation
  • CS: ticket categorization + suggested response
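The worksheet above is just rows and a multiply. A sketch of the same table in code, where every workflow name, volume, and blended $/token figure is a made-up placeholder you would replace with your own telemetry:

```python
# Hypothetical workflow rows; every number below is a placeholder.
workflows = [
    # name, team, requests/mo, avg in-tokens, avg out-tokens, blended $/token
    ("Lead qualification + draft reply",  "Sales",   20_000, 1_200,   400, 2e-6),
    ("Invoice triage + coding",           "Finance",  8_000,   900,   150, 1e-6),
    ("KPI narrative + anomalies",         "Ops",         50, 6_000, 1_500, 1e-5),
    ("Ticket categorization + response",  "CS",      40_000,   700,   250, 1e-6),
]

for name, team, reqs, tok_in, tok_out, cost_per_tok in workflows:
    tokens = tok_in + tok_out                 # total tokens/request
    monthly = reqs * tokens * cost_per_tok    # monthly cost for this row
    print(f"{team:8s}{name:36s}{tokens:>6,} tok/req  ${monthly:,.2f}/mo")
```

The "blended $/token" column is where a route mix (next section) shows up: it is the volume-weighted average across the models that serve the workflow.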

Why averages are dangerous (and how to account for variance)

Most teams underestimate variance:

  • Some requests are tiny (“rewrite this sentence”).
  • Some requests are huge (long context + long output).
  • Some requests trigger tool calls or multi-step agent loops.

Two fixes:

  1. Track p50 and p95 tokens/request per workflow (not just average).
  2. Add a “loop multiplier” for agentic workflows:

Effective tokens = tokens/request × average steps/run

If an “agent” does 6 model calls per run, your cost is 6× even if each call is modest.
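Both fixes can be sketched together: budget on the p95 tokens/request rather than the mean, then scale by the loop multiplier. The sample distribution below is invented to show the long tail:

```python
import statistics

def effective_monthly_tokens(token_samples, runs_per_month, avg_steps_per_run=1.0):
    """Budget on p95 tokens/request, scaled by the agent loop multiplier."""
    cuts = statistics.quantiles(token_samples, n=100)
    p50, p95 = cuts[49], cuts[94]
    return {
        "p50_per_request": p50,
        "p95_per_request": p95,
        "budgeted_tokens": runs_per_month * p95 * avg_steps_per_run,
    }

# Illustrative: 90% tiny requests, a 10% long tail, 6 model calls per agent run
samples = [300] * 90 + [5_000] * 10
print(effective_monthly_tokens(samples, runs_per_month=1_000, avg_steps_per_run=6))
```

Note how the p95 budget (5,000 tokens) is more than 6x the p50 (300 tokens) before the loop multiplier even applies; budgeting on the average would miss both effects.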

Margin framing: treat GenAI like cloud + labor, not like SaaS

CFO/COO framing that holds up in a board meeting:

  • GenAI replaces or augments labor hours.
  • It also consumes variable compute.
  • Your goal is to ensure the compute line grows slower than the value created.

A useful metric:

$/hour saved = Monthly AI cost / Hours saved

Then compare that to the fully loaded cost of the role(s) impacted, and adjust for quality risk.

If the AI costs $12K/month and saves 600 hours/month, that’s $20/hour saved. If the impacted time is worth $80–$150/hour fully loaded, you have room.

But only if quality and control are real.
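The $12K / 600-hour example above, as a two-line check you can reuse per workflow:

```python
def cost_per_hour_saved(monthly_ai_cost: float, hours_saved: float) -> float:
    """$/hour saved = Monthly AI cost / Hours saved."""
    return monthly_ai_cost / hours_saved

rate = cost_per_hour_saved(12_000, 600)
print(f"${rate:.2f}/hour saved")          # $20.00/hour saved

# Compare against the fully loaded rate of the impacted roles
fully_loaded = 80.0                        # low end of the $80-$150 range
print(f"Headroom: {fully_loaded / rate:.0f}x")  # 4x
```

A headroom multiple below roughly 2x is a signal to revisit routing and caching before scaling the workflow.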

The three biggest levers to protect margin

1) Routing: cheapest model that meets quality

Most companies overspend by sending everything to the best model. Instead, implement a routing policy:

  • “Good enough” tasks → cheaper/faster model
  • High-stakes tasks → best model
  • Sensitive tasks → restricted model/tooling

A simple three-tier policy:

  • Tier A (low risk): drafting, summarizing, formatting
  • Tier B (medium risk): analysis with citations, structured extraction
  • Tier C (high risk): customer-facing commitments, finance approvals, legal language

Routing rules should be explicit and auditable.

Result: you often cut costs 30–70% with minimal quality loss.
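An explicit, auditable routing policy can be as small as a lookup table plus one rule. The tier names match the policy above; the model names and per-million-token prices are hypothetical stand-ins for your own catalog:

```python
# Hypothetical model catalog; substitute your own models and rates.
ROUTES = {
    "tier_a": {"model": "small-fast",   "cost_per_mtok": 0.50},   # drafting, formatting
    "tier_b": {"model": "mid-balanced", "cost_per_mtok": 3.00},   # extraction, cited analysis
    "tier_c": {"model": "frontier",     "cost_per_mtok": 15.00},  # high-stakes output
}

def route(task_tier: str, sensitive: bool = False) -> dict:
    """Cheapest model that meets the tier; sensitive work is pinned to tier C tooling."""
    if sensitive:
        return ROUTES["tier_c"]
    return ROUTES[task_tier]

print(route("tier_a")["model"])                   # small-fast
print(route("tier_a", sensitive=True)["model"])   # frontier
```

Because the policy is data, not scattered if-statements, the audit question "why did this request hit the premium model?" has a one-line answer.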

2) Caching: don’t pay twice

Two types of caching matter:

  • Response caching: identical prompt + identical context → reuse output
  • Embedding / retrieval caching: reuse retrieved context, don’t re-embed unchanged docs

Where caching pays immediately:

  • Knowledge-base Q&A
  • Policy questions (expense policy, refund policy)
  • Standard operating procedures
  • Repeatable internal reporting narratives

Caching turns “variable cost” into “mostly fixed” for repeat queries.
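A minimal response-cache sketch, keyed on a hash of prompt plus context (an in-memory dict here; a real deployment would use a shared store with expiry). The `fake_model` stub stands in for an actual provider call:

```python
import hashlib

_cache: dict = {}

def cached_completion(prompt: str, context: str, call_model) -> str:
    """Response cache: identical prompt + identical context reuses the stored output."""
    key = hashlib.sha256((prompt + "\x00" + context).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt, context)  # only pay on a cache miss
    return _cache[key]

calls = 0
def fake_model(prompt, context):  # stand-in for a real provider call
    global calls
    calls += 1
    return f"answer to: {prompt}"

cached_completion("What is the refund policy?", "policy-v3", fake_model)
cached_completion("What is the refund policy?", "policy-v3", fake_model)
print(calls)  # 1 -- the second identical query was free
```

The context version ("policy-v3" here) must be part of the key, so updating a policy document invalidates stale answers automatically.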

3) Caps & quotas: budgetable AI

If GenAI is going to be a real operating capability, it must be budgetable. Set caps at multiple levels:

  • Per user (daily token budgets)
  • Per team (monthly budgets)
  • Per workflow (hard ceilings)
  • Per tool (e.g., max calls to the premium model)

Add “graceful degradation”:

  • When a cap is hit, route to cheaper model
  • Or require approval for premium model
  • Or pause non-critical workflows

This avoids the classic failure mode: AI adoption succeeds, and your cost spikes with it.
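A sketch of a per-team cap with the first degradation step (fall back to a cheaper model instead of failing). The budget figure is a placeholder, and a real system would persist counters and reset them on the budget cycle:

```python
# Hypothetical budget; real systems persist counters and reset them monthly.
BUDGETS = {"finance": 5_000_000}   # tokens per month, per team
usage = {"finance": 0}

def request_model(team: str, tokens: int, preferred: str = "premium") -> str:
    """Enforce the team cap; degrade gracefully instead of failing outright."""
    if usage[team] + tokens > BUDGETS[team]:
        return "cheap-model"       # cap hit: route to the cheaper model
    usage[team] += tokens
    return preferred

print(request_model("finance", 4_999_000))  # premium
print(request_model("finance", 2_000))      # cheap-model (would exceed the cap)
```

The other degradation steps (require approval, pause non-critical workflows) slot into the same check as additional branches.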

Governance: the minimum viable controls

You don’t need a bureaucracy, but you do need a control plane. Minimum viable governance for CFO/COO comfort:

  • Usage telemetry by workflow/team
  • Cost attribution (who/what is spending)
  • Quality checks (sampling + clear failure categories)
  • Audit logs for agentic actions
  • Change control for model upgrades and prompt changes
  • Kill switch for runaway workflows

If you can’t answer “what changed?” when cost or quality shifts, you don’t have control.

A rollout pattern that reduces risk

A practical pattern for scaling without surprises:

  1. Start with 3–5 workflows that are repeatable and measurable.
  2. Instrument requests/tokens/cost from day 1.
  3. Set caps early (even if generous).
  4. Implement routing before you scale.
  5. Add caching once you see repeat patterns.
  6. Review weekly for 4–6 weeks; then move to monthly cadence.

What to do next (fast)

If you want this to be real (not a demo), do two things this week:

  1. Build a workflow cost table (10 rows max) with requests/tokens/cost.
  2. Decide your authority ladder for AI actions:
    • Read-only
    • Draft-only
    • Execute with approval
    • Execute with caps

That’s enough to unlock meaningful adoption while keeping margins protected.

If your team wants help building the cost model and the guardrails, CDS can deliver a lightweight “AI unit economics + governance” sprint that results in:

  • a unit-econ dashboard,
  • routing + quota policies,
  • and 1–2 high-ROI workflows running in production.
