Run receipts: the missing audit layer for AI agents (a CFO/COO checklist)

Executive summary

If you deploy AI agents that can take actions (not just draft text), you’ve created a new operational layer.

That layer can be powerful — it can compress cycle time, reduce manual work, and create real headcount efficiency. But it also introduces a failure mode CFOs and COOs recognize immediately:

You can’t manage what you can’t audit.

Most agent deployments fail governance review not because the models are “unsafe,” but because they are unauditable:

you can’t explain why an outcome happened,
you can’t reproduce it,
you can’t attribute cost,
and you can’t prove the system stayed within policy.

Conceptually, the fix is simple:

Require a run receipt for every agent run.
Add a verification step before execution.
Put approval gates anywhere the agent touches customers, money, or system-of-record updates.
Add reconciliation after execution to ensure the world matches the agent’s assumptions.

This post gives you a practical, CFO/COO-friendly checklist to implement those controls without turning your AI program into bureaucracy.

Why “agents” change the governance game

A chatbot is a suggestion engine. An agent is a workflow participant.

The difference is not the model — it’s the authority.

Agents often do some combination of:

reading internal systems (CRM, ticketing, finance tools),
calling APIs (billing, procurement, scheduling),
drafting artifacts (emails, tickets, invoices, reports),
and sometimes executing changes (updating records, triggering payments, contacting customers).

The moment an agent can execute, you need the same disciplines you already apply to:

software releases,
financial approvals,
customer communications,
and operational controls.

The concept: a “run receipt”

A run receipt is a minimal, structured record of what happened in an agent run. Think of it like the receipt you get after a card transaction:

it proves the action occurred,
it captures key metadata,
and it makes reconciliation possible.

A good receipt is not a giant transcript. It’s a compact, queryable record that enables:

auditability,
debugging,
cost attribution,
and governance.

The minimum viable run receipt (MV-RR)

For most internal workflows, the minimum viable run receipt should include:

1) Identity

timestamp
workflow name
workflow version (prompt + tool config)
initiating user or system
environment (prod vs staging)

2) Inputs

input payload (or a hash/pointer if sensitive)
data sources used (systems + record IDs)
retrieval results summary (which docs/records were consulted)

3) Decisions

key assumptions (explicit)
policy checks performed (what rule set)
confidence signals (even if qualitative)

4) Tool calls

For each tool/API call:

tool name
parameters (redacted where needed)
response status
response summary

5) Outputs

drafts produced
records updated (IDs)
messages prepared or sent

6) Human involvement

whether a human approved
what changed during review (diff summary)
who approved and when

7) Exceptions

error category
fallback path taken
escalation destination (queue/person)

If you can capture those seven sections consistently, you can scale governance later.
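
As an illustration, the seven sections above could be captured in a single structured record. This is a minimal sketch, not a standard schema: every name in it (RunReceipt, ToolCall, hash_payload, and all field names) is hypothetical, and a real deployment would likely adapt the fields to its own systems.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ToolCall:
    tool_name: str
    parameters: dict          # redact sensitive values before storing
    response_status: str
    response_summary: str


@dataclass
class RunReceipt:
    # 1) Identity
    workflow_name: str
    workflow_version: str     # identifies the prompt + tool config
    initiated_by: str         # user or system
    environment: str          # "prod" or "staging"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # 2) Inputs
    input_hash: str = ""      # hash or pointer instead of a sensitive payload
    data_sources: list[str] = field(default_factory=list)
    retrieval_summary: str = ""
    # 3) Decisions
    assumptions: list[str] = field(default_factory=list)
    policy_checks: list[str] = field(default_factory=list)
    confidence: str = ""      # may be qualitative, e.g. "high"
    # 4) Tool calls
    tool_calls: list[ToolCall] = field(default_factory=list)
    # 5) Outputs
    drafts: list[str] = field(default_factory=list)
    records_updated: list[str] = field(default_factory=list)
    messages: list[str] = field(default_factory=list)
    # 6) Human involvement
    approved_by: str | None = None
    approved_at: str | None = None
    review_diff_summary: str = ""
    # 7) Exceptions
    error_category: str | None = None
    fallback_path: str | None = None
    escalated_to: str | None = None


def hash_payload(payload: dict) -> str:
    """Stable SHA-256 of a JSON-serializable payload (key order does not matter)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Example usage with made-up values.
receipt = RunReceipt(
    workflow_name="invoice-reminder",
    workflow_version="v3",
    initiated_by="scheduler",
    environment="staging",
    input_hash=hash_payload({"deal_id": 123}),
)
receipt.tool_calls.append(
    ToolCall("crm.update", {"deal_id": 123}, "200", "5 fields updated")
)
print(json.dumps(asdict(receipt), indent=2))
```

Because the record is flat and serializes to JSON, it can be stored in whatever log store or database you already query, which is what keeps it compact and queryable rather than a transcript.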

The execution pattern: propose → verify → execute → reconcile

A reliable automation pattern looks like payment processing, not like chat: each action is authorized before it runs and settled afterward.

Step 1: Propose

The agent should propose a plan and the intended actions. Examples:

“Update these 5 CRM fields on Deal #123.”
“Send this customer an invoice reminder using Template B.”
“Create three Jira tickets and assign them to the Ops queue.”

The proposal becomes part of the receipt.

Step 2: Verify (before execution)

Before the agent executes, run a verification step.

This can be:

rules-based checks (schema validation, numeric checks, allowlists),
a second model acting as a critic (“does this violate policy?”),
or a lightweight human approval.

The goal is not perfection. It’s to catch obvious failures early.
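
A rules-based verification step can be very small. The sketch below assumes a hypothetical ProposedAction shape, an illustrative allowlist, and an arbitrary amount cap; none of these come from a particular product, and real checks would reflect your own policy.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    action_type: str
    target_id: str
    amount: float = 0.0


# Illustrative policy values (assumptions, not recommendations).
ALLOWED_ACTIONS = {"update_crm_field", "send_invoice_reminder", "create_ticket"}
MAX_AMOUNT = 500.0


def verify(action: ProposedAction) -> list[str]:
    """Return a list of failed checks; an empty list means the action passes."""
    failures = []
    if action.action_type not in ALLOWED_ACTIONS:
        failures.append(f"action '{action.action_type}' is not on the allowlist")
    if not action.target_id:
        failures.append("target_id is missing")
    if action.amount < 0 or action.amount > MAX_AMOUNT:
        failures.append(f"amount {action.amount} is outside 0..{MAX_AMOUNT}")
    return failures


# Example usage: a clean proposal and an obviously bad one.
print(verify(ProposedAction("create_ticket", "OPS-1")))
print(verify(ProposedAction("wire_transfer", "", 9000.0)))
```

Returning the list of failures, rather than a bare pass/fail, means the reasons can be written straight into the receipt's policy-check section.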

Step 3: Execute

Only execute actions that are explicitly allowed by policy. Execution should be the smallest, safest step possible:

prefer idempotent operations,
prefer reversible changes,
and prefer small batches.
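
One common way to make execution idempotent is to derive a key from the run and the action, and to skip any action whose key has already been processed. The following is a minimal in-memory sketch under that assumption; IdempotentExecutor and its methods are hypothetical names, and a production version would persist the keys in durable storage.

```python
import hashlib
import json


class IdempotentExecutor:
    def __init__(self, perform):
        self._perform = perform   # callable that performs the real side effect
        self._results = {}        # key -> result; in memory for illustration only

    @staticmethod
    def key_for(run_id: str, action: dict) -> str:
        """Same run + same action always yields the same key."""
        canonical = json.dumps(action, sort_keys=True)
        return hashlib.sha256(f"{run_id}:{canonical}".encode("utf-8")).hexdigest()

    def execute(self, run_id: str, action: dict):
        key = self.key_for(run_id, action)
        if key in self._results:
            # Already executed: return the stored result, do not repeat the side effect.
            return self._results[key]
        result = self._perform(action)
        self._results[key] = result
        return result


# Example usage: a retry of the same action does not run the side effect twice.
sent = []
executor = IdempotentExecutor(lambda action: sent.append(action) or "sent")
executor.execute("run-1", {"type": "send_invoice_reminder", "deal": 123})
executor.execute("run-1", {"type": "send_invoice_reminder", "deal": 123})
print(len(sent))
```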

Step 4: Reconcile (after execution)

After execution, verify reality matches the plan:

Did the record update actually persist?
Did the email send? Did it bounce?
Did the ticket get created in the correct project?
Did the payment settle?

Reconciliation closes the loop — and creates accountability.
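
In code, reconciliation can be as simple as re-reading the record and comparing it with what the agent intended to write. This sketch assumes both sides are available as plain dictionaries; the function name and field names are illustrative.

```python
def reconcile(expected: dict, observed: dict) -> dict:
    """Compare the fields the agent intended to set with what the system now reports."""
    mismatches = {
        name: {"expected": want, "observed": observed.get(name)}
        for name, want in expected.items()
        if observed.get(name) != want
    }
    return {"reconciled": not mismatches, "mismatches": mismatches}


# Example usage: the stage persisted, the amount did not.
planned = {"stage": "Closed Won", "amount": 1200}
actual = {"stage": "Closed Won", "amount": 1000, "owner": "jdoe"}
print(reconcile(planned, actual))
```

Only the fields the agent planned to change are compared, so unrelated fields on the record (such as owner in the example) do not raise false alarms.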

Where CFO/COO approval gates belong

You don’t need approval gates everywhere. You need them where risk is real.

A practical rule:

Customer-facing communication → approval gate until you have proven quality
Money movement (payments, refunds, credits) → always require approval, or enforce strict caps
Contracts/legal language → approval gate
HR decisions (hiring, performance, compensation) → approval gate
System-of-record updates that affect forecasting/financial reporting → approval gate

For low-risk internal drafts (meeting notes, summaries, internal reporting drafts), keep it light.
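
The rule of thumb above can be expressed as a small routing function. The category labels, the Gate enum, and the parameters below are all hypothetical, and the choice to send unknown categories to approval is an assumption of this sketch rather than something the rule itself specifies.

```python
from enum import Enum


class Gate(Enum):
    AUTO = "auto"            # may run without a human in the loop
    APPROVAL = "approval"    # a human must approve before execution


# Hypothetical category labels mirroring the rule of thumb.
ALWAYS_GATED = {"contract_language", "hr_decision", "financial_system_of_record"}
LIGHT_TOUCH = {"internal_draft"}


def required_gate(category, amount=0.0, auto_cap=0.0, quality_proven=False):
    if category in ALWAYS_GATED:
        return Gate.APPROVAL
    if category == "money_movement":
        # Approval unless the amount is under a strict cap (a cap of 0.0 disables the auto path).
        return Gate.AUTO if 0 < amount <= auto_cap else Gate.APPROVAL
    if category == "customer_communication":
        # Gated until quality has been proven for this workflow.
        return Gate.AUTO if quality_proven else Gate.APPROVAL
    if category in LIGHT_TOUCH:
        return Gate.AUTO
    return Gate.APPROVAL     # assumption: unknown categories fail closed


# Example usage.
print(required_gate("money_movement", amount=25.0, auto_cap=50.0))
print(required_gate("customer_communication"))
```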

The CFO/COO scorecard: what to measure per workflow

If you want this to be finance-grade, measure per workflow:

cost per run (or per outcome)
success rate (runs completed without escalation)
escalation rate (how often humans must intervene)
time-to-complete (cycle time)
error rate / rework rate (sampled)
policy violations (should be zero)

Then set thresholds and review on a cadence.
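
If receipts are stored as structured records, most of the scorecard falls out of a simple aggregation. The sketch below assumes each run is a dictionary with hypothetical keys (cost_usd, completed, escalated, cycle_time_s, policy_violations); error and rework rates are left out because they depend on sampled human review.

```python
from statistics import mean


def scorecard(runs: list[dict]) -> dict:
    """Aggregate per-workflow metrics from a non-empty list of run records."""
    if not runs:
        raise ValueError("no runs to score")
    n = len(runs)
    escalated = sum(1 for r in runs if r["escalated"])
    # Success follows the definition above: completed without escalation.
    succeeded = sum(1 for r in runs if r["completed"] and not r["escalated"])
    return {
        "runs": n,
        "cost_per_run": sum(r["cost_usd"] for r in runs) / n,
        "success_rate": succeeded / n,
        "escalation_rate": escalated / n,
        "avg_cycle_time_s": mean(r["cycle_time_s"] for r in runs),
        "policy_violations": sum(r["policy_violations"] for r in runs),
    }


# Example usage with made-up runs.
sample_runs = [
    {"cost_usd": 0.5, "completed": True, "escalated": False, "cycle_time_s": 10, "policy_violations": 0},
    {"cost_usd": 0.25, "completed": True, "escalated": True, "cycle_time_s": 30, "policy_violations": 0},
    {"cost_usd": 0.25, "completed": False, "escalated": False, "cycle_time_s": 20, "policy_violations": 0},
    {"cost_usd": 1.0, "completed": True, "escalated": False, "cycle_time_s": 20, "policy_violations": 0},
]
print(scorecard(sample_runs))
```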

Common failure modes (and how run receipts prevent them)

Failure mode 1: “It worked last week” drift

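Model or prompt changes cause silent behavior drift. Run receipts make drift visible because each one records the workflow version, so outcomes can be compared before and after a change.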
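
One way to surface drift is to group receipts by workflow version and compare a simple outcome metric across versions. This sketch assumes receipts are dictionaries with hypothetical workflow_version and escalated keys, and uses escalation rate as the metric; any other per-run signal could be substituted.

```python
from collections import defaultdict


def escalation_rate_by_version(receipts: list[dict]) -> dict:
    """Return {workflow_version: share of runs that were escalated}."""
    totals = defaultdict(lambda: [0, 0])   # version -> [runs, escalations]
    for r in receipts:
        counts = totals[r["workflow_version"]]
        counts[0] += 1
        if r["escalated"]:
            counts[1] += 1
    return {version: esc / runs for version, (runs, esc) in totals.items()}


# Example usage: compare the version before and after a prompt change.
history = [
    {"workflow_version": "v1", "escalated": False},
    {"workflow_version": "v1", "escalated": False},
    {"workflow_version": "v1", "escalated": True},
    {"workflow_version": "v1", "escalated": False},
    {"workflow_version": "v2", "escalated": True},
    {"workflow_version": "v2", "escalated": False},
]
print(escalation_rate_by_version(history))
```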

Failure mode 2: Unbounded variable cost

Agents can loop, retrying or re-planning with no natural stopping point, and every iteration adds cost. Receipts let you attribute cost by workflow and cap the expensive ones.
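
A cap can be enforced inside the run itself with a small budget guard that is charged on every step. The class and exception names below are hypothetical, and the limits are placeholders; the point is only that a run stops, and records why, once it exceeds its step or cost budget.

```python
class BudgetExceeded(Exception):
    """Raised when a run goes over its step or cost budget."""


class RunBudget:
    def __init__(self, max_cost_usd: float, max_steps: int):
        self.max_cost_usd = max_cost_usd
        self.max_steps = max_steps
        self.spent_usd = 0.0
        self.steps = 0

    def charge(self, cost_usd: float) -> None:
        """Record one step and its cost; raise if either limit is now exceeded."""
        self.steps += 1
        self.spent_usd += cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit {self.max_cost_usd} USD exceeded")


# Example usage: the fourth step trips the step limit.
budget = RunBudget(max_cost_usd=1.0, max_steps=3)
try:
    for _ in range(10):
        budget.charge(0.25)
except BudgetExceeded as exc:
    print(f"run stopped: {exc}")
```

The exception message is a natural value for the receipt's error category, so capped runs show up in the same audit trail as every other exception.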

Failure mode 3: Blame without diagnosis

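When something goes wrong and there is no record, people argue about what happened. Receipts let you trace the inputs, decisions, and tool calls, and debug the actual cause.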

Failure mode 4: Scaling without trust

Executives don’t greenlight automation if they can’t audit outcomes. Receipts create trust.

A practical 30-day rollout plan

If you want to install this fast:

Week 1: Pick one workflow

high-volume, measurable
tolerable failure mode

Week 2: Add MV run receipts

log identity, inputs, tool calls, outputs, exceptions

Week 3: Add verification + a simple approval gate

rule checks + human approval for external actions

Week 4: Add reconciliation + dashboard

success rate, escalation rate, cost/run, cycle time

At the end of 30 days, you should have a workflow that is not only automated, but auditable.

Closing thought

AI agents are not just productivity tools. They are a new layer of execution.

If you treat them like software — with receipts, verification, approval gates, and reconciliation — you can scale automation without turning your operating system into a black box.

If you want help implementing run receipts and governance for your highest-ROI workflows, CDS can do a tight, CFO/COO-friendly sprint that installs:

1–2 production workflows,
audit logging (“run receipts”),
approval gates where needed,
and a measurement cadence that finance trusts.