Budgets, Usage, and Audit

The second half of Control is about knowing what happened and bounding how much it cost — the Efficiency and Traceability operating guarantees in concrete shipping form. Three intertwined concerns:

Usage tracking: per-turn cost, broken down by input tokens, output tokens, and tool call overhead. Efficiency at the per-decision level: every model dollar is attached to the turn that produced it, not aggregated into an opaque monthly invoice.
Budgets: tenant defaults and user-specific spend limits that can pause new work when thresholds are hit, not just warn. Efficiency as enforcement: the harness pauses execution before a runaway loop compounds into real money.
Audit: the durable, append-only record of every invocation that makes “what did the agent see, decide, and do?” answerable months later. Traceability at full fidelity: stored in the customer’s S3 bucket with no delete permission on the Lambda role, partitioned by tenant and date, and queryable for a single operation.

They share one spine: every turn has an id, and every trace event, cost event, budget check, and audit entry references that same id.
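A minimal sketch of that shared spine, assuming a simplified record shape. The Record class, its field names, and the sample ids are hypothetical and are not ThinkWork's schema. It only illustrates that one turn id is enough to gather every kind of record for a turn.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str     # e.g. "trace_event", "cost_event", "budget_check", "audit_entry"
    turn_id: str  # the id shared by everything a single turn produced
    summary: str


def group_by_turn(records):
    """Collect all records that reference the same turn id."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.turn_id].append(record)
    return dict(grouped)


records = [
    Record("trace_event", "turn-1", "model call"),
    Record("cost_event", "turn-1", "runtime-reported cost"),
    Record("audit_entry", "turn-1", "turn invocation"),
    Record("trace_event", "turn-2", "tool call"),
]
by_turn = group_by_turn(records)
```

Looking up by_turn["turn-1"] returns the trace, cost, and audit records for that turn together, which is the property the rest of this page relies on.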

How per-turn cost is captured

AgentCore and the ThinkWork API now write a canonical trace and accounting ledger. When a turn runs:

Runtime/finalize records create trace_runs and trace_events for the turn, model calls, tool calls, workspace hydration, profile lanes, and response finalization.
Each event stores a summary payload and a source-evidence reference. Raw provider payloads stay in their owning systems; eval snapshots and UI projections retain safe summaries and source ids.
Runtime token and cost observations link to cost_events as runtime-reported.
Bedrock invocation logs can upgrade individual invocations to invocation-reconciled or mark mismatches.
AWS billing exports can upgrade accounting to bill-reconciled at the level the bill actually proves. Account-only or delayed bill evidence stays aggregate-only until tenant-level attribution exists.
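The upgrade path in the list above can be sketched as a small state function. The state names come from the text; the transition rules, the function names, and the choice to compare costs in integer micro-dollars are assumptions made for illustration.

```python
from typing import Optional

# State names are taken from the text; the transition rules are assumptions.
RUNTIME_REPORTED = "runtime-reported"
INVOCATION_RECONCILED = "invocation-reconciled"
BILL_RECONCILED = "bill-reconciled"
MISMATCH = "mismatch"


def after_invocation_log(state: str, runtime_micros: int,
                         logged_micros: Optional[int]) -> str:
    """Upgrade a runtime-reported cost once a provider invocation log arrives."""
    if state != RUNTIME_REPORTED or logged_micros is None:
        return state  # no log to compare yet, or already past this step
    if logged_micros == runtime_micros:
        return INVOCATION_RECONCILED
    return MISMATCH


def after_bill_evidence(state: str, has_tenant_attribution: bool) -> str:
    """Upgrade to bill-reconciled only when the bill proves tenant-level cost."""
    if state == INVOCATION_RECONCILED and has_tenant_attribution:
        return BILL_RECONCILED
    return state  # account-only or delayed evidence stays aggregate-only


state = after_invocation_log(RUNTIME_REPORTED, 1200, 1200)
aggregate_only = after_bill_evidence(state, has_tenant_attribution=False)
upgraded = after_bill_evidence(state, has_tenant_attribution=True)
mismatched = after_invocation_log(RUNTIME_REPORTED, 1200, 1500)
```

In this sketch a mismatch is never promoted by later bill evidence, which mirrors the rule that data is not silently upgraded beyond what the evidence proves.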

The admin app, CLI trace commands, analytics, and eval snapshots project from that ledger instead of treating CloudWatch or the AWS bill as the primary user interface. Historical cost_events and thread_turns.usage_json rows that predate the ledger are backfilled with their source evidence marked as backfill and their reconciliation state set to unreconciled/error. They remain visible, but they are not silently promoted to provider- or bill-grade data.

The admin app’s analytics surface reads server-side summaries of this ledger and caches the results in useCostStore, a Zustand store that the dashboard and analytics pages share so that neither re-fetches cost data on navigation.

Budget enforcement

Budgets are configured at the tenant baseline and can be scoped to individual users:

Per-user — “work owned by this user can spend at most $50 per month.”
Per-tenant — “this tenant’s total spend across all users and system work can’t exceed $5,000 per month.”

Foreground user work is checked before dispatch and blocked when the user’s budget is already exceeded. Cost recording also updates budget status after spend is written, so the turn that crosses a limit can pause that user’s background work. The budget reset job clears budget pauses at the next billing period without re-enabling work that an admin disabled manually.
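The two checkpoints in this paragraph, a gate before dispatch and a status update after cost is recorded, can be sketched as follows. UserBudget and both functions are hypothetical names, amounts are in cents, and treating "exceeded" as strictly greater than the cap is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UserBudget:
    cap_cents: int
    spent_cents: int = 0
    background_paused: bool = False

    def over_cap(self) -> bool:
        return self.spent_cents > self.cap_cents


def can_dispatch_foreground(budget: UserBudget) -> bool:
    """Pre-dispatch gate: only blocks users who are already over their cap."""
    return not budget.over_cap()


def record_cost(budget: UserBudget, turn_cost_cents: int) -> None:
    """Post-turn update: write the spend, then refresh the pause state."""
    budget.spent_cents += turn_cost_cents
    if budget.over_cap():
        budget.background_paused = True


budget = UserBudget(cap_cents=5000, spent_cents=4950)
allowed_before = can_dispatch_foreground(budget)  # still under the cap
record_cost(budget, turn_cost_cents=120)          # this turn crosses the cap
allowed_after = can_dispatch_foreground(budget)   # the next turn is blocked
```

The crossing turn is allowed to run because its cost is unknown until it finishes, so recorded spend ends at 5070 cents against a 5000 cent cap. Only the following turn is blocked.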

Budget enforcement distinguishes visible spend from enforced spend. Operators can see runtime-estimated, invocation-reconciled, bill-reconciled, mismatch, and unreconciled totals. Strict budget decisions use the configured reconciliation confidence threshold, so historical or mismatched data can be reviewed without pretending it has invoice-grade confidence.
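A sketch of the visible versus enforced split, using the confidence labels from the paragraph above. The numeric ranking of those labels, the function names, and the sample totals (in cents) are assumptions for illustration only.

```python
# Assumed ordering: a higher rank means stronger evidence for the amount.
CONFIDENCE_RANK = {
    "unreconciled": 0,
    "mismatch": 0,
    "runtime-estimated": 1,
    "invocation-reconciled": 2,
    "bill-reconciled": 3,
}


def visible_spend_cents(totals: dict) -> int:
    """Everything an operator can see, regardless of confidence."""
    return sum(totals.values())


def enforced_spend_cents(totals: dict, threshold: str) -> int:
    """Only spend at or above the configured confidence threshold."""
    min_rank = CONFIDENCE_RANK[threshold]
    return sum(cents for state, cents in totals.items()
               if CONFIDENCE_RANK[state] >= min_rank)


totals = {
    "runtime-estimated": 1200,
    "invocation-reconciled": 3000,
    "bill-reconciled": 500,
    "mismatch": 250,
    "unreconciled": 100,
}
visible = visible_spend_cents(totals)
strict = enforced_spend_cents(totals, "invocation-reconciled")
lenient = enforced_spend_cents(totals, "runtime-estimated")
```

With these sample totals, an operator sees 5050 cents in total, while a strict threshold of invocation-reconciled counts only 3500 cents toward the cap.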

The reconciler and runtime checks:

Compute month-to-date spend by cost_events.user_id and by tenant, grouped by reconciliation confidence.
Compare against enabled user and tenant caps.
Block foreground work for users already over budget before invoking paid runtime work.
Mark user-owned scheduled/background work as budget-paused when that user’s cap is exceeded.
Keep system/unattributed spend visible in tenant totals without assigning it to a user.
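The aggregation and comparison steps above can be sketched as follows. CostEvent here is a simplified stand-in with hypothetical field names, grouping by reconciliation confidence is left out to keep the example short, and amounts are in cents.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostEvent:
    tenant_id: str
    user_id: Optional[str]  # None for system or unattributed spend
    cents: int


def month_to_date(events):
    """Sum spend per user and per tenant for the current billing period."""
    by_user = defaultdict(int)
    by_tenant = defaultdict(int)
    for event in events:
        by_tenant[event.tenant_id] += event.cents
        if event.user_id is not None:
            by_user[event.user_id] += event.cents
    return dict(by_user), dict(by_tenant)


def over_budget_users(by_user: dict, user_caps: dict) -> set:
    """Users whose month-to-date spend exceeds their enabled cap."""
    return {user for user, cap in user_caps.items()
            if by_user.get(user, 0) > cap}


events = [
    CostEvent("tenant-1", "alice", 3000),
    CostEvent("tenant-1", "alice", 2500),
    CostEvent("tenant-1", "bob", 1000),
    CostEvent("tenant-1", None, 700),  # system work: tenant total only
]
by_user, by_tenant = month_to_date(events)
blocked = over_budget_users(by_user, {"alice": 5000, "bob": 5000})
```

The system event raises the tenant total to 7200 cents without being assigned to any user, and only alice is over her cap.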

Unpausing is an operator action after raising or deleting the relevant budget policy, or an automatic reset at the start of the next billing period when configured.
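One way to keep a budget pause distinct from a manual admin disable is to store them as separate flags. The sketch below assumes that design; ScheduledJob and its fields are hypothetical names, not the actual schema.

```python
from dataclasses import dataclass


@dataclass
class ScheduledJob:
    job_id: str
    admin_disabled: bool = False  # set only by an operator
    budget_paused: bool = False   # set only by budget enforcement

    @property
    def runnable(self) -> bool:
        return not (self.admin_disabled or self.budget_paused)


def reset_budget_pauses(jobs) -> None:
    """Billing-period reset: clears budget pauses and nothing else."""
    for job in jobs:
        job.budget_paused = False


paused_by_budget = ScheduledJob("job-a", budget_paused=True)
disabled_by_admin = ScheduledJob("job-b", admin_disabled=True, budget_paused=True)
reset_budget_pauses([paused_by_budget, disabled_by_admin])
```

After the reset, job-a can run again while job-b stays off, because the reset never touches the admin flag.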

The audit log

Every state-changing operation in ThinkWork emits an NDJSON record to an append-only S3 bucket. The records cover:

Agent turn invocations (full context assembled, full response generated, tool calls issued, memory reads performed).
Guardrail activations (which filter fired, what action was taken).
Tenant configuration changes (template edits, agent creation, integration updates).
Admin user actions (who approved which inbox item, who paused which agent).
Budget enforcement events.

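The bucket is partitioned by tenant_id/YYYY-MM-DD/, so a compliance query for one tenant’s March activity only has to read the objects under that tenant’s March date prefixes. S3 Select runs against one object at a time, so the query is issued per object under those prefixes. The Lambda role that writes has no DeleteObject permission on the bucket, so the writer cannot remove records. This immutability is enforced by IAM policy, not by cryptography.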
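A sketch of how a writer might build the partition prefix and serialize one NDJSON record. The helper names and record fields are hypothetical, and using the UTC date for the partition is an assumption.

```python
import json
from datetime import datetime, timezone


def audit_prefix(tenant_id: str, when: datetime) -> str:
    """Build the tenant_id/YYYY-MM-DD/ prefix (UTC date is an assumption)."""
    day = when.astimezone(timezone.utc)
    return f"{tenant_id}/{day:%Y-%m-%d}/"


def to_ndjson_line(record: dict) -> str:
    """One JSON object per line, terminated by a newline."""
    return json.dumps(record, separators=(",", ":"), sort_keys=True) + "\n"


prefix = audit_prefix(
    "tenant-123", datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc)
)
record = {"turnId": "turn-1", "kind": "budget_enforcement"}
line = to_ndjson_line(record)
```

Because every record is a single line, a reader can process an audit object line by line without loading the whole file as one JSON document.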

Retention is handled by S3 Lifecycle rules: audit records transition to S3 Glacier after 90 days (configurable per deployment) and are purged after the tenant’s configured retention window. The default is to never purge; many tenants configure 7 years for compliance.
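For orientation, the dictionary below mirrors the rule shape that the S3 lifecycle configuration API accepts. How the deployment actually provisions lifecycle rules is not shown here; the function name, the rule id, and the one-rule-per-tenant-prefix layout are assumptions.

```python
from typing import Optional


def lifecycle_rule(tenant_id: str, glacier_after_days: int = 90,
                   retention_days: Optional[int] = None) -> dict:
    """Build one lifecycle rule scoped to a tenant's audit prefix."""
    rule = {
        "ID": f"audit-{tenant_id}",  # hypothetical rule id
        "Status": "Enabled",
        "Filter": {"Prefix": f"{tenant_id}/"},
        "Transitions": [
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    # Default retention is "never purge", so Expiration is only added on request.
    if retention_days is not None:
        rule["Expiration"] = {"Days": retention_days}
    return rule


default_rule = lifecycle_rule("tenant-123")
seven_year_rule = lifecycle_rule("tenant-123", retention_days=7 * 365)
```

The default rule has a Glacier transition and no expiration; the seven-year variant adds an expiration of 2555 days.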

Incident reconstruction

When a question arrives about a specific turn — “why did this agent say that on April 2nd?” — the reconstruction path is:

Find the threadId and approximate timestamp.
Run S3 Select (aws s3api select-object-content) against the objects under the audit prefix for the tenant and date.
Filter to records matching the threadId.
Read the turn record — it contains the full context that was assembled (thread history, retrieved documents, recalled memories, tool config), the full model response, every tool call with input + output, and every guardrail evaluation.
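The filtering step can also be done locally once the audit objects for that day have been downloaded. The sketch below assumes each record carries a threadId field, as the steps above imply; the other field names and the sample data are hypothetical.

```python
import json


def records_for_thread(ndjson_text: str, thread_id: str) -> list:
    """Return every audit record in an NDJSON blob that matches the thread."""
    matches = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("threadId") == thread_id:
            matches.append(record)
    return matches


AUDIT_OBJECT = "\n".join([
    json.dumps({"threadId": "thread-a", "turnId": "turn-1", "kind": "agent_turn"}),
    json.dumps({"threadId": "thread-b", "turnId": "turn-2", "kind": "agent_turn"}),
    json.dumps({"threadId": "thread-a", "turnId": "turn-1", "kind": "guardrail"}),
]) + "\n"

matches = records_for_thread(AUDIT_OBJECT, "thread-a")
```

Records come back in file order, so the turn record and its guardrail evaluation for thread-a stay adjacent.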

You don’t need the live database. You don’t need the agent to still exist. You don’t need the memory engine to still be running. The audit record is self-contained.

What “audit” doesn’t cover

Worth being honest about:

Read operations are not audited. Listing threads, viewing a dashboard, or reading a wiki page doesn’t write to the audit log. Only state-changing operations do.
Bedrock’s own logs are kept separately. AWS CloudTrail records every Bedrock API call on its own; ThinkWork’s audit log records the ThinkWork-level invocation, not the raw Bedrock API call.
Verbatim model prompts aren’t stored by default. The audit record stores the assembled context as structured pieces, not every token of the final rendered prompt. This is a tradeoff: full prompt text would roughly double the audit volume. The assembled context is enough for reconstruction in almost all cases; if full prompts are needed for compliance, there’s a deployment flag to enable verbose prompt capture.

Known limits

The crossing turn can exceed the cap. A user already over budget is blocked before new paid work starts, but the turn that pushes a user over the limit is only known after cost is recorded. Set caps with headroom.
No user self-service budget management. User budgets are operator-managed today.
No pre-enforcement budget alerts. The reconciler pauses agents; it doesn’t warn operators when spend reaches, say, 80% of a cap. A Slack/email notification path is a planned follow-up.
Runtime cost numbers are estimates. Runtime-reported per-turn cost uses observed tokens and pricing tables. Treat runtime-reported and historical backfill rows as useful operational estimates, not bill-grade accounting.
Bedrock log gaps are explicit. If provider invocation logs are delayed, unavailable, or ambiguous, the reconciliation state remains unreconciled/error or mismatch; the UI should not hide that gap.
Billing exports lag. AWS billing export delivery is delayed and aggregate. Until a line item has tenant-level attribution, it should inform aggregate review without upgrading individual turns to bill-reconciled.
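To make the "runtime cost numbers are estimates" limit concrete, here is a sketch of a token-based estimate. The model name and per-1,000-token prices are hypothetical placeholders, not real Bedrock pricing, and tool call overhead is left out.

```python
from decimal import Decimal

# Hypothetical USD prices per 1,000 tokens; real prices depend on the model.
PRICING_PER_1K = {
    "example-model": {"input": Decimal("0.003"), "output": Decimal("0.015")},
}


def estimate_turn_cost(model: str, input_tokens: int,
                       output_tokens: int) -> Decimal:
    """Estimate a turn's cost from observed tokens and a static price table."""
    price = PRICING_PER_1K[model]
    return (Decimal(input_tokens) / 1000 * price["input"]
            + Decimal(output_tokens) / 1000 * price["output"])


estimate = estimate_turn_cost("example-model", 2000, 500)
```

With these placeholder prices, 2,000 input tokens and 500 output tokens come to 0.0135 USD. The figure is only as accurate as the price table, which is why such rows stay below invocation- and bill-reconciled confidence.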

Control (overview) — where this fits
Guardrails — the other half of Control
Admin: Analytics — the operator view of tenant, user, and system spend
Admin: Security Center — guardrail events and incident review
Architecture — where audit storage lives in the deployment

Under the hood

Cost capture. The AgentCore Pi runtime and finalize path write runtime-observed trace events and cost events. Bedrock invocation logs and AWS billing exports add later reconciliation facts when they are available.
Trace ledger. trace_runs, trace_events, trace_source_evidence, and trace_cost_reconciliation_facts hold canonical execution and accounting evidence. cost_events remains the compatibility projection for existing cost APIs.
Budget enforcement. Runtime gates and budget helpers compute confidence-aware month-to-date spend against configured caps, then block or pause over-budget user-owned work.
Pause mechanism. User-owned scheduled work stores explicit budget pause state so monthly reset and manual admin disable remain distinct. Tenant-level budget state remains tenant-scoped.
Audit storage. NDJSON records land in an append-only S3 bucket provisioned by the ThinkWork Terraform module, with IAM denying deletes on the writer role. Lifecycle and retention are configurable per deployment.
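As an illustration of "IAM denying deletes on the writer role", the sketch below builds a policy document with an explicit deny. It is not the policy the Terraform module ships; the function name, statement ids, and bucket name are hypothetical, while the S3 action names are standard IAM actions.

```python
def writer_role_policy(bucket_name: str) -> dict:
    """Build an IAM policy document: allow writes, explicitly deny deletes."""
    objects_arn = f"arn:aws:s3:::{bucket_name}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAuditWrites",  # hypothetical statement id
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": objects_arn,
            },
            {
                "Sid": "DenyAuditDeletes",  # hypothetical statement id
                "Effect": "Deny",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": objects_arn,
            },
        ],
    }


policy = writer_role_policy("example-audit-bucket")
deny = [s for s in policy["Statement"] if s["Effect"] == "Deny"][0]
```

An explicit deny overrides any allow attached elsewhere to the same role. Denying deletes does not by itself prevent a PutObject from overwriting an existing key; whether the bucket also uses versioning or Object Lock is not stated on this page.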

For the admin operator views of analytics and budgets, see Admin: Analytics and Admin: Security Center.