Budgets, Usage, and Audit

The second half of Control is about knowing what happened and bounding how much it cost — the Efficiency and Traceability operating guarantees in concrete shipping form. Three intertwined concerns:

  • Usage tracking: per-turn cost, broken down by input tokens, output tokens, and tool call overhead. Efficiency at the per-decision level: every model dollar is attached to the turn that produced it, not aggregated into an opaque monthly invoice.
  • Budgets: tenant defaults and Space-specific spend limits that pause new work when a threshold is hit, rather than just warning. Efficiency as enforcement: the harness pauses execution before a runaway loop compounds into real money.
  • Audit: the durable, append-only record of every invocation that makes “what did the agent see, decide, and do?” answerable months later. Traceability at full fidelity: stored in the customer’s S3 bucket with no delete permission on the Lambda role, partitioned by tenant + date, queryable for a single operation.

They share one spine: every turn has an id, and every cost record, budget check, and audit entry references that same id.

AgentCore instruments every turn with OpenTelemetry spans. When a turn runs:

  1. A root span opens when the invocation starts.
  2. Child spans wrap each model call (converse), each tool call, and each memory operation.
  3. Every model-call span records input_tokens, output_tokens, and the model id.
  4. Every tool-call span records the tool name, arguments size, and execution latency.
  5. The root span closes when the turn finishes; the full span tree is emitted to CloudWatch.

A cost reducer — running async off the span stream — converts each span into a dollar figure using per-model, per-region pricing tables, and writes a thread_turn_costs row for the turn. The row includes the breakdown (model_cost_usd, tool_cost_usd, total_cost_usd) plus the span ids so a spot-check can walk back to the raw OTel data.
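
As a rough illustration of that reduction step, here is a minimal sketch in Python. It is not the ThinkWork reducer: the `Span` shape, the `PRICING` table, its placeholder prices, and the flat per-call `tool_cost_usd` are all assumptions made for the example. Only the output field names (model_cost_usd, tool_cost_usd, total_cost_usd, plus span ids) come from the description above.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical pricing table: USD per 1,000 tokens, keyed by (model id, region).
# The figures are placeholders, not real Bedrock prices.
PRICING = {
    ("example-model", "us-east-1"): {
        "input": Decimal("0.003"),
        "output": Decimal("0.015"),
    },
}


@dataclass
class Span:
    """Assumed minimal view of one child span of a turn."""
    span_id: str
    kind: str                      # "model" or "tool"
    model_id: str = ""
    region: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    tool_cost_usd: Decimal = Decimal("0")  # assumed flat per-call overhead


def reduce_turn(turn_id, spans):
    """Fold the spans of one turn into a single cost row."""
    model_cost = Decimal("0")
    tool_cost = Decimal("0")
    for span in spans:
        if span.kind == "model":
            price = PRICING[(span.model_id, span.region)]
            model_cost += price["input"] * span.input_tokens / 1000
            model_cost += price["output"] * span.output_tokens / 1000
        elif span.kind == "tool":
            tool_cost += span.tool_cost_usd
    return {
        "turn_id": turn_id,
        "model_cost_usd": model_cost,
        "tool_cost_usd": tool_cost,
        "total_cost_usd": model_cost + tool_cost,
        # Span ids are kept so a spot-check can walk back to the raw OTel data.
        "span_ids": [span.span_id for span in spans],
    }
```

Using Decimal rather than float keeps sub-cent amounts exact when thousands of turns are later summed into a month-to-date figure.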

The admin app’s analytics surface reads from this table and caches results in useCostStore, a Zustand store shared by the dashboard and analytics pages so that neither re-fetches cost data on navigation.

Budgets are configured at the tenant-agent baseline and can be overridden per Space:

  • Per-Space — “work in this Space can spend at most $50 per month.”
  • Per-tenant — “this tenant’s total spend across all agents can’t exceed $5,000 per month.”
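
One way to picture the baseline-plus-override relationship is the sketch below. The `BudgetConfig` class and its field names are hypothetical and chosen for the example; the actual configuration schema is not shown in this page.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, Optional


@dataclass
class BudgetConfig:
    """Hypothetical monthly budget settings for one tenant."""
    tenant_cap_usd: Decimal
    # Baseline cap applied to any Space without its own override.
    default_space_cap_usd: Optional[Decimal] = None
    # Per-Space overrides, keyed by Space id.
    space_overrides: Dict[str, Decimal] = field(default_factory=dict)

    def cap_for_space(self, space_id: str) -> Optional[Decimal]:
        """Return the override if one exists, else the baseline (None = uncapped)."""
        return self.space_overrides.get(space_id, self.default_space_cap_usd)
```

For example, a tenant capped at $5,000 with a single Space override of $50 would resolve that Space to $50 and every other Space to the baseline.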

Enforcement runs on a schedule, not at turn-start time. An EventBridge rule fires a budget reconciler Lambda every 15 minutes. The reconciler:

  1. Computes month-to-date spend per Space, tenant platform agent, and tenant from thread_turn_costs.
  2. Compares against the configured caps.
  3. Marks over-budget Spaces as paused for new work.
  4. Pauses tenant-level agent work when the tenant cap is exceeded.
  5. Writes a budget_enforced audit record for each pause.
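
The steps above can be sketched as a single pass over in-memory rows. This is an illustration under assumptions, not the reconciler Lambda itself: the row fields (`tenant_id`, `space_id`, `created_at`), the shape of the cap dictionaries, and the audit record fields other than the `budget_enforced` type are invented for the example, and the tenant platform agent scope is left out for brevity.

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal


def reconcile(cost_rows, space_caps, tenant_caps, now):
    """Return (paused_spaces, paused_tenants, audit_records) for one pass."""
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

    # Step 1: month-to-date spend per Space and per tenant.
    space_spend = defaultdict(Decimal)
    tenant_spend = defaultdict(Decimal)
    for row in cost_rows:
        if row["created_at"] < month_start:
            continue  # belongs to an earlier billing period
        space_spend[(row["tenant_id"], row["space_id"])] += row["total_cost_usd"]
        tenant_spend[row["tenant_id"]] += row["total_cost_usd"]

    # Steps 2-4: compare against caps and collect the scopes to pause.
    paused_spaces = sorted(
        key for key, spent in space_spend.items()
        if key in space_caps and spent > space_caps[key]
    )
    paused_tenants = sorted(
        tenant for tenant, spent in tenant_spend.items()
        if tenant in tenant_caps and spent > tenant_caps[tenant]
    )

    # Step 5: one budget_enforced audit record per pause.
    audit = [
        {"type": "budget_enforced", "scope": "space", "tenant_id": t,
         "space_id": s, "spent_usd": str(space_spend[(t, s)])}
        for t, s in paused_spaces
    ]
    audit += [
        {"type": "budget_enforced", "scope": "tenant", "tenant_id": t,
         "spent_usd": str(tenant_spend[t])}
        for t in paused_tenants
    ]
    return paused_spaces, paused_tenants, audit
```

A real implementation would aggregate in the database rather than in memory and would skip scopes that are already paused; both are omitted here to keep the control flow visible.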

Why reconciliation instead of turn-time blocking: the cost of every turn is known at span-close time, not span-open time, and ThinkWork doesn’t want to pre-emptively reserve budget (which is a different and uglier problem). The 15-minute reconciler is the explicit tradeoff — budgets are enforced eventually, not instantaneously, with the expectation that the cap is set with some headroom.
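
To make "some headroom" concrete, the following back-of-the-envelope sketch estimates how far spend can run past a cap between two reconciler passes. The burn rate is an input you would estimate yourself, and the helper names are hypothetical; the only figure taken from the text is the 15-minute interval.

```python
from decimal import Decimal

RECONCILE_INTERVAL_MIN = 15  # schedule described above


def worst_case_overshoot(burn_rate_usd_per_min, interval_min=RECONCILE_INTERVAL_MIN):
    """Spend that can accrue if the cap is crossed right after a reconciler pass."""
    return burn_rate_usd_per_min * interval_min


def cap_with_headroom(intended_limit_usd, burn_rate_usd_per_min):
    """Cap to configure so that worst-case spend stays near the intended limit."""
    overshoot = worst_case_overshoot(burn_rate_usd_per_min)
    return max(Decimal("0"), intended_limit_usd - overshoot)
```

For instance, an agent loop burning $0.40 per minute can add up to $6.00 before the next pass, so a Space meant to stay under $50 would be configured with a cap of about $44. Turns still in flight when the pause lands, and any lag in the async cost reducer, add to that figure.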

Unpausing is an operator action in the relevant Space or tenant-agent configuration, or an automatic reset at the start of the next billing period when configured.

Every state-changing operation in ThinkWork emits an NDJSON record to an append-only S3 bucket. The records cover:

  • Agent turn invocations (full context assembled, full response generated, tool calls issued, memory reads performed).
  • Guardrail activations (which filter fired, what action was taken).
  • Tenant configuration changes (template edits, agent creation, integration updates).
  • Admin user actions (who approved which inbox item, who paused which agent).
  • Budget enforcement events.

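The bucket is partitioned as tenant_id/YYYY-MM-DD/, so a compliance query for one tenant’s March activity only has to scan the objects under that tenant’s March date prefixes (S3 Select itself runs against one object at a time). The Lambda role that writes to the bucket has no DeleteObject permission, so records are immutable from the writer’s side. This is an access-control guarantee enforced by IAM, not a cryptographic one.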
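
A minimal sketch of how a writer might build the partitioned object key and serialize a batch as NDJSON follows. The tenant_id/YYYY-MM-DD/ layout comes from the text; partitioning by UTC date, the object name, and the helper names are assumptions made for the example.

```python
import json
from datetime import datetime, timezone


def audit_key(tenant_id, occurred_at, object_name):
    """Build the S3 key tenant_id/YYYY-MM-DD/object_name.

    Assumes occurred_at is timezone-aware and that partitions use the UTC date.
    """
    day = occurred_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{tenant_id}/{day}/{object_name}"


def to_ndjson(records):
    """Serialize records as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(record, separators=(",", ":"), sort_keys=True) + "\n"
        for record in records
    )
```

Because each line is a complete JSON document, a reader can stream an object and filter it line by line without loading the whole file.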

Retention is handled via S3 Lifecycle rules: audit records transition to S3 Glacier after 90 days (configurable per deployment) and are purged after the tenant’s configured retention window (default: never, but many tenants configure 7 years for compliance).
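
For reference, the lifecycle policy described above has roughly the following shape when expressed as the LifecycleConfiguration structure that the S3 API accepts. In ThinkWork the bucket is provisioned through Terraform, so this Python helper is only an illustration; the function name, the rule ID, and the empty prefix filter are assumptions.

```python
def lifecycle_rules(glacier_after_days=90, retention_days=None):
    """Build an S3 LifecycleConfiguration dict for the audit bucket.

    retention_days=None mirrors the "never purge" default: no Expiration
    action is added, so objects are archived but never deleted.
    """
    rule = {
        "ID": "audit-archive",          # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},       # apply to every object in the bucket
        "Transitions": [
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if retention_days is not None:
        rule["Expiration"] = {"Days": retention_days}
    return {"Rules": [rule]}
```

A 7-year window would be passed as `retention_days=7 * 365`. Lifecycle expiration is carried out by S3 itself, so it does not require granting DeleteObject to the writer role.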

When a question arrives about a specific turn — “why did this agent say that on April 2nd?” — the reconstruction path is:

  1. Find the threadId and approximate timestamp.
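  2. Query the objects under the audit prefix for the tenant + date, for example with S3 Select (aws s3api select-object-content, which runs against one object at a time).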
  3. Filter to records matching the threadId.
  4. Read the turn record — it contains the full context that was assembled (thread history, retrieved documents, recalled memories, tool config), the full model response, every tool call with input + output, and every guardrail evaluation.
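
Steps 3 and 4 amount to a line-by-line filter over NDJSON. The sketch below works on the text of an already downloaded audit object; the `threadId` field name follows the step above, while the function name and the rest of the record layout are assumptions.

```python
import json


def records_for_thread(ndjson_text, thread_id):
    """Return every audit record in an NDJSON blob that belongs to thread_id."""
    matches = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("threadId") == thread_id:
            matches.append(record)
    return matches
```

Running this over each object under the tenant + date prefix and sorting the results by timestamp gives the turn records for the thread in order.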

You don’t need the live database. You don’t need the agent to still exist. You don’t need the memory engine to still be running. The audit record is self-contained.

Worth being honest about:

  • Read operations are not audited. Listing threads, viewing a dashboard, or reading a wiki page doesn’t write to the audit log. Only state-changing operations do.
  • Bedrock’s own logs. AWS CloudTrail records every Bedrock API call separately. ThinkWork’s audit log records the ThinkWork-level invocation, not the raw Bedrock API call.
  • Verbatim model prompts aren’t stored by default. The audit record stores the assembled context as structured pieces, not every token of the final rendered prompt. This is a tradeoff: full prompt text would roughly double the audit volume. The assembled context is enough for reconstruction in almost all cases; if full prompts are needed for compliance, there’s a deployment flag to enable verbose prompt capture.
  • Budget enforcement is eventually consistent (15 min). A misbehaving agent can exceed its cap by a meaningful amount before the reconciler catches it. Set caps with headroom.
  • No per-user budgets. Budgets are Space/tenant scoped. A runaway end user hammering one Space can eat that Space’s budget if the cap is higher than the single-user expected load.
  • No budget alerts pre-enforcement. The reconciler pauses agents; it doesn’t warn operators at 80%. A Slack/email notification path is a planned follow-up.
  • Cost numbers are estimates. Per-turn cost uses published Bedrock pricing, which doesn’t account for cross-region overages, on-demand vs. provisioned throughput, or reserved capacity. The numbers are directionally correct for budgeting; the canonical cost record is the AWS bill.

How the pieces are implemented:

  • Cost capture. The AgentCore runtime (in packages/agentcore-strands/agent-container/) instruments every turn with OpenTelemetry spans. A downstream reducer turns those spans into per-turn cost rows in the operational database.
  • Budget enforcement. The budget handler runs on a schedule, computes month-to-date spend against configured caps, and pauses over-budget scopes.
  • Pause mechanism. Space-level pause is part of Space runtime policy; tenant-level pause is stored with the platform-agent budget state. Admin and mobile surfaces react to the change via real-time subscriptions.
  • Audit storage. NDJSON records land in an append-only S3 bucket provisioned by the ThinkWork Terraform module, with IAM denying deletes on the writer role. Lifecycle + retention is configurable per deployment.

For the admin operator views of analytics and budgets, see Admin: Analytics and Admin: Security Center.