Control

Control is the harness’s governance layer — the component that turns ThinkWork from a demo into a production system, and the load-bearing implementation of the four operating guarantees (Reliability, Efficiency, Security, Traceability). It answers two questions that every real deployment has to answer:

What are agents allowed to do? Model-level safety, spend limits, tool access, role boundaries.
What actually happened when they did it? Turn-level audit, cost tracking, incident reconstruction.

Both matter equally. A system that only enforces boundaries is a black box; a system that only logs is a landmine waiting for a safety question. ThinkWork treats Control as one first-class concept instead of scattering it across three unrelated surfaces.

Operating guarantees implemented

The five governance controls below map 1:N to the four operating guarantees. The guarantees are the language a buyer or operator can grab onto; the controls are the implementation that ships in the admin web. Every guarantee has at least one shipping control behind it.

Guarantee	What it commits to	Shipping controls
Reliability	Fault recovery, idempotent writes, behavior consistent under the same inputs.	Approved agent capabilities (templates pin model + guardrails so behavior is reproducible); Security + accuracy evaluations (regression detection per template version).
Efficiency	Token budgets, Space spend caps, low-latency interactive paths.	Cost control and analysis (per-turn cost capture, Space/tenant budget enforcement that pauses at threshold).
Security	Per-agent capability grants, sandboxed execution, I/O filtering for prompt injection and PII.	Runs in your AWS (the harness deploys into customer VPC; data, IAM, and network stay in the customer’s account); Approved agent capabilities (Bedrock Guardrails attached at the template level — content filters, topic filters, PII redaction, prompt-injection detection); Security + accuracy evaluations.
Traceability	End-to-end traces per turn, explainable decisions, auditable state.	Runs in your AWS (audit log lives in customer’s S3, partitioned by tenant, no delete permission on the Lambda role); Centralized management (one admin console aggregates threads, turns, cost, guardrail activations, and audit events under the same `threadId`/`turnId` keys so incident reconstruction is one query).

The mapping is 1:N — most controls implement more than one guarantee — because the guarantees are qualities the harness commits to, and the controls are mechanisms that often serve multiple qualities at once. “Runs in your AWS” is simultaneously a Security boundary (your IAM, not ours) and a Traceability mechanism (your audit log, your retention policy). That’s intentional.

Why Control scales differently than demos

A single agent answering a single user’s questions rarely needs guardrails, rarely needs a budget, and rarely needs an audit trail — because a human is watching. A fleet of agents running across multiple tenants, handling integration events while their operator is asleep, doesn’t have that luxury.

The problems that emerge with scale:

Runaway spend. A misconfigured agent that retries on every failure, or a prompt that accidentally triggers a recursive tool call, can burn hundreds of dollars in minutes. Without a budget, nobody notices until billing arrives.
Safety boundary drift. “Don’t answer questions about X” lives in the system prompt. Every new agent, every prompt edit, every template fork risks dropping that boundary without anyone noticing. A guardrail applied at the template level doesn’t drift.
Incident retrospective. “An agent told a customer the wrong price last Tuesday.” Reconstructing that turn — what the agent saw in its context, which tool it called, what memory it recalled, what the guardrail evaluator did — requires a durable audit record. Not a log that rolled over; a durable record with the right partition key.

Control is the concept that bundles those three concerns together. One place to configure boundaries. One place to see what happened. One set of primitives that work across every agent, template, and tenant in the deployment.

What’s in Control

Guardrails

Bedrock Guardrails applied at model invocation time — topic filters, content filters, PII redaction, grounding checks — referenced from templates so the boundary applies to every agent on the template.

Budgets, Usage, and Audit

Per-turn cost capture from OTel spans, tenant-level budget enforcement that can suspend agents when thresholds hit, and an append-only audit log for every invocation.

How Control interacts with the rest of the system

Control doesn’t own agents, threads, or memory. It wraps them:

Guardrails hook into the AgentCore invocation path. When AgentCore assembles context and calls Bedrock, the guardrail evaluator runs before the response is returned to the thread. A blocked response becomes a visible event in the thread timeline — agents don’t get a silently-edited response without the system recording that it happened.
Budgets hook into the turn accounting path. Every turn’s cost is captured from OpenTelemetry spans that AgentCore emits (input tokens × model rate + output tokens × model rate + tool call costs). A tenant-level budget cap, when hit, flips the agent’s status to PAUSED rather than silently continuing.
Audit hooks into every state-changing operation. Turn creation, tool call, memory write, guardrail activation, and tenant configuration change all emit NDJSON records to an append-only S3 bucket with no delete permission on the Lambda role. The bucket is partitioned by tenant + date; a single operation with a compliance ask can pull every relevant record.

The fact that guardrail activations land in the thread timeline, cost lands on the turn, and audit lands in S3 — all addressable by the same threadId + turnId — is what makes incident reconstruction tractable. You’re not stitching three disjoint systems together.

The tenant boundary

Control is configured per tenant. A single ThinkWork deployment typically has multiple tenants (customers, internal business units, or environments), and their control surfaces are independent:

Tenant A can have strict PII redaction; tenant B can have none.
Tenant A can have a $500/month budget; tenant B can have no cap.
Tenant A’s audit log lives in its own S3 prefix.

This means one deployment can host production work, internal eval runs, and external customer workloads without their Control posture bleeding across. Row-level security on every relevant table enforces the boundary at the database level; IAM policies enforce it at the S3 level.

Guardrails — how Bedrock Guardrails integrate with the AgentCore agent loop
Budgets, Usage, and Audit — the cost pipeline and the audit log model
Admin: Security Center — the operator surface for reviewing guardrail hits and incidents
Admin: Analytics — tenant-level cost, usage, and activity views
Architecture — where audit storage and IAM boundaries live in the deployment