
Architecture

ThinkWork is the open Agent Harness for Business. The harness is the engineered structure around the model that turns raw, non-deterministic agent output into reliable, traceable, auditable work. This page describes the architecture of that harness: first the mechanics that make it a harness, then the components, then the AWS deployment topology.

There are three useful views of the system:

  1. The harness mechanics: the agent loop, the four operating guarantees the runtime enforces, and the metaphor that ties them together
  2. The conceptual model: how the core components fit together to do work
  3. The infrastructure model: what gets deployed in your AWS account

Every resource lives in your AWS account. There is no shared infrastructure, no callbacks to external control planes, and no telemetry sent outside your account. That matters because ThinkWork is not just trying to run agents — it is trying to give customers an open, customer-owned harness for AI work.

A model alone is a wild horse. The harness is what turns its motion into useful work — direction, control, recovery, accountability. Every concept in the rest of this page (Agents, Spaces, Threads, Memory, Integrations, Automations, Control) earns its place by implementing one or more pieces of that harness. This section names the mechanics directly.

Every agent turn in ThinkWork follows the same four-phase cycle: Perception, Planning, Action, Feedback (PPAF). The harness implements each phase as a distinct, observable surface — that’s what makes turns reproducible and auditable instead of opaque.

```mermaid
flowchart LR
  P[Perception<br/><span style="font-size:11px">Thread + Memory + tool catalog</span>] --> Pl[Planning<br/><span style="font-size:11px">Model invocation in AgentCore</span>]
  Pl --> A[Action<br/><span style="font-size:11px">Tool calls, sandbox execution, replies</span>]
  A --> F[Feedback<br/><span style="font-size:11px">Turn record, cost, evals, audit</span>]
  F -.->|next turn| P
```

  • Perception. The harness assembles the agent’s view of the world: the thread’s prior messages, Memory context, the tool catalog, and current channel context. This is what Threads + Memory + Integrations produce together — a structured snapshot the model reasons over.
  • Planning. The model (via the selected AgentCore runtime on Bedrock) decides what to do next. ThinkWork doesn’t replace the model’s reasoning; it bounds the inputs and channels the outputs.
  • Action. Tool calls fire (with capability gating), sandbox code executes (with isolation), replies emit (through integrations). Every action is captured as it happens.
  • Feedback. The turn closes with a durable record: tokens, cost, tool outcomes, guardrail decisions, evaluator scores. That feedback is what the next turn perceives — and what an operator reads after the fact.

The four phases explain why the docs structure the components the way they do: Threads + Memory + Integrations are the perception substrate; Agents are the planning + action layer; Control is the feedback layer (audit, cost, evaluation). Automations sit outside the loop and decide when it starts.
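As a rough illustration of the loop, the sketch below models one turn as four explicit steps that each leave something observable behind. Every name here (`TurnRecord`, `run_turn`, and the `plan` callable that stands in for model invocation) is hypothetical and is not a ThinkWork API; the point is only that perception, planning, action, and feedback are separate, recordable stages.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnRecord:
    """Hypothetical feedback record for one turn (not the real ThinkWork schema)."""
    perceived: dict
    plan: dict
    tool_outcomes: list = field(default_factory=list)
    reply: str = ""


def run_turn(thread: list, memory: list, tools: dict, plan: Callable[[dict], dict]) -> TurnRecord:
    # Perception: assemble a structured snapshot for the model.
    snapshot = {"history": list(thread), "memory": list(memory), "tools": sorted(tools)}
    # Planning: the model (stubbed by `plan`) decides what to do next.
    decision = plan(snapshot)
    record = TurnRecord(perceived=snapshot, plan=decision)
    # Action: run requested tool calls, capturing each outcome as it happens.
    for call in decision.get("tool_calls", []):
        name, arg = call["name"], call["arg"]
        if name not in tools:
            record.tool_outcomes.append({"tool": name, "ok": False, "result": "not granted"})
            continue
        record.tool_outcomes.append({"tool": name, "ok": True, "result": tools[name](arg)})
    record.reply = decision.get("reply", "")
    # Feedback: the reply joins the thread, so the next turn perceives it.
    thread.append(record.reply)
    return record


def fake_plan(snapshot: dict) -> dict:
    # Stand-in for a model call: always requests one tool and then replies.
    return {"tool_calls": [{"name": "echo", "arg": "hi"}], "reply": "done"}


thread = ["user: hello"]
record = run_turn(thread, ["prefers short answers"], {"echo": lambda s: s.upper()}, fake_plan)
```

In this sketch the returned `TurnRecord` plays the role of the durable turn record: it holds what the model saw, what it decided, and what each tool returned, independent of the model itself.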

Beyond the loop, the harness has four operational properties it commits to delivering on every turn — the operating guarantees. These are the four promises a buyer or operator can grab onto and the four dimensions an evaluator can measure:

  • Reliability. Fault recovery from checkpoints, idempotent writes, behavior consistent under the same inputs. Implemented in: thread durability + RLS, AgentCore retry semantics, Step Functions for automations.
  • Efficiency. Token budgets and Space/tenant spend caps, low-latency interactive paths, throughput that scales with usage. Implemented in: Control’s budget enforcement, AppSync for streaming, the cost ledger.
  • Security. Per-agent capability grants, sandboxed execution, I/O filtering for prompt injection and PII. Implemented in: Templates, AgentCore code sandbox, Bedrock Guardrails (content_filters), and runtime-level tool gating.
  • Traceability. End-to-end traces per turn, explainable decisions, auditable state. Implemented in: the turn record, the append-only audit log on S3, evaluator scores per turn.
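To make the Efficiency guarantee concrete, here is a minimal sketch of a pre-invocation spend check. The `Budget` class and both functions are hypothetical, and the rule that a turn must fit under both the Space cap and the tenant cap is an assumption about how the two caps combine, not a description of Control's actual enforcement.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical spend cap; ThinkWork's real budget model may differ."""
    cap_usd: float
    spent_usd: float = 0.0

    def remaining(self) -> float:
        return max(self.cap_usd - self.spent_usd, 0.0)


def check_budget(estimated_cost_usd: float, space: Budget, tenant: Budget) -> bool:
    # Assumption: a turn may proceed only if it fits under BOTH caps.
    return estimated_cost_usd <= min(space.remaining(), tenant.remaining())


def record_spend(cost_usd: float, space: Budget, tenant: Budget) -> None:
    # After the turn, the actual cost is charged to both ledgers.
    space.spent_usd += cost_usd
    tenant.spent_usd += cost_usd


space = Budget(cap_usd=5.0, spent_usd=4.5)
tenant = Budget(cap_usd=100.0, spent_usd=10.0)
allowed_small = check_budget(0.25, space, tenant)
allowed_large = check_budget(1.00, space, tenant)
```

Here the Space cap is the binding one: the small turn fits in the remaining 0.50 USD and the large one does not, even though the tenant has plenty of headroom.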

The five governance controls in Control map 1:N to these four operating guarantees — for example, “Runs in your AWS” implements Security and Traceability; “Cost control and analysis” implements Efficiency. The operating guarantees are the language; the controls are the implementation.

State separation: model as compute, harness as state

A single principle underlies everything below: ThinkWork treats the model as stateless compute. Durable state lives in the harness — threads, memory, audit, cost, policies, and execution records.

This is the cleanest way to explain why the product exists. A model returns a response and forgets. The harness remembers everything that surrounded the response: which thread it ran inside, what memory was assembled, which tools fired, what the guardrail decided, what the cost was, what the evaluator scored. None of that lives in the model. All of it lives in the harness.

That separation is what makes turns reproducible (re-run a turn against a different model, the harness state is still intact), auditable (every turn has a durable record independent of what the model returned), and recoverable (a failed turn is a thread state, not a lost conversation). It also makes the model swappable — rotating from Sonnet to Opus is a template-id change because the harness owns everything around the model.
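The separation can be sketched in a few lines. In this hypothetical example the model is a pure function and the `Harness` class owns the thread and the per-turn records, so the same stored context can be replayed against a different model. The class, its methods, and the model ids are illustrative assumptions only.

```python
from typing import Callable

# A "model" is modeled here as a pure function: same input, no memory of past calls.
Model = Callable[[str], str]


class Harness:
    """Hypothetical harness that owns all durable state around a stateless model."""

    def __init__(self) -> None:
        self.thread: list = []  # durable conversation record
        self.turns: list = []   # durable per-turn records

    def run(self, model_id: str, model: Model, user_message: str) -> str:
        self.thread.append(user_message)
        context = "\n".join(self.thread)  # context is rebuilt from harness state
        reply = model(context)
        self.turns.append({"model": model_id, "context": context, "reply": reply})
        return reply

    def rerun_last(self, model_id: str, model: Model) -> str:
        # Replay the latest turn against a different model using the stored context.
        context = self.turns[-1]["context"]
        reply = model(context)
        self.turns.append({"model": model_id, "context": context, "reply": reply})
        return reply


harness = Harness()
first = harness.run("model-a", lambda ctx: f"A saw {len(ctx)} chars", "hello")
second = harness.rerun_last("model-b", lambda ctx: f"B saw {len(ctx)} chars")
```

Both turn records carry the same context, which is what makes the comparison between the two models meaningful.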

The conceptual model maps onto the harness

The conceptual model below — Agents, Spaces, Threads, Memory, Integrations, Automations, Control — is what most of the rest of this site documents. Reading the components against the harness mechanics keeps the picture coherent: every component is solving a piece of the harness problem (state, perception, planning, action, feedback, governance), not floating in isolation.

Before getting into the deployment tiers, it helps to frame the runtime in product terms:

External systems and users → Threads (per Space) → Memory → Tenant platform agent → Integrations and responses back out

Threads are the durable record of work. User chats, integration events, emails, and automations all become threads with history, status, metadata, and auditability.

See Threads.

Memory is the context layer. It determines what gets surfaced into the current turn beyond the latest message, and over time it should be portable through a ThinkWork-owned contract rather than defined by any one backend.

In the current open source app, that mainly means:

  • thread history selected for the context window
  • document retrieval through Bedrock Knowledge Bases
  • long-term memory recall through Bedrock AgentCore Memory by default, or Hindsight when configured
  • context assembly before model invocation

The default long-term memory setup includes semantic, summarization, user-preference, and episodic strategies.
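Context assembly can be pictured as merging several sources into one prompt. The sketch below is an assumption-laden illustration: `select_history` and `assemble_context` are hypothetical helpers, the section labels are invented, and history is trimmed by message count where a real selector would more likely budget by tokens.

```python
def select_history(messages: list, max_messages: int) -> list:
    # Keep only the most recent messages; guard against max_messages == 0,
    # because messages[-0:] would return the whole list.
    return messages[-max_messages:] if max_messages > 0 else []


def assemble_context(system_prompt: str, messages: list, kb_chunks: list,
                     memories: list, max_messages: int = 3) -> str:
    sections = [
        ("SYSTEM", [system_prompt]),
        ("HISTORY", select_history(messages, max_messages)),
        ("KNOWLEDGE", kb_chunks),
        ("MEMORY", memories),
    ]
    lines: list = []
    for title, items in sections:
        if not items:  # skip empty sources instead of emitting blank sections
            continue
        lines.append(f"## {title}")
        lines.extend(items)
    return "\n".join(lines)


context = assemble_context(
    "You are a helpful agent.",
    ["m1", "m2", "m3", "m4"],
    kb_chunks=["Refunds take 5 days."],
    memories=[],
)
```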

See Memory.

Agents are the execution layer. The tenant platform agent receives a thread plus assembled context, decides what to do, calls tools, and produces a response. Per-Space configuration (CONTEXT.md, skills, MCP) is rendered into the workspace the agent runs against.

In managed mode, this execution happens in AgentCore. Strands is the default runtime; Pi is a parallel runtime substrate that individual agents or templates can opt into. The surrounding harness remains ThinkWork’s either way. That distinction is important: managed does not mean vendor-hosted.
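A runtime selector along these lines could be as small as the sketch below. The function name and the precedence rule (an agent-level setting overrides its template, and anything unset falls back to Strands) are assumptions for illustration; only the two runtime names and the Strands default come from the text above.

```python
from typing import Optional

DEFAULT_RUNTIME = "strands"
KNOWN_RUNTIMES = {"strands", "pi"}


def resolve_runtime(agent_runtime: Optional[str], template_runtime: Optional[str]) -> str:
    # Assumed precedence: agent setting, then template setting, then the default.
    choice = agent_runtime or template_runtime or DEFAULT_RUNTIME
    if choice not in KNOWN_RUNTIMES:
        raise ValueError(f"unknown runtime: {choice!r}")
    return choice
```

For example, `resolve_runtime(None, "pi")` would select Pi for an agent that inherits from a Pi template, while an agent with no selector anywhere stays on Strands.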

See Agents.

Integrations are the boundary between ThinkWork and external systems. Some integrations bring inbound events into threads, and some expose external tools for agents to call.

This includes:

  • channel and event integrations such as Slack, GitHub, and Google Workspace
  • tool integrations, including MCP Tools

See Integrations.

```
┌─────────────────────────────────────────────────────────────┐
│                          App Tier                           │
│  AppSync · API Gateway · AgentCore runtimes · Crons · SES   │
│  CloudFront · Integration Lambdas · Step Functions          │
├─────────────────────────────────────────────────────────────┤
│                          Data Tier                          │
│  Aurora Postgres (pgvector) · S3 (skills, KB, logs)         │
│  Bedrock KB · Secrets Manager                               │
├─────────────────────────────────────────────────────────────┤
│                       Foundation Tier                       │
│  VPC · Subnets · Cognito · KMS · Route53 · ACM · SES Setup  │
└─────────────────────────────────────────────────────────────┘
```

The foundation tier provides identity, networking, and encryption. It changes rarely and is the most stable part of the deployment.

| Resource | Purpose |
| --- | --- |
| VPC + subnets | Isolated network with public and private subnets across 2 AZs |
| NAT Gateway | Outbound internet access for private subnet resources |
| Cognito User Pool | User authentication and JWT issuance |
| Cognito Identity Pool | Maps Cognito users to IAM roles for direct AWS resource access |
| KMS keys | Encryption at rest for app data, audit logs, and credential vault |
| Route53 records | DNS for admin app, API, and email |
| ACM certificates | TLS for CloudFront and API Gateway custom domains |
| SES domain identity | Verified sending domain for outbound email |

The data tier holds all persistent state. It depends on the foundation tier for network access and KMS encryption.

| Resource | Purpose |
| --- | --- |
| Aurora Postgres | Primary data store: agents, threads, messages, automations, integrations, users. Also hosts the pgvector index used by Bedrock Knowledge Bases (no separate vector DB) |
| S3 (skill catalog) | `skills/catalog/*.md`: skill packs loaded at invoke time |
| S3 (knowledge docs) | Source documents for Bedrock Knowledge Bases |
| S3 (audit logs) | Append-only log of every agent invoke (NDJSON, partitioned by date) |
| S3 (assets) | Admin and end-user app static files |
| Bedrock Knowledge Base | Vector-indexed document store for inline RAG, backed by Aurora pgvector |
| Secrets Manager | DB credentials, OAuth client secrets |
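The audit log row describes NDJSON partitioned by date. The sketch below shows what that shape could look like; the key layout, the `invoke_id` parameter, and the record fields are all assumptions, since the real bucket layout is not specified here.

```python
import json
from datetime import datetime, timezone


def audit_key(ts: datetime, invoke_id: str, prefix: str = "audit") -> str:
    # Hypothetical date-partitioned layout: one object per invoke under yyyy/mm/dd.
    return f"{prefix}/{ts:%Y/%m/%d}/{invoke_id}.ndjson"


def audit_line(record: dict) -> str:
    # NDJSON: exactly one compact JSON object per line, newline-terminated.
    return json.dumps(record, separators=(",", ":"), sort_keys=True) + "\n"


ts = datetime(2025, 1, 31, 12, 0, tzinfo=timezone.utc)
key = audit_key(ts, "inv-001")
line = audit_line({"agent_id": "a-1", "tenant_id": "t-1", "ts": ts.isoformat()})
```

Partitioning by date keeps lifecycle rules and time-bounded queries simple, because a day's records share a common key prefix.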

The app tier is where computation happens. It depends on both lower tiers.

| Resource | Purpose |
| --- | --- |
| AppSync GraphQL API | Real-time subscriptions (WebSocket), used for streaming responses |
| API Gateway v2 | HTTP queries and mutations, integration webhook ingress |
| AgentCore runtimes | Container-based managed runtimes: Strands (default) and Pi (opt-in), both on Bedrock AgentCore |
| Integration Lambdas | One per integration (GitHub, Google); handles inbound events |
| Step Functions | Automation runner, routine executor |
| EventBridge | Triggers for scheduled automations |
| Bedrock AgentCore Memory | Always on: automatic per-turn retention into four strategies (semantic, preferences, summaries, episodes) |
| ECS Fargate (optional) | Hindsight memory add-on (if `enable_hindsight = true`) |
| CloudFront | CDN for admin app, end-user app static files |
| SES (sending) | Outbound email from agent responses |

A full round trip from user message to agent response:

```
1. User sends message
   └─ POST /graphql (API Gateway)
   └─ JWT validated by Cognito authorizer
   └─ createMessage resolver → Aurora (writes message record)
   └─ Triggers the runtime dispatcher
2. Runtime dispatcher resolves the target
   └─ Reads agent/template runtime selector
   └─ Chooses Strands or Pi AgentCore function
3. Selected AgentCore runtime receives event
   └─ Reads thread history from Aurora
   └─ Downloads assigned skill packs from S3
   └─ Queries Bedrock Knowledge Base (if assigned) → retrieves relevant chunks
   └─ Agent tools read long-term memories from AgentCore Memory (always on) via the `recall()` tool, and optionally from Hindsight (ECS) when `enable_hindsight = true`
   └─ After the turn completes, the container auto-emits a CreateEvent into AgentCore Memory so background strategies extract facts for future recall
   └─ Builds context: system prompt + selected history + retrieved knowledge + recalled memory + tool config
4. Bedrock inference
   └─ Selected runtime sends context + message to Bedrock
   └─ Model may request tool calls
   └─ Tools execute (SQL, S3, HTTP, skill-defined functions)
   └─ Tool results injected, model generates final response
5. Response delivery
   └─ AgentCore writes response message to Aurora
   └─ Publishes AppSync mutation → NewMessageEvent subscription
   └─ Stream chunks published in real time via AppSync
   └─ Thread status updated in Aurora
6. Client receives response
   └─ AppSync WebSocket delivers StreamChunkEvent chunks
   └─ Final NewMessageEvent marks completion
```
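On the client side, the last step amounts to accumulating chunks until the completion event arrives. This is a minimal sketch under stated assumptions: the event names come from the flow above, but the `type` and `text` fields, and the idea that the final event carries the full persisted message, are hypothetical.

```python
def assemble_stream(events: list) -> str:
    """Rebuild a reply from streamed events (hypothetical event shape)."""
    chunks: list = []
    for event in events:
        if event["type"] == "StreamChunkEvent":
            chunks.append(event["text"])
        elif event["type"] == "NewMessageEvent":
            # Assumed: the final event carries the full persisted message,
            # which is treated as authoritative over the streamed chunks.
            return event["text"]
    # Stream ended without a completion event: return what arrived so far.
    return "".join(chunks)


complete = assemble_stream([
    {"type": "StreamChunkEvent", "text": "Hel"},
    {"type": "StreamChunkEvent", "text": "lo"},
    {"type": "NewMessageEvent", "text": "Hello"},
])
partial = assemble_stream([
    {"type": "StreamChunkEvent", "text": "Hel"},
    {"type": "StreamChunkEvent", "text": "lo wor"},
])
```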
Inbound integration event:

```
External service (Slack, GitHub, etc.)
  → POST /integrations/<id>/webhook (API Gateway)
  → Integration Lambda
      └─ Validates signature
      └─ Writes thread record to Aurora (channel=SLACK, metadata={...})
      └─ Invokes AgentCore (same path as user message above)
  → Agent response
  → Outbound integration or tool call posts reply back to Slack/GitHub/etc.
```
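Signature validation is the step most worth getting right in this flow. The sketch below assumes a hex-encoded HMAC-SHA256 over the raw request body; that scheme is an assumption, since each provider defines its own header name, encoding, and signed payload, and the function name is hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    # Assumed scheme: hex-encoded HMAC-SHA256 over the raw request body.
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, signature_hex)


secret = b"example-secret"
body = b'{"event":"message"}'
good_signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
```

The check has to run against the raw bytes as received, before any JSON parsing, because re-serializing the body can change whitespace and invalidate the signature.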
MCP tool call during a turn:

```
Selected AgentCore runtime receives thread + assembled context
  → Resolves enabled tool integrations from template/agent config
  → Connects to MCP server over HTTP streaming or SSE
  → Discovers available tools for this invocation
  → Model calls MCP tool when needed
  → Tool result returns into the same turn
  → Final response written back to the thread
```
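One way to picture tool gating in this flow is as an intersection between what the server offers and what the agent has been granted. The helpers and tool names below are hypothetical and do not use any real MCP client library; they only illustrate the filtering and the structured denial.

```python
def gate_tools(discovered: list, granted: set) -> list:
    # Only tools both offered by the server and granted to the agent are exposed.
    return [name for name in discovered if name in granted]


def call_tool(name: str, args: dict, allowed: list, handlers: dict) -> dict:
    if name not in allowed:
        # Denied calls come back as structured results so they can be recorded.
        return {"tool": name, "ok": False, "error": "not permitted"}
    return {"tool": name, "ok": True, "result": handlers[name](**args)}


handlers = {"search_issues": lambda query: [f"issue about {query}"]}
allowed = gate_tools(["search_issues", "delete_repo"], {"search_issues"})
result_ok = call_tool("search_issues", {"query": "bug"}, allowed, handlers)
denied = call_tool("delete_repo", {}, allowed, handlers)
```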
External task webhook:

```
External system (Linear, Jira, etc.)
  → POST /integrations/<provider>/webhook (API Gateway)
  → Webhooks Lambda
      └─ Resolves webhook by token (target_type=task, provider id)
      └─ adapter.verifySignature (opt-in, provider-specific)
      └─ adapter.normalizeEvent → NormalizedEvent
      └─ Resolves the connection via providerUserId
      └─ Resolves the per-user MCP token (auto-refresh on expiry)
      └─ adapter.refresh() → live envelope
      │    (synthetic envelope fallback if refresh fails)
      └─ Upserts external-task thread (channel=TASK, metadata.external.latestEnvelope)
      └─ Writes system message (metadata.kind="external_task_event")
      └─ Awaits [notifyNewMessage, notifyThreadUpdate, sendExternalTaskPush]
  → Mobile task card refreshes via AppSync subscription (~2s)
  → Push notification fires only for assignment / status / due changes
```
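The last line of the flow implies a filter between "the card changed" and "the user should be interrupted". A minimal sketch of that filter follows; the field names in `PUSH_FIELDS` and both helper functions are hypothetical stand-ins for whatever the normalized envelope actually contains.

```python
PUSH_FIELDS = ("assignee", "status", "due_date")  # hypothetical field names


def changed_fields(previous: dict, latest: dict) -> list:
    # Compare only the fields that are considered worth a notification.
    return [f for f in PUSH_FIELDS if previous.get(f) != latest.get(f)]


def should_push(previous: dict, latest: dict) -> bool:
    # Title edits, comments, and similar changes update the card but stay silent.
    return bool(changed_fields(previous, latest))


before = {"title": "Fix login", "status": "todo", "assignee": "ana"}
retitled = {"title": "Fix login bug", "status": "todo", "assignee": "ana"}
reassigned = {"title": "Fix login", "status": "todo", "assignee": "ben"}
```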
Scheduled automation:

```
EventBridge scheduled rule (cron)
  → Step Functions state machine starts
  → Creates AUTO- thread in Aurora
  → Invokes AgentCore with configured prompt
  → (same agent loop as above)
  → Step Functions records execution result
  → Thread marked closed (or failed)
```
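The automation flow reduces to "create a thread, invoke, record the outcome either way". The sketch below imitates that in plain Python rather than Step Functions; the thread shape, the id format after the `AUTO-` prefix, and the `invoke` callable are assumptions for illustration.

```python
from typing import Callable


def run_automation(automation_id: str, prompt: str, invoke: Callable[[str], str]) -> dict:
    # Hypothetical thread shape; only the AUTO- prefix comes from the flow above.
    thread = {"id": f"AUTO-{automation_id}", "status": "open", "messages": [prompt]}
    try:
        thread["messages"].append(invoke(prompt))
        thread["status"] = "closed"
    except Exception as exc:
        # A failed run is kept as thread state rather than discarded.
        thread["status"] = "failed"
        thread["error"] = str(exc)
    return thread


def boom(prompt: str) -> str:
    # Stand-in for an agent invocation that fails.
    raise RuntimeError("timeout")


ok_run = run_automation("daily-report", "Summarize yesterday", lambda p: "Summary: quiet day")
failed_run = run_automation("daily-report", "Summarize yesterday", boom)
```

Keeping the failed run as a thread with a status, instead of dropping it, matches the earlier point that a failed turn is a thread state and not a lost conversation.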

This split is useful to keep in mind:

  • Threads preserve the canonical record of work in Aurora
  • Memory combines persisted sources like documents and memories with retrieval-time assembly
  • Agents are mostly stateless between invocations aside from their configuration
  • Integrations store credentials and integration configuration, but the resulting work still lands in threads

| What | Where | Backup |
| --- | --- | --- |
| Agents, threads, messages | Aurora Postgres | Automated daily snapshots (7-day retention) |
| User accounts | Cognito User Pool | Cognito-managed, multi-AZ |
| Skill packs | S3 (skill catalog bucket) | S3 versioning enabled |
| Knowledge documents | S3 (knowledge bucket) | S3 versioning enabled |
| Audit logs | S3 (audit log bucket) | S3 versioning + lifecycle to Glacier after 90d |
| Memories (managed) | Aurora Postgres | Same as Aurora above |
| Memories (Hindsight) | Aurora Postgres + ECS in-flight processing | Same as Aurora above |
| OAuth tokens / API keys | SSM Parameter Store (SecureString, KMS) | SSM-managed |
| Terraform state | S3 (tfstate bucket) + DynamoDB (lock table) | S3 versioning enabled |
| Vector index | Aurora Postgres (pgvector) | Same as Aurora above |

ThinkWork is multi-tenant within a single deployment. Every Aurora table has a tenant_id column and all queries are tenant-scoped. The Cognito identity pool maps users to tenants at login time.

Row-level security (Postgres RLS) enforces tenant isolation at the database level — even if application code has a bug, a query cannot return data from another tenant.

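```sql
-- RLS policy (applied automatically by ThinkWork migrations)
-- Policies only take effect on tables with row-level security enabled.
ALTER TABLE threads ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON threads
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
```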
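The policy reads `app.current_tenant_id`, so something has to set it before each query runs. The sketch below shows one plausible way to do that, by pairing every query with a transaction-local `set_config` call. The helper name is hypothetical, the `%s` placeholders assume a psycopg-style driver, and whether ThinkWork uses `set_config` or `SET LOCAL` is not stated in this page.

```python
def tenant_scoped_statements(tenant_id: str, query: str, params: tuple = ()) -> list:
    """Return (sql, params) pairs intended to run inside one transaction."""
    return [
        # set_config(..., true) limits the setting to the current transaction,
        # so a pooled connection cannot carry a tenant id into the next request.
        ("SELECT set_config('app.current_tenant_id', %s, true)", (tenant_id,)),
        (query, params),
    ]


statements = tenant_scoped_statements(
    "6f1c2a9e-0000-4000-8000-000000000001",
    "SELECT id, status FROM threads WHERE status = %s",
    ("open",),
)
```

Note that the query itself has no `tenant_id` filter: with the setting in place, the policy is what restricts the rows.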
| Boundary | Enforcement |
| --- | --- |
| External → API | Cognito JWT (API Gateway authorizer) |
| API → Aurora | IAM auth + VPC security groups |
| AgentCore → Bedrock | IAM role (least privilege) |
| AgentCore → S3 | IAM role (read skills, read KB docs, write audit logs) |
| AgentCore → SSM | IAM role (read credentials for assigned integrations) |
| Tenant isolation | Postgres RLS on all tables |
| Secrets at rest | KMS encryption (dedicated key per secret category) |
| Audit trail | Append-only S3 bucket (no delete permissions on Lambda role) |

ThinkWork deploys managed runtimes as container images stored in ECR and hosted by Bedrock AgentCore. The runtime selector chooses between the provisioned functions at invocation time.

The Strands image includes:

  • Python 3.12
  • Strands agent framework
  • Boto3 (Bedrock, S3, SSM, Secrets Manager clients)
  • httpx (for skill HTTP tools)
  • psycopg 3 (Aurora connection)
  • The ThinkWork runtime library (tool registration, memory read/write, context assembly)

The Pi image includes:

  • Node.js
  • Pi runtime loop code
  • The ThinkWork invocation envelope and response contract
  • Runtime adapters needed to preserve the same thread, memory, tool, audit, and cost surfaces

Deployments pin both image tags. Upgrading either runtime is a deploy operation, not an admin web setting. Selecting which runtime an agent uses is an agent/template configuration decision.