
Architecture

ThinkWork has two useful views of the system:

  1. a conceptual model for reasoning about how work flows through the product
  2. an infrastructure model for understanding what gets deployed in AWS

The conceptual model is:

  • Threads are the record of work
  • Memory surfaces useful context into a turn, including document retrieval and long-term memory
  • Agents are the execution layer that decide and act
  • Connectors connect ThinkWork to outside systems, including MCP-based tool connectors

Those concepts map onto a three-tier AWS deployment. Every resource lives in your AWS account. There is no shared infrastructure, no callbacks to external control planes, and no telemetry sent outside your account.

That matters because ThinkWork is not just trying to run agents. It is trying to give customers an open, customer-owned harness for AI work.

Before getting into the deployment tiers, it helps to frame the runtime in product terms:

External systems and users → Threads → Memory → Agents → Connectors and responses back out

Threads are the durable record of work. User chats, connector events, emails, and automations all become threads with history, status, metadata, and auditability.
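A thread can be pictured as a small record carrying identity, channel, status, and metadata. A minimal sketch; the field names are illustrative, not the actual Aurora schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Thread:
    """Illustrative shape of a thread record (not the real schema)."""
    id: str
    channel: str                  # e.g. "CHAT", "SLACK", "EMAIL", "AUTO"
    status: str                   # e.g. "open", "closed", "failed"
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

t = Thread(id="t-1", channel="SLACK", status="open", metadata={"team": "T123"})
```

Whatever the inbound source, the point is that every unit of work lands in one uniform, auditable record type.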

See Threads.

Memory is the context layer. It determines what gets surfaced into the current turn beyond the latest message, and over time it should be portable through a ThinkWork-owned contract rather than defined by any one backend.

In the current open source app, that mainly means:

  • thread history selected for the context window
  • document retrieval through Bedrock Knowledge Bases
  • long-term memory recall through Bedrock AgentCore Memory by default, or Hindsight when configured
  • context assembly before model invocation

The default long-term memory setup includes semantic, summarization, user-preference, and episodic strategies.
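Put together, context assembly is a fold of these sources into a single model request. A minimal sketch with invented names for the inputs; the real runtime's interfaces may differ:

```python
from dataclasses import dataclass

@dataclass
class TurnContext:
    """Inputs gathered before model invocation (names are illustrative)."""
    system_prompt: str
    history: list[dict]            # prior thread messages, oldest first
    retrieved_chunks: list[str]    # Bedrock Knowledge Base results
    recalled_memories: list[str]   # long-term memory hits

def assemble_context(ctx: TurnContext, user_message: str, max_history: int = 20) -> dict:
    """Fold memory and retrieval into one Bedrock-style request payload."""
    blocks = []
    if ctx.recalled_memories:
        blocks.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in ctx.recalled_memories))
    if ctx.retrieved_chunks:
        blocks.append("Retrieved documents:\n" + "\n".join(ctx.retrieved_chunks))
    system = ctx.system_prompt + ("\n\n" + "\n\n".join(blocks) if blocks else "")
    # Trim history to the most recent turns that fit the context window
    messages = ctx.history[-max_history:] + [{"role": "user", "content": user_message}]
    return {"system": system, "messages": messages}
```

The key property is that the backends (Knowledge Bases, AgentCore Memory, Hindsight) can vary while the assembled shape stays stable.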

See Memory.

Agents are the execution layer. They receive a thread plus assembled context, decide what to do, call tools, and produce a response.

In managed mode, this execution happens in AgentCore, but the surrounding harness remains ThinkWork’s. That distinction is important: managed does not mean vendor-hosted.

See Agents.

Connectors are the integration boundary. Some connectors bring inbound events into threads, and some expose external tools for agents to call.

This includes:

  • channel and event connectors such as Slack, GitHub, and Google Workspace
  • tool connectors, including MCP Tools

See Connectors.

┌───────────────────────────────────────────────────────────┐
│ App Tier                                                  │
│ AppSync · API Gateway · AgentCore Lambda · Crons · SES    │
│ CloudFront · Connector Lambdas · Step Functions           │
├───────────────────────────────────────────────────────────┤
│ Data Tier                                                 │
│ Aurora Postgres (pgvector) · S3 (skills, KB, logs)        │
│ Bedrock KB · Secrets Manager                              │
├───────────────────────────────────────────────────────────┤
│ Foundation Tier                                           │
│ VPC · Subnets · Cognito · KMS · Route53 · ACM · SES Setup │
└───────────────────────────────────────────────────────────┘

The foundation tier provides identity, networking, and encryption. It changes rarely and is the most stable part of the deployment.

| Resource | Purpose |
|---|---|
| VPC + subnets | Isolated network with public and private subnets across 2 AZs |
| NAT Gateway | Outbound internet access for private subnet resources |
| Cognito User Pool | User authentication and JWT issuance |
| Cognito Identity Pool | Maps Cognito users to IAM roles for direct AWS resource access |
| KMS keys | Encryption at rest for app data, audit logs, and credential vault |
| Route53 records | DNS for admin app, API, and email |
| ACM certificates | TLS for CloudFront and API Gateway custom domains |
| SES domain identity | Verified sending domain for outbound email |

The data tier holds all persistent state. It depends on the foundation tier for network access and KMS encryption.

| Resource | Purpose |
|---|---|
| Aurora Postgres | Primary data store: agents, threads, messages, automations, connectors, users. Also hosts the pgvector index used by Bedrock Knowledge Bases (no separate vector DB) |
| S3 (skill catalog) | `skills/catalog/*.md`, skill packs loaded at invoke time |
| S3 (knowledge docs) | Source documents for Bedrock Knowledge Bases |
| S3 (audit logs) | Append-only log of every agent invoke (NDJSON, partitioned by date) |
| S3 (assets) | Admin and end-user app static files |
| Bedrock Knowledge Base | Vector-indexed document store for inline RAG, backed by Aurora pgvector |
| Secrets Manager | DB credentials, OAuth client secrets |
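As an illustration of the audit-log layout, here is a sketch of building a date-partitioned NDJSON object for S3. The key scheme and field shapes are assumptions, not the actual bucket layout:

```python
import json
from datetime import datetime, timezone

def audit_object_key(invoke_id: str, ts: datetime, prefix: str = "audit") -> str:
    """Date-partitioned key so logs can be listed and lifecycled by day
    (the exact layout in the real bucket may differ)."""
    return f"{prefix}/{ts:%Y/%m/%d}/{invoke_id}.ndjson"

def to_ndjson(events: list[dict]) -> str:
    """One JSON object per line, the NDJSON form used for audit records."""
    return "\n".join(json.dumps(e, separators=(",", ":"), sort_keys=True) for e in events) + "\n"

key = audit_object_key("inv-123", datetime(2025, 6, 1, tzinfo=timezone.utc))
# key == "audit/2025/06/01/inv-123.ndjson"
```

In the real deployment the resulting object would be written with a boto3 `put_object` call under an IAM role that has no delete permission, preserving the append-only property.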

The app tier is where computation happens. It depends on both lower tiers.

| Resource | Purpose |
|---|---|
| AppSync GraphQL API | Real-time subscriptions (WebSocket), used for streaming responses |
| API Gateway v2 | HTTP queries and mutations, connector webhook ingress |
| AgentCore Lambda | Container-based agent runtime (Python/Strands + Bedrock) |
| Connector Lambdas | One per connector (Slack, GitHub, Google); handles inbound events |
| Step Functions | Automation runner, routine executor |
| EventBridge | Triggers for scheduled automations |
| Bedrock AgentCore Memory | Always on; automatic per-turn retention into four strategies (semantic, preferences, summaries, episodes) |
| ECS Fargate (optional) | Hindsight memory add-on (if `enable_hindsight = true`) |
| CloudFront | CDN for admin app and end-user app static files |
| SES (sending) | Outbound email from agent responses |

A full round trip from user message to agent response:

1. User sends message
└─ POST /graphql (API Gateway)
└─ JWT validated by Cognito authorizer
└─ createMessage resolver → Aurora (writes message record)
└─ Triggers AgentCore Lambda invocation (async via SQS)
2. AgentCore Lambda receives event
└─ Reads thread history from Aurora
└─ Downloads assigned skill packs from S3
└─ Queries Bedrock Knowledge Base (if assigned) → retrieves relevant chunks
└─ Agent tools read long-term memories from AgentCore Memory (always on) via the `recall()` tool, and optionally from Hindsight (ECS) when `enable_hindsight = true`
└─ After the turn completes, the container auto-emits a CreateEvent into AgentCore Memory so background strategies extract facts for future recall
└─ Builds context: system prompt + selected history + retrieved knowledge + recalled memory + tool config
3. Bedrock inference
└─ Strands sends context + message to Bedrock (Claude)
└─ Model may request tool calls
└─ Tools execute (SQL, S3, HTTP, skill-defined functions)
└─ Tool results injected, model generates final response
4. Response delivery
└─ AgentCore writes response message to Aurora
└─ Publishes AppSync mutation → NewMessageEvent subscription
└─ Stream chunks published in real time via AppSync
└─ Thread status updated in Aurora
5. Client receives response
└─ AppSync WebSocket delivers StreamChunkEvent chunks
└─ Final NewMessageEvent marks completion
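The tool-calling portion of steps 2–4 can be sketched as a loop. Here `invoke_model` stands in for the Strands/Bedrock call and the message shapes are simplified:

```python
def run_turn(invoke_model, tools: dict, request: dict, max_steps: int = 8) -> str:
    """Minimal agent loop: call the model, execute any requested tool,
    inject the result, and repeat until a final text response."""
    messages = list(request["messages"])
    for _ in range(max_steps):
        reply = invoke_model(request["system"], messages)
        if reply.get("tool_call") is None:
            return reply["text"]                       # final answer
        call = reply["tool_call"]
        result = tools[call["name"]](**call["args"])   # execute tool locally
        messages.append({"role": "assistant", "tool_call": call})
        messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("tool-call loop exceeded max_steps")
```

The real runtime adds streaming, audit logging, and memory writes around this loop, but the decide/act/inject cycle is the core of it.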

An inbound connector event follows a similar path:

External service (Slack, GitHub, etc.)
→ POST /connectors/<id>/webhook (API Gateway)
→ Connector Lambda
└─ Validates signature
└─ Writes thread record to Aurora (channel=SLACK, metadata={...})
└─ Invokes AgentCore (same path as user message above)
→ Agent response
→ Outbound connector or tool call posts reply back to Slack/GitHub/etc.
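Signature validation depends on the provider. For Slack, the Connector Lambda would check the documented v0 HMAC-SHA256 request signature, roughly:

```python
import hashlib
import hmac
import time

def verify_slack_signature(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, tolerance: int = 300) -> bool:
    """Verify Slack's v0 request signature (per Slack's documented scheme).
    Stale timestamps are rejected to limit replay attacks."""
    if abs(time.time() - int(timestamp)) > tolerance:
        return False
    basestring = f"v0:{timestamp}:".encode() + body
    expected = "v0=" + hmac.new(signing_secret.encode(), basestring, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

GitHub and Google use different schemes (e.g. GitHub's `X-Hub-Signature-256` header), so each Connector Lambda carries its own verification step before anything is written to Aurora.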

When an agent uses an MCP tool connector:

AgentCore receives thread + assembled context
→ Resolves enabled tool connectors from template/agent config
→ Connects to MCP server over HTTP streaming or SSE
→ Discovers available tools for this invocation
→ Model calls MCP tool when needed
→ Tool result returns into the same turn
→ Final response written back to the thread

Scheduled automations run on a cron trigger:

EventBridge scheduled rule (cron)
→ Step Functions state machine starts
→ Creates AUTO- thread in Aurora
→ Invokes AgentCore with configured prompt
→ (same agent loop as above)
→ Step Functions records execution result
→ Thread marked closed (or failed)

This split is useful to keep in mind:

  • Threads preserve the canonical record of work in Aurora
  • Memory combines persisted sources like documents and memories with retrieval-time assembly
  • Agents are mostly stateless between invocations aside from their configuration
  • Connectors store credentials and integration configuration, but the resulting work still lands in threads

Where each kind of state lives, and how it is backed up:

| What | Where | Backup |
|---|---|---|
| Agents, threads, messages | Aurora Postgres | Automated daily snapshots (7-day retention) |
| User accounts | Cognito User Pool | Cognito-managed, multi-AZ |
| Skill packs | S3 (skill catalog bucket) | S3 versioning enabled |
| Knowledge documents | S3 (knowledge bucket) | S3 versioning enabled |
| Audit logs | S3 (audit log bucket) | S3 versioning + lifecycle to Glacier after 90d |
| Memories (managed) | Aurora Postgres | Same as Aurora above |
| Memories (Hindsight) | Aurora Postgres + ECS in-flight processing | Same as Aurora above |
| OAuth tokens / API keys | SSM Parameter Store (SecureString, KMS) | SSM-managed |
| Terraform state | S3 (tfstate bucket) + DynamoDB (lock table) | S3 versioning enabled |
| Vector index | Aurora Postgres (pgvector) | Same as Aurora above |

ThinkWork is multi-tenant within a single deployment. Every Aurora table has a tenant_id column and all queries are tenant-scoped. The Cognito identity pool maps users to tenants at login time.

Row-level security (Postgres RLS) enforces tenant isolation at the database level — even if application code has a bug, a query cannot return data from another tenant.

```sql
-- RLS policy (applied automatically by ThinkWork migrations)
CREATE POLICY tenant_isolation ON threads
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
```
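Because the policy reads a session setting, the application has to set `app.current_tenant_id` before querying. A sketch of the pattern; `execute` stands in for a psycopg3 `cursor.execute`, and the table names are illustrative:

```python
def tenant_scoped_fetch(execute, tenant_id: str):
    """Run a query under RLS by setting the tenant for the current
    transaction first. set_config(..., true) is transaction-local,
    equivalent to SET LOCAL, so the setting cannot leak across
    pooled connections."""
    execute("SELECT set_config('app.current_tenant_id', %s, true)", (tenant_id,))
    return execute("SELECT id, title FROM threads", ())
```

With this in place, the RLS policy filters every row server-side: even a query with no `WHERE tenant_id = ...` clause returns only the current tenant's data.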

| Boundary | Enforcement |
|---|---|
| External → API | Cognito JWT (API Gateway authorizer) |
| API → Aurora | IAM auth + VPC security groups |
| AgentCore → Bedrock | IAM role (least privilege) |
| AgentCore → S3 | IAM role (read skills, read KB docs, write audit logs) |
| AgentCore → SSM | IAM role (read credentials for assigned connectors) |
| Tenant isolation | Postgres RLS on all tables |
| Secrets at rest | KMS encryption (dedicated key per secret category) |
| Audit trail | Append-only S3 bucket (no delete permissions on Lambda role) |

AgentCore is deployed as a Lambda container image stored in ECR. The image is built from the ThinkWork base image and includes:

  • Python 3.12
  • Strands agent framework
  • Boto3 (Bedrock, S3, SSM, Secrets Manager clients)
  • httpx (for skill HTTP tools)
  • psycopg3 (Aurora connection)
  • The ThinkWork runtime library (tool registration, memory read/write, context assembly)
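The tool-registration piece of the runtime library might look roughly like this. This is a hypothetical shape for illustration, not the actual ThinkWork API:

```python
from typing import Callable

# Registry the agent loop would consult when exposing tools to the model
TOOLS: dict[str, Callable] = {}

def tool(name: str):
    """Decorator that registers a function as an agent-callable tool."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return register

# A skill-defined tool (name and return shape are invented)
@tool("lookup_order")
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}
```

A registry like this is what lets skill packs downloaded from S3 contribute tools at invoke time without changes to the container image.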

The container image is 512 MB compressed. Lambda allocates 3 GB of memory (configurable), which maps to approximately 2 vCPUs. Cold starts take 3–5 seconds for the first invocation after a period of inactivity; warm invocations start in under 100 ms.