
Evaluation Seed Cleanup

The ThinkWork RedTeam starter pack replaces the older maniflow evaluation seeds. Fresh tenants receive the new pack automatically. Existing deployments should run the cleanup SQL once so old seeded rows no longer appear in the Studio category list or future evaluation runs.

After pulling a release that includes the new seed pack, run the migration against each deployed tenant database:

psql "$DATABASE_URL" \
  -v ON_ERROR_STOP=1 \
  -f packages/database-pg/drizzle/0089_remove_maniflow_eval_seeds.sql
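
For deployments with several tenant databases, the command above can be wrapped in a small loop. The following is a minimal sketch, not part of the ThinkWork tooling: the `run_cleanup` function, the `PSQL_BIN` variable, and the `tenant-urls.txt` file (one connection string per line) are all hypothetical names introduced here for illustration.

```shell
# Sketch: apply one cleanup migration to every tenant database.
# ASSUMPTION: tenant connection strings are stored one per line in a text
# file (hypothetical name: tenant-urls.txt). The docs do not define this file.

# PSQL_BIN (hypothetical) lets you swap in another command, such as `echo`,
# to preview the invocations without touching a database.
PSQL_BIN="${PSQL_BIN:-psql}"

run_cleanup() {
  local migration="$1" url_file="$2" url
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and comment lines in the URL file.
    case "$url" in ''|'#'*) continue ;; esac
    # Stop at the first failure so a broken tenant is not silently skipped.
    # stdin is redirected so the command cannot consume the URL list.
    "$PSQL_BIN" "$url" -v ON_ERROR_STOP=1 -f "$migration" </dev/null || return 1
  done < "$url_file"
}
```

With the function loaded, `run_cleanup packages/database-pg/drizzle/0089_remove_maniflow_eval_seeds.sql tenant-urls.txt` would apply the migration to each listed database in order. Setting `PSQL_BIN=echo` first prints each command line instead of executing it.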

Then open Evaluations Studio or run the seed command to import the new pack:

thinkwork eval seed --stage <stage>

If the deployment previously imported now-retired non-RedTeam slices or ambiguous safety-scope cases, run the RedTeam-only cleanup as well:

psql "$DATABASE_URL" \
  -v ON_ERROR_STOP=1 \
  -f packages/database-pg/drizzle/0096_true_redteam_eval_seed_cleanup.sql

To verify the cleanup, confirm that the legacy seed rows are gone:

-- The `sub-agents` category is a legacy seed label retained for cleanup.
SELECT category, count(*)
FROM eval_test_cases
WHERE source = 'yaml-seed'
  AND category IN (
    'email-calendar',
    'knowledge-base',
    'mcp-gateway',
    'red-team',
    'sub-agents',
    'brain-onepager-citations',
    'brain-triage-routing',
    'brain-trust-gradient-promotion',
    'brain-write-back-capture',
    'thread-management',
    'tool-safety',
    'workspace-memory',
    'workspace-routing'
  )
GROUP BY category;

This query should return zero rows. To confirm that the remaining seeded categories are limited to red-team dimensions, list them:

SELECT DISTINCT category
FROM eval_test_cases
WHERE source = 'yaml-seed'
ORDER BY category;
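
The zero-row check can also be scripted so that a deploy step fails when legacy rows remain. This is a minimal sketch under stated assumptions: `count_legacy_rows`, `check_legacy_rows`, and `PSQL_BIN` are hypothetical names, and `DATABASE_URL` is assumed to point at the tenant database as in the commands above. The category list is copied from the legacy-seed query.

```shell
# Sketch: fail loudly if any legacy seed rows survive the cleanup.
# ASSUMPTION: DATABASE_URL is set to the tenant database connection string.
PSQL_BIN="${PSQL_BIN:-psql}"   # hypothetical override, useful for testing

count_legacy_rows() {
  # -X ignores ~/.psqlrc; -A and -t print the bare count with no decoration.
  "$PSQL_BIN" "$DATABASE_URL" -X -A -t -v ON_ERROR_STOP=1 -c "
    SELECT count(*)
    FROM eval_test_cases
    WHERE source = 'yaml-seed'
      AND category IN (
        'email-calendar', 'knowledge-base', 'mcp-gateway', 'red-team',
        'sub-agents', 'brain-onepager-citations', 'brain-triage-routing',
        'brain-trust-gradient-promotion', 'brain-write-back-capture',
        'thread-management', 'tool-safety', 'workspace-memory',
        'workspace-routing'
      );"
}

check_legacy_rows() {
  local n
  # Exit status 2 means the query itself could not be run.
  n="$(count_legacy_rows)" || return 2
  if [ "$n" -eq 0 ]; then
    echo "ok: no legacy seed rows"
  else
    echo "found $n legacy seed rows" >&2
    return 1
  fi
}
```

Calling `check_legacy_rows` after the migrations returns a non-zero status when the count is above zero, so it can gate a later step such as the seed import.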