StrongDM’s AI team has built a Software Factory: a non-interactive development pipeline in which specs and end-to-end scenarios drive coding agents that write, test, and iterate on software without human authorship or review. The setup treats scenario descriptions as a held-out validation set and scores the resulting software with a probabilistic “satisfaction” metric instead of boolean pass/fail test results.
How the pipeline validates work
- Scenarios as holdouts: End-to-end user stories are kept separate from the codebase so agents cannot overfit to them. They act as external validators that the software must satisfy, unlike in-repo unit tests, which an agent can easily game.
- Probabilistic validation: Rather than relying solely on green test suites, the system measures what fraction of observed scenario trajectories likely meet user needs. This gives a statistical view of correctness.
- Digital Twin Universe: To exercise the system at scale, the team builds agent-generated clones of third-party services (Okta, Jira, Slack, Google Docs/Drive/Sheets). These twins replicate APIs, edge cases, and behaviors so scenarios can be run thousands of times per hour without rate limits, costs, or safety concerns.
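The three ideas above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not StrongDM's implementation: `TicketTwin`, `file_bug_scenario`, and `satisfaction` are hypothetical names, the twin is a toy in-memory stand-in for a ticketing API, and each trajectory is judged with a simple boolean where the real system estimates whether a trajectory is likely to satisfy the user.

```python
from dataclasses import dataclass, field


@dataclass
class TicketTwin:
    """Hypothetical in-memory stand-in for a ticketing API (not a real client)."""
    fail_every: int = 0  # inject an error on every Nth call; 0 disables injection
    calls: int = 0
    tickets: dict = field(default_factory=dict)

    def create_ticket(self, title: str) -> int:
        self.calls += 1
        if self.fail_every and self.calls % self.fail_every == 0:
            raise RuntimeError("simulated outage from the twin")
        ticket_id = len(self.tickets) + 1
        self.tickets[ticket_id] = title
        return ticket_id


def file_bug_scenario(twin: TicketTwin) -> bool:
    """Held-out scenario: a user files a bug and can find it afterwards."""
    title = "Login button unresponsive"
    try:
        ticket_id = twin.create_ticket(title)
    except RuntimeError:
        return False  # this trajectory did not recover from the outage
    return twin.tickets.get(ticket_id) == title


def satisfaction(outcomes: list[bool]) -> float:
    """Fraction of observed trajectories judged to satisfy the user."""
    if not outcomes:
        raise ValueError("no trajectories observed")
    return sum(outcomes) / len(outcomes)


# Run the same scenario many times against the twin; no live service is touched.
twin = TicketTwin(fail_every=4)
outcomes = [file_bug_scenario(twin) for _ in range(100)]
print(f"satisfaction = {satisfaction(outcomes):.2f}")  # satisfaction = 0.75
```

Because the twin is local and cheap, the run count can be raised freely, and the score becomes a rate over many trajectories rather than a single pass or fail.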
Techniques and tooling
StrongDM documents a set of techniques that emerge from this agentic workflow:
- Gene Transfusion: extracting and reusing behavioral patterns from existing systems.
- Semports: automated porting of code between languages.
- Pyramid Summaries: layered summaries that let agents surface brief overviews quickly and drill into detail when needed.
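Of the three, Pyramid Summaries maps most directly onto a data structure. The sketch below is one possible shape, assuming each item carries an ordered list of summaries from shortest to most detailed. `SummaryNode`, `at`, and `overview` are hypothetical names chosen for illustration, and the sample summaries are made up.

```python
from dataclasses import dataclass


@dataclass
class SummaryNode:
    title: str
    levels: list[str]  # levels[0] is the shortest summary; later entries add detail

    def at(self, depth: int) -> str:
        """Return the summary at `depth`, clamped to the levels that exist."""
        index = max(0, min(depth, len(self.levels) - 1))
        return self.levels[index]


def overview(nodes: list[SummaryNode]) -> list[str]:
    """Cheap first pass: one short line per node."""
    return [f"{node.title}: {node.at(0)}" for node in nodes]


docs = [
    SummaryNode("auth", [
        "Handles login.",
        "Handles login via SSO and expires idle sessions.",
    ]),
    SummaryNode("billing", ["Issues invoices."]),
]

print(overview(docs))   # scan every node at the shortest level
print(docs[0].at(1))    # drill into one node for more detail
```

An agent would scan `overview` first and call `at` with a larger depth only for the nodes that look relevant, which keeps the amount of text in context small.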
Two public artifacts illustrate the approach. The Attractor project on GitHub is unusual: the repo contains no runnable code, only detailed spec markdown meant to be fed into an agent. The CXDB release is a more conventional codebase: an AI Context Store that logs conversation histories and tool outputs in an immutable DAG for reproducible context management.
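To make the idea of an immutable DAG of context concrete, here is a toy append-only store. It is not CXDB's API or storage format: `ContextDAG`, `append`, and `history` are hypothetical names, and keying each node by a SHA-256 hash of its content and parents is an assumption made for this sketch.

```python
import hashlib
import json


class ContextDAG:
    """Toy append-only store: nodes are keyed by a hash of content plus parents."""

    def __init__(self):
        self._nodes = {}

    def append(self, role, content, parents=()):
        # Parents must already exist, so a node can never point forward (no cycles).
        for parent in parents:
            if parent not in self._nodes:
                raise KeyError(f"unknown parent {parent}")
        record = {"role": role, "content": content, "parents": list(parents)}
        encoded = json.dumps(record, sort_keys=True).encode("utf-8")
        node_id = hashlib.sha256(encoded).hexdigest()
        self._nodes.setdefault(node_id, record)  # existing nodes are never overwritten
        return node_id

    def history(self, node_id):
        """Follow first-parent links back to the root; return oldest first."""
        entries = []
        while node_id is not None:
            record = self._nodes[node_id]
            entries.append((record["role"], record["content"]))
            node_id = record["parents"][0] if record["parents"] else None
        return entries[::-1]


dag = ContextDAG()
root = dag.append("user", "List open tickets")
tool = dag.append("tool", '{"open": 3}', parents=[root])
first = dag.append("assistant", "There are 3 open tickets.", parents=[tool])
second = dag.append("assistant", "Three tickets are open.", parents=[tool])  # branch

print(dag.history(first))
print(first != second)  # True: two branches share the same parent context
```

Because a node's identifier depends on its content and its parents, replaying `history` for a given identifier always reproduces the same context, which is the reproducibility property described above.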
Why it matters
This workflow shifts validation away from human code review toward agent-driven simulation and empirical scenario testing. The Digital Twin concept in particular enables high-volume, safe validation of complex integrations that would be impractical or risky to run against live services.
For a concise, firsthand write-up and links to StrongDM’s documentation and repositories, see the original post by Simon Willison: https://simonwillison.net/2026/Feb/7/software-factory/