StrongDM's Software Factory Runs Autonomous Agent-Driven Development and Simu…

A recent article by Simon Willison takes a closer look at StrongDM's Software Factory, where specs and scenario holdouts let agents write, test, and iterate on code without human review. Willison highlights the Digital Twin testing setup and the published repos.

TL;DR

  • Software Factory: non-interactive pipeline where specs and end-to-end scenarios drive coding agents that write, test, and iterate software without human authorship or review
  • Scenarios as holdouts: end-to-end user stories kept separate from the codebase to prevent agent overfitting and serve as external validators
  • Probabilistic validation: measures the fraction of observed scenario trajectories that likely meet user needs instead of binary test results
  • Digital Twin Universe: agent-generated clones of third-party services (Okta, Jira, Slack, Google Docs/Drive/Sheets) for high-rate, low-cost, safe scenario runs
  • Techniques and artifacts: Gene Transfusion, Semports, Pyramid Summaries; Attractor repo contains spec markdown only; CXDB is an AI Context Store that logs conversation histories and tool outputs in an immutable DAG

StrongDM’s AI team has built a Software Factory: a non-interactive development pipeline where specs and end-to-end scenarios drive coding agents that write, test, and iterate on software without human authorship or review. The setup treats scenario descriptions as held-out validation sets, and evaluates deployments using a probabilistic “satisfaction” metric instead of traditional boolean test results.

How the pipeline validates work

  • Scenarios as holdouts: End-to-end user stories are kept separate from the codebase so agents cannot overfit to them. The scenarios act as external validators that agents must satisfy, unlike in-repo unit tests, which agents could game.
  • Probabilistic validation: Rather than relying solely on green test suites, the system measures what fraction of observed scenario trajectories likely meet user needs—a statistical view of correctness.
  • Digital Twin Universe: To exercise the system at scale, the team builds agent-generated clones of third-party services (Okta, Jira, Slack, Google Docs/Drive/Sheets). These twins replicate APIs, edge cases, and behaviors so scenarios can be run thousands of times per hour without rate limits, costs, or safety concerns.
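The probabilistic validation idea above can be sketched in a few lines. The trajectory fields and the judge function here are illustrative assumptions, not StrongDM's actual format:

```python
# Sketch of probabilistic validation: rather than a boolean test suite,
# score the fraction of observed scenario trajectories that a judge deems
# satisfying.

def satisfaction_rate(trajectories, judge):
    """Fraction of trajectories the judge considers to meet user needs."""
    if not trajectories:
        return 0.0
    return sum(1 for t in trajectories if judge(t)) / len(trajectories)

# Hypothetical judge: a run is satisfying if it reached a completed state
# in a reasonable number of steps.
def judge(trajectory):
    return trajectory["outcome"] == "completed" and trajectory["steps"] < 20

runs = [
    {"outcome": "completed", "steps": 12},
    {"outcome": "completed", "steps": 25},
    {"outcome": "failed",    "steps": 8},
    {"outcome": "completed", "steps": 5},
]

rate = satisfaction_rate(runs, judge)  # 0.5 for this sample
# A deployment might be accepted when the rate clears a threshold
# (say 0.95 over thousands of runs), not when every single run passes.
```

The statistical framing tolerates occasional flaky trajectories, which matters when the system under test includes nondeterministic agents.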

Techniques and tooling

StrongDM documents a set of techniques that emerge from this agentic workflow:

  • Gene Transfusion — extracting and reusing behavioral patterns from existing systems.
  • Semports — automated porting of code between languages.
  • Pyramid Summaries — layered summaries that let agents surface brief overviews quickly and drill into detail when needed.
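A Pyramid Summary can be pictured as layers of increasing detail over the same artifact, where an agent reads the cheapest layer that fits its context budget. This is a minimal sketch under that assumption; the layer names and selection rule are illustrative, not StrongDM's actual scheme:

```python
# Layers ordered from cheapest to most detailed.
LAYERS = ["headline", "overview", "detail"]

pyramid = {
    "headline": "Auth service: issues and validates session tokens.",
    "overview": "Exposes /login and /verify; tokens are signed JWTs with a 1h TTL.",
    "detail": ("Full spec: endpoint schemas, error codes, token claims, "
               "rotation policy, audit hooks, and v2 migration notes."),
}

def summary_for(pyramid, budget_chars):
    """Return the deepest layer whose text still fits the character budget,
    falling back to the headline if nothing fits."""
    chosen = pyramid["headline"]
    for layer in LAYERS:
        if len(pyramid[layer]) <= budget_chars:
            chosen = pyramid[layer]
    return chosen
```

The point of the layering is that an agent surveying many files pays only headline cost, then drills into the full detail layer for the one file it is actually editing.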

Two public artifacts illustrate the approach. The Attractor project on GitHub is unusual: the repo contains no runnable code, only detailed spec markdown meant to be fed into an agent. The CXDB release is a more conventional codebase: an AI Context Store that logs conversation histories and tool outputs in an immutable DAG for reproducible context management.
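The immutable-DAG idea behind CXDB can be sketched as an append-only, content-addressed store where each entry hashes its payload plus its parents, so any context state is reproducible from its ID. The field names and API below are illustrative assumptions, not CXDB's actual schema:

```python
import hashlib
import json

class ContextStore:
    """Append-only context DAG: nodes are never mutated or deleted."""

    def __init__(self):
        self.nodes = {}

    def append(self, payload, parents=()):
        # Content-address the node: same payload + parents => same ID.
        body = json.dumps({"payload": payload, "parents": sorted(parents)},
                          sort_keys=True)
        node_id = hashlib.sha256(body.encode()).hexdigest()[:12]
        self.nodes.setdefault(node_id, {"payload": payload,
                                        "parents": list(parents)})
        return node_id

    def history(self, node_id):
        """Walk ancestors depth-first to rebuild the context leading here."""
        seen, order = set(), []
        def visit(nid):
            if nid in seen:
                return
            seen.add(nid)
            for parent in self.nodes[nid]["parents"]:
                visit(parent)
            order.append(self.nodes[nid]["payload"])
        visit(node_id)
        return order

store = ContextStore()
a = store.append({"role": "user", "text": "Summarize the spec"})
b = store.append({"role": "tool", "text": "spec.md contents"}, parents=[a])
c = store.append({"role": "assistant", "text": "Summary: ..."}, parents=[b])
```

Because IDs are derived from content, replaying `history(c)` always reconstructs the same conversation, which is what makes logged agent runs reproducible.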

Why it matters

This workflow shifts validation away from human code review toward agent-driven simulation and empirical scenario testing. The Digital Twin concept in particular enables exhaustive, safe validation of complex integrations that would be impractical or risky against live services.
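The Digital Twin idea reduces, in miniature, to an in-memory stand-in that mirrors a real service's method surface with no rate limits or side effects. The class and method names here are a toy illustration, not StrongDM's actual twin interface:

```python
# Toy "digital twin" of a Slack-like chat API: same call shape as a real
# client, but in-memory, so scenarios can hit it thousands of times with
# no rate limits, cost, or risk to live data.
class SlackTwin:
    def __init__(self):
        self.channels = {}

    def post_message(self, channel, text):
        msgs = self.channels.setdefault(channel, [])
        msgs.append(text)
        # Mimic a real API's response envelope.
        return {"ok": True, "channel": channel, "ts": len(msgs)}

    def history(self, channel):
        return list(self.channels.get(channel, []))

twin = SlackTwin()
for i in range(1000):  # far beyond any live API's rate limit
    twin.post_message("#alerts", f"scenario run {i}")
```

A scenario runner pointed at twins like this can exercise an integration exhaustively, including edge cases that would be unsafe to trigger against production services.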

For a concise, firsthand write-up and links to StrongDM’s documentation and repositories, see the original post by Simon Willison: https://simonwillison.net/2026/Feb/7/software-factory/
