Simon Willison on asynchronous code research with coding agents
Simon Willison outlines a workflow for asynchronous code research that leans on modern coding agents (Claude Code, Codex Cloud, Google's Jules and the GitHub Copilot coding agent) to run experiments autonomously and return results as git commits and pull requests. The pattern is straightforward: craft a clear research question, spin up an agent against a purpose-built repo, let it run unattended, then review the artifacts it produces.
The workflow, distilled
- Start with a dedicated GitHub repository (public or private) so agents can be given broad permissions without risking production secrets.
- Frame the research task as a concise prompt and fire it off to an asynchronous coding agent that runs server-side and files PRs when finished.
- Prefer repositories configured for full network access when the research requires fetching dependencies, pulling external data, or compiling toolchains.
- Treat agent output as experimental material: code and tests can be executed to confirm results, but human review remains necessary before publication.
Concrete examples in the wild
A public collection hosted at simonw/research demonstrates the pattern across 13 projects. Notable examples:
- python-markdown-comparison: a benchmark comparing seven Python Markdown libraries that found cmarkgfm outperforming the rest by a significant margin, with charts and a performance report produced by the agent (a minimal harness of this shape is sketched after the list).
- cmarkgfm-in-pyodide: an agent-driven attempt to compile a C-extension Python package into a Pyodide-compatible wheel and load it inside Node.js via WebAssembly; the project shows agents building on prior experiments and iterating through failures.
- blog-tags-scikit-learn: a text-classification experiment using scikit-learn to suggest tags for older blog posts, producing scripts, JSON result files and a written report (see the second sketch below).
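To make the benchmark idea concrete, here is a minimal sketch of that kind of harness. The sample document, the choice of four libraries and the timing parameters are assumptions for illustration, not the actual code from python-markdown-comparison:

```python
# Minimal Markdown-benchmark sketch in the spirit of python-markdown-comparison.
# Assumes markdown, markdown2, mistune and cmarkgfm are installed.
import timeit

import cmarkgfm
import markdown
import markdown2
import mistune

# Invented sample document, repeated to give the renderers real work to do.
SAMPLE = (
    "# Heading\n\nSome *emphasis*, a [link](https://example.com).\n\n"
    "- and\n- a\n- list\n\n"
) * 50

renderers = {
    "markdown": lambda: markdown.markdown(SAMPLE),
    "markdown2": lambda: markdown2.markdown(SAMPLE),
    "mistune": lambda: mistune.html(SAMPLE),
    "cmarkgfm": lambda: cmarkgfm.github_flavored_markdown_to_html(SAMPLE),
}

for name, render in renderers.items():
    # timeit returns total seconds across `number` runs; report per-call time.
    seconds = timeit.timeit(render, number=200)
    print(f"{name:10s} {seconds / 200 * 1000:.3f} ms per document")
```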
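The tag-suggestion experiment follows a standard multi-label text-classification recipe: vectorize post text, train one classifier per tag, then threshold the predicted probabilities. The sketch below shows one plausible shape for it; the toy posts, tags and threshold are invented, and the real blog-tags-scikit-learn scripts may be structured differently:

```python
# Multi-label tag suggestion sketch with scikit-learn: TF-IDF features plus
# one-vs-rest logistic regression. All training data here is a toy stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

posts = [
    "Benchmarking Markdown libraries in Python",
    "Running SQL queries against SQLite from the command line",
    "Compiling C extensions to WebAssembly with Pyodide",
    "Exploring full-text search in SQLite",
]
tags = [
    ["python", "markdown"],
    ["sqlite", "cli"],
    ["python", "webassembly"],
    ["sqlite", "search"],
]

# Encode the tag lists as a binary indicator matrix, one column per tag.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(posts, y)

# Suggest tags for an untagged post by thresholding per-tag probabilities.
probs = model.predict_proba(["Loading SQLite extensions from Python"])[0]
suggestions = [tag for tag, p in zip(mlb.classes_, probs) if p > 0.3]
print(suggestions)
```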
The repository also includes a GitHub Actions workflow that uses GitHub Models to auto-update the README, plus an AGENTS.md file with operational tips for directing agents; a hedged sketch of that kind of model call follows.
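The core of such a workflow step is a chat-completion request against GitHub Models, which exposes an OpenAI-compatible endpoint authenticated with a GitHub token. The endpoint URL and client usage below match the documented GitHub Models inference API, but the model id, prompt and input file are placeholders, not details taken from simonw/research:

```python
# Sketch of a workflow step that asks GitHub Models to summarize project
# notes for a README update. Assumes GITHUB_TOKEN is available (in Actions,
# grant the job `models: read` permission) and the openai package is installed.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://models.github.ai/inference",  # GitHub Models endpoint
    api_key=os.environ["GITHUB_TOKEN"],
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # placeholder model id
    messages=[
        {"role": "system", "content": "Summarize each research project in one line."},
        {"role": "user", "content": open("projects.md").read()},  # hypothetical input file
    ],
)
print(response.choices[0].message.content)
```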
Safety and curation
Agent-produced work is often raw. The intent is to quarantine AI-generated material inside a single repo and apply human verification before wider sharing. For non-sensitive research, relaxing network restrictions unlocks more capable experiments but increases the need for review and containment strategies.
Getting started
The pattern is accessible: create an empty repo, craft a research prompt, and let an asynchronous agent run. Some services currently offer trial incentives that lower the barrier to experimentation; for example, Claude Code advertised a limited-time promotion of credits for subscribers, and Jules provides a free tier.
For the full write-up, prompts, transcripts and links to the code examples, read the original article: https://simonwillison.net/2025/Nov/6/async-code-research/
