agent-browser: Fast Rust CLI and Playwright Daemon for AI Agents

agent-browser (agent-browser.dev) is a command-line tool for headless browser automation, designed with AI agents in mind. It combines a fast Rust CLI with a Node.js daemon that runs Playwright, giving agents a deterministic workflow for navigation, interaction, and page inspection.

What it is

agent-browser exposes a broad set of browser controls as simple CLI commands: navigation (open), element interaction (click, fill, type), state checks (is visible, is checked), page capture (screenshot, pdf), and an accessibility-based snapshot that assigns refs for deterministic element selection. The repository is licensed under Apache-2.0 and, at the time of writing, had 3.2k stars and 116 forks on GitHub.

Installation and platforms

Install it globally with npm install -g agent-browser, then run agent-browser install to download Chromium. Building from source is also supported; it requires pnpm and a Rust toolchain to produce the native binary. On Linux, the install step's --with-deps option also installs system dependencies; alternatively, run Playwright's npx playwright install-deps chromium.

Supported platforms provide a native Rust binary with a Node.js fallback:

macOS ARM64 / x64: Native Rust (fallback: Node.js)
Linux ARM64 / x64: Native Rust (fallback: Node.js)
Windows x64: Native Rust (fallback: Node.js)
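The platform list above works as a lookup table. The TypeScript sketch below maps Node.js platform/arch names to the runtime the list names. It is illustrative only: selectRuntime and NATIVE_TARGETS are hypothetical names, and the real CLI may also fall back to Node.js when the native binary is missing on an otherwise supported platform.

```typescript
// Hypothetical sketch: which runtime the platform list above implies for a
// given platform/arch pair. How the real CLI resolves this is an assumption.
type Runtime = "native-rust" | "nodejs";

// Keys use Node's process.platform / process.arch naming: "<platform>-<arch>".
const NATIVE_TARGETS: ReadonlySet<string> = new Set([
  "darwin-arm64", // macOS ARM64
  "darwin-x64",   // macOS x64
  "linux-arm64",  // Linux ARM64
  "linux-x64",    // Linux x64
  "win32-x64",    // Windows x64
]);

function selectRuntime(platform: string, arch: string): Runtime {
  // Anything outside the table is assumed to use the Node.js path.
  return NATIVE_TARGETS.has(`${platform}-${arch}`) ? "native-rust" : "nodejs";
}
```

In Node.js, selectRuntime(process.platform, process.arch) would report the expected runtime for the current machine.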

Core commands and workflows

Commands are designed for concise scripting and agent consumption. Common commands include:

agent-browser open <url> (aliases: goto, navigate)
agent-browser click <sel>, fill <sel> <text>, type <sel> <text>
agent-browser screenshot [path] (--full for full page)
agent-browser snapshot — returns an accessibility tree and refs (e.g., @e1)

The docs present the snapshot-and-refs workflow as deterministic, fast, and AI-friendly: an agent takes a snapshot, picks the refs it needs, and then acts on them directly (e.g., agent-browser click @e2).
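A minimal TypeScript sketch of driving this loop from a program, assuming agent-browser is on PATH. refArgs and agentBrowser are hypothetical helper names, and the check that a ref looks like @e followed by digits is an assumption drawn from the examples above, not a documented format.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: build argv for a ref-based action such as
// `agent-browser click @e2` or `agent-browser fill @e3 "text"`.
// Assumption: refs have the "@e<number>" shape seen in the examples.
function refArgs(action: string, ref: string, ...rest: string[]): string[] {
  if (!/^@e\d+$/.test(ref)) {
    throw new Error(`not a snapshot ref: ${ref}`);
  }
  return [action, ref, ...rest];
}

// Run one agent-browser command and return its stdout as text.
// execFileSync passes argv directly, without a shell, so no quoting is needed.
function agentBrowser(args: string[]): string {
  return execFileSync("agent-browser", args, { encoding: "utf8" });
}

// Usage (requires agent-browser on PATH):
//   agentBrowser(["open", "example.com"]);
//   const tree = agentBrowser(["snapshot", "-i"]); // read tree, choose a ref
//   agentBrowser(refArgs("click", "@e2"));
```

Passing an argument array instead of a shell string keeps values such as fill text from being reinterpreted by the shell.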

Semantic locators and selectors

Beyond refs, agent-browser supports traditional CSS selectors, XPath and text selectors, and semantic locators (ARIA role, label, placeholder, alt text, title, and test ID). Examples of semantic commands:

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
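When an agent generates these commands programmatically, small builders keep the argument order consistent with the two examples above. A sketch; findByRole and findByLabel are hypothetical names, and only the argument shapes shown in those examples are assumed.

```typescript
// Hypothetical builders mirroring the two semantic examples above:
//   agent-browser find role button click --name "Submit"
//   agent-browser find label "Email" fill "test@test.com"
function findByRole(role: string, action: string, name?: string): string[] {
  const args = ["find", "role", role, action];
  // Append --name only when given, in the position used by the example.
  if (name !== undefined) args.push("--name", name);
  return args;
}

function findByLabel(label: string, action: string, ...values: string[]): string[] {
  // Any trailing values (e.g. text to fill) follow the action.
  return ["find", "label", label, action, ...values];
}
```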

Agent integration and machine-readable output

A --json option provides machine-readable output for integration with LLM-based agents:

agent-browser snapshot --json returns a structured snapshot and refs
agent-browser get text @e1 --json and other commands support JSON output
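An agent harness can parse this output directly. The sketch below assumes only that stdout is a single JSON document; the schema is not described here, so the parsed value is typed as unknown and must be narrowed by the caller. withJson, parseJsonOutput, and agentBrowserJson are hypothetical names.

```typescript
import { execFileSync } from "node:child_process";

// Append --json unless already present; the examples above place it last.
function withJson(args: string[]): string[] {
  return args.includes("--json") ? args : [...args, "--json"];
}

// Parse stdout as JSON, with a clearer error if the output is not JSON.
function parseJsonOutput(stdout: string): unknown {
  try {
    return JSON.parse(stdout);
  } catch {
    throw new Error(`expected JSON from agent-browser, got: ${stdout.slice(0, 80)}`);
  }
}

// Run a command in JSON mode (requires agent-browser on PATH).
function agentBrowserJson(args: string[]): unknown {
  const stdout = execFileSync("agent-browser", withJson(args), { encoding: "utf8" });
  return parseJsonOutput(stdout);
}
```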

The repository also ships a skill under skills/agent-browser, along with instructions for using it with Claude Code.

Sessions, authentication, and network control

agent-browser supports isolated sessions (--session or AGENT_BROWSER_SESSION); each session keeps its own browser instance, cookies, storage, history, and authentication state. Headers can also be scoped to a specific origin, which enables authenticated access without logging in through the UI:

agent-browser open api.example.com --headers '{"Authorization":"Bearer <token>"}'
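Hand-writing the --headers JSON inside shell quotes is error-prone. A minimal sketch that builds the value with JSON.stringify and selects a session through the AGENT_BROWSER_SESSION environment variable; headersArgs and sessionEnv are hypothetical names, and the token and session name are placeholders.

```typescript
// Build `open <url> --headers <json>` with the JSON produced by JSON.stringify,
// so quotes inside header values are escaped correctly.
function headersArgs(url: string, headers: Record<string, string>): string[] {
  return ["open", url, "--headers", JSON.stringify(headers)];
}

// Copy an environment and select a named session via AGENT_BROWSER_SESSION.
// Each session name gets its own browser instance, cookies, and storage.
function sessionEnv(
  session: string,
  base: Record<string, string | undefined>,
): Record<string, string | undefined> {
  return { ...base, AGENT_BROWSER_SESSION: session };
}

// Usage (placeholder token):
//   execFileSync("agent-browser",
//     headersArgs("api.example.com", { Authorization: `Bearer ${token}` }),
//     { env: sessionEnv("api-agent", process.env), encoding: "utf8" });
```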

Network control includes request routing and mocking:

agent-browser network route <url> --abort
agent-browser network route <url> --body <json>
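The two routing forms differ only in their trailing flag, which a small builder can capture. A sketch; RouteAction and routeArgs are hypothetical names, and serializing the mock body with JSON.stringify is an assumption about how the JSON value is best passed.

```typescript
// Hypothetical builder for the two routing forms above:
//   agent-browser network route <url> --abort
//   agent-browser network route <url> --body <json>
type RouteAction = { kind: "abort" } | { kind: "mock"; body: unknown };

function routeArgs(url: string, action: RouteAction): string[] {
  const base = ["network", "route", url];
  if (action.kind === "abort") {
    return [...base, "--abort"];
  }
  // Serialize the mock response so it arrives as a single argv entry.
  return [...base, "--body", JSON.stringify(action.body)];
}
```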

Advanced usage: CDP, custom executables, and headed mode

CDP mode (--cdp <port>) connects to an existing Chrome DevTools Protocol endpoint (Electron, remote Chrome, WebView2).
Custom browser executables can be supplied via --executable-path or AGENT_BROWSER_EXECUTABLE_PATH, useful for serverless deployments with lightweight Chromium builds.
--headed shows the browser window for debugging.
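A sketch of assembling these options for a serverless-style launch: the custom Chromium path goes through AGENT_BROWSER_EXECUTABLE_PATH, and --headed is appended when debugging. launchConfig is a hypothetical name, /opt/chromium/chrome is a placeholder path, and placing --headed after the command is an assumption based on how --headers follows open <url> in the earlier example.

```typescript
// Hypothetical launch helper: sets AGENT_BROWSER_EXECUTABLE_PATH when a custom
// executable is given and appends --headed when requested (placement assumed).
function launchConfig(
  command: string[],
  opts: { executablePath?: string; headed?: boolean },
  base: Record<string, string | undefined>,
): { args: string[]; env: Record<string, string | undefined> } {
  const env = { ...base };
  if (opts.executablePath) {
    env.AGENT_BROWSER_EXECUTABLE_PATH = opts.executablePath;
  }
  const args = opts.headed ? [...command, "--headed"] : [...command];
  return { args, env };
}
```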

Architecture and runtime behavior

agent-browser follows a client-daemon architecture:

Rust CLI parses commands and communicates with the daemon.
Node.js daemon manages the Playwright browser instance.
Fallback: If the native binary is not available, a Node.js-only path runs the CLI logic.

Chromium is the default browser; Firefox and WebKit are also supported through Playwright.
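The value of a long-lived daemon is that state survives between separate CLI invocations. The toy model below illustrates that idea in TypeScript; it is not agent-browser's actual protocol or code. ToyDaemon, BrowserState, and the "current" command are hypothetical, and a plain object stands in for a Playwright browser.

```typescript
// Toy model only: a daemon keeps one browser state per session, while each
// CLI call is a short-lived client that sends one request and exits.
interface BrowserState {
  url: string | null;
  history: string[];
}

class ToyDaemon {
  private sessions = new Map<string, BrowserState>();

  handle(session: string, command: string, arg?: string): string {
    // Lazily create state the first time a session is used.
    let state = this.sessions.get(session);
    if (!state) {
      state = { url: null, history: [] };
      this.sessions.set(session, state);
    }
    switch (command) {
      case "open":
        if (!arg) throw new Error("open needs a URL");
        state.url = arg;
        state.history.push(arg);
        return `opened ${arg}`;
      case "current": // toy command: report the session's current URL
        return state.url ?? "about:blank";
      case "close":
        this.sessions.delete(session);
        return "closed";
      default:
        throw new Error(`unknown command: ${command}`);
    }
  }
}
```

Because every request names its session, two agents using different session names never see each other's pages, which matches the isolation described for --session.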

Quick start examples

A minimal interaction flow shown in the docs:

agent-browser open example.com
agent-browser snapshot -i (interactive elements only)
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser screenshot page.png
agent-browser close
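The same flow can be scripted so that close always runs, even when a step fails. A minimal TypeScript sketch with an injectable runner; quickStart, Runner, and cliRunner are hypothetical names. The refs @e2 and @e3 are copied from the example and only make sense for that particular snapshot; a real agent would choose refs from the snapshot output.

```typescript
import { execFileSync } from "node:child_process";

type Runner = (args: string[]) => string;

// Default runner: invokes the real CLI (requires agent-browser on PATH).
const cliRunner: Runner = (args) =>
  execFileSync("agent-browser", args, { encoding: "utf8" });

// Run the quick-start steps in order; stop at the first failure,
// but always send `close` afterwards.
function quickStart(run: Runner = cliRunner): string[] {
  const outputs: string[] = [];
  const steps: string[][] = [
    ["open", "example.com"],
    ["snapshot", "-i"],
    ["click", "@e2"],
    ["fill", "@e3", "test@example.com"],
    ["screenshot", "page.png"],
  ];
  try {
    for (const step of steps) outputs.push(run(step));
  } finally {
    run(["close"]);
  }
  return outputs;
}
```

Injecting the runner lets the sequence be tested with a fake that records calls, without launching a browser.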

Repository notes

The project is written mainly in TypeScript (about 58%) and Rust (about 39%). The repository also includes sample skills, documentation, and commands and options for debugging, tracing, and state management.

Original source: https://github.com/vercel-labs/agent-browser