agent-browser: Fast Rust CLI and Playwright Daemon for AI Agents

agent-browser pairs a fast Rust CLI with a Node.js Playwright daemon to provide deterministic, AI-friendly browser automation via snapshots and refs. It offers semantic locators, JSON output for LLMs, session isolation, network mocking, and headed/CDP modes.


TL;DR

  • Client–daemon architecture: Rust CLI parses commands and talks to a Node.js daemon running Playwright; native Rust binary with Node fallback; Chromium default, Playwright also supports Firefox/WebKit; CDP mode and custom executable support.
  • Installation and platforms: npm install -g agent-browser plus agent-browser install for Chromium; source builds require pnpm + Rust toolchain; macOS/Linux ARM64/x64 and Windows x64 supported; repo Apache-2.0 (3.2k stars, 116 forks).
  • Deterministic snapshot + refs workflow: agent-browser snapshot returns an accessibility tree and refs (e.g., @e1) for reliable agent actions like agent-browser click @e2.
  • Core commands: concise CLI for navigation and interaction (open, click, fill, type), page capture (screenshot, pdf), and snapshot for element refs.
  • Machine-readable output and agent integration: --json for structured snapshots and command output; includes a skills/agent-browser skill and a Claude Code skill file.
  • Sessions, auth, and network control: isolated sessions via --session or the AGENT_BROWSER_SESSION environment variable, per-origin headers for auth, and network routing/mocking (network route <url> --abort / --body <json>).

agent-browser (agent-browser.dev) is a command-line tool for headless browser automation designed with AI agents in mind. It combines a fast Rust CLI with a Node.js daemon that runs Playwright, providing a deterministic, agent-friendly workflow for navigation, interaction, and site inspection.

What it is

agent-browser exposes a wide set of browser controls as simple CLI commands: navigation (open), element interaction (click, fill, type), state checks (is visible, is checked), page capture (screenshot, pdf), and an accessibility-driven snapshot that produces refs for deterministic element selection. The repo is licensed under Apache-2.0 and currently shows 3.2k stars and 116 forks on GitHub.

Installation and platforms

Install the CLI with npm install -g agent-browser, then run agent-browser install to download Chromium. Source builds are supported (pnpm and a Rust toolchain are required for the native binary). On Linux, agent-browser install --with-deps also installs the required system dependencies; alternatively, use Playwright's npx playwright install-deps chromium.

Supported platforms provide a native Rust binary with a Node.js fallback:

  • macOS ARM64 / x64: Native Rust (fallback: Node.js)
  • Linux ARM64 / x64: Native Rust (fallback: Node.js)
  • Windows x64: Native Rust (fallback: Node.js)
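The platform list and the Linux install option can be captured in a small helper, for example to decide in a setup script whether a native binary is expected. This is a sketch, not part of agent-browser: `NATIVE_TARGETS`, `expectsNativeBinary`, and `installArgs` are hypothetical names, and the table simply restates the list above using Node's `process.platform` / `process.arch` identifiers.

```typescript
// Hypothetical helper: the supported-platform list above, encoded as data.
// Keys are Node's process.platform values; entries are process.arch values.
const NATIVE_TARGETS: Record<string, string[]> = {
  darwin: ["arm64", "x64"], // macOS
  linux: ["arm64", "x64"],
  win32: ["x64"], // Windows
};

// True when the list above says a native Rust binary exists for this target;
// otherwise the Node.js fallback would be used.
function expectsNativeBinary(platform: string, arch: string): boolean {
  return (NATIVE_TARGETS[platform] ?? []).includes(arch);
}

// Arguments for `agent-browser install`; --with-deps also pulls in
// Linux system dependencies, so it is only added on Linux.
function installArgs(platform: string): string[] {
  return platform === "linux" ? ["install", "--with-deps"] : ["install"];
}

console.log(
  expectsNativeBinary(process.platform, process.arch),
  installArgs(process.platform),
);
```

On a Linux x64 machine this logs `true` followed by the install arguments including `--with-deps`.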

Core commands and workflows

Commands are designed for concise scripting and agent consumption. Common commands include:

  • agent-browser open <url> (aliases: goto, navigate)
  • agent-browser click <sel>, fill <sel> <text>, type <sel> <text>
  • agent-browser screenshot [path] (--full for full page)
  • agent-browser snapshot — returns an accessibility tree and refs (e.g., @e1)
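When an agent or script drives these commands from code, it is safer to build an argv array than a single shell string, because text arguments such as form input then need no shell quoting. The sketch below models the commands listed above as a TypeScript discriminated union so each command is checked for the right arguments at compile time. `Command` and `toArgv` are hypothetical names, and placing `--full` after the optional path is an assumption.

```typescript
// Hypothetical typed model of the core commands listed above.
type Command =
  | { cmd: "open"; url: string }
  | { cmd: "click"; sel: string }
  | { cmd: "fill"; sel: string; text: string }
  | { cmd: "type"; sel: string; text: string }
  | { cmd: "screenshot"; path?: string; full?: boolean }
  | { cmd: "snapshot" };

// Convert a command to the argv that follows `agent-browser`.
function toArgv(c: Command): string[] {
  switch (c.cmd) {
    case "open":
      return ["open", c.url];
    case "click":
      return ["click", c.sel];
    case "fill":
    case "type":
      // Both take a selector (or ref) followed by the text.
      return [c.cmd, c.sel, c.text];
    case "screenshot": {
      const args = ["screenshot"];
      if (c.path) args.push(c.path);
      if (c.full) args.push("--full"); // full-page capture
      return args;
    }
    case "snapshot":
      return ["snapshot"];
  }
}

console.log(toArgv({ cmd: "fill", sel: "@e3", text: "hello world" }));
```

The resulting array can be passed unchanged to Node's `execFileSync("agent-browser", argv)`, which does not go through a shell.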

The docs present the snapshot + refs workflow as deterministic, fast, and AI-friendly: an agent takes a snapshot, picks the refs it needs, then acts on those refs directly (e.g., agent-browser click @e2).
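A minimal sketch of the "pick a ref" step, assuming the text snapshot prints one element per line in a form like `- button "Submit" [ref=e2]`. That line format is an assumption about the output, so check it against a real snapshot before relying on the regex. `parseRefs` and `findRef` are hypothetical helpers.

```typescript
interface RefEntry {
  ref: string;  // e.g. "e2"
  role: string; // e.g. "button"
  name: string; // accessible name, e.g. "Submit"
}

// ASSUMED snapshot line shape: `- <role> "<name>" ... [ref=eN] ...`
const LINE = /^\s*-\s+(\S+)\s+"([^"]*)".*\[ref=(e\d+)\]/;

// Collect every line that carries a ref.
function parseRefs(snapshot: string): RefEntry[] {
  const out: RefEntry[] = [];
  for (const line of snapshot.split("\n")) {
    const m = LINE.exec(line);
    if (m) out.push({ role: m[1], name: m[2], ref: m[3] });
  }
  return out;
}

// Return "@eN" for the first element matching role and name, ready to pass to click/fill.
function findRef(refs: RefEntry[], role: string, name: string): string | undefined {
  const hit = refs.find((r) => r.role === role && r.name === name);
  return hit ? `@${hit.ref}` : undefined;
}

const sample = [
  '- heading "Example Domain" [ref=e1] [level=1]',
  '- button "Submit" [ref=e2]',
  '- textbox "Email" [ref=e3]',
].join("\n");
console.log(findRef(parseRefs(sample), "button", "Submit"));
```

Matching on role plus accessible name, rather than on position, keeps the choice stable when unrelated elements are added to the page.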

Semantic locators and selectors

Beyond refs, agent-browser supports traditional CSS selectors, XPath/text selectors, and semantic locators (ARIA role, labels, placeholders, alt/title/testid). Examples of semantic commands:

  • agent-browser find role button click --name "Submit"
  • agent-browser find label "Email" fill "test@test.com"
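The two examples follow one pattern: `find`, a locator kind and value, an action, the action's argument, and an optional `--name` filter. The sketch below builds that argv. `FindSpec` and `findArgs` are hypothetical; the locator keywords beyond `role` and `label` are assumed from the list above (placeholder, alt, title, testid) and should be checked against `agent-browser --help`.

```typescript
// Hypothetical description of a semantic `find` command.
interface FindSpec {
  by: "role" | "label" | "placeholder" | "alt" | "title" | "testid"; // keywords beyond role/label are assumed
  value: string;      // e.g. "button" for role, "Email" for label
  action: string;     // e.g. "click" or "fill"
  actionArg?: string; // e.g. the text for "fill"
  name?: string;      // accessible-name filter, as in --name "Submit"
}

function findArgs(s: FindSpec): string[] {
  const args = ["find", s.by, s.value, s.action];
  if (s.actionArg !== undefined) args.push(s.actionArg);
  if (s.name !== undefined) args.push("--name", s.name);
  return args;
}

// Reproduces the two examples above as argv arrays.
console.log(findArgs({ by: "role", value: "button", action: "click", name: "Submit" }));
console.log(findArgs({ by: "label", value: "Email", action: "fill", actionArg: "test@test.com" }));
```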

Agent integration and machine-readable output

A --json option provides machine-readable output for integration with LLM-based agents:

  • agent-browser snapshot --json returns a structured snapshot and refs
  • agent-browser get text @e1 --json and other commands support JSON output
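On the consuming side, an agent parses the JSON and selects refs without scraping text. The sketch assumes an envelope shaped like `{"success": true, "data": {"snapshot": "...", "refs": {"e1": {"role": "heading", "name": "Title"}}}}`. That shape is an assumption to verify against real `--json` output, and `refsByRole` is a hypothetical helper.

```typescript
// ASSUMED shape of `agent-browser snapshot --json` output.
interface SnapshotEnvelope {
  success: boolean;
  data?: {
    snapshot: string;
    refs: Record<string, { role: string; name?: string }>;
  };
  error?: string;
}

// Return every ref ("@eN") whose role matches, in the order the JSON lists them.
function refsByRole(jsonText: string, role: string): string[] {
  const env = JSON.parse(jsonText) as SnapshotEnvelope;
  if (!env.success || !env.data) {
    throw new Error(env.error ?? "snapshot failed");
  }
  return Object.entries(env.data.refs)
    .filter(([, info]) => info.role === role)
    .map(([id]) => `@${id}`);
}

const example =
  '{"success":true,"data":{"snapshot":"...","refs":' +
  '{"e1":{"role":"heading","name":"Title"},"e2":{"role":"button","name":"Submit"}}}}';
console.log(refsByRole(example, "button"));
```

Failing loudly when `success` is false lets the agent loop retry or re-snapshot instead of acting on stale refs.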

The repository includes a skills/agent-browser skill and instructions for integrating with Claude Code via a provided skill file.

Sessions, authentication, and network control

agent-browser supports isolated sessions (--session or AGENT_BROWSER_SESSION), where each session maintains its own browser instance, cookies, storage, history, and auth state. Scoped headers can be set per origin to enable authenticated sessions without UI logins:

  • agent-browser open api.example.com --headers '{"Authorization":"Bearer <token>"}'
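From code, the header JSON is easiest to build with `JSON.stringify` so the token is escaped correctly, and the session can be chosen either with `--session` or through `AGENT_BROWSER_SESSION` in the child process environment. A sketch with hypothetical helper names; placing `--session` before the subcommand is an assumption about option order.

```typescript
import { execFileSync } from "node:child_process";

// JSON value for --headers; JSON.stringify handles quoting and escaping.
function headersArg(token: string): string {
  return JSON.stringify({ Authorization: `Bearer ${token}` });
}

// argv for an authenticated `open` in a named, isolated session.
// ASSUMPTION: --session may precede the subcommand.
function authedOpenArgs(session: string, url: string, token: string): string[] {
  return ["--session", session, "open", url, "--headers", headersArg(token)];
}

// Alternative: select the session via the environment variable instead of a flag.
// Requires agent-browser on PATH; not called in this sketch.
function runInSession(session: string, args: string[]): string {
  return execFileSync("agent-browser", args, {
    encoding: "utf8",
    env: { ...process.env, AGENT_BROWSER_SESSION: session },
  });
}

console.log(authedOpenArgs("agent1", "api.example.com", "TOKEN"));
```

Because each session has its own cookies, storage, and auth state, two agents using different session names do not see each other's logins.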

Network control includes request routing and mocking:

  • agent-browser network route <url> --abort
  • agent-browser network route <url> --body <json>
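The two routing forms are mutually exclusive: a route either aborts matching requests or answers them with a mock body. A TypeScript union type can encode that so a caller cannot ask for both. `RouteOptions` and `routeArgs` are hypothetical, and the URL used below is only an example (the accepted URL pattern syntax is not covered here).

```typescript
// Either abort the request or fulfil it with a JSON body, never both.
type RouteOptions = { abort: true } | { body: unknown };

function routeArgs(url: string, opts: RouteOptions): string[] {
  const args = ["network", "route", url];
  if ("abort" in opts) {
    args.push("--abort");
  } else {
    // Serialize the mock response body for --body <json>.
    args.push("--body", JSON.stringify(opts.body));
  }
  return args;
}

console.log(routeArgs("https://example.com/api/user", { body: { name: "test" } }));
```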

Advanced usage: CDP, custom executables, and headed mode

  • CDP mode (--cdp <port>) connects to an existing Chrome DevTools Protocol endpoint (Electron, remote Chrome, WebView2).
  • Custom browser executables can be supplied via --executable-path or AGENT_BROWSER_EXECUTABLE_PATH, useful for serverless deployments with lightweight Chromium builds.
  • --headed shows the browser window for debugging.
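These options can be collected into one builder. Because `--cdp` attaches to a browser that is already running while `--executable-path` chooses what to launch, the sketch rejects the combination. That rule is our own assumption, not a documented CLI constraint, and `launchArgs` is a hypothetical name. The assumed usage places these flags before the subcommand.

```typescript
interface LaunchOptions {
  cdpPort?: number;        // attach to an existing CDP endpoint
  executablePath?: string; // custom Chromium build to launch
  headed?: boolean;        // show the browser window
}

function launchArgs(o: LaunchOptions): string[] {
  // Our own sanity check: attaching and launching are different modes.
  if (o.cdpPort !== undefined && o.executablePath !== undefined) {
    throw new Error("use either cdpPort or executablePath, not both");
  }
  const args: string[] = [];
  if (o.cdpPort !== undefined) args.push("--cdp", String(o.cdpPort));
  if (o.executablePath !== undefined) args.push("--executable-path", o.executablePath);
  if (o.headed) args.push("--headed");
  return args;
}

// e.g. attach to a Chrome started with remote debugging on port 9222, then snapshot.
console.log([...launchArgs({ cdpPort: 9222 }), "snapshot"]);
```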

Architecture and runtime behavior

agent-browser follows a client-daemon architecture:

  1. Rust CLI parses commands and communicates with the daemon.
  2. Node.js daemon manages the Playwright browser instance.
  3. Fallback: If the native binary is not available, a Node.js-only path runs the CLI logic.

Chromium is the default browser; because the daemon drives the browser through Playwright, Firefox and WebKit are also supported.
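The value of the split is that state lives in a long-running process while each CLI call is short-lived. The toy model below shows that idea in-process: a "client" encodes argv as a JSON request, and a "daemon" keeps state between requests and dispatches through a handler table. It is purely illustrative. The message format, `ToyDaemon`, and the `current-url` command are hypothetical and do not reflect agent-browser's real IPC protocol, and a real daemon would hold a Playwright browser instead of a string.

```typescript
// Toy client-daemon model; NOT agent-browser's actual protocol.
interface DaemonRequest { id: number; action: string; args: string[]; }
interface DaemonResponse { id: number; success: boolean; data?: unknown; error?: string; }

// "Client" side: parse argv into a serialized request, as a CLI would before sending it over IPC.
function encodeRequest(id: number, argv: string[]): string {
  const [action, ...args] = argv;
  if (!action) throw new Error("no command given");
  const req: DaemonRequest = { id, action, args };
  return JSON.stringify(req);
}

// "Daemon" side: state survives across requests (a real daemon would keep the browser here).
class ToyDaemon {
  private url = "about:blank";
  private handlers: Record<string, (args: string[]) => unknown> = {
    open: (args) => {
      this.url = args[0];
      return { url: this.url };
    },
    "current-url": () => this.url, // hypothetical toy command
  };

  handle(raw: string): string {
    const req = JSON.parse(raw) as DaemonRequest;
    const fn = this.handlers[req.action];
    const res: DaemonResponse = fn
      ? { id: req.id, success: true, data: fn(req.args) }
      : { id: req.id, success: false, error: `unknown command: ${req.action}` };
    return JSON.stringify(res);
  }
}

const daemon = new ToyDaemon();
daemon.handle(encodeRequest(1, ["open", "example.com"]));
console.log(daemon.handle(encodeRequest(2, ["current-url"])));
```

The second request sees the URL set by the first, which is the property that lets separate CLI invocations share one browser session.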

Quick start examples

A minimal interaction flow shown in the docs:

  • agent-browser open example.com
  • agent-browser snapshot -i (interactive elements only)
  • agent-browser click @e2
  • agent-browser fill @e3 "test@example.com"
  • agent-browser screenshot page.png
  • agent-browser close
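The same flow can be scripted from Node. In this sketch the executor is injectable, so the step list can be checked without a browser; the default executor assumes `agent-browser` is on PATH. `QUICK_START` and `runSteps` are hypothetical names. The hard-coded `@e2` and `@e3` refs only make sense for the page in the docs example; a real agent would choose refs from the snapshot output.

```typescript
import { execFileSync } from "node:child_process";

// The quick-start flow above as argv arrays (refs taken from the docs example).
const QUICK_START: string[][] = [
  ["open", "example.com"],
  ["snapshot", "-i"], // interactive elements only
  ["click", "@e2"],
  ["fill", "@e3", "test@example.com"],
  ["screenshot", "page.png"],
  ["close"],
];

type Exec = (args: string[]) => string;

// Default executor: runs the CLI without a shell and returns stdout.
const realExec: Exec = (args) =>
  execFileSync("agent-browser", args, { encoding: "utf8" });

// Runs steps in order; execFileSync throws on a non-zero exit, which stops the sequence.
function runSteps(steps: string[][], exec: Exec = realExec): string[] {
  return steps.map((step) => exec(step));
}

// Dry run: print the commands instead of executing them.
runSteps(QUICK_START, (args) => {
  console.log(["agent-browser", ...args].join(" "));
  return "";
});
```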

Repository notes

The project is primarily written in TypeScript and Rust (approximately 58% TypeScript, 39% Rust), and includes sample skills, docs, and a suite of commands and options for debugging, tracing, and state management.

Original source: https://github.com/vercel-labs/agent-browser
