# agent-browser

Browser automation CLI for AI agents. Fast native Rust CLI.

## Installation

### Global Installation (recommended)

Installs the native Rust binary:

```bash
npm install -g agent-browser
agent-browser install  # Download Chrome from Chrome for Testing (first time only)
```

### Project Installation (local dependency)

For projects that want to pin the version in `package.json`:

```bash
npm install agent-browser
agent-browser install
```

Then use via `package.json` scripts or by invoking `agent-browser` directly.

### Homebrew (macOS)

```bash
brew install agent-browser
agent-browser install  # Download Chrome from Chrome for Testing (first time only)
```

### Cargo (Rust)

```bash
cargo install agent-browser
agent-browser install  # Download Chrome from Chrome for Testing (first time only)
```

### From Source

```bash
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native   # Requires Rust (https://rustup.rs)
pnpm link --global  # Makes agent-browser available globally
agent-browser install
```

### Linux Dependencies

On Linux, install system dependencies:

```bash
agent-browser install --with-deps
```

### Updating

Upgrade to the latest version:

```bash
agent-browser upgrade
```

Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.

### Requirements

- **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
- **Rust** - Only needed when building from source (see From Source above).

## Quick Start

```bash
agent-browser open example.com
agent-browser snapshot                    # Get accessibility tree with refs
agent-browser click @e2                   # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1                # Get text by ref
agent-browser screenshot page.png
agent-browser close
```

### Traditional Selectors (also supported)

```bash
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
```

## Commands

### Core Commands

```bash
agent-browser open                    # Launch browser (no navigation); stays on about:blank
agent-browser open <url>              # Launch + navigate to URL (aliases: goto, navigate)
agent-browser click <sel>             # Click element (--new-tab to open in new tab)
agent-browser dblclick <sel>          # Double-click element
agent-browser focus <sel>             # Focus element
agent-browser type <sel> <text>       # Type into element
agent-browser fill <sel> <text>       # Clear and fill
agent-browser press <key>             # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keyboard type <text>    # Type with real keystrokes (no selector, current focus)
agent-browser keyboard inserttext <text>  # Insert text without key events (no selector)
agent-browser keydown <key>           # Hold key down
agent-browser keyup <key>             # Release key
agent-browser hover <sel>             # Hover element
agent-browser select <sel> <val>      # Select dropdown option
agent-browser check <sel>             # Check checkbox
agent-browser uncheck <sel>           # Uncheck checkbox
agent-browser scroll <dir> [px]       # Scroll (up/down/left/right, --selector <sel>)
agent-browser scrollintoview <sel>    # Scroll element into view (alias: scrollinto)
agent-browser drag <src> <tgt>        # Drag and drop
agent-browser upload <sel> <files>    # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page, saves to a temporary directory if no path)
agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
agent-browser screenshot --screenshot-dir ./shots    # Save to custom directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf <path>              # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval <js>               # Run JavaScript (-b for base64, --stdin for piped input)
agent-browser connect <port>          # Connect to browser via CDP
agent-browser stream enable [--port <port>]  # Start runtime WebSocket streaming
agent-browser stream status           # Show runtime streaming state and bound port
agent-browser stream disable          # Stop runtime WebSocket streaming
agent-browser close                   # Close browser (aliases: quit, exit)
agent-browser close --all             # Close all active sessions
agent-browser chat "<instruction>"    # AI chat: natural language browser control (single-shot)
agent-browser chat                    # AI chat: interactive REPL mode
```

### Get Info

```bash
agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get attr <sel> <attr>   # Get attribute
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
agent-browser get cdp-url             # Get CDP WebSocket URL (for DevTools, debugging)
agent-browser get count <sel>         # Count matching elements
agent-browser get box <sel>           # Get bounding box
agent-browser get styles <sel>        # Get computed styles
```

### Check State

```bash
agent-browser is visible <sel>        # Check if visible
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked
```

### Find Elements (Semantic Locators)

```bash
agent-browser find role <role> <action> [value]       # By ARIA role
agent-browser find text <text> <action>               # By text content
agent-browser find label <label> <action> [value]     # By label
agent-browser find placeholder <ph> <action> [value]  # By placeholder
agent-browser find alt <text> <action>                # By alt text
agent-browser find title <text> <action>              # By title attr
agent-browser find testid <id> <action> [value]       # By data-testid
agent-browser find first <sel> <action> [value]       # First match
agent-browser find last <sel> <action> [value]        # Last match
agent-browser find nth <n> <sel> <action> [value]     # Nth match
```

**Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`

**Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)

**Examples:**

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
```

### Wait

```bash
agent-browser wait <selector>         # Wait for element to be visible
agent-browser wait <ms>               # Wait for time (milliseconds)
agent-browser wait --text "Welcome"   # Wait for text to appear (substring match)
agent-browser wait --url "**/dash"    # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.ready === true"  # Wait for JS condition

# Wait for text/element to disappear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')"
agent-browser wait "#spinner" --state hidden
```

**Load states:** `load`, `domcontentloaded`, `networkidle`

### Batch Execution

Execute multiple commands in a single invocation. Commands can be passed as
quoted arguments or piped as JSON via stdin. This avoids per-command process
startup overhead when running multi-step workflows.

```bash
# Argument mode: each quoted argument is a full command
agent-browser batch "open https://example.com" "snapshot -i" "screenshot"

# With --bail to stop on first error
agent-browser batch --bail "open https://example.com" "click @e1" "screenshot"

# Stdin mode: pipe commands as JSON
echo '[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["click", "@e1"],
  ["screenshot", "result.png"]
]' | agent-browser batch --json
```

### Clipboard

```bash
agent-browser clipboard read                  # Read text from clipboard
agent-browser clipboard write "Hello, World!" # Write text to clipboard
agent-browser clipboard copy                  # Copy current selection (Ctrl+C)
agent-browser clipboard paste                 # Paste from clipboard (Ctrl+V)
```

### Mouse Control

```bash
agent-browser mouse move <x> <y>      # Move mouse
agent-browser mouse down [button]     # Press button (left/right/middle)
agent-browser mouse up [button]       # Release button
agent-browser mouse wheel <dy> [dx]   # Scroll wheel
```

### Browser Settings

```bash
agent-browser set viewport <w> <h> [scale]  # Set viewport size (scale for retina, e.g. 2)
agent-browser set device <name>       # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng>     # Set geolocation
agent-browser set offline [on|off]    # Toggle offline mode
agent-browser set headers <json>      # Extra HTTP headers
agent-browser set credentials <u> <p> # HTTP basic auth
agent-browser set media [dark|light]  # Emulate color scheme
```

### Cookies & Storage

```bash
agent-browser cookies                     # Get all cookies
agent-browser cookies set <name> <val>    # Set cookie
agent-browser cookies set --curl <file>   # Import cookies from a Copy-as-cURL dump,
                                          # JSON array, or bare Cookie header (auto-detected)
agent-browser cookies clear               # Clear cookies

agent-browser storage local               # Get all localStorage
agent-browser storage local <key>         # Get specific key
agent-browser storage local set <k> <v>   # Set value
agent-browser storage local clear         # Clear all

agent-browser storage session             # Same for sessionStorage
```

### Network

```bash
agent-browser network route <url>                # Intercept requests
agent-browser network route <url> --abort        # Block requests
agent-browser network route <url> --body <json>  # Mock response
agent-browser network route '*' --abort --resource-type script  # Block scripts only
agent-browser network unroute [url]              # Remove routes
agent-browser network requests                   # View tracked requests
agent-browser network requests --filter api      # Filter requests
agent-browser network requests --type xhr,fetch  # Filter by resource type
agent-browser network requests --method POST     # Filter by HTTP method
agent-browser network requests --status 2xx      # Filter by status (200, 2xx, 400-499)
agent-browser network request <requestId>        # View full request/response detail
agent-browser network har start                  # Start HAR recording
agent-browser network har stop [output.har]      # Stop and save HAR (temp path if omitted)
```

### Tabs & Windows

```bash
agent-browser tab                          # List tabs (shows `tabId` and optional label)
agent-browser tab new [url]                # New tab (optionally with URL)
agent-browser tab new --label docs [url]   # New tab with a user-assigned label
agent-browser tab <t<N>|label>             # Switch to a tab by id or label
agent-browser tab close [t<N>|label]       # Close a tab (defaults to active)
agent-browser window new                   # New window
```

Tab ids are stable strings of the form `t1`, `t2`, `t3`. They're never reused
within a session, so scripts and agents can keep referring to the same tab
even after other tabs are opened or closed. Positional integers like `tab 2`
are **not** accepted; the `t` prefix disambiguates handles from indices and
mirrors the `@e1` convention used for element refs.

You can also assign a memorable label (`docs`, `app`, `admin`) and use it
interchangeably with the id. Labels are never auto-generated and never
rewritten on navigation; they're yours to name and keep:

```bash
agent-browser tab new --label docs https://docs.example.com
agent-browser tab docs                     # switch to the docs tab
agent-browser snapshot                     # populate refs for docs
agent-browser click @e3                    # click uses docs's refs
agent-browser tab close docs               # close by label
```

### Frames

```bash
agent-browser frame <sel>             # Switch to iframe
agent-browser frame main              # Back to main frame
```

### Dialogs

```bash
agent-browser dialog accept [text]    # Accept (with optional prompt text)
agent-browser dialog dismiss          # Dismiss
agent-browser dialog status           # Check if a dialog is currently open
```

By default, `alert` and `beforeunload` dialogs are automatically accepted so they never block the agent. `confirm` and `prompt` dialogs still require explicit handling. Use `--no-auto-dialog` (or `AGENT_BROWSER_NO_AUTO_DIALOG=1`) to disable automatic handling.
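
Because `confirm` and `prompt` dialogs are left for the caller, a script usually wraps the two dialog subcommands. The following is a minimal sketch, not part of agent-browser itself: the helper names `accept_dialog` and `dismiss_dialog` and the `AB` variable are hypothetical, introduced here so the helpers can be dry-run with `AB=echo`.

```bash
# AB names the CLI binary (hypothetical variable); override with AB=echo to print instead of run.
AB="${AB:-agent-browser}"

# Accept the pending dialog; pass one argument to supply text for a `prompt` dialog.
accept_dialog() {
  if [ "$#" -gt 0 ]; then
    "$AB" dialog accept "$1"
  else
    "$AB" dialog accept
  fi
}

# Dismiss the pending dialog (for example, answer "Cancel" to a `confirm`).
dismiss_dialog() {
  "$AB" dialog dismiss
}
```

For example, after a click that opens a `prompt`, `accept_dialog "new name"` would answer it, while `dismiss_dialog` cancels it.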

When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message.

### Diff

```bash
agent-browser diff snapshot                              # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt        # Compare current vs saved snapshot file
agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
agent-browser diff screenshot --baseline before.png      # Visual pixel diff against baseline
agent-browser diff screenshot --baseline b.png -o d.png  # Save diff image to custom path
agent-browser diff screenshot --baseline b.png -t 0.2    # Adjust color threshold (0-1)
agent-browser diff url https://v1.com https://v2.com     # Compare two URLs (snapshot diff)
agent-browser diff url https://v1.com https://v2.com --screenshot  # Also visual diff
agent-browser diff url https://v1.com https://v2.com --wait-until networkidle  # Custom wait strategy
agent-browser diff url https://v1.com https://v2.com --selector "#main"  # Scope to element
```

### Debug

```bash
agent-browser trace start [path]      # Start recording trace
agent-browser trace stop [path]       # Stop and save trace
agent-browser profiler start          # Start Chrome DevTools profiling
agent-browser profiler stop [path]    # Stop and save profile (.json)
agent-browser console                 # View console messages (log, error, warn, info)
agent-browser console --json          # JSON output with raw CDP args for programmatic access
agent-browser console --clear         # Clear console
agent-browser errors                  # View page errors (uncaught JavaScript exceptions)
agent-browser errors --clear          # Clear errors
agent-browser highlight <sel>         # Highlight element
agent-browser inspect                 # Open Chrome DevTools for the active page
agent-browser state save <path>       # Save auth state
agent-browser state load <path>       # Load auth state
agent-browser state list              # List saved state files
agent-browser state show <file>       # Show state summary
agent-browser state rename <old> <new> # Rename state file
agent-browser state clear [name]      # Clear states for session
agent-browser state clear --all       # Clear all saved states
agent-browser state clean --older-than <days>  # Delete old states
```

### Navigation

```bash
agent-browser back                    # Go back
agent-browser forward                 # Go forward
agent-browser reload                  # Reload page
agent-browser pushstate <url>         # SPA client-side nav; auto-detects window.next.router.push,
                                      # falls back to history.pushState + popstate
```

### Pre-navigation setup

Some flows (SSR debug, auth cookies for protected origins, init scripts)
need state set up *before* the first navigation. Use `open` with no URL
to launch the browser, then stage cookies / routes / init scripts, then
navigate. `batch` sends it all in one CLI call:

```bash
agent-browser batch \
  '["open"]' \
  '["network","route","*","--abort","--resource-type","script"]' \
  '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
  '["navigate","http://localhost:3000/target"]'
```

Without `batch` the same sequence is four separate commands that all reuse the
same daemon (fast, but not one turn).

### React / Web Vitals

Agent-browser ships with first-class React introspection and universal Web
Vitals metrics. The React commands need the React DevTools hook installed at
launch; Web Vitals and pushstate are framework-agnostic.

```bash
agent-browser open --enable react-devtools <url>   # Launch with React hook installed
agent-browser react tree                           # Full component tree
agent-browser react inspect <fiberId>              # props, hooks, state, source
agent-browser react renders start                  # Begin fiber render recording
agent-browser react renders stop [--json]          # Stop and print profile (--json for raw data)
agent-browser react suspense [--only-dynamic] [--json]  # Suspense boundaries + classifier
                                                   # --only-dynamic hides the "static" list
agent-browser vitals [url] [--json]                # LCP/CLS/TTFB/FCP/INP + React hydration phases
```

Each `react ...` subcommand requires `--enable react-devtools` to have been
passed at launch (the React DevTools `installHook.js` is embedded in the
binary). Without it the commands error with
`React DevTools hook not installed - relaunch with --enable react-devtools`.

Works on any React app: Next.js, Remix, Vite+React, CRA, TanStack Start,
React Native Web, etc. `vitals` and `pushstate` are framework-agnostic.

### Init scripts

```bash
agent-browser open --init-script <path>      # Register page init script before first navigation
                                             # (repeatable; also AGENT_BROWSER_INIT_SCRIPTS env)
agent-browser addinitscript <js>             # Register at runtime (returns identifier)
agent-browser removeinitscript <identifier>  # Remove a previously registered init script
```

### Setup

```bash
agent-browser install                   # Download Chrome from Chrome for Testing (Google's official automation channel)
agent-browser install --with-deps       # Also install system deps (Linux)
agent-browser upgrade                   # Upgrade agent-browser to the latest version
agent-browser doctor                    # Diagnose the install and auto-clean stale daemon files
agent-browser doctor --fix              # Also run destructive repairs (reinstall Chrome, purge old state, ...)
agent-browser doctor --offline --quick  # Skip network probes and the live launch test
```

`doctor` checks your environment, Chrome install, daemon state, config files,
encryption key, providers, network reachability, and runs a live headless
browser launch test. Stale socket/pid sidecar files are auto-cleaned. Output
is also available as `--json` for agents.

### Skills

```bash
agent-browser skills                     # List available skills
agent-browser skills list                # Same as above
agent-browser skills get <name>          # Output a skill's full content
agent-browser skills get <name> --full   # Include references and templates
agent-browser skills get --all           # Output every skill
agent-browser skills path [name]         # Print skill directory path
```

Serves bundled skill content that always matches the installed CLI version. AI agents use this to get current instructions rather than relying on cached copies. Set `AGENT_BROWSER_SKILLS_DIR` to override the skills directory path.

## Authentication

agent-browser provides multiple ways to persist login sessions so you don't re-authenticate every run.

### Quick summary

| Approach | Best for | Flag / Env |
|----------|----------|------------|
| **Chrome profile reuse** | Reuse your existing Chrome login state (cookies, sessions) with zero setup | `--profile <name>` / `AGENT_BROWSER_PROFILE` |
| **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile <path>` / `AGENT_BROWSER_PROFILE` |
| **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name <name>` / `AGENT_BROWSER_SESSION_NAME` |
| **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` |
| **State file** | Load a previously saved state JSON on launch | `--state <path>` / `AGENT_BROWSER_STATE` |
| **Auth vault** | Store credentials locally (encrypted), login by name | `auth save` / `auth login` |

### Import auth from your browser

If you are already logged in to a site in Chrome, you can grab that auth state and reuse it:

```bash
# 1. Launch Chrome with remote debugging enabled
# macOS:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
# Or use --auto-connect to discover an already-running Chrome

# 2. Connect and save the authenticated state
agent-browser --auto-connect state save ./my-auth.json

# 3. Use the saved auth in future sessions
agent-browser --state ./my-auth.json open https://app.example.com/dashboard

# 4. Or use --session-name for automatic persistence
agent-browser --session-name myapp state load ./my-auth.json
# From now on, --session-name myapp auto-saves/restores this state
```

> **Security notes:**
> - `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect. Only use on trusted machines and close Chrome when done.
> - State files contain session tokens in plaintext. Add them to `.gitignore` and delete when no longer needed. For encryption at rest, set `AGENT_BROWSER_ENCRYPTION_KEY` (see [State Encryption](#state-encryption)).

For full details on login flows, OAuth, 2FA, cookie-based auth, and the auth vault, see the [Authentication](docs/src/app/sessions/page.mdx) docs.

## Sessions

Run multiple isolated browser instances:

```bash
# Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"

# List active sessions
agent-browser session list
# Output:
# Active sessions:
# -> default
#    agent1

# Show current session
agent-browser session
```

Each session has its own:

- Browser instance
- Cookies and storage
- Navigation history
- Authentication state

## Chrome Profile Reuse

The fastest way to use your existing login state: pass a Chrome profile name to `--profile`:

```bash
# List available Chrome profiles
agent-browser profiles

# Reuse your default Chrome profile's login state
agent-browser --profile Default open https://gmail.com

# Use a named profile (by display name or directory name)
agent-browser --profile "Work" open https://app.example.com

# Or via environment variable
AGENT_BROWSER_PROFILE=Default agent-browser open https://gmail.com
```

This copies your Chrome profile to a temp directory (read-only snapshot, no changes to your original profile), so the browser launches with your existing cookies and sessions.

> **Note:** On Windows, close Chrome before using `--profile <name>` if Chrome is running, as some profile files may be locked.
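
When a script mixes several profiles, it can help to scope `AGENT_BROWSER_PROFILE` to a single invocation instead of exporting it. A minimal sketch, assuming a POSIX shell; the helper name `with_profile` and the `AB` variable are hypothetical and not part of the CLI:

```bash
# AB names the CLI binary (hypothetical variable); defaults to agent-browser.
AB="${AB:-agent-browser}"

# Run one command with AGENT_BROWSER_PROFILE set for that invocation only.
# Usage: with_profile <name|path> <agent-browser args...>
with_profile() {
  profile="$1"
  shift
  AGENT_BROWSER_PROFILE="$profile" "$AB" "$@"
}
```

With this in place, `with_profile Default open https://gmail.com` is intended to behave like the environment-variable form shown earlier, without leaving the variable set for later commands.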

## Persistent Profiles

For a persistent custom profile directory that stores state across browser restarts, pass a path to `--profile`:

```bash
# Use a persistent profile directory
agent-browser --profile ~/.myapp-profile open myapp.com

# Login once, then reuse the authenticated session
agent-browser --profile ~/.myapp-profile open myapp.com/dashboard

# Or via environment variable
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
```

The profile directory stores:

- Cookies and localStorage
- IndexedDB data
- Service workers
- Browser cache
- Login sessions

**Tip**: Use different profile paths for different projects to keep their browser state isolated.

## Session Persistence

Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:

```bash
# Auto-save/load state for "twitter" session
agent-browser --session-name twitter open twitter.com

# Login once, then state persists automatically
# State files stored in ~/.agent-browser/sessions/

# Or via environment variable
export AGENT_BROWSER_SESSION_NAME=twitter
agent-browser open twitter.com
```

### State Encryption

Encrypt saved session data at rest with AES-256-GCM:

```bash
# Generate key: openssl rand -hex 32
export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>

# State files are now encrypted automatically
agent-browser --session-name secure open example.com
```

| Variable                          | Description                                        |
| --------------------------------- | -------------------------------------------------- |
| `AGENT_BROWSER_SESSION_NAME`      | Auto-save/load state persistence name              |
| `AGENT_BROWSER_ENCRYPTION_KEY`    | 64-char hex key for AES-256-GCM encryption         |
| `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |

## Security

agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:

- **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
- **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
- **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
- **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
- **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
- **Output Length Limits** -- Prevent context flooding: `--max-output 50000`

| Variable                            | Description                              |
| ----------------------------------- | ---------------------------------------- |
| `AGENT_BROWSER_CONTENT_BOUNDARIES`  | Wrap page output in boundary markers     |
| `AGENT_BROWSER_MAX_OUTPUT`          | Max characters for page output           |
| `AGENT_BROWSER_ALLOWED_DOMAINS`     | Comma-separated allowed domain patterns  |
| `AGENT_BROWSER_ACTION_POLICY`       | Path to action policy JSON file          |
| `AGENT_BROWSER_CONFIRM_ACTIONS`     | Action categories requiring confirmation |
| `AGENT_BROWSER_CONFIRM_INTERACTIVE` | Enable interactive confirmation prompts  |

See [Security documentation](https://agent-browser.dev/security) for details.

## Snapshot Options

The `snapshot` command supports filtering to reduce output size:

```bash
agent-browser snapshot                    # Full accessibility tree
agent-browser snapshot -i                 # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -i --urls          # Interactive elements with link URLs
agent-browser snapshot -c                 # Compact (remove empty structural elements)
agent-browser snapshot -d 3               # Limit depth to 3 levels
agent-browser snapshot -s "#main"         # Scope to CSS selector
agent-browser snapshot -i -c -d 5         # Combine options
```

| Option                 | Description                                                             |
| ---------------------- | ----------------------------------------------------------------------- |
| `-i, --interactive`    | Only show interactive elements (buttons, links, inputs)                 |
| `-u, --urls`           | Include href URLs for link elements                                     |
| `-c, --compact`        | Remove empty structural elements                                        |
| `-d, --depth <n>`      | Limit tree depth                                                        |
| `-s, --selector <sel>` | Scope to CSS selector                                                   |

## Annotated Screenshots

The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.

Annotated screenshots are supported on the CDP-backed browser path (Chrome/Lightpanda). The Safari/WebDriver backend does not yet support `--annotate`.

```bash
agent-browser screenshot --annotate
# -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
#    [1] @e1 button "Submit"
#    [2] @e2 link "Home"
#    [3] @e3 textbox "Email"
```

After an annotated screenshot, refs are cached so you can immediately interact with elements:

```bash
agent-browser screenshot --annotate ./page.png
agent-browser click @e2                   # Click the "Home" link labeled [2]
```

This is useful for multimodal AI models that can reason about visual layout, unlabeled icon buttons, canvas elements, or visual state that the text accessibility tree cannot capture.
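
When a vision model answers with a label number rather than a ref, the `[N]` to `@eN` correspondence can be applied mechanically. A minimal sketch in POSIX shell; the helper name `label_to_ref` is hypothetical and not part of the CLI:

```bash
# Convert an annotated-screenshot label number N into the element ref @eN.
# Rejects empty or non-numeric input with a non-zero exit status.
label_to_ref() {
  case "$1" in
    ''|*[!0-9]*)
      echo "label must be a number: $1" >&2
      return 1
      ;;
    *)
      printf '@e%s\n' "$1"
      ;;
  esac
}
```

For example, `agent-browser click "$(label_to_ref 2)"` would click the element labeled `[2]` in the most recent annotated screenshot.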

## Options

| Option | Description |
|--------|-------------|
| `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
| `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
| `--profile <name\|path>` | Chrome profile name or persistent directory path (or `AGENT_BROWSER_PROFILE` env) |
| `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
| `--headers <json>` | Set HTTP headers scoped to the URL's origin |
| `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
| `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
| `--init-script <path>` | Register a page init script before the first navigation (repeatable; or `AGENT_BROWSER_INIT_SCRIPTS` env) |
| `--enable <feature>` | Built-in init scripts: `react-devtools` (repeatable or comma-list; or `AGENT_BROWSER_ENABLE` env) |
| `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
| `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
| `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
| `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
| `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
| `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
| `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
| `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
| `--json` | JSON output (for agents) |
| `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
| `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
| `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
| `--screenshot-format <fmt>` | Screenshot format: `png`, `jpeg` (or `AGENT_BROWSER_SCREENSHOT_FORMAT` env) |
| `--headed` | Show browser window (not headless) (or `AGENT_BROWSER_HEADED` env) |
| `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
| `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
| `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
| `--download-path <path>` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) |
| `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
| `--max-output <chars>` | Truncate page output to N characters (or `AGENT_BROWSER_MAX_OUTPUT` env) |
| `--allowed-domains <list>` | Comma-separated allowed domain patterns (or `AGENT_BROWSER_ALLOWED_DOMAINS` env) |
| `--action-policy <path>` | Path to action policy JSON file (or `AGENT_BROWSER_ACTION_POLICY` env) |
| `--confirm-actions <list>` | Action categories requiring confirmation (or `AGENT_BROWSER_CONFIRM_ACTIONS` env) |
| `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) |
| `--engine <name>` | Browser engine: `chrome` (default), `lightpanda` (or `AGENT_BROWSER_ENGINE` env) |
| `--no-auto-dialog` | Disable automatic acceptance of `alert`/`beforeunload` dialogs (or `AGENT_BROWSER_NO_AUTO_DIALOG` env) |
| `--model <name>` | AI model for chat command (or `AI_GATEWAY_MODEL` env) |
| `-v`, `--verbose` | Show tool commands and their raw output (chat) |
| `-q`, `--quiet` | Show only AI text responses, hide tool calls (chat) |
| `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
| `--debug` | Debug output |

## Observability Dashboard

Monitor agent-browser sessions in real time with a local web dashboard showing a live viewport and command activity feed.

```bash
# Start the dashboard server (runs in background on port 4848)
agent-browser dashboard start
agent-browser dashboard start --port 8080  # Custom port

# All sessions are automatically visible in the dashboard
agent-browser open example.com

# Stop the dashboard
agent-browser dashboard stop
```

The dashboard runs as a standalone background process on port 4848, independent of browser sessions. It stays available even when no sessions are running. All sessions automatically stream to the dashboard.

The dashboard displays:
- **Live viewport** -- real-time JPEG frames from the browser
- **Activity feed** -- chronological command/result stream with timing and expandable details
- **Console output** -- browser console messages (log, warn, error)
- **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
- **AI Chat** -- chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)

### AI Chat

The dashboard includes an optional AI chat panel powered by the Vercel AI Gateway. The same functionality is available directly from the CLI via the `chat` command. Set these environment variables to enable AI chat:

```bash
export AI_GATEWAY_API_KEY=gw_your_key_here
export AI_GATEWAY_MODEL=anthropic/claude-sonnet-4.6  # optional, this is the default
export AI_GATEWAY_URL=https://ai-gateway.vercel.sh   # optional, this is the default
```

**CLI usage:**

```bash
agent-browser chat "open google.com and search for cats"     # Single-shot
agent-browser chat                                           # Interactive REPL
agent-browser -q chat "summarize this page"                  # Quiet mode (text only)
agent-browser -v chat "fill in the login form"               # Verbose (show command output)
agent-browser --model openai/gpt-4o chat "take a screenshot" # Override model
```

The `chat` command translates natural language instructions into agent-browser commands, executes them, and streams the AI response. In interactive mode, type `quit` to exit. Use `--json` for structured output suitable for agent consumption.

**Dashboard usage:**

The Chat tab is always visible in the dashboard. When `AI_GATEWAY_API_KEY` is set, the Rust server proxies requests to the gateway and streams responses back using the Vercel AI SDK's UI Message Stream protocol. Without the key, sending a message shows an error inline.

## Configuration

Create an `agent-browser.json` file to set persistent defaults instead of repeating flags on every command.

**Locations (lowest to highest priority):**

1. `~/.agent-browser/config.json` -- user-level defaults
2. `./agent-browser.json` -- project-level overrides (in working directory)
3. `AGENT_BROWSER_*` environment variables override config file values
4.
CLI flags override everything 798 799 **Example `agent-browser.json`:** 800 801 ```json 802 { 803 "headed": true, 804 "proxy": "http://localhost:8080", 805 "profile": "./browser-data", 806 "userAgent": "my-agent/1.0", 807 "ignoreHttpsErrors": true 808 } 809 ``` 810 811 Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults: 812 813 ```bash 814 agent-browser --config ./ci-config.json open example.com 815 AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com 816 ``` 817 818 All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility. 819 820 A [JSON Schema](agent-browser.schema.json) is available for IDE autocomplete and validation. Add a `$schema` key to your config file to enable it: 821 822 ```json 823 { 824 "$schema": "https://agent-browser.dev/schema.json", 825 "headed": true 826 } 827 ``` 828 829 Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`. 830 831 Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced. 832 833 > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`. 834 835 ## Default Timeout 836 837 The default timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that the daemon returns a proper error instead of the CLI timing out with EAGAIN. 
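
As a rough illustration of why the two timeouts are ordered this way, the sketch below checks a candidate value for the `AGENT_BROWSER_DEFAULT_TIMEOUT` variable covered in this section against the 30-second IPC read timeout. `checkDefaultTimeout` and both constants are hypothetical names for a wrapper script; this is not agent-browser's own validation logic.

```typescript
// Values taken from this section; the names are hypothetical.
const DAEMON_DEFAULT_TIMEOUT_MS = 25000;
const CLI_IPC_READ_TIMEOUT_MS = 30000;

interface TimeoutCheck {
  timeoutMs: number;
  safe: boolean; // true when the daemon can still answer before the CLI stops waiting
  note?: string;
}

// Pre-flight check for a value you intend to export as AGENT_BROWSER_DEFAULT_TIMEOUT.
function checkDefaultTimeout(raw: string | undefined): TimeoutCheck {
  // Unset or empty: fall back to the documented default.
  if (raw === undefined || raw.trim() === "") {
    return { timeoutMs: DAEMON_DEFAULT_TIMEOUT_MS, safe: true };
  }
  const timeoutMs = Number(raw);
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new RangeError(`expected a positive integer in milliseconds, got "${raw}"`);
  }
  // At or above the IPC read timeout, the CLI may give up before the daemon replies.
  if (timeoutMs >= CLI_IPC_READ_TIMEOUT_MS) {
    return {
      timeoutMs,
      safe: false,
      note: "not below the 30s IPC read timeout; slow operations may surface as EAGAIN",
    };
  }
  return { timeoutMs, safe: true };
}

console.log(checkDefaultTimeout("45000"));
```

A wrapper could log the `note` and continue, since a larger value is allowed; it only changes which side reports the timeout.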

Override the default timeout via environment variable:

```bash
# Set a longer timeout for slow pages (in milliseconds)
export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
```

> **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before the daemon responds. The CLI retries transient errors automatically, but response times will increase.

| Variable | Description |
| ------------------------------- | ------------------------------------------------ |
| `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default operation timeout in ms (default: 25000) |

## Selectors

### Refs (Recommended for AI)

Refs provide deterministic element selection from snapshots:

```bash
# 1. Get snapshot with refs
agent-browser snapshot
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]

# 2. Use refs to interact
agent-browser click @e2                    # Click the button
agent-browser fill @e3 "test@example.com"  # Fill the textbox
agent-browser get text @e1                 # Get heading text
agent-browser hover @e4                    # Hover the link
```

**Why use refs?**

- **Deterministic**: Ref points to exact element from snapshot
- **Fast**: No DOM re-query needed
- **AI-friendly**: Snapshot + ref workflow is optimal for LLMs

### CSS Selectors

```bash
agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"
```

### Text & XPath

```bash
agent-browser click "text=Submit"
agent-browser click "xpath=//button"
```

### Semantic Locators

```bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
```

## Agent Mode

Use `--json` for machine-readable output:

```bash
agent-browser snapshot --json
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}

agent-browser get text @e1 --json
agent-browser is visible @e2 --json
```

### Optimal AI Workflow

```bash
# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json  # AI parses tree and refs

# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"

# 4. Get new snapshot if page changed
agent-browser snapshot -i --json
```

### Command Chaining

Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient:

```bash
# Open, wait for load, and snapshot in one call
agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3

# Navigate and screenshot
agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
```

Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).

## Headed Mode

Show the browser window for debugging:

```bash
agent-browser open example.com --headed
```

This opens a visible browser window instead of running headless.

> **Note:** Browser extensions work in both headed and headless mode (Chrome's `--headless=new`).

## Authenticated Sessions

Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:

```bash
# Headers are scoped to api.example.com only
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'

# Requests to api.example.com include the auth header
agent-browser snapshot -i --json
agent-browser click @e2

# Navigate to another domain - headers are NOT sent (safe!)
agent-browser open other-site.com
```

This is useful for:

- **Skipping login flows** - Authenticate via headers instead of UI
- **Switching users** - Start new sessions with different auth tokens
- **API testing** - Access protected endpoints directly
- **Security** - Headers are scoped to the origin, not leaked to other domains

To set headers for multiple origins, use `--headers` with each `open` command:

```bash
agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
```

For global headers (all domains), use `set headers`:

```bash
agent-browser set headers '{"X-Custom-Header": "value"}'
```

## Custom Browser Executable

Use a custom browser executable instead of the bundled Chromium. This is useful for:

- **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
- **System browsers**: Use an existing Chrome/Chromium installation
- **Custom builds**: Use modified browser builds

### CLI Usage

```bash
# Via flag
agent-browser --executable-path /path/to/chromium open example.com

# Via environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
```

### Serverless (Vercel)

Run agent-browser + Chrome in an ephemeral Vercel Sandbox microVM. No external server needed:

```typescript
import { Sandbox } from "@vercel/sandbox";

const sandbox = await Sandbox.create({ runtime: "node24" });
await sandbox.runCommand("agent-browser", ["open", "https://example.com"]);
const result = await sandbox.runCommand("agent-browser", ["screenshot", "--json"]);
await sandbox.stop();
```

See the [environments example](examples/environments/) for a working demo with a UI and deploy-to-Vercel button.

### Serverless (AWS Lambda)

```typescript
import chromium from '@sparticuz/chromium';
import { execFileSync } from 'child_process';

export async function handler() {
  const executablePath = await chromium.executablePath();
  // Pass the Chromium path to every agent-browser invocation
  const env = { ...process.env, AGENT_BROWSER_EXECUTABLE_PATH: executablePath };

  execFileSync('agent-browser', ['open', 'https://example.com'], { env });

  // Parse only the snapshot output so any output from `open` cannot break JSON.parse
  const result = execFileSync('agent-browser', ['snapshot', '-i', '--json'], {
    env,
    encoding: 'utf-8',
  });
  return JSON.parse(result);
}
```

## Local Files

Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:

```bash
# Enable file access (required for JavaScript to access local files)
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html

# Take screenshot of a local PDF
agent-browser --allow-file-access open file:///Users/me/report.pdf
agent-browser screenshot report.png
```

The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:

- Load and render local files
- Access other local files via JavaScript (XHR, fetch)
- Load local resources (images, scripts, stylesheets)

**Note:** This flag only works with Chromium. For security, it's disabled by default.
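
Paths that contain spaces or `#` characters need percent-encoding before they are valid `file://` URLs. A minimal sketch, assuming a Node.js wrapper script: `openLocalFileArgs` is a hypothetical helper, while `pathToFileURL` is Node's built-in converter from `node:url`.

```typescript
import { pathToFileURL } from "node:url";

// Builds the argument list for opening a local file with file access enabled.
// `openLocalFileArgs` is a hypothetical helper, not part of agent-browser.
function openLocalFileArgs(filePath: string): string[] {
  // pathToFileURL resolves relative paths against the current directory
  // and percent-encodes characters such as spaces and '#'.
  const url = pathToFileURL(filePath).href;
  return ["--allow-file-access", "open", url];
}

console.log(openLocalFileArgs("/Users/me/quarterly report #1.pdf").join(" "));
```

The returned array can be passed to `execFileSync("agent-browser", args)` from `child_process`, which avoids shell quoting issues for unusual file names.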

## CDP Mode

Connect to an existing browser via Chrome DevTools Protocol:

```bash
# Start Chrome with: google-chrome --remote-debugging-port=9222

# Connect once, then run commands without --cdp
agent-browser connect 9222
agent-browser snapshot
agent-browser tab
agent-browser close

# Or pass --cdp on each command
agent-browser --cdp 9222 snapshot

# Connect to remote browser via WebSocket URL
agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
```

The `--cdp` flag accepts either:

- A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
- A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services

This enables control of:

- Electron apps
- Chrome/Chromium instances with remote debugging
- WebView2 applications
- Any browser exposing a CDP endpoint

### Auto-Connect

Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:

```bash
# Auto-discover running Chrome with remote debugging
agent-browser --auto-connect open example.com
agent-browser --auto-connect snapshot

# Or via environment variable
AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
```

Auto-connect discovers Chrome by:

1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
2. Falling back to probing common debugging ports (9222, 9229)
3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection

This is useful when:

- Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
- You want a zero-configuration connection to your existing browser
- You don't want to track which port Chrome is using

## Streaming (Browser Preview)

Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.

### Streaming

Every session automatically starts a WebSocket stream server on an OS-assigned port. Use `stream status` to see the bound port and connection state:

```bash
agent-browser stream status
```

To bind to a specific port, set `AGENT_BROWSER_STREAM_PORT`:

```bash
AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
```

You can also manage streaming at runtime with `stream enable`, `stream disable`, and `stream status`:

```bash
agent-browser stream enable --port 9223  # Re-enable on a specific port
agent-browser stream disable             # Stop streaming for the session
```

The WebSocket server streams the browser viewport and accepts input events.
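
For a pair-browsing client, the two useful pieces are recognizing incoming frames and mapping a click on a scaled preview back to viewport coordinates. The sketch below uses the message shapes documented under WebSocket Protocol; the helper names are hypothetical, and treating `x`/`y` as coordinates in the `deviceWidth` by `deviceHeight` space is an assumption rather than documented behavior.

```typescript
// Shapes follow the WebSocket Protocol section; helper names are hypothetical.
interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: { deviceWidth: number; deviceHeight: number };
}

interface MouseInput {
  type: "input_mouse";
  eventType: string;
  x: number;
  y: number;
  button: string;
  clickCount: number;
}

// Returns the parsed frame, or null for non-frame or malformed messages.
function parseFrame(raw: string): FrameMessage | null {
  try {
    const msg = JSON.parse(raw);
    const isFrame =
      msg !== null &&
      typeof msg === "object" &&
      msg.type === "frame" &&
      typeof msg.data === "string" &&
      typeof msg.metadata?.deviceWidth === "number" &&
      typeof msg.metadata?.deviceHeight === "number";
    return isFrame ? (msg as FrameMessage) : null;
  } catch {
    return null;
  }
}

// ASSUMPTION: input coordinates live in the deviceWidth x deviceHeight space.
function toViewportPoint(
  frame: FrameMessage,
  previewWidth: number,
  previewHeight: number,
  px: number,
  py: number,
): { x: number; y: number } {
  return {
    x: Math.round((px * frame.metadata.deviceWidth) / previewWidth),
    y: Math.round((py * frame.metadata.deviceHeight) / previewHeight),
  };
}

// Builds the documented "mousePressed" message for a left click.
function mousePressed(x: number, y: number): MouseInput {
  return { type: "input_mouse", eventType: "mousePressed", x, y, button: "left", clickCount: 1 };
}

const sample = '{"type":"frame","data":"","metadata":{"deviceWidth":1280,"deviceHeight":720}}';
const frame = parseFrame(sample);
if (frame) {
  // A click at (50, 100) on a 640x360 preview of a 1280x720 viewport.
  const point = toViewportPoint(frame, 640, 360, 50, 100);
  console.log(JSON.stringify(mousePressed(point.x, point.y)));
}
```

With a connected socket, the resulting object would be sent as `socket.send(JSON.stringify(mousePressed(point.x, point.y)))`.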

### WebSocket Protocol

Connect to `ws://localhost:9223` to receive frames and send input:

**Receive frames:**

```json
{
  "type": "frame",
  "data": "<base64-encoded-jpeg>",
  "metadata": {
    "deviceWidth": 1280,
    "deviceHeight": 720,
    "pageScaleFactor": 1,
    "offsetTop": 0,
    "scrollOffsetX": 0,
    "scrollOffsetY": 0
  }
}
```

**Send mouse events:**

```json
{
  "type": "input_mouse",
  "eventType": "mousePressed",
  "x": 100,
  "y": 200,
  "button": "left",
  "clickCount": 1
}
```

**Send keyboard events:**

```json
{
  "type": "input_keyboard",
  "eventType": "keyDown",
  "key": "Enter",
  "code": "Enter"
}
```

**Send touch events:**

```json
{
  "type": "input_touch",
  "eventType": "touchStart",
  "touchPoints": [{ "x": 100, "y": 200 }]
}
```

## Architecture

agent-browser uses a client-daemon architecture:

1. **Rust CLI** - Parses commands, communicates with daemon
2. **Rust Daemon** - Pure Rust daemon using direct CDP, no Node.js required

The daemon starts automatically on first command and persists between commands for fast subsequent operations. To auto-shutdown the daemon after a period of inactivity, set `AGENT_BROWSER_IDLE_TIMEOUT_MS` (value in milliseconds). When set, the daemon closes the browser and exits after receiving no commands for the specified duration.

**Browser Engine:** Uses Chrome (from Chrome for Testing) by default. The `--engine` flag selects between `chrome` and `lightpanda`. Supported browsers: Chromium/Chrome (via CDP) and Safari (via WebDriver for iOS).
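
Because the idle timeout is expressed in milliseconds, launcher scripts that think in minutes can easily get the unit wrong. A small sketch, assuming a Node.js launcher: `withIdleTimeout` is a hypothetical helper, and it assumes the variable has to be present in the environment of the command that first starts the daemon.

```typescript
type Env = Record<string, string | undefined>;

// Returns a copy of `env` with AGENT_BROWSER_IDLE_TIMEOUT_MS derived from minutes.
// `withIdleTimeout` is a hypothetical helper, not part of agent-browser.
function withIdleTimeout(env: Env, idleMinutes: number): Env {
  if (!Number.isFinite(idleMinutes) || idleMinutes <= 0) {
    throw new RangeError(`idleMinutes must be a positive number, got ${idleMinutes}`);
  }
  // Convert minutes to whole milliseconds, the unit the variable expects.
  const idleMs = Math.round(idleMinutes * 60 * 1000);
  return { ...env, AGENT_BROWSER_IDLE_TIMEOUT_MS: String(idleMs) };
}

console.log(withIdleTimeout({}, 5).AGENT_BROWSER_IDLE_TIMEOUT_MS); // prints 300000
```

The result can be supplied as the `env` option when spawning the first command, for example `execFileSync("agent-browser", ["open", "example.com"], { env: withIdleTimeout(process.env, 5) })`.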

## Platforms

| Platform | Binary |
| ----------- | ----------- |
| macOS ARM64 | Native Rust |
| macOS x64 | Native Rust |
| Linux ARM64 | Native Rust |
| Linux x64 | Native Rust |
| Windows x64 | Native Rust |

## Usage with AI Agents

### Just ask the agent

The simplest approach -- just tell your agent to use it:

```
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
```

The `--help` output is comprehensive and most agents can figure it out from there.

### AI Coding Assistants (recommended)

Add the skill to your AI coding assistant for richer context:

```bash
npx skills add vercel-labs/agent-browser
```

This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.

### Claude Code

Install as a Claude Code skill:

```bash
npx skills add vercel-labs/agent-browser
```

This adds a thin discovery stub at `.claude/skills/agent-browser/SKILL.md`. The stub is intentionally minimal -- it points Claude Code at `agent-browser skills get core` to load the actual workflow content at runtime. This way the instructions always match the installed CLI version instead of going stale between releases.

### AGENTS.md / CLAUDE.md

For more consistent results, add to your project or global instructions file:

```markdown
## Browser Automation

Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.

Core workflow:

1. `agent-browser open <url>` - Navigate to page
2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changes
```

## Integrations

### iOS Simulator

Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.

**Setup:**

```bash
# Install Appium and XCUITest driver
npm install -g appium
appium driver install xcuitest
```

**Usage:**

```bash
# List available iOS simulators
agent-browser device list

# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

# Same commands as desktop
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios fill @e2 "text"
agent-browser -p ios screenshot mobile.png

# Mobile-specific commands
agent-browser -p ios swipe up
agent-browser -p ios swipe down 500

# Close session
agent-browser -p ios close
```

Or use environment variables:

```bash
export AGENT_BROWSER_PROVIDER=ios
export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
agent-browser open https://example.com
```

| Variable | Description |
| -------------------------- | ----------------------------------------------- |
| `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
| `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
| `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |

**Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.

**Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.

#### Real Device Support

Appium also supports real iOS devices connected via USB. This requires additional one-time setup:

**1. Get your device UDID:**

```bash
xcrun xctrace list devices
# or
system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
```

**2. Sign WebDriverAgent (one-time):**

```bash
# Open the WebDriverAgent Xcode project
cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
open WebDriverAgent.xcodeproj
```

In Xcode:

- Select the `WebDriverAgentRunner` target
- Go to Signing & Capabilities
- Select your Team (requires Apple Developer account, free tier works)
- Let Xcode manage signing automatically

**3. Use with agent-browser:**

```bash
# Connect device via USB, then:
agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com

# Or use the device name if unique
agent-browser -p ios --device "John's iPhone" open https://example.com
```

**Real device notes:**

- First run installs WebDriverAgent to the device (may require Trust prompt)
- Device must be unlocked and connected via USB
- Slightly slower initial connection than simulator
- Tests against real Safari performance and behavior

### Browserless

[Browserless](https://browserless.io) provides cloud browser infrastructure with a Sessions API. Use it when running agent-browser in environments where a local browser isn't available.

To enable Browserless, use the `-p` flag:

```bash
export BROWSERLESS_API_KEY="your-api-token"
agent-browser -p browserless open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=browserless
export BROWSERLESS_API_KEY="your-api-token"
agent-browser open https://example.com
```

Optional configuration via environment variables:

| Variable | Description | Default |
| -------------------------- | ------------------------------------------------ | --------------------------------------- |
| `BROWSERLESS_API_URL` | Base API URL (for custom regions or self-hosted) | `https://production-sfo.browserless.io` |
| `BROWSERLESS_BROWSER_TYPE` | Type of browser to use (`chromium` or `chrome`) | `chromium` |
| `BROWSERLESS_TTL` | Session TTL in milliseconds | `300000` |
| `BROWSERLESS_STEALTH` | Enable stealth mode (`true`/`false`) | `true` |

When enabled, agent-browser connects to a Browserless cloud session instead of launching a local browser. All commands work identically.

Get your API token from the [Browserless Dashboard](https://browserless.io).

### Browserbase

[Browserbase](https://browserbase.com) provides remote browser infrastructure that makes browsing agents easy to deploy. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.

To enable Browserbase, use the `-p` flag:

```bash
export BROWSERBASE_API_KEY="your-api-key"
agent-browser -p browserbase open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=browserbase
export BROWSERBASE_API_KEY="your-api-key"
agent-browser open https://example.com
```

When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.

Get your API key from the [Browserbase Dashboard](https://browserbase.com/overview).

### Browser Use

[Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).

To enable Browser Use, use the `-p` flag:

```bash
export BROWSER_USE_API_KEY="your-api-key"
agent-browser -p browseruse open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=browseruse
export BROWSER_USE_API_KEY="your-api-key"
agent-browser open https://example.com
```

When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.

Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing after.

### Kernel

[Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.

To enable Kernel, use the `-p` flag:

```bash
export KERNEL_API_KEY="your-api-key"
agent-browser -p kernel open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=kernel
export KERNEL_API_KEY="your-api-key"
agent-browser open https://example.com
```

Optional configuration via environment variables:

| Variable | Description | Default |
| ------------------------ | -------------------------------------------------------------------------------- | ------- |
| `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
| `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
| `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
| `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |

When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.

**Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.

Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).

### AgentCore

[AWS Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/) provides cloud browser sessions with SigV4 authentication.

To enable AgentCore, use the `-p` flag:

```bash
agent-browser -p agentcore open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=agentcore
agent-browser open https://example.com
```

Credentials are automatically resolved from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) or the AWS CLI (`aws configure export-credentials`), which supports SSO, profiles, and IAM roles.

Optional configuration via environment variables:

| Variable | Description | Default |
| --------------------------- | ------------------------------------------------------------ | ---------------- |
| `AGENTCORE_REGION` | AWS region for the AgentCore endpoint | `us-east-1` |
| `AGENTCORE_BROWSER_ID` | Browser identifier | `aws.browser.v1` |
| `AGENTCORE_PROFILE_ID` | Browser profile for persistent state (cookies, localStorage) | (none) |
| `AGENTCORE_SESSION_TIMEOUT` | Session timeout in seconds | `3600` |
| `AWS_PROFILE` | AWS CLI profile for credential resolution | `default` |

**Browser profiles:** When `AGENTCORE_PROFILE_ID` is set, browser state (cookies, localStorage) is persisted across sessions automatically.

When enabled, agent-browser connects to an AgentCore cloud browser session instead of launching a local browser. All commands work identically.

## License

Apache-2.0