# agent-browser

Browser automation CLI for AI agents. Fast native Rust CLI.

## Installation

### Global Installation (recommended)

Installs the native Rust binary:

```bash
npm install -g agent-browser
agent-browser install  # Download Chrome from Chrome for Testing (first time only)
```

### Project Installation (local dependency)

For projects that want to pin the version in `package.json`:

```bash
npm install agent-browser
agent-browser install
```

Then use via `package.json` scripts or by invoking `agent-browser` directly.

### Homebrew (macOS)

```bash
brew install agent-browser
agent-browser install  # Download Chrome from Chrome for Testing (first time only)
```

### Cargo (Rust)

```bash
cargo install agent-browser
agent-browser install  # Download Chrome from Chrome for Testing (first time only)
```

### From Source

```bash
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native   # Requires Rust (https://rustup.rs)
pnpm link --global  # Makes agent-browser available globally
agent-browser install
```

### Linux Dependencies

On Linux, install system dependencies:

```bash
agent-browser install --with-deps
```

### Updating

Upgrade to the latest version:

```bash
agent-browser upgrade
```

Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.

### Requirements

- **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
- **Rust** - Only needed when building from source (see From Source above).

## Quick Start

```bash
agent-browser open example.com
agent-browser snapshot                    # Get accessibility tree with refs
agent-browser click @e2                   # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1                # Get text by ref
agent-browser screenshot page.png
agent-browser close
```

### Traditional Selectors (also supported)

```bash
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
```

## Commands

### Core Commands

```bash
agent-browser open                    # Launch browser (no navigation); stays on about:blank
agent-browser open <url>              # Launch + navigate to URL (aliases: goto, navigate)
agent-browser click <sel>             # Click element (--new-tab to open in new tab)
agent-browser dblclick <sel>          # Double-click element
agent-browser focus <sel>             # Focus element
agent-browser type <sel> <text>       # Type into element
agent-browser fill <sel> <text>       # Clear and fill
agent-browser press <key>             # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keyboard type <text>    # Type with real keystrokes (no selector, current focus)
agent-browser keyboard inserttext <text>  # Insert text without key events (no selector)
agent-browser keydown <key>           # Hold key down
agent-browser keyup <key>             # Release key
agent-browser hover <sel>             # Hover element
agent-browser select <sel> <val>      # Select dropdown option
agent-browser check <sel>             # Check checkbox
agent-browser uncheck <sel>           # Uncheck checkbox
agent-browser scroll <dir> [px]       # Scroll (up/down/left/right, --selector <sel>)
agent-browser scrollintoview <sel>    # Scroll element into view (alias: scrollinto)
agent-browser drag <src> <tgt>        # Drag and drop
agent-browser upload <sel> <files>    # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page, saves to a temporary directory if no path)
agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
agent-browser screenshot --screenshot-dir ./shots    # Save to custom directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf <path>              # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval <js>               # Run JavaScript (-b for base64, --stdin for piped input)
agent-browser connect <port>          # Connect to browser via CDP
agent-browser stream enable [--port <port>]  # Start runtime WebSocket streaming
agent-browser stream status           # Show runtime streaming state and bound port
agent-browser stream disable          # Stop runtime WebSocket streaming
agent-browser close                   # Close browser (aliases: quit, exit)
agent-browser close --all             # Close all active sessions
agent-browser chat "<instruction>"    # AI chat: natural language browser control (single-shot)
agent-browser chat                    # AI chat: interactive REPL mode
```

### Get Info

```bash
agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get attr <sel> <attr>   # Get attribute
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
agent-browser get cdp-url             # Get CDP WebSocket URL (for DevTools, debugging)
agent-browser get count <sel>         # Count matching elements
agent-browser get box <sel>           # Get bounding box
agent-browser get styles <sel>        # Get computed styles
```

### Check State

```bash
agent-browser is visible <sel>        # Check if visible
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked
```

### Find Elements (Semantic Locators)

```bash
agent-browser find role <role> <action> [value]       # By ARIA role
agent-browser find text <text> <action>               # By text content
agent-browser find label <label> <action> [value]     # By label
agent-browser find placeholder <ph> <action> [value]  # By placeholder
agent-browser find alt <text> <action>                # By alt text
agent-browser find title <text> <action>              # By title attr
agent-browser find testid <id> <action> [value]       # By data-testid
agent-browser find first <sel> <action> [value]       # First match
agent-browser find last <sel> <action> [value]        # Last match
agent-browser find nth <n> <sel> <action> [value]     # Nth match
```

**Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`

**Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)

**Examples:**

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
```

### Wait

```bash
agent-browser wait <selector>         # Wait for element to be visible
agent-browser wait <ms>               # Wait for time (milliseconds)
agent-browser wait --text "Welcome"   # Wait for text to appear (substring match)
agent-browser wait --url "**/dash"    # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.ready === true"  # Wait for JS condition

# Wait for text/element to disappear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')"
agent-browser wait "#spinner" --state hidden
```

**Load states:** `load`, `domcontentloaded`, `networkidle`

### Batch Execution

Execute multiple commands in a single invocation. Commands can be passed as
quoted arguments or piped as JSON via stdin. This avoids per-command process
startup overhead when running multi-step workflows.

```bash
# Argument mode: each quoted argument is a full command
agent-browser batch "open https://example.com" "snapshot -i" "screenshot"

# With --bail to stop on first error
agent-browser batch --bail "open https://example.com" "click @e1" "screenshot"

# Stdin mode: pipe commands as JSON
echo '[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["click", "@e1"],
  ["screenshot", "result.png"]
]' | agent-browser batch --json
```

### Clipboard

```bash
agent-browser clipboard read                      # Read text from clipboard
agent-browser clipboard write "Hello, World!"     # Write text to clipboard
agent-browser clipboard copy                      # Copy current selection (Ctrl+C)
agent-browser clipboard paste                     # Paste from clipboard (Ctrl+V)
```

### Mouse Control

```bash
agent-browser mouse move <x> <y>      # Move mouse
agent-browser mouse down [button]     # Press button (left/right/middle)
agent-browser mouse up [button]       # Release button
agent-browser mouse wheel <dy> [dx]   # Scroll wheel
```

### Browser Settings

```bash
agent-browser set viewport <w> <h> [scale]  # Set viewport size (scale for retina, e.g. 2)
agent-browser set device <name>       # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng>     # Set geolocation
agent-browser set offline [on|off]    # Toggle offline mode
agent-browser set headers <json>      # Extra HTTP headers
agent-browser set credentials <u> <p> # HTTP basic auth
agent-browser set media [dark|light]  # Emulate color scheme
```

### Cookies & Storage

```bash
agent-browser cookies                 # Get all cookies
agent-browser cookies set <name> <val> # Set cookie
agent-browser cookies set --curl <file> # Import cookies from a Copy-as-cURL dump,
                                        # JSON array, or bare Cookie header (auto-detected)
agent-browser cookies clear           # Clear cookies

agent-browser storage local           # Get all localStorage
agent-browser storage local <key>     # Get specific key
agent-browser storage local set <k> <v>  # Set value
agent-browser storage local clear     # Clear all

agent-browser storage session         # Same for sessionStorage
```

### Network

```bash
agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body <json>  # Mock response
agent-browser network route '*' --abort --resource-type script  # Block scripts only
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests
agent-browser network requests --type xhr,fetch  # Filter by resource type
agent-browser network requests --method POST   # Filter by HTTP method
agent-browser network requests --status 2xx    # Filter by status (200, 2xx, 400-499)
agent-browser network request <requestId>      # View full request/response detail
agent-browser network har start                # Start HAR recording
agent-browser network har stop [output.har]    # Stop and save HAR (temp path if omitted)
```

### Tabs & Windows

```bash
agent-browser tab                              # List tabs (shows `tabId` and optional label)
agent-browser tab new [url]                    # New tab (optionally with URL)
agent-browser tab new --label docs [url]       # New tab with a user-assigned label
agent-browser tab <t<N>|label>                 # Switch to a tab by id or label
agent-browser tab close [t<N>|label]           # Close a tab (defaults to active)
agent-browser window new                       # New window
```

Tab ids are stable strings of the form `t1`, `t2`, `t3`. They're never reused
within a session, so scripts and agents can keep referring to the same tab
even after other tabs are opened or closed. Positional integers like `tab 2`
are **not** accepted; the `t` prefix disambiguates handles from indices and
mirrors the `@e1` convention used for element refs.

You can also assign a memorable label (`docs`, `app`, `admin`) and use it
interchangeably with the id. Labels are never auto-generated and never
rewritten on navigation, so they are yours to name and keep:

```bash
agent-browser tab new --label docs https://docs.example.com
agent-browser tab docs               # switch to the docs tab
agent-browser snapshot               # populate refs for docs
agent-browser click @e3              # click uses docs's refs
agent-browser tab close docs         # close by label
```
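To make the handle rules concrete, here is a small sketch of how a script might classify a tab argument before passing it to `agent-browser tab`. `classify_tab_ref` is a hypothetical helper, not part of the CLI, and treating every other non-empty string as a label is an assumption; the CLI's own validation may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a tab reference as described above.
#   t<N>          -> "id"       (stable handle such as t1, t2)
#   bare integer  -> "invalid"  (positional indices are not accepted)
#   anything else -> "label"    (assumption: any other non-empty string)
classify_tab_ref() {
  local ref="$1"
  if [[ "$ref" =~ ^t[0-9]+$ ]]; then
    echo "id"
  elif [[ -z "$ref" || "$ref" =~ ^[0-9]+$ ]]; then
    echo "invalid"
  else
    echo "label"
  fi
}

# Show how a few sample arguments are classified.
for ref in t2 2 docs; do
  printf '%s -> %s\n' "$ref" "$(classify_tab_ref "$ref")"
done
```

A wrapper script could reject `invalid` inputs early instead of waiting for the CLI to report an error.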

### Frames

```bash
agent-browser frame <sel>             # Switch to iframe
agent-browser frame main              # Back to main frame
```

### Dialogs

```bash
agent-browser dialog accept [text]    # Accept (with optional prompt text)
agent-browser dialog dismiss          # Dismiss
agent-browser dialog status           # Check if a dialog is currently open
```

By default, `alert` and `beforeunload` dialogs are automatically accepted so they never block the agent. `confirm` and `prompt` dialogs still require explicit handling. Use `--no-auto-dialog` (or `AGENT_BROWSER_NO_AUTO_DIALOG=1`) to disable automatic handling.

When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message.

### Diff

```bash
agent-browser diff snapshot                              # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt        # Compare current vs saved snapshot file
agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
agent-browser diff screenshot --baseline before.png      # Visual pixel diff against baseline
agent-browser diff screenshot --baseline b.png -o d.png  # Save diff image to custom path
agent-browser diff screenshot --baseline b.png -t 0.2    # Adjust color threshold (0-1)
agent-browser diff url https://v1.com https://v2.com     # Compare two URLs (snapshot diff)
agent-browser diff url https://v1.com https://v2.com --screenshot  # Also visual diff
agent-browser diff url https://v1.com https://v2.com --wait-until networkidle  # Custom wait strategy
agent-browser diff url https://v1.com https://v2.com --selector "#main"  # Scope to element
```

### Debug

```bash
agent-browser trace start [path]      # Start recording trace
agent-browser trace stop [path]       # Stop and save trace
agent-browser profiler start          # Start Chrome DevTools profiling
agent-browser profiler stop [path]    # Stop and save profile (.json)
agent-browser console                 # View console messages (log, error, warn, info)
agent-browser console --json          # JSON output with raw CDP args for programmatic access
agent-browser console --clear         # Clear console
agent-browser errors                  # View page errors (uncaught JavaScript exceptions)
agent-browser errors --clear          # Clear errors
agent-browser highlight <sel>         # Highlight element
agent-browser inspect                 # Open Chrome DevTools for the active page
agent-browser state save <path>       # Save auth state
agent-browser state load <path>       # Load auth state
agent-browser state list              # List saved state files
agent-browser state show <file>       # Show state summary
agent-browser state rename <old> <new> # Rename state file
agent-browser state clear [name]      # Clear states for session
agent-browser state clear --all       # Clear all saved states
agent-browser state clean --older-than <days>  # Delete old states
```

### Navigation

```bash
agent-browser back                    # Go back
agent-browser forward                 # Go forward
agent-browser reload                  # Reload page
agent-browser pushstate <url>         # SPA client-side nav; auto-detects window.next.router.push,
                                      # falls back to history.pushState + popstate
```

### Pre-navigation setup

Some flows (SSR debug, auth cookies for protected origins, init scripts)
need state set up *before* the first navigation. Use `open` with no URL
to launch the browser, then stage cookies / routes / init scripts, then
navigate. `batch` sends it all in one CLI call:

```bash
agent-browser batch \
  '["open"]' \
  '["network","route","*","--abort","--resource-type","script"]' \
  '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
  '["navigate","http://localhost:3000/target"]'
```

Without `batch` the same sequence is four commands that all reuse the
same daemon (fast, but not one turn).
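For comparison, the same staging can be issued one command at a time. The sketch below is only an illustration: `ab` and `stage_and_navigate` are hypothetical helpers (not part of agent-browser), and `ab` merely prints each command unless `DRY_RUN=0` is set, so the sequence can be reviewed before it touches a browser. The subcommands and flags are the ones used in the batch example above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: prints the command by default; set DRY_RUN=0 to run it.
ab() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ agent-browser %s\n' "$*"
  else
    agent-browser "$@"
  fi
}

# Stage state first, then perform the first real navigation.
stage_and_navigate() {
  ab open                                                # launch, stay on about:blank
  ab network route '*' --abort --resource-type script    # stage a route
  ab cookies set --curl cookies.curl --domain localhost  # stage cookies
  ab navigate http://localhost:3000/target               # navigate last
}

stage_and_navigate
```

Keeping the navigation as the final step is the point of the pattern: everything staged before it is already in place when the first request is made.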

### React / Web Vitals

agent-browser ships with first-class React introspection and universal Web
Vitals metrics. The React commands need the React DevTools hook installed at
launch; Web Vitals and pushstate are framework-agnostic.

```bash
agent-browser open --enable react-devtools <url>   # Launch with React hook installed
agent-browser react tree                           # Full component tree
agent-browser react inspect <fiberId>              # props, hooks, state, source
agent-browser react renders start                  # Begin fiber render recording
agent-browser react renders stop [--json]          # Stop and print profile (--json for raw data)
agent-browser react suspense [--only-dynamic] [--json]  # Suspense boundaries + classifier
                                                        # --only-dynamic hides the "static" list
agent-browser vitals [url] [--json]                # LCP/CLS/TTFB/FCP/INP + React hydration phases
```

Each `react ...` subcommand requires `--enable react-devtools` to have been
passed at launch (the React DevTools `installHook.js` is embedded in the
binary). Without it the commands error with
`React DevTools hook not installed - relaunch with --enable react-devtools`.

Works with any React app, including Next.js, Remix, Vite+React, CRA,
TanStack Start, and React Native Web.

### Init scripts

```bash
agent-browser open --init-script <path>           # Register page init script before first navigation
                                                  # (repeatable; also AGENT_BROWSER_INIT_SCRIPTS env)
agent-browser addinitscript <js>                  # Register at runtime (returns identifier)
agent-browser removeinitscript <identifier>       # Remove a previously registered init script
```

### Setup

```bash
agent-browser install                 # Download Chrome from Chrome for Testing (Google's official automation channel)
agent-browser install --with-deps     # Also install system deps (Linux)
agent-browser upgrade                 # Upgrade agent-browser to the latest version
agent-browser doctor                  # Diagnose the install and auto-clean stale daemon files
agent-browser doctor --fix            # Also run destructive repairs (reinstall Chrome, purge old state, ...)
agent-browser doctor --offline --quick  # Skip network probes and the live launch test
```

`doctor` checks your environment, Chrome install, daemon state, config files,
encryption key, providers, network reachability, and runs a live headless
browser launch test. Stale socket/pid sidecar files are auto-cleaned. Output
is also available as `--json` for agents.

### Skills

```bash
agent-browser skills                  # List available skills
agent-browser skills list             # Same as above
agent-browser skills get <name>       # Output a skill's full content
agent-browser skills get <name> --full  # Include references and templates
agent-browser skills get --all        # Output every skill
agent-browser skills path [name]      # Print skill directory path
```

Serves bundled skill content that always matches the installed CLI version. AI agents use this to get current instructions rather than relying on cached copies. Set `AGENT_BROWSER_SKILLS_DIR` to override the skills directory path.

## Authentication

agent-browser provides multiple ways to persist login sessions so you don't re-authenticate every run.

### Quick summary

| Approach | Best for | Flag / Env |
|----------|----------|------------|
| **Chrome profile reuse** | Reuse your existing Chrome login state (cookies, sessions) with zero setup | `--profile <name>` / `AGENT_BROWSER_PROFILE` |
| **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile <path>` / `AGENT_BROWSER_PROFILE` |
| **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name <name>` / `AGENT_BROWSER_SESSION_NAME` |
| **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` |
| **State file** | Load a previously saved state JSON on launch | `--state <path>` / `AGENT_BROWSER_STATE` |
| **Auth vault** | Store credentials locally (encrypted), login by name | `auth save` / `auth login` |

### Import auth from your browser

If you are already logged in to a site in Chrome, you can grab that auth state and reuse it:

```bash
# 1. Launch Chrome with remote debugging enabled
#    macOS:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
#    Or use --auto-connect to discover an already-running Chrome

# 2. Connect and save the authenticated state
agent-browser --auto-connect state save ./my-auth.json

# 3. Use the saved auth in future sessions
agent-browser --state ./my-auth.json open https://app.example.com/dashboard

# 4. Or use --session-name for automatic persistence
agent-browser --session-name myapp state load ./my-auth.json
# From now on, --session-name myapp auto-saves/restores this state
```
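Because the state file written in step 2 holds session tokens, one option is to add it to `.gitignore` as soon as it is saved. A minimal sketch, assuming the command runs from the repository root; `ignore_state_file` is a hypothetical helper, not an agent-browser command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append an entry to a .gitignore file only once.
# Usage: ignore_state_file <entry> [gitignore-path]
ignore_state_file() {
  local entry="$1" gitignore="${2:-.gitignore}"
  touch "$gitignore"
  # -q quiet, -x match the whole line, -F treat the entry as a fixed string
  if ! grep -qxF -- "$entry" "$gitignore"; then
    printf '%s\n' "$entry" >> "$gitignore"
  fi
}

# Ignore the state file saved in step 2 above.
ignore_state_file "my-auth.json"
```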

> **Security notes:**
> - `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect. Only use on trusted machines and close Chrome when done.
> - State files contain session tokens in plaintext. Add them to `.gitignore` and delete when no longer needed. For encryption at rest, set `AGENT_BROWSER_ENCRYPTION_KEY` (see [State Encryption](#state-encryption)).

For full details on login flows, OAuth, 2FA, cookie-based auth, and the auth vault, see the [Authentication](docs/src/app/sessions/page.mdx) docs.

## Sessions

Run multiple isolated browser instances:

```bash
# Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"

# List active sessions
agent-browser session list
# Output:
# Active sessions:
# -> default
#    agent1

# Show current session
agent-browser session
```

Each session has its own:

- Browser instance
- Cookies and storage
- Navigation history
- Authentication state

## Chrome Profile Reuse

The fastest way to use your existing login state: pass a Chrome profile name to `--profile`:

```bash
# List available Chrome profiles
agent-browser profiles

# Reuse your default Chrome profile's login state
agent-browser --profile Default open https://gmail.com

# Use a named profile (by display name or directory name)
agent-browser --profile "Work" open https://app.example.com

# Or via environment variable
AGENT_BROWSER_PROFILE=Default agent-browser open https://gmail.com
```

This copies your Chrome profile to a temp directory (read-only snapshot, no changes to your original profile), so the browser launches with your existing cookies and sessions.

> **Note:** On Windows, close Chrome before using `--profile <name>`, as a running Chrome may lock some profile files.

## Persistent Profiles

For a persistent custom profile directory that stores state across browser restarts, pass a path to `--profile`:

```bash
# Use a persistent profile directory
agent-browser --profile ~/.myapp-profile open myapp.com

# Login once, then reuse the authenticated session
agent-browser --profile ~/.myapp-profile open myapp.com/dashboard

# Or via environment variable
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
```

The profile directory stores:

- Cookies and localStorage
- IndexedDB data
- Service workers
- Browser cache
- Login sessions

**Tip**: Use different profile paths for different projects to keep their browser state isolated.

## Session Persistence

Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:

```bash
# Auto-save/load state for "twitter" session
agent-browser --session-name twitter open twitter.com

# Login once, then state persists automatically
# State files stored in ~/.agent-browser/sessions/

# Or via environment variable
export AGENT_BROWSER_SESSION_NAME=twitter
agent-browser open twitter.com
```

### State Encryption

Encrypt saved session data at rest with AES-256-GCM:

```bash
# Generate key: openssl rand -hex 32
export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>

# State files are now encrypted automatically
agent-browser --session-name secure open example.com
```
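The key step can be scripted so that a malformed value is caught before it is exported. This is a sketch under assumptions: `generate_key` and `is_valid_key` are hypothetical helpers, the `od` fallback is only there for machines without `openssl`, and whether the CLI accepts uppercase hex is not confirmed by this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper: produce 32 random bytes as 64 hex characters.
generate_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback (assumption: /dev/urandom and od are available).
    od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
  fi
}

# Hypothetical helper: succeed only for exactly 64 hexadecimal characters.
is_valid_key() {
  [[ "$1" =~ ^[0-9a-fA-F]{64}$ ]]
}

key="$(generate_key)"
if is_valid_key "$key"; then
  export AGENT_BROWSER_ENCRYPTION_KEY="$key"
  echo "key set (${#key} hex chars)"
else
  echo "unexpected key format" >&2
fi
```

Store the generated key somewhere durable, such as a secrets manager; state files encrypted with a lost key cannot be read back.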

| Variable                          | Description                                        |
| --------------------------------- | -------------------------------------------------- |
| `AGENT_BROWSER_SESSION_NAME`      | Auto-save/load state persistence name              |
| `AGENT_BROWSER_ENCRYPTION_KEY`    | 64-char hex key for AES-256-GCM encryption         |
| `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |

## Security

agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:

- **Authentication Vault** -- Store credentials locally (always encrypted) and reference them by name, so the LLM never sees passwords. If `AGENT_BROWSER_ENCRYPTION_KEY` is not set, a key is auto-generated at `~/.agent-browser/.encryption-key`. `auth login` navigates with `load`, then waits for login form selectors to appear (SPA-friendly; the timeout follows the default action timeout). Example: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin`, then `agent-browser auth login github`
- **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
- **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
- **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
- **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
- **Output Length Limits** -- Prevent context flooding: `--max-output 50000`

| Variable                            | Description                              |
| ----------------------------------- | ---------------------------------------- |
| `AGENT_BROWSER_CONTENT_BOUNDARIES`  | Wrap page output in boundary markers     |
| `AGENT_BROWSER_MAX_OUTPUT`          | Max characters for page output           |
| `AGENT_BROWSER_ALLOWED_DOMAINS`     | Comma-separated allowed domain patterns  |
| `AGENT_BROWSER_ACTION_POLICY`       | Path to action policy JSON file          |
| `AGENT_BROWSER_CONFIRM_ACTIONS`     | Action categories requiring confirmation |
| `AGENT_BROWSER_CONFIRM_INTERACTIVE` | Enable interactive confirmation prompts  |

See [Security documentation](https://agent-browser.dev/security) for details.
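The Domain Allowlist rule above (a `*.example.com` wildcard also matches the bare `example.com`) can be modeled in a few lines, which helps when drafting a pattern list before passing it to `--allowed-domains`. This is a hypothetical model of the described behavior, not the CLI's implementation; `domain_allowed` is an invented name, and details such as case handling are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical model of the allowlist rule; not the CLI's actual matcher.
# Usage: domain_allowed <host> <comma-separated patterns>
domain_allowed() {
  local host="$1" pattern base
  local -a patterns
  IFS=',' read -r -a patterns <<< "$2"
  for pattern in "${patterns[@]}"; do
    if [[ "$pattern" == \*.* ]]; then
      base="${pattern#\*.}"
      # Wildcard: matches the bare domain and any subdomain of it.
      if [[ "$host" == "$base" || "$host" == *".$base" ]]; then
        return 0
      fi
    elif [[ "$host" == "$pattern" ]]; then
      return 0
    fi
  done
  return 1
}

allow="example.com,*.cdn.example.com"
for host in example.com img.cdn.example.com evil.test; do
  if domain_allowed "$host" "$allow"; then
    echo "$host: allowed"
  else
    echo "$host: blocked"
  fi
done
```

Matching on a leading dot (`.example.com`) rather than a plain suffix is what keeps look-alike hosts such as `notexample.com` out.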

## Snapshot Options

The `snapshot` command supports filtering to reduce output size:

```bash
agent-browser snapshot                    # Full accessibility tree
agent-browser snapshot -i                 # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -i --urls          # Interactive elements with link URLs
agent-browser snapshot -c                 # Compact (remove empty structural elements)
agent-browser snapshot -d 3               # Limit depth to 3 levels
agent-browser snapshot -s "#main"         # Scope to CSS selector
agent-browser snapshot -i -c -d 5         # Combine options
```

| Option                 | Description                                             |
| ---------------------- | ------------------------------------------------------- |
| `-i, --interactive`    | Only show interactive elements (buttons, links, inputs) |
| `-u, --urls`           | Include href URLs for link elements                     |
| `-c, --compact`        | Remove empty structural elements                        |
| `-d, --depth <n>`      | Limit tree depth                                        |
| `-s, --selector <sel>` | Scope to CSS selector                                   |

## Annotated Screenshots

The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.

Annotated screenshots are supported on the CDP-backed browser path (Chrome/Lightpanda). The Safari/WebDriver backend does not yet support `--annotate`.

```bash
agent-browser screenshot --annotate
# -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
#    [1] @e1 button "Submit"
#    [2] @e2 link "Home"
#    [3] @e3 textbox "Email"
```

After an annotated screenshot, refs are cached so you can immediately interact with elements:

```bash
agent-browser screenshot --annotate ./page.png
agent-browser click @e2     # Click the "Home" link labeled [2]
```
 689  
 690  This is useful for multimodal AI models that can reason about visual layout, unlabeled icon buttons, canvas elements, or visual state that the text accessibility tree cannot capture.
 691  
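A script that consumes the legend can map each label `[N]` back to its ref. The sketch below assumes the legend lines look exactly like the sample output above (`[N] @eN role "name"`); `parseLegend` is a hypothetical helper, not part of agent-browser.

```typescript
interface AnnotatedLabel {
  label: number;
  ref: string;
  role: string;
  name: string;
}

// Hypothetical parser for legend lines such as: [2] @e2 link "Home"
function parseLegend(output: string): AnnotatedLabel[] {
  const linePattern = /^\s*\[(\d+)\]\s+(@e\d+)\s+(\S+)\s+"(.*)"\s*$/;
  const labels: AnnotatedLabel[] = [];
  for (const raw of output.split("\n")) {
    const m = linePattern.exec(raw);
    if (!m) continue; // skip the "Screenshot saved to ..." line and anything else
    const [, label = "", ref = "", role = "", name = ""] = m;
    labels.push({ label: Number(label), ref, role, name });
  }
  return labels;
}

const legend = parseLegend(
  ['[1] @e1 button "Submit"', '[2] @e2 link "Home"', '[3] @e3 textbox "Email"'].join("\n"),
);
console.log(legend.find((l) => l.label === 2)?.ref); // @e2
```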
 692  ## Options
 693  
 694  | Option | Description |
 695  |--------|-------------|
 696  | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
 697  | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
 698  | `--profile <name\|path>` | Chrome profile name or persistent directory path (or `AGENT_BROWSER_PROFILE` env) |
 699  | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
 700  | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
 701  | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
 702  | `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
 703  | `--init-script <path>` | Register a page init script before the first navigation (repeatable; or `AGENT_BROWSER_INIT_SCRIPTS` env) |
 704  | `--enable <feature>` | Built-in init scripts: `react-devtools` (repeatable or comma-list; or `AGENT_BROWSER_ENABLE` env) |
 705  | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
 706  | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
 707  | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
 708  | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
 709  | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
 710  | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
 711  | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
 712  | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
 713  | `--json` | JSON output (for agents) |
 714  | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
 715  | `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
 716  | `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
 717  | `--screenshot-format <fmt>` | Screenshot format: `png`, `jpeg` (or `AGENT_BROWSER_SCREENSHOT_FORMAT` env) |
 718  | `--headed` | Show browser window (not headless) (or `AGENT_BROWSER_HEADED` env) |
 719  | `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
 720  | `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
 721  | `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
 722  | `--download-path <path>` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) |
 723  | `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
 724  | `--max-output <chars>` | Truncate page output to N characters (or `AGENT_BROWSER_MAX_OUTPUT` env) |
 725  | `--allowed-domains <list>` | Comma-separated allowed domain patterns (or `AGENT_BROWSER_ALLOWED_DOMAINS` env) |
 726  | `--action-policy <path>` | Path to action policy JSON file (or `AGENT_BROWSER_ACTION_POLICY` env) |
 727  | `--confirm-actions <list>` | Action categories requiring confirmation (or `AGENT_BROWSER_CONFIRM_ACTIONS` env) |
 728  | `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) |
 729  | `--engine <name>` | Browser engine: `chrome` (default), `lightpanda` (or `AGENT_BROWSER_ENGINE` env) |
 730  | `--no-auto-dialog` | Disable automatic dismissal of `alert`/`beforeunload` dialogs (or `AGENT_BROWSER_NO_AUTO_DIALOG` env) |
 731  | `--model <name>` | AI model for chat command (or `AI_GATEWAY_MODEL` env) |
 732  | `-v`, `--verbose` | Show tool commands and their raw output (chat) |
 733  | `-q`, `--quiet` | Show only AI text responses, hide tool calls (chat) |
 734  | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
 735  | `--debug` | Debug output |
 736  
 737  ## Observability Dashboard
 738  
 739  Monitor agent-browser sessions in real time with a local web dashboard showing a live viewport and command activity feed.
 740  
 741  ```bash
 742  # Start the dashboard server (runs in background on port 4848)
 743  agent-browser dashboard start
 744  agent-browser dashboard start --port 8080   # Custom port
 745  
 746  # All sessions are automatically visible in the dashboard
 747  agent-browser open example.com
 748  
 749  # Stop the dashboard
 750  agent-browser dashboard stop
 751  ```
 752  
 753  The dashboard runs as a standalone background process on port 4848, independent of browser sessions. It stays available even when no sessions are running. All sessions automatically stream to the dashboard.
 754  
 755  The dashboard displays:
 756  - **Live viewport** -- real-time JPEG frames from the browser
 757  - **Activity feed** -- chronological command/result stream with timing and expandable details
 758  - **Console output** -- browser console messages (log, warn, error)
 759  - **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
 760  - **AI Chat** -- chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
 761  
 762  ### AI Chat
 763  
 764  The dashboard includes an optional AI chat panel powered by the Vercel AI Gateway. The same functionality is available directly from the CLI via the `chat` command. Set these environment variables to enable AI chat:
 765  
 766  ```bash
 767  export AI_GATEWAY_API_KEY=gw_your_key_here
 768  export AI_GATEWAY_MODEL=anthropic/claude-sonnet-4.6           # optional, this is the default
 769  export AI_GATEWAY_URL=https://ai-gateway.vercel.sh           # optional, this is the default
 770  ```
 771  
 772  **CLI usage:**
 773  
 774  ```bash
 775  agent-browser chat "open google.com and search for cats"     # Single-shot
 776  agent-browser chat                                           # Interactive REPL
 777  agent-browser -q chat "summarize this page"                  # Quiet mode (text only)
 778  agent-browser -v chat "fill in the login form"               # Verbose (show command output)
 779  agent-browser --model openai/gpt-4o chat "take a screenshot" # Override model
 780  ```
 781  
 782  The `chat` command translates natural language instructions into agent-browser commands, executes them, and streams the AI response. In interactive mode, type `quit` to exit. Use `--json` for structured output suitable for agent consumption.
 783  
 784  **Dashboard usage:**
 785  
 786  The Chat tab is always visible in the dashboard. When `AI_GATEWAY_API_KEY` is set, the Rust server proxies requests to the gateway and streams responses back using the Vercel AI SDK's UI Message Stream protocol. Without the key, sending a message shows an error inline.
 787  
 788  ## Configuration
 789  
 790  Create an `agent-browser.json` file to set persistent defaults instead of repeating flags on every command.
 791  
 792  **Sources (lowest to highest priority):**
 793  
 794  1. `~/.agent-browser/config.json` -- user-level defaults
 795  2. `./agent-browser.json` -- project-level overrides (in working directory)
 796  3. `AGENT_BROWSER_*` environment variables override config file values
 797  4. CLI flags override everything
 798  
 799  **Example `agent-browser.json`:**
 800  
 801  ```json
 802  {
 803    "headed": true,
 804    "proxy": "http://localhost:8080",
 805    "profile": "./browser-data",
 806    "userAgent": "my-agent/1.0",
 807    "ignoreHttpsErrors": true
 808  }
 809  ```
 810  
 811  Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults:
 812  
 813  ```bash
 814  agent-browser --config ./ci-config.json open example.com
 815  AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com
 816  ```
 817  
 818  All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility.
 819  
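The flag-to-key naming rule can be expressed in a few lines. `flagToConfigKey` is a hypothetical helper that only illustrates the camelCase convention described above; it is not part of agent-browser.

```typescript
// Convert a long CLI flag to its camelCase config key (illustrative only).
function flagToConfigKey(flag: string): string {
  return flag
    .replace(/^--/, "")
    .replace(/-([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

console.log(flagToConfigKey("--executable-path"));    // executablePath
console.log(flagToConfigKey("--proxy-bypass"));       // proxyBypass
console.log(flagToConfigKey("--ignore-https-errors")); // ignoreHttpsErrors
```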
 820  A [JSON Schema](agent-browser.schema.json) is available for IDE autocomplete and validation. Add a `$schema` key to your config file to enable it:
 821  
 822  ```json
 823  {
 824    "$schema": "https://agent-browser.dev/schema.json",
 825    "headed": true
 826  }
 827  ```
 828  
 829  Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`.
 830  
 831  Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced.
 832  
 833  > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`.
 834  
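The precedence order in this section can be pictured as a layered merge. The sketch below is an illustration under stated assumptions, not the actual resolver: `resolveConfig` is hypothetical, the `extensions` key name is assumed, and how environment variables or CLI flags interact with the merged extension list is not modeled beyond a plain override.

```typescript
type Config = Record<string, unknown>;

function asList(value: unknown): unknown[] {
  return Array.isArray(value) ? value : [];
}

// Later layers win: user < project < environment < CLI flags.
function resolveConfig(user: Config, project: Config, env: Config, cli: Config): Config {
  // Extensions from user and project configs are concatenated, not replaced.
  const extensions = [...asList(user["extensions"]), ...asList(project["extensions"])];
  return {
    ...user,
    ...project,
    ...(extensions.length > 0 ? { extensions } : {}),
    ...env,
    ...cli,
  };
}

const resolved = resolveConfig(
  { headed: false, proxy: "http://localhost:8080", extensions: ["./a"] },
  { headed: true, extensions: ["./b"] },
  {},
  { headed: false }, // e.g. `--headed false` on the command line
);
console.log(resolved["headed"]);     // false
console.log(resolved["extensions"]); // [ './a', './b' ]
```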
 835  ## Default Timeout
 836  
 837  The default timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that the daemon returns a proper error instead of the CLI timing out with EAGAIN.
 838  
 839  Override the default timeout via environment variable:
 840  
 841  ```bash
 842  # Set a longer timeout for slow pages (in milliseconds); values above 30000 can cause EAGAIN, see the note below
 843  export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
 844  ```
 845  
 846  > **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before the daemon responds. The CLI retries transient errors automatically, but response times will increase.
 847  
 848  | Variable                        | Description                                      |
 849  | ------------------------------- | ------------------------------------------------ |
 850  | `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default operation timeout in ms (default: 25000) |
 851  
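A wrapper script can check the timeout against the 30-second IPC read timeout before exporting it. This is a hedged sketch: `checkTimeout` is a hypothetical pre-flight helper, and falling back to the default on invalid input is this sketch's choice, not documented CLI behavior.

```typescript
const DEFAULT_TIMEOUT_MS = 25_000;  // documented default
const IPC_READ_TIMEOUT_MS = 30_000; // CLI read timeout described above

interface TimeoutCheck {
  timeoutMs: number;
  warning?: string;
}

function checkTimeout(raw: string | undefined): TimeoutCheck {
  if (raw === undefined || raw.trim() === "") return { timeoutMs: DEFAULT_TIMEOUT_MS };
  const timeoutMs = Number(raw);
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    return { timeoutMs: DEFAULT_TIMEOUT_MS, warning: `ignoring invalid timeout "${raw}"` };
  }
  if (timeoutMs > IPC_READ_TIMEOUT_MS) {
    return { timeoutMs, warning: "timeout exceeds the 30s IPC read timeout; expect EAGAIN retries" };
  }
  return { timeoutMs };
}

console.log(checkTimeout(undefined).timeoutMs);          // 25000
console.log(checkTimeout("45000").warning !== undefined); // true
```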
 852  ## Selectors
 853  
 854  ### Refs (Recommended for AI)
 855  
 856  Refs provide deterministic element selection from snapshots:
 857  
 858  ```bash
 859  # 1. Get snapshot with refs
 860  agent-browser snapshot
 861  # Output:
 862  # - heading "Example Domain" [ref=e1] [level=1]
 863  # - button "Submit" [ref=e2]
 864  # - textbox "Email" [ref=e3]
 865  # - link "Learn more" [ref=e4]
 866  
 867  # 2. Use refs to interact
 868  agent-browser click @e2                   # Click the button
 869  agent-browser fill @e3 "test@example.com" # Fill the textbox
 870  agent-browser get text @e1                # Get heading text
 871  agent-browser hover @e4                   # Hover the link
 872  ```
 873  
 874  **Why use refs?**
 875  
 876  - **Deterministic**: Each ref points to the exact element captured in the snapshot
 877  - **Fast**: No DOM re-query needed
 878  - **AI-friendly**: The snapshot + ref workflow is well suited to LLMs
 879  
 880  ### CSS Selectors
 881  
 882  ```bash
 883  agent-browser click "#id"
 884  agent-browser click ".class"
 885  agent-browser click "div > button"
 886  ```
 887  
 888  ### Text & XPath
 889  
 890  ```bash
 891  agent-browser click "text=Submit"
 892  agent-browser click "xpath=//button"
 893  ```
 894  
 895  ### Semantic Locators
 896  
 897  ```bash
 898  agent-browser find role button click --name "Submit"
 899  agent-browser find label "Email" fill "test@test.com"
 900  ```
 901  
 902  ## Agent Mode
 903  
 904  Use `--json` for machine-readable output:
 905  
 906  ```bash
 907  agent-browser snapshot --json
 908  # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
 909  
 910  agent-browser get text @e1 --json
 911  agent-browser is visible @e2 --json
 912  ```
 913  
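An agent can parse the `--json` envelope to pick a ref by role and name. The types below are inferred only from the sample output above (`success`, `data.snapshot`, `data.refs`); the shape of a failure response is not shown in this README, so the sketch simply returns `undefined` when `success` is false. `findRef` is a hypothetical helper.

```typescript
interface RefInfo {
  role: string;
  name?: string;
}

interface SnapshotResult {
  success: boolean;
  data?: { snapshot: string; refs: Record<string, RefInfo> };
}

// Return the "@eN" ref for the first element matching role and name.
function findRef(json: string, role: string, name: string): string | undefined {
  const result = JSON.parse(json) as SnapshotResult;
  if (!result.success || !result.data) return undefined;
  for (const [id, info] of Object.entries(result.data.refs)) {
    if (info.role === role && info.name === name) return `@${id}`;
  }
  return undefined;
}

// Sample payload built in code to avoid hand-escaping JSON.
const sample = JSON.stringify({
  success: true,
  data: {
    snapshot: '- heading "Title" [ref=e1]\n- button "Submit" [ref=e2]',
    refs: { e1: { role: "heading", name: "Title" }, e2: { role: "button", name: "Submit" } },
  },
});
console.log(findRef(sample, "button", "Submit")); // @e2
```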
 914  ### Optimal AI Workflow
 915  
 916  ```bash
 917  # 1. Navigate and get snapshot
 918  agent-browser open example.com
 919  agent-browser snapshot -i --json   # AI parses tree and refs
 920  
 921  # 2. AI identifies target refs from snapshot
 922  # 3. Execute actions using refs
 923  agent-browser click @e2
 924  agent-browser fill @e3 "input text"
 925  
 926  # 4. Get new snapshot if page changed
 927  agent-browser snapshot -i --json
 928  ```
 929  
 930  ### Command Chaining
 931  
 932  Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and avoids the overhead of separate shell invocations:
 933  
 934  ```bash
 935  # Open, wait for load, and snapshot in one call
 936  agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
 937  
 938  # Chain multiple interactions
 939  agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
 940  
 941  # Navigate and screenshot
 942  agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
 943  ```
 944  
 945  Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).
 946  
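When an agent builds a chained invocation programmatically, arguments such as fill text need shell quoting. The sketch below assumes a POSIX shell; `shellQuote` and `chain` are hypothetical helpers, not part of agent-browser.

```typescript
// Quote one argument for a POSIX shell; safe characters pass through unchanged.
function shellQuote(arg: string): string {
  if (/^[A-Za-z0-9_@%+=:,.\/-]+$/.test(arg)) return arg;
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Join several agent-browser commands with && into one shell line.
function chain(commands: string[][]): string {
  return commands
    .map((args) => ["agent-browser", ...args].map(shellQuote).join(" "))
    .join(" && ");
}

console.log(chain([["open", "example.com"], ["wait", "--load", "networkidle"], ["snapshot", "-i"]]));
// agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
```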
 947  ## Headed Mode
 948  
 949  Show the browser window for debugging:
 950  
 951  ```bash
 952  agent-browser open example.com --headed
 953  ```
 954  
 955  This opens a visible browser window instead of running headless.
 956  
 957  > **Note:** Browser extensions work in both headed and headless mode (Chrome's `--headless=new`).
 958  
 959  ## Authenticated Sessions
 960  
 961  Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
 962  
 963  ```bash
 964  # Headers are scoped to api.example.com only
 965  agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
 966  
 967  # Requests to api.example.com include the auth header
 968  agent-browser snapshot -i --json
 969  agent-browser click @e2
 970  
 971  # Navigate to another domain - headers are NOT sent (safe!)
 972  agent-browser open other-site.com
 973  ```
 974  
 975  This is useful for:
 976  
 977  - **Skipping login flows** - Authenticate via headers instead of UI
 978  - **Switching users** - Start new sessions with different auth tokens
 979  - **API testing** - Access protected endpoints directly
 980  - **Security** - Headers are scoped to the origin, not leaked to other domains
 981  
 982  To set headers for multiple origins, use `--headers` with each `open` command:
 983  
 984  ```bash
 985  agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
 986  agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
 987  ```
 988  
 989  For global headers (all domains), use `set headers`:
 990  
 991  ```bash
 992  agent-browser set headers '{"X-Custom-Header": "value"}'
 993  ```
 994  
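Origin scoping can be pictured as a lookup keyed by origin. This is only a conceptual sketch: `ScopedHeaders` is hypothetical, and treating a scheme-less target such as `api.example.com` as `https://` is an assumption made for the illustration.

```typescript
type HeaderMap = Record<string, string>;

// Assumption: targets without a scheme are treated as https.
function toOrigin(target: string): string {
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(target) ? target : `https://${target}`;
  return new URL(withScheme).origin;
}

class ScopedHeaders {
  private readonly byOrigin = new Map<string, HeaderMap>();

  set(target: string, headers: HeaderMap): void {
    this.byOrigin.set(toOrigin(target), headers);
  }

  // Headers are returned only when the request origin matches exactly.
  headersFor(requestUrl: string): HeaderMap {
    return this.byOrigin.get(toOrigin(requestUrl)) ?? {};
  }
}

const scoped = new ScopedHeaders();
scoped.set("api.example.com", { Authorization: "Bearer <token>" });
console.log(scoped.headersFor("https://api.example.com/users")); // includes Authorization
console.log(scoped.headersFor("https://other-site.com/"));       // {}
```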
 995  ## Custom Browser Executable
 996  
 997  Use a custom browser executable instead of the Chrome build downloaded by `agent-browser install`. This is useful for:
 998  
 999  - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
1000  - **System browsers**: Use an existing Chrome/Chromium installation
1001  - **Custom builds**: Use modified browser builds
1002  
1003  ### CLI Usage
1004  
1005  ```bash
1006  # Via flag
1007  agent-browser --executable-path /path/to/chromium open example.com
1008  
1009  # Via environment variable
1010  AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
1011  ```
1012  
1013  ### Serverless (Vercel)
1014  
1015  Run agent-browser + Chrome in an ephemeral Vercel Sandbox microVM. No external server needed:
1016  
1017  ```typescript
1018  import { Sandbox } from "@vercel/sandbox";
1019  
1020  const sandbox = await Sandbox.create({ runtime: "node24" });
1021  await sandbox.runCommand("agent-browser", ["open", "https://example.com"]);
1022  const result = await sandbox.runCommand("agent-browser", ["screenshot", "--json"]);
1023  await sandbox.stop();
1024  ```
1025  
1026  See the [environments example](examples/environments/) for a working demo with a UI and deploy-to-Vercel button.
1027  
1028  ### Serverless (AWS Lambda)
1029  
1030  ```typescript
1031  import chromium from '@sparticuz/chromium';
1032  import { execSync } from 'child_process';
1033  
1034  export async function handler() {
1035    const executablePath = await chromium.executablePath();
1036    // Set the path via env for both calls; keep `open` output out of the JSON result
1037    const env = { ...process.env, AGENT_BROWSER_EXECUTABLE_PATH: executablePath };
1038    execSync('agent-browser open https://example.com', { env, stdio: 'ignore' });
1039    const result = execSync('agent-browser snapshot -i --json', { env, encoding: 'utf-8' });
1040    return JSON.parse(result);
1041  }
1042  ```
1043  
1044  ## Local Files
1045  
1046  Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
1047  
1048  ```bash
1049  # Enable file access (required for JavaScript to access local files)
1050  agent-browser --allow-file-access open file:///path/to/document.pdf
1051  agent-browser --allow-file-access open file:///path/to/page.html
1052  
1053  # Take screenshot of a local PDF
1054  agent-browser --allow-file-access open file:///Users/me/report.pdf
1055  agent-browser screenshot report.png
1056  ```
1057  
1058  The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
1059  
1060  - Load and render local files
1061  - Access other local files via JavaScript (XHR, fetch)
1062  - Load local resources (images, scripts, stylesheets)
1063  
1064  **Note:** This flag only works with Chromium. For security, it's disabled by default.
1065  
1066  ## CDP Mode
1067  
1068  Connect to an existing browser via Chrome DevTools Protocol:
1069  
1070  ```bash
1071  # Start Chrome with: google-chrome --remote-debugging-port=9222
1072  
1073  # Connect once, then run commands without --cdp
1074  agent-browser connect 9222
1075  agent-browser snapshot
1076  agent-browser tab
1077  agent-browser close
1078  
1079  # Or pass --cdp on each command
1080  agent-browser --cdp 9222 snapshot
1081  
1082  # Connect to remote browser via WebSocket URL
1083  agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
1084  ```
1085  
1086  The `--cdp` flag accepts either:
1087  
1088  - A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
1089  - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
1090  
1091  This enables control of:
1092  
1093  - Electron apps
1094  - Chrome/Chromium instances with remote debugging
1095  - WebView2 applications
1096  - Any browser exposing a CDP endpoint
1097  
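The two forms accepted by `--cdp` can be told apart with a small classifier. `parseCdpTarget` is a hypothetical helper written for illustration; rejecting anything other than a port or a `ws://`/`wss://` URL is an assumption of the sketch.

```typescript
type CdpTarget =
  | { kind: "port"; httpUrl: string }
  | { kind: "websocket"; wsUrl: string };

function parseCdpTarget(value: string): CdpTarget {
  if (/^\d+$/.test(value)) {
    const port = Number(value);
    if (port < 1 || port > 65535) throw new Error(`invalid port: ${value}`);
    // Port numbers resolve to a local HTTP endpoint.
    return { kind: "port", httpUrl: `http://localhost:${port}` };
  }
  if (/^wss?:\/\//i.test(value)) {
    return { kind: "websocket", wsUrl: value };
  }
  throw new Error(`expected a port or a ws:// or wss:// URL, got: ${value}`);
}

console.log(parseCdpTarget("9222"));
// { kind: 'port', httpUrl: 'http://localhost:9222' }
```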
1098  ### Auto-Connect
1099  
1100  Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:
1101  
1102  ```bash
1103  # Auto-discover running Chrome with remote debugging
1104  agent-browser --auto-connect open example.com
1105  agent-browser --auto-connect snapshot
1106  
1107  # Or via environment variable
1108  AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
1109  ```
1110  
1111  Auto-connect discovers Chrome by:
1112  
1113  1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
1114  2. Falling back to probing common debugging ports (9222, 9229)
1115  3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection
1116  
1117  This is useful when:
1118  
1119  - Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
1120  - You want a zero-configuration connection to your existing browser
1121  - You don't want to track which port Chrome is using
1122  
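The first discovery step can be sketched as a parser for the `DevToolsActivePort` file. The sketch assumes the file holds the port on its first line and the browser WebSocket path on its second; `parseDevToolsActivePort` and the default host are hypothetical, and the real discovery logic may differ.

```typescript
// Build a WebSocket endpoint from DevToolsActivePort contents, or return
// undefined when the contents do not look valid (so a caller can fall back
// to probing ports).
function parseDevToolsActivePort(contents: string, host = "127.0.0.1"): string | undefined {
  const [portLine = "", pathLine = ""] = contents.split(/\r?\n/).map((line) => line.trim());
  const port = Number(portLine);
  if (!Number.isInteger(port) || port < 1 || port > 65535) return undefined;
  if (!pathLine.startsWith("/")) return undefined;
  return `ws://${host}:${port}${pathLine}`;
}

console.log(parseDevToolsActivePort("9222\n/devtools/browser/abc"));
// ws://127.0.0.1:9222/devtools/browser/abc
```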
1123  ## Streaming (Browser Preview)
1124  
1125  Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
1126  
1127  ### Streaming
1128  
1129  Every session automatically starts a WebSocket stream server on an OS-assigned port. Use `stream status` to see the bound port and connection state:
1130  
1131  ```bash
1132  agent-browser stream status
1133  ```
1134  
1135  To bind to a specific port, set `AGENT_BROWSER_STREAM_PORT`:
1136  
1137  ```bash
1138  AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
1139  ```
1140  
1141  You can also manage streaming at runtime with `stream enable`, `stream disable`, and `stream status`:
1142  
1143  ```bash
1144  agent-browser stream enable --port 9223   # Re-enable on a specific port
1145  agent-browser stream disable              # Stop streaming for the session
1146  ```
1147  
1148  The WebSocket server streams the browser viewport and accepts input events.
1149  
1150  ### WebSocket Protocol
1151  
1152  Connect to the session's stream port (for example, `ws://localhost:9223` when `AGENT_BROWSER_STREAM_PORT=9223`) to receive frames and send input:
1153  
1154  **Receive frames:**
1155  
1156  ```json
1157  {
1158    "type": "frame",
1159    "data": "<base64-encoded-jpeg>",
1160    "metadata": {
1161      "deviceWidth": 1280,
1162      "deviceHeight": 720,
1163      "pageScaleFactor": 1,
1164      "offsetTop": 0,
1165      "scrollOffsetX": 0,
1166      "scrollOffsetY": 0
1167    }
1168  }
1169  ```
1170  
1171  **Send mouse events:**
1172  
1173  ```json
1174  {
1175    "type": "input_mouse",
1176    "eventType": "mousePressed",
1177    "x": 100,
1178    "y": 200,
1179    "button": "left",
1180    "clickCount": 1
1181  }
1182  ```
1183  
1184  **Send keyboard events:**
1185  
1186  ```json
1187  {
1188    "type": "input_keyboard",
1189    "eventType": "keyDown",
1190    "key": "Enter",
1191    "code": "Enter"
1192  }
1193  ```
1194  
1195  **Send touch events:**
1196  
1197  ```json
1198  {
1199    "type": "input_touch",
1200    "eventType": "touchStart",
1201    "touchPoints": [{ "x": 100, "y": 200 }]
1202  }
1203  ```
1204  
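A client can type these messages and build input events as plain objects before sending them with `JSON.stringify`. The sketch below mirrors the message shapes shown above; `parseFrame` and `clickAt` are hypothetical helpers, and the `mouseReleased` event type is an assumption borrowed from CDP naming, since only `mousePressed` appears in this README.

```typescript
interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

interface MouseInput {
  type: "input_mouse";
  eventType: string;
  x: number;
  y: number;
  button: string;
  clickCount: number;
}

// Return the message as a frame, or undefined for any other message type.
function parseFrame(raw: string): FrameMessage | undefined {
  const msg = JSON.parse(raw) as { type?: unknown };
  return msg.type === "frame" ? (msg as FrameMessage) : undefined;
}

// A click is a press followed by a release at the same point.
function clickAt(x: number, y: number): MouseInput[] {
  const base = { type: "input_mouse" as const, x, y, button: "left", clickCount: 1 };
  return [
    { ...base, eventType: "mousePressed" },
    { ...base, eventType: "mouseReleased" }, // assumed name, see lead-in
  ];
}

// With an open WebSocket `ws`, each event would be sent as:
//   for (const event of clickAt(100, 200)) ws.send(JSON.stringify(event));
console.log(clickAt(100, 200).map((e) => e.eventType)); // [ 'mousePressed', 'mouseReleased' ]
```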
1205  ## Architecture
1206  
1207  agent-browser uses a client-daemon architecture:
1208  
1209  1. **Rust CLI** - Parses commands, communicates with daemon
1210  2. **Rust Daemon** - Pure Rust daemon using direct CDP, no Node.js required
1211  
1212  The daemon starts automatically on first command and persists between commands for fast subsequent operations. To auto-shutdown the daemon after a period of inactivity, set `AGENT_BROWSER_IDLE_TIMEOUT_MS` (value in milliseconds). When set, the daemon closes the browser and exits after receiving no commands for the specified duration.
1213  
1214  **Browser Engine:** Uses Chrome (from Chrome for Testing) by default. The `--engine` flag selects between `chrome` and `lightpanda`. Supported browsers: Chrome/Chromium and Lightpanda (via CDP), and Safari (via WebDriver for iOS).
1215  
1216  ## Platforms
1217  
1218  | Platform    | Binary      |
1219  | ----------- | ----------- |
1220  | macOS ARM64 | Native Rust |
1221  | macOS x64   | Native Rust |
1222  | Linux ARM64 | Native Rust |
1223  | Linux x64   | Native Rust |
1224  | Windows x64 | Native Rust |
1225  
1226  ## Usage with AI Agents
1227  
1228  ### Just ask the agent
1229  
1230  The simplest approach -- just tell your agent to use it:
1231  
1232  ```
1233  Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
1234  ```
1235  
1236  The `--help` output is comprehensive and most agents can figure it out from there.
1237  
1238  ### AI Coding Assistants (recommended)
1239  
1240  Add the skill to your AI coding assistant for richer context:
1241  
1242  ```bash
1243  npx skills add vercel-labs/agent-browser
1244  ```
1245  
1246  This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
1247  
1248  ### Claude Code
1249  
1250  Install as a Claude Code skill:
1251  
1252  ```bash
1253  npx skills add vercel-labs/agent-browser
1254  ```
1255  
1256  This adds a thin discovery stub at `.claude/skills/agent-browser/SKILL.md`. The stub is intentionally minimal — it points Claude Code at `agent-browser skills get core` to load the actual workflow content at runtime. This way the instructions always match the installed CLI version instead of going stale between releases.
1257  
1258  ### AGENTS.md / CLAUDE.md
1259  
1260  For more consistent results, add to your project or global instructions file:
1261  
1262  ```markdown
1263  ## Browser Automation
1264  
1265  Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
1266  
1267  Core workflow:
1268  
1269  1. `agent-browser open <url>` - Navigate to page
1270  2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
1271  3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
1272  4. Re-snapshot after page changes
1273  ```
1274  
1275  ## Integrations
1276  
1277  ### iOS Simulator
1278  
1279  Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
1280  
1281  **Setup:**
1282  
1283  ```bash
1284  # Install Appium and XCUITest driver
1285  npm install -g appium
1286  appium driver install xcuitest
1287  ```
1288  
1289  **Usage:**
1290  
1291  ```bash
1292  # List available iOS simulators
1293  agent-browser device list
1294  
1295  # Launch Safari on a specific device
1296  agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
1297  
1298  # Same commands as desktop
1299  agent-browser -p ios snapshot -i
1300  agent-browser -p ios tap @e1
1301  agent-browser -p ios fill @e2 "text"
1302  agent-browser -p ios screenshot mobile.png
1303  
1304  # Mobile-specific commands
1305  agent-browser -p ios swipe up
1306  agent-browser -p ios swipe down 500
1307  
1308  # Close session
1309  agent-browser -p ios close
1310  ```
1311  
1312  Or use environment variables:
1313  
1314  ```bash
1315  export AGENT_BROWSER_PROVIDER=ios
1316  export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
1317  agent-browser open https://example.com
1318  ```
1319  
1320  | Variable                   | Description                                     |
1321  | -------------------------- | ----------------------------------------------- |
1322  | `AGENT_BROWSER_PROVIDER`   | Set to `ios` to enable iOS mode                 |
1323  | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
1324  | `AGENT_BROWSER_IOS_UDID`   | Device UDID (alternative to device name)        |
1325  
1326  **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
1327  
1328  **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
1329  
1330  #### Real Device Support
1331  
1332  Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
1333  
1334  **1. Get your device UDID:**
1335  
1336  ```bash
1337  xcrun xctrace list devices
1338  # or
1339  system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
1340  ```
1341  
1342  **2. Sign WebDriverAgent (one-time):**
1343  
1344  ```bash
1345  # Open the WebDriverAgent Xcode project
1346  cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
1347  open WebDriverAgent.xcodeproj
1348  ```
1349  
1350  In Xcode:
1351  
1352  - Select the `WebDriverAgentRunner` target
1353  - Go to Signing & Capabilities
1354  - Select your Team (requires Apple Developer account, free tier works)
1355  - Let Xcode manage signing automatically
1356  
1357  **3. Use with agent-browser:**
1358  
1359  ```bash
1360  # Connect device via USB, then:
1361  agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
1362  
1363  # Or use the device name if unique
1364  agent-browser -p ios --device "John's iPhone" open https://example.com
1365  ```
1366  
1367  **Real device notes:**
1368  
1369  - First run installs WebDriverAgent to the device (may require Trust prompt)
1370  - Device must be unlocked and connected via USB
1371  - Slightly slower initial connection than simulator
1372  - Tests against real Safari performance and behavior
1373  
1374  ### Browserless
1375  
1376  [Browserless](https://browserless.io) provides cloud browser infrastructure with a Sessions API. Use it when running agent-browser in environments where a local browser isn't available.
1377  
1378  To enable Browserless, use the `-p` flag:
1379  
1380  ```bash
1381  export BROWSERLESS_API_KEY="your-api-token"
1382  agent-browser -p browserless open https://example.com
1383  ```
1384  
1385  Or use environment variables for CI/scripts:
1386  
1387  ```bash
1388  export AGENT_BROWSER_PROVIDER=browserless
1389  export BROWSERLESS_API_KEY="your-api-token"
1390  agent-browser open https://example.com
1391  ```
1392  
1393  Optional configuration via environment variables:
1394  
1395  | Variable                   | Description                                      | Default                                 |
1396  | -------------------------- | ------------------------------------------------ | --------------------------------------- |
1397  | `BROWSERLESS_API_URL`      | Base API URL (for custom regions or self-hosted) | `https://production-sfo.browserless.io` |
1398  | `BROWSERLESS_BROWSER_TYPE` | Type of browser to use (`chromium` or `chrome`)  | `chromium`                              |
1399  | `BROWSERLESS_TTL`          | Session TTL in milliseconds                      | `300000`                                |
1400  | `BROWSERLESS_STEALTH`      | Enable stealth mode (`true`/`false`)             | `true`                                  |
1401  
1402  When enabled, agent-browser connects to a Browserless cloud session instead of launching a local browser. All commands work identically.
1403  
1404  Get your API token from the [Browserless Dashboard](https://browserless.io).
1405  
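The optional Browserless variables in the table above can be combined in a shell session before starting the browser. The following is a minimal sketch, assuming a 10-minute TTL suits your Browserless plan; `minutes_to_ms` is a hypothetical helper for this example and is not part of agent-browser.

```bash
# Hypothetical helper (not part of agent-browser): convert minutes to milliseconds,
# since BROWSERLESS_TTL is expressed in milliseconds.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

export BROWSERLESS_BROWSER_TYPE=chrome         # documented default is chromium
export BROWSERLESS_TTL="$(minutes_to_ms 10)"   # overrides the documented default of 300000
export BROWSERLESS_STEALTH=false               # documented default is true
```

With `BROWSERLESS_API_KEY` also exported, running `agent-browser -p browserless open https://example.com` in the same shell should start the session with these settings.
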
### Browserbase

[Browserbase](https://browserbase.com) provides remote browser infrastructure that simplifies deploying browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.

To enable Browserbase, use the `-p` flag:

```bash
export BROWSERBASE_API_KEY="your-api-key"
agent-browser -p browserbase open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=browserbase
export BROWSERBASE_API_KEY="your-api-key"
agent-browser open https://example.com
```

When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.

Get your API key from the [Browserbase Dashboard](https://browserbase.com/overview).

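In CI scripts it can help to fail early when the API key is missing, rather than discovering it when the session is requested. The following is a minimal sketch of such a preflight check; `check_browserbase_env` is a hypothetical helper for this example and is not part of agent-browser.

```bash
# Hypothetical preflight check (not part of agent-browser).
# Returns non-zero if BROWSERBASE_API_KEY is unset or empty;
# otherwise selects the Browserbase provider for later commands.
check_browserbase_env() {
  if [ -z "${BROWSERBASE_API_KEY:-}" ]; then
    echo "BROWSERBASE_API_KEY is not set" >&2
    return 1
  fi
  export AGENT_BROWSER_PROVIDER=browserbase
}
```

A script could then run `check_browserbase_env && agent-browser open https://example.com`, so the browser command is skipped when the key is absent.
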
### Browser Use

[Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).

To enable Browser Use, use the `-p` flag:

```bash
export BROWSER_USE_API_KEY="your-api-key"
agent-browser -p browseruse open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=browseruse
export BROWSER_USE_API_KEY="your-api-key"
agent-browser open https://example.com
```

When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.

Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing afterward.

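Because the provider is chosen through `AGENT_BROWSER_PROVIDER`, one script can use a cloud session in CI and a local browser elsewhere. The following is a minimal sketch, assuming your CI system sets a non-empty `CI` variable (many hosted CI services do, but this is an assumption about your environment); `select_provider` is a hypothetical helper and is not part of agent-browser.

```bash
# Hypothetical helper (not part of agent-browser).
# In CI (CI is non-empty), select the Browser Use provider.
# Otherwise, clear the variable so no cloud provider is requested.
select_provider() {
  if [ -n "${CI:-}" ]; then
    export AGENT_BROWSER_PROVIDER=browseruse
  else
    unset AGENT_BROWSER_PROVIDER
  fi
}
```

After calling `select_provider`, the same `agent-browser open https://example.com` line can be used in both environments. `BROWSER_USE_API_KEY` still needs to be set in CI.
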
### Kernel

[Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.

To enable Kernel, use the `-p` flag:

```bash
export KERNEL_API_KEY="your-api-key"
agent-browser -p kernel open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=kernel
export KERNEL_API_KEY="your-api-key"
agent-browser open https://example.com
```

Optional configuration via environment variables:

| Variable                 | Description                                                                      | Default |
| ------------------------ | -------------------------------------------------------------------------------- | ------- |
| `KERNEL_HEADLESS`        | Run browser in headless mode (`true`/`false`)                                    | `false` |
| `KERNEL_STEALTH`         | Enable stealth mode to avoid bot detection (`true`/`false`)                      | `true`  |
| `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds                                                       | `300`   |
| `KERNEL_PROFILE_NAME`    | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none)  |

When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.

**Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile is created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, so they are available in future sessions.

Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).

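To reuse a login across runs, the profile name has to be the same in every session. The following is a minimal sketch that groups the relevant settings; `use_kernel_profile`, the profile name `my-login-profile`, and the 600-second timeout are illustrative assumptions, not part of agent-browser.

```bash
# Hypothetical helper (not part of agent-browser): export the Kernel settings
# for a named persistent profile. The second argument is optional.
use_kernel_profile() {
  export AGENT_BROWSER_PROVIDER=kernel
  export KERNEL_PROFILE_NAME="$1"
  export KERNEL_TIMEOUT_SECONDS="${2:-300}"   # 300 is the documented default
}

# Illustrative values: a profile called my-login-profile with a 10-minute timeout.
use_kernel_profile "my-login-profile" 600
```

With `KERNEL_API_KEY` also exported, a first session can log in to a site. Once that session ends, a later session that calls `use_kernel_profile` with the same name should start with the saved cookies.
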
### AgentCore

[AWS Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/) provides cloud browser sessions with SigV4 authentication.

To enable AgentCore, use the `-p` flag:

```bash
agent-browser -p agentcore open https://example.com
```

Or use environment variables for CI/scripts:

```bash
export AGENT_BROWSER_PROVIDER=agentcore
agent-browser open https://example.com
```

Credentials are automatically resolved from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) or the AWS CLI (`aws configure export-credentials`), which supports SSO, profiles, and IAM roles.

Optional configuration via environment variables:

| Variable                    | Description                                                          | Default          |
| --------------------------- | -------------------------------------------------------------------- | ---------------- |
| `AGENTCORE_REGION`          | AWS region for the AgentCore endpoint                                | `us-east-1`      |
| `AGENTCORE_BROWSER_ID`      | Browser identifier                                                   | `aws.browser.v1` |
| `AGENTCORE_PROFILE_ID`      | Browser profile for persistent state (cookies, localStorage)         | (none)           |
| `AGENTCORE_SESSION_TIMEOUT` | Session timeout in seconds                                           | `3600`           |
| `AWS_PROFILE`               | AWS CLI profile for credential resolution                            | `default`        |

**Browser profiles:** When `AGENTCORE_PROFILE_ID` is set, browser state (cookies, localStorage) is persisted across sessions automatically.

When enabled, agent-browser connects to an AgentCore cloud browser session instead of launching a local browser. All commands work identically.

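Before starting an AgentCore session, it can be useful to check which of the two credential sources described above is available. The following is a minimal sketch of such a check; `agentcore_credential_source` is a hypothetical helper, and checking environment variables before the AWS CLI simply follows the order in the text above, so the actual precedence inside agent-browser is an assumption.

```bash
# Hypothetical preflight check (not part of agent-browser).
# Prints "env" if both AWS key variables are set, "aws-cli" if the aws
# command is on PATH, and "none" otherwise. It does not validate the
# credentials themselves.
agentcore_credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env"
  elif command -v aws >/dev/null 2>&1; then
    echo "aws-cli"
  else
    echo "none"
  fi
}
```

A script could stop early when the function prints `none`, for example with `[ "$(agentcore_credential_source)" != "none" ] || exit 1` before calling `agent-browser -p agentcore open https://example.com`.
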
## License

Apache-2.0