
How to give an AI agent a browser

A practical guide to giving an LLM agent a real, authenticated browser — from a single goal to deterministic, replayable action — without re-running the model on every step.

An LLM can reason about the web, but it cannot click, type, log in, or hold a session on its own. To act, it needs a browser execution layer — something that turns a natural-language goal into concrete browser actions, runs them in a real browser, and returns the result. This guide shows the fastest path from "I have an agent" to "my agent can do things on the web", and how to keep the cost flat as you scale.

What does it mean to give an agent a browser?

Giving an agent a browser means exposing a single, high-level action — "book the demo on this page", "download last month's invoices", "fill out this form" — and having a system carry it out end to end in a real, JavaScript-capable browser. Twin Browser is that layer: you POST a goal and a target URL, and it compiles the page into a token-efficient indexed state, plans the actions, executes them, and returns structured output. You never ship a headless browser, manage Chromium, or hand-write selectors.
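To make "token-efficient indexed state" concrete, here is a minimal sketch of the idea: keep only the interactive elements of a page, give each a small integer index, and cap the list. This illustrates the concept and is not Twin's implementation. The tag list, the label heuristics, and the `max_elements` cap (a crude stand-in for a token budget) are all assumptions.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not Twin's list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class IndexedStateBuilder(HTMLParser):
    """Collect interactive elements and give each a small integer index."""

    def __init__(self, max_elements):
        super().__init__()
        self.max_elements = max_elements  # crude stand-in for a token budget
        self.elements = []
        self._open = None  # element whose visible text is still being read

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS or len(self.elements) >= self.max_elements:
            return
        attrs = dict(attrs)
        entry = {
            "index": len(self.elements),
            "tag": tag,
            "label": attrs.get("aria-label") or attrs.get("name") or "",
        }
        self.elements.append(entry)
        # <input> is a void element, so there is no inner text to wait for.
        self._open = None if tag == "input" else entry

    def handle_data(self, data):
        # Fall back to visible text when no attribute supplied a label.
        if self._open is not None and not self._open["label"]:
            self._open["label"] = data.strip()

    def handle_endtag(self, tag):
        if self._open is not None and self._open["tag"] == tag:
            self._open = None


def build_indexed_state(html, max_elements=50):
    builder = IndexedStateBuilder(max_elements)
    builder.feed(html)
    builder.close()
    return builder.elements


def render_state(elements):
    """Compact text form a planner could read instead of the raw DOM."""
    return "\n".join(
        f"[{e['index']}] {e['tag']} {e['label']}".rstrip() for e in elements
    )
```

A planner that sees `[2] button Download` can answer with "click 2" rather than a CSS selector, which is why an indexed view can be much smaller than the raw DOM.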

How do I run my first goal?

Authenticate with a Bearer API key and POST a goal to /api/v1/run together with the target URL. The target URL doubles as the authorization signal: Twin is meant to run only where you are authorized to automate (first-party sites, internal RPA, authorized testing). Auth, billing, and audit logging run on every call.

Run a goal (bash)
curl -X POST https://twin-browser.com/api/v1/run \
  -H "Authorization: Bearer $TWIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/invoices",
    "goal": "Download every invoice from May 2026 as PDF",
    "output_schema": { "files": "string[]" }
  }'
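If your agent is written in Python, the same call can be sketched with the standard library alone. The endpoint, headers, and body fields come from the curl example above. The function names are illustrative, and because the response format is not documented in this guide, the sketch only assumes the reply is JSON and returns it unchanged.

```python
import json
import urllib.request

RUN_ENDPOINT = "https://twin-browser.com/api/v1/run"  # from the curl example


def build_run_request(url, goal, output_schema, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"url": url, "goal": goal, "output_schema": output_schema})
    return urllib.request.Request(
        RUN_ENDPOINT,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_goal(request, timeout=120):
    """Send the request and decode the reply.

    Assumption: the response body is JSON. Its fields are not documented
    in this guide, so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Read the key from the environment (for example `os.environ["TWIN_API_KEY"]`) rather than hard-coding it, then pass the built request to `run_goal`. Keeping request construction separate from sending makes the payload easy to inspect and test offline.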

What happens on the first run vs. the next one?

The first run of a goal is a cold compile: Twin reads the live DOM, builds an indexed-state map of the interactive elements under a token budget, lets the planner pick actions, and compiles a successful run into a reusable skill. When a similar request arrives later, even a re-phrased one, the semantic dispatch cache matches it to that skill and replays it deterministically with zero LLM calls. A flow of roughly 50 steps therefore costs model tokens once, then replays at close to $0 in LLM spend.

  • Cold compile: DOM → indexed state → plan → execute → compile a skill.
  • Warm path: semantic match → deterministic replay, no model call.
  • Re-phrased requests still hit the cache because the match is on intent, not an exact string.
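The cold and warm paths above can be sketched as a small cache keyed on intent instead of the exact string. This is a toy model under stated assumptions: Twin's real matching is not described in this guide, so word-overlap (Jaccard) similarity stands in for semantic matching here. The stopword list, the 0.6 threshold, and the `planner` callback are all hypothetical.

```python
import re

STOPWORDS = {"a", "an", "the", "as", "from", "of", "to", "all", "every",
             "my", "please", "for", "in"}


def intent_tokens(goal):
    """Toy intent signature: lowercase content words with stopwords removed."""
    words = re.findall(r"[a-z0-9]+", goal.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def similarity(a, b):
    """Jaccard overlap between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


class SkillCache:
    def __init__(self, planner, threshold=0.6):
        self.planner = planner      # expensive call, stands in for the LLM planner
        self.threshold = threshold
        self.skills = []            # list of (url, tokens, steps)

    def run(self, url, goal):
        tokens = intent_tokens(goal)
        # Warm path: reuse the closest compiled skill for this URL.
        best = max(
            (s for s in self.skills if s[0] == url),
            key=lambda s: similarity(tokens, s[1]),
            default=None,
        )
        if best is not None and similarity(tokens, best[1]) >= self.threshold:
            return {"path": "warm", "steps": best[2]}
        # Cold path: plan once, then store the result as a skill.
        steps = self.planner(url, goal)
        self.skills.append((url, tokens, steps))
        return {"path": "cold", "steps": steps}
```

With this toy, "Please download all PDF invoice files from May 2026" reuses the skill compiled for "Download every invoice from May 2026 as PDF" without calling the planner. "Delete every invoice from May 2026" falls below the threshold and compiles fresh, but only narrowly, because word overlap is a crude proxy for intent.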

How do I keep an authenticated session?

Real work needs login. Store credentials in the credential vault and reference them by name; Twin injects them at run time and never returns them to the model context. When a step needs a human — an approval, or MFA on an authorized flow — Twin pauses with a human-in-the-loop handoff, then resumes from where it left off.

Run with vaulted credentials (json)
{
  "url": "https://app.example.com/login",
  "goal": "Log in and export the team roster as CSV",
  "credentials": { "ref": "example-prod-login" },
  "on_blocked": "handoff"
}
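The phrase "reference them by name" carries the security property, and a toy model makes it visible. In this sketch the secret values exist only inside the vault and at the moment of typing, and only the reference name reaches the log a model would read. The class, the function names, and the `type_into_page` callback are hypothetical illustrations, not Twin's API.

```python
class CredentialVault:
    """Toy in-memory vault: secrets are looked up by reference name only."""

    def __init__(self):
        self._secrets = {}

    def store(self, ref, username, password):
        self._secrets[ref] = {"username": username, "password": password}

    def resolve(self, ref):
        return self._secrets[ref]


def execute_login(vault, ref, type_into_page):
    """Inject credentials at execution time and report back without them.

    `type_into_page(field, value)` stands in for the browser typing into a
    form field. The returned log is what a model would be allowed to see.
    """
    secret = vault.resolve(ref)
    type_into_page("username", secret["username"])
    type_into_page("password", secret["password"])
    # Only the reference name enters the model-visible log, never the values.
    return [f"filled login form using credentials ref '{ref}'"]
```

The request payload follows the same rule: it carries `"ref": "example-prod-login"` and never the password itself.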

How do I wire this into my agent framework?

Twin exposes itself three ways so it drops into whatever you already use: a REST API under /api/v1/*, an MCP server (tools: run, compile_skill, run_skill) for Cursor, Claude Desktop, Claude Code, and Cline, and tool adapters for LangChain and AutoGen. Pick the one that matches your stack and register Twin as a single tool your agent can call.
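Registering Twin "as a single tool" might look like the following framework-neutral sketch. It pairs a JSON-Schema style tool description, a shape many agent frameworks accept, with a handler that forwards the model's arguments to /api/v1/run. The tool name, the description text, and the injected `post_json` transport are assumptions made for illustration. The official MCP server and the LangChain or AutoGen adapters may define these differently.

```python
import json

# Hypothetical tool description; the name and wording are illustrative.
TWIN_TOOL = {
    "name": "twin_browser_run",
    "description": "Carry out a goal in a real browser at the given URL.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Target page to act on"},
            "goal": {"type": "string", "description": "What to accomplish"},
        },
        "required": ["url", "goal"],
    },
}


def make_tool_handler(post_json):
    """Return a handler the agent loop calls when the model picks the tool.

    `post_json(path, payload)` is an injected transport (an assumption of
    this sketch) so the handler can be exercised without network access.
    """
    def handle(arguments_json):
        args = json.loads(arguments_json)
        missing = [k for k in TWIN_TOOL["parameters"]["required"] if k not in args]
        if missing:
            # Return the problem to the model instead of raising.
            return {"error": f"missing arguments: {', '.join(missing)}"}
        return post_json("/api/v1/run", {"url": args["url"], "goal": args["goal"]})
    return handle
```

Returning a structured error for missing arguments lets the agent correct its own tool call on the next turn.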

Keep going

The mechanics behind this guide: the semantic dispatch cache, deterministic replay, and skill compilation. Plug Twin into MCP, LangChain, or the REST API, or see it applied to AI agents and RPA replacement. Weighing options? See how Twin compares.

Do I need to run my own headless browser?

No. Twin runs the browser for you in the cloud. You send a goal and a URL over REST or MCP; Twin handles the browser, the session, and the execution.

Can the agent log in to sites?

Yes, on flows you are authorized to automate. Store credentials in the credential vault and reference them by name; Twin injects them at run time and keeps them out of the model context.

What stops cost from scaling with every run?

The semantic dispatch cache. After the first compile, re-phrased and repeated requests match a compiled skill and replay deterministically with no LLM call, so marginal cost trends toward zero.
