How it works

From a goal to deterministic, replayable action

Twin is the browser execution layer for LLM agents. Here is the full pipeline — how a plain-English goal becomes a compiled skill, and how the next similar request skips the model entirely.

billing.acme.com
  1. Open billing.acme.com [done]
  2. Compile DOM → indexed state (42 elements, ~3k tokens) [done]
  3. Plan: log in → open invoices → download latest [done]
  4. Act: fill #user · fill #pass · click “Sign in” [running]
  5. Freeze the path into skill sk_9f2c [queued]

One cold run, captured: open, compile the page into indexed state, plan, act, and freeze the path into a skill the next run replays for free.

The pipeline

Eight stages, one cost curve that bends down

The LLM does the hard thinking once, on the cold path. Everything after is cache and replay.
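That bend is plain arithmetic. Here is a minimal sketch, assuming a cache hit costs about one fifth of a cold run (the ~5× figure quoted for deterministic replay) and measuring cost in cold-run units. `avg_cost_per_run` is a hypothetical helper, not part of Twin's API:

```bash
#!/usr/bin/env bash
# Hypothetical helper: average cost per run for one cold run followed by hits.
# Costs are in cold-run units (cold run = 1); a hit costs 1/ratio of that.
avg_cost_per_run() {
  runs=$1           # total runs of the same goal, cold run included
  ratio=${2:-5}     # assumed: a hit is ~5x cheaper than a cold run
  awk -v n="$runs" -v r="$ratio" 'BEGIN {
    total = 1 + (n - 1) / r      # one cold run plus (n - 1) cache hits
    printf "%.3f\n", total / n   # average cost per run, in cold-run units
  }'
}

for n in 1 10 100; do
  echo "runs=$n avg=$(avg_cost_per_run "$n")"
done
```

With one cold run and nine hits the average is 0.28 cold runs per call, and it approaches the hit cost (0.2 under this assumption) as the same goal keeps recurring.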

  1. DOM → indexed-state compiler

    A live page is compiled into a compact, numerically indexed map of just the interactive elements, kept under a token budget, instead of raw HTML. A 50-step flow becomes ~3k tokens of state the model can actually reason over.

  2. Planner picks actions

    The planner reads the indexed state and chooses the next action — click element 14, type into element 7, submit. This is the only stage that needs an LLM, and only on the cold path.

  3. Successful run compiles into a skill

    When a goal completes, the path is frozen into a reusable skill: a deterministic action plan keyed to the page’s structure, stored in your agent/skill library.

  4. Semantic dispatch cache matches re-phrased requests

    A new request is embedded and matched by meaning against compiled skills — so “schedule a call” finds the skill you built for “book a demo,” not just an exact string repeat.

  5. Deterministic replay (zero LLM)

    On a cache hit, the skill replays deterministically with no model call. This is where the cost curve bends: a hit is ~5× cheaper and the marginal cost of the next run trends toward zero.

  6. Human-in-the-loop handoff

    A blocked step — an approval, or MFA on a flow you’re authorized to run — pauses and hands off to a human via the live view, then resumes the skill where it left off.

  7. Cross-tenant skill corpus reuse

    A skill compiled once can be reused across tenants in sanitized form (only its navigation skeleton, never your values or credentials), so the cache-hit rate compounds as the whole network runs: you benefit from skills you never had to compile yourself.

  8. Live view, session video, vault & audit

    Every run streams a real-time live view and is recorded to durable session video. Credentials live in a per-tenant vault, and every call is written to an audit log.
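Stage 1 can be approximated in a few lines. This is a minimal sketch, not Twin's compiler: it assumes a simplified page with one element per line, treats `a`, `button`, `input`, `select`, and `textarea` as the interactive tags, and estimates tokens as characters divided by four. `compile_indexed_state` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a DOM -> indexed-state pass (not Twin's compiler).
# Assumes one element per line and roughly 4 characters per token.
compile_indexed_state() {
  budget_tokens=${1:-3000}   # token budget for the whole state
  awk -v budget="$budget_tokens" '
    # Only these tags count as interactive in this sketch.
    /<(a|button|input|select|textarea)[ >]/ {
      tag = $0
      sub(/^[^<]*</, "", tag)          # drop anything before the tag
      sub(/[ >].*$/, "", tag)          # keep just the tag name
      label = $0
      gsub(/<[^>]*>/, "", label)       # visible text between the tags
      gsub(/^[ \t]+|[ \t]+$/, "", label)
      if (label == "" && match($0, /name="[^"]*"/))
        label = substr($0, RSTART + 6, RLENGTH - 7)  # fall back to the name attribute
      n++
      line = sprintf("[%d] %s \"%s\"", n, tag, label)
      used += length(line) / 4         # rough token estimate
      if (used > budget) exit          # stop before exceeding the budget
      print line
    }'
}

compile_indexed_state 3000 <<'HTML'
<h1>Acme Billing</h1>
<input name="user" type="text">
<input name="pass" type="password">
<button type="submit">Sign in</button>
<a href="/invoices">Invoices</a>
HTML
```

The heading is dropped and the four controls come out as `[1]` through `[4]`. Those indices are what a planner refers to ("fill element 1, click element 3"), and because they are derived from page structure, a recorded plan can later be replayed against them.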
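Stages 5 and 6 together amount to replaying a frozen action list and stopping cleanly at a step that needs a person. A minimal sketch, assuming a hypothetical one-step-per-line skill format (`fill ELEMENT VALUE`, `click ELEMENT`, `handoff REASON`) and a hypothetical exit status 3 meaning "waiting for a human". `replay_skill` and the `vault:` references are illustrative, not Twin's format:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: replay a frozen skill step by step with no model call,
# pausing at a "handoff" step and resuming from a given step number.
# Skill format (illustrative): one step per line, "action target [value]".
replay_skill() {
  start=${1:-1}    # 1-based step to start (or resume) from
  step=0
  while read -r action target value; do
    step=$((step + 1))
    if [ "$step" -lt "$start" ]; then continue; fi   # already done before the pause
    case $action in
      click|fill)
        echo "step $step: $action element $target${value:+ with $value}" ;;
      handoff)
        echo "paused at step $step: needs a human ($target)"
        return 3 ;;  # hypothetical "waiting for a human" status
      *)
        echo "step $step: unknown action $action" >&2
        return 1 ;;
    esac
  done
  echo "done"
}

skill=$(mktemp)
cat > "$skill" <<'EOF'
fill 1 vault:acme_user
fill 2 vault:acme_pass
click 3
handoff mfa
click 4
EOF

replay_skill 1 < "$skill" || echo "exit status $?"   # stops at the MFA step
replay_skill 5 < "$skill"                             # resume once the human clears it
rm -f "$skill"
```

No model is called anywhere in the loop; resuming after the handoff is just replaying from the next step number.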

In code

The cold run, then the cache hit

First call compiles a skill. The next similar call — even re-worded — replays it deterministically with near-zero LLM tokens.

1 · cold run · compiles a skill
```bash
curl -X POST https://twin-browser.com/api/v1/run \
  -H "Authorization: Bearer $TWIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "goal": "Log in and download this month'\''s invoice",
    "url": "https://billing.acme.com"
  }'
# cold path: the planner compiles a skill,
# returns the result + a skill_id (llm_tokens: ~3120)
```
2 · dispatch · semantic cache hit
```bash
curl -X POST https://twin-browser.com/api/v1/dispatch \
  -H "Authorization: Bearer $TWIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "goal": "Grab the latest invoice PDF",
    "url": "https://billing.acme.com"
  }'
# semantic cache HIT -> deterministic replay
# ~0 LLM tokens, ~5x cheaper than the cold run
```

The full reference lives on the API and docs pages, and you can drive the exact same pipeline from your editor over the MCP server.

Authorization & trust

What runs on every call

Twin automates the web where you’re authorized — first-party sites, operator-approved automation, internal RPA, accessibility, and authorized testing.

The run’s target URL is the authorization signal. On every call, authentication, usage billing, and audit logging run before any action is taken. The backend is multi-tenant Supabase with default-deny row-level security (RLS), per-tenant API keys, an audit log, and a credential vault for the secrets a flow needs.
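The ordering is the important part: nothing acts until all three checks have passed. A minimal sketch of that gate, assuming a hypothetical allow-list of keys, a simple credit counter, and an audit line on stdout standing in for a durable audit log; none of these names are Twin's API:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the per-call gate: authenticate, bill, audit, then act.
# KNOWN_KEYS, CREDITS, and the stdout "audit" line are illustrative stand-ins.
KNOWN_KEYS=${KNOWN_KEYS:-"tw_demo_key"}   # per-tenant API keys
CREDITS=${CREDITS:-10}                    # remaining usage credits

authenticate() {
  case " $KNOWN_KEYS " in
    *" $1 "*) return 0 ;;   # key is on the allow-list
    *) return 1 ;;
  esac
}

record_usage() {
  [ "$CREDITS" -gt 0 ] || return 1
  CREDITS=$((CREDITS - 1))
}

audit_log() {
  echo "audit: key=$1 url=$2"   # a real audit log would be durable storage
}

handle_call() {
  key=$1
  url=$2                         # target site for this run
  authenticate "$key" || { echo "rejected: unknown API key"; return 1; }
  record_usage || { echo "rejected: out of credits"; return 1; }
  audit_log "$key" "$url"
  echo "act: running goal against $url"   # reached only after all three checks
}

handle_call tw_demo_key https://billing.acme.com
handle_call bad_key https://billing.acme.com || true
```

In this sketch a rejected call stops before the audit step; a production gate might also record rejections.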

Twin is not a CAPTCHA-bypass-for-hire or anti-bot evasion service. It’s the execution layer for the automation you’re allowed to run — and it keeps the receipts to prove it.

FAQ

The pipeline, answered

Why compile the DOM into indexed state instead of sending raw HTML?
Raw HTML is mostly markup the model doesn’t need, and it burns tokens fast. The compiler keeps only the interactive elements (roles, text, positions) as a numerically indexed map under a token budget. The planner reasons over that compact state, and the stable indices are what make deterministic replay possible later.
When does an LLM actually run?
Only on the cold path — the first time a goal is seen for a host, when the planner has to discover the path. Once that run compiles into a skill, re-phrased requests hit the semantic dispatch cache and replay deterministically with no model call.
How does the semantic cache match a re-worded request?
New requests are embedded and matched by meaning against the skills you’ve compiled for that host. “Schedule a call” finds the skill you built for “book a demo” instead of cold-starting, so variants stay cheap.
What happens when a step needs a human?
A blocked step — an approval, or MFA on a flow you’re authorized to run — pauses the run and hands off to a human through the live view. When the human clears it, the skill resumes exactly where it left off.
Is anything shared across tenants?
Only the sanitized navigation skeleton of a skill — never your values, credentials, or exact paths. That lets the cross-tenant corpus raise everyone’s cache-hit rate while your data stays isolated under default-deny RLS.
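The sharing rule in that last answer amounts to a redaction pass applied before a skill enters the shared corpus. A minimal sketch, assuming a hypothetical one-step-per-line skill format (`goto URL`, `fill ELEMENT VALUE`, `click ELEMENT`); `sanitize_skill` is illustrative and is not Twin's sanitizer. It keeps actions, element indices, and the site's origin, and drops typed values, vault references, and URL paths:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: reduce a skill to a shareable navigation skeleton.
# Keeps actions, element indices, and the origin; drops values and URL paths.
sanitize_skill() {
  awk '
    $1 == "fill" { print $1, $2, "<redacted>"; next }   # typed values and vault refs
    $1 == "goto" {
      origin = ""
      if (match($2, "^[a-z]+://[^/]+")) origin = substr($2, RSTART, RLENGTH)
      print $1, origin "/<redacted>"                    # keep the origin only
      next
    }
    { print $1, $2 }   # other steps keep only the action and its first argument
  '
}

sanitize_skill <<'EOF'
goto https://billing.acme.com/invoices/2024-05
fill 1 vault:acme_user
fill 2 vault:acme_pass
click 3
handoff mfa
click 4
EOF
```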

Run your first skill in minutes

Free to start. Usage-based credits from $29/mo, with LLM cost metered and passed through at 1×.