A compact, indexed map of the page — not raw HTML
Twin compiles a live page into a numerically-indexed list of interactive elements under a token budget, so the planner sees signal, not markup.
x-twin-dom-tokens: 2,941
What token-efficient DOM state does
Feeding raw HTML to a model is slow and expensive — most of it is layout, scripts, and noise. Twin parses the live DOM into a compact, numerically-indexed map of just the interactive elements, capped to a token budget. The planner acts on stable indices instead of brittle selectors, which is faster, cheaper, and more reliable.
Interactive elements only
Buttons, links, inputs, and roles are extracted; layout, scripts, and decoration are dropped before the model ever sees the page.
Stable numeric indices
The agent acts on [1], [2], [3] instead of fragile CSS/XPath selectors, so plans survive small page changes.
Hard token budget
The map is capped to fit a budget, so even huge pages stay cheap to reason about — a 50-step flow can run on a few thousand tokens.
Accessibility-aware
Roles and labels feed the map, which makes Twin a natural fit for accessibility automation as well as agents.
From a goal to deterministic action
- 1. Load the page: Twin drives a real browser to the target URL and waits for the interactive state to settle.
- 2. Extract interactives: The DOM is walked for actionable elements with their roles, labels, and positions.
- 3. Index and budget: Elements are assigned stable numeric indices and trimmed to fit the token budget.
- 4. Hand to the planner: The planner picks actions by index; the executor maps them back to real elements.
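Steps 2 and 3 above can be sketched in a few lines of Python. This is a minimal illustration, not Twin's implementation: the `Element` type, the `INTERACTIVE_ROLES` set, the four-characters-per-token estimate, and the "stop at the first element that does not fit" trimming rule are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical set of roles treated as interactive; Twin's real rules may differ.
INTERACTIVE_ROLES = {"button", "link", "input", "combobox", "checkbox", "textbox"}


@dataclass
class Element:
    role: str
    label: str


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    return max(1, (len(text) + 3) // 4)


def build_indexed_map(elements: list[Element], token_budget: int) -> list[str]:
    """Keep interactive elements, number them from 1, stop when the budget is spent."""
    lines: list[str] = []
    used = 0
    index = 1
    for el in elements:
        if el.role not in INTERACTIVE_ROLES:
            continue  # layout, scripts, and decoration never reach the map
        line = f'[{index}] {el.role} "{el.label}"'
        cost = estimate_tokens(line)
        if used + cost > token_budget:
            break  # assumed trimming rule: drop everything past the budget
        lines.append(line)
        used += cost
        index += 1
    return lines


page = [
    Element("div", "wrapper"),
    Element("link", "Calendar"),
    Element("script", ""),
    Element("button", "Tuesday"),
    Element("combobox", "Time slot"),
    Element("button", "Reserve"),
]
indexed = build_indexed_map(page, token_budget=100)
print("\n".join(indexed))
```

A real implementation would likely rank elements by relevance before trimming rather than cutting in document order; the sketch only shows how filtering, indexing, and a hard cap fit together.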
See it on a real call
A live page becomes a short indexed list — the planner acts on [2], [3], [4], not raw HTML.
```text
# Indexed interactive map (x-twin-dom-tokens: 2,941)
[1] link "Calendar"
[2] button "Tuesday"
[3] combobox "Time slot"
[4] button "Reserve"
[5] input "Notes" (optional)
→ plan: click[2] · select[3]="09:00" · click[4]
```
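To make the last step concrete, here is a hedged sketch of how an executor might turn the plan line from the example into actions and map each index back to an entry in the indexed map. The action grammar (`verb[index]` with an optional `="value"`) is inferred from that one example line, and `parse_plan` and `resolve` are hypothetical helper names, not part of Twin's API.

```python
import re

# Assumed grammar, inferred from the example plan: verb[index] or verb[index]="value".
ACTION_RE = re.compile(r'^(?P<verb>[a-z]+)\[(?P<index>\d+)\](?:="(?P<value>[^"]*)")?$')


def parse_plan(plan: str) -> list:
    """Split a plan on the middle-dot separator and parse each step."""
    actions = []
    for step in plan.split("·"):
        match = ACTION_RE.match(step.strip())
        if match is None:
            raise ValueError(f"unrecognized step: {step!r}")
        # The value group is None when the step carries no ="value" part.
        actions.append((match["verb"], int(match["index"]), match["value"]))
    return actions


def resolve(actions: list, indexed_map: dict) -> list:
    """Map each action's numeric index back to the element it refers to."""
    resolved = []
    for verb, index, value in actions:
        if index not in indexed_map:
            raise KeyError(f"plan refers to unknown index [{index}]")
        resolved.append((verb, indexed_map[index], value))
    return resolved


indexed_map = {
    1: 'link "Calendar"',
    2: 'button "Tuesday"',
    3: 'combobox "Time slot"',
    4: 'button "Reserve"',
    5: 'input "Notes"',
}
actions = parse_plan('click[2] · select[3]="09:00" · click[4]')
for verb, element, value in resolve(actions, indexed_map):
    print(verb, element, value)
```

Rejecting an unknown index up front, rather than guessing at a nearby element, keeps a stale plan from acting on the wrong control.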
What token-efficient DOM state is
The facts — how it works, what it costs, and the signal you get back on every call.
| Property | Twin Browser |
|---|---|
| Input | Live DOM of the target page |
| Output | Indexed interactive map |
| Budget | Capped token count |
| Addressing | Stable numeric indices |
| Dropped | Layout, scripts, decoration |
| Signal | x-twin-dom-tokens header |
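
If the `x-twin-dom-tokens` signal arrives as an HTTP response header, a client could read it along these lines. This is a sketch under assumptions: the header is taken to hold a plain integer, possibly with a thousands separator as in the `2,941` shown on this page, and `dom_tokens_from_headers` is a hypothetical helper rather than part of any Twin SDK.

```python
from typing import Mapping, Optional

HEADER_NAME = "x-twin-dom-tokens"


def dom_tokens_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Return the reported DOM token count, or None if absent or malformed."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == HEADER_NAME:
            digits = value.replace(",", "").strip()
            return int(digits) if digits.isdigit() else None
    return None


tokens = dom_tokens_from_headers({"Content-Type": "application/json",
                                  "X-Twin-Dom-Tokens": "2,941"})
print(tokens)
```

Logging this value per call is one way to confirm that large pages stay within the budget you expect.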
Token-efficient DOM state — common questions
Why not just send the HTML to the model?
What keeps plans from breaking on layout tweaks?
How big can a page be?
The rest of the platform
Deterministic replay
A successful run compiles into a skill — an ordered, parameterized program of browser actions that replays the same way every time.
Semantic dispatch cache
Re-phrased goals fuzzy-match a skill you already compiled — so the second request and every one after it skips the LLM entirely.
Live view & session video
Stream the browser session in real time, then keep a durable video of every run for debugging, audit, and proof of what happened.
Make every run cheaper than the last.
Start free, compile your first skill, and watch the marginal cost per run trend toward zero as your agents reuse what they have already learned.