Cost engineering

Cutting LLM cost in browser automation

Why most browser automation gets more expensive as you scale, and how a compile-once + semantic-cache + deterministic-replay pipeline bends the cost curve down instead of up.

The dominant pattern in agentic browser automation is to re-run the LLM on every execution: read the page, ask the model what to do, act, repeat. It works, but the bill scales linearly with usage: the 10,000th run costs the same as the first. This guide explains where that cost comes from and how to make the marginal cost per run fall the more your agents run.

Why does browser automation get more expensive at scale?

Two costs dominate: model tokens and browser time. Browser time you can optimize. The model tokens are the trap — if every step feeds raw HTML to an LLM and asks for the next action, a single multi-step flow can burn tens of thousands of tokens, every single time it runs. Re-running the same workflow 1,000 times pays that token bill 1,000 times.
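To make the linear bill concrete, here is a minimal sketch of the arithmetic. The step count, tokens per step, and price per million tokens are illustrative assumptions, not Twin's rates or measured figures.

```python
def every_run_cost(runs: int, steps_per_run: int, tokens_per_step: int,
                   usd_per_million_tokens: float) -> float:
    """Total model spend when the LLM plans every step of every run."""
    tokens_per_run = steps_per_run * tokens_per_step
    return runs * tokens_per_run * usd_per_million_tokens / 1_000_000


# Assumed numbers: 8 steps, 5,000 prompt tokens per step, $3 per million tokens.
one_run = every_run_cost(1, 8, 5_000, 3.0)
thousand_runs = every_run_cost(1_000, 8, 5_000, 3.0)
print(f"1 run: ${one_run:.2f}, 1,000 runs: ${thousand_runs:.2f}")
```

Under these assumptions the total is simply the per-run cost multiplied by the number of runs, which is the flat marginal cost the rest of this guide tries to push down.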

Lever 1: shrink the context with a token-efficient DOM

Raw HTML is mostly noise to a planner. Twin compiles the live page into an indexed-state map (a compact, numerically indexed list of just the interactive elements) under a token budget. A page that would be ~40k tokens of HTML becomes a ~3k-token indexed state. That is a large, immediate saving even before any caching.

  • Index only interactive elements (inputs, buttons, links, roles).
  • Numerically reference them so the planner emits "click 14", not a brittle selector.
  • Stay under a token budget so the planner prompt is small and cheap.
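The three rules above can be sketched with Python's standard `html.parser`. This is an illustration of the idea, not Twin's actual compiler: the class and function names, the set of tags treated as interactive, and the rough "4 characters per token" estimate are all assumptions.

```python
from html.parser import HTMLParser

# Assumption: these tags (plus anything carrying a role attribute) count as interactive.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class IndexedStateBuilder(HTMLParser):
    """Collects interactive elements and gives each one a numeric index."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in INTERACTIVE_TAGS or "role" in attr:
            label = attr.get("aria-label") or attr.get("name") or attr.get("href") or ""
            self.elements.append(f"[{len(self.elements)}] {tag} {label}".strip())


def indexed_state(html: str, token_budget: int) -> str:
    """Return the indexed list, cut off once a rough token estimate hits the budget."""
    builder = IndexedStateBuilder()
    builder.feed(html)
    lines, used = [], 0
    for line in builder.elements:
        cost = len(line) // 4 + 1  # crude estimate: about 4 characters per token
        if used + cost > token_budget:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


page = """
<div class="hero"><p>Welcome back</p>
<input name="email"><button aria-label="Sign in">Go</button>
<a href="/invoices">Invoices</a></div>
"""
state = indexed_state(page, token_budget=50)
print(state)
```

The planner would then see three short indexed lines instead of the full markup and could answer with something like "click 1". A real implementation would also capture visible text and element state, which this sketch omits.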

Lever 2: stop re-planning — compile a skill once

The first successful run of a goal is compiled into a skill: an ordered, parameterized program of browser actions. Once you have the skill, you do not need the model to re-derive it. This is the single biggest lever, because it converts a recurring per-run model cost into a one-time compile cost.
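One way to picture "an ordered, parameterized program of browser actions" is the sketch below. The `Step` and `Skill` types, the action names, and the element references are hypothetical; Twin's real skill format is not described in this guide.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Step:
    action: str      # hypothetical action names, e.g. "goto", "click", "type"
    target: str      # URL or element reference
    value: str = ""  # may contain $placeholders that are filled at replay time


@dataclass(frozen=True)
class Skill:
    name: str
    steps: tuple[Step, ...]

    def bind(self, **params: str) -> list[Step]:
        """Fill in placeholders to get a concrete plan; no model call is involved."""
        return [
            Step(s.action,
                 Template(s.target).substitute(params),
                 Template(s.value).substitute(params))
            for s in self.steps
        ]


export_invoices = Skill("export invoices", (
    Step("goto", "https://app.example.com/invoices"),
    Step("type", "month-filter", "$month"),
    Step("click", "export-pdf"),
))
plan = export_invoices.bind(month="2024-05")
print([(s.action, s.target, s.value) for s in plan])
```

Because the steps are stored as data, a later run only needs new parameter values. The planning work that produced the step list is paid for once.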

Lever 3: match re-phrased requests with a semantic cache

The catch with naive replay is that it only helps when the exact same request repeats. Real agents phrase things differently every time. The semantic dispatch cache matches on intent using a vector match, so "export the May invoices" and "download last month's bills as PDF" can hit the same compiled skill. As an illustrative figure, a cache hit is roughly 5x cheaper than a cold, model-driven run.

Dispatch against the cache:

```bash
curl -X POST https://twin-browser.com/api/v1/dispatch \
  -H "Authorization: Bearer $TWIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://app.example.com", "goal": "download last month'\''s bills as PDF" }'
# → matches a previously compiled "export invoices" skill, replays with 0 LLM calls
```
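For intuition about what matching on intent could look like, here is a minimal sketch of a vector-similarity lookup. It is not Twin's implementation: the class name, the 0.85 threshold, and the hand-written three-dimensional vectors are assumptions, and in practice the vectors would come from an embedding model.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticSkillCache:
    """Maps request embeddings to compiled skill names (hypothetical structure)."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def add(self, embedding: Sequence[float], skill_name: str) -> None:
        self._entries.append((list(embedding), skill_name))

    def lookup(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the best-matching skill at or above the threshold, else None."""
        best_name, best_score = None, self.threshold
        for vec, name in self._entries:
            score = cosine(embedding, vec)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


cache = SemanticSkillCache()
# Stand-in for the embedding of "export the May invoices".
cache.add([0.9, 0.1, 0.0], "export invoices")
# Stand-in for the embedding of "download last month's bills as PDF".
hit = cache.lookup([0.8, 0.2, 0.1])
# Stand-in for an unrelated request.
miss = cache.lookup([0.0, 0.1, 0.9])
print(hit, miss)
```

The threshold is the design choice that matters: set it too low and unrelated requests replay the wrong skill, set it too high and the cache behaves like an exact-match cache.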

Lever 4: pool skills across tenants

A cross-tenant skill corpus compounds the savings: a skill compiled once can be safely reused across tenants, so a common flow on a popular site may already be in the corpus the first time you run it. The more the network runs, the higher the hit rate, and the lower everyone's marginal cost.

What does the cost curve look like?

With the every-run model, cost per 1,000 runs is flat — you pay the planner each time. With compile-once + semantic cache, cost per run falls as the same and similar workflows repeat, because more of them resolve to a cached replay. The honest framing: the first run is not cheaper; the thousandth is dramatically cheaper.
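The shape of that curve follows from a simple blended-cost formula, sketched below. The $0.10 cold-run cost is an assumption, and the hit cost uses the illustrative 5x ratio from this guide rather than a measured figure.

```python
def blended_cost_per_run(hit_rate: float, cold_cost: float, hit_cost: float) -> float:
    """Expected cost of one run when a fraction hit_rate resolves to a cached replay."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * cold_cost + hit_rate * hit_cost


cold = 0.10     # assumed cost of a cold, model-driven run, in USD
hit = cold / 5  # illustrative 5x ratio
for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: ${blended_cost_per_run(rate, cold, hit):.3f} per run")
```

At a 0% hit rate the cost equals the every-run model. As the hit rate climbs with repeated and similar workflows, the expected cost per run approaches the cached-replay cost.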

Keep going

The mechanics behind this guide: the semantic dispatch cache, deterministic replay, and skill compilation. Plug Twin into MCP, LangChain, or the REST API, or see it applied to AI agents and RPA replacement. Weighing options? See how Twin compares.

How much cheaper is a cache hit?

A deterministic replay makes no LLM planning call at all, so the model cost is effectively zero. As an illustration, a cache hit runs roughly 5x cheaper than a cold, model-driven execution; the exact ratio depends on your flow length and model.

Does this only help for identical requests?

No. The dispatch cache matches semantically, so re-worded requests for the same underlying task still hit the compiled skill. That is the difference from an exact-match or parameter-hash cache.

How is LLM cost billed?

LLM cost is metered and passed through at 1x, with a transparent rate card at /api/v1/pricing. You pay usage-based credits on top; the entry plan starts at $29/mo and you can start free.

Run your first skill

Compile a task once, then replay it deterministically with zero LLM calls. Start free.