A semantic cache that matches what you meant, not what you typed
Re-phrased goals fuzzy-match a skill you already compiled, so the second request and every one after it skip the LLM entirely.
POST /api/v1/agents/run → x-twin-cache: hit
What semantic dispatch cache does
Most browser infra re-runs the model on every execution, so cost scales with usage. Twin embeds each goal and matches it against the skills you have already compiled. A near-match dispatches straight to a deterministic replay; only a genuine miss pays for a fresh compile.
Meaning, not string match
Goals are embedded and compared by intent, so "book the 9am" and "reserve the morning slot" hit the same compiled skill.
Tunable match threshold
Set how close a request must be to dispatch from cache. Tighten it for high-stakes flows; loosen it to compound savings on routine work.
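As a rough illustration of threshold-gated matching, the sketch below scores a goal embedding against stored skill embeddings and returns the nearest one only if it clears the threshold. Cosine similarity, the `CompiledSkill` shape, and the toy three-dimensional vectors are all assumptions for the example; the page only states that matching is based on embedding similarity.

```typescript
// Hypothetical shape of a compiled skill; field names are illustrative, not Twin's schema.
interface CompiledSkill {
  id: string;
  embedding: number[];
}

interface SkillMatch {
  skill: CompiledSkill;
  score: number;
}

// Cosine similarity between two equal-length vectors (assumed metric).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the best-scoring skill at or above the threshold, or null on a miss.
function matchSkill(
  goal: number[],
  skills: CompiledSkill[],
  threshold: number,
): SkillMatch | null {
  let best: SkillMatch | null = null;
  for (const skill of skills) {
    const score = cosineSimilarity(goal, skill.embedding);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { skill, score };
    }
  }
  return best;
}

// Toy data: real embeddings have hundreds of dimensions.
const skills: CompiledSkill[] = [
  { id: "book-slot", embedding: [0.9, 0.1, 0] },
  { id: "export-report", embedding: [0, 1, 0] },
];
const goal = [1, 0, 0];

const loose = matchSkill(goal, skills, 0.9); // matches "book-slot" (similarity ≈ 0.994)
const strict = matchSkill(goal, skills, 0.999); // null: the tighter threshold turns it into a miss
```

The same request flips from hit to miss purely by raising the threshold, which is the trade-off between savings and safety described above.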
Cost trends to zero
The first run compiles and pays full token cost. Every subsequent match replays for a flat credit, so the average cost per run falls toward that flat credit as volume rises.
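To make the cost curve concrete, here is a small sketch of the arithmetic, assuming one compile followed only by cache hits. The function name and the numbers (a compile costing 50 credits, a hit costing 1) are made up for illustration and are not Twin's pricing.

```typescript
// Average cost per run when the first run compiles and every later run is a hit.
// compileCost and hitCredit are in the same (hypothetical) credit unit.
function averageCostPerRun(compileCost: number, hitCredit: number, runs: number): number {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError("runs must be a positive integer");
  }
  return (compileCost + (runs - 1) * hitCredit) / runs;
}

const afterOne = averageCostPerRun(50, 1, 1); // 50
const afterTen = averageCostPerRun(50, 1, 10); // 5.9
const afterHundred = averageCostPerRun(50, 1, 100); // 1.49
```

The one-off compile cost is spread over more runs each time, so the average approaches the flat credit without ever dropping below it.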
Transparent on every call
Each response reports whether it was a cache hit, a miss, or a fresh compile, so you can see exactly what you paid for.
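A client could surface that signal by reading the response headers. The sketch below parses `x-twin-cache`, `x-twin-skill`, and `x-twin-llm-calls`, the three headers shown on this page. Only the value `hit` appears here, so the literal values `miss` and `compile`, and the `readCacheSignal` helper itself, are assumptions.

```typescript
type CacheOutcome = "hit" | "miss" | "compile" | "unknown";

interface CacheSignal {
  outcome: CacheOutcome;
  skill: string | null;
  llmCalls: number | null;
}

// Maps the raw header value to a known outcome; "miss" and "compile" are assumed spellings.
function parseOutcome(raw: string | null): CacheOutcome {
  switch (raw?.trim().toLowerCase()) {
    case "hit":
      return "hit";
    case "miss":
      return "miss";
    case "compile":
      return "compile";
    default:
      return "unknown";
  }
}

// Reads the cache signal from a standard Fetch API Headers object.
function readCacheSignal(headers: Headers): CacheSignal {
  const rawCalls = headers.get("x-twin-llm-calls");
  const parsedCalls = rawCalls === null ? Number.NaN : Number.parseInt(rawCalls, 10);
  return {
    outcome: parseOutcome(headers.get("x-twin-cache")),
    skill: headers.get("x-twin-skill"),
    llmCalls: Number.isNaN(parsedCalls) ? null : parsedCalls,
  };
}

// Example with hand-built headers rather than a live response.
const signal = readCacheSignal(
  new Headers({
    "x-twin-cache": "hit",
    "x-twin-skill": "book-slot@v3",
    "x-twin-llm-calls": "0",
  }),
);
```

With a real response from `fetch`, the same helper would be called as `readCacheSignal(response.headers)`.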
From a goal to deterministic action
1. Embed the goal: The incoming natural-language goal is embedded into the dispatch space.
2. Match against compiled skills: Twin searches your tenant (and, where enabled, the shared corpus) for the nearest compiled skill above your threshold.
3. Dispatch or compile: A match replays deterministically with zero LLM calls; a miss runs the planner once and compiles the result into a new skill.
4. Bill the difference: A hit costs a flat credit; a compile is metered at 1× LLM passthrough, so repeated work gets cheap, fast.
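The four steps above can be sketched as one control flow. Everything here is hypothetical: the `DispatchDeps` interface, the result fields, and the billing labels are invented for illustration, and the dependencies are kept synchronous for brevity where real embedding and planning calls would be asynchronous.

```typescript
interface Skill {
  id: string;
  embedding: number[];
}

// Injected dependencies, so the flow can be read (and tested) without a real backend.
interface DispatchDeps {
  embed(goal: string): number[];
  nearest(vector: number[]): { skill: Skill; score: number } | null;
  replay(skill: Skill): void;
  compile(goal: string, vector: number[]): { skill: Skill; llmCalls: number };
}

interface DispatchResult {
  cache: "hit" | "miss";
  skillId: string;
  llmCalls: number;
  billing: "flat-credit" | "llm-passthrough";
}

function dispatch(goal: string, threshold: number, deps: DispatchDeps): DispatchResult {
  // 1. Embed the goal.
  const vector = deps.embed(goal);

  // 2. Match against compiled skills.
  const match = deps.nearest(vector);

  // 3a. Hit: replay the compiled skill without calling the model.
  if (match !== null && match.score >= threshold) {
    deps.replay(match.skill);
    // 4. Bill a flat credit.
    return { cache: "hit", skillId: match.skill.id, llmCalls: 0, billing: "flat-credit" };
  }

  // 3b. Miss: run the planner once and compile the result into a new skill.
  const compiled = deps.compile(goal, vector);
  // 4. Bill the metered LLM usage.
  return {
    cache: "miss",
    skillId: compiled.skill.id,
    llmCalls: compiled.llmCalls,
    billing: "llm-passthrough",
  };
}
```

Passing the dependencies in keeps the branch logic separate from the embedding model, vector search, and planner, any of which could be swapped without touching the flow.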
See it on a real call
A re-phrased goal dispatches to an existing compiled skill — no model call, flat credit.
// Same goal, re-phrased: still a cache hit
const res = await twin.agents.run({
  goal: "Reserve the morning slot for Tuesday",
  url: "https://acme.example.com/calendar",
});
// → x-twin-cache: hit
// → x-twin-skill: book-slot@v3
// → x-twin-llm-calls: 0
What semantic dispatch cache is
The facts — how it works, what it costs, and the signal you get back on every call.
| Property | Twin Browser |
|---|---|
| Match basis | Embedding similarity (intent) |
| Threshold | Per-tenant, tunable |
| Scope | Tenant + opt-in shared corpus |
| Hit cost | Flat credit, 0 LLM calls |
| Miss cost | 1× metered LLM passthrough |
| Signal | x-twin-cache header on every call |
Semantic dispatch cache — common questions
How is this different from a normal response cache?
What happens on a cache miss?
Can I control how aggressive matching is?
The rest of the platform
Deterministic replay
A successful run compiles into a skill — an ordered, parameterized program of browser actions that replays the same way every time.
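One way to picture "an ordered, parameterized program of browser actions" is a list of typed steps with placeholders that are filled in at replay time. The `Action` union, the `{{name}}` placeholder syntax, and the selectors below are assumptions made for this sketch, not Twin's skill format.

```typescript
// Hypothetical action vocabulary for a compiled skill.
type Action =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string };

interface ReplaySkill {
  id: string;
  steps: Action[];
}

// Replaces {{name}} placeholders; a missing argument is an error rather than a silent blank.
function substitute(template: string, args: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    const value = args[name];
    if (value === undefined) throw new Error(`missing parameter: ${name}`);
    return value;
  });
}

// Produces the concrete, ordered steps for one run. Same skill and args give the same steps.
function bindSkill(skill: ReplaySkill, args: Record<string, string>): Action[] {
  return skill.steps.map((step): Action => {
    switch (step.kind) {
      case "goto":
        return { ...step, url: substitute(step.url, args) };
      case "click":
        return { ...step, selector: substitute(step.selector, args) };
      case "fill":
        return {
          ...step,
          selector: substitute(step.selector, args),
          value: substitute(step.value, args),
        };
    }
  });
}

const bookSlot: ReplaySkill = {
  id: "book-slot@v3",
  steps: [
    { kind: "goto", url: "https://acme.example.com/calendar" },
    { kind: "click", selector: "[data-slot='{{slot}}']" },
    { kind: "fill", selector: "#name", value: "{{name}}" },
  ],
};

const bound = bindSkill(bookSlot, { slot: "09:00", name: "Ada" });
```

Because binding is a pure function of the skill and its arguments, two runs with the same inputs yield identical step lists, which is the property that makes replay deterministic.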
Cross-tenant skill corpus
Common flows — log in, search, paginate, fill a form — get compiled once and shared through a sanitized corpus, so your agents start ahead.
Token-efficient DOM state
Twin compiles a live page into a numerically-indexed list of interactive elements under a token budget, so the planner sees signal, not markup.
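A minimal sketch of that idea: render each interactive element as one indexed line and stop once a token budget is spent. The element shape, the line format, and the four-characters-per-token estimate are all assumptions; a real implementation would use the planner model's tokenizer and likely a smarter priority order.

```typescript
// Hypothetical, already-extracted view of an interactive element.
interface InteractiveElement {
  tag: string;
  label: string;
}

// Rough estimate of about four characters per token (an assumption, not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Emits indexed lines in document order until the next line would exceed the budget.
function renderDomState(elements: InteractiveElement[], tokenBudget: number): string[] {
  const lines: string[] = [];
  let used = 0;
  for (let index = 0; index < elements.length; index++) {
    const element = elements[index];
    const line = `[${index}] ${element.tag} "${element.label}"`;
    const cost = estimateTokens(line);
    if (used + cost > tokenBudget) break;
    lines.push(line);
    used += cost;
  }
  return lines;
}

const page: InteractiveElement[] = [
  { tag: "button", label: "Book 9am" },
  { tag: "input", label: "Name" },
  { tag: "a", label: "Next page" },
];

// With a budget of 10 estimated tokens, the first two elements fit and the third is dropped.
const state = renderDomState(page, 10);
```

The planner can then refer to an element by its index instead of a selector, which keeps both the prompt and the model's reply short.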
Make every run cheaper than the last.
Start free, compile your first skill, and watch the marginal cost per run trend toward zero as your agents reuse what they have already learned.