CAPABILITY · CACHE

A semantic cache that matches what you meant, not what you typed

Re-phrased goals fuzzy-match a skill you already compiled, so the second request and every one after it skip the LLM entirely.

POST /api/v1/agents/run → x-twin-cache: hit
Built for the cost wedge

What the semantic dispatch cache does

Most browser infra re-runs the model on every execution, so cost scales with usage. Twin embeds each goal and matches it against the skills you have already compiled. A near-match dispatches straight to a deterministic replay; only a genuine miss pays for a fresh compile.

Meaning, not string match

Goals are embedded and compared by intent, so "book the 9am" and "reserve the morning slot" hit the same compiled skill.
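
As a rough illustration of comparing by intent rather than by string, the sketch below scores embedding vectors with cosine similarity. The three-dimensional vectors are made-up stand-ins, not real embeddings, and cosine similarity is an assumption here: this page does not say which metric or embedding model Twin uses.

```typescript
// Cosine similarity between two equal-length vectors; result lies in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as "no similarity"
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical toy embeddings: two phrasings of one intent, plus an unrelated goal.
const bookThe9am = [0.9, 0.1, 0.2];
const reserveMorningSlot = [0.85, 0.15, 0.25];
const exportInvoices = [0.1, 0.9, 0.3];

const sameIntent = cosineSimilarity(bookThe9am, reserveMorningSlot);
const differentIntent = cosineSimilarity(bookThe9am, exportInvoices);

// The two phrasings score far closer to each other than to the unrelated goal.
console.log(sameIntent > differentIntent);
```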

Tunable match threshold

Set how close a request must be to dispatch from cache. Tighten it for high-stakes flows; loosen it to compound savings on routine work.
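
To make the trade-off concrete, here is a minimal sketch of how one similarity score could land on either side of the line depending on the threshold. The `decide` function, the 0 to 1 scale, and the numbers are all hypothetical; the actual setting name and range are not given on this page.

```typescript
type Decision = "hit" | "miss";

// Hypothetical rule: dispatch from cache only when similarity reaches the threshold.
function decide(similarity: number, threshold: number): Decision {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0 and 1");
  }
  return similarity >= threshold ? "hit" : "miss";
}

const score = 0.9; // made-up similarity for a re-phrased goal

const strict = decide(score, 0.95); // tight threshold for a high-stakes flow
const relaxed = decide(score, 0.85); // looser threshold for routine work

console.log({ strict, relaxed });
```

With these made-up numbers the same request replays under the looser threshold and falls through to a fresh compile under the tighter one.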

Cost trends to zero

The first run compiles and pays full token cost. Every subsequent match replays for a flat credit, so the average cost per run falls as volume rises.
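
The arithmetic can be sketched as follows, assuming one compile followed only by cache hits. The credit amounts (50 for the compile, 1 per hit) are invented for illustration and are not Twin's pricing.

```typescript
// Average cost per run when the first run compiles and every later run is a hit.
function averageCostPerRun(compileCost: number, hitCredit: number, runs: number): number {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError("runs must be a positive integer");
  }
  const total = compileCost + (runs - 1) * hitCredit;
  return total / runs;
}

// Made-up prices: 50 credits to compile, 1 credit per replay.
const after1 = averageCostPerRun(50, 1, 1);
const after10 = averageCostPerRun(50, 1, 10);
const after100 = averageCostPerRun(50, 1, 100);

console.log(after1, after10, after100);
```

Under these assumptions the average falls from 50 credits on the first run toward the flat hit credit as the compile cost is spread over more runs.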

Transparent on every call

Each response reports whether it was a cache hit, a miss, or a fresh compile, so you can see exactly what you paid for.
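
A minimal sketch of reading that signal from response headers. The header names `x-twin-cache`, `x-twin-skill`, and `x-twin-llm-calls` come from the example on this page; the `readCacheSignal` helper, the `"unknown"` fallback, and the stubbed headers are assumptions for illustration, and only the `hit` value is confirmed here.

```typescript
// Minimal shape shared by the Fetch API's Headers object and the stub below.
interface HeaderReader {
  get(name: string): string | null;
}

interface CacheSignal {
  status: string; // value of x-twin-cache, e.g. "hit"
  skill: string | null; // value of x-twin-skill, if present
  llmCalls: number | null; // value of x-twin-llm-calls, if present and numeric
}

function readCacheSignal(headers: HeaderReader): CacheSignal {
  const rawCalls = headers.get("x-twin-llm-calls");
  const parsed = rawCalls === null ? NaN : Number.parseInt(rawCalls, 10);
  return {
    status: headers.get("x-twin-cache") ?? "unknown",
    skill: headers.get("x-twin-skill"),
    llmCalls: Number.isNaN(parsed) ? null : parsed,
  };
}

// Stub standing in for `response.headers` from a real call.
const sample = new Map<string, string>([
  ["x-twin-cache", "hit"],
  ["x-twin-skill", "book-slot@v3"],
  ["x-twin-llm-calls", "0"],
]);

const signal = readCacheSignal({
  get: (name) => sample.get(name.toLowerCase()) ?? null,
});

console.log(signal);
```

Because the helper only needs a `get` method, the `headers` property of a `fetch` response could be passed in directly.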

How it works

From a goal to deterministic action

  1. Embed the goal
     The incoming natural-language goal is embedded into the dispatch space.
  2. Match against compiled skills
     Twin searches your tenant (and, where enabled, the shared corpus) for the nearest compiled skill above your threshold.
  3. Dispatch or compile
     A match replays deterministically with zero LLM calls; a miss runs the planner once and compiles the result into a new skill.
  4. Bill the difference
     A hit costs a flat credit; a compile is metered at 1× LLM passthrough, so repeated work gets cheap, fast.
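
The steps above can be sketched as a small in-memory dispatcher. This is an illustration under stated assumptions, not Twin's implementation: the `Skill` and `Outcome` types, the `dispatch` function, the cosine metric, and the `compile` callback are all hypothetical, and the embedding step is skipped by passing toy vectors in directly.

```typescript
interface Skill {
  id: string;
  embedding: number[];
}

interface Outcome {
  cache: "hit" | "miss";
  skillId: string;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Find the nearest compiled skill; replay it if it clears the threshold,
// otherwise compile a new skill once and store it for later requests.
function dispatch(
  goalEmbedding: number[],
  skills: Skill[],
  threshold: number,
  compile: (embedding: number[]) => Skill,
): Outcome {
  let best: Skill | undefined;
  let bestScore = -Infinity;
  for (const skill of skills) {
    const score = cosine(goalEmbedding, skill.embedding);
    if (score > bestScore) {
      best = skill;
      bestScore = score;
    }
  }
  if (best !== undefined && bestScore >= threshold) {
    return { cache: "hit", skillId: best.id };
  }
  const created = compile(goalEmbedding);
  skills.push(created);
  return { cache: "miss", skillId: created.id };
}

// Toy data: one existing skill, and a stub that stands in for the planner.
const skills: Skill[] = [{ id: "book-slot@v3", embedding: [0.9, 0.1, 0.2] }];
let compiles = 0;
const compileStub = (embedding: number[]): Skill => ({
  id: `compiled-${++compiles}`,
  embedding,
});

const rephrased = dispatch([0.85, 0.15, 0.25], skills, 0.9, compileStub);
const unseen = dispatch([0.1, 0.9, 0.3], skills, 0.9, compileStub);
const unseenAgain = dispatch([0.1, 0.9, 0.3], skills, 0.9, compileStub);

console.log(rephrased, unseen, unseenAgain);
```

In this toy run the re-phrased goal replays the existing skill, the unseen goal triggers one compile, and repeating the unseen goal then hits the skill that compile produced.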
In practice

See it on a real call

A re-phrased goal dispatches to an existing compiled skill — no model call, flat credit.

run.ts
// Same goal, re-phrased — still a cache hit
const res = await twin.agents.run({
  goal: "Reserve the morning slot for Tuesday",
  url: "https://acme.example.com/calendar",
});

// → x-twin-cache: hit
// → x-twin-skill: book-slot@v3
// → x-twin-llm-calls: 0
At a glance

What the semantic dispatch cache is

The facts — how it works, what it costs, and the signal you get back on every call.

Property       Twin Browser
Match basis    Embedding similarity (intent)
Threshold      Per-tenant, tunable
Scope          Tenant + opt-in shared corpus
Hit cost       Flat credit, 0 LLM calls
Miss cost      1× metered LLM passthrough
Signal         x-twin-cache header on every call
FAQ

Semantic dispatch cache — common questions

How is this different from a normal response cache?
A response cache keys on an exact request. Twin matches on the meaning of the goal, so paraphrased and structurally different requests still reuse the same compiled skill.
What happens on a cache miss?
Twin runs the planner once, executes the task, and compiles the successful run into a new skill — so the miss pays for itself by making the next run a hit.
Can I control how aggressive matching is?
Yes. The match threshold is per-tenant. Raise it for sensitive flows where only a very close match should replay; lower it to maximize reuse on routine work.

Make every run cheaper than the last.

Start free, compile your first skill, and watch the marginal cost per run trend toward zero as your agents reuse what they have already learned.