Use case

QA & test automation

Author end-to-end tests as goals, run them deterministically, and replay every failure as session video.

The problem

End-to-end test suites are expensive to write and flaky to run. Selector churn breaks tests on every redesign, and when a run fails in CI you get a stack trace, not a recording of what the browser actually did. Maintaining the suite often costs more than the bugs it catches.

Trigger scenario from CI: done
Replay the compiled test skill: running
Capture durable session video: queued
Assert the expected outcome: queued
Report pass / fail: queued

A Twin run for QA & test automation: compile once, then replay on a cache hit.

The wedge

How Twin solves it

Twin lets you express a test as a goal, compiles it into a deterministic skill, and replays it the same way every time — so a pass is a pass and a failure is reproducible. Because skills plan over indexed DOM state rather than raw selectors, a cosmetic redesign doesn’t red-bar the suite. Every run is captured as live view plus durable session video, so a failure is a recording you can scrub, not a guess.

1. Write each scenario as a natural-language goal; Twin compiles it into a replayable skill.
2. Run skills deterministically in CI via the REST API with a Bearer key, with no flaky LLM-on-every-step loop.
3. A redesign shifts the indexed DOM map; intent-matched steps survive cosmetic churn instead of snapping on selectors.
4. Every run records to durable session video and real-time live view for instant triage.
5. Authenticated test accounts live in the credential vault, so login flows are covered without secrets in your test code.
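
To make the third step concrete, here is a toy model of intent-matched resolution. It is a minimal sketch, not Twin's implementation: the `IndexedElement` and `StepIntent` shapes and the `resolveStep` helper are hypothetical names invented for this illustration. It shows why a step keyed on what an element is and says can still resolve after a redesign changes class names and element order.

```typescript
// Hypothetical shapes for illustration only; not Twin's actual data model.
interface IndexedElement {
  index: number;    // position in the indexed DOM map
  role: string;     // e.g. "button", "textbox"
  label: string;    // visible or accessible name
  cssClass: string; // styling hook that tends to change in a redesign
}

interface StepIntent {
  role: string;
  label: string;
}

// Resolve a step by what the element is and says, ignoring styling hooks.
function resolveStep(map: IndexedElement[], intent: StepIntent): number | undefined {
  const wanted = intent.label.trim().toLowerCase();
  const hit = map.find(
    (el) => el.role === intent.role && el.label.trim().toLowerCase() === wanted,
  );
  return hit?.index;
}

// The same page before and after a cosmetic redesign: classes and order change.
const beforeRedesign: IndexedElement[] = [
  { index: 0, role: "textbox", label: "Email", cssClass: "input-v1" },
  { index: 1, role: "button", label: "Place order", cssClass: "btn-primary" },
];
const afterRedesign: IndexedElement[] = [
  { index: 0, role: "button", label: "Place order", cssClass: "cta cta--large" },
  { index: 1, role: "textbox", label: "Email", cssClass: "field" },
];

const placeOrder: StepIntent = { role: "button", label: "place order" };
console.log(resolveStep(beforeRedesign, placeOrder)); // 1
console.log(resolveStep(afterRedesign, placeOrder));  // 0
```

A selector such as `.btn-primary` would match nothing in the second layout, while the role-and-label lookup still finds the button at its new index.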

In practice

One call, then it gets cheaper

Author tests as goals and run them deterministically in CI. Every run records to durable session video, so a failure is a recording you can scrub.

run.ts

import Twin from '@twin-browser/sdk';

const twin = new Twin({ apiKey: process.env.TWIN_API_KEY });

// Run a compiled test skill from any CI runner with a Bearer key.
const run = await twin.agents.runSkill({
  skill: 'checkout-happy-path',
  url: 'https://staging.example.com',
  credentials: 'test-account',
});

console.log(run.status);       // 'completed'
console.log(run.assertions);   // [{ name: 'order placed', pass: true }]
console.log(run.videoUrl);     // durable session recording for triage
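
In a CI job the result usually has to become an exit code. The sketch below assumes the result shape implied by the fields logged above (`status`, `assertions`, `videoUrl`). The `RunResult`, `GateDecision`, and `gateRun` names are hypothetical, and status values other than 'completed' are not documented on this page.

```typescript
// Assumed result shape, inferred from the fields logged in the example above.
interface RunAssertion {
  name: string;
  pass: boolean;
}

interface RunResult {
  status: string; // 'completed' in the example; other values are an assumption
  assertions: RunAssertion[];
  videoUrl: string;
}

interface GateDecision {
  exitCode: 0 | 1;
  summary: string;
}

// Turn a run result into a CI exit code plus a one-line summary.
function gateRun(result: RunResult): GateDecision {
  const failed = result.assertions.filter((a) => !a.pass).map((a) => a.name);
  if (result.status === "completed" && failed.length === 0) {
    return { exitCode: 0, summary: `PASS (${result.assertions.length} assertions)` };
  }
  const reason = failed.length > 0 ? `failed: ${failed.join(", ")}` : `status: ${result.status}`;
  // Surface the recording so whoever triages the job can scrub the session.
  return { exitCode: 1, summary: `FAIL (${reason}) video: ${result.videoUrl}` };
}

// Placeholder data; the recording URL is made up for the example.
const sampleFailure: RunResult = {
  status: "completed",
  assertions: [{ name: "order placed", pass: false }],
  videoUrl: "https://example.com/recordings/run-123",
};
console.log(gateRun(sampleFailure).summary);
// FAIL (failed: order placed) video: https://example.com/recordings/run-123
```

In a Node-based CI script, assigning the decision's `exitCode` to `process.exitCode` is one way to fail the job while still letting the summary line print.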

What happens on this call

Twin compiles the goal into a deterministic, replayable skill.
The next re-phrased request matches it in the semantic dispatch cache.
Matched runs replay with zero LLM calls — credits drop back toward ~1.
Every call is authenticated, billed, and written to the audit log.

Under the hood

The machinery that bends the cost curve

Every use case runs on the same primitives — the wedge that makes browser work cheaper the more your agents run.

Semantic dispatch cache

Re-phrased requests fuzzy-match a skill you already compiled, so they skip the planner LLM entirely.
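
This page does not say how the fuzzy match is computed. As a rough illustration of the idea only, the sketch below uses token-overlap (Jaccard) similarity as a stand-in; a production system would likely use something more robust. The `CompiledSkill` shape, the `dispatch` helper, and the 0.5 threshold are all assumptions made for this example.

```typescript
// Toy stand-in for fuzzy matching; Twin's real matching method is not described on this page.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

interface CompiledSkill {
  skillName: string;
  goal: string; // the phrasing the skill was compiled from
}

// Return the best-matching compiled skill, or undefined to signal that a planner call is needed.
function dispatch(request: string, skills: CompiledSkill[], threshold = 0.5): CompiledSkill | undefined {
  const wanted = tokenize(request);
  let best: CompiledSkill | undefined;
  let bestScore = 0;
  for (const skill of skills) {
    const score = jaccard(wanted, tokenize(skill.goal));
    if (score >= threshold && score > bestScore) {
      best = skill;
      bestScore = score;
    }
  }
  return best;
}

const compiledSkills: CompiledSkill[] = [
  { skillName: "checkout-happy-path", goal: "complete checkout with a saved card" },
];

// Re-phrased request: 5 shared tokens out of 7 distinct, so it clears the threshold.
console.log(dispatch("complete the checkout with saved card", compiledSkills)?.skillName);
// checkout-happy-path

// Unrelated request: no shared tokens, so this would fall through to the planner.
console.log(dispatch("reset my password", compiledSkills)); // undefined
```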

Deterministic replay

Matched skills replay the same way every time — a pass is a pass, and the marginal cost trends toward zero.

Token-efficient DOM state

A live page becomes a compact, numerically-indexed map of interactive elements instead of raw HTML.
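
The exact map format is not shown on this page, so the following is only a guess at its general shape. The `PageNode` type and `toIndexedMap` helper are hypothetical. The sketch drops non-interactive content and gives each remaining element a numeric handle.

```typescript
// Simplified stand-in for a parsed page; real DOM nodes carry far more detail.
interface PageNode {
  tag: string;
  text: string;
  interactive: boolean;
}

// Keep only interactive elements and give each a numeric handle.
function toIndexedMap(nodes: PageNode[]): string[] {
  return nodes
    .filter((n) => n.interactive)
    .map((n, i) => `[${i}] ${n.tag} "${n.text}"`);
}

const checkoutPage: PageNode[] = [
  { tag: "div", text: "Your cart", interactive: false },
  { tag: "input", text: "Promo code", interactive: true },
  { tag: "p", text: "Free shipping over $50", interactive: false },
  { tag: "button", text: "Place order", interactive: true },
];

console.log(toIndexedMap(checkoutPage).join("\n"));
// [0] input "Promo code"
// [1] button "Place order"
```

A planner working from this view can refer to "element 1" instead of a selector, and the prompt carries two short lines rather than the page's full markup.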

Human-in-the-loop handoff

Blocked steps — approvals, MFA on an authorized flow — pause for a person, then resume cleanly.
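
One way to picture pause-and-resume is as a small state machine. This is a sketch under assumptions: the state and event names below are hypothetical, since Twin's real status values are not listed on this page.

```typescript
// Hypothetical run states and events, for illustration only.
type RunState = "running" | "awaiting_human" | "completed";
type RunEvent = "blocked" | "human_approved" | "steps_finished";

// Pure transition function: combinations that make no sense leave the state unchanged.
function nextState(state: RunState, event: RunEvent): RunState {
  if (state === "running" && event === "blocked") return "awaiting_human";
  if (state === "awaiting_human" && event === "human_approved") return "running";
  if (state === "running" && event === "steps_finished") return "completed";
  return state;
}

// A run that hits an MFA prompt, waits for a person, then finishes.
const handoffEvents: RunEvent[] = ["blocked", "steps_finished", "human_approved", "steps_finished"];
let handoffState: RunState = "running";
const handoffTrace: RunState[] = [handoffState];
for (const ev of handoffEvents) {
  handoffState = nextState(handoffState, ev);
  handoffTrace.push(handoffState);
}
console.log(handoffTrace.join(" -> "));
// running -> awaiting_human -> awaiting_human -> running -> completed
```

The second event is ignored on purpose: while a run is waiting for a person it cannot finish, which is what makes the later resume clean.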

The outcome

A flaky, selector-bound suite becomes a set of intent-driven skills that survive redesigns, with every failure reproducible from session video. The intended effect is less test-maintenance churn, with per-run cost kept low through deterministic replay.

Go deeper

Integrate with Playwright
Glossary: Deterministic replay
Glossary: Session video

FAQ

QA & test automation on Twin — common questions

Is Twin a replacement for Playwright or Cypress?

Twin complements them. You can keep your Playwright tests and add Twin for the high-churn, authenticated, intent-driven flows — Twin even integrates with Playwright. The difference is that Twin plans over indexed DOM state and replays deterministically, so cosmetic redesigns don’t break the run.

How do I debug a failed test run?

Every run is captured as durable session video plus real-time live view, so you scrub the exact browser session that failed instead of reconstructing it from a stack trace.

Can tests run in CI?

Yes. Call the REST API under /api/v1/* with a per-tenant Bearer key from any CI runner. Compiled skills replay deterministically, so test results are stable across runs.
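
For runners that call the REST API directly, the request might be assembled as below. The page gives only the `/api/v1/*` prefix and the Bearer scheme, so the `/api/v1/skills/run` path, the `https://api.twin.example` base URL, the sample key, and the JSON body fields are placeholders for illustration, not documented routes. The helper builds the request without sending it, which keeps it easy to test.

```typescript
// Minimal request description that can be handed to fetch or any HTTP client.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: { [header: string]: string };
  body: string;
}

// ASSUMPTION: the endpoint path and body fields are placeholders, not documented Twin routes.
function buildRunRequest(
  baseUrl: string,
  apiKey: string,
  skill: string,
  targetUrl: string,
): HttpRequestSpec {
  if (apiKey.length === 0) {
    throw new Error("Missing API key; set it as a CI secret rather than hard-coding it.");
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/skills/run`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ skill, url: targetUrl }),
  };
}

const ciRequest = buildRunRequest(
  "https://api.twin.example/",
  "tk_test_123",
  "checkout-happy-path",
  "https://staging.example.com",
);
console.log(ciRequest.url); // https://api.twin.example/api/v1/skills/run
console.log(ciRequest.headers["Authorization"]); // Bearer tk_test_123
```

Sending it is then a single call such as `fetch(ciRequest.url, { method: ciRequest.method, headers: ciRequest.headers, body: ciRequest.body })`, with the real key read from the CI runner's secret store.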

More ways teams use Twin

AI agents

Give your LLM agent a real browser it can drive — and stop paying the model on every single run.

Internal workflow automation

Automate the internal tools and vendor portals that have no API — with audit logging and human approval built in.

Accessibility automation

Drive web tasks on a user’s behalf and audit pages for accessibility — over a token-efficient view of the live DOM.

Put QA & test automation on autopilot.

Start free, compile your first skill, and watch the marginal cost per run trend toward zero.
