Use case

QA & test automation

Author end-to-end tests as goals, run them deterministically, and replay every failure as session video.

The problem

End-to-end test suites are expensive to write and flaky to run. Selector churn breaks tests on every redesign, and when a run fails in CI you get a stack trace, not a recording of what the browser actually did. Maintaining the suite often costs more than the bugs it catches.

app.example.com
  1. Trigger scenario from CI (done)
  2. Replay the compiled test skill (running)
  3. Capture durable session video (queued)
  4. Assert the expected outcome (queued)
  5. Report pass / fail (queued)

A Twin run for QA & test automation: compile once, then replay on a cache hit.

The wedge

How Twin solves it

Twin lets you express a test as a goal, compiles it into a deterministic skill, and replays it the same way every time — so a pass is a pass and a failure is reproducible. Because skills plan over indexed DOM state rather than raw selectors, a cosmetic redesign doesn’t red-bar the suite. Every run is captured as live view plus durable session video, so a failure is a recording you can scrub, not a guess.

  1. Write each scenario as a natural-language goal; Twin compiles it into a replayable skill.
  2. Run skills deterministically in CI via the REST API with a Bearer key, with no flaky LLM-on-every-step loop.
  3. A redesign shifts the indexed DOM map; intent-matched steps survive cosmetic churn instead of snapping on selectors.
  4. Every run records to durable session video and real-time live view for instant triage.
  5. Authenticated test accounts live in the credential vault, so login flows are covered without secrets in your test code.

In practice

One call, then it gets cheaper

Author tests as goals and run them deterministically in CI. Every run records to durable session video, so a failure is a recording you can scrub.

run.ts
import Twin from '@twin-browser/sdk';

const twin = new Twin({ apiKey: process.env.TWIN_API_KEY });

// Run a compiled test skill from any CI runner with a Bearer key.
const run = await twin.agents.runSkill({
  skill: 'checkout-happy-path',
  url: 'https://staging.example.com',
  credentials: 'test-account',
});

console.log(run.status);       // 'completed'
console.log(run.assertions);   // [{ name: 'order placed', pass: true }]
console.log(run.videoUrl);     // durable session recording for triage
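
To turn that result into a CI pass or fail, a small gate function is usually enough. The sketch below is illustrative only: the `RunResult` shape is an assumption inferred from the three fields logged above (`status`, `assertions`, `videoUrl`), and `gateRun` is a hypothetical helper, not part of the Twin SDK.

```typescript
// Hypothetical result shape, inferred only from the fields logged in run.ts.
interface AssertionResult {
  name: string;
  pass: boolean;
}

interface RunResult {
  status: string;
  assertions: AssertionResult[];
  videoUrl?: string;
}

interface Verdict {
  exitCode: 0 | 1;
  message: string;
}

// Decide whether the CI job should pass, and point at the recording if not.
function gateRun(run: RunResult): Verdict {
  const failed = run.assertions.filter((a) => !a.pass).map((a) => a.name);

  if (run.status === 'completed' && failed.length === 0) {
    return { exitCode: 0, message: 'All assertions passed.' };
  }

  const reason =
    failed.length > 0
      ? `Failed assertions: ${failed.join(', ')}`
      : `Run ended with status '${run.status}'`;
  const video = run.videoUrl ? ` Session video: ${run.videoUrl}` : '';

  return { exitCode: 1, message: `${reason}.${video}` };
}
```

In a Node.js CI script you would log `gateRun(run).message` and assign the `exitCode` to `process.exitCode`, so a red build always carries a link to the session video.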

What happens on this call

  • Twin compiles the goal into a deterministic, replayable skill.
  • The next re-phrased request matches it in the semantic dispatch cache.
  • Matched runs replay with zero LLM calls — credits drop back toward ~1.
  • Every call is authenticated, billed, and written to the audit log.

Read the API docs

The outcome

A flaky, selector-bound suite becomes a set of intent-driven skills that survive redesigns, with every failure reproducible from session video. As an illustration rather than a measured result, that means less test-maintenance churn, while deterministic replay keeps per-run cost low.

FAQ

QA & test automation on Twin — common questions

Is Twin a replacement for Playwright or Cypress?
Twin complements them. You can keep your Playwright tests and add Twin for the high-churn, authenticated, intent-driven flows; Twin even integrates with Playwright. The difference is that Twin plans over indexed DOM state and replays deterministically, so cosmetic redesigns don’t break the run.

How do I debug a failed test run?
Every run is captured as durable session video plus real-time live view, so you scrub the exact browser session that failed instead of reconstructing it from a stack trace.

Can tests run in CI?
Yes. Call the REST API under /api/v1/* with a per-tenant Bearer key from any CI runner. Compiled skills replay deterministically, so test results are stable across runs.
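
For runners that call the REST API directly instead of using the SDK, the request might be assembled roughly as follows. This is a sketch under stated assumptions: only the /api/v1/ prefix and the Bearer key come from the answer above. The `/skills/run` path, the JSON body fields (which simply mirror the SDK call in run.ts), and the `buildRunRequest` helper are hypothetical placeholders; check the API docs for the real route and payload.

```typescript
// ASSUMPTION: everything after the /api/v1/ prefix is a placeholder, not a documented route.
const HYPOTHETICAL_RUN_PATH = '/api/v1/skills/run';

// ASSUMPTION: body fields mirror the SDK call in run.ts; the real payload may differ.
interface SkillRunInput {
  skill: string;
  url: string;
  credentials?: string;
}

interface PreparedRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Build (but do not send) an authenticated request, so it can be inspected or unit-tested.
function buildRunRequest(
  baseUrl: string,
  apiKey: string,
  input: SkillRunInput,
): PreparedRequest {
  if (!apiKey) {
    throw new Error('Missing API key; provide it as a CI secret.');
  }
  return {
    url: new URL(HYPOTHETICAL_RUN_PATH, baseUrl).toString(),
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`, // per-tenant Bearer key
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(input),
    },
  };
}
```

The prepared request can then be passed to `fetch(url, init)` from the CI job. Keeping construction separate from sending means the key never has to appear in test code, only in the runner's secret store.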

Put QA & test automation on autopilot.

Start free, compile your first skill, and watch the marginal cost per run trend toward zero.