QA & test automation
Author end-to-end tests as goals, run them deterministically, and replay every failure as session video.
The problem
End-to-end test suites are expensive to write and flaky to run. Selector churn breaks tests on every redesign, and when a run fails in CI you get a stack trace, not a recording of what the browser actually did. Maintaining the suite often costs more than the bugs it catches.
- Trigger scenario from CI (done)
- Replay the compiled test skill (running)
- Capture durable session video (queued)
- Assert the expected outcome (queued)
- Report pass / fail (queued)
A Twin run for QA & test automation: compile once, then replay on a cache hit.
How Twin solves it
Twin lets you express a test as a goal, compiles it into a deterministic skill, and replays it the same way every time — so a pass is a pass and a failure is reproducible. Because skills plan over indexed DOM state rather than raw selectors, a cosmetic redesign doesn’t red-bar the suite. Every run is captured as live view plus durable session video, so a failure is a recording you can scrub, not a guess.
1. Write each scenario as a natural-language goal; Twin compiles it into a replayable skill.
2. Run skills deterministically in CI via the REST API with a Bearer key, with no flaky LLM-on-every-step loop.
3. A redesign shifts the indexed DOM map; intent-matched steps survive cosmetic churn instead of snapping on selectors.
4. Every run records to durable session video and real-time live view for instant triage.
5. Authenticated test accounts live in the credential vault, so login flows are covered without secrets in your test code.
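Step 2 above can be sketched without the SDK. The snippet below only builds the HTTP request a CI job might send. The `/v1/skills/run` path, the base URL, and the body field names are assumptions for illustration, not Twin's documented REST API; only the Bearer-key header follows from the text.

```typescript
// Hypothetical request shape: the endpoint path and body fields are assumed,
// not taken from Twin's REST reference.
interface SkillRunRequest {
  skill: string;
  url: string;
  credentials?: string; // name of a vault entry, never the secret itself
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSkillRunRequest(
  baseUrl: string,
  apiKey: string,
  run: SkillRunRequest,
): HttpRequest {
  if (!apiKey) {
    // Fail fast in CI instead of sending an unauthenticated request.
    throw new Error("API key is not set");
  }
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/skills/run`, // assumed path
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(run),
  };
}
```

A CI step could pass the result to `fetch(req.url, req)`, reading the key from an environment variable so it never appears in test code.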
One call, then it gets cheaper
Author tests as goals and run them deterministically in CI. Every run records to durable session video, so a failure is a recording you can scrub.
import Twin from '@twin-browser/sdk';

const twin = new Twin({ apiKey: process.env.TWIN_API_KEY });

// Run a compiled test skill from any CI runner with a Bearer key.
const run = await twin.agents.runSkill({
  skill: 'checkout-happy-path',
  url: 'https://staging.example.com',
  credentials: 'test-account',
});

console.log(run.status); // 'completed'
console.log(run.assertions); // [{ name: 'order placed', pass: true }]
console.log(run.videoUrl); // durable session recording for triage

What happens on this call
- Twin compiles the goal into a deterministic, replayable skill.
- The next re-phrased request matches it in the semantic dispatch cache.
- Matched runs replay with zero LLM calls — credits drop back toward ~1.
- Every call is authenticated, billed, and written to the audit log.
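In CI, the run object logged in the sample above still has to become a pass or fail exit code. The sketch below is one way to do that. The field names mirror that sample; treating any status other than 'completed' as a failure is an assumption, since the page shows only the 'completed' value.

```typescript
// Shapes inferred from the console.log lines in the sample, not from an API reference.
interface Assertion {
  name: string;
  pass: boolean;
}

interface RunResult {
  status: string;
  assertions: Assertion[];
  videoUrl: string;
}

interface GateResult {
  exitCode: 0 | 1;
  message: string;
}

function gateOnRun(run: RunResult): GateResult {
  const failed = run.assertions.filter((a) => !a.pass).map((a) => a.name);
  if (run.status === "completed" && failed.length === 0) {
    return { exitCode: 0, message: `PASS (${run.assertions.length} assertions)` };
  }
  // Assumption: a non-'completed' status is reported as a failure.
  const reason =
    failed.length > 0
      ? `failed assertions: ${failed.join(", ")}`
      : `run status: ${run.status}`;
  // Include the recording link so triage starts from the video.
  return { exitCode: 1, message: `FAIL (${reason}); replay: ${run.videoUrl}` };
}
```

A runner script would print `message` and exit with `exitCode`, which puts the session video link directly in the build log.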
The machinery that bends the cost curve
Every use case runs on the same primitives — the wedge that makes browser work cheaper the more your agents run.
Semantic dispatch cache
Re-phrased requests fuzzy-match a skill you already compiled, so they skip the planner LLM entirely.
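To make the idea concrete, here is a toy dispatch cache. It uses word-overlap (Jaccard) similarity as a stand-in for semantic matching. The page does not say how Twin scores matches, so the scoring method, the `DispatchCache` class, and the 0.6 threshold are all illustrative assumptions.

```typescript
// Toy stand-in for semantic matching: compare the sets of words in two goals.
function tokenize(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

class DispatchCache {
  private entries: { goal: Set<string>; skill: string }[] = [];
  private threshold: number;

  constructor(threshold = 0.6) {
    this.threshold = threshold; // assumed cut-off, chosen for the example
  }

  // Remember which compiled skill answers a given goal.
  store(goal: string, skill: string): void {
    this.entries.push({ goal: tokenize(goal), skill });
  }

  // Returns the best-matching skill name, or null when nothing clears the
  // threshold and the request would have to go to the planner.
  lookup(request: string): string | null {
    const wanted = tokenize(request);
    let best: string | null = null;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = jaccard(wanted, entry.goal);
      if (score >= bestScore) {
        best = entry.skill;
        bestScore = score;
      }
    }
    return best;
  }
}
```

With a goal stored as "buy the cheapest item and check out", a request phrased as "buy the cheapest item then check out" still resolves to the same skill, while an unrelated request such as "reset my password" misses and returns null.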
Deterministic replay
Matched skills replay the same way every time — a pass is a pass, and the marginal cost trends toward zero.
Token-efficient DOM state
A live page becomes a compact, numerically-indexed map of interactive elements instead of raw HTML.
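As a rough picture of what such a map could look like, the sketch below filters a page down to its interactive elements and numbers them. The `PageElement` shape and the `[index] role "label"` line format are assumptions for illustration; the page does not specify Twin's actual representation.

```typescript
// Hypothetical, simplified view of a page element.
interface PageElement {
  tag: string;
  role?: string;
  label: string;
  interactive: boolean;
}

// Keep only interactive elements and give each a stable numeric index,
// so a planner can refer to "element 1" instead of a CSS selector.
function toIndexedMap(elements: PageElement[]): string[] {
  return elements
    .filter((el) => el.interactive)
    .map((el, i) => `[${i}] ${el.role ?? el.tag} "${el.label}"`);
}

const page: PageElement[] = [
  { tag: "button", label: "Add to cart", interactive: true },
  { tag: "p", label: "Free shipping on all orders", interactive: false },
  { tag: "a", role: "link", label: "Checkout", interactive: true },
];

console.log(toIndexedMap(page).join("\n"));
// [0] button "Add to cart"
// [1] link "Checkout"
```

Static text drops out of the map entirely, which is where the token savings over raw HTML would come from.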
Human-in-the-loop handoff
Blocked steps — approvals, MFA on an authorized flow — pause for a person, then resume cleanly.
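The pause-and-resume behaviour can be modelled as a small state machine. This is a minimal sketch under assumed names (`Step`, `RunState`, `advance`), not Twin's implementation: the run stops at the first step that needs a person and continues from that same step once approval arrives.

```typescript
interface Step {
  name: string;
  needsHuman: boolean; // e.g. an approval or an MFA prompt
}

interface RunState {
  status: "running" | "paused" | "completed";
  next: number; // index of the next step to execute
  log: string[]; // names of steps already executed
}

// Execute steps from state.next onward. A step that needs a human pauses the
// run unless humanApproved is true; the approval covers only that one step.
function advance(steps: Step[], state: RunState, humanApproved = false): RunState {
  const log = [...state.log];
  let next = state.next;
  let approved = humanApproved;
  for (const step of steps.slice(state.next)) {
    if (step.needsHuman && !approved) {
      return { status: "paused", next, log };
    }
    log.push(step.name);
    approved = false;
    next++;
  }
  return { status: "completed", next, log };
}
```

Calling `advance` on a fresh state runs until the blocked step and returns `paused`; calling it again with `humanApproved` set to `true` resumes at that step without repeating earlier ones.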
The outcome
A flaky, selector-bound suite becomes a set of intent-driven skills that survive redesigns, and every failure is reproducible from session video. The expected effect (illustrative, not a measured figure) is less test-maintenance churn, with per-run cost kept low by deterministic replay.
QA & test automation on Twin — common questions
Is Twin a replacement for Playwright or Cypress?
How do I debug a failed test run?
Can tests run in CI?
More ways teams use Twin
AI agents
Give your LLM agent a real browser it can drive — and stop paying the model on every single run.
Internal workflow automation
Automate the internal tools and vendor portals that have no API — with audit logging and human approval built in.
Accessibility automation
Drive web tasks on a user’s behalf and audit pages for accessibility — over a token-efficient view of the live DOM.
Put QA & test automation on autopilot.
Start free, compile your first skill, and watch the marginal cost per run trend toward zero.