Playwright Best Practices: 8 Patterns for Stable E2E (2026)

Playwright best practices (2026 edition): Reach for getByRole first — it is what Playwright's official docs recommend and it doubles as an accessibility check. Fall back to getByLabel, getByText, and getByTestId before you consider raw CSS or XPath. Configure expect timeouts per assertion rather than globally. Shard in CI, use workers locally. Build fixtures around business actions, not raw Playwright APIs. Open the Trace Viewer before you touch flaky test code. In CI, set explicit action timeouts and take screenshots on failure. Store authenticated state with storageState and never re-login in every test. In 2026, skip the Page Object Model for small-to-medium suites. Composable fixtures give you the same isolation with less ceremony. Every practice in this article has a runnable companion spec.

When we analyzed flakiness across Playwright suites, the failures clustered around three root causes: selector fragility, authentication overhead, and misconfigured timeouts. Not framework bugs. Not infrastructure variance. Configuration choices that looked reasonable at test-zero and became expensive at test-four-hundred.

The selector problem alone accounts for roughly half of all "it worked last week" failures. CSS selectors break silently on redesigns. Text selectors fail the moment a PM updates copy or the app goes multilingual. The fix is simple and permanent, but many teams treat every selector strategy as an equally valid option and learn the hard way which ones hold.

The eight practices below address each failure class directly, ordered by how often we see them cause production-level test debt. Each one has a runnable spec so you can benchmark the difference on your own codebase.

1. Playwright Selector Strategy: The Hierarchy That Actually Holds

The Playwright docs rank locators for you explicitly: getByRole first, then getByLabel, getByPlaceholder, getByText, getByAltText, getByTitle. getByTestId is a deliberate fallback for the cases where nothing else works, and CSS or XPath sit below that. This is the hierarchy that produces the most stable suites, and it matches how real users identify elements.

Priority	Selector Type	When to Use	Stability
1	getByRole	Buttons, links, headings, form controls — the default for anything accessible	Highest
2	getByLabel / getByPlaceholder	Form inputs identified by visible labels or placeholders	High
3	getByText / getByAltText / getByTitle	Unique visible copy, image alt text, tooltip targets	Medium
4	getByTestId (data-testid)	Deliberate fallback when nothing user-facing fits	Medium-high
5	CSS / XPath	Last resort only	Low

Figure: Selector stability hierarchy in five tiers, from most stable at the top to least stable at the bottom.

getByRole wins because it expresses intent ("this is a submit button") rather than implementation, and the match is anchored in the accessibility tree. When a designer moves the button or renames the class, the test still passes. The rule doubles as an accessibility check: if getByRole('button', { name: 'Submit' }) cannot find your button, a screen reader probably cannot either.

getByLabel and getByPlaceholder are the right choice for form inputs. They mirror how users actually identify fields ("the one labeled Email") and survive styling refactors cleanly. getByText handles static links, headings, and unique visible copy. All three rank below getByRole because roles are more semantically precise.

getByTestId is the explicit fallback when nothing user-facing will do: a generic <div> you cannot give a role, an element whose copy is likely to change, a third-party widget that does not expose anything meaningful to assistive tech. The Playwright docs are explicit that data-testid is "not user facing" and should be reached for last among Playwright's own locators. Adding test IDs everywhere is a smell; using them where roles and labels do not exist is the right move.

CSS and XPath sit below even getByTestId. They couple your tests to implementation details: class names, DOM structure, nesting depth. Use them only when every user-facing locator is genuinely unavailable, and scope them as tightly as possible. If you find yourself writing a long XPath expression, that is a signal to add a semantic role or a test ID instead.

Here is how four of these strategies (test ID, role, CSS, and XPath) look in practice, with the same interaction written each way:

// @ts-check
import { test, expect } from "@playwright/test";

/**
 * Pattern 1 — Selector strategy
 *
 * The same "click login, verify dashboard" interaction expressed four ways.
 * `data-testid` and `role` survive component renames; CSS and XPath break.
 *
 * Run:
 *   npx playwright test src/01-selector-strategy.spec.js
 *
 * Requires a local server on port 3000 with:
 *   - A page at "/" containing a login button
 *   - A "/dashboard" page shown after clicking login
 */

test.describe("Selector strategy comparison", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto("/");
  });

  test("data-testid — survives renames", async ({ page }) => {
    const loginButton = page.getByTestId("login-button");
    await expect(loginButton).toBeVisible();
    await loginButton.click();
    await expect(page).toHaveURL(/.*dashboard/);
    await expect(page.getByTestId("welcome-message")).toBeVisible();
  });

  test("role locator — survives renames", async ({ page }) => {
    const loginButton = page.getByRole("button", { name: "Log in" });
    await expect(loginButton).toBeVisible();
    await loginButton.click();
    await expect(page).toHaveURL(/.*dashboard/);
    await expect(
      page.getByRole("heading", { name: /welcome/i })
    ).toBeVisible();
  });

  test("CSS selector — breaks on component rename", async ({ page }) => {
    /*
     * This selector is tightly coupled to the class name.
     * If the component is renamed from "LoginButton" to "AuthButton",
     * the class name changes and this test breaks.
     */
    const loginButton = page.locator("button.LoginButton__primary");
    await expect(loginButton).toBeVisible();
    await loginButton.click();
    await expect(page).toHaveURL(/.*dashboard/);
    await expect(page.locator("h1.Dashboard__welcome")).toBeVisible();
  });

  test("XPath — breaks on DOM restructure", async ({ page }) => {
    /*
     * This XPath depends on an ancestor <div> whose class attribute is
     * exactly "header". Adding a second class to that <div>, or moving
     * the button outside it, breaks the locator.
     */
    const loginButton = page.locator(
      'xpath=//div[@class="header"]//button[contains(text(),"Log in")]'
    );
    await expect(loginButton).toBeVisible();
    await loginButton.click();
    await expect(page).toHaveURL(/.*dashboard/);
    await expect(
      page.locator('xpath=//main//h1[contains(text(),"Welcome")]')
    ).toBeVisible();
  });
});

The spec makes the stability difference concrete: the role and data-testid versions survive a full component refactor. The CSS version breaks when a class is renamed, and the XPath version breaks when the header markup it depends on changes.

2. Retry Logic: Stop Fighting Timeouts, Start Configuring Them

Most flaky Playwright tests are not actually flaky. They are tests with wrong timeout assumptions.

The default per-test timeout is 30 seconds. The default expect timeout, which controls how long Playwright retries an assertion before failing, is 5 seconds. The default actionTimeout and navigationTimeout are both 0 — meaning no per-action cap at all. Most teams never change these defaults, then wonder why tests that target slow animations or API-dependent UI randomly fail.

The right approach is the opposite of the instinct. Do not raise the global test timeout. Instead, set explicit per-action and per-assertion bounds (actionTimeout: 8000, expect: { timeout: 8000 } is usually enough for a fast app) and override individually where the UI genuinely needs more time. This forces intentional reasoning: "this specific assertion on this specific element takes up to 12 seconds because it waits for a background job." That comment in the test is far more useful than a blanket 30-second test timeout hiding every slow spot.
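As a minimal sketch, the bounds described above might be expressed in playwright.config.js like this. The 8-second values come from the paragraph above and the 30-second test timeout is Playwright's default written out for clarity. The 15-second navigation cap is an assumption chosen for illustration, not a recommendation from the companion repo.

```javascript
// playwright.config.js
// @ts-check

/** @type {import('@playwright/test').PlaywrightTestConfig} */
const config = {
  // Per-test ceiling. Left at Playwright's default of 30 s on purpose.
  timeout: 30_000,

  expect: {
    // How long web-first assertions such as toBeVisible() keep retrying.
    timeout: 8_000,
  },

  use: {
    // Cap for a single action such as click() or fill(). Default is 0 (no cap).
    actionTimeout: 8_000,
    // Cap for page.goto() and other navigations. Default is 0 (no cap).
    // 15 s is an assumed value; tune it to your slowest legitimate page.
    navigationTimeout: 15_000,
  },
};

export default config;
```

Individual assertions can still pass their own `{ timeout: ... }` when a specific element genuinely needs longer, which keeps the exception visible next to the code that needs it.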

test.describe.configure({ retries: 2 }) is the other lever most teams either overuse or never touch. Retries are not a fix for flakiness. They are a band-aid that hides the root cause and inflates CI time. Use them sparingly: only for tests that interact with genuinely non-deterministic external systems (email delivery, third-party OAuth, payment processors). For everything else, a retry budget of 0 forces you to find the real issue.

The companion spec shows three patterns: a tight global timeout with per-assertion overrides, selective retry configuration scoped to a single describe block, and the expect.poll pattern for waiting on eventually-consistent state changes:

// @ts-check
import { test, expect } from "@playwright/test";

/**
 * Pattern 2 — Retry logic and timeouts
 *
 * Demonstrates:
 *   1. Tight global timeouts (actionTimeout and expect timeout set in playwright.config.js at 8 s)
 *   2. Per-assertion timeout overrides
 *   3. describe-scoped retries for inherently racy specs
 *   4. expect.poll() for eventually-consistent UI
 *
 * Run:
 *   npx playwright test src/02-retry-logic.spec.js
 */

test.describe("Retry logic — tight timeouts", () => {
  test("assertion-level timeout override", async ({ page }) => {
    await page.goto("/");

    /*
     * The global expect timeout is 8 s (see playwright.config.js).
     * For a lazy-loaded element that takes longer, override per-assertion
     * rather than inflating the global timeout for every test.
     */
    const lazySection = page.locator('[data-testid="lazy-section"]');
    await expect(lazySection).toBeVisible({ timeout: 12_000 });
  });

  test("expect.poll — waiting for an eventually-consistent counter", async ({
    page,
  }) => {
    await page.goto("/");

    /*
     * expect.poll re-runs the callback until the assertion passes or the
     * timeout expires. With the intervals below it waits 250 ms, then
     * 500 ms, then 1 s between every later attempt.
     * This is ideal for elements that update asynchronously.
     */
    await expect
      .poll(
        async () => {
          const counterText = await page
            .locator('[data-testid="live-counter"]')
            .textContent();
          return parseInt(counterText || "0", 10);
        },
        {
          message: "Counter should reach at least 5",
          intervals: [250, 500, 1_000],
          timeout: 6_000,
        }
      )
      .toBeGreaterThanOrEqual(5);
  });
});

test.describe("Describe-scoped retries for racy specs", () => {
  /*
   * Instead of retrying every test globally, scope retries to the
   * describe block that contains the inherently racy interaction.
   * This keeps fast specs fast while giving flaky ones a second chance.
   */
  test.describe.configure({ retries: 2 });

  test("animation-dependent assertion", async ({ page }) => {
    await page.goto("/");

    const toast = page.locator('[data-testid="toast-notification"]');
    await page.locator('[data-testid="trigger-toast"]').click();

    /*
     * Toasts often depend on CSS animations whose duration varies
     * slightly between runs. The assertions below already auto-wait;
     * the describe-scoped retries only absorb what is left over.
     */
    await expect(toast).toBeVisible();
    await expect(toast).toContainText("Saved");
  });
});

expect.poll deserves special attention. It is Playwright's built-in solution for assertions that need to repeatedly check a condition: polling an API endpoint until a status changes, waiting for a background job to complete, watching a counter increment. Using a while loop with page.waitForTimeout is the wrong pattern here; expect.poll gives you automatic retry, configurable intervals, and a clean failure message.
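The API-polling case mentioned above could look roughly like the sketch below. The page, button label, endpoint, and response shape are all hypothetical; only `expect.poll` and `page.request` are Playwright APIs. It assumes a `baseURL` is set in the config so relative URLs resolve.

```javascript
// @ts-check
import { test, expect } from "@playwright/test";

test("export job eventually finishes", async ({ page }) => {
  // Hypothetical page that starts a background job.
  await page.goto("/exports");
  await page.getByRole("button", { name: "Start export" }).click();

  // Anti-pattern, shown only for contrast:
  //   while (status !== "done") { await page.waitForTimeout(1000); ... }

  await expect
    .poll(
      async () => {
        // page.request shares cookies with the page, so the call is
        // made as the same user who started the job.
        const response = await page.request.get("/api/exports/latest");
        const body = await response.json();
        return body.status; // assumed response shape: { status: string }
      },
      {
        message: "Export job should reach status 'done'",
        // Up to 12 s because the job runs in the background.
        timeout: 12_000,
      }
    )
    .toBe("done");
});
```

The timeout lives next to the one assertion that needs it, and the failure message names the condition that was never met.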

The Flaky Tests Consume 20% of CI Time analysis makes this concrete with real cost numbers: a suite with a 3% flake rate and 40-minute CI runs burns roughly one full engineering day per week on reruns alone. Tight timeouts and zero-tolerance retries are not just cleanliness. They are a budget decision.

3. Parallelism: Sharding vs. Workers and When Each Matters

Figure: Workers versus sharding, with one machine running parallel lanes on the left and multiple distributed machines on the right.

Playwright gives you two independent parallelism axes: workers (parallel execution within a single machine) and sharding (splitting the test suite across multiple CI machines). Teams routinely confuse them or try to use one when they need the other.

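Workers are local-first. Playwright defaults to half the logical CPU cores, so an 8-core developer machine runs four workers unless you pass --workers. Going from one worker to four can cut wall-clock time substantially for an independent suite, though rarely by the full factor of four. By default, Playwright parallelizes across files, and tests inside one file run in order in the same worker unless fullyParallel is enabled. The constraint is that tests must be fully isolated: no shared browser state, no shared database rows, no implicit ordering. If your tests were written to run sequentially (common in older suites), adding workers will surface every hidden dependency.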

Sharding is CI-first. Once your suite exceeds roughly 5 minutes on a single machine, sharding across multiple parallel CI runners is the right scaling move. Each runner invokes the same command with its own index (--shard=1/4 on the first machine, --shard=2/4 on the second, through --shard=4/4), and Playwright splits the test files across the four machines. Each shard runs its own subset with full worker parallelism. In the ideal case, a 20-minute suite on a single machine becomes roughly a 5-minute suite across four shards.

The practical advice: start with workers and get test isolation right first. Then, when the suite grows past the single-machine ceiling, add sharding without changing how tests are written. Sharding a suite that has hidden sequential dependencies will produce intermittent failures that are extremely difficult to debug across machines.
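A minimal config sketch for the "workers first" step might look like this. The option names are Playwright's; the CI worker count of 2 is an assumed value that should match the size of your runner.

```javascript
// playwright.config.js
// @ts-check

/** @type {import('@playwright/test').PlaywrightTestConfig} */
const config = {
  // Also run tests inside a single file in parallel (off by default).
  // Only safe once every test is fully isolated.
  fullyParallel: true,

  // Locally: undefined keeps Playwright's default (half the logical cores).
  // In CI: pin a value that fits the runner; 2 is an assumption.
  workers: process.env.CI ? 2 : undefined,
};

export default config;
```

Sharding needs no config change on top of this. It is selected per CI job with the `--shard` flag, which is why getting isolation right at the worker level comes first.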

// @ts-check
import { test, expect } from "@playwright/test";

/**
 * Pattern 3 — Parallelism and sharding
 *
 * Each test is fully isolated — no shared state between workers.
 * Run with parallelism:
 *   npx playwright test src/03-parallelism.spec.js --workers=4
 *
 * Or shard across two CI nodes:
 *   npx playwright test --shard=1/2
 *   npx playwright test --shard=2/2
 */

test.describe("Parallelism — isolated test suite A", () => {
  test("creates a new user and verifies profile", async ({ page }) => {
    const uniqueEmail = `user-${Date.now()}-a@example.com`;

    await page.goto("/signup");
    await page.getByLabel("Email").fill(uniqueEmail);
    await page.getByLabel("Password").fill("Test1234!");
    await page.getByRole("button", { name: "Sign up" }).click();

    await expect(page).toHaveURL(/.*profile/);
    await expect(page.getByText(uniqueEmail)).toBeVisible();
  });

  test("searches the product catalog", async ({ page }) => {
    await page.goto("/products");

    await page.getByPlaceholder("Search products").fill("keyboard");
    await page.getByRole("button", { name: "Search" }).click();

    const results = page.locator('[data-testid="product-card"]');
    await expect(results).not.toHaveCount(0);
    await expect(results.first()).toContainText(/keyboard/i);
  });
});

test.describe("Parallelism — isolated test suite B", () => {
  test("adds an item to the cart", async ({ page }) => {
    await page.goto("/products");

    const firstProduct = page.locator('[data-testid="product-card"]').first();
    await firstProduct.getByRole("button", { name: "Add to cart" }).click();

    const cartBadge = page.locator('[data-testid="cart-badge"]');
    await expect(cartBadge).toHaveText("1");
  });

  test("navigates the footer links", async ({ page }) => {
    await page.goto("/");

    const footer = page.locator("footer");
    const privacyLink = footer.getByRole("link", { name: "Privacy Policy" });
    await privacyLink.click();

    await expect(page).toHaveURL(/.*privacy/);
    await expect(
      page.getByRole("heading", { name: /privacy policy/i })
    ).toBeVisible();
  });
});

One thing the Playwright docs do not emphasize: shard count should match your CI machine count, not your CPU count. Adding four shards when you only have two CI runners means two machines each running two shards sequentially, so the net result is no speedup over two shards, just more configuration complexity.
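The arithmetic behind that claim can be sketched with an idealized model. The function name and its assumptions (equal-sized shards, no job startup cost) are illustrative only; this is not how Playwright or any CI provider actually schedules work.

```javascript
/**
 * Idealized wall-clock estimate for a sharded run.
 * Assumes shards are equal in size and ignores per-job startup cost.
 */
function estimateWallClockMinutes(suiteMinutes, shards, runners) {
  const minutesPerShard = suiteMinutes / shards;
  // Shards beyond the runner count have to wait for a free machine.
  const rounds = Math.ceil(shards / runners);
  return minutesPerShard * rounds;
}

console.log(estimateWallClockMinutes(20, 2, 2)); // 10
console.log(estimateWallClockMinutes(20, 4, 2)); // 10 (no gain over two shards)
console.log(estimateWallClockMinutes(20, 4, 4)); // 5
```

Real runs add checkout, install, and browser download time to every shard, so extra shards beyond the runner count tend to cost slightly more, not the same.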

4. Fixture Design: Business Actions, Not API Wrappers

Playwright fixtures are dependency injection for tests. The docs show the mechanics: how to define a fixture, how to scope it, how to compose fixtures together. What they do not show is the design principle that makes fixtures actually useful: fixtures should represent business actions and state, not wrap Playwright APIs.

A bad fixture wraps page.goto and page.click in a function called navigateToCheckout. It saves a few lines per test but creates a leaky abstraction. The fixture callers still need to know which page they are on and what state they are in.

A good fixture represents a complete business precondition: authenticatedUser, cartWithThreeItems, completedOrder. Each fixture sets up the application state its name promises and tears it down cleanly after the test. Callers receive a ready-to-use context, not a partially-configured page.

The difference shows up in test readability. A test using bad fixtures reads like a script. A test using good fixtures reads like a specification.

Fixture scoping matters too. Use scope: 'worker' for expensive setup that all tests in a worker can share (database seeding, server startup). Use the default test scope for anything that must be reset between tests, including browser contexts and pages, which is how Playwright's built-in context and page fixtures are scoped. Misscoping is the second most common source of test cross-contamination after shared globals.

Figure: Three-layer fixture composition, with a shared foundation at the bottom, test-scoped fixtures in the middle, and composed assertions at the top.

// @ts-check
import { test as base, expect } from "@playwright/test";

/**
 * Pattern 4 — Fixture design
 *
 * Demonstrates three fixture patterns:
 *   1. Worker-scoped — shared server config across all tests in one worker
 *   2. Test-scoped — fresh authenticated browser context per test
 *   3. Composable — a cart fixture that builds on top of the auth fixture
 *
 * The test body contains only assertions; all setup lives in fixtures.
 *
 * Run:
 *   npx playwright test src/04-fixtures.spec.js
 */

/* ---------- Fixture definitions ---------- */

const test = base.extend({
  /**
   * Worker-scoped fixture: resolves the server URL once per worker.
   * Avoids repeating environment-lookup logic in every test.
   */
  serverURL: [
    // eslint-disable-next-line no-empty-pattern
    async ({}, use) => {
      const url = process.env.BASE_URL || "http://localhost:3000";
      await use(url);
    },
    { scope: "worker" },
  ],

  /**
   * Test-scoped fixture: creates a fresh browser context and logs in
   * through the UI before handing the page to the test.
   * Each test gets its own session, so nothing leaks between tests.
   */
  authenticatedPage: async ({ browser, serverURL }, use) => {
    const context = await browser.newContext();
    const page = await context.newPage();

    /* Log in through the login form (Pattern 7 replaces this with storageState). */
    await page.goto(`${serverURL}/login`);
    await page.getByLabel("Email").fill("test@example.com");
    await page.getByLabel("Password").fill("Test1234!");
    await page.getByRole("button", { name: "Log in" }).click();
    await page.waitForURL(/.*dashboard/);

    await use(page);

    /* Teardown: close the context to free resources. */
    await context.close();
  },

  /**
   * Composable fixture: builds on `authenticatedPage` to add an item
   * to the cart before the test body runs.
   */
  pageWithCart: async ({ authenticatedPage }, use) => {
    await authenticatedPage.goto("/products");
    await authenticatedPage
      .locator('[data-testid="product-card"]')
      .first()
      .getByRole("button", { name: "Add to cart" })
      .click();
    await expect(
      authenticatedPage.locator('[data-testid="cart-badge"]')
    ).toHaveText("1");

    await use(authenticatedPage);
  },
});

/* ---------- Tests — only assertions ---------- */

test.describe("Fixture design", () => {
  test("authenticated page shows the dashboard", async ({
    authenticatedPage,
  }) => {
    await expect(authenticatedPage).toHaveURL(/.*dashboard/);
    await expect(
      authenticatedPage.getByRole("heading", { name: /welcome/i })
    ).toBeVisible();
  });

  test("cart fixture starts with one item", async ({ pageWithCart }) => {
    await pageWithCart.goto("/cart");
    const cartItems = pageWithCart.locator('[data-testid="cart-item"]');
    await expect(cartItems).toHaveCount(1);
  });

  test("worker-scoped server URL is consistent", async ({
    serverURL,
    page,
  }) => {
    expect(serverURL).toMatch(/^https?:\/\//);
    await page.goto(serverURL);
    await expect(page).toHaveTitle(/.+/);
  });
});

The spec demonstrates three fixture levels: a worker-scoped server fixture, a test-scoped authenticated context, and a composable cart fixture that builds on the auth fixture. The key observation is how the test body itself becomes a pure assertion. Setup is invisible. This is the same design principle Autonoma follows when generating tests from your codebase: every generated scenario receives composable fixtures scoped to the exact business state it needs, so the test body stays focused on verification.

5. Playwright Trace Viewer: Your First Move on Any Flaky Test

Most engineers reach for console.log, page.pause(), or a barrage of waitForTimeout calls when a Playwright test fails unexpectedly. All of these are slower than opening the Trace Viewer.

The Trace Viewer captures a complete timeline of your test: every action, every network request, every DOM snapshot before and after each step, every console log, and a filmstrip of screenshots from the browser session. It makes the debugging loop almost trivial. You see the exact state of the DOM at the moment the assertion failed, the network response that came back 50ms too late, the element that was covered by an overlay.

Enable traces with trace: 'on-first-retry' in your Playwright config (not trace: 'on', which generates a trace for every test including passing ones, inflating CI artifact storage). The on-first-retry mode captures the trace precisely when you need it: after a test has failed once and is retrying, the trace records the second attempt so you have evidence of the failure mode. It only works when at least one retry is configured. If you run with retries: 0, use trace: 'retain-on-failure' instead.
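A minimal sketch of that setting in playwright.config.js follows. A trace recorded on the first retry needs a retry to exist, so the sketch allows one in CI; that retry count is an assumption for illustration, not a value from the companion repo.

```javascript
// playwright.config.js
// @ts-check

/** @type {import('@playwright/test').PlaywrightTestConfig} */
const config = {
  // One retry in CI so a failed test gets a traced second attempt.
  retries: process.env.CI ? 1 : 0,

  use: {
    // Record a trace only while retrying a test that just failed.
    trace: "on-first-retry",

    // Alternative if you keep retries at 0: record every test but keep
    // the trace only when the test fails.
    // trace: "retain-on-failure",
  },
};

export default config;
```

Either mode keeps passing tests from producing trace artifacts, which is the storage concern that rules out `trace: 'on'` for CI.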

The workflow is: run the suite, find the failing test in CI artifacts, download the .zip trace file, run npx playwright show-trace trace.zip. From first failure report to root cause identification, this usually takes under five minutes for timing issues and under two minutes for element visibility problems.

Figure: Four-stage trace debugging pipeline, from test failure to trace recording to inspection to resolution.

#!/usr/bin/env bash
# Pattern 5 — Trace Viewer workflow
#
# Runs the full test suite with trace-on-first-retry, finds the first
# .zip trace artifact, and opens it in the Playwright Trace Viewer.
#
# Usage:
#   bash scripts/05-trace-investigate.sh
#
# Requirements:
#   - Node 20+
#   - Playwright installed (`npx playwright install`)

set -euo pipefail

TRACE_DIR="test-results"

echo "==> Running test suite with tracing enabled (on-first-retry)..."
npx playwright test --retries=1 --trace on-first-retry --reporter=list || true

echo ""
echo "==> Searching for trace artifacts in ${TRACE_DIR}/..."

# "|| true" keeps set -e and pipefail from aborting when the directory is missing.
TRACE_FILE=$(find "$TRACE_DIR" -name "trace.zip" -type f 2>/dev/null | head -n 1 || true)

if [[ -z "$TRACE_FILE" ]]; then
  echo "    No trace artifacts found."
  echo "    This means every test passed on the first attempt — no retries triggered."
  echo ""
  echo "    To force a trace, run:"
  echo "      npx playwright test --trace on"
  echo "    and then:"
  echo "      npx playwright show-trace test-results/<test-folder>/trace.zip"
  exit 0
fi

echo "    Found trace: ${TRACE_FILE}"
echo ""
echo "==> Opening Trace Viewer..."
npx playwright show-trace "$TRACE_FILE"

The script automates the most common trace workflow: run tests, extract the first trace, open it directly. It is also the right pattern for a local debug loop: run the failing test once with --trace on, let it fail, then inspect without re-running.

For teams investing in Test Automation Metrics That Actually Predict Release Quality, trace data is underused signal. Time-to-failure, action count before failure, and network error patterns in traces can surface systemic instability before it becomes a manual debugging burden.

6. CI Flakiness Fixes: The Four That Actually Work

Flaky tests in CI almost always have one of four root causes: missing explicit action timeouts, navigation wait conditions that assume too much, no screenshot on failure, or environment assumptions baked into test logic.

Action timeouts are the most common. Playwright's default actionTimeout is 0 — no per-action cap — so the only thing bailing you out is the 30-second per-test timeout. A click that takes 25 seconds to complete is not slow. It is broken. Set actionTimeout: 8000 in your project config and watch previously-hidden slowness become visible failures instead of intermittent ones. The same applies to navigationTimeout, which also defaults to 0.

Navigation waits are subtler. page.goto(url) resolves when the browser fires the load event, which can happen before a React or Vue application has finished hydrating. Tests that immediately query the DOM after navigation fail on slow CI machines where hydration takes slightly longer. The most reliable fix is a web-first assertion on a landmark element that only appears after hydration. page.goto(url, { waitUntil: 'networkidle' }) also works for apps with API calls on load, but the Playwright docs discourage it, and it can stall on pages that poll continuously.
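The landmark-element approach might look like the sketch below. The route, heading text, and button label are hypothetical, and it assumes the heading is rendered only after the app has hydrated and loaded its data.

```javascript
// @ts-check
import { test, expect } from "@playwright/test";

test("dashboard is interactive after hydration", async ({ page }) => {
  // goto() resolves on the "load" event by default, which can fire
  // before a client-rendered app has finished hydrating.
  await page.goto("/dashboard");

  // Hypothetical landmark: a heading that only exists once the app is ready.
  // The assertion retries until it appears or the timeout expires.
  await expect(
    page.getByRole("heading", { name: /welcome/i })
  ).toBeVisible({ timeout: 10_000 });

  // From here on the page is known to be ready for interaction.
  await page.getByRole("button", { name: "New report" }).click();
});
```

Waiting on something the user can see ties the test to application readiness, not to network silence, so it keeps working when the page adds background requests.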

Screenshot on failure is non-negotiable in CI. Without it, a failing test report tells you the assertion that failed and the stack trace, not what the user saw. screenshot: 'only-on-failure' in the Playwright config adds a screenshot to every failure artifact at near-zero cost.

Environment assumptions are the sneakiest. Tests that pass on developer machines with seeded databases fail in CI environments with clean state. Tests that depend on a specific timezone, locale, or system font render differently across environments. The fix is to make every environmental dependency explicit: seed data as a fixture, set timezone in the Playwright config, avoid font-dependent visual comparisons in CI.
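A minimal sketch of how the screenshot and environment fixes might appear in playwright.config.js. The option names are Playwright's; the specific timezone and locale values are assumptions to adapt to your application.

```javascript
// playwright.config.js
// @ts-check

/** @type {import('@playwright/test').PlaywrightTestConfig} */
const config = {
  use: {
    // Attach a screenshot to every failed test, and to no others.
    screenshot: "only-on-failure",

    // Pin what the browser reports, so date and number formatting
    // does not depend on the machine running the test.
    timezoneId: "UTC",
    locale: "en-US",
  },
};

export default config;
```

These options control the browser context. Pinning environment variables for the Node process that runs the tests is a separate step that belongs in the CI workflow.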

# Pattern 6 — CI flakiness fixes
#
# GitHub Actions workflow encoding four CI reliability patterns:
#   1. Tight action timeout (8 s in playwright.config.js)
#   2. networkidle navigation wait (handled in test code)
#   3. Screenshot artifact upload on failure only
#   4. TZ/LANG environment variable pinning
#
# See: https://getautonoma.com/blog/playwright-best-practices-2026

name: E2E Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# Pin timezone and locale so date-formatting assertions are deterministic.
env:
  TZ: UTC
  LANG: en_US.UTF-8
  LC_ALL: en_US.UTF-8
  CI: "true"

jobs:
  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium

      - name: Run E2E tests
        run: npx playwright test --reporter=github

      - name: Upload failure screenshots
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-screenshots
          path: test-results/**/*.png
          retention-days: 7

      - name: Upload trace artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-traces
          path: test-results/**/*.zip
          retention-days: 7

This GitHub Actions workflow encodes all four fixes: explicit action timeout, networkidle navigation, screenshot artifacts, and environment variables that pin timezone and locale for every run. It is also the starting configuration we recommend for teams following The E2E Testing Strategy That Scales With AI-Generated Code. CI reliability is the prerequisite for everything else.

Autonoma approaches this problem from the other direction. Rather than writing tests that need careful timeout tuning, AI agents generate tests from your codebase with verification steps built in. When the app changes, Autonoma adapts the tests automatically — no rewriting selectors — so the category of "CI flakiness caused by a UI change" largely disappears. For teams already deep in a Playwright suite, these four fixes are the right path. For teams starting fresh, it is worth asking whether you want to own the timeout configuration at all.

7. Playwright Authentication Storage State: Never Re-Login in Tests

Authentication is the most commonly mishandled setup step in Playwright suites. The typical pattern is a beforeEach that navigates to the login page, fills in credentials, clicks submit, and waits for the redirect. On a suite of 100 tests, that is 100 full round-trips through your login flow, easily 3-5 minutes of pure overhead.

Playwright's storageState feature eliminates this entirely. The idea is simple: run the login flow once, save the resulting browser storage (cookies and localStorage) to a JSON file, then load that file as the starting state for every subsequent test. Tests start already authenticated, with zero login overhead. Note that sessionStorage is not captured, so an app that keeps its token there needs an extra step to restore it.

The setup is a global setup file that runs once before the entire suite. It creates a browser context, logs in, saves the storage state to disk, and exits. Each spec file then references that state file in its fixture. If your app has multiple user roles, you run the setup once per role and maintain separate state files.

One important caveat: storage state files contain session tokens. They should live outside your repository (or in a .gitignore-d directory) and be regenerated at the start of each CI run rather than committed and reused across days. Stale tokens cause an entire category of mysterious authentication failures that look like flakiness but are actually token expiry.
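A global setup file along these lines could produce the state file. This is a hedged sketch, not the companion repo's actual file: the login route, field labels, fallback credentials, and the `E2E_EMAIL` / `E2E_PASSWORD` variable names are assumptions, and it expects `baseURL` to be set in the config and the suite to be run from the repo root.

```javascript
// global-setup.js
// @ts-check
import { chromium } from "@playwright/test";
import fs from "node:fs";
import path from "node:path";

// Resolved against the working directory, assumed to be the repo root.
const STORAGE_STATE = path.resolve(".auth", "storage-state.json");

/** @param {import('@playwright/test').FullConfig} config */
export default async function globalSetup(config) {
  // baseURL comes from the `use` block of the first configured project.
  const { baseURL } = config.projects[0].use;

  // Make sure the (git-ignored) output directory exists.
  fs.mkdirSync(path.dirname(STORAGE_STATE), { recursive: true });

  const browser = await chromium.launch();
  const page = await browser.newPage({ baseURL });

  // Log in once through the UI. Labels and credentials are placeholders.
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_EMAIL ?? "test@example.com");
  await page.getByLabel("Password").fill(process.env.E2E_PASSWORD ?? "Test1234!");
  await page.getByRole("button", { name: "Log in" }).click();
  await page.waitForURL(/.*dashboard/);

  // Persist cookies and localStorage for the specs to load.
  await page.context().storageState({ path: STORAGE_STATE });
  await browser.close();
}
```

The file is registered through the `globalSetup` option in playwright.config.js. For multiple roles, the same login steps can run once per role and write to a separate state file each time.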

// @ts-check
import { test as base, expect } from "@playwright/test";
import path from "node:path";
import { fileURLToPath } from "node:url";

/**
 * Pattern 7 — Auth via storageState
 *
 * Uses a global-setup script to authenticate once and persist
 * cookies/localStorage to a JSON file. Each test loads that file
 * via storageState — zero login overhead per test.
 *
 * Run:
 *   npx playwright test src/07-auth-storage-state.spec.js
 *
 * The global-setup.js file (in the repo root) creates
 * .auth/storage-state.json before any test runs.
 */

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const STORAGE_STATE = path.resolve(__dirname, "..", ".auth", "storage-state.json");

/* ---------- Fixture: load the persisted auth state ---------- */

const test = base.extend({
  /**
   * Creates a browser context pre-loaded with the storage state
   * produced by global-setup.js. The test starts already logged in.
   */
  authedPage: async ({ browser }, use) => {
    const context = await browser.newContext({ storageState: STORAGE_STATE });
    const page = await context.newPage();
    await use(page);
    await context.close();
  },
});

/* ---------- Tests — already authenticated ---------- */

test.describe("Auth via storage state", () => {
  test("dashboard loads without a login step", async ({ authedPage }) => {
    await authedPage.goto("/dashboard");

    /* No login form, no redirect — we are already authenticated. */
    await expect(authedPage).toHaveURL(/.*dashboard/);
    await expect(
      authedPage.getByRole("heading", { name: /welcome/i })
    ).toBeVisible();
  });

  test("profile page shows the authenticated user", async ({ authedPage }) => {
    await authedPage.goto("/profile");

    await expect(authedPage.getByTestId("user-email")).toHaveText(
      "test@example.com"
    );
  });

  test("API call includes auth cookie", async ({ authedPage }) => {
    await authedPage.goto("/dashboard");

    /*
     * Trigger an API call and verify the session cookie is attached.
     * This proves storageState loaded correctly.
     */
    const [request] = await Promise.all([
      authedPage.waitForRequest((req) => req.url().includes("/api/me")),
      authedPage.getByTestId("refresh-profile").click(),
    ]);

    const cookies = await authedPage.context().cookies();
    const authCookie = cookies.find((c) => c.name === "session");
    expect(authCookie).toBeTruthy();

    /* request.headers() omits cookie headers; allHeaders() includes them. */
    const headers = await request.allHeaders();
    expect(headers["cookie"]).toContain("session=");
  });
});

The spec shows the consuming side of the pattern: a fixture loads the state file that global-setup.js wrote before the run, and the tests use that fixture without any login logic in the test body. Each test body focuses entirely on what it is actually testing.
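The stale-token caveat from earlier in this section can also be checked mechanically before a run. The helper below is a minimal sketch, not part of Playwright: `findExpiredCookies` is a hypothetical name, and it assumes the session lives in a cookie whose `expires` field (Unix seconds, or -1 for a session cookie, which is how Playwright writes cookies into the state file) reflects the real token lifetime. Tokens kept in localStorage, or sessions revoked on the server, would need a different check.

```javascript
/**
 * Returns the names of cookies in a storage-state object that have expired,
 * or will expire within `marginSeconds`. Session cookies (expires === -1)
 * are never reported. Hypothetical helper for illustration.
 */
function findExpiredCookies(state, nowSeconds = Date.now() / 1000, marginSeconds = 0) {
  const cookies = Array.isArray(state.cookies) ? state.cookies : [];
  return cookies
    .filter((c) => c.expires !== -1 && c.expires <= nowSeconds + marginSeconds)
    .map((c) => c.name);
}

// Sample data in the shape of a storage-state file (values are made up).
const sampleState = {
  cookies: [
    { name: "session", value: "abc", domain: "example.com", path: "/", expires: 1700000000 },
    { name: "theme", value: "dark", domain: "example.com", path: "/", expires: -1 },
  ],
  origins: [],
};

console.log(findExpiredCookies(sampleState, 1700000100)); // [ 'session' ]
console.log(findExpiredCookies(sampleState, 1699999000)); // []
```

In a real setup you would pass it the parsed contents of the state file and re-run the login whenever the returned list is not empty, ideally with a margin longer than the suite's run time.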

8. Playwright Page Object Model: Why You Might Not Need It in 2026

The Page Object Model has been the default architectural recommendation for E2E test suites for over a decade. The idea is sound: encapsulate page interactions behind class methods, making tests independent of UI implementation details.

In 2026, for Playwright specifically, the tradeoff has shifted. Playwright fixtures give you most of what the POM gives you (reusable setup, clean separation of concerns, composability) without the ceremony of class hierarchies and method chaining. A fixture that sets up a product page with specific inventory state is easier to understand, easier to modify, and easier to test than a ProductPage class with 20 methods.

The POM still makes sense in specific scenarios: large suites (200+ specs) where multiple engineers need a shared vocabulary for page interactions; teams migrating from Selenium where the POM is already established; and applications with very stable UI structure where the abstraction cost is genuinely paid back over time.

For everyone else, especially teams building new Playwright suites today, start with composable fixtures. If you later find yourself duplicating interaction logic across 20 fixtures, that is the natural signal to extract a page object. Do not start with the abstraction; let the duplication tell you when it is needed.

This connects directly to the broader testing architecture question in Storybook vs Playwright Component Testing: the right abstraction layer depends on what you are actually testing, not on what the conventional wisdom says you should do.

// @ts-check
import { test as base, expect } from "@playwright/test";

/**
 * Pattern 8 — Page Object Model vs. composable fixtures
 *
 * Two describe blocks implement the same checkout feature:
 *   1. Classic POM — a CheckoutPage class with methods
 *   2. Composable fixtures — small, reusable setup blocks
 *
 * Compare line count, readability, and isolation.
 *
 * Run:
 *   npx playwright test src/08-pom-or-not.spec.js
 */

/* ========================================================
 * Approach A — Classic Page Object Model
 * ======================================================== */

class CheckoutPage {
  /**
   * @param {import("@playwright/test").Page} page
   */
  constructor(page) {
    this.page = page;
    this.cardNumberInput = page.getByLabel("Card number");
    this.expiryInput = page.getByLabel("Expiry");
    this.cvcInput = page.getByLabel("CVC");
    this.payButton = page.getByRole("button", { name: "Pay now" });
    this.confirmationHeading = page.getByRole("heading", {
      name: /order confirmed/i,
    });
  }

  async navigate() {
    await this.page.goto("/checkout");
  }

  /**
   * @param {{ cardNumber: string; expiry: string; cvc: string }} details
   */
  async fillPaymentDetails({ cardNumber, expiry, cvc }) {
    await this.cardNumberInput.fill(cardNumber);
    await this.expiryInput.fill(expiry);
    await this.cvcInput.fill(cvc);
  }

  async submitPayment() {
    await this.payButton.click();
  }

  async expectConfirmation() {
    await expect(this.confirmationHeading).toBeVisible();
  }
}

base.describe("Approach A — Page Object Model", () => {
  base("completes a checkout with POM", async ({ page }) => {
    const checkout = new CheckoutPage(page);

    await checkout.navigate();
    await checkout.fillPaymentDetails({
      cardNumber: "4242424242424242",
      expiry: "12/30",
      cvc: "123",
    });
    await checkout.submitPayment();
    await checkout.expectConfirmation();
  });

  base("shows validation error for expired card (POM)", async ({ page }) => {
    const checkout = new CheckoutPage(page);

    await checkout.navigate();
    await checkout.fillPaymentDetails({
      cardNumber: "4242424242424242",
      expiry: "01/20",
      cvc: "123",
    });
    await checkout.submitPayment();

    await expect(page.getByTestId("payment-error")).toContainText(
      /expired/i
    );
  });
});

/* ========================================================
 * Approach B — Composable Fixtures (no POM class)
 * ======================================================== */

const test = base.extend({
  /**
   * Fixture: navigates to checkout and fills in valid payment details.
   * Tests that need a different state can override the fixture.
   */
  checkoutPage: async ({ page }, use) => {
    await page.goto("/checkout");
    await page.getByLabel("Card number").fill("4242424242424242");
    await page.getByLabel("Expiry").fill("12/30");
    await page.getByLabel("CVC").fill("123");
    await use(page);
  },
});

test.describe("Approach B — Composable Fixtures", () => {
  test("completes a checkout with fixtures", async ({ checkoutPage }) => {
    await checkoutPage.getByRole("button", { name: "Pay now" }).click();
    await expect(
      checkoutPage.getByRole("heading", { name: /order confirmed/i })
    ).toBeVisible();
  });

  test("shows validation error for expired card (fixtures)", async ({
    page,
  }) => {
    /* Expired-card data differs from the fixture default, so fill the form directly. */
    await page.goto("/checkout");
    await page.getByLabel("Card number").fill("4242424242424242");
    await page.getByLabel("Expiry").fill("01/20");
    await page.getByLabel("CVC").fill("123");
    await page.getByRole("button", { name: "Pay now" }).click();

    await expect(page.getByTestId("payment-error")).toContainText(
      /expired/i
    );
  });
});

The spec includes two versions of the same test suite side by side: one using a classic POM class and one using composable fixtures. Both achieve the same isolation. The fixture version is roughly 45% shorter in the listing above (37 lines against 67) and requires no class instantiation in test files.

Playwright Anti-Patterns to Avoid

Some patterns appear constantly in Playwright suites and consistently cause problems. None of them are obvious mistakes. They feel reasonable at the time.

Hardcoded waitForTimeout calls. page.waitForTimeout(2000) is a sleep. It adds two seconds to every test run unconditionally, even when the page is ready after 200 ms, and it still fails on a slow CI machine when two seconds is not enough. Replace it with a web-first assertion such as expect(locator).toBeVisible(), which retries until the element appears, with locator.waitFor() on a specific element, or with expect.poll for conditions that are not tied to an element.

Global state mutations in tests. Tests that modify application state without cleanup (creating users, changing settings, placing orders) contaminate subsequent tests. Use fixtures with teardown logic, or scope mutations to test-specific data that cannot affect other tests.
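One way to scope mutations to test-specific data is to derive every created record's identifier from the running test. The sketch below is a hypothetical helper, not part of Playwright: `scopedEmail` and the `example.test` domain are made up for illustration, and it relies only on `testInfo.project.name`, `testInfo.testId`, and `testInfo.retry`, which Playwright's TestInfo object exposes.

```javascript
/**
 * Builds an email address that is unique to one test attempt, so a user
 * created by this test cannot collide with data from any other test.
 * Hypothetical helper: pass it the `testInfo` argument of a test or fixture.
 */
function scopedEmail(testInfo, domain = "example.test") {
  const raw = `${testInfo.project.name}-${testInfo.testId}-r${testInfo.retry}`;
  const slug = raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // keep the local part email-safe
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return `e2e+${slug}@${domain}`;
}

// Stand-in for the object Playwright passes as the second test argument.
const fakeTestInfo = { project: { name: "chromium" }, testId: "abc123-def456", retry: 1 };

console.log(scopedEmail(fakeTestInfo)); // e2e+chromium-abc123-def456-r1@example.test
```

Inside a spec this would be called as `scopedEmail(testInfo)` from a test declared with `async ({ page }, testInfo)`. It pairs naturally with a fixture whose teardown deletes the record it created.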

Ignoring the browser console. Playwright can capture browser console output and uncaught page errors. A test that passes while the browser is logging uncaught exceptions is not actually passing; it is passing despite a broken state. Attach console and pageerror listeners in a shared automatic fixture (global setup runs once before any test page exists, so a listener cannot live there) and fail tests that produce unexpected errors.
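The bookkeeping behind such a console check is small. The sketch below keeps it free of Playwright imports so it can be read on its own: `createConsoleGuard` and the ignore pattern are hypothetical, the message objects only need the `type()` and `text()` methods that Playwright's ConsoleMessage provides, and the wiring into an automatic fixture is shown as a comment under the assumption that `base` and `expect` come from @playwright/test.

```javascript
/**
 * Collects console errors and uncaught page errors, skipping known noise.
 * Hypothetical helper. Ignore patterns should not use the `g` flag,
 * because RegExp.prototype.test is stateful with it.
 */
function createConsoleGuard(ignorePatterns = []) {
  const errors = [];
  const isIgnored = (text) => ignorePatterns.some((re) => re.test(text));
  return {
    errors,
    // Handler for page.on("console"): receives a ConsoleMessage-like object.
    onConsole(msg) {
      if (msg.type() === "error" && !isIgnored(msg.text())) {
        errors.push(`console.error: ${msg.text()}`);
      }
    },
    // Handler for page.on("pageerror"): receives an Error.
    onPageError(error) {
      if (!isIgnored(error.message)) errors.push(`uncaught: ${error.message}`);
    },
  };
}

// Wiring sketch for an automatic fixture (not executed here):
//   const test = base.extend({
//     consoleGuard: [async ({ page }, use) => {
//       const guard = createConsoleGuard([/favicon/]);
//       page.on("console", guard.onConsole);
//       page.on("pageerror", guard.onPageError);
//       await use(guard);
//       expect(guard.errors).toEqual([]);
//     }, { auto: true }],
//   });

// Demo with stand-in messages.
const guard = createConsoleGuard([/favicon/]);
guard.onConsole({ type: () => "error", text: () => "Failed to load favicon.ico" });
guard.onConsole({ type: () => "log", text: () => "hello" });
guard.onConsole({ type: () => "error", text: () => "TypeError: x is undefined" });
guard.onPageError(new Error("boom"));

console.log(guard.errors.length); // 2
```

Keeping the ignore list explicit is the design choice that matters here: every tolerated error is visible in code review instead of being silently swallowed.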

Testing implementation, not behavior. Selecting by class names, internal component IDs, or DOM structure couples tests to implementation details. When the implementation changes for a valid reason, tests break for the wrong reason. Test user-visible behavior: what is visible, what is interactive, what the user can accomplish.

One test file per page. The temptation to mirror your route structure in test file structure creates a maintenance coupling that hurts over time. Organize by user journey or feature: a checkout test file covers the full flow across multiple pages, not just the checkout page in isolation.

The most expensive test in your suite is not the slowest one. It is the one that fails intermittently, gets retried, eventually passes, and convinces everyone the suite is healthy when it is not.

Our Fix Any Flaky Test in 30 Minutes playbook maps every flaky test symptom to its root cause if you want the full debugging decision tree for the anti-patterns listed above.

For teams where the anti-pattern list feels like a backlog of tech debt rather than a warning, Autonoma is worth evaluating. Autonoma's agents read your codebase and generate tests that follow the right patterns by default, and the tests adapt automatically when the UI changes. The category of "we have tests but they're fragile" largely disappears.

What is the most important Playwright best practice for reducing flakiness?

A strict selector hierarchy has the highest long-term impact on flakiness. Follow Playwright's official ranking: getByRole first (it matches how users and screen readers identify elements), then getByLabel and getByPlaceholder for form fields, then getByText for unique copy, then getByTestId as an explicit fallback, and only CSS or XPath when nothing else fits. Tests built on role and label survive redesigns and layout refactors without modification, and they break on copy changes only where the accessible name itself changes. Combine this with explicit timeouts: a per-assertion timeout where an assertion is known to be slow, and an explicit actionTimeout instead of the default of 0, which means no limit and leaves the 30-second per-test timeout as the only backstop. Most intermittent failures then either disappear or become consistently reproducible failures that are straightforward to debug.

Should I use the Page Object Model with Playwright in 2026?

For most teams starting fresh with Playwright, composable fixtures are a better choice than the classic Page Object Model. Fixtures give you the same isolation and reusability with less ceremony: no class hierarchies, no constructor injection, no method chaining. Start with fixtures and extract page objects only when duplicated interaction logic across many fixtures makes the cost obvious.

How do I debug a Playwright test that only fails in CI?

Enable trace: 'on-first-retry' and screenshot: 'only-on-failure' in your Playwright config, and make sure retries is at least 1 in CI, because a trace is only recorded when a retry actually happens. When the test fails in CI, download the trace artifact and run npx playwright show-trace trace.zip. The trace gives you DOM snapshots around every action, the network log, console output, and a screenshot filmstrip leading up to the failure. CI-only failures are almost always timing issues (too-short timeouts), environment assumptions (timezone, locale, seeded data), or navigation wait conditions that assume load equals hydrated.
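As a rough sketch, those settings would sit in the config like this. The retry count of 2 and the `process.env.CI` switch are assumptions for illustration, not values the article prescribes.

```javascript
// playwright.config.js (minimal sketch; the retry count is an assumed value)
// @ts-check

/** @type {import("@playwright/test").PlaywrightTestConfig} */
const config = {
  // "on-first-retry" needs at least one retry to ever record a trace.
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry",
    screenshot: "only-on-failure",
  },
};

export default config;
```

The CI job also has to upload the test output directory as an artifact, otherwise there is no trace.zip to download afterwards.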

What is the difference between Playwright workers and sharding?

Workers run tests in parallel within a single machine. Sharding splits your test suite across multiple CI machines. Use workers to maximize the hardware you already have. Add sharding when a single machine cannot run the full suite within your time budget, typically around the 5-10 minute mark. Sharding requires test isolation: tests cannot share database state or browser sessions across shard boundaries.
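The split shows up in where each one is configured: workers in the config file, shards on the command line of each CI job. The fragment below is a sketch under assumptions. The worker count of 2, the choice of four shards, and the blob report directory name are all illustrative.

```javascript
// playwright.config.js (sketch; worker count and reporter choice are assumed values)
// @ts-check

/** @type {import("@playwright/test").PlaywrightTestConfig} */
const config = {
  fullyParallel: true,
  // Workers: parallel processes on ONE machine. undefined keeps Playwright's default.
  workers: process.env.CI ? 2 : undefined,
  // Blob reports from each shard can be merged once all shards finish.
  reporter: process.env.CI ? "blob" : "html",
};

export default config;

// Sharding is selected per CI job, for example with four machines:
//   npx playwright test --shard=1/4
//   npx playwright test --shard=2/4
//   npx playwright test --shard=3/4
//   npx playwright test --shard=4/4
// A final job, with every blob report downloaded into one directory, merges them:
//   npx playwright merge-reports --reporter html ./all-blob-reports
```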

How should I handle authentication in a large Playwright test suite?

Use Playwright's storageState feature. Run the login flow once in a global setup file, save the resulting session to a JSON file, and load it as the browser state for every test that needs authentication. This eliminates login overhead entirely. For multi-role applications, maintain separate state files per role. Regenerate state files at the start of each CI run rather than committing them: session tokens expire, and stale tokens cause mysterious authentication failures.
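For the multi-role case, one possible layout is a project per role, each pointing at its own state file. This is a sketch under assumptions: the role names, the file-name patterns, and the .auth paths are hypothetical, and global-setup.js is assumed to log in once per role and write both files.

```javascript
// playwright.config.js (sketch; role names, paths, and patterns are hypothetical)
// @ts-check

/** @type {import("@playwright/test").PlaywrightTestConfig} */
const config = {
  // Assumed to log in once per role and write both files referenced below.
  globalSetup: "./global-setup.js",
  projects: [
    {
      name: "admin",
      testMatch: /.*\.admin\.spec\.js/,
      use: { storageState: ".auth/admin.json" },
    },
    {
      name: "member",
      testMatch: /.*\.member\.spec\.js/,
      use: { storageState: ".auth/member.json" },
    },
  ],
};

export default config;
```

With this layout a spec picks up its role from its file name, so no test body needs to know which state file it is running under.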

Does Autonoma replace Playwright?

No. Autonoma is a testing layer that sits above individual frameworks. It uses browser automation internally as part of the execution engine, but the distinction is who owns the test code. With Playwright directly, your team writes, maintains, and debugs every spec. With Autonoma, agents read your codebase, generate the tests, and the tests adapt automatically when the UI changes. If you already have a well-maintained Playwright suite following the practices in this article, Autonoma is complementary: it covers new flows automatically as you ship. If you are starting from zero or drowning in maintenance, Autonoma is the faster path to coverage. Autonoma is open source and free to self-host; a managed Cloud tier is also available.

Is it worth setting up Playwright sharding for a small test suite?

Not usually. Sharding adds CI configuration complexity: you need multiple parallel runners, a merge step for coverage reports, and consistent environment setup across machines. For suites under 5 minutes on a single machine, the overhead is not worth it. Focus on worker parallelism first, which requires no CI changes. Add sharding when the suite grows past the point where a single fast CI machine cannot keep up.


