Edge Case Testing: Find Them Without Listing Them

Edge case testing verifies inputs and states at the boundary of expected behavior. A boundary test checks min, max, and just-outside values; an edge case stresses one unusual condition; a corner case combines several unusual conditions at once. For no-QA teams, the scalable model is to use Sentry and PostHog as production signals, then use Autonoma as the pre-deploy layer that generates, replays, and reviews the coverage those signals imply.

Teams without QA do not fail at edge case testing because they lack imagination. They fail because the operating model is wrong. A small engineering team cannot sit in a spreadsheet and list every weird input, abandoned state, expired session, file name, translation key, and browser-specific failure the product might hit.

The better model is signal-driven. Sentry, PostHog, support tickets, failed uploads, and customer-reported corner cases tell you where the product already surprised the team. Those signals should not become a permanent manual backlog. They should feed the planning and generation loop that turns real failure classes into pre-deploy tests.

Why listing every edge case fails

Beginner edge case testing guides usually start with definitions and then move into generic edge case examples: empty input, long input, zero quantity, invalid date, unsupported file type, slow network, expired session. That is useful vocabulary. It is not an operating model for a small team.

The list explodes because the product is not a list. Every form field has empty, whitespace-only, maximum-length, special-character, and malformed variants. Every async flow has retry, timeout, duplicate-submit, stale-cache, and session-expiry variants. Every locale has missing key, malformed JSON, untranslated fallback, pluralization, date format, and currency format variants. Combine two of those and you are in corner case testing, not simple edge case testing.
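To make the explosion concrete, here is a minimal TypeScript sketch that counts combinations for one toy flow. The variant lists come from the paragraph above; the assumption that a flow touches exactly one field, one async call, and one locale, and the `countCombinations` helper itself, are illustrative only.

```typescript
// Variant lists taken from the paragraph above. Treating a flow as one
// field + one async call + one locale is a simplifying assumption.
const fieldVariants = ["empty", "whitespace-only", "maximum-length", "special-character", "malformed"];
const asyncVariants = ["retry", "timeout", "duplicate-submit", "stale-cache", "session-expiry"];
const localeVariants = [
    "missing key",
    "malformed JSON",
    "untranslated fallback",
    "pluralization",
    "date format",
    "currency format",
];

// Number of distinct combinations when one variant is picked per dimension.
function countCombinations(dimensions: readonly (readonly string[])[]): number {
    return dimensions.reduce((total, variants) => total * variants.length, 1);
}

// Edge cases: one unusual condition at a time.
const singleConditionCount = fieldVariants.length + asyncVariants.length + localeVariants.length;

// Corner cases: one unusual condition from every dimension at once.
const threeWayCornerCaseCount = countCombinations([fieldVariants, asyncVariants, localeVariants]);

console.log(`single-condition edge cases: ${singleConditionCount}`);
console.log(`three-way corner cases: ${threeWayCornerCaseCount}`);
```

For this toy flow that is 16 single-condition cases but 150 three-way combinations, before a second field or a second locale is added.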

That is why manual enumeration breaks down. The product team thinks in features. Users think in outcomes. Production thinks in states. The bug shows up in the gap between those three models.

For small engineering teams, the practical rule is this: do not try to write an exhaustive edge case catalogue before you ship. Use a small generic baseline for boundary testing, then let production signals tell your test-generation layer which branches deserve permanent coverage. The pattern from shift-left testing for small engineering teams applies here too: move the real failure back into the PR loop as soon as you see it once.

Your Sentry errors are prioritization signals

Sentry errors are not the complete edge case testing solution. They are prioritization signals. A TypeError on checkout confirmation is not only an incident. It points to a missing paid-flow test. A URIError in the download route is not only a stack trace. It points to a special-character filename test. A missing translation key in production is not only an i18n bug. It points to a locale coverage gate.

Autonoma does not replace Sentry or PostHog. Keep them for monitoring, product analytics, exception grouping, user impact, and post-prod visibility. The pre-deploy layer has a different job: take the highest-signal failure classes and generate tests that stop the same class from reaching users twice.

For the broader upstream-vs-downstream framing, see Sentry alternatives for pre-deploy bug detection.

The input can still be simple. Export the last 30 days of exceptions from Sentry or query PostHog's exception events. Group by route, browser, error type, and message. Add affected users and occurrence count. Then annotate each row with the product consequence: data loss, silent failure, user lockout, paid-flow breakage, or noisy but recoverable UI error.

Diagram showing production errors grouped into prioritized edge case tests before the pull request gate

The SQL shape below is deliberately boring. Boring is good here. It gives the planning layer a ranked signal set where each row can become one coverage candidate instead of a vague "improve QA" task.

-- Sentry / PostHog error grouping query for edge case test backlog.
--
-- Goal: group production error events by the dimensions that matter
-- for reproduction in a test (URL path, client/browser context, error
-- type) and rank the groups by occurrence count. The top groups are
-- your Tier 1 edge case test candidates.
--
-- ============================================================
-- Column name swaps for Sentry vs PostHog
-- ============================================================
-- This query is written against the PostHog `events` table where:
--   event              = '$exception'  (PostHog exception event)
--   properties.$pathname            -> URL path of the error
--   properties.$browser              -> browser/client name
--   properties.$exception_type       -> error class (e.g. TypeError)
--   properties.$exception_message    -> error message
--
-- For a Sentry export (e.g. via Sentry's CSV/Snowflake/BigQuery
-- integration), swap the columns as follows:
--   PostHog                          Sentry export
--   ---------------------------------------------------
--   properties.$pathname          -> transaction        (or url)
--   properties.$browser           -> contexts.browser.name
--   properties.$exception_type    -> type
--   properties.$exception_message -> message            (or title)
--   timestamp                     -> received           (or timestamp)
--   event = '$exception'          -> level = 'error'    (filter)
--
-- The aggregation shape stays identical; only the column references
-- change. Adapt the FROM clause to the table that holds your error
-- events in whichever warehouse you have configured.

SELECT
    properties.$pathname            AS url_path,
    properties.$browser             AS browser,
    properties.$exception_type      AS error_type,
    properties.$exception_message   AS error_message,
    COUNT(*)                        AS occurrence_count,
    COUNT(DISTINCT distinct_id)     AS affected_users,
    MIN(timestamp)                  AS first_seen,
    MAX(timestamp)                  AS last_seen
FROM events
WHERE event = '$exception'
  AND timestamp >= NOW() - INTERVAL 30 DAY
GROUP BY
    url_path,
    browser,
    error_type,
    error_message
ORDER BY occurrence_count DESC
LIMIT 100;

Turn each high-signal row into a small planning brief. The brief should name the route, the exact exception class, the observed user action, the product consequence, and the assertion that would have failed before deploy. For example: route /files/:id/download, exception URIError, action "download uploaded PDF", consequence "user cannot retrieve stored file", assertion "downloaded file name matches uploaded file name." That row is enough context for Autonoma to prioritize the flow and enough context for a reviewer to understand why the generated coverage matters.
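A brief like that can be kept as structured data rather than free text. The sketch below is one possible shape, not a format Autonoma or Sentry defines: `ErrorGroupRow`, `PlanningBrief`, `toPlanningBrief`, and `formatBrief` are hypothetical names, and the example values are the ones from the paragraph above.

```typescript
// One grouped error row. Field names are assumptions that mirror the
// grouping dimensions suggested earlier (route, error type, message).
interface ErrorGroupRow {
    urlPath: string;
    errorType: string;
    errorMessage: string;
}

// The five fields the brief should name.
interface PlanningBrief {
    route: string;
    exceptionClass: string;
    userAction: string;
    consequence: string;
    failingAssertion: string;
}

// The monitor cannot know the user action, the consequence, or the
// assertion, so a person supplies those three fields.
type HumanAnnotation = Pick<PlanningBrief, "userAction" | "consequence" | "failingAssertion">;

function toPlanningBrief(row: ErrorGroupRow, annotation: HumanAnnotation): PlanningBrief {
    return {
        route: row.urlPath,
        exceptionClass: row.errorType,
        userAction: annotation.userAction,
        consequence: annotation.consequence,
        failingAssertion: annotation.failingAssertion,
    };
}

// Render the brief as five labeled lines for a ticket or a prompt.
function formatBrief(brief: PlanningBrief): string {
    return [
        `route: ${brief.route}`,
        `exception: ${brief.exceptionClass}`,
        `action: ${brief.userAction}`,
        `consequence: ${brief.consequence}`,
        `assertion: ${brief.failingAssertion}`,
    ].join("\n");
}

const downloadBrief = toPlanningBrief(
    { urlPath: "/files/:id/download", errorType: "URIError", errorMessage: "URI malformed" },
    {
        userAction: "download uploaded PDF",
        consequence: "user cannot retrieve stored file",
        failingAssertion: "downloaded file name matches uploaded file name",
    },
);

console.log(formatBrief(downloadBrief));
```

Splitting the row from the annotation makes the division of labor explicit: the query fills in route and exception class, and engineering fills in the rest.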

Keep the signal set short. Ten repeated Sentry rows are more useful than 100 speculative edge case examples because they came from real behavior. If a row has no product consequence, leave it in monitoring. If it maps to data loss, silent failure, user lockout, or a paid-flow break, feed it into the pre-deploy coverage loop.

If the top row is URIError: URI malformed on /api/uploads/download, the coverage should upload a file with parentheses and spaces, download it, and assert the filename round-trips without percent encoding. If the top row is a missing translation key, the coverage should walk locale files and assert every canonical key exists in every shipped locale. The monitor found the signal. The pre-deploy layer should own the repeatable test.

If the team is already comparing monitoring tools, keep the distinction clean. The Sentry alternatives for pre-deploy bug detection article covers the broader tool landscape. This article is narrower: take the errors your current monitor already captured and convert the highest-risk groups into tests. You can change the monitor later. The signal exists either way.

The signal owner should be engineering, not support. Support can tell you which users complained and which failures burned trust. Engineering has to translate that into a reproducible state: route, input, auth state, fixture data, browser, locale, and expected side effect. That translation is the moment an incident becomes coverage.

How Autonoma covers edge case testing

The manual version of this workflow still asks one overloaded engineer to notice the pattern, write the prompt, write the test, keep the selector current, and remember to run it. That is the gap Autonoma is built to remove for teams where "we don't have any QA" is the truth of the org chart.

In our four-stage pipeline, Planning reads the codebase, routes, components, and user flows. Generation explores the running app and turns behavior into test coverage. Replay stabilizes the path so the test can run again in CI. Review checks the result before the test becomes part of the suite. For edge case testing, the important distinction is that production signals guide priority, while the running product supplies the behavior to test.

That changes the failure surface. If the app exposes a file-upload flow, a quantity field, an account-recovery path, and a locale switcher, those are concrete behaviors the system can explore. It can vary inputs, observe validation, and preserve the cases that matter as repeatable E2E tests. The team does not need to list every corner case first.

The honest qualifier: Autonoma cannot generate tests for flows that are not implemented. If there is no refund flow in the app, it cannot infer the refund policy and test it. If an admin-only route is unreachable without seeded state, that state still has to exist or be generated by the planning layer. The leverage is not mind reading. The leverage is that the running product already contains more behavioral surface area than a prompt or checklist, and our agents use that surface area as the source of truth.

The edge-case prioritization decision tree

Do not prioritize edge cases by cleverness. Prioritize by blast radius. The hard rule is: if an edge case can cause data loss, silent failure, user lockout, or a paid-flow break, it is Tier 1. Anything else is Tier 2 or Tier 3.

Question	Tier	Coverage rule	Example
Causes data loss?	Tier 1	Generate before merge	Autosave drops edits
Silent failure?	Tier 1	Generate before merge	Payment accepted, order missing
User lockout?	Tier 1	Generate before merge	Password reset loop
Paid-flow break?	Tier 1	Generate before merge	Upgrade confirmation fails
Recoverable UI bug?	Tier 2	Batch after Tier 1	Tooltip clips text
Cosmetic oddity?	Tier 3	Track, don't block	Avatar crops badly

Diagram showing edge case priority branches for data loss, silent failure, user lockout, paid-flow breaks, and lower-risk UI bugs

This decision tree keeps the team honest. A zero-quantity cart item that creates a free order is not "just an edge case." It is data and revenue corruption. A malformed i18n file that crashes the checkout page for one locale is not "just localization." It is user lockout for that market. A file name rendered with %20 in a download header may be Tier 2 if it is cosmetic, but Tier 1 if the same encoding bug prevents a contract from opening after upload.

The triage should be strict, but it should not turn into a second QA job. Pull the top grouped errors, remove anything caused by an outage already fixed at the infrastructure layer, merge duplicates where the same root cause appears under different browser names, and classify the remaining rows by consequence. The output is not a manual engineering backlog. The output is a priority map for generated coverage and human review.
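The duplicate-merging step can be sketched in a few lines of TypeScript. This version treats rows with the same route, error type, and message as one root cause, which is an approximation rather than real root-cause analysis; the type names, the `mergeAcrossBrowsers` helper, and the counts in the sample rows are made up for illustration.

```typescript
interface GroupedError {
    urlPath: string;
    browser: string;
    errorType: string;
    errorMessage: string;
    occurrenceCount: number;
    affectedUsers: number;
}

interface MergedError {
    urlPath: string;
    errorType: string;
    errorMessage: string;
    browsers: string[];
    occurrenceCount: number;
    // Summing per-browser user counts can double-count a user who hit the
    // error in two browsers, so this is only an upper bound.
    affectedUsersUpperBound: number;
}

function mergeAcrossBrowsers(rows: readonly GroupedError[]): MergedError[] {
    const merged = new Map<string, MergedError>();
    for (const row of rows) {
        const key = [row.urlPath, row.errorType, row.errorMessage].join("\u0000");
        const existing = merged.get(key);
        if (existing === undefined) {
            merged.set(key, {
                urlPath: row.urlPath,
                errorType: row.errorType,
                errorMessage: row.errorMessage,
                browsers: [row.browser],
                occurrenceCount: row.occurrenceCount,
                affectedUsersUpperBound: row.affectedUsers,
            });
        } else {
            if (!existing.browsers.includes(row.browser)) {
                existing.browsers.push(row.browser);
            }
            existing.occurrenceCount += row.occurrenceCount;
            existing.affectedUsersUpperBound += row.affectedUsers;
        }
    }
    return Array.from(merged.values());
}

// Sample rows with invented counts, for illustration only.
const sampleRows: GroupedError[] = [
    { urlPath: "/api/uploads/download", browser: "Chrome", errorType: "URIError", errorMessage: "URI malformed", occurrenceCount: 12, affectedUsers: 5 },
    { urlPath: "/api/uploads/download", browser: "Safari", errorType: "URIError", errorMessage: "URI malformed", occurrenceCount: 3, affectedUsers: 2 },
    { urlPath: "/checkout/confirm", browser: "Chrome", errorType: "TypeError", errorMessage: "Cannot read properties of undefined (reading 'id')", occurrenceCount: 4, affectedUsers: 4 },
];

const mergedRows = mergeAcrossBrowsers(sampleRows);
console.log(mergedRows.map((m) => `${m.errorType} ${m.urlPath} [${m.browsers.join(", ")}] x${m.occurrenceCount}`).join("\n"));
```

Keeping the browser list on the merged row preserves the information a test needs if the failure later turns out to be browser-specific after all.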

Tier 1 should become pre-deploy coverage if the affected surface is still active. The test does not have to be elegant. It has to reproduce the failure and prove the side effect is blocked. Tier 2 can be batched. Tier 3 can stay in monitoring unless it repeats enough times to become a trust issue.

This discipline prevents the common failure mode where the loudest alert wins. A noisy exception that users can refresh through is not more important than a silent failure that affects two paying users. The second one is the coverage target you prioritize first.
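One way to encode "consequence before volume" is to sort by tier first and use occurrence count only as a tie-breaker. The sketch below assumes each candidate has already been annotated with a consequence; the `Consequence` labels, the helper names, and the sample counts are illustrative, not taken from real data.

```typescript
type Consequence =
    | "data-loss"
    | "silent-failure"
    | "user-lockout"
    | "paid-flow-break"
    | "recoverable-ui"
    | "cosmetic";

type Tier = 1 | 2 | 3;

// Mirrors the decision tree: the four high-blast-radius consequences are
// Tier 1, recoverable UI bugs are Tier 2, cosmetic oddities are Tier 3.
function tierFor(consequence: Consequence): Tier {
    switch (consequence) {
        case "data-loss":
        case "silent-failure":
        case "user-lockout":
        case "paid-flow-break":
            return 1;
        case "recoverable-ui":
            return 2;
        case "cosmetic":
            return 3;
    }
}

interface CoverageCandidate {
    label: string;
    consequence: Consequence;
    occurrenceCount: number;
}

// Lower tier number wins; volume only breaks ties inside a tier.
function rankCandidates(candidates: readonly CoverageCandidate[]): CoverageCandidate[] {
    return candidates
        .slice()
        .sort(
            (a, b) =>
                tierFor(a.consequence) - tierFor(b.consequence) ||
                b.occurrenceCount - a.occurrenceCount,
        );
}

// Invented counts: a loud but refreshable error vs. a quiet silent failure.
const rankedCandidates = rankCandidates([
    { label: "noisy exception users can refresh through", consequence: "recoverable-ui", occurrenceCount: 900 },
    { label: "payment accepted, order missing", consequence: "silent-failure", occurrenceCount: 2 },
]);

console.log(rankedCandidates.map((c) => `Tier ${tierFor(c.consequence)}: ${c.label}`).join("\n"));
```

With this ordering the two-occurrence silent failure sorts ahead of the 900-occurrence UI error, which is the behavior the paragraph above asks for.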

It also prevents overfitting. One weird stack trace from a retired beta feature should not block every release forever. A repeated file-upload error on the path customers use to send signed contracts should. The decision tree gives the team a way to say no to low-value edge cases without ignoring the high-risk ones hiding behind low volume.

What the generated coverage can look like

Edge case testing becomes real when the rule turns into a failing test. The examples below are not the recommended operating layer for a no-QA team. They are artifacts that show the kind of coverage Autonoma can generate and the kind of DIY fallback a team can write if it is not using Autonoma yet.

Start with boundary analysis because it is a high-signal baseline. For a numeric input, test the minimum valid value, the maximum valid value, and the just-outside values. Do not only assert that the input accepts text. Assert the side effect does or does not happen.

The Playwright example below tests a quantity field with 1, 99, 100, and 0. The important part is the negative assertion: when the value is out of range, the update endpoint must not receive the bad payload.

/**
 * Playwright boundary-value test for a numeric quantity input.
 *
 * The field under test is expected to:
 *   - accept the minimum valid value (1)
 *   - accept the maximum valid value (99)
 *   - reject values above the maximum (100) with a visible error
 *   - reject zero with a visible error
 *
 * The assertion goes one step beyond "the input accepts the value".
 * For rejected inputs we also assert that the server endpoint never
 * receives the out-of-range value, by listening for the POST request
 * and failing the test if it fires with a bad payload.
 *
 * Replace BASE_URL and the selectors with your application's values.
 */

import { test, expect, type Request } from "@playwright/test";

const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";
const FORM_PATH = "/cart";
const SUBMIT_ENDPOINT = "/api/cart/update";

const QUANTITY_INPUT = '[data-testid="quantity-input"]';
const SUBMIT_BUTTON = '[data-testid="submit-quantity"]';
const ERROR_MESSAGE = '[data-testid="quantity-error"]';
const SUCCESS_MESSAGE = '[data-testid="quantity-success"]';

async function fillAndSubmit(page: import("@playwright/test").Page, value: string) {
    await page.goto(`${BASE_URL}${FORM_PATH}`);
    await page.locator(QUANTITY_INPUT).fill(value);
    await page.locator(SUBMIT_BUTTON).click();
}

function captureUpdateRequests(page: import("@playwright/test").Page): Request[] {
    const captured: Request[] = [];
    page.on("request", (req) => {
        if (req.url().includes(SUBMIT_ENDPOINT) && req.method() === "POST") {
            captured.push(req);
        }
    });
    return captured;
}

test.describe("quantity input boundary values", () => {
    test("accepts the minimum valid value (1)", async ({ page }) => {
        const requests = captureUpdateRequests(page);
        await fillAndSubmit(page, "1");
        await expect(page.locator(SUCCESS_MESSAGE)).toBeVisible();
        await expect(page.locator(ERROR_MESSAGE)).toHaveCount(0);
        expect(requests.length).toBe(1);
    });

    test("accepts the maximum valid value (99)", async ({ page }) => {
        const requests = captureUpdateRequests(page);
        await fillAndSubmit(page, "99");
        await expect(page.locator(SUCCESS_MESSAGE)).toBeVisible();
        await expect(page.locator(ERROR_MESSAGE)).toHaveCount(0);
        expect(requests.length).toBe(1);
    });

    test("rejects max+1 (100) with a visible error and never POSTs the bad value", async ({ page }) => {
        const requests = captureUpdateRequests(page);
        await fillAndSubmit(page, "100");
        await expect(page.locator(ERROR_MESSAGE)).toBeVisible();
        await expect(page.locator(ERROR_MESSAGE)).toContainText(/maximum|too\s*(high|large)|99/i);
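        // Tolerate whitespace and string-typed values in JSON bodies, and match
        // form-encoded bodies on a whole key=value pair.
        const badPayloads = requests.filter((r) => {
            const body = r.postData() ?? "";
            return /"quantity"\s*:\s*"?100"?(?![\d.])/.test(body) || /(?:^|&)quantity=100(?:&|$)/.test(body);
        });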
        expect(badPayloads.length).toBe(0);
    });

    test("rejects zero with a visible error and never POSTs the bad value", async ({ page }) => {
        const requests = captureUpdateRequests(page);
        await fillAndSubmit(page, "0");
        await expect(page.locator(ERROR_MESSAGE)).toBeVisible();
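        await expect(page.locator(ERROR_MESSAGE)).toContainText(/at least|minimum|greater than 0|\b1\b/i);
        // Tolerate whitespace and string-typed values in JSON bodies, and match
        // form-encoded bodies on a whole key=value pair.
        const badPayloads = requests.filter((r) => {
            const body = r.postData() ?? "";
            return /"quantity"\s*:\s*"?0"?(?![\d.])/.test(body) || /(?:^|&)quantity=0(?:&|$)/.test(body);
        });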
        expect(badPayloads.length).toBe(0);
    });
});
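The four values in the test above follow a fixed pattern: min-1, min, max, and max+1. A minimal sketch of a helper that derives them, assuming an integer field with an inclusive valid range; `BoundaryCase` and `boundaryCases` are hypothetical names, not part of Playwright.

```typescript
interface BoundaryCase {
    label: string;
    value: number;
    shouldPass: boolean;
}

// Derive the classic four boundary values for an inclusive integer range.
function boundaryCases(min: number, max: number): BoundaryCase[] {
    if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
        throw new RangeError(`invalid integer range [${min}, ${max}]`);
    }
    return [
        { label: "min-1", value: min - 1, shouldPass: false },
        { label: "min", value: min, shouldPass: true },
        { label: "max", value: max, shouldPass: true },
        { label: "max+1", value: max + 1, shouldPass: false },
    ];
}

// The quantity field from the test above accepts 1 through 99.
const quantityCases = boundaryCases(1, 99);

for (const c of quantityCases) {
    console.log(`${c.label}: ${c.value} should ${c.shouldPass ? "pass" : "fail"}`);
}
```

Feeding `boundaryCases(1, 99)` into a loop that registers one test per case would produce the same four checks with less duplication, at the cost of less descriptive test titles.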

String inputs need a different baseline. Empty strings, whitespace-only strings, and very long strings catch different bugs. Empty input catches missing required validation. Whitespace-only catches bad trimming. A 10,000-character string catches storage, rendering, truncation, and silent passthrough failures.

The Vitest example keeps the logic close to the handler. That is intentional. Not every edge case belongs in Playwright. If the bug is pure normalization, unit-level coverage is faster and more precise.

/**
 * Vitest unit tests for string-boundary conditions on a hypothetical
 * user-profile `bio` field handler.
 *
 * Behavior under test:
 *   - empty string input  -> returns empty string (or null), never throws
 *   - whitespace-only     -> trimmed to empty string (or null)
 *   - 10,000-char input   -> truncated to documented MAX_BIO_LENGTH,
 *                            never silently passed through
 *   - valid input         -> returned as-is (trimmed)
 *
 * If your real handler returns `null` instead of `""` for the empty
 * case, set EMPTY_BIO_RETURN below to null and the tests stay green.
 */

import { describe, it, expect } from "vitest";

export const MAX_BIO_LENGTH = 280;
const EMPTY_BIO_RETURN: string | null = "";

export function normalizeBio(input: unknown): string | null {
    if (input === undefined || input === null) {
        return EMPTY_BIO_RETURN;
    }
    if (typeof input !== "string") {
        throw new TypeError(`bio must be a string, got ${typeof input}`);
    }
    const trimmed = input.trim();
    if (trimmed.length === 0) {
        return EMPTY_BIO_RETURN;
    }
    if (trimmed.length > MAX_BIO_LENGTH) {
        return trimmed.slice(0, MAX_BIO_LENGTH);
    }
    return trimmed;
}

describe("normalizeBio - string boundary conditions", () => {
    it("returns the empty-bio sentinel for an empty string (does not throw)", () => {
        expect(() => normalizeBio("")).not.toThrow();
        expect(normalizeBio("")).toBe(EMPTY_BIO_RETURN);
    });

    it("returns the empty-bio sentinel for whitespace-only input (trim behavior)", () => {
        expect(normalizeBio("   ")).toBe(EMPTY_BIO_RETURN);
        expect(normalizeBio("\t\n  \n")).toBe(EMPTY_BIO_RETURN);
    });

    it("truncates a 10,000-character input to MAX_BIO_LENGTH (no silent passthrough)", () => {
        const longInput = "a".repeat(10_000);
        const result = normalizeBio(longInput);
        expect(typeof result).toBe("string");
        expect((result as string).length).toBe(MAX_BIO_LENGTH);
        expect((result as string).length).toBeLessThan(longInput.length);
    });

    it("returns a normal-length bio unchanged (trimmed)", () => {
        expect(normalizeBio("  hello world  ")).toBe("hello world");
    });

    it("handles null and undefined without throwing", () => {
        expect(() => normalizeBio(null)).not.toThrow();
        expect(() => normalizeBio(undefined)).not.toThrow();
        expect(normalizeBio(null)).toBe(EMPTY_BIO_RETURN);
        expect(normalizeBio(undefined)).toBe(EMPTY_BIO_RETURN);
    });
});
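The suite above checks a far-out value (10,000 characters) but not the exact limit. If the limit itself matters, the at-limit and one-over inputs are worth adding. The sketch below uses `trimAndCap` as a stand-in for a handler like `normalizeBio`; both it and `stringBoundaryCases` are hypothetical helpers written for this example.

```typescript
// Stand-in for a trim-then-truncate handler. An assumption for
// illustration, not the handler from your codebase.
function trimAndCap(input: string, maxLength: number): string {
    return input.trim().slice(0, maxLength);
}

interface StringBoundaryCase {
    label: string;
    input: string;
    expectedLength: number;
}

// Empty, whitespace-only, exactly-at-limit, and one-over-limit inputs.
function stringBoundaryCases(maxLength: number): StringBoundaryCase[] {
    return [
        { label: "empty", input: "", expectedLength: 0 },
        { label: "whitespace-only", input: " \t\n ", expectedLength: 0 },
        { label: "exactly max", input: "a".repeat(maxLength), expectedLength: maxLength },
        { label: "max + 1", input: "a".repeat(maxLength + 1), expectedLength: maxLength },
    ];
}

const bioLimit = 280;
const failedLabels = stringBoundaryCases(bioLimit)
    .filter((c) => trimAndCap(c.input, bioLimit).length !== c.expectedLength)
    .map((c) => c.label);

console.log(failedLabels.length === 0 ? "all string boundaries hold" : `failed: ${failedLabels.join(", ")}`);
```

An off-by-one in the length check (for example `>=` instead of `>`) would not change what a truncating handler returns, but it would wrongly reject an at-limit value in a handler that rejects instead of truncating, and the "exactly max" case is the one that catches it.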

The code-level split keeps the suite maintainable. Use Vitest when the behavior can be proven without a browser: parser accepts or rejects a value, locale JSON contains a key, a normalizer trims whitespace, or a schema rejects malformed input. Use Playwright when the failure depends on the browser, a user-visible state, a navigation, a real upload, an auth redirect, or an integration side effect. A file name like resume-final(2)%.pdf needs browser coverage because the bug can appear anywhere between file selection, upload encoding, object storage, download headers, and display text.

That separation also gives you a review checklist. For every new edge case test, ask: is the assertion checking only a status code, or does it prove the user outcome? A 200 response from an upload endpoint is not enough if the file cannot be opened later. A rendered locale page is not enough if the UI contains a raw translation key. Edge case testing should prove the business consequence, not just the technical branch.

The rule is not "E2E everything." The rule is "put the test at the level where the failure is observable." Browser behavior, upload/download round trips, auth redirects, and rendering bugs belong in Playwright. Pure parsing, normalization, and JSON validation usually belong in Vitest. Both are edge case testing when the assertion covers a boundary the happy path skips.

Corner cases the product team never thought of

The corner cases that hurt are usually not exotic. They are ordinary conditions stacked together. One customer-discovery conversation surfaced a file-upload bug around special characters. The team had tested upload. They had tested download. The product worked with contract.pdf. It failed when the file name contained parentheses and a space. The upload encoded the name. The download header returned the encoded form. The user saw a different file name than the one they uploaded.

That is a corner case because the single conditions are normal. Files have spaces. File names have parentheses. Download endpoints set Content-Disposition. The bug only appears when the flow round-trips through upload, storage, download, and browser header parsing.
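The mechanism is easy to reproduce in isolation. The sketch below is an assumed reconstruction of the bug class, not the customer's actual code: `encodedDisposition` echoes a percent-encoded name into the header, while `plainDisposition` emits the original name as a quoted string. It covers ASCII names only; non-ASCII names would need the RFC 6266 `filename*` parameter, which is percent-encoded by design and is outside this sketch.

```typescript
const uploadedName = "report (2024) final.pdf";

// Assumed buggy pattern: the name was percent-encoded for transport or
// storage and that encoded form is echoed back in the header.
// Note: encodeURIComponent leaves parentheses alone; stricter encoders
// also emit %28 and %29.
function encodedDisposition(fileName: string): string {
    return `attachment; filename="${encodeURIComponent(fileName)}"`;
}

// Emit the original name as an HTTP quoted-string. Backslashes and double
// quotes are the two characters that must be escaped inside the quotes.
function plainDisposition(fileName: string): string {
    const escaped = fileName.replace(/["\\]/g, "\\$&");
    return `attachment; filename="${escaped}"`;
}

console.log(encodedDisposition(uploadedName));
console.log(plainDisposition(uploadedName));
```

The first header carries `report%20(2024)%20final.pdf` and the second carries the name exactly as uploaded. How a given client renders the encoded form varies, which is why a test should assert on the header and the saved file name rather than on the status code.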

Diagram showing file upload round trips and locale key coverage feeding into edge case tests

The Playwright test below captures the full loop. It does not assert "upload returns 200." It asserts the user-visible filename survives the round trip.

/**
 * Playwright file-upload round-trip test.
 *
 * Reproduces the bug where a filename containing parentheses and a
 * space is URL-encoded at upload time but the download endpoint
 * returns the percent-encoded form in Content-Disposition, so the
 * filename the user sees on download does not match the filename
 * they uploaded.
 *
 * The test:
 *   1. Creates a temp file named "report (2024) final.pdf".
 *   2. Uploads it via the file input.
 *   3. Reads the download endpoint's Content-Disposition header.
 *   4. Asserts the filename in the header equals the original,
 *      with NO percent-encoding (no "%20", no "%28", no "%29").
 *
 * Replace BASE_URL and selectors with your application's values.
 */

import { test, expect } from "@playwright/test";
import * as fs from "node:fs";
import * as path from "node:path";
import * as os from "node:os";

const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";
const UPLOAD_PATH = "/uploads/new";
const FILE_INPUT = 'input[type="file"]';
const SUBMIT_BUTTON = '[data-testid="upload-submit"]';
const UPLOADED_LINK = '[data-testid="last-upload-link"]';

const TRICKY_FILENAME = "report (2024) final.pdf";

test("uploaded filename round-trips through download without percent-encoding", async ({ page, request }) => {
    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "edge-case-upload-"));
    const filePath = path.join(tmpDir, TRICKY_FILENAME);
    fs.writeFileSync(filePath, "%PDF-1.4\nplaceholder body\n");

    try {
        await page.goto(`${BASE_URL}${UPLOAD_PATH}`);
        await page.locator(FILE_INPUT).setInputFiles(filePath);
        await page.locator(SUBMIT_BUTTON).click();

        const link = page.locator(UPLOADED_LINK);
        await expect(link).toBeVisible();
        const downloadUrl = await link.getAttribute("href");
        expect(downloadUrl, "download link must exist").toBeTruthy();

        const absoluteUrl = downloadUrl!.startsWith("http")
            ? downloadUrl!
            : `${BASE_URL}${downloadUrl!}`;
        const response = await request.get(absoluteUrl);
        expect(response.status(), "download endpoint should return 200").toBe(200);

        const disposition = response.headers()["content-disposition"];
        expect(disposition, "Content-Disposition header must be present").toBeTruthy();

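        // Strict check for the bug described above: the header must not carry
        // the percent-encoded name. An RFC 6266 `filename*` parameter is
        // percent-encoded by design, so relax these three assertions if your
        // server sends `filename*`.
        expect(disposition).not.toMatch(/%20/);
        expect(disposition).not.toMatch(/%28/);
        expect(disposition).not.toMatch(/%29/);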

        const filenameMatch = disposition!.match(/filename\*?=(?:UTF-8'')?["]?([^";]+)["]?/i);
        expect(filenameMatch, `could not parse filename from: ${disposition}`).toBeTruthy();
        const returnedFilename = decodeURIComponent(filenameMatch![1]);
        expect(returnedFilename).toBe(TRICKY_FILENAME);
    } finally {
        try {
            fs.rmSync(tmpDir, { recursive: true, force: true });
        } catch {
            // best-effort cleanup
        }
    }
});

Another discovery call surfaced an i18n translation-gap bug. The product shipped with a canonical English locale and a secondary locale. A key was added to English, the secondary file missed it, and the UI rendered the raw key in production. CI did not catch it because no test walked the locale files. The team heard about it from a user instead.

That bug is not best solved by clicking through every translated screen. The durable test is a locale coverage gate: parse every locale JSON file, use the canonical locale as the key set, and fail if any shipped locale misses a key. This catches malformed JSON and missing translations before a browser ever renders the page.

/**
 * Vitest two-part locale-coverage test.
 *
 * Part 1 - JSON validity:
 *   Walk every file matching `locales/*.json`, parse it, and fail the
 *   test if any file is not valid JSON. Malformed JSON in a locale
 *   bundle ships as a runtime crash in the browser; catch it in CI.
 *
 * Part 2 - Key coverage:
 *   Treat `locales/en.json` as the canonical key set. For every other
 *   locale file, assert that every key (recursively, dot-paths) that
 *   exists in `en.json` also exists in the other locale. The bug this
 *   catches: a key added to `en.json` but forgotten in `fr.json`
 *   renders the raw key string ("checkout.shipping.estimate") to
 *   French-locale users.
 *
 * The test resolves locale files via a configurable LOCALES_DIR
 * environment variable. The default is `locales/` under the current
 * working directory, which is normally the project root.
 */

import { describe, it, expect, beforeAll } from "vitest";
import * as fs from "node:fs";
import * as path from "node:path";
import { globSync } from "glob";

const LOCALES_DIR = process.env.LOCALES_DIR
    ? path.resolve(process.env.LOCALES_DIR)
    : path.resolve(process.cwd(), "locales");

const CANONICAL_LOCALE = "en.json";

interface LocaleFile {
    file: string;
    name: string;
    raw: string;
    parsed: Record<string, unknown> | null;
    parseError: string | null;
}

function loadLocaleFiles(): LocaleFile[] {
    const pattern = path.join(LOCALES_DIR, "*.json").replace(/\\/g, "/");
    const files = globSync(pattern);
    return files.map((file) => {
        const raw = fs.readFileSync(file, "utf8");
        try {
            const parsed = JSON.parse(raw) as Record<string, unknown>;
            return { file, name: path.basename(file), raw, parsed, parseError: null };
        } catch (err) {
            return {
                file,
                name: path.basename(file),
                raw,
                parsed: null,
                parseError: err instanceof Error ? err.message : String(err),
            };
        }
    });
}

function collectKeys(obj: unknown, prefix = "", out: string[] = []): string[] {
    if (obj === null || typeof obj !== "object" || Array.isArray(obj)) {
        out.push(prefix);
        return out;
    }
    for (const [key, value] of Object.entries(obj as Record<string, unknown>)) {
        const next = prefix ? `${prefix}.${key}` : key;
        collectKeys(value, next, out);
    }
    return out;
}

describe("locale JSON validity", () => {
    let locales: LocaleFile[];

    beforeAll(() => {
        locales = loadLocaleFiles();
    });

    it("finds at least one locale file", () => {
        expect(locales.length, `no locale files found in ${LOCALES_DIR}`).toBeGreaterThan(0);
    });

    it("every locale file is valid JSON", () => {
        const malformed = locales.filter((l) => l.parseError !== null);
        const messages = malformed.map((l) => `${l.name}: ${l.parseError}`);
        expect(malformed, `malformed locale files:\n${messages.join("\n")}`).toHaveLength(0);
    });
});

describe("locale key coverage against canonical en.json", () => {
    let locales: LocaleFile[];
    let canonical: LocaleFile | undefined;
    let canonicalKeys: Set<string>;

    beforeAll(() => {
        locales = loadLocaleFiles();
        canonical = locales.find((l) => l.name === CANONICAL_LOCALE);
        canonicalKeys = canonical?.parsed
            ? new Set(collectKeys(canonical.parsed))
            : new Set();
    });

    it("canonical locale en.json exists and is parseable", () => {
        expect(canonical, `${CANONICAL_LOCALE} not found in ${LOCALES_DIR}`).toBeDefined();
        expect(canonical?.parsed, `${CANONICAL_LOCALE} failed to parse`).not.toBeNull();
    });

    it("every non-canonical locale contains every canonical key", () => {
        const others = locales.filter((l) => l.name !== CANONICAL_LOCALE && l.parsed !== null);
        const missing: { locale: string; keys: string[] }[] = [];

        for (const locale of others) {
            const keys = new Set(collectKeys(locale.parsed!));
            const missingForLocale = [...canonicalKeys].filter((k) => !keys.has(k));
            if (missingForLocale.length > 0) {
                missing.push({ locale: locale.name, keys: missingForLocale });
            }
        }

        const report = missing
            .map((m) => `${m.locale} missing ${m.keys.length} key(s):\n  - ${m.keys.join("\n  - ")}`)
            .join("\n\n");
        expect(missing, `locales with missing keys:\n\n${report}`).toHaveLength(0);
    });
});

These examples show why customers say "corner cases" more often than "edge case testing." They are not asking for a taxonomy. They are describing the bug class the product team never thought of, the one that reaches a real user because the test suite only covered the intended path.

Why generic coding agents still need an Autonoma-like loop

Claude Code and Cursor can generate useful edge case tests if you give them a real signal set. They are weakest when the prompt is abstract: "write edge cases for checkout." They are stronger when the prompt includes the happy path already covered, the Sentry errors from the last 30 days, the boundary categories you want, and the assertion style you will reject.

The prompt template below is a DIY fallback for teams that are not using an automated planning, generation, replay, and review loop yet. It turns a coding agent from a happy-path test generator into a more useful edge-case test generator. It also names what the agent must not skip: visible error text, blocked side effects, maximum-length inputs, special-character round trips, malformed payloads, and expired-session behavior.

# Edge Case Test Generation Prompt

A structured prompt template for Claude Code, Cursor, or any coding
agent that can generate Playwright tests from a feature description
plus a list of production errors. Fill the slots in `[BRACKETS]`
before sending.

## Why this prompt exists

A vanilla prompt like "write a Playwright test for the checkout flow"
produces a happy-path test. This prompt constrains the agent to real
failure modes (your Sentry errors) and to explicit edge-case
categories that vanilla prompts skip. The closing assertion-style
constraint blocks the most common failure mode of AI-generated tests:
assertions that only check status codes and never the response body.

## The template

```
You are generating Playwright tests for the [FEATURE_NAME] feature.

HAPPY PATH (already covered by an existing test, do NOT regenerate):
[HAPPY_PATH_DESCRIPTION]

PRODUCTION ERRORS observed in Sentry over the last 30 days for this
feature. Each line is a real exception users hit. Generate one
Playwright test that reproduces each one:

[SENTRY_ERROR_LIST]
  - Example: "TypeError: Cannot read properties of undefined (reading 'id') at /checkout/confirm"
  - Example: "ValidationError: quantity must be <= 99 at /api/cart/update"
  - Example: "URIError: URI malformed at /api/uploads/download"

ADDITIONAL EDGE-CASE CATEGORIES to cover beyond the Sentry list. For
each category, generate at least one Playwright test:

  1. Empty input. Test every required text field with an empty
     string. Assert a visible error message AND assert the form
     submit endpoint is never POSTed.

  2. Maximum-length input. For every text field with a documented
     max length N, test exactly N characters (must pass) and N+1
     characters (must fail with a visible error).

  3. Boundary value. For every numeric field with a valid range
     [min, max], test min, max, min-1, and max+1. min and max must
     pass; min-1 and max+1 must fail with a visible error.

  4. Special characters in filenames and text. Test inputs
     containing parentheses, spaces, accented characters, and
     trailing whitespace. Assert the value round-trips correctly
     through any persistence and display layer.

  5. Malformed input. Test inputs that look valid but break parsers:
     a quantity of "1e5", a date of "2024-13-45", a JSON payload
     with a trailing comma. Assert the server returns a 4xx with a
     useful error message, NOT a 5xx.

  6. Expired session. Test the flow with a session token that has
     been expired server-side. Assert the user is redirected to
     login and the in-flight action is NOT silently dropped.

ASSERTION-STYLE REQUIREMENT (this is non-negotiable):

  Every assertion must check the CONTENT of the response, not just
  the status code. A test that asserts `response.status() === 200`
  is not an edge case test; it is a smoke test. For each test you
  generate:

    - If the test expects success, assert the response body, the
      rendered DOM, or the database/state side-effect contains the
      specific value you expect.
    - If the test expects failure, assert the visible error message
      text and assert that the side-effect (e.g. database write,
      outbound API call) did NOT occur.
    - For asynchronous bugs (race conditions, eventual consistency),
      use Playwright's auto-retrying assertions (for example
      `expect(locator).toHaveText(...)`) or `page.waitForResponse`.
      Never use a fixed `setTimeout` or `page.waitForTimeout`.

OUTPUT FORMAT:
  - One `test.describe` block per category.
  - One `test(...)` per case.
  - All tests in TypeScript using `@playwright/test`.
  - File header comment listing which Sentry error or category each
    test addresses, so the reviewer can audit coverage.
```
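
Categories 2 and 3 in the template above reduce to a small, fixed set of probes per field. The sketch below shows one way to generate those probes so a reviewer can check that the agent covered all of them. The names `BoundaryProbe`, `numericProbes`, and `lengthProbes` are hypothetical helpers for illustration, not part of Playwright or any library.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface BoundaryProbe<T> {
  label: string;       // which boundary this probe exercises
  value: T;            // the input to submit
  shouldPass: boolean; // true if the app must accept the value
}

// Category 3: a numeric field with an inclusive valid range [min, max].
// `step` is the smallest increment the field accepts (1 for integers).
function numericProbes(min: number, max: number, step = 1): BoundaryProbe<number>[] {
  if (min > max) {
    throw new RangeError(`min (${min}) must be <= max (${max})`);
  }
  return [
    { label: "min - step", value: min - step, shouldPass: false },
    { label: "min", value: min, shouldPass: true },
    { label: "max", value: max, shouldPass: true },
    { label: "max + step", value: max + step, shouldPass: false },
  ];
}

// Category 2: a text field with a documented maximum length N.
function lengthProbes(maxLength: number): BoundaryProbe<string>[] {
  if (!Number.isInteger(maxLength) || maxLength < 1) {
    throw new RangeError(`maxLength must be a positive integer, got ${maxLength}`);
  }
  return [
    { label: "N", value: "a".repeat(maxLength), shouldPass: true },
    { label: "N + 1", value: "a".repeat(maxLength + 1), shouldPass: false },
  ];
}
```

In a Playwright spec, each probe would become one `test(...)` case: submit `value`, then assert the visible error text and the blocked side effect when `shouldPass` is false, or the persisted value when it is true.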

## How to use it

1. Run the [Sentry issue-grouping SQL](../queries/sentry-issue-grouping.sql)
   against your events warehouse. Paste the top 10 rows into
   `[SENTRY_ERROR_LIST]`.
2. Fill `[FEATURE_NAME]` and `[HAPPY_PATH_DESCRIPTION]` with the
   feature you want covered.
3. Hand the resulting prompt to your coding agent.
4. Review the generated tests against the three known gaps:
   - Boundary tests whose assertion is just "value accepted".
   - Assertions that never fail (often on async or race code paths).
   - Cases the agent skipped because they require out-of-band setup
     (DB seed, mocked third-party, feature flag). Add those by hand.

What coding agents still miss is the out-of-band setup. They can write the test for an expired session, but they may not know how to expire the token server-side. They can write a translation-key test, but they may not know which locale is canonical. They can test a file upload, but not the production storage provider's exact behavior unless you give them the fixture and environment.

The fallback strategy is to keep the agent narrow. Feed it the Sentry and PostHog signal rows. Ask it to draft tests only for the rows you have classified as Tier 1 or Tier 2. Review every assertion for content, not just status code. Then add the missing setup by hand or through your test factory. That can produce useful coverage, but it is still a manual backfill process.
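
Keeping the agent narrow can be partly mechanical. The sketch below filters classified signal rows to Tier 1 and Tier 2, formats them in the same shape as the template's example lines, and fills the `[BRACKETS]` slots. The `SignalRow` shape, the numeric `tier` field, and the function names are assumptions made for this example; adapt them to whatever your Sentry or PostHog export actually contains.

```typescript
// Hypothetical row shape: one classified signal from Sentry or PostHog.
interface SignalRow {
  message: string;       // e.g. "URIError: URI malformed"
  route: string;         // e.g. "/api/uploads/download"
  affectedUsers: number; // distinct users who hit the error
  tier: 1 | 2 | 3;       // assigned by engineering during classification
}

// Keep only Tier 1 and Tier 2 rows, most severe and most widespread first,
// and render them as the bullet lines the prompt template expects.
function toErrorList(rows: SignalRow[], limit = 10): string {
  return rows
    .filter((r) => r.tier <= 2)
    .sort((a, b) => a.tier - b.tier || b.affectedUsers - a.affectedUsers)
    .slice(0, limit)
    .map((r) => `  - "${r.message} at ${r.route}"`)
    .join("\n");
}

// Fill every [UPPER_CASE] slot in the template. Throws if a slot has no
// value, so a half-filled prompt never reaches the agent. Lowercase
// bracketed text such as "[min, max]" is left untouched.
function fillTemplate(template: string, slots: Record<string, string>): string {
  const slotPattern = /\[([A-Z][A-Z_]*)\]/g;
  const names = (template.match(slotPattern) ?? []).map((s) => s.slice(1, -1));
  const missing = Array.from(new Set(names)).filter((n) => !(n in slots));
  if (missing.length > 0) {
    throw new Error(`unfilled slots: ${missing.join(", ")}`);
  }
  return template.replace(slotPattern, (match: string, name: string) => slots[name] ?? match);
}
```

Calling `fillTemplate(template, { FEATURE_NAME: "checkout", HAPPY_PATH_DESCRIPTION: "...", SENTRY_ERROR_LIST: toErrorList(rows) })` yields the prompt to hand to the agent. The classification into tiers still happens by hand; the code only enforces that unclassified or low-tier rows stay out of the prompt.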

The best review question is not "does this test pass?" It is "would this test have failed before the production bug?" If the answer is no, the test is probably a smoke test wearing edge-case clothing. A test that checks only the status code on a malformed payload would have passed while the UI still rendered the wrong message. A test that uploads a file but never downloads it would have passed while the round-trip bug still existed. A test that checks one locale file but not the canonical key set would have missed the translation-gap case.
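
Part of that review can be pre-screened. The sketch below is a rough, regex-based heuristic that flags generated tests whose only assertions are on the status code, or that contain no `expect(...)` at all. The function name `findWeakTests` is hypothetical, and the parsing is deliberately naive: it assumes one `test("title", ...)` or `it("title", ...)` call per case and will misread unusual formatting. Treat it as a first filter before human review, not a replacement for it.

```typescript
// Hypothetical pre-review filter: returns the titles of tests that either
// assert nothing or assert only on response status.
function findWeakTests(source: string): string[] {
  // Split just before each `test("...` or `it("...` so that each chunk
  // holds roughly one test case. `test.describe(` does not match.
  const chunks = source.split(/(?=\b(?:test|it)\(\s*["'`])/);
  const flagged: string[] = [];

  for (const chunk of chunks) {
    // Extract the title: the first quoted string after the opening paren.
    const title = /^(?:test|it)\(\s*(["'`])(.*?)\1/.exec(chunk);
    if (!title) continue; // preamble such as imports or describe blocks

    // Collect the start of every expect(...) call in this chunk.
    const expects = chunk.match(/expect\([^)]*\)/g) ?? [];

    // An assertion counts as status-only if its subject is `.status()`
    // or `.ok()` on a response.
    const statusOnly = expects.every((e) => /\.status\(|\.ok\(/.test(e));

    // `every` is true for an empty list, so tests without any
    // assertion are flagged as well.
    if (statusOnly) flagged.push(title[2] ?? "");
  }
  return flagged;
}
```

A flagged test is not automatically wrong, and an unflagged one is not automatically good. The heuristic only narrows where the "would this have failed before the bug?" question needs to be asked first.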

That is why generic coding agents are helpful but incomplete. The missing layer is the loop: collect signals, plan against the actual product, generate the right browser-level coverage, replay it in CI, and review the result before it becomes part of the suite. Without that loop, the team is still manually translating incidents into tests and hoping the prompt includes enough context.

This also keeps the product team out of a pointless blame loop. The issue is not that someone "forgot" a corner case. Most corner cases are invisible until the product has real users, real files, real locales, real sessions, and real browser behavior. The win is to make each discovered corner case permanent coverage once, so the same class does not return in the next deploy.

The no-QA operating model

For a no-QA team, the operating model should be simple: keep Sentry and PostHog as signal sources, feed the high-risk classes into Autonoma, and use the generated tests as the pre-deploy replay layer. Monitoring tells you what escaped. Autonoma turns the relevant escape patterns into coverage before the next release.

That keeps the responsibilities clean. Sentry and PostHog show the symptoms, frequency, affected users, routes, browsers, and messages. Engineering classifies the product consequence. Autonoma plans against the running app, generates the edge case coverage, replays it in CI, and gives the team something concrete to review.
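
The signal side of that split is mostly a grouping step. The sketch below groups raw error events by route, browser, and error type, then counts events and distinct affected users per group, which is the row shape engineering needs before classifying consequence. It runs in memory on an assumed `SignalEvent` shape; the field names are illustrative, and a real pipeline would more likely run the equivalent query in the events warehouse.

```typescript
// Hypothetical event shape exported from Sentry or PostHog.
interface SignalEvent {
  route: string;
  browser: string;
  errorType: string;
  userId: string;
}

// One row per (route, browser, errorType) combination.
interface SignalGroup {
  route: string;
  browser: string;
  errorType: string;
  events: number;        // frequency
  affectedUsers: number; // distinct users in this group
}

function groupSignals(events: SignalEvent[]): SignalGroup[] {
  const groups = new Map<string, { group: SignalGroup; users: Set<string> }>();

  for (const e of events) {
    // JSON.stringify on a tuple gives an unambiguous composite key.
    const key = JSON.stringify([e.route, e.browser, e.errorType]);
    let entry = groups.get(key);
    if (!entry) {
      entry = {
        group: {
          route: e.route,
          browser: e.browser,
          errorType: e.errorType,
          events: 0,
          affectedUsers: 0,
        },
        users: new Set<string>(),
      };
      groups.set(key, entry);
    }
    entry.group.events += 1;
    entry.users.add(e.userId);
    entry.group.affectedUsers = entry.users.size;
  }

  // Most widespread groups first; ties broken by raw frequency.
  return Array.from(groups.values(), (entry) => entry.group).sort(
    (a, b) => b.affectedUsers - a.affectedUsers || b.events - a.events,
  );
}
```

The output deliberately stops short of a priority. Frequency and reach come from the data; whether a group means data loss, silent failure, or a cosmetic glitch is the classification engineering adds on top.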

The review question stays the same: would this test have failed before the production bug? If the answer is yes, keep it. If the flow is retired or the risk no longer exists, remove or downgrade it. A startup test suite should not become a museum of bugs from product versions that no longer exist.

That is how edge case testing becomes sustainable without a QA team. You do not list every possible case. You keep listening to production signals, keep monitoring downstream, and use Autonoma upstream so the important failure classes become repeatable pre-deploy coverage.

FAQ

What is an edge case in testing?

An edge case in testing is an input, state, or user path at the boundary of expected behavior. Examples include a minimum value, a maximum value, an empty string, a very long string, an expired session, or a file name with special characters. Good edge case testing checks both the visible result and the side effect.

What is the difference between an edge case and a corner case?

An edge case usually stresses one unusual boundary, such as quantity 0 or a 10,000-character field. A corner case combines multiple uncommon conditions at once, such as a file name with spaces and parentheses moving through upload, storage, and download. Teams often use the terms loosely, but the distinction helps prioritize coverage.

How do I find edge cases?

Start with generic boundary testing for numeric, string, file, auth, and locale inputs. Then use Sentry and PostHog as signal sources: group errors by route, browser, error type, and affected users, and prioritize anything that can cause data loss, silent failure, user lockout, or paid-flow breakage.

Are my Sentry errors edge cases?

Some are. A Sentry error is evidence that a real user reached a state your tests did not cover. Treat those errors as signals, then classify each row by product consequence. Monitoring stays downstream; Autonoma turns the relevant repeated failure classes into upstream pre-deploy tests.

Do AI coding agents test edge cases?

AI coding agents can draft edge case tests when you provide the categories and production failures to cover. They are a DIY fallback, not the full operating layer, because they usually need manual setup for seeded state, expired tokens, feature flags, third-party state, or locale policy. Autonoma adds the planning, generation, replay, and review loop around that work.

