i18n Testing: Catching Missing Translations Automatically

i18n testing verifies that translated strings appear correctly in the rendered UI across every supported locale. The most common i18n bug in production is not a bad translation. It is a missing translation key the build did not catch: the key exists in English, is absent from German, and the UI renders the raw string checkout.button.submit instead of translated text. Neither the compiler nor a default linter flags it. Users find it first.

For engineers shipping multi-locale apps without a QA team, this class of bug is entirely avoidable, and Autonoma is the operating layer that catches it per PR before anyone sees production. The patterns in this article show how the problem works, what a targeted test looks like, and how our agents generate and maintain that coverage automatically from your codebase.

Why missing translation keys are a no-QA-team disaster

In a customer-discovery call, one team described the pattern precisely: "We don't have any QA, so we hear about it real quick when something breaks in another language. A user tweets a screenshot and suddenly we're doing a hotfix at 11pm." The missing key had been there for two sprints. The build passed. The linter passed. Their staging environment was English-only, so no one saw it before the release.

That feedback loop, where a user finds the bug before the team does, is exactly what Autonoma is built to cut. It generates the locale-switching tests that catch the missing key at PR review, so the 11pm hotfix is never scheduled in the first place.

The structural problem is timing. Next.js with next-i18next (or most i18n libraries) resolves translation keys at render time, not compile time. If de/common.json is missing a key that en/common.json has, and no fallback language supplies it, i18next renders the raw key string instead. No error is thrown. The page renders. It just renders wrong.
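To make the resolution order concrete, here is a simplified model of it. `resolve` is a hypothetical function, not i18next's API, and it uses flat keys for brevity. It only reproduces the observable behavior: try the requested locale, then any configured fallback locales, then return the raw key.

```typescript
// Simplified model of render-time key resolution. NOT i18next's
// implementation; it only mimics the observable lookup order.
type Resources = Record<string, Record<string, string>>;

function resolve(
  resources: Resources,
  locale: string,
  key: string,
  fallbackLocales: string[] = [],
): string {
  // Requested locale first, then each fallback locale in order.
  for (const lng of [locale, ...fallbackLocales]) {
    const value = resources[lng]?.[key];
    if (value !== undefined) return value;
  }
  // Nothing found anywhere: the raw key is what reaches the UI.
  return key;
}

// Flat keys for brevity; the real locale files nest checkout > button > submit.
const resources: Resources = {
  en: { 'checkout.button.submit': 'Submit Order' },
  de: {},
};

console.log(resolve(resources, 'de', 'checkout.button.submit'));
// → checkout.button.submit
console.log(resolve(resources, 'de', 'checkout.button.submit', ['en']));
// → Submit Order
```

The second call is the quieter failure: with a fallback configured, German users see English text instead of a raw key, so a raw-key check alone will not flag it.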

The corner cases multiply fast. A team adding a new checkout flow adds the English key but forgets German. A developer renames a key in English and updates the component but not every locale file. A merge conflict resolution picks the wrong side of the JSON. Each scenario produces the same symptom: raw dotted keys visible to real users in a non-English locale. For a team with no QA, the feedback loop is entirely outside the engineering workflow.

The fix is not more careful developers. It is a runtime test that switches locale, loads the page, and asserts that no visible text node looks like namespace.section.key.

A missing translation key, start to finish

Here is the concrete scenario. A React component calls t('checkout.button.submit') via next-i18next. The English locale file has the key. The German locale file does not. In German, the button renders the literal string checkout.button.submit instead of "Bestellung abschicken".

The i18n config wires two locales with English as the default:

// next-i18next configuration.
// Wires two locales for Next.js pages-router locale routing:
//   - 'en' (default)
//   - 'de'
// Next.js reads `i18n` from next.config.js and next-i18next reuses the same
// shape here so getServerSideProps/getStaticProps can hydrate translations.
module.exports = {
  i18n: {
    defaultLocale: 'en',
    locales: ['en', 'de'],
  },
};

The React component is straightforward. It calls t('checkout.button.submit') and expects the library to resolve it:

import { useTranslation } from 'next-i18next';

/**
 * CheckoutButton renders the order-submit button.
 *
 * It looks up the label via t('checkout.button.submit').
 *
 * NOTE ON THE MISSING-KEY BEHAVIOR:
 * The key `checkout.button.submit` exists in public/locales/en/common.json
 * but is DELIBERATELY absent from public/locales/de/common.json. When i18next
 * cannot resolve a key, its default behavior is to fall back to the raw key
 * string. So under the German locale this button renders the literal text
 * "checkout.button.submit" instead of a translated label. That untranslated
 * key string is exactly what the Playwright test in tests/ is designed to catch.
 */
export default function CheckoutButton() {
  const { t } = useTranslation('common');

  return (
    <button type="submit" className="checkout-button">
      {t('checkout.button.submit')}
    </button>
  );
}

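The locale files show the asymmetry. The English file, public/locales/en/common.json, defines the key; the German file, public/locales/de/common.json, has no checkout.button.submit entry. Here is the English file: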

{
  "checkout": {
    "button": {
      "submit": "Submit Order"
    }
  },
  "greeting": "Welcome"
}

Now the runtime test. The Playwright spec below loads the checkout page in both locales, splits the visible text into lines, and fails if any line contains a dotted identifier with three or more segments, the shape of checkout.button.submit. The English render passes; the German render fails:

/**
 * i18n missing-translation-key detection with Playwright.
 *
 * How to run (against a local Next.js dev server that serves the checkout page
 * at both the English and German locales):
 *
 *   npm install
 *   npx playwright install
 *   BASE_URL=http://localhost:3000 npx playwright test
 *
 * The idea: when i18next cannot resolve a key (and no fallback language has
 * it), it renders the raw dotted key string (e.g. "checkout.button.submit").
 * We scan all visible text on the rendered page and fail if anything looks
 * like an untranslated dotted key. The English page passes; the German page
 * fails because public/locales/de/common.json is missing checkout.button.submit.
 */
import { test, expect, type Page } from '@playwright/test';

const BASE_URL = process.env.BASE_URL || 'http://localhost:3000';

// Matches dotted identifiers like "checkout.button.submit" that leak into the
// UI when a translation key is not resolved. The /i flag lets camelCase
// segments ("submitButton") match too. Requiring at least two dots (three
// segments) skips ordinary sentences, abbreviations like "e.g.", and two-part
// names like "report.pdf"; three-part hostnames such as "www.example.com"
// still match, so expect occasional false positives.
const DOTTED_KEY_REGEX = /\b[a-z][a-z0-9]*\.[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)+\b/i;

/**
 * Collect every visible text line on the page and return the ones that
 * look like untranslated translation keys.
 */
async function findUntranslatedKeys(page: Page): Promise<string[]> {
  // innerText reflects rendered text, so content inside display:none
  // elements is skipped.
  const texts = await page.locator('body').allInnerTexts();
  const fragments = texts
    .flatMap((block) => block.split(/\r?\n/))
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  return fragments.filter((fragment) => DOTTED_KEY_REGEX.test(fragment));
}

test('English locale renders no untranslated keys', async ({ page }) => {
  // The default locale is served without a path prefix.
  await page.goto(`${BASE_URL}/checkout`);
  // Wait for the button itself rather than 'networkidle', which the
  // Playwright docs discourage for testing.
  await expect(page.locator('button.checkout-button')).toBeVisible();

  const leaked = await findUntranslatedKeys(page);
  expect(
    leaked,
    `Found untranslated keys on the English page: ${JSON.stringify(leaked)}`,
  ).toEqual([]);
});

test('German locale renders no untranslated keys (catches the missing key)', async ({ page }) => {
  // Next.js serves non-default locales under a path prefix.
  await page.goto(`${BASE_URL}/de/checkout`);
  await expect(page.locator('button.checkout-button')).toBeVisible();

  const leaked = await findUntranslatedKeys(page);
  expect(
    leaked,
    `Found untranslated keys on the German page: ${JSON.stringify(leaked)}`,
  ).toEqual([]);
});

This is the pattern that catches what the build misses. The test is not checking translation quality. It is checking code correctness: did every key the component references actually resolve in every locale?
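Because the assertion is only as good as its regex, it helps to check the pattern against strings that should and should not be flagged. The sketch below pulls the same regex from the spec into a pure function; `findDottedKeys` and the sample strings are hypothetical, chosen for illustration.

```typescript
// The detection regex from the spec, extracted into a pure function so its
// behavior can be checked without launching a browser.
const DOTTED_KEY_REGEX = /\b[a-z][a-z0-9]*\.[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)+\b/i;

function findDottedKeys(lines: string[]): string[] {
  return lines
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && DOTTED_KEY_REGEX.test(line));
}

const sample = [
  'Submit Order', // translated label: not flagged
  'checkout.button.submit', // leaked key: flagged
  'checkout.submitButton.label', // camelCase key: flagged thanks to the /i flag
  'See e.g. the FAQ', // abbreviation: only two segments, not flagged
  'Version 1.2.3', // segments must start with a letter: not flagged
  'Visit www.example.com', // three-part hostname: flagged (false positive)
];

console.log(findDottedKeys(sample));
```

The last sample is the known false positive. If your UI legitimately displays three-part hostnames, an allowlist of expected strings is one way to keep the check strict without failing on them.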

The test is also the kind of corner-case coverage that goes beyond the happy path. The happy path is English. The corner cases are every other locale, every key added in the last sprint, every merge that touched a JSON file.

Figure: A green build can still render a missing key; only a runtime check catches it.

How Autonoma covers i18n testing for no-QA teams

The gap between "the build passed" and "a user is staring at checkout.button.submit" is exactly what Autonoma is built to close. The pain is structural: multi-locale apps accumulate missing translation keys because no automated layer checks rendered output per locale, per PR. The build does not catch it. The linter does not catch it. The team has no QA. So production catches it, via a screenshot from an angry user.

Autonoma's Planner agent reads the codebase, including the i18n configuration file, the locale file structure, and every component that calls t(). It maps each locale file against the keys the component tree actually references, then generates browser-level tests that switch locale, load the page, and assert that no visible text matches a raw dotted-key pattern. The Automator runs those tests against the preview environment spun up for the PR. The Maintainer keeps the coverage current as locale files change, new keys are added, or components are refactored.

This is how the autonomous testing platform handles i18n: not as a configuration task the team sets up once, but as a live coverage layer that evolves with the codebase. The Playwright spec above illustrates the kind of assertion our agents generate, isolated per PR so German-locale failures on a new checkout flow do not affect other open PRs.

The result is a different failure surface. The next time a developer adds t('payment.terms.label') and forgets to add it to de/common.json, the PR review shows a failing test, not a user screenshot at 11pm. This is not aspirational and it is not a roadmap item: reading the i18n config, generating locale-switching coverage, and maintaining it per PR is what Autonoma does today.

For a team that hears about i18n bugs "real quick" from users, the operating shift is from reactive (user reports raw key in screenshot) to proactive (agent flags missing key in PR review before merge). The coverage does not require a QA hire. It requires connecting the codebase.

The i18n testing patterns that scale

Three patterns compose well together. Understanding where each sits in the coverage stack matters more than running all three independently.

Figure: Three failure modes: unresolved keys, overflow on longer strings, broken RTL flow.

The failure modes these patterns have to catch are concrete:

A missing key renders its raw t() path (checkout.button.submit) straight into the UI.
A longer locale (German, Finnish) overflows or truncates a fixed-width element.
An RTL locale (Arabic, Hebrew) reverses margins, alignment, and component flow.
A silent fallback to English hides the gap from anyone not reading that locale.

Build-time lint is the cheapest first gate. A CI step, usually a small custom key-diff script, compares key sets across locale files and fails the build if any locale is missing a key present in the default locale; editor extensions such as i18n-ally surface the same gaps while you write code. This catches structural parity issues: a key exists in en/common.json but not in de/common.json. What it does not catch: a key a component references that no locale file defines (parity holds, yet the raw key still leaks), a key referenced under a dynamic namespace that only resolves at runtime, or a key whose value is an empty string. The lint step is fast and catches the obvious cases. It is not sufficient on its own.
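A key-diff script can be small. The sketch below is hypothetical (not a published tool) and assumes locale files are nested JSON objects whose leaves are strings: it flattens each file into dotted keys and reports those present in the default locale but absent from another. In CI you would read the public/locales/*/common.json files with Node's fs module, run the comparison for every non-default locale, and exit non-zero when anything is missing.

```typescript
// Hypothetical key-parity check. Assumes nested JSON with string leaves.
type LocaleTree = { [key: string]: string | LocaleTree };

// Flatten {"checkout": {"button": {"submit": "..."}}} into "checkout.button.submit".
function flattenKeys(tree: LocaleTree, prefix = ''): string[] {
  const keys: string[] = [];
  for (const [key, value] of Object.entries(tree)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === 'string') {
      keys.push(path);
    } else {
      keys.push(...flattenKeys(value, path));
    }
  }
  return keys;
}

// Keys defined in the default locale but absent from the target locale.
function missingKeys(base: LocaleTree, target: LocaleTree): string[] {
  const targetKeys = new Set(flattenKeys(target));
  return flattenKeys(base).filter((key) => !targetKeys.has(key));
}

// The English file from the example above, and a German file that lacks the
// checkout block. The German "greeting" value is a placeholder.
const en: LocaleTree = {
  checkout: { button: { submit: 'Submit Order' } },
  greeting: 'Welcome',
};
const de: LocaleTree = { greeting: 'Willkommen' };

const missing = missingKeys(en, de);
console.log(missing.length === 0 ? 'de: OK' : `de is missing: ${missing.join(', ')}`);
// → de is missing: checkout.button.submit
```

Note what this cannot see: a key that a component calls but that appears in no locale file passes this check, which is why the runtime layer still matters.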

Runtime Playwright assertions are the second gate, and they catch what lint misses. Loading the page in each locale and asserting that no visible text node matches a dotted-key regex is a render-time check on the actual DOM. It catches dynamic namespace lookups, conditional key references, and any case where a key is resolved at the component level rather than the file level. The limitation is maintenance: as the app grows and locale files change, someone must keep the test current, and for a team without QA that burden is real. It is also precisely the part Autonoma removes: the Planner generates this runtime locale-switching assertion from your codebase, the Automator runs it on every PR, and the Maintainer keeps it current as locale files change.

Visual diff on locale-specific layout is the third pattern and targets a different class of bug: strings that resolve correctly but break the layout. German strings are routinely 30 to 40 percent longer than their English equivalents. A button that reads "Submit Order" in English might read "Bestellung abschicken" in German. If the button has a fixed width, the German text overflows or truncates. Switching to an RTL locale such as Arabic introduces its own requirements: margins, padding, text alignment, and component flow all reverse. A visual diff run per locale catches these rendering failures without writing assertions for every individual component.
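Before the visual diff runs, a cheap text-level heuristic can point at the strings most likely to overflow. The sketch below is a hypothetical pre-filter, not a replacement for the diff: character count is only a proxy, since rendered width depends on font, glyphs, and wrapping. The default `maxRatio` of 1.4 mirrors the upper end of the 30 to 40 percent figure above and is an assumption to tune.

```typescript
// Hypothetical pre-filter: flag translations whose character count exceeds
// the source string by more than maxRatio.
type FlatLocale = Record<string, string>;

interface LengthFlag {
  key: string;
  ratio: number; // translated length divided by source length
}

function longTranslations(source: FlatLocale, target: FlatLocale, maxRatio = 1.4): LengthFlag[] {
  const flagged: LengthFlag[] = [];
  for (const [key, text] of Object.entries(source)) {
    const translated = target[key];
    // Missing keys are the parity check's job; empty sources give no ratio.
    if (translated === undefined || text.length === 0) continue;
    const ratio = translated.length / text.length;
    if (ratio > maxRatio) flagged.push({ key, ratio });
  }
  return flagged;
}

const en: FlatLocale = { 'checkout.button.submit': 'Submit Order', greeting: 'Welcome' };
const de: FlatLocale = { 'checkout.button.submit': 'Bestellung abschicken' };

for (const { key, ratio } of longTranslations(en, de)) {
  console.log(`${key}: ${ratio.toFixed(2)}x the English length`);
}
// → checkout.button.submit: 1.75x the English length
```

"Bestellung abschicken" is 21 characters against 12 for "Submit Order", a ratio of 1.75, so it is flagged; those are the components worth checking first in the per-locale screenshots.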

Together, the three patterns cover structural parity (lint), render-time key resolution (Playwright), and layout integrity (visual diff). The Playwright and visual-diff layers are what Autonoma generates and runs on every PR, executing against the per-PR preview deploy.

Localization vs i18n testing

These terms are often used interchangeably. They address different problems.

i18n testing is an engineering concern. It asks: does the code handle multiple locales correctly? Does locale switching work? Do all keys resolve to non-raw-key text? Does the layout hold under longer strings and RTL direction? These are automatable assertions. A Playwright test can check them. A CI step can fail on them. No human translator is required.

Localization testing is a translator and product concern. It asks: is the German translation accurate? Does the Spanish phrasing sound natural? Is the cultural reference appropriate for the target market? Is the date format correct for the locale? These require human judgment. You can automate a check that a date field contains a date-shaped string, but you cannot automate whether the translation of "checkout" into French captures the right commercial register.

Search results for "localization testing" are dominated by translator-workflow tools and manual QA services. That is the right answer for localization quality. It is the wrong answer for the missing-key problem, which is a code correctness issue, not a translation quality issue.

For teams without QA, the practical implication is: automate i18n testing entirely, and bring in human review only for localization quality when entering a market where accuracy matters competitively. The two workflows run at different cadences and involve different people. Do not conflate them, or you will end up treating a missing key as a translator problem when it is a build pipeline problem.

The durable i18n testing setup

The operating model that holds at scale is not "run this Playwright script in CI and hope someone remembers to update it." It is a coverage layer that evolves with the codebase automatically.

The lint step is the cheap first gate. Set it up once: compare key sets across locale files in CI, fail on structural parity issues. It costs five minutes to configure and runs in under a second. Keep it.

The Playwright examples in the companion repo illustrate the render-time assertions. They are not the team's permanent manual loop. They are the kind of test Autonoma's Planner generates directly from the codebase, the kind the Automator runs against the preview environment for every PR, and the kind the Maintainer keeps current as locale files and components change. When a developer adds a new key or renames an existing one, the coverage updates without a PR to the test file.

The visual diff layer handles layout integrity per locale. For teams adding RTL locales or entering high-competition markets where German or French strings are significantly longer than English, this layer surfaces rendering failures before they reach production.

The full stack, operated by Autonoma, means a team hears about missing translation keys from the PR review, not from a user screenshot. That is the difference between a five-second fix during code review and a hotfix at 11pm.

Catch the missing key before your users do

Your build is green. Your linter is happy. And somewhere in your German locale file, a key is missing that a user will find before your team does.

Autonoma solves this by reading your i18n configuration, identifying every t() call in your component tree, generating locale-switching browser tests that assert resolved text in every supported locale, running those tests on every PR preview environment, and keeping them current as your locale files change. The missing translation never reaches production, because Autonoma catches it at PR review, not at 11pm.

If you ship in more than one language and you do not have a QA team, this is not a nice-to-have. It is the difference between your users seeing your product and your users seeing your code. Connect the repo. Let Autonoma own the i18n coverage.

What is i18n testing?

i18n testing (internationalization testing) verifies that your application renders translated strings correctly across all supported locales. It checks that translation keys resolve to actual text, that layouts hold under longer strings, and that locale-specific patterns like RTL text direction and date formatting work correctly in the rendered UI.

What is the difference between i18n and localization testing?

i18n testing is an engineering concern: it checks code correctness for multi-locale rendering, such as whether keys resolve, whether layouts hold, and whether locale switching works. Localization testing is a translator workflow: it checks whether the translated text is accurate, natural, and culturally appropriate. Both matter, but only i18n testing is fully automatable by engineers.

How do I catch missing translations automatically?

Three layers work together. A build-time lint step checks that locale JSON files have the same key structure. A Playwright spec switches the app into each locale and asserts that no visible text node matches a raw dotted-key pattern like checkout.button.submit. An automated E2E layer like Autonoma generates and maintains those locale-switching assertions directly from your codebase, so no one writes or maintains the test by hand.

Can Playwright test translations?

Yes. Playwright can load a page in a specific locale, through a locale-prefixed path such as /de/checkout, a query string, or a cookie, depending on how the app selects its locale, then assert that no visible text matches a dotted-key regex. It can also assert that specific translated strings appear, or that the page's lang attribute matches the expected locale. The limitation is that someone must write and maintain those assertions as locale files and components change.

Do I need a QA team for i18n testing?

No. i18n testing is fully automatable. A build-time lint step handles structural key parity. A runtime Playwright layer catches keys that pass the parity check but still fail to resolve in the rendered page. An AI-native testing tool like Autonoma generates and maintains those locale-switching tests from your codebase, so the coverage holds per PR without a dedicated QA hire.
