Building a Multilingual Documentation Site
Practical guide to multilingual docs with Docusaurus, GitBook, and MkDocs: directory structure, automated translation pipelines, versioning, and URL strategies.
Your docs are in English. Your users aren't. Here's how to add language support to documentation sites built with Docusaurus, GitBook, and MkDocs — including the directory structure, translation pipeline, and versioning strategy that actually work at scale.
The URL structure decision
Before anything else, pick a URL strategy. This affects SEO, routing, and how you organize files.
Subdirectory (recommended): docs.example.com/ja/getting-started
- Single domain, good for SEO consolidation
- Easy to implement in most frameworks
- Clear language segmentation in analytics
Subdomain: ja.docs.example.com/getting-started
- Treated as separate sites by search engines
- More complex DNS and hosting setup
- Useful if different language versions are maintained by different teams
Country-code domain: docs.example.jp/getting-started
- Maximum separation
- Most expensive to maintain
- Only makes sense for very large, independently managed locales
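Whichever strategy you choose, keep locale detection in one place instead of scattering it across templates. A minimal sketch for the subdirectory strategy (the locale set and function name here are illustrative, not from any framework):
from urllib.parse import urlparse
SUPPORTED_LOCALES = {"en", "ja", "de", "fr", "ko", "zh-Hans"}
def locale_from_url(url, default="en"):
    # Under the subdirectory strategy, the first path segment is the locale
    first = urlparse(url).path.strip("/").split("/")[0]
    return first if first in SUPPORTED_LOCALES else default
assert locale_from_url("https://docs.example.com/ja/getting-started") == "ja"
assert locale_from_url("https://docs.example.com/getting-started") == "en"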
Docusaurus: built-in i18n
Docusaurus has first-class i18n support. Configuration in docusaurus.config.js:
module.exports = {
  i18n: {
    defaultLocale: "en",
    locales: ["en", "ja", "de", "fr", "ko", "zh-Hans"],
    localeConfigs: {
      en: { label: "English" },
      ja: { label: "日本語" },
      de: { label: "Deutsch" },
      fr: { label: "Français" },
      ko: { label: "한국어" },
      "zh-Hans": { label: "简体中文" },
    },
  },
};
Docusaurus uses a directory structure like:
docs/
  getting-started.md
  api-reference.md
i18n/
  ja/
    docusaurus-plugin-content-docs/
      current/
        getting-started.md
        api-reference.md
  de/
    docusaurus-plugin-content-docs/
      current/
        getting-started.md
        api-reference.md
The i18n/ directory mirrors the docs/ structure per locale. To initialize translation files:
npx docusaurus write-translations --locale ja
This generates JSON files for the theme's UI strings (navbar, footer, etc.). It does not copy your Markdown content; you copy that into the locale tree yourself, e.g. cp -r docs/. i18n/ja/docusaurus-plugin-content-docs/current/.
The annoying part: Docusaurus doesn't auto-translate anything. It gives you the file structure and expects you to fill in translations. For a 200-page docs site across 5 languages, that's 1,000 files to manage manually — unless you automate it.
MkDocs: plugin-based i18n
MkDocs doesn't have built-in i18n, but the mkdocs-static-i18n plugin works well (the config below uses the plugin's pre-1.0 syntax; 1.x switched to a list-based languages format):
# mkdocs.yml
plugins:
  - i18n:
      default_language: en
      languages:
        en: English
        ja: 日本語
        de: Deutsch
File structure with the plugin:
docs/
  getting-started.en.md
  getting-started.ja.md
  getting-started.de.md
  api-reference.en.md
  api-reference.ja.md
  api-reference.de.md
Alternatively, with directory-based separation:
docs/
  en/
    getting-started.md
  ja/
    getting-started.md
  de/
    getting-started.md
The suffix-based approach keeps translations next to their source, making it easier to spot missing translations. The directory-based approach is cleaner for larger sites.
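The suffix layout also makes missing translations easy to find mechanically. A quick sketch that reports the gaps (assumes the naming shown above; this is not part of the plugin):
import os
DOCS_DIR = "docs"
LOCALES = ["en", "ja", "de"]
pages = {}
for filename in os.listdir(DOCS_DIR):
    stem, ext = os.path.splitext(filename)   # "getting-started.ja", ".md"
    base, _, locale = stem.rpartition(".")   # "getting-started", ".", "ja"
    if ext == ".md" and locale in LOCALES:
        pages.setdefault(base, set()).add(locale)
for base, have in sorted(pages.items()):
    missing = set(LOCALES) - have
    if missing:
        print(f"{base}: missing {', '.join(sorted(missing))}")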
GitBook: a different model
GitBook handles i18n through "spaces" — each language version is a separate space. There's no built-in mechanism to keep them in sync. You create a space per language, duplicate the content, and translate it.
This works fine for 2-3 languages but becomes a maintenance nightmare at 5+. The main risk is content drift — the English docs get updated, the translations don't, and users in other languages see outdated information.
GitBook's API can be used to automate syncing, but it requires custom tooling.
Automating the translation pipeline
The manual workflow (export strings, send to translators, wait, import) doesn't scale for documentation that changes frequently. Here's an automated pipeline:
import os
import hashlib
import json
DOCS_DIR = "docs"
I18N_DIR = "i18n"
MANIFEST_FILE = ".translation-manifest.json"
TARGET_LOCALES = ["ja", "de", "fr"]
def get_file_hash(filepath):
    with open(filepath, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()
def load_manifest():
    if os.path.exists(MANIFEST_FILE):
        with open(MANIFEST_FILE, 'r') as f:
            return json.load(f)
    return {}
def translate_docs():
    manifest = load_manifest()
    updated = []
    for root, dirs, files in os.walk(DOCS_DIR):
        for filename in files:
            if not filename.endswith('.md'):
                continue
            filepath = os.path.join(root, filename)
            current_hash = get_file_hash(filepath)
            if manifest.get(filepath) == current_hash:
                continue  # file unchanged since the last run
            # File is new or modified: re-translate it into every target locale
            with open(filepath, 'r') as f:
                content = f.read()
            for locale in TARGET_LOCALES:
                # translate_markdown is backend-specific; see the sketch below
                translated = translate_markdown(content, target_locale=locale)
                # Replace only the leading docs/ prefix, not later occurrences
                target_path = filepath.replace(DOCS_DIR, f"{I18N_DIR}/{locale}", 1)
                os.makedirs(os.path.dirname(target_path), exist_ok=True)
                with open(target_path, 'w') as f:
                    f.write(translated)
            manifest[filepath] = current_hash
            updated.append(filepath)
    with open(MANIFEST_FILE, 'w') as f:
        json.dump(manifest, f, indent=2)
    return updated
Run this in CI on every merge to main. Only changed files get re-translated, keeping costs and build times reasonable.
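The pipeline deliberately leaves translate_markdown undefined, because it depends on your translation backend. A sketch using the DeepL Python client (pip install deepl; assumes a DEEPL_AUTH_KEY environment variable and glosses over protecting code fences and front matter from translation):
import os
import deepl
translator = deepl.Translator(os.environ["DEEPL_AUTH_KEY"])
DEEPL_LANG = {"ja": "JA", "de": "DE", "fr": "FR"}
def translate_markdown(content, target_locale):
    # In production, split out fenced code blocks first, translate only
    # the prose, then reassemble; otherwise your code samples get translated too
    result = translator.translate_text(
        content,
        target_lang=DEEPL_LANG[target_locale],
        preserve_formatting=True,
    )
    return result.text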
Versioned docs and translations
Documentation versioning adds complexity. If you maintain docs for v1, v2, and v3, each version needs translations. The naive approach (translate all versions of all pages into all languages) creates a combinatorial explosion.
A more practical strategy: fully translate only the current version, keep older versions in English (or frozen as-is), and track staleness per page. Record the hash of the English source each translation was made from in a comment at the top of the translated file:
<!-- i18n-outdated: source-hash=abc123 current-hash=def456 -->
When the recorded source hash no longer matches the current English source, show a warning banner instead of silently serving stale content:
{isOutdated && (
  <Banner type="warning">
    This translation may be outdated.{" "}
    <a href={englishUrl}>View the English version</a>
  </Banner>
)}
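At build time, isOutdated can be computed by comparing the hash recorded in the translated file against a fresh hash of its English source. A minimal sketch, assuming the comment format above (the regex and function name are illustrative, not from any framework):
import re
import hashlib
SOURCE_HASH = re.compile(r"source-hash=([0-9a-f]+)")
def is_outdated(translated_path, english_path):
    with open(translated_path, encoding="utf-8") as f:
        match = SOURCE_HASH.search(f.read())
    if match is None:
        return True  # no recorded hash: assume outdated
    with open(english_path, "rb") as f:
        current = hashlib.sha256(f.read()).hexdigest()
    # startswith tolerates truncated hash prefixes like the abc123 above
    return not current.startswith(match.group(1))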
Handling partial translations
You won't translate every page into every language immediately. You need a fallback strategy:
Option 1: Fall back to English. If a page doesn't exist in the user's language, show the English version. This is what Docusaurus does by default.
Option 2: Show what you have. Translate high-traffic pages first (getting started, API reference, common guides). Show a "not yet translated" notice on other pages with a link to the English version.
Option 3: Machine-translate everything, human-review priority pages. Use auto18n or similar to get a baseline translation of all pages, then have native speakers review the top 20% of pages by traffic. This gives users something in their language immediately while you improve quality incrementally.
Option 3 is what I recommend for most teams. A machine-translated page that's 90% accurate is better than no translation at all.
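Selecting that top 20% doesn't need to be a judgment call; pull it from an analytics export. A sketch (the CSV column names are illustrative):
import csv
def priority_pages(analytics_csv, fraction=0.2):
    # Sort pages by pageviews, descending, and keep the top fraction
    with open(analytics_csv, newline="") as f:
        rows = sorted(csv.DictReader(f), key=lambda r: int(r["pageviews"]), reverse=True)
    cutoff = max(1, int(len(rows) * fraction))
    return [row["path"] for row in rows[:cutoff]]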
SEO for multilingual docs
For translated docs to rank in local search results:
hreflang tags. Every page needs hreflang annotations pointing to all its language variants:
<link rel="alternate" hreflang="en" href="https://docs.example.com/en/getting-started" />
<link rel="alternate" hreflang="ja" href="https://docs.example.com/ja/getting-started" />
<link rel="alternate" hreflang="de" href="https://docs.example.com/de/getting-started" />
<link rel="alternate" hreflang="x-default" href="https://docs.example.com/en/getting-started" />
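Writing these tags by hand for every page doesn't scale; generate them from the locale list at build time. A sketch (how you inject the tags into the page head depends on your framework):
LOCALES = ["en", "ja", "de"]
BASE_URL = "https://docs.example.com"
def hreflang_tags(slug):
    tags = [
        f'<link rel="alternate" hreflang="{loc}" href="{BASE_URL}/{loc}/{slug}" />'
        for loc in LOCALES
    ]
    # x-default is what search engines fall back to when no listed locale matches
    tags.append(
        f'<link rel="alternate" hreflang="x-default" href="{BASE_URL}/en/{slug}" />'
    )
    return "\n".join(tags)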
Translated metadata. Page titles, meta descriptions, and OpenGraph tags should be translated, not just the body content.
Sitemap per locale. Generate a sitemap index that includes sitemaps for each language:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://docs.example.com/en/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://docs.example.com/ja/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://docs.example.com/de/sitemap.xml</loc></sitemap>
</sitemapindex>
Don't auto-redirect based on IP. Let Google and users access all language versions from any location. Use hreflang to signal the right version, but let users choose.
The realistic timeline
For a 100-page docs site adding 5 languages:
- Week 1: Set up the i18n framework (directory structure, build config, locale switcher)
- Week 2: Machine-translate all content, set up the automated pipeline
- Week 3-4: Human review of the top 20 pages per language
- Ongoing: Automated translation of new/changed content, periodic human review