Prompt Engineering for Translation: Getting LLMs to Translate Like Professionals
System prompts, formality controls, glossary injection, few-shot examples, and markup preservation techniques for high-quality LLM translation.
Using ChatGPT or Claude for translation by typing "translate this to Spanish" works surprisingly well for casual use. But it leaves a lot of quality on the table. With the right prompting, you can get output that approaches professional human translation — and for some content types, matches it.
Here's what actually moves the needle.
The baseline system prompt
A bare "translate to German" prompt produces generic, middle-of-the-road translations. A good system prompt sets the ground rules:
```
You are a professional translator specializing in software documentation.
Translate the following text from English to German.

Rules:
- Use formal register (Sie, not du)
- Preserve all Markdown formatting exactly
- Do not translate text inside code blocks or inline code
- Do not translate URLs or file paths
- Keep brand names and product names in English
- Use German technical terminology where established terms exist
  (e.g., "Datenbank" not "Database", but keep "API" as "API")
- Output only the translated text with no explanations or notes
```
Every sentence in that prompt prevents a specific class of error I've seen in production. Let's break them down.
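Wired into an actual request, the system prompt rides alongside the source text in a chat-style payload. A minimal sketch — the payload shape follows OpenAI-style chat APIs, but the `build_request` helper and the model name are illustrative, not a specific provider's SDK:

```python
SYSTEM_PROMPT = """You are a professional translator specializing in software documentation.
Translate the following text from English to German.
...
Output only the translated text with no explanations or notes."""


def build_request(source_text, model="gpt-4o"):
    """Assemble a chat-style request payload.

    The exact client call depends on your provider; only the
    messages structure is shown here."""
    return {
        "model": model,
        "temperature": 0,  # translation wants deterministic output
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": source_text},
        ],
    }
```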
Formality control
Many languages have formal and informal registers. German has du/Sie, French has tu/vous, Japanese has multiple levels of keigo (honorific speech), and Korean has seven speech levels.
Without explicit instruction, LLMs default to a mix — sometimes formal, sometimes informal, sometimes switching mid-document. Specify the register:
```
Use formal register throughout:
- German: Sie-Form
- French: vouvoiement
- Japanese: です/ます form (desu/masu)
- Korean: 합쇼체 (formal polite)
```
For user-facing product copy, formal is usually safer. For developer docs, informal often reads better. For marketing, it depends on the brand voice. The point is to decide explicitly rather than letting the model guess.
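In a pipeline, that decision works better as an explicit lookup than as ad-hoc prose in each prompt. A sketch — the table entries and function name are illustrative:

```python
# Map (language, register) to an explicit prompt instruction.
FORMALITY_INSTRUCTIONS = {
    ("de", "formal"): "Use formal register (Sie-Form) throughout.",
    ("de", "informal"): "Use informal register (du-Form) throughout.",
    ("fr", "formal"): "Use vouvoiement throughout.",
    ("ja", "formal"): "Use です/ます (desu/masu) form throughout.",
}


def formality_instruction(lang, register="formal"):
    """Return the register instruction to append to the system prompt.

    Raising on unknown combinations forces the decision to be made
    explicitly rather than letting the model guess."""
    try:
        return FORMALITY_INSTRUCTIONS[(lang, register)]
    except KeyError:
        raise ValueError(f"No formality rule defined for {lang}/{register}")
```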
Glossary injection
Technical products have terminology that must be translated consistently. "Workspace" should always be "Arbeitsbereich" in German, not sometimes "Arbeitsplatz" or "Arbeitsumgebung."
Inject a glossary in the system prompt:
```
Use this glossary for consistent terminology:
- workspace → Arbeitsbereich
- deployment → Bereitstellung
- pipeline → Pipeline (keep in English)
- repository → Repository (keep in English)
- pull request → Pull Request (keep in English)
- branch → Branch (keep in English)
- dashboard → Dashboard (keep in English)
- endpoint → Endpunkt
```
Keep the glossary under 50 terms — LLMs follow shorter, focused glossaries more reliably than exhaustive ones. Prioritize terms where inconsistency would confuse users or where the translation is non-obvious.
For larger glossaries, a two-pass approach works: first identify which glossary terms appear in the source text, then include only those in the prompt.
```python
relevant_terms = {k: v for k, v in glossary.items() if k.lower() in source_text.lower()}
```
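A fuller sketch of the two-pass idea, rendering only the matched terms into a prompt section. The function name and output format are illustrative:

```python
def build_glossary_section(glossary, source_text):
    """Return a prompt section containing only the glossary terms
    that actually appear in the source text."""
    lowered = source_text.lower()
    relevant = {k: v for k, v in glossary.items() if k.lower() in lowered}
    if not relevant:
        return ""
    lines = [f"- {src} → {dst}" for src, dst in relevant.items()]
    return "Use this glossary for consistent terminology:\n" + "\n".join(lines)
```

Note that substring matching is deliberately crude — "workspace" also matches "workspaces" — which is usually what you want here: better to include a glossary term unnecessarily than to miss an inflected form.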
Few-shot examples
If you have existing human translations that match your desired style, include them as examples:
```
Here are examples of the translation style to follow:

English: "Click Save to apply your changes."
German: "Klicken Sie auf Speichern, um Ihre Änderungen zu übernehmen."

English: "Your deployment is in progress. This usually takes 2-3 minutes."
German: "Ihre Bereitstellung wird durchgeführt. Dies dauert in der Regel 2–3 Minuten."

Now translate the following:
```
Three to five examples are enough. More than that and you're spending tokens without proportional quality gains. Choose examples that demonstrate:
- The correct formality level
- How to handle UI element names (bold, quotes, etc.)
- Technical terminology preferences
- Sentence length and structure preferences
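Assembling the few-shot block programmatically from a store of (source, translation) pairs keeps the style consistent across requests. A minimal sketch, with an illustrative function name:

```python
def build_few_shot_prompt(examples, source_text, max_examples=5):
    """Assemble a translation prompt from few-shot example pairs.

    `examples` is a list of (english, german) tuples drawn from
    existing human translations; only the first `max_examples`
    are used, since more adds tokens without proportional gain."""
    parts = ["Here are examples of the translation style to follow:", ""]
    for en, de in examples[:max_examples]:
        parts.append(f'English: "{en}"')
        parts.append(f'German: "{de}"')
        parts.append("")
    parts.append("Now translate the following:")
    parts.append(source_text)
    return "\n".join(parts)
```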
Markup preservation
This is the most common failure mode when using LLMs for translation in production. The model "helpfully" reformats your Markdown, changes HTML tags, or breaks JSX syntax.
Explicit instructions help but aren't sufficient alone:
```
CRITICAL: Preserve all markup exactly. This includes:
- Markdown characters: **, [], (), #, -, >
- Code fences and inline code
- HTML tags, template variables, etc.
```
For reliability, use a three-step process:

1. Pre-process: Replace markup with numbered placeholders
2. Translate: Send the cleaned text with placeholders
3. Post-process: Restore markup from placeholders
```python
import re

def protect_markup(text):
    placeholders = {}
    counter = [0]

    def replace(match):
        counter[0] += 1
        key = f"__PH{counter[0]}__"
        placeholders[key] = match.group(0)
        return key

    # Protect code blocks
    protected = re.sub(r'```[\s\S]*?```', replace, text)
    # Protect inline code
    protected = re.sub(r'`[^`]+`', replace, protected)

    # Protect link URLs, keeping the link text translatable
    def replace_url(match):
        counter[0] += 1
        key = f"__PH{counter[0]}__"
        placeholders[key] = match.group(2)
        return f'[{match.group(1)}]({key})'

    protected = re.sub(r'\[([^\]]+)\]\(([^)]+)\)', replace_url, protected)
    # Protect template variables
    protected = re.sub(r'\{[^}]+\}', replace, protected)

    return protected, placeholders


def restore_markup(translated, placeholders):
    for key, value in placeholders.items():
        translated = translated.replace(key, value)
    return translated
```
This is more robust than relying on the LLM to preserve markup, though it does remove context that might help translation (like knowing that {userName} is a person's name).
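The round trip looks like this, reduced here to template variables only:

```python
import re

def protect_vars(text):
    """Swap {template} variables for numbered placeholders."""
    placeholders = {}

    def replace(match):
        key = f"__PH{len(placeholders) + 1}__"
        placeholders[key] = match.group(0)
        return key

    return re.sub(r'\{[^}]+\}', replace, text), placeholders


def restore_vars(text, placeholders):
    """Swap the placeholders back in after translation."""
    for key, value in placeholders.items():
        text = text.replace(key, value)
    return text


protected, mapping = protect_vars("Hello {userName}, you have {count} messages.")
# protected == "Hello __PH1__, you have __PH2__ messages."
restored = restore_vars(protected, mapping)
# restored == "Hello {userName}, you have {count} messages."
```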
Temperature and sampling
For translation, use temperature 0 or very close to it. Translation has a relatively narrow range of "correct" outputs, and higher temperatures introduce unnecessary variation:
- Temperature 0: Deterministic, consistent output
- Temperature 0.1-0.3: Slight variation, sometimes produces more natural phrasing
- Temperature 0.7+: Too much variation, inconsistent terminology
Chunking strategy
LLMs have context windows, and translation uses roughly 2x the input length in tokens (input + output). For long documents, you need to chunk — but where you split matters enormously.
Bad: Split every N tokens regardless of content boundaries.
Good: Split on paragraph boundaries, and include the previous paragraph as context:

```python
def chunk_for_translation(paragraphs, max_tokens=2000):
    chunks = []
    current_chunk = []
    current_tokens = 0
    prev_tail = ""  # last paragraph of the previous chunk, used as context

    for para in paragraphs:
        para_tokens = count_tokens(para)  # e.g. via tiktoken
        if current_tokens + para_tokens > max_tokens and current_chunk:
            chunks.append({"context": prev_tail, "translate": current_chunk})
            prev_tail = current_chunk[-1]
            current_chunk = [para]
            current_tokens = para_tokens
        else:
            current_chunk.append(para)
            current_tokens += para_tokens

    if current_chunk:
        chunks.append({"context": prev_tail, "translate": current_chunk})

    return chunks
```
Post-translation validation
Even with good prompts, validate the output:
```python
import re

def validate_translation(source, translated):
    errors = []

    # Check placeholder preservation
    source_placeholders = re.findall(r'\{[^}]+\}', source)
    translated_placeholders = re.findall(r'\{[^}]+\}', translated)
    if set(source_placeholders) != set(translated_placeholders):
        errors.append("Placeholder mismatch")

    # Check markup balance
    for marker in ['**', '`', '[', ']']:
        if source.count(marker) != translated.count(marker):
            errors.append(f"Unbalanced markup: {marker}")

    # Check for untranslated content (heuristic)
    if len(translated) < len(source) * 0.5:
        errors.append("Translation suspiciously short")

    # Check for added explanations
    if "Note:" in translated and "Note:" not in source:
        errors.append("Model may have added explanatory text")

    return errors
```
If validation fails, retry with a more explicit prompt. If it fails repeatedly, flag for human review.
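That retry-then-escalate loop can be sketched as follows — the escalation message, function names, and return shape are illustrative:

```python
def translate_with_validation(source, translate_fn, validate_fn, max_attempts=3):
    """Retry translation with an increasingly explicit prompt;
    flag for human review if validation keeps failing."""
    extra = ""
    for _ in range(max_attempts):
        translated = translate_fn(source, extra_instructions=extra)
        errors = validate_fn(source, translated)
        if not errors:
            return {"text": translated, "needs_review": False}
        # Escalate: repeat the preservation rules more forcefully
        extra = (
            "CRITICAL: preserve every {placeholder} and every markup "
            "character exactly as in the source."
        )
    return {"text": translated, "needs_review": True}
```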
Putting it together
Services like auto18n handle most of this plumbing internally — glossary management, markup protection, formality settings, quality validation. But if you're building your own translation pipeline on raw LLM APIs, these are the techniques that separate "passable" from "professional-grade" output.
The single highest-ROI technique is the glossary. Consistent terminology is the number one thing professional translators enforce, and it's the easiest thing to automate with prompt engineering. Start there, add formality controls, then layer on few-shot examples if you need tighter style control.