Six techniques that actually reduce LLM hallucination in 2026: ground the model with retrieved context, require inline citations, use tools for verifiable facts, constrain to a structured output, explicitly permit "I don't know", and run an LLM-as-judge to flag suspect responses. None of these eliminates hallucination, which is still unsolved at the model architecture level, but stacked together they cut the rate substantially. On a real customer-support pipeline, layering them correctly took the visible hallucination rate from ~12% to under 1%.
The reason "tell it not to hallucinate" doesn't work: the model has no reliable internal mechanism that distinguishes "I know this for certain" from "this is a plausible-sounding string of tokens". Telling it to be accurate adds about as much value as telling someone not to lie. What works is changing the environment so the model has access to ground truth (grounding), is forced to expose its sources (citations), and has explicit permission to say it doesn't know.
Jump to:
- Technique 1: Ground with retrieved context (RAG)
- Technique 2: Require inline citations
- Technique 3: Use tools for verifiable facts
- Technique 4: Constrain to a structured output
- Technique 5: Explicitly permit "I don't know"
- Technique 6: Run an LLM-as-judge
- How to stack them in production
- FAQ
Technique 1: Ground with retrieved context (RAG)
The single biggest reduction comes from giving the model the actual source documents it should be answering from, rather than letting it generate from its training-data memory. This is Retrieval-Augmented Generation, and in 2026 it is the default architecture for any answer that depends on facts that change (your company's docs, current events, user data).
The pattern:
- Retrieve relevant documents (covered in How to Build RAG with Embeddings and Vector Search)
- Include them in the prompt with clear markers
- Instruct the model to answer only from the retrieved context
You are answering questions about TechEarl's product. Use ONLY the documentation
below to answer. If the answer is not in the documentation, say so.
<documentation>
{{retrieved_chunks}}
</documentation>
User question: {{question}}
The "ONLY the documentation below" instruction is doing most of the work. Without it, the model freely mixes retrieved context with its training-data memory and you can't tell which is which. With it, hallucination drops because the model has explicit permission to refuse rather than guess.
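As a rough illustration, the prompt assembly could look like the sketch below. It assumes retrieval has already produced a list of chunks; the `Chunk` class, the `<chunk id="...">` wrapper, and the function name `build_grounded_prompt` are hypothetical names chosen for this example, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # hypothetical identifier, e.g. "doc-123"
    text: str


# Mirrors the prompt shown above; the placeholders are filled at request time.
PROMPT_TEMPLATE = (
    "You are answering questions about TechEarl's product. Use ONLY the documentation\n"
    "below to answer. If the answer is not in the documentation, say so.\n"
    "<documentation>\n"
    "{documentation}\n"
    "</documentation>\n"
    "User question: {question}"
)


def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Wrap each retrieved chunk in a labelled marker and fill the template."""
    rendered = "\n".join(
        f'<chunk id="{c.doc_id}">\n{c.text.strip()}\n</chunk>' for c in chunks
    )
    return PROMPT_TEMPLATE.format(documentation=rendered, question=question.strip())
```

Labelling each chunk with an id is a design choice made here so that a later citation step has something stable to point at.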
Technique 2: Require inline citations
Make the model show its work. After every claim in the answer, require an inline citation pointing to the chunk that supports it.
For every factual statement in your answer, add a citation in square brackets
referencing the source: [source: doc-123, line 45-60]. Do not add citations for
opinions or general knowledge.
If a claim cannot be cited, do not make the claim.
Two effects:
- You can verify post-hoc. If a claim has a citation, you can pull up the cited source and confirm. If a claim has no citation in a system that requires them, you know to question it.
- The model self-corrects. When the prompt requires a citation for every claim, the model tends to generate more carefully, and fewer "this sounds right" claims slip through because each claim has to be tied to a source.
The citation format doesn't matter as long as it's parseable. [source: ...] works. <cite>...</cite> works. Footnote-style [1] with a footnote list works. Pick one and stick with it.
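A minimal sketch of the post-hoc check, assuming the `[source: doc-123, line 45-60]` format from the prompt above and assuming each citation appears before the sentence's closing punctuation. The function name and the returned keys are hypothetical, and the sentence splitter is deliberately naive.

```python
import re

# Matches [source: doc-123] and [source: doc-123, line 45-60].
CITATION_RE = re.compile(r"\[source:\s*([\w-]+)(?:,\s*lines?\s*(\d+)\s*-\s*(\d+))?\]")


def check_citations(answer: str, retrieved_ids: set[str]) -> dict[str, list[str]]:
    """Flag citations to documents that were never retrieved, and uncited sentences."""
    cited = [m.group(1) for m in CITATION_RE.finditer(answer)]
    unknown = sorted({doc_id for doc_id in cited if doc_id not in retrieved_ids})

    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    uncited = [s for s in sentences if not CITATION_RE.search(s)]

    return {"unknown_sources": unknown, "uncited_sentences": uncited}
```

A citation to a document that was not in the retrieved set is a strong signal of an invented source. An uncited sentence is weaker evidence, since the prompt allows opinions and general knowledge to go uncited, so it is better treated as something to review than as an automatic failure.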
Technique 3: Use tools for verifiable facts
For facts the model is repeatedly wrong about — current date, current weather, your specific database row counts, today's stock price — give it a tool to look up the answer instead of generating it.
You have access to a tool: get_current_time(timezone)
Call this whenever you need to reason about the current time. Never guess.
The model is bad at "what's today's date" because its training data stops at a cutoff and carries no information about the current date. A tool call returns the actual answer, and the model uses that answer for the rest of the response. The same pattern works for currency conversions, user record lookups, document retrieval, and anything else that has a deterministic answer.
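On the application side, the tool is just a function plus a dispatcher that runs whatever the model asked for. The sketch below is one possible shape, using only the Python standard library; the `TOOLS` registry and `dispatch_tool_call` are hypothetical names, and the way your model API delivers the tool name and arguments is an assumption you would adapt to your provider.

```python
import datetime as dt
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def get_current_time(timezone: str = "UTC") -> str:
    """Return the current time in an IANA timezone as an ISO 8601 string."""
    if timezone.upper() == "UTC":
        tz = dt.timezone.utc  # works even without a system timezone database
    else:
        try:
            tz = ZoneInfo(timezone)
        except (ZoneInfoNotFoundError, ValueError):
            return f"error: unknown timezone {timezone!r}"
    return dt.datetime.now(tz).isoformat(timespec="seconds")


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_current_time": get_current_time}


def dispatch_tool_call(name: str, arguments: dict) -> str:
    """Execute a tool the model requested and return text to send back to it."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        return tool(**arguments)
    except TypeError as exc:  # wrong or missing argument names
        return f"error: bad arguments for {name}: {exc}"
```

Returning errors as text rather than raising lets the model see that the lookup failed, which gives it a chance to retry or say it could not determine the answer.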
This is the foundation of agentic workflows — covered in How to Build an LLM Agent with Tool Use.
Technique 4: Constrain to a structured output
A free-text response is the maximum surface area for hallucination. A constrained structured output dramatically reduces the room for "creative" answers because the model has to fill specific fields with specific types.
Compare:
- "Summarise this customer's account status." → free-form, model can invent details.
- "Return this JSON: { "status": "active|inactive|past_due", "last_login_date": "YYYY-MM-DD or null", "open_tickets": integer }" → constrained: the model must pick from the enum, must produce a real date or null, and must produce an integer.
The enum constraint alone (status must be one of three values) cuts the "creative liberty" hallucination rate hugely. The whole pattern is covered in How to Get Reliable JSON from an LLM.
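The constraint only helps if the application actually enforces it. Here is a minimal validation sketch for the JSON shape above, using only the standard library; the function name `validate_account_status` is hypothetical, and rejecting extra fields is an assumption of this example rather than something the schema states.

```python
import json
import re
from datetime import date

ALLOWED_STATUS = ("active", "inactive", "past_due")
EXPECTED_FIELDS = {"status", "last_login_date", "open_tickets"}
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_account_status(raw: str) -> dict:
    """Parse the model's JSON and raise ValueError if it leaves the declared shape."""
    data = json.loads(raw)  # malformed JSON raises JSONDecodeError, a ValueError
    if not isinstance(data, dict) or set(data) != EXPECTED_FIELDS:
        raise ValueError("expected exactly the fields: " + ", ".join(sorted(EXPECTED_FIELDS)))
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"status outside enum: {data['status']!r}")
    last_login = data["last_login_date"]
    if last_login is not None:
        if not isinstance(last_login, str) or not DATE_RE.fullmatch(last_login):
            raise ValueError("last_login_date must be YYYY-MM-DD or null")
        date.fromisoformat(last_login)  # rejects impossible dates such as 2026-02-30
    tickets = data["open_tickets"]
    if isinstance(tickets, bool) or not isinstance(tickets, int):
        raise ValueError("open_tickets must be an integer")
    return data
```

A `ValueError` here is a natural trigger for a retry. Note that validation checks the shape, not the truth: a well-formed `open_tickets` value can still be wrong, which is why this technique is stacked with grounding rather than used alone.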
Technique 5: Explicitly permit "I don't know"
Models hallucinate more when they perceive they're "supposed to" answer. The default training pushes them toward helpfulness; saying "I don't know" feels unhelpful. Tell them explicitly that "I don't know" is the right answer when they don't have the information.
If the user asks something the documentation doesn't cover, say so directly. Use
phrasing like: "I don't see that covered in the available documentation."
It is better to admit a knowledge gap than to guess. Guessing is the failure mode
we are working to eliminate.
This single instruction, added to a RAG system prompt, removes a meaningful share of the residual hallucinations. Models that "want to help" stop trying to help when helping would mean inventing, and they start saying "I don't know" in the cases where they should have all along.
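Once the model is allowed to admit a gap, the application can detect those admissions and route them somewhere useful, for example to a human or to a log of documentation gaps. A small sketch follows; the marker phrases in `IDK_PATTERN` are assumptions based on the wording suggested in the prompt above, and `is_knowledge_gap` is a hypothetical helper name.

```python
import re

# Assumed marker phrases; extend these to match the wording your prompt requests.
IDK_PATTERN = re.compile(
    r"i don['’]t (?:see that covered|know)"
    r"|not covered in the (?:available )?documentation",
    re.IGNORECASE,
)


def is_knowledge_gap(answer: str) -> bool:
    """Return True when the answer is an explicit admission of missing information."""
    return bool(IDK_PATTERN.search(answer))


def knowledge_gap_share(answers: list[str]) -> float:
    """Fraction of answers that admit a gap; useful for spotting over-refusal."""
    if not answers:
        return 0.0
    return sum(is_knowledge_gap(a) for a in answers) / len(answers)
```

Tracking the share of gap admissions over time is one way to notice if the instruction has pushed the model into refusing questions the documentation does cover.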
Technique 6: Run an LLM-as-judge
For the residual hallucinations that get past the first five techniques, layer in a separate LLM call that evaluates whether the answer is grounded in the source material.
You are a fact-checker. Given a question, a model answer, and the source
documentation the answer should be based on, identify any claims in the answer
that are NOT supported by the documentation. List them in JSON:
{ "unsupported_claims": [...], "supported_claims": [...] }
Run this on every customer-facing answer before you ship it. If there are unsupported claims, either regenerate the answer or fall back to an "I'm not sure, can a human help?" template.
The LLM-as-judge is itself an LLM and can hallucinate, but its errors are largely independent of the answer-generation LLM's. The probability that both produce the same wrong claim is much lower than the probability that either one errs alone, provided the two models do not share the same blind spots.
For production-scale judging, Sonnet is a good default tier (covered in How to Choose Between Claude Haiku, Sonnet, and Opus). Don't use the same model for judging as for answer generation; you want some architectural diversity.
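The gating logic around the judge can be sketched as follows. The judge itself is passed in as a plain callable (`call_judge`, a hypothetical name) that takes a prompt string and returns the judge model's raw text, so the example makes no assumption about which SDK you use. The layout of the question, answer, and documentation in the prompt is also an assumption.

```python
import json
from typing import Callable

JUDGE_PROMPT = (
    "You are a fact-checker. Given a question, a model answer, and the source\n"
    "documentation the answer should be based on, identify any claims in the answer\n"
    "that are NOT supported by the documentation. List them in JSON:\n"
    '{ "unsupported_claims": [...], "supported_claims": [...] }'
)

FALLBACK = "I'm not sure, can a human help?"


def review_answer(
    question: str,
    answer: str,
    documentation: str,
    call_judge: Callable[[str], str],
) -> tuple[str, list[str]]:
    """Return the text to ship (answer or fallback) and the unsupported claims."""
    prompt = (
        f"{JUDGE_PROMPT}\n\nQuestion: {question}\n\n"
        f"Answer: {answer}\n\nDocumentation:\n{documentation}"
    )
    try:
        verdict = json.loads(call_judge(prompt))
        unsupported = verdict["unsupported_claims"]
        if not isinstance(unsupported, list):
            raise TypeError("unsupported_claims must be a list")
    except (ValueError, KeyError, TypeError):
        # An unparseable verdict is treated as a failed check, never as a pass.
        return FALLBACK, ["judge output could not be parsed"]
    return (FALLBACK if unsupported else answer), unsupported
```

This version falls back immediately; regenerating the answer once before falling back is an equally reasonable choice. Failing closed on a malformed verdict matters because the judge can produce bad output too.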
How to stack them in production
The order to add these to an existing prompt:
- RAG first — biggest single drop in hallucination rate. Get the model reading real documents.
- Explicit "I don't know" permission — one-line addition, large effect.
- Citation requirements — two-line addition, lets you verify and forces self-correction.
- Structured outputs where applicable — works best for extraction and classification, less so for free-form QA.
- Tool use for verifiable facts — when you have a specific kind of fact the model gets wrong.
- LLM-as-judge — only for high-stakes outputs where the cost of the second LLM call is justified.
Stop adding techniques when the residual hallucination rate is acceptable for your use case. For a casual chatbot, 1-2% might be fine. For a medical or legal summary, you want closer to 0.1%, with humans in the loop on top of that.
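Deciding when to stop requires a measured rate. The sketch below assumes you have a sample of answers that reviewers have labelled as hallucinated or not; the `TARGETS` keys are hypothetical names, and the thresholds are the figures from the paragraph above.

```python
# Thresholds taken from the text above; the keys are hypothetical labels.
TARGETS = {"casual_chatbot": 0.02, "medical_or_legal": 0.001}


def hallucination_rate(labels: list[bool]) -> float:
    """Fraction of reviewed answers labelled as containing a hallucination."""
    if not labels:
        raise ValueError("need at least one labelled answer")
    return sum(labels) / len(labels)


def meets_target(labels: list[bool], use_case: str) -> bool:
    """True when the measured rate is at or below the target for the use case."""
    return hallucination_rate(labels) <= TARGETS[use_case]
```

Sample size matters here: with n labelled answers the smallest nonzero rate you can observe is 1/n, so a 0.1% target cannot be confirmed from a few hundred examples.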
What to do next
For the techniques that pair with anti-hallucination work:
- How to Build RAG with Embeddings and Vector Search — the retrieval layer that powers Technique 1.
- How to Build an LLM Agent with Tool Use — the foundation for Technique 3 (verifiable-fact tools).
- How to Write LLM Evals That Catch Regressions — the measurement layer to see if your anti-hallucination stack is actually working.
External reference: the Anthropic prompting guide on reducing hallucination covers the citation and grounding patterns from Anthropic's perspective.





