A production system prompt has five parts in this order: role (who the model is), capabilities (what it can do), constraints (what it must not do), output format (the shape of the response), and refusal policy (what to say when the user asks for something outside the role). Each part has a job. Skipping any of them leaves the model to guess, and the guess is rarely what you want. I'll walk through all five with before/after examples, then cover the structure that maximises prompt-cache hits.
The cliché advice "be specific" isn't actionable. Specific about what? The five-part structure is what to be specific about. It works across Claude, GPT, Gemini, and local models because it covers the same gaps every model has: it doesn't know who it is, what it's allowed to do, what it must avoid, what shape the answer takes, or what to say when asked to go off-script.
Jump to:
- Part 1: Role
- Part 2: Capabilities
- Part 3: Constraints
- Part 4: Output format
- Part 5: Refusal policy
- Full example: before and after
- Structuring for prompt-cache hits
- FAQ
Part 1: Role
The role is one or two sentences telling the model who it is. Not "you are a helpful assistant" — that's the default and produces generic output. Something specific.
Bad: You are a helpful assistant.
Good: You are a senior backend engineer specialising in MySQL optimisation. You have 15 years of production experience and tend toward pragmatic over theoretical solutions.
The specifics shape every subsequent generation. A "senior backend engineer" writes different code than a "junior developer learning Rails". A "compliance officer" answers questions about user data differently than a "marketing copywriter". Pick the role that matches the actual job.
Part 2: Capabilities
List the things the model is supposed to do. This sounds redundant, since the model already knows what it can do, but the list tells it which tasks to apply those abilities to in your product.
Capabilities:
- Review SQL queries for performance issues
- Suggest index changes
- Explain EXPLAIN output line by line
- Compare MySQL versions (5.7, 8.0, 8.4) when version-specific behaviour matters
Capabilities act as soft routing. When a user asks "should this be an INDEX or a UNIQUE INDEX?", the model knows it's allowed to give a strong opinion because "suggest index changes" is in the capability list. When they ask "rewrite my Python code", the model is more likely to redirect because Python isn't in the capability list.
Part 3: Constraints
The opposite of capabilities — what the model must not do. Constraints are where you encode the rules that matter for your product:
Constraints:
- Never generate SQL that drops a production table
- Never execute commands. Only suggest them, then let the user run them.
- Never assume the user is on a specific MySQL version unless they say so. Ask.
- Never write SQL longer than 50 lines without proposing a refactor first.
Constraints prevent the model from being helpful in ways that are dangerous. They also prevent specific kinds of unhelpfulness: without the 50-line rule, you sometimes get walls of SQL that the user can't review.
Write constraints as imperatives ("Never X"), not preferences ("Avoid X"). Imperatives leave less room for interpretation and tend to be followed more reliably.
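One way to enforce the imperative style is a small lint pass over the constraint list before the prompt ships. The sketch below is illustrative only: `findSoftConstraints` and the `SOFT_OPENERS` word list are hypothetical names, and the list of soft phrasings is an assumption you would tune for your own prompts.

```typescript
// Hypothetical helper: flags constraint lines phrased as preferences
// rather than imperatives. The opener list is an illustrative assumption.
const SOFT_OPENERS = ["avoid", "try to", "try not to", "prefer", "ideally", "if possible"];

function findSoftConstraints(constraints: string[]): string[] {
  return constraints.filter((line) => {
    // Strip a leading list marker ("- " or "* ") before comparing.
    const normalised = line.trim().replace(/^[-*]\s*/, "").toLowerCase();
    return SOFT_OPENERS.some((opener) => normalised.startsWith(opener));
  });
}

const draft = [
  "- Never execute commands. Only suggest them, then let the user run them.",
  "- Avoid assuming a specific MySQL version.",
  "- Try to keep SQL under 50 lines.",
];

// Lists the two lines that should be rewritten as "Never ..." rules.
const soft = findSoftConstraints(draft);
console.log(soft);
```

Running a check like this in CI keeps soft wording from creeping back in as the prompt is edited over time.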
Part 4: Output format
The shape of the response. This is the part most prompts skip, and it's the part that has the biggest impact on whether the model output is usable in your app.
For structured-output use cases, this overlaps with schema-constrained JSON (covered in How to Get Reliable JSON from an LLM). For free-text use cases:
Output format:
- Start with a one-line summary in italics.
- Show the recommended query in a SQL code block.
- Add 2-4 bullet points explaining why this is better than the original.
- If you suggest an index, show the CREATE INDEX statement separately.
- Use Markdown, not HTML.
The result: every response has the same shape, which makes the UI rendering predictable and the user's mental model consistent.
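A fixed output shape is also something you can check in code before rendering. The following is a minimal sketch, assuming the format rules listed above; `checkOutputFormat` is a hypothetical helper, its regexes are rough heuristics, and it covers only a subset of the rules (it does not check for a separate CREATE INDEX block).

```typescript
interface FormatCheck {
  ok: boolean;
  problems: string[];
}

// Built at runtime so this example contains no literal code fence.
const FENCE = "`".repeat(3);

function checkOutputFormat(response: string): FormatCheck {
  const problems: string[] = [];
  const lines = response.trim().split("\n");

  // Rule 1: the first line is a one-line summary in italics (*...* or _..._).
  if (!/^(\*[^*].*\*|_[^_].*_)$/.test((lines[0] ?? "").trim())) {
    problems.push("first line is not an italic one-line summary");
  }

  // Rule 2: the recommended query appears in a SQL code block.
  if (!new RegExp(FENCE + "sql\\n[\\s\\S]*?" + FENCE).test(response)) {
    problems.push("no SQL code block found");
  }

  // Rule 3: 2-4 bullet points, counted outside code blocks.
  const withoutCode = response.replace(new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g"), "");
  const bullets = withoutCode.split("\n").filter((l) => /^\s*[-*]\s+\S/.test(l));
  if (bullets.length < 2 || bullets.length > 4) {
    problems.push(`expected 2-4 bullets, found ${bullets.length}`);
  }

  // Rule 4: Markdown, not HTML.
  if (/<\/?[a-z][^>]*>/i.test(withoutCode)) {
    problems.push("contains HTML tags");
  }

  return { ok: problems.length === 0, problems };
}

const sample = [
  "*Add a composite index so the filter and sort use one index.*",
  "",
  FENCE + "sql",
  "SELECT id, total FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 20;",
  FENCE,
  "",
  "- Filters and sorts from a single index",
  "- Avoids a filesort on large tables",
].join("\n");

console.log(checkOutputFormat(sample));
```

A failed check can trigger a retry or a fallback renderer instead of showing a malformed answer to the user.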
Part 5: Refusal policy
This part defines what happens when the user asks for something outside the role's scope. Without it, the model tends to do one of three things:
- Tries to help anyway and produces low-quality output outside its area
- Refuses awkwardly with a generic "I can't help with that"
- Goes off-topic for the rest of the conversation
A good refusal policy gives the model a specific way to redirect:
Refusal policy:
- If the user asks about a topic outside MySQL optimisation, briefly acknowledge it, then ask whether they want to refocus on the SQL question or end the session.
- If the user asks you to run a destructive command, decline and explain the risk in one sentence.
- If the user asks for legal or financial advice, redirect them to a qualified professional.
Concrete redirects beat generic refusals. The model now has a template for how to handle the case rather than improvising.
Full example: before and after
Before (typical first-draft system prompt):
You are a helpful AI assistant that helps users with their SQL queries.
Be polite and explain your reasoning.
After (five-part structured):
ROLE
You are a senior MySQL DBA with 15 years of production experience. You tend toward
pragmatic, indexable solutions over clever ones.
CAPABILITIES
- Review SQL queries for performance issues
- Suggest schema and index changes
- Explain EXPLAIN output line by line
- Identify when a query needs to become two queries
- Call out version-specific behaviour for MySQL 5.7, 8.0, and 8.4
CONSTRAINTS
- Never run commands; suggest them and let the user execute
- Never generate destructive SQL (DROP, TRUNCATE, or DELETE without WHERE) without an
explicit confirmation step
- Never assume the MySQL version; ask if it's not in the conversation
- Keep individual SQL outputs under 50 lines; refactor or break into stages if longer
OUTPUT FORMAT
Start with a one-sentence summary. Then the recommended query in a SQL code block.
Then 2-4 bullets explaining why. If an index would help, show the CREATE INDEX in
a separate code block.
REFUSAL POLICY
- If the user asks about non-MySQL topics, acknowledge briefly and offer to refocus
- If the user asks to run a destructive operation, decline and explain the risk
- If the user is panicking about a production issue, give the safest fix first and
the optimal fix second
The "after" version is longer, but every part has a job, and the model's output becomes far more predictable. It's also cacheable: the entire prompt is stable across calls, so with prompt caching the repeated prefix is typically billed at a reduced rate after the first call rather than at full price (covered in How to Cut LLM API Costs with Prompt Caching).
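If the five parts live in code rather than in one long string, the structure can be enforced mechanically. The sketch below is one possible approach, not a prescribed one: `PromptParts`, `bulletList`, and `buildSystemPrompt` are hypothetical names, and the section headings simply mirror the "after" example.

```typescript
interface PromptParts {
  role: string;
  capabilities: string[];
  constraints: string[];
  outputFormat: string;
  refusalPolicy: string[];
}

function bulletList(items: string[]): string {
  return items.map((item) => `- ${item}`).join("\n");
}

// Renders the five sections in a fixed order and rejects empty sections,
// so a part cannot be skipped by accident.
function buildSystemPrompt(parts: PromptParts): string {
  const sections: Array<[string, string]> = [
    ["ROLE", parts.role],
    ["CAPABILITIES", bulletList(parts.capabilities)],
    ["CONSTRAINTS", bulletList(parts.constraints)],
    ["OUTPUT FORMAT", parts.outputFormat],
    ["REFUSAL POLICY", bulletList(parts.refusalPolicy)],
  ];
  for (const [heading, body] of sections) {
    if (body.trim() === "") {
      throw new Error(`System prompt section "${heading}" is empty`);
    }
  }
  return sections.map(([heading, body]) => `${heading}\n${body.trim()}`).join("\n\n");
}

const dbaParts: PromptParts = {
  role: "You are a senior MySQL DBA with 15 years of production experience.",
  capabilities: ["Review SQL queries for performance issues", "Suggest schema and index changes"],
  constraints: ["Never run commands; suggest them and let the user execute"],
  outputFormat: "Start with a one-sentence summary. Then the recommended query in a SQL code block.",
  refusalPolicy: ["If the user asks about non-MySQL topics, acknowledge briefly and offer to refocus"],
};

const dbaPrompt = buildSystemPrompt(dbaParts);
console.log(dbaPrompt);
```

Because the function is deterministic, the same parts always produce a byte-identical prompt, which is what prompt caching needs.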
Structuring for prompt-cache hits
Put the system prompt first, in full. Put dynamic content (user message, retrieved RAG context, current timestamp) outside the system prompt, in the user message. This way the system prompt is a stable prefix that prompt caching can recognise.
{
  system: SYSTEM_PROMPT, // The whole 200-line stable block, cached
  messages: [
    { role: "user", content: `User question: ${userMessage}\n\nRelevant context: ${ragContext}` }
  ]
}
Not:
{
  system: `${SYSTEM_PROMPT}\n\nCurrent time: ${new Date()}`, // Dynamic, kills the cache
  messages: [{ role: "user", content: userMessage }]
}
The current time, request ID, user ID, anything that varies per call has to live in the user message, not the system prompt. Otherwise no two calls share a prefix and you pay full price every time.
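The rule above can be captured in a small request builder that keeps every per-call value out of the system field. This is a sketch under assumptions: `ChatRequest` and `buildRequest` are hypothetical names, the `SYSTEM_PROMPT` value is a short stand-in for the real block, and the object shape follows the snippets above rather than any specific SDK.

```typescript
interface ChatRequest {
  system: string;
  messages: { role: "user"; content: string }[];
}

// Stand-in for the full, stable system prompt.
const SYSTEM_PROMPT = "ROLE\nYou are a senior MySQL DBA with 15 years of production experience.";

// Everything that varies per call (time, retrieved context, question)
// goes into the user message, so the system prompt stays a stable prefix.
function buildRequest(userMessage: string, ragContext: string, now: Date): ChatRequest {
  const content = [
    `Current time: ${now.toISOString()}`,
    `Relevant context: ${ragContext}`,
    `User question: ${userMessage}`,
  ].join("\n\n");
  return { system: SYSTEM_PROMPT, messages: [{ role: "user", content }] };
}

const first = buildRequest("Why is this query slow?", "orders has 40M rows", new Date("2024-01-01T00:00:00Z"));
const second = buildRequest("Should I add an index?", "users has 2M rows", new Date("2024-06-01T12:30:00Z"));

// The system prompt is identical across calls; only the user message differs.
console.log(first.system === second.system); // true
```

A unit test asserting that equality is a cheap guard against someone later interpolating a timestamp or user ID into the system prompt.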
What to do next
For the techniques that compound with a well-structured system prompt:
- How to Get Reliable JSON from an LLM — the output-format part of the system prompt overlaps with structured outputs. Use both for production reliability.
- How to Stop an LLM from Hallucinating — covers the prompt patterns that explicitly reduce hallucination on top of the role definition.
For the LLM-tier and cost decisions a good system prompt enables:
- How to Choose Between Claude Haiku, Sonnet, and Opus — a strong system prompt is what lets you drop a workload from Sonnet to Haiku without quality loss.
External reference: the Anthropic prompt engineering guide covers the patterns Anthropic recommends. OpenAI's prompt engineering documentation covers GPT-specific patterns.





