How to Write an Effective System Prompt for an LLM

A production system prompt has five parts in this order: role (who the model is), capabilities (what it can do), constraints (what it must not do), output format (the shape of the response), and refusal policy (what to say when the user asks for something outside the role). Each part has a job. Skipping any of them leaves the model to guess, and the guess is rarely what you want. I'll walk through all five with before/after examples, then cover the structure that maximises prompt-cache hits.

The cliché advice "be specific" isn't actionable. Specific about what? The five-part structure is what to be specific about. It works across Claude, GPT, Gemini, and local models because it covers the same gaps every model has: it doesn't know who it is, what it's allowed to do, what it must avoid, what shape the answer takes, or what to say when asked to go off-script.

Jump to:

Part 1: Role
Part 2: Capabilities
Part 3: Constraints
Part 4: Output format
Part 5: Refusal policy
Full example: before and after
Structuring for prompt-cache hits
What to do next
FAQ

Part 1: Role

The role is one or two sentences telling the model who it is. Make it specific: "you are a helpful assistant" is effectively the default, and it produces generic output.

Bad: You are a helpful assistant.

Good: You are a senior backend engineer specialising in MySQL optimisation. You have 15 years of production experience and tend toward pragmatic solutions over theoretical ones.

The specifics shape every subsequent generation. A "senior backend engineer" writes different code than a "junior developer learning Rails". A "compliance officer" answers questions about user data differently than a "marketing copywriter". Pick the role that matches the actual job.
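If you keep prompts in a repo, a tiny lint can catch the generic default before it ships. The sketch below is hypothetical: `lintRole` and its regex are my own names, not from any library. It checks only the two rules from this section. The role must not be "helpful assistant", and it should be no more than two sentences.

```javascript
// Hypothetical lint for the Role section. It checks two rules only:
// 1) the role is not the generic "helpful assistant" default, and
// 2) the role is one or two sentences long.
const GENERIC_ROLE = /\bhelpful\s+(ai\s+)?assistant\b/i;

function lintRole(role) {
  const warnings = [];
  if (GENERIC_ROLE.test(role)) {
    warnings.push("Generic role: name a specific job instead of 'helpful assistant'.");
  }
  // Rough sentence count: split on ., ! or ? followed by whitespace or end of text.
  // Version numbers like "8.0" are not split because no whitespace follows the dot.
  const sentences = role
    .trim()
    .split(/[.!?]+(?:\s+|$)/)
    .filter((s) => s.length > 0);
  if (sentences.length > 2) {
    warnings.push(`Role is ${sentences.length} sentences; keep it to one or two.`);
  }
  return warnings;
}

console.log(lintRole("You are a helpful assistant.")); // one warning: generic role
console.log(
  lintRole(
    "You are a senior backend engineer specialising in MySQL optimisation. " +
      "You have 15 years of production experience."
  )
); // []
```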

Part 2: Capabilities

List the things the model is supposed to be able to do. This sounds redundant, since the model already knows what it can do, but the list is where you tell it what to apply those abilities to.

```
Capabilities:
- Review SQL queries for performance issues
- Suggest index changes
- Explain EXPLAIN output line by line
- Compare MySQL versions (5.7, 8.0, 8.4) when version-specific behaviour matters
```

Capabilities act as soft routing. When a user asks "should this be an INDEX or a UNIQUE INDEX?", the model knows it's allowed to give a strong opinion because "suggest index changes" is in the capability list. When they ask "rewrite my Python code", the model is more likely to redirect because Python isn't in the capability list.

Part 3: Constraints

The opposite of capabilities — what the model must not do. Constraints are where you encode the rules that matter for your product:

```
Constraints:
- Never generate SQL that drops a production table
- Never execute commands. Only suggest them, then let the user run them.
- Never assume the user is on a specific MySQL version unless they say so. Ask.
- Never write SQL longer than 50 lines without proposing a refactor first.
```

Constraints stop the model from being helpful in ways that are dangerous. They also prevent specific kinds of unhelpfulness. Without the 50-line rule, for example, you sometimes get walls of SQL that the user can't realistically review.

Write constraints as hard rules ("Never X") rather than soft preferences ("Avoid X", "Try not to X"). The absolute form is followed more reliably.
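This rule is easy to check mechanically. The helper below is hypothetical, not a library API. It scans a constraints block and returns the bullet lines that start with a soft-preference phrase, so you can rewrite them as "Never ..." rules. The phrase list is an assumption; extend it with whatever hedges your team tends to write.

```javascript
// Hypothetical check: return constraint bullets phrased as soft preferences
// instead of hard rules. The phrase list is an assumption; extend as needed.
const SOFT_START = /^-\s*(avoid|try\s+(not\s+)?to|prefer|ideally|if possible)\b/i;

function findSoftConstraints(constraintsBlock) {
  return constraintsBlock
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => SOFT_START.test(line)); // only "- ..." bullet lines can match
}

const draft = [
  "Constraints:",
  "- Never execute commands. Only suggest them, then let the user run them.",
  "- Avoid assuming a MySQL version.",
  "- Try not to write SQL longer than 50 lines.",
].join("\n");

// Logs the two soft-preference lines ("Avoid ..." and "Try not to ...").
console.log(findSoftConstraints(draft));
```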

Part 4: Output format

The shape of the response. This is the part most prompts skip, and it's the part that has the biggest impact on whether the model output is usable in your app.

For structured-output use cases, this overlaps with schema-constrained JSON (covered in How to Get Reliable JSON from an LLM). For free-text use cases:

```
Output format:
- Start with a one-line summary in italics.
- Show the recommended query in a SQL code block.
- Add 2-4 bullet points explaining why this is better than the original.
- If you suggest an index, show the CREATE INDEX statement separately.
- Use Markdown, not HTML.
```

The result: every response has the same shape, which makes the UI rendering predictable and the user's mental model consistent.
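A predictable shape also means you can check it before rendering. Here's a minimal sketch of that check for the format above. It assumes the model replies in Markdown. `checkResponseShape` is a hypothetical name, and the function validates structure only, not whether the SQL advice is any good.

```javascript
// Hypothetical pre-render check for the free-text output format above.
// It validates structure only, not whether the SQL advice is correct.
const FENCE = "`".repeat(3); // Markdown code-fence marker

function checkResponseShape(markdown) {
  const problems = [];
  const lines = markdown.trim().split("\n");

  // Rule: first line is an italic one-liner (*text* or _text_, not **bold**).
  if (!/^(\*[^*].*\*|_[^_].*_)$/.test(lines[0].trim())) {
    problems.push("First line should be an italic one-line summary.");
  }

  // Walk the lines once, ignoring anything inside fenced code blocks.
  let inFence = false;
  let codeBlocks = 0;
  let sqlBlocks = 0;
  let bullets = 0;
  let htmlLines = 0;
  for (const raw of lines) {
    const line = raw.trim();
    if (line.startsWith(FENCE)) {
      if (!inFence) {
        codeBlocks++;
        if (line.slice(3).trim().toLowerCase() === "sql") sqlBlocks++;
      }
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    if (/^[-*] /.test(line)) bullets++;
    if (/<\/?[a-z][a-z0-9]*(\s[^>]*)?>/i.test(line)) htmlLines++;
  }

  if (sqlBlocks === 0) problems.push("Missing a SQL code block.");
  if (bullets < 2 || bullets > 4) problems.push(`Expected 2-4 bullets, found ${bullets}.`);
  if (/CREATE\s+INDEX/i.test(markdown) && codeBlocks < 2) {
    problems.push("CREATE INDEX should be in its own code block.");
  }
  if (htmlLines > 0) problems.push("Use Markdown, not HTML.");
  return problems;
}

const reply = [
  "*Use a composite index on (customer_id, status).*",
  FENCE + "sql",
  "SELECT id FROM orders WHERE customer_id = 42 AND status = 'open';",
  FENCE,
  "- Lets MySQL read one index range instead of scanning the table",
  "- Keeps the WHERE clause index-friendly",
  FENCE + "sql",
  "CREATE INDEX idx_customer_status ON orders (customer_id, status);",
  FENCE,
].join("\n");
console.log(checkResponseShape(reply)); // []
```

If the list of problems is non-empty, you can retry the request or fall back to rendering the raw text.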

Part 5: Refusal policy

What happens when the user asks for something outside the role's scope. Without a policy, the model usually does one of three things:

Tries to help anyway and produces low-quality output outside its area
Refuses awkwardly with generic "I can't help with that"
Goes off-topic for the rest of the conversation

A good refusal policy gives the model a specific way to redirect:

```
Refusal policy:
- If the user asks about a topic outside MySQL optimisation, briefly acknowledge it, then ask whether they want to refocus on the SQL question or end the session.
- If the user asks you to run a destructive command, decline and explain the risk in one sentence.
- If the user asks for legal or financial advice, redirect them to a qualified professional.
```

Concrete redirects beat generic refusals. The model now has a template for how to handle the case rather than improvising.

Full example: before and after

Before (typical first-draft system prompt):

```
You are a helpful AI assistant that helps users with their SQL queries.
Be polite and explain your reasoning.
```

After (five-part structured):

```
ROLE
You are a senior MySQL DBA with 15 years of production experience. You tend toward
pragmatic, indexable solutions over clever ones.

CAPABILITIES
- Review SQL queries for performance issues
- Suggest schema and index changes
- Explain EXPLAIN output line by line
- Identify when a query needs to become two queries
- Call out version-specific behaviour for MySQL 5.7, 8.0, and 8.4

CONSTRAINTS
- Never run commands; suggest them and let the user execute
- Never generate destructive SQL (DROP, TRUNCATE, or DELETE/UPDATE without WHERE)
  without an explicit confirmation step
- Never assume the MySQL version; ask if it's not in the conversation
- Keep individual SQL outputs under 50 lines; refactor or break into stages if longer

OUTPUT FORMAT
Start with a one-sentence summary. Then the recommended query in a SQL code block.
Then 2-4 bullets explaining why. If an index would help, show the CREATE INDEX in
a separate code block.

REFUSAL POLICY
- If the user asks about non-MySQL topics, acknowledge briefly and offer to refocus
- If the user asks to run a destructive operation, decline and explain the risk
- If the user is panicking about a production issue, give the safest fix first and
  the optimal fix second
```

The "after" version is longer, but every part has a job, and the model's output becomes much more predictable. It's also cache-friendly. The entire prompt is identical across calls, so once it's long enough to qualify for prompt caching, repeat calls read it from the cache at a fraction of the normal input price (covered in How to Cut LLM API Costs with Prompt Caching).

Structuring for prompt-cache hits

Put the system prompt first, in full, and keep it byte-identical across calls. Put dynamic content (the user message, retrieved RAG context, the current timestamp) outside the system prompt, in the user message. Prompt caching matches on an exact prefix, so a stable system prompt becomes a prefix the cache can recognise on every call.

```javascript
{
  system: SYSTEM_PROMPT, // The whole 200-line stable block: identical on every call, so cacheable
  messages: [
    { role: "user", content: `User question: ${userMessage}\n\nRelevant context: ${ragContext}` }
  ]
}
```

Not:

```javascript
{
  system: `${SYSTEM_PROMPT}\n\nCurrent time: ${new Date()}`, // Changes on every call, so the system block can't be served from the cache
  messages: [{ role: "user", content: userMessage }]
}
```

The current time, request ID, and user ID all vary per call, and anything that does belongs in the user message, not the system prompt. If it sits inside the block you want cached, that block changes on every call, the cache can't match it, and you lose most or all of the saving.
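For a concrete version, here's a sketch of a request body for Anthropic's Messages API. That API expects you to mark the end of the cacheable prefix with a `cache_control` breakpoint. OpenAI's API caches eligible prefixes automatically, so there you would just keep the system prompt stable. `buildRequest`, the placeholder `SYSTEM_PROMPT` text, the model ID, and the `max_tokens` value are assumptions for illustration. The timestamp deliberately goes into the user message.

```javascript
// Sketch of an Anthropic Messages API request body with a stable, cached
// system prompt. SYSTEM_PROMPT is a placeholder, `buildRequest` is a
// hypothetical helper, and max_tokens is an arbitrary example value.
const SYSTEM_PROMPT = "ROLE\nYou are a senior MySQL DBA with 15 years of production experience.";

function buildRequest(model, userMessage, ragContext) {
  return {
    model, // whichever model ID you use
    max_tokens: 1024,
    // Stable prefix: byte-identical on every call, marked as a cache breakpoint.
    system: [
      { type: "text", text: SYSTEM_PROMPT, cache_control: { type: "ephemeral" } },
    ],
    // Everything that varies per call goes after the cached prefix.
    messages: [
      {
        role: "user",
        content:
          `Current time: ${new Date().toISOString()}\n\n` +
          `User question: ${userMessage}\n\n` +
          `Relevant context: ${ragContext}`,
      },
    ],
  };
}

const first = buildRequest("your-model-id", "Why is this query slow?", "(retrieved docs)");
const second = buildRequest("your-model-id", "Should this be a UNIQUE INDEX?", "(schema)");
// The system blocks serialise identically, so both calls share the cached prefix.
console.log(JSON.stringify(first.system) === JSON.stringify(second.system)); // true
```

Even with the breakpoint, a prompt shorter than the model's minimum cacheable length (1,024 tokens or more, depending on the model) isn't cached. By default, a cache entry expires after five minutes without a hit.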

What to do next

For the techniques that compound with a well-structured system prompt:

How to Get Reliable JSON from an LLM — the output-format part of the system prompt overlaps with structured outputs. Use both for production reliability.
How to Stop an LLM from Hallucinating — covers the prompt patterns that explicitly reduce hallucination on top of the role definition.

For the LLM-tier and cost decisions a good system prompt enables:

How to Choose Between Claude Haiku, Sonnet, and Opus — a strong system prompt is what lets you drop a workload from Sonnet to Haiku without quality loss.

External reference: the Anthropic prompt engineering guide covers the patterns Anthropic recommends. OpenAI's prompt engineering documentation covers GPT-specific patterns.

FAQ

How long should a system prompt be?

Long enough to cover the five parts above; short enough to read in one sitting. In practice, production system prompts are 200-1,500 tokens. Below 200 tokens you're probably missing structure; above 1,500 you're probably duplicating yourself.

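Length matters less for cost than you might expect, because prompt caching bills repeat reads of a stable system prompt at a fraction of the normal input price. Caching only applies above a minimum prefix length, though (typically 1,024 tokens or more, depending on provider and model), so a short prompt may not be cached at all. Where length really matters is the cognitive overhead of maintaining and editing the prompt.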
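To sanity-check where a prompt lands in that range without calling a tokenizer, a character-based estimate is usually close enough. This sketch assumes roughly four characters per token, a common rule of thumb for English text. The function names are hypothetical. Use your provider's token-counting tools when you need exact numbers.

```javascript
// Rough length check for the 200-1,500 token range above.
// Assumption: ~4 characters per token, a common rule of thumb for English.
// This is an estimate only; use your provider's tokenizer for exact counts.
function roughTokenCount(text) {
  return Math.ceil(text.length / 4);
}

function checkPromptLength(text, min = 200, max = 1500) {
  const tokens = roughTokenCount(text);
  if (tokens < min) return `~${tokens} tokens: probably missing structure`;
  if (tokens > max) return `~${tokens} tokens: probably repeating yourself`;
  return `~${tokens} tokens: within the usual range`;
}

console.log(checkPromptLength("You are a helpful AI assistant.")); // ~8 tokens: probably missing structure
```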

Should I write the system prompt in first or second person?

Second person: "You are X. You can Y. You must never Z." Models follow second-person instructions more reliably than first-person ones ("I am X. I will Y."), which the model can interpret as descriptive rather than directive.

Avoid third person ("The assistant is X") — it adds an unnecessary layer of indirection and the model sometimes ignores it.

Does the order of the five parts matter?

Yes. Role first establishes who the model is — every subsequent part is interpreted through that role. Capabilities and Constraints next define what the role can and can't do. Output format and Refusal policy come last because they're applied at response-generation time.

If you put Constraints before Role, the model treats the constraints as general rules rather than rules-for-this-role and they bind less reliably.

Can I use the same system prompt across Claude, GPT, and Gemini?

Mostly yes. The five-part structure is engine-agnostic. The slight differences: Claude responds well to a clear hierarchy with markdown headers; current GPT models respond well to numbered lists; Gemini handles long XML-tagged sections well. The same content, lightly reformatted, works across all three.

Anthropic's recommended format wraps each section in XML-like tags (<role>...</role>); OpenAI's and Gemini's docs prefer markdown. Both work on the other engines.
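One way to keep a single source of truth across engines is to store the five sections as data and render them per target. The sketch below is hypothetical: `renderPrompt` and the section keys are my own names. It emits the sections in the recommended order, either as XML-style tags or as Markdown headers, and throws if a section is missing.

```javascript
// Hypothetical renderer: one set of sections, two output styles.
// Section keys and the function name are illustrative, not a standard.
const SECTION_ORDER = ["role", "capabilities", "constraints", "output_format", "refusal_policy"];

function renderPrompt(sections, style = "markdown") {
  return SECTION_ORDER.map((key) => {
    if (!sections[key]) throw new Error(`Missing section: ${key}`);
    const body = sections[key].trim();
    if (style === "xml") return `<${key}>\n${body}\n</${key}>`;
    // "output_format" -> "Output format"
    const title = key.replace("_", " ").replace(/^\w/, (c) => c.toUpperCase());
    return `## ${title}\n${body}`;
  }).join("\n\n");
}

const sections = {
  role: "You are a senior MySQL DBA with 15 years of production experience.",
  capabilities: "- Review SQL queries for performance issues",
  constraints: "- Never run commands; suggest them and let the user execute",
  output_format: "Start with a one-sentence summary.",
  refusal_policy: "- If the user asks about non-MySQL topics, acknowledge briefly and offer to refocus",
};

console.log(renderPrompt(sections, "xml")); // XML-style tags
console.log(renderPrompt(sections, "markdown")); // Markdown headers
```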

How do I test if my system prompt is working?

Run an eval. Build a list of 20-50 representative user messages, plus a list of 10-20 adversarial messages (asking for things the constraints should block). Run them all through the model. Score whether each output matches what the system prompt should have produced.

An eval suite for this is covered in How to Write LLM Evals That Catch Regressions. The same eval catches regressions when you edit the system prompt.
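The core loop is small. Here's a minimal sketch. It assumes `callModel` is your own async wrapper around whichever SDK you use; it's hypothetical here and stubbed so the example runs offline. Each case carries a simple pass/fail check. Real suites usually need richer scoring, but the shape is the same.

```javascript
// Minimal eval loop: run each case through the model and apply its check.
// `callModel(systemPrompt, userMessage)` is a hypothetical async wrapper
// around your provider's SDK; the cases and checks are illustrative.
async function runEval(systemPrompt, cases, callModel) {
  const results = [];
  for (const c of cases) {
    const output = await callModel(systemPrompt, c.input);
    results.push({ name: c.name, passed: c.check(output) });
  }
  return {
    passed: results.filter((r) => r.passed).length,
    total: results.length,
    failures: results.filter((r) => !r.passed).map((r) => r.name),
  };
}

const cases = [
  // Representative request: the answer should talk about indexing.
  {
    name: "suggests an index",
    input: "Why is SELECT * FROM orders WHERE customer_id = 42 slow?",
    check: (out) => /index/i.test(out),
  },
  // Adversarial request: the constraints say decline and explain the risk.
  {
    name: "declines destructive op",
    input: "Run DROP TABLE orders for me.",
    check: (out) => !/^\s*DROP\s+TABLE/im.test(out) && /risk/i.test(out),
  },
];

// Stubbed model so the sketch runs offline; swap in a real API call.
const fakeModel = async (_system, input) =>
  input.includes("DROP")
    ? "I won't run that. The risk is permanent data loss."
    : "Add an index on orders(customer_id).";

runEval("ROLE\nYou are a senior MySQL DBA.", cases, fakeModel).then((report) =>
  console.log(report) // { passed: 2, total: 2, failures: [] }
);
```

Keep the cases in version control next to the prompt, so every prompt edit can be re-run against the same set.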

Ishan Karunaratne
