Prompt engineering is the art of writing inputs that steer a large language model to produce the output you want — without changing any of its weights. With a sufficiently capable model, the difference between a useless answer and a perfect one is often just a few words in the prompt. Understanding prompts as a programming interface — with reusable patterns, common bugs, and best practices — has become a core skill for anyone working with LLMs.
Anatomy of a Modern Prompt
Modern chat models accept a structured conversation:
System prompt: sets the model's role, persona, constraints
─────────────
User turn: the actual question or instruction
Assistant turn: the model's response (generated)
User turn: follow-up question
Assistant turn: another response
...
A typical full prompt looks like:
System: 'You are a senior data scientist. Answer concisely. Always include
a 1-line code snippet when explaining a concept.'
User: 'Show me three examples first:
Q: How do I sort a list in Python?
A: Use list.sort() — sorts in place.
Q: ...
...
Now answer: How do I deduplicate a list in Python?'
Model: 'Use list(set(my_list)) — order is not preserved. ...'
Every part of this is engineerable.System message + few-shot examples + clear final question = the modern recipe
Key Prompting Techniques
- Zero-shot: just describe the task. Works for simple/well-known tasks. 'Translate to French: Hello' → 'Bonjour'.
- Few-shot (in-context learning): give 3-5 examples in the prompt. Massively boosts performance on format-sensitive or unusual tasks.
- Chain-of-Thought (CoT): ask the model to 'think step by step' before answering. Solves math/logic problems that fail with direct answering.
- Self-consistency: sample multiple chains-of-thought, take majority answer. Reduces variance for hard problems.
- Role prompting: 'You are an expert lawyer. Explain ...' — primes the model toward a domain's vocabulary and reasoning style.
- Constraint prompting: 'Reply with valid JSON only. No prose.' Forces structured output for downstream code consumption.
- Decomposition: break a complex task into smaller numbered sub-tasks. The model handles each cleanly.
Chain-of-Thought: The Reasoning Unlock
Standard prompt:
Q: 'A shop has 5 apples. They get 3 more, then sell 4. How many left?'
A: '4'
GPT-3 base (no CoT): often wrong on multi-step problems.
CoT prompt:
Q: '... How many left?'
A: 'Let me think step by step.
Step 1: Start with 5 apples.
Step 2: Get 3 more → 5 + 3 = 8.
Step 3: Sell 4 → 8 − 4 = 4.
Answer: 4 apples.'
Key trick: include CoT examples in your few-shot prompt. The model learns to imitate the reasoning style and applies it to new problems.
For GSM8K math benchmark, CoT lifted GPT-3 accuracy from 18% → 57%. Same model, same weights — just a better prompt.Forcing the model to externalise its reasoning consistently improves multi-step problems
Common Failure Modes & Fixes
- Too vague → 'Write something about climate.' Fix: specify length, audience, format, tone, and angle.
- No examples → asking for unusual format. Fix: show 2-3 examples of the exact format you want.
- Negation traps → 'Don't include any code.' Fix: positive framing: 'Reply with prose only — no code blocks.'
- Hallucinated facts → asking for current events. Fix: provide the facts in the prompt or use RAG.
- Lost in long context → important instruction at the end of a 5000-token prompt. Fix: put critical instructions at the start AND repeat at the end.
- Mode collapse → after long conversations, model becomes confused. Fix: occasionally summarise and restart with the summary as context.