NLP - Intermediate - 12 min

Learn Prompt Engineering

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

Prompt engineering is the art of writing inputs that steer a large language model to produce the output you want — without changing any of its weights. With a sufficiently capable model, the difference between a useless answer and a perfect one is often just a few words in the prompt. Understanding prompts as a programming interface — with reusable patterns, common bugs, and best practices — has become a core skill for anyone working with LLMs.

Anatomy of a Modern Prompt

Modern chat models accept a structured conversation:

  System prompt:    sets the model's role, persona, constraints
  ─────────────
  User turn:        the actual question or instruction
  Assistant turn:   the model's response (generated)
  User turn:        follow-up question
  Assistant turn:   another response
  ...

A typical full prompt looks like:

  System: 'You are a senior data scientist. Answer concisely. Always include
           a 1-line code snippet when explaining a concept.'
  User:   'Show me three examples first:
             Q: How do I sort a list in Python?
             A: Use list.sort() — sorts in place.
             Q: ...
             ...
           Now answer: How do I deduplicate a list in Python?'
  Model:  'Use list(set(my_list)) — order is not preserved. ...'

Every part of this is engineerable.

System message + few-shot examples + clear final question = the modern recipe

Key Prompting Techniques

  • Zero-shot: just describe the task. Works for simple/well-known tasks. 'Translate to French: Hello' → 'Bonjour'.
  • Few-shot (in-context learning): give 3-5 examples in the prompt. Massively boosts performance on format-sensitive or unusual tasks.
  • Chain-of-Thought (CoT): ask the model to 'think step by step' before answering. Solves math/logic problems that fail with direct answering.
  • Self-consistency: sample multiple chains-of-thought, take majority answer. Reduces variance for hard problems.
  • Role prompting: 'You are an expert lawyer. Explain ...' — primes the model toward a domain's vocabulary and reasoning style.
  • Constraint prompting: 'Reply with valid JSON only. No prose.' Forces structured output for downstream code consumption.
  • Decomposition: break a complex task into smaller numbered sub-tasks. The model handles each cleanly.

Chain-of-Thought: The Reasoning Unlock

Standard prompt:
  Q: 'A shop has 5 apples. They get 3 more, then sell 4. How many left?'
  A: '4'

GPT-3 base (no CoT): often wrong on multi-step problems.

CoT prompt:
  Q: '... How many left?'
  A: 'Let me think step by step.
       Step 1: Start with 5 apples.
       Step 2: Get 3 more → 5 + 3 = 8.
       Step 3: Sell 4 → 8 − 4 = 4.
       Answer: 4 apples.'

Key trick: include CoT examples in your few-shot prompt. The model learns to imitate the reasoning style and applies it to new problems.
For GSM8K math benchmark, CoT lifted GPT-3 accuracy from 18% → 57%. Same model, same weights — just a better prompt.

Forcing the model to externalise its reasoning consistently improves multi-step problems

Common Failure Modes & Fixes

  • Too vague → 'Write something about climate.' Fix: specify length, audience, format, tone, and angle.
  • No examples → asking for unusual format. Fix: show 2-3 examples of the exact format you want.
  • Negation traps → 'Don't include any code.' Fix: positive framing: 'Reply with prose only — no code blocks.'
  • Hallucinated facts → asking for current events. Fix: provide the facts in the prompt or use RAG.
  • Lost in long context → important instruction at the end of a 5000-token prompt. Fix: put critical instructions at the start AND repeat at the end.
  • Mode collapse → after long conversations, model becomes confused. Fix: occasionally summarise and restart with the summary as context.

Practice questions

  1. What does 'few-shot prompting' mean?
  2. Why does adding 'Let's think step by step' (chain-of-thought) often improve math/reasoning answers?
  3. What is the recommended way to get structured output (e.g., JSON) from an LLM?
  4. Why might the same prompt produce different results on different LLMs (e.g., Claude vs GPT-4)?

Related AI learning resources

Premium lesson notes and simulations | AI project templates | More NLP lessons