Deep Learning - Intermediate - 12 min

Learn RAG (Retrieval-Augmented Generation)

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

RAG, or Retrieval-Augmented Generation, connects a language model to external knowledge. Instead of asking the model to answer only from what it memorized during training, the system first retrieves relevant documents, gives them to the model as context, and asks it to answer using that evidence.

Why it matters

RAG is one of the most practical ways to build AI assistants for company documents, policies, product manuals, course notes, support tickets, and research collections. It reduces hallucination risk by grounding answers in retrieved context, but it still needs evaluation and guardrails.

Core pipeline

  • Collect documents from trusted sources such as PDFs, docs, web pages, tickets, or database records.
  • Chunk the documents into useful pieces. Chunks that are too small lose context; chunks that are too large add noise.
  • Create embeddings, which are numeric vectors that capture semantic meaning.
  • Store embeddings and metadata in a vector database or search index.
  • Retrieve top-k chunks for the user's question, often combining vector search with keyword search.
  • Optionally rerank the retrieved chunks so the strongest evidence is placed first.
  • Pass the selected context to the LLM with instructions to answer from the evidence and cite sources.
  • Evaluate retrieval quality, answer faithfulness, citation quality, latency, and failure cases.
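
The steps above can be sketched end to end in a few lines. This is a minimal illustration under loud assumptions, not a production design: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, the "vector database" is a plain Python list, and every name and sample document here is hypothetical. The final LLM call is left out, so the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(chunks):
    """Store each chunk's vector next to its id and text (the metadata)."""
    return [{"id": c["id"], "text": c["text"], "vector": embed(c["text"])} for c in chunks]

def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(index, key=lambda e: cosine(q, e["vector"]), reverse=True)[:k]

def build_prompt(question, hits):
    """Assemble the context and the grounding instructions for the LLM."""
    context = "\n".join(f"[{h['id']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below and cite chunk ids in brackets.\n"
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical, already-chunked documents.
chunks = [
    {"id": "policy-1", "text": "Employees may carry over five vacation days into the next year."},
    {"id": "policy-2", "text": "Expense reports must be submitted within thirty days."},
    {"id": "manual-1", "text": "Hold the reset button for ten seconds to restart the router."},
]
question = "How many vacation days can employees carry over?"
index = build_index(chunks)
hits = retrieve(index, question, k=2)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real system the toy `embed` would be replaced by an embedding model and the list by a vector index, but the shape of the flow (index, retrieve, assemble prompt) stays the same.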

Key terms

  • Chunking: splitting documents into passages that are small enough to retrieve but large enough to preserve meaning.
  • Embedding: a vector representation used to compare semantic similarity.
  • Vector DB: a database or index optimized for nearest-neighbor search, often approximate, over embeddings.
  • Top-k retrieval: selecting the k most relevant chunks before generation.
  • Reranking: scoring retrieved chunks again with a stronger model to improve context quality.
  • Grounding: constraining the answer to rely on retrieved evidence rather than model memory alone, typically through prompt instructions and citation requirements.
  • Faithfulness: whether the final answer is supported by the provided context.
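
To make the chunking trade-off concrete, here is a small sketch of one common approach: fixed-size word windows that overlap, so a sentence cut at a boundary still appears whole in a neighbouring chunk. The overlap idea and the `chunk_words` helper are assumptions added for illustration; the lesson itself does not prescribe a chunking method, and sizes should be tuned against retrieval quality.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word windows of chunk_size, each sharing `overlap` words
    with the previous window. Hypothetical helper, not a library function."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(12))
pieces = chunk_words(sample, chunk_size=5, overlap=2)
for piece in pieces:
    print(piece)
# w0 w1 w2 w3 w4
# w3 w4 w5 w6 w7
# w6 w7 w8 w9 w10
# w9 w10 w11
```

Splitting on words keeps the example short; real pipelines often split on sentences, headings, or tokens instead.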

RAG vs fine-tuning

  • Use RAG when facts change often, citations matter, or knowledge lives in private documents.
  • Use fine-tuning when you need behavior, tone, format, task style, or domain-specific decision patterns to improve.
  • Use both when the model needs a better task style and also needs current private knowledge.
  • Do not fine-tune just to memorize a document collection. Retrieval is usually cheaper, safer, and easier to update.

Hallucination reduction

RAG reduces hallucination by giving the model relevant evidence, adding source citations, rejecting low-confidence retrievals, and instructing the model to say when the answer is not in the context. It does not eliminate hallucination automatically; bad chunks, weak retrieval, or loose prompts can still produce unsupported answers.
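
Two of these safeguards, rejecting low-confidence retrievals and telling the model to admit when the answer is missing, can be sketched as follows. The `Hit` class, the `0.35` threshold, and the sample data are all hypothetical; a real threshold depends on the embedding model and should be chosen by evaluation, not copied from here.

```python
from dataclasses import dataclass

NO_ANSWER = "I could not find this in the provided documents."

@dataclass
class Hit:
    source: str   # where the chunk came from, used for citations
    text: str
    score: float  # retrieval similarity, assumed to lie in [0, 1]

def select_evidence(hits, min_score=0.35, max_chunks=3):
    """Drop low-confidence hits, then keep the strongest few."""
    kept = sorted((h for h in hits if h.score >= min_score),
                  key=lambda h: h.score, reverse=True)
    return kept[:max_chunks]

def grounded_prompt(question, hits, min_score=0.35):
    """Return a grounded prompt, or None when no hit clears the threshold."""
    evidence = select_evidence(hits, min_score)
    if not evidence:
        return None  # caller should reply with NO_ANSWER instead of calling the model
    context = "\n".join(f"[{h.source}] {h.text}" for h in evidence)
    return (
        "Answer only from the context and cite the source in brackets.\n"
        f'If the context does not contain the answer, reply exactly: "{NO_ANSWER}"\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical retrieval results for one question.
hits = [
    Hit("handbook.pdf#p12", "Refunds are issued within 14 days of a return.", 0.82),
    Hit("blog-2019", "Our office dog is named Biscuit.", 0.11),
]
prompt = grounded_prompt("How long do refunds take?", hits)
print(prompt if prompt is not None else NO_ANSWER)
```

Returning `None` before the model is ever called is a deliberate choice in this sketch: when retrieval finds nothing credible, the safest answer is the fixed fallback message.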

Visual explanation suggestion

Show the user query flowing into an embedding space, nearest chunks lighting up in a vector database, a reranker sorting evidence cards, and the final answer panel highlighting citations linked to source chunks.

Common mistakes

  • Using arbitrary chunk sizes without testing retrieval quality.
  • Retrieving only by vector similarity when exact keywords or IDs matter.
  • Stuffing too many chunks into the prompt and drowning the answer in noise.
  • Skipping evaluation. A system that demos well can still fail on edge-case questions.
  • Not storing metadata such as source, date, permissions, page number, and document version.
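
One way to avoid the vector-only mistake is to run a keyword search alongside the vector search and merge the two ranked lists. The sketch below uses reciprocal rank fusion (RRF), a technique named here for illustration rather than one this lesson specifies. The chunk ids and both rankings are made up, and `k=60` is a commonly used default for the RRF constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids. Each list contributes 1 / (k + rank)
    to an id's score, so ids ranked well by multiple retrievers rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from two retrievers for the same question.
vector_ranking = ["faq-7", "manual-2", "ticket-19"]    # semantic similarity
keyword_ranking = ["ticket-19", "faq-7", "policy-4"]   # exact term or ID match

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

RRF only needs ranks, not raw scores, which is convenient because vector similarities and keyword scores are usually on different scales.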

Interview-style questions

  • Explain a RAG pipeline from document ingestion to final answer.
  • How would you evaluate whether a RAG system is hallucinating?
  • When would you choose fine-tuning instead of RAG?
  • What can go wrong if chunking is too small or too large?

Related lessons

  • Word Embeddings
  • Transformers in AI
  • Prompt Engineering
  • Fine-tuning LLMs
  • Model Monitoring & Drift

Practice questions

  1. What is the main purpose of RAG?
  2. What happens when chunks are too small?
  3. What does reranking do?
  4. When is RAG usually better than fine-tuning?
