
Implement a RAG Pipeline

Build · Intermediate · 45 min · TypeScript
Sources verified Dec 22, 2025

Build a Retrieval-Augmented Generation system from scratch. Index documents, embed queries, retrieve relevant chunks, and generate sourced answers.

1. Understand the Scenario

You're building a documentation Q&A bot for a product. Instead of fine-tuning or stuffing everything into context, you'll implement RAG to retrieve only the relevant documentation chunks for each question.

Learning Objectives

  • Create document embeddings using OpenAI's embedding API
  • Implement cosine similarity for vector search
  • Build a retrieval pipeline that finds relevant chunks
  • Generate answers with source citations

2. Follow the Instructions

What You'll Build

A documentation assistant that:

  1. Indexes your docs by embedding them into vectors
  2. Finds the most relevant chunks when a user asks a question
  3. Passes those chunks to an LLM to generate an answer
  4. Cites which documentation was used

Step 1: Prepare Your Documents

First, chunk your documents into pieces small enough to fit in context but large enough to be meaningful. A rough splitter sketch follows the sample documents below.

step1_documents.ts
// Sample documents to index
const documents = [
  {
    id: 'auth-1',
    title: 'Authentication',
    content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
  },
  {
    id: 'rate-1', 
    title: 'Rate Limits',
    content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
  },
  {
    id: 'errors-1',
    title: 'Error Handling', 
    content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
  }
];
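The sample documents above are already chunk-sized. Real documentation pages usually aren't, so you would split them first. Here is a minimal splitter sketch, assuming plain-text input and the common rough four-characters-per-token heuristic (chunkText and the size constants are illustrative, not part of the exercise):

// Rough character-based splitter: ~300 tokens per chunk with ~50 tokens
// of overlap, using the ~4 characters-per-token approximation.
const CHUNK_CHARS = 1200;   // ~300 tokens
const OVERLAP_CHARS = 200;  // ~50 tokens

function chunkText(text: string): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += CHUNK_CHARS - OVERLAP_CHARS) {
    chunks.push(text.slice(start, start + CHUNK_CHARS));
  }
  return chunks;
}

Production splitters usually break on sentence or paragraph boundaries rather than raw character offsets, but the sliding-window idea is the same.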

Step 2: Create Embeddings

Embed each document chunk. Store the embeddings alongside the original content.

step2_embed.ts
import OpenAI from 'openai';

const openai = new OpenAI();

// Shape of the raw documents and their embedded counterparts
interface Document {
  id: string;
  title: string;
  content: string;
}

interface IndexedDocument extends Document {
  embedding: number[];
}

// Batch size for embedding requests (OpenAI supports array input)
const EMBEDDING_BATCH_SIZE = 20;

async function embedText(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding;
}

// Embed all documents in batches to avoid 429 rate limit errors
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
  const results: IndexedDocument[] = [];
  
  // Process in batches to prevent rate limiting
  for (let i = 0; i < docs.length; i += EMBEDDING_BATCH_SIZE) {
    const batch = docs.slice(i, i + EMBEDDING_BATCH_SIZE);
    
    // Use OpenAI's array input for efficiency (single API call per batch)
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch.map(doc => doc.content)
    });
    
    // Map embeddings back to documents
    for (let j = 0; j < batch.length; j++) {
      results.push({
        ...batch[j],
        embedding: response.data[j].embedding
      });
    }
  }
  
  return results;
}
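Re-embedding on every run wastes API calls. One simple way to persist the index between runs, assuming Node's built-in fs module (the index.json filename is arbitrary):

import { existsSync, readFileSync, writeFileSync } from 'fs';

const INDEX_FILE = 'index.json';

// Save the indexed documents (content + embeddings) as JSON
function saveIndex(indexed: IndexedDocument[]): void {
  writeFileSync(INDEX_FILE, JSON.stringify(indexed));
}

// Load a previously saved index, or null if none exists yet
function loadIndex(): IndexedDocument[] | null {
  return existsSync(INDEX_FILE)
    ? (JSON.parse(readFileSync(INDEX_FILE, 'utf8')) as IndexedDocument[])
    : null;
}

For anything beyond a handful of documents, a vector database does this job properly, but a JSON file is fine for this exercise.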

When a user asks a question, embed the question and find the most similar document chunks using cosine similarity.

Your Task: Complete the TODO functions in the starter code below. embedText and indexDocuments were shown in Step 2; the new work is cosineSimilarity, findRelevantDocs, and generateAnswer.

3. Try It Yourself

exercise_starter.ts
import OpenAI from 'openai';

const openai = new OpenAI();

interface Document {
  id: string;
  title: string;
  content: string;
}

interface IndexedDocument extends Document {
  embedding: number[];
}

// Sample documents to index
const documents: Document[] = [
  { id: 'auth-1', title: 'Authentication', content: 'Include API key in Authorization header...' },
  { id: 'rate-1', title: 'Rate Limits', content: '100 requests per minute for free tier...' },
];

// Pre-indexed documents (in production, use a vector database)
let index: IndexedDocument[] = [];

// TODO: Implement embedding function
async function embedText(text: string): Promise<number[]> {
  throw new Error('Not implemented');
}

// TODO: Implement document indexing
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
  throw new Error('Not implemented');
}

// TODO: Implement cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  // Your implementation here
  return 0;
}

// TODO: Implement document retrieval
async function findRelevantDocs(
  query: string, 
  topK: number = 3
): Promise<IndexedDocument[]> {
  // 1. Embed the query
  // 2. Calculate similarity with each indexed document
  // 3. Sort by similarity and return top K
  return [];
}

// TODO: Generate answer with citations
async function generateAnswer(question: string): Promise<string> {
  // 1. Find relevant documents
  // 2. Build a prompt with the retrieved context
  // 3. Generate an answer that cites sources
  return '';
}

// Test it
async function main() {
  // First, index the documents
  index = await indexDocuments(documents);
  
  // Then ask questions
  const answer = await generateAnswer('How do I authenticate?');
  console.log(answer);
}

main();

This TypeScript exercise requires local setup. Copy the code into your IDE to run it; the OpenAI client reads your OPENAI_API_KEY from the environment.

4. Get Help (If Needed)

Reveal progressive hints
Hint 1: Cosine similarity formula: (a · b) / (||a|| × ||b||). The dot product divided by the product of magnitudes.
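For example, with a = [1, 2] and b = [2, 4]: the dot product is 1×2 + 2×4 = 10, and the magnitudes are √5 and √20, whose product is √100 = 10. The similarity is therefore 10/10 = 1, as expected for parallel vectors.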
Hint 2: Embed the query with the same embedText function used for documents (the embedding model must match between queries and documents), then score every indexed document with cosineSimilarity and sort in descending order.
Hint 3: For the generateAnswer function, include the document titles in the context so the LLM can cite them. Format like: [Title]: Content

5. Check the Solution

Reveal the complete solution
solution.ts
import OpenAI from 'openai';

const openai = new OpenAI();

interface Document {
  id: string;
  title: string;
  content: string;
}

interface IndexedDocument extends Document {
  embedding: number[];
}

const documents: Document[] = [
  {
    id: 'auth-1',
    title: 'Authentication',
    content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
  },
  {
    id: 'rate-1',
    title: 'Rate Limits', 
    content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
  },
  {
    id: 'errors-1',
    title: 'Error Handling',
    content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
  }
];

let index: IndexedDocument[] = [];

// Batch size for embedding requests (OpenAI supports array input)
const EMBEDDING_BATCH_SIZE = 20;

async function embedText(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding;
}

// Embed all documents in batches to avoid 429 rate limit errors
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
  const results: IndexedDocument[] = [];
  
  for (let i = 0; i < docs.length; i += EMBEDDING_BATCH_SIZE) {
    const batch = docs.slice(i, i + EMBEDDING_BATCH_SIZE);
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch.map(doc => doc.content)
    });
    
    for (let j = 0; j < batch.length; j++) {
      results.push({
        ...batch[j],
        embedding: response.data[j].embedding
      });
    }
  }
  
  return results;
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

async function findRelevantDocs(
  query: string,
  topK: number = 3
): Promise<IndexedDocument[]> {
  // 1. Embed the query
  const queryEmbedding = await embedText(query);
  
  // 2. Calculate similarity with each indexed document
  const scored = index.map(doc => ({
    doc,
    similarity: cosineSimilarity(queryEmbedding, doc.embedding)
  }));
  
  // 3. Sort by similarity and return top K
  return scored
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK)
    .map(s => s.doc);
}

async function generateAnswer(question: string): Promise<string> {
  // 1. Find relevant documents
  const relevantDocs = await findRelevantDocs(question, 2);
  
  // 2. Build context from retrieved documents
  const context = relevantDocs
    .map(doc => `[${doc.title}]: ${doc.content}`)
    .join('\n\n');
  
  // 3. Generate answer with citations
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `You are a helpful documentation assistant. Answer questions based on the provided documentation. Always cite which document(s) you used in your answer using [Document Title] format.\n\nDocumentation:\n${context}`
      },
      {
        role: 'user',
        content: question
      }
    ]
  });
  
  return response.choices[0].message.content || '';
}

async function main() {
  console.log('Indexing documents...');
  index = await indexDocuments(documents);
  console.log(`Indexed ${index.length} documents\n`);
  
  const question = 'How do I authenticate with the API?';
  console.log(`Q: ${question}\n`);
  
  const answer = await generateAnswer(question);
  console.log(`A: ${answer}`);
}

main();

// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].
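One refinement worth trying once the solution works: drop weak matches instead of always returning topK, so unrelated questions don't pull irrelevant context into the prompt. A sketch reusing the helpers above (findRelevantDocsFiltered and the 0.3 cutoff are illustrative, not part of the exercise):

// Variant of findRelevantDocs that discards matches below a minimum
// similarity score before taking the top K.
async function findRelevantDocsFiltered(
  query: string,
  topK: number = 3,
  minSimilarity: number = 0.3  // illustrative cutoff; tune for your corpus
): Promise<IndexedDocument[]> {
  const queryEmbedding = await embedText(query);
  return index
    .map(doc => ({ doc, similarity: cosineSimilarity(queryEmbedding, doc.embedding) }))
    .filter(s => s.similarity >= minSimilarity)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK)
    .map(s => s.doc);
}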

Common Mistakes

Using different embedding models for documents vs queries

Why it's wrong: Different models produce incompatible vector spaces - similarity scores become meaningless

How to fix: Always use the same embedding model for both indexing and querying

Not normalizing vectors before cosine similarity

Why it's wrong: Cosine similarity divides by the vector magnitudes, so it is scale-invariant on its own, but vector databases that rank by raw dot product assume unit-length vectors, and unnormalized vectors skew those scores

How to fix: The formula in this exercise needs no pre-normalization; verify your vector database's requirements before relying on a dot-product index (OpenAI's embedding models document their vectors as already normalized to length 1)
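If your vector store does expect unit-length vectors, a minimal normalization helper (assuming non-zero input) looks like this; after normalizing, a plain dot product equals cosine similarity:

// Scale a vector to unit length (magnitude 1)
function normalize(v: number[]): number[] {
  const magnitude = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map(x => x / magnitude);
}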

Chunks that are too large or too small

Why it's wrong: Too small loses context (sentence fragments). Too large reduces precision (irrelevant content retrieved)

How to fix: Aim for 200-500 tokens per chunk with some overlap between chunks

Using Promise.all to embed many documents at once

Why it's wrong: If you have 100+ documents, this triggers 100 simultaneous HTTP requests and immediately hits OpenAI's 429 rate limit

How to fix: Use batching - OpenAI supports array input for embeddings. Process in batches of 20 with a single API call per batch.

Test Cases

Cosine similarity works

Input: cosineSimilarity([1, 0], [1, 0])
Expected: 1 (identical vectors)

Retrieves relevant documents

Input: findRelevantDocs('authentication')
Expected: Should return auth-1 document first

Answer includes citation

Input: generateAnswer('How do I authenticate?')
Expected: Response should contain [Authentication] citation
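A minimal sketch of how the first check could be run with Node's built-in assert module; the retrieval and citation checks need a live API key, so they are left as comments:

import assert from 'node:assert';

// Identical vectors score 1 (within floating-point tolerance)
assert.ok(Math.abs(cosineSimilarity([1, 0], [1, 0]) - 1) < 1e-9);

// Orthogonal vectors score 0
assert.ok(Math.abs(cosineSimilarity([1, 0], [0, 1])) < 1e-9);

// Requires a live API key:
// const docs = await findRelevantDocs('authentication');
// assert.strictEqual(docs[0].id, 'auth-1');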

