Implement a RAG Pipeline
Build a Retrieval-Augmented Generation system from scratch. Index documents, embed queries, retrieve relevant chunks, and generate sourced answers.
1. Understand the Scenario
You're building a documentation Q&A bot for a product. Instead of fine-tuning or stuffing everything into context, you'll implement RAG to retrieve only the relevant documentation chunks for each question.
Learning Objectives
- Create document embeddings using OpenAI's embedding API
- Implement cosine similarity for vector search
- Build a retrieval pipeline that finds relevant chunks
- Generate answers with source citations
2. Follow the Instructions
What You'll Build
A documentation assistant that:
- Indexes your docs by embedding them into vectors
- Finds the most relevant chunks when a user asks a question
- Passes those chunks to an LLM to generate an answer
- Cites which documentation was used
Step 1: Prepare Your Documents
First, chunk your documents into pieces small enough to fit in context, but large enough to be meaningful.
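The sample documents below are already chunk-sized, so this exercise skips the splitting step. For longer docs, a simple word-window chunker with overlap is a reasonable starting point. Here is a minimal sketch; the chunkText helper, the 200-word window, and the 40-word overlap are illustrative choices, not tuned values, and real pipelines usually measure size in tokens and split on sentence or heading boundaries:
// Split text into overlapping word windows (hypothetical helper, not used below)
function chunkText(text: string, maxWords = 200, overlap = 40): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  // Step forward by (window - overlap) so consecutive chunks share context
  for (let start = 0; start < words.length; start += maxWords - overlap) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break; // final window covers the tail
  }
  return chunks;
}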
// Sample documents to index
const documents = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
Step 2: Create Embeddings
Embed each document chunk. Store the embeddings alongside the original content.
import OpenAI from 'openai';
const openai = new OpenAI();
// Batch size for embedding requests (OpenAI supports array input)
const EMBEDDING_BATCH_SIZE = 20;
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
// Embed all documents in batches to avoid 429 rate limit errors
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
const results: IndexedDocument[] = [];
// Process in batches to prevent rate limiting
for (let i = 0; i < docs.length; i += EMBEDDING_BATCH_SIZE) {
const batch = docs.slice(i, i + EMBEDDING_BATCH_SIZE);
// Use OpenAI's array input for efficiency (single API call per batch)
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: batch.map(doc => doc.content)
});
// Map embeddings back to documents
for (let j = 0; j < batch.length; j++) {
results.push({
...batch[j],
embedding: response.data[j].embedding
});
}
}
return results;
}
Step 3: Implement Vector Search
When a user asks a question, embed the question and find the most similar document chunks using cosine similarity.
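As a quick intuition check before you implement it: cosine similarity is the dot product of two vectors divided by the product of their magnitudes, cos(a, b) = (a · b) / (|a| |b|), so it measures direction rather than length and ranges from -1 to 1. A toy 2-D example, with values invented for illustration:
// Both vectors have magnitude 1, so the dot product alone is the cosine similarity
const a = [1, 0];
const b = [0.6, 0.8]; // magnitude = sqrt(0.36 + 0.64) = 1
const similarity = a[0] * b[0] + a[1] * b[1]; // 1 * 0.6 + 0 * 0.8 = 0.6
console.log(similarity); // 0.6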
Your Task: Complete the cosineSimilarity, findRelevantDocs, and generateAnswer functions in the starter code below (embedText and indexDocuments reuse the code from Step 2).
3. Try It Yourself
import OpenAI from 'openai';
const openai = new OpenAI();
interface Document {
id: string;
title: string;
content: string;
}
interface IndexedDocument extends Document {
embedding: number[];
}
// Sample documents to index
const documents: Document[] = [
{ id: 'auth-1', title: 'Authentication', content: 'Include API key in Authorization header...' },
{ id: 'rate-1', title: 'Rate Limits', content: '100 requests per minute for free tier...' },
];
// Pre-indexed documents (in production, use a vector database)
let index: IndexedDocument[] = [];
// TODO: Implement embedding function
async function embedText(text: string): Promise<number[]> {
throw new Error('Not implemented');
}
// TODO: Implement document indexing
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
throw new Error('Not implemented');
}
// TODO: Implement cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
// Your implementation here
return 0;
}
// TODO: Implement document retrieval
async function findRelevantDocs(
query: string,
topK: number = 3
): Promise<IndexedDocument[]> {
// 1. Embed the query
// 2. Calculate similarity with each indexed document
// 3. Sort by similarity and return top K
return [];
}
// TODO: Generate answer with citations
async function generateAnswer(question: string): Promise<string> {
// 1. Find relevant documents
// 2. Build a prompt with the retrieved context
// 3. Generate an answer that cites sources
return '';
}
// Test it
async function main() {
// First, index the documents
index = await indexDocuments(documents);
// Then ask questions
const answer = await generateAnswer('How do I authenticate?');
console.log(answer);
}
main();
This TypeScript exercise requires local setup. Copy the code into your IDE to run it; you'll need the openai package installed and an OPENAI_API_KEY environment variable set.
4. Check the Solution
import OpenAI from 'openai';
const openai = new OpenAI();
interface Document {
id: string;
title: string;
content: string;
}
interface IndexedDocument extends Document {
embedding: number[];
}
const documents: Document[] = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
let index: IndexedDocument[] = [];
// Batch size for embedding requests (OpenAI supports array input)
const EMBEDDING_BATCH_SIZE = 20;
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
// Embed all documents in batches to avoid 429 rate limit errors
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
const results: IndexedDocument[] = [];
for (let i = 0; i < docs.length; i += EMBEDDING_BATCH_SIZE) {
const batch = docs.slice(i, i + EMBEDDING_BATCH_SIZE);
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: batch.map(doc => doc.content)
});
for (let j = 0; j < batch.length; j++) {
results.push({
...batch[j],
embedding: response.data[j].embedding
});
}
}
return results;
}
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
async function findRelevantDocs(
query: string,
topK: number = 3
): Promise<IndexedDocument[]> {
// 1. Embed the query
const queryEmbedding = await embedText(query);
// 2. Calculate similarity with each indexed document
const scored = index.map(doc => ({
doc,
similarity: cosineSimilarity(queryEmbedding, doc.embedding)
}));
// 3. Sort by similarity and return top K
return scored
.sort((a, b) => b.similarity - a.similarity)
.slice(0, topK)
.map(s => s.doc);
}
async function generateAnswer(question: string): Promise<string> {
// 1. Find relevant documents
const relevantDocs = await findRelevantDocs(question, 2);
// 2. Build context from retrieved documents
const context = relevantDocs
.map(doc => `[${doc.title}]: ${doc.content}`)
.join('\n\n');
// 3. Generate answer with citations
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a helpful documentation assistant. Answer questions based on the provided documentation. Always cite which document(s) you used in your answer using [Document Title] format.\n\nDocumentation:\n${context}`
},
{
role: 'user',
content: question
}
]
});
return response.choices[0].message.content || '';
}
async function main() {
console.log('Indexing documents...');
index = await indexDocuments(documents);
console.log(`Indexed ${index.length} documents\n`);
const question = 'How do I authenticate with the API?';
console.log(`Q: ${question}\n`);
const answer = await generateAnswer(question);
console.log(`A: ${answer}`);
}
main();
// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].
Common Mistakes
Using different embedding models for documents vs. queries
Why it's wrong: Different models produce incompatible vector spaces, so similarity scores between them are meaningless.
How to fix: Always use the same embedding model for both indexing and querying.
Not normalizing vectors before similarity search
Why it's wrong: Cosine similarity divides by vector magnitudes, so it is insensitive to scale, but many vector databases rank by raw dot product, which only matches cosine similarity when vectors are unit length.
How to fix: If your store ranks by dot product, normalize embeddings to unit length before indexing and querying (see the sketch after this list); otherwise cosine similarity handles scaling for you.
Chunks that are too large or too small
Why it's wrong: Too small loses context (sentence fragments); too large reduces precision (irrelevant content gets retrieved alongside the relevant part).
How to fix: Aim for roughly 200-500 tokens per chunk, with some overlap between adjacent chunks, as in the chunker sketched in Step 1.
Using Promise.all to embed many documents at once
Why it's wrong: With 100+ documents, this fires 100 simultaneous HTTP requests and quickly hits OpenAI's 429 rate limit.
How to fix: Batch instead: OpenAI's embeddings endpoint accepts an array of inputs, so a single API call can embed a batch of 20 documents.
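If your vector store ranks by raw dot product rather than cosine similarity, pre-normalizing embeddings to unit length makes the two equivalent. A minimal sketch; the normalize helper is our own, not part of any library:
// Scale a vector to unit length; after this, dot product == cosine similarity
function normalize(v: number[]): number[] {
  const magnitude = Math.sqrt(v.reduce((sum, val) => sum + val * val, 0));
  if (magnitude === 0) return v; // leave the zero vector untouched
  return v.map(val => val / magnitude);
}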
Test Cases
- Cosine similarity works: cosineSimilarity([1, 0], [1, 0]) should return 1 (identical vectors)
- Retrieves relevant documents: findRelevantDocs('authentication') should rank the auth-1 document first
- Answer includes citation: generateAnswer('How do I authenticate?') should return a response containing the [Authentication] citation
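A minimal way to check these cases locally, assuming the solution code above and Node's built-in assert module; the last two checks call the OpenAI API (and depend somewhat on model behavior), so they need a valid OPENAI_API_KEY:
import assert from 'node:assert';

async function runChecks() {
  // Pure check: identical vectors score 1 (allow float tolerance)
  assert.ok(Math.abs(cosineSimilarity([1, 0], [1, 0]) - 1) < 1e-9);

  // The remaining checks hit the API, so index the documents first
  index = await indexDocuments(documents);

  const docs = await findRelevantDocs('authentication');
  assert.strictEqual(docs[0].id, 'auth-1');

  const answer = await generateAnswer('How do I authenticate?');
  assert.ok(answer.includes('[Authentication]'));

  console.log('All checks passed');
}

runChecks();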