Implement a RAG Pipeline
Build a Retrieval-Augmented Generation system from scratch. Index documents, embed queries, retrieve relevant chunks, and generate sourced answers.
1. Understand the Scenario
You're building a documentation Q&A bot for a product. Instead of fine-tuning or stuffing everything into context, you'll implement RAG to retrieve only the relevant documentation chunks for each question.
Learning Objectives
- Create document embeddings using OpenAI's embedding API
- Implement cosine similarity for vector search
- Build a retrieval pipeline that finds relevant chunks
- Generate answers with source citations
2. Follow the Instructions
What You'll Build
A documentation assistant that:
- Indexes your docs by embedding them into vectors
- Finds the most relevant chunks when a user asks a question
- Passes those chunks to an LLM to generate an answer
- Cites which documentation was used
Step 1: Prepare Your Documents
First, chunk your documents into pieces small enough to fit in context, but large enough to be meaningful.
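The sample documents below are already chunk-sized, so this exercise skips the splitting step. For longer docs, a simple word-window chunker with overlap is a reasonable starting point. Here is a minimal sketch; the chunkText helper, the 200-word window, and the 40-word overlap are illustrative choices, not tuned values, and real pipelines usually measure size in tokens and split on sentence or heading boundaries:
// Split text into overlapping word windows (hypothetical helper, not used below)
function chunkText(text: string, maxWords = 200, overlap = 40): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  // Step forward by (window - overlap) so consecutive chunks share context
  for (let start = 0; start < words.length; start += maxWords - overlap) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break; // final window covers the tail
  }
  return chunks;
}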
// Sample documents to index
const documents = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
Step 2: Create Embeddings
Embed each document chunk. Store the embeddings alongside the original content.
import OpenAI from 'openai';
const openai = new OpenAI();
// Batch size for embedding requests (OpenAI supports array input)
const EMBEDDING_BATCH_SIZE = 20;
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
// Embed all documents in batches to avoid 429 rate limit errors
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
const results: IndexedDocument[] = [];
// Process in batches to prevent rate limiting
for (let i = 0; i < docs.length; i += EMBEDDING_BATCH_SIZE) {
const batch = docs.slice(i, i + EMBEDDING_BATCH_SIZE);
// Use OpenAI's array input for efficiency (single API call per batch)
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: batch.map(doc => doc.content)
});
// Map embeddings back to documents
for (let j = 0; j < batch.length; j++) {
results.push({
...batch[j],
embedding: response.data[j].embedding
});
}
}
return results;
}
Step 3: Implement Vector Search
When a user asks a question, embed the question and find the most similar document chunks using cosine similarity.
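As a quick intuition check before you implement it: cosine similarity is the dot product of two vectors divided by the product of their magnitudes, cos(a, b) = (a · b) / (|a| |b|), so it measures direction rather than length and ranges from -1 to 1. A toy 2-D example, with values invented for illustration:
// Both vectors have magnitude 1, so the dot product alone is the cosine similarity
const a = [1, 0];
const b = [0.6, 0.8]; // magnitude = sqrt(0.36 + 0.64) = 1
const similarity = a[0] * b[0] + a[1] * b[1]; // 1 * 0.6 + 0 * 0.8 = 0.6
console.log(similarity); // 0.6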
Your Task: Complete the cosineSimilarity, findRelevantDocs, and generateAnswer functions in the starter code below (embedText and indexDocuments reuse the code from Step 2).
3. Try It Yourself
import OpenAI from 'openai';
const openai = new OpenAI();
interface Document {
id: string;
title: string;
content: string;
}
interface IndexedDocument extends Document {
embedding: number[];
}
// Sample documents to index
const documents: Document[] = [
{ id: 'auth-1', title: 'Authentication', content: 'Include API key in Authorization header...' },
{ id: 'rate-1', title: 'Rate Limits', content: '100 requests per minute for free tier...' },
];
// Pre-indexed documents (in production, use a vector database)
let index: IndexedDocument[] = [];
// TODO: Implement embedding function
async function embedText(text: string): Promise<number[]> {
throw new Error('Not implemented');
}
// TODO: Implement document indexing
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
throw new Error('Not implemented');
}
// TODO: Implement cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
// Your implementation here
return 0;
}
// TODO: Implement document retrieval
async function findRelevantDocs(
query: string,
topK: number = 3
): Promise<IndexedDocument[]> {
// 1. Embed the query
// 2. Calculate similarity with each indexed document
// 3. Sort by similarity and return top K
return [];
}
// TODO: Generate answer with citations
async function generateAnswer(question: string): Promise<string> {
// 1. Find relevant documents
// 2. Build a prompt with the retrieved context
// 3. Generate an answer that cites sources
return '';
}
// Test it
async function main() {
// First, index the documents
index = await indexDocuments(documents);
// Then ask questions
const answer = await generateAnswer('How do I authenticate?');
console.log(answer);
}
main();
This TypeScript exercise requires local setup. Copy the code into your IDE to run it; you'll need the openai package installed and an OPENAI_API_KEY environment variable set.
4. Check the Solution
import OpenAI from 'openai';
const openai = new OpenAI();
interface Document {
id: string;
title: string;
content: string;
}
interface IndexedDocument extends Document {
embedding: number[];
}
const documents: Document[] = [
{
id: 'auth-1',
title: 'Authentication',
content: 'To authenticate, include an API key in the Authorization header. API keys can be created in the dashboard settings. Keys beginning with sk- are secret and should never be exposed client-side.'
},
{
id: 'rate-1',
title: 'Rate Limits',
content: 'The API has a rate limit of 100 requests per minute for free tier, 1000 for pro tier. Exceeding the limit returns a 429 status code. Implement exponential backoff for retries.'
},
{
id: 'errors-1',
title: 'Error Handling',
content: 'Errors return JSON with code and message fields. Common errors: 401 for invalid API key, 403 for insufficient permissions, 404 for resource not found, 500 for server errors.'
}
];
let index: IndexedDocument[] = [];
// Batch size for embedding requests (OpenAI supports array input)
const EMBEDDING_BATCH_SIZE = 20;
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
});
return response.data[0].embedding;
}
// Embed all documents in batches to avoid 429 rate limit errors
async function indexDocuments(docs: Document[]): Promise<IndexedDocument[]> {
const results: IndexedDocument[] = [];
for (let i = 0; i < docs.length; i += EMBEDDING_BATCH_SIZE) {
const batch = docs.slice(i, i + EMBEDDING_BATCH_SIZE);
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: batch.map(doc => doc.content)
});
for (let j = 0; j < batch.length; j++) {
results.push({
...batch[j],
embedding: response.data[j].embedding
});
}
}
return results;
}
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
async function findRelevantDocs(
query: string,
topK: number = 3
): Promise<IndexedDocument[]> {
// 1. Embed the query
const queryEmbedding = await embedText(query);
// 2. Calculate similarity with each indexed document
const scored = index.map(doc => ({
doc,
similarity: cosineSimilarity(queryEmbedding, doc.embedding)
}));
// 3. Sort by similarity and return top K
return scored
.sort((a, b) => b.similarity - a.similarity)
.slice(0, topK)
.map(s => s.doc);
}
async function generateAnswer(question: string): Promise<string> {
// 1. Find relevant documents
const relevantDocs = await findRelevantDocs(question, 2);
// 2. Build context from retrieved documents
const context = relevantDocs
.map(doc => `[${doc.title}]: ${doc.content}`)
.join('\n\n');
// 3. Generate answer with citations
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a helpful documentation assistant. Answer questions based on the provided documentation. Always cite which document(s) you used in your answer using [Document Title] format.\n\nDocumentation:\n${context}`
},
{
role: 'user',
content: question
}
]
});
return response.choices[0].message.content || '';
}
async function main() {
console.log('Indexing documents...');
index = await indexDocuments(documents);
console.log(`Indexed ${index.length} documents\n`);
const question = 'How do I authenticate with the API?';
console.log(`Q: ${question}\n`);
const answer = await generateAnswer(question);
console.log(`A: ${answer}`);
}
main();
// Example output:
// Q: How do I authenticate with the API?
// A: To authenticate with the API, include your API key in the Authorization header.
// You can create API keys in the dashboard settings. Note that keys beginning with
// sk- are secret and should never be exposed client-side [Authentication].
Common Mistakes
Using different embedding models for documents vs. queries
Why it's wrong: Different models produce incompatible vector spaces, so similarity scores between them are meaningless.
How to fix: Always use the same embedding model for both indexing and querying.
Not normalizing vectors before similarity search
Why it's wrong: Cosine similarity divides by vector magnitudes, so it is insensitive to scale, but many vector databases rank by raw dot product, which only matches cosine similarity when vectors are unit length.
How to fix: If your store ranks by dot product, normalize embeddings to unit length before indexing and querying (see the sketch after this list); otherwise cosine similarity handles scaling for you.
Chunks that are too large or too small
Why it's wrong: Too small loses context (sentence fragments); too large reduces precision (irrelevant content gets retrieved alongside the relevant part).
How to fix: Aim for roughly 200-500 tokens per chunk, with some overlap between adjacent chunks, as in the chunker sketched in Step 1.
Using Promise.all to embed many documents at once
Why it's wrong: With 100+ documents, this fires 100 simultaneous HTTP requests and quickly hits OpenAI's 429 rate limit.
How to fix: Batch instead: OpenAI's embeddings endpoint accepts an array of inputs, so a single API call can embed a batch of 20 documents.
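If your vector store ranks by raw dot product rather than cosine similarity, pre-normalizing embeddings to unit length makes the two equivalent. A minimal sketch; the normalize helper is our own, not part of any library:
// Scale a vector to unit length; after this, dot product == cosine similarity
function normalize(v: number[]): number[] {
  const magnitude = Math.sqrt(v.reduce((sum, val) => sum + val * val, 0));
  if (magnitude === 0) return v; // leave the zero vector untouched
  return v.map(val => val / magnitude);
}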
Test Cases
- Cosine similarity works: cosineSimilarity([1, 0], [1, 0]) should return 1 (identical vectors)
- Retrieves relevant documents: findRelevantDocs('authentication') should rank the auth-1 document first
- Answer includes citation: generateAnswer('How do I authenticate?') should return a response containing the [Authentication] citation
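A minimal way to check these cases locally, assuming the solution code above and Node's built-in assert module; the last two checks call the OpenAI API (and depend somewhat on model behavior), so they need a valid OPENAI_API_KEY:
import assert from 'node:assert';

async function runChecks() {
  // Pure check: identical vectors score 1 (allow float tolerance)
  assert.ok(Math.abs(cosineSimilarity([1, 0], [1, 0]) - 1) < 1e-9);

  // The remaining checks hit the API, so index the documents first
  index = await indexDocuments(documents);

  const docs = await findRelevantDocs('authentication');
  assert.strictEqual(docs[0].id, 'auth-1');

  const answer = await generateAnswer('How do I authenticate?');
  assert.ok(answer.includes('[Authentication]'));

  console.log('All checks passed');
}

runChecks();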