GenAI System Design
Design AI-powered systems using Large Language Models, vector databases, and modern AI infrastructure.
What You'll Learn
- LLM Integration: Work with OpenAI, Anthropic, and open-source models
- Vector Databases: Implement semantic search and RAG systems
- Prompt Engineering: Design effective prompts for different use cases
- AI Infrastructure: Handle model serving, caching, and scaling
- Cost Optimization: Minimize API costs while maintaining performance
- AI Safety: Implement guardrails and content filtering
Preview Chapter: Building a RAG System
Design a Retrieval-Augmented Generation (RAG) system that can answer questions about your company's documentation.
RAG Architecture Overview
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    User     │───▶│     API     │───▶│     LLM     │
│    Query    │    │   Gateway   │    │   Service   │
└─────────────┘    └─────────────┘    └─────────────┘
                          │                  │
                          ▼                  ▼
                   ┌─────────────┐    ┌─────────────┐
                   │   Vector    │    │  Embedding  │
                   │  Database   │    │   Service   │
                   │  (Pinecone) │    │             │
                   └─────────────┘    └─────────────┘
                          │
                          ▼
                   ┌─────────────┐
                   │  Document   │
                   │ Processing  │
                   │  Pipeline   │
                   └─────────────┘

Document Processing Pipeline
1. Text Extraction:
import PyPDF2
import docx
from bs4 import BeautifulSoup

def extract_text(file_path, file_type):
    if file_type == 'pdf':
        with open(file_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            return "".join(page.extract_text() for page in reader.pages)
    elif file_type == 'docx':
        doc = docx.Document(file_path)
        return "\n".join(paragraph.text for paragraph in doc.paragraphs)
    elif file_type == 'html':
        with open(file_path, 'r', encoding='utf-8') as file:
            soup = BeautifulSoup(file.read(), 'html.parser')
        return soup.get_text()
    else:
        raise ValueError(f"Unsupported file type: {file_type}")

2. Text Chunking:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        separators=["\n\n", "\n", " ", ""]
    )
    return splitter.split_text(text)

3. Embedding Generation:
import openai
from sentence_transformers import SentenceTransformer

class EmbeddingService:
    def __init__(self, model_name="text-embedding-ada-002"):
        self.model_name = model_name
        self.client = openai.OpenAI()
        # Local model as a low-cost fallback; note it produces a different
        # embedding dimension, so don't mix local and API vectors in one index
        self.local_model = SentenceTransformer('all-MiniLM-L6-v2')

    def generate_embeddings(self, texts, use_local=False):
        if use_local:
            return self.local_model.encode(texts)
        response = self.client.embeddings.create(
            input=texts,
            model=self.model_name
        )
        return [item.embedding for item in response.data]

Vector Database Integration
Pinecone Setup:
from pinecone import Pinecone

class VectorStore:
    def __init__(self, api_key, index_name="document-embeddings"):
        pc = Pinecone(api_key=api_key)
        self.index = pc.Index(index_name)

    def upsert_vectors(self, vectors, metadata):
        # Ids here are derived from batch position; derive them from the
        # source document in production to avoid collisions across batches
        vectors_to_upsert = [
            {'id': f"doc_{i}", 'values': vector, 'metadata': meta}
            for i, (vector, meta) in enumerate(zip(vectors, metadata))
        ]
        self.index.upsert(vectors=vectors_to_upsert)

    def search_similar(self, query_vector, top_k=5):
        results = self.index.query(
            vector=query_vector,
            top_k=top_k,
            include_metadata=True
        )
        return results['matches']
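With these pieces in place, indexing a document is a short pipeline. A minimal sketch, assuming the helpers above, valid OpenAI and Pinecone credentials, and a hypothetical handbook.pdf:

# Hypothetical ingestion run: extract, chunk, embed, and index one PDF
text = extract_text("handbook.pdf", "pdf")
chunks = chunk_text(text)
embeddings = EmbeddingService().generate_embeddings(chunks)
# Store title and content in metadata so they can be surfaced at query time
metadata = [{"title": "handbook.pdf", "content": chunk} for chunk in chunks]
store = VectorStore(api_key="YOUR_PINECONE_API_KEY")
store.upsert_vectors(embeddings, metadata)

RAG Query Processing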
1. Query Processing:
class RAGSystem:
    def __init__(self, embedding_service, vector_store, llm_service):
        self.embedding_service = embedding_service
        self.vector_store = vector_store
        self.llm_service = llm_service

    def process_query(self, user_query):
        # Step 1: Generate query embedding
        query_embedding = self.embedding_service.generate_embeddings([user_query])[0]
        # Step 2: Search for relevant documents
        similar_docs = self.vector_store.search_similar(query_embedding, top_k=5)
        # Step 3: Prepare context
        context = self._prepare_context(similar_docs)
        # Step 4: Generate response
        return self._generate_response(user_query, context)

    def _prepare_context(self, similar_docs):
        context = ""
        for doc in similar_docs:
            context += f"Document: {doc['metadata']['title']}\n"
            context += f"Content: {doc['metadata']['content']}\n\n"
        return context

    def _generate_response(self, query, context):
        prompt = f"""
Based on the following context, answer the user's question.

Context:
{context}

Question: {query}

Answer:
"""
        return self.llm_service.generate(prompt)

2. LLM Service:
import openai
from anthropic import Anthropic

class LLMService:
    def __init__(self, provider="openai", model=None):
        self.provider = provider
        if provider == "openai":
            self.client = openai.OpenAI()
            self.model = model or "gpt-3.5-turbo"
        elif provider == "anthropic":
            self.client = Anthropic()
            self.model = model or "claude-3-sonnet-20240229"

    def generate(self, prompt, max_tokens=1000):
        if self.provider == "openai":
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens
            )
            return response.choices[0].message.content
        elif self.provider == "anthropic":
            response = self.client.messages.create(
                model=self.model,
                max_tokens=max_tokens,
                messages=[{"role": "user", "content": prompt}]
            )
            return response.content[0].text
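Wiring it together: a minimal sketch of a full question-answering call, assuming the classes above and the index populated by the earlier ingestion sketch:

embedding_service = EmbeddingService()
vector_store = VectorStore(api_key="YOUR_PINECONE_API_KEY")
llm_service = LLMService(provider="openai")

rag = RAGSystem(embedding_service, vector_store, llm_service)
print(rag.process_query("How do I request VPN access?"))

Advanced RAG Features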
1. Hybrid Search:
class HybridSearch:
    def __init__(self, embedding_service, vector_store, text_search):
        self.embedding_service = embedding_service
        self.vector_store = vector_store
        self.text_search = text_search  # keyword engine, e.g. BM25/Elasticsearch

    def search(self, query, alpha=0.7):
        # Semantic search over the query embedding
        query_vector = self.embedding_service.generate_embeddings([query])[0]
        semantic_results = self.vector_store.search_similar(query_vector)
        # Keyword search over the raw query string
        keyword_results = self.text_search.search(query)
        # Combine results with a weighted score
        return self._combine_results(semantic_results, keyword_results, alpha)

    def _combine_results(self, semantic, keyword, alpha):
        # Merge by document id; a document missing from one result set
        # contributes a score of 0 for that retriever
        sem_by_id = {doc['id']: doc for doc in semantic}
        key_by_id = {doc['id']: doc for doc in keyword}
        combined = []
        for doc_id in set(sem_by_id) | set(key_by_id):
            sem_score = sem_by_id[doc_id]['score'] if doc_id in sem_by_id else 0.0
            key_score = key_by_id[doc_id]['score'] if doc_id in key_by_id else 0.0
            combined.append({
                'document': sem_by_id.get(doc_id, key_by_id.get(doc_id)),
                'score': alpha * sem_score + (1 - alpha) * key_score
            })
        return sorted(combined, key=lambda x: x['score'], reverse=True)

2. Query Expansion:
class QueryExpansion:
    def __init__(self, llm_service):
        self.llm_service = llm_service

    def expand_query(self, original_query):
        prompt = f"""
Expand the following query to include related terms and synonyms:

Original query: {original_query}

Expanded query:
"""
        return self.llm_service.generate(prompt, max_tokens=200)
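Expansion runs before retrieval, so the expanded string is what gets embedded. A hypothetical call, reusing the llm_service and rag objects from the earlier sketch:

expander = QueryExpansion(llm_service)
expanded = expander.expand_query("vpn setup")
print(rag.process_query(expanded))

Course Structure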
1. LLM Fundamentals (1 hour)
- Model types and capabilities
- API integration patterns
- Cost optimization strategies
- Performance considerations
2. Vector Databases (1 hour)
- Embedding models and techniques
- Vector similarity search
- Database selection criteria
- Scaling vector operations
3. RAG Systems (1.5 hours)
- Document processing pipelines
- Retrieval strategies
- Generation optimization
- Evaluation metrics
4. Advanced AI Features (1.5 hours)
- Multi-modal systems
- Real-time AI applications
- AI safety and guardrails
- Monitoring and observability
5. Production Considerations (1 hour)
- Model serving infrastructure
- Caching strategies (see the sketch after this list)
- Error handling
- Cost management
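As a taste of the caching material in part 5, here is a minimal sketch of an exact-match response cache wrapped around the LLMService above. The class name and in-memory dict are illustrative; production systems usually back this with Redis and add TTLs or embedding-based (semantic) matching:

import hashlib

class CachedLLMService:
    # Illustrative exact-match cache; drop-in replacement for LLMService.generate
    def __init__(self, llm_service):
        self.llm_service = llm_service
        self.cache = {}  # in-memory for the sketch; use Redis in production

    def generate(self, prompt, max_tokens=1000):
        # Identical prompt + parameters -> identical key -> no repeat API call
        key = hashlib.sha256(f"{max_tokens}|{prompt}".encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.llm_service.generate(prompt, max_tokens)
        return self.cache[key]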
Design Patterns
1. Chain of Responsibility:
class AIProcessingChain:
    def __init__(self):
        self.handlers = []

    def add_handler(self, handler):
        self.handlers.append(handler)

    def process(self, request):
        # Each handler transforms the request; a handler can set
        # request.should_stop to short-circuit the rest of the chain
        for handler in self.handlers:
            request = handler.handle(request)
            if request.should_stop:
                break
        return request

2. Strategy Pattern:
class ModelStrategy:
    def __init__(self, strategy):
        self.strategy = strategy

    def generate(self, prompt):
        return self.strategy.execute(prompt)

class GPTStrategy:
    def execute(self, prompt):
        # OpenAI GPT implementation
        response = openai.OpenAI().chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

class ClaudeStrategy:
    def execute(self, prompt):
        # Anthropic Claude implementation
        response = Anthropic().messages.create(
            model="claude-3-sonnet-20240229",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text
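A hypothetical call site, showing how the provider can be swapped without touching the caller:

model = ModelStrategy(GPTStrategy())
print(model.generate("Summarize our refund policy."))

model.strategy = ClaudeStrategy()  # switch providers at runtime
print(model.generate("Summarize our refund policy."))

Common Interview Questions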
1. How would you design a chatbot that can answer questions about company documentation?
2. How do you handle rate limiting and cost optimization for LLM APIs?
3. What strategies would you use to improve retrieval accuracy in a RAG system?
4. How would you implement real-time AI features in a mobile app?
5. How do you ensure AI safety and prevent harmful outputs?
What's Next
After mastering GenAI system design, you'll be able to build sophisticated AI-powered applications that leverage the latest language models and AI infrastructure. You'll understand how to design systems that are both powerful and cost-effective.