GenAI System Design


Design AI-powered systems using Large Language Models, vector databases, and modern AI infrastructure.


What You'll Learn


  • LLM Integration: Work with OpenAI, Anthropic, and open-source models
  • Vector Databases: Implement semantic search and RAG systems
  • Prompt Engineering: Design effective prompts for different use cases
  • AI Infrastructure: Handle model serving, caching, and scaling
  • Cost Optimization: Minimize API costs while maintaining performance
  • AI Safety: Implement guardrails and content filtering

Preview Chapter: Building a RAG System


Design a Retrieval-Augmented Generation (RAG) system that can answer questions about your company's documentation.


RAG Architecture Overview


┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   User      │───▶│   API       │───▶│   LLM       │
│  Query      │    │  Gateway    │    │  Service    │
└─────────────┘    └─────────────┘    └─────────────┘
                           │                   │
                           ▼                   ▼
                   ┌─────────────┐    ┌─────────────┐
                   │  Vector     │    │  Embedding  │
                   │  Database   │    │  Service    │
                   │ (Pinecone)  │    │             │
                   └─────────────┘    └─────────────┘
                           │
                           ▼
                   ┌─────────────┐
                   │ Document    │
                   │ Processing  │
                   │ Pipeline    │
                   └─────────────┘

Document Processing Pipeline


1. Text Extraction:

import PyPDF2
import docx
from bs4 import BeautifulSoup

def extract_text(file_path, file_type):
    """Extract plain text from PDF, DOCX, or HTML files."""
    if file_type == 'pdf':
        with open(file_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            # extract_text() can return None for image-only pages
            text = "".join(page.extract_text() or "" for page in reader.pages)
        return text

    elif file_type == 'docx':
        doc = docx.Document(file_path)
        return "\n".join(paragraph.text for paragraph in doc.paragraphs)

    elif file_type == 'html':
        with open(file_path, 'r', encoding='utf-8') as file:
            soup = BeautifulSoup(file.read(), 'html.parser')
            return soup.get_text()

    raise ValueError(f"Unsupported file type: {file_type}")

2. Text Chunking:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        separators=["\n\n", "\n", " ", ""]
    )
    
    chunks = splitter.split_text(text)
    return chunks
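
A quick usage sketch (the file name is hypothetical; chunk sizes are approximate because the splitter prefers paragraph and sentence boundaries):

text = extract_text("handbook.pdf", "pdf")
chunks = chunk_text(text)
# Consecutive chunks share up to 200 characters of overlap so an idea
# split across a boundary still appears intact in at least one chunk.
print(f"{len(chunks)} chunks, first chunk is {len(chunks[0])} chars")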

3. Embedding Generation:

import openai
from sentence_transformers import SentenceTransformer

class EmbeddingService:
    def __init__(self, model_name="text-embedding-ada-002"):
        self.model_name = model_name
        self.client = openai.OpenAI()
        # Local fallback; note the dimension mismatch: all-MiniLM-L6-v2
        # produces 384-dim vectors, text-embedding-ada-002 produces 1536-dim,
        # so the vector index must match whichever model you standardize on.
        self.local_model = SentenceTransformer('all-MiniLM-L6-v2')
    
    def generate_embeddings(self, texts, use_local=False):
        if use_local:
            return self.local_model.encode(texts)
        # openai>=1.0 client API (the legacy openai.Embedding.create is removed)
        response = self.client.embeddings.create(
            input=texts,
            model=self.model_name
        )
        return [item.embedding for item in response.data]

Vector Database Integration


Pinecone Setup:

from pinecone import Pinecone

class VectorStore:
    def __init__(self, api_key, index_name="document-embeddings"):
        pc = Pinecone(api_key=api_key)
        self.index = pc.Index(index_name)
    
    def upsert_vectors(self, vectors, metadata):
        # Note: use globally unique ids in production; sequential ids
        # like these will overwrite earlier batches.
        vectors_to_upsert = []
        for i, (vector, meta) in enumerate(zip(vectors, metadata)):
            vectors_to_upsert.append({
                'id': f"doc_{i}",
                'values': vector,
                'metadata': meta
            })
        
        self.index.upsert(vectors=vectors_to_upsert)
    
    def search_similar(self, query_vector, top_k=5):
        results = self.index.query(
            vector=query_vector,
            top_k=top_k,
            include_metadata=True
        )
        return results['matches']
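
Putting the pipeline together, a minimal ingestion sketch (the file name, title, and API key placeholder are hypothetical):

def ingest_document(file_path, file_type, title, embedding_service, vector_store):
    # Extract -> chunk -> embed -> upsert, mirroring the architecture diagram
    text = extract_text(file_path, file_type)
    chunks = chunk_text(text)
    embeddings = embedding_service.generate_embeddings(chunks)
    # Store the chunk text and title so retrieval can rebuild context later
    metadata = [{'title': title, 'content': chunk} for chunk in chunks]
    vector_store.upsert_vectors(embeddings, metadata)

embedding_service = EmbeddingService()
vector_store = VectorStore(api_key="YOUR_PINECONE_API_KEY")
ingest_document("handbook.pdf", "pdf", "Employee Handbook",
                embedding_service, vector_store)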

RAG Query Processing


1. Query Processing:

class RAGSystem:
    def __init__(self, embedding_service, vector_store, llm_service):
        self.embedding_service = embedding_service
        self.vector_store = vector_store
        self.llm_service = llm_service
    
    def process_query(self, user_query):
        # Step 1: Generate query embedding
        query_embedding = self.embedding_service.generate_embeddings([user_query])[0]
        
        # Step 2: Search for relevant documents
        similar_docs = self.vector_store.search_similar(query_embedding, top_k=5)
        
        # Step 3: Prepare context
        context = self._prepare_context(similar_docs)
        
        # Step 4: Generate response
        response = self._generate_response(user_query, context)
        
        return response
    
    def _prepare_context(self, similar_docs):
        context = ""
        for doc in similar_docs:
            context += f"Document: {doc['metadata']['title']}\n"
            context += f"Content: {doc['metadata']['content']}\n\n"
        return context
    
    def _generate_response(self, query, context):
        prompt = f"""
        Based on the following context, answer the user's question.
        
        Context:
        {context}
        
        Question: {query}
        
        Answer:
        """
        
        response = self.llm_service.generate(prompt)
        return response

2. LLM Service:

import openai
from anthropic import Anthropic

class LLMService:
    def __init__(self, provider="openai", model="gpt-3.5-turbo"):
        # Pass a model that matches the provider, e.g. "gpt-3.5-turbo"
        # for openai or "claude-3-sonnet-20240229" for anthropic
        self.provider = provider
        self.model = model
        
        if provider == "openai":
            self.client = openai.OpenAI()
        elif provider == "anthropic":
            self.client = Anthropic()
        else:
            raise ValueError(f"Unsupported provider: {provider}")
    
    def generate(self, prompt, max_tokens=1000):
        if self.provider == "openai":
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens
            )
            return response.choices[0].message.content
        
        # anthropic
        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text
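
With all three services in place, a minimal end-to-end query looks like this (the API key placeholder and the question are illustrative):

rag = RAGSystem(
    embedding_service=EmbeddingService(),
    vector_store=VectorStore(api_key="YOUR_PINECONE_API_KEY"),
    llm_service=LLMService(provider="openai", model="gpt-3.5-turbo")
)

answer = rag.process_query("How many vacation days do new employees get?")
print(answer)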

Advanced RAG Features


1. Hybrid Search:

class HybridSearch:
    def __init__(self, vector_store, text_search):
        self.vector_store = vector_store
        self.text_search = text_search
    
    def search(self, query_text, query_vector, alpha=0.7):
        # Semantic search operates on the query embedding;
        # keyword search operates on the raw query text
        semantic_results = self.vector_store.search_similar(query_vector)
        keyword_results = self.text_search.search(query_text)
        
        # Combine results with a weighted score
        return self._combine_results(semantic_results, keyword_results, alpha)
    
    def _combine_results(self, semantic, keyword, alpha):
        # Merge by document id (assumes both score sets are already
        # normalized to comparable ranges, e.g. [0, 1]). A document found
        # by both methods gets alpha * semantic + (1 - alpha) * keyword;
        # a document found by only one method keeps its weighted score.
        scores = {}
        for doc in semantic:
            scores[doc['id']] = alpha * doc['score']
        for doc in keyword:
            scores[doc['id']] = scores.get(doc['id'], 0.0) + (1 - alpha) * doc['score']
        
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return [{'id': doc_id, 'score': score} for doc_id, score in ranked]

2. Query Expansion:

class QueryExpansion:
    def __init__(self, llm_service):
        self.llm_service = llm_service
    
    def expand_query(self, original_query):
        prompt = f"""
        Expand the following query to include related terms and synonyms:
        
        Original query: {original_query}
        
        Expanded query:
        """
        
        expanded = self.llm_service.generate(prompt, max_tokens=200)
        return expanded
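
Usage is a single call; the expanded text can then be embedded in place of the raw query (the output shown is hypothetical):

expander = QueryExpansion(LLMService())
expanded = expander.expand_query("reset password")
# Hypothetical output: "reset password, forgot password,
# account recovery, change login credentials"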

Course Structure


1. LLM Fundamentals (1 hour)
- Model types and capabilities
- API integration patterns
- Cost optimization strategies
- Performance considerations

2. Vector Databases (1 hour)
- Embedding models and techniques
- Vector similarity search
- Database selection criteria
- Scaling vector operations

3. RAG Systems (1.5 hours)
- Document processing pipelines
- Retrieval strategies
- Generation optimization
- Evaluation metrics

4. Advanced AI Features (1.5 hours)
- Multi-modal systems
- Real-time AI applications
- AI safety and guardrails
- Monitoring and observability

5. Production Considerations (1 hour)
- Model serving infrastructure
- Caching strategies (see the sketch after this list)
- Error handling
- Cost management
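
As a preview of the production module's caching topic, a minimal response-cache sketch (in-memory and keyed on a prompt hash; a real deployment would likely use Redis with a TTL):

import hashlib

class CachedLLMService:
    """Wraps an LLMService so identical prompts hit the API only once."""
    
    def __init__(self, llm_service):
        self.llm_service = llm_service
        self.cache = {}
    
    def generate(self, prompt, max_tokens=1000):
        # Hash prompt + parameters so the cache key stays small and uniform
        key = hashlib.sha256(f"{max_tokens}:{prompt}".encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.llm_service.generate(prompt, max_tokens)
        return self.cache[key]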


Design Patterns


1. Chain of Responsibility:

class AIProcessingChain:
    def __init__(self):
        self.handlers = []
    
    def add_handler(self, handler):
        self.handlers.append(handler)
    
    def process(self, request):
        for handler in self.handlers:
            request = handler.handle(request)
            if request.should_stop:
                break
        return request
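
The chain assumes a request object that carries a should_stop flag. A minimal sketch of that object plus one concrete guardrail handler (the class names and blocked terms are hypothetical):

from dataclasses import dataclass

@dataclass
class AIRequest:
    text: str
    response: str = ""
    should_stop: bool = False

class ModerationHandler:
    # Hypothetical guardrail: refuse requests that mention blocked topics
    BLOCKED_TERMS = ("credit card number", "social security number")
    
    def handle(self, request):
        if any(term in request.text.lower() for term in self.BLOCKED_TERMS):
            request.response = "I can't help with that request."
            request.should_stop = True  # short-circuit the rest of the chain
        return request

chain = AIProcessingChain()
chain.add_handler(ModerationHandler())
result = chain.process(AIRequest(text="What is our refund policy?"))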

2. Strategy Pattern:

class ModelStrategy:
    def __init__(self, strategy):
        self.strategy = strategy
    
    def generate(self, prompt):
        return self.strategy.execute(prompt)

class GPTStrategy:
    def __init__(self):
        self.client = openai.OpenAI()
    
    def execute(self, prompt):
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

class ClaudeStrategy:
    def __init__(self):
        self.client = Anthropic()
    
    def execute(self, prompt):
        response = self.client.messages.create(
            model="claude-3-sonnet-20240229",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text
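
Swapping providers then becomes a one-line change at the call site:

model = ModelStrategy(GPTStrategy())
print(model.generate("Summarize our refund policy in two sentences."))

model = ModelStrategy(ClaudeStrategy())  # switch models without touching callers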

Common Interview Questions


1. How would you design a chatbot that can answer questions about company documentation?
2. How do you handle rate limiting and cost optimization for LLM APIs?
3. What strategies would you use to improve retrieval accuracy in a RAG system?
4. How would you implement real-time AI features in a mobile app?
5. How do you ensure AI safety and prevent harmful outputs?


What's Next


After mastering GenAI system design, you'll be able to build sophisticated AI-powered applications that leverage the latest language models and AI infrastructure. You'll understand how to design systems that are both powerful and cost-effective.
