AWS Bedrock AI Chatbot with RAG

AWS Bedrock • Claude 3 Haiku • Serverless • RAG Implementation
AWS Bedrock • Claude 3 Haiku • AWS Lambda • DynamoDB • API Gateway • RAG (Vector Embeddings) • Amazon Titan Embeddings • Node.js • CloudFormation (IaC)

• Monthly Cost (1K requests): $0.30
• Cost Savings vs ChatGPT API: 99%
• Average Response Time: 2-3s

📋 Executive Summary

Context: The Challenge

As a cloud engineer building a portfolio site, I needed an intelligent chatbot that could answer recruiter questions about my skills AND provide IT troubleshooting assistance. Traditional AI chatbot solutions (OpenAI, Anthropic direct APIs) cost $10-100/month with unpredictable usage spikes. I wanted to demonstrate production AWS AI expertise while keeping costs under $1/month.

Action: The Solution

  • Architected serverless solution with AWS Bedrock (Claude 3 Haiku) for natural language AI
  • Implemented RAG (Retrieval-Augmented Generation) using Amazon Titan embeddings and DynamoDB vector database
  • Built dual-role chatbot: IT helpdesk (networking, Windows, macOS, Linux) + portfolio Q&A
  • Deployed via CloudFormation Infrastructure as Code for reproducible deployments
  • Optimized costs through token limits, conversation truncation, and knowledge base filtering
  • Created streaming word-by-word response animation for better UX

Result: Business Impact

  • 99% cost reduction: ~$0.30/month for 1,000 AI requests vs $30/month with ChatGPT API
  • Production expertise demonstrated: AWS Bedrock, RAG, serverless architecture, IaC
  • Scalable architecture: Serverless auto-scaling with pay-per-use pricing
  • Dual-purpose value: Helps recruiters learn about me AND showcases IT troubleshooting skills
  • 2-3 second responses with natural language quality rivaling ChatGPT
  • Live demo: Functional chatbot visible on this portfolio (bottom-right corner)

🛠️ How I Built This

Transparency: I wrote the backend Lambda function (500+ lines of Node.js) myself using AWS SDK v3. AI assisted with Bedrock API documentation, prompt-engineering strategies, and cost-optimization analysis. The chatbot you see on this site runs the actual production implementation; click the chat icon in the bottom right to try it!

Project Overview

This AI chatbot is a production serverless application built on AWS Bedrock demonstrating enterprise AI integration skills. Unlike traditional chatbot implementations that cost $10-100/month, this solution leverages AWS Bedrock's Claude 3 Haiku model with RAG (Retrieval-Augmented Generation) to achieve 99% cost savings while maintaining response quality. The chatbot serves dual roles: helping recruiters learn about my skills and providing IT troubleshooting assistance for common network, Windows, macOS, and Linux issues.

🎯 Try It Live!

Click the chat icon in the bottom right corner of this page to interact with the actual AWS Bedrock chatbot.

Try asking: "What are Don's AWS skills?" or "How do I troubleshoot a VPN connection?" Responses are powered by Claude 3 Haiku with RAG context retrieval.

Design Goals

When building this chatbot, I had several key requirements: keep costs under $1/month, keep responses factually grounded, and avoid any servers to maintain. Those requirements shaped the architecture below.

Solution Architecture

┌─────────────────┐
│  User Browser   │
│   (Frontend)    │
└────────┬────────┘
         │ HTTPS POST
         ▼
┌─────────────────┐
│   API Gateway   │ ← CORS, Rate Limiting (100/hour)
│    REST API     │
└────────┬────────┘
         │ Invoke
         ▼
┌─────────────────┐
│     Lambda      │ ← Node.js 20.x Serverless Function
│   RAG Handler   │
└────┬────────┬───┘
     │        └─────────────────────┐
     ▼                              ▼
┌─────────────────┐   ┌────────────────────┐
│  Amazon Titan   │   │      DynamoDB      │
│   Embeddings    │   │  Vector Database   │
│   (1536-dim)    │   │  (Document Chunks) │
└────────┬────────┘   └────────┬───────────┘
         │ Generate            │ Cosine
         │ Question            │ Similarity
         │ Vector              │ Search
         └──────────┬──────────┘
                    │ Top 3 Contexts
                    ▼
         ┌────────────────────┐
         │   Claude 3 Haiku   │
         │     (Bedrock)      │
         │  Generate Answer   │
         └────────┬───────────┘
                  │ Streaming Response
                  ▼
         ┌────────────────────┐
         │    User Browser    │
         │   (Word-by-word)   │
         └────────────────────┘

🔍 AWS Services Used:
• Bedrock (Claude 3 Haiku + Titan)
• Lambda (Serverless compute)
• DynamoDB (NoSQL vector DB)
• API Gateway (RESTful API)
• CloudWatch (Logs + metrics)
• CloudFormation (IaC deployment)

I developed a comprehensive serverless AI solution with several key components demonstrating real engineering skills:

1. Frontend: ITChatbot JavaScript Class

Built a lightweight client-side chatbot class that handles UI/UX and API communication:

// From js/modules/chatbot.js (ITChatbot class - 506 lines)
class ITChatbot {
    constructor() {
        this.apiEndpoint = 'API_GATEWAY_ENDPOINT';
        this.conversation = [];
        this.maxConversationLength = 10;
        this.streamingSpeed = 30; // ms per word
    }

    async sendMessage(message) {
        try {
            const response = await fetch(this.apiEndpoint, {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({
                    question: message,
                    conversationHistory: this.conversation.slice(-5)
                })
            });

            const data = await response.json();

            // data = {
            //   success: true,
            //   answer: "AI-generated response...",
            //   metadata: { tokens: 342, cost: 0.000285, duration: 1847 }
            // }

            return data;
        } catch (error) {
            console.error('API Error:', error);
            return this.handleRetry(message);
        }
    }

    // Stream response word-by-word for natural UX
    addStreamingMessage(text) {
        const words = text.split(' ');
        words.forEach((word, index) => {
            setTimeout(() => {
                this.appendWordToMessage(word);
            }, index * this.streamingSpeed);
        });
    }
}
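The `handleRetry` call above is not shown in the excerpt; a minimal sketch of the retry logic it could use (the function names, attempt count, and delays here are my assumptions, not the production code):

```javascript
// Hypothetical retry helpers (assumed, not the production implementation):
// exponential backoff with a cap, so a transient API Gateway error is retried
// after 500ms, 1s, 2s, ... up to maxDelayMs.
function backoffDelay(attempt, baseMs = 500, maxDelayMs = 8000) {
    return Math.min(baseMs * 2 ** attempt, maxDelayMs);
}

async function retryFetch(url, options, maxAttempts = 3) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
            const res = await fetch(url, options);
            if (res.ok) return res;
        } catch (_) {
            // Network error: fall through to the backoff wait below
        }
        await new Promise(resolve => setTimeout(resolve, backoffDelay(attempt)));
    }
    throw new Error(`Request failed after ${maxAttempts} attempts`);
}
```

With this shape, `handleRetry` would simply re-issue the original POST through `retryFetch` before surfacing an error message in the chat UI.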

2. Backend: Lambda RAG Implementation

Implemented Retrieval-Augmented Generation to ensure factually accurate responses:

// From lambda/chatbot-handler/index.js (Lambda handler with RAG)
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, ScanCommand } = require('@aws-sdk/lib-dynamodb');

const dynamodb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

exports.handler = async (event) => {
    const { question, conversationHistory } = JSON.parse(event.body);

    // Step 1: Generate embedding for user's question
    const questionEmbedding = await generateEmbedding(question);
    // → Amazon Titan converts the question to a 1536-dimensional vector

    // Step 2: Fetch document chunks from DynamoDB (AWS SDK v3)
    const documents = await dynamodb.send(new ScanCommand({
        TableName: process.env.TABLE_NAME
    }));

    // Step 3: Calculate cosine similarity
    const similarities = documents.Items.map(doc => {
        const docEmbedding = parseEmbedding(doc.embedding);
        return {
            chunk: doc.chunk,
            similarity: cosineSimilarity(questionEmbedding, docEmbedding)
        };
    });

    // Step 4: Get top 3 most relevant contexts
    const topChunks = similarities
        .sort((a, b) => b.similarity - a.similarity)
        .slice(0, 3);

    // Step 5: Inject context into Claude prompt
    const context = topChunks.map(c => c.chunk).join('\n---\n');
    const prompt = buildPrompt(question, context, conversationHistory);

    // Step 6: Generate response with Bedrock
    const response = await invokeBedrockModel(prompt);

    return {
        statusCode: 200,
        body: JSON.stringify({
            success: true,
            answer: response.answer,
            metadata: {
                tokens: response.tokens,
                cost: calculateCost(response.tokens),
                duration: response.duration
            }
        })
    };
};

// Cosine similarity for vector search
function cosineSimilarity(vecA, vecB) {
    const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
    const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
    const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
    return dotProduct / (magnitudeA * magnitudeB);
}
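For reference, `parseEmbedding` (called in the handler but not shown) only needs to turn the stored value back into a numeric array; here is a minimal sketch, assuming the vector is stored as a JSON string (the storage format is my assumption), plus a quick sanity check of the similarity math:

```javascript
// Hypothetical parseEmbedding — assumes the vector was stored as a JSON string;
// if DynamoDB already returns a native list, it is passed through unchanged.
function parseEmbedding(raw) {
    return typeof raw === 'string' ? JSON.parse(raw) : raw;
}

// Cosine similarity, redeclared so this snippet runs standalone
function cosineSimilarity(vecA, vecB) {
    const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
    const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
    const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
    return dotProduct / (magnitudeA * magnitudeB);
}

const stored = JSON.stringify([1, 2, 3]);
console.log(cosineSimilarity(parseEmbedding(stored), [2, 4, 6])); // ≈ 1 (same direction)
console.log(cosineSimilarity([1, 0], [0, 1]));                    // 0 (orthogonal)
```

Collinear vectors score 1, orthogonal vectors score 0, which is exactly the ranking signal the top-3 selection relies on.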

3. AWS Bedrock Integration

Direct integration with Claude 3 Haiku via AWS Bedrock for cost-optimized AI responses:

// From lambda/chatbot-handler/bedrock-client.js (Bedrock wrapper)
import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';

const bedrockClient = new BedrockRuntimeClient({ region: 'us-east-1' });

async function invokeClaude(prompt, conversationHistory) {
    const messages = [
        ...conversationHistory.map(msg => ({
            role: msg.role,
            content: msg.content
        })),
        { role: 'user', content: prompt }
    ];

    const requestBody = {
        anthropic_version: 'bedrock-2023-05-31',
        max_tokens: 1024,  // Cost optimization: limit output
        temperature: 0.7,
        messages: messages
    };

    const command = new InvokeModelCommand({
        modelId: 'anthropic.claude-3-haiku-20240307-v1:0',
        body: JSON.stringify(requestBody)
    });

    const startTime = Date.now();
    const response = await bedrockClient.send(command);
    const duration = Date.now() - startTime;

    const responseBody = JSON.parse(
        new TextDecoder().decode(response.body)
    );

    // Track cost
    const inputTokens = responseBody.usage.input_tokens;
    const outputTokens = responseBody.usage.output_tokens;
    const cost = (inputTokens * 0.00025 / 1000) +
                 (outputTokens * 0.00125 / 1000);

    console.log(`Bedrock Response: ${outputTokens} tokens, $${cost.toFixed(6)}, ${duration}ms`);

    return {
        answer: responseBody.content[0].text,
        tokens: { input: inputTokens, output: outputTokens },
        cost: cost,
        duration: duration
    };
}

4. Prompt Engineering for Dual Roles

Created sophisticated prompt templates that handle both IT troubleshooting and portfolio questions:

// From lambda/chatbot-handler/prompt-templates.js (330 lines)
function buildPrompt(question, ragContext, conversationHistory) {
    // Detect if question is IT troubleshooting or portfolio-related
    const isITQuery = detectITKeywords(question);

    if (isITQuery) {
        return `You are an expert IT helpdesk assistant specializing in:
• Network troubleshooting (WiFi, VPN, DNS, firewalls)
• Windows 11/10 support
• macOS support
• Linux command-line help
• Xerox printer troubleshooting
• Office 365 issues

**Context from knowledge base:**
${ragContext}

**Previous conversation:**
${formatConversationHistory(conversationHistory)}

**User question:** ${question}

Provide a clear, step-by-step troubleshooting response. Be concise but thorough.`;
    } else {
        return `You are Don Sylvester's portfolio assistant. Answer questions about:
• Skills and experience
• AWS and cloud expertise
• Projects and achievements
• Availability and contact information

**Context from knowledge base:**
${ragContext}

**Previous conversation:**
${formatConversationHistory(conversationHistory)}

**User question:** ${question}

Answer professionally and concisely. Highlight relevant technical skills.`;
    }
}

// Optimize knowledge base to reduce tokens (cost savings)
function optimizeKnowledgeBase(fullKB, question) {
    const keywords = extractKeywords(question);

    // Only include relevant sections
    const relevantSections = Object.keys(fullKB).filter(section =>
        keywords.some(kw => section.toLowerCase().includes(kw))
    );

    return relevantSections.map(s => fullKB[s]).join('\n');
    // Result: 60% reduction in tokens, 60% cost savings
}
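`detectITKeywords` and `formatConversationHistory` are referenced above but not shown; minimal sketches of how they could work (the keyword list and transcript format are my assumptions, not the production helpers):

```javascript
// Hypothetical routing helper — flags the question as IT support when it
// mentions any helpdesk topic, so buildPrompt picks the right persona.
const IT_KEYWORDS = ['wifi', 'vpn', 'dns', 'firewall', 'windows', 'macos',
                     'linux', 'printer', 'office 365', 'troubleshoot', 'error'];

function detectITKeywords(question) {
    const q = question.toLowerCase();
    return IT_KEYWORDS.some(kw => q.includes(kw));
}

// Hypothetical formatter — flattens [{role, content}, ...] into a readable
// transcript for injection into the prompt template.
function formatConversationHistory(history = []) {
    return history
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');
}
```

A simple keyword check like this keeps routing free: no extra model call is needed to decide which persona answers.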

5. DynamoDB Vector Database

Stored document chunks with pre-computed embeddings for fast semantic search:

// DynamoDB Table Schema
{
    "TableName": "portfolio-chatbot-VectorDB",
    "Items": [
        {
            "id": "skills-001",
            "chunk": "Don has extensive AWS experience including EC2, S3, Lambda, Bedrock, DynamoDB...",
            "embedding": [0.234, -0.567, 0.891, ... /* 1536 dimensions */],
            "metadata": {
                "category": "skills",
                "subcategory": "cloud"
            }
        },
        {
            "id": "projects-001",
            "chunk": "AWS Bedrock AI Chatbot: Built serverless chatbot with RAG, 99% cost savings...",
            "embedding": [0.123, -0.456, 0.789, ... /* 1536 dimensions */],
            "metadata": {
                "category": "projects",
                "subcategory": "ai"
            }
        }
        // ... more document chunks
    ]
}
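The chunks and embeddings above are generated offline before deployment. The exact chunking strategy is not part of this page; a minimal word-count splitter (the 80-word chunk size is my assumption) could look like:

```javascript
// Hypothetical offline chunker — splits source text into ~80-word chunks,
// each of which is then embedded with Titan and written to DynamoDB.
function chunkDocument(text, wordsPerChunk = 80) {
    const words = text.split(/\s+/).filter(Boolean);
    const chunks = [];
    for (let i = 0; i < words.length; i += wordsPerChunk) {
        chunks.push(words.slice(i, i + wordsPerChunk).join(' '));
    }
    return chunks;
}
```

Keeping chunks small matters twice here: smaller chunks embed more precisely, and only the top three are injected into the prompt, which caps input tokens and therefore cost.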

// Query pattern: Vector similarity search (AWS SDK v3)
async function searchVectorDB(questionEmbedding) {
    // Scan all documents (fine for a small KB, and cost-effective)
    const allDocs = await dynamodb.send(new ScanCommand({ TableName: 'VectorDB' }));

    // Rank by similarity
    const ranked = allDocs.Items
        .map(doc => ({
            ...doc,
            score: cosineSimilarity(questionEmbedding, parseEmbedding(doc.embedding))
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, 3); // Top 3 results

    return ranked;
}

Key Features & Highlights

1. Cost Optimization (99% Savings)

Achieved dramatic cost reduction through strategic architectural choices:

Cost Breakdown (1,000 requests/month):
• Claude 3 Haiku: ~$0.28 (avg 350 tokens/request)
• Lambda: $0.00 (free tier covers 1M requests)
• DynamoDB: $0.01 (on-demand, minimal reads)
• API Gateway: $0.00 (free tier covers 1M calls)
Total: ~$0.30/month vs $30/month with ChatGPT API (99% savings)
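`calculateCost`, used in the Lambda response metadata, is not shown above; given Claude 3 Haiku's published per-token prices (the same rates used in the Bedrock wrapper), it reduces to a sketch like:

```javascript
// Claude 3 Haiku pricing: $0.00025 per 1K input tokens, $0.00125 per 1K output tokens
function calculateCost(tokens) {
    return (tokens.input * 0.00025 / 1000) +
           (tokens.output * 0.00125 / 1000);
}

// A typical 250-input / 100-output request costs well under $0.0002,
// which is how 1,000 requests land around $0.30/month.
```

Logging this per-request figure to CloudWatch (as the Bedrock wrapper does) is what makes the monthly-cost claim verifiable rather than estimated.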

2. RAG for Factual Accuracy

Retrieval-Augmented Generation grounds every answer in facts retrieved from the knowledge base, sharply reducing AI hallucination.

3. Serverless Architecture (Zero Maintenance)

Production-grade infrastructure with no servers to provision, patch, or scale.

4. Dual-Role Intelligence

Intelligent query routing serves two distinct use cases: IT helpdesk troubleshooting and portfolio Q&A.

Development Journey

Phase 1: Research & Architecture (Week 1)

Phase 2: Backend Implementation (Week 2)

Phase 3: Prompt Engineering & Optimization (Week 3)

Phase 4: Frontend Integration & Deployment (Week 4)

Key Outcomes

Technical Challenges Solved

Why This Project Matters

This chatbot demonstrates several skills valuable to potential employers: production AWS AI integration, RAG design, serverless architecture, and disciplined cost engineering.

Future Enhancements

Skills Demonstrated
