AWS Cloud Optimization

Cost Efficiency • Performance • Resource Management
AWS Cost Explorer • CloudWatch • Trusted Advisor • Lambda Functions • Auto Scaling • Elastic Load Balancing • Python
60-70% Dev/Test Cost Reduction
24/7 Automated Scheduling
Multi-Region Global Support

📋 Executive Summary

Context: The Challenge

As a solo developer running multiple development and staging environments on AWS, I faced the common problem of paying for resources 24/7 even though I only actively used them during working hours. My AWS bills were unnecessarily high because EC2 instances, databases, and other resources remained running overnight, on weekends, and during periods when I wasn't actively developing.

Action: The Solution

  • Built an intelligent EC2 Auto-Scheduler using AWS Lambda and Python with boto3
  • Implemented timezone-aware scheduling that respects different working hours across regions
  • Created a flexible tag-based system that allows per-instance schedule overrides alongside global policies stored in SSM Parameter Store
  • Added multi-region support to manage instances across different AWS regions from a single Lambda function
  • Integrated CloudWatch metrics for tracking started/stopped instances and cost savings
  • Built comprehensive error handling, dry-run mode, and structured JSON logging for monitoring

Result: Business Impact

  • Personal Cost Savings: Reduced my monthly AWS bill by automatically shutting down dev/staging resources outside business hours (weeknights and weekends)
  • Enterprise Potential: The same solution scales to organizations with hundreds of instances across multiple regions, potentially saving thousands monthly
  • Zero-Touch Automation: Instances automatically start during working hours and stop after hours with no manual intervention required
  • Flexible Scheduling: Supports custom schedules via tags or centralized policies, accommodating different teams and timezones
  • Production-Safe: Environment-based filtering ensures production workloads are never touched
  • Observable & Auditable: CloudWatch metrics and structured logging provide complete visibility into scheduler actions

🛠️ How I Built This

Transparency: This project leveraged AI for debugging complex boto3 interactions, validating right-sizing recommendations, and creating comprehensive documentation. Core architecture, optimization logic, and implementation were self-developed based on AWS best practices and hands-on experience.

Project Overview

As a solo developer managing multiple AWS projects, I built an intelligent EC2 Auto-Scheduler to solve a simple but expensive problem: I was paying for development and staging resources 24/7 even though I only used them during working hours. This Lambda-based solution automatically starts and stops EC2 instances based on customizable schedules, significantly reducing costs without impacting productivity.

The Problem I Solved

Like many developers running cloud infrastructure, I faced several challenges:

Key Features & Benefits

For Solo Developers

For Organizations

Technical Architecture

The EC2 Auto-Scheduler is built as a serverless Lambda function that runs on a schedule (typically every hour via EventBridge):

1. Intelligent Scheduling Engine

2. Multi-Region Support

3. Tagging Strategy

4. Robust Error Handling

How It Works: Step-by-Step

1. Initialization & Configuration

2. Instance Discovery

3. Schedule Evaluation

4. Action Execution

5. Metrics & Reporting

Key Outcomes

Technical Highlights

The EC2 Auto-Scheduler demonstrates several production-grade engineering practices:

1. Timezone-Aware Scheduling

Unlike simple time-based schedulers, this solution respects timezones using Python's zoneinfo module:
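A minimal sketch of that timezone handling, assuming schedules use 24-hour "HH:MM" strings and three-letter lowercase day names as in the Schedule tag examples. The helper names now_in_tz and is_within_window are illustrative rather than the project's actual code, and this version only handles windows that start and end on the same day.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DAY_NAMES = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")

def now_in_tz(tz_name, utc_now=None):
    """Return ("HH:MM", day_name) for an IANA timezone such as "Europe/Berlin".

    utc_now is injectable so the function can be tested with a fixed instant.
    """
    utc_now = utc_now or datetime.now(timezone.utc)
    local = utc_now.astimezone(ZoneInfo(tz_name))
    return local.strftime("%H:%M"), DAY_NAMES[local.weekday()]

def is_within_window(current_time, current_day, schedule):
    """True on a scheduled day when start <= time < end.

    Zero-padded "HH:MM" strings compare correctly as plain strings.
    """
    return (
        current_day in schedule["days"]
        and schedule["start"] <= current_time < schedule["end"]
    )

# One UTC instant maps to different local days and times per timezone.
instant = datetime(2024, 1, 8, 2, 30, tzinfo=timezone.utc)  # Monday 02:30 UTC
honolulu = now_in_tz("Pacific/Honolulu", instant)  # still Sunday afternoon there
berlin = now_in_tz("Europe/Berlin", instant)       # already early Monday there
```

Evaluating the window in each instance's own timezone is what lets a single Lambda invocation serve instances whose owners work in different parts of the world.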

2. Flexible Schedule Configuration

A three-tier schedule hierarchy lets each instance override the global policy:

# Option 1: Per-instance JSON schedule tag
Schedule={"days":["mon","tue","wed","thu","fri"], "start":"09:00", "end":"17:00"}

# Option 2: Reference to global schedule in SSM Parameter Store
Schedule=business-hours  # looks up /ec2-scheduler/schedule["business-hours"]

# Option 3: Use global default schedule
# If no Schedule tag, uses default from SSM or hardcoded Mon-Fri 08:00-18:00
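The lookup order above could be resolved along these lines. This is a sketch under assumptions: resolve_schedule is a hypothetical helper, global_schedules stands for a dict already loaded from the SSM parameter, and the "default" key inside it is an assumed convention, not something confirmed by the project.

```python
import json

# Hardcoded last-resort schedule: Mon-Fri 08:00-18:00
HARDCODED_DEFAULT = {
    "days": ["mon", "tue", "wed", "thu", "fri"],
    "start": "08:00",
    "end": "18:00",
}

def resolve_schedule(tags, global_schedules):
    """Pick a schedule for one instance.

    tags: dict of the instance's tag keys to values.
    global_schedules: dict of named schedules (assumed to come from SSM).
    """
    raw = tags.get("Schedule", "").strip()

    # Option 1: the tag itself holds a JSON schedule
    if raw.startswith("{"):
        return json.loads(raw)

    # Option 2: the tag names a global schedule
    if raw and raw in global_schedules:
        return global_schedules[raw]

    # Option 3: no tag (or an unknown name): global default, then hardcoded
    return global_schedules.get("default", HARDCODED_DEFAULT)

# Example inputs
global_schedules = {
    "business-hours": {"days": ["mon", "tue", "wed", "thu", "fri"],
                       "start": "09:00", "end": "17:00"},
}
per_instance = resolve_schedule(
    {"Schedule": '{"days":["sat"], "start":"10:00", "end":"12:00"}'},
    global_schedules,
)
named = resolve_schedule({"Schedule": "business-hours"}, global_schedules)
fallback = resolve_schedule({}, global_schedules)
```

Falling back to the default when a named schedule is missing is a design choice made for this sketch; raising an error instead would surface typos in tag values sooner.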

3. Production-Grade Error Handling

Built to handle failures gracefully at scale:
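One plausible shape for this, combining the dry-run mode and structured JSON logging mentioned in the summary: each instance is handled in its own try/except so a single failed API call does not abort the rest of the run. The function apply_actions, the perform callable, and the log field names are all hypothetical; in the real Lambda, perform would wrap ec2.start_instances or ec2.stop_instances.

```python
import json
import logging

logger = logging.getLogger("ec2_scheduler")

def apply_actions(planned, perform, dry_run=True):
    """Run planned (instance_id, action) pairs, isolating failures.

    perform: callable(instance_id, action) that does the real work.
    dry_run: when True, only log what would happen.
    """
    summary = {"succeeded": [], "failed": [], "skipped": []}
    for instance_id, action in planned:
        if dry_run:
            summary["skipped"].append(instance_id)
            logger.info(json.dumps(
                {"event": "dry_run", "instance": instance_id, "action": action}))
            continue
        try:
            perform(instance_id, action)
            summary["succeeded"].append(instance_id)
        except Exception as exc:  # with boto3 this is typically a ClientError
            summary["failed"].append(instance_id)
            logger.error(json.dumps(
                {"event": "action_failed", "instance": instance_id,
                 "action": action, "error": str(exc)}))
    return summary

# Stand-in for the AWS call: fails for one instance to show isolation
def fake_perform(instance_id, action):
    if instance_id == "i-bad":
        raise RuntimeError("simulated API failure")

plan = [("i-aaa", "stop"), ("i-bad", "stop"), ("i-ccc", "start")]
dry = apply_actions(plan, fake_perform, dry_run=True)
live = apply_actions(plan, fake_perform, dry_run=False)
```

Defaulting dry_run to True means a misconfigured deployment logs its intentions instead of stopping instances.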

4. Core Lambda Function (Simplified)

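# lambda_function.py (simplified for illustration)
import json
import os
from datetime import datetime
from zoneinfo import ZoneInfo

import boto3

DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")
# Fallback when an instance has no JSON Schedule tag (Mon-Fri 08:00-18:00)
DEFAULT_SCHEDULE = {"days": list(DAYS[:5]), "start": "08:00", "end": "18:00"}

def _now_in_tz(tz_name):
    """Return ("HH:MM", day) for the current moment in the given IANA timezone."""
    now = datetime.now(ZoneInfo(tz_name))
    return now.strftime("%H:%M"), DAYS[now.weekday()]

def _is_within_window(current_time, current_day, schedule):
    """True on a scheduled day when start <= time < end (same-day windows only)."""
    return current_day in schedule["days"] and schedule["start"] <= current_time < schedule["end"]

def lambda_handler(event, context):
    # Drop empty entries: "".split(",") yields [""], which is truthy and would skip the fallback
    regions = [r.strip() for r in os.getenv("REGIONS", "").split(",") if r.strip()]
    regions = regions or [os.environ["AWS_REGION"]]
    summary = {"started": [], "stopped": []}

    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)

        # Find instances with AutoSchedule=true in a non-production environment
        pages = ec2.get_paginator("describe_instances").paginate(
            Filters=[
                {"Name": "tag:AutoSchedule", "Values": ["true"]},
                {"Name": "tag:Environment", "Values": ["development", "staging", "test"]},
            ]
        )
        for page in pages:
            for reservation in page["Reservations"]:
                for instance in reservation["Instances"]:
                    instance_id = instance["InstanceId"]
                    state = instance["State"]["Name"]
                    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}

                    # Get timezone and schedule from tags; named SSM schedules are omitted here
                    tz = tags.get("Timezone", "Pacific/Honolulu")
                    raw = tags.get("Schedule", "")
                    schedule = json.loads(raw) if raw.startswith("{") else DEFAULT_SCHEDULE

                    # Calculate current time in instance timezone
                    current_time, current_day = _now_in_tz(tz)

                    # Determine if instance should be running
                    should_run = _is_within_window(current_time, current_day, schedule)

                    # Take action if needed
                    if should_run and state == "stopped":
                        ec2.start_instances(InstanceIds=[instance_id])
                        summary["started"].append(instance_id)
                    elif not should_run and state == "running":
                        ec2.stop_instances(InstanceIds=[instance_id])
                        summary["stopped"].append(instance_id)

    return {"statusCode": 200, "body": json.dumps(summary)}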

5. CloudWatch Integration

Complete observability through custom metrics and structured logging:
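A sketch of how the metrics payload and log lines might be built. The namespace "EC2Scheduler", the metric names, the Region dimension, and the log fields are assumptions made for illustration; only the shape of the MetricData entries follows the CloudWatch put_metric_data API. The publishing call is left commented out because it needs AWS credentials.

```python
import json
from datetime import datetime, timezone

def build_metric_data(region, started, stopped):
    """Build MetricData entries for one region's scheduler run."""
    dimensions = [{"Name": "Region", "Value": region}]
    return [
        {"MetricName": "InstancesStarted", "Dimensions": dimensions,
         "Value": started, "Unit": "Count"},
        {"MetricName": "InstancesStopped", "Dimensions": dimensions,
         "Value": stopped, "Unit": "Count"},
    ]

def log_line(event, **fields):
    """Return one JSON log line that CloudWatch Logs Insights can query by field."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(),
              "event": event, **fields}
    return json.dumps(record, sort_keys=True)

metric_data = build_metric_data("us-east-1", started=3, stopped=5)
line = log_line("instance_stopped", instance="i-0123456789abcdef0",
                region="us-east-1")

# Publishing (hypothetical namespace):
# import boto3
# boto3.client("cloudwatch", region_name="us-east-1").put_metric_data(
#     Namespace="EC2Scheduler", MetricData=metric_data)
```

Emitting one JSON object per log line keeps every field searchable without custom parsing.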

Real-World Impact

For Solo Developers & Small Teams

For Organizations & Enterprises

Potential Extensions

Key Takeaways

This project demonstrates important principles for cloud cost optimization:

Lessons Learned

Future Enhancements
