By Veera Sir
Before cloud computing, every company had to build and manage its own data centers: buying servers, networking gear, storage, hiring infrastructure teams, paying for power and cooling. This was expensive, slow, and inflexible. Cloud computing solves all of this by providing IT infrastructure over the internet as a service.
| Type | Description | Who Controls Hardware | Example |
|---|---|---|---|
| Public Cloud | Owned and operated by third-party cloud providers. Resources shared over internet. Multi-tenant. | Cloud Provider (AWS) | AWS, Azure, GCP |
| Private Cloud | Cloud infrastructure used exclusively by a single organization. Can be on-premises or hosted by third party. | Organization or hosted provider | VMware, OpenStack, IBM Cloud Private |
| Hybrid Cloud | Combination of public and private clouds with data and application portability between them. Best of both worlds. | Both | AWS Outposts + AWS Cloud |
| Multi-Cloud | Using services from multiple cloud providers simultaneously. Avoids vendor lock-in. | Multiple providers | AWS + Azure + GCP together |
| Community Cloud | Shared infrastructure for specific community with common concerns (compliance, security). | Community/Provider | Government cloud, Healthcare cloud |
These three models define HOW MUCH of the stack you manage vs the cloud provider. Think of it as a pizza analogy: how much do you make yourself vs order in?
The responsibility shifts based on the service type: EC2 (IaaS) = you manage OS and above. RDS (PaaS) = AWS manages OS and DB engine patching. Lambda (Serverless) = AWS manages almost everything.
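To make the comparison concrete, here is a rough CLI sketch of provisioning one service at each layer (all IDs, names, ARNs, and passwords below are placeholders):

# IaaS: you pick the OS image and manage everything above the hypervisor
aws ec2 run-instances --image-id ami-12345678 --instance-type t3.micro

# PaaS-like: AWS manages the OS and database engine patching
aws rds create-db-instance --db-instance-identifier mydb \
  --db-instance-class db.t3.micro --engine mysql \
  --master-username admin --master-user-password 'ChangeMe123!' \
  --allocated-storage 20

# Serverless: you supply only the code package
aws lambda create-function --function-name my-fn --runtime python3.12 \
  --handler app.handler --zip-file fileb://fn.zip \
  --role arn:aws:iam::123456789012:role/lambda-exec-role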
| Model | Description | Savings vs On-Demand | Best For |
|---|---|---|---|
| On-Demand | Pay per second/hour, no commitment, no upfront cost | Baseline (0%) | Unpredictable workloads, testing, short-term |
| Reserved Instances (RI) | 1-year or 3-year commitment. Standard RI or Convertible RI. | Up to 72% | Steady-state predictable workloads (databases, web servers) |
| Spot Instances | Unused EC2 capacity at the current Spot price (bidding is no longer required). Can be interrupted with 2-min warning. | Up to 90% | Fault-tolerant batch jobs, big data, CI/CD, stateless apps |
| Savings Plans | Flexible 1-3 year commitment to usage amount ($/hr). Covers EC2, Fargate, Lambda. | Up to 66% | Flexible usage across instance types and regions |
| Dedicated Hosts | Physical server dedicated to you. Bring your own license (BYOL). | 0-30% (BYOL savings) | Compliance requirements, software licensing, HIPAA |
| Dedicated Instances | Runs on hardware dedicated to you but AWS manages the host. | Small premium over On-Demand | Compliance requiring dedicated hardware |
Virtualization is the process of creating a software-based (virtual) representation of physical computing resources such as servers, storage, networks, and desktops. It uses a software layer called a Hypervisor (VMM, Virtual Machine Monitor) to abstract physical hardware and present it to multiple virtual machines simultaneously.
Without virtualization, one physical server runs one operating system. With virtualization, one physical server can run 10, 50, or 100+ virtual machines, each with its own OS, isolated from the others.
Virtualization is the foundation of cloud computing. Every EC2 instance you launch in AWS is actually a virtual machine running on AWS physical hardware. When you launch 100 EC2 instances, AWS spins up 100 VMs across their physical servers using the Nitro Hypervisor. You share physical hardware with other AWS customers but remain completely isolated.
| Type | Description | How It Works | AWS Service |
|---|---|---|---|
| Server/Hardware Virtualization | Multiple VMs share one physical server | Hypervisor divides CPU/RAM/storage into VMs | EC2 Instances |
| Storage Virtualization | Multiple physical storage devices pooled into one logical storage unit | Abstraction layer manages storage allocation transparently | EBS, EFS, S3 |
| Network Virtualization | Physical network resources abstracted into software-defined networks | VLANs, SDN, overlay networks | VPC, Security Groups, ENI |
| Desktop Virtualization (VDI) | Desktop environments hosted on central server, accessed remotely | Users stream desktop from server | Amazon WorkSpaces |
| Application Virtualization | Application runs in isolated environment separate from host OS | Container or sandbox wraps the app | Docker, ECS, EKS |
| OS-Level Virtualization (Containers) | Multiple isolated user-space instances on same OS kernel | Namespaces + cgroups isolate processes | ECS, EKS, Fargate |
Linux is the dominant OS in cloud computing. Over 90% of AWS workloads run on Linux. Amazon Linux 2 and Amazon Linux 2023 are AWS's own Linux distributions optimized for EC2. Linux is free, open-source, stable, and highly customizable, which makes it ideal for servers.
# Navigation and files
ls -la                      # list all files with permissions, including hidden files
pwd                         # print working directory (current location)
cd /path                    # change directory
cd ..                       # go up one directory
cd ~                        # go to home directory
mkdir -p dir/subdir         # create directory (with parents)
rm -rf dir                  # remove directory recursively (careful!)
cp -r src dst               # copy file/dir recursively
mv src dst                  # move or rename
touch file.txt              # create empty file

# Viewing and searching
cat file                    # display file content
less file                   # paginated view (q to quit)
head -20 file               # first 20 lines
tail -20 file               # last 20 lines
tail -f /var/log/app.log    # follow log in real-time
grep "text" file            # search pattern in file
grep -r "text" /dir         # recursive search
find / -name "*.conf"       # find files by name
wc -l file                  # count lines
diff file1 file2            # compare two files
ln -s /target /link         # create symbolic link

# Processes
top                         # live process monitor (q to quit)
htop                        # improved interactive process viewer
ps aux                      # list all processes with user/CPU/mem
ps aux | grep nginx         # find specific process
kill PID                    # terminate process gracefully (SIGTERM)
kill -9 PID                 # force kill (SIGKILL)
pkill nginx                 # kill by name

# System info
df -h                       # disk usage (human readable)
du -sh /var                 # directory disk usage
free -h                     # memory usage
uptime                      # system uptime + load average
uname -r                    # kernel version
uname -a                    # all system info
hostname                    # show/set hostname
whoami                      # current logged-in user
id                          # current user's UID, GID, groups
history                     # command history
which cmd                   # full path of command

# Users and environment
sudo cmd                    # run as superuser (root)
su - username               # switch user
env                         # show environment variables
echo $HOME                  # print environment variable
export VAR=value            # set environment variable

| Directory | Full Name | Purpose & Contents |
|---|---|---|
| / | Root | Top of the entire filesystem hierarchy. Everything is under / |
| /bin | Binaries | Essential user command binaries (ls, cp, mv, cat, grep). Available to all users. |
| /sbin | System Binaries | System administration binaries (iptables, fdisk, mount). Mostly for root user. |
| /etc | Et Cetera | System-wide configuration files. /etc/hosts (hostname resolution), /etc/fstab (mounts), /etc/nginx (nginx config) |
| /home | Home | User home directories. /home/ubuntu, /home/ec2-user. Personal files, settings. |
| /root | Root Home | Home directory for the root user (NOT same as /) |
| /var | Variable | Variable data that changes frequently: /var/log (logs), /var/www (web files), /var/lib (databases) |
| /tmp | Temporary | Temporary files. Cleared on reboot. World-writable. Use for scratch space. |
| /usr | Unix System Resources | User programs and data: /usr/bin (most user commands), /usr/lib (libraries), /usr/local (manually installed software) |
| /opt | Optional | Optional/third-party software. JDK, AWS CLI, custom apps installed here. |
| /proc | Process | Virtual filesystem. /proc/cpuinfo, /proc/meminfo, /proc/PID/. Kernel exposes system info here. |
| /dev | Devices | Device files: /dev/sda (disk), /dev/null (discard output), /dev/random (random data) |
| /mnt | Mount | Temporary mount point for external/additional filesystems (USB drives, EBS volumes) |
| /boot | Boot | Boot loader files, Linux kernel (vmlinuz), initrd. Do NOT delete! |
| /lib | Libraries | Essential shared libraries for /bin and /sbin binaries |
Linux permissions control who can read, write, and execute files. Every file has three permission sets: Owner (user), Group, and Others.
# Permission output from ls -la:
# -rwxr-xr-- 1 ec2-user ec2-user 1234 Jan 1 file.sh
#  |  |  |
#  |  |  +-- Others: r-- = read only
#  |  +----- Group:  r-x = read and execute
#  +-------- Owner:  rwx = read, write, execute
# First character: - = regular file, d = directory, l = symlink

# Permission values: r=4, w=2, x=1
# rwx = 4+2+1 = 7
# r-x = 4+0+1 = 5
# r-- = 4+0+0 = 4
# rw- = 4+2+0 = 6

chmod 755 script.sh    # owner=rwx(7), group=rx(5), others=rx(5)
chmod 644 config.txt   # owner=rw(6), group=r(4), others=r(4)
chmod 600 key.pem      # owner=rw only, no one else can read
chmod 400 key.pem      # owner=read only (SSH key requirement)
chmod +x script.sh     # add execute permission for all
chmod -x script.sh     # remove execute permission
chmod u+w file         # add write for owner (u=user/owner, g=group, o=others, a=all)

# Change ownership
chown ec2-user file.txt          # change owner
chown ec2-user:developers file   # change owner AND group
chgrp developers file            # change group only
chown -R ec2-user /var/www       # recursive (entire directory)
ps aux                 # list all processes (a=all users, u=user-oriented, x=no terminal)
top                    # real-time process monitor (press 'q' to quit)
htop                   # colorful interactive process viewer
kill -l                # list all signals (SIGTERM=15, SIGKILL=9)
kill 1234              # send SIGTERM (graceful shutdown) to PID 1234
kill -9 1234           # send SIGKILL (force kill) to PID 1234
killall nginx          # kill all processes named 'nginx'

# Background processes
nohup ./script.sh &    # run in background, ignore hangup signal, output to nohup.out
./script.sh &          # run in background (killed on terminal close)
jobs                   # list background jobs
fg %1                  # bring job #1 to foreground
bg %1                  # resume job #1 in background
disown -h %1           # disown job so it persists after logout

# Systemd (modern Linux service management)
systemctl start nginx      # start service
systemctl stop nginx       # stop service
systemctl restart nginx    # stop then start
systemctl reload nginx     # reload config without restart
systemctl status nginx     # check service status (running/stopped/failed)
systemctl enable nginx     # auto-start on boot
systemctl disable nginx    # disable auto-start
systemctl list-units --type=service   # list all services
# User management
useradd -m username            # create user with home directory
useradd -m -s /bin/bash user   # create with bash shell
passwd username                # set/change password
usermod -aG sudo username      # add to sudo group (Ubuntu)
usermod -aG wheel username     # add to wheel group (RHEL/CentOS)
usermod -s /bin/bash user      # change shell
userdel username               # delete user
userdel -r username            # delete user + home directory

# Group management
groupadd developers            # create group
groupdel developers            # delete group
groups username                # show groups for user
id username                    # show UID, GID, groups

# Important user files
cat /etc/passwd                # user accounts (username:x:UID:GID:comment:home:shell)
cat /etc/shadow                # password hashes (root only)
cat /etc/group                 # group definitions

# Sudo configuration
visudo                         # safely edit /etc/sudoers
# Add: username ALL=(ALL) NOPASSWD: ALL   (passwordless sudo)
# Ubuntu/Debian - APT package manager
sudo apt update              # update package index (always do this first)
sudo apt upgrade             # upgrade all installed packages
sudo apt install nginx -y    # install nginx
sudo apt remove nginx        # remove nginx
sudo apt purge nginx         # remove nginx + config files
sudo apt autoremove          # remove unused dependencies
apt search nginx             # search for package
apt show nginx               # show package info
dpkg -l | grep nginx         # list installed packages matching nginx
dpkg -l                      # list all installed packages

# Amazon Linux 2 / RHEL / CentOS - YUM package manager
sudo yum update              # update all packages
sudo yum install httpd -y    # install Apache
sudo yum remove httpd        # remove Apache
sudo yum list installed      # list installed packages
sudo yum info httpd          # show package info
sudo yum search nginx        # search packages

# Amazon Linux 2023 / RHEL 8+ - DNF (newer yum)
sudo dnf update
sudo dnf install nginx
sudo dnf remove nginx
# TAR - tape archive (most common backup tool)
tar -czf backup.tar.gz /data/         # compress directory (c=create, z=gzip, f=filename)
tar -cjf backup.tar.bz2 /data/        # compress with bzip2 (better compression)
tar -xzf backup.tar.gz                # extract (x=extract, z=gzip)
tar -xzf backup.tar.gz -C /restore/   # extract to specific directory
tar -tzf backup.tar.gz                # list contents without extracting

# RSYNC - efficient file sync (only transfers changes)
rsync -avz /local/dir/ user@host:/remote/dir/   # sync to remote (a=archive, v=verbose, z=compress)
rsync -avz --delete /src/ /dst/                 # sync and delete files not in source
rsync -avz --exclude="*.log" /src/ /dst/        # exclude log files
rsync -avz --dry-run /src/ /dst/                # preview without executing

# SCP - secure copy (simpler than rsync)
scp file.txt user@host:/path/          # copy file to remote
scp user@host:/path/file.txt .         # copy from remote
scp -r /local/dir user@host:/remote/   # copy directory recursively

# DD - disk image backup
dd if=/dev/sda of=/backup/disk.img bs=4M status=progress   # full disk image
dd if=/backup/disk.img of=/dev/sda bs=4M                   # restore disk image
# Journald - systemd journal (logs)
journalctl -u nginx                 # logs for nginx service
journalctl -u nginx -f              # follow nginx logs in real-time
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since "2024-01-01" --until "2024-01-02"
journalctl -p err                   # only errors
journalctl -b                       # logs since last boot
journalctl --disk-usage             # how much disk journal uses

# Traditional log files
tail -f /var/log/syslog             # Ubuntu: follow system log
tail -f /var/log/messages           # RHEL: system messages
tail -f /var/log/nginx/access.log   # nginx access log
tail -f /var/log/nginx/error.log    # nginx error log
tail -f /var/log/cloud-init.log     # EC2 user-data script execution log

# System monitoring commands
vmstat 1 5                          # virtual memory stats every 1s for 5 iterations
iostat -x 1                         # I/O statistics per device
sar -u 5 3                          # CPU usage report (5s interval, 3 times)
netstat -tuln                       # open ports listening
ss -tuln                            # modern replacement for netstat
lsof -i :80                         # what process is using port 80
# Block device management (critical for EBS volumes on EC2)
lsblk                            # list block devices (disks/partitions)
lsblk -f                         # with filesystem info
fdisk -l                         # detailed partition table info
blkid                            # show UUIDs of block devices

# Create filesystem and mount (new EBS volume workflow)
sudo fdisk /dev/xvdb             # partition the disk (optional for small disks)
sudo mkfs.ext4 /dev/xvdb         # format as ext4
sudo mkfs.xfs /dev/xvdb          # format as xfs (Amazon Linux default)
sudo mkdir -p /mnt/data          # create mount point
sudo mount /dev/xvdb /mnt/data   # mount the disk
df -h                            # verify mount and available space

# Persistent mount (survives reboot) - add to /etc/fstab
echo "UUID=$(blkid -s UUID -o value /dev/xvdb) /mnt/data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
sudo mount -a                    # test fstab entries
sudo umount /mnt/data            # unmount
# Network interface info
ip addr show # show all interfaces and IPs
ip addr show eth0 # specific interface
ifconfig # older command (same as ip addr)
ip link show # link layer info (MAC address)
# Routing
ip route show # show routing table
ip route add 10.0.0.0/8 via 10.0.1.1 # add static route
route -n # older routing table command
# Connectivity testing
ping -c 4 google.com # send 4 ICMP packets
ping 8.8.8.8 # ping Google DNS
traceroute google.com # trace packet path
mtr google.com # combined ping + traceroute (real-time)
# DNS lookup
nslookup google.com # basic DNS lookup
dig google.com # detailed DNS info
dig google.com MX # look up MX records
dig @8.8.8.8 google.com # query specific DNS server
host google.com # simple name resolution
# HTTP/HTTPS testing
curl https://api.example.com # fetch URL content
curl -I https://example.com # fetch headers only
curl -o file.zip https://example.com/file.zip # download file
wget https://example.com/file.zip # download file (alternative to curl)
curl -X POST -H "Content-Type: application/json" -d '{"key":"value"}' https://api.example.com
# Port and connection info
netstat -tuln # TCP/UDP listening ports
ss -tuln # same, modern version
ss -tnp # connections with process names
lsof -i TCP:80 # what's using port 80
telnet host 3306 # test if port is reachable
nc -zv host 3306 # netcat port test (preferred over telnet)
Amazon EC2 (Elastic Compute Cloud) provides resizable virtual computing capacity (virtual machines) in the cloud. EC2 gives you complete control: choose the OS, configure networking, manage security, attach storage, and install any software you need. It is the backbone of AWS: almost every architecture involves EC2 or services built on EC2.
| Family | Optimized For | Instance Types | When to Use |
|---|---|---|---|
| General Purpose | Balanced CPU/RAM/Network | t3, t4g, m5, m6i, m7i | Web servers, dev/test, small DBs, microservices |
| Compute Optimized | High-performance CPU | c5, c6g, c7g, c6i | Batch processing, media encoding, gaming servers, HPC |
| Memory Optimized | Large in-memory datasets | r5, r6i, x2idn, z1d, u-6tb1 | Big data, in-memory DBs (Redis/SAP HANA), real-time analytics |
| Storage Optimized | High sequential I/O read/write | i3, i4i, d2, h1, im4gn | NoSQL DBs, data warehouses, log processing, Hadoop |
| Accelerated Computing | GPU/FPGA hardware | p4, g5, f1, inf2, trn1 | ML training/inference, scientific computing, video rendering |
| HPC Optimized | Extreme compute + networking | hpc6a, hpc7g | High Performance Computing clusters, CFD, molecular dynamics |
Understanding how to read an instance type name: in m5.2xlarge, m is the instance family (general purpose), 5 is the generation, and 2xlarge is the size (for m5, 8 vCPUs and 32 GiB RAM).
chmod 400 key.pem
ssh -i key.pem ec2-user@<public-ip>

An AMI is a pre-configured template that provides the information required to launch an EC2 instance. It contains: the OS, application server, any pre-installed applications, configuration, and EBS snapshot(s).
| AMI Type | Source | Cost | Use Case |
|---|---|---|---|
| AWS-Provided | Amazon maintains these | Free (just EC2 cost) | Amazon Linux 2/2023, Ubuntu, Windows Server, RHEL |
| AWS Marketplace | Third-party vendors | License fee + EC2 | LAMP stacks, NGINX Plus, SAP, security appliances |
| Community AMIs | Other AWS users | Free (community) | Public images shared by the community (use with caution) |
| Custom AMIs | You create from existing EC2 | Storage cost (EBS snapshots) | "Golden images": pre-configured with your software for fast Auto Scaling |
By default, when you stop and start an EC2 instance, it gets a new public IP address. Elastic IP is a static, public IPv4 address that stays the same regardless of instance state.
# Allocate and associate Elastic IP using AWS CLI
aws ec2 allocate-address --domain vpc                        # allocate EIP
aws ec2 associate-address \
  --instance-id i-1234567890abcdef0 \
  --allocation-id eipalloc-12345678                          # associate to instance
aws ec2 disassociate-address --association-id eipassoc-xxx   # disassociate
aws ec2 release-address --allocation-id eipalloc-xxx         # release (delete EIP)
Placement Groups control how EC2 instances are physically placed on AWS hardware to optimize performance or availability (see the CLI sketch after this list):
- Cluster: packs instances close together inside one AZ for the lowest latency and highest network throughput (HPC, tightly coupled workloads).
- Spread: places each instance on distinct underlying hardware, with a maximum of 7 running instances per AZ per group (small sets of critical instances).
- Partition: divides instances into partitions that don't share hardware with each other (large distributed systems such as Hadoop, Cassandra, Kafka).
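A minimal CLI sketch for creating and using a placement group (group name and AMI ID are placeholders):

aws ec2 create-placement-group --group-name ha-spread --strategy spread
aws ec2 run-instances --image-id ami-12345678 --instance-type t3.micro \
  --placement GroupName=ha-spread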
Key pairs are used for secure authentication to EC2 instances. They use asymmetric cryptography: AWS stores the public key on the EC2 instance, and you keep the private key (.pem file) on your local machine.
The private key file must have restricted permissions: chmod 400 mykey.pem (otherwise SSH rejects the key).

# SSH with key pair
chmod 400 mykey.pem                             # required permission (400 = owner read-only)
ssh -i mykey.pem ec2-user@<public-ip>           # connect to Amazon Linux
ssh -i mykey.pem ubuntu@<public-ip>             # connect to Ubuntu
ssh -i mykey.pem -p 2222 ec2-user@<public-ip>   # custom port

# Lost your key pair? Recovery steps:
# 1. Stop instance
# 2. Detach root EBS volume
# 3. Attach volume to another "helper" EC2 instance as /dev/xvdf
# 4. Mount: sudo mount /dev/xvdf1 /mnt/recovery
# 5. Add your new public key to: /mnt/recovery/home/ec2-user/.ssh/authorized_keys
# 6. Unmount, detach, reattach to original instance, start it
Security Groups are stateful virtual firewalls at the instance level. They control inbound and outbound traffic to/from EC2 instances.
| Feature | Security Group | Network ACL (NACL) |
|---|---|---|
| Level | Instance (ENI) level | Subnet level |
| Rule type | Allow rules ONLY | Allow AND Deny rules |
| Stateful? | YES: return traffic auto-allowed | NO: must define both directions explicitly |
| Rule evaluation | All rules evaluated; most permissive wins | Rules evaluated in number order; first match wins |
| Default behavior | All inbound denied; all outbound allowed | Default NACL: all allowed. Custom NACL: all denied. |
| Scope | Can be assigned to multiple instances | Applies to all instances in the subnet |
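As a sketch, creating a web security group and using the SG-referencing pattern might look like this (all IDs are placeholders):

aws ec2 create-security-group --group-name web-sg \
  --description "Web tier" --vpc-id vpc-0abc1234          # returns the new sg-id
aws ec2 authorize-security-group-ingress --group-id sg-0web1234 \
  --protocol tcp --port 80 --cidr 0.0.0.0/0               # HTTP from anywhere
aws ec2 authorize-security-group-ingress --group-id sg-0web1234 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0              # HTTPS from anywhere
aws ec2 authorize-security-group-ingress --group-id sg-0app1234 \
  --protocol tcp --port 8080 --source-group sg-0web1234   # only the web tier can reach the app tier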
| Volume | Type | Max IOPS | Max Throughput | Max Size | Use Case |
|---|---|---|---|---|---|
| gp3 | SSD | 16,000 | 1,000 MB/s | 16 TiB | Boot volumes, dev/test, low-latency interactive apps |
| gp2 | SSD (older) | 16,000 | 250 MB/s | 16 TiB | Legacy workloads (migrate to gp3: cheaper and faster) |
| io1 | Provisioned IOPS SSD | 64,000 | 1,000 MB/s | 16 TiB | I/O-intensive databases |
| io2 | Provisioned IOPS SSD | 64,000 | 1,000 MB/s | 16 TiB | Critical databases (99.999% durability) |
| io2 Block Express | Provisioned IOPS SSD | 256,000 | 4,000 MB/s | 64 TiB | SAP HANA, Oracle RAC, highest performance |
| st1 | Throughput HDD | 500 | 500 MB/s | 16 TiB | Big data, log processing, streaming workloads |
| sc1 | Cold HDD | 250 | 250 MB/s | 16 TiB | Infrequent access archives, lowest cost HDD |
# EBS Snapshot operations (AWS CLI)

# Create snapshot
aws ec2 create-snapshot \
  --volume-id vol-12345678 \
  --description "Daily backup $(date +%Y-%m-%d)"

# Copy snapshot to another region
aws ec2 copy-snapshot \
  --source-region us-east-1 \
  --source-snapshot-id snap-12345678 \
  --region ap-south-1 \
  --description "Cross-region copy"

# Create volume from snapshot (in different AZ)
aws ec2 create-volume \
  --snapshot-id snap-12345678 \
  --availability-zone ap-south-1b \
  --volume-type gp3
User Data is a script that runs automatically when an EC2 instance is launched for the first time (first boot only, by default). It runs as the root user.
#!/bin/bash
# This runs at FIRST BOOT as root
set -e                               # exit on any error
exec > /var/log/user-data.log 2>&1   # redirect output to log file

yum update -y
yum install -y httpd php mysql git
systemctl start httpd
systemctl enable httpd

# Create a simple webpage (unquoted EOF so $(...) expands at boot)
cat > /var/www/html/index.html << EOF
<html><body>
<h1>Hello from EC2!</h1>
<p>Instance ID: $(curl -s http://169.254.169.254/latest/meta-data/instance-id)</p>
</body></html>
EOF

echo "User data completed successfully"
Instance Metadata is information about the running instance accessible from within the instance at the special IP 169.254.169.254. This is a link-local address, only reachable from within the instance itself.
# Instance Metadata Service (IMDS) - v1 (simpler)
curl http://169.254.169.254/latest/meta-data/                   # list all metadata categories
curl http://169.254.169.254/latest/meta-data/instance-id        # get instance ID
curl http://169.254.169.254/latest/meta-data/instance-type      # get instance type
curl http://169.254.169.254/latest/meta-data/public-ipv4        # get public IP
curl http://169.254.169.254/latest/meta-data/local-ipv4         # get private IP
curl http://169.254.169.254/latest/meta-data/hostname           # get hostname
curl http://169.254.169.254/latest/meta-data/placement/region   # get region
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/MyRole   # IAM role temp creds

# Instance Metadata Service v2 (IMDSv2) - more secure (token-based)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

# User data (view the script that ran)
curl http://169.254.169.254/latest/user-data
A Launch Template stores the full EC2 instance configuration. It is the recommended way to define configurations for Auto Scaling Groups and EC2 Fleet.
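A hedged sketch of creating one via the CLI (the AMI, key, and security group IDs are placeholders; UserData, if included, must be base64-encoded):

aws ec2 create-launch-template \
  --launch-template-name web-template \
  --version-description "v1" \
  --launch-template-data '{
    "ImageId": "ami-12345678",
    "InstanceType": "t3.micro",
    "KeyName": "mykey",
    "SecurityGroupIds": ["sg-0abc1234"]
  }'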
A Load Balancer sits in front of your servers and distributes incoming traffic across multiple targets (EC2 instances, containers, Lambda functions, or IP addresses) in multiple Availability Zones. It continuously monitors the health of registered targets and routes traffic only to healthy ones.
| Type | OSI Layer | Protocols | Key Features | Best For |
|---|---|---|---|---|
| ALB (Application) | Layer 7 (Application) | HTTP, HTTPS, WebSocket, HTTP/2 | Path/host/header routing, WAF integration, Lambda targets, sticky sessions | Web apps, microservices, REST APIs, containers |
| NLB (Network) | Layer 4 (Transport) | TCP, UDP, TLS | Ultra-high performance, static IP per AZ, preserves source IP, TLS termination | Gaming, IoT, real-time trading, VPC Endpoint Services |
| GLB (Gateway) | Layer 3+4 (Network) | GENEVE (6081) | Transparent bump-in-the-wire traffic inspection, scales third-party appliances | Firewalls, IDS/IPS, DPI appliances |
| CLB (Classic) | Layer 4/7 | HTTP, HTTPS, TCP, SSL | Legacy service being deprecated | Old EC2-Classic apps (migrate to ALB/NLB) |
ALB operates at Layer 7 (HTTP/HTTPS) and makes routing decisions based on request content.
ALB routing options:
- Path-based routing: /api/* → API servers, /images/* → Image servers, / → Main app
- Host-based routing: app.example.com → App servers, api.example.com → API servers
- Query-string routing: ?version=mobile → mobile-optimized servers

# ALB Rule example (in AWS Console / CLI):
# IF path is /api/*                                    → Forward to API-TG (api target group)
# IF path is /static/*                                 → Forward to S3 bucket origin
# IF path starts with /admin AND source IP 10.0.0.0/8  → Forward to Admin-TG
# IF host is mobile.example.com                        → Redirect to https://m.example.com/#{path}
# DEFAULT                                              → Forward to Web-TG
A Target Group is a logical grouping of targets that receives requests from a Load Balancer. Each listener rule points to a Target Group.
| Target Type | What It Is | Use Case |
|---|---|---|
| Instance | EC2 instances by instance ID | Traditional EC2 workloads |
| IP Address | Specific IP addresses (private IPs in VPC or on-premises) | Containers with dynamic ports, on-premises servers via Direct Connect/VPN |
| Lambda Function | A Lambda function (ALB only) | Serverless backends, event-driven apps |
| ALB | Another ALB (NLB only) | When you need NLB's static IP but ALB's HTTP routing |
Health checks run continuously. Unhealthy targets are removed from rotation until they recover. You configure: the protocol and port, the health-check path (e.g., /health), the check interval and timeout, and the healthy/unhealthy threshold counts (see the sketch below).
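A sketch of creating a target group with explicit health-check settings (names, IDs, and the ARN are placeholders):

aws elbv2 create-target-group --name app-tg \
  --protocol HTTP --port 80 --vpc-id vpc-0abc1234 \
  --target-type instance \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 5 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3
aws elbv2 register-targets --target-group-arn <tg-arn> --targets Id=i-0abc1234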
Sticky sessions ensure a user's requests go to the SAME target throughout a session. Useful for stateful apps that store session data locally on the instance.
With cross-zone load balancing, each LB node distributes traffic evenly across ALL registered instances in ALL enabled AZs.
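Both behaviors are toggled through attributes; a sketch follows (ARNs are placeholders; note that cross-zone is on by default for ALB but off by default for NLB):

# Sticky sessions (ALB target group)
aws elbv2 modify-target-group-attributes --target-group-arn <tg-arn> \
  --attributes Key=stickiness.enabled,Value=true \
               Key=stickiness.type,Value=lb_cookie \
               Key=stickiness.lb_cookie.duration_seconds,Value=86400

# Cross-zone load balancing (NLB, disabled by default)
aws elbv2 modify-load-balancer-attributes --load-balancer-arn <nlb-arn> \
  --attributes Key=load_balancing.cross_zone.enabled,Value=true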
AWS billing is usage-based: you pay only for what you use, when you use it. There are no upfront costs for most services. Understanding billing is critical to avoid unexpected charges.
| Service | Free Tier Amount | Duration | Type |
|---|---|---|---|
| EC2 | 750 hours/month t2.micro or t3.micro (Linux and Windows separately) | 12 months | New accounts |
| S3 | 5 GB storage, 20,000 GET requests, 2,000 PUT requests | 12 months | New accounts |
| RDS | 750 hours/month db.t2.micro or db.t3.micro Single-AZ | 12 months | New accounts |
| Lambda | 1 million requests + 400,000 GB-seconds compute time | Always free | Perpetual |
| CloudWatch | 10 custom metrics, 10 alarms, 5 GB log data | Always free | Perpetual |
| SNS | 1 million publishes, 100,000 HTTP deliveries | Always free | Perpetual |
| DynamoDB | 25 GB storage, 25 WCUs + 25 RCUs (enough for ~200M requests/month) | Always free | Perpetual |
CloudWatch Alarms watch a single metric and perform one or more actions when that metric breaches a threshold over a specified number of evaluation periods.
| Alarm State | Meaning | When It Occurs |
|---|---|---|
| OK | Metric is within the defined threshold | Metric is healthy |
| ALARM | Metric has breached the threshold for specified periods | Action is triggered |
| INSUFFICIENT_DATA | Not enough data points to determine state | Service just started, metric gap, new alarm |
| Metric | Description | Monitoring Period | Notes |
|---|---|---|---|
| CPUUtilization | % of allocated EC2 compute units in use | Basic: 5 min, Detailed: 1 min | Available by default |
| NetworkIn / NetworkOut | Bytes received/sent on all network interfaces | Basic: 5 min | Available by default |
| NetworkPacketsIn/Out | Packets received/sent | Basic: 5 min | Available by default |
| DiskReadOps / DiskWriteOps | IOPS completed for instance store | Basic: 5 min | Instance store only (not EBS) |
| DiskReadBytes / DiskWriteBytes | Bytes read/written to instance store | Basic: 5 min | Instance store only |
| StatusCheckFailed_Instance | Instance OS/software failure | 1 min | Action: reboot/recover |
| StatusCheckFailed_System | AWS physical host failure | 1 min | Action: recover (migrates to new host) |
| MemoryUtilization* | % of RAM in use | Custom metric | Requires CloudWatch Agent! |
| DiskSpaceUtilization* | % of disk used | Custom metric | Requires CloudWatch Agent! |
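Memory and disk metrics require installing the CloudWatch Agent on the instance; a sketch for Amazon Linux 2 (the config path shown is the agent's documented default location):

sudo yum install -y amazon-cloudwatch-agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard   # generate config interactively
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s                     # load config and start agent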
# Set up billing alarm (must be in us-east-1 region)
# Step 1: Enable billing alerts in Billing → Billing Preferences → Receive Billing Alerts

# Step 2: Create SNS topic for notification
aws sns create-topic --name billing-alerts --region us-east-1

# Step 3: Subscribe your email to topic
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789:billing-alerts \
  --protocol email \
  --notification-endpoint you@example.com

# Step 4: Create CloudWatch alarm (ONLY works in us-east-1)
aws cloudwatch put-metric-alarm \
  --alarm-name "Monthly-Bill-Exceeds-10USD" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --statistic Maximum \
  --period 86400 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789:billing-alerts \
  --dimensions Name=Currency,Value=USD \
  --region us-east-1
EC2 Auto Scaling automatically adds or removes EC2 instances based on demand conditions you define. It ensures your application always has the right number of instances available to handle load, provides fault tolerance by replacing unhealthy instances, and optimizes costs by removing unnecessary instances.
| Policy Type | How It Works | Trigger | Best For |
|---|---|---|---|
| Manual Scaling | Manually change desired capacity in console or CLI | Human action | Planned events, maintenance windows, fixed capacity |
| Simple Scaling | One action per alarm breach. Waits for cooldown before next action. | CloudWatch alarm | Simple workloads (legacy, prefer Step/Target) |
| Step Scaling | Different actions based on HOW FAR metric is from threshold | CloudWatch alarm | When you need proportional response to varying load |
| Target Tracking | Automatically scale to keep a metric at a target value | Metric target value | Most workloads: simplest and most effective |
| Scheduled Scaling | Scale based on time (cron expression) | Date/time schedule | Known traffic patterns (business hours, weekly peaks) |
| Predictive Scaling | ML model predicts future load and pre-scales proactively | ML forecast | Recurring cyclical patterns (daily, weekly) |
The most commonly used scaling policy. You specify a target value for a metric and Auto Scaling creates CloudWatch alarms automatically to scale in/out to maintain the target.
| Predefined Metric | Description | Common Target |
|---|---|---|
| ASGAverageCPUUtilization | Average CPU across all instances in the ASG | 50-70% |
| ALBRequestCountPerTarget | Number of requests per instance from ALB | 1000 req/instance |
| ASGAverageNetworkIn | Average network bytes in per instance | Depends on app |
| ASGAverageNetworkOut | Average network bytes out per instance | Depends on app |
# Target Tracking: Keep average CPU at 50%
# ASG will automatically:
#   - Add instances if CPU goes above 50%
#   - Remove instances if CPU drops below ~45% (built-in buffer)
# You don't write alarm rules - AWS manages them automatically
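The equivalent CLI call might look like this (ASG and policy names are placeholders):

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": { "PredefinedMetricType": "ASGAverageCPUUtilization" },
    "TargetValue": 50.0
  }'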
Define multiple scaling steps based on how much the metric breaches the threshold. More granular control than Simple Scaling. Does NOT wait for cooldown between steps.
# Example Step Scaling Configuration:
# Scale OUT (add capacity):
#   CPU 50-60%  → add 1 instance
#   CPU 60-75%  → add 2 instances
#   CPU 75-90%  → add 3 instances
#   CPU > 90%   → add 4 instances
# Scale IN (remove capacity):
#   CPU 40-50%  → remove 1 instance
#   CPU 30-40%  → remove 2 instances
#   CPU < 30%   → remove 3 instances
aws autoscaling put-scaling-policy \
--auto-scaling-group-name my-asg \
--policy-name scale-out-policy \
--policy-type StepScaling \
--step-adjustments MetricIntervalLowerBound=0,MetricIntervalUpperBound=10,ScalingAdjustment=1 \
MetricIntervalLowerBound=10,MetricIntervalUpperBound=25,ScalingAdjustment=2 \
MetricIntervalLowerBound=25,ScalingAdjustment=3 \
--adjustment-type ChangeInCapacity \
--metric-aggregation-type Average
# Scale up Mon-Fri at 8 AM IST (2:30 AM UTC)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name scale-up-mornings \
  --recurrence "30 2 * * 1-5" \
  --min-size 4 --max-size 20 --desired-capacity 8

# Scale down Mon-Fri at 8 PM IST (2:30 PM UTC)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name scale-down-evenings \
  --recurrence "30 14 * * 1-5" \
  --min-size 2 --max-size 10 --desired-capacity 2

# Scale up for expected traffic spike (one-time)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name pre-event-scale \
  --start-time "2024-03-01T02:00:00Z" \
  --desired-capacity 20
| Termination Policy | How ASG Decides Which Instance to Terminate |
|---|---|
| Default | Oldest launch config/template → oldest instance in that config → closest to billing hour |
| OldestInstance | Terminates the oldest instance in the group |
| NewestInstance | Terminates the newest instance (useful for rolling updates testing) |
| OldestLaunchTemplate | Terminates instances using oldest launch template (good for rolling updates) |
| ClosestToNextInstanceHour | Terminates instance closest to next billing hour (cost optimization) |
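Termination policies are set on the ASG itself; a sketch (the ASG name is a placeholder):

aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --termination-policies "OldestInstance" "Default"   # evaluated in order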
EBS provides persistent block-level storage for EC2 instances. Think of it as a network-attached hard drive. When you terminate an EC2 instance, the EBS root volume is deleted by default (configurable), but additional EBS volumes persist. EBS volumes are automatically replicated within their AZ.
| Volume Type | IOPS | Throughput | Size | Multi-Attach | Use Case |
|---|---|---|---|---|---|
| gp3 (General SSD) | 3,000–16,000 | 125–1,000 MB/s | 1 GiB–16 TiB | No | Boot volumes, dev/test, small/medium databases, virtual desktops |
| io2 (Provisioned IOPS SSD) | 100–64,000 | 1,000 MB/s | 4 GiB–16 TiB | Yes (same AZ) | I/O-intensive databases: MySQL, Oracle, SQL Server |
| io2 Block Express | up to 256,000 | 4,000 MB/s | 4 GiB–64 TiB | Yes | SAP HANA, Oracle RAC, mission-critical workloads |
| st1 (Throughput HDD) | 500 max | 500 MB/s | 125 GiB–16 TiB | No | Big data, data warehouses, log processing, Hadoop |
| sc1 (Cold HDD) | 250 max | 250 MB/s | 125 GiB–16 TiB | No | Cold data requiring few scans/day. Cheapest option. |
# How to encrypt an existing UNENCRYPTED EBS volume:
# Direct encryption of an existing volume is NOT possible - must use this workaround:

# Step 1: Create a snapshot of the unencrypted volume
aws ec2 create-snapshot --volume-id vol-unencrypted --description "Pre-encryption backup"

# Step 2: Copy the snapshot with encryption enabled
aws ec2 copy-snapshot \
  --source-region ap-south-1 \
  --source-snapshot-id snap-unencrypted \
  --encrypted \
  --kms-key-id arn:aws:kms:ap-south-1:123:key/your-key

# Step 3: Create a new encrypted volume from the encrypted snapshot
aws ec2 create-volume --snapshot-id snap-encrypted --volume-type gp3 \
  --availability-zone ap-south-1a

# Step 4: Detach old volume, attach new encrypted volume to instance
# Step 5: Update /etc/fstab if needed
# Step 1: Verify the volume is attached
lsblk       # shows: xvda (root), xvdb (new unformatted)
lsblk -f    # check if filesystem exists

# Step 2: Create filesystem (first time only - destroys existing data!)
sudo mkfs.ext4 /dev/xvdb   # format as ext4
# OR
sudo mkfs.xfs /dev/xvdb    # format as xfs (Amazon Linux default)

# Step 3: Create mount point
sudo mkdir -p /data

# Step 4: Mount the volume
sudo mount /dev/xvdb /data
df -h                      # verify it's mounted and available space

# Step 5: Make it permanent - add to /etc/fstab
# Get UUID first (better than device name - device names can change)
sudo blkid /dev/xvdb       # shows UUID

# Add to /etc/fstab (edit with: sudo nano /etc/fstab):
# UUID=xxxx-xxxx /data ext4 defaults,nofail 0 2
# "nofail" is critical - prevents boot failure if volume not attached

# Test fstab entry
sudo umount /data
sudo mount -a              # mounts everything in fstab
df -h                      # verify
EFS is a fully managed, scalable, shared file system (NFS, Network File System) for Linux workloads. Unlike EBS (one instance at a time), EFS can be mounted concurrently by thousands of EC2 instances across multiple AZs simultaneously. It automatically grows and shrinks as you add/remove files; no capacity management needed.
| Feature | EFS (Elastic File System) | EBS (Elastic Block Store) | S3 (Simple Storage) |
|---|---|---|---|
| Storage type | File (NFS) | Block | Object |
| Multi-instance access | YES: thousands of instances | NO (one at a time, except Multi-Attach io2) | YES: accessible from anywhere |
| Multi-AZ | YES (Standard, Regional) | NO: single AZ only | YES: minimum 3 AZs |
| OS support | Linux only (POSIX) | Linux and Windows | Any (HTTP API) |
| Mount as filesystem | YES (NFS mount) | YES (block device) | NO (not a filesystem) |
| Capacity management | Automatic (elastic) | Fixed (you provision) | Unlimited |
| Max size | Petabytes (auto-scale) | 64 TiB | Unlimited |
| Relative cost | ~3x gp2 EBS | Baseline | Cheapest per GB |
| Use case | Shared storage, CMS, home dirs, containers | Boot volumes, databases, app data | Backups, static assets, data lakes |
| Storage Class | Availability | Cost | Use Case |
|---|---|---|---|
| EFS Standard | Multi-AZ (3+ AZs) | $0.30/GB/month | Frequently accessed files |
| EFS Standard-IA | Multi-AZ | $0.025/GB/month + retrieval | Infrequent access (save 92% vs Standard) |
| EFS One Zone | Single AZ | $0.153/GB/month | Dev/test, non-critical data (about half the cost of Standard) |
| EFS One Zone-IA | Single AZ | $0.0133/GB/month | Dev/test infrequent access (cheapest) |
EFS Lifecycle Management: Automatically moves files to Standard-IA after they haven't been accessed for 7, 14, 30, 60, or 90 days. Files moved back to Standard on access. Reduces storage costs significantly for mixed workloads.
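A sketch of enabling lifecycle management via the CLI (the filesystem ID is a placeholder):

aws efs put-lifecycle-configuration \
  --file-system-id fs-0123456789 \
  --lifecycle-policies TransitionToIA=AFTER_30_DAYS \
                       TransitionToPrimaryStorageClass=AFTER_1_ACCESS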
# Install EFS utilities (handles NFS mounting and TLS encryption)
sudo yum install -y amazon-efs-utils     # Amazon Linux
sudo apt-get install amazon-efs-utils    # Ubuntu

# Mount using EFS mount helper (recommended - supports encryption in transit)
sudo mkdir -p /mnt/efs
sudo mount -t efs -o tls fs-0123456789:/ /mnt/efs   # with TLS encryption
sudo mount -t efs fs-0123456789:/ /mnt/efs          # without TLS

# Mount specific directory/subdirectory
sudo mount -t efs -o tls fs-0123456789:/myapp /mnt/app

# Verify mount
df -h /mnt/efs
ls /mnt/efs

# Auto-mount on reboot (/etc/fstab)
fs-0123456789:/ /mnt/efs efs _netdev,tls,iam 0 0
# _netdev = wait for network before mounting
# iam = use IAM for authorization

# Mount using NFS directly (without efs-utils)
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-0123456789.efs.ap-south-1.amazonaws.com:/ /mnt/efs
A Virtual Private Cloud (VPC) is your own logically isolated section of the AWS cloud. Think of it as your own private data center inside AWS: you have complete control over your virtual networking environment including IP address ranges, subnets, route tables, and network gateways. Every AWS account gets a default VPC in each region so you can launch resources immediately.
| Concept | Description | Example |
|---|---|---|
| VPC | Isolated virtual network in a region. Spans all AZs in that region. | 10.0.0.0/16 (65,536 IPs) |
| Subnet | A subdivision of a VPC within a single AZ. Resources live in subnets. | 10.0.1.0/24 in ap-south-1a |
| Route Table | Set of rules (routes) that determine where network traffic is directed. | 0.0.0.0/0 → IGW |
| Internet Gateway (IGW) | Allows communication between VPC and the internet. Horizontally scaled, HA, no bandwidth limits. | Attach to VPC for internet access |
| NAT Gateway | Allows private subnet resources to access internet but prevents inbound connections from internet. | Private EC2 downloading updates |
| Security Group | Virtual stateful firewall at instance level. Controls inbound/outbound traffic. | Allow port 80 from 0.0.0.0/0 |
| NACL | Stateless firewall at subnet level. Rules evaluated in order by number. | Deny rule 100: block bad IP |
| CIDR Block | IP address range assigned to VPC or subnet using CIDR notation. | 192.168.0.0/24 = 256 IPs |
Understanding IP addressing is fundamental to VPC design. AWS uses IPv4 CIDR notation where the number after the slash indicates how many bits are the network portion.
| CIDR | Total IPs | Usable IPs (AWS reserves 5) | Use Case |
|---|---|---|---|
| /16 | 65,536 | 65,531 | VPC (large enterprise) |
| /20 | 4,096 | 4,091 | Large subnet |
| /24 | 256 | 251 | Standard subnet |
| /28 | 16 | 11 | Small subnet (minimum for AWS) |
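A sketch of carving a VPC and one subnet with these CIDRs (the IDs returned by each call are placeholders):

aws ec2 create-vpc --cidr-block 10.0.0.0/16                  # returns vpc-id
aws ec2 create-subnet --vpc-id vpc-0abc1234 \
  --cidr-block 10.0.1.0/24 --availability-zone ap-south-1a   # returns subnet-id
aws ec2 modify-subnet-attribute --subnet-id subnet-0abc1234 \
  --map-public-ip-on-launch                                  # auto-assign public IPs (public subnet)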
These IP ranges are not routable on the public internet; they're used for private networks like VPCs. Always use these for VPC CIDR blocks.
10.0.0.0    - 10.255.255.255    (10.0.0.0/8)      # Class A, 16M addresses
172.16.0.0  - 172.31.255.255    (172.16.0.0/12)   # Class B, 1M addresses
192.168.0.0 - 192.168.255.255   (192.168.0.0/16)  # Class C, 65K addresses

# AWS default VPC always uses: 172.31.0.0/16
# Best practice for custom VPC: use 10.0.0.0/16 (avoids overlap with default)
The IGW is the door between your VPC and the public internet. It performs Network Address Translation (NAT) for instances with public IPs, translating private IPs to public IPs for outbound traffic and vice versa for inbound.
# Public subnet route table
Destination      Target
10.0.0.0/16      local          # all VPC traffic stays local
0.0.0.0/0        igw-xxxxxxxx   # everything else goes to internet
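A sketch of wiring this up (IDs are placeholders):

aws ec2 create-internet-gateway                              # returns igw-id
aws ec2 attach-internet-gateway --internet-gateway-id igw-0abc1234 --vpc-id vpc-0abc1234
aws ec2 create-route --route-table-id rtb-0public \
  --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0abc1234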
NAT (Network Address Translation) Gateway allows EC2 instances in private subnets to initiate outbound connections to the internet (download patches, call APIs) while preventing the internet from initiating connections into your private instances.
# Private subnet route table
Destination      Target
10.0.0.0/16      local          # VPC traffic stays local
0.0.0.0/0        nat-xxxxxxxx   # internet via NAT Gateway
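A sketch of creating a NAT Gateway in a public subnet and routing private traffic through it (IDs are placeholders):

aws ec2 allocate-address --domain vpc                 # returns eipalloc-id
aws ec2 create-nat-gateway --subnet-id subnet-0public \
  --allocation-id eipalloc-0abc1234                   # NAT GW lives in a PUBLIC subnet
aws ec2 create-route --route-table-id rtb-0private \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-0abc1234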
Flow Logs capture information about IP traffic going to/from network interfaces in your VPC. Essential for security analysis, troubleshooting, and compliance.
# Flow log record format:
# version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status

2 123456789 eni-abc123 10.0.1.5 8.8.8.8 54321 443 6 10 5000 1609459200 1609459260 ACCEPT OK
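A sketch of enabling flow logs to CloudWatch Logs (the role ARN and IDs are placeholders; the role must permit log delivery):

aws ec2 create-flow-logs --resource-type VPC --resource-ids vpc-0abc1234 \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name vpc-flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/FlowLogsRole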
An ENI is a virtual network card you can attach to EC2 instances. Every instance has at least one ENI (eth0, the primary). You can create additional ENIs and attach/detach them from instances.
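A sketch of creating a secondary ENI and attaching it (IDs are placeholders; device-index 1 means the second network card):

aws ec2 create-network-interface --subnet-id subnet-0abc1234 \
  --description "secondary interface" --groups sg-0abc1234    # returns eni-id
aws ec2 attach-network-interface --network-interface-id eni-0abc1234 \
  --instance-id i-0abc1234 --device-index 1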
Region: ap-south-1
VPC: 10.0.0.0/16
├── AZ: ap-south-1a                     AZ: ap-south-1b
│   ├── Public Subnet 10.0.1.0/24       Public Subnet 10.0.2.0/24
│   │   ├── ALB node                    ALB node
│   │   └── NAT Gateway (EIP)           NAT Gateway (EIP)
│   ├── Private-App 10.0.11.0/24        Private-App 10.0.12.0/24
│   │   └── EC2 App Servers             EC2 App Servers
│   └── Private-DB 10.0.21.0/24         Private-DB 10.0.22.0/24
│       └── RDS Primary                 RDS Standby (Multi-AZ)
├── Internet Gateway (attached to VPC)
├── Public Route Table → 0.0.0.0/0 to IGW
├── Private Route Table → 0.0.0.0/0 to NAT GW (per AZ)
└── VPC Endpoints: S3 Gateway, DynamoDB Gateway (free!)
AWS gives you two layers of network security. Understanding when to use each is critical for both the exam and real-world architecture.
Security Groups act as virtual firewalls controlling traffic to/from EC2 instances. They're the primary and most-used security control in AWS.
# Security Group Rules - key concepts:
# Inbound:  who can SEND traffic TO your instance
# Outbound: where your instance can SEND traffic TO

# Example: Web server SG
Inbound Rules:
  Type    Port   Source       Purpose
  HTTP    80     0.0.0.0/0    Allow all web traffic
  HTTPS   443    0.0.0.0/0    Allow all HTTPS traffic
  SSH     22     10.0.0.0/8   Allow SSH from internal only

Outbound Rules:
  Type    Port   Destination  Purpose
  All     All    0.0.0.0/0    Allow all outbound (default)

# SG referencing another SG (powerful pattern):
# App server SG inbound: port 8080, source = web-server-SG-id
# This means: only instances IN the web server SG can reach the app server
# No need to know IP addresses - scales automatically
NACLs are the subnet-level firewall. Each subnet can only be associated with one NACL at a time. Rules are processed in ascending order; first match wins.
| Rule # | Type | Protocol | Port | Source | Action |
|---|---|---|---|---|---|
| 100 | HTTP | TCP | 80 | 0.0.0.0/0 | ALLOW |
| 110 | HTTPS | TCP | 443 | 0.0.0.0/0 | ALLOW |
| 120 | Custom TCP | TCP | 1024-65535 | 0.0.0.0/0 | ALLOW (ephemeral ports!) |
| 200 | SSH | TCP | 22 | 1.2.3.4/32 | ALLOW |
| * | All traffic | All | All | 0.0.0.0/0 | DENY (catch-all) |
VPC Peering creates a direct, private network connection between two VPCs allowing instances to communicate as if they were in the same network: private IPs only, no internet involved.
# VPC A (10.0.0.0/16) peered with VPC B (172.16.0.0/16)

# VPC A route table must add:
Destination      Target
172.16.0.0/16    pcx-xxxxxxxxx   # peering connection to VPC B

# VPC B route table must add:
Destination      Target
10.0.0.0/16      pcx-xxxxxxxxx   # peering connection to VPC A
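A sketch of establishing the peering itself (IDs are placeholders; the accepter side must accept the request):

aws ec2 create-vpc-peering-connection --vpc-id vpc-aaaa1111 \
  --peer-vpc-id vpc-bbbb2222                                 # returns pcx-id
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-0abc1234
aws ec2 create-route --route-table-id rtb-0vpca \
  --destination-cidr-block 172.16.0.0/16 \
  --vpc-peering-connection-id pcx-0abc1234                   # add the mirror route in VPC B too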
VPC Endpoints allow private connectivity to AWS services without traffic leaving the AWS network: no internet, no NAT Gateway, no extra cost per GB (for Gateway endpoints).
# S3 Gateway Endpoint - add to private route table:
Destination Target
pl-xxxxxxxx      vpce-xxxxxxxx   # S3 prefix list → gateway endpoint
# No code change needed! Your existing S3 calls
# boto3.client('s3').upload_file(...) automatically use the endpoint
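A sketch of creating the S3 Gateway endpoint (region, VPC, and route table IDs are placeholders):

aws ec2 create-vpc-endpoint --vpc-id vpc-0abc1234 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.ap-south-1.s3 \
  --route-table-ids rtb-0private1 rtb-0private2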
Transit Gateway is a network hub that connects thousands of VPCs, on-premises networks, and VPN connections through a single gateway. Instead of creating a mesh of VPC peering connections, all VPCs connect to the TGW hub.
# Without TGW: 10 VPCs need 45 peering connections (n*(n-1)/2)
# With TGW:    10 VPCs each connect once to TGW = 10 attachments

TGW Attachments:
├── VPC-A (prod)
├── VPC-B (staging)
├── VPC-C (shared-services)
├── VPN Connection (on-premises data center)
└── Direct Connect Gateway
Direct Connect establishes a dedicated physical network connection from your on-premises data center to AWS, bypassing the public internet entirely for more consistent performance, lower latency, and reduced data transfer costs.
AWS Site-to-Site VPN creates an encrypted IPsec tunnel between your on-premises network and your AWS VPC over the public internet.
PrivateLink allows you to expose your service privately to other VPCs without peering, without public internet, and without exposing your entire VPC. It's the technology behind Interface VPC Endpoints.
Route 53 Resolver is the built-in DNS resolver that handles DNS queries from within your VPC. Understanding it is key for hybrid cloud DNS.
Amazon S3 is an object storage service: not a filesystem, not a database. You store objects (files) in buckets. S3 provides 11 nines of durability (99.999999999%) by storing data across a minimum of 3 Availability Zones. S3 is accessed via HTTP/HTTPS API calls (PUT, GET, DELETE), not mounted as a filesystem.
| Class | Durability | Availability | AZs | Min Duration | Retrieval | Best For |
|---|---|---|---|---|---|---|
| S3 Standard | 11 9s | 99.99% | ≥3 | None | Milliseconds (free) | Frequently accessed data, websites, mobile apps |
| S3 Intelligent-Tiering | 11 9s | 99.9% | ≥3 | None | Milliseconds to hours | Unknown or changing access patterns |
| S3 Standard-IA | 11 9s | 99.9% | ≥3 | 30 days | Milliseconds (per GB fee) | Disaster recovery, backups accessed monthly |
| S3 One Zone-IA | 11 9s | 99.5% | 1 | 30 days | Milliseconds (per GB fee) | Non-critical infrequent data. 20% cheaper than Standard-IA. |
| Glacier Instant | 11 9s | 99.9% | ≥3 | 90 days | Milliseconds (per GB fee) | Archives accessed once a quarter |
| Glacier Flexible | 11 9s | 99.9% | ≥3 | 90 days | 1-5 min (expedited), 3-5 hrs (standard), 5-12 hrs (bulk) | Archives accessed 1-2 times/year |
| Glacier Deep Archive | 11 9s | 99.9% | ≥3 | 180 days | 12 hrs (standard), 48 hrs (bulk) | Compliance archives, 7-10 year retention |
Versioning stores multiple versions of the same object in a bucket. Every upload creates a new version ID. This protects against accidental overwrites and deletes.
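A sketch of enabling versioning and inspecting versions (bucket, key, and version ID are placeholders):

aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled
aws s3api list-object-versions --bucket my-bucket --prefix report.pdf   # every version + delete markers
aws s3api get-object --bucket my-bucket --key report.pdf \
  --version-id <version-id> old-report.pdf                              # fetch a specific version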
# Make specific objects publicly readable
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "PublicReadGetObject",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-bucket/*"
}]
}
# Force HTTPS only (deny HTTP)
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
"Condition": {
"Bool": { "aws:SecureTransport": "false" }
}
}]
}
# Allow specific IAM role to access bucket
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::123456789:role/AppRole" },
"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
"Resource": "arn:aws:s3:::my-bucket/*"
}]
}
Block Public Access is a safety net that prevents S3 buckets from being accidentally made public. Enabled by default on all new buckets and at the account level.
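A sketch of applying all four Block Public Access settings to one bucket (bucket name is a placeholder):

aws s3api put-public-access-block --bucket my-bucket \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true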
# Create bucket
aws s3 mb s3://my-unique-bucket-name --region ap-south-1

# Upload file
aws s3 cp myfile.txt s3://my-bucket/
aws s3 cp myfile.txt s3://my-bucket/folder/renamed.txt

# Download file
aws s3 cp s3://my-bucket/myfile.txt ./localfile.txt

# List bucket contents
aws s3 ls s3://my-bucket/
aws s3 ls s3://my-bucket/ --recursive   # list all files including subdirs

# Sync (only copies new or modified files)
aws s3 sync ./local-folder/ s3://my-bucket/
aws s3 sync s3://source-bucket/ s3://dest-bucket/

# Delete file
aws s3 rm s3://my-bucket/myfile.txt
aws s3 rm s3://my-bucket/ --recursive   # delete all objects (careful!)

# Make object public
aws s3api put-object-acl --bucket my-bucket --key file.txt --acl public-read
S3 automatically partitions data based on key prefixes for performance. AWS can handle 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.
# Single prefix = limited to 5,500 GET/s
s3://bucket/2024/all-files...   # all under same prefix = limited

# Multiple prefixes = multiply performance
s3://bucket/2024/q1/file        # prefix 1: 5,500 GET/s
s3://bucket/2024/q2/file        # prefix 2: 5,500 GET/s
s3://bucket/2024/q3/file        # prefix 3: 5,500 GET/s
s3://bucket/2024/q4/file        # prefix 4: 5,500 GET/s
# Total: 22,000 GET/s with 4 prefixes!

# Tip: Randomize prefixes to avoid hotspots (old advice for SSE-KMS uploads)
# Modern S3 handles random keys well natively
# Multipart upload via CLI (handled automatically)
aws s3 cp largefile.iso s3://my-bucket/ --expected-size 4294967296

# Or specify multipart threshold and chunk size
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB

# Clean up incomplete multipart uploads
aws s3api list-multipart-uploads --bucket my-bucket
aws s3api abort-multipart-upload \
  --bucket my-bucket \
  --key object-key \
  --upload-id upload-id
S3 Transfer Acceleration speeds up long-distance uploads to S3 by routing through AWS CloudFront Edge Locations. Instead of uploading directly to S3, data goes to the nearest Edge Location, then travels over AWS backbone to S3.
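A sketch of enabling acceleration and uploading through the accelerated endpoint (bucket and file names are placeholders):

aws s3api put-bucket-accelerate-configuration --bucket my-bucket \
  --accelerate-configuration Status=Enabled
aws s3 cp bigfile.zip s3://my-bucket/ \
  --endpoint-url https://s3-accelerate.amazonaws.com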
The accelerated endpoint format is bucket-name.s3-accelerate.amazonaws.com.

Replication automatically copies objects between S3 buckets, either within the same region or across regions.
| Feature | CRR (Cross-Region) | SRR (Same-Region) |
|---|---|---|
| Purpose | Compliance, lower latency, cross-account backups | Log aggregation, data sharing, test/prod sync |
| Data transfer cost | Yes (inter-region charges) | No extra charges |
| Latency | Near real-time (asynchronous) | Near real-time (asynchronous) |
| Versioning | Required on both source and destination | Required on both |
Lifecycle policies automate transitioning objects between storage classes and expiring old objects/versions. Reduces storage costs significantly.
# Typical lifecycle policy example:
# Day 0:    Upload to S3 Standard
# Day 30:   Transition to S3 Standard-IA
# Day 90:   Transition to S3 Glacier Flexible Retrieval
# Day 365:  Transition to S3 Glacier Deep Archive
# Day 2555 (7 years): Delete permanently

# Also useful for:
# - Expire incomplete multipart uploads after 7 days
# - Delete old versions after 30 days (with versioning enabled)
# - Delete expired object delete markers
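A sketch of a matching lifecycle configuration via the CLI (bucket name, rule ID, and prefix are placeholders):

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2555 }
    }]
  }'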
| Encryption Type | Key Management | Header Required | Notes |
|---|---|---|---|
| SSE-S3 (default since Jan 2023) | AWS manages keys entirely. AES-256. | x-amz-server-side-encryption: AES256 | No configuration needed. Automatic on all new objects. |
| SSE-KMS | AWS KMS. You choose CMK. | x-amz-server-side-encryption: aws:kms | Audit trail in CloudTrail. KMS API quota limits. Use S3 Bucket Keys to reduce API calls. |
| SSE-C | You provide the key with EVERY request. | Key in request header | MUST use HTTPS. AWS doesn't store the key. You lose key = you lose data. |
| Client-Side Encryption | You encrypt before uploading. Complete control. | N/A โ encrypted before upload | AWS never sees plaintext. Use AWS Encryption SDK or your own solution. |
For static website hosting, the bucket name must match your domain name (e.g., www.example.com).

S3 can send event notifications when specific events occur on objects (create, delete, restore, replication).
| Destination | Use Case | Latency |
|---|---|---|
| SNS Topic | Fan-out to multiple systems, email alerts | Seconds |
| SQS Queue | Decouple processing, retry failed events | Seconds |
| Lambda Function | Process objects on upload (resize, validate, extract) | Seconds |
| EventBridge | Advanced filtering, 20+ targets, archive/replay events | Seconds |
# Example: Trigger Lambda when image is uploaded to /images/ prefix
# S3 Event:    ObjectCreated (PUT, POST, COPY)
# Filter:      Prefix = images/, Suffix = .jpg
# Destination: Lambda function ARN

# Common use case: Image processing pipeline
# 1. User uploads image to S3 (s3://my-bucket/images/photo.jpg)
# 2. S3 sends event notification to Lambda
# 3. Lambda reads original image from S3
# 4. Lambda resizes to multiple dimensions
# 5. Lambda writes thumbnails back to S3 (s3://my-bucket/thumbnails/)
When another AWS account needs to access your S3 bucket, you have three main approaches:
Add a bucket policy to Account A's bucket that grants permissions to Account B's users/roles.
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "CrossAccountAccess",
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::ACCOUNT-B-ID:root", // all of Account B
"arn:aws:iam::ACCOUNT-B-ID:role/SpecificRole" // or just a specific role
]
},
"Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
"Resource": [
"arn:aws:s3:::account-a-bucket",
"arn:aws:s3:::account-a-bucket/*"
]
}]
}
// Account B users ALSO need IAM permission to make the S3 calls
# Account A creates IAM Role with S3 access + trust policy:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::ACCOUNT-B-ID:root" },
"Action": "sts:AssumeRole"
}]
}
# Account B user assumes the role:
aws sts assume-role \
--role-arn "arn:aws:iam::ACCOUNT-A-ID:role/S3AccessRole" \
--role-session-name "cross-account-session"
# Returns: AccessKeyId, SecretAccessKey, SessionToken (valid 1 hour)
# Use temporary credentials to access Account A's S3:
AWS_ACCESS_KEY_ID=xxx AWS_SECRET_ACCESS_KEY=yyy AWS_SESSION_TOKEN=zzz \
aws s3 ls s3://account-a-bucket/
Generate a time-limited URL that grants temporary access to a specific S3 object. The URL includes authentication information embedded in it. Anyone with the URL can access the object for the duration.
# Generate pre-signed URL for downloading (valid 1 hour)
aws s3 presign s3://my-bucket/private-report.pdf --expires-in 3600
# Output: https://my-bucket.s3.amazonaws.com/private-report.pdf?X-Amz-Algorithm=...&X-Amz-Expires=3600&...
# NOTE: 'aws s3 presign' only generates download (GET) URLs.
# For upload (PUT) URLs, use the SDK instead: generate_presigned_url('put_object', ...)
# Python example
import boto3
s3 = boto3.client('s3')
url = s3.generate_presigned_url(
'get_object',
Params={'Bucket': 'my-bucket', 'Key': 'private-file.pdf'},
ExpiresIn=3600
)
Access Points are named network endpoints attached to a bucket, each with their own permissions policy. Instead of a single complex bucket policy managing hundreds of users, create one Access Point per use case.
# Example: Data lake bucket accessed by multiple teams
# Instead of one complex bucket policy, create separate access points:
# - data-scientists-ap: allow read/write to /analytics/ prefix only
# - finance-ap: allow read to /finance/ prefix only
# - dev-team-ap: allow read/write to /dev/ prefix only, VPC-only access

aws s3control create-access-point \
  --account-id 123456789012 \
  --name data-scientists-ap \
  --bucket my-data-lake \
  --vpc-configuration VpcId=vpc-12345678   # VPC-only access

# Access point ARN: arn:aws:s3:region:account:accesspoint/data-scientists-ap
# Use the access point ARN anywhere you'd use a bucket name in S3 API calls
S3 Object Lambda adds your code to process data retrieved from S3 before returning it to the requesting application. Data is modified on-the-fly without storing multiple versions.
# Lambda function for S3 Object Lambda (redact PII)
import boto3, re, requests  # 'requests' must be bundled in the deployment package
s3_client = boto3.client('s3')
def lambda_handler(event, context):
# Get object from S3
object_get_context = event["getObjectContext"]
request_route = object_get_context["outputRoute"]
request_token = object_get_context["outputToken"]
s3_url = object_get_context["inputS3Url"]
# Retrieve original object
response = requests.get(s3_url)
original_content = response.text
# Redact SSNs (pattern: XXX-XX-XXXX)
redacted = re.sub(r'\d{3}-\d{2}-\d{4}', 'XXX-XX-XXXX', original_content)
# Return modified content
s3_client.write_get_object_response(
Body=redacted,
RequestRoute=request_route,
RequestToken=request_token
)
return {'status_code': 200}
MFA requires users to provide two forms of authentication: something they know (password) and something they have (MFA device). Even if a password is stolen, an attacker can't log in without the MFA device.
| MFA Type | Description | Examples |
|---|---|---|
| Virtual MFA Device | TOTP (Time-based One-Time Password) app on smartphone | Google Authenticator, Authy, Microsoft Authenticator, Duo |
| Hardware TOTP Token | Physical device that generates 6-digit codes | Gemalto token, RSA SecurID |
| FIDO Security Key (U2F) | Physical USB/NFC key; press button to authenticate | YubiKey, Titan Security Key |
| Passkey / Biometric | Built-in biometric (fingerprint, face) stored in device | Touch ID on Mac, Windows Hello, smartphone biometrics |
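For virtual MFA devices specifically, the attach flow can also be scripted; a sketch with placeholder user and device names (the two codes are consecutive TOTP values read from the authenticator app after scanning the seed):

import boto3

iam = boto3.client('iam')

# Create the virtual device; the response includes the Base32 seed / QR code
# that the user scans into Google Authenticator, Authy, etc.
device = iam.create_virtual_mfa_device(VirtualMFADeviceName='alice-phone')
serial = device['VirtualMFADevice']['SerialNumber']

# Activate it with two consecutive codes from the app
iam.enable_mfa_device(
    UserName='alice',
    SerialNumber=serial,
    AuthenticationCode1='123456',
    AuthenticationCode2='654321'
)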
| Entity | What It Is | Credentials | Best For |
|---|---|---|---|
| User | Person or application with long-term identity in your account | Password + Access Keys | Human employees, CI/CD pipelines (when no other option) |
| Group | Collection of IAM users; policies attached to the group apply to all members | N/A (inherits from policies) | Organizing users by job function (Developers, Admins, Read-Only) |
| Role | IAM identity with permission policies but NO permanent credentials. Assumed by trusted entities. | Temporary credentials (STS) | EC2/Lambda accessing AWS services, cross-account access, identity federation |
| Policy | JSON document defining permissions (Allow/Deny actions on resources) | N/A | Attached to users, groups, roles, or resources |
Roles are the AWS-recommended way to grant permissions to AWS services. Instead of creating IAM users with access keys for EC2 instances (insecure), you create a role with an Instance Profile, as sketched below.
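A minimal boto3 sketch of that setup (role, policy, and profile names are illustrative): create a role only EC2 can assume, attach a managed policy, and wrap the role in an instance profile that gets attached at instance launch:

import boto3, json

iam = boto3.client('iam')

# Trust policy: only the EC2 service may assume this role
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}
iam.create_role(RoleName='S3ReadRole', AssumeRolePolicyDocument=json.dumps(trust))
iam.attach_role_policy(RoleName='S3ReadRole',
                       PolicyArn='arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess')

# The instance profile is the container EC2 actually attaches
iam.create_instance_profile(InstanceProfileName='S3ReadProfile')
iam.add_role_to_instance_profile(InstanceProfileName='S3ReadProfile',
                                 RoleName='S3ReadRole')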
{
"Version": "2012-10-17", // Always use this version
"Statement": [
{
"Sid": "AllowS3ReadWrite", // Optional: human-readable ID for this statement
"Effect": "Allow", // Allow or Deny
"Action": [ // What API calls are allowed/denied
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [ // What resources the action applies to
"arn:aws:s3:::my-bucket", // bucket (for ListBucket)
"arn:aws:s3:::my-bucket/*" // all objects in bucket
],
"Condition": { // Optional: when this policy applies
"StringEquals": {
"aws:RequestedRegion": "ap-south-1" // Only in Mumbai region
},
"Bool": {
"aws:MultiFactorAuthPresent": "true" // Only when using MFA
}
}
},
{
"Sid": "DenyDeleteProduction",
"Effect": "Deny", // Explicit Deny always wins over Allow
"Action": "s3:DeleteObject",
"Resource": "arn:aws:s3:::production-bucket/*",
"Condition": {
"StringNotEquals": {
"aws:PrincipalTag/Environment": "Admin" // Unless user has Admin tag
}
}
}
]
}
# Configure CLI with access keys (for human users)
aws configure
# Prompts: Access Key ID, Secret Access Key, Region, Output format

# Or set up a specific profile
aws configure --profile myproject
# Use a specific profile
aws s3 ls --profile myproject
# List configured profiles
aws configure list-profiles

# View current identity (who am I?)
aws sts get-caller-identity
# Returns: Account, UserId, Arn

# On EC2 with an IAM role: NO configuration needed!
aws s3 ls s3://my-bucket/   # uses the instance profile automatically

# Access key rotation (best practice: every 90 days)
aws iam create-access-key --user-name myuser
aws iam delete-access-key --user-name myuser --access-key-id AKIAIOSFODNN7EXAMPLE
Hardcoding passwords, API keys, or tokens directly in code is one of the most dangerous security mistakes. If code is pushed to GitHub (even accidentally), credentials are exposed publicly. AWS scanners, bots, and attackers actively scrape GitHub for AWS keys; a compromised key can result in thousands of dollars of AWS charges within minutes.
Secrets Manager is a dedicated service for storing, rotating, and retrieving secrets. Applications call the API at runtime instead of having credentials in code or config files.
| Feature | Details |
|---|---|
| Automatic Rotation | Rotates RDS, Aurora, Redshift, DocumentDB credentials on schedule via Lambda. Zero downtime: updates the DB password and stores the new value atomically. |
| Encryption | All secrets encrypted with KMS (AWS-managed or your own CMK) |
| Versioning | Keeps previous versions (AWSPREVIOUS) during rotation for zero-downtime cutover |
| Audit Trail | Every GetSecretValue call logged in CloudTrail, giving a full audit of who accessed what and when |
| Cross-account | Share secrets across AWS accounts using resource-based policies |
| Cost | $0.40/secret/month + $0.05 per 10,000 API calls |
import boto3, json
def get_secret(secret_name):
client = boto3.client('secretsmanager', region_name='ap-south-1')
resp = client.get_secret_value(SecretId=secret_name)
return json.loads(resp['SecretString'])
# Usage - credentials fetched at runtime, never in code
creds = get_secret('prod/myapp/rds')
conn = pymysql.connect(host=creds['host'], user=creds['username'],
password=creds['password'], database=creds['dbname'])
Automatic rotation works by triggering a Lambda function on a schedule. AWS provides pre-built rotation Lambdas for RDS, Aurora, Redshift, and DocumentDB. For other services, you write a custom Lambda, as sketched below.
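A custom rotation Lambda must handle four steps that Secrets Manager invokes in order; a skeleton sketch (the set/test steps are elided because they depend on the target service):

import boto3

sm = boto3.client('secretsmanager')

def lambda_handler(event, context):
    secret_id = event['SecretId']
    token = event['ClientRequestToken']
    step = event['Step']          # Secrets Manager calls each step in order

    if step == 'createSecret':
        # Generate a new password and stage it as AWSPENDING
        new_pw = sm.get_random_password(PasswordLength=32)['RandomPassword']
        sm.put_secret_value(SecretId=secret_id, ClientRequestToken=token,
                            SecretString=new_pw, VersionStages=['AWSPENDING'])
    elif step == 'setSecret':
        pass   # apply the AWSPENDING password to the database itself
    elif step == 'testSecret':
        pass   # connect to the database using the AWSPENDING credentials
    elif step == 'finishSecret':
        # Promote AWSPENDING to AWSCURRENT; production code also passes
        # RemoveFromVersionId for the version currently holding AWSCURRENT
        sm.update_secret_version_stage(SecretId=secret_id,
                                       VersionStage='AWSCURRENT',
                                       MoveToVersionId=token)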
KMS is the central key management service for all AWS encryption. It creates and controls cryptographic keys used to encrypt data. Crucially, plaintext keys NEVER leave KMS: all encrypt/decrypt operations happen inside the service via API calls.
| Key Type | Who Manages | Cost | Use Case |
|---|---|---|---|
| AWS Owned Keys | AWS (hidden) | Free | Default for S3, SQS, DynamoDB |
| AWS Managed Keys | AWS (visible) | Free | aws/s3, aws/ebs, aws/rds |
| Customer Managed CMK | You | $1/month + $0.03/10K API calls | Custom rotation, cross-account, audit |
| Imported Keys | You (bring own key) | $1/month | Regulatory compliance (BYOK) |
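The API-call model looks like this in boto3 (the key alias is a placeholder). Direct encrypt is limited to 4 KB of plaintext; anything larger uses a data key (envelope encryption):

import boto3

kms = boto3.client('kms')

# Direct encrypt/decrypt: plaintext limited to 4 KB
ct = kms.encrypt(KeyId='alias/my-app-key', Plaintext=b'db-password')['CiphertextBlob']
pt = kms.decrypt(CiphertextBlob=ct)['Plaintext']   # key ID is embedded in the blob

# Envelope encryption for larger data: encrypt locally with the plaintext key,
# store only the encrypted copy of the data key next to the data
dk = kms.generate_data_key(KeyId='alias/my-app-key', KeySpec='AES_256')
plaintext_key = dk['Plaintext']        # use for local AES, then discard
encrypted_key = dk['CiphertextBlob']   # safe to store; decrypt via KMS when needed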
Unlike IAM policies, KMS keys REQUIRE a key policy: without one, no one (not even root) can use the key. Key policies are resource-based policies attached directly to the CMK.
{
"Statement": [
{
"Sid": "Enable IAM User Permissions",
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::123456789012:root"},
"Action": "kms:*",
"Resource": "*"
},
{
"Sid": "Allow Lambda to use key",
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::123456789012:role/lambda-role"},
"Action": ["kms:Decrypt","kms:GenerateDataKey"],
"Resource": "*"
}
]
}
CloudWatch is AWS's unified observability platform: the single place to monitor all your AWS resources and applications. It collects metrics (numbers), logs (text), and traces (request paths) and lets you set alarms, create dashboards, and trigger automated actions. Think of it as the "nervous system" of your AWS infrastructure.
| Metric | Description | Unit | Alarm Threshold |
|---|---|---|---|
| CPUUtilization | % of CPU used by instance | Percent | Alert if >80% for 5 mins |
| NetworkIn / NetworkOut | Bytes received/sent | Bytes | Alert on traffic spikes |
| DiskReadOps / DiskWriteOps | I/O operations (instance store only) | Count | Detect disk bottleneck |
| StatusCheckFailed_Instance | OS-level issues (kernel panic, etc.) | Count (0 or 1) | Alert on any failure |
| StatusCheckFailed_System | AWS hardware issues | Count (0 or 1) | Alert on any failure |
The CloudWatch Agent is software you install on EC2 (or on-premises servers) to collect metrics and logs that aren't available by default, especially memory utilization, disk space, and custom application logs.
# Install CloudWatch Agent on Amazon Linux 2
sudo yum install -y amazon-cloudwatch-agent

# Run the configuration wizard (interactive)
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

# Or use a config file (stored in SSM Parameter Store for central management)
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -s -c ssm:/AmazonCloudWatch-Config

# Start and enable
sudo systemctl start amazon-cloudwatch-agent
sudo systemctl enable amazon-cloudwatch-agent
sudo systemctl status amazon-cloudwatch-agent
{
"metrics": {
"namespace": "CWAgent",
"metrics_collected": {
"mem": {
"measurement": ["mem_used_percent"],
"metrics_collection_interval": 60
},
"disk": {
"measurement": ["used_percent"],
"resources": ["/", "/data"],
"metrics_collection_interval": 300
},
"cpu": {
"totalcpu": true,
"metrics_collection_interval": 60
}
}
},
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/nginx/access.log",
"log_group_name": "/ec2/nginx/access",
"log_stream_name": "{instance_id}"
},
{
"file_path": "/var/log/myapp/app.log",
"log_group_name": "/ec2/myapp",
"log_stream_name": "{hostname}"
}
]
}
}
}
}
| Setting | What it means | Example |
|---|---|---|
| Metric | What to watch | CPUUtilization, namespace=AWS/EC2 |
| Statistic | How to aggregate data points | Average, Maximum, Sum, p99 |
| Period | Length of each evaluation window | 300 seconds (5 min) |
| Evaluation Periods | Total windows to look at | 3 (look at last 15 min) |
| Datapoints to Alarm | How many windows must breach (M of N) | 2 of 3 |
| Threshold | The trigger value | > 80% |
| Missing data | How to treat gaps | notBreaching / breaching / ignore |
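These settings map one-to-one onto the put_metric_alarm API; a sketch (instance ID and SNS topic ARN are placeholders):

import boto3

cw = boto3.client('cloudwatch')

# "Alarm if CPU > 80% in 2 of the last 3 five-minute windows"
cw.put_metric_alarm(
    AlarmName='web-high-cpu',
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-1234567890abcdef0'}],
    Statistic='Average',
    Period=300,                 # 5-minute windows
    EvaluationPeriods=3,        # look at the last 3 windows
    DatapointsToAlarm=2,        # 2 of 3 must breach
    Threshold=80.0,
    ComparisonOperator='GreaterThanThreshold',
    TreatMissingData='notBreaching',
    AlarmActions=['arn:aws:sns:ap-south-1:123456789012:ops-alerts']
)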
# Find all ERROR log lines in the last hour
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

# Count errors by type
fields @message
| filter @message like /Exception/
| parse @message "* Exception: *" as prefix, errorType
| stats count(*) as errorCount by errorType
| sort errorCount desc

# Lambda: find slow invocations (>3 seconds)
filter @type = "REPORT"
| fields @requestId, @duration, @billedDuration, @memorySize, @maxMemoryUsed
| filter @duration > 3000
| sort @duration desc
EventBridge is a serverless event bus that connects applications using events. AWS services emit events when things happen (EC2 state change, S3 object uploaded, CodePipeline failed). EventBridge routes these events to target services for automated responses, enabling event-driven architectures without polling.
# Scheduled rule examples:
rate(5 minutes) # every 5 minutes
rate(1 hour) # every hour
cron(0 18 ? * MON-FRI *) # 6 PM UTC weekdays
cron(30 3 * * ? *) # 3:30 AM UTC daily (9 AM IST)
# Event pattern: trigger when EC2 instance stops
{
"source": ["aws.ec2"],
"detail-type": ["EC2 Instance State-change Notification"],
"detail": {
"state": ["stopped", "terminated"]
}
}
# Event pattern: S3 object uploaded
{
"source": ["aws.s3"],
"detail-type": ["Object Created"],
"detail": {
"bucket": {"name": ["my-uploads-bucket"]},
"object": {"key": [{"prefix": "images/"}]}
}
}
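Wiring a pattern like the ones above to a target takes two calls; a sketch assuming an existing Lambda (the ARN is a placeholder, and the function additionally needs a resource-based permission allowing events.amazonaws.com to invoke it):

import boto3, json

events = boto3.client('events')

# Rule fires whenever an EC2 instance stops or terminates
events.put_rule(
    Name='ec2-stopped',
    State='ENABLED',
    EventPattern=json.dumps({
        'source': ['aws.ec2'],
        'detail-type': ['EC2 Instance State-change Notification'],
        'detail': {'state': ['stopped', 'terminated']}
    })
)

# Route matching events to a Lambda target
events.put_targets(
    Rule='ec2-stopped',
    Targets=[{'Id': 'notify',
              'Arn': 'arn:aws:lambda:ap-south-1:123456789012:function:notify'}]
)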
EventBridge Pipes connects event sources (SQS, DynamoDB Streams, Kinesis) to targets with optional filtering, enrichment (Lambda/Step Functions), and transformation, all without writing integration code.
Source (SQS) → Filter → Enrichment (Lambda) → Target (Step Functions)
Composite alarms combine multiple CloudWatch alarms into a single alarm using AND/OR/NOT logic. They reduce alert noise by only notifying when multiple conditions are true simultaneously.
# Only alarm when BOTH CPU is high AND memory is high
# Prevents noisy false positives from individual metric spikes
{
"AlarmRule": "ALARM(cpu-alarm) AND ALARM(memory-alarm)"
}
# Alert when ANY of these critical conditions occur
{
"AlarmRule": "ALARM(disk-full) OR ALARM(health-check-failed) OR ALARM(db-connections-max)"
}
Metric Math lets you create new time-series by performing mathematical operations on existing metrics: compute error rates, percentages, sums across instances, etc.
# Error rate calculation
METRICS:
m1: Errors (Count)
m2: Requests (Count)
EXPRESSION:
e1: (m1/m2)*100 → ErrorRate (%)
# Sum CPU across all EC2 instances in an ASG
SEARCH('{AWS/EC2,InstanceId} CPUUtilization', 'Average', 300)
# Then: SUM(METRICS()) → Total CPU across all instances
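Metric math runs server-side via get_metric_data; a sketch of the error-rate example above (ALB dimensions omitted for brevity, so the sums aggregate across all load balancers):

import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client('cloudwatch')

resp = cw.get_metric_data(
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    MetricDataQueries=[
        {'Id': 'm1', 'ReturnData': False, 'MetricStat': {
            'Metric': {'Namespace': 'AWS/ApplicationELB',
                       'MetricName': 'HTTPCode_Target_5XX_Count'},
            'Period': 300, 'Stat': 'Sum'}},
        {'Id': 'm2', 'ReturnData': False, 'MetricStat': {
            'Metric': {'Namespace': 'AWS/ApplicationELB',
                       'MetricName': 'RequestCount'},
            'Period': 300, 'Stat': 'Sum'}},
        # The expression is evaluated by CloudWatch, not locally
        {'Id': 'e1', 'Expression': '(m1/m2)*100', 'Label': 'ErrorRate'}
    ]
)
print(resp['MetricDataResults'][0]['Values'])   # error rate per 5-min window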
Analyzes log data to identify the "top contributors" to performance problems, e.g., which IP addresses are generating the most 404s, which Lambda functions are causing the most errors, which URLs have the highest latency.
Uses machine learning to automatically create a "band" of expected values for any metric based on historical patterns. Alarms trigger when the metric goes outside the expected band; no manual threshold needed.
X-Ray traces requests as they travel through your distributed application: from API Gateway → Lambda → DynamoDB → external API. It shows you exactly where latency comes from and which service is causing errors.
# Python Lambda with X-Ray tracing
from aws_xray_sdk.core import xray_recorder, patch_all
patch_all() # Auto-instrument boto3, requests, pymysql
@xray_recorder.capture("process_order")
def process_order(order_id):
xray_recorder.put_annotation("order_id", order_id)
# your code - automatically traced
result = table.get_item(Key={"order_id": order_id})
return result
Synthetics lets you create "canaries": scripts that run on a schedule to test your endpoints and APIs from outside your application, simulating user behavior 24/7.
AWS provides a layered security approach: multiple services working together cover different aspects of security (edge protection, identity, vulnerability management, threat detection, compliance, and incident response). Understanding which tool does what is essential.
| Service | Category | What it does |
|---|---|---|
| Shield | DDoS Protection | Protects against volumetric network attacks |
| WAF | App Firewall | Blocks malicious HTTP requests (SQLi, XSS) |
| ACM | SSL/TLS | Free certificates for AWS services |
| GuardDuty | Threat Detection | ML-based anomaly detection across your account |
| Inspector | Vulnerability Scan | CVE scanning for EC2, Lambda, containers |
| Macie | Data Security | Finds PII/sensitive data in S3 |
| Security Hub | CSPM | Centralized security findings dashboard |
| CloudTrail | Audit | Records all API calls in account |
| Config | Compliance | Tracks config changes, evaluates rules |
| Trusted Advisor | Best Practices | Recommendations across 5 pillars |
ACM provisions, manages, and auto-renews SSL/TLS certificates. Public certificates are completely FREE when used with AWS services: no more paying certificate authorities or worrying about expiration dates.
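Requesting a DNS-validated public certificate is a single call; a sketch with a placeholder domain (note that certificates used by CloudFront must be issued in us-east-1):

import boto3

acm = boto3.client('acm', region_name='us-east-1')   # CloudFront requires us-east-1

resp = acm.request_certificate(
    DomainName='example.com',
    SubjectAlternativeNames=['*.example.com'],
    ValidationMethod='DNS'
)

# describe_certificate returns the CNAME record to create for DNS validation;
# once it resolves, ACM issues the certificate and auto-renews it
cert = acm.describe_certificate(CertificateArn=resp['CertificateArn'])
for opt in cert['Certificate']['DomainValidationOptions']:
    print(opt.get('ResourceRecord'))   # may take a few seconds to appear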
WAF inspects HTTP/HTTPS requests at Layer 7 and blocks malicious traffic before it reaches your application. Deploy on CloudFront, ALB, API Gateway, or AppSync.
| Rule Type | Description | Example |
|---|---|---|
| IP Set Rules | Allow/block specific IPs or CIDRs | Block known bad IP ranges |
| Geographic Rules | Allow/block by country | Only allow India and US |
| Rate-Based Rules | Limit requests per IP per 5 minutes | Max 2000 req/5min per IP |
| SQL Injection Match | Detect SQL injection patterns in request | Block ' OR 1=1-- in query string |
| XSS Match | Detect cross-site scripting patterns | Block script tags in body |
| Regex Pattern | Custom regex matching on request parts | Block specific User-Agent strings |
| AWS Managed Rules | Pre-built rulesets maintained by AWS | Core Rule Set, Known Bad Inputs, PHP, WordPress |
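As a sketch of how a rate-based rule from the table is expressed through the WAFv2 API (ACL and metric names are illustrative):

import boto3

wafv2 = boto3.client('wafv2', region_name='ap-south-1')

# Block any single IP that exceeds 2000 requests per 5 minutes
wafv2.create_web_acl(
    Name='api-protection',
    Scope='REGIONAL',          # REGIONAL = ALB/API Gateway; CLOUDFRONT uses us-east-1
    DefaultAction={'Allow': {}},
    Rules=[{
        'Name': 'rate-limit-per-ip',
        'Priority': 1,
        'Statement': {'RateBasedStatement': {'Limit': 2000, 'AggregateKeyType': 'IP'}},
        'Action': {'Block': {}},
        'VisibilityConfig': {'SampledRequestsEnabled': True,
                             'CloudWatchMetricsEnabled': True,
                             'MetricName': 'RateLimitPerIP'}
    }],
    VisibilityConfig={'SampledRequestsEnabled': True,
                      'CloudWatchMetricsEnabled': True,
                      'MetricName': 'ApiProtection'}
)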
GuardDuty is a threat detection service that continuously monitors your AWS account for malicious activity using machine learning, anomaly detection, and threat intelligence feeds. It requires no agents: it analyzes VPC Flow Logs, CloudTrail, DNS logs, and S3 data events automatically.
# Auto-remediate GuardDuty finding via EventBridge + Lambda:
# Trigger: GuardDuty finding type = "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration"
# Action Lambda: Revoke all active sessions for the IAM role, notify security team
import boto3, json
from datetime import datetime, timezone

def lambda_handler(event, context):
    iam = boto3.client('iam')
    detail = event['detail']
    role_name = detail['resource']['accessKeyDetails']['userName']
    # Revoke all active sessions by attaching an explicit deny policy
    # (denies any call made with tokens issued before this moment)
    policy = {"Version": "2012-10-17", "Statement": [{
        "Effect": "Deny", "Action": "*", "Resource": "*",
        "Condition": {"DateLessThan": {
            "aws:TokenIssueTime": datetime.now(timezone.utc).isoformat()}}}]}
    iam.put_role_policy(RoleName=role_name, PolicyName='RevokeAllSessions',
                        PolicyDocument=json.dumps(policy))
Inspector continuously scans EC2 instances, Lambda functions, and container images in ECR for software vulnerabilities (CVEs) and unintended network exposure. Unlike manual scans, Inspector rescans automatically when new CVEs are published.
Macie uses ML to automatically discover, classify, and protect sensitive data stored in S3. It identifies Personally Identifiable Information (PII) like names, credit card numbers, SSNs, passport numbers, and health records.
CloudTrail records every API call made in your AWS account: the who, what, when, and from where of every action. It's your primary tool for security investigation, compliance auditing, and troubleshooting permission issues.
| Event Type | What's captured | Default? | Cost |
|---|---|---|---|
| Management Events | Control plane: create/delete/modify resources (RunInstances, CreateBucket, PutRolePolicy) | Yes (90 days in console) | Free for first trail |
| Data Events | Data plane: S3 GetObject/PutObject, Lambda invocations, DynamoDB ops | No | $0.10/100K events |
| Insight Events | Unusual API activity (sudden spike in TerminateInstances calls) | No | $0.35/100K events |
# CloudTrail log entry example โ who deleted an S3 bucket:
{
"eventTime": "2024-01-15T14:23:11Z",
"eventName": "DeleteBucket",
"userIdentity": {
"type": "IAMUser",
"userName": "dev-john",
"arn": "arn:aws:iam::123456789:user/dev-john"
},
"sourceIPAddress": "203.0.113.45",
"requestParameters": {"bucketName": "prod-backup-bucket"},
"responseElements": null,
"errorCode": null โ null means SUCCESS (bucket deleted!)
}
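Entries like this can also be searched programmatically against the 90-day event history (no trail needed); a small sketch:

import boto3, json

ct = boto3.client('cloudtrail')

# Who has deleted buckets recently?
resp = ct.lookup_events(
    LookupAttributes=[{'AttributeKey': 'EventName', 'AttributeValue': 'DeleteBucket'}],
    MaxResults=10
)
for e in resp['Events']:
    detail = json.loads(e['CloudTrailEvent'])   # the full record, as shown above
    print(e['EventTime'], e.get('Username'),
          detail.get('requestParameters', {}).get('bucketName'))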
Config continuously records configuration changes of your AWS resources and evaluates them against compliance rules. If something is misconfigured (public S3 bucket, unencrypted EBS volume), Config flags it and can auto-remediate.
Security Hub provides a centralized dashboard aggregating security findings from GuardDuty, Inspector, Macie, IAM Access Analyzer, Firewall Manager, and third-party tools into one place with a security score.
SNS is a fully managed pub/sub messaging service. Publishers send messages to topics, and all subscribers receive a copy. It's the glue that connects AWS monitoring alerts to humans and automated systems.
| Subscriber Type | Use Case |
|---|---|
| Email / Email-JSON | Alert engineers when alarm fires |
| SMS | Critical alerts to phones |
| HTTP/HTTPS | Webhook to external systems (PagerDuty, Slack) |
| Lambda | Automated remediation on alert |
| SQS | Fan-out: one message → multiple queues processed independently |
| Kinesis Firehose | Stream alerts to S3/Splunk/Elasticsearch |
| Mobile Push | iOS/Android push notifications |
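The basic pub/sub flow in boto3 (topic name and address are placeholders; an email subscriber must click the confirmation link before receiving messages):

import boto3

sns = boto3.client('sns')

# create_topic is idempotent: it returns the existing ARN if the topic exists
topic_arn = sns.create_topic(Name='ops-alerts')['TopicArn']

# The subscriber gets a confirmation email and must confirm
sns.subscribe(TopicArn=topic_arn, Protocol='email', Endpoint='oncall@example.com')

# Every confirmed subscriber receives a copy of each message
sns.publish(TopicArn=topic_arn,
            Subject='Disk usage above 90%',
            Message='Volume /data on i-1234567890abcdef0 is at 92%.')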
Trusted Advisor analyzes your AWS environment against AWS best practices across 5 categories and gives you recommendations. It's like having an AWS solutions architect review your account automatically.
| Category | Example Checks | Support Plan |
|---|---|---|
| Cost Optimization | Idle EC2 instances, underutilized RDS, unattached EIPs, old snapshots | All plans |
| Performance | CloudFront enabled, EC2 instance types, EBS throughput | Business+ |
| Security | Open security group ports, MFA on root, S3 bucket permissions, exposed access keys | 7 basic checks for all |
| Fault Tolerance | Multi-AZ RDS, ELB health checks, EBS snapshots, Route 53 failover | Business+ |
| Service Limits | Approaching EC2, EIP, VPC limits | All plans |
Global Accelerator improves performance of internet applications by routing traffic through the AWS global backbone network instead of the unpredictable public internet, reducing latency by up to 60% for global users.
Lambda is a serverless, event-driven compute service. You write code, upload it, and Lambda runs it in response to events. You never manage servers: AWS handles provisioning, scaling, patching, and availability automatically. You pay only when code runs (per 1 ms of execution).
| Setting | Range | Notes |
|---|---|---|
| Memory | 128 MB โ 10,240 MB | CPU power scales proportionally with memory |
| Timeout | 1 second โ 15 minutes | Function killed after timeout; set appropriately |
| /tmp Storage | 512 MB โ 10,240 MB | Temporary disk; shared across warm invocations |
| Concurrency | Up to 1,000 (default, region) | Request increase; set reserved concurrency to limit |
| Package size | 50 MB (zip), 250 MB (unzipped) | Use Layers for large dependencies |
| Env variables | 4 KB total | Use Secrets Manager for sensitive values |
import json, boto3, os
# Code OUTSIDE handler runs once per container (cold start)
# Reuse these across warm invocations!
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])
def lambda_handler(event, context):
# event: input data (from API GW, S3, SQS, etc.)
# context: runtime info (function name, remaining time, etc.)
print(f"Function: {context.function_name}")
print(f"Remaining time: {context.get_remaining_time_in_millis()}ms")
print(f"Event: {json.dumps(event)}")
# Process
name = event.get('name', 'World')
return {
'statusCode': 200,
'headers': {'Content-Type': 'application/json'},
'body': json.dumps({'message': f'Hello, {name}!'})
}
Layers are ZIP archives containing libraries, custom runtimes, or dependencies shared across multiple functions. They reduce deployment package size and promote code reuse.
# Create a layer with Python packages
mkdir -p python/lib/python3.12/site-packages
pip install pandas numpy requests -t python/lib/python3.12/site-packages/
zip -r pandas-layer.zip python/

aws lambda publish-layer-version \
  --layer-name pandas-numpy \
  --zip-file fileb://pandas-layer.zip \
  --compatible-runtimes python3.12
| Source | Invocation Type | Use Case |
|---|---|---|
| API Gateway / ALB | Synchronous | REST APIs, web backends |
| S3 | Asynchronous | Image processing on upload, data pipeline |
| DynamoDB Streams | Stream (polling) | React to DB changes, replicate data |
| SQS | Poll-based (event source mapping) | Process queue messages, decoupled workflows |
| SNS | Asynchronous | Fan-out processing, notifications |
| EventBridge | Asynchronous | Scheduled tasks (cron), event-driven workflows |
| Kinesis | Stream (polling) | Real-time data stream processing |
| CloudWatch Logs | Asynchronous | Log processing, alerting from log patterns |
| Limit | Value | Notes |
|---|---|---|
| Max timeout | 15 minutes | For long tasks use Step Functions or ECS |
| Max memory | 10,240 MB (10 GB) | More memory = more vCPU |
| Concurrency (default) | 1,000/region | Request increase via support |
| Package size (zip) | 50 MB | Use Layers for larger deps |
| Package (unzipped) | 250 MB | Including all layers |
| Response payload (sync) | 6 MB | Use S3 for large responses |
| Async payload | 256 KB | Pass S3 key for large data |
| Env variables | 4 KB total | Limit covers all variables combined |
import boto3, pymysql, json, os
# Initialize OUTSIDE handler (connection reuse on warm invocations)
db_conn = None
def get_db_connection():
creds = boto3.client('secretsmanager').get_secret_value(
SecretId='prod/rds/mysql')
c = json.loads(creds['SecretString'])
return pymysql.connect(
host=os.environ['RDS_PROXY_ENDPOINT'], # Proxy, not RDS endpoint!
user=c['username'], password=c['password'],
database='myapp', cursorclass=pymysql.cursors.DictCursor,
connect_timeout=5
)
def lambda_handler(event, context):
global db_conn
if not db_conn or not db_conn.open:
db_conn = get_db_connection()
with db_conn.cursor() as cursor:
cursor.execute("SELECT * FROM users LIMIT 10")
return {'statusCode': 200, 'body': json.dumps(cursor.fetchall())}
import boto3
from decimal import Decimal
# Outside handler = reuse
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')
def lambda_handler(event, context):
# Write (no connection pool needed - HTTP API)
table.put_item(Item={
'order_id': event['order_id'],
'user_id': event['user_id'],
'amount': Decimal(str(event['amount'])),
'status': 'pending'
})
# Read
resp = table.get_item(Key={'order_id': event['order_id']})
return resp.get('Item', {})
# API Gateway Proxy Integration passes full HTTP context to Lambda
# Request: POST /users → Lambda receives:
event = {
"httpMethod": "POST",
"path": "/users",
"pathParameters": {"id": "123"},
"queryStringParameters": {"page": "1"},
"headers": {"Authorization": "Bearer token..."},
"body": '{"name":"Ravi","email":"<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="126073647b52776a737f627e773c717d7f">[email protected]</a>"}',
"isBase64Encoded": False
}
# Lambda MUST return this structure:
return {
"statusCode": 200,
"headers": {
"Content-Type": "application/json",
"Access-Control-Allow-Origin": "*" # for CORS
},
"body": json.dumps({"user_id": "U123", "name": "Ravi"})
}
# Environment variables (non-sensitive config)
import os
TABLE_NAME = os.environ['TABLE_NAME']
REGION = os.environ.get('AWS_REGION', 'ap-south-1')
# For SECRETS, use Secrets Manager (not env vars!)
import boto3, json
_secret_cache = {}
def get_secret(name):
if name not in _secret_cache:
client = boto3.client('secretsmanager')
_secret_cache[name] = json.loads(
client.get_secret_value(SecretId=name)['SecretString']
)
return _secret_cache[name] # cached after first call
DNS (Domain Name System) translates human-readable domain names (google.com) into IP addresses (142.250.80.46) that computers use. Without DNS, you'd need to memorize IP addresses for every website. Route 53 is AWS's highly available and scalable DNS web service โ named after the DNS port (53).
User types www.example.com in browser:
1. Browser checks local cache: not found
2. OS asks the Recursive Resolver (usually ISP or 8.8.8.8)
3. Recursive Resolver asks a Root Nameserver: "ask the .com TLD server"
4. Asks the .com TLD server: "ask ns-123.awsdns-45.com"
5. Asks the Route 53 Nameserver: returns "93.184.216.34"
6. Browser connects to 93.184.216.34
Total time: ~50-200ms (first time), ~0ms (cached)
| Record | Maps | Important Notes | Example |
|---|---|---|---|
| A | hostname → IPv4 | Most common record | example.com → 93.184.216.34 |
| AAAA | hostname → IPv6 | Next-gen internet | example.com → 2606:2800:220:1:248:1893:25c8:1946 |
| CNAME | hostname → hostname | Cannot be used for root/apex domain (example.com), only subdomains (www.example.com) | www.example.com → example.com |
| Alias | hostname → AWS resource | AWS extension. Works for root domain. FREE queries. Use instead of CNAME for AWS resources. | example.com → myalb.amazonaws.com |
| MX | domain → mail servers | Priority number (lower = preferred) | 10 mail.example.com |
| TXT | domain → text string | Domain verification, SPF, DKIM | "v=spf1 include:amazonses.com ~all" |
| NS | zone → nameservers | Which servers are authoritative for zone | ns-123.awsdns-45.com |
| SOA | Zone metadata | Start of Authority: admin info, TTL defaults | Auto-created with hosted zone |
| PTR | IP → hostname | Reverse DNS lookup | 34.216.184.93.in-addr.arpa → example.com |
| SRV | Service location | Used for VoIP, XMPP, Kubernetes | _http._tcp.example.com |
TTL tells DNS resolvers how long to cache a record. Choosing the right TTL is a balance between DNS query costs and propagation speed.
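TTL is set per record. A sketch lowering it ahead of a planned cutover (zone ID, name, and IP are placeholders); raise it again afterwards to cut query volume and cost:

import boto3

r53 = boto3.client('route53')

# Drop the TTL to 60s before a migration so resolver caches expire quickly
r53.change_resource_record_sets(
    HostedZoneId='Z1234567890ABC',
    ChangeBatch={'Changes': [{
        'Action': 'UPSERT',
        'ResourceRecordSet': {
            'Name': 'api.example.com',
            'Type': 'A',
            'TTL': 60,
            'ResourceRecords': [{'Value': '203.0.113.10'}]
        }
    }]}
)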
| Policy | Algorithm | Best For | Health Checks |
|---|---|---|---|
| Simple | Returns all values, client picks randomly | Single resource, no health checks needed | No |
| Weighted | Route X% to A, Y% to B based on weights (0-255) | A/B testing, blue/green deployments, gradual migrations | Optional |
| Failover | Primary active, secondary passive. Auto-switch on health check failure. | DR setup, active-passive HA | Required on primary |
| Geolocation | Route based on user's geographic location (continent, country, state) | Content localization, GDPR data residency, language-specific content | Optional |
| Geoproximity | Route based on distance with adjustable bias (+/-) | Shift traffic between regions, fine-grained global routing | Optional |
| Latency-based | Route to AWS region with lowest measured latency for user | Global apps where performance matters most | Optional |
| Multi-Value | Returns up to 8 healthy records randomly | Simple client-side load balancing (not replacement for ELB) | Integrated |
| IP-based | Route based on client's originating IP CIDR | Route ISP traffic to specific endpoints, optimize peering | No |
Route 53 health checkers are deployed in 15+ locations globally. They check your endpoints every 10 or 30 seconds and mark them unhealthy if enough checks fail โ automatically removing them from DNS responses.
# Failover routing: example setup
Primary record:   www.example.com → ALB in us-east-1 (health check attached)
Secondary record: www.example.com → ALB in eu-west-1 (failover target)

If the primary health check fails for 3+ consecutive checks:
→ Route 53 automatically serves the secondary record
→ Recovery is automatic when the primary becomes healthy again
CloudFront is AWS's Content Delivery Network (CDN) with 400+ Points of Presence (edge locations) globally. Content is cached at edge locations closest to users, reducing latency and origin load.
| Origin Type | Use Case | Security |
|---|---|---|
| S3 Bucket | Static websites, file downloads, media | OAC (Origin Access Control) blocks direct S3 URL access |
| ALB | Dynamic web apps, APIs behind load balancer | Custom header (X-Origin-Key) to verify requests from CF |
| EC2 Instance | Custom servers (must have public IP) | Security Group allow CF IP ranges |
| Any HTTP Endpoint | On-premises, third-party servers | Custom headers, IP whitelisting |
Cache behaviors example: /api/* → ALB (no cache), /images/* → S3 (cache 7 days), /* → S3 (cache 24h)

# Force CloudFront to fetch fresh content from origin
aws cloudfront create-invalidation \
  --distribution-id E1234567ABCDEF \
  --paths "/*"              # all files
  # OR "/index.html"        # a specific file
  # OR "/images/*"          # a specific path

# Cost: first 1,000 invalidation paths/month free, then $0.005 each
# Better approach: use versioned filenames (main.v2.3.css) so no invalidation is needed!
Run Lambda functions at CloudFront edge locations to customize content delivery. There are 4 trigger points per request cycle: Viewer Request, Origin Request, Origin Response, and Viewer Response. A minimal viewer-request example follows.
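A sketch of a viewer-request function in Python (the event shape is the standard CloudFront record structure; the redirect rule itself is illustrative):

def lambda_handler(event, context):
    # CloudFront wraps the HTTP request in Records[0].cf.request
    request = event['Records'][0]['cf']['request']

    # Redirect a legacy path at the edge: the origin is never contacted
    if request['uri'].startswith('/old/'):
        return {
            'status': '301',
            'statusDescription': 'Moved Permanently',
            'headers': {'location': [{
                'key': 'Location',
                'value': request['uri'].replace('/old/', '/new/', 1)}]}
        }

    # Returning the request unchanged lets CloudFront continue normally
    return request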
Infrastructure as Code (IaC) means defining your cloud infrastructure in code files instead of clicking through consoles. This brings software engineering best practices (version control, code review, testing, CI/CD) to infrastructure management.
| Concept | Description | Example |
|---|---|---|
| Provider | Plugin that talks to a cloud API. Translates HCL into API calls. | hashicorp/aws, hashicorp/azurerm, hashicorp/kubernetes |
| Resource | Infrastructure component you want to create/manage | aws_instance, aws_s3_bucket, aws_vpc |
| Data Source | Read existing resource info (don't manage it, just read) | data.aws_ami.latest, data.aws_vpc.default |
| Variable | Input parameter that makes code reusable | var.instance_type, var.environment |
| Local | Computed value within a module; avoids repetition | local.name_prefix = "prod-app" |
| Output | Export values after apply, for use in other modules or scripts | output: EC2 IP, RDS endpoint |
| Module | Reusable package of Terraform code | module "vpc" { source = "./modules/vpc" } |
| State | JSON file tracking what Terraform has created. Source of truth. | terraform.tfstate (store in S3!) |
# 1. Initialize: download providers, set up backend
terraform init

# 2. Format code (always run before committing)
terraform fmt -recursive

# 3. Validate syntax
terraform validate

# 4. ALWAYS review the plan before applying!
terraform plan
terraform plan -out=tfplan.out     # save plan for apply

# 5. Apply changes
terraform apply                    # interactive confirmation
terraform apply tfplan.out         # apply a saved plan
terraform apply -auto-approve      # CI/CD (no prompt)

# Other useful commands
terraform destroy                  # destroy everything (careful!)
terraform output                   # show outputs
terraform state list               # list managed resources
terraform state show aws_instance.web                 # inspect resource state
terraform import aws_s3_bucket.logs my-bucket-name    # import an existing resource
By default, Terraform stores state locally (terraform.tfstate). This breaks down in teams: two people can't work simultaneously, and state isn't shared. ALWAYS use remote state with S3 + DynamoDB locking in production.
# versions.tf โ remote backend configuration
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = { source = "hashicorp/aws", version = "~> 5.0" }
}
backend "s3" {
bucket = "mycompany-terraform-state" # must exist first!
key = "prod/ap-south-1/terraform.tfstate"
region = "ap-south-1"
encrypt = true # encrypt state at rest
dynamodb_table = "terraform-locks" # prevent concurrent applies
}
}
# Create the S3 bucket and DynamoDB table manually first (bootstrap):
aws s3api create-bucket --bucket mycompany-terraform-state --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1   # required outside us-east-1
aws dynamodb create-table --table-name terraform-locks --attribute-definitions AttributeName=LockID,AttributeType=S --key-schema AttributeName=LockID,KeyType=HASH --billing-mode PAY_PER_REQUEST
# variables.tf
variable "env" { default = "dev" }
variable "region" { default = "ap-south-1" }
# vpc.tf
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = "${var.env}-vpc" }
}
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = "${var.region}a"
map_public_ip_on_launch = true
tags = { Name = "${var.env}-public-1a" }
}
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.main.id
tags = { Name = "${var.env}-igw" }
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route { cidr_block = "0.0.0.0/0"; gateway_id = aws_internet_gateway.igw.id }
tags = { Name = "${var.env}-public-rt" }
}
resource "aws_route_table_association" "public" {
subnet_id = aws_subnet.public.id
route_table_id = aws_route_table.public.id
}
# ec2.tf
data "aws_ami" "al2" {
most_recent = true
owners = ["amazon"]
filter { name = "name"; values = ["amzn2-ami-hvm-*-x86_64-gp2"] }
}
resource "aws_security_group" "web" {
name = "${var.env}-web-sg"
vpc_id = aws_vpc.main.id
ingress { from_port=80; to_port=80; protocol="tcp"; cidr_blocks=["0.0.0.0/0"] }
ingress { from_port=443; to_port=443; protocol="tcp"; cidr_blocks=["0.0.0.0/0"] }
ingress { from_port=22; to_port=22; protocol="tcp"; cidr_blocks=["10.0.0.0/8"] }
egress { from_port=0; to_port=0; protocol="-1"; cidr_blocks=["0.0.0.0/0"] }
}
resource "aws_instance" "web" {
ami = data.aws_ami.al2.id
instance_type = "t3.micro"
subnet_id = aws_subnet.public.id
vpc_security_group_ids = [aws_security_group.web.id]
user_data = <<-EOF
#!/bin/bash
amazon-linux-extras install -y nginx1   # nginx ships via amazon-linux-extras on AL2
systemctl start nginx
systemctl enable nginx
EOF
tags = { Name = "${var.env}-web" }
}
# outputs.tf
output "web_public_ip" { value = aws_instance.web.public_ip }
output "web_public_dns" { value = aws_instance.web.public_dns }
Modules are reusable packages of Terraform configuration. Instead of copy-pasting VPC code across multiple projects, create a VPC module once and reuse it everywhere.
# Using community modules from Terraform Registry
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.0.0"
name = "my-vpc"
cidr = "10.0.0.0/16"
azs = ["ap-south-1a", "ap-south-1b"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
tags = { Terraform = "true", Environment = "dev" }
}
# Reference module outputs
resource "aws_instance" "app" {
subnet_id = module.vpc.private_subnets[0]
# ...
}
Terraform best practices:
- ALWAYS run terraform plan and review the output before apply
- Pin provider versions: version = "~> 5.0"
- Add terraform fmt pre-commit hooks for consistent formatting
- Run terraform validate in your CI/CD pipeline
- Mark sensitive values with sensitive = true
- Use count or for_each instead of duplicating resources

Boto3 is the official AWS SDK for Python. It lets you programmatically interact with AWS services: create resources, manage infrastructure, automate tasks, and build applications that use AWS.
pip install boto3
# Two interfaces:
import boto3
# 1. Client (low-level, 1:1 map to AWS API)
ec2_client = boto3.client('ec2', region_name='ap-south-1')
# 2. Resource (high-level, object-oriented)
s3 = boto3.resource('s3')
# Authentication order:
# 1. Environment variables (AWS_ACCESS_KEY_ID, etc.)
# 2. ~/.aws/credentials file (aws configure)
# 3. IAM Instance Profile (EC2) or Task Role (ECS/Lambda) - recommended on AWS
import boto3
ec2 = boto3.client('ec2', region_name='ap-south-1')
# List all running instances with details
def list_instances(state='running'):
paginator = ec2.get_paginator('describe_instances')
for page in paginator.paginate(Filters=[{'Name':'instance-state-name','Values':[state]}]):
for r in page['Reservations']:
for i in r['Instances']:
name = next((t['Value'] for t in i.get('Tags',[]) if t['Key']=='Name'), 'N/A')
print(f"{i['InstanceId']:20} {i['InstanceType']:12} {i.get('PublicIpAddress','Private'):15} {name}")
# Start/Stop/Reboot
ec2.start_instances(InstanceIds=['i-1234567890abcdef0'])
ec2.stop_instances(InstanceIds=['i-1234567890abcdef0'])
ec2.reboot_instances(InstanceIds=['i-1234567890abcdef0'])
# Create snapshot with tags
def snapshot_volume(volume_id, desc="Auto backup"):
    import datetime  # used for the date tag below
    snap = ec2.create_snapshot(VolumeId=volume_id, Description=desc,
TagSpecifications=[{'ResourceType':'snapshot',
'Tags':[{'Key':'AutoCreated','Value':'true'},
{'Key':'Date','Value':str(datetime.date.today())}]}])
return snap['SnapshotId']
# Delete old snapshots (older than N days)
def cleanup_old_snapshots(days=30):
from datetime import datetime, timezone, timedelta
cutoff = datetime.now(timezone.utc) - timedelta(days=days)
snaps = ec2.describe_snapshots(OwnerIds=['self'])['Snapshots']
for s in snaps:
if s['StartTime'] < cutoff and s.get('Tags'):
if any(t['Key']=='AutoCreated' for t in s['Tags']):
ec2.delete_snapshot(SnapshotId=s['SnapshotId'])
print(f"Deleted {s['SnapshotId']}")
import boto3, os
from pathlib import Path
s3 = boto3.client('s3')
# Upload a file (key defaults to the file name)
def upload_file(path, bucket, key=None, extra_args=None):
key = key or os.path.basename(path)
s3.upload_file(path, bucket, key, ExtraArgs=extra_args or {})
print(f"โ Uploaded {path} โ s3://{bucket}/{key}")
# Upload with metadata and encryption
upload_file('report.pdf', 'my-bucket', 'reports/report.pdf', {
'ContentType': 'application/pdf',
'ServerSideEncryption': 'aws:kms',
'Metadata': {'author': 'Ravi', 'version': '2.0'}
})
# List all objects with pagination
def list_all_objects(bucket, prefix=''):
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
for obj in page.get('Contents', []):
print(f"{obj['Key']:60} {obj['Size']:10} bytes")
# Generate pre-signed URL for download
def get_presigned_url(bucket, key, expires=3600):
return s3.generate_presigned_url('get_object',
Params={'Bucket': bucket, 'Key': key}, ExpiresIn=expires)
# Clean up old files
def delete_old_files(bucket, prefix, days=30):
from datetime import datetime, timezone, timedelta
cutoff = datetime.now(timezone.utc) - timedelta(days=days)
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
old = [{'Key': o['Key']} for o in page.get('Contents',[]) if o['LastModified'] < cutoff]
if old:
s3.delete_objects(Bucket=bucket, Delete={'Objects': old})
print(f"Deleted {len(old)} old files")
import boto3, json
lambda_client = boto3.client('lambda', region_name='ap-south-1')
# Invoke Lambda synchronously
def invoke_lambda(func_name, payload):
resp = lambda_client.invoke(
FunctionName=func_name,
InvocationType='RequestResponse', # sync
Payload=json.dumps(payload)
)
result = json.loads(resp['Payload'].read())
if resp.get('FunctionError'):
raise Exception(f"Lambda error: {result}")
return result
# Update function code from local file
def deploy_function(func_name, code_file):
with open(code_file, 'rb') as f:
lambda_client.update_function_code(
FunctionName=func_name, ZipFile=f.read())
print(f"โ Deployed {func_name}")
# Update environment variables
lambda_client.update_function_configuration(
FunctionName='my-function',
Environment={'Variables': {'TABLE_NAME': 'NewTable', 'ENV': 'prod'}}
)
import boto3
logs = boto3.client('logs')
rds = boto3.client('rds', region_name='ap-south-1')
# Get Lambda error logs from last N hours
def get_lambda_errors(func_name, hours=1):
import time
log_group = f'/aws/lambda/{func_name}'
start_ms = int((time.time() - hours*3600) * 1000)
resp = logs.filter_log_events(
logGroupName=log_group, startTime=start_ms, filterPattern='ERROR')
for e in resp['events']:
print(e['message'].strip())
# Create RDS snapshot
def backup_rds(db_instance_id):
import datetime
snap_id = f"{db_instance_id}-{datetime.date.today().isoformat()}"
rds.create_db_snapshot(DBInstanceIdentifier=db_instance_id, DBSnapshotIdentifier=snap_id)
print(f"โ Snapshot {snap_id} created")
# List RDS instances with status
def list_rds_instances():
for db in rds.describe_db_instances()['DBInstances']:
print(f"{db['DBInstanceIdentifier']:30} {db['DBInstanceStatus']:12} {db['DBInstanceClass']}")
AWS Database Migration Service (DMS) helps migrate databases to AWS with minimal downtime. The source database remains fully operational during migration; your application keeps running. Only a brief cutover pause (seconds to minutes) is needed at the very end. DMS handles the complexity of moving data, keeping it in sync, and notifying you when it's safe to switch.
Flow: Source DB (MySQL on EC2) → Replication Instance (reads changes via CDC) → Target DB (Amazon Aurora)
| Type | How it Works | Downtime | When to Use |
|---|---|---|---|
| Full Load | Copies all existing data. No CDC. Source must be static during migration. | High (must stop writes) | Dev/test DBs, small non-critical DBs, can afford downtime |
| Full Load + CDC | Full load first, then CDC captures ongoing changes. Keeps target in sync until cutover. | Minutes (cutover only) | Production systems (most common approach) |
| CDC Only | Only replicates ongoing changes. Assumes initial data already in target. | None | Data already loaded manually (pg_dump), need ongoing sync |
CDC is the technology that enables near-zero downtime migration. DMS reads the database's transaction log (binlog for MySQL, WAL for PostgreSQL, redo log for Oracle) to capture every INSERT, UPDATE, DELETE and replay it on the target.
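In API terms, the table's migration types are just a parameter on the replication task; a minimal boto3 sketch with placeholder ARNs for the endpoints and replication instance:

import boto3, json

dms = boto3.client('dms')

dms.create_replication_task(
    ReplicationTaskIdentifier='mysql-to-aurora',
    SourceEndpointArn='arn:aws:dms:ap-south-1:123456789012:endpoint:SRC',
    TargetEndpointArn='arn:aws:dms:ap-south-1:123456789012:endpoint:TGT',
    ReplicationInstanceArn='arn:aws:dms:ap-south-1:123456789012:rep:INST',
    MigrationType='full-load-and-cdc',    # full copy first, then CDC keeps syncing
    TableMappings=json.dumps({'rules': [{
        'rule-type': 'selection', 'rule-id': '1', 'rule-name': 'all-myapp-tables',
        'object-locator': {'schema-name': 'myapp', 'table-name': '%'},
        'rule-action': 'include'
    }]})
)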
SCT is required for heterogeneous migrations (source and target are different database engines). It converts database schema, stored procedures, views, and functions from one SQL dialect to another.
| Class | vCPU | RAM | Use Case |
|---|---|---|---|
| dms.t3.micro | 2 | 1 GB | Dev/test, very small DBs (<1 GB) |
| dms.t3.medium | 2 | 4 GB | Small production (<10 GB) |
| dms.r5.large | 2 | 16 GB | Medium production (10-100 GB) |
| dms.r5.xlarge | 4 | 32 GB | Large production (100 GB+) |
| dms.r5.4xlarge | 16 | 128 GB | Very large migrations (TB scale) |