By Veera Sir
34 Topics: Complete Notes with Architecture & Deep Theory
DevOps is a cultural philosophy and technical movement that bridges the gap between software Development (Dev) and IT Operations (Ops). Before DevOps, developers wrote code and "threw it over the wall" to Ops teams, causing slow releases, blame culture, and frequent production failures. DevOps breaks these walls by establishing shared ownership, automated pipelines, and continuous feedback loops across all stages of the software lifecycle.
The term was coined by Patrick Debois in 2009. It is built on the CAMS framework: Culture (shared responsibility), Automation (eliminate manual toil), Measurement (DORA metrics), and Sharing (knowledge across teams). DevOps is not a tool: it's a mindset that enables organizations to deliver software faster, more reliably, and more securely.
Developer Workstation
| git push
v
+-----------------------------------------------------------+
|          CI/CD PIPELINE (Jenkins / GitHub Actions)        |
|  +------+   +------+   +-------+   +-------+   +-------+  |
|  | PLAN |-->| CODE |-->| BUILD |-->| TEST  |-->|RELEASE|  |
|  | Jira |   | Git  |   | Maven |   | JUnit |   | JFrog |  |
|  +------+   +------+   +-------+   +-------+   +-------+  |
+-----------------------------+-----------------------------+
                              | artifact
                              v
+-----------------------------------------------------------+
|                     CD / DEPLOY PHASE                     |
|   DEV --auto--> STAGING --integration tests--> PROD       |
|                                  (manual gate)            |
+-----------------------------+-----------------------------+
                              |
                              v
+-----------------------------------------------------------+
|             OBSERVE & MONITOR (Feedback Loop)             |
|  Prometheus -> Grafana -> Alertmanager -> Slack/PagerDuty |
|  ELK Stack (Logs) | Jaeger (Traces) -> On-call team       |
+-----------------------------------------------------------+

| Aspect | Traditional IT | DevOps |
|---|---|---|
| Release Frequency | Monthly/Quarterly | Multiple times per day |
| Lead Time | Weeks to months | Hours to days |
| Team Structure | Dev, QA, Ops silos | Cross-functional squads |
| Infrastructure | Manual, long-lived servers | IaC, immutable infrastructure |
| Failure Recovery | Hours to days (blame) | Minutes (blameless retro) |
| Change Failure Rate | 30-50% | 0-15% |
| Testing | Manual, end of cycle | Automated, shift-left |
| Security | Added at the end | Shift-left DevSecOps |
Google's DevOps Research and Assessment (DORA) team identified four key metrics that distinguish elite engineering teams from low performers. Together, these four metrics give a compact, research-backed picture of your pipeline's health:
| Metric | Elite | Low Performer | Meaning |
|---|---|---|---|
| Deployment Frequency | Multiple/day | Once per 6 months | How often code reaches production |
| Lead Time for Changes | <1 hour | 6+ months | Commit-to-production time |
| Change Failure Rate | <5% | 46-60% | % deployments causing incidents |
| Mean Time to Recovery | <1 hour | >6 months | Time to restore after failure |
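These numbers can be computed from your own deployment records. A minimal sketch, assuming a hypothetical CSV log of deployments (the file name and format below are made up for illustration, not a real standard):

```shell
# Toy change-failure-rate calculation from a hypothetical deploy log.
cat > /tmp/deploys.csv <<'EOF'
2024-05-01,success
2024-05-01,failed
2024-05-02,success
2024-05-03,success
EOF

TOTAL=$(wc -l < /tmp/deploys.csv)              # Total deployments
FAILED=$(grep -c ',failed' /tmp/deploys.csv)   # Deployments that caused incidents
echo "Deployments: $TOTAL, failed: $FAILED"
echo "Change failure rate: $((FAILED * 100 / TOTAL))%"   # -> 25%
```

The same pattern (count events, divide by time window) yields deployment frequency; lead time and MTTR need paired timestamps instead of counts.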
Shell scripting is the foundational automation skill for every DevOps engineer. A shell (bash, sh, zsh) is the command interpreter that lets you interact with the Linux kernel. Shell scripts chain together commands, control flow, and system calls to automate repetitive tasks, from provisioning servers to deploying applications. Almost every CI/CD pipeline, cron job, and server automation uses shell scripting under the hood.
Bash (Bourne Again Shell) is the most common shell in Linux and is used in AWS EC2, Docker containers, and Kubernetes init containers. Understanding shell scripting saves hours of manual work and is essential for writing Jenkins pipelines, Docker entrypoints, and Kubernetes scripts.
User types command / Script file (.sh)
|
v
   +---------+
   |  SHELL  |   bash / sh / zsh / fish
   | (Parser)|   Reads, tokenizes, expands variables
   +----+----+
        |
   +----+------------+
   |                 |
   v                 v
Built-in          External Command
Commands          (fork + exec)
(cd, echo,        /bin/ls, /usr/bin/curl
export, alias)    spawns child process
        |
        v
Linux Kernel (system calls: read, write, fork, exec)
        |
        v
Hardware / File System / Network

#!/bin/bash            # Shebang: tells OS which interpreter to use
# Script: deploy.sh # Comments start with #
set -e # Exit immediately on error
set -o pipefail # Catch pipe errors too
DEPLOY_ENV=${1:-"dev"} # First argument, default = dev
APP_NAME="myapp"
TIMESTAMP=$(date +%Y%m%d%H%M%S)
echo "Deploying $APP_NAME to $DEPLOY_ENV at $TIMESTAMP"
# Variables
NAME="DevOps"
echo "Hello $NAME"
echo "Length: ${#NAME}" # String length
# Conditionals
if [ "$DEPLOY_ENV" == "prod" ]; then
echo "Production deployment - applying approval gate"
elif [ "$DEPLOY_ENV" == "staging" ]; then
echo "Staging deployment"
else
echo "Dev deployment - skipping approvals"
fi
# For loop
for SERVER in web1 web2 web3; do
echo "Deploying to $SERVER"
ssh ubuntu@$SERVER "sudo systemctl restart nginx"
done
# While loop
COUNT=0
while [ $COUNT -lt 5 ]; do
echo "Health check attempt $COUNT"
curl -sf http://localhost:8080/health && break
COUNT=$((COUNT + 1))
sleep 3
done
function check_service() {
local SERVICE=$1
if systemctl is-active --quiet $SERVICE; then
echo "[OK] $SERVICE is running"
return 0
else
echo "[FAIL] $SERVICE is NOT running"
return 1
fi
}
# Trap errors and cleanup
trap 'echo "Error on line $LINENO"; cleanup' ERR
function cleanup() {
rm -f /tmp/deploy.lock
exit 1
}
check_service nginx || cleanup
# Handy one-liners for daily DevOps work
find /app -name "*.log" -mtime +7 -delete       # Delete logs older than 7 days
tar -czf backup.tar.gz /var/www                 # Compress a directory
rsync -avz ./dist/ user@server:/var/www         # Sync files to a remote server
sed -i 's/OLD/NEW/g' config.txt                 # In-place find and replace
awk '{print $1,$3}' access.log                  # Print selected columns
netstat -tlnp | grep 8080                       # Who is listening on port 8080
curl -o /dev/null -s -w "%{http_code}" URL      # HTTP status code only
ps aux | grep java | grep -v grep               # Find running Java processes
kill -9 $(lsof -t -i:8080)                      # Kill the process on port 8080
nohup ./server.sh > log.txt &                   # Run in background, survive logout

Best practices: start every script with set -e and set -o pipefail; quote "$VAR" to handle spaces; log progress with echo "[$(date)] Starting deploy"; check syntax with bash -n script.sh and lint with shellcheck script.sh.

A web server is software that accepts HTTP/HTTPS requests from clients (browsers, APIs, mobile apps) and serves responses: either static files (HTML, CSS, images) or by proxying requests to application servers (Node.js, Python, Java). In DevOps, web servers like Nginx and Apache are critical for reverse proxying, load balancing, SSL termination, and serving microservices.
Nginx (Engine-X) uses an event-driven, non-blocking architecture that handles thousands of concurrent connections with minimal memory. Apache uses a process/thread-per-connection model that is more flexible with .htaccess but less performant at scale. In modern DevOps, Nginx dominates for reverse proxy and Kubernetes Ingress controllers.
                 Internet
                    |
                    | HTTPS :443
                    v
+----------------------------------------+
|        NGINX (Reverse Proxy)           |
|  - SSL/TLS Termination (Let's Encrypt) |
|  - Gzip Compression                    |
|  - Rate Limiting                       |
|  - Static File Caching                 |
+------------------+---------------------+
                   | HTTP :8080 (internal)
         +---------+---------+
         |         |         |
         v         v         v
    +--------+ +--------+ +--------+
    | App #1 | | App #2 | | App #3 |   (Node.js / Java / Python)
    | :3000  | | :3001  | | :3002  |
    +--------+ +--------+ +--------+
         |         |         |
         +---------+---------+
                   |
                   v
             +----------+
             | Database |   (PostgreSQL / MySQL / MongoDB)
             +----------+

# /etc/nginx/sites-available/myapp.conf
upstream backend {
least_conn; # Load balancing: least connections
server 10.0.1.10:8080 weight=3; # Higher weight = more traffic
server 10.0.1.11:8080 weight=1;
server 10.0.1.12:8080 backup; # Only used if others are down
}
server {
listen 443 ssl http2;
server_name myapp.example.com;
ssl_certificate /etc/ssl/certs/myapp.crt;
ssl_certificate_key /etc/ssl/private/myapp.key;
ssl_protocols TLSv1.2 TLSv1.3;
# Security headers
add_header X-Frame-Options "SAMEORIGIN";
add_header X-Content-Type-Options "nosniff";
add_header Strict-Transport-Security "max-age=31536000";
# Static files - serve directly (fast)
location /static/ {
root /var/www/myapp;
expires 30d;
add_header Cache-Control "public, immutable";
}
# API requests - proxy to backend
location /api/ {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_connect_timeout 10s;
proxy_read_timeout 30s;
}
# Rate limiting (note: the zone must be declared in the http {} context,
# not inside server {}):
#   limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
location /api/login {
limit_req zone=api burst=5 nodelay;
proxy_pass http://backend;
}
}
| Feature | Nginx | Apache |
|---|---|---|
| Architecture | Event-driven, async | Process/thread per request |
| Concurrency | 10,000+ connections easily | Struggles above 1000 |
| Memory Usage | Very low (~2.5MB per worker) | Higher (forking model) |
| Config Style | Centralized nginx.conf | .htaccess per directory |
| PHP Support | Requires PHP-FPM | Built-in mod_php |
| SSL Termination | Excellent | Good |
| K8s Ingress | Official ingress controller | Available but less common |
| Best For | Reverse proxy, static, LB | Legacy apps, shared hosting |
# Nginx
sudo nginx -t                           # Test config syntax
sudo systemctl reload nginx             # Hot reload (no downtime)
sudo systemctl restart nginx            # Full restart
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log

# Apache
sudo apachectl configtest               # Test config
sudo systemctl reload apache2
sudo a2ensite myapp.conf                # Enable site
sudo a2enmod ssl rewrite proxy          # Enable modules
Git is a distributed version control system (DVCS) created by Linus Torvalds in 2005 to manage the Linux kernel source. Unlike centralized VCS (like SVN), every developer has a full copy of the entire repository, including its history, on their local machine. This means you can commit, branch, diff, and log completely offline.
Git tracks changes as snapshots (not diffs). Each commit stores a complete snapshot of all tracked files at that point in time. This makes Git operations like branching, merging, and reverting extremely fast. Git is the foundation of every modern CI/CD pipeline: no code gets built or deployed without going through Git first.
Developer A (Local Repo)           Developer B (Local Repo)
+---------------------+            +---------------------+
| Working Directory   |            | Working Directory   |
| Staging Area (Index)|            | Staging Area (Index)|
| Local Repository    |            | Local Repository    |
|   (full history)    |            |   (full history)    |
+----------+----------+            +----------+----------+
           | git push                         | git pull
           |                                  |
           v                                  v
+------------------------------------------+
|    REMOTE REPOSITORY (GitHub/GitLab)     |
|  main <--> feature/* <--> release/*      |
|  (Central sync point, not single source) |
+------------------------------------------+

Everything in Git is stored as one of four object types in the .git/objects directory. Understanding this helps you understand what's happening under the hood:
| Object Type | What It Stores | Example |
|---|---|---|
| Blob | File content (no filename) | Contents of main.py |
| Tree | Directory listing (filenames + blob refs) | List of files in /src |
| Commit | Snapshot pointer + metadata (author, message, parent) | git log entry |
| Tag | Named pointer to a commit | v1.0.0 release tag |
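You can verify all four object types yourself with git cat-file. A sketch using a throwaway repo (the directory and file names are arbitrary):

```shell
# Explore Git's object model in a scratch repo
rm -rf /tmp/obj-demo && git init -q /tmp/obj-demo && cd /tmp/obj-demo
echo "print('hi')" > main.py
git add main.py
git -c user.name=demo -c user.email=demo@example.com commit -qm "first"
git -c user.name=demo -c user.email=demo@example.com tag -a v1.0.0 -m "release"

git cat-file -t HEAD                    # commit
TREE=$(git rev-parse 'HEAD^{tree}')
git cat-file -t "$TREE"                 # tree
BLOB=$(git rev-parse HEAD:main.py)
git cat-file -t "$BLOB"                 # blob
git cat-file -p "$BLOB"                 # file content (no filename stored)
git cat-file -t v1.0.0                  # tag (annotated tags are real tag objects)
```

Note that a lightweight tag (plain `git tag v1.0.0`) would resolve straight to the commit; only annotated tags create a tag object.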
Working Directory --git add--> Staging Area --git commit--> Local Repo
(untracked/modified files)    (index: .git/index)         (.git/objects)
        ^                                                       |
        +------------- git checkout / git restore --------------+
                                                                |
                                                git push -----> Remote
A Git repository is a data store containing your project's files and the entire history of changes. There are two types: Local repositories (on your machine, full history in .git/) and Remote repositories (hosted on GitHub, GitLab, Bitbucket, used as sync points for teams).
A bare repository is a special type used for remote servers: it contains only the Git metadata (no working directory). When you push to GitHub, you're pushing to a bare repo. A fork is a server-side clone of a repo under your own account, used for open-source contribution workflows.
git init myproject               git clone https://...
        |                                |
        v                                v
Local Repo (.git/)               Local Repo (.git/)   <-- full history cloned
+---------------+               +---------------+
| .git/         |               | .git/         |
|  +- HEAD      |               |  +- HEAD      |
|  +- config    |               |  +- config    |
|  +- objects/  |               |  +- objects/  |
|  +- refs/     |               |  +- refs/     |
|  +- index     |               |  +- index     |
+---------------+               +---------------+
        |                                |
        +---------- git push ----------->|
                         |
                  Remote (GitHub)
                  +-------------+
                  |  Bare Repo  |   (no working tree)
                  |  .git only  |
                  +-------------+

# Create new repo locally
git init my-devops-project
cd my-devops-project

# Clone existing remote repo
git clone https://github.com/org/repo.git
git clone git@github.com:org/repo.git            # SSH clone

# Check repo status
git status
git log --oneline --graph --decorate --all       # Visual history

# Remote management
git remote -v                                    # Show remotes
git remote add upstream https://original.repo    # Add upstream for forks
git remote set-url origin https://new.url        # Change remote URL
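You can see the bare-repo mechanics locally by pushing to a bare repository on your own disk (a sketch; all paths below are made up):

```shell
# A local bare repo stands in for GitHub
rm -rf /tmp/hub.git /tmp/work
git init -q --bare /tmp/hub.git          # metadata only, no working tree
git init -q /tmp/work && cd /tmp/work
echo "# demo" > README.md
git add README.md
git -c user.name=demo -c user.email=demo@example.com commit -qm "init"
git remote add origin /tmp/hub.git
git push -q origin HEAD                  # push current branch, whatever its name
ls /tmp/hub.git                          # HEAD, config, objects/, refs/ - no README.md
```

The bare repo holds the same objects and refs as .git/ in a normal repo; it just never checks files out, which is exactly what a server-side sync point needs.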
Before using Git in a team, proper configuration ensures your commits are correctly attributed, your editor works, and you can authenticate with remote services. Git configuration has three scopes: system (all users on the machine), global (your user account), and local (specific repository). Local overrides global, which overrides system.
/etc/gitconfig   <- System-wide (all users)
      |
~/.gitconfig     <- Global (your user)
      |
.git/config      <- Local (this repo only; highest priority)

Resolution: Local > Global > System

# Identity (required for commits)
git config --global user.name "Your Name"
git config --global user.email "you@company.com"

# Default editor (for commit messages)
git config --global core.editor "vim"        # or "code --wait" for VS Code

# Default branch name
git config --global init.defaultBranch main

# Line endings (important for Windows/Linux teams)
git config --global core.autocrlf input      # Linux/Mac: convert CRLF->LF on commit
git config --global core.autocrlf true       # Windows: convert LF->CRLF on checkout

# Aliases (huge productivity boost)
git config --global alias.st status
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.lg "log --oneline --graph --decorate --all"
git config --global alias.undo "reset HEAD~1 --mixed"

# SSH Key setup for GitHub/GitLab
ssh-keygen -t ed25519 -C "you@company.com"   # Generate key
cat ~/.ssh/id_ed25519.pub                    # Copy to GitHub Settings -> SSH Keys
ssh -T git@github.com                        # Test connection
# .gitignore - tells git which files to never track

# Generated files
target/
dist/
build/
*.class
*.jar

# Environment files (NEVER commit secrets!)
.env
.env.local
*.env

# IDE files
.idea/
.vscode/
*.iml

# OS files
.DS_Store
Thumbs.db

# Dependency directories
node_modules/
vendor/
__pycache__/
*.pyc
Never commit .env files, passwords, API keys, or private SSH keys to Git. Use tools like git-secrets, truffleHog, or GitHub's secret scanning to detect accidental secret commits. Once committed and pushed, secrets must be rotated: the history is permanent.

Git commands map to the three-area model: Working Directory -> Staging Area -> Local Repository -> Remote. Mastering these commands is essential for daily DevOps work, including cherry-picking hotfixes, undoing mistakes, and bisecting bugs in production.
Untracked/Modified      Staged             Committed           Remote
 (Working Dir)          (Index)            (Local Repo)        (GitHub)
      |                    |                    |                  |
      +--- git add ------->|                    |                  |
      |                    +--- git commit ---->|                  |
      |                    |                    +--- git push ---->|
      |<-- git restore ----+                    |<-- git fetch ----+
      |                    |                    |<-- git pull -----+
      |<------------ git checkout / git switch -+                  |

# Stage and commit
git add . # Stage all changes
git add src/app.py # Stage specific file
git add -p # Interactive staging (hunk by hunk)
git commit -m "feat: add login API" # Commit with message
git commit --amend --no-edit # Add to last commit without new message
# View history and differences
git log --oneline -10 # Last 10 commits
git log --author="Veera" --since="2024-01-01"
git diff # Working dir vs staging
git diff --staged # Staging vs last commit
git show HEAD~2:src/app.py # View file from 2 commits ago
# Undoing changes
git restore src/app.py # Discard working dir change
git restore --staged src/app.py # Unstage file
git reset HEAD~1 --soft # Undo last commit, keep staged
git reset HEAD~1 --mixed # Undo last commit, keep files
git reset HEAD~1 --hard # Undo last commit, DELETE changes
git revert abc1234 # Safe undo (creates new commit)
# Stashing work-in-progress
git stash push -m "WIP: login feature"
git stash list
git stash pop # Apply and remove stash
git stash apply stash@{1} # Apply specific stash
# Finding bugs
git bisect start
git bisect bad HEAD
git bisect good v1.0.0 # Git binary-searches for bad commit
git bisect run pytest tests/
# Cherry-pick a specific commit to current branch
git cherry-pick abc1234                  # Apply one commit
git cherry-pick abc1234..def5678         # Apply range of commits

# Rewrite history (use with caution on shared branches)
git rebase -i HEAD~5                     # Interactive rebase last 5 commits
# Options: pick, squash, fixup, reword, drop, edit

# Find who changed a line
git blame src/app.py
git blame -L 50,60 src/app.py            # Lines 50-60 only

# Search across all commits
git log -S "password_hash"               # Find commits touching this string
git grep "TODO" $(git rev-list --all)    # Search ALL history
A remote in Git is a named reference to a repository hosted elsewhere: on GitHub, GitLab, Bitbucket, or your own server. The default remote after cloning is always called origin. In forked workflows, it's common to have two remotes: origin (your fork) and upstream (the original repo). Remotes allow teams to share code, collaborate, and trigger CI/CD pipelines.
Local Repository                        Remote (GitHub)
+---------------------------+           +-----------------------+
| refs/heads/main           |           | refs/heads/main       |
| refs/heads/feature/login  |           | refs/heads/develop    |
|                           |           |                       |
| refs/remotes/origin/main  | --------> | (remote tracking)     |
| refs/remotes/origin/dev   |           |                       |
+---------------------------+           +-----------------------+
        |
        | git fetch (updates refs/remotes/* without merging)
        | git pull  (fetch + merge/rebase into current branch)
        | git push  (upload local commits to remote)

# List and manage remotes
git remote -v                        # Show all remotes with URLs
git remote add origin git@github.com:user/repo.git
git remote add upstream https://github.com/original/repo.git
git remote rename origin backup
git remote remove backup

# Fetch vs Pull (important distinction!)
git fetch origin                     # Download changes, DON'T merge
git fetch --all                      # Fetch all remotes
git pull origin main                 # fetch + merge (or rebase)
git pull --rebase origin main        # fetch + rebase (cleaner history)

# Push operations
git push origin main                 # Push to remote main
git push -u origin feature/login     # Push + set upstream tracking
git push origin --delete old-branch  # Delete remote branch
git push --force-with-lease          # Safe force push (checks remote state)
git push origin v1.0.0               # Push a tag

# Sync fork with upstream
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
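The fetch-vs-pull distinction is easy to demonstrate with two local clones of the same bare "remote" (a sketch; directory names are arbitrary):

```shell
# fetch updates remote-tracking refs only; pull/merge also updates your branch
rm -rf /tmp/fp-origin.git /tmp/fp-a /tmp/fp-b
git init -q --bare /tmp/fp-origin.git
git clone -q /tmp/fp-origin.git /tmp/fp-a
cd /tmp/fp-a
echo one > notes.txt && git add notes.txt
git -c user.name=a -c user.email=a@example.com commit -qm "c1"
git push -q origin HEAD
git clone -q /tmp/fp-origin.git /tmp/fp-b      # second developer's clone
echo two >> notes.txt && git add notes.txt
git -c user.name=a -c user.email=a@example.com commit -qm "c2"
git push -q origin HEAD

cd /tmp/fp-b
BR=$(git symbolic-ref --short HEAD)
git fetch -q origin
cat notes.txt                # still only "one": fetch did NOT touch the working tree
git merge -q "origin/$BR"    # this merge step is what pull adds on top of fetch
cat notes.txt                # now contains "two" as well
```

This is why `git fetch` is always safe to run, while `git pull` can produce merge conflicts.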
A Git branch is simply a lightweight movable pointer to a commit. Creating a branch costs almost nothing (just 41 bytes: a file containing the commit hash). This makes branching the primary mechanism for parallel development. Every feature, bug fix, hotfix, or release gets its own branch, isolated from the main codebase until it's ready.
main    ------------------------------------------------  (production)
           |                                   ^
           |                            merge release
           v                                   |
develop ------------------------------------------------  (integration)
           |           |           |
        feature     feature     bugfix
        /login      /search     /cart
           |           |           |
           +-----------+-----------+   (feature merged back to develop)

hotfix  --------------------------------   (direct to main + develop)
           |
    critical fix for prod

# Create and switch branches
git branch feature/login           # Create branch
git switch feature/login           # Switch to it
git switch -c feature/signup       # Create + switch (shortcut)
git branch -d feature/login        # Delete merged branch
git branch -D feature/login        # Force delete (unmerged)

# List branches
git branch                         # Local branches
git branch -r                      # Remote branches
git branch -a                      # All branches
git branch --merged main           # Branches merged into main
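The "41 bytes" claim is checkable: a loose branch ref is a text file holding a 40-character SHA-1 plus a newline. A sketch in a scratch repo (path is arbitrary; assumes the default SHA-1 object format and loose-refs storage):

```shell
rm -rf /tmp/ptr-demo && git init -q /tmp/ptr-demo && cd /tmp/ptr-demo
echo x > f.txt && git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "c1"
git branch feature/login
cat .git/refs/heads/feature/login        # the commit hash, nothing more
wc -c < .git/refs/heads/feature/login    # 41 bytes: 40 hex chars + newline
```

Because a branch is just a pointer, switching branches never copies files; Git only rewrites the working tree to match the commit the pointer names.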
# Regular merge (creates merge commit - preserves history)
git checkout main
git merge feature/login
git merge --no-ff feature/login    # Always create merge commit

# Squash merge (squash all feature commits into one)
git merge --squash feature/login
git commit -m "feat: add login feature (squashed)"

# Rebase (rewrite history - linear, clean)
git checkout feature/login
git rebase main                    # Replay feature commits on top of main
git checkout main
git merge feature/login            # Fast-forward only
| Strategy | Branches | Best For | Release Model |
|---|---|---|---|
| GitFlow | main, develop, feature/*, release/*, hotfix/* | Scheduled releases (apps, SaaS) | Versioned releases |
| GitHub Flow | main + feature branches | Continuous delivery teams | Deploy on merge |
| Trunk-Based | main only + short-lived branches | Very high-velocity teams | Feature flags |
| GitLab Flow | main + environment branches | Teams with staging/prod envs | Environment promotion |
Apache Maven is a build automation and dependency management tool for Java-based projects. Before Maven, developers manually downloaded JAR files, configured classpaths, and wrote custom Ant build scripts. Maven introduced the concept of Convention over Configuration β if you follow the standard project structure, Maven knows exactly what to do without extensive configuration.
Maven uses a POM (Project Object Model) file, pom.xml, to define project metadata, dependencies, plugins, and build lifecycle. It resolves dependencies from Maven Central Repository (or your JFrog Artifactory), downloads them once, and caches them in ~/.m2/repository. Maven is the most common build tool in enterprise Java and is used by Spring Boot, Quarkus, and most Java microservices.
pom.xml --> Maven reads project config
                     |
+--------------------+--------------------------------+
|               Maven Build Lifecycle                 |
|                                                     |
|   validate -> compile -> test -> package -> verify  |
|                              |                      |
|                        +-----v-----+                |
|                        | .jar/.war |                |
|                        +-----+-----+                |
|                              |                      |
|   install <-----------------+------------> deploy   |
|   (local ~/.m2 cache)              (Nexus/JFrog)    |
+-----------------------------------------------------+
                     |
           External Dependencies
             +-------+-------+
             |               |
      Maven Central    JFrog Artifactory
      (public repo)    (private/internal)

<project>
<modelVersion>4.0.0</modelVersion>
<!-- Project coordinates (GAV = GroupId:ArtifactId:Version) -->
<groupId>com.veera.devops</groupId>
<artifactId>myapp</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<java.version>17</java.version>
<spring.version>3.2.0</spring.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>${spring.version}</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<scope>test</scope> <!-- Only for testing -->
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
mvn validate                   # Validate project structure
mvn compile                    # Compile source code -> target/classes/
mvn test                       # Run unit tests (JUnit/TestNG)
mvn package                    # Create JAR/WAR -> target/myapp-1.0.jar
mvn install                    # Package + install to local ~/.m2 cache
mvn deploy                     # Package + upload to remote repo (JFrog)
mvn clean                      # Delete target/ directory
mvn clean package -DskipTests  # Build without running tests (CI shortcut)
mvn dependency:tree            # View full dependency tree
mvn versions:display-updates   # Check for outdated dependencies
| Scope | Compile | Test | Runtime | Use For |
|---|---|---|---|---|
| compile (default) | Yes | Yes | Yes | Core dependencies |
| test | No | Yes | No | JUnit, Mockito |
| runtime | No | Yes | Yes | JDBC drivers |
| provided | Yes | Yes | No | Servlet API (from container) |
SonarQube is a continuous code quality and security inspection platform. It performs Static Application Security Testing (SAST), analyzing source code without executing it, to detect bugs, code smells, vulnerabilities, and coverage gaps. SonarQube integrates into CI/CD pipelines to enforce quality gates that prevent bad code from reaching production.
SonarQube works by having a scanner analyze your code locally (or in CI), send results to the SonarQube Server, which stores them in a PostgreSQL database and presents dashboards and quality gate decisions. If your code fails the quality gate (e.g., coverage below 80%, new Critical vulnerabilities), the pipeline fails and deployment is blocked.
Developer pushes code
        |
        v
Jenkins / GitHub Actions
        |
        |  Step 1: Build (mvn package)
        |
        |  Step 2: Run SonarScanner
        |     mvn sonar:sonar \
        |       -Dsonar.host.url=http://sonar:9000 \
        |       -Dsonar.login=$SONAR_TOKEN
        |
        v
SonarQube Scanner sends analysis to:
+------------------------------------------------+
|            SonarQube Server :9000              |
|  +--------------+  +------------------------+  |
|  | Analysis DB  |  |  Quality Gate Engine   |  |
|  | (PostgreSQL) |  |  - Coverage >= 80%     |  |
|  |              |  |  - 0 Critical bugs     |  |
|  |              |  |  - Duplication < 3%    |  |
|  +--------------+  +-----------+------------+  |
+--------------------------------+---------------+
                                 |
                +----------------+---------------+
                |                                |
             PASSED                           FAILED
                |                                |
       Pipeline continues                 Pipeline FAILS
       -> Deploy to staging               -> Developer notified
                                          -> Fix and re-push

| Issue Type | Description | Example |
|---|---|---|
| Bug | Code that will cause incorrect behavior | NullPointerException risk, wrong operator |
| Vulnerability | Security weakness exploitable by attackers | SQL injection, XSS, hardcoded password |
| Code Smell | Maintainability issue (tech debt) | Too many parameters, duplicate code, long methods |
| Security Hotspot | Code needing manual security review | Use of MD5 hashing, HTTP vs HTTPS |
| Coverage Gap | Code lines not covered by unit tests | Exception handling not tested |
# sonar-project.properties
sonar.projectKey=my-devops-app
sonar.projectName=My DevOps App
sonar.sources=src/main/java
sonar.tests=src/test/java
sonar.java.coveragePlugin=jacoco
sonar.coverage.jacoco.xmlReportPaths=target/site/jacoco/jacoco.xml
sonar.exclusions=**/generated/**,**/test/**

# Quality Gate thresholds (configure in SonarQube UI):
#   New Code:
#     Coverage >= 80%
#     Duplicated Lines < 3%
#     Maintainability Rating = A
#     Reliability Rating = A
#     Security Rating = A
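In a pipeline you typically query the server's quality gate status after analysis and block the deploy on failure. A hedged sketch: the endpoint shape follows SonarQube's web API, but the host, project key, and the canned response below are illustrative, not from these notes.

```shell
# In CI you would fetch the real gate status, e.g. (illustrative host/key):
#   curl -s -u "$SONAR_TOKEN:" \
#     "http://sonar:9000/api/qualitygates/project_status?projectKey=my-devops-app" \
#     > /tmp/gate.json
# A canned response stands in here so the gating logic is self-contained:
echo '{"projectStatus":{"status":"ERROR"}}' > /tmp/gate.json

STATUS=$(grep -o '"status":"[A-Z]*"' /tmp/gate.json | head -1 | cut -d'"' -f4)
if [ "$STATUS" = "OK" ]; then
  echo "Quality gate passed - continue pipeline"
else
  echo "Quality gate failed ($STATUS) - block deployment"
fi
```

In a real Jenkinsfile the else-branch would `exit 1` so the stage (and therefore the deployment) fails.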
Jenkins is the world's most widely used open-source automation server for Continuous Integration and Continuous Delivery. Built in Java, it orchestrates the entire software delivery pipeline, from code commit to production deployment. Jenkins monitors version control for changes, triggers automated builds, runs tests, performs quality checks, builds Docker images, and deploys to environments.
Jenkins follows a master-agent architecture: the Controller (master) manages pipelines, jobs, scheduling, and the web UI. Agents (workers) are the machines that actually execute the build steps. This allows horizontal scaling: hundreds of concurrent builds across many agents. Agents can be physical machines, VMs, Docker containers, or Kubernetes pods.
+-----------------------------------------------------+
|            JENKINS CONTROLLER (Master)              |
|   Port: 8080 (UI)  |  Port: 50000 (agent JNLP)      |
|  +----------+ +---------+ +---------+ +---------+   |
|  | Pipeline | |   Job   | | Plugin  | |  Cred   |   |
|  |  Engine  | |  Queue  | | Manager | |  Store  |   |
|  +----------+ +---------+ +---------+ +---------+   |
+-----------------------+-----------------------------+
       SSH/JNLP/Inbound |
        +---------------+---------------------+
        |               |                     |
        v               v                     v
  +---------+     +-----------+        +-----------+
  | Agent 1 |     |  Agent 2  |        |  Agent 3  |
  |  Linux  |     |  Docker   |        |  K8s Pod  |
  |  Java   |     | Container |        | (ephemeral|
  |  Maven  |     |  Node.js  |        |   agent)  |
  +---------+     +-----------+        +-----------+
  Builds Java      Builds JS            Scales auto

# Install Jenkins on Ubuntu
sudo apt update
sudo apt install -y openjdk-17-jdk
curl -fsSL https://pkg.jenkins.io/debian/jenkins.io-2023.key | sudo tee \
  /usr/share/keyrings/jenkins-keyring.asc > /dev/null
echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] \
  https://pkg.jenkins.io/debian binary/" | sudo tee \
  /etc/apt/sources.list.d/jenkins.list > /dev/null
sudo apt update && sudo apt install -y jenkins
sudo systemctl start jenkins
sudo systemctl enable jenkins

# Get initial admin password
sudo cat /var/lib/jenkins/secrets/initialAdminPassword
// Jenkinsfile β stored in your repo root
pipeline {
agent any // Run on any available agent
environment {
APP_NAME = "myapp"
DOCKER_REGISTRY = "registry.company.com"
}
stages {
stage('Checkout') {
steps {
git branch: 'main',
url: 'https://github.com/org/myapp.git'
}
}
stage('Build') {
steps {
sh 'mvn clean package -DskipTests'
}
}
stage('Test') {
steps {
sh 'mvn test'
}
post {
always {
junit 'target/surefire-reports/*.xml' // Publish test results
}
}
}
stage('Docker Build & Push') {
steps {
withCredentials([usernamePassword(credentialsId: 'docker-creds',
usernameVariable: 'USER', passwordVariable: 'PASS')]) {
sh """
docker build -t $DOCKER_REGISTRY/$APP_NAME:${BUILD_NUMBER} .
docker login $DOCKER_REGISTRY -u $USER -p $PASS
docker push $DOCKER_REGISTRY/$APP_NAME:${BUILD_NUMBER}
"""
}
}
}
stage('Deploy to Dev') {
steps {
sh "kubectl set image deployment/$APP_NAME $APP_NAME=$DOCKER_REGISTRY/$APP_NAME:${BUILD_NUMBER}"
}
}
}
post {
        success { slackSend channel: '#devops', message: "Build #${BUILD_NUMBER} succeeded!" }
        failure { slackSend channel: '#devops', message: "Build #${BUILD_NUMBER} FAILED!" }
}
}
Jenkins offers two pipeline syntaxes. Declarative (recommended) uses a structured, opinionated syntax with built-in validation, so it's easier to read and enforce standards. Scripted pipelines use Groovy code directly, offering maximum flexibility but requiring Groovy knowledge. Both are defined in a Jenkinsfile stored in your Git repository (Pipeline as Code).
Trigger (webhook / timer / manual)
        |
        v
  +-----------+
  | Checkout  |  (SCM clone)
  +-----+-----+
        |
  +-----v-----+
  |   Build   |  (mvn package)
  +-----+-----+
        |
  +-----v------------------------------+
  |      PARALLEL: Test + Scan         |
  |  +-----------+  +---------------+  |
  |  | Unit Test |  | SonarQube Scan|  |
  |  |  (JUnit)  |  | (Quality Gate)|  |
  |  +-----------+  +---------------+  |
  +-----+------------------------------+
        | (all parallel stages must pass)
  +-----v--------+
  | Docker Build |
  |   & Push     |
  +-----+--------+
        |
  +-----v-------+     +----------------------+
  | Deploy Dev  |---->| Auto Integration Test|
  +-------------+     +----------+-----------+
                                 |
                    +------------v--------+
                    |   Manual Approval   |  (input step)
                    |  (Deploy to Prod?)  |
                    +------------+--------+
                                 | Approved
                    +------------v--------+
                    |     Deploy PROD     |
                    +---------------------+

pipeline {
agent { label 'docker-agent' } // Use labeled agents
options {
timeout(time: 30, unit: 'MINUTES') // Pipeline timeout
retry(2) // Auto-retry on failure
disableConcurrentBuilds() // One build at a time
buildDiscarder(logRotator(numToKeepStr: '10'))
}
triggers {
pollSCM('H/5 * * * *') // Poll Git every 5 mins
cron('0 2 * * 1-5') // Nightly build weekdays at 2am
}
parameters {
choice(name: 'ENVIRONMENT', choices: ['dev','staging','prod'])
booleanParam(name: 'RUN_TESTS', defaultValue: true)
string(name: 'IMAGE_TAG', defaultValue: 'latest')
}
stages {
stage('Parallel Tests') {
parallel {
stage('Unit Tests') { steps { sh 'mvn test' } }
stage('Integration Tests') { steps { sh 'mvn verify -P integration' } }
stage('Security Scan') { steps { sh 'trivy image myapp:latest' } }
}
}
stage('Deploy to Prod') {
when {
branch 'main'
environment name: 'ENVIRONMENT', value: 'prod'
}
steps {
input message: 'Deploy to Production?', ok: 'Deploy Now',
submitter: 'devops-leads'
sh './deploy-prod.sh'
}
}
}
}
// vars/deployToK8s.groovy (in shared-library repo)
def call(String appName, String image, String namespace) {
sh """
kubectl set image deployment/${appName} ${appName}=${image} -n ${namespace}
kubectl rollout status deployment/${appName} -n ${namespace} --timeout=3m
"""
}
// Usage in any Jenkinsfile:
@Library('my-shared-library@main') _
pipeline {
stages {
stage('Deploy') {
steps {
deployToK8s('myapp', "registry/myapp:${BUILD_NUMBER}", 'production')
}
}
}
}
Jenkins security is critical because it has access to your source code, credentials, Kubernetes clusters, and production environments. A compromised Jenkins server is a full supply chain attack vector. Jenkins security involves Authentication (who are you?), Authorization (what can you do?), and Credential Management (how do we store secrets safely?).
Request to Jenkins
β
v
βββββββββββββββββββββββββββββββββββββββββββ
β Authentication Layer β
β ββββββββββββββ ββββββββββββββββββββββ β
β β Jenkins DB β β LDAP / Active Dir β β
β β (local) β β SSO / SAML / OIDC β β
β ββββββββββββββ ββββββββββββββββββββββ β
βββββββββββββββββββ¬ββββββββββββββββββββββββ
β Authenticated user
βββββββββββββββββββΌββββββββββββββββββββββββ
β Authorization Layer β
β Role-Based Access Control (RBAC) β
β ββββββββββββββ ββββββββββββ βββββββββ β
β β Admin β βDeveloper β βViewer β β
β β All access β βBuild/Run β βRead β β
β ββββββββββββββ ββββββββββββ βββββββββ β
βββββββββββββββββββ¬ββββββββββββββββββββββββ
β
βββββββββββββββββββΌββββββββββββββββββββββββ
β Credentials Store (Encrypted) β
β SSH Keys | API Tokens | Passwords β
β Docker Registry | K8s certs | Vault β
βββββββββββββββββββββββββββββββββββββββββββ

| Plugin | Purpose |
|---|---|
| Pipeline | Declarative and scripted pipeline support |
| Git / GitHub | Source code integration and webhooks |
| Docker Pipeline | Build and push Docker images in pipelines |
| Kubernetes | Dynamic K8s pod agents |
| SonarQube Scanner | Code quality gate integration |
| Slack Notification | Build notifications to Slack |
| Role Strategy | Fine-grained RBAC for Jenkins |
| Credentials Binding | Inject secrets safely into pipelines |
| Blue Ocean | Modern pipeline visualization UI |
| JUnit / TestNG | Test result publishing |
| HashiCorp Vault | Dynamic secret injection from Vault |
// WRONG - Never hardcode credentials
sh 'docker login -u admin -p password123 registry.io'
// CORRECT - Use Jenkins credentials store
withCredentials([
usernamePassword(credentialsId: 'docker-registry',
usernameVariable: 'DOCKER_USER',
passwordVariable: 'DOCKER_PASS'),
string(credentialsId: 'sonar-token', variable: 'SONAR_TOKEN'),
sshUserPrivateKey(credentialsId: 'deploy-key', keyFileVariable: 'SSH_KEY')
]) {
sh 'docker login -u $DOCKER_USER -p $DOCKER_PASS registry.io'
sh 'ssh -i $SSH_KEY ubuntu@server "sudo systemctl restart app"'  // Single quotes: let the shell, not Groovy, expand the secret
}
Jenkins' power comes from its 1800+ plugins that integrate it with virtually every tool in the DevOps ecosystem. A complete pipeline connects Git (source), Maven/Gradle (build), SonarQube (quality), JFrog Artifactory (artifacts), Docker (containers), Kubernetes (deployment), and Slack/PagerDuty (notifications), all orchestrated through Jenkins.
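A minimal sketch of how several of these plugins surface as pipeline steps, assuming a SonarQube server registered in Jenkins under the name 'sonar' and the Slack plugin already connected to a workspace (the server name and channel are placeholders):

```groovy
pipeline {
    agent any
    stages {
        stage('Quality Gate') {
            steps {
                withSonarQubeEnv('sonar') {          // SonarQube Scanner plugin injects server URL/token
                    sh 'mvn sonar:sonar'
                }
                waitForQualityGate abortPipeline: true   // Fail the build if the gate fails
            }
        }
    }
    post {
        always {
            junit 'target/surefire-reports/*.xml'    // JUnit plugin publishes test results
        }
        failure {
            slackSend channel: '#builds',            // Slack Notification plugin
                      message: "Build ${env.BUILD_NUMBER} failed: ${env.BUILD_URL}"
        }
    }
}
```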
βββββββββββββββ
β GitHub βββββ webhook βββββΆ
β GitLab β β
βββββββββββββββ v
ββββββββββββββββ
βββββββββββββββ β JENKINS β
β SonarQube ββββ sonar:sonar ββββββββββββββββββ PIPELINE β
βββββββββββββββ β β
β ββββΆ Maven Build
βββββββββββββββ β ββββΆ Docker Build
βJFrog/Nexus ββββ mvn deploy ββββββββββββββββββ ββββΆ K8s Deploy
βββββββββββββββ β ββββΆ Test Report
ββββββββ¬ββββββββ
βββββββββββββββ β
β Slack ββββ notification βββββββββββββββββββββββ
β PagerDuty β
βββββββββββββββ

A Multibranch Pipeline automatically discovers all branches in a repository that contain a Jenkinsfile and creates a separate pipeline for each. This means every feature branch gets its own CI pipeline automatically, with no manual job creation required. Pull Requests also get their own pipeline for validation before merge.
// Each branch has its own Jenkinsfile (or shared one)
// Jenkins auto-discovers:
// main β triggers on every push
// feature/loginβ triggers on every push to this branch
// PR #42 β triggers on PR open/update
// Environment-specific logic using branch name
stage('Deploy') {
steps {
script {
if (env.BRANCH_NAME == 'main') {
sh './deploy.sh production'
} else if (env.BRANCH_NAME.startsWith('feature/')) {
sh "./deploy.sh dev-${env.BRANCH_NAME.replaceAll('/', '-')}"
}
}
}
}
GitLab CI/CD is the built-in, native CI/CD system in GitLab. Unlike Jenkins (an external tool), GitLab CI is deeply integrated into the platform: code, issues, merge requests, and pipelines all live in one place. Pipelines are defined in a .gitlab-ci.yml file at the root of your repository. GitLab Runners execute the jobs; they can be shared (GitLab.com provides free shared runners) or self-hosted.
git push to GitLab
β
β triggers
v
.gitlab-ci.yml parsed
β
βββββββββΌβββββββββββββββββββββββββββββββββββββββββ
β PIPELINE β
β Stage 1: build Stage 2: test Stage 3: deploy β
β βββββββββββ ββββββββββββ βββββββββββββββ β
β β compile β βunit-test β β deploy-dev β β
β β docker β βlint β β deploy-prod β β
β βββββββββββ βsonarqube β β (manual) β β
β ββββββββββββ βββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
GitLab Runners (execute jobs)
βββββββββββββββββββββββββββββββββββ
β Shell Runner | Docker Runner β
β K8s Runner | Shared Runners β
βββββββββββββββββββββββββββββββββββ

image: maven:3.9-eclipse-temurin-17   # Default Docker image for all jobs
variables:
MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"
IMAGE_NAME: "registry.gitlab.com/$CI_PROJECT_PATH"
cache:
key: "$CI_COMMIT_REF_SLUG"
paths:
- .m2/repository # Cache Maven deps between jobs
stages:
- build
- test
- security
- package
- deploy
build:
stage: build
script:
- mvn compile
artifacts:
paths:
- target/
unit-tests:
stage: test
script:
- mvn test
coverage: '/Total.*?([0-9]{1,3})%/'
artifacts:
reports:
junit: target/surefire-reports/TEST-*.xml
sonarqube:
stage: security
script:
- mvn sonar:sonar -Dsonar.host.url=$SONAR_URL -Dsonar.login=$SONAR_TOKEN
allow_failure: false
docker-build:
stage: package
image: docker:24
services:
- docker:dind
script:
- docker build -t $IMAGE_NAME:$CI_COMMIT_SHA .
- docker push $IMAGE_NAME:$CI_COMMIT_SHA
deploy-production:
stage: deploy
environment:
name: production
url: https://myapp.example.com
when: manual # Requires manual click
only:
- main
script:
- kubectl set image deployment/myapp myapp=$IMAGE_NAME:$CI_COMMIT_SHA
GitHub Actions is GitHub's native CI/CD and automation platform. Workflows are defined as YAML files in .github/workflows/ and are triggered by GitHub events: pushes, pull requests, issues, releases, schedules, or manual dispatch. GitHub provides free hosted runners (Ubuntu, Windows, macOS), and you can bring your own self-hosted runners.
The key innovation of GitHub Actions is its marketplace of reusable Actions: over 20,000 community-built actions for common tasks (checkout, Docker build, deploy to AWS, send Slack messages, etc.). Instead of writing shell scripts for everything, you compose workflows from pre-built actions.
Event: push to main / PR opened / cron schedule
β
v
.github/workflows/ci.yml
β
βββββββββββββΌβββββββββββββββββββββββββββββββββββββββββ
β WORKFLOW β
β Job 1: build (runs-on: ubuntu-latest) β
β Step 1: actions/checkout@v4 β
β Step 2: actions/setup-java@v4 β
β Step 3: mvn package β
β Step 4: actions/upload-artifact@v4 β
β β
β Job 2: test (needs: build) β
β Step 1: actions/checkout@v4 β
β Step 2: actions/download-artifact@v4 β
β Step 3: mvn test β
β β
β Job 3: deploy (needs: test, if: main branch) β
β Step 1: aws-actions/configure-aws-credentials β
β Step 2: kubectl deploy β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
GitHub-hosted Runner (ubuntu-latest)
or Self-hosted Runner (your EC2 / K8s)

name: CI/CD Pipeline
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
workflow_dispatch: # Allow manual trigger
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up JDK 17
uses: actions/setup-java@v4
with:
java-version: '17'
distribution: 'temurin'
cache: maven
- name: Build and Test
run: mvn clean verify
- name: Publish Test Results
uses: dorny/test-reporter@v1
if: always()
with:
name: JUnit Tests
path: target/surefire-reports/*.xml
reporter: java-junit
docker-build-push:
needs: build-and-test
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
deploy:
needs: docker-build-push
runs-on: ubuntu-latest
environment: production # Requires environment approval
steps:
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/myapp \
myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
Docker is a containerization platform that packages an application and all its dependencies (runtime, libraries, config) into a portable, isolated unit called a container. Containers solve the classic "it works on my machine" problem: a Docker container runs identically on a developer laptop, a CI server, or a production Kubernetes cluster.
Containers use Linux kernel namespaces (for isolation) and cgroups (for resource limits); they share the host OS kernel but are isolated at the process, network, and filesystem level. Unlike VMs, containers don't need a guest OS, so they start in milliseconds and use megabytes instead of gigabytes of memory.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β HOST MACHINE β
β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β βContainer1β βContainer2β βContainer3β β
β β App: API β β App: DB β β App: Web β β
β β Port:8080β β Port:5432β β Port:80 β β
β β /app/... β β /var/... β β /www/... β β
β ββββββ¬ββββββ ββββββ¬ββββββ ββββββ¬ββββββ β
β β β β β
β ββββββΌβββββββββββββββΌβββββββββββββββΌβββββββββββββββ β
β β Docker Engine (dockerd) β β
β β containerd β runc β Linux Kernel namespaces β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β Linux Kernel β
β (namespaces + cgroups) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Dockerfile βββΆ docker build βββΆ Image βββΆ docker push βββΆ Registry
ββββΆ docker run βββΆ Container

# Multi-stage build: small, secure final image
# Stage 1: Build
FROM maven:3.9-eclipse-temurin-17-alpine AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline -q          # Cache deps in a separate layer
COPY src ./src
RUN mvn clean package -DskipTests

# Stage 2: Runtime (minimal image, no build tools)
FROM eclipse-temurin:17-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup   # Non-root user
WORKDIR /app
COPY --from=builder /app/target/myapp.jar ./app.jar
RUN chown -R appuser:appgroup /app
USER appuser                              # Run as non-root (security!)
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "-jar", "-Xmx512m", "app.jar"]
# Build and run
docker build -t myapp:1.0 .
docker build -t myapp:1.0 --build-arg ENV=prod .
docker run -d -p 8080:8080 --name myapp myapp:1.0
docker run -d -p 8080:8080 -e DB_HOST=localhost -v /data:/app/data myapp:1.0

# Manage containers
docker ps                                # Running containers
docker ps -a                             # All containers (including stopped)
docker logs -f myapp                     # Stream logs
docker exec -it myapp /bin/sh            # Shell inside container
docker stats                             # Real-time resource usage
docker stop myapp && docker rm myapp

# Registry operations
docker login registry.io
docker tag myapp:1.0 registry.io/team/myapp:1.0
docker push registry.io/team/myapp:1.0
docker pull registry.io/team/myapp:1.0

# Cleanup
docker system prune -af                  # Remove all unused images/containers
version: '3.8'
services:
app:
build: .
ports: ["8080:8080"]
environment:
DB_HOST: postgres
REDIS_URL: redis://redis:6379
depends_on:
postgres:
condition: service_healthy
restart: unless-stopped
postgres:
image: postgres:16-alpine
environment:
POSTGRES_DB: myapp
POSTGRES_USER: admin
POSTGRES_PASSWORD_FILE: /run/secrets/db_pass
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U admin"]
interval: 10s
redis:
image: redis:7-alpine
command: redis-server --maxmemory 256mb
volumes:
pgdata:
| Aspect | Virtual Machine | Container | Serverless |
|---|---|---|---|
| Startup Time | Minutes | Seconds | Milliseconds |
| Size | GBs (full OS) | MBs (app only) | N/A (managed) |
| Isolation | Full hardware | Kernel-level | Full (per invocation) |
| Portability | Medium | Very High | Low (vendor-specific) |
| Best For | Legacy apps, isolation | Microservices, CI | Event-driven, bursty |
An artifact is any file produced by a build process: JAR files, Docker images, npm packages, Helm charts, Python wheels, RPMs, etc. After building, these artifacts need to be stored, versioned, scanned, and distributed. JFrog Artifactory is the industry-leading universal artifact repository manager that acts as a single source of truth for all your build outputs.
Without a proper artifact repository, teams download dependencies directly from the internet (unreliable, slow, security risk) and store build outputs in Jenkins workspaces (not versioned, not audited, lost on cleanup). Artifactory provides proxying (cache Maven Central, Docker Hub), hosting (store your own artifacts), and virtual repos (unified view of multiple repos).
Internet (Maven Central, Docker Hub, npm registry)
β
β proxy + cache (once only)
v
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β JFrog Artifactory β
β β
β Virtual Repos (unified access point) β
β ββββββββββββββββ ββββββββββββ βββββββββββββ β
β β libs-release β β docker- β β helm- β β
β β libs-snapshotβ β local β β local β β
β β libs-proxy β β docker- β β helm- β β
β β (Maven repos)β β proxy β β proxy β β
β ββββββββββββββββ ββββββββββββ βββββββββββββ β
β β
β Xray: Security scanning of all stored artifacts β
βββββββββββββββββββ¬βββββββββββββββββββββββββββββββββ
β
βββββββββββββββ΄ββββββββββββββ
β β
v v
Maven build Docker build
(mvn deploy) (docker push)
stores .jar          stores image layers

# settings.xml (~/.m2/settings.xml)
<settings>
<servers>
<server>
<id>artifactory</id>
<username>${env.ARTIFACTORY_USER}</username>   <!-- env. prefix reads environment variables -->
<password>${env.ARTIFACTORY_TOKEN}</password>
</server>
</servers>
<mirrors>
<mirror>
<id>artifactory</id>
<mirrorOf>*</mirrorOf> <!-- Route ALL downloads through Artifactory -->
<url>https://artifactory.company.com/artifactory/libs-virtual</url>
</mirror>
</mirrors>
</settings>
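The settings.xml above routes dependency downloads through Artifactory; publishing your own artifacts (the mvn deploy arrow in the diagram) also requires a distributionManagement block in the project's pom.xml. A sketch, where the repository names and URL are assumptions matching the repos shown in the diagram:

```xml
<!-- pom.xml: where `mvn deploy` uploads build outputs (repo names/URL are placeholders) -->
<distributionManagement>
  <repository>
    <id>artifactory</id>   <!-- Must match the <server> id in settings.xml for credentials -->
    <url>https://artifactory.company.com/artifactory/libs-release-local</url>
  </repository>
  <snapshotRepository>
    <id>artifactory</id>
    <url>https://artifactory.company.com/artifactory/libs-snapshot-local</url>
  </snapshotRepository>
</distributionManagement>
```

Maven picks the snapshotRepository automatically when the project version ends in -SNAPSHOT.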
Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google (inspired by their internal Borg system) and donated to the CNCF in 2015. It automates the deployment, scaling, load balancing, healing, and management of containerized applications. If Docker is about running a single container, Kubernetes is about running hundreds or thousands of containers reliably across a cluster of machines.
The core problem Kubernetes solves: when you run containers at scale, you need to answer questions like: Which server should run this container? What happens if a server dies? How do I update 100 containers without downtime? How do I scale from 3 to 30 containers during traffic spikes? Kubernetes answers all of these automatically.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CONTROL PLANE (Master) β
β ββββββββββββββββ ββββββββββββ βββββββββββββ ββββββββββββ β
β β API Server β βScheduler β βController β β etcd β β
β β (kube-api) β β(bin pack)β β Manager β β (state β β
β β REST gateway β βbest node β β(reconcile)β β store) β β
β ββββββββββββββββ ββββββββββββ βββββββββββββ ββββββββββββ β
ββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ
β kubectl / CI/CD deploys here
ββββββββββββββββββββΌββββββββββββββββββββ
β β β
v v v
ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ
β WORKER NODE 1 β β WORKER NODE 2 β β WORKER NODE 3 β
β ββββββββββββ β β ββββββββββββ β β ββββββββββββ β
β β kubelet β β β β kubelet β β β β kubelet β β
β β kube- β β β β kube- β β β β kube- β β
β β proxy β β β β proxy β β β β proxy β β
β β (pods) β β β β (pods) β β β β (pods) β β
β ββββββββββββ β β ββββββββββββ β β ββββββββββββ β
ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ

| Object | Purpose | Analogy |
|---|---|---|
| Pod | Smallest unit β 1+ containers sharing network/storage | A single running process |
| Deployment | Manages ReplicaSets, rolling updates, rollbacks | A job description for pods |
| Service | Stable network endpoint for pods (load balancer) | A phone number for pods |
| Ingress | HTTP/HTTPS routing with hostname/path rules | A receptionist routing calls |
| ConfigMap | Non-sensitive configuration (env vars, files) | A config file |
| Secret | Sensitive data (passwords, tokens), base64-encoded (not encrypted) | A safe |
| PersistentVolume | Storage that outlives pods | An external hard drive |
| Namespace | Logical isolation within a cluster | A separate folder |
# Cluster info
kubectl cluster-info
kubectl get nodes
kubectl get all -n production

# Deployments
kubectl apply -f deployment.yaml
kubectl get deployments
kubectl get pods -o wide
kubectl describe pod myapp-abc-123
kubectl logs myapp-abc-123 -f               # Stream logs
kubectl exec -it myapp-abc-123 -- /bin/sh   # Shell into pod

# Scaling
kubectl scale deployment myapp --replicas=5
kubectl autoscale deployment myapp --min=2 --max=10 --cpu-percent=60

# Updates and rollbacks
kubectl set image deployment/myapp myapp=myapp:2.0
kubectl rollout status deployment/myapp
kubectl rollout history deployment/myapp
kubectl rollout undo deployment/myapp       # Rollback to previous version
A Kubernetes manifest is a YAML file that declares the desired state of your application. Kubernetes continuously reconciles the actual state with the desired state: if a pod crashes, it is automatically restarted; if a node dies, its pods are rescheduled to healthy nodes. This declarative, self-healing model is what makes Kubernetes so powerful.
Internet
β HTTPS
v
βββββββββββββββββββββββββββββββββββββββββββ
β Ingress (nginx-ingress-controller) β
β api.myapp.com β service:myapp-svc:8080 β
βββββββββββββββββββββ¬ββββββββββββββββββββββ
β
βββββββββββββββββββββΌββββββββββββββββββββββ
β Service (ClusterIP/LoadBalancer) β
β Selector: app=myapp β
β Port: 8080 β targetPort: 8080 β
β Load balances across all matching pods β
βββββββββ¬ββββββββββββββββββββ¬ββββββββββββββ
β β
ββββββββΌβββββββ βββββββββΌβββββββ
β Pod (app) β β Pod (app) β β Deployment manages these
β app:myapp β β app:myapp β
β v2.0.1 β β v2.0.1 β
βββββββββββββββ ββββββββββββββββ

apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: production
labels:
app: myapp
version: v2.0.1
spec:
replicas: 3
selector:
matchLabels:
app: myapp
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Allow 1 extra pod during update
maxUnavailable: 0 # Never reduce below 3 pods (zero-downtime)
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: registry.io/myapp:v2.0.1
ports:
- containerPort: 8080
env:
- name: DB_HOST
valueFrom:
configMapKeyRef:
name: myapp-config
key: db-host
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: myapp-secrets
key: db-password
resources:
requests: # Guaranteed resources
memory: "256Mi"
cpu: "250m"
limits: # Maximum resources
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: 8080
initialDelaySeconds: 15
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: myapp-svc
namespace: production
spec:
selector:
app: myapp
ports:
- port: 8080
targetPort: 8080
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myapp-ingress
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
tls:
- hosts: [api.myapp.com]
secretName: myapp-tls
rules:
- host: api.myapp.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: myapp-svc
port:
number: 8080
Beyond basic deployments, production Kubernetes requires understanding of Horizontal Pod Autoscaling (HPA), Pod Disruption Budgets (PDB), RBAC, Network Policies, and Helm for package management. These features separate a test cluster from a production-grade, secure, auto-scaling cluster.
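Pod Disruption Budgets and Network Policies are named above but have no example elsewhere in these notes; minimal sketches follow (the labels reuse the app: myapp convention from the earlier manifests, and the ingress-nginx namespace label is an assumption):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2            # Keep at least 2 pods up during node drains/upgrades
  selector:
    matchLabels:
      app: myapp
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: myapp-allow-ingress-only
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes: [Ingress]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx   # Only traffic from the ingress namespace
    ports:
    - protocol: TCP
      port: 8080
```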
Traffic Spike
β
v
Metrics Server collects CPU/Memory from nodes
β
v
HPA Controller checks metrics every 15s
βββββββββββββββββββββββββββββββββββββββββ
β target CPU = 60% β
β current CPU = 85% β
β current replicas = 3 β
β desired = ceil(3 × 85/60) = 5 pods β
βββββββββββββββββββββ¬ββββββββββββββββββββ
β scale up
v
Deployment: 3 pods β 5 pods (new pods scheduled on nodes)
β
Traffic drops: β
v
HPA scales down 5 β 3 (respects stabilization window)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60 # Scale up if avg CPU > 60%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 70
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5min before scaling down
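The scaling arithmetic shown in the HPA diagram can be sketched in a few lines of Python (the real controller also applies a tolerance band and the stabilization windows, which are ignored here):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA rule: desired = ceil(current * (currentMetric / targetMetric))."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 3 pods at 85% average CPU against a 60% target: scale up to 5
print(desired_replicas(3, 85, 60))   # 5
# Load drops to 30% average CPU: scale down toward 2 (bounded below by minReplicas)
print(desired_replicas(3, 30, 60))   # 2
```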
# Helm is like apt/yum for Kubernetes
helm repo add bitnami https://charts.bitnami.com/bitnami   # (the old 'stable' charts repo is deprecated)
helm install my-postgres bitnami/postgresql \
  --set auth.postgresPassword=secret \
  --set primary.persistence.size=10Gi
helm upgrade myapp ./myapp-chart --set image.tag=v2.0.1
helm rollback myapp 1    # Rollback to revision 1
helm list                # List installed releases
helm history myapp       # Show revision history
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: developer-role
rules:
- apiGroups: ["", "apps"]           # "" is the core API group (pods); "apps" covers deployments
  resources: ["deployments", "pods"]
  verbs: ["get", "list", "watch"]   # Read-only for developers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: production
subjects:
- kind: User
  name: john.doe@company.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-role
  apiGroup: rbac.authorization.k8s.io
GitOps is a deployment methodology where Git is the single source of truth for both application code and infrastructure configuration. Instead of running kubectl apply from a CI pipeline (push model), a GitOps agent like ArgoCD runs inside the cluster, continuously watches a Git repository, and automatically syncs any changes to the cluster (pull model). If someone manually changes the cluster, ArgoCD detects the drift and reverts it.
ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes. It monitors Git repos for changes to Kubernetes manifests (plain YAML, Helm charts, Kustomize) and ensures the cluster always matches what's in Git. This provides complete auditability: every change is a Git commit with an author, a timestamp, and a reason.
TRADITIONAL (Push Model - Jenkins):
Developer β git push β Jenkins pipeline β kubectl apply β K8s Cluster
(CI has admin credentials to cluster β security risk)
GITOPS (Pull Model - ArgoCD):
Developer β git push β Git Repo (config changes)
β
ArgoCD watches β (in-cluster, no external access needed)
β
βββββββββββΌβββββββββββββββββββββββ
β ArgoCD (in K8s) β
β ββββββββββββββββββββββββββββββ β
β β Application Controller β β
β β - Compares Git vs Cluster β β
β β - Detects drift β β
β β - Auto-syncs on change β β
β ββββββββββββββββββββββββββββββ β
ββββββββββββββ¬ββββββββββββββββββββ
β sync
v
K8s Cluster
(always matches Git state)

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp-production
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/company/k8s-configs.git
targetRevision: main
path: apps/production/myapp # Folder with K8s YAML or Helm chart
helm:
valueFiles:
- values-production.yaml
destination:
server: https://kubernetes.default.svc # Target cluster
namespace: production
syncPolicy:
automated:
prune: true # Delete resources removed from Git
selfHeal: true # Auto-revert manual changes
syncOptions:
- CreateNamespace=true
retry:
limit: 3
backoff:
duration: 5s
maxDuration: 1m
# CI Pipeline (GitHub Actions / Jenkins):
#   1. Developer pushes app code
#   2. CI builds, tests, scans Docker image
#   3. CI pushes image to registry (ghcr.io/team/myapp:abc123)
#   4. CI opens a PR to the config repo:
#        Update image tag in k8s-configs/apps/production/values.yaml
#        from: image.tag: v1.0.0
#        to:   image.tag: abc123

# GitOps Pipeline (ArgoCD):
#   5. PR reviewed and merged to config repo
#   6. ArgoCD detects change in config repo
#   7. ArgoCD syncs new image tag to cluster
#   8. Kubernetes performs rolling update
#   9. ArgoCD reports sync status (Synced / Healthy)
Observability means understanding the internal state of a system by examining its outputs. Modern systems require three pillars of observability: Metrics (numerical time-series data such as CPU, request rate, and error rate), Logs (textual event records), and Traces (request journeys across microservices). The standard DevOps observability stack combines Prometheus (metrics collection) + Grafana (visualization + alerting) + ELK/Loki (logs).
Applications + Kubernetes Nodes
β
ββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββ
β DATA COLLECTION LAYER β
β βββββββββββββββ βββββββββββββ βββββββββββββββ β
β β Prometheus β β Loki β β Jaeger β β
β β (metrics) β β (logs) β β (traces) β β
β β scrapes β β receives β β receives β β
β β /metrics β β log push β β spans β β
β ββββββββ¬βββββββ βββββββ¬ββββββ ββββββββ¬βββββββ β
βββββββββββΌββββββββββββββββΌββββββββββββββββΌβββββββββββ
β β β
βββββββββββββββββ΄ββββββββββββββββ
β
Data Sources
β
βββββββββββΌββββββββββ
β GRAFANA β
β Dashboards β
β Alerting Rules β
β Explore (adhoc) β
βββββββββββ¬ββββββββββ
β alerts
βββββββββββΌββββββββββ
β Alert Manager β
β β Slack β
β β PagerDuty β
β β Email β
βββββββββββββββββββββ

# Prometheus scrape config (prometheus.yml)
scrape_configs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
# Key PromQL queries for DevOps dashboards:
# HTTP Request Rate (requests/second)
rate(http_requests_total[5m])
# Error Rate (%)
rate(http_requests_total{status=~"5.."}[5m])
/ rate(http_requests_total[5m]) * 100
# 95th Percentile Latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# CPU Usage per Pod
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
# Memory Usage
container_memory_working_set_bytes{container!=""} / 1024 / 1024
# Alert Rule Example (in Grafana UI or as YAML):
groups:
- name: application
rules:
- alert: HighErrorRate
expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
for: 2m # Must be true for 2 minutes before firing
labels:
severity: critical
annotations:
summary: "High error rate on {{ $labels.service }}"
description: "Error rate is {{ $value | humanizePercentage }}"
runbook: "https://wiki/runbooks/high-error-rate"
- alert: PodCrashLooping
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
for: 0m
labels:
severity: warning
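To ground the PromQL above, here is roughly what rate() and the HighErrorRate expression compute, as a simplified Python sketch (real rate() also handles counter resets and extrapolates across the window, which is ignored here):

```python
def per_second_rate(first_sample: float, last_sample: float, window_seconds: float) -> float:
    """Roughly what PromQL's rate() computes over a window: counter delta / elapsed time."""
    return (last_sample - first_sample) / window_seconds

# Over a 5-minute (300s) window: 12000 total requests, 900 of them returned 5xx
total_rps = per_second_rate(0, 12000, 300)   # 40.0 req/s
error_rps = per_second_rate(0, 900, 300)     # 3.0 req/s
error_ratio = error_rps / total_rps          # 0.075 = 7.5% error rate
print(error_ratio > 0.05)                    # True: HighErrorRate fires after 'for: 2m'
```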
Ansible is an agentless, open-source configuration management and automation tool. Unlike Chef or Puppet, which require agents installed on managed nodes, Ansible uses SSH (or WinRM for Windows) to connect to target machines and execute tasks defined in human-readable YAML files called Playbooks. This agentless approach makes Ansible easy to adopt: just install Python on the target and you're ready.
Ansible follows a push-based model: the Ansible control node (your machine or a CI server) pushes configuration to managed nodes. It is idempotent: running the same playbook 10 times produces the same result as running it once (it won't install Nginx twice if it's already installed).
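A small illustration of the idempotency contract: state-based modules are naturally idempotent, while raw command/shell tasks always report "changed" unless guarded (the key path here is a placeholder):

```yaml
# Idempotent: the apt module checks current state first;
# a second run reports "ok", not "changed"
- name: Ensure nginx is installed
  apt:
    name: nginx
    state: present

# Not idempotent by default: command/shell always execute;
# guard them with creates: (or changed_when:)
- name: Generate TLS key only once
  command: openssl genrsa -out /etc/ssl/private/app.key 2048
  args:
    creates: /etc/ssl/private/app.key   # Skip the task if this file already exists
```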
CONTROL NODE (your machine / Jenkins)
βββββββββββββββββββββββββββββββββββββββββββββ
β Ansible Engine β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β βInventory β βPlaybooks β β Modules β β
β β(who) β β(what) β β(how) β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
ββββββββββββββββββββββββ¬βββββββββββββββββββββ
β
SSH (port 22) β no agents needed!
ββββββββββββββββΌβββββββββββββββ
β β β
v v v
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β Web Server β β App Server β β DB Server β
β Ubuntu 22 β β Ubuntu 22 β β Ubuntu 22 β
β Python only β β Python onlyβ β Python onlyβ
β (no agent) β β (no agent) β β (no agent) β
βββββββββββββββ βββββββββββββββ βββββββββββββββ
Ansible copies module code to target via SSH,
executes it, retrieves results, deletes temp files.

| Feature | Ansible | Chef | Puppet | SaltStack |
|---|---|---|---|---|
| Agent Required | β No | β Yes | β Yes | Optional |
| Language | YAML | Ruby DSL | Puppet DSL | YAML/Python |
| Push/Pull | Push | Pull | Pull | Both |
| Learning Curve | Low | High | High | Medium |
| Best For | Ad-hoc, apps, cloud | Complex config | Enterprise config | Large scale |
The inventory tells Ansible which hosts to manage and how to connect to them. Hosts can be grouped logically (web servers, DB servers, production, staging). Inventory can be static (an INI or YAML file) or dynamic (a script that queries AWS/Azure/GCP for running instances in real-time). Dynamic inventories are essential for cloud environments where servers come and go.
inventory/
βββ hosts.yml (static inventory)
βββ aws_ec2.yml (dynamic inventory plugin)
βββ group_vars/
βββ all.yml (vars for ALL hosts)
βββ webservers.yml (vars for webservers group)
βββ production.yml (vars for production group)

# inventory/hosts.yml
all:
children:
webservers:
hosts:
web1.prod.com:
ansible_user: ubuntu
ansible_ssh_private_key_file: ~/.ssh/prod.pem
web2.prod.com:
ansible_user: ubuntu
dbservers:
hosts:
db1.prod.com:
ansible_user: ubuntu
ansible_port: 2222 # Custom SSH port
staging:
children:
web_staging:
hosts:
staging-web.company.com:
db_staging:
hosts:
staging-db.company.com:
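A sketch of the aws_ec2.yml dynamic inventory file mentioned above, assuming the amazon.aws collection and boto3 are installed and that instances carry Environment and Role tags:

```yaml
# inventory/aws_ec2.yml (requires the amazon.aws collection + boto3)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  instance-state-name: running
  tag:Environment: production        # Only inventory prod instances
keyed_groups:
  - key: tags.Role                   # e.g. Role=web instances land in group "role_web"
    prefix: role
compose:
  ansible_host: public_ip_address    # Connect via public IP
```

Run with `ansible-inventory -i inventory/aws_ec2.yml --graph` to see the generated groups; servers that are terminated simply disappear from the inventory on the next run.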
# Format: ansible [pattern] -m [module] -a [arguments]

# Test connectivity
ansible all -m ping
ansible webservers -m ping

# Run shell commands
ansible webservers -m shell -a "df -h"
ansible dbservers -m shell -a "systemctl status postgresql"

# Copy files
ansible webservers -m copy -a "src=./nginx.conf dest=/etc/nginx/nginx.conf"

# Install packages
ansible webservers -m apt -a "name=nginx state=present" --become

# Manage services
ansible webservers -m service -a "name=nginx state=restarted" --become

# Gather facts (info about target)
ansible web1.prod.com -m setup | grep ansible_distribution
An Ansible Playbook is a YAML file that defines a set of ordered Plays. Each Play maps a set of Tasks to a group of hosts. Tasks call Modules: over 3,000 built-in modules exist for managing packages, files, services, cloud resources, databases, users, and more. Playbooks are idempotent, version-controlled, and the primary way to automate complex multi-step configurations.
ansible-playbook deploy-app.yml
β
v
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PLAY 1: Configure Web Servers β
β hosts: webservers β
β βββββββββββββββββββββββββββββββββββββββββββββ β
β β Task 1: Update apt cache β β
β β Task 2: Install nginx β β
β β Task 3: Copy nginx config (template) β β
β β Task 4: Notify handler: restart nginx β β
β βββββββββββββββββββββββββββββββββββββββββββββ β
β HANDLERS (run only if notified): β
β β β restart nginx β
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PLAY 2: Deploy Application β
β hosts: webservers β
β βββββββββββββββββββββββββββββββββββββββββββββ β
β β Task 1: Pull Docker image β β
β β Task 2: Stop old container β β
β β Task 3: Start new container β β
β β Task 4: Health check β β
β βββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββ

---
- name: Configure and Deploy Web Application
  hosts: webservers
  become: yes                      # Run tasks as root (sudo)
  vars:
    app_name: myapp
    app_port: 8080
    app_version: "{{ lookup('env', 'APP_VERSION') | default('latest') }}"
  pre_tasks:
    - name: Update package cache
      apt:
        update_cache: yes
        cache_valid_time: 3600     # Only update if cache is > 1hr old
  tasks:
    - name: Install required packages
      apt:
        name:
          - nginx
          - docker.io
          - curl
        state: present
    - name: Deploy nginx config from template
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/sites-available/{{ app_name }}
        owner: www-data
        mode: '0644'
      notify: Reload nginx         # Trigger handler only if file changed
    - name: Pull application Docker image
      docker_image:
        name: "registry.io/{{ app_name }}:{{ app_version }}"
        source: pull
    - name: Run application container
      docker_container:
        name: "{{ app_name }}"
        image: "registry.io/{{ app_name }}:{{ app_version }}"
        state: started
        restart_policy: always
        ports:
          - "{{ app_port }}:{{ app_port }}"
        env:
          DB_HOST: "{{ db_host }}"
          DB_PASSWORD: "{{ vault_db_password }}"  # From Ansible Vault (encrypted)
    - name: Wait for application to be healthy
      uri:
        url: "http://localhost:{{ app_port }}/health"
        status_code: 200
      register: health_result
      until: health_result.status == 200
      retries: 12
      delay: 5
  handlers:
    - name: Reload nginx
      service:
        name: nginx
        state: reloaded
    - name: Restart app
      docker_container:
        name: "{{ app_name }}"
        state: started
        restart: yes
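The template task above relies on Jinja2 rendering: playbook variables are substituted into nginx.conf.j2 before the file lands on the host. A rough sketch of that substitution using the Python stdlib's string.Template as a stand-in (a real nginx.conf.j2 would use Jinja2's {{ app_port }} syntax, and the template text here is invented for illustration):

```python
from string import Template

# Inline stand-in for templates/nginx.conf.j2 (hypothetical content)
tpl = Template(
    "server {\n"
    "    listen 80;\n"
    "    location / { proxy_pass http://127.0.0.1:${app_port}; }\n"
    "}\n"
)

# Ansible does the equivalent of this with the play's vars dictionary
rendered = tpl.substitute(app_port=8080)
assert "proxy_pass http://127.0.0.1:8080" in rendered
```

Because the rendered output is compared against the file already on disk, the notify only fires when the content actually changes.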
roles/
  nginx/
    tasks/
      main.yml        # Main task list
    handlers/
      main.yml        # Handlers
    templates/
      nginx.conf.j2   # Jinja2 templates
    vars/
      main.yml        # Role variables
    defaults/
      main.yml        # Default values (overridable)
    meta/
      main.yml        # Dependencies on other roles
# Using roles in a playbook:
- hosts: webservers
  roles:
    - nginx
    - { role: app-deploy, app_version: v2.0.1 }
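The split between defaults/main.yml (overridable) and vars/main.yml matters because Ansible resolves variables by precedence: role defaults are the weakest layer, and values passed at the play level (like app_version above) override them. A toy sketch of that layering (heavily simplified; real Ansible precedence has over 20 levels):

```python
def resolve_vars(*layers):
    """Merge variable layers low-to-high precedence: later layers win."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

role_defaults = {"app_version": "latest", "app_port": 8080}   # defaults/main.yml
play_vars = {"app_version": "v2.0.1"}                         # passed in the roles: entry

final = resolve_vars(role_defaults, play_vars)
assert final == {"app_version": "v2.0.1", "app_port": 8080}
```

This is why sensible defaults belong in defaults/, while vars/ is reserved for values the role must control.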
Infrastructure as Code (IaC) means managing and provisioning cloud infrastructure through machine-readable configuration files rather than clicking through web consoles or running manual commands. When infrastructure is code, it can be versioned, reviewed, tested, and rolled back β just like application code. This eliminates configuration drift (where production differs from what you think it is) and enables full infrastructure reproducibility.
Terraform by HashiCorp is the leading IaC tool. It uses a declarative language (HCL β HashiCorp Configuration Language) where you describe the desired end state of your infrastructure, and Terraform figures out how to get there. Terraform is cloud-agnostic β it works with 300+ providers including AWS, Azure, GCP, Kubernetes, GitHub, Datadog, and more through a provider plugin system.
.tf files (your code)
β
β terraform init
v
Download providers (AWS, Azure, GCP plugins)
β
β terraform plan
v
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β TERRAFORM PLAN β
β Read current state (terraform.tfstate) β
β Query actual cloud state (via provider APIs) β
β Compare: desired vs actual β
β Generate execution plan: β
β + resource "aws_instance" "web" (CREATE) β
β ~ resource "aws_sg" "allow_http" (MODIFY) β
β - resource "aws_instance" "old" (DESTROY) β
ββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββ
β terraform apply
v
Terraform calls AWS/Azure/GCP APIs
Creates/modifies/destroys resources
β
v
terraform.tfstate updated (source of truth)

| Tool | Purpose | Approach | State | Best For |
|---|---|---|---|---|
| Terraform | Provision infrastructure | Declarative | State file | Cloud resources (VMs, VPCs, RDS) |
| Ansible | Configure servers | Procedural/Declarative | Stateless | Software install, app deploy |
| CloudFormation | AWS infra only | Declarative | CF stacks | AWS-only shops |
| Pulumi | Provision infra | Declarative (code) | State file | Developers preferring Python/TS |
Terraform uses HCL (HashiCorp Configuration Language) β a human-readable configuration language designed to be easier than JSON/YAML while being machine-parseable. The four main block types are terraform (settings), provider (cloud APIs), resource (infrastructure objects), and data (read existing resources).
# main.tf - Provider configuration
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"   # Allow 5.x versions only
    }
  }
  backend "s3" {           # Remote state in S3
    bucket  = "my-terraform-state"
    key     = "production/terraform.tfstate"
    region  = "ap-south-1"
    encrypt = true
  }
}
provider "aws" {
  region = var.aws_region  # Use variable
  default_tags {
    tags = {
      Project     = "DevOps-Class"
      ManagedBy   = "Terraform"
      Environment = var.environment
    }
  }
}
# variables.tf - Input variables
variable "aws_region" {
  description = "AWS region to deploy to"
  type        = string
  default     = "ap-south-1"
}
variable "instance_type" {
  type    = string
  default = "t3.micro"
  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "Must be a valid t3 instance type."
  }
}
variable "environment" {
  type = string
}
# outputs.tf - Output values
output "public_ip" {
  description = "Public IP of web server"
  value       = aws_instance.web[0].public_ip  # web uses count, so an index is required
}
output "db_endpoint" {
  value     = aws_db_instance.main.endpoint
  sensitive = true  # Won't print in logs
}
# ec2.tf
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
}
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
resource "aws_instance" "web" {
  count                  = 2  # Create 2 instances
  ami                    = data.aws_ami.ubuntu.id
  instance_type          = var.instance_type
  subnet_id              = aws_subnet.public[count.index].id
  vpc_security_group_ids = [aws_security_group.web.id]
  key_name               = "my-keypair"
  user_data              = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
    systemctl start nginx
    echo "Hello from instance ${count.index}" > /var/www/html/index.html
  EOF
}
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  # Canonical (Ubuntu)
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-*-22.04-amd64-server-*"]
  }
}
data "aws_availability_zones" "available" {  # Referenced by aws_subnet.public above
  state = "available"
}
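The interpolation 10.0.${count.index}.0/24 above carves per-AZ /24 subnets out of the 10.0.0.0/16 VPC block. The same arithmetic can be checked with Python's stdlib ipaddress module; this mirrors what Terraform's cidrsubnet(var.cidr, 8, count.index) function computes:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")

# /16 plus 8 new prefix bits = /24, the equivalent of cidrsubnet(cidr, 8, i)
subnets = list(vpc.subnets(new_prefix=24))

for count_index in range(2):
    print(subnets[count_index])   # 10.0.0.0/24 then 10.0.1.0/24

assert str(subnets[0]) == "10.0.0.0/24"
assert str(subnets[1]) == "10.0.1.0/24"
```

This is handy for sanity-checking that subnet ranges will not overlap before running terraform plan.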
terraform init                               # Initialize, download providers & modules
terraform validate                           # Check syntax
terraform fmt                                # Format code (run before git commit)
terraform plan                               # Preview changes (dry run)
terraform plan -out=tfplan                   # Save plan to file
terraform apply tfplan                       # Apply saved plan (no confirmation needed)
terraform apply -target=aws_instance.web     # Apply only specific resource
terraform destroy                            # Destroy ALL resources
terraform state list                         # List all managed resources
terraform state show aws_instance.web        # Show resource details
terraform import aws_instance.web i-1234567890   # Import existing resource
A Terraform module is a collection of Terraform files in a directory that can be reused and shared. Modules are the primary mechanism for code reuse in Terraform β instead of writing the same VPC configuration in every project, you create a VPC module and call it with different parameters. Modules enforce consistency and reduce duplication across teams.
infrastructure/
βββ main.tf (root module β calls child modules)
βββ variables.tf
βββ outputs.tf
βββ modules/
βββ vpc/ (VPC module)
β βββ main.tf
β βββ variables.tf
β βββ outputs.tf
βββ ec2/ (EC2 module)
βββ rds/ (RDS module)
Root module calls child modules:
module "vpc" {
  source = "./modules/vpc"
  cidr   = "10.0.0.0/16"
}
module "web_servers" {
  source         = "./modules/ec2"
  vpc_id         = module.vpc.vpc_id   # Pass output of one module to another
  subnet_ids     = module.vpc.public_subnet_ids
  instance_count = 3
}

# modules/ec2/main.tf
resource "aws_launch_template" "this" {
  name_prefix   = "${var.name}-"
  image_id      = var.ami_id
  instance_type = var.instance_type
  key_name      = var.key_name
  network_interfaces {
    security_groups = [aws_security_group.this.id]
  }
  tag_specifications {
    resource_type = "instance"
    tags          = merge(var.tags, { Name = var.name })
  }
}
resource "aws_autoscaling_group" "this" {
  name                = var.name
  vpc_zone_identifier = var.subnet_ids
  desired_capacity    = var.desired_count
  min_size            = var.min_count
  max_size            = var.max_count
  launch_template {
    id      = aws_launch_template.this.id
    version = "$Latest"
  }
}
# modules/ec2/variables.tf
# Note: HCL separates attributes with newlines, not semicolons, so blocks
# with more than one attribute must span multiple lines.
variable "name"       { type = string }
variable "ami_id"     { type = string }
variable "subnet_ids" { type = list(string) }
variable "instance_type" {
  type    = string
  default = "t3.micro"
}
variable "desired_count" {
  type    = number
  default = 2
}
variable "min_count" {
  type    = number
  default = 1
}
variable "max_count" {
  type    = number
  default = 10
}
variable "tags" {
  type    = map(string)
  default = {}
}
# modules/ec2/outputs.tf
output "asg_name" { value = aws_autoscaling_group.this.name }
output "asg_arn" { value = aws_autoscaling_group.this.arn }
# Use official AWS VPC module from Terraform Registry
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "production-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  enable_vpn_gateway = false
}
Terraform stores the mapping between your configuration and real-world resources in a state file (terraform.tfstate). This is how Terraform knows that aws_instance.web in your code corresponds to instance i-0a1b2c3d4e in AWS. Without state, Terraform would recreate every resource on every apply. The state file is the most critical file in your Terraform project β losing it means losing the ability to manage your infrastructure with Terraform.
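Because terraform.tfstate is plain JSON, it can be inspected with any JSON tool. A sketch that lists managed resources from a hypothetical, heavily trimmed state document (real state files carry many more fields):

```python
import json

# Hypothetical, minimal state document for illustration only
tfstate = json.loads("""
{
  "version": 4,
  "resources": [
    {"mode": "managed", "type": "aws_instance", "name": "web",
     "instances": [{"attributes": {"id": "i-0a1b2c3d4e"}}]}
  ]
}
""")

# Walk the resources list, printing each managed resource and its real-world ID
for res in tfstate["resources"]:
    if res["mode"] == "managed":
        for inst in res["instances"]:
            print(f'{res["type"]}.{res["name"]} -> {inst["attributes"]["id"]}')
# aws_instance.web -> i-0a1b2c3d4e
```

In practice you should prefer `terraform state list` / `terraform state show` over reading the file directly, since the state format is an internal detail.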
Developer A Developer B
terraform apply terraform apply
β β
v v
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Remote State Backend (S3) β
β s3://my-tfstate/prod/terraform.tfstate β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββ β
β β State Locking (DynamoDB) β β
β β LockID: prod/terraform.tfstate β β
β β If Dev A holds lock β Dev B gets error: β β
β β "Error acquiring the state lock" β β
β β Prevents simultaneous applies (race cond) β β
β ββββββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ

# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/myapp/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"  # For state locking
  }
}

# Create the S3 bucket and DynamoDB table first (bootstrap):
resource "aws_s3_bucket" "tf_state" {
  bucket = "company-terraform-state"
}
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration { status = "Enabled" }  # Versioning for state history!
}
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock"
  hash_key     = "LockID"
  billing_mode = "PAY_PER_REQUEST"
  attribute {
    name = "LockID"
    type = "S"
  }
}
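The DynamoDB lock behaves as an atomic put-if-absent on LockID: the first writer wins, and the second apply fails fast with "Error acquiring the state lock". A pure-Python sketch of that semantics, with a dict standing in for the DynamoDB table (the function names are invented for illustration):

```python
lock_table = {}  # stands in for the DynamoDB table keyed on LockID

def acquire_lock(lock_id, owner):
    """Put-if-absent, like a conditional write on attribute_not_exists(LockID)."""
    if lock_id in lock_table:
        raise RuntimeError(
            f"Error acquiring the state lock (held by {lock_table[lock_id]})"
        )
    lock_table[lock_id] = owner

def release_lock(lock_id):
    lock_table.pop(lock_id, None)

acquire_lock("prod/terraform.tfstate", "dev-a")   # Dev A's apply takes the lock
try:
    acquire_lock("prod/terraform.tfstate", "dev-b")  # Dev B's apply fails fast
except RuntimeError as e:
    print(e)
release_lock("prod/terraform.tfstate")            # Dev A finishes, lock freed
```

The real implementation adds a lock ID, timestamps, and `terraform force-unlock` for recovering from a crashed apply, but the core guarantee is exactly this mutual exclusion.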
# Workspaces allow multiple state files for same config
terraform workspace new staging
terraform workspace new production
terraform workspace select production

# Use workspace in config:
resource "aws_instance" "web" {
  instance_type = terraform.workspace == "production" ? "t3.medium" : "t3.micro"
  count         = terraform.workspace == "production" ? 3 : 1
}
infrastructure/
βββ environments/
β βββ dev/
β β βββ main.tf (calls modules with dev values)
β β βββ variables.tf
β β βββ terraform.tfvars (dev-specific values β NOT in git for prod!)
β βββ staging/
β βββ production/
βββ modules/
β βββ vpc/
β βββ eks-cluster/
β βββ rds/
β βββ alb/
βββ .github/
βββ workflows/
βββ terraform.yml (CI/CD for infrastructure)

# .github/workflows/terraform.yml
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']
jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infrastructure/environments/production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      - name: Terraform Init
        run: terraform init
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Validate
        run: terraform validate
      - name: Terraform Plan
        run: terraform plan -no-color -out=tfplan   # Runs on PRs AND on main
      - name: Terraform Apply (on merge to main)
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply tfplan                 # Saved plans apply without confirmation
Terraform do's and don'ts:
- DO run terraform fmt before commit
- DO run terraform plan before apply
- DON'T commit terraform.tfstate to Git
- DON'T commit *.tfvars with secrets
- DON'T run terraform apply without a plan

Warning: The terraform.tfstate file contains ALL your infrastructure details in plaintext, including sensitive values. Always: (1) Store in S3 with encryption, (2) Enable versioning, (3) Restrict IAM access, (4) Never commit to Git.

OpenTofu is a community-driven, open-source fork of Terraform created in response to HashiCorp's controversial license change in August 2023. HashiCorp changed Terraform from the Mozilla Public License (MPL 2.0, truly open source) to the Business Source License (BSL/BUSL, which restricts commercial use). This alarmed the DevOps community, leading to the formation of the OpenTofu project under the Linux Foundation.
OpenTofu aims to be a drop-in replacement for Terraform β it uses the same HCL syntax, same providers, same state format, and same workflow commands. If you know Terraform, you know OpenTofu. It is maintained by a community of contributors from companies like Spacelift, Gruntwork, env0, Scalr, and many others.
2014 ββββ Terraform 0.1 released (MPL 2.0 license β open source)
2022 ββββ Terraform 1.0 β 1.5 (stable, widely adopted)
Aug 2023 β HashiCorp changes license to BSL (commercial restriction)
"If you compete with HashiCorp, you can't use Terraform"
β
β Community reaction
v
Sep 2023 β OpenTofu fork announced (Linux Foundation)
Supported by Gruntwork, Spacelift, env0, Scalr, etc.
Jan 2024 β OpenTofu 1.6.0 released (stable, GA)
2024 ββββ OpenTofu adds features ahead of Terraform:
- State encryption
- Provider-defined functions
- Improved testing framework
OpenTofu promise: Always open source (MPL 2.0)

# Install OpenTofu
curl -fsSL https://get.opentofu.org/install-opentofu.sh | sudo bash -s -- --install-method rpm

# Verify
tofu --version

# Migration is simple - OpenTofu reads Terraform state
cd your-terraform-project/
tofu init    # Downloads providers (same as terraform init)
tofu plan    # Same output as terraform plan
tofu apply   # Same workflow

# tofu commands mirror terraform commands exactly:
# terraform init    -> tofu init
# terraform plan    -> tofu plan
# terraform apply   -> tofu apply
# terraform destroy -> tofu destroy
# terraform import  -> tofu import
| Feature | OpenTofu | Terraform |
|---|---|---|
| License | MPL 2.0 (truly open) | BSL (commercial restrictions) |
| State Encryption | Yes, built-in (1.7+) | Only via backend |
| Provider Functions | Supported | Not yet |
| State backends | All Terraform backends | All + HCP Terraform |
| Cost | Free forever | Free (BSL restrictions apply) |
| Governance | Linux Foundation | HashiCorp / IBM |
Python is the dominant scripting and automation language in DevOps. While bash is great for simple shell tasks, Python excels at complex automation, API integrations, data processing, and building DevOps tools. Ansible is written in Python. The AWS CLI, Azure CLI, and Google Cloud SDK all have Python SDKs. Kubernetes client, Docker SDK, Terraform CDK β all have Python support.
Python's rich standard library and enormous ecosystem (PyPI) means you can write a script that calls AWS APIs, processes JSON, reads YAML config, makes HTTP calls, and sends Slack notifications in under 50 lines of code. This is why Python is a must-have skill for DevOps engineers.
Python Script / Tool
β
ββββββββ΄ββββββββββββββββββββββββββββββββββββββββββ
β Python DevOps Ecosystem β
β β
β boto3 βββββββββββββββΆ AWS APIs (EC2, S3, ECS) β
β kubernetes βββββββββββΆ K8s API Server β
β docker βββββββββββββββΆ Docker Engine API β
β requests βββββββββββββΆ Any REST API β
β paramiko βββββββββββββΆ SSH to servers β
β PyYAML βββββββββββββββΆ Parse YAML (K8s manifests) β
β jinja2 βββββββββββββββΆ Template config files β
β click ββββββββββββββββΆ Build CLI tools β
β schedule βββββββββββββΆ Cron-like task schedulingβ
βββββββββββββββββββββββββββββββββββββββββββββββββββ

import boto3
from datetime import datetime, timedelta

# Auto-stop unused EC2 instances (save costs)
def stop_idle_instances():
    ec2 = boto3.client('ec2', region_name='ap-south-1')
    cloudwatch = boto3.client('cloudwatch', region_name='ap-south-1')
    # Get all running instances
    response = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            # Check avg CPU in last hour
            metrics = cloudwatch.get_metric_statistics(
                Namespace='AWS/EC2',
                MetricName='CPUUtilization',
                Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                StartTime=datetime.utcnow() - timedelta(hours=1),
                EndTime=datetime.utcnow(),
                Period=3600,
                Statistics=['Average']
            )
            if metrics['Datapoints']:
                avg_cpu = metrics['Datapoints'][0]['Average']
                if avg_cpu < 5.0:  # Less than 5% CPU - idle!
                    print(f"Stopping idle instance {instance_id} (CPU: {avg_cpu:.1f}%)")
                    ec2.stop_instances(InstanceIds=[instance_id])

stop_idle_instances()
from datetime import datetime
from kubernetes import client, config

def rolling_restart(namespace, deployment_name):
    config.load_kube_config()  # Load ~/.kube/config
    apps_v1 = client.AppsV1Api()
    # Get current deployment
    deployment = apps_v1.read_namespaced_deployment(
        name=deployment_name,
        namespace=namespace
    )
    # Add restart annotation (triggers rolling restart)
    if not deployment.spec.template.metadata.annotations:
        deployment.spec.template.metadata.annotations = {}
    deployment.spec.template.metadata.annotations['kubectl.kubernetes.io/restartedAt'] = \
        datetime.utcnow().isoformat()
    apps_v1.patch_namespaced_deployment(
        name=deployment_name,
        namespace=namespace,
        body=deployment
    )
    print(f"Rolling restart triggered for {deployment_name}")

rolling_restart('production', 'myapp')
| Library | Purpose | Install |
|---|---|---|
| boto3 | AWS SDK β EC2, S3, ECS, Lambda, etc. | pip install boto3 |
| kubernetes | Kubernetes API client | pip install kubernetes |
| docker | Docker Engine API | pip install docker |
| requests | HTTP calls to any REST API | pip install requests |
| paramiko | SSH connections and SFTP | pip install paramiko |
| PyYAML | Parse and write YAML | pip install pyyaml |
| Jinja2 | Template engine (config generation) | pip install jinja2 |
| click | Build CLI tools easily | pip install click |
| python-dotenv | Load .env files as env vars | pip install python-dotenv |
| slack-sdk | Send Slack notifications | pip install slack-sdk |
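As an example of how little glue code these tasks usually take, python-dotenv's core behaviour (turn KEY=value lines into environment variables) can be approximated in a few stdlib lines. This is a simplified sketch, not the real library: it skips quoting, export prefixes, and variable interpolation:

```python
import os

def load_dotenv_string(text):
    """Minimal .env parser: skip comments/blanks, set vars without clobbering."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault mirrors python-dotenv's default of not overriding existing vars
        os.environ.setdefault(key.strip(), value.strip())

load_dotenv_string("# app config\nAPP_PORT=8080\nDB_HOST=db.internal\n")
assert os.environ["APP_PORT"] == "8080"
```

For anything beyond this toy case, use the real library (`pip install python-dotenv`), which handles the edge cases correctly.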
Tip: Always create an isolated virtual environment (python -m venv .venv) and pin your dependencies in requirements.txt (pip freeze > requirements.txt). This ensures your automation scripts produce consistent results across different machines and CI environments.