By Veera Sir
34 Topics: Complete Notes with Architecture & Deep Theory
DevOps is a cultural philosophy and technical movement that bridges the gap between software Development (Dev) and IT Operations (Ops). Before DevOps, developers wrote code and "threw it over the wall" to Ops teams, causing slow releases, blame culture, and frequent production failures. DevOps breaks these walls by establishing shared ownership, automated pipelines, and continuous feedback loops across all stages of the software lifecycle.
The term was coined by Patrick Debois in 2009. It is built on the CAMS framework: Culture (shared responsibility), Automation (eliminate manual toil), Measurement (DORA metrics), and Sharing (knowledge across teams). DevOps is not a tool: it's a mindset that enables organizations to deliver software faster, more reliably, and more securely.
Developer Workstation
| git push
v
+-----------------------------------------------------------+
|          CI/CD PIPELINE (Jenkins / GitHub Actions)        |
|  +------+   +------+   +-------+   +-------+   +-------+  |
|  | PLAN |-->| CODE |-->| BUILD |-->| TEST  |-->|RELEASE|  |
|  | Jira |   | Git  |   | Maven |   | JUnit |   | JFrog |  |
|  +------+   +------+   +-------+   +-------+   +-------+  |
+-----------------------------+-----------------------------+
                              | artifact
                              v
+-----------------------------------------------------------+
|                     CD / DEPLOY PHASE                     |
|   DEV --auto--> STAGING --integration tests--> PROD       |
|                                  (manual gate)            |
+-----------------------------+-----------------------------+
                              |
                              v
+-----------------------------------------------------------+
|             OBSERVE & MONITOR (Feedback Loop)             |
|  Prometheus -> Grafana -> Alertmanager -> Slack/PagerDuty |
|  ELK Stack (Logs) | Jaeger (Traces) -> On-call team       |
+-----------------------------------------------------------+

| Aspect | Traditional IT | DevOps |
|---|---|---|
| Release Frequency | Monthly/Quarterly | Multiple times per day |
| Lead Time | Weeks to months | Hours to days |
| Team Structure | Dev, QA, Ops silos | Cross-functional squads |
| Infrastructure | Manual, long-lived servers | IaC, immutable infrastructure |
| Failure Recovery | Hours to days (blame) | Minutes (blameless retro) |
| Change Failure Rate | 30-50% | 0-15% |
| Testing | Manual, end of cycle | Automated, shift-left |
| Security | Added at the end | Shift-left DevSecOps |
Google's DevOps Research and Assessment (DORA) team identified four key metrics that distinguish elite engineering teams from low performers. Together, these four metrics give a compact, research-backed picture of your pipeline's health:
| Metric | Elite | Low Performer | Meaning |
|---|---|---|---|
| Deployment Frequency | Multiple/day | Once per 6 months | How often code reaches production |
| Lead Time for Changes | <1 hour | 6+ months | Commit-to-production time |
| Change Failure Rate | <5% | 46-60% | % deployments causing incidents |
| Mean Time to Recovery | <1 hour | >6 months | Time to restore after failure |
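These numbers can be computed from your own deployment records. A minimal sketch, assuming a hypothetical CSV log of deployments (the file name and format below are made up for illustration, not a real standard):

```shell
# Toy change-failure-rate calculation from a hypothetical deploy log.
cat > /tmp/deploys.csv <<'EOF'
2024-05-01,success
2024-05-01,failed
2024-05-02,success
2024-05-03,success
EOF

TOTAL=$(wc -l < /tmp/deploys.csv)              # Total deployments
FAILED=$(grep -c ',failed' /tmp/deploys.csv)   # Deployments that caused incidents
echo "Deployments: $TOTAL, failed: $FAILED"
echo "Change failure rate: $((FAILED * 100 / TOTAL))%"   # -> 25%
```

The same pattern (count events, divide by time window) yields deployment frequency; lead time and MTTR need paired timestamps instead of counts.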
Shell scripting is the foundational automation skill for every DevOps engineer. A shell (bash, sh, zsh) is the command interpreter that lets you interact with the Linux kernel. Shell scripts chain together commands, control flow, and system calls to automate repetitive tasks, from provisioning servers to deploying applications. Almost every CI/CD pipeline, cron job, and server automation uses shell scripting under the hood.
Bash (Bourne Again Shell) is the most common shell in Linux and is used in AWS EC2, Docker containers, and Kubernetes init containers. Understanding shell scripting saves hours of manual work and is essential for writing Jenkins pipelines, Docker entrypoints, and Kubernetes scripts.
User types command / Script file (.sh)
|
v
   +---------+
   |  SHELL  |   bash / sh / zsh / fish
   | (Parser)|   Reads, tokenizes, expands variables
   +----+----+
        |
   +----+------------+
   |                 |
   v                 v
Built-in          External Command
Commands          (fork + exec)
(cd, echo,        /bin/ls, /usr/bin/curl
export, alias)    spawns child process
        |
        v
Linux Kernel (system calls: read, write, fork, exec)
        |
        v
Hardware / File System / Network

#!/bin/bash            # Shebang: tells OS which interpreter to use
# Script: deploy.sh # Comments start with #
set -e # Exit immediately on error
set -o pipefail # Catch pipe errors too
DEPLOY_ENV=${1:-"dev"} # First argument, default = dev
APP_NAME="myapp"
TIMESTAMP=$(date +%Y%m%d%H%M%S)
echo "Deploying $APP_NAME to $DEPLOY_ENV at $TIMESTAMP"
# Variables
NAME="DevOps"
echo "Hello $NAME"
echo "Length: ${#NAME}" # String length
# Conditionals
if [ "$DEPLOY_ENV" == "prod" ]; then
echo "Production deployment - applying approval gate"
elif [ "$DEPLOY_ENV" == "staging" ]; then
echo "Staging deployment"
else
echo "Dev deployment - skipping approvals"
fi
# For loop
for SERVER in web1 web2 web3; do
echo "Deploying to $SERVER"
ssh ubuntu@$SERVER "sudo systemctl restart nginx"
done
# While loop
COUNT=0
while [ $COUNT -lt 5 ]; do
echo "Health check attempt $COUNT"
curl -sf http://localhost:8080/health && break
COUNT=$((COUNT + 1))
sleep 3
done
function check_service() {
local SERVICE=$1
if systemctl is-active --quiet $SERVICE; then
echo "[OK] $SERVICE is running"
return 0
else
echo "[FAIL] $SERVICE is NOT running"
return 1
fi
}
# Trap errors and cleanup
trap 'echo "Error on line $LINENO"; cleanup' ERR
function cleanup() {
rm -f /tmp/deploy.lock
exit 1
}
check_service nginx || cleanup
# Handy one-liners for daily DevOps work
find /app -name "*.log" -mtime +7 -delete       # Delete logs older than 7 days
tar -czf backup.tar.gz /var/www                 # Compress a directory
rsync -avz ./dist/ user@server:/var/www         # Sync files to a remote server
sed -i 's/OLD/NEW/g' config.txt                 # In-place find and replace
awk '{print $1,$3}' access.log                  # Print selected columns
netstat -tlnp | grep 8080                       # Who is listening on port 8080
curl -o /dev/null -s -w "%{http_code}" URL      # HTTP status code only
ps aux | grep java | grep -v grep               # Find running Java processes
kill -9 $(lsof -t -i:8080)                      # Kill the process on port 8080
nohup ./server.sh > log.txt &                   # Run in background, survive logout

Best practices: start every script with set -e and set -o pipefail; quote "$VAR" to handle spaces; log progress with echo "[$(date)] Starting deploy"; check syntax with bash -n script.sh and lint with shellcheck script.sh.

A web server is software that accepts HTTP/HTTPS requests from clients (browsers, APIs, mobile apps) and serves responses: either static files (HTML, CSS, images) or by proxying requests to application servers (Node.js, Python, Java). In DevOps, web servers like Nginx and Apache are critical for reverse proxying, load balancing, SSL termination, and serving microservices.
Nginx (Engine-X) uses an event-driven, non-blocking architecture that handles thousands of concurrent connections with minimal memory. Apache uses a process/thread-per-connection model that is more flexible with .htaccess but less performant at scale. In modern DevOps, Nginx dominates for reverse proxy and Kubernetes Ingress controllers.
                 Internet
                    |
                    | HTTPS :443
                    v
+----------------------------------------+
|        NGINX (Reverse Proxy)           |
|  - SSL/TLS Termination (Let's Encrypt) |
|  - Gzip Compression                    |
|  - Rate Limiting                       |
|  - Static File Caching                 |
+------------------+---------------------+
                   | HTTP :8080 (internal)
         +---------+---------+
         |         |         |
         v         v         v
    +--------+ +--------+ +--------+
    | App #1 | | App #2 | | App #3 |   (Node.js / Java / Python)
    | :3000  | | :3001  | | :3002  |
    +--------+ +--------+ +--------+
         |         |         |
         +---------+---------+
                   |
                   v
             +----------+
             | Database |   (PostgreSQL / MySQL / MongoDB)
             +----------+

# /etc/nginx/sites-available/myapp.conf
upstream backend {
least_conn; # Load balancing: least connections
server 10.0.1.10:8080 weight=3; # Higher weight = more traffic
server 10.0.1.11:8080 weight=1;
server 10.0.1.12:8080 backup; # Only used if others are down
}
server {
listen 443 ssl http2;
server_name myapp.example.com;
ssl_certificate /etc/ssl/certs/myapp.crt;
ssl_certificate_key /etc/ssl/private/myapp.key;
ssl_protocols TLSv1.2 TLSv1.3;
# Security headers
add_header X-Frame-Options "SAMEORIGIN";
add_header X-Content-Type-Options "nosniff";
add_header Strict-Transport-Security "max-age=31536000";
# Static files - serve directly (fast)
location /static/ {
root /var/www/myapp;
expires 30d;
add_header Cache-Control "public, immutable";
}
# API requests - proxy to backend
location /api/ {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_connect_timeout 10s;
proxy_read_timeout 30s;
}
# Rate limiting (note: the zone must be declared in the http {} context,
# not inside server {}):
#   limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
location /api/login {
limit_req zone=api burst=5 nodelay;
proxy_pass http://backend;
}
}
| Feature | Nginx | Apache |
|---|---|---|
| Architecture | Event-driven, async | Process/thread per request |
| Concurrency | 10,000+ connections easily | Struggles above 1000 |
| Memory Usage | Very low (~2.5MB per worker) | Higher (forking model) |
| Config Style | Centralized nginx.conf | .htaccess per directory |
| PHP Support | Requires PHP-FPM | Built-in mod_php |
| SSL Termination | Excellent | Good |
| K8s Ingress | Official ingress controller | Available but less common |
| Best For | Reverse proxy, static, LB | Legacy apps, shared hosting |
# Nginx
sudo nginx -t                           # Test config syntax
sudo systemctl reload nginx             # Hot reload (no downtime)
sudo systemctl restart nginx            # Full restart
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log

# Apache
sudo apachectl configtest               # Test config
sudo systemctl reload apache2
sudo a2ensite myapp.conf                # Enable site
sudo a2enmod ssl rewrite proxy          # Enable modules
Git is a distributed version control system (DVCS) created by Linus Torvalds in 2005 to manage the Linux kernel source. Unlike centralized VCS (like SVN), every developer has a full copy of the entire repository, including its history, on their local machine. This means you can commit, branch, diff, and log completely offline.
Git tracks changes as snapshots (not diffs). Each commit stores a complete snapshot of all tracked files at that point in time. This makes Git operations like branching, merging, and reverting extremely fast. Git is the foundation of every modern CI/CD pipeline: no code gets built or deployed without going through Git first.
Developer A (Local Repo)           Developer B (Local Repo)
+---------------------+            +---------------------+
| Working Directory   |            | Working Directory   |
| Staging Area (Index)|            | Staging Area (Index)|
| Local Repository    |            | Local Repository    |
|   (full history)    |            |   (full history)    |
+----------+----------+            +----------+----------+
           | git push                         | git pull
           |                                  |
           v                                  v
+------------------------------------------+
|    REMOTE REPOSITORY (GitHub/GitLab)     |
|  main <--> feature/* <--> release/*      |
|  (Central sync point, not single source) |
+------------------------------------------+

Everything in Git is stored as one of four object types in the .git/objects directory. Understanding this helps you understand what's happening under the hood:
| Object Type | What It Stores | Example |
|---|---|---|
| Blob | File content (no filename) | Contents of main.py |
| Tree | Directory listing (filenames + blob refs) | List of files in /src |
| Commit | Snapshot pointer + metadata (author, message, parent) | git log entry |
| Tag | Named pointer to a commit | v1.0.0 release tag |
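You can verify all four object types yourself with git cat-file. A sketch using a throwaway repo (the directory and file names are arbitrary):

```shell
# Explore Git's object model in a scratch repo
rm -rf /tmp/obj-demo && git init -q /tmp/obj-demo && cd /tmp/obj-demo
echo "print('hi')" > main.py
git add main.py
git -c user.name=demo -c user.email=demo@example.com commit -qm "first"
git -c user.name=demo -c user.email=demo@example.com tag -a v1.0.0 -m "release"

git cat-file -t HEAD                    # commit
TREE=$(git rev-parse 'HEAD^{tree}')
git cat-file -t "$TREE"                 # tree
BLOB=$(git rev-parse HEAD:main.py)
git cat-file -t "$BLOB"                 # blob
git cat-file -p "$BLOB"                 # file content (no filename stored)
git cat-file -t v1.0.0                  # tag (annotated tags are real tag objects)
```

Note that a lightweight tag (plain `git tag v1.0.0`) would resolve straight to the commit; only annotated tags create a tag object.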
Working Directory --git add--> Staging Area --git commit--> Local Repo
(untracked/modified files)    (index: .git/index)         (.git/objects)
        ^                                                       |
        +------------- git checkout / git restore --------------+
                                                                |
                                                git push -----> Remote
A Git repository is a data store containing your project's files and the entire history of changes. There are two types: Local repositories (on your machine, full history in .git/) and Remote repositories (hosted on GitHub, GitLab, Bitbucket, used as sync points for teams).
A bare repository is a special type used for remote servers: it contains only the Git metadata (no working directory). When you push to GitHub, you're pushing to a bare repo. A fork is a server-side clone of a repo under your own account, used for open-source contribution workflows.
git init myproject               git clone https://...
        |                                |
        v                                v
Local Repo (.git/)               Local Repo (.git/)   <-- full history cloned
+---------------+               +---------------+
| .git/         |               | .git/         |
|  +- HEAD      |               |  +- HEAD      |
|  +- config    |               |  +- config    |
|  +- objects/  |               |  +- objects/  |
|  +- refs/     |               |  +- refs/     |
|  +- index     |               |  +- index     |
+---------------+               +---------------+
        |                                |
        +---------- git push ----------->|
                         |
                  Remote (GitHub)
                  +-------------+
                  |  Bare Repo  |   (no working tree)
                  |  .git only  |
                  +-------------+

# Create new repo locally
git init my-devops-project
cd my-devops-project

# Clone existing remote repo
git clone https://github.com/org/repo.git
git clone git@github.com:org/repo.git            # SSH clone

# Check repo status
git status
git log --oneline --graph --decorate --all       # Visual history

# Remote management
git remote -v                                    # Show remotes
git remote add upstream https://original.repo    # Add upstream for forks
git remote set-url origin https://new.url        # Change remote URL
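You can see the bare-repo mechanics locally by pushing to a bare repository on your own disk (a sketch; all paths below are made up):

```shell
# A local bare repo stands in for GitHub
rm -rf /tmp/hub.git /tmp/work
git init -q --bare /tmp/hub.git          # metadata only, no working tree
git init -q /tmp/work && cd /tmp/work
echo "# demo" > README.md
git add README.md
git -c user.name=demo -c user.email=demo@example.com commit -qm "init"
git remote add origin /tmp/hub.git
git push -q origin HEAD                  # push current branch, whatever its name
ls /tmp/hub.git                          # HEAD, config, objects/, refs/ - no README.md
```

The bare repo holds the same objects and refs as .git/ in a normal repo; it just never checks files out, which is exactly what a server-side sync point needs.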
Before using Git in a team, proper configuration ensures your commits are correctly attributed, your editor works, and you can authenticate with remote services. Git configuration has three scopes: system (all users on the machine), global (your user account), and local (specific repository). Local overrides global, which overrides system.
/etc/gitconfig   <- System-wide (all users)
      |
~/.gitconfig     <- Global (your user)
      |
.git/config      <- Local (this repo only; highest priority)

Resolution: Local > Global > System

# Identity (required for commits)
git config --global user.name "Your Name"
git config --global user.email "you@company.com"

# Default editor (for commit messages)
git config --global core.editor "vim"        # or "code --wait" for VS Code

# Default branch name
git config --global init.defaultBranch main

# Line endings (important for Windows/Linux teams)
git config --global core.autocrlf input      # Linux/Mac: convert CRLF->LF on commit
git config --global core.autocrlf true       # Windows: convert LF->CRLF on checkout

# Aliases (huge productivity boost)
git config --global alias.st status
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.lg "log --oneline --graph --decorate --all"
git config --global alias.undo "reset HEAD~1 --mixed"

# SSH Key setup for GitHub/GitLab
ssh-keygen -t ed25519 -C "you@company.com"   # Generate key
cat ~/.ssh/id_ed25519.pub                    # Copy to GitHub Settings -> SSH Keys
ssh -T git@github.com                        # Test connection
# .gitignore - tells git which files to never track

# Generated files
target/
dist/
build/
*.class
*.jar

# Environment files (NEVER commit secrets!)
.env
.env.local
*.env

# IDE files
.idea/
.vscode/
*.iml

# OS files
.DS_Store
Thumbs.db

# Dependency directories
node_modules/
vendor/
__pycache__/
*.pyc
Never commit .env files, passwords, API keys, or private SSH keys to Git. Use tools like git-secrets, truffleHog, or GitHub's secret scanning to detect accidental secret commits. Once committed and pushed, secrets must be rotated: the history is permanent.

Git commands map to the three-area model: Working Directory -> Staging Area -> Local Repository -> Remote. Mastering these commands is essential for daily DevOps work, including cherry-picking hotfixes, undoing mistakes, and bisecting bugs in production.
Untracked/Modified      Staged             Committed           Remote
 (Working Dir)          (Index)            (Local Repo)        (GitHub)
      |                    |                    |                  |
      +--- git add ------->|                    |                  |
      |                    +--- git commit ---->|                  |
      |                    |                    +--- git push ---->|
      |<-- git restore ----+                    |<-- git fetch ----+
      |                    |                    |<-- git pull -----+
      |<------------ git checkout / git switch -+                  |

# Stage and commit
git add . # Stage all changes
git add src/app.py # Stage specific file
git add -p # Interactive staging (hunk by hunk)
git commit -m "feat: add login API" # Commit with message
git commit --amend --no-edit # Add to last commit without new message
# View history and differences
git log --oneline -10 # Last 10 commits
git log --author="Veera" --since="2024-01-01"
git diff # Working dir vs staging
git diff --staged # Staging vs last commit
git show HEAD~2:src/app.py # View file from 2 commits ago
# Undoing changes
git restore src/app.py # Discard working dir change
git restore --staged src/app.py # Unstage file
git reset HEAD~1 --soft # Undo last commit, keep staged
git reset HEAD~1 --mixed # Undo last commit, keep files
git reset HEAD~1 --hard # Undo last commit, DELETE changes
git revert abc1234 # Safe undo (creates new commit)
# Stashing work-in-progress
git stash push -m "WIP: login feature"
git stash list
git stash pop # Apply and remove stash
git stash apply stash@{1} # Apply specific stash
# Finding bugs
git bisect start
git bisect bad HEAD
git bisect good v1.0.0 # Git binary-searches for bad commit
git bisect run pytest tests/
# Cherry-pick a specific commit to current branch
git cherry-pick abc1234                  # Apply one commit
git cherry-pick abc1234..def5678         # Apply range of commits

# Rewrite history (use with caution on shared branches)
git rebase -i HEAD~5                     # Interactive rebase last 5 commits
# Options: pick, squash, fixup, reword, drop, edit

# Find who changed a line
git blame src/app.py
git blame -L 50,60 src/app.py            # Lines 50-60 only

# Search across all commits
git log -S "password_hash"               # Find commits touching this string
git grep "TODO" $(git rev-list --all)    # Search ALL history
A remote in Git is a named reference to a repository hosted elsewhere: on GitHub, GitLab, Bitbucket, or your own server. The default remote after cloning is always called origin. In forked workflows, it's common to have two remotes: origin (your fork) and upstream (the original repo). Remotes allow teams to share code, collaborate, and trigger CI/CD pipelines.
Local Repository                        Remote (GitHub)
+---------------------------+           +-----------------------+
| refs/heads/main           |           | refs/heads/main       |
| refs/heads/feature/login  |           | refs/heads/develop    |
|                           |           |                       |
| refs/remotes/origin/main  | --------> | (remote tracking)     |
| refs/remotes/origin/dev   |           |                       |
+---------------------------+           +-----------------------+
        |
        | git fetch (updates refs/remotes/* without merging)
        | git pull  (fetch + merge/rebase into current branch)
        | git push  (upload local commits to remote)

# List and manage remotes
git remote -v                        # Show all remotes with URLs
git remote add origin git@github.com:user/repo.git
git remote add upstream https://github.com/original/repo.git
git remote rename origin backup
git remote remove backup

# Fetch vs Pull (important distinction!)
git fetch origin                     # Download changes, DON'T merge
git fetch --all                      # Fetch all remotes
git pull origin main                 # fetch + merge (or rebase)
git pull --rebase origin main        # fetch + rebase (cleaner history)

# Push operations
git push origin main                 # Push to remote main
git push -u origin feature/login     # Push + set upstream tracking
git push origin --delete old-branch  # Delete remote branch
git push --force-with-lease          # Safe force push (checks remote state)
git push origin v1.0.0               # Push a tag

# Sync fork with upstream
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
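The fetch-vs-pull distinction is easy to demonstrate with two local clones of the same bare "remote" (a sketch; directory names are arbitrary):

```shell
# fetch updates remote-tracking refs only; pull/merge also updates your branch
rm -rf /tmp/fp-origin.git /tmp/fp-a /tmp/fp-b
git init -q --bare /tmp/fp-origin.git
git clone -q /tmp/fp-origin.git /tmp/fp-a
cd /tmp/fp-a
echo one > notes.txt && git add notes.txt
git -c user.name=a -c user.email=a@example.com commit -qm "c1"
git push -q origin HEAD
git clone -q /tmp/fp-origin.git /tmp/fp-b      # second developer's clone
echo two >> notes.txt && git add notes.txt
git -c user.name=a -c user.email=a@example.com commit -qm "c2"
git push -q origin HEAD

cd /tmp/fp-b
BR=$(git symbolic-ref --short HEAD)
git fetch -q origin
cat notes.txt                # still only "one": fetch did NOT touch the working tree
git merge -q "origin/$BR"    # this merge step is what pull adds on top of fetch
cat notes.txt                # now contains "two" as well
```

This is why `git fetch` is always safe to run, while `git pull` can produce merge conflicts.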
A Git branch is simply a lightweight movable pointer to a commit. Creating a branch costs almost nothing (just 41 bytes: a file containing the commit hash). This makes branching the primary mechanism for parallel development. Every feature, bug fix, hotfix, or release gets its own branch, isolated from the main codebase until it's ready.
main    ------------------------------------------------  (production)
           |                                   ^
           |                            merge release
           v                                   |
develop ------------------------------------------------  (integration)
           |           |           |
        feature     feature     bugfix
        /login      /search     /cart
           |           |           |
           +-----------+-----------+   (feature merged back to develop)

hotfix  --------------------------------   (direct to main + develop)
           |
    critical fix for prod

# Create and switch branches
git branch feature/login           # Create branch
git switch feature/login           # Switch to it
git switch -c feature/signup       # Create + switch (shortcut)
git branch -d feature/login        # Delete merged branch
git branch -D feature/login        # Force delete (unmerged)

# List branches
git branch                         # Local branches
git branch -r                      # Remote branches
git branch -a                      # All branches
git branch --merged main           # Branches merged into main
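The "41 bytes" claim is checkable: a loose branch ref is a text file holding a 40-character SHA-1 plus a newline. A sketch in a scratch repo (path is arbitrary; assumes the default SHA-1 object format and loose-refs storage):

```shell
rm -rf /tmp/ptr-demo && git init -q /tmp/ptr-demo && cd /tmp/ptr-demo
echo x > f.txt && git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "c1"
git branch feature/login
cat .git/refs/heads/feature/login        # the commit hash, nothing more
wc -c < .git/refs/heads/feature/login    # 41 bytes: 40 hex chars + newline
```

Because a branch is just a pointer, switching branches never copies files; Git only rewrites the working tree to match the commit the pointer names.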
# Regular merge (creates merge commit - preserves history)
git checkout main
git merge feature/login
git merge --no-ff feature/login    # Always create merge commit

# Squash merge (squash all feature commits into one)
git merge --squash feature/login
git commit -m "feat: add login feature (squashed)"

# Rebase (rewrite history - linear, clean)
git checkout feature/login
git rebase main                    # Replay feature commits on top of main
git checkout main
git merge feature/login            # Fast-forward only
| Strategy | Branches | Best For | Release Model |
|---|---|---|---|
| GitFlow | main, develop, feature/*, release/*, hotfix/* | Scheduled releases (apps, SaaS) | Versioned releases |
| GitHub Flow | main + feature branches | Continuous delivery teams | Deploy on merge |
| Trunk-Based | main only + short-lived branches | Very high-velocity teams | Feature flags |
| GitLab Flow | main + environment branches | Teams with staging/prod envs | Environment promotion |
Apache Maven is a build automation and dependency management tool for Java-based projects. Before Maven, developers manually downloaded JAR files, configured classpaths, and wrote custom Ant build scripts. Maven introduced the concept of Convention over Configuration β if you follow the standard project structure, Maven knows exactly what to do without extensive configuration.
Maven uses a POM (Project Object Model) file, pom.xml, to define project metadata, dependencies, plugins, and build lifecycle. It resolves dependencies from Maven Central Repository (or your JFrog Artifactory), downloads them once, and caches them in ~/.m2/repository. Maven is the most common build tool in enterprise Java and is used by Spring Boot, Quarkus, and most Java microservices.
pom.xml --> Maven reads project config
                     |
+--------------------+--------------------------------+
|               Maven Build Lifecycle                 |
|                                                     |
|   validate -> compile -> test -> package -> verify  |
|                              |                      |
|                        +-----v-----+                |
|                        | .jar/.war |                |
|                        +-----+-----+                |
|                              |                      |
|   install <-----------------+------------> deploy   |
|   (local ~/.m2 cache)              (Nexus/JFrog)    |
+-----------------------------------------------------+
                     |
           External Dependencies
             +-------+-------+
             |               |
      Maven Central    JFrog Artifactory
      (public repo)    (private/internal)

<project>
<modelVersion>4.0.0</modelVersion>
<!-- Project coordinates (GAV = GroupId:ArtifactId:Version) -->
<groupId>com.veera.devops</groupId>
<artifactId>myapp</artifactId>
<version>1.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<java.version>17</java.version>
<spring.version>3.2.0</spring.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>${spring.version}</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<scope>test</scope> <!-- Only for testing -->
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
mvn validate                   # Validate project structure
mvn compile                    # Compile source code -> target/classes/
mvn test                       # Run unit tests (JUnit/TestNG)
mvn package                    # Create JAR/WAR -> target/myapp-1.0.jar
mvn install                    # Package + install to local ~/.m2 cache
mvn deploy                     # Package + upload to remote repo (JFrog)
mvn clean                      # Delete target/ directory
mvn clean package -DskipTests  # Build without running tests (CI shortcut)
mvn dependency:tree            # View full dependency tree
mvn versions:display-updates   # Check for outdated dependencies
| Scope | Compile | Test | Runtime | Use For |
|---|---|---|---|---|
| compile (default) | Yes | Yes | Yes | Core dependencies |
| test | No | Yes | No | JUnit, Mockito |
| runtime | No | Yes | Yes | JDBC drivers |
| provided | Yes | Yes | No | Servlet API (from container) |
SonarQube is a continuous code quality and security inspection platform. It performs Static Application Security Testing (SAST), analyzing source code without executing it, to detect bugs, code smells, vulnerabilities, and coverage gaps. SonarQube integrates into CI/CD pipelines to enforce quality gates that prevent bad code from reaching production.
SonarQube works by having a scanner analyze your code locally (or in CI), send results to the SonarQube Server, which stores them in a PostgreSQL database and presents dashboards and quality gate decisions. If your code fails the quality gate (e.g., coverage below 80%, new Critical vulnerabilities), the pipeline fails and deployment is blocked.
Developer pushes code
        |
        v
Jenkins / GitHub Actions
        |
        |  Step 1: Build (mvn package)
        |
        |  Step 2: Run SonarScanner
        |     mvn sonar:sonar \
        |       -Dsonar.host.url=http://sonar:9000 \
        |       -Dsonar.login=$SONAR_TOKEN
        |
        v
SonarQube Scanner sends analysis to:
+------------------------------------------------+
|            SonarQube Server :9000              |
|  +--------------+  +------------------------+  |
|  | Analysis DB  |  |  Quality Gate Engine   |  |
|  | (PostgreSQL) |  |  - Coverage >= 80%     |  |
|  |              |  |  - 0 Critical bugs     |  |
|  |              |  |  - Duplication < 3%    |  |
|  +--------------+  +-----------+------------+  |
+--------------------------------+---------------+
                                 |
                +----------------+---------------+
                |                                |
             PASSED                           FAILED
                |                                |
       Pipeline continues                 Pipeline FAILS
       -> Deploy to staging               -> Developer notified
                                          -> Fix and re-push

| Issue Type | Description | Example |
|---|---|---|
| Bug | Code that will cause incorrect behavior | NullPointerException risk, wrong operator |
| Vulnerability | Security weakness exploitable by attackers | SQL injection, XSS, hardcoded password |
| Code Smell | Maintainability issue (tech debt) | Too many parameters, duplicate code, long methods |
| Security Hotspot | Code needing manual security review | Use of MD5 hashing, HTTP vs HTTPS |
| Coverage Gap | Code lines not covered by unit tests | Exception handling not tested |
# sonar-project.properties
sonar.projectKey=my-devops-app
sonar.projectName=My DevOps App
sonar.sources=src/main/java
sonar.tests=src/test/java
sonar.java.coveragePlugin=jacoco
sonar.coverage.jacoco.xmlReportPaths=target/site/jacoco/jacoco.xml
sonar.exclusions=**/generated/**,**/test/**

# Quality Gate thresholds (configure in SonarQube UI):
#   New Code:
#     Coverage >= 80%
#     Duplicated Lines < 3%
#     Maintainability Rating = A
#     Reliability Rating = A
#     Security Rating = A
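In a pipeline you typically query the server's quality gate status after analysis and block the deploy on failure. A hedged sketch: the endpoint shape follows SonarQube's web API, but the host, project key, and the canned response below are illustrative, not from these notes.

```shell
# In CI you would fetch the real gate status, e.g. (illustrative host/key):
#   curl -s -u "$SONAR_TOKEN:" \
#     "http://sonar:9000/api/qualitygates/project_status?projectKey=my-devops-app" \
#     > /tmp/gate.json
# A canned response stands in here so the gating logic is self-contained:
echo '{"projectStatus":{"status":"ERROR"}}' > /tmp/gate.json

STATUS=$(grep -o '"status":"[A-Z]*"' /tmp/gate.json | head -1 | cut -d'"' -f4)
if [ "$STATUS" = "OK" ]; then
  echo "Quality gate passed - continue pipeline"
else
  echo "Quality gate failed ($STATUS) - block deployment"
fi
```

In a real Jenkinsfile the else-branch would `exit 1` so the stage (and therefore the deployment) fails.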
Jenkins is the world's most widely used open-source automation server for Continuous Integration and Continuous Delivery. Built in Java, it orchestrates the entire software delivery pipeline, from code commit to production deployment. Jenkins monitors version control for changes, triggers automated builds, runs tests, performs quality checks, builds Docker images, and deploys to environments.
Jenkins follows a master-agent architecture: the Controller (master) manages pipelines, jobs, scheduling, and the web UI. Agents (workers) are the machines that actually execute the build steps. This allows horizontal scaling: hundreds of concurrent builds across many agents. Agents can be physical machines, VMs, Docker containers, or Kubernetes pods.
+-----------------------------------------------------+
|            JENKINS CONTROLLER (Master)              |
|   Port: 8080 (UI)  |  Port: 50000 (agent JNLP)      |
|  +----------+ +---------+ +---------+ +---------+   |
|  | Pipeline | |   Job   | | Plugin  | |  Cred   |   |
|  |  Engine  | |  Queue  | | Manager | |  Store  |   |
|  +----------+ +---------+ +---------+ +---------+   |
+-----------------------+-----------------------------+
       SSH/JNLP/Inbound |
        +---------------+---------------------+
        |               |                     |
        v               v                     v
  +---------+     +-----------+        +-----------+
  | Agent 1 |     |  Agent 2  |        |  Agent 3  |
  |  Linux  |     |  Docker   |        |  K8s Pod  |
  |  Java   |     | Container |        | (ephemeral|
  |  Maven  |     |  Node.js  |        |   agent)  |
  +---------+     +-----------+        +-----------+
  Builds Java      Builds JS            Scales auto

# Install Jenkins on Ubuntu
sudo apt update
sudo apt install -y openjdk-17-jdk
curl -fsSL https://pkg.jenkins.io/debian/jenkins.io-2023.key | sudo tee \
  /usr/share/keyrings/jenkins-keyring.asc > /dev/null
echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] \
  https://pkg.jenkins.io/debian binary/" | sudo tee \
  /etc/apt/sources.list.d/jenkins.list > /dev/null
sudo apt update && sudo apt install -y jenkins
sudo systemctl start jenkins
sudo systemctl enable jenkins

# Get initial admin password
sudo cat /var/lib/jenkins/secrets/initialAdminPassword
// Jenkinsfile β stored in your repo root
pipeline {
agent any // Run on any available agent
environment {
APP_NAME = "myapp"
DOCKER_REGISTRY = "registry.company.com"
}
stages {
stage('Checkout') {
steps {
git branch: 'main',
url: 'https://github.com/org/myapp.git'
}
}
stage('Build') {
steps {
sh 'mvn clean package -DskipTests'
}
}
stage('Test') {
steps {
sh 'mvn test'
}
post {
always {
junit 'target/surefire-reports/*.xml' // Publish test results
}
}
}
stage('Docker Build & Push') {
steps {
withCredentials([usernamePassword(credentialsId: 'docker-creds',
usernameVariable: 'USER', passwordVariable: 'PASS')]) {
sh """
docker build -t $DOCKER_REGISTRY/$APP_NAME:${BUILD_NUMBER} .
docker login $DOCKER_REGISTRY -u $USER -p $PASS
docker push $DOCKER_REGISTRY/$APP_NAME:${BUILD_NUMBER}
"""
}
}
}
stage('Deploy to Dev') {
steps {
sh "kubectl set image deployment/$APP_NAME $APP_NAME=$DOCKER_REGISTRY/$APP_NAME:${BUILD_NUMBER}"
}
}
}
post {
        success { slackSend channel: '#devops', message: "Build #${BUILD_NUMBER} succeeded!" }
        failure { slackSend channel: '#devops', message: "Build #${BUILD_NUMBER} FAILED!" }
}
}
Jenkins offers two pipeline syntaxes. Declarative (recommended) uses a structured, opinionated syntax with built-in validation, so it's easier to read and enforce standards. Scripted pipelines use Groovy code directly, offering maximum flexibility but requiring Groovy knowledge. Both are defined in a Jenkinsfile stored in your Git repository (Pipeline as Code).
Trigger (webhook / timer / manual)
        |
        v
  +-----------+
  | Checkout  |  (SCM clone)
  +-----+-----+
        |
  +-----v-----+
  |   Build   |  (mvn package)
  +-----+-----+
        |
  +-----v------------------------------+
  |      PARALLEL: Test + Scan         |
  |  +-----------+  +---------------+  |
  |  | Unit Test |  | SonarQube Scan|  |
  |  |  (JUnit)  |  | (Quality Gate)|  |
  |  +-----------+  +---------------+  |
  +-----+------------------------------+
        | (all parallel stages must pass)
  +-----v--------+
  | Docker Build |
  |   & Push     |
  +-----+--------+
        |
  +-----v-------+     +----------------------+
  | Deploy Dev  |---->| Auto Integration Test|
  +-------------+     +----------+-----------+
                                 |
                    +------------v--------+
                    |   Manual Approval   |  (input step)
                    |  (Deploy to Prod?)  |
                    +------------+--------+
                                 | Approved
                    +------------v--------+
                    |     Deploy PROD     |
                    +---------------------+

pipeline {
agent { label 'docker-agent' } // Use labeled agents
options {
timeout(time: 30, unit: 'MINUTES') // Pipeline timeout
retry(2) // Auto-retry on failure
disableConcurrentBuilds() // One build at a time
buildDiscarder(logRotator(numToKeepStr: '10'))
}
triggers {
pollSCM('H/5 * * * *') // Poll Git every 5 mins
cron('0 2 * * 1-5') // Nightly build weekdays at 2am
}
parameters {
choice(name: 'ENVIRONMENT', choices: ['dev','staging','prod'])
booleanParam(name: 'RUN_TESTS', defaultValue: true)
string(name: 'IMAGE_TAG', defaultValue: 'latest')
}
stages {
stage('Parallel Tests') {
parallel {
stage('Unit Tests') { steps { sh 'mvn test' } }
stage('Integration Tests') { steps { sh 'mvn verify -P integration' } }
stage('Security Scan') { steps { sh 'trivy image myapp:latest' } }
}
}
stage('Deploy to Prod') {
when {
branch 'main'
environment name: 'ENVIRONMENT', value: 'prod'
}
steps {
input message: 'Deploy to Production?', ok: 'Deploy Now',
submitter: 'devops-leads'
sh './deploy-prod.sh'
}
}
}
}
// vars/deployToK8s.groovy (in shared-library repo)
def call(String appName, String image, String namespace) {
sh """
kubectl set image deployment/${appName} ${appName}=${image} -n ${namespace}
kubectl rollout status deployment/${appName} -n ${namespace} --timeout=3m
"""
}
// Usage in any Jenkinsfile:
@Library('my-shared-library@main') _
pipeline {
stages {
stage('Deploy') {
steps {
deployToK8s('myapp', "registry/myapp:${BUILD_NUMBER}", 'production')
}
}
}
}
Jenkins security is critical because it has access to your source code, credentials, Kubernetes clusters, and production environments. A compromised Jenkins server is a full supply chain attack vector. Jenkins security involves Authentication (who are you?), Authorization (what can you do?), and Credential Management (how do we store secrets safely?).
Request to Jenkins
β
v
βββββββββββββββββββββββββββββββββββββββββββ
β Authentication Layer β
β ββββββββββββββ ββββββββββββββββββββββ β
β β Jenkins DB β β LDAP / Active Dir β β
β β (local) β β SSO / SAML / OIDC β β
β ββββββββββββββ ββββββββββββββββββββββ β
βββββββββββββββββββ¬ββββββββββββββββββββββββ
β Authenticated user
βββββββββββββββββββΌββββββββββββββββββββββββ
β Authorization Layer β
β Role-Based Access Control (RBAC) β
β ββββββββββββββ ββββββββββββ βββββββββ β
β β Admin β βDeveloper β βViewer β β
β β All access β βBuild/Run β βRead β β
β ββββββββββββββ ββββββββββββ βββββββββ β
βββββββββββββββββββ¬ββββββββββββββββββββββββ
β
βββββββββββββββββββΌββββββββββββββββββββββββ
β Credentials Store (Encrypted) β
β SSH Keys | API Tokens | Passwords β
β Docker Registry | K8s certs | Vault β
βββββββββββββββββββββββββββββββββββββββββββ

| Plugin | Purpose |
|---|---|
| Pipeline | Declarative and scripted pipeline support |
| Git / GitHub | Source code integration and webhooks |
| Docker Pipeline | Build and push Docker images in pipelines |
| Kubernetes | Dynamic K8s pod agents |
| SonarQube Scanner | Code quality gate integration |
| Slack Notification | Build notifications to Slack |
| Role Strategy | Fine-grained RBAC for Jenkins |
| Credentials Binding | Inject secrets safely into pipelines |
| Blue Ocean | Modern pipeline visualization UI |
| JUnit / TestNG | Test result publishing |
| HashiCorp Vault | Dynamic secret injection from Vault |
// WRONG - Never hardcode credentials
sh 'docker login -u admin -p password123 registry.io'
// CORRECT - Use Jenkins credentials store
withCredentials([
usernamePassword(credentialsId: 'docker-registry',
usernameVariable: 'DOCKER_USER',
passwordVariable: 'DOCKER_PASS'),
string(credentialsId: 'sonar-token', variable: 'SONAR_TOKEN'),
sshUserPrivateKey(credentialsId: 'deploy-key', keyFileVariable: 'SSH_KEY')
]) {
sh 'docker login -u $DOCKER_USER -p $DOCKER_PASS registry.io'
sh 'ssh -i $SSH_KEY ubuntu@server "sudo systemctl restart app"'  // Single quotes: let the shell, not Groovy, expand the secret
}
Jenkins' power comes from its 1800+ plugins that integrate it with virtually every tool in the DevOps ecosystem. A complete pipeline connects Git (source), Maven/Gradle (build), SonarQube (quality), JFrog Artifactory (artifacts), Docker (containers), Kubernetes (deployment), and Slack/PagerDuty (notifications), all orchestrated through Jenkins.
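A minimal sketch of how several of these plugins surface as pipeline steps, assuming a SonarQube server registered in Jenkins under the name 'sonar' and the Slack plugin already connected to a workspace (the server name and channel are placeholders):

```groovy
pipeline {
    agent any
    stages {
        stage('Quality Gate') {
            steps {
                withSonarQubeEnv('sonar') {          // SonarQube Scanner plugin injects server URL/token
                    sh 'mvn sonar:sonar'
                }
                waitForQualityGate abortPipeline: true   // Fail the build if the gate fails
            }
        }
    }
    post {
        always {
            junit 'target/surefire-reports/*.xml'    // JUnit plugin publishes test results
        }
        failure {
            slackSend channel: '#builds',            // Slack Notification plugin
                      message: "Build ${env.BUILD_NUMBER} failed: ${env.BUILD_URL}"
        }
    }
}
```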
βββββββββββββββ
β GitHub βββββ webhook βββββΆ
β GitLab β β
βββββββββββββββ v
ββββββββββββββββ
βββββββββββββββ β JENKINS β
β SonarQube ββββ sonar:sonar ββββββββββββββββββ PIPELINE β
βββββββββββββββ β β
β ββββΆ Maven Build
βββββββββββββββ β ββββΆ Docker Build
βJFrog/Nexus ββββ mvn deploy ββββββββββββββββββ ββββΆ K8s Deploy
βββββββββββββββ β ββββΆ Test Report
ββββββββ¬ββββββββ
βββββββββββββββ β
β Slack ββββ notification βββββββββββββββββββββββ
β PagerDuty β
βββββββββββββββ

A Multibranch Pipeline automatically discovers all branches in a repository that contain a Jenkinsfile and creates a separate pipeline for each. This means every feature branch gets its own CI pipeline automatically, with no manual job creation required. Pull Requests also get their own pipeline for validation before merge.
// Each branch has its own Jenkinsfile (or shared one)
// Jenkins auto-discovers:
// main β triggers on every push
// feature/loginβ triggers on every push to this branch
// PR #42 β triggers on PR open/update
// Environment-specific logic using branch name
stage('Deploy') {
steps {
script {
if (env.BRANCH_NAME == 'main') {
sh './deploy.sh production'
} else if (env.BRANCH_NAME.startsWith('feature/')) {
sh "./deploy.sh dev-${env.BRANCH_NAME.replaceAll('/', '-')}"
}
}
}
}
GitLab CI/CD is the built-in, native CI/CD system in GitLab. Unlike Jenkins (an external tool), GitLab CI is deeply integrated into the platform: code, issues, merge requests, and pipelines all live in one place. Pipelines are defined in a .gitlab-ci.yml file at the root of your repository. GitLab Runners execute the jobs; they can be shared (GitLab.com provides free shared runners) or self-hosted.
git push to GitLab
β
β triggers
v
.gitlab-ci.yml parsed
β
βββββββββΌβββββββββββββββββββββββββββββββββββββββββ
β PIPELINE β
β Stage 1: build Stage 2: test Stage 3: deploy β
β βββββββββββ ββββββββββββ βββββββββββββββ β
β β compile β βunit-test β β deploy-dev β β
β β docker β βlint β β deploy-prod β β
β βββββββββββ βsonarqube β β (manual) β β
β ββββββββββββ βββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
GitLab Runners (execute jobs)
βββββββββββββββββββββββββββββββββββ
β Shell Runner | Docker Runner β
β K8s Runner | Shared Runners β
βββββββββββββββββββββββββββββββββββ

image: maven:3.9-eclipse-temurin-17   # Default Docker image for all jobs
variables:
MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"
IMAGE_NAME: "registry.gitlab.com/$CI_PROJECT_PATH"
cache:
key: "$CI_COMMIT_REF_SLUG"
paths:
- .m2/repository # Cache Maven deps between jobs
stages:
- build
- test
- security
- package
- deploy
build:
stage: build
script:
- mvn compile
artifacts:
paths:
- target/
unit-tests:
stage: test
script:
- mvn test
coverage: '/Total.*?([0-9]{1,3})%/'
artifacts:
reports:
junit: target/surefire-reports/TEST-*.xml
sonarqube:
stage: security
script:
- mvn sonar:sonar -Dsonar.host.url=$SONAR_URL -Dsonar.login=$SONAR_TOKEN
allow_failure: false
docker-build:
stage: package
image: docker:24
services:
- docker:dind
script:
- docker build -t $IMAGE_NAME:$CI_COMMIT_SHA .
- docker push $IMAGE_NAME:$CI_COMMIT_SHA
deploy-production:
stage: deploy
environment:
name: production
url: https://myapp.example.com
when: manual # Requires manual click
only:
- main
script:
- kubectl set image deployment/myapp myapp=$IMAGE_NAME:$CI_COMMIT_SHA
GitHub Actions is GitHub's native CI/CD and automation platform. Workflows are defined as YAML files in .github/workflows/ and are triggered by GitHub events: pushes, pull requests, issues, releases, schedules, or manual dispatch. GitHub provides free hosted runners (Ubuntu, Windows, macOS), and you can bring your own self-hosted runners.
The key innovation of GitHub Actions is its marketplace of reusable Actions: over 20,000 community-built actions for common tasks (checkout, Docker build, deploy to AWS, send Slack messages, etc.). Instead of writing shell scripts for everything, you compose workflows from pre-built actions.
Event: push to main / PR opened / cron schedule
β
v
.github/workflows/ci.yml
β
βββββββββββββΌβββββββββββββββββββββββββββββββββββββββββ
β WORKFLOW β
β Job 1: build (runs-on: ubuntu-latest) β
β Step 1: actions/checkout@v4 β
β Step 2: actions/setup-java@v4 β
β Step 3: mvn package β
β Step 4: actions/upload-artifact@v4 β
β β
β Job 2: test (needs: build) β
β Step 1: actions/checkout@v4 β
β Step 2: actions/download-artifact@v4 β
β Step 3: mvn test β
β β
β Job 3: deploy (needs: test, if: main branch) β
β Step 1: aws-actions/configure-aws-credentials β
β Step 2: kubectl deploy β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
GitHub-hosted Runner (ubuntu-latest)
or Self-hosted Runner (your EC2 / K8s)

name: CI/CD Pipeline
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
workflow_dispatch: # Allow manual trigger
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up JDK 17
uses: actions/setup-java@v4
with:
java-version: '17'
distribution: 'temurin'
cache: maven
- name: Build and Test
run: mvn clean verify
- name: Publish Test Results
uses: dorny/test-reporter@v1
if: always()
with:
name: JUnit Tests
path: target/surefire-reports/*.xml
reporter: java-junit
docker-build-push:
needs: build-and-test
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
deploy:
needs: docker-build-push
runs-on: ubuntu-latest
environment: production # Requires environment approval
steps:
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/myapp \
myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
Docker is a containerization platform that packages an application and all its dependencies (runtime, libraries, config) into a portable, isolated unit called a container. Containers solve the classic "it works on my machine" problem: a Docker container runs identically on a developer laptop, a CI server, or a production Kubernetes cluster.
Containers use Linux kernel namespaces (for isolation) and cgroups (for resource limits); they share the host OS kernel but are isolated at the process, network, and filesystem level. Unlike VMs, containers don't need a guest OS, so they start in milliseconds and use megabytes instead of gigabytes of memory.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β HOST MACHINE β
β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β βContainer1β βContainer2β βContainer3β β
β β App: API β β App: DB β β App: Web β β
β β Port:8080β β Port:5432β β Port:80 β β
β β /app/... β β /var/... β β /www/... β β
β ββββββ¬ββββββ ββββββ¬ββββββ ββββββ¬ββββββ β
β β β β β
β ββββββΌβββββββββββββββΌβββββββββββββββΌβββββββββββββββ β
β β Docker Engine (dockerd) β β
β β containerd β runc β Linux Kernel namespaces β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β Linux Kernel β
β (namespaces + cgroups) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Dockerfile βββΆ docker build βββΆ Image βββΆ docker push βββΆ Registry
ββββΆ docker run βββΆ Container

# Multi-stage build: small, secure final image
# Stage 1: Build
FROM maven:3.9-eclipse-temurin-17-alpine AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline -q          # Cache deps in a separate layer
COPY src ./src
RUN mvn clean package -DskipTests

# Stage 2: Runtime (minimal image, no build tools)
FROM eclipse-temurin:17-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup   # Non-root user
WORKDIR /app
COPY --from=builder /app/target/myapp.jar ./app.jar
RUN chown -R appuser:appgroup /app
USER appuser                              # Run as non-root (security!)
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "-jar", "-Xmx512m", "app.jar"]
# Build and run
docker build -t myapp:1.0 .
docker build -t myapp:1.0 --build-arg ENV=prod .
docker run -d -p 8080:8080 --name myapp myapp:1.0
docker run -d -p 8080:8080 -e DB_HOST=localhost -v /data:/app/data myapp:1.0

# Manage containers
docker ps                                # Running containers
docker ps -a                             # All containers (including stopped)
docker logs -f myapp                     # Stream logs
docker exec -it myapp /bin/sh            # Shell inside container
docker stats                             # Real-time resource usage
docker stop myapp && docker rm myapp

# Registry operations
docker login registry.io
docker tag myapp:1.0 registry.io/team/myapp:1.0
docker push registry.io/team/myapp:1.0
docker pull registry.io/team/myapp:1.0

# Cleanup
docker system prune -af                  # Remove all unused images/containers
version: '3.8'
services:
app:
build: .
ports: ["8080:8080"]
environment:
DB_HOST: postgres
REDIS_URL: redis://redis:6379
depends_on:
postgres:
condition: service_healthy
restart: unless-stopped
postgres:
image: postgres:16-alpine
environment:
POSTGRES_DB: myapp
POSTGRES_USER: admin
POSTGRES_PASSWORD_FILE: /run/secrets/db_pass
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U admin"]
interval: 10s
redis:
image: redis:7-alpine
command: redis-server --maxmemory 256mb
volumes:
pgdata:
| Aspect | Virtual Machine | Container | Serverless |
|---|---|---|---|
| Startup Time | Minutes | Seconds | Milliseconds |
| Size | GBs (full OS) | MBs (app only) | N/A (managed) |
| Isolation | Full hardware | Kernel-level | Full (per invocation) |
| Portability | Medium | Very High | Low (vendor-specific) |
| Best For | Legacy apps, isolation | Microservices, CI | Event-driven, bursty |
An artifact is any file produced by a build process: JAR files, Docker images, npm packages, Helm charts, Python wheels, RPMs, etc. After building, these artifacts need to be stored, versioned, scanned, and distributed. JFrog Artifactory is the industry-leading universal artifact repository manager that acts as a single source of truth for all your build outputs.
Without a proper artifact repository, teams download dependencies directly from the internet (unreliable, slow, security risk) and store build outputs in Jenkins workspaces (not versioned, not audited, lost on cleanup). Artifactory provides proxying (cache Maven Central, Docker Hub), hosting (store your own artifacts), and virtual repos (unified view of multiple repos).
Internet (Maven Central, Docker Hub, npm registry)
β
β proxy + cache (once only)
v
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β JFrog Artifactory β
β β
β Virtual Repos (unified access point) β
β ββββββββββββββββ ββββββββββββ βββββββββββββ β
β β libs-release β β docker- β β helm- β β
β β libs-snapshotβ β local β β local β β
β β libs-proxy β β docker- β β helm- β β
β β (Maven repos)β β proxy β β proxy β β
β ββββββββββββββββ ββββββββββββ βββββββββββββ β
β β
β Xray: Security scanning of all stored artifacts β
βββββββββββββββββββ¬βββββββββββββββββββββββββββββββββ
β
βββββββββββββββ΄ββββββββββββββ
β β
v v
Maven build Docker build
(mvn deploy) (docker push)
stores .jar          stores image layers

# settings.xml (~/.m2/settings.xml)
<settings>
<servers>
<server>
<id>artifactory</id>
<username>${env.ARTIFACTORY_USER}</username>   <!-- env. prefix reads environment variables -->
<password>${env.ARTIFACTORY_TOKEN}</password>
</server>
</servers>
<mirrors>
<mirror>
<id>artifactory</id>
<mirrorOf>*</mirrorOf> <!-- Route ALL downloads through Artifactory -->
<url>https://artifactory.company.com/artifactory/libs-virtual</url>
</mirror>
</mirrors>
</settings>
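The settings.xml above routes dependency downloads through Artifactory; publishing your own artifacts (the mvn deploy arrow in the diagram) also requires a distributionManagement block in the project's pom.xml. A sketch, where the repository names and URL are assumptions matching the repos shown in the diagram:

```xml
<!-- pom.xml: where `mvn deploy` uploads build outputs (repo names/URL are placeholders) -->
<distributionManagement>
  <repository>
    <id>artifactory</id>   <!-- Must match the <server> id in settings.xml for credentials -->
    <url>https://artifactory.company.com/artifactory/libs-release-local</url>
  </repository>
  <snapshotRepository>
    <id>artifactory</id>
    <url>https://artifactory.company.com/artifactory/libs-snapshot-local</url>
  </snapshotRepository>
</distributionManagement>
```

Maven picks the snapshotRepository automatically when the project version ends in -SNAPSHOT.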
Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google (inspired by their internal Borg system) and donated to the CNCF in 2015. It automates the deployment, scaling, load balancing, healing, and management of containerized applications. If Docker is about running a single container, Kubernetes is about running hundreds or thousands of containers reliably across a cluster of machines.
The core problem Kubernetes solves: when you run containers at scale, you need to answer questions like: Which server should run this container? What happens if a server dies? How do I update 100 containers without downtime? How do I scale from 3 to 30 containers during traffic spikes? Kubernetes answers all of these automatically.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CONTROL PLANE (Master) β
β ββββββββββββββββ ββββββββββββ βββββββββββββ ββββββββββββ β
β β API Server β βScheduler β βController β β etcd β β
β β (kube-api) β β(bin pack)β β Manager β β (state β β
β β REST gateway β βbest node β β(reconcile)β β store) β β
β ββββββββββββββββ ββββββββββββ βββββββββββββ ββββββββββββ β
ββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ
β kubectl / CI/CD deploys here
ββββββββββββββββββββΌββββββββββββββββββββ
β β β
v v v
ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ
β WORKER NODE 1 β β WORKER NODE 2 β β WORKER NODE 3 β
β ββββββββββββ β β ββββββββββββ β β ββββββββββββ β
β β kubelet β β β β kubelet β β β β kubelet β β
β β kube- β β β β kube- β β β β kube- β β
β β proxy β β β β proxy β β β β proxy β β
β β (pods) β β β β (pods) β β β β (pods) β β
β ββββββββββββ β β ββββββββββββ β β ββββββββββββ β
ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ

| Object | Purpose | Analogy |
|---|---|---|
| Pod | Smallest unit β 1+ containers sharing network/storage | A single running process |
| Deployment | Manages ReplicaSets, rolling updates, rollbacks | A job description for pods |
| Service | Stable network endpoint for pods (load balancer) | A phone number for pods |
| Ingress | HTTP/HTTPS routing with hostname/path rules | A receptionist routing calls |
| ConfigMap | Non-sensitive configuration (env vars, files) | A config file |
| Secret | Sensitive data (passwords, tokens), base64-encoded (not encrypted) | A safe |
| PersistentVolume | Storage that outlives pods | An external hard drive |
| Namespace | Logical isolation within a cluster | A separate folder |
# Cluster info
kubectl cluster-info
kubectl get nodes
kubectl get all -n production

# Deployments
kubectl apply -f deployment.yaml
kubectl get deployments
kubectl get pods -o wide
kubectl describe pod myapp-abc-123
kubectl logs myapp-abc-123 -f               # Stream logs
kubectl exec -it myapp-abc-123 -- /bin/sh   # Shell into pod

# Scaling
kubectl scale deployment myapp --replicas=5
kubectl autoscale deployment myapp --min=2 --max=10 --cpu-percent=60

# Updates and rollbacks
kubectl set image deployment/myapp myapp=myapp:2.0
kubectl rollout status deployment/myapp
kubectl rollout history deployment/myapp
kubectl rollout undo deployment/myapp       # Rollback to previous version
A Kubernetes manifest is a YAML file that declares the desired state of your application. Kubernetes continuously reconciles the actual state with the desired state: if a pod crashes, it is automatically restarted; if a node dies, its pods are rescheduled to healthy nodes. This declarative, self-healing model is what makes Kubernetes so powerful.
Internet
β HTTPS
v
βββββββββββββββββββββββββββββββββββββββββββ
β Ingress (nginx-ingress-controller) β
β api.myapp.com β service:myapp-svc:8080 β
βββββββββββββββββββββ¬ββββββββββββββββββββββ
β
βββββββββββββββββββββΌββββββββββββββββββββββ
β Service (ClusterIP/LoadBalancer) β
β Selector: app=myapp β
β Port: 8080 β targetPort: 8080 β
β Load balances across all matching pods β
βββββββββ¬ββββββββββββββββββββ¬ββββββββββββββ
β β
ββββββββΌβββββββ βββββββββΌβββββββ
β Pod (app) β β Pod (app) β β Deployment manages these
β app:myapp β β app:myapp β
β v2.0.1 β β v2.0.1 β
βββββββββββββββ ββββββββββββββββ

apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: production
labels:
app: myapp
version: v2.0.1
spec:
replicas: 3
selector:
matchLabels:
app: myapp
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Allow 1 extra pod during update
maxUnavailable: 0 # Never reduce below 3 pods (zero-downtime)
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: registry.io/myapp:v2.0.1
ports:
- containerPort: 8080
env:
- name: DB_HOST
valueFrom:
configMapKeyRef:
name: myapp-config
key: db-host
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: myapp-secrets
key: db-password
resources:
requests: # Guaranteed resources
memory: "256Mi"
cpu: "250m"
limits: # Maximum resources
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: 8080
initialDelaySeconds: 15
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: myapp-svc
namespace: production
spec:
selector:
app: myapp
ports:
- port: 8080
targetPort: 8080
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myapp-ingress
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
tls:
- hosts: [api.myapp.com]
secretName: myapp-tls
rules:
- host: api.myapp.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: myapp-svc
port:
number: 8080
Beyond basic deployments, production Kubernetes requires understanding of Horizontal Pod Autoscaling (HPA), Pod Disruption Budgets (PDB), RBAC, Network Policies, and Helm for package management. These features separate a test cluster from a production-grade, secure, auto-scaling cluster.
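Pod Disruption Budgets and Network Policies are named above but have no example elsewhere in these notes; minimal sketches follow (the labels reuse the app: myapp convention from the earlier manifests, and the ingress-nginx namespace label is an assumption):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2            # Keep at least 2 pods up during node drains/upgrades
  selector:
    matchLabels:
      app: myapp
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: myapp-allow-ingress-only
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes: [Ingress]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx   # Only traffic from the ingress namespace
    ports:
    - protocol: TCP
      port: 8080
```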
Traffic Spike
β
v
Metrics Server collects CPU/Memory from nodes
β
v
HPA Controller checks metrics every 15s
βββββββββββββββββββββββββββββββββββββββββ
β target CPU = 60% β
β current CPU = 85% β
β current replicas = 3 β
β desired = ceil(3 × 85/60) = 5 pods β
βββββββββββββββββββββ¬ββββββββββββββββββββ
β scale up
v
Deployment: 3 pods β 5 pods (new pods scheduled on nodes)
β
Traffic drops: β
v
HPA scales down 5 β 3 (respects stabilization window)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: myapp-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60 # Scale up if avg CPU > 60%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 70
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5min before scaling down
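The scaling arithmetic shown in the HPA diagram can be sketched in a few lines of Python (the real controller also applies a tolerance band and the stabilization windows, which are ignored here):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA rule: desired = ceil(current * (currentMetric / targetMetric))."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 3 pods at 85% average CPU against a 60% target: scale up to 5
print(desired_replicas(3, 85, 60))   # 5
# Load drops to 30% average CPU: scale down toward 2 (bounded below by minReplicas)
print(desired_replicas(3, 30, 60))   # 2
```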
# Helm is like apt/yum for Kubernetes
helm repo add bitnami https://charts.bitnami.com/bitnami   # (the old 'stable' charts repo is deprecated)
helm install my-postgres bitnami/postgresql \
  --set auth.postgresPassword=secret \
  --set primary.persistence.size=10Gi
helm upgrade myapp ./myapp-chart --set image.tag=v2.0.1
helm rollback myapp 1    # Rollback to revision 1
helm list                # List installed releases
helm history myapp       # Show revision history
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: developer-role
rules:
- apiGroups: ["", "apps"]           # "" is the core API group (pods); "apps" covers deployments
  resources: ["deployments", "pods"]
  verbs: ["get", "list", "watch"]   # Read-only for developers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: production
subjects:
- kind: User
  name: john.doe@company.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-role
  apiGroup: rbac.authorization.k8s.io
GitOps is a deployment methodology where Git is the single source of truth for both application code and infrastructure configuration. Instead of running kubectl apply from a CI pipeline (push model), a GitOps agent like ArgoCD runs inside the cluster, continuously watches a Git repository, and automatically syncs any changes to the cluster (pull model). If someone manually changes the cluster, ArgoCD detects the drift and reverts it.
ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes. It monitors Git repos for changes to Kubernetes manifests (plain YAML, Helm charts, Kustomize) and ensures the cluster always matches what's in Git. This provides complete auditability: every change is a Git commit with an author, a timestamp, and a reason.
TRADITIONAL (Push Model - Jenkins):
Developer β git push β Jenkins pipeline β kubectl apply β K8s Cluster
(CI has admin credentials to cluster β security risk)
GITOPS (Pull Model - ArgoCD):
Developer β git push β Git Repo (config changes)
β
ArgoCD watches β (in-cluster, no external access needed)
β
βββββββββββΌβββββββββββββββββββββββ
β ArgoCD (in K8s) β
β ββββββββββββββββββββββββββββββ β
β β Application Controller β β
β β - Compares Git vs Cluster β β
β β - Detects drift β β
β β - Auto-syncs on change β β
β ββββββββββββββββββββββββββββββ β
ββββββββββββββ¬ββββββββββββββββββββ
β sync
v
K8s Cluster
(always matches Git state)

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp-production
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/company/k8s-configs.git
targetRevision: main
path: apps/production/myapp # Folder with K8s YAML or Helm chart
helm:
valueFiles:
- values-production.yaml
destination:
server: https://kubernetes.default.svc # Target cluster
namespace: production
syncPolicy:
automated:
prune: true # Delete resources removed from Git
selfHeal: true # Auto-revert manual changes
syncOptions:
- CreateNamespace=true
retry:
limit: 3
backoff:
duration: 5s
maxDuration: 1m
# CI Pipeline (GitHub Actions / Jenkins):
#   1. Developer pushes app code
#   2. CI builds, tests, scans Docker image
#   3. CI pushes image to registry (ghcr.io/team/myapp:abc123)
#   4. CI opens a PR to the config repo:
#        Update image tag in k8s-configs/apps/production/values.yaml
#        from: image.tag: v1.0.0
#        to:   image.tag: abc123

# GitOps Pipeline (ArgoCD):
#   5. PR reviewed and merged to config repo
#   6. ArgoCD detects change in config repo
#   7. ArgoCD syncs new image tag to cluster
#   8. Kubernetes performs rolling update
#   9. ArgoCD reports sync status (Synced / Healthy)
Observability means understanding the internal state of a system by examining its outputs. Modern systems require three pillars of observability: Metrics (numerical time-series data such as CPU, request rate, and error rate), Logs (textual event records), and Traces (request journeys across microservices). The standard DevOps observability stack combines Prometheus (metrics collection) + Grafana (visualization + alerting) + ELK/Loki (logs).
Applications + Kubernetes Nodes
β
ββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββ
β DATA COLLECTION LAYER β
β βββββββββββββββ βββββββββββββ βββββββββββββββ β
β β Prometheus β β Loki β β Jaeger β β
β β (metrics) β β (logs) β β (traces) β β
β β scrapes β β receives β β receives β β
β β /metrics β β log push β β spans β β
β ββββββββ¬βββββββ βββββββ¬ββββββ ββββββββ¬βββββββ β
βββββββββββΌββββββββββββββββΌββββββββββββββββΌβββββββββββ
β β β
βββββββββββββββββ΄ββββββββββββββββ
β
Data Sources
β
βββββββββββΌββββββββββ
β GRAFANA β
β Dashboards β
β Alerting Rules β
β Explore (adhoc) β
βββββββββββ¬ββββββββββ
β alerts
βββββββββββΌββββββββββ
β Alert Manager β
β β Slack β
β β PagerDuty β
β β Email β
βββββββββββββββββββββ

# Prometheus scrape config (prometheus.yml)
scrape_configs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
# Key PromQL queries for DevOps dashboards:
# HTTP Request Rate (requests/second)
rate(http_requests_total[5m])
# Error Rate (%)
rate(http_requests_total{status=~"5.."}[5m])
/ rate(http_requests_total[5m]) * 100
# 95th Percentile Latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# CPU Usage per Pod
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
# Memory Usage
container_memory_working_set_bytes{container!=""} / 1024 / 1024
# Alert Rule Example (in Grafana UI or as YAML):
groups:
- name: application
rules:
- alert: HighErrorRate
expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
for: 2m # Must be true for 2 minutes before firing
labels:
severity: critical
annotations:
summary: "High error rate on {{ $labels.service }}"
description: "Error rate is {{ $value | humanizePercentage }}"
runbook: "https://wiki/runbooks/high-error-rate"
- alert: PodCrashLooping
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
for: 0m
labels:
severity: warning
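To ground the PromQL above, here is roughly what rate() and the HighErrorRate expression compute, as a simplified Python sketch (real rate() also handles counter resets and extrapolates across the window, which is ignored here):

```python
def per_second_rate(first_sample: float, last_sample: float, window_seconds: float) -> float:
    """Roughly what PromQL's rate() computes over a window: counter delta / elapsed time."""
    return (last_sample - first_sample) / window_seconds

# Over a 5-minute (300s) window: 12000 total requests, 900 of them returned 5xx
total_rps = per_second_rate(0, 12000, 300)   # 40.0 req/s
error_rps = per_second_rate(0, 900, 300)     # 3.0 req/s
error_ratio = error_rps / total_rps          # 0.075 = 7.5% error rate
print(error_ratio > 0.05)                    # True: HighErrorRate fires after 'for: 2m'
```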
Ansible is an agentless, open-source configuration management and automation tool. Unlike Chef or Puppet, which require agents installed on managed nodes, Ansible uses SSH (or WinRM for Windows) to connect to target machines and execute tasks defined in human-readable YAML files called Playbooks. This agentless approach makes Ansible easy to adopt: just install Python on the target and you're ready.
Ansible follows a push-based model: the Ansible control node (your machine or a CI server) pushes configuration to managed nodes. It is idempotent: running the same playbook 10 times produces the same result as running it once (it won't install Nginx twice if it's already installed).
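A small illustration of the idempotency contract: state-based modules are naturally idempotent, while raw command/shell tasks always report "changed" unless guarded (the key path here is a placeholder):

```yaml
# Idempotent: the apt module checks current state first;
# a second run reports "ok", not "changed"
- name: Ensure nginx is installed
  apt:
    name: nginx
    state: present

# Not idempotent by default: command/shell always execute;
# guard them with creates: (or changed_when:)
- name: Generate TLS key only once
  command: openssl genrsa -out /etc/ssl/private/app.key 2048
  args:
    creates: /etc/ssl/private/app.key   # Skip the task if this file already exists
```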
CONTROL NODE (your machine / Jenkins)
βββββββββββββββββββββββββββββββββββββββββββββ
β Ansible Engine β
β ββββββββββββ ββββββββββββ ββββββββββββ β
β βInventory β βPlaybooks β β Modules β β
β β(who) β β(what) β β(how) β β
β ββββββββββββ ββββββββββββ ββββββββββββ β
ββββββββββββββββββββββββ¬βββββββββββββββββββββ
β
SSH (port 22) β no agents needed!
ββββββββββββββββΌβββββββββββββββ
β β β
v v v
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β Web Server β β App Server β β DB Server β
β Ubuntu 22 β β Ubuntu 22 β β Ubuntu 22 β
β Python only β β Python onlyβ β Python onlyβ
β (no agent) β β (no agent) β β (no agent) β
βββββββββββββββ βββββββββββββββ βββββββββββββββ
Ansible copies module code to target via SSH,
executes it, retrieves results, deletes temp files.

| Feature | Ansible | Chef | Puppet | SaltStack |
|---|---|---|---|---|
| Agent Required | β No | β Yes | β Yes | Optional |
| Language | YAML | Ruby DSL | Puppet DSL | YAML/Python |
| Push/Pull | Push | Pull | Pull | Both |
| Learning Curve | Low | High | High | Medium |
| Best For | Ad-hoc, apps, cloud | Complex config | Enterprise config | Large scale |
The inventory tells Ansible which hosts to manage and how to connect to them. Hosts can be grouped logically (web servers, DB servers, production, staging). Inventory can be static (an INI or YAML file) or dynamic (a script that queries AWS/Azure/GCP for running instances in real-time). Dynamic inventories are essential for cloud environments where servers come and go.
inventory/
βββ hosts.yml (static inventory)
βββ aws_ec2.yml (dynamic inventory plugin)
βββ group_vars/
βββ all.yml (vars for ALL hosts)
βββ webservers.yml (vars for webservers group)
βββ production.yml (vars for production group)

# inventory/hosts.yml
all:
children:
webservers:
hosts:
web1.prod.com:
ansible_user: ubuntu
ansible_ssh_private_key_file: ~/.ssh/prod.pem
web2.prod.com:
ansible_user: ubuntu
dbservers:
hosts:
db1.prod.com:
ansible_user: ubuntu
ansible_port: 2222 # Custom SSH port
staging:
children:
web_staging:
hosts:
staging-web.company.com:
db_staging:
hosts:
staging-db.company.com:
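A sketch of the aws_ec2.yml dynamic inventory file mentioned above, assuming the amazon.aws collection and boto3 are installed and that instances carry Environment and Role tags:

```yaml
# inventory/aws_ec2.yml (requires the amazon.aws collection + boto3)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  instance-state-name: running
  tag:Environment: production        # Only inventory prod instances
keyed_groups:
  - key: tags.Role                   # e.g. Role=web instances land in group "role_web"
    prefix: role
compose:
  ansible_host: public_ip_address    # Connect via public IP
```

Run with `ansible-inventory -i inventory/aws_ec2.yml --graph` to see the generated groups; servers that are terminated simply disappear from the inventory on the next run.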
# Format: ansible [pattern] -m [module] -a [arguments]

# Test connectivity
ansible all -m ping
ansible webservers -m ping

# Run shell commands
ansible webservers -m shell -a "df -h"
ansible dbservers -m shell -a "systemctl status postgresql"

# Copy files
ansible webservers -m copy -a "src=./nginx.conf dest=/etc/nginx/nginx.conf"

# Install packages
ansible webservers -m apt -a "name=nginx state=present" --become

# Manage services
ansible webservers -m service -a "name=nginx state=restarted" --become

# Gather facts (info about target)
ansible web1.prod.com -m setup | grep ansible_distribution
An Ansible Playbook is a YAML file that defines a set of ordered Plays. Each Play maps a set of Tasks to a group of hosts. Tasks call Modules: over 3,000 built-in modules exist for managing packages, files, services, cloud resources, databases, users, and more. Playbooks are idempotent, version-controlled, and the primary way to automate complex multi-step configurations.
ansible-playbook deploy-app.yml
β
v
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PLAY 1: Configure Web Servers β
β hosts: webservers β
β βββββββββββββββββββββββββββββββββββββββββββββ β
β β Task 1: Update apt cache β β
β β Task 2: Install nginx β β
β β Task 3: Copy nginx config (template) β β
β β Task 4: Notify handler: restart nginx β β
β βββββββββββββββββββββββββββββββββββββββββββββ β
β HANDLERS (run only if notified): β
β β β restart nginx β
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PLAY 2: Deploy Application β
β hosts: webservers β
β βββββββββββββββββββββββββββββββββββββββββββββ β
β β Task 1: Pull Docker image β β
β β Task 2: Stop old container β β
β β Task 3: Start new container β β
β β Task 4: Health check β β
β βββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββ

---
- name: Configure and Deploy Web Application
  hosts: webservers
  become: yes                      # Run tasks as root (sudo)
  vars:
    app_name: myapp
    app_port: 8080
    app_version: "{{ lookup('env', 'APP_VERSION') | default('latest') }}"
  pre_tasks:
    - name: Update package cache
      apt:
        update_cache: yes
        cache_valid_time: 3600     # Only update if cache is > 1hr old
  tasks:
    - name: Install required packages
      apt:
        name:
          - nginx
          - docker.io
          - curl
        state: present
    - name: Deploy nginx config from template
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/sites-available/{{ app_name }}
        owner: www-data
        mode: '0644'
      notify: Reload nginx         # Trigger handler only if file changed
    - name: Pull application Docker image
      docker_image:
        name: "registry.io/{{ app_name }}:{{ app_version }}"
        source: pull
    - name: Run application container
      docker_container:
        name: "{{ app_name }}"
        image: "registry.io/{{ app_name }}:{{ app_version }}"
        state: started
        restart_policy: always
        ports:
          - "{{ app_port }}:{{ app_port }}"
        env:
          DB_HOST: "{{ db_host }}"
          DB_PASSWORD: "{{ vault_db_password }}"  # From Ansible Vault (encrypted)
    - name: Wait for application to be healthy
      uri:
        url: "http://localhost:{{ app_port }}/health"
        status_code: 200
      register: health_result
      until: health_result.status == 200
      retries: 12
      delay: 5
  handlers:
    - name: Reload nginx
      service:
        name: nginx
        state: reloaded
    - name: Restart app
      docker_container:
        name: "{{ app_name }}"
        state: started
        restart: yes
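The template task above relies on Jinja2 rendering: playbook variables are substituted into nginx.conf.j2 before the file lands on the host. A rough sketch of that substitution using the Python stdlib's string.Template as a stand-in (a real nginx.conf.j2 would use Jinja2's {{ app_port }} syntax, and the template text here is invented for illustration):

```python
from string import Template

# Inline stand-in for templates/nginx.conf.j2 (hypothetical content)
tpl = Template(
    "server {\n"
    "    listen 80;\n"
    "    location / { proxy_pass http://127.0.0.1:${app_port}; }\n"
    "}\n"
)

# Ansible does the equivalent of this with the play's vars dictionary
rendered = tpl.substitute(app_port=8080)
assert "proxy_pass http://127.0.0.1:8080" in rendered
```

Because the rendered output is compared against the file already on disk, the notify only fires when the content actually changes.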
roles/
  nginx/
    tasks/
      main.yml        # Main task list
    handlers/
      main.yml        # Handlers
    templates/
      nginx.conf.j2   # Jinja2 templates
    vars/
      main.yml        # Role variables
    defaults/
      main.yml        # Default values (overridable)
    meta/
      main.yml        # Dependencies on other roles
# Using roles in a playbook:
- hosts: webservers
  roles:
    - nginx
    - { role: app-deploy, app_version: v2.0.1 }
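The split between defaults/main.yml (overridable) and vars/main.yml matters because Ansible resolves variables by precedence: role defaults are the weakest layer, and values passed at the play level (like app_version above) override them. A toy sketch of that layering (heavily simplified; real Ansible precedence has over 20 levels):

```python
def resolve_vars(*layers):
    """Merge variable layers low-to-high precedence: later layers win."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

role_defaults = {"app_version": "latest", "app_port": 8080}   # defaults/main.yml
play_vars = {"app_version": "v2.0.1"}                         # passed in the roles: entry

final = resolve_vars(role_defaults, play_vars)
assert final == {"app_version": "v2.0.1", "app_port": 8080}
```

This is why sensible defaults belong in defaults/, while vars/ is reserved for values the role must control.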
Infrastructure as Code (IaC) means managing and provisioning cloud infrastructure through machine-readable configuration files rather than clicking through web consoles or running manual commands. When infrastructure is code, it can be versioned, reviewed, tested, and rolled back β just like application code. This eliminates configuration drift (where production differs from what you think it is) and enables full infrastructure reproducibility.
Terraform by HashiCorp is the leading IaC tool. It uses a declarative language (HCL β HashiCorp Configuration Language) where you describe the desired end state of your infrastructure, and Terraform figures out how to get there. Terraform is cloud-agnostic β it works with 300+ providers including AWS, Azure, GCP, Kubernetes, GitHub, Datadog, and more through a provider plugin system.
.tf files (your code)
β
β terraform init
v
Download providers (AWS, Azure, GCP plugins)
β
β terraform plan
v
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β TERRAFORM PLAN β
β Read current state (terraform.tfstate) β
β Query actual cloud state (via provider APIs) β
β Compare: desired vs actual β
β Generate execution plan: β
β + resource "aws_instance" "web" (CREATE) β
β ~ resource "aws_sg" "allow_http" (MODIFY) β
β - resource "aws_instance" "old" (DESTROY) β
ββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββ
β terraform apply
v
Terraform calls AWS/Azure/GCP APIs
Creates/modifies/destroys resources
β
v
terraform.tfstate updated (source of truth)

| Tool | Purpose | Approach | State | Best For |
|---|---|---|---|---|
| Terraform | Provision infrastructure | Declarative | State file | Cloud resources (VMs, VPCs, RDS) |
| Ansible | Configure servers | Procedural/Declarative | Stateless | Software install, app deploy |
| CloudFormation | AWS infra only | Declarative | CF stacks | AWS-only shops |
| Pulumi | Provision infra | Declarative (code) | State file | Developers preferring Python/TS |
Terraform uses HCL (HashiCorp Configuration Language) β a human-readable configuration language designed to be easier than JSON/YAML while being machine-parseable. The four main block types are terraform (settings), provider (cloud APIs), resource (infrastructure objects), and data (read existing resources).
# main.tf - Provider configuration
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"   # Allow 5.x versions only
    }
  }
  backend "s3" {           # Remote state in S3
    bucket  = "my-terraform-state"
    key     = "production/terraform.tfstate"
    region  = "ap-south-1"
    encrypt = true
  }
}
provider "aws" {
  region = var.aws_region  # Use variable
  default_tags {
    tags = {
      Project     = "DevOps-Class"
      ManagedBy   = "Terraform"
      Environment = var.environment
    }
  }
}
# variables.tf - Input variables
variable "aws_region" {
  description = "AWS region to deploy to"
  type        = string
  default     = "ap-south-1"
}
variable "instance_type" {
  type    = string
  default = "t3.micro"
  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "Must be a valid t3 instance type."
  }
}
variable "environment" {
  type = string
}
# outputs.tf - Output values
output "public_ip" {
  description = "Public IP of web server"
  value       = aws_instance.web[0].public_ip  # web uses count, so an index is required
}
output "db_endpoint" {
  value     = aws_db_instance.main.endpoint
  sensitive = true  # Won't print in logs
}
# ec2.tf
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
}
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
resource "aws_instance" "web" {
  count                  = 2  # Create 2 instances
  ami                    = data.aws_ami.ubuntu.id
  instance_type          = var.instance_type
  subnet_id              = aws_subnet.public[count.index].id
  vpc_security_group_ids = [aws_security_group.web.id]
  key_name               = "my-keypair"
  user_data              = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
    systemctl start nginx
    echo "Hello from instance ${count.index}" > /var/www/html/index.html
  EOF
}
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  # Canonical (Ubuntu)
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-*-22.04-amd64-server-*"]
  }
}
data "aws_availability_zones" "available" {  # Referenced by aws_subnet.public above
  state = "available"
}
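The interpolation 10.0.${count.index}.0/24 above carves per-AZ /24 subnets out of the 10.0.0.0/16 VPC block. The same arithmetic can be checked with Python's stdlib ipaddress module; this mirrors what Terraform's cidrsubnet(var.cidr, 8, count.index) function computes:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")

# /16 plus 8 new prefix bits = /24, the equivalent of cidrsubnet(cidr, 8, i)
subnets = list(vpc.subnets(new_prefix=24))

for count_index in range(2):
    print(subnets[count_index])   # 10.0.0.0/24 then 10.0.1.0/24

assert str(subnets[0]) == "10.0.0.0/24"
assert str(subnets[1]) == "10.0.1.0/24"
```

This is handy for sanity-checking that subnet ranges will not overlap before running terraform plan.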
terraform init                               # Initialize, download providers & modules
terraform validate                           # Check syntax
terraform fmt                                # Format code (run before git commit)
terraform plan                               # Preview changes (dry run)
terraform plan -out=tfplan                   # Save plan to file
terraform apply tfplan                       # Apply saved plan (no confirmation needed)
terraform apply -target=aws_instance.web     # Apply only specific resource
terraform destroy                            # Destroy ALL resources
terraform state list                         # List all managed resources
terraform state show aws_instance.web        # Show resource details
terraform import aws_instance.web i-1234567890   # Import existing resource
A Terraform module is a collection of Terraform files in a directory that can be reused and shared. Modules are the primary mechanism for code reuse in Terraform β instead of writing the same VPC configuration in every project, you create a VPC module and call it with different parameters. Modules enforce consistency and reduce duplication across teams.
infrastructure/
βββ main.tf (root module β calls child modules)
βββ variables.tf
βββ outputs.tf
βββ modules/
βββ vpc/ (VPC module)
β βββ main.tf
β βββ variables.tf
β βββ outputs.tf
βββ ec2/ (EC2 module)
βββ rds/ (RDS module)
Root module calls child modules:
module "vpc" {
  source = "./modules/vpc"
  cidr   = "10.0.0.0/16"
}
module "web_servers" {
  source         = "./modules/ec2"
  vpc_id         = module.vpc.vpc_id   # Pass output of one module to another
  subnet_ids     = module.vpc.public_subnet_ids
  instance_count = 3
}

# modules/ec2/main.tf
resource "aws_launch_template" "this" {
  name_prefix   = "${var.name}-"
  image_id      = var.ami_id
  instance_type = var.instance_type
  key_name      = var.key_name
  network_interfaces {
    security_groups = [aws_security_group.this.id]
  }
  tag_specifications {
    resource_type = "instance"
    tags          = merge(var.tags, { Name = var.name })
  }
}
resource "aws_autoscaling_group" "this" {
  name                = var.name
  vpc_zone_identifier = var.subnet_ids
  desired_capacity    = var.desired_count
  min_size            = var.min_count
  max_size            = var.max_count
  launch_template {
    id      = aws_launch_template.this.id
    version = "$Latest"
  }
}
# modules/ec2/variables.tf
# Note: HCL separates attributes with newlines, not semicolons, so blocks
# with more than one attribute must span multiple lines.
variable "name"       { type = string }
variable "ami_id"     { type = string }
variable "subnet_ids" { type = list(string) }
variable "instance_type" {
  type    = string
  default = "t3.micro"
}
variable "desired_count" {
  type    = number
  default = 2
}
variable "min_count" {
  type    = number
  default = 1
}
variable "max_count" {
  type    = number
  default = 10
}
variable "tags" {
  type    = map(string)
  default = {}
}
# modules/ec2/outputs.tf
output "asg_name" { value = aws_autoscaling_group.this.name }
output "asg_arn" { value = aws_autoscaling_group.this.arn }
# Use official AWS VPC module from Terraform Registry
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "production-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  enable_vpn_gateway = false
}
Terraform stores the mapping between your configuration and real-world resources in a state file (terraform.tfstate). This is how Terraform knows that aws_instance.web in your code corresponds to instance i-0a1b2c3d4e in AWS. Without state, Terraform would recreate every resource on every apply. The state file is the most critical file in your Terraform project β losing it means losing the ability to manage your infrastructure with Terraform.
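Because terraform.tfstate is plain JSON, it can be inspected with any JSON tool. A sketch that lists managed resources from a hypothetical, heavily trimmed state document (real state files carry many more fields):

```python
import json

# Hypothetical, minimal state document for illustration only
tfstate = json.loads("""
{
  "version": 4,
  "resources": [
    {"mode": "managed", "type": "aws_instance", "name": "web",
     "instances": [{"attributes": {"id": "i-0a1b2c3d4e"}}]}
  ]
}
""")

# Walk the resources list, printing each managed resource and its real-world ID
for res in tfstate["resources"]:
    if res["mode"] == "managed":
        for inst in res["instances"]:
            print(f'{res["type"]}.{res["name"]} -> {inst["attributes"]["id"]}')
# aws_instance.web -> i-0a1b2c3d4e
```

In practice you should prefer `terraform state list` / `terraform state show` over reading the file directly, since the state format is an internal detail.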
Developer A Developer B
terraform apply terraform apply
β β
v v
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Remote State Backend (S3) β
β s3://my-tfstate/prod/terraform.tfstate β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββ β
β β State Locking (DynamoDB) β β
β β LockID: prod/terraform.tfstate β β
β β If Dev A holds lock β Dev B gets error: β β
β β "Error acquiring the state lock" β β
β β Prevents simultaneous applies (race cond) β β
β ββββββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ

# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/myapp/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"  # For state locking
  }
}

# Create the S3 bucket and DynamoDB table first (bootstrap):
resource "aws_s3_bucket" "tf_state" {
  bucket = "company-terraform-state"
}
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration { status = "Enabled" }  # Versioning for state history!
}
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock"
  hash_key     = "LockID"
  billing_mode = "PAY_PER_REQUEST"
  attribute {
    name = "LockID"
    type = "S"
  }
}
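The DynamoDB lock behaves as an atomic put-if-absent on LockID: the first writer wins, and the second apply fails fast with "Error acquiring the state lock". A pure-Python sketch of that semantics, with a dict standing in for the DynamoDB table (the function names are invented for illustration):

```python
lock_table = {}  # stands in for the DynamoDB table keyed on LockID

def acquire_lock(lock_id, owner):
    """Put-if-absent, like a conditional write on attribute_not_exists(LockID)."""
    if lock_id in lock_table:
        raise RuntimeError(
            f"Error acquiring the state lock (held by {lock_table[lock_id]})"
        )
    lock_table[lock_id] = owner

def release_lock(lock_id):
    lock_table.pop(lock_id, None)

acquire_lock("prod/terraform.tfstate", "dev-a")   # Dev A's apply takes the lock
try:
    acquire_lock("prod/terraform.tfstate", "dev-b")  # Dev B's apply fails fast
except RuntimeError as e:
    print(e)
release_lock("prod/terraform.tfstate")            # Dev A finishes, lock freed
```

The real implementation adds a lock ID, timestamps, and `terraform force-unlock` for recovering from a crashed apply, but the core guarantee is exactly this mutual exclusion.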
# Workspaces allow multiple state files for same config
terraform workspace new staging
terraform workspace new production
terraform workspace select production

# Use workspace in config:
resource "aws_instance" "web" {
  instance_type = terraform.workspace == "production" ? "t3.medium" : "t3.micro"
  count         = terraform.workspace == "production" ? 3 : 1
}
infrastructure/
βββ environments/
β βββ dev/
β β βββ main.tf (calls modules with dev values)
β β βββ variables.tf
β β βββ terraform.tfvars (dev-specific values β NOT in git for prod!)
β βββ staging/
β βββ production/
βββ modules/
β βββ vpc/
β βββ eks-cluster/
β βββ rds/
β βββ alb/
βββ .github/
βββ workflows/
βββ terraform.yml (CI/CD for infrastructure)

# .github/workflows/terraform.yml
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']
jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infrastructure/environments/production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      - name: Terraform Init
        run: terraform init
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Validate
        run: terraform validate
      - name: Terraform Plan
        run: terraform plan -no-color -out=tfplan   # Runs on PRs AND on main
      - name: Terraform Apply (on merge to main)
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply tfplan                 # Saved plans apply without confirmation
Terraform do's and don'ts:
- DO run terraform fmt before commit
- DO run terraform plan before apply
- DON'T commit terraform.tfstate to Git
- DON'T commit *.tfvars with secrets
- DON'T run terraform apply without a plan

Warning: The terraform.tfstate file contains ALL your infrastructure details in plaintext, including sensitive values. Always: (1) Store in S3 with encryption, (2) Enable versioning, (3) Restrict IAM access, (4) Never commit to Git.

OpenTofu is a community-driven, open-source fork of Terraform created in response to HashiCorp's controversial license change in August 2023. HashiCorp changed Terraform from the Mozilla Public License (MPL 2.0, truly open source) to the Business Source License (BSL/BUSL, which restricts commercial use). This alarmed the DevOps community, leading to the formation of the OpenTofu project under the Linux Foundation.
OpenTofu aims to be a drop-in replacement for Terraform β it uses the same HCL syntax, same providers, same state format, and same workflow commands. If you know Terraform, you know OpenTofu. It is maintained by a community of contributors from companies like Spacelift, Gruntwork, env0, Scalr, and many others.
2014 ββββ Terraform 0.1 released (MPL 2.0 license β open source)
2022 ββββ Terraform 1.0 β 1.5 (stable, widely adopted)
Aug 2023 β HashiCorp changes license to BSL (commercial restriction)
"If you compete with HashiCorp, you can't use Terraform"
β
β Community reaction
v
Sep 2023 β OpenTofu fork announced (Linux Foundation)
Supported by Gruntwork, Spacelift, env0, Scalr, etc.
Jan 2024 β OpenTofu 1.6.0 released (stable, GA)
2024 ββββ OpenTofu adds features ahead of Terraform:
- State encryption
- Provider-defined functions
- Improved testing framework
OpenTofu promise: Always open source (MPL 2.0)

# Install OpenTofu
curl -fsSL https://get.opentofu.org/install-opentofu.sh | sudo bash -s -- --install-method rpm

# Verify
tofu --version

# Migration is simple - OpenTofu reads Terraform state
cd your-terraform-project/
tofu init    # Downloads providers (same as terraform init)
tofu plan    # Same output as terraform plan
tofu apply   # Same workflow

# tofu commands mirror terraform commands exactly:
# terraform init    -> tofu init
# terraform plan    -> tofu plan
# terraform apply   -> tofu apply
# terraform destroy -> tofu destroy
# terraform import  -> tofu import
| Feature | OpenTofu | Terraform |
|---|---|---|
| License | MPL 2.0 (truly open) | BSL (commercial restrictions) |
| State Encryption | Yes, built-in (1.7+) | Only via backend |
| Provider Functions | Supported | Not yet |
| State backends | All Terraform backends | All + HCP Terraform |
| Cost | Free forever | Free (BSL restrictions apply) |
| Governance | Linux Foundation | HashiCorp / IBM |
Python is the dominant scripting and automation language in DevOps. While bash is great for simple shell tasks, Python excels at complex automation, API integrations, data processing, and building DevOps tools. Ansible is written in Python. The AWS CLI, Azure CLI, and Google Cloud SDK all have Python SDKs. Kubernetes client, Docker SDK, Terraform CDK β all have Python support.
Python's rich standard library and enormous ecosystem (PyPI) means you can write a script that calls AWS APIs, processes JSON, reads YAML config, makes HTTP calls, and sends Slack notifications in under 50 lines of code. This is why Python is a must-have skill for DevOps engineers.
Python Script / Tool
β
ββββββββ΄ββββββββββββββββββββββββββββββββββββββββββ
β Python DevOps Ecosystem β
β β
β boto3 βββββββββββββββΆ AWS APIs (EC2, S3, ECS) β
β kubernetes βββββββββββΆ K8s API Server β
β docker βββββββββββββββΆ Docker Engine API β
β requests βββββββββββββΆ Any REST API β
β paramiko βββββββββββββΆ SSH to servers β
β PyYAML βββββββββββββββΆ Parse YAML (K8s manifests) β
β jinja2 βββββββββββββββΆ Template config files β
β click ββββββββββββββββΆ Build CLI tools β
β schedule βββββββββββββΆ Cron-like task schedulingβ
βββββββββββββββββββββββββββββββββββββββββββββββββββ

import boto3
from datetime import datetime, timedelta

# Auto-stop unused EC2 instances (save costs)
def stop_idle_instances():
    ec2 = boto3.client('ec2', region_name='ap-south-1')
    cloudwatch = boto3.client('cloudwatch', region_name='ap-south-1')
    # Get all running instances
    response = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            # Check avg CPU in last hour
            metrics = cloudwatch.get_metric_statistics(
                Namespace='AWS/EC2',
                MetricName='CPUUtilization',
                Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                StartTime=datetime.utcnow() - timedelta(hours=1),
                EndTime=datetime.utcnow(),
                Period=3600,
                Statistics=['Average']
            )
            if metrics['Datapoints']:
                avg_cpu = metrics['Datapoints'][0]['Average']
                if avg_cpu < 5.0:  # Less than 5% CPU - idle!
                    print(f"Stopping idle instance {instance_id} (CPU: {avg_cpu:.1f}%)")
                    ec2.stop_instances(InstanceIds=[instance_id])

stop_idle_instances()
from datetime import datetime
from kubernetes import client, config

def rolling_restart(namespace, deployment_name):
    config.load_kube_config()  # Load ~/.kube/config
    apps_v1 = client.AppsV1Api()
    # Get current deployment
    deployment = apps_v1.read_namespaced_deployment(
        name=deployment_name,
        namespace=namespace
    )
    # Add restart annotation (triggers rolling restart)
    if not deployment.spec.template.metadata.annotations:
        deployment.spec.template.metadata.annotations = {}
    deployment.spec.template.metadata.annotations['kubectl.kubernetes.io/restartedAt'] = \
        datetime.utcnow().isoformat()
    apps_v1.patch_namespaced_deployment(
        name=deployment_name,
        namespace=namespace,
        body=deployment
    )
    print(f"Rolling restart triggered for {deployment_name}")

rolling_restart('production', 'myapp')
| Library | Purpose | Install |
|---|---|---|
| boto3 | AWS SDK β EC2, S3, ECS, Lambda, etc. | pip install boto3 |
| kubernetes | Kubernetes API client | pip install kubernetes |
| docker | Docker Engine API | pip install docker |
| requests | HTTP calls to any REST API | pip install requests |
| paramiko | SSH connections and SFTP | pip install paramiko |
| PyYAML | Parse and write YAML | pip install pyyaml |
| Jinja2 | Template engine (config generation) | pip install jinja2 |
| click | Build CLI tools easily | pip install click |
| python-dotenv | Load .env files as env vars | pip install python-dotenv |
| slack-sdk | Send Slack notifications | pip install slack-sdk |
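As an example of how little glue code these tasks usually take, python-dotenv's core behaviour (turn KEY=value lines into environment variables) can be approximated in a few stdlib lines. This is a simplified sketch, not the real library: it skips quoting, export prefixes, and variable interpolation:

```python
import os

def load_dotenv_string(text):
    """Minimal .env parser: skip comments/blanks, set vars without clobbering."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault mirrors python-dotenv's default of not overriding existing vars
        os.environ.setdefault(key.strip(), value.strip())

load_dotenv_string("# app config\nAPP_PORT=8080\nDB_HOST=db.internal\n")
assert os.environ["APP_PORT"] == "8080"
```

For anything beyond this toy case, use the real library (`pip install python-dotenv`), which handles the edge cases correctly.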
Tip: Always create an isolated virtual environment (python -m venv .venv) and pin your dependencies in requirements.txt (pip freeze > requirements.txt). This ensures your automation scripts produce consistent results across different machines and CI environments.