
For the longest time, Kubernetes (commonly abbreviated as "k8s") seemed like some mystical black box that somehow solves the challenges of distributing software at scale.
If you're anything like me, you hate not understanding the infrastructure your software runs on. It's frustrating to think that once you git push, your code has been sent off to DevOps land where you have little ability to debug anything, completely dependent on DevOps or Platform Engineers for things as simple as tailing logs from stderr. This article will demystify k8s for you.
After covering the "what" and the "why" of Kubernetes, I'll walk you through setting up Minikube to emulate a Kubernetes cluster on your own machine so you can follow along. By the time you're done reading, you'll be confident enough with kubectl that your DevOps engineers will be impressed -- or at least they'll be glad you aren't asking them how to view your server logs anymore.
At its core, Kubernetes is container orchestration software. It manages your containerized applications, scales them and the infrastructure they run on, and handles all the communication between containers. This means that in order to host your app with Kubernetes, it first must be containerized using software like Docker. Kubernetes was originally developed at Google, inspired by their internal system called Borg, which should give you a sense of the scale it was designed for. It's now an open-source project under the Cloud Native Computing Foundation (CNCF) and is written in Go.
Fun fact: the word "Kubernetes" comes from Greek and means "helmsman" -- the person steering a ship. In the context of containerized applications, I immediately think of large freight ships carrying shipping containers.
Say you want to host your app on DigitalOcean or any other VPS provider. When you rent server space, you aren't getting your own dedicated machine (what's referred to as "bare metal"). You're getting a virtual machine, which lets the provider rent out multiple server instances to multiple people on a single physical machine. For example, in my own home lab, I'm running multiple applications on a single Raspberry Pi 4 server.
That works great until you have one month where your traffic spikes. Maybe you're having a sale and people are flocking to your site. If you don't have enough CPU or memory to handle the traffic, your app could slow to a crawl or completely crash.
In response, you allocate more resources to your VPS. But then the other 11 months out of the year you're paying for more server than you need. And what happens when the physical machine itself doesn't have enough resources? Now you need to upgrade hardware. In the case of my Raspberry Pi, maybe now I need to replace it with a mini PC or Mac Mini, and if my app and the amount of traffic I have maxes that out, I need to upgrade my hardware yet again. It's an expensive, reactive cycle.
That's where Kubernetes comes in. It connects multiple machines into a cluster and automatically distributes and load balances your containers across them. Need more capacity? Add another machine to the cluster. Traffic dies down? Scale back. No more guessing how much server you'll need next month. Some home lab aficionados even do this with Raspberry Pis, networking multiple Pis together into a Kubernetes cluster (using K3s, a lightweight Kubernetes distribution that strips out cloud-provider features and defaults to SQLite instead of etcd). Ran out of memory or CPU? Just buy another Raspberry Pi and add it to the cluster.
Kubernetes is especially valuable when your application is deployed as a collection of microservices. Consider a large application like Amazon. There may be some days where they have a lot of new user accounts being created, but not necessarily an increase in orders being placed. Then Cyber Monday comes along and they get a huge spike in the amount of orders being placed, but not necessarily new user accounts being created.
Deploying these individual parts of the application as their own microservice allows Amazon to scale each part of the application individually to meet demand without wasting resources. If the application were one big monolith, scaling up the backend that handles new user accounts would also mean scaling up the part of the app that handles cart checkout, even if there's no spike in orders being placed.
Let's start with three main vocabulary words: Pod, Node, and Control Plane.
A Pod is a wrapper around a container (or occasionally a small group of tightly coupled containers -- for example, a web server and a logging sidecar that share a filesystem). Pods are the smallest deployable units in Kubernetes. Most of the time it's one container per pod.
A Node is a machine in your cluster -- specifically, a machine that runs your pods. You'll also see these called "Worker Nodes" to distinguish them from the Control Plane.
The Control Plane is a separate node (or set of nodes) that orchestrates everything: scheduling pods, scaling up and down, and auto-healing failed containers. Think of the Control Plane as the conductor and the Worker Nodes as the orchestra. (The Control Plane used to be called the "Master Node," but that term has been retired.)
Here's a quick example of how the Control Plane thinks about resource allocation. Say you have three worker nodes:
| Node | RAM |
|---|---|
| Node 1 | 16GB |
| Node 2 | 8GB |
| Node 3 | 8GB |
And five applications to run:
| App | Required RAM |
|---|---|
| App 1 | 12GB |
| App 2 | 2GB |
| App 3 | 5GB |
| App 4 | 4GB |
| App 5 | 4GB |
Kubernetes looks at the resource requirements and figures out the best distribution:
| Node | Apps | RAM Left Over |
|---|---|---|
| Node 1 | App 1 (12GB), App 2 (2GB) | 2GB |
| Node 2 | App 4 (4GB), App 5 (4GB) | 0GB |
| Node 3 | App 3 (5GB) | 3GB |
You didn't have to manually assign anything. Kubernetes just handled it. And this doesn't have to be five different applications -- Kubernetes can scale multiple instances of the same application in response to demand. More on scaling later.
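In practice, the way an app tells Kubernetes "I need 12GB" is a resource request in its pod spec. A minimal sketch (names and image are illustrative, not from a real deployment):

```yaml
# Pod spec fragment: App 1 asks the scheduler for 12GB of RAM.
# The scheduler will only place this pod on a node with that much
# unreserved memory -- Node 1 in the table above.
containers:
  - name: app-1
    image: my-registry/app-1:latest
    resources:
      requests:
        memory: "12Gi"
```

We'll come back to resource requests (and their counterpart, limits) in the scaling section.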
Now that we know the lingo, let's get our fingers moving. If you don't want to follow along at home, scroll to the next section to keep reading.
You don't need a cloud account to follow along. Minikube runs a single-node Kubernetes cluster on your local machine -- available on macOS, Linux, and Windows. You'll need at least 2 CPUs, 2GB of RAM (4GB recommended), 20GB of free disk space, and a container runtime like Docker. The official installation page walks you through setup for your specific OS, but the short version:
```shell
# macOS
brew install minikube

# Linux (x86-64)
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

# Windows (PowerShell as Administrator)
winget install Kubernetes.minikube
```

Once installed:

```shell
minikube start      # Spin up a local cluster
kubectl get nodes   # You should see one node: minikube
```
Everything in this guide works on minikube. The jump to a real multi-node cluster is mostly configuration -- the concepts and commands are identical.
With your cluster running, creating a deployment is surprisingly straightforward. You just need two things: a name and a Docker image. There are plenty of open-source projects on GitHub that provide Docker containers to make self-hosting easy. Pick any of them for the sake of playing around. Some apps that I actually use myself are Gogs (GitHub alternative), VaultWarden (password manager), and PiHole (DNS sinkhole).
```shell
kubectl create deployment my-web-app --image=my-registry/my-web-app:latest
```

To peek at what's running:

```shell
kubectl get pods
```

And to access it locally for testing:

```shell
kubectl port-forward <pod-name> 8080:3000
```

This makes it so localhost:8080 will go to port 3000 in your deployed application. Your application's actual port may be different.
That's it. You've deployed a container to Kubernetes. It's not production-ready yet, but the fact that we went from zero to a running container in three commands is pretty remarkable. If it's not working yet, don't worry. We'll talk about troubleshooting soon after we discuss some basics.
This was the first thing that genuinely surprised me about Kubernetes: pods die. They die often. Sometimes without warning.
The ephemeral nature of pods is a feature, not a bug. Unlike traditional VMs or physical servers that might run indefinitely, pods are designed to be spun up, torn down, and replaced at a moment's notice.
Why is this a good thing? Flexibility and resilience. If a pod has a problem, Kubernetes kills it and spins up a fresh one. Instead of manually patching or debugging a sick server, you just replace it. This promotes immutability -- every new pod is a clean slate.
What this means for you as a developer: Never store persistent data directly on a pod. It will vanish when the pod dies. Plan for your application to restart from scratch at any time. (We'll cover how to handle persistent data in the Persistent Volumes section.)
But how does Kubernetes know when a pod has a problem? That's where health checks (called "probes") come in. The two you'll use most are liveness probes ("is the app still alive? If not, restart the container") and readiness probes ("is the app ready to receive traffic? If not, stop routing requests to it").
A simple HTTP-based health check in your deployment looks like this:
```yaml
containers:
  - name: my-web-app
    image: docker.io/my-account/my-web-app:latest
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 10
      periodSeconds: 30
    readinessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
```
Without probes, Kubernetes only knows if a container's process has exited. With them, it can detect and recover from problems where the process is alive but the application isn't functioning.
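For applications that take a long time to boot, Kubernetes also supports a third kind: the startup probe. Until it succeeds, liveness and readiness checks are held off, so a slow-starting app isn't killed mid-boot. A sketch (values are illustrative):

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 3000
  failureThreshold: 30   # tolerate up to 30 failed checks...
  periodSeconds: 10      # ...10 seconds apart (5 minutes total) before giving up
```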
Some handy commands for working with pods:

```shell
kubectl logs <pod-name>         # Check what happened
kubectl delete pod <pod-name>   # Kill a pod (k8s will restart it)
kubectl get pods -o wide        # See pods with their IP addresses
```
Every pod gets a unique virtual IP address within the cluster. These are internal-only -- not the same as the node's IP. Pods can talk to each other using these addresses, which is how Kubernetes simplifies communication inside the cluster.
Here's where Kubernetes philosophy gets interesting. You don't tell Kubernetes how to run your app step by step. Instead, you describe your desired state -- "I want three replicas of this service running" -- and the Deployment Controller's job is to make reality match that description.
You declare what you want; Kubernetes figures out how to make it happen. That simple idea underpins almost everything in the system. The way we declare what we want is in the form of YAML files.
To view the YAML for the current deployment and edit it in our terminal, we can use the following commands:

```shell
kubectl get deployment my-web-app -o yaml   # See the full deployment config
kubectl edit deployment my-web-app          # Edit it live
```
A simple Kubernetes deployment file looks something like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
  labels:
    app: my-web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
        - name: my-web-app
          image: docker.io/my-account/my-web-app:v1.2.3
          env:
            - name: HTTP_PORT
              value: "3000"
```
TIP: Always pin your images to a specific version tag (like `v1.2.3`) rather than using `:latest`. The `:latest` tag makes rollbacks unreliable because you can't tell which version is actually running, and Kubernetes may not pull a new image if the tag hasn't changed. Most examples in this article use `:latest` for simplicity, but in production, use explicit version tags.
When you export a deployment to YAML, here's what the key sections mean:

- `apiVersion`: the API group and version of the resource (here, `apps/v1`).
- `kind`: the type of resource (here, `Deployment`).
- `metadata`: the resource's name and labels.
- `spec`: the desired state — the replica count, the selector that identifies which pods this deployment manages, and the pod template itself.

Notice how the label `app: my-web-app` appears in three places: the Deployment metadata, the selector, and the pod template. Labels are key-value pairs that Kubernetes uses as its primary organizational mechanism. They're how everything in k8s finds everything else -- Services use label selectors to find their pods, Deployments use them to manage their ReplicaSets, and you can use them to filter any kubectl get command with `-l app=my-web-app`. If the labels don't match, things silently won't connect, which makes them one of the most common sources of "why isn't this working?" moments.
The replica count is our desired number of pods at any given moment. Under the hood, deployments use something called a ReplicaSet to maintain a stable set of replica pods. You'll probably never create a ReplicaSet directly -- deployments manage them for you -- but you'll see the term in logs and documentation, so it's good to know what it is. You can list them with:

```shell
kubectl get replicasets
```
So far, we've exported the YAML from the Kubernetes CLI and edited the YAML in the CLI. But as developers, we want to be able to check everything into version control and edit files in our favorite editor, like IntelliJ or VS Code. This is especially important as our configurations get more complex and the number of YAML files grows. That's where the apply command comes in:
```shell
kubectl apply -f my-web-app-deployment.yaml
```

You can also point apply at an entire directory, and it will apply every YAML file in it:

```shell
kubectl apply -f k8s/   # applies all YAML files in the k8s/ directory
```
The -f flag tells the apply command that we're giving it a filename (or directory). The apply command will create the deployment if it doesn't exist and updates it if it does. This is different from the create command that we used before -- create will fail if the deployment already exists. This distinction reflects a broader Kubernetes philosophy: declarative vs. imperative. The create command is imperative -- "create this thing right now." The apply command is declarative -- "make reality match this file." In practice, apply is almost always what you want, because it's idempotent and plays nicely with version-controlled YAML files.
Every time you update a deployment -- say, by changing the image tag to deploy a new version -- Kubernetes doesn't just kill all your pods and start new ones. It performs a rolling update: gradually replacing old pods with new ones so your application stays available throughout the process.
You can watch this happen in real time:

```shell
kubectl rollout status deployment/my-web-app
```

This will stream progress until the rollout completes. If you've ever deployed a new version and wondered "is it actually rolling out or is it stuck?", this is the command you want.

Kubernetes also keeps a history of your deployments:

```shell
kubectl rollout history deployment/my-web-app
```

This shows your recent revisions. If you just deployed a bad image and need to undo it immediately, you don't have to scramble to find the previous image tag and redeploy. Just roll back:

```shell
kubectl rollout undo deployment/my-web-app
```

This reverts to the previous revision. You can also roll back to a specific revision number:

```shell
kubectl rollout undo deployment/my-web-app --to-revision=3
```
Rolling updates and rollbacks are one of the most practical things to know as a developer working with Kubernetes. Deployments go wrong -- it's nice to know that recovery is one command away.
Once you've created the deployment, you'll want to verify that all the pods in your deployment are running and healthy. This may be where you get your first taste of troubleshooting.
To view the state of the pods in your deployment, run:

```shell
kubectl get pods
```
If something went wrong, you may see something like this:

```
NAME                          READY   STATUS             RESTARTS     AGE
my-web-app-86786cb9c6-8rpmg   0/1     CrashLoopBackOff   1 (7s ago)   9s
```
CrashLoopBackOff is the most common faulty pod status. It means something caused the container to crash, Kubernetes tried to self-heal by automatically restarting it, and the container keeps crashing (the "crash loop"). Each time it crashes, Kubernetes waits a little longer before trying to restart it again (the "back off").
Common causes:

- Missing configuration: a required environment variable, ConfigMap, or Secret isn't set.
- An application bug that throws an error on startup.
- An unreachable dependency, like a database the app can't connect to.
- The container running out of memory (OOMKilled).
To see why the application in the pod is crashing, start by checking the application logs:

```shell
kubectl get pods
kubectl logs <crashing-pod-name>
kubectl logs <crashing-pod-name> -f           # Stream logs in real time (like tail -f)
kubectl logs <crashing-pod-name> --previous   # Logs from the last crashed container
```
The --previous flag is especially important for CrashLoopBackOff. Without it, kubectl logs shows the current container's output, which may be empty if it just restarted. --previous shows the logs from the container that actually crashed -- which is where the error you're looking for lives.
Here's an example: the deployment above has an environment variable called "HTTP_PORT" with a value of "3000". If that environment variable is required by the container, and I forgot to set it in the deployment, the application might have an error log saying something like "No HTTP_PORT found in environment."
This tripped me up for an embarrassingly long time. kubectl logs and kubectl describe pod both help you debug, but they answer completely different questions.
kubectl logs <pod-name> shows you your application's stdout and stderr -- the same output you'd see if you ran the process in a terminal. This is where you'll find application-level errors: stack traces, unhandled exceptions, "connection refused to database" messages, that kind of thing. If your app started up and then crashed, the answer is usually in the logs.
kubectl describe pod <pod-name> shows you the Kubernetes-level story of what happened to the pod. It includes the event history: when the pod was scheduled, which node it was assigned to, whether the image was pulled successfully, why the container was killed. This is where you'll find infrastructure-level problems -- things that happened before your application code ever ran.
Here's how I think about it:

| Situation | Use This |
|---|---|
| App starts but crashes with an error | `kubectl logs` |
| App throws an exception at runtime | `kubectl logs` |
| Pod is stuck in `Pending` and won't start | `kubectl describe pod` |
| Pod shows `ImagePullBackOff` | `kubectl describe pod` |
| Container keeps getting `OOMKilled` | `kubectl describe pod` (to confirm), then `kubectl logs` (to understand why) |
| You have no idea what's wrong | `kubectl describe pod` first, then `kubectl logs` |
In practice, I usually run describe first to get the big picture, then logs to dig into the application-level details. Between the two of them, you can diagnose the vast majority of pod issues.
One more debugging tool worth knowing: kubectl get events shows a chronological log of everything happening in the cluster (or a specific namespace). It's like a timeline of scheduling decisions, image pulls, container starts, failures, and more -- across all pods, not just one.
```shell
kubectl get events                              # All events in the current namespace
kubectl get events --sort-by='.lastTimestamp'   # Sorted by most recent
```
This is especially useful when multiple things are going wrong at once and you want to see the bigger picture, or when a pod hasn't even been created yet and there's nothing to describe.
Okay, this section is for everyone who's been SSH'ing into EC2 instances for years and is now staring at Kubernetes wondering how to just get in the box. I was right there with you.
In the old world, debugging an issue might look like: SSH into the server, check the logs, maybe fire up a REPL to poke at the running application, inspect some files on disk. Straightforward.
Kubernetes doesn't have SSH. Pods aren't long-lived servers you maintain -- they're disposable containers. But you absolutely can get a shell inside a running pod:
```shell
kubectl exec -it <pod-name> -- /bin/sh
```
The -it flags give you an interactive terminal (just like docker exec -it). The -- separates the kubectl arguments from the command you want to run inside the container. /bin/sh gives you a shell. If bash or zsh is available in the image, you can use /bin/bash or /bin/zsh instead.
Once you're in, you can do most things you'd do over SSH:

```shell
# Look around the filesystem
ls /app

# Check environment variables
env | grep HTTP_PORT

# Inspect a config file
cat /etc/my-config/settings.yaml

# Start a REPL or interactive shell for your language
python manage.py shell   # Django
node                     # Node.js
```
Here's the thing, though: you should think of kubectl exec as a debugging tool, not a workflow. In the EC2 world, SSH'ing in and tweaking things was normal. In Kubernetes, that mindset will hurt you because:

- Any change you make inside a pod vanishes the next time the pod is replaced -- which could be at any moment.
- With multiple replicas, you'd be changing one pod while the others keep running the old way.
- Manual changes drift from the declarative config that Kubernetes treats as the source of truth.
The right fix is always in your image, your deployment config, or your ConfigMap -- never in a manual change inside a running pod. But for investigation? For poking around to understand why something is broken? kubectl exec can be invaluable.
As we saw in the deployment example, we can set environment variables in the deployment file itself, but that couples the configuration of the application (what's inside the pod) to the configuration of the deployment (what's outside the pod and the pods themselves). Having to edit a deployment to change your application's configuration isn't ideal, especially if you're setting a lot of environment variables for your application.
ConfigMaps decouple configuration from your container images. They exist in their own YAML file:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-web-app-configmap
data:
  HTTP_PORT: "3000"
  LOG_LEVEL: "info"
```
And then update the deployment to use the ConfigMap:

```yaml
apiVersion: apps/v1
[...]
spec:
  containers:
    - name: my-web-app
      image: docker.io/my-account/my-web-app:latest
      envFrom:
        - configMapRef:
            name: my-web-app-configmap
```
Apply it:

```shell
kubectl apply -f my-web-app-configmap.yaml
kubectl get configmaps
kubectl apply -f my-web-app-deployment.yaml
```
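Injecting environment variables isn't the only delivery mechanism -- a ConfigMap can also be mounted into the container as files. A sketch reusing the example ConfigMap (the mount path is an assumption; pick whatever your app reads from):

```yaml
spec:
  containers:
    - name: my-web-app
      image: docker.io/my-account/my-web-app:latest
      volumeMounts:
        - name: config-volume
          mountPath: /etc/my-config   # each key becomes a file in this directory
  volumes:
    - name: config-volume
      configMap:
        name: my-web-app-configmap
```

With this setup, each key in the ConfigMap's data (HTTP_PORT, LOG_LEVEL) becomes a file under /etc/my-config whose content is the value.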
ConfigMaps are great for non-sensitive configuration: ports, URLs of other services, feature flags, etc.
But here's the critical thing: ConfigMaps are stored in plain text. Anyone with cluster access can read them. Do not store sensitive information like API keys in ConfigMaps.
For sensitive information, Kubernetes has Secrets. Secrets can be scoped with RBAC so only certain pods or users can access them, and they can be mounted as files or injected as environment variables -- just like ConfigMaps. Kubernetes also supports encryption at rest so that Secrets are encrypted in etcd rather than stored in plain text. You can create one from the command line:
```shell
kubectl create secret generic my-web-app-secrets \
  --from-literal=API_KEY=abc123 \
  --from-literal=DB_PASSWORD=supersecret
```
Then reference the Secret in your deployment, just like a ConfigMap:

```yaml
apiVersion: apps/v1
[...]
spec:
  containers:
    - name: my-web-app
      image: docker.io/my-account/my-web-app:latest
      envFrom:
        - configMapRef:
            name: my-web-app-configmap
        - secretRef:
            name: my-web-app-secrets
```
One caveat: Secrets are base64-encoded by default, not encrypted. Base64 is an encoding, not a security measure. For production workloads, enable encryption at rest and consider using your cloud provider's secrets manager (AWS Secrets Manager, GCP Secret Manager, etc.) for an extra layer of protection.
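To see just how little protection base64 provides, you can decode a "secret" value yourself -- no keys, no cluster required (`base64 -d` is the GNU coreutils flag; older macOS versions use `-D`):

```shell
# Encode a value the way Kubernetes stores Secret data...
echo -n 'supersecret' | base64
# -> c3VwZXJzZWNyZXQ=

# ...and reverse it just as easily
echo -n 'c3VwZXJzZWNyZXQ=' | base64 -d
# -> supersecret
```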
Now that we've finally got our app up and running in a Kubernetes pod (or pods), it's time to connect our application to the outside world so we can actually route traffic into it.
If we run kubectl get pods -o wide, we'll see that every pod in the deployment has its own unique IP address. Moreover, if we kubectl delete pod <pod-name>, we'll see that the new pod Kubernetes creates to replace it has a different IP address than the one that was deleted. These IP addresses allow Kubernetes to communicate internally among the pods, but it isn't going to be very useful for trying to route internet traffic to our application. We need an IP address for all the pods of our deployment that isn't going to change.
That's what Services do. A Service provides a stable endpoint and load balances traffic across a group of pods. Even when individual pods are destroyed and recreated, the Service URL stays the same.
A YAML file for a service configuration looks like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: my-web-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
```
This listens for traffic on port 80 (the standard HTTP port) and routes it to port 3000 on the pods matching the selector. In this case, 3000 is the same value the example ConfigMap set for HTTP_PORT, and the selector `app: my-web-app` matches the label from the example deployment.
Then apply the service:

```shell
kubectl apply -f web-service.yaml
kubectl port-forward service/web-service 8080:80
kubectl get svc web-service -o yaml
```
There are several types of Services, and they actually build on each other:

- ClusterIP (the default): gives the Service a stable, cluster-internal IP. Only reachable from inside the cluster.
- NodePort: builds on ClusterIP and additionally opens a fixed port on every node, so traffic to `<node-ip>:<node-port>` reaches the Service.
- LoadBalancer: builds on NodePort and provisions an external load balancer from your cloud provider, with its own public IP.
Which should you use? If your service only needs to be accessed within the cluster (which is the case for most microservices), go with ClusterIP. Use NodePort or LoadBalancer when you need to expose something to the outside world.
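For concreteness, here's what the earlier web-service could look like as a NodePort (the nodePort value is illustrative; if you omit it, Kubernetes picks one from the 30000-32767 range):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: NodePort        # omitting `type` gives you ClusterIP, the default
  selector:
    app: my-web-app
  ports:
    - protocol: TCP
      port: 80          # cluster-internal Service port
      targetPort: 3000  # container port
      nodePort: 30080   # opened on every node
```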
In most production setups, you won't use NodePort or LoadBalancer directly to expose services. Instead, you'll use an Ingress resource. If Services are the internal phone system, Ingress is the receptionist directing outside callers.
Ingress gives you:

- A single entry point for multiple services, routed by hostname or URL path.
- One place to terminate TLS/HTTPS.
- One external IP (and one cloud load balancer bill) instead of one per service.
An Ingress resource defines routing rules. For example, traffic to app.example.com goes to your web service, and traffic to api.example.com goes to your API service -- all through a single external IP. It's like declaring routes in Express.js or Ring (Clojure), but for going to different applications or services in your cluster.
Here's an example ingress YAML:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: my-web-app.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service
                port:
                  number: 80
```
This says "if traffic comes in for the host 'my-web-app.com' and the path matches '/', route that traffic to the service called 'web-service' on port 80."
Apply it like any other resource:

```shell
kubectl apply -f app-ingress.yaml
```
NOTE: Ingress resources require an Ingress Controller to actually function. On Minikube, enable one with `minikube addons enable ingress`. In cloud environments, your provider typically has one built in.
The whole flow from a user typing your URL in their browser to getting your webpage looks something like this:

```
HTTP Request -> Ingress -> Service -> Pods
```
TIP: You can add any domain name you want to your `/etc/hosts` file (if you're on Mac or Linux) and resolve it to the IP of your cluster. On Minikube, run `minikube ip` to get the address. For example:

```
192.168.49.2 my-web-app.internal
```

This lets you type `my-web-app.internal` in your browser instead of using the raw IP address.
In production, you'll typically use annotations specific to your cloud provider. Each major provider (GCP, AWS, Azure) has their own ingress controllers with their own configuration. The concepts are the same everywhere -- just follow your provider's docs for the specifics.
One of the coolest things about Kubernetes is that it has its own internal IP network. Services can talk to each other without ever exposing them to the outside world. This is great because:

- Internal services never get a public IP, which shrinks your attack surface.
- Service-to-service traffic stays on the cluster network instead of making a round trip over the public internet.
- Services get stable DNS names, so they can find each other without hardcoded IPs.
When you create a Service, Kubernetes automatically creates an internal DNS name with the following format:

```
<service-name>.<namespace>.svc.cluster.local
```
In practice, you can usually shorten this. http://<service-name>.<namespace> works, and if you're in the same namespace, just http://<service-name> is enough.
Rule of thumb: Every HTTP server should have a Service resource, but should only have an Ingress resource if it needs to be exposed outside the cluster. If it doesn't have to be exposed to the outside world, don't expose it.
At Clean Coders, we have two internal apps: Epic (project planning) and Poker (story estimation). One good use case for Kubernetes internal DNS is the communication between them. When an Epic project is linked to a Poker room, estimating a story in Poker automatically sends the estimate to Epic and applies it to the story. Likewise, creating a new story in Epic will create a new story in Poker, and editing the name of the story in Epic will edit the name of it in Poker.
The endpoints that Poker and Epic provide to be able to talk to each other have no reason to be public API endpoints. They don't need to be exposed to the outside world, and it seems a little silly to send that data out into the World Wide Web when the two apps are running on the same cluster.
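Concretely, Poker could reach Epic through the cluster's internal DNS rather than a public endpoint. A hypothetical sketch -- the service and variable names here are assumptions for illustration, not Clean Coders' actual config:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: poker-configmap
data:
  # Resolves only inside the cluster; Epic never needs a public endpoint.
  EPIC_BASE_URL: "http://epic-service.default.svc.cluster.local"
```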
Now we have an application running in Kubernetes that we can access from the browser -- great! But what if we wanted to upload a file to our application, or our application had its own SQLite database?
As we've established, the filesystem inside a container is ephemeral. Save a file, restart the pod, and it's gone. Every new pod starts as a blank slate, and pods don't have access to each other's filesystems either. That's great for stateless services, but obviously some applications need to persist data -- databases, file uploads, caches that should survive restarts.
For anything that stores data you care about, you need Persistent Volumes. A Persistent Volume (PV) is a cluster-level storage resource that exists independently of any pod. Pods come and go; the PV lives on.
PVs can be created:

- Statically: a cluster administrator provisions storage ahead of time and creates the PV objects manually.
- Dynamically: Kubernetes provisions storage on demand (via a StorageClass) when a claim asks for it.

Dynamic provisioning is generally the way to go. It requires less manual work and provides more flexibility.
A Persistent Volume Claim (PVC) is a request for storage. When using dynamic provisioning, creating a PVC automatically creates a matching PV.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-web-app-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
NOTE: `1Gi` is 1 gibibyte (1,073,741,824 bytes). `1G` would be 1 gigabyte (1,000,000,000 bytes). Kubernetes uses the binary units (`Gi`, `Mi`, `Ki`) by convention.
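The arithmetic, if you want to check it yourself:

```shell
echo $((1024 * 1024 * 1024))   # bytes in 1Gi (2^30)
echo $((1000 * 1000 * 1000))   # bytes in 1G  (10^9)
```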
```shell
kubectl apply -f my-web-app-pvc.yaml
kubectl get pvc
kubectl get pv
```
After creating the PVC, update the deployment to use it:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
  labels:
    app: my-web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      volumes:                      # declare a volume that references the PVC
        - name: my-web-app-volume
          persistentVolumeClaim:
            claimName: my-web-app-pvc
      containers:
        - name: my-web-app
          image: docker.io/my-account/my-web-app:latest
          envFrom:
            - configMapRef:
                name: my-web-app-configmap
          volumeMounts:             # set a mount path for the PV
            - name: my-web-app-volume
              mountPath: /persist
```
This will make the PV accessible to the application at /persist in the application's file system, as if it were a directory inside the container.
If you're thinking "great, I'll run Postgres with a Deployment and a PVC," there's a catch. A Deployment with ReadWriteOnce storage and multiple replicas will fail -- only one pod can mount the volume at a time, and Deployments treat all pods as interchangeable. For stateful applications like databases that need stable network identities and dedicated storage per replica, Kubernetes offers StatefulSets. StatefulSets are beyond the scope of this crash course, but know that they exist and that Deployments are the wrong tool for running databases with multiple replicas.
Not all shared storage needs to be persistent. Kubernetes also supports ephemeral volumes -- temporary storage that exists for the lifetime of a pod. These are useful when multiple containers in the same pod need to share files (like a web server and a log-processing sidecar writing to the same directory) but you don't need the data to survive a restart.
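The mechanism for this is an `emptyDir` volume -- created empty when the pod starts, deleted when the pod dies. A sketch of the web-server-plus-sidecar example (image names are illustrative):

```yaml
spec:
  volumes:
    - name: shared-logs
      emptyDir: {}                  # lives exactly as long as the pod
  containers:
    - name: web-server
      image: my-registry/my-web-app:latest
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log/app   # the server writes logs here
    - name: log-sidecar
      image: my-registry/log-shipper:latest
      volumeMounts:
        - name: shared-logs
          mountPath: /logs          # the sidecar reads the same files
```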
If you know your app will run on Kubernetes, prefer your cloud provider's object storage (like AWS S3). It works regardless of whether you're on k8s or a plain VM like EC2, whereas local filesystem dependencies require PVC configuration when moving to Kubernetes.
If you start hosting multiple applications or services in the same cluster, you'll start to feel the mess. Just doing a quick kubectl get pods might return an unmanageably long list when you might only care about a specific application. Namespaces fix this. They're like directories for your Kubernetes resources.
Every resource has a unique name within its namespace. If you've been following along, everything we've done has been in the default namespace. But in a larger organization where multiple teams share a cluster, namespaces keep things separated.
kubectl get namespaces # or: kubectl get ns
kubectl create ns staging
You can specify namespaces on any command:
kubectl -n staging get pods
kubectl -n staging get svc
You can copy resources from one namespace to another, but you can't actually move resources. Kubernetes treats the combination of name + namespace as the unique identifier, so the closest thing to moving is copying the resource to the destination namespace and then removing it from the source namespace.
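As a sketch, "moving" a deployment named my-web-app (a hypothetical name) from default to staging could look like this:

```shell
# Export the resource to a file
kubectl -n default get deployment my-web-app -o yaml > my-web-app.yaml

# Edit my-web-app.yaml first: remove metadata.namespace, plus cluster-generated
# fields like status, resourceVersion, and uid

# Re-create it in the destination namespace
kubectl -n staging apply -f my-web-app.yaml

# Once the copy is healthy, delete the original
kubectl -n default delete deployment my-web-app
```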
Now that we've built up the foundation -- deployments, services, storage, networking -- here's where it all pays off. Scaling is the reason Kubernetes exists, and it's where the system really shines.
As I alluded to earlier, there are two ways to handle more traffic:
Vertical scaling: Give your pods more CPU and RAM. This works until you hit the hardware ceiling of your nodes.
Horizontal scaling: Run more instances (pods) of your application. Pods can be distributed across nodes, so you can keep scaling as long as you have nodes to run them on. Need more capacity? Add another node to the cluster.
Kubernetes was specifically designed for horizontal scaling. There's no reason to add more CPU or RAM to a single node when you can just add another node to the whole cluster. And it gets particularly interesting with a microservice architecture: if your payment processing service is getting hammered during a flash sale but your user profile service is idle, you scale only the part that needs it rather than throwing resources at the entire monolith.
Kubernetes also autoheals across nodes, not just pods. If one of the Raspberry Pis in your Pi cluster goes down, Kubernetes will automatically move the pods to the other nodes as long as the other nodes have resources to spare. The app didn't crash, and you can easily replace the node that went down.
There are two critical concepts when it comes to how Kubernetes scales an application:
Resource limits cap how much CPU and memory a single pod can consume, so one greedy pod can't starve everything else on the node. Resource requests tell the scheduler how much a pod needs, so it never places a pod on a node that doesn't have the capacity for it.
Consider the following scenario without resource requests configured:
| Node | RAM |
|---|---|
| Node 1 | 8GB |
| Node 2 | 8GB |
| Pod | Node | Actual RAM Usage |
|---|---|---|
| Pod 1 | Node 1 | 3GB |
| Pod 2 | Node 1 | 3GB |
| Pod 3 | Node 2 | 3GB |
| Pod 4 | Node 2 | 3GB |
Each node has 6GB of its 8GB actually in use, but the scheduler doesn't know that -- without resource requests, it has no idea how much memory each pod actually needs. So when Kubernetes tries to schedule a fifth pod that also consumes 3GB, it might place it on a node that only has 2GB to spare. The pod gets scheduled, starts running, exceeds the available memory, and gets OOMKilled -- potentially leading to CrashLoopBackOff.
With resource requests in place, the scheduler knows each pod needs 3GB. When it tries to schedule that fifth pod, it sees that no node has 3GB available and the pod simply stays in Pending until sufficient resources become available.
Both Resource Requests and Resource Limits can be set in the deployment:
apiVersion: apps/v1
[...]
    containers:
      - image: docker.io/my-account/my-web-app:latest
        name: my-web-app
        resources:
          limits:
            memory: "2000Mi"
            cpu: "500m"
          requests:
            memory: "1000Mi"
            cpu: "100m"
Memory can be written with single-letter suffixes (M for megabytes, k for kilobytes, G for gigabytes), or double-letter suffixes (Mi for mebibytes, Ki for kibibytes, Gi for gibibytes). A kibibyte is 1,024 bytes; a kilobyte is 1,000 bytes. Kubernetes convention is to use the binary units (Ki, Mi, Gi) over the decimal ones.
CPU is written in millicores, where:
1 or 1000m is 100% of one core
500m is 50% of one core
100m is 10% of one core
If you're going to give extra headroom to one resource, make it memory. If your application runs out of memory, the pod crashes. Running out of CPU just slows down the application. Memory exhaustion is the more dangerous failure mode, so give it more headroom.
With our resource requests and limits in place, we're ready for the real payoff: letting Kubernetes scale automatically using a Horizontal Pod Autoscaler. This will observe the resource utilization of our deployments and automatically create more pods or kill unneeded pods in response to our traffic.
An autoscaler configuration looks like this:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-web-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
This monitors CPU usage and automatically scales the number of pods to keep the average utilization of all pods around 50%. Traffic spike? More pods. Quiet night? Scale back down. The autoscaling/v2 API also supports scaling based on memory usage or custom metrics -- you just add more entries to the metrics array. Note that the HPA has a default 5-minute stabilization window before scaling down, so it won't thrash between replica counts during fluctuating traffic.
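For instance, a metrics array targeting both CPU and memory might look like this (the 70% memory target is an arbitrary example value, not a recommendation):

```yaml
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
```

With multiple metrics, the HPA computes a desired replica count for each metric and uses the highest.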
To avoid potential conflicts with the HPA, we need to make sure we don't have a set number of replicas in our deployment YAML. Remove that line from the deployment, then:
kubectl apply -f my-web-app-hpa.yaml
kubectl apply -f my-web-app-deployment.yaml
NOTE: The HPA requires the metrics-server to be running in the cluster so it can read CPU and memory usage. On Minikube, enable it with minikube addons enable metrics-server. Cloud-managed clusters like EKS, GKE, and AKS typically have this pre-installed.
Everything above works the same whether you're on Minikube or a production cluster -- the only difference is how you connect. Before you can run any kubectl commands against a real cluster, you need to configure your local machine to talk to it. Each cloud provider has its own CLI tool for this.
For Amazon EKS, first make sure you have the AWS CLI installed and configured (aws configure). Then:
aws eks update-kubeconfig --region <region> --name <cluster-name>
That's it. This command downloads the cluster's connection info and merges it into your ~/.kube/config file. You can verify it worked with:
kubectl get nodes
If you work with multiple AWS accounts or profiles, pass the --profile flag:
aws eks update-kubeconfig --region us-east-1 --name my-cluster --profile my-aws-profile
For Google Kubernetes Engine (GKE), install the gcloud CLI and authenticate (gcloud auth login). Then:
gcloud container clusters get-credentials <cluster-name> --region <region> --project <project-id>
Same idea -- this configures kubectl to point at your GKE cluster. Verify with:
kubectl get nodes
If you haven't installed the GKE auth plugin yet, gcloud will tell you. Install it with:
gcloud components install gke-gcloud-auth-plugin
For Azure AKS, install the Azure CLI and log in (az login). Then:
az aks get-credentials --resource-group <resource-group> --name <cluster-name>
Same pattern as the others -- this merges the cluster's connection info into your kubeconfig. Verify with:
kubectl get nodes
If you connect to multiple clusters, kubectl stores all of them in your kubeconfig. Switch between them with:
kubectl config get-contexts # List all configured clusters
kubectl config use-context <name> # Switch to a different cluster
kubectl config current-context # Check which cluster you're talking to
Always check your context before running commands. You do not want to accidentally delete pods in production when you meant to hit staging.
That was a lot of ground to cover. The learning curve is real, but every concept builds on the last in a way that eventually clicks.
The core philosophy is simple: describe what you want, and let the system figure out how to make it happen. Pods are temporary. State lives in persistent volumes or external services. Services provide stable endpoints. Ingress handles the outside world. And autoscaling means you can sleep through traffic spikes instead of getting paged.
I hope that Kubernetes feels less like some mystical black box that your application disappears into whenever you git push. At the very least, you don't need to ask the DevOps Engineers how to view your application's logs anymore.
# General
kubectl apply -f <file.yaml>
kubectl apply -f <directory>/
kubectl delete -f <file.yaml>
# Pods
kubectl get pods
kubectl get pods -o wide
kubectl logs <pod-name>
kubectl logs <pod-name> -f # Stream logs in real time
kubectl logs <pod-name> --previous # Logs from last crashed container
kubectl logs <pod-name> --all-containers
kubectl delete pod <pod-name>
# Deployments
kubectl create deployment <name> --image=<image>
kubectl apply -f <deployment.yaml>
kubectl get deployment <name> -o yaml
kubectl edit deployment <name>
# Services
kubectl get svc
kubectl get svc <name> -o yaml
kubectl port-forward service/<name> 8080:80
# Namespaces
kubectl get ns
kubectl create ns <name>
kubectl -n <namespace> get pods
# Debugging
kubectl describe pod <pod-name>
kubectl exec -it <pod-name> -- /bin/sh
kubectl port-forward <pod-name> 7888:7888
kubectl get events
kubectl get events --sort-by='.lastTimestamp'
# Config & Storage
kubectl get configmaps
kubectl get secrets
kubectl get pvc
kubectl get pv
# Rollouts
kubectl rollout status deployment/<name>
kubectl rollout history deployment/<name>
kubectl rollout undo deployment/<name>
# Scaling
kubectl get replicasets
kubectl get hpa
kubectl top pods
kubectl top nodes
# Cluster
kubectl get nodes
kubectl config get-contexts
kubectl config use-context <name>