In the dynamic and ever-evolving world of container orchestration, ensuring that applications run smoothly and reliably is paramount. Kubernetes, as a leading container orchestration platform, provides a robust mechanism to monitor and manage the health of applications through the use of health probes. These probes play a critical role in maintaining the stability and availability of services by continuously checking the health and readiness of application components.
In this post, we will look at why health probes are needed in Kubernetes, explore the different types of probes available, and see how they keep containerized applications running reliably. Let's get started.
Health Probes
Health probes are mechanisms used to determine the health and readiness of containers running in a Pod. These probes help ensure that the applications running inside the containers are functioning correctly and are ready to accept traffic.
Types of Health Probes
Liveness Probe
Purpose: Determines whether a running container is still healthy (for example, not deadlocked or hung). If the liveness probe fails, Kubernetes will kill the container and restart it according to the Pod's restart policy.
Usage: Helps to ensure that applications are still running and can recover from failures.
Readiness Probe
Purpose: Determines if a container is ready to start accepting traffic. If the readiness probe fails, the endpoints controller will remove the Pod's IP address from the endpoints of all services that match the Pod. This ensures that traffic is not sent to a Pod that is not ready.
Usage: Ensures that only healthy Pods receive traffic, preventing downtime and errors.
Startup Probe
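Purpose: Determines if the application inside a container has started up. While a startup probe is configured, liveness and readiness checks are held off until it succeeds. If the startup probe fails, Kubernetes will kill the container and restart it according to the Pod's restart policy. This probe is useful for applications that have a longer startup time.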
Usage: Useful for complex applications that take a long time to start and might fail initial readiness or liveness checks.
Real-Life Example
Let's use a real-life example to explain Kubernetes probes. Imagine you have a web application called mywebapp running in a Kubernetes cluster. This web application has three main characteristics:
It takes some time to start.
It needs to be ready to accept user requests.
It should be restarted if it stops responding.
Here's how each type of probe can be used to ensure the application runs smoothly:
Scenario
You have a mywebapp container, and you want to:
Ensure it starts correctly before it starts processing requests.
Ensure it is always ready to handle requests.
Restart it if it stops working properly.
Liveness Probe
Imagine mywebapp sometimes crashes or gets stuck. You want Kubernetes to restart it automatically in such cases.
- Liveness Probe: Think of this as a periodic check-up to see if the application is still alive. It might check an endpoint like /healthz that returns 200 OK if the application is running well. If this check fails, Kubernetes will restart the container.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15 # wait 15 seconds after the container starts before the first probe
  periodSeconds: 20 # run the probe every 20 seconds
- Real-Life Example: Imagine you have a monitoring system that checks if a web server is running every 20 minutes. If the server doesn't respond, you restart it. Similarly, the liveness probe checks if the app is alive and restarts it if it's not.
Readiness Probe
Once mywebapp is running, you want to ensure it’s ready to handle user requests. For example, it might need to load some data or establish a database connection first.
- Readiness Probe: This probe checks if the application is ready to serve traffic. It might check an endpoint like /ready that returns 200 OK only when the app is fully ready. If this check fails, the app won’t receive any traffic until it passes.
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
- Real-Life Example: Think of a restaurant that opens at 10 AM. Even if the restaurant is technically open, it may not be ready to serve customers until the staff has prepared everything. Similarly, the readiness probe checks if the app is ready to serve requests.
Startup Probe
Suppose mywebapp has a lengthy initialization process, like loading large datasets or performing some setup tasks. You want to give it enough time to start without failing initial checks.
- Startup Probe: This probe checks if the application has started up correctly. It might check an endpoint like /startup that returns 200 OK once the app has fully started. This probe is useful for applications that take a long time to start and might fail initial liveness or readiness checks.
startupProbe:
  httpGet:
    path: /startup
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
- Real-Life Example: Imagine a car engine that needs some time to warm up before it can start running smoothly. The startup probe gives the app the time it needs to warm up and start correctly.
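The three probes above can live on the same container. The following is a minimal sketch of how that might look, assuming a hypothetical image name (example.com/mywebapp:1.0) and that the app serves /startup, /ready, and /healthz on port 8080 as in the snippets above. The failureThreshold value is an illustrative choice, not a recommendation.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mywebapp
spec:
  containers:
  - name: mywebapp
    image: example.com/mywebapp:1.0   # hypothetical image, replace with your own
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      periodSeconds: 10
      failureThreshold: 30   # assumed value: allows up to 30 x 10 = 300 seconds to start
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 20
```

Because liveness and readiness checks do not begin until the startup probe succeeds, they usually no longer need a long initialDelaySeconds in a setup like this.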
Configuration of Probes
Probes can be configured using the following methods:
HTTP Probes: Send an HTTP GET request to the container. If the response status code is at least 200 and below 400, the container is considered healthy.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3
TCP Probes: Attempt to open a TCP connection to the specified port. If the connection is successful, the container is considered healthy.
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3
Command Probes: Execute a command inside the container. If the command returns a zero exit status, the container is considered healthy.
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 3
  periodSeconds: 3
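Besides initialDelaySeconds and periodSeconds, every probe accepts a few more timing fields. As a sketch, here they are on an HTTP liveness probe with what I understand to be the Kubernetes defaults written out explicitly. Treat the values as a starting point and check the documentation for your cluster version.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0   # seconds to wait after the container starts
  periodSeconds: 10        # how often to run the probe
  timeoutSeconds: 1        # seconds after which a single probe attempt times out
  successThreshold: 1      # consecutive successes needed to count as healthy (must be 1 for liveness and startup probes)
  failureThreshold: 3      # consecutive failures before Kubernetes acts
```

With these fields, a container is restarted only after failureThreshold consecutive liveness failures, not after a single failed check.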
Hands-on Liveness Probe
liveness command-probe
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
Step-by-Step Explanation
Pod Creation: When you apply this YAML file, Kubernetes creates a Pod named liveness-exec with a single container using the busybox image.
Container Initialization:
The container starts and executes the command specified in args: touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
This command sequence does the following:
Creates a file named /tmp/healthy.
Sleeps (pauses) for 30 seconds.
Deletes the /tmp/healthy file.
Sleeps for 600 seconds (10 minutes).
Liveness Probe Initialization:
The livenessProbe is configured to check the container's health using an exec command:
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
initialDelaySeconds: 5: This means Kubernetes waits for 5 seconds after the container starts before performing the first liveness check.
periodSeconds: 5: This means Kubernetes will perform the liveness check every 5 seconds.
Sequence of Events
Container Start:
- The container starts and executes the initial command: touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600.
Initial Delay (5 seconds):
- Kubernetes waits for 5 seconds before performing the first liveness probe.
First Liveness Probe (after 5 seconds):
- Kubernetes runs the liveness probe command: cat /tmp/healthy.
- Since the /tmp/healthy file exists, the command succeeds, and the container is considered healthy.
Subsequent Liveness Probes (every 5 seconds):
- Kubernetes continues to run the liveness probe command every 5 seconds.
- For the first 30 seconds, the /tmp/healthy file exists, so the container is considered healthy.
After 30 seconds:
- The container's command sequence deletes the /tmp/healthy file: rm -f /tmp/healthy.
- The container then sleeps for 600 seconds.
Liveness Probe Fails:
- Once the /tmp/healthy file is deleted, the next liveness probe (which runs every 5 seconds) will fail because the cat /tmp/healthy command will return an error (file not found).
Container Restart:
- Once the probe has failed failureThreshold times in a row (3 by default, since this manifest does not set it), Kubernetes considers the container unhealthy and restarts it.
- The container restarts, and the initial command sequence starts again, creating the /tmp/healthy file, and the process repeats.
Summary
Initial Delay: Kubernetes waits for 5 seconds before starting liveness checks.
Periodic Checks: Kubernetes checks the container's health every 5 seconds.
Health Check: The container is considered healthy as long as the /tmp/healthy file exists.
Failure and Restart: After 30 seconds, the file is deleted, causing the liveness probe to fail, and Kubernetes restarts the container.
After the container is restarted, the RESTARTS count of the pod goes up, and when you describe the pod using kubectl describe pod/liveness-exec you can see the failed liveness probe in its events.
liveness http-probe
apiVersion: v1
kind: Pod
metadata:
  name: hello
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    args:
    - liveness
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
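In the YAML file above, the liveness probe sends an HTTP GET request to /healthz on port 8080 every 3 seconds. The agnhost liveness server used here returns 200 for roughly the first 10 seconds of the container's life and 500 after that, so the probe starts failing and, once the failures reach the failure threshold, Kubernetes restarts the container.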
Run the command kubectl get po.
From the output we can see that our hello pod has been restarted 4 times. As the failures continue, it goes into CrashLoopBackOff.
liveness tcp-probe
apiVersion: v1
kind: Pod
metadata:
  name: tcp-pod
  labels:
    app: tcp-pod
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      tcpSocket:
        port: 3000
      initialDelaySeconds: 10
      periodSeconds: 5
Initial Delay: Kubernetes waits for 10 seconds before starting the liveness checks to allow the container to initialize.
Periodic Checks: Kubernetes performs a health check every 5 seconds.
Health Check: The liveness probe checks if a TCP connection to port 3000 can be established. If successful, the container is healthy.
Failure and Restart: If the liveness probe fails (TCP connection cannot be established), Kubernetes will restart the container to ensure it remains healthy and functional.
In this manifest the container declares port 8080 while the probe targets port 3000. If the TCP connection to port 3000 fails (e.g., the application inside the container is not responding or the port is not open), the liveness probe fails, and Kubernetes considers the container unhealthy and restarts it to attempt to recover it.
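If the goal is a passing check rather than a demonstration of failure, the probe port should match a port the application actually listens on. A minimal sketch, assuming the goproxy container listens on the declared containerPort 8080:

```yaml
livenessProbe:
  tcpSocket:
    port: 8080   # assumed to be the port the container listens on
  initialDelaySeconds: 10
  periodSeconds: 5
```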
The other two probes (readiness and startup) use the same syntax. Please try a hands-on exercise with the readiness probe yourself for better understanding; you can use the following snippet as a starting point.
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
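To try that snippet, it needs to sit inside a complete Pod manifest. One possible sketch, reusing the goproxy image from the TCP example and a hypothetical Pod name (readiness-tcp), and assuming the container listens on port 8080:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-tcp   # hypothetical name
  labels:
    app: readiness-tcp
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080   # assumed listening port
      initialDelaySeconds: 15
      periodSeconds: 10
```

Until the readiness probe passes, kubectl get po reports the Pod as 0/1 in the READY column. Unlike a liveness failure, a readiness failure does not restart the container; the Pod is only taken out of Service endpoints.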