Imagine you have a web application deployed in a Kubernetes cluster. As traffic to the application grows, the CPU and memory usage of your pods also increases. If you don't act quickly to scale up the number of replicas (instances of your application), your users might experience slow response times or even service outages. On the other hand, during periods of low traffic, running too many replicas wastes resources and increases operational costs. This is where autoscaling comes into play. In this blog post, we will delve into the concepts and hands-on implementation of autoscaling in Kubernetes, focusing on the Horizontal Pod Autoscaler (HPA). Let's get started.
Scaling in Kubernetes
When it comes to scaling applications, two primary methods are used: Horizontal Autoscaling and Vertical Autoscaling. Understanding the differences and use cases for each can help you make more informed decisions about how to manage the resources for your applications effectively.
Horizontal Autoscaling
Horizontal scaling, often referred to as scaling out, involves adding more instances (or replicas) of a service or application to handle increased load. In Kubernetes, this means increasing the number of pods running your application. This type of scaling is generally preferred for its flexibility and resilience. By distributing the load across multiple instances, you can achieve better performance and fault tolerance.
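Note that horizontal scaling is, at its core, just a change to a workload's replica count, which you can also perform by hand. As a minimal sketch (my-app is a placeholder Deployment name, not part of this post's example):
kubectl scale deployment my-app --replicas=5
The Horizontal Pod Autoscaler automates exactly this adjustment based on observed metrics.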
Key Benefits of Horizontal Scaling:
Improved Fault Tolerance: If one instance fails, others can continue to handle requests.
Enhanced Load Distribution: Traffic is spread across multiple instances, reducing the burden on any single pod.
Elasticity: Easy to add or remove instances based on demand, making it suitable for applications with fluctuating workloads.
In Kubernetes, horizontal scaling is typically managed using the Horizontal Pod Autoscaler (HPA), which adjusts the number of pod replicas based on metrics like CPU utilization, memory usage, or custom metrics.
Vertical Autoscaling
Vertical scaling, also known as scaling up, involves increasing the resources (CPU, memory) allocated to a single instance of a service or application. In Kubernetes, this means changing the resource requests and limits for a pod to provide it with more CPU and memory.
Key Benefits of Vertical Scaling:
Simplified Architecture: Fewer instances to manage can simplify application architecture and management.
Performance Boost: Increasing resources for a single instance can improve performance for applications that benefit from larger resource allocations.
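To make this concrete: where horizontal scaling changes the replica count, vertical scaling changes the resources each replica gets. As a rough sketch (again using the placeholder my-app, with illustrative values), you could raise a container's requests and limits imperatively:
kubectl set resources deployment my-app --requests=cpu=500m,memory=256Mi --limits=cpu=1,memory=512Mi
Keep in mind that changing a pod's resources traditionally requires the pod to be recreated, which is one reason horizontal scaling is usually the first choice for handling fluctuating traffic.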
Note: In Kubernetes, the Horizontal Pod Autoscaler (HPA) controller is built into the control plane and can be used in any cluster, although it needs a metrics source such as metrics-server (covered in the prerequisites below). Other autoscaling features, such as the Cluster Autoscaler, Vertical Pod Autoscaler (VPA), and Node Auto-Provisioning, are often provided as additional services or integrated features by managed offerings like AKS (Azure Kubernetes Service), GKE (Google Kubernetes Engine), and EKS (Amazon Elastic Kubernetes Service). In this blog post, we will focus on a hands-on implementation of HPA only.
Hands-on Example: Horizontal Pod Autoscaler (HPA)
Let's walk through a hands-on example of setting up HPA in Kubernetes.
Prerequisites
A Kubernetes cluster (you can use a local cluster like Minikube or Kind).
kubectl installed and configured to interact with your cluster.
metrics-server installed in your cluster to provide resource metrics. (We installed metrics-server in a previous blog, you can check it out here).
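To verify that metrics-server is serving resource metrics before moving on, you can ask for node metrics:
kubectl top nodes
If this prints CPU and memory usage for your nodes, the HPA will be able to read pod metrics.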
Step 1: Deploy an application
To demonstrate a HorizontalPodAutoscaler, you will first start a Deployment that runs a container using the hpa-example image, and expose it as a Service using the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
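Save the manifest and apply it. (The filename php-apache.yaml is just an assumed name here; use whatever you saved the manifest as.)
kubectl apply -f php-apache.yaml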
Step 2: Create a Horizontal Pod Autoscaler
Create an HPA for the php-apache deployment. This HPA will scale the pods based on CPU usage. For example, we want to maintain an average CPU utilization of 50% across all pods:
For the imperative way, we can use the command below:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
For the declarative way, we can use the following YAML file:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
Apply this YAML file to create an HPA.
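Assuming the manifest is saved as hpa.yaml (an illustrative filename):
kubectl apply -f hpa.yaml
Note that the imperative command names the HPA php-apache, while this manifest names it hpa-scaler. The remaining steps assume the declarative manifest, so they refer to hpa-scaler; adjust the name if you used the imperative command instead.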
Step 3: Generate Load to Test Autoscaling
To see the autoscaler in action, you need to generate some load on the php-apache service. In a separate terminal (so the load keeps running while you watch the HPA), use kubectl run to create a busybox pod that continuously makes requests to the php-apache service:
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
The load generator will continuously print OK! (the response from the hpa-example image), driving up CPU usage on the php-apache pods.
Step 4: Monitor Autoscaling
Watch the HPA status to see how it scales the number of pods:
kubectl get hpa hpa-scaler -w
We can see the HPA adjusting the number of replicas based on the load. We can also monitor the pods:
kubectl get pods -l run=php-apache
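Under the hood, the HPA controller compares the observed average utilization against the target and computes the desired replica count as:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
For example, if a single replica is running at 200% of its CPU request against our 50% target, the controller scales out to ceil[1 * (200 / 50)] = 4 replicas.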
Step 5: Clean Up
Once you are done with the hands-on exercise, you can clean up the resources:
kubectl delete hpa hpa-scaler
kubectl delete deployment php-apache
kubectl delete svc php-apache
kubectl delete pod load-generator
Conclusion
Autoscaling is a powerful feature in Kubernetes that ensures your applications can handle varying loads efficiently. After working through this hands-on example, you should have a solid understanding of how to set up and test the Horizontal Pod Autoscaler in your own Kubernetes cluster.