Autoscaling in Kubernetes - CKA

Imagine you have a web application deployed in a Kubernetes cluster. As traffic to the application grows, the CPU and memory usage of your pods also increases. If you don’t act quickly to scale up the number of replicas (instances of your application), your users might experience slow response times or even service outages. On the other hand, during periods of low traffic, running too many replicas wastes resources and increases operational costs. This is where autoscaling comes into play. In this blog post, we will delve into the concepts and a hands-on implementation of autoscaling in Kubernetes, exploring the Horizontal Pod Autoscaler (HPA). Let's get started.

Scaling in Kubernetes

When it comes to scaling applications, two primary methods are used: Horizontal Autoscaling and Vertical Autoscaling. Understanding the differences and use cases for each can help you make more informed decisions about how to manage the resources for your applications effectively.

Horizontal Autoscaling

Horizontal scaling, often referred to as scaling out, involves adding more instances (or replicas) of a service or application to handle increased load. In Kubernetes, this means increasing the number of pods running your application. This type of scaling is generally preferred for its flexibility and resilience. By distributing the load across multiple instances, you can achieve better performance and fault tolerance.

Key Benefits of Horizontal Scaling:

  • Improved Fault Tolerance: If one instance fails, others can continue to handle requests.

  • Enhanced Load Distribution: Traffic is spread across multiple instances, reducing the burden on any single pod.

  • Elasticity: Easy to add or remove instances based on demand, making it suitable for applications with fluctuating workloads.

In Kubernetes, horizontal scaling is typically managed using the Horizontal Pod Autoscaler (HPA), which adjusts the number of pod replicas based on metrics like CPU utilization, memory usage, or custom metrics.
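
For context, the HPA automates what you could otherwise do by hand with kubectl scale. For a hypothetical my-app deployment, manual horizontal scaling looks like this; the HPA performs the same adjustment continuously, based on observed metrics:

kubectl scale deployment my-app --replicas=5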

Vertical Autoscaling

Vertical scaling, also known as scaling up, involves increasing the resources (CPU, memory) allocated to a single instance of a service or application. In Kubernetes, this means changing the resource requests and limits for a pod to provide it with more CPU and memory.
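
As a minimal sketch (the values below are hypothetical), vertically scaling a container means editing the resources block in its pod spec, for example raising its requests and limits:

resources:
  requests:
    cpu: 500m      # raised from 250m
    memory: 512Mi  # raised from 256Mi
  limits:
    cpu: "1"       # raised from 500m
    memory: 1Gi    # raised from 512Mi

Note that editing these fields on a Deployment triggers a rollout, so the pods are recreated with the new allocations.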

Key Benefits of Vertical Scaling:

  • Simplified Architecture: Fewer instances to manage can simplify application architecture and management.

  • Performance Boost: Increasing resources for a single instance can improve performance for applications that benefit from larger resource allocations.

Note: In Kubernetes, the Horizontal Pod Autoscaler (HPA) is natively supported and can be used in any Kubernetes cluster without additional setup. Other autoscaling features, such as the Cluster Autoscaler, Vertical Pod Autoscaler (VPA), and Node Auto-Provisioning, are often provided as additional services or integrated features by cloud providers like AKS (Azure Kubernetes Service), GKE (Google Kubernetes Engine), and EKS (Amazon Elastic Kubernetes Service). In this blog post, we will focus on a hands-on implementation of HPA only.

Hands-on Example: Horizontal Pod Autoscaler (HPA)

Let's walk through a hands-on example of setting up HPA in Kubernetes.

Prerequisites

  • A Kubernetes cluster (You can use a local cluster like Minikube or Kind).

  • kubectl installed and configured to interact with your cluster.

  • metrics-server installed in your cluster to provide resource metrics. (We covered installing metrics-server in a previous blog post.) A quick way to verify it is shown below.
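
Before starting, you can confirm that metrics-server is serving data. If the command below returns CPU and memory figures instead of an error, the metrics pipeline is healthy:

kubectl top nodes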

Step 1: Deploy an application

To demonstrate a HorizontalPodAutoscaler, you will first start a Deployment that runs a container using the hpa-example image, and expose it as a Service using the following manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
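
Save the manifest (here we assume the file name php-apache.yaml) and apply it, then confirm that the Deployment and Service were created:

kubectl apply -f php-apache.yaml
kubectl get deployment,service php-apache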

Step 2: Create a Horizontal Pod Autoscaler

Create an HPA for the php-apache deployment. This HPA will scale the pods based on CPU usage. In this example, we want to maintain an average CPU utilization of 50% across all pods.

For the imperative way, we can use the command below:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

For the declarative way, we can use the following YAML file (we name the HPA php-apache to match what the imperative command creates):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

Apply this YAML file to create an HPA.
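
For example, assuming the manifest above is saved as hpa.yaml:

kubectl apply -f hpa.yaml
kubectl get hpa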

Step 3: Generate Load to Test Autoscaling

To see the autoscaler in action, you need to generate some load on the php-apache service. You can use kubectl run to create a busybox pod that continuously makes requests to the php-apache service:

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

The load generator will continuously print an OK! message as each request succeeds, driving up CPU usage on the php-apache pods. Press Ctrl+C to stop it when you are done.

Step 4: Monitor Autoscaling

Watch the HPA status to see how it scales the number of pods:

kubectl get hpa php-apache -w
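
The output should resemble the following (exact values will differ as the load ramps up and down, and the TARGETS format varies slightly between kubectl versions):

NAME         REFERENCE               TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   cpu: 68%/50%   1         10        4          3m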

We can see the HPA adjusting the number of replicas based on the load. We can also monitor the pods:

kubectl get pods -l run=php-apache

Step 5: Clean Up

Once you are done with the hands-on exercise, clean up the resources. The load-generator pod removes itself on exit thanks to the --rm flag, so the last command is only needed if it is still running:

kubectl delete hpa php-apache
kubectl delete deployment php-apache
kubectl delete svc php-apache
kubectl delete pod load-generator

Conclusion

Autoscaling is a powerful feature in Kubernetes that ensures your applications can handle varying loads efficiently. By following this hands-on example, you should now have a solid understanding of how to set up and test the Horizontal Pod Autoscaler in your Kubernetes cluster.