How to Upgrade a Multi-Node Kubernetes Cluster - CKA

Upgrading your Kubernetes cluster is essential for ensuring security, stability, and access to the latest features. New releases bring critical fixes, performance improvements, and compatibility with evolving tools, helping to keep your infrastructure efficient and secure. Regular upgrades prevent potential vulnerabilities and performance issues, ensuring your workloads run smoothly in a fast-changing cloud environment.

Let's dive into this with an example.

Kubernetes Versioning

Kubernetes follows semantic versioning for its release cycle, and each version is represented by three numbers in the format X.Y.Z, where:

  • X: Major Release

  • Y: Minor Release

  • Z: Patch Release

Major Release

Example: 1.0.0, 2.0.0
The first number (X) is the major version. Major releases introduce breaking changes, meaning components from the previous major release may not be compatible with the new version. For example, certain APIs or features might be deprecated or removed altogether.

Minor Release

Example: 1.29.0, 1.30.0
The second number (Y) is the minor version. A minor release introduces new features, improvements, and deprecations while maintaining backward compatibility. Minor releases ship roughly three times a year, about every four months.

Patch Release

Example: 1.29.2, 1.29.3
The third number (Z) is the patch version. Patch releases focus on bug fixes, security updates, and minor improvements. These do not introduce new features but ensure the platform remains stable and secure. Patch releases are fully backward compatible and typically issued as needed to address specific problems.

Upgrade process: one minor version at a time

Kubernetes supports upgrading only one minor version at a time, for example 1.28 → 1.29 → 1.30; you cannot jump straight from 1.28 to 1.30. Only the three most recent minor versions receive updates, so an older release such as 1.28.2 is out of support: no bug fixes or new patches will be issued for it.
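To make the one-minor-version rule concrete, here is a small illustrative shell check (not a kubeadm feature; the versions are the ones used later in this post):

```shell
# Illustrative helper: verify the target version is at most one minor
# release ahead of the current one before attempting an upgrade.
current="1.29.6"
target="1.30.8"
cur_minor=$(echo "$current" | cut -d. -f2)
tgt_minor=$(echo "$target" | cut -d. -f2)
delta=$((tgt_minor - cur_minor))
if [ "$delta" -le 1 ]; then
  echo "OK: $current -> $target moves at most one minor version"
else
  echo "ERROR: $current -> $target skips a minor version" >&2
fi
```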

Upgrade steps

  1. Upgrade master node

  2. Upgrade worker node

Note: while the master node is down, management operations (the API server, kubectl) are unavailable, but existing pods on worker nodes continue to run.

When we upgrade any node, whether master or worker, the major steps are:

  1. Cordon the node - mark it unschedulable so no new workloads are placed on it during the upgrade

  2. Drain the node - evict the workloads running on it (kubectl drain cordons the node automatically before evicting)

  3. Upgrade - install the new version on the node

  4. Uncordon the node - mark it schedulable again so workloads can return

Refer to the official Kubernetes kubeadm upgrade documentation for the detailed steps.

Upgrade Strategies

All-at-Once Upgrade

This method upgrades the entire Kubernetes cluster at once. While it’s the fastest, it can result in downtime and is generally not suitable for production environments.

How it works:

  1. Shut down all workloads and control plane components.

  2. Upgrade Kubernetes binaries across the cluster.

  3. Restart the cluster.

Risks:

  • Complete downtime during the upgrade.

  • Can disrupt critical services.

Rolling Update

The rolling update strategy upgrades cluster nodes one at a time or in small batches, ensuring minimal downtime. Some nodes remain operational while others are upgraded.

How it works:

  1. Upgrade the control plane (API server, etcd).

  2. Upgrade worker nodes individually:

    • Cordon the node (prevent new pods from being scheduled on it).

    • Drain the node (evict existing pods).

    • Upgrade the Kubernetes binaries.

    • Uncordon the node (allow scheduling again).

  3. Repeat the process for each node.
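The worker-node loop above can be sketched in shell. This is a hedged illustration, not part of any official tooling: the node names worker1 and worker2 are hypothetical, and DRY_RUN=1 makes the script only print each kubectl command so the sequence can be reviewed before touching a real cluster.

```shell
# Hypothetical rolling-upgrade loop. With DRY_RUN=1 every kubectl command is
# printed instead of executed; set DRY_RUN=0 to run it against a real cluster.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

for node in worker1 worker2; do
  run kubectl cordon "$node"                      # stop new pods landing here
  run kubectl drain "$node" --ignore-daemonsets   # evict existing pods
  # ...upgrade kubeadm, kubelet, and kubectl on the node (e.g. over SSH)...
  run kubectl uncordon "$node"                    # allow scheduling again
done
```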

Blue-Green Deployment

In this strategy, you run two environments: blue (the current version) and green (the upgraded one). Once the upgrade is complete and validated, traffic is switched from the blue environment to the green.

How it works:

  1. Set up a separate green environment with the new version.

  2. Deploy updated workloads.

  3. Validate the green environment.

  4. Switch traffic from blue to green using load balancers or DNS.

  5. Decommission the blue environment once the green is stable.
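In Kubernetes, the traffic switch in step 4 is often implemented by repointing a Service selector. A minimal sketch, assuming a hypothetical app my-app with separate Deployments labeled version: blue and version: green:

```yaml
# Hypothetical Service for my-app. Changing selector.version from "blue" to
# "green" (e.g. via kubectl apply or kubectl patch) shifts traffic to the
# upgraded environment; reverting it rolls traffic back just as quickly.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: green   # was "blue" before the cutover
  ports:
    - port: 80
      targetPort: 8080
```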

Advantages:

  • Zero downtime.

  • Easy rollback if issues arise.

Disadvantages:

  • Requires double the resources during the upgrade process.

Note: Rolling updates are the most commonly used strategy in production environments. Let’s proceed with a demo of the rolling update process.

Rolling Update Demo

You can also follow the official Kubernetes documentation: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#changing-the-package-repository

Prerequisite: Set up a cluster on version 1.29.6 (or any version of your choice) on cloud VMs; we will upgrade it to 1.30.8. You can check my blog for a step-by-step setup of a Kubernetes cluster using kubeadm.

Upgrade Master Node

  1. Determine the version to upgrade to

     sudo apt update
     sudo apt-cache madison kubeadm
    

    The output may only show 1.29.x series upgrades. To upgrade to 1.30.x, we need to modify the Kubernetes package repositories.

  2. Changing Kubernetes package repositories

     sudo nano /etc/apt/sources.list.d/kubernetes.list
    

    In the repository URL, replace v1.29 with v1.30.

  3. Check available versions again

     sudo apt-cache madison kubeadm
    

    We will update to version 1.30.8-1.1.

  4. Upgrade kubeadm

      sudo apt-mark unhold kubeadm && \
      sudo apt-get update && sudo apt-get install -y kubeadm='1.30.x-*' && \
      sudo apt-mark hold kubeadm
    

    Substitute the exact version you want to upgrade to; in our case, the value is 1.30.8-1.1.

  5. Check Kubeadm version

     kubeadm version
    

  6. Verify the upgrade plan

     sudo kubeadm upgrade plan
    

    This command checks that our cluster can be upgraded and fetches the versions you can upgrade to. It also shows a table with the component config version states.

  7. Apply the upgrade

     sudo kubeadm upgrade apply v1.30.8
    

  8. Upgrade the CNI Plugin (e.g., Calico)

    Our cluster is already running an up-to-date Calico (CNI) plugin, so no extra action is needed here. If your CNI provider requires an upgrade, follow its documentation.

  9. Check the node versions

    Running kubectl get nodes still shows v1.29, but that is the kubelet version reported by each node, not the control-plane version. To check the control-plane component version, inspect the kube-apiserver manifest:

    sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml

  10. Drain the node

    kubectl drain master --ignore-daemonsets
    

    We can see that scheduling is now disabled on the master node (SchedulingDisabled).

  11. Upgrade kubelet and kubectl

    sudo apt-mark unhold kubelet kubectl && \
    sudo apt-get update && sudo apt-get install -y kubelet='1.30.x-*' kubectl='1.30.x-*' && \
    sudo apt-mark hold kubelet kubectl
    

    Use the exact version, 1.30.8-1.1, in place of 1.30.x.

  12. Restart the kubelet

    sudo systemctl daemon-reload
    sudo systemctl restart kubelet
    
  13. Uncordon the node

    kubectl uncordon master
    
  14. Finally, run kubectl get nodes and confirm that the master node is upgraded.
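The manifest check from step 9 can be narrowed down to just the API server image tag. The sketch below runs against a temporary mock manifest so it is runnable anywhere; on the control plane node you would run the same grep (with sudo) on the real file at /etc/kubernetes/manifests/kube-apiserver.yaml:

```shell
# Demonstrated on a mock manifest; on the node, grep the real file under
# /etc/kubernetes/manifests/ instead.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
spec:
  containers:
  - command:
    - kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.30.8
EOF
# The image tag after the last ":" is the running kube-apiserver version.
apiserver_version=$(grep 'image:' "$manifest" | awk -F: '{print $NF}')
echo "kube-apiserver is running $apiserver_version"
rm -f "$manifest"
```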

Upgrade Worker Nodes

  1. SSH into your worker node

  2. Changing Kubernetes package repositories

     sudo nano /etc/apt/sources.list.d/kubernetes.list
    

    In the repository URL, change v1.29 to v1.30.

  3. Upgrade kubeadm

      sudo apt-mark unhold kubeadm && \
      sudo apt-get update && sudo apt-get install -y kubeadm='1.30.x-*' && \
      sudo apt-mark hold kubeadm
    

    Substitute the exact version you want to upgrade to; in our case, 1.30.8-1.1.

  4. Upgrade the local kubelet configuration

     sudo kubeadm upgrade node
    
  5. Drain the node

     kubectl drain <node-to-drain> --ignore-daemonsets
    

    Replace <node-to-drain> with the worker node's name. Run this command from a machine that has kubectl access to the cluster (for example, the master node).

  6. Upgrade kubelet and kubectl

      sudo apt-mark unhold kubelet kubectl && \
      sudo apt-get update && sudo apt-get install -y kubelet='1.30.x-*' kubectl='1.30.x-*' && \
      sudo apt-mark hold kubelet kubectl
    
  7. Restart the kubelet

      sudo systemctl daemon-reload
      sudo systemctl restart kubelet
    
  8. Uncordon the node

     kubectl uncordon <node-to-uncordon>
    

    Replace <node-to-uncordon> with the worker node's name.

Repeat the same steps for the second worker node. Afterwards, kubectl get nodes should report all nodes on the new version.
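As a final sanity check, one can verify that every node reports the target kubelet version. This is an illustrative sketch using sample output; in practice, replace the sample text with the real output of kubectl get nodes --no-headers:

```shell
# Hypothetical final check: every node should report the target version in
# its last column (the kubelet version shown by kubectl get nodes).
target="v1.30.8"
sample="master    Ready   control-plane   12d   v1.30.8
worker1   Ready   <none>          12d   v1.30.8
worker2   Ready   <none>          12d   v1.30.8"
if echo "$sample" | awk -v t="$target" '$NF != t { bad = 1 } END { exit bad }'; then
  status="All nodes are on $target"
else
  status="Some nodes are not yet on $target"
fi
echo "$status"
```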

Conclusion

Congratulations! We have successfully upgraded both the master and worker nodes from version 1.29.6 to 1.30.8 using a rolling update strategy. This process helps minimize downtime and ensures our cluster remains secure and up to date with the latest features.

Thank you for reading my blog! Please try this hands-on tutorial yourself to gain a better understanding.