Overview
Testing out a new feature or upgrade in production is a challenging task. It is paramount to roll out changes frequently, but without affecting the end-user experience. We want to test changes on real traffic while retaining the ability to quickly roll them back in the event of any unforeseen issues.
When you add a canary deployment to a Kubernetes cluster, it is managed by a Service through selectors and labels. The Service routes traffic to the pods that carry a specific label, which makes it easy to add or remove deployments.
How Canary Deployments Work
Canary deployments involve running two versions of the application simultaneously. The old version is referred to as “the stable” and the new one as “the canary.”
Here’s a step-by-step explanation of how canary deployment works:
Initial Deployment
The existing version of the software is currently running in the production environment.
Developers create a new version or release with updates, bug fixes, or new features.
Deployment to a Subset (Canary Group)
Instead of deploying the new version to the entire user base, it is first released to a small subset of users or servers. This subset is often referred to as the “canary group.”
The canary group typically represents a small percentage of the overall user base, allowing for a controlled and gradual release.
Monitoring and Testing
The performance, stability, and functionality of the new version are closely monitored within the canary group.
Automated testing and monitoring tools are often used to detect issues such as errors, crashes, or performance degradation.
Incremental Rollout
If the new version proves to be stable and performs well within the canary group, the deployment is gradually expanded to include a larger percentage of users.
This incremental rollout continues until the new version is deployed to the entire user base.
Rollback or Remediation
If issues are detected during the canary deployment, developers can quickly roll back the changes or implement fixes before the wider rollout.
This provides a safety net to minimize the impact of potential problems on the entire user base.
Completion
Once the new version has been successfully deployed to the entire user base and no significant issues are detected, the canary deployment process is complete.
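To make the rollback step concrete: in the Kubernetes demo later in this post, backing out simply means scaling the canary deployment to zero while the stable deployment keeps serving. Here is a minimal sketch using the deployment names from that demo (my-app-v1 as the stable, my-app-v2 as the canary):
# Stop routing traffic to the canary and restore full stable capacity
kubectl scale --replicas=0 deploy my-app-v2
kubectl scale --replicas=10 deploy my-app-v1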
Canary Deployments in Kubernetes
Basically, a canary deployment runs a copy of the production environment alongside the original, with a load balancer routing user traffic between the available environments based on defined parameters.
The canary deployment is controlled by a Service using selectors and labels. The Service forwards traffic to the pods carrying the matching labels, which makes it simple to add or remove deployments.
First, a specific percentage of users is directed to the new application. The idea is to gradually roll out the new version to a subset of users or nodes, monitor its performance and stability, and then progressively deploy it to the entire system if everything looks good. This approach helps catch potential issues early and allows for quick rollbacks if problems arise.
For canary deployments, the selectors and labels used in the YAML manifests differ from those used in the original deployment.
A Service is created to expose all the pods or replicas through a single IP or name. An Ingress configuration can then define a set of rules that allow inbound connections to reach the cluster's Services.
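The mechanism hinges on the Service selector matching only the label that both versions share, while each Deployment carries an additional version label that the selector deliberately omits. A minimal sketch (the names mirror the demo manifests shown later in this post):
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app   # matches pods of BOTH deployments; no "version" key here,
                  # so traffic splits in proportion to each version's replica count
  ports:
  - port: 80
    targetPort: http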
Why EKS
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that eliminates the need to install, operate, and maintain your own Kubernetes control plane on Amazon Web Services (AWS).
Features of Amazon EKS
The following are key features of Amazon EKS:
Secure networking and authentication
Amazon EKS integrates your Kubernetes workloads with AWS networking and security services. It also integrates with AWS Identity and Access Management (IAM) to provide authentication for your Kubernetes clusters.
Easy cluster scaling
Amazon EKS enables you to scale your Kubernetes clusters up and down easily based on the demand of your workloads. Amazon EKS supports horizontal Pod autoscaling based on CPU or custom metrics, and cluster autoscaling based on the demand of the entire workload.
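As an illustration, CPU-based horizontal Pod autoscaling can be enabled with a single kubectl command (the deployment name and thresholds here are hypothetical, and metrics-server must be installed in the cluster):
# Keep between 2 and 10 replicas, targeting 80% average CPU utilization
kubectl autoscale deployment my-app --cpu-percent=80 --min=2 --max=10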
Managed Kubernetes experience
You can manage your Kubernetes clusters using eksctl, the AWS Management Console, the AWS Command Line Interface (AWS CLI), the API, kubectl, and Terraform.
High availability
Amazon EKS provides high availability for your control plane across multiple Availability Zones.
Integration with AWS services
Amazon EKS integrates with other AWS services, providing a comprehensive platform for deploying and managing your containerized applications. You can also more easily troubleshoot your Kubernetes workloads with various observability tools.
Reference: https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html
Prerequisites
- Kubernetes cluster set up and configured.
- kubectl command-line tool installed.
Environment setup
I created the EKS cluster using the eksctl command-line utility with the following details:
- Cluster version: 1.27
- Region: ap-south-1
- Node type: t3.medium
- Number of nodes: 3
eksctl create cluster \
  --name my-demo-cluster \
  --version 1.27 \
  --region ap-south-1 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4 \
  --managed
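Cluster creation takes several minutes. When it finishes, eksctl adds the new cluster's credentials to your kubeconfig, so you can verify the worker nodes right away:
# All three nodes should report a STATUS of Ready
kubectl get nodes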
Architecture
Steps to follow
- 10 replicas of version 1 are serving traffic.
- Deploy 1 replica of version 2 and scale version 1 down to 9 replicas at the same time, so the canary receives ~10% of traffic (1 pod out of 10).
- Wait to confirm that version 2 is stable and not throwing unexpected errors.
- Scale version 2 up to 10 replicas.
- Wait until all instances are ready.
- Shut down version 1.
Actual implementation
Deploy the first application
kubectl apply -f app-v1.yaml
(https://github.com/vivekpophale/canaryexample/blob/main/appv1.yml)
#app-v1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v1
  labels:
    app: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
      version: v1.0.0
  template:
    metadata:
      labels:
        app: my-app
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9101"
    spec:
      containers:
      - name: my-app
        image: containersol/k8s-deployment-strategies
        ports:
        - name: http
          containerPort: 8080
        - name: probe
          containerPort: 8086
        env:
        - name: VERSION
          value: v1.0.0
        livenessProbe:
          httpGet:
            path: /live
            port: probe
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: probe
          periodSeconds: 5
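Before moving on, you can wait for all 10 replicas of version 1 to become available:
# Blocks until the rollout completes
kubectl rollout status deployment my-app-v1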
Deploy the service
kubectl apply -f service.yaml
(https://github.com/vivekpophale/canaryexample/blob/main/service.yml)
#service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: my-app
Test if the deployment was successful
To see the deployment in action, open a new terminal and run a watch command. It gives you a live view of the pods as the rollout progresses. Note that the Service's selector matches only the app: my-app label and deliberately omits the version label, so traffic is balanced across the pods of both deployments.
watch kubectl get po
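To watch which version actually serves each request, you can also curl the service in a loop. The sketch below uses kubectl port-forward so it works without knowing the node IPs, and assumes the demo image echoes its VERSION value in each response:
# Forward local port 8080 to the my-app service (runs in the background)
kubectl port-forward svc/my-app 8080:80 &
# Fire requests continuously; each response should name the serving version
while sleep 0.5; do curl -s localhost:8080; echo; done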
Then deploy version 2 of the application and scale version 1 down to 9 replicas at the same time:
kubectl apply -f app-v2.yaml
kubectl scale --replicas=9 deploy my-app-v1
(https://github.com/vivekpophale/canaryexample/blob/main/appv2.yml)
#app-v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v2
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
      version: v2.0.0
  template:
    metadata:
      labels:
        app: my-app
        version: v2.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9101"
    spec:
      containers:
      - name: my-app
        image: containersol/k8s-deployment-strategies
        ports:
        - name: http
          containerPort: 8080
        - name: probe
          containerPort: 8086
        env:
        - name: VERSION
          value: v2.0.0
        livenessProbe:
          httpGet:
            path: /live
            port: probe
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: probe
          periodSeconds: 5
Only one pod with the new version should be running. You can test whether the second deployment was successful by re-running the curl loop from earlier; roughly one response in ten should now report version v2.0.0.
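For a quick sanity check of the traffic split, count the canary's share over a fixed number of requests (this assumes the port-forward from earlier is still running and that responses contain the version string):
# Send 100 requests and count how many were answered by the canary
for i in $(seq 1 100); do curl -s localhost:8080; echo; done | grep -c v2.0.0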
If you are happy with it, scale version 2 up to 10 replicas:
kubectl scale --replicas=10 deploy my-app-v2
Then, when all pods are running, you can safely delete the old deployment
kubectl delete deploy my-app-v1
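At this point, only version 2 pods should remain behind the Service; a quick label query confirms it:
# Every pod listed should carry the label version=v2.0.0
kubectl get pods -l app=my-app --show-labels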
Conclusion
This demo illustrated the benefits of a canary deployment: it lets you test the capacity of a new version in the production environment, with a safe rollback strategy if issues are found. By slowly ramping up the load, you can monitor and capture metrics on how the new version impacts the production environment. This is an alternative to building an entirely separate capacity-testing environment, because the canary runs in an environment that is as production-like as it can be.
Reference: https://martinfowler.com/bliki/CanaryRelease.html