Why Blue-Green Deployments?

Before we jump into the nitty-gritty, let's quickly recap why blue-green deployments are the cat's pajamas:

  • Zero downtime deployments
  • Easy rollbacks if things go south
  • Ability to test in a production-like environment
  • Reduced risk and stress for your ops team

Now, imagine doing all of this with the power of Kubernetes Operators. Excited? You should be!

Setting the Stage: Our Custom Controller

Our mission, should we choose to accept it (and we do), is to create a custom controller that manages blue-green deployments. This controller will watch for changes to our custom resource and orchestrate the deployment process.

First things first, let's define our custom resource:

apiVersion: mycompany.com/v1
kind: BlueGreenDeployment
metadata:
  name: my-awesome-app
spec:
  replicas: 3
  image: myregistry.com/my-awesome-app:v1

Nothing too fancy here - it looks like a stripped-down Deployment, with a twist: it's our custom resource type. We only declare a replica count and an image; the controller generates the actual blue and green Deployments from those two fields.
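Before the cluster will accept that resource, the custom type has to be registered with a CustomResourceDefinition. A minimal CRD sketch might look like the following - the group, field names, and status shape simply mirror the examples in this post, so adjust to taste:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # Name must be <plural>.<group>
  name: bluegreendeployments.mycompany.com
spec:
  group: mycompany.com
  names:
    kind: BlueGreenDeployment
    listKind: BlueGreenDeploymentList
    plural: bluegreendeployments
    singular: bluegreendeployment
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}   # required for r.Status().Update() in the controller
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
              image:
                type: string
          status:
            type: object
            properties:
              nodes:
                type: array
                items:
                  type: string
```

With this installed, `kubectl apply` of a BlueGreenDeployment manifest will validate against the schema above.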

The Heart of the Matter: Controller Logic

Now, let's dive into the controller logic. We'll be using Go because, well, it's Go-rgeous (sorry, couldn't resist).


package controller

import (
	"context"
	"reflect"

	"github.com/go-logr/logr"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	mycompanyv1 "github.com/mycompany/api/v1"
)

type BlueGreenReconciler struct {
	client.Client
	Log    logr.Logger
	Scheme *runtime.Scheme
}

func (r *BlueGreenReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	log := r.Log.WithValues("bluegreen", req.NamespacedName)

	// Fetch the BlueGreenDeployment instance
	blueGreen := &mycompanyv1.BlueGreenDeployment{}
	err := r.Get(ctx, req.NamespacedName, blueGreen)
	if err != nil {
		if errors.IsNotFound(err) {
			// Object not found, return.  Created objects are automatically garbage collected.
			return reconcile.Result{}, nil
		}
		// Error reading the object - requeue the request.
		return reconcile.Result{}, err
	}

	// Check if the deployment already exists, if not create a new one
	found := &appsv1.Deployment{}
	err = r.Get(ctx, types.NamespacedName{Name: blueGreen.Name + "-blue", Namespace: blueGreen.Namespace}, found)
	if err != nil && errors.IsNotFound(err) {
		// Define a new deployment
		dep := r.deploymentForBlueGreen(blueGreen, "-blue")
		log.Info("Creating a new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
		err = r.Create(ctx, dep)
		if err != nil {
			log.Error(err, "Failed to create new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
			return reconcile.Result{}, err
		}
		// Deployment created successfully - return and requeue
		return reconcile.Result{Requeue: true}, nil
	} else if err != nil {
		log.Error(err, "Failed to get Deployment")
		return reconcile.Result{}, err
	}

	// Ensure the deployment size is the same as the spec
	size := blueGreen.Spec.Replicas
	if *found.Spec.Replicas != size {
		found.Spec.Replicas = &size
		err = r.Update(ctx, found)
		if err != nil {
			log.Error(err, "Failed to update Deployment", "Deployment.Namespace", found.Namespace, "Deployment.Name", found.Name)
			return reconcile.Result{}, err
		}
		// Spec updated - return and requeue
		return reconcile.Result{Requeue: true}, nil
	}

	// Update the BlueGreenDeployment status with the pod names
	// List the pods for this deployment
	podList := &corev1.PodList{}
	listOpts := []client.ListOption{
		client.InNamespace(blueGreen.Namespace),
		client.MatchingLabels(labelsForBlueGreen(blueGreen.Name)),
	}
	if err = r.List(ctx, podList, listOpts...); err != nil {
		log.Error(err, "Failed to list pods", "BlueGreenDeployment.Namespace", blueGreen.Namespace, "BlueGreenDeployment.Name", blueGreen.Name)
		return reconcile.Result{}, err
	}
	podNames := getPodNames(podList.Items)

	// Update status.Nodes if needed
	if !reflect.DeepEqual(podNames, blueGreen.Status.Nodes) {
		blueGreen.Status.Nodes = podNames
		err := r.Status().Update(ctx, blueGreen)
		if err != nil {
			log.Error(err, "Failed to update BlueGreenDeployment status")
			return reconcile.Result{}, err
		}
	}

	return reconcile.Result{}, nil
}

// deploymentForBlueGreen returns a bluegreen Deployment object
func (r *BlueGreenReconciler) deploymentForBlueGreen(m *mycompanyv1.BlueGreenDeployment, suffix string) *appsv1.Deployment {
	ls := labelsForBlueGreen(m.Name)
	// Tag each stack with its color so the two Deployments' selectors don't
	// overlap and a Service can target one side at a time.
	ls["color"] = suffix[1:] // "-blue" -> "blue"
	replicas := m.Spec.Replicas

	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      m.Name + suffix,
			Namespace: m.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: ls,
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: ls,
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Image: m.Spec.Image,
						Name:  "bluegreen",
						Ports: []corev1.ContainerPort{{
							ContainerPort: 8080,
							Name:          "bluegreen",
						}},
					}},
				},
			},
		},
	}
	// Set the BlueGreenDeployment as the owner and controller so the
	// Deployment is garbage collected with it. SetControllerReference can
	// fail, so don't silently drop the error.
	if err := controllerutil.SetControllerReference(m, dep, r.Scheme); err != nil {
		r.Log.Error(err, "Failed to set controller reference", "Deployment.Name", dep.Name)
	}
	return dep
}

// labelsForBlueGreen returns the labels for selecting the resources
// belonging to the given bluegreen CR name.
func labelsForBlueGreen(name string) map[string]string {
	return map[string]string{"app": "bluegreen", "bluegreen_cr": name}
}

// getPodNames returns the pod names of the array of pods passed in
func getPodNames(pods []corev1.Pod) []string {
	var podNames []string
	for _, pod := range pods {
		podNames = append(podNames, pod.Name)
	}
	return podNames
}

Whew! That's a chunk of code, but let's break it down:

  1. We define a BlueGreenReconciler struct that implements the Reconcile method.
  2. In the Reconcile method, we fetch our custom resource and check if a deployment exists.
  3. If the deployment doesn't exist, we create a new one using deploymentForBlueGreen.
  4. We ensure the deployment size matches our spec and update if necessary.
  5. Finally, we update the status of our custom resource with the pod names.
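One piece is missing from the listing: main.go later in the post calls SetupWithManager, which we never define. Assuming the controller-runtime builder API, a minimal version registers the reconciler and watches both our custom resource and the Deployments it owns (so a change to either triggers a reconcile):

```go
package controllers

import (
	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"

	mycompanyv1 "github.com/mycompany/api/v1"
)

// SetupWithManager wires the reconciler into the manager. Owns() works because
// deploymentForBlueGreen sets an owner reference on each Deployment it builds.
func (r *BlueGreenReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&mycompanyv1.BlueGreenDeployment{}).
		Owns(&appsv1.Deployment{}).
		Complete(r)
}
```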

The Secret Sauce: Blue-Green Magic

Now, here's where the blue-green deployment magic happens. We need to add logic to create both blue and green deployments, and switch between them. Let's enhance our controller:


func (r *BlueGreenReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// ... (previous code)

	// Create or update blue deployment
	blueDeployment := r.deploymentForBlueGreen(blueGreen, "-blue")
	if err := r.createOrUpdateDeployment(ctx, blueDeployment); err != nil {
		return reconcile.Result{}, err
	}

	// Create or update green deployment
	greenDeployment := r.deploymentForBlueGreen(blueGreen, "-green")
	if err := r.createOrUpdateDeployment(ctx, greenDeployment); err != nil {
		return reconcile.Result{}, err
	}

	// Check if it's time to switch
	if shouldSwitch(blueGreen) {
		if err := r.switchTraffic(ctx, blueGreen); err != nil {
			return reconcile.Result{}, err
		}
	}

	// ... (rest of the code)
}

func (r *BlueGreenReconciler) createOrUpdateDeployment(ctx context.Context, dep *appsv1.Deployment) error {
	// Check if the deployment already exists
	found := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: dep.Name, Namespace: dep.Namespace}, found)
	if err != nil && errors.IsNotFound(err) {
		// Create the deployment
		err = r.Create(ctx, dep)
		if err != nil {
			return err
		}
	} else if err != nil {
		return err
	} else {
		// Update mutable fields only - Spec.Selector is immutable after
		// creation, so we copy just the replica count and pod template.
		found.Spec.Replicas = dep.Spec.Replicas
		found.Spec.Template = dep.Spec.Template
		err = r.Update(ctx, found)
		if err != nil {
			return err
		}
	}
	return nil
}

func shouldSwitch(bg *mycompanyv1.BlueGreenDeployment) bool {
	// Implement your logic to determine if it's time to switch
	// This could be based on a timer, manual trigger, or other criteria
	return false
}

func (r *BlueGreenReconciler) switchTraffic(ctx context.Context, bg *mycompanyv1.BlueGreenDeployment) error {
	// Implement the logic to switch traffic between blue and green
	// This could involve updating a service or ingress resource
	return nil
}

This enhanced version creates both blue and green deployments and includes placeholder functions for determining when to switch and how to switch the traffic.

Putting It All Together

Now that we have our controller logic, we need to set up the operator. Here's a basic main.go file to get us started:


package main

import (
	"flag"
	"os"

	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"

	mycompanyv1 "github.com/mycompany/api/v1"
	"github.com/mycompany/controllers"
)

var (
	scheme   = runtime.NewScheme()
	setupLog = ctrl.Log.WithName("setup")
)

func init() {
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))
	utilruntime.Must(mycompanyv1.AddToScheme(scheme))
}

func main() {
	var metricsAddr string
	var enableLeaderElection bool
	flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
	flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
		"Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.")
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseDevMode(true)))

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:             scheme,
		MetricsBindAddress: metricsAddr,
		LeaderElection:     enableLeaderElection,
		Port:               9443,
	})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}

	if err = (&controllers.BlueGreenReconciler{
		Client: mgr.GetClient(),
		Log:    ctrl.Log.WithName("controllers").WithName("BlueGreen"),
		Scheme: mgr.GetScheme(),
	}).SetupWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "BlueGreen")
		os.Exit(1)
	}

	setupLog.Info("starting manager")
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}

Deployment and Testing

Now that we have our operator ready, it's time to deploy and test it. Here's a quick checklist:

  1. Build your operator image and push it to a container registry.
  2. Create the necessary RBAC roles and bindings for your operator.
  3. Deploy your operator to your Kubernetes cluster.
  4. Create a BlueGreenDeployment custom resource and watch the magic happen!
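Step 2 deserves an example. The operator needs permission to manage our custom resource (including its status subresource), the Deployments and Pods it orchestrates, and the Service it flips. A starting-point ClusterRole might look like this - bind it to the operator's ServiceAccount with a ClusterRoleBinding, and tighten the verbs for production:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: bluegreen-operator
rules:
- apiGroups: ["mycompany.com"]
  resources: ["bluegreendeployments"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["mycompany.com"]
  resources: ["bluegreendeployments/status"]
  verbs: ["get", "update", "patch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch", "update", "patch"]
```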

Here's an example of how to create a BlueGreenDeployment:


apiVersion: mycompany.com/v1
kind: BlueGreenDeployment
metadata:
  name: my-cool-app
spec:
  replicas: 3
  image: mycoolapp:v1
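For step 1 of the checklist, a typical multi-stage Dockerfile for a Go operator keeps the final image small and non-root. Paths and versions here are placeholders - substitute your own:

```dockerfile
# Build stage: compile a static binary
FROM golang:1.21 AS builder
WORKDIR /workspace
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o manager ./main.go

# Runtime stage: minimal distroless image running as non-root
FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
USER 65532:65532
ENTRYPOINT ["/manager"]
```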

Pitfalls and Gotchas

Before you run off to implement this in production, keep these points in mind:

  • Resource management: Running two deployments simultaneously can double your resource usage. Plan accordingly!
  • Database migrations: Be careful with database schemas that aren't backwards compatible.
  • Sticky sessions: If your app relies on sticky sessions, you'll need to handle this carefully during the switch.
  • Testing: Thoroughly test your operator in a non-production environment first. Trust me, you'll thank yourself later.

Wrapping Up

And there you have it! A custom Kubernetes Operator that handles blue-green deployments like a champ. We've covered a lot of ground, from custom resources to controller logic and even some deployment tips.

Remember, this is just the beginning. You can extend this operator to handle more complex scenarios, add monitoring and alerting, or even integrate with your CI/CD pipeline.

"With great power comes great responsibility" - Uncle Ben (and every DevOps engineer ever)

Now go forth and deploy with confidence! And if you run into any issues, well... that's what rollbacks are for, right?

Food for Thought

As you implement this in your own projects, consider the following:

  • How could you extend this operator to handle canary deployments?
  • What metrics would be useful to collect during the deployment process?
  • How might you integrate this with external tools like Prometheus or Grafana?

Happy coding, and may your deployments be ever green (or blue, depending on your preference)!