Patch Management in the Cloud - It's About Consistency and Automation
Let's talk about consistency with respect to patch management as a key aspect of your cloud computing strategy.
If you've chosen wisely, your environments across your public and private clouds are consistent. Often this means that you're looking at the same technical implementation of the same infrastructure and software to run your clouds, right down to the patch level of each component. Consistency also refers to the internally consistent state of your cloud with respect to your virtual workloads and servers/systems.
This is where the discussion of patch management becomes quite interesting. After doing some digging on Google for interesting sources on patch management (a la consistency) for the cloud, I've come up relatively empty for anything that addresses this particular perspective.
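To make "consistent right down to the patch level" a little more concrete, here's a minimal sketch of a drift check between two environments. The inventories, component names and patch levels are all made up for illustration; in real life they'd come from whatever inventory or configuration management tooling you already run.

```python
from typing import Dict, Optional, Tuple

# Hypothetical inventory format: component name -> patch level.
Inventory = Dict[str, str]


def find_drift(
    reference: Inventory, other: Inventory
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every component whose patch level differs between two environments.

    A component missing from one side shows up with None for that side.
    """
    drift = {}
    for component in sorted(set(reference) | set(other)):
        ref_level = reference.get(component)
        other_level = other.get(component)
        if ref_level != other_level:
            drift[component] = (ref_level, other_level)
    return drift


# Illustrative data only.
private_cloud = {"hypervisor": "5.1.2", "guest-os": "11", "web-server": "2.4.57"}
public_cloud = {"hypervisor": "5.1.2", "guest-os": "10", "web-server": "2.4.57"}

print(find_drift(private_cloud, public_cloud))
```

An empty result means the two environments agree on every component listed; anything else is drift that needs explaining before you trust a test run in one environment to predict the other.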
The big question is - how do we keep our environments consistent in the face of security requirements to push patches? The answers rely very heavily on automation and policy.
The main point of decision is around the policy direction you're planning on taking. There are two schools of thought on how patches can be pushed to keep an environment consistent. Patch management can be done either by updating active workloads or virtual machines in place, or by replacing them. Let's look at these two options...
One way to push patches is the same way we've always done it. The way we've always done it involves taking an existing machine, albeit now virtual, and updating the patch level of either the operating system, applications, or something else to the latest and greatest after extensive testing in a non-production environment.
What started out as an exercise done by hand many, many moons ago now depends on a healthy obsession with automation, both to push patches and to monitor whether they 'take' in the environment. Automation creates a great deal of leverage, and for cloud environments at least, this means consistency can be achieved at much larger scale.
All seems right with the world... if you've deployed multiple clouds for non-production and production environments, you have the ability to use automation to push patches to a non-production environment, watch the result and gather metrics, learn lessons and then, when you're ready, do the same to your production environment in some off-hours time or when traffic/workload is low.
I mention a healthy obsession with automation because that's what is required for every step of this to work right. You've got to have a fanatical attachment to your technology and deployment strategy to know that the non-production environment is an exact duplicate of your production environment - and that your automation technology (and policies/procedures) is designed to account for the vast number of failure modes you can possibly encounter. What if one system works and another fails?
There are thousands of what-ifs that must be accounted for and, in themselves, tested before you can trust this works. There are other issues as well... but let's leave those for another post.
To sum things up, in-place updates are all about an obsession with automation and understanding failure modes, accounting for them all while keeping consistency of the environment top-of-mind.
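Here's a rough sketch of that in-place flow, handling just one of those what-ifs: one system takes the patch and another doesn't. Everything named here (`push_patch`, `staged_rollout`, `fake_apply`, the host names) is hypothetical; a real patch agent would sit where `fake_apply` is, and a real rollout would have many more failure modes to account for.

```python
from typing import Callable, Dict, List, Tuple


def push_patch(hosts: List[str], apply_patch: Callable[[str], bool]) -> Dict[str, bool]:
    """Apply the patch to every host and record whether it 'took'."""
    return {host: apply_patch(host) for host in hosts}


def staged_rollout(
    non_prod: List[str],
    prod: List[str],
    apply_patch: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Patch non-production first; touch production only if every host succeeded.

    Returns a status string plus the list of hosts where the patch failed.
    """
    results = push_patch(non_prod, apply_patch)
    failed = sorted(host for host, ok in results.items() if not ok)
    if failed:
        # One system worked and another failed: stop before production.
        return "halted-in-non-production", failed

    results = push_patch(prod, apply_patch)
    failed = sorted(host for host, ok in results.items() if not ok)
    if failed:
        # Production is now at mixed patch levels and needs remediation.
        return "production-inconsistent", failed
    return "consistent", []


# Hypothetical stand-in for a real patch agent: fails on one named host.
def fake_apply(host: str) -> bool:
    return host != "np-02"


print(staged_rollout(["np-01", "np-02"], ["prod-01", "prod-02"], fake_apply))
```

The design choice worth noticing is that the result always names the hosts that failed, so the "production-inconsistent" case tells you exactly where the environment has drifted rather than just that it has.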
There is an alternative to patching in-place... swapping in-place. Rather than testing your patches in a non-production environment, rolling them out to production using mass automation and monitoring the push, you can forgo that entire process. Instead, you rebuild all the machines with the updated patch level (which arguably isn't very hard), deploy the applications or workloads to those updated machines, and cut the traffic over from the old (non-patched) to the new (fully patched) virtual machines.
Just like the in-place update, you're going to require a healthy obsession with automation, but this automation is less about actual patch management and more about cloud management framework(s). Think about it this way... say you have an application running on server A with patch level 10 right now and you need to update it to patch level 11 quickly. You can either roll the dice and update that machine, or you can simply update the base virtual machine and deploy it to the cloud as an updated base image.
Then you can take that base image, re-deploy your application to it (hello, automation!) and simply use the cloud management framework to cut traffic over to that new deployment a little bit at a time, monitoring for any adverse reactions (does the patch break the application?). If nothing breaks, you slowly move the entirety of the traffic over to the new environment and decommission the old.
Like a wave of the magician's wand, done. Now, you can keep that old environment around, in a down state, for a while just in case you find some odd bug that only triggers with this new patch level on every other Tuesday at midnight or something weird... but for the most part you're happily re-deployed.
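The "little bit at a time" cut-over can be sketched in a few lines. This is only an illustration under assumptions: `gradual_cutover`, the step percentages and the `healthy` check are names I've made up, and a real cloud management framework would do the actual traffic shifting and monitoring.

```python
from typing import Callable, List, Tuple


def gradual_cutover(steps: List[int], healthy: Callable[[int], bool]) -> Tuple[int, str]:
    """Shift traffic to the patched environment one step at a time.

    steps   -- increasing percentages of traffic to send to the new environment
    healthy -- hypothetical check run after each shift; False means the patch
               appears to break the application at that traffic level

    Returns the percentage of traffic left on the new environment and an outcome.
    """
    current = 0
    for percent in steps:
        current = percent  # shift this share of traffic to the new machines
        if not healthy(current):
            # Send everything back to the old environment, which is still up.
            return 0, "rolled back"
    outcome = "complete" if current == 100 else "partial"
    return current, outcome


# Nothing breaks: all traffic ends up on the patched environment.
print(gradual_cutover([5, 25, 50, 100], lambda percent: True))

# The application misbehaves once half the traffic arrives: roll back.
print(gradual_cutover([5, 25, 50, 100], lambda percent: percent < 50))
```

The rollback branch only works because the old environment hasn't been decommissioned yet, which is the same reason for keeping it around in a down state for a while after the swap.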
The in-place swap is a great way to showcase the power of the cloud... but honestly it's nothing we couldn't do before with load balancing and fail-over capabilities; cloud environments just make it easier now.
Which is right?
Rather than asking which strategy of patch management is right across the board, let's ask which one is right for you. That's the only thing that matters. There are a number of factors to consider including the type of release cycles your organization is working off of - for example DevOps, NoOps, etc - and whether you have a single public cloud (or private cloud) or you've built your own consistent multi-cloud (public, private, hybrid) converged clouds across several vendors or physical deployments. Lots of things to think about.
If you find this topic interesting you'll really enjoy the next post, where I'll have a guest-post on this topic from one of my HP DevOps experts, with a perspective on this.
I'll of course add some security-based color to the post, but it'll be interesting to see a non-security perspective on this topic to get our heads a little out above our own echo... don't you think?
Cross-posted from Following the White Rabbit