Why are OpenStack Upgrades So Difficult?

Upgrades remain one of those concepts that everyone seems to feel is one of the Achilles' heels of OpenStack. Learn why they actually aren't difficult and, instead, why you may be making them difficult.

Over the past years, OpenStack has faced a ton of criticism around a few areas (hey, we did deserve it sometimes!). Still, upgrades remain one of those concepts that everyone seems to feel is one of the Achilles' heels of OpenStack. Therefore, I believe it would be useful to address that with a bit of history and context. This way, you can get a deeper understanding of how we tend to make upgrades difficult and what is genuinely tedious about the upgrades.

History

OpenStack has developed a history around very difficult, complicated, and challenging upgrades. We didn't get that reputation out of nowhere. There was indeed a time when upgrades were a colossal pain. We're talking about no rolling upgrades (simply restarting your entire cloud), little upgrade testing, and network data plane connectivity loss on agent restarts. It was pretty chaotic back then, and I think this has historically left a bit of a mark on the project.

Since that time, OpenStack has improved its upgrades significantly. To give context, many of the service projects use a tool called Grenade, which focuses on ensuring that the projects remain upgradable across releases. This tool runs on every single commit to these service projects, and the code will not merge unless those upgrade tests pass.

Additionally, almost all of the deployment tooling that OpenStack users consume has CI jobs that run on every single commit. Each of these jobs deploys the previous major version of OpenStack, tests it, runs an upgrade, and then tests it a second time. As a result, we're carrying out countless major upgrades every single day, and any breakage blocks code from merging, forcing teams to resolve those issues before they can continue.
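To make that flow concrete, here is a minimal sketch of the deploy, test, upgrade, test sequence in Python. It is not how Grenade or any real deployment tool is implemented; FakeCloud, run_upgrade_job, and the smoke_test callback are hypothetical names invented for illustration, and the "upgrade" is just a field change on a stand-in object.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeCloud:
    """Hypothetical stand-in for a deployed cloud (not a real OpenStack object)."""
    release: str
    log: List[str] = field(default_factory=list)


def run_upgrade_job(previous: str, target: str,
                    smoke_test: Callable[[FakeCloud], bool]) -> FakeCloud:
    """Deploy `previous`, test it, upgrade to `target`, then test again.

    Raises RuntimeError on the first failing test, which is the event
    that would block a change from merging in a gating CI system.
    """
    # Step 1: "deploy" the previous release.
    cloud = FakeCloud(release=previous)
    cloud.log.append(f"deployed {previous}")

    # Step 2: test the cloud before touching it.
    if not smoke_test(cloud):
        raise RuntimeError(f"tests failed on {previous} before upgrade")
    cloud.log.append(f"tested {previous}")

    # Step 3: "upgrade" to the release under test.
    cloud.release = target
    cloud.log.append(f"upgraded to {target}")

    # Step 4: run the same tests against the upgraded cloud.
    if not smoke_test(cloud):
        raise RuntimeError(f"tests failed on {target} after upgrade")
    cloud.log.append(f"tested {target}")
    return cloud


if __name__ == "__main__":
    result = run_upgrade_job("train", "ussuri", lambda cloud: True)
    print(result.log)
```

The important design point is that the same test runs on both sides of the upgrade, so a regression introduced by the upgrade itself surfaces as a failure in the fourth step.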

How do users make it hard?

When you combine software that is adequately tested for upgrades with deployment tools that also test upgrades, there's little reason to worry about the software not operating correctly after an upgrade. There are, however, a few groups of users who generally find upgrades more difficult than usual, and I believe that OpenStack is not at fault in those cases.

Vendor plugins

The first group relies on out-of-tree, vendor-specific drivers for things like storage or networking. For example, you may be using a commercial SDN offering that lives out-of-tree. As a result, you are unable to upgrade your cloud until your vendor has tested their driver against the new release.

In the OpenStack world, we have something called third-party CI. In this case, your vendor can run their CI jobs against every single commit of OpenStack, which means that they can instantly catch failures and fix them before the release of OpenStack. Therefore, they can release their plugins at the same time as OpenStack. I'm a huge believer in open source, and generally, I would recommend going towards open source drivers before any commercial ones since this will eliminate a possible upgrade blocker.

Unfortunately, not all vendors choose to run third-party CI or ensure that their plugins work from the day a new version of OpenStack is released. That makes it challenging to upgrade OpenStack, even though OpenStack itself isn't at fault.

Forks

Forking OpenStack and maintaining local patches is another reason why the upgrade process is complicated. While I understand that there is a need for specific “customizations” in OpenStack, there have been one too many instances of individuals trying to accomplish what already exists within OpenStack.

OpenStack has been around for a while, and it serves various types of operators (public cloud, enterprise, finance, government, carriers). Therefore, it's very unlikely that your use case isn't compatible with OpenStack. I would strongly recommend that you consider your use case and evaluate a few different alternatives before forking an OpenStack project. After all, once you fork the code and start maintaining your patches, you escalate the amount of work involved from “deploying OpenStack” to “developing OpenStack,” which is a significant leap.

Within OpenStack, we focus on the four opens (open source, open design, open development, open community). Nothing about OpenStack is built based on a specific vendor's roadmap. If you are running a fork of OpenStack with patches, I encourage you to reach out to our community and discuss your use case. We can guide you with the specific use case and determine if it is compatible with living inside OpenStack, or how to achieve what it is you're looking to do correctly.

Time

It's important to understand that OpenStack is a moving piece of technology. It needs to be looked after and continuously upgraded. If you're going to deploy it once and leave it there forever, you're setting yourself up for a ticking time bomb. In the end, you'll have to redeploy your entire cloud because you'll hit a point where it's way too taxing to upgrade your infrastructure.

The deployment tooling has now made it easier than ever to upgrade things on your own. It also helps to set aside enough time within your organization for continuous upgrades, which will be extremely impactful. After all, you will always be able to run newer releases of OpenStack and avoid massive multi-week efforts to upgrade your clouds from one release to another. What's more, upgrades only get better and easier the more often they happen.
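To illustrate how the work piles up when upgrades are skipped, here is a small sketch that counts the hops between two releases. It assumes that each upgrade moves exactly one release forward, which may not hold for every deployment tool or release pair. The release names are real, but the list is only an excerpt, and upgrade_hops is a hypothetical helper written for this example.

```python
from typing import List, Sequence, Tuple

# Excerpt of consecutive OpenStack release names, oldest first.
RELEASES = ["queens", "rocky", "stein", "train", "ussuri", "victoria"]


def upgrade_hops(current: str, target: str,
                 releases: Sequence[str] = RELEASES) -> List[Tuple[str, str]]:
    """Return the (from, to) pairs needed to get from `current` to `target`.

    Assumption: upgrades are done one release at a time, in order.
    """
    start = releases.index(current)
    end = releases.index(target)
    if end < start:
        raise ValueError("target release is older than the current release")
    # Pair every release in the range with the one that follows it.
    return list(zip(releases[start:end], releases[start + 1:end + 1]))


if __name__ == "__main__":
    for source, destination in upgrade_hops("queens", "victoria"):
        print(f"{source} -> {destination}")
```

Under that assumption, a cloud that stays current always has a single hop ahead of it, while a cloud left on queens faces five consecutive upgrades to reach victoria.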

API changes

Since OpenStack is an API-driven project, there's a chance that the APIs change during an upgrade, resulting in breakage. To reduce that risk, the SDK team within OpenStack has put in serious effort. By using the OpenStack SDK, you don't have to worry about what version of OpenStack you're talking to, as it takes care of the abstractions for you.
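As a rough illustration of what that looks like in practice, here is a minimal sketch using the openstacksdk Python package. It assumes the package is installed and that a clouds.yaml file defines the cloud you name; list_server_names and the cloud name "mycloud" in the usage note are hypothetical and only serve the example.

```python
from typing import List


def list_server_names(cloud_name: str) -> List[str]:
    """Return the names of all servers visible in the named cloud.

    `cloud_name` must match an entry in your clouds.yaml file.
    """
    # Imported here so that this module still loads on a machine
    # without the openstacksdk package installed.
    import openstack

    # connect() reads the credentials for `cloud_name` from clouds.yaml.
    conn = openstack.connect(cloud=cloud_name)

    # The calling code stays the same whichever OpenStack release
    # the cloud happens to be running.
    return [server.name for server in conn.compute.servers()]
```

Calling list_server_names("mycloud") against a reachable cloud would return a plain list of names, and the calling code should not need to change when that cloud is upgraded.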

Many OpenStack services implement microversions of their APIs. This helps ensure that even if the API changes, a specific microversion still returns the behavior the user expects. Additionally, we never change existing behavior without giving at least one release of warning first.
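The idea behind microversions can be sketched in a few lines of Python. This is a simplified model, not the code any OpenStack service actually runs: Microversion, negotiate, and version_header are hypothetical names, the version numbers are made up for the example, and the header format follows what I understand the compute API to accept, so treat it as an assumption.

```python
from typing import Dict, NamedTuple, Optional


class Microversion(NamedTuple):
    """A microversion such as 2.10, compared numerically, not as text."""
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "Microversion":
        major, minor = text.split(".")
        return cls(int(major), int(minor))

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def negotiate(client_min: Microversion, client_max: Microversion,
              server_min: Microversion,
              server_max: Microversion) -> Optional[Microversion]:
    """Return the highest microversion both sides support, or None."""
    low = max(client_min, server_min)
    high = min(client_max, server_max)
    return high if low <= high else None


def version_header(service_type: str, version: Microversion) -> Dict[str, str]:
    """Build the request header that pins a call to one microversion.

    Assumption: the service accepts the OpenStack-API-Version header.
    """
    return {"OpenStack-API-Version": f"{service_type} {version}"}


if __name__ == "__main__":
    chosen = negotiate(Microversion.parse("2.1"), Microversion.parse("2.60"),
                       Microversion.parse("2.1"), Microversion.parse("2.79"))
    if chosen is not None:
        print(version_header("compute", chosen))
```

Because the client pins the version it was written against, an upgraded server that supports newer microversions can keep answering the older one with the behavior the client expects.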

With regard to time, by upgrading often and on schedule, you're able to catch these changes and address them in smaller chunks. That way, you won't have to make copious amounts of modifications at once. After all, if you go through multiple upgrades in one go, you may introduce many behavioral changes at the same time.

Challenges

As you can see, none of the reasons listed above has to cause you problems during your upgrades, since each of them can be avoided. In reality, though, upgrades always remain stressful for a team, especially for a core component such as OpenStack. After all, they can significantly affect the infrastructure operations of an organization.

Over the years, we've helped customers with their upgrades, and we have been able to identify where the most significant challenges lie:

Lack of experience within the team, which raises concerns during an upgrade because outages can be introduced that the team can't resolve in a timely manner
Being so far behind in upgrades that the process involves upgrading across several EOL'd releases and operating systems
Every single item mentioned previously, with “time” being the most frequent issue

If you have an OpenStack cloud that you're looking to upgrade and bring up to the latest codebase, please reach out, and we'd be happy to help you get there.

Closing thoughts

Evidently, most of the complaints regarding OpenStack upgrades have one thing in common. None of them actually has to do with OpenStack; they are all operational and infrastructure issues that are very much real regardless of the tooling that you're using.

If you're using vendor-specific plugins with any other infrastructure tooling, you'll need to make sure that they're compatible with the version you're upgrading to. If you're maintaining a fork of the tool you're using, you'll continuously have to backport, rebase, retest, and run your own QA to ensure your version works. If you neglect upgrading your infrastructure, you'll likely end up in a tough spot, as you'll have to upgrade through many different releases to catch up.

Ultimately, I hope this gave you a lot of insight regarding OpenStack and upgrades. Feel free to leave comments to start a discussion if you have any specific points or reach out on Twitter to ask any questions you have. You can reach me @_mnaser.