Upstream contribution costs real engineering time. It also compounds over time in ways that internal fixes never do. What fifteen years of contributing to OpenStack, Kubernetes, and Ceph actually looks like.
We've been contributing to OpenStack since its second release in 2011. Our CEO, Mohammed Naser, has served as chair of the OpenStack Technical Committee, as a board member of the OpenInfra Foundation, and as project team lead for OpenStack-Ansible and Puppet-OpenStack, while holding core membership across projects including Magnum. Since 2016, we've provided infrastructure that powers much of the OpenDev CI system, which is a critical part of how the OpenStack community develops and tests its software. In 2019, we received the Superuser Award at the OpenInfra Summit. We have a history of deploying the latest OpenStack releases on the same day they launch.
We tell you this not to credential-drop, but because it's the context without which the rest of this post makes no sense. We don't write about upstream contribution as an abstract strategy. We write about it as something we've been doing for fifteen years, across OpenStack, Kubernetes, Ceph, Zuul, and the broader open infrastructure ecosystem. Our community-driven philosophy isn't a positioning statement — it's the reason VEXXHOST exists in the shape it does today.
And because we've lived this for long enough, we want to write honestly about what upstream contribution actually costs and what it actually buys. The gratitude we feel toward this community is real. So is the operational complexity. Both things are true.
Upstream contribution is not free, and treating it as free is how companies build unsustainable programs that quietly collapse after a few quarters.
Engineering time is the most significant cost, and it's larger than it looks. When your engineer finds a bug, the internal fix and the upstream-ready patch are different things. The upstream patch has to be general enough to serve users you'll never meet, stripped of your internal context, written in the project's conventions, and accompanied by tests the maintainers will accept. The review process then asks your engineer to respond to feedback from people who have a different mental model of the codebase, sometimes over weeks of back-and-forth, with a branch that needs to stay rebased throughout. This is valuable work, but it takes time, meaningfully more than a purely internal fix.
Release coordination creates a gap you have to manage. Once a fix is merged upstream, you're waiting on the project's release cycle before you can consume it cleanly. We've solved a version of this ourselves by running on same-day OpenStack releases, which is only possible because our engineers are close enough to the development process to know exactly what's coming and why. Most teams aren't in that position. For them, the gap between merge and consumption means carrying patches temporarily, which is real overhead.
This is worth being precise about, because we're sometimes quoted as having "no local forks". We do not maintain permanent divergences from upstream. But the path from "we wrote this fix" to "upstream shipped it and we're running it clean" involves a period where we carry patches against a moving codebase, resolving conflicts when upstream changes and our patches need to be rebased. Our Atmosphere deployment infrastructure reflects this honestly: there are documented processes for managing patch series against upstream projects precisely because that's how responsible upstream contribution actually works. The goal is always to get the patch upstream and eliminate the carry cost. The temporary carry is the price of doing it right rather than just shipping an internal fix and moving on.
Review asymmetry is worth acknowledging. Maintainers of large projects are often working under significant time pressure. Your contribution is competing with their other priorities. A PR can sit. You will get change requests you don't entirely agree with. Your engineer will need to context-switch back to a branch they thought was done. This friction is the process by which shared codebases maintain coherence. But it's friction nonetheless, and it needs to be planned for.
We've been running this model long enough to see what the returns actually look like.
The maintenance math is the foundation. Every bug fixed upstream is a bug the community maintains forever. Every feature contributed upstream is a feature that gets tested, documented, and improved by people other than you. The alternative, which is carrying internal patches against a moving upstream, is a cost that doesn't end. It grows as the project evolves, as the engineers who wrote the original patch turn over, and as the divergence accumulates context that nobody fully understands anymore.
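To make the maintenance math concrete, here is a minimal sketch of the comparison. It is a toy model, not our accounting: the function names, the hour figures, and the assumption that rebase cost grows by a fixed factor each release are all hypothetical, chosen only to show the shape of the curve.

```python
def carry_cost(initial_hours, rebase_hours, growth, releases):
    """Cumulative hours spent on an internal-only patch after `releases` upstream releases.

    initial_hours: one-time cost of writing the internal fix (hypothetical figure)
    rebase_hours:  cost of rebasing the patch onto the first new release
    growth:        factor by which the rebase cost grows each release
                   (1.0 = flat, >1.0 = divergence getting harder to reconcile)
    """
    total = initial_hours
    per_release = rebase_hours
    for _ in range(releases):
        total += per_release
        per_release *= growth
    return total


def break_even_release(upstream_hours, initial_hours, rebase_hours,
                       growth=1.0, max_releases=100):
    """First release count at which carrying the patch has cost at least as
    much as the one-time upstream contribution, or None if it never does
    within `max_releases`."""
    for n in range(1, max_releases + 1):
        if carry_cost(initial_hours, rebase_hours, growth, n) >= upstream_hours:
            return n
    return None


# Hypothetical numbers: the upstream-ready patch costs 40 hours, the internal
# fix 8 hours, and each rebase starts at 6 hours.
print(break_even_release(40, 8, 6))              # flat rebase cost
print(break_even_release(40, 8, 6, growth=1.2))  # rebase cost growing 20% per release
```

The upstream cost is paid once; the carry cost has no final term. Whatever numbers you plug in, the internal patch eventually becomes the more expensive option, and it gets there sooner the faster the divergence grows.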
Running on same-day releases is our clearest demonstration of this. We can do it because there is no divergence to reconcile. We're deploying software we already understand because our engineers helped build it. That is the compounding return on fifteen years of upstream investment.
Technical leadership gives you things you can't buy. Mohammed's work as TC chair, and our engineers' PTL and core roles across projects, means we are not passive consumers waiting to find out what the next release does. We are in the rooms where architectural decisions get made. We understand breaking changes before they ship. We advocate for the needs of operators, for our infrastructure, for our users. This is a capability that cannot be acquired through any means other than doing the work consistently over time.
The community trusts you differently when you've contributed to it. This is not a soft benefit. When you've been a known contributor to a project for years, you build deep relationships with the people making decisions. In a fast-moving ecosystem, this means you learn about changes before they become surprises. It means the community is genuinely invested in your success, in the same way you're invested in theirs. It means your concerns get weighed.
The OpenInfra Foundation relationship is a concrete example. We upgraded from Silver to Gold sponsorship because this community's success is inseparable from our own. Mohammed's statement when we made that move — that "our growth and success as a company springs from our community-driven philosophy and our open source foundations" and that the sponsorship reflects genuine gratitude — is not PR language. It describes how we actually think about the relationship.
Upstream contribution is how you build a team that stays. Engineers who care about craft (and the best ones do) want to work somewhere that does things the right way. Contributing upstream is a verifiable signal that your company invests in infrastructure rather than purely extracting from it. Our contribution history is public. Any engineer evaluating us can see it. That matters, particularly when the alternative is a company that claims to value engineering excellence but quietly maintains a private fork of everything.
The reason most companies underinvest in upstream contribution is a measurement problem, not a strategic one. The costs are immediate: engineer hours, delayed sprints, review cycles that take longer than expected. The benefits are distributed across time, often invisible until you're not paying them.
A maintenance cost avoided doesn't appear anywhere in your metrics until it becomes a production incident. The community relationship you've built doesn't show up until you need it urgently. The engineer you hired because your contribution history signalled something real never gets counted as a return on that investment. This asymmetry systematically pushes short-horizon planning toward underinvestment.
We've never resolved this through quarterly optimization. We resolved it by making upstream-first a principle rather than a case-by-case calculation. The policy is simple: every fix or improvement we develop and deploy is contributed back to the upstream project.
That absolutism is sustainable for us because the infrastructure we build is the infrastructure we contribute to. Atmosphere, our open-source private cloud platform spanning virtual machines, Kubernetes, bare metal, storage, and networking — built on OpenStack, Kubernetes, and Ceph — is built and maintained by the same engineers who are core contributors to those projects. There's no organizational wall between "the people who contribute" and "the people who operate." They're the same people. The incentive alignment is structural.
Start with the projects that are genuinely central to what you do. Not every dependency deserves the same level of investment, but the ones at the core of your infrastructure do. That is where the case for upstream contribution is clearest and most durable.
Be honest about the carry cost of patches. Temporary patches are not a failure of upstream-first discipline; they're how the process works in practice. The discipline is in treating every temporary patch as a debt with a deadline, pushing it upstream as fast as you can, and never letting the accumulation become a permanent divergence.
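One way to treat carried patches as debt with a deadline is to keep an explicit ledger and check it regularly. The sketch below is illustrative only: the `CarriedPatch` schema, the helper names, and the sample entries are hypothetical, and it does not describe how Atmosphere tracks its patch series.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CarriedPatch:
    """One locally carried patch (hypothetical schema for illustration)."""
    name: str
    project: str
    deadline: date                      # date by which the carry should be gone
    upstream_ref: Optional[str] = None  # review/PR identifier; None = not yet proposed
    dropped: bool = False               # True once upstream ships it and the carry is removed


def overdue(patches: List[CarriedPatch], today: date) -> List[CarriedPatch]:
    """Patches still carried past their deadline, oldest deadline first."""
    late = [p for p in patches if not p.dropped and today > p.deadline]
    return sorted(late, key=lambda p: p.deadline)


def unproposed(patches: List[CarriedPatch]) -> List[CarriedPatch]:
    """Patches still carried that have never been sent upstream at all."""
    return [p for p in patches if not p.dropped and p.upstream_ref is None]


# Sample ledger with made-up entries.
ledger = [
    CarriedPatch("fix-a", "example-project", date(2025, 3, 1), upstream_ref="review-1"),
    CarriedPatch("fix-b", "example-project", date(2025, 1, 15)),
    CarriedPatch("fix-c", "example-project", date(2025, 2, 1),
                 upstream_ref="review-2", dropped=True),
]

today = date(2025, 2, 10)
print([p.name for p in overdue(ledger, today)])
print([p.name for p in unproposed(ledger)])
```

The two checks map to the two failure modes: a patch that has outlived its deadline, and a patch nobody has proposed upstream yet. Either list being non-empty is a signal that temporary carry is drifting toward permanent divergence.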
Get your engineers into the community, not just the codebase. The contribution that compounds fastest is an engineer who becomes a trusted voice in a project, who gets core membership, who eventually takes on a PTL role. That's a long investment, but the leverage it creates is qualitatively different from a series of drive-by fixes. We know this from fifteen years of watching it work.
And be honest about why you're doing it. The economics are real (we've laid them out here). But if the only reason you contribute upstream is to avoid maintenance costs, that tends to show, and it limits the depth of the relationships you can build. We contribute because we believe contributing to open source creates better software, because we know how important it is to be community-driven and we're committed to being at the heart of it, and because the community that makes our infrastructure possible deserves our genuine investment. The economics follow from that.
We've been contributing to the OpenStack community since 2011, and we're still learning how to do it better. VEXXHOST is a Gold Member of the OpenInfra Foundation and a member of the Linux Foundation, CNCF, and Ceph Foundation. If you're building a contribution program and want to compare notes on what's worked and what hasn't, we're genuinely happy to talk.