Many AI clusters run at only 30–50% GPU utilization. Learn why GPUs sit idle and how Kubernetes, scheduling, and better infrastructure design can improve AI infrastructure efficiency.
The AI industry has a GPU obsession.
Every conversation about AI infrastructure eventually returns to the same problem: there aren't enough GPUs. Enterprises wait months for accelerators. Cloud providers struggle to keep capacity available. Hyperscalers race to secure supply chains.
But inside many AI environments, a different reality exists.
GPUs are sitting idle.
Not because organizations bought too many accelerators. Not because AI workloads disappeared. But because the infrastructure around those GPUs isn't designed to use them efficiently.
The real bottleneck in AI infrastructure isn't always GPU supply. Often, it's GPU utilization.
Today, AI infrastructure design is becoming a discipline of its own. Organizations building GPU clusters for machine learning must consider far more than raw compute capacity. Efficient AI infrastructure requires the right combination of GPU scheduling, high-throughput storage, fast interconnects, and orchestration platforms capable of coordinating distributed training workloads. Without these components working together, even the most advanced GPU clusters can suffer from low utilization and wasted compute resources. As AI adoption accelerates across industries, improving GPU utilization in modern AI infrastructure is quickly becoming one of the most important challenges for platform engineering teams.
GPUs are expensive assets.
The NVIDIA H200 GPU costs $30K–$40K to buy outright, and large AI clusters easily reach millions of dollars in hardware investment. Historically, previous-generation flagship GPUs tend to see price adjustments once new architectures enter the market. With Blackwell B100/B200 GPUs now shipping, expect H200 rates to soften throughout 2026—but the underlying cost structure remains enormous.
Yet many organizations struggle to keep those GPUs fully utilized.
Typical challenges include scheduling fragmentation that strands GPUs across nodes, data pipelines too slow to keep accelerators fed, and idle gaps between jobs.
The result is surprisingly common. Even at the frontier of AI, utilization rates fall well short of theoretical capacity. When GPT-4 was trained on 25,000 A100s, average utilization hovered at just 32–36%. It's worth noting that this figure refers to Model FLOPs Utilization (MFU), the percentage of peak theoretical compute actually spent on useful training math, rather than simple hardware occupancy. In other words, the majority of available compute went unused. In effect, you may be paying for five GPUs but using only two.
To put this in dollar terms: a 30 percent utilization rate on a $100 million GPU investment means $70 million of capacity is sitting idle. A high-density rack of B200s ($4M upfront) running at 40% utilization burns cash far faster than a marginally inefficient cooling system ever could.
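The arithmetic above is worth making explicit. A minimal sketch, using the hypothetical figures from the text (integer percentages keep the math exact):

```python
# Back-of-the-envelope GPU utilization economics.
# All dollar figures are hypothetical, matching the examples in the text.

def idle_spend(investment_usd: int, utilization_pct: int) -> int:
    """Dollars of hardware investment effectively sitting idle
    at a given average utilization percentage."""
    return investment_usd * (100 - utilization_pct) // 100

# A $100M GPU fleet running at 30% utilization:
print(idle_spend(100_000_000, 30))  # 70000000 -> $70M of idle capacity

# A $4M rack of B200s at 40% utilization:
print(idle_spend(4_000_000, 40))    # 2400000 -> $2.4M of idle capacity
```

The point of the sketch is that idle capacity scales linearly with the investment: the same 30-point utilization gap costs 25x more on a $100M fleet than on a $4M rack.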
For infrastructure teams, this represents a massive efficiency gap and a strategic problem.
Traditional infrastructure was not designed for AI workloads.
Cloud platforms evolved primarily to run stateless web applications, microservices, APIs, and batch jobs: workloads that scale horizontally and consume CPU and memory in predictable ways.
AI workloads behave differently.
Machine learning jobs require gang-scheduled groups of GPUs that start together, high-bandwidth interconnects between them, and storage fast enough to keep accelerators fed.
Kubernetes, the de facto standard for container orchestration, wasn't originally designed with GPUs in mind. It was built for CPU-centric workloads with predictable, preemptive scheduling. Schedulers optimized for traditional workloads often fail to pack GPU workloads efficiently across nodes.
One of the biggest challenges in AI infrastructure is resource fragmentation.
Training jobs often require specific GPU configurations: a set number of GPUs co-located on a single node, high-speed NVLink or InfiniBand interconnects, and topology-aware placement.
When clusters cannot allocate GPUs efficiently, workloads queue even when GPUs remain technically available.
Here's what this looks like in practice: a data scientist submits a distributed training job at 9 AM requesting 4 GPUs. The cluster has four idle GPUs—but they're scattered across different nodes. The job needs all GPUs co-located on a single node with high-speed NVLink interconnects for efficient gradient synchronization. So, the job sits in queue until 3 PM, when a contiguous block finally opens up. Six hours of researcher productivity, gone. Four GPUs idle the entire time, costing the organization money and producing nothing.
Without gang scheduling, partial resource allocation causes deadlock—jobs wait forever for remaining GPUs that never become available. Organizations that win with AI at scale have made a cultural shift: they treat GPUs as a shared, policy-driven substrate governed by queues, not as pets hand-assigned to projects.
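The fragmentation scenario above can be sketched in a few lines. The cluster state is hypothetical; the check mirrors the all-or-nothing placement the job needs:

```python
# Why fragmentation blocks gang-scheduled jobs: total GPU count is not
# enough; the GPUs must be co-located (e.g. NVLink-connected on one node).

def can_place_gang(free_gpus_per_node, gpus_needed):
    """All-or-nothing placement: the job runs only if some single node
    has all the GPUs it requested."""
    return any(free >= gpus_needed for free in free_gpus_per_node)

cluster = [1, 1, 1, 1]        # four idle GPUs, scattered one per node
print(sum(cluster))           # 4 GPUs are nominally "available"...
print(can_place_gang(cluster, 4))       # False: no node can host the job

defragmented = [4, 0, 0, 0]   # identical capacity, co-located
print(can_place_gang(defragmented, 4))  # True: the same job runs
```

Same hardware, same free capacity, opposite outcomes; that is the 9 AM-to-3 PM queue in miniature.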
The result: underutilized hardware, slower training pipelines, and spiraling costs.
GPU utilization depends heavily on data throughput.
If data pipelines cannot feed GPUs fast enough, expensive accelerators simply wait: the input pipeline for DNN training often cannot keep up with the speed of GPU computation, leaving the accelerators stalled for data.
The scale of this problem is significant. Recent research shows that some DNNs could spend up to 70% of their epoch training time on blocking I/O despite data prefetching and pipelining. Meanwhile, a recent study of millions of ML training workloads at Google shows that jobs spend on average 30% of their training time on the input data pipeline. Whether it's 30% or 70%, the message is the same: GPUs are spending a significant portion of their time waiting for data rather than training models.
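A crude way to reason about those percentages: if the pipeline delivers batches slower than the GPU consumes them, the GPU stalls for the difference. A minimal model, with hypothetical rates:

```python
# Rough data-starvation model: the GPU idles whenever the input pipeline
# cannot match its consumption rate. Rates below are hypothetical.

def stall_fraction(pipeline_batches_per_s: float,
                   gpu_batches_per_s: float) -> float:
    """Fraction of time the GPU spends waiting on input data."""
    if pipeline_batches_per_s >= gpu_batches_per_s:
        return 0.0  # pipeline keeps up; no starvation
    return 1.0 - pipeline_batches_per_s / gpu_batches_per_s

# A GPU that can process 100 batches/s fed by a 70-batch/s pipeline
# spends roughly 30% of its time blocked on input:
print(stall_fraction(70, 100))
```

The model ignores prefetching and burstiness, but it captures the headline numbers: a pipeline at 70% of GPU speed already yields the ~30% stall time the Google study reports.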
Common problems include:
The underlying cause of this problem is two-fold: storage bandwidth limitations and inefficient caching. Even the most modern GPUs can't accelerate training if they're sitting idle, waiting for data to process. When data starvation occurs, additional investments in more powerful compute hardware deliver diminishing returns, a costly inefficiency in production environments.
In large, distributed training environments, storage performance can directly determine GPU utilization.
This is where Kubernetes changes the equation.
Kubernetes has evolved from a CPU-centric container orchestrator into a capable platform for GPU-intensive AI/ML workloads, and AI teams are increasingly adopting the same orchestration model they already use for everything else.
With the right configuration, Kubernetes can enable fine-grained GPU scheduling, GPU sharing across workloads, gang scheduling for distributed training, and bin-packing that reduces fragmentation.
A growing ecosystem of Kubernetes-native tools makes this possible: stable GPU scheduling via device plugins and operators, multiple sharing strategies (MIG, MPS, time-slicing) for efficiency, and gang scheduling via Kueue and Volcano for distributed training.
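To make the gang-scheduling idea concrete, here is a toy queue-admission sketch. This is the core concept behind tools like Kueue and Volcano, not their actual APIs: a job is admitted only when its full GPU request fits, and jobs behind it do not jump ahead, so large jobs are never starved:

```python
# Toy all-or-nothing (gang) admission, in the spirit of Kueue/Volcano.
# Not a real API: each queued "job" is just the GPU count it requests.

from collections import deque

def admit(queue: deque, free_gpus: int):
    """Admit queued jobs in FIFO order; stop at the first that does not
    fit, so smaller jobs cannot indefinitely starve a large one."""
    admitted = []
    while queue and queue[0] <= free_gpus:
        job = queue.popleft()
        free_gpus -= job       # reserve the whole request at once
        admitted.append(job)
    return admitted, free_gpus

jobs = deque([4, 2, 2])        # GPU counts requested, in arrival order
admitted, remaining = admit(jobs, 6)
print(admitted, remaining)     # [4, 2] 0 -> the last job waits for capacity
```

Real schedulers add priorities, preemption, and topology awareness on top, but the all-or-nothing reservation is what eliminates the partial-allocation deadlock described earlier.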
The strategic impact is clear: in aggregate, these tools transform clusters into a high-efficiency AI platform capable of sustaining 90% GPU utilization under active load.
Even with Kubernetes in place, infrastructure design still matters.
A Kubernetes cluster orchestrating GPUs in a hyperscaler environment still depends on the availability, pricing, and scheduling policies of the underlying infrastructure. Instance availability is often opaque, pricing can shift unpredictably, and organizations frequently lack visibility into hardware topology—all of which limit GPU control even when the orchestration layer is well-designed. This economic problem is rooted in a technological one: GPUs are not easily virtualized or shared.
When organizations run Kubernetes on open infrastructure platforms like OpenStack, they gain greater control over GPU resources.
This enables direct control over GPU passthrough and virtualization, full visibility into hardware topology, predictable capacity and pricing, and scheduling policies tuned to the organization's own workloads.
Combined with high-performance storage systems such as Ceph, this architecture helps ensure that data pipelines keep pace with GPU throughput—so accelerators remain fully utilized rather than waiting on storage I/O bottlenecks.
For years, infrastructure conversations focused on cluster size.
How many GPUs do you have?
In 2026, the more important question is:
How efficiently are you using them?
For large-scale training runs, well-optimized organizations target 80–95% GPU utilization. Anything consistently below 50% signals significant room for improvement—in scheduling, data pipelines, or infrastructure architecture. Even moving from 35% to 65% utilization effectively doubles the useful output of existing hardware without buying a single additional GPU.
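The "doubles the useful output" claim is just arithmetic, sketched below with a hypothetical fleet size:

```python
# Utilization converts installed GPUs into effectively useful ones.

def effective_gpus(installed: int, utilization: float) -> float:
    """Installed capacity discounted by average utilization."""
    return installed * utilization

before = effective_gpus(1000, 0.35)   # ~350 useful GPUs out of 1000
after = effective_gpus(1000, 0.65)    # ~650 useful GPUs, same hardware
print(after / before)                 # ~1.86x: nearly double, zero new GPUs
```

The ratio depends only on the two utilization figures, not the fleet size, which is why the argument holds for a 10-GPU lab and a 10,000-GPU cluster alike.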
Organizations that maximize GPU utilization gain several advantages: lower cost per training run, faster experiment turnaround for researchers, and more headroom before the next hardware purchase.
In an environment where GPU capacity is scarce and expensive, efficiency becomes a strategic advantage.
The AI infrastructure conversation often focuses on supply: more GPUs, larger clusters, bigger clouds.
But the organizations succeeding in AI aren't just buying more hardware.
They are building platforms that use that hardware effectively.
That means combining Kubernetes-native GPU scheduling, high-throughput storage such as Ceph, and open infrastructure platforms like OpenStack that give teams real control over the hardware.
In 2026, the organizations winning the AI infrastructure race won't be the ones with the most GPUs. They'll be the ones with the highest utilization per dollar.
Want to learn how to build high-utilization AI infrastructure on open platforms? Feel free to contact us!