The 5-Minute GPU Audit: A Checklist for Instantly Spotting Waste
Most organizations waste 95% of their GPU spend without knowing it. Run this five-minute audit to find the leaks and fix them before the next invoice.
GPU infrastructure has become one of the largest technology investments in modern enterprises. For organizations running AI at scale, compute spend now rivals, and in some cases exceeds, total headcount costs.
Yet most organizations have no systematic way to evaluate whether that spend is efficient.
Utilization rates remain stubbornly low across the industry. A 2026 Cast AI analysis of roughly 23,000 Kubernetes clusters across major cloud providers found average GPU utilization at just 5% in enterprise environments.
Workloads are routinely overprovisioned. Scheduling gaps go unmeasured. And in the absence of clear ownership or accountability, waste compounds quietly, quarter after quarter, buried inside cloud invoices that few people outside of finance ever scrutinize.
The tools to solve this already exist. Kubernetes-native GPU scheduling can eliminate idle gaps between jobs. OpenStack-based infrastructure can provide the visibility, portability, and control that proprietary clouds deliberately obscure. The technology isn't the bottleneck. The bottleneck is that most teams never stop to ask the right questions.
We've distilled the most common sources of GPU waste into a 15-point checklist: a five-minute audit any team can run today. No specialized tooling required. No vendor assessment needed. Just fifteen questions, scored honestly, that will tell you whether your GPU investment is working for you or against you.
Start with the most basic question: right now, how many of your GPU instances are actually doing useful work?
Not allocated. Not provisioned. Running a training job, inference request, or batch process at this moment. In most environments, the answer is surprisingly few. The most common sources of idle GPU time are hiding in plain sight:
Instances left running between jobs. A training run finishes Friday afternoon. The instance stays up through the weekend. Nobody tears it down because the environment is fragile, or because the next job might start Monday. That’s 60+ hours of GPU time billed for zero output.
Dev and test environments that never shut down. A data scientist spins up a GPU instance to prototype a model. The experiment ends. The instance doesn’t. Multiply that across a team of ten and you're burning thousands per month on environments nobody is using.
Notebooks with GPU backends sitting open in browser tabs. Managed notebooks attached to GPU instances are some of the worst offenders. They stay alive as long as the session is open, even when the user is in a meeting, at lunch, or asleep.
Scheduled jobs that no longer run, but the capacity remains reserved. A weekly retraining job was deprecated two months ago. The reserved GPU capacity is still being billed.
This is how utilization drops to single digits. Not through one bad decision, but through dozens of small oversights that nobody tracks because GPU usage rarely gets audited the way CPU, memory, or storage does. For a closer look at how AI experiments quietly become long-lived infrastructure costs, read The Hidden Infrastructure Cost of AI Experiments.
The first step takes 60 seconds: pull your GPU instance list and check how many are running workloads right now versus how many are just running.
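A minimal sketch of that check on a single host, assuming the NVIDIA driver and its `nvidia-smi` tool are installed. It asks `nvidia-smi --query-gpu` for per-GPU utilization and memory, then sorts GPUs into three buckets: working, idle but still holding memory (the open-notebook case), and idle and empty. The thresholds, the `classify` helper, and the sample output are hypothetical. A single reading is only a snapshot, so in practice you would sample repeatedly or pull the same numbers from your monitoring stack.

```python
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi; the parser below expects exactly this order.
QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


@dataclass
class GpuSample:
    index: int
    util_pct: float       # utilization reported by the driver for its last sample period
    mem_used_mib: float
    mem_total_mib: float


def parse_nvidia_smi_csv(text: str) -> list[GpuSample]:
    """Parse output produced with --format=csv,noheader,nounits."""
    samples = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        idx, util, used, total = (field.strip() for field in line.split(","))
        samples.append(GpuSample(int(idx), float(util), float(used), float(total)))
    return samples


def classify(samples: list[GpuSample], util_threshold: float = 5.0,
             mem_threshold_mib: float = 500.0):
    """Return (working, idle_holding_memory, idle_empty) lists of GPU indices.

    The thresholds are illustrative assumptions, not recommended values.
    """
    working, idle_holding, idle_empty = [], [], []
    for s in samples:
        if s.util_pct > util_threshold:
            working.append(s.index)
        elif s.mem_used_mib > mem_threshold_mib:
            idle_holding.append(s.index)  # e.g. an open notebook pinning GPU memory
        else:
            idle_empty.append(s.index)
    return working, idle_holding, idle_empty


def query_local_gpus() -> list[GpuSample]:
    """Run nvidia-smi on this host; requires the NVIDIA driver to be installed."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_nvidia_smi_csv(result.stdout)


if __name__ == "__main__":
    # Hypothetical output from a 4-GPU host, so the sketch runs without a GPU.
    sample_output = "0, 92, 38000, 40960\n1, 0, 3, 40960\n2, 47, 12000, 40960\n3, 0, 15000, 40960\n"
    working, holding, empty = classify(parse_nvidia_smi_csv(sample_output))
    print(f"working={working} idle_holding_memory={holding} idle_empty={empty}")
    # → working=[0, 2] idle_holding_memory=[3] idle_empty=[1]
```

On a real host, call `query_local_gpus()` instead of parsing the sample string. Fields the driver reports as unavailable (for example `[N/A]`) would need extra handling before the `float()` conversion.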
The second most common source of GPU waste is simple: the hardware is larger than the workload actually needs.
A team selects a large GPU during training, the workload works, and the configuration quietly becomes the default even after the workload changes.
But many inference, fine-tuning, and batch workloads use only a fraction of the GPU memory and compute they were allocated.
The check is straightforward: compare actual GPU utilization against allocated resources. If a workload consistently uses only 30% of available GPU memory or compute, the remaining 70% is capacity you pay for but never use.
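One way to make that comparison concrete is sketched below, assuming you already collect periodic samples of a workload's GPU memory (GiB) and utilization (%). It sizes against the 95th percentile rather than the average, so short peaks are not hidden, and flags a workload only when both memory and compute stay under the 30% line. The function names, the nearest-rank percentile choice, and the sample data are illustrative assumptions.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: simple and adequate for a quick audit."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rightsizing_report(mem_used_gib: list[float], util_pct: list[float],
                       allocated_gib: float, threshold: float = 0.30) -> dict:
    """Compare p95 usage against the allocation and flag likely oversizing.

    A workload is flagged only when BOTH p95 memory and p95 compute stay under
    the threshold, so compute-bound jobs are not pushed onto smaller GPUs.
    """
    p95_mem = percentile(mem_used_gib, 95)
    p95_util = percentile(util_pct, 95)
    mem_fraction = p95_mem / allocated_gib
    return {
        "p95_mem_gib": p95_mem,
        "p95_util_pct": p95_util,
        "mem_fraction": round(mem_fraction, 2),
        "oversized": mem_fraction < threshold and p95_util / 100 < threshold,
    }


if __name__ == "__main__":
    # Hypothetical samples for an inference service on an 80 GiB GPU.
    mem = [18, 20, 19, 22, 21, 20, 19, 23, 20, 21]
    util = [25, 28, 22, 27, 24, 20, 26, 24, 27, 29]
    print(rightsizing_report(mem, util, allocated_gib=80))
```

With these made-up samples, p95 memory is 23 GiB (about 29% of the card) and p95 compute is 29%, so the workload is flagged as a candidate for a smaller GPU or a fractional slice.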
For a closer look at how training and inference have very different hardware requirements, read AI Workloads on Kubernetes: Training vs. Inference Infrastructure.
GPU infrastructure is usually sized for peak demand, not normal usage.
The problem is that peak demand may happen only a few hours per week while the infrastructure stays provisioned and billed 24/7.
Common signs:
The key question: how much GPU capacity do you actually use during an average week versus how much you pay for? If the gap is large, you're funding infrastructure for peaks that rarely happen.
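The arithmetic behind that question can be sketched in a few lines, assuming you can export one "busy GPUs" count per hour for a week. It compares busy GPU-hours with billed GPU-hours and counts how many hours actually came close to the peak the cluster was sized for. The hourly price and the demand pattern are hypothetical, and the 80%-of-peak cut-off is an arbitrary illustrative choice.

```python
def capacity_gap(hourly_busy_gpus: list[int], provisioned_gpus: int,
                 price_per_gpu_hour: float) -> dict:
    """Compare GPU-hours actually used with GPU-hours billed over a period."""
    hours = len(hourly_busy_gpus)
    used = sum(hourly_busy_gpus)        # busy GPU-hours
    paid = provisioned_gpus * hours     # billed GPU-hours
    peak = max(hourly_busy_gpus)
    return {
        "avg_utilization": used / paid,
        "idle_gpu_hours": paid - used,
        "idle_cost": (paid - used) * price_per_gpu_hour,
        # Hours in which demand came within 80% of the peak.
        "hours_near_peak": sum(1 for b in hourly_busy_gpus if b >= 0.8 * peak),
    }


if __name__ == "__main__":
    # Hypothetical week: 16 GPUs provisioned, all busy for 6 hours, 3 busy otherwise.
    week = [16] * 6 + [3] * (168 - 6)
    report = capacity_gap(week, provisioned_gpus=16, price_per_gpu_hour=2.50)
    print(f"average utilization: {report['avg_utilization']:.1%}")
    print(f"idle GPU-hours: {report['idle_gpu_hours']}, idle cost: ${report['idle_cost']:,.0f}")
    print(f"hours near peak: {report['hours_near_peak']} of {len(week)}")
```

In this invented week, the cluster is sized for a peak that lasts 6 of 168 hours, averages under 22% utilization, and bills roughly 2,100 idle GPU-hours.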
For a deeper look at how committed-use contracts and pricing structures create these traps, read What Your Cloud Provider Doesn't Want You to Think About.
You can't fix waste you can't see. And in most GPU environments, visibility is the first thing missing.
Ask yourself: if someone on your leadership team asked right now which team, project, or workload is consuming the most GPU spend, could you answer in under five minutes?
Most organizations can't. Common gaps include:
No tagging or labeling. GPU instances are provisioned without metadata tying them to a team, project, or budget.
No per-workload utilization tracking. You can see that a GPU instance is running, but not whether the workload is using 5% or 95% of available compute.
No idle detection. Nothing alerts when a GPU instance has been running for 48 hours with no workload activity.
No cost-to-value mapping. Teams know what they're spending on GPUs, but not what that spend is producing.
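Idle detection, the third gap above, can start as a simple rule. The sketch below assumes you already store timestamped GPU utilization samples per instance (from an exporter, a cron job, or similar) and flags any instance whose GPU stayed at or below a small threshold for the full 48-hour window. It skips instances without 48 hours of history so freshly launched machines are not flagged. The instance names, threshold, and the `hourly` test-data helper are hypothetical.

```python
from datetime import datetime, timedelta

Sample = tuple[datetime, float]  # (timestamp, GPU utilization %)


def find_idle_instances(samples: dict[str, list[Sample]], now: datetime,
                        window: timedelta = timedelta(hours=48),
                        util_threshold: float = 2.0) -> list[str]:
    """Return IDs of instances whose GPU stayed at or below the threshold for the whole window."""
    cutoff = now - window
    idle = []
    for instance_id, points in samples.items():
        if not points:
            continue
        oldest = min(ts for ts, _ in points)
        if oldest > cutoff:
            continue  # not enough history yet; a freshly launched instance is not "idle"
        recent = [util for ts, util in points if ts >= cutoff]
        if recent and all(util <= util_threshold for util in recent):
            idle.append(instance_id)
    return sorted(idle)


def hourly(now: datetime, hours: int, util_at) -> list[Sample]:
    """Build one sample per hour going back `hours` hours; util_at(h) receives hours-ago."""
    return [(now - timedelta(hours=h), util_at(h)) for h in range(hours)]


if __name__ == "__main__":
    now = datetime(2025, 6, 2, 9, 0)
    fleet = {
        "train-a": hourly(now, 72, lambda h: 95.0 if h > 60 else 0.0),  # finished 60 h ago
        "notebook-b": hourly(now, 72, lambda h: 0.0),                  # open but unused
        "infer-c": hourly(now, 72, lambda h: 40.0),                    # serving traffic
        "new-d": hourly(now, 10, lambda h: 0.0),                       # launched 10 h ago
    }
    print(find_idle_instances(fleet, now))  # ['notebook-b', 'train-a']
```

The output would typically feed an alert or a ticket rather than an automatic shutdown, since some long-idle instances may be intentionally reserved.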
Without visibility, optimization becomes guesswork.
The audit step is simple: can you identify your top GPU cost drivers by team and workload right now? If not, visibility is your first problem to solve. For more on why infrastructure visibility is the foundation of both cost and energy efficiency, read When Your Net-Zero Pledge Meets Your GPU Cluster.
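If instances carry even a single ownership label, ranking cost drivers becomes a small aggregation. The sketch below assumes a hypothetical inventory export with a GPU count, billed hours, an hourly price, and a `labels` map per instance. Spend without the label is grouped as `UNTAGGED`, which makes the size of the tagging gap visible on its own. All names, labels, and prices are made up.

```python
from collections import defaultdict


def top_cost_drivers(instances: list[dict], label: str = "team", n: int = 3):
    """Sum GPU spend per label value; unlabeled spend is grouped as UNTAGGED."""
    totals: dict[str, float] = defaultdict(float)
    for inst in instances:
        owner = inst.get("labels", {}).get(label, "UNTAGGED")
        totals[owner] += inst["gpu_count"] * inst["hours"] * inst["price_per_gpu_hour"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical 30-day inventory; names, labels, and prices are illustrative only.
    inventory = [
        {"name": "train-llm", "labels": {"team": "research"}, "gpu_count": 8, "hours": 720, "price_per_gpu_hour": 2.0},
        {"name": "chat-infer", "labels": {"team": "product"}, "gpu_count": 2, "hours": 720, "price_per_gpu_hour": 2.0},
        {"name": "gpu-box-7", "labels": {}, "gpu_count": 4, "hours": 720, "price_per_gpu_hour": 2.0},
        {"name": "eval-jobs", "labels": {"team": "research"}, "gpu_count": 1, "hours": 200, "price_per_gpu_hour": 2.0},
    ]
    for owner, cost in top_cost_drivers(inventory):
        print(f"{owner:<10} ${cost:,.0f}")
```

Passing `label="project"` (or any other key your tagging scheme uses) answers the same question per project instead of per team.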
Some GPU waste isn’t caused by teams. It’s caused by the infrastructure itself.
When the platform limits how GPUs can be allocated, scheduled, and managed, waste becomes structural.
Fixed instance types force overprovisioning. Workloads end up consuming far more GPU capacity than they actually need.
No fractional GPU allocation. Without MIG or vGPU support, lightweight workloads still consume full GPUs.
Poor scheduling. Without topology or locality awareness, GPUs sit idle on one node while jobs queue on another.
No power optimization. Idle nodes stay online because workloads can’t be consolidated automatically.
These aren’t operational mistakes. They’re platform limitations.
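To get a rough feel for what fractional allocation is worth, the sketch below compares one-workload-per-GPU against first-fit-decreasing packing. It deliberately simplifies: each GPU is treated as seven equal slices (loosely modeled on the maximum number of MIG instances on some NVIDIA GPUs), and real MIG profiles, memory sizes, and placement rules are ignored, so actual savings would likely be smaller. The slice demands are hypothetical.

```python
def gpus_needed_fractional(slice_demands: list[int], slices_per_gpu: int = 7) -> int:
    """First-fit decreasing: place each workload on the first GPU with enough free slices.

    Simplified model: every GPU has `slices_per_gpu` interchangeable slices.
    """
    free: list[int] = []  # free slices left on each GPU opened so far
    for demand in sorted(slice_demands, reverse=True):
        if not 1 <= demand <= slices_per_gpu:
            raise ValueError(f"demand {demand} must be between 1 and {slices_per_gpu}")
        for i, remaining in enumerate(free):
            if remaining >= demand:
                free[i] -= demand
                break
        else:
            free.append(slices_per_gpu - demand)  # no GPU had room: open a new one
    return len(free)


if __name__ == "__main__":
    # Hypothetical mix: one training job needs a whole GPU, the rest are small.
    demands = [1, 1, 1, 2, 1, 3, 1, 2, 1, 7]
    print("whole-GPU allocation:", len(demands), "GPUs")
    print("fractional allocation:", gpus_needed_fractional(demands), "GPUs")
```

In this made-up mix, ten workloads that would occupy ten full GPUs fit on three when small jobs can share a card.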
For a deeper look at how GPU scheduling and infrastructure design affect utilization, read Why GPUs Sit Idle: The Hidden Efficiency Problem in AI Infrastructure.
Run through these questions right now. Every “no” is a potential source of GPU waste.
Utilization
☐ Do you know your average GPU utilization?
☐ Can you identify idle GPU instances right now?
☐ Are dev and test instances shut down when unused?
Sizing
☐ Have you compared actual GPU usage against allocated capacity?
☐ Are inference workloads running on appropriately sized hardware?
☐ Have instance sizes been reevaluated recently?
Scheduling
☐ Are GPU instances provisioned only when workloads are active?
☐ Do workloads scale down between jobs?
☐ Are reserved contracts still aligned with real demand?
Visibility
☐ Are GPU instances tagged by team and workload?
☐ Can you identify top GPU cost drivers quickly?
☐ Do you detect idle instances automatically?
Infrastructure
☐ Does your platform support MIG or vGPU?
☐ Does your scheduler understand topology and NUMA boundaries?
☐ Can idle nodes power down automatically?
If you answered “no” more than five times, there’s likely significant waste in your environment, and most of it is fixable.
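If you want to rerun the audit each quarter and track the trend, the scoring rule above is easy to encode. The sketch below stores the fifteen questions verbatim, counts "no" answers per category, and applies the more-than-five threshold. The `score_audit` function and the example answers are hypothetical.

```python
CHECKLIST = {
    "Utilization": [
        "Do you know your average GPU utilization?",
        "Can you identify idle GPU instances right now?",
        "Are dev and test instances shut down when unused?",
    ],
    "Sizing": [
        "Have you compared actual GPU usage against allocated capacity?",
        "Are inference workloads running on appropriately sized hardware?",
        "Have instance sizes been reevaluated recently?",
    ],
    "Scheduling": [
        "Are GPU instances provisioned only when workloads are active?",
        "Do workloads scale down between jobs?",
        "Are reserved contracts still aligned with real demand?",
    ],
    "Visibility": [
        "Are GPU instances tagged by team and workload?",
        "Can you identify top GPU cost drivers quickly?",
        "Do you detect idle instances automatically?",
    ],
    "Infrastructure": [
        "Does your platform support MIG or vGPU?",
        "Does your scheduler understand topology and NUMA boundaries?",
        "Can idle nodes power down automatically?",
    ],
}


def score_audit(answers: dict[str, bool], max_no: int = 5) -> dict:
    """Count "no" answers per category; more than max_no suggests significant waste."""
    by_category = {}
    for category, questions in CHECKLIST.items():
        missing = [q for q in questions if q not in answers]
        if missing:
            raise KeyError(f"unanswered question: {missing[0]}")
        by_category[category] = sum(1 for q in questions if not answers[q])
    total_no = sum(by_category.values())
    return {
        "no_count": total_no,
        "by_category": by_category,
        "likely_significant_waste": total_no > max_no,
    }


if __name__ == "__main__":
    # Hypothetical team: solid on the first three areas, weak on visibility and platform.
    answers = {q: True for questions in CHECKLIST.values() for q in questions}
    for category in ("Visibility", "Infrastructure"):
        for q in CHECKLIST[category]:
            answers[q] = False
    result = score_audit(answers)
    print(result["no_count"], result["likely_significant_waste"])  # 6 True
```

The per-category breakdown is usually more actionable than the total, since it points to which of the five areas to fix first.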
Most items on that checklist fail not because teams aren't trying, but because the platform underneath doesn't provide the right controls. Atmosphere by VEXXHOST is built to pass every one.
Flexible GPU allocation. MIG, vGPU, and passthrough with NUMA-aware placement. Match GPU resources to actual workload requirements: full GPUs for heavy training, fractional GPUs for inference and experimentation. No forced overprovisioning.
GPU-aware scheduling. Kubernetes on Atmosphere sees real GPU topology, node placement, and resource availability. Workloads land on the right hardware instead of queuing while capacity sits idle elsewhere.
Dynamic power management. OpenStack consolidates workloads onto fewer nodes during low-demand periods and powers down idle infrastructure. Every watt goes toward actual work.
Full visibility. Prometheus-based monitoring built into the platform. Real-time utilization, per-workload attribution, and idle detection across every GPU, node, and cluster. No blind spots.
Open and controllable at every layer. Upstream OpenStack and CNCF-certified Kubernetes. No proprietary instance types limiting your sizing options. No vendor-locked scheduling. No black-box control planes hiding what's happening with your hardware.
The waste this audit uncovers is real. The infrastructure to eliminate it already exists.
For a complete look at the platform, read The Complete Guide to Managed OpenStack with Atmosphere.
Most GPU waste isn't malicious. It's invisible, hidden behind poor visibility, oversized defaults, and platforms that don't give you the controls to do better.
Five minutes of auditing can surface thousands in monthly waste. But spotting it is only half the fix. Eliminating it requires infrastructure that supports flexible allocation, intelligent scheduling, and real observability.
That's what open infrastructure is built for. That's what Atmosphere delivers.