~40% lower TCO than hyperscalers isn't magic
Where the savings actually come from when you move enterprise AI off AWS/GCP/Azure onto Hetzner + OVH + Proxmox. Spoiler: it's mostly egress, idle GPUs, and observability SaaS.
When I quote "~40% lower TCO" on slides - at sustained scale, against hyperscaler GPU instances, not as a universal figure - the question I get most often from CIOs is: where does that actually come from?
Here's the honest breakdown for a typical mid-size sovereign-AI workload (one tenant, ~200 GB vector store, ~50k req/day, two on-call engineers).
The cost is not the GPU
If you compare hourly rates for AWS p5 (H100) or GCP A2 (A100) instances against the same GPU class on Hetzner or OVH bare-metal, the delta is real but smaller than people expect - roughly 30-35% on the compute line alone.
The 40% comes from compounding the smaller wins:
| Line item | Hyperscaler | Bare-metal |
|---|---|---|
| GPU compute (24/7) | 100% (baseline) | ~65% of baseline |
| Egress to client networks | painful | ~free at OVH |
| Idle/over-provisioned capacity | substantial | tunable |
| Managed Postgres / Redis / Kafka | premium | self-hosted on the same fleet |
| Observability SaaS | per-host pricing × fleet size | Grafana + Loki on a spare node |
| Support tier you actually need to use | annual contract | per-incident |
Add those up and a workload that costs €11k/mo on AWS lands closer to €6.5k/mo on bare-metal with the same SLO, a reduction of about 41%.
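As a quick sanity check on those figures, here is a minimal Python sketch. The two monthly totals and the ~35% compute delta come from the text above; the `compute_share` value of 0.6 is purely an illustrative assumption (the actual split of the bill is not stated here), and the function names are made up for this example.

```python
def savings_pct(baseline: float, alternative: float) -> float:
    """Percentage reduction when moving from baseline to alternative."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - alternative) / baseline * 100.0


def required_non_compute_savings(total_pct: float, compute_pct: float,
                                 compute_share: float) -> float:
    """Saving the non-compute lines must deliver to reach total_pct.

    Solves: total = share * compute + (1 - share) * non_compute
    """
    if not 0.0 <= compute_share < 1.0:
        raise ValueError("compute_share must be in [0, 1)")
    return (total_pct - compute_share * compute_pct) / (1.0 - compute_share)


# Monthly totals quoted in the text (EUR).
aws_monthly = 11_000
bare_metal_monthly = 6_500

overall = savings_pct(aws_monthly, bare_metal_monthly)
print(f"Overall saving: {overall:.1f}%")

# ASSUMPTION: compute is 60% of the hyperscaler bill (illustrative only).
rest = required_non_compute_savings(overall, compute_pct=35.0, compute_share=0.6)
print(f"Non-compute lines would need to save about {rest:.1f}%")
```

With the quoted totals the overall figure works out to about 40.9%. Under the assumed 60/40 split, the non-compute lines would have to come down by roughly half, which is why egress, idle capacity, and SaaS tooling matter more than the GPU rate itself.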
What you give up
Be honest:
- Elastic scale-to-zero is gone. Bare-metal is provisioned. If you spike 5× overnight, you're calling someone.
- Managed-service discipline is on you. Patches, backups, replica failover, certificate rotation - your team owns all of it. Don't run this stack with fewer than two SREs.
- The hyperscaler marketplace is gone. No one-click "Datadog integration." You install the agent, you write the dashboards.
If your engineering org isn't ready for that, the 40% saving will be eaten by opex elsewhere. If it is, the savings are real and durable.
Where this breaks down
For workloads that genuinely need autoscaling (consumer-facing apps with spiky traffic, batch jobs that fluctuate 100×), this calculus inverts. Bare-metal is the wrong primitive there.
For regulated, predictable enterprise SaaS at sustained scale - which is most of what I see - it's the right one, and that is exactly where the ~40% holds. For spiky or bursty traffic, it does not.
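One rough way to see where the calculus flips between those two cases, as a sketch rather than a pricing model: assume the hyperscaler bills only for the hours actually used (idealised scale-to-zero), the bare-metal fleet is sized for peak and paid for 24/7, and only the compute line is considered. The 0.65 ratio is the ~65% figure from the table; the function name and the sample utilizations are hypothetical.

```python
def cheaper_option(avg_utilization: float, bare_metal_ratio: float = 0.65) -> str:
    """Compare relative compute cost, normalised to hyperscaler 24/7 = 1.0.

    avg_utilization: fraction of peak capacity used on average (0..1).
    bare_metal_ratio: bare-metal 24/7 cost relative to hyperscaler 24/7.
    """
    if not 0.0 <= avg_utilization <= 1.0:
        raise ValueError("avg_utilization must be between 0 and 1")
    elastic_cost = avg_utilization        # pay only for what runs (idealised)
    provisioned_cost = bare_metal_ratio   # pay for peak capacity all month
    return "hyperscaler" if elastic_cost < provisioned_cost else "bare-metal"


# Hypothetical utilization levels, from very bursty to near-constant load.
for u in (0.05, 0.40, 0.65, 0.90):
    print(f"{u:.0%} average utilization -> {cheaper_option(u)}")
```

Under these assumptions the break-even sits at about 65% average utilization: below it, elastic capacity wins on compute alone; at or above it, provisioned hardware does. Real bills add egress, managed services, and imperfect autoscaling, all of which move that threshold.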