krishna@medialogic:~$ cat 40-percent-tco-isnt-magic.md

22 Apr 2026 · 2 min read

#infrastructure #finops #bare-metal

~40% lower TCO than hyperscalers isn't magic

Where the cost actually comes from when you move enterprise AI off AWS/GCP/Azure onto Hetzner + OVH + Proxmox. Spoiler: it's mostly egress, idle GPUs, and observability SaaS.

When I quote "~40% lower TCO" in slide decks - at sustained scale, against hyperscaler GPU instances, not as a universal figure - the question I get most often from CIOs is: where does that actually come from?

Here's the honest breakdown for a typical mid-size sovereign-AI workload (one tenant, ~200 GB vector store, ~50k req/day, two on-call engineers).

The cost is not the GPU

If you compare hourly rates for hyperscaler GPU instances (AWS p5, GCP A100) against an equivalent GPU on Hetzner or OVH bare-metal, the delta is real but smaller than people expect - maybe 30-35% on the compute line alone.

The 40% comes from compounding the smaller wins:

Line item	Hyperscaler (baseline)	Bare-metal (relative to baseline)
GPU compute (24/7)	100%	~65%
Egress to client networks	painful	~free at OVH
Idle/over-provisioned capacity	substantial	tunable
Managed Postgres / Redis / Kafka	premium	self-hosted on the same fleet
Observability SaaS	per-host pricing × fleet size	Grafana + Loki on a spare node
Support tier you actually need to use	annual contract	per-incident

Sum those and a workload that costs €11k/mo on AWS lands closer to €6.5k/mo on bare-metal with the same SLO - a reduction of roughly 41%.
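
To make the arithmetic concrete, here is a minimal Python sketch of how the line items roll up. Only the two totals (€11k and €6.5k per month) and the ~65% GPU ratio come from the figures above; every per-line euro amount is a hypothetical placeholder chosen so the columns sum to those totals, not a measured breakdown.

```python
# Illustrative roll-up of the table above.
# Per-line euro amounts are HYPOTHETICAL placeholders; only the monthly
# totals (11000 / 6500) and the ~65% GPU ratio come from the post.
HYPERSCALER = {
    "gpu_compute": 6000,
    "egress": 1500,
    "idle_capacity": 1300,
    "managed_data_services": 1000,
    "observability": 700,
    "support": 500,
}

BARE_METAL = {
    "gpu_compute": 3900,   # ~65% of the hyperscaler GPU line
    "egress": 150,
    "idle_capacity": 900,
    "managed_data_services": 800,
    "observability": 250,
    "support": 500,
}


def monthly_total(items):
    """Sum all line items for one platform (EUR per month)."""
    return sum(items.values())


def saving_pct(before, after):
    """Percentage reduction going from `before` to `after`, one decimal."""
    return round(100 * (before - after) / before, 1)


def line_deltas(before, after):
    """Absolute monthly saving per line item (EUR)."""
    return {name: before[name] - after[name] for name in before}


if __name__ == "__main__":
    before = monthly_total(HYPERSCALER)
    after = monthly_total(BARE_METAL)
    print(f"total: {before} -> {after} EUR/mo, saving {saving_pct(before, after)}%")
    for name, delta in line_deltas(HYPERSCALER, BARE_METAL).items():
        print(f"  {name}: -{delta} EUR/mo")
```

With these placeholder numbers the GPU line accounts for less than half of the total saving, which is the point of the table: swap in your own invoice lines before drawing conclusions.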

What you give up

Be honest:

Elastic scale-to-zero is gone. Bare-metal is provisioned. If you spike 5× overnight, you're calling someone.
Managed services discipline is on you. Patches, backups, replica failover, certificate rotation - your team owns it. Don't run this stack with fewer than two SREs.
The hyperscaler marketplace is gone. No one-click "Datadog integration." You install the agent, you write the dashboards.

If your engineering org isn't ready for that, the 40% will get eaten by opex elsewhere. If it is, the savings are real and durable.

Where this breaks down

For workloads that genuinely need autoscaling (consumer-facing apps with spiky traffic, batch jobs that fluctuate 100×), this calculus inverts. Bare-metal is the wrong primitive there.

For regulated, predictable enterprise SaaS at sustained scale - which is most of what I see - it's the right one. That sustained-scale workload is exactly where the ~40% holds; for the spiky or bursty traffic described above, it does not.
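
One way to see where the calculus flips is a break-even utilization: the fraction of the month a fixed-price box has to be busy before it beats paying on demand only for the hours used. This is a rough sketch, assuming a flat monthly bare-metal price and a flat on-demand hourly rate. The function names and the prices in the usage lines are hypothetical, and the model ignores reserved or spot pricing, egress, and staffing.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def break_even_utilization(fixed_monthly, on_demand_hourly, hours=HOURS_PER_MONTH):
    """Fraction of the month at which on-demand spend equals the fixed price.

    Above this utilization the fixed-price (bare-metal) box is cheaper;
    below it, paying per hour is cheaper.
    """
    if on_demand_hourly <= 0 or hours <= 0:
        raise ValueError("on_demand_hourly and hours must be positive")
    return fixed_monthly / (on_demand_hourly * hours)


def cheaper_option(utilization, fixed_monthly, on_demand_hourly, hours=HOURS_PER_MONTH):
    """Return which option costs less at a given utilization (0.0 to 1.0)."""
    on_demand_cost = utilization * on_demand_hourly * hours
    return "bare-metal" if fixed_monthly < on_demand_cost else "on-demand"


if __name__ == "__main__":
    # Hypothetical prices: 365 EUR/mo fixed vs 1.00 EUR/h on demand.
    print(break_even_utilization(365, 1.0))   # break-even fraction of the month
    print(cheaper_option(0.9, 365, 1.0))      # sustained load
    print(cheaper_option(0.1, 365, 1.0))      # bursty load
```

A workload that runs near 24/7 sits well above the break-even point; one that idles most of the month and bursts occasionally sits below it, which is the case where bare-metal is the wrong primitive.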
