~/krishnamallam/opinions/40-percent-tco-isnt-magic.md
krishna@medialogic:~$ cat 40-percent-tco-isnt-magic.md
22 Apr 2026 · 2 min read
#infrastructure #finops #bare-metal

~40% lower TCO than hyperscalers isn't magic

Where the cost actually comes from when you move enterprise AI off AWS/GCP/Azure onto Hetzner + OVH + Proxmox. Spoiler: it's mostly egress, idle GPUs, and observability SaaS.

When I quote "~40% lower TCO" in deck slides - at sustained scale, against hyperscaler GPU instances, not as a universal figure - the question I get most often from CIOs is: where does that actually come from?

Here's the honest breakdown for a typical mid-size sovereign-AI workload (one tenant, ~200 GB vector store, ~50k req/day, two on-call engineers).

The cost is not the GPU

If you compare AWS p5 / GCP A100 hourly rates against the same SKU on Hetzner or OVH bare-metal, the delta is real but smaller than people expect - maybe 30-35% on the compute line alone.

The 40% comes from stacking up the smaller wins:

| Line item | Hyperscaler cost | Bare-metal cost |
|---|---|---|
| GPU compute (24/7) | 100% | ~65% |
| Egress to client networks | painful | ~free at OVH |
| Idle/over-provisioned capacity | substantial | tunable |
| Managed Postgres / Redis / Kafka | premium | self-hosted on the same fleet |
| Observability SaaS | per-host × fleet | Grafana + Loki on a spare node |
| Support tier you actually need to use | annual contract | per-incident |

Sum those and a workload that costs €11k/mo on AWS lands closer to €6.5k/mo on bare-metal with the same SLO.
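The arithmetic behind that sentence, as a minimal Python sketch. The function names and dictionary keys are hypothetical, and only the €11k and €6.5k totals come from this post; in a real model you'd replace the single `total` key with one entry per row of the table.

```python
from typing import Mapping


def monthly_total(line_items: Mapping[str, float]) -> float:
    """Sum monthly costs (EUR) across all line items."""
    return float(sum(line_items.values()))


def tco_reduction(hyperscaler: Mapping[str, float],
                  bare_metal: Mapping[str, float]) -> float:
    """Fractional TCO reduction of bare-metal vs. hyperscaler (0.41 means 41% lower)."""
    hs = monthly_total(hyperscaler)
    bm = monthly_total(bare_metal)
    if hs <= 0:
        raise ValueError("hyperscaler total must be positive")
    return 1.0 - bm / hs


if __name__ == "__main__":
    # Only these two totals come from the post. A real model would have one key
    # per row of the table above (GPU compute, egress, idle capacity, ...).
    aws = {"total": 11_000.0}
    bare = {"total": 6_500.0}
    saving = monthly_total(aws) - monthly_total(bare)
    print(f"saving: EUR {saving:,.0f}/mo ({tco_reduction(aws, bare):.0%})")
    # prints: saving: EUR 4,500/mo (41%)
```

At these totals the gap is about €4,500/mo, roughly 41%, consistent with the rounded ~40%. That €4,500 is also the budget any extra ops effort has to fit inside before the saving is gone.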

What you give up

Be honest:

  • Elastic scale-to-zero is gone. Bare-metal is provisioned. If you spike 5× overnight, you're calling someone.
  • Managed services discipline is on you. Patches, backups, replica failover, certificate rotation - your team owns it. Don't run this stack with fewer than two SREs.
  • The hyperscaler marketplace is gone. No one-click "Datadog integration." You install the agent, you write the dashboards.
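Provisioned means you pick the fleet size in advance and live with it. A minimal sizing sketch, not a recommendation: the ~50k req/day is the workload from the top of this post and the 5× spike is from the bullet above, while `peak_factor`, `per_node_rps`, `headroom`, `min_nodes` and both function names are hypothetical. It also assumes every request costs about the same.

```python
import math

SECONDS_PER_DAY = 86_400


def nodes_needed(req_per_day: float, peak_factor: float, per_node_rps: float,
                 headroom: float = 0.3, min_nodes: int = 2) -> int:
    """Nodes to provision for the normal daily peak, plus headroom.

    peak_factor:  normal peak rate / daily average rate (hypothetical; measure it).
    per_node_rps: sustained requests/s one node handles at your SLO (hypothetical).
    headroom:     spare fraction kept on top of the normal peak.
    min_nodes:    floor so there is always a replica to fail over to.
    """
    avg_rps = req_per_day / SECONDS_PER_DAY
    target_rps = avg_rps * peak_factor * (1.0 + headroom)
    return max(min_nodes, math.ceil(target_rps / per_node_rps))


def spike_fits(nodes: int, per_node_rps: float, req_per_day: float,
               spike_factor: float) -> bool:
    """True if the provisioned fleet can absorb spike_factor x the daily average rate."""
    return nodes * per_node_rps >= (req_per_day / SECONDS_PER_DAY) * spike_factor


if __name__ == "__main__":
    n = nodes_needed(50_000, peak_factor=2.0, per_node_rps=0.2)
    print(n, spike_fits(n, 0.2, 50_000, spike_factor=5.0))  # 8 False
```

With those illustrative numbers it sizes the fleet at 8 nodes, and a 5× spike doesn't fit: that gap is the phone call. The nodes covering the normal peak sit partly idle off-peak, which is the "idle/over-provisioned capacity" row, except here you tune it yourself.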

If your engineering org isn't ready for that, the 40% gets eaten by opex elsewhere. If it is, the savings are real and durable.

Where this breaks down

For workloads that genuinely need autoscaling (consumer-facing apps with spiky traffic, batch jobs that fluctuate 100×), this calculus inverts. Bare-metal is the wrong primitive there.
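One way to see the inversion: bare-metal is paid for at the peak you provision, while autoscaled cloud capacity roughly follows average load. A simplified, compute-only sketch; the function names and the 80/100 and 1/100 load figures are hypothetical, and the default `price_ratio` of 0.65 is the GPU compute row from the table. It assumes perfect autoscaling and ignores the other table rows, both of which flatter the hyperscaler, and it ignores the extra ops effort, which flatters bare-metal.

```python
def utilization(avg_load: float, peak_load: float) -> float:
    """Average load as a fraction of peak: 1.0 is flat, 0.01 is a 100x peak."""
    if peak_load <= 0 or not 0 <= avg_load <= peak_load:
        raise ValueError("need peak_load > 0 and 0 <= avg_load <= peak_load")
    return avg_load / peak_load


def provisioned_wins(avg_load: float, peak_load: float,
                     price_ratio: float = 0.65) -> bool:
    """Compute-only check: is a bare-metal fleet sized for peak cheaper than
    cloud capacity that (ideally) tracks average load?

    Bare-metal cost ~ peak_load * p_bm; autoscaled cloud cost ~ avg_load * p_hs.
    Bare-metal wins when peak * p_bm < avg * p_hs, i.e. when
    avg / peak > p_bm / p_hs = price_ratio (default ~0.65, the GPU compute row).
    """
    return utilization(avg_load, peak_load) > price_ratio


if __name__ == "__main__":
    print(provisioned_wins(avg_load=80, peak_load=100))  # steady load: True
    print(provisioned_wins(avg_load=1, peak_load=100))   # spiky batch job: False
```

The threshold is just the price ratio: a workload whose average sits near its peak clears it, one that idles at a few percent of peak doesn't, and the bigger the per-hour discount, the spikier the workload bare-metal can tolerate.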

For regulated, predictable enterprise SaaS at sustained scale - which is most of what I see - it's the right one, and that sustained-scale workload is exactly where the ~40% holds.
