Go Big or go On-Prem

Published on 2025-06-02

Probably my favourite thing about the “Cloud Exit” from Basecamp that DHH is so vociferously writing about is that it has shifted the default assumptions about Cloud usage. For many years it seemed there was no choice: you created an AWS, Google Cloud or Azure¹ account and started creating resources. Who the hell is Dell? What’s a hypervisor?

But economic realities have kicked in, and people are starting to wonder why their Cloud bill is as high as the cost of buying a full rack of servers, and why they are employing even more operations staff on top of it.

On-Cloud, Prem-Cloud

On-Prem never really went away; depending on the survey, it seems that at least half of computing workloads are still hosted in a private data centre instead of the public cloud. And of course, you might argue about what “Cloud” even means: is it about the technology, the economic aspect, or culture? If you can autonomously provision your workloads on a cluster that’s managed by somebody else and is running on machines inside your organisation, it feels like Cloud to you, but it has a pay-to-own cost structure.

Conversely, if your organisation uses the services of a Cloud provider, but you don’t do DevOps and software cannot be deployed without operator intervention, you might be spending Cloud-money without embracing the culture and reaping the benefits.

Oxide advertises itself as “The cloud you own”: their product is a rack-scale computer that can be provisioned like a public cloud account, using e.g. Terraform. You don’t rent it, you buy it once and run it as long as it’s feasible – and I bet that’s longer than most people would expect.

And then there are combinations which never really made sense to me²: buying a bunch of servers and then paying heaps of yearly licence fees for every layer in the software stack – hypervisor, operating system, storage system, etc. Especially since none of those vendors is able or willing to solve your problems in case the shit hits the fan.

My biggest gripe is the low-levelness of many solutions that e.g. AWS offers: it takes astonishing complexity to get anything going. Configuring a setup on EC2 might be quite easy, but it means that you don’t get many of the benefits. You want to add a load balancer? Prepare to configure VPCs and subnets and manage multiple availability zones, and oh: don’t forget to add CloudTrail and CloudWatch!

Of course, as these tools stack up, so does your bill. Your trivial workload incurs a non-trivial cost. Also, most of the cloud setups I encounter are massively overprovisioned, just in case. It’s great that your development team doesn’t need to count CPU cycles and can just write Python, but when you only utilise 10% of already expensive compute, you’re not getting any of the savings.³ Development environments could be shut off overnight when nobody is using them, but getting this process in place seems to cost more than it saves in usage.
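To put a number on the utilisation point, here is a back-of-the-envelope sketch. The 5x list-price premium is the figure assumed in the footnote, not a measured value, and the function name `effective_premium` is made up for illustration.

```python
def effective_premium(list_premium: float, utilisation: float) -> float:
    """Premium paid per unit of compute that is actually used.

    list_premium: how many times more the rented compute costs than the
                  owned equivalent (assumed figure, e.g. 5.0).
    utilisation:  fraction of the paid-for capacity that does real work,
                  between 0 (exclusive) and 1 (inclusive).
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    # Idle capacity is still billed, so the premium scales inversely
    # with the share of capacity that is really in use.
    return list_premium / utilisation


if __name__ == "__main__":
    # Assumed 5x premium at 10% utilisation: roughly 50x per unit of used compute.
    print(f"{effective_premium(5.0, 0.10):.0f}x")
```

The same function shows why utilisation is the lever that matters: at full utilisation the premium is just the list premium, and every halving of utilisation doubles what you pay per unit of useful work.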

Even solutions like Elastic Container Service still require a lot of prefabrication: you don’t have to manage EC2 instances anymore, but now you have to figure out how to connect your container registry. It might make it easier to scale or to add redundancy, but the complexity is just moved around at best.

Kubernetes might be more provider-agnostic⁴, but I don’t think that anybody has ever made the argument that it is non-complex. And boy, do I see many setups that do not warrant its complexity. If your microservices outnumber your developers, you already have a problem.

Above the Cloud

Some people said “fuck all of that” and started moving to more upstack but lightweight solutions, like AWS Lambda. If your databases, storage and load balancers are already run by your cloud provider, you can go a step further and eliminate your internal complexity. Write a piece of JavaScript, Go or Python, upload it to AWS, and they host it for you.

As with everything, there are trade-offs: at the low end, cost is lower than spinning up any EC2 instance, but it can eat through any IT budget at the high end. You can start quicker, but you are locked into your provider. You can use a familiar programming language, but not necessarily any framework.

Heroku was truly ahead of their time: they certainly made mistakes, in technology as in business, but connecting your Git repo and getting your software deployed in minutes was truly groundbreaking. Unfortunately, in 2015 one half of developers was building the next Instagram, thought they’d outgrow Heroku within seconds and reached for more grown-up cloud solutions, while the other half was still SFTP-ing PHP scripts to their server. Maybe the world simply wasn’t ready yet.

It’s all about the Benjamins

The new generation of more integrated Cloud offerings has better chances: borrowing money isn’t as cheap as it was, companies don’t want to hire additional Cloud operators, and developers are expected to be more productive. People are reaching for more managed tooling: Vercel is already a big hit, despite the underlying framework, Next.js, being an absolute nightmare. Quality of integration is more important than the quality of your individual components.

With services like fly.io, the operational burden stays low while you still get a lot of options. If you’ve got a Dockerfile, you can get your service running in mere minutes, and it’s almost trivial to scale to multiple regions without it being more expensive than AWS et al.

I absolutely get it when people do the maths and conclude that buying and running their own hardware makes sense. I also agree that paying for an integrated service is great. But shelling out top dollar for furniture that you need an expert to assemble and to keep together?

As I wrote in my previous piece, the big cloud providers have one enormous advantage: unified and centralised billing. CFOs across the globe still think they made a killer deal when the Cloud vendor offers them 20% off or gives them a five- or six-figure discount, but these are not acts of charity – they make only a slight dent in your bill over the long run. Make no mistake: the vendors still have obscene margins, and there is a lot of room for negotiation. Egress especially is a money printer.

If they don’t want to make you a seemingly generous offer, they either think that you’re not going to grow or that your organisation is incapable of switching to a different vendor.

It’s much easier to add another ELB than to get the company credit card to sign up with a new provider. As a side note, if you are not letting your staff create new resources autonomously without an architecture review board getting involved, it doesn’t make sense to be in the Cloud at all. One strategy against becoming too complacent is to give developers personal budgets that they can freely spend on services from other providers without going through a massive purchasing process.

Computer hardware has become much faster and cheaper, but many of the providers do not pass the savings on to the customer. AWS used to reduce prices on a regular schedule, but those times are over. EC2 prices have at best stayed flat, while Amazon runs their hardware for much longer these days. It’s a marker of their success that they are dominating the market, but also that CTOs have become a bit blind to the fundamentals. Even I was stumped when I realised that the smallest VM instance at Hetzner now offers 4GB of RAM, at just under 4 EUR (USD ~4.50) per month!

It’s a matter of incentives, too: technologists are not really pushed towards being cost-efficient. Setting up a Kubernetes cluster may be driven more by the desire to be able to put it on the CV than to improve reliability or speed.

The technology sector, as unconventional as it can be, has largely kept sales quite traditional: lower base salary, bigger commission. Which kind of makes sense – when sales gets a win, the whole company profits. With technical roles, one could argue the same: when a developer cuts the Cloud bill, everybody saves. But all these savings will end up, at best, in the shareholders’ pockets. Only when it’s almost too late will management start rampaging, trying to cut down and questioning every line item.

I think a more balanced leadership approach needs to be adopted that emphasises more mid-term thinking: the long term is overvalued in my opinion – some C-level staff try to imagine a wild future vision, where they are AI-first or whatever makes them look smart or gets them through their next funding round. The short term is dominated by reactive firefighting. In between, you can push for a better discount with your Cloud provider, move some workloads either on-prem or onto servers you own, push for better utilisation or move to higher-value, lower-maintenance all-in-one providers. Employ people who are smart and adaptable, not those who just have the tech you are currently using on their CV. Chances are, your business will need to change more than it needs the current crop of tech.

AWS likes to emphasise in their training materials the dangers of “undifferentiated heavy lifting” in which you are doing things that don’t improve your product, but require your attention. If you look at the heavy lifting that you need to do to satisfy AWS’s well-architected framework, one could argue that you shouldn’t be a customer there.

How is your strategy going? Are you going more upstack or doubling down on on-prem? I’d love to hear about your experiences!

¹ God help us.
² Despite some people’s best efforts to explain it to me like I am five.
³ A 10% utilisation also means that you are not paying a 5x premium on the compute – you are paying 50x!
⁴ It isn’t though for most.