Your Data Platform Is Now 60% of Your Cloud Bill.
Do You Know Where It All Went?
Most companies can tell you what they spent on data platforms last year. Almost none can tell you what they got for it.
Three years ago, if a company was spending a million dollars on AWS, less than $50,000 of that went to data platforms like Databricks, Snowflake, or BigQuery. Less than 5%.
Today, for many enterprises, data platforms account for 60% of the total cloud bill. That shift happened in about two and a half years. And it’s still accelerating.
If you’re a senior data operations or FinOps leader, you’ve felt this. The budget conversations have changed. Early on, there was a spirit of experimentation. Spend on AI, build the models, see what’s possible. But that grace period is over. The question now is simple and unavoidable: show me the ROI.
The problem is, most teams can’t answer that question. Not because the value isn’t there, but because they don’t have the visibility or tools to measure and manage it.
The bill is growing. The understanding isn’t.
The growth in data and AI costs isn’t mysterious. The drivers are straightforward if you look one level down.
First, the user base has expanded dramatically. It’s no longer just your data engineering team running queries. Marketing, finance, legal, and business analysts are all now hitting these platforms. That’s a good thing. It means data and AI are becoming core business tools, not science projects. But it also means the volume of workloads is compounding in ways most infrastructure teams didn’t plan for.
Second, AI has made it absurdly easy to generate code. Natural language to SQL. Copilot-assisted pipelines. People who never wrote a query before are now creating them daily. Companies that had 10,000 lines of code now have millions. Companies that had millions now have hundreds of millions.
More users. More code. More queries. More pipelines. And every one of them costs money.
The accountability gap
Here’s where most organizations get stuck. They can tell you they spent $10 million on their data platform last year. What they can’t tell you is where that money actually went in business terms.
“Cluster A” or “Project B” doesn’t mean anything to the CFO asking about ROI. What matters is that the marketing department spent this much running the recommendation engine. The risk team spent this much on fraud detection models. The analytics group spent this much on customer reporting.
Without that decomposition into business context, you can’t answer the question that actually matters: was that spend well spent, or was it wasted?
This is the fundamental gap. It’s not a budgeting problem. It’s an attribution problem. And until you solve it, every ROI conversation will feel like guesswork.
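The attribution idea above can be sketched in a few lines. This is a minimal illustration, not any particular platform's API: it assumes billing exports arrive as (cluster, cost) records and that a manually maintained tag map translates infrastructure labels into business context. All names and dollar figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical usage records, as a billing export might provide them:
# each row is (cluster_name, cost_in_dollars). Names are illustrative.
usage = [
    ("cluster-a", 42_000.0),
    ("cluster-b", 17_500.0),
    ("cluster-a", 8_300.0),
    ("cluster-c", 5_100.0),   # untagged: no owner on record
]

# Manually maintained mapping from infrastructure labels to business terms.
tag_map = {
    "cluster-a": "marketing / recommendation engine",
    "cluster-b": "risk / fraud detection",
}

def attribute_spend(usage, tag_map):
    """Roll raw cluster costs up into business-level spend.

    Anything without a tag lands in an 'unattributed' bucket, which is
    itself a useful number: it measures the size of the accountability gap.
    """
    totals = defaultdict(float)
    for cluster, cost in usage:
        owner = tag_map.get(cluster, "unattributed")
        totals[owner] += cost
    return dict(totals)

print(attribute_spend(usage, tag_map))
```

The design point is the fallback bucket: the share of spend that lands in "unattributed" is a direct measure of how far an organization is from answering the CFO's question.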
There’s another layer to this that doesn’t get enough attention. As the user base expands and AI generates more code, the inefficiency problem compounds.
Think about what happens when someone runs a select-star query on a petabyte table. That single query can cost tens of thousands of dollars and take hours to complete. Now multiply that across hundreds of users, many of whom aren’t SQL experts, running queries they wrote themselves or had an AI write for them.
There’s a stat making the rounds right now: AI-generated SQL is correct 95% of the time. The other 5% is wrong, and you don’t know which 5%. So you have to test everything. But beyond correctness, there’s the efficiency question. A query can return the right results and still be wildly inefficient in how it gets there, which can dramatically increase costs.
Then there’s the legacy code problem. Pipelines written by people who no longer work at the company. Code that’s been modified over time, with no one fully understanding all the nuances. These things accumulate, and at scale, the cost of those inefficiencies is staggering.
The compounding effect is real. More users are generating more code with less expertise, running on data sets that are 10 times larger than last month. Every inefficiency gets amplified.
From cost-cutting to unit economics
The most sophisticated teams are moving beyond “how do I spend less?” to a better question: “What does it cost me to produce one unit of business value?”
I’ve seen this firsthand. A financial services company running credit score analytics for millions of consumers couldn’t answer a basic question: What does it cost us to produce one credit report? They knew their total platform bill. They had no idea what it cost to generate a single unit of output, which meant they had no way to think intelligently about pricing, margins, or where to invest next.
Once they could decompose the spend to that level, the whole conversation changed. They found legacy pipelines eating hundreds of thousands of dollars a year that nobody was using. They identified workloads where a 30% efficiency improvement meant they could double throughput without increasing budget. The ROI story wrote itself, because they finally had the data to tell it.
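The unit-economics arithmetic is simple once the attribution exists. The sketch below uses illustrative numbers, not figures from the credit-report example, and shows why unit cost is the lever: at a fixed budget, throughput scales inversely with cost per unit.

```python
def cost_per_unit(platform_cost, units_produced):
    """Dollars of platform spend per unit of business output."""
    if units_produced == 0:
        raise ValueError("no output produced; spend is pure overhead")
    return platform_cost / units_produced

def throughput_at_budget(budget, unit_cost):
    """How many units a fixed budget buys at a given unit cost."""
    return int(budget / unit_cost)

# Illustrative numbers: $120,000 of quarterly spend producing
# 4 million credit reports works out to $0.03 per report.
unit_cost = cost_per_unit(120_000, 4_000_000)

print(throughput_at_budget(120_000, unit_cost))
# A 30% efficiency gain cuts unit cost to $0.021, so the same budget
# buys roughly 5.7 million reports instead of 4 million.
print(throughput_at_budget(120_000, unit_cost * 0.7))
```

Once a team can state the second number, the budget conversation changes from "spend less" to "here is what another dollar buys."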
That’s where this conversation grows up. It stops being about trimming budgets and starts being about understanding the economics of your data products well enough to make intelligent business decisions.
Prevention, not just cure
Most teams first encounter this problem after the fact. The bill arrives, it’s bigger than expected, and they scramble to figure out why. That’s the cure side.
The prevention side is more valuable. Can you catch inefficient code before it hits production? Can you establish guardrails during the dev-to-prod pipeline that distinguish healthy code from unhealthy code? Can you train the expanding user base to be better citizens of the platform so the problems don’t compound?
The best teams are doing both. They’re cleaning up the existing mess and putting systems in place so new workloads don’t create the same problems all over again. That combination of cure and prevention is what separates the organizations that get their data costs under control from the ones that keep playing catch-up.
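A guardrail of the kind described above can be as simple as a check in the dev-to-prod pipeline. The sketch below is a deliberately crude illustration: the table names are hypothetical, and a production gate would consult the query planner's cost estimate rather than pattern-match SQL text. It only shows the shape of the idea, flagging unbounded select-star queries against known large tables before they ship.

```python
import re

# Hypothetical watchlist of tables known to be expensive to scan in full.
LARGE_TABLES = {"events_raw", "clickstream"}

def violations(sql: str) -> list[str]:
    """Return guardrail violations for a candidate query.

    Flags SELECT * against a watchlisted table when the query has no
    WHERE clause or LIMIT, i.e. nothing bounding the scan.
    """
    problems = []
    flat = " ".join(sql.lower().split())   # normalize whitespace and case
    if "select *" in flat:
        for table in LARGE_TABLES:
            if re.search(rf"\bfrom\s+{table}\b", flat) and \
               not re.search(r"\b(where|limit)\b", flat):
                problems.append(f"unbounded SELECT * on large table {table}")
    return problems

print(violations("SELECT * FROM events_raw"))                          # flagged
print(violations("SELECT * FROM events_raw WHERE day = '2024-01-01'")) # passes
print(violations("SELECT user_id FROM events_raw"))                    # passes
```

Wired into a CI step that fails the build on any violation, even a check this naive catches the petabyte select-star before it hits production rather than after it hits the bill.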
The real question
The workloads aren’t slowing down. The user base isn’t shrinking. And “spend less” isn’t a strategy when your business is demanding more from data every quarter.
The question that actually matters isn’t “how do I cut my data platform bill?” It’s “do I know what I’m buying?”
Can you trace a dollar of platform spend to a business outcome? Can you tell the difference between a workload that’s generating margin and one that’s burning budget because someone who left the company three years ago wrote a bad pipeline?
Until you can, the ROI conversation hasn’t started. You’re just arguing about a number on a bill.
This article was inspired by a conversation between Unravel Data CEO Kunal Agarwal and Eric Kavanagh on Inside Analysis.