According to Fortune, a new study released Thursday by the AI Energy Score project found that AI reasoning models consumed 30 times more energy on average to answer 1,000 written prompts than comparable models without reasoning. The research, led by Hugging Face’s Sasha Luccioni and Salesforce’s Boris Gamazaychikov, evaluated 40 open-weight AI models from companies like OpenAI, Google, and Microsoft. The disparity was extreme in some cases: a DeepSeek R1 model used 50 watt-hours with reasoning off but a staggering 7,626 watt-hours with it on. This adds to existing concerns about AI straining power grids, with a Bloomberg investigation noting wholesale electricity prices have jumped as much as 267% in some data-center-heavy areas. The push for reasoning models, like OpenAI’s o1, aims to solve complex problems but comes with a significant, previously under-researched energy cost.
The Reasoning Power Guzzler
Here’s the thing: we all want AI that can “think” step-by-step like a human. It’s fantastic for coding, math, and science. But this study, detailed on arXiv, basically shows we’re paying for that capability in literal megawatt-hours. The core issue? These models generate way, way more text internally before giving you a final answer. It’s not a quick lookup; it’s a whole internal monologue you’re powering. And when you scale that from a single prompt to billions of queries, the numbers get scary fast. The researchers used a tool called CodeCarbon to track energy use in real time, and the variance was wild. Microsoft’s Phi 4 model went from 18 watt-hours to over 9,400 watt-hours just by flipping the reasoning switch. That’s not a bump; it’s a cliff.
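To make that methodology concrete, here’s a minimal sketch of this kind of measurement, assuming CodeCarbon’s Python EmissionsTracker API. The generate_fn callable is a hypothetical stand-in for a model’s inference call, not the study’s actual harness:

```python
# Minimal measurement sketch using CodeCarbon (not the study's exact harness).
# EmissionsTracker samples hardware power while the wrapped code runs and
# reports total energy consumed (in kWh) once stopped.
from codecarbon import EmissionsTracker

def measure_energy_wh(generate_fn, prompts):
    """Run every prompt through generate_fn; return total energy in watt-hours."""
    tracker = EmissionsTracker(log_level="error")
    tracker.start()
    try:
        for prompt in prompts:
            generate_fn(prompt)  # hypothetical model inference call
    finally:
        tracker.stop()
    # CodeCarbon reports kWh; convert to Wh to match the study's figures.
    return tracker.final_emissions_data.energy_consumed * 1_000.0

# Running the same 1,000 prompts once with reasoning on and once with it off
# gives the per-mode totals the study contrasts (e.g. 50 Wh vs. 7,626 Wh).
```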
Not All Models Are Created Equal
But it’s not a uniform disaster. The research also shows some models handle it better than others. OpenAI’s open-weight gpt-oss model showed a less dramatic jump between its “high” and “low” reasoning settings. And Google has argued its own median energy use per prompt is tiny, comparable to watching a few seconds of TV. So there’s clearly room for optimization and efficiency gains. The real problem, as Luccioni points out, is using a reasoning model for a task that doesn’t need it. Why use a computational chainsaw to cut a piece of paper? We’re in a classic tech hype cycle where “most advanced” is conflated with “always appropriate.” That’s a fast track to burning unnecessary energy and money.
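That “reasoning dial” is already exposed in some APIs. As a hedged illustration, OpenAI’s Python SDK accepts a reasoning_effort parameter on its reasoning-capable models; the model name below is a placeholder, not a recommendation:

```python
# Hedged sketch: dialing reasoning effort down for simple tasks.
# reasoning_effort is supported on OpenAI's reasoning models; the model
# name here is a placeholder and defaults may differ across models.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, effort: str = "low") -> str:
    """Answer a prompt, spending only as much hidden reasoning as requested."""
    response = client.chat.completions.create(
        model="o3-mini",          # placeholder reasoning-capable model
        reasoning_effort=effort,  # "low" | "medium" | "high"
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The specific SDK isn’t the point; the point is that a per-request effort control is the lever separating a 50 watt-hour answer from a 7,626 watt-hour one.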
The Bigger Picture: Strain and Scrutiny
This isn’t just about your ChatGPT query. It’s about infrastructure. The industry is rapidly shifting focus from training models (which is a huge, one-time-ish energy hit) to inference—the constant, daily running of them. Reasoning models are inference-heavy. So the very trend that’s making AI more useful is also making it more power-hungry, permanently. Microsoft CEO Satya Nadella recently said the industry needs to earn the “social permission to consume energy” for this, which is a remarkable admission. When utilities are warning about grid strain and companies like Microsoft admit data centers threaten their climate goals, you know it’s serious. The conversation is moving from environmental circles to the C-suite, and that’s where real trade-offs get made.
So What Do We Do About It?
Look, nobody’s saying we should stop developing reasoning models. The capabilities are too important. But this research is a crucial wake-up call for smarter deployment. It means developers need to build more efficient architectures and offer clearer “power settings” to users. It means companies using AI should have a tiered model strategy, using heavy reasoning only when absolutely necessary (a sketch follows below). And honestly, it means we need more transparency: relying on companies’ own internal estimates isn’t enough. Independent benchmarking, like this study, has to become the norm. The goal should be powerful AI that doesn’t inadvertently power down our grids or cook the planet. Turns out, that’s another complex problem that needs solving.
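Here’s one way that tiered strategy could look in practice: a minimal sketch with placeholder model names and a deliberately crude keyword heuristic (a real router would likely use a small classifier instead):

```python
# Sketch of a tiered model strategy. Model names are placeholders; the
# heuristic is intentionally simple -- escalate to the expensive reasoning
# model only when the prompt looks like it needs multi-step thinking.
import re

HARD_TASK_HINTS = re.compile(
    r"\b(prove|derive|debug|optimi[sz]e|step[- ]by[- ]step|algorithm)\b",
    re.IGNORECASE,
)

def pick_model(prompt: str) -> str:
    """Return the cheapest model tier that plausibly fits this prompt."""
    if HARD_TASK_HINTS.search(prompt) or len(prompt) > 2_000:
        return "reasoning-model"   # placeholder: o1-class, high energy cost
    return "standard-model"        # placeholder: cheap non-reasoning default

# Routine lookups stay cheap; genuinely hard requests escalate.
assert pick_model("What's the capital of France?") == "standard-model"
assert pick_model("Debug this race condition step by step") == "reasoning-model"
```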
