Not all AI prompts are equal. Some emit 50x more carbon than others. Here’s why.

When researchers delved into the tradeoff between AI sustainability and accuracy, they uncovered strategies for greener chatbots.

By Sarah DeWeerdt

June 24, 2025 in Anthropocene magazine

Some AI prompts result in 50 times more carbon emissions than others, according to a new study. The findings suggest that large language models (LLMs)—the technology behind advanced chatbots such as ChatGPT—face a tradeoff between sustainability and accuracy when answering questions or responding to prompts from human users.

Generative AI models, a category that includes LLMs, consume an estimated 29.3 terawatt hours of electricity every year, roughly equivalent to Ireland’s annual energy consumption. But relatively little scientific research has focused on the environmental impacts of LLMs.

In the new study, researchers measured the carbon emissions generated by each of 14 different LLMs that were asked a series of 1,000 standard questions—500 multiple-choice and 500 free-response—covering philosophy, high school world history, international law, abstract algebra, and high school mathematics.

The LLMs in the study ranged in size from 7 billion to 72 billion parameters—the internal settings, learned during training, that determine how a model responds. The study included both concise models, which produce short answers to prompts quickly and with minimal intermediate steps, and “reasoning-enabled” models, which explicitly describe the step-by-step logic underpinning their answers.

Larger models and reasoning-enabled models tend to yield more accurate answers, the researchers report in the journal Frontiers in Communication. But they also have a greater climate impact.

Cogito-70B, a reasoning-enabled model with 70 billion parameters, answered 84.9% of the questions correctly and generated 1.34 kilograms of carbon dioxide emissions. Qwen-7B, a concise model with just 7 billion parameters, emitted just 27.7 grams of carbon dioxide—but got only about one-third of the answers right.

People speak in words, while computers speak in a code of ones and zeroes. To translate between the two, LLMs generate “tokens”—words or parts of words that are converted into a string of numbers. Every token requires energy to produce and results in carbon emissions.
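
To make the token idea concrete, here is a minimal sketch using the open-source Hugging Face transformers library (the study does not specify any particular tooling, and the GPT-2 tokenizer is chosen purely because it is freely available). It shows how a prompt and two possible answers break into integer tokens, and how a verbose answer consumes many more of them:

```python
# Minimal tokenization sketch. The tokenizer choice is illustrative only;
# the study does not name a specific tokenizer or library.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "What is the capital of Ireland?"
short_answer = "Dublin."
verbose_answer = ("Let me think step by step. Ireland is a country in Europe, "
                  "and its capital city is Dublin. Therefore the answer is Dublin.")

for label, text in [("prompt", prompt),
                    ("short answer", short_answer),
                    ("verbose answer", verbose_answer)]:
    ids = tokenizer.encode(text)  # text -> list of integer token IDs
    print(f"{label}: {len(ids)} tokens -> {ids[:8]} ...")
```

Every one of those integers must be generated by the model, which is why a longer, more discursive answer carries a larger energy and carbon cost.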

“Every extra point of accuracy usually comes with a markedly higher carbon cost, because larger or ‘reasoning-enabled’ models generate far more tokens per answer,” says study team member Maximilian Dauner, a graduate student at the Munich University of Applied Sciences.

The reasoning models generated an average 543.5 “thinking” tokens per question while the concise models generated just 37.7. These “thinking” tokens are used just to arrive at the answer; the models also produce additional “response” tokens to report their final answers.
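
As a back-of-the-envelope illustration of why those extra “thinking” tokens matter, the sketch below converts the article’s average token counts into rough emissions estimates. The per-token energy use and grid carbon intensity are placeholder assumptions for illustration, not figures from the study:

```python
# Back-of-the-envelope sketch: how the thinking-token gap between reasoning
# and concise models translates into an emissions gap. Energy-per-token and
# grid intensity below are HYPOTHETICAL placeholders, not study values.
AVG_THINKING_TOKENS = {"reasoning": 543.5, "concise": 37.7}  # from the article
ENERGY_PER_TOKEN_WH = 0.002        # assumed watt-hours per generated token
GRID_INTENSITY_G_PER_KWH = 480.0   # assumed grams of CO2 per kWh of electricity

def grams_co2(tokens: float) -> float:
    """Estimate emissions for generating a given number of tokens."""
    kwh = tokens * ENERGY_PER_TOKEN_WH / 1000.0
    return kwh * GRID_INTENSITY_G_PER_KWH

for mode, tokens in AVG_THINKING_TOKENS.items():
    print(f"{mode}: {tokens:.1f} thinking tokens/question "
          f"-> ~{grams_co2(tokens):.3f} g CO2 (under the assumptions above)")

ratio = AVG_THINKING_TOKENS["reasoning"] / AVG_THINKING_TOKENS["concise"]
print(f"Reasoning mode generates ~{ratio:.0f}x more thinking tokens per question.")
```

Whatever the true per-token figures are, the roughly 14-fold gap in thinking tokens scales the footprint before a single word of the final answer is produced.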

Response tokens could be a surprising source of climate impact, Dauner adds. “Reasoning variants occasionally produced excessive verbosity despite being prompted for a simple answer,” he says. One model’s answer to an abstract algebra problem ran to 37,575 tokens, while another droned on for 14,187 tokens in response to a high school math question.

Meanwhile, LLMs are getting bigger and more complex. Additional studies will be necessary to understand the tradeoffs between accuracy and climate impact for LLMs like GPT-4 with hundreds of billions or even trillions of parameters.

The accuracy and climate impact of AI also depends on the subject matter, the researchers found. “Abstract algebra consistently forced all models, large or small, to think longer while still achieving the lowest accuracy,” says Dauner. “In contrast, factual history questions were answered quickly and correctly.”

Understanding these patterns can help people minimize the climate impact of their LLM use. “Ask for short answers (‘one word only,’ ‘bullet list,’ or ‘keep it under two sentences’) to reduce unnecessary response tokens, especially in reasoning mode,” Dauner suggests. “Use the smallest model that meets your quality needs.” And turn to LLMs for real information needs, not out of boredom or joking around.
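
As a concrete illustration of that advice, here is a sketch of how a short-answer request might look through a chat-style API, using the OpenAI Python client purely as a familiar example (the study did not evaluate hosted APIs); the model name and token cap are placeholders, not recommendations from the researchers:

```python
# Sketch of the prompting advice applied through a chat-style API.
# Model name and token cap are placeholders chosen for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # "use the smallest model that meets your quality needs"
    max_tokens=30,         # hard cap on response tokens
    messages=[
        {"role": "system",
         "content": "Answer in one word or a single short sentence."},
        {"role": "user",
         "content": "In what year did the Berlin Wall fall?"},
    ],
)
print(response.choices[0].message.content)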

Dauner and his collaborators are developing an algorithm to select the leanest LLM likely to yield a reliable answer to a given question, which could reduce emissions by an order of magnitude, he says. “Choosing ‘just-big-enough’ models and trimming superfluous reasoning is the fastest way to greener AI,” says Dauner.
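
The team’s algorithm is not yet published, but the general idea (route each question to the smallest model expected to answer it reliably, and escalate only when necessary) might look something like the sketch below, with hypothetical model names, emissions figures, and accuracy estimator:

```python
# Generic sketch of a "just-big-enough" model router. This is NOT the authors'
# algorithm; model names, emission figures, and the accuracy estimator are
# hypothetical placeholders.
MODELS = [  # ordered smallest and cheapest first
    {"name": "small-7b",  "g_co2_per_answer": 0.03},
    {"name": "mid-32b",   "g_co2_per_answer": 0.4},
    {"name": "large-70b", "g_co2_per_answer": 1.3},
]

def expected_accuracy(model_name: str, question: str) -> float:
    """Placeholder: in practice this would be a learned predictor that
    estimates how likely a model is to answer this question correctly."""
    hardness = 0.9 if "algebra" in question.lower() else 0.2
    size_bonus = {"small-7b": 0.0, "mid-32b": 0.2, "large-70b": 0.35}[model_name]
    return max(0.0, min(1.0, 1.0 - hardness + size_bonus))

def route(question: str, target_accuracy: float = 0.8) -> str:
    """Return the leanest model expected to meet the accuracy target,
    falling back to the largest model if none qualifies."""
    for model in MODELS:
        if expected_accuracy(model["name"], question) >= target_accuracy:
            return model["name"]
    return MODELS[-1]["name"]

print(route("When did the Thirty Years' War end?"))    # stays on the small model
print(route("Prove this abstract algebra identity."))  # escalates to the largest
```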

Source: Dauner M. and G. Socher. “Energy costs of communicating with AI.” Frontiers in Communication 2025.
