Why AI Training Consumes So Much Power
Training a large language model is one of the most energy-intensive computing tasks ever attempted. A single training run for a frontier AI model can require thousands of graphics processing units (GPUs) running continuously for weeks or months. Each GPU consumes between 350 and 700 watts at full load. A training cluster of 10,000 GPUs draws between 3.5 and 7 megawatts from the GPUs alone. When cooling, networking, power distribution, and storage are included, total facility power for a large training run can exceed 25 megawatts sustained over several months.
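As a rough illustration of how these numbers compose, the back-of-envelope sketch below estimates total facility draw for a training cluster. The GPU count, per-GPU wattage, and overhead factor are illustrative assumptions drawn from the ranges above, not measured data.

```python
# Back-of-envelope estimate of training-cluster power draw.
# All inputs are illustrative assumptions taken from the ranges above.

gpu_count = 10_000        # GPUs in the training cluster
watts_per_gpu = 700       # H100-class accelerator at full load (350-700 W range)
overhead_factor = 1.5     # assumed multiplier for cooling, networking, power distribution, storage

gpu_power_mw = gpu_count * watts_per_gpu / 1_000_000
facility_power_mw = gpu_power_mw * overhead_factor

print(f"GPU power alone:      {gpu_power_mw:.1f} MW")       # 7.0 MW
print(f"Total facility power: {facility_power_mw:.1f} MW")  # 10.5 MW
# Frontier runs using two to three times as many GPUs land in the 25+ MW range cited above.
```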
A 2025 Congressional Research Service report estimated that training a specific large AI model required a total power draw of 25.3 megawatts. Another study found that training a single frontier model consumed 50 gigawatt-hours of energy, enough to power San Francisco for three days. These figures represent individual training runs. Companies like OpenAI, Google, Meta, and Anthropic run multiple training experiments continuously, with each generation of model requiring substantially more compute than the last.
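The power and energy figures are consistent at back-of-envelope precision: sustaining roughly 25 megawatts for about three months works out to about 50 gigawatt-hours, as sketched below. The run duration is an assumption for illustration only.

```python
# Illustrative conversion from sustained power to total training energy.
# The 25.3 MW figure is from the text; the ~3-month run duration is an assumption.

facility_power_mw = 25.3
run_duration_days = 90

energy_gwh = facility_power_mw * run_duration_days * 24 / 1_000
print(f"Training energy: {energy_gwh:.0f} GWh")  # ~55 GWh, same order as the 50 GWh estimate
```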
The Hardware Behind AI Power Consumption
The power consumption of AI training is driven primarily by GPUs and the infrastructure that supports them. NVIDIA’s H100 GPU, which became the standard training accelerator in 2023 and 2024, has a thermal design power of 700 watts. Its successor, the B200, pushes even higher. A single server containing eight H100 GPUs draws approximately 10 kilowatts. A rack filled with these servers can require 40 to 80 kilowatts, compared to 7 to 15 kilowatts for a traditional enterprise server rack.
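For a sense of how the rack figure arises, the sketch below works backward from the per-server draw. The servers-per-rack count is an assumption; actual deployments vary with cooling limits.

```python
# Rough rack power density comparison (assumed server count per rack).

kw_per_gpu_server = 10       # eight-H100 server, per the text
gpu_servers_per_rack = 4     # assumed; dense deployments fit roughly 4-8

ai_rack_kw = kw_per_gpu_server * gpu_servers_per_rack
print(f"AI training rack:        ~{ai_rack_kw} kW")  # 40 kW at 4 servers, 80 kW at 8
print("Traditional server rack:  7-15 kW")
```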
This power density creates cascading demands. More power per rack means more heat per rack, requiring more sophisticated and energy-intensive cooling. The networking equipment connecting thousands of GPUs at high speed adds additional load. Storage systems feeding training data to the cluster run continuously. The total infrastructure power requirement for a 100-megawatt AI training campus dwarfs that of most traditional data center deployments.
How Training Clusters Are Organized
AI training requires not just raw power but coordinated power. Thousands of GPUs must communicate at extremely high bandwidth with minimal latency. This means training clusters are physically concentrated, typically filling multiple rows of racks in the same data hall, connected by high-speed InfiniBand or Ethernet networks. The physical proximity requirement means the power demand is concentrated in a small area, making it harder to cool and harder to supply from the local grid.
Training runs also demand consistent, uninterrupted power. A power fluctuation that causes even a few GPUs to fail can corrupt the training state and require restarting from a checkpoint, wasting hours or days of compute. This makes power reliability as important as power quantity. Most AI training facilities require dual-feed utility power, on-site backup generation, and sophisticated power conditioning equipment.
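The checkpointing pattern that makes restarts possible is simple in outline: training state is periodically written to durable storage so a power event only costs the compute done since the last save. The sketch below is a minimal illustration using PyTorch, not any particular lab's pipeline; the model, optimizer, and file path are placeholders.

```python
import torch

def save_checkpoint(model, optimizer, step, path="checkpoint.pt"):
    """Persist training state so an interruption only loses work since the last save."""
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    """Restore training state after a power event or GPU failure."""
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]  # resume from the saved step

# In the training loop, saving every N steps bounds the compute lost to a failure:
# if step % checkpoint_interval == 0:
#     save_checkpoint(model, optimizer, step)
```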
The Grid Impact
The concentrated, sustained power demands of AI training are fundamentally different from traditional data center loads. A hyperscale cloud data center might draw 50 megawatts with variable utilization throughout the day. An AI training facility might draw 100 megawatts at near-constant load around the clock. This baseload characteristic makes AI training facilities attractive to utilities in some respects but also strains local transmission and distribution infrastructure.
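To see why the load shape matters as much as the peak, compare daily energy under the two profiles described above. The average utilization figure for the variable cloud load is an assumption.

```python
# Daily energy for a variable cloud load vs. a near-constant AI training load.
# The 60% average utilization for the cloud facility is an assumption.

cloud_peak_mw, cloud_avg_utilization = 50, 0.60
training_mw = 100

cloud_mwh_per_day = cloud_peak_mw * cloud_avg_utilization * 24   # 720 MWh
training_mwh_per_day = training_mw * 24                          # 2,400 MWh

print(f"Cloud data center:    {cloud_mwh_per_day:,.0f} MWh/day")
print(f"AI training facility: {training_mwh_per_day:,.0f} MWh/day")
```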
In Northern Virginia, the world’s largest data center market with approximately 4,000 megawatts of capacity, utilities like Dominion Energy are investing billions in grid upgrades to serve growing AI demand. In Texas, data center power demand is expected to reach 9.7 gigawatts in 2025. Some developers are pursuing behind-the-meter power solutions, including on-site natural gas generation and fuel cells, to bypass grid constraints entirely.
The Trajectory
AI training power demand is growing faster than any other category of electricity consumption. Gartner projects that AI-optimized servers will account for 44 percent of total data center power consumption by 2030, up from 21 percent in 2025. The power required to train each successive generation of frontier model has been growing roughly tenfold every two to three years. Whether efficiency improvements in chip design and training algorithms can moderate this trajectory remains one of the most consequential questions for both the AI and energy industries.
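That growth rate compounds quickly. The sketch below extrapolates it as pure arithmetic; the starting value and the horizon are illustrative assumptions, not a forecast.

```python
# Implied compounding if training power grows ~10x every 2.5 years.
# Starting value and horizon are illustrative assumptions, not projections.

start_mw = 25            # roughly a frontier training run today, per the figures above
growth_per_period = 10
period_years = 2.5

for years in (2.5, 5.0, 7.5):
    power_mw = start_mw * growth_per_period ** (years / period_years)
    print(f"After {years:>4} years: ~{power_mw:,.0f} MW")
# ~250 MW, ~2,500 MW, ~25,000 MW -- which is why efficiency gains matter so much
```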