Google’s new Ironwood chip is 24x more powerful than the world’s fastest supercomputer

Google unveils Ironwood, its seventh-generation TPU chip delivering 42.5 exaflops of AI compute power — 24x more than the world's fastest supercomputer — ushering in the "age of inference."

Google Cloud unveiled its seventh-generation Tensor Processing Unit (TPU), called Ironwood, on Wednesday: a custom AI accelerator that the company claims delivers more than 24 times the computing power of the world’s fastest supercomputer when deployed at scale. The new chip, announced at Google Cloud Next ’25, represents a significant pivot in Google’s decade-long AI chip development strategy.
While previous generations of TPUs were designed for both training and inference workloads, Ironwood is the first built specifically for inference — the process of deploying trained AI models to make predictions or generate responses. “Ironwood is built to support this next phase of generative AI and its tremendous computational and communication requirements,” said Amin Vahdat, Google’s Vice President and General Manager of ML, Systems, and Cloud AI, in a virtual press conference ahead of the event. “This is what we call the ‘age of inference,’ where AI agents will proactively retrieve and generate data to collaboratively deliver insights and answers, not just data.”

Shattering computational barriers: Inside Ironwood’s 42.5 exaflops of AI muscle

The technical specifications of Ironwood are striking. When scaled to 9,216 chips per pod, Ironwood delivers 42.5 exaflops of computing power — dwarfing El Capitan, currently the world’s fastest supercomputer, which delivers 1.7 exaflops. Each individual Ironwood chip delivers peak compute of 4,614 teraflops.

Ironwood also features significant memory and bandwidth improvements. Each chip comes with 192GB of High Bandwidth Memory (HBM), six times more than Trillium, Google’s previous-generation TPU announced last year. Memory bandwidth reaches 7.2 terabits per second per chip, a 4.5x improvement over Trillium. Perhaps most importantly in an era of power-constrained data centers, Ironwood delivers twice the performance per watt compared to Trillium, and is nearly 30 times more power efficient than Google’s first Cloud TPU from 2018.
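As a quick back-of-envelope check (our arithmetic, based only on the figures reported above), the numbers hang together:

```python
# Sanity-check the published Ironwood figures reported above.
chips_per_pod = 9_216
per_chip_tflops = 4_614                      # peak teraflops per chip

pod_exaflops = chips_per_pod * per_chip_tflops / 1_000_000
print(f"{pod_exaflops:.1f} exaflops")        # 42.5, as claimed

# Versus El Capitan's 1.7 exaflops:
print(f"{pod_exaflops / 1.7:.0f}x")          # ~25x, i.e. "more than 24 times"

# Implied Trillium specs from the stated multipliers (our inference,
# not official figures): 192 GB HBM is six times Trillium's, and
# 7.2 Tbps of bandwidth is a 4.5x improvement over it.
print(192 / 6)                               # 32.0 GB HBM per Trillium chip
print(round(7.2 / 4.5, 1))                   # 1.6 Tbps per Trillium chip
```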
“At a time when available power is one of the constraints for delivering AI capabilities, we deliver significantly more capacity per watt for customer workloads,” Vahdat explained.

From model building to ‘thinking machines’: Why Google’s inference focus matters now

The emphasis on inference rather than training represents a significant inflection point in the AI timeline. For years, the industry has been fixated on building increasingly massive foundation models, with companies competing primarily on parameter size and training capabilities. Google’s pivot to inference optimization suggests we’re entering a new phase where deployment efficiency and reasoning capabilities take center stage.

This transition makes sense: training happens once, but inference operations occur billions of times daily as users interact with AI systems. The economics of AI are increasingly tied to inference costs, especially as models grow more complex and computationally intensive.

During the press conference, Vahdat revealed that Google has observed a 10x year-over-year increase in demand for AI compute over the past eight years — a staggering factor of 100 million overall. No amount of Moore’s Law progression could satisfy this growth curve without specialized architectures like Ironwood.
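That headline factor is straightforward compounding; a one-line check (our arithmetic, not Google's):

```python
# 10x year-over-year growth, sustained for eight years, compounds to 10^8.
factor = 10 ** 8
print(f"{factor:,}")   # 100,000,000 -- the "factor of 100 million" cited above
```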
What’s particularly notable is the focus on “thinking models” that perform complex reasoning tasks rather than simple pattern recognition. This suggests Google sees the future of AI not just in larger models, but in models that can break down problems, reason through multiple steps, and essentially simulate human-like thought processes.

Gemini’s thinking engine: How Google’s next-gen models leverage advanced hardware

Google is positioning Ironwood as the foundation for its most advanced AI models, including Gemini 2.5, which the company describes as having “thinking capabilities natively built in.” At the conference, Google also announced Gemini 2.5 Flash, a more cost-effective version of its flagship model that “adjusts the depth of reasoning based on a prompt’s complexity.” While Gemini 2.5 Pro is designed for complex use cases like drug discovery and financial modeling, Gemini 2.5 Flash is positioned for everyday applications where responsiveness is critical.

The company also demonstrated its full suite of generative media models, including text-to-image, text-to-video, and a newly announced text-to-music capability called Lyria. A demonstration showed how these tools could be used together to create a complete promotional video for a concert.

Beyond silicon: Google’s comprehensive infrastructure strategy includes network and software

Ironwood is just one part of Google’s broader AI infrastructure strategy.
The company also announced Cloud WAN, a managed wide-area network service that gives businesses access to Google’s planet-scale private network infrastructure. “Cloud WAN is a fully managed, viable and secure enterprise networking backbone that provides up to 40% improved network performance, while also reducing total cost of ownership by that same 40%,” Vahdat said.

Google is also expanding its software offerings for AI workloads, including Pathways, its machine learning runtime developed by Google DeepMind. Pathways on Google Cloud allows customers to scale out model serving across hundreds of TPUs.
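Pathways itself is a managed runtime rather than something shown in code, but the underlying idea, one program whose arrays are partitioned across many accelerator chips, can be illustrated with stock JAX sharding primitives. The sketch below is our illustration of that idea, not Pathways’ actual interface; it runs on any JAX backend with however many devices are present.

```python
# A minimal sketch of sharded model serving in JAX -- the style of
# multi-chip partitioning that Pathways-like runtimes orchestrate at
# pod scale. Our illustration, not Pathways' API.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())        # TPU chips on a real pod slice
mesh = Mesh(devices, axis_names=("model",))

# Shard a weight matrix column-wise across every available device.
# (The 4096 columns must divide evenly by the device count.)
weights = jnp.ones((1024, 4096))
sharded = jax.device_put(weights, NamedSharding(mesh, P(None, "model")))

@jax.jit
def serve(batch, w):
    # Compiled once; XLA partitions the matmul across the weight shards.
    return batch @ w

out = serve(jnp.ones((8, 1024)), sharded)
print(out.shape)                         # (8, 4096)
```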
AI economics: How Google’s $12 billion cloud business plans to win the efficiency war

These hardware and software announcements come at a crucial time for Google Cloud, which reported $12 billion in Q4 2024 revenue, up 30% year over year, in its latest earnings report. The economics of AI deployment are increasingly becoming a differentiating factor in the cloud wars.
Google faces intense competition from Microsoft Azure, which has leveraged its OpenAI partnership into a formidable market position, and Amazon Web Services, which continues to expand its Trainium and Inferentia chip offerings. What separates Google’s approach is its vertical integration: while rivals rely on partnerships with chip manufacturers or have acquired startups, Google has been developing TPUs in-house for over a decade.
This gives the company unparalleled control over its AI stack, from silicon to software to services. By bringing this technology to enterprise customers, Google is betting that its hard-won experience building chips for Search, Gmail, and YouTube will translate into competitive advantages in the enterprise market. The strategy is clear: offer the same infrastructure that powers Google’s own AI, at scale, to anyone willing to pay for it.
The multi-agent ecosystem: Google’s audacious plan for AI systems that work together

Beyond hardware, Google outlined a vision for AI centered on multi-agent systems. The company announced an Agent Development Kit (ADK) that allows developers to build systems in which multiple AI agents work together. Perhaps most significantly, Google announced an “agent-to-agent interoperability protocol” (A2A) that enables AI agents built on different frameworks and by different vendors to communicate with each other, as sketched below.
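To make the idea concrete, here is a hypothetical sketch of what a cross-vendor agent hand-off could look like. The protocol name, field names, and message shape are our invention for illustration; they are not the actual A2A specification.

```python
# A hypothetical illustration of cross-vendor agent messaging.
# Field names and the "agent-interop-demo" label are ours, not A2A's.
import json

def make_task_request(sender: str, recipient: str, task: str) -> str:
    """Serialize a task hand-off from one agent to another as plain JSON."""
    return json.dumps({
        "protocol": "agent-interop-demo/0.1",   # placeholder, not real A2A
        "from": sender,
        "to": recipient,
        "task": task,
    })

def handle_request(raw: str) -> str:
    """A receiving agent, possibly built on a different framework,
    parses the request and replies in the same neutral format."""
    msg = json.loads(raw)
    return json.dumps({
        "protocol": msg["protocol"],
        "from": msg["to"],
        "to": msg["from"],
        "result": f"completed: {msg['task']}",
    })

# Two agents from different vendors only need to agree on the wire format.
req = make_task_request("crm-agent", "erp-agent", "reconcile Q1 invoices")
print(handle_request(req))
```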
“2025 will be a transition year where generative AI shifts from answering single questions to solving complex problems through agentic systems,” Vahdat predicted. Google is partnering with more than 50 industry leaders, including Salesforce, ServiceNow, and SAP, to advance this interoperability standard.

Enterprise reality check: What Ironwood’s power and efficiency mean for your AI strategy

For enterprises deploying AI, these announcements could significantly reduce the cost and complexity of running sophisticated AI models.
Ironwood’s improved efficiency could make running advanced reasoning models more economical, while the agent interoperability protocol could help businesses avoid vendor lock-in. The real-world impact of these advancements shouldn’t be underestimated. Many organizations have been reluctant to deploy advanced AI models due to prohibitive infrastructure costs and energy consumption.
If Google can deliver on its performance-per-watt promises, we could see a new wave of AI adoption in industries that have thus far remained on the sidelines. The multi-agent approach is equally significant for enterprises overwhelmed by the complexity of deploying AI across different systems and vendors. By standardizing how AI systems communicate, Google is attempting to break down the silos that have limited AI’s enterprise impact.
During the press conference, Google emphasized that over 400 customer stories would be shared at Next ’25, showcasing real business impact from its AI innovations.

The silicon arms race: Will Google’s custom chips and open standards reshape AI’s future?

As AI continues to advance, the infrastructure powering it will become increasingly critical. Google’s investments in specialized hardware like Ironwood, combined with its agent interoperability initiatives, suggest the company is positioning itself for a future where AI becomes more distributed, more complex, and more deeply integrated into business operations.
“Leading thinking models like Gemini 2.5 and the Nobel Prize-winning AlphaFold all run on TPUs today,” Vahdat noted. “With Ironwood, we can’t wait to see what AI breakthroughs are sparked by our own developers and Google Cloud customers when it becomes available later this year.”

The strategic implications extend beyond Google’s own business. By pushing for open standards in agent communication while maintaining proprietary advantages in hardware, Google is attempting a delicate balancing act. The company wants the broader ecosystem to flourish (with Google infrastructure underneath), while still maintaining competitive differentiation.
How quickly competitors respond to Google’s hardware advancements and whether the industry coalesces around the proposed agent interoperability standards will be key factors to watch in the months ahead. If history is any guide, we can expect Microsoft and Amazon to counter with their own inference optimization strategies, potentially setting up a three-way race to build the most efficient AI infrastructure stack.