Google’s AI supercomputer is powered by its TPU chips, claimed to be up to 1.7 times faster and 1.9 times more power-efficient than Nvidia’s A100 chip.
Heaptalk, Jakarta — Google has announced new details of its supercomputer for training AI models, built around its Tensor Processing Unit (TPU) chips, claiming the systems beat Nvidia’s A100 chip in both speed and power efficiency.
According to the scientific paper Google published, the fourth-generation TPU chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia’s A100 chip.
In more detail, the paper describes how the company used its custom-developed optical switches to connect over 4,000 TPU v4 chips into a single computer. These switches allow Google’s supercomputer to reconfigure the connections between chips in real time, improving performance and routing around failures.
The tech giant uses TPU chips for over 90% of the company’s AI training: the process of feeding data through models so they become useful for tasks such as answering queries with human-like text or generating images.
The size of large language models continues to escalate
Meanwhile, the rise of large language models is tightening the competition among companies developing AI supercomputers. These models have grown in size so rapidly that they have become too large to fit on a single chip.
Instead, such models must be split across thousands of chips, which then have to work together for weeks to train the model. PaLM, Google’s largest publicly disclosed language model to date, was trained over 50 days by splitting it across two of the 4,000-chip supercomputers.
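To make the idea of splitting a model across chips concrete, here is a minimal sketch in JAX, the framework commonly used for TPU training. It is an illustration of model sharding in general, not Google’s actual PaLM setup; the layer size and one-dimensional mesh layout are hypothetical.

    # Minimal sketch: shard one oversized weight matrix across every chip
    # visible to this host (model parallelism). Hypothetical sizes only.
    import numpy as np
    import jax
    import jax.numpy as jnp
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    devices = np.array(jax.devices())            # all available chips
    mesh = Mesh(devices, axis_names=("model",))  # 1-D logical mesh over them

    # Split the weight matrix column-wise: each chip stores only its slice.
    # (The 8192 columns must divide evenly by the number of chips.)
    weights = jnp.zeros((8192, 8192))            # hypothetical layer size
    sharded = jax.device_put(weights, NamedSharding(mesh, P(None, "model")))

    # A jit-compiled op treats the sharded array as one logical tensor;
    # JAX inserts whatever cross-chip communication the layout requires.
    @jax.jit
    def forward(x, w):
        return jnp.dot(x, w)

    y = forward(jnp.ones((4, 8192)), sharded)
    print(y.shape)                               # (4, 8192), computed jointly

Run on a single machine this degenerates to one device, but the same code scales to a pod of chips, which is the point of expressing the model in terms of a device mesh.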
According to Google, its supercomputers make it easy to reconfigure connections between chips on the fly, helping the system route around failures and tune performance. “Circuit switching makes it easy to route around failed components. This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of an ML (machine learning) model,” explained Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson in a blog post.
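Google’s reconfiguration happens in the optical-switch hardware, below anything application code sees. Purely as a loose software-level analogy, the sketch below rebuilds a logical JAX device mesh that omits a chip flagged as failed, so later computations are laid out over the healthy chips only; the failed_ids set is invented for illustration.

    # Loose analogy only: real TPU v4 rerouting happens in Google's optical
    # circuit switches, not in application code like this.
    import numpy as np
    import jax
    from jax.sharding import Mesh

    failed_ids = {3}                             # hypothetical failed chip

    # Rebuild the logical mesh over whichever chips are still healthy.
    healthy = [d for d in jax.devices() if d.id not in failed_ids]
    mesh = Mesh(np.array(healthy), axis_names=("model",))
    print(f"mesh spans {len(healthy)} of {len(jax.devices())} chips")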
The AI supercomputer has been running inside the company since 2020 at a data center in Mayes County, Oklahoma. Google said the startup Midjourney has used the system to train its model, which generates images from text prompts.