Our eighth generation TPUs: two chips for the agentic era

Summary (EN)

Google introduced its eighth-generation Tensor Processing Units, splitting the lineup into two purpose-built architectures: TPU 8t for large-scale training and TPU 8i for low-latency inference. The company said both chips were co-designed with Google DeepMind to meet the infrastructure demands of agentic systems, in which models must reason through multi-step workflows, interact with other agents, and learn in continuous loops. TPU 8t is the training system, scaling to 9,600 chips per superpod with two petabytes of shared high-bandwidth memory, 121 exaflops of compute, 10x faster access to storage, and nearly 3x the per-pod compute performance of the previous generation. TPU 8i is optimized for inference and reasoning workloads, adding 288 GB of high-bandwidth memory, 384 MB of on-chip SRAM, higher interconnect bandwidth, an on-chip Collectives Acceleration Engine, and Axion Arm-based CPU hosts; Google says the result is 80% better performance per dollar than the prior generation on inference-heavy workloads. Both chips also target up to 2x better performance per watt than Ironwood, supported by fourth-generation liquid cooling and tighter system co-design across compute, networking, and data center infrastructure. Google said the systems will become generally available later in 2026 as part of Google AI Hypercomputer, with support for JAX, PyTorch, SGLang, vLLM, bare-metal access, and open-source tools such as MaxText and Tunix.
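The pod-level figures above imply rough per-chip numbers that Google did not publish. A minimal back-of-the-envelope sketch, assuming decimal units (1 PB = 1e6 GB, 1 exaflop = 1e3 petaflops) and an even split across the pod; these per-chip estimates are derived here, not stated in the announcement:

```python
# Per-chip estimates for a TPU 8t superpod, derived from the
# pod-level numbers in the announcement. Assumes decimal units
# and uniform distribution across chips (both are assumptions).

CHIPS_PER_POD = 9_600       # chips in one superpod
POD_HBM_PB = 2              # shared high-bandwidth memory, petabytes
POD_COMPUTE_EFLOPS = 121    # pod compute, exaflops

hbm_per_chip_gb = POD_HBM_PB * 1e6 / CHIPS_PER_POD
compute_per_chip_pflops = POD_COMPUTE_EFLOPS * 1e3 / CHIPS_PER_POD

print(f"HBM per chip:     ~{hbm_per_chip_gb:.0f} GB")
print(f"Compute per chip: ~{compute_per_chip_pflops:.1f} PFLOPS")
```

Under these assumptions, each chip works out to roughly 208 GB of HBM and about 12.6 PFLOPS, which is in the same ballpark as the 288 GB figure quoted for the inference-oriented TPU 8i.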

Summary (ZH, translated)

Google announced its eighth-generation TPUs, for the first time explicitly splitting the lineup into two task-specific architecture tracks: TPU 8t for large-scale training and TPU 8i for low-latency inference. The company said both chips were co-designed with Google DeepMind to meet the infrastructure demands of agentic AI, since such systems must perform multi-step reasoning, execute complex workflows, collaborate with other agents, and learn in continuous loops. TPU 8t is positioned as the training platform, scaling to 9,600 chips in a single superpod with 2 PB of shared high-bandwidth memory, 121 exaflops of compute, 10x faster storage access, and nearly 3x the per-pod compute of the previous generation. TPU 8i is optimized for inference and reasoning workloads, with 288 GB of high-bandwidth memory, 384 MB of on-chip SRAM, higher interconnect bandwidth, an on-chip Collectives Acceleration Engine, and Axion-based Arm CPU hosts. Google claims 80% better performance per dollar than the prior generation in inference-heavy scenarios. Both chips also deliver notable efficiency gains, targeting up to 2x better performance per watt than Ironwood, achieved through fourth-generation liquid cooling and end-to-end co-design spanning chip, network, and data center. Google said both systems will become generally available later in 2026 as part of Google AI Hypercomputer, with support for JAX, PyTorch, SGLang, vLLM, bare-metal access, and open-source tools such as MaxText and Tunix.

Source

https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/