Our eighth generation TPUs: two chips for the agentic era
Summary (EN)
Google introduced its eighth-generation Tensor Processing Units, unveiling two separate chips, TPU 8t and TPU 8i, that are explicitly designed around the infrastructure demands of large-scale model training and low-latency inference in agentic systems. Google says TPU 8t is optimized for massive training workloads, while TPU 8i is tuned for latency-sensitive serving and multi-agent interactions, reflecting the view that AI agents create different system bottlenecks than earlier single-response models. The company describes the two-chip strategy as the result of several years of anticipating rising inference demand and the need to specialize hardware rather than rely on one general-purpose accelerator profile. According to the post, TPU 8t can scale to a 9,600-chip superpod with two petabytes of shared high-bandwidth memory and nearly triple the compute performance per pod versus the previous generation, while also targeting high goodput through reliability and serviceability improvements. Google also says its new Virgo network fabric, along with JAX and Pathways, enables near-linear scaling to much larger logical clusters. More broadly, the release frames custom silicon, networking, software, and model co-design as the core reason TPUs remain strategic for Google’s frontier AI stack. The significance of the announcement is that it connects the next phase of AI applications, especially agent-based systems, to specialized infrastructure choices. It also shows how cloud competition in AI is increasingly being fought not only at the model layer, but through differentiated chips built for specific training and inference economics.
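As a quick sanity check on the pod-level figures above (2 PB of shared high-bandwidth memory across a 9,600-chip superpod), the implied per-chip HBM can be worked out directly. The post states neither the per-chip capacity nor the byte convention, so decimal petabytes (10^15 bytes) are assumed here:

```python
# Back-of-the-envelope check of the TPU 8t superpod figures quoted above.
# Assumption: decimal prefixes (1 PB = 10**15 bytes); the post does not
# say whether binary or decimal units are meant.
POD_HBM_BYTES = 2 * 10**15   # 2 PB of shared high-bandwidth memory per superpod
CHIPS_PER_POD = 9_600        # stated TPU 8t superpod size

hbm_per_chip_gb = POD_HBM_BYTES / CHIPS_PER_POD / 10**9
print(f"Implied HBM per chip: {hbm_per_chip_gb:.1f} GB")  # ~208.3 GB
```

Under those assumptions, each chip would carry roughly 208 GB of HBM; if binary units were intended, the figure would be somewhat higher.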
Summary (ZH)
Google released its eighth-generation Tensor Processing Units, introducing two clearly differentiated chips, TPU 8t and TPU 8i, aimed respectively at large-scale model training and low-latency inference, and explicitly positioned as infrastructure for the "agentic era". Google says TPU 8t targets intensive training workloads, while TPU 8i is built for latency-sensitive inference and multi-agent interaction workloads, reflecting the view that AI agents place demands on systems that differ from those of traditional single-turn generation models. Google explains that it adopted the two-chip approach because it anticipated years ago that inference demand would rise, and that the optimal hardware designs for training and serving would no longer coincide, requiring greater specialization. According to the official figures, TPU 8t can scale to a superpod of 9,600 chips with 2 PB of shared high-bandwidth memory, delivers nearly 3x the per-pod compute performance of the previous generation, and improves effective compute utilization through reliability and serviceability design. Google also says the new Virgo network fabric, combined with JAX and Pathways, supports near-linear scaling to much larger clusters. Overall, the release is not merely a new chip launch but a statement that co-design of chips, networking, software, and models remains central to Google's frontier AI infrastructure strategy. Its significance lies in tying the capability frontier of future AI applications, especially agent systems, directly to specialized choices in the underlying hardware, and in showing that competition among AI clouds is increasingly playing out at the chip layer, targeting the economics of specific training and inference workloads.
Source