by Bob Wheeler
August marked the in-person return of the Hot Chips conference at Stanford University in California, and the sold-out 35th edition included plenty of deep technical content. AI/ML drew much of the attention, and optical interconnects featured in both chip- and system-level AI and HPC talks. NVIDIA’s chief scientist, Bill Dally, keynoted Day 2 with a talk reviewing how accelerators achieved a 1,000x performance increase over the last 10 years. His big-picture view provided excellent context for AI-system design, but networking received only an honorable mention this year. Instead, Dally discussed future directions for accelerated compute.
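To put the 1,000x figure in perspective, suppose (purely for illustration; the keynote summary does not say the gains were evenly spread) that the improvement compounded at a constant annual rate r over the 10 years. The implied rate is then:

```latex
r^{10} = 1000 \quad\Rightarrow\quad r = 1000^{1/10} = 10^{0.3} \approx 1.995
```

Under that assumption, accelerator performance roughly doubled every year.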
Following the keynote, an ML-Training session presented talks from Google and Cerebras. Norm Jouppi, Google’s technical lead for TPUs, made it clear he could discuss only the n-1 generation, meaning TPUv4. Meanwhile, Google revealed the TPUv5e at its own Google Cloud Next event the same day but provided only high-level specifications. Jouppi and his colleague, Andy Swing, largely reviewed details of the TPUv4 supercomputer already presented in a paper at ISCA 2023 and covered in some of our recent reports. That paper revealed Google’s use of optical circuit switches (OCS) in its TPUv4 cluster, following prior disclosures of OCS deployments in its datacenter spine layer.
Although Cerebras is best known for its wafer-scale AI accelerator, its talk explored its cluster architecture. Sean Li, cofounder and chief hardware architect, showed how the company builds supercomputers with up to 192 CS-2 systems, each of which contains the WSE-2 wafer-scale engine. This unique cluster design disaggregates memory into separate external memory systems, called MemoryX, which store weights for massive AI models. The network fabric, branded SwarmX, connects MemoryX nodes to multiple CS-2 nodes.
The Interconnects session, also on Day 2, included talks from NVIDIA, Lightelligence, and Intel. NVIDIA’s VP of Networking, Kevin Deierling, talked about why runtime programmability is needed in network data planes. The presentation revealed that both BlueField DPUs and Spectrum switch chips support the P4 programming language.
Silicon-photonics startup Lightelligence disclosed new details of its recently announced Hummingbird AI accelerator. Maurice Steinman, VP of engineering, explained how this 3D system-in-package stacks a compute die with 64 SIMD cores on top of a silicon-photonic die that provides an optical network-on-chip (NoC). The photonic integrated circuit (PIC) implements a unique optical broadcast network using a U-shaped waveguide that underlies the compute die.
Jason Howard, a principal engineer at Intel, delivered the final talk in the Interconnects session. His talk was equal parts esoteric and groundbreaking, reflecting its origins in the DARPA HIVE program, which funded Intel’s development. The esoteric part is the target workload: petabyte-scale graph analytics. This workload led Intel to develop a multicore processor with a custom instruction set. The portion of interest to our audience, however, is co-packaged optics for a mesh-to-mesh photonic fabric. To enable linear performance scaling, each graph-computation ASIC includes 1TB/s of optical bandwidth, creating a glueless, low-latency interconnect between sockets.
With so much focus on GPUs such as NVIDIA’s Hopper, networking is an afterthought for much of the Hot Chips audience. What is clear from many of this year’s talks, however, is that training the latest large language models requires supercomputer-scale computing. With capital flowing into generative-AI infrastructure, optical innovation is experiencing a corresponding boost.
The complete text of the research note is available to LightCounting subscribers at: https://www.lightcounting.com/login