Photonics-enabled disaggregated computing
LightCounting discusses highlights from Supercomputing Conference: SC 2023
The 12m-tall Blue Bear peering into the Colorado Convention Center hosting SC23.
Supercomputing Conference 2023 (SC23) set a record with over 14,000 attendees at the event held in Denver, Colorado from November 12-17, 2023.
Two themes dominated this year’s event: photonics’ growing role and the seeming contradiction between the pace of development of high-performance computing and that of AI supercomputers.
Using photonics, protocols such as PCI Express (PCIe) and Compute Express Link (CXL) can be sent over fiber, enabling system disaggregation and novel supercomputing architectures in the data center.
Drut Technologies, a systems start-up, emerged at SC22, where it unveiled its server disaggregation architecture. This year the start-up revealed its growing ambitions: it is developing an architecture that extends the concept to span the data center. Its DynamicXcelerator (DX) architecture will support up to 4,096 accelerators using optical switching, similar to how Google interconnects its tensor processing unit (TPU) clusters.
Other photonics highlights at the show included several optical interconnect demonstrations. Avicena showed what it claims is the world’s smallest 1 terabit-per-second (Tb/s) microLED-based transceiver. Ayar Labs showed its TeraPHY optical input-output (I/O) chiplets integrated with an Intel FPGA, while Lightelligence demonstrated memory disaggregation using PCIe/CXL over optical links.
SC23 was also where the latest Top500 list of supercomputers was unveiled. This year’s list included a commercial machine, Microsoft Azure’s Eagle, in the Top 3, the first time such a system has ranked so highly. Microsoft spun up the machine in a week.
One trend the Top500 highlights is how the growth of high-performance computing is slowing. Until 2013, high-performance computing performance grew by 1000x every 11 years, but it has slowed considerably since then. The Top500 committee believes computation is now growing at under 10x every 11 years. In contrast, hyperscalers are seeing AI’s computational needs double every 3 to 4 months, and this is expected to continue for the foreseeable future.
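To put these growth rates on a common footing, the figures quoted above can be converted into an approximate growth factor per year and an equivalent doubling time. The following is a rough sketch rather than anything from the Top500 committee or the hyperscalers: the function names are hypothetical, the growth is assumed to be smoothly exponential, and "under 10x" is treated as exactly 10x, so that row is an upper bound.

```python
import math


def annual_factor(growth: float, period_years: float) -> float:
    """Growth factor per year, given `growth`x every `period_years` years."""
    return growth ** (1.0 / period_years)


def doubling_time_years(growth: float, period_years: float) -> float:
    """Years needed to double, given `growth`x every `period_years` years."""
    return period_years * math.log(2) / math.log(growth)


# (growth multiple, period in years), taken from the figures quoted in the text.
# "HPC today" uses 10x as an upper bound for "under 10x every 11 years".
trends = {
    "HPC until 2013 (1000x per 11 years)": (1000.0, 11.0),
    "HPC today (at most 10x per 11 years)": (10.0, 11.0),
    "AI demand (2x per 4 months)": (2.0, 4.0 / 12.0),
    "AI demand (2x per 3 months)": (2.0, 3.0 / 12.0),
}

for label, (growth, period) in trends.items():
    per_year = annual_factor(growth, period)
    doubling_months = 12.0 * doubling_time_years(growth, period)
    print(f"{label}: ~{per_year:.2f}x per year, "
          f"doubling every ~{doubling_months:.1f} months")
```

Under these assumptions, the historical high-performance computing trend works out to a little under 2x per year, the current trend to roughly 1.2x per year, and the quoted AI demand to between 8x and 16x per year.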
There are several reasons why high-performance computing and AI supercomputers have different growth gradients.
The processing required for high-performance computing is varied and hugely demanding. Being at the leading edge of computation, the discipline is the first to encounter key limitations. In contrast, computation for AI and machine learning is more specialized, and the hyperscalers are doing an outstanding job accumulating gains across the board: in the processor’s instructions, floating-point math representations, the core, the chip, and memory, and at the blade level.
Further gains come from how blades are scaled up and scaled out into supercomputing systems using advanced networking technologies and topologies. Even so, the exponential growth in demand for AI computation will not continue indefinitely without bottlenecks emerging. As with HPC, overcoming them will require new thinking.
HPC and AI computational bottlenecks were addressed in two sessions: a panel discussion on the role of optical I/O in future AI and high-performance computing systems, and a session on how chiplets could benefit high-performance computing and AI.
The focus of the supercomputing conference is software, algorithms, and applications. But hardware - processors, memory, and interconnect including optics - has a key presence too. For example, the latest Compute Express Link (CXL) specification - version 3.1, the first upgrade in over a year - was announced at the show.
One surprising statement made by Nvidia during the event’s press conference was that NVLink networks are not using any optical connectivity. This means that the optical transceivers and AOCs deployed by Nvidia are used mostly for InfiniBand connections, with some used for Ethernet. LightCounting will discuss the implications of this development in its January 2024 report, titled “Optics for AI”. More details on this new report are available in the 2024 Research Calendar, and the report’s table of contents is available on request.
Full text of the research note is available to LightCounting subscribers at: https://www.lightcounting.com/login