Research Note

Alibaba and Tencent launched two new initiatives to scale up AI Clusters at ODCC 2024

September 2024
 

Abstract

LightCounting reports from the Open Data Center Committee (ODCC) event

The ODCC conference featured several Scale Up presentations and two major project launches: ALink, initiated by Alibaba, and the ETH-X Supernode project, initiated by Tencent. The two projects share the same strategic objective but differ in their tactical approaches. ALink appears to be a long-term initiative, while ETH-X plans to deliver its first cluster prototype within the next 12 months.

The ALink industrial alliance was jointly established by 18 entities, including CAICT, Alibaba Cloud, AMD, Huaqin Technology, H3C, Inspur, Netforward, Kiwimoore, Cintra, and others. The alliance members cover GPU chips, interconnect chips, server hardware, cloud computing and other industrial fields.

The alliance aims to promote unified standards for Scale Up interconnect systems and to develop the next generation of AI interconnect network hardware and software, compatible with GPUs from different vendors. ALS (ALink System) currently covers protocols, chips, hardware devices, and software platforms. It supports UALink on the ALS-D data plane and provides a unified interface specification, as well as a management and control software platform, on the ALS-M management and control plane.


Another important project announced was the ETH-X Supernode project. The project is jointly promoted by CAICT, Tencent, Kuaishou, Enflame Technology, Birentech, Huaqin Technology, Ruijie, H3C, Yunbao Intelligence, Clounix, Centec, Luxshare, Accelink, and others.

The project was presented at the ODCC Network Working Group in May 2024. The project team has identified a limitation in the typical eight-card server in use today, which restricts the scale of tensor parallelism. The project aims to address this limitation by developing supernodes with more than 16 cards and ultra-high bandwidth within each node.

The ETH-X Supernode project explores a new direction, with the goal of achieving high bandwidth capacity through Ethernet technology and establishing an open-source Scale Up supernode system.

The project plans to complete hardware and software development of the ETH-X supernode prototype, along with verification testing of related business systems, by the fall of 2025, and to publish version 1.0 of the ETH-X supernode technical specification.

Full text of the research note is available to LightCounting subscribers at: https://www.lightcounting.com/login

Price: $595

Meet the Author(s)

Carol Cao
Senior Analyst
