LightCounting comments on CPO panel discussion at Photonics West
Optical communications was not at the center of discussions at Photonics West, attended by 22,000 people this year. Yet a well-attended panel on Co-Packaged Optics (CPO) raised an important question: why should we worry about the increasing power consumption of optical transceivers if networking accounts for just 2-3% of the total power used by Cloud datacenters?
Vipul Bhatt of Coherent presented data from a Science magazine article published two years ago, which showed that concerns about the increasing power consumption of datacenters are overrated. The study compared the power consumption of all datacenters in 2010 and 2018 and found that it increased by only about 8%. Extrapolating this trend to 2022-2023 suggests a very modest increase of 2-3%, because of continuing improvements in datacenter efficiency.
There are two main reasons for such improvements:
Power consumed by the networks accounted for only 1% in 2012, 2% in 2018 and around 3% by 2022. Should we even care about it?
The figure below presents our calculations of the power consumption of optical transceivers deployed in Cloud datacenters (in terms of annual deployments, not cumulative). Please note that the vertical scale is logarithmic. A straight line on a logarithmic scale should not be disregarded, as it indicates exponential growth. The power consumption will catch up with the skeptics before they know it, and by then it may be too late to address the problem.
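To illustrate why a straight line on a logarithmic axis signals exponential growth, here is a minimal Python sketch. The starting value and the 1.5x annual growth factor are hypothetical placeholders chosen for illustration; they are not taken from the LightCounting chart.

```python
import math


def log10_steps(values):
    """Return the year-over-year differences of log10(value)."""
    logs = [math.log10(v) for v in values]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical series: arbitrary starting level, growing 1.5x per year.
growth = 1.5
series = [1.0 * growth**year for year in range(6)]

# On a log axis every step has the same height, so the points form a straight line.
steps = log10_steps(series)

# Constant multiplicative growth implies a fixed doubling time.
doubling_time_years = math.log(2) / math.log(growth)

print([round(s, 4) for s in steps])
print(round(doubling_time_years, 2))
```

Equal steps in the logarithm mean the quantity is multiplied by the same factor every year, which is why a small share today can become a large one within a few years.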
Based on this analysis, the total power of optical transceivers deployed in Cloud datacenters in 2018-2022 adds up to 330 MW or 1.2 TWh, which is just above 1% of the total power consumption of Cloud datacenters now. The problem is that by 2028, optics are projected to account for more than 8% of the total. This analysis accounts for continuous improvements in the power efficiency of pluggable optics: from 35 pJ/bit in 100G modules to 20 pJ/bit in 800G transceivers.
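The energy-per-bit figures above translate into module power through a simple unit conversion. The sketch below assumes each module runs at its nominal line rate (100 Gb/s and 800 Gb/s); the function name is illustrative and not part of the LightCounting model.

```python
def module_power_watts(energy_pj_per_bit: float, rate_gbps: float) -> float:
    """Convert energy per bit and line rate into module power in watts."""
    # pJ/bit * Gb/s = 1e-12 J/bit * 1e9 bit/s = 1e-3 W, hence the division by 1000.
    return energy_pj_per_bit * rate_gbps / 1000.0


# Assumption: modules operate at their nominal line rate.
power_100g = module_power_watts(35.0, 100.0)  # 3.5 W
power_800g = module_power_watts(20.0, 800.0)  # 16.0 W

print(power_100g, power_800g)
```

Under these assumptions, energy per bit improves by less than a factor of two while the data rate grows eightfold, so the power per module still rises from about 3.5 W to about 16 W.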
Should we be concerned about optics accounting for 8% of the total power consumed by Cloud datacenters in 2028?
Yes, we should. Operators of Cloud datacenters face significant constraints in provisioning more electricity to their facilities. If optics consume more power, operators will be forced to reduce the power budgets allocated to servers and memory.
A more significant problem is that designs of AI clusters are severely limited by the high power and cost of optical connectivity. Nvidia claims that it could use 32x more optics now if it were not limited by power and cost. Execution of AI models requires large arrays of GPUs, and high-bandwidth optical connectivity will be the best solution if suppliers can lower its power consumption and cost.
Next-generation CPO designs from Ayar Labs, Broadcom, IBM and Ranovus are expected to reach 2-3 pJ/bit in energy efficiency. Professor Rajeev Ram, a plenary speaker at Photonics West and co-founder of Ayar Labs, claims that 0.1 pJ/bit is within reach of existing technologies at 200G per lane. His team at MIT is working on low-voltage modulators and improved detectors with the objective of reaching interconnect power consumption of 0.001 pJ/bit and below. Some of MIT's solutions will use very low (Mbps) data rates to reduce power consumption. Ayar Labs' approach is also based on using lower-speed (64 Gbps) NRZ optics to reach 2 pJ/bit. Whether they can get below 1 pJ/bit with 112G per lane NRZ remains to be seen.
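One way to read these efficiency targets is as bandwidth per watt of optics power. The sketch below is a back-of-the-envelope comparison that assumes a fixed optics power budget and ignores everything else in the system, such as switch ASICs and cooling; the function and variable names are illustrative.

```python
def gbps_per_watt(energy_pj_per_bit: float) -> float:
    """Bandwidth in Gb/s that one watt of optics power can carry."""
    # 1 W / (1e-12 J/bit) = 1e12 bit/s = 1000 Gb/s, so divide 1000 by pJ/bit.
    return 1000.0 / energy_pj_per_bit


pluggable_800g = gbps_per_watt(20.0)  # 20 pJ/bit quoted for 800G pluggables
cpo_at_3pj = gbps_per_watt(3.0)       # upper end of the 2-3 pJ/bit CPO target
cpo_at_2pj = gbps_per_watt(2.0)       # lower end of the 2-3 pJ/bit CPO target

# Ratio of bandwidth delivered for the same optics power budget.
gain_at_3pj = cpo_at_3pj / pluggable_800g
gain_at_2pj = cpo_at_2pj / pluggable_800g

print(round(gain_at_3pj, 1), round(gain_at_2pj, 1))
```

Under these assumptions, CPO at 2-3 pJ/bit would carry roughly 7 to 10 times more bandwidth than 20 pJ/bit pluggables for the same optics power.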
Our industry is at a crossroads. We can maintain the status quo, stay with pluggable optics and improve them gradually. The AI clusters will not scale as fast, but there will be other ways to optimize AI models, conforming to constraints of limited network bandwidth.
An alternative is to take a risk and develop new optical technologies with new packaging and fiber coupling designs to deliver radical improvements in cost and power efficiency. This approach would greatly benefit the development of AI and elevate our industry to a completely new level. This path is exciting, but the skeptics are right in saying that it is very challenging. Yet, it would be a mistake to miss out on this opportunity.
New optical component technologies are widely discussed in the industry, but one of the most challenging aspects of CPO design is packaging and low-loss fiber connectivity. Please join LightCounting at an upcoming webinar on this topic, hosted by the Open Compute Project (OCP).
LightCounting and OCP will also host a panel discussion on a related topic: Optics in Future AI Systems: Interconnects, Switching and Processing
Full text of the research note is available to LightCounting subscribers at: https://www.lightcounting.com/login