40G and 100G in the Data Center

40G interconnects in the data center are poised to take off now that the various server, switch, cabling, and transceiver pieces have finally come together. After a delay of about six months, Intel released its next-generation "Romley" architecture, offering 10 cores per microprocessor and a PCI Express 3.0 bus supporting faster I/O. With servers and switches now ready to go, 40G interconnects using Direct Attach Copper (DAC), Active Optical Cables (AOCs), and optical transceivers are set to follow as the data center infrastructure finally begins its major upgrade cycle.

10G Facing Many Issues
Once servers upgrade, faster uplinks to top-of-rack switches are needed. But the 1G-to-10G transition is fraught with issues. In the past, server suppliers included a 1GbE RJ-45 LAN-on-motherboard (LOM) port for "free," but a dual-port 10GBASE-T adapter today costs far too much. With Cat5e nearly free as well, the interconnect was never a serious cost issue. Now it is. Server companies offer 10G ports on pluggable "daughter cards" that lock out aftermarket competitors and ensure high prices. Daughter cards come in different flavors: 1G and 10GBASE-T, two to four SFP+ ports, or dual QSFP with a path to 100G CXP and CFP/2 in the future. Since server manufacturers are making a lot of money on the 10G/40G upgrades, this begs the question, "Will server companies ever return to the LOM model, where buyers consider it a freebie?" Our answer is yes, but just in time for the jump to 40G! 10GBASE-T's problems with power consumption, size, and cost have left the door open for SFP+ DAC cabling to move in while 10GBASE-T suppliers build 28-nm parts. This changed the entire industry. But DAC has its share of issues, as it "electrically" connects two different systems together and not all SFP+ ports are alike. LightCounting estimates that 2012 will see about 1 million 10GBASE-T ports actually filled, representing about 500,000 links - almost what can be found in a single large data center today with 1GBASE-T! SFP+ DAC is shaping up at about 2.5-3.5 million ports filled, linking servers to top-of-rack switches at <7 meters. SFP+ AOCs are on the near-term horizon, whereas optical transceivers are typically used to link switches together over reaches out to 100 meters. LightCounting forecasts that about 6 million 10G SFP+ SR and LR optical transceivers will ship in 2012.
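The 10G options above differ mainly in reach and cost. A minimal sketch of the selection logic: the reach figures come from the text, but the cost ranking (1 = cheapest) is our own illustrative assumption, not market data.

```python
# Illustrative 10G server-to-switch interconnect chooser.
# Reach classes are from the article; cost ranks are assumed for illustration.
OPTIONS_10G = [
    # (name, max reach in meters, assumed cost rank: 1 = cheapest)
    ("SFP+ DAC (passive copper)", 7, 1),
    ("10GBASE-T over twisted pair", 100, 2),
    ("SFP+ AOC", 100, 3),
    ("SFP+ SR optical transceiver", 100, 4),
    ("SFP+ LR optical transceiver", 10_000, 5),
]

def pick_10g_link(reach_m):
    """Return the cheapest option that covers the requested reach."""
    viable = [o for o in OPTIONS_10G if o[1] >= reach_m]
    return min(viable, key=lambda o: o[2])[0] if viable else None

print(pick_10g_link(5))     # in-rack hop: DAC wins on cost
print(pick_10g_link(75))    # longer hop: DAC's 7 m limit rules it out
```

This mirrors the buying pattern described above: DAC dominates inside the rack, with optics taking over as reach grows.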

40G: the "next big thing"
Upgrading server-switch links from 1G to 10G forces switch uplinks to jump to 40G, connecting top-of-rack to end-of-row and aggregation switch layers. As data center operators emerge from the economic recession, budgets are still very tight and "incremental upgrades" are the way operators are buying: adding 10G/40G links "as needed" is the current practice. While 100G seems to get all the trade show and press coverage, 40G is where the money is for the next 2-3 years. Data centers are just hitting the need for about 4-6G - not yet 10G - so many are in a transitional, upgrade-as-needed state. The so-called mega data centers of Google, Facebook, Microsoft, and others, at $1 billion apiece, do not represent the mainstay of the data center market, although they garner a lot of attention and awe.

Chasing this transceiver opportunity, multiple suppliers have jumped at offering 40G QSFP SR transceivers and Ethernet AOCs for <50 meters. More than 10 transceiver companies have announced transceivers and AOCs, and more suppliers are coming! Technical barriers to entry are low, and cost-sensitive Internet data centers (especially in China) are likely to gobble these up in volume. LightCounting expects the transceiver industry will do its traditional pricing act of "Let's all cut our own throats on price and see who bleeds to death last." As a result, 40G SR parts are likely to see a very rapid price drop from about $250 today to under $190 at fully compliant OEM prices next year. We have even seen "plug & hope-they-play" parts at $65 - you get what you pay for! OEM prices for Ethernet AOCs can be found below $190 - and that is for a complete link with both ends and the fiber! But at 25G signaling, the multi-mode fiber reach looks more like 25 meters. Similarly, on circuit boards, 25G signaling shrinks trace length from 10-12 inches at 10G to 4-6 inches, and will popularize embedded optical modules (EOMs) for mid-board use, interconnecting rack-internal electronics.

The 40G QSFP MSA holds a unique position, supporting short reach (SR) of ~100 meters with multi-mode fiber or 10km with duplex single-mode fiber - all in the same QSFP switch port. Companies such as ColorChip, Sumitomo, and a few others offer LR4 QSFP parts, while Opnext, NeoPhotonics, Finisar, InnoLight, and others offer larger CFP devices. QSFP enables 36-44 ports per line card compared to only 4 for CFP. Running at 32 Watts, the CFP is affectionately referred to at LightCounting as the "Compact Frying Pan"; although popular in telecom, it is not in datacom! OEM prices range from $2,000 to $3,500 depending on data center or telecom features.
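The density argument above can be made concrete with a little arithmetic, using the port counts and line rate from the text:

```python
# Faceplate bandwidth per line card: port count x rate per port.
# Port counts (36-44 QSFP vs 4 CFP) and the 40G rate are from the text.
def card_bandwidth_gbps(ports, gbps_per_port):
    return ports * gbps_per_port

qsfp_low  = card_bandwidth_gbps(36, 40)   # low end of the QSFP range
qsfp_high = card_bandwidth_gbps(44, 40)   # high end of the QSFP range
cfp       = card_bandwidth_gbps(4, 40)    # a CFP line card

print(qsfp_low, qsfp_high, cfp)  # QSFP yields roughly 9-11x the CFP density
```

At 1.44-1.76 Tb/s per line card versus 160 Gb/s, it is easy to see why datacom switch vendors favor QSFP faceplates.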

Implementing 100G is much more complex
Much noise has been made at industry conferences about the imminent need for tens of thousands of 100G medium-reach links in the data center to support the "exa-flood" of traffic from server virtualization, big data, smartphones, tablets, and even software-defined networking. The 10-channel CXP is used for multi-mode, primarily by the large core switching companies, in both transceivers and AOCs. At 25G signaling for 4x25G, multi-mode noise spikes threaten to decrease the reach of multi-mode transceivers to 25-50 meters and may require FEC and/or equalization to reach 125 meters. This will drive up the cost for 25-125 meter reaches and further close the cost gap with 2km SMF transceivers.

The 100G 2km Problem
100G past 100 meters has proven frustratingly hard to implement and has taken longer to develop than first expected. The IEEE 40/100G High Speed Study Group met in July and extended the study another six months to deal with the technical issues. For longer reaches, engineers are wrestling with fitting all the optics and electronics into new MSA packages while hitting all the required power, size, electrical, and optical specs. It is all achievable, but at what cost and power is still an open issue. CFP/2 is not a given! Much debate still centers on zCXP vs. CFP/2 for the next MSA, with Molex and TE Connectivity backing zCXP - not yet CFP/2. Expect to see both in the market. Silicon photonics companies such as Luxtera, Kotura, and LightWire/Cisco claim to be able to fit it all into a QSFP! It is very important for the IEEE groups to get the 25-28G line rate specifications right, as this is a unique convergence point for a number of protocols: InfiniBand EDR at 26G, Ethernet at 25G, SAS 4.0 at 24G, Fibre Channel at 28G, and telecom at 28G. The result will be a lot of unit volume, as the line rate will span many protocols in the data center besides Ethernet.

Today, there is no economically viable 100G solution for 100 to 600 meters (except perhaps ganging two 40G and two 10G transceivers). As data centers grow bigger, this is a hot area and the center of debate in the IEEE community. One extra meter can bump the transceiver OEM cost from a $1,000 CXP to a telecom-centric CFP at $16,000! Often referred to as "2km," this class really means a reach of 400-600 meters and an optical budget of about 4-5 dB in a lossy data center environment with patch panels and dirty connectors; 10km links need 6 dB. Next-generation lasers and CMOS electronics instead of SiGe are on the way, but Mother Nature just keeps getting in the way of our industry PowerPoint slides!
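The budget arithmetic above can be sketched as a quick feasibility check: sum the expected channel losses and compare against the budget. Only the 4-5 dB and 6 dB budget figures come from the text; the per-connector and per-km loss values below are illustrative assumptions.

```python
# Rough optical power-budget check for a single-mode data center link.
# Budgets (~4-5 dB for the "2km" class, 6 dB for 10km) are from the text;
# the individual loss contributions are assumed for illustration.
SMF_LOSS_DB_PER_KM = 0.4          # assumed single-mode fiber attenuation
CONNECTOR_LOSS_DB = 0.5           # assumed loss per mated connector pair
DIRTY_CONNECTOR_PENALTY_DB = 0.5  # assumed extra loss for a dirty connector

def channel_loss_db(length_km, connectors, dirty_connectors=0):
    return (length_km * SMF_LOSS_DB_PER_KM
            + connectors * CONNECTOR_LOSS_DB
            + dirty_connectors * DIRTY_CONNECTOR_PENALTY_DB)

def link_closes(budget_db, length_km, connectors, dirty_connectors=0):
    return channel_loss_db(length_km, connectors, dirty_connectors) <= budget_db

# A 600 m run through four patch-panel connectors, one of them dirty,
# checked against the ~4 dB budget discussed above:
loss = channel_loss_db(0.6, connectors=4, dirty_connectors=1)
print(f"{loss:.2f} dB")             # 0.24 + 2.0 + 0.5 = 2.74 dB
print(link_closes(4.0, 0.6, 4, 1))  # True: the link closes
```

Note how the connectors, not the fiber, dominate the loss at these reaches - which is exactly why dirty connectors and patch panels eat so much of a tight 4 dB budget.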

40G & 100G Transceiver Basics
40G and 100G come in two main "flavors" in the data center: short reach (SR4) for ~100 meters using multi-mode fiber, and long reach (LR4) for 100 meters to 10km using single-mode optics. The as-yet-undefined "nR4" nomenclature addresses the 2km, 4 dB issue. SR transceivers are typically used to connect compute clusters and the various switch layers in data centers. Several SR transceivers can reach ~300 meters with OM4 fiber, but somewhere between 125 and 200 meters the economics of fibers and transceivers justify converting to single-mode optics - and the crossover comes even shorter at 25G signaling. 40G is typically deployed as four 10G lanes in QSFP or CFP MSAs. SR4 uses eight multi-mode fibers (four in each direction), VCSEL lasers, and the QSFP MSA. LR4 uses edge-emitting lasers and multiplexes the four 10G lanes onto two single-mode fibers capable of 10km reach, in a CFP MSA and soon CFP/2 and QSFP28 MSAs. At 40G, both SR4 and LR4 can be used in the same QSFP switch port without any issues - just plug & play, 1 meter to 10km, no problem. (Not so for 100G.)

100G SR10 uses 20 multi-mode fibers, VCSELs, and the CXP MSA, while 100G LR4 uses the CFP MSA and two single-mode fibers. Although spec'd at 100 meters, SR10 CXP transceivers and AOCs are typically used to link large aggregation and core switches at <50 meters: a 20-fiber multi-mode cable becomes very expensive very fast as the reach grows, since multi-mode fiber costs about three times more than single-mode fiber. Only in 2012 have multiple transceiver companies started announcing 100G SR CXP transceivers, whereas 40G QSFP transceivers and AOCs have been available since about 2008. Going forward, 4x25G QSFP SR transceivers are likely to stunt the 10x10G CXP transceiver business.
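The flavor matrix in the last two paragraphs can be condensed into one small lookup table; all figures below are taken from the text.

```python
# 40G/100G transceiver flavors as described above (figures from the text).
VARIANTS = {
    "40G SR4":   {"fibers": 8,  "medium": "multi-mode",
                  "reach_m": 100,    "msa": "QSFP"},
    "40G LR4":   {"fibers": 2,  "medium": "single-mode",
                  "reach_m": 10_000, "msa": "CFP (soon CFP/2, QSFP28)"},
    "100G SR10": {"fibers": 20, "medium": "multi-mode",
                  "reach_m": 100,    "msa": "CXP"},
    "100G LR4":  {"fibers": 2,  "medium": "single-mode",
                  "reach_m": 10_000, "msa": "CFP"},
}

def fiber_count(variant):
    return VARIANTS[variant]["fibers"]

# SR10's 20-fiber ribbon versus LR4's duplex pair is a big part of why
# SR10 cabling cost grows so quickly with reach:
print(fiber_count("100G SR10") // fiber_count("100G LR4"))  # 10x the fiber
```

The table also makes the plug-and-play asymmetry visible: at 40G both flavors share a QSFP port, while at 100G the SR10/CXP and LR4/CFP form factors do not interchange.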

The next few years will revolve around Intel's new Romley server architecture and its subsequent silicon shrink, PCI Express 3.0, 10G uplinks to top-of-rack switches, and 40G uplinks in the switching infrastructure. 40G will be where the money is for the next three years, but everyone can see that 100G, along with mid-board optics, will be the next stop. The IEEE will sort out the technical issues, and the 100G infrastructure should kick in with volume in late 2014. It is important for the community to get this right, as 100G will be around for a very long time.

As data center architectures evolve to a new model, so too do the transceiver interconnect schemes. The industry's traditional job-security strategy of "When the profits stop, change the line rates and MSAs, confuse everybody, and lather, rinse, repeat" is in full play!

The jump to 25G signaling is likely to thin the vendor field considerably, as the technology is becoming increasingly complex and less of a low-cost manufacturing game with commodity parts. The jump to 25G ASICs will also move many front-panel transceivers mid-board as embedded optical modules, further changing the game.

Brad Smith is a Senior VP at LightCounting, a market research company forecasting high-speed interconnects. This article is an excerpt from the soon-to-be-released reports "40G & 100G Interconnects in the Data Center" and "Embedded Optical Modules." At ECOC 2012, Brad will present on "Parallel Optics, AOCs and Embedded Optical Modules." Join the presentation on Tuesday, September 18th at 1:20 PM in the Amsterdam RAI, Market Focus theatre.