Tomahawk-3

Announced Christmas 2017.

There are lots of trade rag articles that all seem to have been derived from Broadcom's product press release. I will not regurgitate.

For those attempting to keep the team roster in order, a similar announcement for this ASIC's predecessor used Roman numerals (tomahawk-ii) while the new part is Tomahawk 3. It is probably not right to make a big deal about this. Family part numbers for Tomahawk-3, taken from the datasheet referenced at the end of this page:

    Device      I/O Bandwidth   Buffer
    BCM56980    12.8 Tbps       64 MB
    BCM56982    8.0 Tbps        64 MB
    BCM56983    6.4 Tbps        32 MB
    BCM56984    6.4 Tbps        64 MB

There are 256 Serdes in the parent BCM56980 part. Each runs at 50 Gb/s, so 12.8 Tb/s. Through the power of long division, the BCM56982 part will have 160 Serdes. Off into the world of wild speculation, a single 400 Gb/s port will have eight lanes. So there's a new QSFP-DD format with twice the lane density. A Top of Rack configuration might have 48 100 Gb/s ports along with eight 400 Gb/s uplink ports. It would need 160 Serdes. Something like that. Breaking the ports out at 100 Gb/s will result in 128 ports with 2 lanes each. That works for 2 x 50 Gb/s lanes, but it has no backward compatibility with QSFP28, which uses 4 lanes. Inphi makes gearbox ASICs that can break a 50 Gb/s PAM4 lane into two NRZ 25 Gb/s lanes.
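The lane arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are mine, the 50 Gb/s lane rate and the bandwidth figures come from the table and text above, and the per-port lane counts (two lanes per 100 Gb/s port, eight per 400 Gb/s port) are the same speculation as in the paragraph, not datasheet facts.

```python
# Sketch of the Serdes arithmetic. Lane rate and bandwidths are from the
# text above; function names are illustrative, not from any Broadcom tool.
LANE_GBPS = 50  # PAM4 lane rate

# I/O bandwidth in Tb/s, from the family table
FAMILY_TBPS = {"BCM56980": 12.8, "BCM56982": 8.0, "BCM56983": 6.4, "BCM56984": 6.4}


def serdes_needed(bandwidth_tbps, lane_gbps=LANE_GBPS):
    """Number of lanes needed to carry bandwidth_tbps at lane_gbps per lane."""
    return round(bandwidth_tbps * 1000 / lane_gbps)


def lanes_for_ports(ports, lane_gbps=LANE_GBPS):
    """Total lanes for a list of (port_count, port_speed_gbps) pairs."""
    return sum(count * (speed // lane_gbps) for count, speed in ports)


serdes = {dev: serdes_needed(tbps) for dev, tbps in FAMILY_TBPS.items()}
tor_lanes = lanes_for_ports([(48, 100), (8, 400)])  # speculative ToR layout

for dev, n in serdes.items():
    print(f"{dev}: {n} Serdes")
print(f"48 x 100G + 8 x 400G needs {tor_lanes} lanes")
```

Under these assumptions the Top of Rack layout comes to 96 + 64 = 160 lanes, matching the long-division result for the BCM56982.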

There is a series announcement. Perhaps more illuminating, Broadcom's architect talked about switch design generally with Tomahawk-3 as an example. If there is a take-home, it is that there are multiple competing design objectives, and the assumptions customers make may not be the same as the manufacturer's.

Since there are no products for this just-announced ASIC, it is interesting to consider the design challenges at this density. The IEEE has been doing just that.

Update March 2018

Broadcom disclosed some additional details in a press release slide deck. Quoting:

    New, state-of-the-art, integrated 12.8Tbps shared-buffer architecture offers 3X to 5X higher incast absorption and provides the highest performance and lowest end-to-end latency for RoCEv2 based workloads

This does not say that Tomahawk-3 has a fully shared buffer. Nvidia/Mellanox believes it does not. On the not-so-good side, Broadcom remains shy about disclosing the amount of packet memory. Tomahawk and Tomahawk-ii have their packet memory divided into four separate chunks, each managed by its own slice.

In October 2018 an announcement of Arista products incorporating the Tomahawk-3 revealed that the packet buffer is 64 MB. See the table up top for current family members.

Update November 2019

Not really news -- just an observation. Per the introductory slide deck, the SERDES can run in either PAM4 50 Gb/s or NRZ 25 Gb/s mode. So 100 Gb/s QSFPs can be plugged into QSFP-DD slots and just work. But clearly burning a lot of SERDES at less than 50 Gb/s will hurt max performance. A design that will accommodate 100 Gb/s QSFP and preserve max performance will use a gearbox for its speed match function. Facebook Minipack and Arista 7368 do this.
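To put rough numbers on that trade-off, here is a small sketch. It assumes all 256 Serdes of the BCM56980 are used for 100 Gb/s ports, that a directly attached QSFP28 ties up four Serdes in 25 Gb/s NRZ mode, and that a gearbox lets one 100 Gb/s port consume only two Serdes in 50 Gb/s PAM4 mode. The function name is mine and the model ignores everything except lane counts.

```python
# Hypothetical lane-count model; not derived from any vendor tool or datasheet.
TOTAL_SERDES = 256   # BCM56980
PORT_GBPS = 100      # QSFP28 port speed


def hundred_gig_ports(total_serdes, serdes_gbps):
    """Return (ports, aggregate Tb/s) when every Serdes runs at serdes_gbps."""
    serdes_per_port = PORT_GBPS // serdes_gbps
    ports = total_serdes // serdes_per_port
    return ports, ports * PORT_GBPS / 1000


direct = hundred_gig_ports(TOTAL_SERDES, 25)   # Serdes in NRZ mode, no gearbox
geared = hundred_gig_ports(TOTAL_SERDES, 50)   # Serdes in PAM4 mode behind a gearbox

print(f"direct attach: {direct[0]} ports, {direct[1]} Tb/s")
print(f"with gearbox:  {geared[0]} ports, {geared[1]} Tb/s")
```

In this model the direct-attach case tops out at 64 ports and 6.4 Tb/s, half of what the chip can switch, while the gearbox case reaches 128 ports and the full 12.8 Tb/s.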

November 2019 Buffer update

The actual claim in the product info sheet is that Tomahawk-3 is 4X better at burst absorption. It is not made clear what it is 4X better than. A discussion on the OCP-network call on November 11, 2019 made it clear that, at minimum, there is a hard split into two 32 MB partitions. Each partition is further split into quads, but these may be configurable. During this discussion it was said that Broadcom probably doesn't want these details revealed outside those who have signed nondisclosure agreements. That stopped the discussion. Sigh.

A final note -- and a datasheet

Starting mid-2017 a datasheet was released for the Tomahawk-3. Note that a section at the end shows evolution of this datasheet over 2 years.

Cat fight!!

Tolly posted a paper in June 2021 singing the virtues of the Nvidia Spectrum-3 ASIC over an unnamed rival referred to as commodity silicon. The speeds and feeds of the Spectrum-3 (32 ports of 400 Gb/s) bracket the possible contenders for the honor of unknown switch. It could be a Broadcom Tomahawk-3. Or perhaps it is the silicon behind Cisco's Nexus 9332-GX2B. It is hard to imagine that any of the providers besides Cisco and Broadcom could be the commodity player.

Broadcom has gone to some trouble to make the case that they didn't do it. They published a blog post, "Tomahawk 3 performance vs. Tolly's Commodity switch", on October 21, 2021. The original Tolly report is copyrighted so I won't include it here. It is available as report #221125. A summary of the Broadcom report is:

    We have repeated the Tolly studies on our Tomahawk-3 ASIC and clearly we are not the commodity switch of which they speak.

The Cisco Nexus 9332-GX2B has been out long enough to be covered by a CiscoLive! presentation, but for Covid. The datasheet says it is a two-slice design but I really need to see diagrams to be clear what they're talking about.

A bright point in the Broadcom blog is that it appears that -- perhaps with careful parameter tuning -- as much as 62 MBytes of the packet buffer is available to absorb a microburst. That is a nice number to have.
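
As a rough illustration of what 62 MBytes buys, the sketch below estimates how long an incast burst can be absorbed before the buffer fills. The scenario is hypothetical: several senders at line rate converge on a single egress port, the whole 62 MBytes is assumed to be available to that one port, and a megabyte is taken as 10^6 bytes. None of these assumptions come from the Broadcom blog.

```python
# Back-of-envelope incast model; the scenario and names are illustrative only.
def absorb_seconds(buffer_mbytes, senders, port_gbps):
    """Seconds until the buffer fills when `senders` ports at port_gbps
    all send to one egress port of the same speed."""
    excess_bps = (senders - 1) * port_gbps * 1e9   # arrival rate minus drain rate
    if excess_bps <= 0:
        return float("inf")                        # no oversubscription, nothing queues
    return buffer_mbytes * 1e6 * 8 / excess_bps


two_to_one = absorb_seconds(62, senders=2, port_gbps=100)
four_to_one = absorb_seconds(62, senders=4, port_gbps=100)

print(f"2:1 incast at 100G: {two_to_one * 1e3:.2f} ms")
print(f"4:1 incast at 100G: {four_to_one * 1e3:.2f} ms")
```

With two 100 Gb/s senders into one 100 Gb/s port, that works out to a little under 5 ms of burst; more senders shrink the window further.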