10GE TOR (Top Of Rack) port buffers (was Re: 10G switch recommendation)
Hi, 

Is there a reason all vendors' 1U TOR 10GE aggregation switches are 
cut-through, with no models that have deep buffers? 
I've been looking at all the vendors I can think of, and they all offer the 
same kinds of models: 

TOR switches that are cut-through with small buffers, and chassis-based 
boxes with deep buffers. 

TOR: 
Juniper EX4500 208KB/10GE (4MB shared per PFE) 
Cisco 4900M 728KB/10GE (17.5MB shared) 
Cisco Nexus 3064 140KB/10GE (9MB shared) 
Cisco Nexus 5000 680KB/10GE 
Force10 S2410 I can't find it anymore, but it wasn't much 
Arista 7148SX 123KB/10GE (80KB per port plus 5MB dynamic) 
Arista 7050S 173KB/10GE (9MB shared) 
Brocade VDX 6730-32 170KB/10GE 
Brocade TurboIron 24X 85KB/10GE 
HP 6600-24XG 4500KB/10GE 
HP 5820-24XG-SFP+ 87KB/10GE 
Extreme Summit X650 375KB/10GE 

Chassis: 
Juniper EX8200-8XS 512MB/10GE 
Cisco WS-X6708-10GE 32MB/10GE (or 24MB) 
Cisco N7K-M132XP-12 36MB/10GE 
Arista DCS-7548S-LC 48MB/10GE 
Brocade BR-MLX-10Gx8-X 128MB/10GE (not sure) 

1GE aggregation: 
Force10 S60 1250MB shared 
HP 5830 3000MB shared 

I am at a loss as to why there are no 10GE TOR switches with deep buffers. 
Apparently there is a need for deep buffers, as the vendors make them 
available in their chassis linecards. 
There are also deep-buffer 1GE aggregation switches. 

Is there some (technical) reason for this? 
I can imagine some vendors would say that you need to scale up to a 
chassis if you need deep buffers, but at least one vendor should be 
able to win quite a few customers with a 10G deep-buffer TOR switch. 

I understand that flow control should prevent loss with microbursts, 
but my customers see adverse effects, with strongly degraded 
performance, if they let flow control do its thing. 

Any pointers why this is, or if there is a solution for microburst 
loss would be greatly appreciated. 

Thanks, 

Bas
 
saku at ytti 

Jan 27, 2012, 8:55 AM 

Post #2 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On (2012-01-27 17:35 +0100), bas wrote: 

> Chassis: 
> Juniper EX8200-8XS 512MB/10GE 
> Cisco WS-X6708-10GE 32MB/10GE (or 24MB) 
> Cisco N7K-M132XP-12 36MB/10GE 
> Arista DCS-7548S-LC 48MB/10GE 
> Brocade BR-MLX-10Gx8-X 128MB/10GE (not sure) 
> 
> 1GE aggregation. 
> Force10 S60 1250MB shared 
> HP 5830 3000MB shared 

I'd take some of these with a grain of salt. Take the EX8200-8XS; the PDF 
indeed does agree: 
--- 
Total buffer size is 512 MB on each EX8200-8XS 10-Gigabit 
Ethernet port or each EX8200-40XS port group, and 42 MB 
on each EX8200-48T and EX8200-48F Gigabit Ethernet port, 
providing 50-100 ms of bandwidth delay buffering 
--- 

However, 512MB is about 400ms of buffering at 10GE, while 512Mb is 50ms. So 
I think the JNPR PDF is just wrong. 
A similar error may exist for some of the other quoted numbers. 
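
For reference, back-of-the-envelope in Python (assuming the whole buffer 
can drain toward a single 10GE egress): 
--- 
# Time to drain a full buffer at line rate. 
def drain_ms(buf_bytes, rate_bps): 
    return buf_bytes * 8 / rate_bps * 1000 

print(drain_ms(512 * 2**20, 10e9))      # 512 MB at 10GE -> ~430 ms 
print(drain_ms(512 * 2**20 / 8, 10e9))  # 512 Mb at 10GE -> ~54 ms 
--- 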

But generally a nice list; especially the 10GE fixed-config numbers looked 
realistic. Sometimes I wish we had a 'dpreview'-style page for routers and 
switches, especially now with a dozen or more vendors selling the 'same' 
trident+ switch; differentiating them is hard. 

-- 
++ytti
 

	
 
tom.ammon at utah 

Jan 27, 2012, 9:55 AM 

Post #3 of 27 

RE: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]

The HP 6600 is store-and-forward, not cut-through. The HP reps that I have
dealt with seem to be pretty open to sharing architecture drawings of their 
stuff, so I bet you could probably get your hands on the same one that I have.
Their NDA is a mutual disclosure, though, so that might make things tough
depending on your organization's policies. 

Tom 


 

	
 
kilobit at gmail 

Jan 27, 2012, 1:40 PM 

Post #4 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On Fri, Jan 27, 2012 at 5:55 PM, Saku Ytti  wrote: 
> On (2012-01-27 17:35 +0100), bas wrote: 
> But generally a nice list; especially the 10GE fixed-config numbers looked 
> realistic. Sometimes I wish we had a 'dpreview'-style page for routers and 
> switches, especially now with a dozen or more vendors selling the 'same' 
> trident+ switch; differentiating them is hard. 

But do you generally agree that "the market" has a requirement for a 
deep-buffer TOR switch? 

Or am I crazy for thinking that my customers need such a solution? 

Bas
 

	

bicknell at ufp 

Jan 27, 2012, 1:52 PM 

Post #5 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
In a message written on Fri, Jan 27, 2012 at 10:40:03PM +0100, bas wrote: 
> But do you generally agree that "the market" has a requirement for a 
> deep-buffer TOR switch? 
> 
> Or am I crazy for thinking that my customers need such a solution? 

You're crazy. :) 

You need to google "bufferbloat"; while the focus there has been more 
on (SOHO) routers that have absurd (multi-second) buffers, the 
concepts at play apply here as well. 

Let's say you have a VOIP application with 250ms of jitter tolerance, 
and you're going 80ms across country. You then add in a switch on 
one end that has 300ms of buffer. 

Oops, you go way over, but only from time to time when the switch 
buffer is full, getting 300+80ms of latency for a few packets. 

Dropped packets are a _GOOD_ thing. If your ethernet switch can't 
get the packet out another port in ~1-2ms it should drop it. The 
output port is congested, and congestion is what tells the sender to 
back off. If you instead buffer the packets you get congestion collapse, 
which is far worse for throughput in the end, and in particular has 
severely detrimental effects on the others on the LAN, not just the box 
filling the buffers. 

A network dropping packets is healthy, telling the upstream boxes 
to throttle to the appropriate speeds with packet loss, which is how 
TCP operates. I can't tell you how many times I've seen network 
engineers tell me "no matter how big I make the buffers, performance 
gets worse and worse". Well duh, you're just introducing more and 
more latency into your network, and making TCP backoff fail rather 
than work properly. I go in and slash their 50-100 packet buffers 
down to 5 and magically the network performs great, even when full. 

Now, how much buffer do you need? One packet is the minimum. If 
you can't buffer one packet it becomes hard to reach 100% utilization 
on a link. Anyone who's tried with a pure cut-through switch can 
tell you it tops out around 90% (with multiple senders to a single 
egress). Amazingly, one packet of buffer almost entirely fixes the 
problem. 

When I can manually set the buffers, I generally go for 1ms of buffers 
on high speed (e.g. 10GE) links, and might increase that to as much as 
15 ms of buffers on extremely low speed links, like sub-T1. 
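
As a rough sketch of that rule of thumb in Python (1500-byte packets 
assumed): 
--- 
# Size buffers in units of time, then convert to bytes and packets. 
def buffer_for(rate_bps, target_ms, mtu=1500): 
    buf_bytes = rate_bps / 8 * target_ms / 1000 
    return buf_bytes, buf_bytes / mtu 

print(buffer_for(10e9, 1))    # 1 ms at 10GE -> 1.25 MB, ~833 packets 
print(buffer_for(1.5e6, 15))  # 15 ms at ~T1 -> ~2.8 KB, ~2 packets 
--- 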

Remember, your RTT will vary (jitter) +- the sum of all buffers on all 
hops along the path. A 10 hop path with 15ms per hop could see 150ms of 
jitter if all links go between full and not full! 

Buffers in most network gear are bad; don't do it. 

-- 
Leo Bicknell - bicknell [at] ufp - CCIE 3440 
PGP keys at http://www.ufp.org/~bicknell/
 

	
 
kilobit at gmail 

Jan 27, 2012, 2:30 PM 

Post #6 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
Hi, 

On Fri, Jan 27, 2012 at 10:52 PM, Leo Bicknell  wrote: 
> In a message written on Fri, Jan 27, 2012 at 10:40:03PM +0100, bas wrote: 
>> But do you generally agree that "the market" has a requirement for a 
>> deep-buffer TOR switch? 
>> 
>> Or am I crazy for thinking that my customers need such a solution? 
> 
> You're crazy. :) 
> 
> You need to google "bufferbloat"; while the focus there has been more 
> on (SOHO) routers that have absurd (multi-second) buffers, the 
> concepts at play apply here as well. 

While your reasoning holds truth, it does not explain why the expensive 
chassis solution (good) makes my customers happy, and the cheaper TOR 
solution makes my customers unhappy..... 

Bufferbloat does not matter to them, as jitter and latency do not matter. 
As long as the TCP window size negotiation is not reset, the total 
amount of bits/sec increases for them. 

If deep buffers are bad I would expect high-end chassis solutions not 
to offer them either. But the market seems to offer expensive deep buffer 
chassis solutions and cheap (per 10GE) TOR solutions. 

IMHO there is no reasoning why.... 
(why the expensive solution is not offered in a 1U box) 

My customers want to funnel 10 to 24 * 10GE into 1 or 2 10GE uplinks; 
to do this they need some buffers.... 

Bas
 

	
 
gbonser at seven 

Jan 27, 2012, 2:36 PM 

Post #7 of 27 

RE: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
> 
> Buffers in most network gear are bad; don't do it. 
> 

+1 

I'm amazed at how many will spend money on switches with more buffering but 
won't take steps to ease the congestion. Part of the reason is the difficulty 
of convincing non-technical people that packet loss in and of itself doesn't 
have to be a bad thing, that it allows applications to adapt to network 
conditions. They can use tools to see packet loss, and that gives them 
something to complain about. They don't know how to interpret jitter or 
understand what impact it has on their applications. They just know that 
they can run some packet blaster, see a packet dropped, and want that to go 
away, so we end up in "every packet is precious" mode. 

They would rather have a download that starts and stops and starts and stops 
than one that progresses smoothly from start to finish, and trying to 
explain to them that performance is "bursty" because nobody wants to allow a 
packet to be dropped sails right over their heads. 

They'll accept crappy performance with no packet loss before they will accept 
better overall performance with an occasional packet lost. 

If an application is truly intolerant of packet loss, then you need to address 
the congestion, not get bigger buffers.
 

	
 
kilobit at gmail 

Jan 27, 2012, 2:53 PM 

Post #8 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
While I agree _again_!!!!! 

It does not explain why TOR boxes have small buffers and chassis boxes 
have deep ones..... 



	
 
gbonser at seven 

Jan 27, 2012, 3:01 PM 

Post #9 of 27 

RE: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
> -----Original Message----- 
> From: bas 
> Sent: Friday, January 27, 2012 2:54 PM 
> To: George Bonser 
> Subject: Re: 10GE TOR port buffers (was Re: 10G switch recommendation) 
> 
> While I agree _again_!!!!! 
> 
> It does not explain why TOR boxes have small buffers and chassis boxes 
> have deep ones..... 

Because that is what customers think they want, so that is what they sell. 
Customers don't realize that the added buffers are killing performance. 

I have had network sales reps tell me "you want this switch over here, it has 
bigger buffers" when that is exactly the opposite of what I want, unless I am 
sending a bunch of UDP in very brief microbursts. If you are sending TCP 
streams, what you want is less buffering. Spend the extra money on more 
bandwidth to relieve the congestion. 

Going to 4 10G aggregated uplinks instead of 2 might get you a much better 
performance boost than increasing buffers. But it really depends on the 
end to end application.
 

	
 
bicknell at ufp 

Jan 27, 2012, 3:03 PM 

Post #10 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
In a message written on Fri, Jan 27, 2012 at 11:30:14PM +0100, bas wrote: 
> While your reasoning holds truth, it does not explain why the expensive 
> chassis solution (good) makes my customers happy, and the cheaper TOR 
> solution makes my customers unhappy..... 
> 
> Bufferbloat does not matter to them, as jitter and latency do not matter. 
> As long as the TCP window size negotiation is not reset, the total 
> amount of bits/sec increases for them. 

I obviously don't know your application. The bufferbloat problem 
exists for 99.99% of the standard applications in the world. There 
are, however, a few corner cases. For instance, if you want to 
move a _single_ TCP stream at more than 1Gbps you need deep buffers. 
Dropping a single packet slows throughput too much due to a slow-start 
event. For most of the world with hundreds or thousands of TCP 
streams across a single port, such problems never occur. 
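
The single-stream case is just the classic bandwidth-delay-product rule of 
thumb; a sketch in Python, with an assumed 80ms RTT: 
--- 
# Buffer needed at the bottleneck for one TCP stream to fill the pipe. 
def bdp_bytes(rate_bps, rtt_ms): 
    return rate_bps / 8 * rtt_ms / 1000 

print(bdp_bytes(10e9, 80) / 1e6)  # one 10G stream, 80ms RTT -> 100 MB 
print(bdp_bytes(1e9, 80) / 1e6)   # one 1G stream, 80ms RTT  -> 10 MB 
--- 
That is the scale of buffer only the chassis linecards in your list get 
close to. 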

> If deep buffers are bad I would expect high-end chassis solutions not 
> to offer them either. 
> But the market seems to offer expensive deep buffer chassis solutions 
> and cheap (per 10GE) TOR solutions. 

The margin on a top-of-rack switch is very low. 48-port gige boxes with 
10GE uplinks are basically commodity items, with plenty of competition. 
Saving $100 on the bill of materials by cutting out some buffer 
makes the box more competitive when it's at a $2k price point. 

In contrast, large, modular chassis have a much higher margin. They are 
designed with great flexibility, to take things like firewall modules 
and SSL accelerator cards. There are configs where you want some (not 
much) buffer due to these active appliances in the chassis, plus it is 
easier to hide an extra $100 of RAM in a $100k box. 

Also, as was pointed out to me privately, it is important to look 
at adaptive queue management features. The most famous is WRED, but 
there are other choices. Having a queue management solution on your 
routers and switches that works in concert with the congestion control 
mechanism used by the end stations always results in better goodput. 
Many of the low-end switches have limited or no AQM choices, while the 
higher-end switches with fancier ASICs can default to something like 
WRED. Be sure it is the deeper buffers that are making the difference, 
and not simply some queue management. 
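
For a feel of what something like WRED does, a minimal sketch of the RED 
drop curve in Python (thresholds made up; real RED tracks an EWMA of queue 
depth, and WRED just keeps separate thresholds per class): 
--- 
import random 

# Drop probability rises linearly with average queue depth between 
# min_th and max_th; below min_th never drop, above max_th always drop. 
def red_drop(avg_qlen_pkts, min_th=20, max_th=80, max_p=0.1): 
    if avg_qlen_pkts < min_th: 
        return False 
    if avg_qlen_pkts >= max_th: 
        return True 
    p = max_p * (avg_qlen_pkts - min_th) / (max_th - min_th) 
    return random.random() < p 
--- 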

-- 
Leo Bicknell - bicknell [at] ufp - CCIE 3440 
PGP keys at http://www.ufp.org/~bicknell/
 

	
 
kilobit at gmail 

Jan 27, 2012, 3:08 PM 

Post #11 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
Hi, 

On Fri, Jan 27, 2012 at 11:54 PM, George Bonser  wrote: 
>> 
>> My customers want to funnel 10 to 24 * 10GE into 1 or 2 10GE uplinks; 
>> to do this they need some buffers.... 
>> 
>> Bas 
> 
> It might be cheaper for them to go to 3 or 4 10G uplinks than to 
> replace all their switch hardware. 

My (our) business model _is_ the internet connectivity... 
We could give the customer double the port capacity if they were 
willing to pay, but in real life they do not care... 

While all respondents' replies hold truth and (technical/business) logic, 
none shed light on why there isn't a TOR box that does 10GE with 
deep buffers...
 

	
 
kilobit at gmail 

Jan 27, 2012, 3:24 PM 

Post #12 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On Sat, Jan 28, 2012 at 12:01 AM, George Bonser  wrote:
 
> Going to 4 10G aggregated uplinks instead of 2 might get you a much 
> better performance boost than increasing buffers. 
> But it really depends on the end to end application. 

Also, these TOR boxes connect to my (more expensive ASR9K and MX) boxes, so 
from a CAPEX standpoint I simply do not want to give them more ports 
than required.
 

	
 
kilobit at gmail 

Jan 27, 2012, 3:30 PM 

Post #13 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
Hi, 

> The margin on a top-of-rack switch is very low.  48-port gige boxes with 
> 10GE uplinks are basically commodity items, with plenty of competition. 
> Saving $100 on the bill of materials by cutting out some buffer 
> makes the box more competitive when it's at a $2k price point. 

The 10GE TOR switches I listed earlier go from $20K to $100K list price, 
so the actual purchase cost for us would be $10K to $30K. 
$500 for some (S)(Q)(bla)RAM shouldn't hold back a vendor from 
releasing a bitchin' switch.... 

Again, this argument does not explain why there are 1GE aggregation 
switches with deep buffers.. 

> Also, as was pointed out to me privately, it is important to look 
> at adaptive queue management features.  The most famous is WRED, but 
> there are other choices.  Having a queue management solution on your 
> routers and switches that works in concert with the congestion control 
> mechanism used by the end stations always results in better goodput. 
> Many of the low-end switches have limited or no AQM choices, while the 
> higher-end switches with fancier ASICs can default to something like 
> WRED.  Be sure it is the deeper buffers that are making the difference, 
> and not simply some queue management. 

All true... Still no reason not to offer a deep-buffer TOR...
 

	
 
joelja at bogus 

Jan 27, 2012, 3:32 PM 

Post #14 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On 1/27/12 14:53 , bas wrote: 
> While I agree _again_!!!!! 
> 
> It does not explain why TOR boxes have small buffers and chassis boxes 
> have deep ones..... 

You need proportionally more buffer when you need to drain 16 x 10Gig 
into 4 x 10Gig than when you're trying to drain 10Gb/s into 2 x 1Gb/s. 
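
To put rough numbers on that fan-in (a sketch; 9MB is a typical trident+ 
shared buffer figure, and a synchronized full-rate burst is the worst case): 
--- 
# During a synchronized burst the queue grows at (ingress - egress) rate. 
def ms_until_full(buf_bytes, in_bps, out_bps): 
    excess_Bps = (in_bps - out_bps) / 8 
    return buf_bytes / excess_Bps * 1000 

print(ms_until_full(9e6, 16 * 10e9, 4 * 10e9))  # ~0.6 ms to overflow 
--- 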

There's a big incentive BOM-wise not to use off-chip DRAM buffer in a 
merchant-silicon single-chip switch vs something that's more complex. 



	
 
joelja at bogus 

Jan 27, 2012, 3:38 PM 

Post #15 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On 1/27/12 15:01 , George Bonser wrote: 
> 
> 
>> -----Original Message----- From: bas Sent: Friday, January 27, 2012 
>> 2:54 PM To: George Bonser Subject: Re: 10GE TOR port buffers (was 
>> Re: 10G switch recommendation) 
>> 
>> While I agree _again_!!!!! 
>> 
>> It does not explain why TOR boxes have small buffers and chassis 
>> boxes have deep ones..... 
> 
> Because that is what customers think they want, so that is what they 
> sell. Customers don't realize that the added buffers are killing 
> performance. 

It is possible, trivial in fact, to buy a switch that has a buffer too 
small to provide stable performance at some high fraction of its uplink 
utilization. You can differentiate between the enterprise/soho 1gig 
switch you bought to support your ip-phones and wireless APs and the 
datacenter-spec 1U TOR along these lines. 

It is also possible, and in fact easy, to have enough buffer to accumulate 
latency in places where you should be discarding packets earlier. 

I'd rather not be in either situation, but in the latter I can police my 
way out of it. 


> I have had network sales reps tell me "you want this switch over 
> here, it has bigger buffers" when that is exactly the opposite of 
> what I want, unless I am sending a bunch of UDP in very brief 
> microbursts. If you are sending TCP streams, what you want is less 
> buffering. Spend the extra money on more bandwidth to relieve the 
> congestion. 
> 
> Going to 4 10G aggregated uplinks instead of 2 might get you a much 
> better performance boost than increasing buffers. But it really 
> depends on the end to end application. 
> 
> 
> 
 

	
 
kilobit at gmail 

Jan 27, 2012, 3:40 PM 

Post #16 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
Hi All, 

On Sat, Jan 28, 2012 at 12:32 AM, Joel jaeggli  wrote: 
> On 1/27/12 14:53 , bas wrote: 
>> While I agree _again_!!!!! 
>> 
>> It does not explain why TOR boxes have small buffers and chassis boxes 
>> have deep ones..... 
> 
> You need proportionally more buffer when you need to drain 16 x 10Gig 
> into 4 x 10Gig than when you're trying to drain 10Gb/s into 2 x 1Gb/s. 
> 
> There's a big incentive BOM-wise not to use off-chip DRAM buffer in a 
> merchant-silicon single-chip switch vs something that's more complex. 

I'm almost ready to throw in the towel and declare myself a loony.. 
I can imagine at least one vendor ignoring the extra BOM capex and 
simply trying to please #$%^#@! like me. 

Cisco-NSP has been full of threads about the appalling microburst 
performance of the 6500 for years.. 
One would think a vendor would jump at a competitive edge like this...
 

	
 
joelja at bogus 

Jan 27, 2012, 4:00 PM 

Post #17 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On 1/27/12 15:40 , bas wrote: 
> Hi All, 
> 
> On Sat, Jan 28, 2012 at 12:32 AM, Joel jaeggli  wrote: 
>> On 1/27/12 14:53 , bas wrote: 
>>> While I agree _again_!!!!! 
>>> 
>>> It does not explain why TOR boxes have small buffers and chassis boxes 
>>> have deep ones..... 
>> 
>> You need proportionally more buffer when you need to drain 16 x 10Gig 
>> into 4 x 10Gig than when you're trying to drain 10Gb/s into 2 x 1Gb/s. 
>> 
>> There's a big incentive BOM-wise not to use off-chip DRAM buffer in a 
>> merchant-silicon single-chip switch vs something that's more complex. 
> 
> I'm almost ready to throw in the towel and declare myself a loony.. 
> I can imagine at least one vendor ignoring the extra BOM capex and 
> simply trying to please #$%^#@! like me. 
> 
> Cisco-NSP has been full of threads about the appalling microburst 
> performance of the 6500 for years.. 

And people who care have been using something other than a c6500 for 
years. It's a 15-year-old architecture, and it's had a pretty good run, 
but it's 2012. 

An EX8200 has 512MB per port on non-oversubscribed 10Gig ports and 42MB 
per port on 1Gig ports. That's a lot of RAM. 

To take this back to actual TORs: 

a broadcom 56840 based switch has something in the neighborhood of 9MB 
available for packet buffer on chip; if you need more, then more DRAMs 
are in order. While the TOR can cut-through-switch, the chassis can't. 
The TOR is also probably not built with off-chip CAM (there are examples 
of off-chip CAM as well) for much the same reason. 
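
That on-chip number also lines up with the per-port figures in the 
original list (a quick check, decimal MB assumed): 
--- 
# 9 MB of shared packet buffer spread across a trident+ style port count. 
for ports in (48, 52, 64): 
    print(ports, round(9e6 / ports / 1e3), "KB/port") 
# -> 188, 173, 141 KB/port; compare 173KB (7050S), 140KB (Nexus 3064) 
--- 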

> One would think a vendor would jump at a competitive edge like this... 
> 
 

	
 
nick at foobar 

Jan 27, 2012, 4:51 PM 

Post #18 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On 27 Jan 2012, at 23:08, bas  wrote: 
> My (our) business model _is_ the internet connectivity... 
> We could give the customer double the port capacity if they were 
> willing to pay, but in real life they do not care... 
> 
> While all respondents' replies hold truth and (technical/business) logic, 
> none shed light on why there isn't a TOR box that does 10GE with deep buffers 

There are a couple of reasons for this: first, dropping the amount of buffer 
space decreases the cost of the hardware. Secondly, you really only need 
large buffers when you need to shape traffic. Shaping traffic is important 
if you're stepping down from a faster port to a slower port (this is a 
common use case for a blade switch like a c6500), or else if you're 
running qos on the port and you need to implement sophisticated queuing 
and policing. You can't run qos effectively without having generous buffers, 
which is why LAN switches typically have very little buffer space and metro 
ethernet switches typically have lots. 

In the case of a TOR switch, the use case is typically one where you're 
not stepping down from a higher speed to a lower speed, and where you 
don't really need fancy qos. So as it's not generally needed for the sort of 
things that TOR switches are used for, it's not added to the hardware spec. 

Nick
 

	
 
gbonser at seven 

Jan 27, 2012, 4:57 PM 

Post #19 of 27 

RE: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
> 
> It is also possible, and in fact easy, to have enough buffer to accumulate 
> latency in places where you should be discarding packets earlier. 
> 
> I'd rather not be in either situation, but in the latter I can police my 
> way out of it. 

That is why I added the "it depends on the end to end application" caveat.
 

	
 
gbonser at seven 

Jan 27, 2012, 5:03 PM 

Post #20 of 27 

RE: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]

I assumed that since he was asking about a "top of rack" (TOR) switch, he 
was actually using it as a top-of-rack switch, and that adding a couple more 
uplinks to his core would be cheaper than replacing all the hardware. Not 
understanding the topology and the application makes good recommendations 
a crap shoot, at best. 


From: Nick Hilliard 
Sent: Friday, January 27, 2012 4:51 PM 
To: bas 
Cc: George Bonser; nanog 
Subject: Re: 10GE TOR port buffers (was Re: 10G switch recommendation) 


In the case of a TOR switch, the use case is typically one where 
you're not stepping down from a higher speed to a lower speed, 
and where you don't really need fancy qos. So as it's not generally 
needed for the sort of things that TOR switches are used for, it's not 
added to the hardware spec. 

Nick
 

	
 
bicknell at ufp 

Jan 27, 2012, 5:19 PM 

Post #21 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
In a message written on Fri, Jan 27, 2012 at 04:00:36PM -0800, Joel jaeggli wrote:
 
> And people who care have been using something other than a c6500 for 
> years. It's a 15-year-old architecture, and it's had a pretty good run, 
> but it's 2012. 

One of the frustrating things, which the c6500 embodies best, is 
that the chassis has had many generations of linecards. 

It came out in 1999, running CatOS, with a 32Gbps shared bus. 

It exists now as an IOS box with a 720Gbps bus, running distributed 
switching. 

While you can call both a 6500, they share little more than some 
sheet metal, fans, and copper traces on the backplane. Wisdom 
learned running CatOS on 1st-generation cards flat out does not 
apply to current-generation cards. And woe be to the admin who mixes 
and matches generations of cards; there are a million different 
configurations and pitfalls. 

Cisco is not the only vendor, and the 6500 is not the only product with 
this problem. It makes conversation extremely difficult, though; you 
can't say "a 6500 has xyz property" without detailing a lot more about 
the config of the box. 

-- 
Leo Bicknell - bicknell [at] ufp - CCIE 3440 
PGP keys at http://www.ufp.org/~bicknell/
 

	
 
lukasz at bromirski 

Jan 27, 2012, 6:32 PM 

Post #22 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On 2012-01-28 01:00, Joel jaeggli wrote: 

>> Cisco-NSP has been full of threads about the appalling microburst 
>> performance of the 6500 for years.. 
> And people who care have been using something other than a c6500 for 
> years. It's a 15-year-old architecture, and it's had a pretty good run, 
> but it's 2012. 
> An EX8200 has 512MB per port on non-oversubscribed 10Gig ports and 42MB 
> per port on 1Gig ports. That's a lot of RAM. 

The 6500 has up to 256MB for non-oversubscribed 10GE ports. People 
complaining about microbursts tend to use the cheapest 6704 linecard, 
and 'microbursts' are a problem seen across most of the products that 
don't even try to have 1/12th of the 6500's history. 

Everyone has its own problems, and as people already said, not 
understanding the way properly sized buffers influence the way TCP 
traffic behaves can do more harm than good. 

-- 
"There's no sense in being precise when | £ukasz Bromirski 
you don't know what you're talking | jid:lbromirski [at] jabber 
about." John von Neumann | http://lukasz.bromirski.net
 

	
 
saku at ytti 

Jan 28, 2012, 2:07 AM 

Post #23 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On (2012-01-27 22:40 +0100), bas wrote: 

> But do you generally agree that "the market" has a requirement for a 
> deep-buffer TOR switch? 
> 
> Or am I crazy for thinking that my customers need such a solution? 

No, you're not crazy. If your core is a higher rate than your customer, then 
you need at minimum the serialization-delay difference of buffering. 
If the core is 10G and access is 100M, you need buffer for a minimum of 100 
packets to handle a single incoming 10G burst without any extra buffering. 
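
One way to put numbers on that (a sketch, assuming 1500-byte packets): 
--- 
# A burst arrives at core rate while egress drains at access rate; 
# the difference has to sit in the buffer. 
def pkts_queued(burst_pkts, core_bps, access_bps): 
    return burst_pkts * (1 - access_bps / core_bps) 

print(pkts_queued(100, 10e9, 100e6))  # -> 99 packets, ~150KB at 1500B 
--- 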

Now if you add QoS on top of this, you probably need 100 for each class you 
are going to support. 
And if the switch does support QoS but the operator configures only BE, and 
the operator does not limit the BE queue size, the operator will see buffer 
bloat and think it's a clueless vendor dropping expensive memory there for 
the lulz, while it's just a misconfigured box. 

When it comes to these trident+ 64x10GE/48x10GE+4x40G boxes, the 
serialization-delay difference between interfaces is minimal, and so is 
the buffering demand. 

-- 
++ytti
 

	
 
mohta at necom830 

Jan 28, 2012, 4:06 AM 

Post #24 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
Saku Ytti wrote: 

> No, you're not crazy. If your core is a higher rate than your customer, 
> then you need at minimum the serialization-delay difference of buffering. 
> If the core is 10G and access is 100M, you need buffer for a minimum of 
> 100 packets to handle a single incoming 10G burst without any extra buffering. 

The required amount of memory is merely 150KB. 

> Now if you add QoS on top of this, you probably need 100 for each 
> class you are going to support. 

If you have 10 classes, it is still 1.5MB. 

> And if the switch does support QoS but the operator configures only BE, and 
> the operator does not limit the BE queue size, the operator will see buffer bloat, 

1.5MB @ 10Gbps is only 1.2ms, which is not buffer bloat. 

Masataka Ohta
 

	
 
saku at ytti 

Jan 28, 2012, 4:38 AM 

Post #25 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On (2012-01-28 21:06 +0900), Masataka Ohta wrote: 

> The required amount of memory is merely 150KB. 

Assuming we don't support jumbo frames and the switch cannot queue sub-packet 
sizes (normally they can't, but the VXR at least has a 512B cell concept, so 
the tx-ring is packet-size agnostic, but that is just the PA-A3). 

> If you have 10 classes, it is still 1.5MB. 

Yup, that's not bad at all on a 100M port; in fact 10 classes would be quite 
a lot. 

> > And if the switch does support QoS but the operator configures only BE, and 
> > the operator does not limit the BE queue size, the operator will see buffer bloat, 
> 
> 1.5MB @ 10Gbps is only 1.2ms, which is not buffer bloat. 

You can't buffer these at ingress or you risk HOLB (head-of-line blocking); 
you must buffer them at the 100M egress and drop at ingress if the egress 
buffer is full. 

But I fully agree, it's not buffer bloat. With a switch that does 
support very different traffic rates on ingress and egress (ingress could 
even be LACP, which further mandates larger buffers on egress), where you 
also need to support QoS towards the customer, the amount of buffer quickly 
reaches the level some of these vendors are supporting. 
It becomes buffer bloat when an inexperienced operator allows all of 
the buffer to be used for a single class with matching ingress/egress rates. 

-- 
++ytti

mohta at necom830 

Jan 28, 2012, 4:53 AM 

Post #26 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
 
Saku Ytti wrote: 

>>> And if the switch does support QoS but the operator configures only BE, and 
>>> the operator does not limit the BE queue size, the operator will see buffer bloat, 
>> 
>> 1.5MB @ 10Gbps is only 1.2ms, which is not buffer bloat. 
> 
> You can't buffer these at ingress or you risk HOLB (head-of-line blocking); 
> you must buffer them at the 100M egress and drop at ingress if the egress 
> buffer is full. 

1.5MB @ 100Mbps is 120ms, which is prohibitively lengthy 
even as BE. 

The solution is to have a smaller number of classes. 

For QoS assurance, you only need two classes for 
infinitely many flows with different QoS, if flows in the higher 
priority class are policed against the reserved bandwidth 
of each flow. 

Masataka Ohta 



 

	
 
saku at ytti 

Jan 28, 2012, 5:09 AM 

Post #27 of 27 

Re: 10GE TOR port buffers (was Re: 10G switch recommendation) [In reply to]
On (2012-01-28 21:53 +0900), Masataka Ohta wrote: 

> 1.5MB @ 100Mbps is 120ms, which is prohibitively lengthy 
> even as BE. 
> 
> The solution is to have a smaller number of classes. 

The solution is to define a max queue size per class, so a user with fewer 
queues configured will not use all the available buffer in the remaining 
queues. JNPR MX is happy to buffer >4s on 10GE on QX interfaces. Reading 
some posts in this thread might imply the vendor does not know what it is 
doing, but in this case there is a good reason why there is potentially a 
lot of buffer space, and it's simply an operator mistake not to limit it if 
the application is just a single class on a single vlan/untagged 10G 
interface. 
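
A toy model of that per-class cap (class names and byte caps made up): 
--- 
from collections import deque 

# Each class gets a hard byte cap, so one unlimited BE queue cannot 
# eat the whole shared buffer. 
class Port: 
    def __init__(self, caps):  # caps: class name -> max bytes 
        self.caps = caps 
        self.queues = {c: deque() for c in caps} 
        self.depth = dict.fromkeys(caps, 0) 

    def enqueue(self, cls, pkt_len): 
        if self.depth[cls] + pkt_len > self.caps[cls]: 
            return False  # tail-drop within the class only 
        self.queues[cls].append(pkt_len) 
        self.depth[cls] += pkt_len 
        return True 

port = Port({"BE": 125_000, "EF": 12_500})  # ~1ms / ~0.1ms at 1Gb/s 
--- 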

-- 
++ytti