Understanding bufferbloat and the network buffer arms race
This WiFi network was hooked up to a fairly pathetic 128kbps ADSL line. This worked OK as long as I did some light Web browsing, but as soon as I started downloading a file, my browser became completely unworkable: every click took 10 seconds to register. It turned out that the ADSL router had a buffer that accommodated some 80 packets, so 10 seconds’ worth of packets belonging to my download would be occupying the buffer at any given time. Web packets had to join the conga line at the end and were delayed by 10 seconds. Not good.
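That 10-second figure is easy to sanity-check: a full drop-tail buffer delays every new arrival by the buffer’s size in bits divided by the link speed. Assuming 1500-byte packets (only the packet count is known, so that’s a guess), 80 of them on a 128kbps line work out to about 7.5 seconds of raw payload; ADSL framing overhead accounts for the rest.

```python
# Rough queueing-delay estimate for a full drop-tail buffer.
# Assumption: 1500-byte packets; only the "some 80 packets" count is known.

def buffer_delay_seconds(packets: int, packet_bytes: int, link_bps: int) -> float:
    """Time to drain a full buffer at the link's line rate."""
    return packets * packet_bytes * 8 / link_bps

delay = buffer_delay_seconds(packets=80, packet_bytes=1500, link_bps=128_000)
print(f"{delay:.1f} s")  # every new packet waits this long behind the queue
```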
Networks need buffers to function well. Think of a network as a road system where everyone drives at the maximum speed. When the road gets full, there are only two choices: crash into other cars, or get off the road and wait until things get better. The former isn’t as disastrous on a network as it would be in real life: losing packets in the middle of a communication session isn’t a big deal. (Losing them at the beginning or the end of a session can lead to some user-visible delays.) But making a packet wait for a short time is usually better than “dropping” it and having to wait for a retransmission.
For this reason, routers—but also switches and even cable or ADSL modems—have buffers that cause packets that can’t be transmitted immediately to be kept for a short time. Network traffic is inherently bursty, so buffers are necessary to smooth out the flow of traffic—without any buffering, it wouldn’t be possible to use the available bandwidth fully. Network stacks and/or device drivers also use some buffering, so the software can generate multiple packets at once, which are then transmitted one at a time by the network hardware. Incoming packets are also buffered until the CPU has time to look at them.
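To make that smoothing effect concrete, here’s a toy slotted-time simulation (the burst pattern and buffer sizes are made up for illustration): the link sends one packet per time slot, and arrivals that don’t fit in the buffer are dropped, drop-tail style. With room for only the packet in service, a bursty source loses packets even though its average rate is below the link’s capacity; a few packets of buffer space absorb the bursts completely.

```python
from collections import deque

def run_link(arrivals, buffer_size):
    """Slotted simulation: the link sends one packet per slot; arrivals
    that don't fit in the buffer are dropped (drop-tail)."""
    queue = deque()
    sent = dropped = 0
    for burst in arrivals:
        for _ in range(burst):
            if len(queue) < buffer_size:
                queue.append(1)
            else:
                dropped += 1
        if queue:           # transmit one packet this slot
            queue.popleft()
            sent += 1
    while queue:            # drain whatever is left after arrivals stop
        queue.popleft()
        sent += 1
    return sent, dropped

bursty = [3, 0, 0, 2, 0, 0, 3, 0, 0]   # 8 packets, under 1 per slot on average
print(run_link(bursty, buffer_size=1))  # room for one packet only: (3, 5)
print(run_link(bursty, buffer_size=4))  # a little buffering: (8, 0), no loss
```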
The trouble starts when the buffers in the network start to fill up. Suppose there’s a 64-packet buffer on the network card—although it would be hard to fill it entirely—and another 64 packets are buffered by the router. With 1500-byte Ethernet packets, that’s 192K of data being buffered. TCP can’t tell queueing delay from propagation delay, so it treats all that time spent waiting in buffers as a longer round trip, assuming that the big quake happened and LA is now a bit further away than it used to be, and simply increases its buffer by 192K.
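The arithmetic behind that is the bandwidth-delay product: TCP sizes its window to the link speed times the round-trip time. A quick sketch with illustrative numbers (the 1Mbps bottleneck and 50ms base RTT are my assumptions, not figures from the text) shows that the extra window TCP grows once 128 packets sit in buffers is exactly the 192K occupying those buffers:

```python
# Bandwidth-delay product: the window TCP needs to keep a link busy.
# The 1 Mbps link and 50 ms base RTT are illustrative assumptions.

def window_bytes(link_bps: int, rtt_s: float) -> int:
    """Window needed to fill the pipe: bandwidth x round-trip time."""
    return round(link_bps * rtt_s / 8)

link = 1_000_000                       # 1 Mbps bottleneck
base_rtt = 0.050                       # 50 ms without any queueing
queued = 128 * 1500 * 8 / link         # delay added by 128 buffered 1500-byte packets

print(window_bytes(link, base_rtt))            # window for the real path
print(window_bytes(link, base_rtt + queued))   # window once the buffers fill
print(window_bytes(link, base_rtt + queued)
      - window_bytes(link, base_rtt))          # the difference: 192000 bytes
```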
The waiting is the hardest part
Of course, with all the router buffers filled up with packets from a single session, there’s no longer much room to accommodate the bursts that the router buffers were designed to smooth out, so more packets get lost. To add insult to injury, all this waiting in buffers can take a noticeable amount of time, especially on relatively low bandwidth networks.
Cringely gets many of the details wrong. To name a few: he posits that modems and routers pre-fetch and buffer data in case it’s needed later. Those simple devices—including the big routers in the core of the Internet—simply aren’t smart enough to do any of that. They just buffer data that flows through them for a fraction of a second to reduce the burstiness of network traffic and then immediately forget about it. Having more devices, each with their own buffers, doesn’t make the problem worse: there will be one network link that’s the bottleneck and fills up, and packets will be buffered there. The other links will run below capacity so the packets drain from those buffers faster than they arrive.
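The bottleneck argument is easy to see in a toy model: chain a fast hop in front of a slow one and feed it traffic faster than the slow hop can drain. The first queue empties as fast as it fills, while all the backlog piles up at the bottleneck (the rates here are made-up illustration, not measurements):

```python
# Two hops in tandem: a fast first hop (2 packets/tick) in front of a
# slow second hop (1 packet/tick), fed 2 packets per tick.
# Toy sketch of the claim that buffering concentrates at the bottleneck.

def tandem(ticks: int, arrival: int = 2, fast: int = 2, slow: int = 1):
    q1 = q2 = 0
    for _ in range(ticks):
        q1 += arrival
        moved = min(q1, fast)   # fast hop forwards up to `fast` packets
        q1 -= moved
        q2 += moved
        q2 -= min(q2, slow)     # slow hop drains at its lower rate
    return q1, q2

print(tandem(10))  # (0, 10): the fast hop's queue stays empty,
                   # the bottleneck holds the entire backlog
```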
He mentions that TCP congestion control—not flow control, that’s something else—requires dropped packets to function, but that’s not entirely true. TCP’s transmission speed can be limited by the send and/or receive buffers and the round-trip time, or it can slow down because packets get lost. Both excessive buffering and excessive packet loss are unpleasant, so it’s good to find some middle ground.
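That loss-free speed limit can be written down directly: throughput is capped by the smaller of the windows divided by the round-trip time. As an illustration, a classic 64KB receive window over a roughly 75ms round trip (my guess at an Amsterdam-to-Madrid path, not a measured value) caps out right around 7Mbps:

```python
# Without packet loss, TCP's rate is capped by its windows and the RTT:
#   throughput <= min(congestion window, receive window) / RTT
# The 75 ms RTT is an assumed, illustrative value.

def max_throughput_bps(cwnd_bytes: int, rwnd_bytes: int, rtt_s: float) -> float:
    """Loss-free ceiling on TCP throughput in bits per second."""
    return min(cwnd_bytes, rwnd_bytes) * 8 / rtt_s

# Large congestion window, classic 64 KB receive window, ~75 ms path:
print(max_throughput_bps(1_000_000, 65_535, 0.075) / 1e6)  # just under 7 Mbps
```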
Unfortunately, it looks like the router vendors and the network stack makers got into something of an arms race, pushing up buffer space at both ends. Or maybe, as Gettys suggests, it’s just that memory is so cheap these days. The network stacks need large buffers for sessions to high-bandwidth, far-away destinations. (I really like being able to transfer files from Amsterdam to Madrid at 7Mbps!) So it’s mostly up to the (home) router vendors to show restraint and limit the amount of buffering they put in their products. Ideally, they should also use a good active queue management (AQM) mechanism that avoids most of these problems in the first place.
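One well-known active queue management scheme is Random Early Detection (RED): rather than waiting for the buffer to overflow, the router drops a small, growing fraction of packets as the average queue builds, so TCP backs off before the queue gets long. A minimal sketch of the drop-probability curve (the thresholds are arbitrary example values, not recommendations):

```python
# RED-style drop probability: zero below min_th, rising linearly to
# max_p at max_th, certain drop above max_th. Thresholds are examples.

def red_drop_probability(avg_queue: float, min_th: float, max_th: float,
                         max_p: float = 0.1) -> float:
    """Probability of dropping an arriving packet given the average queue."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

for q in (5, 20, 35, 60):
    print(q, red_drop_probability(q, min_th=10, max_th=50))
```

The point of the linear ramp is that a lightly loaded queue is left alone, while a queue that keeps growing triggers progressively earlier congestion signals instead of a burst of tail drops once the buffer is full.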