Bufferbloat on Packet Pushers

Back in October, I had the pleasure of chatting with Ethan Banks on the Packet Pushers podcast.

In it, we talked about the definition of bufferbloat, and how it harms the performance of VoIP, gaming, and general internet use. (It’s the reason our children say, “The internet is slow today, Daddy.”) I also described how the CoDel algorithm, and its fq_codel implementation in Linux makes the problem go away.

The most fun was the CoDel demonstration, which occurs near the end of the podcast. The podcast was recorded over Skype. When I turned on many netperf upload and download sessions to fully load my 7mbps/768kbps DSL link, the quality degraded without fq_codel. But it bounced right back when I re-enabled it.

Listen to the podcast – PQ Show 57 – Improve Your Home Internet Performance Using CoDel

WiTi Router Board update

I have the Witi Router working now with SQM-QoS – the bufferbloat-fighting technology that has been implemented in OpenWrt. Read more about Bufferbloat at Bufferbloat and the Ski Shop.

Demetris has made a build of OpenWrt Chaos Calmer (15.05) that has SQM-QoS built-in, so it’s a simple download and flash of the firmware. You can get a copy of the firmware at the mqmaker forum.

I haven’t tried it myself, but it looks as if stas2z’s BB version of the firmware also has the SQM-QoS as well at Alternate openwrt at mqmaker.

mqmaker.com WiTi Router – a new toy

I recently got my WiTi board from mqmaker.com – a high performance open-source router platform. It was funded from a IndieGoGo project and was based on OpenWrt, and thus is easy to customize.

The project doesn’t have a lot of documentation yet, so I just updated its description in the OpenWrt wiki to describe the initial out-of-box setup (pretty smooth – no surprises). Read it at: https://wiki.openwrt.org/toh/mqmaker/witi

Comment to the FCC re: Consumer Router Firmware

Update: The deadline for filing has been extended to 9 October 2015, so you now don’t have any excuse for not sending in your opinion.

I filed the following comment on the FCC page regarding the new Proposed Rule that would (ultimately) prohibit individuals from installing third-party firmware (such as OpenWrt, DD-WRT, Tomato, and the myriad others) on their home router.

If you have an opinion on this, you should go to the FCC Comment Page and click the green SUBMIT A FORMAL COMMENT button.

You’re free to plagiarize my note, or take a cue from the Save WiFi page at LibrePlanet.org. I wrote:

I recommend that the FCC RESCIND its Proposed Rule, Document number 2015-18402 regarding wireless devices. The Proposed Rule is overbroad, would harm many communities of Americans, and is not warranted by the facts on the ground.

Although the FCC has the power to regulate equipment that generates radio frequencies, this is a heavy-handed rule that could be addressed other ways. Specifically, I am concerned about the ability of third parties to modify and create new firmware for “consumer routers.”

The proposed rule would require that router manufacturers lock down the RF portion of the router to obtain FCC approval. This “lock down” would prevent modification to the radio’s power, frequencies, etc to prevent it from radiating outside the specified limits. This is a laudable goal, but the application of this rule as written would result in undesirable consequences.

In practice, most radio functions are very tightly wedded to all the other factors of the hardware/software. The most likely way manufacturers would likely lock down the RF operation would be to make it impossible to modify any of the code in the routers.

There would be a number of adverse consequences both for me personally, to consumers in the US, and the networking industry. These consequences can be ameliorated by allowing the owners of routers to install their own code.

1) Security of the router. It is well known that vendor-supplied firmware for consumer routers often contain flaws. Just last week, the CERT released knowledge of a vulnerability to Belkin routers. See http://www.kb.cert.org/vuls/id/201168 The ability to install well-tested, secure firmware into a router benefits all consumers. The ability for a person to update their own router on a regular basis (as opposed to many manufacturer’s seemingly lackadaisical schedule) preserves security.

2) Research into the field of computer networking. Non-traditional research efforts (outside academia) lead to real improvements in the state of computer networking. An example is the CeroWrt project that developed the fq_codel algorithm. http://www.bufferbloat.net/projects/cerowrt The result of this multi-year effort was a major advance in performance for all routers. The fq_codel code has been accepted into the Linux kernel and now runs in hundreds of millions of devices. As a member of the team that worked on this, I assert that without the ease of modification of a consumer router to prove out the ideas, this improvement would likely not have occurred.

3) Personal learning environments. Individuals, as well as network professionals, often use these consumer routers as test beds for increased understanding of network operation. Losing the ability to reprogram the router would make it more expensive, if not prohibitive, for Americans to improve their knowledge and become more competitive.

4) I would incorporate all the other “talking points” listed on the Save WiFI page at: https://libreplanet.org/wiki/Save_WiFi

5) Finally, I want to address the FCC’s original concern – that these consumer routers are SDRs, and they must not be operated outside their original design parameters. While the goal of reducing radio frequency interference is important, the FCC has failed to demonstrate that the widespread practice of installing/updating firmware in consumer routers has caused actual problems. Furthermore, the FCC can use its current enforcement powers to monitor and shut down equipment that is interfering.

Creating a broad, wide-ranging rule to address a theoretical problem harms industry and individuals, and is an overreach of the rules necessary to preserve America’s airwaves.

Bufferbloat and the Ski Shop

Bufferbloat is undesirable latency caused by a router buffering too much data. It makes your kids say, “The Internet is slow today, Daddy”. It’s caused by routers and other network equipment buffering (accepting for delivery) more data than can be delivered in a timely way. Bufferbloat causes much of the poor performance and human pain experienced using today’s Internet.

Update: I found a way better analogy for explaining Bufferbloat. I write about it in my post Best Bufferbloat Analogy – Ever

First a story…

Imagine a ski shop with one employee. That employee handles everything: small purchases, renting skis, installing new bindings, making repairs, etc. He also handles customers in first-come, first-served order, and accepts all the jobs, even if there’s already a big backlog. Imagine, too, that he never stops working with a customer until their purchase is complete. He never goes out of order, never pauses a job in the middle, not even to sell a Chapstick.

That’s dumb, you say. No store would do that. Their customers – if they had any left – would get really terrible service, and would never know when they’re going to be served. And you would be right.

The Ski-Shop Router

Unfortunately, a lot of network routers (both home and commercial) work just like that fictitious ski shop. And they give terrible service. These routers have a single first-in, first-out queue for packets. All traffic is queued, regardless of whether it’s a big or small packet, whether there has been a lot of traffic from a particular source, or whether things are getting backed up.

Since the router has no global knowledge of what’s happening, it cannot inform a big sender to slow down, or throttle a particular stream of traffic by discarding some data (in the ski shop analogy, sending away a customer who has another long repair job). The dumb router simply accepts the data, buffers it up, and expects to send it sooner or later. To make matters worse, in networking (but not in ski shops), if delays get long enough, computers can resend the data, thinking that the original packet must have been lost. These retransmissions further increase bloat and delay, because there are now two copies of the same data buffered up, waiting to be sent…

This is the genesis of the name “bufferbloat” – the router’s memory gets bloated with buffers of packets. When the router doggedly determines to send that data, it blocks newer sessions from even starting, and the entire network gets slow.

What’s the solution?

The members of the CeroWrt team have been working for the last three years to solve the problem of bufferbloat. We’ve largely succeeded: the CeroWrt firmware works really well. CeroWrt users no longer see problems with “the internet being slow” even when uploading and downloading files, watching videos, etc. We have pushed those changes into the Linux Kernel, and also into OpenWrt mainline, and now its Barrier Breaker release contains the Smart Queue Management (SQM) fixes.

SQM introduces a new queueing discipline called fq_codel (see http://tools.ietf.org/html/draft-nichols-tsvwg-codel-02 and http://tools.ietf.org/html/draft-hoeiland-joergensen-aqm-fq-codel-00 for details) that can detect which flows (streams of data between two endpoints) are using more than their share of the bottleneck link (usually, the connection to the ISP).

SQM divides the traffic into multiple queues, one per flow, and sends packets from each queue in round-robin order. (The algorithm is somewhat more involved, so read the full description for details.) fq_codel also measures the time that each packet has been queued (its sojourn time). If packets for a flow have been in the queue for “too long”, then fq_codel either marks them for ECN (Explicit Congestion Notification), or discards a certain percentage of them, preventing the flow from using more than its fair share of bandwidth on the bottleneck.

Wait a minute – discarding packets? Doesn’t that make things worse?

It does slow the rate for the affected flow, but that is exactly what should happen. If a sender has sent so many packets that they’re building up in the router’s memory, then the router must offer back pressure for that flow by dropping/marking some of its packets.

In the meantime, all the other flows (in their own queues) have their packets sent promptly, since they’re not building up and their sojourn time stays low. This automatically keeps everything responsive: short packets, and those from low-volume flows automatically get sent first. The big senders, whose packets are dropped/marked, will re-send the data, but at a slower rate, bringing the entire system back into balance.

What about Quality of Service (QoS)? Doesn’t that help?

Yes, it helps a bit. If you configure your router for QoS, the router will use that information to prioritize certain packets. Good QoS settings can help certain traffic types by sending it first, ahead of the bulk traffic that’s buffered up. But there are several problems with QoS:

  • It doesn’t solve the problem of overbuffering. The QoS rules allow certain packets to go to the head of the queue. But the buffers from large flows are still there, and will have to be sent at some point. And those buffers will stand in the way of any newly arrived traffic that hasn’t been prioritized.
  • As a corollary, there’s no throttling of the big senders: they don’t get early feedback that they are using more than their fair share of the capacity. If the queue/delay gets large enough, the sender could even retransmit the data, making the overall situation worse.
  • It’s annoying to configure QoS. You have to understand how to configure the router and manually make the changes. This is something that only a network geek could come to enjoy. It’s also a maintenance hassle: if you hear about a new application, it may not work well until you adjust the QoS rules to take it into account.
  • Finally, QoS doesn’t help for the download direction. It can improve traffic being sent from your local (home) network toward the Internet. But if the equipment at the far end at your DSL or cable provider is bloated (and very often it is), then QoS in your router won’t make things any better.

The fq_codel and other algorithms in CeroWrt handle all this automatically. The only configuration parameters are what kind of link you have (DSL, Cable, etc.) and the speeds of those links. You don’t have to adjust QoS settings or make other adjustments.

Is my network affected by Bufferbloat?

Quite possibly – here’s one symptom: If the network works well when no one else is using it (early morning, or late at night after everyone else is asleep), but gets slow when others are on the net, then you are suffering from Bufferbloat. Another symptom is if your voice, video chat, or gaming degrade when others are using the network.

Here’s a more scientific test. The DSLReports Speed Test http://dslreports.com/speedtest runs a latency measurement during the download and upload speed tests. (Most speed test sites only measure a few pings when the line is idle.) If the latency gets high on this test, then your router is probably bloated. You can also check out the Quick Test for Bufferbloat on the CeroWrt site for additional information.

What can I do about this?

Three years of network research have paid off: the networks work great at our houses. Our algorithms have been adopted and implemented in the Linux kernel, other operating systems, and an increasing set of commercial network equipment. Our changes have been pushed in to the OpenWRT project. We are making the code available at no charge, and are encouraging all vendors to embrace it.

Regrettably, nearly every piece of equipment with a network connection – home router, ISP headend and DSLAM, phone, personal computer, laptop, tablet, even big commercial routers and switches – needs to have some form of SQM installed. We now have the technology, and it’s simple, but it needs to be deployed.

TL;DR – What you can do about Bufferbloat at your home

  • Consider installing the stable OpenWrt Barrier Breaker firmware on your router. The luci-app-sqm and sqm-scripts packages include the enhancements that we’ve tested and then pushed into the OpenWrt mainline source code. Use the Supported Devices page to find your router, then read the SQM/fq_codel HOWTO. If you don’t want to do all that…
  • Call your router vendor’s support line. With the information from the DSLReports Speed Test in hand, you can mention that the ping times get really high when up/downloading files, and that it really hurts your network performance. Ask if they’re working on the problem, and when they’re going to release a firmware update that solves it. Leave a comment here with their response – I’d love to hear.

An earlier draft of this note appeared in the Bloat and Codel mailing lists. See https://lists.bufferbloat.net/pipermail/codel/2014-February/000802.html Latest update 3May2015

CeroWrt 3.10.50-1 Field Report

Comments on the CeroWrt 3.10.50-1 build that I installed on my WNDR3700v2 primary router:

  • Seemed to install and configure properly (I retrieved the updated version with the lighthttpd fix)

  • I used a local copy of config-cerowrt.sh (from https://github.com/richb-hanover/CeroWrtScripts#config-cerowrtsh) to configure the router to have my own DSL user/pw, SSID names, etc.

  • mDNS seems to work across subnets. (I don’t think that it worked properly for the last few CeroWrt releases)

  • The first time I looked, /usr/lib/CeroWrtScripts didn’t seem to have been updated with the latest from github.com (see link above). But I looked again (after a reboot, and running the config-cerowrt.sh script, and another reboot (maybe something else)), and the script matches the github version… Not sure how that could have happened.

  • Debloating seems to be working as expected. netperf-wrapper with RRUL shows a jump from 60 to 80 msec (normal on my 7mbps/768kbps DSL link). Here’s the plot:

speedof.me

Doc Searls mentioned in passing that he uses a new speed test website. I checked it out, and it was very cool…

www.speedof.me is an all-HTML5 website that seems to make accurate measurements of the up and download speeds of your internet connection. It’s also very attractive, and the real-time plots of the speed show interesting info. (See below.)

Now if we could get them to include a latency measurement so people could point out their bufferbloated equipment.

New CeroWrt Router scripts

I posted a set of scripts that people can use to test, configure and debug their CeroWrt router installations. CeroWrt router firmware is a test bed for learning about and eliminating bufferbloat.

The scripts are available on Github at https://github.com/richb-hanover/CeroWrtScripts. They include:

  • betterspeedtest.sh – a script that emulates the famous (but limited) speedtest.net. This script is better because it measures ping latency during the download and upload, and tells you how bad (or not) the bloat is in your network/router.

  • netperfrunner.sh – a script that emulates the RRUL test suite. It runs on the CeroWrt router (or your laptop) to measure the latency under heavy load.

  • config-cerowrt.sh and tunnelbroker.sh – two scripts for configuring a CeroWrt router to a consistent state after flashing new firmware.

  • cerostats.sh – a script that collects a number of configuration and operational stats that is useful for debugging.