IEPM

Bandwidth to the World - Formal Measurements


Introduction

The Bandwidth to the World project was designed to demonstrate current data transfer capabilities to several sites worldwide with high performance links. In a sense, the site at SC2001 was acting like a High Energy Physics tier 0 or tier 1 site (an accelerator or major computation site) distributing copies of the raw data to multiple replica sites.

Preparation

To prepare this demonstration we set up a collaboration of about 25 sites in 6 countries. Each of the sites had one or more hosts able to accept TCP data from the iperf application, or file copy data from the bbcp application, sent from a central site initially at SLAC and later at SC2001. The initial measurements, made prior to the formal challenge measurements, were run over production networks with no effort to reduce other traffic. They were made from the central site to each site for 10 seconds per hour, and indicated that we could achieve, in aggregate, over 2 Gbits/s both from SLAC and from SC2001.
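
For illustration, hourly probes of this kind can be driven by a simple script along the following lines. This is a minimal sketch rather than the actual measurement code: the host names are placeholders and the real scripts and iperf options may have differed.

    # Minimal sketch of an hourly 10-second iperf probe to each remote site.
    # Host names are placeholders; the actual measurement scripts may have differed.
    import subprocess

    REMOTE_HOSTS = ["host1.example.net", "host2.example.net"]   # placeholder site list

    def probe(host, seconds=10):
        """Run one 10-second iperf TCP test to a remote site and return its report."""
        cmd = ["iperf", "-c", host, "-t", str(seconds), "-f", "m"]   # report in Mbits/s
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds + 60)
        return result.stdout

    if __name__ == "__main__":
        for host in REMOTE_HOSTS:
            print(probe(host))       # scheduled hourly in practice, e.g. from cron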

Formal Measurement

For the formal demonstration, we had a 30-minute time slot at SC2001 starting at 6:15pm MST on Tuesday 13th November, and sent iperf data to about 17 sites in 5 different countries. Paolo Grosso, Steffen Luitz, Les Cottrell and Connie Logg (from left to right in the photo) launched and monitored the applications and throughputs, and Gary Buhrmaster assisted in setting up the booth router. The sites were chosen from those that were more reliable, had high throughput, or were representative of countries requiring high throughput to the US.

The TCP traffic was generated by 3 Linux PCs in the SLAC/FNAL booth, running iperf with large windows and multiple streams. Two of the PCs (a018 and a020) had two GE NICs each, and one (a017) had a single GE NIC. The link from the booth to SCinet consisted of two bonded GE links (EtherChannel).
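
The need for large windows and multiple streams follows from the bandwidth-delay product of long round-trip paths. The numbers below are illustrative assumptions (an 80 ms round-trip time and an OC-12 class path), not values measured during the demonstration:

    # Back-of-envelope sketch of why large windows and parallel streams are needed.
    # The RTT and bottleneck speed are illustrative assumptions, not measured values.
    rtt_s = 0.080             # assume ~80 ms round-trip time to a remote site
    bottleneck_bps = 622e6    # assume an OC-12 class path

    bdp_bytes = rtt_s * bottleneck_bps / 8
    print("bandwidth-delay product: %.0f KBytes" % (bdp_bytes / 1024))   # ~6000 KBytes

    # A single stream with a 1024 KByte window can carry at most window/RTT:
    window_bytes = 1024 * 1024
    per_stream_bps = window_bytes * 8 / rtt_s
    print("per-stream ceiling: %.0f Mbits/s" % (per_stream_bps / 1e6))   # ~105 Mbits/s

    # Hence several parallel streams (e.g. "iperf -c <host> -w 1024K -P 5 -t 300",
    # matching the large-window runs in the table below) are used to fill the path.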

Results

We first ran two 5 minute tests without QBSS, using different window sizes and stream counts (large: 1024 KByte windows with 5 streams; small: 256 KByte windows with 10 streams), and observed aggregate throughputs of 1.2 to 1.3 Gbits/s. The details of all four runs are shown in the table below.
Window  QBSS  Start        Stop         Aggregate TCP throughput
Large   No    6:16pm MST   6:21pm MST   1250 Mbits/s
Small   No    6:26pm MST   6:30pm MST   1319 Mbits/s
Large   Yes   6:33pm MST   6:38pm MST   1338 Mbits/s
Small   Yes   6:40pm MST   6:45pm MST   1511 Mbits/s
Then we enabled QBone Scavenger Service (QBSS) using iperf's QoS option on the host with a single GE NIC (a017). This host was transmitting data to a single other host (a118) in the ANL booth on the SC2001 show floor, and so was capable of high throughput. Marking its traffic as QBSS gave it lower priority than the other applications. The aggregate throughput when QBSS was enabled was essentially unchanged at 1.33 Gbits/s; however, the QBSS-marked traffic dropped from 200-300 Mbits/s to about 100 Mbits/s, and the other applications were able to utilize the bandwidth more fully (see plot below). This demonstrates that the bulk data transfers critical to physics grid computing architectures can take place effectively, even on saturated links, with reduced impact on other applications.

The graphs below show the throughputs by host without and with QBSS applied to the one host (in red). The throughput to a118 (the red lines) drops dramatically from about 275 Mbits/s to under 100 Mbits/s when QBSS is applied to it, while at the same time the throughputs to the other hosts increase in almost all cases.
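
For reference, a sender's traffic can be marked as QBSS with iperf's TOS option. This is a hedged sketch rather than the exact command used in the demo: it assumes iperf 2's -S/--tos flag and the QBone Scavenger code point DSCP 8 (TOS byte 0x20), and the host name is a placeholder.

    # Sketch of launching a QBSS-marked (scavenger) bulk transfer with iperf.
    # Assumes iperf 2's -S/--tos option; QBSS uses DSCP 8, i.e. TOS byte 8 << 2 = 0x20.
    import subprocess

    def run_qbss_iperf(host, seconds=300, window="1024K", streams=5):
        """Start a low-priority bulk TCP transfer to `host` (placeholder name)."""
        tos = 8 << 2                     # QBone Scavenger Service code point in the TOS byte
        cmd = ["iperf", "-c", host,
               "-w", window,             # large TCP window
               "-P", str(streams),       # parallel streams
               "-t", str(seconds),       # 5-minute run
               "-S", hex(tos)]           # mark packets as scavenger traffic
        return subprocess.run(cmd, capture_output=True, text=True)

    # run_qbss_iperf("a118.example.net")   # placeholder for the receiving host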

While running the tests, every 5 seconds we read out the Cisco Catalyst 6500 port counters to obtain the transmit rate for each SLAC/FNAL booth host. The results for each host are shown below. The purple line is for the a017 host, to which QBSS was applied in the second pair of runs. The drop in performance for a017 is easily seen in the latter two traces to the right.
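
Per-port transmit rates can be derived from successive counter readings. The sketch below assumes the standard IF-MIB::ifHCOutOctets 64-bit counter read with the net-snmp snmpget utility; the switch name, community string and interface index are placeholders, and the actual readout method at SC2001 may have been different.

    # Sketch of polling a switch port's transmit counter every 5 seconds and
    # converting the byte-count deltas into Mbits/s, as described above.
    import subprocess
    import time

    SWITCH = "booth-switch.example.net"   # placeholder for the booth Catalyst 6500
    COMMUNITY = "public"                  # placeholder SNMP community string
    IF_INDEX = 3                          # placeholder ifIndex of one host's GE port

    def out_octets():
        """Read the 64-bit transmit byte counter for one port via snmpget."""
        oid = "IF-MIB::ifHCOutOctets.%d" % IF_INDEX
        value = subprocess.check_output(
            ["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", SWITCH, oid], text=True)
        return int(value.strip())

    if __name__ == "__main__":
        prev = out_octets()
        while True:
            time.sleep(5)                             # 5-second polling interval
            cur = out_octets()
            print("tx rate: %.1f Mbits/s" % ((cur - prev) * 8 / 5 / 1e6))
            prev = cur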

The plot below shows the aggregate throughputs for the tests. We peaked on the fourth test at about 1.6 Gbits/s.

While we ran the tests, the SC2001 Network Operations Center (NOC) monitored their equipment. Snapshots of the throughputs on various links (thanks to Gregory Goddard of the University of Florida) were obtained at 6:38pm, 6:41pm and 6:44pm. The SLAC booth router was connected to the NOC Core-rtr-2; the outgoing traffic can be seen to be between 1.3 and 1.6 Gbits/s, while the inbound traffic (mainly ACKs) was < 200 Mbits/s.

Footnote

We placed 3rd in the Bandwidth Challenge throughput competition, behind LBL and ANL. Both LBL and ANL had at least 10 Gbits/s to their booths, so we could not hope to beat them. Despite this, we felt the demonstration of QBSS at these speeds was an important contribution.