Introduction
The Bandwidth to the World project was designed to demonstrate current
data transfer capabilities to several sites with high performance links,
worldwide. In a sense the site at SC2001 was acting like a High Energy
Physics tier 0 or tier 1 site (an accelerator or major computation site)
in distributing copies of the raw data to multiple replica sites.
Preparation
To prepare this demonstration we set up a collaboration of about 25 sites
in 6 countries. Each of the sites had one or more hosts that were able to
accept TCP data from the iperf application, or file copy data from the bbcp
application, sent from a central site, initially at SLAC and later at SC2001.
The initial measurements, made prior to the formal challenge measurements,
ran over production networks with no effort made to reduce other traffic.
They were made from the central site to each site for 10 seconds per hour,
and indicated that we could achieve over 2 Gbits/s in aggregate both from
SLAC and from SC2001.
Formal Measurement
For the formal demonstration, we had a 30 minute time slot at SC2001
starting at 6:15pm MST on Tuesday 13th November, and sent iperf data to
about 17 sites in 5 different countries.
Paolo Grosso, Steffen Luitz, Les Cottrell and Connie Logg
(from left to right in the photo)
launched and monitored the applications and throughputs, and Gary
Buhrmaster assisted in setting up the booth router.
The sites were chosen from those that were more reliable,
had high throughput or were representative of countries requiring high throughput to the US.
The TCP traffic was generated by 3 Linux PCs in the SLAC/FNAL booth,
running iperf with large windows and multiple streams. Two of the PCs
(a018 and a020) had two GE NICs each, and one (a017) had a single GE NIC.
The link from the booth to SCinet was two bonded GE links (Etherchannel).
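As a rough illustration of the kind of invocation described above, the sketch below builds an iperf client command line with a large TCP window and several parallel streams. The helper name `iperf_command`, the destination `remote.example.org` and the default values are assumptions made for illustration; they are not the commands actually run in the booth. The flags `-c`, `-w`, `-P` and `-t` are standard iperf (version 2) client options.

```python
def iperf_command(host, window="1024K", streams=5, seconds=300):
    """Build an iperf (v2) client command as an argument list.

    All defaults are illustrative assumptions, not the booth settings.
    """
    return [
        "iperf",
        "-c", host,            # run as a client sending to this host
        "-w", window,          # requested TCP window (socket buffer) size
        "-P", str(streams),    # number of parallel TCP streams
        "-t", str(seconds),    # duration of the test in seconds
    ]


if __name__ == "__main__":
    # Hypothetical destination; print the command rather than running it.
    print(" ".join(iperf_command("remote.example.org")))
```

Returning an argument list rather than a single string means the result could be passed directly to `subprocess.run` without shell quoting concerns.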
Results
We first ran two 5 minute tests with different window sizes and numbers
of streams: large windows (1024 KBytes) with 5 streams, and small windows
(256 KBytes) with 10 streams. We observed aggregate throughputs of
1.2 to 1.3 Gbits/s. The details are shown in the table below.
| Windows | QBSS | Start | Stop | Aggregate TCP throughput |
|---|---|---|---|---|
| Large | No | 6:16pm MST | 6:21pm MST | 1250 Mbits/s |
| Small | No | 6:26pm MST | 6:30pm MST | 1319 Mbits/s |
| Large | Yes | 6:33pm MST | 6:38pm MST | 1338 Mbits/s |
| Small | Yes | 6:40pm MST | 6:45pm MST | 1511 Mbits/s |
Then we enabled QBone Scavenger Service (QBSS) using iperf's QOS
option on the host with a single GE NIC (a017). This host was
transmitting data to a single other host (a118) in the ANL booth on the
SC2001 show floor, and so was capable of high throughput. Enabling QBSS
gave that application lower priority than the other applications. The
aggregate throughput when QBSS was in use was essentially the same at
1.33 Gbits/s; however, the throughput of the QBSS flow dropped from
200-300 Mbits/s to about 100 Mbits/s and the other applications were
able to utilize the bandwidth more fully (see plot below).
This demonstrates that the bulk data transfers that are critical to
physics grid computing architectures can take place effectively,
even in the presence of saturated links, with reduced impact on
other applications.
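To make the marking step more concrete, the sketch below shows how a sender could tag its own traffic as scavenger class. It assumes the QBSS code point is DSCP 001000 (binary), as in the Internet2 QBSS definition, and that the marking is applied through the standard `IP_TOS` socket option; the function names are hypothetical, and the exact option value used on a017 is not recorded here.

```python
import socket

QBSS_DSCP = 0b001000  # assumed QBSS code point (Internet2 definition)


def dscp_to_tos(dscp):
    """Return the IP TOS byte that carries the given 6-bit DSCP value."""
    if not 0 <= dscp <= 0b111111:
        raise ValueError("DSCP must fit in 6 bits")
    # The DSCP occupies the upper 6 bits of the TOS byte.
    return dscp << 2


def mark_socket(sock, dscp=QBSS_DSCP):
    """Ask the kernel to mark packets sent on this IPv4 socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


if __name__ == "__main__":
    print(hex(dscp_to_tos(QBSS_DSCP)))  # TOS byte for the assumed code point
```

Marking alone only labels the packets; the reduced priority comes from routers along the path that are configured to honor the code point.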
The graphs below show the throughputs by host
without and with QBSS applied to the one host (in red). It can be seen that
the throughput to a118 (the red lines) drops dramatically from
about 275 Mbits/s to under 100 Mbits/s when
QBSS is applied to it. At the same time the throughputs to the other hosts
increase in almost all cases.
While running the tests we read out the Cisco Catalyst 6500's port
counters every 5 seconds to obtain the transmit rate for each SLAC/FNAL
booth host. The results for each host are shown below. The purple line
is for the a017 host, to which QBSS was applied in the 2nd pair of runs.
The drop in performance for a017 is easily seen in the last
2 traces to the right.
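The conversion from periodic counter readings to a transmit rate can be sketched as follows. This is a minimal example assuming the counters report transmitted octets and are 32 bits wide; the function name, the counter width and the sample values are illustrative assumptions, not the actual readout script or data.

```python
def transmit_rates_mbps(samples, interval_s=5.0, counter_bits=32):
    """Convert successive octet-counter readings into rates in Mbits/s.

    samples      -- counter values read every interval_s seconds
    counter_bits -- assumed counter width; the modulo below corrects for
                    at most one wrap-around between consecutive readings
    """
    modulus = 1 << counter_bits
    rates = []
    for previous, current in zip(samples, samples[1:]):
        delta_octets = (current - previous) % modulus
        rates.append(delta_octets * 8 / interval_s / 1e6)
    return rates


if __name__ == "__main__":
    # Made-up readings: 62,500,000 octets per 5 s interval.
    print(transmit_rates_mbps([0, 62_500_000, 125_000_000]))
```

At about 1 Gbit/s a 32-bit octet counter wraps roughly every 34 seconds, so with a 5 second polling interval a single-wrap correction of this kind would be sufficient.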
The plot below shows the aggregate throughputs for the tests. It can be seen that we peaked during the
fourth test at about 1.6 Gbits/s.
While we ran the tests the SC2001 Network Operations Center (NOC) monitored
their equipment. Snapshots of the throughputs on various links
(thanks to Gregory Goddard of the University of Florida) were obtained for
6:38pm, 6:41pm and 6:44pm. The SLAC booth router was connected
to the NOC Core-rtr-2, and the outgoing traffic can be seen to be
between 1.3 and 1.6 Gbits/s, while the inbound
traffic (mainly ACKs) was < 200 Mbits/s.
Footnote
We were 3rd in the bandwidth challenge throughput competition, following LBL and ANL.
Both LBL and ANL had at least 10 Gbits/s to their booths so we could not hope to
beat them. Despite this we felt the demonstration of QBSS at these speeds was an
important contribution.