SC2000 bulk thruput measurements to SLAC
Introduction
SLAC participated with FNAL in the SC2000 show in Dallas to illustrate the
needs and challenges of data intensive science and in particular for the
Particle Physics Data Grid (PPDG). At the show our booth had connectivity
via SCInet to Internet 2 and to NTON.
We measured the thruput on both links during the
show. This was done in the spirit of the
SC2000 Network Challenge, though we did not officially
enter the challenge since we did not expect to have NTON
connectivity to SLAC in time. The NTON
link to SLAC went live late Monday
afternoon November 6, 2000. The NTON link to SCInet was an OC48
(2.4Gbits/s) packet
over SONET link.
On the show floor in our booth there were 2 Dell Intel PCs, one a
dual 533MHz Pentium III
PowerEdge with a 64 bit PCI bus, the other a
single Intel processor running at 833MHz with a 32 bit PCI bus.
Both were running the Linux kernel (2.4-test10). Both had
3Com Gigabit Ethernet (GigE) interfaces to a Cisco
Catalyst 6009. The Catalyst 6009 had 2 GigE interfaces to SCInet
and a SUP1A with an MSFC and was graciously loaned to us by
Cisco for the duration of the show. The interfaces
were bonded together using GigE channel and connected
to an Extreme Networks switch at the SCInet NOC.
At the SLAC end the NTON OC48-POS link came into a Cisco
120012 GSR router. From the GSR there was a GigE interface
to pharlap.slac.stanford.edu
(a Sun E4500 with 6 processors running at 336MHz) via
a Catalyst 5500
and a 2nd GigE interface also via a Catalyst 5500
to datamove5.slac.stanford.edu
(a Sun E4500 with 4 cpus running at 400MHz).
The measurements were made by Davide Salomoni and Steffen Luitz, with
help from Les Cottrell, all of SLAC.
NTON tests
The early tests ran into problems, including packet losses on the show floor.
On the morning of Thursday 11/9/00 (the last day of the show), our tests
showed a peak transfer rate from the booth in
Dallas to SLAC via NTON of around 990 Mbit/s, achieved using two PCs on the floor
and two Suns (pharlap and datamove5) at SLAC. A screenshot of the
application
(iperf)
shows the peak rate as measured in MBytes/s by the MIB
variables on the Catalyst interface ports. They were not
read on the MSFC router (because the
MSFC was doing hardware switching, and therefore the counters on the MSFC
did not reflect the actual load). The best
results were achieved with a 128KB window size and 25 parallel streams.
Bigger window sizes caused noticeable performance degradation.
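The need for many parallel streams follows from the usual bound on a single TCP stream of one window per round trip. The sketch below works through that bound for the 128KB window and 25 streams quoted above. The 50 ms round trip time is an illustrative assumption only, since the RTT over the NTON path is not given here, and the text does not say whether the 25 streams were per host pair or in total.

```python
def window_limited_mbps(window_bytes: int, rtt_s: float, streams: int = 1) -> float:
    """Upper bound on TCP thruput in Mbit/s: streams * window / RTT."""
    return streams * window_bytes * 8 / rtt_s / 1e6


WINDOW_BYTES = 128 * 1024  # 128KB window size quoted in the text
STREAMS = 25               # parallel streams quoted in the text
RTT_S = 0.050              # ASSUMPTION: illustrative 50 ms RTT, not a measured NTON value

single_stream = window_limited_mbps(WINDOW_BYTES, RTT_S)
all_streams = window_limited_mbps(WINDOW_BYTES, RTT_S, STREAMS)

print(f"one stream bound:  {single_stream:.1f} Mbit/s")
print(f"{STREAMS} streams bound: {all_streams:.1f} Mbit/s")
```

Under these assumptions a single stream is limited to roughly 21 Mbit/s, so only a large number of streams running in parallel can approach the capacity of a GigE interface.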
In the last timeslot we also ran another interesting experiment: pushing data
via UDP as fast as possible to SLAC to see what arrived at the GSR. With
our two PCs we achieved about 1.25Gbit/s (1Gbit/s from the Dell PowerEdge
and 250Mbit/s from the other PC; we don't understand why the rate was
only 250Mbit/s). Unfortunately we could not take advantage of the
GigaEtherChannel to SCInet because the two PCs happened to be mapped to the same
interface. Out of the 1Gbit/s of raw bandwidth we sent we received ca.
975Mbit/s (5 minute average) at the GSR at SLAC. Still an interesting result.
The 990 Mbit/s figure was a peak lasting a few seconds (screenshot included),
measured with 2 second sampling (as opposed to the 5 second sampling used in
the bandwidth challenge).
This thruput exceeds the goal of 100MBytes/sec that was set in the PPDG
early in 2000.
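The figures in this section mix Mbit/s and MBytes/s, so a short unit check may help. The sketch below assumes decimal units (1 MByte = 10^6 bytes, 8 bits per byte) and uses only the numbers quoted above. The UDP ratio compares a 5 minute average with the nominal sent rate, so it is only a rough indication.

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert Mbit/s to MBytes/s, assuming decimal units and 8 bits per byte."""
    return mbit_per_s / 8


PEAK_MBIT_S = 990        # peak TCP rate via NTON
PPDG_GOAL_MBYTE_S = 100  # PPDG goal set early in 2000

peak_mbyte_s = mbit_to_mbyte_per_s(PEAK_MBIT_S)
exceeds_goal = peak_mbyte_s > PPDG_GOAL_MBYTE_S

UDP_SENT_MBIT_S = 1000     # raw rate sent through the single GigE interface
UDP_RECEIVED_MBIT_S = 975  # 5 minute average seen at the GSR
udp_delivered_fraction = UDP_RECEIVED_MBIT_S / UDP_SENT_MBIT_S

print(f"peak rate: {peak_mbyte_s:.2f} MBytes/s, exceeds goal: {exceeds_goal}")
print(f"UDP delivered: {udp_delivered_fraction:.1%}")
```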
Internet 2 tests
In another test, we also tried to pump traffic from the booth to SLAC using
the normal Internet2/Stanford link, and we saw (shown by rtr-msfc-dmz on a 5
minute average) slightly more than 300 Mbit/s sustained coming into SLAC
(plus around 40 Mbit/s going out). The CPU utilization on the MSFC was
around 81%.
The path characterization shows the route and some pchar estimates of the
various links along the route.
Ping measurements (56 byte pings separated by 1 second intervals)
from SLAC to the booth, starting at 13:56 PST
on Thursday 9th November
just before the show closed and we lost connectivity,
showed a min/avg/max RTT of
48.6/53.1/172 msec. and no loss in 370 pings.
The ping distribution is shown below.
SC2000 floor tests
We also made tests with iperf running on the same PowerEdge
in the SLAC booth and sending TCP data to a
2 cpu 733MHz Dell host running Red Hat Linux (2.2.12) at the Caltech SC2000
booth. With this we achieved only 300Mbits/s. With the PowerEdge sending iperf
TCP data to both the Caltech machine and the 2nd Dell in the SLAC booth we achieved
800Mbits/s. At this rate the PowerEdge was saturated.
Conclusions
The ready availability of such high speed connectivity on wide area
links in the future will change how we approach things and open up new applications
requiring the transfer of large amounts of data. Such applications include
data intensive science and multi-media. Some examples are given below.
- At the show we had a QuickTime
movie
explaining black holes and how the
Gamma Ray Large Area
Space Telescope (GLAST) experiment will shed light on them. This movie is
about 10 minutes or 132MBytes long and takes over 2 hours and 10 minutes to
upload on my DSL link (upload speed about 144kbit/s) to
home. With 990Mbits/s it would take about 1.1 seconds.
- The BaBar experiment at SLAC is currently transferring about 200 GBytes/day
from SLAC to IN2P3 in Lyon, France at a
sustained rate of 20Mbit/s. With a 990Mbit/s link they could
transfer about 10TBytes in a day, or alternatively
transfer the 200GBytes in under 30 minutes.
- BaBar is currently accumulating about 200Tbytes of data a year. This
could be transferred to another site in about 20 days at 990Mbit/s.
Such transfers would be used to provide off-site backup or for
replicating file systems or databases.
- The scarcest and most valuable commodity is time. Studies in the late
1970s and early 1980s by Walt Doherty of IBM and others showed the
economic value of Rapid Response Time:
- 0-0.4s High productivity interactive response
- 0.4-2s Fully interactive regime
- 2-12s Sporadically interactive regime
In 0.4 seconds at 990Mbit/s
one can transfer about 50MBytes, in 2 seconds 250Mbytes
(or 12K BaBar physics events, enough to achieve 1% statistics)
and in 12 seconds 1.5Gbytes (or over 2 CD-ROMs).
- Typical voice over IP phone calls require about 20kbit/s with compression.
With 990Mbit/s one can support about 50,000 simultaneous calls,
sufficient to support offsite calls for about 500 sites of SLAC's
size (2000 people on site) and usage patterns.
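The arithmetic behind the examples above can be reproduced with a few lines of code. This is a minimal sketch, assuming decimal units (1 MByte = 10^6 bytes), a link that sustains the full 990Mbit/s, and no protocol overhead. The helper names are illustrative.

```python
LINK_BPS = 990e6  # 990 Mbit/s, the peak rate measured over NTON


def transfer_seconds(size_bytes: float, rate_bps: float = LINK_BPS) -> float:
    """Time in seconds to move size_bytes at rate_bps, ignoring overhead."""
    return size_bytes * 8 / rate_bps


def bytes_in(seconds: float, rate_bps: float = LINK_BPS) -> float:
    """Bytes moved in the given time at rate_bps, ignoring overhead."""
    return rate_bps * seconds / 8


movie_s = transfer_seconds(132e6)                 # 132 MByte QuickTime movie
babar_daily_min = transfer_seconds(200e9) / 60    # 200 GBytes of daily BaBar data
babar_year_days = transfer_seconds(200e12) / 86400  # 200 TBytes of yearly data
tbytes_per_day = bytes_in(86400) / 1e12           # volume movable in one day
voip_calls = LINK_BPS / 20e3                      # calls at 20 kbit/s each

print(f"movie: {movie_s:.2f} s")
print(f"200 GBytes: {babar_daily_min:.1f} min")
print(f"200 TBytes: {babar_year_days:.1f} days")
print(f"per day: {tbytes_per_day:.1f} TBytes")
print(f"VoIP calls: {voip_calls:.0f}")
for t in (0.4, 2, 12):  # response time regimes from the list above
    print(f"{t} s -> {bytes_in(t) / 1e6:.1f} MBytes")
```

The results (about 1.1 seconds, 27 minutes, 19 days, 10.7 TBytes per day, 49,500 calls, and 49.5, 247.5 and 1485 MBytes) agree with the rounded figures quoted in the list.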
Acknowledgements
We would like to especially
thank the following for their assistance during the show.
Richard Mount of SLAC for encouraging and challenging us to "go for it",
Dave Millsom of SLAC for making the NTON link work to SLAC just in time,
Hal Edwards of
Nortel and Bill Lennon of LLNL for providing the NTON connection at
SLAC, Paul Dasprit for coordinating NTON activities at SC2000,
Bill Wing of ORNL for coordinating our connection to SCInet, and
Cisco for loaning us the Catalyst 6009 and MSFC.
Created November 9, 2000, last update November 16, 2000.
Comments to iepm-l@slac.stanford.edu