
"Bandwidth Lust":

Distributed Particle Physics Analysis using Ultra high speed TCP on the Grid

SC2003 Bandwidth Challenge Proposal

Synopsis (6.6 TBytes in under 50 minutes)

The Caltech/SLAC/LANL/U Manchester/NIKHEF/CERN "Bandwidth Lust" team captured the SuperComputing 2003 Bandwidth Challenge award for the most data transferred. Data was exchanged between three booths (Caltech, SLAC and LANL) and sites in four countries (USA, the Netherlands, Switzerland and Japan) on three continents, via three 10 Gbits/s links from SCinet: a dedicated 10 Gbits/s link donated by Level(3) to the CENIC PoP in Los Angeles and on via a donated CENIC wavelength to the PAIX PoP in Palo Alto; a shared 10 Gbits/s Abilene link through Los Angeles and Sunnyvale to StarLight in Chicago and then on to Amsterdam via SURFnet/NetherLight; and a dedicated 10 Gbits/s TeraGrid link to StarLight in Chicago and on to CERN via DataTAG.

Typical single-stream host-to-host TCP data rates achieved were 3.5 to 5 Gbits/s. Multiple TCP stacks were demonstrated, including standard NewReno, FAST, HighSpeed and Scalable TCP. The peak aggregate bandwidth from the booths was 23.21 Gbits/s, and typical one-way link utilizations of over 90% were achieved. The amount of data transferred during the 48-minute demonstration was over 6.6 TeraBytes.

Participants

  • Caltech/HEP/CACR/NetLab: Harvey Newman, Julian Bunn, Sylvain Ravot, Conrad Steenberg, Yang Xia, Cheng Jin, Sanjay Hegde, Raj Jayaram, David Wei, Dan Nae, Suresh Singh, Steven Low
  • SLAC/IEPM: Les Cottrell, Gary Buhrmaster, Connie Logg
  • LANL: Wu-chun Feng, Gus Horowitz
  • University of Manchester: Richard Hughes-Jones
  • NIKHEF/U of Amsterdam: Antony Antony
  • CERN/DataTAG: Olivier Martin, Paolo Moroni

Contributors

Press Releases

11/25/03 SuperComputing Online

11/26/03 NCSA, HPCWire, SDSC, EE Times

12/1/03 ACM TechNews, Yahoo

12/10/03 Official Press Release from Caltech

12/12/03 SLAC Interaction Point

1/14/04 Silicon Valley Biz Link



More on bulk throughput
Bulk throughput measurements | Bulk throughput simulation | Windows vs. streams | Effect of load on RTT and loss | Bulk file transfer measurements | FAST TCP Stack Measurements | QBSS measurements

Demonstrations
SC2001 challenge | iGrid2002 demonstration | SC2002 SLAC/FNAL bandwidth challenge | Internet2 Land Speed Record


Project Description | Detailed Technical Requirements | Results | Photos

Project description

This is a joint Caltech, SLAC, LANL, University of Manchester and NIKHEF/UvA project, with Cisco, Level 3, CENIC, DataTAG, StarLight, TeraGrid, SURFnet, HP, Intel and AMD as sponsors. We will demonstrate high network and application throughput on trans-continental (10 Gbits/s) and trans-Atlantic (10 Gbits/s) links between SLAC/PAIX, CERN/Geneva, Caltech/LA, NIKHEF/Amsterdam, StarLight/Chicago and SC2003/Phoenix.

In this demonstration we will show several components of a Grid-enabled distributed Analysis Environment (GAE) being developed to search for the Higgs particles thought to be responsible for mass in the universe, and for other signs of new physics processes, to be explored using CERN's Large Hadron Collider (LHC), SLAC's BaBar, and FNAL's CDF and D0. We use simulated events resulting from proton-proton collisions at 14 Teraelectron Volts (TeV) as they would appear in the LHC's Compact Muon Solenoid (CMS) experiment, which is now under construction and which will collect data starting in 2007.

We hope to demonstrate high (> 1 Gbit/s) disk-to-disk throughput and even higher (several Gbits/s sustained) memory-to-memory throughput between the above sites. The showcase will be at SC2003 in Phoenix, Arizona, November 15-21, 2003.

For more details see: the submitted final proposal, the SC2003 abstract and a more informal plan. Also see the pamphlet overview of networking.

We will also have several slide shows illustrating:

  • Measuring the Digital Divide (PingER)
  • Bandwidth Monitoring (IEPM-BW)
  • Available Bandwidth Monitoring (ABwE)
  • Internet Traffic Characterization (NetFlow)
  • Worldwide Sharing of Internet Performance Information (MonALISA)
  • Real-time passive network monitoring (Network Physics)
  • Comparing TCP stacks (long version)

In addition we will be:

Detailed technical requirements

Measurements

We will use a Java applet to monitor and display the real-time utilization of the various switch interfaces.
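
The applet code is not reproduced here. As an illustration of the underlying idea only (periodically sampling interface byte counters and converting the deltas into a utilization figure), the Python sketch below polls a Linux host's counters from /proc/net/dev rather than querying the switch interfaces; the interface name, link speed and polling interval are assumed placeholder values, not settings from the demonstration.

    #!/usr/bin/env python3
    # Sketch: sample per-interface byte counters and report utilization.
    # Placeholder values; the demonstration itself used a Java applet
    # polling the switch interfaces.
    import time

    IFACE = "eth2"           # assumed name of the 10GE interface
    LINK_SPEED_BPS = 10e9    # assumed 10 Gbits/s link
    INTERVAL = 5             # seconds between samples

    def read_counters(iface):
        # Return (rx_bytes, tx_bytes) for iface from /proc/net/dev.
        with open("/proc/net/dev") as f:
            for line in f:
                if line.strip().startswith(iface + ":"):
                    fields = line.split(":", 1)[1].split()
                    return int(fields[0]), int(fields[8])
        raise ValueError("interface %s not found" % iface)

    prev = read_counters(IFACE)
    while True:
        time.sleep(INTERVAL)
        cur = read_counters(IFACE)
        rx_bps = (cur[0] - prev[0]) * 8 / INTERVAL
        tx_bps = (cur[1] - prev[1]) * 8 / INTERVAL
        print("%s: in %.2f Gbits/s (%.0f%%), out %.2f Gbits/s (%.0f%%)"
              % (IFACE, rx_bps / 1e9, 100 * rx_bps / LINK_SPEED_BPS,
                 tx_bps / 1e9, 100 * tx_bps / LINK_SPEED_BPS))
        prev = cur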

The demonstration utilized a loaned CENIC 10 Gbits/s wavelength from the Palo Alto Internet Exchange (PAIX) to Los Angeles, a loaned Level(3) 10 Gbits/s circuit from LA to Phoenix, and a QWest fiber from the Level(3) Phoenix PoP to SC2003. The fiber from SLAC to Stanford's Forsythe building is SLAC/Stanford owned, in SBC conduits. The fiber from Forsythe to the RTF and thence to the Stanford boundary (at Hanover and College in Palo Alto) is Stanford fiber. The fiber from Hanover and College to PAIX is owned by the Palo Alto Municipality. The link from PAIX through Sunnyvale to LA was loaned by CENIC. The conversion between the 10GE LAN-PHY from SLAC to LA and the OC192/POS from LA to Phoenix was performed by a CENIC GSR (with interfaces loaned by Cisco) located at the LA CENIC PoP. Cisco provided wavelength multiplexing equipment as well as routers and switches. Due to problems with the optical signal reach from SLAC to PAIX, SLAC placed a Dell PowerEdge 2650 (dual 3.06 GHz CPUs) with an Intel 10GE NIC at the PAIX PoP, in Stanford University's rack space.

The transatlantic link from Amsterdam to Chicago was a 10GE link instead of POS. It used 10GE WAN PHY interfaces on a Force10.

  • SLAC link, fiber specifications
  • Links: SLAC/PAIX, Caltech/LA, SC03, StarLight/Chicago, CERN, NIKHEF/Amsterdam
  • Caltech link
  • Host at PAIX (137.174.27.30), PNI completion notice
  • Addresses and traceroutes of remote hosts accessed from the booths
  • Specifications of NIKHEF hosts at Amsterdam and Chicago

Operational Information, BWC schedule, booth log

Results

There were three booths involved: SLAC, Caltech and LANL. We had 4 10GE links from the booths to SCinet (two each from SLAC and Caltech and one from LANL). See the traceroutes and remote host IP addresses for the paths used by the various booths. We had access to three 10 Gbits/s links from SCinet to the outside world: a dedicated 10 Gbits/s link loaned by Level(3) to LA/Caltech and thence loaned by CENIC to Palo Alto/SLAC; a shared 10 Gbits/s link to Abilene/Internet2 reaching US universities, Japan, and NIKHEF/Amsterdam via SURFnet; and a dedicated 10 Gbits/s link loaned by TeraGrid to StarLight/Chicago and thence via DataTAG to CERN. Data was exchanged between four countries (USA, Netherlands, Switzerland and Japan) on three continents.

In the SLAC booth we had three Dell PowerEdge hosts (two with dual 3.06 GHz Xeon CPUs, one with dual 2.4 GHz Xeon CPUs), each with a 10GE Intel NIC. Two of the 10GE NICs were loaned by ASCI/Livermore. In the LANL booth we used an Opteron CPU loaned by AMD. In the Caltech booth there were Xeon and Itanium (1.5 GHz) CPUs.

We started the demonstration by running the Caltech-developed Clarens GAE server, a real-world application, reading physics experiment data from two separate disk servers at LA/Caltech and sending it from each at 200 MBytes/s to two hosts in the Caltech booth, where it was written to disk. HEP events were extracted from the data stream, analyzed, and displayed in real time in the Caltech booth. A peak disk-to-disk throughput of 3.2 Gbits/s was achieved.
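
The Clarens/GAE transfer code itself is not shown here. Purely to illustrate the disk-to-disk pattern just described (receive a TCP stream, write it to disk, report the achieved rate), here is a minimal Python receiver sketch; it is not the Clarens protocol, and the port and output path are assumed placeholders.

    #!/usr/bin/env python3
    # Sketch of a disk-to-disk receiver: accept one TCP stream, write it to
    # disk and report the rate. Not the Clarens/GAE protocol; placeholders only.
    import socket, time

    PORT = 5002                       # assumed listening port
    OUTPUT = "/data/received.dat"     # assumed path on a fast disk array
    BUF = 1 << 20                     # read in 1 MiB chunks

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, addr = srv.accept()

    received, start = 0, time.time()
    with open(OUTPUT, "wb") as out:
        while True:
            chunk = conn.recv(BUF)
            if not chunk:             # sender closed the connection
                break
            out.write(chunk)
            received += len(chunk)
    elapsed = time.time() - start
    print("%.0f MBytes from %s at %.2f Gbits/s"
          % (received / 1e6, addr[0], received * 8 / elapsed / 1e9))
    conn.close()
    srv.close()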

After a few minutes we started up TCP streams using iperf from SLAC, sending: ~4.5 Gbits/s to PAIX via the dedicated Level(3)/CENIC link from one of the 3.06 GHz Xeons in the SLAC booth; ~4.5 Gbits/s from a second 3.06 GHz Xeon to an Itanium/F10 at NIKHEF Amsterdam via the shared Abilene link; and ~3.5 Gbits/s from a 2.4 GHz Xeon to an Itanium at Chicago. Caltech also started sending TCP streams to LA/Caltech to fill up the Level(3) link, as well as a few hundred Mbits/s to KEK in Japan and to the AMPATH PoP at Florida International University in Miami, Florida. LANL tried sending from an Opteron in the LANL booth to CERN via TeraGrid/DataTAG. However, consistent high throughput could not be achieved due to congestion, so LANL reversed the flow and sent from an Itanium at CERN to the Opteron in the LANL booth. Various TCP stacks were used in the sending hosts during the demonstration, including the standard Linux (2.4 and 2.6) TCP stack (NewReno), FAST, HighSpeed and Scalable TCP. Jumbo (9000-byte) frames were also utilized.
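
As a sketch of a memory-to-memory sender in the spirit of these iperf runs: modern Linux kernels let an application select the congestion-control algorithm per socket via the TCP_CONGESTION option (at SC2003 the FAST, HighSpeed and Scalable stacks were kernel patches selected system-wide, so this is an illustration rather than the setup used). The host, port, algorithm and duration below are assumed placeholders; the receiver can be anything that reads and discards the stream.

    #!/usr/bin/env python3
    # Sketch of a memory-to-memory TCP sender (in the spirit of the iperf runs).
    # All values are placeholders; not the configuration used in the demonstration.
    import socket, time

    HOST, PORT = "192.0.2.10", 5001   # placeholder receiver address and port
    ALGORITHM = b"cubic"              # must be a module loaded in the running kernel
    DURATION = 10                     # seconds to transmit
    CHUNK = bytes(1 << 20)            # 1 MiB of zeros, resent from memory

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if hasattr(socket, "TCP_CONGESTION"):     # Linux only, Python 3.6+
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, ALGORITHM)
    sock.connect((HOST, PORT))

    sent, start = 0, time.time()
    while time.time() - start < DURATION:
        sock.sendall(CHUNK)
        sent += len(CHUNK)
    elapsed = time.time() - start
    print("%.2f Gbits/s over %.1f s using %s"
          % (sent * 8 / elapsed / 1e9, elapsed, ALGORITHM.decode()))
    sock.close()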

The router interface throughputs to and from the booths, as recorded at SCinet, indicated a peak aggregate throughput of ~23.21 Gbits/s. The router interface throughputs to the external world, recorded at SCinet, indicated 90-100% utilization of the LA/PAIX link and about 85% utilization of the Abilene link. We believe the limiting factor was CPU power in the hosts. We took many snapshots of the link utilizations and created a spreadsheet of the results. The maximum aggregate throughput recorded on all links was 30 Gbits/s. A typical snapshot of the link utilizations during the demonstration showed 8.5 Gbits/s outbound from SCinet on the Abilene link, 8.9 Gbits/s inbound to SCinet on the TeraGrid link and 9.9 Gbits/s on the Level(3) link. The Abilene/Internet2 traffic between LA and Sunnyvale showed a peak of over 8 Gbits/s (typical loads on this link are a few hundred Mbits/s), which could be seen at Chicago and on the University of Amsterdam/NIKHEF link. The total amount of data transferred during the 48-minute demonstration was 6.6 TeraBytes.
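
As a back-of-the-envelope check of the headline figures (assuming decimal units for the TeraBytes):

    # Back-of-the-envelope check of the headline figures (decimal units assumed).
    data_bytes = 6.6e12                 # 6.6 TBytes transferred
    duration_s = 48 * 60                # 48-minute demonstration
    avg_gbps = data_bytes * 8 / duration_s / 1e9
    print("average aggregate rate ~ %.1f Gbits/s" % avg_gbps)        # ~18.3 Gbits/s

    # Clarens disk-to-disk phase: two disk servers each sending at 200 MBytes/s
    print("2 x 200 MBytes/s = %.1f Gbits/s" % (2 * 200e6 * 8 / 1e9))  # 3.2 Gbits/s

The ~18.3 Gbits/s average sits, as expected, below the ~23.21 Gbits/s peak aggregate reported above.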

The official SCinet Bandwidth Challenge Results web page showed that our demonstration achieved the most bandwidth, exceeding the next entry by over a factor of 2.5.

TCP Stack evaluations on the SC03 10 Gbits/s links

Photos


Created September 5, 2003, last update December 3, 2003: Les Cottrell, SLAC