Bandwidth Challenge from the Low-lands

Page Contents
Primary Contact | Project Description | Acknowledgements | Site Contacts  

 Primary contacts

Antony Antony, NIKHEF Amsterdam, The Netherlands, <antony@nikhef.nl>
Dr. R. Les Cottrell, MS 97, Stanford Linear Accelerator Center (SLAC), 2575 Sand Hill Road, Menlo Park, California 94025, <cottrell@slac.stanford.edu>

Project description

The avalanche of data already being generated by and for new and future High Energy and Nuclear Physics (HENP) experiments demands new strategies for how the data is collected, shared, analyzed and presented. For example, the SLAC BaBar experiment and JLab are each already collecting over a TByte/day, and BaBar expects to increase this by a factor of two in the coming year. The SLAC BaBar and Fermilab CDF and D0 experiments have already gathered well over a PetaByte of data, and the CERN LHC experiment expects to collect over ten million TBytes. The strategy being adopted to analyze and store this unprecedented amount of data is the coordinated deployment of Grid technologies such as those being developed for the Particle Physics Data Grid and the Grid Physics Network. It is anticipated that these technologies will be deployed at hundreds of institutes that will be able to search out and analyze information from an interconnected worldwide grid of tens of thousands of computers and storage devices. This in turn will require the ability to sustain, over long periods, the transfer of large amounts of data between collaborating sites with relatively low latency. A short PowerPoint presentation covers some of the highlights.

This project is designed to demonstrate current data transfer capabilities to several sites with high performance links worldwide. In a sense the site at iGrid2002 is acting like a HENP tier 0 or tier 1 site (an accelerator or major computation site) in distributing copies of the raw data to multiple replica sites. The demonstration will run over real, live production networks with no attempt to manually limit other traffic, and the results will be displayed in real time. Researchers will investigate and demonstrate issues regarding TCP implementations for high-bandwidth, long-latency links, and create a repository of trace files of a few interesting flows. These traces, valuable to projects like DataTAG, help explain the behavior of transport protocols over various production networks.
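To give a feel for why high-bandwidth, long-latency links are challenging for standard TCP, the window needed to keep a path full is roughly the bandwidth-delay product. The short sketch below uses assumed, illustrative numbers (a 1 Gbit/s path and a 160 ms trans-Atlantic round trip time), not measurements from the demo.

    # Illustrative bandwidth-delay product (BDP) calculation: the TCP window
    # needed to keep a long, fat pipe full is roughly bandwidth * RTT.
    # The link speed and RTT are assumed example values, not demo results.

    def bdp_bytes(bandwidth_bits_per_s, rtt_s):
        """Return the bandwidth-delay product in bytes."""
        return bandwidth_bits_per_s * rtt_s / 8.0

    link = 1e9    # assumed 1 Gbit/s path
    rtt = 0.160   # assumed 160 ms round trip, e.g. Amsterdam to California
    print(f"Required TCP window: {bdp_bytes(link, rtt) / 2**20:.1f} MBytes")
    # With a default 64 KByte window the same path would be limited to
    # roughly 64 KBytes / 0.160 s = ~3.3 Mbits/s, hence the large windows
    # and/or multiple parallel streams used in the measurements below.

With these assumed numbers a single stream needs a window of roughly 19 MBytes to fill the path, which is why the measurements described below tune both the window size and the number of parallel streams.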

The physics part of the demo will use press cuttings illustrating: a BaBar/SLAC aerial view, the PEP II beamline, moving the BaBar coil from Italy by C5A aircraft, single event displays, the S.F. Chronicle front page article on the BaBar discovery, an SSRL aerial photo, the polymerase molecule from a Science front page, the first web page in the U.S. from 1991, a KRON news clip on the BaBar database being the largest known database in the world, and the growth and current size of the BaBar database.

  • Throughput to the world
  • Italian collaborator site traffic
  • Top 10 sites from SLAC by MBytes
  • SLAC Internet 2 Traffic growth 2002
  • SLAC Offsite bandwidth growth, 1983 - 2002

    To support these high throughput requirements, we are measuring TCP and file copy throughputs from 9 sites in 4 countries (including iGrid2002) to over 35 hosts in 8 countries. Some throughputs are seen here. We used the standard TCP stack with regular MTUs. We optimized the window sizes and number of streams by running iperf in TCP mode for 10 seconds from iGrid2002 to each remote host, over a range of windows and streams. By looking at the graphs we selected the window size, for the minimum number of streams, that gave about 80% of the maximum throughput. We also record the routes; ping Round Trip Times (RTTs), losses, and derived throughputs; and the Ethernet interface transmit and receive Mbytes/second, among other metrics. In addition we provide animations of RTT, loss and derived throughput measured by ping and of throughput measured by iperf, as seen from iGrid2002 and from SLAC, and a Java applet to show ping RTTs to the world in real time from your computer (see the mock-up in case you cannot load the applet onto your computer).
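    The window and stream selection just described might be sketched as follows. This is a minimal illustration under stated assumptions, not the actual iGrid2002 measurement scripts: the remote host name, the ranges scanned, and the parsing of iperf's CSV ('-y C') output are assumptions made here for clarity.

        # Sketch of the window/stream scan: run 10 second iperf TCP tests over
        # a grid of window sizes and parallel stream counts, then pick the
        # smallest stream count whose best window gives ~80% of the overall
        # maximum throughput.  Host, ranges and parsing are assumptions.
        import subprocess

        HOST = "remote.example.org"           # hypothetical remote iperf server
        WINDOWS = ["64K", "256K", "1M", "4M", "8M"]
        STREAMS = [1, 2, 4, 8, 16]
        DURATION = 10                         # seconds per test, as in the text

        def run_iperf(window, streams):
            """Run one iperf TCP test; return aggregate throughput in Mbits/s."""
            out = subprocess.run(
                ["iperf", "-c", HOST, "-w", window, "-P", str(streams),
                 "-t", str(DURATION), "-y", "C"],
                capture_output=True, text=True, check=True).stdout
            # '-y C' prints one CSV line per stream with bits/s last; the line
            # with stream id -1 is the sum over parallel streams (assumed).
            rates = {}
            for line in out.strip().splitlines():
                fields = line.split(",")
                rates[fields[5]] = float(fields[-1])
            return rates.get("-1", max(rates.values())) / 1e6

        results = {(w, p): run_iperf(w, p) for w in WINDOWS for p in STREAMS}
        peak = max(results.values())

        # Selection rule from the text: the smallest number of streams for
        # which some window reaches about 80% of the peak throughput.
        for p in STREAMS:
            good = {w: results[(w, p)] for w in WINDOWS
                    if results[(w, p)] >= 0.8 * peak}
            if good:
                best = max(good, key=good.get)
                print(f"chosen: window={best}, streams={p}, "
                      f"{good[best]:.0f} Mbits/s")
                break

    In production the tests were driven by the run-bw-tests script mentioned below; the sketch only captures the selection rule.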

    Demos

    The goal of the demos was to show monitoring of high performance end-to-end links from iGrid2002 (Amsterdam) to 9 countries in N. America, Europe and Japan. Several people at SLAC, including Connie Logg, Warren Matthews, Jiri Navratil, Jerrod Williams and myself, spent several weeks before iGrid2002 developing applications and installing and configuring software on 2 hosts at NIKHEF. We were ably assisted by Antony Antony of NIKHEF. I met Antony face to face for the first time when I arrived at iGrid2002 on Saturday 21st September, and we spent long hours on Saturday, Sunday and Monday setting up 3 more hosts and porting the applications. We decided to use NFS to simplify configuration; this was very successful and dramatically simplified configuring and updating, although we had to be careful to ensure that the various log and recording files from the different hosts did not overlap. We also added TCP parameter flushing to all hosts, set up 5 groups of remote hosts with roughly equal aggregate throughput, and modified our iGrid2002 web page to provide easy access to all demo pages.

    Our first demo was scheduled for 9am Tuesday morning, September 24, 2002. The formal demonstration took about 10 minutes and was videotaped. We used one screen to introduce the physics needs, evolving into screen shots showing the historical and future data and bandwidth requirements. This led to the need to manage and understand how to replicate data to multiple sites, and to the building of a high throughput measurement infrastructure to help address these needs. We illustrated this with the PingWorld applet, the Available Bandwidth Estimation (ABE) servlet, the animated iperf world map, the traceroute topology for iGrid2002/IEPM, and plots of ifconfig throughputs as a function of time as we ran the bandwidth tester in sequential mode on one host (keeshond) and in flood mode on the 2nd host (stier). We also ran a few iperfs from a 3rd host (haan) to a few high performance hosts. The aggregate throughput measured at the router was over 2 Gbits/s.

    The second demo slot was on Thursday, September 26, 2002. We had 5 hosts: keeshond, stier, haan, hp3 and hp4, all running Linux and all with 2 * 1 GE interfaces. We did not have time to get the 2nd GE interfaces working on the hosts. Keeshond was set to make sequential tests (iperf, bbcp and bbftp), except that instead of calling the script that drives the tests (run-bw-tests) from a cron job, it was called from a continuous loop. Each of the remaining 4 machines ran iperf in TCP mode simultaneously to a different set of about 6 hosts; the hosts in each set were determined from an earlier snapshot of the throughputs. We measured the load generated by noting the receive (RX) and transmit (TX) bytes reported by ifconfig for the appropriate Gigabit Ethernet interfaces every 2 seconds and calculating the differences; this was accomplished and displayed using a Java servlet (see the sketch after the table below). With this setup we were able to achieve the following throughputs:
    Host  | Address      | Interface | Group | Router interface | Throughput
    stier | 145.147.2.2  | eth3      | 0     | 2/16             | 800-900 Mbits/s
    haan  | 145.147.2.3  | eth1      | 2     |                  | 800 Mbits/s
    hp3   | 145.147.2.18 |           | 3     |                  | 800 Mbits/s
    hp4   | 145.147.2.19 |           | 4     | 8/4              | 700 Mbits/s
    The throughputs in the above table are the flood throughputs achievable (as measured by UTH on stier) for each of the groups. No measurements were made for hp3 and hp4, since we did not get access to these hosts until the last minute and they did not have the appropriate Java Development Kit installed (JDK 1.4.1). The MRTG plot for stier (averaged over 5 minute intervals) indicates that during the demonstration, from 9am to 1pm Thursday, we achieved about 650 Mbits/s from stier. The MRTG plot of the aggregate throughputs recorded for iGrid2002 indicates that during the demo slot (10:00am to noon, though we started setting things up around 8:30-9:00am) the output throughput (blue line), measured over 5 minute intervals, peaked at about 3 Gbits/s, which is consistent with 4 hosts sending data at 700-800 Mbits/s each. However, there is evidence that there may have been about 1 Gbits/s of output throughput from other sources. Unfortunately more detailed records of throughputs (e.g. by switch port, where we could identify which switch port went with which host) are not available at this time.
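    The interface-load measurement described above (sampling the RX/TX byte counters every 2 seconds and differencing them) can be sketched as below. This is a minimal sketch, not the Java servlet used in the demo: it reads Linux's /proc/net/dev, the counters behind ifconfig, and the interface name is an assumed example.

        # Sketch of the per-interface load measurement: sample the cumulative
        # receive (RX) and transmit (TX) byte counters every 2 seconds and
        # report the differences as Mbits/s.  Reads /proc/net/dev directly
        # (the counters behind ifconfig); the interface name is an assumed
        # example, and the demo displayed these numbers via a Java servlet.
        import time

        INTERFACE = "eth3"   # assumed GE interface, cf. stier's eth3 above
        INTERVAL = 2.0       # seconds between samples, as in the text

        def read_counters(ifname):
            """Return cumulative (rx_bytes, tx_bytes) for one interface."""
            with open("/proc/net/dev") as f:
                for line in f:
                    if line.strip().startswith(ifname + ":"):
                        fields = line.split(":", 1)[1].split()
                        return int(fields[0]), int(fields[8])  # rx, tx bytes
            raise ValueError(f"interface {ifname} not found")

        prev_rx, prev_tx = read_counters(INTERFACE)
        while True:
            time.sleep(INTERVAL)
            rx, tx = read_counters(INTERFACE)
            rx_mbps = (rx - prev_rx) * 8 / INTERVAL / 1e6
            tx_mbps = (tx - prev_tx) * 8 / INTERVAL / 1e6
            print(f"{INTERFACE}: RX {rx_mbps:7.1f} Mbits/s  "
                  f"TX {tx_mbps:7.1f} Mbits/s")
            prev_rx, prev_tx = rx, tx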

    Problems Encountered

    The main problems were:

    • Late access (Monday evening) to 2 of the measurement hosts and the large screens. This caused difficulty in installing WebStart, the Perl Tk module and emacs, getting X windows to work with the various video cards, and adjusting the displays to work with the limited resolution.
    • Non-uniform configurations of the various hosts.
    • A broken software link that was not discovered until later.
    • Losing connectivity to some remote sites; for example, due to a hacker compromise one site had to quarantine the remote host we had hoped to monitor.
    • Late addition and tuning of some remote hosts (2 at StarLight, one at UIUC), which took time away from other tasks.
    • An unexplained problem with the serial UTH plots locking up when starting run-bw-tests on the first host.
    • Problems with ifconfig not keeping up at 1/sec readouts (we changed the software to allow an option to increase the time between measurements).
    • The low resolution of the large flat panel displays.
    • Being unable to take advantage of the 2nd GE interfaces in the hosts.
    • A problem (which Jerrod discovered and fixed) with the data extraction, caused by trying to copy the data to antonia.

    The lessons learned will assist in the demos at SC2002.

    Some improvements suggested for the demos were: to display the configurations etc. of the hosts being accessed in flood mode; and to provide a real-time time series plot of the aggregate throughput (over all hosts), independent of the central network's MRTG router displays.

    Lessons learnt & value of iGrid2002 as staging event for SC2002.

    Acknowledgements

    We will be using Internet 2, ESnet, JANET, GARR, Renater, SURFnet, Japanese WANs and the CERN-STARTAP link.

    The work has been sponsored by: 

    Offsite resources will be at the sites listed in the table below. Each site will have one or more Unix hosts running iperf and bbftp servers. 

    Site
    APAN-JP, Japan
    ANL, Illinois, USA
    BNL, New York, USA
    Caltech, California, USA
    CERN, Geneva, Switzerland
    CESnet, Prague, CZ
    Daresbury (CCLRC), Liverpool, UK
    FNAL, Illinois, USA
    NASA GSFC, Maryland, USA
    IN2P3, Lyon, France
    INFN, Milan, Italy
    Internet 2
    JLab, Virginia, USA
    KEK, Tokyo, Japan
    LANL, New Mexico, USA
    LBL, California, USA
    Manchester University, UK
    NERSC, California, USA
    NIKHEF, Amsterdam, Netherlands
    ORNL, Tennessee, USA
    RAL (CCLRC), Oxford, UK
    Rice University, Texas, USA
    RIKEN, Japan
    Rome, Italy
    SDSC, California, USA
    SLAC, California, USA
    Stanford, California, USA
    TRIUMF, Vancouver, Canada
    University College London (UCL), UK
    U Florida, USA
    U Delaware, USA
    UT Dallas, Texas, USA
    U Michigan, USA
    U Wisconsin, Madison, USA

    Contact information for all collaborating sites

    The following are the contacts at the various remote sites.

    Ayumu Kubota, APAN-JP <kubota@kddilabs.jp>
    Linda Winkler, ANL, US, <winkler@mcs.anl.gov> + William E. Allcock [allcock@mcs.anl.gov]
    Dantong Yu, BNL, Long Island, US, <dtyu@rcf.rhic.bnl.gov>
    Harvey Newman, Caltech, Pasadena, US, <newman@hep.caltech.edu> + Julian J. Bunn [julian@cacr.caltech.edu] + Suresh Singh <suresh@cacr.caltech.edu>
    Olivier Martin, CERN, Geneva, CH, <omartin@dxcoms.cern.ch> + Sylvain Ravot [Sylvain.Ravot@cern.ch]
    Robin Tasker, Daresbury Lab, Liverpool, UK, <R.Tasker@dl.ac.uk> + Paul Kummer [P.S.Kummer@dl.ac.uk]
    Jim Leighton, ESnet, Berkeley, US, <JFLeighton@lbl.gov>
    Ruth Pordes, FNAL, Chicago, US, <ruth@fnal.gov> + Frank Nagy <nagy@fnal.gov> + Phil DeMar <demar@fnal.gov>
    Andy Germain, NASA/GSFC, US, <andyg@rattler-f.gsfc.nasa.gov> + George Uhl [uhl@rattler-f.gsfc.nasa.gov]
    Jerome Bernier, IN2P3, Lyon, FR, <bernier@cc.in2p3.fr> + Dominique Boutigny [boutigny@in2p3.fr]
    Fabrizio Coccetti, INFN, Milan, IT, <f@fc8.net>
    Emanuele Leonardi, INFN, Rome, IT,  <Emanuele.Leonardi@roma1.infn.it>
    Guy Almes, Internet 2, US, <almes@internet2.edu> + Matt Zekauskas <matt@advanced.org> + Stanislav Shalunov <shalunov@internet2.edu> + Ben Teitelbaum <ben@internet2.edu>
    Chip Watson, JLab, Newport News, US, <chip.watson@jlab.gov> + Robert Lukens <rlukens@jlab.org>
    Yukio Karita, KEK, Tokyo, JP, <karita@nwgvax.kek.jp>, Teiji Nakamura <teiji@nwgsun2.kek.jp>
    Wu-chun Feng, LANL, Los Alamos, US, <feng@lanl.gov>, Mike Fisk <mfisk@lanl.gov>
    Bob Jacobsen, LBL, Berkeley, US, <Bob_Jacobsen@lbl.gov>, Shane Canon <Canon@nersc.gov>
    Richard Hughes-Jones, Manchester University, UK, <rich@a3.ph.man.ac.uk>
    Antony Antony, NIKHEF, Netherlands, <antony@nikhef.nl>
    Tom Dunigan, ORNL, Oak Ridge, US, <thd@ornl.gov> + Bill Wing <wrw@email.cind.ornl.gov>
    Richard Baraniuk, Rice University, <richb@rice.edu>, Rolf Riedi [riedi@rice.edu]
    Takashi Ichihara, RIKEN, Japan, [ichihara@rarfaxp.riken.go.jp]
    John Gordon, Rutherford Lab, Oxford, UK, <J.C.Gordon@RL.AC.UK> + Tim Adye [T.J.Adye@RL.AC.UK]
    Reagan Moore, SDSC, San Diego, US, <moore@SDSC.EDU> + Kevin Walsh [kwalsh@SDSC.EDU] + Arcot Rajasekar <sekar@SDSC.EDU>
    Warren Matthews, SLAC, Menlo Park, US <matthews@slac.stanford.edu> + Paola Grosso <grosso@slac.stanford.edu> + Gary Buhrmaster <buhrmaster@slac.stanford.edu> + Connie Logg <cal@slac.stanford.edu> + Andy Hanushevsky <abh@slac.stanford.edu> + Jerrod Williams <jerrodw@slac.stanford.edu> + Steffen Luitz <luitz@slac.stanford.edu>
    Warren Matthews, Stanford University, Palo Alto, US, + Milt Mallory <milt@stanford.edu>
    William Smith, Sun Microsystems, [William.Smith@sun.com] + Rocky Snyder <rocky.snyder@sun.com>
    Andrew Daviel, TRIUMF, Vancouver, CA, <andrew@andrew.triumf.ca>
    Yee-Ting Li, University College London, UK,  <ytl@hep.ucl.ac.uk> + Peter Clarke <clarke@hep.ucl.ac.uk>
    Constantinos Dovrolis, University of Delaware, US,  <dovrolis@mail.eecis.udel.edu>
    Paul Avery, University of Florida, Gainesville, US,  <avery@phys.ufl.edu> + Gregory Goddard [gregg@nersp.nerdc.ufl.edu]
    Thomas Hacker, University of Michigan, <hacker@umich.edu>
    Joe Izen, University of Texas at Dallas, US, <joe@utdallas.edu>
    Miron Livny, University of Wisconsin, Madison, US, <miron@cs.wisc.edu> + Paul Barford <pb@cs.wisc.edu> + Dave Plonka <plonka@doit.wisc.edu>


    Created August 17, 2002; last update August 17, 2002.
    Comments to iepm-l@slac.stanford.edu