High Speed Terabyte Data Transfers for Physics: FAQ

SC2004 Bandwidth Challenge Proposal

This FAQ was started in response to questions and comments from Slashdot.
I mean that is a full terabyte almost every minute and a half. What has so much data?
When the Large Hadron Collider (LHC) at CERN comes online in about 5 years, it is expected to churn out petabytes of data. SLAC and Fermilab are already turning out terabytes/day but they will be surpassed by CERN.
Other data-intensive sciences with similar needs on the horizon include genomics (the Human Genome Project), astrophysics, health care, fusion research, and seismology.
For comparison: the Library of Congress books and other print collections amount to about 11 TBytes. In 2000 the web had about 2.1 billion publicly available pages and was growing at about 7 million pages/day (as of November 2004 it is estimated at about 500 billion pages), with an average page size of 10 kBytes. Wal-Mart has 460 terabytes of consumer-tracking data stored on mainframes at its headquarters: http://developers.slashdot.org/article.pl?sid=04/11/14/2057228&tid=187&tid=221&tid=198&tid=1  See also There is a lot of Data out There and Data Powers of Ten.
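Taking the figures quoted above at face value (500 billion pages at an average of 10 kBytes each), a quick arithmetic sketch shows how the web's text alone compares with the Library of Congress print collection:

```python
# Rough size of the public web's text, using the estimates quoted above.
pages = 500e9                # ~500 billion pages (Nov 2004 estimate)
avg_page_bytes = 10e3        # ~10 kBytes average page size

web_tbytes = pages * avg_page_bytes / 1e12
print(f"~{web_tbytes:.0f} TBytes")   # ~5000 TBytes, vs ~11 TBytes for the LoC print collection
```

That is roughly 5 petabytes of text, several hundred times the Library of Congress figure.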
Who needs it?
Nowhere in the article does it say how long they ran the test for. A second? A minute? An hour?
On the various 10 Gbits/s paths we were able to sustain over 99% of the available bandwidth for hours at a time. We sustained an aggregate of over 100 Gbits/s for about 2 minutes. The HEP bandwidth challenge run lasted 48 minutes.
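The "full terabyte almost every minute and a half" figure in the question follows directly from the aggregate rate. A minimal sketch of the arithmetic (decimal units assumed):

```python
# Time to move one terabyte at the ~100 Gbits/s aggregate rate (decimal units assumed).
TERABYTE_BITS = 1e12 * 8     # 1 TByte = 8e12 bits
AGGREGATE_BPS = 100e9        # 100 Gbits/s aggregate throughput

seconds_per_terabyte = TERABYTE_BITS / AGGREGATE_BPS
print(f"{seconds_per_terabyte:.0f} s per TByte")   # 80 s, i.e. under a minute and a half
```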
What are the limitations today?
The main limitation today to achieving high network throughput (>= 10 Gbits/s) between two hosts is the bus bandwidth within the hosts: today's PCI-X bus limits throughput to about 8 Gbits/s. Disk speeds and the file system are also limiting factors; a highly parallel disk/file system is needed to get beyond a hundred or so MBytes/s.
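To illustrate why parallelism is needed, here is a rough estimate assuming a single disk stream delivers on the order of 100 MBytes/s (an assumed figure, in line with the "hundred or so MBytes/sec" above):

```python
import math

# Rough estimate of how many parallel disk streams are needed
# to feed a 10 Gbits/s network path (figures are assumptions).
NETWORK_BPS = 10e9           # 10 Gbits/s target network rate
DISK_BYTES_PER_S = 100e6     # ~100 MBytes/s per single disk stream (assumed)

network_bytes_per_s = NETWORK_BPS / 8   # 1.25 GBytes/s needed from storage
streams_needed = math.ceil(network_bytes_per_s / DISK_BYTES_PER_S)
print(f"~{streams_needed} parallel disk streams")   # ~13
```

So even ignoring bus limits, filling a single 10 Gbits/s path takes on the order of a dozen disks striped in parallel.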
How does this compare with the Internet Land Speed Record (LSR)?
The LSR is for a single host sending to a single host. The Bandwidth Challenge used many hosts (the SLAC booth alone had seven Sun AMD Opteron hosts) and multiple 10 Gbits/s paths (Caltech had seven and SLAC had three). The LSR also factors in the distance between the hosts.
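Since the LSR is scored on the product of throughput and distance (bit-metres per second), a single fast stream over a long path can score highly. A sketch of that metric, with purely illustrative numbers (not actual record figures):

```python
# Illustrative LSR-style metric: single-stream throughput times path distance.
throughput_bps = 6.6e9       # e.g. 6.6 Gbits/s single stream (illustrative value)
distance_m = 15_000e3        # e.g. 15,000 km terrestrial path (illustrative value)

petabit_metres_per_s = throughput_bps * distance_m / 1e15
print(f"{petabit_metres_per_s:.1f} Pbit-m/s")   # 99.0 Pbit-m/s for these inputs
```

The Bandwidth Challenge, by contrast, is judged on raw aggregate throughput across all hosts and paths.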

Created November 29, 2004: Les Cottrell, SLAC