High Speed Terabyte Data
Transfers for Physics: FAQ
SC2004 Bandwidth Challenge Proposal
- This FAQ was started in response to questions and comments.
- That is a full terabyte almost every minute and a half. What has so much data?
When the Large Hadron Collider (LHC) at CERN comes online in about 5 years, it is expected to churn out petabytes of data.
Experiments at Fermilab are already turning out terabytes/day, but they will be surpassed by CERN.
Other data-intensive sciences with soon-to-be similar needs include the Human Genome, Astrophysics,
Health, Fusion, and Seismology.
The Library of Congress's books and other print collections come to about 11 TBytes.
In 2000 the web had about 2.1 billion publicly available pages and was growing at about 7 million pages/day
(it is now (Nov 2004) estimated to be about 500 billion pages);
the average page size was 10 kBytes. Wal-Mart has 460 terabytes of
consumer-tracking goodness stored on mainframes at its headquarters.
See There is a lot of Data out There and Data Powers of Ten.
- Who needs it?
- High Energy Physics (HEP) for high bulk throughput (today uses 0.5 Gbps; in 5-10
years needs 1000 Gbps);
- Global Climate (data & computation) for high bulk throughput (today
0.5 Gbps; 5-10 years Nx1000 Gbps);
- Nanoscience (Spallation Neutron Source) for remote control and time-critical
throughput (has not started to use large bandwidth yet; 5-10 years 1000 Gbps +
QoS for a control channel);
- Fusion energy for time-critical throughput (today 0.066 Gbps (500 MB/s
bursts); 5-10 years Nx1000 Gbps);
- Astrophysics for computational steering and collaborations (today 0.013 Gbps,
a TByte/week; 5-10 years 1000 Gbps);
- Genomics for high throughput and steering (today 0.091 Gbps (1 TByte/day); 5-10
years 1000 Gbps + QoS for a control channel).
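The per-day figures above can be cross-checked against the Gbps figures with simple arithmetic. As a sketch, here is the Genomics figure (1 TByte/day) converted to Gbits/s:

```python
# Cross-check the Genomics figure: 1 TByte/day expressed in Gbits/s.
TBYTE_BITS = 8e12          # 1 terabyte = 8 * 10^12 bits
SECONDS_PER_DAY = 86400

gbps = TBYTE_BITS / SECONDS_PER_DAY / 1e9
print(f"1 TByte/day = {gbps:.3f} Gbits/s")  # about 0.093 Gbits/s
```

This matches the quoted 0.091 Gbps to within rounding.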
- Nowhere in the article does it say how long they ran the test for. A second? A minute? An hour?
- On the various 10 Gbits/s paths we were able to sustain over 99% of the available bandwidth for hours at
a time. We sustained an aggregate of over 100 Gbits/s for about 2 minutes. The
HEP bandwidth challenge ran for 48 minutes.
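At the 100 Gbits/s aggregate quoted above, the "terabyte almost every minute and a half" from the earlier question checks out:

```python
# How long does one terabyte take at an aggregate of 100 Gbits/s?
rate_gbps = 100
tbyte_bits = 8e12          # 1 terabyte = 8 * 10^12 bits

seconds = tbyte_bits / (rate_gbps * 1e9)
print(f"1 TByte at {rate_gbps} Gbits/s takes {seconds:.0f} s "
      f"({seconds / 60:.1f} minutes)")  # 80 s, about 1.3 minutes
```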
- What are the limitations today?
- The main limitation today to achieving high network throughput (>=
10 Gbits/s) between two hosts is the bus bandwidth within the hosts: the PCI-X bus
today limits throughput to about 8 Gbits/s. Disk speeds and the file system are
also limiting factors; one needs a highly parallel disk/file system to
go beyond a hundred or so MBytes/s.
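A minimal back-of-envelope sketch of these bottlenecks, using only the figures quoted in the answer above (the 12-disk stripe is an illustrative assumption, not a described configuration):

```python
# Back-of-envelope: the end-to-end rate is set by the slowest stage.
link_gbps = 10.0             # 10 Gbits/s network path
pcix_gbps = 8.0              # practical PCI-X bus throughput
disk_gbps = 100 * 8 / 1000   # single disk/file system, ~100 MBytes/s

bottleneck = min(link_gbps, pcix_gbps, disk_gbps)
print(f"disk-to-disk bottleneck: {bottleneck:.1f} Gbits/s")

# With N disks striped in parallel, disk throughput scales roughly as N,
# until the bus becomes the limit (hypothetical N=12 for illustration):
n_disks = 12
striped = min(link_gbps, pcix_gbps, n_disks * disk_gbps)
print(f"{n_disks} parallel disks: {striped:.1f} Gbits/s")
```

With a single disk the disk dominates; with enough disks in parallel the PCI-X bus becomes the limit, which is why the answer points at both the bus and the disk/file system.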
- How does this compare with the Internet Land Speed Record (LSR)?
- The LSR is for a single host sending to a single host. The Bandwidth Challenge used
many hosts (the SLAC booth alone had seven Sun AMD Opteron hosts) and multiple
10 Gbits/s paths (Caltech had 7 and SLAC had 3). The LSR also factors in the
distance between the hosts.
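Because the LSR factors in distance, its metric is throughput multiplied by the terrestrial distance between the hosts (bit-metres per second). A sketch with purely illustrative numbers, not an actual record entry:

```python
# LSR-style metric: throughput x distance, in bit-metres per second.
# The 5 Gbits/s and 10,000 km figures are hypothetical, for illustration.
throughput_gbps = 5.0
distance_km = 10_000

bit_metres_per_s = throughput_gbps * 1e9 * distance_km * 1e3
print(f"{bit_metres_per_s / 1e15:.0f} petabit-metres/second")
```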
Created November 29, 2004: Les Cottrell.