IEPM

Bulk Throughput Measurements - Manchester University, England

Les Cottrell, created May 18 '02

Configurations

Richard Hughes-Jones and the HEP group at Manchester University set up a host (henceforth referred to as node1.man.ac.uk; this is not its real name, which is withheld for security reasons). Node1 was an 800 MHz PIII with a GE interface to a Cisco 65xx switch, and from there to the Manchester MAN and SuperJANET (2.5 Gbps backbone, I think). It was running Linux 2.4.14 with a standard TCP stack. The windows/buffers were set as follows:
cat /proc/sys/net/core/wmem_max = 8388608
cat /proc/sys/net/core/rmem_max = 8388608
cat /proc/sys/net/core/rmem_default = 65536
cat /proc/sys/net/core/wmem_default = 65536
cat /proc/sys/net/ipv4/tcp_rmem = 4096 87380 4194304
cat /proc/sys/net/ipv4/tcp_wmem = 4096 65536 4194304
At the SLAC end was an 1133 MHz PIII, also running Linux 2.4, with a 3COM GE interface to a Cisco 6509 and then via GE to a 622 Mbps ESnet link. The TCP stack at the SLAC end was Web100. The windows/buffers settings were:
cat /proc/sys/net/core/rmem_max = 8388608
cat /proc/sys/net/core/rmem_default = 65536
cat /proc/sys/net/core/wmem_default = 65536
cat /proc/sys/net/ipv4/tcp_rmem = 4096 87380 4194304
cat /proc/sys/net/ipv4/tcp_wmem = 4096 65536 4194304
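
The values above are as read back from /proc. As a rough sketch (illustrative only; these are not the actual commands used on node1 or at SLAC), equivalent settings could be applied on a Linux 2.4 host with sysctl:

    sysctl -w net.core.wmem_max=8388608        # maximum send socket buffer
    sysctl -w net.core.rmem_max=8388608        # maximum receive socket buffer
    sysctl -w net.core.rmem_default=65536
    sysctl -w net.core.wmem_default=65536
    sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"   # min/default/max TCP receive buffer
    sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"   # min/default/max TCP send buffer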

Measurement methodology

We used tcpload.pl to make 10 second iperf TCP measurements. For each measurement we used a fixed window and number of parallel streams, and measured the client end (SLAC) CPU times, the iperf throughput and the ping responses (loaded), and also recorded various Web100 variables:
 @vars=("StartTime",
               "PktsOut",        "DataBytesOut",
               "PktsRetrans",    "CongestionSignals",
               "SmoothedRTT",    "MinRTT",            "MaxRTT", "CurrentRTO",
               "SACKEnabled",    "NagleEnabled",
               "CurrentRwinSent","MaxRwinSent",       "MinRwinSent",
               "SndLimTimeRwin", "SndLimTimeCwnd",    "SndLimTimeSender");

Following each iperf measurement we ran ping for 10 seconds (unloaded) and recorded the responses. After each such pair (a 10 second iperf measurement followed by 10 seconds with no iperf traffic), the window size was changed and the pair repeated. When all selected window sizes had been measured, a different number of streams was selected and the cycle repeated.
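
As a rough sketch of one pass of this cycle (the actual driver was tcpload.pl; the iperf and ping invocations, window sizes and stream counts below are illustrative assumptions, not the real script):

    # Sweep windows inside streams; each setting gets a 10 s loaded iperf run
    # with concurrent pings, followed by ~10 s of unloaded pings.
    for streams in 5 10 20 40 60 80 100; do
        for window in 256K 512K 1024K; do
            ping -c 10 node1.man.ac.uk > loaded_ping.txt &         # pings during the transfer (loaded)
            iperf -c node1.man.ac.uk -t 10 -w $window -P $streams  # 10 s TCP measurement
            wait                                                   # let the loaded pings finish
            ping -c 10 node1.man.ac.uk > unloaded_ping.txt         # ~10 s of unloaded pings
        done
    done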

Results

A traceroute from SLAC to Manchester is shown below:

traceroute to node1.man.ac.uk, 30 hops max, 38 byte packets
 1  RTR-GSR-TEST ()  0.224 ms  0.209 ms  0.094 ms
 2  RTR-DMZ1-GER ()  0.334 ms  0.225 ms  0.236 ms
 3  SLAC-RT4.ES.NET (192.68.191.146)  0.341 ms  0.295 ms  0.338 ms
 4  snv-pos-slac.es.net (134.55.209.1)  0.698 ms  0.746 ms  0.693 ms
 5  chi-s-snv.es.net (134.55.205.102)  48.660 ms  51.432 ms  48.768 ms
 6  nyc-s-chi.es.net (134.55.205.105)  69.305 ms  68.903 ms  68.830 ms
 7  ny-pop.ja.net (193.62.157.213)  68.948 ms  69.109 ms  68.835 ms
 8  london-bar5.ja.net (146.97.37.89)  148.916 ms  148.994 ms  149.172 ms
 9  po15-0.lond-scr.ja.net (146.97.35.137)  148.810 ms  148.392 ms  148.335 ms
10  po2-0.read-scr.ja.net (146.97.33.74)  149.745 ms  149.927 ms  149.976 ms
11  po0-0.warr-scr.ja.net (146.97.33.54)  153.731 ms  153.209 ms  162.060 ms
12  manchester-bar.ja.net (146.97.35.46)  153.498 ms  153.777 ms  153.963 ms
13  gw-nnw.core.netnw.net.uk (146.97.40.178)  153.970 ms  153.883 ms  153.978 ms
14  gw-man.netnw.net.uk (194.66.25.98)  153.496 ms  153.681 ms  153.961 ms
15  * * *
The throughput topped out at 469 Mbits/s (90 streams, 1024 KB window). I estimate (using http://www.indo.com/distance/) the distance from San Francisco to Chicago to NYC to London to Manchester to be about 8000 km. For a throughput of 469 Mbits/s this gives a bandwidth-distance product of about 3,752,000 Mbps-km. This is about 70% of the multi-stream Internet2 Land Speed Record (http://www.internet2.edu/html/i2lsr.shtml).
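
A quick back-of-the-envelope check of the numbers above (a sketch only; the 8000 km distance and the roughly 154 ms round trip time seen in the traceroute are the approximate figures used):

    echo "469 * 8000" | bc                 # bandwidth-distance product: 3752000 Mbps-km
    echo "scale=1; 469 * 0.154 / 8" | bc   # aggregate TCP window needed at 154 ms RTT: ~9.0 MBytes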

A plot of the throughputs vs streams and windows is seen below:


There does not appear to be a lot of congestion. Plotting the Web100 SmoothedRTT and CongestionSignals vs throughput gives the plot below:

Questions

Were the IN2P3 transfers idle at the time? Wouldn't they normally use 255 Mbps of SLAC's 622 Mbps? Stephen Gowdy.

Both IN2P3 links (one via CERN with a 155 Mbps capacity, the other directly to Renater, limited to 100 Mbps) use ESnet to get to SLAC. The IN2P3 link via CERN was idle. The Renater link was carrying about 40 Mbits/s. Just for the record, the measurements were made between 19:12 and 20:28 on Friday May 17 2002 PDT. The IN2P3 graphs are shown below in Lyon time:

The SLAC ESnet link utilization is shown below. The measurements to Manchester were made between 19:12 and 20:28 on Friday May 17 2002 PDT; each measurement (stream/window setting) ran for 10 seconds, followed by a 10 second delay.
 
Do you understand what causes the dips in throughput at the 256KB, 512KB, & 1024KB windows as the streams increase beyond 60, 20, & 10, respectively?  The pattern looks somewhat consistent (i.e., caused by the same phenomenon), just left-shifted as the window size increases.  Is it an artifact of the test methodology, or would you really expect to receive, for example, three times the throughput at 25 streams versus 20 streams with a 1024KB window?  I believe I've seen these dips on similar throughput graphs you've shown, but I can't recall if the dips were ever identified as significant or otherwise explained... Phil DeMar and Connie Logg.
 
To understand whether there is a consistent pattern here, or whether it is due to some short term external effect such as cross-traffic at the time the 10 second measurement is made, we measured the throughputs multiple times for various numbers of streams and for window sizes of 256 KBytes, 512 KBytes, 1024 KBytes and 2048 KBytes. The hope was that the repeated measurements would show either a consistent pattern or, for a given number of streams and window size, a scatter depending, for example, on the cross-traffic during each 10 second measurement.
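
A rough sketch of these repeat runs (illustrative only; the flags follow the earlier sketch, and the number of repeats and the stream list are assumptions):

    # Repeat each window/stream setting several times to expose any scatter.
    for window in 256K 512K 1024K 2048K; do
        for streams in 10 20 40 60 80 100; do
            for rep in 1 2 3 4 5; do
                iperf -c node1.man.ac.uk -t 10 -w $window -P $streams
                sleep 10   # idle gap between measurements
            done
        done
    done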

The graphs below show the results. It appears that there is a lot of variability in the measurements as one moves up the slope towards the knee. When a large number of streams is used the variation appears to be smaller, perhaps because as one stream becomes congestion limited, others take up the slack. It also appears that the averages and maxima for each stream setting do not exhibit big fluctuations. Thus I do not believe there is a consistent pattern caused by, for example, a feature of the measurement methodology; rather, I believe the fluctuations seen in the single measurement per window/stream setting are due to instantaneous external effects such as cross-traffic, as opposed to some phenomenon of the measurement itself. The smooth linear increase in throughput with window size, for the smaller window sizes, is consistent with the dips for large windows and streams being caused by congestion with cross-traffic, since the smaller window sizes do not drive enough throughput to cause such congestion.

For completeness, the smoothed RTT (SRTT) and the number of congestion signals recorded by Web100 during the 10 second measurements are shown below as a function of the TCP throughput recorded by iperf. It is seen that the SRTT begins to climb after about 320 Mbits/s. It is also noticeable that, even for the high throughputs, many of the measurements indicate no congestion. Further investigation of the smoothed RTT behavior with throughput indicates that there is very little correlation with the window size, the number of streams, or the product of the two. The maximum source CPU utilization for any measurement was 20%.

Comments to iepm-l@slac.stanford.edu