TCP Stack Measurements on SC03 10Gbits/s Links
Les Cottrell. Created 16 Dec '03
During SuperComputing 2003 (SC03), we made some tentative TCP performance measurements on 10Gbits/s links between hosts at the SLAC booth at the Phoenix convention center, a host at the SLAC/Stanford Point of Presence at the Palo Alto Internet eXchange (PAIX), and hosts at StarLight in Chicago. Due to the short time we had access to these links (~3 days) and the emphasis on demonstrating maximum throughput for the SC03 Bandwidth Challenge, these measurements are necessarily incomplete; however, some of the tentative results are felt to be worth reporting.
All hosts used Intel PRO/10GbE LR NICs plugged into 133MHz, 64-bit PCI-X slots and
ran Linux 2.4.19 or more recent (the most recent was 2.4.22). At PAIX there was
a Dell 2650 PowerEdge with two 3.06GHz Xeon cpus. It was connected directly to a
Cisco 15540 DWDM multiplexer at 10Gbits/s. The 10Gbits/s wavelength was carried
to LA by CENIC where it was plugged into a 10GE interface in a Cisco HPR router.
On the other side of the router an OC192 POS interface sent the signal on a
Level(3) circuit via San Diego to Phoenix, where it was plugged into a Juniper router at SCinet, and thence through a Force10 E1200 to a Cisco 6506 at
the SLAC booth. Three twin cpu Dell 2650s with Intel 10GE NICs were plugged into
the SLAC booth Cisco 6506. Two of the Dell 2650s had dual 3.06GHz cpus and the third
had 2.4GHz cpus. The PAIX/LA/Phoenix link was dedicated to the SLAC and Caltech
booth traffic. The route appeared as:
[root@antonia ~]# traceroute 126.96.36.199
traceroute to 188.8.131.52 (184.108.40.206), 30 hops max, 38 byte packets
1 B1.211.sc03.org (220.127.116.11) 0.203 ms 0.129 ms 0.119 ms
2 scinet-211-C.sc03.org (18.104.22.168) 2.016 ms 1.405 ms *
3 core-rtr-2-bwc-rtr-1.sc03.org (22.214.171.124) 0.306 ms 0.258 ms 0.253 ms
4 slac-core-rtr-2.sc03.org (126.96.36.199) 9.851 ms 9.635 ms 9.644 ms
5 hpr-slac-sc03--lax-hpr.cenic.net (188.8.131.52) 17.268 ms 17.252 ms 17.248 ms
A second 10Gbits/s link was also used via the shared
Abilene backbone to StarLight/Chicago. At Starlight there was an HP Integrity
rx2600 system with dual 1.5GHz Itanium CPUs and 8GB RAM. The
route appeared as:
[root@iphicles root]# /usr/sbin/traceroute 184.108.40.206
traceroute to 220.127.116.11 (18.104.22.168), 30 hops max, 38 byte packets
1 22.214.171.124 (126.96.36.199) 0.455 ms 0.200 ms 0.191 ms
2 188.8.131.52 (184.108.40.206) 2.075 ms 1.394 ms *
3 220.127.116.11 (18.104.22.168) 0.383 ms 0.385 ms 0.334 ms
4 22.214.171.124 (126.96.36.199) 9.659 ms 9.629 ms 9.621 ms
5 188.8.131.52 (184.108.40.206) 17.138 ms 17.112 ms 17.111 ms
6 220.127.116.11 (18.104.22.168) 52.442 ms 52.425 ms 58.185 ms
7 22.214.171.124 (126.96.36.199) 325.548 ms 344.866 ms 329.918 ms
8 188.8.131.52 (184.108.40.206) 65.599 ms 65.422 ms 65.476 ms
9 220.127.116.11 (18.104.22.168) 65.526 ms 65.478 ms 65.468 ms
Even though the Abilene backbone was shared, the typical cross-traffic was a few hundred Mbits/s, so to first order these 10Gbits/s links were dedicated to our use, unlike the production network results reported in TCP Stacks Testbed.
We also had access via Abilene/Chicago to an HP Integrity rx2600
system with dual 1.5GHz cpus with 4GB RAM in Amsterdam/NIKHEF. The route to this
host was as follows:
[root@antonia ~]# /usr/sbin/traceroute 126.96.36.199
traceroute to 188.8.131.52 (184.108.40.206), 30 hops max, 38 byte packets
1 B1.211.sc03.org (220.127.116.11) 0.198 ms 0.130 ms 0.118 ms
2 scinet-211-B.sc03.org (18.104.22.168) 0.928 ms 1.781 ms 1.459 ms
3 core-rtr-1-a-bwc-rtr-1-a.sc03.org (22.214.171.124) 0.303 ms 0.294 ms 0.253 ms
4 abilene-core-rtr-1.sc03.org (126.96.36.199) 10.727 ms 9.677 ms 9.615 ms
5 snvang-losang.abilene.ucaid.edu (188.8.131.52) 24.372 ms 17.069 ms 21.460 ms
6 kscyng-snvang.abilene.ucaid.edu (184.108.40.206) 52.696 ms 57.096 ms 59.529 ms
7 * iplsng-kscyng.abilene.ucaid.edu (220.127.116.11) 251.159 ms *
8 * * chinng-iplsng.abilene.ucaid.edu (18.104.22.168) 314.288 ms
9 rc-lab-11.nc3a.nato.int (22.214.171.124) 174.666 ms 174.583 ms 174.578 ms
We set up the sending hosts at SC2003 with the Caltech FAST TCP stack, and with the DataTAG multi-TCP stack that allowed dynamic selection (without reboot) of the standard Linux TCP stack (New Reno with fast retransmit), the Manchester University implementation of High Speed TCP (HS-TCP), and the Cambridge University Scalable TCP stack. By default we set the Maximum Transmission Unit (MTU) to 9000Bytes and the transmit queue length (txqueuelen) to 2000 packets.
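The MTU and txqueuelen settings above can be applied with standard Linux tools; a minimal sketch (the interface name eth1 and the 32MByte socket buffer limits are illustrative assumptions, not taken from the measurements) might look like:

```shell
# Set jumbo frames (9000 byte MTU) and a 2000 packet transmit queue
# on the 10GE interface (the name eth1 is an assumption).
/sbin/ifconfig eth1 mtu 9000 txqueuelen 2000

# Allow TCP windows up to 32 MBytes (values are illustrative, chosen to
# cover the largest window used in these tests).
/sbin/sysctl -w net.core.rmem_max=33554432
/sbin/sysctl -w net.core.wmem_max=33554432
/sbin/sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
/sbin/sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
```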
We made two sets of multi-stack measurements.
The effects of the Caltech/SLAC/FNAL bandwidth demo on the external router 10 Gbps interfaces with LA/PAIX, Abilene and Teragrid can be seen below:
Similar results are seen by looking at the Force 10 interfaces to the SLAC booth. More details on the measurements with Scalable, HS-TCP and stock TCP.
The effects of the measurements on the SCInet router external interfaces (facing LA/PAIX and Abilene) are shown below. The stacks used and the window sizes are labeled. In the case of HS-TCP with an 8MByte window and Scalable with a 16MByte window, we changed the MTU from 9000Bytes to 1500Bytes half-way through each test. To start the FAST tests we had to reboot the sending host, and due to an oversight the FAST TCP measurements were all made with MTUs of only 1500Bytes. Also, following the reboot, the txqueuelen was set to 100 packets.
On the Phoenix to PAIX link we used maximum window sizes of 8MBytes, 16MBytes and 32MBytes. This bracketed the nominal optimum window size of ~20MBytes calculated from the bandwidth delay product (17ms * 10Gbits/s). For the PAIX link, all the tests were made with a single TCP stream. For Reno, HS-TCP and Scalable there was little observable difference in behavior between the stacks:
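The optimum window quoted above follows directly from the bandwidth delay product; a short calculation (a sketch, using the RTTs and line rate from the text):

```python
def bdp_mbytes(rtt_ms, rate_gbps):
    """Bandwidth*delay product in MBytes for a given RTT and line rate."""
    bits_in_flight = (rtt_ms / 1000.0) * rate_gbps * 1e9  # bits on the wire
    return bits_in_flight / 8 / 1e6                        # convert to MBytes

# Phoenix <-> PAIX: 17 ms RTT at 10 Gbits/s
print(bdp_mbytes(17, 10))   # ~21 MBytes, bracketed by the 16 and 32 MByte windows
```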
The 4.3Gbits/s limit was slightly less than the ~5.0Gbits/s achieved with UDP transfers in the lab between back-to-back 3.06GHz Dell PowerEdge 2650 hosts. The limitation in throughput is believed to be due to host factors (CPU speed, memory speed or the I/O chipset). The relative decrease in throughput going from a 9000Byte MTU to a 1500Byte MTU was roughly proportional to the reduction in MTU size. This may be related to the extra CPU cycles required to process six times as many, but six times smaller, packets. With 1500Byte MTUs, back-to-back UDP transfers in the lab between 3.06GHz Dell PowerEdge 2650 hosts achieved about 1.5Gbits/s, or about twice the 700Mbits/s achieved with the SC03 long-distance TCP transfers.
The BDP indicates a window size of about 80MBytes is needed (65ms * 10Gbits/s). With a single Reno stream, we only had time to try windows of 8MBytes and 16MBytes, and achieved stable average throughputs of only 767Mbits/s and 1530Mbits/s respectively. With 10 Reno streams, the throughput was much less stable (see the 16384KByte and 32768KByte graphs). With the 32768KByte window, we sustained stable peaks of over 3.9Gbits/s for several minutes (the Stability Index for the peak periods was about 12%) and an average throughput of about 3Gbits/s (Stability Index 39%). For the 16384KByte window there were short peaks of ~2.5Gbits/s and an average throughput of ~970Mbits/s (Stability Index 72%). From the SCInet router utilization data there was no evidence of congestion in SCInet or from SCInet to Abilene; however, there may have been other sources of congestion on the path (although the RTTs also show no evidence of congestion) or in the host at Chicago.
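The Stability Index figures above are consistent with the ratio of the standard deviation to the mean throughput (the coefficient of variation, expressed as a percent); assuming that definition, a quick check with the numbers from the table reproduces them:

```python
def stability_index(stdev_mbps, mean_mbps):
    # Assumed definition: coefficient of variation as a percentage.
    return 100.0 * stdev_mbps / mean_mbps

# 16384 KByte window: stdev 699 Mbits/s over a 972 Mbits/s average
print(round(stability_index(699, 972)))    # 72, matching the quoted 72%

# 32768 KByte window: stdev 1198 Mbits/s over a 3063 Mbits/s average
print(round(stability_index(1198, 3063)))  # 39, matching the quoted 39%
```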
The configurations and results are summarized in the table below. Clicking on the stack name in the first column will display the time series result of the measurement.
|Host (RTT)|Time|Stack|Streams|Window|MTU|txqueuelen|Throughput Mbits/s|Stdev Mbits/s|Stability|
|Chicago (65ms)|6:25|Reno|10|16384KB|9000B|2000|2500 max / 972 avg|699|72%|
|Chicago (65ms)|6:55|Reno|10|32768KB|9000B|2000|4000 max / 3063 avg|449 / 1198|12% / 39%|
Comments to firstname.lastname@example.org