Extreme Bandwidth: SC2002 Bandwidth Challenge | Internet Land Speed Record | TCP stack & jumbo frame measurements

Tests performed on 3/3/03

Les Cottrell and Fabrizio Coccetti
Sections: Overview | General setup | Stacks Comparison | Single stack results
Subsections: Overview | Topology | Receiver setup | Sender setup | Jumbo Frames | Standard Frames | Stock TCP | Scalable TCP | Fast TCP | High Speed TCP | Test Transfer using Rapid

All results can be found in this Excel workbook.

1. Overview

We performed several throughput measurements between two hosts with 10 GE interfaces, using stock TCP (Reno), Fast TCP, Scalable TCP and High Speed TCP. The results show that many variables must be taken into account to improve performance, among them the TCP memory settings, the window size, the MTU and the txqueuelen; a small change in any one of these values can alter the results significantly. This document begins with a description of the testbed and of the general settings for the Sender and Receiver hosts. In the next section we compare the best results we could obtain for the four TCP stacks, using Jumbo and Standard MTUs. The following section shows the measurements obtained by varying the MTU, the txqueuelen and other settings; results are grouped by TCP stack.

At the end of the page we report a test transfer using Rapid: we were able to transfer from Sunnyvale to Geneva the equivalent of 4 high-quality DVD movies in one minute.

More information on the setup and other measurements can be found at 10GE end-to-end TCP tests, and High Performance Networking: 10GbE test.


2. General setup

Topology

All tests were performed from a Sunnyvale 10GE machine (198.51.111.90) to a Geneva 10GE machine (192.91.239.213);
the link had a 2.5 Gbps limitation. This is the map of the network.
We checked that both hosts had enough free memory before every measurement.
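
One quick way to make this check on Linux (a minimal sketch; the exact commands we used were not recorded) is:

free -m                      # total, used and free memory, in MB
grep MemFree /proc/meminfo   # free physical memory as reported by the kernel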

Receiver setup (GVA)

Unless different values are specified, the receiver was always configured using:

echo "4096 87380 128388607" > /proc/sys/net/ipv4/tcp_rmem
echo "4096 65530 128388607" > /proc/sys/net/ipv4/tcp_wmem

echo 128388607 > /proc/sys/net/core/wmem_max
echo 128388607 > /proc/sys/net/core/rmem_max
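
These /proc writes set the same kernel variables that appear by name in the Sender setup below; a minimal way to read the values back and confirm they took effect (our addition, not part of the original procedure) is:

sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem                   # min/default/max triples, in bytes
cat /proc/sys/net/core/rmem_max /proc/sys/net/core/wmem_max  # socket buffer ceilings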

A specific configuration of the PCI bus was used for the Intel 10 GE NIC.

Sender setup (SNV)

The sender used the following values, unless specified otherwise:

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
kernel.core_uses_pid = 1
vm.bdflush = 100 1200 128 512 15 5000 500 1884 2
vm.max-readahead = 256
vm.min-readahead = 128
fs.file-max = 32768
kernel.shmmax = 1073741824
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 8388608 8388608 67108864
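
The exact mechanism used to load these values is not recorded here; a typical way to apply them (a sketch, assuming the standard sysctl tools) is either at runtime with "sysctl -w", or by adding the lines above to /etc/sysctl.conf and reloading:

sysctl -w net.core.rmem_max=67108864
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
sysctl -w net.ipv4.tcp_mem="8388608 8388608 67108864"
# ...and so on for the remaining variables, or:
sysctl -p    # reload everything listed in /etc/sysctl.conf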

We did not use any "/sbin/setpci" command for the Intel 10GE card, as we observed better performance with the standard values provided at boot time on a Linux box.


3. Stacks Comparison

Jumbo Frames

Interfaces on Sender and Receiver configured with Jumbo frames (MTU=9000B).
For all measurements: Sender window size = 40 M, Receiver window size = 128 M
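
As an illustration of the interface setup (the interface name eth2 and the use of iperf are our assumptions, not recorded above), jumbo frames and a large transmit queue might be configured as follows:

ifconfig eth2 mtu 9000            # enable 9000-byte jumbo frames
ifconfig eth2 txqueuelen 10000    # enlarge the interface transmit queue
# hypothetical throughput run with a 40 MB sender window, if iperf were the tool:
# iperf -c 192.91.239.213 -w 40M -t 60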

A) Setup of the tcp window settings of the Sender (SNV) for Fast TCP:
net.ipv4.tcp_rmem = 4096 33554432 134217728
net.ipv4.tcp_wmem = 4096 33554432 134217728
net.ipv4.tcp_mem = 4096 33554432 134217728

B) Setup of the tcp window settings of the Sender (SNV) for Stock, Scalable and HSTCP:
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 8388608 8388608 67108864

Note that Fast TCP did not perform well with the settings described in B).
If the Sender adopted those settings, the throughput achieved was ten times lower, as shown in this section.

Summary plot of different TCP stacks from SNV to GVA, using Jumbo.

Blowup of the summary plot of different TCP stacks from SNV to GVA, using Jumbo.

 

Standard Frames

Interfaces on Sender and Receiver configured with standard frames (MTU=1500B).
For all measurements: Sender window size = 40 M, Receiver window size = 128 M

A) Setup of the tcp window settings of the Sender (SNV) for Fast TCP:
net.ipv4.tcp_rmem = 4096 33554432 134217728
net.ipv4.tcp_wmem = 4096 33554432 134217728
net.ipv4.tcp_mem = 4096 33554432 134217728

B) Setup of the tcp window settings of the Sender (SNV) for Stock, Scalable and HSTCP:
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 8388608 8388608 67108864


4. Single stack results

Stock TCP

For all measurements: Sender window size = 40 M, Receiver window size = 128 M



A) Setup of the tcp window settings of the Sender (SNV) for the two upper plots:
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 8388608 8388608 67108864

B) Setup of the tcp window settings of the Sender (SNV) for the bottom plot:
net.ipv4.tcp_rmem = 4096 33554432 134217728
net.ipv4.tcp_wmem = 4096 33554432 134217728
net.ipv4.tcp_mem = 4096 33554432 134217728

 

Scalable TCP

For the plot at the bottom right: Sender window size = 32 M, Receiver window size = 32 M
For the others: Sender window size = 40 M, Receiver window size = 128 M



Setup of the tcp window settings of the Sender (SNV):
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 8388608 8388608 67108864

 

Fast TCP

Comparing plot A) on the left with plot B) on the right below, we see a large difference in the throughput we could achieve.
Both measurements used the same MTU, window size and txqueuelen values; the only difference was in the TCP memory settings.
The first measurement (plot on the left) was taken with larger TCP memory values than the second (plot on the right).
Settings are reported below the plots.

For both measurements: Sender window size = 40 M, Receiver window size = 128 M

A) Setup of the tcp window settings of the Sender (SNV) for the plot on the left:
net.ipv4.tcp_rmem = 4096 33554432 134217728
net.ipv4.tcp_wmem = 4096 33554432 134217728
net.ipv4.tcp_mem = 4096 33554432 134217728

B) Setup of the tcp window settings of the Sender (SNV) for the plot on the right:
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 8388608 8388608 67108864
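
For reference, and as our reading of the standard Linux meaning of these variables (not something stated in the original notes): tcp_rmem and tcp_wmem are per-socket limits in bytes, given as min/default/max, while tcp_mem is a system-wide limit in memory pages, given as low/pressure/high. Annotated, the A) values read:

net.ipv4.tcp_rmem = 4096 33554432 134217728   # receive buffer per socket: min/default/max, bytes
net.ipv4.tcp_wmem = 4096 33554432 134217728   # send buffer per socket: min/default/max, bytes
net.ipv4.tcp_mem = 4096 33554432 134217728    # total TCP memory: low/pressure/high, pages

The much larger default socket buffers and global memory allowance in A) may help explain the gap between the two plots.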


 

High Speed TCP

The wad.conf file, used to pass configuration parameters to HS TCP:

[GVA]
src_addr: 0.0.0.0/32
src_port: 0
dst_addr: 192.91.239.0/24
dst_port: 0
mode: 1
sndbuf: 41943040
rcvbuf: 41943040
wadai: 6
wadmd: .3
maxssth: 100
wad_ifq: 1
divide: 1
floyd: 1
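
A note on the buffer values (our reading; the file itself does not spell this out): sndbuf and rcvbuf are in bytes, and 41943040 B = 40 × 1024 × 1024 B = 40 MB, matching the 40 M sender window used in the runs below; the 134217728 B variant used in the first run corresponds in the same way to 128 MB. The wadai and wadmd entries appear to be the additive-increase and multiplicative-decrease parameters of the modified stack.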

Using Jumbo frames (MTU = 9000B) and txqueuelen = 10000 packets

Sender: MTU = 9000, txq = 10000, window size = 40M
net.ipv4.tcp_rmem = 4096 33554432 134217728
net.ipv4.tcp_wmem = 4096 33554432 134217728
net.ipv4.tcp_mem = 4096 33554432 134217728
wad.conf: sndbuf: 134217728, rcvbuf: 134217728

Receiver window size = 128M

Sender: MTU = 9000, txq = 10000, window size = 40M
net.ipv4.tcp_rmem = 4096 33554432 134217728
net.ipv4.tcp_wmem = 4096 33554432 134217728
net.ipv4.tcp_mem = 4096 33554432 134217728
wad.conf: sndbuf: 41943040, rcvbuf: 41943040

Receiver window size = 128M

 

Sender: MTU = 9000, txq = 10000, window size = 40M
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mem = 8388608 8388608 67108864
wad.conf: sndbuf: 41943040, rcvbuf: 41943040

Receiver: window size = 128M

Using Jumbo frames (MTU = 9000B) and txqueuelen = 3000 packets

Sender: MTU = 9000, txq = 3000, window size = 40M
net.ipv4.tcp_rmem = 4096 33554432 134217728
net.ipv4.tcp_wmem = 4096 33554432 134217728
net.ipv4.tcp_mem = 4096 33554432 134217728
wad.conf: sndbuf: 134217728, rcvbuf: 134217728

Receiver window size = 128M

 

Sender: MTU = 9000, txq = 3000, window size = 32M

Receiver window size = 32 M

Sender: MTU = 9000, txq = 3000, window size = 40M

Receiver window size = 128 M

Using standard frames (MTU = 1500B)

Sender: MTU = 1500, txq = 100, window size = 32 M

Receiver window size = 32 M

Sender: MTU = 1500, txq = 3000, window size = 32 M

Receiver window size = 32 M

Sender: MTU = 1500, txq = 3000, window size = 40 M

Receiver window size = 128 M

Sender: MTU = 1500, txq = 10000, window size = 40 M


Receiver window size = 128 M


Test transfer using Rapid

Running Rapid for one minute from Sunnyvale to Geneva,
we were able to transfer more than 16 GBytes of data.
In other words, we shipped the equivalent of 4 high-quality DVD movies from the USA to Europe in one minute.
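
As a consistency check against the transcript below: 16,388,608,000 bytes × 8 / 59.991 s ≈ 2185.5 Mbit/s, which matches the throughput reported by rapid and is close to the 2.5 Gbps limit of the link. The total itself appears to follow from the command-line arguments (-l16388608 bytes per transfer × -n1000 transfers), and 16.4 GBytes is indeed roughly the size of four ~4 GB DVD movies.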

[root@w02gva fc]# ./rapid -r -i192.91.239.213 -l16388608 -n1000 -R154217728

Server bound!

Server listening!

Client accepted!

Network activity progressing...



statistics for process: 0

-------------------------

amount transferred: 16388608000.0 bytes

start time: 1046671178.446

stop time: 1046671238.437

total time: 59.991 secounds

throughput: 2185.476 megabits/sec



totals:

=======

total amount transferred: 16388608000.0 bytes

the earliest start time of any child: 1046671178.446

the latest stop time of any child: 1046671238.437

total elapsed time: 59.991 seconds

total throughput: 2185.476 megabits/sec



Done!

---------------------------

[root@cit-slac19 fc]# ./rapid -s -i192.91.239.213 -l16388608 -S42543040 -n1000

Client connected!



Done!

-----------------------------

[root@cit-slac19 fc]# ping 192.91.236.245

PING 192.91.236.245 (192.91.236.245) from 198.51.111.90 : 56(84) bytes of data.

64 bytes from 192.91.236.245: icmp_seq=1 ttl=61 time=65.1 ms

64 bytes from 192.91.236.245: icmp_seq=2 ttl=61 time=64.7 ms

64 bytes from 192.91.236.245: icmp_seq=3 ttl=61 time=64.7 ms

64 bytes from 192.91.236.245: icmp_seq=4 ttl=61 time=64.7 ms



--- 192.91.236.245 ping statistics ---

4 packets transmitted, 4 received, 0% loss, time 3034ms

rtt min/avg/max/mdev = 64.786/64.879/65.152/0.349 ms

----------------------------

[root@cit-slac19 fc]# traceroute -n 192.91.236.245

traceroute to 192.91.236.245 (192.91.236.245), 30 hops max, 38 byte packets

1 198.51.111.89 0.629 ms 0.283 ms 0.260 ms

2 192.5.175.129 64.930 ms 64.922 ms 64.883 ms

3 192.65.184.158 64.939 ms 64.894 ms 64.884 ms

4 192.91.236.245 64.808 ms 64.809 ms 64.799 ms

---------------------------

[root@cit-slac19 fc]# date

Sun Mar 2 22:11:44 PST 2003

--------------------------

[root@w02gva fc]# ping 198.51.111.90

PING 198.51.111.90 (198.51.111.90) from 192.91.239.213 : 56(84) bytes of data.

64 bytes from 198.51.111.90: icmp_seq=1 ttl=60 time=183 ms

64 bytes from 198.51.111.90: icmp_seq=2 ttl=60 time=183 ms

64 bytes from 198.51.111.90: icmp_seq=3 ttl=60 time=183 ms

64 bytes from 198.51.111.90: icmp_seq=4 ttl=60 time=183 ms

64 bytes from 198.51.111.90: icmp_seq=5 ttl=60 time=183 ms



--- 198.51.111.90 ping statistics ---

5 packets transmitted, 5 received, 0% loss, time 4038ms

rtt min/avg/max/mdev = 183.013/183.022/183.044/0.270 ms

-------------------------

[root@w02gva fc]# traceroute -n 198.51.111.90

traceroute to 198.51.111.90 (198.51.111.90), 30 hops max, 38 byte packets

1 192.91.239.214 0.294 ms 0.150 ms 0.144 ms

2 192.91.239.225 118.468 ms 118.378 ms 118.376 ms

3 192.65.184.157 118.491 ms 118.440 ms 118.427 ms

4 192.5.175.130 183.407 ms 183.286 ms 183.181 ms

5 198.51.111.90 183.030 ms 183.021 ms 183.022 ms

-------------------------

[root@w02gva fc]# date

Mon Mar 3 07:07:34 CET 2003

[ Top ]