"haste" machines (Dell PowerEdge 2650s from LANL):
  198.51.111.94: Sunnyvale in haste3 (removed 2/28/03) -> gw 198.51.111.93 (Cisco GSR 12406)
  192.5.175.133: Chicago in haste2 (removed 2/28/03) -> Juniper T640
Disk array machines from Caltech/CERN:
  198.51.111.90: Sunnyvale in cit-slac-19 eth1 -> gw 198.51.111.89 (Cisco GSR 12406)
  192.91.236.245: Chicago in v12chi -> gw 192.91.236.246 (Cisco 7609)
  192.91.239.213: Geneva in w02chi -> gw 192.91.239.214
The buffer in the bottleneck router (the Cisco 7609 at Geneva) was increased from 2048 packets to the maximum of 4096 packets early on 2/27/03.
1.2Gbps with 1500B MTU & FAST back to back between 198.51.111.90 and haste3 (198.51.111.94) [Eric]
1.7Gbps with 8160B MTU & FAST back to back between the same two hosts [Eric]
Send UDP data at 1.9 Gbps from GVA (the maximum sending rate reachable with iperf) [Sylvain].
Sylvain reported "I have very poor performance from CERN. I can only send UDP traffic at 30 Mbps from the 10 GbE interface."
TCP (Reno) performance is very poor, around 320 Mbps using jumbo frames (from w02gva to a station at Chicago which has a GbE interface). Packets are lost when the throughput reaches 550 Mbps. [Sylvain]
Can send TCP traffic at 2.2 Gbps from Chicago to the 10 GE Intel card at CERN using 3 TCP (Reno) streams and jumbo frames. We are close to the saturation of the transatlantic link, which means that the Intel card installed at CERN can receive more than 2.2 Gbps of TCP traffic. [Sylvain]
Stock TCP from SNV to GVA: max was 70Mbps [Eric]
From SNV to CHI, on the 1st day we installed the card (Monday), Fabrizio & Eric got 1.3Gbits/s with HSTCP from SNV (.90) to CHI.
From SNV to CHI with FAST, David got 1.35Gbps.
Between SNV (.90) and CHI somebody reported "I think the CPU is the bottleneck finally. UDP can reach only 1.5Gbps from .90 to chicago and CPU was nearly fully used"
It appears we have major problems between SNV and GVA; SNV to CHI is at least an order of magnitude better. We need to understand this in more detail. How does it look with UDP, and how does it look between CHI & SNV?
1. UDP tests of receiving rate (sending >= 1.5Gbps) between CHI/SNV/GVA with MTU 1500:
        To:   CHI        SNV        GVA
From
CHI           n/a        840Mbps    1.8Gbps
SNV           1.5Gbps    n/a        1.5Gbps
GVA           95Mbps     736Mbps    n/a
The route from GVA (192.91.239.213 or .2) to CHI is through the
100Mbps port.
(Raw logs are in: http://www.cs.caltech.edu/~weixl/feb26testing/udp/)
2. Stock TCP tests
To:                     198.51.111.82 (SNV19 1GE)   198.51.111.90 (SNV19 10GE)
FROM (CHI)
192.91.239.213 10GE     n/a                         <100Mbps & unstable (seems to be many losses)
192.91.239.2   1GE      >780Mbps for peak rate      n/a
This further supports the suspicion of a problem in the receiving
path of 198.51.111.90.
-----------------------------------------------------
3. FAST TCP tests
To:                     192.91.239.213 CHI (10GE)   192.91.239.2 (1GE)
FROM (SNV19)
198.51.111.82 (1G)      n/a                         940Mbps (plot)
198.51.111.90 (10G)     123Mbps                     n/a (plot)
Note:
192.91.239.213 is a 10GE and 192.91.239.2 is a 1GE, both on
the same machine in Geneva.
198.51.111.90 is a 10GE and 198.51.111.82 is a 1GE, both on
the same machine in Sunnyvale.
David surmised that the bottleneck was the CPU, since UDP could only reach
1.5Gbps from .90 to Chicago and the CPU was almost fully used.
Chicago's 10GE machine has only 1 CPU. Although SNV's .90 has 2 (seen as 4 with
hyperthreading), iperf can use only 1 at a time, and that CPU was fully used.
-----------------------------------------------------
4. High-Speed TCP (web100) tests (nearly the same as FAST)
To:                     192.91.239.213 (10GE)       192.91.239.2 (1GE)
FROM
198.51.111.82 (1G)      n/a                         940Mbps (alpha=400) (plot)
198.51.111.90 (10G)     121~126Mbps                 n/a (alpha=2000) (plot)
I tried multiple (3) flows with HSTCP from SNV 10GE to GVA 10GE.
Each flow got 123Mbps.
Hence, we may suspect there is some problem other than the congestion
control algorithm that prevents a single flow's rate from going higher.
-----------------------------------------------------
5. Tests from 198.51.111.66 to Geneva 213/02 (198.51.111.66 is a 1GE
port on another machine in Sunnyvale)
To:                     192.91.239.213 (10GE)       192.91.239.2 (1GE)
FROM
198.51.111.66 (1G)      124Mbps                     848 Mbps
(Yet, the UDP achieved 957Mbps from 198.51.111.66 to 192.91.239.213.)
(Console).
-----------------------------------------------------
6. TCP dump:
Part of the tests in 5 is recorded by "tcpdump -i eth3" to capture the
return path:
For 198.51.111.66 -> 192.91.239.213
(tcpdump)
For 198.51.111.66 -> 192.91.239.2
(tcpdump)
We can see that the advertised receive window of SNV66-GVA2 is 43906,
but the advertised receive window of connection SNV66-GVA213 is only about 2746,
which prevents the sender (SNV) from sending faster.
The same difference appears between connections SNV82-GVA2 and SNV90-GVA213 with the web100 kernel. (http://www.cs.caltech.edu/~weixl/feb26testing/tcpdumpfrom90/)
Anyway, I still don't know why the advertised window of connections to the 10GE card is so small: everything else is the same except the receiving card. Any ideas?
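
One way to read the window numbers above: a TCP sender can keep at most one advertised window in flight per round trip, so the window caps the rate. The sketch below is only illustrative. Neither the RTT nor the window-scale shift is given in these notes, so ASSUMED_RTT_S and ASSUMED_WSCALE are placeholder assumptions, and it also assumes the tcpdump values are the raw (unscaled) window field. Only the ratio between the two connections is independent of those assumptions.

```python
# Window-limited TCP throughput: at most one receive window per round trip.
# ASSUMED_RTT_S and ASSUMED_WSCALE are placeholders, NOT measured values.

ASSUMED_RTT_S = 0.18   # hypothetical SNV-GVA round-trip time, in seconds
ASSUMED_WSCALE = 7     # hypothetical window-scale shift negotiated in the SYN


def window_limited_rate_bps(window_field: int, wscale: int, rtt_s: float) -> float:
    """Upper bound on throughput (bits/s) for a given advertised window."""
    window_bytes = window_field << wscale   # apply the window-scale shift
    return window_bytes * 8 / rtt_s


def window_needed_bytes(rate_bps: float, rtt_s: float) -> float:
    """Window (bytes) needed to sustain rate_bps over a path with rtt_s."""
    return rate_bps * rtt_s / 8


if __name__ == "__main__":
    windows = {"SNV66-GVA2": 43906, "SNV66-GVA213": 2746}  # from the tcpdump notes
    for name, field in windows.items():
        bound = window_limited_rate_bps(field, ASSUMED_WSCALE, ASSUMED_RTT_S)
        print(f"{name}: window field {field} -> at most {bound / 1e6:.1f} Mbps")
    ratio = windows["SNV66-GVA2"] / windows["SNV66-GVA213"]
    print(f"bound ratio (same RTT and scale on both): {ratio:.1f}x")
```

With the same RTT and scale on both connections, the bound for the 10GE receiver is about 16 times lower than for the 1GE receiver, whatever the actual RTT is.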
I confirm a throughput > 900 Mbps for HS (mtu=1500, txq=100) from SNV(1GE) -> GVA(1GE)
----
HS TCP (mtu=1500, txq=10000): same as above, but with a much bigger txq.
  SNV(10GE) -> GVA(10GE) : 138 Mbps
----
HS TCP (mtu=8192, txq=100)
  SNV(10GE) -> GVA(10GE) : IPERF HUNG, RETURNING NO RESULT
  I got the same behavior for MTU 4096, 2000, 3000 (I did not try other values).
  If I run the same test (MTU 8192) from SNV to CHI (192.5.175.133):
  SNV(10GE) -> CHI(10GE) : 1.3Gbps after 10 sec
----
From CHI (192.5.175.133) using 2.4.19-16mdk (I believe it is stock TCP)
  CHI(10GE) -> GVA(10GE) : 177Mbps
Sunnyvale : haste3 -> GVA w02gva : UDP transfer at 1.8 Gbps (sending rate = 2.18 Gbps, loss rate = 17 %, MTU = 1500 bytes, sender CPU load = 100%)
Sunnyvale : haste3 -> CHI v12chi : UDP transfer at 1.9 Gbps (sending rate = 2.21 Gbps, loss rate = 13 %, MTU = 1500 bytes, sender CPU load = 100%)
Sunnyvale : Cit-slac19 -> GVA w02gva : UDP transfer at 1.86 Gbps (sending rate = 1.86 Gbps, loss rate = 0.4 %, MTU = 1500 bytes, sender CPU0 load = 30%, CPU2 load = 100%)
Sunnyvale : Cit-slac19 -> GVA w02chi : UDP transfer at 1.8 Gbps (sending rate = 1.8 Gbps, loss rate = 1.5 %, MTU = 1500 bytes, sender CPU0 load = 30%, CPU2 load = 100%)
Sunnyvale : haste3 -> GVA v12gva: 1.9 Gbps
Sunnyvale : Cit-slac19 -> GVA v12gva: 2.15 Gbps (30 Gbytes in 120
seconds; sender CPU0 load = 25%, CPU2 load = 65%)
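
As a quick sanity check on the figures above, the received rate should be roughly the sending rate times (1 - loss rate), and an average rate is just bytes moved over elapsed time. The sketch below redoes that arithmetic. It assumes that "Gbytes" means 2^30 bytes while "Gbps" means 10^9 bits per second; that convention is an assumption here, not something stated in these notes.

```python
# Consistency checks on the quoted transfer figures.
# Assumption: "Gbytes" = 2**30 bytes, "Gbps" = 1e9 bits per second.

GIB = 2 ** 30


def received_rate_gbps(sending_rate_gbps: float, loss_fraction: float) -> float:
    """Rate seen by the receiver when a fraction of the sent data is lost."""
    return sending_rate_gbps * (1.0 - loss_fraction)


def average_rate_gbps(bytes_moved: float, seconds: float) -> float:
    """Average rate in Gbps for bytes_moved transferred in seconds."""
    return bytes_moved * 8 / seconds / 1e9


if __name__ == "__main__":
    # (label, sending rate in Gbps, loss fraction, reported rate in Gbps)
    runs = [
        ("haste3 -> w02gva", 2.18, 0.17, 1.8),
        ("haste3 -> v12chi", 2.21, 0.13, 1.9),
        ("cit-slac19 -> w02gva", 1.86, 0.004, 1.86),
        ("cit-slac19 -> w02chi", 1.8, 0.015, 1.8),
    ]
    for label, sent, loss, reported in runs:
        computed = received_rate_gbps(sent, loss)
        print(f"{label}: computed {computed:.2f} Gbps, reported {reported} Gbps")
    print(f"30 Gbytes in 120 s: {average_rate_gbps(30 * GIB, 120):.2f} Gbps")
```

Under that unit assumption, 30 Gbytes in 120 seconds works out to about 2.15 Gbps, which agrees with the quoted rate.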
Sunnyvale: GVA 2.37 Gbps according to the iperf output using TCP Reno and Jumbo frames
and 128MByte window (requested, 256MB allocated) for 180s.
Consoles
Sunnyvale: GVA for 600s jumbo, 128MB (requested) got 2.37Gbps,
for 120s jumbo, 128MB (requested) got 2.34Gbps,
for 120s jumbo, 64MB (requested) got 2.15Gbps
Consoles
Sunnyvale: GVA for 3700s transferred > 1 TByte in < 1 hour with jumbo, 1 stream, 128MB window
(requested). Console and plot.
Note that TCP performance is better than UDP performance because I am using jumbo frames.
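
To put the "> 1 TByte in < 1 hour" result in context, the sketch below converts between rate, duration and volume. It assumes 1 TByte = 10^12 bytes and that the 2.37 Gbps measured in the shorter runs was also sustained for the 3700 s run; both are assumptions, since the notes do not give the average rate of that run.

```python
# Rate / duration / volume arithmetic for the long transfer.
# Assumptions: 1 TByte = 1e12 bytes; the rate is constant over the run.

TBYTE = 1e12


def seconds_to_transfer(total_bytes: float, rate_gbps: float) -> float:
    """Time needed to move total_bytes at a constant rate_gbps."""
    return total_bytes * 8 / (rate_gbps * 1e9)


def bytes_transferred(rate_gbps: float, seconds: float) -> float:
    """Bytes moved in `seconds` at a constant rate_gbps."""
    return rate_gbps * 1e9 * seconds / 8


if __name__ == "__main__":
    rate = 2.37  # Gbps, as quoted for the 600 s jumbo-frame run
    t = seconds_to_transfer(TBYTE, rate)
    print(f"1 TByte at {rate} Gbps takes about {t:.0f} s")
    moved = bytes_transferred(rate, 3700)
    print(f"3700 s at {rate} Gbps moves about {moved / TBYTE:.2f} TByte")
```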
1. Jumbos with stock between SNV & GVA
2. MTU 1500 with stock between SNV & GVA
Other possibilities (not prioritized, letters are just to help with later referencing) are below. Please add any others that come to mind. Then we will need to organize who does what and make sure we do not collide during the US time slot today.
A. SNV-GVA FAST TCP optimization with & without jumbos [Cheng]
B. SNV-CHI with jumbo & stock since it is a 10G path (not 2.5G) [Eric]
C. Multi-stream tests between SNV & GVA
D. SNV-GVA HS vs Scalable vs FAST [Cheng]
E. SNV-CHI disk to disk with optimum TCP [Julian?]
F. I am unsure we can apply for the LSR (no production routers in path, hardware not generally available), but if we can then we need to study the rules and make an effort (can't use iperf since data must not replicate each packet etc.) [Fabrizio]
We can make measurements from SNV to CHI in parallel with measurements from SNV to GVA. If we try parallel measurements, then Caltech should take the disk servers (.90 at SNV), and SLAC/LANL the hastes (Dell 2550s).
Peak throughput      1500 MTU    4000 MTU    6000 MTU    8000-9000 MTU
2.4.19 Stock TCP     273 Mbps    1.1 Gbps    1.0 Gbps    2.2 Gbps
FAST                 268 Mbps    1.1 Gbps    400 Mbps    2.2 Gbps
HSTCP                221 Mbps    1.1 Gbps    1.1 Gbps    1.4 Gbps
Scalable TCP         230 Mbps    1.1 Gbps    1.0 Gbps    2.2 Gbps
The FAST output was stable. FAST was able to reach a stable maximum fairly quickly.
The only problem I have ever seen was with 5000 - 8000 MTUs between SNV and CERN for FAST.
The Reno output was less stable. If one calculates the loss using the Mathis
formula (loss = ((0.75*MTU/RTT)/rate)^2) then one gets the plot below:
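
As a side note to the formula above, here is a minimal sketch of the calculation. The notes do not state units, so this assumes MTU in bytes (converted to bits), RTT in seconds and rate in bits per second. The RTT in the example is a hypothetical placeholder, not a measured value.

```python
# Loss estimate from the Mathis-style formula quoted in the notes:
#   loss = ((0.75 * MTU / RTT) / rate) ** 2
# Assumed units: MTU in bytes (converted to bits), RTT in s, rate in bits/s.

HYPOTHETICAL_RTT_S = 0.18  # placeholder round-trip time, NOT from the notes


def mathis_loss(rate_bps: float, mtu_bytes: int, rtt_s: float, c: float = 0.75) -> float:
    """Loss probability implied by a given rate, using the notes' constant 0.75."""
    return ((c * mtu_bytes * 8 / rtt_s) / rate_bps) ** 2


if __name__ == "__main__":
    # Peak stock-TCP rates from the table in these notes, at two MTUs.
    for mtu, rate_bps in ((1500, 273e6), (9000, 2.2e9)):
        loss = mathis_loss(rate_bps, mtu, HYPOTHETICAL_RTT_S)
        print(f"MTU {mtu}, rate {rate_bps / 1e6:.0f} Mbps -> implied loss {loss:.2e}")
```

Because the rate enters squared, doubling the target rate at a fixed MTU and RTT requires a loss rate four times lower.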
----------------------------------------
Iperf: 2.38 Gbps (duration 1 hour) (Console)
       (Couldn't record the headers of the transfer because the maximum
       file size of our Linux system is too small.)
       2.35 Gbps (duration 3 minutes) (We have the tcpdump file => 990 MB of headers!)
Rapid: 2.189 Gbps (Console)
       (No tcpdump file, because running tcpdump in parallel affects performance.)
Rapid: 2.079 Gbps (Console) (I have the tcpdump file)
Multi stream - Jumbo frames - TCP Reno:
--------------------------------------
Iperf (3 streams): 2.35 Gbps (I have the tcpdump file) (Console)
I don't have any results with Rapid.
Sylvain rebooted the router and checked its configuration. Everything seems to be OK. He did not know the origin of the problem. He also checked v12chi but couldn't solve the problem.