Effect of Load on RTT and Loss
Introduction
With the success of BaBar and the need to support multiple remote computer
centers, high performance between those centers
and SLAC became imperative. To help understand the performance,
we set out to identify the bulk data flows, see
how well they performed, identify where the bottlenecks were located, and
identify the impact on other traffic. For more information on how we tuned
the TCP stack and application to optimize bulk-data throughput, and the impact on
hosts etc., see
High Performance Throughput Tuning/Measurements. For measurements
made on a test network, see
Internet2 Land Speed
Record.
SLAC Link Traffic
SLAC has 2 high speed production links to the Internet:
- An OC3 (155Mbps) link from SLAC to the Stanford campus. Via the Stanford campus
there is access
to Internet 2. The peering is such that this link is only used for traffic between SLAC
and CalREN2 sites (mainly Stanford
and
the Northern University of California sites).
- An ATM 43 Mbps link from SLAC to ESnet. Due to ATM headers etc.,
plus the reservation of a 3.5 Mbps
PVC for a testbed, the capacity of this link is about 28-34 Mbps.
The utilisation of the SLAC ATM link is seen below in an
MRTG
plot. It is seen that for long
periods (days at a time), the link is saturated and during this period there
was more traffic to SLAC (green) than from SLAC (blue).
The protocol distribution is seen below. The data is obtained from the
JNFlow tool, which is used
to analyze and report on
Netflow
data from the SLAC border router interface to the ESnet ATM link.
In this period (the 24 hours
prior to 9:00am 8/26/00) there is more traffic inbound to SLAC (the negative
numbers). The main protocols are FTP, TCP Other (this is mainly
Objectivity database
traffic), and ssh, all of which are TCP based. There is also some UDP
traffic supporting the Andrew File System (AFS). Outbound the traffic is mainly FTP.
The top 25 sources and destinations are shown in the following graphs.
The top 14 communicating pairs are shown in the table below:
| Source | Dest | Protocol | Packets | Bytes |
|---|---|---|---|---|
| DATAMOVE3.SLAC.Stanford.EDU | ccobsn04.in2p3.fr | TCP:51823/4020 | 61141037 | 47420682306 |
| FTP2.SLAC.Stanford.EDU | heppcs1.cithep.caltech.edu | TCP:ftp-data | 9467220 | 13094015557 |
| FTP2.SLAC.Stanford.EDU | tourmalet.Colorado.EDU | TCP:ftp-data | 5174732 | 7271402667 |
| NORIC03.SLAC.Stanford.EDU | pc6.ph.man.ac.uk | TCP:ssh | 850924 | 1272630791 |
| DATAMOVE6.SLAC.Stanford.EDU | south.llnl.gov | TCP:6779/3954 | 844144 | 1187437954 |
| DATAMOVE6.SLAC.Stanford.EDU | northeast.llnl.gov | TCP:6779/2155 | 841254 | 1182204791 |
| LOWRIE.SLAC.Stanford.EDU | babar1.pp.rhbnc.ac.uk | TCP:ssh | 706996 | 1059782372 |
| DATAMOVE1.SLAC.Stanford.EDU | northeast.llnl.gov | TCP:6779/4946 | 1177886 | 976244220 |
| DATAMOVE1.SLAC.Stanford.EDU | south.llnl.gov | TCP:6779/3965 | 1132436 | 962653009 |
| DATAMOVE6.SLAC.Stanford.EDU | west.llnl.gov | TCP:6779/3879 | 622026 | 874387920 |
| DATAMOVE3.SLAC.Stanford.EDU | heppcs1.cithep.caltech.edu | TCP:ssh | 15733257 | 818168976 |
| AFS05.SLAC.Stanford.EDU | neutrino3.Stanford.EDU | UDP:afs3-callback/afs3-fileserver | 536720 | 767098934 |
| AFS09.SLAC.Stanford.EDU | muse.phys.uvic.ca | UDP:afs3-callback/afs3-fileserver | 551399 | 752882947 |
| DATAMOVE1.SLAC.Stanford.EDU | west.llnl.gov | TCP:6779/4131 | 643259 | 666342669 |
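The packet and byte counts in the table also give a rough idea of the kind of traffic in each flow. As a minimal sketch (the function name `mean_packet_size` is ours, and only three rows of the table are copied in), the mean bytes per packet can be computed as follows:

```python
# (source, destination, protocol, packets, bytes) copied from three rows of the table.
flows = [
    ("DATAMOVE3.SLAC.Stanford.EDU", "ccobsn04.in2p3.fr", "TCP:51823/4020",
     61141037, 47420682306),
    ("FTP2.SLAC.Stanford.EDU", "heppcs1.cithep.caltech.edu", "TCP:ftp-data",
     9467220, 13094015557),
    ("DATAMOVE3.SLAC.Stanford.EDU", "heppcs1.cithep.caltech.edu", "TCP:ssh",
     15733257, 818168976),
]


def mean_packet_size(packets, byte_count):
    """Mean number of bytes per packet for one flow record."""
    if packets <= 0:
        raise ValueError("packet count must be positive")
    return byte_count / packets


for src, dst, proto, packets, byte_count in flows:
    size = mean_packet_size(packets, byte_count)
    print(f"{src} -> {dst} {proto}: {size:.0f} bytes/packet")
```

A mean size close to a full-sized segment would be consistent with bulk data moving in that direction, while a mean of a few tens of bytes would be consistent with a flow made up mostly of small packets such as acknowledgements. This is only a rough indicator, not a classification.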
Routes
For the record the
traceroutes from SLAC to the top 5 sites were recorded.
Pathchar
To try to characterize the paths to the top 5 sites we used
pathchar.
Pathchar allows a user to find
the bandwidth, delay, average queue and loss rate of every hop between a
source and destination on the Internet.
Pathchar is a useful tool, but it does not give exact results and the error
is not negligible. In our case, probably all results between 20 and 30 Mbps
should be
considered equivalent.
- The measurements from SLAC to IN2P3
indicate that there are comparable bottlenecks
between ESNET-A-GATEWAY at SLAC and the ESnet router at Chicago (26Mb/s), between
hops 5 and 6 (24Mb/s) and between hops 7 and 8 (21Mb/s). According to Gilles Farrache,
the latter two routers are located at CERN, each connected to the same
LS1010 with a 155 Mbps fiber, and having a 34 Mbps VC defined between
them.
- The measurements from IN2P3 to SLAC
indicate similar bottlenecks.
- The pathchar measurements from SLAC to
Caltech indicate possible bottlenecks
at Caltech (20Mbps) and between the ESNET-A-GATEWAY at SLAC and the ESnet router at
General Atomics (GAC) in San Diego (26Mb/s).
- For SLAC to Colorado
the bottlenecks appear to be between ESNET-A-GATEWAY and the ESnet
router at LBL (28Mb/s) and at Colorado (20Mbps).
- For SLAC to LLNL
the bottleneck is unclear. Intuitively one might expect it
to be the SLAC ESnet ATM link since the other links are known to have
greater capacity.
- The pathchar from SLAC to
Manchester University in England indicates
that there is a bottleneck of 7.5Mbps at Manchester. There also appears
to be a 12 Mbps bottleneck between ESnet and DANTE at 60 Hudson Street
in New York. A lot of queueing is also observed after
60 Hudson Street.
Performance between SLAC and top communicating sites
The PingER RTT and losses for the top 5 sites for the week up until
9am 8/25/00 are shown below.
The generally good tracking between similar hosts and the lack of tracking between sites
indicate that the increases in RTT are not due to a common cause across all sites.
This would appear to rule out congestion on the SLAC-ESnet ATM
link as a common cause of the increases in RTT.
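The "tracking" argument above is made by eye from the plots. One way to put a number on it, offered here only as a sketch and not as the method used for this page, is the Pearson correlation between the RTT time series of two hosts. The RTT values below are hypothetical and chosen purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical median RTTs in msec for successive intervals.
host_a = [170, 172, 210, 250, 240, 180]      # host at one remote site
host_b = [168, 171, 205, 255, 238, 178]      # second host at the same site
other_site = [60, 62, 61, 90, 60, 95]        # host at a different site

print("same site:      ", round(pearson(host_a, host_b), 2))
print("different sites:", round(pearson(host_a, other_site), 2))
```

A coefficient near 1 for hosts at the same site together with a coefficient near 0 between sites would match the pattern described above.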
[Figure: PingER RTT and loss plots, one panel per host. First row: IN2P3.fr, cithep.CalTech, pizero.Colorado, dl.ac.UK, LLNL. Second row: CERN.ch, CalTech, surveyor.colorado, rl.ac.UK, LBL.]
The transfer rates measured over 5 minute intervals when the top sites are
communicating heavily with SLAC vary.
This may be partially due to multiple transfers competing for network bandwidth,
though it may also be partially due to the
applications and hosts.
| IN2P3.fr | cithep.CalTech | pizero.Colorado | dl.ac.UK | LLNL |
|---|
| 5-37Mbps, avg=15Mbps |
| 3.6-27Mbps, avg=10.1Mbps |
|
|
One way performance
Surveyor provides one way losses and
delays between SLAC and the UK (UKERNA), Colorado and CERN.
It is seen that there is considerable asymmetry in the performance, the one way delays to SLAC
being more variable (a sign of more congestion). As mentioned above, most of the
traffic is going to SLAC, so this is consistent with the asymmetry.
Below we show the one way delay and losses of the link from Colorado to SLAC.
It is seen that
around 18:30 UTC (11:30am PDT) the median one way delay (green dots)
increases by a factor of 3 to 4.
The routes between Colorado and SLAC are shown below. No changes in the routes were
noted in the 24 hours (the routes were measured every 20 minutes).
Active measurements
To provide further insight we used
iperf
to generate TCP traffic from SLAC
(flora02.slac.stanford.edu, a Sun Ultra 2 running Solaris 5.6 without the SACK extension) to
CERN (sunstats.cern.ch). Iperf was set to have 256kbyte windows
and 10 parallel streams.
We ran iperf in this fashion for 40 minutes from 17:19:17 8/26/00
thru 17:59:21 8/26/00 simultaneously measuring the ping (sending a 100byte
ping once a second with a timeout of 20 seconds)
RTT and loss. While doing this we also observed the link utilization.
The aggregate measured throughput from SLAC to CERN was 20Mbps (26Mbps)
which is close to
the bottleneck bandwidth.
The ping loss was less than 0.05% (0 pings lost in 2400 sent),
the minimum ping RTT was
166ms, the average 258ms (226ms) and the maximum was 411ms (373ms).
We followed this up by measuring the ping RTT and loss for 40 minutes
without generating any iperf traffic starting at 18:53 and ending at 19:33
8/26/00. In this case there was no packet loss (2400 pings sent, 2400
received) and the minimum RTT was 166ms, the average was 167ms and the
maximum was 189ms.
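The loss figures quoted above follow directly from the ping counts. A minimal sketch of the arithmetic (the function names are ours): 40 minutes at one ping per second gives 2400 pings, so a single lost ping corresponds to about 0.042%, which is why zero losses is quoted as less than 0.05%.

```python
def pings_sent(duration_minutes, interval_seconds=1.0):
    """Number of pings sent at a fixed interval over the given duration."""
    return int(duration_minutes * 60 / interval_seconds)


def loss_percent(sent, received):
    """Packet loss as a percentage of pings sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


sent = pings_sent(40)                         # 40 minute run, one ping per second
print(sent)                                   # number of pings in the run
print(loss_percent(sent, sent))               # loss when every ping is answered
print(round(loss_percent(sent, sent - 1), 3)) # smallest non-zero loss measurable
```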
The loading on the CERN to USA link and on the SLAC to ESnet ATM link
is shown below. The blue peak just around 17:00 hours PDT on Aug 26
on the SLAC-ESnet ATM link and the green peak at about 2:30 on
the CERN-USA link correspond to the iperf traffic.
Plots of the RTTs for the load and the no load measurements are shown below.
Some of the statistics for the RTT in msec. are:
| Statistic | Loaded (msec.) | NoLoad (msec.) |
|---|---|---|
| Average | 258.7 | 167.2 |
| Stdev | 52.3 | 0.9 |
| Median | 260 | 167 |
| IQR | 109 | 0 |
| Min | 166 | 166 |
| Max | 411 | 189 |
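For readers who want to reproduce this kind of summary from their own ping logs, the following is a minimal sketch using Python's standard `statistics` module. It is not the tool used for this page. The sample standard deviation and the default quartile method of `statistics.quantiles` are assumptions, since the text does not say which definitions were used, and the RTT values are hypothetical.

```python
import statistics


def rtt_summary(rtts_ms):
    """Summary statistics for a list of ping RTTs in msec."""
    # quantiles(n=4) returns the three quartile cut points (Python 3.8+).
    q1, _median, q3 = statistics.quantiles(rtts_ms, n=4)
    return {
        "average": statistics.mean(rtts_ms),
        "stdev": statistics.stdev(rtts_ms),   # sample standard deviation (assumed)
        "median": statistics.median(rtts_ms),
        "iqr": q3 - q1,                       # inter-quartile range
        "min": min(rtts_ms),
        "max": max(rtts_ms),
    }


# Hypothetical unloaded RTT samples in msec.
sample = [166, 167, 167, 168, 189]
for name, value in rtt_summary(sample).items():
    print(f"{name:8s} {value:.1f}")
```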
It can be seen that the iperf load appears to increase the
average and median RTT by
about 100 msec, and the distribution is much flatter for the
loaded case. On a 25 Mbit/sec link a queuing delay of 100 msec would
correspond to about 2.5 Mbits, or about 300 kbytes of data, or about 210
packets with a maximum segment size of 1460 bytes.
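The queue estimate is simply link rate multiplied by queuing delay. A short sketch of the arithmetic, with function names of our own choosing, taking 1 kbyte as 1000 bytes and treating every queued packet as a full 1460 byte segment (both simplifying assumptions):

```python
def queued_bytes(link_bits_per_sec, queue_delay_sec):
    """Bytes that must be queued to cause the given delay at the given link rate."""
    return link_bits_per_sec * queue_delay_sec / 8


def queued_packets(link_bits_per_sec, queue_delay_sec, packet_bytes=1460):
    """Equivalent number of packets, assuming every packet is packet_bytes long."""
    return queued_bytes(link_bits_per_sec, queue_delay_sec) / packet_bytes


link = 25e6    # 25 Mbit/sec bottleneck, as in the text
delay = 0.100  # 100 msec of extra delay

print(f"{link * delay / 1e6:.1f} Mbits queued")
print(f"{queued_bytes(link, delay) / 1e3:.0f} kbytes queued")
print(f"{queued_packets(link, delay):.0f} packets of 1460 bytes")
```

The same two functions can be reused for other delays or link rates, for example a 50 msec delay on the same link.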
The increase in RTT with loading seen on the SLAC CERN link is
similar to the increase in RTT observed on the
Colorado link above. It is interesting that on the SLAC CERN link
no ping packet loss was observed in either the loaded or unloaded
case.
We also measured the effect of varying the TCP window size for iperf
(with 10 parallel streams) on the
RTT. To do this we used iperf to send TCP data from SLAC to CERN
for
40 minutes at times when the SLAC to CERN link was not congested.
At the same time, once a second, we measured the ping RTT and
loss for 100 byte packets with a timeout of 20 seconds.
This was repeated for various TCP window sizes from 8kbytes to 300 kbytes.
For each measurement we noted the ping loss and RTT, and the iperf
thruput and window size.
The losses observed
in these measurements were always less than or equal to 3 packets in 2400.
The graph below shows the RTT versus window size.
It is seen that the RTT (median, average, 90th percentile and IQR) increases
steeply between TCP window sizes of 55 kbytes and 64 kbytes.
The magnitude of the increase is about 60-100 msec. for the median and average RTTs.
At the same time there is little change in the iperf thruput.
The RTT distributions are also shown below to illustrate the marked change
as one goes from a TCP window size of 55kbytes to 64kbytes.
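One possible reading of this threshold, which the text itself does not claim, is that the RTT starts to rise once the combined windows of the 10 streams exceed the bandwidth-delay product of the path, so that the excess data has to sit in router queues. The sketch below works through that arithmetic using figures quoted on this page (a bottleneck of roughly 25 to 28 Mbps and a minimum RTT of 166 msec). Taking 1 kbyte as 1000 bytes is an assumption.

```python
def bandwidth_delay_product_bytes(bottleneck_bits_per_sec, rtt_sec):
    """Data in flight needed to keep the bottleneck busy for one RTT."""
    return bottleneck_bits_per_sec * rtt_sec / 8


def window_per_stream_bytes(bottleneck_bits_per_sec, rtt_sec, streams):
    """Per-stream window at which the streams together just fill the path."""
    return bandwidth_delay_product_bytes(bottleneck_bits_per_sec, rtt_sec) / streams


MIN_RTT = 0.166   # seconds, minimum ping RTT SLAC to CERN
STREAMS = 10

for rate in (25e6, 28e6):
    per_stream = window_per_stream_bytes(rate, MIN_RTT, STREAMS)
    print(f"{rate / 1e6:.0f} Mbps: {per_stream / 1e3:.1f} kbytes per stream")
```

Under these assumptions the per-stream figure falls in the low to high 50 kbyte range, close to where the step in RTT is observed. The agreement is suggestive only; other effects (host buffering, competing traffic) are not modelled.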
To further study the impact of iperf and other link loading on the ping
performance, we wrote a script that steps through a set of parallel
stream counts (1, 2, 5, 10, 15)
and a set of window sizes (8kB, 16kB, 32kB, 50kB, 55kB, 60kB, 64kB,
128kB, 256kB and 500kB). For each combination it sent a TCP iperf stream for 60 seconds (the iperf loaded case)
and simultaneously measured the iperf thruput and the ping
minimum/average/maximum RTT and losses (one 100 byte
ping/second with a timeout of 20 seconds). For the following 60 seconds it
sent no iperf load but measured the ping RTT and loss again (this is referred to
as the iperf unloaded case). The measurements were then repeated for a different
window size.
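The measurement loop described above can be sketched as follows. This is not the original script. It only builds the schedule and the command lines and does not run them. The host name `remote.example.org` is a placeholder, the iperf options (`-c`, `-w`, `-P`, `-t`) are the usual client options, and the ping options are those of Linux iputils, which is an assumption since the original ran on Solaris, where the ping syntax differs.

```python
import itertools

STREAMS = [1, 2, 5, 10, 15]
WINDOWS_KB = [8, 16, 32, 50, 55, 60, 64, 128, 256, 500]
PHASE_SECONDS = 60


def iperf_command(host, window_kb, streams, seconds=PHASE_SECONDS):
    """iperf client command: TCP, given window size, parallel streams, duration."""
    return ["iperf", "-c", host, "-w", f"{window_kb}K",
            "-P", str(streams), "-t", str(seconds)]


def ping_command(host, seconds=PHASE_SECONDS):
    """One ping per second with a 100 byte payload and a 20 second timeout (iputils flags assumed)."""
    return ["ping", "-c", str(seconds), "-i", "1", "-s", "100", "-W", "20", host]


def build_plan(host):
    """Alternate a loaded phase (iperf plus ping) with an unloaded phase (ping only)."""
    plan = []
    for streams, window_kb in itertools.product(STREAMS, WINDOWS_KB):
        plan.append(("loaded", iperf_command(host, window_kb, streams), ping_command(host)))
        plan.append(("unloaded", None, ping_command(host)))
    return plan


plan = build_plan("remote.example.org")  # placeholder host
print(len(plan), "phases,", len(plan) * PHASE_SECONDS // 60, "minutes in total")
```

Keeping each unloaded phase immediately after its loaded phase is what allows the two to be compared under a similar background load.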
From observing the link MRTG
utilization plots it appears that when this link gets loaded (i.e. over
50% utilization) it stays that way for long intervals (one hour to
days). Thus the differences in background (not generated by the iperf test) link utilization
between the iperf loaded measurement and the iperf unloaded measurement
should be small, since the measurements are made within 60 seconds of one another.
To a fair approximation we therefore assume the background loads
are similar for the iperf loaded and unloaded cases.
Further, the thruput we achieve with iperf is expected to depend on the
competing background load.
A scatter plot of the average and maximum loaded and unloaded RTTs versus
the iperf measured thruput can be seen below. The power series curves
are fits to the average RTTs and are to guide the eye. The R²
values for the 2 curves are shown. It is seen that there is a strong
correlation of the loaded average RTT with the iperf thruput and a medium
correlation for the unloaded average RTT (uAvg). It is also seen
that the loaded RTTs are higher than the unloaded by about 50 msec.
Though not shown, the minimum RTTs differ by less than 3 msec. on average
and the medians of the minimum RTTs are identical.
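As an illustration of how such a guide curve and its R² value could be obtained, the sketch below fits y = a·x^b by least squares in log-log space. Whether the original curves were produced this way is an assumption (the text only calls them power series curves), and the data points are synthetic.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by linear least squares on (ln x, ln y).

    Returns (a, b, r_squared), with r_squared computed in log space.
    All x and y values must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(log_x, log_y))
    ss_tot = sum((v - mean_y) ** 2 for v in log_y)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


# Synthetic example: thruput in Mbps against average RTT in msec.
thruput = [1, 2, 4, 8, 16]
avg_rtt = [170 * x ** 0.1 for x in thruput]

a, b, r2 = fit_power_law(thruput, avg_rtt)
print(f"a={a:.1f} b={b:.3f} R2={r2:.3f}")
```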
To look more closely at the effect of thruput loading on loss we made
measurements over the weekend of September 30 thru
October 1, 2000, with and without iperf loading for a longer period (12 hours) between
SLAC and CERN and between SLAC and Caltech.
At this time the SLAC link was lightly loaded (apart from our traffic), as was the
CERN to USA link.
The measurements were made
in 2 sets: 256kByte
window with 2 streams, 64kByte window with 8 streams.
The details of the methodology are available in
Bulk thruput: windows versus streams.
The results are shown in
the table below (the numbers in parentheses are the losses for the unloaded case, i.e.
iperf not running). It is seen that though the absolute loss is low (< 1%) in all cases,
in the case of the SLAC to CERN link the impact of the iperf thruput is large (more than
a factor
of 20 in the loss percentage for the larger window case and a factor
of about 5 for the smaller window/more streams case). It is also seen that the difference between the loaded
and unloaded losses is greater for a given link when the thruput is greater (i.e.
when we are using smaller windows and larger numbers of streams).
| Destination | 256kB window | 64kB window |
| CERN Loss | 0.26% (0.01%) | 0.37% (0.07%) |
| Caltech Loss | 0.21% (0.42%) | 0.69% (0.39%) |
| CERN Thruput | 8.82Mbits/s | 24.7Mbits/s |
| Caltech Thruput | 46Mbits/s | 62.8Mbits/s |
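The loaded and unloaded losses in the table can be compared both as a ratio and as a difference. The short sketch below does this using only the numbers in the table; the function names are ours.

```python
# (loaded loss %, unloaded loss %) taken from the table above.
LOSS = {
    ("CERN", "256kB"): (0.26, 0.01),
    ("CERN", "64kB"): (0.37, 0.07),
    ("Caltech", "256kB"): (0.21, 0.42),
    ("Caltech", "64kB"): (0.69, 0.39),
}


def loss_ratio(loaded, unloaded):
    """Loaded loss divided by unloaded loss."""
    return loaded / unloaded


def loss_difference(loaded, unloaded):
    """Loaded loss minus unloaded loss, in percentage points."""
    return loaded - unloaded


for (site, window), (loaded, unloaded) in LOSS.items():
    print(f"{site:8s} {window:6s} "
          f"ratio={loss_ratio(loaded, unloaded):5.1f} "
          f"difference={loss_difference(loaded, unloaded):+.2f}")
```

With unloaded losses as small as 0.01% (about one packet in 10,000), the ratios are sensitive to a handful of lost pings, so the differences are probably the more robust of the two measures.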
Summary
- We appear to be able to saturate the bottleneck links with bulk-data traffic.
- With 10 simultaneous TCP streams running from SLAC to CERN
(when the SLAC to CERN link is unloaded) there appears to be a threshold
in the window size (between 55 kbytes and
64 kbytes). Below this threshold the ping
RTT is minimally impacted by the bulk data transfer;
above the threshold the RTT increases
by 60-100 msec.
As might be expected, the impact on the delay is only in the direction of the
bulk data flow
(in this case SLAC to CERN).
- As would be expected, the ping performance and iperf throughput
also depend on the existing load on the link (i.e. the load not produced by the iperf generator).
A measure of this background
load may be obtained by comparing the iperf throughput with that seen
when the link is not loaded with other traffic. For this link the iperf
throughput when the link is not loaded with other traffic is about 28Mbps.
Another indicator
is the average ping RTT when the iperf generator is not running. Without
the iperf generator running and with little other traffic on this link,
the average RTT is within 2 msec of the minimum RTT of 166 msec. With heavy
load on the link and the iperf generator not running, the typical
average RTT is 180-200 msec.
- The magnitude of the increase in RTT (many tens of milliseconds) indicates
that there must be some long queues (e.g. a 50 msec queuing delay on, say,
a 25Mbps link would indicate that there might be of the order of 150 kBytes
queued up in buffers).
Other Interesting High Performance Links
Created August 25, 1999
Comments to iepm-l@slac.stanford.edu