IEPM

Bulk throughput simulation


Introduction

To better understand the impact of parameters such as round trip time (RTT), window size and number of flows on bulk throughput performance, we worked with George Riley of Georgia Tech to modify and install the ns-2 network simulator at SLAC. A secondary purpose of this work was to validate ns-2 for predicting bulk throughput by comparing its predictions with actual measurements.

Note that in this page we use the words throughput and goodput interchangeably and the words streams and flows interchangeably.

Simulator "measurements"

After downloading and installing ns-2, we familiarized ourselves with how to use it, and looked at the impact of varying parameters such as queue length (Q), run time, RTT, bottleneck bandwidth (BW), bottleneck factor (BF), number of flows (NF) and window size. We used the TCP variant TCP/Newreno, though we verified that changing to TCP/Tahoe had no noticeable effect on the conclusions. A couple of examples of the ns-2 goodput predictions for various parameters are shown below:
Goodput vs. bandwidth and window | Goodput vs. flows and RTT
We then selected reasonable initial values or ranges of the above parameters so we could simulate the links between SLAC and the remote sites.

Real measurements and comparisons

Real TCP throughput/goodput measurements were made between SLAC and various sites using iperf. The details of the methodology can be found in Bulk throughput measurements. For all simulation runs we used window sizes (in packets) of 6, 11, 22, 44, 88, 175, 350 and 700, which correspond roughly to 8760, 16060, 32120, 64240, 128480, 255500, 511000 and 1022000 bytes, assuming a packet size of 1460 bytes.
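The packet-to-byte conversion above can be checked with a few lines of Python (the 1460-byte packet size is the assumption stated in the text; the variable names are ours):

```python
# Window sizes used for the simulation runs, in packets.
MSS = 1460  # assumed packet (maximum segment) size in bytes
windows_pkts = [6, 11, 22, 44, 88, 175, 350, 700]

# Convert each window from packets to bytes.
windows_bytes = [w * MSS for w in windows_pkts]
print(windows_bytes)
# [8760, 16060, 32120, 64240, 128480, 255500, 511000, 1022000]
```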

Daresbury Laboratory (DL), near Liverpool, UK

Below are shown the observed throughput/goodput performance measured from SLAC to Daresbury and that simulated by ns-2. Using the simulation parameter values selected for this path, we varied the window size (measured in packets, which we assumed were maximum segment sizes of 1460 bytes) and the number of flows.
Observed goodput between SLAC and DL | Simulated goodput between SLAC and DL

To investigate the impact of multiple flows and window sizes on losses as well as goodput, we modified ns-2 to report on losses. Using the above ns-2 parameters for DL, we reran ns-2 for multiple settings of windows and flows. We then plotted the utilization = goodput / bottleneck_bandwidth versus the loss for flows of 1, 2, 4, 7, 10, 15, 20, 25, 30 and 40, and windows of 8kB, 16kB, 32kB, 64kB, 128kB, 256kB, 512kB and 1024kB. The results are shown below. As one runs from right to left along a line of equal number of flows, the window size decreases (by a factor of 2 for each point on the line), with the right-hand-most point being for a 1024kByte window. It is seen that one can obtain over 70% utilization with no packet loss, as long as the number of flows is 4 or more. The optimum goodput is over 90% utilization with no loss. This is achieved with modest window sizes (<= 64kBytes) and a large number of streams (>= 25). There appear to be maxima in the goodput utilization beyond which increasing the number of flows or window size causes increased packet loss without any improvement in goodput utilization. At the same time, the goodput does not decrease markedly after one passes the maximum. For a small number of flows (<= 4) one cannot get beyond 70% goodput utilization regardless of the window size. Also, for flows of <= 4 the maximum goodput is obtained for the window size closest to the bandwidth delay product (10Mbps * 162 msec.) of about 200kBytes. As the number of flows increases beyond 4, the maximum goodput is obtained for increasingly smaller window sizes.
simulated utilization vs flows & windows for DL
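The utilization and bandwidth-delay-product figures quoted for the SLAC-DL path can be reproduced with a short Python sketch. The helper names are ours, and the 9.2 Mbits/s goodput is a hypothetical value used only to illustrate the utilization calculation:

```python
def bdp_kbytes(bw_mbps, rtt_ms):
    """Bandwidth-delay product in kBytes (1 kByte = 1000 bytes)."""
    bytes_per_sec = bw_mbps * 1e6 / 8
    return bytes_per_sec * rtt_ms / 1e3 / 1e3

def utilization(goodput_mbps, bottleneck_mbps):
    """Utilization = goodput / bottleneck_bandwidth, as plotted above."""
    return goodput_mbps / bottleneck_mbps

# SLAC to DL: 10 Mbps bottleneck, 162 ms RTT -> about 200 kBytes.
print(bdp_kbytes(10, 162))    # 202.5
print(utilization(9.2, 10))   # just over 90% of the bottleneck bandwidth
```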

IN2P3 in Lyon, France

Using the simulation parameter values selected for this path, we varied the window size (measured in packets, which we assumed were maximum segment sizes of 1460 bytes) and the number of flows. The results are shown below:
Observed goodput SLAC to IN2P3 | Simulated goodput with bandwidth factor=2

If one compares the observed (o) versus simulated (s) goodput for IN2P3, one gets the graphs shown below. The first graph is for the normalized residuals (o - s) / o, and the second for the absolute residuals (o - s). It is seen that in general the normalized residual (o - s) / o is typically better than 30%.
Observed - simulated residuals for IN2P3 | Observed - simulated absolute residuals for IN2P3
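The two residual definitions used here are simple to state in code. A minimal sketch follows; the observed/simulated values are made up for illustration, since the measurement data is not reproduced on this page:

```python
def residuals(observed, simulated):
    """Normalized residuals (o - s) / o and absolute residuals (o - s)."""
    normalized = [(o - s) / o for o, s in zip(observed, simulated)]
    absolute = [o - s for o, s in zip(observed, simulated)]
    return normalized, absolute

# Hypothetical goodputs in Mbits/s, one pair per (window, flows) setting.
obs = [8.0, 6.0]
sim = [6.0, 7.5]
norm, absr = residuals(obs, sim)
print(norm)   # [0.25, -0.25]
print(absr)   # [2.0, -1.5]
```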

Caltech, Pasadena, California

We measured the observed goodput using iperf from pharlap.slac.stanford.edu to heppcn08.hep.caltech.edu. These measurements were made over Christmas 2000, when the link was lightly loaded. In all, between 16 and 21 sets of 10 second measurements were made for each setting of flows and windows over a period of 36 hours; these measurements were then aggregated.

We also used the ns-2 simulator to predict the goodput, with simulation parameter values selected to match this path.

Using these parameters, we varied the window size (measured in packets, which we assumed were maximum segment sizes of 1460 bytes) and the number of flows. The results are shown below; there is quite good qualitative agreement.
Observed goodput SLAC to Caltech | Simulated goodput SLAC to Caltech
The residuals (residual = (o - s) / o), seen below, also show the poor agreement for low numbers of flows, and the general underestimation by the simulator for high numbers of flows (possibly due to the guesstimate of the bottleneck bandwidth being low). If one excludes the data for low numbers (<= 3) of parallel flows, then the agreement for goodput between observation and simulation, as measured by the residuals, is within +-20%. If one scatter-plots the observed versus the simulated goodput, one point for each setting of window and flows, then one gets the next plot. The line is a linear fit constrained to go through the origin, and its R2 correlation factor is 0.7746. If the line is not constrained to go through the origin then the R2 is 0.802.
Residuals for goodput from SLAC to Caltech | Scatter plot of observed vs. simulated goodput for SLAC to Caltech
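The R2 for a fit constrained through the origin can be computed with ordinary least squares. A sketch, assuming nothing beyond the standard definitions (the function name is ours, and the example data is illustrative, not the actual Caltech measurements):

```python
def r_squared_through_origin(x, y):
    """R2 for a least-squares line y = b*x constrained through the origin."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Perfectly proportional data gives R2 = 1; scatter reduces it.
print(r_squared_through_origin([1, 2, 3], [1, 2, 3]))        # 1.0
print(r_squared_through_origin([1, 2, 3], [1.1, 1.9, 3.2]))  # slightly below 1.0
```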
We also compared the losses observed by sending 100 byte pings once a second while the goodput measurements were being made. Since about 20 10-second measurements were made for each setting of flows and window, the total number of pings for each setting was about 200. The results are shown below; series 1 is the measured ping data and series 2 is from the simulation. It is seen that there is a much lower rate of ping loss compared to the TCP loss predicted by simulation. It is also seen that losses increase with the number of flows. The dependence of observed ping losses on window size is less clear.
Observed and simulated loss while doing bulk throughput from SLAC to Caltech
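The ping-sample arithmetic above works out as follows (values as stated in the text; the variable names are ours):

```python
runs_per_setting = 20   # ~20 ten-second iperf measurements per (flows, window) setting
pings_per_run = 10      # 100 byte pings sent once a second for 10 seconds
total_pings = runs_per_setting * pings_per_run
print(total_pings)      # 200
```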

LBNL, Berkeley, California

We measured the observed goodput using iperf from pharlap.slac.stanford.edu to costard.lbl.gov.

We also used the ns-2 simulator to predict the goodput, with simulation parameter values selected to match this path.

Using these parameters, we varied the window size (measured in packets, which we assumed were maximum segment sizes of 1460 bytes) and the number of flows. The results are shown below. Changing the window size appears to have little effect, probably because the bandwidth delay product (30Mbps * 3.4 msec.) gives a small window size of 12.75kBytes. It is seen that the simulated results are more uniform. The simulation also appears to do a poor job for small numbers (<= 4) of flows. We also tried the TCP variant Sack1 for the simulation, but the results were identical (which makes me suspect a user error in using/understanding the simulator).
Observed goodput SLAC to LBNL | Simulated goodput SLAC to LBNL
The residuals (residual = (o - s) / o), seen below, also show the poor agreement for low numbers of flows.
Residuals for goodput from SLAC to LBNL
The simulated goodput vs. flows and loss for window sizes of 8, 16, 32, 64, 128, 256, 512 and 1024 kBytes are shown below. The larger window sizes are the dots to the right for a given flow set.

Summary

The simulator appears to give quite good qualitative agreement of the goodput with observations for the links between SLAC and DL, SLAC and IN2P3, and SLAC and Caltech in November/December 2000, but not so good for the shorter (in RTT) link to LBNL. The values of R2 for the simulated versus observed goodput are given in the table below:
Remote site BW (Mbits/s) RTT (msec.) BW * RTT (kBytes) R2
Daresbury lab 10 162 203 0.885
IN2P3 28 180 630 0.85
Caltech 45 10 56 0.785
LBNL 30 3.4 13 0.2
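The BW * RTT column in the table can be reproduced from the BW and RTT columns. A quick check (the dictionary and helper name are ours; the tabulated values are rounded to whole kBytes):

```python
def bw_rtt_kbytes(bw_mbps, rtt_ms):
    """Bandwidth-delay product in kBytes from Mbits/s and msec."""
    bytes_per_sec = bw_mbps * 1e6 / 8
    return bytes_per_sec * rtt_ms / 1e3 / 1e3

paths = {
    "Daresbury lab": (10, 162),   # -> 202.5, tabulated as 203
    "IN2P3":         (28, 180),   # -> 630
    "Caltech":       (45, 10),    # -> 56.25, tabulated as 56
    "LBNL":          (30, 3.4),   # -> 12.75, tabulated as 13
}
for site, (bw, rtt) in paths.items():
    print(site, bw_rtt_kbytes(bw, rtt))
```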
If the generally good agreement holds up, it would provide a simple way of estimating what goodput is possible with default window sizes and single flows, selecting sensible window sizes and numbers of flows to optimize goodput, setting expectations for replicating data between different sites, assessing the impact of link upgrades, etc. For small numbers of flows the simulator predicts the maximum goodput for window sizes given by the bandwidth delay product; this does not hold for larger numbers of flows. The maximum goodput is obtained for flows of 7 or more. There is a discrepancy between the simulated TCP loss and the ping loss observations. This needs further study, but may be partially due to the different ways of "measuring" packet loss (by ping versus in the TCP stack) and the different packet sizes.

Using the simulator is simpler than making measurements with iperf, since there is no need to install iperf servers and clients at all the sites one is interested in. Such iperf installations can be difficult due to security concerns, or to finding someone willing to take the time, or to provide an account and password. Also, many hosts are set with a very low (<= 64kBytes) maximum TCP window size. Another benefit is that the simulator measurements do not impact the network by adding extra traffic. Since bulk throughput measurements can easily utilize over 90% of the bottleneck bandwidth, this can be a very important consideration. With the existence of network simulators such as ns-2, it is much simpler to simulate throughput than to actually measure it for many paths (see below for more discussion on this).

The simulator can therefore be used to more simply understand the current performance achieved by bulk data throughput applications, especially those using non-optimal default settings, and to aid in optimizing the parameters. It will also be of value to predict the performance available between sites on existing paths with existing TCP stacks, and how well a proposed upgrade to the path or stack can perform. After an upgrade is put in place, the simulator can be used to help ensure the expected performance is achieved and to help identify where further improvement may be needed.

The simulator can also be used to look at the impact of the bulk throughput in terms of increased losses and RTT variability, and guidelines can be provided to limit that impact, e.g. by deliberately setting the parameters so that the bulk throughput leaves some fraction of the bandwidth unused, or so that it does not drive the network into unnecessary packet losses and over-runs.
On the other hand the simulator does not currently (see What do packet dispersion techniques measure for some work in this direction) simulate the effects of competing (cross) traffic or its variability with time of day or day of week etc., and hence in general has less variability than real world measurements.

In order to use the simulator effectively, one also needs quick, effective ways to measure or estimate the parameters it requires. Methods are available to realistically estimate many of these parameters: for example, ping can be used to estimate the RTT, and pipechar, pchar, Nettimer or Pathrate can give estimates of the bottleneck bandwidth for lower speed (<= T3) paths. Since the simulations can be done fairly quickly, one can also optimize the parameters to improve the fit between observed and simulated values of various metrics.



Created December 15, 2000, last update December 28, 2000.
Comments to iepm-l@slac.stanford.edu