Comparing the Available Bandwidth Estimator (ABwE) packet train estimator results with Iperf
Jiri Navratil and Les Cottrell, SLAC, May 2003
We have installed, configured and tested a new method for Available Bandwidth[1] Estimation (ABwE). In practical measurements starting in November 2002, we collected and evaluated hundreds of monitoring results over a wide range of paths and studied the behavior of this tool under real network conditions. In the past few months we have started to compare these results with those of other tools used for bandwidth monitoring, particularly Iperf.
Iperf measurement is very popular and looks relatively simple. It is widely used by many ISPs and network specialists for different types of network measurement, including Available Bandwidth. Unfortunately, Iperf results can be misleading because, for paths that need large congestion windows, they depend heavily on parameters such as the number of parallel streams and TCP’s maximum window size (see for example http://www-iepm.slac.stanford.edu/monitoring/bulk/index.html). Setting the optimal parameters is quite a delicate process, and such “tuning” often uses quite sophisticated methods. Some of these methods have been developed at SLAC and some are built into other network measurement tools, for example the netest-2 tool developed at LBNL. During our analysis we discovered that even if the Iperf parameters have been set up by these methods or by an expert, the optimal parameters do not stay constant: after some time (several hours or days) they can become invalid. This is because the load on the path plays an important role and the path itself can change. Iperf results obtained with well-set and poorly set parameters can differ by several hundred percent.
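The sensitivity to window size and stream count follows from a standard TCP bound: a single stream cannot send more than one window per round trip. The sketch below is only an illustration of that bound. The 64 KiB window and 60 ms round-trip time are assumed example values, not parameters from our measurements, and the function names are hypothetical.

```python
import math


def max_stream_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
    """Window-limited upper bound for one TCP stream: one window per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


def streams_needed(target_mbps: float, window_bytes: int, rtt_s: float) -> int:
    """Parallel streams needed before the window bound reaches target_mbps."""
    per_stream = max_stream_throughput_mbps(window_bytes, rtt_s)
    return math.ceil(target_mbps / per_stream)


if __name__ == "__main__":
    window = 64 * 1024   # assumed TCP window in bytes (illustrative)
    rtt = 0.060          # assumed round-trip time in seconds (illustrative)
    print(max_stream_throughput_mbps(window, rtt))
    print(streams_needed(622.0, window, rtt))
```

With these assumed values a single stream is limited to under 9 Mbits/s, so filling a 622 Mbits/s path would need dozens of streams or a much larger window. Because the bound depends on the round-trip time, a change in the path can invalidate a previously good setting.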
Instead of using our own Iperf measurements, we compared the ABwE results with the Iperf results from the IEPM-BW project. The source (Iperf client) uses TCP to send as many packets as it can in a fixed amount of time. The aggregated traffic from all the parallel streams, achieved during a set number of seconds, represents the “achievable bandwidth” for TCP applications. If the path is relatively empty, the aggregated traffic approaches the real capacity of the path and the estimate is approximately correct. However, if the path is congested on some link(s), the Iperf packets share this link with other user traffic (Cross-traffic) and the Iperf result will probably over-estimate the Available Bandwidth. This is because Iperf is typically configured to achieve "the maximum possible" bandwidth on the path by using multiple parallel streams, so other single-stream applications on congested links shared with the Iperf traffic will not get their fair share. This is especially the case if the Iperf traffic is a major component of the traffic on the bottleneck link. The Iperf results therefore vary with the situation on the path at a particular time. In practice the results obtained by this method usually fall between the Available Bandwidth and the “real capacity” of the path. A further disadvantage of the Iperf method is that it can saturate the bottleneck link during the testing period. Thus it should not be used for very frequent measurements.
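The “achievable bandwidth” described above is simply the total volume moved by all parallel streams divided by the test duration. A minimal sketch of that arithmetic follows. The function name and the byte counts are hypothetical, and this is not how Iperf itself is implemented.

```python
from typing import Sequence


def achievable_bandwidth_mbps(bytes_per_stream: Sequence[int],
                              duration_s: float) -> float:
    """Aggregate throughput of all parallel streams over a fixed test time."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_bits = sum(bytes_per_stream) * 8
    return total_bits / duration_s / 1e6


if __name__ == "__main__":
    # Hypothetical example: 4 parallel streams, 25 MB each, 10 second test.
    print(achievable_bandwidth_mbps([25_000_000] * 4, 10.0))
```

Adding streams raises the aggregate only until the bottleneck is full. Beyond that point the extra streams take bandwidth from the other traffic, which is why the result can exceed the Available Bandwidth.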
We believe that packet dispersion techniques can report results comparable to Iperf measurements. The packet dispersion methods used in ABwE or pathchirp are more modest in the sense that they load the network with little extra monitoring traffic. The ABwE and pathchirp methods are also much less dependent on parameter settings and so leave less room for ambiguity (getting very different results) on one path. However, these methods have other weak points, so the results could differ in some situations (compared to Iperf or between themselves).
Currently, we are in the phase of evaluating ABwE and pathchirp. One of our main tasks for the near future will be to show that these tools give good and accurate results for real network paths, especially those with high speeds (>= 100 Mbits/s). We started this type of testing at the beginning of 2003, and the first stage was to find a relationship between the Iperf and ABwE results. The first results were presented in the PAM2003 talk by Jiri Navratil.
We illustrate the current capability of ABwE with several example graphs shown below. Each example has been selected to characterize a different situation that we encounter. ABwE always reports 3 values: DBC, the Dominating Bottleneck Capacity (i.e. the capacity of the link that is currently the dominant limitation of the available bandwidth); CT, the Cross-traffic; and AB, the Available Bandwidth. The graphs show the situation on different paths during 24 hours, with measurements made at 2.5 minute intervals (one monitoring cycle for all 22 remote nodes/paths takes 2.5 minutes). In all examples we compare the ABwE data (AB) with the throughput results obtained by Iperf (the bars), as described in the previous paragraphs.
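For reading the graphs it helps to keep a simple relation between the three reported values in mind. The sketch below assumes that AB is approximately DBC minus CT. The text does not state this relation explicitly, so it is an assumption, although it is consistent with the figure captions. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AbweSample:
    """One ABwE report for a path (field names are illustrative)."""
    dbc_mbps: float  # Dominating Bottleneck Capacity
    ct_mbps: float   # Cross-traffic

    @property
    def ab_mbps(self) -> float:
        # Assumption: Available Bandwidth = DBC - CT, floored at zero.
        return max(self.dbc_mbps - self.ct_mbps, 0.0)

    @property
    def utilization(self) -> float:
        # Fraction of the dominating bottleneck taken by cross-traffic.
        return self.ct_mbps / self.dbc_mbps if self.dbc_mbps > 0 else 0.0


if __name__ == "__main__":
    # Values in the range quoted for the SLAC-FNAL path (Figure 4).
    sample = AbweSample(dbc_mbps=410.0, ct_mbps=90.0)
    print(sample.ab_mbps, round(sample.utilization, 3))
```

Under this assumption, a DBC of 410 Mbits/s with 40 to 90 Mbits/s of cross-traffic gives an AB of roughly 320 to 370 Mbits/s, which lies within the 300 to 400 Mbits/s range reported for that path.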
For a better understanding of the graphs (especially the high peaks) we must point out that the real paths that we measured are very dynamic. At one moment the path is relatively empty, and in the next moment it may be heavily congested. ABwE reflects these changes quite well, as demonstrated in Fig. 2. There is a Narrow Link (the capacity-limiting link) in the path, and it determines the upper limit of the path (in Fig. 2 the Narrow Link is 100 Mbits/s). Very often the Narrow Link is also the DBC. However, since most paths consist of many links with different capacities, it is possible for the DBC to move to another, faster link that has much more Cross-traffic and less available bandwidth. In such a case the packets transmitted by the router/switch node are delivered at the full speed of this link and are thus compressed. We therefore see peaks of cross-traffic that are much higher than the Narrow Link capacity. This doesn’t mean that the original Narrow Link disappeared; rather, another source of the bottleneck dominated at this particular moment.
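The effect of compression on the reported values can be illustrated with the basic packet-pair relation, in which the capacity estimate is the packet size divided by the time dispersion between two back-to-back packets. This is the textbook relation and only an assumption about what happens inside ABwE. The 1500 byte packet size and the dispersion values are illustrative.

```python
def capacity_from_dispersion_mbps(packet_bytes: int, dispersion_s: float) -> float:
    """Textbook packet-pair estimate: capacity = packet size / dispersion."""
    if dispersion_s <= 0:
        raise ValueError("dispersion_s must be positive")
    return packet_bytes * 8 / dispersion_s / 1e6


if __name__ == "__main__":
    packet = 1500  # assumed probe packet size in bytes

    # Spacing imposed by a 100 Mbits/s Narrow Link: 12000 bits / 1e8 bit/s.
    narrow_gap = 120e-6
    # Spacing after the pair is queued and re-sent back-to-back on a faster link.
    compressed_gap = 19.3e-6

    print(capacity_from_dispersion_mbps(packet, narrow_gap))
    print(capacity_from_dispersion_mbps(packet, compressed_gap))
```

When the pair arrives with the smaller gap, the estimate jumps far above the Narrow Link value. This is the kind of peak seen in the graphs.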
The other type of peak, the negative peaks or valleys (negative DBC peaks) seen for example in Fig. 4 & 5, represents another situation on the path, i.e. a fully loaded path (utilization close to 1). Such valleys are caused by extremely intensive and “aggressive” traffic loads, usually traffic directed to the same destination host that ABwE is monitoring. One source of this situation is illustrated in Fig. 4: high local networking activity (e.g. NFS or AFS activity) on the remote host being monitored. A second source of such valleys is the IEPM-BW Iperf measurements themselves. This is illustrated in Fig. 5, where the valleys of DBC are in good agreement with the beginning of the Iperf bars. If the 0.5 second ABwE measurement overlaps with the 10 second Iperf measurement, such a valley appears.
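Whether a given valley can be attributed to Iperf is a simple interval-overlap question. The sketch below flags ABwE probes whose 0.5 second window intersects a 10 second Iperf run. The function names and timestamps are hypothetical, and this is not the procedure used to produce the figures.

```python
from typing import List, Sequence


def intervals_overlap(start_a: float, dur_a: float,
                      start_b: float, dur_b: float) -> bool:
    """True if [start_a, start_a+dur_a) and [start_b, start_b+dur_b) intersect."""
    return start_a < start_b + dur_b and start_b < start_a + dur_a


def probes_during_iperf(probe_starts: Sequence[float],
                        iperf_starts: Sequence[float],
                        probe_dur: float = 0.5,
                        iperf_dur: float = 10.0) -> List[float]:
    """Start times of ABwE probes that ran while an Iperf test was active."""
    return [p for p in probe_starts
            if any(intervals_overlap(p, probe_dur, i, iperf_dur)
                   for i in iperf_starts)]


if __name__ == "__main__":
    # Probes every 150 s (2.5 minutes); one Iperf run starting at t = 145 s.
    print(probes_during_iperf([0.0, 150.0, 300.0], [145.0]))
```

Because the probe is short and the cycle is 2.5 minutes, most Iperf runs miss the probes entirely, which fits the observation that only some Iperf bars coincide with a valley.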

Figure 1: The result of the experiment for testing the narrow band on the path between SLAC and NERSC. There was very low cross-traffic (XTraffic) on this path (red line). The Bottleneck Capacity measured by ABwE (green line) agreed well with the narrow-band capacity (100 Mbits/s). The estimate of the Available Bandwidth is close to the capacity. The results of the IEPM Iperf measurements are shown as black bars (repeated every 90 minutes). The agreement between the ABwE estimate and Iperf is within 5%.

Figure 2: The results of the experiment between SLAC and Internet2 office in

Figure 3: The results of the experiment between SLAC and RICE. There is an expected amount of cross-traffic (red line) at about 30-40% of capacity during the 24 hour period. The Bottleneck Capacity measured by ABwE (green line) shows an average value around 114 Mbits/s. The estimate of the Available Bandwidth moves between 60 and 100 Mbits/s. The agreement with the IEPM Iperf results (black bars) is very good (within 5-10%).

Figure 4: The results of the experiment for testing the ESnet high speed path between SLAC and FNAL. There is modest cross-traffic (red line) at about 40 - 90 Mbits/s. The Bottleneck Capacity measured by ABwE (green line) shows an average value of 410 Mbits/s. The estimate of the Available Bandwidth varies between 300 and 400 Mbits/s, with individual drops caused by randomly appearing cross-traffic. The agreement with the IEPM Iperf results (black bars) is very good.

Figure 5: The results of the experiment for testing the high speed path between SLAC and NERSC. There is cross-traffic (red line) with visibly increasing and decreasing trends over 24 hours. The Bottleneck Capacity measured by ABwE (green line) shows a value of 622 Mbits/s, which is the real capacity of the OC12 line between the two labs. The estimate of the Available Bandwidth follows the cross-traffic profile. Individual drops in ABwE correspond (in most cases) to the Iperf measurements made by IEPM-BW. A drop is visible when the Iperf measurement (10 seconds) and the ABwE probing time (0.5 seconds) coincide. The agreement with the IEPM Iperf results (black bars) is very good, within 10%.

Figure 6: The results of the experiment for testing the high speed path between SLAC and CALTECH. There is high cross-traffic (red line) at about 100 - 300 Mbits/s with a pattern typical of paths where the total traffic is an aggregate of the activity of many people: the cross-traffic increases during the daytime and decreases during the night. The agreement with the IEPM Iperf results (black bars) is very good most of the time. However, there are periods when Iperf gives much lower results. Compared to the previous examples, the curve of the bottleneck capacity measured by ABwE (green line) is also different: it changes smoothly during the day. This is probably because the ABwE method uses relative relations between all measured values.
The ABwE lightweight bandwidth estimation toolkit has been carefully evaluated and now provides good bandwidth estimates (i.e. good agreement with Iperf and our general experience) in over 80% of the cases. It can make an estimate in real time (< 1 second) with minimal impact (40 kbits). ABwE also provides feedback to IEPM: a significant difference between the two measurements can indicate that the parameters (windows & streams) used for our heavier-weight Iperf estimator in IEPM need to be re-evaluated.
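The agreement figures quoted in the captions (for example, within 5% or within 10%) can be read as a relative difference between the ABwE estimate and the Iperf result. The sketch below shows one way such a check could drive the feedback to IEPM. The 20% threshold and the function names are assumptions chosen for illustration, not values used in IEPM.

```python
def relative_difference(abwe_mbps: float, iperf_mbps: float) -> float:
    """Difference between the two estimates as a fraction of the Iperf result."""
    if iperf_mbps <= 0:
        raise ValueError("iperf_mbps must be positive")
    return abs(abwe_mbps - iperf_mbps) / iperf_mbps


def needs_retuning(abwe_mbps: float, iperf_mbps: float,
                   threshold: float = 0.20) -> bool:
    """Flag a path whose Iperf parameters may need re-evaluation.

    threshold is a hypothetical cut-off, not a value taken from IEPM.
    """
    return relative_difference(abwe_mbps, iperf_mbps) > threshold


if __name__ == "__main__":
    print(needs_retuning(95.0, 100.0))   # estimates agree within 5%
    print(needs_retuning(300.0, 100.0))  # large disagreement
```

A flag raised by such a check does not say which tool is wrong. It only marks the path as a candidate for re-evaluating the Iperf window and stream settings.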
In the previous paragraphs we have demonstrated the current capability of ABwE. It allows us to do continuous monitoring and to estimate the available bandwidth on a path with an accuracy between 80% and 85%. Currently, we are trying to monitor more than 20 representative paths to different destinations. We are probing several ESnet sites, many
Unfortunately, for some sites we still have difficulties interpreting our results because they do not match other measurements well. It is hard to say which method gives more accurate results in these situations, and there are more factors that should be tested and verified. The problems in such situations can possibly be split into three main categories: the devices on the path work in a different fashion than we expect (packet dispersion problems); there is a traffic policy (traffic shaping) in some devices on the path that can limit any type of transfer (including Iperf); and there are problems with bottleneck node(s) on high speed segments, which can generate bursts of packets.
In the near future, we will concentrate on comparing both our methods (ABwE and pathchirp) and developing them in the framework of INCITE. We are optimistic based on the first results obtained in March.
With their real-time capability and low impact, ABwE and pathchirp are very suitable for providing real-time feedback on anomalous changes in bandwidth performance. We will also work on prediction algorithms using ABwE as a source of information. Due to increasing interest from the networking community in testing the ABwE method, we are going to prepare a standalone version for common use. We will also work on publishing our monitoring results via tools used in other systems (such as MonaLisa, used in some Grid projects).
[1] The available bandwidth “is the maximum IP-layer throughput that the path can provide to a flow, given the path’s current cross-traffic load” (Dovrolis, Ramanathan, Moore, “What do packet dispersion techniques measure?”).