Comparing the Available Bandwidth Estimation (ABwE) packet-train estimator results with Iperf

 

Jiri Navratil and Les Cottrell, SLAC, May 2003

 

Introduction

 

We have installed, configured and tested a new method for Available Bandwidth[1] Estimation (ABwE). In practical measurements starting in November 2002, we have collected and evaluated hundreds of monitoring results over a wide range of paths and studied the behavior of this tool under real network conditions. In the past few months we have started to compare these results with the results of other tools used for network bandwidth monitoring, particularly with Iperf-type measurements.

            The Iperf measurement is very popular and appears relatively simple. It is widely used by many ISPs and network specialists for different types of network measurement, including Available Bandwidth. Unfortunately, Iperf results can be misleading because, for paths with large congestion windows, they depend strongly on parameters such as the number of parallel streams and TCP’s maximum window size (see for example http://www-iepm.slac.stanford.edu/monitoring/bulk/index.html). Setting the optimal parameters is a delicate process, and such “tuning” often requires quite sophisticated methods. Some of these methods have been developed at SLAC, and some are built into other network measurement tools, for example the netest-2 tool developed at LBNL. During our analysis we discovered that even if the Iperf parameters have been set up by these methods or by an expert, the optimal parameters do not stay constant: after some time (several hours or days) they can become invalid, because the load on the path plays an important role and the path itself can change. Iperf results obtained with well-chosen and with poorly chosen parameters can differ by several hundred percent.
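
            To give a feel for the tuning involved, the sketch below (a simplified heuristic, not the actual SLAC or netest-2 procedure) sizes the TCP window from the bandwidth-delay product and adds parallel streams when the host’s maximum window is too small to fill the path; the capacity and RTT figures are hypothetical.

# Simplified sketch of Iperf parameter tuning via the bandwidth-delay
# product (BDP).  The path capacity and RTT are hypothetical inputs; the
# real tuning used at SLAC is more sophisticated, and as noted above the
# resulting parameters go stale as the path and its load change.
import math

def tune_iperf(capacity_bps, rtt_s, max_window_bytes):
    """Suggest (window in bytes, parallel streams) for iperf -w / -P."""
    bdp_bytes = capacity_bps * rtt_s / 8              # bytes in flight needed to fill the path
    window = min(bdp_bytes, max_window_bytes)         # one stream is capped by the maximum window
    streams = max(1, math.ceil(bdp_bytes / window))   # extra streams cover the remainder
    return int(window), streams

# Example: a 622 Mbits/s path with a 70 ms RTT and a 1 MB maximum window.
window, streams = tune_iperf(622e6, 0.070, 1 << 20)
print(f"iperf -w {window} -P {streams}")              # -> iperf -w 1048576 -P 6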

Methodology

            Instead of making our own Iperf measurements, we compared the ABwE results with the Iperf results from the IEPM-BW project. The source (the Iperf client) uses TCP to send as many packets as it can in a fixed amount of time. The traffic aggregated over all the parallel streams during a set number of seconds represents the “achievable bandwidth” for TCP applications. If the path is relatively empty, the aggregated traffic approaches the real capacity of the path and the estimate is approximately correct. However, if the path is congested on some link(s), the Iperf packets share those links with other user traffic (cross-traffic) and the Iperf result will probably over-estimate the Available Bandwidth. This is because Iperf is typically configured to achieve “the maximum possible” bandwidth on the path by using multiple parallel streams, so other single-stream applications on congested links shared with the Iperf traffic will not get their fair share. This is especially the case if the Iperf traffic is a major component of the bottleneck link. Iperf results thus vary to reflect the situation on the path at a particular time; in practice they usually fall between the Available Bandwidth and the “real capacity” of the path. A further disadvantage of the Iperf method is that it can saturate the bottleneck link for the duration of the test, so it should not be used for very frequent measurements.
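
            As a minimal illustration (with invented byte counts), the “achievable bandwidth” figure is simply the traffic of all parallel streams aggregated over the test period:

# Minimal sketch of how an Iperf-style "achievable bandwidth" number is
# formed: the bytes moved by all parallel streams in a fixed test period
# are aggregated into a single throughput figure.  The byte counts are
# invented for illustration.
def achievable_bandwidth_bps(bytes_per_stream, duration_s):
    """Aggregate throughput of all parallel streams, in bits per second."""
    return sum(bytes_per_stream) * 8 / duration_s

per_stream = [118e6, 121e6, 119e6, 120e6]    # bytes moved by 4 streams in 10 s
print(f"{achievable_bandwidth_bps(per_stream, 10.0) / 1e6:.0f} Mbits/s")  # ~382 Mbits/s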

            We believe that packet dispersion techniques can report results comparable to Iperf measurements. The packet dispersion methods used in ABwE or pathchirp are more modest in the sense that they load the network with only a little extra monitoring traffic. They are also much less dependent on parameter settings and so leave less room for ambiguity (widely differing results on the same path). However, these methods have other weak points, so in some situations their results can differ from Iperf’s, or from each other’s.
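
            The principle behind these dispersion methods, sketched below under simplifying assumptions (this is the basic packet-pair idea, not the full ABwE estimator), is that back-to-back probe packets are spread out by the bottleneck link, so the received spacing reveals its capacity:

# Minimal sketch of the packet-pair dispersion idea underlying ABwE and
# pathchirp.  Two back-to-back packets of L bits leave a link of capacity
# C separated by L/C seconds, so the observed spacing implies a capacity
# estimate.  The numbers are invented for illustration, and real
# estimators must filter out cross-traffic-induced noise.
PROBE_SIZE_BITS = 1500 * 8            # 1500-byte probe packets

def capacity_from_gap(arrival_gap_s):
    """Bottleneck capacity implied by the spacing of one packet pair."""
    return PROBE_SIZE_BITS / arrival_gap_s

# A 1500-byte pair arriving 120 microseconds apart implies ~100 Mbits/s.
print(f"{capacity_from_gap(120e-6) / 1e6:.0f} Mbits/s")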

            We are currently evaluating ABwE and pathchirp. One of our main tasks for the near future will be to show that these tools give accurate results for real network paths, especially high-speed paths (>= 100 Mbits/s). We started this type of testing at the beginning of 2003, and the first stage was to establish the relationship between the Iperf and ABwE results. The first results were presented in the PAM2003 talk by Jiri Navratil.

Results

            We illustrate the current capability of ABwE with the example graphs shown below. Each example has been selected to characterize a different situation that we encounter. ABwE always reports three values: DBC, the Dominating Bottleneck Capacity (i.e. the capacity of the link that is currently the dominant limitation of the available bandwidth); CT, the Cross-traffic; and AB, the Available Bandwidth. The graphs show the situation on different paths over 24 hours, with measurements made at 2.5 minute intervals (one monitoring cycle over all 22 remote nodes/paths takes 2.5 minutes). In all examples we compare the ABwE data (AB) with the throughput results obtained by Iperf (the bars), as described in the previous paragraphs.
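
            Our reading of how the three reported values relate (a hedged sketch; the production ABwE estimator also filters and smooths successive samples, which is omitted here) is that the Available Bandwidth is essentially the Dominating Bottleneck Capacity minus the Cross-traffic carried by that link:

# Hedged sketch of the relation between the three ABwE outputs: AB is
# what remains of the DBC after subtracting the CT carried by that link.
# The production estimator additionally smooths successive samples.
def available_bandwidth_bps(dbc_bps, ct_bps):
    return max(0.0, dbc_bps - ct_bps)

# Example in the spirit of Fig. 3: a ~114 Mbits/s dominating bottleneck
# carrying ~40 Mbits/s of cross-traffic leaves ~74 Mbits/s available.
print(f"{available_bandwidth_bps(114e6, 40e6) / 1e6:.0f} Mbits/s")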

            For a better understanding of the graphs (especially the high peaks) we must point out that the real paths we measure are very dynamic: at one moment a path is relatively empty, and in the next it may be heavily congested. ABwE reflects these changes quite well, as demonstrated in Fig. 2. There is a Narrow Link (the capacity-limiting link) in the path, and it determines the upper limit of the path (in Fig. 2 the Narrow Link is 100 Mbits/s). Very often the Narrow Link is also the DBC. However, since most paths consist of many links with different capacities, the DBC can move to another, faster link that carries much more cross-traffic and has less available bandwidth. In such a case the packets transmitted by that router/switch are delivered at the full speed of its link and are thus compressed; for example, probe packets queued behind cross-traffic on a 622 Mbits/s link can leave it nearly back-to-back, so their dispersion briefly indicates a capacity far above a 100 Mbits/s Narrow Link. We therefore see peaks of cross-traffic much higher than the Narrow Link capacity. This does not mean that the original Narrow Link disappeared; rather, another bottleneck dominated at that particular moment.

            The other feature, the downward peaks or valleys (dips in the DBC) seen in Figs. 4 and 5 for example, represents another situation on the path: a fully loaded path (utilization close to 1). Such valleys are caused by extremely intensive and “aggressive” traffic loads, usually traffic directed to the same destination host that ABwE is monitoring. One source of such a situation is illustrated in Fig. 4 and is caused by high local networking activity (e.g. NFS or AFS activity) on the remote host being monitored. A second source of such valleys is the IEPM-BW Iperf measurements themselves. This is illustrated in Fig. 5, where the valleys in the DBC agree well with the beginnings of the Iperf bars. Whenever the 0.5 second ABwE measurement overlaps the 10 second Iperf transfer, such a valley appears.

 

 

Figure 1:  The result of the experiment testing the Narrow Link in the path between SLAC and NERSC. There was very low cross-traffic (XTraffic, red line) on this path. The Bottleneck Capacity measured by ABwE (green line) agreed well with the Narrow Link capacity (100 Mbits/s). The estimate of the Available Bandwidth is close to the capacity. The results of the IEPM Iperf measurements are the black bars (repeated every 90 minutes). The agreement between ABwE and Iperf is within 5%.

 


Figure 2:  The results of the experiment between SLAC and the Internet2 office in Ann Arbor. The route is via Abilene and MichNet (the Michigan regional network). Low cross-traffic (red line) dominates on this path, with individual peaks of high traffic. The Bottleneck Capacity measured by ABwE (green line) shows that the Narrow Link capacity (100 Mbits/s) dominates somewhere in the path. When the cross-traffic in any part of the path increases, we see it as a value that determines a new bottleneck; an “instant” bottleneck caused by high-speed devices in the path can be very different from the dominating one. The ABwE Available Bandwidth estimate remains constant at the level of 100 Mbits/s. The agreement with Iperf (black bars) is within 5%.

Figure 3:  The results of the experiment between SLAC and RICE. There is the expected amount of cross-traffic (red line), at about 30-40% of capacity, over the 24 hour period. The Bottleneck Capacity measured by ABwE (green line) shows an average value around 114 Mbits/s. The estimate of the Available Bandwidth moves between 60 and 100 Mbits/s. The agreement with the IEPM Iperf results (black bars) is very good (within 5-10%).

 

 

Figure 4:  The results of the experiment testing the ESnet high-speed path between SLAC and FNAL. There is modest cross-traffic (red line) at about 40 – 90 Mbits/s. The Bottleneck Capacity measured by ABwE (green line) shows an average value of 410 Mbits/s. The estimate of the Available Bandwidth varies between 300 and 400 Mbits/s, with individual drops caused by randomly appearing cross-traffic. The agreement with the IEPM Iperf results (black bars) is very good.

 

 

Figure 5:  The results of the experiment testing the high-speed path between SLAC and NERSC. There is cross-traffic (red line) with visibly increasing and decreasing trends over the 24 hours. The Bottleneck Capacity measured by ABwE (green line) shows a value of 622 Mbits/s, which is the real capacity of the OC12 line between the two labs. The estimate of the Available Bandwidth moves according to the cross-traffic profile. Individual drops in the ABwE estimate correspond, in most cases, to the Iperf measurements made by IEPM-BW; a drop appears whenever the 10 second Iperf measurement and the 0.5 second ABwE probe overlap in time. The agreement with the IEPM Iperf results (black bars) is very good, within 10%.


Figure 6:  The results of the experiment testing the high-speed path between SLAC and CALTECH. There is high cross-traffic (red line) at about 100 – 300 Mbits/s, with a pattern typical of paths whose total traffic is an aggregate of the activity of many people: the cross-traffic increases during the daytime and decreases during the night. The agreement with the IEPM Iperf results (black bars) is very good most of the time; however, there are periods when Iperf gives much lower results. Compared to the previous examples, the curve of the Bottleneck Capacity measured by ABwE (green line) is also different: it too changes smoothly during the day, probably because the ABwE method uses relative relations among all the measured values.

Conclusions

            The ABwE lightweight bandwidth estimation toolkit has been carefully evaluated and now provides good bandwidth estimates (i.e. good agreement with Iperf and with our general experience) in over 80% of cases. It can make an estimate in real time (< 1 second) with minimal impact (about 40 kbits of probe traffic). ABwE also provides feedback to IEPM: a significant difference between the two measurements can indicate that the parameters (windows and streams) used for the heavier-weight Iperf estimator in IEPM need to be re-evaluated.

Future work

 

In the previous paragraphs we have demonstrated the current capability of ABwE. It allows us to monitor continuously, estimating the available bandwidth on a path with an accuracy between 80% and 85%. We currently monitor more than 20 representative paths to different destinations: several ESnet sites, many Abilene sites, 5 - 7 sites in Europe (connected via Géant), three sites in Japan and one site in Canada. This is quite a good set of probes, and in about 80% of cases it gives us results similar to those presented in Figs. 1 - 6.

            Unfortunately, for some sites we still have difficulties interpreting our results because they do not match other measurements well. It is hard to say which method gives the more accurate results in these situations; there are more factors that should be tested and verified. The problems can probably be split into three main categories: devices on the path work in a different fashion than we expect (packet dispersion problems); some devices on the path enforce a traffic policy (traffic shaping) that can limit any type of transfer (including Iperf); and bottleneck node(s) on high-speed segments can generate bursts of packets.

            In the near future, we will concentrate on comparing both our methods (ABwE and pathchirp) and developing them in the framework of INCITE. We are optimistic based on the first results obtained in March.

            With their real-time capability and low impact, ABwE and pathchirp are very suitable for providing real-time feedback on anomalous changes in bandwidth performance. We will also work on prediction algorithms using ABwE as a source of information. Due to increasing interest from the networking community in testing the ABwE methods, we are going to prepare a standalone version for common use. We will also work on publishing our monitoring results via tools used in other systems (such as MonALISA, used in some Grid projects).


[1] The available bandwidth “is the maximum IP-layer throughput that the path can provide to a flow, given the path’s current cross-traffic load”: C. Dovrolis, P. Ramanathan and D. Moore, “What do packet dispersion techniques measure?”, Proc. IEEE INFOCOM, 2001.