IEPM

PingER and NetPerf

R. L. Cottrell and Warren Matthews

Netperf throughput measurements between CERN (sunstats) and Caltech for the week of March 8 to March 15 show some interesting structure. Visual inspection suggests the typical throughput is around 9 Mbps, with several short glitches early in the week and a long (almost all-day) hit on Friday. Throughput recovered but became flaky again on Tuesday and has been very poor since then.
Deriving the throughput from the packet loss and RTT reported by PingER, using the equation of Mathis et al., shows a similar structure. Aggregating the data to one point per day misses the short glitches but clearly shows the large performance hits.

Note: Further work is required to calibrate the calculation correctly, so the graph is shown here without units.
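
The derivation mentioned above can be sketched in a few lines. This is a minimal illustration, not the calculation actually used for the graph: it assumes the commonly quoted form of the Mathis et al. bound, throughput <= (MSS / RTT) * (C / sqrt(loss)), with C = sqrt(3/2). The function name, the 1460-byte MSS, and the sample RTT and loss values are assumptions chosen for illustration only. Since the calibration is still open, the result is best read as a relative indicator.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss, c=math.sqrt(3.0 / 2.0)):
    """Upper bound on TCP throughput in bits/s from the Mathis et al. formula.

    mss_bytes: maximum segment size in bytes (assumed, not measured by ping)
    rtt_s:     round-trip time in seconds
    loss:      packet loss as a fraction (0.01 means 1%)
    c:         constant of order 1; sqrt(3/2) is an assumed value
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if loss <= 0:
        # With no observed loss the formula gives no finite bound.
        return math.inf
    return (mss_bytes * 8.0 / rtt_s) * (c / math.sqrt(loss))

if __name__ == "__main__":
    # Hypothetical values: 170 ms RTT and 1% loss.
    bps = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.170, loss=0.01)
    print(f"estimated bound: {bps / 1e6:.2f} Mbps")
```

Two properties of the formula are worth noting when reading the graphs: doubling the RTT halves the estimate, while the loss rate must quadruple to have the same effect.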

Furthermore, PingER reveals the network performance underlying the application performance. The packet loss for this period from pingtable shows that there was no packet loss associated with the problem on the 10th, but that the problem on the 14th and 15th was associated with packet loss of over 1%.
Similarly, inspection of the RTT shows that the Friday problem was due to an increase in RTT. Traceping from CERN shows there were routing problems: packets from CERN were being sent via the Swiss SWITCH network and DANTE rather than directly via the CERN-Abilene connection in Chicago.
Attempting a closer inspection of the PingER data to see finer structure, such as the glitches earlier in the week, breaks down. In fact, even the poor performance seen at the end of the week disappears! This is because the packet loss is spread out over time, and the problem only becomes visible when the loss is aggregated.
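
The effect of aggregation can be illustrated with a small sketch. The numbers here are hypothetical and are not PingER data: the sketch assumes 48 samples in a day with 10 pings each, of which 5 samples lose a single packet. Most individual samples then show no loss at all, while the daily total shows loss of about 1%.

```python
from statistics import median

def loss_fraction(sent, lost):
    """Fraction of packets lost; 0.0 if nothing was sent."""
    return lost / sent if sent else 0.0

def aggregate_loss(samples):
    """Loss over a whole period, computed from summed (sent, lost) counts."""
    total_sent = sum(sent for sent, _ in samples)
    total_lost = sum(lost for _, lost in samples)
    return loss_fraction(total_sent, total_lost)

# Hypothetical day: 48 samples of 10 pings; 5 samples each lose one packet.
LOSSY_SAMPLES = {3, 11, 20, 29, 41}
samples = [(10, 1) if i in LOSSY_SAMPLES else (10, 0) for i in range(48)]

per_sample = [loss_fraction(sent, lost) for sent, lost in samples]
daily = aggregate_loss(samples)

if __name__ == "__main__":
    clean = sum(1 for p in per_sample if p == 0.0)
    print(f"samples showing no loss: {clean} of {len(per_sample)}")
    print(f"median per-sample loss:  {median(per_sample):.2%}")
    print(f"aggregated daily loss:   {daily:.2%}")
```

With so few pings per sample, a loss rate near 1% cannot be resolved within a single sample, which is why the fine-grained view looks clean while the daily view does not.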

Conclusion

PingER is clearly capable of providing a summary of performance that tracks netperf well. As an added bonus, PingER provides information on why a problem occurs, by allowing us to measure at the network level rather than at the application level only.

However, PingER fails to show fine-grained details; this is a limitation of the sampling rate rather than of the PingER methodology. Specific sites involved in monitoring for the PPDG will have higher sampling rates, and consequently finer-grained measurements will result.



Updated March 16 2000.
URL: http://www-iepm.slac.stanford.edu/
Comments to iepm-l@slac.stanford.edu