Report on IEPM PPDG Efforts for the Quarter October - December 2005

Report by Les Cottrell for the IEPM team, SLAC

Bandwidth/Throughput Monitoring

Since our last update, we have incorporated several features to improve the manageability of the toolkit. These include: functionality for entering comments about the target and monitoring hosts; methods to synchronize what is being monitored from each of the monitoring hosts; and a mechanism for archiving the "current" state of IEPM-BW on monitoring hosts to simplify reinstallation of an existing system in case of problems. We reintroduced Pathload measurements into the IEPM-BW tests and updated our versions of probes such as Iperf and Pathchirp. Several new types of scatterplots have been added to display the probes against each other.

Work was done on the plateau anomaly detection code. For example, to prevent multiple alerts, once an alert is generated (after a 6 hour sustained drop), the bandwidth must recover for at least 3 hours before another alert can be generated. The Holt-Winters anomaly detection algorithm was incorporated to run in parallel with the plateau algorithm so that we can compare the two.

Analysis of the ping data is now done on a regular basis to look for packet loss and packet reordering. Time series graphs showing the packet loss and reordering are generated from this analysis so that they can be compared side by side with the probe time series graphs. Functionality has been added to save the alerts in a database table so that a report of all alerts for a given period can be generated. Currently all 2005 data is being run through the new plateau algorithm, both to check that it is working and for alert archival purposes.
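The alert suppression rule for the plateau algorithm (alert after a 6 hour sustained drop, then require at least 3 hours of recovery before alerting again) can be illustrated with a small state machine. This is only a sketch under stated assumptions, not the IEPM-BW code: the function name, the input format, and the idea that each sample already carries an "is low" flag are all hypothetical, and how a drop is actually detected is not covered here.

```python
def plateau_alert_times(samples, drop_hours=6.0, recover_hours=3.0):
    """Return the times at which an alert would be raised.

    samples: iterable of (time_in_hours, is_low) pairs sorted by time.
    'is_low' is an assumed, precomputed flag meaning the bandwidth is
    below its expected level; the real detection step is not modelled.
    """
    alerts = []
    armed = True       # True when a new alert is allowed
    low_since = None   # start time of the current run of low samples
    ok_since = None    # start time of the current run of normal samples

    for t, is_low in samples:
        if is_low:
            ok_since = None
            if low_since is None:
                low_since = t
            # Alert only after a sustained drop, and only if re-armed.
            if armed and t - low_since >= drop_hours:
                alerts.append(t)
                armed = False
        else:
            low_since = None
            if ok_since is None:
                ok_since = t
            # Re-arm only after a sustained recovery.
            if not armed and t - ok_since >= recover_hours:
                armed = True
    return alerts


if __name__ == "__main__":
    hourly = (
        [(t, True) for t in range(0, 11)]      # long drop
        + [(t, False) for t in range(11, 13)]  # recovery too short to re-arm
        + [(t, True) for t in range(13, 21)]   # second drop, suppressed
        + [(t, False) for t in range(21, 26)]  # recovery long enough
        + [(t, True) for t in range(26, 34)]   # third drop
    )
    print(plateau_alert_times(hourly))
```

In this example the second drop raises no alert because the preceding recovery lasted less than 3 hours, which is the behaviour the rule is meant to give.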

Passive Monitoring

Given the promise of using passive monitoring via Netflow to measure the performance of bulk-data applications over the network, we proposed a technique for using this data for forecasting. In outline, we look at the modes of the application performance distributions, devise methods for choosing the most appropriate mode (e.g. eliminate anomalous peaks, choose the most frequent mode for a particular time of day, day of week, etc.) and provide the parameters of this mode for forecast estimation.
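One way to picture the mode selection described above is a simple histogram approach: bin the observed throughputs for a given time-of-day bucket, discard sparsely populated bins as anomalous peaks, and report the most populated bin. The sketch below is an assumption-laden illustration, not the proposed technique itself; the function names, the bin width, the minimum-fraction threshold and the use of the median as the reported parameter are all hypothetical choices.

```python
from collections import Counter
from statistics import median


def dominant_mode(throughputs, bin_width=10.0, min_fraction=0.05):
    """Return (bin_low, bin_high, median_in_bin) for the most populated bin.

    Bins holding fewer than min_fraction of the samples are ignored;
    here that stands in for "eliminating anomalous peaks".
    Returns None if there are no samples or no bin passes the threshold.
    """
    if not throughputs:
        return None
    counts = Counter(int(x // bin_width) for x in throughputs)
    threshold = min_fraction * len(throughputs)
    candidates = {b: n for b, n in counts.items() if n >= threshold}
    if not candidates:
        return None
    # Most frequent bin; ties are broken in favour of the lower bin.
    best = max(candidates, key=lambda b: (candidates[b], -b))
    members = [x for x in throughputs if int(x // bin_width) == best]
    return (best * bin_width, (best + 1) * bin_width, median(members))


def forecast_by_hour(records, **kwargs):
    """Group (hour_of_day, throughput) records and pick a mode per hour."""
    by_hour = {}
    for hour, value in records:
        by_hour.setdefault(hour, []).append(value)
    return {hour: dominant_mode(values, **kwargs)
            for hour, values in by_hour.items()}


if __name__ == "__main__":
    # Made-up throughput values, purely for illustration.
    records = [(2, 91), (2, 95), (2, 93), (14, 41), (14, 45), (14, 43)]
    print(forecast_by_hour(records))
```

The same grouping could be keyed on day of week, or on both hour and day, by changing what is passed as the first element of each record.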

PingER and Developing Region Monitoring

Through Internet2 we have made contact with people in Palestine and have an agreement to install PingER on a couple of hosts there.

With a student from NIIT/Pakistan we are working on developing a web site to enable locating specified hosts by triangulating on ping RTTs from landmark sites. As part of this we wrote a new version of the reverse traceroute server script to also enable pings. This has been successfully installed at about 5 landmark sites.
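The idea of locating a host by triangulating on ping RTTs from landmarks can be sketched as follows. This is a minimal illustration under strong assumptions, not the method used by the web site: it assumes the minimum RTT converts linearly to distance at roughly 100 km per millisecond of RTT (about two thirds of the speed of light in fibre, halved for the one-way path), and it uses a coarse grid search over latitude and longitude. The function names, the conversion factor and the landmark coordinates in the example are hypothetical. In practice an RTT only gives an upper bound on distance, so a real system would need calibration.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_MS_RTT = 100.0  # assumption: ~2/3 c in fibre, halved for one-way


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def locate(landmarks, step_deg=1.0):
    """Grid-search the (lat, lon) best matching the RTT-derived distances.

    landmarks: list of (lat, lon, min_rtt_ms) tuples.
    Minimises the sum of squared differences between the great-circle
    distance to each landmark and min_rtt_ms * KM_PER_MS_RTT.
    """
    n_lat = int(round(180.0 / step_deg))
    n_lon = int(round(360.0 / step_deg))
    best, best_err = None, float("inf")
    for i in range(n_lat + 1):
        lat = -90.0 + i * step_deg
        for j in range(n_lon):
            lon = -180.0 + j * step_deg
            err = sum(
                (haversine_km(lat, lon, la, lo) - rtt * KM_PER_MS_RTT) ** 2
                for la, lo, rtt in landmarks
            )
            if err < best_err:
                best, best_err = (lat, lon), err
    return best


if __name__ == "__main__":
    # Synthetic example: approximate landmark coordinates, RTTs derived
    # from a chosen "true" position so the answer is known in advance.
    true_lat, true_lon = 34.0, 73.0
    sites = [(37.4, -122.2), (46.2, 6.1), (36.1, 140.1)]
    landmarks = [
        (la, lo, haversine_km(true_lat, true_lon, la, lo) / KM_PER_MS_RTT)
        for la, lo in sites
    ]
    print(locate(landmarks))
```

A grid search is used here only because it is easy to follow; a least-squares solver or a finer local search around the best grid point would be the natural refinement.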

One of our future goals is to integrate different monitoring infrastructures into a Federation. As a first step, we are working on a front end to the AMP/NLANR ping measurement data so that it can be accessed, analyzed and displayed by the PingER project. We are also working to integrate our traceroute analysis program with AMP, and with MonALISA to make IEPM-BW data available via MonALISA.

In collaboration with NIIT/Pakistan, we installed two new PingER monitoring sites at NTC/PERN in Pakistan. This should enable us to make a better evaluation of Internet performance within Pakistan.

Following the conversion of BINP from the dedicated 512 kbits/s Novosibirsk to KEK link to GLORIAD, we measured and analyzed the performance and found that it has gone up by a factor of 10. However, it is not consistent and has large diurnal variations which need to be understood.


With Caltech, Manchester, FNAL, CERN and others, we once again prepared for and entered the SC2005 (in Seattle) BandWidth Challenge (BWC). We put together a web site to publicize our efforts. Equipment loans were secured from Sun, Cisco, Boston Computers, QLogic, Neterion, and Chelsio. We arranged for seven 10 Gbits/s waves to the SLAC/FNAL booth (2 from SLAC, 4 from FNAL and one from the UK). At SLAC we installed an xrootd cluster of ten Sun v20z dual 1.8 GHz Opterons, plus 4 file servers. At SC2005 we installed eight file servers from Boston Computers, a cluster of ten Sun v20z with dual 2.4 GHz Opterons, and a 40 Gbits/s Fibre Channel connection to 20 TBytes in the StorCloud booth. Our team won the BWC for the third year in succession, this year achieving over 150 Gbits/s, and we put out several press releases.

Following the success of using xrootd in the bandwidth challenge, we worked with the developers to evaluate its performance with 10 Gbits/s network interfaces from Chelsio and Neterion.

We have made contact with Microsoft and have put together an MOU to evaluate a new TCP stack for Windows Vista, on real networks.

Admin, visits, papers, presentations, proposals etc.

We hosted a visit by the Rector of the National University of Sciences and Technology (NUST) and the Dean of NUST's Institute of Information Technology. The visitors met with the vice president of Stanford, the director of SLAC, the Dean of Stanford Hospital and many others.

Yee Ting Li, a postdoc from UCL England, joined the IEPM team to work on the Terapaths project.


We made the following presentations: