Report on IEPM PPDG Efforts for the Quarter April - June 2005

Report by Les Cottrell, SLAC

Bandwidth/Throughput Monitoring

The DataGrid Wide Area Network Monitoring Infrastructure (DWMI) now has IEPM-BW monitoring successfully installed and making measurements, as well as collecting, analyzing and reporting results, at BNL, Caltech, CERN, FNAL, and SLAC.

The measurements include traceroutes and pings at 10-minute intervals, plus capacity and throughput measurements, and we have just added support for pathload. Analysis and visualization include time series, the traceroute visualization, and a beta version of the bandwidth change analysis. Based on our own observations and studies, and on tests made at CAIDA and reported at PAM2005, we are replacing the ABwE/abing lightweight packet pair dispersion bandwidth estimation with pathchirp. This should provide more accurate results at the cost of more network utilization (a factor of 10) and a longer measurement time (10 seconds vs. one second). Pathload requires a further factor of 100 (above pathchirp) and provides even more accuracy.

For monitoring between the monitoring sites we are also using the more intensive iperf, and have introduced support for thrulay, a new tool from Stanislav Shalunov of Internet2. Both iperf and thrulay measure achievable throughput. Iperf provides multi-stream measurements. Thrulay also reports the Round Trip Time and appears to be easier to control than iperf. Initial comparisons indicate that the two tools give similar results.
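As background on what a packet pair dispersion measurement estimates, the sketch below shows the basic textbook relation only: two back-to-back packets are spread out by the bottleneck link, and the capacity estimate is the packet size divided by the observed gap. It is not the algorithm used by ABwE/abing or pathchirp, which go further to estimate available bandwidth. The function names and the 1500-byte packet size are assumptions made for this illustration.

```python
def packet_pair_capacity_bps(packet_bytes: int, dispersion_s: float) -> float:
    """Estimate bottleneck capacity (bits/s) from the gap between two
    back-to-back packets of equal size, as seen at the receiver."""
    if dispersion_s <= 0:
        raise ValueError("dispersion must be positive")
    return packet_bytes * 8 / dispersion_s


def expected_dispersion_s(packet_bytes: int, capacity_bps: float) -> float:
    """Inverse relation: the gap a link of the given capacity would impose."""
    if capacity_bps <= 0:
        raise ValueError("capacity must be positive")
    return packet_bytes * 8 / capacity_bps


if __name__ == "__main__":
    # Hypothetical 1500-byte packets on links of increasing speed.
    for rate_bps in (100e6, 1e9, 10e9):
        gap_us = expected_dispersion_s(1500, rate_bps) * 1e6
        print(f"{rate_bps / 1e9:g} Gbits/s -> gap of {gap_us:.2f} microseconds")
```

The gap shrinks in proportion to the link speed, so the faster the link, the finer the timestamps needed to resolve it.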

We worked with Bill Allcock of ANL to try to integrate GridFTP into IEPM-BW. The main challenge was with the certificates and with running the tests unattended (without requiring manual renewal of the certificate). For the moment we have dropped this activity, and will use bbcp or bbftp if we need file transfer rates to be measured.

The BNL IEPM-BW monitoring host has been upgraded to a more powerful (3GHz) CPU with a 1Gbits/s interface. We installed the latest version of IEPM-BW, configured the host, got the appropriate ports enabled, optimized the TCP window sizes, and got the measurements running stably. We then beta tested the ESnet On-demand Secure Circuits and Advance Reservation System (OSCARS) to reserve circuits with dedicated bandwidth (between SLAC and CERN, and SLAC and BNL) and investigated the impact on achievable throughput and jitter.
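As a rough guide to what optimizing the TCP window size involves, a common rule of thumb is to make the window at least as large as the bandwidth-delay product of the path. The sketch below computes that value; it is an illustration and not the procedure followed on the BNL host. The function name is an assumption, and the 170 ms round trip time in the example is a hypothetical value, not a measurement from this report.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes, rounded up.

    bandwidth_bps: path bandwidth in bits per second.
    rtt_ms: round trip time in milliseconds.
    Integer arithmetic is used so the result is exact.
    """
    if bandwidth_bps <= 0 or rtt_ms <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    bits_times_ms = bandwidth_bps * rtt_ms
    # Divide by 8 (bits per byte) and 1000 (ms per s), rounding up.
    return -(-bits_times_ms // 8000)


if __name__ == "__main__":
    # 1 Gbits/s interface with a hypothetical 170 ms RTT.
    window = bdp_bytes(1_000_000_000, 170)
    print(f"window of at least {window} bytes ({window / 2**20:.1f} MiB)")
```

A single TCP stream with a window smaller than this value cannot keep the path full, which is one reason multi-stream tools are used for comparison.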

For SC2005 and iGrid2005 demos, we have added maps of the IEPM-BW deployment. We are also working with Iosif Legrand of CERN/Caltech to decide how to provide MonALISA access to the data via a graphical web interface. With the new version of IEPM-BW (version 3), the web services access to the data no longer works. We will be reviving this to enable application access to the data.

Passive Monitoring

To potentially reduce traffic on the network and to provide monitoring at 10Gbps, a realm where packet pair dispersion techniques are likely to fail because of Network Interface Card offloading functions and the insufficient timer granularity of Unix systems, we are looking at collecting Netflow records for big flows (> 100Kbits/s) and seeing whether we can use them for forecasting between sites. This required getting approval from the SLAC security folks to make filtered Netflow records available to NIIT. For a given site pair, the data from multiple weeks are mapped onto a single week, then regularized and interpolated, and then passed to a forecasting technique such as Holt-Winters. Initial results look promising. There appear to be about 15 sites seen from SLAC for which there are sufficient Netflow flows to make forecasts even in the presence of seasonal variations. Further, the Netflow throughputs are strongly correlated with the packet pair available bandwidth measurements.
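The regularizing, interpolating and forecasting steps can be sketched as follows. This is a minimal illustration in Python, not the code used in the study: the function names, the bin width, the smoothing parameters and the sample values are all assumptions chosen for the example, and the Holt-Winters variant shown is the additive one with a simple initialization. The step of mapping multiple weeks onto a single week is omitted.

```python
from typing import List, Optional, Sequence, Tuple


def regularize(samples: Sequence[Tuple[float, float]],
               bin_s: float, n_bins: int) -> List[Optional[float]]:
    """Average (time_s, throughput) samples into fixed-width bins.
    Bins with no samples are returned as None."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, value in samples:
        idx = int(t // bin_s)
        if 0 <= idx < n_bins:
            sums[idx] += value
            counts[idx] += 1
    return [sums[i] / counts[i] if counts[i] else None for i in range(n_bins)]


def interpolate(series: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps linearly; gaps at either end take the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no data to interpolate")
    out: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(series[right])
        elif right is None:
            out.append(series[left])
        else:
            frac = (i - left) / (right - left)
            out.append(series[left] + frac * (series[right] - series[left]))
    return out


def holt_winters_additive(y: Sequence[float], season_len: int,
                          alpha: float, beta: float, gamma: float,
                          horizon: int) -> List[float]:
    """Additive Holt-Winters forecast for `horizon` steps past the end of y."""
    m = season_len
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Simple initialization from the first two seasons.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    seasonal = [y[i] - level for i in range(m)]
    for t, obs in enumerate(y):
        s = seasonal[t % m]
        prev_level = level
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (obs - level) + (1 - gamma) * s
    n = len(y)
    return [level + h * trend + seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Hypothetical (time_s, Mbits/s) samples with one empty 60 s bin.
    raw = [(0, 10.0), (70, 20.0), (190, 40.0)]
    series = interpolate(regularize(raw, bin_s=60, n_bins=4)) * 3
    print(holt_winters_additive(series, 4, 0.5, 0.1, 0.3, horizon=4))
```

In practice the season length would correspond to the daily or weekly cycle of the binned data, and the three smoothing parameters would be fitted rather than fixed by hand.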

PingER and Developing Region Monitoring

Two graduate students from NIIT are now at SLAC for a year as part of the MAGGIE-NS collaboration working on the IEPM project and PingER.

We put together and made presentations for the International Workshop on HEP Networking (in Korea) and for the Internet2 Members Meeting (in Washington), on Quantifying the Digital Divide from Within and Without, including new measurements made from the new monitoring sites in Pakistan and India. We added new sites in Uganda, and are working to get a monitoring site in South Africa (the first on the African continent).

Problem Case Reports

We put together two case reports in this period, both for the Budker Institute of Nuclear Physics at Novosibirsk. Both had to do with major routing changes. The first resulted in loss of connectivity due to security configurations. The second used GLORIAD, which increased the achievable throughput from 500kbits/s to 7Mbits/s and reduced the RTT from 350ms to 250ms.


The 10Gbps wide area network testbed at Sunnyvale is still in place with a connection to UltraLight, and an imminent connection to UltraScienceNet. Due to lack of funding, its future is uncertain. We have been assisting UltraScienceNet with their connection at Sunnyvale.

ESnet has recently connected SLAC at 10Gbits/s to the BAMAN. We assisted ESnet in demonstrating utilization of the BAMAN between SLAC and NERSC for Ray Orbach's visit to LBNL on June 24th, 2005.

With Caltech, Manchester, FNAL and others we plan to participate at iGrid2005 (in San Diego) and SC2005 (in Seattle). We have put together a web site to publicize our efforts.

We gave an invited talk at the Sun Microsystems SuperG meeting in Washington on Characterization and Evaluation of TCP and UDP-based Transport on Real Networks.

Admin, visits, papers, presentations, proposals etc.

We hosted: a one day site visit by our DoE/MICS project manager, Thomas Ndousse; a one week visit by the MAGGIE-NS PI from Pakistan, Prof. Dr. Arshad Ali; a visit by Kars Ohrenberg of DESY.

With FNAL we submitted a paper to the 14th IEEE Workshop on Local and Metropolitan Area Networks, entitled "Anomalous Event Detection for Internet End-to-end Performance Measurements".

With Caltech and others we had published: "FAST TCP: From Theory to Experiments", Cheng Jin et al., IEEE Network, Jan/Feb 2005, vol. 19, no. 1, also SLAC-PUB-10869.

We made the following presentations:

We submitted the following proposals: We are also working on a proposal to Cisco for network topology discovery.