Report on IEPM PPDG Efforts for the Quarter January - March 2005

Report prepared by Les Cottrell, April 2005

Bandwidth/Throughput Monitoring

The DataGrid Wide Area Network Monitoring Infrastructure (DWMI) toolkit now has IEPM-BW monitoring successfully installed, making measurements, and collecting, analyzing and reporting results at: After some difficulties in getting ports opened at BNL, all sites are up and running successfully. We are working with BNL to get a faster host there; the current one is limited to 100Mbps. All the other hosts are running at 1Gbps.

The measurements include traceroutes at 10 minute intervals, pings, and capacity/throughput. To reduce the load on the network and on the remote sites being monitored, we are mainly using the lightweight ABwE/abing monitoring tool developed by the INCITE project. It provides rough estimates of capacity, cross traffic and available bandwidth while using only 20 packets. For monitoring between the monitoring sites we are also using the more intensive iperf tool. We are looking at an alternative to iperf for achievable throughput measurements: thrulay, a new tool from Stanislav Shalunov of Internet2. It appears to be easier to control than iperf. It has been installed at SLAC and initial experiences are positive. We need to compare and contrast the results from iperf and thrulay before we deploy it elsewhere. We presented a status report on DWMI at the Internet2 Joint Techs meeting in Salt Lake City in February.
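To illustrate how a handful of packets can yield rough capacity, cross-traffic and available-bandwidth estimates, here is a minimal sketch of a generic packet-pair dispersion calculation. It is not abing's actual implementation. The function name, the 1500-byte packet size and the simple fluid model (any extra gap between the two packets of a pair is attributed to cross traffic queued at the bottleneck) are all assumptions made for illustration.

```python
from statistics import mean


def packet_pair_estimates(dispersions_s, packet_bytes=1500):
    """Rough path estimates from packet-pair dispersions (hypothetical helper).

    dispersions_s: receiver-side time gaps, in seconds, between the two
    packets of each pair that was sent back to back.
    Returns (capacity, cross_traffic, available) in bits per second.
    """
    if not dispersions_s or min(dispersions_s) <= 0:
        raise ValueError("need at least one positive dispersion")

    bits = packet_bytes * 8
    d_min = min(dispersions_s)

    # The smallest gap is assumed to come from a pair that crossed the
    # bottleneck with no cross traffic in between, so gap = bits / capacity.
    capacity = bits / d_min

    # Assumed fluid model: a gap d > d_min means the bottleneck spent the
    # fraction (1 - d_min / d) of that gap serving cross traffic.
    cross_traffic = mean(capacity * (1 - d_min / d) for d in dispersions_s)

    available = capacity - cross_traffic
    return capacity, cross_traffic, available


if __name__ == "__main__":
    # Ten illustrative pairs: half undisturbed (12 us), half delayed (24 us).
    gaps = [12e-6] * 5 + [24e-6] * 5
    cap, cross, avail = packet_pair_estimates(gaps)
    print(f"capacity {cap / 1e6:.0f} Mbps, cross traffic {cross / 1e6:.0f} Mbps, "
          f"available {avail / 1e6:.0f} Mbps")
```

Because the estimate rests on so few samples, a single delayed or reordered pair can shift it noticeably, which is consistent with the results being described as rough.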

We are working on providing maps of the IEPM-BW deployment. We are also working with Iosif Legrand of CERN/Caltech to decide how to provide MonALISA access to the data via a graphical web interface. With the new version of IEPM-BW (version 3), the web services access to the data no longer works. We will be reviving this to enable application access to the data.

We need to work with BNL to understand how we can monitor and compare MPLS/QoS circuits with shared best-effort circuits.

Anomalous Event Detection

We are working with FNAL to evaluate various methods for detecting anomalous events in the monitoring data. These include the Plateau Algorithm, Kolmogorov-Smirnov, Holt-Winters, Principal Component Analysis, and a technique from Mark Burgess. Avoiding both false positives and missed events is made difficult by the diurnal behavior of the data and the noise in the ABwE/abing data. We have presented a status report at the ESCC in Salt Lake City and are working on a publication.
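As an illustration of one of the methods listed above, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov check that compares a recent window of bandwidth estimates with a longer history window. It is not the code under evaluation. The function names, the window contents and the use of the large-sample critical value at the 5% level (coefficient 1.36) are assumptions made for illustration.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        # Fraction of each sample that is <= x.
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def is_anomalous(history, recent, c_alpha=1.36):
    """Flag the recent window if its distribution differs from the history.

    c_alpha=1.36 is the usual large-sample coefficient for a 5% significance
    level; with short windows it is only an approximation.
    """
    n, m = len(history), len(recent)
    threshold = c_alpha * sqrt((n + m) / (n * m))
    return ks_statistic(history, recent) > threshold


if __name__ == "__main__":
    # Illustrative available-bandwidth values in Mbps, not measured data.
    history = list(range(900, 1000, 5))
    steady = history[::2]
    dropped = [400 + i for i in range(10)]
    print(is_anomalous(history, steady))   # same distribution as the history
    print(is_anomalous(history, dropped))  # sustained drop in bandwidth
```

Given the diurnal behavior noted above, one option would be to draw the history window from the same hours on previous days, so that an ordinary daily dip is not compared against overnight values.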

PingER and Developing Regions Monitoring

We put together and submitted the 2005 ICFA/SCIC Network Monitoring Working Group Report on Internet Performance.

Two members of the IEPM team visited NIIT in Pakistan for two weeks each, to further the MAGGIE-NS collaboration that is building and extending tools to provide sustainable monitoring of the Internet.

We have set up PingER monitoring sites in Pakistan and India, which will assist in providing information about performance within, between and from Developing Regions, complementing the existing measurements from Brazil and Russia. We put together a presentation for the World Bank on Quantifying the Digital Divide.

Testbeds

To facilitate usage of the UltraScienceNet (USN) and UltraLight 10GE circuits being connected to Sunnyvale, we have acquired (from CENIC) colocation space and power on a temporary basis ($1K/month) at the Sunnyvale Level/CENIC Point of Presence. The plan is that when ESnet provides 10Gbits/s access from SLAC to Sunnyvale (planned for July 2005), we will review whether we need to retain / can afford this colocation space.

At Sunnyvale we have installed an UltraLight Cisco 6509 router switch, which is connected to the 10GE UltraLight circuit and has access to StarLight and Caltech. In addition, we have borrowed from BaBar four SunFire V20zs, each with two 1.8GHz AMD Opteron 64bit cpus. Two of these have been configured with Red Hat EL 4 Linux and SLAC-owned Neterion/S2io 10GE Network Interface Cards (NICs). These have been installed at Sunnyvale and connected to the Cisco 6509. To enable remote management at Sunnyvale we have also installed remote power cycling control, console access via a terminal server, and 10Mbps management access to the hosts. We have exchanged accounts with other UltraLight sites at Caltech, CENIC-LA and StarLight and are currently tuning the configurations to maximize throughput between the V20zs. Currently the throughput achieved at Sunnyvale is 5.7Gbits/s for the Neterion NICs. At SC2004 we achieved 7.4Gbits/s with 2.4GHz V20zs and S2io and Chelsio TOE NICs. We need to understand the discrepancy.

The remaining two V20zs are still at SLAC. They are being configured with Chelsio TCP Offload Engine (TOE) 10GE NICs loaned from Chelsio. We are also working with Sun to install Solaris 10 on one of the hosts so we can evaluate its performance for 10GE.

Next week (4-8 April 2005) USN plans to carry out the physical installation of the UltraScienceNet at Sunnyvale. Initially the USN circuits will support IP traffic, so we will plug them into the Cisco 6509.

We published a paper on Characterization and Evaluation of TCP and UDP-based Transport on Real Networks at the Protocols for Fast Long-Distance Networks workshop in Lyon in February 2005.