Status report for Terapaths: DWMI, 6/16/06

Les Cottrell (PI), Connie Logg, Gary Buhrmaster, Fawad Nazir, Jerrod Williams, SLAC

Active Monitoring

The DataGrid Wide Area Network Monitoring Infrastructure (DWMI) now has IEPM-BW monitoring successfully installed and making measurements, as well as collecting, analyzing, and reporting results, at BNL, Caltech, CERN, FNAL, and SLAC.

The measurements include traceroutes and pings at 10-minute intervals, and capacity/throughput measurements. Analysis and visualization include time series, the traceroute visualization, and a beta version of the bandwidth change analysis. Based on our own observations and studies, and on tests made at CAIDA and reported at PAM2005, we are replacing the ABwE/abing lightweight packet-pair dispersion bandwidth estimation with pathchirp. This should provide more accurate results at the cost of slightly more network utilization and a longer measurement time (10 seconds vs. one second). For monitoring between the monitoring sites we are also using the more intensive iperf, and have introduced support for thrulay, a new tool from Stanislav Shalunov of Internet2. Both iperf and thrulay measure achievable throughput. Iperf provides multi-stream measurements. Thrulay provides the and also appears to be easier to control than iperf. Initial comparisons indicate that the two tools give similar results.
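The packet-pair idea behind lightweight estimators such as ABwE/abing can be illustrated with the basic dispersion relation: two back-to-back packets of L bytes leave the narrowest link spaced by roughly 8L/C seconds, so the bottleneck capacity is about C = 8L/dispersion. The following is a minimal sketch under that assumption only; the function name and the sample dispersions are hypothetical, and it is not ABwE's or pathchirp's actual algorithm (pathchirp instead sends "chirp" trains with exponentially decreasing packet spacing to estimate available bandwidth).

```python
from statistics import median


def packet_pair_capacity_bps(packet_size_bytes, dispersions_s):
    """Estimate bottleneck capacity (bits/s) from packet-pair dispersions.

    Each back-to-back pair of L-byte packets is spread by the narrowest link
    to roughly L*8/C seconds, so C ~= L*8 / dispersion. Taking the median over
    many pairs damps the effect of cross-traffic on individual samples.
    """
    valid = [d for d in dispersions_s if d > 0]
    if not valid:
        raise ValueError("need at least one positive dispersion")
    return packet_size_bytes * 8 / median(valid)


# Hypothetical dispersions (seconds) for 1500-byte packets on a ~1 Gbit/s path;
# the 24 us sample mimics a pair delayed by cross-traffic.
samples = [12e-6, 12e-6, 13e-6, 24e-6, 12e-6]
print(f"{packet_pair_capacity_bps(1500, samples) / 1e9:.2f} Gbit/s")  # prints 1.00 Gbit/s
```

Using the median rather than the mean is one simple way to keep a few cross-traffic-inflated dispersions from dragging the estimate down.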

We are working with Bill Allcock of ANL to try to integrate GridFTP into IEPM-BW. The main challenges appear to be handling the certificates and running the tests unattended (without requiring the certificate to be renewed).

The BNL host has recently been upgraded to a more powerful (3 GHz) CPU with a 1 Gbit/s interface. The recent focus has therefore been on installing the latest version of IEPM-BW, configuring this host, getting the appropriate ports enabled (ping/ICMP is currently blocked at the BNL firewall), optimizing the TCP window sizes, and getting the measurements to run stably. The next step will be to make measurements on both the best-effort paths and the MPLS/QoS-enabled paths. Dantong has devised a method using the TOS bits that allows these measurements to be made from a single host, which should facilitate comparisons.
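Optimizing TCP window sizes usually starts from the bandwidth-delay product (BDP): to keep a path full, the window must hold bandwidth × RTT worth of bytes. The minimal sketch below computes the BDP and creates a test socket with buffers sized accordingly and, optionally, a TOS byte derived from a DSCP code point, as one generic way of marking a flow for a QoS path from the same host. The 80 ms RTT, the DSCP value 46 (EF), and the helper names are assumptions for illustration, not Dantong's actual method or BNL's configuration.

```python
import socket


def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return int(bandwidth_bps * rtt_s / 8)


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP code point in the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def make_test_socket(window_bytes, tos=0):
    """Hypothetical helper: a TCP socket whose send/receive buffers are sized
    to the desired window and whose packets optionally carry a TOS marking.
    Buffers are set before connect() so the window scale is negotiated to fit."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window_bytes)
    if tos:
        s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return s


# Assumed path: 1 Gbit/s interface and an 80 ms round-trip time.
window = bdp_bytes(1e9, 0.080)   # 10,000,000 bytes
tos = dscp_to_tos(46)            # DSCP 46 (EF) -> TOS byte 0xB8
# Best-effort and QoS-marked tests from the same host might then use:
#   be_sock = make_test_socket(window)
#   qos_sock = make_test_socket(window, tos)
```

On Linux the kernel doubles the requested SO_SNDBUF/SO_RCVBUF value and caps it at net.core.wmem_max/net.core.rmem_max, so those limits typically have to be raised as well before large windows take effect.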

We have added maps of the IEPM-BW deployment. We are also working with Iosif Legrand of CERN/Caltech to decide how to provide MonALISA access to the data via a graphical web interface. With the new version of IEPM-BW (version 3), web services access to the data no longer works; we will be reviving it to enable application access to the data.

Passive Monitoring

To potentially reduce measurement traffic on the network, we are looking at collecting Netflow records for big flows and seeing whether we can use these for forecasting between sites. For a given site pair, this will require mapping the data from multiple weeks onto a single week, then regularizing and interpolating the data, and finally applying a forecasting technique such as Holt-Winters.
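The regularize-then-forecast steps can be sketched as follows, assuming the big-flow Netflow records for one site pair have already been reduced to time-sorted (timestamp, throughput) samples. The function names, the simple initialization, and the smoothing constants are illustrative choices, not the project's implementation; the forecaster is the standard additive Holt-Winters (triple exponential smoothing) method.

```python
from bisect import bisect_left


def regularize(times, values, start, step, n):
    """Linearly interpolate irregular (time, value) samples onto n evenly
    spaced points start, start + step, ...  `times` must be sorted ascending.
    Grid points outside the sampled range take the nearest end value."""
    grid = []
    for i in range(n):
        t = start + i * step
        j = bisect_left(times, t)
        if j == 0:
            grid.append(values[0])
        elif j == len(times):
            grid.append(values[-1])
        else:
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            grid.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return grid


def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for a regularly spaced series y with
    season length m (at least two full seasons). Returns `horizon` forecasts."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Simple initialization: level = mean of the first season, trend = mean
    # per-step change between the first two seasons, seasonal offsets =
    # first-season deviations from that level.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        s = season[t % m]  # seasonal offset estimated one season ago
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s
    n = len(y)
    # Forecast h steps past the last observation (index n - 1), reusing the
    # most recent seasonal offset for the matching phase.
    return [level + h * trend + season[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Hypothetical usage: a 4-step "week" repeated four times, forecast one week ahead.
series = [1.0, 2.0, 3.0, 4.0] * 4
print(holt_winters_additive(series, m=4, alpha=0.5, beta=0.1, gamma=0.1, horizon=4))
```

With hourly bins, a season length of m = 168 would capture the weekly cycle; in practice the smoothing constants would be fitted, for example by minimizing one-step-ahead forecast error on past weeks.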


To facilitate use of the UltraScienceNet (USN) and UltraLight 10GE circuits being connected at Sunnyvale, we have acquired (from CENIC) colocation space and power on a temporary basis ($1K/month) at the Sunnyvale Level/CENIC Point of Presence.

At Sunnyvale we have installed a UltraLight Cisco 6509 router/switch, which is connected to the 10GE UltraLight circuit and has access to StarLight and Caltech. In addition, we have borrowed from BaBar four Sun Fire V20z servers, each with two 1.8 GHz 64-bit AMD Opteron CPUs. Two of these have been configured with Red Hat EL 4 Linux and SLAC-owned Neterion/S2io 10GE Network Interface Cards (NICs); they have been installed at Sunnyvale and connected to the Cisco 6509. To enable remote management at Sunnyvale we have also installed remote power-cycling control, console access via a terminal server, and 10 Mbit/s management access to the hosts. We are currently tuning the configurations to maximize throughput between the V20zs. So far the throughput achieved at Sunnyvale is 5.7 Gbit/s with the Neterion NICs, whereas at SC2004 we achieved 7.4 Gbit/s with 2.4 GHz V20zs and S2io and Chelsio TOE NICs. We need to understand this discrepancy.

The remaining two V20zs are still at SLAC, where they are being configured with Chelsio TCP Offload Engine (TOE) 10GE NICs on loan from Chelsio. We are also working with Sun to install Solaris 10 on one of the hosts so we can evaluate its 10GE performance.

ESnet has recently connected SLAC to the BAMAN at 10 Gbit/s. We will assist ESnet in demonstrating utilization of the BAMAN between SLAC and NERSC for Ray Orbach's visit to LBNL on June 24th, 2005.

UltraScienceNet is also connecting at Sunnyvale, so we should be able to make tests over USN using our equipment there.

Papers, presentations etc.

With FNAL, we submitted the paper "Anomalous Event Detection for Internet End-to-end Performance Measurements" to the 14th IEEE Workshop on Local and Metropolitan Area Networks.

We made the following presentations: