Terapaths: A QoS Collaborative Data Sharing Infrastructure for Petascale Computing Research -II

DWMI: Datagrid Wide Area Monitoring Infrastructure

PI: Les Cottrell, SLAC


Today's data-intensive sciences, such as High Energy Physics (HEP), need to share large amounts of data at high speeds. This in turn requires high-performance, reliable end-to-end network paths between the major collaborating sites. In addition, end users need long- and short-term forecasts of application and network performance for planning, setting expectations, and troubleshooting. Enabling this requires a network monitoring infrastructure spanning the major sites.
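As an illustration of the kind of short-term forecasting end users need, the sketch below applies an exponentially weighted moving average (EWMA) to a series of throughput measurements. This is a generic technique, not the actual DWMI/IEPM forecasting code; the function name and sample values are illustrative only.

```python
# Illustrative sketch (not DWMI/IEPM code): a one-step-ahead EWMA
# forecast over end-to-end throughput measurements, weighting
# recent samples more heavily than older ones.

def ewma_forecast(samples, alpha=0.3):
    """Return a one-step-ahead forecast from a series of
    throughput measurements (e.g. in Mbits/s)."""
    if not samples:
        raise ValueError("need at least one measurement")
    forecast = samples[0]
    for x in samples[1:]:
        forecast = alpha * x + (1 - alpha) * forecast
    return forecast

# Example: measurements between two sites, with one congestion dip.
history = [420.0, 415.0, 430.0, 90.0, 410.0, 425.0]
forecast = ewma_forecast(history)  # ~371.6 Mbits/s
```

A forecast like this dampens transient dips, so an operator can distinguish a momentary congestion event from a sustained degradation worth reporting.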

The main goal of the DWMI project is to build, deploy, and learn to use effectively an initially small but rich, robust, sustainable, and manageable network monitoring infrastructure focused on the needs of critical HEP experiments such as ATLAS, BaBar, and CMS. A characteristic of these experiments is a hierarchical tiering of sites. The major data sources (accelerator sites such as CERN, FNAL, or SLAC) are tier 0; tier 1 sites are major data redistribution centers for a region (e.g. a major HEP data center in each of France, Italy, Germany, the UK, the US, etc.); tier 2 sites are major collaborator sites (typically major universities such as Caltech); tier 3 sites are smaller collaborators; and so on. The idea is that the raw experimental data is replicated from the tier 0 site to the tier 1 sites, where it is analyzed and made available to higher-tier sites. To match this architecture, DWMI needs to be deployed at tier 0, tier 1, and a few tier 2 sites. Each deployed site will then be configured to provide regular end-to-end network performance measurements and analysis to its collaborator sites.
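The tiered deployment pattern above can be sketched as a small model: each monitoring site measures paths to the sites one tier below it. The site names and tier assignments here are illustrative, not an actual DWMI deployment plan.

```python
# Hypothetical sketch of DWMI's tiered measurement topology.
# Tier assignments are illustrative only.

SITES = {
    "CERN": 0, "SLAC": 0, "FNAL": 0,   # tier 0: accelerator sites
    "IN2P3": 1, "RAL": 1, "BNL": 1,    # tier 1: regional redistribution centers
    "Caltech": 2,                      # tier 2: major collaborator sites
}

def measurement_pairs(sites):
    """Yield (monitor, target) pairs: each site measures the
    end-to-end path to every site exactly one tier below it."""
    for mon, mtier in sites.items():
        for tgt, ttier in sites.items():
            if ttier == mtier + 1:
                yield (mon, tgt)

pairs = list(measurement_pairs(SITES))
```

With this toy topology, each tier 0 site measures paths to the three tier 1 centers, and each tier 1 center measures the path to the tier 2 site, mirroring the direction in which the data itself is replicated.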

The sub-goals of the DWMI project are:


Impact on specific DoE Science applications

Improved understanding of the network and of what to expect from it, together with faster discovery and reporting of network problems, is critical to all network-based applications. The DWMI project's deployment of the IEPM-BW infrastructure, focused on the needs of the DoE-supported LHC, BaBar, CDF, and D0 HEP experiments, provides an evolving and practical basis for improved networking.

Synergy developed with DoE application developers to facilitate technology transfers

We are collaborating with groups at CERN, BNL, FNAL, and Caltech to install, configure, and put into use the IEPM-BW measurement toolkits. We have set up a network of contacts at the monitoring and monitored IEPM-BW sites. When we receive alerts and deem them of interest, we communicate with our contacts at the relevant sites to alert them to the problem and to understand it better.

We have made contact with Wilko Kroeger of the Open Science Grid (OSG) community to explore how to assist them with their network monitoring needs.

We have worked, and will continue to work, with the ESnet OSCARS project to assist in monitoring the effectiveness of QoS, and to help specify the requirements for monitoring (e.g. support for persistent requests and a program-to-program API to the scheduler). We are also working closely with Dantong Yu and the BNL Terapaths project to provide monitoring and support for the QoS services.
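To make the "persistent requests and a program-to-program API" requirement concrete, the sketch below shows one shape such a scheduler interface could take. The names (`ReservationRequest`, `Scheduler`) are hypothetical and are not the actual OSCARS interface; this is a toy in-memory stand-in.

```python
# Hypothetical sketch of a program-to-program scheduler API of the
# kind the monitoring requirements call for. Not the OSCARS API.

from dataclasses import dataclass
import itertools

@dataclass
class ReservationRequest:
    src: str              # monitoring host site
    dst: str              # monitored site
    mbps: int             # requested bandwidth
    persistent: bool = True  # renewed automatically on expiry

class Scheduler:
    """Toy in-memory stand-in for a QoS path scheduler."""
    _ids = itertools.count(1)

    def __init__(self):
        self.active = {}  # reservation id -> request

    def submit(self, req: ReservationRequest) -> int:
        """Accept a reservation and return its id, so a monitoring
        program can later query, renew, or cancel it."""
        rid = next(self._ids)
        self.active[rid] = req
        return rid

sched = Scheduler()
rid = sched.submit(ReservationRequest("SLAC", "BNL", mbps=100))
```

The point of a programmatic interface like this is that a monitoring host can maintain its reservations unattended, rather than requiring a human to re-submit requests through a web form each time one expires.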

In addition to working with DoE developers, we are in regular contact with developers of monitoring infrastructures and tools funded by other agencies. In particular, we are working closely with Internet2 to evaluate and improve thrulay and to integrate more closely with perfSONAR, and with the NLANR AMP developers to integrate the traceroute analysis and visualization. We are working closely with Iosif Legrand and others in the Caltech HEP group and at CERN to integrate the IEPM and PingER measurements into MonALISA. We are also evaluating whether to include PingER and/or IEPM-BW in the Virtual Data Toolkit.