17. WORK PROPOSAL DESCRIPTION (Approach, anticipated benefit in 200 words or less)

This proposal covers the further development and deployment of Internet End-to-End Performance Monitoring (IEPM) tools. The success of the current efforts, both within and beyond the High Energy Physics (HEP) community and the Energy Sciences Network (ESnet), has provided ample incentive to extend the capabilities of the existing tools and to develop new ones for purposes not previously envisioned by the project.

The project will further develop tools for an improved understanding of the critical components that limit end-to-end performance, and will electronically publish results in the form of tables and graphs similar to those already developed. In addition, the project intends to extend the monitoring to high performance networks and bulk throughput applications. This monitoring will necessarily use the network more intensively and will be less extensively deployed. It will be aimed at understanding how to achieve high throughput, setting expectations, assisting in troubleshooting, and providing steering information to grid applications. Furthermore, new tools will be developed to aid experts and laypeople alike in visualizing and understanding the analyzed results.

These tools, existing and new, together with the increased emphasis on high performance network throughput, will lead to a greater understanding of the dynamics of the Internet, help provide realistic service quality expectations, and identify where extra resources may be effectively applied.

18. TASK DESCRIPTION (Approach, relation to work package, in 200 words or less)

FTPA #1

This task provides for the maintenance and enhancement of the existing PingER tools and infrastructure. In addition, it will support the development of new active and passive network and application measurements, data collection and archiving, analysis, and associated reports. In January 2002, the IEPM PingER data gathering tools were deployed at 34 sites in 14 countries. These monitoring sites probe a total of almost 700 remote nodes at about 420 sites in 74 countries, providing performance metrics on over 3500 end-to-end pairs. These countries contain over 78% of the world's population and about 99% of the online users of the Internet. PingER is believed to be the most extensive network performance monitoring project in existence. The new high performance measurements will be critical to understanding today's and future high speed networks and emerging Grid applications. As with so many aspects of modern computing and networking, the needs of the High Energy Physics (HEP) and ESnet communities continue to push the envelope, and the PingER tools are an important means of measuring how well those needs are being met.

Internet End-to-End Performance Monitoring

Progress in FY2001

The extent of the PingER monitoring grew to about 3600 monitor-remote site pairs, 34 monitoring sites, and 691 remote hosts at 44 sites covering 74 countries that between them contain over 99% of the world’s population with Internet connections. Usage grew to over 2000 web site hits per day, with accesses coming from over 45 countries. FNAL became re-involved with the IEPM project, so we spent time bringing the new group up to speed and mapping out plans. The robustness and quality of the FNAL graphing tools have been improved, and data collection at FNAL has been dramatically improved.

The PingER analysis programs were extensively rewritten, and reports were extended to provide information on inter-packet delay variation, duplicate pings and out-of-order pings. New graphical reporting tools were added. Traceroute measurements (traceping), archiving, mining and reporting were ported from the existing VMS platform to Unix, and the new tool was installed at SLAC. We continued to run the legacy VMS version to ensure continuity and to validate measurements. As before, all raw and processed data and results are made public via the web to enable researchers and networkers to perform their own analyses and to accelerate progress and understanding in this important field. The pingroute tool was improved to use parallel pings, speeding up the measurements by an order of magnitude or more.
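
As an illustration of the per-sample metrics mentioned above, the following minimal Python sketch derives packet loss, minimum and average RTT, inter-packet delay variation (IPDV), duplicates and out-of-order replies from one ping sample. It is not the PingER code: the function name `summarize_pings`, the input format (a list of `(sequence number, RTT in ms)` replies in arrival order) and the simplified IPDV definition (mean absolute difference between successive RTTs, rather than the one-way definition used by the IETF IPPM work) are assumptions chosen for clarity.

```python
from statistics import mean


def summarize_pings(sent, replies):
    """Summarize one ping sample (hypothetical helper, not PingER's code).

    sent    -- number of echo requests sent (sequence numbers 0..sent-1)
    replies -- list of (seq, rtt_ms) tuples in order of arrival
    """
    seen = set()
    unique_rtts = []
    duplicates = 0
    out_of_order = 0
    highest_seq = -1
    for seq, rtt in replies:
        if seq in seen:
            duplicates += 1      # same sequence number answered twice
            continue
        seen.add(seq)
        if seq < highest_seq:
            out_of_order += 1    # arrived after a higher-numbered reply
        highest_seq = max(highest_seq, seq)
        unique_rtts.append(rtt)

    loss_pct = 100.0 * (sent - len(seen)) / sent
    # Simplified IPDV: mean absolute difference between successive RTTs.
    diffs = [abs(b - a) for a, b in zip(unique_rtts, unique_rtts[1:])]
    return {
        "loss_pct": loss_pct,
        "min_rtt": min(unique_rtts) if unique_rtts else None,
        "avg_rtt": mean(unique_rtts) if unique_rtts else None,
        "ipdv": mean(diffs) if diffs else None,
        "duplicates": duplicates,
        "out_of_order": out_of_order,
    }
```

For example, `summarize_pings(10, [(0, 100.0), (1, 102.0), (3, 101.0), (2, 110.0), (3, 101.5), (5, 99.0)])` reports 50% loss, one duplicate (the second reply for sequence 3) and one out-of-order reply (sequence 2 arriving after sequence 3).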

In collaboration with Georgia Tech, SLAC installed, configured, understood, extended and started to use the UCB ns-2 network simulator to further understand Internet performance and dynamics, and allow comparison and contrast with real measurements of throughput, re-ordering etc.

We installed, learnt about and started to use Web100, an instrumented TCP stack for Linux. We hope this will assist in understanding measurements. We worked with the Internet 2 folks to learn about the QBone Scavenger Service (QBSS). We built two test beds, one with a 10 Mbits/s bottleneck and the other with a 100 Mbits/s bottleneck, made measurements, analyzed them, and made presentations on our results. We attended a workshop on QoS in Boulder, Colorado.

We worked with the developer of bbcp to recommend and evaluate several enhancements to aid in making network measurements of file copying, and jointly authored a paper on bbcp that was published at CHEP01.

SLAC assisted with installing the SLAC-developed IPv6 monitoring tools at the 6-TAP in Chicago and at the CERNet NOC in China.

We provided reports and transparencies to the ESnet review at Lawrence Berkeley National Laboratory (LBNL), the ESnet Annual report, and the International Union of Pure and Applied Physics (IUPAP), the latter to assist in recommending improved connectivity for physicists in developing countries. We prepared a yearly report on IEPM accomplishments for DoE, and a SciDAC report. We assisted the Budker Institute of Nuclear Physics in Novosibirsk in understanding and successfully justifying their need to upgrade their link to Japan. We also assisted with about a dozen challenging wide area network problems reported to us by SLAC collaborators at major collaborator sites.

Early field test applications from the NetPredict commercial start-up were evaluated (funded under a separate SBIR), and we provided feedback and suggestions on improving their effectiveness. We also made contact with NetPhysics, a start-up that is developing network monitoring tools, and provided input to their developments.

Prompted by the needs of BaBar and PPDG for high throughput, we started making systematic measurements of high throughput between SLAC and collaborator sites. We set up a web site to host the information we gathered. We also set up a Cisco switch with NetFlow and started to gather and analyze the data in order to characterize the SLAC Internet border traffic. We developed and presented three network performance and monitoring demonstrations and participated in the bandwidth challenge at the SLAC/FNAL booth at SC2000. We proposed and set up a collaboration of 32 sites to make high throughput measurements, initially for SC2001.

Presentations were made at:

·         CHEP01. The Computing in High Energy Physics Conference, held in September 2001, is the primary showcase for this work in the HEP community. Four talks were presented at the meeting: one on IPv6, one on high performance throughput, one on passive measurements, and one summarizing network developments.

·         Internet2 International.  SLAC detailed findings on performance to sites on networks involved in the Internet2 International group.  In some cases, this provided a before-and-after picture of the effect of peering.

·         PAM2001.  We presented a paper at the Passive and Active Monitoring conference in April.  The paper looked at how to provide high throughput performance on the Internet.

·         ITU.  A presentation was made and a SLAC representative was a member of a panel on Voice over IP (VoIP) and QoS at the ITU meeting in Geneva in April 2001.

·         SC2001. SLAC will participate in Supercomputing 2001 (SC2001) in Fall 2001, so several demonstrations of network measurements were prepared.

·         A series of 6 lectures was presented on “High Performance networking and network measurements” at the Islamabad summer school in June 2001. A lecture on "Characterizing the Internet" was given to Stanford summer students.

·         IPv6 Forum and IETF. SLAC was involved in performance monitoring for the forthcoming IPv6 Forum summit. A paper reviewing SLAC’s results was presented to the IPv6 Working Group at the IETF meeting. We also made a presentation at the Internet 2 IPv6 working group in Nebraska.

·         ESCC: several talks were given at various ESnet venues, including the ESCC and the ESnet review in Santa Fe.

·         Global Grid Forum: we presented a talk on network performance at the GGF in Washington DC.

·         We wrote an article about working with QBSS, which appeared in the CENIC publication.

Four papers were published: three at CHEP01 and one at PAM2001.

We prepared and submitted two SciDAC proposals, involving two other labs and three universities, with SLAC as the lead institution, and a further two with SLAC as a secondary institution. The Rice University-led INCITE proposal was accepted, but the others were not.

Expected Progress in FY2002

The PingER tools will be supported and extended to also provide access to the IEPM-BW results (see below). The measurements will continue to be made, archived, analyzed and reported on via the web. A new beacon list will be developed and deployed. We will work with FNAL to make the data collection more robust and automated and to improve the graph formats. We expect to add a few sites such as LANL, Imperial College London, INFN Trieste and Milan. We will analyze the PingER data to make presentations for the ICFA/SCIC and for developing countries such as Romania to assist in planning and setting expectations.

The traceroute server tools will be extended to improve their logging and anomaly reporting, and to reduce the probability of receiving false alarms for scanning. SLAC will improve the new traceping tool, provide packaging for downloading and assist in deploying it at critical sites. We will also provide assistance to those using the tool to measure and understand Internet topology, etc. SLAC will work with the Rice University INCITE folks to define the need for topology and tomography tools and to evaluate their development. SLAC will continue to run the PingER project, coordinating with the monitoring sites, developing and deploying new and improved measurement and analysis/reporting/display tools.

A simple Java tool will be developed for collecting and graphing ping responses from a small number of hosts in real time and placing the time series graphs on an image background, e.g. a world map. This will be used to demonstrate Internet performance at SC2001. Also driven partly by SC2001, we will develop animated world maps of Internet performance. We hope these will be considered for entry into the Internet Atlas.

A web site will be designed that contains a collection of illustrative troubles, the attempts to diagnose and, where appropriate, how the troubles were solved.  As this collection grows, a taxonomy of troubles will be developed and navigation tools will be added to assist in matching new troubles to the existing reports.

A new focus for IEPM will be to make, analyze, understand and present high throughput network measurements. This project will be referred to as IEPM-BW. As part of this a new Internet high performance network monitoring infrastructure will be defined and developed. The pilot version using active measurements will be deployed for SC2001. The infrastructure will support both network and application measurement tools. We will initially integrate ping, traceroute, iperf, bbcp and bbftp into the infrastructure. As part of this we will develop tools for data reduction and analysis, together with tools to produce web accessible tabular and graphical reports with drill down and navigation.

Following the initial pilot, we will re-engineer the code based on what we have learnt, and then we will evaluate bandwidth measurement tools such as pipechar, pathload, pathrate and the INCITE tools, as well as other applications such as GridFTP, to understand how to use them, validate how they work, and choose which ones to build into IEPM-BW. We will work with the developers of these tools, provide feedback and promote improvements. In order to evaluate GridFTP, we will install Globus, study user authentication using certificates, and work with ESnet and the Globus people to get certificates for Globus. We will compare the various available active measurement tools to validate them against one another and to determine their regions of applicability.

We will also evaluate methods of optimizing throughput that minimize the impact on others while still achieving high throughput. Such methods include using compression, using Quality of Service (QoS), and developing and using application self rate limiting. We will install, understand and integrate Web100 (an instrumented TCP stack for Linux) to assist in understanding the TCP dynamics of high performance throughput.
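
Application self rate limiting is commonly done with a token bucket, which lets a sender sustain a chosen average rate while permitting bounded bursts. The sketch below is one minimal way to do this, assuming the application asks the limiter how long to wait before each write; the class `TokenBucket` and its parameters are hypothetical and do not describe bbcp or any other existing tool.

```python
import time


class TokenBucket:
    """Hypothetical pacer for application self rate limiting.

    rate_bps -- sustained rate, bytes per second
    burst    -- bucket depth in bytes (largest burst sent without waiting)
    clock    -- time source in seconds; injectable so the logic is testable
    """

    def __init__(self, rate_bps, burst, clock=time.monotonic):
        self.rate = float(rate_bps)
        self.capacity = float(burst)
        self.tokens = float(burst)   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes):
        """Seconds to wait before sending nbytes (0.0 if it may go now).

        Tokens are charged immediately; a negative balance is a debt that
        the returned delay pays off, so the caller sleeps and then sends.
        """
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate
```

A sender would call `delay_for(len(chunk))`, sleep for the returned time, then write the chunk. Letting the balance go negative keeps the long-run rate at `rate_bps` even when a single write is larger than the bucket.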

We will develop methods to gather passive measurements of throughput, initially using Cisco's NetFlow. We will develop simple algorithms to identify how to aggregate parallel streams and which application is causing the traffic. We will compare the passive measurements of throughput with those reported by the active applications. If successful, this will help validate both the active and passive measurements and also provide a rich new source of high performance application throughputs.
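
One simple way to aggregate parallel streams from flow records is to group flows by host pair and guessed application, then merge flows whose time intervals overlap into a single transfer. The sketch below assumes a simplified record layout `(src, dst, src_port, dst_port, bytes, start, end)` rather than the actual NetFlow export format, and its port-to-application table is an illustrative assumption; traffic on non-standard ports would need better identification.

```python
from collections import defaultdict

# Illustrative port-to-application table (an assumption; real use needs more).
APP_PORTS = {5001: "iperf", 2811: "gridftp", 22: "ssh"}


def classify(src_port, dst_port):
    """Guess the application from a well-known port on either end."""
    return APP_PORTS.get(dst_port) or APP_PORTS.get(src_port) or "other"


def aggregate_transfers(flows):
    """Merge time-overlapping flows with the same host pair and application.

    flows -- iterable of (src, dst, src_port, dst_port, nbytes, start_s, end_s)
    Returns (src, dst, app, nbytes, start_s, end_s, mbits_per_s) tuples.
    """
    groups = defaultdict(list)
    for src, dst, sport, dport, nbytes, start, end in flows:
        groups[(src, dst, classify(sport, dport))].append((start, end, nbytes))

    transfers = []
    for (src, dst, app), recs in groups.items():
        recs.sort()  # order by start time
        cur_start, cur_end, cur_bytes = recs[0]
        for start, end, nbytes in recs[1:]:
            if start <= cur_end:  # overlaps: treat as another parallel stream
                cur_end = max(cur_end, end)
                cur_bytes += nbytes
            else:  # gap: close the current transfer and start a new one
                transfers.append((src, dst, app, cur_bytes, cur_start, cur_end))
                cur_start, cur_end, cur_bytes = start, end, nbytes
        transfers.append((src, dst, app, cur_bytes, cur_start, cur_end))

    results = []
    for src, dst, app, nbytes, start, end in transfers:
        duration = end - start
        mbps = nbytes * 8 / 1e6 / duration if duration > 0 else 0.0
        results.append((src, dst, app, nbytes, start, end, mbps))
    return results
```

Two overlapping 50 MB iperf flows spanning the same 10 s would be reported as one 100 MB transfer at 80 Mbits/s, which is the figure to compare with what the active application itself reported.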

We will evaluate the effectiveness and cost of simple forecasting mechanisms to provide an indicator for expected throughput for an application such as bulk throughput.
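
A simple forecasting mechanism, in the spirit of the Network Weather Service, runs several cheap predictors over the measurement history and uses whichever has had the smallest error so far. The sketch below is a minimal version with three assumed predictors (last value, running mean and a 5-point sliding median); the predictor set and the mean-absolute-error scoring are illustrative choices, not a specification of what IEPM-BW will use.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def sliding_median(history, window=5):
    return median(history[-window:])


# Hypothetical predictor set; each maps a history list to a forecast.
PREDICTORS = {"last": last_value, "mean": running_mean, "median5": sliding_median}


def forecast(series):
    """Forecast the next value of a throughput series (e.g. Mbits/s).

    Each predictor is scored by its mean absolute error when predicting
    series[i] from series[:i]; the best-scoring one makes the forecast.
    Returns (predictor_name, forecast_value).
    """
    if len(series) < 2:
        raise ValueError("need at least two measurements")
    errors = {}
    for name, predict in PREDICTORS.items():
        errs = [abs(predict(series[:i]) - series[i]) for i in range(1, len(series))]
        errors[name] = mean(errs)
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](series)
```

For a series such as `[50, 50, 50, 10, 50, 50, 50]` Mbits/s, the sliding median scores best because it ignores the single dip, and the forecast is 50 Mbits/s. Re-scoring every predictor over the full history is the main cost, so a deployed version would likely keep running error totals instead.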

We will document the raw data format and make it available to interested researchers and network folks. We will also provide access via the web to much of the analyzed data to enable further analysis by those interested.

We will build management tools for IEPM-BW to automatically gather configuration information from the remote hosts and to download code to them. We will port the tools to a second operating system (Linux) and provide documentation to assist others in installing and managing the infrastructure. We will then work with a selected second site to port the tools to that site.

We will set up a formal collaboration with the Particle Physics Data Grid (PPDG) and join the PPDG monitoring working group. We will work closely with the European Data Grid since they have embraced, deployed and extended the PingER tools.

Coordinating efforts will be continued and extended with the XIWT; ESnet/ESSC/ESCC; Internet 2/Abilene, in particular the Internet2 End-to-End Performance Initiative (E2Epi) and the Internet 2 HENP networking working group; the IETF/IPPM; the ICFA/SCIC; HENP; FNAL; the European DataGrid; and companies such as NetPredict and NetPhysics. We will set up our CAIDA-developed Active Measurement Project (AMP) probe to participate in the collaboration to design and evaluate a new Internet Measurement Protocol (IPMP). Assistance in troubleshooting network problems will continue to be provided to PPDG, BaBar collaborators and physics groups as requested.

We plan to make presentations on worldwide Internet performance, the new network monitoring infrastructure, QoS and networking issues for various communities, including:

·         ESnet: Presentations on Grid Monitoring, QBSS and achieving high performance throughput at the ANL ESCC meeting.

·         DARPA PIs

·         SciDAC

·         Global Grid Forum

·         ICFA/SCIC

·         Romanian Ministry of Telecommunications and Information

·         Internet 2: presentations on achieving high performance throughput at the Internet2 End-to-End Performance Initiative (I2 E2Epi) inaugural meeting in Ann Arbor, Michigan.

We will participate in the SC2001 Bandwidth Challenge and demonstrate several monitoring applications and results.

Expected Progress in FY2003

We will explore, evaluate and develop mechanisms to optimize (minimize) the durations and frequencies of active measurements. We will investigate a predictive framework that combines infrequent but accurate measurements of throughput (e.g. from bbcp) with frequent, lighter weight, but less accurate measurements to provide reasonable forecasts. We will recommend a tool suite for high performance throughput active measurements. We will once again re-engineer the code based on our experiences, in particular to improve robustness, manageability and deployment, and will deploy the infrastructure to further grid and collaborator sites. As more measurement sites are added we will design and develop tools to assist in the collection and archiving of data.
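
One possible form of such a predictive framework is to calibrate the frequent, lightweight measurement against each infrequent, accurate one (for example a bbcp transfer) and to rescale the lightweight results by the observed ratio until the next calibration. The sketch below assumes the lightweight tool's bias is roughly a constant multiplicative factor between calibrations; that assumption, and the class `HybridEstimator`, are hypothetical and would themselves need validating against measurements.

```python
class HybridEstimator:
    """Hypothetical combiner of sparse accurate and frequent light measurements.

    Assumes that, between calibrations, the light tool's reading differs
    from the true achievable throughput by a roughly constant factor.
    """

    def __init__(self):
        self.scale = None          # heavy / light ratio at last calibration
        self.latest_light = None   # most recent light-weight reading, Mbits/s

    def calibrate(self, heavy_mbps, light_mbps):
        """Record a heavy measurement taken alongside a light one."""
        if light_mbps <= 0:
            raise ValueError("light measurement must be positive")
        self.scale = heavy_mbps / light_mbps
        self.latest_light = light_mbps

    def update(self, light_mbps):
        """Record a new light-weight measurement."""
        self.latest_light = light_mbps

    def estimate(self):
        """Current throughput estimate in Mbits/s, or None if uncalibrated."""
        if self.scale is None or self.latest_light is None:
            return None
        return self.scale * self.latest_light
```

If a bbcp run achieves 80 Mbits/s while a concurrent light probe reports 40 Mbits/s, a later probe reading of 30 Mbits/s yields an estimate of 60 Mbits/s.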

We will evaluate how to provide a standard publish and subscribe mechanism for accessing our network measurement data. We will provide remote computer-to-computer access to network measurement data so as to enable an application to select reasonable configuration parameters, such as the initial TCP window size and the number of parallel streams.
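
As an example of the configuration parameters such an application could derive, the usual rule of thumb is that the data in flight must cover the bandwidth-delay product, BDP = bandwidth * RTT. The sketch below computes the BDP from a measured RTT and an expected bandwidth and, if the per-stream window is capped, splits it over enough parallel streams. The function name `tcp_settings` and the 256 KB default cap, standing in for a host's configured maximum window, are assumptions.

```python
import math


def tcp_settings(bandwidth_mbps, rtt_ms, max_window_bytes=256 * 1024):
    """Suggest a per-stream TCP window (bytes) and a number of streams.

    The bandwidth-delay product (BDP) is the amount of data that must be
    in flight to keep the path full. If a single stream's window is capped
    at max_window_bytes, the BDP is divided across enough parallel streams.
    """
    # Mbits/s -> bytes/s, then multiply by RTT in seconds.
    bdp_bytes = bandwidth_mbps * 1e6 / 8 * rtt_ms / 1e3
    streams = max(1, math.ceil(bdp_bytes / max_window_bytes))
    window = math.ceil(bdp_bytes / streams)
    return window, streams
```

For example, a 100 Mbits/s path with a 160 ms RTT has a BDP of 2,000,000 bytes, so with a 256 KB cap the sketch suggests 8 streams with windows of 250,000 bytes each.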

We will design and develop a feedback mechanism for a high performance application that will use information from TCP stack dynamic variables to optimize the rate at which the application sends data. The optimization will aim to achieve the maximum bandwidth while minimizing the effect on others using the network. We will extend a high throughput application such as bbcp to enable it to select whether or not to use compression based on the power of the CPUs, the bandwidth available and the compression ratios achievable. We will collaborate with others such as ESnet and European Data Grid members such as Daresbury to deploy and evaluate QoS services such as the QBone Scavenger Service (QBSS). We will collaborate with the Network Weather Service (NWS) to provide forecasting information from our measurement database.
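
The compression decision can be framed as a pipeline question: compressing helps only if the slowest stage (compressing at the sender, sending the smaller stream over the link, or decompressing at the receiver) is still faster than sending the raw data. The sketch below assumes the three stages overlap in a pipeline and that all rates are known in bytes per second; the functions `effective_rate` and `should_compress` are a hypothetical model, not a description of bbcp's behavior.

```python
def effective_rate(link_rate, ratio, compress_rate, decompress_rate):
    """Rate, in original (uncompressed) bytes/s, of a pipelined compressed transfer.

    link_rate       -- available network bandwidth, bytes/s
    ratio           -- original size / compressed size (2.0 means data halves)
    compress_rate   -- bytes/s of input the sender's CPU can compress
    decompress_rate -- bytes/s of output the receiver's CPU can produce
    """
    # The link carries compressed bytes, so it moves ratio * link_rate
    # original bytes per second; the slowest stage sets the overall rate.
    return min(compress_rate, ratio * link_rate, decompress_rate)


def should_compress(link_rate, ratio, compress_rate, decompress_rate):
    """True if compressing is expected to beat sending the raw data."""
    return effective_rate(link_rate, ratio, compress_rate, decompress_rate) > link_rate
```

On a 1 MB/s path, data that compresses 2:1 on a CPU that compresses at 20 MB/s is worth compressing (an effective 2 MB/s), whereas on a 100 MB/s path the same CPU becomes the bottleneck and compression would slow the transfer.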

Expected Progress in FY2004

We will implement and deploy a distributed data collection and archiving system to support multiple IEPM-BW measurement sites. We will assist in the further deployment of the IEPM-BW infrastructure to all PPDG sites, major HENP sites and other important high performance sites. We will evaluate other measurement infrastructures such as NIMI as they become available, and assist with integrating our measurement tools on the dedicated platforms. We will extend the IEPM-BW infrastructure to collect and archive the data from the new measurement infrastructure. We will develop new analysis and visualization tools that scale up to meet the demands of multiple measurement sites.