U.S. DEPARTMENT OF ENERGY

FIELD WORK PROPOSAL

1. WORK PACKAGE NUMBER

2. REVISION NO.

0

3. DATE PREPARED

1/26/98

4. WORK PACKAGE TITLE

Internet End-to-End Performance Monitoring (IEPM)

5. BUDGET AND REPORTING CODE

KJ-01

6. WORK PROPOSAL TERM

12 Months

7. IS THIS WORK PACKAGE INCLUDED IN THE INSTITUTIONAL PLAN?

No

8. DOE PROGRAM MANAGER

Daniel A. Hitchcock, Acting Director
Mathematical, Information, And Computational Sciences Division
FTS 233-7486

11. HEADQUARTERS ORGANIZATION

Energy Research

14. DOE ORGANIZATION CODE

9. OPERATIONS OFFICE WORK PROPOSAL REVIEWER

 
 

12. OPERATIONS OFFICE

Oakland

15. DOE ORGANIZATION CODE

 

10. CONTRACTOR WORK PROPOSAL MANAGER

Dr. Burton Richter
FTS 462-2601

13. CONTRACTOR NAME

Stanford Linear
Accelerator Center

16. CONTRACTOR CODE

55

17. WORK PROPOSAL DESCRIPTION (Approach, anticipated benefit in 200 words or less.)

This proposal covers the further development and deployment of end-to-end Internet monitoring (IEPM) tools. The success of the initial effort [1], both within and beyond the High Energy Physics (HEP) community and the Energy Sciences Network (ESnet), has provided ample incentive to extend the capabilities of the existing tools and to develop new ones for purposes not previously envisioned by the project.

The project will further develop tools for an improved understanding of the critical components that limit end-to-end performance, and electronically publish results in the form of tables and graphs similar to those already developed. In addition, the project intends to develop new tools to aid understanding, using the same low-impact (on the network and servers), low-cost (to deploy), and understandable mechanisms as before. The use of the tools and data for problem diagnosis will be refined. Furthermore, new tools will be developed to help expert and layman alike visualize and understand the analyzed results.

These tools, existing and new, and the increased emphasis on the visualization aspect, will lead to a greater understanding of the dynamics of the Internet and help provide realistic service quality expectations and identify where extra resources may be effectively applied.

18. CONTRACTOR WORK PROPOSAL MANAGER

Burton Richter, Director
Stanford Linear Accelerator Center

 

 

________________
      (Signature)

 

_________
(Date)   

19. OPERATIONS OFFICE REVIEW OFFICIAL

 
 

 

 

________________
      (Signature)

 

_________
(Date)   

20. ATTACHMENTS

 

21. STAFFING (in Staff Years)

 

a. SCIENTIFIC

b. OTHER DIRECT

c. TOTAL DIRECT

 

FY1998

FY1999

 

FY2000

BY+1

BY+2

TOTAL TO COMPLETE

22. OPERATING EXPENSE (in Thousands)

a. OBLIGATIONS

b. COSTS

 

 

 

 

 

 

 

23. EQUIPMENT

a. OBLIGATIONS

b. COSTS

 

 

 

 

 

 

 

24. MILESTONE SCHEDULE (To be defined for each project)

 

 

 

 

 

 

 

 

 

 

Proposed Work

The work will involve two main threads: further development of the existing end-to-end Internet monitoring tools, and development of new tools to extend and complement the existing ones. In addition, tools to aid visualization of the information obtained from the tools will be developed.

SLAC has developed several tools to aid analysis of the raw data. The analysis uses existing commercial and public domain tools. The reports are accessible via WWW pages that are dynamically customizable or, where reports cannot be generated quickly enough to be interactive, via static pages. Reports are produced in the form of tables and graphs. The ping measuring tools (collectively referred to as PingER) are now installed at 14 HENP/ESnet collection sites in 8 countries, over 480 links are being monitored in 22 countries, and HEPNRC is acting as the archive site. In addition, the tools and methodology have been adopted by the Cross Industry Working Team (XIWT) / Internet Performance Working Team (IPWT), and as of November 1997 the collection tools were installed at 6 commercial sites, with about 15 expected in early 1998 [2]. Sites are currently monitored at roughly 30-minute intervals, and the reports show various metrics for the last sample taken as well as averages over daily and monthly intervals.
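As an illustration of the daily and monthly averaging described above, the following is a minimal sketch in Python. It is not the PingER analysis code: the record layout (a timestamp paired with a round-trip time in milliseconds, or None for a lost ping) and the function name summarize are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarize(samples, key_format):
    """Group ping samples by a strftime key and report mean RTT and loss.

    samples: iterable of (timestamp, rtt_ms) pairs, where rtt_ms is None
    when the ping was lost (a hypothetical layout for this sketch).
    key_format: "%Y-%m-%d" for daily summaries, "%Y-%m" for monthly ones.
    """
    buckets = defaultdict(list)
    for timestamp, rtt_ms in samples:
        buckets[timestamp.strftime(key_format)].append(rtt_ms)

    summary = {}
    for key, rtts in buckets.items():
        answered = [r for r in rtts if r is not None]
        summary[key] = {
            # Mean round-trip time over the pings that were answered.
            "mean_rtt_ms": mean(answered) if answered else None,
            # Percentage of pings in the interval that were lost.
            "loss_pct": 100.0 * (len(rtts) - len(answered)) / len(rtts),
        }
    return summary


if __name__ == "__main__":
    samples = [
        (datetime(1998, 1, 26, 0, 0), 100.0),
        (datetime(1998, 1, 26, 0, 30), None),
        (datetime(1998, 1, 27, 0, 0), 200.0),
    ]
    print(summarize(samples, "%Y-%m-%d"))
    print(summarize(samples, "%Y-%m"))
```

The same function serves both report intervals because only the grouping key changes; other metrics could be added to the per-interval dictionary in the same way.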

The project will extend the existing tools to improve their robustness and to enhance their "user-friendly" and "installer-friendly" aspects, making them easier to use and understand and easier for collection sites to install. It will also make upgrades easy, so that collection sites are not deterred from implementing changes by a lack of time and manpower.

Currently SLAC monitors some 70 sites, and the tools show how SLAC (or any collection site) sees those sites. The number of monitored sites will inevitably increase, as will the number of collection sites. The scalability of the existing tools in their present form is limited, so more emphasis will be placed on grouping the data. That is, the focus will be on the connectivity of (for example) the sites connected to ESnet or vBNS, rather than the connectivity of the individual nodes. New reports will be created to enable direct comparison of network performance between groups of nodes. New tools will be developed to show a "reverse" perspective: how the various collection sites see one node, rather than how one collection site sees all the nodes it monitors.
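The grouped and "reverse" views described above can be sketched as two small aggregations over the same measurements. This is only an illustrative sketch: the triple layout (collection site, remote node, round-trip time), the group mapping, the choice of the median as the summary statistic, and the function names are all assumptions, not part of the existing tools.

```python
from collections import defaultdict
from statistics import median


def group_view(measurements, site_group):
    """Median RTT per (collection site, group of remote nodes).

    measurements: iterable of (collector, remote, rtt_ms) triples.
    site_group: mapping from remote node to a group name such as "ESnet";
    nodes missing from the mapping fall into the group "other".
    """
    by_group = defaultdict(list)
    for collector, remote, rtt_ms in measurements:
        by_group[(collector, site_group.get(remote, "other"))].append(rtt_ms)
    return {key: median(rtts) for key, rtts in by_group.items()}


def reverse_view(measurements, remote_of_interest):
    """Median RTT to one remote node as seen from each collection site."""
    by_collector = defaultdict(list)
    for collector, remote, rtt_ms in measurements:
        if remote == remote_of_interest:
            by_collector[collector].append(rtt_ms)
    return {collector: median(rtts) for collector, rtts in by_collector.items()}


if __name__ == "__main__":
    # Made-up values, purely to exercise the two functions.
    measurements = [
        ("SLAC", "lbl.gov", 5.0),
        ("SLAC", "fnal.gov", 55.0),
        ("SLAC", "cern.ch", 170.0),
        ("FNAL", "cern.ch", 120.0),
    ]
    groups = {"lbl.gov": "ESnet", "fnal.gov": "ESnet", "cern.ch": "Europe"}
    print(group_view(measurements, groups))
    print(reverse_view(measurements, "cern.ch"))
```

Keeping both views as functions over one common record format means that adding a collection site or a group requires new data only, not new report code.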

The expected increase in monitored sites and collection sites, and the large volume of data that will be generated, mean that significant effort will be placed on making the tools easy to use and the results easy to understand and visualize. Some initial trials have begun to investigate the possibility of developing tools using the very latest web software (Java, 3DJava, VRML) and web server technology (Java Servlets). The current IEPM effort is developing the CAIDA tools MapNet and Anemone for use in visualizing PingER data, and this development will continue and include the data gathered from the new tools. Care will be taken to ensure the results can be viewed with any modern graphical web browser regardless of platform.

The process of issuing warnings and alarms regarding performance and connectivity (or the lack of it) will be refined. New tools and/or other tools will be incorporated to attempt to diagnose the problem (a network problem, the remote node being down, a router dropping the ping packets, etc.), although it is not envisioned that these tools will act as a front-line warning mechanism for network administrators. The tools will be adapted to take full advantage of changes and refinements in the processing done by the database. More effort will be made to reduce the degree of maintenance required (e.g. by defining formatting standards and using environment variables to determine values rather than defining them explicitly), to the point where the tools will not require human intervention to keep working when other factors change.
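The idea of taking values from environment variables rather than defining them explicitly can be sketched as follows. The variable names (PINGER_DATA_DIR, PINGER_INTERVAL_MIN, PINGER_PACKET_COUNT) and their defaults are hypothetical and chosen only for this example; they are not settings of the existing tools.

```python
import os


def load_config(environ=os.environ):
    """Read tool settings from the environment, with fallback defaults.

    All variable names and default values here are hypothetical.
    """
    return {
        # Directory where raw ping results are written.
        "data_dir": environ.get("PINGER_DATA_DIR", "/var/tmp/pinger"),
        # Average time between measurements of a site, in minutes.
        "interval_minutes": int(environ.get("PINGER_INTERVAL_MIN", "30")),
        # Number of ping packets sent per measurement.
        "packet_count": int(environ.get("PINGER_PACKET_COUNT", "10")),
    }


if __name__ == "__main__":
    print(load_config())
```

A collection site could then change a setting by exporting a variable before the tools run, without editing the scripts themselves.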

It is an important but hard problem to distinguish which aspects of end-to-end measurement reflect limitations in the application, the local area networks at the endpoints, the Internet connectivity of the source and destination sites, and the wide area network path between them. Separating these effects was considered beyond the scope of the initial IEPM effort, but several approaches have been identified that may allow development of the IEPM architecture towards this goal. We will look at developing tools involving traceroute and pathchar, and at how measurements taken with these tools relate to measurements taken with the PingER tools. We will look at integrating our tools with the NSF-sponsored "National Internet Measurement Infrastructure" (NIMI) and other dedicated monitoring projects.

IEPM and NIMI mesh very well. The goal of NIMI is to pilot a scalable measurement system for probing Internet clouds and assessing the performance they deliver. A key facet of NIMI is the development of a modular architecture that can accommodate numerous, different measurement techniques. NIMI will provide the general mechanisms for scheduling measurements and retrieving results. IEPM will provide NIMI with access to ESnet and HEP sites (the initial deployment of systems will be to LBNL, SLAC and FNAL, followed by systems at ORNL, BNL and/or CERN) as a testbed for developing and refining the measurement architecture (in particular, the mechanisms for scheduling individual measurements and returning the results) in the context of a large, but not unduly large, network. Both projects will contribute measurement and analysis tools to one another, and IEPM will be assured of compatibility with the NIMI infrastructure if/when the latter becomes widely deployed. After the initial deployment is successfully completed, we expect, as a future project, to deploy NIMI systems at one to two dozen sites in N. America, Europe and Japan.

The key differences between the work we propose here and that of NIMI are:

Criticism has been made of monitoring projects that ping heavily loaded servers, and that ping nodes which give ping low priority (e.g. routers). The PingER effort has deliberately aimed to monitor machines which we assume to be lightly loaded, such as name servers, and integration into the NIMI project will go further to reduce the number of unsuitable destination sites. Furthermore, criticism has been made of regular pinging (i.e. the pinging is done regularly on each half hour). Future releases of the data gathering tools will use Poisson sampling, so that measurements are taken at random rather than regular intervals.
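Poisson sampling can be sketched by drawing the gaps between successive measurements from an exponential distribution, which keeps the average rate (for example one measurement per 30 minutes) while removing the fixed half-hour pattern. The following is a minimal sketch under that assumption; the function name and parameters are illustrative and do not describe the planned implementation.

```python
import random


def poisson_schedule(mean_interval_s, duration_s, rng=None):
    """Return measurement times (seconds from start) for a Poisson process.

    Gaps between successive times are exponentially distributed with the
    given mean, so the samples are not locked to the clock.
    """
    if rng is None:
        rng = random.Random()
    rate = 1.0 / mean_interval_s  # expected measurements per second
    times = []
    t = rng.expovariate(rate)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate)
    return times


if __name__ == "__main__":
    # One day of measurements averaging one every 30 minutes.
    for t in poisson_schedule(1800, 86400, random.Random(1998)):
        print(f"{t / 3600:6.2f} h")
```

Passing a seeded random.Random instance makes a schedule reproducible for testing, while the default gives each collection site its own independent schedule.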

Development Collaboration & Relationship to Other Work

The previous IEPM work has involved a nationwide, and indeed a worldwide, collaboration of HEP sites, and in particular a strong partnership between SLAC and FNAL. SLAC has begun work with ESnet to identify and make available monitoring information pertaining to the operation of ESnet. The project will track the work done as part of the IPPM effort. Details were given in the previous proposal of the strong relationship between SLAC and LBNL and how the PingER and NIMI projects complement each other. The project will use and develop the relationships with ESnet, FNAL and LBNL and develop an understanding of how the projects can be used to understand network performance and requirements. SLAC also intends to further develop the international effort that has grown around the PingER tools. In particular SLAC has a strong connection with CERN (Switzerland), INFN (Italy), KEK (Japan), and other HENP labs around the world. In addition SLAC acts as a link between groups and a central point around which many efforts communicate.

The Cross Industry Working Team (XIWT) has adopted the IEPM PingER tools to monitor their own network requirements. SLAC is taking a very active role in developing the tools and understanding the needs of groups outside HEP and ESnet.

The Co-operative Association for Internet Data Analysis (CAIDA) has developed several tools to assist with the visualization of Internet related areas, some of which are being applied to the field of network monitoring by SLAC. SLAC intends to develop close connections with CAIDA and its sister organization NLANR to develop visualization tools, and to share experience and knowledge.

Furthermore, SLAC has developed and maintains a large amount of general interest information in the form of web pages relating to the field of networking and network monitoring. The project will continue to maintain and update this as well as document the new tools and new knowledge.

Our Strengths

SLAC and LBNL have a long history of Wide Area Network support going back to the original creation of HEPnet (the predecessor of ESnet), the creation of the first Internet link to Mainland China, and the collocation of ESnet management at LBNL. SLAC and LBNL have assumed a leadership role in wide area network monitoring. Les Cottrell of SLAC is the chairman of the ESnet Network Monitoring Task Force, and the Network Monitoring Focal Group.

Those involved in the IEPM effort have a long history of networking and network monitoring. In addition, the IEPM effort so far has established many new alliances and connections.

Deliverables and Schedule

Initial versions will be available for beta testing 4 months after the start of the project.

In addition,

References

  1. Proposal for IEPM funding 1997
  2. December 1997 Interim Report of the ICFA-NTF Monitoring Working Group


Revised 2 July 1999
URL: http://www-iepm.slac.stanford.edu/about/fwp/fwp-98.html
Comments to iepm-l@slac.stanford.edu