17. WORK PROPOSAL DESCRIPTION (Approach, anticipated benefit in 200 words or less)
This proposal covers the further development and deployment
of Internet End-to-End Performance Monitoring (IEPM) tools. The success of the current efforts, both
within and beyond the High Energy Physics (HEP) community and the Energy
Sciences Network (ESnet), has provided ample incentive to extend the
capabilities of the existing tools and to develop new ones for purposes not
previously envisioned by the project. The project will further develop tools for an improved
understanding of the critical components that limit end-to-end performance,
and will electronically publish results as tables and graphs similar
to those already developed. In
addition, the project intends to extend the monitoring to high performance
networks and bulk throughput applications. This monitoring will necessarily use
the network more intensively and be less extensively deployed.
It will be aimed at understanding how to achieve high throughput, setting
expectations, assisting in troubleshooting and providing steering information to Grid
applications. Furthermore, new tools will be developed to help experts and lay
users alike visualize and understand the analyzed results. These tools, existing
and new, together with the increased emphasis on high network throughput, will
lead to a greater understanding of the dynamics of the Internet, help provide
realistic service quality expectations, and identify where extra resources may
be effectively applied.
18. TASK DESCRIPTION (Approach, relation to work package, in
200 words or less)
FTPA #1: This task provides for maintenance and enhancement of the
existing PingER tools and infrastructure. In
addition, it will support the development of new active and passive
network and application measurements, data collection and archiving,
analysis and associated reports. In January 2002, the IEPM PingER data
gathering tools were deployed at 34 sites in 14 countries. A total of almost 700 remote nodes at
about 420 sites in 74 countries are probed by these monitoring sites,
providing performance metrics on over 3500 end-to-end pairs. These countries contain over 78% of the
world's population and about 99% of the online users of the Internet. PingER is thought to be the most
extensive network performance monitoring project in existence. The new
high performance measurements will be critical to understanding today's
and future high speed networks and emerging Grid applications. As with so
many aspects of modern computing and networking, the needs of the HEP
and ESnet communities continue to push the envelope,
and the PingER tools are an important means of measuring that success.
Internet End-to-End Performance Monitoring
Progress in FY2001
The extent of the PingER monitoring grew to
about 3600 monitor-remote site pairs, 34 monitoring sites, and 691 remote hosts at
44 sites covering 74 countries, which between them contain over 99% of the
world’s Internet-connected population. Usage of the web site grew to over 2000
hits per day, with accesses coming from over 45 countries. FNAL
became re-involved in the IEPM project, so we spent time bringing the new group
up to speed and mapping out plans. The robustness and quality of the FNAL
graphing tools were improved, and data collection at FNAL was
dramatically improved.
The PingER
analysis programs were extensively rewritten, and the
reports were extended to provide information on inter-packet delay variation,
duplicate pings and out-of-order pings. New graphical reporting tools were
added. Traceroute measurement (traceping),
archiving, mining and reporting were ported from the
existing VMS platform to Unix, and the new tool was installed at SLAC. We continued
to run the legacy VMS version to ensure continuity and to validate
measurements. As before, all raw and
processed data and results are made public via the web
to enable researchers and networkers to perform their
own analyses and to accelerate progress and understanding in this important field.
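As an illustration only, the following Python sketch derives the three new report quantities (duplicates, out-of-order replies and inter-packet delay variation) from a list of ping replies. It assumes each reply has been recorded as a (sequence number, RTT in ms) pair in arrival order, and it takes IPDV to be the absolute RTT difference between consecutively numbered replies, in the spirit of RFC 3393. That definition and the function name `summarize_pings` are assumptions, not necessarily what the PingER analysis programs do.

```python
from statistics import median


def summarize_pings(replies):
    """Summarize ping replies given as (seq, rtt_ms) tuples in arrival order.

    Counts duplicate and out-of-order replies and computes the median
    inter-packet delay variation (IPDV) between consecutively numbered replies.
    IPDV here is an assumed definition: |RTT(n+1) - RTT(n)|.
    """
    first_rtt = {}       # seq -> RTT of the first copy received
    duplicates = 0
    out_of_order = 0
    highest_seq = None   # highest sequence number received so far

    for seq, rtt in replies:
        if seq in first_rtt:
            duplicates += 1          # the same sequence number came back again
            continue
        first_rtt[seq] = rtt
        if highest_seq is not None and seq < highest_seq:
            out_of_order += 1        # arrived after a later-numbered reply
        else:
            highest_seq = seq

    # IPDV for each pair of consecutively numbered replies that both arrived
    ipdv = [abs(first_rtt[s + 1] - first_rtt[s])
            for s in sorted(first_rtt) if s + 1 in first_rtt]

    return {
        "duplicates": duplicates,
        "out_of_order": out_of_order,
        "median_ipdv_ms": median(ipdv) if ipdv else None,
    }
```

For example, `summarize_pings([(1, 100.0), (2, 110.0), (4, 105.0), (3, 120.0), (4, 105.0)])` reports one duplicate, one out-of-order reply and a median IPDV of 10.0 ms.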
The pingroute tool was improved to use parallel pings,
speeding up the measurements by an order of
magnitude or more.
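Parallel pings help because each probe spends nearly all of its time waiting for replies, so probes to different hosts can overlap. A minimal Python sketch, assuming a `probe(host)` callable that performs one measurement (here a hypothetical wrapper around the Unix `ping -c` command). The helper names are illustrative and this is not the pingroute implementation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def system_ping(host, count=5):
    """Hypothetical probe: run the system ping; True if the host answered.

    Assumes a Unix-like ping that accepts -c (number of packets).
    """
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True)
    return result.returncode == 0


def probe_all(hosts, probe=system_ping, max_workers=16):
    """Probe every host concurrently and return {host: probe result}.

    Probes are I/O bound, so threads overlap the waiting: wall-clock time is
    roughly (number of hosts / max_workers) x the time of one probe, instead
    of the sum of all probe times.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(probe, hosts)   # results come back in host order
        return dict(zip(hosts, results))
```

With 100 hosts that each take about a second to probe, a serial loop needs roughly 100 s, while 16 worker threads bring this down to under 10 s.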
In collaboration with Georgia
Tech, SLAC installed, configured, learned, extended and started to use the
UCB ns-2 network simulator to further understand Internet performance and
dynamics, and to allow comparison with real measurements of
throughput, re-ordering, etc.
We installed and learned about
Web100, an instrumented TCP stack for Linux, and started to use it. We hope this will
assist in understanding measurements. We worked with the Internet 2 folks to
learn about the QBone Scavenger Service (QBSS). We
built two test beds, one with a 10 Mbits/s bottleneck and the other with a 100 Mbits/s
bottleneck, made measurements, analyzed them and presented our results.
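QBSS traffic is identified by its DiffServ code point: the QBone Scavenger Service uses DSCP 8 (binary 001000), and routers configured for QBSS forward such packets at lower priority than default best-effort traffic. The Python sketch below shows one way an application on Linux might mark its own IPv4 sockets. The helper names are hypothetical, and whether the marking is honoured depends entirely on router configuration along the path.

```python
import socket

QBSS_DSCP = 8  # QBone Scavenger Service code point (binary 001000)


def dscp_to_tos(dscp):
    """Return the IPv4 TOS byte for a DSCP value.

    The DSCP occupies the upper six bits of the old TOS byte; the lower two
    bits are used for ECN and are left as zero here.
    """
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def mark_as_scavenger(sock):
    """Mark an IPv4 socket's outgoing packets with the QBSS code point."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(QBSS_DSCP))
    return sock
```

For example, `mark_as_scavenger(socket.create_connection((host, port)))` marks a TCP connection before any data is written; `host` and `port` are placeholders.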
We attended a workshop on QoS in
We worked with the developer
of bbcp to recommend and evaluate several
enhancements to aid in making network measurements of file copying, and jointly
authored a paper on bbcp that was published at
CHEP01.
SLAC assisted with installing
the SLAC-developed IPv6 monitoring tools at the 6-TAP in
We provided reports and
transparencies to the ESnet review at Lawrence
Berkeley National Laboratory (LBNL), the ESnet Annual
report, and the International Union of Pure and Applied Physics (IUPAP), the
latter to assist in recommending improved connectivity for physicists in
developing countries. We prepared a yearly report on IEPM accomplishments for DoE, and a SciDAC report. We
assisted the Budker Institute of Nuclear Physics in
Early field-test applications
from the commercial start-up NetPredict were
evaluated (funded under a separate SBIR), and we provided feedback and suggestions on
improving their effectiveness. We also made contact with NetPhysics,
a start-up that is developing network monitoring tools, and provided input to
their developments.
Prompted by the high throughput needs of BaBar and the PPDG, we started
making systematic measurements of high throughput between SLAC and collaborator
sites. We set up a web site to host the information we gathered. We also set up
a Cisco switch with NetFlow and started to gather and
analyze the data in order to characterize the SLAC Internet border traffic. We
developed and presented 3 network performance and monitoring demonstrations and
participated in the bandwidth challenge at the SLAC/FNAL booth at SC2000. We
proposed and set up a collaboration of 32 sites to make high throughput measurements,
initially for SC2001.
Presentations were made at:
·
CHEP01.
The Computing in High Energy Physics Conference held in September 2001
is the primary showcase for this work in the HEP community. Four talks were presented at the meeting: one
on IPv6, one on high performance throughput, one on passive measurements, and
one summarizing network developments.
·
Internet2 International. SLAC detailed findings on performance to
sites on networks involved in the Internet2 International group. In some cases, this provided a
before-and-after picture of the effect of peering.
·
PAM2001.
We presented a paper at the Passive and Active Monitoring conference in
April. The paper looked at how to
provide high throughput performance on the Internet.
·
ITU. A
presentation was made and a SLAC representative was a member of a panel on
Voice over IP (VoIP) and QoS
at the ITU meeting in
·
SC2001.
SLAC will participate in Super Computing 2001 (SC2001) in Fall 2001, for which
several demonstrations of network measurements were prepared.
·
A series of 6 lectures was presented on “High
Performance networking and network measurements” at the
·
IPv6 Forum and IETF. SLAC was involved in performance monitoring
for the forthcoming IPv6 Forum summit. A
paper reviewing SLAC’s results was presented to the
IPv6 Working Group at the IETF meeting. We also made a presentation at the
Internet 2 IPv6 working group in
·
ESCC: several talks were given at various ESnet venues, including the ESCC
and the ESnet review in
·
Global Grid Forum: we presented a talk on
network performance at the GGF in
·
We wrote an article about working with QBSS,
which appeared in the CENIC publication.
Four papers were published: three at CHEP01 and one at PAM 2001.
We prepared and submitted two SciDAC
proposals with SLAC as the lead institution, involving two other laboratories and three
universities, and a further two with SLAC as a secondary institution. The Rice
University-led INCITE proposal was accepted; the others were not.
Expected Progress in FY2002
The PingER tools will be supported and
extended to also provide access to the IEPM-BW results (see below). The
measurements will continue to be made, archived, analyzed and reported on via
the web. A new beacon list will be developed and deployed. We will work with
FNAL to make the data collection more robust and automated and to improve the
graph formats. We expect to add a few sites such as LANL, Imperial College
London, INFN Trieste and
The traceroute server tools will be
extended to improve their logging and anomaly reporting, and to reduce the
probability of false alarms that the servers are being used for scanning. SLAC will improve the new traceping
tool, package it for downloading and assist in deploying it at critical
sites. We will also assist those using the tool to measure and
understand Internet topology. SLAC
will work with the Rice University INCITE team to define the need for topology
and tomography tools and to evaluate their development. SLAC will continue to run the PingER project, coordinating with the monitoring sites
and developing and deploying new and improved measurement,
analysis, reporting and display tools.
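To illustrate how traceroute output can feed a picture of Internet topology, the Python sketch below merges hop lists from several traceroutes into a single adjacency map and reports hops shared by many paths, which are natural suspects when localizing a problem. It assumes each traceroute has already been parsed into an ordered list of hop addresses, with `None` for hops that did not respond. The function names are hypothetical and not part of traceping.

```python
from collections import Counter, defaultdict


def build_topology(paths):
    """Merge traceroute paths into an adjacency map {hop: set(next_hops)}.

    Each path is an ordered list of hop addresses; None marks a hop that did
    not answer. Links are recorded only between consecutive answering hops,
    so an unanswered hop breaks the chain rather than inventing a link.
    """
    graph = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a is not None and b is not None and a != b:
                graph[a].add(b)
    return dict(graph)


def shared_hops(paths, min_paths=2):
    """Return the hops that appear in at least `min_paths` distinct paths."""
    counts = Counter()
    for path in paths:
        counts.update({hop for hop in path if hop is not None})  # once per path
    return {hop for hop, n in counts.items() if n >= min_paths}
```

Keeping the graph as plain adjacency sets makes it straightforward to export to a drawing tool or to diff topologies taken on different days.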
A simple Java tool will
be developed for collecting and graphing ping responses to a small number of
hosts in real time and placing the time series graphs on an image background,
e.g. a world map. This will be used to demonstrate Internet performance at
SC2001. Also driven partly by SC2001, we will develop animated world maps of
Internet performance. It is hoped these will be considered for
entry into the Internet Atlas.
A web site will be designed that contains a collection of
illustrative troubles, the attempts made to diagnose them and, where appropriate, how the
troubles were solved. As this collection
grows, a taxonomy of troubles will be developed and
navigation tools will be added to assist in matching new troubles to the
existing reports.
A new focus for IEPM will be to make, analyze, understand and
present high throughput network measurements. This project will be referred to
as IEPM-BW. As part of this a new Internet high performance network monitoring
infrastructure will be defined and developed. The pilot version using active
measurements will be deployed for SC2001. The infrastructure will support both
network and application measurement tools. We will initially integrate ping, traceroute, iperf, bbcp and bbftp into the
infrastructure. As part of this we will develop tools for data reduction and
analysis, together with tools to produce web accessible tabular and graphical
reports with drill down and navigation.
Following the initial pilot we will re-engineer the initial code
based on what we have learnt and then we will evaluate bandwidth measurement
tools such as pipechar, pathload,
pathrate and the INCITE tools, as well as other
applications such as GridFTP, to understand how to
use them, validate how they work, and choose which ones to build into IEPM-BW.
We will work with the developers of these tools, provide feedback and promote
improvements. In order to evaluate GridFTP we will
install Globus and study user authentication using
certificates and work with ESnet and the Globus people to get certificates for Globus.
We will compare the various available active measurement tools to validate one
with another, and to determine regions of applicability.
We will also evaluate methods to minimize the impact on
others while still achieving high throughput. Such methods include
compression, Quality of Service (QoS), and the
development and use of application self rate limiting. We will install,
understand and integrate Web100 (an instrumented TCP stack for Linux) to assist
in understanding the TCP dynamics of high performance throughput.
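Application self rate limiting can be sketched with a token bucket: the sender accumulates credit at a target rate and may only write when it has enough credit, which caps its long-term rate while still permitting short bursts. The Python class below is a hypothetical illustration, not code from bbcp or any other tool named here; the clock is injectable so the behaviour can be checked without real waiting.

```python
import time


class TokenBucket:
    """Token bucket allowing `rate` bytes/s on average and bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def _refill(self):
        """Add credit for the time elapsed since the last call, up to capacity."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes):
        """Spend credit for nbytes and return True, or return False if credit is short."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def wait_time(self, nbytes):
        """Seconds until nbytes of credit will be available (0.0 if available now)."""
        self._refill()
        return max(0.0, (nbytes - self.tokens) / self.rate)


def send_limited(sock, data, bucket, chunk=65536):
    """Send data over a connected socket without exceeding the bucket's rate."""
    if chunk > bucket.capacity:
        raise ValueError("chunk must not exceed the bucket capacity")
    for i in range(0, len(data), chunk):
        piece = data[i:i + chunk]
        while not bucket.try_consume(len(piece)):
            time.sleep(bucket.wait_time(len(piece)))
        sock.sendall(piece)
```

The bucket capacity sets how bursty the sender may be; a capacity near one chunk gives the smoothest output.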
We will develop methods to gather passive measurements of
throughput, initially using Cisco's NetFlow. We will
develop simple algorithms to identify how to aggregate parallel streams and
which application is generating the traffic. We will compare these passive
measurements of throughput with those reported by the active applications. If
successful this will help validate both the active and passive measurements and
also provide a rich new source of high performance application throughputs.
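One simple heuristic for aggregating parallel streams is to treat flows between the same pair of hosts that overlap in time as a single transfer, and to label the transfer by a well-known port. The Python sketch below assumes flow records have already been exported and decoded into the hypothetical `Flow` tuple shown; the record layout, the small port table and the function names are illustrative assumptions, not the NetFlow export format itself. Labelling by port is only a heuristic, since applications such as GridFTP move data on ephemeral ports.

```python
from collections import Counter, defaultdict
from typing import NamedTuple

# Illustrative subset of well-known ports; a real table would be much larger.
WELL_KNOWN_PORTS = {22: "ssh", 2811: "gridftp", 5001: "iperf"}


class Flow(NamedTuple):
    src: str
    dst: str
    src_port: int
    dst_port: int
    start: float   # seconds
    end: float     # seconds
    nbytes: int


def aggregate_transfers(flows):
    """Group time-overlapping flows between the same host pair into transfers."""
    by_pair = defaultdict(list)
    for f in flows:
        by_pair[(f.src, f.dst)].append(f)

    transfers = []
    for (src, dst), group in by_pair.items():
        group.sort(key=lambda f: f.start)
        current, current_end = [group[0]], group[0].end
        for f in group[1:]:
            if f.start <= current_end:        # overlaps: another parallel stream
                current.append(f)
                current_end = max(current_end, f.end)
            else:                             # gap: a new transfer begins
                transfers.append(_summarize(src, dst, current))
                current, current_end = [f], f.end
        transfers.append(_summarize(src, dst, current))
    return transfers


def _summarize(src, dst, streams):
    """Combine the streams of one transfer into totals and an application label."""
    start = min(f.start for f in streams)
    end = max(f.end for f in streams)
    total = sum(f.nbytes for f in streams)
    ports = Counter(p for f in streams for p in (f.src_port, f.dst_port)
                    if p in WELL_KNOWN_PORTS)
    app = WELL_KNOWN_PORTS[ports.most_common(1)[0][0]] if ports else "unknown"
    duration = max(end - start, 1e-9)         # avoid dividing by zero
    return {"src": src, "dst": dst, "streams": len(streams), "bytes": total,
            "mbits_per_s": total * 8 / duration / 1e6, "application": app}
```

The aggregate `mbits_per_s` is the figure that would be set against the throughput reported by the corresponding active application.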
We will evaluate the effectiveness and cost of simple
forecasting mechanisms to provide an indicator of expected throughput for an
application such as bulk data transfer.
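Simple forecasters of this kind can be very cheap. The Python sketch below, a hypothetical illustration rather than a chosen method, predicts each throughput value from earlier ones using either a moving average or an exponentially weighted moving average (EWMA), and scores each by mean absolute error on past data so the better one can be picked. The window length (3) and smoothing factor (0.3) are arbitrary assumptions.

```python
def moving_average_forecasts(series, window=3):
    """Forecast series[i] (i >= 1) as the mean of up to `window` values before it."""
    forecasts = []
    for i in range(1, len(series)):
        recent = series[max(0, i - window):i]
        forecasts.append(sum(recent) / len(recent))
    return forecasts


def ewma_forecasts(series, alpha=0.3):
    """Forecast series[i] (i >= 1) as an exponentially weighted average of earlier values."""
    forecasts = []
    estimate = series[0]
    for value in series[1:]:
        forecasts.append(estimate)               # predict before seeing the value
        estimate = alpha * value + (1 - alpha) * estimate
    return forecasts


def mean_absolute_error(series, forecasts):
    """Mean absolute error of forecasts against the values they predicted (series[1:])."""
    if not forecasts:
        raise ValueError("need at least two measurements")
    actual = series[1:]
    return sum(abs(a - f) for a, f in zip(actual, forecasts)) / len(forecasts)


def best_forecaster(series):
    """Return the name of the forecaster with the lower historical error."""
    errors = {
        "moving_average": mean_absolute_error(series, moving_average_forecasts(series)),
        "ewma": mean_absolute_error(series, ewma_forecasts(series)),
    }
    return min(errors, key=errors.get)
```

Both forecasters need only a few arithmetic operations per measurement, so their cost is negligible next to the measurements themselves.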
We will document the raw data format and make it available to
interested researchers and network folks. We will also provide access via the
web to much of the analyzed data to enable further analysis by those
interested.
We will build management tools for IEPM-BW to automatically
gather configuration information from the remote hosts and to download code to
them. We will port the tools to a second operating system (Linux), and
provide documentation to assist others in installing and managing the
infrastructure. We will then work with a selected second site to port the tools
to that site.
We will set up a formal collaboration with the Particle Physics
Data Grid (PPDG) and join the PPDG monitoring working group. We will work
closely with the European Data Grid since they have embraced, deployed and
extended the PingER tools.
Coordinating efforts will be continued and extended with the
XIWT, ESnet/ESSC/ESCC, Internet 2/Abilene (in
particular the Internet2 End-to-End Performance Initiative, E2Epi), the
Internet 2 HENP networking working group, the IETF/IPPM, the ICFA/SCIC, HENP,
FNAL, the European DataGrid and companies such as NetPredict and NetPhysics. We
will set up our CAIDA-developed Active Measurement Project (AMP) probe to
participate in the collaboration to design and evaluate a new Internet
Measurement Protocol (IPMP). Assistance in troubleshooting network problems
will continue to be provided to PPDG, BaBar
collaborators and physics groups as requested.
We plan to make presentations on worldwide Internet performance,
the new network monitoring infrastructure, QoS and
networking issues for various communities including:
·
ESnet: Presentations
on Grid Monitoring, QBSS and achieving high performance throughput at the ANL
ESCC meeting.
·
DARPA PIs
·
SciDAC
·
Global Grid Forum
·
ICFA/SCIC
·
Romanian Ministry of Telecommunications and
Information
·
Internet2: presentations on achieving high
performance throughput at the inaugural meeting of the Internet2 End-to-End
Performance Initiative (I2 E2Epi) in Ann Arbor, Michigan.
We will participate in the SC2001 Bandwidth Challenge and
demonstrate several monitoring applications and results.
Expected Progress in FY2003
We will explore, evaluate and develop
mechanisms to optimize (minimize) the durations and frequencies of active
measurements. We will investigate a predictive framework that combines
infrequent but accurate measurements of throughput (e.g. from bbcp) with frequent, lighter weight, but less accurate
measurements to provide reasonable forecasts. We will recommend a tool suite
for high performance throughput active measurements. We will once again
re-engineer the code based on our experiences, in particular to improve
robustness, manageability and deployment, and will deploy the infrastructure to
further grid and collaborator sites. As more measurement sites are added we
will design and develop tools to assist in the collection and archiving of
data.
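One way to combine the two kinds of measurement, offered only as a hypothetical sketch, is to calibrate the lightweight estimate against the most recent heavyweight one: the ratio observed when both were last taken together is applied to subsequent lightweight readings until the next heavyweight measurement re-calibrates it. The class name is illustrative, and bbcp as the heavyweight source follows the example above; the lightweight probe is left unspecified.

```python
class CalibratedForecaster:
    """Scale cheap, frequent estimates by the ratio seen at the last accurate measurement."""

    def __init__(self):
        self.ratio = None  # accurate / cheap, from the most recent calibration

    def calibrate(self, accurate_mbps, cheap_mbps):
        """Record a heavyweight measurement (e.g. a bbcp transfer) taken alongside a cheap one."""
        if cheap_mbps <= 0:
            raise ValueError("cheap estimate must be positive")
        self.ratio = accurate_mbps / cheap_mbps

    def forecast(self, cheap_mbps):
        """Forecast bulk throughput from a new cheap estimate."""
        if self.ratio is None:
            raise RuntimeError("no heavyweight measurement has been recorded yet")
        return cheap_mbps * self.ratio
```

The calibration interval then becomes the main knob: the longer a single ratio stays valid on a path, the less often the expensive measurement needs to run.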
We will evaluate how to provide a
standard publish and subscribe mechanism to access our network measurement
data. We will provide remote computer-to-computer access to network measurement
data so as to enable an application to select reasonable configuration parameters
such as the initial window size and the number of parallel streams.
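Choosing the window and number of streams follows from the bandwidth-delay product (BDP): to keep a path full, the data in flight must be at least the available bandwidth multiplied by the round-trip time. If one TCP stream's window is capped, for example by host buffer limits, the remainder must come from parallel streams. The Python sketch assumes the bandwidth and RTT come from our measurement data; the function name and the 1 MiB default cap are hypothetical.

```python
import math


def suggest_tcp_parameters(bandwidth_mbps, rtt_ms, max_window_bytes=1 << 20):
    """Suggest parallel streams and a per-stream window from the bandwidth-delay product.

    bandwidth_mbps:   available bandwidth in Mbits/s
    rtt_ms:           round-trip time in milliseconds
    max_window_bytes: assumed largest window one stream can use (1 MiB by default)
    """
    # BDP in bytes = (bits/s) * (seconds) / 8, arranged to keep integer inputs exact
    bdp_bytes = bandwidth_mbps * 1e6 * rtt_ms / 8000
    streams = max(1, math.ceil(bdp_bytes / max_window_bytes))
    window_bytes = math.ceil(bdp_bytes / streams)   # split the BDP evenly
    return {"bdp_bytes": bdp_bytes, "streams": streams, "window_bytes": window_bytes}
```

For a 100 Mbits/s path with a 160 ms RTT the BDP is 2,000,000 bytes, so with a 1 MiB cap the sketch suggests two streams of 1,000,000 bytes each.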
We will design and develop a feedback
mechanism for a high performance application that will use information from TCP
stack dynamic variables to optimize the rate at which the application sends
data. The optimization will be to achieve the maximum bandwidth while
minimizing the effect on others using the network. We will extend a high
throughput application such as bbcp to enable it to
select whether or not to use compression based on the power of the CPUs, the bandwidth available and the compression ratios
achievable. We will collaborate with others such as ESnet
and European Data Grid members such as Daresbury to
deploy and evaluate QoS services such as the QBone Scavenger Service (QBSS). We
will collaborate with the Network Weather Service (NWS) to provide forecasting
information from our measurement database.
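The compression decision reduces to comparing two rates. Assuming compression and sending are pipelined, compressed transfer moves original data at roughly min(C, B x r), where C is the rate at which the sender's CPUs can compress, B the available bandwidth and r the compression ratio (original size / compressed size); uncompressed transfer moves it at B. The Python sketch encodes that rule and estimates r and C from a sample with zlib. It ignores decompression cost at the receiver, and the function names are hypothetical, not part of bbcp.

```python
import time
import zlib


def should_compress(bandwidth_mbps, compress_mbps, ratio):
    """Decide whether compressing before sending should shorten a transfer.

    bandwidth_mbps: available network bandwidth (Mbits/s)
    compress_mbps:  rate at which the sender's CPUs can compress input (Mbits/s)
    ratio:          original size / compressed size (>= 1 for compressible data)
    """
    # Pipelined: original data moves at the slower of compressor and network x ratio
    compressed_rate = min(compress_mbps, bandwidth_mbps * ratio)
    return compressed_rate > bandwidth_mbps


def sample_ratio(sample, level=1):
    """Estimate the compression ratio of a data sample with zlib."""
    compressed = zlib.compress(sample, level)
    return len(sample) / len(compressed)


def sample_compress_mbps(sample, level=1):
    """Estimate the compression rate (Mbits/s of input) on this host."""
    start = time.perf_counter()
    zlib.compress(sample, level)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(sample) * 8 / elapsed / 1e6
```

On a slow link with a fast CPU compression wins (10 Mbits/s, C = 200 Mbits/s, r = 3 gives 30 versus 10), while on a fast link the same CPU becomes the bottleneck (1000 Mbits/s gives 200 versus 1000).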
Expected Progress in FY2004
We will implement and deploy a distributed data collection and
archiving system to support multiple IEPM-BW measurement sites. We will assist
in the further deployment of the IEPM-BW infrastructure to all PPDG sites,
major HENP sites and other important high performance sites. We will evaluate
other measurement infrastructures such as NIMI as they become available, and
assist with integrating our measurement tools on the dedicated platforms. We
will extend the IEPM-BW infrastructure to collect and archive the data from the
new measurement infrastructure. We will develop new analysis and visualization
tools that scale up to meet the demands of multiple measurement sites.