
IEPM - BW: Methodology

Bulk throughput measurements | Bulk throughput simulation | Windows vs. streams | Effect of load on RTT and loss | Bulk file transfer measurements | QBSS measurements
Created July 7, 2002, last update July 7, 2002.
Authors: Les Cottrell and Connie Logg, SLAC.

The goal was to develop a simple, robust infrastructure to enable making network and applications measurements for links capable of high throughput. This document will give an overview of the methodology adopted. For more details on the goals etc. see Internet End-to-end Performance Monitoring - Bandwidth to the World Project.

There are two types of hosts: monitoring hosts, which make the measurements, and remote hosts, which are measured against.

Since the logs may be collected in a network file system, the monitoring host function may be split: one host makes the measurements, separate hosts extract, analyze and report on the data, and yet another web server makes the reports available.

Each monitoring site works with its collaborators to decide on the remote hosts to monitor. There are typically multiple remote hosts monitored by a monitoring host. For each remote host, an account must be provided that is accessible to the monitoring host via the secure shell ("ssh"). After installing the appropriate public key in the account on the remote host, the account is remotely configured from the monitoring host. Information on the remote hosts is kept in a database on the monitoring host.
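The per-remote-host information kept on the monitoring host might be sketched as follows. This is a minimal illustration only: the host name, account name, field names, and dictionary layout are all assumptions, not the actual IEPM-BW database format.

```python
# Hypothetical sketch of the remote-host database; field names are
# illustrative assumptions, not the real IEPM-BW schema.
remote_hosts = {
    "node1.example.net": {
        "account": "iepm",      # account reachable via the installed ssh public key
        "ping_expected": True,  # used later as a sanity check before transfers
    },
}

def ssh_command(host, cfg, remote_cmd):
    """Build the ssh invocation the monitoring host would use to
    configure and drive the account on a remote host."""
    return ["ssh", f"{cfg['account']}@{host}", *remote_cmd]
```

With the record above, `ssh_command("node1.example.net", ..., ["uptime"])` yields the argument list for a single remote command over the pre-authorized ssh account.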

The monitoring host starts a set of measurements at regular intervals, driven by a Unix cron table. Typical intervals are of the order of an hour or two; the actual interval depends on the load acceptable on the monitoring host's link and on the time it takes to make a set of measurements to all remote hosts. The actual start of the measurements is randomized with a flat distribution over a 15 minute interval. For each set of measurements, the monitoring host selects each remote host in turn and runs ping for 10 seconds and a traceroute (with one measurement per hop), followed by the iperf TCP transfer tool, secure file copies using the peer-to-peer bbcp tool in both memory-to-memory (bbcpmem) and disk-to-disk (bbcpdisk) modes, the bbftp file transfer tool, and the packet dispersion bandwidth estimator pipechar. To provide robustness, servers are remotely started and killed for each measurement, and each sensor command (e.g. iperf) is started as a separate task so that it can be timed out and killed in case of problems. Some sanity checks are also applied: for example, if ping is expected to work (as defined in the remote host configuration database) but fails, the file copy and transfer measurements are not attempted.
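The scheduling and robustness measures above can be sketched in a few lines of Python. This is not the IEPM-BW implementation; the function names and the tool sequence constant are illustrative, and only the behaviors described in the text (flat 15-minute start jitter, per-sensor timeout, the ping sanity check) are modeled.

```python
import random
import subprocess
import sys

# Tool sequence as described in the text; the constant name is an assumption.
SENSOR_SEQUENCE = ["ping", "traceroute", "iperf", "bbcpmem",
                   "bbcpdisk", "bbftp", "pipechar"]

def jittered_delay(rng=random.random):
    """Start delay drawn from a flat distribution over a 15 minute window."""
    return rng() * 15 * 60

def run_sensor(cmd, timeout_s):
    """Run one sensor command as a separate task so a hung tool can be
    timed out and killed without stalling the whole measurement set."""
    try:
        return subprocess.run(cmd, capture_output=True,
                              timeout=timeout_s).returncode == 0
    except subprocess.TimeoutExpired:
        return False

def plan_host(cfg, ping_ok):
    """Sanity check: if ping is expected to work but fails, skip the
    file copy and transfer measurements for this host."""
    if cfg.get("ping_expected", True) and not ping_ok:
        return ["ping"]  # record only the failed ping
    return SENSOR_SEQUENCE
```

The key design point mirrored here is isolation: each sensor runs as its own killable task, so one misbehaving tool cannot block the rest of the measurement cycle.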

The data is extracted from the logs and converted into space-separated tables, whose format is documented, and made available via the web. The analysis is performed on the extracted tables and produces web-accessible pages containing time series plots (short term, for the last 28 days, and longer-term aggregated), histogram plots, scatter plots, statistical and analyzed tables (accessible over the web in formats suitable for loading into Excel), and narrative.
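The extraction step could be sketched as below. The column names and example record are hypothetical; only the output shape (a documented, space-separated table with a header row, loadable into Excel) comes from the text.

```python
def to_table(records, columns):
    """Render extracted measurement records as a space-separated table
    with a header row naming the columns."""
    lines = [" ".join(columns)]
    for rec in records:
        lines.append(" ".join(str(rec[col]) for col in columns))
    return "\n".join(lines)

# Hypothetical extracted record, for illustration only.
rows = [{"time": "2002-07-07T12:00", "host": "node1.example.net", "iperf_mbps": 85.3}]
table = to_table(rows, ["time", "host", "iperf_mbps"])
```

A plain space-separated format like this is easy to serve over the web and imports cleanly into spreadsheet and plotting tools, which matches the analysis workflow the text describes.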
