INCITE voice meeting between SLAC & Rice, 3/27/02

Authors: Les Cottrell. Created:  March 27, 2002

Attendees: Rob Nowak, Yolanda Tsang, Jiri Navratil, Connie Logg, Les Cottrell


Introduction

Some thoughts before the meeting (from Les, Jiri & Yolanda):

Since only Bob may be with us for the meeting, we should focus on the topology/tomography measurements.

  1. Topology: we would like to see topology maps of the connectivity from SLAC to a few tens of sites (e.g. the IEPM-BW sites). These should be produced automatically as web displayable images (e.g. gif, jpg or png) on a daily basis, so that they can be linked to from various web pages. To do this:
    1. We need the data from the Rice topology measurements (fat boy) to be interfaced to a graphics package that can display the map. Jiri has such a graphics package that he wrote a year or so ago. Jiri, can you post a typical graph created by this package? Jiri needs to revive this package or try another, say from CAIDA (Warren has some pointers to possible packages). Then Jiri and Yo need to define an interface from her topology package to the graphics package and then implement it.
    2. We need to automate the measurements and analysis. Among other things this means the job must be callable from a cron table, and we need to be able to either replace MatLab with a program that uses some numerical library (e.g. IMSL, NAG etc.), or we need to decide how to run MatLab automatically in a robust form. This may require SLAC to purchase a MatLab license, or the jobs to be run at Rice, or run the jobs at SLAC when there are spare licenses (e.g. late at night).
    3. We (SLAC) need to set up an infrastructure to install the various Rice servers at the IEPM-BW sites. It would help if there was only one Rice server that did chirp, fat boy etc. at each remote host. Is this reasonable? We (SLAC) need to start and stop the Rice server(s) at the remote sites on demand for robustness and security reasons. Once a server is started we can make the measurement(s) from the SLAC end and then kill the server.
  2. We need to match the fat boy topologies to the traceroute topologies.
    1. Automating traceroute measurements is easy. In fact Warren already has traceroutes(1) (and even pings to individual nodes along the route) for about 80 sites seen from SLAC (and we could extend this to about another 10 measuring hosts). Yo can already take multiple traceroutes into her program and, using MatLab, create tables of connectivity. So what is needed (as above) is a graphics program and an interface between the fat boy connectivity tables and the graphical mapping program. This is probably easier than item 1, and maybe should be attempted first or at least in parallel with 1.
    2. In addition we want to get the RTT measurements out of the traceroute measurements, so the links (edges) can be labelled (and possibly colored) with the average RTTs to the node at the end of the edge.
  3. We want to make tomography loss measurements and apply them to the maps.
    1. Yo has discussed a way to do this. It is quite intensive so we need to discuss how to do this, i.e. how to reduce the number of end nodes (e.g. by breaking the world into regions with common connectivity), how many probes per end node, how often etc.
    2. Once we have the measurements we need to decide how/whether we can label the graphs or how to visualize the data.
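
As an illustration of items 2.1 and 2.2, the following is a minimal Python sketch of merging several traceroutes into one set of edges, each labelled with the average RTT to the node at the end of the edge. The input layout (a list of `(hop_address, rtt_ms)` tuples per traceroute), the function names and the Graphviz DOT output are all assumptions made for this example; they are not the format of the existing traceroute data nor the interface of Jiri's or Yo's programs.

```python
from collections import defaultdict


def build_edges(traceroutes, source="SLAC"):
    """Merge traceroutes into edges labelled with the average RTT.

    Each traceroute is a list of (hop_address, rtt_ms) tuples ordered from
    the first hop to the destination (hypothetical layout). Returns a dict
    mapping (parent, child) to the average RTT in ms measured to `child`.
    """
    rtt_samples = defaultdict(list)  # node -> RTT samples seen for that node
    edges = set()
    for route in traceroutes:
        prev = source
        for node, rtt_ms in route:
            edges.add((prev, node))
            rtt_samples[node].append(rtt_ms)
            prev = node
    return {
        (parent, child): sum(rtt_samples[child]) / len(rtt_samples[child])
        for parent, child in edges
    }


def to_dot(edge_rtts):
    """Render the labelled edges as Graphviz DOT text (one possible output)."""
    lines = ["digraph topology {"]
    for (parent, child), rtt in sorted(edge_rtts.items()):
        lines.append(f'  "{parent}" -> "{child}" [label="{rtt:.1f} ms"];')
    lines.append("}")
    return "\n".join(lines)
```

Hops shared by several traceroutes collapse into a single edge, so the branch points appear automatically. The DOT text could be turned into a gif or png by a cron job, but any graphics package that accepts an edge list would serve equally well.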

Yolanda wrote:

I have some clarifications to the points that Les made. I also try to point out some difficulties and limitations that arise. If we can identify the problems at an early stage, it will definitely ease the future development.

Item 1

SLAC is interested in the topology estimation, specifically branch point identification, which we call the logical topology. Rice has a tool for estimating the logical topology; however, the current tool can only identify binary trees, that is, each node can have only two children. The estimation can be modified so that some of the links with a low "index" value are collapsed, giving a topology closer to the real one. This is done manually at the current stage, and automation is needed here. Besides, since it is a delay-based estimation, links with very high bandwidth are likely to have insignificant or zero delay, so we might have difficulty identifying those links.
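
One way the manual collapsing step might be automated is sketched below in Python. It assumes the "index" value is a single number per link, that a low value means weak evidence for the link, and that a fixed threshold is an acceptable rule; the data layout and the name `collapse_links` are hypothetical and do not describe the Rice tool.

```python
from collections import defaultdict


def collapse_links(parent, link_value, threshold):
    """Collapse internal links whose value is below `threshold`.

    parent:     dict mapping each node to its parent (the root maps to None).
    link_value: dict mapping each non-root node to the value of the link
                that enters it from its parent.
    Returns a new parent dict in which every collapsed internal node has
    been removed and its children re-attached to the nearest surviving
    ancestor. The root and the leaves (receivers) are never removed.
    """
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    # An internal node is removed when the link entering it is too weak.
    removed = {
        node
        for node, par in parent.items()
        if par is not None and node in children and link_value[node] < threshold
    }

    def surviving_parent(node):
        par = parent[node]
        while par in removed:
            par = parent[par]
        return par

    return {n: surviving_parent(n) for n in parent if n not in removed}
```

For example, with a binary tree A -> x, x -> {y, D}, y -> {B, C} and a weak link into y, the result attaches B, C and D directly to x, so x is no longer restricted to two children.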

Item 2

Mark or Rui has a program for plotting the traceroute information. However, it simply shows the logical topology without extra information on the links and internal nodes. What Les meant is that, instead of showing the topology alone, we should also take advantage of the existing information and place it on the map. This includes, but is not limited to, the average round trip delay on each logical link and the IP address of each internal (branching) node. Currently, we plot the logical tree using matlab. The coordinates of the nodes are computed in matlab as well. However, this cannot be scaled to a large number of receivers. Jiri has a better program for plotting the connectivity; however, scaling might remain a problem.

Item 3

Based on the number of measurements (number of packets sent and received), we will estimate the individual link loss rate on the logical tree structured network. To provide correlated information, we will send closely spaced packet pairs to receivers. Consider the following simple example:
     A    <- sender
     |
    / \
   B   C  <- receivers
In order to isolate information for each link, we need four types of measurements: (AB, AB), (AB, AC), (AC, AC) and (AC, AB), where (., .) is a packet pair and AB denotes a packet sent from A to B. To achieve reasonable estimates, we need an adequate number of probes for each measurement.
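
To make the idea concrete, here is a simplified Python sketch that uses only the cross pairs (AB, AC). It is a moment estimator in the style of multicast-based loss inference, swapped in for illustration; it is not the estimator Rice uses, which draws on all four measurement types. It assumes that both packets of a pair share the same fate on the common link out of A and that losses on different links are independent. Under those assumptions P(B arrives) = s*b, P(C arrives) = s*c and P(both arrive) = s*b*c, where s, b and c are the success rates of the shared link and of the two receiver links, so s = P(B)P(C)/P(both). The function name and input layout are hypothetical.

```python
def estimate_link_loss(pair_outcomes):
    """Estimate per-link loss rates on the two-receiver tree.

    pair_outcomes: list of (b_arrived, c_arrived) booleans, one entry per
    packet pair (AB, AC) sent from A.
    Returns a dict with the estimated loss rate of the shared link and of
    the links to B and to C.
    """
    n = len(pair_outcomes)
    if n == 0:
        raise ValueError("no probes recorded")
    p_b = sum(1 for b, _ in pair_outcomes if b) / n
    p_c = sum(1 for _, c in pair_outcomes if c) / n
    p_bc = sum(1 for b, c in pair_outcomes if b and c) / n
    if p_bc == 0:
        raise ValueError("no pair arrived at both receivers; cannot estimate")
    shared_ok = p_b * p_c / p_bc  # success rate of the shared link
    return {
        "shared": 1 - shared_ok,
        "to_B": 1 - p_bc / p_c,   # P(both) / P(C) = success rate of link to B
        "to_C": 1 - p_bc / p_b,   # P(both) / P(B) = success rate of link to C
    }
```

The estimate is only as good as the correlation assumption, which is why the packets of a pair must be closely spaced.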

The questions arising include: (1) What does adequate mean? On a good link we might have difficulty seeing losses and thus will require more packets. On a lossy link we do not want to further degrade the network service, so we will send only a small number of probes. (2) How many receivers should we include in the tree? The number of measurements depends on the number of receivers, so with a large number of receivers the number of measurements might be considerable.
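
Both questions can be given rough numbers. The Python sketch below rests on two assumptions that are not from the notes: that losses are independent from probe to probe, and that the two-receiver example generalizes to one measurement type per ordered pair of receivers (which gives 4 types for 2 receivers, as above). The function names are hypothetical.

```python
import math


def probes_to_see_loss(loss_rate, confidence):
    """Smallest n such that at least one of n probes is lost with
    probability >= confidence, assuming independent losses:
    1 - (1 - loss_rate)**n >= confidence.
    """
    if not (0 < loss_rate < 1 and 0 < confidence < 1):
        raise ValueError("loss_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - loss_rate))


def measurement_types(num_receivers):
    """Number of ordered receiver pairs, including same-receiver pairs."""
    return num_receivers ** 2


def total_probe_pairs(num_receivers, pairs_per_type):
    """Packet pairs needed if every measurement type gets the same budget."""
    return measurement_types(num_receivers) * pairs_per_type
```

On this model a 1% loss rate needs a few hundred probes before a loss is likely to be seen at all, and the number of measurement types grows with the square of the number of receivers, which is the motivation for grouping end nodes into regions.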

Meeting notes

We went over the information above and decided on the following:


(1): Try looking at the data files in /nfs/oceanus/u4/traceping/data/. I think the format is pretty straightforward and a perl script can easily convert it to a more matlab friendly format.

(2): I've written code that can send back to back ICMP echo request packets (pings) as fast as they can be transmitted onto the network. Yes, it does support sending them to different hosts. The spacing between them is usually less than a millisecond, depending on the rate of the link at the sender. I do not use "ping". The new code was written from scratch. Ryan Christopher King [ryanking@owlnet.rice.edu]

