DL/SLAC meeting, November 1, 2000

11/1/00, at SLAC

Rough notes by Les Cottrell and Robin Tasker


Attendees:

Paul Kummer, Robin Tasker, Alan Clarkson from DL and Warren Matthews, Les Cottrell from SLAC

Agenda

Day 1

At the NY POP there is a GSR 12008, but we could find out nothing about its configuration; it is running 12.0(S).10. They are hoping to upgrade to .12 or .14. It has been successfully running CAR & WRED (in the same router) since early June. There are multiple algorithms for selecting which queue to empty; one of these might be WFQ. The router also has Generic Traffic Shaping (GTS) at the exit. UKERNA owns the router and is edgy about changing it. Les provided a Cisco white paper on the ACLs in the GSR.

The GTS has been tested on a small (1605) router at DL. It appears to work well for big packets (1000 bytes), but it appears to kick in early for 104-byte packets, probably due to the limited power of the test router (1605), which with 104-byte packets can only transmit < 4 Mbit/s. The ACLs allow one to select the machines and port numbers at both ends. There was no CAR available in the 1605 (as far as can be told from the Cisco release notes), but it does have WRED and WFQ. They hope to test WRED and WFQ on the DL test bed before they try with the UKERNA router.
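As a sanity check on the 1605 numbers above: if the router's bottleneck is packets per second rather than bits per second, the observed ceiling for small packets implies a forwarding rate we can derive with a quick back-of-envelope calculation (the per-packet-limit interpretation is our assumption, not something measured directly):

```python
# Back-of-envelope check: the observed <4 Mbit/s ceiling with 104-byte
# packets, read as a packets-per-second limit, implies:
PKT_SMALL = 104        # bytes, the small test packet size from the notes
OBSERVED_BPS = 4e6     # ~4 Mbit/s observed ceiling for small packets

pps = OBSERVED_BPS / (PKT_SMALL * 8)   # implied forwarding rate, pkt/s
big_pkt_bps = pps * 1000 * 8           # same pkt rate with 1000-byte packets

print(f"implied forwarding rate ~{pps:.0f} pkt/s")
print(f"=> ~{big_pkt_bps / 1e6:.0f} Mbit/s possible with 1000-byte packets")
```

This is consistent with the shaper appearing to "kick in early" only for small packets: the router runs out of per-packet capacity well below the configured rate.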

The effect of the marking and applying QoS is noticeable on PingER peak packet loss (it went from 40% to 10%), but since Abilene opened up the peering and bandwidth it is less noticeable.

The goal is to make tests from DL to Stanford. We will have accounts on the Stanford machine (loggy).

We discussed how to make measurements. We will probably use pings (possibly with an extended version of PingER) on the low and high priority queues while there is a generated load. Gen_send (at http://www.citi.umich.edu/projects/qbone/generator.html) allows one to specify the UDP bits/second to send, the packet size, and the frequency. This allows one to generate evenly spaced or bursty traffic and report on throughput, losses, etc. We also want to record the routes.
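A gen_send-style generator (evenly spaced UDP packets at a target bit rate) can be sketched along these lines; this is a minimal illustration of the idea, not the CITI tool itself, and the function name and parameters are ours:

```python
# Minimal sketch of a gen_send-style UDP load generator: send a fixed
# packet size at a target bit rate, with evenly spaced packets.
import socket
import time

def send_load(host, port, bits_per_sec, pkt_bytes, duration_sec):
    """Send evenly spaced UDP packets at roughly bits_per_sec; return count."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = (pkt_bytes * 8) / bits_per_sec   # seconds between packets
    payload = b"x" * pkt_bytes
    sent = 0
    deadline = time.time() + duration_sec
    next_send = time.time()
    while time.time() < deadline:
        sock.sendto(payload, (host, port))
        sent += 1
        next_send += interval
        delay = next_send - time.time()
        if delay > 0:                           # pace the sends evenly
            time.sleep(delay)
    return sent
```

Bursty traffic could be emulated by sending several packets back-to-back per interval instead of one; the receiver side would count arrivals and gaps to report throughput and loss.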

RL still has congestion problems at their firewall due to security ACLs. This will be fixed with a new line-speed firewall when they upgrade to 622 Mbps later this year. DL will be getting an Extreme Networks switch/router that is advertised to run at wire speed even with filtering. There will be a dedicated Extreme box for external filtering. This upgrade will be summer 2001.

Richard Hughes-Jones (RHJ) of Manchester has some tools for measuring jitter. There are good contacts with Richard.

Tools

Standard tools to be available on the end hosts will be nping, ntrace, tcpdump, iperf, gen_send/gen_recv, pchar, and possibly Richard Hughes-Jones's jitter tools. Need to set iperf servers to run all the time (e.g. use inetd). Loggy and rtlin1 will have iperf tied into inetd to keep them running all the time. We also need to see if we can allow ports 5001-5009 into rtlin1 so we can use rtlin1 as an iperf server from clients outside DL. Gen_send needs modifying to allow selection of the port, and maybe some other features such as selecting the output file and allowing setting of the duration of the test. Paul is looking at modifying gen_send/gen_recv.

Measurements

Idea is to have 2 queues chosen by CAR. Iperf will be on ports 5001-9 and will be normal priority, and then other apps on ports 6000.... Can we also mark the ICMP packets (e.g. by using TOS bits)? Can multi-homing help, and will CAR over-ride the ping TOS bits (Paul believes it will)? We agreed to start out by not using port-number marking, instead using only IP addresses for marking. We also discussed multi-home versus multi-host, and agreed multi-home is preferred since it simplifies things in case the machines are not identical. In the background, run ping to characterize the performance etc. Every now and again run traceroute to keep track of the route changes (only keep diffs). Use gen_send for traffic generation, e.g. for background traffic, and wind it up and see the effect on regular- and high-priority pings (loss/RTT/jitter). Run gen_send vs iperf both at regular priorities and see the effect on iperf throughput as gen_send increases, then repeat with iperf at high priority. Try something similar with the RHJ tools to measure jitter and see if it is more sensitive to queue management and also how it agrees with ping jitter measurements.
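The "only keep diffs" idea for route tracking could be scripted roughly as below. This is a sketch, not the monitor Warren actually wrote; the traceroute invocation, output parsing, and log format are all assumptions:

```python
# Sketch: run traceroute periodically and log the hop list only when it
# differs from the previous run ("only keep diffs").
import re
import subprocess
import time

def parse_hops(traceroute_output):
    """Return the hop addresses from `traceroute -n`-style output."""
    return re.findall(r"^\s*\d+\s+(\S+)", traceroute_output, re.MULTILINE)

def watch(target, interval_sec=3600, log="route_changes.log"):
    """Poll the route to target, appending a line only when it changes."""
    prev = None
    while True:
        out = subprocess.run(["traceroute", "-n", target],
                             capture_output=True, text=True).stdout
        hops = parse_hops(out)
        if hops != prev:                       # only keep diffs
            with open(log, "a") as f:
                f.write(f"{time.ctime()} {target}: {' '.join(hops)}\n")
            prev = hops
        time.sleep(interval_sec)
```

Keeping only changes keeps the archive small while still giving a full route history for correlating with loss or throughput events.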

Next we can play with the scheduler that decides how to select things (e.g. by using WFQ) from multiple queues. We can also play with more queues, for example with one emulating a Less than Best Effort service, one for normal traffic and one for expedited service. It would be useful to learn from Cisco whether they have any plans on this. The GTS is just there to guarantee congestion by limiting the available bandwidth (currently set to 2 Mbit/s).

Another thing a bit closer to applications is to do HTTP GETs on standard multisized files and see impact of QoS. This will help show the TCP set up effects.
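A timed HTTP GET of the kind described could be sketched as follows; the test-file URL layout is hypothetical, chosen just for illustration:

```python
# Sketch: time an HTTP GET of a fixed-size test file, to compare load
# times at normal vs high priority and expose TCP setup/slow-start effects.
import time
import urllib.request

def timed_get(url):
    """Fetch url; return (bytes received, elapsed seconds)."""
    start = time.time()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    return len(body), time.time() - start

# Example usage with the 4K/64K file sizes from the test plan
# (server name and path are placeholders):
#   for size in ("4K", "64K"):
#       n, t = timed_get(f"http://testserver/files/{size}.dat")
#       print(size, n, t)
```

Small files are dominated by connection setup and slow start, so comparing 4K against 64K transfers should separate the per-connection overhead from the steady-state throughput effect of the QoS settings.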

It may be interesting to look at changing the GTS on the DL testbed to see the effect on iperf. There may be other Cisco QoS features we can test, e.g. FRED. We may need to improve our contacts with Cisco to learn more of upcoming features and to have better sources of information.

DL will discuss with UKERNA access to NY router utilization information (e.g. MRTG) and whether there is other information available (e.g. flows from OCXMON). This may also be desirable for other routers on the paths. There may also be other information in the routers that could be instructive, e.g. queue lengths etc.

It would be good to have a Surveyor at DL. DL have already talked to ANS about this. Les will contact Guy Almes and Matt Zekauskas to try and move this forward. This would also provide route history.

Timetable

The UKERNA schedule has slipped by a month. UKERNA (project management) wants a more detailed project plan in 2 months. So we need to focus on what happens in the next 3 months.

UKERNA wants monthly reports from the project. There will be a more formal report to a Networkshop in March. Les will make sure that DL is on the distribution list for the monthly IEPM reports, and DL will make sure the DL/UKERNA reports are also sent to SLAC.

Day 2

 Tasks Identified
  ================

- accounts on loggy - WM
- buffer size upgrade on loggy - WM
- install general software on loggy and rtlin1 - WM RT
- install PingER on loggy and rtlin1 - WM RT
- install traceroute monitor stuff on loggy and rtlin1 - WM RT
- multihome rtlin1 - RT

- characterise link both ways between rtlin(H) & loggy
                                      rtlin(L) & loggy

  use PingER at 15 min interval with 100 pings of 100 and 1000 bytes WM RT
  traceroute script to compare against previous routes - WM
  RHJ jitter tests (modified to run from a script) - RT
  iperf for throughput - LC
  queue lengths etc in NY PoP - needs checking with cisco and UKERNA - PSK

- further DL testbed testing - AC/RT/PSK
- understand existing pkt marking in NY PoP and relative marking of
  our test traffic - PSK
- final config of cisco 1605R -> UKERNA router i.e. - PSK


 
     ACL               WRED                     ACL
      |                  |                       |
      V                  V                       V

                    |-- queue -- rtlin(H) --|
  <--GTS--- sched --|                       |-- CAR <--- Abilene
                    |-- queue -- rtlin(N) --|

where,  rtlin1 is multihomed with IP addresses rtlin(H) and rtlin(N)

so (reversed)

                                    WRED        GTS
                                      |          |
                                      V          V

        |->   everything else   |-- queue1 ----------> 622Mbps (620?)
 (high) |                     ->|               |
        |->   loggy->rtlin(H)   |-- queue2 ---  | -|
 CAR-<                                          |  |
        |->  evagore->icfamon   |-- queue3 --- -|  | 
(normal)|                     ->|                  V
        |->  loggy->rtlin(N)    |-- queue4----------->  2 Mbps|
     

but where does all the other traffic go in this scheme, i.e. high, normal
or elsewhere?

- ability to get throughput data out of NY PoP router during tests - PSK

- check of scheduling algorithms available within NY PoP router - PSK

- more detailed Project Plan - PSK

- modify gen_send to use a specified port and be scriptable - PSK

- modify RHJ stuff as appropriate - RT

- modify ping for testbed via Perl to be scriptable etc - WM

- iperf via Perl script - LC

- http-get Perl script needed - WM RT

- experimental test suite - ALL

     for   ------------ background increasing ------------------>

               via gen_send() input: pkt size= 100, 1000, 1400?
                              output: actual thruput 
     do
           ping  N   | input   pkt sizes (15) between 100 - 1000 * 100
           ping  H   | output  rtt, pkt
                                     
           iperf N   | input   windows, streams
           iperf H   | output  thruput
                             
           RHJ   N   | input   pkt sizes (15) * 100 pkts         
           RHJ   H   | output  oneway transit time, loss, jitter

        http-get N   | input   filesize (4K, 64K)
        http-get H   | output  load time

    estimate that each test run will produce < 1Mbyte of data but need to add
    textual description of the data in the raw dataset

 - done at times through the day/week (to show no effect!) and frequency
   of each time point

 - need to know time for single test suite to run

 - set up email list
          - use of Log Book - archiver within mail (URLs allowed!) - RT
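The experimental test suite above (increasing background load; at each level run each probe at normal and high priority) could be driven by a loop along these lines. The run_probe wrapper and the specific load levels are placeholders; packet sizes and probes are from the plan:

```python
# Sketch of the Day 2 test-suite driver: sweep background gen_send load,
# and at each level run each probe at normal (N) and high (H) priority.
BACKGROUND_BPS = [0, 1e6, 2e6, 4e6]   # placeholder load levels, bits/s
PKT_SIZES = [100, 1000, 1400]         # gen_send packet sizes from the plan
PROBES = ["ping", "iperf", "rhj", "http-get"]

def run_probe(probe, priority, background_bps, pkt_size):
    """Placeholder for the real tool wrappers; returns one result record."""
    return {"probe": probe, "prio": priority,
            "bg_bps": background_bps, "pkt": pkt_size}

def run_suite():
    """Run the full sweep; return the list of result records."""
    results = []
    for pkt in PKT_SIZES:
        for bg in BACKGROUND_BPS:             # wind background load up
            for probe in PROBES:
                for prio in ("N", "H"):       # normal then high priority
                    results.append(run_probe(probe, prio, bg, pkt))
    return results

# 3 sizes x 4 loads x 4 probes x 2 priorities = 96 records per sweep,
# consistent with the < 1 Mbyte per test run estimate in the notes.
```

Knowing the wall-clock time of one `run_suite()` pass addresses the "need to know time for single test suite to run" item above.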

Day 3

Notes from 03/11/2000
=====================

- possible problems with flow routing and GTS in cisco,
  they're mutually exclusive. So alternatives....

          --->
  --> CAR ---> Queue --> Shape -->
          --->

  is it possible to use CAR to mark the pkt rather than drop but
  the problem remains that unless there is congestion there will be
  no WRED effect, i.e. what we do now with ns2 and evagore testing.

  BGP can use multiple links on a per-pkt basis, which could potentially
  lead to pkt re-ordering; for IP this isn't a problem.

  Could separate functions such that CAR is performed at NY and the
  Queue/Shape function in a router at DL, but does this invalidate the
  purpose? Probably not, except that it could be viewed as a glorified
  benchtest; this would not control the trans-atlantic congestion
  and the effect on jitter.

  Could this be done on an I2 router? This must be the preferred option!

  PSK to talk with UKERNA; WM to Hawaii

- traceroute monitor done by Warren on loggy/rtlin2 route - see

  http://www-iepm.slac.stanford.edu/monitoring/qos/

- accounts done at Stanford (loggy) but we need passwords!!!!

- need consolidated action list with timescales -> project plan for UKERNA

- need VC in December (early) to continue progression of work items - RT WM

- division of work for test suite etc

- PingER on rtlin1, send via ftp the data files to SLAC and to DL - RT