IEPM

UK to US QoS testing



Introduction

Measurements are being made for the UK Particle Physics Network Coordination Group (PPNCG) to see the effects of QoS between UK JANET sites and US ESnet and Abilene sites. These measurements are reported on in A Measure of the Effects of QoS Features between ESnet, Abilene and JANET.

Routing

The routing between a typical SLAC host (in this case www3.slac.stanford.edu) and icfamon.dl.ac.uk is seen below:
traceroute to icfamon.dl.ac.uk (193.62.127.224): 1-25 hops, 38 byte packets
 1  {core}.Stanford.EDU (134.79.x.x) [AS3671 - SU-SLAC]  1.07 ms (ttl=255)
 2  {legacy}.SLAC.Stanford.EDU (134.79.x.x) [AS3671 - SU-SLAC]  1.08 ms (ttl=254)
 3  {border}.SLAC.Stanford.EDU (134.79.x.x) [AS3671 - SU-SLAC]  1.18 ms (ttl=253)
 4  ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18) [AS32 - Stanford Linear Accelerator Center]  1.35 ms (ttl=252)
 5  pppl4-atms.es.net (134.55.24.10) [AS293 - Energy Sciences Network (ESnet)]  60.3 ms (ttl=251)
 6  60hudson-pppl.es.net (134.55.43.2) [AS293 - Energy Sciences Network (ESnet)]  63.0 ms (ttl=250)
 7  193.62.157.5 (193.62.157.5) [AS786 - JANET]  63.6 ms (ttl=249)
 8  193.62.157.13 (193.62.157.13) [AS786 - JANET]  131 ms (ttl=248)
 9  128.86.1.246 (128.86.1.246) [AS786 - University of London Computer Centre]  138 ms (ttl=247)
10  193.62.157.82 (193.62.157.82) [AS786 - JANET]  147 ms (ttl=246)
11  146.97.255.178 (146.97.255.178) [AS786 - UK Academic Joint Network Team]  144 ms (ttl=245)
12  gw-dl.netnw.net.uk (194.66.24.2) [AS786 - JANET]  144 ms (ttl=19)
13  gw-fw.dl.ac.uk (193.63.74.233) [AS786 - JANET]  143 ms (ttl=243)
14  alan3.dl.ac.uk (193.63.74.129) [AS786 - JANET]  143 ms (ttl=115)
15  icfamon.dl.ac.uk (193.62.127.224) [AS786 - JANET]  150 ms (ttl=50!)
A typical route from Oxford University to SLAC recorded by traceping on May 3rd, 2000 is shown below.
   Router                           Address
router.physics.ox.ac.uk          163.1.247.254                                           
                                 192.76.34.202                                           
oxford.london-core.ja.net        146.97.251.81                                        
london.external-gw.ja.net        146.97.251.57                           
us-gw.ja.net                     193.63.94.90                                                          
ny-pop.ja.net                    193.62.157.14                                                          
ny-pop.esnet.ja.net              193.62.157.6                                                           
pppl-60hudson.es.net             134.55.43.1                                                            
slac1-atms.es.net                134.55.24.13                                                           
{border}.SLAC.Stanford.EDU       192.68.x.x                                                          
NS2.SLAC.Stanford.EDU            134.79.x.x                                                           
The route from Stanford to UKERNA is also shown below.

Pathchar

Running pathchar between a node on the SLAC network (flora01.slac.stanford.edu) and icfamon.dl.ac.uk gives the result shown below. It can be seen that packet loss starts between nodes 6 and 7, i.e. as one goes from the ESnet Autonomous System (AS) domain to the JANET AS domain. The worst loss appears to start between hops 10 and 12 (bear in mind that losses are cumulative as one passes over more hops). The bottleneck bandwidth appears to be 12 Mb/s, which is on the Daresbury Laboratory site.
 mtu limitted to 1500 bytes at FLORA01.SLAC.Stanford.EDU (134.79.16.29)
 doing 32 probes at each of 64 to 1500 by 44
 0 FLORA01.SLAC.Stanford.EDU (134.79.16.29)
 |    30 Mb/s,   199 us (792 us)
 1 {core}.SLAC.Stanford.EDU (134.79.x.x)
 |    50 Mb/s,   190 us (1.41 ms)
 2 {legacy}.SLAC.Stanford.EDU (134.79.x.x)
 |   120 Mb/s,   74 us (1.66 ms)
 3 {border}.SLAC.Stanford.EDU (134.79.x.x)
 |    88 Mb/s,   -90 us (1.62 ms)
 4 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)
 |    29 Mb/s,   29.6 ms (61.3 ms)
 5 pppl4-atms.es.net (134.55.24.10)
                        -> 134.55.24.10 (1)           
 |    24 Mb/s,   1.18 ms (64.2 ms)
 6?60hudson-pppl.es.net (134.55.43.2)
                        -> 134.55.43.2 (1)           
 |    20 Mb/s,   211 us (65.2 ms),  2% dropped
 7?193.62.157.5 (193.62.157.5)
                        -> 193.62.157.5 (1)           
 |    42 Mb/s,   38.3 ms (142 ms),  2% dropped
 8?193.62.157.9 (193.62.157.9)
                        -> 193.62.157.9 (1)           
 |   123 Mb/s,   -2836 us (137 ms),  2% dropped
 9?193.63.94.246 (193.63.94.246)
 |   501 Mb/s,   4.67 ms (146 ms),  2% dropped
10 193.62.157.82 (193.62.157.82)
                        -> 193.62.157.82 (2)           
 |    29 Mb/s,   2.96 ms (152 ms),  7% dropped
11?146.97.255.178 (146.97.255.178)
                        -> 146.97.255.178 (1)           
 |    32 Mb/s,   845 us (154 ms),  12% dropped
12?gw-dl.netnw.net.uk (194.66.24.2)
                        -> 194.66.24.2 (3)           
 |   ?? b/s,   96 us (155 ms),  8% dropped
13?gw-fw.dl.ac.uk (193.63.74.233)
                        -> 193.63.74.233 (1)           
 |    19 Mb/s,   48 us (155 ms),  +q 1.09 ms (2.55 KB),  8% dropped
14?alan3.dl.ac.uk (193.63.74.129)
                        -> 193.63.74.129 (2)           
 |    12 Mb/s,   -2605 us (151 ms),  11% dropped
15?icfamon.dl.ac.uk (193.62.127.224)
15 hops, rtt 146 ms (151 ms), bottleneck  12 Mb/s, pipe 217314 bytes
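The "pipe" figure on the last line is pathchar's estimate of how much data can be in flight on the path at one time. As a rough cross-check (a back-of-the-envelope bandwidth-delay product, not pathchar's exact calculation), multiplying the reported bottleneck bandwidth by the round trip time gives about the same number:
 # Rough bandwidth-delay product check for the pathchar run above
 # (values taken from the last line of the output).
 bottleneck_bps = 12e6   # reported bottleneck bandwidth, 12 Mb/s
 rtt_s = 0.146           # reported round trip time, 146 ms
 pipe_bytes = bottleneck_bps * rtt_s / 8   # convert bits to bytes
 print("pipe ~ %.0f bytes" % pipe_bytes)   # ~219000, close to the 217314 reported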

Pingroute

A pingroute.pl measurement (see below) confirms pathchar's conclusions about loss. Bear in mind that pingroute.pl is pinging routers, which may give a different priority to responding to pings than they give to forwarding traffic. Thus the losses reported may differ from those seen by packets going through the routers. However, loss of packets is usually caused by the router or a link being busy. It can be seen that the losses reported by pingroute.pl are similar to those reported by pathchar. Also, until the penultimate hop there is little difference in the losses between small (100 byte) and large (1472 byte) packets.
40cottrell@flora01:~>bin/pingroute.pl -c 100 -s 1472 icfamon.dl.ac.uk
Architecture=SUN5, commands=traceroute -q 1 and ping -s node 1472 100, pingroute.pl version=1.4, 5/16/00, debug=1
pingroute.pl version 1.4, 5/16/00 using traceroute to get nodes in route from flora01 to icfamon.dl.ac.uk
traceroute: Warning: checksums disabled
traceroute to icfamon.dl.ac.uk (193.62.127.224), 30 hops max, 40 byte packets
pingroute.pl version 1.4, 5/16/00 found 15 hops in route from flora01 to icfamon.dl.ac.uk
1  {core}.SLAC.Stanford.EDU (134.79.x.x)  0.653 ms
2  {legacy}.SLAC.Stanford.EDU (134.79.x.x)  1.043 ms
3  {border}.SLAC.Stanford.EDU (134.79.x.x)  3.529 ms
4  ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)  1.450 ms
5  pppl4-atms.es.net (134.55.24.10)  60.781 ms
6  60hudson-pppl.es.net (134.55.43.2)  62.815 ms
7  193.62.157.5 (193.62.157.5)  68.664 ms
8  193.62.157.9 (193.62.157.9)  142.859 ms
9  193.63.94.246 (193.63.94.246)  139.580 ms
10  193.62.157.82 (193.62.157.82)  154.224 ms
11  146.97.255.178 (146.97.255.178)  153.692 ms
12  gw-dl.netnw.net.uk (194.66.24.2)  150.355 ms
13  gw-fw.dl.ac.uk (193.63.74.233)  149.862 ms
14  alan3.dl.ac.uk (193.63.74.129)  150.005 ms
15  icfamon.dl.ac.uk (193.62.127.224)  156.676 ms
Wrote 15 addresses to /tmp/pingaddr, now ping each address 100 times from flora01
         pings/node=100                              100 byte packets           1472 byte packets
         NODE                                  %loss    min    max    avg %loss   min    max    avg from flora01
134.79.19.2     {core}.SLAC.STANFORD.EDU          0%    0.0   12.0    0.0   0%    1.0    3.0    1.0 Thu May 18 13:36:39 PDT 2000
134.79.135.6    {legacy}.SLAC.STANFORD.EDU        0%    0.0    2.0    0.0   0%    2.0    2.0    2.0 Thu May 18 13:39:58 PDT 2000
134.79.111.4    {border}.SLAC.STANFORD.EDU        0%    1.0    5.0    1.0   0%    2.0    4.0    2.0 Thu May 18 13:43:16 PDT 2000
192.68.191.18   ESNET-A-GATEWAY.SLAC.STANFORD.    0%    0.0  383.0   22.0   0%    2.0  324.0   13.0 Thu May 18 13:46:34 PDT 2000
134.55.24.10    PPPL4-ATMS.ES.NET                 0%   60.0  460.0   67.0   0%   63.0  257.0   68.0 Thu May 18 13:49:53 PDT 2000
134.55.43.2     60HUDSON-PPPL.ES.NET              0%   62.0   98.0   64.0   0%   66.0  209.0   70.0 Thu May 18 13:53:11 PDT 2000
193.62.157.5    193.62.157.5                      3%   63.0   76.0   64.0   2%   68.0   87.0   69.0 Thu May 18 13:56:29 PDT 2000
193.62.157.9    193.62.157.9                      3%  142.0  342.0  146.0   3%  148.0  324.0  152.0 Thu May 18 13:59:49 PDT 2000
193.63.94.246   193.63.94.246                     2%  131.0  144.0  133.0   7%  137.0  151.0  138.0 Thu May 18 14:03:09 PDT 2000
193.62.157.82   193.62.157.82                     4%  152.0  391.0  157.0   0%  158.0  344.0  161.0 Thu May 18 14:06:30 PDT 2000
146.97.255.178  146.97.255.178                   10%  146.0  164.0  149.0   9%  153.0  164.0  154.0 Thu May 18 14:09:49 PDT 2000
194.66.24.2     GW-DL.NETNW.NET.UK                8%  142.0  161.0  144.0   5%  149.0  161.0  151.0 Thu May 18 14:13:09 PDT 2000
193.63.74.233   GW-FW.DL.AC.UK                    8%  142.0  155.0  144.0  11%  150.0  163.0  151.0 Thu May 18 14:16:29 PDT 2000
193.63.74.129   ALAN3.DL.AC.UK                    7%  143.0  642.0  150.0  11%  151.0  164.0  152.0 Thu May 18 14:19:49 PDT 2000
193.62.127.224  ICFAMON.DL.AC.UK                  6%  155.0  371.0  160.0  12%  166.0  408.0  170.0 Thu May 18 14:23:10 PDT 2000
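The basic procedure pingroute.pl follows can be sketched as below. This is a minimal Python illustration of the approach, not the actual pingroute.pl script, and it assumes Linux-style traceroute and ping options (the Solaris form actually used is shown in the output above).
 # Sketch of the pingroute approach: get the routers on the path with
 # traceroute, then ping each one with small and large packets and report
 # the loss.  The command options are assumptions for a Linux-style ping.
 import re, subprocess

 def hops(dest):
     """Return the router addresses reported by traceroute."""
     out = subprocess.run(["traceroute", "-q", "1", dest],
                          capture_output=True, text=True).stdout
     lines = out.splitlines()[1:]   # skip the "traceroute to ..." header
     return re.findall(r"\((\d+\.\d+\.\d+\.\d+)\)", "\n".join(lines))

 def loss(addr, size, count=100):
     """Ping addr count times with size-byte payloads; return % packet loss."""
     out = subprocess.run(["ping", "-c", str(count), "-s", str(size), addr],
                          capture_output=True, text=True).stdout
     m = re.search(r"([\d.]+)% packet loss", out)
     return float(m.group(1)) if m else 100.0

 for router in hops("icfamon.dl.ac.uk"):
     print(router, loss(router, 100), loss(router, 1472))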

One way measurements

We also used sting with 100 probes to look at the one way packet losses in both directions. The results show that the loss at this time (6:20pm PST 5/15/2000) was greater (9% compared to 0%) in the direction from the UK to the US.
% sudo sting icfamon.dl.ac.uk
(Unreliable) Connection setup took 138 ms
src = 134.79.24.97:10502 (4259008577)
dst = 193.62.127.224:80 (1543681059)

dataSent = 100, dataReceived = 100
acksSent = 100, acksReceived = 91
Forward drop rate = 0.000000
Reverse drop rate = 0.090000
630 packets received by filter
0 packets dropped by kernel
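The forward and reverse drop rates reported above are consistent with a straightforward reading of the data and ack counters (assuming sting derives them directly from these counts):
 # Rough check of the drop rates in the sting output above, assuming they
 # come directly from the reported counters.
 data_sent, data_received = 100, 100     # probes from SLAC to icfamon
 acks_sent, acks_received = 100, 91      # acks back from icfamon to SLAC
 forward_drop = 1 - data_received / data_sent   # US -> UK loss: 0.00
 reverse_drop = 1 - acks_received / acks_sent   # UK -> US loss: 0.09
 print(forward_drop, reverse_drop)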
One can also look at the Surveyor reports for the one way delays and losses. Graphs of the one way delay are shown below for SLAC to UKERNA and UKERNA to SLAC. These graphs show that during the daytime in Europe the variability in delay is much greater in the SLAC to UKERNA direction than vice versa. The losses (packets with a delay of >= 300 ms are counted as lost) are also much greater in the SLAC to UKERNA direction. From these graphs it can be seen that congestion is likely to be greatest (and thus QoS most likely to be effective) during the daytime in Europe and on the path from the US to the UK.

The Surveyor graphs for one way measurements between UKERNA and Stanford University are seen below. They are very similar to those between SLAC and UKERNA. This suggests that the congestion is in the common path for these pairs, i.e. in JANET.

For further measurements made by Robin Tasker of Daresbury Lab on the routing and ping responses seen from the UK end see http://icfamon.dl.ac.uk/papers/QoS/stats-1705.pdf.

Summary

Congestion appears to be more prevalent in the UK to US direction. The congestion appears to be within JANET, so if QoS is being applied correctly along the entire JANET route it is strange that it shows no effect. I would expect QoS to reduce the loss unless the QoS queues are just as congested as the non-QoS queues.

Some possibilities are that the QoS code is not properly applied (e.g. it is only configured in one direction, and in particular may not be active in the UK to US direction), is only partially applied (e.g. it is only applied to the trans-Atlantic link and congestion elsewhere is defeating its best efforts), or is simply not working properly. Early versions (Spring 1999) of the Cisco WFQ code failed to improve the performance seen on a testbed between SLAC and LBL; maybe something similar is happening with the WRED code.

Email from Peter Clarke dated 5/18/00 clarifies that "the CAR/WRED is active only at the entry point of ESNET into JANET and only on traffic INBOUND to UK". This partial application may be the reason the effect is not noticeable, since there may be substantial congestion elsewhere that is causing the problem. It may be necessary to extend the deployment or at least review the utilization of all the links along the route.




Created: May 9, 2000
URL: http://www-iepm.slac.stanford.edu/pinger/tools.html
Comments to iepm-l@slac.stanford.edu