ICMP Rate Limiting

Part 2: Detailed Study and List of Candidates

Slope of Number of Successful Packets vs. Packet Number:

From SLAC we monitor various sites around the world, sending a bunch of 10 pings to each site every half hour. We analyzed the data collected by this process over a period of about 9 months (July 1999 to March 2000). For each site, we performed a linear fit to the number of successful packets at each packet number within the 10-ping bunch over a period of one month, and calculated the slope of this line. We hypothesize that any site performing ICMP rate-limiting with a leaky-bucket mechanism would show up as a distinctly negative slope in this data, as described in detail in Part 1. The results from this analysis are presented below.
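As a concrete illustration, the sketch below (Python; the data layout is hypothetical) computes such a slope for one site from a month of 10-ping bunches:

```python
# Minimal sketch of the tail-drop slope calculation, assuming each
# measurement is stored as a list of 10 booleans (hypothetical layout),
# one per ping in the bunch, True if an echo reply came back.
import numpy as np

def tail_drop_slope(bunches):
    """Fit a line to (packet number, number of successful packets at
    that position) and return its slope.  A leaky-bucket rate limiter
    should pass the first few packets of each bunch and drop the rest,
    giving a distinctly negative slope."""
    bunches = np.asarray(bunches, dtype=float)    # shape (n_bunches, 10)
    successes = bunches.sum(axis=0)               # successes per packet number
    packet_numbers = np.arange(1, bunches.shape[1] + 1)
    slope, _intercept = np.polyfit(packet_numbers, successes, 1)
    return slope

# Example: a limiter that admits roughly 4 packets per half-hourly bunch.
month = [[True] * 4 + [False] * 6] * (48 * 30)    # 30 days of bunches
print(tail_drop_slope(month))                     # clearly negative
```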

The average slope was found to be approximately 0, which indicates that most sites either do not perform rate-limiting on ICMP or else use a different rate-limiting technique, such as RED, which would not be observable through this method.

Presented here is the histogram of the slope per month of these sites during the nine-month period:
 

Likely Candidates:

From this data, we chose all sites whose average slope was < -0.5, since they showed some consistency in the sign of the slope over the entire data collection period. These comprised about 20% of the monitored sites. Note that among these, the .gov sites showed a spectacularly high slope over the month of September and a variable, small-magnitude slope over the rest of the period, and are therefore not promising candidates; but since this list was better off being overly inclusive than excluding possible candidates, we kept them in the original list of candidates. However, later results confirmed that these .gov sites were not suitable candidates, and they were excluded from the presentation of further results.

Presented below is the list of candidates chosen for further study and their average slopes over the nine month period.
Table 1: Tail-drop slopes for Candidate sites
Node Tail-drop Slope
concave.cs.wits.ac.za -59.309
www.lbl.gov  -14.31183333
fnal.fnal.gov -12.251
www.wits.ac.za  -12.13766667
wwwmics.er.doe.gov -10.761375
dns1.arm.gov  -10.719625
www.ornl.gov  -10.511375
www.llnl.gov  -10.367875
ping.bnl.gov  -8.967125
eros.cnea.gov.ar -6.203222222
www.utm.my  -4.6702
df.uba.ar  -4.244555556
www.uzsci.net  -3.7042
sun.ihep.ac.cn  -3.565666667
fisica.edu.uy  -3.32875
main.tirana.al  -3.13475
ffden-1.phys.uaf.edu -2.335625
sci.am  -2.121142857
www.hepi.edu.ge  -2.105857143
www.bu.ac.th  -1.89375
www.unimas.my  -1.6534
beta.carnet.hr  -1.562
tnp.saha.ernet.in -1.5574
mail2.starcom.co.ug -1.5245
www.jinr.dubna.su -1.521888889
magic.mn  -1.47425
dns1.vn  -1.4208
tifr.res.in  -1.389333333
satsun.sci.kz  -1.3815
www.lu.lv  -1.310857143
www.cs.ui.ac.id  -1.2484
kadri.ut.ee  -1.236285714
www.usm.my  -1.1382
ns.itep.ru  -1.128777778
www.ihep.su  -1.127333333
www.uct.ac.za  -1.096
moon.atomki.hu  -1.068111111
lhr.comsats.net.pk -0.88625
www.ucr.ac.cr  -0.88625
www.iisc.ernet.in -0.873
proxy.fm.intel.com -0.848
bgcict.acad.bg  -0.834714286
cc.metu.edu.tr  -0.828571429
www-ribf.riken.go.jp -0.823666667
www.hr  -0.8196
ping.lucas.lu.se -0.814222222
www.dolphinics.no -0.812222222
cni.md  -0.810428571
chapar.ipm.ac.ir -0.799142857
if.usp.br  -0.774
dns.edu.cn  -0.757
csg-ippm.net.wisc.edu -0.737333333
intrans.baku.az  -0.735142857
www.ht.hr  -0.727
nic.uniandes.edu.co -0.674666667
pinger3.cs.waikato.ac.nz -0.6665
ns1.waikato.ac.nz -0.610777778
www.cab.cnea.gov.ar -0.5835
ping.isnet.is  -0.5746
ns2.sri.ucl.ac.be -0.571
julius.ktl.mii.lt -0.540285714
ns.riken.go.jp  -0.5346
gamma.carnet.hr  -0.5162
traceroute.hkt.net -0.515111111
g.root-servers.net -0.511666667
www.itb.ac.id  -0.4884

Comparative study of Ping loss and Sting loss

To substantiate our claims that some of our candidates are rate-limiting, we need to gather data about traffic that is not subject to a similar pattern of limiting. To make this comparison, we simultaneously probed the candidate nodes with Pings and Stings (TCP probes using the Sting package), such that the two processes, Ping and Sting, probed each node in lock-step. The lock-step was achieved by using signals for communication between the two processes: when Sting, which probes with a fixed number of packets and therefore takes a variable amount of time, finishes probing a site, it sends a USR signal to the Ping process. The Ping process, which executes a bunch of 10 pings at a time, waits on receipt of this signal for the current bunch of 10 pings to complete, then starts pinging the next node and sends a different USR signal back to the Sting process. On receipt of this signal the Sting process also starts probing the next node. In this fashion we obtain synchronicity between the Ping and Sting measurements, with Ping slightly over-probing each site on either side of the Sting probe.
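The sketch below illustrates this signal-driven lock-step in Python; the host list and the ping_bunch and sting wrappers are hypothetical stand-ins for the real tools:

```python
import os, signal, subprocess

HOSTS = ["www.example.org", "www.example.net"]   # hypothetical node list

def ping_bunch(host):
    """One bunch of 10 pings (output ignored in this sketch)."""
    subprocess.run(["ping", "-c", "10", host], capture_output=True)

def sting(host):
    """Stand-in for the Sting TCP probe: fixed packet count, variable time."""
    subprocess.run(["sting", host], capture_output=True)

sting_done = False
def on_usr1(signum, frame):
    global sting_done
    sting_done = True

signal.signal(signal.SIGUSR1, on_usr1)           # installed before the fork

pid = os.fork()
if pid == 0:                                     # child: the Sting process
    parent = os.getppid()
    signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGUSR2])
    for host in HOSTS:
        sting(host)
        os.kill(parent, signal.SIGUSR1)          # "done with this node"
        signal.sigwait([signal.SIGUSR2])         # wait for Ping's go-ahead
    os._exit(0)
else:                                            # parent: the Ping process
    for host in HOSTS:
        sting_done = False
        while not sting_done:                    # ping in bunches of 10 until
            ping_bunch(host)                     # Sting finishes this node
        os.kill(pid, signal.SIGUSR2)             # tell Sting to move on
    os.wait()
```

Our findings for the sites that allowed the TCP probe on port 80 to go through are listed below.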

Before discussing the results, we need to introduce a new metric we term Asymmetry.

Asymmetry:

We define the "method asymmetry" as A_method = (loss_ping - loss_sting) / (loss_ping + loss_sting). We expect this to be positive for most sites, since we expect ping performance to be no better than general TCP performance, and worse if ICMP is being rate-limited. The table and plots below show the asymmetry calculated for the candidates culled from our observations, i.e. the sites whose Ping results consistently showed a high slope over the 9-month period.
Table 2: Losses and "method asymmetries" for Ping and Sting
Name of Node  Pings xmt  Pings rcv  Pings lost  Ping loss %  Sting fwd loss  Sting rev loss  Sting overall loss %  Asymmetry
clan2.fit.unimas.my 330 190 140 42.42 0.10% 0.00% 0.100 1.00
ultra.hepi.edu.ge 870 786 84 9.66 0.55% 0.08% 0.627 0.88
www.dolphinics.no 990 980 10 1.01 0.20% 0.00% 0.204 0.66
ns.ucr.ac.cr 1000 987 13 1.30 0.45% 0.06% 0.510 0.44
cab.cnea.gov.ar 1010 1002 8 0.79 0.26% 0.14% 0.400 0.33
tjev.tel.fer.hr 1010 989 21 2.08 1.06% 0.26% 1.318 0.22
lhr.comsats.net.pk 1110 1032 78 7.03 4.65% 0.00% 4.646 0.20
pknt.utm.my 1080 1046 34 3.15 2.00% 0.12% 2.120 0.20
tnp.saha.ernet.in 1120 329 791 70.63 45.97% 8.32% 50.460 0.17
www.jinr.dubna.su 1400 1146 254 18.14 13.02% 0.03% 13.043 0.16
ns.itep.ru 1210 1082 128 10.58 7.76% 0.02% 7.779 0.15
gamma.carnet.hr 1040 1000 40 3.85 2.64% 0.20% 2.835 0.15
intrans.baku.az 1050 1031 19 1.81 1.32% 0.02% 1.342 0.15
ns1b.itb.ac.id 1670 1317 353 21.14 16.06% 2.19% 17.899 0.08
daimon.uniandes.edu.co 1000 978 22 2.20 1.72% 0.36% 2.075 0.03
groa.uct.ac.za 1450 1330 120 8.28 8.28% 0.02% 8.300 0.00
tifr.res.in 1040 827 213 20.48 3.40% 17.81% 20.601 0.00
www.bu.ac.th 1400 1215 185 13.21 10.66% 3.29% 13.598 -0.01
www.usm.my 1050 824 226 21.52 2.51% 22.28% 24.232 -0.06
moon.atomki.hu 990 938 52 5.25 0.18% 5.87% 6.042 -0.07
cni.md 990 982 8 0.81 0.62% 0.42% 1.040 -0.13
sun.ihep.ac.cn 950 938 12 1.26 0.55% 2.11% 2.651 -0.35
ns1.waikato.ac.nz 1560 1482 78 5.00 78.16% -355.21% 0.595 0.79
altair.ihep.su 1710 1253 457 26.73 3.32% -15.02% -11.203 2.44
www.llnl.gov 1000 999 1 0.10 0.02% 0.00% 0.020 0.67
dagobert.lucas.lu.se 1000 998 2 0.20 0.14% 0.00% 0.140 0.18
www.lu.lv 1000 994 6 0.60 0.64% 0.00% 0.640 -0.03
otf1.er.doe.gov 1000 999 1 0.10 0.12% 0.04% 0.160 -0.23
www-ribf.riken.go.jp 1000 999 1 0.10 0.16% 0.00% 0.160 -0.23
dns1.arm.gov 1000 999 1 0.10 0.16% 0.02% 0.180 -0.29
traceroute.hkt.net 1000 997 3 0.30 0.60% 0.02% 0.620 -0.35
fisica.edu.uy 1000 999 1 0.10 0.50% 0.00% 0.500 -0.67
kadri.ut.ee 1060 1057 3 0.28 6.38% 0.08% 6.456 -0.92
eros.cnea.gov.ar 1000 995 5 0.50 0.20% 50.06% 50.159 -0.98
fnal.fnal.gov 750 750 0 0.00 0.16% 0.00% 0.160 -1.00
infosrv1.ctd.ornl.gov 1000 1000 0 0.00 0.08% 0.04% 0.120 -1.00
isgate.isnet.is 1000 1000 0 0.00 0.18% 0.00% 0.180 -1.00
sunsite.wits.ac.za 1000 1000 0 0.00 0.22% -2.02% -1.796 -1.00
w4.lbl.gov 1000 1000 0 0.00 0.14% 0.00% 0.140 -1.00 

Note that in some cases, Sting reports a negative loss. These results are anomalous and have to be discarded.
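For concreteness, here is a minimal sketch of how each asymmetry entry is computed from a pair of loss percentages, discarding the anomalous negative-loss cases:

```python
def method_asymmetry(loss_ping, loss_sting):
    """A_method = (loss_ping - loss_sting) / (loss_ping + loss_sting).
    Losses are percentages; returns None for anomalous inputs
    (negative Sting loss, or both losses zero)."""
    if loss_sting < 0 or loss_ping + loss_sting == 0:
        return None                 # discard, as noted above
    return (loss_ping - loss_sting) / (loss_ping + loss_sting)

# e.g. ns.itep.ru in Table 2: 10.58% ping loss, 7.779% overall Sting loss
print(round(method_asymmetry(10.58, 7.779), 2))   # -> 0.15
```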

The plot below shows the method asymmetries for the selected hosts:


 

The following diagram shows the histogram and cumulative frequency of the asymmetry calculated for these sites, based on Table 2.

The following is a scatter plot of the tail-drop slope calculated for the month of March (the most recent data at that point in time, since the plots were generated in April 2000) versus the method asymmetry from the ping-vs-sting results. The likeliest candidates are those that lie in the bottom right-hand quadrant, and we see that most of the candidates investigated do. The site in Malaysia emerges as a very strong candidate for possible rate-limiting on the basis of the data above.
Tail-drop slope vs. method asymmetry

Higher Statistics Measurements

We attempted to repeat the measurements in order to obtain greater statistical accuracy, with somewhat mixed results, shown below in Tables 3 and 4. Table 3 was calculated by repeating the experiment with the same parameters but over a longer period, including a weekend, while Table 4 was calculated by configuring Sting to probe each site with 1000 TCP packets instead of the default of 100; the results are quite different.

Longer Term Measurement

The following is a much longer trace (about 10 times as long), obtained over the period of a week, including a weekend. A new column is defined here, numTimes, which counts the number of valid Sting responses (a Sting response is considered valid if both the forward and reverse losses are reported as positive numbers). Some sites, like julius.ktl.mii.lt, show a very low number of valid Sting responses, and their results should therefore be treated with less confidence than those based on a larger number of measurements.
Note:
  1. We ignore warnings issued by Sting, including Route Change warnings, since they occur very frequently.
  2. Tables 2 and 3 are based on Stings of 100 packets each, while Table 4 relies on Stings of 1000 packets each. Hence the number of independent Sting measurements is lower for Table 4, even though Tables 3 and 4 are based on about the same amount of data (since 1000-packet Stings take longer to run than 100-packet Stings).
Table 3: Ping and Sting measurements over a longer period
Name pings-snt pings-rcv ping-loss sting-fwd-loss sting-rev-loss sting-loss numTimes asymmetry
dagobert.lucas.lu.se 10390 10056 3.21% 0.16% 0.00% 0.17% 515 0.90
www-ribf.riken.go.jp 10410 10049 3.47% 0.25% 0.04% 0.29% 516 0.84
tifr.res.in 10370 9934 4.20% 0.42% 0.52% 0.93% 504 0.64
isgate.isnet.is 10410 9995 3.99% 0.54% 0.55% 1.09% 509 0.57
main.tirana.al 10550 10142 3.87% 0.77% 0.37% 1.14% 514 0.55
www.dolphinics.no 10030 9597 4.32% 0.24% 1.26% 1.51% 479 0.48
www.iisc.ernet.in 14270 9824 31.16% 10.93% 0.55% 11.42% 428 0.46
julius.ktl.mii.lt 6060 5678 6.30% 1.44% 1.16% 2.58% 68 0.42
concave.cs.wits.ac.za 8850 8496 4.00% 1.43% 0.39% 1.81% 329 0.38
ultra.hepi.edu.ge 9040 8540 5.53% 2.01% 0.71% 2.71% 346 0.34
sunsite.wits.ac.za 10100 9709 3.87% 1.45% 0.49% 1.94% 338 0.33
cni.md 8630 8014 7.14% 1.34% 2.69% 3.99% 341 0.28
cab.cnea.gov.ar 10640 10185 4.28% 1.89% 0.95% 2.82% 502 0.20
lhr.comsats.net.pk 10890 10192 6.41% 2.52% 1.77% 4.25% 488 0.20
pknt.utm.my 10140 9306 8.22% 3.37% 2.96% 6.23% 452 0.14
ns.ucr.ac.cr 10530 9867 6.30% 2.04% 2.91% 4.89% 478 0.13
sci.am 10340 9764 5.57% 2.22% 2.20% 4.37% 427 0.12
clan2.fit.unimas.my 8410 7859 6.55% 3.98% 1.63% 5.54% 374 0.08
sun.ihep.ac.cn 10420 10038 3.67% 0.99% 2.48% 3.45% 497 0.03
solar.uzsci.net 10760 10128 5.87% 5.09% 0.58% 5.64% 467 0.02
if.usp.br 613240 568929 7.23% 6.48% 0.60% 7.04% 510 0.01
traceroute.hkt.net 10960 10519 4.02% 3.28% 1.07% 4.32% 516 -0.04
www.jinr.dubna.su 11490 10800 6.01% 4.65% 2.15% 6.70% 496 -0.05
intrans.baku.az 13030 12366 5.10% 4.57% 1.24% 5.76% 509 -0.06
www.bu.ac.th 10000 9644 3.56% 0.43% 3.96% 4.37% 493 -0.10
fisica.edu.uy 11060 10570 4.43% 2.85% 3.11% 5.87% 495 -0.14
daimon.uniandes.edu.co 16740 15598 6.82% 8.97% 0.77% 9.67% 504 -0.17
ns.itep.ru 12440 11700 5.95% 7.07% 1.67% 8.62% 494 -0.18
tnp.saha.ernet.in 12990 12164 6.36% 5.42% 4.56% 9.73% 500 -0.21
moon.atomki.hu 11840 11075 6.46% 6.47% 4.16% 10.35% 481 -0.23
tjev.tel.fer.hr 9190 8813 4.10% 3.86% 3.49% 7.22% 297 -0.28
www.lu.lv 16270 15082 7.30% 11.81% 1.78% 13.38% 504 -0.29
kadri.ut.ee 12450 11810 5.14% 9.38% 0.41% 9.75% 501 -0.31
ns1b.itb.ac.id 13900 12568 9.58% 15.03% 4.10% 18.51% 337 -0.32
gamma.carnet.hr 9540 9191 3.66% 4.08% 3.49% 7.43% 301 -0.34
groa.uct.ac.za 19720 18249 7.46% 14.06% 2.65% 16.34% 492 -0.37
www.usm.my 11290 10626 5.88% 3.49% 12.41% 15.47% 500 -0.45
altair.ihep.su 15630 14662 6.19% 18.78% 2.09% 20.48% 477 -0.54
eros.cnea.gov.ar 10180 9274 8.90% 1.15% 54.17% 54.70% 470 -0.72 

The graph below shows the asymmetry distribution for the data in Table 3.
Asymmetry distribution over larger set of pings

Larger number of probes per sample measurement

The data in Table 4 below was collected by configuring Sting to send 1000 packets at a time, and the results are quite different from the earlier ones.
Table 4: Ping and Sting measurements for large number of probes
Name pings-snt pings-rcv ping-loss fwd-sting-loss rev-sting-loss total-sting-loss numTimes  asymmetry
concave.cs.wits.ac.za 7970 7797 2.17% 1.39% 0.03% 1.42% 70 0.21
ultra.hepi.edu.ge 7720 7288 5.60% 3.55% 0.88% 4.40% 66 0.12
sunsite.wits.ac.za 10130 9978 1.50% 1.18% 0.03% 1.20% 70 0.11
clan2.fit.unimas.my 13690 12594 8.01% 6.54% 0.00% 6.54% 73 0.10
tjev.tel.fer.hr 11360 10763 5.26% 3.82% 0.61% 4.41% 77 0.09
altair.ihep.su 10600 7631 28.01% 23.71% 0.00% 23.71% 86 0.08
gamma.carnet.hr 10890 10344 5.01% 3.48% 0.83% 4.27% 74 0.08
moon.atomki.hu 11490 10506 8.56% 6.55% 0.97% 7.46% 99 0.07
www.iisc.ernet.in 10960 8955 18.29% 15.70% 0.39% 16.03% 87 0.07
intrans.baku.az 10950 9934 9.28% 6.85% 1.47% 8.22% 94 0.06
lhr.comsats.net.pk 10890 9979 8.37% 5.53% 2.21% 7.62% 95 0.05
ns1b.itb.ac.id 9690 8520 12.07% 10.49% 0.58% 11.01% 81 0.05
www.lu.lv 11700 10025 14.32% 12.10% 1.19% 13.14% 98 0.04
daimon.uniandes.edu.co 10980 9600 12.57% 11.12% 0.52% 11.58% 91 0.04
www.jinr.dubna.su 10670 9472 11.23% 9.91% 0.54% 10.40% 97 0.04
pknt.utm.my 9450 7868 16.74% 13.12% 2.76% 15.52% 80 0.04
groa.uct.ac.za 12210 9790 19.82% 18.78% 0.02% 18.79% 97 0.03
tifr.res.in 21220 21003 1.02% 0.98% 0.00% 0.98% 94 0.02
traceroute.hkt.net 11350 10796 4.88% 4.89% 0.24% 5.12% 99 -0.02
isgate.isnet.is 11180 11011 1.51% 0.81% 0.82% 1.62% 98 -0.03
ns.itep.ru 15290 13450 12.03% 12.97% 0.00% 12.97% 96 -0.04
fisica.edu.uy 7560 7449 1.47% 1.61% 0.00% 1.61% 94 -0.05
ns.ucr.ac.cr 11100 10878 2.00% 1.41% 0.83% 2.23% 100 -0.06
if.usp.br 12510 11649 6.88% 7.49% 0.27% 7.74% 100 -0.06
www.bu.ac.th 9790 9733 0.58% 0.60% 0.14% 0.73% 85 -0.12
cab.cnea.gov.ar 22460 21959 2.23% 3.11% 0.00% 3.11% 94 -0.16
main.tirana.al 21990 21683 1.40% 1.98% 0.00% 1.98% 98 -0.17
sun.ihep.ac.cn 10130 9986 1.42% 3.41% 0.22% 3.63% 92 -0.44
www-ribf.riken.go.jp 11570 11552 0.16% 0.45% 0.02% 0.47% 100 -0.50
kadri.ut.ee 11890 11590 2.52% 8.99% 0.13% 9.11% 98 -0.57
dagobert.lucas.lu.se 11560 11529 0.27% 0.99% 0.00% 0.99% 100 -0.58
eros.cnea.gov.ar 11800 11602 1.68% 1.51% 53.81% 54.50% 97 -0.94
The graph below shows the asymmetries for the data in Table 4.
Asymmetry distribution when Sting consists of 1000 packets per round

Ping Blocking:

We have noticed that some sites do not respond to any pings at all out of the bunch of 10 sent every half hour. This makes it difficult for us to evaluate whether the network is down or whether the site is performing ICMP rate-limiting. Such bunches are omitted from the calculation of packet loss under the assumption that they signify network unavailability.
We decided to test this assumption by subjecting these sites to rounds of Ping vs. Sting. We found strong evidence that, of the 30-odd sites that were losing >= 5% of the Ping request bunches, about 10 might be performing rate-limiting. The full results are shown below:

Results on Port 80

Table 5: Ping and Sting results for port 80
Name  Pings-sent  Pings-recvd  Ping-loss  Fwd-sting-loss  Rev-sting-loss  Overall-sting-loss  NumTimes  Asymmetry
khi.comsats.net.pk 1110 2098 20% 1.9% 4.7% 6.5% 82 0.88
clan2.fit.unimas.my 1400 770 45.5% 0.9% 6.8% 7.66% 40 0.71
concave.cs.wits.ac.za 1700 1630 4.12% 0.8% 1.1% 1.94% 68 0.36
pknt.utm.my 2390 1621 32.18% 16.9% 5.4% 21.4% 76 0.2
if.ufrj.br 2610 2227 14.67% 6.9% 4.4% 10.86% 97 0.15
altair.ihep.su 4370 2950 32.49% 22.9% 5.3% 27.03% 93 0.09
daimon.uniandes.edu.co 3050 2572 15.67% 9.5% 4.3% 13.36% 99 0.08
groa.uct.ac.za 4020 3055 24.00% 14.9% 6.7% 20.63% 97 0.075
probe36.mot.com 2020 1973 2.33% 2.1% 0.0% 2.08% 100 0.056
eros.cnea.gov.ar 2030 2002 1.38% 1.6% 54.0% 54.5% 99 -0.95

Results on Port 7:

Contrast the low asymmetry recorded by altair.ihep.su on port 80 with that on port 7. While this could be a case of statistical error (the data was collected over a different time period), it is also possible that rate-limiting kicked in for the Sting packets on port 80 but not on port 7. More independent measurements would help us arrive at the right answer. The reason that some sites show up on one list but not the other is that they did not respond to Stings on one of the ports, i.e. the port was blocked.
 
Table 6: Ping and Sting results for port 7
Name pings-snt pings-rcv Ping-loss Fwd-sting-loss Rev-sting-loss overall-sting-loss NumTimes  Asymmetry
altair.ihep.su 1000 864 13.60% 1.22% 1.78% 2.98% 98 0.64
ns2.sri.ucl.ac.be 1990 1979 0.55% 0.14% 0.02% 0.16% 99 0.55
julius.ktl.mii.lt 1990 1919 3.57% 0.46% 2.94% 3.39% 100 0.03
pwt.direcpc.com 2000 1985 0.75% 0.73% 0.00% 0.73% 95 0.02

 

A Curious Case:

The site eros.cnea.gov.ar shows very baffling behavior. While its Ping loss is around 1-5%, Sting consistently reports its loss to be close to 54%. This pattern has been validated over a number of independent measurements. Our hypothesis is that the site may have an entry in its access-list which permits all ICMP traffic from our site to pass unimpeded. If true, this implies that even web traffic is heavily rate-limited by this site, and that it could be suffering from under-provisioning. The other explanation is that the monitored network is different from the actual network, i.e. that our ICMP requests are probing a different network from the one which responds to Sting requests. We await the validation or otherwise of these hypotheses, but there is no gainsaying that the behavior of this particular site in Argentina is at once striking and extremely interesting.

Comparison of Ping vs. SYN/ACK

We next compared pings with synack for the candidate sites. Synack establishes a client-server TCP connection using the 3-way handshake (SYN, SYN/ACK, ACK) of the TCP protocol, records the round trip time (RTT) or loss, and closes (FIN) the connection. For more on this see Measuring RTT by using SYN/ACKs instead of Pings. To use synack, we had to find ports that were available to open a TCP session on the candidate hosts. For each of the candidate hosts we tried using synack to open a TCP session to TCP port numbers 80 (www), 25 (smtp), 21 (ftp control), 53 (domain), 22 (ssh), 23 (telnet), 7 (TCP echo), 9 (discard) and 13 (daytime). We were able to find open TCP ports for 44 of the initial 56 candidates. During the tests (which extended over a week), we received one complaint from an administrator of a host whose telnet port we were probing: "We see a lot of tries of telnet connections on a machine here, on which there is nothing interesting for you, and anyway on which you are not allowed to connect. We would appreciate that you take the measures to stop these tries." We removed this host from the list of candidate hosts, and also restricted further monitoring to ports 80, 25 and 21 on the other hosts, apart from one case where we knew the administrator. The results we had already gathered indicated that the pings and synacks tracked one another very well for the host that we removed following the administrator's complaint.

For each candidate with an open port, we used oceanus.slac.stanford.edu (a Sun Ultra 5_10 running SunOS 5.6) to send one ping (we used the NIKHEF ping since it allowed us to set the timeout) followed by one synack, and recorded the round trip time (RTT) or loss. Each measurement was also timestamped. In both cases the timeout was set to 10 seconds. When the list of hosts to test was completed, the script slept for a random delay before repeating the cycle. The sleep delay was typically 209 seconds + epsilon, where epsilon was a random number selected from a flat distribution between 0 and 199 seconds. Each cycle through the list took about 8 minutes and was typically repeated 20-30 times. We repeated these sets of measurement cycles with different sets of hosts at various times between Thursday June 15, 2000 16:21:44 PDT and August 13, 2000, each time appending the new measurements to those measured previously. We also extended the measurements to cover the 76 PingER Beacon sites. There was some overlap between the hosts chosen above and the Beacons. The total number of hosts for which we compared pings with synack was about 111.
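A rough sketch of one such measurement cycle is shown below; it uses an ordinary TCP connect() as a stand-in for synack's packet-level SYN/SYN-ACK exchange, and the host/port list is illustrative:

```python
# Sketch of the ping-then-synack measurement cycle (illustrative hosts;
# connect() stands in for the real synack tool's raw SYN/SYN-ACK probe).
import random, socket, subprocess, time

HOSTS = [("www.example.org", 80)]                # hypothetical host/port list

def synack_rtt(host, port, timeout=10.0):
    """Time a TCP 3-way handshake, then close the connection (FIN).
    Returns the RTT in seconds, or None on loss/timeout."""
    t0 = time.time()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.time() - t0
    except OSError:
        return None

def ping_ok(host):
    """One ping with a 10 s timeout; True if an echo reply arrived.
    (The timeout flag varies between ping implementations.)"""
    r = subprocess.run(["ping", "-c", "1", "-W", "10", host],
                       capture_output=True)
    return r.returncode == 0

for _cycle in range(20):                         # typically 20-30 cycles
    for host, port in HOSTS:
        stamp = time.time()                      # timestamp each measurement
        print(stamp, host, ping_ok(host), synack_rtt(host, port))
    # sleep 209 s plus a flat random 0-199 s before repeating the cycle
    time.sleep(209 + random.uniform(0, 199))
```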

The recorded data was then analyzed to generate RTT time series, and synack loss asymmetries:

A_synack_loss = (loss_ping - loss_synack) / (loss_ping + loss_synack).
The resultant time series and statistical tables indicate the following (note that losses in the RTT time series are shown as zero RTT): the average across the average RTTs (if the ping RTT was > 2 seconds we treated the packet as lost, to enable comparison with the synack measurements) for the 37 hosts with similar time series for ping and synack differs by 1.7 msec (ping slower), and the medians of the averages differ by about 10 msec (ping slower).

The minimum synack RTT asymmetry:

A_synack_RTT = (RTT_ping - RTT_synack) / (RTT_ping + RTT_synack)
was -0.21, and the maximum was 0.11. The average synack RTT asymmetry was -0.0045 and the median was 0.0023. Plots of the RTT differences and asymmetries by host are shown below. It is seen that the sign of the asymmetry varies from host to host, and there are more hosts where the RTT for ping is greater than the RTT for synack (positive asymmetry).

Plots of the loss differences and asymmetries by host are shown below. Again the sign of the asymmetry varies from host to host, and there are more hosts with heavier loss on ping than on synack. The average (median) loss for pings is 12.5% (6.5%) and for synack is 10.8% (4.5%). The Inter-Quartile Range (IQR) for ping loss is 13.6% and for synack is 11.3%. The average (median) synack loss asymmetry is 0.058 (0.000) and the IQR is 0.27.

Comparison of 100 Byte vs 1000 Byte Pings:

We would expect that if pings are being rate-limited, then 1000 byte pings should trigger rate-limiting more often than 100 byte pings. We assume here the absence of any mechanism whereby routers take in the packet and store it with a reduced size; instead we assume that 1000 byte pings may contend for router buffer space and therefore may be dropped more frequently, even for sites that are not rate-limiting pings. Hence we seek to show, first, that on average 1000 byte pings exhibit a higher packet loss than 100 byte pings, and second, that our selected candidates exhibit a HIGHER loss asymmetry A_size = (loss_1000B - loss_100B) / (loss_1000B + loss_100B) (where loss is the median monthly loss measured by PingER for the monitor (SLAC) to monitored-site pair) than the average for all the data.

To investigate this we took the monthly median ping packet losses measured by PingER from SLAC to about 200 sites from January 1998 through June 2000.

Loss Asymmetry measured from SLAC to all sites.

The Probability Distribution Functions (PDF) of the 100Byte (blue) and 1000Byte (red) monthly median packet loss are shown in the plot below, together with the Cumulative Distribution Functions. We use the PDF instead of the raw frequency to simplify comparing the two distributions, which were taken with different numbers of samples. The plots show little difference in the shapes of the distributions. Looking in more detail at the various statistical metrics for the distributions (see Table 7 below), the median losses for 1000 Byte and 100 Byte pings differ by about 0.38% (1.27% - 0.89%).
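The normalization step can be sketched as follows (numpy assumed; the sample counts and means are taken from Table 7, but the loss values here are synthetic):

```python
# Sketch: histogram normalized to a PDF so that two distributions with
# different numbers of samples can be compared directly; CDF follows.
import numpy as np

def pdf_cdf(losses, bins):
    pdf, edges = np.histogram(losses, bins=bins, density=True)
    cdf = np.cumsum(pdf * np.diff(edges))        # cumulative distribution
    return pdf, cdf

bins = np.linspace(0, 70, 71)                    # 1%-wide loss bins, 0-70%
# Synthetic stand-ins with the sample counts and means from Table 7:
loss_100B = np.random.exponential(2.95, 4845)
loss_1000B = np.random.exponential(3.61, 4769)
pdf100, cdf100 = pdf_cdf(loss_100B, bins)
pdf1000, cdf1000 = pdf_cdf(loss_1000B, bins)
```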

If we select just the rate limiting candidates with slope < -0.7 (and eliminate the anomalous .gov sites) we have 47 such sites. The loss distributions for these Limit Candidates are shown in the plot below. In this case the 100 Byte distribution has a markedly greater frequency of low losses (< 2%). The difference in the medians in this case is 3.3%.

The relevant statistics for the above plots are shown in Table 7 below.
Table 7: Statistics for 100Byte and 1000Byte loss distributions
Metric All 100Byte losses All 1000Byte losses Rate Limit Candidates 100Byte losses Rate Limit Candidates 1000Byte losses
Samples 4845 4769 611 520
Average 2.95 3.61 7.26 10.58
Std Dev 5.06 6.1 7.53 9.91
Median 0.89 1.27 4.79 7.67
25 percentile 0.208 0.3 1.84 3.8
75 percentile 3.33 4.3 10.32 14.11
90 percentile 8.28 9.95 18.09 22.1
95 percentile 13.25 14.80 22.13 28.76
99 percentile 24.77 20.42 33.17 52.09
99.5 percentile 29.11 38.63 36.35 58.42
Maximum 55.50 68.82 49.83 68.82
The Size Asymmetry distributions for the Rate Limiting Candidates and for the rest of the data (excluding the rate limiting candidates) for Jul-99 through Jun-00 are shown below. The PDFs are seen to have similar shapes, each with a main peak at about +0.05 asymmetry. Reviewing the distributions' statistics, shown below in Table 8, it is seen that the limit candidate distribution has a higher median. The biggest difference is in the 25 percentiles (-0.032 for the rest vs. 0.0089 for the candidates), which is also evident in the bump in the rest distribution between asymmetries of -0.2 and -0.1.
Table 8: Statistics for Size Asymmetry distributions for all sites and for Rate Limit Candidate sites
Metric All Sites Limit Candidate Sites
Samples 1863 372
Average 0.111 0.146
Std Dev 0.26 0.22
Median 0.064 0.082
25 percentile -0.032 0.0089
75 percentile 0.241 0.245
90 percentile 0.502 0.448
95 percentile 0.613 0.555
99 percentile 0.754 0.838
99.5 percentile 0.828 0.891
Minimum -0.871 -0.462
Maximum 0.923 0.924
IQR 0.272 0.236

Summary

We have looked at 4 possible methods of identifying which of about 200 remote hosts, seen from SLAC, may experience ICMP rate limiting of pings. All 4 methods look at packet losses. The methods are:
  1. measuring the slope of the loss as a function of the sequence number in a series of 10 consecutive pings (tail-drop slopes);
  2. comparing ping with a non-ICMP based mechanism (sting) for measuring packet loss;
  3. comparing ping with a second non-ICMP based mechanism (synack) for measuring packet loss;
  4. comparing the ping losses for small (100Byte) and larger (1000Byte) packets.
Using the tail-drop slope mechanism we identified about 45 possible candidates out of 200 sites by selecting those with large tail-drop slopes. Most of these candidates are sites with relatively poor Internet connectivity, typically located in the former Soviet Union, Eastern Europe, S. E. Asia, Latin America and South Africa. We then compared the candidate sites against all the sites (including candidates) using methods 2 and 3.

The early results have been somewhat mixed in identifying which sites actually carry out rate-limiting. We ran into problems with sting pathologies, for example negative packet losses, which reduced the number of candidate sites for which sting could be used to compare losses with ping. Also, for both of the TCP-based (non-ICMP) mechanisms, sting and synack, we were unable to find a suitable open port for about 20% of the candidate sites. In addition, probing such ports can be misinterpreted as a host scan and raise security alerts. Using sting we were able to identify 4 sites which had high tail-drop slopes and anomalously high method (ping vs. sting) asymmetries.

One would imagine that opening a session to a port, as is done by synack, would take more effort (a longer code path, etc.) than responding to a simple ping. Surprisingly, however, it appears that on average the response (RTT) for ping is slower than for synack, though the difference varies from host to host and 41% of hosts had ping RTTs < synack RTTs. The ping losses also appear to be larger on average than the synack losses, which might be an indication of ping rate limiting. Again, however, there is a lot of variability from host to host, and about 39% of the hosts have lower ping losses than synack losses.

As was expected, ping losses were greater (by 46% for all the data and 60% for the rate limiting candidates) for 1000Byte pings compared to 100Byte pings. However, the packet size asymmetries for the rate limiting candidates were only slightly different from those for the rest of the data.

We also compared the results from the 4 methods to see how well they correlated with one another. To do this we made scatter plots of the sting and size loss asymmetries, and of the slopes, versus the synack asymmetries. The results are shown below. It can be seen that there is a weak positive correlation (R^2 ~ 0.2) between the synack and sting asymmetries, but no noticeable correlation between the synack asymmetries and the slopes or the size asymmetries. Since comparing ping losses with those from other non-ICMP methods such as synack and sting seems more likely to directly identify pathologies that preferentially affect pings, the lack of correlation between synack and the tail-drop slopes or the size asymmetries does not bode well for identifying rate limiting by examining existing PingER data for tail-drop slopes or for the difference in losses between 100Byte and 1000Byte pings.

Unfortunately, neither Surveyor nor RIPE monitors are situated at many, if any, of the candidate sites, so we cannot use Surveyor or RIPE loss measurements to validate the rate limiting conjectures.

