IEPM

ICMP Rate Limiting

Part 1 of 2

Executive Summary

Part 1: Introduction, Tools and Techniques




Introduction

One concern raised with ICMP echo is the possibility of Internet Service Providers (ISPs) or sites or even hosts rate limiting (including complete blocking) ICMP echo and thus giving rise to invalid packet loss measurements. Some sites simply block pings, others limit the amount of traffic that is accepted from pings, which can be especially bad for pings with large packet sizes. The net effect is that pings can get blocked so packet loss can look bad even though the network is OK.

We first noticed evidence of blocking/limiting in 1996. More recently we observed that of 76 IEPM/PingER Beacon sites that were responding to pings in June 2000, by December 2000 2 (or about 2.6% in 6 months) of those sites were no longer responding to pings, even though they still responded to other probes. In addition 5 (or 7 including the 2 Beacon hosts, or around 3%) of about 240 hosts being pinged from SLAC no longer responded to pings. Some of the blocking/limiting was prompted by defending against denial of service attacks such as smurf or ping o' death attacks, or security scans. Ping O' death is an OS problem not an ICMP problem and requires an update to the OS to fix the problem. Smurf attacks can be blocked by turning off directed broadcasts and blocking IP addresses *.255 and *.0. Also see Minimizing the Effects of DoS Attacks from Juniper.

Even if one blocks directed broadcast, a cracker can still individually ping each address in a subnet and use the responses to document the existence of hosts, which can then be scanned to see what ports respond on each host. ICMP has also been used to control rooted machines behind firewalls (loki, stacheldraht, trin00). It is also likely that attackers will find a way to similarly (mis)use any other kind of packet that most sites will allow through their network perimeter (if any other packet could replace the ICMP packets used for ping and friends).

Note also that the Internet Host Requirements (RFC 1122) state: "Every host MUST implement an ICMP Echo server function that receives Echo Requests and sends corresponding Echo Replies." However, even if implemented in the code, the facility could be turned off or blocked at a router. At least Cisco IOS 12.0 defaults directed broadcasts to the "off" position, the current "Requirements for IPv4 Routers" document notwithstanding. However, it is a struggle to educate admins on how to properly handle ICMP, and it may be too late anyway.

Given all the above, some security experts recommend blocking pings. In particular, the System Administration, Networking and Security Institute (SANS) recently recommended that, to maximize computer security, system admins should block ICMP among other measures: "ICMP -- block incoming echo request (ping and Windows traceroute), block outgoing echo replies, time exceeded and host unreachable messages". Taken at face value this recommendation will also block Path MTU Discovery. It is also contrary to the recommendations of RFC 1435. Thus it should be amended to at least let ICMP can't fragment (type 3, code 4) messages through, which should make the Path MTU Discovery problem disappear. After this was pointed out, a later version exempted Path MTU Discovery from blocking. There still remains the major problem that the 2 most widely used, deployed and understood Internet troubleshooting tools, ping and traceroute, will no longer work. Possibly worse, they will only partially work and give false results (e.g. a node/site is reported down by ping, but is really working fine apart from responding to ping). The paper ICMP Packet Filtering provides a guideline for the filtering of ICMP messages. A paper by Ofir Arkin deals with plain Host Detection techniques, Host Detection techniques using ICMP error messages generated from probed hosts, Inverse Mapping, Trace routing, OS fingerprinting methods with ICMP, and which ICMP traffic should be filtered on a Filtering Device.

We have identified some kind of ping shaping at about 2% of the remote PingER sites (e.g. a Nordunet site, 2 Israeli ILAN sites (which by default limit ping traffic to < 10kbps per site), and educational sites in Singapore). In some cases we have worked with the site to allow our traffic through (e.g. Israel). Sometimes we can reduce our traffic to avoid shaping, or alternatively we can increase traffic to help identify shaping. Usually it has been fairly apparent (e.g. one sees a change in behavior from the previous baseline, see for example The median packet loss seen from nbi.dk). Note that since we select the remote sites that we monitor and try to have a contact at each remote site, by working with the site contacts we avoid many of the problems that we would have if the sites/hosts were chosen randomly.

To avoid shaping etc. one needs to make the monitoring traffic look like any other traffic so it is never blocked or treated differently, else it will run into the same problem as ICMP. This probably rules out changing over to using UDP or TCP echo. This is exactly what motivated the development of "Type P" packets in Framework for IP Performance Metrics (RFC 2330), ensuring that there is leeway to do this and awareness that different types of packets can indeed be treated very differently. As these metrics are defined and implemented we can start to use them instead of ping. One effort in this direction is IPMP.

Tail-Drop Behavior

In order to detect rate-limiting, we analyzed the frequency of packet drops as a function of the packet number within each ping sequence. Without rate-limiting, we expected to see a statistically flat distribution, but noticed that some sites exhibited a noticeable slope. This would be consistent with tail-drop rate-limiting of ICMP packets. Prime examples are:
argentina-packetdrop.gif
histogram-aug-99.gif

We realize that with RED (Random Early Detection) being implemented as a limiting mechanism in many of the current routers, the tail-drop scenario would not occur; instead the distribution would look uniform. We took the Cisco Committed Access Rate (CAR) as an example and concluded that the router would exhibit tail-drop behavior if the extended burst limit was set equal to the normal burst limit. In practice a number of sites exhibit this sort of behavior, and to ascertain whether its cause is ICMP rate-limiting we have requested these sites to allow us to measure TCP traffic with the help of mechanisms such as SYN/ACK and Sting.
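The frequency analysis described in this section can be sketched as follows. This is only an illustrative sketch, not the analysis code used for the plots above: the input layout (one list of lost/not-lost flags per ping sequence) and the function names are assumptions. It counts drops per packet position and fits a least-squares slope, which should be near zero for a flat distribution and positive when later packets in a sequence are dropped more often.

```python
def drops_by_position(sequences):
    """Count how often each packet position was lost.

    `sequences` is a hypothetical input format: one list per ping
    sequence, with True at index i if packet number i+1 was lost.
    All sequences are assumed to have the same length.
    """
    counts = [0] * len(sequences[0])
    for seq in sequences:
        for i, lost in enumerate(seq):
            if lost:
                counts[i] += 1
    return counts


def least_squares_slope(counts):
    """Slope of drop count versus packet position (ordinary least squares)."""
    n = len(counts)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxy = sum((i - mean_x) * (c - mean_y) for i, c in enumerate(counts))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


# Made-up example: losses concentrated at the end of each sequence.
example = [
    [False, False, True],
    [False, True, True],
    [False, False, True],
]
counts = drops_by_position(example)
slope = least_squares_slope(counts)
```

A slope alone does not prove rate limiting; with few losses per position the fitted slope is noisy, so it is best read alongside the histograms.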

Dealing with CAR

If the normal burst limit and the extended burst limit were different then, in the absence of interference from other ICMP traffic, each packet would return whatever tokens it had borrowed from the extended burst bucket, if any, before the next packet came in. What would cause a packet drop in such a scenario would be the "Compounded Mechanism" that CAR uses: it keeps a running sum of the total number of tokens borrowed from the extended burst bucket. The sum is not decremented even when the tokens are subsequently returned. Consequently the sum continues to grow at a uniform rate with every packet until it crosses the preset boundary, whereupon a packet is dropped and the sum is reset to zero. Were this algorithm to operate under our assumption of lack of significant interference, we should be able to detect a regular pattern in the packet numbers of dropped packets, i.e. for some integer n, every nth packet should be dropped. Note that this is specific to ICMP traffic that probes once every time period > Waiting-time (the time a packet spends inside the router while it is waiting to be sent), so that the previous tokens have been returned before the next packet has come in.
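The reasoning above can be illustrated with a toy model. This is a simplified sketch of the paragraph's description only, not Cisco's actual CAR implementation: it assumes every packet borrows the same number of tokens, that there is no competing traffic, and the parameter names and values are hypothetical.

```python
def simulate_compounded_drops(num_packets, borrow_per_packet, extended_limit):
    """Return the 1-based packet numbers dropped under the simplified model.

    Each packet adds `borrow_per_packet` to a running (compounded) sum that
    is never decremented. When the sum exceeds `extended_limit`, the packet
    is dropped and the sum is reset to zero.
    """
    compounded = 0
    dropped = []
    for packet_number in range(1, num_packets + 1):
        compounded += borrow_per_packet
        if compounded > extended_limit:
            dropped.append(packet_number)
            compounded = 0
    return dropped


# Hypothetical values: 100 tokens borrowed per packet, boundary at 450.
drops = simulate_compounded_drops(20, 100, 450)
print(drops)  # [5, 10, 15, 20]
```

In this model the drop period n depends only on the ratio of the boundary to the per-packet borrowing, which is why a probe sequence shorter than n packets would never observe a drop.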

In practice this line of reasoning has not shown any results. The possible reasons are the following:

  1. Perhaps most of the monitored sites do not use CAR in site-specific mode, so our assumption of non-interference is wrong, and hence the number n would be random, reflecting present traffic conditions. This would make the packet loss indistinguishable from a uniform distribution.
  2. Currently we probe with 10 packets per host every half-hour. This might be less than the number of tokens that are available and could be falling short of provoking a response. In order to investigate this, we need to probe with longer sequences and variable sized packets.

We will add the ability for PingER to use other methods than ping for injecting traffic (e.g. get a web page, open a TCP session and close it). Comparing the results of these other methods with ping will enable us to look for rate limiting. In a future release of PingER we also plan to allow the size of the ping packet and the number sent to be configurable. This may enable monitoring sites to avoid some of the rate limiting problems.

We are also comparing PingER results with Surveyor results to help further identify ICMP shaping. Another possibility to identify rate limiting is to compare large packet pings versus small packet pings, or to compare the losses for the first pings with those for later pings in a sequence of 10 pings. The idea is that the limiting kicks in after the first few packets are seen, so ensuing packets are more likely to be dropped (something similar to this is mentioned in "The End-to-end effects of Internet path selection").
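One way to make such a comparison quantitative is a two-proportion z statistic on the two loss rates (early versus late pings, or small versus large packets). The sketch below is an assumption about how one might do this, not a description of PingER; the function name and the counts in the example are made up.

```python
import math


def loss_difference_z(lost_a, sent_a, lost_b, sent_b):
    """Two-proportion z statistic for loss rate of group B minus group A.

    A large positive value suggests group B (e.g. later pings in a
    sequence, or large packets) is lost more often than group A.
    """
    p_a = lost_a / sent_a
    p_b = lost_b / sent_b
    pooled = (lost_a + lost_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        return 0.0  # no losses at all (or all lost): nothing to compare
    return (p_b - p_a) / std_err


# Made-up counts: first 5 pings of each sequence vs. the last 5.
z = loss_difference_z(lost_a=10, sent_a=1000, lost_b=40, sent_b=1000)
```

The statistic assumes independent losses, which bursty loss violates, so it should be treated as a screening aid rather than a formal test.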

Examples of working with sites

An Israeli ISP (ILAN) shaped ICMP traffic to only allow 10kbps of ICMP traffic per university. We worked with the ILAN people who were most helpful in providing filters to let ping traffic through from our monitoring hosts.

The University of Cincinnati in the Fall of 1997 blocked pings altogether to avoid problems with "Ping O' death" attacks (overlong ping packets causing OS crashes). We worked with the UCinn users at SLAC and the network folks at UCinn and after a month or so the blocking was lifted.

Singapore University only responds to 56 byte pings. We believe this is to prevent "Ping O' death" attacks and reduce the possibility of abuse. We have contacted the Singapore University people but have not successfully resolved this.

Measuring RTT by using SYN/ACKs instead of Pings

We have written a program to measure the RTT (and loss) by using the TCP SYN/ACK mechanism for opening a connection. The TCP SYN/ACK mechanism is as follows. A TCP connection is established by a 3-way handshake. The host initiates it by sending a segment containing a SYN, the host's ISN (Initial Sequence Number) and port number. The target responds with an ACK for the host's ISN and also sends the host its own ISN and port number. In the third stage, the host responds by acknowledging the target's ISN. The connection becomes "established" when sequence numbers have been synchronized in both directions. The SYN/ACK program issues a connection request with a SYN and measures the time taken by the target to respond with an ACK. The connection is promptly cleared by another exchange of packets, this time containing the FIN control flag.

In order to truly measure Web traffic, which is almost entirely TCP/IP traffic, it is best to probe using TCP/IP rather than ICMP since this is most likely to defeat protocol-based limiting. This is where the SYN/ACK mechanism proves useful. On almost all OSes, there is only a limited supply of concurrent sockets. The immediate sending of a FIN closes that TCP connection and thus alleviates the problem of blocked sockets.
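The idea can be approximated from user space without crafting packets. The sketch below is not the SYN/ACK program described above; it assumes that timing the operating system's connect() call is an acceptable stand-in, since connect() on a blocking TCP socket returns once the target's SYN/ACK has arrived. The function name and default timeout are hypothetical.

```python
import socket
import time


def syn_ack_rtt(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure.

    Pass an IP address rather than a name so that DNS lookup time is
    not included. A timeout or a refused connection is reported as
    None (counted as a loss in this sketch).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    start = time.perf_counter()
    try:
        sock.connect((host, port))
        return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None
    finally:
        # Closing immediately sends the FIN and frees the socket.
        sock.close()
```

For example, `syn_ack_rtt("192.0.2.10", 80)` would time a connection to a web server at that (documentation-only) address. The measured time includes local kernel overhead, so it is an upper bound on the network RTT.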

To compare the results of using SYN/ACK versus ping we wrote a program that measures the Round Trip Time (RTT) by using SYN/ACK and then repeats the measurement with ping. It then waits 1 second and repeats the pair of measurements. This was repeated for 30,000 samples between December 20 10:06 and December 21 05:00. The measuring host was located at SLAC (oceanus.slac.stanford.edu, a Sun Ultra 5 running Solaris (SunOS 5.6)) and the target was a host at Daresbury Lab. in Cheshire, England (icfamon.dl.ac.uk). A time series of the RTTs measured by the two methods is shown below.
dl-time.gif

The ping RTTs are shown in magenta and the SYN/ACK RTTs in dark blue. Note the RTTs are plotted on 2 different y-axes to separate the ping points from the SYN/ACK points and make them easier to distinguish. By visual inspection it appears there is more variation in the ping measurements. The table below shows some of the relevant statistical measures of the distributions.
Metric                               Ping            SYN/ACK
Samples                              30000           30000
Average                              161.6 ms        158.0 ms
Standard Deviation                   33.0 ms         11.6 ms
Median                               154.4 ms        153.0 ms
25th percentile                      153.2 ms        153 ms
75th percentile                      163.8 ms        160 ms
95th percentile                      175.9 ms        174 ms
Inter quartile range (IQR)           10.6 ms         8 ms
Minimum                              151 ms          150 ms
Maximum                              1222 ms         610 ms
Lost packets                         528 (1.76%)     469 (1.56%)
Standard Deviation on Lost packets   23              31
The larger values for ping of the standard deviation and the Inter Quartile Range (IQR) confirm the increased variation seen in the time series plot. The table also indicates that the RTT measured by ping is a few milliseconds longer than that measured by SYN/ACK (3.6 ms in the average, 1.4 ms in the median). This difference in the RTTs is more easily seen in the frequency distributions, seen below.

The frequency distributions and Cumulative Distribution Functions (CDFs) are seen to track one another very well, with a 1 to 2 ms offset. The frequency distributions are multimodal with peaks around 154 ms, 167 ms and 189 ms. We have not studied the multi-modality of these distributions in detail. Such behavior is not unusual, see for example High statistics ping results. In some cases it may be caused by having two similar routes available (e.g. to provide load balancing). For more on the multi-modality see the section below on SYN/ACK vs. SYN/ACK & Ping vs. Ping.

If one calculates the square of the correlation coefficient, R², between the two frequency distributions, and repeats this as one "slews" the RTT offset of the pings with respect to the SYN/ACKs, then one gets the plot shown below.
dl-r2.gif
It can be seen that there is a sharp peak in R² close to a 1 ms slew. The R² value at the peak of around 0.94 indicates a strong correlation between the 2 frequency distributions.
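The slewing calculation can be sketched as follows. This is an illustration under stated assumptions rather than the code used to produce the plot: it assumes 1 ms histogram bins, integer-millisecond bin edges, and a shift applied to the ping RTTs before binning; all function names are hypothetical.

```python
from collections import Counter


def histogram(rtts_ms, lo, hi):
    """Count samples per 1 ms bin over the integer range [lo, hi)."""
    counts = Counter(int(r) for r in rtts_ms if lo <= r < hi)
    return [counts.get(b, 0) for b in range(lo, hi)]


def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant histogram has no defined correlation
    return sxy * sxy / (sxx * syy)


def r2_versus_slew(ping_ms, syn_ms, lo, hi, slews_ms):
    """Map each slew (ms subtracted from ping RTTs) to the resulting R^2."""
    syn_hist = histogram(syn_ms, lo, hi)
    result = {}
    for slew in slews_ms:
        shifted = [r - slew for r in ping_ms]
        result[slew] = r_squared(histogram(shifted, lo, hi), syn_hist)
    return result


# Made-up data: ping RTTs are the SYN/ACK RTTs plus exactly 1 ms.
syn = [150, 150, 151, 152, 152, 152]
ping = [r + 1 for r in syn]
r2 = r2_versus_slew(ping, syn, 145, 160, [0, 1, 2])
best_slew = max(r2, key=r2.get)
```

The slew that maximizes R² estimates the systematic offset between the two methods; the choice of bin width affects how sharp the peak appears.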

Looking at the scatter-plot (below) of the ping RTT versus the SYN/ACK RTT on the other hand shows no strong correlation between the pairs of values.
dl-scatter.gif
The difference between the number of lost packets for pings and the number for SYN/ACKs is about 1.4 standard deviations. The number of occurrences of both a ping and a SYN/ACK being lost from the same pair in the 30000 sample pairs is 16 or 0.05%. This is about double what would be expected (0.028%, the product of the two loss rates in the table) if the loss of a ping packet was independent of the loss of a SYN/ACK packet.
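The independence check is simple arithmetic: if the two losses were independent, the probability of losing both members of a pair would be the product of the two individual loss probabilities. The short sketch below applies this to the counts in the table above (528 ping losses and 469 SYN/ACK losses in 30000 pairs, with 16 pairs losing both); the function name is hypothetical.

```python
def expected_joint_loss(lost_a, lost_b, pairs):
    """Expected fraction of pairs losing both probes if losses are independent."""
    return (lost_a / pairs) * (lost_b / pairs)


pairs = 30000
expected = expected_joint_loss(528, 469, pairs)  # product of the two loss rates
observed = 16 / pairs                            # pairs where both were lost
ratio = observed / expected

print(f"expected {expected:.4%}, observed {observed:.4%}, ratio {ratio:.2f}")
```

With only 16 joint losses the ratio carries a large statistical uncertainty, so it indicates at most a weak tendency for losses to coincide.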

We conclude that though there is strong agreement between the overall losses and distributions of RTT measured by ping and SYN/ACK measured at one second intervals over an 8 hour time frame, there is little short range (order of < second) correlation between pairs of losses or RTT values measured by ping and SYN/ACK within a few hundred milliseconds of each other.

SYN/ACK vs. SYN/ACK & Ping vs. Ping

We repeated the above measurement for 1000 samples between 14:11 and 14:48 on December 22, but with both members of each pair of measurements being SYN/ACKs, and then again with both members being pings. The purpose of this was to see if the lack of short range correlation also existed between consecutive SYN/ACKs and between consecutive pings. The scatter plots of the RTTs for consecutive SYN/ACKs and for consecutive pings are seen below and indicate little correlation (R² ~ 0.0014). Thus we conclude that the lack of correlation between the members of the pairs of measurements occurs for identical measurement methods, and is not due to a difference (ping vs SYN/ACK) in the measurement method.
dl-scatter-syn-syn.gif
dl-scatter-ping-ping.gif
The distributions show strong bi-modality and looking in detail at the time series below, the switching between the two different SYN/ACK RTTs is clearly apparent. Similar effects are seen for pings. This may be due to different paths (e.g. for load balancing) being available which can be switched between in a sub-second interval. We are investigating the cause of the bi-modality in more detail.
dl-time-syn-syn.gif
The distributions for the consecutive SYN/ACKs and pings are shown below.
dl-hist-syn-syn.gif
dl-hist-ping-ping.gif

Analysis of Bimodality

Pings from oceanus.slac.stanford.edu to 2 sites in the UK that are fairly distant from each other, icfamon.dl.ac.uk and icfamon.rl.ac.uk, revealed bimodal distributions. We investigated the various links on the common path, found via traceroute.

This common path was :
1) RTR-CORE1.SLAC.Stanford.EDU (134.79.199.2)
2) RTR-CGB6.SLAC.Stanford.EDU (134.79.135.6)
3) RTR-DMZ.SLAC.Stanford.EDU (134.79.111.4)
4) ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.18)
5) nynap1-atms.es.net (134.55.24.9)
6) gin-nyy-bbl.teleglobe.net (192.157.69.33)
7) if-1-0-0.bb5.NewYork.Teleglobe.net (207.45.223.85)
8) if-0-0.core1.NewYork.Teleglobe.net (207.45.221.97)
9) ix-5-3.core1.NewYork.Teleglobe.net (207.45.202.30)
10) 193.62.157.13 (193.62.157.13)
11) external-gw.ja.net (128.86.1.40)

Our initial suspicion that the bimodality was caused by load-balancing between 2 different transatlantic links proved incorrect when, after pinging 193.62.157.13 and 207.45.202.30, we noticed a bimodal distribution on both sides of the transatlantic link.

Subsequent ping-generated data analysis revealed the following results for 2 consecutive links :
histogram-nynap-rtt.gif

histogram-teleglobe-rtt.gif

From the histograms of the Round-Trip-Times to the 2 nodes above, it is clear that bimodality is not introduced until the gin-nyy-bbl.Teleglobe.net link. While we are presently unsure of the basis for this behaviour, we are investigating various hypotheses to explain it.
 

 Part II : Detailed Study and List of Candidates




Created December 23, 1999
URL: http://www-iepm.slac.stanford.edu/pinger/tools.html
Comments to iepm-l@slac.stanford.edu