IEPM

Outage frequencies between SLAC, FNAL, CMU and CERN

Supported by DOE MICS

The plots on this page show the outage duration frequencies between SLAC, FNAL, CMU and CERN. An outage duration is measured from the Surveyor one-way probes as the length of time over which consecutive probes fail to get through. The Surveyor probes are launched on average 2 times per second, following a Poisson distribution.
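The measurement described above can be sketched as follows. This is a minimal illustration, not the Surveyor analysis code: the function name, the input layout (a list of send times plus a list of received flags), and the convention that an outage lasts from the first lost probe to the next probe that arrives are all assumptions.

```python
from typing import List, Sequence


def outage_durations(send_times: Sequence[float],
                     received: Sequence[bool]) -> List[float]:
    """Return the length in seconds of each run of consecutive lost probes.

    Assumed convention: an outage starts at the send time of the first lost
    probe in a run and ends at the send time of the next probe that arrives.
    """
    durations: List[float] = []
    run_start = None  # send time of the first lost probe in the current run
    for t, ok in zip(send_times, received):
        if not ok and run_start is None:
            run_start = t                    # a new run of losses begins
        elif ok and run_start is not None:
            durations.append(t - run_start)  # run closed by a received probe
            run_start = None
    # A run still open at the end of the data is dropped: its length is unknown.
    return durations


# Hypothetical probe record, evenly spaced here only to keep the example short.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
flags = [True, False, False, False, True, False, True]
print(outage_durations(times, flags))
```

With probes sent about twice a second, the shortest outage this approach can resolve is roughly the spacing between two probes.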

Outage frequencies for all sites

The plot below shows the outage frequency for the data from all site pairs added together, for November 1998 through July 1999. It includes data from about 284 million probes, or 142 million seconds at 2 probes per second.

To help show how the outage duration frequencies behave over different outage ranges, the data are binned into 1 second (dark blue), 10 second (red) and 100 second (green) bins. The light blue points are the 1 second bins out to a 20 second duration. The lines are power series fits, shown with their parameters and R². The blue dashed line is a fit to the data binned in 1 second bins out to 20 seconds. The data are seen to correlate strongly (R² > 0.6) with the power series fits.
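The binning and fitting step can be sketched as below. The page does not say how the fits were computed, so this assumes a fit of the form y = a * x^b obtained by ordinary least squares on the logarithms of both axes, with R² computed in that log-log space. The function names and the choice to label each bin by its upper edge are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def bin_durations(durations: Iterable[float], width: float) -> Dict[float, int]:
    """Count outages per bin of the given width (seconds).

    Each bin is keyed by its upper edge, so a 1 second bin width gives
    keys 1.0, 2.0, ...; durations of zero fall into the first bin.
    """
    counts: Counter = Counter()
    for d in durations:
        k = max(1, math.ceil(d / width))
        counts[k * width] += 1
    return dict(counts)


def fit_power_law(xs: Sequence[float],
                  ys: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a * x**b by least squares on (ln x, ln y); return (a, b, r2)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope on the log-log plot
    a = math.exp(my - b * mx)           # intercept, converted back from ln
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r2


# Synthetic frequencies that follow 100 * x**-1.5 exactly (not measured data).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [100.0 * x ** -1.5 for x in xs]
a, b, r2 = fit_power_law(xs, ys)
print(round(a, 3), round(b, 3), round(r2, 3))
```

Empty bins have to be left out before fitting, since the logarithm of a zero count is undefined.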

Outage probabilities for all sites

The plot below shows the probability of seeing an outage of a given length during a phone call of duration 3 minutes (magenta dots), and the probability P of not observing an outage of greater than the given number of seconds in a 3 minute phone call (dark blue crosses). The magenta line is a power series fit to the magenta dots; the parameters of the fit are shown together with the R². The point for the probability of no outage of > 1 second in a 3 minute call (the value is 75%) is not shown, to make it easier to read off the probabilities for 2 seconds and beyond.

The 3 minute time period is chosen as being the typical length of a phone call. P is defined as follows:

Let $F_i$ be the observed frequency of an outage of duration $s_i$ seconds ($s_i < s_{i+1}$), $i = 1, \ldots, M$. Then
$K_i = K_{i+1} + F_i = \sum_{j=i}^{M} F_j$, with $K_{M+1} = 0$,
is the reverse cumulative frequency distribution, and
$J_i = C \cdot K_i / N$,
where $C$ is the call length (in the above case set to 180 seconds), $N$ is the total number of seconds over which the measurement was made, and $J_i$ is the probability of an outage of a given length in a call of length $C$. Then
$P_i = 1 - J_i$
is the probability of not observing an outage of greater than $s_i$ seconds in a call of $C$ seconds.
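As a worked illustration of the definition of P, the sketch below computes the reverse cumulative frequencies K, the per-call outage probabilities J, and P = 1 - J. The function name and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow; they are not the measured frequencies.

```python
from typing import List, Sequence


def no_outage_probabilities(freqs: Sequence[int],
                            call_seconds: float,
                            total_seconds: float) -> List[float]:
    """Return P_i for each duration bin, with freqs[i] = F_i in ascending s_i.

    K_i is built from the longest bin downwards using K_i = K_{i+1} + F_i,
    starting from K_{M+1} = 0; then J_i = C * K_i / N and P_i = 1 - J_i.
    """
    k = 0
    reverse_cumulative: List[int] = []
    for f in reversed(freqs):
        k += f
        reverse_cumulative.append(k)
    reverse_cumulative.reverse()  # restore ascending order of s_i
    return [1.0 - call_seconds * k_i / total_seconds
            for k_i in reverse_cumulative]


# Hypothetical counts: 6, 3 and 1 outages in three duration bins, observed
# over 3600 seconds, for a 180 second call.
# K = [10, 4, 1], J = [0.5, 0.2, 0.05], so P is about [0.5, 0.8, 0.95].
print(no_outage_probabilities([6, 3, 1], 180.0, 3600.0))
```

Note that C * K / N is the expected number of such outages in a window of C seconds, so it only behaves like a probability while it stays well below 1.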

Another way of looking at these data is through the Outage Events metric, defined by the Automotive Network eXchange (ANX) as the number of outage events of 30 seconds or greater per year. For the data shown here this comes out to be about 450/year/site-pair, which is much higher than the ANX limit of 10 such events.
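The annualised count can be sketched as below, assuming the metric is computed by counting outages of at least 30 seconds and scaling by the observation time for one site pair. The function name, the 365 day year and the sample durations are assumptions for illustration, not the values behind the 450/year figure.

```python
from typing import Iterable

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000; leap years ignored


def outage_events_per_year(durations: Iterable[float],
                           observed_seconds: float,
                           threshold: float = 30.0) -> float:
    """Annualised number of outages lasting at least `threshold` seconds."""
    events = sum(1 for d in durations if d >= threshold)
    return events * SECONDS_PER_YEAR / observed_seconds


# Hypothetical outages (seconds) seen on one site pair over half a year:
# three of the four reach the 30 second threshold, giving 6 events per year.
rate = outage_events_per_year([5.0, 30.0, 45.0, 120.0], SECONDS_PER_YEAR / 2)
print(rate)
```

On this scale, the roughly 450 events per year reported above is about 45 times the ANX limit of 10.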

Outage frequencies by month and by site pairs

The plots below are organized by month (one month per row) and by site pairs (one site pair per column). The log of the frequency is plotted against the log of the outage duration in seconds.

There are many causes of the outages, each with its own characteristic time scale, including:

[Plots, one month per row. Columns: CERN to SLAC, SLAC to CERN, CERN to FNAL, FNAL to CERN]

[Plots, one month per row. Columns: SLAC to FNAL, FNAL to SLAC, SLAC to CMU, CMU to SLAC]

Created: 23 September 1999; last update 28 September 1999
URL: http://www-iepm.slac.stanford.edu/monitoring/surveyor/outage.html
Comments to iepm-l@slac.stanford.edu