Detecting Loss of Performance in Dynamic Bottleneck Capacity (DBCap) Measurements Using the Holt-Winters Algorithm

 

 

 

Felipe Haro, Mahesh Chhaparia, Les Cottrell (SLAC)



Introduction

 

            In this report we explain the different aspects of using the Holt-Winters algorithm to detect loss of performance in time series of the dynamic bottleneck capacity (dbcap) measured by the Available Bandwidth Estimation (ABwE) abing tool. We explain how the Holt-Winters algorithm works, and show analyses of several hosts to demonstrate the correct functioning of this method.

 

Holt-Winters Algorithm

 

            The Holt-Winters algorithm is a quantitative forecasting method that uses recursive equations to predict trend behavior. It uses a time series model to make predictions, assuming that the future will follow the same pattern as the past. In this particular case we have seasonal patterns corresponding to weekly periodicity; this means that our equations should include a factor that uses information from past weeks to make a more accurate prediction of what is going to happen in the future. The Holt-Winters (triple exponential smoothing) equations [1] are given by:

$$S_t = \alpha \frac{y_t}{I_{t-L}} + (1 - \alpha)(S_{t-1} + b_{t-1})$$

$$b_t = \gamma (S_t - S_{t-1}) + (1 - \gamma) b_{t-1}$$

$$I_t = \beta \frac{y_t}{S_t} + (1 - \beta) I_{t-L}$$

$$F_{t+m} = (S_t + m b_t) \, I_{t-L+m}$$

 

Where:

y_t is the observed value at time t; in this case, the dbcap measurement at time t.

S_t is the smoothed observation at time t.

b_t is the trend smoothing at time t.

I_t is the seasonal smoothing at time t. In the equations we always use t-L because we are using information from previous weeks.

L is the number of periods that complete one season. If we are making our measurements every 3 minutes and the season is 1 week long, L = 3360 (there are 10080 minutes in one week, so just divide this number by 3).

F_{t+m} is the forecast/prediction m periods ahead.

m is the number of periods ahead we want to predict. If we are making our measurements every n minutes, m = 3 means that we are predicting 3n minutes into the future.

α is the overall smoothing parameter.

β is the seasonal smoothing parameter.

γ is the trend smoothing parameter.

 

But, what do these parameters mean?

α is basically the short-term parameter. A large value of α gives a large weight to measurements in the very recent past, while a small value of α gives more weight to measurements further in the past.

γ is the trend parameter. A large value of γ gives more weight to the difference between the most recent smoothed observations, while a small value of γ uses information from further in the past.

β is the seasonal parameter. A large value of β gives a large weight to the present relation between the observation and the smoothed observation, while small values of β give more weight to the relation between the observation and the smoothed observation in past weeks.
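
To make the recursion concrete, here is a minimal Python sketch of one update step and of the m-periods-ahead forecast, following the multiplicative equations above. The function names (hw_step, hw_forecast) and the state layout are our own choices for illustration, not part of the ABwE tooling.

```python
# A minimal sketch of one multiplicative Holt-Winters update, following the
# equations above. Assumes strictly positive observations (dbcap > 0).

def hw_step(y_t, S_prev, b_prev, I_old, alpha, beta, gamma):
    """One recursion step: returns the new (S_t, b_t, I_t).

    y_t    -- observation at time t (the dbcap measurement)
    S_prev -- S_{t-1}, the previous smoothed observation
    b_prev -- b_{t-1}, the previous trend smoothing
    I_old  -- I_{t-L}, the seasonal index from one season (L periods) ago
    """
    S_t = alpha * (y_t / I_old) + (1.0 - alpha) * (S_prev + b_prev)
    b_t = gamma * (S_t - S_prev) + (1.0 - gamma) * b_prev
    I_t = beta * (y_t / S_t) + (1.0 - beta) * I_old
    return S_t, b_t, I_t

def hw_forecast(S_t, b_t, I_future, m):
    """F_{t+m} = (S_t + m * b_t) * I_{t-L+m}."""
    return (S_t + m * b_t) * I_future
```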

                                                                                                                                                                                          

In the equations we can see values corresponding to nonexistent periods of time, as in the case of S_{t-1} or b_{t-1}: when t = 0, we do not have any values for S or b, so we need initial values. The same happens for the seasonal index I when t < L.

 

Initial Values

Initial Values for the observation smoothing

 

We will just use S_0 = 0.

Initial Values for the trend smoothing

$$b_0 = \frac{1}{L}\left(\frac{y_{L+1} - y_1}{L} + \frac{y_{L+2} - y_2}{L} + \cdots + \frac{y_{2L} - y_L}{L}\right)$$

This value of the trend factor b is essentially an average of the differences between the observations of the first two weeks, divided by the length of a week.

Initial Values for the Seasonal smoothing

$$I_k = \frac{1}{N}\sum_{j=1}^{N} \frac{y_{L(j-1)+k}}{A_j}, \qquad A_j = \frac{1}{L}\sum_{i=1}^{L} y_{L(j-1)+i}$$

for k = 1, 2, 3, …, L, where N is the number of complete weeks in the data and A_j is the average of the observed values in week j.

            This value of the seasonal index is essentially obtained by averaging the observed values of every week, and then averaging the observation at position k of each week divided by the average of the observed values for its week.
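
The initial values above translate directly into code. The sketch below assumes the data y has already been binned into a regular series covering N complete weeks of L bins each; hw_init is a hypothetical helper name.

```python
# A sketch of the initial values described above. `y` is a list of regularly
# spaced, gap-filled dbcap observations covering N complete weeks of L bins.

def hw_init(y, L):
    """Return S0, b0 and the initial seasonal indices I[0..L-1]."""
    N = len(y) // L                       # number of complete weeks
    S0 = 0.0                              # we simply use S0 = 0

    # b0: average difference between weeks 1 and 2, divided by the week length
    b0 = sum((y[L + i] - y[i]) / L for i in range(L)) / L

    # A[j]: average observed value in week j
    A = [sum(y[L * j: L * (j + 1)]) / L for j in range(N)]

    # I[k]: average over the weeks of (observation at slot k) / (week average)
    I = [sum(y[L * j + k] / A[j] for j in range(N)) / N for k in range(L)]
    return S0, b0, I
```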

 

Choosing the parameters

           

            To find the parameters α, β and γ we use the minimum mean square error, where the error is the difference between the forecast and the observed value at time t. The minimization is given by:

$$\min_{\alpha,\beta,\gamma} \frac{1}{N}\sum_{t=1}^{N} \left(F_t - y_t\right)^2$$

           

            In every case we obtained a trend parameter γ equal to zero; α and β generally take values from just above 0 up to 0.3.
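
As an illustration, the fit can be done with a coarse grid search over the parameters, minimizing the one-step-ahead mean square error. This sketch reuses hw_init and hw_step from the sketches above; the grid values are arbitrary illustrative choices.

```python
# A sketch of the parameter fit by grid search, reusing hw_init and hw_step
# from the sketches above. Assumes positive, gap-filled, regularly binned data.

def hw_mse(y, L, alpha, beta, gamma):
    """Mean square error of the one-step-ahead forecasts over the data."""
    S, b, I = hw_init(y, L)
    err, n = 0.0, 0
    for t in range(L, len(y)):            # start once initial I's are in place
        F = (S + b) * I[t % L]            # one-step-ahead forecast F_t
        err += (F - y[t]) ** 2
        n += 1
        S, b, I[t % L] = hw_step(y[t], S, b, I[t % L], alpha, beta, gamma)
    return err / n

def fit_params(y, L):
    """Coarse grid search; in our runs the trend parameter came out zero."""
    grid = [0.01, 0.05, 0.1, 0.2, 0.3]
    candidates = [(a, s, g) for a in grid            # alpha kept > 0
                  for s in grid + [0.0]              # beta (seasonal)
                  for g in grid + [0.0]]             # gamma (trend)
    return min(candidates, key=lambda p: hw_mse(y, L, *p))
```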

 

The program

           

            This method requires data regularly spaced in time, which we do not have. To fix this we organize the data into 3-minute bins. If there is more than one measurement in a particular bin we average the data points, but there may be certain bins without data, and the Holt-Winters algorithm not only requires regularly spaced measurements but also existing data at all times. To manage this problem, we update the first week of data so that it has measurements for all the 3-minute intervals: we find the nearest week n in the future such that, for each empty bin in the first week (week 1), there is some intermediate week i (1 < i ≤ n) having data corresponding to the empty bin, and for each empty bin in the first week we average over the corresponding data points from those weeks. For the following weeks, if there is missing data we use the previous week to fill the hole. (A code sketch of the binning step follows the example below.)

 

Here is an example of some measurements:

#date               time      dbcap    

06/20/2004 18:01:05   916.200  

06/20/2004 18:07:39   966.800 

06/20/2004 18:20:33   943.500 

06/20/2004 18:25:14   1000.0    

06/20/2004 18:27:04   700  

 

We transform this data into this:

#date               time      dbcap    

06/20/2004 18:01:05   916.200  

06/20/2004 18:04:05   0  

06/20/2004 18:07:05   966.800 

06/20/2004 18:10:05   0  

06/20/2004 18:13:05   0 

06/20/2004 18:16:05   0  

06/20/2004 18:19:05   943.500  

06/20/2004 18:22:05   0

06/20/2004 18:25:05   850 (here we average 1000 and 700)  
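
The binning step of the transformation shown above could look like the following sketch. The helper name to_bins and the (timestamp, dbcap) input layout are our own assumptions; empty bins are reported as None here, standing in for the zeros shown above, to be filled by the week-filling pass described earlier.

```python
# A sketch of the 3-minute binning illustrated above. `samples` is a list of
# (unix_timestamp, dbcap) pairs sorted by time; several measurements falling
# in the same bin are averaged, and empty bins come out as None.

BIN = 180  # seconds in a 3-minute bin

def to_bins(samples):
    t0 = samples[0][0]                        # first bin starts at first sample
    n_bins = (samples[-1][0] - t0) // BIN + 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for ts, dbcap in samples:
        i = (ts - t0) // BIN
        sums[i] += dbcap
        counts[i] += 1
    return [(t0 + i * BIN, sums[i] / counts[i] if counts[i] else None)
            for i in range(n_bins)]
```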

 

Some Results

 

            In the following graph we can see the results for the SLAC to Caltech path: the moving averages of the measurements and of the forecast. The problem zone is marked with the red circle. We are currently looking for the best way to trigger alerts; in this case we use the standard deviation of the forecast. The condition is then: if the residual (forecast minus observation) is larger than the standard deviation in 70% of the points over the last 2.5 hours, we trigger an alarm.
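
That trigger condition can be written down directly; at 3 minutes per bin, 2.5 hours corresponds to 50 points. The function name should_alert is ours, and how the standard deviation of the forecast is computed is left as an input here.

```python
# A sketch of the alarm condition described above: alarm if the residual
# (forecast - observation) exceeds the forecast standard deviation in at
# least 70% of the points in the last 2.5 hours (50 three-minute bins).

WINDOW = 50       # 2.5 hours of 3-minute bins
FRACTION = 0.70

def should_alert(forecasts, observations, sigma):
    """forecasts, observations: sequences of recent values (oldest first);
    sigma: the standard deviation of the forecast, assumed given."""
    recent = list(zip(forecasts[-WINDOW:], observations[-WINDOW:]))
    violations = sum(1 for f, y in recent if (f - y) > sigma)
    return violations >= FRACTION * len(recent)
```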

            Zooming in on the red circle, we can see what happens:

            We can see that the alarm is triggered, so the loss of performance is detected. The method also works very well for Florida.

            For Indiana we see that it also works quite well:

 

Detailed results from all the nodes using this algorithm can be seen HERE.

 

Another Implementation of Holt-Winters Algorithm

 

            In [2] Jake Brutlag presents an implementation of the Holt-Winters algorithm which is, for the most part, the same as the version discussed in the previous sections. The two techniques differ primarily in how the seasonal coefficients are calculated: Brutlag's implementation stores the seasonal contribution as an additive component of the forecast, while the implementation discussed above stores the seasonal contribution as a proportion of the baseline, and hence as a multiplicative factor. Further, [2] also handles seasonal variability (e.g. application requests fluctuate wildly minute by minute during the peak hours of 4-8 pm, but at 1 am application requests hardly vary at all) with a deviation envelope (confidence band) around the forecast.

 

The measure of deviation is a weighted average absolute deviation, updated via exponential smoothing:

$$d_t = \gamma \left| y_t - \hat{y}_t \right| + (1 - \gamma) \, d_{t-m}$$

Where:

d_{t-m} = deviation from the last seasonal cycle (m is the seasonal period)

y_t = observed value at time t

ŷ_t = forecast value

γ = seasonal adaptation parameter

 

The confidence band is the collection of intervals (ŷ_t − δ·d_{t−m}, ŷ_t + δ·d_{t−m}) for each time point y_t in the time series, where δ is a scaling factor for the width of the band. A simple mechanism to detect an anomaly is to check whether an observed value falls outside the confidence band (a violation). However, this often yields a large number of false positives. A more robust mechanism is to use a moving window of a fixed number of observations and generate an alert if the number of violations within the window exceeds a specified threshold. Details on how to choose the model parameters (α, β, γ) and on initializing and updating the baseline, linear trend and seasonal trend can be found in [2].
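
A sketch of how this could fit together, under the formulas above: the deviation d is kept per position in the seasonal cycle, the band is ŷ_t ± δ·d_{t−m}, and alerts come from counting violations in a sliding window. The name brutlag_monitor, the forecast_fn callable and the default δ = 2 are our own illustrative choices, not taken verbatim from [2].

```python
# A sketch of the deviation envelope and windowed violation test described
# above. forecast_fn stands in for the Holt-Winters forecast at time t.

from collections import deque

def brutlag_monitor(points, forecast_fn, gamma, m, delta=2.0,
                    window=28, threshold=23):
    """Yield (t, alert) for each observed value y_t in `points`."""
    d = [None] * m                   # one deviation slot per cycle position
    recent = deque(maxlen=window)    # sliding window of violation flags
    for t, y in enumerate(points):
        f = forecast_fn(t)
        slot = t % m
        d_old = d[slot] if d[slot] is not None else abs(y - f)
        # violation: observation falls outside the confidence band
        violation = not (f - delta * d_old <= y <= f + delta * d_old)
        recent.append(violation)
        alert = sum(recent) >= threshold
        # update the deviation for this cycle position
        d[slot] = gamma * abs(y - f) + (1.0 - gamma) * d_old
        yield t, alert
```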

 

Some Results

 

The results in this section were produced with the following model parameter guidelines:

 

α : Observations in the last day account for 99% of the weight in the baseline calculation

β : Observations in the last day account for 50% of the weight in the linear trend calculation

γ : Observations in the last seven days account for 99% of the weight in the seasonal trend calculation

 

The numerical values based on the above guidelines and formulae in [2] are:

 

α = 0.00955

β = 0.001443

γ = 0.00137
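
These numbers follow from the guidelines if one uses the standard identity that, with smoothing parameter θ, the most recent k observations carry total weight 1 − (1 − θ)^k; setting this equal to the desired fraction w gives θ = 1 − (1 − w)^(1/k). A quick check, with 480 three-minute points per day (the helper name is hypothetical):

```python
# Derive smoothing parameters from "fraction of weight in the last k points".
# For parameter theta, the last k points carry total weight 1 - (1 - theta)**k.

def smoothing_param(weight_fraction, k_points):
    return 1.0 - (1.0 - weight_fraction) ** (1.0 / k_points)

POINTS_PER_DAY = 480   # 3-minute buckets in one day

alpha = smoothing_param(0.99, POINTS_PER_DAY)       # ~0.00955
beta  = smoothing_param(0.50, POINTS_PER_DAY)       # ~0.001443
gamma = smoothing_param(0.99, 7 * POINTS_PER_DAY)   # ~0.00137
print(alpha, beta, gamma)
```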

 

Each time point corresponds to a 3-minute bucket, and the seasonal cycle period is kept at 7 days (= 480*7 time points). The window length is 28, with a threshold value of 23 to trigger alerts. The original dataset has several spans of missing observations; used directly, these generate forecasts in which the seasonal cycle is out of sync with the observations, causing incorrect alerts. Hence the dataset is preprocessed as in the previous implementation, filling the gaps with estimates from the previous week, where the first week is initially completed to ensure no gaps.

Florida Dataset

 

The vertical black lines indicate the alerts triggered by the algorithm. In an expanded view (shown below) of the black line near Week 32, we find that there is a sharp dip in the dbcap value on a Sunday, lasting about an hour, which is unlike what the history dictates and what the algorithm has experienced; hence it triggers an alert indicating anomalous behavior. The initial black lines have a similar explanation: although the dips in these cases are smaller, the number of violations exceeds the threshold in the window of observations, causing an alert.

 

Florida Dataset (around Week 32-33)

 

 

Some Interesting Datasets

 

            We have already seen what the data for Florida, Caltech and Indiana look like. These are some of the dbcap behaviors that we usually see on different nodes.

 

NSLAB : Nice Periodicity

 

MCS : Interesting step change and loss of performance during a whole week

 

LSA : Clear abnormal behavior with no seasonal periodicity

 

JPAPAN : Very steady data with some downward impulses

 

UTDALLAS : Impulses every 2 weeks and some random ones

 

TRIUMF : Step changes followed by a very periodic behavior

 

SOX : Strong weekly periodicity with varying magnitudes, followed by an almost flat behavior.

 

References

 

[1] Engineering Statistics Handbook, 6.4.3.5 Triple Exponential Smoothing, Available at http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc435.htm

[2] Jake D. Brutlag, “Aberrant Behavior Detection in Time Series for Network Monitoring”, in Proceedings of the Fourteenth USENIX Systems Administration Conference (LISA XIV), New Orleans, LA, December 2000.

 

For questions or comments on this document, please contact us:

Felipe Haro felipeharo@gmail.com

Mahesh Chhaparia maheshkc@slac.stanford.edu

Les Cottrell cottrell@slac.stanford.edu