DetectiNG LOSS OF PERFORMANCE IN
Dynamic Bottleneck Capacity (DBCap) measurements USING THE Holt-Winters
Algorithm

In this report we will explain the different aspects of Holt-Winters algorithm to detect loss of performance in time series dynamic bottleneck capacity (dbcap) measured by the Available Bandwidth Estimation (ABwE) abing tool. We will explain how the Holt-Windows algorithm works, and show some hosts analysis to demonstrate the correct functioning of this method.
Holt-Winters algorithm is basically a quantitative forecasting method that uses mathematical recursive functions to predict the trend behavior. It uses a time series model to make predictions assuming that the future will follow the same pattern as the past. In this particular case we have seasonal patterns corresponding to weekly periodicity; this means that we should use a factor in our equations that uses information from past weeks to make a more accurate prediction of what’s going to happen in the future. The Holt-Winters algorithm equations [1] are given by:

Where:
yt is the observed value at time t. In this case the time of the dbcap measurement.
St is the smoothed observation at time t.
bt is the trend smoothing at time t.
It is the seasonal smoothing at time t. In the equations we can see that we always use t-L because we are using information from previous weeks.
L is the number of periods that complete 1 season. If we are making our measurements every 3 minutes and the season is 1 week long, L= 3360 (there are 10080 minutes in one week so just divide this number by 3).
Ft+m is the forecast/prediction at m periods ahead
m is the number of periods ahead we want to predict. If we are making our measurements every n minutes, an m=3 will means that we are predicting 3n minutes in the future.
is the overall smoothing parameter.
is the seasonal smoothing parameter.
is the trend smoothing parameter.
But, what do these parameters mean?
is basically the short term parameter. A large value of
will give a large weight to
measurements very near in the past, while a small value of alpha will give more
weight to measurements further in the past.
is the trend parameter. A big value of
will give more
weight to the difference of the last smoothed observations; while a little
value of
will use information further in the past.
is the seasonal
parameter. A big value of
will give a big weight to the present relation
between the observation and the smoothed observation, and little values of
will give more weight to past weeks relation
between the observation and the smoothed observation.
In the equations we can see that we have values corresponding to inexistent periods of time, like the case of St-1 or bt-1. When t=0, we don’t have any values for S or b, so we need initial values. The same happen for the seasonal index I when t<L.
We will just use S0 =0.
![]()
This value of the trend factor b is mainly an average of the differences in the observations of the first two weeks divided by the length of a week.
For k=1,2,3,…,L-1,L

This value of the seasonal index is basically making an average of the observed values of every week and then making an average of the day n of every week divided by the average of the observed values for its week.
To
find the parameters
,
and
we will use the minimum mean square error,
where the error is the difference between the forecast and the observed values
at time t. The minimization equation
is given by:

In
every case we got a
equal to zero.
and
generally take values
from 0+ to 0.3.
This
method requires regularly time spaced data, which we don’t have. To fix this we
order the data in 3 minutes bins. If there is more than one measurement in a
particular bin we average the data points, but there may be certain bins
without data, and the Holt-Winters algorithm not only requires regularly time
spaced measurements, but also existent data at all times. To manage this
problem, we update the first week data such that it has measurements for all
the 3 minute intervals. Find the nearest week (
) in the future such that for each empty bin in the first
week (
), there is some intermediate week (
) having data corresponding to the empty bin. For each empty
bin in the first week, average over the corresponding data points from
to
. With the next weeks, if there is missing data we use the
previous week to fill the hole.
This is an example of some measurements made.
#date time dbcap
06/20/2004 18:01:05 916.200
06/20/2004 18:07:39 966.800
06/20/2004 18:20:33 943.500
06/20/2004 18:25:14 1000.0
06/20/2004 18:27:04 700
We transform this data into this:
#date time dbcap
06/20/2004 18:01:05 916.200
06/20/2004 18:04:05 0
06/20/2004 18:07:05 966.800
06/20/2004 18:10:05 0
06/20/2004 18:13:05 0
06/20/2004 18:16:05 0
06/20/2004 18:19:05 943.500
06/20/2004 18:22:05 0
06/20/2004 18:25:05 850 (here we average 1000 and 700)
In the following graphic we can see the results for SLAC in Caltech. We see the moving averages of the measurement and the forecast. The problem zone is shown with the red circle. Actually we are looking for the best way to trigger alerts. In this case we are using the standard deviation of the forecast. The condition is then: if the residual (Forecast-Observation) is bigger than the standard deviation in 70% of the points for the last 2,5 hours, we will trigger an alarm.

We can make a zoom on the red circle to see what happens:

We
can see that the alarm is triggered, so the loss of performance is detected. We
can see that in

On

Detailed results from all the nodes using this algorithm can be seen HERE.
In [2] Jake Brutlag presents an implementation of Holt-Winters algorithm which for the most part is same as the version discussed in previous sections. The two techniques differ primarily in their seasonal coefficients calculation. Brutlag’s implementation stores the seasonal contributions as an additive component to the forecast, while the implementation discussed above stores the seasonal contribution as a proportion of the baseline and hence is a multiplicative factor. Further, [2] also handles seasonal variability (e.g. application requests fluctuate wildly minute by minute during the peal hours of 4-8 pm, but at 1 am application requests hardly vary at all) as a deviation envelope (confidence band) with respect to the forecast.
The measure of deviation is a weighted average absolute deviation, updated via exponential smoothing:
![]()
Where,
dt-m = Deviation from last seasonal cycle (m is the seasonal period)
= Observed value at
time t
= Forecast value
= Seasonal
adaptation parameter
The confidence band is the
collection of intervals
for each time point yt in the time series.
A simple mechanism to detect an anomaly is to check if an observed value falls
outside the confidence band (a violation). However this often yields a large
number of false positives. A more robust mechanism is to use a moving window of
a fixed number of observations and generate an alert if the number of
violations within the window exceeds a specified threshold. Details on how to
choose model parameters (
) and initializing and updating baseline, linear trend,
seasonal trend can be found in [2].
The results in this section were produced with the following model parameter guidelines:
: Observations in the
last one day account for 99% of the weights in baseline calculation
: Observations in the
last one day account for 50% of the weights in linear trend calculation
: Observations in the
last seven days account for 99% of the weights in seasonal trend calculation
The numerical values based on the above guidelines and formulae in [2] are:
= 0.00955
= 0.001443
= 0.00137
The time points correspond to a 3 minute bucket each, and the seasonal cycle period is kept as 7 days (=480*7 time points). The window length is 28 with a threshold value of 23 to trigger alerts. The original dataset has several spans of missing observations. These when used directly generate forecasts in which the seasonal cycle is out of sync with the observations causing incorrect alerts. Hence the dataset is preprocessed as in the previous implementation to fill the gaps with estimates from previous week, where the first week is initially generated to ensure no gaps.

The vertical black lines indicate the alerts triggered by the algorithm. In an expanded view (shown below) of the black line near Week 32, we find that there is a sharp dip in dbcap value on a Sunday for about an hour which is unlike what the history dictates and the algorithm has experienced. Hence, it triggers an alert indicating anomalous behavior. Similar reason for the initial black lines, although the dip in these cases is smaller, the number of violations exceeds the threshold in the window of observations, causing an alert.

We have
already seen how the data in
NSLAB
: Nice Periodicity
MCS
: Interesting step change and loss of performance during a whole week
LSA
: Clear abnormal behavior with no seasonal periodicity
JPAPAN
: very continuous data with some down impulses
UTDALLAS
: Impulses every 2 weeks and some random ones
TRIUMF
: Step changes followed by a very periodic behavior
SOX
: Important weekly periodicity shape with different magnitudes followed by an
almost flat behavior.
[1] Engineering Statistics Handbook, 6.4.3.5 Triple Exponential Smoothing, Available at http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc435.htm
[2] Jake D. Brutlag, “Aberrant Behavior Detection in Time Series for Network Monitoring”, In Proceedings of the USENIX Fourteenth System Administration Conference LISA XIV,
Question or comments on this document, please contact us:
Felipe Haro felipe_haro@gmail.com
Mahesh Chhaparia maheshkc@slac.stanford.edu
Les Cottrell cottrell@slac.stanford.edu