Comparison between Two Dimensional time-series & Kolmogorov Smirnoff technique

for Anomalous Variations in End-to-End Internet traffic

 

Fawad Nazir, Les Cottrell, Mahesh Chhaparia, Alf Wachsmann, Connie Logg

{fawad,cottrell,maheshc,alfw,cal}@slac.stanford.edu

Stanford Linear Accelerator Center (SLAC)


Introduction | Mark Burgess Method | Kolomogorov Smirnoff Method | Results | Ack |Conclusion | References


 

Introduction

            In this report we will briefly explain the working of the two dimensional time-series technique [1] and Kolmogorow Smirnoff (KS)[2][3] technique and analyze the result after applying  these techniques to detect anomalous variation in end-to-end internet traffic. We have applied these techniques on different network parameters like dynamic bottleneck capacity (dbcap), round trip time (RTT) and throughput measured using the TCP/UDP bandwidth measurement  (iperf) [2] and the Available Bandwidth Estimation (ABwE) [3] tools.

 

Two Dimensional Time-series Technique

A two dimensional time approach is introduced in order to classify a periodic, adaptive threshold for service level anomaly detection. An iterative algorithm is applied to history analysis on this periodic time to provide a smooth roll-off in the significance of the data with time. This method was originally designed to detect anomalous behavior on a single host, with the aim of using the information for self-regulation, by initiating a counter response. An anomaly is indicated by a code indicating the state of the given statistic, as compared to an average of equivalent earlier times. The codes are given below:

 

code     high|low|normal     meaning
            -------------------------------------------------------------
            -2     -                             no sigma variation
            -------------------------------------------------------------
            -4        low                     within noise threshold, and within
            -5        normal                2 standard deviations from
            -6        high                    expected value
            -------------------------------------------------------------
            -14     low                     microanomaly: within noise
            -15     normal                 threshold, but 2 or more standard
            -16     high                     deviations from expected value
            -------------------------------------------------------------
            -24     low                     normal; within 1 standard deviation
            -25     normal                from the expected value
            -26     high
            -------------------------------------------------------------
            -34     low                     dev1; more than 1 standard
            -35     normal                 deviation from the expected value
            -36     high
            ------------------------------------------------------------
            -44     low                     dev2; more than 2 standard
            -45     normal                 deviations from the expected
            -46     high                     value
            -------------------------------------------------------------
            -54     low                     anomaly; more than 3 standard
            -55     normal                 deviations from the expected
            -56     high                     value
 

 

Kolomogorov Smirnoff Technique

The K-S statistic is the best known of several distribution-free procedures which compare two sample CDFs (Cumulative Distribution Functions) in order to test for general differences between two distributions.  The K-S statistic is calculated by comparing two sample CDFs using the maximum vertical distance between them as a test statistic.  The K-S statistic is more generally used to assess the probability that a sample comes from a normally distributed data set.  With regard to testing for general differences between any two distributions, the K-S statistic is both easy to calculate and powerful.  As the K-S statistic is distribution-free we are not required to make any assumption about the distribution of the data.

While running the KS algorithm we have to specify the number of points for KS calculation. These points determine the time window on which the Cumulative Distribution Function(CDF) is calculated. For example, a value of 100 means that CDF will be calculated on a window of 100 time points before and after the current timestamp. The bigger this window size, the more accurate would be the CDF obtained, but the process becomes slower. A value of 100 seems to work well for our analysis.
 

 

Some Results

 

1. Results on Iperf (Throughput) data from Stanford Linear Accelerator Center (SLAC) to California Institute of Technology (Caltech)

 

Figure 1 : KS on Iperf data from SLAC to Caltech

 

Figure 2 : CFEtool on Iperf data from SLAC to Caltech

 

 

 

Figure 3 : CFEtool on Iperf Data SLAC to Caltech, (March 06 to March 13 , 05)

This graph is a snapshot of the graph in Figure 2. The purpose of this graph is to understand the spikes in more detail.

 

Conclusion

               After applying Two Dimensional time-series technique and Kolmogorow Smirnoff (KS) technique on different network parameters we have come to a
conclusion that Mark Burgess technique detects the anomalies for each and every unwanted huge spikes/variation in the data we have and it could work very well 
for the real time analysis. In the other case if we want to check the trend for long term data, this technique may not work very well. On the other KS techniques 
works very well for the long term anomalous variation in internet end-to-end traffic.

 

Acknowledgements

            We are grateful to Dr. Mark Sandford (J.M.Sandford@lboro.ac.uk) [3] for his help in understanding Kolmogorov Smirnoff (KS) technique and providing us with the its implementation.

 

References

 

[1] Mark Burgess, Two dimensional time-series for anomaly detection and regulation in adaptive systems, Proceedings of 13th IFIP/IEEE International Workshop on Distributed System, operations and management (DSOM 2002). "Management Technologies for E-Commerce and E-Business Applications" Springer 2002.

[2] Raj Jain, 1991, The art of computer systems performance analysis: Techniques for Experimental Design, Measurement, Simulation and modeling, ISBN 0-471-50336-3.

[3] Sandford, J.M., Parish, D.J. and Phillips, I.W., ''Neural approach to detecting communication network events'' , IEE Proc. Communications , 1495 , October 2002, pp 257-264, ISSN 1350-2425

[4] The TCP/UDP bandwidth measurement tool (iperf) http://dast.nlanr.net/Projects/Iperf/

[5] Available Bandwidth Estimation (ABwE) tool http://www-iepm.slac.stanford.edu/abing/