Comparison between Two Dimensional
timeseries & Kolmogorov Smirnoff
technique
for Anomalous
Variations in EndtoEnd Internet
traffic
Fawad Nazir, Les Cottrell, Mahesh Chhaparia, Alf Wachsmann, Connie Logg
{fawad,cottrell,maheshc,alfw,cal}@slac.stanford.edu
Stanford Linear
Introduction  Mark Burgess Method  Kolomogorov Smirnoff Method  Results  Ack Conclusion  References
Introduction
In this report we will briefly explain the working of the two dimensional timeseries technique [1] and Kolmogorow Smirnoff (KS)[2][3] technique and analyze the result after applying these techniques to detect anomalous variation in endtoend internet traffic. We have applied these techniques on different network parameters like dynamic bottleneck capacity (dbcap), round trip time (RTT) and throughput measured using the TCP/UDP bandwidth measurement (iperf) [2] and the Available Bandwidth Estimation (ABwE) [3] tools.
Two Dimensional Timeseries Technique
A two dimensional time approach is introduced in order to classify a periodic, adaptive threshold for service level anomaly detection. An iterative algorithm is applied to history analysis on this periodic time to provide a smooth rolloff in the significance of the data with time. This method was originally designed to detect anomalous behavior on a single host, with the aim of using the information for selfregulation, by initiating a counter response. An anomaly is indicated by a code indicating the state of the given statistic, as compared to an average of equivalent earlier times. The codes are given below:
code
highlownormal meaning

2

no sigma variation

4 low within noise threshold, and within
5 normal 2 standard deviations from
6 high expected value

14
low
microanomaly: within noise
15
normal
threshold, but 2 or more standard
16
high
deviations from expected value

24
low
normal; within 1 standard deviation
25
normal
from the expected value
26
high

34
low
dev1; more than 1 standard
35
normal
deviation from the expected value
36
high

44
low
dev2; more than 2 standard
45
normal
deviations from the expected
46
high
value

54
low
anomaly; more than 3 standard
55
normal
deviations from the expected
56
high
value
Kolomogorov Smirnoff Technique
The KS statistic is the best known of several distributionfree procedures which compare two sample CDFs (Cumulative Distribution Functions) in order to test for general differences between two distributions. The KS statistic is calculated by comparing two sample CDFs using the maximum vertical distance between them as a test statistic. The KS statistic is more generally used to assess the probability that a sample comes from a normally distributed data set. With regard to testing for general differences between any two distributions, the KS statistic is both easy to calculate and powerful. As the KS statistic is distributionfree we are not required to make any assumption about the distribution of the data.
While running the KS
algorithm we have to specify the number of points for KS calculation. These
points determine the time window on which the Cumulative Distribution Function(CDF) is calculated. For example, a value of 100 means that CDF will be
calculated on a window of 100 time points before and after the current
timestamp. The bigger this window size, the more accurate would be the CDF
obtained, but the process becomes slower. A value of 100 seems to work well
for our analysis.
Some Results
1. Results on Iperf (Throughput) data from Stanford Linear Accelerator
Center (SLAC) to California Institute of Technology (Caltech)
Figure 1 : KS on
Iperf data from SLAC to Caltech
Figure 2 : CFEtool on
Iperf data from SLAC to Caltech
Figure 3 : CFEtool on Iperf Data SLAC to Caltech, (March 06 to March 13 , 05)
This graph is a snapshot of the graph in Figure 2. The purpose of this graph is to understand the spikes in more detail.
After applying Two Dimensional timeseries technique and Kolmogorow Smirnoff (KS) technique on different network parameters we have come to a
conclusion that Mark Burgess technique detects the anomalies for each and every unwanted huge spikes/variation in the data we have and it could work very well
for the real time analysis. In the other case if we want to check the trend for long term data, this technique may not work very well. On the other KS techniques
works very well for the long term anomalous variation in internet endtoend traffic.
We are grateful to Dr. Mark Sandford (J.M.Sandford@lboro.ac.uk) [3] for his help in understanding Kolmogorov Smirnoff (KS) technique and providing us with the its implementation.
[1] Mark Burgess, Two dimensional timeseries for anomaly detection and regulation in adaptive systems, Proceedings of 13th IFIP/IEEE International Workshop on Distributed System, operations and management (DSOM 2002). "Management Technologies for ECommerce and EBusiness Applications" Springer 2002.
[2] Raj Jain, 1991, The art of computer systems performance analysis: Techniques for Experimental Design, Measurement, Simulation and modeling, ISBN 0471503363.
[3] Sandford, J.M., Parish, D.J. and Phillips, I.W., ''Neural approach to detecting communication network events'' , IEE Proc. Communications , 1495 , October 2002, pp 257264, ISSN 13502425
[4] The TCP/UDP bandwidth measurement tool (iperf) http://dast.nlanr.net/Projects/Iperf/
[5] Available Bandwidth Estimation (ABwE) tool http://wwwiepm.slac.stanford.edu/abing/