Monitoring with tcpdump
The active (i.e. intrusive) performance monitoring conducted by PingER and other monitoring projects at SLAC would be well complemented by passive monitoring, that is, observation of genuine user traffic as it enters and leaves the SLAC DMZ. Unfortunately, the SLAC LAN analyser, Netscout, does not provide sufficient information to be useful (although Netscout may have other products), so we would like to explore other methods.
This report details an investigation of the tcpdump utility.
In our case, looking at an Ethernet segment, tcpdump operates by putting the network card into promiscuous mode in order to capture all the packets on the wire. Tcpdump relies on the BSD Packet Filter (BPF) to collect data from a network interface running in promiscuous mode. BPF receives from the driver a copy of each packet sent and each packet received. The user can set a filter that is applied before packets travel through the kernel up to the user process, so that only the interesting packets are passed up. SunOS uses the Network Interface Tap (NIT), which captures only the packets received on the interface and not the packets sent by the host. Tcpdump still does the job on SunOS, but it performs its own filtering at the user-process level, which means that more data goes through the kernel.
We use tcpdump to measure response time and packet loss percentages. It can also reveal that a distant server is unreachable.
Using tcpdump we can observe the establishment and termination of any TCP connection. TCP uses a special mechanism to open and close connections (discussed later on); we measure the time elapsed between the packets involved in this mechanism in order to know how fast a connection operates.
In this section we look at some examples from an HTTP connection between a host at SLAC (Doris.slac.stanford.edu) and the CERN web server (www.cern.ch).
The following shows tcpdump output (TCP headers only), with the beginning of each TCP segment organized like this:
timestamp source -> destination : flags
The flags field can be any of the following:
S -> SYN (Synchronize sequence numbers - Connection establishment)
F -> FIN (Ending of sending by sender - Connection termination)
R -> RST (Reset connection)
P -> PSH (Push data)
. (No flag is set)
We can also find ACK (Acknowledgement) and URG (Urgent) flags following the ones above.
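The flag letters above correspond to individual bits in the flags byte of the TCP header. As a minimal sketch (the function names are ours, and the output format is a simplification of what tcpdump prints), the mapping can be written as:

```python
# Bit values of the TCP flags byte (byte 13 of the TCP header).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def flag_letters(flags_byte: int) -> str:
    """Return the letters for the S, F, R and P bits, or '.' if none is set."""
    letters = ""
    for bit, letter in ((SYN, "S"), (FIN, "F"), (RST, "R"), (PSH, "P")):
        if flags_byte & bit:
            letters += letter
    return letters or "."


def describe(flags_byte: int) -> str:
    """Append 'ack' and 'urg' after the letters, as in the list above.

    Simplified: real tcpdump output also prints the acknowledgement number.
    """
    parts = [flag_letters(flags_byte)]
    if flags_byte & ACK:
        parts.append("ack")
    if flags_byte & URG:
        parts.append("urg")
    return " ".join(parts)
```

For example, a flags byte of 0x12 (SYN and ACK both set) is described as "S ack", and 0x10 (ACK only) as ". ack".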
Line 1 :
1412042008:1412042008(0) gives the sequence numbers of the segment and the number of data bytes it carries, in the form
starting sequence number:ending sequence number(data bytes)
Line 2 :
This segment acknowledges the previous one; ack 1412042009 means that sequence number 1412042008 (the client's SYN) has been received and that 1412042009 is the next sequence number expected.
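To make the notation concrete, here is a small sketch that splits a sequence field into its three parts and computes the acknowledgement number that should come back. The function names are hypothetical; the only value taken from the trace is 1412042008.

```python
import re

# Matches fields such as "1412042008:1412042008(0)".
SEQ_FIELD = re.compile(r"^(\d+):(\d+)\((\d+)\)$")


def parse_seq_field(field: str):
    """Return (starting sequence number, ending sequence number, data bytes)."""
    match = SEQ_FIELD.match(field)
    if match is None:
        raise ValueError(f"not a sequence field: {field!r}")
    start, end, length = (int(group) for group in match.groups())
    return start, end, length


def expected_ack(start: int, length: int, syn_or_fin: bool) -> int:
    """Acknowledgement number the receiver should send back.

    A SYN or FIN consumes one sequence number even though it carries no data.
    Sequence numbers are 32-bit, so the result wraps modulo 2**32.
    """
    return (start + length + (1 if syn_or_fin else 0)) % 2**32
```

With the SYN of Line 1, parse_seq_field("1412042008:1412042008(0)") gives (1412042008, 1412042008, 0), and expected_ack(1412042008, 0, True) gives 1412042009, the value seen in Line 2.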

What we're interested in is the connection establishment and termination. It takes three segments to open a connection and four to close it.
Mechanism for opening a connection :
The three way handshake can be described as follows :
1. The client sends a SYN segment with the port number of the server it wants to connect to and the client's initial sequence number (Line 1).
2. The server responds with its own SYN segment containing its initial sequence number (Line 2). This segment also carries the ACK flag: it acknowledges the client's SYN by setting its acknowledgement number to the client's initial sequence number plus one (1412042008 + 1).
3. The client acknowledges the server's SYN by sending another segment with no flag other than ACK, shown as "." followed by ack (Line 3).
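The handshake above suggests how a SYN response time can be extracted from a capture: remember each SYN, then look for the SYN from the other side whose acknowledgement number is the client's sequence number plus one. The following is only a sketch under stated assumptions: the Segment record and its field names are ours, and the segments are assumed to have been parsed from the tcpdump output already.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    time: float          # capture time in seconds
    src: str             # hypothetical "host.port" label
    dst: str
    flags: str           # flag letters, e.g. "S", "P", "F" or "."
    seq: int             # starting sequence number
    ack: Optional[int]   # acknowledgement number, None if ACK is not set


def syn_response_times(segments: List[Segment]) -> List[float]:
    """Time between each client SYN and the matching SYN+ACK from the server."""
    pending = {}  # (client, server, expected ack) -> time the SYN was seen
    times = []
    for s in segments:
        if "S" not in s.flags:
            continue
        if s.ack is None:
            # Step 1: client SYN. The server must acknowledge seq + 1.
            pending[(s.src, s.dst, (s.seq + 1) % 2**32)] = s.time
        else:
            # Step 2: server SYN carrying an ack for the client's SYN.
            sent = pending.pop((s.dst, s.src, s.ack), None)
            if sent is not None:
                times.append(s.time - sent)
    return times
```

A retransmitted SYN simply overwrites the earlier entry here, so the measured time starts from the last SYN seen; whether that is the right choice depends on what one wants the figure to represent.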
Mechanism for closing a connection :
TCP is full-duplex, which means that data can flow in either direction independently. Either end can send an end-of-transmission segment (containing the FIN flag) when it has finished transmitting data. This FIN is usually the consequence of the application issuing a close. The first host to issue a FIN is said to perform the active close; the other host performs the passive close. Applications usually do not take advantage of the fact that data can still be sent in one direction after the other direction has been closed.
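As an illustration of the active and passive roles, the sketch below walks through the segments of one connection in capture order and reports who sent the first FIN. The tuple layout and host labels are assumptions made for the example.

```python
def close_roles(segments):
    """Identify the active and passive closer of a single connection.

    segments: (source, destination, flags) tuples in capture order, where
    flags holds the flag letters. Returns a dict with the two roles (None if
    no FIN was seen) and whether both directions were closed.
    """
    active = passive = None
    fin_senders = []
    for src, dst, flags in segments:
        if "F" in flags and src not in fin_senders:
            fin_senders.append(src)
            if active is None:
                # The first FIN seen marks the active close.
                active, passive = src, dst
    return {
        "active": active,
        "passive": passive,
        "both_closed": len(fin_senders) == 2,
    }
```

A complete close shows a FIN from each side, each followed by its acknowledgement, which accounts for the four segments mentioned earlier.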
The aim of this preliminary study is to point out some practical aspects of extensive tcpdump use, such as the size of the flat files and where the data is stored, and to get a first idea of what the results may look like. Using tcpdump we first look at the SYN requests and the SYN acks, measuring the elapsed time between the two for one host and one protocol. Each hour, Doris.slac.stanford.edu generates web traffic (one connection attempt every 4 seconds). Tcpdump then collects 50 packets and stops. These numbers let tcpdump capture around two to three entire conversations (establishment to termination). A perl script does some processing, and from there we can plot the following:
The following graph takes the data for each hour (around 3 connections per hour) and computes an average response time. We generated this traffic ourselves because the subnet (PUP6) where Doris is located does not carry enough regular traffic to work with.
The following graphs display the same thing, but this time taking about 20 SYN segments (and 20 SYN acks) each hour. We generate a new connection every 3 seconds in this example (tcpdump collects about 430 packets in order to get the 20 SYNs).
(Saturday July 31)
(Sunday August 1)
We then turned to the FIN and PUSH segments in order to measure the time between the flag and its ack; this gives us what we will call the FIN response time and the PUSH response time. Just as for the SYN, a perl script reads a flat file containing the tcpdump output, associates each flag with its ack and computes the time in between. Another script then processes everything and gives us an average response time for each hour. The traffic is easily generated with a perl system call using lynx. In the future the sock program could be a good tool for generating any kind of traffic (on any port number, not only TCP 80).
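The processing described above was done with perl scripts that are not reproduced here. As a rough Python sketch of the same idea, assuming the segments have already been parsed into tuples (the layout and function names are ours), pairing a flag with its ack and averaging per hour could look like this:

```python
from collections import defaultdict
from statistics import mean


def flag_response_times(segments, flag):
    """Pair each segment carrying `flag` with the ack that answers it.

    segments: (time, src, dst, flags, seq_start, data_len, ack) tuples in
    capture order; ack is None when the ACK flag is not set.
    Returns a list of (send time, response time) samples.
    Simplification: only an ack equal to the expected number is matched,
    so a later cumulative ack covering several segments is ignored.
    """
    pending = {}   # (sender, receiver, expected ack) -> send time
    samples = []
    for time, src, dst, flags, seq, length, ack in segments:
        if ack is not None:
            sent = pending.pop((dst, src, ack), None)
            if sent is not None:
                samples.append((sent, time - sent))
        if flag in flags:
            # SYN and FIN each consume one sequence number.
            consumed = length + (1 if "S" in flags or "F" in flags else 0)
            pending[(src, dst, (seq + consumed) % 2**32)] = time
    return samples


def hourly_average(samples):
    """Average the response times per hour (hour index = send time // 3600)."""
    buckets = defaultdict(list)
    for sent, response in samples:
        buckets[int(sent // 3600)].append(response)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}
```

Calling flag_response_times(segments, "P") yields the PUSH samples and flag_response_times(segments, "F") the FIN samples; hourly_average then reduces either list to one value per hour.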
Tcpdump allows us to look at TCP in action. We can collect packets on the wire, but the ordering of those packets can seem messy because of the dependence on a lot of factors (over which we have almost no control). Those factors include process scheduling by the OS, network collisions and TCP implementations on both ends. This is why we are looking at SYN, PUSH and FIN flags, as each exchange using those flags is much more predictable and allows us to make some analysis.
You can see all the graphs displaying response times between Doris -> CERN (HTTP) since the beginning of this case study.
Tcpdump is a complement to the PingER utility. Tcpdump gives an overview of which protocols are in use at the time of a ping peak. These graphs show ping response time for Saturday July 31 and Sunday August 1.

The last two graphs are plotted over a full day in GMT, whereas the ones above use SLAC local time (GMT-7). Some correlation can be seen between the trends in the two sets of graphs. The comparison is necessarily imprecise because of what is being compared (the response time of the CERN Web server against that of the CERN ping server).
We then looked at a given day (August 9, 1999) to see if we could find any correlation between the S, F and P flag response times and the ping response time for the same day. 100-byte ping packets were collected for SLAC -> CERN every hour. For each hour, 40 segments each of the S, F and P flags were collected and the response time was then averaged.
Correlation table :

The moderate correlations are marked in green, the strong correlations in yellow.
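The report does not say which correlation measure was used. Assuming it is the ordinary Pearson coefficient between two hourly series (for example PUSH response time and ping response time), it could be computed along these lines:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series.

    Assumption: this is the measure behind the correlation table; the
    original report does not state it.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

The result lies between -1 and 1; the thresholds separating moderate from strong correlation in the table are not given in the report, so none are assumed here.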
To make use of these results we should concentrate on the Push segment. Like a ping request, a segment with the push flag asks for an answer right away. Using the push flag to measure response time has several advantages. On a busy segment of the network we can, without creating any additional traffic, collect the relevant Pushes for any protocol we are interested in and calculate a response time. Since pings are sometimes blocked for security reasons, this would be a good way to work around that. But the real interest of looking at the P flag is the per-protocol detail it gives us, since we can select which port numbers to look at.
The following graphs show the response time for S, F, and P flags vs. PingER, followed by the linear regression of the data for all three flags.
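For reference, a linear regression of one series of response times against another can be obtained with an ordinary least-squares fit. This is a generic sketch, not the procedure used to produce the graphs, and the function name is ours:

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Here xs would be the hourly ping response times and ys the hourly response times for one flag.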




We are now looking at the R flag, which resets the connection. To list all the segments that carry it, we set the tcpdump filter as follows: tcpdump 'tcp[13] & 4 != 0'
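The filter works because byte 13 of the TCP header holds the flags and the RST bit has the value 4. The sketch below (helper names are ours) builds the same kind of expression for any flag and applies the equivalent test to a raw TCP header:

```python
# Values of the individual bits in byte 13 of the TCP header.
TCP_FLAG_BITS = {"FIN": 1, "SYN": 2, "RST": 4, "PSH": 8, "ACK": 16, "URG": 32}


def flag_filter(name: str) -> str:
    """Build a tcpdump filter expression selecting segments with the flag set."""
    return f"tcp[13] & {TCP_FLAG_BITS[name]} != 0"


def has_flag(tcp_header: bytes, name: str) -> bool:
    """Apply the same test to a raw TCP header (at least 14 bytes long)."""
    return (tcp_header[13] & TCP_FLAG_BITS[name]) != 0
```

flag_filter("RST") reproduces the expression used above, and flag_filter("PSH") would give the corresponding filter for Push segments.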
Revised 17 August 1999
URL: http://www-iepm.slac.stanford.edu/monitoring/passive/tcpdump.html
Comments to iepm-l@slac.stanford.edu