UDP Mon and TCP multi-stream

An analysis

By Saad Ansari (saad@slac.stanford.edu)

Contents

Introduction

UDP BW Mon

Test-Bed

Scenario 1

Scenario 2

Scenario 3

Other Statistics

TCP BW Mon / Multi-stream

Scenario 1

Scenario 2

Conclusion

 

Introduction

UDP Mon and TCP multi-stream are parts of a software package that measures data rates by varying traffic parameters, most visibly the inter-packet gap.

The software follows a “request-response” design, analogous to a client-server architecture. It runs as a user-mode process and neither modifies the kernel nor places stringent requirements on it. Essentially, the client (requester) sends requests to the server (responder) while varying several traffic and system characteristics. These include the following (a rough sketch of how they might be represented in code follows the list):

·        Packet length. This is the length of a packet, not including its headers.

·        Packet-wait time. This is the time the client waits before sending the next packet.

·        Number of probes. This is the total number of packets the client sends for each measurement.

·        Packet length increments. This specifies the increment by which the length of the packet should be increased with each subsequent transmission.

·        Packet-wait time increments. This specifies the increment by which the wait time between packets should be increased with each subsequent transmission.

·        Port number. This specifies the port number on which the communication will take place.

·        Buffer size. This is the maximum buffer the system will allocate to handle incoming packets, thereby constraining the packet length to be less than or equal to this value. It defaults to 65536 bytes (64 KB) and the default value was used in all test cases.
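As a rough sketch only (the names below are hypothetical and not taken from the UDP Mon source), these request parameters map naturally onto a small C structure:

#include <stddef.h>

/* Hypothetical sketch of the per-measurement request parameters described
 * above; field names and comments are illustrative only. */
struct probe_params {
    size_t   pkt_len;       /* payload length in bytes, excluding headers    */
    unsigned wait_us;       /* wait before sending the next packet, in µs    */
    unsigned n_probes;      /* number of packets sent per measurement        */
    size_t   len_incr;      /* packet-length increment between measurements  */
    unsigned wait_incr_us;  /* wait-time increment between measurements      */
    unsigned short port;    /* port on which the communication takes place   */
    size_t   buf_size;      /* receive buffer size, defaulting to 65536      */
};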

The server responds by echoing the request back. From the echoed data, the client calculates various statistics, such as:

·        Received Data Rate. This is the data rate that the client perceives for the entire communication, i.e. both transmit and receive.

·        Send Data Rate. This is the rate at which the client can send data to its NIC interface.

·        Latency. This is the time taken for a packet to go to its destination and be echoed back.

·        Time/frame. This shows the time taken to process an outgoing/incoming packet.

Of these, the received data rate is the most prominent. It is calculated with a simple formula:

Received data rate = (data per packet in bytes * 8 * number of packets sent * 2) / (time taken to send and receive the data)

The numerator is the total data, in bits, sent and received during the measurement, which is double the data initially sent by the client because every packet is echoed back.
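As a minimal sketch of this calculation (illustrative only, not the UDP Mon source; the example numbers are made up):

#include <stdio.h>

/* Received data rate in Mbit/s, following the formula above. The factor of
 * 2 accounts for every packet being echoed back, so the client both sends
 * and receives the data. */
static double received_rate_mbps(double payload_bytes, double n_packets,
                                 double elapsed_seconds)
{
    return payload_bytes * 8.0 * n_packets * 2.0 / elapsed_seconds / 1e6;
}

int main(void)
{
    /* Example: 300 probes of 1450 bytes sent and echoed back in 12 ms. */
    printf("%.0f Mbit/s\n", received_rate_mbps(1450.0, 300.0, 0.012));
    return 0;
}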

The following sections highlight the main utilities in the package and assess their usefulness by analyzing the results obtained from running them.

UDP BW Mon

UDP BW Mon is a utility for estimating the capacity of a link. It follows the architecture described above and uses the parameters listed there to generate traffic, from which various statistics are extracted for analysis.

Varying the inter-packet gap did not reveal much, since cross-traffic introduces non-deterministic gaps anyway and the specified gap is not preserved on the wire. Ideally, the inter-packet gap should be close to 0 if the objective is to measure raw throughput on the link; however, values of up to 10 µs produced similar results. With even higher values, throughput fell sharply, as expected, because longer gaps mean lower utilization of the link capacity.
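As a rough illustration of why large gaps hurt (ignoring serialization and processing time), a 1450-byte payload sent once every 50 µs cannot leave the sender faster than

(1450 bytes * 8 bits/byte) / 50 µs ≈ 232 Mbps

a ceiling far below the capacity of the link used in the tests that follow.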

Packet size was also varied. Larger packets (values close to the 1500-byte MTU) produced close-to-optimal bandwidth results, while smaller packets showed low utilization.
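One likely contributor is fixed per-packet cost. Counting only the UDP and IP headers (8 + 20 = 28 bytes, assuming no IP options), the protocol overhead alone is roughly

28 / (1450 + 28) ≈ 1.9% for a 1450-byte payload

28 / (100 + 28) ≈ 21.9% for a 100-byte payload

and the fixed per-packet processing cost at both hosts is paid far more often per megabyte of useful data when payloads are small.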

UDP Mon was also able to pick up transient network congestion and this was reflected as a trough in all the measurements.

Test-Bed

Initially, tests were run between several hosts, but most links carried confounding cross-traffic that skewed results and made it difficult to discover patterns in the resulting statistics. Looping packets back to the same machine was also unsuitable, since the data would never hit the wire. A reasonably fast link with mostly deterministic characteristics and little variation was therefore required.

The path to the server (responder) had a bottleneck link of 622 Mbps. The maximum achievable rate should therefore be close to this value, after discounting link-layer headers, processing delay and, potentially, queuing delay.

Client Machine: Hercules.slac.stanford.edu

Server Machine: pdsfgrid2.nersc.gov

Physical distance between client and server: 45 miles; RTT = 2 ms

Scenario 1

The first scenario had the following traffic parameters:

Packet length = 1450 bytes

Inter-packet wait times = 0, 10, 20, 50 µs

Number of probes per measurement[1] = 300

Results

Figure 1 Hercules to pdsfgrid2, packet size 1450 bytes

The graph shows that for wait times of 0-20 µs, the data-rate estimates followed the same pattern. Transient network delays affected most of the runs and appeared as a trough in all plots for that measurement (consider measurement 16). There are a few outliers, but it is reasonable to ignore these, as they can be attributed to transient network congestion. The result also shows that a 50 µs gap gives highly inaccurate bandwidth measurements and therefore has little utility as far as raw data-rate measurement is concerned. Data-rate measurement is done at the receiver end (server side).

Scenario 2

The second scenario had the following traffic parameters:

Packet length = 1000 bytes

Inter-packet wait times = 0, 10, 20, 50 µs

Number of probes per measurement = 300

Results

Figure 2 Hercules to pdsfgrid2, packet size 1000 bytes

This graph adds a little more to the analysis. Varying the wait time is evidently not the only factor affecting the accuracy of the measurement; the packet size also starts to matter. With a packet size of 1000 bytes, the results show that gaps greater than 10 µs fail to give precise values. An interesting difference from the previous measurement is that a 10 µs gap gives much higher data-rate estimates than a 0 µs gap, and these estimates are closer to the values seen in the earlier tests. To check that this was not transient behavior, I ran the same test at a different time from another machine to the same server and obtained the following graph.

Figure 3 Antonia to pdsfgrid2, packet size 1000 bytes

This graph also shows better throughput for a 10 µs wait than for packets sent back to back. I was unable to come up with a plausible explanation for this behavior.

Scenario 3

In order to verify the claim that smaller packet sizes will show lower throughput, the following traffic parameters were used:

Packet lengths = 200 and 100 bytes

Inter-packet wait times = 0, 10, 20, 50 µs

Number of probes per measurement = 300

Results

Figure 4 Hercules to pdsfgrid2, packet size 200 bytes

Figure 5 Hercules to pdsfgrid2, packet size 100 bytes

Both tests show that lowering the packet size decreases the throughput. There is now much greater separation between the 0 and 10 µs measurements, and higher wait times give very low data-rate estimates.

Other Statistics

UDP Mon also supplies an estimate of the data rate at which the client is able to send requests. This result is much higher than the bandwidth estimates for the link.

Figure 6 Comparing machine and link data rates from hercules to pdsfgrid2

Since the link's data rate was much lower than the rate at which data was supposedly being pumped onto the network, one would expect packets to build up at the bottleneck and some to be dropped. As this was a UDP flow, any dropped packets would not be retransmitted. The UDP Mon application does, however, track packet sequence numbers and counts any packets missing from the sequence it expects, and these tests showed that no packets were dropped. This seems to contradict the reasoning above, but closer inspection of the code shows that the machine data-rate calculation does not account for the wait time between packets, which is why it reports a much higher figure than the link-rate estimate.

Another interesting observation is that sudden drops in link throughput do not always reflect transient network delay; they may also occur because the machine has slowed down due to a context switch or CPU scheduling. This would explain some of the sudden drops in throughput in the measurements recorded in this document.
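A minimal sketch of the distinction (illustrative only, not the UDP Mon code; the timings below are made up): the machine data rate divides by the time actually spent sending, while the link-style estimate divides by the full elapsed time, including the configured waits.

#include <stdio.h>

/* Convert a byte count and a duration into Mbit/s. */
static double rate_mbps(double bytes, double seconds)
{
    return bytes * 8.0 / seconds / 1e6;
}

int main(void)
{
    double bytes   = 300 * 1450.0;             /* 300 probes of 1450 bytes         */
    double t_send  = 0.0017;                   /* time spent inside the send calls */
    double t_total = t_send + 300 * 50e-6;     /* plus a 50 µs wait per packet     */

    printf("machine data rate:   %.0f Mbit/s\n", rate_mbps(bytes, t_send));
    printf("link-style estimate: %.0f Mbit/s\n", rate_mbps(bytes, t_total));
    return 0;
}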

Another test measured the maximum data rate at which a machine could transmit. It revealed that this statistic is directly related to the processing capability of the machine, as the following graphs corroborate.

Figure 7 Hercules to Pdsfgrid2, Machine Data Rate

Figure 7 shows that Hercules can send at data rates of up to about 2100 Mbps. Transmitting bigger packets was more efficient than smaller ones, perhaps because less time is spent splitting the same amount of data into packets. The next graph shows the same test run from a slower machine, which shows a clear fall in machine throughput.

Figure 8 Antonia to Pdsfgrid2, Machine Data Rate

TCP BW Mon / Multi-stream

This software is very similar to the UDP facility described earlier, but it runs over TCP. It also adds one more traffic parameter: the number of parallel TCP streams used to transmit the data.
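As a rough sketch of the idea (illustrative only, not the TCP Mon source; the address, port and sizes are placeholders), the client opens several TCP connections to the responder and spreads the probe data across them:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define N_STREAMS 10      /* number of parallel TCP connections */
#define PKT_LEN   1450    /* payload bytes per write            */
#define N_PROBES  300     /* total payloads to send             */

int main(void)
{
    int sock[N_STREAMS];
    char payload[PKT_LEN];
    struct sockaddr_in srv;

    memset(payload, 'x', sizeof payload);
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port   = htons(5001);                     /* placeholder port    */
    inet_pton(AF_INET, "192.0.2.1", &srv.sin_addr);   /* placeholder address */

    /* Open all streams up front. */
    for (int i = 0; i < N_STREAMS; i++) {
        sock[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (sock[i] < 0 || connect(sock[i], (struct sockaddr *)&srv, sizeof srv) < 0) {
            perror("connect");
            return 1;
        }
    }

    /* Spread the payloads round-robin; each stream has its own congestion
     * window, so the aggregate ramps up faster and a loss on one stream
     * does not stall the others. */
    for (int p = 0; p < N_PROBES; p++)
        if (write(sock[p % N_STREAMS], payload, sizeof payload) < 0)
            perror("write");

    for (int i = 0; i < N_STREAMS; i++)
        close(sock[i]);
    return 0;
}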

As there was no way to control the maximum window size (i.e. set it to a high value) or to tweak the underlying TCP stack, throughput remained small for these transfers. In addition, because of TCP's AIMD behavior, the flows being measured either ended while still in slow-start or spent much of their lifetime in slow-start ramping up to the maximum window size; this initial ramp-up skewed the data-rate estimates, and the rates predicted by UDP in the earlier tests were not achieved. Unless there is some mechanism to modify the underlying TCP stack, it therefore does not seem feasible to use TCP Mon to measure the maximum available TCP throughput for very high-speed connections and bulk transfers. Also, more than 39 simultaneous streams caused a segmentation error.
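To put the window limit in perspective (the actual socket buffer sizes used in these tests were not recorded, so the 64 KB figure below is only an assumption), a single TCP stream can carry at most about one window of data per round trip:

65536 bytes * 8 bits/byte / 2 ms ≈ 262 Mbps

whereas filling a 622 Mbps path at a 2 ms RTT needs roughly 622 Mbps * 2 ms ≈ 155 KB of data in flight. Parallel streams are one way to put more data in flight without raising the per-socket window.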

The following tests were run to assess the utility of this software (the same test bed was used, for comparability).

Scenario 1

Packet length = 1450 bytes

Inter-packet wait times = 0, 10, 20, 50 µs

Number of probes per measurement = 300

Number of parallel streams = 1, 10, 20, 30, 35

Results

Figure 9 TCP Mon - Hercules to Pdsfgrid2, number of streams = 1

Figure 10 TCP Mon - Hercules to Pdsfgrid2, number of streams = 10

Figure 11 TCP Mon - Hercules to Pdsfgrid2, number of streams = 20

Figure 12 TCP Mon - Hercules to Pdsfgrid2, number of streams = 30

Figure 13 TCP Mon - Hercules to Pdsfgrid2, number of streams = 35

The results show that increasing the number of parallel TCP streams increases the total throughput. However, they also show that with a larger number of streams there is greater variation in the throughput. Consider the graph with 35 TCP streams: apart from the three outliers where the data rate drops below 50 Mbps, there is greater variation in the throughput across the different wait times. To investigate further how throughput reacted to an increasing number of connections, the next scenario was created.

Scenario 2

Packet length = 1450 bytes

Inter-packet wait time = 0 µs

Number of probes per measurement = 300

Number of parallel streams = 1-38

Results

Figure 14 Varying the number of TCP streams

This graph shows that although throughput generally increases with the number of connections, the gain tapers off at the end. The likely reason is the CPU cycles and memory consumed in managing the additional TCP connections. The software also had a hard limitation: it did not allow more than 39 TCP connections, exiting with a segmentation fault beyond that, possibly because it uses statically sized buffers.
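Purely to illustrate the suspected cause (the TCP Mon source was not examined closely enough to confirm this), per-stream state held in a statically sized array corrupts memory once the stream count exceeds the compile-time bound unless the index is checked, as in this hypothetical sketch:

#include <stdio.h>

#define MAX_STREAMS 40    /* hypothetical compile-time limit, not TCP Mon's */

struct stream_state { int fd; long bytes_sent; };
static struct stream_state streams[MAX_STREAMS];

/* Without the bound check below, streams[n] for n >= MAX_STREAMS writes past
 * the array, corrupting memory and typically ending in a segmentation fault,
 * which would explain a hard cap on the number of streams. */
static int add_stream(int n, int fd)
{
    if (n < 0 || n >= MAX_STREAMS)
        return -1;
    streams[n].fd = fd;
    streams[n].bytes_sent = 0;
    return 0;
}

int main(void)
{
    printf("stream 45 accepted: %s\n", add_stream(45, 3) == 0 ? "yes" : "no");
    return 0;
}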

Conclusion

The software is useful for revealing available link bandwidth, and the results are corroborated by other tests (namely tests run by Jiri). However, investigating the utility of varying inter-packet gaps is beyond the scope of this document; one option would be to vary the inter-packet gaps randomly, emulating cross-traffic, and study the results.

Documentation detailing how the calculations were made was sparse, so the code had to be studied to find out. Although simple in essence, a single main function with goto's sprinkled inside loops and other control structures made understanding the flow much harder than it should have been. TCP Mon did not prove very useful either, unless the TCP window size could be modified. Finally, although TCP has a checksum mechanism, UDP Mon assumed that whatever packets it received were correct. The checksum calculation may have been left out deliberately to avoid introducing further delay into the measurements, but the program offered no remedy if a packet was corrupted on the network.



[1] Each probe consisted of packets of the specified length. Measurements for all probes were aggregated and averaged over the total number of probes sent.