Bulk File Transfer with Compression Measurements
We verified that bbcp was indeed setting the window sizes correctly by capturing packets with the Solaris snoop command on pharlap and inspecting the stream-initiation SYN and SYN/ACK packets.
For each copy we noted the transfer rate reported by bbcp; the file size; the window size; the number of streams; the compression factor and the compression achieved; the Unix user, system/kernel and real times; and the source- and target-host cpu usage reported by bbcp. We also noted the ping times both loaded (while bbcp was running) and unloaded (while it was not). Between measurements we slept for the duration of the previous bbcp run, both to limit the load imposed by bbcpload.pl and to allow the unloaded ping measurements to be made. For each remote host we also recorded the operating system and version, together with the number of cpus and their MHz. The maximum number of streams allowed by bbcp was 64. The host at SLAC (pharlap) was a Sun E4500 with 4 cpus running at 336 MHz and a Gbps Ethernet interface, running Solaris 5.8.
The source file was read from /tmp; on pharlap /tmp is stored in swap space, so reads come effectively from memory. The source file was a 60 MByte BaBar Objectivity file. The destination file was always written to /dev/null.
All measurements were made for a duration of approximately 10 seconds (see Measurement Duration for more details).
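The pacing described above (sleep after each copy for as long as the copy took, so the probe is idle at least half the time and unloaded pings can be taken in the gaps) could be driven by a loop like the following. This is a hypothetical sketch in the spirit of bbcpload.pl, not the script itself; the function name, placeholder file names and argument handling are ours.

```python
import subprocess
import time

def run_measurements(hosts, bbcp_args, bbcp="bbcp"):
    """Hypothetical driver loop: after each bbcp copy, sleep for as long
    as the copy took, limiting the probe's duty cycle to about 50% and
    leaving quiet gaps for unloaded ping measurements."""
    for host in hosts:
        t0 = time.time()
        # "source_file" and the target are placeholders; real runs would
        # pass window size (-w), streams (-s) and compression (-c) in bbcp_args
        subprocess.run([bbcp, *bbcp_args, "source_file", f"{host}:/dev/null"])
        time.sleep(time.time() - t0)  # idle for as long as we were busy
```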
The average MHz-seconds used for a compression factor of 0 (no compression) was 8.3 +- 0.91 MHz-s, and for a compression factor of 1 (compression = 6.9) was 57.3 +- 0.46 MHz-s.
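The MHz-seconds metric is simply CPU time weighted by the host's clock rate, which lets cpu costs be compared across hosts of different speeds. A minimal sketch (the function name is ours, not bbcp's):

```python
def mhz_seconds(user_s: float, sys_s: float, cpu_mhz: float) -> float:
    """CPU cost of a copy normalized by clock rate: the user plus
    system/kernel seconds (as reported by Unix time) times the cpu's MHz."""
    return (user_s + sys_s) * cpu_mhz
```

For example, on pharlap's 336 MHz cpus, 0.75 s of total cpu time corresponds to 252 MHz-seconds.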
Next we made measurements of bbcp throughput to 22 remote hosts with compression factors of 0 and 1, and with an optimal TCP window size and number of streams selected for each host. The results are shown to the right, with the compression factor of 0 shown in blue and the compression factor of 1 shown in red diagonal hatching. It can be seen that the maximum compressed throughput is about 50 Mbits/s. If the uncompressed throughput exceeds this rate (as for NERSC2, ANL, LANL, Caltech, SDSC, Stanford, NERSC, Mich, Wisc and LBL) then there is no improvement from using compression. If the uncompressed throughput is below about 50 Mbits/s then compression can help (by more than a factor of 4 in the case of KEK, which has only a 10 Mbit/s bottleneck bandwidth between it and SLAC). When using a compression factor of 1 (compression of 6.7), the average compressed bbcp throughput is 58 +- 0.46 Mbits/s.
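That rule of thumb (compress only when the network, not the cpu, is the bottleneck) can be sketched as follows. The 50 Mbits/s ceiling is taken from the measurements above; the constant and function names are ours.

```python
COMPRESSED_CEILING_MBPS = 50.0  # observed ceiling on compressed bbcp throughput

def use_compression(uncompressed_mbps: float,
                    ceiling_mbps: float = COMPRESSED_CEILING_MBPS) -> bool:
    """Compression only helps when the raw path rate is below the
    cpu-imposed ceiling on compressed throughput."""
    return uncompressed_mbps < ceiling_mbps
```

By this test a 10 Mbit/s path such as SLAC-KEK benefits from compression, while a 400 Mbits/s path does not.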
The consistency of the compressed throughput indicates a common cause. To ascertain whether that common cause was the measuring host (pharlap), we repeated the measurements from antonia, a host with two 532 MHz cpus running Linux 2.4 and a Gigabit Ethernet NIC, and from hercules, a host with two 1131 MHz cpus running Linux 2.4 and two Gigabit Ethernet NICs. Comparing the antonia results with pharlap's, it is apparent that the maximum uncompressed throughput is reduced from about 400 Mbits/s to about 165 Mbits/s. This is believed to be because the pharlap source file is read from memory (/tmp is in swap space), whereas on antonia and hercules it is read from disk (/dev/sda2 on antonia and hercules, and /dev/sda9 on testlnx05).
For the 1131 MHz cpus (hercules) it can be seen that uncompressed throughputs of over 400 Mbits/s are achievable, and the median compressed throughput is over 140 Mbits/s. To understand these compressed throughput values better, we measured the system time gzip took to compress the 380 MByte Objectivity file on each measuring host and expressed this as Mbits/s. The table below compares the results from all the measuring hosts. There is reasonable agreement between the median bbcp compressed throughput and the gzip throughputs, with gzip typically 10-17% lower. This reduction may be due to the gzip source and destination being on the same host, whereas the bbcp measurements used separate source and destination hosts. To pursue this further we used bbcp to compress and copy the above file from and to the same host, and measured the source-process cpu seconds and Mbits/s. The graph to the right of the table shows the median bbcp compressed (compression factor 1, compression 6.9) throughput from each measuring host versus the MHz of that host's cpu:
Host | OS | # cpus | MHz | NIC Mbps | Median compressed Mbits/s | Stdev | Gzip Mbits/s | Bbcp src=tgt Mbits/s | c/x
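The gzip comparison above can be reproduced in miniature by timing a one-pass compression and expressing the input consumed per cpu second as Mbits/s. This is a sketch, not the report's procedure; in particular, mapping gzip level 1 to bbcp compression factor 1 is our assumption.

```python
import gzip
import time

def compress_throughput_mbps(path: str, level: int = 1) -> float:
    """Compress a file in one pass and report Mbits of input consumed
    per cpu second, analogous to timing gzip(1) on the measuring host."""
    data = open(path, "rb").read()
    t0 = time.process_time()                    # cpu time, like user+sys from time(1)
    gzip.compress(data, compresslevel=level)    # level 1 ~ bbcp factor 1 (assumed)
    cpu_s = time.process_time() - t0
    return len(data) * 8 / 1e6 / cpu_s
```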
For an uncompressed copy, the average ratio of source-host MHz-seconds to target-host MHz-seconds was 1.2 +- 0.3. The average target-host MHz-seconds for Solaris (versions 5.6, 5.7 and 5.8, a total of 5 hosts) was 5460 +- 367, and for Linux (2.2 and 2.4, a total of 20 hosts) was 7805 +- 745. There was little variation among the various Solaris versions or among the various Linux versions.
We also looked for a correlation between target MHz-seconds and either the throughput in Mbits/s or the number of streams, but found little evidence of any correlation. See the plots below:
Looking at the bbcp compression throughput graph for hercules above, we see that the Stanford compressed throughput (96 Mbits/s) is much lower than the median (about 143 Mbits/s), while its uncompressed bbcp throughput is about 90 Mbits/s, so there is no lack of network bandwidth. The Stanford host has a single 299 MHz cpu running Linux 2.2. Thus, taking its c/x ratio to be about 3.32, the best compressed throughput is limited by its cpu to about 299/3.32, i.e. roughly 90 Mbits/s.
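Reading the c/x column as the cpu cost in MHz-seconds per Mbit transferred (our interpretation; the report does not define the units), the cpu-imposed ceiling works out as:

```python
def cpu_limited_mbps(cpu_mhz: float, c_over_x: float) -> float:
    """Ceiling on compressed throughput when the cpu, not the network,
    is the bottleneck; c/x is taken as MHz-seconds of cpu per Mbit."""
    return cpu_mhz / c_over_x

print(cpu_limited_mbps(299, 3.32))  # Stanford's single 299 MHz cpu: ~90 Mbits/s
```

This ceiling sits close to the 96 Mbits/s actually observed, consistent with the cpu being the limiting resource.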
By fixing the window size at 16 KBytes and varying the number of streams, with no compression, for bbcp copies from SLAC to ANL, we were able to increase the file-copy throughput roughly linearly with the number of streams, from about 1.3 Mbits/s to over 115 Mbits/s. Using the same window size and varying the number of streams for compression factors of 1 through 9, we were able to visualize the effectiveness of compression for varying uncompressed file-copy rates. This is seen in the graph to the right: for fewer than 18 streams the uncompressed throughput is below about 50 Mbits/s, and for 18 or more streams it exceeds 50 Mbits/s. Thus, for a 16 KByte window between SLAC and ANL, compression is effective in increasing throughput for fewer than 18 parallel streams. It can also be seen that a compression factor of 1 is the most effective.
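The roughly linear growth with streams follows from window-limited TCP: each stream can keep at most one window of data in flight per round trip. A sketch of that relation (the ~80 ms SLAC-ANL round-trip time is our assumption; the report does not state it):

```python
def tcp_throughput_mbps(n_streams: int, window_bytes: int, rtt_s: float) -> float:
    """Aggregate window-limited throughput of n parallel TCP streams:
    each stream sends at most one window per round trip."""
    return n_streams * window_bytes * 8 / rtt_s / 1e6

# One 16 KByte-window stream over an assumed 80 ms RTT gives ~1.6 Mbits/s,
# in the neighbourhood of the ~1.3 Mbits/s observed for a single stream.
```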