By Saad Ansari saad@slac.stanford.edu
Tsunami is a file transfer protocol that uses UDP as its transport mechanism. Its architecture is analogous to conventional FTP: a control session authenticates the client and negotiates the connection, and a separate data session carries the target file. Tsunami follows a typical client-server model. The client requests a specific file from the server over the TCP control port. The server forks a thread to handle this request and goes back to waiting for the next connection. The forked thread checks that the file exists and then begins transferring it to the client on a known UDP port.
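The control-session structure can be sketched with the Python standard library's `socketserver.ThreadingTCPServer`, which runs each accepted connection in its own handler thread while the main loop keeps accepting. This is a minimal sketch under stated assumptions, not Tsunami's code: the one-line request and the `OK <size>` / `ERR` replies are a hypothetical wire format, the in-memory `files` mapping stands in for the server's directory, and the UDP data session is only indicated by a comment.

```python
import socket
import socketserver
import threading

# Hypothetical control exchange for illustration only: the client sends a
# file name terminated by "\n"; the server replies "OK <size>" or "ERR ...".
# Tsunami's real control protocol differs.


class ControlHandler(socketserver.StreamRequestHandler):
    """Handles one control connection; the server runs it in its own thread."""

    def handle(self):
        name = self.rfile.readline().decode().strip()
        data = self.server.files.get(name)
        if data is None:
            self.wfile.write(b"ERR no such file\n")
            return
        # A real server would now start sending blocks to the client over UDP.
        self.wfile.write(f"OK {len(data)}\n".encode())


class ControlServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # handler threads do not block interpreter exit

    def __init__(self, address, files):
        super().__init__(address, ControlHandler)
        self.files = files  # hypothetical in-memory stand-in for served files


def start_server(files, host="127.0.0.1", port=0):
    """Run the accept loop in a background thread; port=0 picks a free port."""
    server = ControlServer((host, port), files)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_file(host, port, name):
    """Client side: ask the server for a file over the TCP control connection."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(name.encode() + b"\n")
        with sock.makefile("rb") as reply:
            return reply.readline().decode().strip()
```

For example, after `server = start_server({"a.bin": b"hello"})`, the call `request_file("127.0.0.1", server.server_address[1], "a.bin")` returns `"OK 5"`, and the accept loop stays free for the next client while the handler thread answers.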

Figure 1 Tsunami Protocol Flow
Tsunami transfers files over UDP block by block, with a variable block size. Since UDP does not guarantee reliable delivery, Tsunami implements its own reordering and retransmission mechanism. When the client detects a dropped, delayed or out-of-order block, it sends a retransmission request for that particular block to the server over the TCP control connection. The server keeps no state about the file transfer, so every retransmission must be explicitly requested by the client.
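A minimal sketch of why the server can stay stateless: any block can be regenerated from the file and a block index alone, so the client only has to name the blocks it is missing. It assumes blocks numbered from 0 with one block size per transfer (only the last block may be short); all function names are hypothetical. `server_step` models one server iteration that services a pending retransmission request before sending the next new block.

```python
def block_count(file_size, block_size):
    """Number of blocks needed to cover the file (ceiling division)."""
    return (file_size + block_size - 1) // block_size


def get_block(data, block_size, index):
    """Server side: a block depends only on the file and its index,
    so re-sending one needs no per-transfer state."""
    start = index * block_size
    return data[start:start + block_size]


def missing_blocks(expected, received_index):
    """Client side: a block that arrives ahead of the expected one implies
    every block in between was dropped or delayed and should be re-requested."""
    return list(range(expected, received_index))


def server_step(requests, next_index, total_blocks):
    """One server iteration: service a retransmission request first,
    otherwise send the next new block. Returns (block_to_send, next_index)."""
    if requests:
        return requests.pop(0), next_index
    if next_index < total_blocks:
        return next_index, next_index + 1
    return None, next_index  # nothing left to send
```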
Because blocks can arrive out of order, they may need to be re-sequenced at the client's end. It is therefore the client's responsibility to maintain state for the blocks it receives.
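The per-block state the client keeps for re-sequencing can be sketched as follows: blocks that arrive early are held in a dictionary keyed by block index and released in order once the gap before them is filled. `Reassembler` is a hypothetical name for this minimal sketch, and the `output` list stands in for the file being written.

```python
class Reassembler:
    """Holds out-of-order blocks and releases them strictly in sequence."""

    def __init__(self):
        self.next_index = 0  # first block not yet delivered in order
        self.pending = {}    # index -> data for blocks that arrived early
        self.output = []     # blocks delivered in order (stand-in for the file)

    def receive(self, index, data):
        if index < self.next_index or index in self.pending:
            return  # duplicate of a block already written or already held
        self.pending[index] = data
        # Release the longest in-order run that is now complete.
        while self.next_index in self.pending:
            self.output.append(self.pending.pop(self.next_index))
            self.next_index += 1
```

Feeding it blocks in the order 1, 0, 3, 2 (plus a duplicate 2) produces the output `A B C D` and leaves nothing pending.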

Figure 2 Tsunami client-side processing
The client maintains a “ring” buffer, a circular queue in memory into which it places incoming blocks for reassembly and re-sequencing. Because more than one thread accesses this queue, access to it is serialized by a mutual-exclusion lock to prevent race conditions. As a result, the client incurs a software locking overhead for every block that is written to or read from the ring. A separate disk thread periodically reads blocks from the ring buffer and writes the file out to disk.
Whenever the client receives a block, it checks whether this is the block it was expecting (i.e. whether it arrived in sequence). If not, the client queues a retransmission request for the missing block(s). Every 50 iterations, the client sends any requests waiting in the retransmission queue to the server. At every iteration, the server checks whether any retransmission requests have arrived; if so, it services them first. Otherwise, it sends the next block.
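A minimal Python sketch of this client-side structure, assuming a fixed-capacity ring of slots guarded by a single `threading.Condition` (a lock plus wait/notify). The class and function names, the capacity, and the use of `None` as an end-of-transfer marker are assumptions for illustration; only the per-block locking, the separate disk thread, and the 50-iteration retransmission flush come from the description above.

```python
import threading
from collections import deque

RETRANSMIT_INTERVAL = 50  # iterations between flushes of the request queue


class RingBuffer:
    """Fixed-capacity circular queue shared by the network and disk threads.

    Every push and pop takes the same lock, which is the per-block locking
    overhead described in the text.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0    # index of the oldest occupied slot
        self.count = 0   # number of occupied slots
        self.cond = threading.Condition()  # a mutex plus wait/notify

    def push(self, item):
        with self.cond:
            while self.count == len(self.slots):
                self.cond.wait()  # ring full: wait for the disk thread
            tail = (self.head + self.count) % len(self.slots)
            self.slots[tail] = item
            self.count += 1
            self.cond.notify_all()

    def pop(self):
        with self.cond:
            while self.count == 0:
                self.cond.wait()  # ring empty: wait for the network thread
            item = self.slots[self.head]
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
            self.cond.notify_all()
            return item


def disk_thread(ring, sink):
    """Drain blocks from the ring into `sink` until the None end marker."""
    while (item := ring.pop()) is not None:
        sink.append(item)  # stand-in for writing the block to the file


def receive_loop(arrivals, ring, send_requests):
    """Network side: queue a retransmission for every gap and flush the
    queue to the server once every RETRANSMIT_INTERVAL iterations."""
    expected, pending = 0, deque()
    for iteration, (index, data) in enumerate(arrivals, start=1):
        if index > expected:
            pending.extend(range(expected, index))  # blocks skipped over
        elif index in pending:
            pending.remove(index)  # late arrival no longer needs re-sending
        expected = max(expected, index + 1)
        ring.push((index, data))
        if iteration % RETRANSMIT_INTERVAL == 0 and pending:
            send_requests(list(pending))
            pending.clear()
    ring.push(None)  # end-of-transfer marker for the disk thread
    return list(pending)  # requests still unsent when the loop ended
```

Taking the lock on every push and pop keeps the two threads from corrupting `head` and `count`, at exactly the per-block cost the text mentions; batching requests means a gap is reported at most 50 iterations after it is noticed.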
The client also sends its current error rate with each retransmission request, and the server echoes this value locally. At the end of the file transfer, the client displays the transfer statistics. The reported throughput is calculated for the end-to-end, disk-to-disk transfer.
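A retransmission request carrying the client's error rate can be sketched with Python's `struct` module. The wire layout (a 16-bit request type, a 32-bit block number and a 32-bit error rate in hundredths of a percent, all in network byte order), the type code, and the function names are assumptions for illustration, not Tsunami's documented format.

```python
import struct

# Hypothetical layout: uint16 request type, uint32 block, uint32 error rate,
# network byte order. Tsunami's actual request structure may differ.
REQUEST = struct.Struct("!HII")
REQUEST_RETRANSMIT = 0  # hypothetical type code


def error_rate(missing, received):
    """Fraction of blocks in the last interval that had to be re-requested."""
    total = missing + received
    return missing / total if total else 0.0


def encode_request(block, rate):
    # Scale the fraction to hundredths of a percent so it fits an integer field.
    return REQUEST.pack(REQUEST_RETRANSMIT, block, round(rate * 10_000))


def decode_request(payload):
    kind, block, scaled = REQUEST.unpack(payload)
    return kind, block, scaled / 10_000


if __name__ == "__main__":
    msg = encode_request(42, error_rate(missing=5, received=95))
    kind, block, rate = decode_request(msg)
    print(f"retransmit block {block}, client error rate {rate:.2%}")  # server-side echo
```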
To evaluate Tsunami's usefulness for measuring bandwidth in bulk file transfers, files were transferred from a server on Hercules at SLAC to a client running on IPERF1-GIG.NSLABS.UFL.EDU, over a path with a bottleneck link of about 100 Mbps. Files of 1 MB and 100 KB were used. Transfers of files between 800 MB and 1 GB were also attempted, but they did not complete successfully and yielded no useful data.
Small files, however, are expected to show very poor performance. The protocol is designed to favor a small number of very large transfers, and a file should be at least as large as the bandwidth-delay product of the network. The files used in these tests were, at best, comparable in size to that product. The reason for this requirement is that Tsunami has a built-in rate-adaptation mechanism that kicks in in response to packet loss; with small files, the transfer completes before this mechanism has time to take effect.
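The bandwidth-delay product for this path can be worked out from figures already in this note: the roughly 100 Mbps bottleneck and the roughly 70 ms round-trip time seen in the ping session. Using decimal units, it comes to 875,000 bytes, so the 100 KB file is about a tenth of one bandwidth-delay product and the 1 MB file only about 1.14 times it. The function name is hypothetical.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product in bytes: data that can be in flight per round trip."""
    return bandwidth_bps * rtt_ms // 8000  # (bits/s * ms) / 1000 -> bits, / 8 -> bytes


BDP = bdp_bytes(100_000_000, 70)  # ~100 Mbps bottleneck, ~70 ms baseline RTT

for label, size in [("100KB", 100_000), ("1MB", 1_000_000)]:
    print(f"{label}: {size / BDP:.2f} x BDP")
# 100KB: 0.11 x BDP
# 1MB: 1.14 x BDP
```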
Once a stable version of Tsunami is released, we intend to repeat these tests with larger files.
Tsunami generates statistics summarizing each transfer and logs any errors or timeouts encountered in the process. However, these statistics are printed only if the Tsunami client exits gracefully, which, unfortunately, was not the case for most of the tests. To obtain bandwidth estimates, NetFlow records were used instead to extract flow characteristics, although these did not give much insight into network capacity. Each point in Figure 3 represents one file transfer. Averaged over all test runs, the data rate Tsunami experienced was around 10 Mbps. This rate counts all the data pushed onto the network, including retransmissions. It is small compared with the achievable rate on the path (close to 100 Mbps according to the IEPM-BW measurements for that day).

Figure 3 Total achievable data rate of Tsunami (total bytes transferred / total time). Each point denotes one file transfer.
To compound these results, the effective goodput that Tsunami achieved (Figure 4) was significantly lower than its raw UDP data rate.
These results would seem to corroborate the claim that small file transfers will not yield high throughput.

Figure 4 Effective goodput per file transfer. Each point corresponds to the same file transfer in Figure 3.
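The metrics plotted in Figures 3 and 4 differ only in which bytes are counted. A minimal sketch, assuming byte counts and elapsed time in seconds (the function names are hypothetical): the data rate counts everything pushed onto the network, retransmissions included, while goodput counts each byte of the file once. The gap between the two is the share of traffic spent on retransmissions and other overhead.

```python
def data_rate_mbps(total_bytes_sent, seconds):
    """Figure 3 metric: every byte put on the wire, retransmissions included."""
    return total_bytes_sent * 8 / seconds / 1e6


def goodput_mbps(file_bytes, seconds):
    """Figure 4 metric: only the file's own bytes, each counted once."""
    return file_bytes * 8 / seconds / 1e6


def overhead_fraction(total_bytes_sent, file_bytes):
    """Share of wire traffic that did not contribute new file data."""
    return 1 - file_bytes / total_bytes_sent
```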
There were a few points of contention with the current Tsunami implementation (version
Another problem was the exact opposite of the above problem. After receiving the target file, the client would hang instead of halting, and therefore never issued connection tear-down directives, either locally or to the remote server, to release connection resources. If the client was forced to exit with a keyboard interrupt (Ctrl-C in this case), the machine running the client slowed down. The slow-down affected not only the current session but every other process running on the system as well. The slow-down might have been caused by packet loss on a lossy network; however, no network activity was logged at the time the forced quit was issued. The slow-down therefore appears to be a local rather than a network phenomenon.
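One way a client can avoid leaving resources behind on a forced exit is to catch SIGINT and release them itself. The following is a minimal Python sketch, not Tsunami's code: `TransferSession`, `open_udp` and `install_sigint_handler` are hypothetical names, and a real client would also stop its network and disk threads and ask the server to tear down its side of the session.

```python
import signal
import socket
import sys


class TransferSession:
    """Hypothetical container for the resources a client must release."""

    def __init__(self):
        self.sockets = []   # control (TCP) and data (UDP) sockets
        self.closed = False

    def open_udp(self):
        """Create and track a UDP socket, as used for the data session."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sockets.append(sock)
        return sock

    def close(self):
        if self.closed:
            return  # idempotent: safe to call from several exit paths
        for sock in self.sockets:
            sock.close()
        self.closed = True


def install_sigint_handler(session):
    """On Ctrl-C, release the session's resources before exiting."""

    def on_sigint(signum, frame):
        session.close()
        sys.exit(130)  # 128 + SIGINT (2), the usual shell convention

    signal.signal(signal.SIGINT, on_sigint)
```

Making `close()` idempotent lets the same cleanup run both from the signal handler and from the normal end-of-transfer path.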
The following is a ping session to the client node at the time of the file transfer:
iepm@hercules:~>ping IPERF1-GIG.NSLABS.UFL.EDU
PING IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3) from 134.79.240.27 : 56(84) bytes of data.
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=1 ttl=242 time=70.2 ms
...
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=149 ttl=242 time=70.2 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=154 ttl=242 time=156 ms (Transfer Started)
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=155 ttl=242 time=70.3 ms
…
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=182 ttl=242 time=70.1 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=185 ttl=242 time=228 ms (Ctrl-C at this point)
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=196 ttl=242 time=500 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=209 ttl=242 time=516 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=210 ttl=242 time=511 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=217 ttl=242 time=504 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=226 ttl=242 time=331 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=227 ttl=242 time=330 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=228 ttl=242 time=143 ms
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=229 ttl=242 time=70.0 ms
…
64 bytes from IPERF1-GIG.NSLABS.UFL.EDU (65.118.160.3): icmp_seq=242 ttl=242 time=70.1 ms

--- IPERF1-GIG.NSLABS.UFL.EDU ping statistics ---
242 packets transmitted, 201 received, 16% loss, time 241468ms
rtt min/avg/max/mdev = 69.942/83.152/516.485/67.237 ms
This result shows that after the forced exit, there was high packet loss as well as increased round-trip delay. There is a strong correlation between the forced exit and the machine slow-down: each time the client had to be killed, the slow-down became obvious. When Ctrl-C was issued, there were memory-resident processes associated with the Tsunami client. Because the client installs no signal handler, the interrupt is left to the operating system's default handling, which must terminate and clean up these memory-resident processes; this might explain why so many CPU cycles were spent killing the client and its associated processes.
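The loss and delay figures quoted above can be checked directly from the ping summary. This minimal sketch parses the two summary lines printed by Linux ping (the regular expressions assume that output format; `parse_ping_summary` is a hypothetical name) and recomputes the loss: 41 of 242 probes were lost, about 16.9%, which the summary shows as 16%. The average RTT of about 83 ms is well above the roughly 70 ms minimum seen before the transfer and after recovery.

```python
import re


def parse_ping_summary(text):
    """Pull packet counts and RTT statistics out of a Linux ping summary."""
    sent, received = (int(g) for g in re.search(
        r"(\d+) packets\s+transmitted, (\d+) received", text).groups())
    rtt_min, rtt_avg, rtt_max, rtt_mdev = (float(g) for g in re.search(
        r"rtt min/avg/max/mdev\s*=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms",
        text).groups())
    return {
        "lost": sent - received,
        "loss_pct": 100 * (sent - received) / sent,  # exact ratio; summary prints 16%
        "rtt_min_ms": rtt_min,
        "rtt_avg_ms": rtt_avg,
        "rtt_max_ms": rtt_max,
        "rtt_mdev_ms": rtt_mdev,
    }


summary = """242 packets transmitted, 201 received, 16% loss, time 241468ms
rtt min/avg/max/mdev = 69.942/83.152/516.485/67.237 ms"""

stats = parse_ping_summary(summary)
print(f"{stats['lost']} lost ({stats['loss_pct']:.1f}%), "
      f"avg RTT {stats['rtt_avg_ms']} ms vs min {stats['rtt_min_ms']} ms")
# 41 lost (16.9%), avg RTT 83.152 ms vs min 69.942 ms
```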
Due to the non-deterministic behavior of Tsunami, it is difficult to make a concrete recommendation at this point. Once a more stable version is released, a fuller analysis will be possible. It is clear, however, that Tsunami is intended for large file transfers. Small transfers are not very useful for the reasons cited above, although they have helped to expose some race conditions whose fixes are being incorporated into the next version.
The maintainers of Tsunami are working on removing these software bugs.