Disk Throughputs
Ajay Tirumala, Les Cottrell, Connie Logg, and I-Heng Mei
How can we measure sustainable throughput for large file transfers over the Internet without transferring large amounts of data?
Most file transfer tools, such as BBFTP, use 'cached writes'. We cannot use 'cached writes' with small file transfers to estimate sustainable throughput for large transfers. Because a small file fits in the buffer cache, nothing is actually committed to disk (at least not immediately), so we would only be measuring BBFTP's network performance (and the sender's disk read throughput).
While 'commit at end' can be used with small files to estimate sustainable disk-only throughput, this does not hold when the network is involved. Because the small file fits in the buffer cache, its entire contents are flushed only at the end of the transfer, which forces the network transfer and the actual disk writes to happen sequentially. In a large transfer, disk activity happens in parallel with the network transfer, since the buffer cache fills up periodically and the OS is forced to commit the data. Since a small transfer with 'commit at end' serializes network and disk activity instead of overlapping them, its measured throughput will be lower than BBFTP's actual sustainable throughput. (If the network transfer alone takes time T_net and the disk writes alone take T_disk, the serialized transfer takes roughly T_net + T_disk, whereas an overlapped one takes closer to max(T_net, T_disk).)
A more appropriate method is similar to 'commit each write', where the application explicitly commits the data periodically. For example, to estimate the performance of BBFTP for large files, we should transfer a small file using a variant of BBFTP that commits the data periodically. This method closely models disk activity occurring in parallel with network activity. The block size results suggest that committing the data every ~4 MB ('commit each write') approximates the throughput of 'commit at end'. The file size results suggest that we can transfer a relatively small file (32 MB or 64 MB) using 'commit at end' and still approximate the sustainable disk performance for large files.
Thus, if we transfer a small file and commit the data every ~4 MB, we should be able to approximate the performance of the application for large file transfers. We are currently testing this theory with BBFTP: we are creating a modified version that commits the data periodically, and comparing the throughput of the modified BBFTP for small files with the throughput of regular BBFTP for large files. We will add the results to this page.
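A minimal Python sketch of the periodic-commit idea follows. It assumes that "committing" means calling `os.fsync()` on the destination file descriptor; the function `write_with_periodic_commit()`, the generator `fake_network_chunks()` (a hypothetical stand-in for data arriving over the network), and the 4 MB default interval are illustrative names and values, not part of BBFTP.

```python
import os
import tempfile
import time

COMMIT_INTERVAL = 4 * 1024 * 1024  # ~4 MB, following the block-size results (assumed value)


def write_with_periodic_commit(fd, chunks, commit_interval=COMMIT_INTERVAL):
    """Write byte chunks to fd, calling os.fsync() whenever at least
    commit_interval bytes have been written since the last commit.

    Returns (bytes_written, number_of_commits)."""
    written = pending = commits = 0
    for chunk in chunks:
        view = memoryview(chunk)
        while len(view) > 0:
            n = os.write(fd, view)  # may be a short write; loop until the chunk is done
            view = view[n:]
            written += n
            pending += n
        if pending >= commit_interval:
            os.fsync(fd)  # force buffered data to disk mid-transfer
            commits += 1
            pending = 0
    if pending:
        os.fsync(fd)  # commit the tail so it is included in the timing
        commits += 1
    return written, commits


def fake_network_chunks(total_bytes, chunk_size=64 * 1024):
    """Hypothetical stand-in for data arriving over the network."""
    block = b"\0" * chunk_size
    sent = 0
    while sent < total_bytes:
        n = min(chunk_size, total_bytes - sent)
        yield block[:n]
        sent += n


if __name__ == "__main__":
    size = 32 * 1024 * 1024  # a "small" 32 MB transfer
    # Create the file on the file system being measured (here: the current directory).
    with tempfile.NamedTemporaryFile(dir=".") as f:
        start = time.perf_counter()
        written, commits = write_with_periodic_commit(f.fileno(), fake_network_chunks(size))
        elapsed = time.perf_counter() - start
    print(f"{written / 1e6:.1f} MB, {commits} commits, {written / 1e6 / elapsed:.1f} MB/s")
```

Because `os.fsync()` is called every ~4 MB rather than once at the end, disk commits are interleaved with the (simulated) data arrival, which is the behavior the modified BBFTP is meant to reproduce.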
| Host name | OS Info | CPU Info | File System | Disk BW - First read | Disk BW - Second Read | Write BW - plain | Write BW - commit each write | Write BW - commit at end | Write BW - commit at end (large file) |
|---|---|---|---|---|---|---|---|---|---|
| node1.dl.ac.uk | i386_linux24 | 1*996 | ext | 29.80 | 188.08 | 191.13 | 15.48 | 26.09 | 29.05(file-size = 2147.48 MB) |
| node1.kek.jp | i386_linux22 | 1*451 | ext | 12.60 | 10.74 | 8.66 | 2.45 | 4.05 | 4.08(file-size = 76.76 MB) |
| node1.riken.go.jp | i386_linux24 | 1*2008 | ext | 29.20 | 770.62 | 414.23 | 4.75 | 24.36 | 24.86(file-size = 2147.48 MB) |
| node1.uiuc.edu | i386_linux24 | 4*1999 | ext | 52.74 | 1058.04 | 154.73 | 10.27 | 33.65 | 42.81(file-size = 2147.48 MB) |
| node1.utah.edu | i386_linux24 | 1*1693 | unknown | 40.56 | 594.69 | 184.76 | 13.76 | 28.82 | 40.80(file-size = 2147.48 MB) |
| node1.jp.apan.net | i386_linux24 | 1*1132 | ext | 38.79 | 448.60 | 171.94 | 7.94 | 32.15 | 40.42(file-size = 2147.48 MB) |
| node1.switch.ch | i386_linux24 | 2*1396 | ext | 53.73 | 491.83 | 140.39 | 4.39 | 34.33 | 43.54(file-size = 2147.48 MB) |
| node1.cacr.caltech.edu | i386_linux24 | 8*2396 | ext | 32.91 | 994.49 | 161.10 | 22.85 | 28.94 | 32.09(file-size = 2147.48 MB) |
| node2.ccs.ornl.gov | i386_linux24 | 1*1395 | ext | 32.85 | 832.43 | 614.63 | 29.18 | 26.91 | 29.84(file-size = 2147.48 MB) |
| node1.internet2.edu | i386_linux24 | 1*1794 | ext | 36.00 | 453.22 | 126.25 | 4.32 | 32.69 | 33.36(file-size = 2147.48 MB) |
| node1.sox.i2.edu | freebsd43 | *0 | - | 23.92 | 203.55 | 22.34 | 6.15 | 23.19 | 0.00(file-size = 2147.48 MB) |
| node1.cesnet.cz | i386_linux24 | 1*498 | ext | 14.76 | 143.92 | 166.03 | 4.12 | 12.51 | 13.44(file-size = 1204.84 MB) |
| node1.triumf.ca | i386_linux22 | 1*1715 | ext | 19.16 | 430.58 | 85.69 | 6.47 | 19.68 | 20.82(file-size = 2147.48 MB) |
| node1.stanford.edu | i386_linux24 | 2*1333 | ext | 26.83 | 345.67 | 139.17 | 3.08 | 20.35 | 20.83(file-size = 2147.48 MB) |
| node1.utdallas.edu | sun4x_59 | 2*400 | ufs | 24.06 | 70.04 | 28.19 | 3.04 | 21.01 | 19.31(file-size = 2147.48 MB) |
| node1.nslabs.ufl.edu | i386_linux24 | 1*999 | ext | 25.50 | 219.90 | 227.59 | 4.99 | 17.55 | 13.82(file-size = 2147.48 MB) |
| node2.nslabs.ufl.edu | i386_linux24 | 1*999 | ext | 23.27 | 226.04 | 225.73 | 4.73 | 14.83 | 13.53(file-size = 2147.48 MB) |
| node2.jlab.org | i386_linux24 | 4*451 | ext | 14.18 | 117.39 | 59.06 | 2.90 | 10.63 | 14.42(file-size = 2147.48 MB) |
| node1.clrc.ac.uk | i386_linux24 | 1*604 | ext | 15.04 | 193.39 | 84.10 | 2.97 | 13.78 | 13.63(file-size = 2147.48 MB) |
| node2.rhic.bnl.gov | i386_linux24 | 8*2392 | nfs | 26.20 | 644.01 | 10.93 | 9.27 | 10.91 | 11.13(file-size = 2147.48 MB) |
| node1.in2p3.fr | sun4x_58 | 2*450 | ufs | 11.02 | 84.83 | 10.51 | 2.24 | 10.46 | 9.97(file-size = 2047.43 MB) |
| node1.roma1.infn.it | sun4x_58 | 4*400 | nfs | 15.33 | 125.88 | 113.59 | 3.58 | 15.12 | 4.50(file-size = 2147.48 MB) |
| node2.cern.ch | i386_linux24 | 1*1495 | ext | 3.46 | 756.35 | 349.38 | 3.25 | 4.02 | 4.02(file-size = 2147.48 MB) |
| node1.mcs.anl.gov | i386_linux24 | 2*866 | nfs | 9.14 | 169.13 | 2.74 | 1.64 | 2.99 | 1.79(file-size = 2147.48 MB) |
| node1.lsa.umich.edu | i386_linux24 | 4*1263 | ext | 4.62 | 371.95 | 121.06 | 2.24 | 4.97 | 0.83(file-size = 1941.92 MB) |
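To check how closely the small-file 'commit at end' column tracks the large-file column, one can compute their ratio per host. The sketch below copies a subset of rows from the table (values as printed, no unit conversion) and treats a reported large-file value of 0.00 as missing; the helper names and the ±10% tolerance (borrowed from the block-size discussion) are assumptions for illustration.

```python
# host: ('commit at end', 'commit at end (large file)'), copied from the table
ROWS = {
    "node1.dl.ac.uk": (26.09, 29.05),
    "node1.kek.jp": (4.05, 4.08),
    "node1.riken.go.jp": (24.36, 24.86),
    "node1.uiuc.edu": (33.65, 42.81),
    "node1.internet2.edu": (32.69, 33.36),
    "node1.sox.i2.edu": (23.19, 0.00),
    "node1.roma1.infn.it": (15.12, 4.50),
    "node2.cern.ch": (4.02, 4.02),
    "node1.lsa.umich.edu": (4.97, 0.83),
}


def estimate_ratio(small, large):
    """Ratio of the small-file estimate to the large-file measurement,
    or None when the large-file value is missing (reported as 0.00)."""
    if large <= 0:
        return None
    return small / large


def within(ratio, tolerance=0.10):
    """True if the estimate lies within +/- tolerance of the large-file value."""
    return ratio is not None and abs(ratio - 1.0) <= tolerance


if __name__ == "__main__":
    for host, (small, large) in ROWS.items():
        r = estimate_ratio(small, large)
        if r is None:
            print(f"{host:22s}  no large-file measurement")
        else:
            flag = "within 10%" if within(r) else "differs by more than 10%"
            print(f"{host:22s} {r:5.2f}  {flag}")
```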
To see the effect of block size on disk write throughput, we varied the block size from 16KB to 64MB while keeping the file size constant at 128MB. The results show that block size has no clear effect on either 'commit at end' or 'cached writes'. Block size does have a significant effect on throughput in 'commit each write' mode: throughput is low for small block sizes and increases with block size, approaching the 'commit at end' throughput beyond a certain point. This increase is likely due to reduced seek time overhead. When the block size is small, 'commit each write' performs many small disk writes, so seek time is significant relative to the actual transfer time. As the block size increases, this overhead shrinks and performance approaches that of 'commit at end'. For most nodes, a block size of 4MB brings 'commit each write' performance to within 10% of 'commit at end' performance (see below).
[Figure: disk write throughput vs. block size (16KB to 64MB, 128MB file); series: commit each write, commit at end, cached write]
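The seek-overhead explanation for 'commit each write' can be made concrete with a simple model, assuming each commit adds a fixed overhead t0 on top of streaming the block at rate R. A block of B bytes then achieves throughput B / (t0 + B/R), and reaching a fraction f of R requires B >= f * t0 * R / (1 - f). The parameter values below (30 MB/s, 10 ms) are hypothetical, not measurements from the table.

```python
def commit_each_write_throughput(block_bytes, stream_rate, overhead_s):
    """Modeled throughput (bytes/s) when every block of block_bytes is followed
    by a commit costing a fixed overhead_s seconds on top of the streaming
    transfer time block_bytes / stream_rate."""
    return block_bytes / (overhead_s + block_bytes / stream_rate)


def min_block_for_fraction(fraction, stream_rate, overhead_s):
    """Smallest block size whose modeled throughput reaches fraction * stream_rate.

    From B / (t0 + B/R) >= f*R it follows that B >= f*t0*R / (1 - f)."""
    return fraction * overhead_s * stream_rate / (1.0 - fraction)


if __name__ == "__main__":
    R = 30e6    # hypothetical streaming write rate, bytes/s
    T0 = 0.010  # hypothetical per-commit overhead (seek, rotation, etc.), seconds
    size = 16 * 1024
    while size <= 64 * 1024 * 1024:  # same 16KB..64MB range as the experiment
        frac = commit_each_write_throughput(size, R, T0) / R
        print(f"{size // 1024:>6d} KB  {frac:6.1%} of streaming rate")
        size *= 2
    print(f"90% needs blocks of at least {min_block_for_fraction(0.9, R, T0) / 1e6:.1f} MB")
```

With these assumed numbers the model puts a 4MB block at about 93% of the streaming rate and the 90% threshold at about 2.7 MB, consistent with the ~4MB observation; disks with different seek times or rates will shift these figures.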
To measure the effect of file size on throughput, we varied the file size from 64KB to 1GB while keeping the block size constant at 64KB. The graphs below show that file size affects both 'commit at end' and 'cached writes'. The 'commit at end' throughput generally increases with file size, probably because seek time becomes less significant as the amount of committed data grows. (A small file commits only a small amount of data, so seek time is significant; as the file size increases, transfer time dominates and the throughput levels off.) 'Cached write' throughputs are generally higher (due to OS buffering), but drop sharply above a certain file size threshold. This threshold corresponds to the point where the buffer cache fills up and the OS is forced to commit the data to disk on its own. Both 'commit at end' and 'cached writes' are subject to this threshold, and as the file size grows without bound, their throughputs converge to the same limit: the sustainable disk throughput for large transfers. 'Commit at end' approaches this limit much sooner, so we can use a smaller file size to estimate sustainable disk throughput (see below).
[Figure: disk write throughput vs. file size (64KB to 1GB, 64KB blocks); series: commit each write, commit at end, cached write]
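A minimal harness for the three write modes might look like the sketch below. It assumes that 'commit each write' means calling `os.fsync()` after every `write()` and 'commit at end' means a single `os.fsync()` after the last write; the function names, the filler byte pattern, and the reduced default size range are illustrative choices, not the tool used to produce these results.

```python
import os
import tempfile
import time

BLOCK = 64 * 1024  # block size held constant in the file-size experiment
MODES = ("cached", "commit_each", "commit_at_end")


def timed_write(path, file_size, block_size=BLOCK, mode="commit_at_end"):
    """Write file_size bytes to path in block_size chunks; return seconds elapsed.

    'cached'        -> plain write() calls, no explicit commit
    'commit_each'   -> os.fsync() after every write()
    'commit_at_end' -> one os.fsync() after the last write()
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    block = b"\xa5" * block_size  # filler data; contents do not matter
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        remaining = file_size
        while remaining > 0:
            # A short write just leaves more bytes for the next iteration.
            remaining -= os.write(fd, block[: min(block_size, remaining)])
            if mode == "commit_each":
                os.fsync(fd)
        if mode == "commit_at_end":
            os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def sweep(directory, sizes, modes=("cached", "commit_at_end"), block_size=BLOCK):
    """Return {(file_size, mode): throughput in MB/s (10**6 bytes/s)}."""
    results = {}
    path = os.path.join(directory, "probe.dat")
    for size in sizes:
        for mode in modes:
            elapsed = timed_write(path, size, block_size, mode)
            results[(size, mode)] = size / max(elapsed, 1e-9) / 1e6
            os.remove(path)
    return results


if __name__ == "__main__":
    # 64 KB up to 16 MB in 4x steps; the study went up to 1 GB.
    sizes = [64 * 1024 * 4**k for k in range(5)]
    # Create the scratch directory on the file system being measured (here: cwd).
    with tempfile.TemporaryDirectory(dir=".") as d:
        for (size, mode), mbps in sorted(sweep(d, sizes).items()):
            print(f"{size // 1024:>8d} KB  {mode:14s} {mbps:8.1f} MB/s")
```

Extending `sizes` up to 1 GB reproduces the study's range. The scratch directory should sit on the file system under test, since the default temporary directory may be memory-backed (tmpfs) on some systems.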