IEPM
SLAC site

Disk Throughputs

Ajay Tirumala, Les Cottrell, Connie Logg, and I-Heng Mei

Questions

Definitions

Results

Summary

Disk Reads

Disk Writes

Block Sizes (effect on disk write throughput)

File Sizes (effect on disk write throughput)

Real File Transfers (over the Internet)

How can we measure sustainable throughput for large file transfers over the Internet without transferring large amounts of data?

Most file transfer tools, such as BBFTP, use 'cached writes'. We cannot use 'cached writes' with small file transfers to estimate sustainable throughput for large transfers. Since small files fit in the buffer cache, nothing will actually be committed to disk (at least not immediately), so we would only be measuring BBFTP's network performance (and the disk read throughput of the sender).

While 'commit at end' can be used with small files to estimate sustainable disk-only throughput, this is not the case when the network is involved. Since the small file fits in the buffer cache, the entire file contents will be flushed only at the end of the transfer, forcing the network transfer and the actual disk writes to happen sequentially. For a large transfer, the disk activity will happen in parallel with the network transfer, since the buffer cache will fill up periodically, causing the OS to forcibly commit the data. Because the small transfer with 'commit at end' forces sequential rather than parallel network and disk activity, the measured throughput will be lower than the actual sustainable throughput for BBFTP.

A more appropriate method would be similar to 'commit each write', where the application explicitly commits the data periodically. For example, in order to estimate the performance of BBFTP for large files, we should transfer a small file using a variant of BBFTP that commits the data periodically. This method closely models the disk activity occurring in parallel with the network activity. The block size results suggest that we can commit the data every ~4 MB ('commit each write') to approximate the throughput of 'commit at end'. The file size results suggest we can transfer a relatively small file (32MB or 64MB) using 'commit at end' and still approximate the sustainable disk performance for large files.
Thus, if we transfer a small file and commit the data every ~4MB, we should be able to approximate the performance of the application for large file transfers. We are currently trying out this theory with BBFTP. We are creating a modified version that commits the data periodically, and comparing the throughput of the modified BBFTP for small files versus the throughput of regular BBFTP for large files. We will add the results to this page.
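
The periodic commit described above can be illustrated with a small local experiment. The following is only a minimal sketch in Python, not the modified BBFTP: the function name `write_with_periodic_commit`, its parameters, and the 64 KB write size are assumptions made for illustration, and `os.fsync` stands in for whatever commit call the real tool would use. The data is written in small blocks (as it might arrive from the network) and is committed roughly every 4 MB.

```python
import os
import time


def write_with_periodic_commit(path, total_bytes, block_size=64 * 1024,
                               commit_every=4 * 1024 * 1024):
    """Write total_bytes of zeros to path, calling fsync about every
    commit_every bytes (hypothetical helper, not part of BBFTP).

    Returns (elapsed_seconds, number_of_fsync_calls).
    """
    block = b"\0" * block_size
    written = 0
    since_commit = 0
    commits = 0
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            # The last block may be shorter than block_size.
            chunk = block[:min(block_size, total_bytes - written)]
            n = os.write(fd, chunk)
            written += n
            since_commit += n
            if since_commit >= commit_every:
                os.fsync(fd)  # force the cached data out to disk
                commits += 1
                since_commit = 0
        if since_commit > 0:
            os.fsync(fd)  # commit whatever is left at the end
            commits += 1
    finally:
        os.close(fd)
    return time.perf_counter() - start, commits
```

For instance, `write_with_periodic_commit("probe.dat", 32 * 1024 * 1024)` would write a 32 MB file with a commit every 4 MB; dividing the file size by the returned elapsed time gives a throughput estimate for the disk side only, since no network is involved in this sketch.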



Read/Write throughput tests

  • Both reads and writes were done using a block size of 128 KB. The default file size was 64 MB. A file size of < 2GB for the large file size option indicates that the host machine did not allow the creation of a large file (>64MB); this generally happens if the fsync fails after all the writes.
  • (5/20/2003) CPU Info is given as number of CPUs * clock speed in MHz. All other results are in MBytes/s.

    Host name               OS Info       CPU Info  File System  Read BW (first)  Read BW (second)  Write BW (plain)  Write BW (commit each write)  Write BW (commit at end)  Write BW (commit at end, large file)
    node1.dl.ac.uk          i386_linux24  1*996     ext          29.80            188.08            191.13            15.48                         26.09                     29.05 (file-size = 2147.48 MB)
    node1.kek.jp            i386_linux22  1*451     ext          12.60            10.74             8.66              2.45                          4.05                      4.08 (file-size = 76.76 MB)
    node1.riken.go.jp       i386_linux24  1*2008    ext          29.20            770.62            414.23            4.75                          24.36                     24.86 (file-size = 2147.48 MB)
    node1.uiuc.edu          i386_linux24  4*1999    ext          52.74            1058.04           154.73            10.27                         33.65                     42.81 (file-size = 2147.48 MB)
    node1.utah.edu          i386_linux24  1*1693    unknown      40.56            594.69            184.76            13.76                         28.82                     40.80 (file-size = 2147.48 MB)
    node1.jp.apan.net       i386_linux24  1*1132    ext          38.79            448.60            171.94            7.94                          32.15                     40.42 (file-size = 2147.48 MB)
    node1.switch.ch         i386_linux24  2*1396    ext          53.73            491.83            140.39            4.39                          34.33                     43.54 (file-size = 2147.48 MB)
    node1.cacr.caltech.edu  i386_linux24  8*2396    ext          32.91            994.49            161.10            22.85                         28.94                     32.09 (file-size = 2147.48 MB)
    node2.ccs.ornl.gov      i386_linux24  1*1395    ext          32.85            832.43            614.63            29.18                         26.91                     29.84 (file-size = 2147.48 MB)
    node1.internet2.edu     i386_linux24  1*1794    ext          36.00            453.22            126.25            4.32                          32.69                     33.36 (file-size = 2147.48 MB)
    node1.sox.i2.edu        freebsd43     *0        -            23.92            203.55            22.34             6.15                          23.19                     0.00 (file-size = 2147.48)
    node1.cesnet.cz         i386_linux24  1*498     ext          14.76            143.92            166.03            4.12                          12.51                     13.44 (file-size = 1204.84 MB)
    node1.triumf.ca         i386_linux22  1*1715    ext          19.16            430.58            85.69             6.47                          19.68                     20.82 (file-size = 2147.48 MB)
    node1.stanford.edu      i386_linux24  2*1333    ext          26.83            345.67            139.17            3.08                          20.35                     20.83 (file-size = 2147.48 MB)
    node1.utdallas.edu      sun4x_59      2*400     ufs          24.06            70.04             28.19             3.04                          21.01                     19.31 (file-size = 2147.48 MB)
    node1.nslabs.ufl.edu    i386_linux24  1*999     ext          25.50            219.90            227.59            4.99                          17.55                     13.82 (file-size = 2147.48 MB)
    node2.nslabs.ufl.edu    i386_linux24  1*999     ext          23.27            226.04            225.73            4.73                          14.83                     13.53 (file-size = 2147.48 MB)
    node2.jlab.org          i386_linux24  4*451     ext          14.18            117.39            59.06             2.90                          10.63                     14.42 (file-size = 2147.48 MB)
    node1.clrc.ac.uk        i386_linux24  1*604     ext          15.04            193.39            84.10             2.97                          13.78                     13.63 (file-size = 2147.48 MB)
    node2.rhic.bnl.gov      i386_linux24  8*2392    nfs          26.20            644.01            10.93             9.27                          10.91                     11.13 (file-size = 2147.48 MB)
    node1.in2p3.fr          sun4x_58      2*450     ufs          11.02            84.83             10.51             2.24                          10.46                     9.97 (file-size = 2047.43 MB)
    node1.roma1.infn.it     sun4x_58      4*400     nfs          15.33            125.88            113.59            3.58                          15.12                     4.50 (file-size = 2147.48 MB)
    node2.cern.ch           i386_linux24  1*1495    ext          3.46             756.35            349.38            3.25                          4.02                      4.02 (file-size = 2147.48 MB)
    node1.mcs.anl.gov       i386_linux24  2*866     nfs          9.14             169.13            2.74              1.64                          2.99                      1.79 (file-size = 2147.48 MB)
    node1.lsa.umich.edu     i386_linux24  4*1263    ext          4.62             371.95            121.06            2.24                          4.97                      0.83 (file-size = 1941.92 MB)
    CSV Data
    Graph
    Documentation
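
The three write columns in the table above differ only in when the data is committed. As a rough illustration, the sketch below times the same write loop in each mode. It is not the tool used to produce the table: the function name `timed_write` and the mode names are assumptions, 'plain' is assumed to correspond to the 'cached writes' discussed elsewhere on this page, and `os.fsync` is assumed to be the commit call. Throughput is reported in units of 10^6 bytes per second, which appears to match the table (a 2 GB file is listed there as 2147.48 MB).

```python
import os
import time

MODES = ("cached", "commit_each", "commit_end")  # hypothetical mode names


def timed_write(path, file_size, block_size, mode):
    """Write file_size bytes in block_size pieces and return MB/s (10^6 bytes).

    cached      - plain writes, no explicit commit
    commit_each - fsync after every write call
    commit_end  - a single fsync after the last write
    """
    if mode not in MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    block = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        remaining = file_size
        while remaining > 0:
            n = os.write(fd, block[:min(block_size, remaining)])
            remaining -= n
            if mode == "commit_each":
                os.fsync(fd)
        if mode == "commit_end":
            os.fsync(fd)
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return file_size / elapsed / 1e6
```

For instance, `timed_write("probe.dat", 64 * 1024 * 1024, 128 * 1024, "commit_end")` would mirror the 64 MB default file size and 128 KB block size stated above. Whether the original measurements included the time to open or close the file is not stated on this page, so this sketch times only the writes and commits.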

    Block size v. Write Throughput

    To see the effect of different block sizes on disk write throughput, we varied the block size from 16KB to 64MB while keeping the file size constant at 128MB. From the results, we can see that varying the block size does not have a clear effect on either 'commit at end' or 'cached writes'. Block size does have a significant effect on the throughput in 'commit each write' mode, as the throughput is low for small block sizes and increases as block size increases, approaching the throughput of 'commit at end' after a certain point. This increase is likely due to a decrease in seek time overhead for the disk writes. When block size is small, 'commit each write' performs many small disk writes, resulting in a significant seek time relative to actual transfer time. As the block size is increased, this overhead is reduced and performance approaches that of 'commit at end'. For most nodes, a block size of 4MB causes 'commit each write' performance to be within 10% of 'commit at end' performance (see below).

        [Graph: write throughput v. block size; curves: commit each write, commit at end, cached write]
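
The 10% criterion used above can be applied mechanically to a block size sweep. The sketch below is an assumed helper (the name `smallest_block_within` is not from the original tools), and the throughput numbers in the example are made up purely to show the calculation; they are not measurements from this page.

```python
KB = 1024
MB = 1024 * KB


def smallest_block_within(commit_each, commit_end, tolerance=0.10):
    """Return the smallest block size whose 'commit each write' throughput
    is within `tolerance` (as a fraction) of the 'commit at end' throughput.

    Both arguments map block size in bytes to throughput in MB/s and must
    share the same keys. Returns None if no block size qualifies.
    """
    for block_size in sorted(commit_each):
        each = commit_each[block_size]
        end = commit_end[block_size]
        if abs(each - end) <= tolerance * end:
            return block_size
    return None


# Hypothetical, illustrative numbers only (not taken from the results).
example_commit_each = {16 * KB: 2.0, 256 * KB: 9.0, 1 * MB: 18.0,
                       4 * MB: 24.0, 64 * MB: 25.5}
example_commit_end = {size: 26.0 for size in example_commit_each}

best = smallest_block_within(example_commit_each, example_commit_end)
print("smallest block size within 10%:", best // MB, "MB")
```

With real sweep data for a node in place of the example dictionaries, the returned value would be the commit interval to try for that node.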



    File size v. Write Throughput

    We varied the file size from 64KB to 1GB while keeping the block size constant at 64KB to measure the effect of file size on throughput. From the graphs below, we can see that file size does have an effect on both 'commit at end' and 'cached writes'. The 'commit at end' throughput generally increases with the file size, probably due to the seek time becoming less significant as the committed data becomes larger. (Small files result in committing a small amount of file data, so the seek time is significant. As the file size increases, the transfer time dominates and the throughput begins to level off.) 'Cached write' throughputs are generally larger (due to OS buffering), but the throughput drops sharply above a certain file size threshold. This threshold represents the situation where the buffer cache fills up and the OS is forced to automatically commit the data to disk. Both 'commit at end' and 'cached writes' are subject to this threshold, and as the file size increases to infinity, their throughputs converge to the same limit: the sustainable disk throughput for large transfers. Clearly 'commit at end' approaches the limit much sooner, so we can use a smaller file size to estimate sustainable disk throughput (see below).

        [Graphs: write throughput v. file size; curves: commit each write, commit at end, cached write]

    Note: we assume 'max throughput' to be roughly equal to the 'commit at end' throughput for the largest file size in this test (1GB).
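
The normalization in the note above can be written down in a few lines. This is only a sketch under the stated assumption that 'max throughput' is the 'commit at end' throughput at the largest file size in the sweep; the function name `fraction_of_max` is hypothetical.

```python
def fraction_of_max(throughput_by_file_size):
    """Express each 'commit at end' throughput as a fraction of the value
    measured at the largest file size, which is taken as 'max throughput'.

    throughput_by_file_size maps file size in bytes to throughput in MB/s.
    Returns a dict with the same keys, ordered by file size.
    """
    if not throughput_by_file_size:
        raise ValueError("no measurements given")
    largest = max(throughput_by_file_size)
    max_throughput = throughput_by_file_size[largest]
    if max_throughput <= 0:
        raise ValueError("throughput at the largest file size must be positive")
    return {size: rate / max_throughput
            for size, rate in sorted(throughput_by_file_size.items())}
```

A file size whose fraction is already close to 1.0 would then be a candidate for estimating sustainable disk throughput without writing the full 1GB.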



    This page was created on 5/21/2003