Verify performance

This page describes how to verify volume performance.

Measure volume performance using Fio

Use the I/O generator tool, Fio, to measure baseline performance.

Using Fio

Fio applies a workload that you specify through a command-line interface or a configuration file. While it runs, Fio shows a progress indicator with the current throughput and input/output operations per second (IOPS). After the job ends, Fio displays a detailed summary.
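As a sketch of the configuration-file route, the following writes a Fio job file whose options mirror the command-line flags used on this page. The file name latency.fio and the job name latency are illustrative choices, and /netapp is assumed to be the mount point of your volume.

```shell
# Write a job file for a single-threaded 4k random-write job.
# latency.fio and the [latency] job name are illustrative, not required names.
cat > latency.fio <<'EOF'
[global]
directory=/netapp
ioengine=libaio
direct=1
size=10g
fallocate=none
runtime=60
time_based
ramp_time=5

[latency]
rw=randwrite
bs=4k
iodepth=1
EOF

# Run the job on a client that has Fio installed and the volume mounted:
#   fio latency.fio
```

Options in the [global] section apply to every job in the file, so additional jobs only need to list the settings that differ.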

Fio results example

The following example shows a single-threaded, 4k random write job running for 60 seconds, which is a useful way to measure baseline latency. In the following commands, the --directory parameter points to a folder with a mounted NetApp Volumes share:

  $ FIO_COMMON_ARGS="--size=10g --fallocate=none --direct=1 --runtime=60 --time_based --ramp_time=5"
  $ fio $FIO_COMMON_ARGS --directory=/netapp --ioengine=libaio --rw=randwrite --bs=4k --iodepth=1 --name=cvs
  cvs: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  fio-3.28
  Starting 1 process
  cvs: Laying out IO file (1 file / 10240MiB)
  Jobs: 1 (f=1): [w(1)][100.0%][w=7856KiB/s][w=1964 IOPS][eta 00m:00s]
  cvs: (groupid=0, jobs=1): err= 0: pid=1891: Wed Dec 21 14:56:37 2022
    write: IOPS=1999, BW=7999KiB/s (8191kB/s)(469MiB/60001msec); 0 zone resets
      slat (usec): min=4, max=417, avg=12.06, stdev= 5.71
      clat (usec): min=366, max=27978, avg=483.59, stdev=91.34
      lat (usec): min=382, max=28001, avg=495.96, stdev=91.89
      clat percentiles (usec):
      |  1.00th=[  408],  5.00th=[  429], 10.00th=[  437], 20.00th=[  449],
      | 30.00th=[  461], 40.00th=[  469], 50.00th=[  482], 60.00th=[  490],
      | 70.00th=[  498], 80.00th=[  515], 90.00th=[  529], 95.00th=[  553],
      | 99.00th=[  611], 99.50th=[  652], 99.90th=[  807], 99.95th=[  873],
      | 99.99th=[ 1020]
    bw (  KiB/s): min= 7408, max= 8336, per=100.00%, avg=8002.05, stdev=140.09, samples=120
    iops        : min= 1852, max= 2084, avg=2000.45, stdev=35.06, samples=120
    lat (usec)   : 500=70.67%, 750=29.17%, 1000=0.15%
    lat (msec)   : 2=0.01%, 4=0.01%, 50=0.01%
    cpu          : usr=2.04%, sys=3.25%, ctx=120561, majf=0, minf=58
    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued rwts: total=0,119984,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=1

  Run status group 0 (all jobs):
    WRITE: bw=7999KiB/s (8191kB/s), 7999KiB/s-7999KiB/s (8191kB/s-8191kB/s), io=469MiB (491MB), run=60001-60001msec

Read the following lines for details about the performance results:

  • Latency: lat (usec): min=382, max=28001, avg=495.96, stdev=91.89

    The average latency is 495.96 microseconds (usec), roughly 0.5 ms, which is an ideal latency.

  • IOPS: min= 1852, max= 2084, avg=2000.45, stdev=35.06, samples=120

    The preceding example shows an average of 2,000 IOPS. That value is expected for a single-threaded job with 0.5 ms latency (IOPS = 1000 ms/0.5 ms = 2000).

  • Throughput: bw ( KiB/s): min= 7408, max=8336, per=100.00%, avg=8002.05, stdev=140.09

    The average throughput is 8,002 KiB/s, which is the expected result for 2,000 IOPS with a block size of 4 KiB (2,000 IOPS × 4 KiB = 8,000 KiB/s).
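The arithmetic behind these three values can be checked with a short script. This is a minimal sketch for a single-threaded job only; the function names are hypothetical helpers, and the latency is rounded to 500 usec as in the explanation above.

```shell
# Expected IOPS for one outstanding I/O at a given average latency (usec).
iops_from_latency_usec() {
  awk -v lat="$1" 'BEGIN { printf "%.0f\n", 1000000 / lat }'
}

# Expected throughput in KiB/s for a given IOPS value and block size (KiB).
kib_per_sec() {
  awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.0f\n", iops * bs }'
}

iops_from_latency_usec 500   # prints 2000
kib_per_sec 2000 4           # prints 8000
```

If the measured IOPS or throughput is far from what the measured latency predicts, the job is probably not running with the intended block size or queue depth.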

Measure latency

Latency is a fundamental metric for volume performance. It's a result of client and server capabilities, the distance between client and server (your volume), and equipment in between. The main component of the metric is distance-induced latency.

You can ping the IP of your volume to get the round-trip time, which is a rough estimate of your latency.
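A minimal sketch of that check follows. It assumes the summary line format printed by the common Linux and BSD ping implementations (min/avg/max/...), and that the network path allows ICMP. VOLUME_IP is a placeholder for the IP address of your volume, and avg_rtt_ms is a hypothetical helper name.

```shell
# Extract the average round-trip time (ms) from ping's summary line.
# Assumes a summary such as "rtt min/avg/max/mdev = a/b/c/d ms" (Linux)
# or "round-trip min/avg/max/stddev = a/b/c/d ms" (BSD, macOS).
avg_rtt_ms() {
  awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}

# Usage (VOLUME_IP is a placeholder):
#   ping -c 10 "$VOLUME_IP" | avg_rtt_ms
```

The round-trip time covers only the network path, so latency measured with Fio is expected to be higher.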

Latency is affected by the block size and whether you are doing read or write operations. We recommend that you use the following parameters to measure the baseline latency between your client and a volume:

Linux

fio --directory=/netapp \
--ioengine=libaio \
--rw=randwrite \
--bs=4k \
--iodepth=1 \
--size=10g \
--fallocate=none \
--direct=1 \
--runtime=60 \
--time_based \
--ramp_time=5 \
--name=latency

Windows

fio --directory=Z\:\ --ioengine=windowsaio --thread --rw=randwrite --bs=4k --iodepth=1 --size=10g --fallocate=none --direct=1 --runtime=60 --time_based --ramp_time=5 --name=latency

Replace the parameters rw (read/write/randread/randwrite) and bs (block size) to fit your workload. Larger block sizes result in higher latency, and reads are generally faster than writes. The results can be found in the lat row.
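To pull the average out of the lat row without reading the full summary, a small filter can help. This is a sketch that assumes Fio's default human-readable output with the row reported in microseconds (usec); Fio may report other units for very low or very high latencies, in which case the filter prints nothing. fio_avg_lat_usec is a hypothetical helper name.

```shell
# Print the average from Fio's total-latency row, for example:
#   lat (usec): min=382, max=28001, avg=495.96, stdev=91.89
# The slat and clat rows and the latency distribution rows are ignored.
fio_avg_lat_usec() {
  sed -n 's/^ *lat (usec): .*avg= *\([0-9.]*\).*/\1/p'
}

# Usage: pipe the output of any Fio run on this page through the filter:
#   fio <options> | fio_avg_lat_usec
```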

Measure IOPS

IOPS are a direct result of latency and concurrency. Use the following command for your client type to measure IOPS:

Linux

fio --directory=/netapp \
--ioengine=libaio \
--rw=randread \
--bs=4k \
--iodepth=32 \
--size=10g \
--fallocate=none \
--direct=1 \
--runtime=60 \
--time_based \
--ramp_time=5 \
--name=iops

Windows

fio --directory=Z\:\ --ioengine=windowsaio --thread --rw=randread --bs=4k --iodepth=32 --size=10g --fallocate=none --direct=1 --runtime=60 --time_based --ramp_time=5 --numjobs=16 --name=iops

Replace the parameters rw (read/write/randread/randwrite), bs (block size), and iodepth (concurrency) to fit your workload. The results can be found in the iops row.
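The relationship between latency, concurrency, and IOPS can be sketched as a rough estimate. The calculation below assumes that latency stays constant as concurrency grows, which is optimistic because latency usually rises under load, so treat the result as an upper bound rather than a prediction. estimate_iops is a hypothetical helper name.

```shell
# Rough upper bound on IOPS: outstanding I/Os divided by latency.
# Arguments: average latency (usec), iodepth, numjobs (defaults to 1).
estimate_iops() {
  awk -v lat="$1" -v depth="$2" -v jobs="${3:-1}" \
    'BEGIN { printf "%.0f\n", depth * jobs * 1000000 / lat }'
}

estimate_iops 500 1    # prints 2000
estimate_iops 500 32   # prints 64000
```

Comparing this estimate with the measured iops row shows how much latency increased when the queue depth was raised.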

Measure throughput

Throughput is IOPS multiplied by block size. Use the following command for your client type to measure throughput:

Linux

fio --directory=/netapp \
--ioengine=libaio \
--rw=read \
--bs=64k \
--iodepth=32 \
--size=10g \
--fallocate=none \
--direct=1 \
--runtime=60 \
--time_based \
--ramp_time=5 \
--numjobs=16 \
--name=throughput

Windows

fio --directory=Z\:\ --ioengine=windowsaio --thread --rw=read --bs=64k --iodepth=32 --size=10g --fallocate=none --direct=1 --runtime=60 --time_based --ramp_time=5 --numjobs=16 --name=throughput

Replace the parameters rw (read/write/randread/randwrite), bs (block size), and iodepth (concurrency) to fit your workload. High throughput requires both high concurrency and block sizes of 64k or larger.

What's next

Review performance benchmarks.