Can Automatic Striping make VMware vSAN performance better?

Fine-tuning VMware vSAN is fun. We bet it takes less work to prepare a ballistic missile for launch than to deploy this SDS solution… Anyway, we’ll try to find the right settings to ensure the highest vSAN performance for you! Today, we check whether Automatic Striping provides any performance boost.

Introduction

Slowest things in the world

There was a guy saying that we should increase the “data” virtual disk size to get higher VMware vSAN performance. Sounds weird, but he’s right: if you create a virtual device larger than 255 GB, Automatic Striping kicks in for it by default (read more about that here: https://blogs.vmware.com/virtualblocks/2016/09/19/vsan-stripes/). Wait, what if you do not need a virtual disk as massive as Paul’s ex? 🙂 Well, you can just set the stripe width yourself! In our previous article, we discussed how this can be done and how to find the right Number of disk stripes per object value. Today, we check whether setting the stripe width manually provides the same advantage as Automatic Striping.
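If you like to see the arithmetic behind that 255 GB threshold, here is a tiny Python sketch of the splitting logic. It is a rough illustration only, built on the documented 255 GB maximum component size; the names are ours, and it ignores how vSAN actually places components based on free space and rebalancing.

import math

MAX_COMPONENT_GB = 255  # documented maximum vSAN component size

def components_per_replica(vmdk_gb: int, stripe_width: int = 1) -> int:
    """Rough estimate of how many data components one replica gets.

    Objects bigger than 255 GB are split into multiple components even
    if the policy's stripe width stays at 1 (that is Automatic Striping);
    a manually set stripe width forces at least that many components.
    """
    size_driven_split = math.ceil(vmdk_gb / MAX_COMPONENT_GB)
    return max(size_driven_split, stripe_width)

print(components_per_replica(264))      # 2 -> Automatic Striping kicks in
print(components_per_replica(100))      # 1 -> no striping by default
print(components_per_replica(100, 4))   # 4 -> manual stripe width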

Hardware toys

To start with, let’s take a look at our environment configuration. The tests were run on the same setup as before. The overall datastore capacity remained 6.99 TB, so with such massive “data” virtual disks, we could not spawn more than 12 VMs in the cluster.

Here’s the environment configuration:

  • ESXi Host #1, ESXi Host #2, ESXi Host #3, and ESXi Host #4 are packed with identical hardware.
  • Server: Dell R730; CPU: 2x Intel Xeon E5-2683 v3 @ 2.00 GHz; RAM: 64 GB
  • Storage: 1x Intel SSD DC P3700 2TB, 4x Intel SSD DC S3500 480GB
  • LAN: 2x Mellanox ConnectX-4 100 Gbit/s CX465A
  • Hypervisor: VMware ESXi 6.5 Update 1

To make everything clear, study the setup interconnection diagram below.

VMware vSAN 4-node Cluster

Boring part: VMware vSAN deployment

Here is the whole process of VMware vSAN deployment.

VMware vSAN deployment

VMware vSAN deployment

VMware vSAN deployment Claim disks

VMware vSAN deployment Create fault domains

VMware vSAN deployment Complete

Once vSAN was successfully deployed, we created a new VM storage policy.

VM storage policy

VM storage policy structure

VM storage policy vSAN

VM storage policy vSAN

VM storage policy Storage compatibility

VM storage policy review and finish

Finally, let’s check whether we set everything up right.

Configure vSAN

Configure vSAN

Configure vSAN

Configure vSAN

Well, everything looks great. There are no critical errors, health warnings, or any other nonsense that could disrupt VMware vSAN stability.

Creating a test VM

Now, let’s create a test VM. Find its configuration below:

  • CPU: 4x vCPU (1x Socket, 4x Cores per Socket)
  • Memory: 7 GB
  • Hard disk 1: 25 GB (vmdk, SCSI controller 0 – LSI Logic SAS) on the local ESXi datastore. This is the “system” disk (the virtual device where the guest OS resides)
  • Hard disk 2: 264 GB (vmdk, SCSI controller 1 – VMware Paravirtual) on vsanDatastore. This is the “data” disk (the virtual device whose performance we test)
  • Network adapter: 1x VMXNET 3

Creating a test VM

Creating a test VM virtual hardware

Finding the optimal test utility parameters

There’s still one thing left before we finally start the research: picking the optimal test utility parameters (number of threads, queue depth). We need them to make sure that we get the highest performance VMware vSAN can potentially provide. To find these parameters, we ran a series of measurements with a varying number of threads and increasing Outstanding I/O values. All of these tests were run under the 4k random read pattern. The number of threads and queue depth that correspond to the highest performance are considered the optimal test utility parameters.

Here is what the test utility parameters looked like:

DiskSPD testing parameters under threads=1, Outstanding I/O=1,2,4,8,16,32,64,128
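rem Flag cheat sheet: -t = threads, -b = block size, -r = random access, -w0 = 0% writes (pure reads),
rem -o = Outstanding I/O (queue depth) per thread, -d = test duration in seconds,
rem -Sh = disable software caching and hardware write caching, -L = capture latency stats, #1 = physical drive 1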

diskspd.exe -t1 -b4k -r -w0 -o1 -d60 -Sh -L #1 > c:\log\t1-o1-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o2 -d60 -Sh -L #1 > c:\log\t1-o2-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o4 -d60 -Sh -L #1 > c:\log\t1-o4-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\t1-o8-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\t1-o16-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o32 -d60 -Sh -L #1 > c:\log\t1-o32-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o64 -d60 -Sh -L #1 > c:\log\t1-o64-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o128 -d60 -Sh -L #1 > c:\log\t1-o128-4k-rand-read.txt
timeout 10
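By the way, typing one command line per queue depth gets old fast. Here is a minimal Python sketch that generates the same sweep (plus the threads=2/4/8 runs); it assumes diskspd.exe is on the PATH, the target is physical drive #1, and c:\log already exists:

import itertools
import subprocess
import time

THREADS = [1, 2, 4, 8]
QUEUE_DEPTHS = [1, 2, 4, 8, 16, 32, 64, 128]

for t, o in itertools.product(THREADS, QUEUE_DEPTHS):
    log = rf"c:\log\t{t}-o{o}-4k-rand-read.txt"
    cmd = ["diskspd.exe", f"-t{t}", "-b4k", "-r", "-w0",
           f"-o{o}", "-d60", "-Sh", "-L", "#1"]
    with open(log, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)  # 60-second 4k random read run
    time.sleep(10)  # same cool-down as the 'timeout 10' between runs above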

FIO testing parameters under threads=1, Outstanding I/O=1,2,4,8,16,32,64,128

[global]
numjobs=1
loops=1
time_based
ioengine=windowsaio
direct=1
runtime=60
filename=\\.\PhysicalDrive1
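# time_based + runtime=60: each job below runs for 60 seconds regardless of loops
# direct=1 bypasses the OS cache; windowsaio is fio's asynchronous I/O engine on Windows
# \\.\PhysicalDrive1 is the raw 264 GB "data" disk (the VM's second virtual disk)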

[4k-rnd-read-o1]
bs=4k
iodepth=1
rw=randread
stonewall

[4k-rnd-read-o2]
bs=4k
iodepth=2
rw=randread
stonewall

[4k-rnd-read-o4]
bs=4k
iodepth=4
rw=randread
stonewall

[4k-rnd-read-o8]
bs=4k
iodepth=8
rw=randread
stonewall

[4k-rnd-read-o16]
bs=4k
iodepth=16
rw=randread
stonewall

[4k-rnd-read-o32]
bs=4k
iodepth=32
rw=randread
stonewall

[4k-rnd-read-o64]
bs=4k
iodepth=64
rw=randread
stonewall

[4k-rnd-read-o128]
bs=4k
iodepth=128
rw=randread
stonewall
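To avoid eyeballing fio’s text output, you can ask it for JSON and pull the numbers out with a few lines of Python. A sketch below; the job-file name is our own, and note that recent fio builds report latency as lat_ns while older ones use lat in microseconds:

import json
import subprocess

# The job file above, saved as 4k-rand-read.fio (the name is our own choice)
subprocess.run(["fio", "4k-rand-read.fio",
                "--output-format=json", "--output=fio-result.json"],
               check=True)

with open("fio-result.json") as f:
    result = json.load(f)

for job in result["jobs"]:
    rd = job["read"]
    lat_ms = rd["lat_ns"]["mean"] / 1e6  # recent fio builds report latency in nanoseconds
    print(f'{job["jobname"]}: {rd["iops"]:.0f} IOPS, {lat_ms:.2f} ms')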

VMware Virtual Disk 264GB (RAW) over vsanDatastore 4k random read (DiskSPD)

VMware Virtual Disk 264GB (RAW) over vsanDatastore – 4k random read (DiskSPD)
          threads=1                 threads=2                 threads=4                 threads=8
          IOPS    MB/s  Lat (ms)    IOPS    MB/s  Lat (ms)    IOPS    MB/s  Lat (ms)    IOPS    MB/s  Lat (ms)
QD=1      2635    10    0.38        6956    27    0.29        14437   56    0.28        26724   104   0.30
QD=2      6088    24    0.33        13988   55    0.29        26643   104   0.30        49155   192   0.33
QD=4      12826   50    0.31        26127   102   0.31        47915   187   0.33        63982   250   0.50
QD=8      24561   96    0.33        47682   186   0.34        61912   242   0.52        66513   260   0.96
QD=16     45080   176   0.36        64505   252   0.49        68291   267   0.94        65743   257   1.95
QD=32     63239   247   0.51        65038   254   0.99        65670   257   1.95        67785   265   3.78
QD=64     68207   266   0.94        68190   266   1.88        67623   264   3.79        67852   265   7.55
QD=128    68506   268   1.87        68315   267   3.75        67449   263   7.59        67596   264   15.15

VMware Virtual Disk 264GB (RAW) over vsanDatastore 4k random read (FIO)

VMware Virtual Disk 264GB (RAW) over vsanDatastore – 4k random read (FIO)
          threads=1                 threads=2                 threads=4                 threads=8
          IOPS    MB/s  Lat (ms)    IOPS    MB/s  Lat (ms)    IOPS    MB/s  Lat (ms)    IOPS    MB/s  Lat (ms)
QD=1      3427    13    0.28        7363    29    0.26        14841   58    0.26        27375   107   0.28
QD=2      6948    27    0.28        14607   57    0.26        27189   106   0.28        50154   196   0.31
QD=4      13813   54    0.28        27095   106   0.28        47895   187   0.32        64176   251   0.49
QD=8      25251   99    0.31        50858   199   0.30        62356   244   0.50        67370   263   0.94
QD=16     49803   195   0.31        67200   263   0.46        66784   261   0.95        67501   264   1.89
QD=32     62339   244   0.46        68106   266   0.92        66974   262   1.90        69237   270   3.69
QD=64     67339   263   0.93        69254   271   1.83        66991   262   3.81        68644   268   7.45
QD=128    66587   260   1.90        69169   270   3.69        68420   267   7.47        68568   268   14.92

Mini-conclusion

With the VM configuration in mind, we decided to use the following test utility parameters: threads=4 and Outstanding I/O=16.
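In case you wonder how we arrive at such a pick: we take the lightest threads x Outstanding I/O combination that already sits on the IOPS plateau instead of chasing the absolute maximum at the cost of latency. Here is a sketch of that reasoning over a handful of DiskSPD points from the table above:

# Pick the lightest (threads, QD) pair that already delivers ~97% of the
# peak IOPS seen in the sweep -- the reasoning behind threads=4, QD=16.
results = {
    # (threads, qd): IOPS -- a few DiskSPD points from the 4k random read table
    (1, 32): 63239, (2, 16): 64505, (4, 16): 68291,
    (4, 64): 67623, (8, 8): 66513, (8, 32): 67785,
}

peak = max(results.values())
candidates = [(t * q, t, q) for (t, q), iops in results.items()
              if iops >= 0.97 * peak]
load, threads, qd = min(candidates)  # smallest amount of in-flight I/O wins
print(f"threads={threads}, Outstanding I/O={qd} (~{results[(threads, qd)]} IOPS)")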

Test time!

Measurements in this article were performed in the same way as we always do. First, we created a lonely VM in the cluster. Then, after measuring its performance, we cloned it to another node and benchmarked the performance again, and so on. We stopped once there were 12 VMs in the cluster (we had capacity tier limitations).

Here are the test patterns under which we measured vSAN performance (a sketch of how each pattern maps to DiskSPD flags follows the list):

  • 4k random write
  • 4k random read
  • 64k random write
  • 64k random read
  • 8k random 70%read/30%write
  • 1M sequential read

The screenshot below shows the menu where you can span a virtual disk across all drives in the host capacity tier: just select the stripe width value from the Number of disk stripes per object drop-down list.

Edit VM Storage Policy vSAN

We followed the same methodology as before.

Results

Now, let’s see whether Automatic Striping makes VMware vSAN cluster performance any better. We compared the overall cluster performance gain while swarming the cluster with VMs that rely on Automatic Striping (Number of disk stripes per object = 1 on a 264 GB “data” disk) against the performance we previously observed for VMs with smaller virtual disks. To check whether it matters that the stripe width is picked automatically, we also plotted the overall cluster performance under Number of disk stripes per object = 4.
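If you prefer numbers over bar charts, a quick way to read the raw tables further below is to compute the relative difference between the two policies at each VM count. The sketch takes the DiskSPD 4k random write IOPS for the first six steps as an example:

# Relative difference between stripe width = 4 and stripe width = 1
# for 4k random write (DiskSPD IOPS from the tables below), per VM count.
stripe1 = [23060, 37198, 40882, 35901, 40788, 39021]   # 1x..6x VM, stripes per object = 1
stripe4 = [10882, 30136, 38675, 36058, 39559, 31091]   # 1x..6x VM, stripes per object = 4

for vms, (s1, s4) in enumerate(zip(stripe1, stripe4), start=1):
    gain = (s4 - s1) / s1 * 100
    print(f"{vms}x VM: stripe=4 vs stripe=1: {gain:+.1f}%")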

Performance VMware Virtual Disk 264GB (RAW) 4k random write (IOPS)

Performance VMware Virtual Disk 264GB (RAW) 4k random write (MB/s)

Performance VMware Virtual Disk 264GB (RAW) 4k random read (IOPS)

Performance VMware Virtual Disk 264GB (RAW) 4k random read (MB/s)

Performance VMware Virtual Disk 264GB (RAW) 64k random write (IOPS)

Performance VMware Virtual Disk 264GB (RAW) 64k random write (MB/s)

Performance VMware Virtual Disk 264GB (RAW) 64k random read (IOPS)

Performance VMware Virtual Disk 264GB (RAW) 64k random read (MB/s)

Performance VMware Virtual Disk 264GB (RAW) 8k random 70% read 30% write (IOPS)

Performance VMware Virtual Disk 264GB (RAW) 8k random 70% read 30% write (MB/s)

Performance VMware Virtual Disk 264GB (RAW) 1M seq read (IOPS)

Performance VMware Virtual Disk 264GB (RAW) 1M seq read (MB/s)

Number of disk stripes per object=1
Patterns: 4k random write | 4k random read | 64k random write | 64k random read | 8k random 70%read/30%write | 1M seq read
For each pattern, two column groups follow: DiskSPD (Stripe=1) and FIO (Stripe=1). Each group reports IOPS, MB/s, and Latency (ms).
1x VM 23060 90 2.78 24637 96 10.38 67054 262 0.95 63732 249 4.01 7065 442 9.06 7199 450 35.53 26169 1636 2.45 20028 1252 12.77 36470 285 1.75 35904 281 7.55 1128 1128 56.76 1176 1177 217.08
2x VM 37198 145 3.86 40431 158 13.41 134719 526 0.95 133074 520 3.85 10123 633 12.67 9973 624 51.31 48282 3018 2.65 40840 2553 12.53 95444 746 1.35 95021 742 5.50 2349 2349 54.49 2343 2348 217.85
3x VM 40882 160 9.32 40291 157 40.73 190574 744 1.01 189921 742 4.03 9627 602 22.02 9901 619 82.57 56913 3557 3.42 47458 2967 16.17 87505 684 2.57 86739 678 10.92 3437 3437 57.11 3023 3030 259.44
4x VM 35901 140 9.76 34362 134 34.61 181728 710 1.45 178676 698 5.91 8728 546 31.43 9584 600 110.26 64296 4019 3.99 56331 3521 18.24 80454 629 3.80 77150 603 15.27 3817 3817 67.04 3507 3515 292.31
5x VM 40788 159 10.35 37675 147 40.61 163135 637 2.19 160324 626 8.79 9569 598 36.31 10777 674 134.09 61581 3849 5.34 59155 3698 22.09 78159 611 4.52 79372 620 18.58 4071 4071 84.82 3708 3720 353.61
6x VM 39021 152 10.68 42533 166 40.27 222893 871 2.00 220477 861 7.81 11138 696 36.13 12294 769 132.82 80298 5019 5.07 76386 4775 20.97 97071 758 4.28 101707 795 17.71 5367 5367 75.60 4727 4735 334.48
7x VM 40774 159 11.19 42849 167 42.55 228148 891 2.13 222568 869 8.90 11614 726 38.70 12811 802 140.61 88483 5530 5.13 77367 4836 23.57 95997 750 4.72 103794 811 19.10 5561 5561 83.38 5070 5085 365.28
8x VM 41249 161 12.44 43041 168 48.76 234629 917 2.38 231746 905 9.76 12178 761 43.16 13665 855 154.61 92592 5787 5.73 81075 5068 26.54 98402 769 5.46 107103 837 24.27 6005 6005 88.64 5487 5505 382.01
9x VM 41653 163 14.38 44636 174 54.24 225717 882 3 222357 869 10.90 12165 760 48.23 13598 851 171.29 91917 5745 6.33 82141 5135 28.92 96022 750 6.12 100621 786 25.78 6556 6556 92.23 5737 5757 429.29
10x VM 37169 145 17.32 41180 161 63.40 231541 904 2.84 225574 881 12.01 12232 765 52.85 13587 851 190.82 91739 5734 7.10 85756 5361 30.53 95448 746 6.75 101065 790 28.51 6371 6371 100.66 5692 5712 455.13
11x VM 40075 157 18.94 41608 163 71.52 229075 895 3.34 222686 870 13.87 11975 748 63.86 13444 841 223.56 94234 5890 7.76 75591 4726 39.89 94740 740 8.91 92708 725 40.75 6598 6598 108.04 5770 5791 490.55
12x VM 39318 154 26.51 52155 204 86.87 207064 809 3.98 226215 884 14.43 12182 761 111.24 14475 906 332.18 99624 6227 7.82 88753 5549 36.11 98455 769 9.34 95298 745 42.25 7011 7011 110.41 6349 6368 485.75
All columns: threads=4, Outstanding I/O=16

Number of disk stripes per object=4
Patterns: 4k random write | 4k random read | 64k random write | 64k random read | 8k random 70%read/30%write | 1M seq read
For each pattern, two column groups follow: DiskSPD (Stripe=4) and FIO (Stripe=4). Each group reports IOPS, MB/s, and Latency (ms).
1x VM 10882 43 5.88 27729 108 9.22 77928 304 0.82 76944 301 3.31 5606 350 11.41 7147 447 35.79 26255 1641 2.44 21358 1335 11.97 30061 235 2.13 38773 303 7.02 1185 1185 53.99 1180 1181 216.43
2x VM 30136 118 4.29 33372 130 15.70 141625 553 0.91 141979 555 3.61 9176 574 14.00 10642 665 48.09 50711 3169 2.52 41282 2580 12.39 62769 490 2.22 78885 616 6.88 2319 2319 55.21 2332 2337 218.84
3x VM 38675 151 6.32 48687 190 17.12 201538 787 0.96 201356 787 3.82 11407 713 17.05 14246 891 53.87 73345 4584 2.63 60505 3782 12.69 91474 715 2.24 111800 874 7.35 3467 3467 55.40 3540 3547 216.42
4x VM 36058 141 12.04 43090 168 30.24 213695 835 1.21 222288 868 4.72 10072 630 26.75 12199 763 86.64 83958 5247 3.06 77330 4833 13.26 69020 539 4.48 100852 788 12.63 4954 4954 51.89 4862 4871 210.18
5x VM 39559 155 12.45 42180 165 40.79 208558 815 1.60 213374 834 6.23 10569 661 30.94 11728 734 109.54 80985 5062 4.18 75904 4745 17.79 81341 635 4.71 89937 703 17.88 4955 4955 66.64 4835 4845 272.85
6x VM 31091 121 15.83 38031 149 48.05 183026 715 2.32 187335 732 9.10 9221 576 43.64 10490 656 147.26 72914 4557 5.71 68877 4306 23.83 64275 502 6.47 69095 540 24.64 5912 5912 65.44 5770 5780 266.89
7x VM 41485 162 14.01 48165 188 44.28 216801 847 2.20 219075 856 8.52 11567 723 39.10 12811 802 141.42 88232 5515 5.48 87187 5450 22.20 81615 638 5.86 103556 809 22.72 6257 6257 73.64 6092 6106 301.80
8x VM 38389 150 16.27 41479 162 49.42 207099 809 2.56 213805 835 9.97 11527 720 44.68 12572 787 163.94 82473 5155 6.43 83022 5190 25.55 79576 622 7.10 90080 704 28.63 6775 6775 78.91 6361 6377 332.28
9x VM 37927 148 15.86 41937 164 55.44 201875 789 3.20 210947 824 11.47 11265 704 54.49 12448 779 191.18 92193 5762 6.66 88490 5532 26.99 90810 709 9.12 92015 719 37.01 6920 6920 86.35 6282 6301 371.70
10x VM 37177 145 17.42 45327 177 58.72 233345 912 2.91 231514 904 11.76 11815 738 56.54 12367 774 209.43 98666 6167 6.86 101088 6319 26.85 69844 546 9.27 79806 624 37.05 7679 7679 89.54 7542 7560 355.57
11x VM 36836 144 19.23 45141 176 63.06 228305 892 3.29 221628 866 14.04 11570 723 62.17 12436 779 226.40 99177 6199 7.72 96673 6043 31.48 73860 577 10.69 79370 620 40.08 7880 7880 95.94 7565 7585 396.81
12x VM 37769 148 21.36 50897 199 68.47 233073 910 3.48 238762 933 14.29 11094 693 77.06 14265 893 224.48 102272 6392 8.01 85798 5364 44.17 82937 648 10.73 83835 655 41.50 8622 8622 94.73 8073 8095 397.89
All columns: threads=4, Outstanding I/O=16

Conclusion

Let’s sum up everything that we saw today and in our previous study.

As you already know, switching to a higher Number of disk stripes per object can give you some extra IOPS, so just set this parameter to the value corresponding to the number of disks in the host capacity tier. This parameter is especially a game-changer for write performance… but, wait, such a way to boost performance is kinda “illegal”! Who cares, though? Just look at the plots above to see how the overall cluster performance changes.

Good news: Automatic Striping works. Interestingly, it provides a slightly better performance gain than a manually set stripe width. Maybe there is some black magic under the hood… who knows. Whatever the case, for virtual disks of small capacity (less than 255 GB), ALWAYS SET THE NUMBER OF DISK STRIPES PER OBJECT RIGHT! For massive virtual devices, the parameter is picked automatically so that VMs can exhibit the highest possible performance.

Regardless of how cool this feature is, VMware, for some reason, did not bother to provide enough info on it. There is nothing about it in VMware Docs. When you create or add a new virtual disk, its default size is 40 GB and the stripe width is set to 1, so there is no sign of Automatic Striping at all. That’s actually why VMware vSAN performance is underwhelming by default. VMware, why are you doing this to us?
