VMware vSAN is great… in messing things up! Re-investigating VMware vSAN 4-node cluster performance

Introduction

As you probably know, we’ve tested the VMware vSAN 4-node cluster previously. This time, to sort things out completely, we’ve decided to run another series of tests, but this time, with Intel SSD DC P3700 NVMe drives and 100 GbE switches. To refresh your memory, VMware claims its vSAN can “scale to tomorrow”. Houston, we’ve got a problem – our future is HWEMED UP. But OK, maybe this time, it won’t be so bad. Let’s find out.

Suspect:

VMware vSAN

To check the suspect’s crime history and previous verdict, check out our previous article here:

http://www.cultofanarchy.org/another-day-another-patient-four-node-vmware-vsan-cluster-performance-report/

Status:

Pretrial detention due to possible relapse

Mission:

Measure the 4-node VMware vSAN hyperconverged cluster scalability using a vSAN datastore built out of 8 Intel SSD DC P3700 drives. For this goal, we’ll keep increasing the vSAN datastore load by increasing the number of VMs until we reach the saturation point where performance growth stops, or until we hit the 12-VM limit.

Methodology, Considerations & Milestones:

So today, we’re investigating the VMware vSAN 4-node hyperconverged cluster with ESXi 6.5 Update 1 to show how awesome (read: poorly) it can scale. You know what they say… these guys never lose a chance to lose a chance. This time we’ve upgraded our setup with Intel SSD DC P3700 drives and 100 GbE switches to make the game fair. The use of Intel SSD DC P3700 drives for a virtual storage pool should increase the VMs’ I/O performance for any hypervisor. Therefore, by combining the disks into a single pool, we should allow the hypervisor to effectively load-balance the input/output operations among all the VMs running in the cluster. Add the hypervisor’s system scalability, and we should get a highly available cluster where an increased number of VMs (proportional to the number of disks in the pool dedicated to VMs) should (at least) keep the same VM I/O performance level. Theoretically, increasing the number of VMs should lead to overall performance growth.

Testing methodology

1. Check whether the “raw” performance of the Intel SSD DC P3700 2TB on Windows Server 2016 corresponds to the values claimed by the vendor.

2. Deploy a 4-node VMware vSAN cluster with vSAN datastore built out of eight Intel SSD DC P3700 2TB drives

3. Create a Windows Server 2016 VM and keep it pinned to the ESXi host. A 256GB VMware Virtual Disk (VMware Paravirtual SCSI) is attached as the second hard drive of the VM.

4. Determine the optimal number of threads and Outstanding I/O value for launching DiskSPD and FIO utilities

5. Run a VMware 256GB Virtual Disk performance test on a single VM

6. Clone the VM to the second ESXi host and repeat the performance test of VMware 256GB Virtual Disk on all VMs simultaneously.

7. Repeat step 6 (clone the VM, keep it pinned to the next node, and measure the overall performance) until we hit the saturation point where performance growth stops, or until we reach the 12-VM limit.

8. Test the performance of a single Intel SSD DC P3700 2TB drive on Windows Server 2016. We’ll compare its results with the performance of VMware vSAN.

The toolkit

Hardware

The setup configuration for measuring the “raw” Intel SSD DC P3700 performance in a Windows Server 2016 environment:

Node: Dell R730 chassis, 2x Intel Xeon E5-2683 v3 CPUs @ 2.00 GHz, 64GB RAM

Storage: 1x Intel SSD DC P3700 2TB NVMe

OS: Windows Server 2016

The 4-node VMware vSAN cluster configuration:

4 ESXi hosts: Dell R730 chassis, 2x Intel Xeon E5-2683 v3 CPUs @ 2.00 GHz, 64GB RAM

Storage: 2x Intel SSD DC P3700 2TB, 1x Intel SSD DC S3500 480GB

LAN: 2x Mellanox ConnectX-4 CX465A 100 GbE

Hypervisor: VMware ESXi 6.5 Update 1

The network interconnection diagram for the VMware vSAN 4-node cluster is provided below:

VMware vSAN

Investigating if “raw” Intel SSD DC P3700 2TB NVMe performance matches the vendor-claimed value

The table below indicates the performance values for the P3700 drive series.

 Intel SSD DC P3700 2TB NVMe

Here is the official Intel website where we took these numbers:

https://www.intel.com/content/www/us/en/solid-state-drives/intel-ssd-dc-family-for-pcie-brief.html

As we can see, the Intel SSD DC P3700 2TB drive hits 460K IOPS with 4 workers under Queue Depth 32:

Queue Depth 32

It’s not that we don’t trust Intel, we just wanna see everything with our own eyes. So, let’s test those claimed IOPS values under the 4k random read pattern using DiskSPD v2.17 and FIO v3.5.
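For the record, here’s roughly how such a run can be scripted (a minimal sketch; the drive index #1, the log path, and diskspd.exe being in PATH are assumptions about our setup):

import subprocess

# 4 workers, Queue Depth 32, 4k random read against raw physical drive #1 for 60 seconds
cmd = ["diskspd.exe", "-t4", "-b4k", "-r", "-w0", "-o32", "-d60", "-Sh", "-L", "#1"]
with open(r"c:\log\raw-4k-rand-read.txt", "w") as log:
    subprocess.run(cmd, stdout=log, check=True)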

Here are the results:

Performance Intel SSD DC P3700 2TB (RAW) – DiskSPD (4k random read)

Performance Intel SSD DC P3700 2TB (RAW) – FIO (4k random read)

Mini-conclusion

Overall, our test results match the Intel SSD DC P3700 2TB drive performance characteristics claimed by the vendor. In particular, DiskSPD showed a maximum performance of nearly 460K IOPS under the 4k random read pattern with 4 workers and Queue Depth 32.

Well, at least Intel doesn’t dwnnujkv us about its performance rates. What can we say about VMware vSAN? Time to find out.

Deploying and testing VMware vSAN 4-node hyperconverged cluster

Testing network throughput

Having deployed ESXi on all 4 hosts and configured the network, let’s check the network throughput between the ESXi hosts using iperf. Even though we used Mellanox 100 GbE NICs and 100 GbE switches, the throughput between the ESXi hosts was only 40 Gbps.

Testing network throughput

Honestly, we did all we could… installed the latest nmlx5-core 4.16.12.12 NIC driver for the Mellanox ConnectX-4 CX465A 100 GbE NICs, united two NICs on the host into a NIC team, changed the MTU, changed the switch’s port settings – nothing helped improve the throughput. It seems that VMware vSAN is rock-solid in terms of resisting improvements.

VMware vSAN requires 1 GbE network throughput for hybrid configurations, but for an all-flash array on NVMe drives, it requires 10 GbE or higher. Anyway, our 40 Gbps network throughput should be enough and should not influence the VMware vSAN cluster performance. If you wanna know more about the VMware vSAN network requirements, check out their official website here.

vSAN network requirements

So we know that one port of the Mellanox ConnectX-4 NIC provides a throughput of 40 Gbps, or 5 GB/s. Thus, two ports give us 10 GB/s. So, a single ESXi host has a network limit of (10 GB/s * 1024 * 1024) / 4 KB ≈ 2620K IOPS on 4k blocks. Two Intel SSD DC P3700 drives give us 2 * 460K = 920K IOPS, which is still significantly lower than the network limit. As a result, 10 GB/s network throughput should be enough for running our tests. Still, this could be a bottleneck under large-block patterns.
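The same back-of-the-envelope math in code form, following our rounding above (a sketch, not a benchmark):

# Per-host network ceiling vs. what two P3700 drives can theoretically deliver
net_bytes_per_sec = 10 * 1024**3                   # ~10 GB/s over two 40 Gbps ports (our measured throughput)
block_size = 4 * 1024                              # 4 KiB blocks
net_iops_limit = net_bytes_per_sec // block_size   # = 2,621,440, i.e. the ~2620K IOPS mentioned above

per_disk_iops = 460_000                            # vendor-claimed 4k random read IOPS of one P3700
drives_per_host = 2
print(net_iops_limit, drives_per_host * per_disk_iops)   # ~2.62M network limit vs. 920K from the drives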

For creating a vSAN datastore, we need to specify disks for the Cache tier and the Capacity tier. The Cache tier must have an SSD, while the Capacity tier can have SSDs or HDDs. Note that the architecture does not allow creating datastores without a cache. In our case, we’re building an all-flash array on NVMe drives and dedicate one NVMe drive per tier on each host.

Configure vSAN

What we get is a vSAN datastore of 7.28TB over 8x Intel SSD DC P3700 2TB.

vsanDatastore

Creating a test VM

Why would you use fast SSDs for a vSAN datastore anyway? Well, for a variety of reasons, actually. It can serve as a datastore for MS SQL, for example. So, since there is a real-life scenario for our vSAN, let’s take our VM configuration from one of the standard options Azure provides for working with MS SQL.

Microsoft Azure

We’ve chosen the maximum configuration for our VM – A3 Standard. So what we have is…

– 4x vCPU
– RAM 7GB
– Disk0 (Type SCSI) – 25GB (for Windows Server 2016)
– Disk1 (Type SCSI) – 256GB (VMware Virtual Disk 256GB – as a second drive in each VM)

NOTE: Since virtual disks are created on the vSAN datastore with the Thin Provision type, we have to fill the disk with some random data to simulate its “normal” fill level before testing. We’ll use the dd.exe utility for that purpose. We’ll also have to do this before each test whenever we create a new virtual disk (or change its volume) inside the VM.

Here are the dd utility launching parameters:

dd.exe bs=1M if=/dev/random of=\\?\Device\Harddisk1\DR1 --progress

Let’s create the first test VM keeping it pinned to ESXi Host1 and evaluate the optimal launch parameters (number of threads and Outstanding I/O value) for DiskSPD and FIO.

Investigating VMware 256GB Virtual Disk performance on a sole Windows Server 2016 VM

At this stage, we’ll run a series of tests to measure the VMware 256GB Virtual Disk performance under the 4k random read pattern with a changing number of threads and Outstanding I/O values. This is done to find the saturation point where performance growth stops and latency increases, along with the corresponding threads and Outstanding I/O values.

Here is an example of the DiskSPD listing for threads=1 and Outstanding I/O=1, 2, 4, 8, 16, 32, 64, 128:

diskspd.exe -t1 -b4k -r -w0 -o1 -d60 -Sh -L #1 > c:\log\t1-o1-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o2 -d60 -Sh -L #1 > c:\log\t1-o2-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o4 -d60 -Sh -L #1 > c:\log\t1-o4-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\t1-o8-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\t1-o16-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o32 -d60 -Sh -L #1 > c:\log\t1-o32-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o64 -d60 -Sh -L #1 > c:\log\t1-o64-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o128 -d60 -Sh -L #1 > c:\log\t1-o128-4k-rand-read.txt
timeout 10
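
By the way, instead of maintaining such batch files by hand, the whole threads × Outstanding I/O sweep can be scripted. A rough sketch (the thread range is an assumption; the Outstanding I/O values, the target #1, and the log paths simply mirror the listing above):

import itertools
import subprocess
import time

threads = [1, 2, 4, 8]                       # assumed sweep range
outstanding = [1, 2, 4, 8, 16, 32, 64, 128]  # same Outstanding I/O values as the listing above

for t, o in itertools.product(threads, outstanding):
    cmd = ["diskspd.exe", f"-t{t}", "-b4k", "-r", "-w0", f"-o{o}",
           "-d60", "-Sh", "-L", "#1"]
    with open(rf"c:\log\t{t}-o{o}-4k-rand-read.txt", "w") as log:
        subprocess.run(cmd, stdout=log, check=True)   # one 60-second 4k random read run
    time.sleep(10)                                    # same cool-down pause as the "timeout 10" lines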

Performance VMware Virtual Disk 256GB (RAW) – 4k random read (DiskSPD)

Performance VMware Virtual Disk 256GB (RAW) – 4k random read (FIO)

Mini-conclusion

What do the numbers tell us? Well, that with a single VM, the maximum VMware 256GB virtual disk performance is somewhere around 70,000-71,000 IOPS with an “affordable” disk latency of 0.44-0.45 ms. We reached these values with 1, 2, and 4 threads. That’s why, in our further tests, we’ll use the following launching parameters for DiskSPD and FIO: threads=4, Outstanding I/O=8.

Configuring testing tools

Now that we have optimal parameters for launching our testing utilities, we can proceed to their configuration.

Again, we use DiskSPD v2.17 and Fio v3.5.

We’re gonna run our tests under the following patterns:
– 4k random write
– 4k random read
– 64k random write
– 64k random read
– 8k random 70%read/30%write
– 1M sequential read

Here are the launching parameters for our testing tools: threads=4, Outstanding I/O=8, time=60 sec

DiskSPD
diskspd.exe -t4 -b4k -r -w100 -o8 -d60 -Sh -L #1 > c:\log\4k-rand-write.txt
timeout 10
diskspd.exe -t4 -b4k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\4k-rand-read.txt
timeout 10
diskspd.exe -t4 -b64k -r -w100 -o8 -d60 -Sh -L #1 > c:\log\64k-rand-write.txt
timeout 10
diskspd.exe -t4 -b64k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\64k-rand-read.txt
timeout 10
diskspd.exe -t4 -b8k -r -w30 -o8 -d60 -Sh -L #1 > c:\log\8k-rand-70read-30write.txt
timeout 10
diskspd.exe -t4 -b1M -s -w0 -o8 -d60 -Sh -L #1 > c:\log\1M-seq-read.txt
FIO
[global]
numjobs=4
iodepth=8
loops=1
time_based
ioengine=windowsaio
direct=1
runtime=60
filename=\\.\PhysicalDrive1

[4k rnd write]
rw=randwrite
bs=4k
stonewall

[4k random read]
rw=randread
bs=4k
stonewall

[64k rnd write]
rw=randwrite
bs=64k
stonewall

[64k random read]
rw=randread
bs=64k
stonewall

[OLTP 8k]
bs=8k
rwmixread=70
rw=randrw
stonewall

[1M seq read]
rw=read
bs=1M
stonewall

Testing the 4-node VMware vSAN cluster performance

Here comes the main part. We’ll run a series of performance tests on the VMware 256GB Virtual Disk under a variety of patterns, starting with a single VM. Next, we’re gonna clone that VM, keeping it pinned to the next node, and repeat the test on all VMs simultaneously. We’ll keep on testing till we reach the saturation point or till we hit the 12 VMs limit.
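A quick note on bookkeeping: when several VMs run the test simultaneously, we sum their IOPS to get the cluster-wide figure. One reasonable way to fold latency into a single number (not necessarily how the charts below were built) is an I/O-weighted average, as in this tiny sketch with placeholder numbers:

# Hypothetical per-VM results for one pattern: (IOPS, average latency in ms) - placeholders, not measurements
per_vm = [(70500, 0.45), (69800, 0.46), (41000, 0.78), (40700, 0.79)]

total_iops = sum(iops for iops, _ in per_vm)
# Weight each VM's latency by its share of the total I/O
weighted_latency_ms = sum(iops * lat for iops, lat in per_vm) / total_iops
print(f"{len(per_vm)} VMs: {total_iops} IOPS total, {weighted_latency_ms:.2f} ms weighted latency")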

Performance VMware Virtual Disk 256GB (RAW) – 4k random write (IOPS)

Performance VMware Virtual Disk 256GB (RAW) – 4k random write (MB/s)

Performance VMware Virtual Disk 256GB (RAW) – 4k random read (MB/s)

Performance VMware Virtual Disk 256GB (RAW) – 64k random write (MB/s)

Performance VMware Virtual Disk 256GB (RAW) – 64k random read (IOPS)

Performance VMware Virtual Disk 256GB (RAW) – 64k random read (MB/s)

Performance VMware Virtual Disk 256GB (RAW) – 8k random 70%read/30%write (IOPS)

Performance VMware Virtual Disk 256GB (RAW) – 1M seq read (IOPS)

Performance VMware Virtual Disk 256GB (RAW) – 1M seq read (MB/s)


Testing a single Intel SSD DC P3700 2TB drive on Windows Server 2016

We’re gonna test a single Intel SSD DC P3700 2TB drive on Windows Server 2016 under the same patterns in order to evaluate how 8 Intel SSD DC P3700 drives united into a vSAN datastore influence the system performance, and to compare the results with the theoretical values.

The media allocated for building vSAN and united into a disk group is used entirely for the storage system; it can’t be used for any other purpose at the same time. The disk groups are united into a pool available to the entire vSphere cluster, creating shared, “external”, fault-tolerant storage.

vSAN provides two ways of ensuring fault tolerance:

Fault tolerance

RAID-1 (Mirroring). By default, the number of failures to tolerate is 1, which means that data is mirrored across the hosts of the vSAN cluster. Sure, you won’t have much usable capacity left, but the performance will be higher. You need at least 3 hosts (or 2 hosts plus a witness) to tolerate one failure, but the recommended number is 4 since you can rebuild the cluster if one of the hosts goes down.

RAID-5/6 (Erasure Coding). You can do that only with an all-flash cluster configuration. This option will allow your cluster to tolerate 1 (minimum 4 hosts required) or 2 (minimum 6 hosts) failures. The recommended numbers are 5 and 7 hosts correspondingly to have the ability to rebuild the cluster. Pros and cons? More usable capacity but lower performance (just to degrade those few IOPS vSAN gives even more).
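
To put those options into perspective capacity-wise, here is a quick sketch for our pool (capacity tier only, ignoring slack space, checksums, and metadata; note that RAID-6 wouldn’t even fit our 4-node cluster):

# Usable capacity of our capacity tier (one 2TB P3700 per host, 4 hosts) under each protection scheme
capacity_raw_tb = 4 * 2.0

schemes = {
    "RAID-1, FTT=1 (mirroring)":      1 / 2,   # every block stored twice
    "RAID-5, FTT=1 (erasure coding)": 3 / 4,   # 3 data + 1 parity segments
    "RAID-6, FTT=2 (erasure coding)": 2 / 3,   # 4 data + 2 parity segments
}
for name, usable_fraction in schemes.items():
    print(f"{name}: ~{capacity_raw_tb * usable_fraction:.1f} TB usable out of {capacity_raw_tb:.0f} TB raw")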

In our case, the vSAN Default Storage Policy has not been changed, and all tests were run with the default settings.

https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.virtualsan.doc/GUID-08911FD3-2462-4C1C-AE81-0D4DBC8F7990.html

About vSAN Policies

vSAN Default Storage Policy

With this in mind, we can presume that:

1. Since read operations are performed on all disks available in the vSAN datastore (both from the capacity and cache tiers), the theoretical read performance equals 8x the read speed of one disk.

2. The same goes for write operations, but since we’ve created the vSAN datastore with Failures to tolerate – 1 failure RAID-1 (Mirroring), the theoretical write performance will be ((IOPS-Write-one-disk)*N)/2, where N is the number of disks simultaneously used for write operations (N=8 in our setup).

3. In the case of the 8k random 70%read/30%write pattern, the theoretical performance is estimated using the following formula: (IOPS-Read-one-disk*N*0.7)+((IOPS-Write-one-disk*N*0.3)/2), where N is the number of disks simultaneously utilized for read/write operations (8 in our configuration).
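
Here are the same three estimates as a small sketch. The per-disk read figure is the ~460K IOPS we verified earlier; the per-disk write figure is a placeholder to be replaced with your own single-drive 4k random write result:

# Theoretical ceilings for the 8-disk vSAN datastore with RAID-1 / FTT=1, per the assumptions above
N = 8                       # disks used simultaneously
read_one_disk = 460_000     # 4k random read IOPS of a single P3700 (verified earlier)
write_one_disk = 175_000    # PLACEHOLDER - substitute the measured single-drive 4k random write IOPS

theoretical_read = read_one_disk * N                                     # reads are served by all disks
theoretical_write = (write_one_disk * N) / 2                             # every write is mirrored
theoretical_70_30 = (read_one_disk * N * 0.7) + (write_one_disk * N * 0.3) / 2

print(f"read: {theoretical_read:,}  write: {theoretical_write:,.0f}  8k 70/30: {theoretical_70_30:,.0f}")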

DiskSPD

Max Performance for VMware Virtual Disk over vsanDatastore

Comparative performance diagrams (DiskSPD test utility)

Verdict and imprisonment

The testing results on the vSAN datastore show that under the 4k random read pattern, the overall performance of all VMware Virtual Disks grows nicely and linearly up to… 4 VMs and 222,000 IOPS (oops) with an affordable 0.57 ms latency. Adding more VMs didn’t lead to any performance increase. Surprised? NOPE.

Under the 4k random write pattern, the highest performance (an astonishing 32,000 IOPS) was reached with 2 VMs. We added more VMs (3-12 VMs), but that only resulted in performance degrading to 14,000-16,000 IOPS. Degrade to yesterday!

Let’s proceed with the 64k random read pattern. Performance grows linearly up to 115,000 IOPS (7,000 MB/s) with 6 VMs running. Adding more VMs doesn’t lead to a significant performance increase. With 12 VMs running in the cluster, performance reaches 8,500 MB/s, which is close to the 10 GB/s network limit. Possibly, adding new VMs would give more IOPS, but only up to the mentioned network limit.

As for 64k random write… Lord help us… it’s actually hard to highlight any performance drops or increases. The DiskSPD and FIO results vary in certain places by about 2,000 IOPS (157 MB/s), but the overall performance for 1-12 VMs is distributed almost evenly: DiskSPD – 5,200-7,400 IOPS (356-465 MB/s), FIO – 6,300-8,800 IOPS (395-513 MB/s).

Think that’s ujkvty performance? Take a look at the 8k random 70%read/30%write pattern. The vSAN datastore shows its highest performance with 2 VMs; the more VMs we add, the lower the performance gets…

The test results show that under the 1M seq read pattern, performance grows linearly up to 9,300-9,700 MB/s with 10 VMs, which is extremely close to the 10 GB/s network limit.

If we compare the performance test results of a single Intel SSD DC P3700 2TB disk on Windows Server 2016 with those of the VMware Virtual Disks on the vSAN datastore (you can hear the VMware guys screaming “No, don’t do that!”), we can see that the VMware vSAN datastore performance is lower than that of even a single Intel SSD DC P3700 2TB disk.

But we have a king of the hill (hill of etcr, of course) – the 1M seq read pattern. Here, the VMware vSAN datastore beats a single Intel SSD DC P3700 2TB drive by an unbelievable three times in terms of performance.

OK, to sum things up, the total performance of all Virtual Disks is:

2%(DiskSPD), 2%(Fio) of the estimated theoretical value under the 4k random write pattern

7%(DiskSPD), 9%(Fio) of the estimated theoretical value under the 4k random read pattern

6%(DiskSPD), 6%(Fio) of the estimated theoretical value under the 64k random write pattern

30%(DiskSPD), 46%(Fio) of the estimated theoretical value under the 64k random read pattern

2%(DiskSPD), 4%(Fio) of the estimated theoretical value under the 8k random 70%read/30%write pattern

38%(DiskSPD), 36%(Fio) of the estimated theoretical value under the 1M seq read pattern

And finally, our Diagram of shame shows how disappointing the overall performance of Virtual Disks is (“extent of disappointment” stands here for how many times the overall Virtual Disks performance is lower than we expected). Ruling: replace that hwemkpi meaningless “scale to tomorrow” with “degrade to yesterday”. VMware vSAN will remain the same piece of etcr but, hey, at least the marketing will be 100% true.

diagram of shame
