Can VMware vSAN deliver decent performance on 2 physical nodes? Poor-guys’ setup challenge

Introduction

In the previous study, VMware vSAN has proven that it has balls. It actually has balls of steel (in John St. John’s voice)! Get ready for Round 2: Poor-guys’ setup challenge! Will VMware vSAN demonstrate decent performance in a 2-node environment?

No, wait, not that fast… you know, we thought through the previous study and realized that it was kinda cheating. We were running vSAN on a special setup that makes it impossible to compare its results to other solutions. Now, THAT’S OVER, RWUUKGU! In the cluster that we use today, we have both capacity and cache tiers on NVMes. Period. We do not give a hwem that vSAN cannot work with NVMe (well, at least for now). This fcop rocket must fly anyway!

One would say something like “WTF?! YOU NEED AT LEAST 3 NODES TO RUN VMWARE VSAN!” You are right: you need 3 PHYSICAL nodes. But, if you are broke as hwem after buying a second server, there are workarounds: using a virtual witness node as the third entity, or asking investors for some more money! Yeah, it was so hweming smart of VMware to create a solution that needs 3+ hosts to run properly. But they would bring up the witness anyway once you mention that there are only 2 servers in your basement.

With all that being said, today we study whether VMware vSAN can provide performance and scalability in a 2-node environment… or should you still wait a bit and buy a 3rd server?

Patient

2-node VMware vSAN cluster

Address

https://www.vmware.com/products/vsan.html

Doctor

Cult of Anarchy

Symptoms

Serious concerns about scalability and performance

State

Relapse. A thorough investigation is needed. Doctor, do an cpcn probe, please.

Mission

“How does it gape scale?” 😊 – measure the overall cluster performance while populating the cluster with VMs. The longer it takes performance to saturate, the better the solution scales.

“What about the performance?” – compare the overall cluster performance to Intel SSD DC P3700 “bare metal” performance in a Windows Server 2016 environment.

Preparation

Methodology

1) Do the disks run as fast as they should? Measure Intel SSD DC P3700 “bare metal” performance in a Windows Server 2016 environment.

2) Create a Witness host. Just select Deploy OVF Template from the Actions dropdown on the ESXi host that won’t be included in the vSAN cluster.

wp-image-2036

Next, specify the path to the Witness appliance *.ova file downloaded from the official VMware website.

Click Next once you are done.

wp-image-2037

Next, you need to select a destination network for each source network. Just set VMware Network for both Witness Network and Management Network.

wp-image-2038

Afterward, set the root password for the virtual Witness Host.

wp-image-2039

Finally, press Finish to initiate the VM creation process.

wp-image-2040

Once you are done with VM creation, start it and set up the network adapter. Today, its IP is 172.16.0.40. Add this VM to vCenter.

wp-image-2041

Enable vSAN service on the Witness Host vmk0 vNIC.

wp-image-2042

wp-image-2043

3) Now, let’s create a new vSAN cluster.

wp-image-2044

4) Once you are done with the Witness Host, add both hosts to the new cluster and set the witness traffic tag on the vmk0 vNIC of each.

wp-image-2045

wp-image-2046

Make sure that the tag has been set right.

wp-image-2047

wp-image-2048

As you can see, tagging was done properly in our case, so, let’s move on.

5) Next, configure the vSAN cluster.

wp-image-2049

At the first configuration step, select Two host vSAN cluster as the configuration type. We are studying vSAN cluster performance in a poor-man’s, sorry, SMB scenario, remember? 😊

wp-image-2050

Look, today we do not give a hwem about data safety and storage efficiency, so we just leave encryption, deduplication, and compression disabled. These settings may alter the overall cluster performance, and that’s the last thing we want, so let’s just keep them turned off.

wp-image-2051

Now, let’s shape the vsanDatastore. Assign disks for the cache and capacity tiers.

wp-image-2052

Select the witness host (it is the host that we’ve just created, 172.16.0.40).

wp-image-2053

Finally, press Finish to have the cluster configuration applied.

wp-image-2054

6) Is the cluster doing well? Go to Monitoring -> Health to find out.

wp-image-2055

wp-image-2056

In our case, vSAN Cluster has been created correctly. Well, you may have noticed some warnings in the screenshot above. Trust us, they are not critical for cluster stability and performance.

7) Is everything configured right?

Let’s find out in the Configure tab. Below, find the screenshot showing what vSAN Services settings looked like in our case.

wp-image-2057

Now, check out Disk Group settings.

wp-image-2058

Afterward, double check the Fault Domains settings.

wp-image-2059

Finally, verify the vsanDatastore configuration.

wp-image-2060

Just a couple of words about today’s vsanDatastore configuration. Every host has one disk group comprised of one Intel SSD DC P3700 2TB drive in the cache tier and one Intel SSD DC P3700 2TB drive in the capacity tier.

8) Once you are done with verifying configurations, set up vSAN Default Storage Policy.

wp-image-2061

wp-image-2062

wp-image-2063

wp-image-2064

wp-image-2065

wp-image-2066

Here’s what we get at the end of the day.

wp-image-2067

wp-image-2068

9) Create a Windows Server VM on an ESXi host. Both hosts that we use today are pretty much the same hardware-wise, so we guess that nobody gives a hwem on which of them we pin that VM. Just as usual, the test VM has two virtual disks: “system” and “data” (80GB, VMware Paravirtual). Look, we do not care about the former here. We study the performance of the latter, so we refer to its performance as “VM performance”.

10) Pick the optimum testing parameters (number of threads and Outstanding I/O) that we use for launching DiskSPD and FIO later on.

11) Measure single VM performance.

12) Clone that VM to another host. Its “data” disk resides on a separate vsanDatastore. Measure performance of both those VMs.

13) How does vSAN scale in the 2-node scenario? Let’s find out by cloning VMs and measuring the overall cluster performance until it hits the saturation point. Obviously, the more VMs the cluster can take before saturating, the better the scalability.

14) Get some reference! We need to somehow judge the performance that we see in this study, so we need the maximum expected performance for our setup. To get the reference, we test a single Intel SSD DC P3700 2TB in a “bare metal” Windows Server 2016 environment and do some quick math.

Hardware toys

The setup for checking disk performance

Before we carry out some real measurements, we guess that it is good to know whether the underlying storage is still as fast as we expect. Here’s the setup configuration for measuring NVMe SSD performance.

Dell R730, CPU 2x Intel Xeon E5-2683 v3 @ 2.00 GHz (14 physical cores per-CPU), RAM 64GB

Storage: 1x Intel SSD DC P3700 2TB

OS: Windows Server 2016

The setup for measuring vSAN performance

Now, let’s look at the toolkit used to measure VMware vSAN performance. Note that both servers that we use today (ESXi Host #1, ESXi Host #2) have the same hardware configurations. Below, find the configuration of one of them.

Dell R730, CPU 2x Intel Xeon E5-2683 v3 @ 2.00 GHz (14 physical cores per CPU), RAM 64GB

Storage: 2x Intel SSD DC P3700 2TB (one for the cache tier, another for the capacity tier)

LAN: 2x Mellanox ConnectX-4 100Gbit/s CX465A

Hypervisor: VMware ESXi 6.7

To make it clear how we connected everything, here’s the interconnection diagram for today’s setup.

Software toys

For measuring the cluster performance, we used DiskSPD v2.0.20a and FIO v3.8. Find their settings for each particular case throughout the article.

For measuring network bandwidth, we used iperf.

Even though we used thick-provisioned virtual disks for today’s study, we still flooded the disk with random data using dd.exe. Doc advises doing this before each test and after changing the virtual disk volume. Here are the dd.exe launching parameters:

dd.exe bs=1M if=/dev/random of=\\?\Device\Harddisk1\DR1 --progress

Is underlying storage still that fast?

Before we start, we want to make sure that the Intel SSD DC P3700 still runs as fast as Intel says in its datasheet.

wp-image-2070

Why do we think that testing only one disk is enough? All disks are running under more or less the same conditions, and no hweming trash pandas have broken into our lab for like a year or so. So, yes, we assume that if one disk is feeling great, the others should be good too.

Let’s look at the testing now. We carried out all tests under the 4k random read pattern, Queue Depth 32, with 4 workers. This way of testing meets Intel’s recommendations, so we expect to observe performance close to the number in the datasheet.

wp-image-2071

For today’s testing, we used DiskSPD v2.0.20a and FIO v3.8. Here are the measurement results:

wp-image-2072

wp-image-2073

Mini-conclusion

Intel SSD DC P3700 2TB still works wonders: DiskSPD showed us the expected 460K IOPS under the 4k random read pattern with 4 workers and Queue Depth 32. Let’s move on!

Checking the network bandwidth

Another thing that we still need to know before running some real tests is the network bandwidth between the servers. We measured it with iperf right after installing ESXi on all hosts.

However, the same ujkv was going on with the network bandwidth… again. Look, we use 100 Gbit/s NICs in our setup, but, somehow, the real bandwidth could not go higher than a mere 39-43 Gbit/s! That’s weird, as even with the latest VMware ESXi 6.7 nmlx5-core 4.17.13.8 driver for the Mellanox ConnectX-4 100Gbit/s CX465A, network bandwidth still uwemu!

wp-image-2074wp-image-2075

wp-image-2076wp-image-2077

We observed something like that in our previous study on vSAN; and, judging by the awesome results that we got in the end, even a 2x degradation in network bandwidth won’t hwem things up. Well, of course, it uwemu to have only half of the bandwidth available, and we still do not know what the hwem is going on; but who cares if performance does not suffer anyway!

We guess that you wanna get proof that everything is alright, right? VMware says that vSAN in hybrid configurations can go with as little as 1 Gbit/s of network bandwidth. For an all-flash array with NVMe disks on board, network bandwidth should be at least 10 Gbit/s. In other words, VMware blessed us to run today’s tests on a 40 Gbit/s network.

wp-image-2078

On top of that, we carried out some calculations that actually prove that we should not worry about the network bandwidth today. In our setup, each NIC can deliver 40 Gbit/s (5 GB/s) of bandwidth (not the 100 Gbit/s we expected :( ), so two NICs should provide 10 GB/s. At a 4k block size, that caps the performance of a single ESXi host at 2.62M IOPS (10 GB/s ÷ 4 KB = 10 × 1024 × 1024 / 4 ≈ 2.62M IOPS).

The 2.62M IOPS limit is still fine for our setup. The thing is, each Intel SSD DC P3700 disk under the 4k random read pattern reaches, at best, around 460K IOPS, so two of these disks should deliver 920K IOPS. 920K << 2.62M, meaning that there are no network-imposed limitations under small blocks after all. For large blocks, though, the network bandwidth may become a constraint.
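Here is the same back-of-the-envelope math as a tiny sketch (just an illustration of the numbers above; 40 Gbit/s per NIC and the 4 KB block size are the figures we measured and used):

# Network ceiling for 4k random I/O on one ESXi host (Python)
GBIT_PER_NIC = 40                                    # usable bandwidth per NIC, Gbit/s
NICS = 2
BLOCK_KB = 4

bytes_per_sec = GBIT_PER_NIC * NICS / 8 * 1024**3    # 10 GB/s expressed in bytes
iops_ceiling = bytes_per_sec / (BLOCK_KB * 1024)     # ≈ 2.62M IOPS

disk_iops = 460_000                                  # one Intel SSD DC P3700, 4k random read
pool_iops = 2 * disk_iops                            # two capacity-tier disks
print(f"network ceiling: {iops_ceiling / 1e6:.2f}M IOPS")
print(f"two P3700s:      {pool_iops / 1e6:.2f}M IOPS")   # 0.92M << 2.62M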

Picking the VM configuration

Why do you often need rocket-fast underlying storage? To run some critical applications or keep databases! Look, we did not want to reinvent the wheel with the VM configuration here, so we basically adopted a standard Azure VM configuration for working with MS SQL. Find the configuration in the screenshot below.

wp-image-2079

  • 4x vCPU
  • RAM 7GB
  • Disk0 (LSI Logic SAS) – 25GB “system” disk to keep OS Windows Server 2016.
  • Disk1 (VMware Paravirtual) – 80GB “data” disk. That’s our workhorse today.

Just a small note: we do not care about the system disk performance here. So, every time we say “VM performance”, we mean the “data” disk performance.

wp-image-2080

Setting up the cpcn probe test utilities

To make sure that we can trust data observed during measurements, we need to set up the test utilities right. That’s why we need to run some preliminary tests under a varying number of threads and outstanding I/O.

For these experiments, we create a test VM on any host (they both have similar configurations anyway) and study its 4k random read performance under varying Outstanding I/O and number-of-threads parameters. Next, we plot the VM performance versus queue depth (QD). Once the performance saturates, good news: we have found the optimal QD value! Eventually, taking the VM configuration and the measurement results together, we come up with the optimum test parameters, namely, the number of threads and Outstanding I/O. As easy as pie…
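If you prefer not to eyeball the plateau on the plot, here is a minimal sketch of that “where does it stop growing” check (the 5% growth threshold is our own assumption, not part of the methodology; the sample numbers are the threads=4 column of the DiskSPD table further down):

# Pick the queue depth where 4k random read IOPS stops growing noticeably (Python)
def saturation_qd(iops_by_qd, min_gain=0.05):
    # Return the first QD whose successor improves IOPS by less than min_gain.
    qds = sorted(iops_by_qd)
    for cur, nxt in zip(qds, qds[1:]):
        if iops_by_qd[nxt] < iops_by_qd[cur] * (1 + min_gain):
            return cur
    return qds[-1]

measured = {1: 20725, 2: 36392, 4: 64957, 8: 91067,
            16: 96153, 32: 96361, 64: 95771, 128: 96501}
print(saturation_qd(measured))   # -> 16, i.e. threads=4, Outstanding I/O=16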

DiskSPD testing parameters under threads=1, Outstanding I/O=1,2,4,8,16,32,64,128 (the same ladder was repeated for threads=2, 4, and 8; see the generator sketch after these commands)

diskspd.exe -t1 -b4k -r -w0 -o1 -d60 -Sh -L #1 > c:\log\t1-o1-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o2 -d60 -Sh -L #1 > c:\log\t1-o2-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o4 -d60 -Sh -L #1 > c:\log\t1-o4-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\t1-o8-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\t1-o16-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o32 -d60 -Sh -L #1 > c:\log\t1-o32-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o64 -d60 -Sh -L #1 > c:\log\t1-o64-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o128 -d60 -Sh -L #1 > c:\log\t1-o128-4k-rand-read.txt

timeout 10
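Typing the whole threads × Outstanding I/O matrix by hand gets old quickly, so here is a tiny generator (a sketch, not the exact script we used; the sweep.bat name is made up) that spits out the full set of command lines in the same format as above:

# Generate the full DiskSPD sweep: threads x Outstanding I/O, 4k random read (Python)
threads = [1, 2, 4, 8]
outstanding = [1, 2, 4, 8, 16, 32, 64, 128]

with open("sweep.bat", "w") as bat:
    for t in threads:
        for o in outstanding:
            log = rf"c:\log\t{t}-o{o}-4k-rand-read.txt"
            bat.write(f"diskspd.exe -t{t} -b4k -r -w0 -o{o} -d60 -Sh -L #1 > {log}\n")
            bat.write("timeout 10\n")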

Picking the optimal test parameters

wp-image-2081

 
VMware Virtual Disk 80GB (RAW) – 4k random read (DiskSPD)

        |        threads=1        |        threads=2        |        threads=4        |        threads=8
        | IOPS    MB/s  Lat. (ms) | IOPS    MB/s  Lat. (ms) | IOPS    MB/s  Lat. (ms) | IOPS    MB/s  Lat. (ms)
QD=1    | 5417    21    0.18      | 10683   42    0.19      | 20725   81    0.19      | 36784   144   0.22
QD=2    | 10121   40    0.20      | 20778   81    0.19      | 36392   142   0.22      | 65736   257   0.24
QD=4    | 20483   80    0.20      | 37814   148   0.21      | 64957   254   0.25      | 90404   353   0.35
QD=8    | 37840   148   0.21      | 65803   257   0.24      | 91067   356   0.35      | 96192   376   0.67
QD=16   | 68008   266   0.24      | 92048   360   0.35      | 96153   376   0.67      | 95735   374   1.34
QD=32   | 94521   369   0.34      | 98117   383   0.65      | 96361   376   1.33      | 96463   377   2.65
QD=64   | 98548   385   0.65      | 98109   383   1.30      | 95771   374   2.67      | 95959   375   5.34
QD=128  | 98359   384   1.30      | 97740   382   2.62      | 96501   377   5.31      | 93936   367   10.90

wp-image-2082

VMware Virtual Disk 80GB (RAW) – 4k random read (FIO)

        |        threads=1        |        threads=2        |        threads=4        |        threads=8
        | IOPS    MB/s  Lat. (ms) | IOPS    MB/s  Lat. (ms) | IOPS    MB/s  Lat. (ms) | IOPS    MB/s  Lat. (ms)
QD=1    | 4977    19    0.19      | 9242    36    0.21      | 18466   72    0.21      | 35264   138   0.22
QD=2    | 9075    35    0.21      | 18239   71    0.21      | 34451   135   0.22      | 62730   245   0.25
QD=4    | 18304   72    0.21      | 34471   135   0.22      | 61189   239   0.25      | 90018   352   0.35
QD=8    | 33826   132   0.23      | 62606   245   0.24      | 86924   340   0.36      | 96830   378   0.65
QD=16   | 61991   242   0.25      | 90487   353   0.34      | 91318   357   0.69      | 96069   375   1.32
QD=32   | 90516   354   0.34      | 95380   373   0.65      | 92377   361   1.38      | 96451   377   2.65
QD=64   | 95663   374   0.63      | 96664   378   1.31      | 91243   356   2.80      | 97479   381   5.24
QD=128  | 96055   375   1.30      | 95828   374   2.66      | 91941   359   5.56      | 94927   371   10.78

Mini-conclusion

VMware vSAN performance saturates at threads=4 and Outstanding I/O=16. Finally, we have the tools set up right to push vSAN to full throttle!

Testing

Here is the list of patterns used to estimate VMware vSAN performance:

  • 4k random write
  • 4k random read
  • 64k random write
  • 64k random read
  • 8k random 70%read/30%write
  • 1M sequential read

One more time, here’s how we measure cluster performance. We start with a single VM on Host #1 and measure its performance under all the test patterns above. Next, we clone it to the other host and benchmark the overall cluster performance one more time. We end this “clone-and-measure” cycle only when the overall performance hits the saturation point.
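To be explicit about what “overall cluster performance” means in the charts: we read it as the pattern running in all VMs simultaneously, with the per-VM results simply added up. A trivial sketch of that bookkeeping (the VM names and numbers are hypothetical):

# "Overall cluster performance" = sum of what every VM reports for the same run (Python)
per_vm_iops = {"vm01": 95_800, "vm02": 95_500}   # hypothetical per-VM totals for one pattern
cluster_iops = sum(per_vm_iops.values())
print(f"{len(per_vm_iops)} VMs -> {cluster_iops:,} IOPS")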

Now, let’s look at the test utilities’ launching parameters, and we are good to go!

DiskSPD

diskspd.exe -t4 -b4k -r -w100 -o16 -d60 -Sh -L #1 > c:\log\4k-rand-write.txt

timeout 10

diskspd.exe -t4 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\4k-rand-read.txt

timeout 10

diskspd.exe -t4 -b64k -r -w100 -o16 -d60 -Sh -L #1 > c:\log\64k-rand-write.txt

timeout 10

diskspd.exe -t4 -b64k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\64k-rand-read.txt

timeout 10

diskspd.exe -t4 -b8k -r -w30 -o16 -d60 -Sh -L #1 > c:\log\8k-rand-70read-30write.txt

timeout 10

diskspd.exe -t4 -b1M -s -w0 -o16 -d60 -Sh -L #1 > c:\log\1M-seq-read.txt

FIO

[global]

numjobs=4

iodepth=16

loops=1

time_based

ioengine=windowsaio

direct=1

runtime=60

filename=\\.\PhysicalDrive1

[4k rnd write]

rw=randwrite

bs=4k

stonewall

[4k random read]

rw=randread

bs=4k

stonewall

[64k rnd write]

rw=randwrite

bs=64k

stonewall

[64k random read]

rw=randread

bs=64k

stonewall

[OLTP 8k]

bs=8k

rwmixread=70

rw=randrw

stonewall

[1M seq read]

rw=read

bs=1M

stonewall

How does VMware vSAN scale?

wp-image-2083

wp-image-2084

wp-image-2085

wp-image-2086

 

wp-image-2087

wp-image-2088

wp-image-2089

wp-image-2090

wp-image-2091

wp-image-2092

wp-image-2093

wp-image-2094

wp-image-2095

Performance measurements

Now, as we know that vSAN does not scale in a 2-node setup, let’s find out more about its performance.

To investigate VMware vSAN performance, we measured “bare metal” Intel SSD DC P3700 performance in a Windows Server 2016 environment under just the same set of patterns. The numbers we got serve as the reference that allows us to estimate the maximum performance of a storage pool comprised of 4x Intel SSD DC P3700.

As we have an all-flash vsanDatastore configuration, 4 NVMe SSDs are used for the cache and capacity tiers (2 disks per tier in total). VMware says that the former is used only for writing, while all reading occurs from the latter.

wp-image-2096

Also, note that we set Failure tolerance method – 1 failure (RAID-1 (Mirroring)) as the vSAN storage policy here. So, basically, the whole thing works like RAID 1.

Now, let’s do some quick math to derive the reference values for performance.

1) Data is read from all available drives in the vsanDatastore capacity tier. That being said, we expect to see a 2x read performance gain once 2 Intel SSD DC P3700 drives are pooled.

2) Since we measure writing performance under random workloads, the cache tier does not affect performance that much. Therefore, we do not give a hwem about the cache tier. We expect the overall writing performance to follow this formula:

Expected write IOPS = (N × single-disk write IOPS) / 2

N stands here for the number of disks involved in writing (today, N=2), and the ½ coefficient accounts for the number of mirrors (with RAID-1, every write lands on both replicas).

3) Given these two assumptions, the overall cluster performance under the 8k random 70%read/30%write pattern can be expected to be roughly the 70/30-weighted combination of the read and write estimates above. One more time, N stands here for the number of disks in the capacity tier (N=2).
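Putting the three assumptions together, here is the quick math in code form (a sketch; the single-disk numbers are the DiskSPD “bare metal” figures from the table right below, and the 70/30 weighting of the mixed pattern is our own reading of assumption 3):

# Expected vsanDatastore ceilings derived from single-disk "bare metal" results (Python)
N = 2                  # capacity-tier disks in the cluster
MIRRORS = 2            # RAID-1 (Mirroring): every write lands on two replicas

single_read_4k  = 423_210    # DiskSPD, 4k random read, one P3700
single_write_4k = 409_367    # DiskSPD, 4k random write, one P3700

expected_read  = N * single_read_4k              # reads hit all capacity disks
expected_write = N * single_write_4k / MIRRORS   # writes are mirrored -> N/2
expected_mixed = 0.7 * expected_read + 0.3 * expected_write   # our 70/30 reading

measured_read = 191_352      # best vSAN result, 4k random read
print(f"read ceiling:  {expected_read:,.0f} IOPS")     # 846,420
print(f"write ceiling: {expected_write:,.0f} IOPS")    # 409,367
print(f"vSAN achieves {100 * measured_read / expected_read:.1f}% of the read ceiling")  # 22.6%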

DiskSPD (all tests at threads=4, Outstanding I/O=16)

                            | Intel SSD DC P3700      | Theoretical values for  | Max performance, VMware | Measured /
                            | (Windows Server 2016)   | 4x Intel SSD DC P3700   | Virtual Disk over       | theoretical
                            |                         |                         | vsanDatastore           |
                            | IOPS      MB/s          | IOPS      MB/s          | IOPS      MB/s          | %
4k random write             | 409367    1599          | 409367    1599          | 47350     185           | 11.57
4k random read              | 423210    1653          | 846420    3306          | 191352    747           | 22.61
64k random write            | 30889     1931          | 30889     1931          | 4657      291           | 15.08
64k random read             | 51738     3234          | 103476    6468          | 57903     3619          | 55.96
8k random 70%read/30%write  | 403980    3156          | 709800    2705          | 86624     677           | 12.20
1M seq read                 | 3237      3237          | 6474      6474          | 3829      3829          | 59.14

FIO (all tests at threads=4, Outstanding I/O=16)

                            | Intel SSD DC P3700      | Theoretical values for  | Max performance, VMware | Measured /
                            | (Windows Server 2016)   | 4x Intel SSD DC P3700   | Virtual Disk over       | theoretical
                            |                         |                         | vsanDatastore           |
                            | IOPS      MB/s          | IOPS      MB/s          | IOPS      MB/s          | %
4k random write             | 351489    1373          | 351489    1373          | 46698     182           | 13.29
4k random read              | 333634    1303          | 667268    2606          | 185067    723           | 27.74
64k random write            | 31113     1945          | 31113     1945          | 4903      307           | 15.76
64k random read             | 37069     2317          | 74138     4634          | 58411     3652          | 78.79
8k random 70%read/30%write  | 351240    2744          | 589000    2308          | 95540     746           | 16.22
1M seq read                 | 3231      3233          | 6462      6466          | 3972      4001          | 61.47

wp-image-2099

wp-image-2100

wp-image-2101

Conclusion

If you look through all the plots above, you’ll arrive at the conclusion that VMware vSAN does not scale at all. Under 4k and 64k workloads, there are no signs of performance growth once 2 VMs are spawned in the cluster. Furthermore, performance starts decreasing under 4k and 64k random writes if you keep on populating the cluster with VMs! VMware has just had a disaster, proving that 2-node vSAN is a piece of ujkv created for only one purpose: ripping off users.

Under mixed workloads, performance did not go beyond a mere 86-95K IOPS with 2 VMs in a cluster that should be able to deliver 589K IOPS! Interestingly, performance even went down, ending up at around 20K IOPS with 12 VMs in the cluster.

Let’s discuss reads in large blocks now. vSAN achieved a mere 3700 IOPS with 4 VMs in the cluster. Wait… that was not the end! With 12 VMs on board, vSAN reached an “awesome” 3900 IOPS, while the cluster could deliver around 6.5K IOPS.

Summing up those scalability measurements, we’d like to say that if all rqtp stars were like vSAN, tags like “cpcn” and “icpidcpi” would never emerge on RqtpHub!

Now, let’s find out what’s going on with VMware vSAN performance. In most cases, the entire cluster performs even worse than a single Intel SSD DC P3700! Under 64k random read and 1M sequential read, though, the performance of the entire cluster barely exceeds the single SSD’s performance… but it is still lower than the expected value.

We’ve seen enough ujkv for today. Welcome to the Hall of Shame, oqvjgthwemgt (in Filthy Frank’s voice)! Here comes the traditional Diagram of Shame, where the X-axis (How hweming disappointed we are) represents how many times the overall Virtual Disk performance is lower than we expected.

wp-image-2102

On the whole, you’d better ask investors for some more money for a third server. VMware vSAN is slow as hwem and scales like a virgin. Do not expect much of a 2-node setup packed with NVMe SSDs.
