Introduction
In the previous study, VMware vSAN proved that it has balls. Actually, it has balls of steel (in Jon St. John’s voice)! Get ready for Round 2: the poor guys’ setup challenge! Will VMware vSAN demonstrate decent performance in a 2-node environment?
No, wait, not so fast… you know, we thought the previous study through and realized that it was kind of cheating. We were running vSAN on a special setup that makes it impossible to compare its results to other solutions. Now, THAT’S OVER, RWUUKGU! In the cluster that we use today, both the capacity and cache tiers sit on NVMe drives. Period. We do not give a hwem that vSAN cannot properly handle NVMe (well, at least for now). This fcop rocket must fly anyway!
One could say something like “WTF?! YOU NEED AT LEAST 3 NODES TO RUN VMWARE VSAN!” You are right: you need 3 PHYSICAL nodes. But if you are broke as hwem after buying a second server, there are workarounds: use a virtual witness node as the third entity, or ask investors for some more money! Yeah, it was so hweming smart of VMware to create a solution that needs 3+ hosts to run properly. But, anyway, they will bring up the witness as soon as you mention that there are only 2 servers in your basement.
With all that being said, today we study whether VMware vSAN can provide performance and scalability in a 2-node environment… or whether you’d still better wait a bit and buy a 3rd server.
Patient
2-node VMware vSAN cluster
Address
https://www.vmware.com/products/vsan.html
Doctor
Cult of Anarchy
Symptoms
Serious concerns about scalability and performance
State
Relapse. A thorough investigation is needed. Doctor, do an cpcn probe, please.
Mission
“How does it gape… sorry, scale?” 😊 – measure the overall cluster performance while populating the cluster with VMs. The longer it takes performance to saturate, the better the solution’s scalability is.
“What about the performance?” – compare the overall cluster performance to Intel SSD DC P3700 “bare metal” performance in Windows Server 2016 environment.
Preparation
Methodology
1) Do disks run as fast as they should? Measure Intel SSD DC P3700 “bare metal” performance in Windows Server 2016 environment.
2) Create a Witness host. Just select the Deploy OVF Template from the Actions dropdown on the ESXi host that won’t be included in the vSAN cluster.
Next, specify the path to the Witness appliance *.ova file downloaded from the official VMware website.
Click Next once you are done.
Next, you need to select a destination network for each source network. Just set VMware Network for both Witness Network and Management Network.
Afterward, set the root password for the virtual Witness Host.
Eventually, press Finish to initiate the VM creation process.
Once you are done with VM creation, start it and set up the network adapter. Today, its IP is 172.16.0.40. Add this VM to vCenter.
Enable vSAN service on the Witness Host vmk0 vNIC.
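If you prefer the ESXi shell over the Web Client, the same thing can be done with esxcli (a sketch; we assume vmk0 is the adapter that will carry vSAN traffic on the witness):
esxcli vsan network ip add -i vmk0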
3) Now, let’s create a new vSAN cluster.
4) Once you are done with Witness Host, add two hosts to the new cluster and set the witness tag for each vmk0 vNIC.
Make sure that the tag has been set right.
As you can see, tagging was done properly in our case, so, let’s move on.
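By the way, the witness traffic tag can also be set and verified straight from each host’s ESXi shell (a sketch; vmk0 is the vmkernel adapter we use today):
esxcli vsan network ip add -i vmk0 -T=witness
esxcli vsan network list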
5) Next, configure the vSAN cluster.
At the first configuration step, select Two host vSAN cluster as the configuration type. We are studying vSAN cluster performance in a poor-man’s, sorry, SMB scenario, remember? 😊
Look, today we do not give a hwem about data safety and storage efficiency, so we do not enable encryption, deduplication, or compression. These settings may alter the overall cluster performance, and that’s the last thing we want, so let’s just leave them disabled.
Now, let’s shape the vsanDatastore. Assign disks for the cache and capacity tiers.
Select the witness host (it is the host that we’ve just created, 172.16.0.40).
Eventually, press Finish to have the cluster configuration applied.
6) Is the cluster doing well? Go to Monitoring -> Health in order to find out.
In our case, the vSAN cluster has been created correctly. You may have noticed some warnings in the screenshot above; trust us, they are not critical for cluster stability or performance.
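As an extra sanity check, cluster membership can also be verified from any node’s shell (a sketch):
esxcli vsan cluster get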
7) Is everything configured right?
Let’s find out in the Configure tab. Below, find the screenshot showing what vSAN Services settings looked like in our case.
Now, check out Disk Group settings.
Afterward, double check the Fault Domains settings.
Eventually, verify vsanDatastore configuration.
Just a couple of words about today’s vsanDatastore configuration. Every host has one disk group comprised of one Intel SSD DC P3700 2TB drive in the cache tier and one Intel SSD DC P3700 2TB drive in the capacity tier.
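The same disk group layout can be double-checked from the ESXi shell; the command below lists every device claimed by vSAN along with its tier (a sketch):
esxcli vsan storage list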
8) Once you are done with verifying configurations, set up vSAN Default Storage Policy.
Here’s what we get at the end of the day.
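For the record, the default policy applied to new objects can also be pulled from the CLI (a sketch):
esxcli vsan policy getdefault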
9) Create a Windows Server VM on an ESXi host. Both hosts that we use today are pretty much the same hardware-wise, so we guess that nobody gives a hwem on which of them we pin that VM. Just as usual, the test VM has two virtual disks: “system” and “data” (80GB, VMware Paravirtual). Look, we do not care about the former here; we study the performance of the latter, so we refer to its performance as “VM performance”.
10) Pick the optimum testing parameters (number of threads and Outstanding I/O) that we use for launching DiskSPD and FIO later on.
11) Measure single VM performance.
12) Clone that VM to the other host. Its “data” disk is a separate virtual disk residing on the vsanDatastore. Measure the performance of both VMs.
13) How does vSAN scale in the 2-node scenario? Let’s find out by cloning VMs and measuring the overall cluster performance until it hits the saturation point. Obviously, the more VMs the cluster can accommodate before performance saturates, the better its scalability is.
14) Get some reference! We need to somehow judge the performance that we see in this study, so we need the maximum expected performance for our setup. To get the reference, we test a single Intel SSD DC P3700 2TB in a “bare metal” Windows Server 2016 environment and do some quick math.
Hardware toys
The setup for checking disk performance
Before we carry out the real measurements, it is good to know whether the underlying storage is still as fast as we expect it to be. Here’s the setup configuration for measuring NVMe SSD performance.
Dell R730, CPU 2x Intel Xeon E5-2683 v3 @ 2.00 GHz (14 physical cores per CPU), RAM 64GB
Storage: 1x Intel SSD DC P3700 2TB
OS: Windows Server 2016
The setup for measuring vSAN performance
Now, let’s look at the toolkit used to measure VMware vSAN performance. Note that both servers that we use today (ESXi Host #1, ESXi Host #2) have the same hardware configurations. Below, find the configuration of one of them.
Dell R730, CPU 2x Intel Xeon E5-2683 v3 @ 2.00 GHz (14 physical cores per CPU), RAM 64GB
Storage: 2x Intel SSD DC P3700 2TB (one for the cache tier, another for the capacity tier)
LAN: 2x Mellanox ConnectX-4 100Gbit/s CX465A
Hypervisor: VMware ESXi 6.7
To make it clear how we connected everything, here’s the interconnection diagram for today’s setup.
Software toys
For measuring the cluster performance, we used DiskSPD v2.0.20a and FIO v3.8. Find their settings for each particular case throughout the article.
For measuring network bandwidth, we used iperf.
Even though we used thick-provisioned virtual disks for today’s study, we still flooded the disk with random data using dd.exe. Doc advises doing this before each test and after changing the virtual disk volume. Here are the dd.exe launch parameters:
dd.exe bs=1M if=/dev/random of=\\?\Device\Harddisk1\DR1 --progress
Is underlying storage still that fast?
Before we start, we want to make sure that the Intel SSD DC P3700 still runs as fast as Intel says in its datasheet.
Why do we think that testing only one disk is enough? All disks are running under more or less the same conditions, and no hweming trash pandas have broken into our lab for like a year or so. So, yes, we assume that if one disk is feeling great, the others should be good too.
Let’s look at the testing now. We carried out all tests under the 4k random read pattern with Queue Depth 32 and 4 workers. This way of testing matches Intel’s recommendations, so we expect to observe performance close to the datasheet number.
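For reference, a DiskSPD run under that pattern boils down to something like this (a sketch; the target drive index and log path are assumptions):
diskspd.exe -t4 -b4k -r -w0 -o32 -d60 -Sh -L #1 > c:\log\baseline-4k-rand-read.txt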
For today’s testing, we used DiskSPD v2.0.20a and FIO v3.8. Here are the measurement results:
Mini-conclusion
Intel SSD DC P3700 2TB still works wonderfully: DiskSPD showed the expected 460K IOPS under the 4k random read pattern with 4 workers and Queue Depth 32. Let’s move on!
Checking the network bandwidth
Another thing that we still need to know before running some real tests is network bandwidth between servers. We measured it with iperf right after installing ESXi on all hosts.
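A typical iperf run for this looks something like the lines below (a sketch; the peer host’s IP address and the number of parallel streams are assumptions):
iperf -s
iperf -c 172.16.0.1 -P 8 -t 60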
However, the same ujkv was going on with the network bandwidth… again. Look, we use 100 Gbit/s NICs in our setup, but somehow the real bandwidth could not go higher than a mere 39-43 Gbit/s! That’s weird: even with the latest VMware ESXi 6.7 nmlx5-core 4.17.13.8 driver for the Mellanox ConnectX-4 100Gbit/s CX465A, network bandwidth still uwemu!
We observed something like that in our previous vSAN study; and, judging by the awesome results we got in the end, even a 2x degradation in network bandwidth won’t hwem things up. Well, of course, it uwemu to have only half of the bandwidth available, and we still do not know what the hwem is going on; but who cares, as long as performance does not suffer!
We guess you want proof that everything is alright, right? VMware says that vSAN in hybrid configurations can go with as little as 1 Gbit/s of network bandwidth. For an all-flash array with NVMe disks on board, network bandwidth should be higher than 10 Gbit/s. In other words, VMware blessed us to run today’s tests over a 40 Gbit/s network.
On top of that, we carried out some calculations that prove we should not worry about network bandwidth today. In our setup, each NIC actually delivers 40 Gbit/s (5 GB/s) of bandwidth (not the 100 Gbit/s we expected :( ). Two NICs, therefore, should provide 10 GB/s. Under the 4k pattern, networking thus limits the performance of a single ESXi host to roughly 2.62M IOPS (10 GB/s × 1024 × 1024 KB / 4 KB ≈ 2.62M IOPS).
The 2.62M IOPS limit is still fine for our setup. The thing is, each Intel SSD DC P3700 under the 4k random read pattern reaches, at best, around 460K IOPS, so two of these disks should deliver 920K IOPS. 920K << 2.62M, meaning that there are no network-imposed limitations under small blocks after all. For large blocks, though, the network bandwidth may become a constraint.
Picking the VM configuration
Why do you often need rocket-fast underlying storage? To run critical applications or keep databases! Look, we did not want to reinvent the wheel with the VM configuration here, so we basically adopted a standard Azure VM configuration for working with MS SQL. Find the configuration in the screenshot below.
- 4x vCPU
- RAM 7GB
- Disk0 (LSI Logic SAS) – 25GB “system” disk that keeps the Windows Server 2016 OS.
- Disk1 (VMware Paravirtual) – 80GB “data” disk. That’s our workhorse today.
Just a small note: we do not care about system disk performance here. So, every time we say “VM performance”, we mean the “data” disk performance.
Setting up the cpcn probe test utilities
To make sure that we can trust data observed during measurements, we need to set up the test utilities right. That’s why we need to run some preliminary tests under a varying number of threads and outstanding I/O.
For these experiments, we create a test VM on either host (they both have similar configurations anyway) and study its 4k random read performance under varying outstanding I/O and number-of-threads parameters. Next, we plot the VM performance versus queue depth (QD). Once performance saturates, good news: we have found the optimal QD value! Eventually, putting together the VM configuration and the measurement results, we come up with the optimum test parameters, namely the number of threads and outstanding I/O. As easy as pie…
DiskSPD testing parameters under threads=1, Outstanding I/O=1,2,4,8,16,32,64,128
diskspd.exe -t1 -b4k -r -w0 -o1 -d60 -Sh -L #1 > c:\log\t1-o1-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o2 -d60 -Sh -L #1 > c:\log\t1-o2-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o4 -d60 -Sh -L #1 > c:\log\t1-o4-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\t1-o8-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\t1-o16-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o32 -d60 -Sh -L #1 > c:\log\t1-o32-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o64 -d60 -Sh -L #1 > c:\log\t1-o64-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o128 -d60 -Sh -L #1 > c:\log\t1-o128-4k-rand-read.txt
timeout 10
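We ran the same sweep with FIO. A single point of that sweep boils down to a job file along these lines (a minimal sketch based on the global section we use later; the job name and the particular threads/QD combination are just an example):
[global]
numjobs=1
iodepth=8
time_based
runtime=60
ioengine=windowsaio
direct=1
filename=\\.\PhysicalDrive1

[t1-o8-4k-rand-read]
rw=randread
bs=4k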
Picking the optimal test parameters
VMware Virtual Disk 80GB (RAW) – 4k random read (DiskSPD)

| QD | threads=1: IOPS | MB/s | Latency (ms) | threads=2: IOPS | MB/s | Latency (ms) | threads=4: IOPS | MB/s | Latency (ms) | threads=8: IOPS | MB/s | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| QD=1 | 5417 | 21 | 0.18 | 10683 | 42 | 0.19 | 20725 | 81 | 0.19 | 36784 | 144 | 0.22 |
| QD=2 | 10121 | 40 | 0.20 | 20778 | 81 | 0.19 | 36392 | 142 | 0.22 | 65736 | 257 | 0.24 |
| QD=4 | 20483 | 80 | 0.20 | 37814 | 148 | 0.21 | 64957 | 254 | 0.25 | 90404 | 353 | 0.35 |
| QD=8 | 37840 | 148 | 0.21 | 65803 | 257 | 0.24 | 91067 | 356 | 0.35 | 96192 | 376 | 0.67 |
| QD=16 | 68008 | 266 | 0.24 | 92048 | 360 | 0.35 | 96153 | 376 | 0.67 | 95735 | 374 | 1.34 |
| QD=32 | 94521 | 369 | 0.34 | 98117 | 383 | 0.65 | 96361 | 376 | 1.33 | 96463 | 377 | 2.65 |
| QD=64 | 98548 | 385 | 0.65 | 98109 | 383 | 1.30 | 95771 | 374 | 2.67 | 95959 | 375 | 5.34 |
| QD=128 | 98359 | 384 | 1.30 | 97740 | 382 | 2.62 | 96501 | 377 | 5.31 | 93936 | 367 | 10.90 |
VMware Virtual Disk 80GB (RAW) – 4k random read (FIO)

| QD | threads=1: IOPS | MB/s | Latency (ms) | threads=2: IOPS | MB/s | Latency (ms) | threads=4: IOPS | MB/s | Latency (ms) | threads=8: IOPS | MB/s | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| QD=1 | 4977 | 19 | 0.19 | 9242 | 36 | 0.21 | 18466 | 72 | 0.21 | 35264 | 138 | 0.22 |
| QD=2 | 9075 | 35 | 0.21 | 18239 | 71 | 0.21 | 34451 | 135 | 0.22 | 62730 | 245 | 0.25 |
| QD=4 | 18304 | 72 | 0.21 | 34471 | 135 | 0.22 | 61189 | 239 | 0.25 | 90018 | 352 | 0.35 |
| QD=8 | 33826 | 132 | 0.23 | 62606 | 245 | 0.24 | 86924 | 340 | 0.36 | 96830 | 378 | 0.65 |
| QD=16 | 61991 | 242 | 0.25 | 90487 | 353 | 0.34 | 91318 | 357 | 0.69 | 96069 | 375 | 1.32 |
| QD=32 | 90516 | 354 | 0.34 | 95380 | 373 | 0.65 | 92377 | 361 | 1.38 | 96451 | 377 | 2.65 |
| QD=64 | 95663 | 374 | 0.63 | 96664 | 378 | 1.31 | 91243 | 356 | 2.80 | 97479 | 381 | 5.24 |
| QD=128 | 96055 | 375 | 1.30 | 95828 | 374 | 2.66 | 91941 | 359 | 5.56 | 94927 | 371 | 10.78 |
Mini-conclusion
VMware vSAN performance saturates at threads=4 and Outstanding I/O=16. Finally, we have the tools set up right to push vSAN to full throttle!
Testing
Here is the list of patterns used to estimate VMware vSAN performance:
- 4k random write
- 4k random read
- 64k random write
- 64k random read
- 8k random 70%read/30%write
- 1M sequential read
One more time, here’s how we measure cluster performance. We start with a single VM on Host #1 and measure its performance under all the test patterns listed above. Next, we clone it to the other host and benchmark the overall cluster performance one more time. We end this “clone-and-measure” cycle only when the overall performance hits the saturation point.
Now, let’s look at testing utility launching parameters, and we are good to go!
DiskSPD
diskspd.exe -t4 -b4k -r -w100 -o16 -d60 -Sh -L #1 > c:\log\4k-rand-write.txt
timeout 10
diskspd.exe -t4 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\4k-rand-read.txt
timeout 10
diskspd.exe -t4 -b64k -r -w100 -o16 -d60 -Sh -L #1 > c:\log\64k-rand-write.txt
timeout 10
diskspd.exe -t4 -b64k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\64k-rand-read.txt
timeout 10
diskspd.exe -t4 -b8k -r -w30 -o16 -d60 -Sh -L #1 > c:\log\8k-rand-70read-30write.txt
timeout 10
diskspd.exe -t4 -b1M -s -w0 -o16 -d60 -Sh -L #1 > c:\log\1M-seq-red.txt
FIO
[global]
numjobs=4
iodepth=16
loops=1
time_based
ioengine=windowsaio
direct=1
runtime=60
filename=\\.\PhysicalDrive1

[4k rnd write]
rw=randwrite
bs=4k
stonewall

[4k random read]
rw=randread
bs=4k
stonewall

[64k rnd write]
rw=randwrite
bs=64k
stonewall

[64k random read]
rw=randread
bs=64k
stonewall

[OLTP 8k]
bs=8k
rwmixread=70
rw=randrw
stonewall

[1M seq read]
rw=read
bs=1M
stonewall
How does VMware vSAN scale?
Performance measurements
Now that we know vSAN does not scale in a 2-node setup, let’s find out more about its performance.
To investigate VMware vSAN performance, we measured “bare metal” Intel SSD DC P3700 performance in a Windows Server 2016 environment under the very same set of patterns. The numbers we got serve as references allowing us to estimate the maximum performance of a storage pool comprised of 4x Intel SSD DC P3700.
As we have an all-flash vsanDatastore configuration, 4 NVMe SSDs are used for the cache and capacity tiers (two disks per tier in total). VMware says that the former is used only for writing, while all reading occurs from the latter.
Also, note that the vSAN storage policy is set to tolerate 1 failure with the RAID-1 (Mirroring) failure tolerance method. So, basically, the whole thing works like RAID 1.
Now, let’s do some quick math to derive the reference values for performance.
1) Data is read from all available drives in the vsanDatastore capacity tier. That being said, we expect to see a 2x read performance gain once 2 Intel SSD DC P3700 are pooled.
2) Since we measure write performance under random workloads, the cache tier does not affect performance that much. Therefore, we do not give a hwem about the cache tier, and we expect the overall write performance to be described with this formula:
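Expected write IOPS ≈ ½ × N × (single-disk write IOPS) — with N=2, that simply equals one disk’s write performance.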
Here, N stands for the number of disks involved in writing (today, N=2), and ½ is a coefficient accounting for the number of mirrors.
3) Given the two assumptions above, the overall cluster performance under the 8k random 70%read/30%write pattern can be described with the following formula:
One more time, N stands here for the number of disks in the capacity tier (N=2).
DiskSPD (all runs: threads=4, Outstanding I/O=16)

| Pattern | Intel SSD DC P3700 (Windows Server 2016): IOPS | MB/s | Theoretical values for 4x Intel SSD DC P3700: IOPS | MB/s | Max performance for VMware Virtual Disk over vsanDatastore: IOPS | MB/s | Measured vs. theoretical, % |
|---|---|---|---|---|---|---|---|
| 4k random write | 409367 | 1599 | 409367 | 1599 | 47350 | 185 | 11.57 |
| 4k random read | 423210 | 1653 | 846420 | 3306 | 191352 | 747 | 22.61 |
| 64k random write | 30889 | 1931 | 30889 | 1931 | 4657 | 291 | 15.08 |
| 64k random read | 51738 | 3234 | 103476 | 6468 | 57903 | 3619 | 55.96 |
| 8k random 70%read/30%write | 403980 | 3156 | 709800 | 2705 | 86624 | 677 | 12.20 |
| 1M seq read | 3237 | 3237 | 6474 | 6474 | 3829 | 3829 | 59.14 |
FIO (all runs: threads=4, Outstanding I/O=16)

| Pattern | Intel SSD DC P3700 (Windows Server 2016): IOPS | MB/s | Theoretical values for 4x Intel SSD DC P3700: IOPS | MB/s | Max performance for VMware Virtual Disk over vsanDatastore: IOPS | MB/s | Measured vs. theoretical, % |
|---|---|---|---|---|---|---|---|
| 4k random write | 351489 | 1373 | 351489 | 1373 | 46698 | 182 | 13.29 |
| 4k random read | 333634 | 1303 | 667268 | 2606 | 185067 | 723 | 27.74 |
| 64k random write | 31113 | 1945 | 31113 | 1945 | 4903 | 307 | 15.76 |
| 64k random read | 37069 | 2317 | 74138 | 4634 | 58411 | 3652 | 78.79 |
| 8k random 70%read/30%write | 351240 | 2744 | 589000 | 2308 | 95540 | 746 | 16.22 |
| 1M seq read | 3231 | 3233 | 6462 | 6466 | 3972 | 4001 | 61.47 |
Conclusion
If you look through the plots above, you’ll arrive at the conclusion that VMware vSAN does not scale at all. Under 4k and 64k workloads, there are no signs of performance growth once 2 VMs are spawned in the cluster. Furthermore, performance starts decreasing under 4k and 64k random writes if you keep populating the cluster with VMs! VMware has just had a disaster, proving that 2-node vSAN is a piece of ujkv created for only one purpose: ripping off users.
Under mixed workloads, performance did not go beyond a mere 86-95K IOPS with 2 VMs, in a cluster that should be able to deliver 589K IOPS! Interestingly, performance even kept going down, ending up around 20K IOPS with 12 VMs in the cluster.
Let’s discuss reading in large blocks now. vSAN achieved a mere 3700 IOPS with 4 VMs in the cluster. Wait… that was not the end! With 12 VMs on board, vSAN reached an “awesome” 3900 IOPS, while the cluster should be able to deliver around 6.5K IOPS.
Summing up those scalability measurements, we’d like to say that if all rqtp stars were like vSAN, tags like “cpcn” and “icpidcpi” would never have emerged on RqtpHub!
Now let’s find out what’s going on with VMware vSAN performance. In most cases, the entire cluster performs even worse than a single Intel SSD DC P3700! Under 64k random read and 1M sequential read, though, the performance of the entire cluster barely exceeds the single SSD’s performance… and it is still lower than the expected value.
We’ve seen enough ujkv for today. Welcome to the Hall of Shame, oqvjgthwemgt (in Filthy Frank’s voice)! Here comes the traditional Diagram of Shame, where the X-axis (How hweming disappointed we are) shows how many times lower the overall Virtual Disk performance is than we expected.
On the whole, you’d better ask investors for some more money for a third server. VMware vSAN is slow as hwem and scales like a virgin. Do not expect much of a 2-node setup packed with NVMe SSDs.