Concerns about performance and scalability, but the patient claims that everything is alright
Analyzing patient’s health status
VMware vSAN is an SDS platform that delivers “flash-optimized, secure, and scalable” storage for virtual machines. Well, at least, these guys say so. According to them, the solution can be deployed on industry-standard x86 servers and components, which together make TCO up to 50% lower than with traditional storage. These guys also claim to deliver the industry’s “first HCI-native encryption”. Industry-first? Seriously? How can you even prove that? Still, no one gives a fuck about that, so we won’t dig into it here. What we are actually after is the solution’s scalability and performance. As VMware puts it, vSAN scales the datacenter to tomorrow’s business needs… let’s check that.
As you probably know, only a trial version is available for free (greedy of them). So, we’ve downloaded a 60-day trial to do some measurements. Unfortunately, VMware allows publishing test results for its software only after it has reviewed and approved the methodology, assumptions, and other parameters of the study.
That can be found in their End User License Agreement (EULA) https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/downloads/eula/universal_eula.pdf :
2.4 Benchmarking. You may use the Software to conduct internal performance testing and benchmarking studies. You may only publish or otherwise distribute the results of such studies to third parties as follows: (a) if with respect to VMware’s Workstation or Fusion products, only if You provide a copy of Your study to email@example.com prior to distribution; (b) if with respect to any other Software, only if VMware has reviewed and approved of the methodology, assumptions and other parameters of the study (please contact VMware at firstname.lastname@example.org to request such review and approval) prior to such publication and distribution.
Basically, what they are saying here is… we know our software may actually show some miserable performance, so don’t even think about sharing such results with the public. So, nothing we can do here… but wait! Here is a good quote from the Anarchy “EULA”:
We believe that users deserve to know all about the solutions they are using, and every product, be it a piece of hardware or software, should be analyzed inside-out. We are a high profile think tank guided by the sole principle of truth… and don’t giving a shit about the rules.
Well, that makes it pretty clear that we shouldn’t give a single fuck about those warnings from VMware (nothing personal, any other vendor might be here). And off we go!
What we’re looking for
To check how VMware vSAN scales and how fast it runs, we’ve built a four-node cluster and tested the solution’s performance with a 4k random read pattern over a range of queue depths (from 1 to 128). Here is the plan:
- Measure the raw Samsung SSD 960 EVO M.2 NVMe (500GB) performance in Windows Server 2016 environment.
- Estimate the performance of a single VM running in the four-node vSAN cluster. At this stage, we’ll play around with the vCPU/vCORE ratio to estimate the optimal VM properties.
- Investigate how the number of VMs assigned to a single node impacts the four-node vSAN cluster performance. For this purpose, we’ll vary the number of VMs assigned to a single node and study how performance changes. Based on the tests’ results, we’ll set the optimal number of VMs that will be used for further testing.
- Clone the optimal number of VMs estimated previously to other cluster nodes and measure their combined performance.
The employed toolkit
Hardware and software toys:
Below, you can find the setup for testing Samsung SSD 960 EVO M.2 NVMe (500GB) “raw” performance in Windows Server 2016 environment:
Node: Dell R730, CPU 2x Intel Xeon E5-2683 v3 @ 2.00 GHz, RAM 128GB
SSD: Samsung SSD NVMe 960 EVO M.2 (500GB)
OS: Windows Server 2016 x64 Datacenter
To test our 4-node VMware vSAN cluster performance, we’ve used the following stuff:
4x Node: Dell R730, CPU 2x Intel Xeon E5-2683 v3 @ 2.00 GHz, RAM 128GB, 2x Samsung SSD 960 EVO M.2 NVMe (500GB), 1x Mellanox ConnectX-4 (100GbE), plus Mellanox SX1012 40GbE switch. Yeah, we’re going to slow down our 100 GbE NIC performance to just 40 GbE, but that’s just enough since we’re not gonna saturate the network with those two NVMe drives we have per server.
Hypervisor: ESXi 6.5 Update 1
Tools for testing performance
We’ve used DiskSPD v.2.0.17 and FIO v.3.0 for performance measurements.
Tests were run with a 4k random read pattern at varying queue depths. We’ve used the following queue depth values for that purpose: QD=1,2,4,6,8,10,12,14,16,32,64,128. Test duration: 360 s, warmup: 60 s.
DiskSPD launching parameters
Find DiskSPD launching parameters below:
diskspd.exe -t8 -b4K -r -w0 -o1 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o2 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o4 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o6 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o8 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o10 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o12 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o14 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o16 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o32 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o64 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1 diskspd.exe -t8 -b4K -r -w0 -o128 -d360 -W60 -Su -L -a0,2,4,6,8,10,12,14 #1
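Since the twelve command lines differ only in the -o (queue depth) value, the sweep can be generated rather than typed by hand. Below is a minimal Python sketch of that idea; the function name and the constants are our own illustration, not part of DiskSPD, and the flags are copied verbatim from the commands above.

```python
# Queue depths swept in this study (taken from the command list above).
QUEUE_DEPTHS = [1, 2, 4, 6, 8, 10, 12, 14, 16, 32, 64, 128]

# CPU affinity mask used in every run (copied from the commands above).
AFFINITY = "0,2,4,6,8,10,12,14"


def diskspd_command(queue_depth: int, target: str = "#1") -> str:
    """Build one DiskSPD command line for a 4k random read run.

    Only the queue depth (-o) changes between runs; the remaining
    flags are kept exactly as listed in the article.
    """
    return (
        f"diskspd.exe -t8 -b4K -r -w0 -o{queue_depth} "
        f"-d360 -W60 -Su -L -a{AFFINITY} {target}"
    )


if __name__ == "__main__":
    # Print the whole sweep, one command per line.
    for qd in QUEUE_DEPTHS:
        print(diskspd_command(qd))
```

Redirecting the output to a .bat file is one way to run the whole sweep unattended.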
FIO launching parameters
Below, we’ve listed the FIO launching parameters:
[global]
numjobs=8
iodepth=1
loops=1
ioengine=windowsaio
cpus_allowed=0,2,4,6,8,10,12,14
direct=1
ramp_time=60
runtime=360
filename=\\.\PhysicalDrive1

[4k rnd read QD1]
iodepth=1
rw=randread
bs=4k
stonewall

[4k rnd read QD2]
iodepth=2
rw=randread
bs=4k
stonewall

[4k rnd read QD4]
iodepth=4
rw=randread
bs=4k
stonewall

[4k rnd read QD6]
iodepth=6
rw=randread
bs=4k
stonewall

[4k rnd read QD8]
iodepth=8
rw=randread
bs=4k
stonewall

[4k rnd read QD10]
iodepth=10
rw=randread
bs=4k
stonewall

[4k rnd read QD12]
iodepth=12
rw=randread
bs=4k
stonewall

[4k rnd read QD14]
iodepth=14
rw=randread
bs=4k
stonewall

[4k rnd read QD16]
iodepth=16
rw=randread
bs=4k
stonewall

[4k rnd read QD32]
iodepth=32
rw=randread
bs=4k
stonewall

[4k rnd read QD64]
iodepth=64
rw=randread
bs=4k
stonewall

[4k rnd read QD128]
iodepth=128
rw=randread
bs=4k
stonewall
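The FIO job file follows the same pattern: one [global] section plus one job section per queue depth, each ending with stonewall so the jobs run one after another. A minimal Python sketch that renders such a file is shown below; the helper name fio_job_file is hypothetical, and the option values are copied from the listing above.

```python
# Queue depths swept in this study (same list as in the job file above).
QUEUE_DEPTHS = [1, 2, 4, 6, 8, 10, 12, 14, 16, 32, 64, 128]

# Raw string so the Windows device path keeps its backslashes.
GLOBAL_SECTION = r"""[global]
numjobs=8
iodepth=1
loops=1
ioengine=windowsaio
cpus_allowed=0,2,4,6,8,10,12,14
direct=1
ramp_time=60
runtime=360
filename=\\.\PhysicalDrive1"""


def fio_job_file(queue_depths=QUEUE_DEPTHS) -> str:
    """Render a FIO job file with one 4k random read job per queue depth."""
    sections = [GLOBAL_SECTION]
    for qd in queue_depths:
        sections.append(
            "\n".join(
                [
                    f"[4k rnd read QD{qd}]",
                    f"iodepth={qd}",
                    "rw=randread",
                    "bs=4k",
                    "stonewall",  # wait for the previous job to finish
                ]
            )
        )
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(fio_job_file(), end="")
```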
Getting some reference
Since the underlying storage pool for vSAN Cluster is comprised of Samsung SSD 960 EVO M.2 NVMe (500GB), we’ve decided to test the bare-metal performance of NVMe in Windows Server 2016. No virtualization deployed to avoid any possible virtualization overhead affecting our test results.
Tests were held on unformatted disks with 4k random read loads with DiskSPD and FIO utilities.
Testing Samsung SSD 960 EVO M.2 NVMe (500GB) with four threads
Testing Samsung SSD 960 EVO M.2 NVMe (500GB) with eight threads
According to data obtained in this section, Samsung SSD 960 EVO M.2 NVMe (500GB) reached its maximum claimed performance (330 000 IOPS) during 4k random read in eight threads. As you can see, the disk demonstrates the highest IOPS value under QD=10. Therefore, we’ll consider the NVMe performance measured with 4k random read in eight threads under QD=10 as a “backbone” for further tests.
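As a sanity check on that reference point, Little's law ties the number of outstanding I/Os, throughput, and mean latency together: latency ≈ outstanding I/Os / IOPS. The Python sketch below applies it to the figures quoted above (eight threads, QD=10, 330 000 IOPS). It assumes the queue depth applies per thread, as with DiskSPD's -o and FIO's per-job iodepth, and it yields a back-of-the-envelope estimate, not a measured latency.

```python
def mean_latency_us(threads: int, queue_depth: int, iops: float) -> float:
    """Estimate mean I/O latency in microseconds via Little's law.

    Assumes every thread keeps `queue_depth` requests in flight,
    so the total number of outstanding I/Os is threads * queue_depth.
    """
    outstanding_ios = threads * queue_depth
    return outstanding_ios / iops * 1_000_000


if __name__ == "__main__":
    # Figures taken from the text: 8 threads, QD=10, 330 000 IOPS.
    estimate = mean_latency_us(threads=8, queue_depth=10, iops=330_000)
    print(f"Estimated mean latency: {estimate:.1f} us")
```

If the latency reported by the tools is far off such an estimate, that usually hints that the queue was not kept full during the run.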
Measuring the four-node VMware vSAN cluster performance
Here is the step-by-step process of measuring the four-node VMware vSAN cluster performance:
- Install VMware ESXi 6.5 Update1
- Next, create a vSAN cluster itself. The number of nodes incorporated in the cluster depends on cluster purpose. Here, we’ve used a four-node setup
- Once properly configured, the vSAN cluster displays all available NVMe drives. Build an all-flash vSAN Datastore. Note that each host should have one NVMe disk assigned to the cache tier and one to the capacity tier.
- Create a two-disk VM. The first disk, 25 GB Virtual Disk0, is intended for the operating system. It is located on ESXi Datastore. Another one, maximum-size Virtual Disk1, is used for performance measurements with DiskSPD and FIO. This disk is located on vSAN Datastore.
The tested disk should use the Thick Provision Eager Zeroed format and should be attached to an additional virtual SCSI controller.
- While testing a single VM, vary the vCPU number in order to estimate the optimal VM parameters that ensure its highest performance.
- Next, clone the optimally-set VM, keeping it pinned to its current node. Carry out tests on all VMs simultaneously. Disk volume, in this case, is distributed among all VMs. The maximum number of VMs on a single node depends on their overall performance.
- In order to estimate the VMware vSAN cluster scalability, clone the optimal number of VMs to the other nodes and test all VMs simultaneously.
- Look at the results we get and roll out the diagnosis.
NOTE. Before each test, the disk should be filled with some junk data to simulate its normal operation. We should do that while creating a new Virtual Disk before every test. In this study, we’ve used the dd.exe utility for that purpose.
dd launching parameters:
dd.exe bs=1M if=/dev/random of=\\?\Device\Harddisk1\DR1 --progress
Configuring vSAN cluster
We have already installed ESXi on our hosts and vCenter VM on a separate host so we’re not gonna cover that boring process here.
OK, log in to VMware vCenter:
Next, in the console, create new Datacenter and a new cluster inside of it. Now, we have to add all ESXi hosts to the cluster.
In order to add a host to the cluster, we have to type in its name or IP in the Name and location section.
In the Connection settings section, specify the host user name and password.
Now, we can specify the license and Lockdown mode (no rush, we can do that later).
The host may be set in the maintenance mode after being added, so let’s turn it back to operational state.
OK, so we have added our four hosts to the cluster:
The next stage is to configure the network for vSAN.
Let’s navigate to the host by switching to Configure=> VMkernel adapters and add a new network (Add host networking).
In the Add networking window, choose VMkernel Network Adapter.
On the Select target device step, create new standard switch.
Let’s add the active network adapter that is needed for testing to standard switch. Under our configuration, it is 40 GbE network port.
After that, in the Port properties section, give a name for Network label and activate the vSAN service.
Now, add the IP address of our network adapter.
Everything has gone smoothly so far, and we can see that the new network has appeared in Configure=> VMkernel adapters.
We need to add such VMkernel adapter on each host. After that, we have to check that all hosts can communicate with each other over vSAN network. Houston says everything’s clear and we’re good to go on.
Let’s move to vSAN configuration. Go to Cluster=>Configure=>vSAN=>General=>Configure.
On the vSAN Capabilities page, leave settings by default. We don’t need deduplication, compression, and encryption since they can degrade the performance, and in our test, we wanna be as unbiased as possible.
Under our configuration, fault domains don’t need any additional settings since we have a four-node cluster with each node serving as a separate fault domain (such configuration is set by default).
If everything was set correctly, the Network validation page will show us that each host has one VMkernel adapter assigned.
In order to create the vSAN storage, the necessary condition is to specify disks for Cache tier and Capacity tier. Cache tier must be provided with SSD and Capacity tier – with SSD or HDD. Take into account that the architecture doesn’t allow creating storage without cache.
In our scenario, we have created an all-flash array out of NVMe drives. Let’s provide cache and capacity tier with one disk each.
After having configured vSAN, we can see the created vsanDatastore (the name can be changed if you need something more original) in the Datastores tab. The disk type is vSAN.
This Datastore along with all its objects is set with vSAN policy by default. You can take a look at the storage policies in the menu Home=>Policies and Profiles.
Let’s take a quick look at the default policies. So, the Primary level of failures to tolerate is set to 1 (each object has one replica), which means we have RAID1, and each object is written to one disk (without spreading objects among disks like in RAID0).
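To put numbers on that policy: with RAID1 mirroring, tolerating n failures means keeping n + 1 copies of each object, and vSAN needs at least 2n + 1 hosts to place the copies plus witness components. The Python sketch below is our own illustration of that arithmetic; it ignores witness and metadata overhead, and the 860 GB figure is the test disk size used later in this article.

```python
def raid1_layout(failures_to_tolerate: int, vmdk_gb: float) -> dict:
    """Rough RAID1 (mirroring) footprint for one virtual disk.

    Ignores witness components and on-disk metadata overhead,
    so treat the raw capacity as a lower bound.
    """
    replicas = failures_to_tolerate + 1
    return {
        "replicas": replicas,
        "min_hosts": 2 * failures_to_tolerate + 1,
        "raw_gb": vmdk_gb * replicas,
    }


if __name__ == "__main__":
    # Default policy from the text: failures to tolerate = 1.
    layout = raid1_layout(failures_to_tolerate=1, vmdk_gb=860)
    print(layout)
```

Under these assumptions, the 860 GB test disk consumes about 1720 GB of raw capacity, which fits into the four 500 GB capacity disks of our cluster.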
Finally, our vSAN cluster is configured so we can proceed with picking up the proper VM parameters for running further performance tests.
Estimating VMs’ optimal properties
OK, once the cluster has been built, we’ve created a VM (Virtual Disk0 – 25GB, Virtual Disk1 – 860GB) to study its performance under the following vCPU/vCORE combinations:
Testing one VM (2vCPU/1CorePerSocket) in a four-node VMware vSAN cluster
Testing one VM (4vCPU/1CorePerSocket) in a four-node VMware vSAN cluster
Testing one VM (6vCPU/1CorePerSocket) in a four-node VMware vSAN cluster
Testing one VM (8vCPU/1CorePerSocket) in a four-node VMware vSAN cluster
Let’s discuss the results
According to the plots above, the VM disk performance barely depends on the vCPU number. The maximum disk performance in all cases fluctuates around 70000 IOPS. Given this, we chose the optimal vCPU/vCORE ratio based on the lowest latency at the disk performance saturation point. Further measurements were run with the 4vCPU/1CorePerSocket VM configuration.
Estimating the optimal number of VMs
In order to estimate the optimal number of VMs per node, we’ve gradually increased the number of VMs assigned to one node until their overall performance stopped growing. Everything in this section was run on one node. Below, we list the configurations planned for this stage:
- 2xVM (4CPU/1CorePerSocket/Virtual Disk1=430GB)
- 3xVM (4CPU/1CorePerSocket/Virtual Disk1=285GB)
- 4xVM (4CPU/1CorePerSocket/Virtual Disk1=215GB)
- 5xVM (4CPU/1CorePerSocket/Virtual Disk1=170GB)
- 6xVM (4CPU/1CorePerSocket/Virtual Disk1=140GB)
- 7xVM (4CPU/1CorePerSocket/Virtual Disk1=120GB)
- 8xVM (4CPU/1CorePerSocket/Virtual Disk1=105GB)
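The Virtual Disk1 sizes in this list follow from splitting the 860 GB test disk evenly among the VMs. The Python sketch below reproduces them; rounding down to a 5 GB step is our inference from the listed values, not a documented rule, and the function name is hypothetical.

```python
TOTAL_TEST_DISK_GB = 860  # Virtual Disk1 size used for the single-VM tests


def disk_per_vm_gb(vm_count: int,
                   total_gb: int = TOTAL_TEST_DISK_GB,
                   step_gb: int = 5) -> int:
    """Split the test capacity evenly and round down to a 5 GB step.

    The 5 GB step is an assumption that happens to match the
    configurations listed above for 2 to 8 VMs.
    """
    even_share = total_gb // vm_count
    return even_share // step_gb * step_gb


if __name__ == "__main__":
    for n in range(2, 9):
        print(f"{n}xVM: Virtual Disk1={disk_per_vm_gb(n)}GB")
```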
Testing two VMs (4CPU/1CorePerSocket/Virtual Disk1=430GB configuration) in a four-node VMware vSAN cluster:
Testing three VMs (4CPU/1CorePerSocket/Virtual Disk1=285GB configuration) in a four-node VMware vSAN cluster:
Testing four VMs (4CPU/1CorePerSocket/Virtual Disk1=215GB configuration) in a four-node VMware vSAN cluster:
Testing five VMs (4CPU/1CorePerSocket/Virtual Disk1=170GB configuration) in a four-node VMware vSAN cluster:
What we’ve learned from the experiments
According to the data above, five VMs (4CPU/1CorePerSocket configuration) running on one node showed no dramatic performance growth compared to four VMs with the same configuration. So, there was no point in further increasing the number of VMs assigned to a single node. Therefore, we’ve cloned four VMs (4CPU/1CorePerSocket configuration) to the other nodes.
Testing 4-node VMware vSAN cluster scalability
After all those preparations, we can finally test the performance scalability of the entire thing. Further testing was run on the four-node vSAN cluster (16xVM/4CPU/1CorePerSocket/Virtual Disk1=54GB). We’ve reduced the Virtual Disk volume to 54 GB since the overall vSAN Datastore capacity was constant and distributed among 16 VMs. Logically, we expected the four-node VMware vSAN cluster to perform four times faster than a single node.
“Doctor, how is vSAN?”
Not that good, dear. Our experiment confirmed the initial concerns about vSAN scalability and performance: vSAN is slower than we expected and does not scale that well. After cloning the VMs to all cluster nodes and running tests on the virtual disks of all 16 VMs simultaneously (Virtual Disk1=54GB), the disk performance of each VM dropped by a factor of two to four! For example, with four VMs (4CPU/1CorePerSocket configuration) on one node, the average disk performance reached 47000 IOPS during 4k random read with QD=14. Yet, with 16 VMs, disk performance ranged between 12200 and 22600 IOPS under the same testing conditions. Such a dramatic performance drop calls the solution’s scalability into question. According to our test results, VMware’s “Scale to tomorrow” should be read as “scale to fucking nowhere”.
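For those who want to redo the math, here is a minimal Python sketch of the slowdown and scaling-efficiency arithmetic. It assumes that the 47000 IOPS figure is a per-VM average for four VMs on one node and that the 12200 to 22600 IOPS range is per VM across 16 VMs; the function names are ours.

```python
def slowdown(baseline_iops: float, cluster_iops: float) -> float:
    """How many times slower a VM got after scaling out."""
    return baseline_iops / cluster_iops


def scaling_efficiency(per_vm_iops: float, vms: int,
                       baseline_per_vm_iops: float, baseline_vms: int,
                       nodes: int) -> float:
    """Measured aggregate IOPS as a share of ideal linear scaling."""
    ideal = baseline_per_vm_iops * baseline_vms * nodes
    actual = per_vm_iops * vms
    return actual / ideal


if __name__ == "__main__":
    BASELINE = 47_000  # per-VM average, four VMs on one node (assumed)
    for per_vm in (12_200, 22_600):
        s = slowdown(BASELINE, per_vm)
        e = scaling_efficiency(per_vm, 16, BASELINE, 4, 4)
        print(f"{per_vm} IOPS per VM: {s:.2f}x slower, "
              f"{e:.0%} of linear scaling")
```

Under these assumptions, the per-VM slowdown lands between roughly 2x and 4x, which means the cluster delivers somewhere between a quarter and a half of what linear scaling would give.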