StarWind Virtual SAN.
Preamble 🙂 StarWind has a legion of weird looking guys and hot East European girls who adore writing long emails. Only to make things worse, they can wake you up in the middle of the night (Fuck time difference, clocks are for kids!) with a call just to chat about their tech and how cool they are. So… Think twice before providing them with your real email! Sticking with some throwaway junky one sounds like a much better idea in general. If you want to give their stuff a spin of course.
Unknown. Expedition needed.
We know from TV; Russian bears can ride a bicycle in a circus and play an accordion to avoid being tazed & beaten to death. Some drunk punk told us bears are good in rocket science as well. Rocket science = high performance storage, hyperconvergence, and all that jazz. To either confirm or deny this statement we decided to give StarWind Virtual SAN scalability and performance a try. Run it, test it and compare it to a big guy Microsoft with Storage Spaces Direct (S2D) who’s our current favorite race horse, and a bunch of losers & cocksuckers (VMware vSAN, DellEMC ScaleIO & Nutanix) as well. So… Does StarWind suck or blow? Spit or swallow?
Let’s start with some history as we know you love it. Back in 1961, Russians kicked U.S. butt in the space race sending first human to Moon… Space! 🙂 At this point they are trying to kick democratic system butt with their steel toe military boot. Again.
StarWind Software takes its roots from Rocket Division Software. They brought iSCSI protocol support to the then-anemic Windows Server platform back in 2003 when the iSCSI spec was still in draft, and it took Microsoft another maybe five years to kind of catch up. Microsoft bought stinking String Bean Software for a piece of shit they had been told was an iSCSI target, and wrote their own still buggy as hell and slow as fuck iSCSI initiator. When we discussed Microsoft Storage Spaces and the plain fact that Microsoft doesn’t understand storage, we really meant it. OK, back in 2008 StarWind was the first to cluster a pair of Hyper-V servers without SAS JBODs, external SANs or whatever, just pure Software-Defined Storage running inside the hypervisor and… no stinking “controller virtual machines”! Nutanix can go fuck themselves, they rolled out their kludge pretending to do a similar “NoSAN” thing somewhere around 2012. A day late and a dollar short. StarWind & Intel partnered in 2010 to deliver 1M IOPS with iSCSI and at that time people were talking about tens of hundreds at best. These days StarWind keeps releasing weird stuff like an NVMe over Fabrics target for Windows, iSER which is iSCSI over RDMA, and virtual tapes with an Amazon AWS cloud back end (Who’s a customer for that?!?). Question is: Can they amaze with their performance?
Now let’s take a closer look at what we actually have on the table in front of us. StarWind Virtual SAN (VSAN) was born to make you happy… Nope! It was born to turn your rusty servers into a high performing SAN. StarWind calls their software “hardware-agnostic” and that is supposed to mean it can be deployed on any piece of shit… hardware that fits the bill. Remember installing Linux on a dead badger? We know you do! 🙂 Now StarWind wants us to believe we can install StarWind on a dead bear 🙂 There are no hardware compatibility lists, love it or hate it. If this bloody thing can run Windows Server – it can run StarWind as well. Nice try!
Another quite remarkable fact about StarWind is you can get it for free. These crazy vodka-drinking motherfuckers provide a flexed-out version for SMBs and hobos… ROBOs completely free of charge. Restriction less and frictionless. Production allowed. Bells & whistles! Though, you still have to love PowerShell (Fuck you Microsoft! PowerShell everywhere!). Remember, Microsoft had this “CLI only” thing to manage Windows Server & Hyper-V before Project Honolulu roll out? Same shit here. CLI once and forever. No support. Only for brave. GUI and guaranteed support are for paying customers only. OK, enough of dithyrambs and pointless shit, let’s move closer to our topic which is performance, performance, and again performance. Plus, scalability!
Considerations and milestones
By design, StarWind Virtual SAN (VSAN? vSAN? Whatever…) is fully integrated into the hypervisor. VSAN is an interesting combination of user-land applications (iSCSI, iSER & NVMeoF targets, iSCSI & iSER initiators, UI, licensing, health monitoring etc.) and some kernel-mode drivers (iSCSI initiator for God knows what, TCP loopback “fast path”, iSCSI load balancer and so on…). It can optionally utilize a “controller virtual machine (VM)” to either offload some I/O to dedicated CPU cores (Yes, StarWind “polls” for performance in user-mode with Intel DPDK & SPDK!) or just keep VMware guys happy by NOT installing any proprietary modules into ESXi kernel memory space. Smart, eh?
We run this bloody mix on top of an all-NVMe datastore that should deliver awesome numbers. In theory. As usual, we expect to get a cluster that has its overall I/O performance increased every time we spawn an extra VM. However, as it happens quite often, what we expect is not what we get…
What we’re planning to do:
1. Check underlying NVMe storage “raw” performance and compare it to vendor-provided reference numbers. Here, we measure “raw” performance in “bare metal” Windows Server 2016 environment.
2. Deploy StarWind VSAN on each host. Next, we enable Hyper-V role everywhere as we run hyperconverged and configure MPIO (Multi-Path I/O) as StarWind does iSCSI (and iSER…) and we need MPIO for block tech. For now, we’ll do TCP & iSCSI leaving RDMA & iSER for the second part of the story we tell.
3. Create distributed StarWind virtual device replicated between two hosts (Host #1 and Host #2).
4. Connect StarWind virtual device on Host #1 to the one on Host #2 with iSCSI (two 127.0.0.1 loopback sessions and two iSCSI sessions with Host #2). Round Robin MPIO policy is used by default. Format the resulting raw block device as an NTFS volume and assign a drive letter to it.
5. Create a Hyper-V VM with a “system” virtual disk somewhere on the local Windows host partition and create one extra “data” 256GB VHDX for tests, this “data” one will be placed on top of the StarWind virtual device.
6. Pick some optimal test parameters (numbers of threads and Outstanding I/O).
7. Test StarWind virtual device performance with one VM running in the cluster.
8. Clone this test VM and pin it to another Hyper-V host. Check the overall performance and clone the VM again… and again! For each newborn VM, we create a new StarWind virtual device and assign a new dedicated “data” VHDX to it.
9. Keep on going ‘till the overall cluster performance chokes up, drops or stops increasing. Watch CPU usage.
10. Test single NVMe with all of our test patterns. Based on the numbers we get, we judge on the performance level StarWind VSAN delivers.
Inspecting the equipment
Here’s our testbed for a single NVMe performance.
1 Dell R730 server, 2x Intel Xeon E5-2683 v3 @ 2.00 GHz CPU (14 physical cores per-CPU), RAM 64GB
Storage: 1x Intel SSD DC P3700 2TB
OS: Microsoft Windows Server 2016 Datacenter 10.0.14393 N/A Build 14393
Now, each host used for StarWind Virtual SAN performance testing looks like this…
Host #1, Host #2, Host #3, Host #4
Dell R730 server, 2x Intel Xeon E5-2683 v3 @ 2.00 GHz CPUs (14 physical cores per-CPU), RAM 64GB
Storage: 2x Intel SSD DC P3700 2TB
LAN: 2x Mellanox ConnectX-4 100Gb/s CX465A
OS: Microsoft Windows Server 2016 Datacenter 10.0.14393 N/A Build 14393
The drawing below illustrates how we interconnected everything:
Are our disks really that fast?
Before we start, we want to make sure that our Intel SSD DC P3700 Series 2 TB performs as it is said to. We do trust the vendor (Intel), but who knows how our disks are doing? Wear level, dead blocks etc. Now look at what our vendor says in its datasheet:
In this table, Intel claims its NVMe can reach massive 460K IOPS under 4k random reading pattern with 4 workers and Queue Depth=32.
To reproduce (verify?) the numbers Intel gave us, we measured disk performance under 4k random read pattern. Find test results in the plots below:
So, we reached our expected 4K random read pattern performance without compromising the latency. Let’s move on…
After Windows Server 2016 installation and basic initial configuration, we installed the WinOF-2 (1.90.19216.0) driver for Mellanox ConnectX-4 NICs and set up the networks. Then, we checked the networking bandwidth between two Windows hosts (i.e., Host #1 and Host#2) with iperf for TCP and nd_send_bw for RDMA (nd_send_bw is included in Mellanox Firmware Tools).
Let’s look at RDMA network bandwidth first (OK, we don’t do RDMA with StarWind VSAN within our current test, but that’s just to make sure network is healthy and properly configured):
Now, check out network bandwidth between hosts (host #1 and host #2) for RDMA:
And, here’s what TCP looked like (iSCSI is an application-level protocol and runs on top of TCP, so TCP is the workhorse for iSCSI):
Now that we know the NVMe drive we use reaches 460K IOPS with 4K random read pattern, let’s make sure that networking won’t let us down. In total, we have 8 drives (2 per host).
Switch, network etc. math. We run our VSAN tests on the same setup we used before with Storage Spaces Direct. Making a long story short and saving this planet another cedar or two, you can check out the S2D performance testing report to find out whether our network bandwidth is enough, what NICs and switches can do etc. http://www.cultofanarchy.org/microsoft-storage-spaces-direct-s2d-surprisingly-good-job/
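For the impatient, here is a rough sanity check of that network math, sketched in Python. It takes the 460K IOPS datasheet figure, 2 drives and 2x 100Gb/s NICs per host from the text above; treating "4k" as 4096 bytes and ignoring TCP/iSCSI overhead are our assumptions, and the function names are made up for illustration. It only covers the 4k random read case.

```python
# Back-of-the-envelope check: can the NICs carry what the local NVMe drives can read?
DRIVE_IOPS_4K = 460_000   # Intel datasheet figure quoted above (4k random read)
BLOCK_BYTES = 4 * 1024    # assumption: "4k" means 4096 bytes
DRIVES_PER_HOST = 2
NICS_PER_HOST = 2
NIC_GBITS = 100           # line rate only, protocol overhead is ignored


def drive_gbits(iops: int = DRIVE_IOPS_4K, block_bytes: int = BLOCK_BYTES) -> float:
    """Throughput of one drive in gigabits per second."""
    return iops * block_bytes * 8 / 1e9


def host_headroom() -> float:
    """NIC line rate divided by local drive throughput (>1 means the network has room)."""
    return (NICS_PER_HOST * NIC_GBITS) / (DRIVES_PER_HOST * drive_gbits())


if __name__ == "__main__":
    print(f"one drive: {drive_gbits():.1f} Gbit/s")
    print(f"one host : {DRIVES_PER_HOST * drive_gbits():.1f} Gbit/s of NVMe "
          f"vs {NICS_PER_HOST * NIC_GBITS} Gbit/s of NIC")
    print(f"headroom : {host_headroom():.1f}x")
```

Large-block patterns push far more bytes per I/O, so the linked S2D report remains the place to look for the full picture.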
Time to create VMs
In this study, we used the following test VM configuration
– RAM 7GB
– Disk0 (Type SCSI) – 25GB (“system” virtual disk for Windows Server 2016)
– Disk1 (Type SCSI) – 256GB (“data” Hyper-V virtual disk 256GB)
NOTE: Even though we used only fixed virtual disks today, we completely filled them with random data with dd.exe before each test. It’s a good idea to do so while creating or adjusting VM virtual disk’s size.
So, here are dd.exe launching parameters:
dd.exe bs=1M if=/dev/random of=\\?\Device\Harddisk1\DR1 --progress
Picking test utility launching parameters
Now, let’s decide on the optimal number of threads and outstanding I/O. For that purpose, we created a VM and pinned it to the Host #1. Next, we measured StarWind virtual device performance with 4K random read pattern and varying number of threads and outstanding I/O. At some point, the performance hit the ceiling and saturated. That was the point with the optimal number of outstanding I/O and threads that we were looking for.
Here are DiskSPD launching parameters under threads=1 and Outstanding I/O=1,2,4,8,16,32,64,128
diskspd.exe -t1 -b4k -r -w0 -o1 -d60 -Sh -L #1 > c:\log\t1-o1-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o2 -d60 -Sh -L #1 > c:\log\t1-o2-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o4 -d60 -Sh -L #1 > c:\log\t1-o4-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\t1-o8-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\t1-o16-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o32 -d60 -Sh -L #1 > c:\log\t1-o32-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o64 -d60 -Sh -L #1 > c:\log\t1-o64-4k-rand-read.txt
timeout 10
diskspd.exe -t1 -b4k -r -w0 -o128 -d60 -Sh -L #1 > c:\log\t1-o128-4k-rand-read.txt
timeout 10
So, let’s interpret the numbers. Hyper-V virtual disk maximum performance on a StarWind virtual device lies between 220300 and 220500 IOPS. The latency, in turn, is between 1.12 and 1.16 ms. In our setup, we can reach such performance only using 4 or 8 I/O threads. Remember, the VM has a 4-core vCPU. That’s why we used the following test utility launching parameters: threads=4 and Outstanding I/O=64.
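Typing that sweep by hand for every thread count gets old fast. A minimal Python sketch that generates the same command lines could look like this; the DiskSPD flags, the `#1` target and the log naming are copied from the commands above, while `sweep_commands` and its parameters are hypothetical names of ours.

```python
def sweep_commands(threads=1, depths=(1, 2, 4, 8, 16, 32, 64, 128),
                   target="#1", log_dir=r"c:\log"):
    """Build the DiskSPD 4k random read sweep as a list of batch lines."""
    lines = []
    for qd in depths:
        # Raw f-string: the backslash before "t" stays a path separator, not a tab.
        log = rf"{log_dir}\t{threads}-o{qd}-4k-rand-read.txt"
        lines.append(
            f"diskspd.exe -t{threads} -b4k -r -w0 -o{qd} -d60 -Sh -L {target} > {log}"
        )
        lines.append("timeout 10")  # pause between runs, as in the original listing
    return lines


if __name__ == "__main__":
    print("\n".join(sweep_commands()))
```

Calling `sweep_commands(threads=4)` would give the same sweep for four threads; paste the output into a .cmd file on the test VM.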
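Picking the point where performance hits the ceiling can be done by eye, but here is a hedged Python sketch of one way to automate it. The idea: take the best IOPS seen, then pick the lightest combination of threads and outstanding I/O that lands within a small tolerance of it. The function name, the 1% tolerance and the "lightest" tie-break are our assumptions, not what the test lab actually did.

```python
from typing import Dict, Tuple

Config = Tuple[int, int]  # (threads, outstanding I/O per thread)


def saturation_point(results: Dict[Config, float], tolerance: float = 0.01) -> Config:
    """Return the lightest config whose IOPS is within `tolerance` of the best run."""
    if not results:
        raise ValueError("no results given")
    ceiling = max(results.values())
    good = [cfg for cfg, iops in results.items() if iops >= ceiling * (1 - tolerance)]
    # "Lightest" means fewest total in-flight I/Os, then fewest threads.
    return min(good, key=lambda cfg: (cfg[0] * cfg[1], cfg[0]))
```

Feed it the total IOPS parsed from each DiskSPD log, keyed by the `-t` and `-o` values used for that run.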
Accomplishing orbit task
Setting up tools
Now, as we got here, let’s set up tools and carry out the measurements. As usual, we used DiskSPD v2.17 and Fio v3.5 today. We run our tests under the following patterns:
– 4k random write
– 4k random read
– 64k random write
– 64k random read
– 8k random 70%read/30%write
– 1M sequential read
And, here come the launching parameters for our tools under threads=4, Outstanding I/O=8, time=60sec
DiskSPD
diskspd.exe -t4 -b4k -r -w100 -o8 -d60 -Sh -L #1 > c:\log\4k-rand-write.txt
timeout 10
diskspd.exe -t4 -b4k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\4k-rand-read.txt
timeout 10
diskspd.exe -t4 -b64k -r -w100 -o8 -d60 -Sh -L #1 > c:\log\64k-rand-write.txt
timeout 10
diskspd.exe -t4 -b64k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\64k-rand-read.txt
timeout 10
diskspd.exe -t4 -b8k -r -w30 -o8 -d60 -Sh -L #1 > c:\log\8k-rand-70read-30write.txt
timeout 10
diskspd.exe -t4 -b1M -s -w0 -o8 -d60 -Sh -L #1 > c:\log\1M-seq-red.txt
FIO
[global]
numjobs=4
iodepth=8
loops=1
time_based
ioengine=windowsaio
direct=1
runtime=60
filename=\\.\PhysicalDrive1
[4k rnd write]
rw=randwrite
bs=4k
stonewall
[4k random read]
rw=randread
bs=4k
stonewall
[64k rnd write]
rw=randwrite
bs=64k
stonewall
[64k random read]
rw=randread
bs=64k
stonewall
[OLTP 8k]
bs=8k
rwmixread=70
rw=randrw
stonewall
[1M seq read]
rw=read
bs=1M
stonewall
Testing 4-node Hyper-V / StarWind Virtual SAN (VSAN) cluster performance
Everything is ready and set up. So, let’s jump right to the real measurements now! Just like during S2D testing, we measured Hyper-V 256 GB “data” virtual disk performance under the mentioned range of patterns. For that purpose, we started with one VM, cloned it to another host and measured the overall cluster performance. Since S2D has surprised us (unlike Nutanix CE and VMware vSAN), fuck that 12-VM warm-up! So, we kept on doing this until the total performance either hit the ceiling and reached its saturation point or hit the ground. Note that each VM has its own StarWind virtual device that keeps the “data” virtual disk designated for our performance tests. Go!
See, the performance keeps on growing proportionally to the number of VMs in the cluster. But, we decided to stop right after 20 VMs. Why? Just look at what we got under other patterns:
Well, enough is enough. Let’s see what happens to CPU. If you want to build a hyper-converged environment you should keep an eye on your CPU utilization and here’s why: You have to leave some spare CPU cycles to these poor little things called “production VMs”, right? Here’s how CPU behaves and what StarWind leaves to number crunching:
Well, it was iSCSI which uses TCP so it’s expected CPU hog, right? Oh, wait! It looks like StarWind does a better load balancing job than S2D did! StarWind knows what NUMA is, while Microsoft definitely doesn’t.
Measuring single Intel SSD DC P3700 2TB performance
Now, let’s take a look at how a single NVMe drive performs under the same test patterns in a Windows Server 2016 environment. It will give us kind of a reference number set to judge whether running StarWind VSAN on an 8 NVMe datastore makes or breaks a Hyper-V cluster’s day.
At this point, let’s make some assumptions to avoid making today’s measurement too complex:
1. All 8 NVMes in the datastore are available for reading. Thus, the overall datastore reading performance should be 8x of just one Intel SSD DC P3700 2TB value (Well, it’s all assuming we don’t have network or CPU or whatever other bottlenecks of course).
2. All NVMe drives are available for writing. Yet, there’s the small thing about VSAN: Replication. This means each block after being written gets replicated to the partner disk (Actually both writes happen in parallel, but who the fuck cares…). In other words, you get a “network” RAID1 as these NVMes “sit” on the different physical hosts with network in between. So, the overall performance will be ((IOPS-Write-one-NVMe)*N)/2. N here stands for a number of NVMe drives involved in writing (8 for our setup).
3. In case of 8k random 70%read/30%write pattern, at best, performance will be (IOPS-Read-one-NVMe*N*0.7)+((IOPS-Write-one-NVMe*N*0.3)/2). Again, N is the number of NVMe drives utilized for the pattern (8 in our case).
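The three assumptions above boil down to a few lines of arithmetic. Here is a minimal Python sketch of them; the formulas are taken straight from the list, while the function names and the `percent_of_theory` helper are ours, and the single-drive numbers you plug in should be your own measured values.

```python
def max_read_iops(read_one: float, n: int = 8) -> float:
    """Assumption 1: every one of the n drives serves reads."""
    return read_one * n


def max_write_iops(write_one: float, n: int = 8) -> float:
    """Assumption 2: each write lands on two drives ("network" RAID1), so halve it."""
    return write_one * n / 2


def max_mixed_iops(read_one: float, write_one: float, n: int = 8,
                   read_share: float = 0.7) -> float:
    """Assumption 3: 70/30 mix, with the write part paying the replication tax."""
    write_share = 1 - read_share
    return read_one * n * read_share + (write_one * n * write_share) / 2


def percent_of_theory(measured: float, theoretical: float) -> float:
    """How much of the theoretical ceiling the cluster actually delivered."""
    return 100.0 * measured / theoretical


if __name__ == "__main__":
    # 460K is the datasheet 4k random read figure quoted earlier, used here
    # only as a stand-in for a measured single-drive value.
    print(max_read_iops(460_000))
```

With N=8 the read ceiling is simply eight times the single-drive number, and the write ceiling is four times it.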
Deorbit and landing
Since we’re pretty much done with all that jazz, let’s present our mission report now.
First, let’s look at how StarWind VSAN performance depends on the number of VMs running in the cluster. Just a tiny reminder: Under all of those test patterns we used an individual Hyper-V VM Virtual Disk (VHDX) which was placed on top of StarWind virtual device, there’s no dogfight for LUN ownership! Think about this strategy as of “poor-man’s vVols” if you know what we mean. Overall combined StarWind performance grows nice and smoothly till 4 VMs are concurrently running in the cluster. Saturation point with close to top (80th percentile, FYI) performance numbers achieved comes very soon! Things change once 5th VM gets on board… Under 4k blocks and mixed loads, performance growth slows down but keeps on rising steadily till 20th VM gets deployed.
Wait, why did we stop our measurements after the 20th VM, though everything was looking promising? A few reasons actually… Well, first of all this boozing bear has already knocked out anything else reviewed in our lab before. Kidding… But, second, let’s be honest, it also overwhelmed CPU cores on all the hosts rendering the hyperconverged scenario pretty much useless beyond this point: There are no CPU cycles left for production VMs and we didn’t intend to test a storage-focused installation! But what else did we expect from ancient iSCSI crawling all over the neighborhood on the back of “guest from 80s” TCP? Anyway, even with close to 100% CPU usage, StarWind performed more-or-less like S2D with its bloody SMB3, SMB Direct, RDMA etc. technology. It might draw different colors with iSER, but let’s leave it for our second home run. Oh, BTW, just look at the StarWind per-CPU utilization chart! Unlike S2D, StarWind does proper load balancing. Another reason to stop further tests was the insignificant performance gain with every new VM added. Indeed, performance growth under 4k random read pattern became less intense. Well, let’s just look at these numbers… Initially, while running from 1 through 9 virtual machines in the cluster, average performance gain was approx. 120K IOPS per VM. Next, from 10 through 20 VMs, we got only 46K IOPS / VM. The highest overall performance (1.7M IOPS for number fanatics) was observed while 20 VMs were running in the cluster.
Hold on, what about performance & growing number of VMs measured with other I/O patterns? Well, once the 4th or 5th VM gets engaged, there’s no significant performance growth with any of these “other” patterns either, so we can make another mini-conclusion here: StarWind needs only one running VM per cluster host to grab close to maximum IOPS, making it super-efficient in terms of resource utilization. Our current performance leader (ex-leader?) Storage Spaces Direct (S2D) needs way more than that… OK, back to StarWind. There we observed true saturation. Maximum performance under 64k random write pattern didn’t go higher than 140K IOPS. Under 64k random read loads, StarWind VSAN exhibited 310K IOPS. As for 1M sequential reads, performance reached 18K IOPS aka a whopping 18.5 GB per second. Holy moly, what a bandwidth!
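If you want to turn IOPS into bandwidth yourself, it is a one-liner. The Python sketch below is ours, not part of the test tooling. Note that the "18K" above is rounded: 18,000 I/Os of 1,000,000 bytes give 18.0 GB/s, while 18,000 I/Os of 1 MiB give about 18.9 GB/s, so the quoted 18.5 GB/s sits between the two readings.

```python
def iops_to_gbytes_per_sec(iops: float, block_bytes: int) -> float:
    """Convert an IOPS figure at a fixed block size to decimal gigabytes per second."""
    return iops * block_bytes / 1e9


if __name__ == "__main__":
    print(iops_to_gbytes_per_sec(18_000, 1_000_000))    # "1M" read as 10^6 bytes
    print(iops_to_gbytes_per_sec(18_000, 1024 * 1024))  # "1M" read as 1 MiB
```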
What does it mean? Once again, you do not need to run a legion of VMs within your cluster for damn performance. SQL Server, Oracle, and scientific number crunching app of choice you can’t do all in-memory because of cost will simply love StarWind and virtual storage it manages. You can nail down all performance your underlying storage can provide flying as low as one VM per physical host. Looks good for a “mom and dad’s shop” IT environment, right? That’s actually what these Russians say their SDS was intended for. Idiots… They created a nuclear bomb in their garage and they went off blast fishing with it!
Bottom line. The table below provides everything you need to know about this vodka-powered dragster. In the “Comparing performance” column, we list the percentage of the theoretical values we could reach.
Well, we aren’t exactly pissed off with the numbers we see. StarWind got the biggest swinging dick in the room so far. Yes, this Russian bear does surprisingly well with rocket science! Another nice thing about it worth mentioning: These guys prove iSCSI has balls. Look at our S2D testing again: StarWind performs generally better with iSCSI / TCP than S2D does over SMB3 / RDMA. So, weapons are nothing, tactics make a whole lot of difference.
Next time we’ll see whether StarWind is capable of making another ground shaking home run with RDMA enabled. Stay tuned, we aren’t dead yet 🙂