This is NOT David Blaine, and that’s NOT street magic! StorMagic SvSAN 6.2 case study

Prologue

Hi, sorry for being away that long. Some time ago, we presented the eqemfight results between VMware vSAN and Microsoft S2D, with S2D screwing the then-current performance champ. Having the job done, we decided to make the long-awaited trip to Europe. We ended it at some underground punk gig in Prague. Everything was just fine until the cops showed up at the venue, as it was a bit too noisy for the guys living nearby. We “love” cops, but that’s not the main thing we are trying to tell you now.

During that gig, one of our “dream team” members said, “Guys, it seems that we forgot to test StorMagic…” Oh, ujkv, Patrick, where were you when S2D was kicking ujkv out of helpless vSAN? Where were you when drunken bears were waving their 9” fkems? Oh, you were working on some other stuff… What a good boy, hwem you! Ok, let’s look at this solution at least now. On the other hand, thanks to Patrick, we got a good subject for our studies – the existing vSAN alternatives. We hope you remember how awful VMware vSAN performance is, so, yes, it’s good to have some alternatives.

P.S. We played around with the format of this article a bit. We hope it’s gonna be more reader-friendly than any other similar ujkv on the Internet. Enjoy!

Suspect

StorMagic SvSAN 6.2

Status

Under investigation

Suspect description

When we say “magic”, we imagine Gandalf, Harry Potter, or at least David Blaine. Those guys could do some really deep magic ujkv. Let’s see today what the “magic” in the StorMagic name stands for! Let’s hope it’s not some dwnnujkv invented by a braindead guy from marketing…

SvSAN was created to replace physical SANs. Today, we’ll see whether it has big enough balls to do that. The solution can be deployed as part of a hyperconverged setup or as a storage-only target for any server environment.

It’s a hardware-agnostic solution, and the guys from StorMagic promise to turn a bunch of etcrra cheap servers into shared storage that fits your “changing capacity and performance needs”. The nice thing is that it can run on just 2 nodes. Hmm, that looks pretty minimalistic. So, let’s give that thing a shot in the 2-node setup today. The solution can be deployed on VMware vSphere or Microsoft Hyper-V servers. Nice try, StorMagic, but we don’t give a hwem about SvSAN for Hyper-V: we already have S2D and StarWind VSAN Free available to anyone (sure, if you have some time to set that thing up). In the upcoming set of articles, we’ll try to pick an alternative for VMware vSAN. So, the thing we are looking at here is StorMagic SvSAN for VMware vSphere.

Restage

Today, we’re gonna test the 2-node all-flash StorMagic SvSAN 6.2 cluster. For experiment purposes, we are going to grow the number of VMs until the overall cluster performance either chokes up or hits the floor.

Well, yes, we tested a 4-node cluster previously. Now, we’ll run a series of tests devoted to ROBO environments – 2-node clusters.

Hardware toys, SvSAN & test VM configuration

So, to start with, here’s the interconnection diagram of the setup used for today’s study.

wp-image-1516

First, let’s look at the host configuration we used for today’s study. Just as usual, we used Dell R730 servers with 2x Intel Xeon E5-2683 v3 @ 2.00 GHz (14 physical cores per CPU) and 64 GB of RAM as hosts. But today, unlike our previous studies, we used only 2 of those boxes. We used a pair of test-seasoned Intel SSD DC P3700 2TB disks as storage here.

Now, let’s look at how the entire setup looked. We had 2 ESXi hosts with identical hardware (we refer to them here as Host #1 and Host #2), with the configuration that we’ve just mentioned above. The entire setup configuration looked as follows:

Boxes: 2xDell R730, CPU 2x Intel Xeon E5-2683 v3 @ 2.00 GHz, RAM 64GB
Storage on Host #1: 1x Intel SSD DC P3700 2TB
Storage on Host #2: 1x Intel SSD DC P3700 2TB
LAN: 2x Mellanox ConnectX-4 100Gbit/s
Hypervisor: VMware ESXi 6.7

Now that you know all about the hardware, let’s look at the StorMagic SvSAN for vSphere VM configuration. It’s a RHEL 6-based VM that has everything you need to start running it. So, basically, we get a black box… do you still think that’s a good idea?

CPU: 56xCPU (2x Sockets, 28x Cores per Socket)
Memory: 16GB
Hard disk 1: 512MB (vmdk, SCSI controller 0 – LSI Logic SAS)
Hard disk 2: 20GB (vmdk, SCSI controller 0 – LSI Logic SAS)
Hard disk 3: 1820GB (vmdk, Thick Provisioned Eager Zeroed, SCSI controller 1 – VMware Paravirtual);
Hard disk 4: 1820GB (vmdk, Thick Provisioned Eager Zeroed, SCSI controller 2 – VMware Paravirtual);
Network adapter: 3x VMXNET 3
Virtual SAN: StorMagic SvSAN 6.2 for vSphere (link for download)

wp-image-1517

Now, let’s look at our test VM configuration:

CPU: 4x vCPUs (4x Sockets, 1x Core per Socket)
Memory: 4GB
Hard disk 1: 25GB (vmdk, SCSI controller 0 – LSI Logic SAS) that resides on the ESXi local datastore. OS: Windows Server 2016 Datacenter. We refer to this guy as the “system” disk.
Hard disk 2: 80GB (vmdk, Thick Provisioned Eager Zeroed, SCSI controller 1 – VMware Paravirtual) located on STORMAGIC iSCSI DISK. We call it “data” disk here.

wp-image-1518

Software toys

We used DiskSPD v2.0.20a and FIO v3.8 to measure performance.

Quick note. We entirely filled our disks with random data using dd.exe, even though we used thick provisioned eager zeroed virtual disks for this study. We did this before each test whenever we created a new virtual disk or changed its size.

Here are dd.exe launching parameters:

dd.exe bs=1M if=/dev/random of=\\?\Device\Harddisk1\DR1 --progress
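If you have to re-fill several disks, a tiny wrapper saves typing. Here's a minimal Python sketch; `prefill_cmd` is just our convenience helper (not part of dd.exe), and the `\\?\Device\HarddiskN\DRN` path layout matches the command above:

```python
# Sketch: build the dd.exe pre-fill command for the N-th physical disk.
def prefill_cmd(disk_no, bs="1M"):
    target = rf"\\?\Device\Harddisk{disk_no}\DR{disk_no}"
    return f"dd.exe bs={bs} if=/dev/random of={target} --progress"

print(prefill_cmd(1))  # the exact command we ran for Harddisk1
```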

Here are the launching parameters for DiskSPD v2.0.20a and FIO v3.8 under threads=1 and Outstanding I/O=1,2,4,8,16,32,64,128

DiskSPD

diskspd.exe -t1 -b4k -r -w0 -o1 -d60 -Sh -L #1 > c:\log\t1-o1-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o2 -d60 -Sh -L #1 > c:\log\t1-o2-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o4 -d60 -Sh -L #1 > c:\log\t1-o4-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o8 -d60 -Sh -L #1 > c:\log\t1-o8-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\t1-o16-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o32 -d60 -Sh -L #1 > c:\log\t1-o32-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o64 -d60 -Sh -L #1 > c:\log\t1-o64-4k-rand-read.txt

timeout 10

diskspd.exe -t1 -b4k -r -w0 -o128 -d60 -Sh -L #1 > c:\log\t1-o128-4k-rand-read.txt

timeout 10
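By the way, you don't have to hand-write each line of that sweep. A short Python sketch can generate the very same command list (the script itself is our convenience wrapper, not something DiskSPD ships with):

```python
# Sketch: generate the DiskSPD 4k random read sweep shown above.
# The -o values match Outstanding I/O = 1..128; everything else stays fixed.
def diskspd_sweep(threads=1, block="4k", depths=(1, 2, 4, 8, 16, 32, 64, 128)):
    lines = []
    for o in depths:
        log = rf"c:\log\t{threads}-o{o}-{block}-rand-read.txt"
        lines.append(
            f"diskspd.exe -t{threads} -b{block} -r -w0 -o{o} -d60 -Sh -L #1 > {log}"
        )
        lines.append("timeout 10")  # cool-down pause between runs
    return "\n".join(lines)

print(diskspd_sweep())
```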
FIO

[global]

numjobs=1

loops=1

time_based

ioengine=windowsaio

direct=1

runtime=60

filename=\\.\PhysicalDrive1

[4k-rnd-read-o1]

bs=4k

iodepth=1

rw=randread

stonewall

[4k-rnd-read-o2]

bs=4k

iodepth=2

rw=randread

stonewall

[4k-rnd-read-o4]

bs=4k

iodepth=4

rw=randread

stonewall

[4k-rnd-read-o8]

bs=4k

iodepth=8

rw=randread

stonewall

[4k-rnd-read-o16]

bs=4k

iodepth=16

rw=randread

stonewall

[4k-rnd-read-o32]

bs=4k

iodepth=32

rw=randread

stonewall

[4k-rnd-read-o64]

bs=4k

iodepth=64

rw=randread

stonewall

[4k-rnd-read-o128]

bs=4k

iodepth=128

rw=randread

stonewall

What we gonna do?

1. Benchmark Intel SSD DC P3700 performance in a “bare-metal” Windows Server 2016 environment. Let’s ensure that these babies still run as fast as the vendor says.

2. Deploy StorMagic SvSAN. Actually, StorMagic provides a good guide on how to deploy its solution, so we won’t rewrite the same stuff here. We cover only the important steps. Nothing more than that.

2.1 Deploy the OVA file to create the VM. Start the VM, open the web interface (it’s called the VSA web GUI), and set up the network interfaces.

wp-image-1519

wp-image-1520

While in the network settings, we measured the StorMagic SvSAN network interfaces’ bandwidth. You can do that right in the VSA web GUI in the Speed Test tab (Network -> Actions -> Speed Test).

wp-image-1521

Network Speed Test has proven that SvSAN iSCSI vNICs (VMware VMXNET 3) on both hosts deliver… 3.3-3.4 Gbit/s bandwidth with MTU 9000. Setting MTU 1500 reduced the network bandwidth to 1.3-1.5 Gbit/s. WAIT, WHAT?!

wp-image-1522

We just did not believe our eyes. That’s why we checked the NICs’ throughput with iperf at this point. This utility becomes available right after installing the VMware ESXi 6.7 nmlx5-core 4.17.13.8 driver for the Mellanox ConnectX-4 100Gbit/s CX465A. And, you know, the deep ujkv was still going on: even though the NICs in our setup can potentially deliver 100 Gbit/s throughput, the real one was around 39-43 Gbit/s. Note that the ESXi hosts are connected directly to each other. Sorry, what the hwem?! It seems that routing iSCSI traffic over a vSwitch is a bad idea…

wp-image-1523

wp-image-1524

wp-image-1525

wp-image-1526
That’s what we got for physical connections.

Wait, is such a hwemed-up vNIC throughput enough for benchmarking StorMagic SvSAN in an all-flash cluster? We don’t think so. You see, the VMXNET 3 vNICs connecting the StorMagic VMs can bottleneck the overall Virtual SAN performance. Obviously, we do not want that thing’s performance to choke up! Unfortunately, users cannot fine-tune the StorMagic VM and check the network interface settings… AS THEY GIVE AWAY A FCOP BLACK BOX! To find out whether there was something we could do about such low throughput, we reached out to StorMagic support. Here’s their response.

wp-image-1527

Oh, ujkv, seriously? THANKS FOR YOUR HELP, A FCOP SUPPORT NCFADQA! Now we know that StorMagic SvSAN is created (or just supported, who knows) by ujkvholes who’d rather blame their customer than DO THEIR HWEMKPI JOB and lift their asses from their chairs to do something about the issue. Another “nice” thing is that StorMagic still claims the solution should do well even on a ujkvty network. Seriously, just check out this fragment from the official website:

“SvSAN has a lightweight footprint and has been designed for the realities of edge computing such as poor network reliability which is often found in remote areas. It delivers virtual storage and high availability, even when networks have long latencies and limited throughput.”

Well-well-well, it looks like StorMagic had a disaster! They made a slow-as-hwem solution, and customers with poor network throughput will never find out that the network is not the only thing they should blame for poor performance… Genius!

Some of us were going to quit right there, but Patrick (what a stubborn guy) said that we should still give the solution a shot using the VMXNET 3 adapters as they are, with MTU 9000. So, basically, we take the solution as it is and squeeze out the maximum performance it can provide. Fair enough. If network bandwidth is gonna be the problem (it actually IS A PROBLEM), it won’t be our fault this time. Period.

2.2 Pool all StorMagic SvSAN disks together. You can use two RAID levels: JBOD (concatenation) and RAID 1. Yes, you can create a software RAID. In this article, we investigate both RAID levels. To start with, for each VM we created a RAID 1 storage pool comprised of two SSDs. Each was presented to the StorMagic SvSAN VM as a VMDK file.

wp-image-1528

The total pool capacity is smaller than we expected. That’s all because of the Trial license pool capacity limitation.

wp-image-1529

Create a Target afterward and connect it to the ESXi initiator.

wp-image-1530

2.3 Connect the StorMagic iSCSI Disk to ESXi hosts.

We did all the stuff described in this article in the ESXi web interface.

You need the Software iSCSI storage adapter to be enabled on each host to connect the StorMagic Target to the ESXi hosts. Here’s the path to enable it: Storage -> Adapters -> Configure iSCSI. Tick Enabled in the iSCSI enabled field and press Save configuration in the end.

wp-image-1531

Next, press +Add… in the Dynamic Discovery tab to add the IPs of both StorMagic VMs’ iSCSI interfaces on each ESXi host. Click Rescan Adapter afterward.

wp-image-1532

Once you are done, you should see the StorMagic iSCSI disk listed in the Devices tab.

wp-image-1533

You can also check its paths in the Paths tab.

wp-image-1534

2.4 Now, let’s create a VMware Datastore using the recently connected StorMagic iSCSI disk. Select whatever host you want for that purpose. The image below shows exactly what you should do.

wp-image-1535

You’re gonna have a new VMware Datastore (we called it StorMagic-Datastore here) once you go through all the wizard steps. Find it listed in the Datastores tab. It should be visible on both ESXi hosts.

wp-image-1536

wp-image-1537

Now, set Round Robin (VMware) as the Path Selection Policy for the StorMagic iSCSI Disk. To enable the policy, follow this path: Configuration -> Storage Devices -> StorMagic iSCSI Disk -> Properties -> Edit Multipathing.

wp-image-1538

2.5 Now, let’s just cluster the two hosts in vCenter and get this fcop boring preparation over with.

wp-image-1539

wp-image-1540

3. Create a Windows Server 2016 VM and pin it to, let’s say, Host #1. This VM has two disks: “data” and “system”. The former is the 80 GB test VMware Virtual Disk (VMware Paravirtual) that resides on the StorMagic-Datastore. It’s gonna be our workhorse today. The latter just keeps the OS, as its name implies.

4. Come up with the optimal testing parameters (number of threads and Outstanding I/O) for running test utilities. We use DiskSPD and FIO… just as usual!

5. Run performance tests on the test VM under the bunch of test patterns.

6. Clone the VM and pin it to the other ESXi host. Just like the parent VM, the new one has its own 80GB “data” disk on the StorMagic-Datastore. Measure the performance of both VMs’ “data” disks.

7. The “clone-measure-clone again” cycle. Keep on cloning VMs until the overall cluster performance saturates.

8. Go back to step 2.2 and create the pool with the JBOD RAID level on both StorMagic VMs. Repeat all the steps from 2.2 through 7.

wp-image-1541

9. Test the Intel SSD DC P3700 2TB in a “bare-metal” Windows Server 2016 environment under all those patterns to get a reference for judging StorMagic SvSAN performance.

Hearing

Benchmarking Intel SSD DC P3700 single drive performance

Before we start testing, it’s really good to know whether one of our Intel SSD DC P3700 2TB drives still performs as great as the vendor states. You see, we were out for a while, and it’s really good to know whether the disks are still doing well. First, let’s look at what Intel says in its datasheet.

wp-image-1542

This table says that the NVMe disks in our setup can reach a massive 460K IOPS under the 4k random read pattern with 4 workers and Queue Depth=32.

wp-image-1543

To verify the numbers, we measured the disk performance under 4k random read pattern. Find the test results below.

wp-image-1544

wp-image-1545

Mini-conclusion

On the whole, everything is OK. The Intel SSD DC P3700 2TB performs just as its vendor said. Now, as we know that the disks are doing well, let’s move on!

Limitations

There’s one thing worth mentioning before we jump to picking the optimal test parameters and the testing itself – performance limitations. Remember that the StorMagic VM vNICs’ throughput could not go higher than 3.4 Gbit/s? Now, with those massive 460K IOPS per disk in mind, let’s do some performance estimations.

The Network Speed Test utility says that each VMXNET3 vNIC delivers 3.4 Gbit/s (0.43 GB/s) throughput. So, two of these guys should deliver 0.86 GB/s throughput, right? Well, that’s really tough, since StorMagic VM performance won’t go higher than (0.86 GB/s × 1024 × 1024) / 4 ≈ 225K IOPS! In their turn, 2x Intel SSD DC P3700 2TB can potentially provide 2 × 460K = 920K IOPS. Feel the difference… In this way, the vNIC settings are going to hwem up the whole study! And you basically can do nothing about it. StorMagic, go hwem yourself…
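To double-check that arithmetic, here's the same estimation as a tiny Python sketch. The 3.4 Gbit/s figure comes from the Speed Test above; note that the 225K in the text uses the rounded 0.43 GB/s per vNIC, so the unrounded result lands slightly lower:

```python
# Back-of-the-envelope IOPS ceiling imposed by the vNIC throughput.
GBIT_PER_NIC = 3.4   # measured by the SvSAN Speed Test with MTU 9000
NICS = 2
BLOCK_KIB = 4        # block size of the 4k random patterns

gb_per_s = GBIT_PER_NIC / 8 * NICS             # ~0.85 GB/s for two vNICs
iops_ceiling = gb_per_s * 1024 * 1024 / BLOCK_KIB  # ~223K IOPS (225K with rounding)

disks_iops = 2 * 460_000                       # 2x P3700 at the rated 460K each
print(f"vNIC ceiling: ~{iops_ceiling / 1000:.0f}K IOPS vs disks: {disks_iops / 1000:.0f}K IOPS")
```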

Picking the optimal test parameters

Now, let’s come up with the optimal test utility parameters – the number of threads and Outstanding I/O under which the “data” disk shows its best. For that purpose, we measured the test VM’s VMware “data” virtual disk performance under the 4k random read pattern with a varying number of threads and Outstanding I/O. Look at those test results and find the saturation point – the point where nothing grows except the latency. Congratulations, you found the optimal parameters!
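The "find the saturation point" step can be automated. Here's a minimal Python sketch of the idea; `saturation_point` and the 5% threshold are our own choices, and the sample numbers are the threads=4 column of the DiskSPD table that follows:

```python
# Sketch: find the Outstanding I/O value past which IOPS stops growing
# (less than 5% gain over the previous step) while latency keeps climbing.
def saturation_point(results, min_gain=0.05):
    """results: list of (queue_depth, iops) sorted by queue_depth."""
    for (qd_prev, iops_prev), (qd, iops) in zip(results, results[1:]):
        if iops < iops_prev * (1 + min_gain):
            return qd_prev  # last depth that still gave a real gain
    return results[-1][0]

# threads=4 column of the 4k random read DiskSPD run
data = [(1, 12780), (2, 19912), (4, 28333), (8, 36855), (16, 42141), (32, 41626)]
print(saturation_point(data))  # 16: going to QD=32 adds nothing but latency
```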

wp-image-1546

VMware Virtual Disk 80GB (RAW) over StorMagic SvSAN (RAID level – RAID 1) – 4k random read (DiskSPD)
threads=1 threads=2 threads=4 threads=8
IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms)
QD=1 4187 16 0.24 6351 25 0.31 12780 50 0.31 19026 74 0.42
QD=2 6469 25 0.31 12427 49 0.32 19912 78 0.40 27132 106 0.59
QD=4 12711 50 0.31 20008 78 0.40 28333 111 0.56 35661 139 0.90
QD=8 19833 77 0.40 27570 108 0.58 36855 144 0.87 40832 160 1.57
QD=16 27689 108 0.58 36687 143 0.87 42141 165 1.52 42020 164 3.05
QD=32 36574 143 0.88 41770 163 1.50 41626 163 3.08 41449 162 6.18
QD=64 42386 166 1.51 42747 167 2.99 42303 165 6.05 41640 163 12.30
QD=128 42469 166 3.01 41871 164 6.11 41752 163 12.26 41871 164 24.46

wp-image-1547

VMware Virtual Disk 80GB (RAW) over StorMagic SvSAN (RAID level – RAID 1) – 4k random read (FIO)
threads=1 threads=2 threads=4 threads=8
IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms)
QD=1 3784 15 0.25 5878 23 0.33 11567 45 0.34 19529 76 0.40
QD=2 5884 23 0.33 11559 45 0.34 19442 76 0.40 29089 114 0.54
QD=4 11811 46 0.33 19471 76 0.40 26922 105 0.58 35404 138 0.89
QD=8 18894 74 0.40 29833 117 0.52 35093 137 0.90 41319 161 1.54
QD=16 26753 105 0.58 35175 137 0.89 41476 162 1.53 40984 160 3.11
QD=32 35444 138 0.88 40951 160 1.54 41191 161 3.10 41279 161 6.19
QD=64 41771 163 1.50 41573 162 3.06 40886 160 6.25 42611 166 12.01
QD=128 42039 164 3.02 40825 159 6.25 41795 163 12.24 41374 162 24.74

We’ve derived the optimal test parameters for RAID 1. Anyway, we observed just the same values for JBOD. Furthermore, RAID 1 is more common for production environments.

Mini-conclusion

So, here are the optimal test parameters: threads=4 and Outstanding I/O=16. We observed just the same values for the JBOD RAID level.

Custody

Well, here’s the bunch of test parameters that we used for today’s study:

  • 4k random write
  • 4k random read
  • 64k random write
  • 64k random read
  • 8k random 70%read/30%write
  • 1M sequential read

And, here are the test utility launching parameters: threads=4, Outstanding I/O=16, time=60sec

DiskSPD

diskspd.exe -t4 -b4k -r -w100 -o16 -d60 -Sh -L #1 > c:\log\4k-rand-write.txt

timeout 10

diskspd.exe -t4 -b4k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\4k-rand-read.txt

timeout 10

diskspd.exe -t4 -b64k -r -w100 -o16 -d60 -Sh -L #1 > c:\log\64k-rand-write.txt

timeout 10

diskspd.exe -t4 -b64k -r -w0 -o16 -d60 -Sh -L #1 > c:\log\64k-rand-read.txt

timeout 10

diskspd.exe -t4 -b8k -r -w30 -o16 -d60 -Sh -L #1 > c:\log\8k-rand-70read-30write.txt

timeout 10

diskspd.exe -t4 -b1M -s -w0 -o16 -d60 -Sh -L #1 > c:\log\1M-seq-read.txt
FIO

[global]

numjobs=4

iodepth=16

loops=1

time_based

ioengine=windowsaio

direct=1

runtime=60

filename=\\.\PhysicalDrive1

[4k rnd write]

rw=randwrite

bs=4k

stonewall

[4k random read]

rw=randread

bs=4k

stonewall

[64k rnd write]

rw=randwrite

bs=64k

stonewall

[64k random read]

rw=randread

bs=64k

stonewall

[OLTP 8k]

bs=8k

rwmixread=70

rw=randrw

stonewall

[1M seq read]

rw=read

bs=1M

stonewall

Performance tests

Below, find the plots and tables with the performance test results. We obtained them for both RAID levels.

wp-image-1548

wp-image-1549

wp-image-1551

wp-image-1552

wp-image-1553

wp-image-1554

wp-image-1555

wp-image-1557

wp-image-1559

wp-image-1561

wp-image-1563

wp-image-1565

Here’re the tables.

4k random write 4k random read
Pool (RAID1) Pool (JBOD) Pool (RAID1) Pool (JBOD)
IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms)
1x VM 8717 34 7.34 8712 34 7.35 42625 167 1.68 40337 158 1.59
2x VM 15550 61 8.23 15606 61 8.20 58516 229 2.19 72306 282 1.77
3x VM 10616 41 24.90 10908 43 23.91 52783 206 4.35 53845 210 4.33
4x VM 14789 58 17.31 15060 59 17.00 57144 223 4.48 54170 212 4.73
5x VM 14624 57 22.78 15127 59 22.02 53433 209 6.23 51889 203 6.43
6x VM 14780 58 25.98 15255 60 25.17 58485 228 6.56 51588 202 7.44
7x VM 14784 58 30.93 15999 62 28.79 51817 202 8.83 49532 193 9.28
8x VM 15577 61 33.19 15264 60 33.54 49806 195 10.28 47108 184 10.88
9x VM 14505 57 40.20 15122 59 38.57 52280 204 11.14 47769 187 12.28
10x VM 15063 59 42.49 15218 59 42.06 47081 184 13.61 49486 193 12.93
11x VM 15000 59 47.95 16142 63 45.09 53245 208 15.63 45232 177 15.93
12x VM 14653 57 52.42 15984 62 49.17 52468 205 14.62 46150 180 16.67
threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16
64k random write 64k random read
Pool (RAID1) Pool (JBOD) Pool (RAID1) Pool (JBOD)
IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms)
1x VM 1037 65 61.70 1040 65 61.56 18756 1172 3.41 20123 1258 3.18
2x VM 1039 65 123.13 1038 65 123.40 29200 1825 4.38 26168 1636 4.89
3x VM 1038 65 208.25 1039 65 206.79 28755 1797 7.41 25956 1622 8.22
4x VM 1046 65 244.83 1042 65 245.66 19880 1242 12.87 19620 1226 13.05
5x VM 1044 65 319.27 1045 65 318.98 20058 1254 16.58 19726 1233 16.87
6x VM 1047 65 367.03 1045 65 367.57 19391 1212 19.80 19556 1222 19.63
7x VM 1050 66 435.39 1147 72 406.72 19598 1225 23.33 19216 1201 23.71
8x VM 1068 67 479.68 1056 66 485.82 19645 1228 26.28 19548 1222 26.19
9x VM 1044 65 558.96 1058 66 551.15 19000 1188 30.67 19095 1193 30.54
10x VM 1065 67 601.97 1042 65 615.08 18886 1180 33.91 19125 1195 33.46
11x VM 1095 68 648.98 1169 73 622.76 19775 1236 35.81 19091 1193 37.29
12x VM 1055 66 729.72 1140 71 684.24 19459 1216 39.46 19315 1207 39.82
threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16
8k random 70%read/30%write 1M seq read
Pool (RAID1) Pool (JBOD) Pool (RAID1) Pool (JBOD)
IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms)
1x VM 17478 137 3.66 18093 141 3.54 2290 2290 27.95 2330 2330 27.46
2x VM 26632 208 4.81 26796 209 4.78 2443 2443 52.41 2455 2455 52.44
3x VM 20596 161 12.46 21313 167 11.49 2385 2385 95.19 2435 2435 89.62
4x VM 23636 185 10.83 26332 206 9.72 2386 2386 107.36 2471 2471 104.36
5x VM 23788 186 14.00 26181 205 12.73 2377 2377 140.06 2479 2479 131.09
6x VM 24844 194 15.45 26158 204 14.68 2427 2427 158.30 2457 2457 157.29
7x VM 24652 193 18.55 27175 212 16.88 2360 2360 193.91 2649 2649 172.50
8x VM 24735 193 20.71 26324 206 19.45 2338 2338 220.63 2513 2513 205.29
9x VM 25234 197 23.11 26258 205 22.21 2375 2375 245.61 2481 2481 232.30
10x VM 25343 198 25.25 26109 204 24.51 2421 2421 264.41 2520 2520 256.27
11x VM 24432 191 29.03 26044 203 27.22 2468 2468 287.18 2517 2517 280.33
12x VM 25514 199 30.10 26608 208 28.87 2389 2389 322.13 2520 2520 306.21
threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16

Now, let’s repeat just the same set of tests but using FIO.

wp-image-1567

wp-image-1569

wp-image-1571

wp-image-1573

wp-image-1575

wp-image-1577

wp-image-1580

wp-image-1582

wp-image-1584

wp-image-1586

wp-image-1587

wp-image-1589

And, here is the data FIO provided us with.

4k random write 4k random read
Pool (RAID1) Pool (JBOD) Pool (RAID1) Pool (JBOD)
IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms)
1x VM 8708 34 7.33 8709 34 7.33 37965 148 1.67 39740 155 1.60
2x VM 15351 60 8.32 15588 61 8.19 63330 247 2.01 67365 263 1.89
3x VM 10633 42 24.79 10922 43 23.78 53453 209 4.28 53864 210 4.30
4x VM 14490 57 17.64 14287 56 17.89 55909 218 4.56 52535 205 4.86
5x VM 15555 61 21.19 15145 59 21.96 54401 213 6.30 51753 202 6.46
6x VM 14910 58 25.72 15443 60 24.83 55202 216 6.94 50716 198 7.56
7x VM 14991 59 30.48 15664 61 29.22 52599 206 8.68 49039 192 9.36
8x VM 15321 60 33.48 15887 62 32.32 53703 210 9.52 46972 184 10.89
9x VM 15371 60 38.17 16060 63 36.73 52776 206 11.03 48388 189 12.08
10x VM 16582 65 39.94 16631 65 39.80 49887 195 12.81 45162 176 14.17
11x VM 16003 63 45.91 17200 67 43.54 46677 182 15.25 43974 172 16.31
12x VM 16398 64 50.06 16532 65 46.23 43773 171 17.53 44155 173 17.41
threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16
64k random write 64k random read
Pool (RAID1) Pool (JBOD) Pool (RAID1) Pool (JBOD)
IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms)
1x VM 1034 65 61.73 1033 65 61.78 18937 1184 3.36 19566 1223 3.25
2x VM 1033 65 123.18 1033 65 123.31 29275 1830 4.35 26412 1651 4.83
3x VM 1028 65 211.26 1034 65 205.36 28251 1766 7.53 25459 1592 8.37
4x VM 1032 65 245.90 1030 65 246.56 19578 1224 13.05 19677 1230 12.98
5x VM 1141 72 289.80 1047 66 315.39 20241 1266 16.60 19969 1249 16.65
6x VM 1068 67 355.97 1063 67 357.58 19564 1224 19.59 19618 1227 19.54
7x VM 1075 68 422.35 1079 68 419.75 19817 1239 23.03 19475 1218 23.42
8x VM 1086 69 467.24 1084 69 466.44 20062 1255 25.47 19982 1250 25.58
9x VM 1087 69 529.38 1111 70 521.72 19327 1209 30.13 19276 1206 30.17
10x VM 1185 75 551.31 1187 76 548.79 18561 1161 34.45 18872 1181 33.90
11x VM 1178 75 613.67 1221 78 605.99 18692 1170 37.92 19090 1194 37.17
12x VM 1125 70 658.12 1182 74 650.83 18462 1155 41.59 19305 1208 39.83
threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16
8k random 70%read/30%write 1M seq read
Pool (RAID1) Pool (JBOD) Pool (RAID1) Pool (JBOD)
IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms) IOPS MB/s Latency (ms)
1x VM 17570 137 5.07 18142 142 5.13 2272 2274 28.08 2356 2357 27.08
2x VM 26452 207 6.74 26897 210 7.05 2433 2438 52.44 2436 2439 52.64
3x VM 20264 158 14.57 21181 166 13.79 2391 2397 97.15 2484 2486 87.55
4x VM 23133 181 11.99 26145 204 10.75 2360 2370 107.95 2512 2516 102.36
5x VM 23999 188 14.67 26135 204 13.70 2420 2431 137.04 2448 2458 131.90
6x VM 23620 185 17.17 26381 206 15.48 2379 2391 160.47 2476 2487 155.46
7x VM 24695 193 19.43 26550 208 18.15 2360 2371 192.72 2468 2487 180.64
8x VM 24780 194 21.56 25921 203 20.71 2382 2393 213.83 2480 2494 206.79
9x VM 24648 193 24.61 26001 203 23.34 2353 2377 245.21 2484 2501 230.41
10x VM 25241 197 26.31 26775 209 24.87 2345 2374 269.79 2476 2489 258.82
11x VM 24791 194 29.58 26028 204 28.20 2360 2378 298.70 2481 2505 281.43
12x VM 25535 200 31.02 26517 207 29.89 2344 2374 323.80 2516 2536 304.41
threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16 threads=4 Outstanding I/O=16

Getting some reference

Today, we measured StorMagic SvSAN performance for both RAID levels: RAID 1 and JBOD. If you take a look at the experimental data, the performance we got for both RAID levels looks pretty similar. Based on this fact, we calculated the estimated performance only for RAID 1. Furthermore, RAID 1 is closer to real production.

We got some numbers today, but we still need to interpret them somehow. As a reference, we use the single Intel SSD DC P3700 performance in a “bare-metal” Windows Server 2016 environment measured under the same patterns. To derive the reference, we made several assumptions. Find them below:

1. Data is read from all available SSD disks simultaneously. Therefore, we expect the overall read performance to be 4x the expected Intel SSD DC P3700 performance (we have 4 disks in the setup).

2. Writes occur to all SSDs simultaneously. All blocks written to the StorMagic iSCSI Disk are replicated to the partner StorMagic SvSAN node (that’s how RAID 1 works). Data on each StorMagic VM is written to a pool with RAID level RAID 1. In this way, we expect write performance to be described well enough by the following formula:

wp-image-1591
Now, we guess, you need some explanations. N stands here for the number of disks available for writing (4 for this setup). The ¼ coefficient accounts for replication.

3. Under the 8k random 70%read/30%write pattern, the expected performance can be described with this formula:
wp-image-1593
Again, N stands here for the number of disks involved in writing or reading. The 0.7 and 0.3 coefficients represent the portion of each workload in the pattern.
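The two formulas above boil down to a few lines of arithmetic. Here's a hedged Python sketch restating assumptions 1-3; the function names are ours, and the sample numbers are our DiskSPD bare-metal 4k results:

```python
# Sketch of the "theoretical" reference performance used below.
N = 4  # disks in the setup

def expected_read(single_disk_iops, n=N):
    # assumption 1: reads hit all disks at once
    return n * single_disk_iops

def expected_write(single_disk_iops, n=N):
    # assumption 2: the N/4 coefficient -- every write lands on a
    # RAID 1 pool (x2) and is mirrored to the partner SvSAN node (x2)
    return n * single_disk_iops / 4

def expected_mixed(read_iops, write_iops, n=N, read_share=0.7):
    # assumption 3: weighted mix of the read and write estimates
    return read_share * expected_read(read_iops, n) + (1 - read_share) * expected_write(write_iops, n)

# DiskSPD bare-metal 4k numbers: 423210 read, 409367 write
print(expected_read(423210))   # 1692840, matches the reference table below
print(expected_write(409367))  # 409367.0, ditto
```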

DiskSPD
Intel SSD DC P3700 (Windows Server 2016) | Theoretical values for 4x Intel SSD DC P3700 | Max performance for VMware Virtual Disk over StorMagic SvSAN | Ratio of measured performance to theoretical value
IOPS MB/s IOPS MB/s IOPS MB/s %
4k random write 409367 1599 409367 1599 15577 61 4
4k random read 423210 1653 1692840 6612 58516 229 3
64k random write 30889 1931 30889 1931 1095 68 4
64k random read 51738 3234 206952 12936 29200 1825 14
8k random 70%read/30%write 403980 3156 1297800 4945 26632 208 2
1M seq read 3237 3237 12948 12948 2468 2468 19
threads=4 Outstanding I/O=16

FIO
Intel SSD DC P3700 (Windows Server 2016) | Theoretical values for 4x Intel SSD DC P3700 | Max performance for VMware Virtual Disk over StorMagic SvSAN | Ratio of measured performance to theoretical value
IOPS MB/s IOPS MB/s IOPS MB/s %
4k random write 351489 1373 351489 1373 16582 65 5
4k random read 333634 1303 1334536 5212 63330 247 5
64k random write 31113 1945 31113 1945 1185 75 4
64k random read 37069 2317 148276 9268 29275 1830 20
8k random 70%read/30%write 351240 2744 1079000 4226 26452 207 2
1M seq read 3231 3233 12924 12932 2433 2438 19
threads=4 Outstanding I/O=16


wp-image-1595

wp-image-1596

wp-image-1598

wp-image-1600

wp-image-1602

wp-image-1604

wp-image-1606

wp-image-1608

Death on the electric chair

In this article, we made sure that there’s no hwemkpi magic! StorMagic SvSAN for vSphere is incapable of showing any signs of performance. Period. Now, we guess it’s time to get more specific.

To start with, StorMagic SvSAN could not deliver even those miserable 225K IOPS which we expected it to provide due to the VMXNET3 vNIC throughput limitations. You know what that means? There’s something else bottlenecking performance. And, if you look at all our numbers, you’ll understand that even with normal vNIC throughput, SvSAN will never show the performance you expect. What’s wrong with you, StorMagic, or should we call you UjkvMagic now?

Under all test patterns, performance saturates once the second VM spawns in the cluster. We observed this for both the RAID 1 and JBOD RAID levels. Notably, under 4k and 64k reads, performance drops gradually as more VMs are spawned in the cluster. Excuse me, what the hwem?!

Under the 4k random read pattern, the performance growth observed with 2 VMs running in the cluster is followed by a significant drop once the 3rd VM gets on board. Peak performance lies between 53K-63K IOPS. If you keep on spawning VMs in the cluster, performance just keeps on degrading. Anyway, we stopped at 12 VMs in the cluster as we could not squeeze out more than 47K-58K IOPS… from a setup that should deliver 1.7M IOPS! True magic ujkv, isn’t it?

Under the 64k random read pattern, we also observed performance drops between 2 and 4 VMs in the cluster. Peak performance under this pattern was between 29275 IOPS and 19620 IOPS. From 4 VMs through 12 VMs on board, performance just kept going down until it reached 19305 IOPS.

Under 64k random writes, performance fluctuates between 1000 and 1220 IOPS.

Under the 8k random 70%read/30%write pattern, we did not observe any significant StorMagic SvSAN performance gain. True, there was a 26K IOPS peak while 2 VMs were running in the cluster, but later the performance saturated (there was an insignificant fluctuation between 24K-25K IOPS over the entire experiment span).

Under the 1M seq read pattern, the overall StorMagic SvSAN cluster performance saturated (2400-2500 IOPS) once the second VM was spawned in the cluster.

Let’s wrap it up. Here’s the traditional disappointment diagram. “Welcome to the Cult of Anarchy Hall of Shame, oqvjgthwemgt!” (said in Filthy Frank’s voice)

wp-image-1610

“Degree of disappointment” in this diagram stands for how many times the overall Virtual Disks’ performance is lower than we expected.
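In other words, "degree of disappointment" is just theoretical over measured. A quick Python sketch with a few DiskSPD numbers from the reference table above (the dict layout is ours):

```python
# "Degree of disappointment" = theoretical IOPS / best measured IOPS.
diskspd = {
    # pattern: (theoretical IOPS, best measured IOPS)
    "4k random read":  (1692840, 58516),
    "4k random write": (409367, 15577),
    "64k random read": (206952, 29200),
    "1M seq read":     (12948, 2468),
}

for pattern, (theory, measured) in diskspd.items():
    print(f"{pattern}: {theory / measured:.0f}x lower than expected")
```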

Summing up everything we’ve seen today: if you are looking for a VMware vSAN alternative, NEVER EVER deploy StorMagic SvSAN. The solution is so hwemed up that its performance looks like a rounding error even compared to VMware vSAN.
