Hello, my name is James. I'm an IT Manager, specialising in Windows Server, Software Development (.Net) and SQL Database Design & Administration.

Starwind Virtual SAN and HP StoreVirtual VSA – side by side

An upcoming project I'll be involved in centres on High Availability and Disaster Recovery, and whilst Failover Clustering ticks a number of boxes on the high availability front, it comes with some additional caveats. I wrote recently about areas of weakness I'd found in a traditional Windows Server Failover Cluster, the main one being that shared storage introduces a new single point of failure.

To counter this, there are a number of possible options at a variety of price points, from dedicated hardware SAN devices in a clustered configuration of their own, to software-based virtual SAN solutions which claim to achieve the same.

This is a brief update on my experience with virtual SAN, based on two products: HP StoreVirtual VSA and Starwind Virtual SAN. I should note these are not performance reviews, just some notes on my experiences setting them up and using them in a lab environment.

Starwind Virtual SAN


In its basic form this is a free piece of software; support is provided by a community forum, and there are naturally commercial licenses available. This edition lets you quickly provision iSCSI-based storage to be served over the network, but has no resiliency built in. I implemented it recently to provide additional storage to an ageing server running Exchange where physical storage was maxed out, so network-based storage was the only option, but Exchange needed to see it as a local disk rather than a network share.

There is a two-node license available, providing replicated storage across two servers running the software. This is where it provides real value, as you've now introduced storage resiliency: the data is available in two places. From experience, once the initial replication has taken place, and provided you've set up your iSCSI connections and MPIO to use the multiple routes to the storage, powering down one of the servers running Starwind Virtual SAN had no impact on access to the data served by the Virtual SAN. Once the server was powered back up, it took a little time to re-establish its replication relationship, but I'm going to put that down to my environment.

The software can be used in one of two ways: you can install it directly on your server (bare metal) or into a virtual machine, with both Hyper-V and VMware vSphere supported. There are benefits to installing directly on the server, mainly lower RAM usage and avoiding the overhead of a full OS install running in a VM on top of your hypervisor. Two network connections are required: one as a synchronisation channel, ideally a direct connection between the two servers, and the other for management and health monitoring of the synchronisation relationship.

For extra resilience, if the license permits, a further off-site node can be added to the configuration for asynchronous replication.

HP StoreVirtual VSA


StoreVirtual is a virtualised storage appliance provided by HP, originally a product known as LeftHand. It is only available as a virtual machine and so adds some overhead to its implementation, using at least 4GB of RAM, which increases depending on the capacity hosted. It is supported on both VMware and Hyper-V platforms, so there is a wide market for the product.

The StoreVirtual VSA can function as a single node, and equally works in a multi-node configuration with scale-out functionality. Because it cannot be installed bare metal other than on a dedicated hardware appliance, performance can be slightly impacted by the overhead of the hypervisor providing access to the underlying physical storage.

In terms of management, there is a dedicated management interface, provided by installing the management tools on another computer (or VM) on the network. From here it's simple to provision storage, set up access control over who can reach that storage, and view health and performance information.

High availability is achieved not through MPIO but by presenting a group of servers as a single cluster. This, however, needs to be managed by a further virtual machine running a role called Failover Manager (FOM), which again adds to the overall overhead of the implementation. In an ideal scenario the FOM would be hosted on hardware independent of the other two nodes to avoid a loss of quorum. StoreVirtual also supports asynchronous replication for off-site copies.

Update: for clarity, the FOM is required when an even number of nodes is active, to ensure a majority vote is possible for failover purposes.

Limitations of Testing

My lab consists of 2 x HP MicroServer Gen8 machines, both with Intel Xeon E3 series processors and 16GB RAM, and both connected to an HP ProCurve 1800 managed gigabit switch. With only 16GB of RAM on each hypervisor it's difficult to simulate a real-world workload on the I/O front, particularly when, at a bare minimum, 6GB needs to be allocated to StoreVirtual and a FOM on one of the machines, and 4GB for the redundant node on the other.

Pros and Cons

Starwind:

Pro – Installs directly to Windows Server 2012 R2 or to a VM
Pro – Relatively low memory footprint
Pro – Lots of options to tweak performance, can leverage SSD cache etc.
Pro – Generous licensing for evaluation purposes – a free two-node license is available, provided the nodes are VM-based

Con – Less brand recognition: I'd heard of Starwind before, having used a few of their useful tools, but would you trust their solution with your enterprise data?
Con – A full resync was needed when one node was shut down and restarted, and it took some time to re-establish synchronisation

HP Storevirtual:

Pro – A brand name you know, and might find easier to trust
Pro – Now on its 13th version, the underlying OS is proven and stable
Pro – Intuitive management tools

Con – Must be run as a VM; minimum RAM required is 4GB for a StoreVirtual node, and a Failover Manager is required to maintain quorum in a two-node configuration
Con – The 1TB license expires after 3 years, so for lab use, be prepared for when that time comes

Closing thoughts

I can vouch for solid performance in Starwind Virtual SAN, as the shared storage for my lab's highly available Hyper-V VMs is running on a two-node Starwind Virtual SAN. Ultimately, a lack of hardware to perform a comparable test has meant I have not been able to use StoreVirtual to host the same workload. The licensing of StoreVirtual put me off a little: Starwind's license is non-expiring, but the 1TB license for StoreVirtual on offer is restricted to 3 years.

Once I’ve found some suitable hardware to give StoreVirtual a fuller evaluation, I’ll add more detail here.

 

Link Aggregation between Proxmox and a Synology NAS

I've been using Synology DSM as my NAS operating system of choice for some time, hosted on an HP N54L MicroServer with 4 x 3TB drives and a 128GB SSD. This performs well and I've been leveraging the iSCSI and NFS functionality in my home lab, setting up SQL database storage and Windows Server failover clusters.

Having Proxmox and the Synology hooked up by a single gigabit connection was giving real-world disk performance of around 100MB/s, near enough maxing out the connection. For the Synology to have enough throughput to act as the storage backend for virtual machines, this would not cut it, so I installed an Intel PRO/1000 PT Quad in each machine, giving an additional four gigabit network ports on each side.

Proxmox itself supports most network bonding modes, the most interesting of which is balance-rr (mode 0), which leverages multiple network connections to increase available bandwidth rather than just providing fault tolerance or load balancing.

I could easily create an 802.3ad link-aggregated connection between the two, which worked perfectly, but in a directly connected environment it serves little purpose other than redundancy: the hashing algorithms used for load balancing will route all traffic from one MAC address to another via the same network port. So I set out to investigate whether the Synology could support balance-rr (mode 0) bonding, which sends packets out across all available interfaces in succession, increasing the throughput of even a single transfer.

Note: You'll need to have already set up a network bond in both Synology and Proxmox for this to work. I won't cover that here as it's straightforward on both platforms (a sketch of the Proxmox side is shown below for reference); what I'll cover is how to enable the mode required for the highest performance.
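For reference, the Proxmox side of a balance-rr bond can be defined in /etc/network/interfaces along these lines. This is only a minimal sketch based on the setup described above: the interface names (eth1 to eth4 for the quad-port card) are assumptions, and the address matches the one used in the iperf test later in this post – adjust both for your own hardware.

auto bond0
iface bond0 inet static
        address 10.75.60.1
        netmask 255.255.255.0
        # the four ports of the Intel PRO/1000 PT Quad, directly connected to the Synology
        slaves eth1 eth2 eth3 eth4
        bond_miimon 100
        bond_mode balance-rr

A reboot of the Proxmox host (or an ifdown/ifup of bond0) applies the change.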

The simple answer is no: Synology will not let you configure this through the web interface; it wants to set up an 802.3ad LACP connection or an active-passive bond (with failover in mind rather than performance). I found, however, that provided you're not scared of a bit of config file hacking (you probably wouldn't be using Proxmox if you didn't know your way around a Linux shell, and DSM is based on Linux too), you can enable this mode and achieve the holy grail that is a high-performance aggregated link.

Simply edit /etc/sysconfig/network-scripts/ifcfg-bond0 and change the following line:

BONDING_OPTS="mode=4 use_carrier=1 miimon=100 updelay=100 lacp_rate=fast"

to

BONDING_OPTS="mode=0 use_carrier=1 miimon=100 updelay=100"

Now, reboot your Synology NAS and enjoy the additional performance this brings.
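To confirm the bond has come back up in the right mode, the kernel's bonding status can be checked from an SSH session (assuming the bond is named bond0, as above):

cat /proc/net/bonding/bond0 | grep "Bonding Mode"

This should now report load balancing (round-robin) rather than IEEE 802.3ad Dynamic link aggregation.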

For reference, here’s the output from ‘iperf’ performing a single transfer:

root@DiskStation:/# iperf -c 10.75.60.1 -N -P 1 -M 9000
WARNING: attempt to set TCP maximum segment size to 9000, but got 536
------------------------------------------------------------
Client connecting to 10.75.60.1, TCP port 5001
TCP window size: 96.4 KByte (default)
------------------------------------------------------------
[  3] local 10.75.60.2 port 37463 connected with 10.75.60.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  3.40 GBytes  2.92 Gbits/sec

Not bad?!?
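For completeness, the server end of that test is just a stock iperf listener on the Proxmox host (the hostname below is illustrative), and adding parallel streams is an easy way to push the bond a little harder:

root@proxmox:~# iperf -s
root@DiskStation:/# iperf -c 10.75.60.1 -P 4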

High Availability and DR in SQL Server 2014 Standard

In my day job it's part of my role to consider ways in which the IT department can work more effectively, as well as ways we can get our IT infrastructure to work better for us. A project that's currently under way is migrating from SQL Server 2008 R2 to SQL Server 2014 Standard. The current plan is that it will run on its own box, and whilst it will have the horsepower to deal with the load, this approach is ultimately vulnerable to a number of different types of failure that could render the database server unusable and adversely affect the business.

Part of my studies towards MCSE Data Platform involves High Availability and Disaster Recovery strategies for SQL Server, but most of the relevant features are noticeably absent in the Standard edition of SQL Server.

So, how can I work with Standard and still give us some type of fault tolerance?

I'm currently exploring failover clustering, on either physical or virtual hardware, using Server 2012's built-in Failover Clustering feature along with a SQL Server 2014 cluster – Standard Edition, provided it is correctly licensed (either through multiple licenses or with failover rights covered by Software Assurance), will allow for a two-node cluster.

Windows Failover Clustering relies on shared storage, however, thereby introducing another potential point of failure: an outage of the storage platform would also lead to downtime.

Failover Clustering is great, but how do I provide fault tolerant storage to it?

I’ll document here my experiences with both hardware and software solutions to this.

I'm considering Synology rack-mounted NAS devices in a high-availability configuration, but the potentially more cost-effective solution is to virtualise a VSAN on a hypervisor of choice; SANsymphony and StarWind Virtual SAN are options I'll consider. All of this will need to be tested in my home lab, which is a Lenovo ThinkServer TS440 with a Xeon E3 processor, 32GB RAM and 256GB SSD storage, backed by an HP N54L providing shared storage via iSCSI. It runs Proxmox as my hypervisor of choice, a platform I'd been using for a number of years before Hyper-V really took off; it's open source with commercial offerings and uses KVM/QEMU – the solution must work here first.

I’ll post an update soon.

Copyright © James Coleman-Powell, 2016