Month: November 2016
Over the last few weeks we have been having some issues with our Storage Spaces Direct test/dev cluster. To start off, I will explain what happened and what went wrong.
Troubleshooting failed VirtualDisk on a Storage Spaces Direct Cluster
In this guide I will explain what you can do to fix a failed virtual disk in a Failover Cluster. In S2D, ReFS writes some metadata to the volume when it mounts it. If it can't do this for some reason, the cluster will move the virtual disk from node to node until it has tried to mount it on the last host. Then it will fail, you will get this state in the event log, and the virtual disk will be in a Failed state.
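Before digging deeper, it is worth checking what the storage cmdlets report about the virtual disk and any running repair jobs. A minimal sketch (the 'VDisk01' name is just a placeholder for your own volume):

```powershell
# Show virtual disks that are not healthy, with their operational status
Get-VirtualDisk | Where-Object HealthStatus -ne 'Healthy' |
    Select-Object FriendlyName, HealthStatus, OperationalStatus

# Check whether any rebuild/repair jobs are still running
Get-StorageJob

# Try a repair of the affected virtual disk ('VDisk01' is a placeholder name)
Repair-VirtualDisk -FriendlyName 'VDisk01'
```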
Updated April 18th 2018
Troubleshooting performance issues on your Windows storage and Storage Spaces Direct
In this guide I will give you a quick overview of how to troubleshoot your Windows storage: ordinary Storage Pools with and without tiering, as well as Storage Spaces Direct. The troubleshooting is based on an issue we had with our test Storage Spaces Direct cluster.
What happened was that we started to experience really bad performance on the VMs. Response times were going through the roof; we saw response times of 8000 ms on normal OS operations. We traced it down to faulty SSD drives: Kingston V310 consumer SSDs. These do not have power loss protection, and that is a problem because S2D, and Windows storage in general, wants to write to a safe place. The caching on these Kingston drives worked for a while, but after too much writing it failed. You can read all about SSDs and power loss protection here.
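To narrow an issue like this down to individual drives, the storage reliability counters and the regular PhysicalDisk performance counters are a good starting point. A minimal sketch of that kind of check:

```powershell
# Per-drive latency and wear from the storage reliability counters
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, ReadLatencyMax, WriteLatencyMax, FlushLatencyMax, Wear |
    Sort-Object WriteLatencyMax -Descending

# Live per-disk latency from the performance counters (values are in seconds)
Get-Counter '\PhysicalDisk(*)\Avg. Disk sec/Read','\PhysicalDisk(*)\Avg. Disk sec/Write' |
    Select-Object -ExpandProperty CounterSamples |
    Sort-Object CookedValue -Descending |
    Select-Object -First 10 InstanceName, CookedValue
```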
Our eBay development S2D cluster
During this summer I decided I wanted to test out Storage Spaces Direct. TP5 was out and I was quite eager to try it. The cluster has since been upgraded to RTM using Cluster Rolling Upgrade. Remember to run Update-ClusterFunctionalLevel afterwards.
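Checking and raising the functional level only takes a couple of cmdlets; roughly like this, run on one of the cluster nodes:

```powershell
# Check the current cluster functional level (it stays at the old level until updated)
Get-Cluster | Select-Object Name, ClusterFunctionalLevel

# Once all nodes run the new build, raise the functional level.
# Note: this is a one-way operation; you cannot roll back afterwards.
Update-ClusterFunctionalLevel
```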
So I looked around on eBay for some servers and other items I needed to buy. I ended up with the list below, four of each:
- HP DL380 G6 16-bay, 128 GB memory, 2x 4-core Intel CPUs, HP P420
- HP H220
- Mellanox ConnectX-3 MCX312A-XCBT
- Intel 750 NVMe PCIe
- 2x Kingston SSDNow V310 for caching (replacing with Samsung SM863)
- 6x WD Red NAS 1 TB 2.5″
- Dell Force10 S4810P (already had)
How replacing an NVMe card in an S2D cluster caused me a lot of headache
A week ago I replaced an NVMe card in our development Storage Spaces Direct cluster. This did not go as gracefully as I had hoped. Normally this should work in the following way (see the PowerShell sketch after the list):
- Pause the node and drain all resources from the server
- Shut down the server
- Replace the NVMe card
- Power the server back on
- Resume the node in the cluster
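Roughly the same steps in PowerShell; a minimal sketch, with 'S2D-Node1' as a placeholder node name:

```powershell
# Drain all roles off the node before maintenance ('S2D-Node1' is a placeholder)
Suspend-ClusterNode -Name 'S2D-Node1' -Drain -Wait

# Shut the node down so the NVMe card can be swapped
Stop-Computer -ComputerName 'S2D-Node1'

# ...replace the card and power the server back on...

# Bring the node back into the cluster and fail the roles back
Resume-ClusterNode -Name 'S2D-Node1' -Failback Immediate

# Wait for the storage repair jobs to finish before touching the next node
Get-StorageJob
```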
This did not go as planned, and I ended up with quite a lot of issues. This was late on a Saturday evening. I ended up with disks that looked like this.
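If you end up in a similar state, the physical disk view is the quickest way to see which disks the pool has lost contact with; something along these lines:

```powershell
# List physical disks the pool is not happy with, including lost-communication ones
Get-PhysicalDisk | Where-Object HealthStatus -ne 'Healthy' |
    Select-Object FriendlyName, SerialNumber, HealthStatus, OperationalStatus, Usage
```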