Troubleshooting failed VirtualDisk on a Storage Spaces Direct Cluster

In this guide I will explain what you can do to fix a failed virtual disk in a Failover Cluster. In S2D, ReFS writes some metadata to the volume when it mounts it. If it can't do this for some reason, the cluster will move the virtual disk from node to node until it has tried to mount it on the last host. At that point it gives up, you get this state in the event log, and the virtual disk ends up as Failed.

Updated April 18th 2018

(Screenshot: Failover Cluster event log entries for the failed virtual disk)
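
If you prefer PowerShell to the GUI, you can see the same thing by listing the cluster disk resources from any node (a quick sketch; resource names will of course differ in your cluster):

# List cluster resources; the affected "Cluster Virtual Disk (...)" resource
# will show State = Failed, and its OwnerNode changes as the cluster tries each host
Get-ClusterResource | Format-Table Name, ResourceType, OwnerNode, State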

If you also look in the ReFS event log, you will see entries like this:

(Screenshot: ReFS event log entries)
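
If you want those ReFS events as text instead of screenshots, something along these lines should pull them (a sketch; I'm assuming the events land in the System log under the Microsoft-Windows-ReFS provider):

# Pull the most recent ReFS events from the System log on the node that
# last tried to mount the volume (the provider name is an assumption)
Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Microsoft-Windows-ReFS' } -MaxEvents 20 |
    Format-List TimeCreated, Id, Message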

Now let's run a PowerShell command on one of the nodes to look at the virtual disk.
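
A sketch of that check ("Volume1" is just an example friendly name, substitute your own):

# Show all virtual disks and their health; the broken one typically shows
# OperationalStatus "Detached" or "No Redundancy" and HealthStatus "Unhealthy"
Get-VirtualDisk | Format-Table FriendlyName, OperationalStatus, HealthStatus, Size

# Or look at a single virtual disk in detail
Get-VirtualDisk -FriendlyName "Volume1" | Format-List *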

(Screenshot: Get-VirtualDisk output showing the failed virtual disk)

Updated section

Microsoft has recently changed the recommended procedure for when a ReFS volume goes offline on a CSV and has given us another parameter to use. Start by running these commands.
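
The exact commands are shown as screenshots in the original post. A hedged reconstruction of this step, assuming the CSV resource is named "Cluster Virtual Disk (Volume1)" (use your own volume name) and using the DiskRunChkdsk and DiskRecoveryAction private properties of the physical disk resource, looks roughly like this:

# Take the volume out of CSV so it becomes an ordinary cluster disk resource again
Remove-ClusterSharedVolume -Name "Cluster Virtual Disk (Volume1)"

# Set the recovery parameters on the disk resource (the values Microsoft suggests
# for a ReFS volume that refuses to mount; verify against current guidance)
Get-ClusterResource -Name "Cluster Virtual Disk (Volume1)" |
    Set-ClusterParameter -Name DiskRunChkdsk -Value 7
Get-ClusterResource -Name "Cluster Virtual Disk (Volume1)" |
    Set-ClusterParameter -Name DiskRecoveryAction -Value 1

# Bring the disk resource online so the volume can mount and be repaired
Start-ClusterResource -Name "Cluster Virtual Disk (Volume1)"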


Now the virtual disk should look like this in Failover Cluster Manager.

(Screenshot: the virtual disk shown in Failover Cluster Manager)

Wait for any storage jobs that are running to finish; a repair job may well have kicked off at this point. Run Get-StorageJob until it returns nothing. Once it is empty, we can add the virtual disk back as a Cluster Shared Volume.
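
Again the original shows the commands as an image. A sketch of this second part, with the same assumed volume name, once Get-StorageJob no longer returns anything:

# Make sure no repair/regeneration jobs are still running (output should be empty)
Get-StorageJob

# Put the recovery parameters back to their defaults
Stop-ClusterResource -Name "Cluster Virtual Disk (Volume1)"
Get-ClusterResource -Name "Cluster Virtual Disk (Volume1)" |
    Set-ClusterParameter -Name DiskRunChkdsk -Value 0
Get-ClusterResource -Name "Cluster Virtual Disk (Volume1)" |
    Set-ClusterParameter -Name DiskRecoveryAction -Value 0

# Bring the disk online again and add it back as a Cluster Shared Volume
Start-ClusterResource -Name "Cluster Virtual Disk (Volume1)"
Add-ClusterSharedVolume -Name "Cluster Virtual Disk (Volume1)"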

Now it should be OK. You can run the check below.
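
(The exact command is an image in the original; a likely check, with the same assumed names:)

# The CSV should now report an Online state
Get-ClusterSharedVolume -Name "Cluster Virtual Disk (Volume1)"

# And the virtual disk itself should be healthy again
Get-VirtualDisk -FriendlyName "Volume1" | Format-Table FriendlyName, OperationalStatus, HealthStatus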

And it should show as online.

If the volume does not come online after starting the cluster resource, run the first set of commands again (the part that kicks off the data integrity scan), let it sit for a while, and then run the second set of commands. One time I had to repeat this process ten times before the volume came online.
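
If you want to keep an eye on that scan while you wait, one place to look is the built-in Data Integrity Scan scheduled tasks (it is an assumption on my part that this is the scan involved, and the task path below is the default one on Windows Server):

# Check whether the built-in data integrity scan tasks are running (path is an assumption)
Get-ScheduledTask -TaskPath "\Microsoft\Windows\Data Integrity Scan\" |
    Select-Object TaskName, State

# Repair and scan activity also shows up here while it is in flight
Get-StorageJob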


7 thoughts on “Troubleshooting failed VirtualDisk on a Storage Spaces Direct Cluster”

  • March 26, 2018 at 7:33 pm

    When I try to bring the virtual disk up following your solution I still have no luck; it’s still offline. I had failed disks in the pool which I replaced with brand-new disks, but they don’t show up as available. Instead, Get-PhysicalDisk shows the old disk serial numbers as “Removing from Pool” and “Communication Lost”, even after a reboot.

    • April 3, 2018 at 8:37 am

      Hello

      Without more info I can’t help you out. If you send me your email address, I will invite you to the Slack channel we use for troubleshooting S2D: https://storagespacesdirect.slack.com

      • April 20, 2018 at 4:21 pm

        How do I private message you? I’d like to join the Slack…

        • April 21, 2018 at 10:13 pm

          Send me a message on Twitter: jantorep

  • January 26, 2018 at 4:47 pm

    Thanks! This is the only reference I have found to this problem.
    I wish the events had been posted as text rather than pictures – I found this article a bit late, because searching Google for the error text does not surface it when the errors are only in images.

    Nevertheless – THANKS!
    At least I got a bit of my trust in S2D back after losing complete access to a volume with no explanation.

  • January 1, 2018 at 4:18 pm

    I did not check the storage jobs to see if any were running. You can certainly wait for a repair job to finish if one is running, but it works without waiting as well.

    Regards
    Jan-Tore

  • December 31, 2017 at 1:12 am

    Many thanks for this post! It is very “simple”, but it is very good!

    Just before re-adding the VirtualDisk to the cluster, it is possible to wait for the repair storage job to finish.

    Best regards,
    Philippe G.

