How to bring your 2-node S2D cluster back up when the witness share is gone

A client patched their 2-node S2D cluster this Sunday. When the second node came back up, it would not join the cluster. The client started troubleshooting without realizing that the quorum file share witness was not working: someone had managed to delete the folder that held the file share witness. At first, the node that was patched and booted first kept working, but at some point both nodes were rebooted over and over again.

As a result, the cluster would not come online again. Several fixes were tried to force the cluster online, such as giving the quorum vote to only one node. At one point this also got the second node evicted from the cluster. The logs kept screaming that there was no majority vote to bring the cluster online.

Every possible option was tried to bring the cluster online again. I tried setting the cluster quorum to a cloud witness with PowerShell, but that failed as well.

Even when I ran Get-PhysicalDisk and Get-VirtualDisk, nothing showed up.
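In hindsight, checking the quorum and the witness first would have pointed at the real problem much sooner. A minimal sketch of such a check follows. The share path, storage account name and access key are placeholders I made up, and the cluster cmdlets only return data while the cluster service is running on the node.

```powershell
# Show the current quorum configuration, including the witness resource
Get-ClusterQuorum | Format-List Cluster, QuorumResource, QuorumType

# Show the state and vote of each node
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight

# Check that the witness share still exists (placeholder path)
Test-Path -Path '\\fileserver\witness'

# Configuring a cloud witness (placeholder storage account and key)
Set-ClusterQuorum -CloudWitness -AccountName 'mystorageaccount' -AccessKey '<storage-account-key>'
```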
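For reference, this is roughly how the storage layer can be inspected from one of the nodes. It is only a sketch using the standard Storage module cmdlets. On a healthy S2D cluster you would expect to see the S2D pool, its physical disks and the virtual disks listed.

```powershell
# Non-primordial pools, i.e. pools that were actually created (the S2D pool)
Get-StoragePool -IsPrimordial $false |
    Format-Table FriendlyName, OperationalStatus, HealthStatus

# Physical disks visible to this node
Get-PhysicalDisk |
    Format-Table FriendlyName, SerialNumber, MediaType, CanPool, OperationalStatus, HealthStatus

# Virtual disks (the volumes carved out of the pool)
Get-VirtualDisk |
    Format-Table FriendlyName, ResiliencySettingName, OperationalStatus, HealthStatus
```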

So I had 3 options:

  1. Reinstall the nodes and wipe the S2D config on the storage pool with the Clear-SdsConfig.ps1 script
  2. Reinstall the nodes and keep the disk config for the S2D storage pool
  3. Try to reinstall the cluster role on the nodes

I ended up trying the 3rd option first, as that was the fastest option I had, and I really did not want to reinstall the nodes.

So what I did was the following.

Start by removing the cluster name (the cluster's computer object) from the domain controllers. Make sure it is removed on all domain controllers in the domain.
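This step can also be done with the ActiveDirectory PowerShell module. The sketch below assumes the RSAT AD module is installed and uses a hypothetical cluster name, S2D-CLUSTER. If the object has child objects or is protected against accidental deletion, the removal will fail and needs to be handled by hand.

```powershell
Import-Module ActiveDirectory

$clusterName = 'S2D-CLUSTER'   # hypothetical cluster name, replace with your own

# Remove the cluster's computer object (asks for confirmation)
Remove-ADComputer -Identity $clusterName -Confirm

# Verify on every domain controller that the object is gone.
# Each query should return nothing once replication has finished.
foreach ($dc in Get-ADDomainController -Filter *) {
    Get-ADComputer -Filter "Name -eq '$clusterName'" -Server $dc.HostName
}
```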

Then run these PowerShell commands.
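The exact commands are not reproduced in this copy of the post, so the following is only a sketch of what reinstalling the cluster role and recreating the cluster could look like. The node names, cluster name, IP address and witness share are all hypothetical, and the order of steps is my assumption. Test it somewhere safe before running it against a production cluster.

```powershell
# --- Run on EACH node ---

# Wipe the old cluster configuration from the node
Clear-ClusterNode -Force

# Remove the Failover Clustering feature and reboot
Uninstall-WindowsFeature -Name Failover-Clustering -Restart

# After the reboot, install the feature again
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

# --- Run on ONE node once both nodes are ready ---

$nodes       = 'NODE1', 'NODE2'   # hypothetical node names
$clusterName = 'S2D-CLUSTER'      # hypothetical cluster name
$clusterIP   = '10.0.0.10'        # hypothetical cluster IP address

# Recreate the cluster without claiming any storage
New-Cluster -Name $clusterName -Node $nodes -StaticAddress $clusterIP -NoStorage

# Enable S2D again (expect warnings, since the disks already hold a pool)
Enable-ClusterStorageSpacesDirect

# Point the quorum at a working witness (assumes a new share has been created)
Set-ClusterQuorum -FileShareWitness '\\fileserver\witness'
```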

You will get some warnings when creating the cluster and enabling S2D again, but these are OK.

Once this is done, your Failover Cluster Manager will be empty. There are no VMs, no storage pool and no Cluster Shared Volumes. Go into Pools under Storage, click on Add Storage Pool and choose your S2D pool. Once that's done, go into Disks and click on Add Disk. Then choose the disks and add them as Cluster Shared Volumes.

You can do this with the GUI or PowerShell.
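A possible PowerShell version of the disk part is sketched below. It assumes the storage pool has already been added to the cluster, and the disk resource name is a made-up example, so list the resources first and use the names you actually see.

```powershell
# Add every disk the cluster can see but does not own yet
Get-ClusterAvailableDisk | Add-ClusterDisk

# List the disk resources that are now in the cluster
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -eq 'Physical Disk' } |
    Format-Table Name, State, OwnerGroup

# Turn a disk into a Cluster Shared Volume (hypothetical resource name)
Add-ClusterSharedVolume -Name 'Cluster Virtual Disk (Volume1)'

# Confirm the CSVs are online
Get-ClusterSharedVolume
```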

 

Now that the volumes are back in the cluster and you can access the drives under c:\clusterstorage\, we need to add the VMs back to the cluster. First, go into Hyper-V Manager and import the VM configuration for each VM.

Once a VM is in Hyper-V Manager, it also needs to be added as a cluster role in Failover Cluster Manager. Go to FCM and add the VMs as Virtual Machine roles.

Now your VMs should be added as cluster roles and you can start them up.
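With many VMs, the import and the cluster role step can be scripted. This is a minimal sketch, assuming the VM configuration files are .vmcx files (Windows Server 2016 and later) stored somewhere under C:\ClusterStorage, and that it runs on the node where the VMs should be registered. Import-VM without -Copy registers the VM in place.

```powershell
# Find all VM configuration files on the Cluster Shared Volumes
$configs = Get-ChildItem -Path 'C:\ClusterStorage' -Recurse -Filter '*.vmcx'

foreach ($config in $configs) {
    # Register the VM in place in Hyper-V on this node
    $vm = Import-VM -Path $config.FullName

    # Make the VM highly available by adding it as a cluster role
    Add-ClusterVirtualMachineRole -VMName $vm.Name
}

# Review the resulting roles before starting the VMs
Get-ClusterGroup | Format-Table Name, GroupType, State, OwnerNode
```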

Update 19.04.2018

VMM

After getting everything up again, a Live Migration within VMM was tried and failed with this error:

Error (12711)

VMM cannot complete the WMI operation on the server () because of an error: [MSCluster_ResourceGroup.Name="bd32d3bf-6cbc-47fc-98b2-796d2f98f998"] The cluster group could not be found.

This happens because a VM imported through Hyper-V Manager is not set up as highly available, and VMM does not pick up that the VM was later added to Failover Cluster Manager. So do a manual refresh on each virtual machine in VMM, and Live Migration should start to work.
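The refresh can also be done from the VMM PowerShell module instead of clicking through each VM. A small sketch follows. The VMM server name is hypothetical, and it refreshes every VM that VMM knows about, so filter the list if your environment is large.

```powershell
Import-Module VirtualMachineManager

# Connect to the VMM server (hypothetical name)
$vmm = Get-SCVMMServer -ComputerName 'vmm01'

# Refresh each virtual machine so VMM picks up the new cluster state
foreach ($vm in Get-SCVirtualMachine -VMMServer $vmm) {
    Read-SCVirtualMachine -VM $vm | Out-Null
}
```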

DPM and other backup solutions

In DPM, if the machines were deployed with VMM, the names of the VMs in DPM start with SCVMM. After importing the VMs in Hyper-V Manager, they no longer have SCVMM in their names in Failover Cluster Manager, so the old backup sets will not find the VMs. Remove the backups, but don't delete the backup data, and set up protection on the VMs again. This will trigger a synchronization of the machines.

 

 
