How to replace an NVMe caching device on a Storage Spaces Direct cluster

After my initial failure replacing an NVMe caching card, where I hit a bug in the 2016 version I was on, I replaced another one today. We started our cluster out with Intel 750 drives, and these NVMe PCIe cards are only rated for 70 GB of writes per day, so I decided to replace them with the Intel DC P3600. The first attempt failed, as can be seen here.

Step 1

Drain the node and pause it, then shut it down.
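
The drain, pause and shutdown can also be done from PowerShell instead of Failover Cluster Manager. This is only a minimal sketch: NODE01 is a placeholder node name (not from my setup), and it assumes you run it from another cluster member with the FailoverClusters module available.

```powershell
# NODE01 is a placeholder; substitute the node that holds the card you are replacing
$node = "NODE01"

# Pause the node and move its roles to the other nodes, waiting until the drain completes
Suspend-ClusterNode -Name $node -Drain -Wait

# Check that the node now reports as Paused before powering it off
Get-ClusterNode -Name $node

# Shut the node down so the card can be swapped
Stop-Computer -ComputerName $node -Force
```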

Step 2

Replace the NVMe card and boot the node back up.

Step 3

Open PowerShell and run the following to see the new NVMe card.

Get-PhysicalDisk -CanPool $true

Now let’s set the usage of the old NVMe card to Retired.

Get-PhysicalDisk -Usage Journal -HealthStatus Warning | Set-PhysicalDisk -Usage Retired

This will retire the disk. To confirm, run:

Get-PhysicalDisk -Usage Retired

Now let’s add the new NVMe disk to the pool.

# Important: use the full friendly name of the storage pool here
Add-PhysicalDisk -PhysicalDisks (Get-PhysicalDisk -CanPool $true) -StoragePoolFriendlyName "Use full name of storagepool"
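
If you are unsure of the pool's full name, it can be looked up rather than typed. The following is a sketch of an alternative form of the command above, not something to run in addition to it. It assumes the cluster has exactly one non-primordial pool (normally the S2D pool); the $pool variable is just an illustrative name.

```powershell
# List the non-primordial pools to see the exact friendly name
Get-StoragePool -IsPrimordial $false | Select-Object FriendlyName, HealthStatus, OperationalStatus

# Assumption: there is only one non-primordial pool on this cluster
$pool = Get-StoragePool -IsPrimordial $false

# Add every poolable disk to that pool, using the name that was looked up
Add-PhysicalDisk -PhysicalDisks (Get-PhysicalDisk -CanPool $true) -StoragePoolFriendlyName $pool.FriendlyName
```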

Now let’s set the disk as a Journal drive and see how it looks afterwards. S2D might even do this automatically.

Set-PhysicalDisk -FriendlyName "INTEL SSDPEDME400G4" -Usage Journal
Get-PhysicalDisk -FriendlyName Intel*

Now run this command, using the name of the node you replaced the card in:

Repair-ClusterS2D -RecoverUnboundDrives -Node "Name of node"

Now it’s added and you can resume the node. Then pause the node and resume it again, and the disks will come out of maintenance mode.
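
The resume, pause, resume sequence can be sketched in PowerShell as well. NODE01 is again a placeholder node name, and the final check is only my suggestion for confirming that the disks no longer report as being in maintenance mode.

```powershell
# NODE01 is a placeholder; use the node you just serviced
$node = "NODE01"

# Bring the node back into the cluster
Resume-ClusterNode -Name $node

# Pause it once more, then resume it again and fail the roles back
Suspend-ClusterNode -Name $node -Drain -Wait
Resume-ClusterNode -Name $node -Failback Immediate

# Check the disks; OperationalStatus should no longer show maintenance mode
Get-PhysicalDisk | Format-Table FriendlyName, Usage, OperationalStatus, HealthStatus
```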

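# Find the retired disk and display it to make sure it is the right one
$faileddisk = Get-PhysicalDisk -Usage Retired
$faileddisk
# Remove the retired disk from the pool
Remove-PhysicalDisk -PhysicalDisks $faileddisk -StoragePoolFriendlyName "Use full name of storagepool"
# This should now return nothing
Get-PhysicalDisk -Usage Retired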

If you run the following, you will see that the retired disk is gone.

Get-PhysicalDisk

After you enable the node again, the storage rebuild job will run. This needs to finish before you replace another drive, and you won’t be able to pause a node until it has finished.
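
One way to watch the rebuild is to poll Get-StorageJob until nothing is left running. This is a rough sketch, not part of the procedure itself; the 60 second interval is arbitrary, and it treats any job whose state is not Completed as still in progress.

```powershell
# Keep polling while any storage job has not completed yet
while (Get-StorageJob | Where-Object JobState -ne 'Completed') {
    # Show the current progress of each job
    Get-StorageJob | Format-Table Name, JobState, PercentComplete
    Start-Sleep -Seconds 60
}

# Once the loop exits, check that the virtual disks report as healthy
Get-VirtualDisk | Format-Table FriendlyName, HealthStatus, OperationalStatus
```

When the loop exits and the virtual disks show as healthy, it should be safe to move on to the next node.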

 
