After my initial attempt at replacing an NVMe caching card failed because of a bug in the Windows Server 2016 build I was on, I replaced another one today. We originally built our cluster out with Intel 750 drives, but those NVMe PCIe cards are only rated for 70 GB of writes per day, so I decided to replace them with Intel DC P3600s. The first one failed, as can be seen here.
Step 1
Pause the node and drain its roles, then shut it down.
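If you prefer to do this from PowerShell, the step looks something like this (a minimal sketch; "Node01" is a placeholder for your node's name):
Suspend-ClusterNode -Name "Node01" -Drain -Wait
Stop-Computer -ComputerName "Node01" -Force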
Step 2
Replace the NVMe card and boot the node back up.
Step 3
Open PowerShell and run the following to see the new NVMe card:
Get-PhysicalDisk -CanPool $true
Now let’s set the old NVMe card’s usage to Retired:
Get-PhysicalDisk -Usage Journal -HealthStatus Warning | Set-PhysicalDisk -Usage Retired
This will retire the disk. To verify, run:
Get-PhysicalDisk -Usage Retired
Now let’s add the new NVMe disk to the storage pool:
Add-PhysicalDisk -PhysicalDisks (Get-PhysicalDisk -CanPool $True) -StoragePoolFriendlyName "Use full name of storage pool"
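If you’re unsure of the pool’s full friendly name, you can look it up first; a quick sketch that lists all non-primordial pools:
Get-StoragePool -IsPrimordial $false | Select-Object FriendlyName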
Now let’s set the disk as a journal drive and see how it looks afterwards. S2D might even do this automatically.
Set-PhysicalDisk -FriendlyName "INTEL SSDPEDME400G4" -Usage Journal
Get-PhysicalDisk -FriendlyName Intel*
Now run this command:
Repair-ClusterS2D -RecoverUnboundDrives -Node "Name of node"
Now it’s added and you can resume the node. Then pause and resume the node once more, and the disks will come out of maintenance mode.
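From PowerShell, that resume/pause/resume cycle would look roughly like this (again, "Node01" is a placeholder):
Resume-ClusterNode -Name "Node01"
Suspend-ClusterNode -Name "Node01" -Drain -Wait
Resume-ClusterNode -Name "Node01"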
Finally, remove the retired disk from the storage pool:
$faileddisk = Get-PhysicalDisk -Usage Retired
$faileddisk
Remove-PhysicalDisk -PhysicalDisks $faileddisk -StoragePoolFriendlyName "Use full name of storage pool"
Get-PhysicalDisk -Usage Retired
If you run the following, you will see that the retired disk is gone:
Get-PhysicalDisk
After you resume the node, the storage rebuild job will run. This needs to finish before you replace another drive, and you won’t be able to pause a node until it has finished.
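You can keep an eye on the rebuild with Get-StorageJob; once no repair jobs show as Running anymore, it’s safe to move on to the next drive:
Get-StorageJob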