So a client had patched their 2-node S2D cluster this Sunday. When the second node came up it would not join the cluster, so the client started troubleshooting, without realizing that the quorum File Share Witness was not working. Someone had managed to delete the folder the File Share Witness was stored in. At first, the first node that was patched and booted was working, but at some point both nodes were rebooted over and over again.
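If you run into something similar, this is a minimal sketch of how to verify the witness and point the cluster at a (re)created share; the share path and the witness resource name are placeholders, not the client's actual values:

# Show the current quorum configuration and the witness parameters
Get-ClusterQuorum
Get-ClusterResource -Name "File Share Witness" | Get-ClusterParameter   # resource name may differ

# Re-point the cluster at a recreated witness share (placeholder path)
Set-ClusterQuorum -FileShareWitness "\\fileserver\S2D-Witness"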
A few days ago a friend of mine asked me if I had any idea how to get his SSDs and HDDs attached to the NVMe cache devices, as he had added a lot of disks to his S2D cluster over the last 7 months. The normal behavior is that any new disk is automatically bound to the cache.
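As a minimal sketch, this is how you can check which disks the cluster currently treats as cache versus capacity, and nudge S2D into re-evaluating the bindings; the cache-state toggle is something I would only run in a maintenance window:

# Cache devices show Usage = Journal, capacity devices show Auto-Select
Get-PhysicalDisk | Sort-Object MediaType |
    Select-Object FriendlyName, SerialNumber, MediaType, Usage, CanPool

# Toggling the cache off and back on makes S2D re-evaluate the cache bindings
Set-ClusterStorageSpacesDirect -CacheState Disabled
Set-ClusterStorageSpacesDirect -CacheState Enabled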
A while ago I gave you a first look at Dataon Must, Dataon's monitoring system that comes with their S2D servers.
Today I want to give you an insight into a new offering that is coming: Barton Glass. Barton Glass is built by Barton Systems, a member of the Cronos group and 2016 Microsoft Partner of the Year in Belgium.
A while ago I built a 2-node S2D cluster at home on some old HP G6 nodes I got cheap. I have now decided to get rid of that and set up a new 3-node cluster with bits I can find on eBay, reusing the disks I already have. This will be a multipart blog post, written as the parts are ordered, arrive, and get built. I will provide a step-by-step guide to building, installing, configuring, monitoring and troubleshooting Storage Spaces Direct, including the switch config.
After patching both our S2D clusters today, I have hit the same error after resuming nodes and failing back roles. This happens after installing KB4038782.
Update: after installing the October cumulative update (CU10), rebooting an S2D node no longer causes this issue once the initial post-update reboot is done.
The physical disks stay in maintenance mode.
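As a minimal sketch of how disks stuck like this can usually be cleared, you can take the node's storage fault domain out of storage maintenance mode by hand; the node name below is a placeholder:

# Find disks still flagged as in maintenance mode
Get-PhysicalDisk | Where-Object OperationalStatus -eq 'In Maintenance Mode'

# Take the whole node (storage scale unit) out of storage maintenance mode
Get-StorageFaultDomain -Type StorageScaleUnit |
    Where-Object FriendlyName -eq 'S2D-Node01' |
    Disable-StorageMaintenanceMode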
I wanted to enable Application Insights on my WordPress web app in Azure. Normally you can do this by going into the web app and installing it the first time you open up Application Insights, but that way of doing it does not work here.
We are having issues with a ReFS volume going offline on a single-server storage pool when too much data is written to the volume in the morning. At the moment we have not figured out why. The disks show as OK, Get-PhysicalDisk and Get-VirtualDisk report healthy, and everything says it is fine. The logs only show ReFS taking the volume offline due to a write error.
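For anyone chasing something similar, this is roughly what those health checks and the log query look like; a minimal sketch, and the ReFS provider match in the event filter is an assumption on my part:

# Storage Spaces health - all of this comes back healthy for us
Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-VirtualDisk  | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-StoragePool  | Select-Object FriendlyName, HealthStatus, OperationalStatus

# Pull recent System log entries from the ReFS provider around the time the volume went offline
Get-WinEvent -FilterHashtable @{ LogName = 'System' } -MaxEvents 500 |
    Where-Object ProviderName -like '*ReFS*'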
A week ago we moved our file server shares to DFS shares and updated a GPO that had not worked for a while. Because of that, some users did not have a folder under the user file share for the redirected Documents folder.
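To get the affected users working again, something along these lines can pre-create the missing folders and grant each user rights on their own folder; a minimal sketch, where the share path, the OU and the naming scheme are assumptions for illustration and require the ActiveDirectory module:

# Pre-create a Documents redirection folder per user and grant the user full control
$shareRoot = '\\dfs\users'                                                    # placeholder path
$users     = Get-ADUser -Filter * -SearchBase 'OU=Staff,DC=contoso,DC=com'    # placeholder OU

foreach ($user in $users) {
    $folder = Join-Path $shareRoot $user.SamAccountName
    if (-not (Test-Path $folder)) {
        New-Item -Path $folder -ItemType Directory | Out-Null
        $acl  = Get-Acl $folder
        $rule = New-Object System.Security.AccessControl.FileSystemAccessRule(
            $user.SamAccountName, 'FullControl', 'ContainerInherit,ObjectInherit', 'None', 'Allow')
        $acl.AddAccessRule($rule)
        Set-Acl -Path $folder -AclObject $acl
    }
}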
After the issue with DPM and Defender in one of my previous posts, we started having problems backing up some VMs. The error would be "Unknown error" or "The DPM service was unable to communicate with the protection agent on (name of Hyper-V host) (ID 52 Details: The semaphore timeout period has expired. (0x80070079))".
The backups are of Hyper-V virtual machines on a 4-node S2D cluster, spread over all 4 nodes. Initially this affected 7 VMs; I am down to 3 as I write this post, as I need to fix them one at a time.
Yesterday I had to replace a failed HDD in one of our S2D clusters. After replacing the drive and removing the failed drive from the cluster, I ran Get-PhysicalDisk and noticed I had no disks with CanPool = True. This is normal, as S2D will detect the new disk and add it to the storage pool automatically to keep the pool balanced.
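For reference, this is roughly the check and cleanup sequence when swapping a failed drive; a minimal sketch, where the pool name and the disk state filter are placeholders for your own environment:

# A healthy result: the replacement disk is already claimed, so nothing shows CanPool = True
Get-PhysicalDisk | Where-Object CanPool -eq $true

# If the failed disk still lingers in the pool, retire it and remove it explicitly
$failed = Get-PhysicalDisk | Where-Object OperationalStatus -eq 'Lost Communication'
$failed | Set-PhysicalDisk -Usage Retired
Remove-PhysicalDisk -PhysicalDisks $failed -StoragePoolFriendlyName 'S2D on Cluster01'

# Watch the repair and rebalance jobs finish
Get-StorageJob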