After patching both of our S2D clusters today, I have had the same error after resuming nodes and failing back roles. This happens after installing KB4038782.
Update: after installing the October CU10 patch, rebooting an S2D node no longer causes this issue once the initial post-update boot has completed.
The physical disks stay in maintenance mode.
I wanted to enable Application Insights on my WordPress web app in Azure. Normally you can do this by going into the web app and installing it the first time you open Application Insights. Here, that approach does not work.
So we are having some issues with a ReFS volume on a single-server storage pool going offline when too much data is written to the volume in the morning. So far we have not figured out why. The disks show as OK: Get-PhysicalDisk and Get-VirtualDisk both report healthy, and the logs only show ReFS being taken offline due to a write error.
So a week ago we moved our file server shares to DFS shares and updated a GPO that had not been working for a while. Because of this, some users did not have a folder under the users' file share for redirecting their Documents folder.
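A quick way to see which users are affected is to compare a list of usernames against the folders that actually exist under the share. This is a minimal sketch, assuming one folder per user named exactly after the username; the function name `users_missing_folders` and the example share path are hypothetical, and how you obtain the username list (for example from AD) is not shown.

```python
from pathlib import Path


def users_missing_folders(share_root, usernames):
    """Return the usernames that have no folder directly under share_root.

    Assumes a hypothetical layout of one folder per user, named after the
    username; adjust the comparison if your redirect folders use another scheme.
    """
    root = Path(share_root)
    # Only directories count; stray files with a user's name are ignored.
    existing = {p.name.lower() for p in root.iterdir() if p.is_dir()}
    # Windows file shares are case-insensitive, so compare lowercased names.
    return sorted(u for u in usernames if u.lower() not in existing)


if __name__ == "__main__":
    # Hypothetical DFS path and usernames, for illustration only.
    print(users_missing_folders(r"\\example.local\dfs\users", ["alice", "bob"]))
```

Comparing case-insensitively matters on Windows shares, where `Bob` and `bob` refer to the same folder.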
After the DPM and Defender issue from one of my previous posts, we started having problems backing up some VMs. The error was either "Unknown error" or "The DPM service was unable to communicate with the protection agent on (name of Hyper-V host) (ID 52 Details: The semaphore timeout period has expired. (0x80070079))".
The backups are of Hyper-V virtual machines on a 4-node S2D cluster, and the affected VMs are spread over all 4 nodes. Initially this affected 7 VMs. I am down to 3 as I write this, since I have to fix them one at a time.
So yesterday I had to replace a failed HDD in one of our S2D clusters. After replacing the drive and removing the failed one from the cluster, I ran Get-PhysicalDisk and noticed that no disks had CanPool = True. This is normal, as S2D detects the new disk and adds it to the storage pool to balance the pool correctly.
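The check described above boils down to "are any disks still waiting to be pooled?". This is a minimal sketch that interprets a snapshot of disk records after a swap. The `DiskRecord` type and its sample values are hypothetical stand-ins for the FriendlyName and CanPool columns that Get-PhysicalDisk shows; exporting that data from PowerShell into Python is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskRecord:
    # Hypothetical record mirroring the FriendlyName and CanPool
    # columns shown by Get-PhysicalDisk.
    friendly_name: str
    can_pool: bool


def poolable_disks(disks: List[DiskRecord]) -> List[str]:
    """Names of disks that are not in a pool yet but could be added to one."""
    return [d.friendly_name for d in disks if d.can_pool]


def summarize(disks: List[DiskRecord]) -> str:
    """One-line reading of a disk snapshot taken after a drive replacement."""
    waiting = poolable_disks(disks)
    if waiting:
        return "Not pooled yet: " + ", ".join(waiting)
    # Per the post: S2D claims the new disk itself, so an empty list is normal.
    return "No disks with CanPool = True (expected once S2D has claimed the new disk)"


if __name__ == "__main__":
    snapshot = [DiskRecord("HDD-01", False), DiskRecord("HDD-02", False)]
    print(summarize(snapshot))
```

An empty CanPool list on its own does not prove the new disk was added, so it is still worth confirming that the replacement shows up as a pool member.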
Almost 2 weeks ago our backups stopped working entirely. They would just stand still and eventually fail. Today I was informed that this might be a Windows Defender bug with DPM, and I can confirm that it is. I am working with Microsoft Support on a permanent fix for this issue.
I initially thought it was a database bug, but it's Defender.
This will be short. There seems to be a bug in the DPM 2016 database once it reaches a certain size or number of recovery points. We have 2 DPM servers set up around the same time, and both failed within 2 days of each other. The symptoms were backups standing still and doing nothing, and Resource Monitor or Performance Monitor being slow and sluggish when I opened them.
Updated May 26th 2018 with HPE FlexFabric config
You have probably heard these acronyms somewhere, so what are they, and are they the same? In short: yes and no.
RoCE stands for RDMA over Converged Ethernet, where RDMA is Remote Direct Memory Access.
RDMA allows network data to be offloaded to the network cards and placed directly into memory, bypassing the host's CPU and leaving the CPU free for the host's own workloads. With regular TCP/IP networking, all network traffic goes through the CPU, and higher speeds take more CPU: on a 10 Gbit network it could take about 100% CPU on a 12-core Intel Xeon v4 CPU.
For our DataON S2D-3212 setup, I had planned to use our Dell N4000 switches for DCB/RDMA, since we have them in both of our datacenters. During our first install I had problems enabling DCB with no-drop on the N4000. It was running firmware version 126.96.36.199, and we were losing connectivity to some servers when no-drop was enabled. So we ended up buying new Dell Force10 S6000-ON switches, as the NICs in our servers are Mellanox ConnectX-4 40 Gbit cards.