Almost two weeks ago our backups stopped working completely. Jobs would just stand still and eventually fail. Today I was informed that this might be a Windows Defender bug with DPM, and I can confirm it is. I am working with MS Support on a permanent fix for this issue.
I initially thought it was a database bug, but it’s Defender.
This will be short. There seems to be a bug in the DPM 2016 database once it reaches a certain size or number of recovery points. We have two DPM setups that were installed around the same time, and both failed within two days of each other. The symptoms were backups just standing still, not doing anything, and if I opened Resource Monitor or Performance Monitor it would be slow and sluggish.
You have probably heard these acronyms somewhere, so what are they, and are they the same? In short, yes and no.
RoCE stands for RDMA over Converged Ethernet. The RDMA part is Remote Direct Memory Access.
RDMA allows network data (TCP packets) to be offloaded to the network cards and put directly into memory, bypassing the host's CPU. That leaves the CPU fully available to the host. With normal TCP, all the network traffic goes through the CPU, and higher speeds take more CPU. On a 10 Gbit network it would take about 100% CPU on a 12-core Intel Xeon V4 CPU.
For our Dataon S2D-3212 setup I had planned on using our Dell N4000 switches for DCB/RDMA, as we have these in both our datacenters. When I did our first install I had problems when enabling DCB with no-drop on the N4000. The N4000 was running firmware version 184.108.40.206, and we were losing connectivity to some servers when no-drop was enabled. So we ended up buying some new Dell Force10 S6000-ON switches, as the NICs in our servers are Mellanox ConnectX-4 40 Gbit cards.
I noticed on our Dataon S2D cluster that not all interfaces had an SMB listener.
You can see this in the output of netstat -xan.
Last week Dataon officially released their monitoring system for their S2D offering, Dataon Must. At the time of writing it is the only S2D monitoring software out there. There are management packs you can add to SCOM to get alerts and so on in there. There is also one made for MOMS. Take a look at Stanislav's post about it here. Other than that, there is nothing else.
This post will be all about the pictures and not so much about the writing 🙂
Dataon Must is only available with Dataon S2D hardware. As far as I know, it will not be available for non-Dataon S2D clusters. When you get this solution it is delivered as a finished setup. You get a vhdx file to import as a virtual machine. Import it, give it 4 GB of RAM and a minimum of 2 CPUs, and you are good to go. It is a Windows Server 2016 machine running an IIS website. Setting it up is quite simple and is done in minutes.
Updated 27 Feb
We have been testing Storage Spaces Direct for a while on our eBay cluster. We have been running development and some production systems on it, such as the second Exchange node, a mediation server and our VMM server.
We have been looking to replace our current Hyper-V solution, which consists of HP BL465c G8 and BL490 G7 blade servers attached to an HP P2000 G3 MSA over iSCSI. This has been getting slower and slower as we have set up more virtual machines. The MSA was a 12-disk shelf with 11 disks active and one spare. One 15k disk gives about 170 IOPS, so 11 disks give a whopping 1870 IOPS at max speed. Under normal load it would use about 1200-1500 IOPS, so there were not a lot of spare IOPS. We had one per cluster.
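The IOPS figures above can be checked with a quick back-of-the-envelope calculation. This is only a rough sketch in Python: it assumes the shelf's peak is simply the per-disk figure multiplied by the number of active disks, and it ignores RAID write penalties and controller caching, neither of which is described in the post. The variable and function names are made up for illustration.

```python
# Rough IOPS estimate for the disk shelf described above.
# Assumption: peak IOPS = active disks * IOPS per disk (no RAID penalty, no cache).

ACTIVE_DISKS = 11          # 12-disk shelf, one disk kept as a spare
IOPS_PER_DISK = 170        # approximate figure for one 15k disk
NORMAL_LOAD = (1200, 1500) # observed normal load range in IOPS


def aggregate_iops(disks: int, iops_per_disk: int) -> int:
    """Simple sum of per-disk IOPS across all active disks."""
    return disks * iops_per_disk


def headroom(peak: int, load: int) -> int:
    """Spare IOPS left when the shelf is serving the given load."""
    return peak - load


peak = aggregate_iops(ACTIVE_DISKS, IOPS_PER_DISK)
spare_at_high_load = headroom(peak, NORMAL_LOAD[1])
spare_at_low_load = headroom(peak, NORMAL_LOAD[0])

print(f"Peak: {peak} IOPS")
print(f"Spare under normal load: {spare_at_high_load}-{spare_at_low_load} IOPS")
```

With these numbers the shelf has only a few hundred IOPS to spare under normal load, which matches the feeling that every new virtual machine made things slower.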
Most of you know what S2D (Storage Spaces Direct) is. If you don’t, go look at Cosmos Darwin’s post over at TechNet to get some good insight into S2D.
What I am going to focus on in this blog is the new Dataon HyperConverged server. Back at Ignite 2016, Dataon released their first offering, the S2D-3110 all-flash solution pumping out 2.6 million IOPS in a 1U form factor.
A friend of mine asked me about this a while ago, as he had set up his S2D cluster with SSD and HDD only, so the SSDs became the journal drives (caching drives). Now he wanted to replace the SSDs with NVMe disks that he had replaced. Yesterday he did the swap and it worked great.
After my initial failure replacing an NVMe caching card and hitting a bug in the 2016 version I was on, I replaced another one today. We started our cluster out with Intel 750 drives, and these NVMe PCIe cards are only rated for 70 GB of writes per day, so I decided to replace them with the Intel DC P3600. The first replacement failed, as can be seen here.
Over the last few weeks we have been having some issues with our Storage Spaces Direct test/dev cluster. To start off, I will explain what happened and what went wrong.