During my last project with a customer, I had to build a small SCOM 2012 R2 environment with SQL Server 2014 SP1 hosting the databases. The environment consisted of the following components:
- 1 SQL Server 2014 SP1 server with a named instance for the OperationsManager database
- 1 SQL Server 2014 SP1 server with a named instance for the OperationsManagerDW database
- 1 SQL Server 2014 SP1 server with a named instance for the Report Server database
- 2 Management Servers
- 2 Web Console Servers
There were no issues at all during the initial installation. Everything went as smoothly as it could. After I finished configuring the database engines (memory limits, MAXDOP, etc.), I wanted to import the basic management packs. As always, I started with the Windows Operating System management packs. And then, boom… It took almost half an hour to import them (it usually takes about half a minute). Of course, I started with the usual first round of checks: the event logs, processor usage, and so on. All servers looked normal. Then I saw that more than 90% of the memory was in use on the database server, but I didn't think it was serious, because that is how SQL Server works: it takes all the memory it can get. But then I saw that the SQL Server process was using only 850 MB of memory. My eyebrows went up a bit, and I made the sound that can give a system owner a heart attack when it comes from one of the architects: “Hmmm….”
My next move was to go back to the Performance tab in Task Manager. There I saw 45.8 GB of paged pool memory (the VM had only 28 GB of memory assigned), and it was still growing. The page file was about 40 GB. Something told me it could be a memory leak, but where should I look for it? Then it hit me: when I first logged into the server, I had felt a slightly bitter taste on seeing that they used McAfee VirusScan. I had really bad experiences with this application in the past. So I started digging. It took only 20-30 minutes to realize that the versions of VirusScan Enterprise (VSE) and McAfee Agent (MA) they were running were not supported on Windows Server 2012 R2. So the next step was to wait for the customer's engineer to update both to supported versions.
Eventually they got back to me: all servers were now running the correct versions. I was happy to continue my work. All servers were restarted, and I checked that paged pool memory stayed constantly around 150-170 MB. Half an hour later, when I went back to check on a process, I saw paged pool memory at 25 GB and climbing. I was a bit disappointed. So, back to digging. The only tool that came to mind was poolmon.exe, to find out which pool tag was consuming that horrible amount of paged pool memory.
Poolmon immediately showed that a single pool tag accounted for 98% of the paged pool memory: the Windows Notification Facility (WnF). With this information I turned back to my friend Google and almost immediately found the answer in a Microsoft article describing exactly the symptoms I was experiencing:
- High paged pool memory usage
- The memory leak starts after about 10 minutes of system uptime
- Poolmon analysis shows that the Windows Notification Facility (WnF) tag is consuming all available paged pool memory
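Reading poolmon by eye works, but the check that led here (which paged pool tag dominates, and by how much) is simple arithmetic. A minimal sketch, assuming the poolmon rows have already been parsed into (tag, pool type, bytes) tuples; the function name `top_paged_tag` is hypothetical and the sample numbers are made up for illustration, not measurements from this case:

```python
from typing import Iterable, List, Tuple

# (pool tag, pool type as shown by poolmon: "Paged" or "Nonp", bytes in use)
Row = Tuple[str, str, int]


def top_paged_tag(rows: Iterable[Row]) -> Tuple[str, float]:
    """Return the paged pool tag with the most bytes and its share (0-100) of all paged bytes."""
    # Keep only paged pool rows; nonpaged allocations are irrelevant to this leak.
    paged: List[Tuple[str, int]] = [
        (tag, nbytes) for tag, pool_type, nbytes in rows if pool_type == "Paged"
    ]
    total = sum(nbytes for _, nbytes in paged)
    if total == 0:
        raise ValueError("no paged pool allocations in input")
    tag, nbytes = max(paged, key=lambda item: item[1])
    return tag, 100.0 * nbytes / total


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        ("Wnf ", "Paged", 980_000_000),
        ("CM31", "Paged", 12_000_000),
        ("MmSt", "Paged", 8_000_000),
        ("Irp ", "Nonp", 50_000_000),
    ]
    tag, share = top_paged_tag(sample)
    print(f"{tag.strip()}: {share:.1f}% of paged pool")
```

With this sample the script prints `Wnf: 98.0% of paged pool`. Filtering on the pool type first matters: a large nonpaged tag would otherwise distort the percentage.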
The cause: the Remote Registry service does not stop running after the connection has been idle for 10 minutes, as it should. By default, the Remote Registry service is disabled in our Windows Server 2012 R2 installation, but either the McAfee or the SCOM agent installation turned it on, and then the problem started to occur.
Fortunately, the solution was as easy as 1-2-3, much easier than finding the issue itself. Here is the solution:
- Open Registry Editor (regedit.exe)
- Locate the subkey “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\RemoteRegistry”
- Set the DisableIdleStop value to 00000001
- Close Registry Editor and restart the server
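Clicking through Registry Editor on every server gets tedious. A minimal sketch, assuming you would rather distribute a .reg file: the hypothetical Python helper `build_reg_file` below produces the file text for the value in the steps above, with the key path copied verbatim from this post. The helper and the output file name `disable_idle_stop.reg` are illustrations, not part of any Microsoft tooling.

```python
# Hypothetical helper: builds .reg file text that sets a single DWORD value.
# Key path and value name are taken verbatim from the manual steps in this post.

KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\RemoteRegistry"


def build_reg_file(key_path: str, value_name: str, dword_value: int) -> str:
    """Return the text of a .reg file that sets value_name under key_path to a DWORD."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"{value_name}"=dword:{dword_value:08x}',  # .reg syntax: 8 hex digits
        "",
    ]
    # Registry Editor writes .reg files with CRLF line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    content = build_reg_file(KEY, "DisableIdleStop", 1)
    # Python's "utf-16" codec writes a BOM in native byte order (little-endian on x86/x64);
    # newline="" keeps the CRLFs from being translated again.
    with open("disable_idle_stop.reg", "w", encoding="utf-16", newline="") as fh:
        fh.write(content)
    print(content)
```

The resulting file could then be imported on each server from an elevated prompt with `reg import disable_idle_stop.reg`, followed by the restart from the last step.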
After the restart, all servers were fine, with paged pool memory at 150-180 MB.