Recently we ran into a problem where our Operations Manager 2012 environment was running very slowly. We began the investigation by checking the physical performance of the servers. The Management Servers and the Operations Manager database SQL server were heavily utilised, with processors consistently averaging around 80%. Disk and network utilisation looked OK, although busy at times.
On a colleague's advice we then opened the “System Center Core Monitoring Reports” and ran the “Data Volume by Management Pack” report, which returns the top 10 management packs by data volume for discovery data, alerts, events and state changes. We immediately noticed that the SQL Server management pack had extremely high “Discovery Data”, “Performance”, “Alert Count” and “State Changes” figures.
From this we looked at the health of the SQL instances and found that all the SQL clusters had very high state change counts. We then went up a level to the agents acting as agent proxy for the SQL clusters and found that these SQL servers had high state change counts as well. Drilling down through the monitors, we could see that the state changes were coming from the “Health Service Private Bytes Threshold” monitor, which was changing state every 12 minutes! The state change view showed that the private bytes threshold was being exceeded.
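To illustrate what "changing state every 12 minutes" looks like in data, here is a minimal Python sketch that flags a flapping monitor from a list of state change timestamps. The sample timestamps, the function names and the 15-minute cut-off are all made up for illustration; this is not how Operations Manager itself detects flapping, and in practice the timestamps would come from the state change view or a report export.

```python
from datetime import datetime, timedelta
from statistics import median


def median_gap_minutes(timestamps):
    """Return the median gap between consecutive state changes, in minutes."""
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds() / 60
            for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) if gaps else None


def is_flapping(timestamps, max_gap_minutes=15, min_changes=4):
    """Flag a monitor whose state changes are both numerous and close together.

    max_gap_minutes and min_changes are illustrative cut-offs, not SCOM settings.
    """
    if len(timestamps) < min_changes:
        return False
    return median_gap_minutes(timestamps) <= max_gap_minutes


# Made-up sample: a monitor that changes state every 12 minutes.
start = datetime(2014, 1, 1, 9, 0)
sample = [start + timedelta(minutes=12 * i) for i in range(6)]

print(median_gap_minutes(sample))  # 12.0
print(is_flapping(sample))         # True
```

Using the median gap rather than the mean keeps one long quiet period from hiding an otherwise regular pattern of state changes.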
The default Private Bytes threshold is 314572800. After a little analysis of the agent’s performance we decided to double the threshold to 629145600 for the object “Health Service”, so that the agents would no longer breach it. We applied this to all agents; however, you may prefer to do this only for the agents which are highly utilised.
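The raw threshold values are easier to reason about in megabytes. The short Python sketch below converts them, assuming the threshold is expressed in bytes and using 1 MB = 1024 * 1024 bytes; the variable and function names are purely illustrative.

```python
BYTES_PER_MB = 1024 * 1024

default_threshold = 314_572_800           # default monitor threshold, in bytes
raised_threshold = 2 * default_threshold  # the doubled value we set


def to_mb(n_bytes):
    """Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return n_bytes / BYTES_PER_MB


print(raised_threshold)          # 629145600
print(to_mb(default_threshold))  # 300.0
print(to_mb(raised_threshold))   # 600.0
```

So the default works out at 300 MB of private bytes for the Health Service, and the doubled value at 600 MB.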
Once any of the System Center health monitors changes to a critical state, the rollup monitor “System Center Management Health Service Performance” runs a recovery task. The recovery task flushes the agent cache and restarts the “System Center Management Service”. This means the agent has to re-download all the management packs and then re-run all discoveries, rules and monitors. That is not good for the performance of the SQL servers running the agent, or for the Operations Manager Management Servers, which are constantly having to gather the data from these agents. Most importantly, the Operations Manager database is constantly being written to with the data collected from these servers. Consequently the database struggles to run its usual operational tasks and is very highly utilised.
Since making these changes, the state changes and alerts have decreased enormously, and the amount of data from the SQL Server Management Pack is now far lower: from well over 900,000 state changes to just over 6,000. Operations Manager is also performing much better, although it is still not perfect and there is further work to be done.
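As a rough indication of the scale of the improvement, the Python sketch below works out the percentage reduction. The figures are the rounded numbers quoted above, not exact counts from the report, so treat the result as approximate.

```python
state_changes_before = 900_000  # "well over 900,000", rounded
state_changes_after = 6_000     # "over 6,000", rounded

reduction = (state_changes_before - state_changes_after) / state_changes_before

print(f"{reduction:.1%}")  # 99.3%
```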
I hope this helps.
I tracked these issues down with the help of Kevin Holman’s blog post about Health Service restarts (written for SCOM 2007): http://blogs.technet.com/b/kevinholman/archive/2009/12/21/the-new-and-improved-guide-on-healthservice-restarts-aka-agents-bouncing-their-own-healthservice.aspx