Exchange SLA Scorecard 2.0 – Outage collection fixed
As you may recall, we wanted to implement the Exchange SLA scorecard. The main reason we couldn’t use it was that we run clustered Exchange. We arranged our MSFT engagement to get version 2.0 of the scorecard installed. After it was installed and configured, we noticed we were not collecting all of our outages.
I did some digging and noticed that MOM was not collecting the STORE offline events when a node failed its Exchange resources over to another node in the cluster. I presented this theory to MSFT, and they ran some tests to see if they could recreate it. They could, and the fix turned out to be simple: we just had to add the Windows Clusters computer groups to the SLA scorecard rule groups.
After we did this, MOM was able to collect every instance of a store going offline, so we could finally measure our outages. You see, once Exchange moves from one node to another (the Exchange Virtual Server instance and all of the other cluster resources for Exchange), the Exchange Management Pack (and the SLA Scorecard for Exchange MP) will ignore the node that no longer has the Exchange resources.
So, if you were in cluster manager and MOVED the Exchange group to another node, the “store is going offline” events in the old node’s application log would be ignored, because the MOM agent would start processing against the new node that is getting the Exchange resources.
This might not be exactly what is going on, but it’s my theory, since even the Exchange Management Pack, which looks for store offline events, wasn’t collecting them when a cluster failed resources over from one node to another.
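To make the theory a bit more concrete, here is a toy Python model of it. This is only a sketch of my guess, not how MOM actually works inside: the node names, the "Exchange Servers" group label, and the functions are all made up for illustration, and the assumption that the Exchange rules only watch the node currently owning the Exchange resources is exactly the unconfirmed part.

```python
from dataclasses import dataclass

# Hypothetical two-node cluster; names are invented for this sketch.
CLUSTER_NODES = {"NODE-A", "NODE-B"}


@dataclass(frozen=True)
class Event:
    node: str      # node whose application log holds the event
    message: str


def monitored_nodes(target_groups, exchange_owner):
    """Return the nodes whose events the rule group would process."""
    nodes = set()
    if "Exchange Servers" in target_groups:
        # Assumption (the theory): only the node that currently owns
        # the Exchange resources is treated as an Exchange server.
        nodes.add(exchange_owner)
    if "Windows Clusters" in target_groups:
        # Every cluster node is watched, whoever owns Exchange.
        nodes |= CLUSTER_NODES
    return nodes


def collected(events, target_groups, exchange_owner):
    """Keep only the events that come from a monitored node."""
    watched = monitored_nodes(target_groups, exchange_owner)
    return [e for e in events if e.node in watched]


# Exchange is moved from NODE-A to NODE-B. The offline event lands in
# NODE-A's log, but by then NODE-B owns the Exchange resources.
events = [
    Event("NODE-A", "store is going offline"),
    Event("NODE-B", "store is online"),
]

before_fix = collected(events, {"Exchange Servers"}, exchange_owner="NODE-B")
after_fix = collected(
    events, {"Exchange Servers", "Windows Clusters"}, exchange_owner="NODE-B"
)

print("before fix:", [e.message for e in before_fix])
print("after fix: ", [e.message for e in after_fix])
```

In this model the offline event from the old node only shows up once the Windows Clusters group is part of the rule group's targets, which matches what we saw after the change.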
HAPPY HAPPY JOY JOY!