Hi,
I have four servers: two Mailbox and two CAS, all running Windows Server 2008 R2, with the Mailbox servers in a DAG. The problem started after I upgraded all servers from Exchange 2013 CU3 to Exchange 2013 SP1. All servers are VMware virtual machines on the same hardware.
All servers are getting a BSOD; the error is the same as in some other posts on the Internet:
CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been
terminated.
Several processes and threads are necessary for the operation of the
system; when they are terminated (for any reason), the system can no
longer function.
Arguments:
Arg1: 0000000000000003, Process
Arg2: fffffa8007ab4890, Terminating object
Arg3: fffffa8007ab4b70, Process image file name
Arg4: fffff800019de7b0, Explanatory message (ascii)
Debugging Details:
------------------
PROCESS_OBJECT: fffffa8007ab4890
IMAGE_NAME: wininit.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: wininit
FAULTING_MODULE: 0000000000000000
PROCESS_NAME: MSExchangeHMWo
BUGCHECK_STR: 0xF4_MSExchangeHMWo
DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
CURRENT_IRQL: 0
ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) amd64fre
LAST_CONTROL_TRANSFER: from fffff80001a67ab2 to fffff800016d9bc0
LOGS:
RecoveryActionLogs: ForceReboot-ServerName -ActiveDirectoryConnectivityServerReboot: Throttling rejected the operation
RecoveryActionLogs: ForceReboot-ServerName -ActiveDirectoryConnectivityConfigDCServerReboot: Throttling rejected the operation
Recovery Action Failed. (ActionId=RestartService, ResourceName=ServerName.contoso.com, Requester=ActiveDirectoryConnectivityRestart, InstanceId=140408.062707.65292.001, ActualStartTime=2014-04-08T15:27:07.6529270Z, ActualEndTime=2014-04-08T15:27:07.7777262Z, ErrorMessage=Service ServerName.contoso.com was not found on computer '.'.)
When the server restarts, the logs show:
Bugcheck action reported by server 'ServerName' initiated by responder 'ActiveDirectoryConnectivityServerReboot'
Recovery Action Started. (ActionId=ForceReboot, ResourceName=ServerName, Requester=ActiveDirectoryConnectivityServerReboot, InstanceId=140408.063307.93261.004, ExpectedToFinishAt=2014-04-08T15:38:07.9326175Z
Recovery Action Succeeded. (ActionId=ForceReboot, ResourceName=ServerName, Requester=ActiveDirectoryConnectivityServerReboot, InstanceId=140408.063307.93261.004, ActualStartTime=2014-04-08T15:33:07.9326175Z, ActualEndTime=2014-04-08T15:35:07.0828500Z)
I found that it would restart many times a day, but throttling allows only one restart per day.
I have already spent four days trying to find the problem, with no luck.
I tried to add an override:
Add-GlobalMonitoringOverride -Identity Exchange\ActiveDirectoryConnectivityConfigDCServerReboot -ItemType Responder -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00
but afterwards
(Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/responderdefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.Name -like "ActiveDirectoryConnectivityConfigDCServerReboot"} | ft name,enabled
still shows the responder as enabled (value 1), and the servers keep restarting no matter what. I just don't want to disable the Health Manager service.
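To double-check whether the override was registered at all, something like the following could be run in the Exchange Management Shell. This is only a sketch: it lists whatever global overrides Exchange has stored, so the identity, property and expiry can be compared with what was typed in the Add-GlobalMonitoringOverride command.

```powershell
# Run in the Exchange Management Shell.
# Lists every global monitoring override currently stored, with all of its
# properties, so the stored identity and value can be compared with the
# Add-GlobalMonitoringOverride command that was entered.
Get-GlobalMonitoringOverride | Format-List
```

If the override is listed but the responder definition still shows it as enabled, the override may not be matching the responder it was meant for.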
I tried recreating all performance counters; that didn't work.
I tried setting a preferred domain controller; that didn't work.
I tried to work with the 'AD' health set, but there is almost no information about it on TechNet, so I just looked at which items are unhealthy. Sometimes it shows ActiveDirectoryConnectivityServer or ActiveDirectoryConnectivityConfigDCServer as unhealthy, but after a few minutes they go back to healthy. The problem is that by then Exchange may already have restarted the server.
I think the main reason is:
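For reference, a sketch of how the 'AD' health set could be inspected from the Exchange Management Shell. ServerName is the placeholder used in the logs above. The first command shows the current state of the monitors in the health set, and the second lists its responders together with their full identities.

```powershell
# Run in the Exchange Management Shell; replace ServerName with a real server.

# Current state of each monitor in the AD health set.
Get-ServerHealth -Identity ServerName -HealthSet AD |
    Format-Table Name, TargetResource, AlertValue -AutoSize

# Responders that belong to the AD health set, with their full identities.
Get-MonitoringItemIdentity -Identity AD -Server ServerName |
    Where-Object { $_.ItemType -eq 'Responder' } |
    Format-Table Identity, ItemType, Name -AutoSize
```

If the Identity column shows the reboot responders under a different health set prefix than the one used in the override, that could be why the override has no effect. This is only a guess and would need to be confirmed on the affected servers.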
The AD Health Set has detected a problem with Server.contoso.com at 2014.04.09 06:33:11. The Health Manager is reporting that ActiveDirectoryConnectivityProbe/Server.contoso.com Failed with Error message: Search took 1518 ms. Threshold 800 ms. Attempts to auto-recover from this condition have failed and requires Administrator attention. Exception Details: System.Exception: Search took 1518 ms. Threshold 800 ms
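Since the probe complains about search latency, a rough way to compare domain controllers is to time a simple LDAP read against each one from the Exchange server. The sketch below is an assumption-laden stand-in, not the query that the Exchange probe actually runs, and the DC names are hypothetical placeholders.

```powershell
# Hypothetical DC names; replace with the real domain controllers.
$domainControllers = 'dc01.contoso.com', 'dc02.contoso.com'

foreach ($dc in $domainControllers) {
    # Time how long it takes to bind to the DC and read one rootDSE attribute.
    $elapsed = Measure-Command {
        $root = New-Object System.DirectoryServices.DirectoryEntry("LDAP://$dc/RootDSE")
        $null = $root.Properties['defaultNamingContext'].Value
    }
    # Print the DC name and the elapsed time in milliseconds.
    '{0}: {1:N0} ms' -f $dc, $elapsed.TotalMilliseconds
}
```

Running it several times at different hours may show whether one DC, for example one in the subdomain, is consistently slower than the 800 ms threshold quoted in the alert.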
I really need some new ideas...
EDIT: Exchange 2013 SP1 works perfectly in the test lab, where the machines are clones of production. The only difference is the domain controller: the lab has just one (server), with no replication and no subdomains (the production domain has a subdomain).
Running DCDiag shows no errors