"OALGen is running on the wrong CCR cluster node"
Sometimes I just have to wonder.
It seems that the Microsoft Exchange Team thought that you’d always want one Exchange Server 2007 CCR node to be the active, “primary” node, and the other node would always be the passive, “secondary” node. This isn’t exactly a problem, per se, except that there may be times when you want (or need) to make the “secondary” node the active one for an extended period of time. Still, not a problem, right? Sure…except when the System Attendant goes to regenerate the offline address book(s). At that point, you’ll get a nasty warning in the Application event log, EventID 9395: “OALGen is running on the wrong CCR cluster node”.
But wait, you say: “I thought CCR was a clustering technology, and as such it wouldn’t matter which node was active?” Me, too, but apparently this isn’t quite the case. Microsoft’s TechNet article about the issue says this:
This Warning event indicates that the Microsoft Exchange System Attendant service is running on an active cluster continuous replication (CCR) node, but the HKLM\System\CurrentControlSet\Services\MSExchangeSA\Parameters\{ClusterMailboxServerName}\EnableOabGenOnThisNode registry value is not set to this node name.
Awesome.
So, it looks like either your active node always needs to be the primary node (see this blog for more information), or the value mentioned above always needs to be set to whatever the active node is at the time. My solution was to create a script that sets the EnableOabGenOnThisNode value to the currently active node, and to have Task Scheduler on each node run it when specific events (MSExchange Cluster 1028 and 1029) are seen in the Application event log. (Note that event-triggered tasks are a feature of Task Scheduler in Windows Server 2008.)
The script is pretty short and doesn’t require any modification: if it is run on a CCR node, it will automatically discover the CMS name and the currently active node, and set the registry value appropriately. All errors and other output are logged to the Application event log with the source EnableOabGen, so if you use a monitoring solution like Zenoss (my choice), MOM/SCOM, etc., you can have it pick up the events from the event log.
I apologize in advance for the long lines. Also, I may not always remember to update this post with the “latest” version of the script, but you can always pull the latest version here. (Also note that this script used to be called ps_enableoabgenonthisnode.ps1, but I decided to change the name to make it fit the Verb-Noun pattern, like all good PowerShell scripts should :-) .)
################################################################################
#
# NAME : Enable-OabGenOnThisNode.ps1
# AUTHOR: Seth Wright, James Madison University
# DATE : 4/14/2009
#
# DESCRIPTION: This script sets the 'EnableOabGenOnThisNode' registry value on
# a CCR node member to the active node in the cluster.
# Reference http://technet.microsoft.com/en-us/library/bb266910.aspx
# for more information.
#
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY
# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
# THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
################################################################################
$CurrentNode = Get-Content Env:ComputerName
$Log = $null
# Set up a new Event Source for the script, if it doesn't already exist.
if (![System.Diagnostics.EventLog]::SourceExists("EnableOabGen")) {
    [System.Diagnostics.EventLog]::CreateEventSource("EnableOabGen", "Application")
}
# Get the Application event log into the $Log variable.
$Log = New-Object System.Diagnostics.EventLog("Application", ".")
$Log.Source = "EnableOabGen"
$error.Clear()
$CMSName = Get-MailboxServer | where { $_.RedundantMachines -eq $CurrentNode }
if (!($CMSName) -or !([String]::IsNullOrEmpty($error[0]))) {
    # The CMSName couldn't be determined. Either the script doesn't have
    # network access, or the computer is not part of a CCR cluster, or something
    # else is broken. Log the error and bail.
    $Log.WriteEntry("Could not determine the CMSName: $($error[0])", [System.Diagnostics.EventLogEntryType]::Error, 404)
    return
}
# Find the active node of the CMS.
# First, get a list of all nodes in the CMS.
$error.Clear()
$OperationalMachines = (Get-ClusteredMailboxServerStatus -Identity $CMSName).OperationalMachines
if (!($OperationalMachines) -or !([String]::IsNullOrEmpty($error[0]))) {
    # Couldn't determine the active node. Something's not right, so don't try
    # to do anything else. Log the error and bail.
    $Log.WriteEntry("Could not determine OperationalMachines: $($error[0])", [System.Diagnostics.EventLogEntryType]::Error, 404)
    return
}
# $activePattern is the regex pattern used to look for the node marked as
# <Active...> in the OperationalMachines array.
$activePattern = "^(?<activenode>.*)\s+<Active.*"
# Perform the regex match. $match is a throw-away variable; the named capture
# ends up in the automatic $matches variable.
$match = $OperationalMachines | where { $_ -match $activePattern }
if (!($matches.activenode)) {
    # No regex matches were found.
    $Log.WriteEntry("Cannot determine the Active Node of CMS $CMSName", [System.Diagnostics.EventLogEntryType]::Error, 404)
    return
}
# A regex match was found for the Active Node.
$ActiveNode = $matches.activenode
# This is the base registry key
$baseKey = 'HKLM:\SYSTEM\CurrentControlSet\Services\MSExchangeSA\Parameters\' + $CMSName.Name
# Get the registry value "EnableOabGenOnThisNode" and check it against what it should be.
$result = (Get-ItemProperty -Path $baseKey).EnableOabGenOnThisNode
if ($result -notlike $ActiveNode) {
    # The value is wrong, so set it to the currently-active node.
    Set-ItemProperty -Path $baseKey -Name "EnableOabGenOnThisNode" -Value "$ActiveNode"
    $result = $null
    # Get the new value. There could probably be a check that the key was
    # properly set, but I'm just trying to set it once.
    $result = (Get-ItemProperty -Path $baseKey).EnableOabGenOnThisNode
    # Log the change.
    $Log.WriteEntry("The registry value `"EnableOabGenOnThisNode`" on $CurrentNode has been set to $result", [System.Diagnostics.EventLogEntryType]::Warning, 200)
}
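If you want to try the active-node match from the script on its own, without touching Exchange, here is a minimal sketch. The function name Get-ActiveNodeName is hypothetical and not part of the script, and the sample entries are only my assumption of what OperationalMachines entries look like. It uses [regex]::Match with IgnoreCase instead of -match, so it doesn't depend on the automatic $matches variable.

```powershell
# Hypothetical helper for illustration; not part of the original script.
function Get-ActiveNodeName {
    param([string[]]$OperationalMachines)

    # Same pattern as the script: node name, whitespace, then "<Active...".
    $activePattern = '^(?<activenode>.*)\s+<Active.*'

    foreach ($entry in $OperationalMachines) {
        # IgnoreCase mirrors the case-insensitive behavior of -match.
        $m = [regex]::Match($entry, $activePattern, 'IgnoreCase')
        if ($m.Success) {
            # Return the named capture from the first entry that matches.
            return $m.Groups['activenode'].Value
        }
    }

    # No entry was marked <Active...>.
    return $null
}

# Sample entries are assumed values, for illustration only.
$sample = 'NODE1 <Active, Quorum Owner>', 'NODE2'
Get-ActiveNodeName -OperationalMachines $sample
```

With the sample entries above, the function returns NODE1. If no entry is marked as active it returns nothing, which is the case the script logs as an error.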
This might be over-complicating matters; I haven’t decided yet. Either way, it works with Server 2008’s Task Scheduler. Speaking of, here’s the task’s XML export.
<?xml version="1.0" encoding="UTF-16"?>
<Task version="1.2" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
<RegistrationInfo>
<Date>2009-04-14T17:52:21.2602543</Date>
<Description>
This script sets the 'EnableOabGenOnThisNode' registry key
on a CCR node member to the active node in the cluster.
Reference
http://technet.microsoft.com/en-us/library/bb266910.aspx for
more information.
</Description>
</RegistrationInfo>
<Triggers>
<EventTrigger>
<Enabled>true</Enabled>
<Subscription>&lt;QueryList&gt;&lt;Query Id="0" Path="Application"&gt;&lt;Select Path="Application"&gt;*[System[Provider[@Name='MSExchange Cluster'] and EventID=1028]]&lt;/Select&gt;&lt;/Query&gt;&lt;/QueryList&gt;</Subscription>
<Delay>PT15M</Delay>
</EventTrigger>
<EventTrigger>
<Enabled>true</Enabled>
<Subscription>&lt;QueryList&gt;&lt;Query Id="0" Path="Application"&gt;&lt;Select Path="Application"&gt;*[System[Provider[@Name='MSExchange Cluster'] and EventID=1029]]&lt;/Select&gt;&lt;/Query&gt;&lt;/QueryList&gt;</Subscription>
<Delay>PT15M</Delay>
</EventTrigger>
</Triggers>
<Principals>
<Principal id="Author">
<LogonType>Password</LogonType>
<RunLevel>LeastPrivilege</RunLevel>
</Principal>
</Principals>
<Settings>
<IdleSettings>
<Duration>PT10M</Duration>
<WaitTimeout>PT1H</WaitTimeout>
<StopOnIdleEnd>true</StopOnIdleEnd>
<RestartOnIdle>false</RestartOnIdle>
</IdleSettings>
<MultipleInstancesPolicy>Queue</MultipleInstancesPolicy>
<DisallowStartIfOnBatteries>false</DisallowStartIfOnBatteries>
<StopIfGoingOnBatteries>true</StopIfGoingOnBatteries>
<AllowHardTerminate>true</AllowHardTerminate>
<StartWhenAvailable>true</StartWhenAvailable>
<RunOnlyIfNetworkAvailable>false</RunOnlyIfNetworkAvailable>
<AllowStartOnDemand>true</AllowStartOnDemand>
<Enabled>true</Enabled>
<Hidden>false</Hidden>
<RunOnlyIfIdle>false</RunOnlyIfIdle>
<WakeToRun>false</WakeToRun>
<ExecutionTimeLimit>P3D</ExecutionTimeLimit>
<Priority>7</Priority>
</Settings>
<Actions Context="Author">
<Exec>
<Command>C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe</Command>
<Arguments>-PSConsoleFile "D:\Program Files\Microsoft\Exchange Server\bin\exshell.psc1" D:\Scripts\Enable-OabGenOnThisNode.ps1</Arguments>
</Exec>
</Actions>
</Task>
You’ll want to change the path to the script (look for D:\Scripts\Enable-OabGenOnThisNode.ps1 in the <Arguments> element), either by editing the XML file or by changing it after you import the task. Also, make sure you change the path to the Exchange console file (D:\Program Files\Microsoft\Exchange Server\bin\exshell.psc1, in the same <Arguments> element) to wherever you installed Exchange. And finally, have the task run as a privileged user (probably an account with local Administrator rights, so it can modify the registry) and select “Run whether user is logged on or not” on the General tab. Put the script and the task on both nodes, and any time the CMS is moved onto or off of either node, they’ll update the registry value once the trigger delay has passed (PT15M, or 15 minutes, in the export above).
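If you’d rather import the task from a prompt than through the Task Scheduler UI, something like the following should work on Server 2008. Treat it as a sketch: the task name, the path to the saved XML file, and the account name are all placeholders I made up, so substitute your own.

```powershell
# All three values below are placeholders (assumptions); change them for your environment.
$taskName = 'Enable-OabGenOnThisNode'
$xmlPath  = 'D:\Scripts\Enable-OabGenOnThisNode.xml'
$runAs    = 'DOMAIN\svc-oabgen'

# /Create with /XML registers the task from the exported definition.
# /RU sets the account the task runs as; with no /RP given, schtasks
# should prompt for that account's password.
schtasks.exe /Create /TN $taskName /XML $xmlPath /RU $runAs
```

Run it from an elevated prompt on each node, then check the task’s General tab to confirm the account and the “Run whether user is logged on or not” setting.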
Since this is a work in progress, I may update this post with new versions of the script. If (or when) I do, I’ll make a note of what changes have been made.
EDIT 21-Jul-2010: General housekeeping, no real changes.