I have been trying to understand a domain controller’s initial synchronization requirements, which led me to write up this post. I was reading the Microsoft Forest Recovery white paper, which specifically states that when restoring a Windows 2008 DC that holds a FSMO role, initial synchronization should be disabled or else AD DS will be unavailable on that domain controller.
This got me wondering exactly what the requirements for initial synchronization are, and whether a DC would really not advertise itself if initial synchronization did not complete at startup. In my experiments, I found that this was not the case. Read on for further explanation.
Initial Synchronization of FSMO Owners
When a DC that owns a FSMO role boots up, it must complete inbound replication with its known replication partners before it will operate as the FSMO master. Specifically, it must replicate the partition that contains the FSMO role the DC owns. For example, if a DC holds the Domain Naming master role, it must successfully replicate the Configuration partition of Active Directory, and if it holds the Schema master role, it must successfully replicate the Schema partition. Similarly, if the DC holds one or more of the domain-specific FSMO roles (RID, PDC, Infrastructure), then that DC must successfully replicate the domain partition at startup before it will function as that operations master. See: http://support.microsoft.com/kb/305476
I have a domain of 3 Windows 2003 SP2 DCs, and I tested this requirement by booting a DC that owns all FSMO roles while all other DCs were powered down. The machine booted OK, but initially I could not open AD Users and Computers because it said the domain may not be online. After a short delay it was accessible, so I am not sure whether this was caused by the replication failure or whether my virtual machine was just running slow that day. Anyway, I could create new accounts and log on to client machines normally (GPOs, new accounts, etc. all worked). I did not see the event ID 1555 warning that initial synchronization had not finished, as I thought I would (I did end up seeing this on Windows 2008; see the update at the end for more info). I created a lot of new accounts to exhaust the RID pool and, sure enough, the DC could not obtain a new pool and logged event ID 16651. I would also like to note that despite this, dcdiag /test:ridManager reported that it passed successfully.
Also, after a period of time, event ID 2092 was logged saying:
This server is the owner of the following FSMO role, but does not consider it valid. For the partition which contains the FSMO, this server has not replicated successfully with any of its partners since this server has been restarted. Replication errors are preventing validation of this role.
Operations which require contacting a FSMO operation master will fail until this condition is corrected.
This was logged for each FSMO role held. Finally, I booted another DC and AD was accessible again on the FSMO machine. However, the FSMO DC did not resume FSMO operations until I forced replication to take place.
The purpose of this initial synchronization is to prevent the problems that occur when more than one DC claims to be the owner of an operations master role. By performing a replication at startup, the starting DC can see if there has been a change in FSMO ownership that it may not have been aware of since it was powered off. If it detects a change in ownership of the role, it will no longer act as the owner of that FSMO role.
Now, initial synchronization can be helpful to prevent multiple DCs from holding the same FSMO role in some cases, but it is not fool-proof. To confirm this, I created two sites in my forest: Site-A and Site-B. I moved DC1 (the owner of all FSMO roles and the only GC) and DC2 into Site-A and moved DC3 into Site-B.
Next, I took DC1 offline and used DC3 to seize the RID Master role. This went fine and DC3 now owned the RID Master role, or so it thought. See, because my DCs are in separate sites now, DC2 will not know about the change in the RID Master ownership until replication between sites takes place (every 3 hours by default).
Now, recall how initial synchronization works. A DC that is a FSMO owner replicates with its known replication partners at startup to check for changes in FSMO ownership. When I powered DC1 (the original RID master) back up, as long as DC1 replicated with a partner DC that knew about the RID role changing ownership, DC1 would surrender the role and I would avoid having two DCs that both thought they were RID masters.
In my case, however, I made certain that DC1’s only replication partner was DC2 (DC2 was actually the bridgehead for Site-A). Because of this, when DC1 started up, it performed its initial synchronization by replicating with DC2. The only problem is that DC2 had still not replicated with DC3 in Site-B and thus still did not know about the RID role changing. As a result, I ended up with two DCs in my domain that both thought they were RID Masters, and both were capable of assigning pools of RIDs. Not good!
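To make the failure mode easier to follow, here is a toy Python model of the scenario above. It is only a sketch: the DC class, its methods, and the single version counter are my own simplifications for illustration, not how AD actually tracks replication metadata.

```python
# Toy model of the two-site test; all names here are hypothetical.
class DC:
    def __init__(self, name, rid_owner, version=1):
        self.name = name
        # This DC's local view of who owns the RID master role, plus a
        # version counter standing in for real replication metadata.
        self.rid_owner = rid_owner
        self.version = version

    def seize_rid_role(self):
        # Seizing writes a newer ownership record locally.
        self.rid_owner = self.name
        self.version += 1

    def replicate_from(self, partner):
        # Inbound replication: accept the partner's view only if it is newer.
        if partner.version > self.version:
            self.rid_owner = partner.rid_owner
            self.version = partner.version

    def acts_as_rid_master(self):
        return self.rid_owner == self.name


def run_scenario(intersite_replicated_first):
    """Return the names of the DCs that end up acting as RID master."""
    dc1 = DC("DC1", "DC1")  # original RID master, Site-A
    dc2 = DC("DC2", "DC1")  # Site-A bridgehead, DC1's only partner
    dc3 = DC("DC3", "DC1")  # Site-B

    # DC1 is offline while DC3 seizes the role.
    dc3.seize_rid_role()

    # Inter-site replication may or may not happen before DC1 boots.
    if intersite_replicated_first:
        dc2.replicate_from(dc3)

    # DC1 boots and performs initial synchronization with DC2 only.
    dc1.replicate_from(dc2)
    return [dc.name for dc in (dc1, dc2, dc3) if dc.acts_as_rid_master()]
```

Calling run_scenario(False) models what I did: DC2 has nothing new to offer, so DC1 keeps the role alongside DC3. Calling run_scenario(True) models the case where DC2 heard about the seizure first, and DC1 gives the role up.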
I would also like to emphasize again that when a DC holding a FSMO role starts, the initial synchronization will be performed with that DC’s known replication partners. If those DCs are offline, then the DC owning the FSMO role will not act as the owner of the role until the KCC rearranges the replication topology and replication is able to take place.
Disabling Initial Synchronization
I was under the impression that you can bypass the requirement that DCs holding FSMO roles must complete successful initial synchronization by setting the following registry key:
HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Repl Perform Initial Synchronizations
Create the above entry as a REG_DWORD with a value of 0 to disable initial synchronization.
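If you would rather script the change than use regedit, here is a minimal Python sketch that shells out to the built-in reg.exe tool. The function name and the apply flag are my own for illustration; by default it only returns the command so you can inspect it first, and it should only be applied on the DC itself from an elevated prompt.

```python
import subprocess

# Key path and value name as given in this post.
KEY = r"HKLM\System\CurrentControlSet\Services\NTDS\Parameters"
VALUE = "Repl Perform Initial Synchronizations"


def set_initial_sync(enabled, apply=False):
    """Build (and optionally run) the reg.exe command for this value.

    enabled=False writes 0, enabled=True writes 1.
    """
    data = "1" if enabled else "0"
    cmd = ["reg", "add", KEY, "/v", VALUE, "/t", "REG_DWORD", "/d", data, "/f"]
    if apply:
        # Only meaningful on Windows; raises if reg.exe reports an error.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, set_initial_sync(False) returns the argument list for writing a value of 0 without touching the registry, and set_initial_sync(True, apply=True) would put the value back to 1.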
However, I tried this on my Windows 2003 machine and it did not seem to help. I did get an event ID 1564 in my event log.
But, when I tried exhausting the RID pool, I could not get a new pool as my RID master would not perform that function. I do not know if this is different in Windows 2008, because the MS Forest Recovery white paper specifically says to set this registry setting for a Windows 2008 DC to avoid AD DS being unavailable. I am going to try this next. (Please see the update at the end of this post for the results)
Three other options exist, however. You can perform a metadata cleanup on the FSMO DC to remove the other DCs from the forest. Or you could delete the incoming replication links from the FSMO owner to its partner DCs that contain the partition hosting the FSMO role. See “How to use the Repadmin.exe tool to troubleshoot initial synchronization issues” in http://support.microsoft.com/kb/305476. You can also use NTDSutil to seize the roles.
I tried deleting the replication links using repadmin /delete <dn of partition> <name of dc you want to delete the link on> <guid of replication partner>._msdcs.domain.tld
By deleting the replication link to the partner for the domain partition, my RID master was able to resume ownership of the role and hand out more pools of RIDs.
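To make the argument order concrete, here is a small Python sketch that assembles the repadmin /delete command line in the form I used above. The helper name is hypothetical, and the DN, host name, GUID, and DNS suffix in the usage note are made-up placeholders; it only builds the argument list and does not run anything.

```python
def build_repadmin_delete(partition_dn, dest_dc, partner_guid, dns_suffix):
    """Return the repadmin /delete arguments in the order used in this post.

    partition_dn: DN of the partition that hosts the FSMO role
    dest_dc:      DC on which the inbound link should be deleted
    partner_guid: GUID of the replication partner
    dns_suffix:   the domain.tld part of the partner's GUID-based name
    """
    source = f"{partner_guid}._msdcs.{dns_suffix}"
    return ["repadmin", "/delete", partition_dn, dest_dc, source]
```

With placeholder values, build_repadmin_delete("DC=example,DC=com", "dc1.example.com", "11111111-2222-3333-4444-555555555555", "example.com") yields a last argument of 11111111-2222-3333-4444-555555555555._msdcs.example.com.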
As far as other non-FSMO domain controllers, my understanding is that they do not need to perform any type of initial synchronization while starting to act as domain controllers. They will attempt to perform initial synchronization, but if they cannot replicate with known replication partners, they will not be prevented from advertising as DCs. I tested this by booting only one normal DC in an environment with a total of 3 DCs. This DC started OK and AD was accessible and I could make new users and GPOs. I tested logging on to a client machine successfully as well.
Update: I have tried testing initial synchronization requirements for FSMO owners on Windows 2008 R2. I set up 2 DCs; DC1 held all FSMO roles and DC2 held no roles other than DNS. Even with the “Repl Perform Initial Synchronizations” registry key set to 0, DC1 would still not perform FSMO functions if it could not replicate with a partner at startup. I received event ID 2092 as I did on Windows 2003. However, DC1 still performed normal DC operations, such as processing logon requests; I tried several times creating new accounts and logging in to a Windows 7 machine. Again, this is consistent with what happened on Windows 2003. So, I am not sure why the Forest Recovery white paper specifically says to set this registry key for a Windows 2008 machine. It appears the DC will continue to operate as a normal DC, and if you want to perform FSMO operations, event 2092 actually suggests you use NTDSutil to seize the roles (even if the local machine is the holder).
Update 2: I was just trying a domain recovery by restoring the first domain controller on an isolated network and I tried to seize the FSMO roles as suggested in event ID 2092 and it didn’t seem to work. I was able to seize all the roles except RID master. I was greeted with an error that the selected server is already the RID master so I’m not sure what the deal is with that.
Update 3: After re-reading the forest recovery white paper, it sounds to me like maybe the authors were trying to say that AD DS would not be available during the time that a domain controller is trying to perform initial synchronization (which could be quite a few minutes) as opposed to never being available at all which is how I initially interpreted it. Here is the relevant passage from the white paper:
If the first domain controller runs Windows Server 2008, add the following registry entry to avoid AD DS being unavailable until it has completed replication of a writeable directory partition. Unless you add this registry entry, you may see Event ID 1555 in the Directory Services log of the Windows Server 2008 domain controller, which indicates that AD DS is not available. The registry entry to add is the following:
HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Repl Perform Initial Synchronizations
Create the entry with the data type REG_DWORD and a value of 0. After the forest is recovered completely, you can reset the value of this entry to 1, which requires a domain controller that restarts and holds operations master roles to have successful AD DS inbound and outbound replication with its known replica partners before it advertises itself as domain controller and starts providing services to clients.
Emphasis mine. Regardless, the “Repl Perform Initial Synchronizations” registry key does not seem to bypass the requirement that FSMO owners replicate with known partners before they will provide FSMO services. Instead, it seems that it can simply be used on any DC to stop it from trying to replicate with its partners when it starts up.
An operations master does not synchronize when a Windows 2000 Server-based or Windows Server 2003-based computer is started
Initial synchronization requirements for Windows 2000 Server and Windows Server 2003 operations master role holders