We are migrating our main file server from Windows Server 2003 R2 to Windows Server 2008.
The plan is to use DFSR to keep the data in sync while we migrate from one file server
to the other. The data consists of two main shares with a combined fifty thousand
subfolders and nearly half a million files. Using DFSR to keep the two servers in
sync lets us move users over gradually while keeping all data current on both
servers. This worked very well last year when we migrated from one W2K3 R2 server
to another, although that data set was not as large.
We set up replication on the new box (W2K8). We did not need a namespace or to
publish anything to AD, because we just want the folders to stay in sync, so we
created a replication group between the two servers. The current production
server (W2K3) was defined as the primary member of the replication group.
The initial replication took nearly a week. One caveat is that we back up the
current production server over the network using Backup Exec, with an agent on
the server. For the agent to recognize the shared folders to be backed up, the
DFS Replication service has to be stopped on the production server for the
duration of the backup. We automated this by setting the replication group
schedule to "No Replication" an hour before the backup job was due to start, and
then created a scheduled task on the production server to stop the DFS
Replication service 45 minutes later. The reasoning was that this gave DFSR a
45-minute window with no new replication in which to finish whatever it was in
the middle of before the service on the production server was stopped. Each
morning we restarted the DFSR service on the production server.
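To make the nightly timing concrete, here is a small sketch of the arithmetic, assuming a hypothetical 10:00 PM backup start (the actual start time is not given above). It shows that the routine leaves a 45-minute drain window and stops the service 15 minutes before the backup begins.

```python
from datetime import datetime, timedelta

# Hypothetical backup start time; the real Backup Exec job time is an assumption.
BACKUP_START = datetime(2009, 5, 11, 22, 0)


def nightly_timeline(backup_start):
    """Return the time of each step in the nightly routine relative to the backup."""
    # Replication group schedule switches to "No Replication" one hour before the backup.
    schedule_off = backup_start - timedelta(hours=1)
    # The scheduled task stops the DFS Replication service 45 minutes after that.
    service_stop = schedule_off + timedelta(minutes=45)
    return {
        "schedule_off": schedule_off,
        "service_stop": service_stop,
        # Time DFSR has to finish in-flight work with no new replication scheduled.
        "drain_window": service_stop - schedule_off,
        # Margin between stopping the service and the backup job starting.
        "lead_before_backup": backup_start - service_stop,
    }


if __name__ == "__main__":
    for step, value in nightly_timeline(BACKUP_START).items():
        print(f"{step}: {value}")
```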
We started this process on Monday, May 11th, and it finished over the weekend
of May 16–17. We used the dfsrdiag backlog command to keep an eye on how many
files were still waiting to be replicated. By Sunday evening, all four backlog
checks (both primary shares, in each direction between the servers) showed that
everything was fully replicated. There was no backup job on Sunday night, so
we kept replication running through Monday the 18th.
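The "four backlog checks" are one per share per direction. A minimal sketch that builds those four command lines, assuming hypothetical group, share and server names (substitute the real ones). /rgname, /rfname, /smem and /rmem are the dfsrdiag backlog parameters for the replication group, replicated folder, sending member and receiving member; the sketch only builds the commands and does not attempt to parse dfsrdiag's output.

```python
from itertools import permutations

# Hypothetical names; replace with the real replication group, replicated
# folders (the two shares) and member server names.
REPLICATION_GROUP = "FS-Migration"
REPLICATED_FOLDERS = ["Share1", "Share2"]
MEMBERS = ["OLD-FS-2003", "NEW-FS-2008"]


def backlog_commands(group, folders, members):
    """Build one dfsrdiag backlog command per replicated folder and per direction."""
    commands = []
    for folder in folders:
        # permutations(..., 2) yields every (sender, receiver) ordered pair,
        # i.e. both directions between the two servers.
        for sender, receiver in permutations(members, 2):
            commands.append([
                "dfsrdiag", "backlog",
                f"/rgname:{group}",
                f"/rfname:{folder}",
                f"/smem:{sender}",
                f"/rmem:{receiver}",
            ])
    return commands


if __name__ == "__main__":
    for cmd in backlog_commands(REPLICATION_GROUP, REPLICATED_FOLDERS, MEMBERS):
        print(" ".join(cmd))
```

On a server with the DFS management tools installed, each list could be passed to `subprocess.run` to run the checks in one go.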
On the afternoon of Monday the 18th, the production file server locked up
completely and users could not save files. Both processors were pegged at
100%, with DFSR.exe and the executable of our antivirus program consuming all
of the CPU, so we had to kill the DFSR and antivirus processes. Processor
usage returned to normal and our users were able to work again.
We left replication turned off until the next morning. In the interim, we
disabled the antivirus software on both the current production server and the
new server. We then resumed replication. When we monitored it with dfsrdiag
backlog, it appeared that the entire replication process had started over:
hundreds of thousands of files were backlogged in each direction on each
share. We considered stopping everything, but decided to let it continue.
Because replication had to be shut down for 10 hours every evening, the
process once again ran until the weekend. On Saturday night all four backlog
checks showed that everything was fully replicated. There were a few oddities
along the way, but I am not sure they are important and this post is long
enough already.
As of today, both servers appear to be in sync. However, the DfsrPrivate
folders for both shares on both servers contain quite a bit of data. On the
production 2003 server, they are 8 GB and 4 GB. On the 2008 server, I cannot
get Explorer to show any information about the System Volume Information
folder. I granted the local Administrators group read permission on the
folder, and from the command prompt I found that the Staging and
ConflictAndDeleted folders contain many hundreds of entries. (If anyone has a
tip on a command that will show me the size of these folders, or on how to see
this information in Windows Explorer, it would be greatly appreciated.)
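One way to get the size from the command line is a short script that walks a folder and totals its file sizes. This is a minimal sketch, assuming Python is available on the server; the default path below is hypothetical, so pass the real Staging or ConflictAndDeleted path as an argument. It must run under an account that can read the folder, and any directory it cannot list is reported on stderr rather than silently dropped, since a low total may simply mean access was denied.

```python
import os
import sys


def folder_size(path):
    """Return (total bytes, file count) for every file under path."""
    total_bytes = 0
    file_count = 0

    def report(err):
        # os.walk calls this for directories it cannot list (e.g. access denied).
        print(f"skipped: {err}", file=sys.stderr)

    for root, _dirs, files in os.walk(path, onerror=report):
        for name in files:
            full = os.path.join(root, name)
            try:
                total_bytes += os.path.getsize(full)
                file_count += 1
            except OSError:
                # File vanished or could not be read while walking; skip it.
                pass
    return total_bytes, file_count


if __name__ == "__main__":
    # Hypothetical default; supply the actual folder path on the command line.
    target = sys.argv[1] if len(sys.argv) > 1 else r"D:\Share1\DfsrPrivate\Staging"
    size, count = folder_size(target)
    print(f"{target}: {count} files, {size / 1024**3:.2f} GB")
```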
We are assuming that much of this data is left over from the initial
replication attempt, before the production server locked up. So our dilemma is
what to do with all of this. Is there any reason we need to keep several GB of
these files, which appear to be no longer needed now that the servers are fully
replicated? In one folder in particular, the Installing folder contains what
appears to be a 3 GB .pst file. Its timestamp is around the time of our
freeze-up on the 18th. This .pst file does exist on both servers at its correct
size. What process can we use to get rid of this file? Also, while looking
around I came across a blog post that describes how to clean out the
ConflictAndDeleted folder:
http://blogs.technet.com/askds/archive/2008/10/06/manually-clearing-the-conflictanddeleted-folder-in-dfsr.aspx
What about the folders and files under the Staging folder? Is there any way to
clean these up?
Finally, once we are done with the old server, can someone point me to
instructions on how to COMPLETELY remove the replication group from the two
servers? I have read that even after the replication group is removed, some
data is still kept on the servers.
Any assistance would be greatly appreciated. Thank you in advance.
E_K