GAPTHEGURU

Geek with special skills

Recovering Public Folders After Accidental Deletion

Part 1: Recovery Process

Overview

This two-part blog series will outline some of the recovery options available to administrators in the event that one or more public folders are accidentally deleted from the environment. The first part will explain the options, while the second part will outline the architectural aspects of public folders that drive the options.

Introduction

In older versions of Exchange, mailbox and mailbox database recovery was a long, complicated process involving backups, recovery servers, and changes to Active Directory. Successive versions of the product have introduced more and more functionality around recovery (recovery storage groups/databases, database replication, etc.), and we’re now at the point where restoring a mailbox is a seemingly trivial operation, and restoring a mailbox database is almost unheard of. But mailboxes aren’t the only data stored on Mailbox servers in Exchange Server 2010, and the procedure for restoring public folders and public folder databases differs greatly from the mailbox procedure.

Review of Recovery Options

The first two recovery options are detailed either in TechNet or elsewhere on the Exchange team blog site, so I’ll simply list them here and then move on to the real purpose of this blog.  The recovery options for public folders and public folder databases in Exchange Server 2010 are as follows, from the easiest to the most complex:

  1. Recover deleted folders via Outlook (detailed in http://technet.microsoft.com/en-us/magazine/dd553036.aspx). Note: Exchange Server 2010 Service Pack 2 fixes an issue where users were unable to use Outlook to recover deleted public folders. This is another reason to upgrade your Exchange Server 2010 systems to SP2 at the earliest opportunity.
  2. Recover deleted folders via ExFolders (http://blogs.technet.com/b/exchange/archive/2009/12/04/3408943.aspx).
  3. Recover folders via public folder database restore.

The first option is the easiest and most obvious – if an end user accidentally deletes a folder, he or she should be able to undelete that folder using Outlook. Failing that, an administrator should be able to use ExFolders to recover that folder. But what if these options won’t work in your situation? What if the end user didn’t realize he or she deleted the folder, and a month has passed? Or what if your organization has changed the retention settings for deleted public folders, and essentially eliminated the dumpster?  How do you recover public folders in this case?

Recovery Options

At the heart of public folder recovery is a painful truth: you can’t delete a public folder from the organization and recover it by simply restoring an older version of a public folder database. If you restore a public folder database from backup and place it back into production, you’ll see the public folders only until the server receives replication messages. Because the public folder hierarchy – the list of all folders in the environment – no longer includes the folders which were deleted, the target server has copies of folders which, from Exchange’s perspective, don’t exist. As soon as that public folder database receives a hierarchy update, it will see that those public folders aren’t present in the hierarchy, and the store will delete the public folder again. Since you can’t edit the hierarchy via the Public Folder Management Console (or even via adsiedit.msc), you can’t manually add that public folder back in. So, given this limitation, how do we recover that public folder?
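The behavior described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the function `apply_hierarchy_update`, the FID values, and the dictionaries are hypothetical, and the real store logic is considerably more involved.

```python
def apply_hierarchy_update(local_folders, hierarchy_fids):
    """Reconcile a store's folders with the replicated hierarchy.

    local_folders: dict mapping folder ID (FID) -> folder name held by this store.
    hierarchy_fids: set of FIDs that still exist in the organization-wide hierarchy.
    Returns (kept, purged) dictionaries.
    """
    kept = {fid: name for fid, name in local_folders.items() if fid in hierarchy_fids}
    purged = {fid: name for fid, name in local_folders.items() if fid not in hierarchy_fids}
    return kept, purged


# Hypothetical state: Folder1 (FID 1-123) was deleted from the organization,
# so the current hierarchy no longer lists it.
current_hierarchy = {"1-200"}

# The database restored from backup still contains the deleted folder.
restored_db = {"1-123": "Folder1", "1-200": "Folder2"}

kept, purged = apply_hierarchy_update(restored_db, current_hierarchy)
print("kept:", kept)
print("purged:", purged)
```

In this model the restored copy of Folder1 is purged as soon as the hierarchy update arrives, which is the reason a plain database restore is not enough.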

Consider the following points:

  • If you don’t replicate every folder to every database, you would need to delete all current databases and then recover from backup any database that contains unique content.  This only works if you have recent backups, of course, and would also require that you export any content generated since that backup, since you’re going to delete all of the existing databases. The deletion is necessary because if a restored public folder store receives hierarchy replication from one of the existing public folder stores, the whole exercise is for naught.
  • If you do replicate all folders to all stores in the environment, you can delete all stores and just restore one database, then replicate the content from that database out to the other servers. Again, this depends on all databases having duplicate content, and you must delete all existing databases before restoring the one from backup.
  • You can restore a backup of the public folder database to an isolated Exchange environment, connect to the public folder database with Outlook, export all content to a series of PSTs, create new folders in the production environment with the same names as the deleted folders, and then import all of the content. This is obviously a somewhat manual process, and most administrators aren’t going to want to do this.

Recommended Recovery Procedure

Thankfully there is a much easier process which can be performed in-place and with a minimum of fuss.

  1. Select one of the existing public folder servers in the environment. [Using an existing server simplifies the process a bit.] You will isolate this system from its replication partners, so choose a system that doesn’t serve as the source for a lot of content which needs to be replicated.
  2. Using Registry Editor, set the value of the Replication registry key (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MSExchangeIS\<servername>\Public-<GUID of Public Store>) to 0 (zero). Note: You may need to create this DWORD value if it doesn’t already exist. Further information on the Replication registry key is available in the article, “Replication does not occur for one Exchange server in the organization” (http://support.microsoft.com/kb/812294). This registry key also applies to Exchange Server 2007 and 2010.
  3. Restore the public folder database in place using your normal restoration procedure.
  4. Using an Outlook client, log on to a mailbox which uses the restored public folder database as its default public folder store (this is necessary in order to see the restored folders). If you don’t have a mailbox database which uses that public folder database as its default, either create a new mailbox database (recommended) or change an existing mailbox database to use the newly-restored public folder database.
  5. If necessary, click the Folders icon at bottom left of the Navigation screen, and then expand the public folders node.
  6. Copy each of the folders you wish to restore to another location within the public folder hierarchy. If you’re restoring an entire hierarchy, you can simply Ctrl-click and drag the root folder to make new copies of all subfolders. Although the new folders will have similar names to the originals, the underlying folder IDs (FIDs) are different.
  7. Once you’ve created copies of all of the folders, verify that the replica lists include all desired targets (and reconfigure as appropriate).
  8. At this point, it’s now safe to reintroduce that server into the production environment. To do so, dismount the public folder database, delete the Replication registry key (or set it to 1), and then remount the database.
  9. As soon as hierarchy is replicated to the server, the original folders will once again disappear, but the copies of the folders will be replicated to all replication partners.

You may need to add mail-enabled public folders back into distribution groups, as their SMTP addresses will likely be different from those on the original folders. End users will also need to recreate public folder favorites in Outlook.
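For administrators who prefer to script the registry change in steps 2 and 8 rather than click through Registry Editor, the following Python sketch generates the text of a .reg file for the Replication value. The helper name `replication_reg_file`, the server name EXCH01, and the all-zero GUID are hypothetical placeholders; substitute the real server name and public store GUID from your environment, and review the output before importing it.

```python
def replication_reg_file(server_name, store_guid, enabled):
    """Build .reg file text that sets the Replication DWORD for one public store.

    server_name and store_guid are supplied by the caller (placeholders here).
    enabled=False writes 0 (isolate the store); enabled=True writes 1.
    """
    key = "\\".join([
        "HKEY_LOCAL_MACHINE", "System", "CurrentControlSet", "Services",
        "MSExchangeIS", server_name, f"Public-{store_guid}",
    ])
    value = 1 if enabled else 0
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key}]\n"
        f'"Replication"=dword:{value:08x}\n'
    )


# Hypothetical server name and GUID, for illustration only.
reg_text = replication_reg_file(
    "EXCH01", "00000000-0000-0000-0000-000000000000", enabled=False
)
print(reg_text)
```

Generating the file rather than editing the registry directly keeps a record of exactly what was changed, which makes it easier to reverse the setting in step 8.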

Summary

Recovering from accidental public folder deletion can be difficult, especially if you don’t take hierarchy replication into account. By restoring into an isolated environment, and then cloning the folders to be restored, you can work around this limitation and restore the missing content. In the next blog entry, I’ll explain the underlying architecture of public folders (including replication, change numbers, and the replication state table) to show why these steps are so necessary.

———————————————————————————————————————————————————————————————————————————————————————————-

Part 2: Public Folder Architecture

 

Introduction

In this second part, I’m going to describe some of the inner workings of public folders themselves.  Each organization maintains a list of all public folders in the environment, along with the servers that host replicas of each folder.  This list is called the hierarchy, and it’s common to all public folder stores in the environment.  Each public folder store has a copy of the hierarchy, and uses it to provide referrals to end users for public folder replicas on other servers (among other things).  Each public folder store also maintains a table, called the replication state table, which keeps track of the status of each folder.  This table is a critical yet little understood feature of public folders, and it has a huge impact on recovery.

Overview

As I said above, each public folder store maintains a replication state table, but unlike the hierarchy, it’s unique to each store.  A public folder store maintains information about the public folders for which it has a replica, not just for itself but for all servers with that replica.  It does this so that it knows which other stores have more up-to-date public folder content, or which ones might have items required for backfill replication (catching up on old or missing items).

Imagine the following scenario:  we have three servers, each hosting a public folder database – PFS1, PFS2, and PFS3.  We have a folder – Folder1 – which is replicated to each database.  If I could peer into the replication state table for PFS1, I would see an entry for Folder1, and that entry would contain information about Folder1’s status not only for PFS1, but also for PFS2 and PFS3.  What kind of information does this table actually contain?  To answer that, we need to dig yet further into public folder structure, and talk about CNs.

Change Numbers

CNs – or, to give their full name, change numbers – are numbers assigned to each modification made to content in a public folder.  Think of them as per-folder odometers – they increment each time a change is made to a folder, and only increase, never decrease. Each public folder assigns CNs to the changes made on a given replica, and that information is transmitted to other replicas.  These other replicas use this information to see if they’ve already received a particular change.  For example, if I make a change to Folder1 on PFS1, that database might assign change number 211 to that modification.  When the public folder database replicates that change to other databases, it records and transmits that change as FID1-123:PFS1:211.  [Folder1 is represented within the public folder database, and by extension in the replication traffic, by a folder ID (FID). This becomes very important later.] PFS2 receives the replication message and checks to see if it has already received CN 211 from PFS1.  If it hasn’t, it applies the change and updates its own entry in the replication state table to reflect the fact that it has now received change 211 for Folder1 (FID1-123) from PFS1.  If PFS3 later replicates the same change (FID1-123:PFS1:211) to PFS2, PFS2 will check its list, see that it has indeed already received that change, and discard that particular replication message.
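The duplicate check described above can be sketched in a few lines of Python. The class `PublicFolderStore` and its `receive` method are hypothetical names for illustration, not Exchange internals; the sketch simply tracks which change numbers have been applied per folder and per originating server.

```python
class PublicFolderStore:
    """Toy model of one store's record of which changes it has already applied."""

    def __init__(self, name):
        self.name = name
        # (fid, origin_server) -> set of change numbers already applied
        self.received = {}

    def receive(self, fid, origin, cn):
        """Apply a replicated change once; return True if applied, False if discarded."""
        seen = self.received.setdefault((fid, origin), set())
        if cn in seen:
            return False  # this change was already applied: discard the message
        seen.add(cn)
        return True


pfs2 = PublicFolderStore("PFS2")
first = pfs2.receive("1-123", "PFS1", 211)   # arrives directly from PFS1
second = pfs2.receive("1-123", "PFS1", 211)  # same change relayed later by PFS3
print(first, second)
```

The first delivery is applied and the second is discarded, because the key is the combination of FID, originating server, and CN rather than the server that happened to relay the message.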

Here’s a sample hierarchy replication message. Notice the CN min, CN max, and FID entries in the description field.

Event Type: Information
Event Source: MSExchangeIS Public Store
Event Category: Replication Outgoing Messages
Event ID: 3018
Description:
An outgoing replication message was issued.
Type: 0x2
Message ID: <23599A0EB070AA92F03E31C546C9C8FFA4F7@contoso.com>
Database “PFDB”
CN min: 1-11D3, CN max: 1-11D4
RFIs: 1
1) FID: 1-38BF, PFID: 1-1, Offset: 28
IPM_SUBTREE\TestPF

At any given time, each public folder store knows exactly what content it has, and has a general idea of what content the other public folder stores have.  This is an important point – public folder databases are aware of their surroundings.  It’s this awareness that has implications for recovery.

The Replication State Table

Here’s a quick visualization of how a public folder change is propagated from one server to another. This table simulates the replication state table which is internal to every server. There are four columns – the first identifies the server the replication details (the CN sets) came from, and the next three represent the same folder on each of the three servers. In essence, this table shows you what each server knows about the other servers’ knowledge of this particular folder. Please note that this is a simplified version of the replication state table – it’s actually quite a bit more complicated than this, but this is all the detail 99.99% of engineers will ever need.

In this example, Folder1 has been replicated to three systems – PFS1, PFS2, and PFS3 – and public folder replication is fully up-to-date. The servers know what they’ve sent to their replication partners, and what’s been replicated back to them. Since end users could conceivably make updates on any of the servers, they each have their own CN sets for the same folder.

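Details From | Folder1 on PFS1      | Folder1 on PFS2      | Folder1 on PFS3
-------------+----------------------+----------------------+----------------------
PFS1         | Last sent CN PFS1:10 | FID1-123:PFS1:1-10   | FID1-123:PFS1:1-10
             |                      | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20
             |                      | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS2         | FID1-123:PFS1:1-10   | Last sent CN PFS2:20 | FID1-123:PFS1:1-10
             | FID1-123:PFS2:1-20   |                      | FID1-123:PFS2:1-20
             | FID1-123:PFS3:1-30   |                      | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS3         | FID1-123:PFS1:1-10   | FID1-123:PFS1:1-10   | Last sent CN PFS3:30
             | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20   |
             | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30   |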

An end user connected to PFS1 makes a change, to which PFS1 assigns change number 11. The replication state table on PFS1 is updated to reflect this new CN.

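Details From | Folder1 on PFS1      | Folder1 on PFS2      | Folder1 on PFS3
-------------+----------------------+----------------------+----------------------
PFS1         | Last sent CN PFS1:11 | FID1-123:PFS1:1-10   | FID1-123:PFS1:1-10
             |                      | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20
             |                      | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS2         | FID1-123:PFS1:1-10   | Last sent CN PFS2:20 | FID1-123:PFS1:1-10
             | FID1-123:PFS2:1-20   |                      | FID1-123:PFS2:1-20
             | FID1-123:PFS3:1-30   |                      | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS3         | FID1-123:PFS1:1-10   | FID1-123:PFS1:1-10   | Last sent CN PFS3:30
             | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20   |
             | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30   |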

PFS1 packages this change (which we assume is the only one made to Folder1) and sends it to PFS2 and PFS3, which update their own replication state tables.

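Details From | Folder1 on PFS1      | Folder1 on PFS2      | Folder1 on PFS3
-------------+----------------------+----------------------+----------------------
PFS1         | Last sent CN PFS1:11 | FID1-123:PFS1:1-11   | FID1-123:PFS1:1-11
             |                      | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20
             |                      | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS2         | FID1-123:PFS1:1-10   | Last sent CN PFS2:20 | FID1-123:PFS1:1-10
             | FID1-123:PFS2:1-20   |                      | FID1-123:PFS2:1-20
             | FID1-123:PFS3:1-30   |                      | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS3         | FID1-123:PFS1:1-10   | FID1-123:PFS1:1-10   | Last sent CN PFS3:30
             | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20   |
             | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30   |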

Both PFS2 and PFS3 apply the changes, and since those two systems received the change from PFS1, they also update their “knowledge” of PFS1. Notice that PFS1 does not update its entries for PFS2 and PFS3 – while it has sent the content to them, it hasn’t received confirmation that they’ve applied that change. [Because public folder replication messages are delivered via Hub Transport, public folder stores don’t directly interact and so never assume that the updates were delivered and applied.]

Continuing with our example, an end user makes a change to Folder1 on PFS3:

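Details From | Folder1 on PFS1      | Folder1 on PFS2      | Folder1 on PFS3
-------------+----------------------+----------------------+----------------------
PFS1         | Last sent CN PFS1:11 | FID1-123:PFS1:1-11   | FID1-123:PFS1:1-11
             |                      | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20
             |                      | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS2         | FID1-123:PFS1:1-10   | Last sent CN PFS2:20 | FID1-123:PFS1:1-10
             | FID1-123:PFS2:1-20   |                      | FID1-123:PFS2:1-20
             | FID1-123:PFS3:1-30   |                      | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS3         | FID1-123:PFS1:1-10   | FID1-123:PFS1:1-10   | Last sent CN PFS3:31
             | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20   |
             | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30   |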

That change is now replicated to PFS1 and PFS2:

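Details From | Folder1 on PFS1      | Folder1 on PFS2      | Folder1 on PFS3
-------------+----------------------+----------------------+----------------------
PFS1         | Last sent CN PFS1:11 | FID1-123:PFS1:1-11   | FID1-123:PFS1:1-11
             |                      | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20
             |                      | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS2         | FID1-123:PFS1:1-10   | Last sent CN PFS2:20 | FID1-123:PFS1:1-10
             | FID1-123:PFS2:1-20   |                      | FID1-123:PFS2:1-20
             | FID1-123:PFS3:1-30   |                      | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+----------------------
PFS3         | FID1-123:PFS1:1-11   | FID1-123:PFS1:1-11   | Last sent CN PFS3:31
             | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20   |
             | FID1-123:PFS3:1-31   | FID1-123:PFS3:1-31   |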

Note that when PFS3 sent out its replication message, it included not only its own update, but also the fact that it had received update 11 from PFS1.

Again, while every server has the most up-to-date content for Folder1, they don’t necessarily know that every replica is up-to-date. [PFS1, for example, “thinks” that PFS2 is out of date, while PFS3 “thinks” that both PFS1 and PFS2 are out of date.] It’s important to note that this isn’t a problem – by only encapsulating status messages in outgoing replication, Exchange avoids saturating the network with constant messages from various servers confirming the receipt of recent replication messages.
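The walkthrough above can be reproduced with a toy simulation. This Python sketch rests on several assumptions: the `Store` class and its methods are hypothetical, each CN set is reduced to its highest number, and every message carries a snapshot of the sender's own status. It is meant only to show why each server's view of its partners lags behind reality.

```python
SERVERS = ["PFS1", "PFS2", "PFS3"]
START = {"PFS1": 10, "PFS2": 20, "PFS3": 30}  # highest CN seen from each origin


class Store:
    def __init__(self, name):
        self.name = name
        self.own = dict(START)  # what this store itself holds
        # What this store believes each partner holds.
        self.peers = {s: dict(START) for s in SERVERS if s != name}

    def local_change(self):
        """Assign the next CN to a change made on this replica."""
        self.own[self.name] += 1

    def send(self, partners):
        """Ship the newest change plus a snapshot of this store's own status."""
        status = dict(self.own)
        for partner in partners:
            partner.receive(self.name, status)

    def receive(self, sender, status):
        # Apply the sender's new change and record the sender's reported status.
        self.own[sender] = max(self.own[sender], status[sender])
        self.peers[sender] = dict(status)

    def thinks_out_of_date(self):
        """Partners whose last reported status differs from this store's content."""
        return sorted(p for p, known in self.peers.items() if known != self.own)


stores = {n: Store(n) for n in SERVERS}
p1, p2, p3 = (stores[n] for n in SERVERS)

p1.local_change()      # change 11 on PFS1
p1.send([p2, p3])
p3.local_change()      # change 31 on PFS3
p3.send([p1, p2])

for s in (p1, p2, p3):
    print(s.name, "own:", s.own, "thinks out of date:", s.thinks_out_of_date())
```

After both changes, all three stores hold identical content, yet PFS1 still believes PFS2 is behind and PFS3 believes both partners are behind, because status only travels inside outgoing replication messages.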

Backfill Replication

However, from time to time, a server loses its connection to its replication partners, either through network failure, service failure, or other causes. When it does, its replication state table no longer receives updates to the CNs held by its partners for their replicas. In other words, its replication state table is outdated. When that server reconnects with its partners, and receives a new message, it may find that the CN on that new message is much higher than what it expected. Using the previous example, imagine that PFS3 is isolated from PFS1 and PFS2 due to a server failure, and does not receive updates to Folder1 from the other servers for several hours. The resulting table might look like this:

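Details From | Folder1 on PFS1      | Folder1 on PFS2      | Folder1 on PFS3 (OFFLINE)
-------------+----------------------+----------------------+---------------------------
PFS1         | Last sent CN PFS1:16 | FID1-123:PFS1:1-16   | FID1-123:PFS1:1-11
             |                      | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20
             |                      | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+---------------------------
PFS2         | FID1-123:PFS1:1-16   | Last sent CN PFS2:28 | FID1-123:PFS1:1-10
             | FID1-123:PFS2:1-28   |                      | FID1-123:PFS2:1-20
             | FID1-123:PFS3:1-30   |                      | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+---------------------------
PFS3         | FID1-123:PFS1:1-11   | FID1-123:PFS1:1-11   | Last sent CN PFS3:31
             | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20   |
             | FID1-123:PFS3:1-31   | FID1-123:PFS3:1-31   |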

Notice that PFS1 is aware that the most recent replication message from PFS2, for change number 28, also included information about PFS2’s knowledge of PFS1 (namely, that PFS2 has received PFS1’s update numbers 12 to 16). PFS3 has not received any of these recent updates.

However, when PFS3 is brought back online, and receives a new replication message, it suddenly learns of the missing messages. This triggers a backfill request: a request from PFS3 to the source server for the missing content.

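Details From | Folder1 on PFS1      | Folder1 on PFS2      | Folder1 on PFS3
-------------+----------------------+----------------------+-----------------------------
PFS1         | Last sent CN PFS1:17 | FID1-123:PFS1:1-17   | FID1-123:PFS1:1-11, 17
             |                      | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20
             |                      | FID1-123:PFS3:1-30   | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+-----------------------------
PFS2         | FID1-123:PFS1:1-16   | Last sent CN PFS2:28 | FID1-123:PFS1:1-16
             | FID1-123:PFS2:1-28   |                      | FID1-123:PFS2:1-28
             | FID1-123:PFS3:1-30   |                      | FID1-123:PFS3:1-30
-------------+----------------------+----------------------+-----------------------------
PFS3         | FID1-123:PFS1:1-11   | FID1-123:PFS1:1-11   | Last sent CN PFS3:31
             | FID1-123:PFS2:1-20   | FID1-123:PFS2:1-20   | Backfill Request PFS1:12-16
             | FID1-123:PFS3:1-31   | FID1-123:PFS3:1-31   | Backfill Request PFS2:21-28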

Notice that PFS3 is missing updates 12 through 16 for PFS1, and 21 through 28 for PFS2. PFS3 will request the missing content from any server that it believes has that content, which in this case would mean either PFS1 or PFS2. How does PFS3 know that both servers have the content? Because the replication message from PFS1, which included change number 17, included the information about the CN sets for PFS1, PFS2, and PFS3.
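The gap detection behind those two backfill requests amounts to a set difference. The Python sketch below is a simplified model with hypothetical function names; it assumes the incoming message from PFS1 advertises the CN sets 1-17 for PFS1, 1-28 for PFS2, and 1-31 for PFS3, which is an assumption chosen to match the requests shown in the table.

```python
def to_ranges(numbers):
    """Collapse a set of integers into sorted (start, end) ranges."""
    ranges = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)
        else:
            ranges.append((n, n))
    return ranges


def backfill_requests(held, advertised):
    """Compare the CNs a store holds with the CN sets advertised in a message."""
    requests = {}
    for origin, cns in advertised.items():
        gap = cns - held.get(origin, set())
        if gap:
            requests[origin] = to_ranges(gap)
    return requests


# PFS3 after coming back online and applying change 17 from PFS1.
held_by_pfs3 = {
    "PFS1": set(range(1, 12)) | {17},
    "PFS2": set(range(1, 21)),
    "PFS3": set(range(1, 32)),
}

# Assumed CN sets carried in PFS1's replication message.
advertised_by_pfs1 = {
    "PFS1": set(range(1, 18)),
    "PFS2": set(range(1, 29)),
    "PFS3": set(range(1, 32)),
}

requests = backfill_requests(held_by_pfs3, advertised_by_pfs1)
print(requests)
```

Modeling CN sets as sets rather than a single high-water mark matters here: PFS3 already holds change 17, so only 12 through 16 are requested from the PFS1 range.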

Strictly speaking, Exchange doesn’t issue these backfill requests right away – it waits a few hours (six or more, depending on the situation) before sending them out, just in case one of its replication partners happens to send that missing content. If a specific update hasn’t been received after the backfill timeout is reached, Exchange then generates that backfill request and sends it to the replication partners. This process is detailed in the “Backfill Requests and Backfill Messages” section of the TechNet page on “Understanding Public Folder Replication” at http://technet.microsoft.com/en-us/library/bb629523.aspx#Backfill.

Removing or Deleting Replicas

When you remove a public folder replica, the owning public folder database contacts all other databases to find out if they have all of the content that’s contained within the replica that’s about to be removed.  It does so by sending out a status message that contains the CNs for its replica of the folder. For example, if I were to remove the replica of Folder1 from PFS3, it would send a message to PFS1 and PFS2 confirming that between the two of them, they have every update from PFS3 from 1 to 31. [This is an important point: the content doesn’t need to be on one server. As long as the content exists somewhere in the organization, the replica can be removed.] If PFS3 had any unique content that neither PFS1 nor PFS2 had, it would replicate those items to its replication partners. Once it has confirmed that it no longer has any unique content, the public folder store removes that replica.
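The "between the two of them" check can be expressed as a union of the partners' CN sets. The following Python sketch uses a hypothetical helper, `replica_removal_check`, and made-up CN sets for the partners; it illustrates the idea rather than the store's actual implementation.

```python
def replica_removal_check(own_cns, partner_cns_list):
    """Decide whether a replica holds any change that no partner has.

    own_cns: set of change numbers originated by the replica being removed.
    partner_cns_list: one set per partner, listing which of those changes it holds.
    Returns (safe, unique) where unique lists the changes that must be
    replicated out before the replica can be removed.
    """
    covered = set().union(*partner_cns_list)
    unique = sorted(own_cns - covered)
    return (not unique), unique


pfs3_changes = set(range(1, 32))  # PFS3 originated changes 1 to 31

# Neither partner has everything, but together they cover all 31 changes.
pfs1_has = set(range(1, 31))
pfs2_has = set(range(1, 21)) | {31}
safe, unique = replica_removal_check(pfs3_changes, [pfs1_has, pfs2_has])
print(safe, unique)

# If nobody has change 31, it must be replicated out first.
safe_later, unique_later = replica_removal_check(
    pfs3_changes, [pfs1_has, set(range(1, 21))]
)
print(safe_later, unique_later)
```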

However, when you delete a public folder outright (as in, remove all replicas), there’s no need to preserve content, so it’s deleted from every public folder store.  This is why it’s vital that public folder administrators understand the difference between removing a replica (with Set-PublicFolder -Replicas) and deleting a public folder (with Remove-PublicFolder).

These changes to replica lists and outright deletions are transmitted just like any other public folder change – as hierarchy replication messages, complete with their own CNs.  If I remove the replica of Folder1 from PFS1, that change will go to PFS2 and PFS3 so that they know that they no longer need to replicate new content for Folder1 to PFS1.  Likewise, if I delete Folder1, it will be deleted from all of the databases and removed from the hierarchy as well.  The replication state table keeps track of changes to hierarchy too, and so knows which folders exist in the organization and which don’t. It is this tracking mechanism that prevents us from simply restoring a public folder database and reintroducing the deleted folders into the environment.

Recovery of Deleted Public Folders

In part one of this blog, I outlined a process for safely and successfully restoring public folders which were accidentally deleted from the environment. Step six of the procedure reads, in part, “Copy each of the folders you wish to restore. [Although the new folders will have similar names to the originals, the underlying folder IDs (FIDs) are different.]” I’ve added italics to highlight the key point – when you copy (clone) public folders, you’re really creating new folders. They may bear the same name as the originals, but the folder IDs are different. So although my cloned copy of Folder1 may look like the original Folder1, and contain the same items as Folder1, none of the replication messages for the original Folder1 will apply to it, because it’ll have a completely different FID. This new folder is added to the hierarchy, and because end users see the name, not the FID, they’ll simply use it as they would the original folder.
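To make the role of the FID concrete, here is a minimal Python sketch. The FID allocator, the starting value 0x200, and the dictionary layout are all hypothetical; the only point is that a copy receives a new FID, so a deletion recorded against the original FID does not match it.

```python
import itertools

# Hypothetical FID allocator: hands out a new hexadecimal ID on each call.
_fid_counter = itertools.count(0x200)


def new_fid():
    return f"1-{next(_fid_counter):X}"


def copy_folder(folder):
    """Clone a folder: same name and items, but a freshly assigned FID."""
    return {"fid": new_fid(), "name": folder["name"], "items": list(folder["items"])}


original = {"fid": "1-123", "name": "Folder1", "items": ["msg1", "msg2"]}
clone = copy_folder(original)

# The organization-wide delete was recorded against the original FID only.
deleted_fids = {"1-123"}
survivors = [f for f in (original, clone) if f["fid"] not in deleted_fids]
print([(f["name"], f["fid"]) for f in survivors])
```

Only the clone survives the replayed deletion, even though its name and items are identical to those of the original folder.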

Troubleshooting Replication

If you’re looking for troubleshooting information, look no further than Bill Long’s excellent four-part blog series on public folders.

Summary

Public folders use their own replication mechanism, where changes are tracked in an internal, non-editable table and communicated to replication partners alongside the actual content changes. The public folder hierarchy follows the same principles, and so changes made to the hierarchy are replicated to all public folder databases in the environment. Understanding the replication mechanism helps an administrator understand not only disaster recovery, but troubleshooting as well.

Posted 07/10/2012 | Exchange server, Public Folders, Recovering/Restore