GAPTHEGURU

Geek with special skills

Backup and Restore of the Failover Cluster Configuration Using VSS

1 Introduction to Backup and Restore of the Failover Cluster Configuration Using VSS

Failover clustering uses the Volume Shadow Copy Service (VSS) for backing up and restoring the failover cluster configuration.

While a node is not part of a cluster or the cluster is down, the cluster configuration file (ClusDb) can be treated like any other registry hive. If the cluster is running, ClusDb will have to be committed on reboot as part of the registry.

Because of the distributed nature of clusters, backing up the ClusDb on a single node is not sufficient to ensure that the full cluster state has been saved. That state is contained in the quorum resource or database.

For a backup application to successfully save the state of a clustered system, at a minimum it will need to perform the following steps:

  1. Detect whether the system is a clustered system and determine its current state.
  2. Select a clustered node for backup.
  3. Back up the cluster database and registry in checkpointed form.

1.1.1 Backing Up the Cluster State Using VSS

The first step in backing up or recovering a clustered system is to detect clustering services and determine their state using the GetNodeClusterState function.

If clustering services are installed and running, backup applications (requesters) should treat the system as a cluster. If clustering services are installed but not running, backup applications should treat the system as a stand-alone server.
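The detection rule above fits in a few lines of C. This is a sketch under stated assumptions, not code from the Cluster API. It copies the NODE_CLUSTER_STATE values listed for GetNodeClusterState in section 3.1 into a local enum so that it builds without <clusapi.h>. The names backup_mode and choose_backup_mode are hypothetical. The text only covers the installed-and-running and installed-but-not-running cases, so treating a node where the service is not installed or not configured as stand-alone is an assumption.

```c
#include <assert.h>

/* Local copy of the NODE_CLUSTER_STATE values from the table in section
   3.1.2, so the sketch builds without <clusapi.h>. On Windows, include
   <clusapi.h> and drop this enum. */
typedef enum {
    ClusterStateNotInstalled  = 0x00,
    ClusterStateNotConfigured = 0x01,
    ClusterStateNotRunning    = 0x03,
    ClusterStateRunning       = 0x13
} NODE_CLUSTER_STATE;

/* Hypothetical result type: how the requester should present this machine. */
typedef enum { BACKUP_AS_STANDALONE, BACKUP_AS_CLUSTER } backup_mode;

/* `state` is the value GetNodeClusterState wrote to pdwClusterState
   (a DWORD, modelled here as unsigned long). Only a running Cluster
   service makes the machine a cluster for backup purposes. "Installed but
   not running" falls back to stand-alone, and so (by assumption) do the
   not-installed and not-configured states. */
static backup_mode choose_backup_mode(unsigned long state)
{
    return state == (unsigned long)ClusterStateRunning ? BACKUP_AS_CLUSTER
                                                       : BACKUP_AS_STANDALONE;
}
```

A real requester would first check that GetNodeClusterState returned ERROR_SUCCESS before passing the state to a helper like this.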

If clustering services are detected, a backup application can display the cluster by name for backup. To find the cluster name, pass either the computer name or NULL to the OpenCluster function to obtain a handle to the cluster that the node running the backup application belongs to.

Where possible, pass the computer name rather than NULL to OpenCluster. A name yields an RPC handle, while NULL yields an LPC handle to the local node. Passing the name ensures that the client stays connected to the cluster even if the Cluster service stops on the node to which the client initially connected.

After the application has a handle to the cluster, the GetClusterInformation function will provide information about the cluster, including its name.

Now, ClusDb can be backed up using VSS.

 

1.1.2 Overview of Processing a Backup Under VSS

In processing a backup, the requester and writers coordinate to do three things: provide a stable system image from which to back up data (the shadow-copied volume), group files together on the basis of their usage, and store information about the saved data. All of this must be done with minimal interruption to the writers' normal workflow.

A requester queries writers for their metadata, processes this data, notifies the writers prior to the beginning of the shadow copy and of the backup operations, and then notifies the writers again after the shadow copy and backup operations end.

In response to these notifications, the writer provides information about files to be backed up—including specifying groups of files to coordinate (components)—pauses in its I/O operations prior to a shadow copy, and then returns to normal operation following the completion of a shadow copy or at the end of the backup.

In the course of processing the backup, a writer specifies the files it is responsible for through its read-only metadata—the Writer Metadata Document (see VSS Metadata: Working with the Writer Metadata Document). The requester then interprets this metadata, chooses what to back up, and stores these decisions in its own metadata object, the Backup Components Document (see VSS Metadata: Working with the Backup Components Document). This Backup Components Document is available for writer inspection and modification during both the backup and restore operations.

This diagram shows the interactions between the requester, the VSS service, the VSS kernel support, any VSS writers involved, and any VSS hardware providers.
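The notification sequence described above can be modelled as a tiny state machine. This is a simulation, not a VSS writer. The event names are modelled on the CVssWriter callbacks (OnIdentify, OnPrepareBackup, OnPrepareSnapshot, OnFreeze, OnThaw, OnPostSnapshot, OnBackupComplete). The names sim_writer and sim_writer_handle are hypothetical. The sketch assumes a strictly linear, successful backup and ignores aborts.

```c
#include <assert.h>
#include <stddef.h>

/* Writer-side notifications in the order a successful backup delivers them.
   Names are modelled on the CVssWriter callbacks; this only simulates the
   sequence and does not talk to the VSS service. */
typedef enum {
    EV_IDENTIFY,         /* OnIdentify: requester gathers writer metadata  */
    EV_PREPARE_BACKUP,   /* OnPrepareBackup: components chosen for backup  */
    EV_PREPARE_SNAPSHOT, /* OnPrepareSnapshot: shadow copy about to start  */
    EV_FREEZE,           /* OnFreeze: writer pauses its I/O                */
    EV_THAW,             /* OnThaw: shadow copy taken, writer resumes I/O  */
    EV_POST_SNAPSHOT,    /* OnPostSnapshot                                 */
    EV_BACKUP_COMPLETE   /* OnBackupComplete: data copied off the snapshot */
} vss_event;

/* Hypothetical writer state for the simulation. */
typedef struct {
    int io_paused;   /* 1 between Freeze and Thaw, 0 otherwise */
    int last_event;  /* index of the last accepted event, -1 at start */
} sim_writer;

/* Accepts `ev` only if it is the next event in the sequence.
   Returns 1 if accepted, 0 if it arrived out of order. */
static int sim_writer_handle(sim_writer *w, vss_event ev)
{
    assert(w != NULL);
    if ((int)ev != w->last_event + 1)
        return 0;
    if (ev == EV_FREEZE)
        w->io_paused = 1;
    else if (ev == EV_THAW)
        w->io_paused = 0;
    w->last_event = (int)ev;
    return 1;
}
```

Start from `sim_writer w = { 0, -1 };`. Feeding the events in declaration order is accepted, and io_paused is 1 only between EV_FREEZE and EV_THAW, which is the window in which the writer holds its I/O.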

 

1.1.3 Restoring the Cluster State Using VSS

There are two parts to restoring the cluster state:

  1. Restoring the node. This is handled entirely by a system state restore on that node.

If the rest of the cluster is running and you do not want to restore the cluster, stop at this point and restart the node; it should rejoin the cluster.

If the cluster itself needs to be restored or rolled back in time, you also need to perform an authoritative restore of the cluster.

  2. Restoring the cluster. To perform this restore, the Cluster service must be installed, but not necessarily running, on the node being restored. The Cluster service must be stopped on all other nodes in the cluster. The requester calls the IVssBackupComponentsEx2::SetAuthoritativeRestore method to indicate that an authoritative restore is being performed.

The Cluster VSS Writer will specify a restore method of VSS_RME_RESTORE_STOP_START to indicate that the specified service is to be stopped and restarted. For more information, see the description of the VSS_RME_RESTORE_STOP_START enumeration value in the VSS_RESTOREMETHOD_ENUM enumeration topic. After the cluster service has been started, the other nodes on the cluster can have their cluster service restarted.
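The choice between the two restore paths, and the preconditions for the second one, can be sketched as a small decision function. The node_info structure, the restore_plan values, and plan_restore are hypothetical. A real requester would query the service state from the system rather than read it from a struct. As a simplification, the sketch treats any request that does not need a rollback as a node-only restore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of one node as seen by a restore tool. */
typedef struct {
    const char *name;
    int service_installed;  /* Cluster service installed on the node */
    int service_running;    /* Cluster service currently running      */
} node_info;

typedef enum {
    RESTORE_NODE_ONLY,      /* system state restore; node rejoins on restart */
    RESTORE_AUTHORITATIVE,  /* roll the whole cluster back                   */
    RESTORE_BLOCKED         /* authoritative restore preconditions not met   */
} restore_plan;

/* Node-only restore when no rollback is wanted. Otherwise an authoritative
   restore, which needs the Cluster service installed on the restoring node
   and stopped on every other node. */
static restore_plan plan_restore(const node_info *local,
                                 const node_info *others, size_t n_others,
                                 int want_rollback)
{
    assert(local != NULL && (others != NULL || n_others == 0));
    if (!want_rollback)
        return RESTORE_NODE_ONLY;
    if (!local->service_installed)
        return RESTORE_BLOCKED;
    for (size_t i = 0; i < n_others; i++)
        if (others[i].service_running)
            return RESTORE_BLOCKED;
    return RESTORE_AUTHORITATIVE;
}
```

Note that the restoring node's own service state does not matter here. This matches the text: the service must be installed there but need not be running.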

2 Backing Up and Recovering the Failover Cluster Configuration Database

For my first post here, I thought I’d talk about backing up and recovering the Failover Cluster configuration database.  For Windows Server 2008, Failover Clustering’s backup and restore code was revamped to fit into the Volume Shadow Copy Service (VSS) framework.  Now Failover Clusters can be backed up just like any other application that supports VSS.

To get us started, I created a cluster of two file servers.  I have a 2-node cluster with a node and disk majority quorum mode, so I can sustain the failure of one node or the quorum disk.  Let’s open up the cluster administrator and take a look at what we’ve got.  Take a look at Figure 1 to see this file share.

 

 

Figure 1: The first file share on the cluster. 

 

I set up a backup schedule to back up all the critical volumes every 30 minutes, as you can see in Figure 2. Think carefully about the backup schedule for your servers and volumes.

 

 

Figure 2: My scheduled backups.  It doesn’t show it, but the Windows Server Backup utility includes the Clustering application by default.

 

For this guide, I then changed the cluster configuration by moving some disks from one of the cluster nodes. To recover the cluster configuration to its earlier state, first open an elevated command prompt: click Start, right-click Command Prompt, and select Run as administrator. At the prompt, type wbadmin get versions. This lists all the backups available on this machine.

Here’s the output on mine:

 

C:\Windows\system32>wbadmin get versions

wbadmin 1.0 – Backup command-line tool

(C) Copyright 2004 Microsoft Corp.

 

Backup time: 12/31/2007 12:18 PM

Backup target: Network Share labeled \\mattkur-stor\ClusterBackups

Version identifier: 12/31/2007-20:18

Can Recover: Volume(s), File(s), Application(s), Bare Metal Recovery, System State

 

Backup time: 12/31/2007 3:30 PM

Backup target: Fixed Disk labeled mattkur 2007_12_31 15:11 DISK_01(\\?\Volume{e028e25b-b000-11dc-8ea1-0011114b1b2e})

Version identifier: 12/31/2007-23:30

Can Recover: Volume(s), File(s), Application(s), Bare Metal Recovery, System State

 

Backup time: 12/31/2007 4:00 PM

Backup target: Fixed Disk labeled mattkur 2007_12_31 15:11 DISK_01(\\?\Volume{e028e25b-b000-11dc-8ea1-0011114b1b2e})

Version identifier: 01/01/2008-00:00

Can Recover: Volume(s), File(s), Application(s), Bare Metal Recovery, System State

 

Figure 4: The backup versions available to me at this time.  Note the version identifier, as this is the key string we need to identify this version in future commands.
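Version identifiers are written month first, so comparing them as plain strings does not find the newest backup: "01/01/2008-00:00" sorts before "12/31/2007-23:30". The C sketch below parses the MM/DD/YYYY-HH:MM shape seen in the output above into a sortable number. It assumes every identifier follows that shape. The helpers version_key and latest_version are hypothetical, not part of wbadmin. In this output, each identifier is also eight hours ahead of the local backup time, which suggests it is recorded in UTC. Take the identifier from the wbadmin output rather than deriving it from the local clock.

```c
#include <assert.h>
#include <stdio.h>

/* Converts an identifier of the form MM/DD/YYYY-HH:MM into a number that
   sorts chronologically (YYYYMMDDHHMM). Returns -1 if the string does not
   have that shape. */
static long long version_key(const char *id)
{
    int mon, day, year, hour, min;
    if (sscanf(id, "%2d/%2d/%4d-%2d:%2d", &mon, &day, &year, &hour, &min) != 5)
        return -1;
    return (((year * 100LL + mon) * 100 + day) * 100 + hour) * 100 + min;
}

/* Returns the index of the most recent identifier, or -1 if none parse. */
static int latest_version(const char *const ids[], int n)
{
    int best = -1;
    long long best_key = -1;
    assert(ids != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        long long k = version_key(ids[i]);
        if (k > best_key) {
            best_key = k;
            best = i;
        }
    }
    return best;
}
```

Given the three identifiers above in the order listed, latest_version picks "01/01/2008-00:00", the 4 PM backup.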

 

The 4 PM backup was taken just before I made the change, so that is the one I will use in this guide. Take note of the string after “Version identifier:”. This is the string we pass to the following commands as the parameter “-version:XXXXXX” to refer to that specific backup. To make sure that we can restore cluster data, let’s look at what was backed up, using the command “wbadmin get items”:

 

 

C:\Windows\system32>wbadmin get items -version:01/01/2008-00:00

wbadmin 1.0 – Backup command-line tool

(C) Copyright 2004 Microsoft Corp.

 

Volume Id = {33e841b0-affa-11dc-baba-806e6f6e6963}

Volume ‘<Unlabeled Volume>’, mounted at D:

 

Volume Id = {33e841b2-affa-11dc-baba-806e6f6e6963}

Volume ‘Longhorn’, mounted at C:

 

Application = Cluster

 

Component = Cluster Database (\Cluster Database)

 

Application = Registry

 

Component = Registry (\Registry)

 

Figure 5: The items included in that backup.  Note the Cluster application – this is what we’re looking for!

 

The next step is to restore the cluster. We always advise that an administrator take all applications in the cluster offline before recovering the cluster configuration. While we’re at the command line, run “cluster group <group-name> /off” for each application group (to list the group names, run “cluster group”). To start the recovery, use the “wbadmin start recovery” command. I specify an application-level recovery (-itemType:App) of the Cluster application (-items:Cluster) from the backup version chosen earlier (-version:01/01/2008-00:00).

 

C:\Windows\system32>wbadmin start recovery -itemType:App -items:Cluster -version:01/01/2008-00:00

wbadmin 1.0 – Backup command-line tool

(C) Copyright 2004 Microsoft Corp.

 

You have chosen to restore the application Cluster.

The following components will be restored.

Component = Cluster Database (\Cluster Database)

 

WARNING:  This operation will perform an authoritative restore of your cluster. After restoring the cluster database, the Cluster service will be stopped and then started, which may take a few minutes. Please be patient.

 

Do you want to continue with an authoritative restore of your

cluster?

[Y] Yes [N] No y

 

Preparing the component Cluster Database for restore.

Restoring the files for the component Cluster Database, copied (100%).

Restoring the component Cluster Database.

Restored the component Cluster Database successfully.

 

Recovery operation completed

Log of files successfully restored

‘C:\Windows\Logs\WindowsServerBackup\ApplicationRestore 31-12-2007 17-25-08.log

 

Summary of recovery:

--------------------

 

Restored the component Cluster Database successfully.

 

NOTE: In order to COMPLETE the restoration of cluster associated with this

node,

1.  the cluster service must be started on this node.

2.  After that, cluster service needs to be started on the nodes identified in the restored cluster database. To see the list of nodes, type the following command in a command window:

Command:: cluster.exe node

 

Figure 6:  The restore of the cluster configuration and a restart of the node.

 

Figure 7: One of our nodes is still down.

 

If you take a look at the nodes, notice that one of my nodes is down.  During the restore process, the cluster is taken completely offline.  The cluster configuration database is recovered from the backup store on one node (the node that we just ran our recovery from).  To ensure that this is the copy of the cluster configuration that the cluster uses, this node must be started first.  Wbadmin is kind enough to do this for us, but we need to start the other nodes in the cluster.  Do that and we’re 100% operational.
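The ordering rule in this paragraph can be written as a short helper. This is a sketch, and start_order is a hypothetical name. The node names would come from the restored database, for example from the output of cluster.exe node. strcmp compares names case-sensitively, whereas Windows computer names are not case-sensitive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills `order` with the nodes in the order their Cluster service should be
   started after an authoritative restore: the node the restore ran on first,
   then every other node. `order` must have room for n + 1 entries.
   Returns the number of entries written. */
static size_t start_order(const char *restored_node,
                          const char *const nodes[], size_t n,
                          const char *order[])
{
    size_t k = 0;
    assert(restored_node != NULL && order != NULL);
    order[k++] = restored_node;
    for (size_t i = 0; i < n; i++)
        if (strcmp(nodes[i], restored_node) != 0)
            order[k++] = nodes[i];
    return k;
}
```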

 

3 API reference

3.1 GetNodeClusterState function

Determines whether the Cluster service is installed and running on a node.

3.1.1 Syntax

DWORD WINAPI GetNodeClusterState(
  __in_opt  LPCWSTR lpszNodeName,
  __out     LPDWORD pdwClusterState
);

3.1.2 Parameters

lpszNodeName [in, optional]

Pointer to a null-terminated Unicode string containing the name of the node to query. If lpszNodeName is NULL, the local node is queried.

pdwClusterState [out]

Pointer to a value describing the state of the Cluster service on the node. A node will be described by one of the following NODE_CLUSTER_STATE enumeration values.

ClusterStateNotInstalled (0)

The Cluster service is not installed on the node.

ClusterStateNotConfigured (1)

The Cluster service is installed on the node but has not yet been configured.

ClusterStateNotRunning (3)

The Cluster service is installed and configured on the node but is not currently running.

ClusterStateRunning (19, 0x13)

The Cluster service is installed, configured, and running on the node.

 

3.1.3 Return value

If the operation succeeds, the function returns ERROR_SUCCESS (0). If the operation fails, the function returns a system error code.

3.1.4 Remarks

Note: The GetNodeClusterState function does not support a 64-bit Windows-based node if the calling application is 32-bit Windows-based.

 

3.2 OpenCluster function

Opens a connection to a cluster and returns a handle to it.

3.2.1 Syntax

HCLUSTER WINAPI OpenCluster(
  __in_opt  LPCWSTR lpszClusterName
);

typedef HCLUSTER (WINAPI *PCLUSAPI_OPEN_CLUSTER)(
  __in_opt  LPCWSTR lpszClusterName
);

3.2.2 Parameters

lpszClusterName [in, optional]

Specifies one of the following values:

  • Pointer to a null-terminated Unicode string containing the name of the cluster or one of the cluster nodes expressed as a NetBIOS name, a fully-qualified DNS name, or an IP address. This produces an RPC cluster handle.
  • NULL, which produces an LPC handle to the cluster to which the local computer belongs.

3.2.3 Return value

If the operation was successful, OpenCluster returns a cluster handle.

If the operation was not successful, OpenCluster returns NULL (0). For more information about the error, call the GetLastError function.

 

3.2.4 Remarks

A cluster handle is a pointer to an internally-defined structure which stores information about the RPC or LPC connection to the cluster. Any object handles obtained from the cluster handle will be associated with the RPC or LPC session data stored in the cluster structure. Combining RPC and LPC handles or using handles obtained from different contexts can cause exceptions or other unpredictable results. For more information, see LPC and RPC Handles.

When finished with a cluster handle, it is important to call CloseCluster to ensure that all memory is freed and the connection is shut down cleanly.

If the cluster is remote, the client must be running a compatible operating system. Computers running Windows Server 2008 and Windows Vista cannot call OpenCluster against a cluster running Windows Server 2003 or Windows 2000 Server, and computers running Windows Server 2003, Windows XP, or Windows 2000 cannot call OpenCluster against a cluster running Windows Server 2008. To remotely manage these clusters, use the Failover Cluster WMI Provider.
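The compatibility rule above can be expressed as a check a management tool might run before calling OpenCluster against a remote cluster. The win_release enum and both functions are hypothetical. The text only lists the incompatible combinations, so the sketch assumes that a client and a cluster from the same generation are compatible.

```c
#include <assert.h>

/* Hypothetical labels for the releases named above, grouped by generation. */
typedef enum {
    WIN2000, WINXP, WIN2003,   /* pre-Windows Server 2008 generation */
    WINVISTA, WIN2008          /* Windows Server 2008 generation     */
} win_release;

static int is_2008_generation(win_release r)
{
    return r == WINVISTA || r == WIN2008;
}

/* Returns 1 if a client on `client` may call OpenCluster against a remote
   cluster running `cluster`; 0 means the cluster should be managed through
   the Failover Cluster WMI Provider instead. */
static int can_open_remote_cluster(win_release client, win_release cluster)
{
    assert(client <= WIN2008 && cluster <= WIN2008);
    return is_2008_generation(client) == is_2008_generation(cluster);
}
```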

3.3 GetClusterInformation function

Retrieves a cluster’s name and version.

3.3.1 Syntax

DWORD WINAPI GetClusterInformation(
  __in       HCLUSTER hCluster,
  __out      LPWSTR lpszClusterName,
  __inout    LPDWORD lpcchClusterName,
  __out_opt  LPCLUSTERVERSIONINFO lpClusterInfo
);

3.3.2 Parameters

hCluster [in]

Handle to a cluster.

lpszClusterName [out]

Pointer to a null-terminated Unicode string containing the name of the cluster identified by hCluster.

lpcchClusterName [in, out]

Pointer to the size of the lpszClusterName buffer as a count of characters. On input, specify the maximum number of characters the buffer can hold, including the terminating NULL. On output, specifies the number of characters in the resulting name, excluding the terminating NULL.

lpClusterInfo [out, optional]

Either NULL or a pointer to a CLUSTERVERSIONINFO structure describing the version of the Cluster service. When lpClusterInfo is not NULL, the dwVersionInfoSize member of this structure should be set as follows: lpClusterInfo->dwVersionInfoSize = sizeof(CLUSTERVERSIONINFO);
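The in/out convention of lpcchClusterName leads to the usual grow-and-retry pattern. The sketch below runs without Windows by using a hypothetical stand-in, mock_get_cluster_information, that follows the rules described above. It assumes that a too-small buffer yields ERROR_MORE_DATA (234) together with the required length, as is common in the Cluster API. On Windows, you would call GetClusterInformation with a real HCLUSTER and check its return code in the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

#define SKETCH_ERROR_SUCCESS   0UL
#define SKETCH_ERROR_MORE_DATA 234UL  /* numeric value of ERROR_MORE_DATA */

/* Hypothetical stand-in for GetClusterInformation. On input *cch is the
   buffer size in characters, including the terminating NULL; on output it
   is the name length, excluding the NULL. */
static unsigned long mock_get_cluster_information(wchar_t *name,
                                                  unsigned long *cch)
{
    static const wchar_t cluster_name[] = L"CLUSTER1";  /* hypothetical */
    unsigned long len = (unsigned long)wcslen(cluster_name);
    assert(cch != NULL);
    if (name == NULL || *cch < len + 1) {
        *cch = len;                   /* caller adds 1 for the NULL */
        return SKETCH_ERROR_MORE_DATA;
    }
    wcscpy(name, cluster_name);
    *cch = len;
    return SKETCH_ERROR_SUCCESS;
}

/* Tries a buffer of `initial_cch` characters and grows it once if the call
   reports that more space is needed. Returns a malloc'd name or NULL. */
static wchar_t *fetch_cluster_name(unsigned long initial_cch)
{
    unsigned long cch = initial_cch;
    wchar_t *buf = initial_cch ? malloc(cch * sizeof *buf) : NULL;
    unsigned long rc = mock_get_cluster_information(buf, &cch);
    if (rc == SKETCH_ERROR_MORE_DATA) {
        free(buf);
        cch += 1;                     /* room for the terminating NULL */
        buf = malloc(cch * sizeof *buf);
        if (buf == NULL)
            return NULL;
        rc = mock_get_cluster_information(buf, &cch);
    }
    if (rc != SKETCH_ERROR_SUCCESS) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The returned buffer belongs to the caller and must be released with free.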

3.4 IVssBackupComponentsEx2::SetAuthoritativeRestore method

Marks the restore of a component as authoritative for a replicated data store.

3.4.1 Syntax

HRESULT SetAuthoritativeRestore(
  [in]  VSS_ID writerId,
  [in]  VSS_COMPONENT_TYPE ct,
  [in]  LPCWSTR wszLogicalPath,
  [in]  LPCWSTR wszComponentName,
  [in]  bool bAuth
);

3.4.2 Parameters

writerId [in]

The globally unique identifier (GUID) of the writer class.

ct [in]

The type of the component. See the VSS_COMPONENT_TYPE enumeration for the possible values.

wszLogicalPath [in]

A null-terminated wide character string containing the logical path of the component. For more information, see Logical Pathing of Components.

The value of the string containing the logical path used here should be the same as the string that was used when the component was added.

The logical path can be NULL.

There are no restrictions on the characters that can appear in a non-NULL logical path.

wszComponentName [in]

A null-terminated wide character string containing the name of the component.

The string cannot be NULL and should contain the same component name as the string that was used when the component was added to the backup set using the IVssBackupComponents::AddComponent method.

bAuth [in]

Set this variable to true to indicate that the restore of the component is authoritative, or false otherwise.

The default value is false.

 

3.5 VSS_RESTOREMETHOD_ENUM enumeration

The VSS_RESTOREMETHOD_ENUM enumeration is used by a writer at backup time to specify through its Writer Metadata Document the default file restore method to be used with all the files in all the components it manages.

The restore method is writer-wide and is also referred to as the original restore target and indicated by a VSS_RESTORE_TARGET value of VSS_RT_ORIGINAL.

3.5.1 Syntax

typedef enum _VSS_RESTOREMETHOD_ENUM {
  VSS_RME_UNDEFINED                             = 0,
  VSS_RME_RESTORE_IF_NOT_THERE                  = 1,
  VSS_RME_RESTORE_IF_CAN_REPLACE                = 2,
  VSS_RME_STOP_RESTORE_START                    = 3,
  VSS_RME_RESTORE_TO_ALTERNATE_LOCATION         = 4,
  VSS_RME_RESTORE_AT_REBOOT                     = 5,
  VSS_RME_RESTORE_AT_REBOOT_IF_CANNOT_REPLACE   = 6,
  VSS_RME_CUSTOM                                = 7,
  VSS_RME_RESTORE_STOP_START                    = 8
} VSS_RESTOREMETHOD_ENUM;

 

3.5.2 Constants

VSS_RME_UNDEFINED

No restore method is defined.

This indicates an error on the part of the writer.

This value is not supported for express writers.

VSS_RME_RESTORE_IF_NOT_THERE

The requester should restore the files of a selected component or component set only if there are no versions of those files currently on the disk.

Unless alternate location mappings are defined for file restoration, if a version of any file managed by a selected component or component set is currently on the disk, none of the files managed by the selected component or component set should be restored.

If a file’s alternate location mapping is defined, and a version of the files is present on disk at the original location, files should be written to the alternate location only if no version of the file exists at the alternate location.

VSS_RME_RESTORE_IF_CAN_REPLACE

The requester should restore files of a selected component or component set only if the files currently on the disk can be overwritten.

Unless alternate location mappings are defined for file restoration, if any file of the selected component or component set that is currently on the disk cannot be overwritten, none of the files managed by the component or component set should be restored.

If a file’s alternate location mapping is defined, files should be written to the alternate location.

VSS_RME_STOP_RESTORE_START

The requester should perform the restore operation as follows:

  1. Send the PreRestore event and wait for all writers to process it.
  2. Stop the service.
  3. Restore the files to their original locations.
  4. Restart the service.
  5. Send the PostRestore event and wait for all writers to process it.

The service to be stopped is specified by the writer beforehand, when it calls the IVssCreateWriterMetadata::SetRestoreMethod method. The requester can obtain the name of the service by calling the IVssExamineWriterMetadata::GetRestoreMethod method.

Note that if the writer is hosted in the service that is being stopped, that writer will not receive the PostRestore event, because the writer instance ID changes when the service is stopped and restarted.

VSS_RME_RESTORE_TO_ALTERNATE_LOCATION

The requester should restore the files of the selected component or component set to the location specified by the alternate location mapping specified in the writer component metadata file. (See IVssCreateWriterMetadata::AddAlternateLocationMapping, IVssComponent::GetAlternateLocationMapping, IVssExamineWriterMetadata::GetAlternateLocationMapping, and IVssWMFiledesc::GetAlternateLocation.)

This value is not supported for express writers.

VSS_RME_RESTORE_AT_REBOOT

The requester should restore the files of a selected component or component set after the computer is restarted.

The files to be restored should be copied to a temporary location, and the requester should use MoveFileEx with the MOVEFILE_DELAY_UNTIL_REBOOT flag to complete the restoration of these files to their proper location after the computer is restarted.

VSS_RME_RESTORE_AT_REBOOT_IF_CANNOT_REPLACE

If possible, the requester should restore the files of the selected component or component set to their correct location immediately.

If there are versions of any of the files managed by the selected component or component set on the disk that cannot be overwritten, then all the files managed by the selected component or component set should be restored after the computer is restarted.

In this case, files to be restored should be copied to a temporary location on disk, and the requester should use MoveFileEx with the MOVEFILE_DELAY_UNTIL_REBOOT flag to complete the restoration of these files to their proper location after the computer is restarted.

VSS_RME_CUSTOM

The requester should use a custom restore method to restore the files that are managed by the selected component or component set.

A custom restore may use file retrieval API functions or protocols that are private to a given writer application. Such a restore need not use the information in the writer component metadata file.

This value is not supported for express writers.

VSS_RME_RESTORE_STOP_START

The requester should perform the restore operation as follows:

  1. Send the PreRestore event and wait for all writers to process it.
  2. Restore the files to their original locations.
  3. Send the PostRestore event and wait for all writers to process it.
  4. Stop the service.
  5. Restart the service.
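The two stop/start methods differ only in where the service restart happens relative to PostRestore. The sketch below encodes both step lists as written in this section. The restore_step names, the SKETCH_ constants (which reuse the numeric values from the enumeration syntax), and restore_sequence are hypothetical. With VSS_RME_RESTORE_STOP_START, which the Cluster VSS Writer specifies, PostRestore is sent before the service is stopped, so a writer hosted in that service is still running when the event arrives. With VSS_RME_STOP_RESTORE_START it is not, as the note under that value explains.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    STEP_SEND_PRE_RESTORE,
    STEP_STOP_SERVICE,
    STEP_RESTORE_FILES,
    STEP_RESTART_SERVICE,
    STEP_SEND_POST_RESTORE
} restore_step;

/* Local names for the two enumeration values modelled here. */
enum { SKETCH_RME_STOP_RESTORE_START = 3, SKETCH_RME_RESTORE_STOP_START = 8 };

/* Writes the requester's steps for `method` into `out` (room for 5) and
   returns the step count, or 0 for methods this sketch does not model. */
static size_t restore_sequence(int method, restore_step out[5])
{
    static const restore_step stop_restore_start[5] = {
        STEP_SEND_PRE_RESTORE, STEP_STOP_SERVICE, STEP_RESTORE_FILES,
        STEP_RESTART_SERVICE, STEP_SEND_POST_RESTORE };
    static const restore_step restore_stop_start[5] = {
        STEP_SEND_PRE_RESTORE, STEP_RESTORE_FILES, STEP_SEND_POST_RESTORE,
        STEP_STOP_SERVICE, STEP_RESTART_SERVICE };
    const restore_step *src;

    assert(out != NULL);
    if (method == SKETCH_RME_STOP_RESTORE_START)
        src = stop_restore_start;
    else if (method == SKETCH_RME_RESTORE_STOP_START)
        src = restore_stop_start;
    else
        return 0;
    for (size_t i = 0; i < 5; i++)
        out[i] = src[i];
    return 5;
}
```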