Backup plan policy best practices

Following these best practices helps you to avoid some of the more common mistakes users make when creating and modifying policy templates.

Configure policy templates according to your recovery point objectives (RPOs) and recovery time objectives (RTOs). Over time, you may find it necessary to make changes to those templates.

Initial data backup capture

The first time a policy in a policy template creates a backup of an application's data, it backs up the data in its entirety. Subsequent backups will be incremental.

To protect multiple applications with one policy template, apply the policy template to just a few of the applications. Once the initial full data capture is complete, apply the policy template to more applications. Repeat the process until the policy template has been applied to all applications.
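This staged rollout can be sketched as a simple batching loop. In the sketch below, `apply_template` and `wait_for_initial_backup` are hypothetical placeholders for whatever mechanism you use to attach a template and confirm that the initial full capture has completed; only the batching helper is concrete:

```python
def batches(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def staged_rollout(template, applications, batch_size=3):
    """Apply a policy template a few applications at a time so that the
    initial full backups do not all run at once (illustrative sketch)."""
    protected = []
    for group in batches(applications, batch_size):
        for app in group:
            apply_template(template, app)   # hypothetical: attach the template
        wait_for_initial_backup(group)      # hypothetical: block until full capture completes
        protected.extend(group)
    return protected
```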

Resizing volumes

If you resize a volume that contains protected data, then for some application types, the next time the snapshot policy for that volume runs, it may perform a full backup, regardless of how many times that volume's data has been backed up in the past. Affected types include resized VMware VMDKs and agent-based backups of Microsoft Windows applications and of Linux applications that are not on LVM.

If you must resize a volume for those affected application types, consider the impact that capturing all of the data will have on the application servers, the network, and the backup/recovery appliance.

Jobs concurrency

By default, a backup/recovery appliance can run six snapshot jobs concurrently. If more jobs than that are scheduled for the same time period, the policy scheduler starts as many jobs as allowed and queues the rest.

Because each user's network design, data layout, and storage classes differ, experiment to find the optimal number of concurrent jobs.
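As a rough mental model (not the appliance's actual implementation), the slot limit behaves like this: jobs start until the slots are full, and the remainder wait in a queue.

```python
MAX_SLOTS = 6  # default number of concurrent snapshot jobs per appliance

def admit_jobs(scheduled_jobs, slots=MAX_SLOTS):
    """Split scheduled jobs into those that start immediately and those
    that are queued until a slot frees up (illustrative model only)."""
    running = scheduled_jobs[:slots]
    queued = scheduled_jobs[slots:]
    return running, queued
```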

Policy schedules

The management console supports two methods of specifying a policy schedule when configuring a policy:

  • Windowed. Defines a discrete snapshot backup schedule adhering to a specific frequency and time window (for example, perform a backup every 30 minutes, daily from 09:00 to 17:00 UTC). You can instruct the backup/recovery appliance to run multiple backup jobs at a specified frequency interval or once during a specified time window.
  • Continuous. Defines a continuous snapshot backup schedule (for example, perform a backup job every eight hours, starting the first job at 01:00 UTC). In this policy schedule, jobs run continuously (24/7) at the specified time interval.
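The two schedule types can be illustrated by computing their run times. This is a simplified sketch of the arithmetic, not the console's scheduling code:

```python
from datetime import datetime, timedelta

def windowed_runs(day, start_hour, end_hour, every_minutes):
    """Run times for a windowed schedule: every `every_minutes` minutes
    between start_hour:00 and end_hour:00 on the given day."""
    t = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    end = day.replace(hour=end_hour, minute=0, second=0, microsecond=0)
    runs = []
    while t <= end:
        runs.append(t)
        t += timedelta(minutes=every_minutes)
    return runs

def continuous_runs(first_run, every_hours, count):
    """Run times for a continuous schedule: every `every_hours` hours, 24/7."""
    return [first_run + timedelta(hours=every_hours * i) for i in range(count)]
```

For example, a windowed schedule of every 30 minutes from 09:00 to 17:00 yields 17 runs per day, while a continuous schedule of every 8 hours starting at 01:00 yields runs at 01:00, 09:00, and 17:00.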

Frequency calculation

The period is the time between scheduled runs, and the frequency is the number of jobs run per unit of time. For example, if a schedule calls for jobs to run every 4 hours, the period is 4 hours and the expected frequency is 6 times per day. If a job takes one hour to complete and the policy has a 12-hour period, the policy's next job runs 11 hours after the previous job completes.
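The arithmetic above, as a quick sketch:

```python
def expected_frequency_per_day(period_hours):
    """Number of scheduled runs per day for a given period between runs."""
    return 24 / period_hours

def gap_after_completion(period_hours, job_duration_hours):
    """Hours between a job finishing and the next scheduled start."""
    return period_hours - job_duration_hours
```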

Be sure to select a frequency that achieves your required RPOs and allows sufficient time for a job to finish.

  • Minimum recommended frequency for a Snapshot policy is 1 hour (local RPO).
  • A StreamSnap policy can point to any Snapshot policy with frequency of 1 hour or longer (remote RPO).

Database log protection in a backup plan policy

When creating a snapshot policy for a database, you can also capture its log files at a specified frequency. The frequency at which database logs are captured is defined separately from that of the database; for example, a database can be captured every day and its logs captured every hour.

The frequency of database log backup is set in minutes, and the interval between log backups must not exceed the interval between backups of the associated database. For example, if the database is captured every 24 hours, its log files must be captured more often than every 24 hours.
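That constraint can be expressed as a simple validation check; the function name here is illustrative, not part of the product:

```python
def validate_log_rpo(log_rpo_minutes, db_interval_hours):
    """Check that the log backup interval (minutes) is shorter than the
    database backup interval (hours). Illustrative sketch only."""
    if log_rpo_minutes >= db_interval_hours * 60:
        raise ValueError(
            "log backup interval must be shorter than the database backup interval"
        )
    return True
```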

Logs are captured without regard to day boundaries, backup windows, or the frequency at which the associated database is captured.

You enable log protection through the Enable Database Log Backup advanced setting in a backup plan snapshot policy; log backup frequency and retention are also defined in the policy's advanced settings.

The management console automatically manages the physical space required to accommodate a database's logs: it evaluates typical log sizes and their retention period, and adds space as needed.

To enable log backup and to manage the storage requirements for database logs efficiently, use the following settings:

  • Truncate or purge log after backup. Set to Yes to purge the production log at the end of each log backup. The default is Do Not Truncate. If Enable database log backup is set to No and Truncate or purge log after backup is set to Yes, log purging runs at the end of each database backup, purging all of the logs.
  • Log backup retention period. The log backup on the Backup and DR staging disk is retained for the period set here. Log backup retention can differ from snapshot retention.
  • Log staging disk growth size. Sets the percentage by which to grow the log backup staging disk when needed.
  • Estimated change rate. Sets the estimated percentage by which the database data changes daily.
  • Compress database log backup. Runs database log backup in compressed mode using the app-level database API.
  • Enable database log backup. Allows the backup plan policy to back up a database and all associated log files; the logs are backed up when the log backup job runs. The options are Yes and No. When set to Yes, the related options are enabled.
  • RPO. When Enable database log backup is set to Yes, RPO defines the frequency of database log backup. The frequency is set in minutes and must not exceed the database backup interval.
  • Replicate logs (uses StreamSnap technology). When Enable database log backup is set to Yes, allows database log backups to be replicated to a remote backup/recovery appliance. For a log backup replication job to run, the template must include a StreamSnap replication policy, the resource profile must specify a remote backup/recovery appliance, and at least one successful replication of the database must have completed. You can then use the log backup at the remote site with any database image within the retention range of the replicated log backup. This function is enabled by default. Log replication uses StreamSnap technology to replicate directly from the local snapshot pool to the snapshot pool on the remote appliance. Note: Log replication does not occur until the database has been protected and the database backup image has been replicated to the remote appliance.
  • Send logs to OnVault Pool. When set to Yes, logs are replicated to one or more OnVault storage pools, enabling point-in-time recoveries from an OnVault pool on another site.

Job priority and scheduling

All activities run as jobs. Jobs are executed according to the schedules configured when the policies were created.

Some jobs take much longer than others. Expiration jobs are fast. The duration of a snapshot job depends on variables like the size of the application or VM and how much data has changed since its last snapshot; the initial snapshot of any application or VM captures all-new data, so it can take a long time.

The policy scheduler identifies when one or more policies applied to applications are due to run, and places a job in a queue when the scheduled start time occurs. For each policy type there is a pacing mechanism that ensures the system is not overwhelmed with running jobs. This pacing mechanism uses job slots, which means that even if a job is supposed to start at a particular time, it runs only when a job slot is available.

If multiple applications are scheduled to run at the same time with the same job priority, the selection of the application to run is randomized to ensure fairness across all of the applications that have the same priority.
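A minimal sketch of that tie-breaking behavior, assuming a numeric priority where a lower number means higher priority (the job representation is illustrative):

```python
import random

def pick_next(queued_jobs):
    """Choose the next job to run: highest priority first, with ties
    broken at random so that applications with equal priority are
    treated fairly. Assumes lower numbers mean higher priority."""
    top = min(job["priority"] for job in queued_jobs)
    candidates = [job for job in queued_jobs if job["priority"] == top]
    return random.choice(candidates)
```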

Job retries

When a job fails, the scheduler automatically retries it. The first time the job fails, the scheduler waits 4 minutes before making it available for retry. After 3 failed attempts, the job is marked as Failed and is no longer retried; the next job is attempted according to the policy's schedule.

The scheduler will treat a job retry like any other available job. If there are more jobs available than slots to accommodate them, then jobs are queued. This may result in a retry failing to start within the window and the job being marked as failed.

Job retries are reported in the Monitor. To identify job retries, the Monitor appends first a, then b, and finally c to each retry job's name.
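An illustrative model of this retry behavior, assuming an initial attempt followed by up to three retries labeled a, b, and c in the Monitor (the job name is hypothetical, and the real scheduler's 4-minute wait is noted rather than simulated):

```python
def run_with_retries(job, base_name="snap_1234"):
    """Run `job` (a callable returning True on success) with the retry
    pattern described above: one initial attempt plus up to three
    retries, which the Monitor labels a, b, and c."""
    attempt_names = [base_name] + [base_name + suffix for suffix in "abc"]
    for name in attempt_names:
        if job():
            return "Succeeded", name
        # the real scheduler waits ~4 minutes before the retry is eligible,
        # and the retry still competes for a free job slot
    return "Failed", attempt_names[-1]
```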

What's next