The following table describes important Backup and DR Service event IDs, event messages, and steps to resolve them:
Event ID | Event message | What to do |
---|---|---|
5022 | Actifio Connector: Failed in preparing VSS snapshot set | This issue occurs if Windows fails to create a VSS snapshot. These errors are usually accompanied by VSS errors reported in the logs, such as:
VSS_E_VOLUME_NOT_SUPPORTED_BY_PROVIDER
VSS_E_UNEXPECTED_PROVIDER_ERROR
To resolve this issue, first check whether all the VSS writers are in a stable state by running this command at the command line:
vssadmin list writers
Check the output to confirm that all the writers are in a stable state. If any writer is not stable, restart the VSS service and check the writers again. If they are still not stable, you may have to reboot the machine. |
5024 | Actifio Connector: Failed to create VSS snapshot for backup. Insufficient storage available to create either the shadow copy storage file or other shadow copy data | This issue occurs if there is insufficient disk space to process a snapshot.
vssadmin list providers
vssadmin list writers
Sometimes this message appears when internal VSS errors occur. Check the Windows Event Logs for VSS-related errors, and search for related Microsoft patches. Additional VSS troubleshooting details can be found on Microsoft TechNet. Microsoft recommends at least 320 MB of space on devices specified for saving the created VSS snapshot, plus room for the change data that is stored there. Actifio recommends setting the shadow storage space to unbounded (unlimited) using these commands:
vssadmin list shadowstorage
vssadmin Resize ShadowStorage /On=[drive]: /For=[drive]: /MaxSize=[size]
To change the storage area size in the Windows UI, refer to Configuring Volume Shadow Copy on Windows Server 2008. Re-run the backup once the VSS state is stable and shadow storage is set to unbounded. |
5046 | Backup staging LUN is not visible to the Actifio Connector | This issue occurs if the staging LUN is not visible to the UDSAgent on the application's host and the host is unable to detect the staging LUN from the backup/recovery appliance. |
5049 | Actifio Connector failed identifying logical volume on the backup staging lun | Actifio Connector couldn't see the staging LUN. This can be caused by a bad connection or by trouble on the LUN. Verify that FC/iSCSI connectivity is good, then make sure it works by mapping the VDisk, partitioning it, formatting it, and copying files to it. The steps for partitioning and formatting are OS specific. |
5078 | Actifio Connector: The staging disk is full | Jobs fail if a file that was modified in the source disk is copied to the staging disk, but the file is larger than the free space available in the staging disk.
|
5087 | Actifio Connector: Failed to write files during a backup (Source File) | Anti-virus programs or third-party drivers may have applied file locks that cannot be overridden. Check the UDSAgent.log to see which file couldn't be accessed. Try to find which process is locking the file using lsof on Unix/Linux, or fltmc on Windows. Exclude the file from the antivirus or capture job and retry the capture. The current processes known to Microsoft are listed at: Allocated filter altitudes. These errors are rarely found on Unix or Linux, but it is possible that a process such as database maintenance or a patch install or update has created an exclusive lock on a file. Install the latest Actifio Connector.
A file system limitation or inconsistency was detected by the host operating system. Run the Windows Disk Defragmenter on the staging disk.
Low I/O throughput from the host's disks or transport medium (iSCSI or Fibre Channel, depending on the out-of-band configuration). Ensure there are no I/O issues in the host's disks or transport medium. Consult storage and network administrators as needed. |
5131 - SQL Logs report error 3041 | SQL log backups on instance fail with error 5131 | To resolve this issue, enable "Don't forcefully unload the user registry at user logoff". For details, see User Profile Service functionality. |
5131 - SQL logs show backup/recovery appliances error 43901 | Snapshot jobs fail with error 5131, SQL logs show backup/recovery appliances error 43901 "Failed snapshot Job" | This is because the ODBC login for the database is failing. Fixing the ODBC login resolves the issue. |
5136 | Actifio Connector: The staging volume is not readable | Check /act/logs/UDSAgent.log for details and contact Google support to resolve the issue. |
5241 | Actifio Connector: Failed to mount/clone applications from mapped image (Source File) | An invalid username and password are being parsed from the control file. On the source, review the UDSAgent.log to see if the source is configured with the correct username and password under Advanced Settings in the connector properties. |
5547 | Oracle: Failed to backup archivelog (Source File) | Actifio Connector failed to back up the archive log using RMAN archive backup commands. The likely causes for this failure are:
Search for ORA- or RMAN- errors in the RMAN log. These are the errors received from Oracle. Resolve them using your preferred Oracle resource, because these are not Backup and DR Service conditions and cannot be resolved within Backup and DR Service.
|
10032 | Snapshot pool exceeded warning level |
To reduce consumption of the snapshot pool, do the following:
|
10038 | About to exceed VDisk warning limit |
To immediately reduce VDisk consumption, do the following:
If this alert repeats daily but the appliance does not reach the maximum number of VDisks, modify the policies to reduce the number of VDisks used, or increase the alert threshold. During a daily snapshot window, the VDisk count can fluctuate because new VDisks are created for snapshots before the old VDisks are removed as part of snapshot expirations. The daily fluctuation varies depending on the number of applications protected. |
10039 | Network error reaching storage device |
A heartbeat ping to monitored storage has failed due to hardware failure or network issue. Check the network to resolve the issue. |
10043 | A backup plan violation has been detected |
Review the backup plan best practices and optimize your policies. These are common causes of backup plan violations.
|
10046 | Performance Pool exceeded safe threshold |
To reduce consumption of the snapshot pool, do the following:
|
10055 | Unable to check remote protection |
Each backup/recovery appliance checks the remote appliance hourly for possible remote protection issues. The appliance communication fails due to the following issues:
|
10070 | Udppm scheduler is off for more than 30 minutes. | The scheduler is off. This may have been set for maintenance. If the maintenance is complete, you can re-enable the scheduler. For instructions, see how to enable the scheduler. |
10084 | Alert for application (app name) and policy (policyname) job did not run because of unknown reason |
Review the backup plan best practices and optimize your policies. These are common causes of backup plan violations.
|
10085 | Backup Plan violation for application (app name) on host (host name) and policy (policy name). Job did not run because of unknown reason. |
Review the backup plan best practices and optimize your policies. These are common causes of backup plan violations.
|
10120 | Psrv started successfully | This is an internal event and can be ignored. |
10220 | NTP Service is not running or not synchronised. | The NTP Service on the backup appliance is not running or not synchronized. The NTP Service is needed to ensure that the backup appliance uses correct timestamps. A Compute Engine appliance should use metadata.google.internal. To fix this issue, see how to set the NTP server DNS and NTP method. |
10225 | Udp corefiles are found, filename udpengine.(file name) | Internal processes are unexpectedly logging error files. Contact Google support to get the resolution for this issue. |
10229 | Exceeded storage, System name: (device name) | This is an internal event and normally can be ignored. |
20019 | Insufficient CPU / Memory. Minimum number of core required: (cores) Actual cores : (cores). Minimum memory size required (GB): (memory) Actual memory : (memory) | Backup/recovery appliance has been changed and is not the recommended size. Contact Google support to get the resolution for this issue. |
20025 | Swap usage exceeded | This issue occurs when swap usage exceeds the threshold limit configured for the backup/recovery appliance. Contact Google support to get the resolution for this issue. |
20030 | tomcat stopped successfully | This is an internal event and can be ignored. |
20031 | tomcat started successfully | This is an internal event and can be ignored. |
22001 | OMD started Successfully, sltname: , slpname: . | This is an internal event and can be ignored. |
42356 | File changes have been detected no deleted files have been detected new files have been detected. | This is an internal event and can be ignored. |
43151 | couldn't add raw device mappings to virtual machine (VM). Error: VM task failed A general system error occurred: The system returned an error. | Adding a raw device mapping to a VM "stuns" the VM until ESX has had a chance to add the new resource. To find out why the raw device mapping couldn't be added, look at the ESX logs for the VM in question (vmware.log). Refer to the VMware documentation and knowledge base for assistance on reviewing the logs for error messages. Also, review the VMware article for more information on collecting VMware logs. |
43155 | Error: VM task failed. An error occurred while saving the snapshot: Failed to quiesce the virtual machine. | This is a VMware issue. For additional information, refer to VMware KB article 1015180. Virtual machine quiesce issues depend on the OS type. To resolve this issue, investigate further, search for other VMware KB articles, or contact VMware support. |
43155 - a | Error: VM task failed. Device scsi3 couldn't be hot-added. | This usually means that the SCSI device you are trying to add to the VM is already in use by another VM. |
43155 - b | Error: VM task failed. The virtual disk is either corrupted or not a supported format. | This issue occurs if the VM's CTK files are locked, unreadable, or are being committed. To fix this issue, remove and re-create these CTK files. Refer to the KB article - 2013520 for more information. |
43155 - c | Error: VM task failed. The operation is not allowed in the current state of the datastore." progress ="11" status="running" | There are two options for formatting a VMware datastore: NFS and VMFS. With NFS, there are some limitations like not being able to do RDM (Raw Disk Mapping). This means that you cannot mount from the backup/recovery appliance to an NFS datastore. Refer to the following KB article - 1001856 for additional information. |
43175 | UDSAgent socket connection got terminated abnormally; while waiting for the response from agent | The Actifio Connector stops responding between the appliance and a host where the Backup and DR agent is installed.
# telnet
|
43604 | Failed to verify fingerprint | This occurs when an inconsistency is found between the source and target data. Contact Google support to get the resolution for this issue. |
43690 | Host doesn't have any SAN or iSCSI ports defined. | This issue occurs if the backup/recovery appliance is not configured with iSCSI connection to the target host. Ensure that the network ports are open for iSCSI and the target host has discovered the backup/recovery appliances. |
43698 | ESX host is not accessible for NBD mode data movement | The backup/recovery appliance is unable to reach the ESX host over the network or resolve the ESX hostname using DNS. Contact Google support to get the resolution for this issue. |
43702 | Backup was aborted because there are too many extra files in the home directory of the VM | This is an alert condition generated by Backup and DR Service and is caused by leftover delta files in the VM's datastore. Normally, the delta files are removed after the Backup and DR snapshot is consolidated. In some instances, these files can be left behind by the VMware consolidation, and Backup and DR begins failing jobs to avoid exacerbating the issue. This issue is caused by VMware. For more information, refer to the VMware knowledge base article 1002310. |
43755 | Failed to open VMDK volume; check connectivity to ESX server. | This happens when the ESX server cannot be reached by the controller, usually because of a physical connection or DNS problem. To fix this issue, do the following:
|
43844 | Invalid size vmdk detected for the VM | There are two possible solutions for this situation:
|
43873 | Disk space usage on datastore has grown beyond the critical threshold | This issue occurs when the remaining space on the datastore is less than the critical threshold. If more storage is not made available soon, then jobs start to fail when the remaining space is inadequate to store them. For more information, refer to the VMware knowledge base article - 1003412. |
43900 | Retry pending OnVault (log) (jobname for application (appname) on host (hostname) Error: (errorID) (Error Description) | Job retries can be caused by many errors. Each 43900 event message includes an error code and an error message. |
43901 | Job failure | Job failures can be caused by many errors. Each 43901 event message includes an error code and an error message. |
43903 | Failed expire job | This issue occurs when the image is in use at the time of the expiration, for example by another process or operation such as a mount, clone, or restore. The expiration job most likely completes successfully on the second attempt. Backup and DR does not report the successful completion of this second attempt. If you get only one error for an image, it is safe to conclude that a second attempt to expire this image was successful. If there is a legitimate reason why this image cannot be expired, you get multiple errors related to this image. If you receive more than one error, contact Google Support. |
43905 | Failed mount job | There are many ways a mount job can fail. The error code that accompanies the event helps to identify the root cause. |
43908 | Failed restore job | Job failures can be caused by many errors. Each 43908 event message includes an error code and an error message. |
43915 | Couldn't connect to backup host. Make sure Backup and DR agent is running on (host) and network port (port) is open |
To initiate a backup, the Actifio Connector service must be reachable by the backup/recovery appliance. This issue occurs when the required ports are not open, an incorrect host IP is configured, the Backup and DR agent service is not running, or the host is out of physical resources. To fix this issue, do the following:
|
43941 | Disk space usage on datastore has grown beyond the critical threshold | This issue occurs when the remaining space on the datastore is less than the critical threshold. If more storage is not made available soon, then jobs start to fail when the remaining space is inadequate to store them. This alert is created to help you take action to prevent ESX datastores from filling with snapshot data. Increase available space by expanding the datastore, migrating some VMs, or deleting old data on the datastore. Snapshots grow as more change data is added. If a datastore fills up due to a growing snapshot, VMs may be taken offline automatically by VMware to protect the data. |
43954 | Failed OnVault job |
During a mount job, the backup/recovery appliance is unable to connect to the OnVault pool. This issue can occur for any of the following reasons:
|
43929 | Snapshot creation of VM failed. Error: VM task failed An error occurred while saving the snapshot: Failed to quiesce the virtual machine. | VM snapshot fails if the ESX server is unable to quiesce the virtual machine - either because of too much I/O, or because VMware tools cannot quiesce the application using VSS in time. Check the event logs on the host and check the VM's ESX log (vmware.log). Crash-consistent snapshots and connector-based backups show this behavior less often. For more information, refer to the VMware knowledge base articles - 1018194 and 1007696. |
43933 | Failed to find VM with matching BIOS UUID | This issue occurs if the VM's UUID is modified. To fix this issue, rediscover the VM and check whether it was discovered with a new UUID. You can confirm this in the management console by comparing the UUID of the newly discovered VM with that of the previously discovered VM. If the UUIDs don't match, the VM might have been cloned. You can also see this error if a large number of Backup and DR managed VMs are removed from the vCenter. |
43948 | The number of images not expired awaiting further processing is (quantity) images ((quantity) snapshots, (quantity) onvaults) from (quantity) unique applications. (quantity) snapshots and (quantity) OnVaults were added in the last (quantity) seconds ((quantity) hours (quantity) minutes)., sltname: No specific slt, slpname: No specific slp. | Event ID 43948 is generated when an application begins halting expirations as part of Image Preservation. Image Preservation preserves snapshot and OnVault images beyond their expiration dates to ensure that those images are properly processed by the backup/recovery appliance. When a new application enters preserved mode, a Warning alert is generated. The most common cause is backup plan violations, as documented under event ID 10085. |
43954 | Retry OnVault | An OnVault job needed to be retried. Possible issues could include: The Service Account being used has the wrong role. The Service Account does not have authority to write to the bucket. The Cloud Storage bucket no longer exists. |
43960 | Skipped backing up 6 offline applications for SqlServerWriter application. | Backup of a SQL Server Instance found some databases were offline and couldn't be backed up. This commonly occurs when the database has been deleted on the server side, but is still included on the Backup/DR side. The error message contains the names of the offline databases that should be investigated. |
43972 | Metadata upload to bucket failed. | Metadata writes to an OnVault bucket failed. Possible issues could include: The Service Account being used has the wrong role. The Service Account does not have authority to write to the bucket. The Cloud Storage bucket no longer exists. |
43973 | udppm started Successfully | This is an internal event and can be ignored. |
43999 | Warning: VM is running on a host that is running an outdated version of ESXi , which is not supported by Google. Please upgrade it to a supported version (>=) to ensure the best results. | Upgrade the ESXi host to a supported version (>=) to ensure the best results. |
44003 | Succeeded Job_xxxxxxx for application application ID on host host , sltname: template , slpname: profile . |
This is a successful status event and can be ignored. |
62001 | Streamsnapd daemon started successfully | This is an internal event and can be ignored. |
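
The writer-stability check described for event IDs 5022 and 5024 can be scripted. The following Python sketch is illustrative only and is not part of Backup and DR Service. It assumes English-locale output from vssadmin list writers, with "Writer name:" and "State: [n] text" lines. The function names and the abbreviated SAMPLE text are hypothetical and exist only to demonstrate the parsing.

```python
import re
import subprocess

# Illustrative, abbreviated sample of `vssadmin list writers` output
# (assumed English-locale layout; real output has more fields per writer).
SAMPLE = """\
Writer name: 'Task Scheduler Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   State: [8] Failed
   Last error: Non-retryable error
"""


def unstable_vss_writers(vssadmin_output):
    """Return (writer name, state) pairs for writers not reported as Stable."""
    problems = []
    current_writer = None
    for line in vssadmin_output.splitlines():
        name_match = re.match(r"\s*Writer name:\s*'(.+)'", line)
        if name_match:
            current_writer = name_match.group(1)
            continue
        state_match = re.match(r"\s*State:\s*\[\d+\]\s*(.+?)\s*$", line)
        if state_match and current_writer is not None:
            state = state_match.group(1)
            if state.lower() != "stable":
                problems.append((current_writer, state))
            current_writer = None
    return problems


def check_local_writers():
    """Run vssadmin on the local Windows host (needs an elevated prompt)."""
    completed = subprocess.run(
        ["vssadmin", "list", "writers"],
        capture_output=True,
        text=True,
        check=True,
    )
    return unstable_vss_writers(completed.stdout)


print(unstable_vss_writers(SAMPLE))
```

If the returned list is not empty, restart the VSS service and run the check again before re-running the backup, as described in the table. The call to check_local_writers() is left out of the example run because it only works on a Windows host.
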
What's next
- To view the event logs created on backup/recovery appliances, see Backup and DR event logs
- To configure log-based alerts for Backup and DR Service, based on sample queries, see create a log-based alert.