Application resilience considerations

This page provides details about Google Cloud NetApp Volumes application resilience.

Although NetApp Volumes is highly available, brief pauses in input and output (I/O) operations can still occur. They can be caused by planned maintenance events, such as platform updates, service upgrades, or software upgrades, or by unplanned component failures in the service.

I/O pauses

The Network File System (NFS) or Server Message Block (SMB) client software inside your operating system handles short I/O pauses. The client waits and retries the I/O operations without surfacing the issue to the application. Such short pauses are considered non-disruptive: the application's users might see longer response times, but the application doesn't report I/O errors.

For longer I/O pauses, the behavior depends on your operating system's NFS or SMB client and potential timeouts configured in the application. The following sections discuss protocol-specific details for I/O pauses.

NFS I/O pauses

All calls to an unavailable, hard-mounted NFS share are blocked in the NFS client and wait indefinitely until the NFS server responds again. While your NFS client waits, messages appear in your client logs that indicate the NFS server isn't responding.

From an application perspective, I/O operations such as read or write are blocked and remain outstanding until the NFS share returns successfully. During I/O pauses, no I/O operation is ever lost and NetApp Volumes ensures data consistency, unless you forcefully stop outstanding I/O operations on the client side.
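From the application's side, a blocked operation shows up as a call that takes longer to return. The following is a minimal sketch of a probe that times a single synchronous write. The function name `timed_write` and the probe file path are illustrative assumptions, not part of NetApp Volumes or any NFS client.

```python
import os
import time


def timed_write(path, data=b"probe"):
    """Write data to path, force it to storage, and return the elapsed seconds.

    On a hard-mounted NFS share, this call is expected to block during an
    I/O pause and return once the server responds again, so the elapsed
    time approximates how long the operation stayed outstanding.
    """
    start = time.monotonic()
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()
        # Ask the operating system to push the data to the backing store.
        os.fsync(handle.fileno())
    return time.monotonic() - start
```

For example, calling `timed_write("/mnt/volume/probe.bin")` at regular intervals (the path is hypothetical) and logging the result gives a rough picture of the response times the application sees.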

Use cluster software applications to automate failovers

If you use cluster software applications such as Pacemaker on the client VMs to automate failover of your application, configure your timeouts for NFS shares to withstand NetApp Volumes maintenance events. Such failovers abort outstanding I/O operations on the client and can lead to lost transactions. We recommend the following timeouts:

Protocol type | Recommended timeout | Notes
NFSv3 shares | 60 seconds | We recommend that you use a fencing method, which uses the nolock mount option instead of relying on NFS locks.
NFSv4.1 | 105 seconds | The NFSv4.1 protocol automatically adds reliable locking over NFSv3 (NFSv4.x RFC, section 9.6.2), which you can use as a fencing mechanism. Lock state recovery adds an additional 45 seconds.
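The recommended timeouts can be encoded as a simple check, for example in a script that validates cluster settings before deployment. This is a minimal sketch: the dictionary keys, the function name `timeout_is_sufficient`, and the idea of passing the configured timeout as a plain number are illustrative assumptions and don't correspond to any Pacemaker or NetApp Volumes API.

```python
# Recommended failover timeouts from the table above, in seconds.
# The keys are illustrative labels, not official identifiers.
RECOMMENDED_TIMEOUT_SECONDS = {
    "nfsv3": 60,
    "nfsv4.1": 105,  # includes the 45 seconds for lock state recovery
}


def timeout_is_sufficient(protocol, configured_seconds):
    """Return True if the configured timeout meets the recommended value."""
    key = protocol.strip().lower()
    if key not in RECOMMENDED_TIMEOUT_SECONDS:
        raise ValueError(f"no recommendation for protocol {protocol!r}")
    return configured_seconds >= RECOMMENDED_TIMEOUT_SECONDS[key]
```

A timeout shorter than the recommended value means the cluster software might start a failover during a maintenance event that the NFS client would otherwise have waited out.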

SMB share I/O pauses

Unlike NFS, SMB sessions use a connection that can time out. In most cases, I/O pauses in NetApp Volumes stay below these timeouts.

Session timeouts

The session timeout is defined on the client. The default timeout for Windows clients is 60 seconds. You can read the session timeout with the Get-SmbClientConfiguration command and change it with the Set-SmbClientConfiguration command by using the SessionTimeout parameter.

If a session timeout occurs, the SMB session breaks and an I/O error is reported to the application doing the I/O. File Explorer or Microsoft 365 applications usually reconnect as soon as the user accesses the SMB share again. When running into I/O errors, some applications attempt to reconnect and retry the failed I/O operation, while others do not. Consult your application vendor's documentation to learn how the application handles SMB timeouts and can operate resiliently on SMB shares.
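The retry behavior that some applications implement can be sketched as follows. This is a generic illustration, not the method of any particular application or vendor: the function name `retry_io`, its parameters, and the default values are assumptions. It assumes that the failed operation raises `OSError` and that repeating it is safe (idempotent).

```python
import time


def retry_io(operation, attempts=5, initial_delay=1.0, backoff=2.0,
             sleep=time.sleep):
    """Call operation() and retry it with growing delays after an OSError.

    The last error is re-raised if every attempt fails. The sleep argument
    can be replaced, for example in tests, to avoid real waiting.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # Give up and surface the last I/O error.
            sleep(delay)
            delay *= backoff
```

A caller would wrap a single I/O step, for example a function that opens and reads one file on the share. Retrying non-idempotent operations, such as appending to a file, can duplicate data, so this pattern needs care for writes.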

Continuously available (CA) shares are an SMB3.x feature that improves failover resilience for database-like applications. NetApp Volumes supports continuously available shares for Microsoft SQL Server and FSLogix.

Failure recovery improves with every new SMB version. NetApp Volumes supports SMB 2.1, 3.0, and 3.1.1. If possible, use the latest supported SMB version. Windows 10/Server 2016 and later support the latest SMB version 3.1.1.

SMB application-based precautions

Certain SMB-based applications require SMB Transparent Failover. SMB Transparent Failover enables maintenance operations on SMB volumes within NetApp Volumes without interrupting connectivity to server applications that store and access data. NetApp Volumes provides the SMB continuously available shares option so that specific applications can use SMB Transparent Failover. SMB continuously available shares support only the following workloads:

  • FSLogix user profile containers

  • Microsoft SQL Server (not Linux SQL Server)

SMB continuously available shares don't support custom applications.

Planned maintenance, such as platform updates, service upgrades, and software upgrades, might occur occasionally. Maintenance operations are considered non-disruptive from a file protocol (NFS or SMB) perspective as long as the application can handle the I/O pauses that might occur during these events. The I/O pauses are typically short and range from a few seconds up to 30 seconds.

What's next

Read about Google Cloud NetApp Volumes security considerations.