Prepare for a peak capacity event

This guide explains how to prepare for peak capacity events and high traffic so that you can deliver your services with minimal to no downtime and without degrading performance or the user experience.

When managing events in the cloud, there are three key phases:

Preparation: complete activities and tasks to prepare for your event, including an architecture review, capacity planning, and creating reservations.
Execution: when your event begins, closely monitor it and react accordingly.
Analysis: after your event is completed, analyze what went well, what didn't, and how to improve future events.

This guide focuses on the preparation phase.

Checklist

Learn how to prepare for a peak capacity event by completing the following tasks:

Understand event management best practices
Request and secure the appropriate capacity for your event
Prepare for rapid issue resolution during your event
Manage and optimize your cloud resources
Schedule and conduct load testing
Schedule and conduct disaster recovery testing

Before you begin

Depending on the Customer Care support package that you have purchased, make sure that you understand the event readiness services available to you:

Enhanced Support: you can optimize readiness for your events, launches, and migrations by using Planned Event Support, a Value-Add Service that covers the critical phases of short-term, high-traffic digital events such as grand openings, new product launches, or data migrations.
Premium Support: you can use the Event Management Service for planned peak events, such as a product launch or major sales event. With this service, Customer Care partners with you to create a plan, provides guidance throughout the event, and supports you with the following:
- Preparing your systems for key moments and heavy workloads
- Running disaster tests to proactively resolve potential issues
- Developing and implementing a faster path to resolution to reduce the impact of any issues that might occur
After the event, your Technical Account Manager (TAM) works with you to review the outcomes and make recommendations for future events. To get started or to learn more, contact your TAM.

Understand event management best practices

Follow these event management best practices:

Detail your event planning process: during the preparation phase for a peak capacity event, engage with your account team or TAM. They can help you review your architecture; create diagrams, timelines, and launch checklists; check service quotas based on your anticipated traffic profile; and determine the overall impact on your projects.
Monitor and react during execution: when your peak season or launch event begins, you'll need to closely monitor activity and react as needed. If any issues occur, a well-defined incident management and escalation process is key to reducing the effort and time it takes to address and resolve the issue.
Review and document your analysis: when your peak season or launch event is over, review and analyze the event, and document the lessons learned so that you can apply them to your next major event.

For more information, see Event management best practices.

Request and secure the appropriate capacity for your event

A critical component of the preparation phase is capacity planning, where you determine the amount of cloud resources that your workloads need to operate effectively. As you prepare for your high-traffic event or peak season, make sure that your quotas match your resource requirements so that you don't encounter unexpected failures. For more information, see Secure capacity for your event.
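
The comparison between quotas and resource requirements can be sketched in a few lines of Python. This is a minimal illustration, not a Google Cloud API call: the function name, the metric names, the usage numbers, and the 20% headroom are all assumptions chosen for the example. In practice, the forecast and quota values would come from Capacity Planner and the Quotas and system limits page.

```python
def quota_shortfalls(forecast_peak, quota_limits, headroom_percent=20):
    """Return {metric: extra quota needed} for metrics whose quota is too low.

    forecast_peak: expected peak usage per quota metric (illustrative names).
    quota_limits: current quota value per metric.
    headroom_percent: safety margin added on top of the forecast. The default
        of 20 is an assumption for this sketch, not a documented recommendation.
    """
    shortfalls = {}
    for metric, peak in forecast_peak.items():
        # Integer ceiling of peak * (100 + headroom) / 100, avoiding float rounding.
        required = (peak * (100 + headroom_percent) + 99) // 100
        limit = quota_limits.get(metric, 0)  # a missing quota is treated as zero
        if required > limit:
            shortfalls[metric] = required - limit
    return shortfalls


if __name__ == "__main__":
    # Hypothetical forecast and quota values for one region.
    forecast = {"CPUS": 1800, "SSD_TOTAL_GB": 40000, "NVIDIA_T4_GPUS": 16}
    limits = {"CPUS": 2000, "SSD_TOTAL_GB": 81920, "NVIDIA_T4_GPUS": 16}
    for metric, extra in quota_shortfalls(forecast, limits).items():
        print(f"{metric}: request at least {extra} more")
```

Any metric that this check reports is a candidate for a quota adjustment request before the event.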

Use Capacity Planner

Capacity Planner lets you view historical and forecasted usage of your virtual machine (VM) instances, Persistent Disk volumes, and GPUs. You can also view reservations, reserve resources in advance, manage your project's quota limits, and set up automatic quota increase requests. For more information, see View usage and forecast data in Capacity Planner.

In the Google Cloud console, go to the Capacity Planner page.
Using the resource selector on the console toolbar, select a project, folder, or organization.
Select the type of Compute Engine resource you want to view usage data for:
- To view your VMs' usage data, click the Virtual machine tab, if it isn't already selected.
- To view your Persistent Disk volumes' usage data, click the Disk tab.
- To view your GPUs' usage data, click the GPU tab.
To see different visualizations of the resource's usage and forecast data, filter the displayed elements.

Enable the quota adjuster

The quota adjuster monitors your resource consumption and proactively submits quota adjustment requests on your behalf so that you don't have to make manual requests. For more information, see How the quota adjuster works.

In the Google Cloud console, go to the IAM & Admin > Quotas and system limits page.
Click the Configurations tab.
To turn on the quota adjuster, click the Enable toggle to the on position.

When the Status column reads Enabled, the quota adjuster monitors your usage and issues quota adjustment requests when resource use approaches its quota value.

Request a quota adjustment

If your quotas aren't sufficient for what you need, you can request a quota adjustment. For more information, see View and manage quotas.

In the Google Cloud console, go to the IAM & Admin > Quotas and system limits page.
Select the checkbox next to the quota that you want to update.

Optionally, to search for a quota, use the Filter.
Click Edit. The Quota changes pane appears.
In the New value field, enter the quota value that you want. Some quota values have a unit of measurement; if this applies, select the unit that you want in the adjacent list.
If you see a message indicating that you will remove the quota override by requesting a quota greater than the override, select the checkbox to proceed, and then select Apply for higher quota.
In the Request description field, enter a description, and then click Done.
Click Next and provide your contact details.
Click Submit request.

If your request is denied, you can try requesting the quota increase by creating a support case, or by contacting your account team or TAM.

Create a reservation

Reservations help ensure that you have the Compute Engine resources available to create VMs with the same hardware (memory and vCPUs) and optional resources (GPUs and Local SSD disks) whenever you need them.

An on-demand reservation is provisioned at the time you request it, if the requested capacity is available. A future reservation lets you request capacity in advance. We recommend that you procure future reservations well in advance of your event. For more information, see Reservations of Compute Engine zonal resources.

Create a single-project on-demand reservation

In the Google Cloud console, go to the Reservations page.
On the On-demand reservations tab (default), click Create reservation.

The Create a reservation page opens.
Enter a Name for your reservation.
Select a Region and Zone where you want to reserve resources.
For the Share type, select Local if it isn't already selected.
To allow a reservation of GPU VMs to be consumed by Vertex AI custom training jobs or prediction jobs, in the Google Cloud services section, select Share reservation.
Select a Use with VM instance option:
- To allow matching VMs to automatically consume this reservation, select Use reservation automatically if it isn't already selected.
- To consume this reservation's resources only when creating matching VMs that specifically target this reservation by name, select Select specific reservation.
  
  Note: If your instance template specifies an A3 machine type or a compact placement policy, then you must select Select specific reservation. Otherwise, you encounter errors.
Enter the Number of VM instances that you want to reserve.
In the Machine configuration section, do the following:
1. To specify the properties of your VMs from an existing instance template, select Use instance template.
2. Specify an Instance Template. If you select a regional template, you can only reserve resources in that region.
To automatically delete the reservation at a specified time, select the Enable auto-delete checkbox. Automatically deleting reservations can be useful to avoid unnecessary charges when you stop consuming the reservation.
To create the reservation, click Create.

The Reservations page opens. It might take up to a minute to create the reservation.
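
Before you fill in the form, it can help to write down the choices and check them against the rule in the preceding note. The following Python sketch does that. It doesn't call Compute Engine: the `ReservationPlan` class, its field names, and the `validate` function are hypothetical, and the check for A3 machine types assumes that their names start with `a3-`.

```python
from dataclasses import dataclass


@dataclass
class ReservationPlan:
    """Choices made on the Create a reservation page (hypothetical model)."""
    name: str
    zone: str
    vm_count: int
    machine_type: str            # machine type specified by the instance template
    compact_placement: bool      # True if the template uses a compact placement policy
    consume_automatically: bool  # True = "Use reservation automatically"


def validate(plan):
    """Return a list of problems with the plan (empty if none were found)."""
    problems = []
    if plan.vm_count < 1:
        problems.append("Number of VM instances must be at least 1.")
    # Assumption: A3 machine type names start with "a3-".
    needs_specific = plan.machine_type.startswith("a3-") or plan.compact_placement
    if needs_specific and plan.consume_automatically:
        problems.append(
            "A3 machine types and compact placement policies require "
            "'Select specific reservation'."
        )
    return problems


if __name__ == "__main__":
    plan = ReservationPlan(
        name="peak-event-gpu",
        zone="us-central1-a",
        vm_count=8,
        machine_type="a3-highgpu-8g",
        compact_placement=False,
        consume_automatically=True,
    )
    for problem in validate(plan):
        print(problem)
```

The check only covers the constraints stated in this section. The console and Compute Engine apply additional validation when you create the reservation.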

Create a single-project future reservation

In the Google Cloud console, go to the Reservations page.
Click the Future reservations tab.
Click Create future reservation.

The Create a future reservation page opens.
For Name, enter a name for your future reservation request.
For Prefix, enter a name prefix. The name of each auto-created reservation from this future reservation request starts with this prefix.
For Region and Zone, select the region and zone in which to reserve resources.
For Start time, enter a start time for the reservation period. The start time can't be later than one year from the date and time at which you submit a future reservation request for review. To have sufficient time to optionally modify, cancel, or delete the future reservation before it enters its lock time, specify a recommended value for the start time.
For End time, enter an end time for the reservation period. In the Duration summary section, you can see the length of the reservation period. To increase the likelihood that Google Cloud approves your future reservation request after submitting it for review, specify a recommended value for the reservation period.
In the Share type section, select Local if it's not already selected.
For Total capacity needed, enter the total count of VMs that you want reserved for the period, VM configuration, and projects specified in this future reservation request.

Important: Verify that you have sufficient quota for the resources that you're requesting to reserve. Otherwise, if you submit the future reservation request for review, Google Cloud declines it.
In the Machine configuration section, do the following:
1. Select Use instance template.
2. For Instance template, select the instance template of your choice. If you select a regional instance template, then you can only reserve resources within the same region as the template's region.
In the Auto-generated reservations auto-delete section, do one of the following:
- To enable Compute Engine to automatically delete the auto-created reservations for your future reservation request, do the following:
  
  Caution: If the auto-delete option is enabled, then all the auto-created reservations are deleted at the specified deletion time, regardless of whether they're fully consumed.
  1. Click the Enable auto-delete toggle to the on position if it isn't already on.
  2. Optional: For Auto-delete time, enter the date and time at which to delete the auto-created reservations. The date and time must be after the end time of the reservation period. If you leave this field empty, then the auto-created reservations are deleted within two hours after the reservation period's end time.
- Otherwise, click the Enable auto-delete toggle to the off position.
To finish creating the draft future reservation request, click Save a draft.

The Reservations page opens. Creating the future reservation request can take up to a minute to complete.
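
The timing rules in the preceding steps can be checked before you submit a request. The following Python sketch encodes only the rules stated in this section: the start time can't be later than one year after submission, the auto-delete time must be after the end time, and reservations without an auto-delete time are deleted within two hours after the end time. The function names are hypothetical, "one year" is approximated as 365 days, and the requirement that the end time follows the start time is an added sanity check.

```python
from datetime import datetime, timedelta, timezone

MAX_START_LEAD = timedelta(days=365)       # "one year", approximated as 365 days
DEFAULT_DELETE_DELAY = timedelta(hours=2)  # used when no auto-delete time is set


def check_future_reservation(submitted, start, end, auto_delete=None):
    """Return a list of problems with the proposed times (empty if none)."""
    problems = []
    if start > submitted + MAX_START_LEAD:
        problems.append("Start time is more than one year after submission.")
    if end <= start:
        problems.append("End time must be after the start time.")
    if auto_delete is not None and auto_delete <= end:
        problems.append("Auto-delete time must be after the end time.")
    return problems


def latest_deletion(end, auto_delete=None):
    """Latest time at which auto-created reservations are deleted (auto-delete on)."""
    return auto_delete if auto_delete is not None else end + DEFAULT_DELETE_DELAY


if __name__ == "__main__":
    # Hypothetical dates for a one-week sales event.
    submitted = datetime(2025, 6, 1, 9, 0, tzinfo=timezone.utc)
    start = datetime(2025, 11, 24, 0, 0, tzinfo=timezone.utc)
    end = datetime(2025, 12, 2, 0, 0, tzinfo=timezone.utc)
    print(check_future_reservation(submitted, start, end))
    print(latest_deletion(end))
```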

Prepare for rapid issue resolution during your event

How you respond to, categorize, and solve incidents of differing severity can significantly affect your operations during an event. Make sure to use a centralized incident management system for effective tracking throughout the incident lifecycle.

In addition to reviewing your contacts and enabling Personalized Service Health, you should do the following:

Review best practices when working with Customer Care
Create a communication plan
Ensure access

For more information, see Prepare to resolve issues during your event.

Review your contacts

Many Google Cloud offerings send out notifications to share important information with Google Cloud users. By default, these notifications are sent to members with certain Identity and Access Management (IAM) roles. With Essential Contacts, you can customize who receives notifications by providing your own list of contacts. For more information, see Managing contacts for notifications.

In the Google Cloud console, go to the IAM & Admin > Essential Contacts page.
Make sure the name of your project, folder, or organization appears in the resource selector on the console toolbar. The resource selector shows which project, folder, or organization you are managing contacts for.
To list the contacts by category, select Category. To list the contacts alphabetically, select Contacts.

Note: When you view contacts for an organization or folder, you don't see contacts for its child resources. If you are viewing contacts for a child resource (folder or project) that inherits contacts from its ancestors (organizations or other folders), you see the inherited contacts for any ancestors that you have permission to view contacts for.
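
The inheritance behavior described in the preceding note can be modeled with a small Python sketch. This is a simplified illustration and not the Essential Contacts API: the function name, the resource names, and the email addresses are hypothetical.

```python
def visible_contacts(resource, parents, contacts, can_view):
    """Contacts shown for `resource`: its own, plus inherited ones you may view.

    parents: maps each resource to its parent (None for the organization).
    contacts: maps each resource to the contacts defined directly on it.
    can_view: ancestors whose contacts you have permission to view.
    """
    shown = list(contacts.get(resource, []))
    ancestor = parents.get(resource)
    while ancestor is not None:
        if ancestor in can_view:
            shown.extend(contacts.get(ancestor, []))
        ancestor = parents.get(ancestor)
    return shown


if __name__ == "__main__":
    parents = {"project-a": "folder-x", "folder-x": "org-1", "org-1": None}
    contacts = {
        "org-1": ["security@example.com"],
        "folder-x": ["ops@example.com"],
        "project-a": ["oncall@example.com"],
    }
    # Viewing the project: own contacts plus those of viewable ancestors.
    print(visible_contacts("project-a", parents, contacts, can_view={"org-1"}))
    # Viewing the organization: contacts of child resources aren't included.
    print(visible_contacts("org-1", parents, contacts, can_view=set()))
```

Walking up the hierarchy in this way is a reminder to check each level before your event, because a contact defined only on a project doesn't appear when you review the organization.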

Enable Personalized Service Health

Personalized Service Health lets you identify Google Cloud service disruptions relevant to your projects so that you can manage and respond to them efficiently. For more information, see the Personalized Service Health overview.

Make sure that you have enabled the Service Health API ahead of your event and that your organization can access the dashboard and configure alerts. For more information, see Manage Personalized Service Health access.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Enable the Service Health API.
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission.

Manage and optimize your cloud resources

Manage and optimize the resources that are used by your Google Cloud workloads. This involves right-sizing resources based on actual usage and demand, using autoscaling for dynamic resource allocation, and reviewing architecture and security recommendations.

In addition to reviewing Active Assist recommendations, you should do the following:

Review Google Cloud best practices
Review scalability
Review your product versions
Review alerts and dashboards

For more information, see Optimize your cloud resources.

Review Active Assist recommendations

Active Assist refers to the portfolio of tools used in Google Cloud to generate recommendations and insights to help you optimize your Google Cloud projects. For more information, see What is Active Assist.

In the Google Cloud console, go to the Active Assist page.
Using the resource selector on the console toolbar, select a Google Cloud project, folder, or organization.

You can only see recommendations at an organization or folder level if you have permissions for that organization or folder, regardless of your permissions for folders or projects within it.
In the navigation pane, click All recommendations.

Recommendations that you can filter, sort, and share are displayed. You can explore recommendations in detail, see what resources are affected, and what impact applying a recommendation might have.
To learn more about a recommendation, in the Recommendation column, click a recommendation.

Relevant data that might help you decide whether to apply or dismiss the recommendation is displayed.
If you have the required permissions, you can apply or dismiss a recommendation by clicking the applicable button.

Note: Some recommendations can't be applied directly from this page. To apply them, follow the instructions in the console.

Schedule and conduct load testing

Load testing can determine whether your system will scale successfully during production use, and find any bottlenecks that prevent it from scaling.

Three to five months before your event, you should perform load testing on critical projects and workloads to simulate peak traffic.
Investigate the quota and cost implications of the load tests, and consider creating Cloud Billing budget alerts to monitor the cost of the services that you plan to stress the most.
After each test, evaluate the results and use Capacity Planner to visualize your usage and forecast data, and request quota increases as necessary. In this document, see Use Capacity Planner and Request a quota adjustment.
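
To reason about where to set budget alert thresholds for a load test, you can model the threshold logic locally. The following Python sketch doesn't call Cloud Billing. The function name, the budget amount, and the threshold percentages of 50, 90, and 100 are illustrative assumptions.

```python
def crossed_thresholds(budget_amount, spend, threshold_percents=(50, 90, 100)):
    """Return the threshold percentages that `spend` has reached or passed.

    Uses integer comparison (spend * 100 >= budget * percent) so that a spend
    exactly on a threshold is counted without floating-point rounding issues.
    """
    return [p for p in threshold_percents if spend * 100 >= budget_amount * p]


if __name__ == "__main__":
    # Hypothetical budget of 2000 currency units for one load test cycle.
    print(crossed_thresholds(2000, 1850))
```

If early test runs already cross the higher thresholds, revisit the test scope or the budget before you schedule the full peak-traffic simulation.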

If you require assistance, contact your account team or TAM. They can help you capture and plan architecture changes to improve the overall reliability and availability of your services.

Schedule and conduct disaster recovery testing

If you have a robust and well-tested disaster recovery (DR) strategy, you can minimize the impact of disruptions, achieve faster recovery times, and resume core operations more rapidly when things go wrong.

Your DR strategy should include detailed emergency response requirements, backup operations, and recovery procedures.

One to three months before your event, make sure to run through simulations of disaster scenarios to identify any gaps in your DR plans and readiness.
After each test, evaluate the results to determine if any architectural changes are required. Review and optimize resources as needed. In this document, see Manage and optimize your cloud resources.

Your account team or TAM can help you with your DR testing and help you understand and improve your incident response process. For more information, see the Disaster recovery planning guide.

Summary of resources

The following list shows the guides that are referred to in this document, grouped by task.

Understand event management best practices
- Event management best practices

Request and secure the appropriate capacity for your event
- About future reservation requests
- Create and manage support cases
- Instance templates
- Quota adjuster
- Reservations of Compute Engine zonal resources
- Secure capacity for your event
- Use reservations with prediction
- Use reservations with training
- View and manage quotas
- View usage and forecast data in Capacity Planner

Prepare for rapid issue resolution during your event
- Manage Personalized Service Health access
- Managing contacts for notifications
- Personalized Service Health overview
- Prepare to resolve issues during your event

Manage and optimize your cloud resources
- Optimize your cloud resources
- What is Active Assist

Schedule and conduct load testing
- Create, edit, or delete budgets and budget alerts

Schedule and conduct disaster recovery testing
- Disaster recovery planning guide