Event management best practices

When managing peak season or high-traffic events in the cloud, there are three key phases:

  1. Preparation: complete activities and tasks to prepare for your event, including an architecture review, capacity planning, and creating reservations.
  2. Execution: when your event begins, closely monitor it and react accordingly.
  3. Analysis: after your event is completed, analyze what went well, what didn't, and how to improve future events.

Follow the best practices for each of these phases so that you can deliver your services with minimal to no downtime, and without degrading performance or user experience during your event.

Preparation: detail your event planning process

During the preparation phase for a peak capacity event, engage your account team or Technical Account Manager (TAM). They can help you review your architecture; create diagrams, timelines, and launch checklists; check service quotas against your anticipated traffic profile; and determine the overall impact on your projects.
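
The quota check described above can be sketched as a simple comparison of each quota limit against the anticipated peak plus a safety margin. This is a minimal illustration, not an official tool: the quota names, the numbers, and the 20% headroom default are all hypothetical, and in practice you would take the limits from your project's quota page and the peaks from your own traffic forecast.

```python
from dataclasses import dataclass


@dataclass
class QuotaCheck:
    name: str             # hypothetical quota name, for illustration only
    limit: float          # current quota limit for the project
    expected_peak: float  # anticipated peak usage during the event


def quotas_needing_increase(checks, headroom=0.2):
    """Return the names of quotas whose limit is below the expected peak
    plus a fractional safety margin (headroom)."""
    return [
        c.name
        for c in checks
        if c.limit < c.expected_peak * (1 + headroom)
    ]


# Illustrative values only; replace them with your own limits and forecast.
checks = [
    QuotaCheck("cpus_per_region", limit=500, expected_peak=450),
    QuotaCheck("in_use_ip_addresses", limit=200, expected_peak=120),
]

print(quotas_needing_increase(checks))
```

Any quota that the function returns is a candidate for an increase request, which is worth raising with your account team well before the event date.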

Planning should start well ahead of your event date. In some cases, you might need to start planning months in advance, while your service is still in development. Put together a team that can help you review your proposed event workflow, the expected audience, your requirements and success criteria, and the end-to-end media delivery chain. The preparation phase should include risk assessments, mitigation planning, and an operational readiness review to confirm that your workflow is well designed.

For more information, review the Google Cloud Architecture Framework, which provides recommendations to help architects, developers, administrators, and other cloud practitioners design and operate a cloud topology that's secure, efficient, resilient, high-performing, and cost-effective.

Execution: monitor and react during execution

When your peak season or launch event begins, you'll need to closely monitor activity and react as needed. With the monitoring, alerting, and logging that you set up in the preparation phase, you can track business-critical traffic, logs, and quota levels, and use this information to determine root causes and resolve issues quickly. For more information, see Review alerts and dashboards.

If issues occur, a well-defined incident management and escalation process is key to reducing the time and effort that it takes to resolve them. If you haven't already, establish an incident management process. For more information, see Prepare to resolve issues during your event.

Analysis: review and document your analysis

When your peak season or launch event is over, review and analyze the event, and document the lessons learned so that you can apply them to your next major event.

Focus on the following areas:

  • Timeline recap: capture when your traffic began to increase and the key events (peaks) during the event period. Identify when issues, if any, arose.

  • Root cause analysis: investigate any issues that occurred. Is there anything that you or Google Cloud could have done differently? Is this something to consider for next time? Document any lessons learned and necessary steps to improve for the future.

  • Compare predictions versus actual: analyze your traffic prediction versus the actual traffic you recorded. Where were additional resources needed? Where were resources underutilized or unnecessary?

  • Retrospective: share and review the preceding information with key stakeholders. Promote a blameless culture, where you assume that everyone involved had good intentions and you focus on identifying contributing causes without indicting any individual or team.
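
As one way to approach the predicted-versus-actual comparison, the following sketch computes the ratio of recorded peak traffic to forecast peak traffic for each service and labels the result. The service names, the request counts, and the 10% tolerance are hypothetical assumptions chosen for illustration; substitute the figures from your own forecast and monitoring data.

```python
def compare_forecast(predicted, actual, tolerance=0.1):
    """Compare predicted and actual peak traffic per service.

    Returns a dict that maps each service to (ratio, status), where ratio
    is actual / predicted rounded to two decimals, and status says whether
    the forecast was too low, too high, or within the tolerance.
    """
    report = {}
    for service, pred in predicted.items():
        ratio = actual[service] / pred
        if ratio > 1 + tolerance:
            status = "under-forecast"  # more resources were needed
        elif ratio < 1 - tolerance:
            status = "over-forecast"   # resources were underutilized
        else:
            status = "on target"
        report[service] = (round(ratio, 2), status)
    return report


# Illustrative peak requests per second; not real measurements.
predicted = {"frontend": 10000, "checkout": 2000, "search": 5000}
actual = {"frontend": 12500, "checkout": 1900, "search": 3000}

for service, (ratio, status) in compare_forecast(predicted, actual).items():
    print(f"{service}: {ratio:.2f}x of forecast ({status})")
```

Services labeled under-forecast point to where additional resources were needed, and services labeled over-forecast point to where resources were underutilized. Both are useful inputs for the retrospective and for the next event's capacity plan.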

What's next