Decide the security for your Google Cloud landing zone

Last reviewed 2023-08-31 UTC

This document introduces important security decisions and recommended options to consider when designing a Google Cloud landing zone. It's part of a series about landing zones, and is intended for security specialists, CISOs, and architects who want to understand the decisions that they need to make when designing a landing zone in Google Cloud.

In this document, it's assumed that a central team, such as the security team or the platform team, enforces these landing zone security controls. Because the focus of this document is the design of enterprise-scale environments, some strategies that it describes might be less relevant for small teams.

Decision points for securing your Google Cloud landing zone

To choose the best security design for your organization, you must make the decisions that are described in the following sections.

Architecture diagram

The example architecture described in this document uses common security design patterns. Your specific controls might vary based on factors such as your organization's industry, target workloads, or additional compliance requirements. The following diagram shows the security controls architecture that you apply in your landing zone when you follow the recommendations in this document.

Example security controls architecture.

The preceding diagram shows the following:

  • Service account key management helps mitigate risk from long-lived service account credentials.
  • VPC Service Controls defines a perimeter around sensitive resources that helps to restrict access from outside the perimeter.
  • Security Command Center monitors the environment for insecure configurations and threats.
  • A centralized log sink collects audit logs from all projects.
  • Google default encryption at rest encrypts all data that persists to disk.
  • Google default encryption in transit applies to layer 3 and layer 4 network paths.
  • Access Transparency gives you visibility and control over how Google can access your environment.

Decide how to limit persistent credentials for service accounts

Service accounts are machine identities that you use to grant IAM roles to workloads and allow those workloads to access Google Cloud APIs. A service account key is a persistent credential, and any persistent credential is potentially high risk. We don't recommend that you let developers freely create service account keys.

For example, if a developer accidentally commits the service account key to a public Git repository, an external attacker can authenticate using those credentials. As another example, if the service account key is stored in an internal repository, a malicious insider who can read the key could use the credentials to escalate their own Google Cloud privileges.

To define a strategy to manage these persistent credentials, you must provide viable alternatives, limit the proliferation of persistent credentials, and manage how they are used. For information about alternatives to service account keys, see Choose the best authentication method for your use case.

The following sections describe the options to limit persistent credentials. We recommend option 1 for most use cases. The other options discussed in the following sections are alternatives that you can consider if option 1 doesn't apply to your specific organization.

Option 1: Restrict use of persistent service account keys

We recommend that you do not permit any users to download service account keys because exposed keys are a common attack vector. Restricting the use of persistent service account keys is an option that can help reduce the risk and overhead of manually managing service account keys.

To implement this option, consider the following:

  • To prevent developers from creating and downloading persistent credentials, enforce the organization policy constraint constraints/iam.disableServiceAccountKeyCreation.
  • Educate your teams on more secure alternatives to service account keys. For example, when users and applications that are outside of your Google Cloud environment need to use a service account, they can authenticate with service account impersonation or workload identity federation instead of a service account key.
  • Design a process for teams to request an exception to this policy when downloading a service account key is the only viable option. For example, a third-party SaaS product might require a service account key to read logs from your Google Cloud environment.
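The following Python is a minimal sketch of the policy bodies that the Organization Policy API (v2) accepts for a boolean constraint. The first body enforces the constraint for the whole organization. The second is a project-level override that records an approved exception. The organization ID and project number are hypothetical placeholders. The code only constructs JSON and doesn't call the API.

```python
import json

CONSTRAINT = "iam.disableServiceAccountKeyCreation"


def enforce_policy(parent: str) -> dict:
    """Org Policy v2 body that enforces the boolean constraint on `parent`.

    `parent` is a resource name such as "organizations/123456789012"
    (a hypothetical ID in this sketch).
    """
    return {
        "name": f"{parent}/policies/{CONSTRAINT}",
        "spec": {"rules": [{"enforce": True}]},
    }


def exception_policy(project: str) -> dict:
    """Project-level override that disables the constraint for one approved project.

    Create this only as the outcome of your exception-request process.
    """
    return {
        "name": f"{project}/policies/{CONSTRAINT}",
        "spec": {"rules": [{"enforce": False}]},
    }


if __name__ == "__main__":
    # Hypothetical organization ID and project number.
    print(json.dumps(enforce_policy("organizations/123456789012"), indent=2))
    print(json.dumps(exception_policy("projects/987654321098"), indent=2))
```

You could save each body to a file and apply it with `gcloud org-policies set-policy`. Keeping exceptions as separate project-level policies makes them easy to audit and revoke.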

Avoid this option when you already have tooling in place to generate short-lived API credentials for service accounts.

Option 2: Use additional access management tools to generate short-lived credentials

As an alternative to restricting the use of persistent service account keys, you can generate short-lived credentials for service accounts. Short-lived credentials create less risk than persistent credentials such as service account keys. You can develop your own tooling or use third-party solutions such as HashiCorp Vault to generate short-lived access credentials.
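If you build your own tooling, it typically calls the IAM Credentials API method `serviceAccounts.generateAccessToken`. The following Python is a sketch that builds, but doesn't send, that request. The service account email is hypothetical, and authenticating the caller isn't shown. The caller needs the Service Account Token Creator role (roles/iam.serviceAccountTokenCreator) on the target service account.

```python
MAX_LIFETIME_S = 3600  # default cap on access-token lifetime (1 hour)
EXTENDED_MAX_LIFETIME_S = 12 * 3600  # cap when org policy allows lifetime extension


def build_access_token_request(sa_email: str, scopes: list[str], lifetime_s: int,
                               extended: bool = False) -> tuple[str, dict]:
    """Return the URL and JSON body for serviceAccounts.generateAccessToken.

    Set `extended=True` only if the organization policy constraint
    constraints/iam.allowServiceAccountCredentialLifetimeExtension permits
    longer lifetimes for this service account.
    """
    limit = EXTENDED_MAX_LIFETIME_S if extended else MAX_LIFETIME_S
    if not 0 < lifetime_s <= limit:
        raise ValueError(f"lifetime_s must be in (0, {limit}]")
    if not scopes:
        raise ValueError("at least one OAuth scope is required")
    url = ("https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
           f"{sa_email}:generateAccessToken")
    return url, {"scope": list(scopes), "lifetime": f"{lifetime_s}s"}


if __name__ == "__main__":
    url, body = build_access_token_request(
        "deployer@example-project.iam.gserviceaccount.com",  # hypothetical account
        ["https://www.googleapis.com/auth/cloud-platform"],
        lifetime_s=900,  # a 15-minute token
    )
    print(url)
    print(body)
```

Requesting the shortest lifetime that the task needs limits the damage if a token leaks.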

Use this option when you have already invested in a third-party tool for generating short-lived credentials for access control, or have sufficient budget and capacity to develop your own solution.

Avoid using this option when you don't have existing tooling to grant short-lived credentials, or don't have the capacity to build your own solution.

For more information, see Creating short-lived service account credentials.

Decide how to mitigate data exfiltration through Google APIs

Google APIs have public endpoints that are available to all customers. While every API resource in your Google Cloud environment is subject to IAM access controls, there is a risk that data could be accessed using stolen credentials, exfiltrated by malicious insiders or compromised code, or exposed through a misconfigured IAM policy.

VPC Service Controls is a solution that addresses these risks. However, VPC Service Controls also introduces complexity to your access model, so you must design VPC Service Controls to meet your unique environment and use case.

The following sections describe the options to mitigate data exfiltration through Google APIs. We recommend option 1 for most use cases. The other options discussed in the following sections are alternatives that you can consider if option 1 doesn't apply to your specific use case.

Option 1: Configure VPC Service Controls broadly across your environment

We recommend that you design your environment within one or more VPC Service Controls perimeters that restrict all supported APIs. Configure exceptions to the perimeter with access levels or ingress policies so that developers can access the services that they require, including console access where needed.
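As a sketch of this design, the following Python builds a `ServicePerimeter` body for the Access Context Manager API. The perimeter places every landing-zone project inside it and grants an exception through an access level. The policy ID, perimeter name, project numbers, service list, and access level name are all hypothetical. In practice you would list every supported service that your workloads use.

```python
import json


def build_perimeter(policy_id: str, perimeter: str, project_numbers: list[str],
                    restricted_services: list[str], dev_access_level: str) -> dict:
    """Access Context Manager ServicePerimeter body (sketch; no API call)."""
    prefix = f"accessPolicies/{policy_id}"
    return {
        "name": f"{prefix}/servicePerimeters/{perimeter}",
        "title": perimeter,
        "perimeterType": "PERIMETER_TYPE_REGULAR",
        "status": {
            # Every project in the landing zone goes inside the perimeter.
            "resources": [f"projects/{n}" for n in project_numbers],
            # Deduplicated, sorted list of the APIs that the perimeter restricts.
            "restrictedServices": sorted(set(restricted_services)),
            # Requests that satisfy this access level (for example, from
            # managed developer devices) are allowed into the perimeter.
            "accessLevels": [f"{prefix}/accessLevels/{dev_access_level}"],
        },
    }


if __name__ == "__main__":
    perimeter = build_perimeter(
        "111111111111", "landing_zone", ["101", "102"],  # hypothetical IDs
        ["storage.googleapis.com", "bigquery.googleapis.com"],
        "managed_devices",  # hypothetical access level
    )
    print(json.dumps(perimeter, indent=2))
```

An access level is the simplest exception when developer access has consistent attributes. For narrower, per-service exceptions, you can use ingress policies instead.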

Use this option when the following is true:

  • The services that you intend to use support VPC Service Controls, and your workloads do not require unrestricted internet access.
  • You store sensitive data on Google Cloud that could be a significant loss if exfiltrated.
  • You have consistent attributes for developer access that can be configured as exceptions to the perimeter, allowing users to access the data that they need.

Avoid this option when your workloads require unrestricted internet access or services that are not supported by VPC Service Controls.

Option 2: Configure VPC Service Controls for a subset of your environment

Instead of configuring VPC Service Controls broadly across your environment, you can configure VPC Service Controls only on the subset of projects that contain sensitive data and internal-only workloads. This option lets you use a simpler design and operation for most projects, while still prioritizing data protection for projects with sensitive data.

For example, you might consider this alternative when a limited number of projects contain BigQuery datasets with sensitive data. You can define a service perimeter around just these projects, and define ingress and egress rules to allow narrow exceptions for the analysts who need to use these datasets.
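For the BigQuery example, a narrow exception could look like the following Python sketch. It builds an ingress policy that lets named analysts call BigQuery in the perimeter's projects, but only from requests that satisfy an access level. The analyst emails, project numbers, and access level name are hypothetical. Restricting only BigQuery is a simplification for illustration.

```python
import json


def analyst_ingress_policy(analyst_emails: list[str], project_number: str,
                           access_level: str) -> dict:
    """Ingress policy allowing listed analysts to call BigQuery in one project."""
    return {
        "ingressFrom": {
            "identities": [f"user:{email}" for email in analyst_emails],
            "sources": [{"accessLevel": access_level}],
        },
        "ingressTo": {
            "resources": [f"projects/{project_number}"],
            "operations": [{
                "serviceName": "bigquery.googleapis.com",
                "methodSelectors": [{"method": "*"}],  # all BigQuery methods
            }],
        },
    }


def sensitive_perimeter_status(project_numbers: list[str], ingress: list[dict]) -> dict:
    """Perimeter `status` that covers only the projects holding sensitive datasets."""
    return {
        "resources": [f"projects/{n}" for n in project_numbers],
        "restrictedServices": ["bigquery.googleapis.com"],  # illustrative only
        "ingressPolicies": ingress,
    }


if __name__ == "__main__":
    level = "accessPolicies/111111111111/accessLevels/corp_network"  # hypothetical
    rule = analyst_ingress_policy(["analyst@example.com"], "201", level)
    print(json.dumps(sensitive_perimeter_status(["201"], [rule]), indent=2))
```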

For another example, in an application with a three-tier architecture, some components might be outside of the perimeter. The presentation tier that allows ingress from user traffic might be a project outside of the perimeter, and the application tier and data tier that contain sensitive data might be separate projects inside the service perimeter. You define ingress and egress rules to the perimeter so that the tiers can communicate across the perimeter with granular access.
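For the three-tier case, an ingress rule can name the outside project as the request source. The following Python sketch builds a rule that admits only the presentation tier's service account, calling from the presentation-tier project, to one API in the application-tier project. The project numbers, service account, and service name are hypothetical.

```python
def tier_ingress_policy(presentation_project: str, frontend_sa: str,
                        app_project: str, service_name: str) -> dict:
    """Ingress rule from a project outside the perimeter to one inside it."""
    return {
        "ingressFrom": {
            "identities": [f"serviceAccount:{frontend_sa}"],
            # Requests must originate from resources in the presentation project.
            "sources": [{"resource": f"projects/{presentation_project}"}],
        },
        "ingressTo": {
            "resources": [f"projects/{app_project}"],
            "operations": [{"serviceName": service_name,
                            "methodSelectors": [{"method": "*"}]}],
        },
    }


if __name__ == "__main__":
    print(tier_ingress_policy(
        "301", "frontend@presentation-prj.iam.gserviceaccount.com",  # hypothetical
        "302", "run.googleapis.com",
    ))
```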

Use this option when the following is true:

  • Only limited and well-defined projects contain sensitive data. Other projects contain data of lower risk.
  • Some workloads are internal only, but some workloads require public internet access or have dependencies on services that are not supported by VPC Service Controls.
  • Configuring VPC Service Controls across all projects creates too much overhead or requires too many workarounds.

Avoid this option when many projects could potentially contain sensitive data.

For more information, see Best practices for enabling VPC Service Controls.

Option 3: Don't configure VPC Service Controls

As another alternative to configuring VPC Service Controls broadly across your environment, you can choose not to use VPC Service Controls, particularly if the operational overhead outweighs the value of VPC Service Controls.

For example, your organization might not have a consistent pattern of developer access that could form the basis of an ingress policy. Perhaps your IT operations are outsourced to multiple third parties, so developers don't have managed devices or access from consistent IP addresses. In this scenario, you might not be able to define ingress rules to allow exceptions to the perimeter that developers need to complete their daily operations.

Use this option when:

  • You use services that do not support VPC Service Controls.
  • Workloads are internet facing and don't contain sensitive data.
  • You don't have consistent attributes of developer access like managed devices or known IP ranges.

Avoid this option when you have sensitive data in your Google Cloud environment.

Decide how to continuously monitor for insecure configurations and threats

Adopting cloud services introduces new challenges and threats compared to using on-premises services. Your existing tools that monitor long-lived servers might not be appropriate for autoscaling or ephemeral services, and might not monitor serverless resources at all. Therefore, you should evaluate security tools that work with the full range of cloud services that you might adopt. You should also continuously monitor your environment against secure cloud standards, such as the CIS Benchmarks for Google Cloud.

The following sections describe the options for continuous monitoring. We recommend option 1 for most use cases. The other options discussed in the following sections are alternatives that you can consider if option 1 doesn't apply to your specific use case.

Option 1: Use Security Command Center Premium

We recommend that you activate the Premium tier of Security Command Center at the organization level, which helps you strengthen your security posture by doing the following:

  • Evaluating your security and data attack surface
  • Providing asset inventory and discovery
  • Identifying misconfigurations, vulnerabilities, and threats
  • Helping you mitigate and remediate risks

When you enable Security Command Center at the beginning of your landing zone build, your organization's security team has near real-time visibility into insecure configurations, threats, and remediation options. This visibility helps your security team assess whether the landing zone meets their requirements and is ready for developers to start deploying applications.

Use this option when the following is true:

  • You want a security posture management and threat detection tool that is integrated with all Google Cloud services without additional integration effort.
  • You want to use the same threat intelligence, machine learning, and other advanced methods that Google uses to protect its own services.
  • Your existing security operations center (SOC) doesn't have the skills or capacity to generate threat insights from a large volume of cloud logs.

Avoid this option when your existing security tools can fully address ephemeral or serverless cloud resources, monitor for insecure configurations, and identify threats at scale in a cloud environment.

Option 2: Use your existing security tools for cloud security posture management and threat detection

As an alternative to Security Command Center Premium, you might consider other cloud security posture management tools. Various third-party tools exist that have similar functions to Security Command Center, and you might already have invested in cloud-native tools that are focused on multi-cloud environments.

You can also use Security Command Center and third-party tools together. For example, you might ingest the finding notifications from Security Command Center into another tool, or you might add a third-party security service to the Security Command Center dashboard. As another example, you might have a requirement to store logs in an existing SIEM system for the SOC team to analyze for threats. You could configure your existing SIEM to ingest only the finding notifications that Security Command Center produces, instead of ingesting a large volume of logs and expecting a SOC team to analyze the raw logs for insight.
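The following Python sketches the SIEM integration. The first function builds a Security Command Center `NotificationConfig` body that streams only active high and critical findings to a Pub/Sub topic. The second maps one notification payload to a SIEM event. The project, topic, and SIEM event schema are hypothetical. Receiving messages from Pub/Sub isn't shown.

```python
import json

# Streaming filter in Security Command Center filter syntax.
HIGH_SEVERITY_ACTIVE = 'state="ACTIVE" AND (severity="HIGH" OR severity="CRITICAL")'


def notification_config(project_id: str, topic_id: str) -> dict:
    """NotificationConfig body that publishes matching findings to Pub/Sub."""
    return {
        "description": "Forward active high and critical findings to the SIEM",
        "pubsubTopic": f"projects/{project_id}/topics/{topic_id}",
        "streamingConfig": {"filter": HIGH_SEVERITY_ACTIVE},
    }


def to_siem_event(message_data: bytes) -> dict:
    """Map one notification message to a hypothetical SIEM event schema."""
    finding = json.loads(message_data).get("finding", {})
    return {
        "source": "security-command-center",
        "finding": finding.get("name"),
        "category": finding.get("category"),
        "severity": finding.get("severity"),
        "resource": finding.get("resourceName"),
        "event_time": finding.get("eventTime"),
    }


if __name__ == "__main__":
    print(notification_config("soc-integration", "scc-findings"))  # hypothetical
```

Filtering at the notification config keeps the SIEM ingest volume proportional to actionable findings rather than to raw log volume.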

Use this option when your existing security tools can fully address ephemeral or serverless cloud resources, monitor for insecure configurations, and identify threats at scale in a cloud environment.

Avoid this option when the following is true:

  • Your existing SOC doesn't have the skills or capacity to generate threat insights from the vast volume of cloud logs.
  • Integrating multiple third-party tools with multiple Google Cloud services introduces more complexity than value.

Decide how to centrally aggregate necessary logs

Most audit logs are stored in the Google Cloud project that produced them. As your environment grows, it can be untenable for an auditor to check logs in every individual project. Therefore, you need to make a decision on how logs will be centralized and aggregated to help your internal audit and security operations.

The following sections describe the options for aggregating logs. We recommend option 1 for most use cases. The other options discussed in the following sections are alternatives that you can consider if option 1 doesn't apply to your specific use case.

Option 1: Retain logs in Google Cloud by using aggregated log sinks

We recommend that you configure a centralized organization-wide log sink for audit logs and other logs that are required by your security team. You can reference the logs scoping tool to identify the logs that your security team requires and whether these log types require explicit enablement.

For example, the security team expects a central record of any resources that your users create so that the security team can monitor and investigate suspicious changes. The security team also requires an immutable record of data access for certain highly sensitive workloads. Therefore, the security team configures one log sink to aggregate admin activity audit logs from all projects into a log bucket that uses Log Analytics, in a central project where they can run impromptu investigations. They then configure a second log sink that routes data access audit logs from projects with sensitive workloads into a Cloud Storage bucket for long-term retention.
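The two sinks in this example can be sketched as `LogSink` bodies for the Cloud Logging API, as in the following Python. Both sinks set `includeChildren` so that they aggregate logs from every project under their parent. This sketch assumes that the sensitive workloads live under a single folder. All IDs and bucket names are hypothetical. For each sink, you would also grant its writer identity access to the destination. For an immutable record, you might add a locked retention policy to the Cloud Storage bucket. Neither step is shown.

```python
ADMIN_ACTIVITY_FILTER = 'logName:"cloudaudit.googleapis.com%2Factivity"'
DATA_ACCESS_FILTER = 'logName:"cloudaudit.googleapis.com%2Fdata_access"'


def log_bucket_destination(project_id: str, location: str, bucket_id: str) -> str:
    return (f"logging.googleapis.com/projects/{project_id}"
            f"/locations/{location}/buckets/{bucket_id}")


def storage_destination(bucket_name: str) -> str:
    return f"storage.googleapis.com/{bucket_name}"


def aggregated_sink(parent: str, name: str, destination: str,
                    log_filter: str) -> tuple[str, dict]:
    """Return (parent, LogSink body) for a sink that includes all child resources."""
    return parent, {
        "name": name,
        "destination": destination,
        "filter": log_filter,
        "includeChildren": True,
    }


if __name__ == "__main__":
    # 1. Admin Activity audit logs from every project in the organization.
    print(aggregated_sink(
        "organizations/123456789012", "org-admin-activity",
        log_bucket_destination("central-logging", "global", "security-audit"),
        ADMIN_ACTIVITY_FILTER))
    # 2. Data Access audit logs only from the (assumed) sensitive-workloads folder.
    print(aggregated_sink(
        "folders/345678901234", "sensitive-data-access",
        storage_destination("example-data-access-archive"),
        DATA_ACCESS_FILTER))
```

Remember that most data access audit logs must be explicitly enabled before a sink has anything to route.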

Use this option when the following is true:

  • Your security team expects a central record of all audit logs or other specific log types.
  • Your security team needs to store logs in an environment with restricted access, outside the control of the workload or teams who produced the log.

Avoid this option when the following is true:

  • Your organization doesn't have a central requirement for consistent audit logs across workloads.
  • Individual project owners have full responsibility for managing their own audit logs.

Option 2: Export required audit logs to storage outside of Google Cloud

As an alternative to storing logs in Google Cloud only, consider exporting audit logs outside of Google Cloud. After you centralize necessary log types into an aggregate log sink in Google Cloud, ingest the contents of that sink to another platform outside of Google Cloud for storing and analyzing logs.

For example, you might use a third-party SIEM to aggregate and analyze audit logs across multiple cloud providers. This tool has sufficient capabilities to work with serverless cloud resources, and your SOC team has the skills and capacity to generate insight from this large volume of logs.

This option can be expensive because of the network egress cost from Google Cloud, as well as the storage cost and capacity in the other environment. Rather than exporting every available log, we recommend that you be selective about which logs are required in the external environment.
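One way to stay selective is to export only named log IDs, typically through a Pub/Sub topic that the external platform subscribes to. The following Python sketch builds the sink filter and body. Log IDs are URL-encoded, as they appear in `logName`. The sink name, project, and topic are hypothetical.

```python
from urllib.parse import quote


def log_name_filter(log_ids: list[str]) -> str:
    """OR together logName clauses for an explicit allowlist of log IDs."""
    if not log_ids:
        raise ValueError("select at least one log ID to export")
    encoded = [quote(log_id, safe="") for log_id in log_ids]  # "/" becomes "%2F"
    return "(" + " OR ".join(f'logName:"{e}"' for e in encoded) + ")"


def pubsub_export_sink(name: str, project_id: str, topic_id: str,
                       log_ids: list[str]) -> dict:
    """LogSink body that routes only the selected logs to a Pub/Sub topic."""
    return {
        "name": name,
        "destination": f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}",
        "filter": log_name_filter(log_ids),
        "includeChildren": True,
    }


if __name__ == "__main__":
    print(pubsub_export_sink(
        "siem-export", "central-logging", "siem-ingest",  # hypothetical names
        ["cloudaudit.googleapis.com/activity", "cloudaudit.googleapis.com/policy"],
    ))
```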

Use this option when you have a requirement to store logs from all environments and cloud providers in a single central location.

Avoid this option when the following is true:

  • Your existing systems don't have the capacity or budget to ingest a large volume of additional cloud logs.
  • Your existing systems require integration efforts for each log type and format.
  • You are collecting logs without a clear goal of how they will be used.

Decide how to meet compliance requirements for encryption at rest

Google Cloud automatically encrypts all your content stored at rest, using one or more encryption mechanisms. Depending on your compliance requirements, you might have an obligation to manage the encryption keys yourself.

The following sections describe the options for encryption at rest. We recommend option 1 for most use cases. The other options discussed in the following sections are alternatives that you can consider if option 1 doesn't apply to your specific use case.

Option 1: Accept use of default encryption at rest

Default encryption at rest is sufficient for many use cases that don't have particular compliance requirements regarding encryption key management.

For example, the security team at an online gaming company requires all customer data to be encrypted at rest. They don't have regulatory requirements about key management, and after reviewing Google's default encryption at rest, they are satisfied that it's a sufficient control for their needs.

Use this option when the following is true:

  • You don't have particular requirements around how to encrypt data or how encryption keys are managed.
  • You prefer a managed service over the cost and operational overhead of managing your own encryption keys.

Avoid this option when you have compliance requirements to manage your own encryption keys.

For more information, see Encryption at rest in Google Cloud.

Option 2: Manage encryption keys using Cloud KMS

In addition to default encryption at rest, you might require more control over the keys used to encrypt data at rest within a Google Cloud project. Cloud Key Management Service (Cloud KMS) offers the ability to protect your data using customer-managed encryption keys (CMEK). For example, in the financial services industry, you might have a requirement to report to your external auditors how you manage your own encryption keys for sensitive data.
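With CMEK, each resource references a Cloud KMS key by its full resource name. As an illustration, the following Python sketch builds that key name and a BigQuery dataset body whose default encryption uses the key. The project, key ring, key, and dataset names are hypothetical. The key must be in a location compatible with the dataset, and the BigQuery service agent needs permission to use the key. The sketch doesn't check either requirement.

```python
import json


def crypto_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Full Cloud KMS resource name of a crypto key."""
    return f"projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{key}"


def bigquery_dataset_with_cmek(project: str, dataset_id: str, location: str,
                               kms_key_name: str) -> dict:
    """BigQuery dataset body whose new tables default to the given CMEK key."""
    return {
        "datasetReference": {"projectId": project, "datasetId": dataset_id},
        "location": location,
        "defaultEncryptionConfiguration": {"kmsKeyName": kms_key_name},
    }


if __name__ == "__main__":
    key = crypto_key_name("kms-project", "us", "finance", "ledger-key")  # hypothetical
    print(json.dumps(bigquery_dataset_with_cmek("data-project", "ledger", "US", key), indent=2))
```

Keeping keys in a separate, centrally managed project lets the security team control key access independently of the teams that own the data.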

For additional layers of control, you can configure hardware security modules (HSM) or external key management (EKM) with CMEK. Customer-supplied encryption keys (CSEK) are not recommended; scenarios that historically were addressed by CSEK are now better addressed by Cloud External Key Manager (Cloud EKM) because Cloud EKM has support for more services and higher availability.

This option shifts some responsibility to application developers to follow the key management that your security team mandates. The security team can enforce the requirement by blocking the creation of non-compliant resources with