Manage Dataproc resources using custom constraints
Google Cloud Organization Policy gives you centralized, programmatic control over your organization's resources. As the organization policy administrator, you can define an organization policy, which is a set of restrictions called constraints that apply to Google Cloud resources and descendants of those resources in the Google Cloud resource hierarchy. You can enforce organization policies at the organization, folder, or project level.
Organization Policy provides predefined constraints for various Google Cloud services. However, if you want more granular, customizable control over the specific fields that are restricted in your organization policies, you can also create custom constraints and use those custom constraints in a custom organization policy.
Benefits
You can use a custom organization policy to allow or deny specific operations on Dataproc clusters. For example, if a request to create or update a cluster fails to satisfy custom constraint validation as set by your organization policy, the request will fail, and an error will be returned to the caller.
Policy inheritance
By default, organization policies are inherited by the descendants of the resources on which you enforce the policy. For example, if you enforce a policy on a folder, Google Cloud enforces the policy on all projects in the folder. To learn more about this behavior and how to change it, refer to Hierarchy evaluation rules.
Pricing
The Organization Policy Service, including predefined and custom organization policies, is offered at no charge.
Before you begin
- Set up your project:
  - Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  - In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  - Make sure that billing is enabled for your Google Cloud project.
  - Enable the Dataproc API.
  - Install the Google Cloud CLI.
  - To initialize the gcloud CLI, run the following command:
    gcloud init
- Ensure that you know your organization ID.
Required roles
To get the permissions that you need to manage organization policies, ask your administrator to grant you the following IAM roles:
- Organization policy administrator (roles/orgpolicy.policyAdmin) on the organization resource
- To create or update a Dataproc cluster: Dataproc Admin or Dataproc Editor (roles/dataproc.admin or roles/dataproc.editor) on the project resource
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to manage organization policies. The following permissions are required:
- orgpolicy.constraints.list
- orgpolicy.policies.create
- orgpolicy.policies.delete
- orgpolicy.policies.list
- orgpolicy.policies.update
- orgpolicy.policy.get
- orgpolicy.policy.set
You might also be able to get these permissions with custom roles or other predefined roles.
Create a custom constraint
A custom constraint is defined in a YAML file by the resources, methods, conditions, and actions it is applied to. Dataproc supports custom constraints that are applied to the CREATE and UPDATE methods of the CLUSTER resource (see Dataproc constraints on resources and operations).
To create a YAML file for a Dataproc custom constraint:
name: organizations/ORGANIZATION_ID/customConstraints/CONSTRAINT_NAME
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- METHOD
condition: "CONDITION"
actionType: ACTION
displayName: DISPLAY_NAME
description: DESCRIPTION
Replace the following:
- ORGANIZATION_ID: your organization ID, such as 123456789.
- CONSTRAINT_NAME: the name you want for your new custom constraint. A custom constraint name must start with the custom. prefix and can only include uppercase letters, lowercase letters, or numbers, for example, custom.dataprocEnableComponentGateway. The maximum length of this field is 70 characters, not counting the prefix, for example, organizations/123456789/customConstraints/custom.
- METHOD: when creating a cluster creation constraint, specify CREATE. When creating a cluster UPDATE constraint, specify both methods, as follows:
  methodTypes:
  - CREATE
  - UPDATE
- CONDITION: a CEL condition that is written against a representation of a supported service resource. This field has a maximum length of 1000 characters. See Supported resources for more information about the resources available to write conditions against. For example, "resource.config.endpointConfig.enableHttpPortAccess==true".
- ACTION: the action to take if the condition is met. This can be either ALLOW or DENY.
- DISPLAY_NAME: a human-friendly name for the constraint, for example, "Enforce enabling Dataproc Component Gateway". This field has a maximum length of 200 characters.
- DESCRIPTION: a human-friendly description of the constraint to display as an error message when the policy is violated, for example, "Only allow Dataproc cluster creation if the Component Gateway is enabled". This field has a maximum length of 2000 characters.
For more information about how to create a custom constraint, see Defining custom constraints.
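As an illustration, the template can be filled in with the Component Gateway example values used on this page. The following shell sketch writes such a file. The file name customconstraint.yaml and the organization ID 123456789 are placeholder assumptions, so substitute your own values.

```shell
set -eu

# Assumed placeholder values; replace with your own organization ID and path.
ORGANIZATION_ID="123456789"
CONSTRAINT_FILE="customconstraint.yaml"

# Write a constraint that only allows cluster creation when the
# Component Gateway is enabled (values taken from the examples on this page).
cat > "${CONSTRAINT_FILE}" <<EOF
name: organizations/${ORGANIZATION_ID}/customConstraints/custom.dataprocEnableComponentGateway
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
condition: "resource.config.endpointConfig.enableHttpPortAccess==true"
actionType: ALLOW
displayName: Enforce enabling Dataproc Component Gateway
description: Only allow Dataproc cluster creation if the Component Gateway is enabled
EOF

echo "Wrote ${CONSTRAINT_FILE}"
```

Because actionType is ALLOW, only requests for which the condition evaluates to true are permitted. With DENY, the condition would instead describe the requests to block.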
Set up a custom constraint
After you have created the YAML file for a new custom constraint, you must set it up to make it available for organization policies in your organization. To set up a custom constraint, use the gcloud org-policies set-custom-constraint command:
gcloud org-policies set-custom-constraint CONSTRAINT_PATH
Replace CONSTRAINT_PATH with the full path to your custom constraint file, for example, /home/user/customconstraint.yaml.
Once the command completes, your custom constraints are available as organization policies in your list of Google Cloud organization policies.
To verify that the custom constraint exists, use the gcloud org-policies list-custom-constraints command:
gcloud org-policies list-custom-constraints --organization=ORGANIZATION_ID
Replace ORGANIZATION_ID with the ID of your organization resource.
For more information, see Viewing organization policies.
Enforce a custom constraint
You can enforce a boolean constraint by creating an organization policy that references it, and applying that organization policy to a Google Cloud resource.
Console
To enforce a boolean constraint:
- In the Google Cloud console, go to the Organization policies page.
- Select the project picker at the top of the page.
- From the project picker, select the project for which you want to set the organization policy.
- Select your constraint from the list on the Organization policies page. The Policy details page for that constraint should appear.
- To configure the organization policy for this resource, click Manage policy.
- On the Edit policy page, select Override parent's policy.
- Click Add a rule.
- Under Enforcement, select whether enforcement of this organization policy should be on or off.
- Optionally, to make the organization policy conditional on a tag, click Add condition. Note that if you add a conditional rule to an organization policy, you must add at least one unconditional rule or the policy cannot be saved. For more details, see Setting an organization policy with tags.
- If this is a custom constraint, you can click Test changes to simulate the effect of this organization policy. For more information, see Test organization policy changes with Policy Simulator.
- To finish and apply the organization policy, click Set policy. The policy will take up to 15 minutes to take effect.
gcloud
To create an organization policy that enforces a boolean constraint, create a policy YAML file that references the constraint:
name: projects/PROJECT_ID/policies/CONSTRAINT_NAME
spec:
  rules:
  - enforce: true
Replace the following:
- PROJECT_ID: the project on which you want to enforce your constraint.
- CONSTRAINT_NAME: the name you defined for your custom constraint. For example, custom.dataprocEnableComponentGateway.
To enforce the organization policy containing the constraint, run the following command:
gcloud org-policies set-policy POLICY_PATH
Replace POLICY_PATH with the full path to your organization policy YAML file. The policy will take up to 15 minutes to take effect.
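The policy file can also be generated from a script. The following is a minimal sketch that assumes a hypothetical project ID (my-project) and file name (policy.yaml). It also assumes that the policy name accepts a folders/FOLDER_ID or organizations/ORGANIZATION_ID parent in place of projects/PROJECT_ID when you enforce the policy at a different level of the hierarchy.

```shell
set -eu

# Assumed placeholders. PARENT is the resource on which the policy is enforced:
# projects/PROJECT_ID, folders/FOLDER_ID, or organizations/ORGANIZATION_ID.
PARENT="projects/my-project"
CONSTRAINT_NAME="custom.dataprocEnableComponentGateway"
POLICY_FILE="policy.yaml"

# Write a policy that enforces the custom constraint on the parent resource.
cat > "${POLICY_FILE}" <<EOF
name: ${PARENT}/policies/${CONSTRAINT_NAME}
spec:
  rules:
  - enforce: true
EOF

# Show the generated file. To apply it, pass the file to the set-policy
# command shown above (requires gcloud and the roles listed on this page):
#   gcloud org-policies set-policy "${POLICY_FILE}"
cat "${POLICY_FILE}"
```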
Test the custom constraint
The following cluster creation example assumes a custom organization policy has been created and enforced on cluster creation to require enabling the Component Gateway (resource.config.endpointConfig.enableHttpPortAccess==true).
gcloud dataproc clusters create example-cluster \
--project=PROJECT_ID \
--zone=COMPUTE_ZONE
Sample output (by default, the Component Gateway is not enabled when a Dataproc cluster is created):
Operation denied by custom org policies: ["customConstraints/custom.dataprocEnableComponentGateway": "Only allow Dataproc cluster creation if the Component Gateway is enabled"]
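For comparison, a request that enables the Component Gateway should satisfy the constraint. The sketch below builds such a command with the --enable-component-gateway flag, which turns on enableHttpPortAccess for the cluster. The project ID and region values are hypothetical placeholders, and the RUN switch is an illustration device rather than a gcloud feature: by default the script only prints the command.

```shell
set -eu

# Hypothetical placeholder values; override them from the environment.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"

# Build the command as positional parameters so each argument stays intact.
set -- gcloud dataproc clusters create example-cluster \
  "--project=${PROJECT_ID}" \
  "--region=${REGION}" \
  --enable-component-gateway

# Print the command by default; set RUN=1 to execute it against your project.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
else
  printf '%s\n' "$*"
fi
```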
Dataproc constraints on resources and operations
The following Dataproc custom constraint fields are available to use when you create or update a Dataproc cluster. Note that when updating a cluster, only the constraints related to editable cluster parameters are supported (see Updating a cluster).
- Compute Engine network configuration (networkUri, internalIpOnly, serviceAccount, and metadata)
resource.config.gceClusterConfig.networkUri
resource.config.gceClusterConfig.internalIpOnly
resource.config.gceClusterConfig.serviceAccount
resource.config.gceClusterConfig.metadata
- Compute Engine instance group configuration (imageUri and machineTypeUri)
resource.config.masterConfig.imageUri
resource.config.masterConfig.machineTypeUri
resource.config.workerConfig.imageUri
resource.config.workerConfig.machineTypeUri
resource.config.secondaryWorkerConfig.imageUri
resource.config.secondaryWorkerConfig.machineTypeUri
- Compute Engine instance group disk configuration (bootDiskType, bootDiskSizeGb, numLocalSsds, and localSsdInterface)
resource.config.masterConfig.diskConfig.bootDiskType
resource.config.workerConfig.diskConfig.bootDiskType
resource.config.secondaryWorkerConfig.diskConfig.bootDiskType
resource.config.masterConfig.diskConfig.bootDiskSizeGb
resource.config.workerConfig.diskConfig.bootDiskSizeGb
resource.config.secondaryWorkerConfig.diskConfig.bootDiskSizeGb
resource.config.masterConfig.diskConfig.numLocalSsds
resource.config.workerConfig.diskConfig.numLocalSsds
resource.config.secondaryWorkerConfig.diskConfig.numLocalSsds
resource.config.masterConfig.diskConfig.localSsdInterface
resource.config.workerConfig.diskConfig.localSsdInterface
resource.config.secondaryWorkerConfig.diskConfig.localSsdInterface
- Initialization actions (executableFile)
resource.config.initializationActions.executableFile
- Software config (imageVersion, properties, and optionalComponents)
resource.config.softwareConfig.imageVersion
resource.config.softwareConfig.properties
resource.config.softwareConfig.optionalComponents
- Kerberos config (enableKerberos and crossRealmTrustKdc)
resource.config.securityConfig.kerberosConfig.enableKerberos
resource.config.securityConfig.kerberosConfig.crossRealmTrustKdc
- Component gateway (enableHttpPortAccess)
resource.config.endpointConfig.enableHttpPortAccess
- Metastore config (dataprocMetastoreService)
resource.config.metastoreConfig.dataprocMetastoreService
- Persistent Disk CMEK (gcePdKmsKeyName)
resource.config.encryptionConfig.gcePdKmsKeyName
- Cluster labels
resource.labels
- Cluster size
resource.config.masterConfig.numInstances
resource.config.workerConfig.numInstances
resource.config.secondaryWorkerConfig.numInstances
- Autoscaling
resource.config.autoscalingConfig.policyUri
Example custom constraints for common use cases
The following examples show custom constraints for common use cases.

Restrict the number of Dataproc worker instances to 10 or fewer when a cluster is created or updated:
name: organizations/ORGANIZATION_ID/customConstraints/custom.dataprocNoMoreThan10Workers
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
- UPDATE
condition: "resource.config.workerConfig.numInstances + resource.config.secondaryWorkerConfig.numInstances > 10"
actionType: DENY
displayName: Total number of worker instances cannot be larger than 10
description: Cluster cannot have more than 10 workers, including primary and secondary workers.

Prevent application master from running on Dataproc cluster preemptible workers:
name: organizations/ORGANIZATION_ID/customConstraints/custom.dataprocAmPrimaryOnlyEnforced
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
condition: "('dataproc:am.primary_only' in resource.config.softwareConfig.properties) && (resource.config.softwareConfig.properties['dataproc:am.primary_only']==true)"
actionType: ALLOW
displayName: Application master cannot run on preemptible workers
description: Property "dataproc:am.primary_only" must be "true".

Disallow custom Hive properties on Dataproc clusters:
name: organizations/ORGANIZATION_ID/customConstraints/custom.dataprocNoCustomHiveProperties
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
condition: "resource.config.softwareConfig.properties.all(p, !p.startsWith('hive:'))"
actionType: ALLOW
displayName: Cluster cannot have custom Hive properties
description: Only allow Dataproc cluster creation if no property starts with Hive prefix "hive:".

Disallow the use of the n1-standard-2 machine type for Dataproc master instances:
name: organizations/ORGANIZATION_ID/customConstraints/custom.dataprocMasterMachineType
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
condition: "resource.config.masterConfig.machineTypeUri.contains('n1-standard-2')"
actionType: DENY
displayName: Master cannot use the n1-standard-2 machine type
description: Prevent Dataproc cluster creation if the master machine type is n1-standard-2.

Enforce the use of a specified initialization action script:
name: organizations/ORGANIZATION_ID/customConstraints/custom.dataprocInitActionScript
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
condition: "resource.config.initializationActions.exists(action, action.executableFile=='gs://some/init-action.sh')"
actionType: ALLOW
displayName: Initialization action script "gs://some/init-action.sh" must be used
description: Only allow Dataproc cluster creation if the "gs://some/init-action.sh" initialization action script is used.

Enforce the use of a specified persistent disk encryption key:
name: organizations/ORGANIZATION_ID/customConstraints/custom.dataprocPdCmek
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
condition: "resource.config.encryptionConfig.gcePdKmsKeyName == 'projects/project-id/locations/global/keyRings/key-ring-name/cryptoKeys/key-name'"
actionType: ALLOW
displayName: Cluster PD must be encrypted with "key-name" from "key-ring-name" key-ring
description: Only allow Dataproc cluster creation if the PD is encrypted with "key-name" from "key-ring-name" key-ring.

Enforce cluster label restrictions:
name: organizations/ORGANIZATION_ID/customConstraints/custom.dataprocEnvLabel
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
- UPDATE
condition: "('env' in resource.labels) && (resource.labels.env=='test')"
actionType: DENY
displayName: Cluster cannot have the "env=test" label
description: Deny Dataproc cluster creation or update if the cluster will be labeled "env=test".

Enforce the use of a non-default network:
name: organizations/ORGANIZATION_ID/customConstraints/custom.dataprocNoDefaultNetwork
resourceTypes:
- dataproc.googleapis.com/Cluster
methodTypes:
- CREATE
condition: "resource.config.gceClusterConfig.networkUri.contains('networks/default')"
actionType: DENY
displayName: Cluster cannot be created in the default network
description: Deny Dataproc cluster creation if the cluster will be created in the default network.
What's next
- See Introduction to the Organization Policy Service to learn more about organization policies.
- Learn more about how to create and manage organization policies.
- See the full list of predefined Organization policy constraints.