gcloud alpha dataproc clusters gke create

NAME
gcloud alpha dataproc clusters gke create - create a GKE-based virtual cluster
SYNOPSIS
gcloud alpha dataproc clusters gke create (CLUSTER : --region=REGION) --spark-engine-version=SPARK_ENGINE_VERSION (--gke-cluster=GKE_CLUSTER : --gke-cluster-location=GKE_CLUSTER_LOCATION) [--async] [--namespace=NAMESPACE] [--pools=[KEY=VALUE[;VALUE],…]] [--properties=[PREFIX:PROPERTY=VALUE,…]] [--setup-workload-identity] [--staging-bucket=STAGING_BUCKET] [--history-server-cluster=HISTORY_SERVER_CLUSTER : --history-server-cluster-region=HISTORY_SERVER_CLUSTER_REGION] [--metastore-service=METASTORE_SERVICE : --metastore-service-location=METASTORE_SERVICE_LOCATION] [GCLOUD_WIDE_FLAG]
DESCRIPTION
(ALPHA) Create a GKE-based virtual cluster.
EXAMPLES
Create a Dataproc on GKE cluster in us-central1 on a GKE cluster in the same project and region with default values:

gcloud alpha dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --pools='name=dp,roles=default'

Create a Dataproc on GKE cluster in us-central1 on a GKE cluster in the same project, in zone us-central1-f, with default values:

gcloud alpha dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --gke-cluster-location=us-central1-f --spark-engine-version=3.1 --pools='name=dp,roles=default'

Create a Dataproc on GKE cluster in us-central1 with machine type 'e2-standard-4', autoscaling 5-15 nodes per zone:

gcloud alpha dataproc clusters gke create my-cluster --region='us-central1' --gke-cluster='projects/my-project/locations/us-central1/clusters/my-gke-cluster' --spark-engine-version=dataproc-1.5 --pools='name=dp-default,roles=default,machineType=e2-standard-4,min=5,max=15'

Create a Dataproc on GKE cluster in us-central1 with two distinct node pools:

gcloud alpha dataproc clusters gke create my-cluster --region='us-central1' --gke-cluster='my-gke-cluster' --spark-engine-version='dataproc-2.0' --pools='name=dp-default,roles=default,machineType=e2-standard-4' --pools='name=workers,roles=spark-driver;spark-executor,machineType=n2-standard-8'
POSITIONAL ARGUMENTS
Cluster resource - The name of the cluster to create. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.

This must be specified.

CLUSTER
ID of the cluster or fully qualified identifier for the cluster.

To set the cluster attribute:

  • provide the argument cluster on the command line.

This positional argument must be specified if any of the other arguments in this group are specified.
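
For illustration, a fully qualified cluster identifier (hypothetical project and region) takes this form, which also sets the project and region attributes:

  projects/my-project/regions/us-central1/clusters/my-cluster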

--region=REGION
Dataproc region for the cluster. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.

To set the region attribute:

  • provide the argument cluster on the command line with a fully specified name;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
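
For example, to set the dataproc/region property once so that --region can be omitted from later invocations (the region value here is illustrative):

  gcloud config set dataproc/region us-central1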
REQUIRED FLAGS
--spark-engine-version=SPARK_ENGINE_VERSION
The version of the Spark engine to run on this cluster.
GKE cluster resource - The GKE cluster to install the Dataproc cluster on. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --gke-cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.

This must be specified.

--gke-cluster=GKE_CLUSTER
ID of the gke-cluster or fully qualified identifier for the gke-cluster.

To set the gke-cluster attribute:

  • provide the argument --gke-cluster on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--gke-cluster-location=GKE_CLUSTER_LOCATION
GKE location for the gke-cluster. This can be a zone (for example, us-central1-f, as in the EXAMPLES section) or a region.

To set the gke-cluster-location attribute:

  • provide the argument --gke-cluster on the command line with a fully specified name;
  • provide the argument --gke-cluster-location on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
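
For illustration, a fully qualified gke-cluster identifier (hypothetical project and zone) takes this form:

  projects/my-project/locations/us-central1-f/clusters/my-gke-cluster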
OPTIONAL FLAGS
--async
Return immediately, without waiting for the operation in progress to complete.
--namespace=NAMESPACE
The name of the Kubernetes namespace to deploy Dataproc system components in. This namespace does not need to exist.
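
For example, to deploy the Dataproc components into a dedicated namespace (the namespace name here is illustrative):

  gcloud alpha dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --namespace=dataproc-ns --pools='name=dp,roles=default'
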
--pools=[KEY=VALUE[;VALUE],…]
Each --pools flag represents a GKE node pool associated with the virtual cluster. It is a comma-separated list in the form KEY=VALUE[;VALUE], where certain keys may have multiple values.

The following KEYs must be specified:

----------------------------------------------------------------------------------------------------------
KEY    Type             Example                  Description
------ ---------------- ------------------------ ---------------------------------------------------------
name   string           `my-node-pool`           Name of the node pool.
roles  repeated string  `default;spark-driver`   Roles that the node pool will perform. Exactly one pool
                                                 must have the `default` role. Valid values are `default`,
                                                 `controller`, `spark-driver`, `spark-executor`.
----------------------------------------------------------------------------------------------------------

The following KEYs may be specified:

------------------------------------------------------------------------------------------------------------------------------------------------------
KEY                Type             Example                                       Description
------------------ ---------------- --------------------------------------------- --------------------------------------------------------------------
machineType        string           `n1-standard-8`                               Compute Engine machine type to use.
preemptible        boolean          `false`                                       If true, this node pool uses preemptible VMs. Must be `false`
                                                                                  for a node pool with the `controller` role, or for a node pool
                                                                                  with the `default` role if no node pool has the `controller` role.
localSsdCount      int              `2`                                           The number of local SSDs to attach to each node.
localNvmeSsdCount  int              `2`                                           The number of local NVMe SSDs to attach to each node.
accelerator        repeated string  `nvidia-tesla-a100=1`                         Accelerators to attach to each node, in TYPE=COUNT format.
minCpuPlatform     string           `Intel Skylake`                               Minimum CPU platform for each node.
bootDiskKmsKey     string           `projects/project-id/locations/us-central1    The Customer Managed Encryption Key (CMEK) used to encrypt
                                    /keyRings/keyRing-name/cryptoKeys/key-name`   the boot disk attached to each node in the node pool.
locations          repeated string  `us-west1-a;us-west1-c`                       Zones within the location of the GKE cluster. All `--pools`
                                                                                  flags for a Dataproc cluster must have identical locations.
min                int              `0`                                           Minimum number of nodes per zone that this node pool can scale
                                                                                  down to.
max                int              `10`                                          Maximum number of nodes per zone that this node pool can scale
                                                                                  up to.
------------------------------------------------------------------------------------------------------------------------------------------------------
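
For illustration, a --pools flag that combines several of the optional keys above (the pool name, machine type, zones, and autoscaling bounds are all hypothetical):

  --pools='name=workers,roles=spark-executor,machineType=n2-standard-8,preemptible=true,locations=us-west1-a;us-west1-c,min=0,max=10'
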
--properties=[PREFIX:PROPERTY=VALUE,…]
Specifies configuration properties for installed packages, such as Spark. Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations".
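
For example, to set a Spark property using the PREFIX:PROPERTY=VALUE form (the property value here is illustrative):

  --properties='spark:spark.executor.memory=4g'
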
--setup-workload-identity
Sets up the GKE Workload Identity for your Dataproc on GKE cluster. Note that running this requires elevated permissions as it will manipulate IAM policies on the Google Service Accounts that will be used by your Dataproc on GKE cluster.
--staging-bucket=STAGING_BUCKET
The Cloud Storage bucket to use to stage job dependencies, miscellaneous config files, and job driver console output when using this cluster.
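
For illustration, a create command that also sets up Workload Identity and a staging bucket (all names are hypothetical):

  gcloud alpha dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --setup-workload-identity --staging-bucket=my-staging-bucket --pools='name=dp,roles=default'
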
History server cluster resource - A Dataproc cluster created as a history server (see https://cloud.google.com/dataproc/docs/concepts/jobs/history-server). The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --history-server-cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.
--history-server-cluster=HISTORY_SERVER_CLUSTER
ID of the history-server-cluster or fully qualified identifier for the history-server-cluster.

To set the history-server-cluster attribute:

  • provide the argument --history-server-cluster on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--history-server-cluster-region=HISTORY_SERVER_CLUSTER_REGION
Compute Engine region for the history-server-cluster. It must be the same region as the Dataproc cluster that is being created.

To set the history-server-cluster-region attribute:

  • provide the argument --history-server-cluster on the command line with a fully specified name;
  • provide the argument --history-server-cluster-region on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
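
For illustration, attaching an existing history server cluster in the same region (cluster names are hypothetical; --history-server-cluster-region is omitted because it falls back to --region):

  gcloud alpha dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --history-server-cluster=my-history-server --pools='name=dp,roles=default'
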
Metastore service resource - Dataproc Metastore Service to be used as an external metastore. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --metastore-service on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.
--metastore-service=METASTORE_SERVICE
ID of the metastore-service or fully qualified identifier for the metastore-service.

To set the metastore-service attribute:

  • provide the argument --metastore-service on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--metastore-service-location=METASTORE_SERVICE_LOCATION
Dataproc Metastore location for the metastore-service.

To set the metastore-service-location attribute:

  • provide the argument --metastore-service on the command line with a fully specified name;
  • provide the argument --metastore-service-location on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
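
For illustration, a fully qualified metastore-service identifier (hypothetical project and location) takes this form:

  projects/my-project/locations/us-central1/services/my-metastore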
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

NOTES
This command is currently in alpha and might change without notice. If this command fails with API permission errors despite specifying the correct project, you might be trying to access an API with an invitation-only early access allowlist. These variants are also available:
gcloud dataproc clusters gke create
gcloud beta dataproc clusters gke create