NAME
    gcloud alpha dataplex datascans create data-discovery - create a Dataplex
    data discovery scan job

SYNOPSIS
    gcloud alpha dataplex datascans create data-discovery
        (DATASCAN : --location=LOCATION)
        --data-source-resource=DATA_SOURCE_RESOURCE
        [--description=DESCRIPTION] [--display-name=DISPLAY_NAME]
        [--labels=[KEY=VALUE,…]] [--async | --validate-only]
        [--bigquery-publishing-connection=BIGQUERY_PUBLISHING_CONNECTION
          --bigquery-publishing-dataset-location=BIGQUERY_PUBLISHING_DATASET_LOCATION
          --bigquery-publishing-table-type=BIGQUERY_PUBLISHING_TABLE_TYPE
          --storage-exclude-patterns=[PATTERN,…]
          --storage-include-patterns=[PATTERN,…]
          --csv-delimiter=CSV_DELIMITER
          --csv-disable-type-inference=CSV_DISABLE_TYPE_INFERENCE
          --csv-encoding=CSV_ENCODING
          --csv-header-row-count=CSV_HEADER_ROW_COUNT
          --csv-quote-character=CSV_QUOTE_CHARACTER
          --json-disable-type-inference=JSON_DISABLE_TYPE_INFERENCE
          --json-encoding=JSON_ENCODING]
        [--on-demand=ON_DEMAND | --schedule=SCHEDULE]
        [GCLOUD_WIDE_FLAG …]
DESCRIPTION
    (ALPHA) Allows users to auto discover BigQuery External and BigLake tables
    from underlying Cloud Storage buckets.

EXAMPLES
    To create a data discovery scan data-discovery-datascan in project
    test-project located in us-central1 on Cloud Storage bucket test-bucket,
    run:

        gcloud alpha dataplex datascans create data-discovery data-discovery-datascan \
            --project=test-project --location=us-central1 \
            --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket"
POSITIONAL ARGUMENTS
    Datascan resource - Arguments and flags that define the Dataplex datascan
    you want to create a data discovery scan for. The arguments in this group
    can be used to specify the attributes of this resource. (NOTE) Some
    attributes are not given arguments in this group but can be set in other
    ways.

    To set the project attribute:
      - provide the argument datascan on the command line with a fully
        specified name;
      - provide the argument --project on the command line;
      - set the property core/project.

    This must be specified.

      DATASCAN
          ID of the datascan or fully qualified identifier for the datascan.

          To set the dataScans attribute:
            - provide the argument datascan on the command line.

          This positional argument must be specified if any of the other
          arguments in this group are specified.

      --location=LOCATION
          The location of the Dataplex resource.

          To set the location attribute:
            - provide the argument datascan on the command line with a fully
              specified name;
            - provide the argument --location on the command line;
            - set the property dataplex/location.
REQUIRED FLAGS
    --data-source-resource=DATA_SOURCE_RESOURCE
        Fully-qualified service resource name of the cloud resource bucket
        that contains the data for the data discovery scan, of the form:
        //storage.googleapis.com/projects/{project_id_or_number}/buckets/{bucket_id}
OPTIONAL FLAGS
    --description=DESCRIPTION
        Description of the data discovery scan.

    --display-name=DISPLAY_NAME
        Display name of the data discovery scan.

    --labels=[KEY=VALUE,…]
        List of label KEY=VALUE pairs to add.

        Keys must start with a lowercase character and contain only hyphens
        (-), underscores (_), lowercase characters, and numbers. Values must
        contain only hyphens (-), underscores (_), lowercase characters, and
        numbers.
    At most one of these can be specified:

      --async
          Return immediately, without waiting for the operation in progress
          to complete.

      --validate-only
          Validate the create action, but don't actually perform it.
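      As a sketch, the command from the EXAMPLES section can be dry-run with
      --validate-only so the request is checked but no scan is created:

          # Validate only; nothing is created. Assumes an authenticated
          # gcloud session with permissions on test-project.
          gcloud alpha dataplex datascans create data-discovery data-discovery-datascan \
              --project=test-project \
              --location=us-central1 \
              --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" \
              --validate-only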
    Data spec for the data discovery scan.

      BigQuery publishing config arguments for the data discovery scan.

        --bigquery-publishing-connection=BIGQUERY_PUBLISHING_CONNECTION
            BigQuery connection to use for auto discovering the cloud
            resource bucket to BigLake tables, in the format
            projects/{project_id}/locations/{location_id}/connections/{connection_id}.
            A connection is required for the BIGLAKE BigQuery publishing
            table type.

        --bigquery-publishing-dataset-location=BIGQUERY_PUBLISHING_DATASET_LOCATION
            The location of the BigQuery dataset to publish BigLake external
            or non-BigLake external tables to. If not specified, the dataset
            location will be set to the location of the data source resource.
            Refer to
            https://cloud.google.com/bigquery/docs/locations#supportedLocations
            for supported locations.

        --bigquery-publishing-table-type=BIGQUERY_PUBLISHING_TABLE_TYPE
            BigQuery table type to discover the cloud resource bucket as. Can
            be either EXTERNAL or BIGLAKE. If not specified, the table type
            will be set to EXTERNAL. BIGQUERY_PUBLISHING_TABLE_TYPE must be
            one of:

              BIGLAKE
                  Cloud Storage bucket is discovered to BigQuery BigLake
                  tables.
              EXTERNAL
                  Default value. Cloud Storage bucket is discovered to
                  BigQuery External tables.
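            As a sketch, publishing discovered tables as BigLake tables pairs
            the BIGLAKE table type with a connection. The connection ID
            my-connection below is hypothetical:

                # BIGLAKE requires --bigquery-publishing-connection;
                # "my-connection" is a placeholder connection ID.
                gcloud alpha dataplex datascans create data-discovery biglake-discovery-scan \
                    --project=test-project \
                    --location=us-central1 \
                    --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" \
                    --bigquery-publishing-table-type=BIGLAKE \
                    --bigquery-publishing-connection=projects/test-project/locations/us-central1/connections/my-connection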
      Storage config arguments for the data discovery scan.

        --storage-exclude-patterns=[PATTERN,…]
            List of patterns that identify the data to exclude during
            discovery. When both include and exclude patterns exist at the
            same time, exclude patterns are applied first.

        --storage-include-patterns=[PATTERN,…]
            List of patterns that identify the data to include during
            discovery when only a subset of the data should be considered.
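        For example, discovery could be restricted to CSV objects while
        skipping a temporary prefix. The patterns below are illustrative;
        this page does not define the exact pattern syntax:

            # Illustrative patterns; exclude patterns are applied before
            # include patterns when both are present.
            gcloud alpha dataplex datascans create data-discovery filtered-discovery-scan \
                --project=test-project \
                --location=us-central1 \
                --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" \
                --storage-include-patterns="*.csv" \
                --storage-exclude-patterns="tmp/*"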
      CSV options arguments for the data discovery scan.

        --csv-delimiter=CSV_DELIMITER
            Delimiter used to separate values in the CSV file. If not
            specified, the delimiter will be set to comma (",").

        --csv-disable-type-inference=CSV_DISABLE_TYPE_INFERENCE
            Whether to disable the inference of data types for CSV data. If
            true, all columns are registered as strings.

        --csv-encoding=CSV_ENCODING
            Character encoding of the CSV file. If not specified, the
            encoding will be set to UTF-8.

        --csv-header-row-count=CSV_HEADER_ROW_COUNT
            The number of rows to interpret as header rows that should be
            skipped when reading data rows. The default value is 1.

        --csv-quote-character=CSV_QUOTE_CHARACTER
            The character used to quote column values. Accepts " (double
            quotation mark) or ' (single quotation mark). If unspecified,
            defaults to " (double quotation mark).
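        A scan over semicolon-delimited CSV files with two header rows might
        be configured as follows. This is a sketch: the encoding value
        ISO-8859-1 is an assumption, since this page does not list the
        accepted encoding names:

            # Illustrative CSV options: semicolon delimiter, two header
            # rows, single-quote quoting, and type inference disabled so
            # all columns register as strings.
            gcloud alpha dataplex datascans create data-discovery csv-discovery-scan \
                --project=test-project \
                --location=us-central1 \
                --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" \
                --csv-delimiter=";" \
                --csv-header-row-count=2 \
                --csv-quote-character="'" \
                --csv-disable-type-inference=true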
      JSON options arguments for the data discovery scan.

        --json-disable-type-inference=JSON_DISABLE_TYPE_INFERENCE
            Whether to disable the inference of data types for JSON data. If
            true, all columns are registered as strings.

        --json-encoding=JSON_ENCODING
            Character encoding of the JSON file. If not specified, the
            encoding will be set to UTF-8.
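        As a sketch, the JSON options can be set the same way; UTF-8 is the
        documented default, shown here explicitly:

            # Illustrative JSON options with type inference disabled.
            gcloud alpha dataplex datascans create data-discovery json-discovery-scan \
                --project=test-project \
                --location=us-central1 \
                --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" \
                --json-encoding=UTF-8 \
                --json-disable-type-inference=true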
    Data discovery scan execution settings.

      Data discovery scan scheduling and trigger settings.

      At most one of these can be specified:

        --on-demand=ON_DEMAND
            If set, the scan runs one time shortly after data discovery scan
            creation.

        --schedule=SCHEDULE
            Cron schedule (https://en.wikipedia.org/wiki/Cron) for running
            scans periodically. To explicitly set a time zone for the cron
            tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}"
            or "TZ=${IANA_TIME_ZONE}". ${IANA_TIME_ZONE} may only be a valid
            string from the IANA time zone database. For example,
            CRON_TZ=America/New_York 1 * * * * or
            TZ=America/New_York 1 * * * *. This field is required for
            RECURRING scans.
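            For instance, a recurring scan at 02:00 New York time every day
            could be scheduled as follows (a sketch; the scan name and cron
            expression are illustrative):

                # The CRON_TZ prefix pins the cron expression to an IANA
                # time zone; here the scan runs daily at 02:00 New York time.
                gcloud alpha dataplex datascans create data-discovery scheduled-discovery-scan \
                    --project=test-project \
                    --location=us-central1 \
                    --data-source-resource="//storage.googleapis.com/projects/test-project/buckets/test-bucket" \
                    --schedule="CRON_TZ=America/New_York 0 2 * * *"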
GCLOUD WIDE FLAGS
    These flags are available to all commands: --access-token-file,
    --account, --billing-project, --configuration, --flags-file, --flatten,
    --format, --help, --impersonate-service-account, --log-http, --project,
    --quiet, --trace-token, --user-output-enabled, --verbosity.

    Run $ gcloud help for details.

NOTES
    This command is currently in alpha and might change without notice. If
    this command fails with API permission errors despite specifying the
    correct project, you might be trying to access an API with an
    invitation-only early access allowlist.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-05-06 UTC.