This page describes how to create, view, list, cancel, and delete storage batch operations jobs. It also describes how to use Cloud Audit Logs with storage batch operations jobs.
Before you begin
To create and manage storage batch operations jobs, complete the steps in the following sections.
Configure Storage Intelligence
To create and manage storage batch operations jobs, configure Storage Intelligence on the bucket where you want to run the job.
Set up Google Cloud CLI
You must use Google Cloud CLI version 516.0.0 or later.
Set the default project
Set the project where you want to create the storage batch operations job.
gcloud config set project PROJECT_ID
Where PROJECT_ID is the ID of your project.
Enable API
Enable the storage batch operations API.
gcloud services enable storagebatchoperations.googleapis.com
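To confirm that the API is enabled, you can list the project's enabled services. The following is a minimal sketch using the standard gcloud filter flag:

# Verify that the storage batch operations API is enabled in the current project.
gcloud services list --enabled --filter="name:storagebatchoperations.googleapis.com"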
Create a manifest
If you want to select the objects a job processes by using a manifest, create the manifest before you create the job.
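As an illustration, a manifest is a CSV file that lists the objects a job should process. The sketch below assumes a bucket, name, generation column layout and uses hypothetical bucket and object names; confirm the exact format against the manifest documentation for your gcloud CLI version:

# Write a manifest listing two objects; the generation column is
# optional and left empty here (assumed column layout).
cat > manifest.csv <<'EOF'
bucket,name,generation
my-bucket,data/temp_training_01.csv,
my-bucket,data/temp_training_02.csv,
EOF

# Upload the manifest so a job can reference it with --manifest-location.
gcloud storage cp manifest.csv gs://my-bucket/manifests/manifest.csv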
Create a storage batch operations job
This section describes how to create a storage batch operations job.
Command line
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- In your development environment, run the gcloud storage batch-operations jobs create command:

  gcloud storage batch-operations jobs create JOB_NAME --bucket=BUCKET_NAME OBJECT_SELECTION_FLAG JOB_TYPE_FLAG
  Where:

  - JOB_NAME is the name of the storage batch operations job.
  - BUCKET_NAME is the name of the bucket that contains one or more objects you want to process.
  - OBJECT_SELECTION_FLAG is one of the following flags:
    - --included-object-prefixes: Specify one or more object prefixes. For example:
      - To match a single prefix, use --included-object-prefixes='prefix1'.
      - To match multiple prefixes, use a comma-separated prefix list: --included-object-prefixes='prefix1,prefix2'.
      - To include all objects, use an empty prefix: --included-object-prefixes=''.
    - --manifest-location: Specify the manifest location. For example, gs://bucket_name/path/object_name.csv.
  - JOB_TYPE_FLAG is one of the following flags, depending on the job type:
    - --delete-object: Delete one or more objects.
    - --put-metadata: Update object metadata. Object metadata is stored as key-value pairs. Specify the key-value pair for the metadata you want to modify. You can specify one or more key-value pairs as a list.
    - --rewrite-object: Update the customer-managed encryption keys for one or more objects.
    - --put-object-event-based-hold: Enable event-based object holds.
    - --no-put-object-event-based-hold: Disable event-based object holds.
    - --put-object-temporary-hold: Enable temporary object holds.
    - --no-put-object-temporary-hold: Disable temporary object holds.

  An example invocation follows this list.
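For example, the following hypothetical invocation creates a job named my-deletion-job that deletes every object in my-bucket whose name begins with temp/ (the job, bucket, and prefix names are placeholders):

# Create a storage batch operations job that deletes all objects
# under the temp/ prefix in my-bucket.
gcloud storage batch-operations jobs create my-deletion-job \
  --bucket=my-bucket \
  --included-object-prefixes='temp/' \
  --delete-object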
REST APIs
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Create a JSON file that contains the settings for the storage batch operations job. The following are common settings to include:

  {
    "Description": "JOB_DESCRIPTION",
    "BucketList": {
      "Buckets": [
        {
          "Bucket": "BUCKET_NAME",
          "Manifest": {
            "manifest_location": "MANIFEST_LOCATION"
          },
          "PrefixList": {
            "include_object_prefixes": "OBJECT_PREFIXES"
          }
        }
      ]
    },
    "DeleteObject": {
      "permanent_object_deletion_enabled": OBJECT_DELETION_VALUE
    },
    "RewriteObject": {
      "kms_key": "KMS_KEY_VALUE"
    },
    "PutMetadata": {
      "METADATA_KEY": "METADATA_VALUE"
    },
    "PutObjectHold": {
      "temporary_hold": TEMPORARY_HOLD_VALUE,
      "event_based_hold": EVENT_BASED_HOLD_VALUE
    }
  }
  Where:

  - JOB_DESCRIPTION is the description of the storage batch operations job.
  - BUCKET_NAME is the name of the bucket that contains one or more objects you want to process.
  - To specify the objects you want to process, use any one of the following attributes in the JSON file:
    - MANIFEST_LOCATION is the manifest location. For example, gs://bucket_name/path/object_name.csv.
    - OBJECT_PREFIXES is the comma-separated list containing one or more object prefixes. To match all objects, use an empty list.
  Depending on the job you want to process, specify any one of the following options:

  - Delete objects:

    "DeleteObject": {
      "permanent_object_deletion_enabled": OBJECT_DELETION_VALUE
    }

    Where OBJECT_DELETION_VALUE is TRUE to delete objects.

  - Update the customer-managed encryption key for objects:

    "RewriteObject": {
      "kms_key": "KMS_KEY_VALUE"
    }

    Where KMS_KEY_VALUE is the KMS key to use when rewriting the objects.

  - Update object metadata:

    "PutMetadata": {
      "METADATA_KEY": "METADATA_VALUE"
    }

    Where METADATA_KEY is the metadata key you want to modify and METADATA_VALUE is the value to set for it. You can specify one or more key-value pairs as a list.

  - Update object holds:

    "PutObjectHold": {
      "temporary_hold": TEMPORARY_HOLD_VALUE,
      "event_based_hold": EVENT_BASED_HOLD_VALUE
    }

    Where:

    - TEMPORARY_HOLD_VALUE enables or disables the temporary object hold. A value of 1 enables the hold, and a value of 2 disables the hold.
    - EVENT_BASED_HOLD_VALUE enables or disables the event-based object hold. A value of 1 enables the hold, and a value of 2 disables the hold.
- Use cURL to call the JSON API with a POST storage batch operations job request:

  curl -X POST --data-binary @JSON_FILE_NAME \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagebatchoperations.googleapis.com/v1/projects/PROJECT_ID/locations/global/jobs?job_id=JOB_ID"

  Where:

  - JSON_FILE_NAME is the name of the JSON file.
  - PROJECT_ID is the ID or number of the project. For example, my-project.
  - JOB_ID is the name of the storage batch operations job.
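As a worked example, the following sketch creates a deletion job using placeholder names (my-project, my-bucket, my-deletion-job); it instantiates the request shown above:

# job.json: delete every object whose name begins with temp/.
cat > job.json <<'EOF'
{
  "Description": "Delete temp objects",
  "BucketList": {
    "Buckets": [
      {
        "Bucket": "my-bucket",
        "PrefixList": {
          "include_object_prefixes": "temp/"
        }
      }
    ]
  },
  "DeleteObject": {
    "permanent_object_deletion_enabled": true
  }
}
EOF

# Submit the job, passing the job name as job_id in the URL.
curl -X POST --data-binary @job.json \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://storagebatchoperations.googleapis.com/v1/projects/my-project/locations/global/jobs?job_id=my-deletion-job"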
Get storage batch operations job details
This section describes how to get the storage batch operations job details.
Command line
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- In your development environment, run the gcloud storage batch-operations jobs describe command:

  gcloud storage batch-operations jobs describe JOB_ID

  Where JOB_ID is the name of the storage batch operations job.
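For example, to check the details of the hypothetical job created earlier:

# Show the configuration and current state of the job.
gcloud storage batch-operations jobs describe my-deletion-job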
REST APIs
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Use cURL to call the JSON API with a GET storage batch operations job request:

  curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://storagebatchoperations.googleapis.com/v1/projects/PROJECT_ID/locations/global/jobs/JOB_ID"
  Where:

  - PROJECT_ID is the ID or number of the project. For example, my-project.
  - JOB_ID is the name of the storage batch operations job.
List storage batch operations jobs
This section describes how to list the storage batch operations jobs within a project.
Command line
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- In your development environment, run the gcloud storage batch-operations jobs list command:

  gcloud storage batch-operations jobs list
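The standard gcloud list flags also apply. For example, the following sketch limits the output; the field names in the format string are an assumption, so adjust them to match the fields the API returns:

# List up to ten jobs, showing one job per row.
gcloud storage batch-operations jobs list --limit=10 --format="table(name,state)"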
REST APIs
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Use cURL to call the JSON API with a LIST storage batch operations jobs request:

  curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://storagebatchoperations.googleapis.com/v1/projects/PROJECT_ID/locations/global/jobs"

  Where PROJECT_ID is the ID or number of the project. For example, my-project.
Cancel a storage batch operations job
This section describes how to cancel a storage batch operations job within a project.
Command line
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- In your development environment, run the gcloud storage batch-operations jobs cancel command:

  gcloud storage batch-operations jobs cancel JOB_ID

  Where JOB_ID is the name of the storage batch operations job.
REST APIs
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Use cURL to call the JSON API with a POST request that cancels the storage batch operations job:

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://storagebatchoperations.googleapis.com/v1/projects/PROJECT_ID/locations/global/jobs/JOB_ID:cancel"

  Where:

  - PROJECT_ID is the ID or number of the project. For example, my-project.
  - JOB_ID is the name of the storage batch operations job.
Delete a storage batch operations job
This section describes how to delete a storage batch operations job.
Command line
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- In your development environment, run the gcloud storage batch-operations jobs delete command:

  gcloud storage batch-operations jobs delete JOB_ID

  Where JOB_ID is the name of the storage batch operations job.
REST APIs
JSON API
- Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.
- Use cURL to call the JSON API with a DELETE storage batch operations job request:

  curl -X DELETE \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://storagebatchoperations.googleapis.com/v1/projects/PROJECT_ID/locations/global/jobs/JOB_ID"

  Where:

  - PROJECT_ID is the ID or number of the project. For example, my-project.
  - JOB_ID is the name of the storage batch operations job.
Create a storage batch operations job using Storage Insights datasets
To create a storage batch operations job using Storage Insights datasets, complete the steps in the following sections.
Create a manifest using Storage Insights datasets
You can create the manifest for your storage batch operations job by extracting data from BigQuery. To do so, query the linked dataset, export the resulting data as a CSV file, and save the file to a Cloud Storage bucket. The storage batch operations job can then use the CSV file as its manifest.

Running the following SQL query in BigQuery on a Storage Insights dataset view retrieves objects larger than 1 MiB whose names begin with Temp_Training:

EXPORT DATA OPTIONS(
  uri='URI',
  format='CSV',
  overwrite=OVERWRITE_VALUE,
  field_delimiter=',') AS
SELECT bucket, name, generation
FROM DATASET_VIEW_NAME
WHERE bucket = 'BUCKET_NAME'
  AND name LIKE 'Temp_Training%'
  AND size > 1024 * 1024
  AND snapshotTime = 'SNAPSHOT_TIME'
Where:

- URI is the destination URI for the manifest. For example, gs://bucket_name/path_to_csv_file/*.csv. When you use the *.csv wildcard, BigQuery exports the result to multiple CSV files.
- OVERWRITE_VALUE is a boolean value. If set to true, the export operation overwrites existing files at the specified location.
- DATASET_VIEW_NAME is the fully qualified name of the Storage Insights dataset view in PROJECT_ID.DATASET_ID.VIEW_NAME format. To find the name of your dataset, view the linked dataset. Where:
  - PROJECT_ID is the ID or number of the project. For example, my-project.
  - DATASET_ID is the name of the dataset. For example, objects-deletion-dataset.
  - VIEW_NAME is the name of the dataset view. For example, bucket_attributes_view.
- BUCKET_NAME is the name of the bucket. For example, my-bucket.
- SNAPSHOT_TIME is the snapshot time of the Storage Insights dataset view. For example, 2024-09-10T00:00:00Z.
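If you prefer the command line to the BigQuery console, you can run the same export with the bq tool. The following is a minimal sketch; the project, dataset, view, bucket, and path names are placeholders:

# Export the manifest CSV files from a Storage Insights dataset view.
bq query --use_legacy_sql=false '
EXPORT DATA OPTIONS(
  uri="gs://my-bucket/manifests/*.csv",
  format="CSV",
  overwrite=true,
  field_delimiter=",") AS
SELECT bucket, name, generation
FROM `my-project.objects_deletion_dataset.bucket_attributes_view`
WHERE bucket = "my-bucket"
  AND name LIKE "Temp_Training%"
  AND size > 1024 * 1024
  AND snapshotTime = "2024-09-10T00:00:00Z"'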
Create a storage batch operations job
To create a storage batch operations job to process objects contained in the manifest, complete the following steps:
Command line
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- In your development environment, run the gcloud storage batch-operations jobs create command:

  gcloud storage batch-operations jobs create \
    JOB_ID \
    --bucket=SOURCE_BUCKET_NAME \
    --manifest-location=URI \
    JOB_TYPE_FLAG
  Where:

  - JOB_ID is the name of the storage batch operations job.
  - SOURCE_BUCKET_NAME is the bucket that contains one or more objects you want to process. For example, my-bucket.
  - URI is the URI of the manifest. For example, gs://bucket_name/path_to_csv_file/*.csv. When you use the *.csv wildcard, BigQuery exports the result to multiple CSV files.
  - JOB_TYPE_FLAG is one of the following flags, depending on the job type:
    - --delete-object: Delete one or more objects.
    - --put-metadata: Update object metadata. Object metadata is stored as key-value pairs. Specify the key-value pair for the metadata you want to modify. You can specify one or more key-value pairs as a list.
    - --rewrite-object: Update the customer-managed encryption keys for one or more objects.
    - --put-object-event-based-hold: Enable event-based object holds.
    - --no-put-object-event-based-hold: Disable event-based object holds.
    - --put-object-temporary-hold: Enable temporary object holds.
    - --no-put-object-temporary-hold: Disable temporary object holds.

  An example invocation follows this list.
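Continuing the BigQuery example, the following hypothetical invocation deletes the objects listed in the exported manifest files (all names are placeholders):

# Create a deletion job driven by the exported manifest files.
gcloud storage batch-operations jobs create my-deletion-job \
  --bucket=my-bucket \
  --manifest-location="gs://my-bucket/manifests/*.csv" \
  --delete-object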
Use Cloud Audit Logs for storage batch operations jobs
Storage batch operations jobs record transformations on Cloud Storage objects in Cloud Audit Logs. You can use Cloud Audit Logs with Cloud Storage to track the object transformations that storage batch operations jobs perform. For information about enabling audit logs, see Enabling audit logs. In the audit log entry, the callUserAgent metadata field with the value StorageBatchOperations indicates a storage batch operations transformation.
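To find these entries, you can query Cloud Logging. The following is a sketch that assumes audit logs are enabled for the bucket and that the field surfaces under protoPayload.metadata.callUserAgent; verify the exact field path against your own log entries:

# Search recent audit log entries written by storage batch operations.
gcloud logging read \
  'protoPayload.metadata.callUserAgent="StorageBatchOperations"' \
  --limit=10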
Next steps
- Learn about Storage Insights datasets