Storage batch operations

This page provides an overview of storage batch operations, including its benefits, use cases, job configurations, and limitations.

Overview

Storage batch operations is a Cloud Storage management feature that performs operations on billions of Cloud Storage objects in a serverless manner.

Using storage batch operations, you can automate large-scale API operations on Cloud Storage objects, reducing the development time required to write and maintain scripts for such operations. To ensure reliable execution, storage batch operations automatically retries failed operations. It also offers detailed progress tracking so that you can monitor the status and completion of all jobs. Storage batch operations is available only through the Storage Intelligence subscription. For pricing information, refer to Storage Intelligence pricing.

Benefits

  • Scalability: Perform transformations on millions of objects with a single storage batch operations job.
  • Serverless execution: Run batch jobs in a serverless environment, eliminating the need to manage infrastructure.
  • Automation: Automate complex and repetitive tasks, improving operational efficiency.
  • Reduced development time: Avoid writing and maintaining complex custom scripts.
  • Performance: Complete time-sensitive operations on schedule. With multiple batch jobs running concurrently on a bucket, you can process up to one billion objects within three hours.
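To put the performance figure above in perspective, the following minimal sketch converts "one billion objects within three hours" into an average processing rate. The function names are illustrative only. The source does not state how many concurrent jobs are needed to reach that figure, so the per-job calculation takes the job count as an assumed input.

```python
# Figures taken from the performance claim above.
OBJECTS = 1_000_000_000
WINDOW_SECONDS = 3 * 60 * 60  # three hours


def aggregate_rate(objects: int = OBJECTS, seconds: int = WINDOW_SECONDS) -> float:
    """Average objects per second needed to finish within the window."""
    return objects / seconds


def per_job_rate(concurrent_jobs: int) -> float:
    """Average rate per job, assuming work is split evenly (an assumption)."""
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be at least 1")
    return aggregate_rate() / concurrent_jobs


if __name__ == "__main__":
    print(f"{aggregate_rate():,.0f} objects/s across all jobs")
    # The job count below is a hypothetical example, not a documented value.
    print(f"{per_job_rate(10):,.0f} objects/s per job with 10 concurrent jobs")
```

The aggregate rate comes to roughly 92,600 objects per second. Actual throughput depends on the job type and object layout, so treat this as a planning estimate rather than a guarantee.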

Use cases

Storage batch operations, when used with Storage Insights, is valuable for the following use cases:

  • Security management:

    • Set encryption keys on multiple objects using the rewrite object method.
    • Apply or remove object holds to control object immutability.
  • Compliance:

    • Use object holds to meet data retention requirements for regulatory compliance.
    • Delete data that falls within a specific time frame to meet wipeout compliance requirements.
  • Data transformation: Perform bulk updates to object metadata.

  • Cost optimization: Bulk delete objects in Cloud Storage buckets to reduce storage costs.

Job configurations

To create a storage batch operations job, set the following job configurations. Job configurations are the parameters that define what the job does and which objects it processes.

  • Job name: A unique name to identify the storage batch operations job. This is used for tracking, monitoring, and referencing the job. Job names use alphanumeric characters and, as the example job-01 shows, can include hyphens.

  • Job description (optional): A brief description of the job's purpose. This helps with understanding and documenting the job details. For example, "Deletes all objects in a bucket."

  • Bucket name: The name of the storage bucket containing the objects to be processed. This is essential for locating the input data. For example, my-bucket. You can specify only one bucket name for a job.

  • Object selection: The selection criteria that define which objects to process. You can specify the criteria using one of the following options:

    • Manifest: Create a manifest and specify its location when you create the storage batch operations job. The manifest is a CSV file, uploaded to Google Cloud, that contains one object or a list of objects that you want to process. Each row in the manifest must include the bucket and name of the object. You can optionally specify the generation of the object. If you don't specify the generation, the current version of the object is used.

      The file must include a header row in the following format:

      bucket,name,generation

      The following is an example of the manifest:

      bucket,name,generation
      bucket_1,object_1,generation_1
      bucket_1,object_2,generation_2
      bucket_1,object_3,generation_3
      

      You can also create a manifest using Storage Insights datasets. For details, see Create a manifest using Storage Insights datasets.

    • Object prefixes: Specify a list of prefixes to filter objects within the bucket. Only objects with these prefixes are processed. If empty, all objects in the bucket are processed.

  • Job type: The operation that the job performs. Each job runs exactly one of the following job types:

    • Object deletion: You can delete objects within a bucket. This is crucial for cost optimization, data lifecycle management, and compliance with data deletion policies.

    • Metadata updates: You can modify the object metadata. This includes updating custom metadata, storage class, and other object properties.

    • Object hold updates: You can enable or disable object holds. Object holds prevent objects from being deleted or modified, which is essential for compliance and data retention purposes.

    • Object encryption key updates: You can manage the customer-managed encryption keys for one or more objects. This includes applying or changing encryption keys using the rewrite object method.
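The two object selection options described above can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: build_manifest and matches_prefixes are hypothetical helper names, and leaving the generation column empty when no generation is given is an assumption about how an optional value is written in the CSV.

```python
import csv
import io
from typing import Iterable, Optional, Sequence, Tuple

# (bucket, name, generation); generation is None to target the current version.
ManifestRow = Tuple[str, str, Optional[int]]

# Header row required by the manifest format.
MANIFEST_HEADER = ["bucket", "name", "generation"]


def build_manifest(rows: Iterable[ManifestRow]) -> str:
    """Return manifest CSV text that starts with the required header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(MANIFEST_HEADER)
    for bucket, name, generation in rows:
        if not bucket or not name:
            raise ValueError("each manifest row needs a bucket and an object name")
        # Assumption: an unspecified generation is written as an empty column.
        writer.writerow([bucket, name, "" if generation is None else generation])
    return buffer.getvalue()


def matches_prefixes(object_name: str, prefixes: Sequence[str]) -> bool:
    """Mirror the prefix rule: an empty prefix list selects every object."""
    if not prefixes:
        return True
    return any(object_name.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    print(build_manifest([("bucket_1", "object_1", None),
                          ("bucket_1", "object_2", 123)]))
    print(matches_prefixes("logs/2024/app.log", ["logs/"]))
```

The csv module handles quoting for object names that contain commas. The resulting text still has to be uploaded as a file before its location can be passed to a job.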

Limitations

Storage batch operations has the following limitations:

  • Storage batch operations jobs have a maximum lifetime of 14 days. Any ongoing job that doesn't complete within 14 days of its creation is automatically cancelled.

  • We don't recommend running more than 20 concurrent batch operations jobs on the same bucket.

  • Storage batch operations is not compatible with VPC Service Controls.

  • Storage batch operations is not supported on the following buckets:

    • Buckets that have Requester Pays enabled.

    • Buckets located in the eur4 or us-west8 locations.
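The limitations above lend themselves to a pre-flight check before a job is created. The following is a minimal sketch under stated assumptions: BucketInfo is a hypothetical record that the caller fills in from their own tooling, and none of the names below belong to a Google Cloud client library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Values taken from the limitations listed above.
MAX_JOB_LIFETIME = timedelta(days=14)
RECOMMENDED_MAX_CONCURRENT_JOBS = 20
UNSUPPORTED_LOCATIONS = {"eur4", "us-west8"}


@dataclass
class BucketInfo:
    """Hypothetical snapshot of bucket settings gathered by the caller."""
    location: str
    requester_pays: bool
    running_jobs: int
    uses_vpc_service_controls: bool = False


def preflight_issues(bucket: BucketInfo) -> List[str]:
    """Return a list of reasons why a new job may not be advisable."""
    issues = []
    if bucket.location.lower() in UNSUPPORTED_LOCATIONS:
        issues.append(f"bucket location {bucket.location} is not supported")
    if bucket.requester_pays:
        issues.append("bucket has Requester Pays enabled")
    if bucket.uses_vpc_service_controls:
        issues.append("VPC Service Controls is not compatible")
    if bucket.running_jobs >= RECOMMENDED_MAX_CONCURRENT_JOBS:
        issues.append("a new job would exceed the recommended 20 concurrent jobs")
    return issues


def cancellation_deadline(created_at: datetime) -> datetime:
    """Time after which an unfinished job is automatically cancelled."""
    return created_at + MAX_JOB_LIFETIME
```

An empty list from preflight_issues means that none of the documented limitations apply to the snapshot. The concurrency check reflects a recommendation rather than a hard limit, so a caller might choose to treat it as a warning.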

Next steps