Download many objects

Use Transfer Manager to download many objects with concurrency.

Explore further

For detailed documentation that includes this code sample, see the following:

Code sample

Java

For more information, see the Cloud Storage Java API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.transfermanager.DownloadResult;
import com.google.cloud.storage.transfermanager.ParallelDownloadConfig;
import com.google.cloud.storage.transfermanager.TransferManager;
import com.google.cloud.storage.transfermanager.TransferManagerConfig;
import java.nio.file.Path;
import java.util.List;

class DownloadMany {

  public static void downloadManyBlobs(
      String bucketName, List<BlobInfo> blobs, Path destinationDirectory) {

    TransferManager transferManager = TransferManagerConfig.newBuilder().build().getService();
    ParallelDownloadConfig parallelDownloadConfig =
        ParallelDownloadConfig.newBuilder()
            .setBucketName(bucketName)
            .setDownloadDirectory(destinationDirectory)
            .build();

    List<DownloadResult> results =
        transferManager.downloadBlobs(blobs, parallelDownloadConfig).getDownloadResults();

    for (DownloadResult result : results) {
      System.out.println(
          "Download of "
              + result.getInput().getName()
              + " completed with status "
              + result.getStatus());
    }
  }
}

Node.js

For more information, see the Cloud Storage Node.js API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// The ID of your GCS bucket
// const bucketName = 'your-unique-bucket-name';

// The ID of the first GCS file to download
// const firstFileName = 'your-first-file-name';

// The ID of the second GCS file to download
// const secondFileName = 'your-second-file-name;

// Imports the Google Cloud client library
const {Storage, TransferManager} = require('@google-cloud/storage');

// Creates a client
const storage = new Storage();

// Creates a transfer manager client
const transferManager = new TransferManager(storage.bucket(bucketName));

async function downloadManyFilesWithTransferManager() {
  // Downloads the files
  await transferManager.downloadManyFiles([firstFileName, secondFileName]);

  for (const fileName of [firstFileName, secondFileName]) {
    console.log(`gs://${bucketName}/${fileName} downloaded to ${fileName}.`);
  }
}

downloadManyFilesWithTransferManager().catch(console.error);

Python

For more information, see the Cloud Storage Python API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

def download_many_blobs_with_transfer_manager(
    bucket_name, blob_names, destination_directory="", workers=8
):
    """Download blobs in a list by name, concurrently in a process pool.

    The filename of each blob once downloaded is derived from the blob name and
    the `destination_directory `parameter. For complete control of the filename
    of each blob, use transfer_manager.download_many() instead.

    Directories will be created automatically as needed to accommodate blob
    names that include slashes.
    """

    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The list of blob names to download. The names of each blobs will also
    # be the name of each destination file (use transfer_manager.download_many()
    # instead to control each destination file name). If there is a "/" in the
    # blob name, then corresponding directories will be created on download.
    # blob_names = ["myblob", "myblob2"]

    # The directory on your computer to which to download all of the files. This
    # string is prepended (with os.path.join()) to the name of each blob to form
    # the full path. Relative paths and absolute paths are both accepted. An
    # empty string means "the current working directory". Note that this
    # parameter allows accepts directory traversal ("../" etc.) and is not
    # intended for unsanitized end user input.
    # destination_directory = ""

    # The maximum number of processes to use for the operation. The performance
    # impact of this value depends on the use case, but smaller files usually
    # benefit from a higher number of processes. Each additional process occupies
    # some CPU and memory resources until finished. Threads can be used instead
    # of processes by passing `worker_type=transfer_manager.THREAD`.
    # workers=8

    from google.cloud.storage import Client, transfer_manager

    storage_client = Client()
    bucket = storage_client.bucket(bucket_name)

    results = transfer_manager.download_many_to_path(
        bucket, blob_names, destination_directory=destination_directory, max_workers=workers
    )

    for name, result in zip(blob_names, results):
        # The results list is either `None` or an exception for each blob in
        # the input list, in order.

        if isinstance(result, Exception):
            print("Failed to download {} due to exception: {}".format(name, result))
        else:
            print("Downloaded {} to {}.".format(name, destination_directory + name))

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.