This page describes how to use cross-bucket replication, which uses Storage Transfer Service to copy new and updated objects asynchronously from a source bucket to a destination bucket. When you use cross-bucket replication, you create and manage replication jobs, which are a type of job in Storage Transfer Service.
Before you begin
Before you begin, complete the following steps.
Enable the Storage Transfer Service API
If you haven't already, enable the Storage Transfer Service API.
Get required roles
To get the permissions that you need to use cross-bucket replication, ask your administrator to grant you the Storage Transfer User (roles/storagetransfer.user) IAM role on the bucket or the project.
This predefined role contains the permissions required to use cross-bucket replication. The exact permissions are listed in the Required permissions section:
Required permissions
The following permissions are required to use cross-bucket replication:
- storagetransfer.jobs.create
- storagetransfer.jobs.delete
- storagetransfer.jobs.get
- storagetransfer.jobs.list
- storagetransfer.jobs.run
- storagetransfer.jobs.update
For instructions on granting roles on buckets, see Use IAM with buckets. For instructions on granting roles on projects, see Manage access to projects.
Grant required roles
Cross-bucket replication uses Pub/Sub to receive notifications of changes to your source bucket and Storage Transfer Service to replicate objects from your source bucket to your destination bucket. To use cross-bucket replication, you must also grant the required permissions to the service agent that's used by Storage Transfer Service to replicate data and the service agent that's used by Pub/Sub to write notifications.
Grant required roles to Storage Transfer Service service agent
Storage Transfer Service uses a Google-managed service agent to replicate data. The email address of this service agent follows the naming format project-PROJECT_NUMBER@storage-transfer-service.iam.gserviceaccount.com.
You can get the email address of the Storage Transfer Service service agent by using the Storage Transfer Service googleServiceAccounts.get API.
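As a minimal sketch of the naming format above, the following shell function derives the service agent's email address from a project number. The function name `sts_agent_email` and the project number `123456789012` are hypothetical and used only for illustration; the googleServiceAccounts.get API remains the authoritative way to look up the address.

```shell
#!/usr/bin/env bash

# Build the Storage Transfer Service service agent email from a project number.
# The function name is illustrative and not part of any Google tool.
sts_agent_email() {
  local project_number="$1"
  printf 'project-%s@storage-transfer-service.iam.gserviceaccount.com\n' "$project_number"
}

# Hypothetical project number, for illustration only.
sts_agent_email "123456789012"
```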
The Storage Transfer Service service agent needs the following permissions to replicate your objects and set up Pub/Sub notifications for your source bucket:
Required permissions
- storage.buckets.get on the source and destination buckets
- storage.buckets.update on the source bucket
- storage.objects.list on the source bucket
- storage.objects.get on the source bucket
- storage.objects.rewrite on the destination bucket
- pubsub.topics.create on the project
These permissions can be granted through the Pub/Sub Editor (roles/pubsub.editor) role and the Storage Admin (roles/storage.admin) role. For a less permissive role than the Storage Admin role, you can also use a custom role.
Grant required roles to Cloud Storage service agent
Cloud Storage uses a Google-managed service agent to manage Pub/Sub notifications. The email address of this service agent follows the naming format service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com.
The Cloud Storage service agent needs the following permissions to set up Pub/Sub and publish messages to a topic:
Required permissions
- pubsub.topics.publish on the Pub/Sub topic
- pubsub.subscriptions.consume on the Pub/Sub topic
- pubsub.subscriptions.create on the project
These permissions can be granted through the Pub/Sub Publisher (roles/pubsub.publisher) role.
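The role grants described in this section can be sketched as a small dry-run script. It only prints `gcloud projects add-iam-policy-binding` commands so that you can review them before running anything. The project ID, the project number, and the helper name `print_project_binding` are hypothetical. Granting the Storage Admin role at the project level is an assumption made for brevity and is broader than the bucket-level permissions listed above; a custom role or bucket-level grants would be narrower.

```shell
#!/usr/bin/env bash

# Hypothetical values, for illustration only.
PROJECT_ID="my-project"
PROJECT_NUMBER="123456789012"

# Service agent emails, following the naming formats described above.
STS_AGENT="project-${PROJECT_NUMBER}@storage-transfer-service.iam.gserviceaccount.com"
GCS_AGENT="service-${PROJECT_NUMBER}@gs-project-accounts.iam.gserviceaccount.com"

# Print (but do not run) a project-level role binding command.
print_project_binding() {
  local member="$1"
  local role="$2"
  printf 'gcloud projects add-iam-policy-binding %s --member=serviceAccount:%s --role=%s\n' \
    "$PROJECT_ID" "$member" "$role"
}

# Roles for the Storage Transfer Service service agent.
print_project_binding "$STS_AGENT" "roles/pubsub.editor"
print_project_binding "$STS_AGENT" "roles/storage.admin"

# Role for the Cloud Storage service agent.
print_project_binding "$GCS_AGENT" "roles/pubsub.publisher"
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere and leaves the choice of scope (project, bucket, or custom role) to the administrator.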
Create a replication job
Console
When using the Google Cloud console, you can create a replication job for existing buckets or for new buckets during the bucket creation process.
To create a replication job for a new bucket, follow the instructions for creating a new bucket.
To create a replication job for an existing bucket, complete the following steps:
- In the Google Cloud console, go to the Cloud Storage Buckets page.
- In the list of buckets, click the name of the source bucket whose objects you want to replicate.
- On the Bucket details page, click the Configuration tab.
- Locate the Cross-bucket replication option and click Edit.
- In the Edit cross-bucket replication pane that opens, click Add a destination.
- In the Choose a destination section, select a destination bucket, then click Next.
- In the Choose replication settings section, do the following:
  - Optional: To filter objects to replicate by object name prefix, select the Replicate objects based on prefix checkbox in the Choose which objects to replicate section.
    - To include objects by prefix, enter a prefix in the Include objects with prefix section, then click Add a prefix.
    - To exclude objects by prefix, enter a prefix in the Exclude objects with prefix section, then click Add a prefix.
  - Optional: To set a storage class for replicated objects, select a storage class from the menu in the Set storage class for replicated objects section. If you skip this step, replicated objects use the destination bucket's storage class by default.
- Click Save.
Command line
When using the Google Cloud CLI, you can create a replication job for existing buckets.
To create a replication job, use the gcloud alpha transfer jobs create command with the --replication flag:
gcloud alpha transfer jobs create gs://SOURCE_BUCKET_NAME gs://DESTINATION_BUCKET_NAME --replication
Replace:
- SOURCE_BUCKET_NAME with the name of the source bucket you want to replicate. For example, my-source-bucket.
- DESTINATION_BUCKET_NAME with the name of the destination bucket. For example, my-destination-bucket.
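If a source bucket replicates to more than one destination, the command above can be generated once per destination. The following is a minimal sketch under the assumption that each destination bucket gets its own replication job, as the console's per-destination job list suggests. The helper name `print_replication_commands` and the bucket names are hypothetical, and the script only prints the commands rather than running them.

```shell
#!/usr/bin/env bash

# Print one replication-job creation command per destination bucket.
# Arguments: the source bucket name, then one or more destination bucket names.
print_replication_commands() {
  local source_bucket="$1"
  shift
  local destination_bucket
  for destination_bucket in "$@"; do
    printf 'gcloud alpha transfer jobs create gs://%s gs://%s --replication\n' \
      "$source_bucket" "$destination_bucket"
  done
}

# Hypothetical bucket names, for illustration only.
print_replication_commands "my-source-bucket" "my-destination-bucket" "my-second-destination-bucket"
```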
REST APIs
JSON API
When using the JSON API, you can create a replication job for existing buckets.
- Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.
  Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.
- Create a JSON file that contains a TransferJob object with an initialized ReplicationSpec resource:
  {
    "name": "TRANSFER_JOB_NAME",
    ...
    "replicationSpec": {
      "gcsDataSource": {
        "bucketName": "SOURCE_BUCKET_NAME"
      },
      "gcsDataSink": {
        "bucketName": "DESTINATION_BUCKET_NAME"
      },
      "objectConditions": {
      },
      "transferOptions": {
        "overwriteWhen": "OVERWRITE_OPTION"
      }
    }
    ...
  }
  Replace:
  - TRANSFER_JOB_NAME with the name you want to assign the replication job. See the transferJobs reference documentation for naming requirements.
  - SOURCE_BUCKET_NAME with the name of the source bucket that contains the objects you want to replicate. For example, example-source-bucket.
  - DESTINATION_BUCKET_NAME with the name of the destination bucket where your objects will be replicated. For example, example-destination-bucket.
  - OVERWRITE_OPTION with an option for how existing objects in the destination bucket can be overwritten as the result of a replication job, which can happen when the destination object and the source object have the same name. The value must be one of the following:
    - ALWAYS: Always overwrite objects in the destination bucket.
    - DIFFERENT: Only overwrite objects in the destination bucket if the destination object data is different from the source object data.
    - NEVER: Never overwrite objects in the destination bucket.
- Use cURL to call the Storage Transfer Service REST API with a transferJobs.create request:
  curl -X POST --data-binary @JSON_FILE_NAME \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs"
  Where:
  - JSON_FILE_NAME is the name of the JSON file you created in the previous step.
To check the status of the replication job, view Cloud Logging for Storage Transfer Service logs.
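The JSON API steps above can be sketched as a shell function that writes the request body to a file, ready to pass to cURL with `--data-binary`. This is a sketch under stated assumptions, not a definitive request: the function name `write_replication_job`, the file name `replication-job.json`, the project ID, and the job name are hypothetical. The `projectId` field and the `transferJobs/` name prefix are assumptions that do not appear in the template above, so check them against the transferJobs reference. The empty `objectConditions` field is omitted.

```shell
#!/usr/bin/env bash

# Write a replication job request body to a file.
# Arguments: output file, job name, project ID, source bucket,
# destination bucket, overwrite option (ALWAYS, DIFFERENT, or NEVER).
write_replication_job() {
  local out_file="$1"
  local job_name="$2"
  local project_id="$3"
  local source_bucket="$4"
  local destination_bucket="$5"
  local overwrite_when="$6"
  cat > "$out_file" <<EOF
{
  "name": "${job_name}",
  "projectId": "${project_id}",
  "replicationSpec": {
    "gcsDataSource": { "bucketName": "${source_bucket}" },
    "gcsDataSink": { "bucketName": "${destination_bucket}" },
    "transferOptions": { "overwriteWhen": "${overwrite_when}" }
  }
}
EOF
}

# Hypothetical values, for illustration only.
write_replication_job "replication-job.json" "transferJobs/example-replication-job" \
  "my-project" "example-source-bucket" "example-destination-bucket" "DIFFERENT"

# Show the generated file so it can be reviewed before it is sent.
cat "replication-job.json"
```

The generated file can then take the place of JSON_FILE_NAME in the transferJobs.create request shown above.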
List replication jobs
Console
You cannot list replication jobs using the Google Cloud console. See View a replication job for instructions on how to view a single replication job at a time.
Command line
Use the gcloud alpha transfer jobs list command with the --job-type flag:
gcloud alpha transfer jobs list --job-type=replication
REST APIs
JSON API
- Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.
  Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.
- Use cURL to call the Storage Transfer Service REST API with a transferJobs.list request:
  curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs"
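The transferJobs.list reference describes a `filter` query parameter that carries the project ID as JSON text. The following is a hedged sketch of how that parameter could be attached to the request above, assuming it is needed for your project. The helper names `list_filter` and `print_list_request` and the project ID are hypothetical, and the script prints the cURL command rather than running it.

```shell
#!/usr/bin/env bash

# Build the JSON text for the filter query parameter.
# The function name is illustrative.
list_filter() {
  local project_id="$1"
  printf '{"projectId":"%s"}' "$project_id"
}

# Print (but do not run) a list request. With -G, curl sends a GET request and
# appends the URL-encoded --data-urlencode value to the query string.
print_list_request() {
  local project_id="$1"
  printf 'curl -G -H "Authorization: Bearer $(gcloud auth print-access-token)" '
  printf -- "--data-urlencode 'filter=%s' " "$(list_filter "$project_id")"
  printf '"https://storagetransfer.googleapis.com/v1/transferJobs"\n'
}

# Hypothetical project ID, for illustration only.
print_list_request "my-project"
```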
View a replication job
Console
- In the Google Cloud console, go to the Cloud Storage Buckets page.
- In the list of buckets, click the name of the source bucket whose cross-bucket replication job you want to view.
- On the Bucket details page, click the Configuration tab.
- Locate the Cross-bucket replication option and click Edit.
  The Edit cross-bucket replication pane appears, which displays the replication job for each destination bucket.
On the Buckets page, you can view the Replication column, which displays whether a bucket has a Turbo replication job or a cross-bucket replication job running. For instructions on displaying the Replication column, see Show columns.
Command line
Use the gcloud alpha transfer jobs describe command:
gcloud alpha transfer jobs describe JOB_NAME
Replace:
- JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.
REST APIs
JSON API
- Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.
  Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.
- Use cURL to call the Storage Transfer Service REST API with a transferJobs.get request:
  curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs/JOB_NAME"
  Replace:
  - JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.
Update a replication job
You can update the following fields of a replication job:
- The description of the replication job
- The configuration for running a replication job
- The configuration of notifications published to Pub/Sub
- The logging behavior for replication job operations
- The status of the replication job (whether it's enabled, disabled, or deleted)
Console
When using the Google Cloud console, you can only update a replication job by pausing or unpausing the job.
- In the Google Cloud console, go to the Cloud Storage Buckets page.
- In the list of buckets, click the name of the source bucket whose replication you want to pause or unpause.
- On the Bucket details page, click the Configuration tab.
- Locate the Cross-bucket replication option and click Edit.
- In the Edit cross-bucket replication pane that appears, click Pause or Unpause next to the replication job you want to update.
Command line
Use the gcloud alpha transfer jobs update command with the flags that control the replication job properties you want to update. For a list of possible flags, view the gcloud alpha transfer jobs update documentation.
For example, to update the object overwrite behavior of the replication job, run the gcloud alpha transfer jobs update command with the --overwrite-when flag:
gcloud alpha transfer jobs update JOB_NAME --overwrite-when=OVERWRITE_OPTION
Replace:
- JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list or view your replication jobs.
- OVERWRITE_OPTION with an option for how existing objects in the destination bucket can be overwritten as the result of a replication job, which can happen when the destination object and the source object have the same name. The value must be one of the following:
  - always: Always overwrite destination objects.
  - different: Only overwrite objects in the destination bucket if the destination object data is different from the source object data.
  - never: Never overwrite destination objects.
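Because the --overwrite-when flag accepts only the three values listed above, a wrapper script can reject anything else before calling gcloud. The following is a minimal sketch; the helper names `is_valid_overwrite_option` and `print_update_command` are hypothetical, and the script prints the update command rather than running it.

```shell
#!/usr/bin/env bash

# Return success only for the overwrite options listed above.
is_valid_overwrite_option() {
  case "$1" in
    always|different|never) return 0 ;;
    *) return 1 ;;
  esac
}

# Print (but do not run) the update command, or report an invalid option.
print_update_command() {
  local job_name="$1"
  local overwrite_option="$2"
  if ! is_valid_overwrite_option "$overwrite_option"; then
    echo "invalid overwrite option: $overwrite_option" >&2
    return 1
  fi
  printf 'gcloud alpha transfer jobs update %s --overwrite-when=%s\n' \
    "$job_name" "$overwrite_option"
}

# Hypothetical job ID, for illustration only.
print_update_command "1234567890" "different"
```

The check is case-sensitive on purpose: the gcloud values are lowercase, whereas the JSON API uses uppercase values such as DIFFERENT.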
REST APIs
JSON API
- Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.
  Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.
- Create a JSON file with the following structure, which includes the fields of the TransferJob object you want to update:
  {
    "projectId": string,
    "transferJob": {
      object (TransferJob)
    },
    "updateTransferJobFieldMask": "UPDATE_MASK"
  }
  Where:
  - object (TransferJob) is replaced with the fields of the replication job you want to update. See the TransferJob resource representation for more information.
  - UPDATE_MASK is a comma-separated list of the field names you want to update. Values can be one or more of the following: description, transferSpec, notificationConfig, loggingConfig, status.
  For more information about the field names you can include, see the transferJobs.patch request body.
- Use cURL to call the Storage Transfer Service REST API with a transferJobs.patch request:
  curl -X PATCH --data-binary @JSON_FILE_NAME \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs/JOB_NAME"
  Replace:
  - JSON_FILE_NAME with the name of the JSON file you created in the previous step.
  - JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.
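As a concrete instance of the update structure described above, the following sketch generates a request body that changes only the job description. The function name `patch_description_body`, the file name `patch-body.json`, the project ID, and the description text are hypothetical. The function does not escape special characters, so it assumes the description contains no double quotes or backslashes.

```shell
#!/usr/bin/env bash

# Print a transferJobs.patch request body that updates only the description.
# Assumes the description contains no characters that need JSON escaping.
patch_description_body() {
  local project_id="$1"
  local description="$2"
  cat <<EOF
{
  "projectId": "${project_id}",
  "transferJob": {
    "description": "${description}"
  },
  "updateTransferJobFieldMask": "description"
}
EOF
}

# Hypothetical values, for illustration only.
patch_description_body "my-project" "Replicates logs to the backup bucket" > patch-body.json
cat patch-body.json
```

Keeping the field mask limited to `description` means the other fields of the replication job are left unchanged by the request.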
Delete a replication job
Console
- In the Google Cloud console, go to the Cloud Storage Buckets page.
- In the list of buckets, click the name of the source bucket you want to stop replicating.
- On the Bucket details page, click the Configuration tab.
- Locate the Cross-bucket replication option and click Edit.
- In the Edit cross-bucket replication pane that appears, click Delete next to the replication job you want to delete.
- In the dialog that appears, click Confirm.
Command line
Use the gcloud alpha transfer jobs delete command:
gcloud alpha transfer jobs delete JOB_NAME
Replace:
- JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.
REST APIs
JSON API
- Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.
  Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.
- Use cURL to call the Storage Transfer Service REST API with a transferJobs.delete request:
  curl -X DELETE \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs/JOB_NAME"
  Replace:
  - JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.