Write data to the Firestore database

This page describes the second stage of the migration process where you set up a Dataflow pipeline and begin a concurrent data move from the Cloud Storage bucket into your destination Firestore with MongoDB compatibility database. This operation will run concurrently with the Datastream stream.

Start the Dataflow pipeline

The following command starts a new, uniquely named, Dataflow pipeline.

The following formats are acceptable:

inputFilePattern=gs://bucket-name/migration/attempt_1/
inputFilePattern=gs://bucket-name/migration/
inputFilePattern=gs://bucket-name/

Not supported:

inputFilePattern=gs://bucket-name/**/*

DATAFLOW_START_TIME="$(date +'%Y%m%d%H%M%S')"

gcloud dataflow flex-template run "dataflow-mongodb-to-firestore-$DATAFLOW_START_TIME" \
--template-file-gcs-location gs://dataflow-templates-us-central1/latest/flex/Cloud_Datastream_MongoDB_to_Firestore \
--region $LOCATION \
--num-workers $NUM_WORKERS \
--temp-location $TEMP_OUTPUT_LOCATION \
--additional-user-labels "" \
--parameters inputFilePattern=$INPUT_FILE_LOCATION,\
inputFileFormat=avro,\
fileReadConcurrency=10,\
connectionUri=$FIRESTORE_CONNECTION_URI,\
databaseName=$FIRESTORE_DATABASE_NAME,\
shadowCollectionPrefix=shadow_,\
batchSize=500,\
deadLetterQueueDirectory=$DLQ_LOCATION,\
dlqRetryMinutes=10,\
dlqMaxRetryCount=500,\
processBackfillFirst=false,\
useShadowTablesForBackfill=true,\
runMode=regular,\
directoryWatchDurationInMinutes=20,\
streamName=$DATASTREAM_NAME,\
stagingLocation=$STAGING_LOCATION,\
autoscalingAlgorithm=THROUGHPUT_BASED,\
maxNumWorkers=$MAX_WORKERS,\
workerMachineType=$WORKER_TYPE

For more information about monitoring the Dataflow pipeline, see Troubleshooting.

What's next

Proceed to Migrate traffic to Firestore.