Write data to the Firestore database
This page describes the second stage of the migration process, in which you set up a Dataflow pipeline that moves data from the Cloud Storage bucket into your destination Firestore with MongoDB compatibility database. The pipeline runs concurrently with the Datastream stream.
Start the Dataflow pipeline
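The command in this section relies on environment variables that you defined in the earlier stages of this migration. If you are running this stage from a new shell session, set them again first. The values below are illustrative placeholders only, not recommended settings; substitute the values you used when you configured Datastream and the Cloud Storage bucket.
# Placeholder values for the variables the pipeline command expects.
# Replace each one with the value you set in the earlier stages.
export LOCATION="us-central1"                    # Region to run the Dataflow job in
export NUM_WORKERS=5                             # Initial number of workers
export MAX_WORKERS=20                            # Autoscaling ceiling
export WORKER_TYPE="n2-standard-4"               # Worker machine type
export INPUT_FILE_LOCATION="gs://BUCKET_NAME/stream-output"   # Datastream output location
export TEMP_OUTPUT_LOCATION="gs://BUCKET_NAME/temp"
export STAGING_LOCATION="gs://BUCKET_NAME/staging"
export DLQ_LOCATION="gs://BUCKET_NAME/dlq"       # Dead-letter queue directory
export START_TIME="2025-01-01T00:00:00Z"         # RFC 3339 time from which to read files
export DATASTREAM_NAME="DATASTREAM_NAME"         # Name of the Datastream stream
export FIRESTORE_CONNECTION_URI="FIRESTORE_CONNECTION_URI"   # Connection URI of your Firestore with MongoDB compatibility database
export FIRESTORE_DATABASE_NAME="DATABASE_NAME"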
The following command starts a new Dataflow pipeline whose name is made unique by a timestamp.
DATAFLOW_START_TIME="$(date +'%Y%m%d%H%M%S')"
gcloud dataflow flex-template run "dataflow-mongodb-to-firestore-$DATAFLOW_START_TIME" \
--template-file-gcs-location gs://dataflow-templates-us-central1/latest/flex/Cloud_Datastream_MongoDB_to_Firestore \
--region $LOCATION \
--num-workers $NUM_WORKERS \
--temp-location $TEMP_OUTPUT_LOCATION \
--additional-user-labels "" \
--parameters inputFilePattern=$INPUT_FILE_LOCATION,\
inputFileFormat=avro,\
rfcStartDateTime=$START_TIME,\
fileReadConcurrency=10,\
connectionUri=$FIRESTORE_CONNECTION_URI,\
databaseName=$FIRESTORE_DATABASE_NAME,\
shadowCollectionPrefix=shadow_,\
batchSize=500,\
deadLetterQueueDirectory=$DLQ_LOCATION,\
dlqRetryMinutes=10,\
dlqMaxRetryCount=500,\
processBackfillFirst=false,\
useShadowTablesForBackfill=true,\
runMode=regular,\
directoryWatchDurationInMinutes=20,\
streamName=$DATASTREAM_NAME,\
stagingLocation=$STAGING_LOCATION,\
autoscalingAlgorithm=THROUGHPUT_BASED,\
maxNumWorkers=$MAX_WORKERS,\
workerMachineType=$WORKER_TYPE
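To confirm that the job started, you can list the active Dataflow jobs in your region and look for the timestamped job name. This check assumes the $LOCATION and $DATAFLOW_START_TIME variables are still set in your shell.
# List active jobs and filter for the job name created above.
gcloud dataflow jobs list \
  --region=$LOCATION \
  --status=active \
  --filter="name:dataflow-mongodb-to-firestore-$DATAFLOW_START_TIME"
The job should appear with a Running state once its workers start.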
For more information about monitoring the Dataflow pipeline, see Troubleshooting.
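As a quick first step, you can also inspect the job from the command line. A minimal sketch, where JOB_ID is the ID reported by gcloud dataflow jobs list:
# Show a summary of the job, then list its metrics.
gcloud dataflow jobs show JOB_ID --region=$LOCATION
gcloud dataflow metrics list JOB_ID --region=$LOCATION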
What's next
Proceed to Migrate traffic to Firestore.