Import from the source Mongo database
This page describes the first stage of the migration process where you use a Datastream stream to capture the contents of your MongoDB-compatible source database and transfer them into a Cloud Storage bucket.
Create YAML configuration files for the stream
In addition to usual command-line flags, creating a stream requires two configuration files in the YAML format:
- The mongo_source_config.yaml file selects the specific resources to migrate, such as the database name. Mongo connectivity parameters such as the hostname, username, and password are all properties of the connection profile. However, the database (and any specific collections within that database) is a property of the stream.
- The gcs_dst_config.yaml file configures the data placement within Cloud Storage. The Cloud Storage bucket and the root path within the bucket are properties of the connection profile. However, the data format and the data placement within the Cloud Storage bucket structure are properties of the stream.
The following commands create these files and populate them with values from the environment variables that you set earlier. Alternatively, you can create the files in any text editor and substitute the values manually.
echo "$(cat <<EOF
includeObjects:
  databases:
  - database: ${MONGODB_DATABASE_NAME}
EOF
)" > mongo_source_config.yaml
echo "$(cat <<EOF
path: "/${GCS_BUCKET_SUB_PATH}"
avroFileFormat: {}
EOF
)" > gcs_dst_config.yaml
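Because the heredocs expand the environment variables at the moment the commands run, an unset variable silently produces an incomplete file. As a minimal sketch, a check like the following could be run before generating the files. The name `require_vars` is a hypothetical helper introduced here for illustration; it is not part of gcloud or Datastream.

```shell
# require_vars is a hypothetical helper (not part of any Google Cloud tool).
# It reports every named variable that is unset or empty and returns
# non-zero if at least one of them is missing.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_vars MONGODB_DATABASE_NAME GCS_BUCKET_SUB_PATH` returns zero only when both variables used by the heredocs are set. After generating the files, `cat mongo_source_config.yaml gcs_dst_config.yaml` shows the values that were actually substituted.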
The previous example configures the full contents of $MONGODB_DATABASE_NAME for migration. You can also limit the migration to specific collections within the database. For example, to migrate only the collections users and chats, use the following configuration:
includeObjects:
  databases:
  - database: ${MONGODB_DATABASE_NAME}
    collections:
      - collection: users
      - collection: chats
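If the list of collections changes between runs, the same file can be generated rather than edited by hand. The sketch below assumes a hypothetical helper named `write_source_config`, which takes the database name followed by zero or more collection names and prints YAML in the layout shown above. With no collection names it prints the full-database form.

```shell
# write_source_config is a hypothetical helper for illustration only.
# Usage: write_source_config DATABASE [COLLECTION ...]
write_source_config() {
  local db="$1"
  shift
  printf 'includeObjects:\n'
  printf '  databases:\n'
  printf '  - database: %s\n' "$db"
  if [ "$#" -gt 0 ]; then
    printf '    collections:\n'
    # printf reuses the format string once for each remaining argument.
    printf '      - collection: %s\n' "$@"
  fi
}
```

A possible invocation is `write_source_config "$MONGODB_DATABASE_NAME" users chats > mongo_source_config.yaml`, which overwrites the file created earlier.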
Create a Datastream stream
Next, create a stream that connects the source and the destination:
gcloud datastream streams create "$DATASTREAM_NAME" \
--display-name="$DATASTREAM_NAME" \
--location="$LOCATION" \
--source="$SRC_CONNECTION_PROFILE_NAME" \
--destination="$DST_CONNECTION_PROFILE_NAME" \
--mongodb-source-config=./mongo_source_config.yaml \
--gcs-destination-config=./gcs_dst_config.yaml \
--backfill-all
For more information about monitoring the Datastream stream creation, see Troubleshooting.
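To confirm that the stream exists before you activate it, you can query its state. The following is a minimal sketch: `stream_state` is a hypothetical wrapper name, and it relies only on `gcloud datastream streams describe` together with the standard gcloud `--format` flag. A stream that has been created but not yet started would typically report `NOT_STARTED`.

```shell
# stream_state is a hypothetical wrapper for illustration only.
# Usage: stream_state STREAM_NAME LOCATION
# Prints the value of the stream's "state" field.
stream_state() {
  gcloud datastream streams describe "$1" \
    --location="$2" \
    --format="value(state)"
}
```

With the variables used on this page, the call would be `stream_state "$DATASTREAM_NAME" "$LOCATION"`.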
Activate the Datastream stream
Finally, activate the new stream.
As the stream begins pulling data and streaming changes from the Mongo source, you can observe new directories and files created in the Cloud Storage bucket, under the path configured in the connection profile and the stream.
To activate the stream, run the following command:
gcloud datastream streams update "$DATASTREAM_NAME" \
--location="$LOCATION" \
--state=RUNNING \
--update-mask=state
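The state change might not be visible immediately, so it can help to poll until the stream reports `RUNNING`. The loop below is a sketch under assumptions, not an official procedure: `wait_for_running` is a hypothetical helper name, and the default of 30 attempts spaced 10 seconds apart is an arbitrary choice that you would tune.

```shell
# wait_for_running is a hypothetical helper for illustration only.
# Usage: wait_for_running STREAM_NAME LOCATION [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once the stream state is RUNNING, 1 if attempts run out.
wait_for_running() {
  local stream="$1" location="$2"
  local attempts="${3:-30}" delay="${4:-10}"
  local i=1 state
  while [ "$i" -le "$attempts" ]; do
    state="$(gcloud datastream streams describe "$stream" \
      --location="$location" --format="value(state)")"
    echo "Attempt $i: state is ${state:-unknown}"
    if [ "$state" = "RUNNING" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A call such as `wait_for_running "$DATASTREAM_NAME" "$LOCATION"` blocks until the stream is running or the attempts are exhausted. Once it succeeds, listing the bucket with `gcloud storage ls --recursive` against the bucket and root path from your connection profile should start to show the new directories and files.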
What's next
Proceed to Write data into the destination database.