This page explains how to configure a FHIR store to automatically export FHIR resources to BigQuery tables every time a FHIR resource is created, updated, patched, or deleted. This process is called BigQuery streaming.
You can use BigQuery streaming to do the following:
- Synchronize the data in a FHIR store with a BigQuery dataset in near real time.
- Perform complex queries on FHIR data without needing to export it to BigQuery every time you want to analyze the data.
To improve query performance and reduce costs, you can configure BigQuery streaming to partitioned tables. For instructions, see Stream FHIR resources to partitioned tables.
Before you begin
Read Exporting FHIR resources to BigQuery to understand how the export process works.
Limitations
If you import FHIR resources from Cloud Storage, the changes aren't streamed to BigQuery.
Setting BigQuery permissions
To enable BigQuery streaming, you must grant additional permissions to the Cloud Healthcare Service Agent service account. For more information, see FHIR store BigQuery permissions.
Configure BigQuery streaming on a FHIR store
To enable BigQuery streaming, configure the StreamConfigs object in your FHIR store. In StreamConfigs, you can configure the resourceTypes[] array to control which types of FHIR resources BigQuery streaming applies to. If you don't specify resourceTypes[], BigQuery streaming applies to all FHIR resource types.
For explanations of other configurations available in StreamConfigs, such as BigQueryDestination, see Exporting FHIR resources.
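As an illustration of how a streaming configuration restricted to specific resource types fits together, the sketch below assembles a streamConfigs request body in Python. The field names follow the StreamConfigs reference; the helper function, project, and dataset names are hypothetical placeholders.

```python
import json

# Sketch: build a StreamConfigs payload that limits BigQuery streaming
# to the given FHIR resource types. The dataset URI is a placeholder
# you would replace with your own project and BigQuery dataset.
def build_stream_config(dataset_uri, resource_types=None):
    stream_config = {
        "bigqueryDestination": {
            "datasetUri": dataset_uri,
            "schemaConfig": {"schemaType": "ANALYTICS_V2"},
            "writeDisposition": "WRITE_APPEND",
        }
    }
    if resource_types:
        # Omitting resourceTypes[] streams all FHIR resource types.
        stream_config["resourceTypes"] = list(resource_types)
    return {"streamConfigs": [stream_config]}

body = build_stream_config("bq://my-project.my_dataset",
                           ["Patient", "Observation"])
print(json.dumps(body, indent=2))
```

With no resource_types argument, the resourceTypes[] array is omitted and streaming applies to every resource type, matching the default behavior described above.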
The following samples show how to enable BigQuery streaming on an existing FHIR store.
Console
To configure BigQuery streaming on an existing FHIR store using the Google Cloud console, complete the following steps:
In the Google Cloud console, go to the Datasets page.
Select the dataset containing the FHIR store you want to edit.
In the Data stores list, click the FHIR store you want to edit.
In the BigQuery streaming section, complete the following steps:
- Click Add new streaming configuration.
- In the New streaming configuration section, click Browse to select the BigQuery dataset where you want changed FHIR resources to be streamed.
- In the Schema type dropdown, select the output schema for the BigQuery table. The following schemas are available:
  - Analytics. A schema based on the SQL on FHIR document. Because BigQuery only allows 10,000 columns per table, schemas are not generated for the Parameters.parameter.resource, Bundle.entry.resource, and Bundle.entry.response.outcome fields.
  - Analytics V2. A schema similar to the Analytics schema, with added support for the following:
    - Extensions with multiple values for the same url
    - Contained FHIR resources
- Select a depth level in the Recursive Structure Depth slider to set the depth for all recursive structures in the output schema. By default, the recursive value is 2.
- In the Select FHIR resource types list, select the resource types to stream.
Click Done to save the streaming configuration.
gcloud
The gcloud CLI doesn't support this action. Instead, use the Google Cloud console, curl, PowerShell, or your preferred language.
REST
To configure BigQuery streaming on an existing FHIR store, use the projects.locations.datasets.fhirStores.patch method.
The following samples do not specify the resourceTypes[] array, so BigQuery streaming is enabled for all FHIR resource types.
Before using any of the request data, make the following replacements:
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the dataset location
- DATASET_ID: the FHIR store's parent dataset
- FHIR_STORE_ID: the FHIR store ID
- BIGQUERY_DATASET_ID: the name of an existing BigQuery dataset where you're streaming FHIR resource changes
- SCHEMA_TYPE: a value for the SchemaType enum. Use one of the following values:
  - ANALYTICS: a schema based on the SQL on FHIR document. Because BigQuery only allows 10,000 columns per table, schemas are not generated for the Parameters.parameter.resource, Bundle.entry.resource, and Bundle.entry.response.outcome fields.
  - ANALYTICS_V2: a schema similar to ANALYTICS, with added support for the following:
    - Extensions with multiple values for the same url
    - Contained FHIR resources
    ANALYTICS_V2 uses more space in the destination table than ANALYTICS.
- WRITE_DISPOSITION: a value for the WriteDisposition enum. Use one of the following values:
  - WRITE_EMPTY: only export data if the destination BigQuery tables are empty.
  - WRITE_TRUNCATE: erase all existing data in the BigQuery tables before writing the FHIR resources.
  - WRITE_APPEND: append data to the destination BigQuery tables.
Request JSON body:
{
  "streamConfigs": [
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE"
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
  ]
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
  "streamConfigs": [
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE"
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
  ]
}
EOF
Then execute the following command to send your REST request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID?updateMask=streamConfigs"
PowerShell
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
@'
{
  "streamConfigs": [
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE"
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
  ]
}
'@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID?updateMask=streamConfigs" | Select-Object -Expand Content
APIs Explorer
Copy the request body and open the method reference page. The APIs Explorer panel opens on the right side of the page. You can interact with this tool to send requests. Paste the request body in this tool, complete any other required fields, and click Execute.
You should receive a JSON response similar to the following.
If you configured any fields in the FhirStore resource, they also appear in the response.
By default, when you stream FHIR resource changes to BigQuery, a view is created for each streamed resource. The view has the following properties:
- It has the same name as the resource and the resource's table in the BigQuery dataset. For example, when you stream a Patient resource, a table named Patient is created with a view named Patientview.
- It only contains the current version of the resource, rather than all historical versions.
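The "current version only" behavior of a generated view can be pictured as a de-duplication over the resource table: for each resource id, keep only the row with the latest meta.lastUpdated value. The following plain-Python sketch stands in for the SQL view (the row shapes and timestamps are simplified, illustrative data):

```python
# Sketch: emulate the per-resource view, which keeps only the newest
# row per resource id. Each dict mimics one streamed resource version;
# timestamps are RFC 3339 strings, which sort lexicographically.
rows = [
    {"id": "p1", "lastUpdated": "2024-01-01T00:00:00Z", "name": "A"},
    {"id": "p1", "lastUpdated": "2024-02-01T00:00:00Z", "name": "A2"},
    {"id": "p2", "lastUpdated": "2024-01-15T00:00:00Z", "name": "B"},
]

def current_versions(rows):
    # Keep the row with the greatest lastUpdated per id.
    latest = {}
    for row in rows:
        prev = latest.get(row["id"])
        if prev is None or row["lastUpdated"] > prev["lastUpdated"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["id"])

print(current_versions(rows))
```

The actual view (shown later under Missing FHIR resource view creation) implements the same idea in SQL with ROW_NUMBER() over id, ordered by meta.lastUpdated descending.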
Stream FHIR resources to partitioned tables
To export FHIR resources to BigQuery partitioned tables, set the TimePartitioning enum in the lastUpdatedPartitionConfig field in your FHIR store.
The partitioned tables work like BigQuery time-unit partitioned tables. Partitioned tables have an added column named lastUpdated, which is a copy of the value generated from the meta.lastUpdated field in a FHIR resource. BigQuery uses the lastUpdated column to partition tables by hour, day, month, or year.
See Select daily, hourly, monthly, or yearly partitioning for recommendations on how to select a partition granularity.
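To illustrate what time-unit partitioning does with the lastUpdated column, the sketch below maps a timestamp to a partition identifier for each supported granularity. The identifier formats mirror BigQuery's partition decorators (for example, YYYYMMDD for DAY); the helper itself is illustrative, not part of any API.

```python
from datetime import datetime, timezone

# Sketch: the partition a row lands in for each supported granularity,
# using BigQuery-style partition decorator formats.
FORMATS = {"HOUR": "%Y%m%d%H", "DAY": "%Y%m%d",
           "MONTH": "%Y%m", "YEAR": "%Y"}

def partition_id(last_updated: datetime, granularity: str) -> str:
    # All rows whose lastUpdated maps to the same string share a partition.
    return last_updated.strftime(FORMATS[granularity])

ts = datetime(2024, 3, 7, 14, 30, tzinfo=timezone.utc)
for granularity in FORMATS:
    print(granularity, partition_id(ts, granularity))
```

Two resources updated at 14:05 and 14:55 on the same day share an HOUR partition; resources updated on different days share a MONTH partition but not a DAY partition.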
You can't convert existing, non-partitioned BigQuery tables into partitioned tables. If you export Patient resource changes to a non-partitioned Patients table, and later create a new FHIR store with table partitioning that exports to the same BigQuery dataset, the Cloud Healthcare API still exports data to the non-partitioned Patients table. To start using a partitioned table, delete the existing Patients table or use a different BigQuery dataset.
If you add partitioning to an existing FHIR store configuration, you can still export to existing non-partitioned tables. However, partitioning will only take effect on new tables.
The following samples show how to enable BigQuery streaming to partitioned tables on an existing FHIR store.
Console
The Google Cloud console and the gcloud CLI don't support this action. Instead, use curl, PowerShell, or your preferred language.
gcloud
The Google Cloud console and the gcloud CLI don't support this action. Instead, use curl, PowerShell, or your preferred language.
REST
To configure BigQuery streaming to partitioned tables on an existing FHIR store, use the projects.locations.datasets.fhirStores.patch method.
Before using any of the request data, make the following replacements:
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the dataset location
- DATASET_ID: the FHIR store's parent dataset
- FHIR_STORE_ID: the FHIR store ID
- BIGQUERY_DATASET_ID: the name of an existing BigQuery dataset where you're streaming FHIR resource changes
- SCHEMA_TYPE: a value for the SchemaType enum. Use one of the following values:
  - ANALYTICS: a schema based on the SQL on FHIR document. Because BigQuery only allows 10,000 columns per table, schemas are not generated for the Parameters.parameter.resource, Bundle.entry.resource, and Bundle.entry.response.outcome fields.
  - ANALYTICS_V2: a schema similar to ANALYTICS, with added support for the following:
    - Extensions with multiple values for the same url
    - Contained FHIR resources
    ANALYTICS_V2 uses more space in the destination table than ANALYTICS.
- TIME_PARTITION_TYPE: the granularity at which to partition exported FHIR resources. Use one of the following values:
  - HOUR: partition data by hour
  - DAY: partition data by day
  - MONTH: partition data by month
  - YEAR: partition data by year
- WRITE_DISPOSITION: a value for the WriteDisposition enum. Use one of the following values:
  - WRITE_EMPTY: only export data if the destination BigQuery tables are empty.
  - WRITE_TRUNCATE: erase all existing data in the BigQuery tables before writing the FHIR resources.
  - WRITE_APPEND: append data to the destination BigQuery tables.
Request JSON body:
{
  "streamConfigs": [
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
          "lastUpdatedPartitionConfig": {
            "type": "TIME_PARTITION_TYPE"
          }
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
  ]
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
  "streamConfigs": [
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
          "lastUpdatedPartitionConfig": {
            "type": "TIME_PARTITION_TYPE"
          }
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
  ]
}
EOF
Then execute the following command to send your REST request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID?updateMask=streamConfigs"
PowerShell
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
@'
{
  "streamConfigs": [
    {
      "bigqueryDestination": {
        "datasetUri": "bq://PROJECT_ID.BIGQUERY_DATASET_ID",
        "schemaConfig": {
          "schemaType": "SCHEMA_TYPE",
          "lastUpdatedPartitionConfig": {
            "type": "TIME_PARTITION_TYPE"
          }
        },
        "writeDisposition": "WRITE_DISPOSITION"
      }
    }
  ]
}
'@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID?updateMask=streamConfigs" | Select-Object -Expand Content
APIs Explorer
Copy the request body and open the method reference page. The APIs Explorer panel opens on the right side of the page. You can interact with this tool to send requests. Paste the request body in this tool, complete any other required fields, and click Execute.
You should receive a JSON response similar to the following:
Query a partitioned table
To reduce query costs when querying partitioned tables, use the WHERE clause to filter by time units.
For example, suppose that you set the PartitionType enum to DAY. To query a Patients table for Patient resources that changed on a specific date, run the following query:
SELECT *
FROM `PROJECT_ID.BIGQUERY_DATASET.Patients`
WHERE DATE(lastUpdated) = 'YYYY-MM-DD'
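Purely as an illustration of what the DATE(lastUpdated) filter selects, the plain-Python sketch below stands in for the query. In BigQuery itself, this filter also enables partition pruning on a DAY-partitioned table, so only the matching day's partition is scanned; the rows here are hypothetical sample data.

```python
from datetime import datetime, timezone, date

# Sketch: rows streamed to a DAY-partitioned Patients table, keyed by
# their lastUpdated timestamps.
rows = [
    {"id": "p1", "lastUpdated": datetime(2024, 3, 7, 9, 0, tzinfo=timezone.utc)},
    {"id": "p2", "lastUpdated": datetime(2024, 3, 7, 23, 59, tzinfo=timezone.utc)},
    {"id": "p3", "lastUpdated": datetime(2024, 3, 8, 0, 0, tzinfo=timezone.utc)},
]

# Equivalent of: WHERE DATE(lastUpdated) = '2024-03-07'
target = date(2024, 3, 7)
matches = [r["id"] for r in rows if r["lastUpdated"].date() == target]
print(matches)
```

Both updates from March 7 match, while the update one minute later at midnight on March 8 falls into the next day's partition and is excluded.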
Migrate from Analytics to Analytics V2
You can't migrate an existing BigQuery dataset from the Analytics schema to the Analytics V2 schema using any method, including the following:
- Changing the schema type of the table in BigQuery.
- Changing the schema type in an existing FHIR streaming configuration.
This is because the BigQuery table columns for FHIR extensions in the Analytics schema have their mode set to NULLABLE, whereas those in the Analytics V2 schema have their mode set to REPEATED. BigQuery doesn't allow changing the mode of a column from NULLABLE to REPEATED. Therefore, the two schema types are incompatible.
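The incompatibility can be sketched as a simple check over column mode changes. The mode names are BigQuery's; the checker itself is illustrative, encoding the one relaxation BigQuery does permit (REQUIRED to NULLABLE) and rejecting everything else:

```python
# Sketch: BigQuery permits relaxing a column from REQUIRED to NULLABLE,
# but never from NULLABLE to REPEATED -- which is exactly the change the
# Analytics -> Analytics V2 migration would need for extension columns.
ALLOWED_MODE_CHANGES = {("REQUIRED", "NULLABLE")}

def mode_change_allowed(old_mode: str, new_mode: str) -> bool:
    return old_mode == new_mode or (old_mode, new_mode) in ALLOWED_MODE_CHANGES

# Extension column modes as generated by each schema type:
analytics_mode = "NULLABLE"      # Analytics
analytics_v2_mode = "REPEATED"   # Analytics V2
print(mode_change_allowed(analytics_mode, analytics_v2_mode))
```

Because the check fails for NULLABLE to REPEATED, no in-place schema update can bridge the two schema types, which is why the migration below uses a fresh dataset.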
To migrate the schema type of the exported FHIR resources from Analytics to Analytics V2, you must export the FHIR resources to a new BigQuery dataset using a new streaming configuration with the updated schema type. To do so, perform the following steps:
1. Add a new streaming configuration to the FHIR store with the schema type set to Analytics V2.
2. Backfill existing data by exporting the existing FHIR data using the following settings. See Exporting FHIR resources for instructions on how to configure these settings using the Google Cloud console, the Google Cloud CLI, or the REST API. The following settings apply to the REST API:
   - Set WriteDisposition to WRITE_APPEND to append the data to the destination table.
   - Set SchemaType to ANALYTICS_V2.
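The backfill export described in step 2 takes a request body of roughly the following shape, sketched here in Python. The field names mirror the bigqueryDestination fields used elsewhere on this page; the project and dataset names are placeholders.

```python
import json

# Sketch: request body for the backfill export. schemaType and
# writeDisposition carry the two settings the migration requires;
# the dataset URI points at the new BigQuery dataset.
export_body = {
    "bigqueryDestination": {
        "datasetUri": "bq://my-project.my_new_dataset",
        "schemaConfig": {"schemaType": "ANALYTICS_V2"},
        "writeDisposition": "WRITE_APPEND",
    }
}
print(json.dumps(export_body, indent=2))
```

WRITE_APPEND matters here because the streaming configuration added in step 1 may already have written rows to the new dataset's tables; appending avoids discarding them.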
The views in BigQuery that correspond to some or all FHIR resources in the original BigQuery dataset might be missing from your new dataset. To troubleshoot this, see Missing FHIR resource view creation.
Troubleshooting FHIR streaming
If errors occur when resource changes are sent to BigQuery, the errors are logged to Cloud Logging. For more information, see Viewing error logs in Cloud Logging.
Cannot convert column from NULLABLE to REPEATED
This error is caused by a repeated extension. To resolve this error, use the ANALYTICS_V2 schema type. If you're already using ANALYTICS_V2, you might have a conflict between two extensions, or a conflict between an extension and another field.
Column names are generated from the text after the last / character in extension URLs. If an extension URL ends with a value like /resource_field name, a conflict can occur.
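As a sketch of that naming rule, the column name can be thought of as everything after the last / in the extension URL, so a URL whose final segment matches an existing resource field collides with it. The helper below is illustrative, not the API's actual sanitizer, and the URL and field names are made up:

```python
# Sketch: derive a column name from an extension URL the way the text
# describes (the text after the last '/'), then check it against
# existing resource field names for collisions.
def extension_column_name(url: str) -> str:
    return url.rsplit("/", 1)[-1]

resource_fields = {"status", "birthDate", "gender"}
url = "https://example.com/fhir/StructureDefinition/status"
name = extension_column_name(url)
print(name, name in resource_fields)
```

Here the extension's final URL segment, status, matches a resource field name, so the generated column would conflict; choosing a distinct final segment avoids the error.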
To prevent this error from occurring again, don't use extensions if their field names are the same as the resource fields you're populating.
Missing FHIR resource view creation
If you bulk export a FHIR resource to BigQuery before streaming that FHIR resource, BigQuery does not create views for the FHIR resource.
For example, you might not see any views for Encounter resources in the following situation:
1. You configure BigQuery streaming on a FHIR store, and then use the REST API to create a Patient resource. BigQuery creates a table and a view for the Patient resource.
2. You bulk export Encounter resources to the same BigQuery dataset as the previous step. BigQuery creates a table for the Encounter resources.
3. You use the REST API to create an Encounter resource. After this step, BigQuery views are not created for the Encounter resource.
To resolve this issue, use the following query to create a view:
SELECT * EXCEPT (_resource_row_id)
FROM (
  SELECT ROW_NUMBER() OVER(PARTITION BY id ORDER BY meta.lastUpdated DESC, commitTimestamp DESC) AS _resource_row_id,
         *
  FROM `PROJECT_ID.BIGQUERY_DATASET_ID.RESOURCE_TABLE` AS p
) AS p
WHERE p._resource_row_id = 1
  AND NOT EXISTS (
    SELECT *
    FROM UNNEST(p.meta.tag)
    WHERE code = 'DELETE');
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project
- BIGQUERY_DATASET_ID: the ID of the BigQuery dataset where you bulk exported the FHIR resource
- RESOURCE_TABLE: the name of the table corresponding to the FHIR resource that you want to create views for
After creating the view, you can continue to stream changes to the FHIR resource and the view is updated accordingly.
What's next
For a tutorial on a use case for streaming FHIR resource changes, see Stream and synchronize FHIR resources with BigQuery.