You can convert invoices into structured data in Cloud Data Fusion by using the Invoice Parser plugin, which is powered by Document AI. The structured data is stored in BigQuery.
Before you begin
To parse invoices, you need a Cloud Data Fusion instance running version 6.4.1 or later. For more information, see Upgrading Cloud Data Fusion instances.
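If you track instance versions in a script, the version requirement can be checked with a small helper. The following is a minimal sketch, not part of Cloud Data Fusion: the names parse_version and meets_minimum are hypothetical, and it assumes the version is a purely numeric dotted string such as 6.4.1.

```python
from typing import Tuple

# Minimum Cloud Data Fusion version required by the Invoice Parser plugin.
MINIMUM_VERSION = "6.4.1"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '6.4.1' into a tuple of ints.

    Assumption: every component is numeric (no suffixes like '-beta').
    """
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(version: str, minimum: str = MINIMUM_VERSION) -> bool:
    """Return True if `version` is at least `minimum`.

    Tuples compare component by component, so 6.10.0 correctly
    ranks above 6.4.1 (a plain string comparison would not).
    """
    return parse_version(version) >= parse_version(minimum)
```

Comparing integer tuples instead of raw strings avoids ranking 6.10.0 below 6.4.1.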
Create a processor
In the Google Cloud console, go to the Document AI Processors page.
Create a processor. Select Invoice parser as the type of processor.
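The pipeline configuration later asks for the processor ID. Document AI identifies a processor by a resource name of the form projects/PROJECT/locations/LOCATION/processors/PROCESSOR_ID, so if you only have the full name, the ID is its last segment. The following sketch extracts it; the function name processor_id_from_name and the sample values in the usage note are hypothetical.

```python
import re

# Shape of a Document AI processor resource name:
# projects/PROJECT/locations/LOCATION/processors/PROCESSOR_ID
_PROCESSOR_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor_id>[^/]+)$"
)


def processor_id_from_name(name: str) -> str:
    """Return the processor ID from a full processor resource name.

    Raises ValueError if `name` does not have the expected shape.
    """
    match = _PROCESSOR_NAME.match(name)
    if match is None:
        raise ValueError(f"not a processor resource name: {name!r}")
    return match.group("processor_id")
```

For example, processor_id_from_name("projects/my-project/locations/us/processors/abc123") returns "abc123".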
Configure the invoice parser plugin
In the Google Cloud console, go to the Cloud Data Fusion Instances page.
Confirm that the instance you want to use is running version 6.4.1 or later. If it is running an earlier version, upgrade the instance.
Click View instance. The Cloud Data Fusion UI opens.
Click Hub.
Click GCP, and then deploy GCP Plugins.
Click DocAI, and then deploy the Doc AI Plugins.
Click Invoice Parser Quickstart, and then click Create.
Customize your pipeline by entering the Invoice Parser processor ID, Cloud Storage bucket path, and BigQuery table details.
Deploy and run the pipeline.
Parsed invoices are stored in the output table in BigQuery.
Metadata from the invoices is stored in the Metadata table and includes the parsing status, Cloud Storage path, and upload timestamp of the raw invoice. Records in the output and metadata tables can be joined on the invoice_uuid key.
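To illustrate the join, the following sketch builds a BigQuery standard SQL query that combines both tables on invoice_uuid. Only the invoice_uuid key comes from this page; the function name build_join_query and the table names in the usage note are hypothetical, so substitute the tables you configured in the pipeline.

```python
def build_join_query(output_table: str, metadata_table: str) -> str:
    """Build a query joining parsed invoices to their metadata.

    Both arguments are fully qualified table names in the form
    project.dataset.table (hypothetical values; use your own).
    """
    for table in (output_table, metadata_table):
        # Reject backticks so the quoted identifier cannot be broken.
        if "`" in table:
            raise ValueError(f"invalid table name: {table!r}")
    return (
        "SELECT *\n"
        f"FROM `{output_table}` AS parsed\n"
        f"JOIN `{metadata_table}` AS meta\n"
        # USING returns the shared invoice_uuid column once.
        "USING (invoice_uuid)"
    )
```

For example, build_join_query("my-project.invoices.output", "my-project.invoices.metadata") returns a query that you can paste into the BigQuery query editor. If the two tables share other column names, list the columns explicitly instead of using SELECT *.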