Process documents

This quickstart shows you how to use the batch processing capability of the Document AI API, through the SAP BTP edition of ABAP SDK for Google Cloud, to process documents (invoices) from a source Cloud Storage bucket and store the processed documents (JSON files) in a target bucket.

Before you begin

Before you run this quickstart, make sure that you or your administrators have completed the following prerequisites:

  • Make sure the Document AI API is enabled in your Google Cloud project.

    Go to API library

  • In the Document AI Workbench, create a processor with type INVOICE_PROCESSOR. For more information, see Creating and managing processors.

  • In Cloud Storage, create a source bucket to store the invoices for processing and place the invoices in this bucket. For more information, see Create buckets.

  • In Cloud Storage, create a target bucket to store the processed files.

Create an ABAP class to process documents

  1. Create a package:

    1. In ADT, go to the Project Explorer.
    2. Right-click the package ZLOCAL, and select New > ABAP Package.
    3. Enter the following details for your package:

      • Name: enter ZABAPSDK_TEST.
      • Description: enter ABAP SDK Test Package.
    4. Click Next.

    5. In the Select a Transport Request dialog, select the Create a new request checkbox.

    6. Enter a description for the transport request.

    7. Click Finish.

  2. Create an ABAP class to call the Document AI API:

    1. Right-click your ABAP package and select New > ABAP Class.
    2. Enter the following details for your ABAP class:

      • Name: enter ZGOOG_CL_QS_DOCUMENT_AI.
      • Description: enter Quick start for Document AI API.
    3. Click Next.

    4. Select a transport request and click Finish.

  3. In the code editor, replace the default code with the following code snippet:

    " --------------------------------------------------------------------
    "  Copyright 2024 Google LLC                                         -
    "                                                                    -
    "  Licensed under the Apache License, Version 2.0 (the "License");   -
    "  you may not use this file except in compliance with the License.  -
    "  You may obtain a copy of the License at                           -
    "      https://www.apache.org/licenses/LICENSE-2.0                   -
    "  Unless required by applicable law or agreed to in writing,        -
    "  software distributed under the License is distributed on an       -
    "  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,      -
    "  either express or implied.                                        -
    "  See the License for the specific language governing permissions   -
    "  and limitations under the License.                                -
    " --------------------------------------------------------------------
    " The class name must match the ABAP class created in step 2.
    CLASS zgoog_cl_qs_document_ai DEFINITION
      PUBLIC FINAL
      CREATE PUBLIC.
    
      PUBLIC SECTION.
        INTERFACES if_oo_adt_classrun.
    ENDCLASS.
    
    
    CLASS zgoog_cl_qs_document_ai IMPLEMENTATION.
      METHOD if_oo_adt_classrun~main.
        DATA lv_p_projects_id   TYPE string.
        DATA lv_p_locations_id  TYPE string.
        DATA lv_p_processors_id TYPE string.
        DATA ls_input           TYPE /goog/cl_documentai_v1=>ty_017.
        DATA lo_docai           TYPE REF TO /goog/cl_documentai_v1.
    
        TRY.
    
            " Open HTTP connection
            lo_docai = NEW #( iv_key_name = 'DEMO_DOC_PROCESSING' ).
    
            " Populate relevant parameters to be passed to API
            lv_p_projects_id  = 'PROJECT_ID'.
            lv_p_locations_id = 'LOCATION_ID'.
            lv_p_processors_id = 'PROCESSOR_ID'.
            ls_input-input_documents-gcs_prefix-gcs_uri_prefix = 'SOURCE_BUCKET_URI'.
            ls_input-document_output_config-gcs_output_config-gcs_uri = 'TARGET_BUCKET_URI'.
    
            " Call API method
            lo_docai->batch_process_processors( EXPORTING iv_p_projects_id   = lv_p_projects_id
                                                          iv_p_locations_id  = lv_p_locations_id
                                                          iv_p_processors_id = lv_p_processors_id
                                                          is_input           = ls_input
                                                IMPORTING
                                                          es_output          = DATA(ls_output)
                                                          ev_ret_code        = DATA(lv_ret_code)
                                                          ev_err_text        = DATA(lv_err_text)
                                                          es_err_resp        = DATA(ls_err_resp) ).
    
            IF lo_docai->is_success( lv_ret_code ) = abap_true.
              out->write( |API call successful| ).
            ELSE.
              out->write( |Error occurred during API call| ).
              out->write( lv_err_text ).
            ENDIF.
    
            " Close HTTP connection
            lo_docai->close( ).
    
          CATCH /goog/cx_sdk INTO DATA(lo_exception).
            " Report the exception text on the console
            out->write( lo_exception->get_text( ) ).
        ENDTRY.
      ENDMETHOD.
    ENDCLASS.
    

    Replace the following:

    • DEMO_DOC_PROCESSING: the client key name.
    • PROJECT_ID: the ID of the Google Cloud project that has the Document AI API enabled.
    • LOCATION_ID: the processor's location.
    • PROCESSOR_ID: the ID of the processor.
    • SOURCE_BUCKET_URI: the URI of the Cloud Storage bucket folder where source documents are kept for processing.
    • TARGET_BUCKET_URI: the URI of the Cloud Storage bucket where the processed document (JSON file) is stored.
  4. Save and activate the changes.

  5. Run your application:

    1. Select the ABAP class ZGOOG_CL_QS_DOCUMENT_AI.
    2. Click Run > Run As > ABAP Application (Console). Alternatively, press F9.
  6. To validate the results, follow these steps:

    1. In the Google Cloud console, go to the Cloud Storage Buckets page.

      Go to Buckets

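    2. Open the target bucket. The processed document is stored as a JSON file. Batch processing runs asynchronously, so the file might take a short while to appear after the API call returns.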
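
The SOURCE_BUCKET_URI and TARGET_BUCKET_URI placeholders in step 3 are Cloud Storage URIs, which use the gs:// scheme. As a minimal sketch, the following class assembles both values and prints them to the console. The bucket and folder names (my-invoices-source, incoming, my-invoices-target) and the class name ZCL_QS_DOCAI_URI_SKETCH are assumptions made for illustration only:

```abap
CLASS zcl_qs_docai_uri_sketch DEFINITION
  PUBLIC FINAL
  CREATE PUBLIC.

  PUBLIC SECTION.
    INTERFACES if_oo_adt_classrun.
ENDCLASS.


CLASS zcl_qs_docai_uri_sketch IMPLEMENTATION.
  METHOD if_oo_adt_classrun~main.
    " Hypothetical bucket and folder names, used only for illustration.
    DATA(lv_source_bucket) = `my-invoices-source`.
    DATA(lv_source_folder) = `incoming`.
    DATA(lv_target_bucket) = `my-invoices-target`.

    " A Cloud Storage URI is gs://, then the bucket name, then an
    " optional object prefix (folder).
    DATA(lv_source_uri) = |gs://{ lv_source_bucket }/{ lv_source_folder }/|.
    DATA(lv_target_uri) = |gs://{ lv_target_bucket }/|.

    out->write( |SOURCE_BUCKET_URI: { lv_source_uri }| ).
    out->write( |TARGET_BUCKET_URI: { lv_target_uri }| ).
  ENDMETHOD.
ENDCLASS.
```

The printed values have the form expected by the gcs_uri_prefix and gcs_uri fields that the quickstart class populates.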

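The PROJECT_ID, LOCATION_ID, and PROCESSOR_ID values passed to batch_process_processors correspond to the segments of the processor's resource name in the Document AI REST API. The following sketch shows how the three identifiers fit together. It only builds a string and doesn't call the API. The identifier values and the class name ZCL_QS_DOCAI_NAME_SKETCH are hypothetical:

```abap
CLASS zcl_qs_docai_name_sketch DEFINITION
  PUBLIC FINAL
  CREATE PUBLIC.

  PUBLIC SECTION.
    INTERFACES if_oo_adt_classrun.
ENDCLASS.


CLASS zcl_qs_docai_name_sketch IMPLEMENTATION.
  METHOD if_oo_adt_classrun~main.
    " Hypothetical identifiers; replace them with your own values.
    DATA(lv_project_id)   = `my-project`.
    DATA(lv_location_id)  = `us`.
    DATA(lv_processor_id) = `1234567890abcdef`.

    " Resource name pattern:
    " projects/PROJECT_ID/locations/LOCATION_ID/processors/PROCESSOR_ID
    DATA(lv_processor_name) = |projects/{ lv_project_id }| &&
                              |/locations/{ lv_location_id }| &&
                              |/processors/{ lv_processor_id }|.

    out->write( lv_processor_name ).
  ENDMETHOD.
ENDCLASS.
```

Comparing the printed name with the processor details shown in the Google Cloud console is a quick way to check the three values before running the batch request.
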
What's next