Business insights in SAP with BigQuery ML

This document describes a reference architecture for using BigQuery ML to build a machine learning solution in your SAP environment by using the on-premises or any cloud edition of ABAP SDK for Google Cloud. With this reference architecture, you can build solutions that extract data-driven insights to accelerate your business decision-making processes within SAP.

The intended audience for this document includes ABAP developers, SAP solution architects, and cloud architects. The document assumes that you're familiar with BigQuery and SQL.

Architecture

The following diagram shows a reference architecture for a machine learning solution that provides business insights based on the data from an SAP ERP system:

Machine learning solution for business insights in SAP

This machine learning solution architecture includes the following components:

Component | Subsystem | Details
1 | Data source | An SAP ERP system, such as SAP S/4HANA, which is the source of enterprise data.
2 | Data warehouse | BigQuery, which stores raw data from source systems and ML-generated insights.
3 | Machine learning: Training | Machine learning models, which are trained on source data to generate business insights.
4 | Machine learning: Serve insights | Components that serve ML-generated business insights to SAP for consumption.

Products used

This reference architecture uses the following Google Cloud products:

  • ABAP SDK for Google Cloud: Helps you develop ABAP applications that connect your SAP systems to Google Cloud services, such as BigQuery.
  • BigQuery: Helps you manage and analyze your enterprise data.

Use cases

BigQuery ML empowers SAP users to derive valuable insights for improved decision-making and business outcomes by using the enterprise data stored in BigQuery. This section provides examples of use cases for which you can use BigQuery ML to build machine learning solutions.

Forecasting

Forecasting in BigQuery ML involves analyzing historical time series data to predict future trends. For example, you can analyze historical sales data from several store locations to predict future sales at those locations. You can perform forecasting by using the ML.FORECAST function with the ARIMA_PLUS or ARIMA_PLUS_XREG model.
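For example, the following GoogleSQL sketch trains a forecasting model on historical sales data; the dataset, table, and column names are hypothetical placeholders for your own data. The time_series_id_col option lets a single model forecast each store location separately:

    CREATE OR REPLACE MODEL `my_dataset.store_sales_forecast`
      OPTIONS (
        model_type = 'ARIMA_PLUS',
        time_series_timestamp_col = 'sales_date',
        time_series_data_col = 'sales_amount',
        time_series_id_col = 'store_id'
      ) AS
    SELECT
      sales_date,
      sales_amount,
      store_id
    FROM
      `my_dataset.historical_store_sales`;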

Anomaly detection

Anomaly detection finds unusual patterns in data, enabling a proactive response to potential problems or opportunities. With labeled data, you can use various supervised machine learning models for this task, such as linear or logistic regression, boosted trees, random forests, deep neural networks, wide-and-deep models, or AutoML models.
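As a minimal sketch, assuming a table of labeled records with an is_anomaly label column (all dataset, table, and column names here are hypothetical), you could train a boosted tree classifier and then score new records by using the ML.PREDICT function:

    CREATE OR REPLACE MODEL `my_dataset.anomaly_classifier`
      OPTIONS (
        model_type = 'BOOSTED_TREE_CLASSIFIER',
        input_label_cols = ['is_anomaly']
      ) AS
    SELECT
      amount,
      vendor_id,
      payment_terms,
      is_anomaly
    FROM
      `my_dataset.labeled_transactions`;

    SELECT
      *
    FROM
      ML.PREDICT(MODEL `my_dataset.anomaly_classifier`,
                 (SELECT amount, vendor_id, payment_terms
                  FROM `my_dataset.new_transactions`));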

Recommendation systems

Recommendation systems are powerful tools for businesses, helping users discover relevant content within vast collections. These systems use machine learning to suggest items that users might not find on their own, overcoming the limitations of search. Recommendation systems primarily employ two approaches:

  • Content-based filtering: Recommends items that are similar to items that a user has previously liked or interacted with.
  • Collaborative filtering: Recommends items based on the preferences of users with similar tastes.

In BigQuery ML, you can build a collaborative filtering recommender by training a matrix factorization model, as shown in the following sketch.
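This minimal sketch assumes an implicit-feedback scenario in which purchase counts stand in for ratings; the dataset, table, and column names are hypothetical. After training, the ML.RECOMMEND function returns a predicted rating for every user-item pair:

    CREATE OR REPLACE MODEL `my_dataset.product_recommender`
      OPTIONS (
        model_type = 'MATRIX_FACTORIZATION',
        feedback_type = 'IMPLICIT',
        user_col = 'customer_id',
        item_col = 'product_id',
        rating_col = 'purchase_count'
      ) AS
    SELECT
      customer_id,
      product_id,
      purchase_count
    FROM
      `my_dataset.sales_history`;

    SELECT
      *
    FROM
      ML.RECOMMEND(MODEL `my_dataset.product_recommender`);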

Design considerations

This section provides guidance to help you use this reference architecture to develop architectures that meet your specific requirements for security, privacy, compliance, cost, and performance.

Security, privacy, and compliance

You can use BigQuery features to implement data governance policies. Data governance manages the security and quality of data throughout its lifecycle, to ensure that access and accuracy comply with policies and regulations. For more information, see Introduction to data governance in BigQuery.
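For example, you can manage access at the dataset level directly in GoogleSQL by using data control language (DCL) statements; the project, dataset, and principal in the following sketch are hypothetical:

    -- Grant read-only access to a BigQuery dataset
    GRANT `roles/bigquery.dataViewer`
    ON SCHEMA `my_project.my_dataset`
    TO "user:sap-analyst@example.com";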

Cost and performance

For an estimate of the cost of the Google Cloud resources that the data warehouse with BigQuery solution uses, see the precalculated estimate in the Google Cloud Pricing Calculator.

As you begin to scale with BigQuery, you have a number of ways available to help improve your query performance and to reduce your total spend. These methods include changing how your data is physically stored, modifying your SQL queries, and using slot reservations for predictable cost and performance. For more information about ways to help scale and run your data warehouse, see Introduction to optimizing query performance.
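For example, partitioning and clustering a large table can reduce the amount of data that each query scans. The following sketch uses hypothetical names and assumes a TIMESTAMP column named transaction_date:

    -- Partition by day and cluster by store to prune scanned data
    CREATE TABLE `my_dataset.sales_optimized`
    PARTITION BY DATE(transaction_date)
    CLUSTER BY store_id AS
    SELECT
      *
    FROM
      `my_dataset.sales_raw`;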

Design alternative

While this document focuses on the on-premises or any cloud edition of ABAP SDK for Google Cloud, you can achieve similar results by using the SAP BTP edition of ABAP SDK for Google Cloud. You can adapt the provided resources to build similar solutions within your SAP BTP environment.

Deployment

This section shows you how to load data from your source SAP system into BigQuery, create a machine learning model suitable for your business requirements, and finally generate insights that can be used for business decisions in SAP.

Before you begin

Before implementing a solution based on this reference architecture, make sure that you have completed the following prerequisites:

  • You have installed the on-premises or any cloud edition of ABAP SDK for Google Cloud in your SAP environment.
  • You have a Google Cloud project with the BigQuery API enabled.
  • You have created a BigQuery dataset and a table to store the source SAP data.
  • You have configured authentication for the SDK and have a client key to access BigQuery.

Load data from an SAP source system into BigQuery

To load data from an SAP source system into a BigQuery table by using the ABAP SDK for Google Cloud, you use the INSERT_ALL_TABLEDATA method of the /GOOG/CL_BIGQUERY_V2 class. This data serves as the foundation for training the machine learning models.

The following code sample illustrates how to load data from an SAP source system into a BigQuery table:

TRY.
    "Instantiate the BigQuery API client with the configured client key
    DATA(lo_client) = NEW /goog/cl_bigquery_v2( iv_key_name = 'CLIENT_KEY' ).

    TYPES:
      BEGIN OF t_timeseries,
        date  TYPE string,   "YYYY-MM-DD
        value TYPE string,   "Numeric value
      END OF t_timeseries.

    DATA: lt_data TYPE STANDARD TABLE OF t_timeseries.
    "Developer TODO - populate lt_data

    "Request payload that holds the rows to insert
    DATA: ls_input        TYPE /goog/cl_bigquery_v2=>ty_133.

    "Map each time series record to a row of the insert request
    LOOP AT lt_data INTO FIELD-SYMBOL(<ls_data>).
      APPEND INITIAL LINE TO ls_input-rows ASSIGNING FIELD-SYMBOL(<ls_row>).
      CREATE DATA <ls_row>-json TYPE t_timeseries.
      FIELD-SYMBOLS: <lfs_json> TYPE t_timeseries.
      ASSIGN <ls_row>-json->* TO <lfs_json> CASTING.
      <lfs_json> = <ls_data>.
    ENDLOOP.

    "Call API method: bigquery.tabledata.insertAll
    CALL METHOD lo_client->insert_all_tabledata
      EXPORTING
        iv_p_dataset_id = 'BIGQUERY_DATASET'
        iv_p_project_id = CONV #( lo_client->gv_project_id )
        iv_p_table_id   = 'BIGQUERY_TABLE'
        is_input        = ls_input
      IMPORTING
        es_output       = DATA(ls_output)
        ev_ret_code     = DATA(lv_ret_code)
        ev_err_text     = DATA(lv_err_text).

    lo_client->close( ).

  CATCH /goog/cx_sdk INTO DATA(lo_exception).
    "Handle the SDK exception, for example by surfacing the error text
    MESSAGE lo_exception->get_text( ) TYPE 'E'.
ENDTRY.

Replace the following:

  • CLIENT_KEY: The client key configured for authentication.
  • BIGQUERY_DATASET: The ID of the target BigQuery dataset.
  • BIGQUERY_TABLE: The ID of the target BigQuery table.

Alternatively, if you need a no-code BigQuery data ingestion solution that handles both initial loads and change data capture (CDC), consider BigQuery Connector for SAP, Cloud Data Fusion, or SAP Datasphere.

Create a machine learning model

BigQuery provides a comprehensive list of machine learning models and AI resources for deriving insights from enterprise data. One such model is ARIMA_PLUS, which you can use to forecast future time series values based on historical data. For example, you can use the ARIMA_PLUS model to predict future sales volumes from historical sales data. To train a new model with the SAP data that you have loaded into BigQuery, use the CREATE MODEL statement. Depending on the size of your data, the statement might take from a few minutes to a few hours to run. After the model is created, you can view it in BigQuery Studio.

To create and train your model, run the CREATE MODEL statement:

  1. In the Google Cloud console, go to the BigQuery page, and then click the Compose new query button.

  2. Enter the following GoogleSQL query in the Query editor:

       CREATE OR REPLACE MODEL `MODEL_NAME`
         OPTIONS (
           model_type = 'ARIMA_PLUS',
           time_series_timestamp_col = 'TIME_SERIES_TIMESTAMP_COL',
           time_series_data_col = 'TIME_SERIES_DATA_COL',
           auto_arima = TRUE,
           data_frequency = 'AUTO_FREQUENCY',
           decompose_time_series = TRUE
         ) AS
       SELECT
         TIME_SERIES_TIMESTAMP_COL,
         TIME_SERIES_DATA_COL
       FROM
         `BIGQUERY_TABLE`

    Replace the following:

    • MODEL_NAME: The name of the model that you train for generating insights.
    • TIME_SERIES_TIMESTAMP_COL: The name of the column that provides the time points used in training the model.
    • TIME_SERIES_DATA_COL: The name of the column that contains the data to forecast.
    • BIGQUERY_TABLE: The ID of the BigQuery table that contains the source SAP data.

    For more information about the arguments, see the CREATE MODEL syntax.

  3. Click Run. Your model appears in the navigation panel.

Generate and serve insights

You can generate insights by using your machine learning model. For example, if you created an ARIMA_PLUS model, then you can use it to generate insights for time series values such as future sales predictions.

Generate insights

To generate insights based on a trained model, perform the following steps:

  1. In the Google Cloud console, go to the BigQuery page, and then click the Compose new query button.

  2. Enter the following GoogleSQL query in the Query editor:

    SELECT
      *
    FROM
      ML.FORECAST(MODEL `MODEL_NAME`,
                  STRUCT(HORIZON AS horizon, CONFIDENCE_LEVEL AS confidence_level));
    

    Replace the following:

    • MODEL_NAME: The name of your model that you have trained for generating insights.
    • HORIZON: An INT64 value that specifies the number of time points to forecast.
    • CONFIDENCE_LEVEL: A FLOAT64 value that specifies the percentage of the future values that fall within the prediction interval.

    For more information about the arguments, see The ML.FORECAST function.

  3. Click Run. The ML model provides an output of forecasted values for future dates.

  4. When the query is complete, click the Results tab. The results should look like the following:

    Generate and serve insights by using BigQuery ML

Serve insights

To serve generated predictions for data-driven business decisions from your SAP application by using the ABAP SDK for Google Cloud, you use the QUERY_JOBS method of the /GOOG/CL_BIGQUERY_V2 class.
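For example, your ABAP application could pass a GoogleSQL query such as the following to the QUERY_JOBS method to read back only the forecasted values; the model name and the forecast parameters are hypothetical:

    SELECT
      forecast_timestamp,
      forecast_value,
      prediction_interval_lower_bound,
      prediction_interval_upper_bound
    FROM
      ML.FORECAST(MODEL `my_dataset.store_sales_forecast`,
                  STRUCT(30 AS horizon, 0.9 AS confidence_level));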

To help you deploy the example solution explained in this section with minimal effort, a code sample is available on GitHub. The code sample shows an end-to-end example of generating forecasts of future time series values by using the ARIMA_PLUS model and the on-premises or any cloud edition of ABAP SDK for Google Cloud.

What's next

  • For an overview of BigQuery ML, see Introduction to BigQuery ML.
  • To evaluate the available BigQuery machine learning models for your business process and requirements, see End-to-end user journey for each model.
  • Instead of performing ad hoc queries to get BigQuery ML predictions, you can use Cloud Run to automatically generate insights for new or changed data in BigQuery. The generated insights can be written to a temporary BigQuery table and then retrieved by an SAP application.
  • To extract greater value and insights from your SAP data, you can use the Google Cloud Cortex Data Foundation, which provides a scalable and extensible data foundation, along with prebuilt data models and pipelines for common SAP-related business use cases.
  • To learn more about the Google Cloud console, see Using the Google Cloud console.
  • To learn about ABAP SDK for Google Cloud, see Overview of ABAP SDK for Google Cloud.
  • If you need help resolving problems with the ABAP SDK for Google Cloud, then do the following:
    • Refer to the ABAP SDK for Google Cloud troubleshooting guide.
    • Ask your questions and discuss the ABAP SDK for Google Cloud with the community on Cloud Forums.

Contributors

Author: Ajith Urimajalu | SAP Application Engineer Team Lead

Other contributor: Vikash Kumar | Technical Writer