Configure external datasets for K9

This page describes an optional step in the Cortex Data Foundation deployment: configuring external datasets for K9. Some advanced use cases might require external datasets to complement an enterprise system of record. In addition to external exchanges consumed from Analytics Hub, some datasets might need custom methods to ingest data and join it with the reporting models.

Configure the DAGs by following these steps:

  1. Holiday Calendar: This DAG retrieves special dates from the PyPI holidays package. Adjust the list of countries and years to retrieve holidays for, as well as the DAG parameters, in the file holiday_calendar.ini. If you are using sample data, keep the default values. The sketch below shows how the package can be queried.
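
    The following is a minimal sketch of how the holidays package resolves special dates for a set of countries and years. The country codes and years shown are placeholders for illustration; the actual DAG reads its configuration from holiday_calendar.ini.

      # Minimal sketch only: the deployed DAG reads countries, years, and other
      # parameters from holiday_calendar.ini; the values below are placeholders.
      import holidays

      countries = ["US", "DE"]   # illustrative country codes
      years = [2023, 2024]       # illustrative years

      for country in countries:
          country_holidays = holidays.country_holidays(country, years=years)
          for date, name in sorted(country_holidays.items()):
              print(country, date.isoformat(), name)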

  2. Product Hierarchy Texts: This DAG flattens materials and their product hierarchies. The resulting table can be used to feed the Trends list of terms to retrieve Interest Over Time. Adjust the DAG parameters in the file prod_hierarchy_texts.py. If you are using sample data, keep the default values. Otherwise, adjust the levels of the hierarchy and the language under the markers for ## CORTEX-CUSTOMER:. Use the command grep -R CORTEX-CUSTOMER to find all ## CORTEX-CUSTOMER comments. If your product hierarchy contains more levels, you might need to add an additional SELECT statement similar to the Common Table Expression h1_h2_h3.

    There might be additional customizations depending on the source system. Getting business users or analysts involved early in the process is recommended to help spot these.

  3. Trends: This DAG retrieves Interest Over Time for a specific set of terms from Google Search Trends. The terms can be configured in trends.ini. After an initial run, adjust the timeframe to 'today 7-d' in trends.py. Familiarize yourself with the results coming from the different terms to tune parameters. For large lists of terms, partitioning them across multiple copies of this DAG running at different times is recommended. For more information about the underlying library being used, see Pytrends. The sketch below illustrates the underlying call.
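
    The following is a minimal sketch, not the DAG itself, of how pytrends retrieves Interest Over Time. The terms shown are placeholders, since the deployment reads them from trends.ini.

      # Minimal sketch only: the deployed DAG reads its terms from trends.ini
      # and handles scheduling and writing the results to BigQuery.
      from pytrends.request import TrendReq

      pytrends = TrendReq(hl="en-US", tz=0)
      terms = ["example term A", "example term B"]           # illustrative terms
      pytrends.build_payload(terms, timeframe="today 7-d")   # timeframe after the initial run
      interest_over_time = pytrends.interest_over_time()
      print(interest_over_time.tail())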

  4. Weather: By default, this DAG uses the publicly available test dataset bigquery-public-data.geo_openstreetmap.planet_layers. The query also relies on a NOAA dataset that is only available through Analytics Hub: noaa_global_forecast_system. This dataset needs to be created in the same region as the other datasets prior to executing deployment. If the datasets are not available in your region, you can continue with the following instructions to transfer the data into the chosen region. Skip this configuration if you are using sample data.

    1. Go to BigQuery > Analytics Hub.
    2. Click Search Listings.
    3. Search for "NOAA Global Forecast System".
    4. Click Add dataset to project.
    5. When prompted, keep noaa_global_forecast_system as the name of the dataset. If needed, adjust the name of the dataset and table in the FROM clauses in weather_daily.sql.
    6. Repeat the listing search for the OpenStreetMap Public Dataset.
    7. Adjust the FROM clauses containing bigquery-public-data.geo_openstreetmap.planet_layers in postcode.sql.
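
    Before deploying, you can optionally confirm that the linked dataset landed in the same location as the rest of your deployment. The following is a minimal sketch using the BigQuery Python client; the project ID is a placeholder.

      # Optional pre-deployment check (illustrative): verify that the location of
      # the linked NOAA dataset matches the location of your other Cortex datasets.
      from google.cloud import bigquery

      client = bigquery.Client(project="your-project-id")   # placeholder project ID
      dataset = client.get_dataset("your-project-id.noaa_global_forecast_system")
      print(dataset.dataset_id, dataset.location)
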
  5. Analytics Hub is only supported in the EU and US locations, and some datasets, such as NOAA Global Forecast, are only offered in a single multi-region. If you are targeting a location different from the one available for the required dataset, it's recommended to create a scheduled query to copy the new records from the Analytics Hub linked dataset, followed by a transfer service to copy those new records into a dataset located in the same location or region as the rest of your deployment. You then need to adjust the SQL files accordingly. A sketch of creating such a scheduled query programmatically follows this item.
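
    The following is a minimal sketch of creating such a scheduled query with the BigQuery Data Transfer Service Python client. The project ID, dataset names, table name, and query are placeholders; adapt them to your linked dataset and target region.

      # Illustrative sketch: schedule a daily copy of new records from the
      # Analytics Hub linked dataset into a staging dataset that you control.
      from google.cloud import bigquery_datatransfer

      transfer_client = bigquery_datatransfer.DataTransferServiceClient()
      parent = transfer_client.common_project_path("your-project-id")  # placeholder project ID

      transfer_config = bigquery_datatransfer.TransferConfig(
          destination_dataset_id="noaa_staging",          # placeholder staging dataset
          display_name="Copy new NOAA GFS records",
          data_source_id="scheduled_query",
          params={
              # Placeholder query: select the new records from the linked dataset.
              "query": "SELECT * FROM `your-project-id.noaa_global_forecast_system.your_table`",
              "destination_table_name_template": "noaa_gfs_{run_date}",
              "write_disposition": "WRITE_APPEND",
          },
          schedule="every 24 hours",
      )

      transfer_config = transfer_client.create_transfer_config(
          parent=parent, transfer_config=transfer_config
      )
      print("Created scheduled query:", transfer_config.name)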

  6. Add the required Python modules as dependencies before copying these DAGs to Cloud Composer:

    Required modules:
    pytrends~=4.9.2
    holidays
    
  7. Sustainability and ESG insights: Cortex Framework combines SAP supplier performance data with advanced ESG insights to compare delivery performance, sustainability, and risks more holistically across global operations. For more information, see the Dun & Bradstreet data source.