Upgrade recommendations

This page provides recommendations for upgrading a customized Cortex Data Foundation to new versions. On every release, the Cortex team commits to minimizing disruptions while it adds new features to the Cortex Framework. New updates prioritize backward compatibility. However, this guide helps you minimize possible issues.

Cortex Data Foundation provides a set of predefined content and templates to accelerate value from data replicated into BigQuery. Organizations adapt these templates, modules, SQL, Python scripts, pipelines and other content provided to fit their needs.

Core components

Cortex Data Foundation content is designed with a principle of openness in mind. Organizations can use the tools that work best for them when working with the provided BigQuery data models. The only platform on which the foundation has a tight dependency is BigQuery. All other tools can be interchanged as required:

  • Data Integration: Any integration tool that can interconnect with BigQuery can be used, provided it can replicate raw tables and structures. For example, raw tables should keep the same schema as in SAP (same names, fields, and data types). In addition, the integration tool should be able to provide basic transformation services, such as updating target data types for BigQuery compatibility, as well as adding fields like a timestamp or an operations flag to highlight new and changed records.
  • Data Processing: The Change Data Capture (CDC) processing scripts provided to work with Cloud Composer (or Apache Airflow) are optional, and this submodule can be skipped entirely if another tool achieves the same goal. Conversely, the SQL statements are produced separately from the Airflow-specific files where possible, so that customers can use the separate SQL files in another tool as needed.
  • Data Visualization: While Looker dashboarding templates are provided and contain visualizations and minimal logic, the core logic remains available in the data foundation within BigQuery by design, so that organizations can create visualizations with their reporting tool of choice.

Key benefits

Cortex Data Foundation is designed to be adaptable to various business needs. Its components, such as submodules and SQL views, are built with flexibility, allowing organizations to tailor the platform to their specific requirements and get the following benefits:

  • Openness: Integrates seamlessly with various data integration, processing, and visualization tools beyond BigQuery.
  • Customization: Organizations can modify and expand prebuilt components like SQL views to match their data models and business logic.
  • Performance Optimization: Techniques like partitioning, data quality checks, and clustering can be adjusted based on individual workloads and data volumes.
  • Backward Compatibility: Cortex strives to maintain backward compatibility in future releases, minimizing disruption to existing implementations. For information about version changes, see the Release Notes.
  • Community Contribution: Encourages knowledge sharing and collaboration among users.

Update process

The following sections share instructions for one way in which developers can keep their code up to date with the Cortex Data Foundation repository while retaining their customizations. These instructions assume the use of the pre-delivered deployment scripts in CI/CD pipelines. However, organizations can employ alternative tools and methodologies to suit their preferences, such as Dataform, or automation tools provided by the different Git hosts, such as GitHub Actions.

Set up your repositories

This section outlines one approach to setting up your repositories: forking them. Before following these steps, a solid understanding of Git is recommended.

  1. Fork core repositories: Create forks of the following core Cortex Data Foundation repositories. The forks keep receiving updates from the Google Cloud repositories, while the Company's _main_ lives in a separate set of repositories. For the Data Foundation, use the deployment included there and fork all the submodules that you intend to deploy and customize, such as the following:

    • Cortex Data Foundation
    • Cortex reporting
    • Cortex DAG generator
    • Cortex ML models

    If you fork the Data Foundation parent repository but don't use one of the submodules and choose not to fork it, you can remove it from the submodules list after forking. However, to avoid adjusting this structure each time, it might be more convenient to keep it.
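
    For example, removing a submodule that you choose not to use might look like the following sketch; the path shown is only illustrative:

        ## Remove the submodule entry from the working tree and from .gitmodules, then commit
        git rm src/SAP/SAP_ML_MODELS
        git commit -m "Remove unused submodule"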

  2. Clone Forked Repositories: After you have forked the repositories, clone them into a new folder.
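
    For example, a clone that also pulls the submodules might look like the following; the URL is a placeholder for your fork:

        ## Clone the forked Data Foundation together with its submodules into a new folder
        git clone --recurse-submodules <<fork URL>> cortex-data-foundation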

  3. Create Company Repositories: Establish a new Git host for your company's repositories (for example, Cloud Source). Create repositories with the same names as your forked repositories on the new host. This simplifies the submodule update process.

    The aim of this step is to show an example of a different tool being used as a customer's main repository where developers collaborate to create and adjust models. This tool can be any Git host of choice. This page refers to the repositories created in Cloud Source as the _Company's repositories_.

  4. Initialize Company Repositories: Copy the code from your forked repositories into the newly created company repositories. In the clone of each fork, add the corresponding company repository as a remote (named google in this example) with the following commands, verify the remote has been added, and push all branches to it. This establishes a connection between your forked repositories and your company repositories.

    ## Add the company repository as a remote named "google" and verify it
    git remote add google <<remote URL>>
    git remote -v
    ## Push all branches to the company repository
    git push --all google
    
  5. Verify Repository Setup: Ensure each company repository contains the cloned code, history, and submodule references. You should see two remotes, origin and the one you added, after using the following command:

    git remote -v
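
    The output should look similar to the following; the URLs are placeholders for your fork and your company repository:

        origin  <<fork URL>> (fetch)
        origin  <<fork URL>> (push)
        google  <<company repository URL>> (fetch)
        google  <<company repository URL>> (push)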
    

    You now have a set of repositories, the _Company's repositories_, where developers can submit their changes. Developers can now clone and work in branches in the new repositories.

Merge your changes with a new Cortex release

This section describes the process of merging changes from the _Company's repositories_ and changes coming from the Google Cloud repositories.

  1. Update forks: Click Sync fork to update your forks for all parent and submodule repositories with the changes from the Google Cloud repository. For example, assume the following changes were made in the _Company's repositories_, and Google Cloud has published other changes to the Data Foundation repository in a new release:

    • Created and incorporated the use of a new view in SQL
    • Modified existing views
    • Replaced a script entirely with your own logic

    Considering all these changes, start with the submodules you want to update. The following command sequence adds the fork repository on GitHub as an upstream remote to pull the updated release from, and checks out its main branch as github-main. Then, this example checks out the main branch from the _Company's repositories_ in Cloud Source and creates a branch for merging called merging_br.

    ## Add the GitHub fork as a remote and fetch the updated release
    git remote add github <<github fork>>
    git fetch github main
    ## Check out the fork's main branch locally as github-main
    git checkout -b github-main github/main
    ## Return to the Company's main branch and create the merging branch
    git checkout main
    git checkout -b merging_br
    

    There are multiple ways to build this flow. The merging process could also happen in the fork in GitHub, a rebase could replace the merge, and the merging branch could also be sent as a merge request. These variations of the process depend on current organizational policies, depth of changes, and convenience.
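
    For example, a rebase-based variant with the same branch names might look like the following sketch:

        ## Replay the Company's changes on top of the incoming release instead of merging
        git checkout merging_br
        git rebase github-main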

    With this setup in place, you can compare the incoming changes to your local changes. It's recommended to use a comparison tool in a graphical IDE of choice, for example, Visual Studio, to see the changes and choose what gets merged.

    It's recommended to flag customizations with comments that stand out visually, to make the diff process easier.
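
    For example, if you mark customizations with a distinctive marker comment, you can list them quickly before merging. The marker shown is only an illustration; use whatever convention your team adopts:

        ## List all lines carrying the hypothetical customization marker
        grep -rn "CUSTOM:" src/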

  2. Start the merge process: Use the created branch (in this example, the branch called merging_br) to converge all changes and discard files. When ready, you can merge this branch back into main or another branch of your Company's repository to create a merge request. From the merging branch that was checked out from your Company's repository's main branch (git checkout merging_br), merge the incoming changes from the remote fork.

        ## Run `git branch -a` to confirm that github-main (created from the GitHub fork)
        ## exists and that you are currently on merging_br

        git merge github-main

        ## If you don't want the list of commits coming from GitHub in your history,
        ## use `git merge --squash github-main` instead
    

    This command might generate a list of conflicts. Use the graphical IDE comparison to understand the changes and choose between current, incoming, and both. This is where having a comment in the code around customizations becomes handy. You can discard changes altogether, delete files that you don't want to merge at all, and ignore changes to views or scripts that you have already customized.
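
    The following sketch shows how some of these choices can also be made from the command line during the merge; the file paths are placeholders:

        ## Keep your customized version of a conflicted file (the "current" side)
        git checkout --ours <<path to customized view>>
        ## Take the incoming version from the new release (the "incoming" side)
        git checkout --theirs <<path to updated view>>
        ## Remove a file that you don't want to merge at all
        git rm <<path to unwanted file>>
        ## Mark the resolved files as ready to commit
        git add --all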

  3. Merge changes: After you have decided on the changes to apply, check the summary and commit them with the following commands:

        git status
        ## If something doesn't look right, you can use git rm or git restore accordingly
        git add --all  ## Or use git add . or add individual files
        git commit -m "Your commit message"
    

    If you are unsure about any step, see Git Basics: Undoing Things.

  4. Test and deploy: So far you are only merging into a "temporary" branch. It's recommended to run a test deployment from the cloudbuild*.yaml scripts at this point to make sure everything executes as expected. Automated testing can help streamline this process.
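
    For example, a test deployment submitted from the repository root might look like the following sketch; the configuration file name and substitutions are placeholders for the values your deployment requires:

        ## Submit a Cloud Build run using the checked-out merging branch
        gcloud builds submit --config <<cloudbuild file>>.yaml --substitutions <<your substitutions>> .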

    Once this merging branch looks good, you can check out your main target branch and merge the merging_br branch into it, starting with the submodules.
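
    Assuming your target branch is named main and origin points to your Company's repository, that final merge might look like the following:

        ## Bring the reviewed changes into your main branch and publish them
        git checkout main
        git merge merging_br
        git push origin main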

    In the case of submodules, you can keep your own .gitmodules file pointing to your directory structure. Keep this structure the same for the Cortex-delivered submodules to simplify this process. This is what the .gitmodules file looks like as of release v3.0:

        [submodule "src/SAP/SAP_REPORTING"]
            path = src/SAP/SAP_REPORTING
            url = ../cortex-reporting
            branch = main
        [submodule "src/SAP/SAP_ML_MODELS"]
            path = src/SAP/SAP_ML_MODELS
            url = ../cortex-ml-models
            branch = main
        [submodule "src/SAP/SAP_CDC"]
            path = src/SAP/SAP_CDC
            url = ../cortex-dag-generator
            branch = main
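
    If you adjust the .gitmodules file to point at your own repositories, a minimal sketch of propagating that change is:

        ## Propagate URL changes from .gitmodules to the local Git configuration
        git submodule sync --recursive
        git submodule update --init --recursive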
    
  5. Merge into Main Branch: After you have made changes to the submodules and they are stable enough for the Data Foundation to be updated, you can run git submodule update (you might need to add the --remote flag depending on the scenario). You can also test a specific commit or branch of a specific submodule, as shown in the sketch that follows.
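
    A minimal sketch of that update, assuming the submodule paths shown earlier:

        ## Move each submodule to the latest commit on the branch it tracks
        git submodule update --remote
        ## To test a specific commit or branch of one submodule, check it out directly in its path
        cd src/SAP/SAP_REPORTING
        git checkout <<commit or branch>>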

For more information about Git, see Git documentation.