Upgrade recommendations
This page describes recommendations for upgrading a customized Cortex Framework Data Foundation to a new version. With every release, the Cortex team strives to minimize disruption while adding new features to Cortex Framework. New updates prioritize backward compatibility. Even so, this guide helps you minimize possible issues.
Cortex Framework Data Foundation provides a set of predefined content and templates to accelerate value from data replicated into BigQuery. Organizations adapt these templates, modules, SQL, Python scripts, pipelines, and other provided content to fit their needs.
Core components
Cortex Framework Data Foundation content is designed with a principle of openness in mind. Organizations can use the tools that work best for them when working with the provided BigQuery data models. BigQuery is the only platform on which the foundation has a tight dependency. All other tools can be interchanged as required:
- Data integration: Any integration tool that interconnects with BigQuery can be used, provided it can replicate raw tables and structures. For example, raw tables should have the same schema as the tables created in SAP (same names, fields, and data types). In addition, the integration tool should be able to provide basic transformation services, such as updating target data types for BigQuery compatibility, as well as adding fields such as a timestamp or an operation flag to highlight new and changed records.
- Data processing: The change data capture (CDC) processing scripts integrate with Cloud Composer (or Apache Airflow), but this is optional. Conversely, the SQL statements are generated separately from the Airflow-specific files where possible, so that customers can use the standalone SQL files in other tools as needed.
- Data visualization: While Looker dashboard templates with visualizations and minimal logic are provided, the core logic remains in the data foundation within BigQuery by design, so that users can create visualizations with their reporting tool of choice.
Key benefits
Cortex Framework Data Foundation is designed to adapt to various business needs. Its components are built with flexibility, allowing organizations to tailor the platform to their specific requirements and gain the following benefits:
- Openness: Integrates seamlessly with various data integration, processing, and visualization tools beyond BigQuery.
- Customization: Organizations can modify and expand prebuilt components, such as SQL views, to match their data models and business logic.
- Performance optimization: Techniques such as partitioning, data quality checks, and clustering can be adjusted based on individual workloads and data volumes.
- Backward compatibility: Cortex strives to maintain backward compatibility in future releases, minimizing disruption to existing implementations. For information about version changes, see the Release Notes.
- Community contribution: Encourages knowledge sharing and collaboration among users.
Update process
The following sections describe one way in which developers can keep their code up to date with the Cortex Framework Data Foundation repository while retaining their customizations. This approach uses the pre-delivered deployment scripts in CI/CD pipelines. However, organizations can adopt alternative tools and methodologies to suit their preferences, such as Dataform, or automation tools provided by the different Git hosts, such as GitHub Actions.
Set up your repository
This section outlines one approach to setting up your repository. A solid understanding of Git is recommended before following these steps.
1. Fork the core repository: Create a fork of the Cortex Framework Data Foundation repository. The fork keeps receiving updates from the Google Cloud repository, while you maintain a separate repository as the company's main.
2. Create the company repository: Establish a new Git host for your company's repository (for example, Cloud Source). On the new host, create a repository with the same name as your forked repository.
3. Initialize the company repository: Copy the code from your forked repository into the newly created company repository. Add the original forked repository as an upstream remote with the following commands, and verify that the remote has been added. This establishes a connection between your company repository and the original repository.
git remote add google <<remote URL>>
git remote -v
git push --all google
4. Verify the repository setup: Ensure your company repository contains the cloned code and history. After running the following command, you should see two remotes: origin and the one you added.

git remote -v
You now have the company's repository, where developers can submit their changes. Developers can now clone it and work in branches in the new repository.
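As a sketch of what that developer workflow can look like, the following self-contained script uses a local bare repository to stand in for the company Git host; the branch name and SQL file are hypothetical, not part of the Cortex deliverables.

```shell
# Sketch only: a local bare repo simulates the company Git host.
set -eu
tmp=$(mktemp -d)

# Stand-in for the company repository on the Git host.
git init -q --bare -b main "$tmp/company.git"

# A developer clones the company repository and works in a branch.
git clone -q "$tmp/company.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git config user.email "dev@example.com"
git config user.name "Dev"

git checkout -q -b add-custom-view          # work in a branch, not on main
echo "SELECT 1 AS one;" > custom_view.sql   # hypothetical customization
git add custom_view.sql
git commit -q -m "Add custom view"
git push -q -u origin add-custom-view       # the change lands as a branch to review
```

The same sequence applies unchanged against a real remote; only the clone URL differs.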
Merge your changes with a new Cortex release
This section describes the process of merging changes from the company's repository with changes coming from the Google Cloud repository.
1. Update your fork: Click Sync fork to update your fork with the changes from the Google Cloud repository. For example, suppose the following changes were made to the company's repository, while Google Cloud made other changes to the Data Foundation repository in a new release:

- Created a new view and incorporated its use in SQL
- Modified existing views
- Replaced a script entirely with your own logic
The following command sequence adds the fork repository as an upstream remote, named github, to pull the updated release from GitHub, and checks out its main branch as github-main. Then, this example checks out the main branch from the company's repository in Google Cloud Source and creates a branch for merging called merging_br.
git remote add github <<github fork>>
git fetch github main
git checkout -b github-main github/main
git checkout main
git checkout -b merging_br
There are multiple ways to build this flow. The merging process could also happen in the fork in GitHub, a rebase could replace the merge, and the merging branch could be sent as a merge request. These variations depend on current organizational policies, the depth of the changes, and convenience.
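The rebase variation mentioned above replays your local commits on top of the new release instead of creating a merge commit. A minimal, self-contained sketch, in which one throwaway repository with hypothetical file contents simulates both the release branch and the company's customization:

```shell
# Sketch only: one throwaway repo simulates both sides of the rebase.
set -eu
tmp=$(mktemp -d)
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > base.txt
git add . && git commit -q -m "Common starting point"

# github-main: the new Cortex release pulled from the fork.
git checkout -q -b github-main
echo "release" > release.txt
git add . && git commit -q -m "New release content"

# main: the company repository with its own customization.
git checkout -q main
echo "custom" > custom.txt
git add . && git commit -q -m "Company customization"

# Rebase variant: replay the customization on top of the release.
git checkout -q -b merging_br
git rebase -q github-main
```

The result is a linear history containing both the release content and the customization, with no merge commit.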
With this setup in place, you can compare the incoming changes to your local changes. It's recommended to use a comparison tool in a graphical IDE of your choice, for example, Visual Studio, to review the changes and choose what gets merged.

Flagging customizations with comments that stand out visually is recommended, to make the diff process easier.
2. Start the merge process: Use the branch you created (in this example, the branch called merging_br) to converge all changes and discard files. When ready, you can merge this branch back into main or another branch of your company's repository to create a merge request. From the merging branch that was checked out from your company repository's main branch (git checkout merging_br), merge the incoming changes from the remote fork.
## git branch -a
## The command shows github-main which was created from the GitHub fork
## You are in merging_br
git merge github-main
## If you don't want a list of the commits coming from GitHub in your history, use `--squash`
This command generates a list of conflicts. Use the graphical IDE's comparison features to understand each change and choose between current, incoming, and both. This is where having comments in the code around customizations becomes handy.
You can discard changes altogether, delete files that you don't want to merge at all, and ignore changes to views or scripts that you have already customized.
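One common case is keeping your customized version of a conflicted file wholesale. The following self-contained sketch reproduces such a conflict in a throwaway repository (the file view.sql and its contents are hypothetical) and resolves it in favor of your branch with git checkout --ours:

```shell
# Sketch only: a throwaway repo reproduces a conflict on a customized file.
set -eu
tmp=$(mktemp -d)
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'original logic\n' > view.sql
git add . && git commit -q -m "Base view"

# Upstream (github-main) rewrites the view in the new release.
git checkout -q -b github-main
printf 'upstream rewrite\n' > view.sql
git commit -q -am "Upstream change"

# The company repository customized the same view.
git checkout -q main
printf 'our customization\n' > view.sql
git commit -q -am "Our customization"

git checkout -q -b merging_br
git merge github-main || true    # the conflict on view.sql is expected here
git checkout --ours view.sql     # keep the customized version as-is
git add view.sql
git commit -q -m "Merge release, keeping customized view"
```

The mirror-image command, git checkout --theirs view.sql, would instead take the incoming release version of the file.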
3. Merge the changes: After deciding which changes to apply, review the summary and commit them with the following commands:
git status
## If something doesn't look right, you can use git rm or git restore accordingly
git add --all #Or . or individual files
git commit -m "Your commit message"
If you are unsure about any step, see Git Basics - Undoing Things.
4. Test and deploy: So far, you have only merged into a "temporary" branch. It's recommended to run a test deployment from the cloudbuild*.yaml scripts at this point to make sure everything executes as expected. Automated testing can help streamline this process. Once the merging branch looks good, check out your main target branch and merge the merging_br branch into it.
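That final promotion can be sketched in a self-contained throwaway repository (branch contents are hypothetical; --no-ff is one option, shown here to keep an explicit merge commit in the history):

```shell
# Sketch only: simulate merging the validated merging_br branch into main.
set -eu
tmp=$(mktemp -d)
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > base.txt
git add . && git commit -q -m "Company main"

# merging_br holds the already-tested merged release.
git checkout -q -b merging_br
echo "merged release" > release.txt
git add . && git commit -q -m "Merged and tested Cortex release"

# Promote the tested result into main.
git checkout -q main
git merge -q --no-ff merging_br -m "Merge tested Cortex release into main"
```

Against a real company repository, a git push origin main (or a merge request, depending on policy) would follow.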
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated (UTC): 2025-08-18.