Step 3: Determine integration mechanism
=======================================

*Last updated (UTC): 2025-08-18.*

This page describes the third step to deploy Cortex Framework Data Foundation,
the core of Cortex Framework. In this step, you configure the integration
with your chosen data source.
**If you are using sample data, skip this step**.

| **Note:** The steps outlined on this page are specifically designed for deploying Cortex Framework Data Foundation from the [official GitHub repository](https://github.com/GoogleCloudPlatform/cortex-data-foundation).

Integration overview
--------------------

Cortex Framework helps you centralize data from various sources and
platforms, creating a single source of truth for your data. Cortex Framework
Data Foundation integrates with each data source in a different way, but most
sources follow a similar procedure:

- **Source to Raw layer:** Ingest data from the data source into the raw dataset using APIs. This is achieved with Dataflow pipelines triggered by Cloud Composer DAGs.
- **Raw layer to CDC layer:** Apply Change Data Capture (CDC) processing to the raw dataset and store the output in the CDC dataset. This is accomplished by Cloud Composer DAGs running BigQuery SQL.
- **CDC layer to Reporting layer:** Create final reporting tables from CDC tables in the Reporting dataset. Depending on configuration, this is accomplished either by creating runtime views on top of CDC tables or by running Cloud Composer DAGs that materialize data into BigQuery tables. For more information about configuration, see [Customizing reporting settings file](/cortex/docs/deployment-step-five#customizing_reporting_settings_file).

The [`config.json`](https://github.com/GoogleCloudPlatform/cortex-data-foundation/blob/main/config/config.json) file configures the settings required to connect to data
sources for transferring data from various workloads.
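To make the raw-to-CDC step above concrete, the following sketch builds the kind of BigQuery `MERGE` statement that a CDC DAG runs against a raw table. The table and column names are hypothetical, and the real statements are generated by Cortex Framework from its own templates; this is only an illustration of the upsert pattern.

```python
# Sketch of the BigQuery MERGE pattern used when promoting raw
# records into a CDC table. Table and column names are hypothetical;
# Cortex Framework generates the actual statements from templates.

def build_cdc_merge_sql(raw_table: str, cdc_table: str,
                        keys: list[str], columns: list[str]) -> str:
    """Builds a MERGE that upserts the latest raw records into the CDC table."""
    on_clause = " AND ".join(f"cdc.{k} = raw.{k}" for k in keys)
    update_set = ", ".join(f"cdc.{c} = raw.{c}" for c in columns)
    insert_cols = ", ".join(keys + columns)
    insert_vals = ", ".join(f"raw.{c}" for c in keys + columns)
    return (
        f"MERGE `{cdc_table}` AS cdc\n"
        f"USING `{raw_table}` AS raw\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {update_set}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )

sql = build_cdc_merge_sql(
    raw_table="my_project.raw.orders",
    cdc_table="my_project.cdc.orders",
    keys=["order_id"],
    columns=["status", "amount"],
)
print(sql)
```

In a Cloud Composer DAG, a statement like this would typically be submitted as a BigQuery job on a schedule, so the CDC dataset always reflects the latest state of each record.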
See the integration
options for each data source in the following resources:

- Operational:
    - [SAP (SAP ECC or SAP S/4HANA)](/cortex/docs/operational-sap)
    - [Salesforce Sales Cloud](/cortex/docs/operational-salesforce)
    - [Oracle EBS](/cortex/docs/operational-oracle-ebs)
- Marketing:
    - [Google Ads](/cortex/docs/marketing-googleads)
    - [Campaign Manager 360 (CM360)](/cortex/docs/marketing-cm360)
    - [TikTok](/cortex/docs/marketing-tiktok)
    - [LiveRamp](/cortex/docs/marketing-liveramp)
    - [Meta (Facebook / Instagram)](/cortex/docs/marketing-meta)
    - [Salesforce Marketing Cloud (SFMC)](/cortex/docs/marketing-salesforce)
    - [YouTube (with DV360)](/cortex/docs/marketing-dv360)
    - [Google Analytics 4](/cortex/docs/marketing-google-analytics)
    - [Cross Media & Product Connected Insights](/cortex/docs/marketing-cross-media)
    - [Cortex for Meridian](/cortex/docs/meridian)
- Sustainability:
    - [Dun & Bradstreet](/cortex/docs/dun-and-bradstreet)

For more information about the **Entity-Relationship Diagrams** that each
data source supports, see the [`docs`](https://github.com/GoogleCloudPlatform/cortex-data-foundation/tree/main/docs) folder in the Cortex Framework Data Foundation repository.

K9 deployment
-------------

The [K9 deployer](https://github.com/GoogleCloudPlatform/cortex-data-foundation/tree/main/src/k9)
simplifies the integration of diverse data sources. The K9 deployer
is a predefined dataset within the BigQuery
environment responsible for ingesting, processing, and modeling
components that are reusable across different data sources.

For example, the `time` dimension is reusable across all data sources whose tables
need analytical results based on a Gregorian calendar. The K9
deployer combines external data such as weather or Google Trends with other data sources
(for example, SAP, Salesforce, and Marketing).
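As an illustration of what such a reusable component looks like, the following sketch generates a minimal Gregorian calendar time dimension: one row per date, with attributes any workload can join against. The field names here are illustrative assumptions, not the actual K9 schema.

```python
# Minimal sketch of a reusable "time" dimension: one row per calendar
# date with Gregorian attributes that any workload (SAP, Salesforce,
# Marketing, ...) could join against. Field names are illustrative,
# not the actual K9 schema.
from datetime import date, timedelta

def build_time_dimension(start: date, end: date) -> list[dict]:
    """Returns one row per date in [start, end], inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "week_day": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows

dim = build_time_dimension(date(2025, 1, 1), date(2025, 1, 3))
```

Because every workload derives calendar attributes from the same dimension rather than recomputing them, date logic stays consistent across reporting models.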
This enriched dataset enables
deeper insights and more comprehensive analysis.

The following diagram shows the flow of data from different raw sources to
various reporting layers:

**Figure 1**. K9 datasets.

In the diagram, the [source project](/cortex/docs/deployment-step-one#projects)
contains the raw data from the chosen data sources (SAP, Salesforce,
and Marketing), while the [target project](/cortex/docs/deployment-step-one#projects)
contains the processed data derived from the Change Data Capture (CDC) process.

The K9 pre-processing step runs before all workloads start their deployment, so
the reusable models are available during their deployment. This step transforms
data from various sources to create a consistent and reusable dataset.

The K9 post-processing step occurs after all workloads have deployed their
reporting models, to enable cross-workload reporting or to let augmenting models
find their necessary dependencies within each individual reporting dataset.

### Configure the K9 deployment

Configure the Directed Acyclic Graphs (DAGs) and models to be generated
in the [K9 manifest file](https://github.com/GoogleCloudPlatform/cortex-data-foundation/blob/main/src/k9/src/manifest.yaml).

The K9 pre-processing step is important because it ensures that all workloads
within the data pipeline have access to consistently prepared data. This reduces
redundancy and ensures data consistency.

| **Note:** If you are using a deployment framework such as Dataform or [dbt](https://www.getdbt.com/), we recommend porting the K9 pre-processing DAG execution into your scheduler of choice, and ensuring that your reporting views use the pre-processed data generated by K9 (instead of the raw data sources).
| **The Cortex team can't provide
| support for external frameworks**.

For more information about how to configure external datasets for K9, see [Configure external datasets for K9](/cortex/docs/optional-step-external-datasets).

Next steps
----------

After you complete this step, move on to the following deployment steps:

1. [Establish workloads](/cortex/docs/deployment-step-one).
2. [Clone repository](/cortex/docs/deployment-step-two).
3. [Determine integration mechanism](/cortex/docs/deployment-step-three) (this page).
4. [Set up components](/cortex/docs/deployment-step-four).
5. [Configure deployment](/cortex/docs/deployment-step-five).
6. [Execute deployment](/cortex/docs/deployment-step-six).