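Dataflow is a managed service for executing a wide variety of data processing patterns. The documentation on this site shows you how to deploy your batch and streaming data processing pipelines using Dataflow, including directions for using service features.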
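The Apache Beam SDK is an open source programming model that enables you to develop both batch and streaming pipelines. You create your pipelines with an Apache Beam program and then run them on the Dataflow service. The Apache Beam documentation (https://beam.apache.org/documentation/) provides in-depth conceptual information and reference material for the Apache Beam programming model, SDKs, and other runners.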
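To learn basic Apache Beam concepts, see the Tour of Beam (https://tour.beam.apache.org/) and Beam Playground (https://play.beam.apache.org/). The Dataflow Cookbook repository (https://github.com/GoogleCloudPlatform/dataflow-cookbook) also provides ready-to-launch, self-contained pipelines and the most common Dataflow use cases.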
Apache, Apache Beam, Beam, the Beam logo, and the Beam firefly mascot are registered trademarks of The Apache Software Foundation in the United States and/or other countries.
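Use cases

Run HPC highly parallel workloads
With Dataflow, you can run your highly parallel workloads in a single pipeline, improving efficiency and making your workflow easier to manage.
Streaming

Run inference with Dataflow ML
Dataflow ML lets you use Dataflow to deploy and manage complete machine learning (ML) pipelines. Use ML models to do local and remote inference with batch and streaming pipelines. Use data processing tools to prepare your data for model training and to process the results of the models.
ML, Streaming

Create an ecommerce streaming pipeline
Build an end-to-end ecommerce sample application that streams data from a webstore to BigQuery and Bigtable. The sample application illustrates common use cases and best practices for implementing streaming data analytics and real-time artificial intelligence (AI).
Ecommerce, Streaming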
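Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-18 UTC.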