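Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you use open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data.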
Training and tutorials
Run a Spark job on Google Kubernetes Engine
Submit Spark jobs to a running Google Kubernetes Engine cluster from the Dataproc Jobs API.
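As a rough illustration of what a Jobs API submission carries, the sketch below builds the URL and JSON body for a `jobs:submit` REST call without sending anything. It assumes the target is addressed by its Dataproc cluster name; the project, region, cluster, main class, and jar URI are hypothetical placeholders, and authentication and the HTTP call itself are left out.

```python
import json

# Public REST endpoint of the Dataproc API (v1).
DATAPROC_API = "https://dataproc.googleapis.com/v1"


def build_submit_spark_job_request(project_id, region, cluster_name,
                                   main_class, jar_uris, args=()):
    """Return (url, body) for a Dataproc jobs:submit call.

    Nothing is sent over the network; the caller still has to attach
    credentials and POST the body as JSON.
    """
    url = f"{DATAPROC_API}/projects/{project_id}/regions/{region}/jobs:submit"
    body = {
        "job": {
            # Which cluster should run the job.
            "placement": {"clusterName": cluster_name},
            # Spark-specific settings: entry point, jars, and arguments.
            "sparkJob": {
                "mainClass": main_class,
                "jarFileUris": list(jar_uris),
                "args": list(args),
            },
        }
    }
    return url, body


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    url, body = build_submit_spark_job_request(
        project_id="my-project",
        region="us-central1",
        cluster_name="my-gke-cluster",
        main_class="com.example.WordCount",
        jar_uris=["gs://my-bucket/jars/app.jar"],
        args=["gs://my-bucket/input/"],
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Keeping the request builder separate from the transport makes the payload easy to inspect or unit test before any real job is launched.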
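Introduction to Cloud Dataproc: Hadoop and Spark on Google Cloud
This course combines lectures, demos, and hands-on labs in which you create a Dataproc cluster, submit a Spark job, and then shut down the cluster.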
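The first and last steps of that create, run, shut down cycle can be sketched as REST requests against the Dataproc API. This is a minimal sketch under stated assumptions, not the course's lab code: the helper names are invented for illustration, the node counts are arbitrary, and no request is actually sent.

```python
# Public REST endpoint of the Dataproc API (v1).
DATAPROC_API = "https://dataproc.googleapis.com/v1"


def clusters_url(project_id, region):
    """Collection URL for clusters in one project and region."""
    return f"{DATAPROC_API}/projects/{project_id}/regions/{region}/clusters"


def create_cluster_request(project_id, region, cluster_name, num_workers=2):
    """Return (method, url, body) describing a cluster creation call."""
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            # One master and a small worker pool; the sizes are illustrative.
            "masterConfig": {"numInstances": 1},
            "workerConfig": {"numInstances": num_workers},
        },
    }
    return "POST", clusters_url(project_id, region), body


def delete_cluster_request(project_id, region, cluster_name):
    """Return (method, url, body) describing a cluster deletion call."""
    url = f"{clusters_url(project_id, region)}/{cluster_name}"
    return "DELETE", url, None


if __name__ == "__main__":
    # Hypothetical project, region, and cluster name.
    print(create_cluster_request("my-project", "us-central1", "demo-cluster"))
    # ... jobs would be submitted here, while the cluster is running ...
    print(delete_cluster_request("my-project", "us-central1", "demo-cluster"))
```

Deleting the cluster once the job finishes is what stops the cluster from accruing further charges.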
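Machine Learning with Spark on Dataproc
This course combines lectures, demos, and hands-on labs in which you implement logistic regression with the machine learning library of Apache Spark, running on a Dataproc cluster, to build a model from a multivariable dataset.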
Use cases
Workflow scheduling solutions
Schedule workflows on Google Cloud.
Migrate HDFS data from on-premises to Google Cloud
How to move data from an on-premises Hadoop Distributed File System (HDFS) to Google Cloud.
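One common way to copy HDFS data into Cloud Storage is Hadoop's DistCp tool with a `gs://` destination. That is an assumption here, not something this page states, and it requires the Cloud Storage connector on the Hadoop cluster. The sketch below only assembles the command line; the NameNode address, bucket, and paths are hypothetical.

```python
import shlex


def build_distcp_command(namenode, hdfs_path, bucket, prefix=""):
    """Return the argv list for copying an HDFS path to Cloud Storage.

    namenode:  host:port of the source NameNode (hypothetical value below).
    hdfs_path: absolute path on HDFS, must start with "/".
    bucket:    destination Cloud Storage bucket name.
    prefix:    optional object prefix inside the bucket.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute, e.g. /data/logs")
    source = f"hdfs://{namenode}{hdfs_path}"
    # Avoid a trailing slash when no prefix is given.
    destination = f"gs://{bucket}/{prefix.strip('/')}".rstrip("/")
    return ["hadoop", "distcp", source, destination]


if __name__ == "__main__":
    command = build_distcp_command(
        namenode="namenode.example.internal:8020",
        hdfs_path="/data/logs",
        bucket="my-bucket",
        prefix="landing/logs",
    )
    # Print a shell-safe version of the command for review before running it.
    print(shlex.join(command))
```

Building the command as a list rather than a string avoids quoting problems if it is later passed to `subprocess.run`.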
Manage Java and Scala dependencies for Apache Spark
Recommended approaches to including dependencies when you submit a Spark job to a Dataproc cluster.
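Two ways to attach dependencies to a Spark job are shown below as a sketch: listing jar files directly, and naming Maven coordinates through Spark's `spark.jars.packages` property. Which approaches the linked guide actually recommends is not stated on this page, so treat this as an assumption. The helper name and the example coordinates are illustrative, and the returned dictionary is meant to be merged into the `sparkJob` section of a job submission.

```python
def spark_dependency_settings(maven_coordinates=(), jar_uris=()):
    """Return sparkJob fields that declare a job's dependencies.

    maven_coordinates: iterable of "groupId:artifactId:version" strings,
                       resolved by Spark through spark.jars.packages.
    jar_uris:          iterable of jar locations (for example gs:// URIs).
    """
    coordinates = list(maven_coordinates)
    for coordinate in coordinates:
        parts = coordinate.split(":")
        # Spark expects exactly groupId:artifactId:version.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"not a Maven coordinate: {coordinate!r}")

    settings = {}
    jars = list(jar_uris)
    if jars:
        settings["jarFileUris"] = jars
    if coordinates:
        # Spark takes a comma-separated list of coordinates.
        settings["properties"] = {"spark.jars.packages": ",".join(coordinates)}
    return settings


if __name__ == "__main__":
    # Example values are placeholders, not recommendations.
    print(spark_dependency_settings(
        maven_coordinates=["org.apache.spark:spark-avro_2.12:3.5.0"],
        jar_uris=["gs://my-bucket/jars/helpers.jar"],
    ))
```

Validating the coordinates locally catches malformed entries before the job is submitted and fails at dependency resolution.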
Code samples
Python API samples
Call Dataproc APIs from Python.
Java API samples
Call Dataproc APIs from Java.
Node.js API samples
Call Dataproc APIs from Node.js.
Go API samples
Call Dataproc APIs from Go.
Last updated (UTC): 2025-08-29.