Last updated (UTC): 2025-08-27.

# Cloud Profiler

[Cloud Profiler](/profiler/docs) continuously gathers and reports
application CPU usage and memory-allocation information.

**Requirements:**

- Profiler supports only Dataproc Hadoop and Spark job types
  (Spark, PySpark, SparkSql, and SparkR).

- Jobs must run longer than 3 minutes to allow Profiler
  to collect and upload data to your project.

Dataproc recognizes `cloud.profiler.enable` and the other
`cloud.profiler.*` properties (see
[Profiler options](#submit-a-dataproc-job-with-profiler-options)), and then appends
the relevant profiler JVM options to the following configurations:

- Spark: `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions`
- MapReduce: `mapreduce.task.profile` and other `mapreduce.task.profile.*` properties

> **Note:** Overriding Spark or MapReduce properties in your job (for example, by manually constructing a SparkConf and setting `spark.executor.extraJavaOptions`) prevents the profiler options from being set. However, if you provide `extraJavaOptions` using the `gcloud dataproc jobs submit (spark|hadoop) --properties` flag, Dataproc retains and sets the profiler options.

Enable profiling
----------------

Complete the following steps to enable and use the Profiler
on your Dataproc Spark and Hadoop jobs:

1. [Enable the Profiler](/profiler/docs/profiling-java#enabling-profiler).

2. [Create a Dataproc cluster](/dataproc/docs/guides/create-cluster#creating_a_cloud_dataproc_cluster)
   with [service account scopes](/sdk/gcloud/reference/dataproc/clusters/create#--scopes)
   set to `monitoring` to allow the cluster to talk to the profiler service.

3. If you are using a
   [custom VM service account](/dataproc/docs/concepts/configuring-clusters/service-accounts#create_a_cluster_with_a_custom_vm_service_account),
   grant the [Cloud Profiler Agent](/profiler/docs/iam#cloudprofiler.agent) role to the custom VM service account.
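The Cloud Profiler Agent role grant can be done with the gcloud CLI; a minimal sketch, where `example-project` and `dataproc-sa@example-project.iam.gserviceaccount.com` are hypothetical placeholders for your project ID and custom VM service account email:

```shell
# Grant the Cloud Profiler Agent role (roles/cloudprofiler.agent)
# to the custom VM service account.
# Replace the project ID and service account email with your own values.
gcloud projects add-iam-policy-binding example-project \
    --member="serviceAccount:dataproc-sa@example-project.iam.gserviceaccount.com" \
    --role="roles/cloudprofiler.agent"
```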
This
role contains the required profiler service permissions.

### gcloud

```
gcloud dataproc clusters create cluster-name \
    --scopes=cloud-platform \
    --region=region \
    other args ...
```

Submit a Dataproc job with Profiler options
-------------------------------------------

1. [Submit a Dataproc Spark or Hadoop job](/dataproc/docs/guides/submit-job#submitting_a_job) with one or more of the following Profiler options:

   - `cloud.profiler.enable=true`: enables profiling for the job (required).
   - `cloud.profiler.name`: the name under which profiles are listed in the Profiler interface.
   - `cloud.profiler.service.version`: a version string to identify and distinguish profiler results.

### PySpark Example

### Google Cloud CLI

PySpark job submit with profiling example:

```
gcloud dataproc jobs submit pyspark python-job-file \
    --cluster=cluster-name \
    --region=region \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    --  job args
```

Two profiles are created:

1. `profiler_name-driver` to profile Spark driver tasks
2. `profiler_name-executor` to profile Spark executor tasks

For example, if `profiler_name` is "spark_word_count_job", the
`spark_word_count_job-driver` and `spark_word_count_job-executor`
profiles are created.

### Hadoop Example

### gcloud CLI

Hadoop (teragen mapreduce) job submit with profiling example:

```
gcloud dataproc jobs submit hadoop \
    --cluster=cluster-name \
    --region=region \
    --jar=jar-file \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- teragen 100000 gs://bucket-name
```

View profiles
-------------

View profiles in the [Profiler interface](https://console.cloud.google.com/profiler) in the
Google Cloud console.

What's next
-----------

- See the [Monitoring documentation](/monitoring/docs)
- See the [Logging documentation](/logging/docs)
- Explore [Google Cloud Observability](/stackdriver/docs)