[Looking up the details of a job](/dataproc/docs/guides/dataproc-job-output#view_job_output) will show who submitted that job in the `submittedBy`
field. For example, this job output shows that `user@domain` submitted the example
job to a cluster:
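```
...
placement:
  clusterName: cluster-name
  clusterUuid: cluster-uuid
reference:
  jobId: job-uuid
  projectId: project
status:
  state: DONE
  stateStartTime: '2018-11-01T00:53:37.599Z'
statusHistory:
- state: PENDING
  stateStartTime: '2018-11-01T00:33:41.387Z'
- state: SETUP_DONE
  stateStartTime: '2018-11-01T00:33:41.765Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2018-11-01T00:33:42.146Z'
submittedBy: user@domain
```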
#### Task was not acquired

This error indicates that the Dataproc agent on the master node was
unable to acquire the task from the control plane. This often happens due to
out-of-memory (OOM) or network issues. If the job ran successfully in the past
and you have not changed network configuration settings, OOM is the most likely
cause, often the result of submitting many concurrently running jobs or jobs
whose drivers consume significant memory (for example, jobs that load large
datasets into memory).
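To gauge how many jobs are running on the cluster at the same time, you can
list its active jobs. The following is a minimal sketch; `CLUSTER_NAME` and
`REGION` are placeholders for your own values:

```
# List jobs that are still active on the cluster to gauge concurrent load.
gcloud dataproc jobs list \
    --region=REGION \
    --cluster=CLUSTER_NAME \
    --state-filter=active
```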
#### No agent on master node(s) found to be active
This indicates that the Dataproc agent on the master node is not
active and cannot accept new jobs. This often happens due to out-of-memory (OOM)
or network issues, or if the master node VM is unhealthy. If the job ran successfully
previously and you have not changed
network configuration settings, OOM is the most likely cause, which often
results from the submission of many concurrently running jobs or jobs
whose drivers consume significant memory (jobs that load
large datasets into memory).
To help resolve the problem, you can try the following actions (see the
memory triage sketch after this list):

- Restart the job.
- [Connect using SSH to the cluster master node](/dataproc/docs/concepts/accessing/ssh),
  and then determine what job or other resource is using the most memory.
- If you can't sign in to the master node, you can check the
  [serial port (console) logs](/compute/docs/troubleshooting/viewing-serial-port-output).
- [Generate a diagnostic bundle](/dataproc/docs/support/diagnose-clusters),
  which contains the syslog and other data.
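As a starting point for the SSH step, the following is a minimal memory triage
sketch. The `CLUSTER_NAME-m` VM name and the `google-dataproc-agent` service
name are assumptions based on common Dataproc defaults; adjust them for your
cluster:

```
# Connect to the master VM (Dataproc names it CLUSTER_NAME-m by default).
gcloud compute ssh CLUSTER_NAME-m --zone=ZONE

# On the master VM: check overall memory pressure.
free -h

# List the processes consuming the most memory.
ps aux --sort=-%mem | head -n 15

# Check whether the Dataproc agent service is running (assumed service name).
sudo systemctl status google-dataproc-agent
```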
#### Task not found

This error indicates that the cluster was deleted while a job was running.
You can perform the following actions to identify the principal that
performed the deletion and confirm that cluster deletion occurred while a
job was running:
- View the Dataproc audit logs to identify the principal that performed the
  [delete operation](/dataproc/docs/guides/audit-logging#google.cloud.dataproc.v1.ClusterController.DeleteCluster)
  (see the audit-log query sketch after this list).
- Use Logging or the gcloud CLI to check that the YARN application's last
  known state was
  [RUNNING](https://hadoop.apache.org/docs/current/api//org/apache/hadoop/yarn/api/records/YarnApplicationState.html#RUNNING):

  - Use the following filter in Logging:

    ```
    resource.type="cloud_dataproc_cluster"
    resource.labels.cluster_name="CLUSTER_NAME"
    resource.labels.cluster_uuid="CLUSTER_UUID"
    "YARN_APPLICATION_ID State change from"
    ```

  - Run `gcloud dataproc jobs describe job-id --region=REGION`, then check
    `yarnApplications: > STATE` in the output.

- If the principal that deleted the cluster is the
  [Dataproc service agent service account](/dataproc/docs/concepts/iam/dataproc-principals#service_agent_control_plane_identity),
  check if the cluster was configured with an
  [auto-delete duration](/dataproc/docs/concepts/configuring-clusters/scheduled-deletion)
  that is less than the job duration.
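For the audit-log step, the following is a minimal sketch of a
`gcloud logging read` query that surfaces who deleted a cluster. The method
name matches the Dataproc audit log entry for cluster deletion; the
`--limit` value and output format are illustrative choices:

```
# Find recent cluster delete operations and who performed them.
gcloud logging read '
  resource.type="cloud_dataproc_cluster"
  protoPayload.methodName="google.cloud.dataproc.v1.ClusterController.DeleteCluster"
' --limit=10 --format="value(timestamp, protoPayload.authenticationInfo.principalEmail)"
```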
To avoid `Task not found` errors, use automation to make sure that clusters are
not deleted before all running jobs have completed, for example with a guard
script like the sketch that follows.
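As one illustration, a deletion wrapper can refuse to delete the cluster while
jobs are still active. This is a hedged sketch, not a complete solution;
`CLUSTER_NAME` and `REGION` are placeholders, and production automation would
typically also handle retries and timeouts:

```bash
#!/usr/bin/env bash
# Delete the cluster only if no Dataproc jobs on it are still active.
set -euo pipefail

ACTIVE_JOBS=$(gcloud dataproc jobs list \
    --region=REGION \
    --cluster=CLUSTER_NAME \
    --state-filter=active \
    --format="value(reference.jobId)")

if [[ -z "${ACTIVE_JOBS}" ]]; then
  gcloud dataproc clusters delete CLUSTER_NAME --region=REGION --quiet
else
  echo "Not deleting: active jobs remain: ${ACTIVE_JOBS}" >&2
  exit 1
fi
```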
#### No space left on device
Dataproc writes HDFS and scratch
data to disk. This error message indicates that the cluster was created with
insufficient disk space. To analyze and avoid this error:
- Check the cluster primary disk size listed under the **Configuration** tab on
  the **Cluster details** page in the Google Cloud console.
  The recommended minimum disk size is `1000 GB` for clusters using the
  `n1-standard-4` machine type and `2 TB` for clusters using the
  `n1-standard-32` machine type.
- If the cluster disk size is less than the recommended size,
  [recreate the cluster](/dataproc/docs/guides/recreate-cluster)
  with at least the recommended disk size.
- If the disk size is the recommended size or greater, use
  [SSH to connect to the cluster](/dataproc/docs/concepts/accessing/ssh)
  master VM, and then run `df -h` on the master VM to check disk utilization
  and determine whether additional disk space is needed (see the sketch after
  this list).

- [Collect and check disk metrics](/dataproc/docs/guides/dataproc-metrics#custom_metric_collection).
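To make the disk checks concrete, the following is a minimal sketch. The
`CLUSTER_NAME-m` VM name and the `/hadoop` data path are assumptions based on
common Dataproc defaults:

```
# Check the configured master boot disk size from your workstation.
gcloud dataproc clusters describe CLUSTER_NAME --region=REGION \
    --format="value(config.masterConfig.diskConfig.bootDiskSizeGb)"

# Connect to the master VM (Dataproc names it CLUSTER_NAME-m by default).
gcloud compute ssh CLUSTER_NAME-m --zone=ZONE

# On the master VM: check per-filesystem utilization.
df -h

# Find the largest consumers under /hadoop (assumed HDFS/scratch location).
sudo du -sh /hadoop/* 2>/dev/null | sort -rh | head

# If more space is needed, recreate the cluster with larger boot disks, e.g.:
# gcloud dataproc clusters create CLUSTER_NAME --region=REGION \
#     --master-boot-disk-size=1000GB --worker-boot-disk-size=1000GB
```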
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-28 UTC."],[[["\u003cp\u003eThis page helps users monitor and debug Dataproc jobs, offering guidance on understanding job error messages.\u003c/p\u003e\n"],["\u003cp\u003eUsers can analyze and debug Dataproc jobs using the Google Cloud CLI, Dataproc REST API, and Google Cloud console.\u003c/p\u003e\n"],["\u003cp\u003eJob logs can be accessed in Logging, and the specific logs of the dataproc agent can be selected via the Logs Explorer using specific filters.\u003c/p\u003e\n"],["\u003cp\u003eIf a job fails, users can access job logs in Logging, or use the details of a job to check who submitted it.\u003c/p\u003e\n"],["\u003cp\u003eCommon error messages, such as "Task was not acquired," "No agent on master node(s) found to be active," and "Task not found," are explained, with troubleshooting steps provided for each.\u003c/p\u003e\n"]]],[],null,["# Troubleshoot Dataproc jobs\n\nThis page provides information to help you monitor and debug Dataproc\njobs and understand Dataproc job error messages.\n\nJob monitoring and debugging\n----------------------------\n\nUse the Google Cloud CLI, Dataproc REST API, and Google Cloud console\nto analyze and debug Dataproc jobs. \n\n### gcloud CLI\n\nTo examine a running job's status: \n\n```\ngcloud dataproc jobs describe job-id \\\n --region=region\n```\n\nTo view job driver output, see\n[View job output](/dataproc/docs/guides/dataproc-job-output#view_job_output).\n\n### REST API\n\nCall [jobs.get](/dataproc/docs/reference/rest/v1/projects.regions.jobs/get)\nto examine a job's\n[JobStatus.State](/dataproc/docs/reference/rest/v1/projects.regions.jobs#State),\n[JobStatus.Substate](/dataproc/docs/reference/rest/v1/projects.regions.jobs#substate),\n[JobStatus.details](/dataproc/docs/reference/rest/v1/projects.regions.jobs#jobstatus), and\n[YarnApplication](/dataproc/docs/reference/rest/v1/projects.regions.jobs#YarnApplication)\nfields.\n\n### Console\n\nTo view job driver output, see\n[View job output](/dataproc/docs/guides/dataproc-job-output#view_job_output).\n\nTo view the dataproc agent log in [Logging](https://console.cloud.google.com/logs/query),\nselect **Dataproc Cluster→Cluster Name→Cluster UUID**\nfrom the Logs Explorer cluster selector.\n\nThen use the logs selector to select `google.dataproc.agent` logs.\n\n### View job logs in Logging\n\nIf a job fails, you can\n[access job logs in Logging](/dataproc/docs/guides/logging#accessing_job_logs_in).\n\n### Determine who submitted a job\n\n| **Google Cloud console support** \n| The ability to see who submitted a job in the Google Cloud console will be added in a future Dataproc release.\n\n[Looking up the details of a job](/dataproc/docs/guides/dataproc-job-output#view_job_output) will show who submitted that job in the `submittedBy`\nfield. For example, this job output shows `user@domain` submitted the example\njob to a cluster. 
\n\n```\n...\nplacement:\n clusterName: cluster-name\n clusterUuid: cluster-uuid\nreference:\n jobId: job-uuid\n projectId: project\nstatus:\n state: DONE\n stateStartTime: '2018-11-01T00:53:37.599Z'\nstatusHistory:\n- state: PENDING\n stateStartTime: '2018-11-01T00:33:41.387Z'\n- state: SETUP_DONE\n stateStartTime: '2018-11-01T00:33:41.765Z'\n- details: Agent reported job success\n state: RUNNING\n stateStartTime: '2018-11-01T00:33:42.146Z'\nsubmittedBy: \u003cvar translate=\"no\"\u003euser@domain\u003c/var\u003e\n```\n\n### Error messages\n\n#### Task was not acquired\n\nThis indicates that the Dataproc\nagent on the master node was unable to acquire the task from the control plane.\nThis often happens due to out-of-memory (OOM) or network issues.\nIf the job ran successfully previously and you have no\nt changed network\nconfiguration settings, OOM is the most likely cause, often the\nresult of the submission of many concurrently running jobs or\njobs whose drivers consume significant memory (for example, jobs that\nload large datasets into memory).\n\n#### No agent on master node(s) found to be active\n\nThis indicates that the Dataproc agent on the master node is not\nactive and cannot accept new jobs. This often happens due to out-of-memory (OOM)\nor network issues, or if the master node VM is unhealthy. If the job ran successfully\npreviously and you have not changed\nnetwork configuration settings, OOM is the most likely cause, which often\nresults from the submission of many concurrently running jobs or jobs\nwhose drivers consume significant memory (jobs that load\nlarge datasets into memory).\n\nTo help resolve the problem, you can try the following actions:\n\n- Restart the job.\n- [Connect using SSH to the cluster master node](/dataproc/docs/concepts/accessing/ssh), and then determine what job or other resource is using the most memory.\n- If you can't sign in to the master node, you can check the\n [serial port (console) logs](/compute/docs/troubleshooting/viewing-serial-port-output).\n\n- [Generate a diagnostic bundle](/dataproc/docs/support/diagnose-clusters),\n which contains the syslog and other data.\n\n#### Task not found\n\nThis error indicates that the cluster was deleted while a job was running.\nYou can perform the following actions to identify the principal that\nperformed the deletion and confirm that cluster deletion occurred when a\njob was running:\n\n- View the Dataproc audit logs to identify\n the principal that performed the\n [delete operation](/dataproc/docs/guides/audit-logging#google.cloud.dataproc.v1.ClusterController.DeleteCluster).\n\n- Use Logging or the gcloud CLI to\n check that the YARN application's last known state was\n [RUNNING](https://hadoop.apache.org/docs/current/api//org/apache/hadoop/yarn/api/records/YarnApplicationState.html#RUNNING):\n\n - Use the following filter in Logging:\n\n ```\n resource.type=\"cloud_dataproc_cluster\"\n resource.labels.cluster_name=\"CLUSTER_NAME\"\n resource.labels.cluster_uuid=\"CLUSTER_UUID\"\n \"YARN_APPLICATION_ID State change from\"\n ```\n - Run `gcloud dataproc jobs describe job-id --region=`\u003cvar translate=\"no\"\u003eREGION\u003c/var\u003e, then check `yarnApplications: \u003e `\u003cvar translate=\"no\"\u003eSTATE\u003c/var\u003e in the output.\n- If the principal that deleted the cluster is the\n [Dataproc service agent service account](/dataproc/docs/concepts/iam/dataproc-principals#service_agent_control_plane_identity),\n check if the cluster was configured with an\n [auto-delete 
duration](/dataproc/docs/concepts/configuring-clusters/scheduled-deletion)\n that is less than the job duration.\n\nTo avoid `Task not found` errors, use automation to make sure that clusters are not deleted\nbefore all running jobs have completed.\n\n#### No space left on device\n\nDataproc writes HDFS and scratch\ndata to disk. This error message indicates that the cluster was created with\ninsufficient disk space. To analyze and avoid this error:\n\n- Check the cluster primary disk size listed under the **Configuration** tab on\n the **Cluster details** page in the Google Cloud console.\n The recommended minimum disk size is `1000 GB` for clusters using the\n `n1-standard-4` machine-type and `2 TB` for clusters using the `n1-standard-32`\n machine-type.\n\n- If the cluster disk size is less than the recommended size,\n [recreate the cluster](/dataproc/docs/guides/recreate-cluster)\n with at least the recommended disk size.\n\n- If disk size is the recommended size or greater, use\n [SSH to connect to the cluster](/dataproc/docs/concepts/accessing/ssh)\n master VM, and then run `df -h` on the master VM to check disk utilization\n to determine if additional disk space is needed.\n\n- [Collect and check disk metrics](/dataproc/docs/guides/dataproc-metrics#custom_metric_collection)."]]