To view the dataproc agent log in Logging,
select Dataproc Cluster→Cluster Name→Cluster UUID
from the Logs Explorer cluster selector.
Then use the logs selector to select google.dataproc.agent logs.
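As an alternative to the selectors, a Logs Explorer query along the following lines can scope the view to the agent log. This is a sketch: the log_id expression is an assumption, so adjust it to match the log names actually shown for your cluster.

```
resource.type="cloud_dataproc_cluster"
resource.labels.cluster_name="CLUSTER_NAME"
log_id("google.dataproc.agent")
```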
Determine who submitted a job

Looking up the details of a job shows who submitted that job in the submittedBy field. For example, this job output shows that user@domain submitted the example job to a cluster.
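An abbreviated example of such job output:

```
placement:
  clusterName: cluster-name
  clusterUuid: cluster-uuid
reference:
  jobId: job-uuid
  projectId: project
status:
  state: DONE
  stateStartTime: '2018-11-01T00:53:37.599Z'
statusHistory:
- state: PENDING
  stateStartTime: '2018-11-01T00:33:41.387Z'
- state: SETUP_DONE
  stateStartTime: '2018-11-01T00:33:41.765Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2018-11-01T00:33:42.146Z'
submittedBy: user@domain
```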
Task was not acquired

This indicates that the Dataproc agent on the master node was unable to acquire the task from the control plane.
This often happens due to out-of-memory (OOM) or network issues.
If the job ran successfully before and you have not changed the network configuration settings, OOM is the most likely cause, often the result of submitting many concurrently running jobs or jobs whose drivers consume significant memory (for example, jobs that load large datasets into memory).
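As a first check for the OOM case, the kernel log on the master VM can be scanned for OOM-killer events. This is a minimal sketch, assuming SSH access to the master and a Debian-based image where the kernel log is at /var/log/syslog:

```shell
# Count OOM-killer events in the kernel log; a nonzero count suggests
# memory pressure killed the agent or a job driver.
LOG=/var/log/syslog
if [ -r "$LOG" ]; then
  OOM_COUNT=$(grep -ci -e 'oom-killer' -e 'out of memory' "$LOG" || true)
else
  # Fall back when the log file is absent or unreadable.
  OOM_COUNT=0
  echo "cannot read $LOG; try: sudo dmesg | grep -i oom"
fi
echo "OOM events found: $OOM_COUNT"
```

On the same VM, free -h and ps aux --sort=-%mem help identify which processes are consuming the most memory.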
No agent on master node(s) found to be active

This indicates that the Dataproc agent on the master node is not active and cannot accept new jobs. This often happens due to out-of-memory (OOM)
or network issues, or because the master node VM is unhealthy. If the job ran successfully
before and you have not changed the network configuration settings,
OOM is the most likely cause, often the result of
submitting many concurrently running jobs or jobs
whose drivers consume significant memory (jobs that load
large datasets into memory).
To help resolve the problem, try the following actions:
- Restart the job.
- Connect to the cluster master node using SSH, and then determine which job or other resource is using the most memory.
- If you can't sign in to the master node, check the serial port (console) logs.
- Generate a diagnostic bundle, which contains the syslog and other data.
Task not found

This error indicates that the cluster was deleted while a job was running.
You can take the following actions to identify the principal that
performed the deletion and to confirm that it occurred while a
job was running:
- View the Dataproc audit logs to identify
  the principal that performed the
  delete operation.
- Use Logging or the gcloud CLI to
  check whether the YARN application's last known state was
  RUNNING:
  - Use the following filter in Logging:
resource.type="cloud_dataproc_cluster"
resource.labels.cluster_name="CLUSTER_NAME"
resource.labels.cluster_uuid="CLUSTER_UUID"
"YARN_APPLICATION_ID State change from"
  - Run gcloud dataproc jobs describe job-id --region=REGION, and then check yarnApplications: > STATE in the output.
- If the principal that deleted the cluster is the Dataproc service agent service account, check whether the cluster was configured with an auto-delete duration that is shorter than the job duration.

To avoid Task not found errors, use automation to make sure clusters are not deleted
before all running jobs have completed.
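The gcloud check described above can be scripted. This is a sketch that extracts the YARN application state from captured describe output; the sample below is a stand-in for the real command's output, and in full output you would restrict the match to the yarnApplications block, since status: also contains a state: line.

```shell
# In practice, capture the real output with:
#   gcloud dataproc jobs describe JOB_ID --region=REGION > /tmp/describe.yaml
# Here a stand-in sample is written instead.
cat > /tmp/describe.yaml <<'EOF'
yarnApplications:
- name: example-app
  progress: 1.0
  state: RUNNING
EOF

# Pull the state value(s) listed under yarnApplications.
STATE=$(grep -E '^[[:space:]]+state:' /tmp/describe.yaml | awk '{print $2}')
echo "last known YARN state: $STATE"
```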
No space left on device

Dataproc writes HDFS and scratch data to disk. This error message indicates that the cluster was created with
insufficient disk space. To analyze and avoid this error:
- Check the cluster primary disk size listed under the Configuration tab on the Cluster details page in the Google Cloud console.
  The recommended minimum disk size is 1000 GB for clusters that use the n1-standard-4 machine type and 2 TB for clusters that use the n1-standard-32 machine type.
- If the cluster disk size is less than the recommended size, recreate the cluster with at least the recommended disk size.
- If the disk size is the recommended size or larger, connect
  to the cluster master VM using SSH,
  and then run df -h on the master VM to check disk utilization
  and determine whether more disk space is needed.
- Collect and check disk metrics.
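The df -h inspection in the step above can be reduced to a quick threshold check. This is a sketch using portable POSIX tools; the 90% threshold is an arbitrary choice, not a Dataproc recommendation.

```shell
# List mount points whose utilization exceeds 90% (df -P column 5 is
# the capacity percentage, column 6 the mount point).
FULL=$(df -P | awk 'NR > 1 { gsub("%", "", $5); if ($5 + 0 > 90) print $6 }')
if [ -n "$FULL" ]; then
  echo "filesystems over 90% full: $FULL"
else
  echo "no filesystem over 90% full"
fi
```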
Last updated 2025-08-22 UTC.