Dataproc provides a gcloud CLI dataproc clusters diagnose command to help you
troubleshoot Dataproc on GKE cluster and job issues. The command gathers
cluster-related configuration files, logs, and outputs into an archive file,
and then uploads the archive to the Cloud Storage staging bucket you specified
when you created your Dataproc on GKE cluster.
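For example, the following commands run diagnostics on a cluster and then copy the generated archive from the staging bucket for local inspection. The cluster name, region, and archive URI shown here are placeholders; use the archive URI that the diagnose command prints when it finishes.

# Generate the diagnostic archive for a Dataproc on GKE cluster.
gcloud dataproc clusters diagnose example-gke-cluster --region=us-central1

# Copy the archive from the staging bucket (substitute the URI printed by the
# command), then unpack it locally.
gcloud storage cp gs://example-staging-bucket/diagnostic.tar .
mkdir -p diagnose-output
tar -xf diagnostic.tar -C diagnose-output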
Diagnose archive file
The following tables list metrics and other information included in the
dataproc clusters diagnose
command archive file.
Item | Archive location
GKE node metrics for the nodes where virtual Dataproc on GKE pods run | /system/NODE_NAME.json
Network metrics and file system status of running pods (CPU usage, memory usage, network status, and filesystem status) | /system/POD_NAME.json
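After you unpack the archive, you can pretty-print one of these captured metrics files with a JSON tool such as jq. The diagnose-output directory follows the earlier download example, and NODE_NAME and POD_NAME are placeholders for your own node and pod names.

# Pretty-print the captured metrics for one node and one pod.
jq . diagnose-output/system/NODE_NAME.json
jq . diagnose-output/system/POD_NAME.json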
Item | Archive location
Cluster configmap | /conf/configmap
Kubernetes deployment | /conf/deployment
Role-based access control (RBAC) | /conf/role, /conf/rolebind, /conf/serviceaccount
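To review the captured Kubernetes configuration, you can list and read the archived files directly. The diagnose-output directory follows the earlier example; whether each archive location is a single file or a directory depends on what the command captured for your cluster.

# List every archived configuration file, then view the cluster configmap.
find diagnose-output/conf -type f
cat diagnose-output/conf/configmap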
Logs
Item | Archive location
Agent log | /logs/agent.log
Spark engine log | /logs/sparkengine.log
Spark driver logs for running and completed jobs over the last 24 hours | /logs/DRIVER_ID
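A quick way to triage the captured logs after unpacking the archive is to search them for errors. The directory name follows the earlier example, and the search term is illustrative.

# Search the archived agent and Spark engine logs for errors.
grep -i error diagnose-output/logs/agent.log diagnose-output/logs/sparkengine.log
# Each Spark driver's job log is stored under its driver ID.
ls diagnose-output/logs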
Item | Archive location
JobAttempt object | /jobattempts
Kubernetes Pod object | /pods
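You can browse the captured JobAttempt and Kubernetes Pod objects the same way; whether each location holds one file per object or a single combined file depends on what the command captured.

# List the archived JobAttempt and Kubernetes Pod objects.
ls -R diagnose-output/jobattempts diagnose-output/pods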