Last updated (UTC): 2025-08-27.

# Restartable jobs

By default, Dataproc jobs won't automatically restart on failure.
By using optional settings, you can set jobs to restart on failure. When you set
a job to restart, you specify the maximum number of retries per hour
(max value is 10 retries per hour), the maximum number of total retries
(max value is 240 total retries), or both.

Restarting jobs mitigates common types of job failure, including out-of-memory issues
and unexpected Compute Engine virtual machine reboots. Restartable jobs
are particularly useful for long-running and streaming jobs. For example, you
can restart Spark streaming jobs running on Dataproc clusters to
ensure that the streaming jobs are resilient.

Restartable job semantics
-------------------------

The following semantics apply to reporting the success or failure of jobs:

- A job is reported **successful** if the driver terminates with code `0`.
- A job is reported **failed** if:
  - The driver terminates with a non-zero code more than 4 times in 10 minutes.
  - The driver terminates with a non-zero code, and has exceeded the `max_failures_per_hour` or the `max_failures_total` setting.
- A job will be **restarted** if the driver exits with a non-zero code, is not thrashing, and is within the `max_failures_per_hour` and `max_failures_total` settings.

Job design considerations
-------------------------

- Design your jobs to gracefully handle restarting. For example, if your job writes to a directory, your job should accommodate the possibility that the directory will already exist when the job is restarted.
- Apache Spark streaming jobs that checkpoint can be restarted after failure, but they won't report YARN status.

Create and use restartable jobs
-------------------------------

You can specify the maximum number of times a job can be restarted per hour
and the maximum number of total retries when submitting the job
through the [gcloud CLI](/sdk/gcloud/reference), the
[Dataproc REST API](/dataproc/docs/reference/rest), or the
[Google Cloud console](https://console.cloud.google.com/).

**Example:** If you want to allow your job to retry up to 10 times, but no more than
5 times in one hour, set `max-failures-total` to 10 and `max-failures-per-hour` to 5.
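The first job design consideration above, tolerating a pre-existing output directory, can be sketched in plain Python. This is a minimal illustration, not Dataproc API code; the function name and paths are hypothetical:

```python
import shutil
from pathlib import Path

def prepare_output_dir(path: str) -> Path:
    """Make a job's output directory safe to reuse across restarts.

    If a previous (failed) run left the directory behind, remove the
    stale contents so the restarted job starts from a clean state
    instead of failing because the directory already exists.
    """
    out = Path(path)
    if out.exists():
        shutil.rmtree(out)  # discard partial results from the failed run
    out.mkdir(parents=True)
    return out

# Simulate a failed run followed by a restart (hypothetical path):
first = prepare_output_dir("/tmp/example-job-output")
(first / "part-00000").write_text("partial result")
second = prepare_output_dir("/tmp/example-job-output")  # restart: no error
print(sorted(p.name for p in second.iterdir()))  # → []
```

Whether a restarted job should delete stale output (as here), skip already-written partitions, or overwrite in place depends on the job; the point is that the restart path must be decided deliberately rather than assumed to start from a clean filesystem.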

### gcloud

Specify the maximum number of times a job can be restarted per hour
(the max value is 10 retries per hour), the maximum number of total
retries (max value is 240 total retries), or both, using the
`--max-failures-per-hour` and `--max-failures-total` flags, respectively.

```
gcloud dataproc jobs submit job type \
    --region=region \
    --max-failures-per-hour=number \
    --max-failures-total=number \
    ... other args
```

### REST API

Specify the maximum number of times a job can be restarted per hour
(max value is 10 retries per hour), the maximum number of total
retries (max value is 240 total retries), or both, by setting the
[Job.JobScheduling](/dataproc/docs/reference/rest/v1/JobScheduling)
`maxFailuresPerHour` and/or `maxFailuresTotal` fields, respectively.

**Example**

```
POST /v1/projects/project-id/regions/us-central1/jobs:submit/
{
  "projectId": "project-id",
  "job": {
    "placement": {
      "clusterName": "example-cluster"
    },
    "reference": {
      "jobId": "cea7ae0b...."
    },
    "sparkJob": {
      "args": [
        "1000"
      ],
      "mainClass": "org.apache.spark.examples.SparkPi",
      "jarFileUris": [
        "file:///usr/lib/spark/examples/jars/spark-examples.jar"
      ]
    },
    "scheduling": {
      "maxFailuresPerHour": 5,
      "maxFailuresTotal": 10
    }
  }
}
```

| To examine the JSON body of a Dataproc API request or response, construct the request or select the resource to list from the appropriate Dataproc page of the Google Cloud console, then click **Equivalent REST** at the bottom of the page.

### Console

You can submit restartable jobs by specifying the **max restarts per hour**
on the Dataproc [Submit a job](https://console.cloud.google.com/dataproc/jobs/jobsSubmit)
page (the maximum value is 10 times per hour). The max restarts total
setting isn't available in the Google Cloud console.
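A request body like the one in the REST API example above can also be built programmatically before submission. The following sketch uses only Python's standard library to assemble and validate the `scheduling` fields; the function name and the project and cluster values are placeholders, and the range checks mirror the limits stated on this page (10 per hour, 240 total):

```python
import json

def restartable_job_body(project_id: str, cluster: str,
                         max_per_hour: int, max_total: int) -> str:
    """Build a jobs:submit request body with restart settings.

    Mirrors the REST API example above: maxFailuresPerHour may be at
    most 10 and maxFailuresTotal at most 240.
    """
    if not (0 < max_per_hour <= 10):
        raise ValueError("maxFailuresPerHour must be between 1 and 10")
    if not (0 < max_total <= 240):
        raise ValueError("maxFailuresTotal must be between 1 and 240")
    body = {
        "projectId": project_id,
        "job": {
            "placement": {"clusterName": cluster},
            "sparkJob": {
                "mainClass": "org.apache.spark.examples.SparkPi",
                "jarFileUris": [
                    "file:///usr/lib/spark/examples/jars/spark-examples.jar"
                ],
                "args": ["1000"],
            },
            # Restart settings described on this page.
            "scheduling": {
                "maxFailuresPerHour": max_per_hour,
                "maxFailuresTotal": max_total,
            },
        },
    }
    return json.dumps(body, indent=2)

# Placeholder project and cluster names:
print(restartable_job_body("my-project", "example-cluster", 5, 10))
```

Validating the limits client-side gives a clearer error than a rejected API call; the printed JSON matches the shape of the REST example, minus the optional `reference` block.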