Class SparkJob (2.2.0)

SparkJob(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A Dataproc job for running Apache Spark (http://spark.apache.org/) applications on YARN.

Attributes

Name | Description
main_jar_file_uri str
The HCFS URI of the jar file that contains the main class.
main_class str
The name of the driver's main class. The jar file that contains the class must be in the default CLASSPATH or specified in jar_file_uris.
args Sequence[str]
Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
jar_file_uris Sequence[str]
Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Spark driver and tasks.
file_uris Sequence[str]
Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
archive_uris Sequence[str]
Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
properties Sequence[.gcd_jobs.SparkJob.PropertiesEntry]
Optional. A mapping of property names to values, used to configure Spark. Properties that conflict with values set by the Dataproc API may be overwritten. Can include properties set in /etc/spark/conf/spark-defaults.conf and classes in user code.
logging_config .gcd_jobs.LoggingConfig
Optional. The runtime log config for job execution.
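The attributes above correspond to fields on the SparkJob message in the google.cloud.dataproc_v1 client. The following is a minimal, illustrative sketch of building a SparkJob and submitting it with JobControllerClient.submit_job; the project ID, region, cluster name, jar path, and property values are placeholders, and the properties field is passed as a plain dict of strings. Only one of main_class or main_jar_file_uri should be set for the driver.

from google.cloud import dataproc_v1

# Placeholder values -- substitute your own project, region, and cluster.
project_id = "my-project"
region = "us-central1"
cluster_name = "my-cluster"

# Build the SparkJob message. When main_class is used, the jar containing it
# must already be on the default CLASSPATH or listed in jar_file_uris.
spark_job = dataproc_v1.SparkJob(
    main_class="org.apache.spark.examples.SparkPi",
    jar_file_uris=["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
    args=["1000"],
    properties={"spark.executor.memory": "2g"},
)

# Wrap the SparkJob in a Job and submit it to the cluster via the regional endpoint.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)
job = dataproc_v1.Job(
    placement=dataproc_v1.JobPlacement(cluster_name=cluster_name),
    spark_job=spark_job,
)
submitted = job_client.submit_job(project_id=project_id, region=region, job=job)
print(submitted.reference.job_id)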

Classes

PropertiesEntry

PropertiesEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The abstract base class for a message.

Parameters
Name | Description
kwargs dict

Keys and values corresponding to the fields of the message.

mapping Union[dict, .Message]

A dictionary or message to be used to determine the values for this message.

ignore_unknown_fields Optional[bool]

If True, do not raise errors for unknown fields. Only applied if mapping is a mapping type or there are keyword parameters.
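As a brief illustration of these constructor parameters, a proto-plus message such as SparkJob can be built either from keyword arguments or from a mapping dict, and ignore_unknown_fields only takes effect when the supplied dict or keywords contain keys the message does not define. The field values below are placeholders.

from google.cloud import dataproc_v1

# Equivalent constructions: keyword arguments or a mapping dict.
job_a = dataproc_v1.SparkJob(
    main_class="com.example.Main",
    args=["--input", "gs://bucket/data"],
)
job_b = dataproc_v1.SparkJob(
    mapping={"main_class": "com.example.Main", "args": ["--input", "gs://bucket/data"]}
)
assert job_a == job_b

# With ignore_unknown_fields=True, keys that do not match a SparkJob field
# are silently dropped instead of raising an error.
job_c = dataproc_v1.SparkJob(
    mapping={"main_class": "com.example.Main", "not_a_real_field": 1},
    ignore_unknown_fields=True,
)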