Dataproc optional Solr component
You can install additional components like Solr when you create a Dataproc
cluster using the
Optional components
feature. This page describes the Solr component.
The Apache Solr
component is an open source enterprise search platform. The Solr server and
Web UI are available on port 8983 on the cluster's master node(s).
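As a rough sketch of how the Solr Web UI on port 8983 might be reached from a workstation, the script below assembles an SSH SOCKS-proxy command and a curl check, and prints them without running anything. The cluster name, zone, and local proxy port are placeholders (assumptions). The -m hostname suffix assumes a standard single-master cluster. The /solr/ path is the usual Solr admin location, not something this page states.

```shell
# Placeholder values (assumptions); replace with your own.
CLUSTER_NAME="cluster-name"
ZONE="zone"
SOLR_PORT=8983     # Solr server and Web UI port on the master node
SOCKS_PORT=1080    # arbitrary local port chosen for this sketch

# Assumes a standard (single-master) cluster, whose master node
# is named <cluster-name>-m.
MASTER_HOST="${CLUSTER_NAME}-m"

# SSH to the master node and open a local SOCKS proxy.
TUNNEL_CMD="gcloud compute ssh ${MASTER_HOST} --zone=${ZONE} -- -D ${SOCKS_PORT} -N"

# Solr URL as seen from inside the cluster network.
SOLR_URL="http://${MASTER_HOST}:${SOLR_PORT}/solr/"

# Request the Solr URL through the proxy.
CHECK_CMD="curl --socks5-hostname localhost:${SOCKS_PORT} ${SOLR_URL}"

# Print the commands for review; run them manually in two terminals.
echo "${TUNNEL_CMD}"
echo "${CHECK_CMD}"
```

The Component Gateway is an alternative that avoids the SSH tunnel.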
Persisting Solr files: By default, Solr writes and reads the index and
transaction log files in
HDFS.
To persist Solr files, use a Cloud Storage path as the Solr home
directory by setting the dataproc:solr.gcs.path cluster property when you install the component.
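A minimal sketch of how this property might be supplied at cluster creation. The cluster name, region, and bucket name are placeholders (assumptions), and the script only prints the assembled command so it can be reviewed before being run.

```shell
# Placeholder values (assumptions); replace with your own.
CLUSTER_NAME="cluster-name"
REGION="region"
BUCKET_NAME="bucket-name"

# Cloud Storage path to use as the Solr home directory.
SOLR_GCS_PATH="gs://${BUCKET_NAME}/"

# Assemble the create command: install Solr and point its home
# directory at Cloud Storage instead of the default HDFS location.
CREATE_CMD="gcloud dataproc clusters create ${CLUSTER_NAME} --region=${REGION} --optional-components=SOLR --properties=dataproc:solr.gcs.path=${SOLR_GCS_PATH}"

# Print the command for review; paste it into a shell to run it.
echo "${CREATE_CMD}"
```

Printing the command first keeps the sketch free of side effects, since creating a cluster incurs cost.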
Install the component
Install the component when you create a Dataproc cluster.
Components can be added to clusters created with
Dataproc version 1.3
and later.
gcloud command
To create a Dataproc cluster that includes the Solr component,
use the
gcloud dataproc clusters create cluster-name
command with the --optional-components=SOLR flag. You can also use the optional --properties
flag to set a Cloud Storage path as the Solr home directory.
Add the
--properties="dataproc:solr.gcs.path=gs://bucket-name/"
cluster property to the gcloud dataproc clusters create
command to set a Cloud Storage bucket where Solr documents will be stored
(Solr home directory).
To enable connecting to the Solr Web UI using the Component Gateway, use the --enable-component-gateway flag when you create the cluster, as in the following sample command:

```
gcloud dataproc clusters create cluster-name \
 --region=region \
 --optional-components=SOLR \
 --enable-component-gateway \
 ... other flags
```

REST API
The Solr component can be specified through the Dataproc API using
SoftwareConfig.Component as part of a clusters.create request.
As part of your clusters.create request, you can:
1. Set the EndpointConfig.enableHttpPortAccess property to true to enable connecting to the Solr Web UI using the Component Gateway.
2. Set the dataproc:solr.gcs.path=gs://bucket-name cluster property in the SoftwareConfig.properties field to set a Cloud Storage bucket where Solr documents will be stored (Solr home directory).

Console
1. Enable the component and component gateway.
   - In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
   - In the Components section:
     - Under Optional components, select Solr and other optional components to install on your cluster.
     - Under Component Gateway, select Enable component gateway (see Viewing and Accessing Component Gateway URLs).

See Supported Dataproc versions for the component version included in each Dataproc image release.
Last updated 2025-08-25 UTC.
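The clusters.create request described in the REST API section could look roughly like the sketch below. It builds a request body that selects the Solr component, enables the Component Gateway, and sets the Solr home directory, then prints the target URL and body without sending anything. The project ID, region, cluster name, and bucket name are placeholders (assumptions), and the commented-out curl call is one possible way to send the request, not the only one.

```shell
# Placeholder values (assumptions); replace with your own.
PROJECT_ID="project-id"
REGION="region"
CLUSTER_NAME="cluster-name"
BUCKET_NAME="bucket-name"

# Request body: Solr as an optional component, the Solr home
# directory property, and Component Gateway (HTTP port access).
REQUEST_BODY=$(cat <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["SOLR"],
      "properties": {
        "dataproc:solr.gcs.path": "gs://${BUCKET_NAME}/"
      }
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
EOF
)

# clusters.create endpoint for the chosen project and region.
API_URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Print the request for review instead of sending it.
echo "POST ${API_URL}"
echo "${REQUEST_BODY}"

# To send the request, uncomment the following lines:
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${REQUEST_BODY}" \
#   "${API_URL}"
```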