Migrating Hadoop and Spark Clusters to Google Cloud
Bring your Apache Hadoop and Apache Spark clusters to Google Cloud in a way that works for your company.
Many options for many scenarios
Lift and shift Hadoop clusters
Rapidly migrate your existing Hadoop and Spark deployment as is to Google Cloud without re-architecting. Take advantage of Compute Engine, Google Cloud's fast and flexible infrastructure as a service, to provision your ideal Hadoop cluster and keep your existing distribution. Let your Hadoop administrators focus on making clusters useful, not on procuring servers and troubleshooting hardware.
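To make the lift-and-shift path concrete, here is a minimal sketch of provisioning a single worker VM with the google-cloud-compute Python client. The project, zone, machine type, disk size, and image are placeholder choices, and your Hadoop distribution would still be installed on top, for example through a startup script or your vendor's deployment tooling.

from google.cloud import compute_v1

def create_hadoop_worker(project_id: str, zone: str, name: str) -> None:
    """Provision one Compute Engine VM sized as a Hadoop worker (illustrative values)."""
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/n2-standard-8",
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    source_image="projects/debian-cloud/global/images/family/debian-12",
                    disk_size_gb=500,
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    client = compute_v1.InstancesClient()
    operation = client.insert(project=project_id, zone=zone, instance_resource=instance)
    operation.result()  # Block until the VM is provisioned.

In practice you would run this in a loop, or through a template, to stand up the master and worker nodes your distribution expects.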
Optimize for cloud scale and efficiency
Drive down Hadoop costs by migrating to Cloud Dataproc, Google Cloud's managed Hadoop and Spark service. Rethink how you process data in the Hadoop ecosystem: separate storage from compute with Cloud Storage, and adopt on-demand, ephemeral clusters that exist only as long as the jobs they run.
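As a sketch of the ephemeral-cluster pattern, the following Python outline uses the google-cloud-dataproc client to create a cluster, run one PySpark job whose input and output live in Cloud Storage rather than HDFS, and then delete the cluster. The cluster name, machine types, and gs:// paths are hypothetical.

from google.cloud import dataproc_v1

def run_ephemeral_spark_job(project_id: str, region: str, bucket: str) -> None:
    endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)
    job_client = dataproc_v1.JobControllerClient(client_options=endpoint)

    cluster = {
        "project_id": project_id,
        "cluster_name": "ephemeral-etl",
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n2-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n2-standard-4"},
        },
    }
    # Create the cluster only for the duration of the job.
    cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    ).result()

    # The job reads from and writes to Cloud Storage, so no state lives on the cluster.
    job = {
        "placement": {"cluster_name": "ephemeral-etl"},
        "pyspark_job": {"main_python_file_uri": f"gs://{bucket}/jobs/etl.py"},
    }
    job_client.submit_job_as_operation(
        request={"project_id": project_id, "region": region, "job": job}
    ).result()

    # Tear the cluster down; the data outlives it in Cloud Storage.
    cluster_client.delete_cluster(
        request={"project_id": project_id, "region": region, "cluster_name": "ephemeral-etl"}
    ).result()

Because storage and compute are decoupled, you pay for the cluster only while the job runs, and several ephemeral clusters can share the same Cloud Storage data.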
Modernize your data processing pipeline
Reduce Hadoop operational overhead by adopting cloud-managed services that remove complexity from how you process data. For streaming analytics, explore a serverless option like Cloud Dataflow to handle real-time data. For analytics-focused Hadoop use cases built on SQL-compatible tools like Apache Hive, consider BigQuery, Google's enterprise-scale serverless cloud data warehouse.
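For the Hive-to-BigQuery path, here is a minimal sketch using the google-cloud-bigquery client, assuming a hypothetical mydataset.page_views table populated from exported Hive data. A Hive aggregation query typically carries over with little change.

from google.cloud import bigquery

# BigQuery runs the query serverlessly; there is no cluster to size or manage.
client = bigquery.Client()

query = """
    SELECT country, COUNT(*) AS views
    FROM `mydataset.page_views`          -- hypothetical table from exported Hive data
    WHERE event_date >= '2024-01-01'
    GROUP BY country
    ORDER BY views DESC
"""

for row in client.query(query).result():
    print(f"{row.country}: {row.views}")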
Mapping on-premises Hadoop workloads to Google Cloud products
At a glance, the options above map common on-premises workloads to these products:
- Hadoop or Spark clusters migrated as is: Compute Engine
- Managed Hadoop and Spark batch processing: Cloud Dataproc
- HDFS storage: Cloud Storage
- Real-time streaming pipelines: Cloud Dataflow
- SQL analytics on Hive-style data: BigQuery
"With Cloud Dataproc, we implemented autoscaling which enabled us to increase or reduce the clusters easily depending on the size of the project. On top of that, we also used preemptible nodes for parts of the clusters, which helped us with efficiency of costs."
Orit Yaron, VP of Cloud Platform, Outbrain
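The two cost levers Outbrain describes map directly onto Cloud Dataproc cluster configuration. A hedged sketch, extending the cluster dictionary from the ephemeral-cluster example above with an assumed autoscaling policy name and a preemptible secondary worker pool:

# Hypothetical additions to the `cluster` dict defined earlier: attach an
# autoscaling policy (created separately) and add preemptible secondary workers.
cluster["config"]["autoscaling_config"] = {
    "policy_uri": f"projects/{project_id}/regions/{region}/autoscalingPolicies/my-policy"
}
cluster["config"]["secondary_worker_config"] = {
    "num_instances": 4,
    "preemptibility": "PREEMPTIBLE",  # cheaper capacity that Dataproc can lose and replace
}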