BigQuery Engine for Apache Flink overview

BigQuery Engine for Apache Flink is a managed Google Cloud service for running Apache Flink jobs. Because the service manages the infrastructure, you can focus on building streaming applications.

BigQuery Engine for Apache Flink gives you the advantages of Apache Flink, including state management, event-time processing semantics, and exactly-once semantics, while being fully integrated into Google Cloud. The Google Cloud integration provides additional features, such as autoscaling and monitoring, and makes it easier to connect to other Google Cloud services, such as BigQuery and Google Cloud Managed Service for Apache Kafka.
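To make event-time processing concrete, the following is a minimal sketch in plain Python. It is not the Apache Flink API. The function name, the sample keys, and the 10-second window size are assumptions chosen for illustration. The sketch assigns each event to a tumbling window based on the timestamp carried by the event itself, so the result does not depend on arrival order.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size_ms):
    """Count events per (key, window_start) using each event's own timestamp.

    `events` is an iterable of (key, event_time_ms) tuples in arrival order,
    which may differ from event-time order.
    """
    counts = defaultdict(int)
    for key, event_time_ms in events:
        # Align the event timestamp down to the start of its window.
        window_start = event_time_ms - (event_time_ms % window_size_ms)
        counts[(key, window_start)] += 1
    return dict(counts)


# Hypothetical events that arrive out of order: the second is older than the first.
arrivals = [
    ("sensor-a", 12_000),
    ("sensor-a", 3_000),
    ("sensor-b", 4_500),
    ("sensor-a", 14_999),
]
result = tumbling_window_counts(arrivals, window_size_ms=10_000)
```

A real Apache Flink job would also use watermarks to decide when a window is complete. This sketch omits that step and processes a finite list.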

For more information about Apache Flink, see the Apache Flink website.

Advantages of BigQuery Engine for Apache Flink

BigQuery Engine for Apache Flink helps you run secure, scalable Apache Flink pipelines on Google Cloud. BigQuery Engine for Apache Flink offers the following features:

  • Runs open source Apache Flink, which keeps the service compatible with your existing deployments. BigQuery Engine for Apache Flink uses the Apache Flink API and ecosystem.

  • Lets you use open source software in a fully managed environment. You can maintain your open source workflows without having to maintain infrastructure.

  • Automates scaling, upgrades, and maintenance, which reduces operational overhead and lets you focus on your applications.

  • Increases pipeline observability. BigQuery Engine for Apache Flink is fully integrated with Cloud Logging and Cloud Monitoring, letting you monitor your jobs in the Google Cloud console.

  • Simplifies pipeline management by integrating with other Google Cloud services, such as Google Cloud Managed Service for Apache Kafka, BigQuery, Cloud Monitoring, and Cloud Logging.

Managed

BigQuery Engine for Apache Flink is a managed service, which means that Google manages the resources needed to run jobs and deployments. When you run a BigQuery Engine for Apache Flink job, the service allocates the required resources to execute the pipeline. You don't need to provision or manage the compute resources. For more information about costs, see BigQuery Engine for Apache Flink pricing.

Scalable

BigQuery Engine for Apache Flink is designed to support batch and streaming pipelines at large scale. Data is processed in parallel, so the work is distributed across multiple task slots. BigQuery Engine for Apache Flink can autoscale by provisioning extra task slots, or by shutting down task slots when fewer are needed. It also optimizes the work based on the characteristics of the pipeline.
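As a rough illustration of how keyed work can be spread across task slots, the following plain-Python sketch routes each record to a slot by hashing its key. This is an assumption-laden simplification: it uses a CRC32 hash and a modulo, not the key-group assignment that Apache Flink actually performs, and the function names and sample records are hypothetical.

```python
import zlib
from collections import defaultdict


def slot_for_key(key, num_slots):
    """Pick a slot index for a key using a hash that is stable across runs."""
    # zlib.crc32 is deterministic, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_slots


def distribute(records, num_slots):
    """Group (key, value) records by the slot that would process them."""
    slots = defaultdict(list)
    for key, value in records:
        slots[slot_for_key(key, num_slots)].append((key, value))
    return dict(slots)


# Hypothetical input records.
records = [("user-1", 5), ("user-2", 7), ("user-1", 1), ("user-3", 2), ("user-2", 9)]
placed = distribute(records, num_slots=4)
```

Because the slot depends only on the key, every record for a given key lands in the same slot, which is what lets per-key state stay local. Changing `num_slots`, as an autoscaler might, changes the mapping, so a real system has to redistribute state when it rescales.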

Observable

You can monitor the status of your BigQuery Engine for Apache Flink jobs through the monitoring interface in the Google Cloud console. Use the console to track metrics for your running jobs.

How it works

BigQuery Engine for Apache Flink runs your Apache Flink deployments and jobs.

A deployment is a dedicated and isolated environment where your Apache Flink jobs run. The deployment contains the compute resources needed to run jobs and the configurations for managing jobs.

A job runs your Apache Flink pipeline. You can either run jobs within an existing deployment, or you can create an on-demand job.

BigQuery Engine for Apache Flink uploads your executable code and dependencies and creates a deployment or job. It then allocates task slots to run the pipeline. A task slot is a fixed subset of the resources, such as memory, that are available to run tasks.
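In open source Apache Flink, a TaskManager divides its managed memory evenly among its task slots. The following sketch shows that arithmetic. The function name and the memory figures are hypothetical, and how BigQuery Engine for Apache Flink sizes slots internally is not described here.

```python
def managed_memory_per_slot_mb(managed_memory_mb, slots_per_task_manager):
    """Return the share of managed memory that each task slot receives."""
    if slots_per_task_manager <= 0:
        raise ValueError("slots_per_task_manager must be positive")
    # Each slot gets an equal, fixed fraction of the managed memory.
    return managed_memory_mb / slots_per_task_manager


# Hypothetical example: 3072 MB of managed memory split across 3 slots.
per_slot = managed_memory_per_slot_mb(3072, 3)
```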

The following diagram shows a typical ETL and BI solution using BigQuery Engine for Apache Flink and other Google Cloud services:

Diagram of an ETL and BI solution that uses BigQuery Engine for Apache Flink

This diagram shows the following stages:

  1. Google Cloud Managed Service for Apache Kafka ingests data from an external system.
  2. BigQuery Engine for Apache Flink reads the data from Google Cloud Managed Service for Apache Kafka and writes it to BigQuery. During this stage, BigQuery Engine for Apache Flink might transform or aggregate the data.
  3. BigQuery acts as a data warehouse, allowing data analysts to run ad hoc queries on the data.
  4. Looker provides real-time BI insights from the data stored in BigQuery.
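The transform-and-aggregate work in stage 2 can be sketched in plain Python with in-memory stand-ins. This is an illustration under assumptions, not connector code: the message format, the `store_id` and `amount` field names, and the function name are hypothetical. In a real job, an Apache Flink Kafka source and a BigQuery sink would take the place of the input list and the returned rows.

```python
import json
from collections import defaultdict


def transform(raw_messages):
    """Parse JSON messages, skip malformed ones, and total amounts per store."""
    totals = defaultdict(float)
    for raw in raw_messages:
        try:
            record = json.loads(raw)
            store_id = record["store_id"]
            amount = float(record["amount"])
            totals[store_id] += amount
        except (ValueError, KeyError, TypeError):
            # Skip messages that are not valid JSON or lack usable fields.
            continue
    # Emit one row per store, shaped like rows destined for a warehouse table.
    return [
        {"store_id": store_id, "total_amount": total}
        for store_id, total in sorted(totals.items())
    ]


# Hypothetical messages standing in for records read from a Kafka topic.
messages = [
    '{"store_id": "s1", "amount": 10.5}',
    '{"store_id": "s2", "amount": 4}',
    "not json",
    '{"store_id": "s1", "amount": 2.5}',
    '{"amount": 1}',
]
rows = transform(messages)
```

Silently dropping bad records keeps the sketch short. A production pipeline would more likely route them to a separate output for inspection.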

For more information, see Apache Flink architecture and Deployment in the Apache Flink documentation.

What's next