Deploy log streaming from Google Cloud to Splunk

Last reviewed 2023-11-16 UTC

This document describes how you deploy an export mechanism to stream logs from Google Cloud resources to Splunk. It assumes that you've already read the corresponding reference architecture for this use case.

These instructions are intended for operations and security administrators who want to stream logs from Google Cloud to Splunk. You must be familiar with Splunk and the Splunk HTTP Event Collector (HEC) when using these instructions for IT operations or security use cases. Although not required, familiarity with Dataflow pipelines, Pub/Sub, Cloud Logging, Identity and Access Management, and Cloud Storage is useful for this deployment.

To automate the deployment steps in this reference architecture using infrastructure as code (IaC), see the terraform-splunk-log-export GitHub repository.

Architecture

The following diagram shows the reference architecture and demonstrates how log data flows from Google Cloud to Splunk.

Flow of logs from Google Cloud to Splunk.

As shown in the diagram, Cloud Logging collects the logs into an organization-level log sink and sends them to Pub/Sub. The Pub/Sub service creates a single topic and subscription for the logs and forwards the logs to the main Dataflow pipeline. The main Dataflow pipeline is a Pub/Sub to Splunk streaming pipeline that pulls logs from the Pub/Sub subscription and delivers them to Splunk. Parallel to the main Dataflow pipeline, the secondary Dataflow pipeline is a Pub/Sub to Pub/Sub streaming pipeline that replays messages if a delivery fails. At the end of the process, Splunk Enterprise or Splunk Cloud Platform acts as an HEC endpoint and receives the logs for further analysis. For more details, see the Architecture section of the reference architecture.

To deploy this reference architecture, you perform the tasks that are described in the following sections.

Before you begin

Complete the following steps to set up an environment for your Google Cloud to Splunk reference architecture:

Set up a project, enable billing, and activate the APIs

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Enable the Cloud Monitoring, Secret Manager, Compute Engine, Pub/Sub, and Dataflow APIs.

    Enable the APIs
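
    If you prefer to enable these APIs from the command line, you can use gcloud services enable in Cloud Shell. The following command is a sketch that assumes the standard service names for these products:

    gcloud services enable \
    monitoring.googleapis.com \
    secretmanager.googleapis.com \
    compute.googleapis.com \
    pubsub.googleapis.com \
    dataflow.googleapis.com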

Grant IAM roles

In the Google Cloud console, ensure that you have the following Identity and Access Management (IAM) permissions for organization and project resources. For more information, see Granting, changing, and revoking access to resources.

Permissions | Predefined roles | Resource
logging.sinks.create, logging.sinks.get, logging.sinks.update | Logs Configuration Writer (roles/logging.configWriter) | Organization
compute.networks.*, compute.routers.*, compute.firewalls.*, networkservices.* | Compute Network Admin (roles/compute.networkAdmin), Compute Security Admin (roles/compute.securityAdmin) | Project
secretmanager.* | Secret Manager Admin (roles/secretmanager.admin) | Project

If the predefined IAM roles don't include enough permissions for you to perform your duties, create a custom role. A custom role gives you the access that you need, while also helping you to follow the principle of least privilege.
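
For example, a hypothetical custom role that carries only the organization-level log sink permissions from the preceding table could be created as follows. The role ID logSinkManager is illustrative, not a value that this deployment requires:

    # Hypothetical custom role with only the log sink permissions listed above.
    gcloud iam roles create logSinkManager \
    --organization=ORGANIZATION_ID \
    --title="Log Sink Manager" \
    --permissions=logging.sinks.create,logging.sinks.get,logging.sinks.update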

Set up your environment

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  2. Set the project for your active Cloud Shell session:

    gcloud config set project PROJECT_ID
    

    Replace PROJECT_ID with your project ID.
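
    To optionally confirm which project your session uses, you can run:

    gcloud config get-value project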

Set up secure networking

In this step, you set up secure networking before processing and exporting logs to Splunk Enterprise.

  1. Create a VPC network and a subnet:

    gcloud compute networks create NETWORK_NAME --subnet-mode=custom
    gcloud compute networks subnets create SUBNET_NAME \
    --network=NETWORK_NAME \
    --region=REGION \
    --range=192.168.1.0/24
    

    Replace the following:

    • NETWORK_NAME: the name for your network
    • SUBNET_NAME: the name for your subnet
    • REGION: the region that you want to use for this network
  2. Create a firewall rule for Dataflow worker virtual machines (VMs) to communicate with one another:

    gcloud compute firewall-rules create allow-internal-dataflow \
    --network=NETWORK_NAME \
    --action=allow \
    --direction=ingress \
    --target-tags=dataflow \
    --source-tags=dataflow \
    --priority=0 \
    --rules=tcp:12345-12346
    

    This rule allows internal traffic between Dataflow VMs, which use TCP ports 12345-12346. The Dataflow service sets the dataflow tag on the worker VMs.

  3. Create a Cloud NAT gateway:

    gcloud compute routers create nat-router \
    --network=NETWORK_NAME \
    --region=REGION
    
    gcloud compute routers nats create nat-config \
    --router=nat-router \
    --nat-custom-subnet-ip-ranges=SUBNET_NAME \
    --auto-allocate-nat-external-ips \
    --region=REGION
    
  4. Enable Private Google Access on the subnet:

    gcloud compute networks subnets update SUBNET_NAME \
    --enable-private-ip-google-access \
    --region=REGION
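
    To optionally confirm that the Cloud NAT gateway and Private Google Access are configured as expected, you can describe the resources that you just created. The privateIpGoogleAccess field is part of the standard gcloud output for subnets:

    gcloud compute routers nats describe nat-config \
    --router=nat-router \
    --region=REGION

    gcloud compute networks subnets describe SUBNET_NAME \
    --region=REGION \
    --format="value(privateIpGoogleAccess)"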
    

Create a log sink

In this section, you create the organization-wide log sink and its Pub/Sub destination, along with the necessary permissions.

  1. In Cloud Shell, create a Pub/Sub topic and associated subscription as your new log sink destination:

    gcloud pubsub topics create INPUT_TOPIC_NAME
    gcloud pubsub subscriptions create \
    --topic INPUT_TOPIC_NAME INPUT_SUBSCRIPTION_NAME
    

    Replace the following:

    • INPUT_TOPIC_NAME: the name for the Pub/Sub topic to be used as the log sink destination
    • INPUT_SUBSCRIPTION_NAME: the name for the Pub/Sub subscription to the log sink destination
  2. Create the organization log sink:

    gcloud logging sinks create ORGANIZATION_SINK_NAME \
    pubsub.googleapis.com/projects/PROJECT_ID/topics/INPUT_TOPIC_NAME \
    --organization=ORGANIZATION_ID \
    --include-children \
    --log-filter='NOT logName:projects/PROJECT_ID/logs/dataflow.googleapis.com'
    

    Replace the following:

    • ORGANIZATION_SINK_NAME: the name for your organization log sink
    • ORGANIZATION_ID: your organization ID

    The command uses the following flags:

    • The --organization flag specifies that this is an organization-level log sink.
    • The --include-children flag is required and ensures that the organization-level log sink includes all logs across all subfolders and projects.
    • The --log-filter flag specifies the logs to route. In this example, you exclude Dataflow operations logs specifically for the project PROJECT_ID, because the log export Dataflow pipeline generates more logs itself as it processes logs. The filter prevents the pipeline from exporting its own logs, which avoids a potentially exponential cycle.

    The command output includes a service account in the form o#####-####@gcp-sa-logging.iam.gserviceaccount.com. This is the service account for your log sink, which you grant access to the Pub/Sub topic in the next step.

  3. Grant the Pub/Sub Publisher IAM role to the log sink service account on the Pub/Sub topic INPUT_TOPIC_NAME. This role allows the log sink service account to publish messages on the topic.

    gcloud pubsub topics add-iam-policy-binding INPUT_TOPIC_NAME \
    --member=serviceAccount:LOG_SINK_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/pubsub.publisher
    

    Replace LOG_SINK_SERVICE_ACCOUNT with the name of the service account for your log sink. If you don't know this value, see the lookup example that follows this procedure.
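
    If you need to look up the log sink service account, you can read it from the sink's writer identity, which gcloud reports when you describe the sink:

    gcloud logging sinks describe ORGANIZATION_SINK_NAME \
    --organization=ORGANIZATION_ID \
    --format="value(writerIdentity)"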

Create a dead-letter topic

To prevent potential data loss when a message fails to be delivered, create a Pub/Sub dead-letter topic and a corresponding subscription. Failed messages are stored in the dead-letter topic until an operator or site reliability engineer can investigate and correct the failure. For more information, see the Replay failed messages section of the reference architecture.

  • In Cloud Shell, create a Pub/Sub dead-letter topic and subscription to prevent data loss by storing any undeliverable messages:

    gcloud pubsub topics create DEAD_LETTER_TOPIC_NAME
    gcloud pubsub subscriptions create --topic DEAD_LETTER_TOPIC_NAME DEAD_LETTER_SUBSCRIPTION_NAME
    

    Replace the following:

    • DEAD_LETTER_TOPIC_NAME: the name for the Pub/Sub topic that will be the dead-letter topic
    • DEAD_LETTER_SUBSCRIPTION_NAME: the name for the Pub/Sub subscription for the dead-letter topic

Set up a Splunk HEC endpoint

In the following procedures, you set up a Splunk HEC endpoint and store the newly created HEC token as a secret in Secret Manager. When you deploy the Splunk Dataflow pipeline, you need to supply both the endpoint URL and the token.

Configure the Splunk HEC

  1. If you don't already have a Splunk HEC endpoint, see the Splunk documentation to learn how to configure a Splunk HEC. Splunk HEC runs on the Splunk Cloud Platform service or on your own Splunk Enterprise instance.
  2. In Splunk, after you create a Splunk HEC token, copy the token value.
  3. In Cloud Shell, save the Splunk HEC token value in a temporary file named splunk-hec-token-plaintext.txt.
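
    For example, assuming HEC_TOKEN_VALUE is a placeholder for the token value that you copied, you can write it to the file without a trailing newline:

    printf '%s' "HEC_TOKEN_VALUE" > ./splunk-hec-token-plaintext.txt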

Store the Splunk HEC token in Secret Manager

In this step, you create a secret and a single underlying secret version in which to store the Splunk HEC token value.

  1. In Cloud Shell, create a secret to contain your Splunk HEC token:

    gcloud secrets create hec-token \
     --replication-policy="automatic"
    

    For more information on the replication policies for secrets, see Choose a replication policy.

  2. Add the token as a secret version using the contents of the file splunk-hec-token-plaintext.txt:

    gcloud secrets versions add hec-token \
     --data-file="./splunk-hec-token-plaintext.txt"
    
  3. Delete the splunk-hec-token-plaintext.txt file, as it is no longer needed.
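
    For example:

    rm ./splunk-hec-token-plaintext.txt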

Configure the Dataflow pipeline capacity

The following table summarizes the recommended general best practices for configuring the Dataflow pipeline capacity settings:

Setting | General best practice
--worker-machine-type flag | Set to the baseline machine size n1-standard-4 for the best performance to cost ratio.
--max-workers flag | Set to the maximum number of workers needed to handle the expected peak EPS, per your calculations.
parallelism parameter | Set to 2 x vCPUs per worker x the maximum number of workers, to maximize the number of parallel Splunk HEC connections.
batchCount parameter | Set to 10-50 events per request for logs, provided that the maximum buffering delay of two seconds is acceptable.

Remember to use your own unique values and calculations when you deploy this reference architecture in your environment.

  1. Set the values for machine type and machine count. To calculate values that are appropriate for your cloud environment, see the Machine type and Machine count sections of the reference architecture. Then set the corresponding environment variables:

    export DATAFLOW_MACHINE_TYPE=DATAFLOW_MACHINE_TYPE
    export DATAFLOW_MACHINE_COUNT=DATAFLOW_MACHINE_COUNT
    

    Replace DATAFLOW_MACHINE_TYPE and DATAFLOW_MACHINE_COUNT with the machine type and maximum number of workers that you calculated.

  2. Set the values for Dataflow parallelism and batch count. To calculate values that are appropriate for your cloud environment, see the Parallelism and Batch count sections of the reference architecture. Then set the corresponding environment variables:

    export JOB_PARALLELISM=JOB_PARALLELISM
    export JOB_BATCH_COUNT=JOB_BATCH_COUNT
    

    Replace JOB_PARALLELISM and JOB_BATCH_COUNT with the parallelism and batch count values that you calculated. A hypothetical worked example follows this procedure.

For more information on how to calculate Dataflow pipeline capacity parameters, see the Performance and cost optimization design considerations section of the reference architecture.
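
As a hypothetical worked example of the guidance in the preceding table, assume that you choose n1-standard-4 workers (4 vCPUs each) and calculate a maximum of 2 workers; the parallelism formula then gives 2 x 4 x 2 = 16. The following values are illustrative only:

    # Hypothetical example values; calculate your own as described above.
    export DATAFLOW_MACHINE_TYPE="n1-standard-4"   # baseline machine size, 4 vCPUs per worker
    export DATAFLOW_MACHINE_COUNT=2                # maximum number of workers
    export JOB_PARALLELISM=16                      # 2 x 4 vCPUs x 2 workers
    export JOB_BATCH_COUNT=10                      # 10-50 events per request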

Export logs by using the Dataflow pipeline

In this section, you deploy the Dataflow pipeline by completing the steps in the following subsections. The pipeline delivers Google Cloud log messages to the Splunk HEC.

Create a Cloud Storage bucket and Dataflow worker service account

  1. In Cloud Shell, create a new Cloud Storage bucket with a uniform bucket-level access setting:

    gsutil mb -b on gs://PROJECT_ID-dataflow/
    

    The Cloud Storage bucket that you just created is where the Dataflow job stages temporary files.

  2. In Cloud Shell, create a service account for your Dataflow workers:

    gcloud iam service-accounts create WORKER_SERVICE_ACCOUNT \
       --description="Worker service account to run Splunk Dataflow jobs" \
       --display-name="Splunk Dataflow Worker SA"
    

    Replace WORKER_SERVICE_ACCOUNT with the name that you want to use for the Dataflow worker service account.

Grant roles and access to the Dataflow worker service account

In this section, you grant the required roles to the Dataflow worker service account, as shown in the following table.

Role | Path | Purpose
Dataflow Admin | roles/dataflow.admin | Enable the service account to act as a Dataflow admin.
Dataflow Worker | roles/dataflow.worker | Enable the service account to act as a Dataflow worker.
Storage Object Admin | roles/storage.objectAdmin | Enable the service account to access the Cloud Storage bucket that Dataflow uses for staging files.
Pub/Sub Publisher | roles/pubsub.publisher | Enable the service account to publish failed messages to the Pub/Sub dead-letter topic.
Pub/Sub Subscriber | roles/pubsub.subscriber | Enable the service account to consume messages from the input subscription.
Pub/Sub Viewer | roles/pubsub.viewer | Enable the service account to view the input subscription.
Secret Manager Secret Accessor | roles/secretmanager.secretAccessor | Enable the service account to access the secret that contains the Splunk HEC token.

  1. In Cloud Shell, grant the Dataflow worker service account the Dataflow Admin and Dataflow Worker roles that this account needs to execute Dataflow job operations and administration tasks:

    gcloud projects add-iam-policy-binding PROJECT_ID \
       --member="serviceAccount:WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" \
       --role="roles/dataflow.admin"
    
    gcloud projects add-iam-policy-binding PROJECT_ID \
       --member="serviceAccount:WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" \
       --role="roles/dataflow.worker"
    
  2. Grant the Dataflow worker service account access to view and consume messages from the Pub/Sub input subscription:

    gcloud pubsub subscriptions add-iam-policy-binding INPUT_SUBSCRIPTION_NAME \
     --member="serviceAccount:WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" \
     --role="roles/pubsub.subscriber"
    
    gcloud pubsub subscriptions add-iam-policy-binding INPUT_SUBSCRIPTION_NAME \
     --member="serviceAccount:WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" \
     --role="roles/pubsub.viewer"
    
  3. Grant the Dataflow worker service account access to publish any failed messages to the Pub/Sub dead-letter topic:

    gcloud pubsub topics add-iam-policy-binding DEAD_LETTER_TOPIC_NAME \
     --member="serviceAccount:WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" \
     --role="roles/pubsub.publisher"
    
  4. Grant the Dataflow worker service account access to the Splunk HEC token secret in Secret Manager:

    gcloud secrets add-iam-policy-binding hec-token \
    --member="serviceAccount:WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"
    
  5. Grant the Dataflow worker service account read and write access to the Cloud Storage bucket to be used by the Dataflow job for staging files:

    gcloud storage buckets add-iam-policy-binding gs://PROJECT_ID-dataflow/ \
    --member="serviceAccount:WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
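
To optionally review the project-level roles that are now bound to the worker service account, you can filter the project's IAM policy:

    gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" \
    --format="table(bindings.role)"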
    

Deploy the Dataflow pipeline

  1. In Cloud Shell, set the following environment variable for your Splunk HEC URL:

    export SPLUNK_HEC_URL=SPLUNK_HEC_URL
    

    Replace SPLUNK_HEC_URL with your Splunk HEC URL, which uses the form protocol://host[:port], where:

    • protocol is either http or https.
    • host is the fully qualified domain name (FQDN) or IP address of either your Splunk HEC instance, or, if you have multiple HEC instances, the associated HTTP(S) (or DNS-based) load balancer.
    • port is the HEC port number. It is optional, and depends on your Splunk HEC endpoint configuration.

    An example of a valid Splunk HEC URL input is https://splunk-hec.example.com:8088. If you are sending data to HEC on Splunk Cloud Platform, see Send data to HEC on Splunk Cloud to determine the host and port portions of your specific Splunk HEC URL.

    The Splunk HEC URL must not include the HEC endpoint path, for example, /services/collector. The Pub/Sub to Splunk Dataflow template currently only supports the /services/collector endpoint for JSON-formatted events, and it automatically appends that path to your Splunk HEC URL input. To learn more about the HEC endpoint, see the Splunk documentation for services/collector endpoint.

  2. Deploy the Dataflow pipeline using the Pub/Sub to Splunk Dataflow template:
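
    The following command is a minimal sketch of such a deployment, using the environment variables and resources that you created earlier. The template location (gs://dataflow-templates/latest/Cloud_PubSub_to_Splunk) and the template parameter names (inputSubscription, outputDeadletterTopic, url, tokenSource, tokenSecretId, batchCount, parallelism) are assumptions based on the public Pub/Sub to Splunk template; verify them against the template documentation before you run the command:

    # Build the template parameters. The parameter names are assumptions based on
    # the public Pub/Sub to Splunk template; verify them before use.
    PARAMS="inputSubscription=projects/PROJECT_ID/subscriptions/INPUT_SUBSCRIPTION_NAME"
    PARAMS="$PARAMS,outputDeadletterTopic=projects/PROJECT_ID/topics/DEAD_LETTER_TOPIC_NAME"
    PARAMS="$PARAMS,url=$SPLUNK_HEC_URL"
    PARAMS="$PARAMS,tokenSource=SECRET_MANAGER"
    PARAMS="$PARAMS,tokenSecretId=projects/PROJECT_ID/secrets/hec-token/versions/latest"
    PARAMS="$PARAMS,batchCount=$JOB_BATCH_COUNT"
    PARAMS="$PARAMS,parallelism=$JOB_PARALLELISM"

    # Run the Pub/Sub to Splunk Dataflow template on the network and with the
    # worker service account that you configured earlier.
    gcloud dataflow jobs run pubsub-to-splunk-$(date +%Y%m%d-%H%M%S) \
    --gcs-location=gs://dataflow-templates/latest/Cloud_PubSub_to_Splunk \
    --region=REGION \
    --network=NETWORK_NAME \
    --subnetwork=regions/REGION/subnetworks/SUBNET_NAME \
    --disable-public-ips \
    --worker-machine-type=$DATAFLOW_MACHINE_TYPE \
    --max-workers=$DATAFLOW_MACHINE_COUNT \
    --service-account-email=WORKER_SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com \
    --staging-location=gs://PROJECT_ID-dataflow/tmp \
    --parameters="$PARAMS"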