User events ingestion starter guide

This page provides guidance and general best practices for ingesting user events. View the tutorials to get started.

Before you begin

Tutorials

These tutorials show you how to import user events from various sources.

Import events from Cloud Storage

This tutorial shows how to import user events from Cloud Storage.


To follow step-by-step guidance for this task directly in the Cloud Shell Editor, click Guide me.
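
For orientation before you start the tutorial, the following is a minimal sketch of a Cloud Storage import, assuming the google-cloud-retail Python client library. The project, catalog, and bucket paths are placeholders to replace with your own values.

```python
from google.cloud import retail_v2

# Placeholder catalog path: substitute your own project.
PARENT = "projects/my-project/locations/global/catalogs/default_catalog"

client = retail_v2.UserEventServiceClient()

# Import newline-delimited JSON user events staged in Cloud Storage.
operation = client.import_user_events(
    request=retail_v2.ImportUserEventsRequest(
        parent=PARENT,
        input_config=retail_v2.UserEventInputConfig(
            gcs_source=retail_v2.GcsSource(
                input_uris=["gs://my-bucket/user_events.json"],
                data_schema="user_event",
            )
        ),
    )
)

# import_user_events is a long-running operation; result() blocks until it finishes.
print(operation.result())
```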


Import events from BigQuery

This tutorial shows how to import user events from BigQuery.


To follow step-by-step guidance for this task directly in the Cloud Shell Editor, click Guide me.
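
As a quick preview, here is a minimal sketch of a BigQuery import, again assuming the google-cloud-retail Python client. The project, dataset, and table names are hypothetical; the table is assumed to follow the user event schema.

```python
from google.cloud import retail_v2

# Placeholder catalog path: substitute your own project.
PARENT = "projects/my-project/locations/global/catalogs/default_catalog"

client = retail_v2.UserEventServiceClient()

# Import user events from a BigQuery table in the user event schema.
operation = client.import_user_events(
    request=retail_v2.ImportUserEventsRequest(
        parent=PARENT,
        input_config=retail_v2.UserEventInputConfig(
            big_query_source=retail_v2.BigQuerySource(
                project_id="my-project",
                dataset_id="user_events_dataset",
                table_id="events",
                data_schema="user_event",
            )
        ),
    )
)

print(operation.result())  # blocks until the long-running import completes
```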


Import events inline

This tutorial shows how to import user events data inline.


To follow step-by-step guidance for this task directly in the Cloud Shell Editor, click Guide me.
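
For a sense of what inline import looks like, here is a minimal sketch using the google-cloud-retail Python client. The visitor and product IDs are illustrative, and a real import would batch many events per request.

```python
import datetime

from google.cloud import retail_v2

# Placeholder catalog path: substitute your own project.
PARENT = "projects/my-project/locations/global/catalogs/default_catalog"

# One illustrative event embedded directly in the request.
events = [
    retail_v2.UserEvent(
        event_type="detail-page-view",
        visitor_id="visitor-123",
        event_time=datetime.datetime.now(datetime.timezone.utc),
        product_details=[
            retail_v2.ProductDetail(product=retail_v2.Product(id="sku-456"))
        ],
    )
]

client = retail_v2.UserEventServiceClient()
operation = client.import_user_events(
    request=retail_v2.ImportUserEventsRequest(
        parent=PARENT,
        input_config=retail_v2.UserEventInputConfig(
            user_event_inline_source=retail_v2.UserEventInlineSource(
                user_events=events
            )
        ),
    )
)

print(operation.result())
```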


Vertex AI Search for commerce's data ingestion pipeline encompasses both product catalog and user event data. This data stream provides the foundation for model training and continuous evaluation through feedback mechanisms. Keeping ingested data accurate and complete is an ongoing effort that maintains the adaptability of the underlying models and directly influences search result quality and relevance.

User event ingestion general best practices

Consider these data ingestion best practices when architecting your search solution to maximize efficiency and effectiveness.

Ingest user events in Vertex AI Search for commerce

Mirroring the catalog ingestion process, Vertex AI Search for commerce offers two mechanisms for user event data: bulk import and real-time streaming, which provide flexibility for diverse customer backend architectures. Unlike the catalog, however, where a hybrid approach is feasible, a single dedicated ingestion strategy works better for user events. Of the two options, real-time streaming is the standard choice in practical implementations.

Both yield comparable outcomes in model training, KPI measurement, and revenue optimization, but subtle trade-offs exist. Bulk import, for example, might be more efficient for processing large volumes of historical data, while streaming makes new user behavior available to the models with lower latency.

The choice depends on your retail environment requirements, such as:

  • The latency target for incorporating user events into model training.
  • The volume of events being generated.
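
Because real-time streaming is the typical choice, the following minimal sketch shows a single event written with the write user event API, assuming the google-cloud-retail Python client. The catalog path and visitor ID are placeholders.

```python
import datetime

from google.cloud import retail_v2

# Placeholder catalog path: substitute your own project.
PARENT = "projects/my-project/locations/global/catalogs/default_catalog"

client = retail_v2.UserEventServiceClient()

# Stream one event as it happens instead of accumulating it for a bulk import.
written = client.write_user_event(
    request=retail_v2.WriteUserEventRequest(
        parent=PARENT,
        user_event=retail_v2.UserEvent(
            event_type="home-page-view",
            visitor_id="visitor-123",
            event_time=datetime.datetime.now(datetime.timezone.utc),
        ),
    )
)

print(written)  # the service echoes back the event it recorded
```

In practice, a call like this would be made from your event collection backend as each user action occurs.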

Scale user event ingestion in Vertex AI Search for commerce

Prepare for traffic surges and ensure data integrity by proactively planning for scaling scenarios. High-traffic special events like weekend sales can trigger a 10 to 20 times surge in user activity. Ensuring sufficient quotas and enough headroom in your ingestion system to handle such spikes is crucial.

These events often manifest as sudden bursts of traffic, not gradual increases. Missing events during these peak periods can hamper model training, degrade search performance, and skew KPI measurements. Debugging issues can be challenging, because events form the basis for KPI tracking and general troubleshooting. Implementing robust alerting can notify you of deteriorating data quality, which is often a consequence of missing or erroneous event data.

By anticipating these scenarios and taking action, you maintain the reliability and accuracy of your user event data, even under extreme load conditions. Accurate data lays the groundwork for your Vertex AI Search for commerce system to deliver optimized performance, accurate analytics, and a user-centered experience during peak traffic.

Reference architecture for batch events ingestion

Batch ingestion requires a scalable architecture for moving user events into Vertex AI Search for commerce efficiently. The architecture combines Google Cloud services, including Pub/Sub, Dataflow, BigQuery, Workflows, and Cloud Storage, to manage the ingestion process in a staged, controlled manner.

Architectural overview

The batch events ingestion architecture employs a multi-stage approach to ensure the reliable and accurate transfer of user event data into Vertex AI Search for commerce.

Key components include:

  • Pub/Sub: Acts as the initial entry point for user events, providing a scalable and durable messaging system (see the publishing sketch after this list).
  • Dataflow (streaming events): Continuously reads raw user events from Pub/Sub and writes them into BigQuery raw event tables, also capturing any failed events for debugging.
  • BigQuery: Serves as the primary data warehousing solution, storing raw events, transformed events, and various metadata related to the ingestion process.
  • Workflows: Orchestrates the hourly batch processing of raw events, ensuring data integrity and facilitating error handling.
  • Cloud Storage: Provides temporary storage for Dataflow during processing and for archiving failed event logs.
  • Vertex AI Search for commerce: The final destination for user event data, enabling advanced search and recommendation capabilities.
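
To make the entry point concrete, here is a minimal sketch of publishing one user event to a Pub/Sub topic with the google-cloud-pubsub Python client. The project, topic name, and event values are placeholders.

```python
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Placeholder project and topic names.
topic_path = publisher.topic_path("my-project", "raw-user-events")

# One raw event as emitted by the customer source system.
event = {
    "eventType": "detail-page-view",
    "visitorId": "visitor-123",
    "eventTime": "2024-05-01T12:00:00Z",
    "productDetails": [{"product": {"id": "sku-456"}}],
}

# publish() returns a future; result() yields the Pub/Sub message ID.
future = publisher.publish(topic_path, data=json.dumps(event).encode("utf-8"))
print(f"Published message {future.result()}")
```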

Step-by-step data flow

This is the step-by-step data flow for batch events ingestion.

  1. Event streaming and raw data persistence:

    • User events are published to Pub/Sub topics from the customer source system.
    • A Dataflow streaming pipeline (streaming events) continuously reads events from Pub/Sub.
    • Successful raw events are written into BigQuery raw event tables.
    • Any failures during this initial ingestion are captured and stored in separate BigQuery raw failed tables for analysis and troubleshooting.
  2. Hourly batch processing and transformation:

    • Workflows triggers an hourly scheduled Dataflow pipeline (Incr Batch Events).
    • This pipeline reads raw events from BigQuery tables.
    • Events are transformed into the required Vertex AI Search for commerce format.
    • Transformed events are written into BigQuery curated event tables.
    • Any transformation failures are captured in BigQuery curated failed event tables.
  3. Data validation and preparation:

    • Workflows executes a BigQuery stored procedure to create or update a view (Incr Update Event View) that reflects the latest transformed event data from the last hour.
    • The workflow then performs a validation check on this view, ensuring that the number of transformed events falls within predefined thresholds.
  4. Vertex AI Search for commerce import:

    • If the validation is successful, Workflows invokes the Vertex AI Search for commerce import event API, pointing it to the BigQuery view containing the transformed events (see the sketch after this list).
    • Vertex AI Search for commerce then imports these events for further processing and indexing.
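
As an illustration of the validation and import steps, here is a minimal sketch that counts the rows in the curated view, checks the count against hypothetical thresholds, and then triggers the import, assuming the google-cloud-bigquery and google-cloud-retail Python clients. All resource names and thresholds are placeholders, and in the reference architecture this logic lives in Workflows and a BigQuery stored procedure rather than a standalone script.

```python
from google.cloud import bigquery, retail_v2

# Placeholder names and thresholds: substitute your own values.
PROJECT_ID = "my-project"
DATASET_ID = "curated_events"
VIEW_ID = "incr_update_event_view"
MIN_EVENTS, MAX_EVENTS = 1_000, 5_000_000

# Step 3: validate that the hourly view holds a plausible number of events.
bq = bigquery.Client(project=PROJECT_ID)
query = f"SELECT COUNT(*) AS event_count FROM `{PROJECT_ID}.{DATASET_ID}.{VIEW_ID}`"
row = next(iter(bq.query(query).result()))
if not MIN_EVENTS <= row.event_count <= MAX_EVENTS:
    raise RuntimeError(f"Event count {row.event_count} is outside the expected range")

# Step 4: point the import API at the validated BigQuery view.
client = retail_v2.UserEventServiceClient()
operation = client.import_user_events(
    request=retail_v2.ImportUserEventsRequest(
        parent=f"projects/{PROJECT_ID}/locations/global/catalogs/default_catalog",
        input_config=retail_v2.UserEventInputConfig(
            big_query_source=retail_v2.BigQuerySource(
                project_id=PROJECT_ID,
                dataset_id=DATASET_ID,
                table_id=VIEW_ID,
                data_schema="user_event",
            )
        ),
    )
)
print(operation.result())  # blocks until the import finishes
```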

Additional considerations for batch events ingestion

Additional aspects to consider when ingesting batch events are:

  • Security: Appropriate security measures, such as access controls and encryption, should be implemented to protect sensitive user event data.
  • Monitoring and logging: Comprehensive monitoring and logging should be set up to track the performance of the system and identify potential issues proactively.
  • Cost optimization: Resource utilization should be monitored to optimize costs and ensure efficient use of Google Cloud services.

This architecture provides a foundation for ingesting user events into Vertex AI Search for commerce. By combining Google Cloud services with best practices for data processing and error handling, it enables organizations to build scalable, maintainable solutions for advanced search and recommendation experiences.

What's next