Vertex AI glossary

  • annotation set
    • An annotation set contains the labels associated with the uploaded source files within a dataset. An annotation set is associated with both a data type and an objective (for example, video/classification).
  • API endpoints
    • API endpoints is an aspect of service configuration that specifies the network addresses, also known as service endpoints (for example, aiplatform.googleapis.com).
  • Approximate Nearest Neighbor (ANN)
    • The Approximate Nearest Neighbor (ANN) service is a high-scale, low-latency solution for finding similar vectors (or, more specifically, embeddings) in a large corpus. For more information, see How to use Vector Search for semantic matching.
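To make the "nearest neighbor" idea concrete, the following sketch computes exact nearest neighbors by cosine similarity over a toy corpus of embeddings. This exhaustive scan is the ground truth that an ANN index approximates at scale; the corpus, vectors, and function names here are illustrative assumptions, not the Vertex AI API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, corpus, k=2):
    """Exact (brute-force) k nearest neighbors by cosine similarity.

    An ANN service trades a small amount of accuracy for speed on
    large corpora; this exhaustive scan is what it approximates.
    """
    scored = sorted(corpus.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(nearest_neighbors([1.0, 0.05, 0.0], corpus, k=2))  # ['doc_a', 'doc_b']
```

A real ANN index avoids this linear scan over the whole corpus by using an approximate index structure, which is what makes it low latency at high scale.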
  • artifact
    • An artifact is a discrete entity or piece of data produced and consumed by a machine learning workflow. Examples of artifacts include datasets, models, input files, and training logs.
  • Artifact Registry
    • Artifact Registry is a universal artifact management service. It is the recommended service for managing containers and other artifacts on Google Cloud. For more information, see Artifact Registry.
  • authentication
    • The process of verifying the identity of a client (which might be a user or another process) for the purposes of gaining access to a secured system. A client that has proven its identity is said to be authenticated. For more information, see Authentication methods at Google.
  • baseline
    • A model used as a reference point for comparing how well another model (typically, a more complex one) is performing. For example, a logistic regression model might serve as a good baseline for a deep model. For a particular problem, the baseline helps model developers quantify the minimal expected performance that a new model must achieve for the new model to be useful. For more information, see Baseline and target datasets.
  • batch prediction
    • Batch prediction takes a group of prediction requests and outputs the results in one file. For more information, see Getting batch predictions.
  • bounding box
    • A bounding box for an object in a video frame can be specified in either of two ways: (i) using two vertices, each a set of x,y coordinates, that are diagonally opposite points of the rectangle, for example x_relative_min,y_relative_min,,,x_relative_max,y_relative_max,,; or (ii) using all four vertices. For more information, see Prepare video data.
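As an illustration of the two-vertex form, here is a minimal parser for the eight-field row layout shown above, where the empty fields stand for the unused second and fourth vertices. The helper name and return shape are assumptions for this sketch, not part of the Vertex AI format specification.

```python
def parse_two_vertex_bbox(csv_fragment):
    """Parse a bounding box given as two diagonally opposite vertices.

    Assumes the layout x_min,y_min,,,x_max,y_max,, in which the empty
    fields are the unused second and fourth vertices of the rectangle.
    """
    fields = csv_fragment.split(",")
    x_min, y_min = float(fields[0]), float(fields[1])
    x_max, y_max = float(fields[4]), float(fields[5])
    return {"x_min": x_min, "y_min": y_min, "x_max": x_max, "y_max": y_max}

box = parse_two_vertex_bbox("0.1,0.2,,,0.8,0.9,,")
print(box)  # {'x_min': 0.1, 'y_min': 0.2, 'x_max': 0.8, 'y_max': 0.9}
```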
  • bucket
    • Top-level folder for Cloud Storage. Bucket names must be unique across all users of Cloud Storage. Buckets contain files. For more information, see Product overview of Cloud Storage.
  • classification metrics
    • Supported classification metrics in the Vertex AI SDK for Python are confusion matrix and ROC curve.
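To illustrate what a confusion matrix captures, independent of the SDK, here is a minimal pure-Python version that counts true-label/predicted-label pairs; the function name and nested-dict shape are choices made for this sketch.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Build a confusion matrix as a nested dict: matrix[true][pred] = count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}

y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
m = confusion_matrix(y_true, y_pred, labels=["cat", "dog"])
print(m["dog"]["dog"])  # 2 dogs correctly classified
```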
  • container image
    • A container image is a package that includes the component's executable code and a definition of the environment that the code runs in. For more information, see Custom training overview.
  • context
    • A context is used to group artifacts and executions together under a single, queryable, and typed category. Contexts can be used to represent sets of metadata. An example of a Context would be a run of a machine learning pipeline.
  • Customer-managed encryption keys (CMEK)
    • Customer-managed encryption keys (CMEK) are integrations that allow customers to encrypt data in existing Google services using a key they manage in Cloud KMS. The key in Cloud KMS is the key encryption key protecting their data. For more information, see Customer-managed encryption keys (CMEK).
  • CustomJob
    • A CustomJob is one of three Vertex AI resources a user can create to train custom models on Vertex AI. Custom training jobs are the basic way to run custom machine learning (ML) training code in Vertex AI. For more information, see Create custom training jobs.
  • Dask
    • Dask is a distributed computing platform that is often used with TensorFlow, PyTorch, and other ML frameworks to manage distributed training jobs. For more information, see Wikipedia.
  • dataset
    • A dataset is broadly defined as a collection of structured or unstructured data records. For more information, see Create a dataset.
  • embedding
    • An embedding is a type of vector that's used to represent data in a way that captures its semantic meaning. Embeddings are typically created using machine learning techniques, and they are often used in natural language processing (NLP) and other machine learning applications.
  • event
    • An event describes the relationship between artifacts and executions. Each artifact can be produced by an execution and consumed by other executions. Events help you to determine the provenance of artifacts in their ML workflows by chaining together artifacts and executions.
  • execution
    • An execution is a record of an individual machine learning workflow step, typically annotated with its runtime parameters. Examples of executions include data ingestion, data validation, model training, model evaluation, and model deployment.
  • experiment
    • An experiment is a context that can contain a set of experiment runs, in addition to pipeline runs, where a user can investigate, as a group, different configurations such as input artifacts or hyperparameters.
  • experiment run
    • An experiment run can contain user-defined metrics, parameters, executions, artifacts, and Vertex resources (for example, PipelineJob).
  • exploratory data analysis
    • In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
  • feature
    • In machine learning (ML), a feature is a characteristic or attribute of an instance or entity that's used as an input to train an ML model or to make predictions.
  • feature engineering
    • Feature engineering is the process of transforming raw machine learning (ML) data into features that can be used to train ML models or to make predictions.
  • feature value
    • A feature value corresponds to the actual and measurable value of a feature (attribute) of an instance or entity. A collection of feature values for the unique entity represent the feature record corresponding to the entity.
  • feature serving
    • Feature serving is the process of exporting or fetching feature values for training or inference. In Vertex AI, there are two types of feature serving—online serving and offline serving. Online serving retrieves the latest feature values of a subset of the feature data source for online predictions. Offline or batch serving exports high volumes of feature data for offline processing, such as ML model training.
  • feature timestamp
    • A feature timestamp indicates when the set of feature values in a specific feature record for an entity were generated.
  • feature record
    • A feature record is an aggregation of all feature values that describe the attributes of a unique entity at a specific point in time.
  • feature registry
    • A feature registry is a central interface for recording feature data sources that you want to serve for online predictions. For more information, see Feature Registry setup.
  • feature group
    • A feature group is a feature registry resource that corresponds to a BigQuery source table or view containing feature data. A feature group might contain features and can be thought of as a logical grouping of feature columns in the data source.
  • feature view
    • A feature view is a logical collection of features materialized from a BigQuery data source to an online store instance. A feature view stores the customer's feature data, which is refreshed periodically from the BigQuery source. A feature view is associated with the feature data storage either directly or through associations to feature registry resources.
  • ML pipelines
    • ML pipelines are portable and scalable ML workflows that are based on containers.
  • model
    • Any machine learning model, whether pre-trained or not.
  • model resource name
    • The resource name for a model is as follows: projects/<PROJECT_ID>/locations/<LOCATION_ID>/models/<MODEL_ID>. You can find the model's ID in the Cloud console on the 'Model Registry' page.
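For example, the resource name can be assembled from its three parts; the project, location, and model IDs below are placeholders.

```python
def model_resource_name(project_id, location_id, model_id):
    """Build a fully qualified Vertex AI model resource name."""
    return f"projects/{project_id}/locations/{location_id}/models/{model_id}"

# Placeholder IDs, for illustration only.
name = model_resource_name("my-project", "us-central1", "1234567890")
print(name)  # projects/my-project/locations/us-central1/models/1234567890
```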
  • Network File System (NFS)
    • A client/server system that lets users access files across a network and treat them as if they resided in a local file directory.
  • offline store
    • The offline store is a storage facility storing recent and historical feature data, which is typically used for training ML models. An offline store also contains the latest feature values, which you can serve for online predictions.
  • online store
    • In feature management, an online store is a storage facility for the latest feature values to be served for online predictions.
  • parameters
    • Parameters are keyed input values that configure a run, regulate the behavior of the run, and affect the results of the run. Examples include learning rate, dropout rate, and number of training steps.
  • pipeline component
    • A self-contained set of code that performs one step in a pipeline's workflow, such as data preprocessing, data transformation, and training a model.
  • pipeline job
    • A resource in the Vertex AI API corresponding to Vertex Pipeline Jobs. Users create a PipelineJob when they want to run an ML Pipeline on Vertex AI.
  • pipeline run
    • One or more Vertex PipelineJobs can be associated with an experiment where each PipelineJob is represented as a single run. In this context, the parameters of the run are inferred by the parameters of the PipelineJob. The metrics are inferred from the system.Metric artifacts produced by that PipelineJob. The artifacts of the run are inferred from artifacts produced by that PipelineJob.
  • pipeline template
    • An ML workflow definition that a single user or multiple users can reuse to create multiple pipeline runs.
  • Private Service Connect (PSC)
    • Private Service Connect is a technology that allows Compute Engine customers to map private IPs in their network to either another VPC network or to Google APIs.
  • Private Service Connect interface (PSC-I)
    • A Private Service Connect interface provides a way for producers to privately initiate connections to network resources in a consumer VPC network.
  • recall
    • The percentage of true nearest neighbors returned by the index. For example, if a nearest neighbor query for 20 nearest neighbors returned 19 of the "ground truth" nearest neighbors, the recall is 19/20 × 100 = 95%.
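The recall calculation from the example above, sketched in Python; the neighbor IDs are arbitrary stand-ins.

```python
def recall_at_k(approx_ids, ground_truth_ids):
    """Fraction of the true nearest neighbors present in the ANN result set."""
    hits = len(set(approx_ids) & set(ground_truth_ids))
    return hits / len(ground_truth_ids)

# 20 true neighbors; the index returned 19 of them plus one miss.
truth = list(range(20))
approx = list(range(19)) + [99]
print(f"{recall_at_k(approx, truth):.0%}")  # 95%
```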
  • restricts
    • Functionality to "restrict" searches to a subset of the index by using Boolean rules. Restrict is also referred to as "filtering". With Vector Search, you can use numeric filtering and text attribute filtering.
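A toy model of the idea: filter candidates with Boolean allow/deny tag rules before ranking. Vector Search's actual restrict syntax expresses these rules through namespaces in its API; the function signature and rule shapes below are assumptions for illustration only.

```python
def apply_restricts(candidates, allow=None, deny=None):
    """Keep only candidates whose tags satisfy the Boolean restrict rules.

    allow: at least one of these tags must be present on the candidate;
    deny: none of these tags may be present. A sketch of the filtering
    idea, not the Vector Search API.
    """
    allow = set(allow or [])
    deny = set(deny or [])
    kept = []
    for cand_id, tags in candidates:
        tags = set(tags)
        if allow and not (tags & allow):
            continue  # fails the allow rule
        if tags & deny:
            continue  # hits a denied tag
        kept.append(cand_id)
    return kept

candidates = [("a", ["red"]), ("b", ["blue"]), ("c", ["red", "sale"])]
print(apply_restricts(candidates, allow=["red"], deny=["sale"]))  # ['a']
```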
  • service account
    • In Google Cloud, a service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person. Applications use service accounts to make authorized API calls.
  • summary metrics
    • Summary metrics are a single value for each metric key in an experiment run. For example, the test accuracy of an experiment is the accuracy calculated against a test dataset at the end of training that can be captured as a single value summary metric.
  • TensorBoard
    • TensorBoard is a suite of web applications for visualizing and understanding TensorFlow runs and models. For more information, see TensorBoard.
  • TensorBoard Resource name
    • A TensorBoard Resource name is used to fully identify a Vertex AI TensorBoard instance. The format is as follows: projects/PROJECT_ID_OR_NUMBER/locations/REGION/tensorboards/TENSORBOARD_INSTANCE_ID.
  • TensorBoard instance
    • A TensorBoard instance is a regionalized resource that stores Vertex AI TensorBoard Experiments associated with a Project. You can create multiple TensorBoard instances in a project if, for example, you want multiple CMEK-enabled instances. This is the same as the TensorBoard resource in the API.
  • time series metrics
    • Time series metrics are longitudinal metric values where each value represents a step in the training routine portion of a run. Time series metrics are stored in Vertex AI TensorBoard. Vertex AI Experiments stores a reference to the Vertex TensorBoard resource.
  • token
    • A token in a language model is the atomic unit on which the model trains and makes predictions, such as words, morphemes, and characters. In domains outside of language models, tokens can represent other kinds of atomic units. For example, in computer vision, a token might be a subset of an image.
  • unmanaged artifacts
    • An artifact that exists outside of the Vertex AI context.
  • Vertex AI Experiments
    • Vertex AI Experiments enables users to track (i) steps of an experiment run, for example, preprocessing, training, (ii) inputs, for example, algorithm, parameters, datasets, (iii) outputs of those steps, for example, models, checkpoints, metrics.
  • Vertex AI TensorBoard Experiment
    • The data associated with an experiment can be viewed in the TensorBoard web application (scalars, histograms, distributions, and so on). Time series scalars can be viewed in the Google Cloud console. For more information, see Compare and analyze runs.