Run training job with experiment tracking
Vertex AI provides a managed training service that lets you
operationalize large-scale model training. You can enable experiment tracking
with the Vertex AI SDK for Python to capture parameters and performance metrics
when you submit a custom training job.
This feature isn't available when you:

- submit a training job through the Google Cloud console or the Google Cloud CLI
- use a TPU in the training job
- use distributed training in the training job
Both prebuilt training containers
and custom containers are supported.
Required: A version of the Vertex AI SDK for Python (google-cloud-aiplatform)
later than 1.24.1 must be installed. If you are training with TensorFlow,
make sure that a protobuf version earlier than 4.0 is installed to avoid conflicts.
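For example, you might install compatible versions with pip (the exact pins here are illustrative; any SDK version later than 1.24.1 and any protobuf version earlier than 4.0 satisfy the requirement):

```shell
pip install --upgrade "google-cloud-aiplatform>1.24.1" "protobuf<4.0"
```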
There are two options for logging data to Vertex AI Experiments:
autologging and manual logging.

Autologging is recommended if you are using one of these supported frameworks:
Fastai, Gluon, Keras, LightGBM, PyTorch Lightning, scikit-learn, Spark,
Statsmodels, XGBoost. If your framework isn't supported, or if there are
custom metrics you want to log to your experiment run, you can manually adapt
your training script to log parameters, metrics, and artifacts.
AutoLog data
To enable autologging, set enable_autolog=True when calling from_local_script.
Creating an experiment run is optional. If an experiment run name isn't
specified, one is created for you.
The Vertex AI SDK for Python handles creating ExperimentRun
resources for you.
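Python

```python
from typing import Optional

from google.cloud import aiplatform


def create_custom_job_with_experiment_autologging_sample(
    project: str,
    location: str,
    staging_bucket: str,
    display_name: str,
    script_path: str,
    container_uri: str,
    service_account: str,
    experiment: str,
    experiment_run: Optional[str] = None,
) -> None:
    # Initialize the SDK against your project and experiment.
    aiplatform.init(
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        experiment=experiment,
    )

    # enable_autolog=True turns on experiment autologging for the job.
    job = aiplatform.CustomJob.from_local_script(
        display_name=display_name,
        script_path=script_path,
        container_uri=container_uri,
        enable_autolog=True,
    )

    job.run(
        service_account=service_account,
        experiment=experiment,
        experiment_run=experiment_run,
    )
```

project: Your project ID. You can find these project IDs in the Google Cloud console welcome page.
location: See the list of available locations.
staging_bucket: The name you gave your bucket, for example, my_bucket.
display_name: The user-defined name of the CustomJob.
script_path: The path, relative to the working directory on your local file system, to the script that is the entry point for your training code.
container_uri: The URI of the training container image; this can be a Vertex AI prebuilt training container or a custom container.
service_account: See Create a service account with required permissions.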
experiment: Provide a name for your experiment. The experiment must have a
TensorBoard instance.
You can find your list of experiments in the Google Cloud console by
selecting Experiments in the section nav.
experiment_run: (Optional) Specify a run name. If not specified, a run is
auto-created.
Manually log data
To use manual logging, incorporate logging calls into your training script.
Here's how to change the training script:
```python
import os
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression

# To use manual logging APIs, import aiplatform
from google.cloud import aiplatform

# Create a dataset
data = {'A': [1.1, 2.2, 4.1, 5.2],
        'B': [200, 212.12, 22, 123],
        'Y': [1, 0, 1, 0]}
df = pd.DataFrame(data)
X = df[['A', 'B']]
Y = df['Y']

# Train the model
model = LinearRegression().fit(X, Y)

# Save the model to Cloud Storage
model_dir = os.getenv('AIP_MODEL_DIR')
model_gcs = model_dir.replace('gs://', '/gcs/')
model_name = 'model.joblib'
os.makedirs(model_gcs, exist_ok=True)
with open(os.path.join(model_gcs, model_name), 'wb') as f:
    pickle.dump(model, f)

# Call aiplatform's logging APIs to save data to Vertex AI Experiments.
params = model.get_params()
aiplatform.log_params(params)
metrics = {"training_accuracy": model.score(X, Y)}
aiplatform.log_metrics(metrics)
```
Creating an experiment run is optional. If an experiment run name isn't
specified, one is created for you.
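To learn more, see Manually log data to an experiment run.

Python

```python
from typing import Optional

from google.cloud import aiplatform


def create_custom_job_with_experiment_sample(
    project: str,
    location: str,
    staging_bucket: str,
    display_name: str,
    script_path: str,
    container_uri: str,
    service_account: str,
    experiment: str,
    experiment_run: Optional[str] = None,
) -> None:
    # Initialize the SDK against your project and experiment.
    aiplatform.init(
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        experiment=experiment,
    )

    # No enable_autolog here; the training script logs data manually.
    job = aiplatform.CustomJob.from_local_script(
        display_name=display_name,
        script_path=script_path,
        container_uri=container_uri,
    )

    job.run(
        service_account=service_account,
        experiment=experiment,
        experiment_run=experiment_run,
    )
```

project: Your project ID. You can find these project IDs in the Google Cloud console welcome page.
location: See the list of available locations.
staging_bucket: The name you gave your bucket, for example, my_bucket.
display_name: The user-defined name of the CustomJob.
script_path: The path, relative to the working directory on your local file system, to the script that is the entry point for your training code.
container_uri: The URI of the training container image; this can be a Vertex AI prebuilt training container or a custom container. If you are using a custom container, be sure google-cloud-aiplatform>=1.24.0 is installed.
service_account: See Create a service account with required permissions.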
experiment: Provide a name for your experiment. You can find your
list of experiments in the Google Cloud console by selecting Experiments in the section nav.
experiment_run: (Optional) Specify a run name. If not specified, a run is
auto-created.
View autologged parameters and metrics
Use the Vertex AI SDK for Python
to compare runs and
get run data.
The Google Cloud console
provides an easy way to compare these runs.
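As a minimal sketch (the project, location, and experiment names here are placeholders), you can pull run data into a pandas DataFrame with the SDK's get_experiment_df helper:

```python
from google.cloud import aiplatform

# Initialize the SDK; "my-project" and "my-experiment" are placeholder names.
aiplatform.init(
    project="my-project",
    location="us-central1",
    experiment="my-experiment",
)

# Returns a pandas DataFrame with one row per experiment run; logged
# parameters and metrics appear as columns.
df = aiplatform.get_experiment_df()
print(df.head())
```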
What's next

Log data to an experiment run

Relevant notebook sample

Custom training autologging

Last updated 2025-08-26 UTC.