Class LocalModel (1.37.0)

LocalModel(
    serving_container_spec: typing.Optional[
        google.cloud.aiplatform_v1.types.model.ModelContainerSpec
    ] = None,
    serving_container_image_uri: typing.Optional[str] = None,
    serving_container_predict_route: typing.Optional[str] = None,
    serving_container_health_route: typing.Optional[str] = None,
    serving_container_command: typing.Optional[typing.Sequence[str]] = None,
    serving_container_args: typing.Optional[typing.Sequence[str]] = None,
    serving_container_environment_variables: typing.Optional[
        typing.Dict[str, str]
    ] = None,
    serving_container_ports: typing.Optional[typing.Sequence[int]] = None,
    serving_container_deployment_timeout: typing.Optional[int] = None,
    serving_container_shared_memory_size_mb: typing.Optional[int] = None,
    serving_container_startup_probe_exec: typing.Optional[typing.Sequence[str]] = None,
    serving_container_startup_probe_period_seconds: typing.Optional[int] = None,
    serving_container_startup_probe_timeout_seconds: typing.Optional[int] = None,
    serving_container_health_probe_exec: typing.Optional[typing.Sequence[str]] = None,
    serving_container_health_probe_period_seconds: typing.Optional[int] = None,
    serving_container_health_probe_timeout_seconds: typing.Optional[int] = None,
)

Class that represents a local model.

Methods

LocalModel

LocalModel(
    serving_container_spec: typing.Optional[
        google.cloud.aiplatform_v1.types.model.ModelContainerSpec
    ] = None,
    serving_container_image_uri: typing.Optional[str] = None,
    serving_container_predict_route: typing.Optional[str] = None,
    serving_container_health_route: typing.Optional[str] = None,
    serving_container_command: typing.Optional[typing.Sequence[str]] = None,
    serving_container_args: typing.Optional[typing.Sequence[str]] = None,
    serving_container_environment_variables: typing.Optional[
        typing.Dict[str, str]
    ] = None,
    serving_container_ports: typing.Optional[typing.Sequence[int]] = None,
    serving_container_deployment_timeout: typing.Optional[int] = None,
    serving_container_shared_memory_size_mb: typing.Optional[int] = None,
    serving_container_startup_probe_exec: typing.Optional[typing.Sequence[str]] = None,
    serving_container_startup_probe_period_seconds: typing.Optional[int] = None,
    serving_container_startup_probe_timeout_seconds: typing.Optional[int] = None,
    serving_container_health_probe_exec: typing.Optional[typing.Sequence[str]] = None,
    serving_container_health_probe_period_seconds: typing.Optional[int] = None,
    serving_container_health_probe_timeout_seconds: typing.Optional[int] = None,
)

Creates a local model instance.

Parameters
Name Description
serving_container_spec aiplatform.gapic.ModelContainerSpec

Optional. The container spec of the LocalModel instance.

serving_container_image_uri str

Optional. The URI of the Model serving container.

serving_container_predict_route str

Optional. An HTTP path to send prediction requests to the container, and which must be supported by it. If not specified a default HTTP path will be used by Vertex AI.

serving_container_health_route str

Optional. An HTTP path to send health check requests to the container, and which must be supported by it. If not specified a standard HTTP path will be used by Vertex AI.

serving_container_command Sequence[str]

Optional. The command with which the container is run. Not executed within a shell. The Docker image's ENTRYPOINT is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container's environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.

serving_container_args typing.Optional[typing.Sequence[str]]

(Sequence[str]): Optional. The arguments to the command. The Docker image's CMD is used if this is not provided. Variable references $(VAR_NAME) are expanded using the container's environment. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, ie: $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.

serving_container_environment_variables Dict[str, str]

Optional. The environment variables that are to be present in the container. Should be a dictionary where keys are environment variable names and values are environment variable values for those names.

serving_container_ports Sequence[int]

Optional. Declaration of ports that are exposed by the container. This field is primarily informational, it gives Vertex AI information about the network connections the container uses. Listing or not a port here has no impact on whether the port is actually exposed, any port listening on the default "0.0.0.0" address inside a container will be accessible from the network.

serving_container_deployment_timeout int

Optional. Deployment timeout in seconds.

serving_container_shared_memory_size_mb int

Optional. The amount of the VM memory to reserve as the shared memory for the model in megabytes.

serving_container_startup_probe_exec Sequence[str]

Optional. Exec specifies the action to take. Used by startup probe. An example of this argument would be ["cat", "/tmp/healthy"]

serving_container_startup_probe_period_seconds int

Optional. How often (in seconds) to perform the startup probe. Default to 10 seconds. Minimum value is 1.

serving_container_startup_probe_timeout_seconds int

Optional. Number of seconds after which the startup probe times out. Defaults to 1 second. Minimum value is 1.

serving_container_health_probe_exec Sequence[str]

Optional. Exec specifies the action to take. Used by health probe. An example of this argument would be ["cat", "/tmp/healthy"]

serving_container_health_probe_period_seconds int

Optional. How often (in seconds) to perform the health probe. Default to 10 seconds. Minimum value is 1.

serving_container_health_probe_timeout_seconds int

Optional. Number of seconds after which the health probe times out. Defaults to 1 second. Minimum value is 1.

Exceptions
Type Description
ValueError If serving_container_spec is specified but serving_container_spec.image_uri is None. Also if serving_container_spec is None but serving_container_image_uri is None.

build_cpr_model

build_cpr_model(src_dir: str, output_image_uri: str, predictor: typing.Optional[typing.Type[google.cloud.aiplatform.prediction.predictor.Predictor]] = None, handler: typing.Type[google.cloud.aiplatform.prediction.handler.Handler] = <class 'google.cloud.aiplatform.prediction.handler.PredictionHandler'>, base_image: str = 'python:3.10', requirements_path: typing.Optional[str] = None, extra_packages: typing.Optional[typing.List[str]] = None, no_cache: bool = False) -> google.cloud.aiplatform.prediction.local_model.LocalModel

Builds a local model from a custom predictor.

This method builds a docker image to include user-provided predictor, and handler.

Sample src_dir contents (e.g. ./user_src_dir):

user_src_dir/
|-- predictor.py
|-- requirements.txt
|-- user_code/
|   |-- utils.py
|   |-- custom_package.tar.gz
|   |-- ...
|-- ...

To build a custom container:

local_model = LocalModel.build_cpr_model(
    "./user_src_dir",
    "us-docker.pkg.dev/$PROJECT/$REPOSITORY/$IMAGE_NAME$",
    predictor=$CUSTOM_PREDICTOR_CLASS,
    requirements_path="./user_src_dir/requirements.txt",
    extra_packages=["./user_src_dir/user_code/custom_package.tar.gz"],
)

In the built image, user provided files will be copied as follows:

container_workdir/
|-- predictor.py
|-- requirements.txt
|-- user_code/
|   |-- utils.py
|   |-- custom_package.tar.gz
|   |-- ...
|-- ...

To exclude files and directories from being copied into the built container images, create a .dockerignore file in the src_dir. See https://docs.docker.com/engine/reference/builder/#dockerignore-file for more details about usage.

In order to save and restore class instances transparently with Pickle, the class definition must be importable and live in the same module as when the object was stored. If you want to use Pickle, you must save your objects right under the src_dir you provide.

The created CPR images default the number of model server workers to the number of cores. Depending on the characteristics of your model, you may need to adjust the number of workers. You can set the number of workers with the following environment variables:

VERTEX_CPR_WEB_CONCURRENCY:
    The number of the workers. This will overwrite the number calculated by the other
    variables, min(VERTEX_CPR_WORKERS_PER_CORE * number_of_cores, VERTEX_CPR_MAX_WORKERS).
VERTEX_CPR_WORKERS_PER_CORE:
    The number of the workers per core. The default is 1.
VERTEX_CPR_MAX_WORKERS:
    The maximum number of workers can be used given the value of VERTEX_CPR_WORKERS_PER_CORE
    and the number of cores.

If you hit the error showing "model server container out of memory" when you deploy models to endpoints, you should decrease the number of workers.

Parameters
Name Description
src_dir str

Required. The path to the local directory including all needed files such as predictor. The whole directory will be copied to the image.

output_image_uri str

Required. The image uri of the built image.

predictor Type[Predictor]

Optional. The custom predictor class consumed by handler to do prediction.

handler Type[Handler]

Required. The handler class to handle requests in the model server.

base_image str

Required. The base image used to build the custom images. The base image must have python and pip installed where the two commands python and pip must be available.

requirements_path str

Optional. The path to the local requirements.txt file. This file will be copied to the image and the needed packages listed in it will be installed.

extra_packages List[str]

Optional. The list of user custom dependency packages to install.

no_cache bool

Required. Do not use cache when building the image. Using build cache usually reduces the image building time. See https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache for more details.

Exceptions
Type Description
ValueError If handler is None or if handler is PredictionHandler but predictor is None.
Returns
Type Description
local model Instantiated representation of the local model.

copy_image

copy_image(
    dst_image_uri: str,
) -> google.cloud.aiplatform.prediction.local_model.LocalModel

Copies the image to another image uri.

Parameter
Name Description
dst_image_uri str

The destination image uri to copy the image to.

Exceptions
Type Description
DockerError If the command fails.
Returns
Type Description
local model Instantiated representation of the local model with the copied image.

deploy_to_local_endpoint

deploy_to_local_endpoint(
    artifact_uri: typing.Optional[str] = None,
    credential_path: typing.Optional[str] = None,
    host_port: typing.Optional[str] = None,
    gpu_count: typing.Optional[int] = None,
    gpu_device_ids: typing.Optional[typing.List[str]] = None,
    gpu_capabilities: typing.Optional[typing.List[typing.List[str]]] = None,
    container_ready_timeout: typing.Optional[int] = None,
    container_ready_check_interval: typing.Optional[int] = None,
) -> google.cloud.aiplatform.prediction.local_endpoint.LocalEndpoint

Deploys the local model instance to a local endpoint.

An environment variable, GOOGLE_CLOUD_PROJECT, will be set to the project in the global config. This is required if the credentials file does not have project specified and used to recognize the project by the Cloud Storage client.

Example 1:

with local_model.deploy_to_local_endpoint(
    artifact_uri="gs://path/to/your/model",
    credential_path="local/path/to/your/credentials",
) as local_endpoint:
    health_check_response = local_endpoint.run_health_check()
    print(health_check_response, health_check_response.content)

    predict_response = local_endpoint.predict(
        request='{"instances": [[1, 2, 3, 4]]}',
        headers={"header-key": "header-value"},
    )
    print(predict_response, predict_response.content)

    local_endpoint.print_container_logs()

Example 2:

local_endpoint = local_model.deploy_to_local_endpoint(
    artifact_uri="gs://path/to/your/model",
    credential_path="local/path/to/your/credentials",
)
local_endpoint.serve()

health_check_response = local_endpoint.run_health_check()
print(health_check_response, health_check_response.content)

predict_response = local_endpoint.predict(
    request='{"instances": [[1, 2, 3, 4]]}',
    headers={"header-key": "header-value"},
)
print(predict_response, predict_response.content)

local_endpoint.print_container_logs()
local_endpoint.stop()
Parameters
Name Description
artifact_uri str

Optional. The path to the directory containing the Model artifact and any of its supporting files. The path is either a GCS uri or the path to a local directory. If this parameter is set to a GCS uri: (1) credential_path must be specified for local prediction. (2) The GCS uri will be passed directly to Predictor.load. If this parameter is a local directory: (1) The directory will be mounted to a default temporary model path. (2) The mounted path will be passed to Predictor.load.

credential_path str

Optional. The path to the credential key that will be mounted to the container. If it's unset, the environment variable, GOOGLE_APPLICATION_CREDENTIALS, will be used if set.

host_port str

Optional. The port on the host that the port, AIP_HTTP_PORT, inside the container will be exposed as. If it's unset, a random host port will be assigned.

gpu_count int

Optional. Number of devices to request. Set to -1 to request all available devices. To use GPU, set either gpu_count or gpu_device_ids. The default value is -1 if gpu_capabilities is set but both gpu_count and gpu_device_ids are not set.

gpu_device_ids List[str]

Optional. This parameter corresponds to NVIDIA_VISIBLE_DEVICES in the NVIDIA Runtime. To use GPU, set either gpu_count or gpu_device_ids.

gpu_capabilities List[List[str]]

Optional. This parameter corresponds to NVIDIA_DRIVER_CAPABILITIES in the NVIDIA Runtime. The outer list acts like an OR, and each sub-list acts like an AND. The driver will try to satisfy one of the sub-lists. Available capabilities for the NVIDIA driver can be found in https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#driver-capabilities. The default value is [["utility", "compute"]] if gpu_count or gpu_device_ids is set.

container_ready_timeout int

Optional. The timeout in second used for starting the container or succeeding the first health check.

container_ready_check_interval int

Optional. The time interval in second to check if the container is ready or the first health check succeeds.

get_serving_container_spec

get_serving_container_spec() -> (
    google.cloud.aiplatform_v1.types.model.ModelContainerSpec
)

Returns the container spec for the image.

pull_image_if_not_exists

pull_image_if_not_exists()

Pulls the image if the image does not exist locally.

Exceptions
Type Description
DockerError If the command fails.

push_image

push_image() -> None

Pushes the image to a registry.

If you hit permission errors while calling this function, please refer to https://cloud.google.com/artifact-registry/docs/docker/authentication to set up the authentication.

For Artifact Registry, the repository must be created before you are able to push images to it. Otherwise, you will hit the error, "Repository {REPOSITORY} not found". To create Artifact Registry repositories, use UI or call the following gcloud command.

gcloud artifacts repositories create {REPOSITORY}                 --project {PROJECT}                 --location {REGION}                 --repository-format docker

See https://cloud.google.com/artifact-registry/docs/manage-repos#create for more details.

If you hit a "Permission artifactregistry.repositories.uploadArtifacts denied" error, set up authentication for Docker.

gcloud auth configure-docker {REPOSITORY}

See https://cloud.google.com/artifact-registry/docs/docker/authentication for mode details.

Exceptions
Type Description
ValueError If the image uri is not a container registry or artifact registry uri.
DockerError If the command fails.