Class PrivateEndpoint (1.48.0)

PrivateEndpoint(
    endpoint_name: str,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
)

Represents a Vertex AI PrivateEndpoint resource.

Read more about private endpoints in the documentation.

Properties

create_time

Time this resource was created.

display_name

Display name of this resource.

encryption_spec

Customer-managed encryption key options for this Vertex AI resource.

If this is set, then all resources created by this Vertex AI resource will be encrypted with the provided encryption key.

explain_http_uri

HTTP path to send explain requests to, used when calling PrivateEndpoint.explain()

gca_resource

The underlying resource proto representation.

health_http_uri

HTTP path to send health check requests to, used when calling PrivateEndpoint.health_check()

labels

User-defined labels containing metadata about this resource.

Read more about labels at https://goo.gl/xmQnxf

name

Name of this resource.

network

The full name of the Google Compute Engine network to which this Endpoint should be peered.

Takes the format projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name.

Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.
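
As an illustration of the network name format described above, the following is a minimal sketch of building and checking such a name. The helper names build_network_name and is_valid_network_name are hypothetical and not part of the SDK, and the character set accepted for {network} is an assumption made for this sketch.

```python
import re

# Pattern for the documented format projects/{project}/global/networks/{network},
# where {project} is a project number. The characters accepted for {network}
# are an assumption made for this sketch, not a rule taken from the SDK.
_NETWORK_RE = re.compile(r"projects/(\d+)/global/networks/([a-z0-9_-]+)")


def build_network_name(project_number: int, network: str) -> str:
    """Return the full network name for a project number and a network name."""
    return f"projects/{project_number}/global/networks/{network}"


def is_valid_network_name(name: str) -> bool:
    """Check a string against the documented network name format."""
    return _NETWORK_RE.fullmatch(name) is not None
```

A name built from a project ID instead of a project number, such as projects/my-project/global/networks/my_vpc, is rejected by this check.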

predict_http_uri

HTTP path to send prediction requests to, used when calling PrivateEndpoint.predict()

preview

Return an Endpoint instance with preview features enabled.

resource_name

Fully qualified resource name.

traffic_split

A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel.

If a DeployedModel's ID is not listed in this map, then it receives no traffic.

The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
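
The rule for traffic_split can be checked locally before sending an update. The following is a minimal sketch, assuming a hypothetical helper named validate_traffic_split that is not part of the SDK. The rejection of negative percentages is an added assumption.

```python
from typing import Dict


def validate_traffic_split(traffic_split: Dict[str, int]) -> None:
    """Raise ValueError unless the map is empty or its values sum to 100."""
    if not traffic_split:
        return  # An empty map means the Endpoint accepts no traffic.
    # Assumption for this sketch: negative percentages are never valid.
    if any(pct < 0 for pct in traffic_split.values()):
        raise ValueError("Traffic percentages must not be negative.")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"Traffic percentages sum to {total}, expected 100.")
```

For example, {"123456": 20, "234567": 80} passes, while {"123456": 20, "234567": 70} raises ValueError.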

update_time

Time this resource was last updated.

Methods

PrivateEndpoint

PrivateEndpoint(
    endpoint_name: str,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
)

Retrieves a PrivateEndpoint resource.

Example usage:

my_private_endpoint = aiplatform.PrivateEndpoint(
    endpoint_name="projects/123/locations/us-central1/endpoints/1234567891234567890"
)

or (when project and location are initialized)

my_private_endpoint = aiplatform.PrivateEndpoint(
    endpoint_name="1234567891234567890"
)
Parameters
Name Description
endpoint_name str

Required. A fully-qualified endpoint resource name or endpoint ID. Example: "projects/123/locations/us-central1/endpoints/my_endpoint_id" or "my_endpoint_id" when project and location are initialized or passed.

project str

Optional. Project to retrieve endpoint from. If not set, project set in aiplatform.init will be used.

location str

Optional. Location to retrieve endpoint from. If not set, location set in aiplatform.init will be used.

credentials auth_credentials.Credentials

Optional. Custom credentials to use to retrieve this endpoint. Overrides credentials set in aiplatform.init.

Exceptions
Type Description
ValueError If the Endpoint being retrieved is not a PrivateEndpoint.
ImportError If there is an issue importing the urllib3 package.
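
The two accepted forms of endpoint_name relate to each other as sketched below. The helper full_endpoint_name is hypothetical and only illustrates how a bare endpoint ID combines with project and location. It is not the SDK's own resolution logic.

```python
from typing import Optional


def full_endpoint_name(
    endpoint_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Expand a bare endpoint ID into a fully-qualified resource name."""
    if endpoint_name.startswith("projects/"):
        return endpoint_name  # Already fully qualified.
    if not project or not location:
        raise ValueError("project and location are required for a bare endpoint ID.")
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_name}"
```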

create

create(
    display_name: str,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    network: typing.Optional[str] = None,
    description: typing.Optional[str] = None,
    labels: typing.Optional[typing.Dict[str, str]] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
    encryption_spec_key_name: typing.Optional[str] = None,
    sync=True,
) -> google.cloud.aiplatform.models.PrivateEndpoint

Creates a new PrivateEndpoint.

Example usage:

my_private_endpoint = aiplatform.PrivateEndpoint.create(
    display_name="my_endpoint_name",
    project="my_project_id",
    location="us-central1",
    network="projects/123456789123/global/networks/my_vpc"
)

or (when project and location are initialized)

my_private_endpoint = aiplatform.PrivateEndpoint.create(
    display_name="my_endpoint_name",
    network="projects/123456789123/global/networks/my_vpc"
)
Parameters
Name Description
display_name str

Required. The user-defined name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.

project str

Optional. Project to create the endpoint in. If not set, project set in aiplatform.init will be used.

location str

Optional. Location to create the endpoint in. If not set, location set in aiplatform.init will be used.

network str

Optional. The full name of the Compute Engine network to which this Endpoint will be peered. E.g. "projects/123456789123/global/networks/my_vpc". Private services access must already be configured for the network. If left unspecified, the network set with aiplatform.init will be used.

description str

Optional. The description of the Endpoint.

labels Dict[str, str]

Optional. The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

credentials auth_credentials.Credentials

Optional. Custom credentials to use to create this endpoint. Overrides credentials set in aiplatform.init.

encryption_spec_key_name str

Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Model and all sub-resources of this Model will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.

sync bool

Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed.

Exceptions
Type Description
ValueError If no network is specified. A network must be provided when creating a PrivateEndpoint.
Returns
Type Description
endpoint (aiplatform.PrivateEndpoint) Created endpoint.
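
The constraints on labels described above can be checked before calling create. This is a minimal sketch with a hypothetical helper named invalid_labels. How "international characters" is interpreted here (any alphanumeric character that is not uppercase) is an assumption, so the service itself may accept or reject slightly different inputs.

```python
from typing import Dict

MAX_LABEL_LENGTH = 64  # Unicode code points, per the labels description.


def _is_allowed_char(ch: str) -> bool:
    # Underscores and dashes are allowed; letters must not be uppercase.
    # Treating any non-uppercase alphanumeric character as allowed is an
    # assumption that approximates "international characters are allowed".
    return ch in "_-" or (ch.isalnum() and not ch.isupper())


def invalid_labels(labels: Dict[str, str]) -> Dict[str, str]:
    """Return the subset of labels whose key or value breaks the rules."""
    bad = {}
    for key, value in labels.items():
        for text in (key, value):
            if len(text) > MAX_LABEL_LENGTH or not all(map(_is_allowed_char, text)):
                bad[key] = value
                break
    return bad
```

An empty result means every label passed the local check.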

delete

delete(force: bool = False, sync: bool = True) -> None

Deletes this Vertex AI PrivateEndpoint resource. If force is set to True, all models on this PrivateEndpoint will be undeployed prior to deletion.

Parameters
Name Description
force bool

Optional. If force is set to True, all deployed models on this Endpoint will be undeployed first. Default is False.

sync bool

Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed.

Exceptions
Type Description
FailedPrecondition If models are deployed on this Endpoint and force = False.

deploy

deploy(
    model: google.cloud.aiplatform.models.Model,
    deployed_model_display_name: typing.Optional[str] = None,
    machine_type: typing.Optional[str] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: typing.Optional[str] = None,
    accelerator_count: typing.Optional[int] = None,
    tpu_topology: typing.Optional[str] = None,
    service_account: typing.Optional[str] = None,
    explanation_metadata: typing.Optional[
        google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata
    ] = None,
    explanation_parameters: typing.Optional[
        google.cloud.aiplatform_v1.types.explanation.ExplanationParameters
    ] = None,
    metadata: typing.Optional[typing.Sequence[typing.Tuple[str, str]]] = (),
    sync=True,
    disable_container_logging: bool = False,
) -> None

Deploys a Model to the PrivateEndpoint.

Example Usage:

my_private_endpoint.deploy(
    model=my_model
)

Parameters
Name Description
deployed_model_display_name str

Optional. The display name of the DeployedModel. If not provided upon creation, the Model's display_name is used.

machine_type str

Optional. The type of machine. If no machine type is specified, the model is deployed with automatic resources.

min_replica_count int

Optional. The minimum number of machine replicas this deployed model will always be deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.

max_replica_count int

Optional. The maximum number of replicas this deployed model may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will error, but if the deployment succeeds, the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the deployed model increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, the larger of min_replica_count and 1 will be used. If the value provided is smaller than min_replica_count, it will automatically be increased to min_replica_count.

accelerator_type str

Optional. Hardware accelerator type. Must also set accelerator_count if used. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4

accelerator_count int

Optional. The number of accelerators to attach to a worker replica.

tpu_topology str

Optional. The TPU topology to use for the DeployedModel. Required for CloudTPU multihost deployments.

service_account str

The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project. Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.

explanation_metadata aiplatform.explain.ExplanationMetadata

Optional. Metadata describing the Model's input and output for explanation. explanation_metadata is optional while explanation_parameters must be specified when used. For more details, see Ref docs http://tinyurl.com/1igh60kt

explanation_parameters aiplatform.explain.ExplanationParameters

Optional. Parameters to configure explaining for Model's predictions. For more details, see Ref docs http://tinyurl.com/1an4zake

metadata Sequence[Tuple[str, str]]

Optional. Strings which should be sent along with the request as metadata.

model aiplatform.Model

Required. Model to be deployed.

sync bool

Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed.
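
The fallback rules stated for max_replica_count can be written out as a small function. This is a sketch of the documented behaviour only, using a hypothetical helper named resolve_max_replica_count. The signature above shows a default of 1, so the "not provided" case is modelled here with None purely for illustration.

```python
from typing import Optional


def resolve_max_replica_count(
    min_replica_count: int, max_replica_count: Optional[int] = None
) -> int:
    """Apply the documented fallback rules for max_replica_count."""
    if max_replica_count is None:
        # Not provided: use the larger of min_replica_count and 1.
        return max(min_replica_count, 1)
    # Provided but smaller than the minimum: raise it to min_replica_count.
    return max(max_replica_count, min_replica_count)
```

With min_replica_count=4 and max_replica_count=2, for instance, the effective maximum becomes 4.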

explain

explain()

Make a prediction with explanations against this Endpoint.

Example usage:

response = my_endpoint.explain(instances=[...])
my_explanations = response.explanations

Parameters
Name Description
instances List

Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request. When that limit is exceeded, the prediction call errors for AutoML Models; for customer-created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via the instance_schema_uri of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Endpoint's DeployedModels' Model (google.cloud.aiplatform.v1beta1.DeployedModel.model).

parameters Dict

Optional. The parameters that govern the prediction. The schema of the parameters may be specified via the parameters_schema_uri of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Endpoint's DeployedModels' Model (google.cloud.aiplatform.v1beta1.DeployedModel.model).

deployed_model_id str

Optional. If specified, this ExplainRequest will be served by the chosen DeployedModel, overriding this Endpoint's traffic split.

timeout float

Optional. The timeout for this request in seconds.

Returns
Type Description
prediction (aiplatform.Prediction) Prediction with returned predictions, explanations, and Model ID.

explain_async

explain_async(
    instances: typing.List[typing.Dict],
    *,
    parameters: typing.Optional[typing.Dict] = None,
    deployed_model_id: typing.Optional[str] = None,
    timeout: typing.Optional[float] = None
) -> google.cloud.aiplatform.models.Prediction

Make a prediction with explanations against this Endpoint.

Example usage:

response = await my_endpoint.explain_async(instances=[...])
my_explanations = response.explanations
Parameters
Name Description
instances List

Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request. When that limit is exceeded, the prediction call errors for AutoML Models; for customer-created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via the instance_schema_uri of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Endpoint's DeployedModels' Model (google.cloud.aiplatform.v1beta1.DeployedModel.model).

parameters Dict

Optional. The parameters that govern the prediction. The schema of the parameters may be specified via the parameters_schema_uri of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Endpoint's DeployedModels' Model (google.cloud.aiplatform.v1beta1.DeployedModel.model).

deployed_model_id str

Optional. If specified, this ExplainRequest will be served by the chosen DeployedModel, overriding this Endpoint's traffic split.

timeout float

Optional. The timeout for this request in seconds.

Returns
Type Description
prediction (aiplatform.Prediction) Prediction with returned predictions, explanations, and Model ID.

health_check

health_check() -> bool

Makes a request to this PrivateEndpoint's health check URI. Must be called from within the network that this PrivateEndpoint is peered to.

Example Usage:

if my_private_endpoint.health_check():
    print("PrivateEndpoint is healthy!")

Exceptions
Type Description
RuntimeError If a model has not been deployed a request cannot be made.
Returns
Type Description
bool Whether calls can be made to this PrivateEndpoint.

list

list(
    filter: typing.Optional[str] = None,
    order_by: typing.Optional[str] = None,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
) -> typing.List[google.cloud.aiplatform.models.PrivateEndpoint]

List all PrivateEndpoint resource instances.

Example Usage: my_private_endpoints = aiplatform.PrivateEndpoint.list()

or

my_private_endpoints = aiplatform.PrivateEndpoint.list(
    filter='labels.my_label="my_label_value" OR display_name!="old_endpoint"',
)
Parameters
Name Description
filter str

Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.

order_by str

Optional. A comma-separated list of fields to order by, sorted in ascending order. Use "desc" after a field name for descending. Supported fields: display_name, create_time, update_time

project str

Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.

location str

Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.

credentials auth_credentials.Credentials

Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.

Returns
Type Description
List[models.PrivateEndpoint] A list of PrivateEndpoint resource objects.
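
The order_by string described above can be assembled from field names. This is a minimal sketch, assuming a hypothetical helper named build_order_by. The exact separator spacing the service accepts is an assumption; this version joins fields with a bare comma.

```python
from typing import Iterable, Tuple

# Fields listed as supported in the order_by description.
SUPPORTED_ORDER_FIELDS = ("display_name", "create_time", "update_time")


def build_order_by(fields: Iterable[Tuple[str, bool]]) -> str:
    """Build an order_by string from (field, descending) pairs."""
    parts = []
    for field, descending in fields:
        if field not in SUPPORTED_ORDER_FIELDS:
            raise ValueError(f"Unsupported order_by field: {field}")
        # Ascending is the default; "desc" after the name reverses it.
        parts.append(f"{field} desc" if descending else field)
    return ",".join(parts)
```

The resulting string could then be passed as the order_by argument of PrivateEndpoint.list().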

list_models

list_models() -> (
    typing.List[google.cloud.aiplatform_v1.types.endpoint.DeployedModel]
)

Returns a list of the models deployed to this Endpoint.

Returns
Type Description
deployed_models (List[aiplatform.gapic.DeployedModel]) A list of the models deployed in this Endpoint.

predict

predict(
    instances: typing.List, parameters: typing.Optional[typing.Dict] = None
) -> google.cloud.aiplatform.models.Prediction

Make a prediction against this PrivateEndpoint using an HTTP request. This method must be called within the network the PrivateEndpoint is peered to. Otherwise, the predict() call will fail with error code 404. To check, use PrivateEndpoint.network.

Example usage:

response = my_private_endpoint.predict(instances=[...])
my_predictions = response.predictions

Parameters
Name Description
instances List

Required. The instances that are the input to the prediction call. Instance types must be JSON serializable. A DeployedModel may have an upper limit on the number of instances it supports per request. When that limit is exceeded, the prediction call errors for AutoML Models; for customer-created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via the instance_schema_uri of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Endpoint's DeployedModels' Model (google.cloud.aiplatform.v1beta1.DeployedModel.model).

parameters Dict

Optional. The parameters that govern the prediction. The schema of the parameters may be specified via the parameters_schema_uri of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Endpoint's DeployedModels' Model (google.cloud.aiplatform.v1beta1.DeployedModel.model).

Exceptions
Type Description
RuntimeError If a model has not been deployed a request cannot be made.
Returns
Type Description
prediction (aiplatform.Prediction) Prediction object with returned predictions and Model ID.
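
Because a DeployedModel may cap the number of instances per request, a caller may want to split a large input into smaller requests. The following is a minimal sketch, assuming a hypothetical helper named chunk_instances. The per-request limit is not exposed by this class, so max_per_request is a value the caller has to supply.

```python
from typing import Iterator, List


def chunk_instances(instances: List, max_per_request: int) -> Iterator[List]:
    """Yield consecutive slices of instances, each at most max_per_request long."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1.")
    for start in range(0, len(instances), max_per_request):
        yield instances[start : start + max_per_request]
```

Each yielded chunk could then be passed as the instances argument of a separate predict() call.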

predict_async

predict_async(
    instances: typing.List,
    *,
    parameters: typing.Optional[typing.Dict] = None,
    timeout: typing.Optional[float] = None
) -> google.cloud.aiplatform.models.Prediction

Make an asynchronous prediction against this Endpoint. Example usage:

response = await my_endpoint.predict_async(instances=[...])
my_predictions = response.predictions
Parameters
Name Description
instances List

Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request. When that limit is exceeded, the prediction call errors for AutoML Models; for customer-created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via the instance_schema_uri of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Endpoint's DeployedModels' Model (google.cloud.aiplatform.v1beta1.DeployedModel.model).

parameters Dict

Optional. The parameters that govern the prediction. The schema of the parameters may be specified via the parameters_schema_uri of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Endpoint's DeployedModels' Model (google.cloud.aiplatform.v1beta1.DeployedModel.model).

timeout float

Optional. The timeout for this request in seconds.

Returns
Type Description
prediction (aiplatform.Prediction) Prediction with returned predictions and Model ID.
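
Since predict_async is awaitable, several requests can be issued concurrently with asyncio. The sketch below uses a hypothetical StubEndpoint class that only mimics the predict_async signature shown above, so that it runs without network access. The stub returns plain lists rather than a Prediction object.

```python
import asyncio
from typing import Any, Dict, List, Optional


class StubEndpoint:
    """Hypothetical stand-in that mimics the predict_async signature."""

    async def predict_async(
        self,
        instances: List,
        *,
        parameters: Optional[Dict] = None,
        timeout: Optional[float] = None,
    ) -> List[Any]:
        await asyncio.sleep(0)  # Yield control, as a real network call would.
        # A real endpoint returns a Prediction; the stub returns a plain list.
        return [f"prediction-for-{instance}" for instance in instances]


async def predict_batches(endpoint: Any, batches: List[List]) -> List[Any]:
    """Send every batch concurrently and return results in batch order."""
    calls = [endpoint.predict_async(instances=batch) for batch in batches]
    return await asyncio.gather(*calls)


results = asyncio.run(predict_batches(StubEndpoint(), [[1, 2], [3]]))
```

asyncio.gather preserves the order of the awaitables it is given, so results line up with the input batches.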

raw_predict

raw_predict(
    body: bytes, headers: typing.Dict[str, str]
) -> requests.models.Response

Make a prediction request using arbitrary headers. This method must be called within the network the PrivateEndpoint is peered to. Otherwise, the predict() call will fail with error code 404. To check, use PrivateEndpoint.network.

Example usage:

my_endpoint = aiplatform.PrivateEndpoint(ENDPOINT_ID)
response = my_endpoint.raw_predict(
    body=b'{"instances":[{"feat_1":val_1, "feat_2":val_2}]}',
    headers={'Content-Type':'application/json'},
)
status_code = response.status_code
results = json.loads(response.text)

Parameters
Name Description
body bytes

The body of the prediction request in bytes. This must not exceed 1.5 MB per request.

headers Dict[str, str]

The header of the request as a dictionary. There are no restrictions on the header.
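
The body and headers arguments can be prepared with the standard library. This is a minimal sketch, assuming a hypothetical helper named build_raw_predict_request. Reading "1.5 MB" as 1.5 * 1024 * 1024 bytes is an assumption, and the service may count the limit differently.

```python
import json
from typing import Any, Dict, List, Tuple

# Assumption: "1.5 MB" is read here as 1.5 * 1024 * 1024 bytes.
MAX_BODY_BYTES = int(1.5 * 1024 * 1024)


def build_raw_predict_request(
    instances: List[Dict[str, Any]],
) -> Tuple[bytes, Dict[str, str]]:
    """Serialize instances to a JSON body and pair it with JSON headers."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"Body is {len(body)} bytes, limit is {MAX_BODY_BYTES}.")
    headers = {"Content-Type": "application/json"}
    return body, headers
```

The returned pair could then be passed as raw_predict(body=body, headers=headers).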

to_dict

to_dict() -> typing.Dict[str, typing.Any]

Returns the resource proto as a dictionary.

undeploy

undeploy(deployed_model_id: str, sync=True) -> None

Undeploys a deployed model from the PrivateEndpoint.

Example Usage:

my_private_endpoint.undeploy(
    deployed_model_id="1234567891232567891"
)

or

my_deployed_model_id = my_private_endpoint.list_models()[0].id
my_private_endpoint.undeploy(
    deployed_model_id=my_deployed_model_id
)
Parameters
Name Description
deployed_model_id str

Required. The ID of the DeployedModel to be undeployed from the PrivateEndpoint. Use PrivateEndpoint.list_models() to get the deployed model ID.

sync bool

Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed.

undeploy_all

undeploy_all(sync: bool = True) -> google.cloud.aiplatform.models.Endpoint

Undeploys every model deployed to this Endpoint.

Parameter
Name Description
sync bool

Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed.

update

update(
    display_name: typing.Optional[str] = None,
    description: typing.Optional[str] = None,
    labels: typing.Optional[typing.Dict[str, str]] = None,
    traffic_split: typing.Optional[typing.Dict[str, int]] = None,
    request_metadata: typing.Optional[typing.Sequence[typing.Tuple[str, str]]] = (),
    update_request_timeout: typing.Optional[float] = None,
) -> google.cloud.aiplatform.models.Endpoint

Updates an endpoint.

Example usage:

my_endpoint = my_endpoint.update(
    display_name='my-updated-endpoint',
    description='my updated description',
    labels={'key': 'value'},
    traffic_split={
        '123456': 20,
        '234567': 80,
    },
)

Parameters
Name Description
display_name str

Optional. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.

description str

Optional. The description of the Endpoint.

labels Dict[str, str]

Optional. The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

traffic_split Dict[str, int]

Optional. A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.

request_metadata Sequence[Tuple[str, str]]

Optional. Strings which should be sent along with the request as metadata.

update_request_timeout float

Optional. The timeout for the update request in seconds.

Exceptions
Type Description
ValueError If labels is not in the correct format.
Returns
Type Description
Endpoint (aiplatform.Endpoint) Updated endpoint resource.

wait

wait()

Helper method that blocks until all futures are complete.