This page describes troubleshooting steps that you might find helpful if you run into problems when you use Vertex AI Workbench.
See also Troubleshooting Vertex AI for help using other components of Vertex AI.
Vertex AI Workbench instances
This section describes troubleshooting steps for Vertex AI Workbench instances.
Connecting to and opening JupyterLab
This section describes troubleshooting steps for connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
Can't access the terminal in a Vertex AI Workbench instance
Issue
If you're unable to access the terminal or can't find the terminal window in the launcher, it could be because your Vertex AI Workbench instance doesn't have terminal access enabled.
Solution
You must create a new Vertex AI Workbench instance with the Terminal access option enabled. This option can't be changed after instance creation.
502 error when opening JupyterLab
Issue
A 502 error might mean that your Vertex AI Workbench instance isn't ready yet.
Solution
Wait a few minutes, refresh the Google Cloud console browser tab, and try again.
Notebook is unresponsive
Issue
Your Vertex AI Workbench instance isn't running cells or appears to be frozen.
Solution
First try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- Reset your instance.
Unable to connect with Vertex AI Workbench instance using SSH
Issue
You're unable to connect to your instance by using SSH through a terminal window.
Vertex AI Workbench instances use OS Login to enable SSH access. When you create an instance, Vertex AI Workbench enables OS Login by default by setting the metadata key enable-oslogin to TRUE. If you're unable to use SSH to connect to your instance, this metadata key might need to be set to TRUE.
Solution
Connecting to a Vertex AI Workbench instance by using the Google Cloud console isn't supported. If you're unable to connect to your instance by using SSH through a terminal window, see the following:
To set the metadata key enable-oslogin to TRUE, use the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
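For example, the metadata key might be set with a gcloud command like the following sketch. The instance name and location are placeholders; verify the exact flag names against the gcloud workbench instances update reference for your SDK version:

```shell
# Hypothetical instance name and location; replace with your own values.
INSTANCE=my-instance
LOCATION=us-central1-a

# Sketch: set enable-oslogin=TRUE on the instance so that OS Login-based
# SSH access works (flag names assumed; check your gcloud version).
gcloud workbench instances update "${INSTANCE}" \
  --location="${LOCATION}" \
  --metadata=enable-oslogin=TRUE
```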
GPU quota has been exceeded
Issue
You're unable to create a Vertex AI Workbench instance with GPUs.
Solution
Determine the number of GPUs available in your project by checking the quotas page. If GPUs aren't listed on the quotas page, or you require additional GPU quota, you can request a quota increase. See Request a higher quota limit.
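One way to inspect GPU quota from the command line is to list a region's quotas and filter for GPU entries; this is a sketch using a placeholder region:

```shell
# Sketch: show GPU-related quota usage and limits for a region.
# us-central1 is a placeholder; use the region where you create instances.
gcloud compute regions describe us-central1 \
  --flatten="quotas" \
  --format="table(quotas.metric,quotas.usage,quotas.limit)" \
  | grep -i gpu
```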
Creating Vertex AI Workbench instances
This section describes how to troubleshoot issues related to creating Vertex AI Workbench instances.
Instance stays in pending state indefinitely or is stuck in provisioning status
Issue
After creating a Vertex AI Workbench instance, it stays in the pending state indefinitely. An error like the following might appear in the serial logs:
Could not resolve host: notebooks.googleapis.com
If your instance is stuck in provisioning status, this could be because you have an invalid private networking configuration for your instance.
Solution
Follow the steps in the Instance logs show connection or timeout errors section.
Unable to create an instance within a Shared VPC network
Issue
Attempting to create an instance within a Shared VPC network results in an error message like the following:
Required 'compute.subnetworks.use' permission for 'projects/network-administration/regions/us-central1/subnetworks/v'
Solution
The issue is that the Notebooks Service Account is attempting to create the instance without the correct permissions.
To ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network, ask your administrator to grant the Notebooks Service Account the Compute Network User role (roles/compute.networkUser) on the host project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to ensure that the Notebooks Service Account can create a Vertex AI Workbench instance within a Shared VPC network:
- To use subnetworks: compute.subnetworks.use
Your administrator might also be able to give the Notebooks Service Account these permissions with custom roles or other predefined roles.
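The role grant above can be sketched as a gcloud command; the host project ID and the service-account address are placeholders (look up the actual Notebooks service agent address in your IAM page):

```shell
# Hypothetical host project and Notebooks service agent; replace both.
HOST_PROJECT=network-administration
NOTEBOOKS_SA=service-123456789@gcp-sa-notebooks.iam.gserviceaccount.com

# Sketch: grant Compute Network User on the Shared VPC host project.
gcloud projects add-iam-policy-binding "${HOST_PROJECT}" \
  --member="serviceAccount:${NOTEBOOKS_SA}" \
  --role="roles/compute.networkUser"
```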
Can't create a Vertex AI Workbench instance with a custom container
Issue
There isn't an option to use a custom container when creating a Vertex AI Workbench instance in the Google Cloud console.
Solution
Adding a custom container to a Vertex AI Workbench instance isn't supported, and you can't add a custom container by using the Google Cloud console.
Adding a conda environment is recommended instead of using a custom container.
You can add a custom container to a Vertex AI Workbench instance by using the Notebooks API, but this capability isn't supported.
Mount shared storage button isn't there
Issue
The Mount shared storage button isn't in the File Browser tab of the JupyterLab interface.
Solution
The storage.buckets.list permission is required for the Mount shared storage button to appear in the JupyterLab interface of your Vertex AI Workbench instance. Ask your administrator to grant your Vertex AI Workbench instance's service account the storage.buckets.list permission on the project.
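One way to grant just this permission is through a minimal custom role; the following sketch uses placeholder project and service-account names:

```shell
# Hypothetical project and instance service account; replace both.
PROJECT=my-project
INSTANCE_SA=my-instance-sa@my-project.iam.gserviceaccount.com

# Sketch: create a custom role carrying only storage.buckets.list,
# then grant it to the instance's service account on the project.
gcloud iam roles create bucketLister \
  --project="${PROJECT}" \
  --permissions=storage.buckets.list

gcloud projects add-iam-policy-binding "${PROJECT}" \
  --member="serviceAccount:${INSTANCE_SA}" \
  --role="projects/${PROJECT}/roles/bucketLister"
```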
599 error when using Dataproc
Issue
Attempting to create a Dataproc-enabled instance results in an error message like the following:
HTTP 599: Unknown (Error from Gateway: [Timeout while connecting] Exception while attempting to connect to Gateway server url. Ensure gateway url is valid and the Gateway instance is running.)
Solution
In your Cloud DNS configuration, add a Cloud DNS entry for the *.googleusercontent.com domain.
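A common way to do this is a private Cloud DNS zone covering the domain, sketched below. The zone and network names are placeholders, and the record values you add afterward depend on whether you route through private.googleapis.com or restricted.googleapis.com:

```shell
# Sketch: create a private Cloud DNS zone for *.googleusercontent.com
# visible to your VPC (zone name and network are placeholders).
gcloud dns managed-zones create googleusercontent \
  --dns-name="googleusercontent.com." \
  --visibility=private \
  --networks=my-vpc \
  --description="Private zone for *.googleusercontent.com"
```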
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in Vertex AI Workbench instances.
Unable to edit underlying virtual machine
Issue
When you try to edit the underlying virtual machine (VM) of a Vertex AI Workbench instance, you might get an error message similar to the following:
Current principal doesn't have permission to mutate this resource.
Solution
This error occurs because you can't edit the underlying VM of an instance by using the Google Cloud console or the Compute Engine API.
To edit a Vertex AI Workbench instance's underlying VM, use the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
pip packages aren't available after adding conda environment
Issue
Your pip packages aren't available after you add a conda-based kernel.
Solution
To resolve the issue, see Add a conda environment and try the following:
- Check that you used the DL_ANACONDA_ENV_HOME variable and that it contains the name of your environment.
- Check that pip is located in a path similar to /opt/conda/envs/ENVIRONMENT/bin/pip. You can run the which pip command to get the path.
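The path check can be combined into a short terminal snippet; ENVIRONMENT is a placeholder for your conda environment's name:

```shell
# Hypothetical environment name; replace with your own.
ENVIRONMENT=myenv
EXPECTED="/opt/conda/envs/${ENVIRONMENT}/bin/pip"

# Where does pip currently resolve? (empty if pip isn't on PATH)
ACTUAL="$(command -v pip || true)"

if [ "${ACTUAL}" = "${EXPECTED}" ]; then
  echo "pip resolves to the conda environment: ${ACTUAL}"
else
  echo "pip resolves to '${ACTUAL}'; expected ${EXPECTED}"
fi
```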
Unable to access or copy data of an instance with single user access
Issue
The data on an instance with single user access is inaccessible.
For Vertex AI Workbench instances that are set up with single user access, only the specified single user (the owner) can access the data on the instance.
Solution
To access or copy the data when you aren't the owner of the instance, open a support case.
Unexpected shutdown
Issue
Your Vertex AI Workbench instance shuts down unexpectedly.
Solution
If your instance shuts down unexpectedly, this could be because idle shutdown was initiated.
If you enabled idle shutdown, your instance shuts down when there is no kernel activity for the specified time period. For example, running a cell or new output printing to a notebook is activity that resets the idle timeout timer. CPU usage doesn't reset the idle timeout timer.
Instance logs show connection or timeout errors
Issue
Your Vertex AI Workbench instance's logs show connection or timeout errors.
Solution
If you notice connection or timeout errors in the instance's logs, make sure that the Jupyter server is running on port 8080. Follow the steps in the Verify that the Jupyter internal API is active section.
If you have turned off External IP and you are using a private VPC network, make sure that you have also followed the network configuration options documentation.
Consider the following:
You must enable Private Google Access on the chosen subnetwork in the same region where your instance is located in the VPC host project. For more information on configuring Private Google Access, see the Private Google Access documentation.
If you're using Cloud DNS, the instance must be able to resolve the required Cloud DNS domains specified in the network configuration options documentation. To verify this, follow the steps in the Verify the instance can resolve the required DNS domains section.
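The Private Google Access check can be sketched with gcloud, followed by a DNS lookup from a terminal inside the instance; the subnet and region names are placeholders:

```shell
# Sketch: enable Private Google Access on the subnet that the instance
# uses (subnet and region are placeholders; run in the VPC host project).
gcloud compute networks subnets update my-subnet \
  --region=us-central1 \
  --enable-private-ip-google-access

# Then, from a terminal inside the instance, verify that required
# domains resolve:
nslookup notebooks.googleapis.com
```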
Instance logs show 'Unable to contact Jupyter API' 'ReadTimeoutError'
Issue
Your Vertex AI Workbench instance logs show an error such as:
notebooks_collection_agent. Unable to contact Jupyter API:
HTTPConnectionPool(host=\'127.0.0.1\', port=8080):
Max retries exceeded ReadTimeoutError(\"HTTPConnectionPool(host=\'127.0.0.1\', port=8080
Solution
Follow the steps in the Instance logs show connection or timeout errors section.
You can also try modifying the Notebooks Collection Agent script to change HTTP_TIMEOUT_SESSION to a larger value, for example 60, to help verify whether the request failed because the call took too long to respond or because the requested URL can't be reached.
Managed notebooks
This section describes troubleshooting steps for managed notebooks.
Connecting to and opening JupyterLab
This section describes troubleshooting issues with connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
Unable to connect with managed notebooks instance using SSH
Issue
There isn't an option to connect with managed notebooks instances by using SSH.
Solution
SSH access to managed notebooks instances isn't available.
Can't access the terminal in a managed notebooks instance
Issue
If you're unable to access the terminal or can't find the terminal window in the launcher, it could be because your managed notebooks instance doesn't have terminal access enabled.
Solution
You must create a new managed notebooks instance with the Terminal access option enabled. This option can't be changed after instance creation.
502 error when opening JupyterLab
Issue
A 502 error might mean that your managed notebooks instance isn't ready yet.
Solution
Wait a few minutes, refresh the Google Cloud console browser tab, and try again.
Opening a notebook results in a 524 (A Timeout Occurred) error
Issue
A 524 error is usually an indication that the Inverting Proxy agent isn't connecting to the Inverting Proxy server or the requests are taking too long on the backend server side (Jupyter). Common causes of this error include networking issues, the Inverting Proxy agent isn't running, or the Jupyter service isn't running.
Solution
Verify that your managed notebooks instance is started.
Notebook is unresponsive
Issue
Your managed notebooks instance isn't running cells or appears to be frozen.
Solution
First try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- Reset your instance.
Migrating to Vertex AI Workbench instances
This section describes methods for diagnosing and resolving issues with migrating from a managed notebooks instance to a Vertex AI Workbench instance.
Can't find a kernel that was in the managed notebooks instance
Issue
A kernel that was in your managed notebooks instance doesn't appear in the Vertex AI Workbench instance that you migrated to.
Custom containers appear as kernels in managed notebooks. The Vertex AI Workbench migration tool doesn't support custom container migration.
Solution
To resolve this issue, add a conda environment to your Vertex AI Workbench instance.
Different version of framework in migrated instance
Issue
A framework that was in your managed notebooks instance was a different version than the one in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances provide a default set of framework versions. The migration tool doesn't add framework versions from your original managed notebooks instance. See default migration tool behaviors.
Solution
To add a specific version of a framework, add a conda environment to your Vertex AI Workbench instance.
GPUs aren't migrated to the new Vertex AI Workbench instance
Issue
GPUs that were in your managed notebooks instance aren't in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances support a default set of GPUs. If the GPUs in your original managed notebooks instance aren't available, your instance is migrated without any GPUs.
Solution
After migration, you can add GPUs to your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
Migrated instance's machine type is different
Issue
The machine type of your managed notebooks instance is different from the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances don't support all machine types. If the machine type in your original managed notebooks instance isn't available, your instance is migrated to the e2-standard-4 machine type.
Solution
After migration, you can change the machine type of your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
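A machine-type change might look like the following sketch; the instance name, location, and machine type are placeholders, and the flag names should be verified against the gcloud workbench instances update reference:

```shell
# Hypothetical values; replace with your instance's name, location, and
# the machine type you want (flag names assumed; check your gcloud version).
gcloud workbench instances update my-instance \
  --location=us-central1-a \
  --machine-type=n2-standard-8
```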
GPU quota has been exceeded
Issue
You are unable to create a managed notebooks instance with GPUs.
Solution
Determine the number of GPUs available in your project by checking the quotas page. If GPUs aren't listed on the quotas page, or you require additional GPU quota, you can request a quota increase. See Request a higher quota limit.
Using container images
This section describes troubleshooting issues with using container images.
Container image doesn't appear as a kernel in JupyterLab
Issue
Container images that don't have a valid kernelspec don't successfully load as kernels in JupyterLab.
Solution
Make sure that your container meets our requirements. For more information, see the custom container requirements.
Notebook disconnects on long-running job
Issue
If you see the following error message when running a job in a notebook, it might be caused by the request taking too long to load, or by high CPU or memory utilization, which can make the Jupyter service unresponsive.
{"log":"2021/06/29 18:10:33 failure fetching a VM ID: compute: Received 500
`internal error`\n","stream":"stderr","time":"2021-06-29T18:10:33.383650241Z"}
{"log":"2021/06/29 18:38:26 Websocket failure: failed to read a websocket
message from the server: read tcp [::1]:40168-\u003e[::1]:8080: use of closed
network connection\n","stream":"stderr","time":"2021-06-29T18:38:26.057622824Z"}
Solution
This issue is caused by running a long-running job within a notebook. To run a job that might take a long time to complete, it's recommended to use the executor.
Using the executor
This section describes troubleshooting issues with using the executor.
Package installations not available to the executor
Issue
The executor runs your notebook code in a separate environment from the kernel where you run your notebook file's code. Because of this, some of the packages you installed might not be available in the executor's environment.
Solution
To resolve this issue, see Ensure package installations are available to the executor.
401 or 403 errors when running the notebook code using the executor
Issue
A 401 or 403 error when you run the executor can mean that the executor isn't able to access resources.
Solution
See the following for possible causes:
The executor runs your notebook code in a tenant project separate from your managed notebooks instance's project. Therefore, when you access resources through code run by the executor, the executor might not connect to the correct Google Cloud project by default. To resolve this issue, use explicit project selection.
By default, your managed notebooks instance can have access to resources that exist in the same project, and therefore, when you run your notebook file's code manually, these resources don't need additional authentication. However, because the executor runs in a separate tenant project, it does not have the same default access. To resolve this issue, authenticate access using service accounts.
The executor can't use end-user credentials to authenticate access to resources, for example, credentials from the gcloud auth login command. To resolve this issue, authenticate access using service accounts.
exited with a non-zero status of 127 error when using the executor
Issue
An exited with a non-zero status of 127 error, or "command not found" error, can happen when you use the executor to run code on a custom container that doesn't have the nbexecutor extension installed.
Solution
To ensure that your custom container has the nbexecutor extension, you can create a derivative container image from a Deep Learning Containers image. Deep Learning Containers images include the nbexecutor extension.
Invalid service networking configuration error message
Issue
This error might look like the following:
Invalid Service Networking configuration. Couldn't find free blocks in allocated IP ranges.
Please use a valid range using: /24 mask or below (/23,/22, etc).
This means that no free blocks were found in the allocated IP ranges of your network.
Solution
Use a subnet mask of /24 or lower. Create a bigger allocated IP address range and attach this range by modifying the private service connection for servicenetworking-googleapis-com.
For more information, see Set up a network.
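These two steps can be sketched with gcloud; the range and network names are placeholders, and /22 is just an example of a range larger than /24 (verify the vpc-peerings flags against your gcloud version):

```shell
# Sketch: create a larger allocated range (/22 as an example) for
# private services access; names are placeholders.
gcloud compute addresses create my-allocated-range \
  --global \
  --purpose=VPC_PEERING \
  --prefix-length=22 \
  --network=my-vpc

# Sketch: attach the range to the existing private service connection.
gcloud services vpc-peerings update \
  --service=servicenetworking.googleapis.com \
  --network=my-vpc \
  --ranges=my-allocated-range
```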
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in managed notebooks instances.
Unable to access or copy data of an instance with single user access
Issue
The data on an instance with single user access is inaccessible.
Solution
For managed notebooks instances that are set up with single user access, only the specified single user (the owner) can access the data on the instance.
To access or copy the data when you aren't the owner of the instance, open a support case.
Unexpected shutdown
Issue
Your Vertex AI Workbench instance shuts down unexpectedly.
Solution
If your instance shuts down unexpectedly, this could be because idle shutdown was initiated.
If you enabled idle shutdown, your instance shuts down when there is no kernel activity for the specified time period. For example, running a cell or new output printing to a notebook is activity that resets the idle timeout timer. CPU usage doesn't reset the idle timeout timer.
Restore instance
Issue
Restoring a managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub.
Recover data from an instance
Issue
Recovering data from a managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub.
Creating managed notebooks instances
This section describes troubleshooting issues with creating managed notebooks instances.
Error: Problem while creating a connection
Issue
You encounter this error while creating an instance:
We encountered a problem while creating a connection.
Service 'servicenetworking.googleapis.com' requires at least
one allocated range to have minimal size; please make sure
at least one allocated range will have prefix length at most '24'.
Solution
Create an allocated IP range bigger than /24 and attach this range by modifying the private service connection for the servicenetworking-googleapis-com connection.
Creating an instance results in a resource availability error
Issue
You're unable to create an instance because of a resource availability error.
This error can look like the following:
Creating notebook INSTANCE_NAME: ZONE does not have enough resources available to fulfill the request. Retry later or try another zone in your configurations.
Resource errors occur when you request new resources in a zone that can't accommodate your request due to the current unavailability of Compute Engine resources, such as GPUs or CPUs.
Resource errors only apply to new resource requests in the zone and don't affect existing resources. Resource errors aren't related to your Compute Engine quota. Resource errors are temporary and can change frequently based on fluctuating demand.
Solution
To proceed, try the following:
- Create an instance with a different machine type.
- Create the instance in a different zone.
- Attempt the request again later.
- Reduce the amount of resources that you're requesting. For example, try to create an instance with fewer GPUs, disks, or vCPUs, or less memory.
Starting an instance results in a resource availability error
Issue
You're unable to start an instance because of a resource availability error.
This error can look like the following:
The zone ZONE_NAME doesn't have enough resources available to fulfill the request. '(resource type:compute)'.
Resource errors occur when you try to start an instance in a zone that can't accommodate your request due to the current unavailability of Compute Engine resources, such as GPUs or CPUs.
Resource errors only apply to the resources you specified in your request at the time you sent the request, not to all resources in the zone. Resource errors aren't related to your Compute Engine quota. Resource errors are temporary and can change frequently based on fluctuating demand.
Solution
To proceed, try the following:
- Change the machine type of your instance.
- Migrate your files and data to an instance in a different zone.
- Attempt the request again later.
- Reduce the amount of resources that you're requesting. For example, start a different instance with fewer GPUs, disks, or vCPUs, or less memory.
No route to host on outbound connections from managed notebooks
Issue
Typically, the only routes you can see in the Google Cloud console are those known to your own VPC as well as the ranges reserved when you complete the VPC Network Peering configuration.
Managed notebooks instances reside in a Google-managed network and run a modified version of Jupyter in a Docker networking namespace within the instance.
The Docker network interface and Linux bridge on this instance may select a local IP that conflicts with IP ranges being exported over the peering by your VPC. These are typically in the 172.16.0.0/16 and 192.168.10.0/24 ranges, respectively.
In these circumstances, outbound connections from the instance to these ranges fail with some variation of a No route to host error, despite VPC routes being correctly shared.
Solution
Invoke ifconfig in a terminal session and ensure that no IP addresses on any virtual interfaces in the instance conflict with IP ranges that your VPC is exporting to the peering connection.
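This check can be scripted; the exported ranges below are examples taken from the typical conflict ranges above, and you should substitute the ranges your own VPC exports:

```shell
# Example prefixes to check against; replace with the ranges your VPC
# exports over the peering connection.
EXPORTED_RANGES="172.16. 192.168.10."

# Collect local interface addresses (works with either ip or ifconfig).
ADDRS="$( (ip -o addr show 2>/dev/null || ifconfig 2>/dev/null) \
  | grep -o 'inet [0-9.]*' | awk '{print $2}' )"

# Flag any local address that falls inside an exported prefix.
for prefix in ${EXPORTED_RANGES}; do
  for addr in ${ADDRS}; do
    case "${addr}" in
      ${prefix}*) echo "conflict: local address ${addr} overlaps ${prefix}*" ;;
    esac
  done
done
echo "check complete"
```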
User-managed notebooks
This section describes troubleshooting steps for user-managed notebooks.
Connecting to and opening JupyterLab
This section describes troubleshooting issues with connecting to and opening JupyterLab.
Nothing happens after clicking Open JupyterLab
Issue
When you click Open JupyterLab, nothing happens.
Solution
Verify that your browser doesn't block new tabs from opening automatically. JupyterLab opens in a new browser tab.
No Inverting Proxy server access to JupyterLab
Issue
You are unable to access JupyterLab.
Vertex AI Workbench uses a Google internal Inverting Proxy server to provide access to JupyterLab. User-managed notebooks instance settings, network configuration, and other factors can prevent access to JupyterLab.
Solution
Unable to connect with user-managed notebooks instance using SSH
Issue
You're unable to connect to your instance by using SSH through a terminal window.
User-managed notebooks instances use OS Login to enable SSH access. When you create an instance, Vertex AI Workbench enables OS Login by default by setting the metadata key enable-oslogin to TRUE. If you're unable to use SSH to connect to your instance, this metadata key might need to be set to TRUE.
Solution
To enable SSH access for user-managed notebooks for users, complete the steps for configuring OS Login roles on user accounts.
Opening a user-managed notebooks instance results in a 403 (Forbidden) error
Issue
A 403 (Forbidden) error when opening a user-managed notebooks instance often means that there is an access issue.
Solution
To troubleshoot access issues, consider the three ways that access can be granted to a user-managed notebooks instance:
- Single user
- Service account
- Project editors
The access mode is configured during user-managed notebooks instance creation and it is defined in the notebook metadata:
- Single user: proxy-mode=mail, proxy-user-mail=user@domain.com
- Service account: proxy-mode=service_account
- Project editors: proxy-mode=project_editors
If you can't access a notebook when you click Open JupyterLab, try the following:
Verify that the user accessing the instance has the iam.serviceAccounts.actAs permission for the instance's service account. The service account is either the Compute Engine default service account or a service account that is specified when the instance is created.
If your instance uses single user access with a specified service account as the single user, see No JupyterLab access, single user mode enabled.
The following example shows how to specify a service account when you create an instance:
gcloud notebooks instances create nb-1 \
  --vm-image-family=tf-latest-cpu \
  --metadata=proxy-mode=mail,proxy-user-mail=user@domain.com \
  --service-account=your_service_account@project_id.iam.gserviceaccount.com \
  --location=us-west1-a
When you click Open JupyterLab to open a notebook, the notebook opens in a new browser tab. If you are signed in to more than one Google Account, the new tab opens with your default Google Account. If you didn't create your user-managed notebooks instance with your default Google Account, the new browser tab shows a 403 (Forbidden) error.
No JupyterLab access, single user mode enabled
Issue
You are unable to access JupyterLab.
Solution
If a user is unable to access JupyterLab and the instance's access to JupyterLab is set to Single user only, try the following:
On the User-managed notebooks page of the Google Cloud console, click the name of your instance to open the Notebook details page.
Next to View VM details, click View in Compute Engine.
On the VM details page, click Edit.
In the Metadata section, verify that the proxy-mode metadata entry is set to mail.
Verify that the proxy-user-mail metadata entry is set to a valid user email address, not a service account.
Click Save.
On the User-managed notebooks page of the Google Cloud console, initialize the updated metadata by stopping your instance and starting the instance back up again.
Opening a notebook results in a 504 (Gateway Timeout) error
Issue
This is an indication of an internal proxy timeout or a backend server (Jupyter) timeout. This can be seen when:
- The request never reached the internal Inverting Proxy server
- Backend (Jupyter) returns a 504 error.
Solution
Open a Google support case.
Opening a notebook results in a 524 (A Timeout Occurred) error
Issue
The internal Inverting Proxy server hasn't received a response from the Inverting Proxy agent for the request within the timeout period. Inverting Proxy agent runs inside your user-managed notebooks instance as a Docker container. A 524 error is usually an indication that the Inverting Proxy agent isn't connecting to the Inverting Proxy server or the requests are taking too long on the backend server side (Jupyter). A typical case for this error is on the user side (for example, a networking issue, or the Inverting Proxy agent service isn't running).
Solution
If you can't access a notebook, verify that your user-managed notebooks instance is started and try the following:
Option 1: Run the diagnostic tool to automatically check and repair user-managed notebooks core services, verify available storage, and generate useful log files. To run the tool in your instance, perform the following steps:
Make sure that your instance is on version M58 or newer.
Run the following command:
sudo /opt/deeplearning/bin/diagnostic_tool.sh [--repair] [--bucket=$BUCKET]
Note that the --repair and --bucket flags are optional. The --repair flag attempts to fix common core service errors, and the --bucket flag lets you specify a Cloud Storage bucket to store the created log files.
The output of this command displays useful status messages for user-managed notebooks core services and exports log files of its findings.
Option 2: Use the following steps to check specific user-managed notebooks requirements individually.
Verify that the user-managed notebooks instance disk isn't out of space.
Run the following command:
df -h -T /home/jupyter
If the Use% is more than 85%, you need to manually delete files from /home/jupyter. As a first step, you can empty the trash with the following command:
sudo rm -rf /home/jupyter/.local/share/Trash/*
Verify that the Inverting Proxy agent is running. If the agent is started, try restarting it.
Make sure the Jupyter service is running. If it is, try restarting it.
Verify memory utilization in the user-managed notebooks instance.
Run the following command:
free -t -h
If the used memory is more than 85% of the total, consider changing the machine type.
You can install the Cloud Monitoring agent to monitor whether there is high memory usage in your user-managed notebooks instance. See pricing information.
Verify that you are using Deep Learning VM version M55 or later. To learn more about upgrading, see Upgrade the environment of a user-managed notebooks instance.
Opening a notebook results in a 598 (Network read timeout) error
Issue
The Inverting Proxy server hasn't heard from the Inverting Proxy agent at all for more than 10 minutes. This is a strong indication of an Inverting Proxy agent issue.
Solution
If you can't access a notebook, try the following:
Verify that your user-managed notebooks instance is started.
Verify that the Inverting Proxy agent is running. If the agent is started, try restarting it.
Make sure the Jupyter service is running. If it is, try restarting it.
Verify that you are using Deep Learning VM version M55 or later. To learn more about upgrading, see Upgrade the environment of a user-managed notebooks instance.
Notebook is unresponsive
Issue
Your user-managed notebooks instance isn't running cells or appears to be frozen.
Solution
First try restarting the kernel by clicking Kernel from the top menu and then Restart Kernel. If that doesn't work, you can try the following:
- Refresh the JupyterLab browser page. Any unsaved cell output doesn't persist, so you must run those cells again to regenerate the output.
- From a terminal session in the notebook, run the command top to see if there are processes consuming the CPU.
- From the terminal, check the amount of free disk space using the command df, or check the available RAM using the command free.
- Shut your instance down by selecting it from the User-managed notebooks page and clicking Stop. After it has stopped completely, select it and click Start.
Migrating to Vertex AI Workbench instances
This section describes methods for diagnosing and resolving issues with migrating from a user-managed notebooks instance to a Vertex AI Workbench instance.
Can't find R, Beam, or other kernels that were in the user-managed notebooks instance
Issue
A kernel that was in your user-managed notebooks instance doesn't appear in the Vertex AI Workbench instance that you migrated to.
Some kernels, such as the R and Beam kernels, aren't available in Vertex AI Workbench instances by default. Migration of those kernels isn't supported.
Solution
To resolve this issue, add a conda environment to your Vertex AI Workbench instance.
Can't set up a Dataproc Hub instance in the Vertex AI Workbench instance
Issue
Dataproc Hub isn't supported in Vertex AI Workbench instances.
Solution
Continue to use Dataproc Hub in user-managed notebooks instances.
Different version of framework in migrated instance
Issue
A framework that was in your user-managed notebooks instance was a different version than the one in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances provide a default set of framework versions. The migration tool doesn't add framework versions from your original user-managed notebooks instance. See default migration tool behaviors.
Solution
To add a specific version of a framework, add a conda environment to your Vertex AI Workbench instance.
GPUs aren't migrated to the new Vertex AI Workbench instance
Issue
GPUs that were in your user-managed notebooks instance aren't in the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances support a default set of GPUs. If the GPUs in your original user-managed notebooks instance aren't available, your instance is migrated without any GPUs.
Solution
After migration, you can add GPUs to your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
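For example, a hedged sketch of the gcloud form; the accelerator flag names follow the gcloud workbench instances create surface, so verify them with gcloud workbench instances update --help before relying on them:

```shell
# Illustrative only: add one NVIDIA T4 GPU to a migrated instance.
# INSTANCE_NAME and ZONE are placeholders for your instance and its zone.
gcloud workbench instances update INSTANCE_NAME \
  --location=ZONE \
  --accelerator-type=NVIDIA_TESLA_T4 \
  --accelerator-core-count=1
```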
Migrated instance's machine type is different
Issue
The machine type of your user-managed notebooks instance is different from the Vertex AI Workbench instance that you migrated to.
Vertex AI Workbench instances don't support all machine types.
If the machine type in your original user-managed notebooks instance isn't available, your instance is migrated to the e2-standard-4 machine type.
Solution
After migration, you can change the machine type of your Vertex AI Workbench instance by using the projects.locations.instances.patch method in the Notebooks API or the gcloud workbench instances update command in the Google Cloud SDK.
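As an illustration, a hedged sketch of the gcloud form; verify the flag with gcloud workbench instances update --help before relying on it:

```shell
# Illustrative only: change a migrated instance's machine type.
# INSTANCE_NAME, ZONE, and the machine type are placeholders.
gcloud workbench instances update INSTANCE_NAME \
  --location=ZONE \
  --machine-type=n1-standard-8
```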
Working with files
This section describes troubleshooting issues with files for user-managed notebooks instances.
File downloading disabled but user can still download files
Issue
For Dataproc Hub user-managed notebooks instances, disabling file downloading from the JupyterLab user interface isn't supported. User-managed notebooks instances that use the Dataproc Hub framework permit file downloading even if you don't select Enable file downloading from JupyterLab UI when you create the instance.
Solution
Dataproc Hub user-managed notebooks instances don't support restricting file downloads.
Downloaded files are truncated or don't complete downloading
Issue
When you download files from your user-managed notebooks instance, a timeout setting on the proxy-forwarding agent limits the connection time for the download to complete. If the download takes too long, this can truncate your downloaded file or prevent it from being downloaded.
Solution
To download the file, copy your file to Cloud Storage, and then download the file from Cloud Storage.
Consider migrating your files and data to a new user-managed notebooks instance.
After restarting VM, local files can't be referenced from notebook terminal
Issue
Sometimes after restarting a user-managed notebooks instance, local files can't be referenced from within a notebook terminal.
Solution
This is a known issue. To reference your local files from within a notebook terminal, first re-establish your current working directory using the following command:
cd PWD
In this command, replace PWD with your current working directory. For example, if your current working directory was /home/jupyter/, use the command cd /home/jupyter/.
After re-establishing your current working directory, your local files can be referenced from within the notebook terminal.
Creating user-managed notebooks instances
This section describes troubleshooting issues with creating user-managed notebooks instances.
GPU quota has been exceeded
Issue
You are unable to create a user-managed notebooks instance with GPUs.
Solution
Determine the number of GPUs available in your project by checking the quotas page. If GPUs aren't listed on the quotas page, or you require additional GPU quota, you can request a quota increase. See Request a higher quota limit.
Instance stays in pending state indefinitely
Issue
After creating a user-managed notebooks instance, it stays in the pending state indefinitely. An error like the following might appear in the serial logs:
Could not resolve host: notebooks.googleapis.com
Solution
Your instance can't connect to the Notebooks API server due to a Cloud DNS configuration or other network issue. To resolve the issue, check your Cloud DNS and network configurations. For more information, see the network configuration options section.
New user-managed notebooks instance isn't created (insufficient permissions)
Issue
It usually takes about a minute to create a user-managed notebooks instance. If your new user-managed notebooks instance remains in the pending state indefinitely, it might be because the service account used to start the user-managed notebooks instance doesn't have the required permissions in your Google Cloud project.
You can start a user-managed notebooks instance with a custom service account that you create or in single-user mode with a user ID. If you start a user-managed notebooks instance in single-user mode, then your user-managed notebooks instance begins the boot process using the Compute Engine default service account before turning control over to your user ID.
Solution
To verify that a service account has the appropriate permissions, follow these steps:
Console
Open the IAM page in the Google Cloud console.
Determine the service account used with your user-managed notebooks instance, which is one of the following:
A custom service account that you specified when you created your user-managed notebooks instance.
The Compute Engine default service account for your Google Cloud project, which is used when you start your user-managed notebooks instance in single-user mode. The Compute Engine default service account for your Google Cloud project is named PROJECT_NUMBER-compute@developer.gserviceaccount.com. For example: 113377992299-compute@developer.gserviceaccount.com.
Verify that your service account has the Notebooks Runner (roles/notebooks.runner) role. If not, grant the service account the Notebooks Runner (roles/notebooks.runner) role.
For more information, see Granting, changing, and revoking access to resources in the IAM documentation.
gcloud
If you haven't already, install the Google Cloud CLI.
Get the name and project number for your Google Cloud project with the following command. Replace PROJECT_ID with the project ID for your Google Cloud project.
gcloud projects describe PROJECT_ID
You should see output similar to the following, which displays the name (name:) and project number (projectNumber:) for your project.
createTime: '2018-10-18T21:03:31.408Z'
lifecycleState: ACTIVE
name: my-project-name
parent:
  id: '396521612403'
  type: folder
projectId: my-project-id-1234
projectNumber: '113377992299'
Determine the service account used with your user-managed notebooks instance, which is one of the following:
A custom service account that you specified when you created your user-managed notebooks instance.
The Compute Engine default service account for your Google Cloud project, which is used when you start your user-managed notebooks instance in single-user mode. The Compute Engine default service account for your Google Cloud project is named PROJECT_NUMBER-compute@developer.gserviceaccount.com. For example: 113377992299-compute@developer.gserviceaccount.com.
Add the roles/notebooks.runner role to the service account with the following command. Replace project-name with the name of your project, and replace service-account-id with the service account ID for your user-managed notebooks instance.
gcloud projects add-iam-policy-binding project-name \
  --member serviceAccount:service-account-id \
  --role roles/notebooks.runner
Creating an instance results in a Permission denied error
Issue
The service account on the instance provides access to other Google Cloud services. You can use any service account within the same project, but you must have the Service Account User permission (iam.serviceAccounts.actAs) to create the instance. If not specified, the Compute Engine default service account is used.
Solution
When creating an instance, verify that the user creating the instance has the iam.serviceAccounts.actAs permission for the defined service account.
The following example shows how to specify a service account when you create an instance:
gcloud notebooks instances create nb-1 \
  --vm-image-family=tf-latest-cpu \
  --service-account=your_service_account@project_id.iam.gserviceaccount.com \
  --location=us-west1-a
To grant the Service Account User role, see Manage access to service accounts.
Creating an instance results in an already exists error
Issue
Creating an instance fails with an already exists error. This can happen when a user-managed notebooks instance with the same name was previously deleted by Compute Engine but still exists in the Notebooks API database.
Solution
The following example shows how to list instances using the Notebooks API and verify their state.
gcloud notebooks instances list --location=LOCATION
If an instance's state is DELETED, run the following command to delete it permanently.
gcloud notebooks instances delete INSTANCE_NAME --location=LOCATION
Unable to create an instance in a Shared VPC
Issue
You're unable to create an instance in a Shared VPC.
Solution
If you are using Shared VPC, you must add the host and the service projects to the service perimeter. In the host project, you must also grant the Compute Network User role (roles/compute.networkUser) to the Notebooks Service Agent from the service project. For more information, see Managing service perimeters.
Creating an instance results in a resource availability error
Issue
You're unable to create an instance because of a resource availability error.
This error can look like the following:
Creating notebook INSTANCE_NAME: ZONE does not have enough resources available to fulfill the request. Retry later or try another zone in your configurations.
Resource errors occur when you request new resources in a zone that can't accommodate your request due to the current unavailability of Compute Engine resources, such as GPUs or CPUs.
Resource errors only apply to new resource requests in the zone and don't affect existing resources. Resource errors aren't related to your Compute Engine quota. Resource errors are temporary and can change frequently based on fluctuating demand.
Solution
To proceed, you can try the following:
- Create an instance with a different machine type.
- Create the instance in a different zone.
- Attempt the request again later.
- Reduce the amount of resources that you're requesting. For example, try to create an instance with fewer GPUs, disks, or vCPUs, or less memory.
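Because these capacity errors are transient, retrying with a delay often succeeds. A minimal POSIX shell retry helper; the gcloud line in the comment is illustrative, not a required form:

```shell
#!/bin/sh
# retry N CMD...: run CMD up to N times, sleeping a growing number of
# seconds between attempts; returns the last exit status on failure.
retry() {
  attempts=$1; shift
  i=1
  until "$@"; do
    status=$?
    [ "$i" -ge "$attempts" ] && return "$status"
    sleep "$i"
    i=$((i + 1))
  done
}

# Example (illustrative; substitute your own instance parameters):
# retry 5 gcloud notebooks instances create my-instance \
#   --vm-image-family=tf-latest-cpu --location=us-west1-b
retry 3 true && echo "command succeeded"
```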
Starting an instance results in a resource availability error
Issue
You're unable to start an instance because of a resource availability error.
This error can look like the following:
The zone ZONE_NAME doesn't have enough resources available to fulfill the request. '(resource type:compute)'.
Resource errors occur when you try to start an instance in a zone that can't accommodate your request due to the current unavailability of Compute Engine resources, such as GPUs or CPUs.
Resource errors only apply to the resources you specified in your request at the time you sent the request, not to all resources in the zone. Resource errors aren't related to your Compute Engine quota. Resource errors are temporary and can change frequently based on fluctuating demand.
Solution
To proceed, you can try the following:
- Change the machine type of your instance.
- Migrate your files and data to an instance in a different zone.
- Attempt the request again later.
- Reduce the amount of resources that you're requesting. For example, start a different instance with fewer GPUs, disks, or vCPUs, or less memory.
Upgrading user-managed notebooks instances
This section describes troubleshooting issues with upgrading user-managed notebooks instances.
Unable to upgrade because unable to get instance disk information
Issue
Upgrade isn't supported for single-disk user-managed notebooks instances.
Solution
You might want to migrate your user data to a new user-managed notebooks instance.
Unable to upgrade because instance isn't UEFI compatible
Issue
Vertex AI Workbench depends on UEFI compatibility to complete an upgrade.
User-managed notebooks instances created from some older images are not UEFI compatible, and therefore can't be upgraded.
Solution
To verify that your instance is UEFI compatible, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute instances describe INSTANCE_NAME \
  --zone=ZONE | grep type
Replace the following:
INSTANCE_NAME: the name of your instance
ZONE: the zone where your instance is located
To verify that the image that you used to create your instance is UEFI compatible, use the following command:
gcloud compute images describe VM_IMAGE_FAMILY \
  --project deeplearning-platform-release | grep type
Replace VM_IMAGE_FAMILY with the image family name that you used to create your instance.
If you determine that either your instance or image isn't UEFI compatible, you can attempt to migrate your user data to a new user-managed notebooks instance. To do so, complete the following steps:
Verify that the image that you want to use to create your new instance is UEFI compatible. To do so, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute images describe VM_IMAGE_FAMILY \
  --project deeplearning-platform-release --format=json | grep type
Replace VM_IMAGE_FAMILY with the image family name that you want to use to create your instance.
Migrate your user data to a new user-managed notebooks instance.
User-managed notebooks instance isn't accessible after upgrade
Issue
If the user-managed notebooks instance isn't accessible after an upgrade, there might have been a failure during the replacement of the boot disk image.
User-managed notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.
Solution
Complete the following steps to attach a new valid image to the boot disk.
To store values you'll use to complete this procedure, type the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
export INSTANCE_NAME=MY_INSTANCE_NAME
export PROJECT_ID=MY_PROJECT_ID
export ZONE=MY_ZONE
Replace the following:
MY_INSTANCE_NAME: the name of your instance
MY_PROJECT_ID: your project ID
MY_ZONE: the zone where your instance is located
Use the following command to stop the instance:
gcloud compute instances stop $INSTANCE_NAME \
  --project=$PROJECT_ID --zone=$ZONE
Detach the data disk from the instance.
gcloud compute instances detach-disk $INSTANCE_NAME --device-name=data \
  --project=$PROJECT_ID --zone=$ZONE
Delete the instance's VM.
gcloud compute instances delete $INSTANCE_NAME --keep-disks=all --quiet \
  --project=$PROJECT_ID --zone=$ZONE
Use the Notebooks API to delete the user-managed notebooks instance.
gcloud notebooks instances delete $INSTANCE_NAME \
  --project=$PROJECT_ID --location=$ZONE
Create a user-managed notebooks instance using the same name as your previous instance.
gcloud notebooks instances create $INSTANCE_NAME \
  --vm-image-project="deeplearning-platform-release" \
  --vm-image-family=MY_VM_IMAGE_FAMILY \
  --instance-owners=MY_INSTANCE_OWNER \
  --machine-type=MY_MACHINE_TYPE \
  --service-account=MY_SERVICE_ACCOUNT \
  --accelerator-type=MY_ACCELERATOR_TYPE \
  --accelerator-core-count=MY_ACCELERATOR_CORE_COUNT \
  --install-gpu-driver \
  --project=$PROJECT_ID \
  --location=$ZONE
Replace the following:
MY_VM_IMAGE_FAMILY: the image family name
MY_INSTANCE_OWNER: your instance owner
MY_MACHINE_TYPE: the machine type of your instance's VM
MY_SERVICE_ACCOUNT: the service account to use with this instance, or use "default"
MY_ACCELERATOR_TYPE: the accelerator type; for example, "NVIDIA_TESLA_T4"
MY_ACCELERATOR_CORE_COUNT: the core count; for example, 1
Monitoring health status of user-managed notebooks instances
This section describes how to troubleshoot issues with monitoring health status errors.
docker-proxy-agent status failure
Follow these steps after a docker-proxy-agent status failure:
Verify that the Inverting Proxy agent is running. If it isn't, restart the Docker service and then restart the Inverting Proxy agent (see Helpful procedures).
docker-service status failure
Follow these steps after a docker-service status failure:
jupyter-service status failure
Follow these steps after a jupyter-service status failure:
jupyter-api status failure
Follow these steps after a jupyter-api status failure:
Boot disk utilization percent
The boot disk space status is unhealthy if the disk space is greater than 85% full.
If your boot disk space status is unhealthy, try the following:
From a terminal session in the user-managed notebooks instance, or by using ssh to connect, check the amount of free disk space using the command df -H.
Use the command find . -type f -size +100M to help you find large files that you might be able to delete, but don't delete them unless you are sure you can safely do so. If you aren't sure, you can get help from support.
If the previous steps don't solve your problem, get support.
Data disk utilization percent
The data disk space status is unhealthy if the disk space is greater than 85% full.
If your data disk space status is unhealthy, try the following:
From a terminal session in the user-managed notebooks instance, or by using ssh to connect, check the amount of free disk space using the command df -h -T /home/jupyter.
Delete large files to increase the available disk space. Use the command find . -type f -size +100M to help you find large files.
If the previous steps don't solve your problem, get support.
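To see the biggest space consumers at a glance, du plus sort is often more direct than find; a sketch assuming a POSIX shell:

```shell
#!/bin/sh
# List the ten largest files and directories under a target directory,
# sizes in KiB, largest first. Defaults to the current directory.
TARGET="${1:-.}"
du -a "$TARGET" 2>/dev/null | sort -rn | head -n 10
```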
Unable to install third-party JupyterLab extension
Issue
Attempting to install a third-party JupyterLab extension results in an Error: 500 message.
Solution
Third-party JupyterLab extensions aren't supported in user-managed notebooks instances.
Restore instance
Issue
Restoring a user-managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub or make a snapshot of the disk.
Recover data from an instance
Issue
Recovering data from a user-managed notebooks instance after it's been deleted isn't supported.
Solution
To back up the data on your instance, you can save your notebooks to GitHub or make a snapshot of the disk.
Unable to increase shared memory
Issue
You can't increase shared memory on an existing user-managed notebooks instance.
Solution
You can specify a shared memory size when you create a new user-managed notebooks instance by using the container-custom-params metadata key, with a value like the following:
--shm-size=SHARED_MEMORY_SIZE gb
Replace SHARED_MEMORY_SIZE with the size that you want in GB.
Helpful procedures
This section describes procedures that you might find helpful.
Use SSH to connect to your user-managed notebooks instance
Use ssh to connect to your instance by typing the following command in either Cloud Shell or any environment where the Google Cloud CLI is installed.
gcloud compute ssh --project PROJECT_ID \
--zone ZONE \
INSTANCE_NAME -- -L 8080:localhost:8080
Replace the following:
PROJECT_ID: your project ID
ZONE: the Google Cloud zone where your instance is located
INSTANCE_NAME: the name of your instance
You can also connect to your instance by opening your instance's Compute Engine detail page, and then clicking the SSH button.
Re-register with the Inverting Proxy server
To re-register the user-managed notebooks instance with the internal Inverting Proxy server, you can stop and start the VM from the User-managed notebooks page or you can use ssh to connect to your user-managed notebooks instance and enter:
cd /opt/deeplearning/bin
sudo ./attempt-register-vm-on-proxy.sh
Verify the Docker service status
To verify the Docker service status you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service docker status
Verify that the Inverting Proxy agent is running
To verify if the notebook Inverting Proxy agent is running, use ssh to connect to your user-managed notebooks instance and enter:
# Confirm that the Inverting Proxy agent Docker container is running (proxy-agent)
sudo docker ps
# Verify that State.Status is running and State.Running is true
sudo docker inspect proxy-agent
# Grab logs
sudo docker logs proxy-agent
Verify the Jupyter service status and collect logs
To verify the Jupyter service status you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service jupyter status
To collect Jupyter service logs:
sudo journalctl -u jupyter.service --no-pager
Verify that the Jupyter internal API is active
The Jupyter API should always run on port 8080. You can verify this by inspecting the instance's syslogs for an entry similar to:
Jupyter Server ... running at: http://localhost:8080
To verify that the Jupyter internal API is active you can also use ssh to connect to your user-managed notebooks instance and enter:
curl http://127.0.0.1:8080/api/kernelspecs
You can also measure the time it takes for the API to respond in case the requests are taking too long:
time curl -v http://127.0.0.1:8080/api/status
time curl -v http://127.0.0.1:8080/api/kernels
time curl -v http://127.0.0.1:8080/api/connections
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
Restart the Docker service
To restart the Docker service, you can stop and start the VM from the User-managed notebooks page or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service docker restart
Restart the Inverting Proxy agent
To restart the Inverting Proxy agent, you can stop and start the VM from the User-managed notebooks page or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo docker restart proxy-agent
Restart the Jupyter service
To restart the Jupyter service, you can stop and start the VM from the User-managed notebooks page or you can use ssh to connect to your user-managed notebooks instance and enter:
sudo service jupyter restart
Restart the Notebooks Collection Agent
The Notebooks Collection Agent service runs a Python process in the background that verifies the status of the Vertex AI Workbench instance's core services.
To restart the Notebooks Collection Agent service, you can stop and start the VM from the Google Cloud console or you can use ssh to connect to your Vertex AI Workbench instance and enter:
sudo systemctl stop notebooks-collection-agent.service
followed by:
sudo systemctl start notebooks-collection-agent.service
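The stop and start pair above can also be issued as a single command, assuming systemd manages the service as shown:

```shell
# Restart the Notebooks Collection Agent in one step.
sudo systemctl restart notebooks-collection-agent.service
```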
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
Modify the Notebooks Collection Agent script
To access and edit the script, open a terminal in your instance or use ssh to connect to your Vertex AI Workbench instance, and enter:
nano /opt/deeplearning/bin/notebooks_collection_agent.py
After editing the file, remember to save it.
Then, you must restart the Notebooks Collection Agent service.
Verify the instance can resolve the required DNS domains
To verify that the instance can resolve the required DNS domains, you can use ssh to connect to your user-managed notebooks instance and enter:
host notebooks.googleapis.com
host *.notebooks.cloud.google.com
host *.notebooks.googleusercontent.com
host *.kernels.googleusercontent.com
or:
curl --silent --output /dev/null "https://notebooks.cloud.google.com"; echo $?
If the instance has Dataproc enabled, you can verify that the instance
resolves *.kernels.googleusercontent.com
by running:
curl --verbose -H "Authorization: Bearer $(gcloud auth print-access-token)" https://${PROJECT_NUMBER}-dot-${REGION}.kernels.googleusercontent.com/api/kernelspecs | jq .
To run these commands in your Vertex AI Workbench instance, open JupyterLab, and create a new terminal.
Make a copy of the user data on an instance
To store a copy of an instance's user data in Cloud Storage, complete the following steps.
Create a Cloud Storage bucket (optional)
In the same project where your instance is located, create a Cloud Storage bucket where you can store your user data. If you already have a Cloud Storage bucket, skip this step.
- Create a Cloud Storage bucket:
gcloud storage buckets create gs://BUCKET_NAME
Replace BUCKET_NAME with a bucket name that meets the bucket naming requirements.
Copy your user data
In your instance's JupyterLab interface, select File > New > Terminal to open a terminal window. For user-managed notebooks instances, you can instead connect to your instance's terminal by using SSH.
Use the gcloud CLI to copy your user data to a Cloud Storage bucket. The following example command copies all of the files from your instance's /home/jupyter/ directory to a directory in a Cloud Storage bucket.
gcloud storage cp /home/jupyter/* gs://BUCKET_NAMEPATH --recursive
Replace the following:
BUCKET_NAME: the name of your Cloud Storage bucket
PATH: the path to the directory where you want to copy your files, for example: /copy/jupyter/
Investigate an instance stuck in provisioning by using gcpdiag
gcpdiag is an open source tool. It is not an officially supported Google Cloud product. You can use the gcpdiag tool to help you identify and fix Google Cloud project issues. For more information, see the gcpdiag project on GitHub.
The gcpdiag vertex/workbench-instance-stuck-in-provisioning runbook investigates potential causes for a Vertex AI Workbench instance to get stuck in provisioning status, including the following areas:
- Status: Checks the instance's current status to ensure that it is stuck in provisioning and not stopped or active.
- Instance's Compute Engine VM boot disk image: Checks whether the instance was created with a custom container, an official workbench-instances image, Deep Learning VM Images, or unsupported images that might cause the instance to get stuck in provisioning status.
- Custom scripts: Checks whether the instance is using custom startup or post-startup scripts that change the default Jupyter port or break dependencies that might cause the instance to get stuck in provisioning status.
- Environment version: Checks whether the instance is using the latest environment version by checking its upgradability. Earlier versions might cause the instance to get stuck in provisioning status.
- Instance's Compute Engine VM performance: Checks the VM's current performance to ensure that it isn't impaired by high CPU usage, insufficient memory, or disk space issues that might disrupt normal operations.
- Instance's Compute Engine serial port or system logging: Checks whether the instance has serial port logs, which are analyzed to ensure that Jupyter is running at 127.0.0.1:8080.
- Instance's Compute Engine SSH and terminal access: Checks whether the instance's Compute Engine VM is running so that the user can SSH and open a terminal to verify that space usage in /home/jupyter is lower than 85%. If no space is left, this might cause the instance to get stuck in provisioning status.
- External IP turned off: Checks whether external IP access is turned off. An incorrect networking configuration can cause the instance to get stuck in provisioning status.
Google Cloud console
- Complete and then copy the following command.
- Open the Google Cloud console and activate Cloud Shell. Open Cloud console
- Paste the copied command.
- Run the gcpdiag command, which downloads the gcpdiag Docker image, and then performs diagnostic checks. If applicable, follow the output instructions to fix failed checks.
gcpdiag runbook vertex/workbench-instance-stuck-in-provisioning \
--parameter project_id=PROJECT_ID \
--parameter instance_name=INSTANCE_NAME \
--parameter zone=ZONE
Docker
You can run gcpdiag using a wrapper that starts gcpdiag in a Docker container. Docker or Podman must be installed.
- Copy and run the following command on your local workstation.
curl https://gcpdiag.dev/gcpdiag.sh >gcpdiag && chmod +x gcpdiag
- Execute the gcpdiag command.
./gcpdiag runbook vertex/workbench-instance-stuck-in-provisioning \
  --parameter project_id=PROJECT_ID \
  --parameter instance_name=INSTANCE_NAME \
  --parameter zone=ZONE
View available parameters for this runbook.
Replace the following:
- PROJECT_ID: The ID of the project containing the resource.
- INSTANCE_NAME: The name of the target Vertex AI Workbench instance within your project.
- ZONE: The zone in which your target Vertex AI Workbench instance is located.
Useful flags:
--universe-domain: If applicable, the Trusted Partner Sovereign Cloud domain hosting the resource
--parameter or -p: Runbook parameters
For a list and description of all gcpdiag tool flags, see the gcpdiag usage instructions.