Use the Data Science Agent

This guide describes how to use the Data Science Agent in Colab Enterprise to perform data science tasks in your notebooks.

Learn how and when Gemini for Google Cloud uses your data.

This document is intended for data analysts, data scientists, and data developers who work with Colab Enterprise. It assumes you have knowledge of how to write code in a notebook environment.

Capabilities of the Data Science Agent

The Data Science Agent can help you with tasks ranging from exploratory data analysis to generating machine learning predictions and forecasts. You can use the Data Science Agent for:

  • Generating plans: Generate and modify a plan to complete a particular task.
  • Data exploration: Explore a dataset to understand its structure, identify potential issues like missing values and outliers, and examine the distribution of key variables.
  • Data cleaning: Clean your data. For example, remove data points that are outliers.
  • Data wrangling: Convert categorical features into numerical representations using techniques like one-hot encoding or label encoding. Create new features for analysis.
  • Data analysis: Analyze the relationships between different variables. Calculate correlations between numerical features and explore distributions of categorical features. Look for patterns and trends in the data.
  • Data visualization: Create visualizations such as histograms, box plots, scatter plots, and bar charts that represent the distributions of individual variables and the relationships between them.
  • Feature engineering: Engineer new features from a cleaned dataset.
  • Data splitting: Split an engineered dataset into training, validation, and testing datasets.
  • Model training: Train a model by using the training data.
  • Model optimization: Optimize a model by using the validation set. Explore alternative models like DecisionTreeRegressor and RandomForestRegressor and compare their performance.
  • Model evaluation: Evaluate the best performing model on the test dataset.
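
As a rough illustration of two of the tasks listed above (data wrangling and data splitting), the following sketch one-hot encodes a categorical column and splits the result into training, validation, and test sets. It is not output from the Data Science Agent. The sample rows, the function names `one_hot_encode` and `split_dataset`, and the 70/15/15 split are assumptions made for this example, and it uses only the Python standard library so that it runs in any runtime. Generated notebook code would more likely rely on libraries such as pandas and scikit-learn.

```python
import random


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def split_dataset(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle rows, then split them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],  # the remainder is the test set
    )


# Hypothetical sample data: 20 rows with one categorical and one numeric field.
rows = [
    {"experience_level": level, "salary_in_usd": 50_000 + 1_000 * i}
    for i, level in enumerate(["EN", "MI", "SE", "EX"] * 5)
]

encoded = one_hot_encode(rows, "experience_level")
train_set, validation_set, test_set = split_dataset(encoded)
print(len(train_set), len(validation_set), len(test_set))
```

Keeping the test set separate until the final evaluation step mirrors the order of the tasks in the list: the validation set is used to compare models, and the test set is used only once.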

Limitations

  • The Data Science Agent supports the following data sources:
    • CSV files
    • BigQuery tables
  • The code produced by the Data Science Agent runs only in your notebook's runtime.
  • Your notebook must be in a region supported by the Data Science Agent. See Locations.
  • The Data Science Agent isn't supported in projects that have enabled VPC Service Controls.
  • The first time you run the Data Science Agent, you might experience a delay of approximately five to ten minutes. This delay occurs only once per project, during initial setup.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI, Dataform, and Compute Engine APIs.

    Enable the APIs

Required roles

To get the permissions that you need to use the Data Science Agent in Colab Enterprise, ask your administrator to grant you the Colab Enterprise User (roles/aiplatform.colabEnterpriseUser) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.
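
If you or your administrator prefer the command line to the console, the setup above can be sketched as two `gcloud` commands. The following example only builds and prints the commands; it doesn't run them. The project ID and user email are hypothetical placeholders, and the mapping of the Vertex AI, Dataform, and Compute Engine APIs to the service names shown is an assumption to confirm against your own project.

```python
import shlex

# Hypothetical placeholders: replace with your own project ID and user email.
PROJECT_ID = "my-project"
USER_EMAIL = "analyst@example.com"

# Enable the Vertex AI, Dataform, and Compute Engine APIs.
enable_apis = [
    "gcloud", "services", "enable",
    "aiplatform.googleapis.com",
    "dataform.googleapis.com",
    "compute.googleapis.com",
    f"--project={PROJECT_ID}",
]

# Grant the Colab Enterprise User role on the project.
grant_role = [
    "gcloud", "projects", "add-iam-policy-binding", PROJECT_ID,
    f"--member=user:{USER_EMAIL}",
    "--role=roles/aiplatform.colabEnterpriseUser",
]

for command in (enable_apis, grant_role):
    print(shlex.join(command))
```

To run a command instead of printing it, you could pass the list to `subprocess.run(command, check=True)` in an environment where the gcloud CLI is installed and authenticated.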

Use the Data Science Agent

To get started using Colab Enterprise's Data Science Agent, do the following:

  1. In the Google Cloud console, go to the Colab Enterprise My notebooks page.

    Go to My notebooks

  2. In the Region menu, select the region that contains your notebook.

  3. Click the notebook that you want to open.

  4. In the toolbar, click the Gemini button to open the chat dialog.

  5. To upload a CSV file, do the following:

    1. In the chat dialog, click Add files.
    2. If necessary, authorize your Google Account.

      Wait a moment for Colab Enterprise to start a runtime and enable file browsing.

    3. In the Files pane, click Upload to session storage.
    4. Browse to the location of the file, and then click Open.
    5. Click OK to acknowledge that this runtime's files will be deleted when the runtime is deleted.

      The file is uploaded to the Files pane.

    6. Next to the file that you uploaded, click the Actions menu, and then select Add to Gemini.

      The file is added to the chat dialog.

  6. In the Gemini chat dialog, enter a prompt and click Send. To get ideas for prompts, review Capabilities of the Data Science Agent and Sample prompts.

    For example, you might enter "Provide an analysis of the data I've uploaded."

  7. Gemini responds to your prompt. The response can include code snippets to run, general advice for your project, next steps for accomplishing your goals, or information about specific problems in your data or code.

    After evaluating the response, you can do the following:

    • If Gemini provides code in its response, you can click:
      • Accept to add the code to your notebook.
      • Accept and run to add the code to your notebook and run the code.
      • Cancel to delete the suggested code.
    • Ask follow-up questions and continue the discussion as needed.
  8. To close the Gemini dialog, click Close.
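
The code that Gemini suggests depends on your prompt and your data, so the following is only a hypothetical illustration of the kind of snippet you might accept for a prompt such as "Provide an analysis of the data I've uploaded." The inline sample data, the `summarize` function, and the column names are assumptions made for this sketch. It uses only the standard library; in a notebook you would read the uploaded file from session storage instead of an inline string.

```python
import csv
import io
import statistics

# Hypothetical stand-in for an uploaded CSV file.
SAMPLE_CSV = """experience_level,salary_in_usd
EN,52000
MI,
SE,120000
SE,135000
,98000
"""


def summarize(csv_text):
    """Count missing values per column and compute basic statistics."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {"row_count": len(rows), "columns": {}}
    for column in reader.fieldnames:
        values = [row[column] for row in rows]
        present = [value for value in values if value not in ("", None)]
        info = {"missing": len(values) - len(present)}
        try:
            # Treat the column as numeric if every present value parses.
            numbers = [float(value) for value in present]
            info["mean"] = statistics.mean(numbers)
            info["min"] = min(numbers)
            info["max"] = max(numbers)
        except ValueError:
            # Otherwise treat it as categorical.
            info["distinct"] = len(set(present))
        summary["columns"][column] = info
    return summary


report = summarize(SAMPLE_CSV)
for column, info in report["columns"].items():
    print(column, info)
```

Reviewing a snippet like this before you click Accept and run lets you check which columns it treats as numeric and how it defines a missing value.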

Turn off Gemini in Colab Enterprise

To turn off Gemini in Colab Enterprise for a Google Cloud project, an administrator must turn off the Gemini for Google Cloud API. See Disabling services.

To turn off Gemini in Colab Enterprise for a specific user, an administrator must revoke the Gemini for Google Cloud User (roles/cloudaicompanion.user) role for that user. See Revoke a single IAM role.

Sample prompts

The following examples show the types of prompts that you can use with the Data Science Agent.

  • Find and fill in missing values by using the k-Nearest Neighbors (KNN) machine learning algorithm.
  • Create a plot of salaries by experience level. Use the experience_level column to group the salaries, and create a box plot for each group showing the values from the salary_in_usd column.
  • Use the XGBoost algorithm to make a model for determining the class variable of a specific fruit. Split the data into training and test datasets to generate a model and then evaluate the model's accuracy. Create a confusion matrix to show the predictions for each class, including all predictions that are correct and incorrect.
  • Create a pandas dataframe for my data. Analyze the data for null values, and then visualize the distribution of each column using violin plots for measured values and bar plots for categories.
  • Read in the CSV file for the dataset and construct a DataFrame, run an analysis on the DataFrame to determine what needs to be done with values (replace or remove missing values, remove duplicate rows), and determine the distribution of the amount of money invested in USD per city location. Visualize the results on a bar chart in descending order as Location versus Avg Amount Invested (USD), showing only the top 20 results.
  • Forecast target_variable from filename.csv for the next six months.
  • Build and evaluate a classification model on filename.csv for target_variable.
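
To give a sense of what the first sample prompt asks for, the following is a simplified sketch of k-Nearest Neighbors imputation written with only the standard library. It is not the code that the Data Science Agent generates. The `knn_impute` function, the sample data, and the distance measure (root mean squared difference over the columns that both rows have values for) are assumptions made for this illustration. Generated code would more likely use a library implementation such as scikit-learn's `KNNImputer`.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean of the k nearest rows."""

    def distance(a, b):
        # Compare only the columns where both rows have a value.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(row) for row in rows]  # leave the input unchanged
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors are other rows that have a value in column j.
            donors = [
                other for n, other in enumerate(rows)
                if n != i and other[j] is not None
            ]
            donors.sort(key=lambda other: distance(row, other))
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(other[j] for other in nearest) / len(nearest)
    return filled


# Hypothetical data: the third row is missing its second value.
data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, None],
    [8.0, 80.0],
]
print(knn_impute(data, k=2))
```

In this example, the two rows nearest to the incomplete row are the first and second rows, so the missing value is filled with the mean of their second-column values.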

Supported regions

To view the supported regions for Colab Enterprise's Data Science Agent, see Locations.

Billing

During Preview, you are charged only for running code in the notebook's runtime. For more information, see Colab Enterprise pricing.

What's next