Use Imagen on Vertex AI's visual captioning and Visual Question Answering (VQA) to get image information (Console)

Learn how to use Imagen on Vertex AI's visual captioning and Visual Question Answering (VQA) features to get text information about an image. This quickstart shows you how to use visual captioning and VQA in the Google Cloud console.

Sample image of fish
Image source: Worachat Sodsri on Unsplash (image cropped, shown in Google Cloud console).

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API.

    Enable the API


Get the sample image

After you set up your environment, get the sample image that you'll use with visual captioning and Visual Question Answering (VQA) to get information about the image.

Sample image of fish
Image source: Worachat Sodsri on Unsplash (image cropped).

To get the sample image, either download the image directly from Cloud Storage, or use the following command to save it in the current directory:

curl -O https://storage.googleapis.com/cloud-samples-data/generative-ai/image/vcap-vqa-quickstart_fish.jpg

Generate image descriptions with visual captioning

After you get the sample image, you can send a visual captioning request to get text descriptions of the image.

Console

  1. In the Google Cloud console, open the Vertex AI Studio > Vision tab in the Vertex AI dashboard.

    Go to the Vertex AI Studio tab

  2. In the lower menu, click Caption.

  3. Click Upload image and select the local image to caption.

  4. In the Parameters panel, set the following:

    1. Number of captions: Select 2.
    2. Language: If not already selected, choose English (en).
  5. Click Generate captions.
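
If you later want to send the same captioning request outside the console, the following Python sketch shows one way the request body could be assembled. It is not part of this quickstart: the `imagetext` model name, the `bytesBase64Encoded`, `sampleCount`, and `language` field names, and the endpoint URL pattern are assumptions based on the Vertex AI REST interface, and the helper names `build_caption_request` and `caption_predict_url` are hypothetical. Check the API reference before relying on any of them.

```python
import base64
import json


def build_caption_request(image_bytes: bytes, sample_count: int = 2,
                          language: str = "en") -> dict:
    """Build a JSON-serializable body for a visual captioning request.

    Mirrors the console settings above: "Number of captions" maps to
    sampleCount and "Language" maps to language (assumed field names).
    """
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    # The image is sent inline as base64-encoded text.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "instances": [{"image": {"bytesBase64Encoded": encoded}}],
        "parameters": {"sampleCount": sample_count, "language": language},
    }


def caption_predict_url(project_id: str, location: str = "us-central1") -> str:
    """Return the assumed predict endpoint for the captioning model."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/imagetext:predict"
    )


def to_json(body: dict) -> str:
    """Serialize the request body so it can be saved or sent as a payload."""
    return json.dumps(body)
```

To try it, read `vcap-vqa-quickstart_fish.jpg` in binary mode, pass the bytes to `build_caption_request`, and send the serialized body with an authenticated POST request to the URL that `caption_predict_url` returns.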

Generate answers to questions with VQA

Finally, you can use the VQA feature to ask a question about the same image and get an answer.

Console

  1. In the Google Cloud console, open the Vertex AI Studio > Vision tab in the Vertex AI dashboard.

    Go to the Vertex AI Studio tab

  2. In the lower menu, click Visual Q&A.

  3. Click Upload image and select the local image.

  4. In the Parameters panel, select 2 as the Number of answers.

  5. In the prompt (Ask a question here) field, enter the following text:

    What color is the left fish?
    
  6. Click Generate.
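
The same question can also be expressed as a request body for use outside the console. The following Python sketch is an illustration only: the `prompt`, `bytesBase64Encoded`, and `sampleCount` field names, and the assumption that the response carries its answers as a list of strings under `predictions`, come from the Vertex AI REST interface rather than from this quickstart. The helper names `build_vqa_request` and `extract_answers` are hypothetical.

```python
import base64


def build_vqa_request(image_bytes: bytes, question: str,
                      sample_count: int = 2) -> dict:
    """Build a JSON-serializable body for a Visual Question Answering request.

    "Number of answers" in the console maps to sampleCount, and the prompt
    field carries the question (assumed field names).
    """
    if not question.strip():
        raise ValueError("question must not be empty")
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "instances": [
            {"prompt": question, "image": {"bytesBase64Encoded": encoded}}
        ],
        "parameters": {"sampleCount": sample_count},
    }


def extract_answers(response: dict) -> list:
    """Return the answers from a response, assuming a 'predictions' list."""
    return list(response.get("predictions", []))
```

For the question used in this quickstart, you would call `build_vqa_request(image_bytes, "What color is the left fish?")` and send the result to the VQA model's predict endpoint with an authenticated POST request.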

Congratulations! You've just used Imagen's visual captioning and VQA features to get information about an image.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next