Stay organized with collections
Save and categorize content based on your preferences.
The Optical Character Recognition (OCR) service of Vertex AI on
Google Distributed Cloud (GDC) air-gapped detects text in images using the
BatchAnnotateImages API method. The service supports JPEG and PNG files for
images.
This page shows you how to detect image text using the OCR API on
Distributed Cloud.
Before you begin
Before you can start using the OCR API, you must have a project
with the OCR API enabled and have the appropriate credentials.
You can also install client libraries to help you make calls to the API. For
more information, see Set up a character recognition project.
Detect text from JPEG and PNG files
The BatchAnnotateImages method detects text from a batch of JPEG or PNG files.
You send the file from which you want to detect text directly as content in the
API request. The system returns the resulting detected text in JSON format in
the API response.
You must specify values for the fields in the JSON body of your API request. The
following table contains a description of the request body fields you must
provide when you use the BatchAnnotateImages API method for your text
detection requests:
Request body fields
Field description
content
The images with text to detect. You provide the Base64 representation (ASCII string) of your binary image data.
type
The type of text detection you need from the image.
Specify one of the two annotation features:
TEXT_DETECTION detects and extracts text from any image. The JSON response includes the extracted string, individual words, and their bounding boxes.
DOCUMENT_TEXT_DETECTION also extracts text from an image, but the service optimizes the response for dense text and documents. The JSON includes page, block, paragraph, word, and break information.
Optional. List of languages to use for the text detection.
The system interprets an empty value for this field as automatic language detection.
You don't need to set the language_hints field for languages based on the Latin alphabet.
If you know the language of the text in the image, setting a hint improves results.
For information about the complete JSON representation, see
AnnotateImageRequest.
Make an API request
Make a request to the OCR pre-trained API using the REST API
method. Otherwise, interact with the OCR pre-trained API from a
Python script to detect text from JPEG or PNG files.
The following examples show how to detect text in an image using
OCR:
REST
Follow these steps to detect text in images using the REST API method:
Save the following request.json file for your request body:
BASE64_ENCODED_IMAGE: the Base64 representation
(ASCII string) of your binary image data. This string begins with
characters that look similar to
/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==.
FEATURE_TYPE: the type of text detection you
need from the image. Allowed values are TEXT_DETECTION or
DOCUMENT_TEXT_DETECTION.
LANGUAGE_HINT: the BCP 47 language tags to use as
language hints for text detection, such as en-t-i0-handwrit. This
field is optional and the system interprets an empty value as automatic
language detection.
Add the following code to the Python script you created:
fromgoogle.cloudimportvisionimportgoogle.authfromgoogle.auth.transportimportrequestsfromgoogle.api_core.client_optionsimportClientOptionsaudience="https://ENDPOINT:443"api_endpoint="ENDPOINT:443"defvision_client(creds):opts=ClientOptions(api_endpoint=api_endpoint)returnvision.ImageAnnotatorClient(credentials=creds,client_options=opts)defmain():creds=Nonetry:creds,project_id=google.auth.default()creds=creds.with_gdch_audience(audience)req=requests.Request()creds.refresh(req)print("Got token: ")print(creds.token)exceptExceptionase:print("Caught exception"+str(e))raiseereturncredsdefvision_func(creds):vc=vision_client(creds)image={"content":"BASE64_ENCODED_IMAGE"}features=[{"type_":vision.Feature.Type.FEATURE_TYPE}]# Each requests element corresponds to a single image. To annotate more# images, create a request element for each image and add it to# the array of requestsreq={"image":image,"features":features}metadata=[("x-goog-user-project","projects/PROJECT_ID")]resp=vc.annotate_image(req,metadata=metadata)print(resp)if__name__=="__main__":creds=main()vision_func(creds)
BASE64_ENCODED_IMAGE: the Base64 representation
(ASCII string) of your binary image data. This string begins with
characters that look similar to
/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==.
FEATURE_TYPE: the type of text detection you
need from the image. Allowed values are TEXT_DETECTION or
DOCUMENT_TEXT_DETECTION.
PROJECT_ID: your project ID.
Save the Python script.
Run the Python script to detect text in the image:
pythonSCRIPT_NAME
Replace SCRIPT_NAME with the name you gave to your
Python script, such as vision.py.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-04 UTC."],[[["\u003cp\u003eVertex AI's OCR service on Google Distributed Cloud (GDC) air-gapped uses the \u003ccode\u003eBatchAnnotateImages\u003c/code\u003e API to detect text in JPEG and PNG images.\u003c/p\u003e\n"],["\u003cp\u003eTo use the OCR API, users must have a project with the API enabled and appropriate credentials, potentially including installing client libraries.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eBatchAnnotateImages\u003c/code\u003e method requires the image data as Base64 encoded content within the API request and allows specifying the type of detection (\u003ccode\u003eTEXT_DETECTION\u003c/code\u003e or \u003ccode\u003eDOCUMENT_TEXT_DETECTION\u003c/code\u003e).\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003elanguage_hints\u003c/code\u003e field in the API request is optional and allows users to specify the language of the text for improved detection accuracy, following BCP 47 language tag formatting.\u003c/p\u003e\n"],["\u003cp\u003eText detection can be performed via a REST API method or a Python script, both requiring authentication and the specification of an endpoint, project ID, and the Base64-encoded image.\u003c/p\u003e\n"]]],[],null,["# Detect text in images\n\nThe Optical Character Recognition (OCR) service of Vertex AI on\nGoogle Distributed Cloud (GDC) air-gapped detects text in images using the\n`BatchAnnotateImages` API method. The service supports JPEG and PNG files for\nimages.\n\nThis page shows you how to detect image text using the OCR API on\nDistributed Cloud.\n\nBefore you begin\n----------------\n\nBefore you can start using the OCR API, you must have a project\nwith the OCR API enabled and have the appropriate credentials.\nYou can also install client libraries to help you make calls to the API. For\nmore information, see [Set up a character recognition project](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vai-set-up-ocr).\n\nDetect text from JPEG and PNG files\n-----------------------------------\n\nThe `BatchAnnotateImages` method detects text from a batch of JPEG or PNG files.\nYou send the file from which you want to detect text directly as content in the\nAPI request. The system returns the resulting detected text in JSON format in\nthe API response.\n\nYou must specify values for the fields in the JSON body of your API request. The\nfollowing table contains a description of the request body fields you must\nprovide when you use the `BatchAnnotateImages` API method for your text\ndetection requests:\n\nFor information about the complete JSON representation, see\n[`AnnotateImageRequest`](/distributed-cloud/hosted/docs/latest/gdch/apis/vertex-ai/ocr/rest/v1/AnnotateImageRequest).\n\n### Make an API request\n\nMake a request to the OCR pre-trained API using the REST API\nmethod. Otherwise, interact with the OCR pre-trained API from a\nPython script to detect text from JPEG or PNG files.\n| **Note:** The `BatchAnnotateImages` API method only supports a single request per batch call.\n\nThe following examples show how to detect text in an image using\nOCR: \n\n### REST\n\nFollow these steps to detect text in images using the REST API method:\n\n1. Save the following `request.json` file for your request body:\n\n cat \u003c\u003c- EOF \u003e request.json\n {\n \"requests\": [\n {\n \"image\": {\n \"content\": \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-err\"\u003eBASE\u003c/span\u003e\u003cspan class=\"devsite-syntax-mi\"\u003e64\u003c/span\u003e\u003cspan class=\"devsite-syntax-err\"\u003e_ENCODED_IMAGE\u003c/span\u003e\u003c/var\u003e\n },\n \"features\": [\n {\n \"type\": \"\u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e\"\n }\n ],\n \"image_context\": {\n \"language_hints\": [\n \"\u003cvar translate=\"no\"\u003eLANGUAGE_HINT_1\u003c/var\u003e\",\n \"\u003cvar translate=\"no\"\u003eLANGUAGE_HINT_2\u003c/var\u003e\",\n ...\n ]\n }\n }\n ]\n }\n EOF\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e: the Base64 representation (ASCII string) of your binary image data. This string begins with characters that look similar to `/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==`.\n - \u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e: the type of text detection you need from the image. Allowed values are `TEXT_DETECTION` or `DOCUMENT_TEXT_DETECTION`.\n - \u003cvar translate=\"no\"\u003eLANGUAGE_HINT\u003c/var\u003e: the BCP 47 language tags to use as language hints for text detection, such as `en-t-i0-handwrit`. This field is optional and the system interprets an empty value as automatic language detection.\n2. [Get an authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth).\n\n3. Make the request:\n\n ### curl\n\n curl -X POST \\\n -H \"Authorization: Bearer \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e\" \\\n -H \"x-goog-user-project: projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n https://\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e/v1/images:annotate\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e: [the authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth) you obtained.\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: your project ID.\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n\n ### PowerShell\n\n $headers = @{\n \"Authorization\" = \"Bearer \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e\"\n \"x-goog-user-project\" = \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\"\n }\n\n Invoke-WebRequest\n -Method POST\n -Headers $headers\n -ContentType: \"application/json; charset=utf-8\"\n -InFile request.json\n -Uri \"\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e/v1/images:annotate\" | Select-Object -Expand Content\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eTOKEN\u003c/var\u003e: [the authentication token](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth) you obtained.\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n\n### Python\n\nFollow these steps to use the OCR service from a Python\nscript to detect text in an image:\n\n1. [Install the latest version of the OCR client library](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-install-libraries).\n\n2. [Set the required environment variables on a Python script](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vai-set-up-ocr#set-env-var).\n\n3. [Authenticate your API request](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-auth).\n\n4. Add the following code to the Python script you created:\n\n from google.cloud import vision\n import google.auth\n from google.auth.transport import requests\n from google.api_core.client_options import ClientOptions\n\n audience = \"https://\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e:443\"\n api_endpoint=\"\u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e:443\"\n\n def vision_client(creds):\n opts = ClientOptions(api_endpoint=api_endpoint)\n return vision.https://cloud.google.com/python/docs/reference/vision/latest/google.cloud.vision_v1.services.image_annotator.ImageAnnotatorClient.html(credentials=creds, client_options=opts)\n\n def main():\n creds = None\n try:\n creds, project_id = google.auth.default()\n creds = creds.with_gdch_audience(audience)\n req = requests.Request()\n creds.refresh(req)\n print(\"Got token: \")\n print(creds.token)\n except Exception as e:\n print(\"Caught exception\" + str(e))\n raise e\n return creds\n\n def vision_func(creds):\n vc = vision_client(creds)\n image = {\"content\": \"\u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e\"}\n features = [{\"type_\": vision.https://cloud.google.com/python/docs/reference/vision/latest/google.cloud.vision_v1.types.Feature.html.Type.\u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eFEATURE_TYPE\u003c/span\u003e\u003c/var\u003e}]\n # Each requests element corresponds to a single image. To annotate more\n # images, create a request element for each image and add it to\n # the array of requests\n req = {\"image\": image, \"features\": features}\n\n metadata = [(\"x-goog-user-project\", \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\")]\n\n resp = vc.annotate_image(req,metadata=metadata)\n\n print(resp)\n\n if __name__==\"__main__\":\n creds = main()\n vision_func(creds)\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eENDPOINT\u003c/var\u003e: the OCR endpoint that you use for your organization. For more information, [view service status and endpoints](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-api-status).\n - \u003cvar translate=\"no\"\u003eBASE64_ENCODED_IMAGE\u003c/var\u003e: the Base64 representation (ASCII string) of your binary image data. This string begins with characters that look similar to `/9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==`.\n - \u003cvar translate=\"no\"\u003eFEATURE_TYPE\u003c/var\u003e: the type of text detection you need from the image. Allowed values are `TEXT_DETECTION` or `DOCUMENT_TEXT_DETECTION`.\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: your project ID.\n5. Save the Python script.\n\n6. Run the Python script to detect text in the image:\n\n python \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eSCRIPT_NAME\u003c/span\u003e\u003c/var\u003e\n\n Replace \u003cvar translate=\"no\"\u003eSCRIPT_NAME\u003c/var\u003e with the name you gave to your\n Python script, such as `vision.py`."]]