The Vision API can provide online (immediate) annotation of multiple pages or frames from PDF, TIFF, or GIF files, whether the file is sent inline with the request or stored in Cloud Storage.
You can request online feature detection and annotation of up to 5 frames (GIF; "image/gif") or pages (PDF; "application/pdf", or TIFF; "image/tiff") of your choosing in each file.
The example annotations on this page are for DOCUMENT_TEXT_DETECTION, but online small batch annotation is available for all Vision features.
Limitations
At most 5 pages are annotated per file. You can specify which 5 pages to annotate.
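Because the limit applies to each file, a client can check a page selection before sending a request. The following Python sketch does that. `validate_pages` is a hypothetical helper, and it assumes pages are positive, 1-based numbers as in the `"pages": [1, 2, 3, 4, 5]` examples on this page; the API itself may accept other forms.

```python
# Minimal sketch of a client-side check for the 5-page limit.
# Assumption: only positive, 1-based page numbers are accepted here,
# matching the "pages": [1, 2, 3, 4, 5] examples on this page.
MAX_PAGES_PER_FILE = 5  # online annotation limit described above


def validate_pages(pages):
    """Return the pages as a list, or raise ValueError if the selection is invalid."""
    pages = list(pages)
    if len(pages) > MAX_PAGES_PER_FILE:
        raise ValueError(
            f"at most {MAX_PAGES_PER_FILE} pages can be annotated, got {len(pages)}"
        )
    for page in pages:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(page, bool) or not isinstance(page, int) or page < 1:
            raise ValueError(f"page numbers must be positive integers, got {page!r}")
    return pages
```

For example, `validate_pages([1, 3, 7])` returns `[1, 3, 7]`, while a six-page list raises `ValueError`.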
Authentication
Set up your Google Cloud project and authentication
Currently supported feature types
Feature type | Description |
---|---|
CROP_HINTS | Determine suggested vertices for a crop region on an image. |
DOCUMENT_TEXT_DETECTION | Perform OCR on dense text images, such as documents (PDF/TIFF), and images with handwriting. TEXT_DETECTION can be used for sparse text images. Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are present. |
FACE_DETECTION | Detect faces within the image. |
IMAGE_PROPERTIES | Compute a set of image properties, such as the image's dominant colors. |
LABEL_DETECTION | Add labels based on image content. |
LANDMARK_DETECTION | Detect geographic landmarks within the image. |
LOGO_DETECTION | Detect company logos within the image. |
OBJECT_LOCALIZATION | Detect and extract multiple objects in an image. |
SAFE_SEARCH_DETECTION | Run SafeSearch to detect potentially unsafe or undesirable content. |
TEXT_DETECTION | Perform Optical Character Recognition (OCR) on text within the image. Text detection is optimized for areas of sparse text within a larger image. If the image is a document (PDF/TIFF), has dense text, or contains handwriting, use DOCUMENT_TEXT_DETECTION instead. |
WEB_DETECTION | Detect topical entities such as news, events, or celebrities within the image, and find similar images on the web using the power of Google Image Search. |
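Each feature in the table is requested as a `{"type": "FEATURE_NAME"}` object in the `features` array. The Python sketch below builds that array from names and rejects anything not listed in the table above. `SUPPORTED_FEATURES` and `build_features` are hypothetical names, not part of a Google client library. Sending both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION is left alone, since the table says DOCUMENT_TEXT_DETECTION takes precedence on the server side.

```python
# Sketch: build the "features" array of a files:annotate request.
# SUPPORTED_FEATURES mirrors the table above; both names are hypothetical.
SUPPORTED_FEATURES = frozenset({
    "CROP_HINTS", "DOCUMENT_TEXT_DETECTION", "FACE_DETECTION",
    "IMAGE_PROPERTIES", "LABEL_DETECTION", "LANDMARK_DETECTION",
    "LOGO_DETECTION", "OBJECT_LOCALIZATION", "SAFE_SEARCH_DETECTION",
    "TEXT_DETECTION", "WEB_DETECTION",
})


def build_features(names):
    """Turn feature names into [{"type": NAME}, ...], rejecting unknown names."""
    unknown = [name for name in names if name not in SUPPORTED_FEATURES]
    if unknown:
        raise ValueError(f"unsupported feature types: {unknown}")
    # Keep the caller's order but drop duplicate names.
    ordered = []
    for name in names:
        if name not in ordered:
            ordered.append(name)
    return [{"type": name} for name in ordered]
```

For example, `build_features(["DOCUMENT_TEXT_DETECTION", "LABEL_DETECTION"])` returns `[{"type": "DOCUMENT_TEXT_DETECTION"}, {"type": "LABEL_DETECTION"}]`.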
Sample code
You can either send an annotation request with a locally stored file, or use a file that is stored on Cloud Storage.
Using a locally stored file
Use the following code samples to get any feature annotation for a locally stored file.
REST
To perform online PDF/TIFF/GIF feature detection for a small batch of files, make a POST request and provide the appropriate request body:
Before using any of the request data, make the following replacements:
- BASE64_ENCODED_FILE: The base64 representation (ASCII string) of your binary file data. This string should look similar to the following string:
JVBERi0xLjUNCiW1tbW1...ydHhyZWYNCjk5NzM2OQ0KJSVFT0Y=
- PROJECT_ID: Your Google Cloud project ID.
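One way to produce BASE64_ENCODED_FILE is with Python's standard `base64` module, as in the sketch below. The helper names are hypothetical. Every PDF starts with the bytes `%PDF-`, which is why the sample string above begins with `JVBERi0x`.

```python
import base64


def encode_file_bytes(data: bytes) -> str:
    """Return the base64 (ASCII) string expected in inputConfig.content."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path: str) -> str:
    """Read a local PDF, TIFF, or GIF file and base64-encode its bytes."""
    with open(path, "rb") as f:
        return encode_file_bytes(f.read())
```

For example, `encode_file_bytes(b"%PDF-1.5")` returns `"JVBERi0xLjU="`.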
Field-specific considerations:
- inputConfig.mimeType: one of "application/pdf", "image/tiff", or "image/gif".
- pages: the specific pages of the file on which to perform feature detection.
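If files are picked up by name, `inputConfig.mimeType` can be derived from the file extension. This is a minimal sketch under the assumption that extensions match content; `mime_type_for` is a hypothetical helper, and it does not inspect the file bytes.

```python
from pathlib import PurePosixPath

# Assumed mapping from extension to the three accepted mimeType values.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".gif": "image/gif",
}


def mime_type_for(path: str) -> str:
    """Guess inputConfig.mimeType from a local path or gs:// URI."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file type for files:annotate: {path!r}")
    return MIME_TYPES[suffix]
```

For example, `mime_type_for("scan.TIFF")` returns `"image/tiff"`.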
HTTP method and URL:
POST https://vision.googleapis.com/v1/files:annotate
Request JSON body:
{ "requests": [ { "inputConfig": { "content": "BASE64_ENCODED_FILE", "mimeType": "application/pdf" }, "features": [ { "type": "DOCUMENT_TEXT_DETECTION" } ], "pages": [ 1,2,3,4,5 ] } ] }
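The request body above can also be generated rather than edited by hand. The following Python sketch builds the same structure from raw file bytes and saves it as request.json, the file name the curl and PowerShell commands read. `build_local_request` and `write_request_json` are hypothetical helpers.

```python
import base64
import json


def build_local_request(data: bytes, mime_type: str, features, pages):
    """Build the files:annotate body shown above for inline (base64) content."""
    return {
        "requests": [
            {
                "inputConfig": {
                    "content": base64.b64encode(data).decode("ascii"),
                    "mimeType": mime_type,
                },
                "features": [{"type": name} for name in features],
                "pages": list(pages),
            }
        ]
    }


def write_request_json(body, path="request.json"):
    """Save the body under the file name the curl and PowerShell commands expect."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f)
```

To reproduce the example body, read your PDF's bytes and call `build_local_request(data, "application/pdf", ["DOCUMENT_TEXT_DETECTION"], [1, 2, 3, 4, 5])`, then pass the result to `write_request_json`.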
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://vision.googleapis.com/v1/files:annotate"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://vision.googleapis.com/v1/files:annotate" | Select-Object -Expand Content
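The curl and PowerShell commands can be mirrored from Python with only the standard library. This sketch sets the same method, URL, and headers. It assumes the gcloud CLI is installed, on your PATH, and already authenticated; `build_http_request` and `annotate` are hypothetical helpers.

```python
import json
import subprocess
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1/files:annotate"


def build_http_request(body, access_token, project_id):
    """Mirror the curl command: POST JSON with bearer-token and quota-project headers."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def annotate(body, project_id):
    """Send the request and return the parsed JSON response.

    Assumes `gcloud auth print-access-token` works on this machine.
    """
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    with urllib.request.urlopen(build_http_request(body, token, project_id)) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the headers easy to inspect without making a network call.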
A successful annotate request immediately returns a JSON response.
For this feature (DOCUMENT_TEXT_DETECTION), the JSON response is similar to that of an image's document text detection request. The response contains bounding boxes for blocks broken down by paragraphs, words, and individual symbols, as well as the full detected text. The response also contains a context field showing the location of the specified PDF or TIFF file and the page number of each result within the file.
The following response JSON is only for a single page (page 2) and has been shortened for clarity.
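To pull per-page text out of a parsed response, a client walks the nested `responses` lists. This Python sketch assumes the shape described above: an outer `responses` list with one entry per file, each holding its own `responses` list with one entry per annotated page. The field names `fullTextAnnotation.text` and `context.pageNumber` follow the Vision REST reference and should be verified there; `page_texts` is a hypothetical helper.

```python
def page_texts(response_json):
    """Map (file index, page number) to the full detected text of each page."""
    texts = {}
    for file_index, file_response in enumerate(response_json.get("responses", [])):
        for page_response in file_response.get("responses", []):
            page = page_response.get("context", {}).get("pageNumber")
            text = page_response.get("fullTextAnnotation", {}).get("text", "")
            texts[(file_index, page)] = text
    return texts
```

Given a hypothetical trimmed response `{"responses": [{"responses": [{"fullTextAnnotation": {"text": "Hello\n"}, "context": {"pageNumber": 2}}]}]}`, `page_texts` returns `{(0, 2): "Hello\n"}`.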
Java
Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Node.js API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Using a file on Cloud Storage
Use the following code samples to get any feature annotation for a file on Cloud Storage.
REST
To perform online PDF/TIFF/GIF feature detection for a small batch of files, make a POST request and provide the appropriate request body:
Before using any of the request data, make the following replacements:
- CLOUD_STORAGE_FILE_URI: the path to a valid file (PDF, TIFF, or GIF) in a Cloud Storage bucket. You must have at least read access to the file.
Example:
gs://cloud-samples-data/vision/document_understanding/custom_0773375000.pdf
- PROJECT_ID: Your Google Cloud project ID.
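A malformed CLOUD_STORAGE_FILE_URI is a common source of failed requests. This sketch splits a `gs://BUCKET/OBJECT` URI into its parts and rejects anything else. `parse_gcs_uri` is a hypothetical helper; it checks only the shape of the URI, not whether the object exists or is readable.

```python
def parse_gcs_uri(uri: str):
    """Split gs://BUCKET/OBJECT into (bucket, object); raise ValueError if malformed."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    bucket, sep, obj = uri[len(prefix):].partition("/")
    if not bucket or not sep or not obj:
        raise ValueError(f"URI must name both a bucket and an object: {uri!r}")
    return bucket, obj
```

For the example URI above, `parse_gcs_uri` returns `("cloud-samples-data", "vision/document_understanding/custom_0773375000.pdf")`.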
Field-specific considerations:
- inputConfig.mimeType: one of "application/pdf", "image/tiff", or "image/gif".
- pages: the specific pages of the file on which to perform feature detection.
HTTP method and URL:
POST https://vision.googleapis.com/v1/files:annotate
Request JSON body:
{ "requests": [ { "inputConfig": { "gcsSource": { "uri": "CLOUD_STORAGE_FILE_URI" }, "mimeType": "application/pdf" }, "features": [ { "type": "DOCUMENT_TEXT_DETECTION" } ], "pages": [ 1,2,3,4,5 ] } ] }
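For files in Cloud Storage, the only difference from the inline case is that `inputConfig` carries a `gcsSource.uri` instead of `content`. The sketch below builds the body shown above. `build_gcs_request` is a hypothetical helper, and its `gs://` check is only a basic sanity test.

```python
def build_gcs_request(uri: str, mime_type: str, features, pages):
    """Build the files:annotate body shown above for a file in Cloud Storage."""
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return {
        "requests": [
            {
                "inputConfig": {"gcsSource": {"uri": uri}, "mimeType": mime_type},
                "features": [{"type": name} for name in features],
                "pages": list(pages),
            }
        ]
    }
```

Serializing the result with `json.dump` to request.json gives a body the curl and PowerShell commands below can send unchanged.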
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://vision.googleapis.com/v1/files:annotate"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://vision.googleapis.com/v1/files:annotate" | Select-Object -Expand Content
A successful annotate request immediately returns a JSON response.
For this feature (DOCUMENT_TEXT_DETECTION), the JSON response is similar to that of an image's document text detection request. The response contains bounding boxes for blocks broken down by paragraphs, words, and individual symbols, as well as the full detected text. The response also contains a context field showing the location of the specified PDF or TIFF file and the page number of each result within the file.
The following response JSON is only for a single page (page 2) and has been shortened for clarity.
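Because each annotated page has its own entry in the response, it can be useful to check the pages individually. This sketch summarizes one file entry. It assumes, per the Vision REST reference, that a page entry may carry an `error` object with a `message`, that `context.uri` echoes the Cloud Storage path, and that the file entry may report `totalPages`. Treat these field names as assumptions to verify. `summarize_file_response` is hypothetical.

```python
def summarize_file_response(file_response):
    """Summarize one entry of the outer "responses" list of a files:annotate reply.

    Assumed fields: per-page "context" with "uri" and "pageNumber", an optional
    per-page "error" with "message", and an optional file-level "totalPages".
    """
    uris = set()
    ok_pages, failed_pages = [], {}
    for page_response in file_response.get("responses", []):
        context = page_response.get("context", {})
        if "uri" in context:
            uris.add(context["uri"])
        page = context.get("pageNumber")
        error = page_response.get("error")
        if error:
            failed_pages[page] = error.get("message", "")
        else:
            ok_pages.append(page)
    return {
        "uris": sorted(uris),
        "ok_pages": ok_pages,
        "failed_pages": failed_pages,
        "total_pages": file_response.get("totalPages"),
    }
```

Comparing `total_pages` with the pages you requested also shows whether a requested page lies beyond the end of the file.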
Java
Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Node.js API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Try it
Try small batch online feature detection below.
You can use the PDF file specified already or specify your own file in its place.
There are three feature types specified for this request:
DOCUMENT_TEXT_DETECTION
LABEL_DETECTION
CROP_HINTS
You can add or remove other feature types by changing the appropriate object in the request ({"type": "FEATURE_NAME"}).
Request body:
{ "requests": [ { "inputConfig": { "gcsSource": { "uri": "gs://cloud-samples-data/vision/document_understanding/custom_0773375000.pdf" }, "mimeType": "application/pdf" }, "features": [ { "type": "DOCUMENT_TEXT_DETECTION" }, { "type": "LABEL_DETECTION" }, { "type": "CROP_HINTS" } ], "pages": [ 1, 2, 3, 4, 5 ] } ] }
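Adding or removing `{"type": "FEATURE_NAME"}` objects can also be done programmatically. The following sketch returns an edited copy of a request body like the one above and leaves the original untouched. `with_features` is a hypothetical helper; it does not check names against the supported feature table.

```python
import copy


def with_features(body, add=(), remove=()):
    """Return a copy of a files:annotate body with feature types added or removed."""
    updated = copy.deepcopy(body)
    removed = set(remove)
    for request in updated["requests"]:
        # Drop the removed types, then append new types that are not already present.
        features = [f for f in request.get("features", []) if f.get("type") not in removed]
        present = {f.get("type") for f in features}
        for name in add:
            if name not in present:
                features.append({"type": name})
                present.add(name)
        request["features"] = features
    return updated
```

For example, applying `with_features(body, add=["WEB_DETECTION"], remove=["CROP_HINTS"])` to the body above yields the features DOCUMENT_TEXT_DETECTION, LABEL_DETECTION, and WEB_DETECTION.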