Try out Optical Character Recognition (OCR)

This guide walks you through the process of running an Optical Character Recognition (OCR) test using Google's Vertex AI Vision service.

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

Create a Python file named ocr-test.py. Replace the image_uri_to_test value with the URI of a source image, as shown:

import os
import requests
import json

def detect_text_rest(image_uri):
    """Performs Optical Character Recognition (OCR) on an image by invoking the Vertex AI REST API."""

    # Securely fetch the API key from environment variables
    api_key = os.environ.get("GCP_API_KEY")
    if not api_key:
        raise ValueError("GCP_API_KEY environment variable must be defined.")

    # Construct the Vision API endpoint URL
    vision_api_url = f"https://vision.googleapis.com/v1/images:annotate?key={api_key}"

    print(f"Initiating OCR process for image: {image_uri}")

    # Define the request payload for text detection
    request_payload = {
        "requests": [
            {
                "image": {
                    "source": {
                        "imageUri": image_uri
                    }
                },
                "features": [
                    {
                        "type": "TEXT_DETECTION"
                    }
                ]
            }
        ]
    }

    # Send a POST request to the Vision API
    response = requests.post(vision_api_url, json=request_payload)
    response.raise_for_status()  # Check for HTTP errors

    response_json = response.json()

    print("\n--- OCR Results ---")

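    # Extract and print the detected text. "responses" is a list with one
    # entry per request; the first text annotation holds the full text.
    responses = response_json.get("responses") or [{}]
    annotations = responses[0].get("textAnnotations", [])
    if annotations:
        full_text = annotations[0]["description"]
        print(f"Detected Text:\n{full_text}")
    else:
        print("No text was detected in the image.")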

    print("--- End of Results ---\n")

if __name__ == "__main__":
    # URI of a publicly available image or of an image in a Cloud Storage bucket
    image_uri_to_test = "IMAGE_URI"

    detect_text_rest(image_uri_to_test)

Replace the following:

IMAGE_URI: the URI of a publicly available image that contains text, for example, "https://cloud.google.com/vision/docs/images/sign.jpg". Alternatively, you can specify a Cloud Storage URI, for example, "gs://your-bucket/your-image.png".
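
The script reads the detected text from the JSON body of the response. In that body, `responses` is a list with one entry per request, the first `textAnnotations` entry carries the full text, and later entries describe individual words. The following is a minimal sketch of that parsing step, assuming this response shape; the helper name `extract_text` and the sample payload are hypothetical and exist only for illustration:

```python
def extract_text(response_json):
    """Return (full_text, words) from an images:annotate style response.

    Assumes a single request was sent, so only responses[0] is inspected.
    """
    responses = response_json.get("responses", [])
    if not responses:
        return "", []

    first = responses[0]
    # A per-image failure can be reported inside the response body,
    # so check for it explicitly instead of relying on the HTTP status.
    if "error" in first:
        raise RuntimeError(first["error"].get("message", "unknown error"))

    annotations = first.get("textAnnotations", [])
    if not annotations:
        return "", []

    full_text = annotations[0].get("description", "")
    words = [a.get("description", "") for a in annotations[1:]]
    return full_text, words


# Hypothetical sample payload, trimmed to the fields used above.
sample = {
    "responses": [
        {
            "textAnnotations": [
                {"description": "WAITING?\nPLEASE TURN OFF"},
                {"description": "WAITING?"},
                {"description": "PLEASE"},
                {"description": "TURN"},
                {"description": "OFF"},
            ]
        }
    ]
}

full_text, words = extract_text(sample)
print(full_text)
print(words)
```

Returning the text instead of printing it keeps the parsing step easy to test without calling the service.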

Create a Dockerfile:

FROM python:3.9-slim

WORKDIR /app

# Copy the script created earlier into the image
COPY ocr-test.py /app/

# Install 'requests' for HTTP calls
RUN pip install --no-cache-dir requests

CMD ["python", "ocr-test.py"]

Build the Docker image for the OCR application:
```
docker build -t ocr-app .
```
Follow the instructions at Configure Docker to:
1. Configure Docker,
2. Create a secret, and
3. Upload the image to HaaS.
Sign in to the user cluster and generate its kubeconfig file with a user identity. Make sure you set the kubeconfig path as an environment variable:
```
export KUBECONFIG=${CLUSTER_KUBECONFIG_PATH}
```
Create a Kubernetes secret by running the following command in your terminal, replacing PASTE_YOUR_API_KEY_HERE with your API key:
```
kubectl create secret generic gcp-api-key-secret \
  --from-literal=GCP_API_KEY='PASTE_YOUR_API_KEY_HERE'
```
This command creates a secret named gcp-api-key-secret with a key GCP_API_KEY.
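
Kubernetes stores secret values base64-encoded in the `data` field of the Secret object, which is what `kubectl get secret gcp-api-key-secret -o json` returns. As a rough sketch of how you might confirm that the expected key name is present, the snippet below decodes such output; the helper name `decode_secret_data` and the sample value `example-key` are hypothetical:

```python
import base64
import json


def decode_secret_data(secret_json):
    """Decode the base64 values in a Secret's `data` map to plain strings."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


# Hypothetical stand-in for the output of:
#   kubectl get secret gcp-api-key-secret -o json
sample_secret = json.dumps({
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "gcp-api-key-secret"},
    "type": "Opaque",
    "data": {"GCP_API_KEY": base64.b64encode(b"example-key").decode("ascii")},
})

decoded = decode_secret_data(sample_secret)
# Print key names only, to avoid echoing a real API key to the terminal.
print(sorted(decoded))
```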

Save the following Kubernetes manifest to a file and apply it with kubectl apply -f FILE_NAME:

apiVersion: batch/v1
kind: Job
metadata:
  name: ocr-test-job-apikey
spec:
  template:
    spec:
      containers:
      - name: ocr-test-container
        image: ${HARBOR_INSTANCE_URL}/${HARBOR_PROJECT}/ocr-app:latest # Your image path
        # Load the keys of the secret into the container as environment
        # variables, which exposes the API key as GCP_API_KEY.
        envFrom:
        - secretRef:
            name: gcp-api-key-secret
      # Secret used to pull the image from the registry
      imagePullSecrets:
      - name: ${SECRET}
      restartPolicy: Never
  backoffLimit: 4
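
The `${...}` placeholders in the manifest are not expanded by kubectl, so they need real values before the manifest is applied. One possible way to fill them in is Python's `string.Template`, which uses the same `${NAME}` syntax. This is a sketch under assumptions: the registry host, project, and secret name below are hypothetical, and only the lines that contain placeholders are shown:

```python
from string import Template

# Only the manifest lines that contain placeholders are reproduced here;
# in practice the whole manifest file would be read into the template.
manifest_template = Template(
    "image: ${HARBOR_INSTANCE_URL}/${HARBOR_PROJECT}/ocr-app:latest\n"
    "imagePullSecrets:\n"
    "- name: ${SECRET}\n"
)

# Hypothetical values for illustration only.
values = {
    "HARBOR_INSTANCE_URL": "harbor.example.com",
    "HARBOR_PROJECT": "my-project",
    "SECRET": "harbor-pull-secret",
}

# substitute() raises KeyError if any placeholder has no value,
# which catches a forgotten setting before the manifest is applied.
rendered = manifest_template.substitute(values)
print(rendered)
```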

Check the job status:

kubectl get jobs/ocr-test-job-apikey
# It will show 0/1 completions, then 1/1 after it succeeds
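
To check the status from a script instead of reading the completions column, one option is to inspect the `status.conditions` list in the output of `kubectl get job ocr-test-job-apikey -o json`. The sketch below assumes the standard `Complete` and `Failed` condition types of a Kubernetes Job; the helper name `job_state` and the sample documents are hypothetical:

```python
import json


def job_state(job_json):
    """Classify a Job as 'succeeded', 'failed', or 'running'."""
    status = json.loads(job_json).get("status", {})
    for condition in status.get("conditions", []):
        if condition.get("status") != "True":
            continue
        if condition.get("type") == "Complete":
            return "succeeded"
        if condition.get("type") == "Failed":
            return "failed"
    # No terminal condition yet: the Job is still pending or running.
    return "running"


# Hypothetical stand-ins for: kubectl get job ocr-test-job-apikey -o json
finished = json.dumps({
    "status": {
        "succeeded": 1,
        "conditions": [{"type": "Complete", "status": "True"}],
    }
})
in_progress = json.dumps({"status": {"active": 1}})

print(job_state(finished))
print(job_state(in_progress))
```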

After the job has completed, you can view the OCR output in the pod logs:
```
kubectl logs -l job-name=ocr-test-job-apikey
```
