This guide walks you through the process of running an Optical Character Recognition (OCR) test using Google's Vertex AI Vision service.
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
Create a Python file named `ocr-test.py` with the following contents, and set `image_uri_to_test` to the URI of a source image:

```python
import os

import requests


def detect_text_rest(image_uri):
    """Performs Optical Character Recognition (OCR) on an image by invoking the Vertex AI REST API."""
    # Securely fetch the API key from environment variables
    api_key = os.environ.get("GCP_API_KEY")
    if not api_key:
        raise ValueError("GCP_API_KEY environment variable must be defined.")

    # Construct the Vision API endpoint URL
    vision_api_url = f"https://vision.googleapis.com/v1/images:annotate?key={api_key}"

    print(f"Initiating OCR process for image: {image_uri}")

    # Define the request payload for text detection
    request_payload = {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }

    # Send a POST request to the Vision API
    response = requests.post(vision_api_url, json=request_payload, timeout=60)
    response.raise_for_status()  # Check for HTTP errors
    response_json = response.json()

    # "responses" is a list with one entry per image; this request sends one image.
    result = response_json["responses"][0]
    if "error" in result:
        # Per-image failures (for example, an unreachable URI) are reported here.
        raise RuntimeError(f"Vision API error: {result['error'].get('message')}")

    print("\n--- OCR Results ---")
    # The first text annotation holds the full detected text; later ones are single words.
    if result.get("textAnnotations"):
        full_text = result["textAnnotations"][0]["description"]
        print(f"Detected Text:\n{full_text}")
    else:
        print("No text was detected in the image.")
    print("--- End of Results ---\n")


if __name__ == "__main__":
    # URI of a publicly available image, or a Cloud Storage image
    image_uri_to_test = "IMAGE_URI"
    detect_text_rest(image_uri_to_test)
```
Replace the following:

- `IMAGE_URI`: the URI of a publicly available image that contains text, for example, `https://cloud.google.com/vision/docs/images/sign.jpg`. Alternatively, you can specify a Cloud Storage URI, for example, `gs://your-bucket/your-image.png`.
In the same directory as `ocr-test.py`, create a `Dockerfile`:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY ocr-test.py /app/
# Install 'requests' for HTTP calls
RUN pip install --no-cache-dir requests
CMD ["python", "ocr-test.py"]
```
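Before building the container, you can check the request and response handling locally. To do this, pull the two pieces of `detect_text_rest` that touch the API's JSON out into pure functions. The sketch below is an illustration, not part of the original sample, and the helper names `build_text_detection_request` and `extract_full_text` are hypothetical. It relies on two behaviors of `images:annotate`. First, `image.source.imageUri` accepts either a public HTTP(S) URL or a `gs://` URI. Second, `responses` is a list with one entry per image, and the first `textAnnotations` element of an entry holds the full detected text.

```python
from typing import Any, Optional
from urllib.parse import urlparse

# Schemes accepted in image.source.imageUri: Cloud Storage or public HTTP(S).
ALLOWED_SCHEMES = {"gs", "http", "https"}


def build_text_detection_request(image_uri: str) -> dict[str, Any]:
    """Build a single-image TEXT_DETECTION payload, rejecting unusable URIs early."""
    parsed = urlparse(image_uri)
    # Catches an unreplaced "IMAGE_URI" placeholder as well as file:// URIs or bare paths.
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
        raise ValueError(f"Unsupported image URI: {image_uri!r}")
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_full_text(response_json: dict[str, Any]) -> Optional[str]:
    """Return the full detected text for a single-image response, or None if there is none."""
    responses = response_json.get("responses", [])
    if not responses:
        return None
    result = responses[0]
    if "error" in result:
        # Per-image failures can arrive in the body even when the HTTP status is 200.
        err = result["error"]
        raise RuntimeError(f"Vision API error: {err.get('message', err)}")
    annotations = result.get("textAnnotations", [])
    # The first annotation spans all detected text; later ones are individual words.
    return annotations[0].get("description") if annotations else None


if __name__ == "__main__":
    payload = build_text_detection_request("gs://your-bucket/your-image.png")
    print(payload["requests"][0]["image"]["source"]["imageUri"])
    # Hypothetical, trimmed-down response used only to exercise the parser.
    fake_response = {
        "responses": [
            {"textAnnotations": [{"description": "HELLO\nWORLD"}, {"description": "HELLO"}]}
        ]
    }
    print(extract_full_text(fake_response))
```

Running this prints the `gs://` URI, followed by `HELLO` and `WORLD` on separate lines. Whether to fold these helpers back into `ocr-test.py` is a judgment call for a one-off smoke test. The URI check mainly guards against forgetting to replace `IMAGE_URI`.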
Build the Docker image for the OCR application:

```shell
docker build -t ocr-app .
```
Follow the instructions in Configure Docker to:
- Configure Docker,
- Create a secret, and
- Upload the image to Harbor-as-a-Service (HaaS).
Sign in to the user cluster and generate its kubeconfig file with a user identity. Make sure you set the kubeconfig path as an environment variable:
```shell
export KUBECONFIG=${CLUSTER_KUBECONFIG_PATH}
```
Create a Kubernetes secret that holds your API key by running the following command in your terminal, replacing `PASTE_YOUR_API_KEY_HERE` with your API key:

```shell
kubectl create secret generic gcp-api-key-secret \
  --from-literal=GCP_API_KEY='PASTE_YOUR_API_KEY_HERE'
```
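This command creates a secret named `gcp-api-key-secret` with a key `GCP_API_KEY`. Through `envFrom`, the Job below exposes that key to the container as the `GCP_API_KEY` environment variable that `ocr-test.py` reads.

Apply the following Kubernetes manifest, for example with `kubectl apply -f`, after replacing `${HARBOR_INSTANCE_URL}`, `${HARBOR_PROJECT}`, and `${SECRET}` with your Harbor instance URL, Harbor project, and image pull secret name: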
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ocr-test-job-apikey
spec:
  backoffLimit: 4
  template:
    spec:
      imagePullSecrets:
      - name: ${SECRET}
      containers:
      - name: ocr-test-container
        image: ${HARBOR_INSTANCE_URL}/${HARBOR_PROJECT}/ocr-app:latest # Your image path
        # Expose the API key from the secret to the container
        # as an environment variable named GCP_API_KEY.
        envFrom:
        - secretRef:
            name: gcp-api-key-secret
      restartPolicy: Never
```
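`kubectl apply` does not expand shell-style `${...}` variables, so the three placeholders in the manifest must be filled in first. Python's `string.Template` uses the same `${NAME}` syntax and can do this, and `envsubst` from GNU gettext is a common shell alternative. The sketch below assumes two things: the manifest is saved to a file, and `HARBOR_INSTANCE_URL`, `HARBOR_PROJECT`, and `SECRET` are exported as environment variables. The function, script, and file names are hypothetical.

```python
import os
import sys
from string import Template


def render_manifest(manifest_text: str, values: dict[str, str]) -> str:
    """Fill ${NAME} placeholders from `values`.

    Template.substitute() raises KeyError for any placeholder missing from
    `values`, so a forgotten variable fails loudly instead of reaching kubectl.
    """
    return Template(manifest_text).substitute(values)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python render_job.py ocr-job.yaml > ocr-job.rendered.yaml
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(render_manifest(f.read(), dict(os.environ)))
```

After rendering, apply the output file, for example `kubectl apply -f ocr-job.rendered.yaml`. The script uses `substitute()` rather than `safe_substitute()` on purpose: a half-rendered image path would otherwise only surface later as an image pull error.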
Check the job status:
```shell
kubectl get jobs/ocr-test-job-apikey
# COMPLETIONS shows 0/1 while the job runs, then 1/1 after it succeeds.
```
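The `COMPLETIONS` column covers the success case. If the pod keeps failing, the Job eventually gives up after `backoffLimit` retries and records a `Failed` condition. A script can read the full Job object with `kubectl get jobs/ocr-test-job-apikey -o json` and inspect `status.conditions`, where a `Complete` or `Failed` condition with status `"True"` marks the end state. The sketch below is a minimal illustration, and the helper names `job_state` and `fetch_job` are hypothetical.

```python
import json
import subprocess
from typing import Any


def job_state(job: dict[str, Any]) -> str:
    """Classify a batch/v1 Job object as 'Complete', 'Failed', or 'InProgress'."""
    for cond in job.get("status", {}).get("conditions", []):
        # Terminal conditions are only meaningful when their status is "True".
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"]
    return "InProgress"


def fetch_job(name: str) -> dict[str, Any]:
    """Read the Job as JSON through kubectl, using the KUBECONFIG exported earlier."""
    out = subprocess.run(
        ["kubectl", "get", f"jobs/{name}", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

For example, `job_state(fetch_job("ocr-test-job-apikey"))` could be polled in a loop until it stops returning `InProgress`. `kubectl wait --for=condition=complete` is the built-in alternative when you only need to block on success.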
After the job has completed, you can view the OCR output in the pod logs:
```shell
kubectl logs -l job-name=ocr-test-job-apikey
```
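You may want the detected text on its own, for example to compare it with an expected string in a smoke test. The `--- OCR Results ---` and `--- End of Results ---` markers printed by `ocr-test.py` make it easy to cut out of the logs. This is a minimal sketch with a hypothetical function name. It assumes the log format produced by the script above and reads only the first result block, because a retried Job can have several pods and `kubectl logs -l` prints the logs of each.

```python
from typing import Optional

START = "--- OCR Results ---"
END = "--- End of Results ---"
PREFIX = "Detected Text:"


def text_from_logs(log_output: str) -> Optional[str]:
    """Return the detected text printed by ocr-test.py, or None if no text was found."""
    lines = log_output.splitlines()
    try:
        start = lines.index(START)
        end = lines.index(END, start + 1)
    except ValueError:
        raise ValueError("OCR result markers not found in log output") from None
    body = lines[start + 1:end]
    if not body or body[0] != PREFIX:
        return None  # e.g. "No text was detected in the image."
    return "\n".join(body[1:])
```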