Optical Character Recognition (OCR) is one of the three Vertex AI pre-trained APIs on Google Distributed Cloud (GDC) air-gapped. The OCR service detects text in various file types, such as images, document files, and handwritten text.
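OCR offers the following methods available in Distributed Cloud to recognize text:
BatchAnnotateImages: detects text from JPEG or PNG images.
BatchAnnotateFiles: detects text from PDF or TIFF files in inline requests.
AsyncBatchAnnotateFiles: detects text from a batch of PDF or TIFF files in a storage bucket for offline requests.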
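The pairing of methods and file types above can be sketched as a small lookup. This is an illustrative helper only: the function name `choose_ocr_method`, the extension spellings, and the decision to reject offline image requests are assumptions, not part of the OCR API.

```python
from pathlib import Path

# Assumption: common extension spellings for the file types named above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
DOCUMENT_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def choose_ocr_method(filename: str, offline: bool = False) -> str:
    """Return the name of the OCR method that matches a file and request mode."""
    extension = Path(filename).suffix.lower()
    if extension in IMAGE_EXTENSIONS:
        if offline:
            # Assumption: no offline method is listed for images,
            # so this sketch treats the combination as an error.
            raise ValueError("No offline method is listed for JPEG or PNG images.")
        return "BatchAnnotateImages"
    if extension in DOCUMENT_EXTENSIONS:
        return "AsyncBatchAnnotateFiles" if offline else "BatchAnnotateFiles"
    raise ValueError(f"Unsupported file type: {extension or filename}")
```

For example, `choose_ocr_method("report.pdf", offline=True)` selects `AsyncBatchAnnotateFiles`, while a `.docx` file raises `ValueError`.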
Learn more about the supported languages detected by the text recognition feature.
Optical character recognition features
The OCR API can detect and extract text from images. The following two annotation features support optical character recognition:
TEXT_DETECTION detects and extracts text from any image. For example, a photograph might contain a street or traffic sign. The OCR service returns a JSON file with the extracted string, individual words, and their bounding boxes.
Figure 1. Road sign photograph where the OCR API detects words and their bounding boxes.
DOCUMENT_TEXT_DETECTION also extracts text from an image, but the service optimizes the response for dense text and documents. For example, a scanned image of typed text might contain several paragraphs and headings. The OCR service returns a JSON file with page, block, paragraph, word, and break information.
Figure 2. Scanned image of typed text where the OCR API detects information such as words, pages, and paragraphs.
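As a minimal sketch of how one of the two features might be selected, the helper below builds a JSON request body for a single image. It assumes the body follows the Cloud Vision-style shape (`requests`, `image.content`, `features[].type`); the function name `build_image_request` is hypothetical, and the exact request format should be confirmed against the OCR API reference for your Distributed Cloud release.

```python
import base64
import json

# The two annotation features named in this section.
SUPPORTED_FEATURES = {"TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"}


def build_image_request(image_bytes: bytes, feature: str = "TEXT_DETECTION") -> dict:
    """Build a request body holding exactly one image annotation request."""
    if feature not in SUPPORTED_FEATURES:
        raise ValueError(f"Unknown OCR feature: {feature}")
    return {
        # Assumption: Cloud Vision-style field names.
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature}],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the contents of a real JPEG or PNG file.
    body = build_image_request(b"placeholder image bytes", "DOCUMENT_TEXT_DETECTION")
    print(json.dumps(body, indent=2))
```

The body carries one entry in `requests`, which keeps it within the single-request-per-batch-call rule described in the limits section.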
Handwritten text
Figure 3 is an image of handwritten text. The OCR API detects and extracts text from such images. For a list of scripts that support handwriting recognition, see Handwriting scripts.
Figure 3. Handwriting image where the OCR API detects text.
Optical character recognition limits
The BatchAnnotateImages and BatchAnnotateFiles API methods only support a single request per batch call.
The following table lists the current limits of the OCR service in Distributed Cloud.
File limit for OCR         Value
Maximum number of pages    Five
Maximum file size          20 MB
Maximum image size         20 million pixels (length x width)
Files submitted to the OCR API that exceed the maximum number of pages or the maximum file size return an error. Submitted files that exceed the maximum image size are downsized to 20 million pixels.
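The limits above can be checked before a file is submitted. The following is a rough client-side sketch, not the service's own logic: it assumes "20 MB" means 20 x 1024 x 1024 bytes and that downsizing preserves the aspect ratio, neither of which is stated here. The function names are hypothetical.

```python
from __future__ import annotations

import math

MAX_PAGES = 5
MAX_FILE_BYTES = 20 * 1024 * 1024  # Assumption: "20 MB" read as 20 MiB.
MAX_PIXELS = 20_000_000


def check_file_limits(size_bytes: int, page_count: int) -> list[str]:
    """Return the limits a file would exceed; an empty list means none."""
    problems = []
    if page_count > MAX_PAGES:
        problems.append(f"{page_count} pages exceeds the maximum of {MAX_PAGES}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the maximum of {MAX_FILE_BYTES}")
    return problems


def downsized_dimensions(width: int, height: int) -> tuple[int, int]:
    """Estimate image dimensions after downsizing to at most MAX_PIXELS.

    Assumption: both sides are scaled by the same factor.
    """
    pixels = width * height
    if pixels <= MAX_PIXELS:
        return width, height
    scale = math.sqrt(MAX_PIXELS / pixels)
    # Round down so the result never exceeds the pixel budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
```

An 8000 x 5000 image has 40 million pixels, so each side is scaled by the square root of 0.5, which brings the total just under 20 million pixels. Page count and file size are different: exceeding either returns an error, so it is cheaper to catch those cases before sending the request.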
Supported file types for OCR
The OCR pre-trained API detects and transcribes text from the following file types:
PDF
TIFF
JPG
PNG
You must store the files locally in your Distributed Cloud environment. You can't access files hosted in Cloud Storage or publicly available files for text detection.
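One way to catch unsupported sources early is to reject references that point outside the Distributed Cloud environment. The sketch below is a hypothetical guard: it assumes that `gs://` marks a Cloud Storage object and that `http://` or `https://` marks a publicly hosted file. Your environment may use other URI schemes that need the same treatment.

```python
from urllib.parse import urlparse

# Assumption: these schemes indicate a source outside Distributed Cloud.
EXTERNAL_SCHEMES = {"gs", "http", "https"}


def reject_external_source(source: str) -> str:
    """Return the source unchanged, or raise if it points to an external host."""
    scheme = urlparse(source).scheme.lower()
    if scheme in EXTERNAL_SCHEMES:
        raise ValueError(
            f"'{source}' is not stored in the Distributed Cloud environment."
        )
    return source
```

A plain file path such as `scans/invoice.pdf` passes through, while `gs://my-bucket/invoice.pdf` raises `ValueError`.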