Custom-based extraction
With custom model training and extraction, you can build your own model designed specifically for your documents, without using generative AI. This is ideal if you don't want to use generative AI and want to control every aspect of the trained model.
Dataset configuration
A document dataset is required to train, up-train, or evaluate a processor version.
Document AI processors learn from examples, just like humans do. The dataset is what drives the processor's stability and performance.
Training dataset
To improve the model and its accuracy, train it on a dataset of your documents. The training dataset is made up of documents with ground truth, that is, data correctly labeled by humans. You need a minimum of three documents to train a new model.
Test dataset
The test dataset is what the model uses to generate an F1 (accuracy) score. It is made up of documents with ground truth. To see how often the model is right, the ground truth is used to compare the model's predictions (the fields extracted by the model) against the correct answers. The test dataset must contain at least three documents.
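For reference, the F1 score reported against the test set is the harmonic mean of precision and recall. The short sketch below is only a worked illustration with made-up counts, not output from Document AI; it just shows how the three values relate.

```python
# Illustrative only: how precision, recall, and F1 relate.
# The counts below are hypothetical, not produced by Document AI.
true_positives = 90   # fields extracted and matching the ground truth
false_positives = 10  # fields extracted but wrong or spurious
false_negatives = 20  # ground-truth fields the model missed

precision = true_positives / (true_positives + false_positives)  # 0.90
recall = true_positives / (true_positives + false_negatives)     # ~0.82
f1 = 2 * precision * recall / (precision + recall)               # ~0.86

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```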
Before getting started
If you haven't already, enable billing and the Document AI API.
Build and evaluate a custom model
Begin by building, and then evaluating, a custom processor.
Create a processor and define the fields you want to extract. This matters because it affects extraction quality. The default processor is a foundation model.
Set the dataset location: select the default option, a Google-managed folder.
This might happen automatically shortly after you create the processor.
Navigate to the Build tab and select Import Documents with auto-labeling enabled (see Auto-labeling with the foundation model). You need a minimum of 10 documents in the training set and 10 in the test set to train a custom model.
Train the model:
Select Train new version and name the processor version.
Go to Show advanced options and select the Model based option. Training takes some time to complete. If you prefer to start training from the client library rather than the console, see the sketch after these steps.
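The sketch below shows one possible way to start training with the Python client library. It assumes the google-cloud-documentai package, a processor whose dataset already contains labeled training and test documents, and placeholder values for the project, location, and processor IDs; treat the exact request shape as an assumption and verify it against the current client library reference.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # pip install google-cloud-documentai

# Placeholder values -- replace with your own.
project_id, location, processor_id = "your-project-id", "us", "your-processor-id"

client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
)
parent = client.processor_path(project_id, location, processor_id)

# Train a new, named processor version on the processor's labeled dataset.
operation = client.train_processor_version(
    request=documentai.TrainProcessorVersionRequest(
        parent=parent,
        processor_version=documentai.ProcessorVersion(display_name="my-custom-v1"),
    )
)
print("Training started:", operation.operation.name)

# Training is a long-running operation; this blocks until it finishes.
response = operation.result()
print("Trained version:", response.processor_version)
```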
Evaluation:
Go to Evaluate & test, select the version you just trained, then select View full evaluation.
You now see metrics such as F1, precision, and recall for the entire document and for each field.
Decide whether the performance meets your production goals. If it doesn't, reevaluate the training and test sets, typically by adding documents that don't parse well to the training set. The same evaluation results can also be retrieved programmatically, as sketched below.
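The following sketch lists evaluations for a trained processor version and prints the document counters. It assumes google-cloud-documentai and placeholder IDs; the exact fields available on the Evaluation message (including the per-field metrics) should be checked against the client library reference.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # pip install google-cloud-documentai

# Placeholder values -- replace with your own.
project_id, location = "your-project-id", "us"
processor_id, processor_version_id = "your-processor-id", "your-version-id"

client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
)
parent = client.processor_version_path(
    project_id, location, processor_id, processor_version_id
)

# Each evaluation corresponds to one run against the test set.
for evaluation in client.list_evaluations(parent=parent):
    counters = evaluation.document_counters
    print(evaluation.name, evaluation.create_time)
    print("  evaluated documents:", counters.evaluated_documents_count)
    print("  failed documents:   ", counters.failed_documents_count)
```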
Set the new version as the default.
Navigate to Manage versions.
Open the more_vert menu, then select Set as default. The equivalent client-library call is sketched below.
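A minimal sketch for promoting the new version to default from the Python client library, assuming google-cloud-documentai and placeholder IDs:

```python
from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # pip install google-cloud-documentai

# Placeholder values -- replace with your own.
project_id, location = "your-project-id", "us"
processor_id, processor_version_id = "your-processor-id", "your-version-id"

client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
)

request = documentai.SetDefaultProcessorVersionRequest(
    processor=client.processor_path(project_id, location, processor_id),
    default_processor_version=client.processor_version_path(
        project_id, location, processor_id, processor_version_id
    ),
)

# Long-running operation; wait for it to complete.
client.set_default_processor_version(request=request).result()
print("Default version updated.")
```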
Your model is now deployed, and documents sent to this processor now use your custom version. You will want to evaluate the model's performance to check whether it needs further training. To confirm end to end that requests run through the processor, see the sketch below.
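A minimal processing request that sends a local PDF to the processor and prints the extracted fields. It assumes google-cloud-documentai, a local file, and placeholder IDs.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # pip install google-cloud-documentai

# Placeholder values -- replace with your own.
project_id, location, processor_id = "your-project-id", "us", "your-processor-id"
file_path = "invoice.pdf"  # any document of the type your processor was trained for

client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
)
name = client.processor_path(project_id, location, processor_id)

with open(file_path, "rb") as f:
    raw_document = documentai.RawDocument(content=f.read(), mime_type="application/pdf")

result = client.process_document(
    request=documentai.ProcessRequest(name=name, raw_document=raw_document)
)

# Each entity is one extracted field: its type, the text found, and a confidence score.
for entity in result.document.entities:
    print(f"{entity.type_}: {entity.mention_text!r} (confidence {entity.confidence:.2f})")
```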
Evaluation reference
The evaluation engine can do either exact matching or fuzzy matching.
With an exact match, the extracted value must match the ground truth exactly, or it is counted as a miss.
With fuzzy matching, extractions that have slight differences, such as differences in capitalization, still count as a match. This behavior can be changed on the Evaluation screen.
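As a toy illustration of the difference (this is not the actual evaluation engine): exact matching treats any character difference as a miss, while a fuzzy rule such as case- and whitespace-insensitive comparison would still count the extraction as correct.

```python
# Toy illustration of exact vs. fuzzy matching -- not Document AI's engine.
ground_truth = "Acme Corporation"
extracted = "ACME  Corporation"

exact_match = extracted == ground_truth  # False: capitalization and spacing differ


def normalize(value: str) -> str:
    """One possible fuzzy rule: ignore case and collapse whitespace."""
    return " ".join(value.lower().split())


fuzzy_match = normalize(extracted) == normalize(ground_truth)  # True

print(f"exact={exact_match}, fuzzy={fuzzy_match}")
```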
Auto-labeling with the foundation model
The foundation model can accurately extract fields for a wide variety of document types, but you can also provide additional training data to improve the model's accuracy for specific document structures.
Document AI uses the label names you define, along with previous annotations, to label documents at scale with auto-labeling.
After you've created a custom processor, go to the Get Started tab.
Select Create new field.
Provide a descriptive name and fill out the description field. The property description lets you provide additional context, insights, and prior knowledge for each entity, which improves extraction accuracy and performance. Good property descriptions include location information and text patterns of the property values, which help resolve potential sources of confusion in the document.
Navigate to the Build tab, then select Import documents.
Select the path of the documents and the set the documents should be imported into. Check the auto-labeling box, and select the foundation model.
In the Build tab, select Manage Dataset. You should see your imported documents. Select one of them.
You now see the model's predictions highlighted in purple.
Review each label predicted by the model and make sure it's correct. If any fields are missing, add them as well. It's important that every field is as accurate as possible, otherwise labeling errors carry over into model performance.
After the document has been reviewed, select Mark as labeled.
The document is now ready to be used by the model. Make sure the document is in either the Training or Test set. After retraining, you can flag fields that may still need more labeled examples, as sketched below.
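One way to decide whether a document structure still needs more labeled examples is to look at per-entity confidences on processed output. The sketch below takes the Document returned by the earlier process_document example and simply flags low-confidence fields for manual review; the 0.7 threshold is an arbitrary assumption, not a Document AI recommendation.

```python
from google.cloud import documentai  # pip install google-cloud-documentai

REVIEW_THRESHOLD = 0.7  # arbitrary example threshold, tune for your use case


def flag_low_confidence_fields(document: documentai.Document) -> None:
    """Print extracted fields whose confidence falls below the threshold.

    `document` is the Document returned by process_document (see the earlier
    sketch); fields listed here are candidates for extra labeled training documents.
    """
    flagged = [
        (entity.type_, entity.mention_text, entity.confidence)
        for entity in document.entities
        if entity.confidence < REVIEW_THRESHOLD
    ]
    for field_type, text, confidence in sorted(flagged, key=lambda item: item[2]):
        print(f"review {field_type}: {text!r} (confidence {confidence:.2f})")
```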
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-08-18 UTC."],[[["\u003cp\u003eCustom model training and extraction allows building models tailored to specific documents without generative AI, providing complete control over the trained model.\u003c/p\u003e\n"],["\u003cp\u003eA document dataset, consisting of at least three documents, is essential for training, up-training, or evaluating a processor version, as it acts as the source for the model's learning and stability.\u003c/p\u003e\n"],["\u003cp\u003eTraining a model involves using a dataset of documents with ground-truth to improve accuracy, while the test dataset compares the model's predictions against ground truth to measure its accuracy using an F1 score.\u003c/p\u003e\n"],["\u003cp\u003eCreating and evaluating a custom processor involves defining fields, importing documents with auto-labeling, training a new version, and evaluating performance metrics like F1, precision, and recall.\u003c/p\u003e\n"],["\u003cp\u003eAuto-labeling, which can be enhanced with descriptive property information for each entity, uses the foundation model to predict labels and improve extraction accuracy for specific document structures.\u003c/p\u003e\n"]]],[],null,["# Custom-based extraction\n=======================\n\nCustom model training and extraction lets you to build your own model designed specifically for your documents without the use\nof generative AI. It's ideal if you don't want to use generative AI and want to control all aspects of the trained model.\n\n\nDataset configuration\n---------------------\n\nA document dataset is required to train, up-train, or evaluate a processor version. Document AI processors learn from examples, just like humans. Dataset fuels processor stability in terms of performance. \n\n### Train dataset\n\nTo improve the model and its accuracy, train a dataset on your documents. The model is made up of documents with ground-truth. You need a minimum of three documents to train a new model. Ground-truth is the correctly labeled data, as determined by humans.\n\n### Test dataset\n\nThe test dataset is what the model uses to generate an F1 score (accuracy). It is made up of documents with ground-truth. To see how often the model is right, the ground truth is used to compare the model's predictions (extracted fields from the model) with the correct answers. The test dataset should have at least three documents.\n\n\u003cbr /\u003e\n\nBefore getting started\n----------------------\n\nIf not done so already, [enable billing](/document-ai/docs/setup#billing) and the\n[Document AI API](/document-ai/docs/setup).\n\nBuild and evaluate a custom model\n---------------------------------\n\nBegin by building and then evaluating a custom processor.\n\n1. [Create a processor](/document-ai/docs/workbench/build-custom-processor#create_a_processor)\n and [define fields](/document-ai/docs/workbench/build-custom-processor#define_processor_fields)\n you want to extract, which is important because it impacts extraction quality.\n\n | **Note:** The default processor is a foundation model.\n2. 
Set dataset location: Select the default option folder **Google-managed**.\n This might be done automatically shortly after creating the processor.\n\n3. Navigate to the **Build** tab and select **Import Documents** with auto-labeling\n enabled (see [Auto-labeling with the foundation model](#auto-labeling)). You need a minimum of\n 10 documents in the training set and 10 in the testing set to train a custom model.\n\n4. Train model:\n\n 1. Select **Train new version** and name the processor version.\n 2. Go to **Show advanced options** and select the **Model based** option.\n\n | **Note:** It takes some time for the training to complete.\n5. Evaluation:\n\n - Go to **Evaluate \\& test** , select the version you just trained, then select **View full evaluation**.\n\n - You now see metrics such as [f1, precision, and recall](/document-ai/docs/workbench/evaluate#all-labels) for the entire document and each field.\n - Decide if performance meets your production goals. If it does not, then reevaluate training and testing sets, typically adding documents to the training test set that don't parse well.\n6. Set a new version as the default.\n\n 1. Navigate to **Manage versions**.\n 2. Navigate to the more_vert menu and then select **Set as default**.\n\nYour model is now deployed and documents sent to this processor are now using your\ncustom version. You want to [evaluate the model's performance](/document-ai/docs/workbench/evaluate)\nto check if it requires further training.\n\nEvaluation reference\n--------------------\n\nThe evaluation engine can do both exact match or [fuzzy matching](/document-ai/docs/workbench/evaluate#fuzzy_matching).\nFor an exact match, the extracted value must exactly match the ground truth or is counted as a miss.\n\nFuzzy matching extractions that had slight differences such as capitalization\ndifferences still count as a match. This can be changed at the **Evaluation** screen.\n\nAuto-labeling with the foundation model\n---------------------------------------\n\nThe foundation model can accurately extract fields for a variety of document types,\nbut you can also provide additional training data to improve the accuracy of the\nmodel for specific document structures.\n\nDocument AI uses the label names you define and previous annotations to label\ndocuments at scale with auto-labeling.\n\n1. When you've created a custom processor, go to the **Get Started** tab.\n2. Select **Create new field**.\n3. Provide a descriptive name and fill out the description field. Property description lets you provide additional context, insights, and prior knowledge for each entity to improve extraction accuracy and performance.\n\n| **Note:** Good examples of property descriptions include location information and text patterns of the property values, which help disambiguate potential sources of confusion in the document, guiding the model with rules that ensure more reliable and consistent extractions, regardless of the specific document structure or content variations.\n\n1. Navigate to the **Build** tab, then select **Import documents**.\n\n2. Select the path of the documents and which set the documents should be imported\n into. Check the auto-labeling box, and select the foundation model.\n\n3. In the **Build** tab, select **Manage Dataset**. You should see your imported\n documents. Select one of your documents.\n\nYou now see the predictions from the model highlighted in purple.\n\n1. Review each label predicted by the model and ensure is correct. 
If there are missing fields, add those as well.\n\n| **Note:** It's important that all fields are as accurate as possible or model performance might affect the results. [More details on labeling](/document-ai/docs/workbench/label-documents).\n\n1. After the document has been reviewed, select **Mark as labeled** . The document is now ready to be used by the model. Make sure the document is in either the **Testing** or **Training** set."]]