Training dataset maximums: 300 documents; 300 pages
Training dataset minimum: every label on at least 3 documents
Test dataset maximums: 2,000 documents; 8,000 pages
Test dataset minimum: every label on at least 3 documents
Maximum of 20 pages per document
Limits to train a Custom Document Classifier (CDC) or a Custom Document Splitter (CDS)
Training dataset maximums: 30,000 documents; 100,000 pages
Training dataset minimum: every label on at least 10 documents
Test dataset maximums: 2,000 documents; 8,000 pages
Test dataset minimum: every label on at least 2 documents
Maximum of 200 pages per document
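As a rough illustration, the CDC/CDS limits above can be checked locally before a dataset is uploaded. The sketch below is not part of Document AI: the SplitLimits class, the check_split function, and the (page_count, labels) input shape are all hypothetical names chosen for this example. Only the numeric limits come from the list above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitLimits:
    """Limits for one dataset split (hypothetical helper, not a Document AI type)."""
    max_documents: int
    max_pages: int
    min_documents_per_label: int
    max_pages_per_document: int = 200


# Values copied from the CDC/CDS limits listed above.
TRAINING_LIMITS = SplitLimits(max_documents=30_000, max_pages=100_000, min_documents_per_label=10)
TEST_LIMITS = SplitLimits(max_documents=2_000, max_pages=8_000, min_documents_per_label=2)


def check_split(documents, limits):
    """Return a list of limit violations for one dataset split.

    `documents` is a list of (page_count, labels) pairs. This shape is an
    assumption made for the sketch, not a Document AI data structure.
    """
    problems = []
    if len(documents) > limits.max_documents:
        problems.append(f"too many documents: {len(documents)} > {limits.max_documents}")
    total_pages = sum(pages for pages, _ in documents)
    if total_pages > limits.max_pages:
        problems.append(f"too many pages: {total_pages} > {limits.max_pages}")
    for index, (pages, _) in enumerate(documents):
        if pages > limits.max_pages_per_document:
            problems.append(
                f"document {index} has {pages} pages (max {limits.max_pages_per_document})"
            )
    # Count each label once per document, even if it occurs several times.
    label_counts = Counter(label for _, labels in documents for label in set(labels))
    for label, count in sorted(label_counts.items()):
        if count < limits.min_documents_per_label:
            problems.append(
                f"label {label!r} is on {count} documents (min {limits.min_documents_per_label})"
            )
    return problems
```

Calling check_split(training_documents, TRAINING_LIMITS) returns an empty list when the split is within every limit. The sketch only sees labels that occur on at least one document, so a label defined in the schema but never used would need a separate check.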
Labeling
To get started, ensure document labels meet defined minimum training and evaluation thresholds.
To begin evaluating model performance for documents with layout variation, label at least 100 documents. Specifically, ensure that each label exists on 50 documents in training and 50 in evaluation.
Maximum allowed labels (fields): 150
Label size limits (characters): Long items aren't well supported, but there's no explicit limit. Chunk documents into 800- or 1,000-token pieces, with 100 to 200 tokens overlapping between chunks. (Items longer than the overlapping area might run into quality issues.)
Label occurrences in a document: No limit
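The chunking guidance under label size limits can be sketched as a sliding window over a token list. This is a minimal illustration under stated assumptions: chunk_tokens is a hypothetical helper, not a Document AI API, and how the text is tokenized is left to the caller because the guidance above does not name a tokenizer. The defaults of 1,000 tokens per chunk and 200 tokens of overlap are taken from the ranges given above.

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=200):
    """Split a token list into chunks where neighbours share `overlap` tokens.

    Hypothetical helper for illustration. `tokens` can be any list, for
    example the result of a tokenizer chosen by the caller.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
        start += step
    return chunks
```

With these defaults each new chunk starts 800 tokens after the previous one, so the last 200 tokens of one chunk are repeated as the first 200 tokens of the next.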
Geographic coverage
Regions generally supported: US, EU (multiregion)
Regions with limited accessibility: Germany, Singapore, UK, Canada, India, Australia
Last updated 2025-03-27 UTC.