Training dataset maximums: 300 documents; 300 pages
Training dataset minimum: every label on at least 3 documents
Test dataset maximums: 2,000 documents; 8,000 pages
Test dataset minimum: every label on at least 3 documents
Maximum of 20 pages per document
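A minimal sketch of a pre-upload size check against the limits listed above, assuming each document is represented only by its page count. The names `SizeLimits`, `check_dataset_size`, `TRAIN_LIMITS`, and `TEST_LIMITS` are hypothetical and not part of any Document AI API; the numbers are copied from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeLimits:
    """Size caps for one dataset split (values copied from the limits list)."""
    max_documents: int
    max_total_pages: int
    max_pages_per_document: int


# Hypothetical constants mirroring the list above; the 20-page cap applies to both splits.
TRAIN_LIMITS = SizeLimits(max_documents=300, max_total_pages=300, max_pages_per_document=20)
TEST_LIMITS = SizeLimits(max_documents=2_000, max_total_pages=8_000, max_pages_per_document=20)


def check_dataset_size(page_counts: list[int], limits: SizeLimits) -> list[str]:
    """Return every violation found; an empty list means the split fits the limits."""
    problems: list[str] = []
    if len(page_counts) > limits.max_documents:
        problems.append(f"{len(page_counts)} documents exceeds {limits.max_documents}")
    total = sum(page_counts)
    if total > limits.max_total_pages:
        problems.append(f"{total} pages exceeds {limits.max_total_pages}")
    for i, pages in enumerate(page_counts):
        if pages > limits.max_pages_per_document:
            problems.append(
                f"document {i} has {pages} pages (max {limits.max_pages_per_document})"
            )
    return problems


if __name__ == "__main__":
    # 150 two-page documents: 300 pages in total, exactly at the training cap.
    print(check_dataset_size([2] * 150, TRAIN_LIMITS))
    # A small test split that contains one oversized document.
    print(check_dataset_size([5, 25], TEST_LIMITS))
```

Returning a list instead of raising on the first failure reports every problem in one pass. Note that because the training caps are 300 documents and 300 pages, a training set at the document maximum can average only one page per document.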
Limits to train a Custom Document Classifier (CDC) or a Custom Document Splitter (CDS)
Training dataset maximums: 30,000 documents; 100,000 pages
Training dataset minimum: every label on at least 10 documents
Test dataset maximums: 2,000 documents; 8,000 pages
Test dataset minimum: every label on at least 2 documents
Maximum of 200 pages per document
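The per-label minimums (here, at least 10 training documents and 2 test documents per label) are easy to miss when a dataset is assembled by hand. A minimal sketch of a coverage check, assuming each document is reduced to the set of label names it carries; `labels_below_minimum` and the constant names are hypothetical, not part of a Document AI client library.

```python
from collections import Counter

# Hypothetical constants mirroring the CDC/CDS minimums listed above.
CDC_CDS_TRAIN_MIN = 10
CDC_CDS_TEST_MIN = 2


def labels_below_minimum(
    doc_labels: list[set[str]],
    min_documents: int,
    schema: frozenset[str] = frozenset(),
) -> dict[str, int]:
    """Map each under-covered label to the number of documents it appears on.

    A label repeated within one document counts once for that document.
    Labels listed in `schema` but absent from every document are reported with 0.
    """
    counts: Counter = Counter()
    for labels in doc_labels:
        counts.update(set(labels))  # count documents, not occurrences
    for label in schema:
        counts.setdefault(label, 0)  # surface labels that were never used
    return {label: n for label, n in counts.items() if n < min_documents}


if __name__ == "__main__":
    train = [{"invoice"}] * 12 + [{"receipt"}] * 4
    print(labels_below_minimum(train, CDC_CDS_TRAIN_MIN))
    print(labels_below_minimum(train, CDC_CDS_TRAIN_MIN, frozenset({"invoice", "w2"})))
```

Passing the full label schema matters because a count taken over the documents alone can never list a label that appears on zero documents.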
Labeling
To get started, make sure your labeled documents meet the minimum training and evaluation thresholds defined above.
To begin evaluating model performance on documents with layout variation, label at least 100 documents, with each label appearing on at least 50 documents in the training set and 50 in the evaluation set.
Maximum allowed labels (fields): 150
Label size limits (characters): No explicit limit, but long items aren't well supported. Chunk documents into 800- or 1,000-token pieces, with 100 to 200 tokens of overlap between consecutive chunks. Items longer than the overlap might be extracted with lower quality.
Label occurrences in a document: No limit
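The chunking guidance in the list above amounts to a sliding window over the token sequence. A minimal sketch, assuming the document is already tokenized into a list of strings (the tokenizer is out of scope, and `chunk_tokens` is a hypothetical helper, not a Document AI function). With a 1,000-token window and a 200-token overlap, each chunk starts 800 tokens after the previous one.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 1000, overlap: int = 200
) -> list[list[str]]:
    """Split `tokens` into windows of `chunk_size` that share `overlap` tokens.

    Consecutive chunks start `chunk_size - overlap` tokens apart, so any item
    no longer than `overlap` tokens lies wholly inside at least one chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not tokens:
        return []
    step = chunk_size - overlap
    chunks: list[list[str]] = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this window reaches the end
            break
        start += step
    return chunks


if __name__ == "__main__":
    doc = [f"t{i}" for i in range(2000)]
    print([len(c) for c in chunk_tokens(doc)])            # default 1,000 / 200
    print([len(c) for c in chunk_tokens(doc, 800, 100)])  # 800 / 100 variant
```

This also shows why the list warns about long items: an item longer than the overlap can straddle a chunk boundary and never appear whole in any single chunk.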
Geographic coverage
Regions generally supported: US, EU (multiregion)
Regions with limited accessibility: Germany, Singapore, UK, Canada, India, Australia
Last updated 2024-12-19 UTC.