OutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The desired output location and metadata.
Attributes | |
---|---|
Name | Description |
gcs_destination | The Google Cloud Storage location to write the output to. |
pages_per_shard | int. The maximum number of pages to include in each output Document shard JSON on Google Cloud Storage. The valid range is [1, 100]. If not specified, the default value is 20. For example, one PDF file with 100 pages produces 100 parsed pages. If pages_per_shard = 20, then 5 Document shard JSON files, each containing 20 parsed pages, are written under the prefix `OutputConfig.gcs_destination.uri` with the suffix `pages-x-to-y.json`, where x and y are 1-indexed page numbers. Example GCS outputs with 157 pages and pages_per_shard = 50: `pages-001-to-050.json`, `pages-051-to-100.json`, `pages-101-to-150.json`, `pages-151-to-157.json`. |
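
The sharding rule described for pages_per_shard can be sketched in plain Python. This is an illustration only, not part of the client library: the helper name `shard_suffixes` is hypothetical, and the three-digit zero padding is an assumption inferred from the example file names above.

```python
def shard_suffixes(total_pages: int, pages_per_shard: int = 20) -> list[str]:
    """Return the expected shard file suffixes for a document.

    Hypothetical helper: mirrors the documented naming scheme
    pages-x-to-y.json with 1-indexed page numbers.
    """
    # The documented valid range for pages_per_shard is [1, 100].
    if not 1 <= pages_per_shard <= 100:
        raise ValueError("pages_per_shard must be in the range [1, 100]")

    suffixes = []
    # Walk the first page of each shard: 1, 1 + n, 1 + 2n, ...
    for first in range(1, total_pages + 1, pages_per_shard):
        # The last shard may hold fewer pages than pages_per_shard.
        last = min(first + pages_per_shard - 1, total_pages)
        # Assumption: page numbers are zero-padded to three digits.
        suffixes.append(f"pages-{first:03d}-to-{last:03d}.json")
    return suffixes


if __name__ == "__main__":
    for name in shard_suffixes(157, 50):
        print(name)
```

Calling `shard_suffixes(157, 50)` reproduces the four file names listed in the example, and `shard_suffixes(100)` yields five shards with the default of 20 pages each. A helper like this may be useful for predicting which objects to look for under the output prefix, but the actual file names written by the service should be treated as authoritative.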