GcsTrainingInput(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Cloud Storage training data input.
Attributes | |
---|---|
Name | Description |
corpus_data_path |
str
The Cloud Storage corpus data that the training data can reference. The data path format is gs:// . The file must be newline-delimited JSON (jsonl/ndjson).
For the search-tuning model, each line should have the \_id,
title, and text fields. Example:
{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}
|
query_data_path |
str
The Cloud Storage query data that the training data can reference. The data path format is gs:// . The file must be newline-delimited JSON (jsonl/ndjson).
For the search-tuning model, each line should have the \_id and
text fields. Example: {"_id": "query1", "text": "example query"}
|
train_data_path |
str
Cloud Storage training data path, whose format should be gs:// . The file should
be in tsv format. Each line should have the doc_id,
query_id, and score (a number).
For the search-tuning model, the file should have query-id,
corpus-id, and score as the tsv header. The score should be a
number in [0, +inf). The larger the number, the more
relevant the pair. Example:
- query-id\tcorpus-id\tscore
- query1\tdoc1\t1
|
test_data_path |
str
Cloud Storage test data. Same format as train_data_path. If not provided, a random 80/20 train/test split will be performed on train_data_path. |
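
The three file formats described in the table can be sketched as follows. This is a minimal illustration using only the Python standard library, with local temporary files standing in for the objects that would be uploaded to Cloud Storage. The helper names (`write_training_files`, `read_ids`, `validate_training_files`) and the file names are hypothetical and not part of the client library. The check that every training pair refers to an existing query and document is an assumption about how the files relate, not documented service behavior.

```python
import csv
import json
import math
import tempfile
from pathlib import Path

EXPECTED_HEADER = ["query-id", "corpus-id", "score"]


def write_training_files(directory):
    """Write tiny corpus, query, and training files in the documented formats."""
    directory = Path(directory)
    corpus_path = directory / "corpus.jsonl"  # hypothetical file names
    query_path = directory / "query.jsonl"
    train_path = directory / "train.tsv"

    corpus = [{"_id": "doc1", "title": "relevant doc", "text": "relevant text"}]
    queries = [{"_id": "query1", "text": "example query"}]
    pairs = [("query1", "doc1", 1)]

    # One JSON object per line (jsonl/ndjson).
    for path, records in ((corpus_path, corpus), (query_path, queries)):
        with path.open("w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")

    # Tab-separated pairs with the query-id, corpus-id, score header.
    with train_path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(EXPECTED_HEADER)
        writer.writerows(pairs)

    return corpus_path, query_path, train_path


def read_ids(jsonl_path):
    """Collect the _id value from every non-empty line of a jsonl file."""
    with Path(jsonl_path).open(encoding="utf-8") as f:
        return {json.loads(line)["_id"] for line in f if line.strip()}


def validate_training_files(corpus_path, query_path, train_path):
    """Return a list of problems; an empty list means the files are consistent."""
    corpus_ids = read_ids(corpus_path)
    query_ids = read_ids(query_path)
    problems = []
    with Path(train_path).open(encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        if reader.fieldnames != EXPECTED_HEADER:
            return [f"unexpected header: {reader.fieldnames}"]
        for line_number, row in enumerate(reader, start=2):
            # Assumption: ids in the tsv should exist in the jsonl files.
            if row["query-id"] not in query_ids:
                problems.append(f"line {line_number}: unknown query-id")
            if row["corpus-id"] not in corpus_ids:
                problems.append(f"line {line_number}: unknown corpus-id")
            try:
                score = float(row["score"])
            except (TypeError, ValueError):
                problems.append(f"line {line_number}: score is not a number")
                continue
            # The documented range is [0, +inf): finite and non-negative.
            if not (math.isfinite(score) and score >= 0):
                problems.append(f"line {line_number}: score out of range")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_training_files(tmp)
        print(validate_training_files(*paths))
```

Running a check like this before uploading can catch a malformed header or a dangling id early. The same validator could be pointed at a test file, since test_data_path uses the same format as train_data_path.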