Searches for documents using the provided SearchDocumentsRequest. This call only returns documents that the caller has permission to search against.
HTTP request
POST https://contentwarehouse.googleapis.com/v1/{parent}/documents:search
Path parameters
Parameters | |
---|---|
parent |
Required. The parent, which owns this collection of documents. Format: projects/{projectNumber}/locations/{location}. It takes the form |
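As a minimal sketch of how the endpoint URL is assembled from the parent format described above, the helper names `build_parent` and `build_search_url` below are hypothetical and not part of any client library:

```python
# Template taken from the HTTP request line of this reference page.
SEARCH_URL_TEMPLATE = (
    "https://contentwarehouse.googleapis.com/v1/{parent}/documents:search"
)


def build_parent(project_number: str, location: str) -> str:
    """Return the parent resource name: projects/{projectNumber}/locations/{location}."""
    return f"projects/{project_number}/locations/{location}"


def build_search_url(project_number: str, location: str) -> str:
    """Substitute the parent into the documents:search URL template."""
    return SEARCH_URL_TEMPLATE.format(parent=build_parent(project_number, location))


if __name__ == "__main__":
    # Project number 123 and location "us" are placeholder values.
    print(build_search_url("123", "us"))
```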
Request body
The request body contains data with the following structure:
JSON representation |
---|
{ "requestMetadata": { object ( |
Fields | |
---|---|
requestMetadata |
The meta information collected about the end user, used to enforce access control and improve the search quality of the service. |
documentQuery |
Query used to search against documents (keyword, filters, etc.). |
offset |
An integer that specifies the current offset (that is, starting result location, amongst the documents deemed by the API as relevant) in search results. This field is only considered if The maximum allowed value is 5000. Otherwise an error is thrown. For example, 0 means to return results starting from the first matching document, and 10 means to return from the 11th document. This can be used for pagination, (for example, pageSize = 10 and offset = 10 means to return from the second page). |
pageSize |
A limit on the number of documents returned in the search results. Increasing this value above the default value of 10 can increase search response time. The value can be between 1 and 100. |
pageToken |
The token specifying the current offset within search results. See |
orderBy |
The criteria determining how search results are sorted. For non-empty query, default is Supported options are:
|
histogramQueries[] |
An expression specifying a histogram request against matching documents. Expression syntax is an aggregation function call with histogram facets and other options. The following aggregation functions are supported:
data types:
Example expression:
|
requireTotalSize |
Controls if the search document request requires the return of a total size of matched documents. See Enabling this flag may adversely impact performance. Hint: If this is used with pagination, set this flag on the initial query but set this to false on subsequent page calls (keep the total count locally). Defaults to false. |
totalResultSize |
Controls if the search document request requires the return of a total size of matched documents. See |
qaSizeLimit |
Experimental, do not use. The limit on the number of documents returned for the question-answering feature. To enable the question-answering feature, set [DocumentQuery].[isNlQuery][] to true. |
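The offset and pageSize rules above can be sketched as a small request builder. This is an illustration under stated assumptions: `build_search_request` is a hypothetical helper, the limits are checked client-side only as a convenience, and requestMetadata is left out because its structure is not shown in this section.

```python
from typing import Any, Dict

MAX_OFFSET = 5000      # maximum allowed offset, per the field description
MIN_PAGE_SIZE = 1      # pageSize can be between 1 and 100
MAX_PAGE_SIZE = 100


def build_search_request(query: str, page_number: int, page_size: int = 10) -> Dict[str, Any]:
    """Build a request body for a zero-based page number using offset pagination."""
    if page_number < 0:
        raise ValueError("page_number must be zero or greater")
    if not MIN_PAGE_SIZE <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("pageSize must be between 1 and 100")
    offset = page_number * page_size
    if offset > MAX_OFFSET:
        raise ValueError("offset must not exceed 5000")
    return {
        "documentQuery": {"query": query},
        "offset": offset,
        "pageSize": page_size,
        # Following the hint above: ask for the total only on the first page
        # and keep the count locally for later pages.
        "requireTotalSize": page_number == 0,
    }
```

With pageSize = 10, page number 1 yields offset = 10, which corresponds to the second page of results.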
Response body
Response message for DocumentService.SearchDocuments.
If successful, the response body contains data with the following structure:
JSON representation |
---|
{ "matchingDocuments": [ { object ( |
Fields | |
---|---|
matchingDocuments[] |
The document entities that match the specified |
nextPageToken |
The token that specifies the starting position of the next page of results. This field is empty if there are no more results. |
totalSize |
The total number of matched documents which is available only if the client set |
metadata |
Additional information for the API invocation, such as the request tracking id. |
histogramQueryResults[] |
The histogram results that match with the specified |
questionAnswer |
Experimental. Question answer from the query against the document. |
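Token-based paging with nextPageToken can be sketched as a generator. The `send` callable is an assumption: it stands in for whatever authenticated HTTP call the caller uses, takes a request body as a dict, and returns the parsed response body as a dict.

```python
from typing import Any, Callable, Dict, Iterator


def iter_matching_documents(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    base_request: Dict[str, Any],
) -> Iterator[Dict[str, Any]]:
    """Yield every entry of matchingDocuments across all pages."""
    page_token = ""
    while True:
        request = dict(base_request)  # copy so the caller's dict is not mutated
        if page_token:
            request["pageToken"] = page_token
        response = send(request)
        yield from response.get("matchingDocuments", [])
        # An empty or missing nextPageToken means there are no more results.
        page_token = response.get("nextPageToken", "")
        if not page_token:
            return
```

Passing the transport in as a function keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.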
Authorization scopes
Requires the following OAuth scope:
https://www.googleapis.com/auth/cloud-platform
For more information, see the Authentication Overview.
IAM Permissions
Requires the following IAM permission on the parent resource:
contentwarehouse.documents.get
For more information, see the IAM documentation.
DocumentQuery
JSON representation |
---|
{ "query": string, "isNlQuery": boolean, "customPropertyFilter": string, "timeFilters": [ { object ( |
Fields | |
---|---|
query |
The query string that matches against the full text of the document and the searchable properties. The query partially supports Google AIP style syntax. Specifically, the query supports literals, logical operators, negation operators, comparison operators, and functions. Literals: A bare literal value (examples: "42", "Hugo") is a value to be matched against. It searches over the full text of the document and the searchable properties. Logical operators: "AND", "and", "OR", and "or" are binary logical operators (example: "engineer OR developer"). Negation operators: "NOT" and "!" are negation operators (example: "NOT software"). Comparison operators: support the binary comparison operators =, !=, <, >, <= and >= for string, numeric, enum, boolean. Also support like operator To specify a property in the query, the left hand side expression in the comparison must be the property id including the parent. The right hand side must be literals. For example: ""projects/123/locations/us".property_a < 1" matches results whose "property_a" is less than 1 in project 123 and us location. The literals and comparison expression can be connected in a single query (example: "software engineer "projects/123/locations/us".salary > 100"). Functions: supported functions are Support nested expressions connected using parenthesis and logical operators. The default logical operators is The query can be used with other filters e.g. The maximum number of allowed characters is 255. |
isNlQuery |
Experimental, do not use. If the query is a natural language question. False by default. If true, then the question-answering feature will be used instead of search, and |
customPropertyFilter |
This filter specifies a structured syntax to match against the [PropertyDefinition].[isFilterable][] marked as Supported operators are: Boolean expressions (AND/OR/NOT) are supported up to 3 levels of nesting (for example, "((A AND B AND C) OR NOT D) AND E"), a maximum of 100 comparisons or functions are allowed in the expression. The expression must be < 6000 bytes in length. Sample Query: |
timeFilters[] |
Documents created/updated within a range specified by this filter are searched against. |
documentSchemaNames[] |
This filter specifies the exact document schema If a value isn't specified, documents within the search results are associated with any schema. If multiple values are specified, documents within the search results may be associated with any of the specified schemas. At most 20 document schema names are allowed. |
propertyFilter[] |
This filter specifies a structured syntax to match against the |
fileTypeFilter |
This filter specifies the types of files to return: ALL, FOLDER, or FILE. If FOLDER or FILE is specified, then only either folders or files will be returned, respectively. If ALL is specified, both folders and files will be returned. If no value is specified, ALL files will be returned. |
folderNameFilter |
Search all the documents under this specified folder. Format: projects/{projectNumber}/locations/{location}/documents/{documentId}. |
documentNameFilter[] |
Search the documents in the list. Format: projects/{projectNumber}/locations/{location}/documents/{documentId}. |
queryContext[] |
For custom synonyms. Customers provide the synonyms based on context. One customer can provide multiple sets of synonyms based on different contexts. The search query will be expanded based on the custom synonyms of the query context set. By default, no custom synonyms will be applied if no query context is provided. It is not supported for CMEK-compliant deployments. |
documentCreatorFilter[] |
The exact creator(s) of the documents to search against. If a value isn't specified, documents within the search results are associated with any creator. If multiple values are specified, documents within the search results may be associated with any of the specified creators. |
customWeightsMetadata |
To support the custom weighting across document schemas, customers need to provide the properties to be used to boost the ranking in the search request. For a search query with CustomWeightsMetadata specified, only the RetrievalImportance for the properties in the CustomWeightsMetadata will be honored. |
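A DocumentQuery can be assembled as a plain dictionary. The sketch below is illustrative only: `build_document_query` is a hypothetical helper, and it enforces just the two limits stated above (255 characters for query, at most 20 document schema names).

```python
from typing import Any, Dict, Optional, Sequence

MAX_QUERY_CHARS = 255
MAX_SCHEMA_NAMES = 20


def build_document_query(
    query: str,
    document_schema_names: Sequence[str] = (),
    folder_name_filter: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a DocumentQuery dict, rejecting values beyond the documented limits."""
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError("query must not exceed 255 characters")
    if len(document_schema_names) > MAX_SCHEMA_NAMES:
        raise ValueError("at most 20 document schema names are allowed")
    body: Dict[str, Any] = {"query": query}
    if document_schema_names:
        body["documentSchemaNames"] = list(document_schema_names)
    if folder_name_filter is not None:
        body["folderNameFilter"] = folder_name_filter
    return body


if __name__ == "__main__":
    # Keyword literals combined with a property comparison, as in the query description.
    print(build_document_query('software engineer "projects/123/locations/us".salary > 100'))
```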
TimeFilter
Filter on create timestamp or update timestamp of documents.
JSON representation |
---|
{ "timeRange": { object ( |
Fields | |
---|---|
timeRange |
|
timeField |
Specifies which time field to filter documents on. Defaults to [TimeField.UPLOAD_TIME][]. |
Interval
Represents a time interval, encoded as a timestamp start (inclusive) and a timestamp end (exclusive).
The start must be less than or equal to the end. When the start equals the end, the interval is empty (matches no time). When both start and end are unspecified, the interval matches any time.
JSON representation |
---|
{ "startTime": string, "endTime": string } |
Fields | |
---|---|
startTime |
Optional. Inclusive start of the interval. If specified, a timestamp matching this interval will have to be the same or after the start. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
endTime |
Optional. Exclusive end of the interval. If specified, a timestamp matching this interval will have to be before the end. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
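The interval semantics above (inclusive start, exclusive end, empty when start equals end, unbounded when a side is unspecified) can be sketched as follows. The function names are hypothetical, and the formatter emits whole seconds only, although the format allows up to nine fractional digits.

```python
from datetime import datetime, timezone
from typing import Optional


def to_rfc3339_zulu(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC3339 UTC "Zulu" string (whole seconds)."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def interval_contains(
    moment: datetime,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> bool:
    """Return True if moment falls in [start, end); None means unbounded on that side."""
    if start is not None and end is not None and start > end:
        raise ValueError("start must be less than or equal to end")
    if start is not None and moment < start:
        return False  # before the inclusive start
    if end is not None and moment >= end:
        return False  # at or after the exclusive end
    return True
```

When start equals end, no timestamp can satisfy both checks, so the interval matches nothing, as described above.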
TimeField
Time field used in TimeFilter.
Enums | |
---|---|
TIME_FIELD_UNSPECIFIED |
Default value. |
CREATE_TIME |
Earliest document create time. |
UPDATE_TIME |
Latest document update time. |
DISPOSITION_TIME |
Time when the document becomes mutable again. |
PropertyFilter
JSON representation |
---|
{ "documentSchemaName": string, "condition": string } |
Fields | |
---|---|
documentSchemaName |
The Document schema name |
condition |
The filter condition. The syntax for this expression is a subset of SQL syntax. Supported operators are:
Supported functions are Boolean expressions (AND/OR/NOT) are supported up to 3 levels of nesting (for example, "((A AND B AND C) OR NOT D) AND E"), a maximum of 100 comparisons or functions are allowed in the expression. The expression must be < 6000 bytes in length. Only properties that are marked filterable are allowed ( Sample Query: CMEK compliant deployment only supports:
|
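The length limit on condition is stated in bytes rather than characters, which matters for non-ASCII text. A minimal sketch, assuming the limit is measured on the UTF-8 encoding (the source does not name the encoding) and using a hypothetical helper `build_property_filter`:

```python
from typing import Dict

MAX_CONDITION_BYTES = 6000  # the expression must be < 6000 bytes


def build_property_filter(document_schema_name: str, condition: str) -> Dict[str, str]:
    """Return a PropertyFilter dict after checking the condition's encoded size."""
    size = len(condition.encode("utf-8"))  # assumption: size is counted in UTF-8 bytes
    if size >= MAX_CONDITION_BYTES:
        raise ValueError(f"condition is {size} bytes; it must be under 6000")
    return {"documentSchemaName": document_schema_name, "condition": condition}
```

The nesting-depth and comparison-count limits are not checked here, since doing so would require parsing the expression.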
FileTypeFilter
Filter for the specific types of documents returned.
JSON representation |
---|
{
"fileType": enum ( |
Fields | |
---|---|
fileType |
The type of files to return. |
FileType
Representation of the types of files.
Enums | |
---|---|
FILE_TYPE_UNSPECIFIED |
Default document type. If set, disables the filter. |
ALL |
Returns all document types, including folders. |
FOLDER |
Returns only folders. |
DOCUMENT |
Returns only non-folder documents. |
ROOT_FOLDER |
Returns only root folders. |
CustomWeightsMetadata
Supports custom weighting across document schemas.
JSON representation |
---|
{
"weightedSchemaProperties": [
{
object ( |
Fields | |
---|---|
weightedSchemaProperties[] |
List of schema and property name. Allows a maximum of 10 schemas to be specified for relevance boosting. |
WeightedSchemaProperty
Specifies the schema property name.
JSON representation |
---|
{ "documentSchemaName": string, "propertyNames": [ string ] } |
Fields | |
---|---|
documentSchemaName |
The document schema name. |
propertyNames[] |
The property definition names in the schema. |
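A CustomWeightsMetadata value is a list of schema and property-name pairs. The sketch below builds one from a mapping and applies the limit of 10 schemas mentioned above. `build_custom_weights_metadata` is a hypothetical helper, and the schema and property names in the usage are placeholders.

```python
from typing import Any, Dict, List, Mapping, Sequence

MAX_WEIGHTED_SCHEMAS = 10


def build_custom_weights_metadata(
    properties_by_schema: Mapping[str, Sequence[str]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Map {schema name: [property names]} to a CustomWeightsMetadata dict."""
    if len(properties_by_schema) > MAX_WEIGHTED_SCHEMAS:
        raise ValueError("a maximum of 10 schemas can be specified")
    return {
        "weightedSchemaProperties": [
            {"documentSchemaName": schema, "propertyNames": list(names)}
            for schema, names in properties_by_schema.items()
        ]
    }


if __name__ == "__main__":
    print(build_custom_weights_metadata({"invoice_schema": ["invoice_id", "address"]}))
```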
HistogramQuery
The histogram request.
JSON representation |
---|
{
"histogramQuery": string,
"requirePreciseResultSize": boolean,
"filters": {
object ( |
Fields | |
---|---|
histogramQuery |
An expression that specifies a histogram request against matching documents for searches. See |
requirePreciseResultSize |
Controls if the histogram query requires the return of a precise count. Enabling this flag may adversely impact performance. Defaults to true. |
filters |
Optional. Filters the result of the histogram query by property names. It only works with the histogram query count('FilterableProperties'). If not set, the histogram is performed on all the property names for all the document schemas. Setting this field improves performance. |
HistogramQueryPropertyNameFilter
JSON representation |
---|
{
"documentSchemas": [
string
],
"propertyNames": [
string
],
"yAxis": enum ( |
Fields | |
---|---|
documentSchemas[] |
This filter specifies the exact document schema(s) At most 10 document schema names are allowed. Format: projects/{projectNumber}/locations/{location}/documentSchemas/{document_schema_id}. |
propertyNames[] |
Optional. If not set, the histogram is performed for all the property names. Each property must be defined in the schema with the isFilterable flag set to true, and its name must be in the format "schemaId.propertyName". Example: if the schema id is abc, the name of the property MORTGAGE_TYPE is "abc.MORTGAGE_TYPE". |
yAxis |
By default, the yAxis is HISTOGRAM_YAXIS_DOCUMENT if this field is not set. |
HistogramYAxis
The result of the histogram query count('FilterableProperties') using HISTOGRAM_YAXIS_DOCUMENT will be: invoice_id: 2 address: 1 payment_method: 2 line_item_description: 1
Enums | |
---|---|
HISTOGRAM_YAXIS_DOCUMENT |
Count the documents per property name. |
HISTOGRAM_YAXIS_PROPERTY |
Count the properties per property name. |
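The difference between the two y-axis options can be illustrated locally. The two sample documents below are hypothetical, chosen so that the per-document count reproduces the example result given above. The reading of HISTOGRAM_YAXIS_PROPERTY as "count every property value" is an assumption, not something this page confirms.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical documents: property name -> list of values present on the document.
SAMPLE_DOCUMENTS: List[Dict[str, List[str]]] = [
    {
        "invoice_id": ["A1"],
        "address": ["1 Main St"],
        "payment_method": ["cash"],
        "line_item_description": ["paper", "ink"],
    },
    {"invoice_id": ["A2"], "payment_method": ["card"]},
]


def count_documents_per_property(documents: List[Dict[str, List[str]]]) -> Dict[str, int]:
    """Count how many documents carry each property (HISTOGRAM_YAXIS_DOCUMENT)."""
    counts: Counter = Counter()
    for properties in documents:
        for name, values in properties.items():
            if values:
                counts[name] += 1
    return dict(counts)


def count_values_per_property(documents: List[Dict[str, List[str]]]) -> Dict[str, int]:
    """Count property values per name (assumed reading of HISTOGRAM_YAXIS_PROPERTY)."""
    counts: Counter = Counter()
    for properties in documents:
        for name, values in properties.items():
            counts[name] += len(values)
    return dict(counts)
```

In this sample, the two counts differ only for line_item_description, which appears on one document but holds two values.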
TotalResultSize
The total number of matching documents.
Enums | |
---|---|
TOTAL_RESULT_SIZE_UNSPECIFIED |
Total number calculation will be skipped. |
ESTIMATED_SIZE |
Estimate total number. The total result size will be accurate up to 10,000. This option will add cost and latency to your request. |
ACTUAL_SIZE |
It may adversely impact performance. The limit is 1,000,000. |
MatchingDocument
Document entry with metadata inside SearchDocumentsResponse.
JSON representation |
---|
{ "document": { object ( |
Fields | |
---|---|
document |
Document that matches the specified |
searchTextSnippet |
Contains snippets of text from the document full raw text that most closely match a search query's keywords, if available. All HTML tags in the original fields are stripped when returned in this field, and matching query keywords are enclosed in HTML bold tags. If the question-answering feature is enabled, this field will instead contain a snippet that answers the user's natural-language query. No HTML bold tags will be present, and highlights in the answer snippet can be found in |
qaResult |
Experimental. Additional result info if the question-answering feature is enabled. |
QAResult
Additional result info for the question-answering feature.
JSON representation |
---|
{
"highlights": [
{
object ( |
Fields | |
---|---|
highlights[] |
Highlighted sections in the snippet. |
confidenceScore |
The calibrated confidence score for this document, in the range [0., 1.]. This represents the confidence level for whether the returned document and snippet answer the user's query. |
Highlight
A text span in the search text snippet that represents a highlighted section (answer context, highly relevant sentence, etc.).
JSON representation |
---|
{ "startIndex": integer, "endIndex": integer } |
Fields | |
---|---|
startIndex |
Start index of the highlight. |
endIndex |
End index of the highlight, exclusive. |
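Because endIndex is exclusive, a highlight maps directly onto a half-open slice of the snippet. A minimal sketch, assuming the indices are character offsets into searchTextSnippet (the unit is not stated here) and using a hypothetical helper name:

```python
from typing import Dict, List


def extract_highlights(snippet: str, highlights: List[Dict[str, int]]) -> List[str]:
    """Return the text of each highlight span [startIndex, endIndex) in the snippet."""
    return [
        snippet[span["startIndex"]:span["endIndex"]]  # the slice end is exclusive too
        for span in highlights
    ]


if __name__ == "__main__":
    text = "The invoice total is 42 dollars."
    start = text.index("42")
    print(extract_highlights(text, [{"startIndex": start, "endIndex": start + 2}]))
```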
HistogramQueryResult
Histogram result that matches HistogramQuery specified in searches.
JSON representation |
---|
{ "histogramQuery": string, "histogram": { string: string, ... } } |
Fields | |
---|---|
histogramQuery |
Requested histogram expression. |
histogram |
A map from the values of the facet associated with distinct values to the number of matching entries with corresponding value. The key format is:
An object containing a list of |
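The JSON representation above shows histogram as a map from string to string, so the counts arrive as strings. A small sketch that converts them to integers, with a hypothetical helper name and made-up sample values:

```python
from typing import Any, Dict


def parse_histogram(result: Dict[str, Any]) -> Dict[str, int]:
    """Convert the string counts of a HistogramQueryResult into integers."""
    return {key: int(count) for key, count in result.get("histogram", {}).items()}


if __name__ == "__main__":
    sample = {
        "histogramQuery": "count('FilterableProperties')",
        "histogram": {"invoice_id": "2", "address": "1"},
    }
    print(parse_histogram(sample))
```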