We use a small dataset provided by Kalev Leetaru to illustrate the Timeseries Insights API. The dataset is derived from The GDELT Project, a global database tracking world events and media coverage. It contains entity mentions in news URLs from April 2019.
Objectives
- Learn the data format for Timeseries Insights API.
- Learn how to create, query, update and delete datasets.
Before you begin
Set up a Cloud project and enable the Timeseries Insights API by following Setup for Full Access.
Tutorial dataset
The dataset includes entity annotations of locations, organizations, and persons, among others.
The Timeseries Insights API takes input in JSON format. A sample Event for this dataset is:
{
"groupId":"-6180929807044612746",
"dimensions":[{"name":"EntityORGANIZATION","stringVal":"Medina Gazette"}],
"eventTime":"2019-04-05T08:00:00+00:00"
}
Each event must have an eventTime field for the event timestamp. Preferably, each event also has a long-valued groupId to mark related events. Event properties are included as dimensions, each of which has a name and one of stringVal, boolVal, longVal, or doubleVal.
{"groupId":"-6180929807044612746","dimensions":[{"name":"EntityORGANIZATION","stringVal":"Medina Gazette"}],"eventTime":"2019-04-05T08:00:00+00:00"}
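The field rules above can be checked locally before any data is uploaded. The following is a minimal Python sketch, not part of the API: the helper names validate_event and to_json_line are hypothetical, and the checks only mirror the rules stated in this section (an eventTime is required, and each dimension has a name plus exactly one value field).

```python
import json
from datetime import datetime

# Value fields a dimension may carry, as listed in this tutorial.
VALUE_KEYS = {"stringVal", "boolVal", "longVal", "doubleVal"}


def validate_event(event: dict) -> None:
    """Raise ValueError if the event breaks the rules described above.

    Hypothetical local helper; the API server performs its own validation.
    """
    if "eventTime" not in event:
        raise ValueError("eventTime is required")
    # fromisoformat accepts offsets such as +00:00 (a trailing "Z" needs Python 3.11+).
    datetime.fromisoformat(event["eventTime"])
    for dim in event.get("dimensions", []):
        if "name" not in dim:
            raise ValueError("each dimension needs a name")
        present = VALUE_KEYS & dim.keys()
        if len(present) != 1:
            raise ValueError(
                f"dimension {dim['name']!r} needs exactly one value field, got {sorted(present)}"
            )


def to_json_line(event: dict) -> str:
    """Validate the event and serialize it compactly on a single line."""
    validate_event(event)
    return json.dumps(event, separators=(",", ":"))


sample_event = {
    "groupId": "-6180929807044612746",
    "dimensions": [{"name": "EntityORGANIZATION", "stringVal": "Medina Gazette"}],
    "eventTime": "2019-04-05T08:00:00+00:00",
}
print(to_json_line(sample_event))
```

The compact separators reproduce the single-line form of the sample event shown above.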
List datasets
projects.locations.datasets.list shows all datasets under ${PROJECT_ID}. gcurl is an alias and PROJECT_ID is an environment variable, both set up in Getting Started.
gcurl https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets
The result is a JSON string like the following:
{
"datasets": [
{
"name": "example",
"state": "LOADED",
...
},
{
"name": "dataset_tutorial",
"state": "LOADING",
...
}
]
}
The results show the datasets currently under the project. The state field indicates whether the dataset is ready to be used. A newly created dataset is in the LOADING state until indexing completes, then transitions to the LOADED state. If any errors occur during creation or indexing, the dataset ends up in the FAILED state. The results also include the complete dataset information from the original create request.
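To act on the state field programmatically, the list response can be grouped by state. This is a small Python sketch that works on the response text only. The function name datasets_by_state is hypothetical, the sample string is an abbreviated copy of the response above, and the "UNKNOWN" bucket is an assumption for entries that lack a state.

```python
import json


def datasets_by_state(list_response: str) -> dict:
    """Group dataset names from a list response by their state field."""
    parsed = json.loads(list_response)
    by_state = {}
    for dataset in parsed.get("datasets", []):
        # "UNKNOWN" is a local placeholder, not a state defined by the API.
        state = dataset.get("state", "UNKNOWN")
        by_state.setdefault(state, []).append(dataset["name"])
    return by_state


sample_response = """
{
  "datasets": [
    {"name": "example", "state": "LOADED"},
    {"name": "dataset_tutorial", "state": "LOADING"}
  ]
}
"""
print(datasets_by_state(sample_response))
# prints {'LOADED': ['example'], 'LOADING': ['dataset_tutorial']}
```

Only datasets under the LOADED key are ready for queries. A script could repeat the list call until the dataset of interest moves there or appears under FAILED.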
Create dataset
projects.locations.datasets.create adds a new dataset to the project.
gcurl -X POST -d @create.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets
where create.json contains:
{
"name": "dataset_tutorial",
"dataNames": [
"EntityCONSUMER_GOOD",
"EntityEVENT",
"EntityLOCATION",
"EntityORGANIZATION",
"EntityOTHER",
"EntityPERSON",
"EntityUNKNOWN",
"EntityWORK_OF_ART"
],
"dataSources": [
{"uri": "gs://data.gdeltproject.org/blog/2021-timeseries-insights-api/datasets/webnlp-201904.json"}
]
}
This request creates a dataset named dataset_tutorial from the Cloud Storage (GCS) dataSources, which contain Event data in JSON format. Only dimensions listed in dataNames are indexed and used by the system.
The create request returns success if it is accepted by the API server. The dataset stays in the LOADING state until indexing completes, then the state becomes LOADED, after which the dataset can accept queries and updates.
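Generating the request body from code avoids hand-editing mistakes such as stray trailing commas. The Python sketch below assembles the same body as create.json. The function name build_create_request is hypothetical, and the check for an empty dataNames list is a local sanity check (nothing would be indexed), not documented API behavior.

```python
import json


def build_create_request(name: str, data_names: list, uris: list) -> dict:
    """Assemble a dataset create request body with the fields used in this tutorial."""
    if not data_names:
        # Local assumption: a dataset with no indexed dimensions is not useful.
        raise ValueError("dataNames should list at least one dimension to index")
    return {
        "name": name,
        "dataNames": list(data_names),
        "dataSources": [{"uri": uri} for uri in uris],
    }


create_body = build_create_request(
    name="dataset_tutorial",
    data_names=[
        "EntityCONSUMER_GOOD", "EntityEVENT", "EntityLOCATION", "EntityORGANIZATION",
        "EntityOTHER", "EntityPERSON", "EntityUNKNOWN", "EntityWORK_OF_ART",
    ],
    uris=["gs://data.gdeltproject.org/blog/2021-timeseries-insights-api/datasets/webnlp-201904.json"],
)
# Save this text as create.json to use it with the gcurl command above.
print(json.dumps(create_body, indent=2))
```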
Query dataset
projects.locations.datasets.query performs anomaly detection queries.
gcurl -X POST -d @query.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial:query
where query.json contains:
{
"detectionTime": "2019-04-15T00:00:00Z",
"numReturnedSlices": 5,
"slicingParams": {
"dimensionNames": ["EntityLOCATION"]
},
"timeseriesParams": {
"forecastHistory": "1209600s",
"granularity": "86400s"
},
"forecastParams": {
"noiseThreshold": 100.0
}
}
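The duration strings in this query are plain second counts: 1209600s is 14 days of history and 86400s is a one-day granularity. As a rough Python sketch, the same body can be built from datetime values so the arithmetic is explicit. The names seconds_string and build_query are hypothetical helpers, and only the fields shown in query.json are covered.

```python
import json
from datetime import datetime, timedelta, timezone


def seconds_string(delta: timedelta) -> str:
    """Render a timedelta as a whole number of seconds followed by 's'."""
    return f"{int(delta.total_seconds())}s"


def build_query(detection_time: datetime, dimension_names: list,
                history: timedelta, granularity: timedelta,
                noise_threshold: float, num_slices: int = 5) -> dict:
    """Assemble a query body with the same fields as query.json."""
    if detection_time.tzinfo is None:
        raise ValueError("detection_time must be timezone-aware")
    utc_time = detection_time.astimezone(timezone.utc)
    return {
        "detectionTime": utc_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "numReturnedSlices": num_slices,
        "slicingParams": {"dimensionNames": list(dimension_names)},
        "timeseriesParams": {
            "forecastHistory": seconds_string(history),
            "granularity": seconds_string(granularity),
        },
        "forecastParams": {"noiseThreshold": noise_threshold},
    }


query_body = build_query(
    detection_time=datetime(2019, 4, 15, tzinfo=timezone.utc),
    dimension_names=["EntityLOCATION"],
    history=timedelta(days=14),
    granularity=timedelta(days=1),
    noise_threshold=100.0,
)
print(json.dumps(query_body, indent=2))
```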
The query result looks like the following:
{
"name": "projects/timeseries-staging/locations/us-central1/datasets/webnlp-201901-202104-dragosd",
"slices": [
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Notre Dame"
}
],
"detectionPointActual": 1514,
"detectionPointForecast": 15.5,
"expectedDeviation": 5.5,
"anomalyScore": 14.203791469194313,
"status": {}
},
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Seine"
}
],
"detectionPointActual": 1113,
"detectionPointForecast": 14,
"expectedDeviation": 15,
"anomalyScore": 9.5565217391304351,
"status": {}
},
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Ile de la Cite"
}
],
"detectionPointActual": 852,
"detectionPointForecast": 0,
"expectedDeviation": 1,
"anomalyScore": 8.435643564356436,
"status": {}
},
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Paris"
}
],
"detectionPointActual": 1461,
"detectionPointForecast": 857,
"expectedDeviation": 441,
"anomalyScore": 1.1164510166358594,
"status": {}
},
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "France"
}
],
"detectionPointActual": 1098,
"detectionPointForecast": 950.5,
"expectedDeviation": 476.5,
"anomalyScore": 0.25585429314830876,
"status": {}
}
]
}
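One way to read these numbers: in this sample, each anomalyScore equals the gap between the actual and forecast values divided by the sum of expectedDeviation and the noiseThreshold of 100.0 from query.json. The Python sketch below recomputes the scores from the sample values. Treat the relationship as an observation about this output rather than a definition confirmed by this tutorial; the function name score_from_fields is hypothetical.

```python
import math


def score_from_fields(slice_: dict, noise_threshold: float) -> float:
    """Recompute a score as (actual - forecast) / (expectedDeviation + noiseThreshold).

    Assumption: this relationship is inferred from the sample output only.
    """
    gap = slice_["detectionPointActual"] - slice_["detectionPointForecast"]
    return gap / (slice_["expectedDeviation"] + noise_threshold)


# Values copied from the sample query result above.
sample_slices = [
    {"name": "Notre Dame", "detectionPointActual": 1514, "detectionPointForecast": 15.5,
     "expectedDeviation": 5.5, "anomalyScore": 14.203791469194313},
    {"name": "Seine", "detectionPointActual": 1113, "detectionPointForecast": 14,
     "expectedDeviation": 15, "anomalyScore": 9.5565217391304351},
    {"name": "Ile de la Cite", "detectionPointActual": 852, "detectionPointForecast": 0,
     "expectedDeviation": 1, "anomalyScore": 8.435643564356436},
    {"name": "Paris", "detectionPointActual": 1461, "detectionPointForecast": 857,
     "expectedDeviation": 441, "anomalyScore": 1.1164510166358594},
    {"name": "France", "detectionPointActual": 1098, "detectionPointForecast": 950.5,
     "expectedDeviation": 476.5, "anomalyScore": 0.25585429314830876},
]

for slice_ in sample_slices:
    recomputed = score_from_fields(slice_, noise_threshold=100.0)
    matches = math.isclose(recomputed, slice_["anomalyScore"], rel_tol=1e-9)
    print(f"{slice_['name']}: reported={slice_['anomalyScore']:.4f} "
          f"recomputed={recomputed:.4f} match={matches}")
```

Read this way, Paris and France score low despite more than a thousand mentions each, because their forecasts and expected deviations are already large. Notre Dame, with a forecast of 15.5, stands out.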
Streaming update
projects.locations.datasets.appendEvents adds Event records in a streaming fashion.
gcurl -X POST -d @append.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial:appendEvents
where append.json contains (replace eventTime with a timestamp close to the present time):
{
"events": [
{
"groupId":"1324354349507023708",
"dimensions":[{"name":"EntityPERSON","stringVal":"Jason Marsalis"}],
"eventTime":"2022-02-16T15:45:00+00:00"
},{
"groupId":"1324354349507023708",
"dimensions":[{"name":"EntityORGANIZATION","stringVal":"WAFA"}],
"eventTime":"2022-02-16T04:00:00+00:00"
}
]
}
Streamed updates are indexed in near real time, so changes show up quickly in query results. All events sent by a single projects.locations.datasets.appendEvents request must have the same groupId.
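Because the event timestamps should be close to the present and every event in one request shares a groupId, it can help to generate the payload instead of editing it by hand. The Python sketch below does this under stated assumptions: build_append_request and check_single_group are hypothetical helper names, and all events are stamped with the same time for simplicity.

```python
import json
from datetime import datetime, timezone


def build_append_request(group_id: str, dimensions: list, now: datetime = None) -> dict:
    """Create one event per dimension, all sharing group_id and one timestamp."""
    now = now or datetime.now(timezone.utc)
    # Produces the same layout as the samples, e.g. 2022-02-16T15:45:00+00:00.
    stamp = now.isoformat(timespec="seconds")
    events = [
        {"groupId": group_id, "dimensions": [dimension], "eventTime": stamp}
        for dimension in dimensions
    ]
    return {"events": events}


def check_single_group(request: dict) -> None:
    """Raise ValueError if the events in one request do not share a groupId."""
    group_ids = {event.get("groupId") for event in request["events"]}
    if len(group_ids) > 1:
        raise ValueError(f"events span several groupIds: {sorted(map(str, group_ids))}")


append_body = build_append_request(
    group_id="1324354349507023708",
    dimensions=[
        {"name": "EntityPERSON", "stringVal": "Jason Marsalis"},
        {"name": "EntityORGANIZATION", "stringVal": "WAFA"},
    ],
)
check_single_group(append_body)
# Save this text as append.json to use it with the gcurl command above.
print(json.dumps(append_body, indent=2))
```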
Delete dataset
projects.locations.datasets.delete marks the dataset for deletion.
gcurl -X DELETE https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial
The request returns immediately, and the dataset will not accept additional queries or updates. It may take some time before the data is completely removed from the service, after which List datasets will no longer return this dataset.
What's next
Some other examples can be found on the GDELT website by searching for "Timeseries Insights API".