Last updated (UTC): 2025-09-03.

# Overview

Introduction
------------

Forecasting and anomaly detection over billions of time series is
computationally intensive. Most existing systems run forecasting and anomaly
detection as batch jobs (for example, risk pipelines, traffic forecasting, and
demand planning). This severely limits the type of analysis that you can
perform online, such as deciding whether to alert based on a sudden increase
or decrease across a set of event dimensions.

The main goals of the Timeseries Insights API are:

- Scale to billions of time series that are dynamically constructed from raw
  events and their properties, based on query parameters.
- Provide real-time forecasting and anomaly detection results. That is, within
  a few seconds, detect trends and seasonality across all time series and
  decide whether any slices are spiking or decreasing unexpectedly.

API functionality
-----------------

- Manage datasets
    - Index and load a dataset consisting of multiple data sources stored on
      Cloud Storage. Allow appending new events in a streaming fashion.
    - Unload a dataset that is no longer needed.
    - Ask for the processing status of a dataset.
- Query datasets
    - Retrieve the time series that matches the given property values. The
      time series is forecast up to a specified time horizon.
      The time series is also evaluated for anomalies.
    - Automatically detect combinations of property values for anomalies.
- Update datasets
    - Ingest newly occurring events and incorporate them into the index in
      near real time (a delay of seconds to minutes).

Disaster recovery
-----------------

The Timeseries Insights API does not serve as a backup for Cloud Storage and
does not return raw streaming updates. Clients are responsible for storing and
backing up their data separately.

After a regional outage, the service performs a best-effort recovery. Metadata
(information about datasets and operational status) and streamed user data
updated within 24 hours of the start of the outage might not be recovered.

During recovery, queries and streaming updates to datasets might not be
available.

Input data
----------

It is common for numerical and categorical data to be collected over time. For
example, the following figure shows the CPU usage, memory usage, and status of
a single running job in a data center, recorded every minute over a period of
time. CPU usage and memory usage are numerical values, and the status is a
categorical value.

### Event

The Timeseries Insights API uses events as its basic data entry. Each event
has a timestamp and a collection of dimensions, that is, key-value pairs where
the key is the dimension name. This simple representation makes it possible to
handle data at the scale of trillions of events. For example, the data center,
user, job name, and task number are all included to fully represent a single
event. The preceding figure shows a series of events recorded for a single
job, illustrating a subset of its dimensions.
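As a small sketch of this data model, the following Python builds such an event as a timestamp plus a list of typed key-value dimensions. The per-dimension encoding (`stringVal`/`longVal`/`doubleVal`) mirrors the JSON sample that follows; the helper functions and the top-level `eventTime` and `dimensions` field names are assumptions for illustration, not confirmed API fields.

```python
# Sketch of the event data model: a timestamp plus typed key-value dimensions.
# The per-dimension encoding mirrors the JSON sample below; the helper names
# and the top-level "eventTime"/"dimensions" fields are assumptions for
# illustration, not confirmed API field names.

def dimension(name, value):
    """Pick the value field that matches the Python type of the value."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(value, str):
        return {"name": name, "stringVal": value}
    if isinstance(value, int):
        return {"name": name, "longVal": value}
    if isinstance(value, float):
        return {"name": name, "doubleVal": value}
    raise TypeError(f"unsupported dimension type: {type(value).__name__}")

def make_event(timestamp, **dims):
    return {
        "eventTime": timestamp,  # assumed field name
        "dimensions": [dimension(k, v) for k, v in dims.items()],
    }

# The job event from the sample below, rebuilt with the helper.
event = make_event(
    "2021-03-01T00:00:00Z",
    user="user_64194",
    job="job_45835",
    data_center="data_center_30389",
    task_num=19,
    cpu=3840787.5207877564,
    ram=1067.01,
    state="idle",
)
```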
    {"name":"user","stringVal":"user_64194"},
    {"name":"job","stringVal":"job_45835"},
    {"name":"data_center","stringVal":"data_center_30389"},
    {"name":"task_num","longVal":19},
    {"name":"cpu","doubleVal":3840787.5207877564},
    {"name":"ram","doubleVal":1067.01},
    {"name":"state","stringVal":"idle"}

### DataSet

A [DataSet](/timeseries-insights/docs/reference/rest/v1/projects.locations.datasets#DataSet)
is a collection of events. Queries are performed within a single dataset, and
each project can have multiple datasets.

A dataset is built from batch and streaming data. A batch build reads from
multiple Cloud Storage URIs as data sources; after the batch build completes,
the dataset can be updated with streaming data. By using a batch build for
historical data, the system avoids cold-start problems.

A dataset needs to be built, or indexed, before it can be queried or updated.
Indexing starts when the dataset is created, and typically takes minutes to
hours to complete, depending on the amount of data. More specifically, the
data sources are scanned once during the initial indexing. If the contents of
the Cloud Storage URIs change after the initial indexing completes, they are
not scanned again. Use streaming updates for additional data; streaming
updates are indexed continuously in near real time.

| **Note:** The Timeseries Insights API cannot return the raw streaming
| updates, so clients should store this data separately if the raw data is
| needed.

| **Note:** Streaming data is expected to have timestamps close to real time,
| so streaming updates cannot be used to incrementally add historical data.

Timeseries and anomaly detection
--------------------------------

For the Timeseries Insights API, a
[slice](/timeseries-insights/docs/concept#slice) is a collection of events
with a certain combination of dimension values.
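To make the slice notion concrete, here is a minimal sketch in plain Python, independent of the actual API: events are grouped by a chosen combination of dimension values, and each slice's events are aggregated (here simply counted) into fixed-width time intervals, yielding one time series per slice. The event shape, dimension names, and count aggregation are illustrative choices.

```python
from collections import defaultdict

# Sketch of slicing: a slice is identified by the values of a chosen subset
# of dimensions; per slice, events are counted into fixed-width time
# intervals, yielding one time series per slice. All names are illustrative.

def slice_series(events, slice_dims, interval_seconds):
    series = defaultdict(lambda: defaultdict(int))
    for ts, dims in events:
        key = tuple((d, dims[d]) for d in slice_dims)  # slice identity
        bucket = ts - ts % interval_seconds            # time interval start
        series[key][bucket] += 1
    return {k: dict(v) for k, v in series.items()}

events = [
    (0,  {"user": "u1", "job": "j1", "data_center": "dc1"}),
    (30, {"user": "u1", "job": "j2", "data_center": "dc1"}),
    (70, {"user": "u2", "job": "j1", "data_center": "dc1"}),
]

# Slicing by "user" with 60-second intervals gives one series per user value;
# slicing by ("user", "job") yields finer slices from the same events.
by_user = slice_series(events, ["user"], 60)
```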
We are interested in a measure of the events falling into these slices over
time.

For a given slice, the events are aggregated into numerical values at a
user-specified resolution of time intervals; these aggregates form the time
series in which to detect anomalies. The preceding figure illustrates the
different choices of slices that result from different combinations of the
"user", "job", and "data_center" dimensions.

An anomaly happens for a certain slice if the numerical value from the time
interval of interest is significantly different from the values in the past.
The above figure illustrates a time series based on temperatures measured
across the world over 10 years. Suppose we are interested in whether the last
month of 2015 is an anomaly. A query to the system specifies the time of
interest, `detectionTime`, to be "2015/12/01" and the `granularity` to be
"1 month". The retrieved time series before the `detectionTime` is partitioned
into an earlier **training** period followed by a **holdout** period. The
system uses data from the training period to train a model, and uses the
holdout period to verify that the model can reliably predict the next values.
For this example, the holdout period is 1 year. The picture shows the actual
data and the predicted values from the model, with upper and lower bounds. The
temperature for 2015/12 is marked as an anomaly because the actual value is
outside the predicted bounds.

What's next
-----------

- Timeseries Insights API [Concepts](/timeseries-insights/docs/concept)
- A more detailed [Tutorial](/timeseries-insights/docs/tutorial)
- Learn more about the [REST API](/timeseries-insights/docs/reference/rest/v1/projects.locations.datasets)
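As a closing illustration, the training/holdout evaluation described in the anomaly-detection section above can be sketched in plain Python. The mean-and-standard-deviation band below is an illustrative stand-in for the service's forecasting model, which is not specified here; the bound width `k` and all numbers are made up for the example.

```python
import statistics

# Illustrative stand-in for the train/holdout anomaly check described above.
# A mean +/- k*stddev band substitutes for the service's (unspecified)
# forecasting model; the bound width k is an arbitrary choice.

def is_anomaly(history, observed, holdout=12, k=3.0):
    """history: past per-interval values, oldest first; observed: the value
    at detectionTime. Returns None if the holdout check fails, else a bool."""
    training, holdout_vals = history[:-holdout], history[-holdout:]
    mean = statistics.mean(training)
    band = k * statistics.stdev(training)
    # Holdout period: verify the model reliably predicts the next values.
    if any(abs(v - mean) > band for v in holdout_vals):
        return None  # model not trustworthy for this slice
    # Detection time: flag a value outside the predicted bounds.
    return abs(observed - mean) > band

# Nine years of a stable repeating monthly pattern, then two candidates.
history = [10.0, 10.2, 9.8] * 36            # 108 monthly aggregates
spike = is_anomaly(history, observed=25.0)  # far outside the bounds
normal = is_anomaly(history, observed=10.2)
```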