What is a Time Series Database (TSDB)?
A time series database (TSDB) is designed specifically for handling time series data, also called time-stamped data. Time series data can be collected from a range of events or metrics, but it always spans several periods of time rather than describing individual events in isolation.
You can use time series data for monitoring, aggregating, downsampling, and tracking behaviors across time. A TSDB can be used for a variety of data and purposes, including monitoring application performance, storing server metrics and network data, analyzing sensor data, tracking events, market trades, clicks, and more.
TSDB systems are built to measure changes across periods of time. This means the architecture of a TSDB is typically different from that of other databases, especially with regard to summarization and data lifecycle management.
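To make summarization and data lifecycle management concrete, the following is a minimal sketch in plain Python, not the API of any particular TSDB. The function names `downsample` and `apply_retention`, the 60-second bucket width, and the sample readings are all assumptions chosen for illustration. Points are `(timestamp_in_seconds, value)` pairs.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def apply_retention(points, now, max_age_seconds):
    """Keep only points that fall inside the retention window."""
    cutoff = now - max_age_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]


# Hypothetical raw readings: (seconds since start, value).
raw = [(0, 10.0), (20, 14.0), (65, 20.0), (110, 30.0), (125, 5.0)]

per_minute = downsample(raw, bucket_seconds=60)
recent = apply_retention(raw, now=130, max_age_seconds=60)
```

A common pattern is to keep raw points for a short window and retain only the downsampled series for longer periods, which trades detail for storage.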
How Does a Time Series Database Work?
A TSDB captures a set of fixed values and a set of dynamic values. For example, in a web application, the fixed values might identify a group of visitors, and the dynamic values might be the number of desired actions, such as eCommerce purchases, performed by those visitors over time. By analyzing purchases over time, the organization can understand the value of each group of users and prioritize marketing activity for different customer segments.
Ideally, time series records should be written into a repository in a format that enables quick time-based writes and reads. Because the records are time-stamped, the order of data points becomes a native aspect of the data. You can then use this order to deliver the data to a stream processing engine, which can treat the ordered data like a data stream. A fast stream processing engine helps keep the TSDB as a whole fast.
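The idea of keeping records in time order so that time-based reads stay quick can be sketched as follows. This is an illustrative in-memory structure, not how any specific TSDB stores data; the class name `TimeOrderedSeries` and its methods are hypothetical.

```python
import bisect


class TimeOrderedSeries:
    """Keeps records sorted by timestamp so range reads use binary search."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def write(self, ts, value):
        # Records usually arrive in order, so the insert position is
        # normally the end of the list; late records are slotted in place.
        idx = bisect.bisect_right(self._timestamps, ts)
        self._timestamps.insert(idx, ts)
        self._values.insert(idx, value)

    def read_range(self, start, end):
        """Return records with start <= timestamp < end, in time order."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))


series = TimeOrderedSeries()
series.write(10, "a")
series.write(30, "c")
series.write(20, "b")  # a late-arriving record
series.write(40, "d")

window = series.read_range(15, 40)
```

Because the result of `read_range` is already ordered by time, it can be handed to a downstream consumer as a stream without an extra sort.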
What is a Time Series Database Used For?
Internet of Things (IoT)
IoT technologies generate and use massive quantities of time-series data. For example, mobile devices, eCommerce applications, automobiles, and inventory management systems all time-stamp data according to events. Quick ingestion of time-series data is critical to ensure IoT devices and metrics can continuously capture data and store it for analysis.
Time series data is often used to monitor computer system metrics. This process works by reading data from the computer systems of users who have agreed to be monitored. Typical metrics include process count and memory utilization, which help check how computer resources are being used and assess whether resources should be reallocated.
Key Performance Indicators (KPIs)
KPIs are time-oriented and sampled repeatedly, so they fit naturally into time series data. Examples of such KPIs include profit, revenue, cost, conversion rate, number of transactions, and average order value. Once this information is collected and stored, it can be used to create dashboards.
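As a rough illustration of how KPIs such as conversion rate and average order value can be derived from time-stamped events, here is a small sketch. The event shape `(day, kind, amount)`, the function name `daily_kpis`, and the sample data are assumptions made for this example.

```python
from collections import defaultdict


def daily_kpis(events):
    """Aggregate (day, kind, amount) events into per-day KPI values.

    kind is either "visit" or "order"; amount is the order value
    (ignored for visits).
    """
    totals = defaultdict(lambda: {"visits": 0, "orders": 0, "revenue": 0.0})
    for day, kind, amount in events:
        if kind == "visit":
            totals[day]["visits"] += 1
        elif kind == "order":
            totals[day]["orders"] += 1
            totals[day]["revenue"] += amount

    kpis = {}
    for day, t in sorted(totals.items()):
        kpis[day] = {
            "revenue": t["revenue"],
            # Guard against division by zero on days without visits or orders.
            "conversion_rate": t["orders"] / t["visits"] if t["visits"] else 0.0,
            "average_order_value": t["revenue"] / t["orders"] if t["orders"] else 0.0,
        }
    return kpis


# Hypothetical events for two days.
events = [
    ("2024-01-01", "visit", 0.0),
    ("2024-01-01", "visit", 0.0),
    ("2024-01-01", "visit", 0.0),
    ("2024-01-01", "visit", 0.0),
    ("2024-01-01", "order", 30.0),
    ("2024-01-01", "order", 50.0),
    ("2024-01-02", "visit", 0.0),
    ("2024-01-02", "visit", 0.0),
]

kpis = daily_kpis(events)
```

Each day's dictionary is one point in a KPI time series, which is the kind of record a dashboard would plot.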
Anomaly detection helps detect unexpected deviations in time series data. Time series data captures a value whenever a system change occurs. Organizations can use these values to measure change, discover how changes occurred in the past, monitor what is currently happening, and leverage this accumulated data to predict future events.
Visualization is a main factor in achieving anomaly detection. A time series plot, for example, provides the visualization people often need to spot outliers. Automated anomaly detection is another approach, one that often speeds up the process and offers insights in real time. This can allow you to quickly correlate outliers.
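One simple way to automate the detection of outliers is a rolling z-score: each new value is compared with the mean and standard deviation of the values just before it. This is only one of many possible techniques and is not prescribed by the article; the function name `find_anomalies`, the window size, and the threshold of 3 standard deviations are assumptions for this sketch.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: the z-score is undefined, so skip this point.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical readings with one obvious spike at index 6.
readings = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
anomalies = find_anomalies(readings)
```

Note that a spike inflates the standard deviation of the windows that follow it, so this naive version can miss a second outlier arriving shortly after the first.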
Requirements for a Time Series Database
Databases that store time-series data should provide the following capabilities:
- In-memory for value alerting—inputted data should be immediately compared to all values configured to trigger alerts.
- In-memory for trend alerting—inputted data should also be compared to previous values, to check for trend alarms. When all trend evaluation relevant records are kept in-memory, the comparison can be achieved quickly. The system can also catch relevant high and low values from previous records.
- In-memory for applications and dashboards—applications perform actions according to data values, and dashboards are required to display updated values. To achieve these goals, applications need in-memory live data (because it enables rapid action), and dashboards need continuous display updates.
- Quick access for machine learning (ML), artificial intelligence (AI), and real-time analytics—business intelligence (BI) systems, ad-hoc queries, ML algorithms, AI software, and reporting tools, all need quick access to data stores. To provide this level of speed, data might need to be heavily cached, kept in-memory, or accessed efficiently from any combination of disk and memory resources.
- High concurrency for real-time analytics—time-series data represents the most recent data reading. Stakeholders of various interests often need to access this data at the same time. This means many logins and queries can arrive simultaneously, and the system should be able to handle these demands.
- High capacity—to handle massive amounts of data, a TSDB should be fast and scalable. Often a time-series database is required to scan and compare data on input for alerts. The system also needs to store data accessibly, and respond to queries against large data sets.
- Standard SQL functions—SQL is amongst the most used languages in data processes. To ensure key functionality and usage of time series databases, SQL functions should work at peak performance.
- Custom time-series functions—in addition to SQL functionality, time series databases need extended functionality for performance. You can include, for example, functions for returning only records with the highest and lowest readings from a big set of records.
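The value alerting, trend alerting, and highest/lowest reading requirements above can be sketched together with a small in-memory monitor. The class name `AlertMonitor`, its parameters, and the alert labels are hypothetical and only illustrate the idea; a real TSDB would expose its own mechanisms.

```python
from collections import deque


class AlertMonitor:
    """Checks each incoming value against a limit and against recent history."""

    def __init__(self, high_limit, max_rise, history_size=3):
        self.high_limit = high_limit
        self.max_rise = max_rise
        # A bounded deque keeps only the records relevant to trend checks.
        self.history = deque(maxlen=history_size)

    def ingest(self, value):
        alerts = []
        # Value alerting: compare the new value with the configured limit.
        if value > self.high_limit:
            alerts.append("value")
        # Trend alerting: compare with the oldest value still held in memory.
        if len(self.history) == self.history.maxlen:
            if value - self.history[0] > self.max_rise:
                alerts.append("trend")
        self.history.append(value)
        return alerts

    def extremes(self):
        """Return the (lowest, highest) readings in the retained history."""
        return min(self.history), max(self.history)


monitor = AlertMonitor(high_limit=90, max_rise=15, history_size=3)
results = [monitor.ingest(v) for v in [50, 55, 60, 70, 95]]
low, high = monitor.extremes()
```

Because the history lives in memory and is bounded, each check costs a constant amount of work regardless of how much data has been ingested overall.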
Time Series Database with Raima
Raima’s latest version has introduced powerful support for time series data storage and aggregation. RDM can now generate a C/C++ interface based on a time series definition, along with a full set of Fast Fourier Transform (FFT) support API calls. The FFT calls support scaling, absolute value, and real computations, and their modular design lets a developer swap the Raima FFT library out for any other custom or third-party library to suit their needs. The time series code supports automatic arithmetic, geometric, and harmonic mean calculations, along with downsampling and data splitting.
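For readers unfamiliar with the three means mentioned above, the following sketch computes them with Python's standard `statistics` module. It does not use or represent the RDM API; the sample values are made up for illustration.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical readings from one time window.
window = [1.0, 2.0, 4.0]

arithmetic = mean(window)          # sum of values divided by their count
geometric = geometric_mean(window)  # n-th root of the product of the values
harmonic = harmonic_mean(window)    # count divided by the sum of reciprocals

summary = {
    "arithmetic": arithmetic,
    "geometric": geometric,
    "harmonic": harmonic,
}
```

For positive values, the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean. Which one suits a series depends on the data: rates and ratios are often better summarized by the harmonic or geometric mean than by the arithmetic mean.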