What is an In-Memory Database System?
An in-memory database (IMDB; also called a main memory database system, or MMDB) is stored in a computer’s main memory (RAM) and is managed by an in-memory database management system. Traditional databases, by contrast, are stored on disk drives.
Traditional disk-based databases are formatted with an awareness of the block-oriented device on which the data is read and written. When one part of a database refers to another part, following that reference may require a different block to be read from the disk. This is a non-issue with in-memory databases, where interrelationships between parts of the database can be managed through direct pointers. The in-memory data is always present, so reads incur no disk latency. Writing data to disk must be done in an “atomic” fashion, where either all writes are registered or none of them are, which often requires journaling or double-writes. An in-memory database has no such on-disk requirement, so changes made in memory take effect almost immediately.
On-Disk Databases Vs. In-Memory Databases
On-disk databases:
- All data is stored on disk; disk I/O is needed to move data into main memory when required.
- Data is always persisted to disk.
- Traditional data structures such as B-trees are designed to store tables and indices efficiently on disk.
- Support a very broad set of workloads, for example OLTP, data warehousing, and mixed workloads.
In-memory databases:
- All data is stored in main memory; no disk I/O is needed to query or update data.
- Data is persistent or volatile, depending on the in-memory database product.
- Specialized data structures and index structures assume data is always in main memory.
- Optimized for high performance.
How is data accessed and changed in an in-memory database management system?
How is in-memory data shared among multiple tasks?
The answer is twofold: fast and faster. For those familiar with the RDM Transactional File Server (TFS), which can be used to share a database among user processes on the same computer, on the office LAN, or across the world, the database can be managed in-memory by the TFS. This is faster than an on-disk database managed by the TFS.
But the fastest way to share one database among multiple tasks is to run a single process with multiple threads, where each task is running in its own thread. This form of an application runs on one computer and allows each thread to access the database directly in local heap storage. It also circumvents the need to read unchanged database objects into a local cache, because the original object can be viewed directly from the database memory. This is a very fast way to share a database in a multi-user mode.
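The single-process, multi-threaded arrangement can be illustrated with a minimal Python sketch. It is not RDM code: the `database` dictionary, the `db_lock`, and the `task` function are hypothetical stand-ins, and a real engine would use its own locking and transaction machinery. The point is only that every thread works on the same heap object, with no copying between processes.

```python
import threading

# Hypothetical stand-in for a database held in the process heap.
database = {"counter": 0}
# A single lock serializes writers; this is an assumption for the sketch,
# not a description of how any particular engine coordinates threads.
db_lock = threading.Lock()


def task(n_updates: int) -> None:
    """One application task, running in its own thread."""
    for _ in range(n_updates):
        # The thread updates the shared object in place:
        # no local cache copy and no inter-process communication.
        with db_lock:
            database["counter"] += 1


threads = [threading.Thread(target=task, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(database["counter"])  # 4 threads x 1000 updates = 4000
```

Because all four threads share one address space, readers can also inspect `database` directly rather than first loading unchanged objects into a private cache.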
Can an on-disk database be used in-memory?
What are the advantages and disadvantages of in-memory databases?
In-memory databases, by definition, handle the data they are processing in main memory. There is no need to deal with secondary storage, which can be orders of magnitude slower than main memory. Eliminating the requirement to access slower secondary storage allows an in-memory database to use algorithms that would not be feasible for a disk-based database. As an example, a disk-based database commonly uses a B-tree based index to limit the number of disk accesses required to locate a row. An in-memory database can use an AVL tree instead of a B-tree, which reduces (or eliminates) the need to duplicate data but increases the number of rows accessed during traversal.
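As a rough illustration of that trade-off, the following Python sketch builds an AVL tree whose nodes hold only a reference to the row, so the key is never copied into the index. Every comparison during a lookup therefore reads a row, which is the extra row access mentioned above. The names `Node`, `insert`, and `find` are hypothetical and chosen for this example; this is not how any specific product implements its indexes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    row: dict                      # reference to the row, not a copy of its key
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node, row, key="id"):
    """Insert a row reference and rebalance on the way back up."""
    if node is None:
        return Node(row)
    if row[key] < node.row[key]:
        node.left = insert(node.left, row, key)
    else:
        node.right = insert(node.right, row, key)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                               # left-heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)    # left-right case
        return rotate_right(node)
    if balance < -1:                              # right-heavy
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def find(node, value, key="id"):
    """Return (row, rows_visited); each step dereferences one row."""
    visited = 0
    while node:
        visited += 1
        if value == node.row[key]:
            return node.row, visited
        node = node.left if value < node.row[key] else node.right
    return None, visited


rows = [{"id": i, "name": f"row{i}"} for i in range(1, 16)]
root = None
for r in rows:
    root = insert(root, r)

found, visited = find(root, 7)
print(found is rows[6], visited, root.height)
```

The lookup returns the very same row object that was inserted, showing that the index stores no duplicate of the key or the row. On disk, chasing one reference per comparison would mean one block read per step, which is why this layout only makes sense when the rows are already in memory.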
Without the requirement for secondary storage, an in-memory database can be used in systems without any secondary storage. This enables the use of a database engine in many embedded systems that could not support a traditional disk-based engine.
Typically, the type of memory used with an in-memory database system is not persistent. When an application closes, whether cleanly or unexpectedly, the data stored in an in-memory database will be lost. Certain application domains do not require data persistence between runs, but others may. Applications that require a high degree of persistence may not be suitable for an in-memory database.
An in-memory database stores all data in main memory, which can severely limit the amount of data that can be stored. Most database systems can either allocate memory ad hoc for storing database objects or be given a chunk of memory to use as storage. Either way, the volume of data that can be stored in an in-memory database tends to be much smaller than that stored in a disk-based system.
What is the difference between an in-memory database and simply storing data in shared memory segments?
The biggest difference between an in-memory database and data stored in shared memory segments is the structured approach to data access. In-memory databases maintain components of the “ACID” attributes of data storage engines. The ACID properties are Atomicity, Consistency, Isolation, and Durability. While in-memory databases may relax the durability property because their storage is non-persistent, they commonly support the other properties:
- Atomicity – multiple changes are committed to the database as an all or nothing operation
- Consistency – structural rules and relationships are maintained for all users
- Isolation – transactional changes cannot be seen by other users until they are committed
It can be time-consuming and error-prone to implement this functionality on top of shared memory segments.
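To give a sense of what has to be built by hand, here is a minimal Python sketch of a store with all-or-nothing commits, a consistency check, and changes that stay private until commit. `TinyStore`, `Transaction`, and the `check` callback are hypothetical names invented for this example. The sketch also leaves out conflict detection, rollback of partial failures, and cross-process sharing, which is where most of the real effort goes.

```python
import threading


class TinyStore:
    """Hypothetical in-memory store with all-or-nothing commits."""

    def __init__(self):
        self._committed = {}
        self._lock = threading.Lock()

    def read(self, key):
        # Readers only ever see committed data (isolation).
        with self._lock:
            return self._committed.get(key)

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self._store = store
        self._pending = {}          # private to this transaction until commit

    def write(self, key, value):
        self._pending[key] = value

    def commit(self, check=None):
        """Apply every pending change, or none of them (atomicity)."""
        with self._store._lock:
            candidate = {**self._store._committed, **self._pending}
            # Optional rule over the whole database (consistency).
            if check is not None and not check(candidate):
                self._pending.clear()
                return False
            self._store._committed = candidate
        self._pending.clear()
        return True


def no_negative(db):
    return all(v >= 0 for v in db.values())


store = TinyStore()
setup = store.begin()
setup.write("alice", 100)
setup.write("bob", 0)
setup.commit()

transfer = store.begin()
transfer.write("alice", 70)
transfer.write("bob", 30)
seen_before_commit = store.read("bob")      # still the committed value
transfer_ok = transfer.commit(check=no_negative)

overdraw = store.begin()
overdraw.write("alice", -10)
overdraw.write("bob", 110)
overdraw_ok = overdraw.commit(check=no_negative)

print(seen_before_commit, transfer_ok, overdraw_ok, store.read("alice"))
```

The rejected transaction leaves both accounts untouched, which is the all-or-nothing behavior described in the list above.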
In addition to the transaction properties of in-memory databases, there are many other out-of-the-box advantages, including:
- Use of common, well-defined data definition languages (SQL DDL statements)
- Use of common, well-defined data query languages (SQL DML statements)
- Remote access to data via a network communication protocol
- Ability to store data in a platform-independent format
- Tools available for import/export through CSV, XML, JSON, etc.
- Ability to persist in-memory data to disk
An in-memory database can be subject to data loss if something stops working: how do you cope with this?
An application developer needs to understand that data loss is a possibility when working with an in-memory database. There are several approaches, such as persistence and replication, used to mitigate data loss scenarios, but data loss is still a possibility.
- Persistence – Raima supports two modes for opening a database in-memory.
- Volatile – the database is empty when first opened and all contents are discarded when the database is closed
- Persistent – on open the database is populated from content on secondary storage, on close changed data (inserts, updates, deletes) is written out to secondary storage
If a database is opened using the persistent in-memory mode, changed content will automatically be written to secondary storage when the database is closed. In addition, the developer can persist changes to secondary storage on demand. Using persistence can limit data loss to what has happened since the last persistence operation.
- Replication – many database systems support replicating changes to another database instance (or another database system). Using replication allows data that may have been lost from the in-memory copy to be recovered from the replicated copy.
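The difference between the volatile and persistent modes, and the way an on-demand persist bounds data loss, can be sketched in a few lines of Python. This is an illustration only and does not use the Raima API: the `InMemoryDB` class, its `persist` and `close` methods, and the JSON snapshot file are all assumptions made for the example.

```python
import json
import os
import tempfile


class InMemoryDB:
    """Hypothetical store: path=None is volatile, a path makes it persistent."""

    def __init__(self, path=None):
        self.path = path
        self.data = {}
        # Persistent mode: populate from secondary storage on open.
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def persist(self):
        """Write the current contents to secondary storage on demand."""
        if self.path is None:
            return                      # volatile: nothing to write
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)      # swap the snapshot in as one step

    def close(self):
        self.persist()
        self.data = {}                  # in-memory contents are discarded


path = os.path.join(tempfile.mkdtemp(), "demo.json")

db = InMemoryDB(path)
db.data["sensor"] = 21.5
db.persist()                            # last persistence operation
db.data["sensor"] = 99.0                # changed afterwards, never persisted
del db                                  # simulate a crash: close() never runs

reopened = InMemoryDB(path)
print(reopened.data["sensor"])          # 21.5

volatile = InMemoryDB()
volatile.data["sensor"] = 1.0
volatile.close()
print(volatile.data)                    # {}
```

After the simulated crash, only the change made since the last `persist()` call is lost, while the volatile instance keeps nothing once it is closed.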
Are all embedded databases also in-memory databases?
An embedded database can be defined as a database engine that runs within an application and does not require other processes to be installed, configured, or accessed. The storage media for the data managed by the engine is implementation dependent.
Originally, most database engines used a hard disk for data storage, as the amount of memory available would not allow for sufficient data volume for useful data sets. As the size of memory increased, many vendors began adding in-memory capabilities. Today many database engines, embedded or traditional, have some support for in-memory operation.
How does Raima implement an in-memory database system in RDM?
The Raima RDM database engine is divided into two components: the runtime engine and the storage engine.
The runtime is responsible for transaction management, rule enforcement, query processing, and maintaining local, uncommitted, changes to the database. Whenever the runtime engine needs a database object, it requests the object from the storage engine. In addition, when local changes in the runtime are committed, they are given to the storage engine to be stored in a location shared between all users of the database. The runtime engine does not know or care if the storage engine is using main memory or secondary storage.
The storage engine can be thought of as a key/value pair repository. The keys are database object identifiers and the values are the locations of the database objects. For disk-based databases, the value is a file identifier, file offset, and size; for in-memory databases, the value is the memory location of the database object.
The objects themselves may be encoded, compressed, and encrypted, but the runtime engine will know how to interpret the contents.
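The key/value view of the storage engine can be sketched as follows. This is a conceptual Python illustration, not RDM's implementation: `StorageEngine`, `DiskLocation`, `MemoryLocation`, `put`, and `fetch` are hypothetical names, and a `bytes` object stands in for a raw memory address. What it shows is that the caller asks for an object by identifier and never learns where the bytes came from.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class DiskLocation:
    path: str        # file identifier
    offset: int      # byte offset within the file
    size: int        # object size in bytes


@dataclass
class MemoryLocation:
    buffer: bytes    # stands in for the object's memory address


Location = Union[DiskLocation, MemoryLocation]


class StorageEngine:
    """Maps object identifiers to wherever the object currently lives."""

    def __init__(self):
        self._index: Dict[int, Location] = {}

    def put(self, object_id: int, location: Location) -> None:
        self._index[object_id] = location

    def fetch(self, object_id: int) -> bytes:
        loc = self._index[object_id]
        if isinstance(loc, MemoryLocation):
            return loc.buffer                 # direct access, no I/O
        with open(loc.path, "rb") as f:       # disk-based: seek and read
            f.seek(loc.offset)
            return f.read(loc.size)


# Prepare a small file so that one object can live on disk.
disk_path = os.path.join(tempfile.mkdtemp(), "objects.bin")
with open(disk_path, "wb") as f:
    f.write(b"headerROW-ON-DISK")

engine = StorageEngine()
engine.put(1, DiskLocation(disk_path, offset=6, size=11))
engine.put(2, MemoryLocation(b"ROW-IN-MEMORY"))

# The "runtime" side uses the same call for both objects.
print(engine.fetch(1))
print(engine.fetch(2))
```

Because `fetch` hides the location type, code written against it would not need to change when a database moves between disk and memory, which mirrors the abstraction described above.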
By maintaining this abstraction, an application developer doesn’t need to do anything other than specify a pre-open configuration parameter to use an in-memory RDM database. The application, runtime, and storage engine can all run in the same memory space, providing very low latency access. If a developer knows that a database will primarily be used in-memory, they can use methods such as AVL indexing to optimize for in-memory access.
Read more about Raima’s in-memory implementation.