High-Availability Database (HA DB)
RDM is a database management system designed to provide high availability and superior uptime. It replicates and mirrors your data between environments, giving you near real-time access to managed data without downtime.
What is Database Replication?
Database replication is a process of copying data from a database to one or more copies, or replicas, usually to improve accessibility or fault-tolerance.
The terms "active replication" and "passive replication" are sometimes used to characterize how this process works. As defined on Wikipedia, active replication is performed by processing the same request at every replica. In passive replication, each request is processed on a single replica and then its state is transferred to the other replicas.
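The distinction can be illustrated with a toy key-value store. This is a minimal sketch, not RDM code: the `Replica` class and both functions are hypothetical names introduced only for illustration.

```python
class Replica:
    """A toy key-value replica (hypothetical, not an RDM API)."""

    def __init__(self):
        self.state = {}

    def apply(self, request):
        # A request is a (key, delta) pair that increments a counter.
        key, delta = request
        self.state[key] = self.state.get(key, 0) + delta


def active_replicate(replicas, request):
    # Active replication: every replica processes the same request itself.
    for replica in replicas:
        replica.apply(request)


def passive_replicate(primary, backups, request):
    # Passive replication: only the primary processes the request,
    # then its resulting state is transferred to the backups.
    primary.apply(request)
    for backup in backups:
        backup.state = dict(primary.state)


active = [Replica(), Replica()]
active_replicate(active, ("hits", 1))
active_replicate(active, ("hits", 2))

primary, backup = Replica(), Replica()
passive_replicate(primary, [backup], ("hits", 1))
passive_replicate(primary, [backup], ("hits", 2))
```

Both approaches end with every copy holding the same logical data; they differ in where the request is executed.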
RDM's Replication Abilities
Raima Database Manager (RDM) supports both of these techniques. In Raima documentation, active replication is referred to simply as "replication", while passive replication is referred to as "mirroring". In the context of RDM, mirroring produces copies that are byte-for-byte identical to the original database, whereas replication produces copies that are not. A replicated copy contains all the records transferred from the original, but the physical organization of the records in the database files (or in memory) may differ. The copy may also contain additional tables or indexes that were never in the original database, or it may omit some of the original tables, because RDM Enterprise Lite supports filtered replication, where only specified tables are replicated.
"Active-Active" vs. "Active-Passive"
In the context of replication, the terms "active-active" and "active-passive" are also common, but these refer to different concepts from the ones just described. Active-active replication means two-way replication of data between two databases that are both being actively updated, and is also called "multi-master replication". Active-passive replication means one-way replication from a master database that is being actively updated to a slave database that is not updated except by the replication process. This is also called "master-slave" replication.
In RDM, replication is always master-slave replication, and it is performed asynchronously. It can take either of two forms:
- Replication from one RDM database to another: both the master and slave are RDM databases, where the slave is a read-only copy of the master.
- Replication from an RDM database to a different type of database, such as a Microsoft SQL Server database, an Oracle database, a MySQL database or an RDM SQL database. Here the slave may contain other data besides that copied from the master, and may be updated by other applications. However, replication is still "one-way" in the sense that data is transferred only from the RDM database to the slave, and never the other way round.
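The one-way, asynchronous behavior described above can be sketched as follows. This is an illustrative model only: `Master`, `Slave`, the change log, and the table names are assumptions made for the example and do not correspond to RDM functions.

```python
from collections import deque


class Master:
    """Toy master: applies updates locally and queues them for shipping."""

    def __init__(self):
        self.tables = {}
        self.log = deque()  # changes not yet shipped to the slave

    def insert(self, table, row):
        self.tables.setdefault(table, []).append(row)
        # The update returns immediately; it never waits for the slave.
        self.log.append((table, row))


class Slave:
    """Toy slave: receives only the tables it was configured to replicate."""

    def __init__(self, replicated_tables):
        self.replicated_tables = set(replicated_tables)
        self.tables = {}

    def pull(self, master):
        # Data flows one way only: from the master's log into the slave.
        applied = 0
        while master.log:
            table, row = master.log.popleft()
            if table in self.replicated_tables:  # filtered replication
                self.tables.setdefault(table, []).append(row)
                applied += 1
        return applied


master = Master()
slave = Slave(replicated_tables=["readings"])
slave.tables["reports"] = ["local summary"]  # data owned by the slave side

master.insert("readings", {"sensor": 1, "value": 20.5})
master.insert("debug", {"msg": "not replicated"})
lag_before_pull = "readings" not in slave.tables  # slave lags until it pulls
applied = slave.pull(master)
```

The slave keeps its own `reports` table, never sees `debug`, and nothing it holds is ever written back to the master.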
Optimization of Replication in Embedded Systems
In the context of embedded systems, the second form of replication is useful for gathering data from multiple embedded devices and storing it all in the same machine, where it is available for user queries or reporting. Usually data stored on an embedded device arrives in real time from sensors and other connected hardware, and is highly time-sensitive. Any operation that may block or slow down updates on the embedded device is unacceptable. However, without much CPU cost the data can be replicated (asynchronously) from the embedded device to a PC-based database, where there is more powerful hardware and less requirement for bounded response times.
Differences Between Data Replication and Mirroring
In RDM there is some functional overlap between replication and mirroring, but in general replication is provided mainly for improving accessibility. By maintaining multiple slave copies of the same data on several nodes, some of the workload is offloaded from the master. This improves update performance in the master database and allows faster reads on the other nodes, where applications can access a local slave copy of the database. When used together with RDM’s support for database unions, this allows you to construct a distributed database system. An application can open multiple databases, located on the same machine or on different machines, in a database union and get a unified view of the data, as if it were all in the same database. This support for replication and distributed databases makes RDM a highly scalable database system.
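The idea of a unified view over several databases can be sketched in a few lines. This is a conceptual illustration that assumes each database is a plain dictionary of tables; `DatabaseUnion` is a hypothetical class and not RDM's union interface.

```python
from itertools import chain


class DatabaseUnion:
    """Read-only view over several databases that share a schema (hypothetical)."""

    def __init__(self, *databases):
        self.databases = databases

    def rows(self, table):
        # Present the rows of every member database as one sequence.
        return chain.from_iterable(db.get(table, []) for db in self.databases)


# Two local slave copies, for example one per node.
node_a = {"readings": [{"device": "a", "value": 20.5}]}
node_b = {"readings": [{"device": "b", "value": 19.0}]}

union = DatabaseUnion(node_a, node_b)
devices = [row["device"] for row in union.rows("readings")]
```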
Replication can also be a convenient way of propagating configuration data across a network of processors. A master database may contain the current configuration data for the system, and this can be replicated through a multi-tier structure to the other nodes on the network. Each node can be programmed to discover an available source to replicate from, and connect to it. Replication supports both in-memory and disk-based databases, so in this example the source of the multi-tier replication could be an in-memory database loaded from disk, which is then replicated throughout the system to local in-memory databases as each node starts up.
To mirror a database is to create a byte-for-byte copy of a database at a different location. Mirroring differs from copying or backing up a database in that a mirror database is updated either at the same time as the original database (synchronous) or as quickly as possible after the original database is updated (asynchronous). Mirroring is implemented by applying page images from the master to the slave(s).
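The effect of applying page images can be shown with a small byte-level sketch. It assumes a fixed, artificially tiny page size and a database file that only changes in place or grows; the function names are hypothetical and this is not how RDM's internal page format works.

```python
PAGE_SIZE = 4  # hypothetical page size, tiny so the example is readable


def changed_pages(before: bytes, after: bytes, page_size: int = PAGE_SIZE):
    """Return {page_number: page_image} for every page that differs."""
    pages = {}
    for offset in range(0, len(after), page_size):
        image = after[offset:offset + page_size]
        if before[offset:offset + page_size] != image:
            pages[offset // page_size] = image
    return pages


def apply_pages(slave: bytearray, pages, page_size: int = PAGE_SIZE):
    """Overwrite the slave's pages with the images shipped from the master."""
    for number, image in pages.items():
        start = number * page_size
        end = start + len(image)
        if len(slave) < end:
            slave.extend(b"\x00" * (end - len(slave)))  # file grew on the master
        slave[start:end] = image


master_before = b"AAAABBBBCCCC"
master_after = b"AAAAXXXXCCCCDD"  # page 1 rewritten, page 3 appended

slave = bytearray(master_before)
pages = changed_pages(master_before, master_after)
apply_pages(slave, pages)
```

Because whole page images are copied, the slave ends up byte-for-byte identical to the master, which is the property that separates mirroring from replication.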
The 3 main purposes for mirroring
- To maintain another copy of a database for safe-keeping. The backup copy may be an on-disk copy of an in-memory master database.
- To offload reading of a database to another computer.
- To be prepared to switch processing to another computer if the primary computer fails. This is often referred to as a highly available database.
The Difference Between High-Availability and Mirroring
Although mirroring is often considered in the context of High Availability (HA), it is not the same thing. HA implies more than mirroring in database terms, as HA includes the ability to detect and respond to a failed component in the system, switching to a standby component (failover). Mirroring is the component within an HA system that maintains a redundant copy of the database.
RDM's Mirroring Abilities
In Raima Database Manager (RDM), mirroring has a highly modular and flexible architecture, making it easy to plug RDM components together to achieve database mirroring within a single system or across multiple systems. The mirroring components can all be controlled by an application through published API functions.
At its simplest, RDM mirroring consists of the following components:
1. A master database: this database is being continuously updated by the application.
2. A slave database: this is a read-only copy of the master.
3. A database server controlling access to the master database.
4. A database server controlling access to the slave database.
5. A mirroring agent that publishes changed data from the master database.
6. A mirroring agent that subscribes to the data from component 5 and applies that data to its slave database.
In this list, components 1, 3 and 5 belong to the master system while components 2, 4 and 6 belong to the slave system.
In practice, there may be more components for any of the following reasons:
- There may be more than one slave database for each master.
- The database servers and mirroring agents can handle multiple databases if required.
- The mirroring agents may connect to multiple other mirroring agents, either as publishers or subscribers.
- Mirroring may be a multi-tiered system.
Architecture Optimized for Parallelism
This architecture allows for parallelism and makes use of the transaction logs that are generated automatically as part of RDM's transaction processing, so mirroring adds little CPU overhead.
Mirroring with RDM - Synchronous or Asynchronous
- With synchronous mirroring, only one slave database per master is allowed. When an application writes a transaction to the master database, the commit does not complete in the master until it is complete in the slave. This means that the slave must be connected to the master, otherwise transactions in the master database will fail. It also means that synchronous mirroring across the Internet is not practical.
- With asynchronous mirroring, multiple slave databases are allowed for each master. The slaves do not need to be connected to the master at all times. They may disconnect and then reconnect later, catching up with the updates that have occurred in between. In this configuration, applications writing transactions to the master database are not blocked waiting for the slave transaction to complete.
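The difference in commit behavior can be modeled in a few lines. This is a simplified sketch in which transactions are plain strings and a flag stands in for the network connection; all class names are hypothetical and none of them are RDM API calls.

```python
class SlaveUnavailable(Exception):
    """Raised when the slave cannot be reached."""


class MirrorSlave:
    def __init__(self):
        self.connected = True
        self.applied = []

    def apply(self, txn):
        if not self.connected:
            raise SlaveUnavailable(txn)
        self.applied.append(txn)


class SyncMaster:
    """Commit completes only after the single slave has applied it."""

    def __init__(self, slave):
        self.slave = slave
        self.committed = []

    def commit(self, txn):
        self.slave.apply(txn)  # raises if disconnected, so the commit fails
        self.committed.append(txn)


class AsyncMaster:
    """Commit completes locally; slaves catch up whenever they reconnect."""

    def __init__(self):
        self.committed = []

    def commit(self, txn):
        self.committed.append(txn)  # never blocks on a slave

    def catch_up(self, slave):
        # Ship every transaction the slave has not yet applied.
        for txn in self.committed[len(slave.applied):]:
            slave.apply(txn)


slave = MirrorSlave()
slave.connected = False

sync_master = SyncMaster(slave)
try:
    sync_master.commit("t1")
    sync_failed = False
except SlaveUnavailable:
    sync_failed = True

async_master = AsyncMaster()
async_master.commit("t1")
async_master.commit("t2")
slave.connected = True
async_master.catch_up(slave)
```

With the slave disconnected, the synchronous commit fails and nothing is recorded on the master, while the asynchronous master keeps committing and the slave catches up after reconnecting.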
The status of the RDM components can be determined programmatically by an application, so that RDM mirroring can be built into an effective HA system. The application provides the controlling logic for the HA system: the ability to determine that a failover is required, and the overall control of the failover and the failback. The RDM mirroring components provide mechanisms to switch the roles of the master and slave.
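The application-side controlling logic might look like the following sketch. It assumes the application polls a liveness flag and fails over after a fixed number of missed checks; the `Node` class, the `monitor_step` function, and the threshold are illustrative assumptions, and the role swap stands in for whatever mechanism the mirroring components provide.

```python
class Node:
    """A database node as seen by the controlling application (hypothetical)."""

    def __init__(self, name, role):
        self.name = name
        self.role = role  # "master" or "slave"
        self.alive = True


def monitor_step(master, slave, missed, threshold=3):
    """Run one status check; return the (master, slave, missed) to use next."""
    if master.alive:
        return master, slave, 0  # healthy: reset the missed-check counter
    missed += 1
    if missed < threshold:
        return master, slave, missed  # tolerate short outages
    # Failover: switch the roles of the master and the slave.
    master.role, slave.role = "slave", "master"
    return slave, master, 0


primary = Node("primary", "master")
standby = Node("standby", "slave")
master, slave, missed = primary, standby, 0

primary.alive = False  # simulate a failure of the primary
for _ in range(3):
    master, slave, missed = monitor_step(master, slave, missed)
```

Failback would follow the same pattern in reverse once the original primary is healthy again and its copy of the database has been brought up to date.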
RDM is often used on embedded systems, where a mirror database may be maintained in memory or on a separate file system from the master database location. This solution requires relatively little extra hardware.
In some embedded systems RDM mirroring may be used as a mechanism for transferring the database contents to an external system, normally a PC system. Although it would be possible for the PC software to connect directly to the database server on the embedded system, this might place an unpredictable burden on the embedded system, whereas mirroring has very little impact. In this situation either mirroring or replication could be used. The choice usually depends on whether the slave database is required to be a byte-for-byte copy of the master (mirroring) or just logically equivalent (replication).