Posted by Bill Houglum on August 20, 2012
Wikipedia defines “distributed data” as “Collections of data (e.g. in a database) distributed across multiple physical locations.” There are many reasons why a system designer would consider a distributed database solution. The following scenarios highlight several distributed configurations and the benefits of configuring a distributed database architecture.
The simplest replication or mirroring scenario can be used to back up valuable data. As changes are stored on the master copy of the database, those same changes are forwarded to the slave database. For High-Availability (HA) environments, Raima Database Manager (RDM) will cooperate with an external HA manager to perform failover or failback.
One of the considerations in any distributed database management system is the latency in data consistency between the master and the slave. This consideration is important in assessing the trade-off between performance and the amount of data the application designer is willing to lose in the event of a catastrophic failure on the master database.
Many distributed databases provide synchronous and asynchronous modes for mirroring to the slave. In synchronous mode, the slave is part of the update transaction. In asynchronous mode, the updates arrive at the slave after a delay. In the latter mode, if the master fails before the updates are shipped, the slave will still have the old values. This window of data inconsistency between the master and the slave depends on how often updates are transferred and on the throughput of those transfers.
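The difference between the two modes can be sketched in a few lines of Python. This is only an illustrative in-memory model, not RDM code: the `Master` and `Replica` classes, the `ship()` method, and the `demo()` function are hypothetical names invented for this example.

```python
from collections import deque


class Replica:
    """Hypothetical in-memory slave copy (not an RDM API)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Master:
    """Hypothetical master that mirrors every write to one replica."""

    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet shipped (asynchronous mode)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # The slave is part of the update: apply before returning.
            self.replica.apply(key, value)
        else:
            # Queue the update; it reaches the slave on the next ship().
            self.pending.append((key, value))

    def ship(self):
        """Forward all queued updates to the slave; return how many were sent."""
        shipped = 0
        while self.pending:
            self.replica.apply(*self.pending.popleft())
            shipped += 1
        return shipped


def demo():
    sync_replica = Replica()
    sync_master = Master(sync_replica, synchronous=True)
    sync_master.write("temp", 21)

    async_replica = Replica()
    async_master = Master(async_replica, synchronous=False)
    async_master.write("temp", 20)
    async_master.ship()
    async_master.write("temp", 21)  # the master "fails" before this is shipped

    return sync_replica.data, async_replica.data, len(async_master.pending)
```

In this model the synchronous replica always holds the latest value, while the asynchronous replica still holds the old value and one update is stranded in the queue. Shipping more often narrows that window at the cost of more transfer work on the master.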
Placing data near the site of greatest demand improves performance because most reads and writes stay local. The diagram below shows database partitioning based on site autonomy. The data “owned” by the site is updated and shared through replication to the second site. The RDM “union” feature allows those separate database instances (sites) to be queried as one database without needing to depend on remote connections to the other site. This is possible because RDM allows read-only access to the slave database instance.
The latency in data consistency applies in this scenario as well, since both sites will be accessing a slave database for their local reads of the other site’s data. The local database is guaranteed to be consistent eventually, and the window of data inconsistency depends on how quickly the updates arrive from the master.
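As a rough analogy for querying a locally owned database and a local replica of the other site as one, the sketch below uses Python's standard `sqlite3` module with an attached second database. It does not use RDM or its union feature; the `orders` table, the `replica` schema name, and both function names are assumptions made for illustration, and SQLite does not enforce read-only access on the attached copy here.

```python
import sqlite3


def build_site():
    """Create site A's own database plus a local copy of site B's data."""
    conn = sqlite3.connect(":memory:")
    # Second in-memory database standing in for the replicated slave copy.
    conn.execute("ATTACH DATABASE ':memory:' AS replica")
    conn.execute("CREATE TABLE main.orders (id INTEGER, site TEXT)")
    conn.execute("CREATE TABLE replica.orders (id INTEGER, site TEXT)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)", [(1, "A"), (2, "A")]
    )
    # In a real deployment these rows would arrive through replication.
    conn.executemany("INSERT INTO replica.orders VALUES (?, ?)", [(10, "B")])
    return conn


def query_all_orders(conn):
    """Read both sites' rows in one query, with no remote connection."""
    return conn.execute(
        "SELECT id, site FROM main.orders "
        "UNION ALL "
        "SELECT id, site FROM replica.orders "
        "ORDER BY id"
    ).fetchall()
```

The rows read from the replica are only as fresh as the last replicated update, which is the consistency window described above.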
Data captured on smart devices is becoming more prevalent in the market. One example would be sensors on vehicles or vessels that can experience intermittent connectivity. Embedded devices are able to send the data that they record to mobile databases or servers through replication or data mirroring. Aggregating the data makes communication between devices possible and can even enable efficient processes such as automated maintenance, where you are notified when a process fails. Capturing data directly on the device increases capture performance because it removes the latency of remote communication with a central storage repository. The replication logs can be sent to the aggregation database when connectivity is established.
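The capture-locally, forward-later pattern can be sketched as follows. This is a simplified Python model, not RDM replication: the `Device` class, its `unsent_log` list, and the `sync()` method are hypothetical, and the aggregation database is represented by a plain list.

```python
class Device:
    """Hypothetical embedded device that buffers readings while offline."""

    def __init__(self):
        self.local_store = []  # every reading captured on the device
        self.unsent_log = []   # replication log awaiting connectivity

    def capture(self, reading):
        # Local write only: no round trip to a central repository.
        self.local_store.append(reading)
        self.unsent_log.append(reading)

    def sync(self, aggregator, connected):
        """Send the buffered log if a connection exists; return rows sent."""
        if not connected:
            return 0
        sent = len(self.unsent_log)
        aggregator.extend(self.unsent_log)
        self.unsent_log.clear()
        return sent
```

A call to `sync()` while disconnected leaves the log untouched, so nothing is lost; the next call with connectivity delivers the whole backlog in capture order.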
Besides aggregating data captured from these smart devices, control systems need a safe and efficient method to send configuration information or control data to these devices. The system developer needs assurance that the data transfer package arrives at the device in its entirety or not at all. Using replication, data can be distributed to multiple devices from a centralized source.
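One way to picture the all-or-nothing requirement is a device that verifies a digest over the whole package before applying any of it. The sketch below is an assumption-laden illustration in Python, not how RDM replication is implemented: `make_package` and `apply_package` are hypothetical helpers, and SHA-256 is simply one reasonable choice of integrity check.

```python
import hashlib
import json


def make_package(settings):
    """Serialize settings and attach a digest computed over the full payload."""
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    return {"payload": payload, "digest": hashlib.sha256(payload).hexdigest()}


def apply_package(current_config, package):
    """Return (config, applied). The config changes only if the package is intact."""
    actual = hashlib.sha256(package["payload"]).hexdigest()
    if actual != package["digest"]:
        # Incomplete or corrupted transfer: keep the old configuration as is.
        return current_config, False
    new_config = dict(current_config)
    new_config.update(json.loads(package["payload"]))
    return new_config, True
```

Because the new configuration is built on a copy and returned only after the check passes, a truncated transfer never leaves the device with half of the new settings.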
The Advantages of a Distributed Database
A distributed database gives you the ability to back up information, increase performance and sync data. While this can be very powerful, a distributed data design can become very complex very quickly. The Raima products are well suited to handling distributed data scenarios like those above and more. Contact us and we can help you with your specific design requirements.