Query Across Multiple Databases
Database Unions
RaimaDB’s database union feature provides a unified view of multiple identically structured databases. Since RaimaDB allows highly distributed data storage and processing, this feature provides a mechanism for unifying the distributed data, giving it the appearance of a single, large database.
As a simple illustration, consider a widely distributed database for an organization that has its headquarters in Seattle, and branch offices in Boston, London and Mumbai. Each office owns and maintains employee records locally, but the headquarters also performs reporting on the entire organization. The database at each location has a structure identical to the others, and although it is a fully contained database at each location, it is also considered a partition of the larger global database. In this case, the partitioning is based on geographical location.
Partitioning and unified queries can also scale performance. Consider a database where each operation begins with a lookup of a record’s primary key. If the “database” is composed of four partitions, each stored on the same multi-core computer but on different disks controlled by different disk controllers, then the only requirement is a scheme that distributes primary keys among the four partitions. If that scheme is a modulo of the primary key, the application can quickly determine which partition to store a record in or read it from. Since there are multiple CPU cores to run the multiple processes (both the applications and the TFSs), and the four partitions are accessible in parallel (the four controllers permit this), the processing capacity can approach four times that of a single-core, single-disk, single-partition configuration.
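The modulo scheme described above can be sketched in a few lines of C. This is a minimal illustration, not part of the RaimaDB API: the function name partition_for_key and the constant PARTITION_COUNT are hypothetical, and the sketch assumes that primary keys are unsigned integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the example in the text uses four partitions. */
#define PARTITION_COUNT 4u

/*
 * Map a primary key to a partition index in the range
 * [0, partition_count). Assumes integer primary keys.
 */
static unsigned partition_for_key(uint64_t primary_key, unsigned partition_count)
{
    assert(partition_count > 0); /* a modulo by zero is undefined */
    return (unsigned)(primary_key % partition_count);
}
```

With four partitions, a record with primary key 1001 maps to partition 1, so the application would store and look up that record only in the second partition. Note that a plain modulo ties the key-to-partition mapping to the partition count: changing the number of partitions changes where existing keys map.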
The mechanism for querying a distributed database is simple for the programmer. When the database is opened, all partitions are referenced together, with OR symbols (“|”) between the individual partition names.
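As a rough sketch of how an application might assemble such a name, the following C helper joins partition names with “|”. The helper build_union_name and the partition names used with it are hypothetical and only illustrate the naming convention; the resulting string would then be passed to whatever database open call the application uses, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Join `count` partition names with '|' into `out`.
 * Returns 0 on success, or -1 if the arguments are invalid or the
 * buffer is too small (in which case `out` may hold a partial name).
 */
static int build_union_name(const char *const *names, size_t count,
                            char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    if (names == NULL || out == NULL || out_size == 0 || count == 0)
        return -1;

    out[0] = '\0';
    for (i = 0; i < count; ++i) {
        size_t len;
        size_t need;

        assert(names[i] != NULL);
        len = strlen(names[i]);
        need = len + (i > 0 ? 1 : 0);   /* name plus a separator, if any */

        if (used + need + 1 > out_size) /* +1 for the terminating NUL */
            return -1;
        if (i > 0)
            out[used++] = '|';
        memcpy(out + used, names[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

For the four-office example, calling the helper with the hypothetical names emp_seattle, emp_boston, emp_london and emp_mumbai produces the single string emp_seattle|emp_boston|emp_london|emp_mumbai.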