High Throughput via Parallelism

Scaling up the performance of a system typically involves adding computer hardware. The goal, especially with a shared resource like a database, is to add pieces (both hardware and software) that can run in parallel. If a system is divided into pieces that block or interfere with each other, nothing is gained. Parallelism is the key: if parallel units do not impede one another, total throughput can increase.
 

Here is a proposed architecture for a hardware and software configuration to allow maximum parallelism when using the RDM database engine.

[Figure: High Throughput]

 

Hardware

The recommended architecture has a separate disk controller for every disk drive. Why? Because even with multiple CPU cores executing multiple independent processes on different disk files, a single disk controller ends up serializing disk access, creating a bottleneck. The computer is therefore a multi-core machine with 2 cores for every disk controller/drive pair. This example uses 8 cores with 4 controllers/drives.
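The bottleneck argument can be made concrete with a deliberately simple model. The sketch below assumes that total throughput is the smaller of what the processes can issue and what the controllers can service. The function name and all of the numbers are hypothetical and are not RDM measurements.

```python
def aggregate_throughput(workers, per_worker_ops, controllers, per_controller_ops):
    """Estimate total ops/sec when worker processes share disk controllers.

    Workers issue I/O in parallel, but each controller services requests
    one at a time, so the controllers cap the total.
    """
    demand = workers * per_worker_ops            # what the CPU side could issue
    capacity = controllers * per_controller_ops  # what the disk side can absorb
    return min(demand, capacity)


# Hypothetical numbers: each process could drive 1,000 ops/s and
# each controller can service 1,000 ops/s.
shared = aggregate_throughput(workers=4, per_worker_ops=1000,
                              controllers=1, per_controller_ops=1000)
dedicated = aggregate_throughput(workers=4, per_worker_ops=1000,
                                 controllers=4, per_controller_ops=1000)
print(shared, dedicated)  # 1000 4000
```

Under these assumptions, four processes behind one controller deliver no more than a single process would, while one controller per drive lets throughput grow with the number of drives.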

Software

Given this hardware configuration, the software configuration must also be designed for parallel operation. A necessary ingredient for parallel software operation is a database partitioned so that each partition can be updated independently of the others.
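As a rough illustration of independent partitions, the sketch below gives each partition its own lock, so a writer working on one partition never waits on a writer working on another. The dictionaries are stand-ins for databases and none of the names come from the RDM API. CPython threads do not execute bytecode in parallel, so this shows the structure rather than a speedup.

```python
import threading

NUM_PARTITIONS = 4  # matches the 4 controllers/drives in the example

# Each partition is a plain dict standing in for one database,
# guarded by its own lock so writers never contend across partitions.
partitions = [{} for _ in range(NUM_PARTITIONS)]
locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]


def writer(partition_id, count):
    """Insert `count` records into a single partition only."""
    for i in range(count):
        with locks[partition_id]:
            partitions[partition_id][(partition_id, i)] = f"record-{i}"


threads = [threading.Thread(target=writer, args=(p, 1000))
           for p in range(NUM_PARTITIONS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

sizes = [len(p) for p in partitions]
print(sizes)  # [1000, 1000, 1000, 1000]
```

With a single lock shared by all four partitions, the writers would take turns, which is the software equivalent of the shared disk controller.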

 

The applications in the figure above will open 4 different databases within 4 different “task” structures and then decide, based on a primary key, which database a record belongs in. Each application will either find the record there or create it there. Reading is different. Within one “task” structure, all 4 databases should be opened in one call using the database union feature, and reading should be done without locks by using MVCC (Multi-Version Concurrency Control) read-only transactions.
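The routing rule described above might look like the following sketch. It assumes a stable hash of the primary key selects the database, which is only one possible rule. The helper names (`partition_for`, `find_or_create`, `union_read`) are hypothetical and the dictionaries stand in for real databases; this is not the RDM API. The union read here simply walks all four stores as one logical view and does not model MVCC snapshots.

```python
import zlib

NUM_DATABASES = 4

# Hypothetical stand-ins: one dict per database.
databases = [{} for _ in range(NUM_DATABASES)]


def partition_for(primary_key: str) -> int:
    """Map a primary key to a database index with a stable hash."""
    return zlib.crc32(primary_key.encode("utf-8")) % NUM_DATABASES


def find_or_create(primary_key: str, default: dict) -> dict:
    """Return the record for the key, creating it in its home database if absent."""
    db = databases[partition_for(primary_key)]
    return db.setdefault(primary_key, dict(default))


def union_read():
    """Yield (key, record) pairs from every database as one logical view."""
    for db in databases:
        yield from db.items()


# Writes: each key is routed to exactly one database.
for key in ["alice", "bob", "carol", "dave", "alice"]:
    find_or_create(key, {"visits": 0})["visits"] += 1

# Reads: one pass over the union of all four databases.
all_records = dict(union_read())
print(len(all_records), all_records["alice"]["visits"])  # 4 2
```

Because the same key always maps to the same database, two writers conflict only when they touch the same partition, while a reader sees all four partitions as a single set of records.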

 

Note also that CPU cores are depicted as though they are assigned to application processes, but in reality they normally operate as SMP (symmetric multiprocessing), so the operating system schedules them to execute whichever processes are ready to run. In this case, that is potentially all 4 TFS (Transactional File Server) processes and up to 4 application processes.

 

Using an architecture similar to this, RDM developers can design applications to take advantage of modern hardware and operating systems. By providing features that support parallelism within the data engine, RDM allows developers to create applications that scale with the hardware provided.
