How we test Raima Database Manager
Before the release of RDM, many tests and methods were developed to ensure a robust and usable product. The reason is that the stability and robustness requirements for a database are much more stringent than those for many other types of applications.
For instance, durability is of extreme importance. A user never wants to lose their data just because their application crashed due to a software bug or a hardware failure. Raima does everything in its power to protect the data.
This page gives a glimpse of some of the procedures and methods Raima follows to create a robust and well-tested product.
Database Invariant Testing
The C/C++ QA Framework, developed at Raima, has built-in support for invariant testing. An invariant test is a special type of test in which an agreed-upon condition must hold over “some time”. This condition is called an invariant. What “some time” means is explained further on. For a database, the invariant is typically about the data stored in the database. The invariant can be quite relaxed or very strict. An example of a relaxed invariant is “the database exists”. An example of a stricter invariant is “computing some kind of hash over all the data in the database yields a given value”.
The QA Framework has special support for creating, running, and destroying a database invariant. A create case for a database invariant creates the database with a given schema and typically inserts some data to establish the invariant. This special case is always run before the normal run case or cases. The normal run cases are written to “maintain” the invariant. What we mean by “maintain” is specified later. After the normal cases have been run, the destroy case is run and the invariant is no longer maintained. The time from when the create case returns until the destroy case is called is the “some time” during which the invariant is maintained.
A default run of a QA Framework invariant test will do the destroy case, then the create case, followed by the normal run cases, and last the destroy case.
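The default sequence can be sketched in C as follows. This is only an illustration and not the QA Framework's API: the in-memory array standing in for a database, the sum-is-zero invariant, and the names `create_case`, `run_case`, `verify_invariant`, `destroy_case`, and `default_run` are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a database: a fixed array of balances. */
#define INV_ROWS 8

typedef struct {
    bool exists;            /* false until the create case has run */
    long rows[INV_ROWS];
} inv_db;

/* The invariant: the database exists and all rows sum to zero. */
static bool verify_invariant(const inv_db *db)
{
    long sum = 0;
    if (!db->exists)
        return false;
    for (size_t i = 0; i < INV_ROWS; i++)
        sum += db->rows[i];
    return sum == 0;
}

/* Destroy case: safe to call even if nothing exists yet. */
static void destroy_case(inv_db *db)
{
    memset(db, 0, sizeof *db);
}

/* Create case: establish the invariant. */
static void create_case(inv_db *db)
{
    memset(db->rows, 0, sizeof db->rows);
    db->exists = true;
}

/* Normal run case: move an amount between two rows, preserving the sum. */
static void run_case(inv_db *db, size_t from, size_t to, long amount)
{
    db->rows[from % INV_ROWS] -= amount;
    db->rows[to % INV_ROWS] += amount;
}

/* Default run: destroy, create, normal cases, destroy.
 * Returns true if the invariant held after every normal case. */
static bool default_run(inv_db *db, int iterations)
{
    bool ok;
    destroy_case(db);
    create_case(db);
    ok = verify_invariant(db);
    for (int i = 0; i < iterations && ok; i++) {
        run_case(db, (size_t)i, (size_t)i * 3 + 1, (long)i + 1);
        ok = verify_invariant(db);
    }
    destroy_case(db);
    return ok;
}
```

Skipping the final `destroy_case` call in such a driver corresponds to leaving the invariant intact for a later run that executes only the normal cases.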
However, the QA Framework can be instructed to run the normal run cases from multiple threads. The QA Framework will instantiate the threads and run the normal cases in parallel. A correctly written invariant test and a robust product should be able to handle this.
Another scenario is to run the test once and instruct the QA Framework not to run the destroy case at the end. This leaves the database invariant intact. We can then run the test in multiple processes where only the normal run cases are run. A correctly written invariant test and a robust product should also be able to handle this.
Another scenario is to run the test once, instruct the QA Framework not to run the destroy case at the end, copy the database image to another architecture or operating system, and then run the normal cases again. A correctly written invariant test and a robust, portable product should be able to handle this situation as well.
Image compatibility between versions of RDM
Another scenario is to run the test once, instruct the QA Framework not to run the destroy case at the end, and save the database image for later releases. Before each of those later releases we restore the database images and run the tests again with only the normal cases. A correctly written invariant test and a robust, well-maintained product should be able to handle this situation.
A database invariant test can be used to test that database recovery is working correctly.
It is impractical to power-cycle a machine (even though we have done that in the past). Instead we simulate a system crash by hooking in between RDM and the operating system. On Linux this can be done by preloading. We can thus simulate a system crash with lost writes. Rerunning a database invariant test with only the normal test cases should then succeed for a correctly written test and a durable product.
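What "a crash with lost writes" means can be illustrated with a small in-memory model. This sketch does not preload anything or intercept real system calls; the `sim_file` type and the `sim_*` functions are hypothetical and only model the difference between data that has been written and data that has been synced.

```c
#include <stddef.h>
#include <string.h>

#define SIM_SIZE 64

/* Hypothetical model of one file: `cache` holds what the application has
 * written, `disk` holds what has actually reached stable storage. */
typedef struct {
    unsigned char cache[SIM_SIZE];
    unsigned char disk[SIM_SIZE];
} sim_file;

static void sim_init(sim_file *f)
{
    memset(f, 0, sizeof *f);
}

/* A write only reaches the cache. Returns 0, or -1 if out of range. */
static int sim_write(sim_file *f, size_t off, const void *buf, size_t len)
{
    if (off > SIM_SIZE || len > SIM_SIZE - off)
        return -1;
    memcpy(f->cache + off, buf, len);
    return 0;
}

/* A sync makes everything written so far durable. */
static void sim_sync(sim_file *f)
{
    memcpy(f->disk, f->cache, SIM_SIZE);
}

/* A crash loses every write made since the last sync. */
static void sim_crash(sim_file *f)
{
    memcpy(f->cache, f->disk, SIM_SIZE);
}
```

In this model all unsynced writes are lost together. A stricter simulation could let an arbitrary subset of the unsynced writes survive, which is what recovery code has to tolerate on real hardware.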
Corrupt image testing
A database invariant test can also be used to test that the test and RDM are robust against database image corruption.
Running a database invariant test while doing random writes to one or more of the database files may result in a failure. Such a failure is OK. The requirement in this case is that neither RDM nor the test crashes, given a correctly written test and a robust product.
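The requirement "fail, but do not crash" can be sketched with a toy record format. Everything here is hypothetical and unrelated to RDM's actual file format: a record is one length byte, a payload, and one checksum byte, and `corrupt_image` plays the role of the random writes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record: 1 length byte, payload, 1 checksum byte. */
#define REC_MAX_PAYLOAD 32

static uint8_t rec_checksum(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum * 31u + p[i]);
    return sum;
}

/* Encode payload into buf; returns bytes used, or 0 if it does not fit. */
static size_t rec_encode(uint8_t *buf, size_t cap,
                         const uint8_t *payload, size_t n)
{
    if (n > REC_MAX_PAYLOAD || cap < n + 2)
        return 0;
    buf[0] = (uint8_t)n;
    memcpy(buf + 1, payload, n);
    buf[n + 1] = rec_checksum(payload, n);
    return n + 2;
}

/* Decode a record. Every value read from the image is validated before it
 * is used. Returns the payload length, or -1 on detected corruption. */
static int rec_decode(const uint8_t *buf, size_t size, uint8_t *out)
{
    size_t n;
    if (size < 2)
        return -1;
    n = buf[0];
    if (n > REC_MAX_PAYLOAD || n + 2 > size)
        return -1;
    if (rec_checksum(buf + 1, n) != buf[n + 1])
        return -1;
    memcpy(out, buf + 1, n);
    return (int)n;
}

/* Overwrite pseudo-random bytes in the image. A fixed seed keeps a failing
 * run reproducible. `size` must be nonzero. */
static void corrupt_image(uint8_t *buf, size_t size, uint32_t seed, int writes)
{
    for (int i = 0; i < writes; i++) {
        seed = seed * 1664525u + 1013904223u;   /* linear congruential step */
        buf[(seed >> 16) % size] ^= (uint8_t)((seed >> 8) | 1u);
    }
}
```

A test built on this would corrupt a copy of a valid image many times with different seeds and accept either outcome of `rec_decode`, as long as the process never reads out of bounds or crashes.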
Remote TFS vs local TFS
A test using the C/C++ QA Framework can connect to the Transactional File Server (TFS) in many ways. As an example, the TFS may be embedded in one running instance of a database invariant test while another instance of the same test connects remotely to the first instance. The second instance may run locally on the same machine as the first instance or on a second machine. The second machine may have the same architecture and operating system or a different one. There are many combinations here.
We have some tests that are not strictly database invariant tests. One such case is tests whose normal case or cases can be run only once. If we run the normal cases again, the QA Framework has to be instructed to only do a “verify”. A “verify” does not change the database in any way; it just verifies the database invariant. If the QA Framework is instructed to run the normal cases of a verify test in parallel from multiple threads, that instruction is ignored. A verify is always run from one thread only.
The last type of database invariant test is the sequence test. Sequence tests are just like database invariant tests except that the normal cases cannot be run in parallel. The database invariant is only valid after a successful run of all the normal cases. There is no promise that the database invariant will be valid during a run of the normal cases.
We have a number of database invariant tests that we run as outlined above. One of our main database invariant tests does a lot of random operations. Every time it commits, a certain sum over the data is kept at 0. This test combines different types of inserts, updates, and deletes with verification steps. It uses nested start-update and start-read where it randomly rolls back to a previous state. For verification it uses both start-read and start-snapshot. This test is not written for throughput but rather to stress our transaction engine and make sure that we are fully ACID.
High volume data testing
Another database invariant test is written for high throughput. It does some inserts, verifies that the data is there, and then deletes the data it inserted. However, the data is inserted based on pseudo-random numbers, and the seed used for these inserts is also inserted. The invariant is that data corresponding to these seeds should exist. The main purpose of this test is simply to create a lot of data and stress our engine with new database files being created. Running this test in similar ways as our other tests ensures that rolling over to a new database file works even in cases where we have to do recovery.
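The idea of storing the seed next to the data it generated can be sketched as follows. The table layout, the generator, and all names (`seed_db`, `insert_batch`, `verify_batches`, `delete_batch`) are hypothetical and chosen only to show how the invariant can be checked by regenerating the data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_ROWS 16
#define MAX_BATCHES 4

/* Hypothetical table: each batch stores its seed next to the rows. */
typedef struct {
    bool used;
    uint32_t seed;
    uint32_t rows[BATCH_ROWS];
} batch;

typedef struct {
    batch batches[MAX_BATCHES];
} seed_db;

/* Small deterministic generator (linear congruential). */
static uint32_t next_random(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Insert a batch generated from seed. Returns the slot, or -1 if full. */
static int insert_batch(seed_db *db, uint32_t seed)
{
    for (int i = 0; i < MAX_BATCHES; i++) {
        batch *b = &db->batches[i];
        if (!b->used) {
            uint32_t state = seed;
            b->used = true;
            b->seed = seed;
            for (size_t r = 0; r < BATCH_ROWS; r++)
                b->rows[r] = next_random(&state);
            return i;
        }
    }
    return -1;
}

/* Invariant check: every stored seed must regenerate exactly its rows. */
static bool verify_batches(const seed_db *db)
{
    for (int i = 0; i < MAX_BATCHES; i++) {
        const batch *b = &db->batches[i];
        uint32_t state = b->seed;
        if (!b->used)
            continue;
        for (size_t r = 0; r < BATCH_ROWS; r++)
            if (b->rows[r] != next_random(&state))
                return false;
    }
    return true;
}

/* Delete a batch together with its seed. */
static void delete_batch(seed_db *db, int slot)
{
    db->batches[slot].used = false;
}
```

Because the expected rows are derived from the stored seed, the verification needs no separate copy of the data, which keeps the check cheap even for large volumes.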
Database invariant tests, including verify tests and sequence tests, can be used to test one-way replication. This will soon be added to the matrix of tests we run before a release.
Database invariant tests are invaluable for testing at Raima. Some of the complexity is handled by the QA Framework, and a database invariant.
This suite of tests has been developed specifically for our lock manager. Some tests were designed specifically to cover the general case and some special cases. Others have been put in place as a result of bugs encountered in our SQL engine, which might or might not have been the fault of the core engine.
With our nested locking model, we need to make sure that any updates are ACID. With respect to locking this has certain implications. For instance, when an update is combined with a read, we have to make sure that the read lock is held until the update is committed, even though the application has freed the lock. A test cannot observe this through the standard API without using a second database connection. We have single-threaded tests that verify this, which makes it easier to reproduce and find bugs.
We also do lock fuzz testing, where randomized calls to the lock manager are made through the public API. This generates an enormous number of combinations that would be very hard to cover otherwise. The algorithm used in the test to verify that the locking is done correctly has been developed independently of the lock manager code in the product. This test can also write out test code for the exact sequence of API calls used. Bugs found with this test have been added to our suite of regression tests, both for easier reproducibility and to protect the code against future changes.
We use Valgrind with its default tool, memcheck. This tool has proven valuable in finding many types of bugs. A complete run of our default set of tests for C and C++ under memcheck takes several days. This has helped ensure the quality of both our product and the tests themselves.
Efence does a small subset of what memcheck can do but is much faster. It has been somewhat useful for testing RDM for buffer overflows in longer-running tests.
When we find a bug that can be reproduced with a test, the test is added as a regression test to one of our test suites. The vast majority of bugs can easily be reproduced. One class of bugs that is hard to reproduce is bugs where multiple threads are involved. However, we do have many tests that by design can be run in parallel with multiple instances of themselves or in parallel with other tests within a suite.
We compile our source code using gcc with options to produce code coverage. Then we use gcov, lcov, and genhtml to produce something that can be visualized.
We use Valgrind with the tool helgrind to test RDM for thread safety. Libraries that by design are not reentrant use mutexes to protect shared data structures. Helgrind can find places where data structures are not properly protected. However, this requires tests that use multiple threads against some shared data structures.
Our database invariant tests discussed earlier are good candidates for this type of testing, as the QA Framework can run several instances of a test in parallel. Other types of tests can also be run in parallel. However, those tests use separate databases and therefore test the thread safety of data structures shared between databases.
Writing programs for parallelism is hard. Using mutexes affects performance, so you do not want to use a mutex unless it is needed for correctness. In some places we have therefore used algorithms that do not require mutex protection for certain shared data structures. These algorithms have been carefully designed to ensure correctness, and we have used special decoration in the source code to make Helgrind suppress warnings. Such decoration is also useful as documentation and when debugging the code.
Most of our libraries are designed to be reentrant. Other than where it uses functionality in other libraries, everything contained within such a library is completely reentrant. That means two callers can use functionality in the library without any risk of race conditions or leakage of information from one caller to the other, as long as program control does not go into another library that is not reentrant, the callers use separate handles, and there are no buffer overflows. This type of design makes it easier to reason about correctness.
With careful design this can easily be enforced by analyzing the compiled libraries for compliance. We have automated tests that analyze all our libraries, even those that are not reentrant. These tests trigger on any newly introduced non-compliance. In our documentation we have included information about compliance for each library.
We use asserts throughout our source code. An assert is a statement at a certain point in the code that should always be true. Asserts have to be used with care. For instance, an assert also has to hold when an error condition occurs.
Assertions are not designed for handling bit errors in the CPU cache or the RAM. However, for file access one has to assume there can be bit errors. In fact, any type of disk corruption can happen in the general case, and the engine has to be robust enough to handle it. Any decision based on content read from disk either has to be validated, or the engine has to be robust enough not to crash or run into an infinite loop. If the error happened in user data, the database engine may return incomplete or wrong data. If, on the other hand, the error happened in metadata, the engine may discover it. In that case an error is reported back to the user.
However, conceptually it is useful to assume that there cannot be disk corruption, but any assert that is based on such an assumption has to be treated specially. In production an error has to be reported back to the user. During testing we may instead want to assert, depending on the type of testing we do. We use a separate type of assert for this that can be defined to behave one way or the other. This allows us to run tests where corruption is simulated as well as tests where corruption is assumed not to happen.
A slightly different case is handling of a database engine crash. This is a scenario that is much more likely to happen than a general disk corruption. Yet another type of assert is used for that case.
These specialized asserts allow us to run tests with different assumptions. This, in combination with simulating different types of failures, has proven helpful for finding and fixing bugs.
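An assert that can be switched between the two behaviors might look like the following sketch. The macro name `CORRUPTION_CHECK`, the build flag `ASSUME_NO_CORRUPTION`, the error codes, and the toy page layout are all assumptions for illustration and are not taken from the RDM source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes for the sketch. */
enum { PAGE_OK = 0, PAGE_CORRUPT = -1 };

/* Hypothetical macro. When ASSUME_NO_CORRUPTION is defined (a test build
 * that assumes a clean disk) a failed check aborts like a normal assert.
 * Otherwise (production, or tests that simulate corruption) the enclosing
 * function returns an error code to its caller. */
#ifdef ASSUME_NO_CORRUPTION
#define CORRUPTION_CHECK(cond, err) assert(cond)
#else
#define CORRUPTION_CHECK(cond, err) \
    do { if (!(cond)) return (err); } while (0)
#endif

#define PG_BYTES 16

/* Return the value that the on-disk root index (byte 0) points at. */
static int page_root_value(const uint8_t *page, uint8_t *value)
{
    /* A programming error by the caller: always a plain assert. */
    assert(page != NULL && value != NULL);

    /* Content read from disk may be corrupt: use the special check
     * before the index is used to address memory. */
    CORRUPTION_CHECK(page[0] >= 1 && page[0] < PG_BYTES, PAGE_CORRUPT);

    *value = page[page[0]];
    return PAGE_OK;
}
```

Compiling the same source with and without `-DASSUME_NO_CORRUPTION` gives the two test configurations: one where a corrupt page stops the test at the exact point of detection, and one where the error path back to the caller is exercised.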
One important part of software design is to make sure the code is portable. We therefore avoid certain C features that are not portable. We also make sure the file formats we use are portable. This is ensured by running our tests on a wide variety of platforms, both actual hardware platforms and simulated platforms. To mention a few differences, we use platforms with different byte orders, platforms with different alignment requirements, and platforms where char defaults to unsigned as well as platforms where it defaults to signed.
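One common way to keep a file format independent of byte order, alignment, and the signedness of char is to read and write integers one byte at a time. The sketch below shows that technique. The choice of little-endian order and the function names are assumptions for the example and say nothing about the format RDM actually uses.

```c
#include <stdint.h>

/* Store a 32-bit value in little-endian order regardless of the host's
 * byte order. Only byte-wise access is used, so dst may be unaligned, and
 * unsigned char avoids any dependence on the signedness of plain char. */
static void put_u32_le(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xFFu);
    dst[1] = (unsigned char)((v >> 8) & 0xFFu);
    dst[2] = (unsigned char)((v >> 16) & 0xFFu);
    dst[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Load a 32-bit little-endian value from a possibly unaligned address. */
static uint32_t get_u32_le(const unsigned char *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

An image written this way on one platform decodes to the same values on any other, which is the property the cross-platform invariant runs are meant to confirm.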
We use PC-lint and FlexeLint for static analysis. This has proved valuable for finding some types of bugs. However, we have to be careful: certain types of source code rewrites can easily hide or introduce bugs. We therefore use these tools with certain warnings globally ignored, and for other warnings we decorate the source code to suppress them at specific places. See also the section on Memcheck testing.
Part of the QA process is also to ensure that our interfaces are sane. This includes making sure the interface and the implementation are clearly separated. It also covers naming conventions and the order of arguments, and making sure that the interface is complete, that special cases are not harder to handle than simple cases, and that it is in general easy to use.
The APIs we have designed for RDM have gone through a rigorous process to make sure we meet industry standards. The public header files also include Doxygen documentation. We find that it is easier to keep documentation up to date if it lives together with the code.
One of the minor tests we do is to compile small test programs that each include only one RDM public header file. The compile is required to succeed without any errors or warnings. This is repeated with every combination of two header files, and it is done for both C and C++.
Usually, it is a requirement that a system should not deadlock. However, RDM has been designed with an API where the user can explicitly request locks, and with such an API it is possible to write an application that is guaranteed to deadlock. We therefore have tests that intentionally deadlock when run. These tests are of course not run by default. They have to be run in a special environment where we can observe that they do indeed deadlock; a test fails if it does not.
Performance is an important aspect of any computer software, but especially of a database. There are three main figures we measure: CPU load, disk I/O, and memory usage.
CPU Load testing
For CPU load we have a number of performance tests. The C/C++ QA Framework has support for instantiating stopwatches that can conveniently be used to measure specific API calls. Using these in existing tests, we get numbers that show how we are doing compared to previous versions of our product.
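A stopwatch of this kind can be sketched with the standard C `clock()` function, which reports processor time used by the program. The `stopwatch` type and its functions are hypothetical and are not the QA Framework's interface.

```c
#include <time.h>

/* Hypothetical stopwatch that accumulates CPU time over many calls. */
typedef struct {
    clock_t started;
    clock_t total;
    unsigned long laps;
    int running;
} stopwatch;

static void stopwatch_reset(stopwatch *sw)
{
    sw->started = 0;
    sw->total = 0;
    sw->laps = 0;
    sw->running = 0;
}

static void stopwatch_start(stopwatch *sw)
{
    sw->started = clock();
    sw->running = 1;
}

/* Stop and add the elapsed CPU time; ignored if not running. */
static void stopwatch_stop(stopwatch *sw)
{
    if (!sw->running)
        return;
    sw->total += clock() - sw->started;
    sw->laps++;
    sw->running = 0;
}

/* Average CPU seconds per measured call, or 0.0 if nothing was measured. */
static double stopwatch_average(const stopwatch *sw)
{
    if (sw->laps == 0)
        return 0.0;
    return (double)sw->total / CLOCKS_PER_SEC / (double)sw->laps;
}
```

Wrapping a single API call between `stopwatch_start` and `stopwatch_stop` inside an existing test loop gives a per-call average that can be compared from one product version to the next.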
We also have performance tests that do not use our QA Framework. They have been developed with specific customer cases in mind, and some of them compare RDM to other databases.
We are working on improvements in this area.
Disk I/O testing
With RDM 14.1 we drastically changed how data is written to disk. When run in a mode where disk writes are minimized, RDM can do inserts, updates, and deletes with minimal disk I/O. The trade-offs are increased time to open the database and increased time for crash recovery.
Testing the disk I/O is done using some of our general tests by preloading some code that will intercept certain operating system calls and collect some statistics. This is similar to the crash testing described earlier.
One advantage with this approach is that the library we use can be used for any application without having to recompile the code. There are tools on Linux that have similar approaches but we have found that this approach is often better. It allows us to see certain write patterns we are looking for and we get a better understanding of how different use cases affect disk I/O. For instance, it is important that files are synced when they are supposed to be synced and that we do not have unnecessary syncs or writes.
Another aspect of disk I/O is file system cache performance. If at all possible, data that is likely to be accessed again should be clustered together, and data that is not likely to be accessed again should likewise be kept together. That way the file system cache is less likely to hold content that will not be needed. For files we know the engine will not need except after certain catastrophic failures, we use posix_fadvise to advise the kernel to drop the pages or page them out from the file system cache. That this is done correctly can also be verified by intercepting this operating system call.
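A minimal sketch of such a call on Linux is shown below. The helper name `drop_from_cache` is hypothetical, and the use of `POSIX_FADV_DONTNEED` over the whole file is an assumption about which advice fits the description; the text does not say which advice value RDM passes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise and fsync */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Flush a file we do not expect to read again, then advise the kernel that
 * its cached pages are no longer needed. Returns 0 on success, otherwise
 * an error number. */
static int drop_from_cache(int fd)
{
    /* Dirty pages cannot be dropped, so write them out first. */
    if (fsync(fd) != 0)
        return errno;

    /* Offset 0 with length 0 covers the whole file. posix_fadvise returns
     * the error number directly instead of setting errno. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

The advice is only a hint, so a successful return does not prove that the pages were evicted. Intercepting the call, as described above, verifies that the hint was issued for the right file at the right time.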
Memory usage testing
We have tests that use the public API in certain ways that should not affect the memory usage. These tests fail if the memory usage is found to increase. Any test using the C/C++ QA Framework reports the memory usage in the product just before termination.
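The shape of such a test can be sketched with a counting allocator. The wrappers `tracked_alloc` and `tracked_free` and the stand-in operation `open_and_close_cursor` are hypothetical; the sketch is single-threaded and only shows the before-and-after comparison.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracking allocator: counts live allocations so a test can
 * compare usage before and after a sequence of API calls. */
static size_t live_allocations;

static void *tracked_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_allocations++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        live_allocations--;
        free(p);
    }
}

/* Stand-in for an API call that should be memory neutral: it allocates a
 * scratch buffer and releases it before returning. Returns 0 on success. */
static int open_and_close_cursor(size_t rows)
{
    int *scratch = tracked_alloc(rows * sizeof *scratch);
    if (scratch == NULL)
        return -1;
    for (size_t i = 0; i < rows; i++)
        scratch[i] = (int)i;
    tracked_free(scratch);
    return 0;
}

/* Run the call repeatedly; returns 1 if live allocations did not change
 * and 0 if a call failed or memory usage grew. */
static int memory_is_stable(int iterations)
{
    size_t before = live_allocations;
    for (int i = 0; i < iterations; i++)
        if (open_and_close_cursor(64) != 0)
            return 0;
    return live_allocations == before;
}
```

Repeating the operation many times matters: a leak of a few bytes per call is easy to miss after one call but shows up clearly as a count that keeps growing.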
There are also tools and other mechanisms built into RDM that we use to monitor memory usage. The mechanisms built into RDM can give a better view of how each subsystem behaves. Valgrind has a tool called Cachegrind that can analyze the cache performance and branch prediction of your code.
We have four different QA Frameworks developed in-house. The one written for C/C++ is the most comprehensive. The others are written in Perl, Bash, and Java. They all share some similarities, which makes it easy for our team to switch between them.
Perl QA Framework
This is a later addition to our suite of QA Frameworks. Its main purpose is to test the RDM command line tools. It has support for feeding input into the tools and can compare the output with expected results or grep the output for certain patterns. It runs on both Unix and Windows. Many tests that were previously using the Bash QA Framework have been rewritten to use the Perl QA Framework. That way we can run those tests on both Unix and Windows. A lot of SQL-PL is tested using this framework.
C/C++ QA Framework
Most of our C/C++ tests use this framework. It has the most comprehensive set of features. For some details, see the section above about invariant testing.
Our build system is capable of generating make files and project files for various target platforms. It also generates shell scripts and batch files that can be used to run all or a subset of our tests in different configurations. This is used heavily in our continuous integration. It also makes it easy for developers to run the same tests manually.