Indexing with RaimaDB
When to use which indexing method
Currently, RaimaDB supports 4 different types of indexes that the user can choose to use with their database. These types are: B-tree, R-Tree, Hash and AVL. Each has their own ideal use cases and they will be detailed below but the user is free to try out each type on their own database and find the ideal performance.
B-Tree
The default indexing method for RaimaDB is the b-tree. A b-tree is a self-balancing tree structure that keeps data ordered to allow for both searching and sequential access. Insertions, deletions, updates, and lookups can be done in a logarithmic scale. A b-tree is a good solution for on-disk databases because the width of the b-tree nodes can keep the depth of the node tree smaller than a self-balancing binary tree. This results in fewer disk access required for searching and updates to the tree.
The RaimaDB b-tree implementation is a non-clustered external tree sorted on the column values specified in the index definition. There is an entry in a b-tree node for each row in the table the index is derived from. Nodes are identified by their object id, referred to as a node id for node objects. This allows the implementation to be efficient for databases using either the on-disk or the in-memory engine. In both engines, references to a node id will be looked up via the id index for the drawer the b-tree is stored in. In the on-disk engine, the id-index will contain an offset and size of the node in the pack file. In the in-memory engine, the id index will contain a pointer to the node. The RaimaDB b-tree implementation uses a node size of 32-items; however, only the items that are in-use will be stored on the transaction file server. RaimaDB b-tree nodes are never updated; instead, a new node is created and the id index is updated to point to the location of the new node. The old node will be available for re-use in later transactions.
R-Tree (Spatial Indexing)
RaimaDB has added an index algorithm designed specifically for geospatial data called an R-Tree. It is a balanced search tree that organizes its data into pages and is designed to group nearby objects and then represent that in the next level of the tree. This is the ideal index type when the users needs quick retrieval of multi-dimensional data within a bounding box. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as “Find all museums within 2 km of my current location”, “retrieve all road segments within 2 km of my location” (to display them in a navigation system) or “find the nearest gas station” (although not taking roads into account)
Hash
The hash implementation in RaimaDB uses an extendible hashing algorithm. The extendible hashing algorithm does not require the developer to guess the cardinality of an index during design time. This is important for the in-memory storage engine as there is no need to reserve memory for a set of buckets that may never be used. Instead, as the number of buckets in the hash grow the directory size will increase. The current implementation only allows for unique keys and does order data based on the keyed columns (data is organized based on the hash of those columns). This means that a hash can be used for lookups, but cannot be used for sequential access or ranges.
AVL
RaimaDB also has added an index algorithm specifically for use by the in-memory storage engine called an AVL tree. An AVL is a self-balancing binary tree that in RaimaDB is implemented internally to a row instead of externally. There is no data duplication in the AVL as, unlike in a B-tree, external nodes containing copies of indexed columns are not maintained. An AVL is a binary tree, meaning the depth of the tree will be much larger than that of a B-tree. For this reason, the AVL index is better suited for the in-memory storage engine using the expanded row format than the on-disk-based engine using the packed row format. The packed row format does contain an implementation for the AVL index, but this is included primarily for persisting an in-memory image to disk and not intended for general use in disk-based tables.
An AVL can be used for any operations that a b-tree index would be used for. The AVL supports lookups, ranges, scanning, and both duplicate or unique constraints. The SQL optimizer will utilize an AVL in the same manner as a B-tree but will use a slightly different weight based on the implementation differences between a B-tree and an AVL.