2 Space-Saving Techniques For Size-Sensitive Databases

April 30

Raima logo on grey background

Here are two easy but non-obvious changes you can make to your DDL to decrease the size of your database files. Some interesting explanations of the internals of RDM show why this works, and allow you to make better DDL decisions in all of your database designs.

Consider the following SQL DDL:

create database EventTrace
create circular table TraceTab(
     deviceId smallint,
     lowPoint real,
     maxLevel double
) maxrows=8000

The on-disk (or in-memory) size for this row is 32 bytes. It looks like it should be 14 bytes, with 2 for the smallint, 4 for the real, and 8 for the double. Where do the extra 18 bytes come from? If your database is size-sensitive, it may be good to have a short lesson on the internals of this DBMS.

Every row/record in RDM has a header consisting of the record type (2 bytes) and its own database address (8 bytes). Following the header, the row data appears in an C-aligned layout. This means, on most compilers, that each element will be aligned so that it is on an N-byte boundary, where N is the size of the element. That is, an 8-byte double will begin on an 8-byte boundary, 4 on 4, and 2 on 2. The structure defined for the above DDL must be padded between the smallint and the real, and again between the real and the double, costing 2+4=6 padding bytes for each row.

First technique: arrange the column/field definitions by size, largest to smallest.

Our default “database address” (internal pointer to record slot) is 8 bytes (as mentioned above), allowing up to 281 trillion rows in one file. You can override the default and use a 4-byte database address, which allows up to 16.7 million records, which is usually more than enough. To do this, say “dba4” following the database name. Now the header is 6 bytes rather than 10, and you have saved another 4 bytes per row. The header is important in a DBMS, because it can be used for consistency/corruption checking.

Second technique: Use “dba4” when you know that you will not have more than 16.7 million rows in any one table.

FYI, each row has one additional byte used to store NULL flags, one bit for each column. If you have more than 8 columns, a second byte will be used, etc.

Now the DDL looks like this:

create database EventTrace dba4
create circular table TraceTab(
     maxLevel double,
     lowPoint real,
     deviceId smallint
) maxrows=8000


So the total length of our redefined, but equivalent row is:

2              record type
4              database address
8              first column (maxLevel)
4              second column (lowPoint)
2              third column (deviceId)
1              null-column bits

21           total bytes/row

One more FYI – when rows are placed into fixed-length page slots, they always go into even addresses. This reduces the number of memcpy operations needed by the runtime library substantially. So this row’s size is 22 bytes, and 46 of them will fit onto one 1024-byte page.

All of the above applies to a non-SQL database definition, with one exception – there is no NULL flags byte. The syntax of the DDL is different, of course, but means the same thing.

Get notified about new RDM updates

Be the first to know about new Raima Database Manager updates when they go live, use cases, industry trends and more.