RDM SQL utilizes a cost-based method of optimizing a query in which data usage information is analyzed by the query optimizer in order to determine the fastest way to execute a query. In RDM 11, the information available to SQL was limited to the number of rows in the referenced tables plus any additional information provided by the user through the optional distinct values and range column declaration clauses used in the create table statement.
For basic queries, this does not cause any problem because those execution option choices are simple. But for more complex queries that involve retrieving data from multiple tables that satisfy a where clause comprised of multiple conditional expressions, those limitations could result in a noticeable performance penalty.
A significant improvement has been made in RDM 12.0 to SQL’s ability to make good execution plan choices that will produce queries that execute faster. A new statement, update statistics (stats), can now be executed on a populated database that will collect data distribution statistics for each column of each table in the database. This data will then be used by the SQL optimizer to better predict the access costs associated with each possible method available to access the rows of each table referenced in the query.
The amount of data to be sampled and the extent of the distribution data to be stored is controlled by the user through optional clauses in the update stats statement. The larger the database, the longer it may take update stats to collect the distribution data. This is mitigated as the user can control the amount of data read by specifying a percentage of the database to be sampled. Moreover, the data collection is multi-threaded with the number of threads used specified by an update stats clause or, by default, set to the number of core CPUs on the computer.
Also provided is the ability to specify a condition which will cause the system to automatically run update stats based on how much the database has changed since the last run. So, you may only have to manually run update stats once and the system will automatically control additional runs based on the change parameters you specified on that first run.
Stored procedures can contain one or more select statements that were compiled and optimized using the statistics that were available at the time the create procedure was executed. Stored procedures allows the use of pre-compiled SQL statements so that the compilation cost does not have to be incurred at application execution time. RDM 12 SQL has added the auto compile clause to the create procedure statement that will cause SQL to automatically recompile the stored procedure in the event that an update stats has been run since the last time the procedure was executed. Of course, this will add the re-compilation cost to the execution time for that particular execution but in many cases, all subsequent executions will be faster because of the changes in the data distribution statistics and the positive effect that has on the optimizer’s execution plan choice.
The collected statistics are stored in a file named dbname.sts in the same directory on the TFS as the database files and the catalog file (dbname.cat).
Four new system catalog files are provided that can be queried in order to view the collected statistics. For example, table sys$dbstats contains one row for each open database on which an update stats has been executed. Information in the table includes a stats file version number, the date when update stats was last run, the execution time for the update stats, the total number of rows stored in the database, along with the values of each of the update stats control parameters. Other tables include sys$tabstats, sys$colstats, and sys$colhisto. This last table contains the collected distribution stats for each column in the database. Use of this information along with an understanding of how the optimizer works as described in “How Queries are Processed by RDM SQL” can help you analyze particular SQL queries.
With the availability of data distribution statistics in RDM 12 SQL, the full benefits of sophisticated cost-based query optimization can be realized in RDM 12 SQL applications.