Storage Data Map

This chapter describes the Storage Data Map, a key feature of ZetaData.

Overview

Storage Data Map makes more efficient use of disk bandwidth by reducing the number of disk reads performed.

ZetaData automatically calculates statistics (for example, the maximum and minimum values of each column) for each disk area and keeps them in memory. This avoids reading disk areas that contain no data satisfying the filter conditions. Because less data is read from disk, disk bandwidth is used more efficiently and the time spent on disk I/O is greatly reduced.

Operation

When data is queried through a Table Full Scan, physically contiguous disk areas are read. During the scan, the storage server calculates each column's maximum and minimum values for each disk area and builds a storage data map from them. When only specific disk blocks are read, as in table queries that use an index, the contiguous disk areas are not read in full, so no storage data map is created. Therefore, to benefit from this feature, it is important, as with Function Offloading, to make queries run as a Table Full Scan, either by not creating an index or by using query hints.

A storage data map is created only for columns that appear in the WHERE clause of the query and that are referenced directly, without first being processed by a function or expression.

Applicable column types include NUMBER, CHAR, and VARCHAR, as well as most basic types that can be used in a WHERE clause without conversion. However, a LOB column is stored in a separate LOB segment, so this feature cannot be used when a LOB column is included.

Because Function Offloading does not operate on LONG, RAW, and XML types, Storage Data Map is not available for these types either.

A column in the WHERE clause can be used with magnitude comparisons (greater than, less than) as well as with equality, inequality, IS NULL, and IS NOT NULL conditions. If a large number of columns is involved, storage data maps are created for up to eight columns, and columns queried after that are ignored.
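The eligibility rules above can be sketched as a small filter. This is only an illustration: the function name, the predicate tuple layout, and the exact operator set (including `<=` and `>=`) are assumptions made for the example, not ZetaData internals. Only the eight-column limit and the "referenced directly" rule come from the text.

```python
MAX_MAPPED_COLUMNS = 8  # limit stated in this chapter

# Assumed operator set; the chapter names magnitude, equality, inequality,
# and NULL checks without listing exact operators.
SUPPORTED_OPERATORS = {"<", "<=", ">", ">=", "=", "<>", "IS NULL", "IS NOT NULL"}


def select_mapped_columns(predicates, already_mapped=()):
    """Return the columns that would get a storage data map.

    predicates: list of (column, operator, wrapped) tuples, where wrapped
    is True when the column is processed by a function or expression.
    This layout is hypothetical and exists only for this sketch.
    """
    mapped = list(already_mapped)
    for column, operator, wrapped in predicates:
        if wrapped or operator not in SUPPORTED_OPERATORS:
            continue  # processed columns and other operators are not eligible
        if column in mapped:
            continue  # a map already exists for this column
        if len(mapped) >= MAX_MAPPED_COLUMNS:
            break  # columns queried after the eighth are ignored
        mapped.append(column)
    return mapped


predicates = [("c%d" % i, "=", False) for i in range(10)]
predicates.insert(0, ("name", "=", True))  # e.g. UPPER(name) = 'X'
print(select_mapped_columns(predicates))
```

In this sketch the `name` column is skipped because it is wrapped in a function, and only `c0` through `c7` are kept.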

The first time stored data is queried after a storage server starts, no storage data map exists yet, so there is no performance gain. Performance improves when subsequent queries filter on the same column in the WHERE clause. The WHERE clause does not need to be identical to the one used when the storage data map was created; performance improves as long as the same columns are used.

A storage data map holds the maximum and minimum values of the columns for each disk area. If a WHERE clause condition falls outside the range between the minimum and maximum values for a disk area, the read for that disk area can be skipped.
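The following minimal sketch illustrates the idea of skipping disk areas by their minimum and maximum values. It models a single integer column, and all names (`AreaStats`, `build_map`, `areas_to_read`) are hypothetical; it is not the storage server's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class AreaStats:
    """Minimum and maximum of one column within one disk area."""
    min_value: int
    max_value: int


def build_map(areas):
    """Compute per-area statistics, as a full scan would while reading."""
    return [AreaStats(min(rows), max(rows)) for rows in areas]


def may_match_equal(stats, value):
    """An area can hold `col = value` only if value lies within [min, max]."""
    return stats.min_value <= value <= stats.max_value


def may_match_greater(stats, bound):
    """An area can hold `col > bound` only if its maximum exceeds bound."""
    return stats.max_value > bound


def areas_to_read(data_map, predicate):
    """Return the indexes of areas that cannot be skipped."""
    return [i for i, stats in enumerate(data_map) if predicate(stats)]


areas = [[1, 5, 9], [10, 14, 19], [20, 25, 29]]
data_map = build_map(areas)

print(areas_to_read(data_map, lambda s: may_match_equal(s, 14)))    # [1]
print(areas_to_read(data_map, lambda s: may_match_greater(s, 19)))  # [2]
```

In both example queries, two of the three areas are skipped without being read.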

Therefore, the smaller the fraction of rows that satisfies the query condition, the greater the performance improvement. In addition, storing data sorted by the columns used in the WHERE clause narrows the gap between each disk area's maximum and minimum values, which maximizes the effect of the storage data map.
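The effect of sorted storage can be seen in a small simulation. This sketch assumes 100 integer values split into areas of 10 values each; the function name and the data are made up for the example.

```python
def count_areas_read(values, area_size, target):
    """Count the areas whose [min, max] range contains target.

    Only these areas must be read for an equality condition; every
    other area can be skipped using its minimum and maximum alone.
    """
    count = 0
    for start in range(0, len(values), area_size):
        chunk = values[start:start + area_size]
        if min(chunk) <= target <= max(chunk):
            count += 1
    return count


# A deterministic scrambled ordering of the values 0..99.
unsorted_values = [(i * 37) % 100 for i in range(100)]
sorted_values = sorted(unsorted_values)

print(count_areas_read(sorted_values, 10, 42))    # 1
print(count_areas_read(unsorted_values, 10, 42))  # 10
```

With sorted data, each area covers a narrow range, so only one area is read. With scrambled data, every area's range contains the target value and nothing can be skipped.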

A storage data map is removed from memory as soon as a write request is executed for its disk area, because modifying data may change the stored statistics. Therefore, using the storage data map is not recommended in OLTP systems with frequent data changes.
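The invalidation behavior can be sketched as a small in-memory cache. The class and method names below are hypothetical, and the sketch tracks a single column per area for brevity.

```python
class StorageDataMapCache:
    """Toy model of per-area min/max statistics that are dropped on write."""

    def __init__(self):
        self._stats = {}  # area_id -> (min, max)

    def record_full_scan(self, area_id, values):
        """Store statistics computed while an area is read in full."""
        self._stats[area_id] = (min(values), max(values))

    def on_write(self, area_id):
        """Discard statistics for an area as soon as it is written to."""
        self._stats.pop(area_id, None)

    def can_skip(self, area_id, value):
        """Return True only if the map proves `col = value` cannot match."""
        stats = self._stats.get(area_id)
        if stats is None:
            return False  # no map for this area, so it must be read
        low, high = stats
        return value < low or value > high


cache = StorageDataMapCache()
cache.record_full_scan(0, [10, 20, 30])
print(cache.can_skip(0, 99))  # True: 99 is outside [10, 30]

cache.on_write(0)
print(cache.can_skip(0, 99))  # False: the map was removed by the write
```

After the write, the area has to be read again until another full scan rebuilds its statistics, which is why frequent writes cancel out the benefit.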

Storage Data Map operates only when Function Offloading is enabled. To turn Storage Data Map off, disable Function Offloading by using its initialization parameter.

Caution

Shared memory must be reserved for the Storage Data Map (SDM).

For more information, refer to 'initialization parameters'.

