Introduction to IMCS
This chapter describes the concept and features of In-Memory Column Store.
Last updated
This chapter describes the concept and features of In-Memory Column Store.
Last updated
IIn-Memory Column Store (IMCS) is a memory storage architecture optimized for column scans, which stores and manages data copies in a columnar format.
The columnar storage format is supported in addition to the traditional row-oriented storage, to improve database performance in a mixed environment handling both OLTP and OLAP workloads.
[Figure 1] Row-based storage vs Column-based storage
IMCS resides in the In-Memory Area, a component of the shared memory. The size of the In-Memory Area is allocated at startup of the database and remains fixed until the next startup.
Unlike the buffer cache, which discards unused data when filled up, IMCS keeps data permanently in the In-Memory Area unless manipulated through a command by the user. This eliminates additional I/O costs when reading data stored IMCS. (However, if no valid data is found through an in-memory scan, I/O costs may incur because the buffer cache needs to be additionally read).
Columnar Format
IMCS stores copies of data in a columnar format optimized for scans. Data stored in a columnar format has an internally fixed size, which enables efficient filter operation (ex. <, >, =) during a scan.
IMCS eliminates duplicate data by compressing each columnar data. This improves memory efficiency and scan performance by reducing the number of data to be scanned.
IMCS manages each columnar data in a storage unit called In-Memory Compression Unit (IMCU). IMCU contains information about the minimum and maximum values of each columnar data. Based on these minimum and maximum values of data in each IMCU, the database performs IMCU pruning to eliminate unnecessary scans and improve thus the scanning performance.
The In-Memory Area is allocated in a fixed location within the shared memory at startup of the database. The size of the In-Memory Area is set by the INMEMORY_SIZE initialization parameter.
The In-Memory Area is divided into two subpools: In-Memory Compression Unit (IMCU) for columnar data, and Snapshot Metadata Unit (SMU) for metadata of columnar data.
[Figure 2] In-Memory Column Store Architecture
An In-Memory Compression Unit (IMCU) is a storage unit of IMCS that contains data for one or more columns in a compressed format. The size of an IMCU is 1 MB, and IMCS stores data for a single object (ex. table, partition or subpartition) to multiple IMCUs in columnar format.
An IMCU stores columnar data for only one object, and every IMCU contains one or more Column Compression Units (CU).
Column Compression Unit(CU) A Column Compression Unit (CU) is a storage for a single column in an IMCU. A CU encodes the column value of each row in a 2-byte value, which is called a dictionary code. A CU contains a mapping table for each dictionary code corresponding to the actual column data.
A Snapshot Metadata Unit (SMU) stores metadata for an associated IMCU. In IMCS, each IMCU is mapped to a separate SMU. Every SMU contains information about the object and the address of a data block populated in each associated IMCU, as well as a transaction journal, which records rows modified with DML statements.
Population, which converts data into columnar format in IMCS, is performed by background processes. At startup of the database, agent processes initiate population of objects having In-Memory priority. When initially accessing objects with no priority, agent processes populate them by using queries.
Reading columnar data for objects populated in IMCS through queries, or recording transaction journals in SMUs using DML statements are performed by worker processes.
SIMD (Single Instruction, Multiple Data) vector processing enhances performance of expressions in a WHERE clause, by scanning a set of column values in a single CPU instruction. For example, 8 values (2-byte data) from a CU are loaded into the CPU, to get 8 results in a single comparison instruction.
SIMD is used for comparing In-Memory columns to bind parameters or constant values. The following comparison operators are supported: >, >=, =, <, <=, AND