Introduction

This chapter describes the basic concepts and features of ZetaData.

Overview

The traditional data warehouse market has reached its limits due to the explosion of data. Data volumes are expanding beyond terabytes to petabytes, and consolidating data from across the enterprise into a single warehouse increases them further. This data growth eventually creates performance and cost problems.

In a data warehouse configured in a traditional storage area network (SAN) environment, I/O performance is the bottleneck: typical storage I/O performance does not grow fast enough to keep pace with data growth.

In addition, scalability is limited in a SAN environment. To work around this limit, you must purchase more expensive equipment, reinstall the database, and migrate the data. At some point, scaling up is no longer possible, and existing data must be backed up externally or deleted. In other words, addressing future data growth requires horizontal scalability and high-speed access to large amounts of data.

One solution to this problem is Hadoop, the open-source platform of the "big data" market. Hadoop provides HDFS, a highly scalable distributed file system, and the MapReduce framework for processing the data stored in it. However, performing high-level analytics with Hadoop is not as easy as with a general-purpose RDBMS.

Hadoop has the following problems:

  • MapReduce makes it difficult to write and run even simple queries that would be trivial in SQL.

  • It does not provide strict consistency.

Another solution is found in the appliance market. These products, commonly used by large enterprises, include Oracle's Exadata, EMC's Greenplum, and Microsoft's PDW. They are integrated storage-and-database solutions that support hundreds of terabytes to petabytes of data, delivered as both hardware (H/W) and software (S/W). Most are optimized for analytics and comply with SQL standards, making it easy to write consistent queries across various environments. The problem with this appliance DW approach is dependency: because you must adopt an all-in-one H/W and S/W solution, you are locked into a specific vendor in every respect. For example, Exadata requires Exadata S/W and Sun H/W; even customers who would prefer different H/W must use the same H/W.

Another issue is cost. These appliances are very expensive, with H/W and S/W prices ranging from hundreds of millions to billions of won, so they are difficult to adopt unless a company is of a certain size. To solve all of these problems, TmaxTibero released ZetaData.


Key features

ZetaData provides the following key features.

Horizontal storage structure optimized for processing large amounts of data

ZetaData consists of a DB and the storage software that supports it. The storage software is installed on servers with multiple local disks, called storage servers. Each storage server is an independent component, so adding more storage servers does not affect the existing ones; the TAS layer simply presents them as a single volume. This provides a storage structure that can scale to very large volumes.

I/O Cache Tiering with Flash Devices

If the H/W includes flash devices, ZetaData can use them for I/O Cache Tiering: the flash devices serve as I/O caches, providing very fast response times even in OLTP environments. If necessary, a flash device can instead be used as a regular disk rather than as an I/O cache.
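The tiering idea can be sketched conceptually as a small, fast cache tier in front of a large, slow backing tier. The following is a minimal illustration only; the class and method names are hypothetical and do not reflect ZetaData's actual implementation:

```python
from collections import OrderedDict

class TieredStore:
    """Conceptual sketch: a small flash tier acting as an LRU read
    cache in front of a large disk tier (illustration only)."""

    def __init__(self, flash_capacity):
        self.flash = OrderedDict()      # block_id -> data, in LRU order
        self.flash_capacity = flash_capacity
        self.disk = {}                  # block_id -> data, backing store

    def write(self, block_id, data):
        self.disk[block_id] = data      # persist to the disk tier
        self.flash.pop(block_id, None)  # invalidate any stale cached copy

    def read(self, block_id):
        if block_id in self.flash:      # flash hit: fast path
            self.flash.move_to_end(block_id)
            return self.flash[block_id]
        data = self.disk[block_id]      # flash miss: slow disk read
        self.flash[block_id] = data     # promote the block into flash
        if len(self.flash) > self.flash_capacity:
            self.flash.popitem(last=False)  # evict least-recently-used
        return data
```

Repeatedly read blocks stay resident in the flash tier, while cold blocks are served from disk and evicted first, which is the essence of cache tiering.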

Delivers higher bandwidth I/O than a SAN using InfiniBand as a network interconnect

ZetaData uses InfiniBand as its primary network interconnect. InfiniBand has much higher bandwidth than traditional network technologies, providing several times the bandwidth of SAN environments built with Fibre Channel.

RDMA protocol reduces communication latency

ZetaData uses the RDMA (Remote Direct Memory Access) protocol to exchange data between nodes. Because RDMA transfers data directly between the memory of two nodes without involving the remote CPU, it greatly reduces both communication latency and CPU utilization.

Column compression for maximum compression

Maximized compression efficiency is obtained by reorganizing the conventional row-oriented storage structure into a column-oriented one and then compressing it. In addition, by applying different compression methods according to how frequently data is accessed, query speed can be increased for frequently used data while the compression rate is increased for infrequently used data.
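The benefit of the column-oriented layout can be demonstrated with a small, generic sketch (the sample data and layout strings are hypothetical and unrelated to ZetaData's on-disk format): when each column's values are stored contiguously, repeated values sit next to each other and a general-purpose compressor finds much longer matches.

```python
import zlib

# Hypothetical rows of (id, region, status) -- columns with repetitive values
rows = [(i, "APAC" if i % 2 else "EMEA", "OK") for i in range(1000)]

# Row-oriented layout: values of different columns interleaved per record
row_bytes = "".join(f"{i},{r},{s};" for i, r, s in rows).encode()

# Column-oriented layout: each column's values stored contiguously
col_bytes = (
    ",".join(str(i) for i, _, _ in rows)
    + "|" + ",".join(r for _, r, _ in rows)
    + "|" + ",".join(s for _, _, s in rows)
).encode()

row_compressed = zlib.compress(row_bytes, 9)
col_compressed = zlib.compress(col_bytes, 9)
print(len(row_compressed), len(col_compressed))  # column layout is smaller
```

Both byte strings carry the same information, yet the column layout compresses substantially better, which is why columnar storage is paired with compression in analytic databases.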

A market-proven DBMS, Tibero 7

Tibero 7 is an RDBMS that has been proven in production environments for more than 10 years, has demonstrated excellent performance in benchmark tests (BMT) against other DBMSs, and has been deployed in the public, financial, and enterprise sectors. By adopting a resource-efficient architecture designed for both core and analytic workloads, it responds effectively to large-scale data processing and cloud environment demands while ensuring stability, high performance, compatibility, and convenience.

Tibero offers the following advanced features:

  • Distributed database links

  • Data Replication

  • Database clustering

  • Parallel query processing

  • A query optimizer

Note

For a detailed description of Tibero's features, refer to the "Tibero guide".

Configuration without separate clusterware using TAS and CM

ZetaData provides TAS to unify multiple distributed storage servers and perform volume management. With TAS, you can use features such as striping, mirroring, and logical volume management, just as with a general storage solution. For a detailed description of its other features, see the "Tibero Active Storage Guide". Cluster Manager (CM) enables reliable clustering operation.

Note

For a detailed description of TAS features, refer to the "Tibero Active Storage Guide".
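The striping and mirroring that a volume manager such as TAS performs can be sketched in a few lines. This is a conceptual illustration only, with hypothetical function names; ZetaData's real placement and recovery logic is not shown here:

```python
# Conceptual sketch: stripe data round-robin across storage servers and
# place each stripe unit on two distinct servers (2-way mirroring).

STRIPE_SIZE = 4  # bytes per stripe unit (tiny, for illustration)

def stripe_and_mirror(data: bytes, servers: int, mirrors: int = 2):
    """Split data into stripe units and distribute mirrored copies
    of each unit across `servers` independent storage servers."""
    layout = {s: [] for s in range(servers)}
    chunks = [data[i:i + STRIPE_SIZE] for i in range(0, len(data), STRIPE_SIZE)]
    for idx, chunk in enumerate(chunks):
        for m in range(mirrors):
            layout[(idx + m) % servers].append((idx, chunk))  # mirror copy m
    return layout

def read_back(layout, servers: int):
    """Reassemble the data from whichever servers are still available;
    with 2-way mirroring, any single server can be lost."""
    recovered = {}
    for s in range(servers):
        for idx, chunk in layout.get(s, []):
            recovered.setdefault(idx, chunk)
    return b"".join(recovered[i] for i in sorted(recovered))
```

Because every stripe unit exists on two servers, reads still succeed after one server is removed, which is the availability property mirroring provides; striping, in turn, is what lets many servers appear as one volume.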

Figure 1. ZetaData architecture

Note

In this guide, a physical server is referred to as a node, and software running on a node is referred to as an instance. In ZetaData, a server running the TAS, TAC, and CM instances is a DB node, and a server running an SSVR instance is a Storage node.

From this point on, "Tibero" refers not only to the storage solution ZetaData but is also used as an umbrella term for various products such as TAC, HA, and Single.


SSVR Instance

SSVR instances are configured independently to manage local disks and perform I/O operations on behalf of the DBMS. You can access and manage SSVR instances with tbSQL in the same way that you access TAC instances; they provide information for managing local disks and checking statistics. The binaries for installing SSVR instances are provided in the same form as the existing Tibero binaries.

Note

It is recommended to use the same ZetaData-specific binaries on all DB nodes and Storage nodes.

Processes in an SSVR instance

The following processes are created in an SSVR instance.

  • MGWP: Manages the system. It performs essentially the same role as a worker process, but handles connections directly through a special port without going through a listener. Only SYS accounts are allowed to access it.

  • FGWP0000: Communicates with clients and handles user requests. Specifically, it manages local disks, provides information for checking statistics, and so on.

  • SSVR: Receives and processes the actual I/O requests. It also manages the flash cache and the storage data map. Most of the work is done by the SSVR process.

SSVR communication process dedicated to TAS/TAC instances

The following process is additionally created on TAS and TAC instances so that they can send I/O requests to the SSVR instance and receive the results.

  • SSIO: Created in a ZetaData environment so that TAS and TAC instances can exchange I/O request messages with SSVR instances.

Note

The directory structure created when SSVR is installed is the same as that of an existing Tibero installation.

For more information, refer to the "Tibero guide".
