Introduction to Tibero Hadoop Connector
This chapter describes the concepts and features of the Tibero Hadoop Connector.
Hadoop is an open-source framework from the Apache Software Foundation that facilitates the storage, distribution, and parallel processing of large amounts of data.
Hadoop includes the following software stacks.
HDFS (Hadoop Distributed File System): A distributed file system that provides failure recovery and high availability through data block redundancy.
MapReduce: A distributed programming framework that automatically performs distributed parallel processing of tasks divided into Map and Reduce phases. It uses the resources of multiple nodes for parallel and distributed processing and provides failure recovery.
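The division of a task into Map and Reduce can be illustrated with a minimal word-count sketch. This is plain Python that imitates the programming model on a single machine; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group the emitted values by key, a step the framework normally handles."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop the map calls would run in parallel on multiple nodes;
    # this sketch runs them sequentially in one process.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big query"]))
```

Even for a simple count, the developer has to write the map and reduce logic by hand, whereas the grouping, distribution, and failure recovery are left to the framework.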
In summary, Hadoop is a system aimed at mass storage and fast processing of data.
The number of businesses using Hadoop is increasing rapidly as the amount of data grows exponentially. However, Hadoop requires a MapReduce program for data processing, which places a heavy programming burden on users who need to write the various queries used in data analysis. It also lacks an interactive SQL interface with immediate feedback, so users must write code to obtain the desired results.
Many cases also require multiple types of data sources to store data in various formats.
In such cases, unstructured data is stored in HDFS and structured data in an existing RDBMS. Combining heterogeneous data sources to analyze legacy database data and big data together dramatically increases the complexity of data processing.
Key Features
Tibero Hadoop Connector is a solution that meets big data processing requirements as well as the need for heterogeneous data source integration and a convenient interface.
Tibero Hadoop Connector provides the following features to supplement Hadoop.
Provides an External Table interface to process data stored in HDFS together with data in RDBMS tables.
The External Table interface reduces the inconvenience of data migration.
Supports all query functions of Tibero.
Supports data integration functions such as table joins between HBase and Tibero tables.
Data in Hadoop can be combined with data in Tibero in a single query using ANSI SQL. Because the access interface between Tibero and Hadoop HDFS is unified in SQL, the burden of using heterogeneous data sources is reduced. Using SQL to run various queries as data analysis needs change speeds up the data analysis process.
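The idea of combining both kinds of data in one SQL query can be sketched as follows. This example is only an analogy: it uses Python's built-in `sqlite3` module rather than Tibero, and both tables are ordinary in-memory tables. The table `web_logs` stands in for data that would reside in HDFS and be exposed through an External Table, and `customers` stands in for a regular RDBMS table. All table, column, and function names are hypothetical, and the actual Tibero syntax for defining an External Table is not shown here.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative tables."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for structured data kept in an RDBMS table.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Kim"), (2, "Lee")]
    )
    # Stand-in for log data that would live in HDFS behind an external table.
    conn.execute("CREATE TABLE web_logs (customer_id INTEGER, url TEXT)")
    conn.executemany(
        "INSERT INTO web_logs VALUES (?, ?)",
        [(1, "/home"), (1, "/cart"), (2, "/home")],
    )
    return conn


def page_views_per_customer(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Join the two tables and count log rows per customer in one SQL statement."""
    query = """
        SELECT c.name, COUNT(*) AS views
        FROM customers c
        JOIN web_logs l ON l.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(page_views_per_customer(build_demo_db()))
```

The point of the sketch is that the join and aggregation are expressed declaratively in a single statement, so changing the analysis means editing the query rather than rewriting processing code.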
Tibero Hadoop Connector uses the External Table function to access data, so queries can be performed on various data formats in the same way as on structured data. Various functions provided by Tibero InfiniData, including query processing functions, can also be used with data in Hadoop.
In summary, Tibero Hadoop Connector enables easy integrated analysis of data in Hadoop and an RDBMS. Such agile big data analysis functionality helps businesses respond quickly to a rapidly changing business environment.
Supported Hadoop Versions
Tibero HDFS Connector supports only Linux and Hadoop 1.2.X versions.