TEXT Indexing Elements

This chapter describes elements related to Tibero TEXT indexes.

Tibero TEXT indexes can be configured with the following preference classes:

PREFERENCE Class
Description

STORAGE

Storage of an index and a table, which are created when a TEXT index

is created.

LEXER

Lexer used for the TEXT index.

WORDLIST

Token created for the TEXT index.

STOPLIST

Words which are excluded from TEXT indexes.

DATASTORE

Datastore that specifies where your TEXT index is stored.

FILTER

Filtering method to index your document.

SECTION GROUP

Organizes your document into section groups.

Preferences can be created using TEXT_DDL.CREATE_PREFERENCE.

Tibero TEXT provides predefined preferences for the seven preference classes above.

Users can create new preferences using the defined preferences.

After a preference has been created, the properties of the preference can be changed using TEXT_DDL.SET_ATTRIBUTE.

STORAGE

The STORAGE class specifies the storage for an index and table which are created when a TEXT indexis created.

BASIC_STORAGE is defined in the STORAGE class and has the following properties:

Property
Description

I_TABLE_CLAUSE

Tablespace of a table that saves TEXT index token.

I_ROWID_INDEX_CLAUSE

Tablespace of an index for the ROWID column in a token table.

I_INDEX_CLAUSE

Tablespace of an index for a token column in a token table.

K_TABLE_CLAUSE

Reserved for future functionality.

P_TABLE_CLAUSE

Reserved for future functionality.

N_TABLE_CLAUSE

Reserved for future functionality.

R_TABLE_CLAUSE

Tablespace of an index for the ROWID column in a mapping table.

S_TABLE_CLAUSE

Reserved for future functionality.


LEXER

The LEXER class specifies the lexer to use for a TEXT index.

BASIC_LEXER

Uses whitespace or special character delimited words.

BASIC_LEXER has the following attributes:

Attribute
Description

MIXED_CASE

Option to preserve case-sensitivity of text for indexing. If set to YES, case

is preserved. (Attribute: TRUE/FALSE, default value: FALSE)

BASE_LETTER

Option to ignore base letters, such as Umlaut and acute accent for indexing.If set to YES, base letters are ignored.

(Attribute: TRUE/FALSE, default value: FALSE)

WHITESPACE

Option to search for a sentence or paragraph by joining with punctuation or newline characters to separate sentences or paragraphs. The default valueis space or tab. Other than the default value, user-specified whitespace characters can be recognized.

(Attribute: string, default value: space or tab characters)

JAPANESE_LEXER

Used to generate tokens in Japanese text. JAPANESE_LEXER is a Japanese morphological analyzer. JAPANESE_LEXER has the following attributes.

Attribute
Description

ASCII_MIXED_CASE

Option to preserve case-sensitivity of ASCII text for indexing. If set to YES, case is preserved. (Attribute: TRUE/FALSE, Default: FALSE)

DELIMETER

Option to ignore Japanese delimiters, such as /, -, ·, etc. If set to YES, such characters are ignored. (Attribute: TRUE/FALSE, Default: FALSE)

CHINESE_LEXER

Used to generate tokens in Chinese text. CHINESE_LEXER has the following attributes.

  • Only available in UTF8 and GBK character sets.

  • Can use CONTEXT indexes but not CTXCAT indexes.

CHINESE_VGRAM_LEXER

Used to generate tokens in Chinese text. Unlike CHINESE_LEXER, tokens are generated based on the number of characters. Lexical meaning is not used.

CHINESE_VGRAM_LEXER has the following limitations.

  • Only available in UTF8 character set.

WORLD_LEXER

Used to generate tokens from documents containing multiple languages. This automatically detects the language and generates a token. Tokens are generated based on whitespace for languages with whitespace, and they are made in 2-grams for Japanese or Chinese containing no whitespace.

WORLD_LEXER has the following limitations.

  • Only available in UTF8 character sets.

  • Japanese or Chinese, which generate tokens in 2-gram, do not support wildcards.

  • Can use CONTEXT indexes but not CTXCAT indexes.


WORDLIST

The WORDLIST class specifies a token created when a TEXT index is created. BASIC_WORDLIST is defined in the WORDLIST class and has the following properties:

Attribute
Description

PREFIX_INDEX

If this is specified as YES, prefixes of words are indexed to improve the performance of searching for a prefix.

(Attribute: YES/NO, Default: NO)

PREFIX_MIN_LENGTH

Minimum length of a prefix when PREFIX_INDEX is specified as YES. (Range: 1 ~ 64, Default: 1)

PREFIX_MAX_LENGTH

Maximum length of a prefix when PREFIX_INDEX is specified as YES. (Range: 1 ~ 64, Default: 64)


STOPLIST

The STOPLIST class specifies words which are excluded from TEXT indexes. DEFAULT_STOPLIST and EMPTY_STOPLIST are defined in the STOPLIST class.

DEFAULT_STOPLIST has basic stopwords. For details about the basic stopwords, please refer to “Appendix”.

EMPTY_STOPLIST is a STOPLIST without STOPWORD.

A new stoplist can be created using TEXT_DDL.CREATE_STOPLIST. A new stopword can be added to the list using TEXT_DDL.ADD_STOPWORD.


DATASTORE

The DATASTORE class specifies how your documents are stored. The DATASTORE class defines DIRECT_DATASTORE and MULTI_COLUMN_DATASTORE.

DIRECT_DATASTORE

DIRECT_DATASTORE internally stores data in text columns. Each row is indexed as a single document. DIRECT_DATASTORE has no attributes.

MULTI_COLUMN_DATASTORE

MULTI_COLUMN_DATASTORE stores data in multiple columns. Tibero combines the columns specified with MULTI_COLUMN_DATASTORE by each row and then indexes the rows.

Attribute
Description

COLUMNS

Comma-separated columns to be indexed. You can specify the columns of type VARCHAR, CLOB, NUMBER, and DATE. You can also specify column

name, column expression, and PSM function. (Attribute: STRING)

DELIMITER

Delimiter to split the column text.

– COLUMN_NAME_TAG: Delimits the text of each column by XML format start and end tags. Each tag has a column name. (Default)

– NEWLINE: Delimits the text of each column by newline.


FILTER

The FILTER class specifies how to filter documents to be indexed. The FILTER class only defines NULL_FILTER.

NULL_FILTER

NULL_FILTER does not filter. It indexes general documents, HTML and XML documents without filtering. Type NULL_FILTER has no attributes.


SECTION GROUP

The SECTION_GROUP class divides the document structure into multiple sections. This class only defines NULL_SECTION_GROUP, which does not define a section. You can use TEXT_DDL.CREATE_SECTION_GROUP to create a new SECTION_GROUP.


TEXT_DDL Package

The following procedures are provided by TEXT_DDL to create and manage preferences.

Procedure
Description

ADD_SPECIAL_SECTION

Adds a SENTENCE or PARAGRAPH as a SPECIAL SECTION in

a SECTION GROUP.

ADD_STOPWORD

Adds a stopword to a stoplist.

CREATE_PREFERENCE

Creates a new preference.

CREATE_SECTION_GROUP

Defines a new SECTION_GROUP.

CREATE_STOPLIST

Creates a new stoplist.

DROP_PREFERENCE

Deletes a preference.

DROP_SECTION_GROUP

Deletes SECTION_GROUP.

DROP_STOPLIST

Deletes a stoplist.

REMOVE_SECTION

Deletes a SECTION from a SECTION GROUP.

REMOVE_STOPWORD

Deletes a stopword from a stoplist.

SET_ATTRIBUTE

Specifies the attributes of a preference.

ADD_SPECIAL_SECTION

The ADD_SPECIAL_SECTION procedure adds a SENTENCE or PARAGRAPH as a SPECIAL SECTION in a SECTION GROUP. Adding a special section allows you to find a sentence or paragraph by using the WITHIN operator.

After specifying section groups containing special sections, the delimiter specified when creating indexes tokenizes special sections.

Sentences are delimited by [word][punctuation][whitespace] or [word][punctuation][newline]. Paragraphs are delimited by [word][punctuation][newline][whitespace] or [word][punctuation][newline][newline].

Note

The default punctuation value is [. / ! / ?], andthe default newline value is [\n / \r]. The user cannot modify these punctuation and newline values.

Prototype

ADD_SPECIAL_SECTION (
    group_name	        IN VARCHAR2, 
    section_name	IN VARCHAR2)

Parameter

Parameter
Description

group_name

Name of a SECTION GROUP to be added.

section_name

Name of a section to be added. Can be specified as 'SENTENCE' or

'PARAGRAPH'.

Example

SQL>EXEC TEXT_DDL.ADD_SPECIAL_SECTION('nullgroup','SENTENCE');

ADD_STOPWORD

The ADD_STOPWORD procedure adds a stopword to a stoplist.

Prototype

ADD_STOPWORD (
    stoplist_name     IN VARCHAR2,
    stopword          IN VARCHAR2,
    lang              IN VARCHAR2 DEFAULT 'ALL')

Parameter

Parameter
Description

stoplist_name

STOPLIST name.

stopword

Stopword to add.

lang

Language to which the stopword applies. (Unsupported)

Example

SQL>EXEC TEXT_DDL.ADD_STOPWORD('DEFAULT_STOPLIST','test_word');

CREATE_PREFERENCE

The CREATE_PREFERENCE procedure creates a new preference.

Prototype

CREATE_PREFERENCE (
    udef_pref_name     IN VARCHAR2,
    pdef_pref_name     IN VARCHAR2)

Parameter

Parameter
Description

udef_pref_name

PREFERENCE name

pdef_pref_name

Predefined Tibero TEXT preference name.

Example

SQL>EXEC TEXT_DDL.CREATE_PREFERENCE('TEST1','BASIC_WORDLIST');

CREATE_SECTION_GROUP

The CREATE_SECTION_GROUP procedure creates a new section group.

Prototype

CREATE_SECTION_GROUP (
    udef_pref_name         IN VARCHAR2,
    pdef_pref_name         IN VARCHAR2)

Parameter

Parameter
Description

udef_pref_name

Preference name.

pdef_pref_name

Predefined Tibero TEXT preference name.

Example

SQL>EXEC TEXT_DDL.CREATE_SECTION_GROUP('my_sg','NULL_SECTION_GROUP');

CREATE_STOPLIST

The CREATE_STOPLIST procedure creates a new stoplist.

Prototype

CREATE_STOPLIST (
    stoplist_name         IN VARCHAR2,
    stoplist_type         IN VARCHAR2 DEFAULT 'DEFAULT_STOPLIST')

Parameter

Parameter
Description

stoplist_name

Stoplist name.

stoplist_type

Stoplist type.

Currently, DEFAULT_STOPLIST and EMPTY_STOPLIST are supported.

Example

SQL>EXEC TEXT_DDL.CREATE_STOPLIST('new_stoplist', 'DEFAULT_STOPLIST');

DROP_PREFERENCE

The DROP_STOPLIST procedure deletes a stoplist.

Prototype

DROP_PREFERENCE (udef_pref_name IN VARCHAR2);

Parameter

Parameter
Description

udef_pref_name

Preference name.

Example

SQL>EXEC TEXT_DDL.DROP_PREFERENCE('TEST1');

DROP_SECTION_GROUP

The DROP_PREFERENCE procedure deletes a preference.

Prototype

DROP_SECTION_GROUP (udef_pref_name IN VARCHAR2);

Parameter

Parameter
Description

udef_pref_name

Section group name.

Example

SQL>EXEC TEXT_DDL.DROP_PREFERENCE('my_sg');

DROP_STOPLIST

The DROP_STOPLIST procedure deletes a stoplist.

Prototype

DROP_STOPLIST (stoplist_name IN VARCHAR2);

Parameter

Parameter
Description

stoplist_name

Stoplist to delete.

Example

SQL>EXEC TEXT_DDL.DROP_STOPLIST('new_stoplist');

REMOVE_SECTION

Deletes a SECTION from a SECTION GROUP

Prototype

REMOVE_SECTION (
    group_name         IN VARCHAR2,
    section_name       IN VARCHAR2)

Parameter

Parameter
Description

group_name

Name of a SECTION GROUP to be deleted.

section_name

Name of a SECTION to be deleted.

Example

SQL>EXEC TEXT_DDL.REMOVE_SECTION('nullgroup','test_section'); 
SQL>EXEC TEXT_DDL.REMOVE_SECTION('nullgroup','SENTENCE');

REMOVE_STOPWORD

The REMOVE_STOPWORD procedure deletes a stopword from a stoplist.

Prototype

REMOVE_STOPWORD (
    stoplist_name         IN VARCHAR2, 
    stopword              IN VARCHAR2,
    lang                  IN VARCHAR2 DEFAULT 'ALL' )

Parameter

Parameter
Description

stoplist_name

Name of a stoplist which includes the stopword to be deleted.

stopword

Stopword to delete.

lang

Language to which the stopword applies. (Unsupported)

Example

SQL>EXEC TEXT_DDL.REMOVE_STOPWORD('DEFAULT_STOPLIST','test_word');

SET_ATTRIBUTE

The SET_ATTRIBUTE procedure specifies the attributes for a preference.

Prototype

SET_ATTRIBUTE (
    udef_pref_name	IN VARCHAR2,
    attribute_name	IN VARCHAR2,
    value	        IN VARCHAR2)

Parameter

Parameter
Description

udef_pref_name

Preference to specify the attribute for.

attribute_name

Attribute name.

value

Attribute value.

Example

SQL>EXEC TEXT_DDL.SET_ATTRIBUTE('TEST1','PREFIX_INDEX','TRUE');


CTX_DDL Package

The following are procedures provided by CTX_DDL to manage CONTEXT indexes.

OPTIMIZE_INDEX

Optimizes synchronized indexes. This improves query performance because old data is removed.

Prototype

CTX_DDL.OPTIMIZE_INDEX (
        idx_name           IN VARCHAR2,
        optlevel           IN VARCHAR2,
        maxtime            IN NUMBER DEFAULT NULL,
        token              IN VARCHAR2 DEFAULT NULL,
        part_name          IN VARCHAR2 DEFAULT NULL,
        token_type         IN NUMBER DEFAULT NULL,

        parallel_degree    IN NUMBER DEFAULT 1)

Parameter

Parameter
Description

idx_name

Index name.

optlevel

One of the following two optimization methods.

– FAST: combine duplicate tokens, but do not remove old (removed row) data.

– FULL: combine duplicate tokens, and then remove the old (removed row) data.

maxtime

Maximum optimization time. (Unsupported)

token

Token to be optimized. (Unsupported)

part_name

Partition name for local indexes.

token_type

Token type. (Unsupported)

parallel_degree

Parallel degree. (Unsupported)

Example

begin
    ctx_ddl.optimize_index('myindex','FAST');
end

SYNC_INDEX

Synchronizes CONTEXT indexes to process inserts, updates, and deletes to the base table.

Prototype

CTX_DDL.SYNC_INDEX (
    idx_name                  IN VARCHAR2 DEFAULT NULL,
    memory                    IN VARCHAR2 DEFAULT NULL,
    part_name                 IN VARCHAR2 DEFAULT NULL,
    parallel_degree           IN NUMBER DEFAULT 1,
    maxtime                   IN NUMBER DEFAULT NULL,
    locking                   IN NUMBER DEFAULT LOCK_WAIT)

Parameter

Parameter
Description

idx_name

Index name.

memory

Runtime memory to use for synchronization. (Unsupported)

part_name

Partition name for local indexes.

parallel_degree

Parallel degree. (Unsupported)

maxtime

Maximum synchronization time. (Unsupported)

locking

How to deal with the case where another synchronization is already running

in the same index or partition. (Unsupported)

Example

begin
    ctx_ddl.sync_index('myindex');
end


CTX_OUTPUT Package

The following are procedures provided by CTX_OUTPUT to manage CONTEXT indexes.

END_LOG

Stops logging for indexing.

Prototype

CTX_OUTPUT.END_LOG;

Example

begin
    ctx_output.end_log;
end

START_LOG

Starts logging for indexing.

Prototype

CTX_OUTPUT.START_LOG (
    file_name             IN VARCHAR2,
    write_option          IN BOOLEAN DEFAULT TRUE)

Parameter

Parameter
Description

file_name

Name of the log file.

write_option

If the log file already exists, this parameter specifies whether you overwrite

the file or add a new log file.

Example

begin
    ctx_output.start_log('myindex', true);
end

Last updated