Translate into your own language

Friday, April 15, 2016

How to create B-tree index and how it works

Suppose you want to create an index. You understand that the default type of index in Oracle is the B-tree, but you don’t quite understand how an index is physically implemented. You want to fully comprehend the B-tree index internals so as to make intelligent performance decisions when building database applications.

An example with a good diagram will help illustrate the mechanics of a B-tree index. Even if you’ve been working with B-tree indexes for quite some time, a good example may illuminate technical aspects of using an index. To get started, suppose you have a table created as follows:

create table cust(
cust_id number
,last_name varchar2(30)
,first_name varchar2(30));

You determine that several SQL queries will frequently use LAST_NAME in the WHERE clause. This
prompts you to create an index:

SQL> create index cust_idx1 on cust(last_name);

Several hundred rows are now inserted into the table (not all of the rows are shown here):

insert into cust values(7, 'ACER','SCOTT');
insert into cust values(5, 'STARK','JIM');
insert into cust values(3, 'GREY','BOB');
insert into cust values(11,'KHAN','BRAD');
.....
insert into cust values(274, 'ACER','SID');

After the rows are inserted, we ensure that the table statistics are up to date so as to provide the query optimizer sufficient information to make good choices on how to retrieve the data.

SQL> exec dbms_stats.gather_table_stats(ownname=>'MV_MAINT', -
tabname=>'CUST',cascade=>true);

As rows are inserted into the table, Oracle will allocate extents that consist of physical database blocks. Oracle will also allocate blocks for the index. For each record inserted into the table, Oracle will also create an entry in the index that consists of the ROWID and column value (the value in LAST_NAME in this example). The ROWID for each index entry points to the datafile and block that the table column value is stored in. Figure shows a graphical representation of how data is stored in the table and the corresponding B-tree index. For this example, datafiles 10 and 15 contain table data stored in associated blocks and datafile 22 stores the index blocks.



 There are two dotted lines in Figure. These lines depict how the ROWID (in the index structure) points to the physical location in the table for the column values of ACER. These particular values will be used in the scenarios in this solution.

When selecting data from a table and its corresponding index, there are three basic scenarios:
  • All table data required by the SQL query is contained in the index structure.Therefore only the index blocks need to be accessed. The blocks from the table are never read.
  • All of the information required by the query is not contained in the index blocks.Therefore the query optimizer chooses to access both the index blocks and the table blocks to retrieve the data needed to satisfy the results of the query.
  • The query optimizer chooses not to access the index. Therefore only the tableblocks are accessed.

The prior situations are covered in the next three subsections.

Scenario 1: All Data Lies in the Index Blocks

There are two scenarios that will be shown in this section:


  • Index range scan: This occurs when the optimizer determines it is efficient to use the index structure to retrieve multiple rows required by the query. Index range scans are used extensively in a wide variety of situations.

  • Index fast full scan: This occurs when the optimizer determines that most of the rows in the table will need to be retrieved. However, all of the information required is stored in the index. Since the index structure is usually smaller than the table structure, the optimizer determines that a full scan of the index is more efficient. This scenario is common for queries that count values.

First the index range scan is demonstrated. For this example, suppose this query is issued that selects from the table:

SQL> select last_name from cust where last_name='ACER';

Before reading on, look at above figure and try to answer this question: “What are the minimal number of blocks Oracle will need to read to return the data for this query?” In other words, what is the most efficient way to access the physical blocks in order to satisfy the results of this query? The optimizer could choose to read through every block in the table structure. However, that would result in a great deal of I/O, and thus it is not the most optimal way to retrieve the data.

For this example, the most efficient way to retrieve the data is to use the index structure. To return the rows that contain the value of ACER in the LAST_NAME column, Oracle will need to read three blocks: block 20, block 30, and block 39. We can verify that this is occurring by using Oracle’s Autotrace utility:

SQL> set autotrace on;
SQL> select last_name from cust where last_name='ACER';

Here is a partial snippet of the output:


 The prior output shows that Oracle needed to use only the CUST_IDX1 index to retrieve the data to satisfy the result set of the query. The table data blocks were not accessed; only the index blocks were required. This is a particularly efficient indexing strategy for the given query. Listed next are the statistics displayed by Autotrace for this example:

Statistics
-----------------------
1 recursive calls
0 db block gets
3 consistent gets
0 physical reads

The consistent gets value indicates that three blocks were read from memory (db block gets plus consistent gets equals the total blocks read from memory). Since the index blocks were already in memory, no physical reads were required to return the result set of this query.

Next an example that results in an index fast full scan is demonstrated. Consider this query:

SQL> select count(last_name) from cust;

Using SET AUTOTRACE ON, an execution plan is generated. Here is the corresponding output:


 The prior output shows that only the index structure was used to determine the count within the table. In this situation, the optimizer determined that a full scan of the index was more efficient than a full scan of the table.


Scenario 2: All Information Is Not Contained in the Index

Now consider this situation: suppose we need additional information from the CUST table. This query additionally selects the FIRST_NAME column:

SQL> select last_name, first_name from cust where last_name = 'ACER';

Using SET AUTOTRACE ON and executing the prior query results in the following execution plan:

The prior output indicates that the CUST_IDX1 index was accessed via an INDEX RANGE SCAN. The INDEX RANGE SCAN identifies the index blocks required to satisfy the results of this query. Additionally the table is read by TABLE ACCESS BY INDEX ROWID. The access to the table by the index’s ROWID means that Oracle uses the ROWID (stored in the index) to locate the data contained within the table blocks. In Figure , this is indicated by the dotted lines that map the ROWID to the appropriate table blocks that contain the value of ACER in the LAST_NAME column.

Again, looking at Figure, how many table and index blocks need to be read in this scenario? The index requires that blocks 20, 30, and 39 must be read. Since FIRST_NAME is not included in the index, Oracle must read the table blocks to retrieve these values. Oracle knows the ROWID of the table blocks and directly reads blocks 11 and 2500 to retrieve that data. That makes a total of five blocks. Here is a partial snippet of the statistics generated by Autotrace that confirms the number of blocks read is five:


Statistics
---------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads


Scenario 3: Only the Table Blocks Are Accessed

In some situations, even if there is an index, Oracle will determine that it’s more efficient to use only the table blocks. When Oracle inspects every row within a table, this is known a full table scan. For example, take this query:

SQL> select * from cust;

Here are the corresponding execution plan and statistics:



The prior output shows that a total of 119 blocks were inspected. Oracle searched every row in the table to bring back the results required to satisfy the query. In this situation, all blocks of the table must be read, and there is no way for Oracle to use the index to speed up the retrieval of the data.

How It Works

The B-tree index is the default index type in Oracle. For most OLTP-type applications, this index type is sufficient. This index type is known as B-tree because the ROWID and associated column values are stored within blocks in a balanced tree-like structure. The B stands for balanced.

B-tree indexes are efficient because, when properly used, they result in a query retrieving data far faster than it would without the index. If the index structure itself contains the required column values to satisfy the result of the query, then the table data blocks need not be accessed. Understanding these mechanics will guide your indexing decision-making process. For example, this will help you decide which columns to index and whether a concatenated index might be more efficient for certain queries and less optimal for others.

No comments:

Post a Comment