Index-Organized Tables Wrap-Up-Database Tables-1
By Michele leonti / March 24, 2021 / No Comments / An Overview of Oracle Indexes, Oracle Certifications, Reverse Key Indexes, Setting PCTFREE Values
Index-organized tables (IOTs) are quite simply tables stored in an index structure. Whereas a table stored in a heap is unorganized (i.e., data goes wherever there is available space), data in an IOT is stored and sorted by primary key. IOTs behave just like “regular” tables do as far as your application is concerned; you use SQL to access them as normal. They are especially useful for information retrieval, spatial, and OLAP applications.
What is the point of an IOT? You might ask the converse, actually: what is the point of a heap-organized table? Since all tables in a relational database are supposed to have a primary key anyway, isn’t a heap-organized table just a waste of space? We have to make room for both the table and the index on the primary key of the table when using a heap-organized table.
With an IOT, the space overhead of the primary key index is removed, as the index is the data, and the data is the index. The fact is that an index is a complex data structure that requires a lot of work to manage and maintain, and the maintenance requirements increase as the width of the row to store increases. A heap, on the other hand, is trivial to manage by comparison.
There are efficiencies in a heap- organized table over an IOT. That said, IOTs have some definite advantages over their heap counterparts.
For example, I once built an inverted list index on some textual data (this predated the introduction of interMedia and related technologies). I had a table full of documents, and I would parse the documents and find words within them. My table looked like this:
SQL> create table keywords( word varchar2(50),position int,doc_id int,primary key(word,position,doc_id));
Here, I had a table that consisted solely of columns of the primary key. I had over 100 percent overhead; the size of my table and primary key index were comparable (actually, the primary key index was larger since it physically stored the rowid of the row it pointed to, whereas a rowid is not stored in the table—it is inferred). I only used this table with a WHERE clause on the WORD or WORD and POSITION columns. That is, I never used the table—I used only the index on the table. The table itself was no more than overhead. I wanted to find all documents containing a given word (or near another word and so on).
The KEYWORDS heap table was useless, and it just slowed down the application during maintenance of the KEYWORDS table and doubled the storage requirements. This is a perfect application for an IOT.
Another implementation that begs for an IOT is a code lookup table. Here, you might have ZIP_CODE to STATE lookup, for example. You can now do away with the heap table and just use an IOT itself. Anytime you have a table that you access via its primary key exclusively, it is a possible candidate for an IOT.
When you want to enforce colocation of data or you want data to be physically stored in a specific order, the IOT is the structure for you. For users of Sybase and SQL Server, this is where you would have used a clustered index, but IOTs go one better.
A clustered index in those databases may have up to a 110 percent overhead (similar to the previous KEYWORDS table example). Here, we have a 0 percent overhead since the data is stored only once.
A classic example of when you might want this physically colocated data would be in a parent/child relationship. Let’s say the EMP table had a child table containing addresses. You might have a home address entered into the system when the employee is initially sent an offer letter for a job. Later, they add their work address.
Over time, they move and change the home address to a previous address and add a new home address. Then they have a school address they added when they went back for a degree and so on. That is, the employee has three or four (or more) detail records, but these details arrive randomly over time. In a normal heap-based table, they just go anywhere.
The odds that two or more of the address records would be on the same database block in the heap table are very near zero. However, when you query an employee’s information, you always pull the address detail records as well. The rows that arrive over time are always retrieved together. To make the retrieval more efficient, you can use an IOT for the child table to put all of the records for a given employee near each other upon insertion, so when you retrieve them over and over again, you do less work.
An example will easily show the effects of using an IOT to physically colocate the child table information. Let’s create and populate an EMP table:
$ sqlplus eoda/foo@PDB1
SQL> create table emp asselect object_id empno,object_name ename,created hiredate,owner job
from all_objects;
I populated these tables by inserting into them a work address for each employee, then a home address, then a previous address, and finally a school address. A heap table would tend to place the data at the end of the table; as the data arrives, the heap table would simply add it to the end, due to the fact that the data is just arriving and no data is being deleted.
Over time, if addresses are deleted, the inserts would become more random throughout the table. Suffice it to say, the chance an employee’s work address would be on the same block as their home address in the heap table is near zero. For the IOT, however, since the key is on EMPNO, ADDR_TYPE, we’ll be pretty sure that all of the addresses for a given EMPNO are located on one or maybe two index blocks together. The inserts used to populate this data were
SQL> insert into heap_addressesselect empno, ‘WORK’, ‘123 main street’, ‘Washington’, ‘DC’, 20123 from emp;72075 rows created.
SQL> insert into iot_addressesselect empno, ‘WORK’, ‘123 main street’, ‘Washington’, ‘DC’, 20123 from emp;72075 rows created.
That is a pretty common plan: go to the EMP table by primary key; get the row; then using that EMPNO, go to the address table; and using the index, pick up the child records. We did 11 I/Os to retrieve this data. Now run the same query, but use the IOT for the addresses:
SQL> select * from emp, iot_addresses where emp.empno = iot_addresses.empno and emp.empno = 42;
Execution Plan
We did four fewer I/Os (the four should have been guessable); we skipped four TABLE ACCESS (BY INDEX ROWID BATCHED) steps. The more child records we have, the more I/Os we would anticipate skipping.
So, what is four I/Os? Well, in this case, it was over one-third of the I/O performed for the query, and if we execute this query repeatedly, it would add up. Each I/O and each consistent get requires an access to the buffer cache, and while it is true that reading data out of the buffer cache is faster than disk, it is also true that the buffer cache gets are not free and not totally cheap. Each will require many latches of the buffer cache, and latches are serialization devices that will inhibit our ability to scale. We can measure both the I/O reduction and latching reduction by running a PL/SQL block such as this:
SQL> beginfor x in ( select empno from emp )
Here, we are just emulating a busy period and running the query some 72,000 times, once for each EMPNO. If we run that for the HEAP_ADDRESSES and IOT_ADDRESSES tables, TKPROF shows us the following:
SELECT EMP.ENAME, A.STREET, A.CITY, A.STATE, A.ZIP FROM EMP, HEAP_ADDRESSES A WHERE EMP.EMPNO = A.EMPNO AND EMP.EMPNO = :B1
Both queries fetched exactly the same number of rows, but the HEAP table performed considerably more logical I/O. As the degree of concurrency on the system goes up, we would likewise expect the CPU used by the HEAP table to go up more rapidly as well, while the query possibly waits for latches into the buffer cache. Using runstats (a utility of my own design; refer to the introductory section of this book “Setting Up Your Environment” for details), we can measure the difference in latching. On my system, I observed the following:
where Run1 was the HEAP_ADDRESSES table and Run2 was the IOT_ADDRESSES table. As you can see, there was a dramatic and repeatable decrease in the latching taking place, mostly due to the cache buffers chains latch (the one that protects the buffer cache). The IOT in this case would provide the following benefits:
•\ Increased buffer cache efficiency, as any given query needs to havefewer blocks in the cache
•\ Decreased buffer cache access, which increases scalability
•\ Less overall work to retrieve our data, as it is faster
•\ Less physical I/O per query possibly, as fewer distinct blocks are
needed for any given query and a single physical I/O of the addresses most likely retrieves all of them (not just one of them, as the heap table implementation does)
The same would be true if you frequently use BETWEEN queries on a primary or unique key. Having the data stored physically sorted will increase the performance of those queries as well. For example, I maintain a table of stock quotes in my database. Every day, for hundreds of stocks, I gather together the stock ticker, date, closing price, day’s high, day’s low, volume, and other related information. The table looks like this.