Index Key Compression-Indexes
By Michele leonti / March 24, 2024 / No Comments / Index-Organized Tables Wrap-Up, Oracle Certifications
One of the interesting things you can do with a B*Tree index is compress it. This is not compression in the same manner that ZIP files are compressed; rather, this is compression that removes redundancies from concatenated (multicolumn) indexes.
We covered compressed key indexes in some detail in the section “Index-Organized Tables” in Chapter 10, and we will take a brief look at them again here. The basic concept behind a compressed key index is that every entry is broken into two pieces: a prefix and suffix component. The prefix is built on the leading columns of the concatenated index and will have many repeating values. The suffix is built on the trailing columns in the index key and is the unique component of the index entry within the prefix.
By way of example, we’ll create a table and a concatenated index and measure its space without compression using ANALYZE INDEX.
Note There is a common misperception that ANALYZE should not be used as a command in Oracle—that the DBMS_STATS package supersedes it. This is not true. What is true is that ANALYZE should not be used to gather statistics, but the other capabilities of ANALYZE still apply. The ANALYZE command should be used to perform operations such as validating the structure of an index (as we will later) or listing chained rows in a table. DBMS_STATS should be used exclusively to gather statistics on objects.
We’ll then re-create the index with index key compression, compressing a different number of key entries, and see the difference. Let’s start with this table and index:
$ sqlplus eoda/foo@PDB1
SQL> create table t as select * from all_objects where rownum <= 50000;Table created. SQL> create index t_idx on t(owner,object_type,object_name); Index created.
SQL> analyze index t_idx validate structure; Index analyzed.
We then create an IDX_STATS table in which to save INDEX_STATS information, and we label the rows in the table as “noncompressed”:
SQL> create table idx_stats asselect ‘noncompressed’ what, a.*from index_stats a;Table created.
Now, we could realize that the OWNER component is repeated many times, meaning that a single index block in this index will have dozens of entries, as shown in Figure 11-2.
Figure 11-2. Index block with the OWNER column repeated
We could factor the repeated OWNER column out of this, resulting in a block that looks more like Figure 11-3.
Figure 11-3. Index block with the OWNER column factored out
In Figure 11-3, the owner name appears once on the leaf block—not once per repeated entry. We run the following script, passing in the number 1, to re-create the scenario whereby the index is using compression on just the leading column:
SQL> drop index t_idx;
SQL> create index t_idx ont(owner,object_type,object_name)compress &1;
SQL> analyze index t_idx validate structure; SQL> insert into idx_statsselect ‘compress &1’, a.*from index_stats a;
For comparison reasons, we run this script not only with one column but also two and three compressed columns to see what happens. At the end, we query IDX_STATS and should observe this:
SQL> select what, height, lf_blks, br_blks,btree_space, opt_cmpr_count, opt_cmpr_pctsave
from idx_stats;WHAT HEIGHT LF_BLKS BR_BLKS BTREE_SPACE OPT_CMPR_COUNT
OPT_CMPR_PCTSAVE
We see that the COMPRESS 1 index is about 90 percent the size of the noncompressed index (comparing BTREE_SPACE). The number of leaf blocks has decreased measurably. Further, when we use COMPRESS 2, the savings are even more impressive. The resultingindex is about 71 percent the size of the original. In fact, using the column OPT_CMPR_ PCTSAVE, which stands for optimum compression percent saved or the expected savings from compression, we could have guessed the size of the COMPRESS 2 index:
SQL> select 1823120(1-0.28) from dual;1823120(1-0.28)1312646.4
Note The ANALYZE command against the noncompressed index populated the OPT_CMPR_PCTSAVE/OPT_CMPR_COUNT columns and estimated a 28 percent savings with COMPRESS 2, and we achieved just about exactly that.
But notice what happens with COMPRESS 3. The resulting index is actually larger: 117 percent the size of the original index. This is due to the fact that each repeated prefix we remove saves the space of N copies, but adds 4 bytes of overhead on the leaf block as part of the compression scheme. By adding in the OBJECT_NAME column to the compressed key, we made that key almost unique—in this case meaning there were really no duplicate copies to factor out. Therefore, we ended up adding 4 bytes to almost every single index key entry and factoring out no repeating data. The OPT_CMPR_COUNT column in IDX_STATS is dead accurate at providing the best compression count to be used, and OPT_CMPR_PCTSAVE will tell you exactly how much savings to expect.
Now, you do not get this compression for free. The compressed index structure is now more complex than it used to be. Oracle will spend more time processing the data in this structure, both while maintaining the index during modifications and when you search the index during a query. What we are doing here is trading off increased CPU time for reduced I/O time. With compression, our block buffer cache will be able to hold more index entries than before, our cache-hit ratio might go up, and our physical I/Os should go down, but it will take a little more CPU horsepower to process the index, and it will also increase the chance of block contention. Just as in our discussion of the hash cluster, where it might take more CPU to retrieve a million random rows but half the I/O, we must be aware of the trade-off. If you are currently CPU bound, adding compressed key indexes may slow down your processing. On the other hand, if you are I/O bound, using them may speed up things.