Index Clustered Tables Wrap-Up-Database Tables-2

We can see that the total number of blocks allocated to the table is 1152. 117 of these blocks are unused (free). One block goes to table overhead to manage the extents. Therefore, 1034 blocks are under the HWM of this object, and these are used by the cluster.

This example points out one of the issues with hash clusters you need to be aware of. Normally, if we create an empty table, the number of blocks under the HWM for that table is zero. If we full scan it, it reaches the HWM and stops. With a hash cluster, the tables will start out big and will take longer to create, as Oracle must initialize each block, an action that normally takes place as data is added to the table. They have the potential to have data in their first block and their last block, with nothing in between. Full scanning a virtually empty hash cluster will take as long as full scanning a full hash cluster. This is not necessarily a bad thing; we built the hash cluster to have very fast access to the data by a hash key lookup. We did not build it to full scan it frequently.

To see the difference a hash cluster can make, I set up a small test. I created a hash cluster, loaded some data in it, copied this data to a regular table with a conventional index on it, and then did random reads on each table (the same “random” reads on each). Using runstats, SQL_TRACE, and TKPROF, I was able to determine the characteristics of each. The following is the setup I performed, followed by the analysis:

I created the hash cluster with a SIZE of 150 bytes. This is because I determined the average row size for a row in my table would be about 100 bytes, but would vary up and down based on the data with many rows coming in at around 150 bytes. I then created and populated a table in that cluster as a copy of ALL_OBJECTS.

Now, all I needed was some random data to pick rows from each of the tables with. To achieve that, I simply selected all of the OBJECT_IDs into an array and had them sorted randomly, to hit the table all over in a scattered fashion. I used a PL/SQL package to define and declare the array and a bit of PL/SQL code to prime the array, to fill it up:

To see the work performed by each, I used the following block of code (if you replace occurrences of the word HEAP with HASHED, you have the other block of code you need to test against):

Next, I ran the preceding block of code three times (and the copy of that block of code where HEAP is replaced with HASHED as well). The first run was to warm up the system, to get any hard parses out of the way. The second time I ran the blocks of code, I used runstats to see the material differences between the two: running first the hashed im4plementation and then the heap. The third time I ran the blocks of code, I did so with SQL_TRACE enabled so I could see a TKPROF report. The runstats run reported the following:

The HASHED implementation simply converted the OBJECT_ID passed into the query into a FILE/BLOCK to be read and read it—no index. The HEAP table, however, had to do two I/Os on the index for each row. The cr=2 in the TKPROF Row Source Operation line shows us exactly how many consistent reads were done against the index. Each time

I looked up OBJECT_ID = :B1, Oracle had to get the root block of the index and then find the leaf block containing the location of that row. Then, I had to take the leaf block information, which included the ROWID of that row, and access that row in the table for a third I/O. The HEAP table did three times the I/O of the HASHED implementation.

The points of interest here are as follows:

•\ The hash cluster did significantly less I/O (query column). This is what we anticipated. The query simply took the random OBJECT_IDs, performed the hash on them, and went to the block. The hash cluster has to do at least one I/O to get the data. The conventional table with an index had to perform index scans followed by a table access by rowid to get the same answer. The indexed table has to do at least three I/Os in this case to get the data.

•\ The hash cluster query took the same amount of CPU for all intents and purposes, even though it went to the buffer cache one-third as many times. This, too, could be anticipated. The act of performing a hash is very CPU intensive. The act of performing an index lookup is I/O intensive. It was a trade-off. However, as we scale up users, we would expect the hash cluster query to scale better, as it has to get in line to access the buffer cache less frequently.

This last point is the important one. When working with computers, it is all about resources and their utilization. If we are I/O bound and perform queries that do lots of keyed reads like I just did, a hash cluster may improve performance. If we are already CPU bound, a hash cluster may possibly decrease performance since it needs more CPU horsepower to hash. However, if the extra CPU we are burning is due to spinning on cache buffers chains latches, the hash cluster could significantly reduce the CPU needed. This is one of the major reasons why rules of thumb do not work on real-world systems: what works for you might not work for others in similar but different conditions.

There is a special case of a hash cluster called a single table hash cluster. This is an optimized version of the general hash cluster we’ve already looked at. It supports only one table in the cluster at a time (you have to DROP the existing table in a single table hash cluster before you can create another). Additionally, if there is a one-to-one mapping between hash keys and data rows, the access to the rows is somewhat faster as well. These hash clusters are designed for those occasions when you want to access a table by primary key and do not care to cluster other tables with it. If you need fast access to an employee record by EMPNO, a single table hash cluster might be called for. I did the preceding test on a single table hash cluster as well and found the performance to be even better than just a hash cluster. You could even go a step further with this example and take advantage of the fact that Oracle will allow you to write your own specialized hash function (instead of using the default one provided by Oracle). You are limited to using only the columns available in the table, and you may use only the Oracle built-in functions (e.g., no PL/SQL code) when writing these hash functions. By taking advantage of the fact that OBJECT_ID is a number between 1 and 75,000 in the preceding example, I made my hash function simply be the OBJECT_ID column itself. In this fashion, I am guaranteed to never have a hash collision. Putting it all together, I’ll create a single table hash cluster with my own hash function via

I’ve simply added the keywords SINGLE TABLE to make it a single table hash cluster. My HASH IS clause uses the HASH_KEY cluster key in this case. This is a SQL function, so I could have used trunc(mod(hash_key/324+278,555)/abs(hash_key+1)) if I wanted (not that this is a good hash function—it just demonstrates that we can use a complex function there if we wish). I used a NUMBER(10) instead of just a number. Since the hash value must be an integer, it cannot have any fractional components. Then, I create the table in that cluster to build the hashed table:

Note the use of the CAST built-in function to make the datatype of OBJECT_ID be what it must be. I ran the test as before (three runs of each block), and this time the runstats output was consistently even more positive:

Run1 ran in 99 cpu hsecs
Run2 ran in 132 cpu hsecs
run 1 ran in 75% of the time

This single table hash cluster required even less latching into the buffer cache to process (it can stop looking for data sooner, and it has more information). As a result, the TKPROF report shows a measurable decrease in CPU utilization this time around:

Leave a Reply Cancel reply