Index Clustered Tables Wrap-Up-Database Tables-1
By Michele leonti / August 24, 2021 / No Comments / Object Tables Wrap-Up, Oracle Certifications, Reverse Key Indexes, Setting PCTFREE Values
Clustered tables give you the ability to physically prejoin data together. You use clusters to store related data from many tables on the same database block. Clusters can help read-intensive operations that always join data together or access related sets of data (e.g., everyone in department 10). Clustered tables reduce the number of blocks that Oracle must cache.
Instead of keeping ten blocks for ten employees in the same department, Oracle will put them in one block and therefore increase the efficiency of your buffer cache. On the downside, unless you can calculate your SIZE parameter setting correctly, clusters may be inefficient with their space utilization and can tend to slow down DML-heavy operations.
In a perfect world, with nicely distributed hash key values and a hash function that distributes them evenly over all of the blocks allocated to the hash cluster, we can go straight from a query to the data with one I/O. In the real world, we will end up with more hash key values hashing to the same database block address than fit on that block. This will result in Oracle having to chain blocks together in a linked list to hold all of the rows that hash to this block. Now, when we need to retrieve the rows that match our hash key, we might have to visit more than one block.
Like a hash table in a programming language, hash tables in the database have a fixed size. When you create the table, you must determine the number of hash keys your table will have, forever. That does not limit the amount of rows you can put in there.
In Figure 10-9, we can see a graphical representation of a hash cluster with table EMP created in it. When the client issues a query that uses the hash cluster key in the predicate, Oracle will apply the hash function to determine which block the data should be in. It will then read that one block to find the data. If there have been many collisions, or the SIZE parameter to the CREATE CLUSTER was underestimated, Oracle will have allocated overflow blocks that are chained off the original block.
Figure 10-9. Depiction of a hash cluster
When you create a hash cluster, you’ll use the same CREATE CLUSTER statement you used to create the index cluster with different options. You’ll just be adding a HASHKEYS option to it to specify the size of the hash table. Oracle will take your HASHKEYS value and round it up to the nearest prime number; the number of hash keys will always be a prime. Oracle will then compute a value based on the SIZE parameter multiplied by the modified HASHKEYS value. It will allocate at least that much space in bytes for the cluster. This is a big difference from the preceding index cluster, which dynamically allocates space as it needs it. A hash cluster preallocates enough space to hold (HASHKEYS/trunc(blocksize/SIZE)) bytes of data. For example, if you set your SIZE to 1500 bytes and you have a 4KB block size, Oracle will expect to store two keys per block.
If you plan on having 1000 HASHKEYs, Oracle will allocate 500 blocks.
It is interesting to note that unlike a conventional hash table in a computer language, it is OK to have hash collisions—in fact, it is desirable in many cases. If you take the same DEPT/EMP example from earlier, you could set up a hash cluster based on the DEPTNO column. Obviously, many rows will hash to the same value, and you expect them to (they have the same DEPTNO). This is what the cluster is about in some respects: clustering like data together. This is why Oracle asks you to specify the HASHKEYs (how many department numbers you anticipate over time) and SIZE (what the size of the data is that will be associated with each department number). It allocates a hash table to hold HASHKEY number of departments of SIZE bytes each. What you do want to avoid is unintended hash collisions. It is obvious that if you set the size of the hash table to 1000 (really 1009, since the hash table size is always a prime number and Oracle rounds up for you), and you put 1010 departments in the table, there will be at least one collision (two different departments hashing to the same value). Unintended hash collisions are to be avoided, as they add overhead and increase the probability of blockchaining occurring.
To see what sort of space hash clusters take, we’ll use a small utility stored procedure SHOW_SPACE (for details on this procedure, see the “Setting Up Your Environment” section at the beginning of the book) that we’ll use in this chapter and in the next chapter. This routine just uses the DBMS_SPACE-supplied package to get details about the storage used by segments in the database.