Hash Clustered Tables Wrap-Up-Database Tables

That is the nuts and bolts of a hash cluster. Hash clusters are similar in concept to index clusters, except a cluster index is not used. The data is the index in this case. The cluster key is hashed into a block address, and the data is expected to be there. The important things to understand about hash clusters are as follows:

•\ The hash cluster is allocated right from the beginning. Oracle will take your HASHKEYS/trunc(blocksize/SIZE) and allocate and format that space right away. As soon as the first table is put in that cluster, any full scan will hit every allocated block. This is different from every other table in this respect.

•\ The number of HASHKEYs in a hash cluster is a fixed size. You cannot change the size of the hash table without a rebuild of the cluster. This does not in any way limit the amount of data you can store in this cluster; it simply limits the number of unique hash keys that can be generated for this cluster. This may affect performance due to unintended hash collisions if the value was set too low.

•\ Range scanning on the cluster key is not available. Predicates such as WHERE cluster_key BETWEEN 50 AND 60 cannot use the hashing algorithm. There are an infinite number of possible values between 50 and 60, and the server would have to generate them all to hash each one and see if there was any data there. This is not possible. The cluster will be full scanned if you use a range on a cluster key and have not indexed it using a conventional index.

Hash clusters are suitable in the following situations:

•\ You know with a good degree of accuracy how many rows the table will have over its life, or you have some reasonable upper bound.Getting the size of the HASHKEYs and SIZE parameters right is crucial to avoid a rebuild.

•\ DML, especially inserts, is light with respect to retrieval. This means you have to balance optimizing data retrieval with new data creation. Light inserts might be 100,000 per unit of time for one person and 100 per unit of time for another—all depending on their data retrieval patterns. Updates do not introduce significant overhead, unless you update the HASHKEY, which would not be a good idea as it would cause the row to migrate.

•\ You access the data by the HASHKEY value constantly. For example, say you have a table of parts, and these parts are accessed by part number. Lookup tables are especially appropriate for hash clusters.

Leave a Reply Cancel reply