Hashing

Hashing

hashing

Hashing is a technique to map things from one range to another range by using a hash function which process the input key and returns an hash value. Modulus operation can be used on hash value to get index in fixed range. Key value pair is stored at this returned index. Pigeon hole rule says if pigeons are more than the holes then some pigeon have to share holes. So whenever we try to map a larger range of keys into a fixed size hash array then modulus operation may return same index for two different hash values and hash function also can generate same hash value for two different keys that we call hash collision. In general hash function does not have any fixed output range so hash function tries to minimize the collision with better algorithm but for storing key-value pairs we have to use fixed size array. Second hash function modulus is used to map hash values generated by hash function to fixed range so collisions are unavoidable. Hash function which generates uniform hash values should be used to minimize collisions. 

Collision Handling:
1) Separate Chaining
2) Open Addressing  

Separate Chaining 

9a1f6-chaining                                                                                              

Separate chaining is very simple approach to deal with collisions just keep a linked list at each index and in case of collision add one more node. For accessing stored value first hash function and modulus will give index then search through whole list till key matches with the stored nodes or list ends.

Open Addressing                                                                                                  b2f61-openaddressing2b252812529                    
In open addressing method key and value pairs are stored in table itself. In case of collision we search for next empty slot to put the key value pair. This approach have limitation that we can not insert key if table is full, to insert more keys after getting table full a new big table can be allocated and old table can be copied to new bigger table. Searching has same approach as insert first go the index given by [hashValue % k] then keep traversing more slots till key matches or one round of table is complete.
There are many type of open addressing strategies:
a) Linear Probing
b) Quadratic Probing.

  Seperate Chaining Open Addressing
1. Chaining is Simple and easy to implement. Open Addressing requires more traversal of list during insert.
2. In chaining, Hash table never fills up, we can always add more elements to chain. In open addressing, table may become full.
3. Chaining is Less sensitive to the hash function or load factors. Open addressing requires extra care for to avoid clustering and load factor.
4. Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. Open addressing is used when the frequency and number of keys is known.
5. Cache performance of chaining is not good as keys are stored using linked list. Open addressing provides better cache performance as everything is stored in the same table.
6. Wastage of Space (Some Parts of hash table in chaining are never used). In Open addressing, a slot can be used even if an input doesn’t map to it.
7. Chaining uses extra space for links. No links in Open addressing

a) Linear Probing                                                                                                                     
b) Quadratic Probing                                                                                                         
c) Double Hash                                                                                                                       

Performance Analysis

work in progress.

Leave a comment