Hash Table

Prerequisites

Singly Linked List and Malloc

Check out my article on Malloc here

What is a hash table?

A hash table is a hash map, that maps keys to values, just like a dictionary in other programming languages such as Python. It uses a hash function to compute the index also called hash code into slots from which desired values can be found. A hash function is used to map data in the hash table. The hash function is a regular function that takes a string as an arg and returns an integer after running an algorithm on the string.

To implement a collision-free hash table, we’ll need three things

The normally distributed hash function
Arrays data structure and
Linked List

However, before we move on to the implementation let’s understand collision; Collision is a situation that occurs when two or more pieces of keys run through the hash function and yields the same hash code or index number, hence the index would have already been allocated to the first key: value.

There are couple ways of resolving the collision, but here we’ll be using the chaining method which leads to the use of a data structure linked list.

Ok, that's that…. let's write our hash function

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// data types and structure
typedef char *string;

/*Table size, the max number of item we can store in the hash table 
before collision
collision occurs when two pieces of data when ran through 
the hash function yield the same hash code, hence different value 
will  be mapped to the same hash codes or index*/
#define TAB_SIZE 5

/*Hash_function*/
// we need this to generate our hash code which is the key index
unsigned int hash_func(string key){
 unsigned int result =0;
 int index, len = 0;
 len = strlen(key);
 for (int i = 0; i < len; i++){
  result += (key[i]);
 }

 index = result % TAB_SIZE;
 return index;
}

int main(void)
{
 // testing our hash function
 int key = 0;
 key = hash_func("Ayo");
 printf(" key for Ayo is :%d\n", key);
 return 0;
}

From the above code, we defined our hash function, which we’ll use to generate our key index to map data to, the function returns an unsigned int and takes a string argument which is char\,* we looped through all the array of chars and sum up all their ASCII value to result variable and then get the modulus of the result by the TAB_SIZE, and assign the result to the index which is then returned.

key for Ayo is :2

Note: If you should run the hash function on “Ayo” 1 million times, you should always get 2.

Implementation of a hash table create function

This function is what we’ll use to create our hash table.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// The hash_node for the linked list that will be inserted into the arrays of
// linked list
typedef struct hash_node_p{
 string key;
 string value;
 struct hash_node_p *next;
}hash_node_t;

/*Arrays of linked list struct
*/
typedef struct hash_ll_array_t{
 hash_node_t **array ;// array of the linked list
} hash_ll_array_s;

hash_ll_array_s *hash_create(void){
 hash_ll_array_s *hashtable = malloc(sizeof(hash_ll_array_s)); // creating mem for the arrays of ll
 if (hashtable == NULL)
  return (NULL);
 hashtable->array = malloc(sizeof(hash_node_t *) * TAB_SIZE);  // creating mem for each linked list in the array

 //  now lets make all nodes null to avoid garbage values 
 for (int i = 0; i < TAB_SIZE; i++){
  hashtable->array[i] = NULL;
 }
 return hashtable;
}

So this is the hash create function, I’m guessing if we’re familiar with the above prerequisites the code will seem clear, but let me make it clearer.

The first data structure hash_node_t is the linked list struct that has three placeholders or spots for the key, value and the next node. The second data structure hash_ll_array_s is the data structure for the arrays of linked lists, so this array can hold as many as possible linked list.

The hash_create function returns hash_ll_array_s struct and takes no argument, inside the function memory of the size of the hash_ll_array_s struct and size of hash_node struct was allocated, so will get memories to store linked list nodes and its data. The last loop ensures that all the linked list first nodes in the array are set to null.

Next is a function to set the key, value pair

hash_node_t *key_value_set(string key, string value){
 hash_node_t *node = malloc(sizeof(hash_node_t));
 if (node == NULL)
  return (NULL);

 node->key = key;
 node->value = value;

 node->next = NULL;
 return node;
}

This function returns an updated node, allocates memory for the node, checks if memory allocations fail, assigns the key and value arguments to the node key and value respectively, and then sets the next node to be null.

The map data function will be used to set data to our hashtables based on the respective index generated with the hash function.

void map_data(hash_ll_array_s *hashtable, string key, string value){
 unsigned int index = hash_func(key);

 // giving the index to a node 
 hash_node_t *node = hashtable->array[index];

 if (node == NULL) {
  // if null just put key and value
  hashtable->array[index] = key_value_set(key, value);
  return;
 }

 hash_node_t *head;

 /*While the node is not null, that is when there is a collision
 we initialize the chaining method of resolving collision
 */
 while (node != NULL){
  // check if key exist already
  if (strcmp(node->key, key) == 0){
   node->value = value;
   return;
  }
  head = node;
  node = head->next;
 }
 // if no key match 
 head->next = key_value_set(key, value);
}

Alright, let's talk about the chaining method before I walk us through the code,

The chaining method of resolving collision works in such a way that, when the index has already been mapped with data instead of resolving using linear probing, that is checking the next to find an empty spot which might even result in another problem called clustering, it goes to the original index, which is going to be a linked list; the reason why we need to use a linked list, it compares the keys of the first node of the linked list, if true it reassign value, else if not the same key goes to next node and map the new data there. Now let’s picture this, got a code slot which is the index from the hash function, and want to map data to the slot but not empty, problem, what there? linked list, thank goodness! are the keys the same? no, well, I’ll just go to the next node and sit there.

The map function returns nothing, takes three args, assigns the index from the hash function to a node, and checks if a node in the slot is empty to map data, if not empty go to the node key and check if the same with the new one and if not just map the new data to the next node of the linked list in that same index slot.

Hash Table Look-Up Function

string hash_lookup(hash_ll_array_s *hashtable, string key){
 unsigned int index = hash_func(key);
 hash_node_t *node = hashtable->array[index];
 if (node == NULL)
  return NULL;
 while (node != NULL){
  if(strcmp(node->key, key) == 0){
   return node->value;
  }
  node = node->next;
 }
 return NULL;
}

This looks up the value of a key in the hash table, gets the key code, goes to the node index to compare if the passed key is the same as the key in the node key, if true returns the value, repeats this until the node is null and if no match returns null.

Display The Hash Table Function

void hash_table_show(hash_ll_array_s *hashtable){
 for (int i = 0; i < TAB_SIZE; i++){
  hash_node_t *node = hashtable->array[i];
  if (node == NULL)
   continue;
  printf("Index: %d ", i);
  while(node){
   printf("key, value is %s : %s ", node->key, node->value);
   node = node->next;
  }
  printf("\n");
 }
}

This function prints out the content of the hash table.

Delete Hash Table Function

void hash_delete(hash_ll_array_s *hashtable){
 hash_node_t *node, *tmp;
// if the array is empty
 if (hashtable->array == NULL)
 {
  free(hashtable);
  return;
 }

 for (int i = 0; i < TAB_SIZE; i++)
 {
  node = hashtable->array[i];
  while (node)
  {
   tmp = node;
   node = node->next;
   free(tmp);
  }
 }
 free(hashtable->array);
 free(hashtable);
}

This frees all memory allocated for the hash table, the first check, checks if there is no linked list in the array, The loop goes through all the linked list nodes in the array and deletes them one by one and when done, deletes the array and the hashtable itself.

Sum it all in code

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// data type and data Strucures
typedef char *string;

// The hash_node for the linked list that will be inserted into the arrays of
// linked list
typedef struct hash_node_p{
 string key;
 string value;
 struct hash_node_p *next;
}hash_node_t;

/*Arrays of linked list struct
*/
typedef struct hash_ll_array_t{
 hash_node_t **array ;// array of the linked list
} hash_ll_array_s;

/*Table size, the max number of item we can store in the hash table 
before collision
collision occurs when two pieces of data when ran through 
the hash function yield the same hash code, hence different value 
will  be mapped to the same hash codes or index*/
#define TAB_SIZE 5

/*Hash_function*/
// we need this to generate our hash code which is the key index

unsigned int hash_func(string key){
 unsigned int result =0;
 int index, len = 0;
 len = strlen(key);
 for (int i = 0; i < len; i++){
  result += (key[i]);
 }

 index = result % TAB_SIZE;
 return index;
}

/*now we've created how hash_function
*/

/*lets now create the hash_table_create function
to do that we need the help of two data structures arrays and 
singly linked list
*/

// Create the hash_table_create function, takes no arg

hash_ll_array_s *hash_create(void){
 hash_ll_array_s *hashtable = malloc(sizeof(hash_ll_array_s)); // creating mem for the arrays of ll
 if (hashtable == NULL)
  return (NULL);
 hashtable->array = malloc(sizeof(hash_node_t *) * TAB_SIZE);  // creating mem for each linked list in the array

 //  now lets make all nodes to null to avoid garbage values 
 for (int i = 0; i < TAB_SIZE; i++){
  hashtable->array[i] = NULL;
 }
 return hashtable;
}

hash_node_t *key_value_set(string key, string value){
 hash_node_t *node = malloc(sizeof(hash_node_t));
 if (node == NULL)
  return (NULL);
 node->key = key;
 node->value = value;

 node->next = NULL;
 return node;
}

void map_data(hash_ll_array_s *hashtable, string key, string value){
 unsigned int index = hash_func(key);

 // giving the index to a node 
 hash_node_t *node = hashtable->array[index];

 if (node == NULL) {
  // if null just put key and value
  hashtable->array[index] = key_value_set(key, value);
  return;
 }

 hash_node_t *head;

 /*While the node is not null, that is when the is a collision
 initialize the chaining method of resolving collision
 */
 while (node != NULL){
  // check if key exist already
  if (strcmp(node->key, key) == 0){
   node->value = value;
   return;
  }
  head = node;
  node = head->next;
 }
 // if no key match 
 head->next = key_value_set(key, value);
}

string hash_lookup(hash_ll_array_s *hashtable, string key){
 unsigned int index = hash_func(key);
 hash_node_t *node = hashtable->array[index];
 if (node == NULL)
  return NULL;
 while (node != NULL){
  if(strcmp(node->key, key) == 0){
   return node->value;
  }
  node = node->next;
 }
 return NULL;
}

void hash_table_show(hash_ll_array_s *hashtable){
 for (int i = 0; i < TAB_SIZE; i++){
  hash_node_t *node = hashtable->array[i];
  if (node == NULL)
   continue;
  printf("Index: %d ", i);
  while(node){
   printf("key, value is %s : %s ", node->key, node->value);
   node = node->next;
  }
  printf("\n");
 }

}

void hash_delete(hash_ll_array_s *hashtable){
 hash_node_t *node, *tmp;

 if (hashtable->array == NULL)
 {
  free(hashtable);
  return;
 }

 for (int i = 0; i < TAB_SIZE; i++)
 {
  node = hashtable->array[i];
  while (node)
  {
   tmp = node;
   node = node->next;
   free(tmp);
  }
 }
 free(hashtable->array);
 free(hashtable);
}

int main(void)
{
 int key = 0;
 key = hash_func("Ayo");
 printf(" key for Ayo is :%d\n", key);
 hash_ll_array_s *hashtable = hash_create();
 string keys[5] = {"Nigeria", "Kenya", "Ghana", "UK", "US"};
 string values[5] ={"Abuja", "Nairobi", "Accra", "London", "Washington DC"};
 for (int i = 0; i < 5; i++){
  map_data(hashtable, keys[i], values[i]);
 }
 printf("hash for US is %d\n", hash_func("US"));
    hash_table_show(hashtable);
    hash_delete(hashtable);
 return 0;
}

Output:

➜  practice git:(main) ✗ ./hash_table1                                                              
key for Ayo is :2
hash for US is 3
Index: 0 key, value is UK : London 
Index: 3 key, value is Nigeria : Abuja key, value is US : Washington DC 
Index: 4 key, value is Kenya : Nairobi key, value is Ghana : Accra

Memcheck output

➜  practice git:(main) ✗ valgrind ./hash_table1                                                     
==9960== Memcheck, a memory error detector
==9960== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9960== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==9960== Command: ./hash_table1
==9960== 
key for Ayo is :2
hash for US is 3
Index: 0 key, value is UK : London 
Index: 3 key, value is Nigeria : Abuja key, value is US : Washington DC 
Index: 4 key, value is Kenya : Nairobi key, value is Ghana : Accra 
==9960== 
==9960== HEAP SUMMARY:
==9960==     in use at exit: 0 bytes in 0 blocks
==9960==   total heap usage: 8 allocs, 8 frees, 1,192 bytes allocated
==9960== 
==9960== All heap blocks were freed -- no leaks are possible
==9960== 
==9960== For lists of detected and suppressed errors, rerun with: -s
==9960== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

References

Thats all for now..

Thanks for reading

Hash Table Implementation In C

Table of contents