How HashSet works internally in Java ?

In this tutorial we will see how java.util.HashSet works internally. It is very common Collection framework interview question. Some interviewer can even ask you to implement your own Set interface which behaves similar like a HashSet. In both cases you need to know internal working of HashSet or How HashSet works in Java. How will you make sure each and every element is unique without using Set interfaces or Classes that implements Set Interface.

HashSet uses HashMap internally to store Objects. When we create a HashSet Object, one HashMap gets created internally as a backing data structure for your HashSet. This HashMap object is used to store the elements we enter in the HashSet. The elements we add into HashSet are stored as keys of this HashMap object. The value associated with those keys will be a constant. Let's look at the code of HashSet to check what happens when we instantiate HashSet class.

private transient HashMap map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

/**
* Constructs a new, empty set; the backing HashMap instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
map = new HashMap<>();
}

So if we create a HashSet by the following code snippet :

Set set = new HashSet<>();

An HashMap Object will be created internally where the key are the values of the HashSet and the value is a dummy Object named PRESENT. The following diagram will present how set are represented by a backing HashMap :

Now let us check other constructors available in HashSet class.

1. Constructs a new, empty set; backed by a HashMap instance.Here the HashMap instance is created with a specified initial capacity and of default load factor (0.75).

/**
 * @param      initialCapacity   the initial capacity of the hash table
 * @throws     IllegalArgumentException if the initial capacity is less
 *             than zero
 */
public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

2. Constructs a new set with the elements in the specified collection passed as an argument in the constructor.  The HashMap is created with default load factor (0.75) and an initial capacity sufficient to contain the elements in the specified collection.

/**
 * @param c the collection whose elements are to be placed into this set
 * @throws NullPointerException if the specified collection is null
 */
public HashSet(Collection  c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

3.  Constructs a new, empty set; backed by a HashMap instance which has the specified initial capacity and the specified load factor.

/**
 * @param      initialCapacity   the initial capacity of the hash map
 * @param      loadFactor        the load factor of the hash map
 * @throws     IllegalArgumentException if the initial capacity is less
 *             than zero, or if the load factor is nonpositive
 */
public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}

How add() works in HashSet ?

As we already discussed that HashSet is backed by an internal HashMap so a call to add(Object) is delegate to put(Key, Value) internally, where Key is the Object we are trying to add in the HashSet and value is another object,  called PRESENT, which is a constant in java.util.HashSet as shown below :

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

As HashMap does not allow duplicate Keys, HashSet achieves its uniqueness behavior because of this. Now if we check HashMap class's put method's implementation, It says the following   

  1. If the key to be added is Unique then the Entry with the key is added to map and returns null.
  2. If Key to be added is not unique and already exist,put method simply replaces old value with the new value and returns old value against key.

 Please check how put method of HashMap works internally. Below is the source code of put() method of HashMap class.

/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with key, or
*         null if there was no mapping for key.
*         (A null return can also indicate that the map
*         previously associated null with key.)
*/
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

Now the add() method of HashSet,it has following code

public boolean add(E e) {
  return map.put(e, PRESENT)==null;
}

which says that,if put method of underlying HashMap returns null then return true,which means element has been successfully added to HashSet. And if we try to add duplicate Object in HashSet, put method will replace the old value with new value and return old value, which of-course will not be null,so add method will return false depicting that duplicate element has not been added.

How Objects are retrieved from HashSet ?

There is no get() method as such in HashSet class. A set is a collection of objects which treats a.equals(b) == true as duplicates, so it doesn't make sense to try to get the same object you already have.

Also HashSet is not an ordered collection, it does not preserve the insertion order. Which means that order in which elements are retrieved can different from order in which those elements are added in the HashSet. So it does not make sense to get element from HashSet on the basis of index like get(index i),which is used in case of ArrayList. Here ArrayList is an ordered collection backed by an array. Where getting an Object by index makes sense.

So how to retrieve elements from HashSet ?

We can retrieve elements of HashSet using an Iterator. Let us check how HashSet implements iterator() for retrieval of Objects.

/**

* Returns an iterator over the elements in this set.  The elements
* are returned in no particular order.
*
* @return an Iterator over the elements in this set
* @see ConcurrentModificationException
*/

public Iterator iterator() {
 return map.keySet().iterator();
}

It simply returns an iterator over the keySet() of the backing HashMap. Which are the actual Objects in the HashSet. Now we can iterate over the keySet() and check for the object required.

Now let us check an example code snippet of HashSet:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class SetExample {

    public static void main(String[] args)
  {

      Set set = new HashSet();
      
      // will return true as added successfully
      System.out.println(set.add("NINJA"));
      System.out.println(set.add("PANDA"));
      System.out.println(set.add("JAVA"));
      System.out.println(set.add("C++"));
      
      // will return false as duplicate
      System.out.println(set.add("NINJA"));
      
      // Now iterate over the HashSet
      System.out.println("PRINTING THE HASHSET");
      Iterator setItr = set.iterator();
      while(setItr.hasNext()){
          System.out.println(setItr.next());
      }
      
      // Now REMOVE AN Object from HashSet
      set.remove("PANDA");
      
      // Now printing the set again as a whole
      System.out.println(set);
  }
}

From the output of the program we can check the following important points:

  1. It prints true when the Object is added successfully to the HashSet and false whenever we have tried to add a duplicate Object.
  2. While printing the HashSet using an Iterator we can see, The insertion order of Objects in HashSet is different than the retrieval order. The String "NINJA" is added first to the HashSet but retrieved at the 3rd iteration.
true
true
true
true
false
PRINTING THE HASHSET
JAVA
C++
NINJA
PANDA
[JAVA, C++, NINJA]

 

 

 

CORE JAVA COLLECTIONS SET