Union Find / Disjoint Set data structure C++ implementation

A disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It supports two operations :

  1. Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
  2. Union: Join two subsets into a single subset.

Space Complexity: O(N) for N elements

Time Complexity: O(N + M lg*N) for N elements and M queries, where lg* is the inverse of Ackermann’s function. For all practical applications, lg*N <= 5

For a complete tutorial on disjoint set data structure, view the topcoder tutorial.

For an application of union find data structure, view Kruskal’s MST algorithm.

Take a look at the C++ implementation.

Reference: Algorithms 4th Edition by Robert Sedgewick and Kevin Wayne

6 thoughts on “Union Find / Disjoint Set data structure C++ implementation

  1. Pingback: Kruskal’s Minimum Spanning Tree Algorithm | Everything Under The Sun

  2. It seems like your find () member function is changing the id[] array by pointing it to root…should it really do that in there? It does not seem to fit the semantics of find(), where the object should remain unchanged… Anyway, I was looking for a simple implementation of UnionFind, and I like yours

    • Yes, that’s intentional and is called path compression and No, it does not disturb the semantics of find(). find() was supposed to give a representative node for a group and it still does that. It is allowed to modify its internal representation of groups in order to do its job better. If we did not change id[] array somewhere, then find() could take linear time in the worst case for a single query. With this change (combined with weighted union), it takes log*N, which is <= 5 for all practical values of N.

  3. My last comment was incomplete, what I meant is I am doing my repointing to root in the merge subroutine, and I think that’s a better place to put it

    • I suppose you could but this is a pretty standard implementation. I don’t think we have to worry about the semantics of find() because it only operates on internal data and is transparent to users.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s