A disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It supports two operations :

**Find**: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
**Union**: Join two subsets into a single subset.

**Space Complexity:** O(N) for N elements

**Time Complexity:** O(N + M lg*N) for N elements and M queries, where lg* is the inverse of Ackermann’s function. For all practical applications, lg*N <= 5

For a complete tutorial on disjoint set data structure, view the topcoder tutorial.

For an application of union find data structure, view Kruskal’s MST algorithm.

Take a look at the C++ implementation.

**Reference:** Algorithms 4th Edition by Robert Sedgewick and Kevin Wayne

28.635308
77.224960

### Like this:

Like Loading...

*Related*

Pingback: Kruskal’s Minimum Spanning Tree Algorithm | Everything Under The Sun

It seems like your find () member function is changing the id[] array by pointing it to root…should it really do that in there? It does not seem to fit the semantics of find(), where the object should remain unchanged… Anyway, I was looking for a simple implementation of UnionFind, and I like yours

Yes, that’s intentional and is called path compression and No, it does not disturb the semantics of find(). find() was supposed to give a representative node for a group and it still does that. It is allowed to modify its internal representation of groups in order to do its job better. If we did not change id[] array somewhere, then find() could take linear time in the worst case for a single query. With this change (combined with weighted union), it takes log*N, which is <= 5 for all practical values of N.

My last comment was incomplete, what I meant is I am doing my repointing to root in the merge subroutine, and I think that’s a better place to put it

I suppose you could but this is a pretty standard implementation. I don’t think we have to worry about the semantics of find() because it only operates on internal data and is transparent to users.

Thank you so much for this post! Really helpful! ðŸ˜€

I also found this link helpful for Union Find Data Structure: https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf

Hello. I was wondering if there is a way to do it without knowing the number of elements N beforehand.

Yeah. You can use a vector instead of an array and for each new element, insert its index in the vector. Everything else works out.