Dynamic connectivity

The dynamic connectivity is a data structure that dynamically maintains information about the connected components of a graph. In our context the problem specification is as follows: The input is a sequence of pairs of integers, where each integer represents an object of some type and we are to interpret that the pair p q as meaning p is connected to q. The assumption is that is connected to is an equivalence relation:

  • symmetric: if p is connected to q, then q is connected to p.
  • transitive: if p is connected to q and q is connected to r, then p is connected to r.
  • reflexive: p is connected to p.

An equivalence relation partitions the objects into equivalence classes or connected components.

For simplicity the first input will be the total number of elements, an example of the content of a possible input can be found on the data directory from the repository:

10
4 3
3 8
6 5
9 4
2 1
8 9
5 0
7 2
6 1
1 0
6 7

We are going to resolve the stated problem using four different approaches based on the Algorithms book each of those with different performance, the details of each of them are described in their own sections:

There is a UnionFind trait which defines the basic interface needed to resolve the problem, it basically defines three functions:

  • union: used to join two elements if they are in different components.
  • find: it returns the component identifier for a given element, its meaning depends on the algorithm.
  • components: it returns the current number of components.
pub use weighted_quick_union_with_path_compression::WeightedQuickUnionWithPathCompression;

pub trait UnionFind: Display {
    /// Merges two components if the two elements are in different components.
    fn union(&mut self, p: usize, q: usize);

    /// Returns the component identifier for a given element.
    fn find(&mut self, i: usize) -> usize;

    /// Returns the number of components.
    fn components(&self) -> usize;
}

To execute the different implementation of those algorithms there is a union_find CLI tool that, once compiled with cargo build, can be used to test each of those algorithms:

$ ./target/debug/union_find --help
Usage: union_find [OPTIONS] <ALGORITHM>

Arguments:
  <ALGORITHM>
          What algorithm to use

          Possible values:
          - quick-find:                                 Execute using quick find
          - quick-union:                                Execute using quick union
          - weighted-quick-union:                       Execute using weighted quick union
          - weighted-quick-union-with-path-compression: Execute using weighted quick union with path compression

Options:
  -d, --debug
          Whether to enable debug while processing

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

In the data directory there is a file that can be used as input for the command but the book's website also offers bigger files that are not in the repository for space efficiency reasons, see some execution examples:

$ ./target/debug/union_find quick-find < data/tinyUF.txt
there are 2 components

$ wget -qO - https://algs4.cs.princeton.edu/15uf/largeUF.txt | ./target/debug/union_find weighted-quick-union
there are 6 components

$ ./target/debug/union_find --debug quick-find <<EOF
5
1 0
2 1
3 4
EOF
p: 1, q: 0, Components: 4
  0  1  2  3  4
  0  0  2  3  4
p: 2, q: 1, Components: 3
  0  1  2  3  4
  0  0  0  3  4
p: 3, q: 4, Components: 2
  0  1  2  3  4
  0  0  0  4  4
there are 2 components

Summary

Algorithmunionfind
Quick findN1
Quick uniontree heighttree height
Weighted quick union
Weighted quick union with path compression1see note1see note
1

The amortized cost per operation for this algorithm is known to be bounded by a function known as the inverse Ackermann function and is less than 5 for any conceivable value of n that arises in practice.