Dynamic connectivity

The dynamic connectivity is a data structure that dynamically maintains information about the connected components of a graph. In our context the problem specification is as follows: The input is a sequence of pairs of integers, where each integer represents an object of some type and we are to interpret that the pair p q as meaning p is connected to q. The assumption is that is connected to is an equivalence relation:

symmetric: if p is connected to q, then q is connected to p.
transitive: if p is connected to q and q is connected to r, then p is connected to r.
reflexive: p is connected to p.

An equivalence relation partitions the objects into equivalence classes or connected components.

For simplicity the first input will be the total number of elements, an example of the content of a possible input can be found on the data directory from the repository:

We are going to resolve the stated problem using four different approaches based on the Algorithms book each of those with different performance, the details of each of them are described in their own sections:

There is a UnionFind trait which defines the basic interface needed to resolve the problem, it basically defines three functions:

union: used to join two elements if they are in different components.
find: it returns the component identifier for a given element, its meaning depends on the algorithm.
components: it returns the current number of components.

pub use weighted_quick_union_with_path_compression::WeightedQuickUnionWithPathCompression;

pub trait UnionFind: Display {
    /// Merges two components if the two elements are in different components.
    fn union(&mut self, p: usize, q: usize);

    /// Returns the component identifier for a given element.
    fn find(&mut self, i: usize) -> usize;

    /// Returns the number of components.
    fn components(&self) -> usize;
}

To execute the different implementation of those algorithms there is a union_find CLI tool that, once compiled with cargo build, can be used to test each of those algorithms:

$ ./target/debug/union_find --help
Usage: union_find [OPTIONS] <ALGORITHM>

Arguments:
  <ALGORITHM>
          What algorithm to use

          Possible values:
          - quick-find:                                 Execute using quick find
          - quick-union:                                Execute using quick union
          - weighted-quick-union:                       Execute using weighted quick union
          - weighted-quick-union-with-path-compression: Execute using weighted quick union with path compression

Options:
  -d, --debug
          Whether to enable debug while processing

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

In the data directory there is a file that can be used as input for the command but the book's website also offers bigger files that are not in the repository for space efficiency reasons, see some execution examples:

$ ./target/debug/union_find quick-find < data/tinyUF.txt
there are 2 components

$ wget -qO - https://algs4.cs.princeton.edu/15uf/largeUF.txt | ./target/debug/union_find weighted-quick-union
there are 6 components

$ ./target/debug/union_find --debug quick-find <<EOF
5
1 0
2 1
3 4
EOF
p: 1, q: 0, Components: 4
  0  1  2  3  4
  0  0  2  3  4
p: 2, q: 1, Components: 3
  0  1  2  3  4
  0  0  0  3  4
p: 3, q: 4, Components: 2
  0  1  2  3  4
  0  0  0  4  4
there are 2 components

Summary

Algorithm	union	find
Quick find	N	1
Quick union	tree height	tree height
Weighted quick union	$ln N$	$ln N$
Weighted quick union with path compression	¹see note	¹see note

The amortized cost per operation for this algorithm is known to be bounded by a function known as the inverse Ackermann function and is less than 5 for any conceivable value of n that arises in practice.

RSPG: Rust Playground

Dynamic connectivity

Summary