Dynamic connectivity
The dynamic connectivity is a data structure that dynamically maintains information about the connected components of a graph. In our context the problem specification is as follows: The input is a sequence of pairs of integers, where each integer represents an object of some type and we are to interpret that the pair p q as meaning p is connected to q. The assumption is that is connected to is an equivalence relation:
- symmetric: if
pis connected toq, thenqis connected top. - transitive: if
pis connected toqandqis connected tor, thenpis connected tor. - reflexive:
pis connected top.
An equivalence relation partitions the objects into equivalence classes or connected components.
For simplicity the first input will be the total number of elements, an example of the content of a possible input can be found on the data directory from the repository:
10
4 3
3 8
6 5
9 4
2 1
8 9
5 0
7 2
6 1
1 0
6 7
We are going to resolve the stated problem using four different approaches based on the Algorithms book each of those with different performance, the details of each of them are described in their own sections:
There is a UnionFind trait which defines the basic interface needed to resolve the problem, it basically defines three functions:
union: used to join two elements if they are in different components.find: it returns the component identifier for a given element, its meaning depends on the algorithm.components: it returns the current number of components.
pub use weighted_quick_union_with_path_compression::WeightedQuickUnionWithPathCompression;
pub trait UnionFind: Display {
/// Merges two components if the two elements are in different components.
fn union(&mut self, p: usize, q: usize);
/// Returns the component identifier for a given element.
fn find(&mut self, i: usize) -> usize;
/// Returns the number of components.
fn components(&self) -> usize;
}
To execute the different implementation of those algorithms there is a union_find CLI tool that, once compiled with cargo build, can be used to test each of those algorithms:
$ ./target/debug/union_find --help
Usage: union_find [OPTIONS] <ALGORITHM>
Arguments:
<ALGORITHM>
What algorithm to use
Possible values:
- quick-find: Execute using quick find
- quick-union: Execute using quick union
- weighted-quick-union: Execute using weighted quick union
- weighted-quick-union-with-path-compression: Execute using weighted quick union with path compression
Options:
-d, --debug
Whether to enable debug while processing
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
In the data directory there is a file that can be used as input for the command but the book's website also offers bigger files that are not in the repository for space efficiency reasons, see some execution examples:
$ ./target/debug/union_find quick-find < data/tinyUF.txt
there are 2 components
$ wget -qO - https://algs4.cs.princeton.edu/15uf/largeUF.txt | ./target/debug/union_find weighted-quick-union
there are 6 components
$ ./target/debug/union_find --debug quick-find <<EOF
5
1 0
2 1
3 4
EOF
p: 1, q: 0, Components: 4
0 1 2 3 4
0 0 2 3 4
p: 2, q: 1, Components: 3
0 1 2 3 4
0 0 0 3 4
p: 3, q: 4, Components: 2
0 1 2 3 4
0 0 0 4 4
there are 2 components
Summary
| Algorithm | union | find |
|---|---|---|
| Quick find | N | 1 |
| Quick union | tree height | tree height |
| Weighted quick union | ||
| Weighted quick union with path compression | 1see note | 1see note |
The amortized cost per operation for this algorithm is known to be bounded by a function known as the inverse Ackermann function and is less than 5 for any conceivable value of n that arises in practice.