Algorithms and Data structures

This section is dedicated to the implementation of common algorithms and data structures in Rust as a process to learning the programming language.

All the code is based on the following books:

Dynamic connectivity

The dynamic connectivity is a data structure that dynamically maintains information about the connected components of a graph. In our context the problem specification is as follows: The input is a sequence of pairs of integers, where each integer represents an object of some type and we are to interpret that the pair p q as meaning p is connected to q. The assumption is that is connected to is an equivalence relation:

  • symmetric: if p is connected to q, then q is connected to p.
  • transitive: if p is connected to q and q is connected to r, then p is connected to r.
  • reflexive: p is connected to p.

An equivalence relation partitions the objects into equivalence classes or connected components.

For simplicity the first input will be the total number of elements, an example of the content of a possible input can be found on the data directory from the repository:

10
4 3
3 8
6 5
9 4
2 1
8 9
5 0
7 2
6 1
1 0
6 7

We are going to resolve the stated problem using four different approaches based on the Algorithms book each of those with different performance, the details of each of them are described in their own sections:

There is a UnionFind trait which defines the basic interface needed to resolve the problem, it basically defines three functions:

  • union: used to join two elements if they are in different components.
  • find: it returns the component identifier for a given element, its meaning depends on the algorithm.
  • components: it returns the current number of components.
pub use weighted_quick_union_with_path_compression::WeightedQuickUnionWithPathCompression;

pub trait UnionFind: Display {
    /// Merges two components if the two elements are in different components.
    fn union(&mut self, p: usize, q: usize);

    /// Returns the component identifier for a given element.
    fn find(&mut self, i: usize) -> usize;

    /// Returns the number of components.
    fn components(&self) -> usize;
}

To execute the different implementation of those algorithms there is a union_find CLI tool that, once compiled with cargo build, can be used to test each of those algorithms:

$ ./target/debug/union_find --help
Usage: union_find [OPTIONS] <ALGORITHM>

Arguments:
  <ALGORITHM>
          What algorithm to use

          Possible values:
          - quick-find:                                 Execute using quick find
          - quick-union:                                Execute using quick union
          - weighted-quick-union:                       Execute using weighted quick union
          - weighted-quick-union-with-path-compression: Execute using weighted quick union with path compression

Options:
  -d, --debug
          Whether to enable debug while processing

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

In the data directory there is a file that can be used as input for the command but the book's website also offers bigger files that are not in the repository for space efficiency reasons, see some execution examples:

$ ./target/debug/union_find quick-find < data/tinyUF.txt
there are 2 components

$ wget -qO - https://algs4.cs.princeton.edu/15uf/largeUF.txt | ./target/debug/union_find weighted-quick-union
there are 6 components

$ ./target/debug/union_find --debug quick-find <<EOF
5
1 0
2 1
3 4
EOF
p: 1, q: 0, Components: 4
  0  1  2  3  4
  0  0  2  3  4
p: 2, q: 1, Components: 3
  0  1  2  3  4
  0  0  0  3  4
p: 3, q: 4, Components: 2
  0  1  2  3  4
  0  0  0  4  4
there are 2 components

Summary

Algorithmunionfind
Quick findN1
Quick uniontree heighttree height
Weighted quick union
Weighted quick union with path compression1see note1see note
1

The amortized cost per operation for this algorithm is known to be bounded by a function known as the inverse Ackermann function and is less than 5 for any conceivable value of n that arises in practice.

Quick find

Quick find, as its name implies, favors having a quick find over union. The data structure keeps an ids element indexed vector used to determine when two elements are in the same component. There is also a count variable to keep the total of connected components:

#[derive(Debug)]
/// QuickFind data structure for solving the dynamic connectivity problem.
pub struct QuickFind {
    /// Keeps the number of connected components
    count: usize,
    /// An element-indexed vector to represent the components. The name of one of the elements in a
    /// component will be used as the component identifier.
    ids: Vec<usize>,
}

Two elements p and q are connected iff ids[p] == ids[q] as we can see in the following code the find implementation is quick because it just returns the element value in the ids vector and the union needs to scan the whole ids vector to properly update all references to the p component with the q one, for each input pair meaning quadratic-time processing:

impl UnionFind for QuickFind {
    /// Merges two components if the two elements are in different components.
    fn union(&mut self, p: usize, q: usize) {
        let p_at = self.find(p);
        let q_at = self.find(q);

        if p_at != q_at {
            for i in &mut self.ids {
                if *i == p_at {
                    *i = q_at;
                }
            }
            self.count -= 1;
        }
    }

    /// Returns the component identifier for a given element.
    fn find(&mut self, i: usize) -> usize {
        self.ids[i]
    }

    /// Returns the number of components.
    fn components(&self) -> usize {
        self.count
    }
}

The details of how the algorithm process looks like is described in its unit test:

#[test]
fn test_quick_find() {
    let mut alg = QuickFind::new(10);

    // Initial state: all elements roots are themselves and their root sizes are 1. The total
    // components are 10.
    let mut expected_find_pos = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    //
    assert_eq!(alg.components(), 10);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 4 and 3, both elements have different entry values, 4 and 3 respectively, so they are
    // in different components and can be connected so the algorithms changes all references to
    // p(4) to q(3). The total components are 9.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 3, 3, 5, 6, 7, 8, 9];
    //                            ^  ^
    alg.union(4, 3);
    assert_eq!(alg.components(), 9);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 3 and 8, their entry values are 3 and 8 so all references to p(3) which is 3 are
    // changed to the value of q(8) which is 8. The total components are 8.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 8, 8, 5, 6, 7, 8, 9];
    //                            ^  ^           ^
    alg.union(3, 8);
    assert_eq!(alg.components(), 8);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 5, their entry values are 6 and 5 so all references to p(6) are changed to the
    // value of q(5) which is 5. The total components are 7.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 8, 8, 5, 5, 7, 8, 9];
    //                                  ^  ^
    alg.union(6, 5);
    assert_eq!(alg.components(), 7);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 9 and 4, their entry values are 9 and 8 so all references to p(9) are changed to the
    // value of q(4) which is 8. The total components are 6.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 8, 8, 5, 5, 7, 8, 8];
    //                               ^              ^
    alg.union(9, 4);
    assert_eq!(alg.components(), 6);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 2 and 1, their entry values are 2 and 1 so all references to p(2) are changed to the
    // value of q(1) which is 1. The total components are 5.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 1, 8, 8, 5, 5, 7, 8, 8];
    //                      ^  ^
    alg.union(2, 1);
    assert_eq!(alg.components(), 5);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 8 and 9, their entry values are the same so nothing else is needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 1, 8, 8, 5, 5, 7, 8, 8];
    //                                           ^  ^
    alg.union(8, 9);
    assert_eq!(alg.components(), 5);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 5 and 0, their entry values are 5 and 0 so all references to p(5) are changed to the
    // value of q(0) which is 0. The total components are 4.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 1, 8, 8, 0, 0, 7, 8, 8];
    //                   ^              ^  ^
    alg.union(5, 0);
    assert_eq!(alg.components(), 4);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 7 and 2, their entry values are 7 and 1 so all references to p(7) are changed to the
    // value of q(2) which is 1. The total components are 3.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 1, 8, 8, 0, 0, 1, 8, 8];
    //                         ^              ^
    alg.union(7, 2);
    assert_eq!(alg.components(), 3);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 1, their entry values are 0 and 1 so all references to p(6) which is 0 are
    // changed to the value of q(1) which is 1. The total components are 2.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [1, 1, 1, 8, 8, 1, 1, 1, 8, 8];
    //                   ^  ^           ^  ^
    alg.union(6, 1);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 1 and 0, their entry values are the same 1 so nothing else is needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [1, 1, 1, 8, 8, 1, 1, 1, 8, 8];
    //                   ^  ^
    alg.union(1, 0);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 7, like in the last iteration their roots are the same 1 so nothing else is
    // needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [1, 1, 1, 8, 8, 1, 1, 1, 8, 8];
    //                                     ^  ^
    alg.union(6, 7);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }
}

The full source code can be found here.

Quick union

Quick union, at the contrary of quick find, favors having a quick union over find. It uses the same data structure as Quick find speeding up the union function. Each ids entry will be the name of another element in the same component (also possibly itself).

The find function will follow each ids value until reaching the root, which is the element that links to itself. Two elements are in the same component if they share the same root. The union function will just obtain the roots from the two components passed as arguments and, if they are different, it will link them assigning one of the roots ancestor the other one.

impl UnionFind for QuickUnion {
    /// Merges two components if the two elements are in different components.
    fn union(&mut self, p: usize, q: usize) {
        let p_root = self.find(p);
        let q_root = self.find(q);

        if p_root != q_root {
            self.ids[p_root] = q_root;
            self.count -= 1;
        }
    }

    /// Returns the component identifier for a given element.
    fn find(&mut self, mut i: usize) -> usize {
        while i != self.ids[i] {
            i = self.ids[i];
        }

        i
    }

    /// Returns the number of components.
    fn components(&self) -> usize {
        self.count
    }
}

The details of how the algorithm process looks like is described in its unit test:

#[test]
fn test_quick_union() {
    let mut alg = QuickUnion::new(10);

    // Initial state: all elements roots are themselves and their root sizes are 1. The total
    // components are 10.
    let mut expected_find_pos = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    //
    assert_eq!(alg.components(), 10);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 4 and 3, both elements have different roots, 4 and 3 respectively so they are in
    // different components and can be joined so the root of p(4) is updated to the root of q(3)
    // and the total components are 9.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 3, 3, 5, 6, 7, 8, 9];
    //                            ^  ^
    alg.union(4, 3);
    assert_eq!(alg.components(), 9);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 3 and 8, their roots are 3 and 8 respectively so like in the previous iteration the
    // root of p(3) is updated to the root of q(8). The total components are 8.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 8, 3, 5, 6, 7, 8, 9];
    //                            ^              ^
    alg.union(3, 8);
    assert_eq!(alg.components(), 8);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 5, their roots are 6 and 5 respectively so the root of p(6) is updated to the
    // root of q(5) which is 5. The total components are 7.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 8, 3, 5, 5, 7, 8, 9];
    //                                  ^  ^
    alg.union(6, 5);
    assert_eq!(alg.components(), 7);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 9 and 4, their roots are 9 and 8 respectively so the root of p(9) is udpated to the
    // root of q(4) which is 8. The total components are 6.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 8, 3, 5, 5, 7, 8, 8];
    //                            ^  ^              ^
    alg.union(9, 4);
    assert_eq!(alg.components(), 6);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 2 and 1, their roots are 2 and 1 respectively so the root of p(2) is updated to the
    // root of q(1) which is 1. The total components are 5.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 1, 8, 3, 5, 5, 7, 8, 8];
    //                      ^  ^
    alg.union(2, 1);
    assert_eq!(alg.components(), 5);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 8 and 9, their roots are the same so nothing else is needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 1, 8, 3, 5, 5, 7, 8, 8];
    //                                           ^  ^
    alg.union(8, 9);
    assert_eq!(alg.components(), 5);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 5 and 0, their roots are 5 and 0 respectively so the root of p(5) is updated to the
    // root of q(0) which is 0. The total components are 4.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 1, 8, 3, 0, 5, 7, 8, 8];
    //                   ^              ^
    alg.union(5, 0);
    assert_eq!(alg.components(), 4);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 7 and 2, their roots are 7 and 1 respectively so the root of p(7) is updated to the
    // root of q(2) which is 1. The total components are 3.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 1, 8, 3, 0, 5, 1, 8, 8];
    //                         ^              ^
    alg.union(7, 2);
    assert_eq!(alg.components(), 3);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 1, their roots are 0 and 1 respectively so the root of p(6), which is 0, is
    // updated to the root of q(1) which is 1. The total components are 2.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [1, 1, 1, 8, 3, 0, 5, 1, 8, 8];
    //                   ^  ^           ^  ^
    alg.union(6, 1);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 1 and 0, their roots are the same 1 so nothing else is needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [1, 1, 1, 8, 3, 0, 5, 1, 8, 8];
    //                   ^  ^
    alg.union(1, 0);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 7, like in the last iteration their roots are the same 1 so nothing else is
    // needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [1, 1, 1, 8, 3, 0, 5, 1, 8, 8];
    //                   ^  ^           ^  ^  ^
    alg.union(6, 7);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }
}

Although, initially, quick union seems faster than quick find because it does not need to completely traverse the entire vector on each union, it is not guaranteed so. There are worst case scenarios where the complexity is still quadratic N².

For example, having input pairs in the form of: 0 1, 0 2, 0 3, etc. we will end up with a tree after N-1 of such pairs with a height of N-1, with 0 linked to 1, 1 to 2 and so forth. The number of accesses to the vector for the 0 i union call is 2i + 2 (the element 0 is at depth i and the element i is at depth 0) thus the total number of vector accesses of the find function for these N pairs is: 1(1 + 2 + ... + N) which is ~N².

The full source code can be found here.

Weighted quick union

The weighted quick union, is a quick union variant that keeps track of the size of each tree so it can always connect the smaller tree to the larger one so avoiding worst cases. The tree height is substantially smaller guaranteeing logarithmic performance.

The algorithm introduces a new element-indexed vector to hold the total nodes count (szs) which is used by the union function to link the root of the smaller tree size to the root of the larger one:

#[derive(Debug)]
/// WeightedQuickUnion data structure for solving the dynamic connectivity problem.
pub struct WeightedQuickUnion {
    /// Keeps the number of connected components
    count: usize,
    /// An element-indexed vector to represent the components. The name of one of the elements in a
    /// component will be used as the component identifier.
    ids: Vec<usize>,
    /// An element-indexed vector to hold node counts.
    szs: Vec<usize>,
}

The find function remains the same but now the union function checks the sizes of the root elements in order to connect the smaller tree to the larger one:

impl UnionFind for WeightedQuickUnion {
    /// Merges two components if the two elements are in different components.
    fn union(&mut self, p: usize, q: usize) {
        let i = self.find(p);
        let j = self.find(q);

        if i != j {
            if self.szs[i] < self.szs[j] {
                self.ids[i] = j;
                self.szs[j] += self.szs[i];
            } else {
                self.ids[j] = i;
                self.szs[i] += self.szs[j];
            }
            self.count -= 1;
        }
    }

    /// Returns the component identifier for a given element.
    fn find(&mut self, mut i: usize) -> usize {
        while i != self.ids[i] {
            i = self.ids[i];
        }

        i
    }

    /// Returns the number of components.
    fn components(&self) -> usize {
        self.count
    }
}

The size of a tree is the number of nodes. The depth of a node tree is the number of hops to its root and the height of a tree is the maximum depth among its nodes.

Even for the worst cases, where the sizes of the trees to union are the equal the performance of the algorithm is logarithmic. The tree structure has a property that the depth of any node is at most . A detailed explanation of the algorithm process can be found in the unit test:

#[test]
fn test_weighted_quick_union() {
    let mut alg = WeightedQuickUnion::new(10);

    // Initial state: all elements roots are themselves and their root sizes are 1. The total
    // components are 10.
    let mut expected_find_pos = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    //
    //                   sizes: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    assert_eq!(alg.components(), 10);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 4 and 3, Both elements are roots and have the same size so q(3) is connected to the
    // root of p(4) and its size is added to the size of the root of p. The total components are 9.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 4, 4, 5, 6, 7, 8, 9];
    //                            ^  ^
    //           sizes: [1, 1, 1, 1, 2, 1, 1, 1, 1, 1]
    //                               ^
    alg.union(4, 3);
    assert_eq!(alg.components(), 9);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 3 and 8, their roots are 4 and 8 respectively and their root sizes are 2 and 1, the
    // size of the root of p(4) is greater than the size of the root of q (8) so the root of q(8)
    // gets connected to the root of p(4) and its size gets added to the size of the root of p(4).
    // The total components are 8.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 4, 4, 5, 6, 7, 4, 9];
    //                            ^              ^
    //           sizes: [1, 1, 1, 1, 3, 1, 1, 1, 1, 1]
    //                               ^
    alg.union(3, 8);
    assert_eq!(alg.components(), 8);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 5, their roots are themselves and their root sizes is 1, so q(5) is connected
    // to p(6) and the size of 6 is incremented by the size of 5. The total components are 7.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 4, 4, 6, 6, 7, 4, 9];
    //                                  ^  ^
    //           sizes: [1, 1, 1, 1, 3, 1, 2, 1, 1, 1]
    //                                     ^
    alg.union(6, 5);
    assert_eq!(alg.components(), 7);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 9 and 4, their roots are themselves and their root sizes are 1 and 3 respectively so
    // the root of p(9) is connected to the root of q(4) and its size is incremented by the size of
    // the root of p(9). The total components are 6.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 1, 2, 4, 4, 6, 6, 7, 4, 4];
    //                               ^              ^
    //           sizes: [1, 1, 1, 1, 4, 1, 2, 1, 1, 1]
    //                               ^
    alg.union(9, 4);
    assert_eq!(alg.components(), 6);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 2 and 1, their roots are themselves and their root sizes are 1 so q(1) is connected to
    // p(2) and the its size is incremented by one. The total components are 5.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 2, 2, 4, 4, 6, 6, 7, 4, 4];
    //                      ^  ^
    //           sizes: [1, 1, 2, 1, 4, 1, 2, 1, 1, 1]
    //                         ^
    alg.union(2, 1);
    assert_eq!(alg.components(), 5);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 8 and 9, their roots are the same: 4 so nothing else is needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [0, 2, 2, 4, 4, 6, 6, 7, 4, 4];
    //                                           ^  ^
    //           sizes: [1, 1, 2, 1, 4, 1, 2, 1, 1, 1]
    //
    alg.union(8, 9);
    assert_eq!(alg.components(), 5);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 5 and 0, their roots are 6 and 0 respectively and their root sizes 2 and 1 so the root
    // of q(0) is connected to the root of p(6) and its size is incremented by the size of the root
    // of q which is 1. The total components are 4.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [6, 2, 2, 4, 4, 6, 6, 7, 4, 4];
    //                   ^              ^
    //           sizes: [1, 1, 2, 1, 4, 1, 3, 1, 1, 1]
    //                                     ^
    alg.union(5, 0);
    assert_eq!(alg.components(), 4);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 7 and 2, both are roots but their sizes are 1 and 2 respectively so the root of p(7)
    // is connected to q(2) and its size is incremented by the size of the root of p. The total
    // components are 3.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [6, 2, 2, 4, 4, 6, 6, 2, 4, 4];
    //                         ^              ^
    //           sizes: [1, 1, 3, 1, 4, 1, 3, 1, 1, 1]
    //                         ^
    alg.union(7, 2);
    assert_eq!(alg.components(), 3);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 1, their roots are 6 and 2 and their root sizes are the same so the
    // root of q(1) which is 2 is connected to the root of p(6) which is 6 and its size is
    // incremented after summing up the size of the root of q(1) which is 3. The total components
    // are 2.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [6, 2, 6, 4, 4, 6, 6, 2, 4, 4];
    //                      ^  ^           ^
    //           sizes: [1, 1, 3, 1, 4, 1, 6, 1, 1, 1]
    //                                     ^
    alg.union(6, 1);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 1 and 0, their roots are the same 6 so nothing else is needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [6, 2, 6, 4, 4, 6, 6, 2, 4, 4];
    //                   ^  ^
    //           sizes: [1, 1, 3, 1, 4, 1, 6, 1, 1, 1]
    //
    alg.union(1, 0);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 7, like in the last iteration their roots are the same 6 so nothing else is
    // needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [6, 2, 6, 4, 4, 6, 6, 2, 4, 4];
    //                                     ^  ^
    //           sizes: [1, 1, 3, 1, 4, 1, 6, 1, 1, 1]
    alg.union(6, 7);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }
}

The full source code can be found here.

Weighted quick union with path compression

The last implementation is called weighted quick union with path compression which is the result of studying for looking for an algorithm that has guaranteed constant time. The idea of the algorithm is to have every node to link directly to the root of its tree without paying the cost of changing a large number of links as we did for the quick find algorithm.

The approach is to do it just for the nodes that we do examine. To do so we just add another loop to the find function that sets the ids entry corresponding to each node encountered along the way to link directly to the root, resulting on almost flattening the trees completely.

impl UnionFind for WeightedQuickUnionWithPathCompression {
    /// Merges two components if the two elements are in different components.
    fn union(&mut self, p: usize, q: usize) {
        let root_p = self.find(p);
        let root_q = self.find(q);

        if root_p != root_q {
            if self.szs[root_p] < self.szs[root_q] {
                self.ids[root_p] = root_q;
                self.szs[root_q] += self.szs[root_p];
            } else {
                self.ids[root_q] = root_p;
                self.szs[root_p] += self.szs[root_q];
            }
            self.count -= 1;
        }
    }

    /// Returns the component identifier for a given element.
    fn find(&mut self, mut i: usize) -> usize {
        let mut root = i;

        while root != self.ids[root] {
            root = self.ids[root];
        }

        while i != root {
            let newi = self.ids[i];
            self.ids[i] = root;
            i = newi;
        }

        root
    }

    /// Returns the number of components.
    fn components(&self) -> usize {
        self.count
    }
}

Weighted quick union with path compression is optimal but not quite constant time per operation, no algorithm can guarantee to perform each union find operation in amortized constant time. A detailed explanation of the process can be seen in the unit test but here we just show the section where the path compression takes place:

    // Union 6 and 1, their roots are 6 and 2 and their root sizes are the same so the
    // root of q(1) which is 2 is connected to the root of p(6) which is 6 and its size is
    // incremented after summing up the size of the root of q(1) which is 3. The total components
    // are 2.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [6, 2, 6, 4, 4, 6, 6, 2, 4, 4];
    //                      ^  ^           ^
    //           sizes: [1, 1, 3, 1, 4, 1, 6, 1, 1, 1]
    //                                     ^
    alg.union(6, 1);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 1 and 0, their roots are the same 6, but notice how the path compression logic of find
    // when evaluating p(1) changed its root to the root of the tree it belongs, which is 6. As the
    // roots are the same nothing else is needed.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [6, 6, 6, 4, 4, 6, 6, 2, 4, 4];
    //                   ^  ^
    //           sizes: [1, 1, 3, 1, 4, 1, 6, 1, 1, 1]
    //
    alg.union(1, 0);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

    // Union 6 and 7, like in the last iteration their roots are the same 6 so nothing else is
    // needed but the find logic updates the root of q to 6.
    //                   0  1  2  3  4  5  6  7  8  9
    expected_find_pos = [6, 6, 6, 4, 4, 6, 6, 6, 4, 4];
    //                                     ^  ^
    //           sizes: [1, 1, 3, 1, 4, 1, 6, 1, 1, 1]
    alg.union(6, 7);
    assert_eq!(alg.components(), 2);
    for (i, expected_find) in expected_find_pos.into_iter().enumerate() {
        assert_eq!(alg.ids[i], expected_find, "i={}", i);
    }

The full source code can be found here.

Sorting algorithms

Well sorting, you know? the process of rearranging an array of objects by its key. The elements must be comparable so satisfying total ordering:

  • reflexive: for all v, v=v.
  • antisymmetric: for all v and w, if v<w then w>v, and if v=w then w=v.
  • transitive: for all v,w and x, if v<=w and w<=x then v<=x.

We're going to visit the following algorithms:

Selection sort

Selection sort is a straightforward, iterative algorithm that repeatedly identifies and relocates the smallest unsorted element within a portion of the array, moving it to the front of the remaining unsorted section until the entire array is in order:

use super::AlgorithmState;

/// Sorts the slice using selection sort algorithm and the comparison function `is_less` passed as
/// argument.
fn selection_sort<T, F>(v: &mut [T], mut is_less: F)
where
    F: FnMut(&T, &T) -> bool,
{
    for i in 0..v.len() {
        let mut k = i;
        for j in i + 1..v.len() {
            if is_less(&v[j], &v[k]) {
                k = j;
            }
        }
        v.swap(i, k);

Is easy to see from the code that the algorithm for each from to there is only one swap and comparisons, so the total is:

Which is of complexity in terms of number of comparisons. Here is an animated image of the visualization of how the algorithm works:

The full source code can be found here

Insertion sort

Insertion sort is a simple sorting algorithm that iterates over the slice moving back the elements larger than the previous ones, keeping the elements at the left of the current index ordered but not in their final position.

use super::AlgorithmState;

/// Sorts the slice using insertion sort algorithm and the comparison function `is_less` passed as
/// argument.
fn insertion_sort<T, F>(v: &mut [T], mut is_less: F)
where
    F: FnMut(&T, &T) -> bool,
{
    for i in 0..v.len() {
        let mut j = i;
        while j >= 1 && is_less(&v[j], &v[j - 1]) {
            v.swap(j, j - 1);
            j -= 1;
        }
    }
}

In terms of comparisons the algorithm performs in the best case (slice already sorted) and for the worst case (slice sorted in reverse order) each element must be compared with all the previous elements so the total number of comparisons can be summed up:

That simplifies to .

For exchanges, in the best case no exchanges are needed and in the worst case each insertion requires moving all the previous elements to the right this the total number of exchanges is similar to the number of comparisons in the worst case: .

Here is an animation of its execution:

The full source code can be found here

Shell sort

Shell sort is an extension of insertion sort because it permits to exchange elements of the array that are far apart. It does it sorting elements at specific gaps which produces partially sorted slices that can be eventually sorted by insertion sort.

It works as follows:

  1. Choose a sequence of gaps: This is the most difficult part of the Shellsort algorithm. Any sequence that contains 1 yields a correct sort because it makes the final pass an ordinary insertion sort.
  2. Sort the gap sub arrays: Each element in the gap sequence generates a sub array of elements spaced by that gap value. This step sorts the sub array using insertion sort generating a partially sorted general array.
  3. Repeat for smaller gaps: Continue the process with the rest of the elements of the gap sequence until the final pass which is a regular insertion sort with a gap of 1. At that stage the array is close to being sorted which is where the insertion sort is more efficient.

There has been proposed different gap sequences such as:

OEISGeneral termGap sequenceTime complexityAuthor and date
, Donald Shell in 1959
A0034621Donald Knuth in 1973
A036562Sedgewick in 1982
A102549Unknown2UnknownMarcin Ciura in 2001
1

not greater than

2

experimentally derived

The implementation admits passing one of the previous sequences:

/// Sorts the slice using shell sort algorithm and the comparison function `is_less` passed as
/// argument. The decreasing interval sequence is also passed as argument.
fn shell_sort<T, F, S>(v: &mut [T], mut is_less: F, seq: S)
where
    F: FnMut(&T, &T) -> bool,
    S: Iterator<Item = usize>,
{
    for h in seq {
        let mut i = h;
        while i < v.len() {
            let mut j = i;
            while j >= h && is_less(&v[j], &v[j - h]) {
                v.swap(j, j - h);
                j -= h;
            }
            i += 1;
        }
    }
}

The average algorithm performance depends on the gap sequence used. Using the bench cli tool on a M1 processor executing 100 iterations the results obtained are as follows:

sequence1000500010000500001000001000000
shell.rs91.306µs471.897µs935.834µs5.626ms11.333ms159.381ms
knuth.rs79.847µs423.083µs813.259µs4.575ms10.115ms146.414ms
sedgewick.rs78.781µs395.065µs732.768µs3.996ms8.484ms106.259ms
ciura.rs84.742µs430.146µs811.698µs4.318ms9.784ms309.116ms

See the algorithm in action using the Knuth sequence:

The full source code can be found here