As we discussed in the last article, sorting is the most frequent activity a computer performs, and also happens to be the most thoroughly-studied activity by computer scientists. Over the next few articles, we’re going to explore the tried-and-true sorting algorithms have that defined the field throughout the years, introducing relevant advanced data structures along the way.

This article is part of the Data Structures and Algorithms Series following the course The Analysis of Algorithms. If you missed the previous article, check that out first as well as a copy of The Algorithm Design Manual.

## Daily Problem Explained

Give an efficient algorithm to determine whether two sets (of size `m` and `n`) are disjoint. Analyze the complexity of your algorithm in terms of `m` and `n`. Be sure to consider the case where `m` is substantially smaller than `n`.

This is just another exercise in writing pseudocode; a great problem for a programming interview! The only real catch here is you have to know what a disjoint set is. A disjoint set is really just two collections that have no items in common. An example for this problem would be `m = [1,2,3], n = [4,5,6]`.

Let’s just start throwing stuff out there and see what sticks. My first thought is that when you combine set of two disjoint sets is of length `m + n`, since no duplicates exist. That’s pretty important because if your final set `S` is such that `S.length < m.length + n.length` you fail the disjoint part of the problem. Can we exploit that? Here’s what I’m thinking:

``````function isDisjoint(m, n) {
let finalSet = m;
for (let item in n) {
if (!finalSet.includes(item)) finalSet.push(item);
}
return finalSet.length === (m.length + n.length);
}
``````

On first glance you only see 1 `for` loop so it looks like it’s `O(n)` which is fast! But then we quickly realize the absolute fastest we can go is `O(n+m)` since we have to evaluate all items to duplication, so something is up. T

The key here is that `includes` has to inspect the entire `m` list for each `n` items. That leaves us with an algorithm of `O(mn)`. It’s not bad but it’s not great either; remember, that’s roughly equivalent to `O(n^2)` (assuming `m` and `n` were the same length) which is quadratic. Quadratic functions are fine for relatively small data sets, but as we noted in our article on Big O Notation, once we go above a million items in our dataset, the algorithm will be too slow. Can we improve on this?

One thing I quickly realized is we can leverage what we’ve learned so far as a hint on how to solve this. What data structures have we been studying carefully? Which of those are praised for their performance?

Then it hit me: we can use a hash table to keep track of the number of times we’ve touched an item and if any already exist we know right away the set is not disjoint and we can exit early.

Adding to and searching from a hash table takes an expected `O(1)` time, `O(n)` at worst. That means we can hit that magical runtime of `O(m+n)` in the worst-case using a hash table!

``````function isDisjoint(m, n) {
let finalSet = new HashTable();
for (let item in m) {
finalSet.insert(item);
}
for (let item in n) {
if (finalSet.search(item)) {
return false;
} else {
finalSet.insert(item);
}
}
return true;
}
``````

## Standard Selection Sort in JavaScript

The first sorting algorithm we’re going to look at is selection sort. Selection sort’s algorithm is very simple: continue to take the smallest item from one array a place it into a new array. By virtue of taking the smallest item from a recursively-shrinking array, you’re left with a new array that is sorted. A naive implementation using standard JavaScript library functions looks like this:

``````function selectionSort(arr) {
let sorted = [], min;
for (let i = 0, n = arr.length; i < n; i++) {
min = Math.min(...arr);
sorted[i] = min;
arr.pop(min);
}
return sorted;
}
``````

As we can see from the algorithm above, selection sort performs `n` iterations, but each iteration will take roughly `n/2` steps in order to find the minimum value (we know this because finding a minimum value takes at worst `O(n)` but our problem space is being reduced with each successive iteration), giving us a grand total worst-case runtime of `O(n^2)`. Is there a way we can improve on this runtime?

## Selection Sort + Priority Queues = Heapsort

Look back to our previous algorithm - what are the primary operations for selection sort? Each iteration consists of a `min()`, an `insert()`, and a `delete()`. There is, in fact, a special data structure that combines `delete()` with `min()` to efficiently extract the minimum value from its collection. This data structure is called a priority queue.

### Priority queues

Priority queues are flexible data structures that allow for elements to enter the queue at arbitrary intervals while exiting in a sorted fashion. Priority queues support three primary operations: insert, findMin (or findMax), and extractMin (or extractMax). For the purposes of simplicity going forward, we’re going to focus on priority queues that implement the minimum rather than the maximum value.

A summary of their worst-case runtimes can be found below. As an exercise: can you explain how these values are calculated? Here’s a hint: `findMin()` is always constant-time complexity because you can always use a variable to keep track of the minimum value as new values are inserted.

Data structure / method `findMin()` `extractMin()` `insert()`
Unsorted array `O(1)` `O(n)` `O(1)`
Sorted array `O(1)` `O(1)` `O(n)`
BST `O(1)` `O(log n)` `O(log n)`

### Implementing `extractMin()` and `insert()` with heaps

As I mentioned earlier, there is a special data structure that can efficiently handle the primary operations of selection sort. This data structure is known as a heap. Heaps are a variant of a priority queues that keep a rough track of the order of its items. This ordering is smart enough to keep track of the minimum/maximum value, but not a complete sort which would significantly slow down the data structure.

To implement a heap, we are effectively creating a binary tree without worrying about pointers, or in other words an array of keys where the index of the array item _is the pointer_. The root of the tree is the first item in the array (and also the minimum value), with its left and right children corresponding to the 2nd and 3rd values in the array. In general, it looks something like this:

``````class PriorityQueue {
const PRIORITY_QUEUE_SIZE = 500; //arbitrarily picked

constructor(arr, n) {
this.queue = new Array(PRIORITY_QUEUE_SIZE);
this.length = 0;

for (let i = 0; i < n; i++) {
this.insert(arr[i]);
}
}

parent() {
if (this.length === 0) return -1;
return Math.floor(this.length / 2);
}

child(n) {
return 2 * n;
}
}
``````

Remember: we don’t care about what’s inside of each slot in our priority queue, we just care about finding elements at specific locations. That is why we can calculate locations with just some multiplication and division rather than having to search by value. Because we’re not pointing to any specific values, the index of our elements is a reflection of the math we are performing, and when we perform basic operations like `*` and `/`, we know we’re dealing with fast computational operations in constant time.

The downside to a heap is that we can’t really do an efficient search in a heap. Because we’re only concerned about keeping the smallest item at the root, everything else is arbitrarily packed into the tree. The tree items are packed to maintain space efficiency, but remember this isn’t a binary search tree, just a binary tree (i.e. each node has two children).

#### Insertion

As we mentioned earlier, save for the smallest item, the heap is just packed in from the top down, filling in from left to right before moving down to the next layer of the tree. This keeps the heap efficiently packed. To ensure the minimum-dominance relationship is maintained, we have to make sure we can swap up the tree as far as necessary to satisfy our minimum constraints.

``````class PriorityQueue {
// ...

insert(item) {
if (this.length >= PRIORITY_QUEUE_SIZE) {
throw new Error("Priority queue overflow");
} else {
this.length += 1;
this.queue[this.length] = item;
this.bubbleUp(this.length);
}
}

bubbleUp(parent) {
if (this.parent(parent) === -1) return;

let currentParrent = this.queue[this.parent()];
let parentContender = this.queue[parent];

if (currentParrent > parentContender) {
[currentParent, parentContender] = [parentContender, currentParrent];
this.bubbleUp(parent);
}
}
}
``````

Analyzing the above two functions, we see that `insert()` has no loops and emulates a tightly-packed binary tree. We know the height of a binary tree is `lg n` (shorthand for `log n` of base 2), so our worst-case runtime is `O(log n)`.

The `bubbleUp()` has the primary function of swapping elements (remember from ES6 the swap appears as `[x,y] = [y,x]`) which takes constant time. Therefore, to create a heap with `n` items it will take `O(n log n)` to insert and construct.

#### Extracting the minimum value

Finding the minimum value is easy, since it is the root of the tree. Actually pulling it out of the array is the hard part, since this leaves the tree broken in two at the root. To handle this, we’ll need to re-assign the right child of the root’s children to be the new root.

Why the right-most element? Because we pack from left to right, so we want to unfurl our array in a clean manner from the opposite direction. If the value isn’t correct in the minimum-dominance relationship, we can always bubble down the array (or “heapify”) to swap out for the correct minimum.

``````class PriorityQueue {
// ...

extractMin() {
let min = -1;

if (this.length <= 0) throw new Error("No minimum: Empty priority queue");

min = this.queue;
this.queue = this.queue[this.length];
this.length -= 1;
bubbleDown(1);

return min;
}

bubbleDown(parent) {
let childIndex = child(parent), minChild = parent;

for (let i = 0; i <= 1; i++) {
if (childIndex <= this.length) {
if (this.queue[minChild] > this.queue[childIndex + i]) {
minChild = childIndex + i;
}
}
}

if (minChild !== parent) {
[minChild, parent] = [parent, minChild];
bubbleDown(minChild);
}
}
}
``````

Similarly for extracting the minimum value, bubbling down only goes down a tree height of `lg n`, so at worst finding and removing the minimum value will take `O(log n)` time.

## Selection Sort using Priority Queues as Heaps

Putting it all together, we now have our optimized selection sort, known as heapsort, which combines everything we’ve learned here into an efficient extraction machine:

``````function heapsort(arr, n) {
let heap = new Heap(arr, n);

for (let i = 0; i < n; i++) {
arr[i] = heap.extractMin();
}

return arr;
}
``````

How elegant! And the best part: since our priority queue’s operations run in `O(log n)` time, over a set of `n` elements, heapsort improves on selection sort from `O(n^2)` to `O(n log n)`, the gold standard for sorting algorithm time efficiency!

Even better, we did it in-place, meaning we didn’t have to instantiate a second array to copy the elements over. We were able to use the same array to swap elements and create our sorted list of items.

Overall, this meaps heapsort an excellent algorithm for both its space and time efficiency.

Now there is a slightly more advanced way of designing the constructor such that we populate the first `n` positions in heap directly from our input array (meaning we’ve created a one-sided heap) and then use a bunch of `bubbleDown()` calls (`n/2` to be exact since half of the items are already on the correct side of their parents) to balance out the array and find our minimum. This speeds up construction from `O(n log n)` to `O(n)` on average, but since construction is not the main operation of a heap, and the worst-case operations still take `O(n log n)`, this isn’t too big of a win, but you’re welcome to implement it for yourself if you’re looking for an extra exercise in algorithm design.

## Wrapping up with the next Daily Problem

We’ve covered a lot with selection sort, priority queues, and heapsort. We’ve seen we can pretty easily cover sorting in `O(n^2)` time, but with a few optimizations, we can improve that time to the optimal `O(n log n)`. This will be useful in tackling today’s Daily Problem:

You are given a collection of `n` bolts of different widths, and `n` corresponding nuts. You can test whether a given nut and bolt together, from which you learn whether the nut is too large, too small, or an exact match for the bolt. The differences in size between pairs of nuts or bolts can be too small to see by eye, so you cannot rely on comparing the sizes of two nuts or two bolts directly. You are to match each bolt to each nut.

1. Give an `O(n^2)` algorithm to solve the above problem.

2. Suppose that instead of matching all of the nuts and bolts, you wish to find the smallest bolt and its corresponding nut. Show that this can be done in only `2n − 2` comparisons.

3. Match the nuts and bolts in expected `O(n log n)` time.

Think you have an answer? Create a gist and send it to me on Twitter and let’s compare notes!

## More problems to practice on

Now that homework 2 is out, here are a few problems that are relevant to the sorting we’ve discussed so far:

1. 4-1
2. 4-2
3. 4-5
4. 4-6
5. 4-12
6. 4-13
7. 4-14
8. 4-15