Update: The code is now more C++
Most of the sorting algorithms that are used are generally comparison sort. It means that each element of the collection being sorted will be compared to see which one is the first one. A comparison must have a lower bound of Ω(n log n) comparisons. That is why there are no comparison-based sorting algorithm better than O(n log n).
On the other hand, there are also sorting algorithms that are performing better. This is the family of the integer sorting algorithms. These algorithms are using properties of integer to sort them without comparing them. They can be only be used to sort integers. Nevertheless, a hash function can be used to assign a unique integer to any value and so sort any value. All these algorithms are using extra space. There are several of these algorithms. In this article, we will see three of them and I will present an implementation in C++. At the end of the article, I will compare them to std::sort.
In the article, I will use n as the size of the array to sort and m as the max number that is permitted in the array.

Bin Sort

Bin Sort, or Bucket Sort, is a very simple algorithm that partition all the input numbers into a number of buckets. Then, all the buckets are outputted in order in the array, resulting in a sorting array. I decided to implement the simplest case of Bin Sort where each number goes in its own bucket, so there are m buckets.
The implementation is pretty straightforward:
```cpp
void binsort(std::vector&& A){
std::vector> B(MAX + 1);
for(std::size_t i = 0; i < SIZE; ++i){
B[A[i]].push_back(A[i]);
}
std::size_t current = 0;
for(std::size_t i = 0; i < MAX; ++i){
for(auto item : B[i]){
A[current++] = item;
}
}
}
```
B is the array of buckets. Each bucket is implemented as a std::vector. The algorithm starts by filling each buckets with the numbers from the input array. Then, it outputs them in order in the array.
This algorithm works in O(n + m) and requires O(m) extra memory. With these properties, it makes a very limited algorithm, because if you don't know the maximum number and you have to use the maximum number of the array type, you will have to allocate for instance 2^32 buckets. That won't be possible.

Couting Sort

An interesting fact about binsort is that each bucket contains only the same numbers. The size of the bucket would be enough. That is exactly what Counting Sort. It counts the number of times an element is present instead of the elements themselves. I will present two versions. The first one is a version using a secondary array and then copying again into the input array and the second one is an in-place sort.
```cpp
void counting_sort(std::vector&& A){
std::vector B(SIZE);
std::vector C(MAX);
for (std::size_t i = 0; i < SIZE; ++i){
++C[A[i]];
}
for (std::size_t i = 1; i <= MAX; ++i){
C[i] += C[i - 1];
}
for (long i = SIZE - 1; i >= 0; --i) {
B[C[A[i]] - 1] = A[i];
--C[A[i]];
}
for (std::size_t i = 0; i < SIZE; ++i){
A[i] = B[i];
}
}
```
The algorithm is also simple. It starts by counting the number of elements in each bucket. Then, it aggregates the number by summing them to obtain the position of the element in the final sorted array. Then, all the elements are copied in the temporary array. Finally, the temporary array is copied in the final array. This algorithms works in O(m + n) and requires O(m + n). This version is presented only because it is present in the literature. We can do much better by avoiding the temporary array and optimizing it a bit:
```cpp
void in_place_counting_sort(std::vector&& A){
std::vector C(MAX + 1);
for (std::size_t i = 0; i < SIZE; ++i){
++C[A[i]];
}
int current = 0;
for (std::size_t i = 0; i < MAX; ++i){
for(std::size_t j =0; j < C[i]; ++j){
A[current++] = i;
}
}
}
```
The temporary array is removed and the elements are directly written in the sorted array. The counts are not used directly as position, so there is no need to sum them. This version still works in O(m + n) but requires only O(m) extra memory. It is much faster than the previous version.

Radix Sort

The last version that I will discuss here is a Radix Sort. This algorithm sorts the number digit after digit in a specific radix. It is a form of bucket sort, where there is a bucket by digit. Like Counting Sort, only the counts are necessary. For example, if you use radix sort in base 10. It will first sort all the numbers by their first digit, then the second, .... It can work in any base and that is its force. With a well chosen base, it can be very powerful. Here, we will focus on radix that are in the form 2^r. These radix have good properties, we can use shifts and mask to perform division and modulo, making the algorithm much faster.
The implementation is a bit more complex than the other implementations:
```cpp
static const std::size_t digits = 2; //Digits
static const std::size_t r = 16; //Bits
static const std::size_t radix = 1 << r; //Bins
static const std::size_t mask = radix - 1;
void radix_sort(std::vector&& A){
std::vector B(SIZE);
std::vector cnt(radix);
for(std::size_t i = 0, shift = 0; i < digits; i++, shift += r){
for(std::size_t j = 0; j < radix; ++j){
cnt[j] = 0;
}
for(std::size_t j = 0; j < SIZE; ++j){
++cnt[(A[j] >> shift) && mask];
}
for(std::size_t j = 1; j < radix; ++j){
cnt[j] += cnt[j - 1];
}
for(long j = SIZE - 1; j >= 0; --j){
B[--cnt[(A[j] >> shift) && mask]] = A[j];
}
for(std::size_t j = 0; j < SIZE; ++j){
A[j] = B[j];
}
}
}
```
r indicates the power of two used as the radix (2^r). The mask is used to compute modulo faster. The algorithm repeats the steps for each digit. Here digits equals 2. It means that we support 2^32 values. A 32 bits value is sorted in two pass. The steps are very similar to counting sort. Each value of the digit is counted and then the counts are summed to give the position of the number. Finally, the numbers are put in order in the temporary array and copied into A.
This algorithm works in O(digits (m + radix)) and requires O(n + radix) extra memory. A very good thing is that the algorithm does not require space based on the maximum value, only based on the radix.

Results

It's time to compare the different implementations in terms of runtime. For each size, each version is tested 25 times on different random arrays. The arrays are the same for each algorithm. The number is the time necessary to sort the 25 arrays. The benchmark has been compiler with GCC 4.7.
The first test is made with very few duplicates (m = 10n).
Radix Sort comes to be the fastest in this case, twice faster as std::sort. In place counting sort has almost the same performance as std::sort. The other are performing worse.
The second test is made with few duplicates (m ~= n).
The numbers are impressive. In place counting sort is between 3-4 times faster than std::sort and radix sort is twice faster than std::sort ! Bin Sort does not performs very well and counting sort even if generally faster than std::sort does not scale very well.
Let's test with more duplicates (m = n / 2).
std::sort and radix sort performance does not change a lot but the other sort are performing better. In-place counting sort is still the leader with a higher margin.
Finally, with a lot of duplicates (m = n / 10).
Again, std::sort and radix sort performance are stable, but in-place counting is now ten times faster than std::sort !

Conclusion

To conclude, we have seen that these algorithms can outperforms std::sort by a high factor (10 times for In place Counting Sort when there m << n). If you have to sort integers, you should consider these two cases:

m > n or m is unknown : Use radix sort that is about twice faster than std::sort.

m << n : Use in place counting sort that can be much faster than std::sort.

I hope you found this article interesting. The implementation can be found on Github: https://github.com/wichtounet/articles/tree/master/src/linear_sorting