This page presents original sorting algorithms (and variants) in Java designed by Arne Maus. All algorithms are downloadable, and their use is governed by the BSD license (basically, the original author must always be credited whenever they are used). All code is accompanied by a published paper explaining its usage, performance and limitations. Disclaimer: although all these algorithms are thoroughly tested, no guarantee is given that they work as intended in any application where they might be used.
Assoc. Professor (em.), Department of Informatics, Univ. of Oslo, Norway
Previous head of the research group Programming and Software Engineering (PSE)
Phone: +47 975 32 110 (mobile)
E-mail: arnem (a) ifi.uio.no

## The algorithms

All code (in Java) includes a small test program in 'main' so you can try the algorithms. These algorithms are, like almost all memory-intensive algorithms, rather sensitive to the number of caches on the CPU and their speed relative to main memory, as demonstrated in this paper. All these algorithms operate on an integer array with n non-negative numbers.
NEW 2018: All-Parallel Mergesort (ParaMerge)
An all-parallel version of Mergesort: by recursive descent it is two-parallel in the top node, four-parallel at the next level of the recursion, then eight-parallel, and so on, until with p real cores we have (at least) started one thread at some recursion level for each of the p cores. After this parallelization phase, each thread uses sequential recursive merge sort with a new variant of insertion sort for sorting short subsections. ParaMerge is an improvement over the traditional parallelization of mergesort, which follows the sequential algorithm and substitutes the creation of parallel threads for the recursive calls at the top levels of the recursion tree. That traditional parallel merge sort ends with the top node doing a sequential merge of the two sorted halves of a[]; only at the next level does the traditional approach go two-parallel, then four-parallel at the level below, and so on. More details in the paper.
Main new feature of the algorithm: all-parallel MergeSort.

1) ParaMerge, 2018,
Implemented with a second array of same length as the array to be sorted
On a 4-core Intel i5 CPU, ParaMerge has a speedup of up to 4.9 over sequential mergesort, and is significantly faster than the textbook parallelization of mergesort when the number of real cores (disregarding hyperthreaded cores) > 2 and n > 20 000.
Uses: a new version of Insertion sort and sequential MergeSort as sub-algorithms. Extra space used is an array of the same length as the array to be sorted. Stable.
Execution time: O(n log n).
Code, test program: ParallelMergeTest.java
Code, simplified test: ParallelMergeMethod.java
Paper: A faster, all parallel Merge sort algorithm for multicore processors
[NIK'2018, Norwegian Informatics Conf., Svalbard, 2018. Tapir, www.nik.no]
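To illustrate the recursion-tree parallelization that ParaMerge improves on, here is a minimal, hypothetical Java sketch (my own code, not the published ParaMerge): it spawns one extra thread per recursion level down to a fixed depth, sorts sequentially below that, and, like the traditional approach described above, still merges each pair of sorted halves sequentially - the top-node bottleneck that ParaMerge removes.

```java
import java.util.Arrays;

public class ParallelMergeSketch {

    // Sort a[lo..hi); spawn one extra thread per level until depth reaches 0.
    public static void sort(int[] a, int[] buf, int lo, int hi, int depth) {
        if (hi - lo < 2) return;
        int mid = (lo + hi) >>> 1;
        if (depth > 0) {
            Thread left = new Thread(() -> sort(a, buf, lo, mid, depth - 1));
            left.start();
            sort(a, buf, mid, hi, depth - 1);   // this thread takes the right half
            try { left.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        } else {
            sort(a, buf, lo, mid, 0);           // below the cutoff: plain sequential recursion
            sort(a, buf, mid, hi, 0);
        }
        merge(a, buf, lo, mid, hi);             // sequential merge of the two sorted halves
    }

    static void merge(int[] a, int[] buf, int lo, int mid, int hi) {
        System.arraycopy(a, lo, buf, lo, hi - lo);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) a[k++] = buf[i] <= buf[j] ? buf[i++] : buf[j++];
        while (i < mid) a[k++] = buf[i++];
        while (j < hi)  a[k++] = buf[j++];
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
        sort(a, new int[a.length], 0, a.length, 2);   // up to 4 threads at the leaves
        System.out.println(Arrays.toString(a));       // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Note how the extra buffer has the same length as the array, matching the space use stated above; sibling subtrees merge disjoint ranges, so one shared buffer suffices.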

2015: Full Parallel QuickSort (ParaQuick)
A full parallel version of Quicksort with no slow start in which the array a[] is first partitioned sequentially in two,
then split two-parallel into four parts, and so on. ParaQuick starts with k = 64 threads working in full parallel on 1/64 of a[] each, and then combines these parts into a valid two-way split of a[]. Each of these two parts can then be split in the same way with half of the threads, and so on, until a section has only one thread; then a call to ordinary, sequential Quicksort is made. It uses 64 threads even though your CPU might have only 4 or 8 cores; ParaQuick responds well to such 'overbooking' of threads to cores. More details in the paper.
Main new feature of the algorithm: full parallel Quicksort.

2) ParaQuick, 2015,
Implemented with a few supporting arrays of constant length: #threads (=64).
On a 4-core Intel i5 CPU, ParaQuick is 1.5-3.5 times faster than both the dual-pivot Java library Quicksort, Arrays.sort(), and the usual parallelization of Quicksort, when the number of elements to be sorted is > 500 000. Improvements have been made to the code, so it performs somewhat better than the figures in the paper.
Uses: Insertion sort and sequential Quicksort as sub-algorithms. Constant and minimal extra space. Not stable.
Execution time: O(n log n).
Code: FullParaQuickTest.java
Paper: A full parallel Quicksort algorithm for multicore processors
[NIK'2015, Norwegian Informatics Conf., Ålesund, 2015. Tapir, www.nik.no]
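For contrast, the 'usual' parallelization of Quicksort that ParaQuick is compared against can be sketched roughly as follows (a hypothetical baseline of my own, not the published code): each recursive call partitions sequentially and only then forks two tasks, so the whole array first passes through a single-threaded partition at the top - exactly the slow start ParaQuick avoids.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class TraditionalParaQuick {
    static final int CUTOFF = 1000;                 // below this, sort sequentially

    static class SortTask extends RecursiveAction {
        final int[] a; final int lo, hi;            // sorts a[lo..hi] inclusive
        SortTask(int[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

        @Override protected void compute() {
            if (hi - lo < CUTOFF) { Arrays.sort(a, lo, hi + 1); return; }
            int p = partition(a, lo, hi);           // sequential: single-threaded at the top
            invokeAll(new SortTask(a, lo, p - 1),   // only now does it go two-parallel
                      new SortTask(a, p + 1, hi));
        }
    }

    static int partition(int[] a, int lo, int hi) { // Lomuto partition around a[hi]
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    public static void sort(int[] a) {
        if (a.length > 0)
            ForkJoinPool.commonPool().invoke(new SortTask(a, 0, a.length - 1));
    }
}
```

With n elements, the top-level sequential partition alone touches all n items before any parallelism starts, which is what limits speedup on multicore CPUs.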

Two left (most significant digit first) recursive-descent Radix algorithms for sorting, in ascending order, an integer array with n non-negative numbers.
Let M be the largest element in the array, and b be the maximum number of bits in any sorting digit (tunable).
Main new features of the algorithms: in-place sorting with permutation cycles & a dynamically sized radix (adjusting to the distribution).

3a) Implemented with two supporting arrays of length 2**b.
The generally preferred unstable replacement for (the unstable) Quicksort.
Uses Insertion sort as a sub-algorithm.
Execution time: O(n log M). Uses O(2**b) extra space. Not stable.
More than twice as fast as Quicksort (n > 1000).
Code: ARLSort.java
Paper: ARL, a faster in-place, cache friendly sorting algorithm
[NIK'2002, Norwegian Informatics Conf., Kongsberg, 2002. Tapir, ISBN 82-91116-45-8. www.nik.no]
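The core idea of in-place MSD radix sorting with permutation cycles can be sketched as follows (a simplified illustration of my own with a fixed 8-bit digit, not the published ARL code, which also sizes the radix dynamically): count the digit frequencies, compute bucket boundaries, swap misplaced elements directly into their buckets, then recurse on each bucket.

```java
import java.util.Arrays;

public class MsdRadixSketch {
    static final int BITS = 8, RADIX = 1 << BITS, MASK = RADIX - 1;

    public static void sort(int[] a) { msd(a, 0, a.length, 24); }

    // Sort a[lo..hi) on the BITS-wide digit at 'shift' by in-place permutation cycles.
    static void msd(int[] a, int lo, int hi, int shift) {
        if (hi - lo < 32) { insertion(a, lo, hi); return; }  // Insertion sort as sub-algorithm
        if (shift < 0) return;
        int[] start = new int[RADIX + 1];                    // bucket boundaries
        for (int i = lo; i < hi; i++) start[((a[i] >> shift) & MASK) + 1]++;
        start[0] = lo;
        for (int d = 0; d < RADIX; d++) start[d + 1] += start[d];
        int[] next = Arrays.copyOf(start, RADIX);            // next free slot per bucket
        for (int d = 0; d < RADIX; d++)
            while (next[d] < start[d + 1]) {
                int digit = (a[next[d]] >> shift) & MASK;
                if (digit == d) next[d]++;                   // already in its own bucket
                else {                                       // swap into the right bucket
                    int t = a[next[d]]; a[next[d]] = a[next[digit]]; a[next[digit]++] = t;
                }
            }
        for (int d = 0; d < RADIX; d++)                      // recurse, next digit down
            msd(a, start[d], start[d + 1], shift - BITS);
    }

    static void insertion(int[] a, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int v = a[i], j = i - 1;
            while (j >= lo && a[j] > v) a[j + 1] = a[j--];
            a[j + 1] = v;
        }
    }
}
```

Each swap places one element in its final bucket, so the permutation phase is O(n) per digit; only the two counting arrays of length 2**b are extra space, and the swapping is what makes the method unstable.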

3b) Implemented with one supporting array of length 2**b.
Somewhat slower than 3a), but uses half the amount of extra space.
Uses Insertion sort as a sub-algorithm.
Execution time: O(n log M). Uses O(2**b) extra space. Not stable.
More than twice as fast as Quicksort (n > 1000).
Code: ARLSortN.java
Paper: ARL, a faster in-place, cache friendly sorting algorithm
[NIK'2002, Norwegian Informatics Conf., Kongsberg, 2002. Tapir, ISBN 82-91116-45-8. www.nik.no]

3c) Implemented with two supporting arrays of length 2**b.
The generally preferred stable replacement for (the unstable) Quicksort.
Uses Insertion sort and (on rare occasions) ordinary right radix as sub-algorithms.
Execution time: O(n log M). Uses O(2**b) extra space. Stable.
Almost twice as fast as Quicksort (n > 1000).
Code: ARLSortS.java
Paper: Making a fast unstable sorting algorithm stable
[NIK'2006, Norwegian Informatics Conf., Molde, 2006. Tapir, ISBN 82-519-2186-4. www.nik.no]
2011: Full Parallel ARL (PARL)
A full parallel version of algorithm 3a) above for a multicore CPU with k cores and one shared main memory. A pool of k threads is started by generating a sorting object (ParaSort p = new ParaSort();). To sort an array a, the user only calls the sorting method within the sorting object p (p.pARL(a);). Users do not create, start or stop threads - to them this looks like plain sequential code.
Main new feature of the algorithm: a full parallel algorithm, meaning that all the barrier-synchronized sorting steps are of length < n/k, and there is only one sequential statement in the algorithm (cf. Amdahl's law).

Implemented with k threads, and basically a set of supporting arrays with a total length = n + 2**b.
The fastest replacement for (the unstable) Quicksort. Very effective even on dual-core CPUs.
Uses Insertion sort and sequential ARL as sub-algorithms.
Execution time: O((n log M)/k). Uses O(n + 2**b) extra space. Not stable.
From 4 up to 30 times as fast as Quicksort (n > 150 000), depending on the number of cores.
Code: PARLSort.java
Paper: A full parallel radix sorting algorithm for multicore processors
[NIK'2011, Norwegian Informatics Conf., Tromsø, 2011. Tapir, ISBN 978-82-519-2843-4. www.nik.no]
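The flavor of one barrier-synchronized step of length about n/k can be sketched as follows (a hypothetical fragment of my own, not the published PARL code): k threads each count digit frequencies in their own n/k slice and meet at a barrier before the counts are combined for the next step.

```java
import java.util.concurrent.CyclicBarrier;

public class ParallelCountSketch {

    // k threads histogram the digit values of their own ~n/k slice, then wait
    // at a barrier (as each step of a barrier-synchronized parallel radix sort
    // would) before the per-thread counts are combined.
    public static int[] countDigits(int[] a, int k, int shift, int mask) {
        int[][] local = new int[k][mask + 1];        // one count array per thread
        CyclicBarrier barrier = new CyclicBarrier(k);
        Thread[] t = new Thread[k];
        int slice = (a.length + k - 1) / k;          // ceil(n/k) elements per thread
        for (int id = 0; id < k; id++) {
            final int me = id;
            t[me] = new Thread(() -> {
                int lo = me * slice, hi = Math.min(a.length, lo + slice);
                for (int i = lo; i < hi; i++) local[me][(a[i] >> shift) & mask]++;
                try { barrier.await(); }             // end of this parallel step
                catch (Exception e) { throw new RuntimeException(e); }
            });
            t[me].start();
        }
        try { for (Thread th : t) th.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        int[] total = new int[mask + 1];             // combine per-thread counts
        for (int[] l : local) for (int d = 0; d <= mask; d++) total[d] += l[d];
        return total;
    }
}
```

Because every thread only reads its own slice and writes its own count array, this step needs no locks - only the barrier at the end, keeping the sequential fraction minimal.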

Two Radix algorithms for sorting, in ascending order, an integer array with n non-negative numbers with the help of a potentially smaller buffer. One is the iterative right (least significant digit first) radix algorithm, and the other is the recursive left (most significant digit first) radix algorithm. Both algorithms move data back and forth between the array and a buffer, ideally of the same length as the array to be sorted. However, if only a smaller buffer is available, these two algorithms avoid program abortion by the system ('no more heap space') by using the largest buffer available, and demonstrate how to modify the two Radix algorithms to use more passes to copy data back and forth between buffer and array - thus saving space by using somewhat more time.
Main new feature of the algorithms: Radix sorting with the help of a non-empty buffer of any size <= n. They also adapt the radix size to the distribution of the numbers to be sorted (smaller radixes for a sparse distribution).

Recursive-descent, left-to-right Radix sorting.
Implemented with two supporting arrays of length 2**b.
A possible stable replacement for (the unstable) Quicksort.
Execution time: O(n log M). Uses up to O(n) extra space, but survives on any extra space > 0. Stable.
Approx. twice as fast as Quicksort (n > 250) when the buffer size is 100% of the array length (n); still faster when the buffer is > 20% of n.
The interface method is long (and stable) and also handles negative numbers.
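As a baseline for these buffer-limited variants, a standard iterative least-significant-digit radix sort with a full-length buffer looks roughly like this (a simplified sketch of my own with a fixed 8-bit digit; the published algorithms additionally adapt the digit size to the distribution and cope with buffers shorter than n by making extra copying passes):

```java
public class LsdRadixSketch {
    static final int BITS = 8, RADIX = 1 << BITS, MASK = RADIX - 1;

    // Iterative LSD radix sort of non-negative ints, digit by digit from the right.
    public static void sort(int[] a) {
        int max = 0;
        for (int v : a) max = Math.max(max, v);      // only sort the digits M actually uses
        int[] buf = new int[a.length];               // full-length buffer (the ideal case)
        int[] count = new int[RADIX + 1];
        for (int shift = 0; (max >> shift) != 0; shift += BITS) {
            java.util.Arrays.fill(count, 0);
            for (int v : a) count[((v >> shift) & MASK) + 1]++;
            for (int d = 0; d < RADIX; d++) count[d + 1] += count[d];
            for (int v : a) buf[count[(v >> shift) & MASK]++] = v;  // stable scatter
            System.arraycopy(buf, 0, a, 0, a.length);               // copy back for next digit
        }
    }
}
```

Each digit pass moves all n elements to the buffer and back; with a buffer of only m < n elements, the same scatter must be done in ceil(n/m) smaller passes, which is the space-for-time trade-off the section describes.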