Copyright | (c) 2013-2015 Dan Doel 2015 Tim Baumann |
---|---|
Maintainer | Dan Doel <dan.doel@gmail.com> |
Stability | Experimental |
Portability | Non-portable (bang patterns) |
Safe Haskell | None |
Language | Haskell98 |
Timsort is a complex, adaptive, bottom-up merge sort. It is designed to minimize comparisons as much as possible, even at some cost in overhead. Thus, it may not be ideal for sorting simple primitive types, for which comparison is cheap. It may, however, be significantly faster for sorting arrays of complex values (strings would be an example, though an algorithm not based on comparison would probably be superior in that particular case).
For more information on the details of the algorithm, read on.
The first step of the algorithm is to identify runs of elements. These can either be non-decreasing or strictly decreasing sequences of elements in the input. Strictly decreasing sequences are used rather than non-increasing so that they can be easily reversed in place without the sort becoming unstable.
If the natural runs are too short, they are padded to a minimum value. The minimum is chosen based on the length of the array, and padded runs are put in order using insertion sort. The length of the minimum run size is determined as follows:
- If the length of the array is less than 64, the minimum size is the length of the array, and insertion sort is used for the entirety
- Otherwise, a value between 32 and 64 is chosen such that N/min is either equal to or just below a power of two. This avoids having a small chunk left over to merge into much larger chunks at the end.
This is accomplished by taking the the mininum to be the lowest six bits containing the highest set bit, and adding one if any other bits are set. For instance:
length: 00000000 00000000 00000000 00011011 = 25 min: 00000000 00000000 00000000 00011011 = 25
length: 00000000 11111100 00000000 00000000 = 63 * 2^18 min: 00000000 00000000 00000000 00111111 = 63
length: 00000000 11111100 00000000 00000001 = 63 * 2^18 + 1 min: 00000000 00000000 00000000 01000000 = 64
Once chunks can be produced, the next step is merging them. The indices of all runs are stored in a stack. When we identify a new run, we push it onto the stack. However, certain invariants are maintained of the stack entries. Namely:
if stk = _ :> z :> y :> x length x + length y < length z
if stk = _ :> y :> x length x < length y
This ensures that the chunks stored are decreasing, and that the chunk sizes follow something like the fibonacci sequence, ensuring there at most log-many chunks at any time. If pushing a new chunk on the stack would violate either of the invariants, we first perform a merge.
If length x + length y >= length z, then y is merged with the smaller of x and z (if they are tied, x is chosen, because it is more likely to be cached). If, further, length x >= length y then they are merged. These steps are repeated until the invariants are established.
The last important piece of the algorithm is the merging. At first, two chunks are merged element-wise. However, while doing so, counts are kept of the number of elements taken from one chunk without any from its partner. If this count exceeds a threshold, the merge switches to searching for elements from one chunk in the other, and copying chunks at a time. If these chunks start falling below the threshold, the merge switches back to element-wise.
The search used in the merge is also special. It uses a galloping strategy, where exponentially increasing indices are tested, and once two such indices are determined to bracket the desired value, binary search is used to find the exact index within that range. This is asymptotically the same as simply using binary search, but is likely to do fewer comparisons than binary search would.
One aspect that is not yet implemented from the original Tim sort is the adjustment of the above threshold. When galloping saves time, the threshold is lowered, and when it doesn't, it is raised. This may be implemented in the future.