Java Data Structures - Contents
A D V E R T I S E M E N T
Optimizing Quicksort written in Java is not such a
great task. Whatever tricks we use, we will still be restrained by the speed of the Java
Virtual Machine (JVM). In this section, we will talk about algorithmic improvements,
rather than quirky tricks.
The first thing what we should do is look at the
above code of the un-optimized sort, and see what can be improved. One thing that should
be obvious is that it's way too much work if the array is very small. There are a LOT
simpler sorts available for small arrays. For example, simple insertion sort has
almost no overhead, and on small arrays, is actually faster than quicksort! This
can be fixed rather easily, by including an if statement to see if the size
of the array is smaller than some particular value, and doing a simple insertion sort
if it is. This threshold value can only be determined from doing actual experiments with
sample data. However, usually any value less than 15 is pretty good; and that
is why we will choose 7 for our upcoming example.
What makes quicksort so fast is the simplicity of
it's inner loops. In our previous example, we were doing extra work in those inner loops!
We were checking for the array bounds; we should have made some swaps beforehand to make
sure that the inner loops are as simple as possible. Basically, the idea is to make sure
that the first element in the array is less than or equal to the second, and that the
second is less than or equal to the last. Knowing that, we can remove those array bounds
checking from the inner loops (in some cases speeding up the algorithm by about two
times).
Another thing that should be obvious is recursion.
Because sort methods are usually very performance hungry, we would like to remove as much
function calls as possible. This includes getting rid of recursion. The way we can get rid
of recursion is using a stack to store the upper and lower bounds of arrays to be sorted,
and using a loop to control the sort.
A question automatically arises: How big should
that stack be? A simple answer is: It should be big enough. (you'll be
surprised how many books avoid this topic!) We cannot just use java.util.Vector
object for it's dynamic growing ability, since that would be way too much overhead for
simple stack operations that should be as quick as possible. Now, what would be a big
enough stack? Well, a formula below calculates the size:
size = 2 * ln N / ln 2
Where ln is natural logarithm, and N
is the number of elements in the array. Which all translates to:
size = (int)(Math.ceil(2.0*Math.log(array.length)/Math.log(2.0));
Believe it or not, but it's actually not worth using
this equation. If you think about it, you will notice that the size of the stack grows
VERY slowly. For example, if the array.length is 0xFFFFFFFF in
length (highest 32 bit integer value), then size will be 64 ! We
will make it 128 just in case we will ever need it for 64 bit
integers. (If you don't like magic values inside your program, then by all means, use the
equation.)
Having gotten to this point, we are almost ready to
implement our optimized version of quicksort. I say almost because it is still
not optimized to it's fullest. If we were using native types instead of Comparable
objects, then the whole thing would be faster. If we implementing it as native code, it
would be even faster. Basically, most other speed-ups are left up to these system or
program specific quirky optimizations. And now, here's our optimized version:
import java.lang.*;
import java.io.*;
public class pQuicksort{
public static void sort(Comparable[] c){
int i,j,left = 0,right = c.length - 1,stack_pointer = -1;
int[] stack = new int[128];
Comparable swap,temp;
while(true){
if(right - left <= 7){
for(j=left+1;j<=right;j++){
swap = c[j];
i = j-1;
while(i>=left && c[i].compareTo(swap) > 0)
c[i+1] = c[i--];
c[i+1] = swap;
}
if(stack_pointer == -1)
break;
right = stack[stack_pointer--];
left = stack[stack_pointer--];
}else{
int median = (left + right) >> 1;
i = left + 1;
j = right;
swap = c[median]; c[median] = c[i]; c[i] = swap;
/* make sure: c[left] <= c[left+1] <= c[right] */
if(c[left].compareTo(c[right]) > 0){
swap = c[left]; c[left] = c[right]; c[right] = swap;
}if(c[i].compareTo(c[right]) > 0){
swap = c[i]; c[i] = c[right]; c[right] = swap;
}if(c[left].compareTo(c[i]) > 0){
swap = c[left]; c[left] = c[i]; c[i] = swap;
}
temp = c[i];
while(true){
do i++; while(c[i].compareTo(temp) < 0);
do j--; while(c[j].compareTo(temp) > 0);
if(j < i)
break;
swap = c[i]; c[i] = c[j]; c[j] = swap;
}
c[left + 1] = c[j];
c[j] = temp;
if(right-i+1 >= j-left){
stack[++stack_pointer] = i;
stack[++stack_pointer] = right;
right = j-1;
}else{
stack[++stack_pointer] = left;
stack[++stack_pointer] = j-1;
left = i;
}
}
}
}
public static void main(String[] args){
int i;
Integer[] arr = new Integer[20];
System.out.println("inserting: ");
for(i=0;i<arr.length;i++){
arr[i] = new Integer((int)(Math.random()*99));
System.out.print(arr[i]+" ");
}
pQuicksort.sort(arr);
System.out.println("\nsorted: ");
for(i=0;i<arr.length;i++)
System.out.print(arr[i]+" ");
System.out.println("\nDone ;-)");
}
}
The output closely follows:
inserting:
17 52 88 79 91 41 31 57 0 29 87 66 94 22 19 30 76 85 61 16
sorted:
0 16 17 19 22 29 30 31 41 52 57 61 66 76 79 85 87 88 91 94
Done ;-)
This new sort function is a bit larger than most
other sorts. It starts out by setting left and right pointers,
and the stack . Stack allocation could be moved outside the function, but
making it static is not a good choice since that introduces all kinds of multithreading
problems.
We then fall into an infinite loop in which we have
an if() and an else statement. The if() statement
finds out if we should do a simple insertion sort, else , it will do
quicksort. I will not explain insertion sort here (since you should already be
familiar with it). So, after insertion sort, we see if the stack is
not empty, if it is, we break out of the infinite loop, and return
from the function. If we don't return , we pop the stack , set new
left and right pointers (from the stack), and continue on with
the next iteration of the loop.
Now, if threshold value passed, and we ended up
doing quicksort we first find the median . This median is used
here to make the case with bad performance less likely to occur. It is useless
against random data, but does help if the data is in somewhat sorted order.
We swap this median with the second
value in the array, and make sure that the first value is less than or equal than the
second, and the second is less than or equal than the last. After that, we pick our
partition element (or pivot), and fall into an infinite loop of finding that pivot's
correct place.
Notice that the most inner loops are fairly simple.
Only one increment or decrement operation, and a compare. The compare could be improved
quite a bit by using native types; instead of calling a function. The rest of this part of
the sort is almost exactly like in the un-optimized version.
After we find the correct position of the pivot
variable, we are ready to continue the sort on the right sub-array, and on the left
sub-array. What we do next is check to see which one of these sub-arrays is bigger. We
then insert the bigger sub-array bounds on to the stack, and setup new left
and right pointers so that the smaller sub-array gets processed next.
I guess that's it for quicksort. If you still don't
understand it, I doubt any other reference would be of any help. Basically go through the
algorithm; tracing it a few times, and eventually, it will seem like second nature to you.
The above function is good enough for most purposes. I've sorted HUGE arrays (~100k)
of random data, and it performed quite well.
Back to Table of Contents
A D V E R T I S E M E N T
|
Subscribe to SourceCodesWorld - Techies Talk |
|