Analysis of Fast Parallel Sorting Algorithms for GPU Architectures‘ (IEEE)
Sorting algorithms have been studied extensively since past three decades. Their uses are found in many applications including real-time systems, operating systems, and discrete event simulations. In most cases, the efficiency of an application itself depends on usage of a sorting algorithm. Lately, the usage of graphic cards for general purpose computing has again revisited sorting algorithms. In this paper we extended our previous work regarding parallel sorting algorithms on GPU, and are presenting an analysis of parallel and sequential bitonic, odd-even and rank-sort algorithms on different GPU and CPU architectures. Their performance for various queue sizes is measured with respect to sorting time and rate and also the speed up of bitonic sort over odd-even sorting algorithms is shown on different GPUs and CPU. The algorithms have been written to exploit task parallelism model as available on multi-core GPUs using the OpenCL specification. Our findings report minimum of 19x speed-up of bitonic sort against odd-even sorting technique for small queue sizes on CPU and maximum of 2300x speed-up for very large queue sizes on Nvidia Quadro 6000 GPU architecture.
Paper available at IEEE.