Enhancing Performance of CUDA Quicksort Through Pivot Selection and Branching Avoidance Methods
This paper presents a fine-tuned implementation of the quicksort algorithm for highly parallel many-core NVIDIA graphics processors. The described approach focuses on algorithmic and implementation-level improvements to achieve enhanced performance. Several fine-tuning techniques are explored to identify the best combination of improvements for the quicksort algorithm on GPUs. The results show that this approach significantly reduces execution time and improves algorithmic efficiency, lowering both the number of iterations of the algorithm and the number of operations performed compared with its predecessors. The experiments are conducted on an NVIDIA graphics card across several distributions of input data. The findings suggest that this fine-tuning approach can enable efficient and fast sorting on GPUs for a wide range of applications.