Fast Mersenne Prime Testing on the GPU
The Lucas-Lehmer test for Mersenne primality can be efficiently parallelized for GPU-based computation. The gpuLucas project implements an irrational-base discrete weighted transform (IBDWT) approach using balanced integers, non-power-of-two transforms, and carry-save radix representations. gpuLucas is written in CUDA and requires the double-precision floating-point capabilities of recent GPUs. Results show speedups of up to 7× over benchmark averages for optimized sequential code and factor-of-two speedups over CUDALucas, another GPU-based Lucas-Lehmer tester that was developed independently and with a different optimization strategy. This work demonstrates techniques for implementing GPU-based number-theoretic algorithms on very large numbers, including fast multiplication, prefix-sum-based carry propagation, and the use of carry-save arithmetic with balanced integers. It also presents timing profiles of convolution-based integer multiplication built on the IBDWT, in particular for non-power-of-two transforms, and establishes the usefulness of the software as a GPU benchmarking application and as a platform for large-integer and polynomial experimentation.
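
For orientation, the Lucas-Lehmer recurrence itself can be sketched in a few lines. This is a minimal reference version in Python that relies on the language's built-in arbitrary-precision integers; it is not the gpuLucas implementation, and the function name `lucas_lehmer` is chosen here purely for illustration. It assumes the exponent `p` is prime.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is a prime exponent. Illustrative sketch only: it uses
    Python's built-in big integers for the squaring step.
    """
    if p == 2:
        return True  # M_2 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1  # the Mersenne number M_p = 2**p - 1
    s = 4             # starting value of the Lucas-Lehmer sequence
    for _ in range(p - 2):
        # One iteration: square, subtract 2, reduce modulo M_p.
        s = (s * s - 2) % m
    return s == 0     # M_p is prime exactly when the final term is 0


if __name__ == "__main__":
    exponents = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    print([p for p in exponents if lucas_lehmer(p)])
```

Nearly all of the cost lies in the modular squaring `s * s`, repeated `p - 2` times on a `p`-bit number. That squaring is the step a convolution-based multiplication such as the IBDWT is meant to accelerate.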