Floating-point mixed-radix FFT core generation for FPGA and comparison with GPU and CPU (IEEE)
Over the past decades, FPGA technology has advanced considerably. Floating-point acceleration on FPGAs has attracted renewed interest owing to larger devices and the emergence of fast hardware floating-point libraries. The popularity of the FFT makes it easier to justify the effort of detailed optimization. However, the ever-increasing data sizes in some compelling application domains remain beyond the capability of existing FFT accelerators, and the demand for higher performance keeps this an active research topic. In this paper, leveraging a structured description of FFT algorithms, we propose an FPGA-based FFT core generation framework that emits Verilog HDL code from a high-level algorithmic description and can handle radix-2 as well as prime-radix problem sizes. In particular, the proposed framework is optimized for 2D FFTs and real FFTs. The performance of our implementation is comparable to that of a commercial FFT IP. A comparison with the latest GPU and CPU results, measured in peak floating-point performance and energy efficiency, shows that GPUs have outperformed FPGAs for FFT acceleration. However, we consider that FPGAs still have an advantage in some situations.
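To make the term "mixed-radix" concrete, the following is a minimal software sketch of a recursive Cooley-Tukey decomposition that handles radix-2 as well as prime-radix factors. It is not the paper's generator, which emits Verilog HDL; it only illustrates the kind of algorithmic structure such a framework could start from. The function names (`smallest_factor`, `dft`, `mixed_radix_fft`) are hypothetical and chosen for this illustration only.

```python
import cmath


def smallest_factor(n):
    """Return the smallest prime factor of n (assumes n >= 2)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def dft(x):
    """Direct O(n^2) DFT, used here for prime-length blocks."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def mixed_radix_fft(x):
    """Recursive decimation-in-time FFT for an arbitrary length.

    The length n is split as n = r * m, where r is the smallest prime
    factor of n. Prime lengths fall back to the direct DFT.
    """
    n = len(x)
    if n <= 1:
        return list(x)
    r = smallest_factor(n)
    if r == n:
        return dft(x)  # prime length: no further factorization
    m = n // r
    # r interleaved subsequences, each of length m, transformed recursively
    subs = [mixed_radix_fft(x[s::r]) for s in range(r)]
    out = [0j] * n
    for k in range(n):
        # X[k] = sum over s of W_n^(s*k) * Sub_s[k mod m]
        out[k] = sum(
            cmath.exp(-2j * cmath.pi * s * k / n) * subs[s][k % m]
            for s in range(r)
        )
    return out
```

For a length of 12, for example, the recursion factors the problem as 2 x 2 x 3, so it uses two radix-2 stages and one radix-3 stage. A hardware generator would typically unroll this recursion into fixed pipeline stages with precomputed twiddle factors rather than evaluate it at run time; that mapping is outside the scope of this sketch.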
Paper available at IEEE.