Efficient Personal Supercomputing in Fortran 9x on CPU-GPU Systems
The availability of compute devices based on graphics processors, together with multi-core host architectures and larger memories on both, means that relatively large scientific computing problems can now be run on “personal” machines. For these architectures to be widely adopted by scientists and to increase their productivity, they must be relatively easy to use from the languages scientists already work in (among them Matlab and Fortran 9x), without requiring scientists to recast their thinking and algorithms in graphics metaphors. At the same time, to actually achieve good performance, developers must be aware of issues such as the relatively large cost of device-host communication and the preference for certain numbers of threads and block sizes. Using NVIDIA’s CUDA architecture on 8800GTX GPUs hosted in a multicore Wintel machine, we develop a set of wrappers that allow the architecture to be used from conventional Fortran 9x with OpenMP. Example applications in magnetohydrodynamics and radial-basis-function interpolation were ported to the architecture and showed speed-ups of approximately 25 and 15 times, respectively, over their performance on a single Intel QX6700 CPU, demonstrating the power of our approach.