Stories, Papers, WIKIs
| Title | Body |
|---|---|
| Real-time Minute Change Detection on GPU for Cellular and Remote Sensor Imaging (IEEE) | Abstract: Discovering subtle alterations between pairs of images of the same scene taken at different times is called the minute change detection problem. To address it, we have developed a framework that captures and highlights minute changes in digital images that are otherwise hidden to the human eye. Moreover, it detects otherwise unnoticeable differences between image pairs taken at different times under similar viewing conditions. Although our framework's application areas cover a wide variety of disciplines, from medicine to security, weather forecasting, urban planning, and monitoring natural disasters, in this study our focus was real-time minute change detection and tracking on biomedical and satellite images. Because real-time performance is crucial in fields such as medicine, we accelerate this approach using the extensive computational power of the graphics processing unit (GPU). Our experimental results on detecting subtle changes in light-microscopy images of mouse MC3T3-E1 osteoblastic cells grown in culture, at a resolution of 2600×2060, and in remote sensor images, all computed on the GPU, illustrate that our algorithm detects infinitesimal differences between images in real time. Paper available at IEEE. |
| An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code (IEEE) | Abstract: Regional weather forecasting demands fast simulation over fine-grained grids, resulting in extremely memory-bottlenecked computation, a difficult problem on conventional supercomputers. Early work on accelerating the mainstream weather code WRF using GPUs with their high memory performance, however, resulted in only minor speedup due to partial GPU porting of the huge code. Our full CUDA port of the high-resolution weather prediction model ASUCA is the first that we know of to date; ASUCA is a next-generation, production weather code developed by the Japan Meteorological Agency, similar to WRF in the underlying physics (non-hydrostatic model). Benchmarks on the 528-GPU (NVIDIA GT200 Tesla) TSUBAME supercomputer at the Tokyo Institute of Technology demonstrated over 80-fold speedup and good weak scaling, achieving 15.0 TFlops in single precision for a 6956 x 6052 x 48 mesh. Further benchmarks on TSUBAME 2.0, which will embody over 4000 NVIDIA Fermi GPUs and will be deployed in October 2010, will be presented. Paper available at IEEE. |
| Parallel Implementation of the Irregular Terrain Model (ITM) for Radio Transmission Loss Prediction Using GPU and Cell BE Processors (IEEE) | Abstract: The Irregular Terrain Model (ITM), also known as the Longley-Rice model, predicts long-range average transmission loss of a radio signal based on atmospheric and geographic conditions. Due to variable terrain effects and constantly changing atmospheric conditions which can dramatically influence radio wave propagation, there is a pressing need for computational resources capable of running hundreds of thousands of transmission loss calculations per second. Multicore processors, like the NVIDIA Graphics Processing Unit (GPU) and IBM Cell Broadband Engine (BE), offer improved performance over mainstream microprocessors for ITM. We study architectural features of the Tesla C870 GPU and Cell BE and evaluate the effectiveness of architecture-specific optimizations and parallelization strategies for ITM on these platforms. We assess the GPU implementations that utilize both global and shared memories along with fine-grained parallelism. We assess the Cell BE implementations that utilize direct memory access, double buffering and SIMDization. With these optimization strategies, we achieve less than a second of computation time on each platform, which is not feasible with a general purpose processor, and we observe that the GPU delivers better performance than Cell BE in terms of total execution time and performance per watt metrics by a factor of 2.3x and 1.6x respectively. Paper available at IEEE. |
| GPU-based real-time virtual reality modeling and simulation of seashore (IEEE) | Abstract: This paper is devoted to efficient algorithms for real-time rendering of a seashore using a programmable graphics processing unit (GPU). A seashore scene is a common component of virtual environments in simulators or games, and it should be both realistic and rendered in real time. We realized the real-time seashore simulation in three steps: first, ocean wave generation, using a simple but highly efficient model that can describe both shallow and deep ocean; second, imitation of the optical effects; and third, the interaction of ocean waves with the coast, including a mathematically designed coastline simulation and a breaking-wave simulation built from 3D Bézier curved surfaces via metamorphosing and key-frame animation. Scenes under different atmospheric conditions are also presented in this paper. Paper available at IEEE. |
| GPU implementation of belief propagation using CUDA for cloud tracking and reconstruction (IEEE) | Abstract: This paper describes an efficient CUDA-based GPU implementation of the belief propagation algorithm that can be used to speed up stereo image processing and motion tracking calculations without loss of accuracy. Preliminary results in using belief propagation to analyze satellite images of Hurricane Luis for real-time cloud structure and tracking are promising, with speed-ups of nearly a factor of five. Paper available at IEEE. |
| GPU acceleration of numerical weather prediction (IEEE) | Abstract: Weather and climate prediction software has enjoyed the benefits of exponentially increasing processor power for almost 50 years. Even with the advent of large-scale parallelism in weather models, much of the performance increase has come from increasing processor speed rather than increased parallelism. This free ride is nearly over. Recent results also indicate that simply increasing the use of large-scale parallelism will prove ineffective for many scenarios. We present an alternative method of scaling model performance by exploiting emerging architectures using the fine-grain parallelism once used in vector machines. The paper shows the promise of this approach by demonstrating a 20 times speedup for a computationally intensive portion of the Weather Research and Forecast (WRF) model on an NVIDIA 8800 GTX graphics processing unit (GPU). We expect an overall 1.3 times speedup from this change alone. Paper available at IEEE. |
| GPU Computing for Atmospheric Modeling (IEEE) | Abstract: Much success has been achieved using GPUs to accelerate existing applications that are highly data parallel, or that are dominated by small, intense computational kernels. What are the prospects for porting existing large scientific models that do not fit this mold? We take an expensive routine from the CAM atmosphere model and port it to a GPU using CUDA. We use the experience gained as a guide in thinking about porting the full application to an accelerator-based system. We consider the best path forward for getting large scientific models running on accelerator-based systems, and identify cases where porting may be feasible and cases where a complete redesign may be the best option. Paper available at IEEE. |
| GPS Forward Model Computing Study on CPU/GPU Co-Processing Parallel System Using CUDA (IEEE) | Abstract: Profiles of refraction and bending angle, which are computed through the forward model for GPS RO (Global Positioning System radio occultation), are extremely important for GPS radio occultation data assimilation into the forecast system of NWP (Numerical Weather Prediction). The daily processing of GPS RO data in the assimilation system is time-consuming, so there is an urgent need to find a new way to reduce the computing time. GPUs are suited to many data- and computation-intensive tasks and have emerged as inexpensive high-performance co-processors because of their tremendous computing power. In this paper, we demonstrate how the forward model for GPS can be accelerated considerably by using a throughput-oriented GPU on a standard PC. Our implementation is based on loop unrolling, CUDA streams, SPMD, and SIMD vector parallel computing. We have successfully implemented the forward model on a single-GPU platform and then developed a simple CPU/GPU parallel cluster. The results on a GTX 480 for a single GPU show a speedup of up to 259 over the CPU-based program. In comparison to a single node, the speedup on our three-node cluster is 2.68. All results demonstrate that the forward model can be efficiently parallelized on a CPU/GPU cluster. They also indicate that the cluster has good scalability. Paper available at IEEE. |
| Running the NIM Next-Generation Weather Model on GPUs (IEEE) | Abstract: We are using GPUs to run a new weather model being developed at NOAA's Earth System Research Laboratory (ESRL). The parallelization approach is to run the entire model on the GPU and rely on the CPU only for model initialization, I/O, and inter-processor communications. We have written a compiler to convert Fortran into CUDA, and used it to parallelize the dynamics portion of the model. Dynamics, the most computationally intensive part of the model, is currently running 34 times faster on a single GPU than on the CPU. We also describe our approach and progress to date in running NIM on multiple GPUs. Paper available at IEEE. |
| SAR focusing of P-band Ice Sounding Data Using Back-Projection (IEEE) | Abstract: SAR processing can be applied to ice sounder data to improve along-track resolution and clutter suppression. This paper presents a time-domain back-projection technique for SAR focusing of ice sounder data. With this technique, variations in flight track and ice surface slope can be accurately accommodated at the expense of computation time. The back-projection algorithm can, however, be easily parallelized and can advantageously be implemented on a graphics processing unit (GPU). Results from using the back-projection algorithm on POLARIS ice sounder data from North Greenland show that the quality of the data is improved by the processing, and the performance of the GPU implementation allows for very fast focusing. Paper available at IEEE. |
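
Two of the abstracts above quote speedup figures that can be sanity-checked with short arithmetic. The numerical weather prediction abstract reports a 20 times speedup on one portion of WRF and an expected 1.3 times speedup overall, and the GPS forward model abstract reports a speedup of 2.68 on three nodes relative to one. The sketch below is only an illustration: it assumes Amdahl's law describes the WRF case, which the abstract does not state, and the function names are hypothetical, not taken from any of the papers.

```python
def amdahl_overall_speedup(fraction: float, portion_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime runs `portion_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / portion_speedup)


def implied_fraction(overall_speedup: float, portion_speedup: float) -> float:
    """Invert Amdahl's law: the runtime fraction that must have been accelerated."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / portion_speedup)


def parallel_efficiency(speedup: float, workers: int) -> float:
    """Speedup divided by the number of workers (1.0 would be ideal scaling)."""
    return speedup / workers


if __name__ == "__main__":
    # Figures quoted in the WRF abstract: 20x on one portion, 1.3x overall.
    frac = implied_fraction(overall_speedup=1.3, portion_speedup=20.0)
    print(f"implied accelerated fraction of runtime: {frac:.3f}")
    print(f"round-trip overall speedup: {amdahl_overall_speedup(frac, 20.0):.2f}")

    # Figures quoted in the GPS forward model abstract: 2.68x on three nodes.
    print(f"three-node parallel efficiency: {parallel_efficiency(2.68, 3):.3f}")
```

Under the Amdahl's law assumption, the WRF figures imply that the accelerated portion accounted for roughly a quarter of the original runtime. The three-node figure corresponds to a parallel efficiency of about 89%, which is consistent with that abstract's claim of good scalability.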

BayWebSoft