Design Space Exploration of Adaptive Beamforming Acceleration for Bedside and Portable Medical Ultrasound Imaging (ACM)

Abstract:

The use of adaptive beamforming is a viable solution to provide high-resolution real-time medical ultrasound imaging. However, the increase in image resolution comes at the expense of a significant increase in compute requirements over conventional algorithms. In a bedside diagnosis setting where plug-in power is available, GPUs are promising accelerators to address the processing demand. However, in point-of-care diagnostics where portable ultrasound imaging devices must be used, alternative power-efficient computer systems must be employed, possibly at the expense of lower image resolution in order to maintain real-time performance. This paper presents an initial design space exploration of viable compute architectures that might address the drastically different requirements of bedside and portable medical ultrasound imaging systems using adaptive beamforming. The design and implementation of a GPU accelerator that provides over 45x performance improvement over the equivalent C implementation on a single CPU is presented. Furthermore, an implementation of the beamforming algorithm on a high-performance mobile platform based on an ARM Cortex-A8 mobile processor in combination with the built-in NEON accelerator is presented. The mobile platform delivers over 270x reduction in power consumption compared to the GPU platform, at the expense of much lower performance. The tradeoffs between power, performance and image quality among the target platforms are studied, and future research directions in power-efficient architectures for high-performance medical ultrasound systems are presented.

Paper available at ACM.
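The abstract does not say which adaptive beamformer was accelerated. A common choice in medical ultrasound is minimum-variance (Capon) beamforming, whose per-pixel weights are w = R⁻¹a / (aᵀR⁻¹a). Solving this small linear system for every image pixel is what makes adaptive beamforming so much costlier than delay-and-sum. The sketch below is illustrative only and makes several assumptions: the channel data are real-valued and already delay-compensated (so the steering vector a is all ones), diagonal loading is used for stability, and all function names are hypothetical.

```python
from typing import List

# Illustrative minimum-variance beamformer sketch (hypothetical names; not the paper's code).
Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def mv_weights(snapshots: List[List[float]], loading: float = 1e-2) -> List[float]:
    """Minimum-variance weights from K delay-aligned snapshots of N channels.

    Assumes the steering vector is all ones (data already delay-compensated).
    """
    n = len(snapshots[0])
    k = len(snapshots)
    # Sample covariance R = (1/K) * sum over snapshots of x x^T
    R = [[sum(s[i] * s[j] for s in snapshots) / k for j in range(n)] for i in range(n)]
    # Diagonal loading proportional to mean channel power keeps R invertible
    trace = sum(R[i][i] for i in range(n))
    for i in range(n):
        R[i][i] += loading * trace / n
    Ria = solve(R, [1.0] * n)  # R^{-1} a
    denom = sum(Ria)           # a^T R^{-1} a, since a is all ones
    return [v / denom for v in Ria]


def mv_output(snapshot: List[float], w: List[float]) -> float:
    """Beamformed sample y = w^T x."""
    return sum(wi * xi for wi, xi in zip(w, snapshot))
```

Because the weights satisfy the distortionless constraint wᵀa = 1, a signal that arrives identically on every channel passes through with unit gain, while the weights adapt to suppress the rest.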

High-Performance 3D Compressive Sensing MRI Reconstruction using Many-Core Architectures (ACM)

Abstract:

Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from many fewer samples than required by the Nyquist criterion. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based implementation running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability.

Paper available at ACM.
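The abstract does not name the CS algorithm being accelerated. As a generic illustration only, the sketch below uses iterative soft-thresholding (ISTA), a standard method for the sparse-recovery objective ½‖Ax − y‖² + λ‖x‖₁ that CS reconstructions solve. This is an assumed stand-in, not the paper's algorithm. In MRI, A would be an undersampled Fourier operator combined with coil sensitivities; here it is a small dense matrix, and all names are hypothetical.

```python
from typing import List

# Illustrative ISTA sketch for sparse recovery (hypothetical; not the paper's algorithm).
Vec = List[float]
Mat = List[List[float]]


def matvec(A: Mat, x: Vec) -> Vec:
    """Compute A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A: Mat, r: Vec) -> Vec:
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft(v: float, t: float) -> float:
    """Soft-thresholding, the proximal operator of t * |v|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(A: Mat, x: Vec, y: Vec, lam: float) -> float:
    """0.5 * ||Ax - y||^2 + lam * ||x||_1."""
    r = [a - b for a, b in zip(matvec(A, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def ista(A: Mat, y: Vec, lam: float, iters: int = 200) -> Vec:
    """Iterative soft-thresholding for min 0.5||Ax - y||^2 + lam ||x||_1."""
    # Step 1/L with L = squared Frobenius norm, an upper bound on ||A||_2^2,
    # so the objective is guaranteed not to increase between iterations.
    L = sum(v * v for row in A for v in row)
    step = 1.0 / L
    x = [0.0] * len(A[0])
    for _ in range(iters):
        r = [a - b for a, b in zip(matvec(A, x), y)]  # residual Ax - y
        g = rmatvec(A, r)                               # gradient A^T (Ax - y)
        x = [soft(xi - step * gi, step * lam) for xi, gi in zip(x, g)]
    return x
```

Each iteration needs one forward and one adjoint application of A. In 3D MRI those are large multi-coil FFTs, which is why throughput-oriented hardware pays off.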

Fast Medical Image Reconstruction using Graphics Processing Unit: Towards Real-Time Reconstruction of Magnetic Resonance Images (ACM)

Abstract:

A fast magnetic resonance imaging (MRI) reconstruction algorithm that takes advantage of the prevailing general-purpose graphics processing unit (GPGPU) programming paradigm is explored in this book. In a number of medical imaging modalities, the Fast Fourier Transform (FFT) is used to reconstruct images from acquired raw data. The objective is to develop an algorithm that runs on both the CPU and the GPU and performs the FFT and the inverse Fourier transform (IFT) much faster. The algorithm is developed in the MATLAB environment. The CUFFT library is used to run the transforms on the device in order to study the performance improvement of the reconstructions. GPUMat is used to run CUDA code from MATLAB. This book evaluates the acceleration of the MRI reconstruction algorithm on an NVIDIA GeForce G 103M GPU and an Intel Core 2 Duo CPU. The experimental FFT-based reconstruction algorithm shows that GPU-based MRI reconstruction achieves a significant speedup over the CPU, at lower cost, for medical applications. The measured GPU runtimes indicate that real-time MRI reconstruction will be possible.

Paper available at ACM.
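For fully sampled Cartesian k-space, FFT-based MRI reconstruction reduces to a 2D inverse Fourier transform of the acquired data followed by taking the magnitude. The sketch below is a minimal CPU illustration under these assumptions: power-of-two matrix sizes, a single coil, and no fftshift. It uses a textbook radix-2 Cooley-Tukey FFT and exploits the separability of the 2D transform (1D transforms along rows, then columns). The function names are hypothetical, and this is not the book's MATLAB/CUFFT code.

```python
import cmath
from typing import List

# Illustrative FFT-based reconstruction sketch (hypothetical names; single coil, no fftshift).
Grid = List[List[complex]]


def fft1d(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft1d(x[0::2], inverse)
    odd = fft1d(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft2d(img: Grid, inverse: bool = False) -> Grid:
    """Separable 2D FFT: 1D transforms along rows, then along columns."""
    rows = [fft1d(list(r), inverse) for r in img]
    cols = [fft1d(list(c), inverse) for c in zip(*rows)]
    out = [list(r) for r in zip(*cols)]
    if inverse:  # normalize the inverse by 1/(M*N)
        scale = 1.0 / (len(img) * len(img[0]))
        out = [[v * scale for v in r] for r in out]
    return out


def reconstruct(kspace: Grid) -> List[List[float]]:
    """Magnitude image from fully sampled Cartesian k-space."""
    return [[abs(v) for v in row] for row in fft2d(kspace, inverse=True)]
```

The row and column passes are independent across rows and across columns, which is the kind of data parallelism that GPU FFT libraries can exploit.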

Real-Time Ultrasound Simulation using the GPU (IEEE)

Abstract:

Ultrasound simulators can be used for training ultrasound image acquisition and interpretation. In such simulators, synthetic ultrasound images must be generated in real time. Anatomy can be modeled by computed tomography (CT). Shadows can be calculated by combining reflection coefficients with depth-dependent exponential attenuation. To include speckle, a pre-calculated texture map is typically added. Dynamic objects must be simulated separately. We propose to increase the speckle realism and allow for dynamic objects by using a physical model of the underlying scattering process. The model is based on convolution of the point spread function (PSF) of the ultrasound scanner with a scatterer distribution. The challenge is that the typical field of view contains millions of scatterers, which must be selected by a virtual probe from an even larger body of scatterers. The main idea of this paper is to select and sample scatterers in parallel on the graphics processing unit (GPU). The method was used to image a cyst phantom and a movable needle. Speckle images were produced in real time (more than 10 frames per second) on a standard GPU. The ultrasound images were visually similar to images calculated by a reference method.

Paper available at IEEE.
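The simulation model above forms the image by convolving the scanner's point spread function with a scatterer distribution. The sketch below illustrates that idea along a single scan line. It assumes a Gaussian-windowed cosine pulse as the axial PSF and represents scatterers as (sample index, amplitude) pairs; both the pulse parameters and the function names are hypothetical. The paper's GPU step of selecting and sampling scatterers from a 3D body is not shown.

```python
import math
from typing import List, Tuple

# Illustrative 1D PSF-convolution sketch (hypothetical PSF and names; not the paper's code).


def gaussian_pulse(length: int, cycles_per_sample: float, sigma: float) -> List[float]:
    """Hypothetical axial PSF: a cosine carrier under a Gaussian envelope, centred in the window."""
    c = (length - 1) / 2
    return [math.exp(-((i - c) ** 2) / (2 * sigma ** 2))
            * math.cos(2 * math.pi * cycles_per_sample * (i - c))
            for i in range(length)]


def simulate_line(scatterers: List[Tuple[int, float]], psf: List[float], n: int) -> List[float]:
    """RF line as a sum of PSF copies centred on each scatterer, scaled by its amplitude.

    This equals convolving a sparse scatterer map with the PSF, truncated to n samples.
    """
    rf = [0.0] * n
    half = len(psf) // 2
    for pos, amp in scatterers:
        for i, p in enumerate(psf):
            j = pos + i - half  # centre the PSF on the scatterer
            if 0 <= j < n:
                rf[j] += amp * p
    return rf
```

Because the model is linear, speckle arises from the interference of many overlapping PSF copies. Each scatterer's contribution can be computed independently, which maps naturally onto GPU threads.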

ISP: An Optimal Out-of-Core Image-Set Processing Streaming Architecture for Parallel Heterogeneous Systems (IEEE)

Abstract:

Image population analysis is the class of statistical methods that plays a central role in understanding the development, evolution, and disease of a population. However, these techniques often require excessive computational power and memory, requirements that are compounded by the large number of volumetric inputs. Restricted access to supercomputing power limits their influence in general research and practical applications. In this paper we introduce ISP, an Image-Set Processing streaming framework that harnesses the processing power of commodity heterogeneous CPU/GPU systems to address this computational problem. In ISP, we introduce specially designed streaming algorithms and data structures that provide an optimal solution for out-of-core multi-image processing problems in terms of both memory usage and computational efficiency. ISP uses the asynchronous execution mechanism supported by parallel heterogeneous systems to efficiently hide the inherent latency of the processing pipeline of out-of-core approaches. Consequently, for computationally intensive problems, the ISP out-of-core solution can achieve the same performance as the in-core solution. We demonstrate the efficiency of the ISP framework on synthetic and real datasets.

Paper available at IEEE.
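ISP hides out-of-core latency by overlapping data movement with computation. The sketch below shows that double-buffering pattern on the CPU only and is not ISP's API. It assumes a hypothetical `load_chunk` callback that reads one slab of voxels from every image in the set. A background thread prefetches the next chunk into a bounded queue while the main thread computes a per-voxel statistic (here, the mean across images) on the current chunk.

```python
import queue
import threading
from typing import Callable, List, Optional

# Illustrative double-buffered streaming sketch (hypothetical names; not ISP's API).
Chunk = List[List[float]]  # [image][voxel] values for one slab of voxels


def streaming_mean(load_chunk: Callable[[int], Chunk], n_chunks: int,
                   depth: int = 2) -> List[float]:
    """Per-voxel mean across an image set, computed one chunk at a time.

    A producer thread loads chunks ahead of the consumer. The bounded queue
    (depth 2 = double buffering) caps how many chunks are resident at once.
    """
    q: "queue.Queue[Optional[Chunk]]" = queue.Queue(maxsize=depth)

    def producer() -> None:
        for c in range(n_chunks):
            q.put(load_chunk(c))  # blocks while the queue is full
        q.put(None)               # sentinel: no more chunks

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    result: List[float] = []
    while True:
        chunk = q.get()
        if chunk is None:
            break
        n_images = len(chunk)
        # Compute on this chunk while the producer loads the next one
        result.extend(sum(vals) / n_images for vals in zip(*chunk))
    t.join()
    return result
```

Memory use is bounded by the queue depth times the chunk size rather than by the size of the whole image set. When the computation per chunk takes at least as long as loading one, the I/O is effectively hidden.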

Multiscale AM-FM Decompositions with GPU Acceleration for Diabetic Retinopathy Screening (IEEE)

Abstract:

A computer-aided diagnosis (CAD) system based on multiscale amplitude-modulation frequency-modulation (AM-FM) methods has recently been developed for discriminating between normal and pathological retinal images. The original MATLAB implementation of this system required large amounts of computational time and memory, which would not permit real-time patient consultation. In this manuscript, we present a new implementation of the multiscale AM-FM decomposition, converted from MATLAB code into C/CUDA (Compute Unified Device Architecture) code, in order to take advantage of graphics processing units (GPUs) to significantly reduce running time and memory requirements.

Paper available at IEEE.

Dynamic Task-Scheduling and Resource Management for GPU Accelerators in Medical Imaging (ACM)

Abstract:

For medical imaging applications, timely execution of tasks is essential. Hence, when multiple applications run on the same system, scheduling with support for task preemption and prioritization becomes mandatory. Using GPUs as accelerators in this domain imposes new challenges, since the common FIFO scheduling of GPUs supports neither task prioritization nor preemption. As a remedy, this paper investigates the use of resource management and scheduling techniques on GPU accelerators for applications from the medical domain. A scheduler supporting both priority-based and LDF scheduling is added to the system such that high-priority tasks can interrupt tasks already enqueued for execution. The scheduler is capable of utilizing multiple GPUs in a system to minimize the average response time of applications. Moreover, it supports simultaneous execution of multiple tasks to hide data-transfer latencies. We show that the scheduler interrupts scheduled and already enqueued applications to fulfill the timing requirements of high-priority dynamic tasks.

Paper available at ACM.
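The key limitation named above is that a plain FIFO queue cannot let a later, higher-priority task overtake work that is already enqueued. The sketch below illustrates priority-based dispatch for a single device. It assumes hypothetical task names and integer priorities, with higher values meaning more urgent. Tasks that are enqueued but not yet started can be overtaken, and FIFO order is kept among tasks of equal priority. It does not model the paper's LDF policy, multi-GPU placement, or interruption of running work.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Illustrative priority dispatch sketch (hypothetical; not the paper's scheduler).


class PriorityScheduler:
    """Dispatch queue where a newly arriving high-priority task jumps ahead
    of lower-priority tasks that are enqueued but not yet running."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, name: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority to pop the most urgent task first
        heapq.heappush(self._heap, (-priority, next(self._seq), name))

    def next_task(self) -> Optional[str]:
        """Return the next task to launch, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

For example, if two priority-1 tasks are queued and then a priority-5 task arrives, the priority-5 task is dispatched first, followed by the two priority-1 tasks in their original order.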

A GPU-Optimized Binary Space Partition Structure to Accelerate the Monte Carlo Simulation of CT Projections of Voxelized Patient Models with Metal Implants (IEEE)

Abstract:

Monte Carlo x-ray transport simulation codes can generate radiographic images that are equivalent to images produced by clinical systems. Most codes optimized for medical imaging use voxels to represent the patient anatomy and employ delta scattering (Woodcock algorithm) as an essential acceleration technique. With delta scattering, all voxels are treated as having the same attenuation, and x rays cross multiple voxels in each step, reducing the time spent accessing memory and computing voxel interfaces. A drawback of this approach is that it is inefficient in phantoms with highly attenuating voxels. We present a binary space partition structure, bitree, that improves the performance of delta scattering by selecting an optimum attenuation within different regions while minimizing the interfaces to be crossed. The described bitree and its traversal algorithm are optimized for GPU computing and have been implemented in the PENELOPE-based MC-GPU code. The bitree approach reduced the execution time by 88.6% for the simulation of a head CT scan with a metallic implant.

Paper available at IEEE.
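Delta scattering (Woodcock tracking) samples flight distances from a single majorant attenuation μ_max, so a photon can jump across many voxels without computing voxel boundaries. At each tentative collision, a real interaction is accepted with probability μ(x)/μ_max; otherwise, the collision is virtual and the photon continues. The sketch below is a minimal 1D illustration and is not MC-GPU code. It assumes a hypothetical slab of voxels with known linear attenuation coefficients and counts only photons that cross the slab without a real interaction, so the estimate can be checked against the Beer-Lambert transmission exp(−Σ μᵢ Δx). A single strongly attenuating voxel, such as metal, forces a large μ_max everywhere and multiplies the number of virtual collisions. This is the inefficiency that the bitree's per-region attenuation values target.

```python
import math
import random
from typing import List

# Illustrative 1D Woodcock tracking sketch (hypothetical slab; not MC-GPU code).


def woodcock_transmission(mu: List[float], dx: float, n_photons: int,
                          seed: int = 0) -> float:
    """Fraction of photons crossing a 1D voxel slab with no real interaction.

    mu: linear attenuation coefficient of each voxel (1/cm); dx: voxel size (cm).
    """
    rng = random.Random(seed)
    mu_max = max(mu)  # single global majorant
    length = len(mu) * dx
    transmitted = 0
    for _ in range(n_photons):
        x = 0.0
        while True:
            # Tentative step sampled as if the whole slab had attenuation mu_max
            x += -math.log(1.0 - rng.random()) / mu_max
            if x >= length:
                transmitted += 1  # left the slab without a real interaction
                break
            # Only the voxel at the collision point is read (no boundary crossings)
            local = mu[min(int(x / dx), len(mu) - 1)]
            if rng.random() < local / mu_max:
                break  # real interaction: photon removed from the primary beam
            # Otherwise it is a virtual collision: keep flying
    return transmitted / n_photons


def beer_lambert(mu: List[float], dx: float) -> float:
    """Analytic primary transmission exp(-sum(mu_i * dx))."""
    return math.exp(-sum(m * dx for m in mu))
```

Thinning a Poisson process of rate μ_max with acceptance probability μ(x)/μ_max yields interactions at the true local rate μ(x). This is why the estimate converges to the Beer-Lambert value even though voxel boundaries are never tracked.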

GPU-Based Visualization and Synchronization of 4D Cardiac MR and Ultrasound Images (IEEE)

Abstract:

In minimally invasive image-guided interventions, different imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), and three-dimensional (3D) ultrasound (US), can provide complementary, multi-spectral image information. Dynamic image registration is a well-established approach that permits real-time diagnostic information to be enhanced by placing lower-quality real-time images within a high-quality anatomical context. For the guidance of cardiac interventions, it would be valuable to register dynamic MRI or CT with intra-operative US. However, in practice, either the high computational cost prohibits such real-time visualization or the resulting image quality is not satisfactory for accurate interventional guidance. Modern graphics processing units (GPUs) provide the programmability, parallelism and increased computational precision to address this problem. In this work, we first outline our research on dynamic 3D cardiac MR and US image acquisition, real-time dual-modality registration and US tracking. Next, we describe our contributions on image processing and optimization techniques for 4D (3D + time) cardiac image rendering, our GPU-accelerated methodologies for multimodality 4D medical image visualization and optical blending, and real-time synchronization of dual-modality dynamic cardiac images. Finally, multiple transfer functions, various image composition schemes, and an extended window-level setting and adjustment approach are proposed and applied to facilitate exploration of the dynamic volumetric MR and US cardiac data and to enhance features of interest in US images, which are usually restricted to a narrow voxel intensity range.

Paper available at IEEE.
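Window-level setting maps a chosen intensity window onto the full display range. This is how a narrow band of US voxel intensities can be stretched to full contrast. The sketch below shows only the standard linear window-level mapping plus a per-voxel weighted blend of two co-registered modalities that have already been resampled to the same grid. The paper's extended window-level approach, its transfer functions, and its composition schemes are not reproduced here. The blend weight and function names are hypothetical.

```python
from typing import List, Tuple

# Illustrative window-level and blending sketch (hypothetical names; not the paper's method).


def window_level(v: float, window: float, level: float) -> float:
    """Linear window/level: map [level - window/2, level + window/2] to [0, 1], clamped."""
    lo = level - window / 2.0
    t = (v - lo) / window
    return min(1.0, max(0.0, t))


def blend(mr: List[float], us: List[float], us_weight: float,
          mr_wl: Tuple[float, float], us_wl: Tuple[float, float]) -> List[float]:
    """Per-voxel weighted blend of two co-registered modalities after windowing each.

    mr_wl and us_wl are (window, level) pairs; us_weight is in [0, 1].
    """
    out = []
    for a, b in zip(mr, us):
        ma = window_level(a, *mr_wl)
        mb = window_level(b, *us_wl)
        out.append((1.0 - us_weight) * ma + us_weight * mb)
    return out
```

For instance, a US window of width 20 centred at 110 maps intensities 100 to 120 onto the full [0, 1] range, so small differences inside that narrow band become visible after blending.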

Colour Diffusion Model Acceleration on GPUs (IEEE)

Abstract:

In this paper we propose several algorithms for the parallel GPU implementation of a diffusion model for colour images, to be used as external energy for active contours. The diffusion model was proposed for colour images and is based on the first-order moment of the correlation integral expressed using ΔE distances in the CIE Lab colour space, whereas most existing diffusion models are defined only for grayscale images. The acceleration we propose was used in a multi-scale active contour approach. We describe our various approaches to acceleration and parallel implementation and report the resulting performance increase. We show how the diffusion model was successfully used to segment dermatological images. We conclude that the proposed approaches and parallel implementations are useful for speeding up the automatic analysis of medical images, which is mandatory for everyday clinical screening.

Paper available at IEEE.
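The diffusion model above is built on ΔE colour differences in CIE Lab. The sketch below shows only that building block and is not the paper's diffusion model. It assumes the CIE76 definition of ΔE (Euclidean distance in L*a*b*). As a hypothetical stand-in for the correlation-integral term, it computes the fraction of neighbouring pixels whose colour lies within radius r of a centre pixel. Conversion from RGB to Lab is omitted.

```python
import math
from typing import List, Tuple

# Illustrative Delta E and local correlation sketch (hypothetical; not the paper's model).
Lab = Tuple[float, float, float]


def delta_e76(c1: Lab, c2: Lab) -> float:
    """CIE76 colour difference: Euclidean distance in CIE L*a*b*."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def local_correlation(centre: Lab, neighbours: List[Lab], r: float) -> float:
    """Fraction of neighbours whose colour lies within Delta E < r of the centre pixel."""
    if not neighbours:
        return 0.0
    close = sum(1 for n in neighbours if delta_e76(centre, n) < r)
    return close / len(neighbours)
```

Each pixel's value depends only on its own neighbourhood, so a GPU implementation can assign one thread per pixel.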