Stories, Papers, WIKIs

Towards Chip-on-Chip Neuroscience: Fast Mining of Neuronal Spike Streams Using Graphics Hardware

Abstract:

Computational neuroscience is being revolutionized with the advent of multi-electrode arrays that provide real-time, dynamic perspectives into brain function. Mining neuronal spike streams from these chips is critical to understand the firing patterns of neurons and gain insight into the underlying cellular activity. To address this need, we present a solution that uses a massively parallel graphics processing unit (GPU) to mine the stream of spikes. We focus on mining frequent episodes that capture coordinated events across time even in the presence of intervening background events. Our contributions include new computation-to-core mapping schemes and novel strategies to map finite state machine-based counting algorithms onto the GPU. Together, these contributions move us towards a real-time ‘chip-on-chip’ solution to neuroscience data mining, where one chip (the multi-electrode array) supplies the spike train data and another chip (the GPU) mines it at a scale previously unachievable.
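
The finite state machine-based counting mentioned in the abstract above can be illustrated with a minimal sequential sketch. It counts non-overlapped occurrences of a serial episode (an ordered list of event types) while skipping intervening background events. The function names and the toy stream are hypothetical and are not taken from the paper; the sketch also ignores spike timing constraints. Each candidate episode gets its own independent state machine, which is the kind of unit a GPU mapping could assign to a thread, though that mapping is only assumed here.

```python
def count_episode(events, episode):
    """Count non-overlapped occurrences of a serial episode in an event stream.

    The state is the index of the next event type the machine is waiting for.
    Events that do not match are treated as background and skipped.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    state = 0
    count = 0
    for ev in events:
        if ev == episode[state]:
            state += 1
            if state == len(episode):
                # Full episode seen: record it and restart the machine.
                count += 1
                state = 0
    return count


def count_all(events, episodes):
    """Run one independent state machine per candidate episode."""
    return {ep: count_episode(events, ep) for ep in episodes}


# Toy spike stream: letters stand for neurons (hypothetical data).
stream = list("AXBYCABZCQAB")
candidates = [("A", "B", "C"), ("A", "B"), ("B", "A")]
counts = count_all(stream, candidates)
```

Because the machines share no state, the loop over candidates in `count_all` is the part that parallelizes trivially; splitting a single stream across workers would need extra care at the segment boundaries.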

Parallelisation of Fuzzy Inference on a Graphics Processor Unit Using the Compute Unified Device Architecture

Abstract:

 

The inherently parallel nature of fuzzy inference is rarely exploited by fuzzy systems researchers. Hardware implementations, such as Field Programmable Gate Arrays (FPGAs), commonly use parallel architectures to achieve fast inference speeds. In this paper, we explore the use of Graphics Processor Units (GPUs) and NVIDIA's Compute Unified Device Architecture (CUDA) for fast inference speeds in a scalable and flexible Mamdani-type fuzzy inference system (FIS). Our goal is to provide computational intelligence researchers with the skills necessary to exploit the low cost and high performance of GPUs with a minimum learning cost.
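
As a rough illustration of where the parallelism in Mamdani-type inference comes from, the sketch below evaluates a tiny single-input system sequentially. The fuzzy sets, the three rules, and the temperature-to-fan-speed scenario are all assumptions made for this example and do not come from the paper. It uses min for implication, max for aggregation, and a discretized centroid for defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets: temperature in [0, 40], fan speed in [0, 100].
TEMP_SETS = {"cold": (-20, 0, 20), "warm": (0, 20, 40), "hot": (20, 40, 60)}
SPEED_SETS = {"low": (-50, 0, 50), "medium": (0, 50, 100), "high": (50, 100, 150)}
RULES = [("cold", "low"), ("warm", "medium"), ("hot", "high")]


def infer(temp, samples=101):
    """Mamdani inference: min implication, max aggregation, centroid output."""
    # Firing strength of each rule for this input.
    strengths = [(tri(temp, *TEMP_SETS[ante]), cons) for ante, cons in RULES]
    num = 0.0
    den = 0.0
    # Every output sample is computed independently of the others,
    # which is the step a GPU version could spread across threads.
    for i in range(samples):
        y = 100.0 * i / (samples - 1)
        mu = max(min(s, tri(y, *SPEED_SETS[cons])) for s, cons in strengths)
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired for this input")
    return num / den


speed_at_20 = infer(20.0)
```

With these sets, an input of 20 fires only the "warm" rule, so the output is the centroid of the symmetric "medium" set, which sits at the middle of the speed range.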

A Translation System for Enabling Data Mining Applications on GPUs (ACM)

Abstract:


Modern GPUs offer much computing power at a very modest cost. Even though CUDA and other related recent developments are accelerating the use of GPUs for general purpose applications, several challenges still remain in programming the GPUs. Thus, it is clearly desirable to be able to program GPUs using a higher-level interface.
 


In this paper, we offer a solution that targets a specific class of applications, which are the data mining and scientific data analysis applications. Our work is driven by the observation that a common processing structure, that of generalized reductions, fits a large number of popular data mining algorithms. In our solution, the programmers simply need to specify the sequential reduction loop(s) with some additional information about the parameters. We use program analysis and code generation to map the applications to a GPU. Several additional optimizations are also performed by the system.
 


We have evaluated our system using three popular data mining applications: k-means clustering, EM clustering, and Principal Component Analysis (PCA). The main observations from our experiments are as follows. The speedup that each of these applications achieves over a sequential CPU version ranges between 20 and 50. The automatically generated version did not have any noticeable overheads compared to hand-written codes. Finally, the optimizations performed in the system resulted in significant performance improvements.
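
The generalized reduction structure that the abstract above refers to can be sketched with one k-means iteration. This is a plain Python illustration under stated assumptions, not the translation system's input format or generated code: the names `local_reduce`, `combine`, and `kmeans_step` are hypothetical. Each chunk of points updates its own private reduction object (per-cluster sums and counts), and the partial objects are then combined.

```python
import functools


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def local_reduce(points, centroids):
    """Sequential reduction loop over one chunk, using a private reduction object."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for pt in points:
        k = nearest(pt, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += pt[d]
    return sums, counts


def combine(a, b):
    """Merge two reduction objects element-wise."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_step(points, centroids, n_chunks=2):
    """One k-means update expressed as chunked reduction plus a global combine."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    partials = [local_reduce(chunk, centroids) for chunk in chunks]
    sums, counts = functools.reduce(combine, partials)
    # Keep the old centroid when a cluster received no points.
    return [
        [s / n for s in row] if n else list(old)
        for row, n, old in zip(sums, counts, centroids)
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
new_centroids = kmeans_step(points, [(1.0, 1.0), (9.0, 9.0)])
```

Because the update to the reduction object is associative and commutative, the result does not depend on how the points are split into chunks, which is what makes this loop shape a natural fit for parallel hardware.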

Data Visualization and Mining using the GPU

Abstract:


An exciting development in the computing industry has been the emergence of graphics processing units (GPUs) as fast general-purpose co-processors. Initially designed for gaming applications, today's GPUs demonstrate impressive computing power and high levels of parallelism and are now being used for a variety of applications far removed from traditional graphics rendering settings.
Perhaps the most powerful use of the GPU has been in visualization, which couples the raw computing power of the GPU with its extensive capabilities for rendering scenes. The GPU provides the required computing power and real-time interactive rendering capabilities, and there are now GPU-assisted algorithms for many fundamental problems in data visualization and analysis, including such basic primitives as matrix operations, FFTs, wavelet transforms, clustering and mining data streams. This is an exciting and fast-developing area, and the tools and techniques are now mature enough that researchers with no experience in using the GPU can use it to develop new data mining tools. The purpose of this tutorial is to introduce the KDD audience to the GPU and the programming model it represents, describe the ways in which one can program the GPU, and demonstrate a set of data mining primitives that have been implemented effectively on the GPU.

Data Mining Using Graphics Processing Units

Abstract:


During the last few years, Graphics Processing Units (GPUs) have evolved from simple devices for display signal preparation into powerful coprocessors that not only support typical computer graphics tasks such as rendering of 3D scenarios but can also be used for general numeric and symbolic computation tasks such as simulation and optimization. As a major advantage, GPUs provide extremely high parallelism (with several hundred simple programmable processors) combined with high memory bandwidth at low cost. In this paper, we propose several algorithms for computationally expensive data mining tasks like similarity search and clustering which are designed for the highly parallel environment of a GPU. We define a multidimensional index structure which is particularly suited to support similarity queries under the restricted programming model of a GPU, and define a similarity join method. Moreover, we define highly parallel algorithms for density-based and partitioning clustering. In an extensive experimental evaluation, we demonstrate the superiority of our algorithms running on the GPU over their conventional counterparts on the CPU.
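
To make the similarity join in the abstract above concrete, here is a deliberately naive sketch. It is a brute-force epsilon self-join with no index at all, so it does not represent the paper's index structure or join method; the function name and the sample points are hypothetical. It only shows the operation being accelerated: report every pair of points whose distance is at most eps.

```python
def similarity_join(points, eps):
    """Return index pairs (i, j), i < j, whose Euclidean distance is <= eps."""
    eps_sq = eps * eps
    pairs = []
    for i, p in enumerate(points):
        # The work for each i is independent of every other i, so the
        # outer loop is the natural unit to distribute across processors.
        for j in range(i + 1, len(points)):
            q = points[j]
            if sum((a - b) ** 2 for a, b in zip(p, q)) <= eps_sq:
                pairs.append((i, j))
    return pairs


sample = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 5.5)]
close_pairs = similarity_join(sample, 1.0)
```

The quadratic number of distance computations is what an index structure is meant to avoid; the same neighborhood test is also the building block of density-based clustering.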

Real-time Foreground Segmentation on GPUs using Local Online Learning and Global Graph Cut Optimization

Abstract:
This paper addresses the problem of foreground separation from a background modeling perspective. In particular, we deal with difficult scenarios where the background texture might change spatially and temporally. A novel approach is proposed that incorporates a pixel-based online learning method to adapt to temporal background changes promptly, together with a graph cuts method to propagate per-pixel evaluation results over nearby pixels. Empirical experiments on a variety of datasets demonstrate the competitiveness of the proposed approach, which is also able to work in real time on the Graphics Processing Unit (GPU) of programmable graphics cards.

Locally-Connected Hierarchical Neural Networks for GPU-Accelerated Object Recognition

Abstract:

 

Convolutional neural networks have achieved good recognition results on image datasets while being computationally efficient, i.e., scaling well with the number of training patterns and the resolution of the patterns. Here we investigate a neural network model that has a similar hierarchical structure, but does not employ weight sharing. Instead, each neuron has a fixed receptive field with unique connection weights. To deal with the enormous number of weights resulting from this architecture, we implemented a parallel version of the model using Nvidia’s CUDA framework. This implementation is up to 82 times faster than a serial CPU implementation. Our model achieves state-of-the-art recognition performance on the NORB normalized-uniform dataset (2.87% error rate) and good results on the MNIST dataset (0.76% error rate). This suggests that large networks with local, non-shared connections might be an interesting architecture for object recognition tasks. To further evaluate the model, we created a large, publicly available training and testing set, which consists of objects extracted from the LabelMe natural image dataset.
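
The difference between weight sharing and the locally connected design described in the abstract above can be shown in one dimension. The sketch below is an assumption-laden toy, not the paper's model: it is 1D, uses stride 1, omits the nonlinearity, and all names and numbers are hypothetical. A convolution slides one shared kernel over the input, while the locally connected layer gives every output unit its own weights over a fixed receptive field.

```python
def conv_1d(x, kernel):
    """Shared weights: the same kernel is applied at every position."""
    field = len(kernel)
    return [
        sum(k * v for k, v in zip(kernel, x[i:i + field]))
        for i in range(len(x) - field + 1)
    ]


def locally_connected_1d(x, weights):
    """Unshared weights: output unit i has its own row weights[i]."""
    field = len(weights[0])
    if len(weights) != len(x) - field + 1:
        raise ValueError("need one weight row per receptive-field position")
    return [
        sum(w * v for w, v in zip(row, x[i:i + field]))
        for i, row in enumerate(weights)
    ]


x = [1.0, 2.0, 3.0, 4.0]
shared = conv_1d(x, [1.0, 1.0])
unshared = locally_connected_1d(x, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Parameter counts for this toy case: one kernel versus one row per unit.
conv_params = 2
local_params = 3 * 2
```

The parameter count of the unshared layer grows with the number of output positions rather than staying fixed at the kernel size, which is the growth in weights that motivates a parallel implementation.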

 

Towards Automated Learning of Object Detectors

Abstract:
Recognizing arbitrary objects in images or video sequences is a difficult task for a computer vision system. We work towards automated learning of object detectors from video sequences (without user interaction). Our system uses object motion as an important cue to detect independently moving objects in the input sequence. The largest object is always taken as the teaching input, i.e. the object to be extracted. We use Cartesian Genetic Programming to evolve image processing routines which deliver the maximum output at the same position where the detected object is located. The graphics processor (GPU) is used to speed up the image processing. Our system is a step towards automated learning of object detectors.

Active Structured Learning for High-Speed Object Detection

Abstract:


High-speed, smooth, and accurate visual tracking of objects in arbitrary, unstructured environments is essential for robotics and human motion analysis. However, building a system that can adapt to arbitrary objects and a wide range of lighting conditions is a challenging problem, especially if hard real-time constraints apply, as in robotics scenarios. In this work, we introduce a method for learning a discriminative object tracking system based on the recent structured regression framework for object localization. Using a kernel function that allows fast evaluation on the GPU, the resulting system can process video streams at speeds of 100 frames per second or more. Consecutive frames in high-speed video sequences are typically very redundant, and for training an object detection system, it is sufficient to have training labels from only a subset of all images. We propose an active learning method that selects training examples in a data-driven way, thereby minimizing the required number of training labels. Experiments on realistic data show that the active learning is superior to previously used methods for dataset subsampling for this task.

Learning Two-View Stereo Matching

Abstract:
We propose a graph-based semi-supervised symmetric matching framework that performs dense matching between two uncalibrated wide-baseline images by exploiting the results of sparse matching as labeled data. Our method utilizes multiple sources of information including the underlying manifold structure, matching preference, shapes of the surfaces in the scene, and global epipolar geometric constraints for occlusion handling. It can give inherent sub-pixel accuracy and can be implemented in a parallel fashion on a graphics processing unit (GPU). Since the graphs are directly learned from the input images without relying on extra training data, its performance is very stable and hence the method is applicable under general settings. Our algorithm is robust against outliers in the initial sparse matching due to our consideration of all matching costs simultaneously, and the provision of iterative restarts to reject outliers from the previous estimate. Some challenging experiments have been conducted to evaluate the robustness of our method.