Fast and approximate stream mining of quantiles and frequencies using graphics processors

Publication Year: 
2005

     We present algorithms for fast quantile and frequency estimation in large data streams using graphics processor units (GPUs). We exploit the high computational power and memory bandwidth of graphics processors and present a novel sorting algorithm that performs rasterization operations on the GPUs. We use sorting as the main computational component for histogram approximation and the construction of ɛ-approximate quantile and frequency summaries. Our overall algorithms for numerical statistics computation on data streams are deterministic, applicable to fixed or variablesized sliding windows and use a limited memory footprint. We use the GPU as a co-processor and minimize the data transmission between the CPU and GPU by taking into account the low bus bandwidth. We have implemented our algorithms on a PC with a NVIDIA GeForce FX 6800 Ultra GPU and a 3:4 GHz Pentium IV CPU and applied them to large data streams consisting of more than 100 million values. We have compared their performance against optimized implementations of prior CPU-based algorithms. Overall, our results demonstrate that the graphics processor available on a commodity computer system is an efficient stream-processor and a useful co-processor for mining data streams.