Data Structures and Algorithms
Greetings! I'd like to introduce myself to this community; I look forward to posting and conversing with you all about algorithms and data structures on the GPU. This is an area of particular interest to me as a researcher: data structures and algorithms have been at the core of much of our group's research.
My name is John Owens; I'm an associate professor of electrical and computer engineering at UC Davis. My research group of fantastic graduate students studies problems across GPU computing, both the fundamentals (including data structures and algorithms) and applications. Beyond research papers, I am also one of the maintainers of the CUDPP library (the “CUDA Data-Parallel Primitives”), which contains a number of freely available algorithms and data structures; I frequently help teach one-day courses on GPU computing at conferences like Supercomputing, Siggraph, ASPLOS, and IEEE Visualization; and I've served on the conference committees for the Graphics Hardware and High Performance Graphics conferences for several years and will co-chair HPG in 2011.
I first learned about parallel computing as a graduate student on the Imagine project at Stanford, working with (current NVIDIA chief scientist) Bill Dally and an amazing group of graduate student colleagues. Imagine was a data-parallel machine, so a good deal of our collective effort went into finding efficient parallel algorithms that would run on Imagine. Learning to “think in parallel” was a lengthy process, but one that I think is increasingly important for computer engineers on today's and tomorrow's computer systems.
So: In this forum I'd like to concentrate on interesting developments in the area of parallel algorithms and data structures, with an emphasis on GPU computing. I particularly want to discuss algorithms and data structures that have general use beyond a single application. One of the benefits of working on CPUs is an enormous knowledge base of data structures and algorithms (often with associated and widely available libraries); in the GPU world we're nowhere near that point. The result is a lot of interesting problems to solve!
I talked with my graduate students and some other colleagues to get an idea of what might be useful to discuss here. I don't want to go into the level of detail you'd find in an academic paper (I'll often just refer you to the papers for that), but I hope to communicate the big ideas behind the work, for others who are solving similar problems or who want to use a particular algorithm or data structure and understand how it works.
I am happy to take suggestions on interesting topics! Here are some topics that my graduate students and other colleagues thought might be interesting to begin with:
- The radix sort algorithm of Satish et al. (IPDPS 2009), which is also the sort currently used in CUDPP
- Lars Nyland et al.'s all-pairs n-body simulation work (GPU Gems 3). This is one of my favorite GPU computing papers because the authors really motivate the design decisions they made in structuring their implementation of this algorithm; a sketch of that structure appears after this list.
- Algorithms that deal with nested or irregular parallelism, such as tree traversal; how do we make these run efficiently on a wide data-parallel machine?
- Work queues. How do we efficiently distribute work, particularly irregular work that can potentially generate an irregular amount of additional work, to many parallel compute units? (A bare-bones sketch of the distribution side of this problem appears after this list.)
- The GPU-based hash table work that Dan Alcantara recently presented at Siggraph Asia in December 2009. I was a coauthor on this work and very much enjoyed working with the team that made this paper possible. If you like elegant algorithms, you'll enjoy hearing about “cuckoo hashing”; a small sketch of the idea appears after this list.
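
To give a taste of the n-body item, here is a minimal CUDA sketch of the tiled all-pairs structure, in the spirit of the GPU Gems 3 chapter. The names (`allPairsForces`, `bodyBodyInteraction`), the `TILE_SIZE` value, and the convention of packing each body's mass into the `.w` component of a `float4` are my own illustrative choices, not the authors' code:

```cuda
#define TILE_SIZE 256       // one body per thread; blockDim.x == TILE_SIZE
#define SOFTENING 1e-9f     // keeps the denominator nonzero when r == 0

// Accumulate into ai the acceleration on body bi due to body bj.
// Positions are float4s with the body's mass stored in .w.
__device__ float3 bodyBodyInteraction(float4 bi, float4 bj, float3 ai)
{
    float3 r = make_float3(bj.x - bi.x, bj.y - bi.y, bj.z - bi.z);
    float distSqr = r.x * r.x + r.y * r.y + r.z * r.z + SOFTENING;
    float invDist = rsqrtf(distSqr);
    float s = bj.w * invDist * invDist * invDist;   // m_j / |r|^3
    ai.x += r.x * s;
    ai.y += r.y * s;
    ai.z += r.z * s;
    return ai;
}

// Each thread computes the total acceleration on one body, marching through
// all the bodies one shared-memory tile at a time. Assumes n is a multiple
// of TILE_SIZE to keep the sketch short.
__global__ void allPairsForces(const float4 *pos, float3 *accel, int n)
{
    __shared__ float4 tile[TILE_SIZE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float4 myPos = pos[i];
    float3 acc = make_float3(0.0f, 0.0f, 0.0f);

    for (int base = 0; base < n; base += TILE_SIZE) {
        tile[threadIdx.x] = pos[base + threadIdx.x];  // cooperative load
        __syncthreads();
        for (int j = 0; j < TILE_SIZE; j++)
            acc = bodyBodyInteraction(myPos, tile[j], acc);
        __syncthreads();
    }
    accel[i] = acc;
}
```

The design decision the chapter motivates so well is visible right here: each tile of positions is loaded into shared memory once and then reused by every thread in the block, rather than having every thread fetch every position from global memory.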
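
For the work-queue item, here is a bare-bones sketch of atomic work distribution; `doWork` and the flat item array are placeholders of mine. It shows only the easier half of the problem, distributing items of irregular cost across threads; handling items that spawn new items is where it gets really interesting:

```cuda
// Placeholder for work whose cost varies from item to item.
__device__ int doWork(int item)
{
    int acc = 0;
    for (int k = 0; k < item; k++)  // cost proportional to the item's value
        acc += k;
    return acc;
}

// Launch far fewer threads than there are items. Each thread claims the
// next unclaimed item by atomically bumping a shared counter (initialized
// to zero on the host), so threads that draw cheap items automatically
// pick up more of them.
__global__ void drainQueue(const int *items, int numItems,
                           int *counter, int *results)
{
    while (true) {
        int idx = atomicAdd(counter, 1);  // claim one item
        if (idx >= numItems) break;       // everything has been claimed
        results[idx] = doWork(items[idx]);
    }
}
```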
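
And for the hashing item, here is a sketch of the cuckoo idea in its textbook two-table form; the hash constants, the 64-bit key/value packing, and the retry limit are my inventions for illustration, and the paper's actual construction differs in its details. The elegance is that a lookup probes exactly two locations, and an insertion resolves collisions by evicting the occupant and sending it to its other table:

```cuda
// Both tables start filled with this value; key 0xFFFFFFFF is reserved.
#define EMPTY_ENTRY 0xFFFFFFFFFFFFFFFFull

// Two illustrative hash functions, one per table.
__device__ unsigned h1(unsigned k, unsigned size) { return (k * 2654435761u) % size; }
__device__ unsigned h2(unsigned k, unsigned size) { return (k * 40503u + 2531011u) % size; }

// Entries pack the key into the high 32 bits and the value into the low 32,
// so an insertion can move a whole entry with one 64-bit atomic exchange.
// Meant for a build phase, not for running concurrently with lookups.
__device__ bool insert(unsigned long long *t1, unsigned long long *t2,
                       unsigned size, unsigned key, unsigned value)
{
    unsigned long long entry = ((unsigned long long)key << 32) | value;
    for (int attempt = 0; attempt < 32; attempt++) {
        // Kick out whoever lives in our slot in table 1 (the "cuckoo" step).
        entry = atomicExch(&t1[h1((unsigned)(entry >> 32), size)], entry);
        if (entry == EMPTY_ENTRY) return true;
        // The evicted entry's only other home is table 2; try there.
        entry = atomicExch(&t2[h2((unsigned)(entry >> 32), size)], entry);
        if (entry == EMPTY_ENTRY) return true;
    }
    return false;  // a full scheme rebuilds with new hash functions here
}

// A lookup needs no loops: the key can only live in one of two slots.
__device__ unsigned lookup(const unsigned long long *t1,
                           const unsigned long long *t2,
                           unsigned size, unsigned key)
{
    unsigned long long e = t1[h1(key, size)];
    if ((unsigned)(e >> 32) == key) return (unsigned)e;
    e = t2[h2(key, size)];
    if ((unsigned)(e >> 32) == key) return (unsigned)e;
    return 0xFFFFFFFFu;  // not-found sentinel; assumes values never use it
}
```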
Looking forward to your feedback, suggestions, and comments.