| Code Snippet Name | Code Snippet Description | Parent Node | Communities |
|---|---|---|---|
| Pixel Buffer Objects: Mixing CUDA and OpenGL within the same application |
The following is the source for Part 15 of the Doctor Dobb's Journal article series, Using Pixel Buffer Objects with CUDA and OpenGL. The source includes Microsoft Visual Studio build files as well as a Linux command line for building an executable. Many thanks to Joe Stam at NVIDIA for providing the Visual Studio build files. Joe also notes that you need to remove the following lines from perlinKernelPBO.cu: `#include <cutil_gl_inline.h>` and `#include <cuda_gl_interop.h>`
|
Machine Learning and Data Mining | |
| Using Vertex Buffer Objects with CUDA and very fast surface rendering with primitive restart |
The following is the source for a future Doctor Dobb's Journal article entitled Using Vertex Buffer Objects with CUDA and OpenGL. The source includes Microsoft Visual Studio build files as well as a Linux command line for building an executable. This code demonstrates how to draw 3D points, wireframes, and surfaces using the framework described in Part 15 of my Doctor Dobb's article series, Using Pixel Buffer Objects with CUDA and OpenGL. I left in some ifdef statements so you can verify for yourself the speed gained by using primitive restart to bypass PCI bus bandwidth limitations. Many thanks to Joe Stam at NVIDIA for providing the Visual Studio build files. Joe also notes that you need to remove the following lines from perlinKernelVBO.cu and change the uint variable in runCUDA to "unsigned int": `#include <cutil_gl_inline.h>` and `#include <cuda_gl_interop.h>`
|
Machine Learning and Data Mining | |
| Line forward projection on CUDA |
|
GPU Computing Gems Source Code | |
| Cone-Beam CT image reconstruction using the Katsevich Algorithm |
This program reconstructs helical cone-beam CT images using the Katsevich algorithm. Two versions are included. Kat_1024 reconstructs 1024x1024x1024 volumes; it allocates and deallocates intermediate memory to accommodate the large projections used at that size. Kat_512 allocates all the device memory it needs at the beginning and frees it only at the end. It is the better choice when reconstructing volumes of 512x512x512 or smaller, and it cannot reconstruct 1024x1024x1024 volumes unless the GPU board has 6 GB of memory or more. The program genproj generates the projections to be used for reconstruction tests. Real projections from CT machines can be used,
|
GPU Computing Gems Source Code | |
| Parallelization of the x264 encoder using OpenCL |
We present an OpenCL-enhanced version of the x264 video encoder that uses GPUs to accelerate motion estimation and other significant parts of the algorithm. We take a system-wide approach, concentrating on the whole encoder architecture rather than only on accelerating the critical paths. This demo includes the full source code for the OpenCL-enhanced version of x264, plus some scripts to fetch images from "Big Buck Bunny" and encode them using x264. Please note that downloading the source images might take a long time and use a considerable amount of space on your hard drive (~3.4 GB). We are working on adding the source code to the x264 development tree. More information can be found at: http://li5.ziti.uni-heidelberg.de/x264gpu/ |
GPU Computing Gems Source Code | |
| MAGMA Library |
Major chip manufacturers are developing next-generation microprocessor designs that are heterogeneous/hybrid in nature, integrating homogeneous x86-based multicore CPU components and GPU components. The MAGMA (Matrix Algebra on GPU and Multicore Architectures) project’s goal is to develop innovative linear algebra algorithms and to incorporate them into a library that is
but targeting the
|
||
| A Programmable Graphics Pipeline in CUDA for Order Independent Transparency |
This work presents a rasterization-based rendering pipeline using CUDA. We discuss the implementation details of the basic functionality of a hardware rendering pipeline, with a focus on triangle rasterization and raster operations. Within this architecture, we propose two single-pass algorithms for efficient rendering of order-independent transparency. The results demonstrate significant speedups in comparison to state-of-the-art methods that are based on traditional graphics pipelines. This work is based on the SI3D paper "FreePipe: a Programmable Parallel Rendering Architecture for Efficient Multi-Fragment Effects" (http://portal.acm.org/citation.cfm?id=1730804.1730817). The source code is attached below; please read the "Readme" before running the code. |
GPU Computing Gems Source Code | |
| Multiclass Support Vector Machine |
The scaling of serial algorithms can no longer rely on the improvement of CPUs. The performance of classical Support Vector Machine (SVM) implementations has reached its limit, and the arrival of the multicore era requires these algorithms to adapt to a new parallel scenario. Graphics Processing Units (GPUs) have arisen as high-performance platforms for implementing data-parallel algorithms. This project describes how a naïve implementation of a multiclass classifier based on SVMs can map its inherent degrees of parallelism to the GPU programming model and efficiently use its computational throughput. Empirical results show that the training and classification time of the algorithm can be reduced by an order of magnitude compared to a classical solver, LIBSVM, while guaranteeing the same accuracy. Please find attached the multisvm 2.0 release of the source code. The source code repository, where future versions will be available, is at http://code.google.com/p/multisvm/ * Sample datasets were removed due to their large file size. They can be obtained from the code repository site or the LIBSVM site. ** To compile the code, please add the following CUDA libraries to the bin folder of the project, or download the release code from the Google Code repository (which already contains them as part of the Visual Studio solution): cublas64_30_14.dll, cudart64_30_14.dll, cudpp64_30_14.dll, cufft64_30_14.dll, cutil64.dll, glew64.dll, glut32.dll |
Computational Finance, Machine Learning and Data Mining, GPU Computing Gems Source Code | |
| Haar Classifiers for Object Detection with CUDA: Pixel-parallel processing kernel |
This kernel performs pixel-parallel processing of the image using a cascade of Haar classifiers. The getter functions implement interfaces to various kinds of GPU memory; the kind used is selected by the kernel's template parameters. The snippet is presented as pseudo-code. To query the status of the project source code, contact me directly (Anton Obukhov, aobukhov@nvidia.com) or write to devsupport@nvidia.com. |
GPU Computing Gems Source Code | |
| RNA folding GPU |
This code is a GPU implementation of the 'hybrid-ss-min' function of the UNAFold package, which computes RNA secondary structure. |
GPU Computing Gems Source Code |
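
The Haar classifier entry above mentions getter functions whose memory kind is chosen by template parameters. The listing does not include that code, so the following is only a minimal CPU-side sketch of the general idea in plain C++, not the author's kernel. Every name here (`MemKind`, `IntegralImage`, `buildIntegral`, `getSum`, `rectSum`, `haarEdgeFeature`) is hypothetical, and the two "memory kinds" are stand-ins (unchecked versus bounds-checked access) for what might be global, texture, or shared memory on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tags standing in for different kinds of GPU memory.
enum class MemKind { Unchecked, Checked };

// Integral image with one extra zero row and column, so that
// sums[y * stride + x] is the sum of all pixels with column < x and row < y.
struct IntegralImage {
    int stride;                   // image width + 1
    std::vector<long long> sums;  // (height + 1) * (width + 1) entries
};

IntegralImage buildIntegral(const std::vector<int>& pixels, int width, int height) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    IntegralImage ii{width + 1,
                     std::vector<long long>(
                         static_cast<std::size_t>(width + 1) * (height + 1), 0)};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            ii.sums[(y + 1) * ii.stride + (x + 1)] =
                pixels[y * width + x]
                + ii.sums[y * ii.stride + (x + 1)]
                + ii.sums[(y + 1) * ii.stride + x]
                - ii.sums[y * ii.stride + x];
        }
    }
    return ii;
}

// Getter chosen at compile time by the template parameter.
template <MemKind K>
long long getSum(const IntegralImage& ii, int x, int y);

// Stand-in for a fast, unchecked memory path.
template <>
long long getSum<MemKind::Unchecked>(const IntegralImage& ii, int x, int y) {
    return ii.sums[y * ii.stride + x];
}

// Stand-in for a different memory path (here: bounds-checked access).
template <>
long long getSum<MemKind::Checked>(const IntegralImage& ii, int x, int y) {
    return ii.sums.at(static_cast<std::size_t>(y) * ii.stride + x);
}

// Sum of the pixels in the rectangle [x, x + w) x [y, y + h).
template <MemKind K>
long long rectSum(const IntegralImage& ii, int x, int y, int w, int h) {
    return getSum<K>(ii, x + w, y + h) - getSum<K>(ii, x, y + h)
         - getSum<K>(ii, x + w, y) + getSum<K>(ii, x, y);
}

// Two-rectangle Haar-like feature: left half minus right half.
template <MemKind K>
long long haarEdgeFeature(const IntegralImage& ii, int x, int y, int w, int h) {
    return rectSum<K>(ii, x, y, w / 2, h) - rectSum<K>(ii, x + w / 2, y, w / 2, h);
}
```

For a 4x2 image with rows `1 2 3 4` and `5 6 7 8`, `haarEdgeFeature<MemKind::Unchecked>(ii, 0, 0, 4, 2)` evaluates to -8 (left half 14, right half 22), and the `Checked` variant gives the same value. Because the getter is resolved at compile time, switching memory kinds adds no per-pixel branching, which is presumably why a kernel would dispatch on template parameters.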