3D Arrays and Fermi - FDTD
Sat, 11/21/2009 - 06:02
Will manipulating 3D arrays become any easier on Fermi GPUs than on the current generation? I'm not well-versed in C (or C++), as most of my in-depth programming experience is in old FORTRAN 77 and, currently, MATLAB. How does "support for C++" benefit me in this regard if I convert my FDTD code to C++?
Sat, 11/21/2009 - 09:08
#2
Direct control of how arrays are manipulated in GPU Memory
Thank you, cgorag, for your response.
I plan to eventually contrast my work with PGI and Jacket implementations. My objective, however, is to have direct influence over how the GPU's tiered memory is utilized and how data arrays are traversed. My initial experiments with PGI suggest that no direct control is possible; for example, no intermediate CUDA code is generated for inspection and fine-tuning before final compilation. I need to study the PGI documentation more closely to see if there are optional directives that allow this from within the main program.
Sat, 11/21/2009 - 09:37
#3
Re:
You may wish to look up the "-Mcuda" option and its "keepptx" sub-option in the PGI user guide; this way you should be able to generate the intermediate representation you're looking for (PTX, the virtual machine code). That said, most of the time you "have direct influence" simply by writing your kernel code: both C and Fortran are low-level enough that I doubt much optimization could be achieved by tweaking PTX instead, although from time to time it is indeed necessary to look at the PTX to understand some aspects of a kernel's behavior. As for the AccelerEyes Jacket solution for Matlab, I don't think it gives you that kind of freedom: they provide precompiled MEX routines containing CUDA-enabled versions of some Matlab routines, but the underlying code is not available, so you won't be able to examine it at a low level.
Mon, 11/23/2009 - 01:38
#4
The -Mcuda=keepptx option is supported only in the 2010 release; for the 2009 release, -ta=nvidia,keepgpu can be used to keep the generated CUDA code files.
Mon, 11/23/2009 - 03:28
#5
Thank you for the very helpful hint.
Sun, 12/13/2009 - 15:29
#6
Support from AccelerEyes
Hi hadimf, Sounds like interesting work. Once you get to the point where you want to start looking at Jacket, send me a PM (which I believe will send me an email notification) and I can help get you the software and provide any support you may need on this. I look forward to learning more about this work and am happy to help where I can.
Mon, 11/23/2009 - 20:39
#7
Fermi Arch programming model and C++ features
The Fermi architecture has a lot of new features that could not be fully leveraged with previously released CUDA SDKs, so we now have a developer-only release of a new version, the CUDA 3.0 SDK.
There are a couple of good documents describing the new architecture and pointing out some of the key changes. One such document is:
http://www.nvidia.com/object/fermi_architecture.html
Also if you go to the main page:
http://www.nvidia.com/object/fermi_architecture.html
Next to the image of each "expert" quote there are links to that expert's write-up of the key new features of Fermi.
The new "C++" features will actually help any programming language ported to this new architecture. Features such as unified memory addressing, function pointers, function calls, multiple concurrent kernels and a global memory cache can be, and I am sure will be, leveraged by PGI and MATLAB in future releases of their products.
Tue, 11/24/2009 - 09:35
#8
Fermi program stack
I would think that, with Fermi being the first NVIDIA GPU to have a program stack and also a shared coherent cache, we're going to see a whole slew of interesting new extensions to the CUDA feature set. Recursive algorithms should now be possible, which is a big change. Someone please correct me if I'm wrong here.
Thanks,
Robert
Tue, 11/24/2009 - 23:33
#9
Future CUDA releases
CUDA C is our platform for innovation :) - so no one should expect our CUDA 3.0 beta to be the final word in exposing the Fermi architecture.
Just look at the evolutionary steps across all the releases up to the most recent one, CUDA 2.3: many very cool and useful features were introduced some time after GT200 was released.
Tue, 11/24/2009 - 13:07
#10
Multidimensional arrays in C and C++
In C, multidimensional arrays are most commonly handled by computing the strides "by hand". That is, to access a 3D array A at (i,j,k), you can simply write
A[i*istride + j*jstride + k*kstride]
Often, kstride is implicitly 1. In C++, this can be hidden with an inline member function that overloads the () operator, so you can just write A(i,j,k). (Overloading [] instead would not give you A[i,j,k], since in current C++ operator[] accepts only a single argument.) If I'm not mistaken, this inlining is very simple and does not require any of Fermi's new hardware features; it just requires support from the nvcc compiler. I am not certain whether this works at present or not.
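A minimal sketch of the C++ approach, assuming the k index is contiguous (kstride = 1). The names View3D and make_view are hypothetical, and this is plain host C++; whether a particular nvcc version accepts the same accessor in device code (presumably with __host__ __device__ qualifiers) is not something this example checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 3D "view" over a flat buffer. It stores the
// strides from the post; kstride is taken to be 1 (last index contiguous).
struct View3D {
    float* data;
    std::ptrdiff_t istride;
    std::ptrdiff_t jstride;

    // Inline accessor: A(i,j,k) becomes A.data[i*istride + j*jstride + k].
    // Assumption: a CUDA version would add __host__ __device__ here.
    inline float& operator()(std::ptrdiff_t i, std::ptrdiff_t j,
                             std::ptrdiff_t k) const {
        return data[i * istride + j * jstride + k];
    }
};

// Builds a view for an nx-by-ny-by-nz array stored with k varying fastest.
// nx is not needed to compute the strides, so it is not a parameter.
inline View3D make_view(float* data, std::ptrdiff_t ny, std::ptrdiff_t nz) {
    return View3D{data, ny * nz, nz};
}
```

Because operator() takes the three indices as separate arguments, A(i,j,k) reads much like Fortran's A(i,j,k). Note, though, that the layout here is C-style (last index contiguous), the opposite of Fortran's column-major order, so loops should run k innermost for contiguous access.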
The older way of doing this in C is to define a preprocessor macro such as
#define A3(i,j,k) A[(i)*istride + (j)*jstride + (k)*kstride]
which assumes you've defined istride, jstride, and kstride before using the macro. Then you can use A3(i,j,k) as you would expect. I'm not sure if this is the kind of support you were looking for.
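Here is a short sketch of the macro approach in use, with the macro's parentheses balanced. The helper fill_with_indices is hypothetical; it simply writes 100*i + 10*j + k into each element so the mapping from (i,j,k) to memory is easy to check. As the post says, the macro only works where A, istride, jstride and kstride are in scope.

```cpp
#include <cassert>
#include <vector>

// The macro from the post, with balanced parentheses. It expects A,
// istride, jstride and kstride to be visible wherever it is used.
#define A3(i,j,k) A[(i)*istride + (j)*jstride + (k)*kstride]

// Hypothetical helper: fills an nx*ny*nz array (k varying fastest) so that
// element (i,j,k) holds the value 100*i + 10*j + k.
void fill_with_indices(float* A, int nx, int ny, int nz) {
    const int kstride = 1;        // last index is contiguous
    const int jstride = nz;
    const int istride = ny * nz;
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j)
            for (int k = 0; k < nz; ++k)
                A3(i,j,k) = 100.0f * i + 10.0f * j + k;
}
```

The parentheses around each macro argument matter: A3(i+1,j,k) expands to A[(i+1)*istride + ...], whereas without them it would become i+1*istride and silently index the wrong element.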
Wed, 11/25/2009 - 00:07
#11
Spot On!
That is exactly what I was asking about. The algorithm I'm working on (high-order FDTD, where the stencil extends in all three dimensions including off-diagonals and across multiple arrays for every node) is complicated enough without having to perform striding "by hand".
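To illustrate how an (i,j,k) accessor keeps a high-order stencil readable, here is a minimal sketch of a fourth-order finite-difference Laplacian at one interior node. It is not the poster's FDTD scheme: the Grid3D type and laplacian4 function are hypothetical, a real FDTD update would combine several field arrays, and off-diagonal terms such as u(i+1,j+1,k) would be added to the sum in the same way.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical grid with row-major storage (k varies fastest).
struct Grid3D {
    int nx, ny, nz;
    std::vector<double> v;

    Grid3D(int nx_, int ny_, int nz_)
        : nx(nx_), ny(ny_), nz(nz_),
          v(static_cast<std::size_t>(nx_) * ny_ * nz_, 0.0) {}

    // All index arithmetic lives in one place: (i*ny + j)*nz + k.
    std::size_t idx(int i, int j, int k) const {
        return (static_cast<std::size_t>(i) * ny + j) * nz + k;
    }
    double& operator()(int i, int j, int k) { return v[idx(i, j, k)]; }
    double operator()(int i, int j, int k) const { return v[idx(i, j, k)]; }
};

// Fourth-order central difference for d2u/dx2 + d2u/dy2 + d2u/dz2 at an
// interior node with grid spacing h; needs two neighbours on each side.
double laplacian4(const Grid3D& u, int i, int j, int k, double h) {
    const double c0 = -5.0 / 2.0, c1 = 4.0 / 3.0, c2 = -1.0 / 12.0;
    const double s =
        3.0 * c0 * u(i, j, k)
        + c1 * (u(i+1, j, k) + u(i-1, j, k) + u(i, j+1, k)
              + u(i, j-1, k) + u(i, j, k+1) + u(i, j, k-1))
        + c2 * (u(i+2, j, k) + u(i-2, j, k) + u(i, j+2, k)
              + u(i, j-2, k) + u(i, j, k+2) + u(i, j, k-2));
    return s / (h * h);
}
```

The coefficients (-1/12, 4/3, -5/2, 4/3, -1/12)/h² are the standard fourth-order central difference for a second derivative and are exact for polynomials up to degree five, so for u = x² + y² + z² the function returns 6 up to round-off.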

If I understand it properly, Fermi devices will be programmed using the same CUDA programming model as current devices; it's just that, with upcoming SDK releases, the compiler is going to support more C++ constructs in device code (some are already supported by previous/current SDK versions). I'm not much into using C++ for writing this kind of code, though, so I guess someone else could provide further details in that regard. But if you're used to writing code in Fortran and/or Matlab, why not try writing device code the same way? The Portland Group recently released an update of their compiler suite that makes it possible to write device code in Fortran (http://www.pgroup.com/resources/cudafortran.htm). On the other side, there are vendors, like AccelerEyes (http://www.accelereyes.com/), providing tools for writing Matlab code that is executed on the GPU; furthermore, according to an announcement that just appeared here (http://www.gpucomputing.net/?q=node/75), the MathWorks folks are actually considering building something similar themselves.