Loading...
It takes problem-domain and algorithm knowledge to be a superhero
Submitted by John Stratton on Tue, 12/08/2009 - 15:15 CST CST
Recently tried my hand at the CUDA Superhero Challenge 2. Tried a quick-and-dirty brute-force attempt just to see if it was even remotely possible in the time constraints (it wasn't), and then did a little Monte Carlo exploration, which did much better. Still, the solutions I was getting in the time limit were scoring way below the standing leaders, and I ran out of ideas.
It could be that with a more directly CS-oriented background I would have had a better shot, and I hope to get a chance to see what the best solutions were. I think my biggest problem was not knowing well enough how to constrain the problem and spend more time looking for "good" solutions.
With our research group's knowledge, I'm pretty well convinced that if given an algorithm, we've got the in-house knowledge to make it run as fast as humanly possible on NVIDIA's chips. But with our focus there, we don't always have the domain knowledge to know how the algorithm could adapt and get an even better solution. Which is why we keep so closely connected with application domain research groups: it takes domain and algorithm knowledge in addition to programming and architecture knowledge to craft the world-changing solution.
- John Stratton's blog
- Login or register to post comments
Fri, 01/15/2010 - 13:55
#2
Winners Code is accessible
I didn't notice this blog - sorry - still getting used to the system like everyone else :)
The code from the winning entry is posted on TopCoder. Its good to look at. In this contest you are absolutely right getting the right huristic is most important - then appling the additional performance of the GPU to transverse more of the possible solutions.
Alot of contestants used simulated anealing to hone the results.
There was also multiple levels of parallelism which could be used - every time the repair team returned home basically made the next series independent of the previous one - so in theory one could run all the of sections in parallel. In addition within the each section we could also perform swaps in parallel.
Anyway - its always good to look at new solution - I'll be interesting in what you think of the solution.
Sat, 02/06/2010 - 11:10
#3
Of course, any sequence of
Of course, any sequence of destinations is independent and can be processed in parallel. This is a good optimization.
Where to get the winners code?
Mon, 12/14/2009 - 11:54
#4
Fun exercise
Part of the fun of such a project, though, is that there aren't any well-known algorithms out there for such a variant on a travelling salesman problem.
I only found out about the contest as it was about to end; I wish the testing, etc had stayed up so we could play with the problem anyway. In the end I implemented a greedy search (went forward and backward looking for the cheapest next step) as a starting point, and then did edge flips (you can do all possible single-edge flips with just one pass through the data) repeatedly, accepting them if they reduced the cost. It did ok (I think) but I have no idea how it would have done.
I broke it down into one maintance period per block, which obviously has real disadvantages (there are way too few to keep the machine really busy) but couldn't figure out any other nice way to do things. I'm pretty sure my arbitrary-N reduction could have used a lot of work, too.
I hope NVidia keeps sponsoring these, it seems like it is good for community development.
BayWebSoft
I have programmed a more sophisticated algorithm but it demanded more performance for success execution. I tried to run it on the CPU and after 10 min for a one iteration (of M), he gave the solution in 5.7 points. For the GPU I made a lot of optimizations, but the best solution on GPU for 20 seconds was about 4.4. One of my optimizations has been packing latitude and longitude in the hypercube ((N+2)x(N+2)x(N+2)x(N+2)xM - 3 GB of memory in worst case) with cells containing distance, packed in one byte. It was bad approach, because building a cube with M = 30 takes 5 seconds on Tesla C1060. And now i found another way.