Stories, Papers, WIKIs
| Title | Body |
|---|---|
| Fast Full-Wave Modeling of Passive Structures with Graphic Processors (IEEE) | Abstract: A parallel computation approach based on the properties of Graphics Processing Units (GPUs) is presented here to speed up the broadband modeling of passive 3D structures. The full-wave electromagnetic model is based on a surface integral formulation, numerically implemented by using a null-pinv decomposition of the unknowns. The numerical model has been proven to be accurate and well-posed for a frequency range from DC to hundreds of GHz. A bottleneck of the model is the assembly of fully populated matrices and the final matrix inversion. This paper presents a GPU parallelization of the matrix assembly phase and analyzes two case studies on the full-wave analysis of interconnects. The achieved speedup with respect to a conventional serial approach is around 50x. Paper available at IEEE. |
| Efficient Full Wave 3D EM Modeling of Large Phased Arrays (by WIPL-D software) (IEEE) | Abstract: The paper presents an efficient technique for full wave 3D EM modeling of electrically large arrays. It consists of two steps: 1) the network parameters (Y,Z,S) of the array are calculated together with its radiation patterns when each of the elements is active, while all other elements are short-circuited; 2) radiation patterns are obtained for arbitrary combinations of feeding voltages by post-processing data from the first step. 3D EM modeling is facilitated by decomposing the problem, which has geometrical symmetry and asymmetrical excitation, into four fully symmetrical problems of quarter size. The matrix solution is accelerated using a graphics processing unit (GPU). Thus, array problems of 400 and 900 microstrip patch antennas become doable in 6 hours and 1 day, respectively. Paper available at IEEE. |
| Large-Scale Multi-Robot Mapping in MAGIC 2010 (IEEE) | Abstract: We describe a large-scale decentralised multi-robot mapping system that outputs globally optimised metric maps in real-time. The mapping system was used by team WAMbot in the finals of the Multi-Autonomous Ground-robotics International Challenge (MAGIC 2010). Research contributions include a novel large-scale multi-robot graph-based non-linear map optimisation approach, a hybrid decentralised and distributed mapping system and novel graphics processing unit (GPU) based approaches for accelerating intensive map matching and fusion operations. Our mapping system scales linearly with map size and on commodity hardware can easily map a 500 m × 500 m urban area. We demonstrate robust, highly efficient and accurate mapping results from two different fleets of mobile robots. Videos, maps and timing results from the MAGIC 2010 challenge are presented. Paper available at IEEE. |
| Solving Electrically Large EM Problems by Using Out-Of-Core Solver Accelerated with Multiple Graphical Processing Units (IEEE) | Abstract: We present results for frequency-domain MoM simulations of electrically large structures using an out-of-core solver accelerated with multiple GPUs on a single personal computer. The structures analyzed to demonstrate the efficiency of the proposed out-of-core solver are a Cassegrain reflector antenna with a reflector diameter of up to 240 λ and a Luneburg lens of up to 16 λ diameter, excited with a half-wavelength dipole. The out-of-core solver is accelerated by up to 10 times with one GPU compared to a standard CPU, or by up to 20 times when using 3 GPUs. Paper available at IEEE. |
| Parallel Statistical Analysis of Analog Circuits by GPU-Accelerated Graph-Based Approach (IEEE) | Abstract: In this paper, we propose a new parallel statistical analysis method for large analog circuits using a determinant decision diagram (DDD) based graph technique on GPU platforms. The DDD-based symbolic analysis technique enables exact symbolic analysis of very large analog circuits. We show that DDD-based graph analysis is also very amenable to massively threaded parallel computing on GPU platforms. We design novel data structures to represent the DDD graphs in the GPUs to enable fast memory access by massively parallel threads for computing the numerical values of DDD graphs. The new method is inspired by the inherent data parallelism and simple data independence in the DDD-based numerical evaluation process. Experimental results show that the new evaluation algorithm can achieve about one to two orders of magnitude speedup over serial CPU-based evaluation and a 2–3 times speedup over the numerical SPICE-based simulation method on some large analog circuits. Paper available at IEEE. |
| Neural Network-Based Thermal Simulation of Integrated Circuits on GPUs (IEEE) | Abstract: With the rising challenges in heat removal in integrated circuits (ICs), the development of thermal-aware computing architectures and run-time management systems has become indispensable to the continuation of IC design scaling. These thermal-aware design technologies of the future strongly depend on the availability of efficient and accurate means for thermal modeling and analysis. These thermal models must have not only sufficient accuracy to capture the complex mechanisms that regulate thermal diffusion in ICs, but also a level of abstraction that allows for their fast execution for design space exploration. In this paper, we propose an innovative thermal modeling approach for full chips that can handle the scalability problem of transient heat flow simulation in large 2-D/3-D multiprocessor ICs. This is achieved by parallelizing the computation-intensive task of transient temperature tracking using neural networks and exploiting the computational power of massively parallel graphics processing units. Our results show up to 35× run-time speedup compared to state-of-the-art IC thermal simulation tools while keeping the error lower than 1°C. Speedups scale with the size of the 3-D multiprocessor ICs and our proposed method serves as a valuable design space exploration tool. Paper available at IEEE. |
| iSense3D: A Real-time Viewpoint-Aware 3D Video Synthesis System (IEEE) | Abstract: In this paper, a real-time 3D video synthesis system is proposed. The system achieves a more realistic 3D effect by dynamically adapting the synthesized view to the user's viewpoint. There are two major parts: 6D viewpoint parameter extraction and real-time Free-Viewpoint View Synthesis (FVVS). 6D viewpoint parameter extraction is done by 3D object tracking over image and depth. Various techniques for FVVS are proposed and implemented on a GPU to achieve real-time performance. The system is demonstrated on a multi-core system with a programmable GPU. Real-time performance of up to 1280×720 at 30 fps is achieved. Paper available at IEEE. |
| RAG: An Efficient Reliability Analysis of Logic Circuits on Graphics Processing Units (IEEE) | Abstract: In this paper, we present RAG, an efficient Reliability Analysis tool based on Graphics processing units (GPU). RAG is a fault-injection-based parallel stochastic simulator implemented on a state-of-the-art GPU. A two-stage simulation framework is proposed to exploit the high computation efficiency of GPUs. Experimental results demonstrate the accuracy and performance of RAG. Average speedups of 412× and 198× are achieved compared to two state-of-the-art CPU-based approaches for reliability analysis. Paper available at IEEE. |
| Row-Based Analysis of Structure Power/Ground Grids with General Purpose GPU (ACM) | Abstract: With the advent of mega-scale power/ground (P/G) grids, IR drop analysis has become computationally daunting. By taking advantage of the topology of structured P/G grids, this work uses the row-based analysis method to transform the mesh-circuit analysis into many parallel tridiagonal row-circuit analyses of far smaller complexity. Then, the Graphics Processing Unit (GPU) is employed to solve these row circuits quickly in parallel. This work further employs LU decomposition of the tridiagonal matrix to increase the efficiency of the method. Experimental results show that our method outperforms the traditional methods implemented on CPU. For mega-scale P/G grids of 1-4 million nodes, our GPU-implemented method is 9-12 times faster than its CPU counterpart and 2-3 times faster than its OpenMP counterpart. Paper available at ACM. |
| Accelerating RTL Simulation with GPUs (ACM) | Abstract: With the fast increasing complexity of integrated circuits, verification has become the bottleneck of today's IC design flow. In fact, over 70% of the IC design turn-around time can be spent on the verification process in a typical IC design project. Among various verification tasks, Register Transfer Level (RTL) simulation is the most widely used method to validate the correctness of digital IC designs. When simulating a large IC design with complicated internal behaviors (e.g., CPU cores running embedded software), RTL simulation can be extremely time consuming. Since RTL-to-layout is still the most prevalent IC design methodology, it is essential to speed up the RTL simulation process. Recently, General Purpose computing on Graphics Processing Units (GPGPU) has become a promising paradigm to accelerate computing-intensive workloads. A few recent works have demonstrated the effectiveness of using GPUs to expedite gate and system level simulation tasks. In this work, we propose an efficient GPU-accelerated RTL simulation framework. We introduce a methodology to translate a Verilog RTL description into equivalent GPU source code so as to simulate circuit behavior on GPUs. In addition, a CMB-based parallel simulation protocol is also adopted to provide a sufficient level of parallelism. Because RTL simulation lacks data-level parallelism, we also present a novel solution to use the GPU as an efficient task-level parallel processor. Experimental results show that our GPU-based simulator outperforms a commercial sequential RTL simulator by over 20-fold. Paper available at ACM. |
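
The row-based P/G grid abstract above mentions splitting a mesh into many independent tridiagonal row systems and solving each with an LU decomposition. As a rough illustration of that idea only, the sketch below solves one tridiagonal system with the standard Thomas algorithm (LU factorization without pivoting) in plain Python. It is not the authors' implementation: the function names `solve_tridiagonal` and `solve_rows` are hypothetical, the code runs sequentially on the CPU, and it assumes a diagonally dominant matrix so that no pivoting is needed.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (LU, no pivoting).

    lower[i] is the entry to the left of diag[i] (lower[0] is unused).
    upper[i] is the entry to the right of diag[i] (upper[-1] is unused).
    Assumption: the matrix is diagonally dominant, so pivots are nonzero.
    """
    n = len(diag)
    u = [0.0] * n  # main diagonal of the U factor
    y = [0.0] * n  # result of forward substitution, L y = rhs
    u[0] = diag[0]
    y[0] = rhs[0]
    for i in range(1, n):
        factor = lower[i] / u[i - 1]  # sub-diagonal entry of the L factor
        u[i] = diag[i] - factor * upper[i - 1]
        y[i] = rhs[i] - factor * y[i - 1]

    # Back substitution, U x = y
    x = [0.0] * n
    x[-1] = y[-1] / u[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - upper[i] * x[i + 1]) / u[i]
    return x


def solve_rows(rows):
    """Solve several independent row systems one after another.

    Each element of rows is a (lower, diag, upper, rhs) tuple. The systems
    do not depend on each other, so each could be handed to its own
    parallel worker; here they are simply solved in a loop.
    """
    return [solve_tridiagonal(*row) for row in rows]


if __name__ == "__main__":
    row = ([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0])
    print(solve_rows([row, row]))
```

Each row solve costs a number of operations proportional to the row length, and the rows are independent of one another, which is the property that makes this kind of workload a natural fit for parallel hardware.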

BayWebSoft