Massively Parallel Finite Element Simulator for Full-Chip STI Stress Analysis (IEEE)

Abstract

In modern integrated circuit (IC) designs with feature sizes below 90 nm, the stress among different material layers plays an important role in determining device performance. The stress can be classified into two categories: stress deliberately introduced during semiconductor processing, and stress unintentionally formed through the interaction of different processing steps. Among the different types of inadvertent stress, shallow trench isolation (STI) stress, which is exerted by the isolation materials, is the primary one and has a major impact on circuit characteristics. A detailed analysis of STI stress on an IC chip, however, is a complicated process because the stress is determined by the distribution of layout patterns, which could add up to trillions in today's typical IC designs. The traditional technology computer-aided design (TCAD) tools for such an analysis are already too slow on large circuits. In this work, a GPU-based finite element simulator for full-chip stress analysis is developed. Experimental results showed that the GPU-based simulator could outperform its CPU equivalent by a factor of 20. Such a speedup would allow detailed stress-aware performance optimization for large ICs.

Paper available at IEEE.

A GPU/CUDA Implementation of the Collection-Diffusion Model to Compute SER of Large Area and Complex Circuits (IEEE)

Abstract

This work reports the CUDA implementation of the collection-diffusion model to compute the soft-error rate (SER) of large-area and/or complex circuits on graphics processing units (GPUs). We detail the time parallelization introduced in the algorithm to accelerate the SER calculation by one order of magnitude. Code performance is evaluated on an NVIDIA Tesla C1060 GPU card for the calculation of the SER of a 65 nm SRAM circuit subjected to irradiation from an alpha-particle source.

Paper available at IEEE.

Hardware-Efficient Belief Propagation (IEEE)

Abstract

Loopy belief propagation (BP) is an effective solution for assigning labels to the nodes of a graphical model such as the Markov random field (MRF), but it requires high memory, bandwidth, and computational costs. Furthermore, the iterative, pixel-wise, and sequential operations of BP make it difficult to parallelize the computation. In this paper, we propose two techniques to address these issues. The first technique is a new message passing scheme named tile-based belief propagation that reduces the memory and bandwidth to a fraction of those of ordinary BP algorithms without performance degradation by splitting the MRF into many tiles and storing only the messages across neighboring tiles. The tile-wise processing also enables data reuse and pipelining, resulting in an efficient hardware implementation. The second technique is an O(L) fast message construction algorithm that exploits the properties of robust functions for parallelization. We apply these two techniques to a VLSI circuit for stereo matching that generates high-resolution disparity maps in near real-time. We also implement the proposed schemes on a GPU, where they are four times faster than standard BP on a GPU.

Paper available at IEEE.
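
As a point of reference for the O(L) message construction mentioned in the abstract above, the sketch below shows the classic two-pass min-sum message update for a truncated-linear smoothness cost. This is the well-known linear-time technique of Felzenszwalb and Huttenlocher, not the tile-based or parallel scheme the paper proposes, and the function names, the `slope` and `trunc` parameters, and the choice of a truncated-linear cost are all illustrative assumptions.

```python
def message_bruteforce(h, slope, trunc):
    """O(L^2) reference: m[p] = min over q of h[q] + min(slope*|p-q|, trunc)."""
    labels = range(len(h))
    return [
        min(h[q] + min(slope * abs(p - q), trunc) for q in labels)
        for p in labels
    ]


def message_linear_time(h, slope, trunc):
    """O(L) two-pass update for the same truncated-linear cost (assumed model)."""
    m = list(h)
    # Forward pass: let each label inherit a cheaper cost from its left neighbor.
    for p in range(1, len(m)):
        m[p] = min(m[p], m[p - 1] + slope)
    # Backward pass: the same from the right neighbor.
    for p in range(len(m) - 2, -1, -1):
        m[p] = min(m[p], m[p + 1] + slope)
    # Truncation: no label may cost more than the global minimum plus trunc.
    cap = min(h) + trunc
    return [min(value, cap) for value in m]


# Hypothetical per-label costs (data term plus incoming messages) for L = 8.
costs = [5.0, 1.0, 4.0, 9.0, 2.0, 8.0, 0.0, 7.0]
fast = message_linear_time(costs, slope=1.5, trunc=4.0)
slow = message_bruteforce(costs, slope=1.5, trunc=4.0)
```

Both functions return the same message; the two-pass version touches each label a constant number of times, which is where the linear cost in the number of labels L comes from.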

GPGPU-based Gaussian Filtering for Surface Metrological Data Processing (IEEE)

Abstract

Engineering surfaces are characterized by form, waviness, and roughness features that comprise a range of spatial wavelengths. Filtering techniques are commonly adopted to separate these different wavelength components into well-defined bandwidths for further processing. The Gaussian filtered surface, in which a 2D Gaussian filter is employed for surface assessment, has been recommended by the ISO 11562-1996 and ASME B46-1995 standards to establish a reference surface. For Gaussian filtering, computational efficiency is a key problem when it is applied to a large set of surface metrology data. In the past, this problem was tackled by reducing the amount of computation through the design and adoption of fast algorithms. In this paper, a general-purpose computing on GPU (GPGPU) framework is discussed to accelerate 2D Gaussian filtering for surface characterization. This framework takes advantage of the GPU's parallel computing ability and has achieved better data efficiency without reducing the amount of computation while maintaining the filtering quality. Filtering results and their accuracy from this model have been compared with the results obtained from MATLAB simulation kits, and satisfactory outcomes were observed.

Paper available at IEEE.
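
To make the 2D Gaussian filtering described in the abstract above concrete, here is a minimal CPU sketch, not the paper's GPGPU framework. It assumes the 2D weighting is the product of two 1D Gaussian weighting functions of the form s(x) = exp(-pi (x / (alpha * cutoff))^2) / (alpha * cutoff) with alpha = sqrt(ln 2 / pi), uses direct (unaccelerated) convolution, and renormalizes the weights at the borders. The function names, grid size, cutoff, and sample spacing are hypothetical. Each output point is computed independently, which is the property a one-thread-per-point GPU mapping would rely on.

```python
import math

# Constant of the Gaussian weighting function (about 0.4697).
ALPHA = math.sqrt(math.log(2.0) / math.pi)


def gaussian_weights(cutoff, spacing, half_width):
    """Sampled 1D Gaussian weights, normalized so that they sum to 1."""
    scale = ALPHA * cutoff
    raw = [
        math.exp(-math.pi * ((i * spacing) / scale) ** 2)
        for i in range(-half_width, half_width + 1)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def filter_point(surface, row, col, weights):
    """Weighted mean around one point; independent of every other output point."""
    half = len(weights) // 2
    rows, cols = len(surface), len(surface[0])
    acc = 0.0
    weight_sum = 0.0
    for dr in range(-half, half + 1):
        r = row + dr
        if not 0 <= r < rows:
            continue
        for dc in range(-half, half + 1):
            c = col + dc
            if not 0 <= c < cols:
                continue
            w = weights[dr + half] * weights[dc + half]
            acc += w * surface[r][c]
            weight_sum += w
    # Renormalize so that truncated windows at the border stay unbiased.
    return acc / weight_sum


def gaussian_filter_2d(surface, cutoff, spacing, half_width):
    """Direct 2D Gaussian filtering of a height map given as a list of rows."""
    weights = gaussian_weights(cutoff, spacing, half_width)
    return [
        [filter_point(surface, r, c, weights) for c in range(len(surface[0]))]
        for r in range(len(surface))
    ]


# Hypothetical data: a flat surface and a fine checkerboard "roughness" pattern.
flat = [[2.0] * 9 for _ in range(9)]
rough = [[(-1.0) ** (r + c) for c in range(9)] for r in range(9)]
flat_ref = gaussian_filter_2d(flat, cutoff=0.8, spacing=0.1, half_width=4)
rough_ref = gaussian_filter_2d(rough, cutoff=0.8, spacing=0.1, half_width=4)
```

A flat surface passes through unchanged, while the short-wavelength checkerboard is almost entirely removed from the reference surface.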

Discontinuous Galerkin Time Domain for Maxwell's Equations on GPUs (IEEE)

Abstract

In this paper, we discuss our approach to the GPU implementation of the Discontinuous Galerkin Time-Domain (DGTD) method for solving the time-dependent Maxwell's equations. We exploit the inherent DGTD parallelism and combine the GPU computing capabilities with the benefits of a local time-stepping strategy. The combination results in a significant increase in efficiency and a reduction of the computational time, especially for multi-scale applications.

Paper available at IEEE.

Efficient Fault Simulation on Many-Core Processors (IEEE)

Abstract

Fault simulation is essential in test generation, design for test and reliability assessment of integrated circuits. Reliability analysis and the simulation of self-test structures are particularly computationally expensive as a large number of patterns has to be evaluated. In this work, we propose to map a fault simulation algorithm based on the parallel-pattern single-fault propagation (PPSFP) paradigm to many-core architectures and describe the involved algorithmic optimizations. Many-core architectures are characterized by a high number of simple execution units with small local memory. The proposed fault simulation algorithm exploits the parallelism of these architectures by use of parallel data structures. The algorithm is implemented for the NVIDIA GT200 Graphics Processing Unit (GPU) architecture and achieves a speed-up of up to 17x compared to an existing GPU fault-simulation algorithm and up to 16x compared to state-of-the-art algorithms on conventional processor architectures.

Paper available at IEEE.
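
The parallel-pattern single-fault idea named in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it packs eight test patterns into the bits of an integer, simulates the fault-free circuit once, then injects one stuck-at fault at a time and compares outputs. For brevity it re-simulates the whole circuit per fault rather than propagating only from the fault site. The three-gate netlist, the signal names, and the function names are hypothetical.

```python
# Hypothetical three-input circuit in topological order: y = (a AND b) OR (NOT c).
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "NOT", ("c",)),
    ("y", "OR", ("n1", "n2")),
]


def simulate(netlist, inputs, mask, fault=None):
    """Evaluate all patterns at once; bit i of every word belongs to pattern i.

    fault is None or a (signal_name, stuck_value) pair for a single stuck-at fault.
    """
    def forced(name, word):
        # Override the signal on every pattern if it is the faulty one.
        if fault is not None and fault[0] == name:
            return mask if fault[1] else 0
        return word

    values = {name: forced(name, word & mask) for name, word in inputs.items()}
    for out, op, ins in netlist:
        if op == "AND":
            word = values[ins[0]] & values[ins[1]]
        elif op == "OR":
            word = values[ins[0]] | values[ins[1]]
        elif op == "NOT":
            word = ~values[ins[0]] & mask
        else:
            raise ValueError("unknown gate type: " + op)
        values[out] = forced(out, word)
    return values


def detecting_patterns(netlist, inputs, mask, fault, output="y"):
    """Return a word whose set bits mark the patterns that detect the fault."""
    good = simulate(netlist, inputs, mask)
    faulty = simulate(netlist, inputs, mask, fault)
    return (good[output] ^ faulty[output]) & mask


# All eight input combinations, one per bit position.
PATTERNS = {"a": 0b11110000, "b": 0b11001100, "c": 0b10101010}
MASK = 0xFF

n1_stuck_at_0 = detecting_patterns(NETLIST, PATTERNS, MASK, ("n1", 0))
y_stuck_at_1 = detecting_patterns(NETLIST, PATTERNS, MASK, ("y", 1))
```

Each bitwise operation evaluates one gate for every packed pattern at once; a wider word, or many words processed by many threads, scales the same idea.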

Three-Dimensional Particle Beam Simulation Using High Performance Graphics Processing Hardware (IEEE)

Abstract

In this paper, a new three-dimensional particle beam simulation using high-performance hardware is introduced. The graphics processing unit (GPU) hardware is highly parallel and offers low-cost computational capabilities. The code uses the NVIDIA CUDA programming model and performs the particle trajectory integration steps on the graphics hardware using a standard 4th-order Runge-Kutta scheme. The model uses meshless representations for electromagnetic fields, based on either analytic formulas or field expansion techniques, to achieve a high degree of parallelism in the calculations. The authors describe potential applications of the code for 3D simulation of vacuum electronic devices, and present details of both the algorithms used and simulation performance results.

Paper available at IEEE.
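
The abstract above combines a standard 4th-order Runge-Kutta trajectory integration with analytic (meshless) field formulas. A minimal single-particle sketch of that combination is given below. It is not the paper's CUDA code: it assumes a non-relativistic particle, a purely magnetic Lorentz force, a uniform field along z, and normalized units, and every function name is hypothetical. Because the field is an analytic function of position, each particle can be advanced independently of all others.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def uniform_b_field(position):
    """Analytic field formula (assumed): uniform unit field along z."""
    return (0.0, 0.0, 1.0)


def derivative(state, q_over_m, b_field):
    """Time derivative of (x, y, z, vx, vy, vz) under a = (q/m) v x B."""
    position, velocity = state[:3], state[3:]
    force = cross(velocity, b_field(position))
    return velocity + tuple(q_over_m * f for f in force)


def rk4_step(state, dt, q_over_m, b_field):
    """Advance one particle by dt with the classical 4th-order Runge-Kutta scheme."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = derivative(state, q_over_m, b_field)
    k2 = derivative(shifted(k1, dt / 2), q_over_m, b_field)
    k3 = derivative(shifted(k2, dt / 2), q_over_m, b_field)
    k4 = derivative(shifted(k3, dt), q_over_m, b_field)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trace(state, dt, steps, q_over_m, b_field):
    """Integrate a trajectory for a fixed number of steps."""
    for _ in range(steps):
        state = rk4_step(state, dt, q_over_m, b_field)
    return state


# One particle starting at the origin with unit velocity along x (normalized units).
final = trace((0.0, 0.0, 0.0, 1.0, 0.0, 0.0), dt=0.01, steps=628,
              q_over_m=1.0, b_field=uniform_b_field)
final_speed = math.sqrt(sum(v * v for v in final[3:]))
```

In this setup the particle follows a circle with unit cyclotron frequency, so the speed should stay at 1 and the position should track x = sin(t), y = cos(t) - 1.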

Using CUDA Enabled FDTD Simulations to Solve Multi-Gigahertz EMI Challenges (IEEE)

Abstract

Thanks to the application of GPU-CUDA acceleration technology to EM simulation tools, more and more complicated EMI challenges can be efficiently investigated and solved very early in the design process. This paper presents a novel methodology to predict EMI emission due to memory SSO noise from a real, commercial graphics card by means of a commercially available CUDA accelerated full-wave FDTD simulator. It is shown that thanks to the CUDA acceleration one can estimate the influence of on-board decoupling capacitors on the EMI emission within hours.

Paper available at IEEE.

Dead-Time Compensation for VSI Based Power Supply with Small Filter Inductor (IEEE)

Abstract

For a voltage source inverter (VSI) based power supply with a small filter inductor, the current ripple of the power switch is large, especially under light load conditions. The dead-time effect in this situation is analyzed in this paper. The results show that the dead-time effect is eliminated at most zero-crossing instants of the current. Conventional dead-time compensation methods, whether average-voltage based or pulse based, cannot be adopted in this case because the current polarity changes many times within one fundamental period. A closed-loop method for dead-time compensation is proposed for this condition. The proposed method does not need detection of the current polarity or any extra hardware and is very easy to implement. Experimental results on a DSP-controlled three-phase 400 Hz GPU show that the low-order harmonics introduced by the dead-time in the output voltage are almost completely eliminated.

Paper available at IEEE.

Virtual-EMI Lab: Removing Mysteries From Black-Magic to a Successful Front-End Design (IEEE)

Abstract

EMI engineers struggle every day with complex radiation problems that cause critical products to fail EMI certification and lead to large losses of profit. Advances in EMI engineering are following a trend similar to that of signal-integrity engineering 10 years ago, when simulation tools became capable of providing accurate predictive simulations in a reasonable amount of time. With careful engineering that uses cutting-edge full-wave field-solver software, Momentum (MOM) and EMpro (FDTD), along with a hardware boost from heterogeneous massively parallel CPU/GPU processing (CUDA) technology, we can move EMI teams from back-end black magic to a successful, cost-effective front-end design. This paper presents an innovative process (Virtual-EMI lab) for pre- and post-tape-out that provides designers with an early-stage EMI-suppression matrix (on-chip and on-board enablers) to find the optimum trade-off between performance and cost.

Paper available at IEEE.