Stories, Papers, WIKIs

To DFM or not to DFM? (IEEE)

Abstract:

Design for manufacturability (DFM) is inevitable because of the formidable challenges encountered in nano-scale integrated circuit (IC) manufacturing. Unfortunately, it is difficult for designers to understand the cost-benefit tradeoff when tuning their design through DFM to achieve better manufacturability. This work attempts to assist the designer in this aspect by providing a methodology (called RADAR - Rule Assessment of Defect-Affected Regions) which uses failing-IC diagnosis results to systematically evaluate the effectiveness of DFM rules. RADAR is applied to the fail data from a 90nm Nvidia graphics processing unit (GPU) to demonstrate its viability. Specifically, evaluation of the via-enclosure rules revealed that they are much more needed in metal layers 3-6 than the remaining layers.

Paper available at IEEE.

GPU-Based Parallelization for Fast Circuit Optimization (ACM)

Abstract:
The progress of GPU (Graphics Processing Unit) technology opens a new avenue for boosting computing power. This work is an attempt to exploit the GPU for accelerating VLSI circuit optimization. We propose GPU-based parallel computing techniques and apply them to simultaneous gate sizing and threshold voltage assignment, which is a popular method for VLSI performance and power optimization. These techniques include efficient task scheduling and memory organization, all of which are aimed at fully utilizing the advantages of GPUs. Compared to conventional sequential computation, our techniques can provide up to 56× (39× on average) speedup without any sacrifice in solution quality.

Paper available at ACM.

Towards Accelerating Irregular EDA Applications with GPUs (ACM)

Abstract:
Graphics processing units (GPUs) have recently emerged as a new vehicle for high-performance, general-purpose computing. It is attractive to unleash the power of GPUs for Electronic Design Automation (EDA) computations to cut the design turn-around time of VLSI systems. EDA algorithms, however, generally depend on irregular data structures such as sparse matrices and graphs, which pose major challenges for efficient GPU implementations. In this paper, we propose high-performance GPU implementations for a set of important irregular EDA computing patterns including sparse matrix, graph, and message-passing algorithms. In the sparse matrix domain, we solve a core problem, sparse-matrix vector product (SMVP). On a wide range of EDA problem instances, our SMVP implementation outperforms all prior work and achieves a speedup of up to 50x over the CPU baseline implementation. The GPU-based SMVP procedure is applied to successfully accelerate two core EDA computing engines, timing analysis and linear system solution. In the graph algorithm domain, we developed an SMVP-based formulation to efficiently solve the breadth-first search (BFS) problem on GPUs. We also developed efficient solutions for two message-passing algorithms, survey propagation (SP) based SAT solution and a register-transfer level (RTL) simulation. Our results prove that GPUs have a strong potential to accelerate EDA computing through designing GPU-friendly algorithms and/or re-organizing computing structures of sequential algorithms.
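
As a rough illustration of the two ideas named in this abstract (SMVP, and BFS formulated on top of SMVP), here is a minimal CPU-side sketch in Python. It is not the paper's implementation: the CSR storage layout, the function names `csr_matvec` and `bfs_levels`, and the example graph are all assumptions chosen for illustration. The point is only that each output row of the product is independent, which is the property a GPU kernel would exploit.

```python
def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    num_rows = len(indptr) - 1
    y = [0] * num_rows
    for row in range(num_rows):
        # Rows are independent of each other; a GPU version would
        # typically assign one thread (or a group of threads) per row.
        acc = 0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def bfs_levels(indptr, indices, source):
    """BFS on an undirected graph, expressed as repeated SMVP.

    The adjacency matrix is given in CSR form with implicit unit weights.
    (A @ frontier)[v] > 0 means v is adjacent to at least one frontier vertex.
    Returns the BFS level of every vertex, or -1 if unreachable.
    """
    n = len(indptr) - 1
    data = [1] * len(indices)
    level = [-1] * n
    level[source] = 0
    frontier = [0] * n
    frontier[source] = 1
    depth = 0
    while any(frontier):
        depth += 1
        reached = csr_matvec(indptr, indices, data, frontier)
        frontier = [0] * n
        for v in range(n):
            # Keep only vertices seen for the first time.
            if reached[v] > 0 and level[v] == -1:
                level[v] = depth
                frontier[v] = 1
    return level


# Hypothetical example graph with edges 0-1, 0-2, 1-3; vertex 4 is isolated.
example_indptr = [0, 2, 4, 5, 6, 6]
example_indices = [1, 2, 0, 3, 0, 1]
example_levels = bfs_levels(example_indptr, example_indices, source=0)
```

Each BFS step here costs one full matrix-vector product, which does more arithmetic than a queue-based traversal but replaces irregular pointer chasing with a uniform per-row pattern.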

Paper available at ACM.

Using Graphics Processing Units for Logic Simulation of Electronic Designs (ACM)

Abstract:
Logic simulation is the major verification technique used for electronic system designs. Speeding up logic simulation results in great savings and shorter time-to-market. We parallelize logic simulation using Graphics Processing Units (GPUs). We present a parallel cycle-based logic simulation algorithm that uses And Inverter Graphs (AIGs) as design representations. We partition the gates in the design into independent blocks and simulate these blocks using the GPU. Our algorithm exploits the massively parallel GPU architecture featuring thousands of concurrent threads, fast memory, and memory coalescing for optimizations. We demonstrate up to 21x speedup on several benchmarks using our simulation system.
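
To make "cycle-based simulation over an AIG" concrete, the following is a minimal sequential sketch in Python. It is an illustration under stated assumptions, not the authors' algorithm: the literal encoding (variable index times two, plus one if inverted, with variable 0 as constant false) follows the common AIGER convention, the names `lit_value` and `simulate_cycle` are hypothetical, and the partitioning into independent blocks and the GPU mapping described in the abstract are not shown.

```python
def lit_value(values, lit):
    """Value of a literal: variable lit >> 1, inverted if the low bit is set."""
    return values[lit >> 1] ^ (lit & 1)


def simulate_cycle(num_inputs, latches, ands, state, inputs):
    """Simulate one clock cycle of an AIG (assumed AIGER-style encoding).

    latches: list of (latch_var, next_state_literal)
    ands:    list of (out_var, lit_a, lit_b), in topological order
    state:   dict mapping latch_var -> current bit
    inputs:  list of bits for variables 1..num_inputs
    Returns (values, next_state).
    """
    num_vars = 1 + num_inputs + len(latches) + len(ands)
    values = [0] * num_vars  # variable 0 is the constant false
    for i, bit in enumerate(inputs):
        values[1 + i] = bit
    for var, _ in latches:
        values[var] = state[var]
    # Every gate is a two-input AND; inversion lives on the edges.
    for out, a, b in ands:
        values[out] = lit_value(values, a) & lit_value(values, b)
    # Latches take their new value only at the end of the cycle.
    next_state = {var: lit_value(values, nxt) for var, nxt in latches}
    return values, next_state


# Hypothetical example: XOR of inputs 1 and 2 built from three AND gates.
# var3 = a & !b, var4 = !a & b, var5 = !var3 & !var4, XOR = !var5 (literal 11).
xor_ands = [(3, 2, 5), (4, 3, 4), (5, 7, 9)]
xor_table = {}
for a in (0, 1):
    for b in (0, 1):
        values, _ = simulate_cycle(2, [], xor_ands, {}, [a, b])
        xor_table[(a, b)] = lit_value(values, 11)

# Hypothetical example: a toggle flip-flop, latch var 1 with next state !var1.
toggle_state = {1: 0}
toggle_trace = []
for _ in range(4):
    _, toggle_state = simulate_cycle(0, [(1, 3)], [], toggle_state, [])
    toggle_trace.append(toggle_state[1])
```

Because every node is the same two-input operation, gates at the same topological depth can be evaluated in any order, which is what makes the representation attractive for data-parallel hardware.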

Paper available at ACM.

GPU-Based Acceleration of the Time-Domain Electrical Full-Wave Solvers in PI/SI/EMI Simulation (IEEE)

Abstract:

In this paper, massively parallel time-domain electrical full-wave solvers, which can be used in circuit and electromagnetic-field simulators, are described. First, the FDTD (Finite-Difference Time-Domain) method and its derivative methods, such as the semi-implicit FDTD method and LIM (Latency Insertion Method), are introduced for large-scale simulation in PI/SI/EMI design. Next, several acceleration techniques for these numerical simulation methods, based on parallel computing platforms such as PC clusters and GPGPU (general-purpose computing on Graphics Processing Units), are reviewed and discussed. Finally, the computational performance on the hardware accelerators is compared and systematically summarized.
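
For readers unfamiliar with FDTD, the sketch below shows the textbook one-dimensional leapfrog update in normalized units, written in Python. It is not the solver from the paper, and it does not cover the semi-implicit FDTD or LIM variants. The function name `fdtd_1d`, the grid size, the Courant number of 0.5, and the Gaussian source parameters are all assumptions made for illustration.

```python
import math


def fdtd_1d(num_cells, num_steps, courant=0.5, source_cell=None):
    """Textbook 1D FDTD leapfrog in normalized units (illustrative only).

    ez[k] lives on grid node k, hy[k] lives halfway between nodes k and k+1.
    ez[0] and ez[-1] are never updated, so both ends act as perfectly
    conducting walls.
    """
    if source_cell is None:
        source_cell = num_cells // 2
    ez = [0.0] * num_cells
    hy = [0.0] * num_cells
    for step in range(num_steps):
        # Update E from the spatial difference of H.
        for k in range(1, num_cells - 1):
            ez[k] += courant * (hy[k - 1] - hy[k])
        # Additive ("soft") Gaussian pulse; width and delay are assumed values.
        ez[source_cell] += math.exp(-0.5 * ((step - 30) / 10.0) ** 2)
        # Update H from the spatial difference of the new E.
        for k in range(num_cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])
    return ez, hy


ez_after, hy_after = fdtd_1d(num_cells=101, num_steps=10)
```

Within each half-step, every cell reads only its immediate neighbours from the other field, so all cells can be updated concurrently. That locality is what makes this family of methods a natural fit for PC clusters and GPUs.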

Paper available at IEEE.

To DFM or not to DFM? (ACM)

Abstract:
Design for manufacturability (DFM) is inevitable because of the formidable challenges encountered in nano-scale integrated circuit (IC) manufacturing. Unfortunately, it is difficult for designers to understand the cost-benefit tradeoff when tuning their design through DFM to achieve better manufacturability. This work attempts to assist the designer in this aspect by providing a methodology (called RADAR - Rule Assessment of Defect-Affected Regions) which uses failing-IC diagnosis results to systematically evaluate the effectiveness of DFM rules. RADAR is applied to the fail data from a 90nm Nvidia graphics processing unit (GPU) to demonstrate its viability. Specifically, evaluation of the via-enclosure rules revealed that they are much more needed in metal layers 3-6 than the remaining layers.

Paper available at ACM.

Processing of Circular SAR trajectories with Fast Factorized Back-Projection (IEEE)

Abstract:
This paper describes an efficient implementation of a Fast Factorized Back Projection (FFBP) algorithm for Circular SAR (CSAR) trajectories, applied to real airborne data. Unlike Fourier-domain focusing processors, this approach accounts for azimuth variance and topography changes with high accuracy, while significantly reducing computation time in comparison with direct Back Projection (BP). To further accelerate the focusing, the circular FFBP was also implemented on a Graphics Processing Unit (GPU). The second part of the document shows a fully polarimetric image of the region of Kaufbeuren, Germany (acquired by the DLR's E-SAR system), focused with this method to the theoretical limit of ∼λ/4. Thus, the efficiency, accuracy and performance of the circular FFBP are demonstrated, as well as the potential of CSAR when focusing over 360°.

Paper available at IEEE.

A fast CAST-Based Clustering Algorithm for Very Large Database (IEEE)

Abstract:

Advances in nanometer and integrated circuit technology enable the graphics card to carry its own memory and one or more processing units, called GPUs, in which most graphics instructions can be processed in parallel. This computational resource can be used to improve the execution efficiency not only of graphics applications but also of other time-consuming applications such as data mining. CAST (Clustering Affinity Search Technique) is a well-known clustering algorithm that is widely used for clustering biological data. In this paper, we propose two algorithms: Calculation-On-Demand CAST (COD-CAST) and Calculation-On-Demand CAST with GPU (COD-CAST-GPU). COD-CAST is a refined CAST algorithm that can process large numbers of objects more efficiently in terms of execution time. COD-CAST-GPU uses the GPU and the graphics card's own memory to accelerate COD-CAST. The experimental results show that our proposed algorithms deliver excellent performance in terms of execution time and required memory.

Paper available at IEEE.

Using Graphics Processing Units for Logic Simulation of Electronic Designs (IEEE)

Abstract:

Logic simulation is the major verification technique used for electronic system designs. Speeding up logic simulation results in great savings and shorter time-to-market. We parallelize logic simulation using Graphics Processing Units (GPUs). We present a parallel cycle-based logic simulation algorithm that uses And Inverter Graphs (AIGs) as design representations. We partition the gates in the design into independent blocks and simulate these blocks using the GPU. Our algorithm exploits the massively parallel GPU architecture featuring thousands of concurrent threads, fast memory, and memory coalescing for optimizations. We demonstrate up to 21x speedup on several benchmarks using our simulation system.

Paper available at IEEE.

Efficient Full Wave Analysis of electrically large multilayered radomes (IEEE)

Abstract:

The paper presents efficient full wave analysis of electrically large multilayered radomes, based on advanced MoM techniques: 1) higher order basis functions, 2) "smart reduction" of expansion orders, 3) excitation of the structure by field generators, 4) equidistant positioning of dielectric surfaces based on equal meshing of all layers, 5) direct solution of the matrix equation based on LU decomposition, 6) an out-of-core matrix solver, and 7) CPU and GPU parallelization. Results for an A-sandwich radome 8 λ in diameter, obtained in 7 minutes on a PC, are shown for different beam steering angles. Two different A-sandwich radomes 16 λ in diameter are shown in order to illustrate the influence of the middle layer thickness on the radiation pattern. Finally, the radiation pattern of the array with a radome whose diameter is 60 λ is shown.

Paper available at IEEE.