| Title | Body |
|---|---|
| Accelerating Computational Electromagnetic Diffraction Model on Programmable Graphics Processors |
Abstract:
The Electromagnetic Diffraction Model (EDM) is used in wafer metrology to deduce the quality of the photolithographic process. This numerical model solves sets of linear algebraic equations (i.e., electromagnetic wave equations) by computing Fast Fourier Transforms (FFTs). The running time of EDM is dominated by the 3D electromagnetic wave equation, which is solved by 2D convolution. This step alone consumes about 50% of the total solving time on serial computers. This thesis therefore focuses on accelerating these computations on massively parallel hardware. Given the heavy numerical demands of the application, the Graphics Processing Unit (GPU) was chosen as the target platform because of its high performance. The thesis introduces a framework for the GPU-based parallel implementation and explores the performance of solving such computations on general-purpose GPUs using the NVIDIA CUDA programming model. It highlights modest algorithm modifications that could significantly increase data parallelism. The overall results show that the proposed parallel algorithms fully utilize CUDA architecture features, justifying the use of this technology for general-purpose computing. For a sufficiently large problem size, the GPU-based parallel implementation yields a speedup of about 6 to 19 times over its serial CPU counterpart. |
| FPGA-Based Hardware Acceleration of Lithographic Aerial Image Simulation |
Abstract:
Lithography simulation, an essential step in design for manufacturability (DFM), is still far |
| Accelerating System-Level Design Tasks using Commodity Graphics Hardware: A Case Study |
Many system-level design tasks (e.g. timing analysis, hardware/software partitioning and design space exploration) involve computational kernels that are intractable (usually NP-hard). As a result, they involve high running times even for mid-sized problems. In this paper we explore the possibility of using commodity graphics processing units (GPUs) to accelerate such tasks that commonly arise in the electronic design automation (EDA) domain. We demonstrate this idea via a detailed case study on a general hardware/software design space exploration problem and propose a GPU-based engine for it. Not only does this problem commonly arise in the embedded systems domain, its computational kernel turns out to be a general combinatorial optimization problem (viz. the knapsack problem) which lies at the heart of several EDA applications. Our experimental results show that our GPU-based implementation offers very attractive speedups for this computational kernel (up to 100×), and speedups of up to 17× for the full problem. In contrast to ASIC/FPGA-based accelerators – since even low-end desktop and notebook computers are today equipped with GPUs – our solution involves no extra hardware cost. Although recent research has shown the benefits of using GPUs for a variety of non-graphics applications (e.g. in databases and bioinformatics), hardly any work has been done on harnessing the parallelism of GPUs to accelerate problems from the EDA domain. We hope that our results and the generality of the problem we address will motivate researchers from this community to explore the possibility of using GPUs for a wider variety of problems from the EDA domain. |
| Accelerating system-level design tasks using commodity graphics hardware: A case study |
Abstract:
Many system-level design tasks (e.g. timing analysis, hardware/software partitioning and design space exploration) involve |
| Parallel Multi-level Analytical Global Placement on Graphics Processing Units |
Abstract:
GPU platforms are becoming increasingly attractive for implementing accelerators because they feature a larger number |
| An Improved Parallel Implementation of 3D DRIE Simulation on GPU |
Abstract:
Deep reactive ion etching (DRIE) technique is a new and powerful tool in Micro-Electro-Mechanical Systems |
| GPU-based Acceleration of System-Level Design Tasks |
Abstract: Many system-level design tasks (e.g., high-level timing analysis, hardware/software partitioning and design space exploration) involve computational kernels that are intractable (usually NP-hard). As a result, they involve high running times even for mid-sized problems. In this paper we explore the possibility of using commodity graphics processing units (GPUs) to accelerate such tasks that commonly arise in the electronic design automation (EDA) domain. We demonstrate this idea via two detailed case studies. The first explores the possibility of using GPUs to speed up standard schedulability analysis problems. The second proposes a GPU-based engine for a general hardware/software design space exploration problem. Not only do these problems commonly arise in the embedded systems domain, their computational kernels turn out to be variants of a combinatorial optimization problem – viz., the knapsack problem – that lies at the heart of several EDA applications. Experimental results show that our GPU-based implementations offer very attractive speedups for the computational kernels (up to 100×), and speedups of up to 17× for the full problem. In contrast to ASIC/FPGA-based accelerators – given that even low-end desktop and notebook computers are now equipped with GPUs – our solution involves no extra hardware cost. Although recent research has shown the benefits of using GPUs for a variety of non-graphics applications (e.g., in databases and bioinformatics), harnessing the parallelism of GPUs to accelerate problems from the EDA domain has not been sufficiently explored so far. We believe that our results and the generality of the core problem that we address will motivate researchers from this community to explore the possibility of using GPUs for a wider variety of problems from the EDA domain. |
| Fast Schedulability Analysis Using Commodity Graphics Hardware |
Abstract: In this paper we explore the possibility of using commodity graphics processing units (GPUs) to speed up standard schedulability analysis algorithms. Our long-term goal is to exploit GPUs to accelerate common electronic design automation algorithms, most of which tend to be computationally expensive. Our main contribution in this paper is a reformulation of a standard demand bound criterion-based schedulability analysis algorithm as a streaming algorithm expressed in terms of computer graphics primitives. This allows the algorithm to be efficiently implemented on a GPU, thereby resulting in very attractive speedups. |
| GPU-Based Parallelization for Fast Circuit Optimization |
Abstract: The progress of GPU (Graphics Processing Unit) technology opens a new avenue for boosting computing power. This work is an attempt to exploit GPUs to accelerate VLSI circuit optimization. We propose GPU-based parallel computing techniques and apply them to simultaneous gate sizing and threshold voltage assignment, which is often employed in practice for performance and power optimization. These techniques aim to fully utilize the benefits of the GPU through efficient task scheduling and memory organization. Compared to conventional sequential computation, our techniques can provide up to 56x speedup without any sacrifice in solution quality. |
| Accelerating Hardware Simulation on Multi-cores |
Abstract: Electronic design automation (EDA) tools play a central role in bridging the productivity gap for designing complex hardware systems. However, with an increase in the size and complexity of today's design requirements, current methodologies and EDA tools are unable to effectively mitigate the further widening of the productivity gap. It is estimated that testing and verification take 2/3 of the total development time of complex hardware systems. Functional simulation forms the mainstay of the testing and verification process and is its most widely used technique. Most of the simulation algorithms and their implementations are designed for uniprocessor systems that cannot easily leverage the parallelism in multi-core and GPU platforms. For example, logic simulation often uses levelized sequential algorithms, whereas the discrete-event simulation frameworks for Verilog, VHDL and SystemC employ concurrency in the form of multi-threading to give an illusion of the inherent parallelism present in circuits. However, the discrete-event model of computation requires a global notion of an event-queue, which makes improving its simulation performance via parallelization even more challenging. This work investigates automatic parallelization of simulation algorithms used to simulate hardware models. In particular, we focus on parallelizing the simulation of hardware designs described at the RTL using SystemC/HDL with examples to clearly describe the parallelization. Even though multi-cores and GPUs offer parallelism, efficiently exploiting this parallelism with their programming models is not straightforward. To overcome this, we also focus our research on building intelligent translators to map simulation applications onto multi-cores and GPUs such that the complexity of the low-level programming models is hidden from the designers. |
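
The first abstract in the table notes that EDM spends much of its time on 2D convolutions computed with FFTs. As a rough illustration of that general technique only (not the thesis's CUDA implementation, which is not shown in the listing), the following pure-Python sketch performs a circular 2D convolution through the convolution theorem and includes a direct reference version for comparison. All function names are hypothetical, and the grids are assumed to be equally sized with power-of-two dimensions.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).

    Assumption: len(values) is a power of two.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(grid, inverse=False):
    """2D FFT: transform every row, then every column (unnormalized)."""
    rows = [fft(row, inverse) for row in grid]
    cols = [fft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def circular_convolve2d(a, b):
    """Circular 2D convolution via pointwise product in the frequency domain."""
    fa, fb = fft2(a), fft2(b)
    product = [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(fa, fb)]
    scale = len(a) * len(a[0])  # normalization for the inverse transform
    return [[v.real / scale for v in row] for row in fft2(product, inverse=True)]


def direct_circular_convolve2d(a, b):
    """Reference O(N^2 M^2) circular convolution, used only for checking."""
    n_rows, n_cols = len(a), len(a[0])
    return [
        [
            sum(
                a[p][q] * b[(i - p) % n_rows][(j - q) % n_cols]
                for p in range(n_rows)
                for q in range(n_cols)
            )
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]
```

The row transforms are independent of one another, as are the column transforms and the pointwise products, which is the kind of data parallelism a GPU implementation would typically exploit.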

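The schedulability abstract above refers to a demand bound criterion. The sketch below shows one common textbook form of that criterion for earliest-deadline-first scheduling on a single processor; it does not reproduce the paper's reformulation in terms of graphics primitives. The task tuple layout `(wcet, deadline, period)`, the function names, and the caller-supplied `horizon` are all assumptions made for illustration; in practice the horizon would come from an analytically derived bound.

```python
def demand_bound(tasks, t):
    """Total execution demand of jobs released and due within [0, t].

    Each task is an integer tuple (wcet, deadline, period); this layout is
    an assumption for the sketch.
    """
    return sum(max(0, (t - d) // p + 1) * c for c, d, p in tasks)


def check_points(tasks, horizon):
    """Absolute deadlines up to the horizon; demand only changes at these instants."""
    points = set()
    for _, d, p in tasks:
        points.update(range(d, horizon + 1, p))
    return sorted(points)


def is_schedulable(tasks, horizon):
    """True if demand never exceeds the available time at any test point.

    Every test point is evaluated independently of the others, which is
    what makes the criterion a natural fit for data-parallel evaluation.
    """
    return all(demand_bound(tasks, t) <= t for t in check_points(tasks, horizon))
```

For example, `is_schedulable([(1, 4, 4), (2, 6, 6)], 12)` checks the test points 4, 6, 8 and 12 and finds the demand within bounds at each of them.
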