This work describes a softcore design of a processor in style of Graphic Processing Units. The design is realized using Verilog Hardware Description Language. The proposed design has an advantage in the flexibility to scale up by adding more processing elements to attain more speedup. The design of its instruction set architecture is explained. A realization of four processing elements processor is presented. It requires 268,637 equivalent gates. The maximum frequency is 117 MHz. It is suitable of embedded applications. In term of cycles consumed, it compares very well to a test program running on commercial Intel's CPU, Core2 Duo P8400.
Paper available at IEEE.