Scalable Parallel Simulation of Many-Core Tiled Architectures
Title: Scalable Parallel Simulation of Many-Core Tiled Architectures
SNIC Project: SNIC 2013/1-279
Project Type: SNAC Medium
Principal Investigator: Sally McKee
Affiliation: Chalmers University of Technology
Duration: 2013-11-13 – 2014-12-01
Classification: 10206 20203 20206


During the decade leading up to 2005, computer designers consistently delivered higher single-core performance by raising clock speeds and by increasing instruction-level parallelism through ever more sophisticated speculation mechanisms. Technology constraints, especially power and thermal constraints, have ended this trend, and the community has turned to increasing parallelism to deliver more computational power per chip. Shrinking feature sizes have now made it possible to put hundreds of processing cores on a single chip, and thousand-core chips are predicted by 2020.
Computer architects usually model proposed designs with detailed "cycle-accurate" simulators. Such simulators require significant computational resources: full-system simulations (which include the OS and the runtime software stack in addition to the application being modeled) can take many days, or even weeks if the application is run to completion. Our ability to conduct thorough design-space exploration of thousand-core chips is thus limited by our modeling capabilities. To study highly parallel processor designs, we need to parallelize our simulation models so that they leverage the many simultaneous threads available on current high-performance platforms.
This project will create such a highly parallel simulation model for the Adapteva Parallella platform, which consists of an ARM general-purpose core that drives an Epiphany tiled-core chip. The Epiphany represents the highest computational power per watt in the world today. It is intended to be used as a high-performance, low-power coprocessor, and it has significant potential to accelerate applications like baseband processing at minimal power. A 16-core Epiphany chip is already available, and a 256-core chip is currently under test. A 1024-core chip is projected, but at present the only way to model it is by building prototypes.
At present, the only simulator for the Adapteva platform is a single-core model. No scalable network model exists yet, and no simulation tools yet integrate models of the ARM core and the external DRAM (among other resources) on the Parallella board. We propose to develop a highly threaded, scalable simulation model that can eventually boot Linux and model full applications. This project will produce two master's theses here at Chalmers, and it will support an ongoing research collaboration with Ericsson AB on lowering power consumption in LTE base stations. Our initial driving application will be an open-source LTE uplink model that we developed in collaboration with Ericsson colleagues in 2010-2012.
The purpose of the project is thus twofold: 1) to study scalability methods for computer architecture design tools, and 2) to investigate the use and optimization of highly parallel, low-power tiled architectures like the Adapteva Epiphany for baseband processing. Beneficiaries of this project include not only Chalmers, Ericsson, and Adapteva, but also a growing open-source community of Adapteva technology adopters (including fellow researchers at Halmstad). In addition to the intellectual merits of the project itself, this research has ramifications for sustainable development and for Sweden's telecommunications industry.