Applying active accelerated self-healing techniques on Parallella platform

From UVA ECE & BME wiki


This is a subproject for the CLASH project, which is funded by NSF and SRC.

Group Members

Motivation and Goals

Parallella Introduction

Vertical View

Parallella-board-22-609x400.jpg


Architecture

Processor

Memory

Network-On-Chip

The Epiphany Network-on-Chip (eMesh) is a 2D mesh network that handles all on-chip and off-chip communication. The eMesh network uses atomic 32-bit memory transactions and operates without the need for any special programming. The network consists of three separate and orthogonal mesh structures, each serving a different type of transaction traffic: one network for on-chip write traffic, one for off-chip write traffic, and one for all read traffic [7].

Problems suited for the Epiphany

[4] points out that tasks should be parallel, since the main feature of the Epiphany is its parallel nature. Because the architecture is MIMD, however, a task need not be purely data-parallel, though it may well be. Another factor is memory: the processor has so little memory that the task should not require large amounts of data. Finally, data transfer between the main processor of the Parallella and the Epiphany is not very fast. The Epiphany is therefore best suited to generating data or to processing a small amount of data many times: transfer a small amount of data to the Epiphany, work on it for many iterations, and finally transfer the result back to the main processor.

How to get access to the Parallella chip?

Windows Version


Image04.png



Image05.png




Image00.png



Mac Version


Image02.png

System Integration

FPGA/ASIC Interfacing

The E16G301 can be directly interfaced to an FPGA or ASIC by instantiating the eLink interface provided by Adapteva. The eLink interface block is used to convert the high speed serial link I/O interface to a lower speed parallel interface. To the system, the eLink interface looks like a simple memory mapped interface[4].

Device Package

The E16G301 uses a 324-ball, 0.8 mm pitch wire-bond BGA package that measures 15 mm x 15 mm.

Chip-To-Chip Link Interface

The E16G301 has four identical source-synchronous bidirectional off-chip LVDS links (eLink) that can be used to connect the E16G301 to other E16G301 chips, FPGAs, and/or ASICs. Interfacing the E16G301 with an FPGA should be done by instantiating the open-source eLink HDL code provided by Adapteva [4].

Accelerated Self-healing

Motivation

Reliability Challenges

Proposed Solution

Fixing Wearout through Accelerated Self-Healing

Experimental Setup

Expected Deliverables

A core allocation solution that fixes wearout at the architecture/system level by taking advantage of accelerated recovery while keeping power, performance, and area optimal; or a scheduling solution that enables proactive scheduling based on application behavior.

Metric Improvements

Dissertation Research Flow

Experimental Validation

Research Question

Implementation & Evaluation

Research Question

Parallella Issues

Figure out how to stress cores under high voltage or temperature. Can we control the core voltage for each core? How does the power network work?

No. We can only control the voltage for the whole core mesh. This is from the document [6].

Do we have control over the clock? Can we overclock or slow down the clock?

The clock comes from the FPGA I/O, so controlling the clock requires programming the FPGA.

Figure out the metrics for capturing wearout issues. (For example, performance? or failures?)

Failures

How to program the core so that it gives the metrics we need, and what is the best program for this?

How to enable or disable each core separately? Is it possible? [7]

A core enters a standby state when it executes the IDLE instruction. During the standby state, core clocks are disabled and the power consumption is minimized. Applications that need minimal power consumption should use the IDLE instruction to put the core in a standby state and use interrupts to activate the core when needed.

Are there voltage regulators on chip? On board?

There is no voltage regulator on chip, but there are two on board. This is from [8].

How to enable the Epiphany low power mode?

This is answered on the Parallella forum; see [2].

Figure out what the IDLE instruction means.

How to detect hardware errors via software?

Looking into papers:

1. [3] 2. [4]

What is the clock tree distribution?

Some possible info:

in document: http://www.adapteva.com/docs/epiphany_arch_ref.pdf

Page 19: Epiphany Shared Memory Map

Page 30: Local Memory

Chapter 5: eMesh Network-On-Chip

Chapter 7.5: Status Flags: Active


in document: http://www.adapteva.com/docs/e16g301_datasheet.pdf

Page 14: 3.3 Reset and Clock, table 5

On Power Saving

On how to measure power

Other

Performance Analysis Tools [8]

Look into other multicore architectures and how they control power.

Problem 1

How can a single core in the 16-core chip be controlled?

Idea 1: Reset the whole system at the beginning of the program.

Shortcoming: it is not certain that the cores are then in a purely idle state; more steps are needed to check.

Idea 2: Call the Parallella library functions.

Shortcoming: there is no library function that sets a single core.

Idea 3: Check each time before running any program.

Shortcoming: even if a core is found to have no work, it is still not certain that the core is purely idle.

Choice?

We choose option 2.

Reason

All cores can be set at the beginning, so it does not matter that a single core cannot be set to idle on its own.

Introduction

The host program appears to load e_filename.srec into the core at the given row and col of the device referenced by &dev, using the e_load or e_reset functions.

Code

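#include <stdlib.h>
#include <e-hal.h>

int main(void)
{
    // Declared here as in the example; used later in the full program.
    unsigned row, col, coreid, i, j;
    e_platform_t platform;
    e_epiphany_t dev;
    e_mem_t mbuf;
    int rc;

    srand(1);

    // Turn off loader and host diagnostic output.
    e_set_loader_verbosity(H_D0);
    e_set_host_verbosity(H_D0);

    // Initialize the system and read platform parameters from the
    // default HDF. Then reset the platform and get the actual
    // system parameters.
    e_init(NULL);
    e_reset_system();
    e_get_platform_info(&platform);

    // Release the platform before exiting.
    e_finalize();
    return 0;
}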

Details: https://github.com/johnjp15/parallella-helloworld

Implementation

Implementation is based on the examples provided by Adapteva.

Host side:

Initializes the operand matrices and transfers them to the shared memory. When the device signals completion of execution, the host reads the result matrix from shared memory.

Device side:

Further details of implementation can be found in: http://arxiv.org/abs/1410.8772

Tested on the Epiphany-IV evaluation module

Building

Single-core version

Configure the parameters accordingly in src/defs.h and run:

$ make single

Multi-core version

Configure the parameters accordingly in src/defs_multi.h and run:

$ make multi

Usage

Single-core version

$ ./run.sh

Multi-core version

$ ./run_multi.sh

Result matrix will be written to output/

Detailed Explanations

results

Specific cores are selected by their core ID, and each selected core prints its ID along with "hello world".
Parallella hello world.png

Problem 2

What is the lowest input voltage at which the chip runs without errors?

Approach to the problem

What is Surface-mount technology?

Surface-mount technology (SMT) is a method for producing electronic circuits in which the components are mounted or placed directly onto the surface of printed circuit boards (PCBs). An electronic device so made is called a surface-mount device (SMD).

Problem 3

Does the location of the cores affect the performance?

example

Suppose two cores are set idle. The first option is (A12 B20) and (A30 B00); the second option is (A01 B12) and (A32 B22). Will there be a difference between the two options? We expect so, because of thermal issues.

Problem 4

What code should be run as the power-consuming workload?

In order to get an overall result, we need a loop, and arithmetic code is a good choice.

example

For example, a loop of additions or subtractions.

Problem 5

How can tasks be assigned more efficiently?

References

Parallella Resources

Papers using Parallella as a Platform[9][10]

Papers on Hardware Error Detection

Wearout and Accelerated Self-healing Papers

Useful links
