ClassECE6332Fall12Group-Fault-Tolerant Reconfigurable PPA

From UVA ECE & BME wiki
Jump to: navigation, search

Contents

The Reconfigurables

Kyle Powers
Email: kep4md@virginia.edu
Dar-Eaum Andrew Nam
Email: dan6ra@virginia.edu
Eric Llana
Email: eal2yf@virginia.edu

Introduction

Motivation

Technology scaling results in an increase of process, voltage, and temperature (PVT) variations, aging phenomena, and to worse static (hard) and dynamic (soft) error rates. From this occurrence arises a necessity for robust system and circuit design methodologies. While such techniques have been necessary for some time, and are now widespread for memory design, only now are such techniques becoming necessary for other types of circuits, such as datapath and irregular logic. The key aspects of such a design methodology are redundancy, error detection and correction, and reconfiguration.

Advised by Mircea R. Stan, Professor in the Charles L. Brown ECE Department, this research and design project, done through the capstone course Introduction to VLSI, serves to answer the question, “How can datapath circuits achieve fault-tolerance and reconfigurability with the least redundancy possible?” Current techniques either require massive redundancy, such as triple modular redundancy (TMR) schemes, or are difficult to extend and modify, such as the low-redundancy methods used for memory arrays, as datapath circuits lack the two-dimensional regularity required.

Approach

The fundamental approach to this research problem is arithmetic coding, which uses redundancy to add error detection and error correction for arithmetic circuits. Unfortunately, due to the coding complexity, redundancy schemes have to be developed specifically for each separate arithmetic structure. The remedy for this is by utilizing a novel set of adders that are all based on a full prefix tree (Bailey, 2012). The full prefix tree subsumes all previously proposed base-2 prefix adders with fanout of 2 by exploiting the design’s inherent redundancy and can be used as the base case for a fault-tolerant, reconfigurable adder design methodology.

Assignment

Description

Work in a group of two on a circuit research and design project related to the class concepts. The topic of the project is flexible, but your group must have the topic approved by the professor and TA prior to the first design review – make sure to get this approval as early as possible. Projects may vary in content and level of abstraction (e.g. device-centric vs. circuit-centric vs. architecture-centric), but all projects must include these common elements:

Use IEEE Explore to research topic ideas and to get references for your work: http://ieeexplore.ieee.org/

Graded Elements

The project grade is based on the following five components.

1. Project proposal – Turn in a 3-4 page proposal clearly describing the problem that you will address, the approach you plan to take, what you need to design, the novelty (e.g. research component), and the expected outcomes. You must cite relevant references. NB – Your group must meet with the professor and TA to have your idea approved prior to submitting the proposal.
2. Design Reviews – Each design review is intended to give you a chance to update the professor and TA on your progress and to get feedback and suggestions. You will turn in hard copies of simulations and/or schematics to show your progress. You will also turn in a 2 page typed description of your group’s progress and of the remaining tasks. The final design review will accompany the Final Report and Final Presentation. You will receive separate assignments related to each design review to give more details. NB – the first design review is before the proposal.
3. Final Report – Turn in a 4 page conference-style paper describing your project. A template will be available to show the format.
4. Final Presentation – Your group will present your results to the class in a conference-style presentation using slides. Name your presentation as follows: ECE6332_LastNamesOfMembers_FinalPresentation.ppt(x)
5. Create and update your group’s page on the wiki to describe your project and your results (ongoing assignment).

Parallel Prefix Adders

16-bit Kogge-Stone tree graph
Figure 1 - 16-bit Kogge-Stone tree graph
16-bit Brent-Kung tree graph
Figure 2 - 16-bit Brent-Kung tree graph
16-bit Han-Carlson tree graph
Figure 3 - 16-bit Han-Carlson tree graph
16-bit Ladner-Fischer tree graph
Figure 4 - 16-bit Ladner-Fischer tree graph

Description

In a PPA, a prefix operation is constructed that permits the computation of intermediate carries. In conjunction with generate and propagate signals, the prefix operator allows PPAs to obtain an advantageous latency of O(log2N) instead of O(N) (like in a Ripple Carry adder), where N is the word length. Figure 5 (below) illustrates a tree graph of an 8-bit Kogge-Stone PPA.

Functionality

The PPA functions properly by successfully going through three stages. The first stage is a pre-computation stage, which generates the generate and propagate signals, and is governed by the equations:

gi = Gi:i = ai AND bi ----- pi = Pi:i = ai XOR bi ---(1)

where ai and bi are inputs bits. Additionally,

g0 = G0:0 = Cin ----- p0 = P0:0 = 0 ---(2)

The prefix operation serves as the middle stages in this high performance adder topology. A dot operator denotes the prefix operation:

(Gj:mv Pj:mv) = (Gj:k Pj:k) • (Gl:mv Pl:m) ---(3)
Gj:m = Gj:k OR (Pj:k AND Gl:m) ----- Pj:m = Pj:k AND Pl:m ---(4)

where j ≤ (k, l) ≤ m. The dot operator is associative and idempotent, but not commutative.
The Carry logic is also utilized in the middle stages. By utilizing equations (3) and (4), the prefix tree can determine the carry:

ci =Gi:0 ---(5)

The last stage in this high performance PPA is the sum computation. The sum is governed by the following equation:

si = ci XOR pi ---(6)

Parallel Prefix Adder Variations

Common designs and their attributes are:

8-bit Kogge-Stone PPA tree graph
Figure 5 - 8-bit Kogge-Stone PPA tree graph

Superset Adder

Errors

"The impetus for this project is the presence of hard errors, or faults, in integrated circuits. These errors occur during fabrication or due to aging. Hard errors mean some permanent defect exists in a transistor or wire. Soft errors, on the other hand, are temporary changes in signal values due to outside intervention. For instance, a cosmic ray can flip a number of data bits inside an SRAM, ruining the values stored. Ways to mitigate errors and provide fault-tolerant circuitry have been around for over 50 years, but these old solutions are frequently still applicable today. Hard faults occur for numerous reasons. What matters more is the fact that they occur and when they occur. Many occur during fabrication and initial testing. Once the bad chips are weeded out, normal chips have an average failure rate over the course of their lifetime. Eventually aging sets in, and transistors start failing more often as problems like NBTI and electromigration arise."[1]

Reconfiguration

"Reconfiguration in this design comes from using MUXs to select signals. Each prefix operation contains an output MUX which selects if the calculated output should be used or not. If not, it passes the most significant input (ignoring the incoming edge from a different column). There has been lots of work in designing fault-tolerant adders, but not as much in making them reconfigurable. A UVA paper from 2009 introduces a simple reconfiguration scheme using MUXs to select two different outputs from the adder."[1]

Design Space

An 8-bit full tree graph

"The idea for the design space came by noticing a connection between various radix 2 and fan-out 2 adders (i.e. Kogge-Stone, Han-Carlson, and Brent-Kung). Combining these adders produces a full tree adder, seen to the right. The full tree adder contains the sufficient nodes to produce the Kogge-Stone and different varieties of Han-Carlson and Brent-Kung adders. Additionally, new adder structures fall out of the full tree adder. The ability to turn nodes (prefixes) on and off gives rise to the reconfigurability. However, determining the combination of nodes which give functioning adders can be difficult. "A New Taxonomy for Reconfigurable Prefix Adders" mollifies that difficulty by organizing some "regular" adders into a design space. To find out more, read the paper."[1]

Design

"The structure begins like a Kogge-Stone. It has the standard setup nodes which calculate initial P and G values. Following those is a binary prefix tree, except each prefix node has a mux afterwards which selects either the most significant P and G values or the output of the prefix calculation. A reverse tree adds redundancy and allows for various configurations. Sum nodes calculate the sum, and an extra prefix stage calculates redundant outputs. XOR gates compare the outputs and give the error signature. See this paper for more information on the design and the results. Also take a look at this project, which used Synopsys tools to synthesize and optimize the superset adder."[1]

Gate-Level Diagrams in Logisim

16-bit Kogge-Stone PPA

Gate-Level Diagram of a 16-bit Kogge-Stone Parallel Prefix Adder

8-bit Kogge-Stone PPA

Gate-Level Diagram of an 8-bit Kogge-Stone Parallel Prefix Adder

8-bit Brent-Kung PPA

Gate-Level Diagram of an 8-bit Brent-Kung Parallel Prefix Adder

8-bit Han-Carlson PPA

Gate-Level Diagram of an 8-bit Han-Carlson Parallel Prefix Adder

8-bit Ladner-Fischer

Gate-Level Diagram of an 8-bit Ladner-Fischer Parallel Prefix Adder

8-bit Superset Adder

Gate-Level Diagram of an 8-bit Superset Adder

8-bit Superset Adder with Inversion

Gate-Level Diagram of an 8-bit Superset Adder with Inversion

Cadence Designs

16-bit Kogge-Stone PPA

16-bit Kogge-Stone PPA in Cadence

Testbench of a 16-bit Kogge-Stone PPA

Testbench of a 16-bit Kogge-Stone PPA in Cadence

16-bit Superset Adder

16-bit Superset Adder in Cadence

Larger view: Superset_Adder.png

16-bti Superset Adder with Inversion

16-bit Superset Adder with Inversion in Cadence

Larger view: Superset_Adder_with_Inversion.png

Simulations

16-bit Kogge-Stone PPA

Transient simulation:

16-bit Kogge-Stone Parallel Prefix Adder (Cin and A[0:15])
Cin = 0, A[0:15] = 1
16-bit Kogge-Stone Parallel Prefix Adder (B[0:15])
B[0:15] = 1
16-bit Kogge-Stone Parallel Prefix Adder (Sum[0:15] and Cout)
Sum[0:15] = 1, Cout = 1

16-bit Superset Adder

All our simulations, including those for the Superset Adder with inversion, can be found in this zip file: SimGraphs.zip

Results

Our first set of results, Table 1, demonstrates the trade-off in size when accounting for reconfigurability and fault-tolerance. The Superset Adder was found to be four times as large as an equivalent Kogge-Stone. This increase in area can be attributed to the reverse tree and numerous multiplexers. Naturally, utilizing inversion and power gating only increased area, as additional inverters and transistors were necessary.

Table 1. Comparison of area in micrometers

For our next set of results, we tested the Superset Adder in a Kogge-Stone configuration by setting one input to 0000 0000 0000 0001 and the other input to 1111 1111 1111 1111; this creates a carry that exercises the critical path. Ignoring our base Kogge-Stone, we found the power gated Superset Adder to be the least power consuming and the Superset Adder “with inversion” to be the fastest.

Table 2. A Kogge-Stone comparison of delay and power

Table 3 demonstrates a Brent-Kung configuration with the same input stimulation. We found the power gated Superset Adder “with inversion” to be marginally faster than the power gated Superset Adder (non-inversion), while the Superset Adder “with inversion” had the least delay.

Table 3. A Brent-Kung comparison of delay and power

Our last set of results is a Han-Carlson configuration with the same input setup. Our results show that the power gated Superset Adder had the least power consumption, while the Superset Adder “with inversion” was the fastest.

Table 4 A Han-Carlson comparison of delay and power

Conclusions

A power gated prefix node
A power gated prefix node

The Superset Adder is a novel solution for reconfigurability and fault-tolerance. What we have presented in this project are ways to improve upon it. The utilization of the inversion property shows a drastic decrease in delay in any given configuration. The use of power gating also provides a large reduction in power consumption. In the case of the Kogge-Stone configuration, the Superset Adder “with inversion” and power gating was 10% faster and 8% more power efficient than its standard Superset Adder counterpart. In the case of the Brent-Kung configuration, it was 9.5% faster and 8.9% less power consuming. When considering the Han-Carlson configuration, it was 5.1% faster and 7.4% more power efficient. Thusly, we propose, that by using inversion and power gating simultaneously, the Superset Adder moves forward in its goal for high performance, low power applications where robustness is a necessity.

The Superset Adder, however, may also benefit from additional modifications. The sizing of the header switch, the PMOS transistor, undoubtedly affects delay. We found a 0.02ns increase in speed in the Han-Carlson configuration when doubling its size from 90nm to 180nm. Using less area-intensive MUX designs, such as those with pass gates or transmission gates, can reduce the costly area of implementing multiplexers at each node. This, in turn, could also decrease delay and power consumption. Moreover, our experiment used the same logic layout for each prefix node; some nodes, however, could be further reduced in area and power by removing the propagate logic (an entire AND gate) because the logic is not inherently necessary. We chose to keep the logic due to ease of implementation and future expandability (to 32-bits for example).

Project Deliverables

Further Extensions

Timeline

References

[1] https://venividiwiki.ee.virginia.edu/mediawiki/index.php/SupersetAdder
[2] Harris, D. (2003). A taxonomy of parallel prefix networks. Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on, 2213-2217 Vol.2.
[3] Knowles, S. (1999). A family of adders. Computer Arithmetic, 1999. Proceedings of the 14th IEEE Symposium on, 30-34.
[4] Beaumont-Smith, A., & Lim, C. (2001). Parallel prefix adder design. Computer Arithmetic, 2001. Proceedings from the 15th IEEE Symposium on, 218-225.
[5] Rao, W., & Orailoglu, A. (2008). Towards fault tolerant parallel prefix adders in nanoelectronic systems. Design, Automation and Test in Europe, 2008, 360-365.

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox