# ClassECE6332Fall12Group-Fault-Tolerant Reconfigurable PPA

## Contents |

## The Reconfigurables

##### Kyle Powers

- Email: kep4md@virginia.edu

##### Dar-Eaum Andrew Nam

- Email: dan6ra@virginia.edu

##### Eric Llana

- Email: eal2yf@virginia.edu

## Introduction

### Motivation

Technology scaling results in an increase of process, voltage, and temperature (PVT) variations, aging phenomena, and to worse static (hard) and dynamic (soft) error rates. From this occurrence arises a necessity for robust system and circuit design methodologies. While such techniques have been necessary for some time, and are now widespread for memory design, only now are such techniques becoming necessary for other types of circuits, such as datapath and irregular logic. The key aspects of such a design methodology are redundancy, error detection and correction, and reconfiguration.

Advised by Mircea R. Stan, Professor in the Charles L. Brown ECE Department, this research and design project, done through the capstone course Introduction to VLSI, serves to answer the question, “How can datapath circuits achieve fault-tolerance and reconfigurability with the least redundancy possible?” Current techniques either require massive redundancy, such as triple modular redundancy (TMR) schemes, or are difficult to extend and modify, such as the low-redundancy methods used for memory arrays, as datapath circuits lack the two-dimensional regularity required.

### Approach

The fundamental approach to this research problem is arithmetic coding, which uses redundancy to add error detection and error correction for arithmetic circuits. Unfortunately, due to the coding complexity, redundancy schemes have to be developed specifically for each separate arithmetic structure. The remedy for this is by utilizing a novel set of adders that are all based on a full prefix tree (Bailey, 2012). The full prefix tree subsumes all previously proposed base-2 prefix adders with fanout of 2 by exploiting the design’s inherent redundancy and can be used as the base case for a fault-tolerant, reconfigurable adder design methodology.

## Assignment

### Description

Work in a group of two on a circuit research and design project related to the class concepts. The topic of the project is flexible, but your group must have the topic approved by the professor and TA prior to the first design review – make sure to get this approval as early as possible. Projects may vary in content and level of abstraction (e.g. device-centric vs. circuit-centric vs. architecture-centric), but all projects must include these common elements:

- Design component: the project must include some design. This may be design of new circuits, adaptation of known circuits to different technologies (e.g. predictive technology models, PTMs) or applications, or architectural re-design, for example.

- Research component: the project must involve something new and cannot simply be an implementation of existing techniques. The novelty may be an entirely new circuit, or it may be a comparison of known approaches in a novel way, for example.

- Simulation: every project must include circuit simulations.

Use IEEE Explore to research topic ideas and to get references for your work: http://ieeexplore.ieee.org/

### Graded Elements

The project grade is based on the following five components.

- 1. Project proposal – Turn in a 3-4 page proposal clearly describing the problem that you will address, the approach you plan to take, what you need to design, the novelty (e.g. research component), and the expected outcomes. You must cite relevant references. NB – Your group must meet with the professor and TA to have your idea approved prior to submitting the proposal.

- 2. Design Reviews – Each design review is intended to give you a chance to update the professor and TA on your progress and to get feedback and suggestions. You will turn in hard copies of simulations and/or schematics to show your progress. You will also turn in a 2 page typed description of your group’s progress and of the remaining tasks. The final design review will accompany the Final Report and Final Presentation. You will receive separate assignments related to each design review to give more details. NB – the first design review is before the proposal.

- 3. Final Report – Turn in a 4 page conference-style paper describing your project. A template will be available to show the format.

- 4. Final Presentation – Your group will present your results to the class in a conference-style presentation using slides. Name your presentation as follows: ECE6332_LastNamesOfMembers_FinalPresentation.ppt(x)

- 5. Create and update your group’s page on the wiki to describe your project and your results (ongoing assignment).

## Parallel Prefix Adders

### Description

In a PPA, a prefix operation is constructed that permits the computation of intermediate carries. In conjunction with generate and propagate signals, the prefix operator allows PPAs to obtain an advantageous latency of *O(log2N)* instead of *O(N)* (like in a Ripple Carry adder), where N is the word length. Figure 5 (below) illustrates a tree graph of an 8-bit Kogge-Stone PPA.

### Functionality

The PPA functions properly by successfully going through three stages. The first stage is a pre-computation stage, which generates the generate and propagate signals, and is governed by the equations:

*g*_{i}= G_{i:i}= a_{i}AND b_{i}----- p_{i}= P_{i:i}= a_{i}XOR b_{i}---**(1)**

where a_{i} and b_{i} are inputs bits. Additionally,

*g*_{0}= G_{0:0}= C_{in}----- p_{0}= P_{0:0}= 0 ---**(2)**

The prefix operation serves as the middle stages in this high performance adder topology. A dot operator denotes the prefix operation:

*(G*_{j:mv}P_{j:mv}) = (G_{j:k}P_{j:k}) • (G_{l:mv}P_{l:m}) ---**(3)***G*_{j:m}= G_{j:k}OR (P_{j:k}AND G_{l:m}) ----- P_{j:m}= P_{j:k}AND P_{l:m}---**(4)**

where j ≤ (k, l) ≤ m. The dot operator is associative and idempotent, but not commutative.

The Carry logic is also utilized in the middle stages. By utilizing equations (3) and (4), the prefix tree can determine the carry:

*c*_{i}=G_{i:0}---**(5)**

The last stage in this high performance PPA is the sum computation. The sum is governed by the following equation:

*s*￼￼_{i}= c_{i}XOR p_{i}---**(6)**

### Parallel Prefix Adder Variations

Common designs and their attributes are:

- The Kogge-Stone (Figure 1) which has low logic depth, high node count, and minimal fanout. While a high node count implies a larger area, the low logic depth and minimal fanout allow for faster performance
- The Brent-Kung PPA (Figure 2) could optionally be constructed, which with its minimum number of nodes and maximum logic depth, trades speed for reduced area.
- The Han-Carlson, a hybrid Kogge‐Stone/Brent-Kung PPA, is a compromise between speed and area (Figure 3)
- The Ladner-Fischer, which has minimum logic depth, but has large fan-out requirement up to n/2 (Figure 4)

## Superset Adder

### Errors

"The impetus for this project is the presence of hard errors, or faults, in integrated circuits. These errors occur during fabrication or due to aging. Hard errors mean some permanent defect exists in a transistor or wire. Soft errors, on the other hand, are temporary changes in signal values due to outside intervention. For instance, a cosmic ray can flip a number of data bits inside an SRAM, ruining the values stored. Ways to mitigate errors and provide fault-tolerant circuitry have been around for over 50 years, but these old solutions are frequently still applicable today. Hard faults occur for numerous reasons. What matters more is the fact that they occur and when they occur. Many occur during fabrication and initial testing. Once the bad chips are weeded out, normal chips have an average failure rate over the course of their lifetime. Eventually aging sets in, and transistors start failing more often as problems like NBTI and electromigration arise."[1]

### Reconfiguration

"Reconfiguration in this design comes from using MUXs to select signals. Each prefix operation contains an output MUX which selects if the calculated output should be used or not. If not, it passes the most significant input (ignoring the incoming edge from a different column). There has been lots of work in designing fault-tolerant adders, but not as much in making them reconfigurable. A UVA paper from 2009 introduces a simple reconfiguration scheme using MUXs to select two different outputs from the adder."[1]

### Design Space

"The idea for the design space came by noticing a connection between various radix 2 and fan-out 2 adders (i.e. Kogge-Stone, Han-Carlson, and Brent-Kung). Combining these adders produces a full tree adder, seen to the right. The full tree adder contains the sufficient nodes to produce the Kogge-Stone and different varieties of Han-Carlson and Brent-Kung adders. Additionally, new adder structures fall out of the full tree adder. The ability to turn nodes (prefixes) on and off gives rise to the reconfigurability. However, determining the combination of nodes which give functioning adders can be difficult. "A New Taxonomy for Reconfigurable Prefix Adders" mollifies that difficulty by organizing some "regular" adders into a design space. To find out more, read the paper."[1]

### Design

"The structure begins like a Kogge-Stone. It has the standard setup nodes which calculate initial P and G values. Following those is a binary prefix tree, except each prefix node has a mux afterwards which selects either the most significant P and G values or the output of the prefix calculation. A reverse tree adds redundancy and allows for various configurations. Sum nodes calculate the sum, and an extra prefix stage calculates redundant outputs. XOR gates compare the outputs and give the error signature. See this paper for more information on the design and the results. Also take a look at this project, which used Synopsys tools to synthesize and optimize the superset adder."[1]

## Gate-Level Diagrams in Logisim

### 16-bit Kogge-Stone PPA

### 8-bit Kogge-Stone PPA

### 8-bit Brent-Kung PPA

### 8-bit Han-Carlson PPA

### 8-bit Ladner-Fischer

### 8-bit Superset Adder

### 8-bit Superset Adder with Inversion

## Cadence Designs

### 16-bit Kogge-Stone PPA

### Testbench of a 16-bit Kogge-Stone PPA

### 16-bit Superset Adder

### 16-bti Superset Adder with Inversion

## Simulations

### 16-bit Kogge-Stone PPA

Transient simulation:

### 16-bit Superset Adder

All our simulations, including those for the Superset Adder with inversion, can be found in this zip file: SimGraphs.zip

## Results

Our first set of results, Table 1, demonstrates the trade-off in size when accounting for reconfigurability and fault-tolerance. The Superset Adder was found to be four times as large as an equivalent Kogge-Stone. This increase in area can be attributed to the reverse tree and numerous multiplexers. Naturally, utilizing inversion and power gating only increased area, as additional inverters and transistors were necessary.

For our next set of results, we tested the Superset Adder in a Kogge-Stone configuration by setting one input to 0000 0000 0000 0001 and the other input to 1111 1111 1111 1111; this creates a carry that exercises the critical path. Ignoring our base Kogge-Stone, we found the power gated Superset Adder to be the least power consuming and the Superset Adder “with inversion” to be the fastest.

Table 3 demonstrates a Brent-Kung configuration with the same input stimulation. We found the power gated Superset Adder “with inversion” to be marginally faster than the power gated Superset Adder (non-inversion), while the Superset Adder “with inversion” had the least delay.

Our last set of results is a Han-Carlson configuration with the same input setup. Our results show that the power gated Superset Adder had the least power consumption, while the Superset Adder “with inversion” was the fastest.

## Conclusions

The Superset Adder is a novel solution for reconfigurability and fault-tolerance. What we have presented in this project are ways to improve upon it. The utilization of the inversion property shows a drastic decrease in delay in any given configuration. The use of power gating also provides a large reduction in power consumption. In the case of the Kogge-Stone configuration, the Superset Adder “with inversion” and power gating was 10% faster and 8% more power efficient than its standard Superset Adder counterpart. In the case of the Brent-Kung configuration, it was 9.5% faster and 8.9% less power consuming. When considering the Han-Carlson configuration, it was 5.1% faster and 7.4% more power efficient. Thusly, we propose, that by using inversion and power gating simultaneously, the Superset Adder moves forward in its goal for high performance, low power applications where robustness is a necessity.

The Superset Adder, however, may also benefit from additional modifications. The sizing of the header switch, the PMOS transistor, undoubtedly affects delay. We found a 0.02ns increase in speed in the Han-Carlson configuration when doubling its size from 90nm to 180nm. Using less area-intensive MUX designs, such as those with pass gates or transmission gates, can reduce the costly area of implementing multiplexers at each node. This, in turn, could also decrease delay and power consumption. Moreover, our experiment used the same logic layout for each prefix node; some nodes, however, could be further reduced in area and power by removing the propagate logic (an entire AND gate) because the logic is not inherently necessary. We chose to keep the logic due to ease of implementation and future expandability (to 32-bits for example).

## Project Deliverables

- File:Superset Adder-Paper.pdf - Final paper
- File:Superset Adder-Presentation.pptx - Final presentation (animations included --> view in presentation mode)
- Supserset_Adder.circ - 8-bit Superset Adder and Superset Adder with Inversion in Logisim
- Superset_Adder-Cadence.zip - Zipped folder of Cadence files (includes a README.txt)

## Further Extensions

- Fabricate the Superset Adder and assess for efficiency and functionality
- Have the Superset Adder be a standard library-item in industry-standard software
- Alter/improve the design space by increasing the radix and/or fanout
- Explore the use of Null Convention Logic to eliminate glitches at the expense of area
- Alter/improve the design space by localizing errors and reconfiguring the adder on chip
- Explore reducing the number of multiplexers and/or control signals when extending the superset technique to other arithmetic datapaths (i.e. tree multipliers)

## Timeline

- Issued: 08/30/12
- Group Formation: 09/11/12
- Design Review #1 Due: 10/05/12
- Design Proposal Due: 10/18/12
- Design Review #2 Due: 11/16/12
- Final Presentation Due: 12/10/12
- Final Report Due: 12/12/12
- Final Review (meeting): 12/12/12

## References

[1] https://venividiwiki.ee.virginia.edu/mediawiki/index.php/SupersetAdder

[2] Harris, D. (2003). A taxonomy of parallel prefix networks.* Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on,* 2213-2217 Vol.2.

[3] Knowles, S. (1999). A family of adders. *Computer Arithmetic, 1999. Proceedings of the 14th IEEE Symposium on,* 30-34.

[4] Beaumont-Smith, A., & Lim, C. (2001). Parallel prefix adder design. *Computer Arithmetic, 2001. Proceedings from the 15th IEEE Symposium on,* 218-225.

[5] Rao, W., & Orailoglu, A. (2008). Towards fault tolerant parallel prefix adders in nanoelectronic systems. *Design, Automation and Test in Europe, 2008*, 360-365.