# ClassECE6332Fall15GroupMultipler

## Contents |

# Comparison of Different Architectural Design Methods for Multiplier

## Group Members

Weizhong Huang(wh6ma@virginia.edu)

Zekun Cai (zc4hs@virginia.edu)

Siyi Shen (ss3dd@virginia.edu)

## Abstract

A multiplier is one of the key hardware blocks in most digital signal processing systems. This project is a study of a relative performance comparison of various 4-bits multiplier designs of Array, Wallace, Dadda, and Booth Encoding multipliers in Cadence. We compare their performance from looking into the trade-offs between parameters like the number of transistors, gate propagation delays and power dissipation in different types of multiplier structures. With the advancement in technology, various studies have been conducted to produce high speed, low power consumption and less area or a combination of the performance in one multiplier. Hence, with these performance characteristics, we are able to find out the various high speeds, low power and compact VLSI implementation of the multipliers designed.

## Introduction

- Multiplication has become a widely used arithmetic operation in signal processing and scientific applications. The main concern in classic multiplication, often realized by K cycles of shifting and adding, is to speed up the underlying multi-operand addition of partial products.

## Different Multiplier Architecture

### I. Array Multiplier

- The Array algorithm uses a simple addition and shifting method which comprises of AND gates and adders. Each of the partial products of the array multiplier is generated by multiplying the multiplicand with the multiplier. The partial products computed are shifted according to the bit orders and addition is being done after that. When array multiplication is applied, we need to add as many numbers of partial products as there are in the multiplier bits.

### II. Wallace Tree Multiplier

- Wallace multiplier exercises an efficient parallel multiplication algorithm. The main advantage of the Wallace tree is that there is only a reduction stage delay of the order O(log n) and each layer has O(1) propagation delay. The logarithm of O(log n) only present in the reduction layers and each layer has O(1) propagation delay. The partial products are made up of O(1) while the final addition is O(log n). So, the multiplication only consists of O(log n) and this is not any slower than addition being computed.

### III. Dadda Tree Multiplier

- Dadda Tree Design An important characteristic of Dadda Tree is that Dadda algorithm performs minimum reduction necessary at each level. There are three stages in which Dadda Tree works. First is the formation of partial product matrix. Then this matrix is reduced to two rows. At last, these two rows are added using carry full adder. The difference between Wallace Tree and Dadda Tree is that Dadda Tree doesn't add all partial products, but instead shift the partial products to get a triangle and then get two rows using half adders or carry full adders.

### IV. Vedic Multiplier

- The Vedic logic used in this project is known as “Urdhva Triyakbhyam Sutra”. It means "Vertically and Crosswise". The figure below shows how to multiply two 2-bit numbers using "Urdhva Triyakbhyam Sutra". Based on this logic, two bit multiplication is equivalent to a “and” gate. To add two bits, a half adder is needed and for this logic we need two half adders. So the logic circuit schematic can be obtained as follows.

- The principles of 4 bit multiplication are illustrated in the figure below. Using 4 such 2x2 Vedic multipliers and 3 adders we can built 4x4 bit multipliers as shown in the design. And 8x8 Vedic multipliers can be realized with existing 4x4 multipliers.

### V. Booth-Encoding Radix-2 Array Multiplier

The modified Booth recoding algorithm allows for the reduction of the number of partial products to be compressed in a carry save adder tree. Thus the compression speed can be enhanced. This is a standard technique used in chip design and provides significant improvements over the long multiplication technique.[3] The encoders of Radix-2 and Radix-4 Booth recoding are shown on the tables below. Radix-2 is used in our 4x4 Array Booth Multiplier design. The block diagram is shown below.

### VI. Baugh-Wooley Array Multiplier

The two’s complement number system represents positive as well as negative numbers. They have the following format:

The range of an n-bit 2’s complement number is {-2^(n-1) , 2^(n-1)-1} Squaring two 2’s complement numbers: If two 2’s complement numbers were multiplied together, the resulting multiplication would result in 4 terms. Let us consider the case of squaring an n-bit 2’s complement number. We get:

The diagram shown below is the algorithm of Baugh Wooley Logic Design.

## Trade-offs

In different applications, one may need to use different kinds of multiplier architectures that adapts to some specific scenarios. Therefore, it is necessary to discuss the trade-offs of multipliers, from the comparison of basic algorithms, to structure implementations such as the number of transistors, area, propagation delay, power dissipation, etc.

**Wallace Tree Multiplier**

In this architecture, after getting the partial product matrix, one may combine partial products as early as possible to compress the matrix to a ultimate matrix with 2 rows. The result is to minimize the overall delay by making the final CPA as short as possible, thus leading to fastest possible design. The number of logic levels required to perform the summation can be reduced. But the disadvantage of Wallace tree is the irregular structure and wires.

**Dadda Tree Multiplier**

Compared with Wallace algorithm, Dadda algorithm was designed to minimize the reduction performed at each level. One may combine partial product bits as late as possible, while keeping the critical path length of the tree minimal. Hence, this algorithm leads to a simpler CSA tree structure but much wider CPA at the end.

**Array Multiplier**

One important advantage of an array multiplier is the regular structure. Since it is regular, it is easy to layout and has a small size. The design time of an array multiplier is much less than that of a tree multiplier.[9] Another advantage is its ease of design for pipelined structure, which largely increases the efficiency of the multiplier. But the main disadvantage is the worst delay of the multiplier. The speed will be very slow for a high bit array multiplier.

**Vedic Multiplier**

Vedic mathematics used in a multiplier reduces the complexity in calculations that exist in a conventional multiplier. Due to its regular structure, it can be easily layout in a silicon chip. The Multiplier has the advantage that as the number of bits increases, gate delay and area increases very slowly as compared to other multipliers.[6] However, one main drawback is that the system will be extremely large because of the algorithm.

**Baugh Wooley Multiplier**

Baugh Wooley Multiplier is another efficient technique to handle signed multiplication. It maximizes the regularity of the multiplier logic and allow all the partial products to have positive signed bits.[9] The main disadvantage is still the complex structure coming from operations of 2's complement numbers.

**Booth Encoding Multiplier**

Instead of directly adding operands on partial product matrix, Booth encoding techniques firstly performs the recoding that defines the operations to be performed on the multiplicand. The advantage is that it generates only a halve of the partial products compared to other multiplier implementations which do not use Booth Encoding.[10] However, this benefit achieves at the expense of increased complexity. The main difficulties come from the use of 2's complement representation which is necessary to compute negative partial products.[10]

## Cadence Schematics

### Basic Formed Units Schematics

### 4 x 4 Binary Multiplier Schematics

[1] 4 x 4 Array Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/f/f4/4_array.png

[2] 4-bit Dadda Tree Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/2/25/4_dadda.png

[3] 4-bit Wallace Tree Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/9/9b/4_wallace.png

[4] 4-bit Baugh Wooley Array Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/1/16/4_wooley.png

[5] 4 x 4 Booth Encoding Radix-2 Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/3/37/4_booth.png

[6] 4 x 4 Vedic Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/d/db/4_vedic.png

### 8 x 8 Binary Multiplier Schematics

[1] 8-bit Wallace Tree Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/d/da/8_wallace.png

[2] 8-bit Dadda tree multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/b/b3/8_dadda.png

[3] 8 x 8 Array Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/8/82/8_array.png

[4] 8 x 8 Vedic Multiplier

http://venividiwiki.ee.virginia.edu/mediawiki/images/8/82/8_vedic.png

## Simulation Results

### I. 4x4 Binary Multiplier Architecture

#### 1) Propagation Delay

To compare the propagation delay of different multiplier implementations, we use two sets of numbers for each type of multiplier to multiply. For simulations of 4-bit multipliers, X=1111, Y=1111 is the one set of number while X=1010, Y=1010 is the other. For simulations of 8-bit multipliers, X=11111111, Y=11111111 is the one set of number while X=10101010, Y=10101010 is the other one. We both set one bit input of the multiplicand, Y3 for 4-bit tests and Y7 for 8-bit tests, as a pulse with a period of 3ns and pulse width 1ns, while keeping others constant all the time. As a result, we get specific outputs as the response to input pulse, which show the propagation delay if we measure the time difference between 50% pulse edges of input and output.

Measured data is listed in the tables below. According to the measurement, we pick out the maximum delay of each kind of architecture and plot it as a histogram and a line chart for comparison.

Firstly, we compared 4 kinds of 4x4 multipliers, 4x4 Wallace multiplier,4x4 Dadda multiplier,4x4 Vedic multiplier and 4x4 Bit Array multiplier. According to the line chart, with the same operands, the Dadda Multiplier has smaller propagation delay than 3 other multipliers. And the delay of Wallace multiplier is a little bit higher than that of Dadda Multiplier. The delay of Array Multiplier is the highest. This result is reasonable because tree structures get short overall delay but have irregular structure while array architectures get higher overall delay in order to get regular structure. Besides, the Dadda multiplier was designed to minimize the reduction performed at each level while compressing the partial product matrix. As a result, the CSA tree structure inside is simpler while a wider CPA appears at the end. On the contrary, the Wallace multiplier has a short enough CPA but much wider CSA structure. So the delay of these two structures are close. As for the Vedic multiplier, its advantage is that the delay increases slowly with higher bits. Thus, the delay is not obvious with 4-bit multiplication.

Secondly, among array structures, we compared 3 types of 4x4 multipliers, 4x4 array multiplier, 4x4 Radix-2 Booth array multiplier and 4x4 Baugh Wooley array multiplier. As shown in the histogram, the delay of Baugh Wooley Multiplier is the lowest among these three structures and the basic array multiplier is the highest. It is not surprising to get that since Baugh Wooley and Booth Encoding are two algorithms proposed to improve array structures. Both two implementations reduce the number of partial products produced in matrix and hence get better performance.

#### 2) Transistor Number Estimation

Based on the schematics we use in Cadence, we can count out the total number of transistors of different types of multiplier implementations. In general, the number of transistors can be used to estimate the areas of multiplier architectures. But it is not an accurate way to estimate areas unless the schematics are constructed in layout. For example, the regularity of units can greatly save space and increase the integration.

In the histogram of estimation of the transistor number of 4x4 multipliers, Baugh Wooley multiplier and Radix-2 Booth multiplier are the top two on transistor numbers, with the Vedic multiplier and other basic multipliers following. This can be explained by the complexity of 2's complement binary computation used in two structures. Then the Vedic multiplier is the another complex one because of the very steps that Vedic mathematics has.

#### 3) Power consumption

Power consumption is an important index to characterize different multipliers. In this project, we use vdc parts in Cadence to connect every source supply port of multipliers with vdd. In this case, we got the total transient current response through the multiplier while providing the pulse to one input and keeping others constant. Then we took average value of the transient current over one period and multiply it with vdd. Hence, Power consumption of multipliers can be estimated. The simulation results are shown in the tables below and an example of transient current response of 4 bit Baugh Wooley Multiplier is posted.

As we can see in the example of 4 bit Baugh Wooley Multiplier, the static leakage current is trivial enough while the dynamic current in the transition is pretty high. So we took average current over one period then estimated the power consumption.

As for the comparison, 4x4 Booth Encoding Multiplier and 4x4 Baugh Wooth Multiplier have the highest Power consumptions, which may be caused by the complex computation of 2's complement binary representations. Interestingly, 4x4 Vedic Multiplier has relatively more transistors than Wallace, Dadda, Array, But the least power consumption among all the multipliers implemented. In other words, although making the system much more transistors and area, Vedic multiplier has low power consumption, which may show better performance while computing high bit multiplication. Wallace, Dadda, Array have similar power consumption, which can fluctuate in different scenarios.
**Transient Current of 4 Bit Baugh Wooley Multiplier over one pulse period.**
**Overall Transient Current of 4 Bit Baugh Wooley Multiplier**

### II. 8x8 Binary Multiplier Architecture

#### 1) Propagation Delay

Similarily, we use two sets of numbers for each type of multiplier to multiply for simulations of 8-bit multipliers, X=11111111, Y=11111111 is the one set of number while X=10101010, Y=10101010 is the other one. We set one bit input of the multiplicand, denoted as Y7, the highest bit of Y, as a pulse with a period of 3ns and pulse width 1ns, while keeping others constant all the time.

Likewise, we collected the measured propagation delay and plot the maximum delay of each structure as a histogram.

The result is that, the propagation delay of Array multiplier is still the highest. Interestingly, the delay of 8x8 Vedic multiplier is the lowest even compared with 8x8 Wallace and Dadda Multiplier. On the other hand, for 8x8 multipliers, the delay of the Dadda multiplier appears to be higher than that of Wallace multiplier, which is different with the results of 4x4 multipliers.

Firstly, this result proves that although having regular and straightforward structure, the array multiplier gets extremely high overall propagation delay. Secondly, this result is corresponding to the fact that the delay of Vedic multiplier increases slower compared with other conventional multipliers. For 4x4 multipliers, the delay of Vedic multiplier is much higher than that of 4x4 Wallace and Dadda multiplier, slightly smaller than 4x4 array multiplier. But for 8x8 multipliers, its delay is the lowest and increases much slower, from about 217 to 306 ps, than others.

One important thing that needs mentioning is the instant leakage pulse appearing in the pulse transition edge. The figure below is two examples from the measurement of 8-bit Wallace tree multiplier and 8-bit Vedic Multiplier.

**Wallace Multiplier P7 responding to Y7 **

**Wallace Multiplier P15 responding to Y7 **

**Vedic Multiplier responding to Y7 **

**Vedic Multiplier responding to Y7**

#### 2) Transistor Number Estimation

Likewise, we counted the transistor number of 8x8 multipliers. The result is that, the total number of transistors of the Vedic multiplier is the most, almost twice than 3 other structures. This proves that although showing great properties that the delay increases slowly in higher-bits multiplication, the Vedic multiplier comes to that at the expense of the number of transistors or area, which means that systems will be complex and large in higher bit multiplication.

#### 3) Power consumption

Likewise, we stillo use vdc parts to connect every source supply port of multipliers with vdd in the simulation of 8x8 multipliers. Then we got the total transient current response through the multiplier while providing the pulse to one input and keeping others constant. And we took average value of the transient current over one period and got the power estimation. The simulation results for 8x8 multipliers are shown in the tables below and An example of transient current response is posted.

Similarily, as shown in the figure, the static current is low and the dynamic power consumption is extremely high, which can cause big problem in applications. So that is one of the future works we need to do.

Based on the simulation of current, 8x8 Dadda Multiplier has the highest power consumption, followed by Wallace, Array, Vedic multiplier. Again, the array multiplier shows great properties of structure regularity. Although having the disadvantage that delay is extremely higher than others, array multiplier has relatively low power consumption because of its regularity. At last, the 8x8 Vedic multiplier shows the same condition as 4 bit multiplication. It has the lowest power consumption in spite of twice in transistor number than others.

Transient Current of 8 bit Wallace Multiplier

## Schedule

Week | Tasks | Notes | Complete? |
---|---|---|---|

4 x 4 bit binary multiplier schematic designs with Dadda, Wallace, Array, Booth encoding methods,etc in Cadence. And give the basic results. | Done | ||

8 x 8 bit binary multiplier schematic designs with Dadda, Wallace, Array methods,etc in Cadence. And give the basic results. | Done | ||

Finish all the 4 x 4 schematics and simulation results. And Summarize all the results and make a conclusion. | Done | ||

Finish all the 8 x 8 schematics and simulation results. And Summarize all the results and make a conclusion. | Done | ||

Final report; Final presentation |

## Future Works

The Baugh-Wooley multiplication algorithm and the Booth multiplication algorithm are two efficient ways to handle the sign bits. As we could see in the early sections, 5 by 5 Baugh-Wooley Array Multiplier and 5 by 5 Booth Radix-2 Array Multiplier could handle positive signed 5 bit 2's compliment numbers multiplication as the unsigned 4 by 4 simple array multiplier. In the future, we would like to explorer designs for 8 by 8 Baugh-Wooley Array Multiplier and 8 by 8 Booth Radix-2 Array Multiplier. Below are block diagram for those two designs:

## Files

Please download project files Project.tar.gz

## Reference

[1] Abraham, S.; Kaur, S.; Singh, S., "Study of various high speed multipliers," in Computer Communication and Informatics (ICCCI), 2015 International Conference on , vol., no., pp.1-5, 8-10 Jan. 2015

This paper describes four multipliers that is Modified Booth Multiplier, Vedic Multiplier (Urdhva Tiryakbhyam Sutra), Wallace Multiplier and Dadda Multiplier. The performance of many computational problems are often dominated by the speed at which a multiplication operation can be executed. various algorithms have been developed in order to perform multiplication faster and also to reduce the size of the multiplier. Various multipliers like Modified Booth multiplier, Vedic multiplier (Urdhva Tiryakbyham Sutra), Wallace multiplier and Dadda multipliers are discussed here. All these multipliers perform the task of multiplication correctly but their algorithms are different.

[2] Swee, K.L.S.; Lo Hai Hiung, "Performance comparison review of 32-bit multiplier designs," in Intelligent and Advanced Systems (ICIAS), 2012 4th International Conference on , vol.2, no., pp.836-841, 12-14 June 2012

This is a study of a relative performance comparison of various 32-bits multiplier designs of Array, Wallace, Dadda, Reduced Area and Radix-4 Booth Encoding multipliers in the Area-Optimized, Speed-Optimized and Auto-Optimized synthesis modes in Leonardo Spectrum. Multipliers are very important in Digital Signal Processing applications and microprocessor designs. With the advancement in technology, various studies have been conducted to produce high speed, low power consumption and less area or a combination of the performance in one multiplier. Hence, with these performance characteristics, we are able to find out the various high speeds, low power and compact VLSI implementation of the multipliers designed.

[3] Kelly Liew Suet Swee; Lo Hai Hiung; "Performance Comparison Review of Radix-Based Multiplier Designs", in Intelligent and Advanced Systems (ICIAS), 2012 4th International Conference on, vol.2, no., pp.854-859, 12-14 June 2012

In this paper, we would like to determine the type of the 32- bit multiplier designs that will produce the fastest speed and has the least area constraints. Not only that, we are also able to determine the most suitable multiplier for the type of Digital Signal Processing applications used based on the findings obtained. A performance comparison was done among Array, Wallace, Dadda, Reduced Area and Radix-4 Booth Encoding multiplier.

[4] T. Arunachalam and S. Kirubaveni, "Analysis of High Speed Multipliers", International conference on Communication and Signal Processing, April 3-5, 2013, India

[5] Devika Jaina, Kabiraj Sethi and Rutuparna Panda, "Vedic Mathematics Based Multiply Accumulate Unit", Computational Intelligence and Communication Networks (CICN), 2011 International Conference on, Pages: 754 - 757.

[6] Mohammed Hasmat Ali, Anil Kumar Sahani, "Study, Implementation and Comparison of Different Multipliers based on Array, KCM and Vedic Mathematics Using EDA Tools", International Journal of Scientific and Research Publications, Volume 3, Issue 6, June 2013. ISSN 2250-3153

[7] Sumit R. Vaidya and D. R. Dandekar "Performance Comparison of Multipliers for Power-Speed Trade-off in VLSI Design" http://www.wseas.us/e-library/conferences/2010/Cambridge/ICNVS/ICNVS-44.pdf

[8] Design and implementation of 16 Bit Vedic Arithmetic Unit (From: http://verilog-code.blogspot.com/2014/01/design-and-implementation-of-16-bit.html)

[9] http://dspace.unimap.edu.my/dspace/bitstream/123456789/1934/4/Literature%20review.pdf

[10] David Villeger and Vojin G. Oklobdzija "Evaluation of Booth Encoding Techniques For Parallel Multiplier Implementation" http://www.acsel-lab.com/Publications/Papers/38-booth-para-multi-EL93.pdf