ClassECE6332Fall15GroupMultipler

From UVA ECE & BME wiki
Jump to: navigation, search

Contents

Comparison of Different Architectural Design Methods for Multiplier

Group Members

        Weizhong Huang(wh6ma@virginia.edu)

        Zekun Cai (zc4hs@virginia.edu)

        Siyi Shen (ss3dd@virginia.edu)

Abstract

        A multiplier is one of the key hardware blocks in most digital signal processing systems. This project is a study of a relative performance comparison of various 4-bits multiplier designs of Array, Wallace, Dadda, and Booth Encoding multipliers in Cadence. We compare their performance from looking into the trade-offs between parameters like the number of transistors, gate propagation delays and power dissipation in different types of multiplier structures. With the advancement in technology, various studies have been conducted to produce high speed, low power consumption and less area or a combination of the performance in one multiplier. Hence, with these performance characteristics, we are able to find out the various high speeds, low power and compact VLSI implementation of the multipliers designed.

Introduction

Different Multiplier Architecture

I. Array Multiplier


II. Wallace Tree Multiplier

Wallace Tree.png

III. Dadda Tree Multiplier

Dadda Tree.png


IV. Vedic Multiplier

Basic Operations of 2 bit Vedic Multiplication Mathematics
Logic Circuit Realization of 2-bit Multiplication

Vedic3.png

V. Booth-Encoding Radix-2 Array Multiplier

        The modified Booth recoding algorithm allows for the reduction of the number of partial products to be compressed in a carry save adder tree. Thus the compression speed can be enhanced. This is a standard technique used in chip design and provides significant improvements over the long multiplication technique.[3] The encoders of Radix-2 and Radix-4 Booth recoding are shown on the tables below. Radix-2 is used in our 4x4 Array Booth Multiplier design. The block diagram is shown below.

Radix-2 Booth Encoding
Radix-4 Booth Encoding

55Booth.png

VI. Baugh-Wooley Array Multiplier

The two’s complement number system represents positive as well as negative numbers. They have the following format: Baugh1.png

The range of an n-bit 2’s complement number is {-2^(n-1) , 2^(n-1)-1} Squaring two 2’s complement numbers: If two 2’s complement numbers were multiplied together, the resulting multiplication would result in 4 terms. Let us consider the case of squaring an n-bit 2’s complement number. We get:

Baugh2.png

Baugh3.png
The diagram shown below is the algorithm of Baugh Wooley Logic Design. BaughWooley55.png

Trade-offs

        In different applications, one may need to use different kinds of multiplier architectures that adapts to some specific scenarios. Therefore, it is necessary to discuss the trade-offs of multipliers, from the comparison of basic algorithms, to structure implementations such as the number of transistors, area, propagation delay, power dissipation, etc.

        In this architecture, after getting the partial product matrix, one may combine partial products as early as possible to compress the matrix to a ultimate matrix with 2 rows. The result is to minimize the overall delay by making the final CPA as short as possible, thus leading to fastest possible design. The number of logic levels required to perform the summation can be reduced. But the disadvantage of Wallace tree is the irregular structure and wires.

        Compared with Wallace algorithm, Dadda algorithm was designed to minimize the reduction performed at each level. One may combine partial product bits as late as possible, while keeping the critical path length of the tree minimal. Hence, this algorithm leads to a simpler CSA tree structure but much wider CPA at the end.

        One important advantage of an array multiplier is the regular structure. Since it is regular, it is easy to layout and has a small size. The design time of an array multiplier is much less than that of a tree multiplier.[9] Another advantage is its ease of design for pipelined structure, which largely increases the efficiency of the multiplier. But the main disadvantage is the worst delay of the multiplier. The speed will be very slow for a high bit array multiplier.

         Vedic mathematics used in a multiplier reduces the complexity in calculations that exist in a conventional multiplier. Due to its regular structure, it can be easily layout in a silicon chip. The Multiplier has the advantage that as the number of bits increases, gate delay and area increases very slowly as compared to other multipliers.[6] However, one main drawback is that the system will be extremely large because of the algorithm.

         Baugh Wooley Multiplier is another efficient technique to handle signed multiplication. It maximizes the regularity of the multiplier logic and allow all the partial products to have positive signed bits.[9] The main disadvantage is still the complex structure coming from operations of 2's complement numbers.

         Instead of directly adding operands on partial product matrix, Booth encoding techniques firstly performs the recoding that defines the operations to be performed on the multiplicand. The advantage is that it generates only a halve of the partial products compared to other multiplier implementations which do not use Booth Encoding.[10] However, this benefit achieves at the expense of increased complexity. The main difficulties come from the use of 2's complement representation which is necessary to compute negative partial products.[10]

Cadence Schematics

Basic Formed Units Schematics

Controlled Adder Subtracter in Booth Radix-2 Multiplier
Booth Encoder in Booth Radix-2 Multiplier
Negative Full Adder Unit in Baugh Wooley Multiplier
Positive Full Adder Unit in Baugh Wooley Multiplier


















































4 x 4 Binary Multiplier Schematics

[1] 4 x 4 Array Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/f/f4/4_array.png

4 array.png

[2] 4-bit Dadda Tree Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/2/25/4_dadda.png 4 dadda.png

[3] 4-bit Wallace Tree Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/9/9b/4_wallace.png 4 wallace.png

[4] 4-bit Baugh Wooley Array Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/1/16/4_wooley.png 4 wooley.png

[5] 4 x 4 Booth Encoding Radix-2 Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/3/37/4_booth.png 4 booth.png


[6] 4 x 4 Vedic Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/d/db/4_vedic.png 4 vedic.png

8 x 8 Binary Multiplier Schematics

[1] 8-bit Wallace Tree Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/d/da/8_wallace.png 8 wallace.png

[2] 8-bit Dadda tree multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/b/b3/8_dadda.png 8 dadda.png

[3] 8 x 8 Array Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/8/82/8_array.png 8 array.png

[4] 8 x 8 Vedic Multiplier
http://venividiwiki.ee.virginia.edu/mediawiki/images/8/82/8_vedic.png 8 vedic.png

Simulation Results

I. 4x4 Binary Multiplier Architecture

1) Propagation Delay

        To compare the propagation delay of different multiplier implementations, we use two sets of numbers for each type of multiplier to multiply. For simulations of 4-bit multipliers, X=1111, Y=1111 is the one set of number while X=1010, Y=1010 is the other. For simulations of 8-bit multipliers, X=11111111, Y=11111111 is the one set of number while X=10101010, Y=10101010 is the other one. We both set one bit input of the multiplicand, Y3 for 4-bit tests and Y7 for 8-bit tests, as a pulse with a period of 3ns and pulse width 1ns, while keeping others constant all the time. As a result, we get specific outputs as the response to input pulse, which show the propagation delay if we measure the time difference between 50% pulse edges of input and output.

        Measured data is listed in the tables below. According to the measurement, we pick out the maximum delay of each kind of architecture and plot it as a histogram and a line chart for comparison.

        Firstly, we compared 4 kinds of 4x4 multipliers, 4x4 Wallace multiplier,4x4 Dadda multiplier,4x4 Vedic multiplier and 4x4 Bit Array multiplier. According to the line chart, with the same operands, the Dadda Multiplier has smaller propagation delay than 3 other multipliers. And the delay of Wallace multiplier is a little bit higher than that of Dadda Multiplier. The delay of Array Multiplier is the highest. This result is reasonable because tree structures get short overall delay but have irregular structure while array architectures get higher overall delay in order to get regular structure. Besides, the Dadda multiplier was designed to minimize the reduction performed at each level while compressing the partial product matrix. As a result, the CSA tree structure inside is simpler while a wider CPA appears at the end. On the contrary, the Wallace multiplier has a short enough CPA but much wider CSA structure. So the delay of these two structures are close. As for the Vedic multiplier, its advantage is that the delay increases slowly with higher bits. Thus, the delay is not obvious with 4-bit multiplication.

        Secondly, among array structures, we compared 3 types of 4x4 multipliers, 4x4 array multiplier, 4x4 Radix-2 Booth array multiplier and 4x4 Baugh Wooley array multiplier. As shown in the histogram, the delay of Baugh Wooley Multiplier is the lowest among these three structures and the basic array multiplier is the highest. It is not surprising to get that since Baugh Wooley and Booth Encoding are two algorithms proposed to improve array structures. Both two implementations reduce the number of partial products produced in matrix and hence get better performance.

Delay1.JPG

4x4 1.JPG
4x4 2.JPG

2) Transistor Number Estimation

         Based on the schematics we use in Cadence, we can count out the total number of transistors of different types of multiplier implementations. In general, the number of transistors can be used to estimate the areas of multiplier architectures. But it is not an accurate way to estimate areas unless the schematics are constructed in layout. For example, the regularity of units can greatly save space and increase the integration.

         In the histogram of estimation of the transistor number of 4x4 multipliers, Baugh Wooley multiplier and Radix-2 Booth multiplier are the top two on transistor numbers, with the Vedic multiplier and other basic multipliers following. This can be explained by the complexity of 2's complement binary computation used in two structures. Then the Vedic multiplier is the another complex one because of the very steps that Vedic mathematics has.

Transistor Number of Basic Gates and Formed Units

Transistor Num.jpg Screenshot 2.jpg

3) Power consumption

        Power consumption is an important index to characterize different multipliers. In this project, we use vdc parts in Cadence to connect every source supply port of multipliers with vdd. In this case, we got the total transient current response through the multiplier while providing the pulse to one input and keeping others constant. Then we took average value of the transient current over one period and multiply it with vdd. Hence, Power consumption of multipliers can be estimated. The simulation results are shown in the tables below and an example of transient current response of 4 bit Baugh Wooley Multiplier is posted.

        As we can see in the example of 4 bit Baugh Wooley Multiplier, the static leakage current is trivial enough while the dynamic current in the transition is pretty high. So we took average current over one period then estimated the power consumption.

        As for the comparison, 4x4 Booth Encoding Multiplier and 4x4 Baugh Wooth Multiplier have the highest Power consumptions, which may be caused by the complex computation of 2's complement binary representations. Interestingly, 4x4 Vedic Multiplier has relatively more transistors than Wallace, Dadda, Array, But the least power consumption among all the multipliers implemented. In other words, although making the system much more transistors and area, Vedic multiplier has low power consumption, which may show better performance while computing high bit multiplication. Wallace, Dadda, Array have similar power consumption, which can fluctuate in different scenarios.
4 Baugh Wooley.jpgTransient Current of 4 Bit Baugh Wooley Multiplier over one pulse period.
4 BW.jpgOverall Transient Current of 4 Bit Baugh Wooley Multiplier


Average Current and Power of 4x4 Multipliers


Power2.jpg

II. 8x8 Binary Multiplier Architecture

1) Propagation Delay

        Similarily, we use two sets of numbers for each type of multiplier to multiply for simulations of 8-bit multipliers, X=11111111, Y=11111111 is the one set of number while X=10101010, Y=10101010 is the other one. We set one bit input of the multiplicand, denoted as Y7, the highest bit of Y, as a pulse with a period of 3ns and pulse width 1ns, while keeping others constant all the time.

        Likewise, we collected the measured propagation delay and plot the maximum delay of each structure as a histogram.
        The result is that, the propagation delay of Array multiplier is still the highest. Interestingly, the delay of 8x8 Vedic multiplier is the lowest even compared with 8x8 Wallace and Dadda Multiplier. On the other hand, for 8x8 multipliers, the delay of the Dadda multiplier appears to be higher than that of Wallace multiplier, which is different with the results of 4x4 multipliers.

        Firstly, this result proves that although having regular and straightforward structure, the array multiplier gets extremely high overall propagation delay. Secondly, this result is corresponding to the fact that the delay of Vedic multiplier increases slower compared with other conventional multipliers. For 4x4 multipliers, the delay of Vedic multiplier is much higher than that of 4x4 Wallace and Dadda multiplier, slightly smaller than 4x4 array multiplier. But for 8x8 multipliers, its delay is the lowest and increases much slower, from about 217 to 306 ps, than others.


Delay111.jpg Delay222.jpg

One important thing that needs mentioning is the instant leakage pulse appearing in the pulse transition edge. The figure below is two examples from the measurement of 8-bit Wallace tree multiplier and 8-bit Vedic Multiplier.
Wallace All 1 P7.jpgWallace Multiplier P7 responding to Y7


ALL1 P15.jpgWallace Multiplier P15 responding to Y7


Vedic All 1 P7.jpgVedic Multiplier responding to Y7


Vedic All 1 P15.jpgVedic Multiplier responding to Y7

2) Transistor Number Estimation

        Likewise, we counted the transistor number of 8x8 multipliers. The result is that, the total number of transistors of the Vedic multiplier is the most, almost twice than 3 other structures. This proves that although showing great properties that the delay increases slowly in higher-bits multiplication, the Vedic multiplier comes to that at the expense of the number of transistors or area, which means that systems will be complex and large in higher bit multiplication.


Tran Num2.jpg Est2.jpg

3) Power consumption

        Likewise, we stillo use vdc parts to connect every source supply port of multipliers with vdd in the simulation of 8x8 multipliers. Then we got the total transient current response through the multiplier while providing the pulse to one input and keeping others constant. And we took average value of the transient current over one period and got the power estimation. The simulation results for 8x8 multipliers are shown in the tables below and An example of transient current response is posted.

        Similarily, as shown in the figure, the static current is low and the dynamic power consumption is extremely high, which can cause big problem in applications. So that is one of the future works we need to do.

        Based on the simulation of current, 8x8 Dadda Multiplier has the highest power consumption, followed by Wallace, Array, Vedic multiplier. Again, the array multiplier shows great properties of structure regularity. Although having the disadvantage that delay is extremely higher than others, array multiplier has relatively low power consumption because of its regularity. At last, the 8x8 Vedic multiplier shows the same condition as 4 bit multiplication. It has the lowest power consumption in spite of twice in transistor number than others.
8Wallace.JPGTransient Current of 8 bit Wallace Multiplier

Power3.jpg

Power4.jpg

Schedule

WeekTasksNotesComplete?
4 x 4 bit binary multiplier schematic designs with Dadda, Wallace, Array, Booth encoding methods,etc in Cadence. And give the basic results. Done
8 x 8 bit binary multiplier schematic designs with Dadda, Wallace, Array methods,etc in Cadence. And give the basic results. Done
Finish all the 4 x 4 schematics and simulation results. And Summarize all the results and make a conclusion. Done
Finish all the 8 x 8 schematics and simulation results. And Summarize all the results and make a conclusion. Done
Final report; Final presentation

Future Works

         The Baugh-Wooley multiplication algorithm and the Booth multiplication algorithm are two efficient ways to handle the sign bits. As we could see in the early sections, 5 by 5 Baugh-Wooley Array Multiplier and 5 by 5 Booth Radix-2 Array Multiplier could handle positive signed 5 bit 2's compliment numbers multiplication as the unsigned 4 by 4 simple array multiplier. In the future, we would like to explorer designs for 8 by 8 Baugh-Wooley Array Multiplier and 8 by 8 Booth Radix-2 Array Multiplier. Below are block diagram for those two designs:

Boughwooley88.png BoothEncode.PNG

Files

Please download project files Project.tar.gz

Reference

[1] Abraham, S.; Kaur, S.; Singh, S., "Study of various high speed multipliers," in Computer Communication and Informatics (ICCCI), 2015 International Conference on , vol., no., pp.1-5, 8-10 Jan. 2015

This paper describes four multipliers that is Modified Booth Multiplier, Vedic Multiplier (Urdhva Tiryakbhyam Sutra), Wallace Multiplier and Dadda Multiplier. The performance of many computational problems are often dominated by the speed at which a multiplication operation can be executed. various algorithms have been developed in order to perform multiplication faster and also to reduce the size of the multiplier. Various multipliers like Modified Booth multiplier, Vedic multiplier (Urdhva Tiryakbyham Sutra), Wallace multiplier and Dadda multipliers are discussed here. All these multipliers perform the task of multiplication correctly but their algorithms are different.

[2] Swee, K.L.S.; Lo Hai Hiung, "Performance comparison review of 32-bit multiplier designs," in Intelligent and Advanced Systems (ICIAS), 2012 4th International Conference on , vol.2, no., pp.836-841, 12-14 June 2012

This is a study of a relative performance comparison of various 32-bits multiplier designs of Array, Wallace, Dadda, Reduced Area and Radix-4 Booth Encoding multipliers in the Area-Optimized, Speed-Optimized and Auto-Optimized synthesis modes in Leonardo Spectrum. Multipliers are very important in Digital Signal Processing applications and microprocessor designs. With the advancement in technology, various studies have been conducted to produce high speed, low power consumption and less area or a combination of the performance in one multiplier. Hence, with these performance characteristics, we are able to find out the various high speeds, low power and compact VLSI implementation of the multipliers designed.

[3] Kelly Liew Suet Swee; Lo Hai Hiung; "Performance Comparison Review of Radix-Based Multiplier Designs", in Intelligent and Advanced Systems (ICIAS), 2012 4th International Conference on, vol.2, no., pp.854-859, 12-14 June 2012

In this paper, we would like to determine the type of the 32- bit multiplier designs that will produce the fastest speed and has the least area constraints. Not only that, we are also able to determine the most suitable multiplier for the type of Digital Signal Processing applications used based on the findings obtained. A performance comparison was done among Array, Wallace, Dadda, Reduced Area and Radix-4 Booth Encoding multiplier.

[4] T. Arunachalam and S. Kirubaveni, "Analysis of High Speed Multipliers", International conference on Communication and Signal Processing, April 3-5, 2013, India

[5] Devika Jaina, Kabiraj Sethi and Rutuparna Panda, "Vedic Mathematics Based Multiply Accumulate Unit", Computational Intelligence and Communication Networks (CICN), 2011 International Conference on, Pages: 754 - 757.

[6] Mohammed Hasmat Ali, Anil Kumar Sahani, "Study, Implementation and Comparison of Different Multipliers based on Array, KCM and Vedic Mathematics Using EDA Tools", International Journal of Scientific and Research Publications, Volume 3, Issue 6, June 2013. ISSN 2250-3153

[7] Sumit R. Vaidya and D. R. Dandekar "Performance Comparison of Multipliers for Power-Speed Trade-off in VLSI Design" http://www.wseas.us/e-library/conferences/2010/Cambridge/ICNVS/ICNVS-44.pdf

[8] Design and implementation of 16 Bit Vedic Arithmetic Unit (From: http://verilog-code.blogspot.com/2014/01/design-and-implementation-of-16-bit.html)

[9] http://dspace.unimap.edu.my/dspace/bitstream/123456789/1934/4/Literature%20review.pdf

[10] David Villeger and Vojin G. Oklobdzija "Evaluation of Booth Encoding Techniques For Parallel Multiplier Implementation" http://www.acsel-lab.com/Publications/Papers/38-booth-para-multi-EL93.pdf

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox