

A peer reviewed international journal ISSN: 2457-0362

www.ijarst.in

### Design of High Performance 64 bit MAC Unit

N.Pandu Ranga Reddy<sup>1</sup>, K.Maheswari Devi<sup>2</sup>
Associate Professor<sup>1</sup>, Assistant Professor<sup>2</sup>
Department of ECE
Malla Reddy Engineering College

**ABSTRACT:** A design of high performance 64 bit Multiplier-and-Accumulator (MAC) is implemented in this paper. MAC unit performs important operation in many of the digital signal processing (DSP) applications. In existing method the multiplier is designed using modified Wallace multiplier and the adder is done with carry save adder. In proposed method designed using compressor using radix 4 multiplication . The total design is coded with verilog-HDL and the simulation and synthesis is done using Modelsim 6.4b and Xilinx ISE 10.1e. The total MAC unit operates at 217 MHz The total power dissipation is 177.732 mW.

#### I. INTRODUCTION

With the rapid advances in multimedia and communication systems, real-time signal processing and large capacity data processing are increasingly being demanded. The multiplier is an essential element of the digital signal processing such as filtering and convolution. Most digital signal processing methods use nonlinear functions such as discrete cosine transform (DCT) or discrete wavelet transform (DWT).

As they are basically accomplished by repetitive application of Multiplication and addition, their speed becomes a major factor which determines the performance of the entire calculation. Since the multiplier requires the longest delay among the basic operational blocks in digital system, the critical path isdetermined more by the multiplier [1]. Furthermore, multiplier consumes much area and dissipates more power. Hence designing multipliers which offer either of the following design targets - high speed, low power consumption [2], less area or Even a combination of them is of substantial research interest. Multiplication operation involves generation of partial products and their accumulation. The speed of multiplication can be increased by reducing the number of partial products and/or accelerating the accumulation of partial products. Among the many methods of implementing high speed parallel multipliers, there are two basic approaches namely Booth algorithm and Wallace Tree compressors. This paper describes an efficient implementation of a high speed parallel multiplier using both these approaches. Here two multipliers are proposed. The first multiplier makes use of the Radix-4 Booth Algorithm with 3:2 compressors while the second multiplier uses the Radix-8 Booth algorithm with 4:2 compressors. The design is structured for m x n multiplication where m and n can reach up to 126 bits. The number of partial products is n/2 in Radix-4 Booth algorithm while it gets reduced to n/3 in Radix-8 Booth algorithm. The Wallace tree uses Carry Save Adders (CSA) to accumulate the partial products. This reduces the time as well as the chip area. To further enhance the speed of operation, carry-look-ahead (CLA) adder is used as the final adder [3].



Fig 1: Basic architecture of MAC unit.



A peer reviewed international journal ISSN: 2457-0362

www.ijarst.in

### II. RELATEDWORK:

Multipliers play a significant role in today's digital signal processing and various other applications. With advances in technology, many researchers have tried and are trying to design multipliers which offer either of following high speed, low power consumption, regularity of layout and hence less area or even combination of them in one multiplier, thus making them suitable for various high speed, low power, and compact VLSIimplementation.

### Wallace tree Multiplier

To make the conventional Wallace multiplier more efficient we use modified Wallace multiplier. Here in this modified Wallace multiplier our main aim is to reduce the number of half adder by replacing them with full adders. Generally in conventional Wallace multipliers many full adders and half adders are used in their reduction phase. Half adders will not reduce number partial products bits. Therefore minimizing the number of half adders with a very slight increase in the number of full adders will somewhat reduces the delay. Modified Wallace multiplier consists of three stages. First stage theN×N product matrix is formed and before passing on to the second phase the product matrix is rearranged to take the shape of inverted pyramid. Now in second phase the inverted pyramid is grouped into non-overlapping group based on the belowformula.

 $r_{i+1} = 2[r_{i}/3] + r_{i} \mod 3$ 

if rimod3=0, then ri+1=2ri/3

If the valve calculated from the above equation for number of rows in each stage in the second phase and the number of rows that are formed in each stage of the second phase does not match, only then the half adders will be used. The final product of this phase will be in a height of two bits and passed to the third phase .During this stage carry save adder is used for better performance rather than carry select adder and a ripple carry adder. The 64 bit modified Wallace multiplier is difficult to represent, so for understanding purpose a typical 8-bit by 8-bit reduction is shown in the below.



Figure 2: Modified Wallace 9bit by 9 bit reduction.

As shown above the two dots joined by a diagonal line indicates that these two are the output from the full adder. Similarly two dots joined by a crossed diagonal line indicate that these two dots are the output from half adder. Even though the method is efficient in area and speed, the circuit layout is not easy and also the circuit is quite irregular. So reduce this disadvantages which occur in this method we are going for Vedic algorithm basedmultiplier.

The high performance MAC unit is obtained by reducing the area and delay of the multiplier blockforthatpurposetherearemany algorithm among them some are array multiplier, booth multiplier, and conventional Wallace multiplier. Array Multiplier gives more power consumption as well as optimum number of components required, but delay for this multiplier is larger. It also requires larger number of gates because of which area is also increased; due to this array multiplier is less economical .so, to overcome the disadvantages of array multiplier an Australian computer scientist Chris Wallace developed an efficient



A peer reviewed international journal ISSN: 2457-0362

www.ijarst.in

methodology for multiplying of two integers in 1964. And named it as conventional Wallace multiplierwhichreducesthedelayinorderofO (log n) [4]. This multiplier is less regular, thus making it more difficult to layout in VLSI design.

#### III. PROPOSEDMETHOD:

In MAC unit multiplier block efficiency is increasedusing Vedicmethodwhencompared to existing methods. Vedic mathematics for computation of algorithms of the coprocessor will reduce the computational time, complexity, power, area, etc. Vedicmathematics is based on the 16 sutras of Vedas. This system is simpler and faster than the modern mathematics. Jagadguru Swami Sri Bharati Krishna TirthajiMaharaj who introduce Vedic mathematics and acknowledge the work of various people on Vedic mathematics. Later Anvesh Kumar, Ashish Raman, they gave the idea that Vedic sutras should be used to design the ALU [5]. They suggest two sutras for the designing of ALU. They are Nikhilam Sutra and Urdhvatriyak by ham Sutra. Multiply Accumulate block is extensible used here. Multiplication algorithm is implemented using Verilog HDL.

### a) UrdhvaTriyakbyhamSutra:

The sutras in Vedic mathematics helpto do almost all types of numeric calculations in easy and fast manner [6]. Among above sutras the Urdhva Triyak by hamistypically used for the multiplication purpose, applicable to all types of multiplication. Any bit binary number can be multiplied quickly by using this sutra. The meaning of this sutra is vertically and crosswise.



Fig 3: 2×2 multiplier.

This sutra is used for multiplying two numbers. Here we are multiplying two 64-bit binary numbers using this sutra. Basic block of design is  $2\times2$ multiplier block it designed our usingthisUrdhvaTriyakbyhamtechniquewhich isshownintheabovefigure.Forconvertingthe abovemultiplicationintohardwarestructurewe needfourANDgatesandtwoHALFADDERS. Letusconsidertwobinarvnumbersaandbtobe multiplied. a0. a1 are multiplied Where b1wegetq0,q1,q2,q3asoutputisshownbelow.



A peer reviewed international journal ISSN: 2457-0362

www.ijarst.in



Fig 4: Block diagram of 2×2 multiplier.

### b) VEDICMULTIPLIER:

Vedic multiplier uses hierarchical structure to reduce the number of partial product generation. The design of this multiplier starts with Multiplier design that is 2x2 bit multiplier [7]. Here, "Urdhvatriyagbhyam Sutra" or "Vertically and Crosswise Algorithm" for multiplication has been effectively used to develop digital multipliers. This algorithm is quite different from the traditional method of multiplication that is to add and shift the partial products.

This Sutras will us show how to handle multiplication of a larger number (N x N, of N bits each) by breaking it into smaller numbers of size (N/2 = n, say) and these smaller numbers can again be broken into smaller numbers (n/2 each) till we reach multiplicand size of (2 x 2). For Multiplier, first the basic blocks that are the 2x2 bit multipliers can be made and then using these blocks 4x4 blocks can be implemented. Further, using 4x4 blocks, 8x8 bit block can be implemented; from 8×8 blocks,  $16\times16$  bit block can also be implemented. This process of implementing can continue till our desired bit multiplier obtained.

The block diagram of 64-bit Vedic multiplier is as follow. Here in this diagram first 64-bit is divided into 4-block of 32-bit, and the 32-bit is divided into 4-blocks of 16-bit this will continue until basic block that is  $2\times2$  multiplier block. But in the below diagram we shown up to four 8-bit blocks. From this 8-bits block we get 16-bits product using our Vedic algorithm which is explained below blockdiagram.



A peer reviewed international journal ISSN: 2457-0362

www.ijarst.in



Fig 5: Block diagram of Vedic multiplication process

#### c) 16×16 Bits Vedicmultiplier:

First we have to design 8 bit block by using four 4 bit blocks which are implemented by using our basic 2 bit block. When we obtain 8\*8 multiply block.Nowwedividedourtwo16bitnumbers[a (15:0), b[ (15:0)].Into 4 block of 8\*8 bit multiply block. They are

First block –a [7:0], b [7:0];

Second block-a [15:8], b [7:0];

Three blocks-a [7:0], b [15:0];

Fourth block-a [15:8], b [15:8];

For every 8 bit block we get 16 bit product which are represented as q0, q1, q2 and q3 respectively. Now the output intermediate product is denoted as Q [31:0]. Firstly we get Q [7:0] directly from q0 [7:0]. Where Q [7:0] = q0 [7:0]. Next q0[15:8] is given to ADDER1 as one of the input and the other input is obtained from second block directly and the output obtained from ADDER1 is given to ADDER3 as one of the input and the other input is obtained from ADDER2 which gets input from third and fourth block of 8-bits multiplier. And finally we get the out from ADDER3 i.e. equal toQ[31:8]



Fig 6:block diagram of 16×16 Vedic multiplier.



A peer reviewed international journal ISSN: 2457-0362

www.ijarst.in

Adder tree diagram for above design is given below where we can clearly understand the process of addition.

Bits input to Addition Tree



Fig 7: adder tree of 16×16 Vedic multiplier.

### D) Carry save adder:

After completion of multiplication of two 64-bits binary numbers we obtained 128-bit product which is given to carry save adder which give parallel out. The Carry Save Adder (CSA) is a type of Digital adder, used to compute the sum of three or more number of bits in binary form. CSA (carry save adder) gives less propagation delay and the Glitching problem in RCA is also avoided [8]. Since, the Representation of 128 bit CSA is very difficult, A Typical example of 8 bit CSA is shownbelow.

#### Fig 8: carry save adder.

Here, we compute the sum of two 128 bit binary numbers so 128 half adders at the first stage is required instead of 128 full adders. Since, we add bits of two binary numbers only [9]. If, P and Q are two 128 bit numbers then i t produces the partial products and carry Siand Ci respectively. Where,

However, a CSA Produces all the output values in parallel. so that, the computation time is reduced compared to RCA. Also, Parallel in Parallel out (PIPO) is used in Accumulator Stage. From accumulator we get finaloutput.

#### **IV. RESULTS:**

The Design is developed using Verilog - HDL and Synthesized using Xilinx 14.3 ISE. As a previous work different MAC Units were developed using different combination of multipliers and adders. Here in this implementation we selected modified Wallace multiplier with carry save adder to compare with our efficient method that is Vedic multiplier using carry save adder and measure the performance parameters of two MAC units. The parameters are area, delay where area is measured in terms of number of slice and delay is measured in nanoseconds respectively. When code is checked our multiplier is shown in schematic way asfollow.



Fig 9: RTL schematic of proposed 64 bit MAC unit



A peer reviewed international journal ISSN: 2457-0362

www.ijarst.in



Fig 10. Technology schematic of proposed 64 bit MAC unit



Fig 11. Simulation result of proposed 64 bit MAC unit

### V. CONCLUSION

Hence a design of high performance 64 bit Multiplier-and-Accumulator (MAC) is implemented in this paper. The total MAC unit operates at a frequency of 217 MHz. The total power d issipated by 64 bit MAC unit is 177.732 mW. The total area occupied by it is 542177 11m2. Since the delay of 64 bit is less, this design can be used in the system which requires high performance in processors

involving large number of bits of the operation. The MAC unit is designed using Verilog-HDL and synthesized in Cadence 180nm RTL Complier.

### REFERENCES

- [1]. Young-Ho Seo and Dong-Wook Kim, "New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm," IEEE Transactions on very large scale integration (vlsi) systems, vol. 18, no. 2, february 20 10.
- (2). Ron S. Waters and Earl E. Swartzlander, Jr., "A Reduced Complexity Wall ace Multiplier Reduction, " IEEE Transactions On Computers, vol. 59, no. 8, Aug 20 10.
- [3]. C. S. Wallace, "A suggestion for a fast multiplier," Ieee Trans. lectronComput, vol. EC-13, no. I, pp. 14-17, Feb. 1964
- [4]. Shanthala S, Cyril Prasanna Raj, Dr.S.Y.Kulkarni, "Design and VLST implementation of Pipelined Multiply Accumulate Unit," IEEE International Conference on Emerging Trends in Engineering and Technology, ICETET-09



A peer reviewed international journal ISSN: 2457-0362

www.ijarst.in

[5]. B.Ramkumar, Harish M Kittur and P.Mahesh Kannan, "ASIC Implementation of Modified Faster Carry Save Adder ", European Journal of Scientific Research, Vol. 42, Issue 1, 2010.

- [6]. R.UMA, VidyaVijayan, M. Mohanapriya and Sharon Paul, "Area, Delay and Power Comparison of Adder Topologies", International Journal of VLSI design & Communication Systems (VLSICSj Vo1.3, No.1, February 2012 [7]. V. G. Oklobdzija, "High-Speed VLSI Arithmetic Units: Adders and Multipliers", in "Design of High-Performance Microprocessor Circuits", Book edited by A.Chandrakasan,IEEE Press,2000
- [8]. Dadda, "Some Schemes for Parallel Multipliers," Alta Frequenza, vol. 34, pp. 349-356, 1965.