# COMPARATIVE STUDY OF FFT/IFFT PROCESSOR FOR HIGH THROUGHPUT-RATE APPLICATION

### Krishan Kumar<sup>1</sup>, Sonal Dahiya<sup>2</sup>

ASET, Amity University Haryana, India
 ASET, Amity University Haryana, India

#### **ABSTRACT**

This paper proposes a FFT processor with multiple pipeline architecture for high throughput-rate applications. The power consumption and hardware cost can also be save in this processor by using the higher radix FFT algorithm and less memory and complex multipliers.

**INTRODUCTION:** Fast Fourier transforms (FFT) and inverse fast Fourier transform (IFFT) are the key components ultra wideband system. However, it is a challenge to realize the UWB physical layer in VLSI implementation. So, it is important to desire high data rate transmission in time dispersive or frequency selective channels without having complex time domain channel equalizer but also can provide high spectral efficiency.

Although the memory-based architecture is considered most area efficient, it requires many computation cycles. Therefore we propose a multipath pipeline based architecture for high-speed application. The pipeline architecture has also the advantage of being highly regular, which can be easily scaled and parameterized in hardware design.

#### Design issue of the FFT processor for high-throughput rate application

Various architectures such as single-memory architecture, dual-memory pipeline architectures, array architecture and cached memory have been proposed over the last three decade.

#### Single memory architecture

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories. **GE- International Journal of Engineering Research (GE-IJER)** Website: www.aarf.asia. Email: editoraarf@gmail.com , editor@aarf.asia

Single memory architectures have one processor and memory. Its in this case memory size is big for large sized FFTs and hence results in slow processing, hence not used generally.



Fig.1 Single Memory Architecture

### **Dual Memory Architecture**

Dual memory architectures are faster than the single memory architecture. The contrast circuitry is somewhat complex and also is costlier than single

memory Architecture.



Fig2. Dual Memory Architecture



Fig.3 Array Architecture

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories. **GE- International Journal of Engineering Research (GE-IJER)** Website: www.aarf.asia. Email: editoraarf@gmail.com , editor@aarf.asia

This architecture is interconnected network results in the consumption of large chip area. That is why these type of architecture are not commonly used.

#### **Cache memory Architecture**

Cache memory Architecture is faster than main memory and it also increases the effective bandwidth of memory overall speed of the processor is increased. But this type of architecture having disadvantage of increased controller complexity as well as makes the processor costly.



#### Fig.4 Cache Memory Architecture

Pipeline architecture offers a good tradeoff between hardware complexity and processing rate. It is very attractive in implementing the FFT processor for high speed multimedia communication system. Figure shown below is basic pipeline architecture for FFT processor.

It consist of number of butterfly computational units interspersed with delay commutator elements foe interstate data reordering with this architecture it is necessary to select the appropriate read for butterfly computation.

In our view the pipelined architecture should be best choice for high throughput rate application. Since it can provide high throughput rate with acceptable hardware cost. The pipelined FFT architecture typically falls one of two following categories. One is multipath delay commutator (MDC) and the other is Single path delay feedback (SDF) as shown in figure below.



Fig.5 (a) MDC (b) SDF

In general, if appropriately reordered M parallel input data can be supported simultaneously in the MDC Scheme, this scheme provides M times throughput rate of SDF Scheme. However there are some limitations on the number of data path, FFT size and radix-2 FFT algorithm in MDC architecture.

Beside, the requirement of memory and complex multiplier in MDC scheme is more than that of SDF scheme. In general, the MDC scheme needs less memory and hardware cost.

For high throughput rate application, the MDC architecture is more suitable than SDF architecture in high frequency range application. If the input data are reordered in the input buffer they are loaded into the MDC processor. Unfortunately traditional R2 (Radix-2) MDC architecture cannot provide the available throughput rate unless it raise the work frequency. The R4 (Radix-4) MDC architecture, which needs a power of four, has the limitations on FFT size and split-radix (SRJ MDC) has higher hardware cost. In addition, the higher radix FFT algorithm is difficult to be implemented in the traditional MDC architecture.



Fig.6 SFG of 128 point mix Radix FFT algorithm

In general, the higher throughput rate of FFT processor can be provided by increasing the number of data paths in MDC pipelines architecture. However the hardware cost is also significantly increased because more memory and complex multiplier are needed to allow multiple data to be operated simultaneously.

In this paper four data path pipelined FFT architecture which is called mixed radix multipath delay feedback (MRMDF) is studied which here combined the features of SDF and MDC architectures. This FFT/IIFT processor provides a throughput rate meet to ultra sideband specification. The MRMDF architecture has lower hardware cost compared with traditional MDC approach and adopts the high-radix FFT algorithm to save power dissipation.

#### Architecture of FFT for high throughput rate

The block diagram of proposed 128 point FFT/IIFT processor is derived from equation 1,2,3,4 and signal flow graph shown in fig.1 ..

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories. **GE- International Journal of Engineering Research (GE-IJER)** Website: www.aarf.asia. Email: editoraarf@gmail.com , editor@aarf.asia



This equation can be considered as two- dimensional DFT . One is a 64 point DFT and other is a 2 point DFT .

$$X(2(\beta_1 + 2\beta_2 + 4\beta_3 + 8\beta_4) + k_1) = \sum_{\alpha_4=0}^{7} BU_8(k_1, \beta_1, \beta_2, \beta_3, \alpha_4) W_8^{\alpha_4, \beta_4}$$
(2)

In the above equation a complex multiplication with one of two cofficient can be computed using addition and a real multiplication whose hardware can be realized by 6 shifters and 4 adders .

$$X(n) = \frac{1}{N} \left\{ \sum_{k=0}^{N-1} X^*(k) W^{nk} \right\}^*.$$
(3)

Where X(n) is the desried output sequence .

The proposed MRMDF architecture combining the feature of SDF and MDC architecture consist of module- 1, module-2 module-3, conjugate blocks, a division block and multiplexers. The features of proposed MDRDF architecture are the following:

- 1. Higher throughput rate can be provided by using 4 parallel data path
- 2. The minimum memory is required by using the delay feedback approach to reorder the input data and the intermediate results of each module.
- 3. The 128 point mixed radix FFT/IIFT algorithm is implemented to save power consumption.
- 4. The number of complex multiplier is minimized by using the scheduling scheme and the specified constant multipliers.

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories. **GE- International Journal of Engineering Research (GE-IJER)** Website: www.aarf.asia. Email: editoraarf@gmail.com , editor@aarf.asia

In the MRMDF architecture the input sequence and the output sequence are in specified order . The order of the output sequence is the bit reversal of the order of input sequence as in fig.7



Fig.7 Block Diagram of 128 point FFT/IIFT processor

The operation of FFT/IIFT is controlled by control signal, as shown in fig7. . When the IIFT is performed in our processor, the sign of imaginary part of input sequence will be changed and then they will be performed by the process in treating FFT. The sign of imaginary part of output data from FFT will be changed again and then will be divided by 128point. Because 128 is a power of 2. The operation of division is implemented by shifting the decimal point location. The function of module 1 is to implement a radix 2 FFT algorithm corresponding to the first stage of signal flow graph. Module2 and module3 are to realize the radix8 FFT algorithm corresponding to second and third stage of signal flow graph. In order to minimize the memory requirement and to ensure the correction of the FFT output later.

#### Comparison of 128-point pipelined FFT architecture

|                              |    | MRMDF      | SRMDC*    | R2MDC*       | R23SDF   |
|------------------------------|----|------------|-----------|--------------|----------|
| No.<br>registers<br>(complex | of | 124(38.9%) | 318(100%) | 190(60%<br>) | 127(40%) |

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories. **GE- International Journal of Engineering Research (GE-IJER)** Website: www.aarf.asia. Email: editoraarf@gmail.com , editor@aarf.asia

| words)        |                  |              |            |         |
|---------------|------------------|--------------|------------|---------|
|               |                  |              |            |         |
| No. of        | 2+4*0.62(44.8    | 10(100%)     | 6(60%)     | 3(30%)  |
| complex       | %)               |              |            |         |
| multipliers   |                  |              |            |         |
|               | 40/1000/         | 24(70.00())  | 14(2004)   | 14(200) |
| No. of        | 48(100%)         | 34(70.8%)    | 14(29%)    | 14(29%) |
| complex       |                  |              |            |         |
| adders        |                  |              |            |         |
| Algorithm     | Radix-2          | Calit and in | Radix-2    | Radix-2 |
| Algorithm     | Kaulx-2          | Split-radix  | Radix-2    | Kadix-2 |
|               | Radix-8          |              |            | Radix-8 |
|               |                  |              |            |         |
| Input data    | Parallel(4 input | Parallel(4   | Parallel(2 | Serial  |
| format        | port)            | input port)  | input      |         |
|               |                  |              | port)      |         |
|               |                  |              |            |         |
| Output data   | Parallel(4       | Parallel(6   | Parallel(2 | Serial  |
| format        | output port)     | output port) | output     |         |
|               |                  |              | port)      |         |
|               | 45               |              | 20         |         |
| Throughput    | 4R               | 6R*0.73      | 2R         | R       |
| rate(R: clock |                  |              |            |         |
| rate)         |                  |              |            |         |
|               |                  | 1 00         |            |         |

(1)No. of registers excluding the input buffers as listed.

(2)The desired throughput rate can be achieved if employing appropriately reordered parallel input data.

Table1: Comparison of 128 point pipeline FFT architecture

### Comparison

In general the performance and hardware cost of pipeline architecture are increased by using the multiple data path approach. Thus the multiple path architecture usually provides the higher throughput rate with higher hardware cost if the parallel input data can be supported in

this approach, the proposed MDRMDF architecture hardware cost in term of 128point FFT are as follows:

- 1. Register number 124
- 2. complex adder48
- 3. complex multiplier 2+4\*0.62

Table shown above compares the hardware requirement FFT algorithm and throughput rate with several classical and the above approach in 128point FFT.

### CONCLUSION

It is concluded that by MRMDF architecture high throughput rate can be achieved by using 4 data path . Furthermore the hardware costs of memory and complex multiplier can be saved by adopting delay feedback and data scheduling approach .In addition the number of complex multiplication is reduced effectively by using a higher radix algorithm.

#### REFERENCES

[1] Y.Jung, H.Yoon, and j.Kim,"New efficient FFT algorithm and pipeline implementation results for OFDM/DMT applications," IEEE Trans. Consum. Electron. vol.49,no.1,pp.14-20,Feb.2003.

[2]. L.Jia, Y.Gao, J. Isoaho, and H.Tenhunen,"A new VLSI-oriented FFT algorithm and implement,"in Proc.11<sup>th</sup> Annu.IEEE Int. ASIC Conf., Sep.1998, pp.337-341.

[3]. L.R. Rabiner and B.gold, Theory and application of Digital Signal Processing. Englewood cliffs, NJ:Prentice Hall, 1975.

[4]. S.Magar, S.Shen, G.Luikio, M.Fleming, and R.Aguilar,"an application specific DSP chip set for 100 MHz data rates,"in proc. Int. Conf. Acoustics, Speech, and Signal Processing, vol. 4, Apr.1988, pp.1989-1992.

[5]. Shousheng he and Mats Torkelson,"Designing pipeline FFT processor,"parallel Processing Symposium, pp.766-770, 1996.

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories. **GE- International Journal of Engineering Research (GE-IJER)** Website: www.aarf.asia. Email: editoraarf@gmail.com , editor@aarf.asia

[6]. J.Garcia, J.A.Michell, and A.M.Buron,"VLSI configurable delay commutator for a pipeline split radix FFT architecture,"IEEE Traans. Signal Processing, vol47, pp.3098-3107, Nov.1999.

[7]. L.R.Rabiner and B.Gold, Theory and Application of Digital Signal Processing. Englewood Cliffs,NJ:Prentice Hall,1975.

[8]. K.K. Parthi, VLSI digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999, ch.7, p.189.

[9]. E.H. Wold and A.M. Despain. Pipeline and parallel-pipeline FFt processors for VLSI implementation. IEEE Trans. Comput., C-33(5):414-426, May 1984.