In this article I want to discuss the Cooley-Tukey algorithm for general composite $N=N_1N_2$. The easiest way to see how the algorithm works is to use index mapping. Consider the length-$N$ DFT of a sequence $x[n]$: $$X[k]=\sum_{n=0}^{N-1}x[n]W_N^{nk}\tag{1}$$ with $W_N=e^{-j2\pi/N}$. Let's assume that $N$ can be factored as $N=N_1N_2$. Now use the following index mapping $$n=n_1N_2+n_2,\quad n_1=0,1,\ldots, N_1-1,\quad n_2=0,1,\ldots,N_2-1\\ k=k_1+N_1k_2,\quad k_1=0,1,\ldots, N_1-1,\quad k_2=0,1,\ldots,N_2-1\tag{2}$$ Plugging (2) into (1) gives $$X[k_1+N_1k_2]=\sum_{n_2=0}^{N_2-1}\sum_{n_1=0}^{N_1-1}x[n_1N_2+n_2]W_N^{(n_1N_2+n_2)(k_1+k_2N_1)}\tag{3}$$ The complex exponentials in (3) can be simplified as follows: $$W_N^{(n_1N_2+n_2)(k_1+k_2N_1)}=W_N^{N_2n_1k_1}W_N^{N_1N_2n_1k_2}W_N^{n_2k_1}W_N^{N_1n_2k_2}=\\= W_{N_1}^{n_1k_1}\cdot 1\cdot W_{N}^{n_2k_1}W_{N_2}^{n_2k_2}\tag{4}$$ because $W_N^{N_1}=W_{N_2}$, $W_N^{N_2}=W_{N_1}$, and $W_N^{N_1N_2}=W_N^N=1$. Plugging (4) back into (3) and rearranging terms gives $$X[k_1+N_1k_2]=\sum_{n_2=0}^{N_2-1}\left[W_N^{n_2k_1}\left(\sum_{n_1=0}^{N_1-1}x[n_1N_2+n_2]W_{N_1}^{n_1k_1}\right)\right]W_{N_2}^{n_2k_2}\tag{5}$$ Equation (5) can be interpreted as follows: the result can be computed by computing $N_2$ length-$N_1$ DFTs of subsampled versions of the input sequence $x[n]$ (this is the inner sum in (5)), then multiplying the result by twiddle factors $W_N^{n_2k_1}$, and finally computing $N_1$ length-$N_2$ DFTs (the outer sum in (5)). This procedure is most easily visualized using matrices:

- read the data row-wise into a $N_1\times N_2$ matrix
- compute DFTs over all $N_2$ columns (this can be done in-place)
- multiply the result element-wise with twiddle factors
- compute DFTs over all $N_1$ rows (in-place)
- read out the data column-wise to get the final length-$N$ result

In practice, an FFT routine which implements $N=2^M$ will be sufficient most of the time, because in most cases one is free to choose the DFT length, and the data can usually be zero-padded to the next power of two. However, there are cases where a non-power of 2 length is required, such as in 3GPP Long Term Evolution (LTE), where for a transmission bandwidth of 15MHz an FFT length of 1536 is required. Suppose that on a certain platform you have an FFT routine for $N=2^M$. In this case, since $1536=3\times 512$, the representation (5) can be used to split up the computation of the DFT in $3$ FFTs of length $512$, which can be computed by the available FFT routine, and $512$ length-3 FFTs, the implementation of which is trivial.

This little octave/Matlab script illustrates the procedure for the case $N=1536$:

```
N = 1536; % non-power of 2 FFT length used in LTE
N1 = 512; N2 = 3; % 512 x 3 = 1536
x = randn(N,1); % some input signal
% generate N1 x N2 matrix of input data, reading them row wise
X = reshape(x,[N2,N1]).';
% DFT over each column
X = fft(X);
% Multiplication with twiddle factors
X = X.*exp(-1i*2*pi/N*(0:N1-1)'*(0:N2-1));
% DFT over each row
X = (fft(X.')).';
% read out column wise
X = X(:);
```

So even if you don't want to program your own FFT algorithm it's very useful to understand the Cooley-Tukey method because it can be used to implement non-power-of-two FFTs using basic (power-of-2) FFT routines, if the FFT length $N$ is composed of only a few (small) factors (other than 2), and this will be the case for almost all practical applications.

## No comments:

## Post a Comment