Loading [MathJax]/jax/output/HTML-CSS/jax.js

Saturday, November 8, 2014

FFT: the Cooley-Tukey algorithm for general composite lengths

The Cooley-Tukey fast Fourier transform (FFT) algorithm is probably the most well-known FFT algorithm. Their paper "An algorithm for the machine calculation of complex Fourier series" was published in 1965, and it is said to have caused the boom of digital signal processing (DSP) in the 1970s. Note, however, that the algorithm was already known to Gauss at the beginning of the 19th century. The algorithm is based on recursively splitting the problem into smaller sub-problems, and thereby decreasing the computational complexity. In its most well-known form (the radix-2 FFT), the algorithm reduces the complexity from N2 to the famous Nlog(N) for a discrete Fourier transform (DFT) of length N=2M.

In this article I want to discuss the Cooley-Tukey algorithm for general composite N=N1N2. The easiest way to see how the algorithm works is to use index mapping. Consider the length-N DFT of a sequence x[n]: X[k]=N1n=0x[n]WnkN with WN=ej2π/N. Let's assume that N can be factored as N=N1N2. Now use the following index mapping n=n1N2+n2,n1=0,1,,N11,n2=0,1,,N21k=k1+N1k2,k1=0,1,,N11,k2=0,1,,N21 Plugging (2) into (1) gives X[k1+N1k2]=N21n2=0N11n1=0x[n1N2+n2]W(n1N2+n2)(k1+k2N1)N The complex exponentials in (3) can be simplified as follows: W(n1N2+n2)(k1+k2N1)N=WN2n1k1NWN1N2n1k2NWn2k1NWN1n2k2N==Wn1k1N11Wn2k1NWn2k2N2 because WN1N=WN2, WN2N=WN1, and WN1N2N=WNN=1. Plugging (4) back into (3) and rearranging terms gives X[k1+N1k2]=N21n2=0[Wn2k1N(N11n1=0x[n1N2+n2]Wn1k1N1)]Wn2k2N2 Equation (5) can be interpreted as follows: the result can be computed by computing N2 length-N1 DFTs of subsampled versions of the input sequence x[n] (this is the inner sum in (5)), then multiplying the result by twiddle factors Wn2k1N, and finally computing N1 length-N2 DFTs (the outer sum in (5)). This procedure is most easily visualized using matrices:
  1. read the data row-wise into a N1×N2 matrix
  2. compute DFTs over all N2 columns (this can be done in-place)
  3. multiply the result element-wise with twiddle factors
  4. compute DFTs over all N1 rows (in-place)
  5. read out the data column-wise to get the final length-N result
Let's see if and how this procedure reduces the complexity of computing a length-N DFT. Direct computation of a length-N DFT requires N2 complex multiplications. With the above procedure we need N2 DFTs of length N1 (N2N21 complex multiplications), N1N2 complex multiplications for the twiddle factors, and N1 DFTs of length N2 (N1N22 complex multiplications), resulting in a total of N2N21+N1N2+N1N22=N1N2(N1+N2+1)==N(N1+N2+1)<N2,for N1,N2>2 So the procedure results in reduced complexity compared to the direct computation of a length-N DFT. If the factors N1 and N2 themselves can be factored, then the method can be used recursively, which will result in a greater reduction of complexity. E.g., if N=2M this results in the famous radix-2 version of the Cooley-Tukey algorithm, which is discussed in most DSP textbooks.

In practice, an FFT routine which implements N=2M will be sufficient most of the time, because in most cases one is free to choose the DFT length, and the data can usually be zero-padded to the next power of two. However, there are cases where a non-power of 2 length is required, such as in 3GPP Long Term Evolution (LTE), where for a transmission bandwidth of 15MHz an FFT length of 1536 is required. Suppose that on a certain platform you have an FFT routine for N=2M. In this case, since 1536=3×512, the representation (5) can be used to split up the computation of the DFT in 3 FFTs of length 512, which can be computed by the available FFT routine, and 512 length-3 FFTs, the implementation of which is trivial.

This little octave/Matlab script illustrates the procedure for the case N=1536:

 N = 1536;         % non-power of 2 FFT length used in LTE  
 N1 = 512; N2 = 3; % 512 x 3 = 1536  
 x = randn(N,1);   % some input signal  
   
 % generate N1 x N2 matrix of input data, reading them row wise  
 X = reshape(x,[N2,N1]).';  
   
 % DFT over each column  
 X = fft(X);  
   
 % Multiplication with twiddle factors  
 X = X.*exp(-1i*2*pi/N*(0:N1-1)'*(0:N2-1));  
   
 % DFT over each row  
 X = (fft(X.')).';  
   
 % read out column wise  
 X = X(:);

So even if you don't want to program your own FFT algorithm it's very useful to understand the Cooley-Tukey method because it can be used to implement non-power-of-two FFTs using basic (power-of-2) FFT routines, if the FFT length N is composed of only a few (small) factors (other than 2), and this will be the case for almost all practical applications.

No comments:

Post a Comment