Registration

The registration phase considers each pair of images and finds the transform that best maps the first one onto the second one. An exhaustive search over all possible parameter values would be computationally prohibitive, so we decided to implement and compare two different approximate approaches: a frequency-domain technique for finding displacement and rotation/scale, and a feature-based method relying on the accurate detection of image features.
Many image registration methods work in the spatial domain. Phase correlation, however, uses the frequency domain, i.e. a two-dimensional DFT of the respective images. Phase correlation has been known for a considerable time [Kuglin75] and is widely used for image registration; it exploits basic properties of the Fourier transform.
Consider two images f1(x,y) and f2(x,y) that are related by a simple shift (x0, y0) in the spatial domain:

f2(x,y) = f1(x − x0, y − y0)
Their corresponding Fourier transforms will then be related by

F2(u,v) = F1(u,v) · e^(−j2π(u·x0 + v·y0))
When taking the normalized cross-power spectrum of the two images as follows, a simple complex exponential results:

S(u,v) = F2(u,v) · F1*(u,v) / |F2(u,v) · F1*(u,v)| = e^(−j2π(u·x0 + v·y0))
The inverse Fourier transform of this is a Dirac pulse at the coordinates of the translational movement, so one searches for this peak to find the displacement. Even in the presence of noise this still works very well: one simply takes the location of the maximum value of the IFT as the Dirac location, and its height can in practice be used as a measure of the quality of the match. Of course, in practice one resorts to the DFT instead of the FT, which can be computed rapidly using the FFT. We used the FFTW [FFTW] public-domain package for computing the two-dimensional FFTs and IFFTs.
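The procedure above can be sketched in a few lines of NumPy (our implementation uses FFTW; this is only an illustrative re-implementation, and the function name is our own):

```python
import numpy as np

def phase_correlation(f1, f2):
    """Estimate the circular shift between two same-size images."""
    # normalized cross-power spectrum S = F2 * conj(F1) / |F2 * conj(F1)|
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    s = F2 * np.conj(F1)
    s /= np.abs(s) + 1e-12           # guard against division by zero
    corr = np.fft.ifft2(s).real      # ideally a Dirac pulse
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    mu = corr[peak]                  # peak height = match quality measure
    return peak, mu
```

For a pure (circular) shift the peak height is essentially 1; noise and non-overlapping content lower it.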
A problem arises in the implementation, though, when the algorithm has to distinguish between negative shifts and shifts larger than N/2 (where N is the image size in one dimension). This becomes clear in a 1D analogy. Let the normalized cross-power spectrum be S(k), with spatial index n and frequency index k. Since e^(−j2πk·n0/N) = e^(−j2πk·(n0−N)/N), a shift of n0 and a shift of n0 − N produce exactly the same spectrum S(k).
The algorithm is thus unable to distinguish between those two cases. The ambiguity is resolved by simply doubling the FFT sizes (zero-padding the images). This rules out the case of a shift larger than N/2, but comes at the price of higher computational complexity and memory needs.
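A sketch of the zero-padded variant (again illustrative NumPy rather than our FFTW code); peak coordinates found in the doubled array are folded back into signed shifts:

```python
import numpy as np

def phase_correlation_padded(f1, f2):
    """Phase correlation with doubled FFT sizes to disambiguate
    negative shifts from shifts larger than N/2."""
    H, W = f1.shape
    p1 = np.zeros((2 * H, 2 * W)); p1[:H, :W] = f1   # zero-pad to 2N
    p2 = np.zeros((2 * H, 2 * W)); p2[:H, :W] = f2
    F1, F2 = np.fft.fft2(p1), np.fft.fft2(p2)
    s = F2 * np.conj(F1)
    s /= np.abs(s) + 1e-12
    corr = np.fft.ifft2(s).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # fold the peak location back into the range [-N, N)
    if dy >= H: dy -= 2 * H
    if dx >= W: dx -= 2 * W
    return dy, dx
```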
Once the IDFT is computed, we search for the largest peak (see Figure 1) and record its value µ as a measure of the match quality. Given the focal length of the camera, which is assumed to be known a priori, the translation is then converted to a relative 3D rotation matrix.
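The text does not spell out this conversion; one common small-angle construction, assuming the focal length f is given in pixels and the camera rotates about its optical centre (the function name and axis conventions are our own), is:

```python
import numpy as np

def translation_to_rotation(dx, dy, f):
    """Relative 3D rotation implied by a pixel displacement (dx, dy)
    for a camera with focal length f (in pixels) rotating about its
    optical centre -- a hypothetical sketch, not the authors' code."""
    yaw = np.arctan(dx / f)     # pan: rotation about the vertical axis
    pitch = np.arctan(dy / f)   # tilt: rotation about the horizontal axis
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])  # pan
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])  # tilt
    return Ry @ Rx
```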
Reddy et al. [Reddy96] describe a method of additionally detecting a rotation and scale between two images by the phase correlation method. However, we did not implement this method, since we constrained ourselves to no camera rotation and a fixed focal length in the test sequences. Although some small inadvertent rotation was present in some images, we found that the simple approach still gave us excellent results. We do, however, use the Reddy method for correlating features (see below).
Feature Points Extraction

To locate common points of interest in the images, we use a generalization of the gray-image Harris corner detector [Harris88] to color data, introduced in [Montesinos98]. Given the matrix

M = Gσ ∗ [ Rx² + Gx² + Bx²      RxRy + GxGy + BxBy ]
         [ RxRy + GxGy + BxBy   Ry² + Gy² + By²    ]

where R, G and B are the red, green and blue pixel value functions of (x,y), subscripts denote partial derivatives, and Gσ is an isotropic Gaussian filter, the feature points are given by the positive local maxima of the function

det(M) − k · trace²(M)

with k a small constant.
The local color gradients used to compute M are estimated using the Deriche filter [Deriche87], which has low sensitivity to noise. The following figures show stages of the feature extraction pipeline.
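A minimal NumPy sketch of the detector: np.gradient stands in for the Deriche derivative filter, a separable convolution stands in for Gσ, and k = 0.04 is an assumed value.

```python
import numpy as np

def smooth(a, sigma=1.0):
    """Separable Gaussian smoothing (stand-in for G_sigma)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2)); g /= g.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, g, mode='same'), 0, a)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode='same'), 1, out)

def color_harris(img, sigma=1.0, k=0.04):
    """Harris response for an HxWx3 color image (sketch of the color
    generalization; not the Deriche-based implementation of the text)."""
    dy = np.gradient(img, axis=0)           # per-channel y-derivatives
    dx = np.gradient(img, axis=1)           # per-channel x-derivatives
    # entries of M, summed over the three color channels, then smoothed
    A = smooth((dx * dx).sum(axis=2), sigma)
    B = smooth((dy * dy).sum(axis=2), sigma)
    C = smooth((dx * dy).sum(axis=2), sigma)
    return A * B - C * C - k * (A + B) ** 2  # det(M) - k * trace^2(M)
```

Feature points are then the positive local maxima of the returned response map.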
Feature Points Matching

Considering two images Img1 and Img2, we want to match some of the features of Img1 to some of the features of Img2. Given those point pairs, we could derive the transform that maps all points of Img2 into Img1's space. We considered several criteria to assess the quality of a feature pairing:
To get an approximate registration of the images, we proceed as follows:
Only a few feature pairs reach the linear regression step, so the main computation time is spent on the N×N phase correlations on 32×32 windows. Overall, this phase is quite fast (below 1 s). Nevertheless, the accuracy of the local rotation angles and scale factors is limited, so the resulting transform only roughly registers the two images.
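The exhaustive window-correlation step can be sketched as follows (illustrative NumPy; the acceptance threshold and helper names are our own, and the rotation/scale handling via the Reddy method is omitted):

```python
import numpy as np

def correlate(w1, w2):
    """Phase-correlation peak height of two same-size windows."""
    F1, F2 = np.fft.fft2(w1), np.fft.fft2(w2)
    s = F2 * np.conj(F1)
    s /= np.abs(s) + 1e-12
    return np.fft.ifft2(s).real.max()

def match_features(img1, pts1, img2, pts2, s=32, thresh=0.3):
    """Score every candidate feature pair by phase correlation on
    s x s windows; keep the best partner above `thresh` (assumed)."""
    matches = []
    for i, (y1, x1) in enumerate(pts1):
        w1 = img1[y1 - s // 2:y1 + s // 2, x1 - s // 2:x1 + s // 2]
        best, best_j = 0.0, -1
        for j, (y2, x2) in enumerate(pts2):
            w2 = img2[y2 - s // 2:y2 + s // 2, x2 - s // 2:x2 + s // 2]
            mu = correlate(w1, w2)          # peak height = pair quality
            if mu > best:
                best, best_j = mu, j
        if best_j >= 0 and best > thresh:
            matches.append((i, best_j, best))
    return matches
```

The N×N loop over candidate pairs is exactly where the running time quoted above is spent.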
Registration Refinement

To reach an arbitrarily precise registration, we use a simple iterative refinement presented in [Zoghlami97]. The algorithm relies on the fact that the local scale and rotation angles at any point are approximately correct, and that the local error essentially consists of a translation. So we select a number of evenly distributed features in the overlapping area and find the local erroneous translation by standard window correlation. We then compute an optimal transform using gradient descent directly in the parameter space. This scheme is illustrated in the following figure.
We can repeat that step until the linear regression grading yields the desired correlation over the overlapping area.
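One refinement iteration can be sketched as follows. For simplicity this sketch fits a similarity transform with a closed-form least-squares solve in place of the gradient descent of [Zoghlami97]; the function name and model parametrization are our own:

```python
import numpy as np

def refine_similarity(pts, residuals):
    """One refinement step: given sample points pts (Nx2) in the overlap
    and their locally measured translation errors `residuals` (Nx2, e.g.
    from window correlation), fit the 2x3 similarity transform
    (scale * rotation + translation) best explaining pts -> pts + residuals.
    Least-squares sketch, not the paper's gradient descent."""
    n = len(pts)
    tgt = pts + residuals
    # model: x' = a*x - b*y + tx ;  y' = b*x + a*y + ty
    A = np.zeros((2 * n, 4))
    A[0::2] = np.c_[pts[:, 0], -pts[:, 1], np.ones(n), np.zeros(n)]
    A[1::2] = np.c_[pts[:, 1],  pts[:, 0], np.zeros(n), np.ones(n)]
    a, b, tx, ty = np.linalg.lstsq(A, tgt.reshape(-1), rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])
```

Iterating this step drives the residual translations toward zero, which is what the stopping criterion above measures.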
© 2000 Laurent Meunier and Moritz Borgmann