

The registration phase consists of considering each pair of images and finding the transform that best maps the first one onto the second. An exhaustive search over all possible parameters can be computationally extremely expensive. So we decided to implement and compare two different approximate approaches: a frequency domain technique for finding displacement and rotation/scale, and a feature-based method relying on the accurate detection of image features.

Phase Correlation

Many image registration methods work in the spatial domain. Phase correlation, however, operates in the frequency domain, i.e. on two-dimensional DFTs of the respective images. Phase correlation has been known for a considerable time [Kuglin75] and is widely used for image registration. It exploits basic properties of the Fourier transform.

Consider two images f1(x,y) and f2(x,y) that are related by a simple shift in the spatial domain:

    f2(x,y) = f1(x - x0, y - y0)

Their corresponding Fourier transforms will then be related by

    F2(u,v) = F1(u,v) · exp(-j2π(u·x0 + v·y0))

When taking the normalized cross-power spectrum of the two images as follows, a simple complex exponential results:

    S(u,v) = F1*(u,v)·F2(u,v) / |F1*(u,v)·F2(u,v)| = exp(-j2π(u·x0 + v·y0))

The inverse Fourier transform of this is a Dirac pulse at the coordinates of the translational movement, so one searches this array for the peak to find the displacement. Even in the presence of noise this still works very well: one simply takes the location of the maximum value of the IFT as the Dirac location, and its height can in practice be used as a measure of the quality of the match. Of course, in practice one resorts to the DFT instead of the FT, which can be rapidly computed using the FFT. We used the FFTW [FFTW] public domain package for computing the two-dimensional FFTs and IFFTs.

A problem arises in the implementation, though, when the algorithm has to distinguish between negative shifts and shifts larger than N/2 (where N is the image size in one dimension). This becomes clear in a 1D analogy. The normalized cross-power spectrum is denoted by S(k); the spatial index is n, the frequency index is k:

    S(k) = exp(-j2π·k·n0/N) = exp(-j2π·k·(n0 - N)/N)

The algorithm is thus unable to distinguish between these two cases. The ambiguity is resolved by simply doubling the FFT sizes, which rules out shifts larger than N/2 but comes at the price of higher computational complexity and memory requirements.

Once the IDFT is computed, the largest peak is searched for (see Figure 1) and its value µ is recorded as a measure of the match quality. Given the focal length of the camera, which is assumed to be known a priori, the translation is then converted to a relative 3D rotation matrix.

Figure 1: IDFT of the normalized cross power spectrum
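The whole procedure can be sketched in a few lines of NumPy (np.fft stands in here for the FFTW calls actually used; the function name and the small epsilon constant are our own choices):

```python
import numpy as np

def phase_correlate(f1, f2):
    """Return the shift (dy, dx) with f2(x, y) ~ f1(x - dx, y - dy),
    plus the peak height mu as a match-quality measure."""
    h, w = f1.shape
    # Double the FFT sizes (zero padding) to resolve the ambiguity
    # between negative shifts and shifts larger than N/2.
    F1 = np.fft.fft2(f1, s=(2 * h, 2 * w))
    F2 = np.fft.fft2(f2, s=(2 * h, 2 * w))
    # Normalized cross-power spectrum (eps guards against division by zero).
    R = np.conj(F1) * F2
    R /= np.abs(R) + 1e-12
    # The IDFT is (ideally) a Dirac pulse at the displacement.
    corr = np.real(np.fft.ifft2(R))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    mu = corr[dy, dx]
    # Map indices in the doubled array back to signed shifts.
    if dy >= h:
        dy -= 2 * h
    if dx >= w:
        dx -= 2 * w
    return (dy, dx), mu
```

For two overlapping crops of the same scene, the peak lands at their relative displacement, and µ shrinks as the overlap does.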

Reddy et al. [Reddy96] describe a method of additionally detecting a rotation and scale in two images by the phase correlation method. However, we did not implement this method since we constrained ourselves to no camera rotation and a fixed focal length in the test sequences. Although some small inadvertent rotation was present in some images, we found that the simple approach still gave us excellent results. We do however use the Reddy method for correlating features (see below).

Feature Based Method

The feature-based matching algorithm we implemented can be decomposed into three steps: we first filter the images to extract feature points. Then we match sets of points to obtain an approximate registration of the two images. The last phase is an iterative refinement of the transform.

Feature Point Extraction

To locate common points of interest in the images, we use a generalization of the gray-level Harris corner detector [Harris88] to color data, introduced in [Montesinos98]. Given the matrix M:

    M = Gσ * | Rx² + Gx² + Bx²      RxRy + GxGy + BxBy |
             | RxRy + GxGy + BxBy   Ry² + Gy² + By²    |

where R, G and B are the red, green and blue pixel value functions of (x,y), subscripts denote partial derivatives, and Gσ is an isotropic Gaussian filter, the feature points are given by the positive local maxima of the function

    H(x,y) = det(M) - k·trace(M)²    (k a small constant, classically around 0.04)

The local color gradients used to compute M are estimated using the Deriche filter [Deriche87], which has low sensitivity to noise. The following figures show stages of the feature extraction pipeline.

Figure 2: Original image
Figure 3: Deriche gradient magnitudes
Figure 4: Harris function
Figure 5: Extracted features
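The detector can be sketched as follows with NumPy/SciPy. Note two assumptions of ours: Sobel derivatives stand in for the Deriche filter, and k = 0.04 is the classic Harris constant, not necessarily the value used in the original implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, sobel

def color_harris(img, sigma=1.5, k=0.04):
    """img: float array (H, W, 3). Returns the Harris response map."""
    # Per-channel derivatives (Sobel stands in for the Deriche filter).
    gx = np.stack([sobel(img[..., c], axis=1) for c in range(3)])
    gy = np.stack([sobel(img[..., c], axis=0) for c in range(3)])
    # Entries of M: channel sums, smoothed by an isotropic Gaussian.
    a = gaussian_filter((gx * gx).sum(axis=0), sigma)  # Rx^2+Gx^2+Bx^2
    b = gaussian_filter((gy * gy).sum(axis=0), sigma)  # Ry^2+Gy^2+By^2
    c = gaussian_filter((gx * gy).sum(axis=0), sigma)  # RxRy+GxGy+BxBy
    return a * b - c * c - k * (a + b) ** 2  # det(M) - k*trace(M)^2

def feature_points(resp, size=5):
    """Positive local maxima of the Harris response, as (row, col) pairs."""
    peaks = (resp == maximum_filter(resp, size)) & (resp > 0)
    return np.argwhere(peaks)
```

On a synthetic image containing a uniform square, the strongest responses cluster near the square's corners.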

Feature Point Matching

Considering two images Img1 and Img2, we want to match some of the features of Img1 to some of the features of Img2. Given those point pairs, we could derive the transform that maps all points of Img2 into Img1's space. We considered several criteria to assess the quality of a feature pairing:

  1. We first used color differential invariants, as mentioned in [Montesinos98], computing a vector of such invariants for each feature in the images. This vector is invariant under translation and rotation, and can be made invariant under scaling if the last six coordinates are normalized. A similarity measure for features can then be computed by simply using the L2 distance in this vector space. However, we realized that this measure yielded quite poor discrimination between good and bad pairs, mainly because the gradients of the three color components are often very close.
  2. The second approach we considered was correlation of the features' neighborhood windows, which gave good results for very small rotations and perspective distortions but broke down otherwise.
  3. Consequently we modified this method by reparametrizing the windows to obtain rotation and scale invariance. The polar-log parametrization turns rotation into an x-axis shift and scaling into a y-axis shift, as illustrated in the figure below [Reddy96]. Using phase correlation on a pair of such windows, we then get a matching grade, a relative local rotation and a relative local scaling of the two features' neighborhoods.
Figure 6: polar-log parametrization
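A minimal sketch of the polar-log resampling (nearest-neighbour lookup; the n_theta x n_rho window resolution is an arbitrary choice of ours):

```python
import numpy as np

def log_polar(img, center, n_theta=64, n_rho=64):
    """Resample img around center so that rotation becomes a shift along
    the theta axis and scaling a shift along the (log-)rho axis."""
    cy, cx = center
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    # Logarithmically spaced radii, from 1 to half the window size.
    rhos = np.exp(np.linspace(0.0, np.log(min(img.shape) / 2.0), n_rho))
    ys = cy + rhos[:, None] * np.sin(thetas)[None, :]
    xs = cx + rhos[:, None] * np.cos(thetas)[None, :]
    ys = np.clip(np.round(ys).astype(int), 0, img.shape[0] - 1)
    xs = np.clip(np.round(xs).astype(int), 0, img.shape[1] - 1)
    return img[ys, xs]  # shape (n_rho, n_theta)
```

Phase-correlating two such windows then yields the matching grade together with a relative rotation (shift along theta) and a relative scale (shift along rho).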

To get an approximate registration of the images, we proceed as follows:

  1. Select the N "best" feature points from both images (best in the sense that their Harris values are high).
  2. Compute all NxN matching grades, local rotation angles and local scale factors.
  3. Build a pair set S by keeping only the matches above two given thresholds, one absolute and the other relative to the best match of each feature.
  4. For each possible couple of pairs from S:
    • compute a 3x3 homogeneous matrix from the coordinates of the four points, the two scaling factors and the two rotation angles.
    • check whether the obtained projective transform lies within acceptable bounds.
    • if so, compute a global registration grade using a linear regression correlation measure applied to the pixels of the overlapping area.
  5. Keep the best acceptable transform from S.

Only a few pair couples reach the linear regression step, so the main computation time is spent on the NxN phase correlations over 32x32 windows. Overall this phase is quite fast (below 1 s). Nevertheless, the accuracy of the local rotation angles and scale factors is limited, so the resulting transform only roughly registers the two images.
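Step 4's matrix construction can be illustrated with a simplified variant: instead of the full projective transform built from the four points plus the local rotations and scales, the sketch below fits a similarity (rotation, scale, translation) to one couple of pairs using complex arithmetic. This is our simplification for illustration, not the original code:

```python
import numpy as np

def similarity_from_pairs(p1, p2, q1, q2):
    """3x3 homogeneous matrix of the similarity mapping p1 -> q1, p2 -> q2.
    Points are (x, y) tuples; we solve q = a*p + b over the complex plane."""
    pc1, pc2 = complex(*p1), complex(*p2)
    qc1, qc2 = complex(*q1), complex(*q2)
    a = (qc2 - qc1) / (pc2 - pc1)   # encodes rotation and scale
    b = qc1 - a * pc1               # translation
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag,  a.real, b.imag],
                     [0.0,     0.0,    1.0]])
```

Checking the resulting matrix against acceptable bounds (e.g. on |a|, the scale) then plays the role of step 4's plausibility test.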

Registration Refinement

To reach an arbitrarily precise registration, we use a simple iterative refinement presented in [Zoghlami97]. This algorithm relies on the fact that the local scale and rotation angle at any point are approximately correct, and that the local error essentially consists of a translation. So we select a number of evenly distributed features in the overlapping area, and find the residual local translation by standard window correlation. We then compute an optimal transform using gradient descent directly in the transform's parameter space. This scheme is illustrated in the following figure.


We can repeat this step until the linear regression grading yields the desired correlation over the overlapping area.
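One refinement iteration can be sketched as follows, with two stand-ins of ours: a similarity's (theta, s, tx, ty) parameters replace the angle-space parametrization actually used in [Zoghlami97], and SciPy's least-squares solver replaces hand-rolled gradient descent:

```python
import numpy as np
from scipy.optimize import least_squares

def refine_transform(params0, pts, matches):
    """pts: (N, 2) control points; matches: (N, 2) their locations found
    by local window correlation. Returns refined (theta, s, tx, ty)."""
    def residuals(p):
        th, s, tx, ty = p
        # Similarity transform: rotation by theta, scale s, then translation.
        R = s * np.array([[np.cos(th), -np.sin(th)],
                          [np.sin(th),  np.cos(th)]])
        return (pts @ R.T + np.array([tx, ty]) - matches).ravel()
    return least_squares(residuals, params0).x
```

Starting from the rough registration, each iteration re-measures the local translations and re-solves, until the residuals are small enough.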

Registration Example

Figure 7: Features in the left image
Figure 8: Features in the right image
Figure 9: Resulting registration

© 2000 Laurent Meunier and Moritz Borgmann