Image registration is the process of geometrically aligning two or more images of the same scene taken at different times, from different viewpoints, or by different sensors [1]. It has a wide range of applications: in remote sensing (multispectral classification, environmental monitoring, change detection, image mosaicing, weather forecasting, creating super-resolution images, integrating information into geographic information systems (GIS)), in medicine (combining computed tomography (CT) and nuclear magnetic resonance (NMR) data to obtain more complete information about the patient, monitoring tumor growth, treatment verification, comparison of the patient’s data with anatomical atlases), in cartography (map updating), in computer vision (target localization, automatic quality control, face recognition, image fusion, image retrieval), and in creating panoramic images from adjacent images [2]. Flixstock offers a number of fashion-centric computer vision solutions such as image draping, retouching, and editing. Image registration is an integral part of the solution pipeline in almost all of these tasks.


Image registration involves a linear parametric transformation of spatial coordinates that maps the source (query) image onto the target image [3]. A sparse set of corresponding point pairs between the query and target images is first selected. These control points are used to compute the parameters of the underlying linear transformation, and the query image is then registered over the target image [4]. The steps of image registration are depicted as a block diagram in Figure 1, and each step is detailed in the following sections.

Figure 1: Steps of Image Registration


Figure 2: Input Images (a) Target Image and (b) Query Image


The goal of the feature detection step is to extract the salient features common to the query and target images shown in Figure 2. Features are significant structures or regions in the image, such as lines, corners, edges, or contours, as shown in Figure 3. These feature points have to be invariant to scaling, rotation, translation, and skewing [4]. Each feature point is represented by a local descriptor that encodes the geometric information in its neighbourhood. SIFT, SURF, BRISK, and ORB are some standard feature descriptors.

Figure 3: Detected features of input images.
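
To make the idea of a local descriptor concrete, here is a minimal sketch of a BRIEF-style binary descriptor in plain Python. It is not the implementation used by ORB or BRISK: the function names (`brief_descriptor`, `make_pairs`), the toy 7x7 image, and the 16-bit length are all assumptions chosen for illustration. Each bit simply records which of two pixels near the keypoint is darker.

```python
import random

def make_pairs(n_bits, radius, seed=0):
    """Draw a fixed random set of pixel-offset pairs around the keypoint."""
    rng = random.Random(seed)

    def offset():
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(offset(), offset()) for _ in range(n_bits)]

def brief_descriptor(image, keypoint, pairs):
    """Return an integer whose bits encode intensity comparisons near `keypoint`.

    `image` is a list of rows of grey levels and `keypoint` is (row, col).
    The keypoint must lie at least `radius` pixels away from the border.
    """
    r, c = keypoint
    bits = 0
    for (dr1, dc1), (dr2, dc2) in pairs:
        first = image[r + dr1][c + dc1]
        second = image[r + dr2][c + dc2]
        bits = (bits << 1) | (1 if first < second else 0)
    return bits

# Toy image (hypothetical data): dark background with a bright square
# whose corner sits at row 3, column 3.
image = [[10] * 7 for _ in range(7)]
for row in range(3, 7):
    for col in range(3, 7):
        image[row][col] = 200

pairs = make_pairs(n_bits=16, radius=2)
desc = brief_descriptor(image, (3, 3), pairs)

# A uniform brightness change leaves every comparison, and so every bit, unchanged.
brighter = [[value + 20 for value in row] for row in image]
desc_brighter = brief_descriptor(brighter, (3, 3), pairs)
```

This toy descriptor tolerates a uniform brightness change but is not invariant to rotation or scale. Practical descriptors add those properties, for example by orienting the sampling pattern at each keypoint.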


In the feature matching stage, correspondences between the selected feature points are established using a similarity measure, as shown in Figure 4. Methods like SIFT and SURF usually use the sum of absolute differences (SAD) or the sum of squared differences (SSD), while binary features like BRISK and ORB use the Hamming distance as the similarity metric.

Figure 4: Result of feature matching.
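
The matching of binary descriptors described above can be sketched as a brute-force nearest-neighbour search under the Hamming distance. The example below is illustrative only: descriptors are stored as plain integers, and the mutual (cross) check and the `max_distance` cut-off are assumptions added to keep the toy output clean, not something the text prescribes.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded descriptors."""
    return bin(a ^ b).count("1")

def match_descriptors(query, target, max_distance=2):
    """Brute-force matching with a mutual nearest-neighbour check.

    Returns a list of (query_index, target_index, distance) tuples.
    """
    def nearest(descriptor, pool):
        return min(range(len(pool)), key=lambda j: hamming(descriptor, pool[j]))

    matches = []
    for i, descriptor in enumerate(query):
        j = nearest(descriptor, target)
        distance = hamming(descriptor, target[j])
        # Keep the pair only if each descriptor is the other's best match.
        if nearest(target[j], query) == i and distance <= max_distance:
            matches.append((i, j, distance))
    return matches

# Hypothetical 8-bit descriptors for three query and three target keypoints.
query = [0b10110010, 0b01001101, 0b11110000]
target = [0b01001111, 0b10110011, 0b00001111]

matches = match_descriptors(query, target)
print(matches)  # [(0, 1, 1), (1, 0, 1)]
```

The third query descriptor has no mutual nearest neighbour in the target set, so it is left unmatched rather than forced into a poor correspondence.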


The previous step outputs a set of matched point pairs between the query and target images. However, this set may contain some faulty matches. Therefore, an additional post-processing step is applied to remove these faulty point pairs. Most existing methods use the RANSAC [5] algorithm for outlier rejection. The result obtained after outlier removal is shown in Figure 5.

Figure 5: Result of outlier removal.
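
The outlier rejection idea behind RANSAC can be sketched in a few lines. To keep the example short, it assumes a translation-only model, so a single point pair is enough to propose a hypothesis; real registration pipelines would fit an affine or projective model instead. The function name `ransac_translation`, the threshold, the iteration count, and the point pairs are all hypothetical.

```python
import random

def ransac_translation(pairs, threshold=2.0, iterations=100, seed=0):
    """Estimate a 2D shift from point pairs while ignoring faulty matches.

    `pairs` is a list of ((x, y), (u, v)) correspondences.
    Returns ((dx, dy), inlier_indices).
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: one pair fully determines a translation.
        (qx, qy), (tx, ty) = rng.choice(pairs)
        dx, dy = tx - qx, ty - qy
        # Count the pairs that agree with this hypothesis.
        inliers = [
            i for i, ((x, y), (u, v)) in enumerate(pairs)
            if ((x + dx - u) ** 2 + (y + dy - v) ** 2) ** 0.5 <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model on the largest consensus set.
    n = len(best_inliers)
    dx = sum(pairs[i][1][0] - pairs[i][0][0] for i in best_inliers) / n
    dy = sum(pairs[i][1][1] - pairs[i][0][1] for i in best_inliers) / n
    return (dx, dy), best_inliers

# Six pairs shifted by roughly (5, -3) plus two faulty matches (indices 2 and 5).
pairs = [
    ((10.0, 10.0), (15.0, 7.0)),
    ((20.0, 5.0), (25.5, 2.0)),
    ((30.0, 30.0), (70.0, 40.0)),   # faulty match
    ((5.0, 40.0), (10.0, 37.5)),
    ((50.0, 20.0), (54.5, 16.5)),
    ((60.0, 60.0), (40.0, 90.0)),   # faulty match
    ((35.0, 15.0), (40.0, 12.0)),
    ((45.0, 50.0), (50.2, 47.2)),
]

shift, inliers = ransac_translation(pairs)
```

Only the pairs in `inliers` would be passed on as control points. The two faulty matches never gather more than their own vote, so they are discarded.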


In general, a linear spatial mapping requires prior knowledge of the type of transformation between the query and target images. For instance, a composite of translation, rotation, scaling, and skewing within the same plane is an affine transformation, whereas transformations across the plane call for a projective transformation. The control point pairs obtained in the previous stage, together with this prior knowledge of the underlying mapping, are used to compute the parameters of the required transformation, and the query image is then registered over the target image as shown in Figure 6.

Figure 6: Registered Image
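
As an illustration of computing transformation parameters from control points, the sketch below estimates an affine transformation from exactly three point pairs. An affine map has six parameters, so three non-collinear pairs determine it; a projective transformation has eight and would need at least four pairs. The helper names (`det3`, `solve3`, `estimate_affine`, `apply_affine`) and the sample points are assumptions for this example, and a real pipeline would typically fit all inlier pairs by least squares rather than just three.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve the 3x3 linear system m * x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution

def estimate_affine(src, dst):
    """Return ((a, b, c), (d, e, f)) so that u = a*x + b*y + c and v = d*x + e*y + f."""
    m = [[x, y, 1.0] for x, y in src]
    row_u = solve3(m, [u for u, _ in dst])
    row_v = solve3(m, [v for _, v in dst])
    return tuple(row_u), tuple(row_v)

def apply_affine(params, point):
    """Map a query-image point into the target image."""
    (a, b, c), (d, e, f) = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

# Hypothetical control points: query-image coordinates and their target positions.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (2.0, 5.0), (1.0, 3.0)]

params = estimate_affine(src, dst)
mapped = apply_affine(params, (1.0, 1.0))
print(mapped)  # (1.0, 5.0)
```

Registering the full query image then amounts to applying this mapping to every pixel coordinate and resampling the intensities onto the target grid.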


Image registration is a very useful technique in various image editing applications. Current state-of-the-art methods perform well in constrained scenarios but fail to yield satisfactory results in challenging environments. Objects that lack high-frequency detail, or that are partly hidden by occluding objects, may yield too few feature points. Moreover, conventional similarity metrics often produce incorrect matches during feature matching. These challenges call for an automated deep architecture that learns the underlying mapping between the input and output space and, in turn, yields more robust features and similarity metrics.


[1] Saxena, Siddharth, and Rajeev Kumar Singh. “A survey of recent and classical image registration methods.” International journal of signal processing, image processing and pattern recognition 7, no. 4 (2014): 167-176.

[2] Zitova, Barbara, and Jan Flusser. “Image registration methods: a survey.” Image and vision computing 21, no. 11 (2003): 977-1000.

[3] Leng, Chengcai, Hai Zhang, Bo Li, Guorong Cai, Zhao Pei, and Li He. “Local feature descriptor for image matching: A survey.” IEEE Access 7 (2018): 6424-6434.

[4] Bisht, Sombir Singh, Bhumika Gupta, and Parvez Rahi. “Image registration concept and techniques: a review.” J Eng Res App 4 (2014): 30-5.

[5] Dung, Lan-Rong, Chang-Min Huang, and Yin-Yi Wu. “Implementation of RANSAC algorithm for feature-based image registration.” J. Comput. Commun 1, no. 6 (2013): 46-50.