Project 4: [Auto]Stitching Photo Mosaics
Tej Bade, tbade12@berkeley.edu
Part A: Image Warping and Mosaicing
Taking the Pictures
The first step is taking pictures of a scene from different perspectives while keeping the same center of projection.
We then identify feature points, or correspondences, between the two images; these will be used to warp one image into the coordinate space of the other. The images with labeled correspondences are shown below.
In the following sections, im1 refers to the left image, im2 refers to the right image, and im1_corr and im2_corr refer to the correspondence points of im1 and im2 respectively.
Recovering Homographies
The homography matrix is a 3×3 matrix of the form

$$H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix}$$

that will transform the points im1_corr to the points im2_corr. Specifically, if $(x, y)$ and $(x', y')$ is a correspondence pair, then we want

$$H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = w \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}$$

for some scalar $w$. Manipulating the equations that result from this setup gives us a new matrix equation:

$$\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}$$

Thus, every correspondence pair gives us 2 rows in the leftmost matrix (yielding 2 equations in 8 variables). Having more than 4 correspondence points will yield an overdetermined system with 8 variables and more than 8 equations. In this case, we can use the method of linear least squares to find the optimal values of $a, b, \ldots, h$ that best satisfy the matrix equation and construct the homography matrix $H$.
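A minimal sketch of this least-squares setup (the helper name compute_homography and the (N, 2) point layout are illustrative assumptions, not necessarily the code used here):

```python
import numpy as np

def compute_homography(im1_corr, im2_corr):
    """Estimate the 3x3 homography mapping im1_corr onto im2_corr
    via linear least squares. Points are (N, 2) arrays of (x, y)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(im1_corr, im2_corr):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # Solve the (possibly overdetermined) system A h = b for the 8 unknowns.
    h, *_ = np.linalg.lstsq(np.array(A, dtype=float),
                            np.array(b, dtype=float), rcond=None)
    return np.append(h, 1).reshape(3, 3)  # bottom-right entry fixed to 1
```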
Warping the Images
We are given two images im1 and im2 representing different perspectives of the same scene and want to produce a “stitched image.” We use the correspondences to compute the homography matrix H from im1 to im2. We then warp the corners of im1 using H to find the polygon that defines warped_im1, the warped version of im1 within the coordinate space of im2. Since the coordinates of the pixels in the polygon can be negative, we shift the entire image in the x and y directions so that all coordinates are positive. For each pixel in this polygon, we shift the coordinates back and apply the inverse of H to get the corresponding pixel in im1 to interpolate from (I used scipy.interpolate.RegularGridInterpolator with method = ‘linear’ for the interpolation). Finally, I use im2, warped_im1, and the x and y shifts to stitch the images together. In this process, im2 is essentially placed on top of warped_im1 without any weighted averaging. The result of this method using the first pair of images from the “Taking the Pictures” section is shown below.
Notice that we can easily tell that im2 is placed on top of warped_im1 from the edge artifacts. This is resolved in the “Blending the Images into a Mosaic” section.
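A sketch of the inverse-warping step described above (warp_image, out_shape, and shift are hypothetical names, and filling out-of-bounds pixels with zeros is an assumption):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def warp_image(im1, H, out_shape, shift):
    """Inverse-warp im1 through homography H into an output canvas.
    shift = (sx, sy) is the translation that made all warped coordinates
    positive; we shift back before applying H^-1."""
    h_out, w_out = out_shape
    sx, sy = shift
    # Output pixel grid, shifted back into im2's coordinate space.
    xs, ys = np.meshgrid(np.arange(w_out) - sx, np.arange(h_out) - sy)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Map each output pixel back into im1 and divide out the scale w.
    src = np.linalg.inv(H) @ pts
    src = src[:2] / src[2]
    # Interpolate im1 at the (row, col) source locations.
    interp = RegularGridInterpolator(
        (np.arange(im1.shape[0]), np.arange(im1.shape[1])), im1,
        method="linear", bounds_error=False, fill_value=0)
    return interp(np.stack([src[1], src[0]], axis=-1)).reshape(h_out, w_out, -1)
```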
Image Rectification
Image rectification refers to taking a picture that contains an object known to be rectangular and warping the image so that the object is a rectangle that aligns with the x- and y-axes of the image. I used the two images below.
To do this, I picked the 4 corners of the rectangular object as correspondences, found the dimensions of the object by taking the distances between correspondence points, and warped the image with the method in the previous section. The homography matrix is computed by taking the object corners as "im1_corr" and the points $(0, 0), (w, 0), (0, h), (w, h)$ as "im2_corr", where $w$ and $h$ are the dimensions of the object. The original images are shown below on the left (along with the corners of the object) and the rectified images are shown below on the right.
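For illustration, a sketch of this setup (the corner coordinates below are made-up example values; compute_homography is the helper sketched earlier):

```python
import numpy as np

# Hypothetical corner clicks in (x, y) order:
# top-left, top-right, bottom-left, bottom-right.
corners = np.array([[120, 80], [530, 95], [110, 600], [545, 615]])
w = int(np.linalg.norm(corners[1] - corners[0]))  # top edge length
h = int(np.linalg.norm(corners[2] - corners[0]))  # left edge length
target = np.array([[0, 0], [w, 0], [0, h], [w, h]])
H = compute_homography(corners, target)  # then warp as in the previous section
```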
Blending the Images into a Mosaic
The warping technique used before works well, but there are noticeable edge artifacts. In other words, it is obvious that we are just placing im2 on top of warped_im1. To blend the images better, I use cv2.distanceTransform to create an "alpha mask", where each pixel inside the boundary of an image is set to its distance from the nearest image edge. The images below show the masks for warped_im1 and im2 for the first pair of images in the "Taking the Pictures" section.
Then, I normalize the masks and use them to compute a weighted average of warped_im1 and im2. This produces the final blended image below.
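A minimal sketch of this blending step (cv2.distanceTransform is the real OpenCV call; the function names and the assumption that im2 has already been placed on a canvas the size of warped_im1 are mine):

```python
import cv2
import numpy as np

def distance_alpha(inside_mask):
    """Alpha mask: each pixel inside the image footprint gets its L2
    distance to the nearest edge. inside_mask is uint8, 1 inside, 0 outside."""
    return cv2.distanceTransform(inside_mask, cv2.DIST_L2, 5)

def blend(warped_im1, im2_canvas, alpha1, alpha2):
    """Per-pixel weighted average using the normalized distance masks."""
    total = alpha1 + alpha2
    total[total == 0] = 1  # avoid division by zero outside both footprints
    w1 = (alpha1 / total)[..., None]
    w2 = (alpha2 / total)[..., None]
    return w1 * warped_im1 + w2 * im2_canvas
```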
Shown below are the results for the two other pairs of images.
Part B: Feature Matching for Autostitching
The goal of this part is to complete all of the steps in part A without manually choosing correspondence points.
Detecting Features
I first use skimage.feature.corner_harris and skimage.feature.peak_local_max to compute the Harris corner measure responses and get the Harris corners for the image. I set threshold_rel to 0.01 in the peak_local_max function (since otherwise the entire image would be filled with corners) and discard corners near the image edges. Shown below are the resulting Harris corners on the two Soda Hall images from part A.
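A sketch of this detection step (corner_harris and peak_local_max are the real skimage functions; the 20-pixel edge margin is an assumed value):

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im_gray, edge_discard=20):
    """Return the Harris response map and (row, col) corner coordinates,
    dropping corners within edge_discard pixels of the border."""
    h_map = corner_harris(im_gray)
    coords = peak_local_max(h_map, threshold_rel=0.01)
    r, c = coords[:, 0], coords[:, 1]
    keep = ((r > edge_discard) & (r < im_gray.shape[0] - edge_discard) &
            (c > edge_discard) & (c < im_gray.shape[1] - edge_discard))
    return h_map, coords[keep]
```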
Notice that there are a lot of clumps containing feature points that are very close to each other. Using Adaptive Non-Maximal Suppression (ANMS), we can reduce the total number of features while keeping them spread apart. ANMS works by first calculating the suppression radius $r_i$ for each point $\mathbf{x}_i$ as follows:

$$r_i = \min_j \lVert \mathbf{x}_i - \mathbf{x}_j \rVert \quad \text{s.t.} \quad f(\mathbf{x}_i) < c_{\text{robust}} \, f(\mathbf{x}_j),$$

where $\lVert \mathbf{x}_i - \mathbf{x}_j \rVert$ is the Euclidean distance between the points $\mathbf{x}_i$ and $\mathbf{x}_j$, $f(\mathbf{x})$ is the Harris corner response of point $\mathbf{x}$, and $c_{\text{robust}}$ controls how much stronger a neighbor must be to suppress a point. Then, we choose the top 100 points with the largest suppression radii. Shown below are the 100 points for both of the Soda Hall images.
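A naive O(n²) sketch of ANMS under these definitions (the helper name is mine, and c_robust = 0.9 is the value suggested in the original MOPS paper, not necessarily the one used here):

```python
import numpy as np

def anms(coords, h_map, n_keep=100, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii.
    coords are (row, col) corner locations; h_map is the Harris response."""
    f = h_map[coords[:, 0], coords[:, 1]]
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        stronger = f[i] < c_robust * f  # points that dominate point i
        if stronger.any():
            dists = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = dists.min()
    return coords[np.argsort(-radii)[:n_keep]]
```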
Extracting Feature Descriptors
We then assign a descriptor for each feature, which contains information about the pixels around that feature point. For each point, we take the 40 by 40 patch centered at that point, downsample it to an 8 by 8 patch, and flatten it into a 64-dimensional vector. This gives us 100 64-dimensional vectors for each image that we can use to match features.
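A sketch of this extraction, assuming the patch is subsampled by taking every 5th pixel (the write-up does not say whether the patch is smoothed before downsampling, so this version subsamples directly):

```python
import numpy as np

def extract_descriptors(im_gray, coords):
    """One 64-vector per (row, col) corner: a 40x40 patch centered on the
    corner, downsampled to 8x8 and flattened."""
    descriptors = []
    for r, c in coords:
        patch = im_gray[r - 20:r + 20, c - 20:c + 20]
        descriptors.append(patch[::5, ::5].ravel())  # 8x8 -> 64 values
    return np.array(descriptors)
```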
Feature Matching
For each point $p$ in the first image, we find the points $q_1$ and $q_2$ in the second image with the closest and second-closest feature vectors (using Euclidean distance as the metric). We can then employ Lowe's trick for rejecting possible outliers. If $d(p, q)$ is the squared Euclidean distance between the feature vectors of points $p$ and $q$, then we only keep the match if $d(p, q_1) / d(p, q_2)$ is below a threshold. I set the threshold to 0.7. Doing this gives us the following feature matches for the Soda Hall images:
Notice that there are still some outliers.
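A brute-force sketch of this ratio test (match_features is a hypothetical name):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.7):
    """Keep a match only when the nearest neighbor in feature space is
    sufficiently closer than the second nearest (Lowe's trick)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.sum((desc2 - d) ** 2, axis=1)  # squared distances
        nn1, nn2 = np.argsort(dists)[:2]
        if dists[nn1] / dists[nn2] < ratio:
            matches.append((i, nn1))
    return matches
```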
Robust Homographies
To further reject the outliers, we can implement the RANSAC method. For each iteration, we randomly sample four feature points, compute the homography from those points, and find the number of inliers among the features, where an inlier is a point in one image that, when projected using the homography, is close to its corresponding point in the other image. We set a threshold for how close the projected point has to be to the corresponding point and keep track of the largest set of inliers across the iterations. The resulting inliers from the method are shown below on the Soda Hall images.
Finally, to autostitch the images, we use the inlier set from the RANSAC method as our correspondences for im1 and im2 in the stitching/blending method from part A.
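A sketch of this RANSAC loop (the iteration count and pixel threshold are assumed values; compute_homography is the least-squares helper sketched in part A):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    """4-point RANSAC: fit H to random samples and keep the largest
    inlier set. pts1/pts2 are matched (N, 2) arrays of (x, y)."""
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        # Project pts1 through H and measure distance to pts2.
        proj = H @ np.vstack([pts1.T, np.ones(len(pts1))])
        proj = (proj[:2] / proj[2]).T
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```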
Final Results
The following pairs of images consist of the mosaic made with manually chosen correspondences on the left (these are the images computed in part A) alongside the autostitched mosaics on the right.
Reflection
I think the coolest things I learned were the different techniques to reject feature outliers, including Lowe's thresholding and the RANSAC method. It is very interesting that we can automatically find feature matches between two images even when the information around multiple corners in an image is similar.