Fall 2021 CS543/ECE549

Assignment 3: Robust estimation and geometric vision

Due date: Thursday, November 4, 11:59:59PM

The goal of this assignment is to implement homography and fundamental matrix estimation to register pairs of images, as well as attempt camera calibration, triangulation, and single-view 3D measurements.


Download starter code

This zip file contains the starter code and the report template. Read on for complete details and instructions.

Part 1: Stitching pairs of images

The first step is to write code to stitch together a single pair of images. For this part, you will be working with the following pair (click on the images to download the high-resolution versions):

  1. Load both images, convert to double and to grayscale.

  2. Detect feature points in both images. We provide Harris detector code you can use (it is also copied into the starter .ipynb). Alternatively, feel free to use the blob detector you wrote for Assignment 2.

  3. Extract local neighborhoods around every keypoint in both images, and form descriptors simply by "flattening" the pixel values in each neighborhood to one-dimensional vectors. Experiment with different neighborhood sizes to see which one works the best. If you're using your Laplacian detector, use the detected feature scales to define the neighborhood scales.

    Optionally, feel free to experiment with SIFT descriptors. You can use the OpenCV library to extract keypoints and compute descriptors through the function cv2.SIFT_create().detectAndCompute (in OpenCV versions before 4.4, this lives in the contrib module as cv2.xfeatures2d.SIFT_create()). This tutorial provides details about using SIFT in OpenCV.

  4. Compute distances between every descriptor in one image and every descriptor in the other image. In Python, you can use scipy.spatial.distance.cdist(X,Y,'sqeuclidean') for fast computation of Euclidean distance. If you are not using SIFT descriptors, you should experiment with computing normalized correlation, or Euclidean distance after normalizing all descriptors to have zero mean and unit standard deviation.

  5. Select putative matches based on the matrix of pairwise descriptor distances obtained above. You can select all pairs whose descriptor distances are below a specified threshold, or select the top few hundred descriptor pairs with the smallest pairwise distances.

  6. Implement RANSAC to estimate a homography mapping one image onto the other. Report the number of inliers and the average residual for the inliers (squared distance between the point coordinates in one image and the transformed coordinates of the matching point in the other image). Also, display the locations of inlier matches in both images by using plot_inlier_matches (provided in the starter .ipynb).

  7. Warp one image onto the other using the estimated transformation. In Python, use skimage.transform.ProjectiveTransform and skimage.transform.warp.

  8. Create a new image big enough to hold the panorama and composite the two images into it. You can composite by averaging the pixel values where the two images overlap, or by using the pixel values from one of the images. Your result should look something like this:

  9. You should create a color panorama by applying the same compositing step to each of the color channels separately (for estimating the transformation, it is sufficient to use grayscale images).
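
The descriptor and matching steps (3-5) can be sketched as below. This is only a minimal illustration, not the required implementation; the function names (extract_descriptors, putative_matches), the patch radius, and the number of putative matches are all illustrative choices, and keypoints are assumed to lie far enough from the image border for a full patch to exist.

```python
import numpy as np
from scipy.spatial.distance import cdist

def extract_descriptors(img, keypoints, radius=10):
    """Flatten the (2*radius+1)^2 neighborhood around each (row, col) keypoint.
    Assumes keypoints lie at least `radius` pixels from the border."""
    return np.array([img[r - radius:r + radius + 1,
                         c - radius:c + radius + 1].ravel()
                     for r, c in keypoints], dtype=float)

def normalize(descs):
    """Give each descriptor zero mean and unit standard deviation."""
    descs = descs - descs.mean(axis=1, keepdims=True)
    return descs / (descs.std(axis=1, keepdims=True) + 1e-8)

def putative_matches(d1, d2, n_best=200):
    """Return (i, j) index pairs of the n_best smallest descriptor distances."""
    dists = cdist(normalize(d1), normalize(d2), 'sqeuclidean')
    flat = np.argsort(dists, axis=None)[:n_best]
    return np.column_stack(np.unravel_index(flat, dists.shape))
```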
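
Steps 7-8 can be sketched with scikit-image as follows. Keep in mind that skimage transforms operate on (x, y) = (column, row) coordinates, so the homography H must be estimated in that convention; also, treating zero-valued pixels as "empty" is a simplification that works for this sketch but would misbehave on images with genuinely black regions. The composite function and its assumption that img1 already sits at the origin of the panorama frame are illustrative.

```python
import numpy as np
from skimage.transform import ProjectiveTransform, warp

def composite(img1, img2, H, out_shape):
    """Warp img2 into img1's frame using homography H (img2 -> img1,
    in skimage's (x, y) = (col, row) convention) and average the overlap."""
    tform = ProjectiveTransform(matrix=H)
    # warp() expects the inverse map: output (panorama) coords -> img2 coords
    warped2 = warp(img2, tform.inverse, output_shape=out_shape)
    canvas = np.zeros(out_shape)
    canvas[:img1.shape[0], :img1.shape[1]] = img1
    overlap = (canvas > 0) & (warped2 > 0)   # crude "non-empty" test
    out = canvas + warped2
    out[overlap] /= 2.0                      # average where both contribute
    return out
```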

Tips and Details

  • For RANSAC, a very simple implementation is sufficient. Use four matches to initialize the homography in each iteration. You should output a single transformation that gets the most inliers in the course of all the iterations. For the various RANSAC parameters (number of iterations, inlier threshold), play around with a few "reasonable" values and pick the ones that work best. Refer to this lecture for details on RANSAC.

  • For details of homography fitting, you should review this lecture.

  • Homography fitting calls for homogeneous least squares. The solution to the homogeneous least squares system AX=0 is obtained from the SVD of A as the singular vector corresponding to the smallest singular value. In Python, U, s, V = numpy.linalg.svd(A) performs the singular value decomposition; note that the returned V is already transposed, so V[-1] (equivalently V[len(V)-1]) is the singular vector corresponding to the smallest singular value.
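
The homogeneous least squares fit and the simple RANSAC loop described in the tips above might be sketched as follows. The parameter values (iteration count, inlier threshold) are illustrative, not prescribed, and fit_homography / ransac_homography are our own names.

```python
import numpy as np

def fit_homography(pts1, pts2):
    """Direct linear transform: find H with pts2 ~ H pts1 by taking the
    singular vector of A corresponding to its smallest singular value."""
    A = []
    for (x, y), (u, v) in zip(pts1, pts2):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, V = np.linalg.svd(np.array(A))   # V is returned transposed
    return V[-1].reshape(3, 3)

def apply_h(H, pts):
    """Apply H to N x 2 points and dehomogenize."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(pts1, pts2, n_iter=2000, thresh=25.0, seed=None):
    """Minimal RANSAC: 4-point samples, keep the model with most inliers,
    refit on all inliers at the end. thresh is a squared pixel distance."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(pts1), 4, replace=False)
        H = fit_homography(pts1[idx], pts2[idx])
        resid = np.sum((apply_h(H, pts1) - pts2) ** 2, axis=1)
        inliers = resid < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(pts1[best], pts2[best]), best
```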

For extra credit

  • Extend your homography estimation to work on multiple images. You can use this data, consisting of three sequences of three images each. For the "pier" sequence, sample output can look as follows (although yours may be different if you choose a different order of transformations):

    Alternatively, feel free to acquire your own images and stitch them.

  • Experiment with registering very "difficult" image pairs or sequences -- for instance, try to find a modern and a historical view of the same location to mimic the kinds of composites found here. Or try to find two views of the same location taken at different times of day, different times of year, etc. Another idea is to try to register images with a lot of repetition, or images separated by an extreme transformation (large rotation, scaling, etc.). To make stitching work for such challenging situations, you may need to experiment with alternative feature detectors and/or descriptors, as well as feature space outlier rejection techniques such as Lowe's ratio test.

  • Try to implement a more complete version of a system for "Recognizing panoramas" -- i.e., a system that can take as input a "pile" of input images (including possible outliers), figure out the subsets that should be stitched together, and then stitch them together. As data for this, either use images you take yourself or combine all the provided input images into one folder (plus, feel free to add outlier images that do not match any of the provided ones).

  • Implement bundle adjustment or global nonlinear optimization to simultaneously refine transformation parameters between all pairs of images.

  • Learn about and experiment with image blending techniques and panorama mapping techniques (cylindrical or spherical).

Part 2: Fundamental Matrix Estimation, Camera Calibration, Triangulation

You will be using these two image pairs:

Source of Lab image pair:
GeorgiaTech CS4476/6476

Download full-size images and data files.

  1. Fundamental matrix estimation. Load each image pair and matching points file using the provided sample code (it is also copied into the starter .ipynb). Add your own code to fit a fundamental matrix to the matching points and use the sample code to visualize the results. You need to implement and compare the normalized and the unnormalized algorithms (see this lecture for the methods). For each algorithm and each image pair, report your residual, or the mean squared distance in pixels between points in both images and the corresponding epipolar lines.

  2. Camera calibration. For the lab pair, calculate the camera projection matrices by using 2D matches in both views and 3-D point coordinates given in lab_3d.txt in the data file. Refer to this lecture for the calibration method. Once you have computed your projection matrices, you can evaluate them using this sample function (it is also copied into the starter .ipynb), which will provide you the projected 2-D points and residual error. (Hint: For a quick check to make sure you are on the right track, empirically this residual error should be < 20 and the squared distance of your projected 2-D points from actual 2-D points should be < 4.)

    For the library pair, there are no ground truth 3D points. Instead, camera projection matrices are already provided in library1_camera.txt and library2_camera.txt.

  3. Calculate the camera centers using the estimated or provided projection matrices for both pairs.

  4. Triangulation. Use linear least squares to triangulate the 3D position of each matching pair of 2D points given the two camera projection matrices (see this lecture for the method). As a sanity check, your triangulated 3D points for the lab pair should match very closely the originally provided 3D points in lab_3d.txt. For each pair, display the two camera centers and reconstructed points in 3D. Also report the residuals between the observed 2D points and the projected 3D points in the two images.
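
One possible shape for the fundamental matrix estimation in item 1 is sketched below: point normalization, homogeneous least squares, rank-2 enforcement, then de-normalization. It assumes the convention x2^T F x1 = 0 for a match (x1, x2); the normalized flag lets you run the required normalized/unnormalized comparison. Function names are our own.

```python
import numpy as np

def normalize_points(pts):
    """Translate the centroid to the origin and scale so the mean distance
    from the origin is sqrt(2); return transformed points and the 3x3 T."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return ph[:, :2], T

def fit_fundamental(pts1, pts2, normalized=True):
    """Eight-point algorithm; enforces rank 2. Convention: x2^T F x1 = 0."""
    T1 = T2 = np.eye(3)
    p1, p2 = pts1, pts2
    if normalized:
        p1, T1 = normalize_points(pts1)
        p2, T2 = normalize_points(pts2)
    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1]
                  for (u1, v1), (u2, v2) in zip(p1, p2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)              # enforce rank 2:
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt  # zero the smallest singular value
    return T2.T @ F @ T1                     # undo normalization
```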
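
The calibration step in item 2 is another homogeneous least squares problem. A sketch, assuming pts_2d and pts_3d are corresponding N x 2 and N x 3 arrays (the function names are illustrative):

```python
import numpy as np

def calibrate(pts_2d, pts_3d):
    """Estimate the 3x4 projection matrix P from 2D-3D correspondences
    by homogeneous least squares (needs at least 6 points)."""
    A = []
    for (u, v), (X, Y, Z) in zip(pts_2d, pts_3d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.array(A))
    return Vt[-1].reshape(3, 4)

def project(P, pts_3d):
    """Project N x 3 points with P and dehomogenize."""
    x = np.column_stack([pts_3d, np.ones(len(pts_3d))]) @ P.T
    return x[:, :2] / x[:, 2:3]
```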

Tips and Details

  • For fundamental matrix estimation, don't forget to enforce the rank-2 constraint. This can be done by taking the SVD of F, setting the smallest singular value to zero, and recomputing F.

  • Recall that the camera center is given by the null space of the camera matrix. It can be found by taking the SVD of the camera matrix and taking the right singular vector corresponding to the smallest singular value (the last column of V; with numpy.linalg.svd, which returns V transposed, the last row of the returned matrix).

  • You do not need the camera centers to solve the triangulation problem. They are used just for the visualization.
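
The camera-center and triangulation computations in items 3-4 and the tips above can be sketched as follows (one match at a time for clarity; vectorizing over all matches is left as a design choice):

```python
import numpy as np

def camera_center(P):
    """The camera center is the null space of P: the right singular vector
    for the smallest singular value, dehomogenized."""
    _, _, Vt = np.linalg.svd(P)
    C = Vt[-1]
    return C[:3] / C[3]

def triangulate(x1, x2, P1, P2):
    """Linear least-squares triangulation of one match (x1, x2 in pixels):
    each image gives two rows of the homogeneous system A X = 0."""
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```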

Extra Credit

  • Use your putative match generation and RANSAC code from Part 1 to estimate fundamental matrices without ground-truth matches. For this part, only use the normalized algorithm. Report the number of inliers and the average residual for the inliers. Compare the quality of the result with the one you get from ground-truth matches.

Part 3: Single-View Geometry

  1. You will be working with the above image of the North Quad (save it to get the full-resolution version). First, you need to estimate the three major orthogonal vanishing points. Use at least three manually selected lines to solve for each vanishing point. The starter code provides an interface for selecting and drawing the lines, but the code for computing the vanishing point needs to be inserted. For details on estimating vanishing points, see Derek Hoiem's book chapter (section 4). You should also refer to this chapter and the single-view metrology lecture for details on the subsequent steps. In your report, you should:
    • Plot the VPs and the lines used to estimate them on the image plane using the provided code.
    • Specify the VP pixel coordinates.
    • Plot the ground horizon line and specify its parameters in the form a * x + b * y + c = 0. Normalize the parameters so that: a^2 + b^2 = 1.

  2. Using the fact that the vanishing directions are orthogonal, solve for the focal length and optical center (principal point) of the camera. Show all your work.

  3. Compute the rotation matrix for the camera, setting the vertical vanishing point as the Y-direction, the right-most vanishing point as the X-direction, and the left-most vanishing point as the Z-direction.

  4. Estimate the heights of (a) the CSL building, (b) the spike statue, and (c) the lamp posts assuming that the person nearest to the spike is 5ft 6in tall. In the report, show all the lines and measurements used to perform the calculation. How do the answers change if you assume the person is 6ft tall?
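
For item 1, a common approach represents each hand-selected line as a homogeneous 3-vector (e.g., the cross product of two points on it) and intersects three or more such lines in a least-squares sense. A sketch, with illustrative function names:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(lines):
    """Least-squares intersection of >= 2 homogeneous lines: the singular
    vector of the stacked line matrix with the smallest singular value."""
    _, _, Vt = np.linalg.svd(np.asarray(lines, dtype=float))
    return Vt[-1]

def horizon(vp1, vp2):
    """Line through two homogeneous vanishing points, scaled so a^2 + b^2 = 1."""
    l = np.cross(vp1, vp2)
    return l / np.linalg.norm(l[:2])
```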
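
For items 2-3, the orthogonality constraints v_i^T w v_j = 0 (where w = K^-T K^-1, assuming zero skew and square pixels) become linear after the substitution c = u0^2 + v0^2 + f^2. A sketch of that approach is below; the rotation columns are recovered as normalized K^-1 v directions and are each determined only up to sign, so you still need to fix signs and assign axes as item 3 specifies.

```python
import numpy as np

def calibrate_from_vps(v1, v2, v3):
    """Recover f and (u0, v0) from three mutually orthogonal vanishing points
    in pixel coordinates. With zero skew and square pixels each pair gives
    x_i*x_j + y_i*y_j - u0*(x_i + x_j) - v0*(y_i + y_j) + c = 0,
    where c = u0^2 + v0^2 + f^2, which is linear in (u0, v0, c)."""
    vps = [np.asarray(v, dtype=float) for v in (v1, v2, v3)]
    A, b = [], []
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        (xi, yi), (xj, yj) = vps[i], vps[j]
        A.append([-(xi + xj), -(yi + yj), 1.0])
        b.append(-(xi * xj + yi * yj))
    u0, v0, c = np.linalg.solve(np.array(A), np.array(b))
    return np.sqrt(c - u0**2 - v0**2), (u0, v0)

def rotation_from_vps(vx, vy, vz, f, u0, v0):
    """Each column of R is the normalized direction K^-1 v of a vanishing
    point (determined only up to sign)."""
    Kinv = np.linalg.inv(np.array([[f, 0, u0], [0, f, v0], [0, 0, 1.0]]))
    cols = [Kinv @ np.array([v[0], v[1], 1.0]) for v in (vx, vy, vz)]
    return np.column_stack([d / np.linalg.norm(d) for d in cols])
```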

Part 3 Extra Credit

  • Perform additional measurements on the image: which of the visible people is the tallest? What are the heights of the windows? etc.

  • Compute and display rectified views of the ground plane and the facades of the CSL building.

  • Attempt to fit lines to the image and estimate vanishing points automatically either using your own code or code downloaded from the web.

  • Find or take other images with three prominently visible orthogonal vanishing points and demonstrate various measurements on those images.

Grading checklist

Be sure to include the following in your report:
  1. Homography estimation:
    1. Describe your solution, including any interesting parameters or implementation choices for feature extraction, putative matching, RANSAC, etc.
    2. For the image pair provided, report the number of homography inliers and the average residual for the inliers (squared distance between the point coordinates in one image and the transformed coordinates of the matching point in the other image). Also, display the locations of inlier matches in both images.
    3. Display the final result of your stitching.

  2. Fundamental matrix estimation, calibration, triangulation:
    1. For both image pairs, for both unnormalized and normalized fundamental matrix estimation, display your result (points and epipolar lines) and report your residual.
    2. For the lab image pair, show your estimated 3x4 camera projection matrices. Report the residual between the projected and observed 2D points.
    3. For both image pairs, visualize 3D camera centers and triangulated 3D points.

  3. Single-view geometry: See items 1-4 in Part 3 above.

Submission Instructions

You must upload the following files to Compass 2g.
  1. All your code for all three parts in a SINGLE zipped file. The filename should be netid_mp3_code.zip. There is no need for PDFs of any ipython notebook output, just make sure you include the notebooks themselves in the zip file and show any required outputs in the report.
  2. A single report for all three parts in PDF format. The filename should be netid_mp3_report.pdf.

Don't forget to hit "Submit" after uploading your files, otherwise we will not receive your submission.

Please refer to course policies on late submission, academic integrity, etc.