Fall 2024 CS543/ECE549

Assignment 5: Two-view and multi-view geometry

Due date: Wednesday, December 11, 11:59:59PM

Contents

Part 1: Fundamental Matrix Estimation, Camera Calibration, Triangulation

For this part, you will be using these four image pairs:




First, download the
full-size images and data files and part 1 starter code. Note, the images shown above are just thumbnails and should not be used directly. Also note that the house and gaudi pairs are only to be used for item 5 below.

  1. Fundamental matrix estimation from ground truth matches. Load the lab and library image pairs and matching points file using the starter code. Add your own code to fit a fundamental matrix to the matching points and use the sample code to visualize the results. You need to implement and compare the normalized and the unnormalized algorithms (see this lecture for the methods). For each algorithm and each image pair, report your residual, or the mean squared distance in pixels between points in both images and the corresponding epipolar lines.

  2. Camera calibration. For the lab pair, calculate the camera projection matrices by using 2D matches in both views and 3-D point coordinates given in lab_3d.txt in the data file. Refer to this lecture for the calibration method. Once you have computed your projection matrices, you can evaluate them using the evaluate_points function included in the starter code, which will provide you the projected 2-D points and residual error. (Hint: For a quick check to make sure you are on the right track, empirically this residual error should be < 20 and the squared distance of your projected 2-D points from actual 2-D points should be < 4.)

    For the library pair, there are no ground truth 3D points. Instead, camera projection matrices are already provided in library1_camera.txt and library2_camera.txt.

  3. Calculate the camera centers for the lab and library pairs using the estimated or provided projection matrices.

  4. Triangulation. For the lab and library pairs, use linear least squares to triangulate the 3D position of each matching pair of 2D points given the two camera projection matrices (see this lecture for the method). As a sanity check, your triangulated 3D points for the lab pair should match very closely the originally provided 3D points in lab_3d.txt. For each pair, display the two camera centers and reconstructed points in 3D. Also report the residuals between the observed 2D points and the projected 3D points in the two images.

  5. Fundamental matrix estimation without ground-truth matches. The provided house and gaudi image pairs do not include ground truth 2D matches. For these pairs, you will use your putative match generation and RANSAC code from Assignment 3 to estimate fundamental matrices. To generate matches, you should use the SIFT descriptor functions from OpenCV mentioned in Assignment 3. For this part, only use the normalized algorithm. Report the number of inliers and the average residual for the inliers, and display the inliers in each image.

Part 1 Tips and Details

  • For fundamental matrix estimation, don't forget to enforce the rank-2 constraint. This can be done by taking the SVD of F, setting the smallest singular value to zero, and recomputing F.

  • Recall that the camera centers are given by the null spaces of the matrices. They can be found by taking the SVD of the camera matrix and taking the last column of V.

  • You do not need the camera centers to solve the triangulation problem. They are used just for the visualization.

Part 1 Extra Credit

  • Perform nonlinear refinement of esimated fundamental matrices and/or triangulated points by minimizing pixel-level residuals as explained in the lectures.
  • Estimate the fundamental matrix using the seven-point algorithm.

Part 2: Affine Factorization

The goal of this part of the assignment is to implement the Tomasi and Kanade affine structure from motion method as described in this lecture. You will be working with Carlo Tomasi's 101-frame hotel sequence:



Download the data file, including all the 101 images and a measurement matrix consisting of 215 points visible in each of the 101 frames (see readme file inside archive for details).

  1. Load the data matrix and normalize the point coordinates by translating them to the mean of the points in each view (see lecture for details).

  2. Apply SVD to the 2M x N data matrix to express it as D = U @ W @ V' (using NumPy notation) where U is a 2Mx3 matrix, W is a 3x3 matrix of the top three singular values, and V is a Nx3 matrix. You can use numpy.linalg.svd to compute this decomposition. Next, derive structure and motion matrices from the SVD as explained in the lecture.

  3. Find the matrix Q to eliminate the affine ambiguity using the method described on slide 32 of the lecture.

  4. Use matplotlib to display the 3D structure (in your report, you may want to include snapshots from several viewpoints to show the structure clearly). Discuss whether or not the reconstruction has an ambiguity.

  5. Display three frames with both the observed feature points and the estimated projected 3D points overlayed. Report your total residual (sum of squared Euclidean distances, in pixels, between the observed and the reprojected features) over all the frames, and plot the per-frame residual as a function of the frame number.

Part 2 Extra Credit

  • Incorporate incomplete point tracks (i.e., points that are not visible in every frame) into the reconstruction. Use the tracks from this file.

  • Create a textured 3D model of the reconstructed points. For example, you can compute a Delaunay triangulation of the points in 2D, then lift that mesh structure to 3D, and texture it using the images.

Part 3: Binocular Stereo

The goal of this part is to implement a simple window-based stereo matching algorithm for rectified stereo pairs. You will be using the following stereo pairs:

 
 

Follow the basic outline given in this lecture: pick a window around each pixel in the first (reference) image, and then search the corresponding scanline in the second image for a matching window. The output should be a disparity map with respect to the first view (use these ground truth maps for qualitative reference for first pair and second pair). You should experiment with the following settings and parameters:

  • Search window size: show disparity maps for several window sizes and discuss which window size works the best (or what are the tradeoffs between using different window sizes). How does the running time depend on window size?

  • Disparity range: what is the range of the scanline in the second image that should be traversed in order to find a match for a given location in the first image? Examine the stereo pair to determine what is the maximum disparity value that makes sense, where to start the search on the scanline, and which direction to search in. Report which settings you ended up using.

  • Matching function: try sum of squared differences (SSD), sum of absolute differences (SAD), and normalized correlation. Discuss in your report whether there is any difference between using these functions, both in terms of quality of the results and in terms of running time.
In addition to showing your results and discussing implementation parameters, discuss the shortcomings of your algorithm. Where do the estimated disparity maps look good, and where do they look bad? What would be required to produce better results? Also discuss the running time of your approach and what might be needed to make stereo run faster.

Part 3 Extra Credit

  • Convert your disparity map to a depth map and attempt to visualize the depth map in 3D. Just pretend that all projection rays are parallel, and try to "guess" the depth scaling constant. Experiment with displaying a 3D point cloud, or computing a Delaunay triangulation of that point cloud.

  • Find additional rectified stereo pairs on the Web and show the results of your algorithm on these pairs.

  • Find non-rectified stereo pairs and rectification code on the Web and apply your algorithm to this data.

  • Implement multiple-baseline stereo as described in this paper. Use this data.

  • Try to incorporate non-local constraints (smoothness, uniqueness, ordering) into your algorithm. You can come up with simple heuristic ways of incorporating these constraints, or try to implement some of the more advanced methods discussed in the course (dynamic programming, graph cuts). For this part, it is also fine to find code on the web.

Submission Instructions

You must upload the following files on Canvas:

  1. Your code in separate files for each part. As before, the filenames should be lastname_firstname_a5_pX.ipynb or lastname_firstname_a5_pX.py. For any iPython notebooks, you should also output an exported PDF.
  2. A report in a single PDF file with all your results and discussion for all parts following this template. The filename should be lastname_firstname_a5.pdf.
  3. All your output images and visualizations in a single zip file. The filename should be lastname_firstname_a5.zip. Note that this zip file is for backup documentation only, in case we cannot see the images in your PDF report clearly enough. You will not receive credit for any output images that are part of the zip file but are not shown (in some form) in the report PDF.

Please refer to course policies on late submission, academic integrity, etc.