Fall 2024 CS543/ECE549
Assignment 1: Color channel alignment, hybrid images
Due date: Monday, September 23, 11:59:59 PM
Part 1: Color channel alignment in the spatial domain

Originally adapted from A. Efros and updated numerous times.

Image source: Wikipedia

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a photographer who, between the years 1909-1915, traveled the Russian empire and took thousands of photos of everything he saw. He used an early color technology that involved recording three exposures of every scene onto a glass plate using a red, green, and blue filter. Back then, there was no way to print such photos, and they had to be displayed using a special projector. Prokudin-Gorskii left Russia in 1918. His glass plate negatives survived and were purchased by the Library of Congress in 1948. Today, a digitized version of the Prokudin-Gorskii collection is available online.

The goal of this part of the assignment is to learn to work with images by taking the digitized Prokudin-Gorskii glass plate images and automatically producing a color image with as few visual artifacts as possible. In order to do this, you will need to extract the three color channel images, place them on top of each other, and align them so that they form a single RGB color image. You will need to implement this assignment in Python, and you should familiarize yourself with libraries for scientific computing and image processing including NumPy and PIL.

Data
Part 1A: Basic alignment

Your program should divide the image into three equal parts (channels) and align two of the channels to the third. For each image, you should try different orders of aligning the channels and figure out which one works the best. You will need to include in your report the colorized output and the (x,y) displacement vectors that were used to align the channels.

IMPORTANT: For consistency of grading, you need to report your displacements of the R and G channels with respect to the B channel, even if you used a different order of alignment. After you split the input images into three equal parts along the vertical dimension, take the upper left corner of each channel to be the origin or reference point with respect to which the offset is reported. You may need to perform cropping within each channel as discussed below, but be sure to maintain these initial reference points for the purpose of reporting your offsets.

In Part 1A, work with the set of six lower-resolution images. The easiest way to align the parts is to exhaustively search over a window of possible displacements (say [-15,15] pixels independently for the x and y axis), score each one using some image matching metric, and take the displacement with the best score. There are a number of possible metrics that one could use to score how well the images match. The most basic one is the sum of squared differences (SSD), i.e. the squared L2 norm of the pixel differences of the two channels, which in Python is simply np.sum((image1-image2)**2) for images loaded as floating-point NumPy arrays (unsigned integer arrays would wrap around on subtraction). Note that in our case, the images to be matched do not actually have the same brightness values (they are different color channels), so it may be more appropriate to use normalized cross-correlation (NCC), which is simply the dot product between the two images normalized to have zero mean and unit norm. Experiment with both SSD and NCC and use the one that gives you the best results.

Border cropping.
The borders of the photograph will cause problems since the three channels won't exactly align there. For best alignment results, you should crop out the borders of the images and do alignment search along the interior portions that are more consistent across the three color channels. It is fine if you need to choose the cropping width manually, but be sure to specify your choices in your report.

Part 1B: Multiscale alignment

For the high-resolution glass plate scans provided above, exhaustive search over all possible displacements will become prohibitively expensive. To deal with this case, implement a faster search procedure using an image pyramid. An image pyramid represents the image at multiple scales (usually scaled by a factor of 2) and the processing is done sequentially starting from the coarsest scale (smallest image) and going down the pyramid, updating your estimate as you go. It is very easy to implement by adding recursive calls to your original single-scale implementation.

Report on the improvement of your multiscale solution in terms of running time (feel free to use an estimate if the single-scale solution takes too long to run). For timing, you can use the Python time module. For example:

    import time
    start_time = time.time()
    # your code
    end_time = time.time()
    total_time = end_time - start_time

Part 1: Bonus Points

Implement and test any additional ideas you may have for improving the reliability or speed of alignment. Ideas may include, but are not limited to, the following:
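Returning to the core alignment search: the following is a minimal sketch of the single-scale search from Part 1A and the pyramid recursion from Part 1B, not one of the bonus ideas. It assumes the plate stacks the B, G, and R channels from top to bottom, scores candidates with NCC, crops a fixed 10% margin, and shifts with np.roll (which wraps pixels around). All function names and default parameters here are illustrative choices, not part of the assignment.

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked plate into three equal parts.
    Assumption: the order from top to bottom is B, G, R."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop_interior(img, frac=0.1):
    """Drop a margin of `frac` of each dimension on every side."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc(a, b):
    """Normalized cross-correlation: dot product of zero-mean, unit-norm images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def align_single_scale(moving, fixed, center=(0, 0), radius=15, frac=0.1):
    """Exhaustively search shifts within `radius` of `center`.
    Returns the (dx, dy) to apply to `moving` so that it best matches `fixed`."""
    best, best_score = center, -np.inf
    fixed_c = crop_interior(fixed, frac)
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            # np.roll wraps around; the interior crop limits, but does not
            # fully remove, the influence of wrapped pixels for large shifts.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ncc(crop_interior(shifted, frac), fixed_c)
            if score > best_score:
                best, best_score = (dx, dy), score
    return best

def downsample2(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def align_pyramid(moving, fixed, min_size=64, radius=15, refine=2):
    """Coarse-to-fine search: solve at half resolution first, then refine
    the doubled estimate within a small window at the current scale."""
    if min(fixed.shape) <= min_size:
        return align_single_scale(moving, fixed, radius=radius)
    dx, dy = align_pyramid(downsample2(moving), downsample2(fixed),
                           min_size, radius, refine)
    return align_single_scale(moving, fixed, center=(2 * dx, 2 * dy),
                              radius=refine)
```

With this convention, calling align_pyramid(g, b) and align_pyramid(r, b) on channels returned by split_channels would give the G and R offsets relative to B. Swapping ncc for an SSD score (and minimizing instead of maximizing) is a small change to align_single_scale.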
Part 2: Color channel alignment in the frequency domain

For the second part of this assignment, we perform color channel alignment using the Fourier transform. As discussed in this lecture, convolution in the spatial domain translates to multiplication in the frequency domain. Further, the Fast Fourier Transform algorithm computes a transform in O(N M log N M) operations for an N by M image. As a result, Fourier-based alignment may provide an efficient alternative to sliding window alignment approaches for high-resolution images.

You will perform color channel alignment on the same set of six low-resolution and three high-resolution input images as in Part 1. Use your preprocessing from Part 1 to split the data into individual color channels. You should use only the original input scale (not the multiscale pyramid from Part 1) for both high-resolution and low-resolution images in Fourier-based alignment.

The Fourier-based alignment algorithm consists of the following steps:
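As a hedged illustration, the following minimal sketch shows one standard way to align two channels with the FFT: multiply the transform of one channel by the conjugate of the other, invert, and read the offset from the peak location. The function name fft_align, the mean subtraction, and the wrap-around handling are assumptions made for this sketch; it is not necessarily the exact procedure expected in your report. np.fft.fftshift is left out of the computation because it only recenters an array for display.

```python
import numpy as np

def fft_align(moving, fixed):
    """Estimate the (dx, dy) shift to apply to `moving` so it matches `fixed`.
    Also returns the correlation surface for visualization."""
    # Remove the mean so the constant (DC) component does not dominate.
    f = fixed.astype(np.float64) - fixed.mean()
    m = moving.astype(np.float64) - moving.mean()

    F = np.fft.fft2(f)
    M = np.fft.fft2(m)

    # Multiplying by the conjugate in the frequency domain corresponds to
    # circular cross-correlation in the spatial domain.
    corr = np.real(np.fft.ifft2(F * np.conjugate(M)))

    # The peak location gives the shift, modulo the image size.
    peak_y, peak_x = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape

    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    dy = peak_y - h if peak_y > h // 2 else peak_y
    dx = peak_x - w if peak_x > w // 2 else peak_x
    return int(dx), int(dy), corr
```

For display, np.fft.fftshift(corr) moves the zero-shift response to the center of the array, which can make the peak easier to see with matplotlib.pyplot.imshow.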
Color channel preprocessing. Applying the Fourier-based alignment to the image color channels directly may not be sufficient to align all the images. To address any faulty alignments, try sharpening the inputs or applying a small Laplacian of Gaussian filter to highlight edges in each color channel.

Functions to use. You should implement your algorithm using standard libraries in Python. To compute the 2D Fourier transforms you should use the np.fft.fft2 function followed by the np.fft.fftshift function to shift components for better visualization. You can use np.conjugate to take the conjugate of a transform, and you should compute inverse transforms using the np.fft.ifft2 function. Finally, you can use scipy.ndimage.gaussian_filter or cv2.filter2D for filter-based preprocessing of input channels.

In addition to the final aligned images, we will ask you to include visualization of the inverse Fourier transform outputs you used to find the offset for each channel. You can use matplotlib.pyplot.imshow to visualize the output. Make sure that the plots are clear and properly scaled so that you can see the maximum response region.

Part 3: Hybrid Images

In this part of the assignment you will be creating hybrid images using the technique described in this SIGGRAPH 2006 paper by Oliva et al. (see also the end of this lecture).
Hybrid images are static images with two interpretations, which change as a function of the viewing distance. Consider the example below.

Figures A and B are the input images of a cereal box and its contents respectively. Figures C and D are the same hybrid image displayed at different resolutions. When we view the hybrid image at its normal size we see the cereal box, and when we zoom out we see the bag of cereal.

For this task, you should first use the images A and B from the example above. In addition, you should create two more hybrid images using input images of your own. Here are some guidelines for choosing good input images:
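For orientation, here is a minimal sketch of the hybrid construction from the cited paper: a low-pass filtered version of one image is added to a high-pass filtered version of the other. It assumes grayscale float images in [0, 1] that have the same size and are already aligned. The numpy-only gaussian_blur helper, the function names, and the sigma values are illustrative assumptions; scipy.ndimage.gaussian_filter could be used for the blur instead.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2D float array with reflect padding."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Filter along rows, then along columns; 'valid' trims the padding.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def hybrid_image(img_high, img_low, sigma_high=2.0, sigma_low=4.0):
    """Combine the fine detail of `img_high` (seen up close) with the
    coarse structure of `img_low` (seen from far away or zoomed out)."""
    low = gaussian_blur(img_low, sigma_low)
    # High-pass = image minus its own low-pass version.
    high = img_high - gaussian_blur(img_high, sigma_high)
    return np.clip(low + high, 0.0, 1.0)
```

In the cereal example, image A (the box, visible at normal size) would play the role of img_high and image B (the contents, visible when zoomed out) the role of img_low. The two sigma values usually need tuning per image pair.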
Part 3 bonus
Submission instructions

You should turn in both your code and a report discussing your solution and results. For the specific results to be reported, consult the template linked below. Below are general guidelines to be used for all assignments:
To submit this assignment, you must upload the following files on Canvas:
Please refer to course policies on academic honesty, collaboration, late days, etc.