Fall 2024 CS543/ECE549

Assignment 1: Color channel alignment, hybrid images

Due date: Monday, September 23, 11:59:59 PM


Part 1: Color channel alignment in the spatial domain


Originally adapted from A. Efros and updated numerous times. Image source: Wikipedia

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a photographer who, between 1909 and 1915, traveled the Russian Empire and took thousands of photos of everything he saw. He used an early color technology that involved recording three exposures of every scene onto a glass plate using red, green, and blue filters. Back then, there was no way to print such photos, so they had to be displayed using a special projector. Prokudin-Gorskii left Russia in 1918. His glass plate negatives survived and were purchased by the Library of Congress in 1948. Today, a digitized version of the Prokudin-Gorskii collection is available online.

The goal of this part of the assignment is to learn to work with images by taking the digitized Prokudin-Gorskii glass plate images and automatically producing a color image with as few visual artifacts as possible. In order to do this, you will need to extract the three color channel images, place them on top of each other, and align them so that they form a single RGB color image. You will need to implement this assignment in Python, and you should familiarize yourself with libraries for scientific computing and image processing including NumPy and PIL.

Data

  • Zip file with six input images for the basic alignment experiments
  • Zip file with high-resolution images for multiscale alignment experiments (150MB)
Note that the filter order for all files from top to bottom is BGR, not RGB!

Part 1A: Basic alignment

Your program should divide the image into three equal parts (channels) and align two of the channels to the third. For each image, you should try different orders of aligning the channels and figure out which one works the best. You will need to include in your report the colorized output and the (x,y) displacement vectors that were used to align the channels.

IMPORTANT: For consistency of grading, you need to report your displacements of R and G channels with respect to the B channel, even if you used a different order of alignment. After you split the input images into three equal parts along the vertical dimension, take the upper left corner of each channel to be the origin or reference point with respect to which the offset is reported. You may need to perform cropping within each channel as discussed below, but be sure to maintain these initial reference points for the purpose of reporting your offsets.

In Part 1A, work with the set of six lower-resolution images. The easiest way to align the parts is to exhaustively search over a window of possible displacements (say [-15,15] pixels independently for the x and y axes), score each candidate using some image matching metric, and take the displacement with the best score. There are a number of possible metrics one could use to score how well the images match. The most basic is the sum of squared differences (SSD), i.e., the squared L2 norm of the pixel differences of the two channels, which in Python is simply np.sum((image1-image2)**2) for images loaded as NumPy arrays. Note that in our case, the images to be matched do not actually have the same brightness values (they are different color channels), so it may be more appropriate to use normalized cross-correlation (NCC), which is simply the dot product between the two images after each is normalized to have zero mean and unit norm. Experiment with both SSD and NCC and use the one that gives you the best results.
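
For illustration, here is a minimal sketch of the exhaustive NCC search, assuming the channels are same-shape NumPy float arrays; the function and variable names are placeholders, and np.roll wraps pixels around the edges, a simplification that is acceptable for small displacements once borders are cropped:

import numpy as np

def ncc(a, b):
    # Normalized cross-correlation: dot product after subtracting the mean
    # and scaling each image to unit norm.
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

def align_single_scale(channel, reference, window=15):
    # Exhaustive search over displacements in [-window, window] on each axis;
    # returns the (dy, dx) shift of `channel` that best matches `reference`.
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

Swapping in an SSD score instead of NCC only requires negating it (lower SSD is better) and changing the comparison accordingly.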

Border cropping. The borders of the photograph will cause problems since the three channels won't exactly align there. For best alignment results, you should crop out the borders of the images and do alignment search along the interior portions that are more consistent across the three color channels. It is fine if you need to choose the cropping width manually, but be sure to specify your choices in your report.
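
For example, with an illustrative fixed margin (the value is a placeholder to tune and report):

crop = 20  # manually chosen border width in pixels; report your choice
interior = channel[crop:-crop, crop:-crop]  # score alignment on this region only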

Part 1B: Multiscale alignment

For the high-resolution glass plate scans provided above, exhaustive search over all possible displacements becomes prohibitively expensive. To deal with this case, implement a faster search procedure using an image pyramid. An image pyramid represents the image at multiple scales (usually scaled by a factor of 2), and processing is done sequentially starting from the coarsest scale (smallest image) and going down the pyramid, updating your estimate as you go. It is easy to implement by adding recursive calls to your original single-scale implementation (see the sketch after the timing example below). Report on the improvement of your multiscale solution in terms of running time (feel free to use an estimate if the single-scale solution takes too long to run). For timing, you can use the Python time module. For example:

import time
start_time = time.time()
# your code
end_time = time.time()
total_time = end_time - start_time
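
A recursive coarse-to-fine wrapper might look like the sketch below; it reuses align_single_scale from the Part 1A sketch and uses naive factor-of-2 subsampling, both placeholder choices (an antialiased resize such as cv2.resize would be safer):

import numpy as np

def align_pyramid(channel, reference, min_size=400):
    # Base case: at the coarsest scale, fall back to exhaustive search.
    if max(channel.shape) <= min_size:
        return align_single_scale(channel, reference, window=15)
    # Recurse on half-resolution images (naive 2x subsampling).
    dy, dx = align_pyramid(channel[::2, ::2], reference[::2, ::2], min_size)
    # Double the coarse estimate, then refine with a small local search.
    dy, dx = 2 * dy, 2 * dx
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    rdy, rdx = align_single_scale(shifted, reference, window=2)
    return dy + rdy, dx + rdx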

Part 1: Bonus Points

Implement and test any additional ideas you may have for improving the reliability or speed of alignment. Ideas may include, but are not limited to, the following:

  • Can you preprocess the channels in some way to improve the alignment accuracy?
  • Can you automatically determine the best order of alignment for each image?
  • Can you automatically determine the cropping width?
  • Instead of aligning the entire channel to the reference, can you automatically identify smaller patches that would give especially informative results? E.g., uniform regions are not especially informative, so you probably want regions that have some distinctive patterns in them, but how to find those?
  • Can you estimate alignments at subpixel precision?

Part 2: Color channel alignment in the frequency domain

For the second part of this assignment, we perform color channel alignment using the Fourier transform. As discussed in this lecture, convolution in the spatial domain translates to multiplication in the frequency domain. Further, the Fast Fourier Transform algorithm computes the transform of an N by M image in O(NM log NM) operations. As a result, Fourier-based alignment may provide an efficient alternative to sliding-window alignment approaches for high-resolution images.

You will perform color channel alignment on the same set of six low-resolution and three high-resolution input images as in Part 1. Use your preprocessing from Part 1 to split the data into individual color channels. You should use only the original input scale (not the multiscale pyramid from Part 1) for both high-resolution and low-resolution images in Fourier-based alignment.

The Fourier-based alignment algorithm consists of the following steps:

  1. For two color channels C1 and C2, compute corresponding Fourier transforms FT1 and FT2.
  2. Compute the conjugate of FT2 (denoted as FT2*), and compute the product of FT1 and FT2*.
  3. Take the inverse Fourier transform of this product and find the location of the maximum value in the output image. Use the displacement of the maximum value to obtain the offset of C2 from C1.
To colorize a full image, you will need to choose a base color channel and run the above algorithm twice to align the other two channels to the base. For further details of the alignment algorithm, see Section 9.1.2 of Computer Vision: Algorithms and Applications, 2nd ed.
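
A minimal sketch of these three steps, assuming two same-shape NumPy float channels (the names are illustrative, and you should verify the sign convention yourself on a test shift):

import numpy as np

def fourier_align(c1, c2):
    # Steps 1-2: Fourier transforms and the product FT1 * conj(FT2).
    ft1 = np.fft.fft2(c1)
    ft2_conj = np.conjugate(np.fft.fft2(c2))
    # Step 3: inverse transform; its argmax locates the best offset.
    corr = np.abs(np.fft.ifft2(ft1 * ft2_conj))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrap-around) offsets.
    if dy > c1.shape[0] // 2:
        dy -= c1.shape[0]
    if dx > c1.shape[1] // 2:
        dx -= c1.shape[1]
    return (dy, dx), corr

With this convention, rolling C2 by the returned (dy, dx) (e.g., with np.roll) should align it to C1; verifying on a channel you have shifted by a known amount is a quick sanity check.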

Color channel preprocessing. Applying the Fourier-based alignment to the image color channels directly may not be sufficient to align all the images. To address any faulty alignments, try sharpening the inputs or applying a small Laplacian of Gaussian filter to highlight edges in each color channel.

Functions to use. You should implement your algorithm using standard libraries in Python. To compute the 2D Fourier transforms, use the np.fft.fft2 function; for visualization, np.fft.fftshift shifts the zero-frequency component to the center of the spectrum. You can use np.conjugate to take the conjugate of a transform, and you should compute inverse transforms using the np.fft.ifft2 function. Finally, you can use scipy.ndimage.gaussian_filter or cv2.filter2D for filter-based preprocessing of input channels.

In addition to the final aligned images, we will ask you to include visualization of the inverse Fourier transform outputs you used to find the offset for each channel. You can use matplotlib.pyplot.imshow to visualize the output. Make sure that the plots are clear and properly scaled so that you can see the maximum response region.
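
For example, assuming corr is the inverse-transform magnitude from the sketch above (the filename is illustrative):

import matplotlib.pyplot as plt

plt.imshow(corr)    # correlation surface from the inverse FFT
plt.colorbar()      # a scale bar makes the peak magnitude readable
plt.title('IFFT of FT1 * conj(FT2)')
plt.savefig('corr_G_vs_B.png')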

Part 3: Hybrid Images

In this part of the assignment you will be creating hybrid images using the technique described in this SIGGRAPH 2006 paper by Oliva et al. (see also the end of this lecture). Hybrid images are static images with two interpretations, which change as a function of viewing distance. Consider the example below. Figures A and B are the input images of a cereal box and its contents, respectively. Figures C and D are the same hybrid image displayed at different resolutions. When we view the hybrid image at its normal size we see the cereal box, and when we zoom out we see the bag of cereal.


For this task, you should first use the images A and B from the example above. In addition, you should create two more hybrid images using input images of your own. Here are some guidelines for choosing good input images:

  • Good input image pairs usually consist of one image of a smooth surface (e.g., a picture of your face) and another of a textured surface (e.g., a picture of a dog's face). This is because creating hybrid images combines the smooth (low-frequency) part of one image with the high-frequency part of another. Another good example pair would be an apple and an orange.
  • Edit the images to ensure that the objects are aligned.
  • Ensure that there are no diffuse shadows on either image and the objects are clearly visible.
After you have chosen your image pair, follow these steps:
  1. Crop and align the images such that the objects and their edges are aligned. The alignment is important because it affects the perceptual grouping (read the paper for details). You are free to use any image editing tool for this and there is no need for code for this step.
  2. Read aligned input images and convert them to grayscale (you are only required to produce grayscale hybrid images for full credit; color is part of the extra credit).
  3. Apply a low-pass filter, i.e., a standard 2D Gaussian filter, to the first (smooth) image. You can use the SciPy function scipy.ndimage.gaussian_filter. Then apply a high-pass filter to the second image. The paper suggests using an impulse (identity) filter minus a Gaussian filter for this operation. Use your intuition and trial and error to determine good values of σ for the high-pass and low-pass filters. One of the σ's should always be higher than the other (which one?), but the optimal values can vary from image to image. Alternatively, you can specify the filters in the frequency domain as described in Section 2 and Figure 5 of the paper. Once again, some trial and error may be required.
  4. Add or average the transformed images to create the hybrid image (a minimal sketch of steps 3 and 4 follows this list).
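
As a rough sketch of steps 3 and 4, assuming grayscale float images in [0, 1]; the σ defaults are deliberately equal placeholders, since tuning them (and deciding which should be larger) is part of the exercise:

import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid_image(smooth_img, textured_img, sigma_low=5.0, sigma_high=5.0):
    # Low-pass: standard 2D Gaussian blur of the first image.
    low = gaussian_filter(smooth_img, sigma_low)
    # High-pass: impulse (identity) minus Gaussian, i.e., image minus its blur.
    high = textured_img - gaussian_filter(textured_img, sigma_high)
    # Step 4: combine, clipping back into the displayable range.
    return np.clip(low + high, 0.0, 1.0)
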
In your report you will provide the following for 3 different examples (1 provided pair, 2 pairs of your own):
  • Two input images;
  • Two filtered input images;
  • Two generated hybrid images (at different resolutions, similar to images C and D above).
In your report you should also include the following:
  • Explanation of how you chose the σ values or how you created the filters in the frequency domain.
  • Discussion of how successful your examples are, plus any interesting observations.

Part 3 bonus

  • Try to use color images as input to the hybrid images task and analyze the results. To merge color images, you will have to apply the filters on each of the three channels separately.
  • Try to come up with interesting failure cases of hybrid images and possible reasons for failure.
  • Play around with AI-based image generation tools to get synthetic hybrid image inputs, inspired by Factorized Diffusion (you don't have to actually implement their method).

Submission instructions

You should turn in both your code and a report discussing your solution and results. For the specific results to be reported, consult the template linked below. Below are general guidelines to be used for all assignments:
  • For each part of the assignment, be sure to include a brief description of your implemented solution, focusing especially on the more "interesting" parts (i.e., not the parts that follow the provided instructions or outline verbatim). What implementation choices did you make, and how did they affect the quality of the result and the speed of computation? What are some artifacts and/or limitations of your implementation, and what are possible reasons for them?

  • When inserting results images into your report, you should resize/compress them in JPEG format to keep the file size manageable (under 20MB ideally) -- but make sure that the correctness and quality of your output can be clearly and easily judged.

  • We also ask you to include your output images in your code zip file. Note that this is for backup documentation only, in case we cannot see the images in your PDF report clearly enough. You will not receive credit for any output images that are part of the zip file but are not shown (in some form) in the report PDF.

  • For potential extra credit, include in your report any bonus improvements you attempted, with a brief description and results. Any parts of the report you are submitting for extra credit should be clearly marked as such.

To submit this assignment, you must upload the following files on Canvas:

  1. All your code and output images in a single zip file. The filename should be netid_a1.zip. The code files in it should be named as netid_a1_part1.py, netid_a1_part2.py and netid_a1_part3.py (or another Python extension). The zip file should also contain the output images, but not the original input images.

  2. A brief report in a single PDF file with all your results and discussion following this template. The filename should be netid_a1.pdf.

Please refer to course policies on academic honesty, collaboration, late days, etc.