CS 180 Project 1 - Geoffrey Xiang

Background

Sergei Mikhailovich Prokudin-Gorskii foresaw how color was the future for photography. Determined to document the Russian Empire in its true light, he traveled throughout the land while taking 3 photos per scene with a red, green, and blue filter. Although he was never able to see his exposures used to create colored images, the RGB glass plate negatives he left behind were eventually able to be used to reconstruct color pictures of his photographed scenes. This project revives his vision by computationally aligning the RGB glass exposures to reproduce the images in their true colors.

Image Alignment

To align the 3 exposures for each scene, my main strategy was to continuously overlay the red and green exposures on top of the blue one before using an image matching metric to determine which alignment was the best (metrics discussed further below). For small (.jpg) images, I iteratively searched through a 30 by 30 range of displacements (maximum offset of 15 pixels on each side of the blue image). However, this approach was far too inefficient for large (.tif) images, so I implemented an image pyramid to speed up the process. I found that 6 levels for the image pyramid worked extremely well and was relatively fast (less than 30 seconds) for each of the images. The recursive pyramid scheme first searched through a 128 by 128 range of displacements around the image scaled down by a factor of 32. Taking the resulting displacement vector and scaling it by 2, it then did a 64 by 64 search around that displacement for the original image scaled down by 16. Continuing this process until the base case, a 4 by 4 search on the original image using the previous recursive frame's displacement vector.

Image Matching Metrics

Euclidean Distance (L2 Norm): This was the first metric I tried, where the displacement vector that led to the lowest difference in corresponding pixel values would be the classifsied as the best alignment. This was unsurprisingly the worst metric, likely due to how brightness and contrast could vary between glass negatives, slight noise could introduce big penalties to what should be well-aligned images, and in general RGB values may not be close in value to each other in the actual image.

Euclidean Distance on Pixel Values
R: [1, 10], G: [0, -6]

Normalized Cross-Correlation: This was the second metric I tried, where the displacement vector that led to the highest NCC score (closest match across normalized pixel values) would be classified as the best alignment. However, it still didn't quite work that well. This was likely due to how real images tend to have varying RGB values per pixel. For instance, a green bush would have a high green value but low red and blue values, but NCC would punish an alignment that matched a high green value to a low blue value.

NCC on Pixel Values
R: [1, 9], G: [0, -6]

Normalized Cross-Correlation with Edge Detection: This was by far the best metric for alignment since it used edges detected within the images to align them rather than the pixel values, and each RGB pane tended to share the same edge features with each other. I implemented this by first converting each image into a black and white version with just the edges using Canny edge detection before using the same NCC metric as above to find the best displacement vector alignment. I also experimented with Euclidean Distance with Edge Detection, and for the few .jpg images I used, it produced the same result as NCC with Edge Detection.

NCC with Edge Detection
R: [2, 3], G: [2, -3]

[Bells and Whistles] Automatic Cropping

I implemented automatic border cropping using Canny edge detection. After converting the image into a black and white version with just edges (same process as the last image matching metric), I scanned the borders of the image and looked for rows or columns that were in the 75th percentile for white pixels within their respective region (top, bottom, left, right). By cropping the image at these rows and columns, the sections of the color image that were removed would likely contain the harsh borders. Interestingly, I originally tried to just scan for the 75th percentile of white pixels across the entire image as opposed to just within a certain region of the image. For most images, this worked extremely well and often even better than the current version since it was less aggressive in its cropping and would leave more of the image intact. However, some images with lots of edges within the body of the scene would interfere with the percentile threshold and leave at least 1 side with the harsh border. This caused me to make the white-value comparisons more granular so top edges would only be compared with each other, and this process would apply to the rest of the 3 edges.

Results

Cathedral

NCC with Edge Detection
R: [3, 12], G: [2, 5]

Tobolsk

NCC with Edge Detection
R: [3, 6], G: [3, 3]

Church

NCC with Edge Detection
R: [-4, 58], G: [3, 25]

Emir

NCC with Edge Detection
R: [40, 107], G: [24, 49]

Harvesters

NCC with Edge Detection
R: [11, 125], G: [17, 60]

Icon

Original icon.tif

NCC with Edge Detection

R: [23, 89], G: [16, 40]

Cropped Image

Lady

NCC with Edge Detection
R: [13, 120], G: [10, 56]

Melons

NCC with Edge Detection
R: [14, 176], G: [10, 80]

Onion Church

NCC with Edge Detection
R: [35, 107], G: [24, 52]

Sculpture

NCC with Edge Detection
R: [-27, 140], G: [-11, 33]

Self Portrait

NCC with Edge Detection
R: [37, 175], G: [29, 77]

Three Generations

NCC with Edge Detection
R: [12, 115], G: [12, 56]

Train

NCC with Edge Detection
R: [29, 85], G: [1, 44]