COMPSCI 180 Project1
Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection
Part1: Background & Overview
The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. First, I cropped the pictures into three parts, and treat them as blue channel, green channel and red channel. Then, the vanilla method is just to stack the three channels together, as shown below.
Picture Selected: Cathedral
The easiest attempt clearly has two drawbacks. One is that the three channels of pictures don’t crop quite well, resulting in the huge colored bands on the edge of the stacked picture. The other is that the three channels of pictures don’t align quite well, resulting in the blur and gleming of the stacked picture.
Part2: Simple Image Alignment
The easiest way to align the parts is to exhaustively search over a window of possible displacements (say [-15,15] pixels), score each one using some image matching metric, and take the displacement with the best score. I choose the easier way of calculating L2 norm:
# L2_norm calculation
def L2_norm(image1, image2):
return np.sqrt(np.sum((image1 - image2) ** 2))
Using dual for loop, the simple attempt can exhaustively search over a window of possible displacements, compare L2 norm values 256 times, and apply the displacement with the smallest L2 norm, indicating the best alignment. With the simple alignment method, I am able to reconstruct low-res pictures like the cathedral well. The “Best matching displacement(Using blue channel as base image)” applies to all results shown below.
For green channel: move -1 on x-axis, move 1 on y-axis.
For red channel: move -1 on x-axis, move 7 on y-axis.
Stacked w/o simple alignment |
Stacked w/ simple alignment |
Also, I noticed that the picture has black edges that only bear unnecessary informations and can cause poor results in the simple alignment method. So I revised the simple alignment method by adding one more step of cropping the black edges. With the preprocessing cropping used, the simple alignment algorithm can achieve fine results on other low-res pictures like the monastery and the tobolsk.
For green channel: move 2 on x-axis, move -3 on y-axis.
For red channel: move 2 on x-axis, move 3 on y-axis.
Stacked w/ cropping w/o simple alignment |
Stacked w/ simple alignment |
For green channel: move 2 on x-axis, move 3 on y-axis.
For red channel: move 3 on x-axis, move 6 on y-axis.
Stacked w/ cropping w/o simple alignment |
Stacked w/ simple alignment |
Part3: Image alignment using image pyramid
Exhaustive search will become prohibitively expensive if the pixel displacement is too large (which will be the case for high-resolution glass plate scans). In this case, we can implement a faster search procedure such as an image pyramid. An image pyramid represents the image at multiple scales (usually scaled by a factor of 2) and the processing is done sequentially starting from the coarsest scale (smallest image) and going down the pyramid, updating your estimate as you go. It is very easy to implement by adding recursive calls to your original single-scale implementation.
My recursive part of the image pyramid consists of two major parts: when level reaches zero, it will call the original simple alignment function, otherwise, it will resize the images to half of its width and height and recursively call itself. In each call we can get a coarse displacement, accompanied by the fined displacement result derived at its level, we will get a correct result at the final output. Using the image pyramid, I am able to get nice results of icon and harvesters.
You can click on the image to see a bigger picture for hi-res results.
For green channel: move 17 on x-axis, move 40 on y-axis.
For red channel: move 23 on x-axis, move 89 on y-axis.
For green channel: move 15 on x-axis, move 58 on y-axis.
For red channel: move 13 on x-axis, move 120 on y-axis.
I thought the image pyramid alignment performs not that bad until I get the result of Emir:
For green channel: move 24 on x-axis, move 49 on y-axis.
For red channel: move 32 on x-axis, move 0 on y-axis.
After checking the original channel-splitted picture of emir, I discovered that the exposure and the brightness level of three channels vary dramatically, which can lead to a bad result demonstrated above.
How to solve it?
Part4: Bells & Whistles: a try on canny
After examining the emir’s picture, I discovered that Emir’s burka bears lots of stripes and treads on it. And the stripes are so clear and sharp that under uneven exposure and brightness level they can be distinguished easily. So edge detection techique may achieve unbelievably good results.
After channel-split, I ran canny edge detection on three pictures and get the following result:
Emir canny blue channel | Emir canny green channel | Emir canny red channel |
By feeding canny-processed pictures into the previous image pyramid algorithm, Emir’s color image is sharp and the skin tone is also right. The left is Alignment result only using image pyramid, the right is Alignment result using canny.
Picture Selected: emir
Best matching displacement for emir.tif (Using blue channel as base image): For green channel: move 24 on x-axis, move 49 on y-axis. For red channel: move 40 on x-axis, move 107 on y-axis.
More results using canny-processed image pyramid:
lady canny blue channel | lady canny green channel | lady canny red channel |
Picture Selected: lady
Best matching displacement for lady.tif (Using blue channel as base image): For green channel: move 10 on x-axis, move 56 on y-axis. For red channel: move 13 on x-axis, move 120 on y-axis.
onion church canny blue channel | onion church canny green channel | onion church canny red channel |
Picture Selected: onion church
Best matching displacement for onion_church.tif (Using blue channel as base image): For green channel: move 24 on x-axis, move 52 on y-axis. For red channel: move 35 on x-axis, move 107 on y-axis.
melons canny blue channel | melons canny green channel | melons canny red channel |
Picture Selected: melons
Best matching displacement for melons.tif (Using blue channel as base image): For green channel: move 10 on x-axis, move 80 on y-axis. For red channel: move 14 on x-axis, move 176 on y-axis.
sculpture canny blue channel | sculpture canny green channel | sculpture canny red channel |
Picture Selected: sculpture
Best matching displacement for sculpture.tif (Using blue channel as base image): For green channel: move -11 on x-axis, move 33 on y-axis. For red channel: move -27 on x-axis, move 140 on y-axis.
three generations canny blue channel | three generations canny green channel | three generations canny red channel |
Picture Selected: three generations
Best matching displacement for three_generations.tif (Using blue channel as base image): For green channel: move 12 on x-axis, move 55 on y-axis. For red channel: move 8 on x-axis, move 111 on y-axis.
self portrait canny blue channel | self portrait canny green channel | self portrait canny red channel |
Picture Selected: self portrait
Best matching displacement for self_portrait.tif (Using blue channel as base image): For green channel: move 29 on x-axis, move 77 on y-axis. For red channel: move 37 on x-axis, move 175 on y-axis.
train canny blue channel | train canny green channel | train canny red channel |
Picture Selected: train
Best matching displacement for train.tif (Using blue channel as base image): For green channel: move 0 on x-axis, move 41 on y-axis. For red channel: move 29 on x-axis, move 85 on y-axis.
Two result from Prokudin-Gorskii collection:
Stacked w/o simple alignment |
Stacked w/ simple alignment |
Best matching displacement for more.jpg (Using blue channel as base image): For green channel: move 2 on x-axis, move 4 on y-axis. For red channel: move 4 on x-axis, move 7 on y-axis.
boy canny blue channel | boy canny green channel | boy canny red channel |
Picture Selected: boy
Best matching displacement for boy.tif (Using blue channel as base image): For green channel: move -14 on x-axis, move 45 on y-axis. For red channel: move -11 on x-axis, move 103 on y-axis.
Part5: My debugging journey
Although it’s nothing important or difficult, I got a bad result on image pyramid implementation due to the false understanding of np.roll
at first. It was not until I viewed the previous post when I realized that it wasn’t the pictures that are hard to align, it was my function wrong. My Bells&Whistles is originally an attempt to rotate the picture to achieve alignment. But obviously, if the base method is wrong, there’s no way for this attempt to be right.
The way np.roll
handles image translation is to move the section out of the border to the gap created at the other side. So if the original image has black edges on both left and the right side, rolling left means the black edge on the left transferred to the right of the original right black edge, which doesn’t contribute to reducing the L2 norm and therefore pictures won’t match in the end. In short, counteract the effect of np.roll
in the recursion bit of the function, which will help reducing the L2 norm dramatically, and keep the effect of np.roll
at the last level of the pixel calculation.