COMPSCI 180 Project1

Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection

Sep 3, 2024 8 min read

Left: Canny & ImagePyramid tuned. Right: stacked without alignment.

Part1: Background & Overview

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. First, I cropped the pictures into three parts, and treat them as blue channel, green channel and red channel. Then, the vanilla method is just to stack the three channels together, as shown below.

Picture Selected: Cathedral

The easiest attempt clearly has two drawbacks. One is that the three channels of pictures don’t crop quite well, resulting in the huge colored bands on the edge of the stacked picture. The other is that the three channels of pictures don’t align quite well, resulting in the blur and gleming of the stacked picture.

Part2: Simple Image Alignment

The easiest way to align the parts is to exhaustively search over a window of possible displacements (say [-15,15] pixels), score each one using some image matching metric, and take the displacement with the best score. I choose the easier way of calculating L2 norm:

# L2_norm calculation
def L2_norm(image1, image2):
    return np.sqrt(np.sum((image1 - image2) ** 2))

Using dual for loop, the simple attempt can exhaustively search over a window of possible displacements, compare L2 norm values 256 times, and apply the displacement with the smallest L2 norm, indicating the best alignment. With the simple alignment method, I am able to reconstruct low-res pictures like the cathedral well. The “Best matching displacement(Using blue channel as base image)” applies to all results shown below.

For green channel: move -1 on x-axis, move 1 on y-axis.

For red channel: move -1 on x-axis, move 7 on y-axis.

Stacked w/o simple alignment

Stacked w/ simple alignment

Also, I noticed that the picture has black edges that only bear unnecessary informations and can cause poor results in the simple alignment method. So I revised the simple alignment method by adding one more step of cropping the black edges. With the preprocessing cropping used, the simple alignment algorithm can achieve fine results on other low-res pictures like the monastery and the tobolsk.

For green channel: move 2 on x-axis, move -3 on y-axis.

For red channel: move 2 on x-axis, move 3 on y-axis.

Stacked w/ cropping w/o simple alignment

Stacked w/ simple alignment

For green channel: move 2 on x-axis, move 3 on y-axis.

For red channel: move 3 on x-axis, move 6 on y-axis.

Stacked w/ cropping w/o simple alignment

Stacked w/ simple alignment

Part3: Image alignment using image pyramid

Exhaustive search will become prohibitively expensive if the pixel displacement is too large (which will be the case for high-resolution glass plate scans). In this case, we can implement a faster search procedure such as an image pyramid. An image pyramid represents the image at multiple scales (usually scaled by a factor of 2) and the processing is done sequentially starting from the coarsest scale (smallest image) and going down the pyramid, updating your estimate as you go. It is very easy to implement by adding recursive calls to your original single-scale implementation.

My recursive part of the image pyramid consists of two major parts: when level reaches zero, it will call the original simple alignment function, otherwise, it will resize the images to half of its width and height and recursively call itself. In each call we can get a coarse displacement, accompanied by the fined displacement result derived at its level, we will get a correct result at the final output. Using the image pyramid, I am able to get nice results of icon and harvesters.

You can click on the image to see a bigger picture for hi-res results.

For green channel: move 17 on x-axis, move 40 on y-axis.

For red channel: move 23 on x-axis, move 89 on y-axis.

For green channel: move 15 on x-axis, move 58 on y-axis.

For red channel: move 13 on x-axis, move 120 on y-axis.

I thought the image pyramid alignment performs not that bad until I get the result of Emir:

For green channel: move 24 on x-axis, move 49 on y-axis.

For red channel: move 32 on x-axis, move 0 on y-axis.

After checking the original channel-splitted picture of emir, I discovered that the exposure and the brightness level of three channels vary dramatically, which can lead to a bad result demonstrated above.

How to solve it?

Part4: Bells & Whistles: a try on canny

After examining the emir’s picture, I discovered that Emir’s burka bears lots of stripes and treads on it. And the stripes are so clear and sharp that under uneven exposure and brightness level they can be distinguished easily. So edge detection techique may achieve unbelievably good results.

After channel-split, I ran canny edge detection on three pictures and get the following result:


Emir canny blue channel	Emir canny green channel	Emir canny red channel

By feeding canny-processed pictures into the previous image pyramid algorithm, Emir’s color image is sharp and the skin tone is also right. The left is Alignment result only using image pyramid, the right is Alignment result using canny.

Picture Selected: emir

Best matching displacement for emir.tif (Using blue channel as base image): For green channel: move 24 on x-axis, move 49 on y-axis. For red channel: move 40 on x-axis, move 107 on y-axis.

More results using canny-processed image pyramid:


lady canny blue channel	lady canny green channel	lady canny red channel

Picture Selected: lady

Best matching displacement for lady.tif (Using blue channel as base image): For green channel: move 10 on x-axis, move 56 on y-axis. For red channel: move 13 on x-axis, move 120 on y-axis.


onion church canny blue channel	onion church canny green channel	onion church canny red channel

Picture Selected: onion church

Best matching displacement for onion_church.tif (Using blue channel as base image): For green channel: move 24 on x-axis, move 52 on y-axis. For red channel: move 35 on x-axis, move 107 on y-axis.


melons canny blue channel	melons canny green channel	melons canny red channel

Picture Selected: melons

Best matching displacement for melons.tif (Using blue channel as base image): For green channel: move 10 on x-axis, move 80 on y-axis. For red channel: move 14 on x-axis, move 176 on y-axis.


sculpture canny blue channel	sculpture canny green channel	sculpture canny red channel

Picture Selected: sculpture

Best matching displacement for sculpture.tif (Using blue channel as base image): For green channel: move -11 on x-axis, move 33 on y-axis. For red channel: move -27 on x-axis, move 140 on y-axis.


three generations canny blue channel	three generations canny green channel	three generations canny red channel

Picture Selected: three generations

Best matching displacement for three_generations.tif (Using blue channel as base image): For green channel: move 12 on x-axis, move 55 on y-axis. For red channel: move 8 on x-axis, move 111 on y-axis.


self portrait canny blue channel	self portrait canny green channel	self portrait canny red channel

Picture Selected: self portrait

Best matching displacement for self_portrait.tif (Using blue channel as base image): For green channel: move 29 on x-axis, move 77 on y-axis. For red channel: move 37 on x-axis, move 175 on y-axis.


train canny blue channel	train canny green channel	train canny red channel

Picture Selected: train

Best matching displacement for train.tif (Using blue channel as base image): For green channel: move 0 on x-axis, move 41 on y-axis. For red channel: move 29 on x-axis, move 85 on y-axis.

Two result from Prokudin-Gorskii collection:

Stacked w/o simple alignment

Stacked w/ simple alignment

Best matching displacement for more.jpg (Using blue channel as base image): For green channel: move 2 on x-axis, move 4 on y-axis. For red channel: move 4 on x-axis, move 7 on y-axis.


boy canny blue channel	boy canny green channel	boy canny red channel

Picture Selected: boy

Best matching displacement for boy.tif (Using blue channel as base image): For green channel: move -14 on x-axis, move 45 on y-axis. For red channel: move -11 on x-axis, move 103 on y-axis.

Part5: My debugging journey

Although it’s nothing important or difficult, I got a bad result on image pyramid implementation due to the false understanding of np.roll at first. It was not until I viewed the previous post when I realized that it wasn’t the pictures that are hard to align, it was my function wrong. My Bells&Whistles is originally an attempt to rotate the picture to achieve alignment. But obviously, if the base method is wrong, there’s no way for this attempt to be right.

The way np.roll handles image translation is to move the section out of the border to the gap created at the other side. So if the original image has black edges on both left and the right side, rolling left means the black edge on the left transferred to the right of the original right black edge, which doesn’t contribute to reducing the L2 norm and therefore pictures won’t match in the end. In short, counteract the effect of np.roll in the recursion bit of the function, which will help reducing the L2 norm dramatically, and keep the effect of np.roll at the last level of the pixel calculation.

Jiashen Du

Undergrad Student major in CS

My research interests include Artificial Intelligence, Deep Learning, Computer Vision and LLMs.