A Computer Vision and Machine learning blog

My photo
I'm a Computer Vision and Machine learning developer.

Sunday, April 26, 2015

Smoothing images with the Mumford Shah functional

Try out my python implementation for minimizing the Mumford Shah functional.
import cv2
from AmbrosioTortorelliMinimizer import *

img = cv2.imread("image.jpg", 0)
solver = AmbrosioTortorelliMinimizer(img)
img, edges = solver.minimize()
enter image description hereenter image description here
enter image description hereenter image description here
Result of minimizing the Ambrosio-Tortorelli functionalResult of minimizing the Ambrosio-Tortorelli functional
The Mumford Shah functional, one of the most cited models in Computer Vision,
is designed to smooth homogeneous regions in an image, while enhancing edges.
You can minimize it to get edge preserving smoothing, or a segmentation of the image, depending on the parameters you choose.
enter image description here
The functional tries to enforce some conditions on the target image J:
1. The target image J should be similar to the input image I.
2. The target image J should be smooth, except at a set of edges B.
3. There should be few edges.
That’s a hard optimization problem.
Luckily, Ambrosio and Tortorelli showed that the functional can be represented in terms of another functional, where the edge set B is modeled by multiplication with a continuous function that represents the edges, and converges to the original mumford-shah functional when ɛ approaches 0.
enter image description here
z is ~0 for edges, and ~1 for smooth pixels.
The optimization problem is now to minimize:
enter image description here
Using functional calculus you can take derivatives of that, and obtain a linear system of equations for solving:
1. The edge set Z.
2. The image J.
Both depend on one another, so usually an alternation scheme is used:
Solve for the edge set Z while fixing J, and then solve for J while fixing Z.
To derive the equations I followed http://www.eng.tau.ac.il/~nk/TRs/LeahTIP2006.pdf
That paper also shows you can use the same functional as a regularizer for reconstructing sharp images out of blurred ones if you have the blurring kernel, or if you parameterize the kernel as a Gaussian.

There are a few practical problems with all this, though.
  1. Solving such a huge set of equations (one equation for every pixel) is slow.
    I provided the conjugate gradient solver a linear operator, instead of creating and storing a huge matrix.
    I wonder if using a pre-conditioner would do any good for speed of convergence.
    Also there should probably be a unified calculation for RGB images, instead of smoothing each channel.
  2. You have to choose a good value of ɛ.
In my implementation for the purpose of speed I set the default parameters to be few iterations and a high numerical tolerance. Results look good for even one AM iteration.
It’s pretty straight forward to implement on a GPU, since single pixel calculations are independent of one another.
On my machine the unconcurrent python implementation run time for 640x480 color images was about 0.6 seconds, so I guess a GPU implementation would speed it up a lot.
There is lots of ongoing work on phrasing other minimization problems to solve the same thing, and it’s getting close to real time.
In this paper (github repo) they phrased a similar functional, that’s independent of ɛ or the edge set (they model the edge set in terms of the image gradient), and boast 20 fps performance for color images.
They did use a GPU implementation, and a strong GPU to achieve that,
so I wonder what would be the performance of a GPU implementation for the Ambrosio-Tortorelli minimization.

Friday, April 24, 2015

Simple Image saliency detection from histogram backprojection

Image saliency detection is about identifying the interesting parts of an image, the parts of the image human eyes would fix on.
enter image description here enter image description here
Saliency detection is used in a lot of applications, the most popular of them is probably automatic thumbnail generation, where a descriptive thumbnail has to be generated for an image.
Usually a saliency map is generated, where a pixel is bright if it’s salient, and dark otherwise.
There are lots of interesting papers on this. Check out a great overview on saliency detection.
Anyway, in this post I’l share a simple heuristic for saliency detection.
The basic idea is that usually salient pixels should have very different colors than most of the other pixels in the image.
We measure each pixel’s similarity to the background by histogram back-projection.
Finally, we refine the saliency map with Grabcut.
So here we go.
Original image (taken at Mount. Takao, Japan)
enter image description here

1. Preprocessing: Filter the image with Mean Shift for an initial soft segmentation of the image.

This groups together close pixels that are similar, and generates a very smooth version of the image.
enter image description here
Some of the saliency papers call an inital segmentation stage - “abstraction”.
It’s common that each segment is compared somehow to other segments to give it a saliency score.
We won’t do that here, though.
For speed we use the multiscale pyramidal meanshift version in OpenCV.
This stage isn’t entirely necessary, but improves results on some images.
cv2.pyrMeanShiftFiltering(img, 2, 10, img, 4)

2. Back-project the Hue, Saturation histogram of the entire image, on the image itself.

The back-projection for each channel pixel is just the intensity, divided by how many pixels have that intensity.
It’s an attempt to assign each pixel a probability of belonging to the background.
We use only 2 bins for each channel.
That means we quantize the image into 4 colors (since we have 2 channels).
The more levels we use, the more details we get in the saliency map.
enter image description here
Notice how the blue jacket gets low values, since it’s unique and different from the background.
Also notice how the face back-projection is not unique.
That’s bad, we will try to fix that later.
def backproject(source, target, levels = 2, scale = 1):
    hsv = cv2.cvtColor(source,  cv2.COLOR_BGR2HSV)
    hsvt = cv2.cvtColor(target, cv2.COLOR_BGR2HSV)
    # calculating object histogram
    roihist = cv2.calcHist([hsv],[0, 1], None, [levels, levels], [0, 180, 0, 256] )

    # normalize histogram and apply backprojection
    dst = cv2.calcBackProject([hsvt],[0,1],roihist,[0,180,0,256], scale)
    return dst
backproj = np.uint8(backproject(img, img, levels = 2))

3. Process the back-projection to get a saliency map.

So here we smooth the back-projection image with mean shift, enhance the contrast of the saliency map with histogram equalization, and invert the image.
The goal is to produce a smooth saliency map where salient regions have bright pixels.
enter image description here

saliencies = [backproj, backproj, backproj]
saliency = cv2.merge(saliencies)
cv2.pyrMeanShiftFiltering(saliency, 20, 200, saliency, 2)
saliency = cv2.cvtColor(saliency, cv2.COLOR_BGR2GRAY)
cv2.equalizeHist(saliency, saliency)

4. Threshold.

enter image description here
Ok, we have a few issues here:
- The segmentation is pretty rough.
- My head isn’t attached to the salient region.
(T, saliency) = cv2.threshold(saliency, 200, 255, cv2.THRESH_BINARY)

5. Find the bounding box of the connected component with the largest area.

We now have two salient regions.
A heuristic we use here is that the salient object will probably be the largest one.
We could’ve similarly assumed that the salient object will usually be closest to the center.
Saliency papers call that “encoding a prior”, since we use prior knowledge about how salient objects look.
def largest_contour_rect(saliency):
    contours, hierarchy = cv2.findContours(saliency * 1,

    contours = sorted(contours, key = cv2.contourArea)
    return cv2.boundingRect(contours[-1])

6. Refine the salient region with Grabcut.

We use the bounding box as an initial foreground estimation, and apply Grabcut.
This makes the salient region segmentation much more accurate,
and most importantly, attaches back my head.
enter image description hereenter image description hereenter image description here
def refine_saliency_with_grabcut(img, saliency):
    rect = largest_contours_rect(saliency)
    bgdmodel = np.zeros((1, 65),np.float64)
    fgdmodel = np.zeros((1, 65),np.float64)
    saliency[np.where(saliency > 0)] = cv2.GC_FGD
    mask = saliency
    cv2.grabCut(img, mask, rect, bgdmodel, fgdmodel, 1, cv2.GC_INIT_WITH_RECT)
    mask = np.where((mask==2)|(mask==0),0,1).astype('uint8')
    return mask
Link to the code on github.
I like this method since it’s simple, but it has it’s drawbacks.
enter image description hereenter image description here