What are frequencies in images?
December 31, 2019
Hough Transform Implementation With Python
December 31, 2019
Show all

Hough Transform

The Hough Transform


The Basics

Series: The Hough Transform:

  1. The Basics
  2. The Normal form

The Hough transform is an incredible tool that lets you identify lines. Not just lines, but other shapes as well. In this article, I’ll talk about the mechanics behind the Hough transform. It will involve a bit of math, but just elementary concepts you learned in school. In this article, we’ll work with lines only, though the technique can be easily extended to other shapes.

Why the Hough Transform?

Lets say you take the snapshot of pole. You figure out edge pixels (using the Canny edge detector, the Sobel edge detector, or any other thing). Now you want a geometrical representation of the pole’s edge.You want to know its slope, its intercept, etc. But right now the “edge” is just a sequence of pixels.

You can loop through all pixels, and some how figure out the slope and intercept. But that is one difficult task. Images are never perfect.

So you want some mechanism that give more weightage to pixels that are already in a line. This is exactly what the Hough Transform does.

It lets each point on the image “vote”. And because of the mathematical properties of the transform, this “voting” allows us to figure out prominent lines in the image.

From lines to points

A lines is a collection of points. And managing a collection of points is tougher than managing a single point. Obviously. So the first thing we learn is how to represent a line as a single point, without losing any information about it.

This is done through the m-c space.

The m-c space

As shown in the above picture, every line has two quantities associated with it, the slope and the intercept. With these two numbers, you can describe a line completely.

So the parameter space, or the mc space was devised. Every line in the xy space is equivalent to a single point in the mc space. Neat eh?

From points to lines

Now onto the next step. Consider a point (say, (xa, ya) )in the xy space. What would its representation in the mc space be?

For starters, you could guess that infinite lines pass through a point. So, for every line passing through (xa, ya), there would be a point in the mc space.

And you’re correct there, because of the following:

  1. Any line passing through (xa, ya): ya = mxa + c
  2. Rearranging: c = – xam + ya
  3. The above is the equation of a line in the mc space.

So, a point in the xy space is equivalent to a line in the mc space.

How does this help us?

The Hough transform is all about doing what we just learned: converting points in the xy space to lines in the mc space.

You taken an edge detected image, and for every point that is non black, you draw lines in the mc place. Obviously, some lines will intersect. These intersections mark are the parameters of the line.

The following picture will clarify the idea:

An example of houghlines

The points 1, 2, 3, and 4 are represented as various lines in the mc space. And the intersection of these lines is equivalent to the original line.

Great!

Now you know the basic idea of how the Hough transform works. But there’s one big flaw in the We’ll discuss how to resolve this issue in the next article.

For a pet-project of mine I wanted to understand more details about the Hough Transformation algorithm. This algorithm is used in image processing to detect lines (and also other shapes) in images. There are already a lot of explanations out there, but none of them really clicked for me, usually because they assume some knowledge in Mathematics – given that I forgot a lot about what I learned in school, I thought it would be an interesting exercise to take the Hough Transform implementation apart and examine how it works, using my own words.

Example from wikipedia

To give credit where credit is due, there is an example on the Hough Transform wikipedia page that might not be very pretty but served as an inspiration for this article.


By Mike1024 [Public domain], via Wikimedia Commons

The Hough Transform algorithm needs an input image that is used for edge detection. Meaning, it makes sense to convert the input to a black and white matrix first, using any of the usual edge detector algorithms, like Canny Edge Detector. In the example from wikipedia, shown above, the image is assumed to be already a black and white matrix. It contains three black dots, which are “edges” – that means they are part of the “line” we are looking for in this image.

How Hough Transform works

In order to understand the inner workins of the Hough Transform, I started from the sample image from wikipedia, but drew a complete line in it.

  • The input image is 155 x 155 pixel.
  • It is assumed the input image contains one solid line from top left to lower right, and we assume some fake edge detection algorithm that will find three points of interest in the image, all on that line.

This is now our starting image:

Initial image, with one line.Points of interest, detected as edges.

We assume some fake edge detector algorithm, that will find three data points on the line:

  • P1P1 at 64/110 (red line)
  • P2P2 at 90/95 (blue line)
  • P3P3 at 119/81 (green line)

First step – draw some more lines

In the first step of the Hough Transform we go through each pixel in the image. If we decide that it’s a edge point, that means it is not a background pixel but could be part of a line, as defined by our edge detection mechanism, we draw some lines through this point, in certain angles we can chose.

As our example is very limited in scope, we only chose three angles: 30°, 60° and 90°. The idea behind this is, that if multiple points produce identical lines (we see what identical lines means later) at identical angles, then these points must be part of the same line. This is the key to understanding how Hough Transform works. Let’s draw some lines at our data points and see how it looks. The examples show 30° and then 60°.

Starting from the data point as center,
we draw lines going through it,
counter-clockwise. First 30° …
… then 60° …… this image shows all angles for all data points

As it seems, we have a winner here. At 60° we can see that the lines from all three points more or less intersect. Now we need to teach the computer that this means there is a line.

Let’s vote!

As we can see in the previous picture, we have three lines, starting from P1P1, P2P2 and P3P3 in a 60° angle that intersect. As all three data points produce the same line they can be part of the same line.

In this simple example, we only have three datapoints. But in images with noise, these three data points can be only background noise and should maybe ignored. Hough Transform uses a voting mechanism so that after all pixels of an image are processed, it can be determined which of these pixels are background noise and which are actual points on lines.

Back to our example: given is the current angle angle (called θθ), here 60°, and the x/y coordinates of the current data point (current pixel). So θθ describes the angle of the line we draw from the data point. Now we need another parameter to describe the line we just drew. As wikipedia says:

In general, the straight line y=mx+by=mx+b can be represented as a point (b,m)(b,m) in the parameter space. […] for computational reasons, Duda and Hart proposed the use of the Hesse normal form ρ=xcosθ+ysinθρ=xcosθ+ysinθ , where ρρ is the distance from the origin to the closest point on the straight line, and θθ is the angle between the xx axis and the line connecting the origin with that closest point.

So there you have it, we use angle θθ and distance ρρ to describe the line.

In order to count how often a line occurs at angle θθ with distance ρρ, we create a 2D matrix, with the x-axis being the maxium amount of angles we want to check, let this be 180 (although we only check three angles actually: 30°, 60° and 90°) and the y-axis being the maximum of ρρ. The maxium distance from any point to the origin of the image can be a diagonal line from the bottom left of the image (the origin) to the top right of the image – this is the hypotenuse which we can easily compute with some basic trigonometry.

This will create a 2-dimensional array of the size 180 (maximum θθ) by maximum_line_length (which is the maximum of ρρ):

let max_line_length = (img_width as f64).hypot(img_height as f64).ceil();
let mut accumulator: na::DMatrix<u64> = na::DMatrix::new_zeros(180, max_line_length as usize);

view rawhough_transform_accumulator.rs hosted with ❤ by GitHub

So for each data point in the image we already know θθ and then can compute ρρ. With these two paramters we “vote” in the 2D-matrix, that means, we increment the index of matrix[theta][rho] by one. So every point that produces the same θθ and ρρ will cast one vote for this index in the voting matrix. The voting matrix is correctly called accumulator.

Our code will look like this:

// calculate the longest line that can occur in the image (hypotenuse)
let max_line_length = (img_width as f64).hypot(img_height as f64).ceil();
// Create the accumulator matrix, x-axis is angle theta, y-axis is distance rho
let mut accumulator: na::DMatrix<u64> = na::DMatrix::new_zeros(180, max_line_length as usize);
// Flip y axis when reading image, as the image has 0/0 at the top left,
// but it’s easier for us to work with when the origin 0/0 is at the bottom left
for y in 0..img_height {
for x in 0..img_width {
if !detect_edge(x, y_flipped) {
continue;
}
for i in 0..3 {
let theta = (i as f64 + 1.0) * 30.0;
let rho = (x as f64) * theta.to_radians().cos() + (y_flipped as f64) * theta.to_radians().sin();
let rho_rounded = rho.round();
accumulator[(theta as usize, rho_rounded as usize)] += 1;
}
}
}

view rawhough_transform_voting.rs hosted with ❤ by GitHub

This is how we measure ρρ, shown for P1 at 30° and 60°.:

ρρ/θθ for P1 at 30°ρρ/θθ for P1 at 60°

The actual Hough Transform is now finished. The result is an accumualtor which shows what combination of θθ and ρρ appeared most often. The space described by the accumulator is called Hough Space. As a user of the algorithm, we need to define a value to filter the Hough Space. Meaning, this value will decribe how often a combination of θθ and ρρ must have appeared before we consider it a line. In our case, as we only have three datapoints on the same line, so the threshold can be two. In general it makes sense to make the threshold based on the maximum value in the accumulator, so we can say we only consider lines that are at least 75% of the maximum votes.

In order to visualize what datapoints we found, we can also dump the Hough Space as an image. The brightest point in this image is the intersection of two of three data points in our image. For people wearing glasses, like me, I added a larger version of the graph showing the hough space:

Hough Space from three data points in our simple example …… a bit larger.Another example …… and checking more angles produces this hough space, clearly showing the detected lines (the brightest points).

Transforming maxima in Hough Space back to x/y coordinates

After having found lines in the image it’s often required to outline these lines in the original image. This is also useful to see if we actually detected anything interesting.

Given that we already know which combination of ρρ/θθ produce the values of interest, we can now derive a line again from ρρ θθ. The line will be described by P1xP1x, P1yP1y, P2xP2x and P2yP2y. We know that the line we are looking for is described via ρ=xcosθ+ysinθρ=xcosθ+ysinθ, but this equation has two unknows for us: xx and yy. So in order to find out xx we need to know yy and the other way round. But how can we find out yy? We can be pragmatic here and just substitute a value we know.

If we assume that the line we are looking for is infinitly long, we know that at some point it will intersect with the x-axis of our image at 0, and it will intersect with the y-axis of our image at 0. Knowing this we can use 0 as a substitute for xx or yy when solving ρ=xcosθ+ysinθρ=xcosθ+ysinθ. We assume P1xP1x will be 0 – and we compute P1yP1y and we assume that P2yP2y will be 0 and we compute P2xP2x.

In order to find out the coordinates of the detected line, we iterate again over the Hough Space and compute the start and end point of the line:

let accumulator_threshold = 2; // Hardcoded value for our simple example how many votes a combination of rho/theta must have
// before we consider it as a line
for theta in 0..houghspace_img_width {
for rho in 0..houghspace_img_height {
let accumulator_value = accumulator[(theta as usize, rho as usize)];
if accumulator_value < accumulator_threshold {
continue;
}
let p1_x = rho as f64 / (theta as f64).to_radians().cos();
let p1_y = 0.0f64;
let p2_x = 0.0f64;
let p2_y = rho as f64 / (theta as f64).to_radians().sin();
}
}

view rawhough_transform_xy_coords.rs hosted with ❤ by GitHub

There is one caveat though: It is very well possible that the computes coordinates are outside of our image – we will need to clip them to the size of our image, otherwise we won’t be able to plot the line. I ported the Liang Barsky clipping algorithm in my example Rust project in order to clip the lines. The final result looks like this:


The initial, black line, overlayed with a green line that shows which coordinates we computed

That’s how Hough Transform works, explained with a simple example and some hacked-together code. The example code in Rust can be found on GitHub.

enter image description here

http://aishack.in/tutorials/hough-transform-normal/

https://alyssaq.github.io/2014/understanding-hough-transform/

https://stackoverflow.com/questions/4709725/explain-hough-transformation

Amir Masoud Sefidian
Amir Masoud Sefidian
Data Scientist, Machine Learning Engineer, Researcher, Software Developer

Comments are closed.