The most basic implementation of the Hough transform is used to find lines in an edge image like the one below. In this post I'll walk through the basic process of using the Hough transform to find lines in this image.
An image of machinery with edge detection applied. This image would be a good candidate for further processing with the Hough transform. (Source: Wikipedia) |
Naively, we might imagine making this process directly into an algorithm. Take each point in turn, and search through the lines that pass through that point, looking for ones that pass through an unusually high number of points. This would probably work, but I doubt it would work particularly well.
Even if our algorithm works perfectly, accepting all of the best lines for each point and nothing else, we'll be left with many duplicate lines, all of which will likely be subtly different due to rounding error, randomness in the image, and the fact that pixels are discrete. Clustering these lines into a reasonable number of representatives will be costly and potentially complicated.
The idea that we can reduce the number of lines we have to consider by using points from our image is solid, but clearly focusing on one point at a time makes us do a lot of duplicate work. Instead, we should try to focus on all the points at once, letting each point cast some sort of vote for lines that pass through it. We'll have a sort of arbiter that will collect all the votes, then, after the votes have been cast, we'll search over the space of lines for particularly dense clusters of votes, and choose a line to characterize each cluster. This is the idea underlying the Hough transform.
To actually formalize and use the Hough transform, we need to flesh out how the arbiter (which we're going to call the Accumulator from here on out) stores the set of all possible lines, and how it determines the most popular of these lines.
First a brief refresher on points.
There are two major ways to represent an arbitrary point in R2. The Hough transform as it is simplest to implement uses both. We could represent a point as a set of values ( x, y ), where we first move x units in one direction, then y units perpendicularly (or vice versa), or we could describe it as a set of values ( rho, theta ) where rho is the distance from the origin to the point, and theta is the angle from our point to the origin and out the x axis measured clockwise. For the purposes of actual implementation, I'll stick to the convention that theta is between -pi and pi.
In terms of implementation, it can be tricky to make an elegant data structure which can deal with both of these cases. I like to use inheritance ( or in java, an interface ) to give my self two types of points which can be converted into each other when needed.
It's also worth keeping in mind that most systems for representing images use coordinate systems that have opposite chirality from the systems I've drawn in my figures. This turns out not to be an issue so long as we're consistent about it. Still, if you implement your own version of the transform, it's easy to mess up.
Next we need to think about how to concisely and uniquely represent lines. I claim that the space of all lines in R2 is itself two dimensional. While it's not a proof, one way to reason this out is as follows: we can represent any line with two points for a total of 4 parameters. Two of these parameters, however, can be thought of as representing the positions of the points we chose on the line. Since we want to represent a line independent of our reference points, we ought to be able to neglect these parameters somehow.
There is in fact such a way. It is often useful to represent lines by the point on that line closest to to some reference point ( for the Hough transform, we'll typically take this point to be the origin ). This is a really nice way to do things with a single flaw: it doesn't leave us with a way to uniquely represent lines that pass through our reference point. For our purposes, I'm going to gloss over this and assume that there are no lines that pass through the origin. If you're worried that there might be, it's pretty easy to just check for these lines as a last step, since they all by definition pass through the origin.
In the figure below, I've drawn a sampling of lines through a specific point, and marked on each line the point closest to the origin.
Stare at the image above for a moment, and you'll notice something funny. The points closest to the origin on all of these lines look like they might form a circle. This is in fact the case, as is shown in the image below. If we take a circle which has a diameter from the origin to any point p of choice, the angle between the origin, any point q ( misnamed [L] in the figure ) on that circle (except of course the origin or p) will, and p will be a right angle by basic geometry. This turns out to be the necessary and sufficient condition to say that the q represents a line that passes through p.
Even if our algorithm works perfectly, accepting all of the best lines for each point and nothing else, we'll be left with many duplicate lines, all of which will likely be subtly different due to rounding error, randomness in the image, and the fact that pixels are discrete. Clustering these lines into a reasonable number of representatives will be costly and potentially complicated.
The idea that we can reduce the number of lines we have to consider by using points from our image is solid, but clearly focusing on one point at a time makes us do a lot of duplicate work. Instead, we should try to focus on all the points at once, letting each point cast some sort of vote for lines that pass through it. We'll have a sort of arbiter that will collect all the votes, then, after the votes have been cast, we'll search over the space of lines for particularly dense clusters of votes, and choose a line to characterize each cluster. This is the idea underlying the Hough transform.
To actually formalize and use the Hough transform, we need to flesh out how the arbiter (which we're going to call the Accumulator from here on out) stores the set of all possible lines, and how it determines the most popular of these lines.
First a brief refresher on points.
Different ways to represent a point p in R2. |
In terms of implementation, it can be tricky to make an elegant data structure which can deal with both of these cases. I like to use inheritance ( or in java, an interface ) to give my self two types of points which can be converted into each other when needed.
It's also worth keeping in mind that most systems for representing images use coordinate systems that have opposite chirality from the systems I've drawn in my figures. This turns out not to be an issue so long as we're consistent about it. Still, if you implement your own version of the transform, it's easy to mess up.
Next we need to think about how to concisely and uniquely represent lines. I claim that the space of all lines in R2 is itself two dimensional. While it's not a proof, one way to reason this out is as follows: we can represent any line with two points for a total of 4 parameters. Two of these parameters, however, can be thought of as representing the positions of the points we chose on the line. Since we want to represent a line independent of our reference points, we ought to be able to neglect these parameters somehow.
There is in fact such a way. It is often useful to represent lines by the point on that line closest to to some reference point ( for the Hough transform, we'll typically take this point to be the origin ). This is a really nice way to do things with a single flaw: it doesn't leave us with a way to uniquely represent lines that pass through our reference point. For our purposes, I'm going to gloss over this and assume that there are no lines that pass through the origin. If you're worried that there might be, it's pretty easy to just check for these lines as a last step, since they all by definition pass through the origin.
In the figure below, I've drawn a sampling of lines through a specific point, and marked on each line the point closest to the origin.
Sample lines through a point (filled) and the points on those lines closest to the origin (open). |
In fact, if you think about it for a while, you may be able to see that this circle (with the origin removed) represents all the points that represent lines that pass through p. There are multiple ways I know of to show this, so I'm going to leave it as an exercise to the reader.
Once all the points have had their lines tallied, we can find the bins which are most populated (checking to make sure that we have local maxima to avoid duplication), then return the median rho theta value for each as a list of lines. These lines will correspond to the lines in our image, and we'll be done!
I have some code to go with this post, which should be up as soon as I clean it up and document it. In the mean time, though, I'm going to put this post up.
No comments:
Post a Comment