Go to Alexandria's home page
The Library of Alexandria

Bubbles

Alexandria Home | Up One Level

Last updated 4/22/2005

Table of Contents

Introduction
     About this Document
     The Problem
     The Essential Concept
Bubble Growth
     Ever Outward
     Gradient Traversal and Soft Edges
     Pixels as Cellular Automata
     Engulfing Islands
Dumpster Diving for Data
     What's in a Bubble?
     Area and Perimeter
     Bounding Box
     Longest Line
     Predominant Color or Gradient
Next Steps
     Bubbles? So What?
     Polygon Approximation
     Spline Curve Approximation
     Circles and Sticks
Shortcomings of Bubbles
Download My Source Code
Updates
Your Feedback

Back up to the table of contents Introduction

Back up to the table of contents About this Document

I have created a machine vision algorithm for finding contiguous regions by "planting" structures into them that grow like bubbles filling in a hollow space. This document presents in detail this concept and the key mechanisms behind the algorithm.

This document was first drafted on the eleventh of April of 2005 by Jim Carnicelli.

Back up to the table of contents The Problem

In earlier machine vision research (Texture Vision project) I found I was able to write code to find edges fairly easily, but when it came to discovering objects, I realized there was a very important deficiency with the approach I was using. To recognize objects, I assumed I would need to find contiguous regions, but I was only finding linear paths. I had assumed, further, that the paths would tend to close on themselves, but they would branch and get cut off in "softer" areas of an image.

Back up to the table of contents The Essential Concept

It occurred to me that I could guarantee getting a contiguous region if I started with a tiny, continuous loop inside a region and expanded it outward to meet the region's boundaries, never letting the loop get broken along the way. I liken this to a bubble growing to fill a hollow space. The following image illustrates some bubbles grown within an image:

Figure 1: Examples of bubbles
Figure 1: Examples of bubbles

Incidentally, this is essentially the same as the approach I took with my Windows Vision project. I didn't think I'd be able to easily reuse much from a purely rectilinear vision system in general purpose machine vision.

Back up to the table of contents Bubble Growth

Back up to the table of contents Ever Outward

Bubble growth appears to be a macro-scale operation, but it is accomplished by a focus on the smallest units of an image: the pixels. When a bubble is first "planted" in an image, it has a diamond formation consisting of four pixels. That diamond will continue to grow as a diamond shape until it meets a boundary. Figure 2 below shows several frames in an animation of a buble growing from a seed bubble outward to fill the region of the figured man's hair.

Figure 2: Example of bubble growth
Figure 2: Example of bubble growth

Figure 3 below illustrates the process in finer detail so one can see the individual pixels involved.

Figure 3: Bubble growth at the pixel level
Figure 3: Bubble growth at the pixel level

Take careful note of what happens as the bubble meets and pushes around the obstacles. Whereas initially, it seems the pixel loop nodes area always moving outward, you see here that they appear to stop and take up stationary positions at the edges of obstacles while other loop nodes continue their expansion.

Every node in the loop must have exactly two neighboring loop nodes, referred to as its previous and next neighbors. The rules that govern the loop node movement and multiplication ensure that no node's neighbors will ever be more (or less) than one pixel away from it. As the bubble expands, new nodes are needed. They get inserted between neighboring pixels to become part of the continuous chain of nodes.

There are special constraints that are important for ensuring that bubbles are useful and for making certain analyses possible. One is that a pixel loop's edge can never cross over itself, like in a figure "8". The other, which actually encompasses the first, is that a loop's edge must always be touching both the "outside" and "inside" of the bubble. Though this normally won't happen because of the rules that govern loop growth, there is one case, which I'll address later, in which a part of a loop might become engulfed within the rest and so violate this requirement. The result of these requirements is that what's contained by any given loop is a contiguous, "solid" region, not separate regions or regions with "doughnut holes".

Back up to the table of contents Gradient Traversal and Soft Edges

To the casual observer, it may seem sufficient to grow a bubble in a roughly solid colored area, but it becomes quickly clear that most natural subjects with basically smooth surfaces appear as blends of "colors". One common cause is differing ways light reflects off of different parts of the object pictured. Some picture elements, like a clear sky, have subtle shifts from one color (e.g., deep blue at the top of the picture) to another (e.g., a nearly white shade of blue at the horizon).

Figure 4 below of a flower petal illustrates both the above causes -- lighting effects and subtle variations in color -- with a bubble grown inside the petal.

Figure 4: Example of a bubble spanning a smooth color gradient
Figure 4: Example of a bubble spanning a smooth color gradient

It seems reasonable to believe that human vision will treat the petal pictured here as a single thing as it attempts to analyze the scene. Only artists and pinheads like me programming graphics or machine vision systems would likely even notice the distinctions among the pixels in the gradients.

To treat a smooth gradient like the one in figure 4 as a single surface, our algorithm has to be able to seek out subtle shifts from one pixel to the next. Fortunately, this is not terribly difficult. As the bubble growing algorithm addresses each node (pixel) in the outer loop, that pixel's memory of the color it is looking for is compared against the color of the actual pixel it attempts to move into. When new nodes are created to accommodate the loop's perimeter growth, they inherit this color expectation from the original node they are spawned from.

If the color of the next pixel is significantly different from that which is expected, that next pixel is considered an obstacle to further growth. The math behind the comparison is very simple. Any color in this system is represented by a set of red, green, and blue (RGB) light components from 0 (dark) to 255 (light). For example, black is [0, 0, 0], white is [255, 255, 255], red is [255, 0, 0], and yellow is [255, 255, 0]. To calculate the "variance" from one color to another, then, one uses the following equation, where "A" is one color and "B" is another one to be compared:

Traditional machine vision experts might disagree with my choice of equations, here. Typically, brightness is used, which is mathematically equivalent to using my equations with black and white images. That's one reason many MV systems you may have seen use black and white instead of color images. It's not because B&W cameras are cheaper than color, but because they don't give any extra information to an MV system detecting edges using brightness comparisons alone. I did actually try using this algorithm with brightness comparisons, but the results were deeply unsatisfactory. Figure 4 is a perfect illustration of why. Turned into a black and white image, figure 4 loses some of its sharp edges between the yellow petal and the grayish background. Bubble growth will spill out of the petal either if a black and white version of figure 4 is used or if brightness comparisons are equivalently used. We clearly evolved multi-color vision because there's more information to be gained with three components than with just one (brightness).

There's a pitfall one has to watch for with gradients. Study any blurry photo close up and you'll see why. An otherwise sharp edge in a blurry photograph is actually a gradient. The danger of opening the bubble growth algorithm to being able to morph what color it expects over time across a smooth gradient is that a soft edge will be treated as no edge at all. Human vision does not appear to suffer this problem. We have an incredibly strong propensity to see edges, even when they don't actually exist in some pictures.

I implemented a concept of "momentum" in the bubble growth algorithm that looks not just at neighboring pixels, but also at the trend of color shifting over time. While above I suggested that one can look at the next neighboring pixel and compare it to the current one, that's not exactly what happens in my code. As I said above, each loop node carries with it an expectation of what color to be looking for. If the next pixel's color does not vary by more than the allowable threshold, we then subtly shift that expected color in the direction of that next pixel as we move into it. The math, again, is very simple. We have the following equations, where the math for the red channel is identical to that for the green and blue channels:

Here, MaxStep puts a limit on how "fast" the color expectation can shift. If we set this to 255 or higher, we would be guaranteed that in each step, the color expected will be exactly the color of the pixel we land on, which voids the "momentum" concept. A value of 0 would mean we would never shift the color, so we would not be able to have bubble growth expand beyond a nearly solid color. As stated above, then, we compare the difference between the expected color and the actual color to be found in the next pixel we might move our loop node into.

The max-step threshold should not be confused with the variance threshold described above. They serve different functions and their values are determined independently of one another. No matter what value is chosen for the max-step threshold, the variance threshold will still be used to decide if the next pixel is too strongly different from the current expectation.

Limiting how fast we shift our color expectation for any given loop node is key to avoiding getting fooled by blurry edges. The larger the max-step threshold chosen, the more tolerant the algorithm is of soft edges.

The person experienced with machine vision at this point will probably ask a key question that trips up just about everyone doing MV: how do you choose your threshold values? Although I address this in the Shortcomings section below in greater detail, I will say that my answer is just as disappointing as most other researchers'. I freely admit that I had to tweak my thresholds with the various images I experimented with to some degree to achieve certain desired results. There is no universal answer, though -- not even for humans. The answer is subjective, and I'll talk more about that later.

Back up to the table of contents Pixels as Cellular Automata

Up to now, I've talked about the individual pixel loop nodes that define the perimeter of a bubble moving and multiplying according to a set of rules. I realized that expressing those rules in terms of simple if-then statements in code would probably result in a big headache. Taking a page from the history of artificial life, I decided to treat an image as an exercise in cellular automata and express the bubble growth rules in terms of that CA.

If you're not familiar with the concept of cellular automata, the concept is brilliant in its conception, yet fairly simple to state. A CA "world" is composed of a "checker board" of boxes. Each box can be thought of as an entity that can be in one of multiple finite states -- colors or numbers, for example. Time passes in discrete moments, and in each moment, every entity decides what state it will take on in the next step based exclusively on the current state of its eight immediate neighbors -- those that touch one of its sides or corners. There are variations on this theme -- like considering one's own state, looking farther than the immediate neighbors, or looking back in time -- but in our case, the above constraints are good enough. In our case, there are exactly four possible states, which I'll identify later. If my own cell's state in the next step is determined purely by considering my neighbor's current states, then there are 48, or about 66 thousand, possible combinations of current states to consider. We could draft a table with every one of these combinations and, in the last column, write down what the next state will be. That would be a complete rule set.

In our case, the checker-board world is the image, and each pixel can be thought of as an entity (checker square) in the CA. For the sake of sanity and practicality, I took a variety of shortcuts. For one thing, I don't consider every pixel in the image in each step because only the ones that have loop nodes in them in the current step and their immediate neighbors can change states. For another, I don't pre-compute the next-step states of all the entities before moving them on to their next states. In fact, I just follow the chain of nodes and decide along the way, even though I know the changes for the previous node I considered are going to affect the decisions of the current node. These and other tricks help speed things up a lot, reducing calculations by perhaps hundreds or thousands of times.

I said that there are four current states to consider. Any given pixel neighboring the current node under consideration can be considered to be in one of the following states:

And while I could be a cellular automaton purist and express the next states in similar terms, that would be more complicated than it needs to be. Let me speak instead of actions that can be taken for any combination of current states. There are three different actions that can be taken in zero or more steps per moment in time for any given combination of current states.

I should apologize at this point and explain that I've been using the term "neighbor" to refer to two very different things. On one hand, I use it to refer to those eight pixels around any given pixel in the cellular automata context. On the other hand, I also use it to refer to the next and previous nodes that any given loop node is linked to in the chain of nodes that forms the outer pixel loop of a bubble. To help keep the two separate, I'll try henceforth to distinguish between a "neighboring pixel" and a "neighboring node".

On a technical node, it's worth pointing out the data structure and algorithm I'm using to place and find nodes on the pixel grid of the underlying image. It might have been tempting to just create a 2D array of the same size as the image, but that would require a lot of memory. I could also just keep a list of nodes and, whenever I need to fetch the node at X, Y, search through my collection until I find a match. That would take very little memory, but run too slow. Instead, I used a key/value dictionary, where the key is a string with the X and Y coordinates and a comma between them. This may sound odd, but the algorithms built into the .Net Hashtable class, which implements a dictionary, are so efficient that searching for the X and Y coordinates as a string in this way is much faster than it would be with a random search through a list. So if the given "X,Y" key doesn't exist in the dictionary, I know there's not a node at that position in the image. Otherwise, I can easily fetch the node there.

So the essential rules for bubble growth at the pixel level are encapsulated in cellular automata-like lookup tables. I realized that in some cases, the node under consideration would not care what state some of the neighboring pixels are in, so I added a quasi-state called "don't care" to the possible states to search for in any given case declaration. Figure 5 below shows a graphical table of all of the cases I encoded:

Figure 5: Cellular automaton growth rules
Figure 5: Cellular automaton growth rules

It looks like gibberish for a while, but let me explain the symbols in it a little more. First, remember that the candidate node always starts out in the center square. In case 1, the arrow is there to help illustrate its movement from the center to the middle right square. Next, if you are struggling to read them, the words "Prev" and "Next" appear in two blocks of every case. They represent any given node's next and previous neighboring nodes. The dark blocks represent obstacles, which are anything that can impede outward growth. Next, note that there are some gold colored blocks that represent newly created nodes. Next, the lines connecting the centers of blocks are there to show exactly how all the nodes after changes are made will be hooked together. Finally, the gray blocks appear in cases where the node being considered will actually delete itself.

Before continuing, I should point out that all the cases above are repeated three more times, each combination rotated 90°, 180°, and 270°. As we'll see below, the cases above deal with one of the four cardinal directions, so these rotated copies take care of their rotationally symmetric brothers.

It at first appears that the cases above are a collection of random combinations of states, but look more closely and you will see patterns. For example, the chain of resulting nodes always flows clockwise from previous to next neighboring node. This is because a bubble's outer loop must by definition flow clockwise. If we had a bubble with all of its next- and previous-neighbor links reversed and so had it flowing counter-clockwise, the bubble would actually collapse inward instead of expanding outward. That brings us to the point that all of the above cases entail outward growth or cleanup of junk nodes left behind.

Next, as explained above, the cases here tend to focus on expanding rightward. This is not necessary, but helped me during analysis to keep from creating redundant cases to deal with downward, leftward, or upward growth, since they follow the same basic rules. The program has to deal with all four orientations, but my solution was to write code to take each of the defined cases and rotate it clockwise 90° and repeat that until each case has four copies with their own orientations.

One other pattern you may notice is that new nodes are always created "behind" the current node under consideration, where a node's previous neighbor is considered to be "behind" it in the loop. That is, they are always placed after the previous node and before the current node (green) in its new position. This is for the simple reason that processing of nodes in the loop proceeds from node to next neighbor. I didn't want those new nodes to be next in line and so be processed before the next time around. Otherwise, the growth of new nodes might start to appear cancer-like or spiraling. (I didn't really want to find out.)

Incidentally, you may recall that I said that neighboring nodes are always exactly one pixel away from one another and wonder why it appears here that the current node is sometimes allowed to move to more than one pixel away from its next and previous neighbors. This is an illusion, though, that appears because these state diagrams actually blend present and future states. The next and previous nodes are always right next to the center block, where the current node originates from. For the current node to move to a position more than one block away, it is necessary to create one or more new nodes that bridge the gap that is formed by the movement.

Cases 34 through 36 cover situations where, for whatever reason, a node gets squeezed in behind the outer perimeter of the bubble. It can never again grow outward, and the loop will still be continuous if the current node weren't there, so we just delete it and patch the next and previous nodes together.

Given the patterns you see emerging, you may now be wondering if I had some mathematically beautiful master pattern of cases to work from. I entertained the fantasy that I was for about the first ten cases, but then quickly realized the cases were just dribbling in largely at random. They don't even add up to an even numbered count. Oh, well. I can't even guarantee that there aren't other cases I might have missed, too, but I think it's fair to say that this set is thorough enough to be practical.

Finally, I should give credit where it's due. I first heard of the idea of using cellular automata for machine vision from a brief mention in Steven Levy's 1992 book, Artificial Life: A Report from the Frontier Where Computers Meet Biology. I would recommend this book, which is still as prescient today as it was when it was first published, to anyone wanting to get an introduction to AL and some great background into some aspects of modern AI research. The mention of using CA for MV said nothing about how CA was used for MV, but I wouldn't be surprised if it was used for finding contiguous regions in a similar sort of way as I'm describing.

Back up to the table of contents Engulfing Islands

One challenge I had to deal with is cases where a bubble runs into an obstacle that it wraps completely around as it grows. Eventually, the two sides of the bubble will meet on the other side. This seems like it should present no problem at first, but as the bubble continues to grow, one can end up with a large bubble that has a long seam running from its outside to some part deep inside. This would almost certainly get in the way of trying to do pattern matching, especially if there are many such seams.

Assuming one would want to get rid of the seam that results from this engulfing of an island of obstacle pixels, there's the practical question of how to do it. Figure 6 below illustrates an island that's just been engulfed by a bubble:

Figure 6: Engulfing an island
Figure 6: Engulfing an island

When I first wrote the code I used to test the bubble concept, I envisioned that the bubble would break itself in two. I figured out some simple algorithms for determining when the conditions were right for doing so. Figure 7 shows some example cases. The left-most before and after illustration shows the shadow of the larger bubble to help orient you to the important parts. Imagine each of the other four illustrations have the same sort of relationship to their parent bubbles, left out to keep the image size manageable.

Figure 7: Breaking away
Figure 7: Breaking away

In each case in figure 7, the node under consideration has another node that is part of its own bubble but is not its next or previous neighbor touching; i.e., exactly one pixel away from it. The light purple blocks represent those touching parts. But because one requirement of a bubble is that its loop nodes are all exactly one pixel unit away from their next and previous neighbors, I made it a rule that the candidate node's next and previous neighbors to be adjacent to its opposite's previous and next neighbors. The idea is that we would then delete the current node and its opposite and join the open ends of the two sub-loops together to then form two separate loops. In figure 7, you may note the light grey arrows indicating the original connections from node to node in each part. They go away when the light purple blocks go away. The bright red arrows indicate the new links created across the two parts of the loop to seal up the two open wounds and form two new loops.

I said the original goal was to break off a new, independent, inner loop. I came to realize that this was not as straightforward as it sounds and so decided instead to simply delete the inner loop. Looking back, I know I no longer need to have three nodes in a row touching three opposing nodes. That would increase the chance of a proper break-away because I'd only need to find two touching nodes and just delete all "inside" nodes.

One reason I gave up on the idea of keeping the inner loop is that, unlike normal bubbles, they would be facing inward. Tracing from node to node would mean traversing counter-clockwise, opposing the definition of a bubble. It would "prefer" to grow inward. We could, of course, just swap every node's next and previous node links to make it clockwise. But if we didn't artificially halt bubble growth, it would just start growing outward until it meets the outer bubble. What would be the use? And if we artificially halt the growth, we'll have a bubble whose perimeter is outside what it contains, rather than being just inside, like with normal bubbles. I considered tracing around the perimeter just inside to form a new bubble and then discarding the "outer inner" bubble, but it seemed nearly pointless and very complicated.

One interesting problem I had to solve, once I found two touching candidates for break-away, is which side -- the next-node side or the previous-node side -- represents the inside of the bubble to be deleted. You and I look at the bubble and it's obvious which part is inside, but it's not so obvious to the program. I thought about using some tricks used in traditional ray tracing graphics, but they would be terribly inefficient. After much thought, I came to the realization that I could take advantage of the fact that I defined a bubble as necessarily having a clockwise flow from node to next node. Take another look at figure 7. Put your finger on the upper of the two light purple nodes in the "Before" figure and start tracing in the direction of the arrows. You'll find that when you reach the other light purple node it's touching that you have traced a counter-clockwise direction. Try the same thing starting from the lower purple node until you touch the upper one. You'll trace an overall clockwise direction. It turns out that, no matter how complicated the shape of the pixel loop for a bubble is, it will always be true that the predominant direction of flow for the outer part of the bubble will flow clockwise and for an inner piece to break off will flow counter-clockwise.

The hard part, then, was figuring out a way to measure the predominating rotation direction. At first, I tried to imagine a wheel with its axle somewhere inside the sub-loop under consideration being turned by a crank as a person walks from block to block on it. That was really complicated and didn't always work. Then I realized I could get a direction (north, south, east, west, northwest, northeast, etc.) by pointing from one node to its next node. In the next step, I would see what the new direction is. I knew it could never be a 180 degree switch because two loop nodes can't occupy the same position. If it went from north to northeast, for example, it would contribute a 1 to a cumulative rotation. If it went from north to east, it would contribute a 2. North to southeast would contribute a 3. Conversely, if it went from north to northwest, that would contribute a -1. And so on. At the end, so long as the sub-loop under consideration has at least three nodes in it, we would have either a positive (clockwise) or negative (counter-clockwise) cumulative change in direction. So we would know whether we traversed the outer or inner sub-loop and so which one to delete.

One other interesting problem I had to solve is determining if a place where nodes on a bubble meet represents a case like we're discussing of engulfing an island or, alternatively, a case where a bubble grows outward through two barriers with a narrow neck. We wouldn't 'want to destroy that finger accidentally. I reasoned that the key is to know if the "outside" of the bubble is between the two nodes or outside them. If it's between them, then the two are meeting because of an engulfing. The hard part was figuring out which side of the loop node is inside and which outside. Knowing that a bubble's loop nodes traverse clockwise, I figured out that one could identify all the pixels outside the bubble that the current node touches by traversing clockwise from the previous node to the next node along those pixels that touch the current node. So if the node that's touching the current node occupies one of those outside pixels, we know this is an engulfing case. If not, it's an extended-finger case.

By now it may be apparent to you that so much of the success of this algorithm rests heavily on the assumption that bubbles have a clockwise traversal direction to them. This is an excellent illustration of the brilliant principle a colleague of mine once explained to me that the more one constrains a model's behavior, the more efficient can be the algorithms for implementing it. The coding is usually more complicated at first, but the constraints often come with hidden benefits that one becomes aware of only later once past the basic kernel of the original problem.

This process of engulfing and eliminating "islands" of obstacles has a very handy side effect. Most real-world images contain noise or small one- or two-pixel details that one can safely ignore in most cases that tend to get in the way of many region-growing techniques. In this algorithm, though, we just wrap them up and throw them out like garbage bags.

Back up to the table of contents Dumpster Diving for Data

Back up to the table of contents What's in a Bubble?

It's a fun diversion to sit in front of a computer and click away at parts of a picture and watch bubbles grow, but that's not the point. Once we have a bubble, we should be able to extract some useful information that can aid in some practical goal, like identifying objects. Fortunately, there's some low-hanging fruit here.

Back up to the table of contents Area and Perimeter

One of the easiest pieces of information we can collect is the length of the perimeter. This may sound complicated, but it's as simple as counting how many nodes there are in a bubble. In my system, I actually just keep track of the count from the original seed and increment or decrement it as new nodes are added or old ones are deleted, so the perimeter is already calculated. Technically, it's not a true perimeter in the mathematical sense, because the distance between the centers of two pixels beside each other is slightly less than the distance between the centers of two pixels diagonally touching. But our fudge factor here doesn't cause any problems (unless you're a mathematician).

Calculating the area is a little more of a challenge. I didn't bother doing this, but it shouldn't be too difficult. One way would be to use a classic flood-fill algorithm, count how many pixels get filled, and add the perimeter to it. One way that's more memory-intensive but faster would be to keep a count along the way in the same fashion as I keep the perimeter, adding to or subtracting from it with each change to the perimeter. One place where that fails is where we engulf an island and then delete the inner, inside-out bubble. We'd have to calculate the area taken up by the island when we delete it and add that value to the running total. That would probably require the same flood-fill algorithm described above. Still, this complex approach would probably be more efficient than flood-filling the entire bubble.

Back up to the table of contents Bounding Box

Computationally, one can do useful things with a bounding box, which is simply a rectangle just large enough to fully contain the bubble. This is easy to compute by cycling through each loop node and comparing the left, right, top, and bottom edges of a candidate rectangle against the node's position, nudging the rectangle's edges ever outward as needed. The rectangle starts out being exactly the same position and size as the first node in the loop.

Back up to the table of contents Longest Line

Many shapes, like sausages or flag poles, can be thought of as elongated rectangles. It can be helpful to know the orientation of such clearly orientable shapes. A simple way of doing so is to find the longest line through the bubble.

To find the longest spanning line is easy. Traverse the entire loop once. For each node considered, traverse the entire loop again. Calculate the distance between the two points being considered. If it's longer than the current best (starting with zero), make those two points and that length be the best found so far.

Figure 8 below illustrates some bubbles and their computed longest lines:

Figure 8: Some bubbles and the longest spanning lines through them
Figure 8: Some bubbles and the longest spanning lines through them

As one can see, some of the shapes clearly fit the elongated-rectangle idea and so lend themselves to finding their orientation using longest line. The players' leg pieces are great examples.

One example of a nasty problem is in the case of the left player's thighs, which were both captured in one single bubble. The longest line spanning that bubble goes outside the bubble. I didn't bother doing so, but it would have been fairly easy to traverse that line to see if any of the pixels in it are outside the bubble. We might use such a case as a way to disqualify a longest-line calculation from being used.

Incidentally, it might also be helpful to calculate a shortest spanning line, too, but chances are pretty good that the winner would go to some small local bulge if the perimeter of the bubble is ragged.

Back up to the table of contents Predominant Color or Gradient

One helpful thing to know is what the predominating color of a bubble is. One good approximation would be to take the color of the pixel in the center of the bubble seed when it is planted and assume the rest of the bubble contains like colors.

A more accurate way would be to take all the pixels contained by the bubble -- including at the perimeter -- and average out their colors. That can be computationally expensive, though.

But maybe you actually want to recognize that the bubble spans a gradient instead of a homogenously colored thing. The first thing one would have to do is define the nature of the gradient to report on. For example, one could assume the gradient is linear with a definitive orientation. Perhaps we might assume it flows in the direction of the longest line described above. Or perhaps we assume it has three corners and a blend between. Or perhaps some more complicated definition. Once that's defined, we have to take a sampling of colors at the end or corner points. That probably means sampling multiple pixels around each corner and averaging them out to make sure we don't have noisy pixels used incorrectly to represent predominating colors.

Back up to the table of contents Next Steps

Back up to the table of contents Bubbles? So What?

If you've read this far, you've probably already formed the opinion that I haven't really created any sort of machine vision. I agree. This isn't much. It's just a modest little tool, but sorely lacking in real significance.

There are two reasons I'm publishing the results of my little experiment. First, I'd like to think the algorithm I've created might give a tool to other machine vision researchers to build on or add to their existing tricks.

The second reason for publishing this is more basic. There are a lot of web sites devoted to the subject of machine vision. Unfortunately, in my research, there are few that do a particularly good job of explaining machine vision algorithms in a way that can be understood by a lay researcher like me. I've tried to describe many of my data structures and algorithms in somewhat agonizing detail because I want this stuff to be approachable to programmers, even ones with no artificial intelligence or machine vision background. I like spreading what little knowledge I have on the subjects.

So how can one use bubbles to solve practical machine vision problems?

Back up to the table of contents Polygon Approximation

One interesting application of a bubble is to reduce the ragged edge to a polygon composed of straight edges. Perhaps a picture of a cube might generate a roughly parallelogram-shaped bubble for one of the surfaces that we might be able to approximate and ultimately use for recognizing the cube or perhaps just figuring out its orientation if we already know it's a picture of a cube.

Figure 9 below illustrates an example of this idea. Note the bubble closest to the bottom with the four-sided polygon drawn over its tattered edge:

Figure 9: Polygon approximation
Figure 9: Polygon approximation

I had intended to add polygon approximation to my code, but thought that was a little too ambitious for this small project. I'll probably do so in a future variant on this theme.

How would one do polygon approximation? I can't say for sure because I've not done it, but I think it should not be terribly difficult. One method I thought of would be to start with some arbitrary node in the loop. That would be the start point for a single line segment. One would proceed forward along the nodes. The current node under consideration would be the end-point for a candidate line segment. One would continue forward in this way until the line segment no longer looks like it seems a decent match for that part of the edge. Then one would take the end-point for the previous acceptable line segment as the start point for the next and would proceed forward to create the second line segment, and so on.

How would one know that the line is starting to deviate too much? This is the hard part, and there again are some arbitrary thresholds one would have to set. I would say that probably a good algorithm to use would be to take each node between the nodes that compose the end-points of our candidate line segment and calculate how far they are from the segment. How to do it? The math isn't terribly difficult, but I'm too lazy to formulate and prove it here. Figure 10 below is a great substitute for actual math:

Figure 10: A candidate line segment and the deviation of loop nodes from it
Figure 10: A candidate line segment and the deviation of loop nodes from it

Suffice to say that the distance to the line is measured along a line drawn from the given node's center and exactly perpendicular to the line segment. So one might simply add up all the distances from node to line and then divide by the number of pixels to get the average deviation. Once the average deviation gets above a certain threshold, we would say it's time to step back one node and call that the end-point.

One problem with this trick is that the rougher the edge -- and figure 9 shows some pretty rough-edged bubbles, given how perfectly straight and crisp the windows appear to be -- the greater will be the average deviation measured. Common sense should tell one that that shouldn't stop us from drawing straight lines. And setting the threshold higher will probably mean we screw up on a lot of less straight bits. Perhaps one thing we might add to the mix is whether the deviation is to the "left" or "right" of the line segment and try to stay in the middle of lots of little deviations. Or we might say we'll allow the deviation to be no larger than a certain amount for any single pixel instead of averaging them all out.

Or maybe we might add a feedback mechanism that takes a first stab at coming up with a polygon and then tweak some thresholds and try again. Maybe the goal would be to get an acceptable balance between seeking the fewest line segments and the smallest deviations.

I should point out that the node we start the process off with would probably often be in the middle of a "real" line segment. We probably wouldn't want two separate line segments (first and last) being basically parallel and connected, so how would we resolve this? One simple way would be to always discard the first line segment drawn and expect to draw it again as the last segment. Another way might be to see if they are basically parallel and, if so, replace them both with a single segment as a final step.

One other thing I notice in the example I contrived in figure 9 is that my polygon is well within the outer bounds of the window pane. I can tell my bubble stopped not on that outer boundary, but just inside it. That's what I intended. But maybe a polygon-fitting algorithm should seek to fit the polygon mostly outside the bubble. How would one do this? I'm not sure, but one possibility would be to come up with the best approximation using the tricks described above and then create line segments that are parallel to each, but have each one offset enough to put it outside at least 90% of the pixels in the tattered edge of the bubble. (Yet another threshold value to tweak.) Calculating how far to offset should not be difficult if we're already calculating the deviations of each pixel along a line segment any way. And joining the new line segments to their neighbors is a simple matter of finding the intersections of infinitely long lines extended through the line segments.

One interesting benefit to polygon approximation is that one can look out beyond one bubble to see if its line segments line up closely with those of other bubbles. Imagine being able to recognize, for example, that the window in figure 9 above is vertically in line with the window above it. That's exactly the sort of pattern your own eyes pick up naturally. In fact, if I added a tree branch across the middle of the picture, you'd see right "through" the tree that the straight lines continue from one side of it the next. This kind of intuitive recognition of the continuity of edges through obstacles should be a part of a good general purpose machine vision tool kit, too.

It seems to me one can use a polygon to characterize a bubble's shape in a few ways. For one, we could calculate the angles among the connecting line segments. We could count up the total number of vertices, of course, but then count up the number of concave corners -- the ones whose angles cut inside the bubble instead of bulging outward. We could then use some simple math to come up with a ratio of how "dented" our bubble is, which can easily be used in pattern matching. We can also use the number of segments found to narrow down the number of possible shapes that might actually match it. Four segments could be considered likely a rectangle, for example.

Back up to the table of contents Spline Curve Approximation

If polygon approximation doesn't make your head spin enough, consider trying to approximate spline curves instead. Splines are smooth curves used by engineers and graphics people every day. There are several different kinds and they each have their own mathematical formulations, but one that seems to best suit the needs of fitting our bubbles is a kind called a "natural" spline.

Figure 11 below shows an illustration of a cool spline-drawing Java applet I found on the personal web site of Mizutori Tetsuya:

Figure 11: Screen shot of a spline-drawing applet
Figure 11: Screen shot of a spline-drawing applet

The little squares in figure 11 represent points I placed in the image and the curve near them is a natural spline that approximates a smoothed-out version of a polygon (poly-line, actually; it's not closed) drawn from square to square.

It's tempting to think this might even be easier than polygon approximation, from a programming standpoint. Just take maybe every tenth loop node as a point on a natural spline and go from there. But I doubt the results of such a cheap algorithm would be satisfactory.

And there's a more basic question of what one would do with the resulting spline. Yes, it wouldn't be as tattered as the original bubble in many cases, but so what? PhotoShop makes great use of this technique in one of its newer region selection tools. And I suppose if your goal were to, say, abstract a cartoon-like vector drawing of a person's face recognized in an image, this might be useful. But as for recognition, I'm not sure it would be very helpful. Perhaps it could be used to find tangent lines' directions in some cases. I'm pretty sure for the rectangular windows of figure 9 in the previous section, though, going for splines instead of polygons would do more harm than good.

How would one know when to prefer a spline over a straight line segment? I don't know, actually. Ideally, one would be able to support both. I haven't given enough thought to how to decide when to use which to give useful suggestions.

Still, I thought it worth mentioning this idea, because there may yet be some interesting applications that could come of it.

Back up to the table of contents Circles and Sticks

One thing I've been trying to determine is a good way of characterizing the shape of a bubble. I've come up with all sorts of ideas, but not a single one seems fully satisfactory for lots of cases. One that shows a little promise I like to call "circles and sticks".

The idea seems straightforward. Look for corners, circular cusps, or "dead end alleys" and plant circles in them. Then start drawing lines to connect all of the circles planted. Those that cross significantly outside the bubble would be thrown away. Figure 12 below illustrates this in part, though I got too lazy to actually draw all the possible circles and sticks.

Figure 12: Circles and sticks
Figure 12: Circles and sticks

The circles and remaining connecting lines may indeed be useful enough to characterize some shapes in ways that could be used with traditional pattern matching techniques like neural networks.

How would one determine where to plant circles? I'm not entirely sure of a very efficient way, but think I know of at least one possible way. Start with the assumption that circles must be within a certain radius range (yet more thresholds to tweak). Starting with one node in the loop, we would traverse along the entire loop. For each node we come across, we would step forward some number of nodes that's the same size as the minimum radius or some proportion of it. Looking at the pixels from the current node to that forward node, we would test to see if there is a trend towards cusping inward. One way to do this is to take the node in the middle of the loop segment we're studying and see if, together with the end points, it forms a roughly isosceles triangle -- one whose "legs" from the center point are equal in length. If it does and if the triangle points outward, then it's not hard to fit a circle to these three points. We would calculate the radius for it and, if it's in the range we consider acceptable, we can continue. We then take each node in our loop segment and calculate how much its center deviates from the edge of the circle we are considering. Since we know where know where the circle's center point is, we need only calculate the distance from each node to the center point and subtract the circle's radius to get the deviation. We then use the same kind of techniques described above for polygon fitting to decide if the deviations are acceptable or not. Figure 13 below illustrates this idea a little better:

Figure 13: Fitting a circle to a segment of loop nodes
Figure 13: Fitting a circle to a segment of loop nodes

We would continue extending our end point forward along the loop, looking to see if we can still find circles that fit. At some point, naturally, we would stop finding good matches.

One thing one would also have to do is check the other nodes in the loop to make sure they mostly don't fall inside our circle. If they do, then our circle is too big. We only want circles that are inside the bubble.

Back up to the table of contents Shortcomings of Bubbles

I've tried to make clear that bubbles are not an end in themselves for machine vision, to my thinking. But in fairness, it's also worth pointing out some of the basic shortcomings of the bubble concept.

One of the biggest shortcomings is that there is no obvious way to decide where to start growing bubbles. One possible approach would be to start at the center and, once the first bubble is grown, start planting new ones just outside its perimeter, one at a time, until the whole image is basically filled with bubbles. I doubt that would be good enough for a lot of cases. Would one really want a picture of a tree with lots of little, well-defined leaves in it filled with thousands of tiny bubbles and all the follow-up computation that comes with them?

Actually, though, maybe this isn't so much a shortcoming as an opportunity to consider a different way of thinking about machine vision. I know I'm not the first to suggest it, but there seems some currency in the idea of making machine vision more like the darting-gaze approach used by your own eyes. When you read a page of text, do you take a snapshot of the page and then mull over the image? No; your fovea -- the central, detail-rich portion of your vision -- is brought to focus on a very small area of the page only a few words in size and your eyes trace along to focus on whatever is next in line to read. This happens in all things. You may have a visual awareness of the world around you, but your eyes -- and often your head -- swivel around the panorama to focus on things of interest. Why can't a bubble-based vision system do the same? Maybe an MV system can shift a central focus point around the frame of a camera's video stream as action takes place, building an "awareness" of the total scene's important features, but only focusing on some specific feature of interest at any given time by "bubbling it" to study its shape, size, movement, and so on.

More generally, this project has added weight to my growing conviction that human vision is proactive. I don't think we simply absorb images passively like we ask machine vision systems to do. I think we build 3D models in our brains of the things we see. Most of our visual lives are spent silently taking in small facts about how those objects are coming and going in those models and we update our models to fit those facts. Only on rare occasions do unexpected things creep in. When that happens, the lower-levels of one's visual system reaches out to the higher levels and sometimes one's highest level of awareness to help make sense of the unexpected. You watch cars go by on a busy street, but if you suddenly saw one of them shoot straight up, even if it was in an orderly fashion, it would catch your attention and you would look at it.

I think that ultimately, general purpose machine vision of a level of sophistication similar to our own will probably not come to fruition until this model-building function is added to the picture. I think it speeds up processing for us, but it also seems a basic necessity for dealing with a visually noisy world.

I guess it goes without saying that my sample code doesn't do anything like this.

One thing I'm disappointed by but not surprised at is how badly my bubble growth, like many other flood-fill based systems, deals with spilling over some boundaries I don't want it to and failing to bleed out to the edges I do want it to. I think this is probably a place where expectations play a role. Having a bubble detect a possible rectangle should lead to a rough approximation of that rectangle and then a finer-grained set of attempts to fit the rectangle to the underlying pixels. That is, there has to be a back and forth process in which lower levels provide raw information and higher levels negotiate with the lower levels over possible interpretations and outward projections of models for comparison with the observable.

Again, my program doesn't do anything like this.

Sadly, my program is also a bit slow. I've learned not to despair over this, though. While I'm always trying to take advantage of obvious avenues for optimization, I find the most important thing is to get a first prototype of a new concept working. Cut the path and someone will eventually pave it.

Back up to the table of contents Download My Source Code

I'd like to think a programmer can take what I've described above and reproduce it, but there's something nice about seeing source code. Here it is for download:

Download the ZIP file (original)

Download version 2 (total rewrite after this essay was published)

Admittedly, my test UI is a pretty bad. I'm not proud of it at all. You'll have to change the code to do simple things like select a different image to work with. But then, I didn't intend for it to be a practical end in itself. Here's a screen shot of the complete UI:

Figure 14: Sample program's user interface
Figure 14: Sample program's user interface

To start growing bubbles, just click on the image on the right side and watch the fun. Following are some pieces of code you might want to tweak to get somewhat different behaviors:

I paid much more attention to the core algorithms and so largely separated them into their own classes. They can easily be packed into a separate DLL project, but I didn't see the point for my tests. Besides, it's my first take at this, so I was much more lax in my commenting and good entity naming standards than I usually am. I usually do UIs in VB.Net and core engines in C#, but not this time. C# is great, but it is slower to code in in most cases. I didn't want to be slowed down by that for a simple demonstration system.

Bubble.vb is the heart of the bubble engine. Following are some things you might want to tweak to get different behaviors:

One final note. Although I have found that this code is mostly stable, there are some odd cases of bubble growth where, for whatever reason, island engulfing doesn't work properly and the program crashes. You may see it if you work with the flower image, so don't think it's just you if you come across it. You'll probably find that if you reproduce the case, you'll get the exact same crash, a good indication that you've come across a genuine bug.

Back up to the table of contents Updates

5/22/2005 - Code rewrite (version 2)

Some time after first publishing this essay, I realized there was more I'd like to do with bubbles. To get there, though, I thought I should rewrite my bubble code so it would be more object oriented and otherwise easier to build applications on top of. I never did get to the later application work, but I did get a more solid and efficient algorithm. Instead of the demo and engine code being one, they are separate projects. Instead of all VB.Net, the demo UI is in VB.Net and the engine is in C#.

Significantly, instead of sweeping engulfed islands of stuff under the carpet and ignoring them, I recognized that they are important for object recognition, too. I now make a distinction between a bubble's outer loop and its collection of zero or more inner loops.

One other small nicety is that each pixel loop also comes with a bounding rectangle. This can be useful in determining relative size or in optimizing calculations that might cover the full extents of a bubble.

To download the newer version, go back up to the Download My Source Code section and select version 2.

Back up to the table of contents Your Feedback

I encourage you to let me know what you think of this project or if you intend to use this in your own projects. Does this help solve some problems for you? What limitations get in the way of your research? To be sure, I don't really want to be a help desk for the software, but you're welcome to ask if you want some advice on how to apply these concepts to your work. Drop me a line.

Send me email.


Go to Alexandria's home page Copyright © 2012 The Library of Alexandria. All rights reserved.
Produced in cooperation with Carnell Information Systems, Inc.