The Library of Alexandria

Topics in Machine Vision

Examples of Patch Equivalence in Stereo Vision

Table of Contents

Data Points
Explanation
     What's in the Figures?
     Interpretation
     Room for Improvement

Data Points

Following is a series of illustrations of the patch equivalence concept in action in the stereo vision context. It shows equal numbers of successful and failed attempts (15 of each) to find, in the right image, the patch corresponding to one I hand-selected in the left image. In each case, the search moved the candidate patch one pixel at a time over the entire right image. The algorithm selected whatever it considered the closest match, without any bias in either the horizontal or vertical direction.
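
The exhaustive search described above can be sketched roughly as follows. This is a minimal illustration, not the code used for these experiments: it assumes grayscale images stored as lists of rows of 0-255 integers and a mean absolute per-pixel difference as the measure, and the names patch_difference and find_best_match are hypothetical.

```python
def patch_difference(left, right, lx, ly, rx, ry, size):
    """Mean absolute per-pixel difference between two size x size patches,
    as a percentage from 0 (identical) to 100 (completely different)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(left[ly + dy][lx + dx] - right[ry + dy][rx + dx])
    return 100.0 * total / (255 * size * size)


def find_best_match(left, right, lx, ly, size):
    """Try every patch position in the right image, one pixel at a time,
    and return the (x, y) of the position with the smallest difference."""
    height, width = len(right), len(right[0])
    candidates = [(rx, ry)
                  for ry in range(height - size + 1)
                  for rx in range(width - size + 1)]
    return min(candidates,
               key=lambda pos: patch_difference(left, right, lx, ly,
                                                pos[0], pos[1], size))


# Tiny synthetic pair: a bright 2x2 block that appears two pixels further
# left in the "right eye" image than in the "left eye" image.
left = [[10] * 8 for _ in range(6)]
right = [[10] * 8 for _ in range(6)]
for y in (2, 3):
    left[y][4] = left[y][5] = 200
    right[y][2] = right[y][3] = 200

# The hand-picked left patch covers the block's upper-left corner.
print(find_best_match(left, right, lx=3, ly=1, size=3))  # (1, 1)
```

Note that nothing here restricts the search to a particular row or column range, which matches the "no bias" condition of the experiment.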

See below for a detailed explanation of what's going on here and my interpretation of the strengths and weaknesses of this algorithm.

[Images for each set, not reproduced here: Left Eye, Right Eye, Similarity Field]

Set  Difference Range  Close Matches  Certainty  My Opinion
  1  39.38%            2.06%          97.94%     Good Match
  2  41.86%            2.31%          97.69%     Bad Match
  3  30.42%            6.75%          93.25%     Bad Match
  4  29.6%             4.48%          95.52%     Good Match
  5  31.91%            6.71%          93.29%     Good Match
  6  31.57%            1.08%          98.92%     Bad Match
  7  39.68%            0.04%          99.96%     Good Match
  8  55.93%            27.81%         72.19%     Bad Match
  9  31.1%             5.71%          94.29%     Bad Match
 10  45.82%            2.79%          97.21%     Good Match
 11  29.76%            9.13%          90.88%     Bad Match
 12  15.06%            0.04%          99.96%     Good Match
 13  15.07%            0.88%          99.13%     Bad Match
 14  27.25%            7.73%          92.27%     Good Match
 15  12.71%            1.25%          98.75%     Bad Match
 16  24.05%            0.13%          99.88%     Good Match
 17  17.55%            4.02%          95.98%     Bad Match
 18  19.99%            3.77%          96.23%     Good Match
 19  20.89%            6.13%          93.88%     Bad Match
 20  25.22%            0.31%          99.69%     Good Match
 21  48.8%             0.1%           99.9%      Good Match
 22  54.86%            4.02%          95.98%     Bad Match
 23  31.13%            2.79%          97.21%     Bad Match
 24  38.02%            0.08%          99.92%     Good Match
 25  42.02%            0.02%          99.98%     Good Match
 26  66.68%            14.71%         85.29%     Bad Match
 27  37.52%            0.02%          99.98%     Good Match
 28  62.01%            22.71%         77.29%     Bad Match
 29  47.78%            0.35%          99.65%     Good Match
 30  73.74%            9.83%          90.17%     Bad Match

Explanation

What's in the Figures?

The data figures above are laid out as follows. Each row represents a single snapshot of what the left and right eyes (cameras) see at a given moment. You will typically see a run of about ten nearly identical image pairs; what changes from row to row is the pair of red boxes marking the patch pair for that row's left and right images. I hand-chose the patch in the left eye's image. The patch in the right image was then chosen by a simple patch equivalence algorithm that considered every possible location in the right-hand image.

The right-most, black and white image is interesting. It shows the degree of similarity of each location considered. The blacker the pixel, the more similar that location was to the stationary left-hand patch. The little red dot in each of these "similarity field" images indicates the best match chosen.
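
One plausible way to render such a similarity field is sketched below. It assumes the per-location differences have already been computed and that they are scaled linearly so the smallest difference becomes black and the largest white; the actual scaling used for the figures is not stated, and the function names are hypothetical.

```python
def to_grayscale(field):
    """Map a 2D list of differences to 0-255 gray levels.
    Smallest difference -> 0 (black), largest -> 255 (white)."""
    lo = min(min(row) for row in field)
    hi = max(max(row) for row in field)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat field
    return [[round(255 * (d - lo) / span) for d in row] for row in field]


def best_location(field):
    """Return the (x, y) of the smallest difference: the 'red dot'."""
    positions = [(x, y) for y, row in enumerate(field) for x in range(len(row))]
    return min(positions, key=lambda pos: field[pos[1]][pos[0]])


# Made-up differences (in percent) for a 3 x 2 grid of candidate locations.
field = [[40.0, 22.0, 35.0],
         [18.0, 2.0, 30.0]]

gray = to_grayscale(field)
print(gray[1][1], gray[0][0])   # 0 255
print(best_location(field))     # (1, 1)
```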

To the left of each triplet of images is a set of basic statistics. The "Difference Range" statistic is the gap between the best (smallest) and worst (largest) of all the differences measured throughout the image. Each difference lies between 0% (no differences) and 100% (completely different). The "Close Matches" statistic is the percentage of all locations tested whose difference fell below 5% of the difference range; that is, those candidate patch locations that pretty closely resembled the left-hand patch. Though I considered some more interesting alternatives, I simply calculated the "Certainty" statistic as 100% minus the close-matches statistic. Naturally, the "My Opinion" value is how I judged the resulting match. I was slightly liberal in that I accepted matches that might be one or perhaps two pixels off from what I might have hand-picked. Still, it should be pretty clear from the bad matches that they weren't just slightly bad; they were pretty awful.
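
The three computed statistics can be sketched as follows. One detail is an assumption on my part: the text does not say whether "below 5% of the difference range" is measured from zero or from the best difference, and this sketch measures it from the best difference. The function name match_statistics and the sample values are hypothetical.

```python
def match_statistics(differences, close_fraction=0.05):
    """Return (difference_range, close_matches, certainty), all in percent.

    differences: one difference value (0-100) per candidate location.
    Assumption: a location is a 'close match' when its difference lies
    within close_fraction of the range above the best difference."""
    best = min(differences)
    worst = max(differences)
    difference_range = worst - best
    threshold = best + close_fraction * difference_range
    close = sum(1 for d in differences if threshold > d)
    close_matches = 100.0 * close / len(differences)
    certainty = 100.0 - close_matches
    return difference_range, close_matches, certainty


# Five made-up candidate differences: range is 42 - 2 = 40, so the
# threshold is 2 + 0.05 * 40 = 4 and two of five locations fall below it.
print(match_statistics([2.0, 3.0, 10.0, 22.0, 42.0]))  # (40.0, 40.0, 60.0)
```

Under this reading the best match always counts as a close match whenever the range is nonzero, so certainty can approach but never reach 100%.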

Interpretation

I was tempted to just pass by this academic experiment, because I doubted it would be worth much. But having recently been experimenting with stereo vision, I was so close to doing it already that I thought it worth at least a little try. I think the data here illustrate pretty well some of the potential and some of the challenges of using the patch equivalence concept in stereo vision.

Set 10 (Difference Range: 45.82%; Close Matches: 2.79%; Certainty: 97.21%; My Opinion: Good Match)

Image set 10 is a decent illustration of what I considered a good match. Depending on your monitor, you should see that there is only one very dark region in the similarity field (black and white) image, though there are some other stripes of darkness. This is exactly what one would hope for. Each pixel in the patch on the left is compared with the corresponding pixel in the candidate patch on the right, so in theory, an interesting patch like the one on the left really should match only a single place on the right, as it does here.

Set 2 (Difference Range: 41.86%; Close Matches: 2.31%; Certainty: 97.69%; My Opinion: Bad Match)

Set 2 provides an understandable counterexample. The left patch does include some interesting features, but one can easily forgive the algorithm for picking quite a different place to put the right patch, given how subtle the color differences within the left patch are. The similarity field image shows a big, soft region of potential matches in the right image for the left patch.

Set 6 (Difference Range: 31.57%; Close Matches: 1.08%; Certainty: 98.92%; My Opinion: Bad Match)

Set 6 begins to bring to light just how miserably the present algorithm can fail. The left patch clearly has features of interest that can't be found anywhere in the right image except where you expect them. So why does the algorithm choose a place on the ceiling as a better match than the picture frame? Study the left and right images a little more closely and it should become clear that the right-hand image is a little darker than the left. Others have commented on the brittleness of left-right patch matching based on color alone -- especially in black and white images -- and this example makes the problem clear. The simple fact is that the colors in the right patch chosen on the ceiling are actually more similar overall to the colors in the left patch on the picture frame than are the colors of the picture frame in the right image.
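
A small numeric illustration of this effect follows. The pixel values are made up for the purpose, not measured from set 6, and the patches are reduced to four gray levels each: a subtly featured patch seen 30 levels darker by the right camera ends up less similar to its own left-image version than a featureless region of about the same brightness.

```python
def mean_abs_difference(a, b):
    """Average absolute per-pixel difference between two equal-length patches."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# Hypothetical gray levels (0-255) for four pixels of each patch.
left_frame = [200, 200, 180, 200]             # picture frame, left image
right_frame = [p - 30 for p in left_frame]    # same spot, right image is darker
right_ceiling = [195, 195, 195, 195]          # featureless ceiling, right image

print(mean_abs_difference(left_frame, right_frame))    # 30.0
print(mean_abs_difference(left_frame, right_ceiling))  # 7.5
```

The ceiling wins because the brightness offset (30 levels on every pixel) outweighs the patch's internal contrast (20 levels on one pixel).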

To be sure, this patch equivalence algorithm really doesn't know anything about the edges and relative color distinctions within the image. If it did, it might more easily find the right target. Instead, it's just looking, pixel by pixel within each patch, at its color, without regard to the rest of the patch pixels, let alone the pixels outside the patch. Could better cameras resolve this by making the images more similar in brightness and color levels? I'm sure I could do better than the cheap webcams I chose, but I doubt even the best cameras could completely eliminate this problem. Perhaps calibration-based adjustments of the images to normalize them to each other could correct for some of the differences between the images, but again, I doubt this would eliminate this problem entirely. Nor, I think, are our eyes perfectly calibrated in this sense. Clearly, we use other mechanisms to overcome the fact that each eye sees things somewhat differently.
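
One simple way to make the comparison depend on relative rather than absolute values is to subtract each patch's own mean before comparing. To be clear, this zero-mean variant is a swapped-in technique for illustration, not something tried in this experiment, and the gray levels below are made up.

```python
def zero_mean(patch):
    """Subtract the patch's average so only its internal contrasts remain."""
    mean = sum(patch) / len(patch)
    return [p - mean for p in patch]


def zero_mean_difference(a, b):
    """Average absolute difference after removing each patch's mean."""
    za, zb = zero_mean(a), zero_mean(b)
    return sum(abs(p - q) for p, q in zip(za, zb)) / len(za)


# Hypothetical gray levels: a subtly featured patch, the same patch seen
# 30 levels darker by the other camera, and a featureless region.
left_frame = [200, 200, 180, 200]
right_frame = [170, 170, 150, 170]
right_ceiling = [195, 195, 195, 195]

print(zero_mean_difference(left_frame, right_frame))    # 0.0
print(zero_mean_difference(left_frame, right_ceiling))  # 7.5
```

A uniform brightness offset drops out entirely here, though a difference in camera contrast or color balance would not, which is consistent with the doubt expressed above that any single correction eliminates the problem.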

It should be pointed out, however, that while there are pretty sharp features of interest within the left patch, they are only strong features to our eyes because we see the broader image and pick them out. The patch's pixels are actually all very similar in color to one another. Subsequent sets will show how strong contrasts are key to successfully finding appropriate matches.

Set 12 (Difference Range: 15.06%; Close Matches: 0.04%; Certainty: 99.96%; My Opinion: Good Match)

Set 12, taken from a different viewpoint, shows another good match. The similarity field shows just how distinct the target area (dark black) is from the rest of the image. But then, here's a patch whose features are strikingly different: the corner of a white block bounded by a black band and then the brown carpet beyond it. The fact that the right image is darker than the left can't overcome this set of strong contrasts.

Set 11 (Difference Range: 29.76%; Close Matches: 9.13%; Certainty: 90.88%; My Opinion: Bad Match)

Yet set 11 provides a contrasting example. It seems hard to imagine a more strikingly contrasted set of features than a black laptop lid with a white logo, yet the algorithm instead chose a different part of the laptop cover as its best match. It's hard for me to say clearly why it did so, but I suspect that the logo feature is so small compared to the patch overall that its pixels' "votes" don't count for much in measuring the degree of difference between it and candidate patches in the right image.
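
The arithmetic behind that suspicion can be sketched with made-up numbers (none are measured from set 11): a 20 x 20 patch in which the white logo covers only 16 pixels, compared against the true location seen slightly darker and against a plain stretch of lid with no logo at all.

```python
def mean_difference_percent(pixel_diffs):
    """Average per-pixel difference as a percentage of the 0-255 range."""
    return 100.0 * sum(pixel_diffs) / (255 * len(pixel_diffs))


patch_pixels = 20 * 20   # hypothetical patch size
logo_pixels = 16         # hypothetical number of white logo pixels

# True location in the right image, assumed 10 gray levels darker everywhere.
true_location = [10] * patch_pixels

# Plain lid at the left image's brightness: only the logo pixels differ
# (white 240 against black 20), every other pixel matches exactly.
plain_lid = [220] * logo_pixels + [0] * (patch_pixels - logo_pixels)

print(round(mean_difference_percent(true_location), 2))  # 3.92
print(round(mean_difference_percent(plain_lid), 2))      # 3.45
```

With these numbers the 16 logo pixels, however strongly they differ, are outvoted by a small offset spread over all 400 pixels.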

Set 20 (Difference Range: 25.22%; Close Matches: 0.31%; Certainty: 99.69%; My Opinion: Good Match)

Set 20 provides a surprisingly good match. Given how regular the pattern of CDs in the rack is, it could have matched several other places. Yet closer inspection again shows that I carefully chose a set of features with reasonably strong contrasts within this area. There's a black case next to some white ones, plus we have the bottom edge (and black shadow) below those CDs, and the boundary between two other stacks of CDs underneath. So even though the overall contrasts are not very strong within the patch because of how dim the room is, there are enough strong features for the algorithm to correctly pick the right candidate patch.

Set 21 (Difference Range: 48.8%; Close Matches: 0.1%; Certainty: 99.9%; My Opinion: Good Match)

Set 21, taken from still another viewpoint, shows again a good match, despite clear brightness differences between the left and right images. But again, there are strong contrasting colors to the features. Also illustrated here is the importance of sharp corners. The peak of the path light helps ensure that the candidate patch chosen isn't too much to the left or right of the goal patch.

Set 8 (Difference Range: 55.93%; Close Matches: 27.81%; Certainty: 72.19%; My Opinion: Bad Match)

Set 8 shows an obvious example of the converse, where strong features are selected, but with no sharp corners to anchor the patch along the "rail" of strong candidates.
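
The "rail" effect can be reproduced on a toy image. The sketch below uses synthetic images and hypothetical names: a patch straddling a straight edge matches perfectly at every position along that edge, while a patch containing a corner matches at only one.

```python
def patch_difference(a, ax, ay, b, bx, by, size):
    """Sum of absolute per-pixel differences between two size x size patches."""
    return sum(abs(a[ay + dy][ax + dx] - b[by + dy][bx + dx])
               for dy in range(size) for dx in range(size))


def perfect_matches_along_row(image, px, py, size):
    """X positions on row py where the patch at (px, py) matches exactly."""
    width = len(image[0])
    return [x for x in range(width - size + 1)
            if patch_difference(image, px, py, image, x, py, size) == 0]


# Straight horizontal edge: three dark rows above three bright rows.
edge = [[0] * 10 for _ in range(3)] + [[255] * 10 for _ in range(3)]

# Corner: the bright region only starts at column 5.
corner = [[0] * 10 for _ in range(3)] + [[0] * 5 + [255] * 5 for _ in range(3)]

print(perfect_matches_along_row(edge, 4, 2, 2))    # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(perfect_matches_along_row(corner, 4, 2, 2))  # [4]
```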

Set 24 (Difference Range: 38.02%; Close Matches: 0.08%; Certainty: 99.92%; My Opinion: Good Match)

Set 24 shows just how soft the features' edges can be. The color contrasts are the important part.

Set 28 (Difference Range: 62.01%; Close Matches: 22.71%; Certainty: 77.29%; My Opinion: Bad Match)

Set 28 is interesting in that the left patch clearly has strong features, but the right image's version of that area is so different that we would be hard pressed to complain that the algorithm couldn't find it.

Room for Improvement

While I wasn't expecting miracles from this first patch equivalence algorithm, I was surprised at some of the results. One message I take away is just how sensitive this algorithm is to subtle color disparities between the left and right images. It seems that a good match requires the test patch to contain contrasts that are stronger than the overall color difference between the cameras.

It occurred to me that there are some interesting tweaks that can be performed on the images before the patch equivalence (PE) algorithm is applied, to increase the contrasts and thus reduce the impact of such color disparities. A basic contrast enhancement function of the sort found in most image manipulation programs would probably greatly improve the results. On the other hand, it could also magnify the disparities to the point of negating the value of the stronger patch features. Another possibility might be to blend in a basic pixel-level edge detection algorithm's output. This too could create sharper contrasts, but without artificially increasing contrasts in ambient color regions.
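
Both preprocessing ideas can be sketched in a few lines. These are untested guesses at what such tweaks might look like, assuming grayscale images as lists of rows of 0-255 integers; the linear stretch, the crude neighbour-difference edge measure, the 50/50 blend weight, and all function names are illustrative choices, not methods used in this experiment.

```python
def stretch_contrast(image):
    """Linearly rescale so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat image
    return [[round(255 * (p - lo) / span) for p in row] for row in image]


def edge_strength(image):
    """Crude edge map: absolute difference from the right and lower
    neighbours (clamped at the border), capped at 255."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = abs(image[y][min(x + 1, w - 1)] - image[y][x])
            gy = abs(image[min(y + 1, h - 1)][x] - image[y][x])
            out[y][x] = min(255, gx + gy)
    return out


def blend(image, edges, weight=0.5):
    """Weighted mix of the original image and its edge map."""
    return [[round((1 - weight) * p + weight * e) for p, e in zip(ri, re)]
            for ri, re in zip(image, edges)]


image = [[100, 100, 120],
         [100, 100, 120]]

print(stretch_contrast(image))             # [[0, 0, 255], [0, 0, 255]]
print(edge_strength(image))                # [[0, 20, 0], [0, 20, 0]]
print(blend(image, edges=edge_strength(image)))  # [[50, 60, 60], [50, 60, 60]]
```

The stretch output shows the worry raised above: a 20-level difference becomes a 255-level one, and a brightness disparity between cameras would be magnified in the same way.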

Naturally, I could also constrain where to grab test patches from. I already know that I can expect matching patches to be somewhere within the same narrow horizontal band. I can also reasonably expect the right patch to fall only within left and right bounds appropriate for a target object somewhere between, say, a few inches and infinitely far from the cameras. These vertical and horizontal constraints alone would surely greatly improve the accuracy of this PE algorithm, not to mention speeding it up by an order of magnitude or two. But then, this experiment was to see how well the underlying matching algorithm would work without the crutches of such helper algorithms.
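
A constrained search along these lines might look like the sketch below. It assumes side-by-side cameras pointing in parallel, so that a feature appears at the same x (infinitely far) or further left (nearer) in the right image; with differently arranged cameras the bounds would change. The band and max_disparity parameters and the function names are hypothetical.

```python
def patch_difference(left, right, lx, ly, rx, ry, size):
    """Sum of absolute per-pixel differences between two size x size patches."""
    return sum(abs(left[ly + dy][lx + dx] - right[ry + dy][rx + dx])
               for dy in range(size) for dx in range(size))


def constrained_positions(lx, ly, width, height, size, band, max_disparity):
    """Candidate (x, y) positions within +/- band rows of the left patch
    and between 0 and max_disparity pixels to its left."""
    ys = range(max(0, ly - band), min(height - size, ly + band) + 1)
    xs = range(max(0, lx - max_disparity), min(width - size, lx) + 1)
    return [(x, y) for y in ys for x in xs]


def find_best_match_constrained(left, right, lx, ly, size, band, max_disparity):
    """Best-matching position among the constrained candidates only."""
    height, width = len(right), len(right[0])
    candidates = constrained_positions(lx, ly, width, height, size,
                                       band, max_disparity)
    return min(candidates,
               key=lambda pos: patch_difference(left, right, lx, ly,
                                                pos[0], pos[1], size))


# Synthetic pair: a bright block two pixels further left in the right image.
left = [[10] * 8 for _ in range(6)]
right = [[10] * 8 for _ in range(6)]
for y in (2, 3):
    left[y][4] = left[y][5] = 200
    right[y][2] = right[y][3] = 200

print(find_best_match_constrained(left, right, 3, 1, 3, band=1, max_disparity=3))
print(len(constrained_positions(3, 1, 8, 6, 3, 1, 3)), "of", 4 * 6, "positions")
```

On this toy image the constraints only halve the work (12 of 24 positions), but the saving grows with image size, since the candidate count no longer scales with the full image area.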

I've been spending much time lately considering how Jeff Hawkins' memory-prediction model can play a role in machine vision systems. I got a vivid example of it this morning as I was straining my eyes in the dark to determine the color of two similarly shaped belts. I knew one was reddish brown and the other black, but in the dim light, it was hard to tell which was which. I thought I had finally figured it out, until I brought them into the light and saw I had guessed the colors backward. Returning to the dark, I was amazed at how my new expectation seemed to viscerally change their colors. It was now "obvious" which colors I was looking at. I don't mean to say that I simply knew the left was brown and the right was black and I could see a subtle difference. I mean that suddenly I could "see" brown and black there in the dark. All I needed was a sampling of their true colors and the knowledge that I had one belt in each hand; the expected colors then got mapped to the dim-light colors, especially as I held them side by side for comparison.

Somehow, it must be possible to bring this same principle into our machine vision products. An MV system should be able to sense that it's dark and thus be able to set its expectations about colors, sharpness of contrasts, how visible distant objects will be, and so on. I still don't quite know how, though. What is there to remember? How does one symbolically represent such a circumstance as to be able to detect it again? And how does one represent the "actions" to take to affect the process of seeing? What are those actions? And finally, how does a visual system come to recognize the significance of this novel circumstance and decide that it's worth categorizing and remembering? These pixel-level tests seem so pathetically primitive when one considers the mountain yet to climb.


Copyright © 2012 The Library of Alexandria. All rights reserved.
Produced in cooperation with Carnell Information Systems, Inc.