Interpretation
I was tempted to skip this academic experiment entirely, because I doubted it would be worth much. But since I had recently been experimenting with stereo vision and was already so close to being able to run it, I thought it worth at least a quick try. I think the data here illustrates pretty well some of the potential and the challenges of using the patch equivalence concept in stereo vision.
| Set 10 |
| Difference Range: | 45.82% |
| Close Matches: | 2.79% |
| Certainty: | 97.21% |
| My Opinion: | Good Match |
Image set 10 is a decent illustration of what I consider a good match. If you enlarge these images you should see, depending on your monitor, only one very dark region in the similarity field (black and white) image, though there are some other dark stripes. This is exactly what one would hope to see. Each pixel in the left patch is compared with the corresponding pixel of each candidate patch on the right, so in theory, an interesting patch like the one on the left should match only a single place on the right, as it does here.
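To make the similarity field concrete, here is a minimal sketch, assuming grayscale images stored as lists of pixel rows and a plain sum-of-absolute-differences (SAD) score. The experiment compares colors and its exact difference measure isn't given in this section, and the function names are hypothetical. Each entry of the field holds the difference between the left patch and the right-image patch at that position, so low values correspond to the dark regions of a similarity-field image.

```python
# Sketch: similarity field for one left patch, assuming grayscale images as
# lists of rows and a plain sum-of-absolute-differences (SAD) score.

def patch_difference(left, right, lx, ly, rx, ry, size):
    """SAD between the size x size patch at (lx, ly) in left and at (rx, ry) in right."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(left[ly + dy][lx + dx] - right[ry + dy][rx + dx])
    return total


def similarity_field(left, right, lx, ly, size):
    """Difference score for every position where the patch fits in right.

    Low scores mean "similar" and would be drawn dark in a similarity-field image.
    """
    rows = len(right) - size + 1
    cols = len(right[0]) - size + 1
    return [[patch_difference(left, right, lx, ly, x, y, size) for x in range(cols)]
            for y in range(rows)]


def best_match(field):
    """(x, y) of the lowest score in the field."""
    positions = [(x, y) for y in range(len(field)) for x in range(len(field[0]))]
    return min(positions, key=lambda p: field[p[1]][p[0]])


# Toy scene (invented): a flat background with one distinctive feature.
left = [[10] * 8 for _ in range(6)]
left[2][3], left[2][4], left[3][3] = 200, 90, 150
right = [row[1:] + [10] for row in left]  # same scene, shifted one pixel left

field = similarity_field(left, right, 2, 1, 3)
print(best_match(field))  # (1, 1)
```

Because the feature appears only once in the toy right image, the field has a single zero-difference position, which is the ideal case set 10 approximates.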
| Set 2 |
| Difference Range: | 41.86% |
| Close Matches: | 2.31% |
| Certainty: | 97.69% |
| My Opinion: | Bad Match |
Set 2 provides an understandable counterexample. The left patch does include some level of interesting features, but one can easily forgive the algorithm for picking a quite different place to put the right patch, given how subtle the color differences within the left patch are. The similarity field image shows a big, soft region of potential matches in the right image for the left patch.
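In every set in this section, the reported Certainty equals 100% minus Close Matches (to within rounding). How Close Matches itself is computed isn't spelled out here, so the following is only a hypothetical stand-in: it counts the share of candidate positions whose difference lies close to the best score. A broad, soft basin of near-best scores pushes that share up, while a single sharp dip keeps it low. Note that set 2 still reports a high Certainty despite being a bad match, so a measure like this cannot catch every failure.

```python
# Hypothetical ambiguity measure; not necessarily how the tables' Close Matches
# figure was computed. `field` is a grid of difference scores (lower = better).

def close_match_share(field, tolerance=0.05):
    """Fraction of positions scoring within `tolerance` of the score range above the best."""
    values = [v for row in field for v in row]
    lo, hi = min(values), max(values)
    cutoff = lo + tolerance * (hi - lo)
    return sum(v <= cutoff for v in values) / len(values)


def certainty(field, tolerance=0.05):
    """Mirrors the tables' relation: Certainty = 100% - Close Matches (as a fraction)."""
    return 1.0 - close_match_share(field, tolerance)


sharp = [[90, 80, 85], [88, 0, 92]]  # one clear dip (invented scores)
soft = [[10, 12, 11], [9, 10, 100]]  # broad basin of near-best scores (invented)
print(close_match_share(sharp), close_match_share(soft))
```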
| Set 6 |
| Difference Range: | 31.57% |
| Close Matches: | 1.08% |
| Certainty: | 98.92% |
| My Opinion: | Bad Match |
Set 6 begins to show just how badly the present algorithm can fail. The left patch clearly has features of interest that can't be found anywhere in the right image except where you would expect them. So why does the algorithm choose a spot on the ceiling as a better match than the picture frame? Study the left and right images a little more closely and it should become clear that the right image is a little darker than the left. Others have commented on the brittleness of left-right patch matching based on color alone, especially in black and white images, and this example makes the problem clear. The simple fact is that the colors in the right patch chosen on the ceiling are actually more similar overall to the colors in the left patch (on the picture frame) than the colors of the picture frame in the right image are.
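The set 6 failure can be reproduced with a toy example. The pixel values below are invented for illustration (flattened 3x3 grayscale patches), not taken from the set 6 images: the true match is the same detail seen through a camera that reads 30 levels darker, and the competitor is a flat region whose gray happens to sit near the left patch's average.

```python
# Toy numbers, invented for illustration: flattened 3x3 grayscale patches.

def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b))


left_patch = [120, 130, 120, 130, 140, 120, 130, 120, 130]  # subtle detail
true_match = [p - 30 for p in left_patch]  # same detail, right camera 30 levels darker
flat_gray = [128] * 9                       # featureless region (the "ceiling")

print(sad(left_patch, true_match))  # 270
print(sad(left_patch, flat_gray))   # 52
```

Raw SAD scores the featureless region far better than the true location, because the uniform darkening costs 30 on every pixel while the patch's own detail is only a few levels deep.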
To be sure, this patch equivalence algorithm really doesn't know anything about the edges and relative color distinctions within the image. If it did, it might more easily find the right target. Instead, it's just looking, pixel by pixel within each patch, at its color, without regard to the rest of the patch pixels, let alone the pixels outside the patch. Could better cameras resolve this by making the images more similar in brightness and color levels? I'm sure I could do better than the cheap webcams I chose, but I doubt even the best cameras could completely eliminate this problem. Perhaps calibration-based adjustments that normalize the two images to each other could correct some of the differences, but again, I doubt this would eliminate the problem entirely. Nor, I think, are our eyes perfectly calibrated in this sense. Clearly, we use other mechanisms to overcome the fact that each eye sees things somewhat differently.
It should be pointed out, however, that while there are pretty sharp features of interest within the left patch, they are only strong features to our eyes because we see the broader image and pick them out. The patch's pixels are actually all very similar in color to one another. Subsequent sets will show how strong contrasts are key to successfully finding appropriate matches.
| Set 12 |
| Difference Range: | 15.06% |
| Close Matches: | 0.04% |
| Certainty: | 99.96% |
| My Opinion: | Good Match |
Set 12, taken from a different viewpoint, shows another good match. The similarity field shows just how distinct the target area (dark black) is from the rest of the image. But then, this patch's features contrast strikingly with one another: the corner of a white block bounded by a black band, with the brown carpet beyond it. The right image being darker than the left can't overcome such strong contrasts.
| Set 11 |
| Difference Range: | 29.76% |
| Close Matches: | 9.13% |
| Certainty: | 90.88% |
| My Opinion: | Bad Match |
Yet set 11 provides a contrasting example. It seems hard to imagine a more strikingly contrasted set of features than a black laptop lid with a white logo, yet the algorithm instead chose a different part of the laptop cover as its best match. It's hard for me to say clearly why it did so, but I suspect the logo feature is so small compared to the patch overall that its pixels' "votes" don't count for much in measuring the degree of difference between it and candidate patches in the right image.
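The suspicion about the logo's small "vote" can be put in rough numbers. Under SAD, a competitor that misses the logo pays about k * C extra for the k logo pixels of contrast C, but it still wins if it matches the remaining n - k lid pixels better than the true location does, roughly when (n - k)(e_true - e_other) > k * C. The sketch below checks that inequality; every number in it is invented, and the premise that the lid's shading differs more at the true location is a hypothetical explanation, not something measured in set 11.

```python
def logo_outvoted(n, k, logo_contrast, err_true, err_other):
    """True if a candidate that misses the logo gets a lower SAD than the true spot.

    n: pixels in the patch; k: pixels covered by the logo.
    err_true / err_other: average per-pixel mismatch on the n - k lid pixels at the
    true location and at the competing location (hypothetical inputs).
    Assumes the logo itself matches perfectly at the true location.
    """
    true_cost = (n - k) * err_true
    other_cost = (n - k) * err_other + k * logo_contrast
    return other_cost < true_cost


# Invented numbers: a 15x15 patch, white-on-black logo contrast of 200.
print(logo_outvoted(225, 9, 200, err_true=20, err_other=10))   # True: 3960 < 4320
print(logo_outvoted(225, 36, 200, err_true=20, err_other=10))  # False: 9090 > 3780
```

Quadrupling the logo's share of the patch flips the outcome, which is consistent with the idea that a small feature's pixels are easily outvoted.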
| Set 20 |
| Difference Range: | 25.22% |
| Close Matches: | 0.31% |
| Certainty: | 99.69% |
| My Opinion: | Good Match |
Set 20 provides a surprisingly good match. Given how regular the pattern of CDs in the rack is, it could have matched several other places. Yet again, closer inspection shows that I carefully chose a set of features with reasonably strong contrasts within this area. There's a black case next to some white ones, plus the bottom edge (and black shadow) below those CDs, and the boundary between two other stacks of CDs underneath. So even though the overall contrasts within the patch are not very strong because of how dim the room is, there are enough strong features for the algorithm to pick the correct candidate patch.
| Set 21 |
| Difference Range: | 48.8% |
| Close Matches: | 0.1% |
| Certainty: | 99.9% |
| My Opinion: | Good Match |
Set 21, taken from still another viewpoint, again shows a good match, despite clear brightness differences between the left and right images. But again, the features have strongly contrasting colors. Also illustrated here is the importance of sharp corners. The peak of the path light helps ensure that the candidate patch chosen isn't too far to the left or right of the goal patch.
| Set 8 |
| Difference Range: | 55.93% |
| Close Matches: | 27.81% |
| Certainty: | 72.19% |
| My Opinion: | Bad Match |
Set 8 is an obvious example of the converse: strong features were selected, but there are no sharp corners to anchor the patch along the "rail" of strong candidates.
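The "rail" problem can be shown on invented toy images (grayscale, with the search run against the same image so the true match is known). A patch that contains only a horizontal edge matches perfectly at every horizontal offset along that edge, while a corner leaves exactly one zero-cost position. The function names are hypothetical.

```python
def sad_at(img, patch, x, y):
    """SAD between a square patch and the same-sized region of img at (x, y)."""
    size = len(patch)
    return sum(abs(img[y + dy][x + dx] - patch[dy][dx])
               for dy in range(size) for dx in range(size))


def zero_cost_positions(img, patch, y):
    """Horizontal positions in row y where the patch matches exactly."""
    size = len(patch)
    return [x for x in range(len(img[0]) - size + 1) if sad_at(img, patch, x, y) == 0]


W, H, S = 10, 6, 3

# Horizontal edge only: dark above row 3, bright from row 3 down, full width.
edge = [[20 if r < 3 else 200 for _ in range(W)] for r in range(H)]
edge_patch = [row[4:4 + S] for row in edge[2:2 + S]]
print(zero_cost_positions(edge, edge_patch, 2))  # [0, 1, 2, 3, 4, 5, 6, 7]

# Corner: the bright region ends at column 6, so the patch has a left/right anchor.
corner = [[200 if r >= 3 and c < 6 else 20 for c in range(W)] for r in range(H)]
corner_patch = [row[4:4 + S] for row in corner[2:2 + S]]
print(zero_cost_positions(corner, corner_patch, 2))  # [4]
```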
| Set 24 |
| Difference Range: | 38.02% |
| Close Matches: | 0.08% |
| Certainty: | 99.92% |
| My Opinion: | Good Match |
Set 24 shows just how soft the features' edges can be while still yielding a good match. The color contrasts are the important part.
| Set 28 |
| Difference Range: | 62.01% |
| Close Matches: | 22.71% |
| Certainty: | 77.29% |
| My Opinion: | Bad Match |
Set 28 is interesting in that the left patch clearly has strong features, but the right image's version of that area is so different that we would be hard pressed to complain that the algorithm couldn't find it.
Room for Improvement
While I wasn't expecting miracles from this first patch equivalence algorithm, I was surprised at some of the results. One message I take away from them is just how sensitive this algorithm is to subtle color disparities between the left and right images. It seems that a good match requires the test patch to have contrasts that are stronger than the overall color difference between the cameras.
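The rule of thumb above can be sketched as a pre-check on candidate patches. This is a hypothetical heuristic, not something the experiment used: it estimates the cameras' overall brightness difference from the image means and asks whether the patch's contrast (here simply its brightest minus its darkest pixel) clearly exceeds it. The `margin` factor and all sample values are invented.

```python
from statistics import mean


def camera_offset(left_pixels, right_pixels):
    """Overall brightness difference between the cameras, from image means."""
    return abs(mean(left_pixels) - mean(right_pixels))


def patch_contrast(patch):
    """Crude contrast: brightest pixel minus darkest pixel."""
    return max(patch) - min(patch)


def worth_matching(patch, offset, margin=2.0):
    """Hypothetical pre-check: contrast should clearly exceed the camera offset."""
    return patch_contrast(patch) > margin * offset


offset = camera_offset([100, 110, 120, 130], [80, 90, 100, 110])  # invented samples
subtle = [120, 130, 120, 130, 140, 120, 130, 120, 130]
strong = [20, 250, 250, 20, 250, 250, 20, 20, 20]
print(offset, worth_matching(subtle, offset), worth_matching(strong, offset))
```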
It occurred to me that some interesting tweaks could be applied to the images before the patch equivalence (PE) algorithm runs, to increase contrasts and thus reduce the impact of such color disparities. A basic contrast enhancement function of the sort found in most image manipulation programs would probably improve the results greatly. On the other hand, it could also magnify the disparities to the point of negating the value of the stronger patch features. Another possibility might be to blend in the output of a basic pixel-level edge detection algorithm. This too could create sharper contrasts, but without artificially increasing contrasts in regions of ambient color.
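A minimal sketch of both preprocessing ideas, assuming grayscale images as lists of rows: a linear contrast stretch to the full 0 to 255 range, and a crude gradient-magnitude edge map blended back into the image. The helpers and the blend `weight` are hypothetical, and the pixel values are invented.

```python
def stretch(img):
    """Linear contrast stretch: darkest pixel maps to 0, brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def edge_strength(img):
    """Crude gradient magnitude |dx| + |dy| from forward differences (0 on last row/column)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            out[y][x] = abs(img[y][x + 1] - img[y][x]) + abs(img[y + 1][x] - img[y][x])
    return out


def blend(img, edges, weight=0.5):
    """Mix an edge map into the image; weight is a hypothetical knob (0 = no edges)."""
    return [[min(255, round((1 - weight) * p + weight * e)) for p, e in zip(row, erow)]
            for row, erow in zip(img, edges)]


left = [[100, 110], [120, 130]]
right = [[80, 90], [100, 110]]  # same scene, darker camera (invented values)
print(stretch(left) == stretch(right))  # True
print(edge_strength(stretch(left)))     # [[255, 0], [0, 0]]
```

In this toy case the stretch erases a pure brightness offset between the two images, because each image is rescaled by its own minimum and maximum. Real camera differences are rarely that uniform, and the stretch also amplifies noise, which is the risk noted above.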
Naturally, I could also constrain where to look for candidate patches. I already know that matching patches should lie somewhere within the same narrow horizontal band. We can also reasonably expect the right patch to fall only within left and right bounds that would be appropriate if the target object were at a distance ranging from, say, a few inches to infinitely far from the cameras. These vertical and horizontal constraints alone would surely greatly improve the accuracy of this PE algorithm, not to mention speeding it up by an order of magnitude or two. But then, this experiment was meant to see how well the underlying matching algorithm would work without the crutches of such helper algorithms.
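The two constraints can be sketched as follows, assuming rectified cameras mounted side by side, so that a point at depth Z appears shifted by a disparity d = f * B / Z pixels (f the focal length in pixels, B the baseline between the cameras). An infinitely distant point has d = 0 and the nearest allowed point has the largest disparity, so the right-image search shrinks to a short horizontal run inside a few-row band. The focal length, baseline, image size, and band width below are all invented.

```python
import math


def max_disparity(focal_px, baseline, min_depth):
    """Largest disparity in pixels, for the nearest allowed depth (same units as baseline)."""
    return math.ceil(focal_px * baseline / min_depth)


def candidate_positions(x_left, y_left, patch, width, height, d_max, band=2):
    """Right-image patch corners allowed by a horizontal band and a disparity range.

    Assumes rectified side-by-side cameras: x_right = x_left - d with 0 <= d <= d_max.
    """
    xs = range(max(0, x_left - d_max), min(x_left, width - patch) + 1)
    ys = range(max(0, y_left - band), min(height - patch, y_left + band) + 1)
    return [(x, y) for y in ys for x in xs]


# Invented setup: 320x240 images, 16-pixel patches, f = 300 px,
# cameras 6 cm apart, nearest object 10 cm away, far limit at infinity (d = 0).
d_max = max_disparity(300, 6.0, 10.0)
cands = candidate_positions(200, 100, 16, 320, 240, d_max)
full = (320 - 16 + 1) * (240 - 16 + 1)
print(d_max, len(cands), full)  # 180 905 68625
```

With these made-up numbers the constrained search checks 905 positions instead of 68,625, roughly a 75-fold reduction, in line with the "order of magnitude or two" estimate.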
I've been spending a lot of time lately considering how Jeff Hawkins' memory-prediction model can play a role in machine vision systems. I got a vivid example of it this morning as I was straining my eyes in the dark to determine the colors of two similarly shaped belts. I knew one was reddish brown and the other black, but in the dim light, it was hard to tell which was which. I thought I had finally figured it out, until I brought them into the light and saw I had guessed the colors backward. Returning to the dark, I was amazed at how my new expectation seemed to viscerally change their colors. It was now "obvious" which colors I was looking at. I don't mean to say that I simply knew the left was brown and the right was black and I could see a subtle difference. I mean that suddenly I could "see" brown and black there in the dark. All I needed was a sampling of their true colors and the knowledge that I had one belt in each hand, and the expected colors got mapped onto the dim-light colors, especially as I held them side by side for comparison.
Somehow, it must be possible to bring this same principle into our machine vision products. An MV system should be able to sense that it's dark and thus be able to set its expectations about colors, sharpness of contrasts, how visible distant objects will be, and so on. I still don't quite know how, though. What is there to remember? How does one symbolically represent such a circumstance so as to be able to detect it again? And how does one represent the "actions" to take to affect the process of seeing? What are those actions? And finally, how does a visual system come to recognize the significance of this novel circumstance and decide that it's worth categorizing and remembering? These pixel-level tests seem so pathetically primitive when one considers the mountain yet to climb.