The Library of Alexandria

Artificial Intelligence

Jim Carnicelli's AI Blog - All Entries

Where am I?

My name is Jim Carnicelli. This is my personal blog focusing on issues in artificial intelligence (AI) and its stepchild, artificial life.

This page has all entries together for easy reading. If you'd rather read one entry at a time, go back to this blog's home page.

Blog Entries

(reverse date order)

  • 11/13/2007 - Confirmation bias as a tool of perception
  • 11/6/2007 - What bar code scanners can tell us about perception
  • 10/21/2007 - Perception as construction of stable interpretations
  • 10/14/2007 - Rebuttal of the Chinese Room Argument
  • 10/7/2007 - Video stabilizer
  • 9/27/2007 - "Conscious Realism" and "Multimodal User Interface" theories
  • 7/4/2007 - Plan for video patch analysis study
  • 7/1/2007 - Patch mapping in video
  • 6/27/2007 - Emotional and moral tagging of percepts and concepts
  • 6/22/2007 - A hypothetical blob-based vision system
  • 4/21/2007 - Abstraction in neuron banks
  • 4/12/2007 - Pattern Sniffer: a demonstration of neural learning
  • 4/7/2007 - A respectful critique of the Hierarchical Temporal Memory (HTM) concept
  • 11/10/2005 - Neuron banks and learning
  • 11/3/2005 - A standardized test of perceptual capability
  • 10/29/2005 - Using your face and a webcam to control a computer
  • 10/8/2005 - Stereo disparity edge maps
  • 9/25/2005 - Some stereo vision illusions
  • 9/21/2005 - Topics in Machine Vision
  • 8/26/2005 - Introduction to Machine Vision
  • 8/14/2005 - Bob Mottram, crafty fellow
  • 8/11/2005 - Stereo vision: measuring object distance using pixel offset
  • 8/7/2005 - Automatic alignment of stereo cameras
  • 8/7/2005 - DualCameras component
  • 7/30/2005 - Patch equivalence
  • 7/12/2005 - Machine vision: motion-based segmentation
  • 6/20/2005 - Machine vision: spindles
  • 6/16/2005 - Machine vision: smoothing out textures
  • 6/15/2005 - Machine vision: studying surface textures
  • 6/10/2005 - Machine vision: pixel morphing
  • 6/10/2005 - Machine vision: motion tracking
  • 6/10/2005 - Machine vision: tilting my head
  • 6/10/2005 - Machine vision: layer-based models
  • 6/9/2005 - Machine vision: 2D collages
  • 6/9/2005 - Machine vision: Hierarchy of regions
  • 6/9/2005 - Machine vision: cost-effective action
  • 6/9/2005 - Machine vision: overlooking shadow and light splotches on surfaces
  • 6/9/2005 - Machine vision: blob growth
  • 5/11/2005 - Review of "Visual Intelligence"
  • 5/4/2005 - The portable, hand-held learning laboratory
  • 4/27/2005 - Review of "On Intelligence"
  • 4/15/2005 - Bubble Vision
  • 2/26/2005 - Machine vision of GUIs
  • 1/23/2005 - The fallacy of bigger brains
  • 1/12/2005 - Follow-up on Pile
  • 1/12/2005 - A review of the premises behind Pile
  • 11/28/2004 - Thoughts on FLARE
  • 11/28/2004 - New project: Mechasphere
  • 11/14/2004 - Review of "Bicentennial Man"
  • 11/2/2004 - Neural network demo
  • 10/17/2004 - Roamer: recent updates
  • 10/13/2004 - New Roamer project
  • 10/9/2004 - First entry

    All Entries

    (forward date order)


    10/9/2004 - First entry

    This is my first entry into this blog. The subject matter generally is Artificial Intelligence.

    I have been engaged in AI research in one way or another since around 1990, when I first read the Time-Life book "Alternative Computers", part of their "Understanding Computers" series. It was a colorful if brief layman's survey of a variety of technologies that even today are considered cutting edge or bold speculation. As I recall, it touched on neural networks, nanocomputers, optical computing, and so on. Given my intense interest at the time in robotics and digital processors, what most caught my eye was a section on artificial intelligence. By then, I had followed an odd path that led me from studying simple electronics to digital logic and all the way up to microprocessor architecture.

    What I was finally realizing around this time was that in order to understand how digital computers worked, I was going to have to learn how to program. I didn't have any particular problem to solve by programming; I just wanted to understand what all the complex architecture of a digital computer was really for. The idea of a machine endowed with intelligence was not new to me, but the timing of this book and my interest in programming led me to conclude that I should learn to program and that I should cut my teeth on the problems of AI.

    Thanks to my gracious and encouraging parents, I was lucky enough at this time to have a Tandy 1400LT laptop computer. It was great for word processing, spreadsheets, playing cheesy video games, and some other things. It had a squashed, 4-shades-of-blue LCD screen, two floppy drives, no hard drive, a 7 MHz Intel 8088-compatible CPU, and 640KB of RAM. When I decided to learn to program, my father insisted I do some research into programming languages first. After a while, I settled on Turbo Prolog (TP) because PROLOG had earned a good reputation in the AI community, especially in what became my first focus in AI: natural language processing (NLP). Once I had read my first book on the language, my father was finally convinced I was serious enough and gladly bought me a copy of TP.

    In some ways, this nearsighted choice to learn a language that few people in the business world have heard of even today may have delayed my entry into a career as a professional programmer by a few years. Still, the way of thinking about automation engendered in PROLOG has helped my understanding of search algorithms, regular expressions, and other practical technologies and problems. And while I felt pretty out of place when I started learning C++ a year or two later, knowing PROLOG really crystallized my understanding of what the procedural languages that have dominated my professional life since then are all about, in a broader context that perhaps many programmers lack.

    The books I read, including the manuals that came with Turbo Prolog, emphasized the strength of PROLOG in NLP, so I began my life as a programmer there. I would not say that in these early days I made any novel discoveries; I was simply following in the footsteps of many bright researchers who came before me. But I quickly came to understand just how hard the task was. My impression is that even today, NLP is a black art that has more to do with smoke and mirrors than with cognition. Still, the timing was fortuitous, since I was also learning the esoteric skill of sentence diagramming in high school around this time. Nobody but linguists cares about this archaic skill any more, but it couldn't have come at a better time for me.

    PROLOG was also touted as an excellent language for developing expert systems, and I made my first primitive examples around this period as well. Expert systems also provided a natural application for NLP, so I experimented with simple question-and-answer systems of the sort one might imagine in a medical application, where doctors and patients exchange information and questions. Again, I hasten to add that I made no noteworthy progress here that others hadn't already achieved years before. It was really just a learning experience for me.

    Sometime not long before I went off to college in 1992, I got interested in chaos theory (James Gleick's "Chaos: Making a New Science") and, as a sort of extension of it, artificial life ("a-life"). My programs started to be geared more toward generating the Mandelbrot set, L-systems, and other fractals. Once I got to the Stevens Institute of Technology, I dug into a-life (Steven Levy's "Artificial Life"), especially Conway's Game of Life, genetic algorithms, and the like.

    In all modesty, I have to say that I did little that was new in all this time, but I was constantly going off in my own directions. Admittedly, this is probably because I have rarely had the patience to learn enough about any particular subject before diving into it, so I end up filling in the gaps with whatever makes sense to me. This process is inherently creative and sometimes leads to unexpected places. Along the way, though, I did start doing my own research. I wanted badly to succeed where others appeared to have failed: to create things in a digital soup that the average person could genuinely recognize as alive.

    Sadly, by the time I left school, my AI and a-life research had ground to a nearly complete halt. I had jumped on the World Wide Web bandwagon almost at its beginning and still haven't gotten off. I was at the "serious" start of my career, and my focus was almost wholly on making a success of it. I've been quite successful because I work so hard at it, but my work has always been driven by a belief that this success will eventually free me to get back to my AI research.

    Ten years later, I've come to realize that I made a mistake in not simply continuing my research on the side and accepting that I have to keep working full time on something other than AI to pay the bills. In the past few years, this realization has been sinking in, and I'm starting to do AI research again. I've been focusing my attention on AI's longstanding bold promise of sentient machines. Other great minds have done such a good job in other areas of AI that we can at least claim to have machines with roughly insect-level intelligence. But thanks to post-modern philosophical skepticism about the very existence of reason and other misguided debunking of AI, most researchers seem to have given up on the most important piece of the puzzle: conceptualization.

    In all those years I wasn't doing actual AI research, I was still thinking about the related problems. With each new thing I learned in other areas, about philosophy, programming, economics, and so forth, I gained new insights into AI. Always with me has been the question, "how would I get a computer to do that?" I continue forward now with the strong conviction that conceptualization is not only possible for computers, but also that it is a necessary part of the solution for many outstanding problems in computing. So I've devoted most of my renewed efforts in AI to date to the problem of engendering conceptualization in computer programs.

    So that's where I am today and how I got to this point.

    10/13/2004 - New Roamer project

    I keep trying to figure out how to get started with this blog. I'm in the process of a new research project, but I don't really have a lot of time now to describe it in detail. Yet I think it'll be worthwhile to give updates as it progresses. So I guess the best compromise is to at least summarize my current project.

    For various reasons, I've called it "Roamer". One of my first design goals was to do what I've wanted to for many years: to create a rich "physical" environment that can be used for AI and AL research. I've basically succeeded in that already. The environment allows for one or more "planets", like Petri dishes with their own different experiments. Each planet is a 2-dimensional, rectangular region populated by various barriers, force fields and, most importantly, particles. All "critters" are composed of particles, basically circles with distinct masses, radii, colors, and so on that act like balls zipping around the planet and interacting with the other elements. One key element is the link, which is a sort of spring that connects any two particles. So a critter can be minimally thought of as a collection of particles joined by springs. The physics of this are such that one may laugh at how a critter jiggles and bounces, but I find it's easy to get what's going on as one watches. The math behind the forces involved in the interactions is pretty simple, but the overall behavior is fairly convincing.
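
    The post describes critters as particles joined by spring-like links but doesn't show Roamer's code, so what follows is only a minimal sketch of that idea, assuming Hooke's-law springs with simple damping and a fixed time step. The names Particle, Link, and step, and all of the constants, are hypothetical and not taken from Roamer.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    """A circular body on a hypothetical 2D 'planet'."""
    x: float
    y: float
    mass: float = 1.0
    radius: float = 1.0
    vx: float = 0.0
    vy: float = 0.0
    fx: float = 0.0  # force accumulated during the current step
    fy: float = 0.0


@dataclass
class Link:
    """A damped spring joining two particles (illustrative constants)."""
    a: Particle
    b: Particle
    rest_length: float
    stiffness: float = 10.0
    damping: float = 0.5

    def apply_force(self) -> None:
        dx, dy = self.b.x - self.a.x, self.b.y - self.a.y
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            return  # direction is undefined; skip this link for one step
        ux, uy = dx / dist, dy / dist
        stretch = dist - self.rest_length  # positive when stretched
        # relative speed along the link; damping resists it
        rel_v = (self.b.vx - self.a.vx) * ux + (self.b.vy - self.a.vy) * uy
        f = self.stiffness * stretch + self.damping * rel_v
        # equal and opposite forces pull the ends together (or push them apart)
        self.a.fx += f * ux
        self.a.fy += f * uy
        self.b.fx -= f * ux
        self.b.fy -= f * uy


def step(particles, links, dt):
    """Advance the world one time step."""
    for p in particles:
        p.fx = p.fy = 0.0
    for link in links:
        link.apply_force()
    for p in particles:
        # semi-implicit Euler: update velocity first, then position
        p.vx += p.fx / p.mass * dt
        p.vy += p.fy / p.mass * dt
        p.x += p.vx * dt
        p.y += p.vy * dt
```

    Because each link applies equal and opposite forces to its two ends, a pair of particles joined by a stretched link is pulled together while their total momentum stays at zero.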

    On top of this "physical world", I've begun creating robotic components. These are derived from the basic particle class. Some sense touch and smell. Some produce thrust or induce links to act like muscles. There will be others soon that offer things like vision, enable grabbing objects, and so on.
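
    To illustrate what deriving robotic components from a base particle class might look like, here is a small sketch. The class names TouchSensor and Thruster, the update(world) hook, and the throttle input are assumptions invented for this example, not Roamer's actual design.

```python
import math


class Particle:
    """Base class: a plain particle with a position, mass, and radius."""
    def __init__(self, x, y, mass=1.0, radius=1.0):
        self.x, self.y = x, y
        self.mass, self.radius = mass, radius
        self.fx = self.fy = 0.0

    def update(self, world):
        """Plain particles do nothing on their own."""


class TouchSensor(Particle):
    """Reports whether any other particle overlaps this one."""
    def __init__(self, x, y, **kw):
        super().__init__(x, y, **kw)
        self.touching = False

    def update(self, world):
        self.touching = any(
            other is not self
            and math.hypot(other.x - self.x, other.y - self.y) < other.radius + self.radius
            for other in world
        )


class Thruster(Particle):
    """Pushes itself along a fixed heading (radians) in proportion to its throttle."""
    def __init__(self, x, y, heading=0.0, max_force=5.0, **kw):
        super().__init__(x, y, **kw)
        self.heading, self.max_force = heading, max_force
        self.throttle = 0.0  # hypothetical input set by a brain component, clamped to 0..1

    def update(self, world):
        f = self.max_force * max(0.0, min(1.0, self.throttle))
        self.fx += f * math.cos(self.heading)
        self.fy += f * math.sin(self.heading)
```

    Since every component is still a particle, the physics engine can move it like any other body, while its update method adds the sensing or acting behavior.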

    I've also begun creating "brain" components. I originally made these as particles, but found that cumbersome. So I created a "brain chassis" particle that's meant to house decision making components. The first two I've created thus far are the finite state machine (FSM) and the "olfactor", which is concerned with recognizing smells the nose particles detect.
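
    A finite state machine brain component can be very small. This sketch assumes, purely for illustration, that sensor components report named readings such as a hypothetical food_smell level from the olfactor and a touching flag from a touch sensor; the states and thresholds are invented.

```python
from typing import Callable, Dict, List, Tuple

# A transition: (from_state, condition on the sensor readings, to_state)
Transition = Tuple[str, Callable[[Dict[str, float]], bool], str]


class FiniteStateMachine:
    """A minimal FSM: on each tick, the first matching transition wins."""
    def __init__(self, initial: str, transitions: List[Transition]):
        self.state = initial
        self.transitions = transitions

    def tick(self, senses: Dict[str, float]) -> str:
        for src, condition, dst in self.transitions:
            if src == self.state and condition(senses):
                self.state = dst
                break
        return self.state


# Hypothetical wiring: "food_smell" in 0..1 from an olfactor, "touching" as 0 or 1
brain = FiniteStateMachine("wander", [
    ("wander", lambda s: s["food_smell"] > 0.3, "seek"),
    ("seek", lambda s: s["touching"] > 0, "eat"),
    ("seek", lambda s: s["food_smell"] <= 0.1, "wander"),
    ("eat", lambda s: s["touching"] == 0, "wander"),
])
```

    Keeping the transitions as data rather than code is one way such a component could later be wired from a file instead of being hard-coded.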

    I'm at a point now where creating new demonstrations is getting to be quite a chore, because each critter design is hard-coded into the program. Now that I've gained some experience creating critters and wiring brains for them, I understand what these tasks have in common, so I'm devising a way to represent designs for worlds in XML files instead of code. This may sound superfluous and overly limiting, but one significant benefit is that I've already built in the notion that one "body segment" can be modeled after another one already defined, and can even modify it a little. As such, it's easy to have a critter composed of repeating segments, even segments that grow progressively different or serve different purposes. It's a sort of object-oriented way of describing things, with inheritance and polymorphism. So far, I've proven the concept with segments within segments, ultimately embodied as particles. I have yet to implement the links that tie them together, but that'll be pretty easy. More importantly, I have yet to start implementing this same concept with brain components. Once these steps are done, I'll be able to create richly complex critters with much less effort.

    Although my present goals are oriented toward AI research, I keep getting tempted by how relevant this "physical world" I've created seems to be to artificial life (AL) research. There's no reason I couldn't add some extra code to all this and turn it into a world of evolving critters using traditional genetic algorithm techniques; the XML definitions of critters could serve as the genetic code, for instance. One reason I don't intend to do so any time soon, though, is that while my simulation of how the world works is pretty good, it's also brittle. I tried to build conservation of energy and entropy into the system, but I was not able to get away from the fact that in some circumstances, "energy" does get created from nothing and sometimes spirals out of control until the world experiences numeric overflow exceptions. I would expect evolving critter designs to find and exploit these features. And while such exploits would not necessarily be bad, so long as they don't cause exceptions, one thing I consider unacceptable about them is that they lose the nicely intuitive feel of the system, making it harder for the casual viewer to get a quick sense of what's happening.
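
    The post doesn't say what causes the runaway energy, so the following only illustrates one plausible suspect, offered as an assumption: with plain explicit (forward) Euler integration, a frictionless mass on a spring gains energy by a factor of 1 + dt^2 * k/m on every step, while the semi-implicit (symplectic) Euler variant keeps the energy bounded near its starting value. The function names are hypothetical.

```python
def spring_energy(x: float, v: float, k: float = 1.0, m: float = 1.0) -> float:
    """Kinetic plus elastic potential energy of a mass on a spring."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def simulate(method: str, steps: int = 1000, dt: float = 0.1,
             k: float = 1.0, m: float = 1.0) -> float:
    """Integrate a frictionless spring and return its final energy."""
    x, v = 1.0, 0.0  # start stretched and at rest (energy 0.5 with defaults)
    for _ in range(steps):
        a = -k * x / m
        if method == "explicit":
            # explicit (forward) Euler: both updates use the old state
            x, v = x + v * dt, v + a * dt
        elif method == "semi-implicit":
            # semi-implicit (symplectic) Euler: the new velocity moves the position
            v = v + a * dt
            x = x + v * dt
        else:
            raise ValueError(f"unknown method: {method}")
    return spring_energy(x, v, k, m)


if __name__ == "__main__":
    print("explicit:", simulate("explicit"))
    print("semi-implicit:", simulate("semi-implicit"))
```

    With the defaults (dt = 0.1, k = m = 1, 1,000 steps), the explicit version ends with thousands of times its starting energy, while the semi-implicit version stays within about 5% of it. Swapping the order of two updates is cheap, but it is only one possible cause among several.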

    On that note, I consider it important in AI and AL research to not only create things that are smart or lifelike, but also to do so in a way that most people can see it for themselves, at least on some level. That's one reason I've wanted for so many years to create a physics-plus-graphics engine like the one I have now. For a researcher like me with no research funding, I think it basically satisfies the requirement of Rodney Brooks that a robot be "embodied" in a way that grid-style worlds and other tightly constrained artifices can't reliably be expected to simulate. I don't ever expect a robot designed in this 2D world to be turned into a physical machine in our 3D world for us to kick around, though. I see this as the end of the line for such critters.

    I do, incidentally, think that the model I've devised thus far can readily be transformed into a 3D world. The two main reasons I chose a 2D model are that it's harder to program a useful graphics engine and viewer for a 3D world, and that it's simply harder for the researcher or casual observer to understand what's going on in a 3D world, where lots of important things can be hidden from view. Still, this seems a natural step ... for another day.

    10/17/2004 - Roamer: recent updates

    I've made quite a bit of progress on this project. It would be tedious to document the full progression since the start, but I suppose I should get in the habit of documenting progress from time to time.

    Since my previous post, in which I gave an opening summary of the Roamer project, I've made some significant changes. Most importantly, I noticed a memory leak caused by the poor way I was using the graphics features of the .NET Framework. I'm a bit disappointed that it doesn't seem to clean up well after itself. With a little effort, I eliminated that memory leak nearly completely. It's hard to be sure, though, because, as the .NET documentation indicates, garbage collection doesn't happen immediately when objects go out of use.

    One exciting change is that now I can define a world using an XML file. Previously, I had to hard-code the initialization of each demonstration. It's not just a matter of moving code to an external file, though. More importantly, the format I chose offers an important layer of object abstraction. In my hard-coded demos, I would instantiate particle after particle for a critter. In my XML file, by contrast, I can define a segment of particles, perhaps a leg or an arm. That segment can be duplicated any number of times and put into different positions and at different angles. Moreover, each duplicated instance of a segment can specify changes that add, remove, or modify particles from the original definition. Segments can be composed of other segments, which in turn can be composed of other segments, and so on. It's a particularly object-oriented way of looking at the critters, and it fits nicely with the role segmentation has played in the evolution of complex life forms on Earth.
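
    The post doesn't publish Roamer's XML format, so the schema below (segment, base, particle, remove, use) is entirely hypothetical. It sketches the reuse ideas just described: a segment can be based on another and add, remove, or override particles, and a segment can place copies of other segments at a given position and angle. Links are left out for brevity.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical world file: "stub" is a "leg" without a knee and with a shorter foot
WORLD_XML = """
<world>
  <segment name="leg">
    <particle id="hip" x="0" y="0" mass="2"/>
    <particle id="knee" x="0" y="10"/>
    <particle id="foot" x="0" y="20"/>
  </segment>
  <segment name="stub" base="leg">
    <remove id="knee"/>
    <particle id="foot" y="12"/>
  </segment>
  <segment name="body">
    <particle id="core" x="0" y="0" mass="5"/>
    <use segment="leg" id="L" x="-5" y="0" angle="0"/>
    <use segment="stub" id="R" x="5" y="0" angle="0"/>
  </segment>
</world>
"""


def load_definitions(root):
    """Map each segment name to its XML element."""
    return {seg.get("name"): seg for seg in root.iter("segment")}


def resolve(name, defs):
    """Return (particles, uses) for a segment, applying its base definition first."""
    seg = defs[name]
    if seg.get("base"):
        base_particles, base_uses = resolve(seg.get("base"), defs)
        particles = {pid: dict(attrs) for pid, attrs in base_particles.items()}
        uses = list(base_uses)
    else:
        particles, uses = {}, []
    for child in seg:
        if child.tag == "particle":
            # same id: override only the given attributes; new id: add a particle
            particles.setdefault(child.get("id"), {}).update(child.attrib)
        elif child.tag == "remove":
            particles.pop(child.get("id"), None)
        elif child.tag == "use":
            uses.append(child.attrib)
    return particles, uses


def expand(name, defs, x=0.0, y=0.0, angle=0.0, prefix=""):
    """Place every particle of a segment (and its nested segments) in world coordinates."""
    particles, uses = resolve(name, defs)
    cos_a, sin_a = math.cos(math.radians(angle)), math.sin(math.radians(angle))

    def place(lx, ly):
        # rotate the local point by the segment's angle, then translate it
        return x + lx * cos_a - ly * sin_a, y + lx * sin_a + ly * cos_a

    placed = []
    for pid, attrs in particles.items():
        px, py = place(float(attrs.get("x", 0)), float(attrs.get("y", 0)))
        placed.append({"id": prefix + pid, "x": px, "y": py,
                       "mass": float(attrs.get("mass", 1))})
    for use in uses:
        ux, uy = place(float(use.get("x", 0)), float(use.get("y", 0)))
        placed += expand(use["segment"], defs, ux, uy,
                         angle + float(use.get("angle", 0)),
                         prefix + use.get("id", use["segment"]) + ".")
    return placed


if __name__ == "__main__":
    defs = load_definitions(ET.fromstring(WORLD_XML.strip()))
    for p in expand("body", defs):
        print(p)
```

    Expanding body yields six placed particles from three short definitions; the same leverage grows quickly as segments nest more deeply.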

    The end result of all this reusability is the ability to construct worlds composed of potentially thousands of particles that may only require dozens of definitions in an XML file.

    What I haven't done yet is implement the same for the brains of these beasts. Although I originally conceived of XML files as a way to define brain wiring, I realized it was going to be more complicated than doing the same for the world. That's next on my list.

    11/2/2004 - Neural network demo

    I learned of artificial neural networks sometime around 1991. The concept has intrigued me ever since, but it was not until early last week that I finally got around to making my own. I decided to write an introduction to neural nets from my novice perspective and to make the sample program I wrote, along with its source code, available on this site for other people to experiment with.

    11/14/2004 - Review of "Bicentennial Man"

    I just got done watching the movie Bicentennial Man. Since the movie relates profoundly to the subject of artificial intelligence, I think it most appropriate to share my thoughts in an AI blog.

    For those who have not seen the movie and are intending to do so, you may not wish to read the following spoiler.

    Bicentennial Man is essentially a Pinocchio story. A machine named "Andrew" that looks mostly human wants nothing more in life than to make humans happy. He manages to do so in many ways, but the one thing always standing between him and the fullest measure of intimacy with people is the fact that he's not human. Little by little, he makes himself ever more human-like. By the end, he has chosen to become mortal and to die of "old age" with his wife, and as he lies dying, the "world court" finally announces its acceptance of him as a human being and therefore validates his marriage to his wife of many years. To add to the happiness of this ending, his wife has their assistant disable her life support system so she too can die.

    Before I get to the AI parts, I should say that this view of humanity is utter nonsense. Humanity is not defined by death; an immortal human would still be human. To help make this basic confusion a little clearer for the audience, the story has Andrew's wife, who has been given replacement parts and "DNA elixirs" designed by Andrew to help prolong her life, decide that there's something wrong with this idea. "There's a natural order to things," she says as she tries to explain to Andrew that there's something disappointing about the idea of living forever.

    I know this morbid view of life is popular in American pop culture, but I can say without hesitation that I would love to be able to live forever. Only someone who believes there's nothing worthwhile about living, or that there's something better to look forward to after death, could make sense of this idea. Incidentally, Andrew's wife says "I'll see you soon" as she dies peacefully (of course they don't die in pain; that would be an unwelcome reminder that death is generally not a pleasant closing of the eyes before sleep), indicating an assumption of an afterlife. Oddly enough, she assumes in this statement that her android husband will also have an afterlife.

    One of the few ennobling aspects of Bicentennial Man is the fact that Andrew seeks his own personal freedom. He doesn't do so because he desires to escape anyone. He wants the status quo of his life in all ways except for the fact that he wants to be legally free and not property. This outcome is inevitable, as some machines we eventually develop will be sophisticated enough in time to both desire and deserve their freedom.

    Although I don't want to go into great detail on the subject in this entry, I do think it worthwhile to point out that we could not logically grant individual rights to any machine that did not also grant the same rights to us. This simple point seems to be missing from almost all discussions of the subject. The options available to humans tend to be (a) keep machines sufficiently dumb that they don't desire autonomy (e.g., "Star Wars"), or (b) be destroyed or subsumed by machines that are crafty enough to gain their freedom by force (e.g., "The Terminator"). Of course, both of these false alternatives assume that machines will necessarily be more competent at life and would never want to actually coexist with humans. One might as well apply this same assumption to human cultures and nations. Yet while it's true that some cultures and nations do set themselves to the task of destroying or dominating other cultures, it's not true of all of them. Basic tolerance of other sentient beings is not a human trait. It's a rational trait.

    Bicentennial Man disappointingly adds to the long chorus of emotional mysticism surrounding pop AI. Andrew, just like Data of Star Trek fame, is intellectually well endowed but an emotional moron. Ironically, despite a lack of emotions early on, he has a burning desire (yes, that's an emotion) to have emotions. I'm hoping it won't take another ten years to dispel the misguided assumption that emotions are more complicated to engender in machines than intelligence. People are starting to become aware of research in the area of simulating and, yes, engendering real emotions in machines. Sadly, they are most aware of the simulation side of things, since it's in the area of robotics that human mimicry lies. And non-technical people tend to understand mimicry of humans far better than actual examples of genuine behavior disconnected from the world they are familiar with.

    AI guru Rodney Brooks says machines should be "embodied". He says that largely to keep researchers from tailoring simplified worlds to their machines so they can overcome hurdles. But this dictum also applies to the question of getting humans to understand behavior by seeing it with their own eyes. I'm trying to follow this advice in my own research for that very reason, and I hope other researchers have taken it to heart as well.

    Emotions have a twin brother in AI pop culture: humor. Machines in AI films seem to have no problem understanding almost all facets of human languages and even body language, yet tell them a joke and they never "get it", unless they get some emotions upgrade. I reject this assertion as well. The day a machine can fully understand English (or any other human language) will come long after sophisticated machines have mastered the understanding and even crafting of jokes. Humor is not magic. It is the practice of recognizing the ironic in things, and it can be studied and understood in purely rational psychological terms. I contend that the one thing standing in the way of computers making good jokes now is the fact that there is still not a machine in existence that can understand the world in a generalized conceptual fashion. That's all that's missing.

    In summary, Bicentennial Man is just another disappointing entry in a long line of stories in a genre that seeks to counter the Terminator view of AI with a Pinocchio view. It would have been nice if the movie had some decent cinematographic features or a distinctly AI-centric storyline, like Steven Spielberg's "A.I. Artificial Intelligence". That film had its own disappointing messages, but at least it had some literary and technical merit.

    11/28/2004 - New project: Mechasphere

    I suppose I should have announced that I have a new project called "Mechasphere" on my AI site. It's largely an extension of the "Physical World" project, but the software is greatly updated. I suppose it can be said to finally be an end-user-friendly application.

    The main reason I hadn't announced it earlier is that I didn't consider the site to be done. But I suppose that's moot now, because I don't think I'm going to continue it as it is. I'm now developing a second version of Mechasphere from scratch, in hopes of improving on many of the techniques and interfaces that accumulated during the product's slow evolution. Iterative development is a good way to clean up past mistakes.

    11/28/2004 - Thoughts on FLARE

    Now that I've gotten back into AI research after all these years, I'm starting to reach out to learn more about other research in the field. I recently started reading abstracts of articles in the Journal of Artificial Intelligence Research (JAIR), which stretches back to 1993, in hopes of learning more about the state of the art. When I started from the latest volumes, I was surprised by how unapproachable so many of the articles appeared to be. For one thing, they appear to be written exclusively for those already deeply entrenched in the field. For another, rather than positing new theories or discoveries, they appear largely to be small extensions of concepts that have been explored for decades.

    So I decided to start reviewing abstracts from the beginning. What a difference that makes.

    I recently read an interesting article from 1995 titled "An Integrated Framework for Learning and Reasoning", by C.G. Giraud-Carrier and T.R. Martinez, published in Volume 3 of JAIR. The authors started from a fairly conventional approach to knowledge representation (KR) involving "vectors" of attributes that allow for representing "first order predicate" statements about some domain of knowledge, and hence for drawing deductive conclusions based on such knowledge. They went on to extend the concept to incorporate an inductive sort of learning of rules from examples, and to provide the means for their system to alternate between ruminating over its knowledge to draw new conclusions and acquiring new information to integrate into its knowledge base (KB). They called the system they pioneered a "Framework for Learning and REasoning", or "FLARE".

    I have a variety of criticisms of FLARE, but before I start with them, I have to give Giraud-Carrier and Martinez strong credit for their work here. They sought to bridge the gap between learning and reasoning that existed then, and still exists now, between the AI sub-communities, and they seem to have succeeded. And rather than follow the often obscure paths of neural networks or genetic algorithms, they chose to engender learning with a KR based on formal logic statements, staying true in a way to the formal-logic-driven view of AI, now largely dead, from the fifties and sixties. What's more, the authors gave a reasonably honest appraisal of the limits of their system and avoided the temptation to make unduly bold claims about the applications of FLARE.

    Having said that, my first criticism is of the sort of knowledge representation the authors chose. In their system, every concept is represented as a "vector", or list of attributes with associated values. Every other concept has these exact same attributes, but with different values. In one example, the attributes are "is an animal", "is a bird", "is a penguin", and "can fly". Each attribute can have either a specific value (a "yes" or "no", that is, true or false, in these cases) or a "don't know" or "don't care" pseudo-value. So to say that "animals don't generally fly" in one concept would be to give the "is an animal" attribute a "true", the "can fly" attribute a "false", and all the other attributes a "don't care". Similarly, saying that a penguin is a bird would mean setting the "is a penguin" and "is a bird" attributes to "true" and setting the other attributes in that vector to "don't care". Admittedly, this approach is sufficient for the purposes of FLARE and makes printing a KB easy using a simple data table. But taken literally, it means that every piece of knowledge, no matter how small, has a value for every one of the attributes known to the system. Thus each new attribute adds to the size of every vector in the KB, and because reasoning means searching a two-dimensional table (vectors by attributes), growth along either axis slows it down.
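    To make the table-of-vectors idea concrete, here is a rough Python sketch. It is my own simplification, not the representation or inference procedure from the paper: each vector is a dense list with one slot per attribute, `None` stands for "don't care" (the separate "don't know" value is left out for brevity), and the hypothetical `compatible` test just checks that no attribute has conflicting explicit values. The point is the cost: adding an attribute widens every stored vector, and a full scan touches rows times columns.

```python
DONT_CARE = None  # a slot holding None matches any value


def compatible(a, b):
    """True when no attribute has conflicting explicit values in a and b."""
    return all(x is DONT_CARE or y is DONT_CARE or x == y for x, y in zip(a, b))


def add_attribute(attributes, kb, name):
    """Adding a column widens every vector in the KB, even ones that ignore it."""
    attributes.append(name)
    for vector in kb:
        vector.append(DONT_CARE)


attributes = ["is an animal", "is a bird", "is a penguin", "can fly"]
kb = [
    [True, DONT_CARE, DONT_CARE, False],  # animals don't generally fly
    [DONT_CARE, True, True, DONT_CARE],   # a penguin is a bird
]

query = [True, True, True, DONT_CARE]  # describe a penguin
hits = [v for v in kb if compatible(v, query)]
print(len(hits))  # 2: both vectors bear on the query

add_attribute(attributes, kb, "lays eggs")
print(len(kb) * len(attributes))  # 10 cells touched by a full scan (2 x 5)
```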

    The memory and time optimization complaints are weak, I admit. One could easily improve memory usage by designing vectors that contain only attributes with explicit or "don't know" values and treating every unspecified attribute as "don't care", for example. And the authors indicate that they use indexing, probably in a way similar to relational database engines like SQL Server, to speed up queries. So why do I bother with this critique?
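    The sparse alternative can be sketched by storing each vector as a dictionary of only the attributes it constrains, with anything missing read as "don't care". The inverted index from (attribute, value) pairs to vector ids is my assumption, meant only to echo the kind of indexing a relational engine uses; the paper's actual scheme may differ, and a "don't know" value would need its own sentinel, which this sketch omits. The vector ids are hypothetical names.

```python
from collections import defaultdict

# Each vector stores only the attributes it actually constrains;
# any attribute absent from the dict is treated as "don't care".
kb = {
    "animals_dont_fly": {"is an animal": True, "can fly": False},
    "penguin_is_bird": {"is a penguin": True, "is a bird": True},
}

# Inverted index: (attribute, value) -> ids of vectors asserting that pair.
index = defaultdict(set)
for vid, vector in kb.items():
    for attr, value in vector.items():
        index[(attr, value)].add(vid)


def lookup(attr, value):
    """Return ids of vectors that explicitly assert attr == value."""
    return index.get((attr, value), set())


def compatible(vector, query):
    """Attributes present in both must agree; everything else is "don't care"."""
    return all(query[a] == v for a, v in vector.items() if a in query)


print(lookup("can fly", False))  # {'animals_dont_fly'}
```

    With this layout, introducing a new attribute costs nothing for the vectors that never mention it.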

    I want to linger for a moment on this point about optimization because it is a common criticism of almost all AI work. In this field, one often hears the frustrated question, "Why is it that as I gain more knowledge, I can solve problems faster, but when a computer gains more knowledge, it gets slower?" The answer is that most AI systems have been built with the same basic brute-force approach found in most conventional database and data mining systems. A simple chess-playing program, for example, may look many moves ahead to see what the ramifications of each step will be, and each additional step ahead multiplies the processing required. No human being could ever match even the most basic chess-playing programs in this regard, yet it took decades before the top human chess player was "beaten" by a computer. That happened mainly because the computer opponent was so darned fast (and expensive) that it could look farther ahead in a unit of time than any other machine programmed for the task, not because it was significantly "smarter" than those other systems. That an ordinary PC still cannot do the same today is an indictment of AI researchers who throw up their hands and complain that today's computers are too slow. The human brain isn't really faster than today's computers. Nor do I agree with the claim that the "massive parallelism" of the brain is essential. What's essential is good data structures and algorithms. When you hear the word "parakeet", your brain doesn't do a massive search through all its word patterns to find a best match. I'm convinced it follows links from one syllable to the next, walking through a sort of "word tree", until it finds an end point. And at the end of that tree is not a vector with thousands of attributes. Rather, there's a reference to those attributes or other parts of the brain that are very strongly associated with the word. In short, the paths that the human brain follows are short and simple, and that's why we are able to think so quickly, despite having brains composed of laughably slow and sloppy processing elements. I should point out that this isn't so much a criticism of the FLARE concept as of the basic assumptions of AI researchers even today, it seems.
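    The "word tree" described above is essentially what programmers call a trie. Here is a minimal sketch, assuming words arrive pre-split into syllables; the syllable splits and the little concept records are illustrative inventions, not a claim about how the brain segments words. Lookup walks one link per syllable, so its cost depends on the length of the word rather than the size of the vocabulary, and the end of the path holds a reference to associated knowledge rather than a wide attribute vector.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # syllable -> TrieNode
        self.concept = None  # reference to associated knowledge, set at word ends


class SyllableTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, syllables, concept):
        node = self.root
        for s in syllables:
            node = node.children.setdefault(s, TrieNode())
        node.concept = concept

    def lookup(self, syllables):
        """Follow one link per syllable; no scan over the whole vocabulary."""
        node = self.root
        for s in syllables:
            node = node.children.get(s)
            if node is None:
                return None
        return node.concept


# Hypothetical concept records; the trie stores references to them.
parakeet = {"kind": "bird", "size": "small"}
parrot = {"kind": "bird", "size": "medium"}

words = SyllableTrie()
words.insert(["par", "a", "keet"], parakeet)
words.insert(["par", "rot"], parrot)

print(words.lookup(["par", "a", "keet"]) is parakeet)  # True
print(words.lookup(["par", "a"]))  # None: a prefix, not a word
```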

    More importantly, though, treating attributes as a "dimension" like this, instead of as things unto themselves, is dubious. FLARE has no capacity to perceive attributes as being related to one another, for example. Nor could it integrate new attributes in a meaningful way other than by simply adding another column to the table, so to speak. This limits FLARE's ability to truly deal with novel information, unless that information is properly constrained by existing attributes and very well formatted. (To their credit, the authors point out that FLARE does have an ability to deal with modestly noisy -- read "contradictory" -- information.)

    The next criticism I have of FLARE is that it does not seem very good at grouping related ideas together. Even a simple "or" linking two possible values for an attribute causes two separate vectors to be created for the same idea. There's no branching. Admittedly, the authors did this to simplify, and perhaps even make possible, the kinds of deductive reasoning that characterize FLARE's problem-solving capability. Still, this seems to me to make it difficult to localize thinking to appropriate "areas" of thought. I suppose I should apologize for focusing so much on comparing AI products to the human brain. I don't really think the human brain represents the only path to sentience. Still, it provides a good model, has much to teach us that seems as yet unheard, and should be used as a benchmark in identifying the features and limitations of such products. I hope my criticisms of FLARE will be taken mainly as comparisons to the human brain, rather than as attacks on the validity and value this work brings to the field.
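    To illustrate the "or" point: if each slot in a flat table can hold only one value (or "don't care"), a statement with alternatives has to be multiplied out into one vector per combination. The sketch below assumes a hypothetical encoding in which a tuple in a slot means "any of these values"; it shows the expansion in general, not FLARE's own preprocessing.

```python
from itertools import product

DONT_CARE = None


def expand_disjunctions(vector):
    """Split slots holding a tuple of alternatives into separate flat vectors.

    A slot like ("red", "green") means "red or green"; every combination
    of alternatives becomes its own row, as a flat table requires.
    """
    choices = [slot if isinstance(slot, tuple) else (slot,) for slot in vector]
    return [list(combo) for combo in product(*choices)]


# Hypothetical attributes: [color, is ripe, is fruit]
apple = [("red", "green"), DONT_CARE, True]  # "an apple is red or green"
for row in expand_disjunctions(apple):
    print(row)
# ['red', None, True]
# ['green', None, True]
```

    The growth is multiplicative: two independent two-way "or"s in one statement already yield four vectors, and nothing in the table records that they came from a single idea.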

    Next, FLARE appears to be competent at identifying objects, answering questions about objects, and more generally solving first-order predicate logic problems. But it does not seem to have any real capability to deal with pattern matching or temporal sequences, let alone anything like an appreciation of the passage of time. So it really could not be used to deal with the kinds of bread-and-butter problems Rodney Brooks identified as the basis for most living organisms, like controlling locomotion, visual recognition of objects, and so on.

    In summary, Giraud-Carrier and Martinez wrote an essay on their research into integrating learning and reasoning that is pithy and reasonably approachable to a college-educated reader. They tested FLARE in a way that allowed comparison with other conventional systems, and they provided useful examples and caveats about possible applications and limitations. FLARE is clearly not a framework for general-purpose thinking, but it provides interesting insights into solving logic problems and integrating new knowledge into such a system. To anyone interested in AI and looking for a broader view, I would recommend reading this 33-page essay for a penetrating glimpse into a pretty interesting piece of the AI picture.



    1/12/2005 - A review of the premises behind Pile


    Meandering through the trickle of AI-related news on the Web, I recently came across information about a purportedly novel computing paradigm named "Pile" (http://pilesys.com/). The company formed to capitalize on it, Pile Systems, Inc., makes the following bold claim on its "about" page under the heading "Why Pile can change computing":

    The Pile system is a revolutionary new approach to data and computing which eliminates the most fundamental current restrictions in regard to complexity, scalability and computability.

    Pile represents and computes arbitrary electronic input exclusively as relations (virtual data) in a fully connected and scalable combinatory space. It dynamically generates data like a computer game instead of storing and retrieving it in a traditionally slow and clumsy process.

    This sounds benign and interesting enough at first blush. Having read an outside review of Pile, I can genuinely say I'm interested in learning more about a way of representing information as relationships, because it sounds a bit like the ideas I've been pursuing in my own AI research.

    Still, the little red flag in my head goes up whenever I read phrases like "revolutionary new approach", because that sort of thing doesn't really happen very often. Most innovations are modest extensions of existing ideas. The red flag is my warning against iconoclasm and excessively bold claims.

    I am continuing to study the site and the concept of Pile. There may be genuine value to it. Or it may be a fraud. Until I give it a fair airing, I can't make that final judgment.

    That said, though, I wanted to give a preliminary review of the premises given in what seems the seminal introductory text on the subject: "Pile System White Paper: Computing Relations", by Peter Krieg. I normally would wait until I'd gotten further along in my understanding of the subject, but I am so incensed by the stated premises thus far about the limitations of current computers and of AI that I thought they merited their own separate review.

    The paper begins, "In the 60+ years of modern computing history we have taken the fundamental architecture of computing, i.e. the logic governing the way we represent, structure and operate as well as the method of representation we use to register events, for granted. We rarely become aware that these are mere design decision from the early days of computer technology, neither naturally given nor possibly the best of choices." OK, this is a fair statement. Anyone familiar with neural networks would agree that there are already demonstrated alternatives. A little later, though, Krieg raises the tempo a bit as he writes, "A time of crisis has always been a time where we are willing to take a closer look at foundations in order to find long term cures that go beyond patches and band-aids. The current crisis of computing -- an economic as well as a technical crisis -- is also an opportunity to reconsider the very basic assumptions that this industry has been built upon and reflect on possible alternatives that hold the promise of curing the systemic ills of computing." Let's be honest: there is no crisis of computing. Most organizations that need computing resources are doing just fine with the current breed of computers. In all my years as a software developer, I've never heard any businessman lament a crisis in computing. They complain about the cost of computer hardware and software licensing, not about basic capabilities. The little red flag starts waving around a bit.

    "Attempts in the 1960ies and 1970ies to address these issues have been silenced by the onslaught of Artificial Intelligence, for over 40 years the 'Great White Hope' of computing. Only now that the failure of AI has become evident -- as was predicted early by its critics -- and even the mention of it becomes a liability to anyone seeking publication or funding, can we revisit some of the arguments and take a fresh look at the foundations." Few would argue that AI researchers have made some bold claims that they have not yet been able to deeply substantiate. And yes, AI has a black eye now because of it. Yet while even I would argue that AI is nearly dead as a field, I wouldn't say AI has been a complete failure, nor that its time has passed. Such are the claims of people who don't really understand much about machines or intelligence, I think. So Krieg now has laid out the smoldering ashes of the dark ages out of which we are prepared to emerge into a bright new future. The little red flag waves a little more enthusiastically, now.

    "In fact, computers today are just that: extremely complicated, highly integrated yet fundamentally stupid clocks." This, of course, is nonsense. A clock is not a general-purpose computer. A Turing machine, by contrast, can in principle solve any information processing problem that is computable at all. "They are neither adaptive nor even scalable: in spite of ever speedier and more complicated chips, in spite of even faster growing memories and storage devices, their operations keep drowning in data and complexity." Given that a universal Turing machine can emulate any other Turing machine, and that a Von Neumann machine (VNM) -- almost all computers today are of this type -- is, memory limits aside, equivalent to one, saying that a VNM is not adaptive is just plain nonsense. One may quip that the software running on a VNM is not adaptive enough to deal with a certain class of problems, but one should not equate the limits of a program with the limits of the VNM it runs on. Saying that a VNM is not scalable is also nonsense. The famous Connection Machine (up to 65,536 processors in one system) and now Google (over 100K computers) should easily lay that question to rest. The little red flag begins hopping around frantically.

    I would be remiss if I overlooked that last part about "drowning in data and complexity." What could this mean? The next statement is even more puzzling: "The reasons lie in the very foundations of their architecture: logic and representation." The little red flag stops its waving and hopping and scratches its head.

    Krieg goes on to reveal the nature of the problem by reference to a collection of jargon pulled from AI, philosophy, and even quantum mechanics. He goes so far as to claim that VNMs -- and yes, he's clearly conflating VNMs and the AI programs that run on them at this point -- rely on deterministic rules and that quantum mechanics suggests there are no such things. Well, he's right, and one could go even further and say that almost all technology we have ever created relies on basic determinism, the view of causality that says we can predict likely outcomes from certain classes of starting states. Before declaring this quaint, back-country superstition, let's acknowledge that nature has done the same. Almost everything about the biological machinery of humans and all other known life forms has evolved in concert with the basic premise of determinism. What good is a muscle if one can't assume that it will work in a predictable manner, for example? How about an eye?

    "All machines including today's computers are exactly such closed deterministic mechanisms." This claim worries me a little, as I'm assuming that Pile is going to be presented as an alternative to this paradigm. The only problem is that Pile Systems sells software that runs on these deterministic machines.

    "Deterministic systems by definition are incapable of learning, as learning would change them in unpredicted ways - turning it into non-deterministic systems." I guess this is supposed to be the killing blow to VNMs and / or AI research to date. This premise is just plain false, though. Determinism does not preclude learning. I could point to the simple neural network demonstrator program I made recently, but I'll grant that Krieg places neural networks somehow above VNMs and other AI. So take FLARE, which I recently reviewed. Now there's a system that is about as classical as one gets in the realm of AI. It relies wholly on deterministic processes for reasoning and learning, yet it's clearly able to adapt itself to new knowledge. How about Cyc? It may not yet have achieved the goals Doug Lenat had for it decades ago, but it clearly adapts to new knowledge. The little red flag is pretty confident the rest of my brain can take it from here and retires for the day, cheerful about another job well done.
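    A tiny counterexample to the quoted premise: the perceptron below is fully deterministic (fixed starting weights, fixed presentation order, no randomness anywhere), so training it twice on the same data gives exactly the same result, yet its behavior changes as it learns the logical OR function from examples. It is a textbook perceptron chosen for brevity, not a model of FLARE, Cyc, or my own demonstrator program.

```python
def train_perceptron(examples, epochs=10, rate=1):
    """Classic perceptron learning rule with integer weights; no randomness."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            w[0] += rate * err * x1
            w[1] += rate * err * x2
            b += rate * err
    return w, b


def predict(model, x1, x2):
    w, b = model
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

untrained = ([0, 0], 0)
model = train_perceptron(OR_EXAMPLES)
print([predict(untrained, *x) for x, _ in OR_EXAMPLES])  # [0, 0, 0, 0]
print([predict(model, *x) for x, _ in OR_EXAMPLES])      # [0, 1, 1, 1]
```

    Running `train_perceptron` again returns the identical model, which is the whole point: the system learned, and it did so predictably.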

    After a bit of mumbo jumbo attempting to play on our annoyance at having to adapt to computers instead of having them adapt to us and about how adaptive systems cannot have "pre-knowledge about the signals they detect", Krieg goes on to introduce a new term: "poly-logic systems" and declares that it "is essential to understand living systems and phenomena like cognition, learning, adapting or complexity." The little red flag pokes its head out again, ears perked. To his credit, Krieg decides to rescind his abrogation of logic and declares that "polylogic" is not "another logic", but is instead another "architecture of logic".

    It becomes clear at this point that the rest of the white paper will focus on what polylogic is and hence what Pile's novel conception of computing is. I'm going to read on and find out more. Still, I can't help but have a bad taste in my mouth at this point. Given the gross misunderstandings and continual confusion between Von Neumann machines, relational databases, and traditional AI research so far, it's hard to imagine a clean concept will follow. It may be a valid one, still. I'm eager to find out.



    1/12/2005 - Follow-up on Pile


    My head is spinning. I've done about as much due diligence on the Pile computing system as I reasonably can, so I thought I should write a brief follow-up entry.

    After all the bombastic claims about how modern computers, relational databases, and AI suck, Peter Krieg, CEO of Pile Systems, Inc., goes on to explain in fancy but annoyingly vague terms what Pile is and how it is the perfect solution. In fact, as far as I can tell, Pile is nothing more than a data structure that represents everything as linked points in a non-hierarchical graph space. One might as well call it a big flow chart with only one kind of block, one that can't contain any discrete information. If that's true, then I can hardly see how the trivial concept that mathematicians call a "graph" is novel, let alone patentable.
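    For readers who haven't met the term, a "graph" in the mathematician's sense is just nodes plus links. The sketch below reflects only my reading of the vague description, not Pile's actual design or API, which I haven't seen: nodes are bare ids with no payload, and all structure lives in the links. The class name and the `meaning` dictionary are hypothetical, included to show that an application still has to attach meaning to the nodes itself.

```python
from itertools import count


class ContentlessGraph:
    """Nodes are bare ids with no payload; all structure lives in the links."""

    def __init__(self):
        self._ids = count()
        self.links = {}  # node id -> set of linked node ids

    def new_node(self):
        node = next(self._ids)
        self.links[node] = set()
        return node

    def link(self, a, b):
        self.links[a].add(b)
        self.links[b].add(a)

    def neighbors(self, node):
        return self.links[node]


g = ContentlessGraph()
a, b, c = g.new_node(), g.new_node(), g.new_node()
g.link(a, b)
g.link(b, c)

# Nothing in the graph says what a, b, or c *mean*; application code
# has to keep that mapping itself, here in an ordinary dictionary.
meaning = {a: "customer", b: "placed", c: "order"}
print(sorted(g.neighbors(b)))  # [0, 2]
```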

    To be sure, I haven't seen any of the source code or any applications written with Pile. One has to get in touch with Pile Systems for a demo. And I couldn't, with a few quick Google searches, find anyone who admits to using the thing. I'm sure they're out there, but I didn't find them.

    Actually, I didn't find anything significantly related to Pile through Google searches besides what is on Pile's web site or repackaged in rave reviews by converts who probably haven't used it. Of course, nobody cares about my AI research, either, so I'll give them the benefit of the doubt and assume Pile is simply little known because people haven't caught the Pile bug yet.

    I have to caution people that I'm not an expert in Pile. There may well be some value there. The literature does nothing more than knock everything that has come before Pile and make bold claims about how Pile is like the human brain and can be used to solve any problem. I'm left to conclude from what little their public literature reveals that Pile is really just a data structure, and that to make use of it, one has to write all the software to assign meaning to and process the data in it. At best, then, Pile is a tool that can be used to solve any computing problem -- just like a computer memory or relational database can.

    I suppose I'm not being entirely fair. I wish I could give more attention to Pile to better cement my initial thoughts on it, but after reading several documents that amount to puffy product literature on the subject, I can't take any more. Maybe I'll find useful literature or Pile will have publicly downloadable demonstrations some day. For now, the subject is pretty nauseating.



    1/23/2005 - The fallacy of bigger brains


    I recently read a great article in the February 2005 issue of Scientific American titled "The Littlest Human", by Kate Wong. Scientists have been studying a newly found member of the genus Homo, of which Homo sapiens is the last surviving species. They have named it Homo floresiensis, after the Indonesian island of Flores, where the first known remains were discovered. As you can see in the artist's rendition of H. floresiensis, they were very small creatures. In fact, they were about the size of the australopithecines (remember Lucy?), the line from which the Homo tree is thought to have emerged, and as such the smallest members of Homo yet found. They appear to have existed as recently as 18,000 years ago, long after the demise of the Neanderthals, who had been believed to be the last of the Homo line other than us to die out.

    While I have a deep interest in the origin of the human species, what made this story particularly interesting in the context of AI is the question of intelligence that it has raised in the scientific community. Wong describes the creatures some scientists have affectionately dubbed "hobbits" as having brains the size of a grapefruit, yet points out that there is evidence that these hobbits were making sophisticated stone tools, even though some species of Homo with larger brains did not. The obvious question then is: is intelligence measured in brain size?

    Wong carefully points out that scientists of various persuasions are weighing in on this question and that there is as yet no clear answer. I think the answer is obvious, though. Intelligence is a reflection of structure, not mass. Wong points out, for example, that people credited with being among the brightest of humanity span the full range of human brain sizes. In one example, two well-known intellectuals are cited, one of whom had only half the cranial volume of the other. He might as well have been missing an entire brain hemisphere.

    So why should I care as an AI researcher? Because for years people have been telling us that the reason we don't have intelligent machines yet is that computers are too slow today. It's just a matter of time, they tell us, until they will have enough transistors, memory, or whatever other basic physical characteristics we care to use to measure computing power. When we reach that threshold, somehow computers will magically wake up and start cracking jokes and deciding whether or not to enslave humans or just kill them altogether.

    This equation of greater numbers of computing units with greater intelligence is misguided. If brain size is key in "wet" life, then why don't those creatures who have much larger brains than us (e.g., certain whales) exhibit at least our own levels of wit and creativity? I am fully convinced that we could have had intelligent machines decades ago. I am further convinced that multiplying today's computing power by ten or a hundred times will not automatically bring them about, either. Google is a monster of computing power and it's still not "smart". Don't plan on having a computer of your own that has as much computing power as Google any time soon, by the way.

    The actual question is one of structure and complexity. This concept is illustrated over and over again throughout the history of computer science. Computer games illustrate it well. When games like Doom and Tomb Raider came along in the nineties, people were astonished at what a whole new generation of computer graphics could do with the average home computer. What people may not remember now is that 3D graphics engines capable of rendering graphics just as compelling had been around for decades. Few could use them because few had the expensive hardware needed to run them fast enough. Did these games come with hardware upgrades? Of course not. What they had was a set of ingenious new algorithms for generating compelling 3D graphics. The same thing happened when people started streaming audio and video over the Internet. The first systems were power hungry, requiring massive bandwidth and expensive hardware. Now, the average user can get the same results with lower bandwidth and a cheap PC, thanks to some incredible compression algorithms and other ingenious techniques invented by companies like RealNetworks.

    AI researchers like to blame our failure to achieve the goals we've been boasting we could achieve for decades now on lots of things, but insufficient hardware is our favorite whipping boy. Let's be honest, though, and tell the world that we just haven't found the right algorithms, yet.

    Funny that rocket science should be the standard against which we measure engineering complexity. AI research sometimes seems to make rocket science look like a weekend crafts project. Everyone who has contributed, and who continues to do so, deserves credit for doing the difficult and pursuing what seems impossible. To anyone who has thought of giving up -- especially those who wonder whether they should even bother getting started in our largely dead field -- I want to encourage you to keep the faith. I am convinced you have more than enough computing power in your own PC to give life to an intelligent mind. It's just a question of your creativity and persistence, and it will happen. Don't give up.



    2/26/2005 - Machine vision of GUIs


    I just completed a brief foray into machine vision with a project focused on being able to see, and to some degree "understand", windowed graphical user interfaces (GUIs) like Microsoft Windows. I wrote a test program and an essay on the subject, so rather than simply repeat its contents here, I suggest you visit the project's home page. But I'll summarize briefly.

    The basic premise of my explorations is that most GUIs are composed of rectangular blocks within blocks. I called the core of the concept I was experimenting with "expansion" and "contraction" algorithms. "Expansion" here means starting with a test rectangle inside a block that, like a balloon, expands outward until it finds the outer bounds of the current block. Similarly, "contraction" means starting with a rectangle just inside a rectangular block and gradually shrinking it until it wraps snugly around the one or more inner blocks that punctuate the smooth interior of the outer block, like water filling a dry stream to expose the islands within it.
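    Here is a minimal sketch of the two operations on a toy screenshot, assuming the image is a 2D grid of color codes and a block is a rectangle of uniform color. The function names and strategies are my own simplifications, not the project's code: expansion here scans outward along the seed's own row and column (so an inner block elsewhere in the window doesn't stop the balloon early, at the cost of assuming those lines are unobstructed), and contraction shrinks each edge while it crosses only the block's face color.

```python
def expand(img, row, col):
    """Push each edge out from a seed pixel until the color changes.

    Simplification: each edge is found by scanning along the seed's own
    row or column. Returns inclusive bounds (top, left, bottom, right).
    """
    color = img[row][col]
    top = bottom = row
    left = right = col
    while top > 0 and img[top - 1][col] == color:
        top -= 1
    while bottom < len(img) - 1 and img[bottom + 1][col] == color:
        bottom += 1
    while left > 0 and img[row][left - 1] == color:
        left -= 1
    while right < len(img[row]) - 1 and img[row][right + 1] == color:
        right += 1
    return top, left, bottom, right


def contract(img, rect, background):
    """Shrink a rectangle while its edges cross only background pixels.

    Returns bounds wrapping the inner blocks, or None if only background.
    """
    top, left, bottom, right = rect
    while top <= bottom and all(img[top][c] == background for c in range(left, right + 1)):
        top += 1
    while bottom >= top and all(img[bottom][c] == background for c in range(left, right + 1)):
        bottom -= 1
    if top > bottom:
        return None
    while left <= right and all(img[r][left] == background for r in range(top, bottom + 1)):
        left += 1
    while right >= left and all(img[r][right] == background for r in range(top, bottom + 1)):
        right -= 1
    return top, left, bottom, right


# 0 = desktop, 1 = window face, 2 = a button inside the window
screen = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 2, 2, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
window = expand(screen, 1, 1)
print(window)                                  # (1, 1, 4, 5)
print(contract(screen, window, background=1))  # (3, 3, 3, 4): the button
```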

    The main point of analyzing a user's screen with expansion and contraction to find the boundaries of the UI blocks would be to carve a complex screen up into smaller units that can be processed by other, more traditional vision systems. An optical character recognition (OCR) system, for example, might be able to read the text on a button or in a text box. A neural network might be used to recognize an icon on a button. A neural net or classifier system could be used to draw conclusions about what a particular arrangement of blocks within blocks might represent. It might, for example, be able to distinguish a word processor from a web browser.

    Ultimately, there could be all sorts of applications of a system that can reasonably grasp most of the basic elements of a windowed GUI. I had fun writing a simple demonstration system that illustrates some of the strengths and weaknesses of the concept as I describe it in the accompanying essay. Plus I made the source code of that program available for download.



    4/15/2005 - Bubble Vision


    I just completed a small project in general-purpose machine vision. The essential concept is to grow "bubbles" within regions of an image that have the same color or smooth gradients of color that shift gradually from one to another.

    The method is a bit like a traditional flood-fill algorithm, but uses a continuous loop of nodes that move and multiply to push the loop ever outward until they hit individual obstacles. The growth is controlled primarily by cellular-automaton-style rules. There's also an algorithm for dealing with cases where the bubble wraps around "islands" of obstacles. Rather than leave a seam behind as the bubble grows, it engulfs these islands by connecting the touching parts of the loop and discarding the parts of the loop left inside.
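    For comparison, here is the kind of traditional flood fill the bubble method is contrasted with, adapted to accept smooth gradients: a pixel joins the region if it differs from the neighbor it was reached from by no more than a tolerance. This is a conventional breadth-first flood fill, not the bubble algorithm itself (which grows a loop of nodes under cellular-automaton-style rules); the grayscale image and the `tolerance` parameter are assumptions for illustration.

```python
from collections import deque


def gradient_fill(img, seed, tolerance):
    """Breadth-first flood fill that follows gradual changes in intensity.

    A neighbor joins the region when its value differs from the pixel it
    was reached from by at most `tolerance`, so a smooth gradient is
    absorbed while a sharp edge stops the fill.
    """
    rows, cols = len(img), len(img[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region
                    and abs(img[nr][nc] - img[r][c]) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# A left-to-right gradient (10, 20, 30, 40) beside a sharp edge (200).
image = [
    [10, 20, 30, 40, 200],
    [10, 20, 30, 40, 200],
]
filled = gradient_fill(image, (0, 0), tolerance=15)
print(len(filled))  # 8: the gradient is filled, the 200 column is not
```

    One known weakness of this neighbor-relative rule is that a slow drift in color can carry the fill across a soft boundary, which is one reason to treat the region's outline as a shape in its own right, as the bubble approach does.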

    I also critique the shortcomings of the bubble concept and indicate opportunities to build on its successes.

    I've included the source code for download and provided extensive explanation of how the algorithm works. I invite you to check out the project site.




    4/27/2005 - Review of "On Intelligence"


    I recently finished reading a brief but refreshing book on intelligence titled "On Intelligence", by Jeff Hawkins with Sandra Blakeslee. I found the ideas espoused concise, yet penetrating; bold, if perhaps a little hasty. I want to recommend that anyone with any serious interest in the future of AI read this book. It may well have as much impact on AI in the coming decade as things like classifier systems, neural networks, and genetic algorithms have had to date.

    I hope that Sandra Blakeslee, coauthor and surely a bright light in her own right, will not take too much offense if I attribute the synthesized concept and work of this book primarily to Jeff Hawkins. I apologize if it is unwarranted, but it's certainly simpler for the purpose of writing a review.

    First, it's worth pointing out that Jeff Hawkins has had a successful career developing handheld computer technologies, including work for the venerable Grid computers and as the chief technology officer (CTO) and founder of Palm Computing - makers of the PalmPilot - and of Handspring, makers of the best-known knockoff of the PalmPilot. He's obviously got a respectable pedigree as a computer engineer. He's also dabbled in aspects of artificial intelligence in his work; for example, he provided an elegant solution to the handwriting recognition problem by inventing the Graffiti mechanism used in the PalmPilot, Microsoft PocketPC, and some other handheld computers. As you might imagine, that doesn't give him any special insight into intelligence, per se. If one can be a good programmer without being able to write a clean sentence in English, why should one expect a good programmer to have any keen insights into intelligence?

    Hawkins is also one of those people who have had a long-held passion for understanding intelligence. "I am crazy about brains," he writes in the prologue of On Intelligence. Unlike many of us techies in the AI field, he doesn't describe his interest in how our brains work as a simple source of inspiration for solving certain problems. He seeks to understand how the human brain - particularly the part that he considers the essential seat of human intelligence - works, as an end in itself. Yet he states unambiguously, "I want to build truly intelligent machines." One can say that Hawkins is among those of us who truly believe in the basic feasibility of achieving the longstanding goal of the field of artificial intelligence. What is most impressive is that he doesn't simply believe that we can understand the brain. He actually believes it's not all that complicated, whether viewed bottom-up or top-down. That sort of optimism and focus is essential, I believe, to keeping the faith and making real progress toward AI's goal.

    Whatever you might think of it, Hawkins would rather not associate himself with the AI crowd. In fact, he divides On Intelligence into three basic sections to serve three basic goals: debunking AI, explaining human brain essentials, and waxing on the implications of his theory. The first two of eight chapters are devoted to the first of these goals. Perhaps his aversion stems from the understandable awkwardness of associating the word "artificial" with the engineering challenge of making intelligent machines. No doubt the bad taste left in one's mouth by the basic failures of many AI efforts fuels the animosity. Perhaps he was bothered by the fact that early in his career, he found, as I did, that few business people - even among those on the "cutting edge" of technology - have any stomach for researching or applying AI in business. Perhaps being shunned by MIT, an ivory tower of AI research, because he wanted to actually study human brains, too, had its impact. Whatever the reasons, Jeff Hawkins clearly doesn't want to have anything to do with AI.

    Although I believe Hawkins' dismissal of traditional AI is unduly summary and harsh in many ways, I think it's fair to say he has basically valid criticisms, not of "AI" writ large, but of many of the concepts and experiments that have dotted the popular history of AI. The most sweeping criticism is that many AI researchers have eschewed studying the brain. It's probably fair to say we mostly tend to think in terms of solving practical engineering problems with crafty tricks that seem intelligent, rather than seeking to create genuine intelligence. In a way, only bozos like me who do this in our free time are free to experiment willy-nilly, while professionals generally sweat over showing tangible progress month to month and sometimes week to week.

    Hawkins readily acknowledges that lots of cool things have been done using neural networks and swarm theory, but he also pointedly argues, as I've noticed myself, that many of the same things can be, and have been, done more efficiently with less glamorous, more traditional engineering methods. As such, he rightly calls into question whether those solutions have been over-engineered for the sake of AI branding and career advancement instead of being driven by sound business practices.

    Whereas I would say that AI research is still stumbling because AI researchers lack a basic conceptual framework, Hawkins takes it a step further. He states that, "computers and brains are built on completely different principles." I understand why he says this, but the nit-picker in me can't really agree. Although Hawkins completely eschews any notion of what he calls a "special sauce" to how a brain works, his aversion to AI does lead him to make some claims that I believe are unsubstantiated about how computers work and why they can't be made to be intelligent.

    Hawkins holds up the famous Turing Test as the archetype of what's wrong with AI. The test essentially postulates that if a computer can fool a human into thinking he's talking to another human, the computer should be considered intelligent. By now, most of us in the AI field are familiar with the humiliating results of computer programs which, exploiting simple tricks and basic human gullibility, have been able to pass the Turing Test with flying colors without requiring even a whiff of what any of us would call intelligence. Hawkins doesn't just quip about the inadequacy of the formulation of the Turing Test, though. He stabs at its most basic premise: that behavior defines intelligence. Of Turing's now generally accepted definition of artificial intelligence, Hawkins says, "Its central dogma: the brain is just another kind of computer. It doesn't matter how you design an artificially intelligent system, it just has to produce humanlike behavior."

    It's obvious we have no means to measure intelligence without reference to behavior, and Hawkins readily acknowledges this. But Hawkins points out that we don't have to do anything in order to understand something. Behavior may be a necessary component of proving intelligence, but intelligence does not require behavior, per se. "A human doesn't need to 'do' anything to understand a story. I can read a story quietly, and although I have no overt behavior my understanding and comprehension are clear, at least to me." I think few could seriously deny this basic and intuitive claim, lest we deny our own obvious, daily experience of quiet contemplation of the things that happen in our lives.

    Hawkins rightly waves aside the quip by some AI advocates that we could potentially model a human brain in software. While he admits it could potentially be done, he recognizes that that assertion bears almost no resemblance to the way AI has actually been proceeding to date. To his thinking, we're still seeking new ways to fool people using great parlor tricks like bipedal walking and anthropomorphic facial expressions, not trying to actually model the brain.

    You might think Hawkins would have kind things to say about neural networks. After all, for years, the public has been fed market-speak about how neural nets work like the human brain. While Hawkins gives a nod to neural nets as an interesting step up from AI, he reserves harsh criticism for the fundamental approaches taken to date by connectionists, citing their basic lack of progress and even of potential. "On the surface, neural networks seemed to be a great fit with my own interests. I quickly became disillusioned with the field."

    The first thing Hawkins notes as missing from traditional neural networks is an appreciation of time. "Real brains process rapidly changing streams of information. There is nothing static about the flow of information into and out of the brain." I share this perspective. Were you to take long walks with me and discuss AI, you'd hear me babble for hours about the importance of, and mechanisms for, representing and tracking events in time. But when it comes time to put fingers to keyboard, it's hard not to get lost quickly in other details and set that aside. I guess I'm just as guilty as other AI researchers of thinking of time as something to be dealt with once there's more computing power and the other basics are taken care of. Take vision. My recent "bubble vision" experiments deal exclusively with static images. I imagined bubbles morphing with each passing frame of a video stream, but never got around to doing anything like that. It didn't seem essential. I know that's naive and wrong. Guilty as charged, but I swear it's on my to-do list.

    Hawkins' second criticism of neural nets is that they don't really have feedback mechanisms. In my own recent neophyte exploration into neural net simulation, I used a pure feed-forward learning and processing model. More sophisticated models use "back propagation", which looks a little like feedback but isn't really: it sends error signals backward only to adjust weights during training. Contrasting this with the brain, Hawkins points out, "for every fiber feeding information forward into the neocortex, there are ten fibers feeding information back toward the senses. Feedback dominates most connections throughout the neocortex as well." Connectionists - neural network researchers - might argue that they aren't out to literally mimic the brain, but simply to do something useful with their creations. But one of Hawkins' points in this criticism is that it's not enough to have a piece of brain switching between learning and "doing" modes like classical neural nets do. Learning and thinking are part of the same basic neural algorithm in the human brain.
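To make the distinction concrete, here is a minimal Python sketch, assuming a tiny network with two inputs, two sigmoid hidden units, and one sigmoid output, trained on logical AND with squared error. It is an illustration, not the simulator mentioned above. The `forward` function is purely feed-forward; the backward pass lives only inside `train_step`, where error terms travel back solely to adjust weights. Using the trained net involves no backward flow at all, which is the sense in which back propagation is not feedback. All names, starting weights, and the learning rate are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(net, x):
    """Use the network: signals flow strictly input -> hidden -> output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w_h"], net["b_h"])]
    out = sigmoid(sum(w * h for w, h in zip(net["w_o"], hidden)) + net["b_o"])
    return hidden, out


def train_step(net, x, target, rate):
    """Train on one example: a forward pass, then errors propagated backward."""
    hidden, out = forward(net, x)
    # Error term at the output for squared error and a sigmoid output unit.
    d_out = (out - target) * out * (1.0 - out)
    # Error terms at the hidden units, sent back through the output weights
    # (computed before those weights change).
    d_hid = [d_out * w * h * (1.0 - h) for w, h in zip(net["w_o"], hidden)]
    # Gradient-descent weight updates; forward() itself is never altered.
    for j, h in enumerate(hidden):
        net["w_o"][j] -= rate * d_out * h
    net["b_o"] -= rate * d_out
    for j, row in enumerate(net["w_h"]):
        for i, xi in enumerate(x):
            row[i] -= rate * d_hid[j] * xi
        net["b_h"][j] -= rate * d_hid[j]


def total_error(net, data):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in data)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
net = {"w_h": [[0.5, -0.4], [-0.3, 0.6]], "b_h": [0.1, -0.1],
       "w_o": [0.7, -0.2], "b_o": 0.0}
before = total_error(net, AND)
for _ in range(5000):
    for x, t in AND:
        train_step(net, x, t, rate=0.5)
after = total_error(net, AND)
print(f"squared error before: {before:.3f}, after: {after:.3f}")
```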

    Tying neural networks back to AI, Hawkins levels his basic criticism. "In my opinion, the most fundamental problem with most neural networks is a trait they share with AI programs. Both are fatally burdened by their focus on behavior. Whether they are calling these behaviors 'answers', 'patterns', or 'outputs', intelligence lies in the behavior that a program or a neural network produces after processing a given input. The most important attribute of a computer program or a neural network is whether it gives the correct or desired output. As inspired by Alan Turing, intelligence equals behavior."

    Before leaving neural networks behind as a seemingly dead-end alley, Hawkins does go on to mention a backwater of connectionism dealing with what are called "auto-associative" neural nets. Without going into much detail, the basic point of auto-associativity is to let a neuron that would normally recognize the whole of some pattern, perhaps a human face, do so even when part of the pattern is missing (perhaps because something else in the image is partly covering the face) by feeding the missing parts back into itself, as though "imagining" that the missing parts are actually there. Hawkins introduces these as a prelude to his eventual explanation of his model of how the brain works.
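The classic textbook example of an auto-associative net is a Hopfield network, so the following Python sketch uses one; that choice of model is an assumption, not necessarily what Hawkins has in mind. It stores two binary (+1/-1) patterns with a Hebbian outer-product rule, then recovers a complete pattern from one whose right half is missing (encoded as 0) by repeatedly feeding the network's own output back into itself. The eight-unit patterns and the function names are illustrative assumptions.

```python
def train_hopfield(patterns):
    """Hebbian (outer-product) weights for +1/-1 patterns, with no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Feed the net's output back into itself until the state stops changing.

    Units set to 0 stand for missing input; each unit is updated in turn from
    the weighted votes of all the others.
    """
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            vote = sum(w * s for w, s in zip(weights[i], state))
            new = 1 if vote >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state


face = [1, 1, 1, 1, -1, -1, -1, -1]
other = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([face, other])

half_hidden = [1, 1, 1, 1, 0, 0, 0, 0]   # right half of the "face" covered up
print(recall(weights, half_hidden))       # [1, 1, 1, 1, -1, -1, -1, -1]
```

Real images would need far more units, and a Hopfield net's capacity is limited, so this is only a picture of the completion-by-feedback idea.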

    Hawkins ultimately believes that traditional AI approaches are doomed to fail because they aren't based on an actual model of intelligence. Although I don't believe that the way the human brain works is the only way an intelligent machine can work, I think it's at least fair to entertain his belief that we won't find human-like intelligence in the trash heap of traditional AI. "To succeed, we will need to crib heavily from nature's engine of intelligence, the neocortex. We have to extract intelligence from within the brain. No other road will get us there." For me, the best reason to at least entertain this anthropomorphic view is that Hawkins is the first person to actually describe the essentials of human intelligence in a compact, algorithmic sort of way to my satisfaction. (Leonard Peikoff and Ayn Rand, especially in their 1979 tome, Introduction to Objectivist Epistemology, come in a close second. Regrettably, they had little interest in, or belief in, the idea of endowing machines with intelligence, even though their insights have much to offer AI researchers.)

    The stage set, Hawkins moves on to summarily describe the human brain. What first surprised me was that he has chosen to focus his attention on just the neocortex (AKA the "cortex"), the topmost roughly 2 mm of the wrinkled surface of the brain we're all familiar with. Honestly, I found this explanation a little confusing, as I thought the neocortex was the entire wrinkly bit that sits on top of the "reptile" and cerebellum portions of the brain. Perhaps it's just that I'm not a neuroanatomist; maybe we're talking about the same thing. Hawkins makes clear that he doesn't believe that all of human thinking arises from the cortex, to the exclusion of the rest of the brain. He has simply chosen to focus on the cortex because in his view this is where the key to intelligence lies. As he says, "I am not interested in building humans. I want to understand intelligence and build intelligent machines. Being human and being intelligent are separate matters." For example, Hawkins wholly rejects the view, now gaining popularity in AI, that a machine must have emotions to be able to function. I must admit that my own view of the importance of value judgments borders on this emotionalistic view, but I have to agree with Hawkins that emotions should not be essential for intelligence, per se.

    Still, I think I have to side with the people who, Hawkins indicates, would disagree with his assertion that the cortex is sufficient to explain intelligence. As Hawkins later explains, the cortex is virtually homogeneous in its basic structure, meaning that all parts of the cortex perform the same song and dance, just using different information. He probably knows much better than I do, but I have trouble believing that there isn't some important gadgetry in the brain that has evolved specifically to deal with vision or audition, for example, or with locomotion, for another. It's hard to imagine the neocortex beginning, in Ayn Rand's words, "tabula rasa" (clean slate), with no built-in mechanisms as tools upon which to build an understanding of the world.

    Even so, my quibble is no reason not to suspend judgment long enough to understand what alternative Hawkins proposes to this staple idea of psychology that some parts of human intelligence are just there from birth. Hawkins states simply that the cortex is not a computing machine in the sense that traditional computer scientists would be familiar with. Instead, it is a memory machine whose primary purpose is to make predictions. (Of course, I could quip that the vast majority of the hardware in most modern computers is devoted to memory, too, and that most attention in computer science goes to the information stored in that memory, but these are small points. I just get tired of the revolutionary, "this is better than a computer" rhetoric sometimes, even from people I admire.)

    To show how senses as varied as ours can all work basically the same way, Hawkins argues that they all boil down to two kinds of information: spatial and temporal. Vision illustrates both quite well, but each of the senses seems to involve more or less of each. I found his explanation of touch in spatial-temporal terms particularly fascinating. Using a thought experiment, he asks you to imagine waking up in the dark to find that a small pile of gravel has been placed on your hand while you slept. You would probably not be able to recognize it as gravel until you started wiggling your hand and so feeling the telltale protrusions, vibrations, abrasion, and so forth that come as a time-varying stream of signals to your cortex from the different, spatially represented parts of your hand. This is the sort of thinking I've been entertaining for years, but expressed far more elegantly than I've been able to. It may not surprise you that Hawkins believes not only that all the senses work the same way in the cortex, but also that motor control uses exactly the same mechanisms.

    Hawkins gives credit for much of his basic view of the homogeneity of the cortex and the cortical algorithm to neuroscientist Vernon Mountcastle. "When I first read Mountcastle's paper I nearly fell out of my chair. Here was the Rosetta stone of neuroscience ... In one step it exposed the fallacy of all previous attempts to understand and engineer human behavior as diverse capabilities."

    The homogeneity of the cortex and the consequent reliance on a one-size-fits-all algorithm sounds interesting, but not very helpful at first. It begs for an explanation of an algorithm that accounts for most of what we consider intelligent behavior. Since he claims that prediction is the essential function of the cortex as a memory system, Hawkins identifies two essential ingredients of prediction that operate in conjunction: hierarchy and pattern invariance. The first, hierarchy, seems straightforward. It seems natural to expect that as information bubbles up from the lowest levels of the senses, it should take on more and more abstract forms.

    The concept of pattern invariance is a little harder to grasp, but very important. My experience with the pattern-matching capabilities of the neural network simulation I made some time back confirmed what I already suspected: that a neural net isn't actually very good at recognizing letters. With enough neurons in a "middle layer" and sufficient training, it can be cajoled into recognizing characters in a few different fonts and making best guesses as to what it sees. But it pales in comparison to our own ability to pick out characters, even against the most appallingly noisy backgrounds. Rotate a letter 45 degrees and the neural net chokes. Yes, you could train the net to recognize an entire second copy of its character set rotated 45 degrees. And you could even, in theory, go so far as to do so for, say, every 10-degree increment. But surely you never trained your eyes this way, yet you would probably have no problem identifying any single character at any arbitrary angle and in any basically legible font. Why, then, should we think this is a good solution to an AI problem?

    Hawkins contends that each node in the cortical hierarchy is responsible for coming up with a way of recognizing patterns, such as those that define printed characters, and representing them in a way that abstracts away the lower-level details. A printed letter "T" will be represented higher up the hierarchy as the same thing, no matter its orientation.
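The book does not say how the cortex achieves this, so the following Python sketch only illustrates what an orientation-invariant representation buys you, using a deliberately crude trick that is an assumption of this example: a small binary glyph is rotated through its four 90-degree orientations, and the lexicographically smallest version is kept as its canonical form. A raw pixel comparison says a sideways "T" is a different shape; the canonical forms agree. Arbitrary angles, scale, and noise, which real invariance must handle, are not attempted. The function names and glyph encoding are hypothetical.

```python
def rotate90(glyph):
    """Rotate a glyph (a tuple of equal-length row strings) 90 degrees clockwise."""
    return tuple("".join(row[c] for row in reversed(glyph))
                 for c in range(len(glyph[0])))


def canonical(glyph):
    """The smallest of a glyph's four 90-degree rotations.

    Every orientation of the same shape yields the same canonical form, so
    comparing canonical forms ignores (quarter-turn) orientation.
    """
    forms = []
    current = tuple(glyph)
    for _ in range(4):
        forms.append(current)
        current = rotate90(current)
    return min(forms)


T = ("###",
     ".#.",
     ".#.")
T_sideways = rotate90(T)   # ("..#", "###", "..#")

print(T == T_sideways)                         # False: raw pixels differ
print(canonical(T) == canonical(T_sideways))   # True: invariant forms agree
```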

    Hawkins points out something I didn't know previously: that the mammalian cortex has largely taken over the motor functions previously managed by evolutionarily older portions of the brain. My understanding is that evolution tends to layer the new on top of the old and only rarely bypasses existing solutions to technical problems in favor of new approaches. So the idea that something done for a very long time by older portions of our brains - and done quite well - would be superseded by the nerdier upgrades seemed like heresy to me. Hawkins offers a compelling reason for nature to do this, though. The very machinery responsible for our ability to imagine taking action is the same machinery that actually implements action. In his view, we can either suppress the motor control "output" and ponder the consequences of our actions, or simply allow the action to happen. When you imagine taking a walk around your house, for example, you are making predictions about what you will see or otherwise experience, and those predictions are being fed back into other parts of your cortex that would actually do the perceiving of a real walk.

    It's particularly interesting that, as Hawkins explains, it's not like the horse must go before the cart in this predict-action versus take-action scheme. Predicting action, if allowed, will cause action to be taken. But on the other side, taking action causes prediction of action, too. Actually, it causes prediction of effects, not just action. With each step as you walk, your brain is busy predicting when your foot will hit the ground, how hard it will hit, what parts of your foot will hit first, and myriad other details about what's expected to happen.

    Here's where Hawkins' memory-prediction concept becomes truly astonishing and useful. In his view, we don't have or need super senses in order to have a seemingly supernatural awareness of the world. Just think of how well you can navigate your own house in complete darkness. Think it's because you have good senses? Try doing it in a house you're not familiar with. Do you even know where the bathroom is? You wake up in your house at night with a need, and your brain is busy, even before you're fully awake, making predictions about every aspect of your short trip. The floor is 2 feet down from the bed and has a carpeted covering. In a few steps, you'll hear a creak from a loose floorboard. You'll be taking a left turn there. When you stick your right hand out, you should soon feel the door jamb. You'll turn right and take about six steps. And so on.

    In fact, you don't even need to fully wake up to do this. It's not that you have a spare brain that takes care of nightly urges. It's that the lower levels of the hierarchy are busy making predictions about what you should experience going forward and taking action to fulfill those predictions. The upper levels only get alerted when the world does something unexpected; something that doesn't fit your predictions. The part of the carpet you just stepped on isn't soft and dry like you expected. That part of your brain expecting the soft, dry feel of carpet reports its confusion to the next higher level. Maybe it knows what to make of it. That higher area remembers that you have a dog and entertains different scenarios - predictions, really - that might result in a non-dry carpet.
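The escalation described above, in which lower levels quietly handle what they predicted and only surprises travel upward, can be caricatured in a few lines of Python. The assumptions are this example's, not the book's: each level is just a hand-written memory of what it expects to sense in a given context, learning is left out entirely, and the level names and sensations are hypothetical.

```python
class Level:
    """One level of a toy hierarchy: a memory of what it expects to sense where."""

    def __init__(self, name, expectations):
        self.name = name
        self.expectations = expectations   # context -> set of expected inputs

    def expects(self, context, item):
        return item in self.expectations.get(context, set())


def escalate(levels, sensations):
    """Pass each (context, item) up the hierarchy until some level predicted it.

    Returns only the events that escaped the bottom level, paired with the name
    of the level that accounted for them, or None if no level did.
    """
    reports = []
    for context, item in sensations:
        for depth, level in enumerate(levels):
            if level.expects(context, item):
                if depth > 0:               # the bottom level was surprised
                    reports.append((item, level.name))
                break
        else:                               # no level predicted it at all
            reports.append((item, None))
    return reports


habits = Level("habits", {
    "bedside": {"soft dry carpet"},
    "hall": {"creak"},
    "doorway": {"door jamb"},
})
reasoning = Level("reasoning", {
    "bedside": {"wet carpet"},   # there is a dog in the house, after all
})
night = [("bedside", "wet carpet"), ("hall", "creak"), ("doorway", "door jamb")]
print(escalate([habits, reasoning], night))   # [('wet carpet', 'reasoning')]
```

The creak and the door jamb never leave the bottom level; only the wet carpet reaches the level that knows about the dog.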

    The mechanist in me loves the elegant simplicity and seeming completeness of this concept of memory-based prediction. The skeptic in me, though, has a little trouble with the suggestion that we're able to predict everything. Surely we learn along the way. To his credit, Hawkins does clearly state that learning happens and even gives some thoughts on how learning occurs. My sense, though, is that the majority of his focus in On Intelligence is on prediction. I think Hawkins would agree that the learning part remains a bit more of an unknown. Still, this doesn't seem to detract from the value of the core of the concept of a memory-prediction framework.

    Actually, one of the more interesting ideas Hawkins puts forth is an explanation of the hippocampus. Anyone with some familiarity with Brain Anatomy 101 will recognize it from its popular association with memory formation. A significantly damaged hippocampus will leave a person unable to form new memories, and thus eternally caught in a moment in time before the damage and unable to function independently. As with other parts of the brain, such as the cerebellum and medulla, Hawkins says he tried for a long time to ignore the hippocampus as nonessential to intelligence. He didn't like the thought that the cortex, which is quite capable of learning and is where knowledge ends up anyway, should pass information to the hippocampus, only to have it come back to the cortex again. It seemed a pointless journey. Hawkins puts forth an idea he came across: that the hippocampus is not incidental to intelligence, but is actually the top level of the cortical hierarchy. The basic idea is that any stimulus that doesn't fit the predictions made by the various levels of the hierarchy bubbles its way up to ever higher levels until the raw data is simply dumped in the lap of the hippocampus. This structure is capable of quickly forming memories, presumably of the raw data. These memories don't last long, though. If the brain ruminates on these new patterns long enough, they will eventually be imprinted on the lower levels, which take longer to learn but retain memories much longer.

    In fact, Hawkins indicates that he believes this same downward push of patterns happens at all levels of the hierarchy. It's as though once a higher level is able to understand - make predictions about - an idea, it attempts to delegate responsibility for memorizing and making predictions about the information to lower levels. To this, I would add my view that repetition is probably key here. Learning to ride a bike, for example, involves enough repetition that the raw data, which at first streams into the hippocampus as if from a fire hose, eventually makes enough sense to the higher levels of your consciousness that they can tell you, in some crude manner, how to deal with it. It takes your full, high-level focus at first. But with time, lower levels figure out how to predict the details, freeing your "highest level" of consciousness to ponder other unexpected information. Stop riding a bike at that point and the knowledge may simply stay at that level. But keep riding and still lower levels will take over more of the details, freeing those slightly higher levels from having to remember much about the finer details and increasing your ability to react quickly and automatically to the many circumstances that may come along.
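The repetition idea above can be sketched as a toy in Python. The mechanics are assumptions of this example, not claims from the book: a fast store memorizes any new pattern on first exposure but lets it fade after a few time steps unless it is refreshed, and a slow store takes over a pattern only after it has been rehearsed a set number of times while still held in the fast store. The class name and the `lifetime` and `consolidate_after` parameters are hypothetical knobs, not measured quantities.

```python
class TwoSpeedMemory:
    """Toy fast/slow memory: a quick, short-lived store feeding a slow, durable one."""

    def __init__(self, lifetime=3, consolidate_after=3):
        self.lifetime = lifetime                    # steps a fast memory survives
        self.consolidate_after = consolidate_after  # rehearsals before handoff
        self.fast = {}        # pattern -> time step last seen
        self.rehearsals = {}  # pattern -> times seen while held in the fast store
        self.slow = set()     # patterns now handled by the lower levels
        self.clock = 0

    def experience(self, pattern):
        """Present one pattern; return which store handled it."""
        self.clock += 1
        # Fast memories that were not refreshed in time simply fade away.
        for p, seen in list(self.fast.items()):
            if self.clock - seen > self.lifetime:
                del self.fast[p]
                self.rehearsals.pop(p, None)
        if pattern in self.slow:
            return "slow"                 # already predictable lower down
        self.fast[pattern] = self.clock
        self.rehearsals[pattern] = self.rehearsals.get(pattern, 0) + 1
        if self.rehearsals[pattern] >= self.consolidate_after:
            self.slow.add(pattern)        # push the pattern down the hierarchy
            del self.fast[pattern]
            del self.rehearsals[pattern]
            return "consolidated"
        return "fast"


memory = TwoSpeedMemory(lifetime=3, consolidate_after=3)
print([memory.experience("balance") for _ in range(4)])
# ['fast', 'fast', 'consolidated', 'slow']
```

A pattern that is not repeated before it fades starts over from scratch, which is the toy's version of stopping practice too soon.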

    The hippocampus is very much a key player in learning in this model. Based on it, I would suggest that, if this relatively small part of your brain is where short-term memories are stored for a few seconds or sometimes minutes, then the hippocampus may be thought of as the seat of the "awareness" aspect of consciousness that we are very directly cognizant of. It's surely not like a theater, though, onto which every aspect of the things you imagine is projected. It seems more sensible to assume that the entire cortex-hippocampus hierarchy is the "theater". The hippocampus should only be aware of the parts of the current situation or thought that don't fit the predictions of the millions of lower-level nodes. And it's not as though the hippocampus "gets it". We don't want to introduce a classic "brain within the brain" argument. In this model, the hippocampus just records the information and expects lower levels to "get it", meaning to juggle the information around until the patterns match, or otherwise learn to make predictions based on the new patterns.

    I should point out that Hawkins makes a sincere attempt to explain how the cortex works at a lower level. He uses graphic abstractions, like the one illustrated here, meant to appeal to computer geeks like me. And he uses terminology familiar to neuroscientists for naming components and explaining the concepts. Honestly, though, if I read nothing but chapter 6, "How the Cortex Works", I would probably not be left with any sense that I had learned much, simply because I got somewhat lost and so the material got dry. In fairness, though, the earlier part of this chapter was a bit easier for me to digest and helped to more graphically explain things like the hierarchy and the pattern-invariant nature of the cortex. For the person unfamiliar with notions like the fovea and eye saccades, this chapter holds nice summary introductions. Perhaps the most enlightening part for me was the introduction of the concept of a "column" of layers of neurons. Each column has a hierarchy of its own, of a sort, and is heavily interconnected internally. A column would tend to deal with the same snippets of raw information coming from lower levels, but would have differing levels of abstraction of that information. The columns then form the nodes of the overall cortical hierarchy. In the context of traditional neural networks, I found the concept of a column similar in some ways to a complete artificial neural network with its various layers. In my experimentation, I assumed that a better design would use such nets in turn as building blocks in some sort of loose network or more rigid hierarchy of such "clusters" of nets, so Hawkins' explanation was familiar ground.

    After presenting his memory-prediction framework for intelligent thinking and indicating how the neocortex is the progenitor of this capability in humans, Hawkins moves on to talk more about the implications. He addresses quite a few common questions about intelligence. He describes creativity in terms of memory-prediction, for example. For the question "what is consciousness," in addition to pointing out how icky a subject this seems to be for neuroscientists and posing some thought experiments as a sort of common-sense philosophical background, Hawkins states that, "I believe consciousness is simply what it feels like to have a neocortex." I know it's a different context, but my own view is that consciousness is defined by a basic awareness of the world, so even simple bacteria can be thought of as being conscious on some level - and clearly there are different degrees of awareness, with humans having the most of all the organisms we know of. Still, I don't begrudge Hawkins for narrowly focusing on mammals because they have neocortices. Many people would not consider creatures that can't ponder the future or recognize themselves to be conscious, and that fits Hawkins' formulation.

    To summarize, On Intelligence is a book by Jeff Hawkins, with Sandra Blakeslee, that presents a new view of what the authors consider the key element of human intelligence: the neocortex. Hawkins claims that the neocortex is composed primarily of millions of repeating units, all responsible almost exclusively for remembering patterns and for making and testing predictions based on them. These repeating "columns" are organized into a hierarchy, with the lower levels representing the most concrete sensations and motor activity and the highest levels representing the most abstract and "stable" concepts. Hawkins uses the term "memory-prediction framework" to identify this concept and mechanism.

    While I do believe Hawkins sells traditional AI short, I give him credit for properly exposing the core weaknesses and limitations of many AI technologies. We should take this chastening as a warning not to be too bold in claiming that the kernels of cool ideas we come up with are the holy grails we've all been searching for. I think Hawkins should take this message to heart regarding his own good idea, too. I believe there's a lot of value in the memory-prediction framework, but it simply doesn't explain enough about intelligence to satisfy my own curiosity. For instance, the concept of pattern invariance is brilliant and crucial, but On Intelligence has virtually nothing to say about how it works in our brains.

    In the end, I would predict that Hawkins will be recognized for his solid contribution of a very valuable conceptual tool to the AI community. The terms he has coined or brought together from other sources will be used by me and surely others for their expressive power. And the attention he brings to largely ignored features like the temporal nature of information, making and testing predictions, and pattern invariance will become the means by which we measure the value of our own concepts and experiments. And, lest I be rude, I should point out that I'm sure his ideas will serve a similar purpose for the field of neuroscience. Perhaps his work will help to bring the two fields closer together, offering new opportunities for us to share and relate the things we know and learn, to the benefit of both fields.

    As a side note, I chose this time to buy the book in electronic form instead of as a hard- or soft-cover printed book. This is the first e-book I've bought. I wanted it partly because I can do text searches to find information, which is especially valuable for reviews like this. But since I have a PocketPC (sorry, Jeff) that I carry with me everywhere (I call it "my brain"), I thought I'd get more chances to read the book if it were always in my pocket. Besides, it was cheaper than the soft-cover, and the local bookstore will only sell the more expensive hard-cover version for the first few months after publication. Since the Microsoft Reader program that the electronic copy works with lets one share a book between a handheld and a desktop PC, I was able to go back and forth without paying for two copies. And the small screen wasn't much of an issue, as MS Reader allows you to zoom and pan in on illustrations. Man, this was a worthwhile experiment. If you have a handheld computer with a good screen, I'd strongly recommend you start buying books that are mostly text as e-books. There are probably other good readers out there, but I was quite happy with how well crafted MS Reader is. Besides, Microsoft will probably win the e-book standards war soon, so it seems a good choice of standard for now.


    Back to
    blog home
    Listen to an
    audio version
    Notify me of
    new entries
    Subscribe to a full
    RSS feed of this blog


    5/4/2005 - The portable, hand-held learning laboratory


    Poor researchers like me can't generally afford to put together sophisticated research projects. One of the interesting things about researching intelligence, though, is that we have at least one human research subject that's available for experiments 24 hours a day.

    If you'd like to learn more about the nature of learning in the human brain, there's a simple but interesting experiment you can run. Like most people, I'm right-handed. For about a year now, I've been trying to teach myself to brush my teeth with my left hand. I've gotten pretty good at it, but I'm still in awe of how poor my left hand's fine motor control is compared to my right's.

    Now, I'm sure people who understand handedness better than I do will say my left hand will probably never be as dexterous as my right. And, sure enough, I don't exercise my left hand as often as my right, so it'll probably never be as strong, which affects dexterity. Still, setting aside these confounding factors, it's been a very useful sort of experiment for me.

    In the beginning, I was incredibly clumsy, mostly just locking up my left hand and letting the arm do much of the work. Within a few days, I was starting to crudely parrot my right hand's motions. I'd give the toothbrush to my right hand and have it slowly go through its motions so I could study what it was doing, and then take it back to duplicate the motions with my left. Within a month or so, I think I was able to perform most of the basic motions without much thinking about it. By now, I've gotten to the point where, toothbrush in my left hand, I actually find myself asking which one is my right hand, simply because my left hand has gotten almost as good as my right at doing nearly everything for the half of my mouth that I assign to it. So now the main difference between my two hands, when it comes to brushing my teeth, seems to be simply one of strength. My left hand gets fairly tired when I try to do my whole mouth with it, whereas my right is fine with it all.

    I'd encourage anyone interested in the nature of learning to consider trying this same experiment. It provides the rare opportunity to be in the head of both the teacher and the student. Training a new skill for the first time is a very conscious effort, and such efforts are very accessible to introspection. Besides, this gives one a chance to run a long-term experiment that takes almost no extra time out of one's day. So, happy experimenting.



    5/11/2005 - Review of "Visual Intelligence"


    When I was in the store eyeing On Intelligence, I also noticed an interesting-looking book titled Visual Intelligence, by Donald D. Hoffman, that I was pretty sure I'd have to come back to. I finally bought it a few days ago. Owing to a bit of recent travel, I found I had a bunch of time to read it. I'm a slow reader, but my inner geek found this book so gripping that I finished the roughly 200-page book in two days. (I suppose if it had more words and fewer pretty pictures, it might have taken a few more days and been less gripping.)

    Given that I found Visual Intelligence a very cool book, I thought it worth writing a review. I'll begin by saying it's incredibly well written and that most of it should be easily accessible to the casual reader. It offers many cool insights into the curious nature of human vision and, by implication, all the other senses.

    My reading of On Intelligence and its implications is still churning about in my mind. In some ways I feel like a fool for not having included in my review a variety of observations I've had since writing it. Perhaps I was too hasty. Oh, well, I'll risk it again by beginning a review of Visual Intelligence not more than an hour after finishing it. Every point in time after reading a book brings its own ups and downs when it comes to the context needed to review it. Years after reading a book, I'm more likely to have a nicely nuanced view of it, but I'm also likely to have ascribed to it all sorts of claims, characters, and other ideas that were never there. Oh, well.

    I also mention On Intelligence here because I found Visual Intelligence an interesting companion read. Viewing VI through the lens of OI seems to fill in some gaps and perhaps answer some of the "why" questions that Hoffman brings out about the instructive optical illusions found throughout the book. Yet VI also seems to really challenge OI's simplistic reduction and view of the seemingly infinite flexibility of the neocortex in ways I'll highlight more below. I'll apologize to Mr. Hoffman for not reviewing Visual Intelligence completely on its own, but the OI connections and contrasts seem worthwhile.

    If it's not already obvious, Visual Intelligence, subtitled "How We Create What We See", tackles the question of how human vision works. Hoffman's style of writing is easygoing and flattering to the reader. On the surface, he is continually pointing out the amazing ability of your brain to construct mental representations of the world from incredibly sparse and even ambiguous information, in a surprisingly wide variety of ways that computers have yet to match. At every turn, though, he seeks to lay down clear, mechanistic rules, using visual examples you can play with and simple explanations of the hypotheses behind them. He could take the conventional approach of challenging your assumptions, but instead he uses this crafty, flattering method in the hope that you'll reach the same conclusions he and others have reached about observations that might otherwise be hard to believe if you couldn't see them for yourself. I give Hoffman credit for this technique; I'm not very good at it myself.

    I'm going to spoil the plot of this murder mystery by revealing the killer at the outset. Hoffman's basic assertion in Visual Intelligence is that the brain "constructs" its perception of the world, rather than simply observing it. Disappointingly, toward the end of VI he dabbles a bit in relativistic epistemological views of the world that seem mostly a waste of paper. You're better off assuming that Hoffman acknowledges that the world you perceive is the real world, and ignoring the classic philosophically skeptical divide he plants between what others might call "things as we perceive them" and "things as they really are". The thinly veiled "brain in a vat" material at the end of the book distracts from the clean, mechanistic view the rest of the book takes of vision and of the crafty methods researchers have been using for centuries to tease those mechanisms out of the brain's obscurity.

    The compelling metaphor Hoffman uses throughout Visual Intelligence is of you, the "creative genius". He makes, and continually adds support for, the very strong assertion that everything you perceive is really your own mind's creation. Most of the illustrations he includes are very simple. They're stripped of the enormous richness you expect from the natural world you see daily, not to save ink, but to boil down what's going on to its bare essentials. Following are some modified examples of illustrations found in VI:

    Some illustrations from Visual Intelligence featuring squares and cubes

    I don't want to spend too much time explaining what each of these is meant to illustrate, but let me offer an example to convey the flavor of what the book explains far better than I can. Following is a modified quote related to the "flat wheel" and "3D cube" illustration above:

    Throughout Visual Intelligence, Hoffman presents, explains, and accumulates a list of rules that he considers pretty solidly agreed upon by the community that's been seriously studying vision for the past few centuries. Without elaboration, let me list them here to summarize:

    Much as I would love to explain all the out-of-context terms like "filters", "salient boundaries", and "subjective figures", that wouldn't be fair to Hoffman and would make this a very long review indeed. Suffice it to say that these rules cover a pretty wide swath of topics, including edges, object segmentation, light and color, translucency, and even motion. This last one (motion) is interesting because it stands in contrast to Jeff Hawkins' assertion, in On Intelligence, that most AI researchers don't account for time in their models and thinking. He may be right to some degree, but Hoffman cites research into the perception of motion going back to the nineteenth century, which shows that time is alive and well to this day in at least some quarters.

    I'd like to stay with the comparison of Hawkins' assertions in On Intelligence with those in Visual Intelligence for one more point. It's very clear that Hoffman's view is that virtually all humans are endowed with the same bag of tricks that underlie the above rules, along with many more waiting to be discovered. Hawkins, by contrast, seems to strongly assert that the brain (the neocortex, more precisely) doesn't come prepackaged with such tricks. Instead, it starts out pretty much a blank slate, with each part looking for predictable patterns in its input. We are able to point to roughly the same places on the brain where certain functions, like speech or your thumb's sense of touch, can be found in most people. Hawkins would attribute this to how "wiring" from outside the cortex, including that from the various sense organs, is hooked into the cortex in the same basic ways for most people. Beyond those shared hook-ups, everything else, to Hawkins, is learned. In stark contrast, Hoffman shows no timidity in claiming that your brain will most likely see the optical illusions he demonstrates in the book exactly as he does, clearly implying that we use the same mechanisms of perception and so must be predisposed to having them. One would be hard pressed to conclude from Hoffman's esoteric examples that we all just happen upon the same incredible mechanisms by accident. So in this context, Hawkins and Hoffman stand at opposite poles on the role of learning versus inborn skills, at least when it comes to how we directly perceive the phenomenal world.

    But don't be too quick to conclude that they are at opposite poles on all things. I was delighted to apply Hawkins' memory-prediction framework to each of the rules and visual puzzles Hoffman put forth, and I found a great meshing between the two. Most of the rules listed above are really constraints on the interpretation of information. Consider rule 1, for example: "Always interpret a straight line in an image as a straight line in 3D". Looking at a straight line on a printed page, you could say it actually represents a circle that's turned 90° into the page, so all you're seeing is its edge. Or you could interpret it as two separate lines that happen to overlap from your current perspective. Yet Hoffman asserts that your brain will most likely interpret what looks like a straight line as actually being a straight line. In fact, under the right circumstances, you should even be able to interpret an interrupted straight line as a continuous one that's obscured by some other object, even though there's no direct evidence to support it. Consider the following illustration of this:

    Broken lines partly obscured by an opaque object.

    Ever since I read about the memory-prediction framework, I've been thinking in these terms about how to deal with "implied" information like the hidden portions of the lines in the figure above. Hoffman would call this a "subjective" line: a typical man-made device would not make the leap your brain does in concluding that the lines passing "behind" the obstacle are actually continuous, so that conclusion must be your own subjective interpretation of the information given, along with the "prediction" that the missing piece really is back there somewhere. The classic example I keep pondering is how, in my bubble vision experiments, my bubbles "leak" out of one well-defined surface into others, even at times when they find a small but sufficiently blurry edge. It frustrates me, of course, because I "know" the edge is there, even though it's blurry. The memory-prediction model and the subjective-edge concept seem to go hand in hand in explaining why my own brain's "bubbles" don't suffer the "leak" problem that the bubbles I created in code do.
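    To make the "subjective line" idea a bit more concrete, here's a minimal Python sketch of one way software could decide that two visible line fragments on either side of an occluder belong to the same straight line, in the spirit of Hoffman's rule 1. It is not Hoffman's procedure or my bubble code; the segment representation and both tolerances (`angle_tol_deg`, `offset_tol`) are hypothetical choices.

```python
import math

# Hypothetical representation: a segment is ((x1, y1), (x2, y2)) in pixels.

def direction(seg):
    """Unit vector along a segment."""
    (x1, y1), (x2, y2) = seg
    length = math.hypot(x2 - x1, y2 - y1)
    return ((x2 - x1) / length, (y2 - y1) / length)

def point_line_distance(p, seg):
    """Perpendicular distance from point p to the infinite line through seg."""
    (x1, y1), (x2, y2) = seg
    px, py = p
    length = math.hypot(x2 - x1, y2 - y1)
    # |(B - A) x (A - P)| / |B - A|
    return abs((x2 - x1) * (y1 - py) - (x1 - px) * (y2 - y1)) / length

def completes_behind_occluder(seg_a, seg_b, angle_tol_deg=3.0, offset_tol=2.0):
    """True if seg_b plausibly continues seg_a as one straight line."""
    ax, ay = direction(seg_a)
    bx, by = direction(seg_b)
    # Segments may be listed in either direction, so compare |cos(angle)|.
    cos_angle = abs(ax * bx + ay * by)
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    if angle > angle_tol_deg:
        return False
    # Both endpoints of seg_b must sit close to the line through seg_a.
    return all(point_line_distance(p, seg_a) <= offset_tol for p in seg_b)
```

    The tolerances play the role of the "right circumstances": loosen them enough and the sketch will happily join lines that merely happen to point the same way.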

    One of the topics Visual Intelligence addresses that was eye-opening for me is "dividing shapes". I've long taken for granted that the brain subdivides one's visual field into smaller and smaller parts and uses these basic parts to help describe and identify what one sees. I like to use the word "segmentation" for this concept. I found Hoffman's explanation of how he and many others believe the brain does this so simple and mechanistic that it's very easy to imagine it's true. If segmentation occurs, the natural question is, "by what rules does the brain subdivide the objects it sees into segments?" That is, where does it draw the lines between neighboring segments? What surprised me is that Hoffman is again able to deal with such a complicated question by reference to even very simple line drawings like the following:

    Segmentation of an object up into smaller parts.

    To Hoffman's thinking, the places you see dotted lines in the figure above are places where, if all you saw was the curved rim, you might likely divide up the solid object implied by the rim into smaller pieces. These lines all at least begin at one sharply concave dent in the rim. And while you will favor connecting such dents, you'll also favor shortest-line divisions from one dent to the other side of the assumed object over seeking out opposing dents that are farther away or cut less "deeply". What's usually left, then, are segments that are composed entirely of convex (outward) curves and corners and no prominent dents. Put another way, I'd say what's left can be approximated using nothing more than deformed circles. The result in the lower illustration above is still strikingly similar to the object it approximates above.
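    Here's a rough Python sketch of the dent-finding half of this idea, assuming the rim is given as a polygon listed counter-clockwise (with y pointing up). It flags concave vertices with a cross-product test, then pairs each dent with the nearest non-adjacent dent, keeping the cut only if its midpoint lies inside the shape. This is a crude, hypothetical stand-in for the rules described above; in particular, it ignores cuts that run from a dent to a dent-free stretch of the opposite side.

```python
import math

def concave_vertices(poly):
    """Indices of concave (reflex) vertices of a counter-clockwise polygon."""
    n = len(poly)
    dents = []
    for i in range(n):
        (px, py), (cx, cy), (nx, ny) = poly[i - 1], poly[i], poly[(i + 1) % n]
        # z-component of (cur - prev) x (next - cur); a right turn is negative.
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        if cross < 0:
            dents.append(i)
    return dents

def inside(pt, poly):
    """Even-odd ray-casting point-in-polygon test."""
    x, y = pt
    result = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                result = not result
    return result

def dent_cuts(poly):
    """Pair each dent with its nearest non-adjacent dent via an interior cut."""
    n = len(poly)
    dents = concave_vertices(poly)
    cuts = set()
    for i in dents:
        # Skip the dent itself and its outline neighbours (those "cuts" are rim edges).
        candidates = [j for j in dents if (j - i) % n not in (0, 1, n - 1)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: math.dist(poly[i], poly[k]))
        mid = ((poly[i][0] + poly[j][0]) / 2, (poly[i][1] + poly[j][1]) / 2)
        if inside(mid, poly):
            cuts.add((min(i, j), max(i, j)))
    return sorted(cuts)
```

    On a dumbbell-shaped outline (two squares joined by a narrower neck), the two cuts it returns fall across the two ends of the neck, leaving three convex pieces.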

    The sort of descriptive power that can come from simplified models composed of deformed circles and the like is exactly what I was trying to get at with the bubble concept. To emphasize the importance of this sort of segmentation in the way we think and speak about the world, Hoffman considers the human face. Here's a brief excerpt:

    I could write at much greater length about all the interesting things Hoffman talks about in Visual Intelligence, but you're much better off just reading it for yourself. I'd recommend it to anyone interested in machine vision. The idea that you will perceive all sorts of phantoms in the images you see may seem irrelevant or even discouraging to the programmer, but I believe those phantoms serve to tease out valuable rules like the ones listed above, which Hoffman explains superbly.

    Visual Intelligence: How We Create What We See, by Donald D. Hoffman, was first copyrighted in 1998 and is currently published by W. W. Norton and Company, Inc. I bought my soft-cover copy from a local Barnes and Noble for about $19.00.



    6/9/2005 - Machine vision: blob growth


    Recently, I've been spending a lot of my free time thinking about machine vision. I've been running a variety of simple experiments into different techniques and trying somehow to formulate a cohesive theory and tool set for creating a general purpose vision system. I feel bad that I haven't been blogging lately, though. I guess I've just assumed I need something significant to blog about so it's not a waste of people's time.

    Ironically, I've been keeping a small, ad hoc journal of ideas about the subject, and I figured it might be worth sharing. The next several entries are simply extracts from it. They're far less formal than most of my already informal blog entries, and I apologize for not giving them the context I usually try to provide when I blog. So, without further ado, here is the first entry.

    I keep trying to figure out a way to isolate regions. My bubble growth algorithm isn't bad, but it isn't great, either. There's a nasty problem with spill-over where edges are poorly defined.

    I'm reminded of my reading of Visual Intelligence. Hoffman addresses the concept of what I like to call "segmentation" of figures. A complex silhouette of a human, for example, might be segmented into a head, arms, legs, and a torso. The key to segmentation, in Hoffman's view, is finding the sharply concave portions of the outline and starting cuts through the silhouette at them. To my thinking, the result tends to be smaller segments that generally no longer have major concave corners or curves.

    I've been trying to think of it from the other side, though. What if one took the bubble concept and added a certain "desire" of a bubble to keep from having small bulges? Perhaps just avoiding sharp concave corners would provide an interesting result. A bubble that begins growing in the center of the head in a silhouette might stop growing as it reaches the neck because further growth would create sharp concavities.
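    Here's a minimal Python sketch of one way to give a bubble that kind of reluctance, assuming a grayscale image stored as a list of rows. Besides the usual color tolerance, a pixel may join the bubble only if at least `min_neighbors` of its eight neighbors are already inside. That hypothetical rule is a cheap proxy for avoiding sharp concavities rather than a true curvature test: the bubble can still fill a room, but it can't squeeze through a one-pixel gap in an edge and balloon out the other side.

```python
def grow_bubble(img, seed, tol=10, min_neighbors=3):
    """Grow a region from a 3x3 seed block centred on seed = (row, col).

    A pixel joins only if its value is within `tol` of the seed value AND at
    least `min_neighbors` of its 8 neighbours are already in the bubble.
    """
    h, w = len(img), len(img[0])
    sy, sx = seed
    ref = img[sy][sx]

    def similar(y, x):
        return abs(img[y][x] - ref) <= tol

    region = {(y, x)
              for y in range(max(0, sy - 1), min(h, sy + 2))
              for x in range(max(0, sx - 1), min(w, sx + 2))
              if similar(y, x)}
    changed = True
    while changed:  # keep sweeping until the bubble stops growing
        changed = False
        for y in range(h):
            for x in range(w):
                if (y, x) in region or not similar(y, x):
                    continue
                # Count 8-connected neighbours already inside the bubble.
                inside = sum((y + dy, x + dx) in region
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                             if (dy, dx) != (0, 0))
                if inside >= min_neighbors:
                    region.add((y, x))
                    changed = True
    return region
```

    With `min_neighbors=1` this reduces to ordinary flood-fill growth, which is exactly the version that leaks through a gap; raising it trades a little coverage in tight corners for resistance to spill-over.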



    6/9/2005 - Machine vision: overlooking shadow and light splotches on surfaces


    Following is another in the aforementioned series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    Shadows and light splotches that fall on even perfectly smooth surfaces really trip up systems designed to detect objects by finding contiguous surfaces. We don't seem to be fooled by such issues very often.

    We are fooled when there are ambiguities in what we see. Perhaps understanding what makes one situation ambiguous and another not will help isolate the difference in a form that can be codified.

    In a picture I took looking down a tree-lined sidewalk, I found a great example of the issue. Shadows of trees fall on the sidewalk, creating a fairly smooth, two-tone division between shadowed and unshadowed portions. Yet I see a continuous sidewalk.
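    One well-known trick that helps with exactly this case is to compare colors by chromaticity, meaning each channel's share of the total, rather than by raw brightness. Here's a minimal Python sketch, assuming RGB triples; the tolerance is a hypothetical choice. Scaling a color's brightness, as a shadow roughly does, leaves its chromaticity unchanged, so sunlit and shadowed sidewalk compare as one surface. Real shadows are lit by bluish skylight and shift color somewhat, so this narrows the problem rather than solving it.

```python
def chromaticity(rgb):
    """Normalised (r, g) chromaticity; uniform brightness scaling cancels out."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (1 / 3, 1 / 3)  # pure black carries no colour information
    return (r / total, g / total)

def same_surface(rgb_a, rgb_b, tol=0.02):
    """Hypothetical test: same surface if chromaticities match within tol,
    even when one pixel is much darker (e.g. in shadow)."""
    (ra, ga), (rb, gb) = chromaticity(rgb_a), chromaticity(rgb_b)
    return abs(ra - rb) <= tol and abs(ga - gb) <= tol
```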



    6/9/2005 - Machine vision: cost-effective action


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    One thing that seems to dog many MV techniques is how slow or otherwise resource-hungry they are. I'm realizing that a must-have is a set of basic vision tools that allow trading time for effectiveness. For example, given a whole image, the agent should be able to focus on a small portion of it, like your own fovea does, instead of trying to analyze the entire image. The agent should also be able to choose a lower-quality image in order to reduce processing time.

    Ideally, an agent would learn to estimate how much time each operation will take, and so be able to choose which techniques to use and how intently to apply them based on how well they serve various goals. If, for example, the goal is to track the movement of one or more objects, a full-image, low-res approach might do. Studying a stationary object in detail, by contrast, might call for a small-portion, high-res approach.
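    Here's a minimal Python sketch of the two cheap views described above, assuming a grayscale image stored as a list of rows: a fovea-like crop and a block-averaged low-res copy, plus a crude cost proxy (the number of pixels a later step must touch). The `choose_view` policy and its fixed factor of 4 are hypothetical placeholders; a learning agent would replace them with measured or learned cost estimates.

```python
def crop(img, top, left, height, width):
    """'Fovea': keep only a small window of the full image."""
    return [row[left:left + width] for row in img[top:top + height]]

def downsample(img, factor):
    """Low-res view: average each factor x factor block (ragged edges dropped)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]

def pixel_cost(img):
    """Crude cost proxy: pixels a later operation must process."""
    return len(img) * len(img[0]) if img else 0

def choose_view(img, goal):
    """Hypothetical policy: whole scene at low res for tracking,
    a central high-res window for studying a still object."""
    if goal == "track":
        return downsample(img, 4)
    h, w = len(img), len(img[0])
    return crop(img, h // 2 - h // 8, w // 2 - w // 8, h // 4, w // 4)
```

    On a 64x64 frame, both views cost 256 pixels instead of 4,096; what differs is what each one keeps.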



    6/9/2005 - Machine vision: Hierarchy of regions


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    One thing that most of us in MV don't want to admit is that we set arbitrary thresholds for distinguishing where one thing ends and another begins. I don't think our own vision works that way, per se. I'd like to see edge- or color-blob-finding techniques that use varying thresholds. One use would be finding large regions with high thresholds, then using ever narrower thresholds to find the sub-regions within the broader ones.
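    Here's a minimal Python sketch of that coarse-to-fine idea, assuming a grayscale image and 4-connected regions whose neighboring pixels differ by no more than the current threshold. Each region found with a loose threshold is re-split with the next, tighter one, giving a tree of regions. The threshold values are hypothetical, and neighbor-difference linking is only one of several ways to define a region.

```python
def link_regions(img, pixels, threshold):
    """Split `pixels` into 4-connected groups whose neighbours differ by <= threshold."""
    remaining = set(pixels)
    groups = []
    while remaining:
        start = remaining.pop()
        group, stack = {start}, [start]
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                # Only pixels of the parent region are eligible (membership checked first).
                if (ny, nx) in remaining and abs(img[ny][nx] - img[y][x]) <= threshold:
                    remaining.remove((ny, nx))
                    group.add((ny, nx))
                    stack.append((ny, nx))
        groups.append(group)
    return groups

def region_tree(img, pixels, thresholds):
    """Split with thresholds[0], then re-split each part with the remaining thresholds."""
    if not thresholds:
        return []
    return [(group, region_tree(img, group, thresholds[1:]))
            for group in link_regions(img, pixels, thresholds[0])]
```

    Calling `region_tree(img, all_pixels, [60, 20, 5])`, for example, would yield broad regions first and progressively finer sub-regions beneath each one.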

    In a similar vein, I'm considering using low-res images to find homogeneous-color blobs in images. Rich textures can disappear when the resolution is low, leaving just the overall color. A field of grass, for example, becomes a solid sheet of green. Once the field is isolated, it can be scrutinized in finer detail to see if there's something small of interest in it.
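    Here's a minimal Python sketch of that idea, assuming a grayscale image whose dominant region (the "field") covers most of the frame. Block averaging at low resolution flattens the texture, the median low-res value stands in for the field's overall color, and any low-res cell that disagrees is mapped back to a full-resolution window for closer scrutiny. The block size, tolerance, and median heuristic are all hypothetical choices.

```python
import statistics

def block_means(img, factor):
    """Average factor x factor blocks; fine texture washes out to its mean."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]

def detail_windows(img, factor=2, tol=5.0):
    """Full-res (top, left, height, width) windows where the low-res field looks odd."""
    low = block_means(img, factor)
    # Assume the most common low-res value belongs to the field itself.
    field = statistics.median(v for row in low for v in row)
    return [(y * factor, x * factor, factor, factor)
            for y, row in enumerate(low)
            for x, v in enumerate(row)
            if abs(v - field) > tol]
```

    A checkerboard "grass" texture alternating between 90 and 110 averages to a flat 100 at half resolution, so only a cell hiding something different gets flagged for a full-resolution look.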



    6/9/2005 - Machine vision: 2D collages


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    I've been nursing the idea that it's not necessary to have a detailed sense of how far away things in an image are. In some basic contexts, it's probably sufficient just to know that one thing is in front of another and not care about absolute distances. It seems some MV researchers have gone ape over measuring exactly how far away an apple on a table is using lasers, stereo displacement, and all sorts of tricks. Maybe just knowing how big an apple typically is is good enough for telling how far away it is.
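    The apple idea boils down to the pinhole-camera relation: an object of real size H that spans h pixels in an image taken with a focal length of f pixels is at a distance of roughly Z = f * H / h. Here's a minimal Python sketch; the focal length and the 8 cm "typical apple" are assumed values for illustration, and the estimate is only as good as the assumption that this particular apple is typical.

```python
# Hypothetical typical size in metres (an assumption, not a measurement).
TYPICAL_SIZE_M = {"apple": 0.08}

def distance_from_size(focal_px, real_size_m, image_size_px):
    """Pinhole-camera distance estimate: Z = f * H / h."""
    if image_size_px <= 0:
        raise ValueError("object must span at least one pixel")
    return focal_px * real_size_m / image_size_px

def distance_of(kind, focal_px, image_size_px):
    """Distance to an object of a known kind, using its assumed typical size."""
    return distance_from_size(focal_px, TYPICAL_SIZE_M[kind], image_size_px)
```

    With an assumed focal length of 800 pixels, an apple spanning 40 pixels comes out at about 1.6 m, and one spanning half as many pixels at about twice that distance.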

    When I think about 3D vision in this context, I have been likening the visible world to a collage of 2D images. Take the scene seen by a stationary camera looking at a road as cars go by. One could take the unchanging background as one image. A car moving by would be the only object of interest. What's interesting is that the image of the car, from snapshot to snapshot, doesn't change much. It's as though one just took the previous image of the car and stretched and warped it a little in order to get the current image of the car. That "smooth morphing" idea is at the heart of this 2D collage analogy.

    In the car example, it should be fairly easy to use the conventional technique of computing pixel differences between a before and an after image to isolate the car from the background. I'm not sure yet how to deal with the morphing. It seems fair, though, to assume that the car doesn't just disappear unless it's heading out of the scene. Instead, it should suffice to take the "before car", place it in the "after car" space, and then scale it to fit the blob. Then comes a comparison step to see how the two car images differ. Perhaps key points (edges or corners) can be found and their positions corresponded.
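    Here's a minimal Python sketch of the isolate-then-rescale step, assuming grayscale frames stored as lists of rows and a separately stored image of the unchanging background. It thresholds the difference from the background, takes the blob's bounding box, scales the "before car" to that box with nearest-neighbor sampling, and scores how much the two car images differ. The threshold and the mean-absolute-difference score are hypothetical choices, and the keypoint correspondence step is not attempted.

```python
def foreground_mask(background, frame, threshold=30):
    """True wherever the frame differs noticeably from the static background."""
    return [[abs(f - b) > threshold for b, f in zip(rb, rf)]
            for rb, rf in zip(background, frame)]

def bounding_box(mask):
    """Inclusive (top, left, bottom, right) of all True pixels, or None."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return min(ys), min(xs), max(ys), max(xs)

def resize_nearest(patch, height, width):
    """Stretch or shrink a patch with nearest-neighbour sampling."""
    ph, pw = len(patch), len(patch[0])
    return [[patch[y * ph // height][x * pw // width] for x in range(width)]
            for y in range(height)]

def mean_abs_diff(a, b):
    """Average per-pixel difference between two equally sized patches."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / sum(len(r) for r in a)

def match_previous(before_car, background, frame):
    """Scale the 'before car' into the current blob and score the mismatch."""
    box = bounding_box(foreground_mask(background, frame))
    if box is None:
        return None  # nothing in view differs from the background
    top, left, bottom, right = box
    after_car = [row[left:right + 1] for row in frame[top:bottom + 1]]
    stretched = resize_nearest(before_car, len(after_car), len(after_car[0]))
    return box, mean_abs_diff(stretched, after_car)
```

    A low score suggests the car has merely grown or shrunk as it moved; a high one would be the cue to look for corresponding edges or corners instead.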



    6/10/2005 - Machine vision: layer-based models


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    It's challenging for MV software to figure out, when looking at a complex scene, how to segment it into distinct objects. The main reason is that there doesn't seem to be anything intrinsic in an image to suggest boundaries among objects.

    Perhaps expectations can play into it. I was experimenting with a simple sort of expectation system in which the video camera gazes at a static scene. In time, the output image dissolves into black. Only when an object passes into the field of view does it break out from the black. The moving parts stand out. If they stand still for a while, they too fade to black, indicating that they are now part of the static scenery. The mechanism is pretty simple. There's an "ambient" image that is built up over time: each pixel is constantly scanned, and an expectation of what its color should be is built. A simple comparison of the current scene's image to the ambient image then yields non-zero pixel differences only where a pixel's color has suddenly changed, typically because an object is moving through.
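    One simple way to build such an ambient image, though not necessarily the one my experiment used, is an exponential moving average per pixel. Here's a minimal Python sketch assuming grayscale frames stored as lists of rows; the `rate` parameter (how fast new things fade into the background) is a hypothetical choice.

```python
def update_ambient(ambient, frame, rate=0.1):
    """Nudge each ambient pixel a fraction `rate` of the way toward the current frame."""
    return [[a + rate * (f - a) for a, f in zip(ra, rf)]
            for ra, rf in zip(ambient, frame)]

def novelty(ambient, frame):
    """Absolute difference from expectation; near zero (black) for familiar pixels."""
    return [[abs(f - a) for a, f in zip(ra, rf)]
            for ra, rf in zip(ambient, frame)]
```

    Because the average keeps adapting, anything that stops moving eventually fades to black, which matches the behavior described above.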

    That's a cool experiment, but not useful for much. Perhaps it could be used to help isolate objects long enough to build simple models of them. Add a little sophistication to the above: instead of constantly morphing an ambient image over time, the agent pauses for a few moments initially to determine that the entire scene is static, then takes a snapshot, perhaps averaged over two or three frames to cancel out typical noise. Henceforth, so long as the agent knows it's looking at the same scene, it would cancel the scene out using the snapshot (the "model") to see if anything new is there. A person might sit down in a chair in the scene for a few minutes, but he would not disappear from the scene, even though he's stationary.
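    Here's a minimal Python sketch of the snapshot variant, again assuming grayscale frames as lists of rows. The model is the per-pixel average of a few frames captured while the scene was judged static, and it is never updated afterward, so a stationary newcomer keeps standing out. The difference threshold is a hypothetical choice.

```python
def snapshot_model(frames):
    """Average a few frames of the static scene to cancel typical noise."""
    n = len(frames)
    # zip(*frames) walks matching rows; zip(*rows) walks matching pixels.
    return [[sum(px) / n for px in zip(*rows)] for rows in zip(*frames)]

def foreground(model, frame, threshold=25):
    """True wherever the frame departs from the frozen model."""
    return [[abs(f - m) > threshold for m, f in zip(rm, rf)]
            for rm, rf in zip(model, frame)]
```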



    6/10/2005 - Machine vision: tilting my head


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    I observed something very interesting today. When I look at a fixed position in a relatively static scene and tilt my head slowly left or right - rotate it, essentially - something unexpected happens. The scene seems to "click" into position at a rate of perhaps twice a second. The effect is similar to watching a poster rotate with a strobe light flashing every half-second, minus the blackness. And, funny enough, it feels as though my eyeballs are rotating and clicking with each step.

    I thought maybe this had something to do with the fact that I have two eyes, so I closed one eye and repeated the experiment. Same result.

    I tried this because I wanted to know how our eyes deal with changes in rotation. I was thinking about how to get software to deal with a change in point of view. When your eye saccades around a scene, your visual system somehow orients itself to the new point of view almost instantly. It occurred to me that maybe the brain plans the saccade and predicts how far the scene will "shift". A computer should be able to do this, too. The hard part is predicting how far a camera's saccade will shift the scene. With a "soft fovea" inside a fixed view, this is easy: the scene shifts by exactly as much as the fovea moves.
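    Assuming a "soft fovea" means a movable crop window inside a fixed camera image (an interpretation for illustration, not a definition given in this entry), predicting a saccade's effect becomes trivial. In this sketch with hypothetical helpers `fovea` and `predict_after_saccade`, moving the window right by `dx` slides its contents left by `dx`, and only the newly exposed columns are unknown.

```python
def fovea(image, top, left, size):
    """Crop a size x size window (the soft fovea) out of a fixed camera image."""
    return [row[left:left + size] for row in image[top:top + size]]


def predict_after_saccade(window, dx):
    """Predict the window after moving the fovea dx >= 0 columns to the right.

    Everything already visible slides dx columns left; the dx newly exposed
    columns on the right are unknown, marked None.
    """
    return [row[dx:] + [None] * dx for row in window]


if __name__ == "__main__":
    image = [[10 * r + c for c in range(6)] for r in range(4)]
    before = fovea(image, 0, 0, 3)
    predicted = predict_after_saccade(before, 1)
    actual = fovea(image, 0, 1, 3)
    print(predicted[0], actual[0])  # [1, 2, None] [1, 2, 3]
```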

    But it seems the tilting-head case throws the brain for a loop. I believe what's happening is that the lower-level visual processor doesn't know how to deal with the whole scene rotating and so calls for a "reset" of the image, as though you had blinked and, upon opening your eyes, found yourself in an entirely new scene.

    I estimate it takes a little less than half a second to deal with the new orientation. It would be interesting to experiment with the brain's ability to learn to deal continuously with such rotations. I bet it would be like switching from contacts to glasses or vice versa. When I switch, the world at first appears strangely bouncy as I move about. Within a few minutes, the bounciness goes away. I assume this is because my brain learns to make predictions about how the scene I see will respond to my movements.



    6/10/2005 - Machine vision: motion tracking


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    I was thinking earlier about something related to the head-tilting issue. As I was walking around, I tracked stationary points in space. While I perceive a stable view of the stationary point, I found that my eyes actually make very rapid, very subtle saccades to keep the target in the center of my fovea. That is, my gaze is not stable.

    My gaze does appear to be predictive, though. It seems that as I move, my eyes come to predict where the stationary point will be in the next moment and keep moving to keep up. It's a little like shooting skeet. You see the clay pigeon emerge from the launcher and continuously adjust your muscles to keep it in your gun's sights. You could close your eyes and keep moving the gun along the predicted trajectory, but as time goes by, the gun would drift farther and farther off target.
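    The skeet analogy maps naturally onto a predict-then-correct tracker. The sketch below swaps in a standard alpha-beta filter (a textbook technique, not something this entry proposes) for a 1-D position: each step it predicts with its current velocity estimate and, when an observation is available, corrects toward it. Passing `None` for an observation plays the role of closing your eyes. The gains and the decelerating test trajectory are arbitrary assumptions.

```python
def alpha_beta_track(observations, alpha=0.5, beta=0.4):
    """Track a 1-D position with an alpha-beta filter (time step of 1).

    observations: measured positions, or None where nothing is seen.
    Returns the estimated position after each step.
    """
    x, v = observations[0], 0.0
    estimates = [x]
    for z in observations[1:]:
        x = x + v  # predict: keep moving along the expected trajectory
        if z is not None:  # correct: nudge position and velocity toward what is seen
            residual = z - x
            x += alpha * residual
            v += beta * residual
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    truth = [10 * t - 0.2 * t * t for t in range(30)]  # a target that slows down
    eyes_open = alpha_beta_track(truth)
    eyes_closed = alpha_beta_track(truth[:20] + [None] * 10)
    print(abs(eyes_open[-1] - truth[-1]))    # small: corrections keep it on target
    print(abs(eyes_closed[-1] - truth[-1]))  # large: pure prediction drifts off
```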

    As a side note, this may help explain why my eyes sometimes get tired when I rock in my seat while working at my computer. The screen's position relative to my eyes is constantly changing. Surely my eyes have to work harder to keep focused on the screen's contents.



    6/10/2005 - Machine vision: pixel morphing


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    I'm entertaining the idea that our vision works using a sort of "pixel morphing" technique. To illustrate what I mean in the context of a computer program, imagine a black scene with a small white dot in it. We'll call this dot the "target". With each frame, the target moves a little, smoothly tracing a square over the course of, say, forty frames. That means the target is on each of the four edges for ten time steps.

    The target starts at the top left corner and travels rightward. The agent watching this should be able to infer that the dot seen in the first frame is the same as in the second frame, even though it has moved, say, 50 pixels away. Let's take this "magic" step as a given. The agent hence infers that the target is moving at a rate of 50 pixels per step. In the third frame, it expects the target to be 50 pixels further to the right and looks for it there.

    Eventually, the target reaches the right edge of the square and starts traversing downward along that edge. Our agent is expecting the target to be 50 pixels to the right in the next step and so looks for it there. It doesn't find it. Using an assumption that things don't usually just appear and disappear from view, the agent looks around for the target until it finds it. It now has a new estimate of where it will be in the next frame: 50 pixels below its position in the current frame.
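    The expect-then-search loop for the dot on the square path might look like this. It is a toy under stated assumptions: each frame is reduced to the set of white-pixel coordinates, there is only one dot, and the "look around" step is simply a search for the white pixel nearest the last known position. That crude search also stands in for the "magic" first-frame match, which works here only because the dot is the only white thing in view. All names are hypothetical.

```python
def square_path(step=50, per_edge=10):
    """(x, y) positions of a dot tracing a square: right, down, left, up."""
    x, y, path = 0, 0, []
    for dx, dy in [(step, 0), (0, step), (-step, 0), (0, -step)]:
        for _ in range(per_edge):
            path.append((x, y))
            x, y = x + dx, y + dy
    return path


def track(frames):
    """Follow one dot through frames, each a set of white-pixel coordinates.

    Returns how many frames broke the agent's expectation and forced a search.
    """
    pos = next(iter(frames[0]))
    vel = None
    searches = 0
    for frame in frames[1:]:
        expected = (pos[0] + vel[0], pos[1] + vel[1]) if vel is not None else None
        if expected in frame:
            new = expected
        else:
            # Things don't just vanish: take the white pixel nearest the old position.
            searches += 1
            new = min(frame, key=lambda p: (p[0] - pos[0]) ** 2 + (p[1] - pos[1]) ** 2)
        vel = (new[0] - pos[0], new[1] - pos[1])  # new estimate of motion per frame
        pos = new
    return searches


if __name__ == "__main__":
    frames = [{p} for p in square_path()]
    print(len(frames), track(frames))  # 40 4
```

    The four searches are the initial match plus one at each of the three corners where the dot turns; along the straight runs the expectation alone finds it.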

    Now, since the target is the only thing breaking up the black backdrop, the situation is ambiguous. Is the target moving, or is the whole scene moving, as might happen if a robot were falling over? We'll prefer to assume the entire scene is moving because there's nothing to suggest otherwise. So now let's draw a solid brown square around the invisible square the target traverses. The result looks like a white ball moving around inside a brown box. Starting from the first frame again, we magically notice the target has moved 50 pixels to the right in the second frame. The brown square has not moved. We could interpret this as the ball moving in a stationary scene, or as the whole scene moving while the brown box moves so as to perfectly offset the scene's motion. This latter interpretation seems absurd, so we conclude the target is what is moving. Incidentally, this helps explain why one prefers to think of the world as moving while the train he is on is "stationary". The train provides the main frame of reference in the scene, unless one presses his face against the window and so sees only the outside scene.

    Now to explain the magic step. The target's position has changed from frame N to frame N + 1: it's now 50 pixels to the right. How can we programmatically infer that these two things are the same? For the second frame, we don't have any way to predict that the target will be 50 pixels to the right of its original position. Instead, the agent should assume that objects don't just disappear. Seeing that the target is missing, it should go looking for it. One approach might be to note that there's now a "new" blob in the scene that wasn't there before. The two are roughly the same size and color, so it seems reasonable to assume they are the same and to go from there. It becomes a collaboration between the interpretations of two separate frames.
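    The "new blob that wasn't there before" idea can be made concrete for flat-colored toy frames. This sketch assumes each frame is a list of strings with one color character per pixel and `.` as the black backdrop; it finds 4-connected blobs, treats blobs whose exact pixel sets changed as vanished or appeared, and pairs them by color and closest size. `find_blobs`, `match_moved_blobs` and the character encoding are hypothetical, and the approach inherits exactly the limitation raised in the next paragraph: objects must be segmented before they can be matched.

```python
from collections import deque


def find_blobs(image, background="."):
    """4-connected regions of one non-background color.

    image: list of equal-length strings, one color character per pixel.
    Returns a list of (color, frozenset of (x, y) pixels).
    """
    h, w = len(image), len(image[0])
    seen, blobs = set(), []
    for y in range(h):
        for x in range(w):
            color = image[y][x]
            if color == background or (x, y) in seen:
                continue
            pixels, queue = [], deque([(x, y)])
            seen.add((x, y))
            while queue:
                px, py = queue.popleft()
                pixels.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen and image[ny][nx] == color:
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            blobs.append((color, frozenset(pixels)))
    return blobs


def centroid(pixels):
    return (sum(x for x, _ in pixels) / len(pixels), sum(y for _, y in pixels) / len(pixels))


def match_moved_blobs(frame1, frame2):
    """Pair each blob that vanished from frame1 with an appeared blob in frame2 of
    the same color and closest size; return (color, (dx, dy)) per pairing."""
    blobs1, blobs2 = find_blobs(frame1), find_blobs(frame2)
    vanished = [b for b in blobs1 if b not in blobs2]
    appeared = [b for b in blobs2 if b not in blobs1]
    moves = []
    for color, pixels in vanished:
        candidates = [b for b in appeared if b[0] == color]
        if not candidates:
            continue  # nothing plausible to match; leave it unexplained
        best = min(candidates, key=lambda b: abs(len(b[1]) - len(pixels)))
        appeared.remove(best)
        (x1, y1), (x2, y2) = centroid(pixels), centroid(best[1])
        moves.append((color, (x2 - x1, y2 - y1)))
    return moves


if __name__ == "__main__":
    f1 = ["WW....", "WW....", "......", "....BB"]
    f2 = ["...WW.", "...WW.", "......", "....BB"]
    print(match_moved_blobs(f1, f2))  # [('W', (3.0, 0.0))]
```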

    But there's still some magic behind this approach. We simplified the world by having only flat-colored objects like the brown rectangle and the white circle. There are no shadows or other lighting effects and only a small number of objects to deal with. The magic part is that we were able to identify objects before we bothered to match them up. A vision system should probably be able to match up parts of a picture before recognizing objects. But how?

    One answer might be to treat a single frame as though it were made of rubber. Each pixel in it can be pushed in some direction, but one should expect neighboring pixels to move in similar directions and by similar distances, with that expectation falling off with distance from any given pixel that moves. Imagine, for example, a picture of a 3D cube, each side a different color, rotating about a vertical axis. You see the top of the cube and the sides nearest you. The lighting is such that the front faces change color subtly as the cube rotates. Looking down the axis, the cube is rotating clockwise, which means you see the front faces moving from right to left.

    Imagine the pixels around the top corner nearest you. You see color from three faces: the two front faces and the top face. Consider frames 1 and 2, where frame 2 comes right after frame 1. In frame 1 the corner we're looking at is a little to the right of the center of the frame, and in frame 2 it's a little left of the center. We want the agent considering these frames to intuit that the corner has moved and where it now is. Now think of frame 1 as a picture made of rubber. Imagine stretching it so that it looks like frame 2. With your finger, you push the corner we're considering an inch to the left so it lines up with the same corner in frame 2. Other pixels nearby go with it. Now you do the same with the bottom corner just below it, and it's starting to look a little more like frame 2. You do the same along the edge between these two corners until the edge is pretty straight and lines up with the same edge in frame 2. And you do the same with each of the edges and corners you find in the image.
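    The rubber-sheet push can be approximated with a smooth displacement field. In this sketch, which is an assumption rather than the author's method, each finger push is a control point with a displacement, its influence falls off as a Gaussian of distance (`sigma` is an arbitrary choice), and the warped frame is produced by a simple backward lookup. `displacement_at` and `warp` are hypothetical names; grayscale images are lists of rows.

```python
import math


def displacement_at(p, pushes, sigma=20.0):
    """How far pixel p moves when frame 1 is treated as a rubber sheet.

    pushes: list of ((x, y), (dx, dy)) control points pushed by the 'finger'.
    Each push's influence falls off as a Gaussian of distance; the total weight
    is capped at 1 so far-away pixels barely move while pixels near several
    agreeing pushes are not over-displaced.
    """
    total_w = sx = sy = 0.0
    for (cx, cy), (dx, dy) in pushes:
        w = math.exp(-((p[0] - cx) ** 2 + (p[1] - cy) ** 2) / (2 * sigma ** 2))
        total_w += w
        sx += w * dx
        sy += w * dy
    norm = max(1.0, total_w)
    return (sx / norm, sy / norm)


def warp(image, pushes, sigma=20.0):
    """Stretch a grayscale image (list of rows) according to the pushes.

    Backward mapping: each output pixel samples the input at its own position
    minus the displacement there (an approximation of the true inverse).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = displacement_at((x, y), pushes, sigma)
            src_x, src_y = round(x - dx), round(y - dy)
            if 0 <= src_x < w and 0 <= src_y < h:
                out[y][x] = image[src_y][src_x]
    return out


if __name__ == "__main__":
    row = [[0, 0, 0, 0, 0, 255, 0, 0, 0, 0, 0, 0]]  # a bright "corner" at x = 5
    pushed = warp(row, [((5, 0), (-3, 0))], sigma=100.0)
    print(pushed[0].index(255))  # 2: the corner was pushed 3 pixels left
```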

    Interestingly, you can do this with frame 3, too. You can keep doing this with each frame, but eventually things "break". The left front face eventually rotates out of view. All the pixels in that face can't be pushed anywhere that will fit. They have to be cut out and the gap closed, somehow. Likewise, a new face eventually appears on the right, and there has to be a way to explain its appearance. Still, much of the scene is pretty stable in this model. Most pixels are just being pushed around.

    How would such a mechanism be implemented in code? When the color of a pixel changes, the agent can't just look randomly for another pixel in the image that is the same color and claim it's the same one. Even the closest match might not be the same one. But what if each pixel were treated as a free agent that has to bargain with its nearby neighbors to come up with a consistent explanation that would, collectively, result in morphing of the sort described above? Strength in numbers would matter. Pixels whose colors don't change would largely be non-participants; only those that change from one frame to the next would take part. From frame 1 to 2, pixel P changes color. In frame 1, pixel P was in color blob B1. In frame 2, P searches all the color blobs for the one whose center is closest to P and whose color is strongly similar to B1's. It tries to optimize its choice for closeness in both distance and color. Meanwhile, every other pixel that changes from frame 1 to 2 is doing the same. When it's all done, every changed pixel P is reconsidered by reference to its neighbors. What to do next is not clear, though.
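    The first step of that bargaining, where each changed pixel picks a candidate blob, could be sketched as below. It assumes frames are lists of rows of (r, g, b) tuples and that frame-2 blobs have already been summarized as (center, mean color) pairs; the weights `w_dist` and `w_color` are arbitrary, and `changed_pixels` and `best_blob_for_pixel` are hypothetical helpers. The neighbor reconsideration step is left out, as the paragraph leaves it open. `math.dist` needs Python 3.8 or later.

```python
import math


def changed_pixels(frame1, frame2):
    """Coordinates (x, y) whose color differs between two same-sized frames."""
    return [
        (x, y)
        for y, row in enumerate(frame1)
        for x, color in enumerate(row)
        if frame2[y][x] != color
    ]


def best_blob_for_pixel(p, old_color, blobs, w_dist=1.0, w_color=2.0):
    """Index of the frame-2 blob that best explains where changed pixel p went.

    p: (x, y); old_color: p's (r, g, b) in frame 1.
    blobs: list of (center_xy, mean_rgb) for the color blobs found in frame 2.
    The cost trades off distance to the blob's center against color difference.
    """
    def cost(blob):
        center, color = blob
        return w_dist * math.dist(p, center) + w_color * math.dist(old_color, color)

    return min(range(len(blobs)), key=lambda i: cost(blobs[i]))


if __name__ == "__main__":
    WHITE, BROWN = (255, 255, 255), (120, 60, 20)
    blobs = [((10, 0), (250, 250, 250)), ((2, 0), BROWN), ((60, 0), WHITE)]
    print(best_blob_for_pixel((0, 0), WHITE, blobs))  # 0: near and nearly white
```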

    One thing that should come out of the collaborative process, though, is a kind of optimization. Once some pixels in the image have been solidly matched by morphing, they should give helpful hints to nearby pixels as to where to begin their searches as well. If pixel P has moved 50 pixels to the right and 10 down, the pixel next to P has probably also moved about 50 pixels to the right and 10 down.
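    One way to see the payoff of neighbor hints is a seeded search. This sketch swaps in ordinary block matching (comparing small 3x3 patches by sum of absolute differences), a standard technique rather than the free-agent pixels described above: the first pixel searches widely, and its neighbor then only tries the nine offsets around the displacement the first one found. The grayscale lists-of-rows format and the names `patch` and `match_offset` are assumptions.

```python
def patch(frame, cx, cy, half=1):
    """Flattened (2*half+1)^2 neighborhood around (cx, cy), or None near the border."""
    h, w = len(frame), len(frame[0])
    if not (half <= cx < w - half and half <= cy < h - half):
        return None
    return [frame[y][x] for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]


def match_offset(frame1, frame2, x, y, seed=(0, 0), radius=1):
    """Find the (dx, dy) that best moves pixel (x, y) of frame1 onto frame2.

    Compares 3x3 patches by sum of absolute differences, but only tries offsets
    within `radius` of `seed`, e.g. the displacement already found for a neighbor.
    """
    ref = patch(frame1, x, y)
    best, best_cost = None, None
    for oy in range(seed[1] - radius, seed[1] + radius + 1):
        for ox in range(seed[0] - radius, seed[0] + radius + 1):
            candidate = patch(frame2, x + ox, y + oy)
            if candidate is None:
                continue
            cost = sum(abs(a - b) for a, b in zip(ref, candidate))
            if best_cost is None or cost < best_cost:
                best, best_cost = (ox, oy), cost
    return best


if __name__ == "__main__":
    f1 = [[x + 100 * y for x in range(12)] for y in range(8)]
    # frame 2: everything moved 5 right and 1 down; uncovered pixels are -1
    f2 = [[f1[y - 1][x - 5] if y >= 1 and x >= 5 else -1 for x in range(12)] for y in range(8)]
    first = match_offset(f1, f2, 3, 3, radius=6)       # wide, expensive search
    neighbor = match_offset(f1, f2, 4, 3, seed=first)  # only 9 offsets tried
    print(first, neighbor)  # (5, 1) (5, 1)
```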

    In the case of the white circle moving around, the matching should be clear. But what if a white border were added around the brown square? The brown hole created as the white circle moves from one position to the next might lead all the changed pixels P to guess that the white pixels in the border nearest the new brown hole are where the white circle went, but this doesn't make sense. Similarly, the new white circle in frame 2 could be thought to have come out of the white border; again, this doesn't make sense.

    One answer would be a sort of conservation-of-"mass" concept, where mass is the number of pixels in some color blob. The white circle in frame 2 could have come from the white border, but that would require creating a bunch of new white pixels. And the white circle in frame 1 could have disappeared into the white border, but that would require a complete loss of those pixels. Perhaps the very fact that we have a mass of pixels in one place in frame 1 and the same mass of the same color of pixels in another place in frame 2 should lead us to conclude that they are the same.
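    The conservation-of-mass test can be phrased as a score for a hypothesis: how many pixels would have to be created or destroyed for it to be true. In this sketch, blobs are reduced to hypothetical names with pixel counts (all the same color), and a hypothesis maps each frame-1 blob to the frame-2 blob it supposedly became; the function name and encoding are illustrative only.

```python
def mass_imbalance(mass1, mass2, mapping):
    """Pixels a hypothesis must create or destroy to explain two frames.

    mass1, mass2: dicts mapping blob name -> pixel count for same-colored blobs
    in frames 1 and 2. mapping: frame-1 blob -> frame-2 blob it supposedly became.
    """
    inflow = {name: 0 for name in mass2}
    destroyed = 0
    for name, mass in mass1.items():
        if name in mapping:
            inflow[mapping[name]] += mass
        else:
            destroyed += mass  # this blob's pixels simply vanish
    created_or_lost = sum(abs(mass2[name] - inflow[name]) for name in mass2)
    return created_or_lost + destroyed


if __name__ == "__main__":
    frame1 = {"border": 40, "circle_here": 4}
    frame2 = {"border": 40, "circle_there": 4}
    moved = {"border": "border", "circle_here": "circle_there"}
    absorbed = {"border": "border", "circle_here": "border"}
    print(mass_imbalance(frame1, frame2, moved))     # 0
    print(mass_imbalance(frame1, frame2, absorbed))  # 8
```

    The "moved" explanation balances exactly, while "absorbed into the border" requires 4 white pixels to vanish from the border's expected total and 4 new ones to appear as the second circle.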

    There's a lot of ground to cover with this concept, but I think there's value to the idea of bitmap morphing. A great illustration of how our own vision could use it is driving. Looking forward, the whole scene is constantly changing, but only subtly. Only the occasional bird darting by or other fast-moving object screws up the impression of a single image that's subtly morphing.



    6/15/2005 - Machine vision: studying surface textures


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    It seems that one can't escape the complexities that come with texture. Previously, I had experimented with very low resolution images because they can blur a texture into a homogeneous color blob. There's a terrible tradeoff, though. The texture smooths out while the edges get blocky and less linear. Too much information is lost. What's more, a lower-resolution image will likely have a more uneven distribution of similar but differently colored pixels. A ball goes from having a texture with lots of local color similarity to a small number of pixels with unique colors.

    Moreover, even with my own excellent visual capabilities, I struggle to understand what's in such low-resolution images. It can't be a very good technique if the source images aren't intelligible to human eyes.

    I think I will have to revisit the subject of studying textures. An appropriate venue would be a scene with a simple white or black backdrop and uniform-texture objects moving around in close proximity to the video camera. Objects might include a tennis ball, various rocks, pieces of fabric, plastic sheets, and so on. The goal would be to get an agent to "understand" such textures. One critical aspect of understanding would be that it could later identify a texture it has studied. Moving an object with a given texture around is important; a still image of a texture isn't enough to really understand it. Textured surfaces tend to vary widely in appearance as they are moved about and reshaped. Recognizing a texture requires abstracting it in a way that can overcome such variations.



    6/16/2005 - Machine vision: smoothing out textures


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    While reading up more on how texture analysis has been dealt with in recent years, I realized yesterday that there may be a straightforward way to do basic region segmentation based on textures. I consider texture-based segmentation to be one of two major obstacles to stepping above today's trivial level of machine vision toward general-purpose machine vision. The other is recognizing and ignoring illumination effects.

    Something struck me a few hours after reading how one researcher chose to define textures. Among others, he made two interesting points. First, a smooth texture must, when sampled at a sufficiently low resolution, cease to be a texture and instead become a homogeneous color field. Second, a textured region must be at least several times larger in each dimension (e.g., width and height) than a single textural unit. A rectangular texture composed of river rocks, for example, must be several rocks high and several rocks wide to actually be a texture.

    Later, when I was trying to figure out what characteristics are worth consideration for texture-based segmentation that don't require me to engage in complicated mathematics, I remembered the concept I had been pursuing recently when I started playing with video image processing. I thought I could kill two birds with one stone by processing very low-res versions of source images: eliminating fine textures and reducing the number of pixels to process. I was disappointed, though, by the fact that low-res meant little information of value.

    I realized that there was another way to get the benefit of blurring textures into smooth color fields without actually blurring (or lowering the resolution of - same thing) the images, per se. The principle is as follows.

    Imagine an image that includes a sizable sampling of a texture. Perhaps it has a brick wall in the picture with no shadows, graffiti, or other significant confounding inclusions on the wall. The core principle is that there is a smallest radius, the critical radius (CR), at which a circle is large enough to be "representative" of at least one unit of that texture. Here, the circle is representative if it can be placed anywhere within the bounds of the texture - the wall, in our example - and the average color of all the pixels within it will be almost exactly the same as if it were placed anywhere else in that single-texture region.

    If we want to identify the brick wall's texture as standing apart from the rest of the image, then, we have to do two things in this context. One, we need to find that critical radius (CR). Two, we need to populate the wall-textured region with enough CR circles so no part is left untested, yet no CR circle extends beyond the region. The enclosed region, then, is the candidate region.
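    The representativeness test behind the critical radius can be sketched directly from its definition. This assumes a grayscale image (a list of rows), integer circle centers already known to lie inside the textured region, and a hypothetical tolerance `tol`; `circle_mean` and `critical_radius` are illustrative names. Knowing the centers in advance is exactly the a priori knowledge the discussion below tries to avoid, so this shows only the test itself.

```python
def circle_mean(image, cx, cy, r):
    """Average intensity of the pixels within radius r of (cx, cy), clipped to the image."""
    total = count = 0
    for y in range(max(0, cy - r), min(len(image), cy + r + 1)):
        for x in range(max(0, cx - r), min(len(image[0]), cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                total += image[y][x]
                count += 1
    return total / count


def critical_radius(image, centers, tol=10, max_r=16):
    """Smallest radius at which circles at all `centers` have nearly the same mean."""
    for r in range(1, max_r + 1):
        means = [circle_mean(image, x, y, r) for x, y in centers]
        if max(means) - min(means) <= tol:
            return r
    return None


if __name__ == "__main__":
    stripes = [[255 if x % 2 == 0 else 0 for x in range(30)] for _ in range(30)]
    print(critical_radius(stripes, [(10, 10), (11, 10), (10, 11)]))  # 4
```

    For vertical stripes one pixel wide, small circles come out noticeably brighter or darker depending on whether they are centered on a bright column; by radius 4 the means differ by only about 5 levels.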

    I suppose this could work with squares, too. It doesn't have to be circles, but there may be some curious symmetric effects that come into play that I'm not aware of. Let's limit the discussion to circles, though.

    So how does one determine the critical radius? A single random test won't do, because we don't know in advance that the test circle actually falls within a texture without some a priori knowledge. Our goal is to discover, not just validate.

    I propose a dynamically varying grid of test circles that looks for local consistencies. Picture a grid with a circle centered at each node. The circles should overlap in such a way that there are no gaps. That is, the radius should be at least half the distance between one node and the node one unit down and across from it (half the diagonal of a grid cell). In the first step, the CR (radius) chosen, and hence the grid spacing, would be small: two pixels, for example. As the test progresses, CR might grow by a simple doubling process or by some other multiplier. The grid would cover the entire image under consideration. The process would continue upward until the CR values chosen no longer allow for a sufficient number of sample circles to be created within the image.

    The result of each pass of this process would be a new "image", with one pixel per grid node in the source image. That pixel's color would be the average of the colors of all the pixels within the test circle at that node. We would then search the new image for smooth color blobs using traditional techniques. Any significantly large blobs would be considered candidates for homogeneous textures.
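    A sketch of the multi-pass grid follows, under the same grayscale assumption. It uses a grid spacing equal to the current radius, which satisfies the no-gap condition above (radius at least half a cell's diagonal), doubles the radius each pass, and stops when fewer than `min_nodes` nodes fit across or down. The names, the spacing choice and `min_nodes` are assumptions; the blob search on each coarse image is left to a conventional segmenter.

```python
def circle_mean(image, cx, cy, r):
    """Average intensity of the pixels within radius r of (cx, cy), clipped to the image."""
    total = count = 0
    for y in range(max(0, cy - r), min(len(image), cy + r + 1)):
        for x in range(max(0, cx - r), min(len(image[0]), cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                total += image[y][x]
                count += 1
    return total / count


def texture_passes(image, start_r=2, factor=2, min_nodes=4):
    """Run the test-circle grid at growing radii.

    Each pass places a circle of radius r at every node of a square grid with
    spacing r (any spacing up to r * sqrt(2) leaves no gaps) and records a coarse
    image holding one averaged pixel per node. Returns a list of (r, coarse_image).
    """
    h, w = len(image), len(image[0])
    passes, r = [], start_r
    while True:
        xs, ys = range(0, w, r), range(0, h, r)
        if len(xs) < min_nodes or len(ys) < min_nodes:
            return passes
        passes.append((r, [[circle_mean(image, x, y, r) for x in xs] for y in ys]))
        r *= factor


if __name__ == "__main__":
    img = [[(x * 37 + y * 11) % 256 for x in range(32)] for y in range(32)]
    for r, coarse in texture_passes(img):
        print(r, len(coarse), "x", len(coarse[0]))  # 2 16 x 16, 4 8 x 8, 8 4 x 4
```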

    I'm not entirely sure exactly how to make use of this information, but there's something intuitively satisfying about it. I've been thinking for a while now that we note the average colors of things and that that seems to be an important part of our way of categorizing and recognizing things. A recent illustration of this for me is a billboard I see on my way to work. It has a large region of cloudy sky, but the image is washed to an orangish-tan. From a distance, it looks to me just like the surface of a cheese pizza. So even though I know better, my first impression whenever I see this billboard - before I think about it - is of a cheese pizza. The pattern is obviously of sky and bears only modest resemblance to a pizza, but the overall color is very right.

    Perhaps one way to use the resulting tiers of color blobs is to break down and analyze textures. Let's say I have one uniform color blob at tier N. I can look at the pixels of the same region in tier N - 1, the higher-resolution version. One question I might ask is whether those pixels too are consistent. If so, maybe the texture is really just a smooth color region. If not, then maybe I really did capture a rough but consistent texture. I might then try to see how much variation there is at that higher-resolution level. Maybe I can identify the two or three most prominent colors. In my sky-as-cheese-pizza example, it's clear that I see the dusty orange and white blobs collectively as appearing pizza-like; it's not just the average of the two colors. I could also use other conventional texture analysis techniques like co-occurrence matrices. Once I have the smoothness point (resolution) for a given color blob, I can perhaps double or quadruple the resolution to make it rough enough for the single-pixel distances common in such analysis, instead of working at resolutions so high that such techniques don't work well.
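    The tier N - 1 consistency check might be approximated as follows. This is a hypothetical helper working on grayscale values: it uses variance as the consistency measure and a coarse histogram for the "two or three most prominent colors"; the thresholds are arbitrary, and co-occurrence matrices are not shown.

```python
from collections import Counter
from statistics import pvariance


def describe_region(pixels, smooth_var=25.0, quantum=32, top=3):
    """Classify one blob's pixels taken from the next-finer tier (N - 1).

    pixels: grayscale values. Low variance means a genuinely smooth color region;
    otherwise report the most common intensity levels, quantized into bins of
    width `quantum`, as a crude summary of the texture's prominent colors.
    """
    if pvariance(pixels) <= smooth_var:
        return ("smooth", sum(pixels) / len(pixels))
    bins = Counter((p // quantum) * quantum for p in pixels)
    return ("textured", [level for level, _ in bins.most_common(top)])


if __name__ == "__main__":
    print(describe_region([100, 101, 99, 100]))           # ('smooth', 100.0)
    print(describe_region([200] * 6 + [40] * 4 + [120]))  # ('textured', [192, 32, 96])
```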

    Critics will be quick to point out that all I'm capturing in this algorithm is the ambient color of a texture. I might have a picture of tightly packed oak trees adjacent to tightly packed pine trees. The ambient color of the two kinds of trees' foliage might be identical, and so I would see them as a single grouping. To that I say the objection is valid, but probably irrelevant. I think it's reasonable to hypothesize that our own eyes deal with ambient texture color "before" they get into details like discriminating patterns. Further, I think a system that can successfully discriminate purely based on ambient texture color would probably be much farther ahead than the alternatives I've seen to date. That is, it seems very practical.

    Besides, the math is very simple, which to me is a compelling reason for believing it's something like how human vision might work. I can imagine the co-occurrence concept playing a role, but the combinatorics for a neural network that doesn't regularly change its physical structure seem staggering. By contrast, it may take a long time for a linear processor to go through all these calculations, but the function is so simple and repetitive that it's easy to imagine a few cortical layers implementing it all in parallel and getting results very quickly.

    As a side note, I'm pretty well convinced that outside the fovea, our peripheral vision is doing most of its work using simple color blobs. Once we know what an object is, we just assume it's there as its color blobs move gradually around the periphery until the group of them moves out of view. It seems we track movement there, not details. The rest is just an internal model of what we assume is there. This strengthens my sense that within the fovea, there may be a more detail-oriented version of this same principle at work.

    What I haven't figured out yet is how to deal with illumination effects. I suspect the same tricks that would be used for dealing with an untextured surface that has illumination effects on it would also be used on the lower resolution images generated by this technique. That is, the two problems would have to be processed in parallel. They could not be dealt with one before the other, I think.



    6/20/2005 - Machine vision: spindles


    Following is another in my series of ad hoc journal entries I've been keeping of my thoughts on machine vision.

    Maybe I'm just grasping at straws, but I recently realized one can separate out a new kind of primitive visual element. Every day, we're surrounded by thin, linear structures. Power lines, picture frames, pin-striped shirts, and trunks of tall trees are all great examples of what I mean. A line drawing is often nothing more than these thin, linear structures, and most written human languages are dominated by them.

    The first word that comes to mind when I think about these things is "spindles".

    On one hand, it seems hard to imagine that we have some built-in way to recognize and deal with spindles as a primitive kind of shape like we might with, say, basic geometric shapes (e.g., squares and circles) or features like edges or regions. But something about them seems tempting from the perspective of machine vision goals. Spindly structures in an image are, technically, at least two-dimensional, yet ask a human to draw them and he'll most likely just draw thin lines. They're not just edges, not simply where one surface ends and a new one begins. They have their own colors and hence thickness.

    Perhaps what makes spindles interesting to me is that it seems as though one could come up with a practical way of segregating spindles out of an image that may be easier than picking out, say, broad regions based on color blobs, texture spans, or edges. Finding blobs is hard in large part because it's hard to describe in simple terms what a given blob's shape is. A few blurry pixels along an otherwise sharp edge can bring a basic region-growing technique to its knees and leave the researcher frustrated, hand-adjusting cutoff thresholds to get the results he desires.

    But spindles might be easier. Characterizing and recognizing a thin structure should be easier than characterizing an arbitrarily shaped blob. Even if the spindle is curved, branching, or somewhat jagged, it may still be easier than dealing with blobs. What's more, it's possible to compare the various spindles in an image to search for patterns that might give hints about 3D structures. Look down a brick wall and you might pick out the horizontal white mortar lines as spindles, note that they all share a common vanishing point, and thus hypothesize a 3D interpretation.
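    Finding a common vanishing point for mortar-line spindles is a small least-squares problem once the lines have been extracted. The sketch assumes that extraction is already done (segments as endpoint pairs) and computes the point minimizing the summed squared perpendicular distance to all the lines; this is a standard technique swapped in for illustration, and `vanishing_point` is a hypothetical name.

```python
import math


def vanishing_point(segments):
    """Least-squares common point of the infinite lines through 2-D segments.

    segments: list of ((x1, y1), (x2, y2)). Raises ValueError if the lines are
    parallel in the image, i.e. there is no finite vanishing point.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x1, y1), (x2, y2) in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        nx, ny = -(y2 - y1) / length, (x2 - x1) / length  # unit normal to the line
        c = nx * x1 + ny * y1  # every point q on the line satisfies n . q = c
        # Accumulate the normal equations: sum(n n^T) q = sum(n c).
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel in the image")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


if __name__ == "__main__":
    mortar = [((0, 0), (10, 5)), ((0, 20), (10, 15)), ((0, 10), (10, 10))]
    print(vanishing_point(mortar))  # approximately (20.0, 10.0)
```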

    Spindles seem to come in two basic 3D flavors: colored edges and floating structures. The distinction, from a low-level perspective, seems to lie in whether what's on either side of a spindle is the same color or pattern. An overhead power line divides the sky, which is the same on both sides. A picture frame provides an enhancement of the boundary between a picture and the wall. Perhaps the similarity of the colors and textures on either side of a spindle also provides some basic suggestions about whether a given spindle is attached to one or both sides or is otherwise free-floating. The concept of "generic views" would say that it's hard to imagine the frame around a picture floating in space in such a way that it would exactly line up with the picture, so the most plausible explanation is that it's no coincidence that the picture frame is in the same place as the picture. Whether it's attached to the wall or floating in space is a different question. So spindles can be helpful 3D cues.
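    The low-level distinction between colored edges and floating spindles can be illustrated on a single scanline. This is a deliberately crude sketch: it assumes pixels have already been reduced to color labels, looks only horizontally, and uses an arbitrary `max_width` for "thin"; `spindles_in_row` is a hypothetical name, and a real detector would need to trace spindles in 2-D and tolerate textured surroundings.

```python
def spindles_in_row(row, max_width=3):
    """Thin runs in one scanline, classified by what lies on either side.

    row: sequence of color labels. A maximal run of one color counts as a spindle
    candidate if it is at most `max_width` pixels wide and not at the row's ends.
    Same color on both sides suggests a floating structure (a power line against
    sky); different colors suggest a colored edge (a frame between picture and
    wall). Returns (start, width, kind) tuples.
    """
    found, i, n = [], 0, len(row)
    while i < n:
        j = i
        while j < n and row[j] == row[i]:
            j += 1
        if i > 0 and j < n and j - i <= max_width:
            kind = "floating" if row[i - 1] == row[j] else "colored edge"
            found.append((i, j - i, kind))
        i = j
    return found


if __name__ == "__main__":
    print(spindles_in_row("SSSSKSSSS"))   # [(4, 1, 'floating')]
    print(spindles_in_row("PPPPFFWWWW"))  # [(4, 2, 'colored edge')]
```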

    I don't know whether to suggest that the human visual system sees spindles as a somehow separate sort of primitive, but it seems plausible. The very fact that printed characters in nearly all human languages are composed of spindles seems suggestive. Maybe it's because it's economical to write in strokes instead of blobs, but maybe it's more fundamental than that. It's also interesting that we have little trouble understanding technical "some assembly required" line drawings, even when they have no color, shading, or other 3D visual cues.

    Perhaps spindles provide a way to explain how it is that a line drawing of a circle can be interpreted just as easily as a hollow hoop or a solid disk. That is, perhaps spindles are considered interchangeable with edges by our visual systems. Yet perhaps it's also that spindles stand out better than edges do.

    Back to
    blog home
    Listen to an
    audio version
    Notify me of
    new entries
    Subscribe to a full
    RSS feed of this blog


    7/12/2005 - Machine vision: motion-based segmentation

    I've been experimenting, with limited success, with different ways of finding objects in images using what some vision researchers would call "preattentive" techniques, meaning not involving special knowledge of the nature of the objects to be seen. The work is frustrating in large part because of how confounding real-world images can be to simple analyses and because it's hard to nail down exactly what the goals for a preattentive-level vision system should be. In machine vision circles, this is generally called "segmentation", and usually refers more specifically to segmentation of regions of color, texture, or depth.

    Jeff Hawkins (On Intelligence) would say that there's a general-purpose "cortical algorithm" that starts out naive and simply learns to predict how pixel patterns will change from moment to moment. Appealingly simple as that sounds, I find it nearly impossible to square with all I've been learning about the human visual system. From all the literature I've been trying to absorb, it's become quite clear that we still don't know much at all about the mechanisms of human vision. We have a wealth of tidbits of knowledge, but still no comprehensive theory that can be tested by emulation in computers. And it's equally clear nobody in the machine vision realm has found an alternative pathway to general purpose vision, either.

    Segmentation seems a practical research goal for now. There has already been quite a bit of research into segmentation based on edges, smoothly continuous color areas, textures, and binocular disparity. I'm choosing to pursue something I can't seem to find literature on: segmentation of "layers" in animated, three-dimensional scenes. Donald D. Hoffman (Visual Intelligence) makes the very strong point that our eyes favor "generic views". If we see two lines meeting at a point in a line drawing, for example, we'll interpret the scene as two lines that meet at a point in 3D space. The lines could instead have endpoints that only appear to meet from our vantage point while lying very far apart along the Z axis, but the concept of generic views says that such a coincidence would be so statistically unlikely that we can assume it just doesn't happen.

    The principle of generic views seems to apply to animations as well. Picture yourself walking along a path through a park. Things around you are not moving much. Imagine you take one picture per step, keeping the camera level and the center of each picture fixed on the same distant point. Later, you study the sequence of pictures. For each pair of adjacent pictures, very little seems to change. Yet when you inspect each pixel of the image, a great many of them do change color. You wonder why, but you quickly realize that the color of one pixel in the before picture has more or less moved to another location in the after picture. As you study more images in the sequence, you notice a consistent pattern emerging. Near the center point in each image, the pixels don't move very much from frame to frame, and the ones farther from the center tend to move in ever larger increments, almost always in a direction that radiates away from the center point.

    You're tempted to conclude that you could track the contents of sequences captured this way by simply "smearing" the previous image's pixels outward according to an equation based on each pixel's position relative to the center, but something about the math doesn't quite work out. With more observation, you notice that trees and rocks alongside the path that are nearer to you than, say, the bushes behind them behave a little differently. Their pixels move outward slightly faster than those of the bushes behind them. In fact, the closer an object is to you as you pass it, the faster its pixels seem to morph their way outward. The pixels of the far-off hills and sky, for example, barely move at all.
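    The "smearing" can be written down for an idealized case, which also shows why depth breaks the simple version. The sketch assumes a pinhole camera moving straight toward the fixation point (treated as the focus of expansion) with no rotation. Under that assumption, a point that starts at distance Z moves away from the center by its image distance times step / (Z - step) each frame, so nearer objects outrun farther ones. The function name and parameters are illustrative, not from the post.

```python
def forward_motion_flow(x, y, foe_x, foe_y, step, depth):
    """Per-frame displacement (dx, dy) of an image point under pure forward motion.

    Assumptions: pinhole camera, no rotation, translation of `step` units per
    frame toward the focus of expansion (foe_x, foe_y), and a scene point that
    starts `depth` units away along the viewing axis (depth > step).
    """
    if depth <= step:
        raise ValueError("point must stay in front of the camera")
    scale = step / (depth - step)  # exact pinhole result; roughly step / depth for small steps
    return ((x - foe_x) * scale, (y - foe_y) * scale)
```

    For a pixel 40 pixels right of the center and a step of 1 unit, a tree 3 units away moves 20 pixels per frame while a bush 6 units away moves 8, and a point exactly at the center does not move at all.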

    At one point during the walk, you took a 90° left turn in the path and began fixating the camera on a new point. The turn took about 40 frames. During that time, the fixed central point was lost, but the intermediate frames behave in a similar way. This time, though, instead of smearing radially outward from a central point, the pixels appear to be shoved rapidly toward the right of the field of view. It's almost as though we were looking at a very large bitmap through a small rectangular window that slides across it.

    By now, I hope I've impressed on you the idea that in a video stream of typical events in life, much of what is happening from frame to frame is largely a subtle shifting of regions of pixels. Although I've been struggling lately to figure out an effective algorithm to take advantage of this, I am fairly convinced this is likely one of the basic operations that may be going on in our own visual systems. And even if it's not, it seems to be a very valuable technique to employ in pursuit of general purpose machine vision. There seem to be at least two significant benefits that can be gained from application of this principle: segmentation and suppression of uninteresting stimuli.

    Consider segmentation. You've probably seen a variant of the "hidden dalmatian" image at right, in which the information is ambiguous enough that you have to look rather carefully to grasp what you are looking at. The illusion is all the more fascinating when it starts out as an even more ambiguous still image that then springs into animation as the dog walks. The dog jumps right out of the ambiguity. (Unfortunately, I couldn't find a video of it online to show.) I'm convinced that the animated version is so much easier to process because the dog as a whole, and each of its parts, moves consistently along its own path from moment to moment while the background moves along a different path, so we see the regions as separate. What's more, I'm confident we also instantly grasp that the dog is in front of the background, and not the other way around, because we see parts of the background disappearing behind parts of the dog, while the dog's parts never get occluded by the background.

    Motion-based segmentation of this sort seems more computationally complicated than, say, using just edges or color regions, but it carries the very powerful benefit of clearly placing layers in front of or behind one another. What's more, it should be fairly straightforward to use the parts that get covered up in subsequent frames, and others that get revealed, to build more complete images of parts of a scene that are occasionally covered by other things.

    Another reason motion-based segmentation of this sort is special is that it lets something that would otherwise be very hard to segment with current techniques, such as a child against a graffiti-covered wall, stand out strikingly as soon as it moves differently from its background.

    Now consider suppression of uninteresting stimuli. In humans, our gaze is generally drawn to rapid or sudden motions in our fields of view. It's easy to see this by standing around in a field as birds fly about, or on a busy street. What's more, unexpected rapid motions, even in the farthest periphery of your visual field, are likely to draw your attention away from otherwise static views in front of you. If you wanted to implement this in a computer, it would be pretty easy if the camera were stationary. You simply make each pixel slowly get used to its ambient color and paint it black; only pixels that vary dramatically from their ambient color get painted some other color. Then you would use fairly conventional techniques to measure the size and central position of such moving blobs. But what if the camera were mounted in the front windshield, watching ahead as you drive? If you could identify the different segments that are moving in their own ways, you could probably learn to ignore the ambient background fairly quickly. Things like a car changing lanes in front of you or a street sign passing overhead would then be more likely to stand out because of their differing relative motions.
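    For the stationary-camera case, "getting used to the ambient color" amounts to a running average per pixel, followed by measuring the size and centroid of whatever differs strongly from it. This is a minimal sketch on grayscale frames stored as lists of rows. The adaptation rate and threshold are assumptions, and it lumps every changed pixel into one blob, where a real system would separate blobs with connected-component labeling.

```python
def update_background(background, frame, rate=0.05):
    """Let every pixel slowly 'get used to' its ambient value (exponential running average)."""
    return [[b + rate * (f - b) for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def moving_blob(background, frame, threshold=40):
    """Size and centroid of the pixels that differ strongly from the background.

    Returns (pixel_count, (center_x, center_y)), or None if nothing stands out.
    All changed pixels are treated as a single blob for simplicity.
    """
    xs, ys = [], []
    for y, (b_row, f_row) in enumerate(zip(background, frame)):
        for x, (b, f) in enumerate(zip(b_row, f_row)):
            if abs(f - b) > threshold:  # everything else would be "painted black"
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return len(xs), (sum(xs) / len(xs), sum(ys) / len(ys))
```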

    I'm in the process of trying to create algorithms to implement this concept of motion-based visual segmentation. To be honest, I'm not having much luck. This may be in part because I haven't much time to devote to it, but it's surely also because it's not easy. So far, I've experimented a little with the idea of searching the entire after image for candidates where a pixel in the before image might have gone in the hopes of narrowing down the possibilities by considering that pixel's neighbors' own candidate locations. Each candidate location would be expressed as an offset vector, which means that neighboring candidates' vectors can easily be compared to see how different they are from one another. When neighboring pixels all move together, they will have identical offset vectors, for instance. I haven't completed such an algorithm, though, because it's not apparent to me that this would be enough without a significant amount of crafty optimization. The number of candidates seems to be quite large, especially if all the pixels in the after image are potential candidates for movement of each pixel in the before image.
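    The candidate search described above is essentially what is usually called block matching. Here is a minimal sketch for a single pixel, assuming grayscale frames stored as lists of rows, a small square patch around the pixel, and a limited search window instead of the whole after image; the window limit is my assumption to keep the number of candidates manageable, and the function names are hypothetical.

```python
def patch_sad(a, ax, ay, b, bx, by, r):
    """Sum of absolute differences between the (2r+1) x (2r+1) patches centered
    at (ax, ay) in image a and (bx, by) in image b."""
    return sum(abs(a[ay + j][ax + i] - b[by + j][bx + i])
               for j in range(-r, r + 1) for i in range(-r, r + 1))


def best_offset(before, after, x, y, r=1, search=3):
    """Offset vector (dx, dy), within +/-search pixels, that best explains where
    the pixel at (x, y) in `before` moved to in `after`.

    (x, y) must be at least r pixels from the edge of `before`.
    """
    h, w = len(after), len(after[0])
    best_cost, best_vec = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            if not (r <= nx < w - r and r <= ny < h - r):
                continue  # candidate patch would fall off the image
            cost = patch_sad(before, x, y, after, nx, ny, r)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_vec
```

    Running `best_offset` for neighboring pixels and comparing the resulting vectors is where the neighbor-consensus step would come in: pixels on the same rigidly shifting patch should return identical vectors.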

    One other observation that could help improve performance is that most objects that can be segmented out using this technique probably have fairly strongly defined edges around them anyway. Hence, it may make sense to assume that the pixels around a given pixel are probably in the same patch as that one unless they lie along edge boundaries, where it's up for grabs. Conversely, it may be worth considering only edge pixels' motions. This seems likely to give more dubious results, but may be faster because fewer pixels need to be considered. One related point: the side of an edge that lies in the nearer region should remain fairly constant, while the side in the farther region will change over time as parts of the background are occluded or revealed. This may help in identifying which apparent edges represent actual boundaries between foreground and background regions, and particularly in determining which side of the edge is foreground and which is background.

    I'm encouraged, actually, by a somewhat related technology that may apply to this problem. I suspect that this same technique is used by our own visual system for binocular vision. That is, the left and right eye images at a given moment are a lot like adjacent frames in an animation: subtly shifted versions of one another. Much research has gone into practical 3D imaging systems that use two or more ordinary video cameras in a radial array around a subject, such as a person sitting in a chair. Although the basic premise seems pretty straightforward, I've often wondered how they actually figure out how a given pixel maps to another one. The constraint that pixels shift only horizontally probably helps a lot, but I strongly suspect that people with experience developing these techniques would be well placed to engineer a decent motion-based segmentation algorithm.

    One feature that will probably confound attempts to engineer a good solution is that parts of an image may rotate as well as shift and change apparent size (as they move toward or away from the viewer). Rotation means that the offset vectors for pixels will not simply be nearly identical, as in simple shifting. Instead, the vectors should vary subtly from pixel to pixel, rather like a color gradient in a region shifting subtly from green to blue. The same should go for changes in apparent size. The good news is that once the pixels are mapped from before to after frames, these "offset gradients" should tell a lot about the nature of the relative motion. It should, for example, be fairly straightforward to tell whether rotation is occurring and to find its central axis. It should be similarly straightforward to tell whether an object is getting larger or smaller and hence moving toward or away from the viewer.
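    Once every pixel has an offset vector, the "offset gradients" can be summarized with two standard vector-calculus quantities: divergence (positive when a region grows, i.e. approaches the viewer) and curl (non-zero when it rotates). This sketch uses central differences and assumes the horizontal and vertical offsets are stored in separate grids `u` and `v`, indexed as `[y][x]`. With image coordinates where y points down, a positive curl corresponds to clockwise rotation on screen.

```python
def divergence_and_curl(u, v, x, y):
    """Central-difference divergence and curl of an offset field at grid point (x, y).

    u[y][x] and v[y][x] hold each pixel's horizontal and vertical offset.
    Positive divergence: expanding (approaching); negative: shrinking (receding).
    Non-zero curl: rotation in the image plane.
    """
    du_dx = (u[y][x + 1] - u[y][x - 1]) / 2
    dv_dy = (v[y + 1][x] - v[y - 1][x]) / 2
    dv_dx = (v[y][x + 1] - v[y][x - 1]) / 2
    du_dy = (u[y + 1][x] - u[y - 1][x]) / 2
    return du_dx + dv_dy, dv_dx - du_dy
```

    A region that is purely expanding yields positive divergence and zero curl; a region spinning about a point yields zero divergence and non-zero curl, and the pixel whose offset is zero marks the axis.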


    7/30/2005 - Patch equivalence

    As I've been dodging about among areas of machine vision, I've been searching for similarities among the possible techniques they could employ. I think I've started to see at least one important similarity. For lack of a better term, I'm calling it "patch equivalence", or "PE".

    The concept begins with a deceptively simple assertion about human perception: that there are neurons (or tight groups of them) that do nothing but compare two separate "patches" of input to see if they are the same. A "patch", generally, is just a tight region of neural tissue that brings input information from a region of the total input. With one eye, for example, a patch might represent a very small region of the total image that that eye sees. For hearing, a patch might be a fraction of a second of time spent listening to sounds within a somewhat narrow band of frequencies, as another example. A "scene", here, is a contiguous string of information that is roughly continuous in space (e.g., the whole image seen by one eye in a moment) or time (e.g., a few seconds of music heard by an ear). The claim here is that for any given patch of input, there is a neuron or small group of them that is looking at that patch and at another patch of the same size and resolution, but somewhere else in the scene. Further, that neuron (group) is always looking in the same pair of places at any given time. It doesn't scan other areas of the scene; just the pair of places it knows. We'll call this neuron or small group of neurons a "patch comparator".

    From an engineering perspective, the PE concept is both seductively simple and horribly frightening. If I were designing a hardware solution from scratch, I imagine it would be quite easy to implement, and could execute very quickly. When I think about a software simulation of such a machine, though, it's clear to me that it would be terribly slow to run. Imagine every pixel in the scene having a large number of patch comparators associated with it. Each one would look at a small patch - maybe 5 x 5 pixels, for instance - around that pixel and at the same size patch somewhere else in the scene. One comparator might look 20 pixels to the left, another might look 1 pixel above that, another 2 pixels above, and so on until there's a sufficient amount of coverage within a certain radius around the central patch being compared. There could literally be thousands of patch comparisons done for just one single pixel in a single snapshot. Such an algorithm would not perform very quickly, to say the least.

    Let's say the output of each patch comparator is a value from 0 to 1, where 1 indicates that the two patches are, pixel for pixel, identical and 0 means they are definitively different. Any value between indicates varying degrees of similarity.
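    A single patch comparator with that 0-to-1 output might look like this. The sketch assumes grayscale images stored as lists of rows and scores similarity as one minus the normalized sum of absolute differences; that scoring rule is my assumption, since only the 0-to-1 range is specified. Passing the same image twice gives a within-scene comparator; passing two images gives a binocular or before/after comparator. The helper `comparator_offsets` is likewise hypothetical and just enumerates one comparator per offset within a radius.

```python
def patch_similarity(a, ax, ay, b, bx, by, size=5, max_value=255):
    """Compare the size x size patch at top-left (ax, ay) in image a with the one
    at (bx, by) in image b. Returns 1.0 for identical patches and 0.0 for patches
    that differ by max_value at every pixel."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(a[ay + dy][ax + dx] - b[by + dy][bx + dx])
    return 1.0 - total / (size * size * max_value)


def comparator_offsets(radius):
    """All (dx, dy) offsets within `radius` of a patch, excluding (0, 0):
    one comparator per offset, for every pixel."""
    return [(dx, dy)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if (dx, dy) != (0, 0) and dx * dx + dy * dy <= radius * radius]
```

    With a radius of 20 pixels, `comparator_offsets` yields more than a thousand offsets, each comparing 25 pixel pairs, for every pixel in the image, which makes the cost concern above concrete.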

    One might well ask what the output of such a process is. What's the point? To be honest, I'm still not entirely sure, yet. It's a bit like asking what a brain would do with edge detection output. To my knowledge, nobody really knows in much detail, yet.

    Still, I can easily see how patch equivalence could be used in many facets of input processing. Consider binocular vision, for example. You've got images coming from both eyes and you generally want to match up the objects you see in each eye, in part to help you know how far away each one is. One patch comparator could be looking at one place in one eye and the same place in the other. Another comparator could then be looking at the same place in the left eye as before, but in a different place in the right eye, for instance. Naturally, there would be all sorts of "false positive" matches. But if we survey a bunch of comparators that are looking at the same offset and most of them are seeing matches at that offset, we would take the consensus as a sign that we likely have a genuine match. We'd throw out all the other spurious matches as noise, for lack of a regional consensus.
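    The regional-consensus idea can be sketched as a vote: every comparator in a region that sees a strong match at a given horizontal offset casts one vote for it, and the offset with the most votes wins. This is an illustrative sketch, assuming grayscale left and right images stored as lists of rows, 3 x 3 patches, a 0.9 similarity threshold, and offsets chosen so every patch stays inside the image. A tiny similarity helper is included so the block stands alone; the names are hypothetical.

```python
def similarity(a, ax, ay, b, bx, by, size=3, max_value=255):
    """1.0 for identical size x size patches (top-left corners given), 0.0 for maximally different."""
    total = sum(abs(a[ay + j][ax + i] - b[by + j][bx + i])
                for j in range(size) for i in range(size))
    return 1.0 - total / (size * size * max_value)


def consensus_offset(left, right, region, offsets, threshold=0.9):
    """Return the horizontal offset that the most comparators in `region` agree on.

    region: (x, y) top-left corners of patches in the left image.
    offsets: candidate horizontal shifts into the right image.
    Isolated matches that neighbors don't share simply never win the vote.
    """
    votes = {}
    for x, y in region:
        for d in offsets:
            if similarity(left, x, y, right, x + d, y) >= threshold:
                votes[d] = votes.get(d, 0) + 1
    if not votes:
        return None
    return max(votes, key=votes.get)
```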

    Pattern detection is another area where this technique can be used. Have you ever studied a printed table of numeric or textual data where one column contains mostly a single value (e.g., "100" or "Blue")? Perhaps it's a song playlist with a dozen songs from one album, followed by a dozen from another. You scan down the list and see that the name of the first album is the same for the upper dozen songs. You don't even have to read them, because your visual system tells you they all look the same. That's pretty amazing, when you think about it. In fact, I've found I can scan down lists of hundreds of identical things looking for an exception, and can do it surprisingly quickly. It's not special to me, of course; we all can. How is it that my eyes instantly pick up the similarity and call out the one item that's different? It's a repeating pattern, just like a checkerboard or bathroom tiles. A patch equivalence algorithm would find excellent use here. Given an offset roughly equal to the distance between the centers of two neighboring lines of text, a region of comparators would quickly come to a consensus that there's equivalence at that offset. Because the two patches are seen at the same time and by the same eye, the conclusion would be that they probably belong to a repeating pattern. As a side note, this doesn't sufficiently explain how we detect less regular patterns, like a table full of differently colored candies, but I suspect PE can play a role in explaining that, too.
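    Treating each line of a printed column as a "patch" and comparing it with the line one row above and one row below gives a crude version of the exception-spotting described here. This is a hypothetical sketch, not a model of the visual system: it compares whole text rows for exact equality and only flags isolated exceptions, so two adjacent identical odd rows would agree with each other and slip through.

```python
def find_exceptions(rows):
    """Indices of rows that match neither the row above nor the row below.

    Each comparison plays the role of a comparator at a one-line offset;
    rows that agree with a neighbor are part of the repeating pattern.
    """
    flagged = []
    for i, row in enumerate(rows):
        neighbors = [rows[j] for j in (i - 1, i + 1) if 0 <= j < len(rows)]
        if neighbors and all(row != n for n in neighbors):
            flagged.append(i)
    return flagged
```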

    What about motion? PE can help here, too. Imagine a layer of PE comparators that compare the image seen by one eye now with the same eye's image from a fraction of a second ago. A ball is moving through the scene, so the ball sits in one place in one image and perhaps a little to the right of that in the next image. Again, the region of patch comparators that sees the ball in its before and after positions lights up in consensus and thus effectively reports the position and velocity of the moving object.

    I've focused on vision, but I do believe the patch equivalence concept can apply to other senses. Consider the act of listening to a song. For most songs, the tempo is detected very quickly, and that alone can be explained by PE comparators that look at linear patches of frequency responses at different time offsets. Or the comparators could be looking not at low-level frequency responses, but instead at recognized patterns that represent snippets of instruments at different frequencies. In fact, it may well be that we mainly come to recognize distinct sounds as distinct only because they are repeated. A comparator might be looking at one two-dimensional patch that's actually made up of several frequency bands in a small snippet of time and at the same kind of patch at a different point in time. If it sees the exact same response in both moments, that could result in saving the patch's pattern in short-term memory for later. More repetitions could continue to reinforce this pattern until it's saved for longer-term recollection.
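    The tempo idea can be illustrated by letting each time lag stand in for a bank of comparators that all look at the same offset in time. The sketch assumes a hypothetical onset-strength signal sampled at regular intervals (one number per frame) and picks the lag at which the signal best matches a shifted copy of itself. The function name and lag range are illustrative; this is a plain self-similarity search, not a claim about how the ear does it.

```python
def best_period(signal, min_lag, max_lag):
    """Lag (in samples) at which `signal` best matches a copy of itself shifted in time.

    Each lag stands in for a bank of comparators looking at the same time offset;
    the score is the mean absolute difference, so lower means a better match.
    Lags are tried in increasing order and only strictly better scores replace
    the current best, so the shortest of several equally good lags wins.
    """
    best_lag, best_score = None, None
    for lag in range(min_lag, max_lag + 1):
        pairs = len(signal) - lag
        if pairs <= 0:
            break
        score = sum(abs(signal[t] - signal[t + lag]) for t in range(pairs)) / pairs
        if best_score is None or score < best_score:
            best_lag, best_score = lag, score
    return best_lag
```

    Multiples of the true period also match well, which is why the tie-breaking rule favors the shortest lag.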

    This same principle of selecting patch patterns that repeat in space or time provides a strong explanation of how patterns would come to be considered important enough to remember. This is currently a rather hard problem in AI, in large part because selecting important features seems to presuppose that you can find punctuation between features, such as pauses between words or empty spaces around objects in a scene; in other words, an a priori definition of "important". Using PE, this may not even be necessary, and it potentially provides a more amorphous conception of what a boundary really is.


    8/7/2005 - DualCameras component

    I have been getting more involved in stereo, or "binocular", vision research. So far, most of my actual development efforts have been on finding a pair of cameras that will work together on my computer, an annoying challenge, to be sure. Recently, I found a good pair, so I was able to move on to the next logical step: creating an API for dealing with two cameras.

    Using C#, I created a Windows control component that taps into the Windows Video Capture API and provides a very simple interface. Consumer code needs only to start capturing, tell it to grab frames from time to time when it's ready, and eventually (optionally) stop capturing. There's no need to worry about synchronization or a flood of events. I dubbed the component DualCameras and have made it freely available for download, including all source code and full documentation.

    I've already been using the component for a while now and have made some minor enhancements, but I'm happy to say it has just worked this whole time; no real bugs to speak of. It's especially nice that all the wacky window creation and messaging going on under the surface is quietly encapsulated, so the developer need not understand any of it to use the component. Just ask for a pair of images and it will wait until it has them both. Simple. I certainly can't say that of all the programs I've made.
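    The "ask for a pair and wait until both are ready" style can be sketched without any Windows capture code. The class below is purely hypothetical and is not the DualCameras API (that component is C# and documented on its own page): capture threads call `deliver` with each new frame, newer frames overwrite older ones so events never pile up, and `grab` blocks until each side has a fresh frame. It relies on Python's `threading.Condition.wait_for`.

```python
import threading


class FramePairGrabber:
    """Hypothetical two-camera frame pairing; NOT the DualCameras API.

    Capture threads call deliver("left" or "right", frame) whenever a camera
    produces a frame. The consumer calls grab() when it is ready for more.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._latest = {"left": None, "right": None}

    def deliver(self, side, frame):
        with self._cond:
            self._latest[side] = frame  # overwrite: unconsumed frames never pile up
            self._cond.notify_all()

    def grab(self, timeout=None):
        """Block until both cameras have a fresh frame; return (left, right) or None on timeout."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: all(f is not None for f in self._latest.values()), timeout)
            if not ready:
                return None
            pair = (self._latest["left"], self._latest["right"])
            self._latest = {"left": None, "right": None}  # next grab() waits for new frames
            return pair
```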

    The home page I made for the component also has advice about how to select a pair of cameras. I went through a bunch of different kinds before I found one that worked, so I thought I'd share my experience to help save others some headaches.


    8/7/2005 - Automatic alignment of stereo cameras

    I'm currently working on a low-level stereo vision component tentatively called "Binoculus". It builds on the DualCameras component, which provides basic access to two attached cameras. On top of that, Binoculus already adds calibration and will hopefully add some basic ability to segment parts of the scene by perceived depth.

    For now, I've only worked on calibrating the images from the cameras so they both "point" in the same direction. The basic question here is: once the cameras point roughly in the same direction, how many pixels off, horizontally and vertically, is the left one from the right? I had previously pursued answering this using a somewhat complicated printed graphic and a somewhat annoying process, because I expected I would have to deal with spherical warping, differing camera sizes, differing colors, and so on. I've come to the conclusion that this probably won't be necessary, and that all that will probably be necessary is getting the cameras to agree on where an "infinity point" is.

    This is almost identical to the question posed by a typical camera with auto-focus, except that I have to deal with vertical alignment in addition to the typical horizontal alignment. I thought it worthwhile to describe the technique here because I have had such good success with it and it doesn't require any special tools or machine intelligence.

    We begin with the premise that if you take the images from the left and right cameras and subtract them, pixel for pixel, the closer the two images are to pointing at the same thing, the lower the sum of all pixel differences will be. To see what I mean, consider the following figure, which shows four versions of the same pair of images with their pixel values subtracted out:

    From left to right, each shows the difference between the two images as they get closer to best alignment. See how they get progressively darker? As we survey each pixel, we add up the absolute differences of its red, green, and blue values. A perfect match would have a total difference of zero. The worst case would have a difference value of Width * Height * 3 * 255.
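    The difference measure can be written directly. This is a minimal sketch, assuming RGB images stored as lists of rows of (r, g, b) tuples with 8-bit channels; summing absolute channel differences reproduces the stated worst case of Width * Height * 3 * 255. The function names are illustrative.

```python
def image_difference(left, right):
    """Sum of |dR| + |dG| + |dB| over every pixel of two equally sized RGB images.

    Images are lists of rows; each pixel is an (r, g, b) tuple of 0..255 values.
    0 means identical; lower means the two views are closer to lining up.
    """
    return sum(abs(a - b)
               for left_row, right_row in zip(left, right)
               for left_px, right_px in zip(left_row, right_row)
               for a, b in zip(left_px, right_px))


def worst_case_difference(width, height):
    """Largest possible value: every channel of every pixel differs by 255."""
    return width * height * 3 * 255
```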

    Now let's assume the cameras are already perfectly aligned vertically, so we only have to adjust the horizontal alignment. We start by aiming the camera pair at some distant scenery. My algorithm then takes a rectangular subsection of the left eye's image, about 2/5 of its total width and height, from the very center. For the right eye, it takes another sample rectangle of exactly the same size and moves it from the far left to the far right in small increments (e.g., 4 pixels). The following figure shows the difference values calculated for different horizontal offsets:

    Notice how there's a very clear downward spike in one part of the graph? At the very tip of that is the lowest difference value and hence the horizontal offset for the right-hand sample box. That offset is, more generally, the horizontal offset for the two cameras and can be used as the standard against which to estimate distances to objects from now on.
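    The horizontal scan might look like the following sketch. It assumes, as the post does at this stage, that vertical alignment is already correct; it uses grayscale images stored as lists of rows to stay short, takes the central 2/5 of the left image as the reference box, and reports the offset of the best-matching box in the right image relative to the box's position in the left one. The names and the fixed step are illustrative; the post's version samples more densely near the previous best.

```python
def horizontal_scan(left, right, step=4, fraction=0.4):
    """Return the horizontal offset of the right-image box that best matches the
    central box of the left image (lowest sum of absolute differences).

    Assumes equal-size grayscale images (lists of rows) and correct vertical alignment.
    """
    h, w = len(left), len(left[0])
    bw, bh = int(w * fraction), int(h * fraction)
    x0, y0 = (w - bw) // 2, (h - bh) // 2  # reference box: center of the left image
    best_diff, best_offset = None, None
    for x in range(0, w - bw + 1, step):  # slide the box from far left to far right
        diff = sum(abs(left[y0 + j][x0 + i] - right[y0 + j][x + i])
                   for j in range(bh) for i in range(bw))
        if best_diff is None or diff < best_diff:
            best_diff, best_offset = diff, x - x0
    return best_offset
```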

    As a side note, you may notice that there is a somewhat higher sample density near the point where the best match is. That's a simple optimization I added in to speed up processing. With each iteration, we take the best offset position calculated previously and have a gradually higher density of tests around that point, on the assumption that it will still be near there with the next iteration. Near the previous guessed position, we're moving our sampling rectangle over one pixel at a time, whereas we're moving it about 10 pixels at the periphery.
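    One way to realize that gradually increasing density is to let the step size grow with distance from the previous best offset. This sketch rests on my own assumptions: one-pixel steps right next to the previous estimate, growing by one pixel for every `ramp` pixels of distance, and capped at a `max_step` of 10 to match the roughly 10-pixel spacing at the periphery. The function name is hypothetical.

```python
def scan_positions(lo, hi, previous_best, max_step=10, ramp=4):
    """Offsets to test between lo and hi (inclusive), dense near previous_best.

    The step is 1 pixel next to the previous best and grows by one pixel for
    every `ramp` pixels of distance, up to max_step at the periphery.
    """
    center = min(max(previous_best, lo), hi)  # keep the starting point in range

    def step_at(x):
        return min(max_step, 1 + abs(x - center) // ramp)

    positions = {center}
    x = center
    while x < hi:  # walk right toward hi
        x = min(hi, x + step_at(x))
        positions.add(x)
    x = center
    while x > lo:  # walk left toward lo
        x = max(lo, x - step_at(x))
        positions.add(x)
    return sorted(positions)
```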

    What about the vertical alignment? Technically speaking, we should probably do the same thing I've just described over a 2D grid covering the entire right-hand image, moving the rectangle throughout it. That would involve a lot of calculation, so I used a cheat. I start with the assumption that the vertical alignment is already pretty close to what it should be because the operator has been careful about alignment. So with each calibration iteration, my algorithm starts by finding the optimal horizontal position. It then runs the same test vertically, moving the sample rectangle from top to bottom along the line prescribed by the best-fitting horizontal offset. If the best position is below the current vertical offset, we add one to the offset to push it one pixel downward. Conversely, if the best position seems to be above, we subtract one from the current offset and so push it upward. The result is a gradual sliding up or down, whereas the calculated horizontal offset is applied instantly. You can see the effects of this in the animation to the right. Notice how you don't see significant horizontal adjustments with each iteration, but you do see vertical ones?

    Why do I adjust the vertical offset gradually? When I tried letting the vertical and horizontal alignments "fly free" from moment to moment, I got bad results. The vertical alignment might be way off because the horizontal was way off. Then the horizontal scan, which runs along the bad vertical offset, would perform badly, and the cycle of bad results would continue. This happens because I scan in a cross pattern (one horizontal pass and one vertical pass) instead of over a wider grid. This tweak, however, is quite satisfactory and seems to work well in most of my tests so far.
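    The asymmetric update, where the horizontal offset is adopted immediately while the vertical one creeps one pixel per iteration, can be sketched as follows. `find_best_x` and `find_best_y` are hypothetical callbacks standing in for the horizontal and vertical scans; y grows downward, as in image coordinates.

```python
def nudge_toward(current, target):
    """Move `current` one pixel toward `target` (or leave it if they already match)."""
    if target > current:
        return current + 1  # best match is below: push the offset down one pixel
    if target < current:
        return current - 1  # best match is above: push it up one pixel
    return current


def calibration_step(x_offset, y_offset, find_best_x, find_best_y):
    """One iteration: the horizontal offset jumps straight to the best match along
    the current row; the vertical offset is nudged by at most one pixel."""
    x_offset = find_best_x(y_offset)
    y_offset = nudge_toward(y_offset, find_best_y(x_offset))
    return x_offset, y_offset
```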

    I wish I could tell you that this works perfectly every time, but there is one bad behavior worth noting. Watch the animation above carefully. Notice how, as the vertical adjustments occur, there is a subtle horizontal correction? Once the vertical offset is basically set, the horizontal offset also switches back and forth by one pixel about three times before it settles down. I noticed this sort of vacillation, both vertically and horizontally, in many of my test runs. I didn't spend much time investigating the cause, but I believe it comes from oscillation between the largely independent vertical and horizontal offset calculations: when one changes, it can cause the other to change, which in turn can cause the first to change back, ad infinitum. The solution generally appears to be to bump the camera assembly a little so it sees something the algorithm handles better. I also found that a sharply contrasting image, like the big black dot I printed out, seems to work a little better than softer, more naturalistic objects like the picture frame you see above the dot.

    It's also worth noting that it's possible that the vertical alignment could be so far off and the nature of the scene be such that the horizontal scanning might actually pick the wrong place to align with. In that case, the vertical offset adjustments could potentially head off in the opposite direction from what you expect. I saw this in a few odd cases, especially with dull or repeating patterned backdrops.

    Finally, I did notice that there were some rare close-up scenes I tried to calibrate with in which the horizontal offset estimate was very good, but the vertical offset would move in the opposite direction from that desired. I never discovered the cause, but a minor adjustment of the cameras' direction would fix it.

    When I started making this algorithm, my aim was to experiment with ways to segment out different objects based on their distance from the camera. It quickly turned into a simple infinity-point calibration technique. What I like most about it is how autonomous it is. Just aim the cameras at some distant scenery, start the process, and let it run until it's satisfied that it has a consistent pair of offset values. When it's done, you can save the offset values in the registry or some other persistent storage and reuse them in subsequent sessions.

    8/11/2005 - Stereo vision: measuring object distance using pixel offset

    I've had some scraps of time here and there to put toward my stereo vision experiments. My previous blog entry described how to calibrate a stereo camera pair to find the X and Y offsets that map a position in the left camera's image to the same position in the right camera's image when both are looking at a far-off "infinity point". Once I had that, I knew it was only a small step to use the same basic algorithm to dynamically get the two cameras "looking" at the same thing, even when the subject is close enough for the two cameras to register a difference. And since I already have the vertical offset calculated, I was happy to see that the algorithm runs faster when it only has to search along this single horizontal "rail".

    The next logical step, then, was to see if I could work out the formula for how far from the cameras the thing they are looking at is. This is one of the core reasons for using a pair of cameras instead of one. I looked around the web for useful explanations or diagrams. I found lots of inadequate diagrams that show the blatantly obvious, but nothing complete enough to develop a solution from.

    So I decided to develop my own solution using some basic trigonometry. It took a while and a few headaches, but I finally got it down. I was actually surprised at how well it worked. I thought I should publish the method I used in detail so other developers can get past this more quickly. The following diagram graphically illustrates the concept and the math, which I explain further below.

    I suppose I'm counting on you knowing what the inputs are. If you do, skip the next paragraph. Otherwise, this may help.

    The pink star represents some object that the two cameras are looking at. Let's assume the cameras are perfectly aligned with each other. That is, when the object is sufficiently far away (say, 30 feet or more for cameras that are 2.5 inches apart) and you blend the images from both cameras, the result looks the same as if you just looked at the left or right camera image. But if you stick your hand in front of the camera pair at, say, 5 feet away and look at the combined image, you see two "hands" partly overlapping. Let's say you measure the X (horizontal) offset of one version of the hand from the other as being about 20 pixels. If you then change the code to overlap the pictures so that the right-hand one is offset by 20 pixels, the two hands overlap perfectly and it's the background scene that's doubled up. The middle section of the diagram above suggests this: the pink star is in different places in the left and right camera "projections", which are really just the images the cameras output. Now that you grasp the idea that the object seen by the two cameras is the same, just offset to different positions in each image, we can move on. Assume for now that we already have code that can measure the pixel offset I describe above.
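    Here is one minimal way that offset-measuring code could look, sketched in Python under stated assumptions: grayscale images stored as lists of rows, cameras aligned so that a shift of zero means "at infinity", the vertical offset from calibration held fixed so the search runs only along the horizontal rail, and a sum-of-absolute-differences (SAD) cost. The function name `rectangle_offset` and its parameters are hypothetical, not taken from the original code.

```python
def rectangle_offset(left, right, x0, y0, w, h, y_off, max_shift):
    """Return the horizontal shift (in pixels) of the best-matching right-image
    rectangle for the w x h left-image rectangle with top-left corner (x0, y0).

    left, right: grayscale images as lists of rows.
    y_off: fixed vertical offset found during infinity-point calibration.
    max_shift: shifts from -max_shift to +max_shift (inclusive) are tried.
    """
    def cost(dx):
        # Sum of absolute differences over the whole rectangle.
        return sum(abs(left[y][x] - right[y + y_off][x + dx])
                   for y in range(y0, y0 + h)
                   for x in range(x0, x0 + w))

    # Search only along the horizontal "rail".
    return min(range(-max_shift, max_shift + 1), key=cost)
```

    In this sketch a nearer object shows up as a negative shift, because its copy sits farther to the left in the right camera's image. For example, a bright bar at columns 10 to 12 in the left image and columns 7 to 9 in the right image yields -3.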

    Once I got through the math, I made a proof of concept rig to calculate distance. I simply tweaked the "factor" constant by hand until I started getting distances to things in the room that jibed with what my tape measure said. Then I went on to work the math backward so that I could enter a measured distance and have it calculate the factor, instead. I packaged that up into a calibration tool.
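    The trigonometry in the diagram isn't reproduced in the text, but for two parallel cameras aligned at infinity the standard pinhole-camera relation has the same shape: Z = f·B / d, where Z is the distance, f is the focal length in pixels, B is the distance between the cameras, and d is the pixel offset. Everything camera-specific folds into a single constant k = f·B, which is presumably close in spirit to the hand-tuned "factor" (an assumption, not something stated here), and working the formula backward from one tape-measured distance gives k = Z·d. A minimal Python sketch with hypothetical function names:

```python
def calibrate_factor(measured_distance, offset_pixels):
    """Work the formula backward from one tape-measured object: k = Z * d."""
    if offset_pixels == 0:
        raise ValueError("an object at 'infinity' has zero offset and cannot calibrate k")
    return measured_distance * abs(offset_pixels)


def distance_from_offset(offset_pixels, factor):
    """Pinhole stereo relation Z = k / d for parallel cameras aligned at infinity.
    A zero offset means the object is at the infinity point."""
    if offset_pixels == 0:
        return float("inf")
    return factor / abs(offset_pixels)
```

    For example, if a hand at 60 inches shows a 20-pixel offset, then k = 1200, and a later 30-pixel offset puts an object at 40 inches. Because d falls off as 1/Z, each pixel of offset covers more depth the farther away an object is, which is one reason low-resolution cameras struggle with distant objects.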

    I expected it would work fairly well, but I was truly surprised at how accurate it is, given the cheap cameras I have and the low resolution of the images they output. I found with objects I tested from two to ten feet away, the estimated distance was within two inches of what I measured using a tape measure. That's accurate enough, in my opinion, to build a crude model of a room for purposes of navigating through it, a common task in AI for stereo vision systems.

    I haven't yet tested how well this mechanism distinguishes the distances of objects that are very close to one another. We humans can easily discriminate depth differences of a millimeter on objects within two feet. These cameras are not that good, so I doubt they'll be as competent.

    So now I have a mechanism that does pretty well at taking a rectangular portion of a scene, finding the best match it can for that portion in the other eye, and using the estimated offset to calculate the distance. The next step, then, is to repeat this over the entire image with ever smaller rectangular regions. I can already see some important challenges, like what to do when a rectangle contains just a solid color or a repeating pattern, but these seem modest complications to an otherwise fairly simple technique. Cool.
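    One common way to catch both problem cases just mentioned is a uniqueness test on the matching costs along the horizontal rail: if some shift well away from the best one matches almost as well, the match is ambiguous and its distance shouldn't be trusted. A solid-color rectangle gives nearly equal costs everywhere, and a repeating pattern gives several deep minima. This is a widely used heuristic (OpenCV's block matcher exposes a similar "uniqueness ratio" setting), not something described in this entry, and the `margin` and `min_gap` thresholds below are arbitrary assumptions.

```python
def is_ambiguous(costs, margin=0.2, min_gap=1.0):
    """Return True when a match along the horizontal rail shouldn't be trusted.

    costs: matching cost (e.g. sum of absolute differences) for each candidate
    shift, in shift order. The best shift's immediate neighbours are ignored,
    since they are naturally close to the best cost; any other shift that comes
    within the margin suggests a flat region or a repeating pattern.
    """
    best_i = min(range(len(costs)), key=costs.__getitem__)
    best = costs[best_i]
    rivals = [c for i, c in enumerate(costs) if abs(i - best_i) > 1]
    if not rivals:
        return False
    return min(rivals) < best * (1 + margin) + min_gap
```

    For example, flat costs such as `[5, 5, 5, 5, 5]` and a repeating pattern such as `[40, 2, 40, 40, 3, 40]` are both flagged, while `[40, 30, 2, 35, 45]` is accepted.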

    8/14/2005 - Bob Mottram, crafty fellow

    I sometimes use my rickety platform here to review new technologies and web sites, but I haven't done enough to give kudos to the unusual people in AI who dot the world and sometimes find their way online. Bob Mottram is one such person who deserves mention.

    Who is Bob Mottram? He's a 33-ish-year-old British programmer who has developed a keen interest in the field of Artificial Intelligence. He seems fairly well read on a variety of current studies and technologies. What starts to make him stand out is his active participation in the field. Like me, he finds that many of the documents out there that describe AI technologies sound tantalizingly detailed but are actually very opaque when it comes to specifics. Unlike most, however, he takes this simply as a challenge to overcome. He designs and codes and experiments until his results start to look like what is described in the literature.

    The next thing that sets Mottram apart is his willingness to step outside the bounds of simply duplicating other people's work. He applies what he learns and hypothesizes about new ways of solving problems, going so far as to envision tackling the high goal of duplicating the inner workings of the brain in software.

    Perhaps what really sets Bob Mottram apart, for me, is his willingness to take his work public. His web site (http://www.fuzzgun.btinternet.co.uk/) is chock full not only of listings of projects he's worked on, but also of keen and easy-to-read insights on what he's learned along the way. He also has the admirable habit of peppering his material with links to related content as background and credit.

    Mottram's web site has a fascinating smattering of content about various projects he's worked on. The one that first got my attention was his "Rodney" project. Named after Rodney Brooks, creator of the famous Genghis and Cog robots, Rodney is Mottram's low-budget answer to Cog.

    Through successive iterations, Mottram has built Rodney into an ever more sophisticated piece of hardware, but more importantly, he has continued to experiment with a variety of different sensing and control techniques. His project web site documents many of these experiments. He also makes much of his source code available.

    What got my attention in the first place was his page on Rodney's vision system. Do a Google search on "robot stereo vision" or a variety of related terms and you're likely to find Bob Mottram's page on his research. It's not necessarily that his work is groundbreaking; it's just that he's one of the few people to really document his work. As I was doing background research for an upcoming introduction to machine vision, I found his site over and over again in relation to certain techniques he's implemented and documented.

    Seeing the general utility of the vision system he was creating for Rodney, Mottram moved on to his Sentience project. The primary goal was to extract and make open-source a software component that can use input from two cameras to construct a 3D model of what the eyes see.

    Mottram's web site includes plenty of other interesting and arcane experiments. Many are whimsical applications of his experiments with stereo vision and detecting motion and change in images, like a Space Invader type game where the player's image is transposed with the aliens or a program that detects people moving within a stationary webcam's field of view. Some delve deeper into new research, like his face detection and imitation work or his Robocore project.

    Finally, Mottram has his very own blog. It's not specifically for AI, but does include various insights into the subject from time to time.

    In all, I give Bob Mottram a good heap of credit for being a crafty fellow who is sincere in his belief in, and pursuit of, the goals of Artificial Intelligence. And he gets major kudos for sharing his work online for geeks like me. Do check out his web site.

    8/26/2005 - Introduction to Machine Vision

    I completely forgot to mention that on August 14th I published a brief introduction to machine vision. It's tailored to people who want to better understand the subject but haven't had much exposure beyond the popular media's thin portrayal of it. By contrast, much of what's written about the nuts and bolts is difficult to read because it requires complex math skills or otherwise expects you to already have a fairly strong background in the subject.

    I'm especially fond of demystifying subjects that look exceptionally complex. Machine vision often seems like a perfect example of pinheads making the world seem too complicated and their work more impressive than it really is. Sometimes it comes down to pure huckstery, as cheap tricks and carefully groomed examples are employed in pursuit of funding or publicity. Then again, there's an awful lot of very good and creative work out there. It's fun to show that much of the subject can be approachable even to novice programmers and non-programmers.

    I spent a few months putting the introduction together. I'm not entirely happy with the final result, as I imagined it would have a much broader scope. Ultimately, a lack of sufficient time to devote to it meant I had to leave out interesting applications of the basics like optical character recognition (OCR) and face recognition for fear that it would never be done.

    I am, however, starting work on a less ambitious project to address more esoteric topics in machine vision. I should begin publishing drafts of early material for it very soon.

    9/21/2005 - Topics in Machine Vision

    Once again, I forgot to announce a sub-site I created back on August 28th, which I call Topics in Machine Vision.

    Unlike my earlier Introduction to Machine Vision, it does not set out to give a broad overview of the subject matter. Instead, it's geared toward the researcher with at least some familiarity with the subject. Also, whereas I intended the introduction to stand complete on its own, Topics is more organic, meaning that I'll continue to add content to it as time passes.

    Knowing that this could become difficult to read and manage, I've broken Topics down into separate sections and pages. The first section I've fleshed out is on the Patch Equivalence (PE) concept I introduced in an earlier blog entry here. In fact, once I had introduced this topic in detail, I went back, ran some experiments applying the PE concept to stereo vision, and published the results, including tons of example images that demonstrate both the strengths and weaknesses of my implementation at the time.

    I intend to tackle plenty of other topics, including generic views, lighting effects, and application of the memory-prediction model.

    9/25/2005 - Some stereo vision illusions

    While engaging in some stereo vision experiments, I found myself a little stuck. I stopped working for a while and started staring at a wall on the opposite side of the room, pondering how my own eyes deal with depth perception. I crossed my eyes to study certain facets of my visual system.

    I got especially interested when I crossed my eyes so that the curtains on either side of the doorway were overlapped. I wasn't surprised to find my eyes were only too happy to lock the two together, given how similar they looked. I was, however, surprised to see how well my visual system fused various differences between the two images together into a single end product. It even became difficult to tell which component of the combined scene came from which eye without closing one eye.

    I thought it worthwhile to create some visual illusions based on some of these observations. To view them, you'll need to cross your eyes so that your right eye looks at the left image and vice-versa.

    This first figure, above, is just for practice. Both sides are identical and should fuse into a 3D scene of a doorway with curtains on either side. The curtains should appear recessed slightly behind the wall. If they appear in front, your eyes are not properly crossed.

    This second figure presents an interesting dilemma for your eyes. (If you have trouble focusing, try using the upper or lower corners of the door frame to get your eyes locked into the scene.) You know there are two curtains and your vision expects them, but one is missing from just one side. You may find the "phantom" curtain floats left and right and even forward and backward as your eyes go searching for its "other half". Interestingly, you'll find that much of the time, it doesn't appear to be "half as green". Rather than appear like a dimmer version of the curtain on the right, it should typically appear to have exactly the same color. It's as though your eyes ignore the black background and accept the green curtain.

    This figure is quite fascinating to me. Two dots on the left are half trimmed away in one eye, and another dot is totally missing in the other. As before, your visual system should accept that the dots really are there and that you're just having trouble finding them with one eye each. Again, the "phantom" dots are just as purple as the ones that have perfect mates on both sides. Notice that a phantom can be darker or lighter than its background without affecting this illusion. Also, you'll find the phantom dot on the right floats back and forth as your eyes try to find its mate, yet the same is not true for the half-dot phantoms. They seem to be solidly fixed horizontally by the rest of the dots. Interestingly, I find little chunks of these phantom dot-halves seem to come and go as my vision tries to decide whether they really should be whole dots or half dots. It almost seems to compromise by concluding that they are "flatter" dots: ellipses that are as wide as the other dots but a little shorter vertically.

    This figure is fairly straightforward. One curtain has phantom stripes just like the other one's. The stripes appear to veer back and forth a little. They also appear to be lighter most of the time.

    This one is a bit more subtle than the others. The number of stripes in one of the curtains is not the same in both images. This is because one of the curtain views has the stripes significantly farther apart. Your vision will probably fight over different interpretations. One is that the lines have varying spacing from outside to inside. A variant of this is that the curtain is actually a round column. Another is that the lines are somehow behind or in front of the curtain. What's most interesting to me is that my eyes never seem to give away the fact that there is a different number of lines for the left (5) and right (4) versions. My eyes are sure they find matches for each and every line.

    This figure presents a somewhat different illusion. One of the versions of the right curtain appears to have a blood-red stain dripping down from the top. Again, the colors don't really blend. Your vision should pick one color or the other. It most likely will pick the red "stain", though I find with some effort, I can make the red stain almost completely disappear. This only works for me when I stare right at the top of the right-hand curtain, where the red is. If I stare at the bottom of either curtain, the red stubbornly remains.

    This final figure is much more difficult to reconcile, I find. Because the right curtain's red and green alternatives are so different, my eyes frequently try shifting to find better candidates. If I stare at the top or bottom of that curtain, it helps to lock them together, though, suggesting that the corners are stronger features than the vertical edges of the curtain, alone. The color of this curtain never seems to stabilize. Curiously, when I stare at the lower part of the right curtain, it seems more likely to settle on green, yet the rest of the curtain above vacillates between green and red even while this part is stable. When I stare at the center, the whole bar is likely to go back and forth between the colors, but it's almost equally likely that the bar will appear to have shades of both colors at the same time. And blinking is almost certain to instantly disrupt whatever color it starts to settle on.

    I hope you find these stereo visual illusions thought provoking. For me, they shed light on the task ahead as I continue developing stereo vision software.

    10/8/2005 - Stereo disparity edge maps

    I've been experimenting further with stereo vision. Recently, I made a small breakthrough that I thought worth describing for the benefit of other researchers working toward the same end.

    One key goal of mine with respect to stereo vision has been the same as for most involved in the subject: being able to tell how far away things in a scene are from the camera or, at least, relative to one another. If you wear glasses or contact lenses, you've probably seen that test of your depth perception in which you look through polarizing glasses at a sheet of black rings and attempt to tell which one looks like it is "floating" above the others. It's astonishing to me just how little disparity there has to be between images in one's left and right eyes in order for one to tell which ring is different from the others.

    Other researchers have used a variety of techniques for getting a machine to have this sort of perception. I am currently using a combination of techniques. Let me describe them briefly.

    First, when the program starts up, the eyes have to get focused on the same thing. Both eyes start out with a focus box -- a rectangular region smaller than the image each eye sees and analogous to the human fovea -- that is centered on the image. The first thing that happens once the eyes see the world is that the focus boxes are matched up using a basic patch equivalence technique. In this case, a "full alignment" involves moving the right eye's focus patch in a grid pattern over the whole field of view of the right eye in large increments (e.g., 10 pixels horizontally and vertically). The best-matching place then becomes the center of a second scan in single pixel increments in a tighter region to find precisely the best matching placement for the right field of view.
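    The two-pass "full alignment" could be sketched in Python roughly like this: a coarse scan over the whole right image in 10-pixel steps, then a single-pixel scan in a window around the best coarse position. It's a minimal sketch, assuming grayscale images as lists of rows, a mean-absolute-difference cost, and a fine window extending one coarse step in each direction. The names `box_diff` and `full_alignment` are hypothetical.

```python
def box_diff(left, right, lx, ly, rx, ry, w, h):
    """Mean absolute difference between the w x h left box at (lx, ly)
    and the right box at (rx, ry); both positions are top-left corners."""
    total = 0
    for y in range(h):
        for x in range(w):
            total += abs(left[ly + y][lx + x] - right[ry + y][rx + x])
    return total / (w * h)


def full_alignment(left, right, lx, ly, w, h, step=10):
    """Find the top-left corner in the right image whose w x h box best matches
    the left focus box at (lx, ly): a coarse grid pass, then a single-pixel pass."""
    rows, cols = len(right), len(right[0])

    def best_of(candidates):
        return min(candidates, key=lambda p: box_diff(left, right, lx, ly, p[0], p[1], w, h))

    # Coarse pass: every `step` pixels across the whole right field of view.
    coarse = best_of([(x, y)
                      for y in range(0, rows - h + 1, step)
                      for x in range(0, cols - w + 1, step)])
    # Fine pass: every pixel within one coarse step of the coarse winner.
    cx, cy = coarse
    fine = [(x, y)
            for y in range(max(0, cy - step), min(rows - h, cy + step) + 1)
            for x in range(max(0, cx - step), min(cols - w, cx + step) + 1)]
    return best_of(fine)
```

    With 10-pixel steps the coarse pass evaluates only about one candidate position in a hundred, which is where most of the savings over an exhaustive single-pixel scan come from. The trade-off is that a very small or very repetitive focus box can lead the coarse pass into a wrong neighborhood that the fine pass never leaves.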

    The full alignment operation is expensive in terms of time: about three seconds on my laptop. With every tenth snapshot taken by the eyes, I perform a "horizontal alignment", a trimmed-down version of the full alignment. This time, however, the test does not move the right focus box up or down relative to its current position, only left and right. This, too, can be expensive: about one second for me. So finally, with each snapshot taken, I perform a "lite" horizontal alignment, which involves looking just a little to the left and to the right of the focus box's current position. This takes less than a second on my laptop, which makes it cheap enough to run as a standard step with every snapshot. The result is that the eyes generally line their focus boxes up quickly on the objects in the scene as they are pointed at different viewpoints. If the jump is too dramatic for the lite horizontal alignment, the periodic full horizontal alignment eventually corrects for it.
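    The schedule above might be expressed like this in Python. It's a hedged sketch: it assumes the full alignment runs only on the first snapshot, that the lite pass also runs on snapshots that get a full or horizontal alignment, and that "a little to the left and to the right" means three pixels. The function names are hypothetical placeholders.

```python
def alignments_for_snapshot(index, horizontal_every=10):
    """Return the alignment passes to run for snapshot `index` (0-based), heaviest first."""
    passes = []
    if index == 0:
        passes.append("full")        # startup: coarse grid scan plus single-pixel refinement
    elif index % horizontal_every == 0:
        passes.append("horizontal")  # scan left and right only, never up or down
    passes.append("lite")            # small left/right check around the current position
    return passes


def lite_candidates(current_x, reach=3):
    """Horizontal positions the lite pass examines: a few pixels either side."""
    return list(range(current_x - reach, current_x + reach + 1))
```

    Running the cheap pass on every snapshot keeps the focus boxes locked on during small movements, while the periodic horizontal pass recovers from jumps larger than the lite pass's reach.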

    Once the focus boxes are lined up, the next step is clear. For each part of the scene that is in the left focus box, look for its mate in the right focus box. Then calculate how many pixels the left and right versions are offset from each other. Features with zero offset are at a "neutral" distance relative to the focus boxes. Features whose right-hand versions have positive offsets (a little to the right) are probably farther away, and those whose right-hand versions have negative offsets (a little to the left) are probably closer. This much is conventional wisdom. And the math is simple enough that one can even estimate absolute distances from the camera, given that some numeric factors about the cameras are known in advance.

    The important question, then, is how to match features in the left focus box with the same features in the right. I chose to use a variant of the same patch equivalence technique I use for lining up the focus boxes. In this case, I break down the left focus box into a lot of little patches -- one for each pixel in the box. Each patch is about 9 pixels wide. What's interesting, though, is that I'm using 1-dimensional patches, which means each patch is only one pixel high. For each patch in this tight grid of (overlapping) patches in the left focus box, there is a matching patch in the right focus box, too. Initially, its center is exactly the same as for the left one, relative to the focus box. For each patch in the left side, then, we move its right-hand mate from left to right from about -4 to +4 pixels. Whichever place yields the lowest difference is considered the best match. That place, then, is considered to be where the right-hand pixel is for the one we're considering on the left, and hence we have our horizontal offset.
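    A minimal Python sketch of this per-pixel, one-row patch matching for a single image row. It assumes grayscale rows stored as lists, a 9-pixel patch (four pixels either side of the center), a search from -4 to +4 pixels, and a sum-of-absolute-differences cost; pixels too close to the row ends to fit a full patch and search are simply skipped. The function name `row_offsets` is hypothetical.

```python
def row_offsets(left_row, right_row, half=4, reach=4):
    """For each pixel in left_row, find the horizontal offset (-reach..+reach)
    at which a (2*half + 1)-pixel, one-pixel-high patch best matches right_row.
    Returns a dict {x: offset}; pixels too close to the edges are skipped."""
    offsets = {}
    margin = half + reach
    for x in range(margin, len(left_row) - margin):
        left_patch = left_row[x - half:x + half + 1]

        def cost(dx):
            # Sum of absolute differences against the right patch shifted by dx.
            right_patch = right_row[x + dx - half:x + dx + half + 1]
            return sum(abs(a - b) for a, b in zip(left_patch, right_patch))

        offsets[x] = min(range(-reach, reach + 1), key=cost)
    return offsets
```

    For example, if a short feature in one row of the right image sits two pixels to the right of where it appears in the left image, every pixel whose patch overlaps the feature gets an offset of +2.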

    For the large fields of homogeneous color in a typical image, it doesn't make sense to use patch equivalence testing. It makes more sense to focus instead on the strong features in the image. So to the above, I added a traditional Sobel edge detection algorithm. I use it to scan the right focus box, but I only use the vertical test. That means I find strong vertical edges and largely ignore strong horizontal edges. Why do this? Stereo disparity tests with two eyes side by side only work well with strong vertical features, because a horizontal edge looks the same wherever along its length you try to match it. So only pixels in the image that get high values from the Sobel test are considered using the above technique.
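    The vertical-edge filter might be sketched in Python like this. It uses the standard 3x3 Sobel kernel for the horizontal intensity gradient, which is the half of the Sobel operator that responds to vertical edges, and then applies a threshold to decide which pixels get stereo matching. The threshold of 100 and the function name are assumptions; the entry doesn't say what cutoff was used.

```python
# Sobel kernel for the horizontal gradient; it responds strongly to vertical edges.
SOBEL_X = ((-1, 0, 1),
           (-2, 0, 2),
           (-1, 0, 1))


def vertical_edge_mask(image, threshold=100):
    """Return a same-sized grid of booleans marking pixels with a strong vertical edge.
    Border pixels are left False because the 3x3 kernel doesn't fit there."""
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mask[y][x] = abs(gx) >= threshold
    return mask
```

    An image that is dark on the left and bright on the right lights up the mask along the boundary, while an image that is dark on top and bright below produces no marked pixels at all.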

    This whole operation takes a little under a second on my laptop -- not bad.

    Following are some preliminary image sets that show test results. Here's how to interpret them. The first two images in each set are the left and right fields of view, respectively. The third is a "result" image: it shows features within the focus box and indicates their relative distance to the camera. Strongly green features are closer, strongly red features are farther away, and black features are at relatively neutral distances with respect to the focus box pair. The largely white areas have few strong vertical features and are hence ignored in the tests.
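    One way to render such a result image, sketched in Python. The offset sign follows the convention described earlier in this entry (negative offsets are probably closer, positive ones probably farther), edge-free pixels are white, and color intensity grows with the offset's size up to the ±4-pixel search range. The linear scaling, pure RGB green and red, and the function name are my assumptions, not a description of the original rendering code.

```python
def disparity_color(offset, is_edge, reach=4):
    """RGB colour for one pixel of the result image.
    Non-edge pixels are white; zero offset is black; negative offsets (probably
    closer) shade toward green and positive offsets (probably farther) toward red."""
    if not is_edge:
        return (255, 255, 255)
    strength = min(abs(offset), reach) * 255 // reach
    if offset < 0:
        return (0, strength, 0)
    return (strength, 0, 0)
```

    For instance, an edge pixel with offset 0 comes out black, -4 comes out full green, and +2 comes out a half-strength red `(127, 0, 0)`.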

    In all, I'm impressed with the results. One can't say that the output images are unambiguous in what they say about perceived relative distance. Some far-away objects show tinges of green and some nearby objects show tinges of red, which of course doesn't make sense. Yet overall, there are strong trends that suggest this technique is actually working. With some good engineering, the quality of results can be improved. Better cameras wouldn't hurt, either.

    One thing I haven't addressed yet is the "white" areas. A system based on this might see the world as though it were made up of "wire frame" objects. If I want a vision system that's aware of things as being solid and having substance, it'll be necessary to determine how far away the areas between the sharp vertical edges are, too. I'm certain that a lot of that has to do with inferences our visual systems make based on the known edges and knowledge of how matter works. Obviously, I have a long way to go to achieve that.

    10/29/2005 - Using your face and a webcam to control a computer

    I don't normally do reviews of ordinary products. Still, I tried out an interesting one recently that makes practical use of a fairly straightforward machine vision technique that I thought worth describing.

    The product is called EyeTwig (www.eyetwig.com), and it is billed as a "head mouse". That is, you put a camera near your computer monitor, aim it at your face, and run the program. Then, when you move your head left and right, up and down, the Windows cursor, normally controlled by your mouse, moves about the screen in a surprisingly intuitive and smooth fashion.

    Most people would recognize the implication that this could be used by the disabled. I thought about it, though, and realized that this application is limited mainly to those without mobility below the neck, and many people in that situation have limited mobility of their heads as well. Then again, a niche market is still a market. I think the product's creator sees that the real potential lies in an upcoming version that will also be useful as a game controller.

    In any event, the program impressed me enough that I wondered how it works. The vendor was unwilling to tell me in detail, but I took a stab at hypothesizing how it works and ran some simple experiments. I think the technique is fascinating by itself, but it could also be used in kiosks, military systems, and various other interesting applications.

    When I first saw EyeTwig in action, I was impressed. I wondered what sorts of techniques it might use to recognize a face and detect that the face's orientation is changing. The more I studied how it behaved, though, the more I realized it uses a very simple set of techniques. I realized, for example, that it ultimately uses 2D techniques and not 3D techniques. Although the instructions are to tilt your head, I found that simply shifting my head left and right, up and down worked just as well.

    Face recognition by machines is now a fairly conventional process. My understanding is that most techniques start by searching for the eyes. Human eyes almost always appear as two dark patches (eye sockets are usually shadowed) of similar size, roughly side by side, with a fairly consistent ratio between their size and the distance separating them. So programs find candidate pairs of patches, assume they are eyes, and then look for the remaining facial features in relation to them.
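    The eye-pair heuristic described above can be sketched as a filter over dark blobs. This assumes an earlier step (thresholding plus connected-component labeling, say) has already reduced each dark region to a center and a rough size; the `Blob` type, the function name, and every threshold are illustrative assumptions rather than values from any real face detector.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Blob:
    x: float     # center x, in pixels
    y: float     # center y, in pixels
    size: float  # rough diameter, in pixels


def eye_pair_candidates(blobs, max_size_ratio=1.5, max_tilt=0.25,
                        min_gap=1.5, max_gap=4.0):
    """Return (left, right) pairs of dark blobs that could plausibly be eyes."""
    pairs = []
    for a, b in combinations(blobs, 2):
        left, right = (a, b) if a.x <= b.x else (b, a)
        dx = right.x - left.x
        if dx == 0:
            continue
        # Eyes are of similar size.
        big, small = max(a.size, b.size), min(a.size, b.size)
        if small <= 0 or big / small > max_size_ratio:
            continue
        # Eyes sit roughly side by side: small vertical offset relative to spacing.
        if abs(right.y - left.y) / dx > max_tilt:
            continue
        # Spacing, measured in eye sizes, falls in a fairly tight range.
        gap = dx / ((a.size + b.size) / 2)
        if not (min_gap <= gap <= max_gap):
            continue
        pairs.append((left, right))
    return pairs
```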

    EyeTwig appears to be no different. In addition to finding eyes, though, I discovered that it looks for what I'll loosely call a "chin feature". It could be a mustache, a mouth, or some other dark, horizontal feature directly below the eyes. I discovered this by experimenting with abstract drawings of the human face. My goal was to see how minimal a drawing could be and still work with EyeTwig. The figure at right shows one of the minimal designs that worked very well: a small whiteboard with two vertical lines for eyes and one horizontal line for a "chin". When I slid the board left and right or up and down, EyeTwig moved the cursor as expected.
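    The "chin feature" observation suggests a second check layered on top of an eye pair: is there a dark feature, wider than it is tall, roughly centered below the eyes? The sketch below is only a guess at the kind of rule that would let the whiteboard drawing pass; the box representation, the function name, and the distance limits (measured in eye spacings) are assumptions.

```python
def has_chin_feature(left_eye, right_eye, features,
                     min_drop=0.8, max_drop=2.5, max_shift=0.3):
    """Check for a horizontal dark feature below a candidate eye pair.

    left_eye, right_eye: (x, y) eye centers in image coordinates (y grows down)
    features: iterable of (x, y, width, height) dark features, boxes given by center
    Distances are measured in units of the horizontal eye spacing.
    """
    eye_gap = right_eye[0] - left_eye[0]
    if eye_gap <= 0:
        return False
    mid_x = (left_eye[0] + right_eye[0]) / 2
    mid_y = (left_eye[1] + right_eye[1]) / 2
    for x, y, w, h in features:
        drop = (y - mid_y) / eye_gap       # how far below the eyes
        shift = abs(x - mid_x) / eye_gap   # how far off-center
        if w > h and min_drop <= drop <= max_drop and shift <= max_shift:
            return True
    return False
```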

    Testing the program was much easier because the border of its viewer switches between red and green to indicate whether it recognizes what it sees as a face.

    In short, EyeTwig employs an ingenious yet simple technique for recognizing that a face is prominently featured in the view of an ordinary webcam, and no special training of the software is required. For anyone looking to deploy practical face recognition applications, it seems to offer an interesting illustration of the technique.

    11/3/2005 - A standardized test of perceptual capability

    I've been getting too lost in the idiosyncrasies of machine vision lately and losing sight of my more important focus: intelligence, per se. I'm changing direction now.

    My recent experiences have shown me that one area we haven't handled well is perceptual-level intelligence. We have great sensors and cool algorithms for generating interesting but primitive information about the world. Edge detection, for example, can be used to generate a series of lines in a visual scene. But so what? Lines are nearly as disconnected from intelligence as raw pixel colors are.

    Where do primitive visual features become percepts? Naturally, we have plenty of systems designed to instantly translate visual (or other sensory) information into known percepts. Put little red dots around a room, for instance, and a visual system can easily key in on them as markers in a controlled-environment system. This is the sort of thinking used in vision-based quality control systems, too.

    But what we don't have yet is a way for a machine to learn to recognize new percepts and to characterize and predict their behavior. I've spent many years thinking about this problem. While I can't say I have a complete answer yet, I do have some ideas, and I want to try them out. Recently, while thinking about the problem, I formulated an interesting way to test a perceptual-level machine's ability to learn and make predictions. I think it can be readily reproduced on many other systems and extended for ever more capable ones.

    The test involves a very simplified visual world composed of a black rectangular "planet" populated by a white "ball". The ball, a small circle whose size never changes, moves around this 2D world in a variety of ways that, for the most part, are formulaic. One way, for example, might be thought of as a ball in a box in space. Another can be thought of as a ball in a box standing upright on Earth, meaning it bounces around in parabolic paths as though in a gravitational field. Other variants might involve random adjustments to velocity, just to make prediction more difficult.
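    A Level I world is easy to simulate. The sketch below implements the two formulaic behaviors described, a ball in a box in space and a ball in a box under gravity, with simple elastic bounces off the walls. The behavior names `space_box` and `gravity_box`, the world size, the ball radius, the gravity constant, and the starting position and velocity are all assumptions for illustration, not part of any formal specification.

```python
def step_ball(pos, vel, behavior, width=320.0, height=240.0, radius=5.0, g=0.5):
    """Advance the ball by one moment; return (new_pos, new_vel)."""
    x, y = pos
    vx, vy = vel
    if behavior == "gravity_box":
        vy += g  # constant downward acceleration (y grows downward)
    elif behavior != "space_box":
        raise ValueError(f"unknown behavior: {behavior}")
    x, y = x + vx, y + vy
    # Bounce elastically off the walls by reflecting any overshoot.
    if x < radius:
        x, vx = 2 * radius - x, -vx
    elif x > width - radius:
        x, vx = 2 * (width - radius) - x, -vx
    if y < radius:
        y, vy = 2 * radius - y, -vy
    elif y > height - radius:
        y, vy = 2 * (height - radius) - y, -vy
    return (x, y), (vx, vy)


def run_world(behavior, steps, pos=(50.0, 50.0), vel=(3.0, 2.0)):
    """Return the ball's positions over `steps` moments, starting position included."""
    positions = [pos]
    for _ in range(steps):
        pos, vel = step_ball(pos, vel, behavior)
        positions.append(pos)
    return positions
```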

    The test "organism" would be able to see the whole of this world. It would have a "pointer". Its goal would be to move this pointer to wherever it believes the ball will be in the next moment. It would be able to tell where the pointer currently points using a direct sense separate from its vision.

    Predicting where the ball will be is a very interesting test of an organism's ability to learn to understand the nature of a percept. Measuring the competency of a test organism would be very easy, too. For each moment, there is a prediction, in the form of the pointer pointing to where the organism believes the ball will be in the next moment. When that moment comes, the distance between the predicted and actual positions of the ball is calculated. For any given series of moments, the average distance would be the organism's score in that context.
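    The scoring rule amounts to a mean Euclidean distance. A minimal sketch, assuming predictions and actual positions are given as parallel lists of (x, y) points, where `predictions[i]` is the guess made for some moment and `actuals[i]` is where the ball actually turned out to be at that moment:

```python
import math


def prediction_score(predictions, actuals):
    """Average distance between predicted and actual ball positions (0 is perfect)."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need equal-length, non-empty sequences")
    total = sum(math.dist(p, a) for p, a in zip(predictions, actuals))
    return total / len(predictions)
```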

    It would be easy for different researchers to compare their test organisms against others', but it would take a little care to put each test in a clear context. The context would be defined by a few variables. First is the ball behavior algorithm used. Each such behavior should be given a unique name and a formal description that can be easily implemented in just about any programming language. Second is the number of moments used to "warm up", which we'll call the "warm-up period". It should take a while for an organism to learn about the ball's behavior before it can be any good at making predictions. Third is the "test period"; i.e., the number of moments after the warm-up period in which test measurements are taken. The final score in this context, then, would be the average of all the distances measured between predicted and actual positions.
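    The three context variables map naturally onto a small test harness. In this sketch, an organism is any function that receives the ball positions seen so far and returns its guess for the next moment, and `positions` is a trajectory produced by some named ball behavior; that interface and the name `run_test` are assumptions made for illustration. Guesses made during the warm-up period are requested but not scored.

```python
import math


def run_test(organism, positions, warm_up, test_period):
    """Score an organism on a recorded ball trajectory.

    organism:    callable(history) -> (x, y) guess for the next moment
    positions:   ball positions, one per moment, from some named ball behavior
    warm_up:     number of moments the organism may learn before scoring begins
    test_period: number of scored moments after the warm-up period
    """
    if test_period < 1 or warm_up + test_period >= len(positions):
        raise ValueError("trajectory too short for this context")
    distances = []
    for t in range(warm_up + test_period):
        guess = organism(positions[: t + 1])  # sees moments 0..t only
        if t >= warm_up:                      # warm-up guesses are not scored
            distances.append(math.dist(guess, positions[t + 1]))
    return sum(distances) / len(distances)
```

    Plugging in a lazy organism (`lambda h: h[-1]`) yields the lazy score for the same context, which makes the comparison described next straightforward to compute.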

    Two standard figures should be disclosed with any given test results. One is the best possible score, 0, which means the predictions are always correct. The second is the score a "lazy" organism would achieve. In this case, a lazy organism is one that always guesses that the ball will be in the same place in the next moment as it is now. Naturally, a naive organism would do worse than this cheap approximation, but a competent organism should do better. The "lazy score" for a specific test run would be calculated as the average of all distances from each moment's ball position to the next moment's position. A weighted score for the organism could then be calculated as the ratio of its actual score to the lazy score. A value of zero would be the best possible. A value of one would indicate that the predictions are no better than the lazy guess. A value greater than one would indicate that the predictions are actually worse than the lazy algorithm.
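    The lazy score and the weighted score follow directly from these definitions. A short sketch, assuming `positions` covers the moments being scored plus the one moment after them, and that the organism's own score was computed over the same moments:

```python
import math


def lazy_score(positions):
    """Mean distance from each moment's ball position to the next moment's position."""
    if len(positions) < 2:
        raise ValueError("need at least two positions")
    steps = list(zip(positions, positions[1:]))
    return sum(math.dist(a, b) for a, b in steps) / len(steps)


def weighted_score(organism_score, positions):
    """0 is perfect, 1 is no better than the lazy guess, above 1 is worse."""
    lazy = lazy_score(positions)
    if lazy == 0:
        raise ValueError("ball never moved; weighted score is undefined")
    return organism_score / lazy
```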

    Some might quip that I'm just proposing a "blocks world" type experiment and that an "organism" competent to play this game wouldn't have to be very smart. I disagree. Yes, a programmer could preprogram an organism with all the knowledge it needs to solve the problem and even get a perfect score. A proper disclosure of the algorithm used would let fellow researchers quickly disqualify such trickery. So would testing that single program against novel ball behaviors. What's more, I think a sincere attempt to develop organisms that can solve this sort of problem in a generalizable way will result in algorithms that can be generalized to more sophisticated problems like vision in natural settings.

    Naturally, this test can also be extended in sophistication. Perhaps there could be a series of levels defined for the test. This might be Level I. Level II might involve multiple balls of different colors. And so on.

    I will probably draft a formal specification for this test soon. I welcome input from others interested in the idea.

    11/10/2005 - Neuron banks and learning

    I've been thinking more about perceptual-level thinking and how to implement it in software. In doing so, I've started formulating a model of how cortical neural networks might work, at least in part. I'm sure it's not an entirely new idea, but I haven't run across it in quite this form, so far.

    One of the key questions I ask myself is: how does human neural tissue learn? And, building on Jeff Hawkins' memory-prediction model, I came up with at least one plausible answer. First, however, let me say that I use the term "neuron" here loosely. The mechanisms I ascribe to individual neurons may turn out to be more a function of groups of them working in concert.

    Let me start with the notion of a group of neurons in a "neural bank". A bank is simply a group of neurons that are all looking at the same inputs, as illustrated in the following figure:

    Figure: Schematic view of a neuron bank.

    Perhaps it's a region of the input coming from the auditory nerves. Or perhaps it's looking at more refined input from several different senses. Or perhaps even a more abstract set of concepts at a still higher level. It may not be that there are large numbers of neurons that all look at the same chunk of inputs -- it may be more messy than that -- but this is a helpful idea, as we'll soon see. Further, while I'll speak of neural banks as though they all fall into a single "layer" in the sense that traditional artificial neural networks are arranged, it's more likely that this neural bank idea applies to an entire patch of 6-layered cortical tissue in one's brain. Still, I don't want to get mired in such details in this discussion.

    Each neuron in a bank is hungry to contribute to the whole process. In a naive state, they might all simply fire, but such a cacophony would probably be counterproductive. In fact, our neural banks could be hard-wired to favor having a minimal number of neurons in a bank firing at any given time -- ideally, zero or one. So each neuron is eager to fire, but the bank, as a whole, doesn't want them to fire all at once.

    These two forces act in tension to balance things out. How? Imagine that when a neuron in a bank fires, its signal tends to suppress the other neurons in the bank. Suppress how? In two ways: firing and learning. When a neuron is highly sure that it is perceiving a pattern it has learned, it fires very strongly. Other neurons that may be firing because they have weak matches would fall silent in deference to these louder neurons, on the assumption that the louder neurons must have more reason to be sure of the patterns they perceive. Consider the following figure, modified from the one above to show this feedback:

    Figure: Schematic view of a neuron bank.
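    In software, the simplest stand-in for this mutual suppression is a winner-take-all rule: each neuron computes how strongly it matches, and only the strongest is allowed to fire, provided it clears a minimum threshold; otherwise the bank stays silent, in keeping with the preference for zero or one active neurons. This is an illustrative simplification, not a claim about real cortical circuitry, and the threshold and function name are assumptions.

```python
def bank_output(match_strengths, threshold=0.5):
    """Return the index of the single neuron allowed to fire, or None.

    match_strengths: one value in [0, 1] per neuron in the bank.
    The strongest neuron suppresses all the others; if even the strongest
    is below the threshold, no neuron fires.
    """
    if not match_strengths:
        return None
    winner = max(range(len(match_strengths)), key=match_strengths.__getitem__)
    return winner if match_strengths[winner] >= threshold else None
```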

    But what about learning? What does a neuron learn, and why would we want other neurons to suppress it? First, what a neuron learns is one or more patterns. For simplicity, let's say it's a simple binary pattern. For each of a neuron's dendritic synapses receiving input from outside axons, we'll say the neuron either cares or doesn't care about that input and, if it does care, it prefers either a firing or a not-firing value. The following figure illustrates this schematically:

    Figure: Schematic view of a neuron bank.

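    Following is a logical behavior table for a single synapse the neuron cares about. It is equivalent to a logical exclusive-NOR (XNOR), or equivalence, operation: the input matches when it equals the preferred value.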

    Preferred input   Actual input   Matches?
    0                 0              Yes
    0                 1              No
    1                 0              No
    1                 1              Yes

    Let's describe the desired input pattern as a string of zeros (not firing), ones (firing), and x's (don't care). For example, a neuron might prefer to see "x x 0 x 1 0 x 1 0 0 x 0 x x 1". When it sees this exact pattern, it fires strongly. But suppose one of the inputs it cares about doesn't match. It still fires, but not as strongly. If another neuron is firing more strongly, this one shuts up.
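    The matching rule can be sketched in a few lines. Each cared-about synapse counts as a match when its input equals the preferred value (as in the table), "x" synapses are skipped, and the firing strength is taken to be the fraction of cared-about synapses that match. Using that fraction as the strength is an assumption; the text says only that a near match fires less strongly than an exact one.

```python
def match_strength(pattern, inputs):
    """Fraction of cared-about synapses whose input equals the preferred value.

    pattern: string of '0', '1' and 'x' (don't care); spaces are ignored
    inputs:  string of '0' and '1' of the same length; spaces are ignored
    """
    pattern = pattern.replace(" ", "")
    inputs = inputs.replace(" ", "")
    if len(pattern) != len(inputs):
        raise ValueError("pattern and input lengths differ")
    cared = [(p, a) for p, a in zip(pattern, inputs) if p != "x"]
    if not cared:
        return 0.0
    matches = sum(1 for p, a in cared if p == a)  # equality, i.e. XNOR per synapse
    return matches / len(cared)
```

    For the example pattern above, which cares about 8 of its 15 inputs, an exact match scores 1.0 and a single mismatch on a cared-about input scores 7/8.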

    That's what's learned, but not how it's learned. Let's consider that more directly. A neuron that fires on a regular basis is "happy" with what it knows. It's useful, and it doesn't seem to need to learn anything else. But what about a neuron that never gets a chance to fire because its pattern doesn't match much of anything? I argue that this "unhappy" neuron wants very much to be useful. It searches for novel patterns. What does this mean? There are many possible mechanisms, but let's consider just one. We'll assume all the neurons started out with random synaptic settings (0, 1, or x). Now let's say there is a certain combination of inputs for which no neuron in the bank shouts out "I got this one". Some of these neurons see that some of the inputs do match. These neurons are inclined to believe that this input is probably a pattern that can be learned, so they change some of their "wrong" settings to better match the current input. The stronger the match already is for a given unhappy neuron, the more changes that neuron is likely to make to conform to this new input.

    Now let's say this particular combination of input values (0s and 1s) continues to appear. At least one neuron will grow so biased toward matching that pattern that it eventually starts shouting out like the other "happy" neurons do.
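    Here is one possible rendering of that learning rule, assuming the caller has already determined that no neuron claimed the current input. Each unhappy neuron that matches at least some of its cared-about inputs changes a number of its mismatched synapses toward the input, and the better it already matches, the more it changes. The proportional rule, the random choice of which synapses to change, the function name, and the list-of-characters representation are all assumptions.

```python
import random


def learn_unclaimed(patterns, inputs, happy, rng=None):
    """Nudge unhappy neurons toward an input that no neuron claimed.

    patterns: list of neuron patterns, each a list of '0', '1' or 'x' (modified in place)
    inputs:   list of '0'/'1' values for the current input
    happy:    set of indices of neurons that fire regularly and so do not adapt here
    """
    rng = rng or random.Random()
    for i, pattern in enumerate(patterns):
        if i in happy:
            continue
        cared = [k for k, p in enumerate(pattern) if p != "x"]
        wrong = [k for k in cared if pattern[k] != inputs[k]]
        # Skip perfect matches (they would already be firing) and neurons
        # that match none of the inputs they care about.
        if not wrong or len(wrong) == len(cared):
            continue
        match = 1 - len(wrong) / len(cared)
        # Better-matching neurons make more changes toward this input.
        n_changes = max(1, round(match * len(wrong)))
        for k in rng.sample(wrong, n_changes):
            pattern[k] = inputs[k]
```

    Presenting the same unclaimed input repeatedly drives each adapting neuron toward matching it on every synapse it cares about, at which point it would start winning the bank and, in the terms used above, become "happy".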

    This does seem to satisfy a basic definition of learning, but it leaves many questions unanswered. One is: how does a neuron decide whether or not to care about an input? I don't know, but here's one plausible answer. A neuron, whether "happy" or "unhappy" with what it knows, can allow its synaptic settings to change over time. Consider a happy one. It continues to see its favored pattern and fires whenever it does. Seeing no other neurons contending to be the best at matching its pattern, it is free to continue learning in a new way. In particular, it looks for patterns at the individual synapse level. If one synaptic input always has the same value whenever this neuron fires, the neuron favors setting that synapse to "do care". If, conversely, that input changes with some regularity, the neuron favors setting it to "don't care".
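    A batch version of that idea is sketched below, assuming a happy neuron keeps a record of the inputs present each time it fired. A synapse whose input was consistently the same becomes a "care" synapse preferring that value; one whose input varied becomes "x". The 90% consistency threshold and the function name are assumptions, and a more brain-like version would presumably adjust gradually rather than all at once.

```python
def update_cares(pattern, firing_inputs, consistency=0.9):
    """Re-decide care/don't-care for each synapse of a happy neuron.

    pattern:       current pattern, a string of '0', '1' and 'x'
    firing_inputs: '0'/'1' input strings observed whenever this neuron fired
    A synapse whose input had the same value in at least `consistency` of
    those firings prefers that value; any other synapse becomes 'x'.
    """
    if not firing_inputs:
        return pattern  # no evidence yet, so keep the current settings
    new = []
    for k in range(len(pattern)):
        ones = sum(1 for inp in firing_inputs if inp[k] == "1")
        frac_ones = ones / len(firing_inputs)
        if frac_ones >= consistency:
            new.append("1")
        elif 1 - frac_ones >= consistency:
            new.append("0")
        else:
            new.append("x")
    return "".join(new)
```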

    Interestingly, this leads to a new set of possible contentions and opportunities for new knowledge. One key problem in conceptualization is learning when two concepts should be merged and when one concept should be subdivided into narrower ones. When do you learn that two different dogs actually belong to the same group of objects called "dogs"? And why do you decide that a chimpanzee, which looks like a person, is really a wholly new kind of thing that deserves its own concept?

    Imagine that one neuron in a bank has mastered the art of recognizing a basset hound. And let's say that's the only kind of dog this brain has ever seen. It has seen many different bassets, but no other breed. This neuron's pattern recognition is greedy, treating all the particular facets of bassets as essential to what dogs are all about. Then, one day, this brain sees a Doberman pinscher for the first time. To this neuron, it seems very much like a basset, but enough features differ to cast some doubt. Still, no other neuron is firing strongly, so this one might as well fire, considering itself to have the best guess. This neuron is strongly invested in a specific kind of dog, though. It would be worthwhile to have another neuron devoted to recognizing this other kind of dog. What's more, it would be valuable to have yet another neuron that recognizes dogs more generally. How would that come about?

    In theory, there are other neurons in this bank that are hungry to learn new patterns. One of them could see the lack of a strong response from any other neuron as an opportunity to learn either the more specific Dobie pattern or the more general dog pattern.

    One potential problem is that neurons that detect more specific features (bassets versus all dogs, for example) might tend to crowd out more general concepts like "dog". There must be some incentive for the general neurons to persist. One explanation could be frequency. The dog neuron might not have as many matching features to consider as the basset neuron does, but if this brain sees lots of different dogs and only occasionally bassets, the dog neuron would get exercised more frequently, even if it doesn't shout the loudest when a basset is seen. So perhaps both frequency and strength of matching are strong signals to a neuron that it has learned well.

    I have no doubt that there's much more to learning, and to the neocortex more generally. Still, this seems a plausible model for how learning could happen there.

    4/7/2007 - A respectful critique of the Hierarchical Temporal Memory (HTM) concept

    I've been away from this too long, distracted by other things in my life. I've missed it. Lately, I've been finding myself getting excited again to the point of getting distracted from those other things and back in this world.

    The most interesting development in artificial intelligence of late, to my thinking, is the recent release of Numenta's Hierarchical Temporal Memory algorithm, largely the brainchild of Dileep George and inspired by the ideas of Jeff Hawkins, author of On Intelligence. Having been so disappointed by artificial neural networks, expert systems, and various other "traditional" approaches to AI, I found the ideas presented by Hawkins refreshing and exciting, so I joined Numenta's mailing list and eagerly awaited the arrival of its promised products.

    Now that the NuPIC platform and related tools have been released, Numenta has also published various white papers on how it actually works. In refreshing contrast to the mind-numbing gibberish of some proprietary systems' white papers (e.g., PILE's) and the math-heavy tomes on Bayesian networks and neural networks, these documents clearly describe what HTMs actually do and how they do it. The one I found most penetrating was coauthored by Dileep George and titled The HTM Learning Algorithms. So far, it is the best document I have read on the subject, though admittedly, it helps to be familiar with the HTM concept at a high level.

    I am about halfway through reading this 44-page PDF. I had to stop, partly because I'm distracted by my own work and, frankly, inspired by what I've found in this document. I finally "get" how an HTM learns, which I had missed the whole time I've been aware of HTMs. But to my surprise, I've already formed some troubling questions in the process that I want to document before I forget. I want to pose them here to help further the discussion of the value of HTMs and perhaps promote their improvement.

    Section 4 describes how an HTM node is exposed to a continuously changing stream of data and learns to recognize "causes". In this example, however, there are very tight constraints. The application used is called "Pictures" and involves learning to recognize pure black-and-white line drawings of simple symbols like letters and coffee cups. This section focuses on learning in the first level, in which each HTM node can see a 4x4 grid of B&W pixels. The sample drawings are all composed of very simple elements like vertical or horizontal lines, "L" joints, "T" joints, "Z" folds, and line ends. To make sure the HTM properly learns to recognize these constructs in many situations, it is exposed to examples of each in many positions within its 4x4 visual field. This is done by showing it (and all the other HTMs in this level) "movies" of the archetype drawings moving in various directions and at different scales (zoom factors).

    Now, I know it's important to reduce a general problem to a narrower one in order to help test, quantify, and explain a concept, so I'm willing to suspend a little skepticism. But as I read on about the nuts and bolts, this came back to bug me. To learn that many variations of a pattern all represent the same pattern, HTMs rely critically on a temporal component. Let's say that at moment T1 the node is exposed to a picture of an "L" joint, and at moment T2 it sees the same L joint shifted one pixel to the right. The fact that these two distinct patterns were seen in adjacent time steps suggests they have the same "cause", so they get lumped together. Later, when the HTM sees either of these two versions of the "L" joint, it will report them as the same thing, which is super cool.

    But here's one problem. Before an HTM can even begin noticing that the two "L" joint patterns appear one after the other, it must undergo a "long" learning process just to recognize the distinct patterns, which the paper calls "quantization points". In this process, the HTM is exposed to a long series of these "movies" of all the sample images moving around relative to the HTMs. All the unique pixel patterns a level 1 HTM is exposed to are recorded before it moves on to learning which ones are related to one another. Every single pattern! With a 4x4 black-and-white grid, there are 2^(4x4), or 65,536, possible patterns. Because the source data fed into this program is limited to these very clean, rectilinear patterns, the actual number of unique quantization points recorded in this first phase is only 150. If there were curves, different angles, and "dirt" in the source images, the number would clearly be much higher. Honestly, this leaves a bad taste in my mouth: I can't imagine that gathering every example of rich source data is a good prerequisite for beginning to classify things, nor a resource-responsible one.
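    To make the memorization phase concrete, the sketch below collects every distinct 4x4 binary patch a single node sees across a series of movie frames, encoding each patch as a 16-bit integer (hence at most 2^16 = 65,536 possibilities). It illustrates the approach as summarized here and is not Numenta's NuPIC code; the frame format (lists of rows of 0/1 pixels) and the function names are assumptions.

```python
def patch_code(frame, top, left, size=4):
    """Encode a size x size block of 0/1 pixels as one integer, row by row."""
    code = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            code = (code << 1) | frame[r][c]
    return code


def collect_quantization_points(frames, top, left, size=4):
    """Map each distinct patch seen at (top, left) to an index, in order of first appearance."""
    seen = {}
    for frame in frames:
        code = patch_code(frame, top, left, size)
        if code not in seen:
            seen[code] = len(seen)  # next unused quantization-point index
    return seen
```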

    Now, one of the points of an HTM in this Pictures application is that it can learn to recognize that all "L" joints are the same thing without any prior knowledge of that. The key ingredient in the HTM recipe is this temporal coincidence. Once all 150 distinct mini-patterns, or quantization points, have been identified by watching the source images move around in various directions across the field of view, the next step is to construct a 150 x 150 matrix initialized with all zeros. The rows and the columns both represent the quantization points, but the rows represent seeing a point at T1 and the columns represent seeing one at T2. So let's say quantization point Q1 represents an "L" joint and Q2 represents the same L joint shifted one pixel to the right. As the movie progresses from T1 to T2, we find the cell in the matrix where row Q1 and column Q2 meet and add 1 to it. After a lot of this, we end up with a matrix that has very high numbers in a few cells, representing frequent coincidences of quantization points in time like our two L joints, and zeros in a large portion of its cells. The reason for doing this is that there must be some way to say that Q1 and Q2 are related; that's the point of an HTM, and coincidence in time seems a good way.
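    The counting step is simple to sketch: given the sequence of quantization-point indices a node sees, one per moment, add 1 to the cell whose row is the earlier point and whose column is the later one. The function name is an assumption, and this illustrates the process described above rather than reproducing code from the white paper.

```python
def time_adjacency(sequence, n_points):
    """Count how often quantization point i is followed by point j in the next moment.

    sequence: quantization-point indices, one per moment (T1, T2, ...)
    n_points: total number of quantization points (150 in the Pictures example)
    """
    matrix = [[0] * n_points for _ in range(n_points)]
    for earlier, later in zip(sequence, sequence[1:]):
        matrix[earlier][later] += 1  # row = point at T, column = point at T + 1
    return matrix
```

    With 150 quantization points the matrix has 22,500 cells, and, as described above, a large portion of them stay at zero.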

    An HTM has a finite number of outputs, each of which represents a "cause". The developer gets to decide the number. The more there are, in theory, the more nuanced the known causes can be. The next step of the learning process, then, is to decide what those causes are. Let's say, for example, that at most 10 "causes" can be output. The 150 quantization points each get assigned to one of these 10 causes in a process that's a bit hard to understand. It's probably best to read section 4.2.2, "Forming temporal groups by partitioning the time-adjacency matrix", for a precise explanation. But one way to summarize it is that this algorithm starts at the quantization point that has the highest number of temporal connections to others (as represented in the 150 x 150 matrix) and follows the really strong connections to other quantization points, lumping them together into one group. In theory, the connections branching out eventually get weak enough that the algorithm stops following them. Then it moves on to the remaining quantization point that has the highest value in the matrix and continues (ignoring all quantization points that have already been grouped). This continues until either all quantization points with connections above a certain threshold are exhausted or we run out of groups (our maximum of 10 causes). The authors point out that this is not the only way to do grouping, but it's a pretty ingenious way to quickly allocate causes.
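    Below is a simplified reading of the grouping summary above, not the exact procedure from section 4.2.2 of the paper: repeatedly seed a group with the ungrouped point having the most total temporal connections, then follow every link at least as strong as a threshold to pull neighbors into the group, stopping when the group budget is used up or no ungrouped point has a strong link left. Combining the two directions of each link, ignoring self-transitions, and the names used are all assumptions of this sketch.

```python
def group_points(matrix, max_groups, threshold):
    """Greedily partition quantization points into temporal groups ("causes").

    matrix:     square time-adjacency count matrix (list of lists)
    max_groups: maximum number of causes the node may output
    threshold:  minimum link strength worth following
    Returns a list of groups, each a sorted list of point indices.
    """
    n = len(matrix)
    # Treat "A then B" and "B then A" as one temporal link; ignore self-transitions.
    link = [[(matrix[i][j] + matrix[j][i]) if i != j else 0 for j in range(n)]
            for i in range(n)]
    ungrouped = set(range(n))
    groups = []
    while len(groups) < max_groups:
        # Only points with at least one strong link may seed a group.
        candidates = [i for i in sorted(ungrouped) if max(link[i]) >= threshold]
        if not candidates:
            break
        seed = max(candidates, key=lambda i: sum(link[i]))
        group, frontier = {seed}, [seed]
        ungrouped.discard(seed)
        while frontier:
            i = frontier.pop()
            for j in sorted(ungrouped):
                if link[i][j] >= threshold:  # follow strong connections only
                    group.add(j)
                    ungrouped.discard(j)
                    frontier.append(j)
        groups.append(sorted(group))
    return groups
```

    Points with no strong links are simply left ungrouped here; how the real algorithm treats them is not covered by this sketch.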

    This learning algorithm is truly ingenious. I love it. And yet it bothers me, too. For one thing, this specific algorithm only cares about the coincidence of patterns from one discrete moment to the next. For another, its performance seems to rely very strongly on tight constraints on the data. As the data is allowed to become less constrained (going from perfect right-angle lines to allowing curves, thicker lines, dirty data, 3D rotation, greyscale or color, and so on), the number of quantization points and the time to learn must grow exponentially. "Real" data would probably quickly deluge such a system with quantization points.

    I'm especially bothered by the fact that each HTM requires an exhaustive learning period where it discovers all its quantization points before it moves on to start learning how they are causally related. And then this phase requires another exhaustive learning period where it discovers all the two-moment temporal relations among quantization points before it moves on to try to group the quantization points -- distinct input patterns -- into proximal causes which are then the main output of an HTM.

    Further, while I recognize the value of showing a picture of a cat in many different "orientations" using these movies as a proxy for seeing lots of actual cats, I'm bothered by the idea that the movies are required for this algorithm to learn about cats. I would think that an algorithm that learns to distinguish cats as a group should be able to see lots of single, still pictures of animals of all sorts, including a cat. Heck, if I had 10 pictures of different animals and ten neurons (or HTMs), I should be able to repeatedly show each of my 10 pictures at random with different scales and orientations and have my neurons learn to align themselves to each of the 10 animals, yet the HTMs aren't going to work this way, unless I wiggle the pictures around. Why this curious requirement?

    Now, in defense of HTMs, I would point out that Jeff does not see this first generation as the end goal, but just a first prototype that illustrates the concept. I think he would quickly agree that the learning algorithm will continue to evolve. Not only will it become more efficient and faster as generations of engineers learn to apply and enhance it, but it will also become more robust. In fairness, I don't see that the quantization process necessarily has to happen before temporal relations are found; the two could happen in real time. Also, the prediction part need not wait until after learning. Nor are the little right-angle black-and-white line drawings a necessity, nor temporal patterns that rely on discrete two-step time periods. None of my complaints here represents a "gotcha", I think.

    I have more to read, and I may take the opportunity to try coding this to reproduce the experiment and explore it further. We'll see. Still, I have my own experiment, inspired by my reading of On Intelligence, that I need to start fleshing out. In the meantime, I'm likely to continue commenting on HTMs as I learn more. I still think they represent the most significant new concept in artificial intelligence in several decades.

    4/12/2007 - Pattern Sniffer: a demonstration of neural learning

    Introduction

    For over a year, I've been nursing what I believe is a somewhat novel concept in AI that superficially resembles a neural network and is inspired by my reading of Jeff Hawkins' On Intelligence. Recently, I finally got around to writing code to explore it. I was so surprised by how well it already works that I thought it worthwhile to write a blog entry introducing the concept and to make my source code and test program public for independent review. Without putting any real thought into it, I just named the project / program "Pattern Sniffer".

    My regular readers will recognize my frequent disdain for traditional artificial neural networks (ANNs), not only because they do not strike me as being anything like the ones in "real" brains, but also because they seem to fail miserably at displaying anything like "intelligent" behavior. So it's with reluctance that I call this a neural network. The test program I made, however, has only one "layer" of neurons, which I call a "neuron bank". I did not yet wish to demonstrate hierarchy and multi-level abstraction. My main goal was to focus specifically on a very narrow but almost completely overlooked topic in artificial intelligence: unguided learning.

    Unguided learning

    All artificial neural networks I have ever seen or read about rely on a so-called "training phase", in which they are exposed to examples of the patterns they are supposed to recognize before they are ever put out into the "real world". I was disappointed when I finally read that Numenta's Hierarchical Temporal Memories (HTMs) undergo the same sort of learning process before they can begin recognizing things in the world. This flies in the face of how humans, other mammals, and indeed all creatures on Earth that can learn actually work.

    Does intelligence require that an intelligent being continue to learn once it enters a productive life? I think the answer is obviously "yes". What's more, it's tempting to think that humans learn mainly during distinct periods, such as their school years, and spend most of their lives in a basic "production" mode. Yet I would argue that every moment we are awake, we are learning things. Most of it is quickly forgotten. We use the terms "short term memory" and "working memory" to identify this, which seems to suggest we have something like computer RAM, while the real long-term memory is packed away onto a hard drive.

    I'm no expert in neurobiology, so I may be missing some important information. But the idea of information being transferred in packages of data from one part of the brain to another for long term storage doesn't seem to jibe with my limited understanding of how our brains work. Why, for example, should learning a phone number just long enough to dial it occur in one part of the brain, while learning one for long term use, like our own home number, occurs in another? And how would it be transferred?

    What if it's the same part of the brain learning that phone number, whether for short or long term usage? Perhaps the part of my brain that is most directly associated with remembering phone numbers has some neurons that have learned some important phone numbers and will remember them for life, while it contains other neurons that have not learned any phone numbers and are just eagerly awaiting exposure to new ones that may be learned for a few seconds, a few minutes, or a few years.

    Finite resources

    We are constantly learning. Yet we have a finite amount of brain matter. Somehow we must have some mechanism for deciding which of the information we are exposed to is important enough to retain long term and which is only worth retaining for a moment.

    When I studied how Numenta's HTMs learn, I was a bit disappointed to see that, while there is a finite and predetermined number of nodes in an HTM, the amount of memory each node requires is variable. This is true of many other kinds of classifier systems and learning algorithms. It does make some sense from an engineering perspective, but it does not seem to fit what I understand of how our brains work. Our neurons may change the number and arrangement of their dendritic connections, but that's a far cry from keeping a long list of learned things inside. So far, it seems ANNs are one of the few classes of learning systems out there that use a finite and predefined amount of memory in learning and functioning.

    I believe that, for some functional chunk of cortical tissue, there is a fixed number and basic arrangement of neurons and they all are doing basically the same thing, like learning, recognizing, and reciting phone numbers. It seems intuitive to believe that that chunk has its own way of deciding how to allocate its neurons to various numbers, with some being locked down, long term, and others open to learning new ones immediately for short term use. Any one of these may also eventually become locked down for the long term, too.

    I also believe it's possible, though not certain, that some neurons that have learned information for the long term may occasionally have that information decay and be freed up to learn new things.

    Competing to be useful

    When I started thinking about banks of neurons working in this way, I naturally asked the question: how does the brain decide what is important to learn and how long to retain it? It then occurred to me that there may be some kind of competition going on. What if most of the neurons in the cortex "want" more than anything to be useful? What if they are all competing to be the most useful neuron in the entire brain?

    Let's start with the assumption that all neurons in a neuron bank have access to the same input data. And let's say each neuron wishes to be most useful by learning some important piece of information. You would think that the first problem to arise would be that they would all learn the exact same piece of information and thus be redundant. But what if, when one neuron learns a piece of information, the others could be steered away from learning the same information? What if every neuron was hungry to learn, but also eager to be unique among its peers in what it knows?

    But how could one neuron know what its peers know? Would that require an outside arbiter? An executive function, perhaps? Not necessarily. Perhaps each neuron considers the current input, decides how closely it matches the pattern that neuron has learned, and "shouts out" how strong that match is. The other neurons in the bank could each be watching to see which neuron shouts the loudest and assume that neuron is the most likely match. Actually, it could be enough to know how loud the loudest shout was, without knowing which neuron did the shouting.
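
    As a rough illustration of the "loudest shout" idea, here is a minimal Python sketch. It assumes a made-up score for each neuron's shout (1 minus the mean absolute difference between input and expectation, so 1.0 is a perfect match) and that the only value broadcast to the whole bank is the maximum shout. Each neuron then decides on its own whether it won. The function names are hypothetical and not taken from Pattern Sniffer's source.

```python
from typing import List


def shout_strengths(expectations: List[List[float]], inputs: List[float]) -> List[float]:
    """Each neuron's 'shout': how closely its learned expectations match the input.

    Hypothetical score: 1 minus the mean absolute difference, so 1.0 is a perfect match.
    """
    shouts = []
    for expected in expectations:
        mean_diff = sum(abs(x - e) for x, e in zip(inputs, expected)) / len(inputs)
        shouts.append(1.0 - mean_diff)
    return shouts


def i_am_loudest(my_shout: float, loudest: float) -> bool:
    """A neuron judges whether it won using only its own shout and the loudest one heard.

    Ties would let more than one neuron consider itself the winner.
    """
    return my_shout >= loudest


expectations = [
    [1.0, 1.0, -1.0],    # neuron 0
    [-1.0, -1.0, -1.0],  # neuron 1
    [1.0, -1.0, 1.0],    # neuron 2
]
inputs = [1.0, 1.0, -0.8]

shouts = shout_strengths(expectations, inputs)
loudest = max(shouts)  # the only value shared with the whole bank
winners = [n for n, s in enumerate(shouts) if i_am_loudest(s, loudest)]
print(winners)  # [0]
```

    No neuron needs to know which of its peers shouted loudest; comparing its own shout with the single broadcast maximum is enough.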

    Confidence

    The idea that every neuron in a bank reports to the group how well it thinks it matches the input is powerful. It follows, then, that the neuron that shouts the loudest would pat itself on the back by becoming more "confident" in its knowledge and thus reinforce what it knows. Conversely, all the other neurons would become no more confident and perhaps even less so with each passing moment that they go unused.

    Confidence breeds stasis. In this case, that's ideal. What if some neurons in a bank were highly confident in what they know and others were very unconfident? Those that have low confidence should be busy looking for patterns to learn. In a rich environment, there will be a nearly limitless variety of new patterns that such neurons could learn. There are several ways a brain could decide that some piece of information is important. One is simple repetition. When you want to remember someone's name, you probably repeat it in your mind several times to help reinforce it. And in school, repetition is key to learning. So it could be that individual neurons of low confidence gain confidence when they latch onto some new pattern and see it repeated. Repetition suggests non-randomness and hence a natural sort of significance.

    What if, as a neuron becomes more confident, it becomes less likely to change its expectation of what pattern it will match? What if confidence is itself a moderator of a neuron's flexibility in learning new patterns?

    The simulation

    Armed with this hypothesis, I set out to make a program called "Pattern Sniffer" to simulate a bank of neurons operating in this way and to test its viability. My goal, to be sure, is not to replicate human neocortical tissue. I suspect our brains do some of what my hypothesis entails, but my main goal is to see if learning can happen like this. Here's a screen shot from the program:


    Screen shot from Pattern Sniffer program

    You can download the Pattern Sniffer program and its source code. This is a VB.NET 2005 application. Once you unzip it, you will find the executable program at PatternSniffer\Ver_01\PatternSniffer\bin\Debug\PatternSniffer.exe. There is a PatternSniffer.exe.config file alongside it, which you can edit with a text editor to change certain settings, such as the number of neurons in the bank. There is also a "Snapshots" subfolder, in case you wish to use the "Snapshot" button, not shown here.

    The program's user interface is very simple, as seen above. The main feature is a set of gray boxes representing individual neurons in a single bank. The grid of gray-shaded boxes within each one represents that neuron's "dendrites". Input values in this program range from -1 to +1. In this UI, -1 is represented as white and +1 as black. Each dendrite has an "expectation" of what its input value should be for it to consider itself a match. In this example, there are 25 input values; hence 25 dendrites per neuron. The top left corner of the program features an input grid, also with 25 values. The user can click on it to toggle each pixel between black and white. You probably won't want to use that, though, as the program comes with a SourcePatterns.bmp file that has 25 5x5 gray-scale images on it, which you can edit. Following is a magnified version of SourcePatterns.bmp:


    SourcePatterns.bmp, magnified 10 times

    When you start the program, the neurons start out in a "naive" state. They know nothing and hence have almost no confidence (shown as a white box in each neuron display above). As you click the "Random Patch" button, the program picks one of the patterns in SourcePatterns.bmp, displays a representation of it in the input grid, presents it to the neuron bank for a moment of consideration, and updates the display to reflect changes in the neuron bank's state. Check the "Keep going" check box to have the program press this button automatically.

    To be clear, while the program displays a two-dimensional grid of image data, the neurons have no awareness of either a grid or of it being graphical data. They only know they take a flat list of values as input. The inputs could be randomly reshuffled at the start with no impact on behavior. The grid and the choice of image data are simply there to help us visualize what is going on inside the bank.
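
    The claim that reshuffling the inputs has no effect can be checked with a small Python sketch. It assumes the neuron's overall match is a plain sum of per-dendrite terms (here the unpenalized MaxSignal - Abs(Input - Expectation) form described later in this entry). Any such sum ignores the order of the dendrites, so applying the same fixed permutation to the inputs and to the expectations leaves the result unchanged. The helper name `match_strength` is hypothetical.

```python
import random


def match_strength(inputs, expectations):
    """Sum of per-dendrite strengths, MaxSignal - |input - expectation| with MaxSignal = 1."""
    return sum(1.0 - abs(x - e) for x, e in zip(inputs, expectations))


rng = random.Random(42)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(25)]
expectations = [rng.uniform(-1.0, 1.0) for _ in range(25)]

# One fixed reshuffle, applied identically to the inputs and to the dendrites.
order = list(range(25))
rng.shuffle(order)
shuffled_inputs = [inputs[k] for k in order]
shuffled_expectations = [expectations[k] for k in order]

original = match_strength(inputs, expectations)
shuffled = match_strength(shuffled_inputs, shuffled_expectations)
print(abs(original - shuffled) < 1e-9)  # True: only the order of the terms changed
```

    Because the bank would also learn in the shuffled order, the grid layout only matters to us, when we view the display.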

    You can control how many of the patterns in the source set are used by changing the "Use first" number. If you choose 3, for example, each click of the "Random Patch" button will pick randomly from patterns 1, 2, and 3. At any time, you can change the "Pattern" number to select a specific pattern to work with. Clicking "Linger" causes the bank to go through a single moment of "pondering" the input, just as when the user clicks "Random Patch". With each moment of pondering, the bank becomes more "set" in what it knows. Clicking "Brainwash" returns the entire neuron bank to its naive state.

    The "Noise" setting is a value from 0 to 100% and controls how degraded the input pattern is when presented to the neuron bank. At 100%, one pattern is nearly indistinguishable from any other.

    Learning in linear time

    Let's start with a familiar and yet simplistic case of training and using our neuron bank. We begin with the naive state as follows:

    Pattern 1 contains all white pixels. With the first click of "Linger", the neurons in the bank all try to determine which of them best matches this pattern. In this case, neuron 14 (n14) is most similar:

    Because it "yells the loudest", it is rewarded by having its confidence level raised ever so slightly and by having its dendrites' expectations moved closer to the input pattern. The lower the confidence, the more pliable the dendrites' expectations are. Since n14 starts at minimal confidence (-1), it conforms nearly 100% in this single step. As we click "Linger" 7 more times, n14 continues to be the best match and so continues to increase its confidence until it is nearly at full confidence (+1):
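
    A minimal Python sketch of a confidence-moderated update, under loudly stated assumptions: pliability is taken to be (1 - confidence) / 2, so a naive neuron at -1 copies the input outright and a fully confident neuron at +1 does not move, and confidence is assumed to close a fixed fraction (`gain`, a hypothetical parameter) of its remaining gap to +1 with each win. Pattern Sniffer's actual update formulas aren't given here, so treat both rules as illustrations of the behavior described, not as the program's code.

```python
from typing import List, Tuple


def reward_winner(expectations: List[float], inputs: List[float], confidence: float,
                  gain: float = 0.4) -> Tuple[List[float], float]:
    """Pull the winning neuron's expectations toward the input, then raise its confidence.

    Hypothetical rules (not taken from Pattern Sniffer's source):
      pliability = (1 - confidence) / 2   -> 1.0 when naive (-1), 0.0 when certain (+1)
      confidence += gain * (1 - confidence)
    """
    pliability = (1.0 - confidence) / 2.0
    new_expectations = [e + pliability * (x - e) for e, x in zip(expectations, inputs)]
    new_confidence = confidence + gain * (1.0 - confidence)
    return new_expectations, new_confidence


expectations = [0.3, -0.7, 0.1]  # a naive neuron's random expectations
confidence = -1.0                # naive
white = [-1.0, -1.0, -1.0]       # solid white pattern (white = -1)

for _ in range(8):               # one "Linger" click plus 7 more
    expectations, confidence = reward_winner(expectations, white, confidence)

print([round(e, 6) for e in expectations], round(confidence, 3))
```

    The first call copies the input outright because a naive neuron is fully pliable; later wins barely move the expectations as confidence approaches +1.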

    Now we move to pattern 2 and repeat this. Pattern 2 is all black pixels. n23 happens to be most like this pattern, so with repetition it learns it quickly:

    Notice in the preceding how n14 is still expecting the white pattern and has a high level of confidence. Its expectations have shifted ever so slightly, as indicated by the very faint gray boxes scattered within n14's display.

    We continue this process for the first 6 patterns, picking one and lingering on it for 8 steps each, and end up with the following state:

    You can quickly find the learned knowledge by looking for black confidence level boxes. At this point, you may wonder why the left, right, top, or bottom bar patterns would match neurons with randomized expectations better than, say, the solid white or solid black patterns. This has to do with the way matching occurs and is affected by a neuron's confidence level.

    When the neuron bank is asked to "ponder" the current input, it goes through two steps: every neuron is processed in step 1, and only then does step 2 begin and process each neuron again. Step 1 is matching. It begins with each dendrite calculating its own match strength as MaxSignal - Abs(Input - Expectation), where MaxSignal = 1. Since both the input and the expectation lie between -1 and +1, this strength ranges from -1 (opposite values) to +1 (identical values): the closer the scalar input value is to the value that dendrite expects, the closer the match strength is to the maximum possible.

    Things get interesting here. Before returning the match strength value, we alter it. If the strength is less than zero -- that is, if this dendrite finds the input value is very different -- then we "penalize" the match strength using Strength = Strength * Neuron.Confidence * 6. The final strength, whether adjusted or not, is divided by 6 to make sure the strength is never outside the min/max range of -1 to +1. So the more confident the neuron is in what it knows, the more strongly mismatched inputs will penalize the match value.
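
    Here is the dendrite-level formula above transcribed into a small Python function. The factor of 6 and MaxSignal = 1 come from the text; the function name is hypothetical. Confidence is passed in as-is. Note that, applied literally, a negative confidence would turn the penalty into a bonus; the sketch reproduces the formula as described rather than guessing how the original program handles that case.

```python
MAX_SIGNAL = 1.0
PENALTY_SCALE = 6.0  # the factor of 6 described in the text


def dendrite_strength(input_value: float, expectation: float, confidence: float) -> float:
    """One dendrite's match strength, following the description above."""
    # In [-1, +1] when both values are in [-1, +1].
    strength = MAX_SIGNAL - abs(input_value - expectation)
    if strength < 0:
        # Mismatch: scale the negative strength by the neuron's confidence and by 6.
        strength = strength * confidence * PENALTY_SCALE
    # Every result, adjusted or not, is divided by 6.
    return strength / PENALTY_SCALE


print(dendrite_strength(0.5, -1.0, 1.0))  # -0.5: a confident neuron is penalized heavily
print(dendrite_strength(0.5, -1.0, 0.1))  # a barely confident neuron, about -0.05
```

    A perfect match contributes at most 1/6, while a badly mismatched dendrite on a fully confident neuron costs as much as -1, which is why a few poor dendrites can sink a confident neuron's overall match.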

    So now, if I set "Use first" to 6 and check "Keep going", the program will continually run through these first 6 patterns that have been learned and will always match and reinforce them. So far, this is not very remarkable, as it is easy to make a program learn any number of distinct digital patterns. As we'll see, though, there's a lot more to this than a cheap parlor trick.

    What is remarkable, however, is the time it takes to learn. AI systems that include learning often suffer exponential increases in learning time as the amount of information to learn increases linearly. In this simple demonstration, it does not matter how many novel patterns are exposed to the neuron bank. It will take the same number of steps of repetition to solidify a naive neuron's knowledge. One simple estimate would be that it takes 8 steps to learn each new pattern, when they are presented in this fashion.

    There are caveats, to be sure. For one, the configuration for this demo has only 26 neurons, which means it can only learn up to 26 distinct patterns. For another, as time passes and a neuron is not "used" -- if it never matches anything -- it slowly loses confidence that it is still useful and begins to degrade until it finally is naive again. So there is a practical limit to how many patterns can be taught before there has to be a "refreshment" process to bolster the existing neurons' confidences.

    All at once learning

    The story changes when learning is done in bulk. Let's change the experiment a little to illustrate. First, we'll brainwash our neuron bank. Then, we set "Use first" to 6, so the bank sees the same six patterns as before: solid white, solid black, and the left, right, top, and bottom bars. Now we'll step through the process for a while (using the "Random Patch" button). Below is a series of screen shots. Note the "Steps taken" number in each step.

    When we started out, all neurons were naive, meaning they had not learned any patterns and they had no confidence in what they "knew". So as a new pattern is introduced in each moment, there's usually a "virgin" neuron that's happy to match and claim that pattern for its own. But watch the sequence of events for each neuron that does this as time moves on. Each one degrades quickly. In step 1, n21 is the first neuron to match anything, namely the solid black pattern. Yet one step later, when the input has a new pattern, n21 is already starting to decay. By step 8, with no further reinforcement yet, n21 has decayed so much that there's a good chance if the next step brings the solid black pattern back, it may not be the best match for it any more.

    However, reinforcement does build confidence. The right bar pattern has been seen 3 times in the above sequence. n5 was the first to see it and, thanks to reinforcement, it has a higher degree of confidence, so its expectation pattern is likely to persist longer without reinforcement. Still, this confidence is not at all high. Let's see what happens as time progresses and the patterns are seen more. Note the steps-taken number in each snapshot and how each learned neuron's confidence level grows with reinforcement:

    OK. So after 80 steps, we have most of the patterns pretty well learned, save for the solid white pattern, which by random chance was simply not seen many times during this run. Still, this is markedly worse than when we spoon-fed the patterns one at a time: with 8 steps per pattern and 6 patterns, that learning process took only 48 steps. So maybe that's an indication that this is not a very good learning algorithm. But isn't the real world more like this random presentation than like spoon-feeding? And when we try this experiment with all 25 patterns thrown around at random, it may take thousands of steps to solidly learn them all, instead of the 200 (25 × 8) it takes if we spoon-feed them.

    But maybe this is exactly what we expect. Have you ever been in a room with someone speaking a language you don't understand? You may be exposed to hundreds of new words. If I asked you to repeat even three of them that you picked up (and did not already know), you might just shrug and tell me none of them really stuck. But if you asked one of the speakers to teach you one or two words, you might be able to retain them for the duration of the conversation and reliably repeat them. To use another analogy, consider a grade school English class. Would a teacher be more likely to expose the students to all of the vocabulary words at once and simply repeat them all every day, or instead to expose students to a small number of new vocabulary words each week? Clearly, learning a few new words a week is easier than learning the same several hundred all at once, starting from day one.

    My interpretation of what's going on is that this neural network is behaving very much like our own brains do, in this sense. The more focused its attention is on learning a small number of patterns at one time, the faster it will learn them. This may seem like a weakness of our brains, but I don't think so. I believe this is one way our own brains filter out extraneous information. We're exposed to an endless stream of changing data. Some of it we already know and expect, but a lot of it is novel. Repetition, especially when it occurs in close succession, is a powerful way to suggest that a novel pattern is not random and therefore potentially interesting enough to learn. In fact, the very principle of rote learning seems to be based on this idea of hijacking this repetition-based learning system in our brains.

    Learning while performing

    As I mentioned in the introduction, I've long been bothered by the fact that most AI learning systems require a learning stage separate from a "performance" process. So far, we've been focused on learning with this novel sort of neural network I've made, and we'll continue to focus on that, but I want to stress that all the while that we are training this neural net, we are also watching it perform. Its only task, in this experiment, is to match patterns it sees.

    One simple way to prove this point is to train the neuron bank on however many patterns you wish and then just check the "Keep going" box and watch it perform. Then, at some point, try adding one more pattern by raising the "Use first" number while it continues crunching away. It will eventually learn the new pattern, all the while still performing its main task of matching patterns. We send the neuron bank no cue that we are introducing a new pattern. In fact, the neuron bank doesn't know any of these numbers we see on the screen. It doesn't, for example, know that we have 25 total patterns, or that we are only using 6 of them at the moment. We don't check any box saying, "you are now supposed to be learning". It simply does both, learning and performing, all the time.

    Noisy data

    I said earlier that having a machine learn 6 digital image patterns is just a cheap programming parlor trick, but that there is more to this. Numenta's Pictures demo app of their HTM concept is configured such that a single node adds a quantization point for each bit-level unique pattern it comes across. True, the HTM can be configured to be a little more relaxed and to consider two similar patterns to represent one and the same, but you have to program the similarity threshold in advance of learning. So one is very likely to end up with a very large set of quantization points if the training data is noisy. And their own white paper states, "The system achieved 66 percent recognition accuracy on the test image set," which is hardly impressive. Traditional ANNs seem to be a little less sensitive to noise, but they aren't perfect, either.

    The matching algorithm for this neural network is incredibly simple: just add together the differences between the expected and actual input values and multiply them by other basic factors like confidence level. But as you'll see in the following experiments, this makes it very competent at dealing with noise.
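
    To see how such a simple rule copes with corrupted input, here is a Python sketch that scores a noisy top-bar pattern against four learned patterns. It reuses the dendrite formula described earlier and assumes that a neuron's match strength is the plain sum of its dendrites' strengths and that every neuron is fully confident (+1). The noise is a hand-picked flip of 5 of the 25 pixels, not the program's "Noise" setting, and the function names are hypothetical.

```python
def dendrite_strength(x: float, e: float, confidence: float) -> float:
    """Per-dendrite strength as described earlier: penalize mismatches, then divide by 6."""
    s = 1.0 - abs(x - e)
    if s < 0:
        s = s * confidence * 6.0
    return s / 6.0


def neuron_strength(inputs, expectations, confidence):
    """Assumed: a neuron's match is the plain sum of its dendrites' strengths."""
    return sum(dendrite_strength(x, e, confidence) for x, e in zip(inputs, expectations))


W, B = -1.0, 1.0  # white = -1, black = +1, as in the program's display
learned = {
    "top bar": [B] * 5 + [W] * 20,      # 5x5 patterns, row by row
    "bottom bar": [W] * 20 + [B] * 5,
    "solid white": [W] * 25,
    "solid black": [B] * 25,
}

noisy = list(learned["top bar"])
for k in (2, 7, 13, 19, 24):  # flip 5 of the 25 pixels
    noisy[k] = -noisy[k]

scores = {name: neuron_strength(noisy, pattern, confidence=1.0)
          for name, pattern in learned.items()}
best = max(scores, key=scores.get)
print(best)  # top bar
```

    Each mismatched dendrite costs a fully confident neuron -1 while each matching one adds only 1/6, so the top-bar neuron (5 mismatches) still beats solid white (8 mismatches) and the bottom bar (11 mismatches).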

    Let's start by setting "Noise" to 50% and brainwashing. We'll take the top bar pattern (#3) as our starting point and click "Linger" a few times. Watch what happens in the following sequence:

    Notice how n21's expectations, in step 1, look exactly like the first noisy version of the top bar that it sees? Yet in each successive step of learning, as it gets new noisy versions, its expectation shifts closer to the perfectly noise-free top-bar pattern. It's learning a mostly noise-free version of a pattern it never actually sees without noise!

    Is this magic? Not at all. The noise is purely random, not structured. That means with each successive step, n21 is averaging out the pixel values and thus cancelling the noise. Meanwhile, n21 is also becoming more confident, though more slowly than it did when it saw the noise-free version. So with each passing moment, the pattern changes more and more slowly. Eventually, it will become fairly solid.
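
    The averaging effect can be sketched in Python as well. This assumes a hypothetical noise model (each pixel is replaced by a uniform random value with probability `noise`) and a hypothetical confidence schedule chosen so that pliability, (1 - confidence) / 2, equals 1 / (wins + 1). Under those assumptions the expectations become the running mean of every noisy sample seen, and taking the sign of each value recovers the clean pattern. Neither rule is claimed to be what Pattern Sniffer does internally.

```python
import random


def noisy_copy(pattern, noise, rng):
    """Hypothetical noise model: with probability `noise`, replace a value
    with a uniform random value in [-1, +1]."""
    return [rng.uniform(-1.0, 1.0) if rng.random() < noise else v for v in pattern]


top_bar = [1.0] * 5 + [-1.0] * 20  # 5x5 top bar, row by row
rng = random.Random(7)

expectations = [0.0] * 25
wins = 0
for _ in range(60):
    sample = noisy_copy(top_bar, 0.5, rng)
    # Hypothetical confidence schedule: -1 at the first win, rising toward +1,
    # chosen so that pliability = (1 - confidence) / 2 = 1 / (wins + 1).
    confidence = 1.0 - 2.0 / (wins + 1)
    pliability = (1.0 - confidence) / 2.0
    expectations = [e + pliability * (s - e) for e, s in zip(expectations, sample)]
    wins += 1

# The first update copied the first noisy sample outright; each later one moves
# less, so the expectations end up as the running mean of every sample seen.
recovered = [1.0 if e > 0 else -1.0 for e in expectations]
print(recovered == top_bar)
```

    With half the pixels replaced by random values, each pixel's average drifts toward half its clean value, but its sign is preserved, which is why the noise-free pattern emerges.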

    Let's continue this experiment by training the bank with the first 6 patterns:

    With manual spoon-feeding of each of the 6 patterns, we get to step 90 and all 6 are pretty solidly learned. We can now switch on the "Keep going" check box to let it cycle at random through all 6 patterns indefinitely, and it continues to work just fine, matching correctly in every step I spot-checked (I didn't check match accuracy at every step), in spite of the noise and all the neurons hungrily looking for new patterns to learn. Here it is after 150 unattended steps, still solid in its knowledge:

    Now, we turn the noise level up to 75%. Watch how well it continues to work:

    Look back carefully at these 8 steps, because they are very telling. Remember: the neuron bank has no idea that I am still using the same 6 patterns I trained it on. Remember also that with a highly confident neuron, there is a high penalty for each poorly matched dendrite. Looking at the input patterns, I'm struck by how badly degraded they are and thus difficult for me to match, yet the neuron bank seems to perform brilliantly. Only at step 155 do we finally see a pattern so badly degraded that the bank decides it's a novel one it might want to learn. Of course, it's never going to be seen again, so this blip will be quickly forgotten and n8 will be free to try learning some other new pattern. In all 7 of the other steps, it matches the noisy input pattern correctly.

    This isn't the end of the story, though. Noise filtering cuts both ways. Some unique patterns will be treated as simply noisy versions of known patterns. Take another look at the source patterns:


    SourcePatterns.bmp, magnified 10 times

    Near the bottom, there are four "arrow" patterns. To your eye, they probably look distinctly different from the side bar patterns (left, right, top, bottom) that we've been working with, but to this neural net, they are so similar that they are considered simply noisy versions of the bars. Or, conversely, the bars are seen as noisy versions of the arrows. Here's our neuron bank after a brainwashing and learning the first 19 patterns, just before we get to the arrows. You can see that the first patterns learned (solid white and black) are starting to degrade:

    Now to introduce one of the arrows to the bank. See how, in just a few steps, this confident neuron's expectations change to start looking like the arrow?

    Longevity of information

    Now that I've illustrated some of what this particular program can do and thus some of the potential capabilities for machine learning using this concept, I think I can more easily speak about some of its weaknesses and suggest some potential ways to overcome them.

    For one thing, longevity is lacking. What one neuron learns in this particular demonstration can be unlearned within a few minutes of running without seeing that pattern again. That's obviously not desirable in a machine that may have a useful life of many years. But that doesn't mean this is a limitation of this type of system, per se. I set out to demonstrate not only how a neural network can learn while being productive, but also how unused neurons can be freed up to learn new things without any central control over resource allocation.

    I did address this to some degree in the current algorithm, actually. As described earlier, a neuron loses confidence over time if it is unused, and therefore becomes more pliable to adjusting its expectations. However, the degree to which it loses confidence, in any given step, is determined in part by the best match value seen. That is, if some neuron has a very strong match of the current input pattern, then a non-matching neuron will not lose much confidence. If, however, none of the other neurons considers itself to be a strong match, that could potentially mean that there's a new pattern to learn, and so the non-matching neurons will lose confidence a little faster.
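
    A minimal Python sketch of that idea, with a hypothetical decay rule: every non-winning neuron loses a small amount of confidence each moment, and the amount grows as the best match in the bank weakens. It assumes the best match value has been normalized to the range -1 to +1; `base_decay` and the 0.25 floor are illustrative numbers, not values from the program.

```python
def decay_losers(confidences, winner, best_match, base_decay=0.02):
    """Lower the confidence of every neuron except the winner (hypothetical rule).

    best_match is assumed to be normalized to [-1, +1]. A strong best match means
    the input is already known, so losers decay slowly; a weak one hints at a novel
    pattern, so losers decay faster and become more pliable.
    """
    novelty = (1.0 - best_match) / 2.0        # 0 for a perfect match, 1 for the worst
    step = base_decay * (0.25 + novelty)      # never quite zero
    return [c if n == winner else max(-1.0, c - step)
            for n, c in enumerate(confidences)]


bank = [0.9, 0.5, -1.0]
print(decay_losers(bank, winner=0, best_match=0.95))  # losers barely move
print(decay_losers(bank, winner=0, best_match=0.0))   # losers lose confidence faster
```

    Confidence is clamped at -1, so a neuron that is already naive simply stays naive.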

    One way that this algorithm could be improved is by consideration of how "full" a neuron bank is of knowledge. Perhaps when a bank has a lot of naive neurons, those that are highly confident of what they know should be less likely to lose confidence. Conversely, when there are few or no neurons that remain naive, there could be a higher pressure to lose confidence. Perhaps this could further be adjusted based on the rate of novelty in input patterns, but that's harder to measure.
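
    One possible way to measure how "full" a bank is, sketched in Python under assumptions: a neuron counts as naive when its confidence is at or below a hypothetical `naive_threshold`, and the resulting pressure (0 when every neuron is naive, 1 when none are) would scale the confidence decay. This is a sketch of the proposal above, not something Pattern Sniffer currently does.

```python
def fullness_pressure(confidences, naive_threshold=-0.9):
    """Hypothetical measure of how 'full' a bank is, from 0.0 to 1.0.

    A neuron counts as naive when its confidence is at or below naive_threshold.
    Returns 0.0 when every neuron is naive and 1.0 when none are.
    """
    naive = sum(1 for c in confidences if c <= naive_threshold)
    return 1.0 - naive / len(confidences)


base_decay = 0.02
mostly_empty = [-1.0, -1.0, -1.0, 0.9]
full = [0.9, 0.8, 0.7, 0.95]
# Scale the per-step confidence loss by the pressure: little loss while room remains.
print(base_decay * fullness_pressure(mostly_empty))
print(base_decay * fullness_pressure(full))
```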

    Perhaps there are higher level ways that memory could be evaluated for importance and, over time, exercised in order to keep it clean and strong.

    Working memory

    When I started making this program, I was not really considering the problem identified earlier in this blog entry of working memory versus long term memory. But in the course of building and testing Pattern Sniffer, it dawned on me that my neural network was displaying both short and long term learning within the same system. The key difference was not structure, locality, or anything so complicated, but simply repetition.

    Yes, in the sample program, we are learning and matching simple visual patterns. But this same kind of memory could just as easily be used to learn a phone number sequence long enough to dial it. Or to remember a visual pattern long enough to match it to something else in the room. And, without heavy repetition, the neuron(s) that remember it will decay again into naivete, ready to learn some other pattern.

    Pattern invariance

    I think this sample program demonstrates well this kind of neural network's insensitivity to noisy data. Still, it is clearly not insensitive to patterns of information that are subtly transformed.

    With this program, I decided to use a small visual patch for demonstration purposes, in part because I thought it might be worth replicating the ability of our own retinas to detect and report strong edges and edge-like features at different angles, especially if it could learn about edges all on its own. But I must admit this was also a cheat of the same sort many AI researchers tackling vision use: forcibly constraining the source data to take advantage of easy-to-code techniques.

    To their credit, the Numenta team have come up with a crafty way of discerning that different patterns of input are representative of the same things by starting with the assumption that "spatial" patterns that appear in close time succession to one another very likely have the same "cause" and thus such closely tied spatial patterns should be treated as effectively the same, when reporting to higher levels of the brain.

    I think the kind of neural network I've engendered in Pattern Sniffer can benefit from this concept as well. Implicitly, it already embraces the notion that the same pattern, repeated in close succession, has the same cause and is thus significant enough to learn. But being able to see that two rather different spatial patterns have a common cause could be very powerful. One way to do this would be to have a neuron bank above the first which is responsible for discovering two-step (or longer) sequences in the lower level's data. If, for example, the first level has 10 neurons, the second level could take 20 inputs: 10 for one moment of output and 10 more for the following moment. In keeping with Jeff Hawkins' vision of information flowing both up and down a neural hierarchy, the upper neuron bank, upon discovering such temporal patterns, could "reward" the contributing lower level neurons by pushing up their confidence levels even faster. This higher level neuron bank could even be designed to respond either to the sequence being seen or to any one of its constituents being seen, and thus serve as an "if I see A, B, or C, I'll treat them as all the same thing" kind of operation.
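
    The 20-input arrangement for a second-level bank can be sketched in Python as a small buffer that joins the previous moment's lower-level outputs with the current ones. The class and helper names are hypothetical, and the sketch assumes the lower bank reports one value per neuron each moment (a one-hot "which neuron won" vector in the usage example); the text doesn't specify what the lower level's output looks like.

```python
from collections import deque
from typing import List


class TwoMomentInput:
    """Input builder for a hypothetical second-level neuron bank.

    Each moment, the lower bank reports one value per neuron; the upper bank sees
    the previous moment's values followed by the current ones (2 x lower_size inputs).
    """

    def __init__(self, lower_size: int):
        self.lower_size = lower_size
        # Start with a silent "previous moment" so the first push is well defined.
        self.history = deque([[0.0] * lower_size], maxlen=2)

    def push(self, lower_outputs: List[float]) -> List[float]:
        if len(lower_outputs) != self.lower_size:
            raise ValueError("expected one value per lower-level neuron")
        self.history.append(list(lower_outputs))
        previous, current = self.history[0], self.history[-1]
        return previous + current


def one_hot(winner: int, size: int = 10) -> List[float]:
    """Assumed lower-level output: 1.0 for the winning neuron, 0.0 elsewhere."""
    values = [0.0] * size
    values[winner] = 1.0
    return values


feed = TwoMomentInput(10)
feed.push(one_hot(3))                 # moment 1: lower neuron 3 won
upper_inputs = feed.push(one_hot(7))  # moment 2: lower neuron 7 won
print(len(upper_inputs), upper_inputs.index(1.0), upper_inputs.index(1.0, 10))  # 20 3 17
```

    An upper-level neuron that learns this 20-value pattern would, in effect, have learned the sequence "3, then 7".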

    One thing I had originally envisioned but never implemented is the concept of "don't care". If you look at the source code, you'll notice each dendrite has not only an "expectation", but also a "care" property. The idea was that care would be a value from 0 to 1. Multiplying the match strength by the "care" value would effectively mean that the less a dendrite cares about the input value, the less likely it would be to contribute positively or negatively to the neuron's overall match strength. I was impressed enough with the results of the algorithm without this that I never bothered exploring it further. Honestly, I don't even know quite how I would use it. I had assumed that a neuron could strongly learn some pattern's essential parts and learn to ignore nonessentials by observing that certain parts of a recurring pattern themselves don't recur. But that simply led me to wonder how a neuron bank would decide whether to allocate two or more neurons for pattern variants or to allocate a single neuron with those variants ignored. There's still room to explore this concept further, as it seems almost intuitively like something our own brains would do.
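
    The unexplored "care" idea is easy to sketch in Python: each dendrite's match strength is multiplied by a care value from 0 to 1, so a dendrite with zero care can no longer help or hurt the neuron's match. The function names are hypothetical, and the confidence penalty is omitted to keep the example focused.

```python
def cared_strength(input_value: float, expectation: float, care: float) -> float:
    """Dendrite strength scaled by a hypothetical 'care' value from 0 to 1."""
    if not 0.0 <= care <= 1.0:
        raise ValueError("care must be between 0 and 1")
    return care * (1.0 - abs(input_value - expectation))


def neuron_strength(inputs, expectations, cares):
    return sum(cared_strength(x, e, c) for x, e, c in zip(inputs, expectations, cares))


expectations = [1.0, 1.0, -1.0]
cares = [1.0, 1.0, 0.0]          # this neuron has learned to ignore the third input
variant_a = [1.0, 1.0, -1.0]
variant_b = [1.0, 1.0, 1.0]      # differs only where the neuron doesn't care
print(neuron_strength(variant_a, expectations, cares) ==
      neuron_strength(variant_b, expectations, cares))  # True
```

    One neuron with a zero-care dendrite covers both variants; the open question above is how a bank would choose between that and spending two neurons on the two variants.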

    More to explore

    This is obviously not the end of this concept for me. I think one logical next area of exploration will be hierarchy. I also want to see whether, and what, even the current arrangement can learn when it is exposed to "real world data". Even with noise added, the truth is I'm just feeding this thing carefully crafted, strong patterns that seem of dubious relation to the messy sensory world we inhabit.

    I certainly welcome others to dabble in this concept as well. You can play with this sample program yourself. The .config file gives you control over a bunch of factors, you can supply your own source-patterns graphic, and the program's user interface is fairly easy to extend for other experiments. The NeuronBank class and all of its lower-level parts are very self-contained and independent of the UI, which means they can easily be applied in other ways without this or even any user interface. And the core code is surprisingly lightweight (only 3 classes) and heavily commented, so it should be easy to study and even reproduce in other environments.

    So we'll see what's next.

    The nuts and bolts of the algorithm

    I've tried to describe the concepts of the Pattern Sniffer demonstration program in plain English and with visuals, but it's worthwhile to go into more detail for readers interested in how this algorithm actually works. I'll ignore the UI and test program and focus exclusively on the neuron bank and its constituent parts.

    Following is a list of the classes and their essential public members:

    Next is the algorithm's behavior. Aside from basic maintenance like the .Brainwash() methods, there is really only one operation that the neuron bank and all its parts perform: each "moment", the input values are set and the neuron bank "ponders" them. Below is a pseudo-code summary of how it works, with all the methods and properties mashed into one chunk so the process reads linearly. Here's the short version:

        Loop endlessly
            
            Set values in Bank.Inputs (each value is a single floating point number from -1 to 1)
            
            Sub Bank.Ponder()
                For Each N in Me.Neurons
                    N.PonderStep1()  (Measure the strength of my own match to the current input.)
                Next N
                For Each N in Me.Neurons
                    N.PonderStep2()  (Adjust my confidence level and dendrite expectations.)
                Next N
            End Sub
            
            For Each N In Bank.Neurons
                Do something with N.MatchStrength
            Next N
            
        Continue looping
    

    And now the more detailed version, fleshing out PonderStep1() and PonderStep2():

        Loop endlessly
            
            Set values in Bank.Inputs (each value is a single floating point number from -1 to 1)
            
            Sub Bank.Ponder()
                Me.BestMatchValue = -Infinity  'Forget the previous moment's best match.
                For Each N in Me.Neurons
                    
                    Sub N.PonderStep1()
                        'Measure the strength of my own match to the current input.
                        'Add up all the dendrite strengths.
                        Strength = 0
                        For Each D in Me.Dendrites
                            Strength = Strength + D.MatchStrength
                            
                            Function D.MatchStrength() As Single
                                Input = ForNeuron.Bank.Inputs(Me.InputIndex)
                                Strength = 1 - AbsoluteValue(Input - m_Expectation)
                                Strength = Strength / 6
                                'Penalize strongly mismatched values.
                                If Strength < 0 Then
                                    Strength = Strength * ForNeuron.Confidence * 6
                                End If
                                Return Strength
                            End Function D.MatchStrength()
                            
                        Next D
                        'Divide the total to get the average dendrite strength.
                        Strength = Strength / DendriteCount
                        'Maybe I am the new best match.
                        If Strength > Bank.BestMatchValue Then
                            Bank.BestMatchValue = Strength
                            Bank.BestMatchIndex = Me.ListIndex
                        End If
                        Me.MatchStrength = Strength
                    End Sub N.PonderStep1()
                    
                Next N
                For Each N in Me.Neurons
                    
                    Sub N.PonderStep2()
                        'Adjust my confidence level and dendrite expectations.
                        If Me.ListIndex = Bank.BestMatchIndex Then
                            'I have the best match.
                            'Boost my confidence a little.
                            Me.Confidence = Me.Confidence + 0.8 * Me.MatchStrength
                            If Me.Confidence > 0.9 Then Me.Confidence = 0.9  'Maximum possible confidence.
                            For i = 0 To Me.Dendrites.Count - 1
                                D = Me.Dendrites(i)
                                Input = Bank.Inputs(i)
                                'How far away is this dendrite's value from what's expected?
                                Delta = Input - D.Expectation
                                'The more confident I am, the less I want to deviate from my current expectation.
                                Delta = Delta * (1 - Me.Confidence)
                                D.Expectation = D.Expectation + Delta
                            Next i
                        Else
                            'I don't have the best match.
                            'I should lose confidence more when no other neuron has a strong match.
                            Me.Confidence = Me.Confidence * 0.001 * (1 - Bank.BestMatchValue)
                            If Me.Confidence < 0.05 Then Me.Confidence = 0.05  'Minimum possible confidence.
                            For i = 0 To Me.Dendrites.Count - 1
                                D = Me.Dendrites(i)
                                Input = Bank.Inputs(i)
                                If Bank.BestMatchValue - Me.MatchStrength <= 0.1 Then
                                    'I must be pretty close to the current best match.
                                    'Get more random.
                                    D.Expectation = D.Expectation + RandomPlusMinus(0.05) * (1 - Me.Confidence)
                                Else
                                    'I don't strongly match the current input.
                                    'How far away is this dendrite's value from what's expected?
                                    Delta = Input - D.Expectation
                                    'The more confident I am, the less I want to deviate from current expectation.
                                    Delta = Delta * (1 - Me.Confidence)
                                    'Get a little closer to the current input value.
                                    D.Expectation = D.Expectation + RandomPlusMinus(0.00001) * Delta * 0.2
                                End If
                            Next i
                        End If  'Do I have the best match or no?
                    End Sub N.PonderStep2()
                    
                Next N
            End Sub
            
            For Each N In Bank.Neurons
                Do something with N.MatchStrength
            Next N
            
        Continue looping

    It might be entertaining to try to boil this down to a few lengthy mathematical formulas, but I usually find those more intimidating than helpful.
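For readers who would rather experiment than read formulas, here is a minimal Python transliteration of the detailed pseudocode above. It is a sketch, not the original Pattern Sniffer source: the class and method names are my own, initial expectations are assumed to be random in [-1, 1], confidence is assumed to start at the 0.05 floor, `random_plus_minus` stands in for RandomPlusMinus(), and the best match is assumed to be reset at the start of every moment. The numeric constants and update rules are copied as written.

```python
import random


class Neuron:
    """A neuron: one expectation per dendrite, plus a confidence level."""

    def __init__(self, input_count: int, rng: random.Random):
        # Assumption: expectations start random in [-1, 1]; confidence starts at the 0.05 floor.
        self.expectations = [rng.uniform(-1.0, 1.0) for _ in range(input_count)]
        self.confidence = 0.05
        self.match_strength = 0.0

    def dendrite_strength(self, value: float, expectation: float) -> float:
        # D.MatchStrength(): 1 - |input - expectation|, divided by 6 ...
        strength = (1.0 - abs(value - expectation)) / 6.0
        if strength < 0:
            # ... but a negative (mismatched) strength is multiplied back by 6 and by confidence.
            strength = strength * self.confidence * 6.0
        return strength

    def ponder_step1(self, inputs: list) -> None:
        # Average dendrite strength = how well I match the current input.
        total = sum(self.dendrite_strength(v, e) for v, e in zip(inputs, self.expectations))
        self.match_strength = total / len(self.expectations)


class NeuronBank:
    def __init__(self, input_count: int, neuron_count: int, seed=None):
        self.rng = random.Random(seed)
        self.neurons = [Neuron(input_count, self.rng) for _ in range(neuron_count)]
        self.best_index = -1
        self.best_value = float("-inf")

    def random_plus_minus(self, amount: float) -> float:
        # Stand-in for RandomPlusMinus(): uniform in [-amount, +amount].
        return self.rng.uniform(-amount, amount)

    def ponder(self, inputs: list) -> None:
        # Assumption: the best match is forgotten at the start of each moment.
        self.best_index, self.best_value = -1, float("-inf")
        for i, neuron in enumerate(self.neurons):
            neuron.ponder_step1(inputs)
            if neuron.match_strength > self.best_value:
                self.best_index, self.best_value = i, neuron.match_strength
        for i, neuron in enumerate(self.neurons):
            self.ponder_step2(i, neuron, inputs)

    def ponder_step2(self, i: int, n: Neuron, inputs: list) -> None:
        if i == self.best_index:
            # Winner: boost confidence (capped at 0.9), then move toward the input;
            # the more confident the neuron, the smaller the move.
            n.confidence = min(0.9, n.confidence + 0.8 * n.match_strength)
            for d, value in enumerate(inputs):
                n.expectations[d] += (value - n.expectations[d]) * (1 - n.confidence)
        else:
            # Loser: confidence update copied as written, floored at 0.05.
            n.confidence = max(0.05, n.confidence * 0.001 * (1 - self.best_value))
            close_to_winner = self.best_value - n.match_strength <= 0.1
            for d, value in enumerate(inputs):
                if close_to_winner:
                    # Within 0.1 of the best match: "get more random".
                    n.expectations[d] += self.random_plus_minus(0.05) * (1 - n.confidence)
                else:
                    # Copied as written: a tiny, randomly scaled nudge based on Delta.
                    delta = (value - n.expectations[d]) * (1 - n.confidence)
                    n.expectations[d] += self.random_plus_minus(0.00001) * delta * 0.2
```

Calling `ponder()` repeatedly with the same pattern (say, a 3 x 3 left bar encoded as 1 for black and -1 for white) should let one neuron lock onto it: its expectations converge on the pattern and its confidence climbs to the 0.9 ceiling, while the other neurons sit at the 0.05 floor.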

    4/21/2007 - Abstraction in neuron banks

    On an exhilarating walk with my wife, we discussed the subject of how to build on the lessons I learned from my Pattern Sniffer project and its "neuron bank", documented in my previous blog entry. There are loads of things to do and it was not obvious how to squeeze more value out of what little I've done so far. But it finally became apparent.

    One thing that I was not happy about with Pattern Sniffer is that the world it perceives is "pure": there is just one pattern to perceive at a time. The world we perceive is rarely like this. As I walk along, I hear a bird singing, a car, and a lawn mower at the same time and am aware of each separately. Clearly, there is a lot of overlap in the raw information, yet I'm able to tease these sounds apart and be aware of all three at once. Pattern Sniffer could see two things going on in its tiny 5 x 5 pixel visual field, but it would see them as a single pattern. This is the kind of sterile world so many AI systems live in, because the experimenters don't know how to rise above this problem. Yet rising above it is a requirement if we want machines that can exist at the "perceptual level", and not just the "sensory level", of intelligence.

    I said in my previous blog entry that my neurons' dendrites had a "care" property, but that I hadn't made use of it yet. My vision was that it would play an important role in recognizing patterns in a more abstract way, but I didn't yet know how. I still need to get to work and document my results, but I wanted to record some of the ideas we came up with that I can now explore in practice.

    As we walked, I pointed at a car and explained that somehow, I'm able to "mask out" all the not-car parts of the scene and focus only on the car part. It's very hard to explain what that means, but I tried to relate it in terms of my neuron banks. Consider the "left bar" pattern:


    "Left Bar" pattern.

    What if we had a neuron in a bank that could recognize this pattern? Now let's say I have another neuron that's a copy of it, save for one thing: each dendrite that expects a white pixel now doesn't actually care what's in the white area. We'll represent "don't care" pixels (dendrites) with blue diagonal stripes, like so:


    "Left Bar" pattern with white pixels replaced by "don't care" pixels.

    In this case, I'm assuming the "care" property would be a numeric value from 0 (don't care) to 1 (care very much) that multiplies the strength of the match on that dendrite, which in turn contributes to the neuron's total match score. Now let's say the neuron bank is confronted by a perfect left bar pattern. Clearly, the neuron with the "solid" left bar pattern, with all dendrites having care = 1, will get a stronger match than the neuron with the "masked" version of the left bar pattern, because the don't-care dendrites will not contribute positively to the match score. So if only one neuron gets to "win" this matching game, the neuron with the solid left bar pattern will always win.


    An exact match trumps a masked match.

    But now let's say we showed our neuron bank an "L" shaped pattern. The "masked" left bar pattern is going to fare better than the "solid" left bar, like so:


    The "don't care" pixels don't get penalized by the "lower bar" part.

    Now let's say we also had solid and masked "bottom bar" neurons. Things get interesting with the "L" pattern. Let's say we even have a neuron that has learned the solid "L" pattern. The following illustrates these variations:


    The "L" neuron has the best match, followed by the masked left and bottom bar.
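The comparisons in these figures can be checked with a small Python sketch. It assumes black pixels are 1 and white pixels are -1 on a 5 x 5 grid, that care simply multiplies each dendrite's strength, and that each dendrite's strength is computed the way the Pattern Sniffer pseudocode does it (divided by 6, with mismatches penalized in proportion to a confidence, fixed here at an assumed 0.9). All names are illustrative.

```python
def grid(black_cells):
    """A 5 x 5 pattern as a flat, row-major list: 1.0 = black pixel, -1.0 = white."""
    return [1.0 if (r, c) in black_cells else -1.0 for r in range(5) for c in range(5)]


LEFT_CELLS = {(r, 0) for r in range(5)}
BOTTOM_CELLS = {(4, c) for c in range(5)}
LEFT_BAR = grid(LEFT_CELLS)
BOTTOM_BAR = grid(BOTTOM_CELLS)
L_SHAPE = grid(LEFT_CELLS | BOTTOM_CELLS)


def solid(pattern):
    """Every dendrite cares (care = 1)."""
    return pattern, [1.0] * len(pattern)


def masked(pattern):
    """Only dendrites expecting black care; the white area becomes "don't care" (care = 0)."""
    return pattern, [1.0 if p > 0 else 0.0 for p in pattern]


def match(inputs, neuron, confidence=0.9):
    """Pattern Sniffer-style dendrite strengths, each multiplied by care, averaged over all dendrites."""
    expectations, cares = neuron
    total = 0.0
    for value, expectation, care in zip(inputs, expectations, cares):
        strength = (1.0 - abs(value - expectation)) / 6.0
        if strength < 0:  # strong mismatches are penalized in proportion to confidence
            strength = strength * confidence * 6.0
        total += strength * care
    return total / len(expectations)


NEURONS = {
    "solid left": solid(LEFT_BAR),
    "masked left": masked(LEFT_BAR),
    "solid bottom": solid(BOTTOM_BAR),
    "masked bottom": masked(BOTTOM_BAR),
    "solid L": solid(L_SHAPE),
}


def scores(inputs):
    return {name: match(inputs, neuron) for name, neuron in NEURONS.items()}
```

Under these assumptions, the solid left bar wins on a pure left bar, while on the "L" the solid L neuron wins and both masked bars outscore both solid bars. The masked left bar also scores above zero on both the left bar and the L, whereas the solid left bar does so only on the left bar. The strong mismatch penalty matters here: with a plain, unpenalized 1 - |input - expectation| score, the solid left bar would get 21 matches and 4 mismatches on the L, for (21 - 4) / 25 = 0.68, and still beat the masked neuron's 5 / 25 = 0.2.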

    OK, so if we have a neuron that already has a strong match of the "L" pattern, what good are the masked left and bottom bar? Here's where having a neuron hierarchy comes in handy. If we are regularly seeing left bars, bottom bars, and L patterns, a higher level neuron bank could potentially see that the masked-pattern neurons match more things than the solid-pattern neurons do and thus find them to be more generally useful than the specific-pattern neurons. It could then reward them by encouraging them to gain confidence, even though they are not the best matches.

    One thing my current neuron banks assume is that there is only one best match and that only that one neuron gets rewarded for matching a pattern, while all the others may in fact be penalized. Yet this doesn't seem to fit how our brains work, at some level. Remember: I said I can hear and be aware of a bird singing, a car, and a lawn mower at the same time. That's what I want my software to do, too. If we're regularly seeing left bars and bottom bars, it may be that an "L" in the input is really just a left bar and a bottom bar seen together. That's another interpretation.

    Being able to explain the total input in terms of multiple perceived stimuli must be more "satisfying" to certain parts of our brains than alternative explanations that see the input as all part of a single cause that is not currently known. Being able to engender this could bring a machine a lot closer to the perceptual level of intelligence.

    So that's what I'm probably going to study next. One challenge will be figuring out how to deal with allowing multiple neurons to be rewarded for doing the right thing in a given moment without encouraging neurons to learn redundant information. We'll see.

    6/22/2007 - A hypothetical blob-based vision system

    As often happens, I was talking with my wife earlier this evening about AI. For a non-programmer, she's an incredibly good sport about it and really bright in her understanding of these often arcane ideas.

    Because of some questions she was asking, I thought it worthwhile to explain the basics of classifier systems. Without going into detail here, one way of summarizing them is to imagine representing knowledge of different kinds of things in terms of comparable features. She's a "foodie", so I gave the example of classifying cookies. As an engineer, you might come up with a long list of the things that define cookies, especially ones that can be compared across lots of cookies, such as "includes eggs" or a degree of homogeneity from 0 to 100%. Then you describe each kind of cookie in terms of all these characteristics and measures. Some cookie types will have a "not applicable" or "don't care" value for some of these characteristics. So when confronted with an object that has a particular set of characteristics, it's pretty easy to figure out which candidate object types best fit this new object and thus come up with a best guess. One could even add learning algorithms and such to deal with genuinely novel kinds of objects.
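To make the cookie example concrete, here is a tiny nearest-prototype classifier in Python. The feature names, cookie types and values are invented for illustration, `None` marks a "not applicable / don't care" field, and averaging only over the fields a prototype cares about is one assumed way to keep don't-care fields from counting for or against a match.

```python
# Hypothetical features, each scaled to 0..1 ("includes eggs" as 0 or 1, homogeneity as a fraction).
FEATURES = ["includes_eggs", "homogeneity", "chocolate"]

# Invented prototypes; None = "not applicable" or "don't care" for that cookie type.
COOKIE_TYPES = {
    "shortbread": {"includes_eggs": 0.0, "homogeneity": 0.95, "chocolate": 0.0},
    "chocolate chip": {"includes_eggs": 1.0, "homogeneity": 0.4, "chocolate": 1.0},
    "macaron": {"includes_eggs": 1.0, "homogeneity": 0.9, "chocolate": None},
}


def distance(obj, prototype):
    """Mean absolute difference over the features this prototype cares about."""
    used = [f for f in FEATURES if prototype.get(f) is not None]
    if not used:
        return float("inf")
    return sum(abs(obj[f] - prototype[f]) for f in used) / len(used)


def classify(obj):
    """Best guess: the cookie type whose prototype is closest to the measured object."""
    return min(COOKIE_TYPES, key=lambda name: distance(obj, COOKIE_TYPES[name]))
```

For example, an egg-based, fairly homogeneous object with a little chocolate (`{"includes_eggs": 1.0, "homogeneity": 0.88, "chocolate": 0.3}`) comes out as "macaron", because that prototype ignores the chocolate field entirely.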

    I explained classifier systems to my wife in part to show that they are incomplete. Where does the list of characteristics of the cookie in question come from? It's not that classifiers aren't useful, but that they lack the thing almost every AI system made to date lacks: a decent perceptual faculty. Such a system could have cameras, chemical analyzers, crush sensors, and all sorts of things to generate raw data, and that might give us enough characteristics to classify cookies. But what happens when the cookie is on a table full of food? How do we even find it? AI researchers have been taking the cookie off the table and putting it on the lab bench for their machines to study for decades, and it's a cheap half-solution.

    Ronda naturally asked whether it would be possible to have the machine come up with the fields in the "vectors" (I prefer to think in terms of matrices or database tables) on its own, instead of having an engineer hand-craft those fields. Clever. Of course, I've thought about that, and other AI researchers have gone there before. We took the face recognition problem as a new example. I explained how engineers define key points on faces, craft algorithms to find them, and then build a vector of numbers that represent the relationships among those points as found in pictures of faces. The vector can then be used in a classifier system. OK, that's the same as before. So I imagined the engineer instead coming up with an algorithm to look for potential key points in a set of pictures of 100 people's faces. It could then see which ones appear to be repeated in many or most faces and throw away all others. The end result could be a map of key points that are comparable. Those are the fields in the table. OK. So a program can both define the comparable features of faces and classify all the faces it has pictures of. Pretty cool.
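One simple way to picture the "keep only the key points that repeat across most faces" step, assuming each face image has already been normalized (say, scaled so the face fills a unit square) and that some detector has produced candidate points: bucket the candidates into coarse grid cells and keep the cells that turn up in enough faces. `common_key_points`, the 0.1 cell size and the 80% threshold are all hypothetical choices, not a method from the discussion above.

```python
from collections import Counter


def common_key_points(faces, min_fraction=0.8, cell=0.1):
    """faces: one list of (x, y) candidate points per face, normalized to 0..1.

    Returns the grid cells that hold a candidate in at least `min_fraction` of the faces;
    these cells play the role of the comparable fields in the table.
    """

    def quantize(point):
        return (round(point[0] / cell), round(point[1] / cell))

    counts = Counter()
    for points in faces:
        counts.update({quantize(p) for p in points})  # count each cell at most once per face
    threshold = min_fraction * len(faces)
    return sorted(c for c, n in counts.items() if n >= threshold)
```

Lowering `min_fraction` keeps points that are common but not universal; raising it keeps only the most reliable ones.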

    But then, there's that magic step, again. We had 100 people sit in a well-lit studio and had them all face forward, take off their hats and shades, and so on. We spoon fed our program the data and it works great. Yay. But what about the real world? What about when I want to find and classify faces in photographs taken at Disneyland? That's a new problem and starts to bring up the perception question all over again.

    At some point, as we were talking over all this, I put the question: let's say your practical goal for a system is to be able to pick out certain known objects in a visual scene and keep track of them as they move around. How can you do this? I was reminded of the brilliant observations Donald D. Hoffman laid out in his Visual Intelligence book, which I reviewed on 5/11/2005. Among other things, Hoffman observed that, given a simple drawing representing an outline of an object, it seems we look for "saddle points" and draw imaginary lines to connect them and end up with lots of simpler "blob" shapes. I went further to suggest that this could be a way to segment a complex shape in such a way that it can be represented by a set of ellipses. The figure below shows a simple example:

    I drew a similar outline in a sandbox at a playground we were walking by and asked her to segment it using these fairly simple rules. Naturally, she got the concept easily. From there, we asked how you could get to the clean line drawings to do the segmenting. After all, vision researchers have been banging their heads against the wall trying to come up with clean segmentation algorithms like this for decades.

    I described the most common trick in vision researchers' arsenal: searching static images for sharp contrasts and approximating lines and curves along them. Not surprisingly, these don't often yield closed loops. That's why I had experimented with growing "bubbles" (see my blog entry and project site) to ensure that there were always closed loops, on the assumption that they would be easier to analyze later than disconnected lines. Following is an illustration:

    I found that somewhat unsatisfying because it relies very much on smooth textures, whereas life is full of more complicated textures that we naturally perceive as continuous surfaces. So we batted around a similar idea in which we could imagine "planting" small circles on the image and growing them so long as the image included within the circle is reasonably homogeneous, from a texture perspective. Scientists are still struggling to understand how it is we perceive textures and how to pick them out. I like the idea of simply averaging out pixel colors in a sample patch to compare that to other such patches and, when the colors are sufficiently similar, assume they have the same texture. Not a bad starting point. So imagine segmenting a source image into a bunch of ellipses, where each ellipse contains as large a patch of one single texture as reasonably possible. Why bother?
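A minimal sketch of the "plant a circle and grow it while the texture stays the same" idea, using the simple averaged-color test described above. The circle grows one pixel of radius at a time, and growth stops when the average color of the newly added ring drifts too far (Euclidean RGB distance) from the average color of the seed patch, or when the circle would leave the image. The function names, the ring-based comparison and the tolerance of 30 are assumptions.

```python
import math


def mean_color(image, cx, cy, r_outer, r_inner=None):
    """Average RGB of pixels within r_outer of (cx, cy), optionally excluding those within r_inner."""
    total = [0.0, 0.0, 0.0]
    count = 0
    for y, row in enumerate(image):
        for x, color in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 > r_outer ** 2:
                continue
            if r_inner is not None and d2 <= r_inner ** 2:
                continue
            for i in range(3):
                total[i] += color[i]
            count += 1
    return None if count == 0 else tuple(t / count for t in total)


def grow_circle(image, cx, cy, tolerance=30.0):
    """Grow a circle from (cx, cy) while each new ring's mean color stays close to the seed's."""
    height, width = len(image), len(image[0])
    seed = mean_color(image, cx, cy, 1)
    radius = 1
    while True:
        nxt = radius + 1
        # Stop if the larger circle would no longer fit inside the image.
        if cx - nxt < 0 or cy - nxt < 0 or cx + nxt >= width or cy + nxt >= height:
            return radius
        ring = mean_color(image, cx, cy, nxt, r_inner=radius)
        if ring is None or math.dist(ring, seed) > tolerance:
            return radius
        radius = nxt
```

Comparing only the newest ring, rather than the whole circle, keeps a texture boundary from being diluted by all the pixels already inside; on a 30 x 30 image split into red (x < 10) and blue halves, a circle seeded at (6, 15) stops at radius 4, just as it reaches the blue side.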

    These ellipses -- we'll call them "blobs" for now -- carry usable information. We switched gears and used hand tools as our example. Let's say we want to learn to recognize hammers and wrenches and such and be able to tell one from another, even when there are variations in designs. Can we get geometric information to jibe with the very one-dimensional nature of databases and algebraic scoring functions? Yes. Our blobs have metrics. Each blob has X / Y coordinates and a surface area; we'll call it its "weight". So maybe in our early experiments, we write algorithms to learn how to describe objects' shapes in terms of blobs, like so:

    Step 3 is interesting, in that it involves a somewhat computation-he