Why does the Uncanny Valley exist? Why is it so difficult to cross? Why is it that all the tricks we use for making CG things like rocks, houses, and cars do not work for faces? The usual answer is something along the lines of “Faces are hard” or “As human beings, we are experts at understanding faces”. That just tells us the symptom, not the cause. What is the real, physiological reason why the Uncanny Valley exists?
In my opinion, we can understand this problem by looking at Visual Agnosia and Prosopagnosia. Let’s start with Visual Agnosia. Here is a quick video about a man named Kevin Chappell who has Visual Agnosia. Basically, he can’t understand objects. He can see the building blocks of objects such as lines, colors, and shapes, but he cannot put them together to recognize the objects. For example, he can see a thing that is long and silver but has no ability to recognize what it is. He has to use feel and context to realize that the object is a fork. However, he can somehow recognize faces.
I would recommend the whole video, but you should definitely see these parts:
- 0m40s: Kevin discusses what he sees.
- 2m41s: He can easily recognize faces in photos but cannot make out the shapes on the vase.
However, Prosopagnosia is the reverse problem. More commonly known as “Face Blindness”, people with Prosopagnosia can make out objects clearly but are unable to recognize faces.
The woman in the Prosopagnosia video performs a test at 1m00s to determine if she can recognize her mother by looking only at the face, but she cannot do it. She can recognize her mother from her clothes but not from her face.
You can also take a test to see if you are faceblind: https://www.faceblind.org/facetests/index.php
The average is 85% and I got a 78%. If you score below 50% you might be faceblind. It’s actually harder to recognize people than you think when there is no context behind it. In the beach volleyball crowd, it is a common occurrence for two people to start up a conversation in a bar and realize 5 minutes later that they already met while wearing volleyball gear. For better or worse, most people recognize me though because I’m the only 6’4” volleyball-playing redhead in a 5-mile radius (context).
The point is, our visual cortex uses a completely different algorithm to process faces than it uses for all other objects. Many people think that our vision works like an LCD screen where the eye records a rectangle of pixels that gets sent to our brains. Rather, the data from the rods and cones gets passed to our retinal ganglion cells. The retinal ganglion cells perform contrast detection and send the data through our optic nerve to our visual cortex. The visual cortex applies simple shape detection along with movement tracking and color recognition. That “feature” data then gets sent for semantic analysis. As always, Wikipedia has a great article: Cognitive Neuroscience of Visual Object Recognition.
However, any information that involves faces goes to the Fusiform Face Area (Wikipedia article: http://en.wikipedia.org/wiki/Fusiform_face_area). The FFA is the area of the brain which, when damaged, causes Prosopagnosia.
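To make the contrast-detection step a bit more concrete, here is a minimal Python sketch. It assumes the common textbook simplification that a retinal ganglion cell's center-surround response can be approximated by a difference of two Gaussian blurs. The function names and the sigma values are made up for illustration, and this says nothing about how the FFA itself works.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(image, sigma):
    """Separable Gaussian blur of a 2-D grayscale image, with edge padding."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(image.astype(float), radius, mode="edge")
    # Blur along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def center_surround(image, sigma_center=1.0, sigma_surround=2.0):
    """Difference of Gaussians: narrow 'center' blur minus wide 'surround' blur.

    Flat regions cancel out to roughly zero; only changes in brightness
    (contrast) produce a response. The sigma defaults are arbitrary.
    """
    return blur(image, sigma_center) - blur(image, sigma_surround)

# Hypothetical test image: dark left half, bright right half.
step = np.zeros((32, 32))
step[:, 16:] = 1.0
response = center_surround(step)
```

In this sketch the response is essentially zero in the flat areas and only becomes non-zero near the dark-to-bright boundary, which is the sense in which the signal leaving the eye is about contrast rather than raw pixels.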
In the computer graphics world we are pretty good at faking things. Good artists have been trained to analyze real world objects and try to create the most realistic looking facsimile using the minimum amount of time and resources possible. That includes both content creation time and rendering time. Since time is always a constraint, good artists have learned that we want to do the minimum amount of work possible to trick our visual system. In computer graphics, we are experts at fooling our visual cortex. That’s our one and only job.
But faces are handled by a completely different section of our brain: The Fusiform Face Area. My theory is that the Uncanny Valley feeling happens when the Fusiform Face Area has a mismatch with the rest of the visual system. Your visual system thinks the scene is real but the FFA is telling you that something is wrong. To solve this problem, all we need is a better understanding of how the FFA works. We need to find out what is and is not important to the FFA, and if we can do that then we should be able to solve the Uncanny Valley.
Going big picture for a second, we have evolved for millions of years to have this separate, special, dedicated area of brain functionality in the FFA. So it probably is not doing the exact same thing as the rest of our visual system. If it were doing the same thing, it would not have evolved into a separate region.
For example, something about our FFA is hardwired to detect faces that are upright (as opposed to inverted). This is known as the Thatcher Effect (Wikipedia).
The inverted images both look reasonable at a glance. If you look for a while you can probably use context to realize that the right one is a little bit off. But when you see them in the correct orientation:
It looks obviously, terrifyingly wrong. That image demonstrates your FFA rejecting the image, which puts it in the Uncanny Valley. The point is that the FFA is not doing the same thing as the rest of your visual system with higher quality. Rather, the FFA is using fundamentally different algorithms than the rest of your visual cortex.
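If you want to build your own Thatcher Effect test image, the manipulation is simple: turn the eyes and mouth upside down in place, then turn the whole face upside down. Here is a rough Python sketch of that idea. The function names and the (top, left, height, width) region format are my own assumptions for illustration, and you would have to pick the eye and mouth rectangles by hand for a real photo.

```python
import numpy as np

def turn_upside_down(image):
    """Rotate a whole image array by 180 degrees."""
    return np.rot90(image, 2)

def thatcherize(image, regions):
    """Rotate each rectangular region by 180 degrees in place.

    `regions` is a list of (top, left, height, width) tuples, a made-up
    format for this sketch, typically one rectangle per eye and one for
    the mouth. The input image is left untouched.
    """
    out = image.copy()
    for top, left, height, width in regions:
        patch = out[top:top + height, left:left + width]
        # Copy before assigning so we never write into the view we read from.
        out[top:top + height, left:left + width] = np.rot90(patch, 2).copy()
    return out

# Tiny stand-in for a face image so the example runs without any files.
face = np.arange(36).reshape(6, 6)
features = [(1, 1, 2, 2)]  # hypothetical "eye" rectangle

thatcherized = thatcherize(face, features)        # obviously wrong when upright
inverted = turn_upside_down(thatcherized)         # looks plausible at a glance
```

Note that in the inverted version the flipped features end up right side up again, which may be part of why the upside-down image looks reasonable until you rotate it back.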
I don’t think about the Uncanny Valley as making “Higher Quality Faces”. Rather, I think about it in terms of “FFA Rejection”. Our CG faces do not necessarily need to be more accurate or more detailed. Instead we need to figure out which triggers cause the FFA to reject the image. Then all we have to do is fake those triggers.
So that leads to the question: What should we do? We could just make all our faces upside down but your art director will probably veto that idea. Why is the FFA rejecting our CG images? What is the missing thing? You probably already know my answer: It rhymes with “Stud Toe”.