Google’s Tango prototype is a handset that can map 3D spaces simply by being walked through them. That is possible because at its heart is the Movidius Myriad 1 vision processing unit, or VPU. According to Movidius, this VPU (not to be confused with a video processing unit) is about ten times faster at such workloads and bears very little resemblance to the GPUs, or graphics processing units, with which we are all familiar.
The VPU sits between the camera and the application processor, in contrast to the GPU, which resides between the application processor and the display. That positioning, however, is only the beginning of the differences: as Movidius defines it, the VPU is an essentially new component, one that will bring astonishing visual awareness to the camera itself.
Movidius CEO Remi El-Ouazzane believes that cameras, especially mobile ones, are currently passing through a revolution he calls computational imaging, which brings entirely new functionality. Movidius, he explains, is developing vision processing units that function much like the visual cortex of the brain. The aim is to give devices the same kind of awareness and realism that the eye-brain combination gives the human body.
Look closely at graphics processing units and most turn out to be mere bit-bangers: vector processors performing identical operations on every pixel of the screen at extraordinarily high speed. The VPU, by contrast, first interprets the data coming from the camera – much as the eye and visual cortex do – before sending it on to the application processor. Instead of raw pixels, the application processor receives high-level metadata: where an object begins and where it ends, which objects are in front of others, what kind of object each is, where its shadow falls, the trajectory it is following, and dozens of other such details. This not only makes the application processor's work markedly easier, it also enables novel applications that no one could have attempted before.
According to the company's CEO, the Movidius approach is to convert all the photons captured by the camera into metadata that expresses an understanding of the scene. Depending on the application, that metadata can then be used in any number of ways; initially, though, the focus is on providing total visual awareness of the most relevant details in the scene.
Others have already explored the algorithms needed to perform this kind of analysis and found that it takes supercomputers consuming megawatts of power. Movidius, however, boldly claims it can equal or even exceed that level of visual awareness while consuming only a few watts, and sometimes only a fraction of a watt.
Movidius claims a novel micro-architecture of cores optimized entirely for computational imaging. This involves structuring the delays between pipeline stages and an innovative memory fabric that maximizes data locality. Because this drastically reduces the need for external memory accesses, it also cuts power consumption substantially.