Although speech is our main way of interacting with other human beings, talking naturally with machines is still largely a distant dream. Until recently, fluent human-to-machine conversation was reserved for science fiction movies. However, many companies are laying the groundwork for turning that vision into reality. Speech recognition software such as Apple’s Siri, introduced on the iPhone 4S, is already quite popular. Yet several challenges remain, and many kinks still need to be smoothed out in voice authentication and voice-activated commands.
VocalZoom, a startup based in Israel, adapts military technology to develop proprietary optical sensors that map the vibrations emanating from people as they speak. Its HMC (human-to-machine) sensor is coupled with the voice signal from a conventional acoustic microphone, and the combined output is translated into a machine-readable sound signal. According to the company, the result is speech recognition that is highly accurate and unparalleled on the market today.
VocalZoom approached speech recognition in an entirely different way. The company came across a military technology commonly used for eavesdropping: a laser microphone that senses the vibrations of window panes. VocalZoom's designers reasoned that if windows vibrate when people speak, other surfaces must vibrate too. Their research led them to the vibrations that the voice produces in facial skin. They then created a special low-cost sensor, small enough to measure these facial vibrations much as a microphone measures sound. Their speech recognition system combines microphones, audio processors and this sensor.
The special sensor is actually an interferometer that measures distance and velocity. It can therefore serve as a microphone by measuring audio vibrations, but it can also be used for 3D imaging, proximity sensing, biometric authentication, tap detection and accurate heart-rate detection. This multifunction sensor also has a very wide dynamic range, which makes it useful in many other applications, for instance measuring vibrations in engines, industrial printers or turbines.
A typical sensor for measuring distance and velocity, such as a time-of-flight sensor, uses a separate emitter and detector. VocalZoom's designers, however, use a single laser for both purposes. As a result, their interferometer is an extremely low-cost design with practically no additional optical components. The trade-off was noise: the team had to develop noise-reduction methods before the sensor could be used with speech recognition systems.
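For a velocity-measuring interferometer, the standard relation between the Doppler shift of the reflected light and the target's line-of-sight velocity is f_D = 2v / λ; the factor of 2 appears because the light travels to the target and back. The same round trip means that each half-wavelength of target movement produces one interference fringe. Using a single laser as both source and detector, as described above, is commonly known as self-mixing interferometry. The source does not state VocalZoom's laser wavelength or signal processing, so the 850 nm value below is a hypothetical assumption and the function names are illustrative.

```python
# Minimal sketch of the Doppler relations behind a velocity-measuring interferometer.
# ASSUMPTION: the 850 nm wavelength in the example is hypothetical, not VocalZoom's spec.

def doppler_velocity(doppler_shift_hz: float, wavelength_m: float) -> float:
    """Line-of-sight target velocity (m/s) from the measured Doppler shift (Hz).

    Light reflected from a moving target is shifted by f_D = 2 * v / wavelength,
    so v = f_D * wavelength / 2.
    """
    return doppler_shift_hz * wavelength_m / 2.0


def displacement_per_fringe(wavelength_m: float) -> float:
    """Target displacement (m) that produces one full interference fringe.

    The round trip doubles the optical path change, so one fringe = wavelength / 2.
    """
    return wavelength_m / 2.0


if __name__ == "__main__":
    wavelength = 850e-9  # hypothetical near-infrared laser wavelength (m)
    print(f"1 kHz Doppler shift -> {doppler_velocity(1_000.0, wavelength):.3e} m/s")
    print(f"one fringe = {displacement_per_fringe(wavelength) * 1e9:.0f} nm")
```

Under these assumptions a 1 kHz shift corresponds to about 0.43 mm/s, and one fringe is 425 nm, so a vibration of tens of nanometers produces only a fraction of a fringe and must be resolved from small changes within it.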
VocalZoom's noise-reduction methods rely on its optical sensors to improve speech recognition. The company has reached the point where, even in environments with heavy background noise, it can bring the error rate of speech recognition or voice authentication down to a very low level.
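One simple way an optical channel could aid noise reduction is as a voice activity detector: because the laser measures the speaker's skin rather than the air, its signal stays near zero while that person is silent, however loud the room is. The sketch below gates acoustic-microphone frames on the energy of the optical signal. It is a generic, hypothetical illustration; the source does not describe VocalZoom's actual algorithm, and the function names, frame length, threshold and attenuation are all assumptions.

```python
# Hypothetical sketch: use the optical (skin-vibration) channel as a voice activity
# detector to attenuate acoustic-microphone frames when the speaker is silent.
# ASSUMPTION: frame length, threshold and attenuation are illustrative values only.

def frame_energy(samples: list[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)


def optical_gate(acoustic: list[float], optical: list[float],
                 frame_len: int, threshold: float,
                 attenuation: float = 0.1) -> list[float]:
    """Scale each acoustic frame by 1.0 if the optical frame shows speech, else by `attenuation`."""
    if len(acoustic) != len(optical):
        raise ValueError("acoustic and optical signals must be time-aligned and equal length")
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    out: list[float] = []
    for start in range(0, len(acoustic), frame_len):
        a_frame = acoustic[start:start + frame_len]
        o_frame = optical[start:start + frame_len]
        # Skin vibrates only when this speaker talks, so optical energy flags speech
        # even when the acoustic channel is full of background noise.
        gain = 1.0 if frame_energy(o_frame) > threshold else attenuation
        out.extend(s * gain for s in a_frame)
    return out


if __name__ == "__main__":
    noisy_mic = [0.8, -0.7, 0.9, -0.6, 0.5, -0.4, 0.6, -0.5]
    skin = [0.0, 0.0, 0.001, 0.0, 0.3, -0.3, 0.2, -0.2]
    print(optical_gate(noisy_mic, skin, frame_len=4, threshold=1e-3))
```

Here the first frame is attenuated because the optical channel is essentially flat, while the second passes through unchanged. A practical system would use smoother gains and spectral methods rather than a hard on/off gate.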
In practice, the laser is aimed at the face of the person talking. It measures vibrations on the order of tens to hundreds of nanometers, far too small for ordinary sensors to pick up. Because the laser measures the skin itself with such precision, surrounding noise does not interfere with these micro-measurements, which are then converted into clear audio.
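Vibrations of tens to hundreds of nanometers are tiny even at audio frequencies: a 100 nm vibration at 200 Hz has a peak velocity of only 2π × 200 Hz × 100 nm ≈ 0.13 mm/s. If the sensor reports velocity, as a Doppler-based interferometer would, the skin displacement can be recovered by integrating over time. The source does not say whether VocalZoom works with velocity or displacement; the pure tone, amplitude, sample rate and function names below are hypothetical.

```python
import math

# Hypothetical sketch: recover skin displacement from sampled velocity by
# trapezoidal integration. ASSUMPTION: a pure 200 Hz vibration of 100 nm
# amplitude sampled at 48 kHz; real speech and VocalZoom's processing differ.

def sine_vibration_velocity(amplitude_m: float, freq_hz: float,
                            sample_rate_hz: float, n_samples: int) -> list[float]:
    """Velocity samples (m/s) of x(t) = A*sin(2*pi*f*t), i.e. v(t) = 2*pi*f*A*cos(2*pi*f*t)."""
    omega = 2.0 * math.pi * freq_hz
    return [omega * amplitude_m * math.cos(omega * k / sample_rate_hz)
            for k in range(n_samples)]


def integrate_velocity(velocity: list[float], dt: float) -> list[float]:
    """Trapezoidal integration of velocity (m/s) to displacement (m), starting from 0."""
    displacement = [0.0]
    for v0, v1 in zip(velocity, velocity[1:]):
        displacement.append(displacement[-1] + 0.5 * (v0 + v1) * dt)
    return displacement


if __name__ == "__main__":
    fs = 48_000.0
    vel = sine_vibration_velocity(100e-9, 200.0, fs, n_samples=2_400)  # 50 ms of signal
    disp = integrate_velocity(vel, 1.0 / fs)
    print(f"peak velocity:     {max(vel):.3e} m/s")
    print(f"peak displacement: {max(disp) * 1e9:.1f} nm")
```

Running it reports a peak displacement of about 100 nm, matching the simulated vibration. In a real sensor, integration also accumulates drift from any offset in the velocity signal, which is one reason such measurements need careful filtering.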
Very soon, you will be able to use the optical laser technology of VocalZoom together with Siri or Google Voice and other voice-recognition applications for a wholly different experience.