Computers are getting smarter every day. It is not yet clear whether that intelligence is heading towards anything like what the Terminator movies depict, but computers already beat humans at chess, poker and Jeopardy. The next hurdle they appear to have crossed is image recognition: Microsoft claims to have programmed a computer that can beat humans at recognizing images.
Although the final competition will be held on December 17, 2015, there are already claims that computers are better than humans at visual recognition. The ImageNet Large Scale Visual Recognition Challenge will judge the final competition. The first claim about computers beating humans came from Microsoft, which reported that while humans made errors on 5.1% of images, its computer failed on only 4.94%. Five days after Microsoft announced its feat, Google announced that it had bettered the Microsoft result by 0.04 percentage points. The competition is getting fiercer every day.
Since 2010, more than 50 institutions have taken part every year in the image recognition competition. ImageNet runs the competition, which covers hundreds of object categories and several million example images. So far humans have scored highest, but this year a computer is expected to take the crown. Contestants typically use the latest deep learning algorithms. Derived from different types of artificial neural networks, these algorithms mimic the way the human brain works to varying degrees.
Although no contestant publishes the exact code, each provides a paper that freely describes the algorithm in great detail, in a spirit similar to open source, explaining its advantages and why it is expected to work so well. As Microsoft explains in its paper, it uses deep convolutional neural networks (CNNs) that have 30 weight layers. Google has revealed that it uses batch normalization, a technique that keeps neurons from saturating during initialization.
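To make the idea of batch normalization concrete, here is a minimal sketch, not Google's implementation. It assumes the textbook formulation: each feature is shifted to zero mean and scaled to unit variance across a mini-batch, then rescaled by a learnable `gamma` and `beta`. The function name, the sample values and the `eps` constant are illustrative assumptions.

```python
import math


def batch_normalize(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it.

    gamma and beta stand in for the learnable scale and shift; eps is a
    small constant (assumed value) that avoids division by zero.
    """
    n = len(values)
    mean = sum(values) / n
    # Biased (population) variance over the mini-batch.
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# Hypothetical activations of one neuron over a mini-batch of five images.
batch = [120.0, 80.0, 100.0, 140.0, 60.0]
normalized = batch_normalize(batch)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print(norm_mean, norm_var)
```

Because the normalized values stay centered with roughly unit spread, the inputs to the next layer are less likely to drift into the flat, saturated region of the activation function.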
Conventionally, neural units are designed by hand and kept fixed during training. Microsoft has deviated from this path and made the neural units smarter by making their form more flexible. According to the principal researcher at the Visual Computing Group of Microsoft Research Asia, the particular form of each neural unit is learned through end-to-end training. Introducing these smarter units improves the model considerably.
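A small sketch may help show what a unit with a flexible, learned form looks like. It assumes the flexible units are the parametric rectified linear units the article mentions: a rectifier whose slope `a` for negative inputs is a trainable parameter rather than a fixed constant. The toy input, target, learning rate and starting slope are illustrative assumptions, not values taken from Microsoft's setup.

```python
def prelu(x, a):
    """Parametric rectified linear unit: identity for x > 0, slope a otherwise."""
    return x if x > 0 else a * x


def prelu_grad_a(x):
    """Derivative of prelu(x, a) with respect to the slope a."""
    return 0.0 if x > 0 else x


# Toy end-to-end training: learn the slope so that prelu(-2.0, a) == -1.0.
x, target = -2.0, -1.0
a = 0.25              # assumed starting slope
learning_rate = 0.05  # assumed step size

for _ in range(50):
    error = prelu(x, a) - target
    # Chain rule on the squared error (error ** 2) with respect to a.
    a -= learning_rate * 2.0 * error * prelu_grad_a(x)

print(a)  # converges towards 0.5
```

A conventional rectifier corresponds to fixing `a = 0`; letting training adjust `a` is what makes the unit's shape part of what the network learns.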
However, the reason current neural networks can beat human experts lies in Microsoft's deep learning algorithm. Under the standard route, such an algorithm initializes and trains on 1.2 million training images and verifies its learning on 50,000 validation images. For the final application of its learning, it uses 100,000 test images from the main image database. However, Microsoft did not actually follow this standard route.
Because training very deep neural networks is rather difficult, Microsoft used a robust initialization method. Like other contestants, Microsoft bought access to Nvidia's arrays of graphics processing units. However, it also bought and configured its own supercomputer. It simulated parametric rectified linear neural units, and that finally helped it beat the human experts at image classification.
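The article does not spell out the initialization method, so the following is only a sketch under an assumption: that weights are drawn from a zero-mean Gaussian whose standard deviation is sqrt(2 / ((1 + a^2) * fan_in)), a variance-scaling rule commonly used for rectifier networks, where `a` is the negative-side slope of the unit (`a = 0` for a plain rectifier). The function name, layer sizes and seed are hypothetical.

```python
import math
import random


def rectifier_init(fan_in, fan_out, a=0.0, seed=0):
    """Draw a fan_out x fan_in weight matrix scaled for rectifier units.

    Assumed rule: std = sqrt(2 / ((1 + a^2) * fan_in)), chosen so the
    signal variance stays roughly constant from layer to layer.
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 / ((1.0 + a * a) * fan_in))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


# Hypothetical layer with 256 inputs and 128 outputs.
weights = rectifier_init(fan_in=256, fan_out=128)

flat = [w for row in weights for w in row]
empirical_std = math.sqrt(sum(w * w for w in flat) / len(flat))
print(empirical_std, math.sqrt(2.0 / 256))
```

Scaling the spread of the weights to the number of inputs keeps activations from shrinking or exploding as they pass through many layers, which is the difficulty a robust initialization is meant to address.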