Impossible Code

5. December 2008 by thebeebs

One thing that amazes me about the human brain is its ability to recognise objects, and how rubbish computers are at the same task. I’ve been pondering this for the past hour or two, ever since one of the guys at work showed me an application on his phone that photographs a barcode and then uses a shopping web service to return a list of prices for that item in your local area.

Without a doubt that's a very cool and clever piece of code. It’s just a shame that the computer needs the barcode. Wouldn’t it be incredible if you could photograph any item and the computer could tell you what it was? It got me thinking about the applications that would be possible with this technology.

One application that sprang to mind was one that could take a photograph of your lunch and tell you how many calories it contained. The idea was inspired in part by Dr Yoshiro Nakamats, a man who was awarded the 2005 Ig Nobel Prize for Nutrition for photographing and then analyzing every meal he had consumed during a period of 34 years. Just imagine if you could actually get this to work: a user could photograph everything they ate in a day and a computer could calculate exactly how much exercise was required to keep them trim.

The application I’m proposing is far more complicated than the barcode application. With the barcode application the computer needs to:

Recognise a white box
Count the pixel distance between the black lines contained in the white box
Run it through a fuzzy algorithm that could determine the barcode number
Send the number to a web-service
Return the results from the web service to screen
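
As a rough illustration of the pixel-counting step above, here is a minimal Python sketch. It assumes you already have one horizontal row of grayscale pixel values (0 to 255) taken from inside the white box. The names run_lengths and to_modules and the threshold of 128 are my own assumptions for illustration, not how the phone application actually works, and a real decoder would still need to map the resulting widths onto digits.

```python
from itertools import groupby


def run_lengths(scanline, threshold=128):
    """Collapse a row of grayscale pixels into (is_black, width_in_pixels) runs.

    The threshold of 128 is an assumed cut-off between "black" and "white".
    """
    bits = [pixel < threshold for pixel in scanline]
    return [(is_black, len(list(group))) for is_black, group in groupby(bits)]


def to_modules(runs):
    """Express every run as a whole number of the narrowest run's width.

    This is the "fuzzy" part: widths are rounded, so a bar that is slightly
    too wide or too narrow still snaps to the nearest whole unit.
    """
    unit = min(width for _, width in runs)
    return [(is_black, max(1, round(width / unit))) for is_black, width in runs]


# A made-up scanline: white margin, thin bar, thin gap, thick bar, white margin.
line = [250] * 6 + [10, 20] + [240, 255] + [0] * 4 + [255] * 6
modules = to_modules(run_lengths(line))
```

Working in units of the narrowest bar rather than raw pixels means the same barcode gives the same pattern whether it was photographed close up or from further away.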

My application would need to:

Identify an object with no prior knowledge of the composition, context, white balance or lighting
Identify which items were food and which were not
Look around the picture for an object that is always a constant size, then use it to calculate the size of the food item (or items)
Call a web-service to look up calorie details based upon size and type of the food item
Return the results from the web service to screen
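
The size step in the list above is the one part that is plain arithmetic once the hard recognition work is done. Here is a minimal Python sketch, assuming the reference object and the food are roughly the same distance from the camera and photographed more or less head-on. The function names and the measurements in the example are made up for illustration.

```python
def pixels_per_cm(reference_width_px, reference_width_cm):
    """Work out the image scale from an object whose real width is known."""
    return reference_width_px / reference_width_cm


def estimate_size_cm(width_px, height_px, scale):
    """Convert a food item's size in pixels to centimetres using that scale."""
    return width_px / scale, height_px / scale


# Hypothetical numbers: a 2.5 cm wide reference object spans 125 pixels,
# and the food item spans 600 x 400 pixels in the same photograph.
scale = pixels_per_cm(125, 2.5)
food_width_cm, food_height_cm = estimate_size_cm(600, 400, scale)
```

This only gives a flat width and height. Turning that into a portion size, and then into calories, would need some estimate of depth or weight, which a single photograph does not directly provide.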

The first 3 items in this list sound impossible… but I like to think nothing is impossible; it’s all just a matter of time and money. If you need proof that anything is possible, just take a look at this: http://uk.youtube.com/watch?v=5fAn5A0HbhU

Anyway, back to the problem… Suppose you had 1 million photographs of a hamburger. Would that provide enough information for you to determine whether a new photograph contained a hamburger? Scale that up and collect millions and millions of pictures of food, all linked to food descriptions: would it be possible to determine whether a new photograph was close to any of the existing images? I think it would… The problem could be solved not by trying to perform very complex image identification and classification, but by collecting enough pictures and finding a way of encoding them so that it was possible to compare the millions, perhaps billions, of items.
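
To make "encoding and comparing" a bit more concrete, here is a toy Python sketch using a simple average hash: each picture becomes a short string of bits, and two pictures are compared by counting the bits that differ. This is just one well-known encoding picked for illustration, not a claim that it would be good enough for food. It assumes each picture has already been shrunk to a tiny grayscale grid, and the labels and pixel values are invented.

```python
def average_hash(pixels):
    """Encode a small grayscale grid as bits: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(a, b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_label(query_hash, library):
    """Return the label of the stored hash nearest to the query.

    library is a list of (label, hash) pairs built from labelled pictures.
    """
    return min(library, key=lambda entry: hamming_distance(query_hash, entry[1]))[0]


# Invented 2 x 2 "pictures" standing in for downscaled photographs.
library = [
    ("hamburger", average_hash([[200, 200], [50, 50]])),
    ("salad", average_hash([[50, 200], [50, 200]])),
]
guess = closest_label(average_hash([[210, 190], [60, 40]]), library)
```

The linear scan in closest_label is fine for a handful of pictures. With millions or billions of them, the comparison would need some kind of index so that a new photograph is only checked against likely candidates.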

This approach is very similar to the way that Google approached the translation of German. They didn’t attempt to work out all the rules of the language; they just analyzed millions and millions of pages available in both English and German, then asked people using the translations to correct the ones that were generated. It led to a German translation system that was more accurate than traditional systems that focused solely on the rules and intricacies of the language and on word-for-word translation.

If anyone has a spare 5 million for research into this I’d be more than willing to start work next week.