I doubt it is possible to do much in the way of image classification
with the puny processors found in most smartphones. I would suggest
reviewing Google's state-of-the-art results, which still have a long
way to go to match human-level vision.
http://arxiv.org/pdf/1112.6209.pdf
Google's system learned to recognize human faces, human bodies, and
cat faces. Humans can recognize about 100,000 distinct visual objects.
Google reports 15.8% accuracy on ImageNet, a 70% relative improvement
over the previous state of the art. Humans can recognize nearly all of
the objects in ImageNet.
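As a sanity check on those numbers (my arithmetic, not from the
paper), the stated relative improvement implies a prior baseline of
about 9.3%:

    # Implied prior state of the art, assuming 15.8% accuracy and a
    # 70% relative improvement (figures from the paper's abstract).
    google = 0.158
    implied_baseline = google / 1.70
    print(f"implied prior baseline: {implied_baseline:.1%}")  # ~9.3%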
Google's system was trained on 10 million 200 by 200 still images.
Humans are trained on the equivalent of 100 billion video frames at
10000 by 10000 resolution.
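To put a number on the data gap (my arithmetic, using the figures
above; the human numbers are back-of-envelope estimates, not
measurements):

    # Rough comparison of visual training data.
    google_pixels = 10e6 * 200 * 200      # 4.0e11 pixels
    human_pixels = 100e9 * 10000 * 10000  # 1.0e19 pixels
    print(f"ratio: {human_pixels / google_pixels:.1e}")  # ~2.5e+07

That is a gap of about 25 million times in raw pixels seen.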
Google's system recognizes only still, grayscale images in a
standalone system. Humans perceive color, motion, and stereoscopic
depth in a system integrated with language, hearing, motor movement
and feedback, and other sensory data, all of which provide important
context for visual recognition.

Google's system uses a 9-layer neural network with 10^9 connections.
The human brain has about 10^14 connections.

Google's system was trained for 3 days on 1,000 machines with 16
cores each. A neural network the size of the human brain represents
several petaflops of computation, trained for decades.
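Here is where the petaflop figure comes from (a sketch under my own
assumptions: ~10 updates per connection per second, 20 years of
training, and ~10^9 operations per second per core; none of these
come from the paper):

    # Brain-sized network: 1e14 connections updated ~10x/second
    # (assumed) is ~1e15 ops/sec, i.e. a petaflop, run for 20 years.
    brain_ops = 1e14 * 10 * 20 * 3.15e7        # ~6e23 operations
    # Google's run: 1,000 machines x 16 cores x 3 days, at an assumed
    # 1e9 operations per second per core.
    google_ops = 1000 * 16 * 1e9 * 3 * 86400   # ~4e18 operations
    print(f"gap: {brain_ops / google_ops:.0e}")  # ~2e+05

So even this run falls roughly five orders of magnitude short of a
human's lifetime of visual training.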

Google's vision architecture is fairly simple. The human genome, by
comparison, has the information content of about 300 million lines of
code.
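That figure works out roughly as follows (my assumptions, not hard
numbers: ~3 x 10^9 base pairs at 2 bits each, and ~20 bits of
information per line of source code after compression):

    # Genome information content vs. lines of code, under the
    # assumptions stated above.
    genome_bits = 3e9 * 2    # ~6e9 bits
    bits_per_line = 20       # assumed, post-compression
    print(f"{genome_bits / bits_per_line:.0e} lines")  # 3e+08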

--
-- Matt Mahoney, mattma...@gmail.com