As someone with a bit of a background in machine learning and who is currently at an institution with academics who actually produce research on Deep Learning, let me tell you that the claims being made for this field are wildly, ridiculously exaggerated.
"Deep Learning" is a current wave of research in the field of machine learning. It's the thing generating the return of the AI hype that is currently sweeping over all tech commentary venues like this.
Deep Learning is a significant advancement to be sure, but it's not going to solve hard AI anytime soon. And the amount of BS being slung around by people who don't understand it (like Cringely) is extremely frustrating.
First, the TLDR. Deep learning does two things:
1) It reduces a problem with overfitting in neural networks.
2) It introduces new techniques for "unsupervised feature learning", in other words new, more automatic ways to figure out which parts of your data you should feed into your learning algorithm.
For non-machine learning people, here's the short version of Deep Learning.
The field called "machine learning" should really be called "applied statistics". Its goal is to take a bunch of data and extract information from it that can then generalize to new inputs you haven't seen yet. As the simplest possible example, imagine there's a food cart outside my office and each day we count the number of people who visit. The cart's getting more popular, so if we graphed this data we might see something like a linear relationship between time and the number of people who visit the cart, i.e. we could draw a line that approximately fits the graph of visitors vs. time. Using that line we could then predict the number of people who are likely to visit the cart on a particular day.
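To make that concrete, here's a tiny sketch in Python of what "fit a line and extrapolate" looks like (the visitor counts are made up purely for illustration):

    import numpy as np

    # Hypothetical daily visitor counts for the food cart (invented numbers).
    days = np.array([1, 2, 3, 4, 5, 6, 7])
    visitors = np.array([12, 15, 14, 18, 21, 20, 24])

    # Fit a straight line: visitors ~= slope * day + intercept.
    slope, intercept = np.polyfit(days, visitors, deg=1)

    # Use the fitted line to predict attendance on a future day.
    predicted = slope * 10 + intercept
    print(f"Predicted visitors on day 10: {predicted:.0f}")

That's the whole game: find a pattern in the data you have, then use it on data you haven't seen.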
Obviously, only an idiot would expect this trend to continue forever without ever changing. An idiot like Disco Stu:
http://buttcoin.org/wp-content/uploads/2011/06/disco-stu.jpg
There are two kinds of problems that our simple "learning" algorithm (and all other learning algorithms) might suffer from: bias and overfitting.
Bias means that the data in your sample is systematically unlike the real world. For example, maybe you counted people at the cart at the end of August, when all the students were returning to campus and the weather was still nice. This data won't generalize well to a different situation, e.g. mid-February when everyone's already here and it's snowing out. Another really familiar version of this is "selection bias" in, for example, polling data. Polling will tend to oversample people who have landlines, are not homeless, and are willing to answer the phone during dinner time. In other words, these types of people will answer the pollster's calls in greater proportion than they exist in the population. If the pollster doesn't correct for this difference using some data about the general population they want to make claims about, their results will suffer from bias.
The second problem is overfitting. Overfitting means that your approach tried to extract too much of a trend from the data you found. Let's say you had a larger sample of cart data (say from a full year instead of a week, and from multiple carts) and you used a more sophisticated algorithm to predict the number of visitors. Maybe you kept track of the weather, the academic calendar, and a bunch of other factors, so you're not just naively assuming some linear relationship between visitors and the single other variable of time. You train up your new sophisticated Support Vector Machine or Neural Network learning process on this data to make predictions about the future. Now the danger is that your fancy new algorithm ends up treating noise in the training data as if it were meaningful and uses that in its generalizations. In other words, your model will do really well on data that is very similar to the training data, but quite badly on data that differs from it. What if it rains in late August when your training data only had hot weather? What if the carts are moved to new parking spaces and that affects the results? If the model attempts to learn too much from the data, it will fail when reality (inevitably) diverges from the training data.
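Here's a toy demonstration of the problem: fit both a straight line and a very flexible polynomial to some invented noisy cart data, then check how each does on a held-out "future" week. The numbers are fabricated; the point is the comparison, not the values.

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented noisy cart data: a roughly linear trend plus noise.
    days = np.arange(1, 31)
    visitors = 10 + 0.5 * days + rng.normal(0, 2, size=days.size)

    # Hold out the last week to simulate "the future".
    train_x, test_x = days[:23], days[23:]
    train_y, test_y = visitors[:23], visitors[23:]

    for degree in (1, 9):
        coeffs = np.polyfit(train_x, train_y, deg=degree)
        test_pred = np.polyval(coeffs, test_x)
        error = np.mean((test_pred - test_y) ** 2)
        print(f"degree {degree}: held-out mean squared error = {error:.1f}")

    # The degree-9 fit hugs the noise in the training weeks and typically
    # does far worse on the held-out week than the plain straight line.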
So, now back to deep learning:
One major leg of deep learning is the return of neural networks. Neural networks have been around for a long time. They're a learning technique that's vaguely inspired by some ideas about how the brain works, but they're not fundamentally different from other statistical machine learning techniques. They've come and gone in a series of hype waves historically. The first one, in the 60s, collapsed because it was proven that a single-layer neural network couldn't learn some important basic logical operations. The second one, in the 80s, collapsed because multi-layer networks were extremely computation-intensive and so couldn't be used efficiently on real problems.
In the early 2000s they came back again as computers got faster, but it turned out that they suffered from major problems with overfitting. The first big wave of deep learning advances consists of some quite clever techniques to reduce overfitting in neural nets. The short version is that you throw away some of the nodes randomly as you go, an idea borrowed from Random Decision Forests, which is one of the dominant machine learning approaches actually in practice today (along with SVMs).
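That random node-dropping trick is what the literature calls "dropout". Mechanically it's very simple; here's a rough numpy sketch with made-up activations:

    import numpy as np

    rng = np.random.default_rng(0)

    # Activations of one hidden layer for a batch of 4 examples, 8 units each
    # (random numbers standing in for real network activations).
    activations = rng.normal(size=(4, 8))

    drop_prob = 0.5  # probability of switching a unit off for this training step

    # During training: randomly zero out units and rescale the survivors so the
    # expected activation stays the same ("inverted dropout").
    mask = rng.random(activations.shape) > drop_prob
    dropped = activations * mask / (1.0 - drop_prob)

    # At test time you use all the units, which works out to something like
    # averaging the predictions of all the "thinned" networks you trained.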
From a wider perspective this is simply an application of the realization of the value of "ensemble methods", i.e. combining the results of many dumb learners that use different algorithms and different parts of the data instead of one big, smarter learner. One particularly prominent example of this trend was Pragmatic Chaos, the amalgamation of algorithms whose authors teamed up to win the Netflix Prize:
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
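The intuition behind ensembles is easy to show with a toy simulation (all numbers invented): several noisy "models" averaged together tend to make a smaller error than any single one alone.

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend we trained five weak models; their "predictions" for ten test
    # items are simulated here as the true value plus independent noise.
    true_values = rng.normal(size=10)
    predictions = [true_values + rng.normal(0, 1.0, size=10) for _ in range(5)]

    single_error = np.mean((predictions[0] - true_values) ** 2)
    ensemble_error = np.mean((np.mean(predictions, axis=0) - true_values) ** 2)

    print(f"one model alone:  MSE = {single_error:.2f}")
    print(f"average of five:  MSE = {ensemble_error:.2f}")
    # Averaging cancels out some of each model's individual mistakes, which is
    # the basic idea behind ensemble methods.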
What's cool about neural networks is not that they're like the brain, but that they are kind of their own ensembles.
New neural network techniques that take this into account, particularly "convolutional neural networks", have had a lot of success on some long-standing image recognition and other machine learning problems. I worked with a system that used a CNN to identify the walls of neurons in scans of mouse retinas, and it worked amazingly well given the image quality.
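The core operation in a convolutional network is just sliding a small filter across the image. Here's a hand-rolled sketch with a toy image and a hand-picked edge filter (nothing like the actual retina system, just the mechanics):

    import numpy as np

    # A tiny 6x6 "image": dark on the left, bright on the right.
    image = np.zeros((6, 6))
    image[:, 3:] = 1.0

    # A vertical-edge filter of the kind a convolutional layer often ends up learning.
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]], dtype=float)

    # Slide the filter over the image (no padding, stride 1).
    out = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

    print(out)  # large magnitudes mark where the vertical edge sits

A real CNN stacks many layers of these filters and learns the filter values from data instead of having someone pick them by hand.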
Now the second part: unsupervised feature learning.
One of the key parts of building most machine learning applications today is what's called "feature engineering". This is where you decide which aspects of your data you should actually send to your learning algorithm. Returning to our food cart example, we have all this data that we recorded: time of day, day of year, temperature, humidity, whether or not today is a holiday, the location of the cart, etc. But we don't know which of these "features" are actually useful for making the predictions we want to make. So we proceed by experimenting: we discover correlations between various features, we scale the data, we drop features, we calculate new ones that summarize or smooth out some of what we measured, etc. And we feed these features into our learning algorithm over and over to see how the results change.
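In practice this mostly looks like hand-crafting columns from your raw log. A hypothetical snippet with pandas (all values invented, and which derived columns actually help is exactly the thing you have to experiment with):

    import pandas as pd

    # A hypothetical slice of the cart log.
    log = pd.DataFrame({
        "timestamp":   pd.to_datetime(["2014-08-25 12:10",
                                       "2014-08-26 12:05",
                                       "2014-09-01 12:15"]),
        "temperature": [29.0, 31.5, 18.0],
        "visitors":    [42, 47, 15],
    })

    # Hand-engineered features derived from the raw columns.
    features = pd.DataFrame({
        "day_of_week": log["timestamp"].dt.dayofweek,
        "is_weekend":  log["timestamp"].dt.dayofweek >= 5,
        "is_hot":      log["temperature"] > 25,
        "month":       log["timestamp"].dt.month,
    })
    print(features)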
One truism amongst machine learning people is: better features beat better algorithms. If you provide a bad feature set to two algorithms, the superior one might outperform the inferior one by a couple of percentage points of accuracy. However, re-engineering the features might improve the results from both by tens of points.
Manual feature engineering is a black art that requires the practitioner to learn a lot about the actual field they're working on. You have to know something about food carts, academic calendars, and how the data was collected (or about neurons and retinas and brain scans).
Unsupervised feature learning is the field of trying to get the computer to do this for you. Instead of having a learning algorithm that starts with a set of features extracted from the data, you give it the full data set and the algorithm tries to find out what the fundamental axes are that explain the data. For example, if you're processing images to recognize shapes, the algorithm would end up extracting various directions of edges as a low-level feature.
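The simplest member of this family is probably PCA, which just asks "along which directions does this data actually vary?". Deep learning uses much fancier versions of the same idea (autoencoders and friends), but a quick sklearn sketch shows the flavour, on fake data that secretly lives in two dimensions:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)

    # 200 points of fake 3-D data that really only vary along two hidden directions.
    latent = rng.normal(size=(200, 2))
    mixing = rng.normal(size=(2, 3))
    data = latent @ mixing + rng.normal(0, 0.05, size=(200, 3))

    # Ask PCA for the directions ("features") that explain the data.
    pca = PCA(n_components=3).fit(data)
    print(pca.explained_variance_ratio_)  # two large values, one near zero

No one told the algorithm there were two underlying directions; it found that structure on its own, which is the "unsupervised" part.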
It can be a bit of a hard concept to get your mind around, but the math is actually less bad than some of the high-level kernel method techniques like Support Vector Machines.
I highly recommend this really great talk by Andrew Ng from Stanford/Google where he explains the whats and whys of unsupervised feature learning:
https://www.youtube.com/watch?v=n1ViNeWhC24
(Don't forget to take the brain comparisons with a grain of salt. I think Ng is pretty modest about making these, but people tend to get carried away with them.)
This is the kind of technique behind Google's system that "taught itself to recognize a cat's face".
Anyway, hope that gives you enough of an overview to be able to read some of this nonsense critically. It's an interesting field and it's definitely making some big advances but, because of the framing as "learning" or "AI", people in the tech and mainstream press tend to let their rhetoric run wild when they talk about it.
-- Greg