Fundamentals of Data Science/Machine Learning

Travis Smith

Feb 1, 2017, 8:12:52 PM
to omaha-ma...@googlegroups.com, Nick Wertzberger
Hey guys,

One of the things I've been learning over the last year or so is introductory 'data science' techniques.  Last night at the meeting I attempted to explain some of it, but did not do so to my own satisfaction.  This is my attempt to re-explain more clearly.  Forgive me in advance if any of this is patronizing:

1. A lot of what we call data science seems to be based on linear algebra, the math of vectors and matrices, which lets you predict values along a line, or a plane, or some other object that lives in a space of 'n' dimensions.

2. The computer takes in 'observations' consisting of whatever factors you put into it.  If I have a tulip, I might record several things about it: petal color, petal size, petal shape, whatever.  The observation of one tulip is spread out over several 'features' or columns:

Petal color, size, shape, length, height of plant

3. If I had 100 such observations of tulips, I could then plot each observation on a graph, with each feature representing a dimension (there's a little code sketch of this after the list below).

Only one thing you are categorizing = points on a line
2 things categorized = a 2d graph
3 things categorized = an isometric 3d plot
4+ = monkey mind blown!
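
To make points 2 and 3 concrete, here's a minimal sketch in Python with NumPy.  The tulip numbers are made up, and I'm hand-waving the detail that color and shape would have to be encoded as numbers somehow:

import numpy as np

np.random.seed(0)

# Fake stand-in numbers for 100 tulips x 5 features:
# petal color (coded), size, shape (coded), length, plant height.
X = np.random.randn(100, 5)

print(X.shape)  # (100, 5) -- one row per observation
print(X[0])     # the first tulip: a single point in 5-dimensional space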

4. Even though such a categorization usually lives in a space of more than three dimensions, if you think about it in 2d or 3d, the principles pretty much carry over.

-Just like you can fit a line that describes the trend of data with a linear relationship and then predict what further or future observations might hold (https://en.wikipedia.org/wiki/Linear_regression), you can do this between any 2 dimensions in your n-dimensional space.
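
Here's a rough sketch of that in NumPy, on toy data whose 'true' relationship is y = 2.5x + 1:

import numpy as np

np.random.seed(1)

# Noisy points scattered around the line y = 2.5*x + 1.
x = np.linspace(0, 10, 100)
y = 2.5 * x + 1.0 + np.random.randn(100) * 2.0

m, b = np.polyfit(x, y, 1)  # least-squares fit of a degree-1 polynomial
print(m, b)                 # roughly 2.5 and 1.0
print(m * 12 + b)           # predict y for a future observation at x = 12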

-Just like you might notice that certain points on a 2d graph clump together and form clusters, you can do that in n-dimensional space by calculating the straight-line (Euclidean) distance from each observation point to its 'k' (e.g. 8) nearest neighbouring observations. https://en.wikipedia.org/wiki/Nearest_neighbor_graph
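
And a sketch of that distance calculation, again on made-up data:

import numpy as np

np.random.seed(2)
X = np.random.randn(100, 5)  # 100 observations, 5 features

point = X[0]  # pick one tulip

# Euclidean (straight-line) distance from that tulip to every observation.
dists = np.sqrt(((X - point) ** 2).sum(axis=1))

k = 8
nearest = np.argsort(dists)[1:k + 1]  # skip index 0, the point itself
print(nearest)  # row indices of the 8 most similar tulips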

-And so on.  This is one important way in which computers are able to recognize patterns in data and make predictions about new data of the same type: by using algorithms that borrow heavily from linear algebra.

Which makes me think this might be a good course to take for fun:
https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/

Travis

GPG Key: BFEB 7E65 04EB 184B A150 2E2C CC11 933F EE27 D86E