Data Science Discussions


JnBrymn

Nov 2, 2017, 6:50:47 PM
to penny-un...@googlegroups.com
This week I've been in Penny Chat overdrive: on Tuesday, Wednesday, and Thursday I had lunch with different Data Scientists. In the not-so-distant future I would like to morph my career toward Data Science (and MATH!), so I'm effectively going around interviewing people who are Data Scientists and learning about how they work.

Here's a quick summary of what I learned this week:

Sharon Chou - Asurion
Asurion provides small-item insurance policies. (So if you buy a TV at Walmart, Asurion might be the company that handles the insurance policy you buy with it.) Sharon's work there supports Asurion's customer support chat feature. At the end of each support chat the customer can rate their satisfaction with the support. Sharon's job is then to figure out how best to improve customers' satisfaction. Factors that come into play include response times, the time spent with support, and the topics discussed.

Sharon's previous employment was with a clean-tech startup where she built models that predicted the energy and cost savings that would result from making facilities improvements. One of the perennial frustrations there was the lack of data, along with the greatly varied sources of data about existing properties and their energy consumption.

Sharon's tech stack is Python-based: Scikit-learn and friends.

Damian Mingle - Intermedix
Damian is the Chief Data Scientist at Intermedix, coming by way of a company he founded called WPC Healthcare, which was recently sold to Intermedix. (Did I get that right, Damian?)

The main project that Damian is working on is sepsis prediction. Sepsis is a deadly infection that can be treated if caught in time, but is often recognized too late. Damian and his team are building a model to predict sepsis early so that it can be treated. The problem is challenging because access to medical records is limited -- and besides, medical professionals often recognize sepsis too late; it sneaks up! So Damian and his team have figured out a way to join lots of disjointed data sets and impute missing data, and they have a tree-based approach (IICR) that very accurately predicts sepsis and saves lives.
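
For anyone wondering what "impute missing data, then fit a tree-based model" can look like in practice, here is a minimal sketch in scikit-learn. To be clear, this is not Damian's actual pipeline (the IICR approach isn't described here). The synthetic data, the median imputation, and the random forest are all assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for patient records (assumption: NOT real medical data).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)

# Blank out roughly 20% of the values to mimic incomplete records.
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.25, random_state=0
)

# Step 1: fill each gap with its column median (medians come from the training set).
# Step 2: fit a tree-based classifier on the completed data.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy with imputed data: {accuracy:.2f}")
```

Putting the imputer inside the pipeline means the same fill-in values are reused at prediction time, so new records with gaps can be scored without extra handling.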

Damian's team uses a mixture of R and Python to get work done. Damian confided that he was rather fond of Python himself.

After talking with Damian I'm also watching him for something else besides Data Science. Damian appears to magically produce surplus time, which allows him to achieve so much. He's built and sold companies, he makes a point of always having side-projects in the works, he's a top Kaggler, and after all this Damian makes time to meet with people. I've got to figure out this secret!

Rob Harrigan - 247Sports
247Sports is a small Data Science shop that was recently purchased by CBS Interactive. The main product from 247Sports tracks high school sports and provides statistics for things like college sports recruiting.

Rob's role at 247Sports is quite interesting - he is an aspiring "unicorn". One of Rob's mentors once told him that people with specialized knowledge in Data Science are valuable, and people with specialized knowledge in Data Engineering are valuable - but that there is no one who has specialized knowledge of both Data Science and Data Engineering. "These people are unicorns. They don't exist." So this is exactly what Rob decided to become.

Rob's specific role is in monitoring the users' engagement with the website and determining how to keep them engaged. And true to his unicorniness, this role involves both Data Engineering work (building services and pipelines in AWS using Terraform, Docker, Flask, etc.) and Data Science (using scikit-learn, seaborn, numpy, etc.). Rob stressed the importance in his career of positioning himself always in the middle of fields. He intentionally finds no-man's-lands where they shouldn't be, and positions himself right there in the middle.

Rob also had an interesting take on the term "Data Science" - he believes that the term is over-hyped and that we will soon see it fall away; it will come to be seen as something of a synonym for "magic". However, Rob believes that Data Science as a field will nevertheless have a healthy future, just under different branding. As business better understands Data Science, it will split into specialized sub-fields that have more specific and meaningful names.

Overall Learnings
  • Python is a great place to start out. All three of these individuals use Python tools including Scikit-learn, seaborn, numpy, pandas, pyspark.
  • A common concern I found in all my conversations is that good data is hard to come by and Data Science is often forced to make do with whatever data is available. So when you think "this is impossible, I simply don't have the data I need" - that might be as good as it's going to get. You have to be resourceful.
  • You don't have to go to Data Science school to be a data scientist, but you do have to be good with math. The above people are a materials engineer, a philosopher, and an image processing engineer respectively. This gives me hope as a former Aerospace guy. :D
Thanks Sharon, Damian, and Rob for your time! I learned a lot.

-John

JnBrymn

Nov 9, 2017, 10:54:35 PM
to Penny University
This past week, in my quest to better understand the field of Data Science, I got a chance to meet with another couple of really interesting people: Jason King and Jimmy Whitaker. Here's what we covered:

Jason King - Xsolis
Xsolis helps to inform hospitals of any risk that their medical treatments and procedures may not be reimbursed by insurance companies. They do this by processing medical data and building machine learning models that classify the output as either reimbursable or not. Jason works as a Data Scientist at Xsolis. According to Jason, Xsolis processes a very large and diverse set of data for their hospitals in order to build these models. The model, then, is actually a "stacked model", a layered combination of several low-level models into one overarching model. Jason's piece of the puzzle is handling textual feature extraction and modeling. The text comes from various sources - hand-written prescriptions, doctors' voice-recorded notes, etc. Because of the wide variety of the inputs and their particularly "dirty" nature, Jason says that he probably puts 75% of his time into data cleanup.
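
To make the "stacked model" idea concrete, here is a minimal sketch using scikit-learn's StackingClassifier: a couple of low-level models each make predictions, and an overarching model learns how to combine them. This is only an illustration of the general technique, not Xsolis's system. The synthetic data and the particular choice of base models (a random forest and naive Bayes) and top-level model (logistic regression) are my assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data standing in for "reimbursable or not" (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Low-level models: each one produces its own prediction for every record.
base_models = [
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("bayes", GaussianNB()),
]

# Overarching model: trained on the base models' cross-validated predictions.
stacked = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stacked.fit(X_train, y_train)

stacked_accuracy = stacked.score(X_test, y_test)
print(f"stacked model held-out accuracy: {stacked_accuracy:.2f}")
```

The cv=5 setting matters: the top-level model is trained on predictions the base models made for records they had not seen, which keeps it from simply trusting whichever base model overfits the most.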

Another interesting part of our discussion was Jason's background. Jason got a PhD in biology/physics related to protein folding - something very different from Data Science. Jason transitioned to Data Science by teaching himself the trade. He says that he spent probably 2 hours a night for 6 months learning about the techniques and theory behind Data Science.

Kaggle was another interesting topic we covered. Jason has been involved in 3 different Kaggle competitions, winning a bronze medal in one of them. I didn't know this, but if you get 2 bronze medals or better, then you will have access to competitions in Kaggle that are only available to elite competitors. That's enticing. Jason says that Kaggle is a great way to practice, learn, and develop a portfolio of your work, but he warns that you can no longer go to Kaggle and expect to become famous. The early competitors regularly won competitions with very simple techniques, but now Kaggle has drawn in a very large community of very talented data scientists.

Jimmy Whitaker - Digital Reasoning
Digital Reasoning researches and builds products with "cognitive computing" - i.e. artificial intelligence. Jimmy is leading a team that is pioneering Deep Learning efforts in Computer Vision, Audio Speech Recognition, and Text Analytics. Jimmy is the first Data Scientist I've met who is working directly with Deep Learning! And that was a large part of the conversation - What is Deep Learning? And why is it the next cool thing?

According to Jimmy, Deep Learning is basically the same thing as Neural Networks. And Neural Networks, basically, are just a way for a computer to take a bunch of input-output data and build a black box function that can take a new input and predict the appropriate output. So how's this different from "classical" data science? This is where things get really interesting. With classical data science a human carefully designs features. So with images, for example, your basic features might be edges, corners, and color gradients. From simple features, you then build more complex features like circle or rectangle detection, and you keep building up more and more complicated features. But there's a problem with this approach. If you're building an algorithm to locate the position of cats in a photograph, then at the end of the day you're going to have to build up very strange and complex "cat" features by hand. This starts to feel foolish. What's a cat feature?! Here is where Deep Learning is radically different from "classical" data science - you don't worry about the specific features at all! You create a network and let the network build its own features. In some ways the network, as it learns, builds features much like a human would - low-level features are again edges, corners, and color gradients; upon these are built higher-level features. But since the features are based completely on the data, you never have to worry about "What's a good set of cat features?" - the network finds a good set automatically.
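
Here is a toy sketch of that contrast, with the big caveat that it uses a tiny two-dimensional dataset rather than images, and a small scikit-learn network rather than a real deep learning stack (both are my assumptions, picked to keep the example short). The data are two concentric rings. A simple linear model on the raw coordinates can't separate them, a human can fix that by hand-designing a "distance from center" feature, and a small neural network gets there from the raw coordinates without being handed that feature.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy data: an inner ring (class 1) surrounded by an outer ring (class 0).
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 1) "Classical" model on raw (x, y) coordinates: no straight line separates rings.
raw_acc = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)


def add_radius(points):
    """Hand-designed feature: squared distance from the origin, appended as a column."""
    return np.column_stack([points, (points ** 2).sum(axis=1)])


# 2) Same classical model, but with the human-designed feature added.
hand_model = LogisticRegression().fit(add_radius(X_train), y_train)
hand_acc = hand_model.score(add_radius(X_test), y_test)

# 3) Small neural network on the raw coordinates: it has to build its own features.
net = MLPClassifier(
    hidden_layer_sizes=(16, 16), solver="lbfgs", max_iter=2000, random_state=0
)
net_acc = net.fit(X_train, y_train).score(X_test, y_test)

print(f"linear model, raw inputs:          {raw_acc:.2f}")
print(f"linear model, hand-made feature:   {hand_acc:.2f}")
print(f"neural network, raw inputs:        {net_acc:.2f}")
```

With rings, the right hand-made feature is easy to guess. The point of Jimmy's cat example is that for real images nobody knows what the equivalent feature would be, so letting the network find it is the only practical option.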

Jimmy's current work is in speech recognition. Unlike Siri or Alexa, which rely upon having a fairly clean input (humans talking slowly so that a computer will understand them), Jimmy is trying to make it possible for computers to transcribe text from much "dirtier" real-world input data. Seems like interesting work.

Jimmy also has a great story about how he got into Data Science. Jimmy started out with a bachelor's degree in a technical field (I think engineering, but I don't recall the details). After graduating and working in the field for a short time, Jimmy decided that this was the wrong field, left, and became a construction worker. Yep... a construction worker. And soon Jimmy again felt that he'd landed himself in the wrong field, so he enrolled in Oxford University in England, studying AI and cyber security. Yep... Oxford. Pretty big leap, eh? But now Jimmy appears to have found a field that he enjoys.

Thanks Jason and Jimmy! The conversations were incredibly interesting.

John