This week I've been in Penny Chat overdrive: on Tuesday, Wednesday, and Thursday I had lunch with three different Data Scientists. In the not-so-distant future I would like to morph my career toward Data Science (and MATH!), so I'm effectively going around interviewing Data Scientists and learning how they work.
Here's a quick summary of what I learned this week:
Sharon Chou - Asurion
Asurion provides small-item insurance policies. (So if you buy a TV at Walmart, Asurion might be the company that handles the insurance policy you buy.) Sharon's work there supports Asurion's customer support chat feature. At the end of each support chat, the customer can rate their satisfaction with the support. Sharon's job is to figure out how best to improve customer satisfaction. Factors that come into play include response times, time spent with support, the topics discussed, etc.
Sharon previously worked at a clean-tech startup, where she built models that predicted the energy and cost savings that would result from making facilities improvements. One perennial frustration there was the lack of data, along with the widely varied sources for data about existing properties and their energy consumption.
Sharon's tech stack is Python-based: scikit-learn and friends.
Damian Mingle - Intermedix
Damian is the Chief Data Scientist at Intermedix, coming by way of a company he founded called WPC Healthcare, which was recently sold to Intermedix. (Did I get that right, Damian?)
The main project Damian is working on is sepsis prediction. Sepsis is a deadly condition, the body's runaway response to an infection, that can be treated if caught in time but is often recognized too late. Damian and his team are building a model to predict sepsis early so that it can be treated. The problem is challenging because access to medical records is limited; besides, medical professionals often recognize sepsis too late because it sneaks up! So Damian and his team have figured out a way to join many disjointed data sets and impute missing data, and they have a tree-based approach (IICR) that very accurately predicts sepsis and saves lives.
Damian's team uses a mixture of R and Python to get work done. Damian confided that he was rather fond of Python himself.
After talking with Damian, I'm also watching him for something besides Data Science. Damian appears to magically produce surplus time, which allows him to achieve so much. He's built and sold companies, he makes a point to always have side projects in the works, he's a top Kaggler, and after all this Damian still makes time to meet with people. I've got to figure out this secret!
Rob Harrigan - 247Sports
247Sports is a small Data Science shop that was recently purchased by CBS Interactive. Its main product tracks high school sports and provides statistics for things like college sports recruiting.
Rob's role at 247Sports is quite interesting: he is an aspiring "unicorn". One of Rob's mentors once told him that people with specialized knowledge in Data Science are valuable, and people with specialized knowledge in Data Engineering are valuable, but that there is no one who has specialized knowledge of both. "These people are unicorns. They don't exist." So this is exactly what Rob decided to become.
Rob's specific role is to monitor users' engagement with the website and determine how to keep them engaged. True to his unicorniness, this role involves both Data Engineering (building services and pipelines in AWS using Terraform, Docker, Flask, etc.) and Data Science (using scikit-learn, seaborn, numpy, etc.). Rob stressed how important it has been in his career to always position himself between fields. He intentionally seeks out no-man's-lands, gaps between fields that shouldn't exist, and positions himself right there in the middle.
Rob also had an interesting take on the term "Data Science": he believes the term is over-hyped and that we will soon see it fall away, because it will come to be seen as something of a synonym for "magic". However, Rob believes that Data Science as a field will nevertheless have a healthy future; it will just be under different branding. As businesses better understand Data Science, it will split into sub-fields with more specialized and meaningful names.
Overall Learnings
- Python is a great place to start out. All three of these individuals use Python tools, including scikit-learn, seaborn, numpy, pandas, and pyspark.
- A concern that came up in all my conversations is that good data is hard to come by, and Data Scientists are often forced to make do with whatever data is available. So when you think "this is impossible, I simply don't have the data I need", that might be as good as it's going to get. You have to be resourceful.
- You don't have to go to Data Science school to be a data scientist, but you do have to be good with math. By background, the people above are a materials engineer, a philosopher, and an image-processing engineer, respectively. This gives me hope as a former aerospace guy. :D
Thanks Sharon, Damian, and Rob for your time! I learned a lot.
-John