Advice for a newly minted Data Scientist from Tim Dobbins and Chang Lee


Matthew Cronin

Feb 8, 2020, 10:09:27 PM
to Penny University

This week, Tim Dobbins and Chang Lee were kind enough to spend some time talking with me about their experiences in the role of Data Scientist; questions to ask and thought processes to follow at the inception of a new project; best practices, useful tools, and time-saving strategies to use during a project; and thoughts on the growth and management of a Data Science team. I have summarized key points from these conversations below.


About Me:


I am a recovering academic physicist: this week I left my second postdoctoral appointment after almost 9 years studying quantitative Magnetic Resonance Imaging. Next week I start my first 'real' job as a Decision Scientist at Campaign Monitor in Nashville. Tim and Chang were kind enough to give me some of their time and words of wisdom ahead of this new adventure.


About Tim:


Tim is currently a Lead Data Scientist at The General in Nashville. He studied Economics and Statistics at Belmont, with a particular interest in the philosophy of economics and econometrics, and held various Data Analyst and Software Developer roles before his transition into full-time data science.


About Chang:


Chang is currently a Senior Data Scientist at Lowe’s Companies, Inc. He studied Physics and Mathematics at the National Tsing-Hua University in Taiwan, followed by a Ph.D. in Mathematics at Vanderbilt University, specializing in digital signal processing, before moving into Data Science.


Inception of a new Data Science project:


Both Tim and Chang emphasized the need to distill an idea down to a clearly articulated and deliverable project. Even if an idea sounds like common sense, be sure to establish *why* the customer or stakeholder wants it solved, what value a working model would deliver, and how it would be integrated into actual routine use or deployment. To paraphrase Chang: great-sounding ideas may be enthusiastically presented along the lines of “We want to cure cancer!”, but the goal that needs to be defined will look more like “We want to reduce deaths from thyroid cancer in Charlotte by X%, as current outcomes are poor. We could achieve this by using machine learning to identify at-risk individuals for proactive screening and follow-up.” Additionally, a formal structure for submitting and evaluating project ideas, one that includes appropriate individuals from interested areas of the business, is important for minimizing ad-hoc requests that may be ill-conceived or produce redundant results.


Execution of a Data Science project:


Chang emphasized taking the time to explore the data and research the problem at hand. Survey what others have done in trying to solve your problem. Read articles in the press and try to identify who has tackled the problem previously. Have people written any white papers? Read abstracts of related academic papers even if the article itself is behind a paywall. Identify 3 or 4 solutions, and try them individually and, where relevant, in combination. Do all this before diving head-first into whatever novel approach you have in mind as a potential solution, and you may save significant amounts of time.


Tim had some great advice regarding best practices and tools to ensure consistency and repeatability in your work. He emphasized working on your software development skills and maintaining best practices even when carrying out preliminary or exploratory work, as this can help considerably if a project is later handed off to engineers for deployment, or even simply comes back to you for further work or review some time in the future. He emphasized the use of virtual environments and version control, the thorough commenting and documentation of code, and working on the principle that any project should be immediately repeatable when executed on another AWS instance.


To this end, Tim also had the following software recommendations that he finds particularly useful in his work:


  • Git + Bitbucket (or equivalently GitHub) for version control

  • DVC for version control of data, models, and other files too large for Bitbucket/GitHub

  • pipenv for virtual environments
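
As a rough illustration of the repeatability principle above (a minimal sketch, not something that came from the conversations themselves): one way to check that a project is being re-run against the same inputs is to record a content hash for each data file alongside the Python version in use. Tracking files by content hash is, broadly, the idea that data versioning tools such as DVC build on. The function names and manifest layout below are hypothetical, chosen only for this example, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_paths):
    """Record the Python version and a fingerprint for each data file.

    The manifest layout is an assumption made for this sketch.
    """
    return {
        "python_version": platform.python_version(),
        "data": {str(p): file_fingerprint(p) for p in data_paths},
    }


def verify_manifest(manifest):
    """Return the paths whose current contents no longer match the manifest."""
    return [
        path
        for path, expected in manifest["data"].items()
        if file_fingerprint(path) != expected
    ]


if __name__ == "__main__":
    # Demonstrate on a throwaway file so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"
        sample.write_text("id,value\n1,3.14\n")
        manifest = build_manifest([sample])
        print(json.dumps(manifest, indent=2))
        print("changed files:", verify_manifest(manifest))
```

A small manifest like this can be committed to Git next to the code, so that a fresh machine can confirm it has the same data before re-running anything. In practice a dedicated tool would handle this, along with storage of the files themselves.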


Both Tim and Chang also had thoughts on, and interest in, the cultural differences between industry and academic environments, and we aim to set up a Zoom chat or similar soon with anyone else who might be interested in discussing that topic!

