Kaggle challenge (grocery sales data) with Data Nerds

Chang Lee

Jan 31, 2018, 10:31:28 PM
to Penny University
On Monday I had a chance to chat with 3 of the Data Nerds: Alex Antonison, John Berryman, and Chad You.

I asked how the grocery sales challenge went for the Data Nerds and what they learned. John said that he used the challenge to get *very* familiar with pandas and to get started with Keras, a neural network/deep learning framework that I have no experience with.

One question I asked in particular was how they imputed the sales data. In my opinion, retail sales are tricky because missing data can come from different sources: the record may genuinely be missing, or the item may have gone out of stock. Alex, Chad, and John described the different ways they imputed the data, such as taking the median or mean of other values, and we got into a conversation about how to impute different kinds of datasets when we know some context about the data.

One idea was to chop the data up into strips and, where a strip has missing data, fill it with the mean or median of that strip, as shown in the image (by John & Chad).
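To make the strip idea concrete, here is a minimal sketch in plain Python. It is only my reading of the idea, not John and Chad's actual code: the function name `impute_by_strip`, the fixed `strip_width`, and the toy `days`/`sales` numbers are all made up for illustration.

```python
from collections import defaultdict
from statistics import median


def impute_by_strip(xs, ys, strip_width):
    """Fill missing ys (None) with the median of the observed ys in the same x-strip."""
    # Collect the observed values that fall into each strip of the x-axis.
    observed = defaultdict(list)
    for x, y in zip(xs, ys):
        if y is not None:
            observed[int(x // strip_width)].append(y)

    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            values = observed.get(int(x // strip_width), [])
            # A strip with no observed values cannot be imputed; leave it missing.
            y = median(values) if values else None
        filled.append(y)
    return filled


# Hypothetical toy data: day index and units sold, None marks a missing value.
days = [0, 1, 2, 3, 4, 5, 6, 7]
sales = [10, None, 14, 20, None, 22, None, None]
filled = impute_by_strip(days, sales, strip_width=3)
print(filled)
```

Note that the last strip (days 6 and 7) has nothing observed, so it stays missing. Swapping `median` for `statistics.mean` gives the mean variant.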


Another method of imputation is to fit a model that predicts the variable with missing values from the other variables and impute accordingly, but a linear model might not work well if the data doesn't look linear. We didn't dive too deep into it as we ran out of time.
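As a rough sketch of the model-based approach, the snippet below fits an ordinary least-squares line on the complete rows and uses it to predict the missing ones. We did not get this far in the conversation, so treat it as an assumption-laden illustration: the helper names and the `price`/`units` numbers are hypothetical, and a single-predictor straight line is the simplest possible model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_by_regression(xs, ys):
    """Fill missing ys (None) with predictions from a line fitted on the complete rows."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [slope * x + intercept if y is None else y for x, y in zip(xs, ys)]


# Hypothetical toy data: price as the predictor, units sold with gaps.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [50.0, None, 30.0, None, 10.0]
filled = impute_by_regression(price, units)
print(filled)
```

This toy data lies exactly on a line, so the fit is perfect. On data that doesn't look linear, the predictions would be off in the way described above, and a more flexible model would be the thing to try.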

Bonus:
1. Alex recommended a nice podcast called Linear Digressions: http://lineardigressions.com/

I haven't listened to it yet, but the topic list looks really solid. The learned tree paper was probably the hottest paper in the machine learning world in the last 3 months, so I'll definitely listen to that episode.

2. I learned that moustache wax is the key to *great* handlebars.

Thanks Data Nerds!