Hi all
I've now released v1.0 of my project! It's an Visual Studio solution consisting of SQL scripts to create the DB objects (schemas, tables, SPs) and an SSIS package to manage the ELT workflow, and is available in a
public Git repo for reuse. If you have Visual Studio, a SQL Server DB, and are reasonably competent with both of them it should [hopefully] be quick and easy to deploy so that you have a DB holding all your activity data and updating it with new or changed records.
I wanted to get all my historic activity data - thanks Bryant and David for the helpful suggestions - this turned out to be fairly quick and painless using a WHILE loop. The on request Strava archive wasn't easily usable since the flat files that it consisted of were not structured the same as the datasets returned by API call. The repeated column names in a single dataset were a bit unhelpful too....
I made a decision to that the DB should be accurate as of the previous day so this meant that the same activity record would be loaded into the landing table several times since some count values (kudos, comments, etc) could increase after the day the activity was uploaded to Strava - the incremental load to the Staging table would then process records as INSERTs or UPDATEs.
This version only has basic functionality - the plan now is to create and populate a simple dimensional model, and then build PowerBI reporting on it.
This has been a fun little project so far! 😎 I'm sure that it could be done a lot more cleanly / efficiently but I'm putting it out there hoping that it can help others to get started.
All feedback is welcome!!