Pwr vs HR vs Bike vs Cranks - Big Data analysis

366 views
Skip to first unread message

Steve Mansfield

unread,
Aug 26, 2016, 11:33:41 PM8/26/16
to golden-cheetah-users
G'day

As a part of a work project and secondly through interest, I've been looking at PWR vs HR vs Bike vs Cranks. It seemed sensible to think that on any one ride it would be hard to see correlations. However what about if you looked at lots of rides ie applied some BigData to the problem?

Well, I converted all the Garmin .fit files for every ride with my power meter (I wrote a script to use GPSBabel), and uploaded into Google Cloud Storage* and BigQuery. The result is about 2,000,000 rows of data, each row having HR, PWR and Cadence. I denormalized the data to add bike and crank length ie flattened the tables. Then I used Big Query to analyse the data.

The basic query is

SELECT
Bike,
AVG (Heartrate) AS HR,
AVG (Cadence) AS Cad,
Power AS Pwr,
Cranks as Cranks
FROM RideData.Rides_at_18_08_2016
WHERE Bike IN ( "Bike2", "Bike1")
GROUP BY Bike, Pwr, Cranks

This query gives about 1.3M rows. Its less than the ~2M total because I have other bikes that also have a power meter. Still 1.3M data points is quite a bit right?

So what do I see? Well not always what I expected... which is fairly good I suppose! Here's the chart for Power and HR for 2 bikes. NB on the Cannondale I swapped to shorter cranks about half way through its life, and each data set is about 450k rows.



What I expected to see was the switching to shorter cranks on the Cannondale would mean less HR for the same power, and thats what we see. What I was not expecting is that power for power my heart rate would be so much higher when on the Cervelo!

My working hypothesis for the shape of these charts, which I put forward for discussion is that as the power required approaches and then becomes greater than VO2Max (approx 410W for me) my energy systems switch and become less oxygen / HR dependent. Below about 200W I'm probably not working hard eg after a sprint or the top of a climb when my heart rate is still recovering.

Does that sound sensible?

Then I looked at Cadence vs HR, and HR vs Pwr/Cad (which I think equals Torque?) and it doesn't seem enlightening... looks like it just gives behaviour more than insight.





Cheers!

Steve

* I cleaned up the data by removing any rows with:
- cadence and power less than 50 since that would be noise
- HR less than 100 - since I'm looking for actually working
- HR greater than 173 since that would be bogus data - might heart won't beat faster





Ruud Goorden

unread,
Aug 27, 2016, 6:06:16 AM8/27/16
to golden-cheetah-users
Nice one Steve. This is also a nice read for you: http://wattagetraining.com/files/JMartinCrankLengthPedalingTechnique.pdf

Steve Mansfield

unread,
Aug 27, 2016, 8:18:42 AM8/27/16
to golden-cheetah-users
Thanks Ruud. I've read that before, always good to re-read!

I've also read that metabolic effort is closely correlated with linear foot velocity, but I can't find that article right now. That would mean that shorter cranks can equal higher rpm as long as the rpm does not go up by more than the crank length shortens.

Here's my result, n is approx 900,000:

RowBikeHRCadCranks 
1'Dale125.2351654091581178.89628382060505172.5 
2'Dale127.2203446527035976.62319661605862175.0 


Here's a chart of cadence vs heart rate for two crank lengths same bike - n is approx 900,000



http://www.ncbi.nlm.nih.gov/pubmed/12183473

Ryan Switala

unread,
Sep 1, 2016, 2:22:56 PM9/1/16
to golden-cheetah-users
Nice analysis. How are you seperating out causation vs correlation? For example your HR per power on 1 bike might be higher or lower because of some other totally unrelated factor (e.g you used bike A 5 years ago when really unfit and bike B in the last year when you were really fit?)

Mark Liversedge

unread,
Sep 1, 2016, 2:48:47 PM9/1/16
to golden-cheetah-users
On Saturday, 27 August 2016 04:33:41 UTC+1, Steve Mansfield wrote:
As a part of a work project and secondly through interest, I've been looking at PWR vs HR vs Bike vs Cranks. It seemed sensible to think that on any one ride it would be hard to see correlations. However what about if you looked at lots of rides ie applied some BigData to the problem?

Well, I converted all the Garmin .fit files for every ride with my power meter (I wrote a script to use GPSBabel), and uploaded into Google Cloud Storage* and BigQuery. The result is about 2,000,000 rows of data, each row having HR, PWR and Cadence. I denormalized the data to add bike and crank length ie flattened the tables. Then I used Big Query to analyse the data.


With such a small data set I'd say you could probably do this with an R chart in GC.
For example;

data <- activity(date=activities(filter="Data contains "P" and Data contains "H" and Bike matches "Bike[12]"));

Will get one big old ride that contains all data where the activities contain HR and Power for the bikes you are interested in. Then you could do your analysis on it. Its likely to be no more than a few megs of RAM, which is small fry when you have 4gb in your PC/Mac etc.

If you want to save on reloading it everytime you start the chart you could check to see if it is cached:
if (exists("data")) ...

So what do I see? Well not always what I expected... which is fairly good I suppose! Here's the chart for Power and HR for 2 bikes. NB on the Cannondale I swapped to shorter cranks about half way through its life, and each data set is about 450k rows.



What I expected to see was the switching to shorter cranks on the Cannondale would mean less HR for the same power, and thats what we see. What I was not expecting is that power for power my heart rate would be so much higher when on the Cervelo!

What exactly is this plot, coz HR:POWER would be much more scattered. Be interesting to see your code and see if we can turn it into an R chart. If nothing else it will see if GC+R is flexible enough to do the kind of analysis you are looking for !

CHEERS

Mark 

Steve Mansfield

unread,
Sep 4, 2016, 10:51:56 AM9/4/16
to golden-cheetah-users
Yes, these values are hugely variable minute to minute, day to day, ride to ride. I wanted to see if a pattern emerged across all data points across all rides

This the SQL query I used

SELECT
Bike,
AVG (Heartrate) AS HR,
AVG (Cadence) AS Cad,
AVG (Power) AS Pwr,
Cranks as Cranks
FROM RideData.Rides
WHERE Bike IN ( "Bike1", "BikeN")
#AND Cranks = 172.5
GROUP BY Power, Bike, Cranks

To create the graphs, after the query completes in BQ, press the "save to Google Sheets" button. If you've got multiple bikes with multiple crank lengths then you'll need to do a pivot, and then use Power as the X axis.

The eagle eyed will spot that a scatter chart cannot be created in Google Sheets with multiple vertical axes: you have to use a line chart and then the chart type. To my mind that's a bug so I've reported it.

Hopefully the variable names I used are self-planatory

PS If this is a)valid and b)interesting, I can post a cookbook of how I did the underlying data manipulation. One interesting thing is that although the whole dataset is about 200Mb, because BQ uses columnar storage, the data queried is only for those columns ie this query only runs about 70Mb. Also BQ is dirt cheap to run: a TB costs, I think, $5 !


Steve Mansfield

unread,
Sep 8, 2016, 9:36:18 PM9/8/16
to golden-cheetah-users
I've often though I work less hard while commuting... here's some evidence NB Com=commute, NoCom = not a commute

mickebergma...@gmail.com

unread,
Sep 9, 2016, 2:33:56 AM9/9/16
to golden-cheetah-users
Very interesting analysis.

I was very surprised when I changed to 2.5 mm shorter cranks how big the subjective difference was. Really never expected. I even changed back to verify the feeling.
Yesterday I happened to do some statistics not directly related to this but from a accuracy point of view yes.

I wanted to see if and how I'm developing over the summer.
What I did was plotted (by ride - not line by line) all "valid" rides Average power/Average HR and Normalized power/Ave HR.
First I noticed that the plot for AP/HR R^2 is all quite bad. NP works much better (although curves correlate).

Then the point affecting your analysis. From starting in April at 0.68 HR/NP I'm now down to 0.60.
This means the data should be collected in a relatively short period of time. Another thing discussed in the fatigue tread is that HR/Power varies from ride to ride. To get better accuracy the ckanks should be swapped several times back and forth to ensure data quality.

BTW. I also waned to see if my cadence has changed over time. Is there a way to add data (columns) to the Trends - Summary page?

Steve Mansfield

unread,
Sep 12, 2016, 1:09:55 AM9/12/16
to golden-cheetah-users
My own experience is that correlating HR and NP is potentially misleading. On the way home from work I can take a small diversion and do some interval work on some hills. Each one takes about 40s to a minute. Because I do them to maximum effort , NP for such a ride gets quite high as you'd expect, much higher than Av Pwr and Weighted Av Pwr, however HR being a lagging indicator, my HR does not have chance to get to max ie if I use NP/HR I will _think_ I am getting fitter, however I'm not. Would  Weighted Av Power / HR as a trend might be better?

mickebergma...@gmail.com

unread,
Sep 12, 2016, 1:53:42 AM9/12/16
to golden-cheetah-users
I thought about that and agree. It requires a big data set to be fairly accurate. However, Average power is even worse. I don't remember exactly the difference between weighted average and NP.
Best would be line by line data including some filtering (removing transients). However, this is not within my capabilities.

Speaking of NP in this matter it has an advantage which I already checked last week as it looked strange. The fact that NP can be way below average power. At first it sounded funny but it actually makes a lot of sense. And also helps the HR/NP accuracy in short sprints. In my example I did a strava segment of 54 s. Average power was 554 W and NP only 513. (I must say HR DID climb to a value correlating well with the power). In a shorter sprint the difference is even bigger (another on the same trip 34 s - 626 / 486 W)

Steve Mansfield

unread,
Sep 12, 2016, 4:21:49 PM9/12/16
to golden-cheetah-users
It seems that the less data you have, the less clean the resulting chart, for example here's two rides only - Perth was about 4.75 hours, Amy's about 3.5 hours.


The rides were a week apart, and I was definitely less fresh for  Amy's (the later one).

For the relationship between NP and other power measures, it looks like the 30s simple averaging aspect causes some interesting effects if you do, say 15s intervals. If I get chance I'll have a play later this week. In any event, looking at the underlying maths, Skiba's xPower may be a better bet. Pity because all my rides NP is more than xPower. So far I cannot replicate the xPower numbers.

Mark Liversedge

unread,
Sep 12, 2016, 4:42:06 PM9/12/16
to golden-cheetah-users
On Monday, 12 September 2016 21:21:49 UTC+1, Steve Mansfield wrote:
It seems that the less data you have, the less clean the resulting chart, for example here's two rides only - Perth was about 4.75 hours, Amy's about 3.5 hours.


What does each point represent?
Are they 30s moving averages for Power and HR for each point in a ride, red for Perth Ride and blue for Amy Ride?

Thanks
Mark

 

Steve Mansfield

unread,
Sep 14, 2016, 7:46:45 AM9/14/16
to golden-cheetah-users
Average HR Y for a given power X

So if say I'd hit 300W on 4 occasions and at the time my HR was say 120,130,140,150 respectively, then the average would be 135, and that what is plotted, 300 X, 135 Y

andy aardema

unread,
Sep 21, 2016, 11:20:45 AM9/21/16
to golden-cheetah-users
I'm finding this sql query suspect:

SELECT
Bike,
AVG (Heartrate) AS HR,
AVG (Cadence) AS Cad,
AVG (Power) AS Pwr,
Cranks as Cranks
FROM RideData.Rides
WHERE Bike IN ( "Bike1", "BikeN")
#AND Cranks = 172.5
GROUP BY Power, Bike, Cranks


First, just as a nitpick from a person who writes a lot of sql, it's confusing to have an avg(Power) aggregate in the select clause but not aggregated in the GROUP BY.  I think the one effectively negates the other.

Second, I'm interpreting this as "for each power value, plot the average HR values accumulated at that power value".  The average HR doesn't tell us much about it's distribution, and likewise for power when assigning one point to each value.  That's what a scatter plot can solve, except it can get visually overwhelming and trends/patterns can get lost.

I think this is a very cool idea, and it would be fascinating if you could tease out a difference between the bikes.  I thinking something like ratio of per-sample HR to "instantaneous" xP (e.g. xP for last 30 seconds), maybe binned by frequency and/or into power band histogram buckets.

Steve Mansfield

unread,
Sep 21, 2016, 7:06:56 PM9/21/16
to golden-cheetah-users


On Thursday, 22 September 2016 01:20:45 UTC+10, andy aardema wrote:
I'm finding this sql query suspect:

It could well be bogus... I know nearly nothing of SQL. If you can suggest a better query that Google BigQuery will run (needs to be near ANSI-standard SQL) happy to run it

 

SELECT
Bike,
AVG (Heartrate) AS HR,
AVG (Cadence) AS Cad,
AVG (Power) AS Pwr,
Cranks as Cranks
FROM RideData.Rides
WHERE Bike IN ( "Bike1", "BikeN")
#AND Cranks = 172.5
GROUP BY Power, Bike, Cranks


First, just as a nitpick from a person who writes a lot of sql, it's confusing to have an avg(Power) aggregate in the select clause but not aggregated in the GROUP BY.  I think the one effectively negates the other.

Second, I'm interpreting this as "for each power value, plot the average HR values accumulated at that power value".  The average HR doesn't tell us much about it's distribution, and likewise for power when assigning one point to each value.  That's what a scatter plot can solve, except it can get visually overwhelming and trends/patterns can get lost.

I think this is a very cool idea, and it would be fascinating if you could tease out a difference between the bikes.  I thinking something like ratio of per-sample HR to "instantaneous" xP (e.g. xP for last 30 seconds), maybe binned by frequency and/or into power band histogram buckets.

Yes I see differences between bikes and cranks as you see earlier in the thread (thats as far as I've gotten). I need to try this for a bike I only really commute on, and a bike I nearly never commute on. My hypothesis is that its the second to second differences in speed which characterise the ride. In other words if I commute I spend a lot of time at lights, stuck in traffc etc, so that's basically resting. On a non-commute ride that doesn't happen, so HR tends to be higher for a given power ie Watts / HR is higher on commutes.

If I knew more (and I don't even know how to do date partitioned tables so I can't do date based queries!), it would be interesting to calculate for the average speed difference ie the acceleration, or even second-second power difference across all my rides. I did it manually for two rides (commute vs non-commute) and found a difference.

Steve Mansfield

unread,
Sep 21, 2016, 11:15:31 PM9/21/16
to golden-cheetah-users
So, here's the commute vs non-commute:


Reply all
Reply to author
Forward
0 new messages