[Golden-Cheetah-Users] Handling gaps in recording


Mark Liversedge

May 22, 2010, 6:08:01 AM
to golden-cheetah-users
Hi,

I've been working on fixups to handle drops in recording and am
conflicted about the solution.

The metric calculations and some of the plots assume that the data
points are all equally spaced (in time). There are two solutions to
this: post-process the data files to fix gaps etc., or change the codebase
to not make this assumption in the first place.

Looking through this in detail there are many problems with the first
approach, most notably that the data will no longer represent that
which was recorded.

If for example you have a two hour ride, in which there is a gap in
the first 5 minutes that is not equally divisible by the recording
interval we will need to interpolate the remaining 1hr and 55mins
worth of data. Alternatively, we could be clever and adjust the first
five minutes to align with the remainder of the file. Of course, this
approach becomes very complex when there are multiple gaps.

I have come to the conclusion that the assumption that datapoints are
equally spaced should be corrected instead.

This is a complex undertaking and will be hard :-). Before I start,
are there any comments or thoughts?

Mark

--
_______________________________________________
Golden-Cheetah-Users mailing list
golden-che...@googlegroups.com
http://groups.google.com/group/golden-cheetah-users?hl=en

Robert Chung

May 22, 2010, 9:07:33 AM
to golden-cheetah-users
On May 22, 3:08 am, Mark Liversedge <liverse...@gmail.com> wrote:

> If for example you have a two hour ride, in which there is a gap in
> the first 5 minutes that is not equally divisible by the recording
> interval we will need to interpolate the remaining 1hr and 55mins
> worth of data.

I'm a tad confused by this. Could you give an example?

Mark Liversedge

May 22, 2010, 9:19:43 AM
to golden-cheetah-users
I'll go with a 20s ride, to keep the numbers down :-)

Device sample rate: 1s

Secs   Watts
1      100
2      100
3      100
4      130
5      130
10.5   140
11.5   150
12.5   150
13.5   160
14.5   170
15.5   180
16.5   190
17.5   190
18.5   190
19.5   200
20.5   210

As you can see, after 5 seconds there was a pause in recording of 5.5
seconds, and then we got 1s interval data for the remainder of the
ride (albeit a very short one).

To ensure we have a fixed period of 1s, the points between 6s and
10s would need to be set to zero and the points from 11s onwards
would need to be interpolated from the available data. Alternatively,
the first 5 seconds would be interpolated to get values at 1.5, 2.5
etc.
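
A minimal sketch of the first variant (zero-fill the gap, interpolate the rest), written in Python purely for illustration. The function name, the max_gap threshold and the choice of linear interpolation are assumptions, not GoldenCheetah code:

```python
# Illustrative sketch only; not GoldenCheetah code.
def resample_fixed(samples, interval=1.0, max_gap=1.5):
    """Put (secs, watts) samples on a regular grid starting at the first sample.

    Needs at least two samples with strictly increasing times. Grid points
    strictly inside a recording gap (neighbouring samples more than max_gap
    seconds apart) are set to zero; every other grid point is linearly
    interpolated between the two samples around it.
    """
    start, end = samples[0][0], samples[-1][0]
    steps = int((end - start) / interval + 1e-9)
    out = []
    i = 0
    for k in range(steps + 1):
        t = start + k * interval
        # Move on until samples[i] and samples[i + 1] bracket t.
        while i + 2 < len(samples) and samples[i + 1][0] < t:
            i += 1
        (t0, w0), (t1, w1) = samples[i], samples[i + 1]
        if t1 - t0 > max_gap and t0 < t < t1:
            out.append((t, 0.0))
        else:
            out.append((t, w0 + (w1 - w0) * (t - t0) / (t1 - t0)))
    return out


# The 20s ride from the table above.
ride = [(1, 100), (2, 100), (3, 100), (4, 130), (5, 130),
        (10.5, 140), (11.5, 150), (12.5, 150), (13.5, 160), (14.5, 170),
        (15.5, 180), (16.5, 190), (17.5, 190), (18.5, 190), (19.5, 200),
        (20.5, 210)]
grid = resample_fixed(ride)
```

With this data the grid runs from 1s to 20s, the points at 6s to 10s come out as zero, and the point at 11s is 145 W, a value that was never recorded. That is the "data no longer represents what was recorded" concern in concrete form.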

Recording gaps which are not multiples of the sample rate seem to
occur in my Powertap files quite often (via WKO admittedly). Maybe it
is a special case (and I have accounted for the 1.26s sample rate).

Hope this makes sense?

Cheers,
Mark

Robert Chung

May 22, 2010, 9:38:49 AM
to golden-cheetah-users
It makes sense, but I've never seen that from my PT (wired, 1.26 sec).
Every gap I've ever seen has been a multiple of 1.26 seconds. How were
these data downloaded?

Mark Liversedge

May 22, 2010, 9:48:22 AM
to golden-cheetah-users
Ah, ok. They were downloaded into WKO and then imported as WKO files.
Will look more closely to see if it's isolated to that. I do have some
old Polar files with 5s recording that have similar issues too and
they were converted from HAC4 to hrm format using hac42hrm. Hehe,
maybe my code is the cause of the issue :-)

Robert Chung

May 22, 2010, 11:09:42 AM
to golden-cheetah-users


On May 22, 6:48 am, Mark Liversedge <liverse...@gmail.com> wrote:
> Ah, ok. They were downloaded into WKO and then imported as WKO files.
> Will look more closely to see if it's isolated to that.

There is a deeper issue related to time synching, though. That's if we
ever need to work with two (or more) files collected by two different
devices, like a 1.26-second PT and a CinQo, or two SRMs with different
recording intervals. That might come up in a multi-ride analysis
context, or in Aerolab.

Mark Liversedge

May 22, 2010, 1:13:32 PM
to golden-cheetah-users
I went back to my source files and looked through those with problems.

The problem is manifested in rides from Powertaps and Computrainer/
Velotrons via WKO (and possibly other devices WKO supports). I
wouldn't be surprised if there were other files with these problems
(e.g. TCX). In fact, with CT/VT the recording interval is different
for almost every sample, which I think is due to the downsampling it
does to reduce file sizes (the native sampling rate is 0.1s I
believe).

I'm going to work up both solutions - one to support user configurable
post-processing of dataPoints to fill-in gaps with zeroes/interpolate/
whatever and one to remove the assumption that sample rates are
constant and samples are contiguous in the metrics and find best
intervals code.
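
A minimal sketch of what the second option could look like for a simple average, in Python for illustration only. The convention that each sample covers the time since the previous one, the nominal interval, and the gap_threshold value are all assumptions; the actual metrics code is not shown in this thread:

```python
# Illustrative sketch only; not GoldenCheetah code.
def sample_durations(times, nominal=1.0, gap_threshold=2.5):
    """Seconds each sample stands for.

    A sample covers the time since the previous sample. The first sample,
    and any sample that follows a gap longer than gap_threshold, is
    assumed to cover one nominal interval.
    """
    durations = [nominal]
    for prev, cur in zip(times, times[1:]):
        delta = cur - prev
        durations.append(delta if delta <= gap_threshold else nominal)
    return durations


def time_weighted_mean(values, durations):
    """Average in which long samples count for more than short ones."""
    return sum(v * d for v, d in zip(values, durations)) / sum(durations)


times = [1.0, 2.0, 4.0, 9.5, 10.5]   # one 2s sample, then a 5.5s gap
watts = [90, 90, 150, 300, 300]
durations = sample_durations(times)
average = time_weighted_mean(watts, durations)
plain = sum(watts) / len(watts)
```

Here the time-weighted average is 180 W over 6 recorded seconds, whereas the plain per-sample mean is 186 W. Nothing in the data is altered; only the arithmetic stops assuming equal spacing.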

Regards,
Mark

AndyF

May 22, 2010, 1:36:51 PM
to golden-cheetah-users


On May 22, 6:08 am, Mark Liversedge <liverse...@gmail.com> wrote:
> I have come to the conclusion that the assumption that datapoints are
> equally spaced should be corrected instead.


I completely agree. This conclusion is probably the only possible
long-term solution to the problem of datasets collected at different or
inhomogeneous sampling rates.

Now if we ever want to homogenize the datasets by putting them all on
one homogeneous sampling grid, the problem needs some care. Getting
ripple-free, unaliased results involves projecting the known ride data,
u(t), onto v(t), its representation on the new grid, by minimizing the
integral of the squared difference:

I = int_t{ (u-v)^2 dt }


Upshot: You cannot expect to just interpolate one grid onto another
when they don't share the same time values.


I'd be happy to contribute the projection code necessary for this, if
you want it.
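
To make the idea concrete, here is a minimal sketch of the simplest case, in Python for illustration only. It assumes both u and v are piecewise constant (each value held over its interval); under that assumption, minimizing the integral above bin by bin reduces to taking the time-weighted mean of u over each new bin. The function name and data are hypothetical, and the projection code offered above may well use a different basis:

```python
# Illustrative sketch only; assumes piecewise-constant signals.
def project_onto_grid(edges, values, grid):
    """Least-squares projection of a piecewise-constant signal onto new bins.

    values[i] is held over [edges[i], edges[i + 1]); grid lists the edges
    of the target bins. Any part of a bin not covered by the source
    counts as zero.
    """
    out = []
    for lo, hi in zip(grid, grid[1:]):
        area = 0.0
        for t0, t1, u in zip(edges, edges[1:], values):
            overlap = min(hi, t1) - max(lo, t0)
            if overlap > 0:
                area += u * overlap
        out.append(area / (hi - lo))
    return out


# Three 1.26s samples projected onto a 1s grid.
edges = [0.0, 1.26, 2.52, 3.78]
watts = [100, 200, 300]
v = project_onto_grid(edges, watts, [0.0, 1.0, 2.0, 3.0])
```

The result is roughly 100, 174 and 248 W. The integral over the first three seconds (522 J) is the same before and after, which point-by-point interpolation between the two grids does not guarantee.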

Mark Liversedge

May 22, 2010, 1:46:46 PM
to golden-cheetah-users
All code gratefully received!! But I need some explanation too, I
suffer from remedial math syndrome :-)

Mark Liversedge

May 22, 2010, 3:20:07 PM
to golden-cheetah-users
Quick update for those that are /vaguely/ interested in the Powertap /
WKO sample size and recording gap issue.

It seems to be a rounding error in the WKO software. Very, very
occasionally the sample is 1.32 seconds long i.e. 0.06 seconds longer
than the 1.26 seconds it's supposed to be.

It doesn't look like it's a fault with the WKO reader, rather the data
itself is wrong. Of course, there may be an error in my code or my
understanding of the file format -- I'll continue to keep looking.
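
One quick way to check a file for this is to tally the intervals between consecutive samples. The sketch below is Python for illustration only; the timestamps are made up to mimic the 1.26s / 1.32s pattern described here and do not come from a real WKO file:

```python
# Illustrative sketch only; timestamps are invented for the example.
from collections import Counter


def interval_tally(times, ndigits=2):
    """Count how often each inter-sample interval occurs (rounded)."""
    return Counter(round(b - a, ndigits) for a, b in zip(times, times[1:]))


times = [0.0, 1.26, 2.52, 3.84, 5.10]   # third interval is 1.32s
tally = interval_tally(times)
```

For these timestamps the tally shows three intervals of 1.26s and one of 1.32s, so the odd sample stands out immediately.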

And yeah I know, 0.06 seconds isn't that big a deal :-)

DailyEvil

May 22, 2010, 4:58:46 PM
to golden-cheetah-users
One thing I'd like to see is a list of how many dropouts there were on
a ride.

On the "Ride Summary" page, at the bottom, there is occasionally a
"time jumped backwards" warning. How about putting a summary there of
the number of sample points dropped? e.g. "There were 5 dropouts for a
total of 8 missing data points".
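
A minimal sketch of how such a summary could be computed, in Python for illustration only. The function names are hypothetical, and rounding each gap to the nearest whole number of recording intervals is an assumption:

```python
# Illustrative sketch only; not GoldenCheetah code.
def dropout_summary(times, rec_int):
    """Return (number of dropouts, total missing data points).

    A dropout is any step between consecutive samples that spans more
    than one recording interval once rounded to the nearest whole number
    of intervals.
    """
    dropouts = 0
    missing = 0
    for a, b in zip(times, times[1:]):
        skipped = round((b - a) / rec_int) - 1
        if skipped > 0:
            dropouts += 1
            missing += skipped
    return dropouts, missing


def describe(times, rec_int):
    dropouts, missing = dropout_summary(times, rec_int)
    return (f"There were {dropouts} dropouts for a total of "
            f"{missing} missing data points")


times = [0, 1, 2, 5, 6, 7, 9, 10]   # samples at 3s, 4s and 8s are missing
message = describe(times, 1.0)
```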

This has at least two benefits: first, people will *know* that their
data is not 100% and, if the number is high, can work to correct it (for
instance, on a wired powertap, dropouts are often caused by the shark
fin being badly positioned). The other is that a sudden increase in
the number of dropouts is an early warning sign of the batteries
getting weak.

Robert Chung

May 23, 2010, 9:43:15 AM
to golden-cheetah-users

On May 22, 12:20 pm, Mark Liversedge <liverse...@gmail.com> wrote:

> It seems to be a rounding error in the WKO software. Very, very
> occasionally the sample is 1.32 seconds long i.e. 0.06 seconds longer
> than the 1.26 seconds it's supposed to be.
>
> It doesn't look like it's a fault with the WKO reader, rather the data
> itself is wrong.
>
> And yeah I know, 0.06 seconds isn't that big a deal :-)

Could it be in either the hub or the head, before it gets to WKO? The
reason I ask is because of an anomaly I've noticed since the Rosetta
Stone analysis, which pre-dated Cyclingpeaks: when I first tried
synching up a PT file with a SRM file, I noticed that the PT's time
stamp would occasionally seem to burp, so I had to re-synch a few
times. An amusing example of this can be seen in the AIS comparison of
a PT and SRM back in 2004: they had a figure in that article that
clearly showed a mismatch, which they didn't fix.

Rainer Clasen

May 24, 2010, 6:00:44 AM
to golden-cheetah-users
Mark Liversedge wrote:
> Ah, ok. They were downloaded into WKO and then imported as WKO files.
> Will look more closely to see if it's isolated to that. I do have some
> old Polar files with 5s recording that have similar issues too and
> they were converted from HAC4 to hrm format using hac42hrm. Hehe,
> maybe my code is the cause of the issue :-)

FYI: When using SRM with recording != 1sec such "unaligned gaps" also show
up quite frequently.

Rainer

Fredrik

May 22, 2010, 9:16:15 AM
to golden-cheetah-users
Hi Mark,

I have been thinking of this too. I don't think it's a good idea to
interpolate such long gaps as 5 mins since such interpolations are
very uncertain. I think it's better to just mark the missing time
interval (the gap) as just "missing" in some way in the plots. Also,
sometimes I stop the timer in the middle of a training session and then I
don't want this gap to be a part of the exercise.

One could perhaps do some sort of interpolation for shorter gaps, or
perhaps on all data since there are always errors ("noise") in the data;
for position data (GPS coordinates) I think the error is usually on the
order of a few meters. An idea is to use some sort of motion model
for a rider which would improve both position and speed estimates.
That is, a rider with a certain speed and weight cannot change
direction (location) or speed too much between each sampling instant.
However, I also don't like to change the original data, but
interpolations can perhaps be used to improve the measures (scores)
and plots?

Regarding equally spaced datapoints, not all devices sample uniformly.
On my Edge 705 you can sample data in a "smart mode" which means that
the sampling interval varies roughly between 5 s and 15 s (at least
when I last tested it).

Fredrik

Fredrik Lingvall

May 22, 2010, 10:06:20 AM
to Mark Liversedge, golden-cheetah-users
Hi Mark,

I have been thinking of this too. I don't think it's a good idea to interpolate such long gaps as 5 mins since such interpolations are very uncertain. I think it's better to just mark the missing time interval (the gap) as just "missing" in some way in the plots. Also, sometimes I stop the timer in the middle of a training session and then I don't want this gap to be a part of the exercise.

One could perhaps do some sort of interpolation for shorter gaps, or perhaps on all data since there are always errors ("noise") in the data; for position data (GPS coordinates) I think the error is usually on the order of a few meters. An idea is to use some sort of motion model for a rider which would improve both position and speed estimates. That is, a rider with a certain speed and weight cannot change direction (location) or speed too much between each sampling instant. And, I also don't like to change the original data, but interpolations (smoothing) can perhaps be used to improve the measures (scores) and plots?

Regarding equally spaced datapoints, not all devices sample uniformly. On my Edge 705 you can sample data in a "smart mode" which means that the sampling interval varies roughly between 5 s and 15 s (at least when I last tested it).

Fredrik

PS. I tried to send this with the google groups web interface but that didn't seem to work so I'm resending it now.

Fredrik Lingvall

Jun 3, 2010, 7:56:09 AM
to golden-che...@googlegroups.com, Mark Liversedge
Hi all,

I'm trying to add a gradient plot to GC. So far I have basically just
made a copy of the altitude code (and then renamed it), so there is no
content yet. I've got as far as having a (dimmed) gradient checkbox in
the "Ride Plot" tab now. I'm a little bit puzzled about how the data is
stored in the GC code.

I need to compute the gradient from altitude and distance data. I have
done similar stuff in matlab-code (for hrm-files and tcx-files) before.
Basically I want to use a number of samples around a data point to
calculate the gradient, i.e., fit a function or smooth the data around a
point to get a robust estimate of the gradient.
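
A minimal sketch of one way to do this, in Python for illustration only: fit a straight line by least squares to altitude against distance over a small window around the point, and report the slope as a percentage. The function name, the window size and the use of metres for both series are assumptions:

```python
# Illustrative sketch only; not GoldenCheetah code.
def gradient_percent(distance_m, altitude_m, index, half_window=2):
    """Least-squares slope of altitude vs distance around one data point.

    Uses up to half_window samples on each side of index (fewer at the
    ends of the ride). Both series are assumed to be in metres.
    """
    lo = max(0, index - half_window)
    hi = min(len(distance_m), index + half_window + 1)
    xs = distance_m[lo:hi]
    ys = altitude_m[lo:hi]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # no horizontal movement inside the window
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return 100.0 * sxy / sxx


distance = [0.0, 10.0, 20.0, 30.0, 40.0]
altitude = [100.0, 100.5, 101.0, 101.5, 102.0]
grade = gradient_percent(distance, altitude, 2)
```

For this steady climb of 0.5 m every 10 m the estimate is 5%. Because the fit uses distance rather than sample count, it does not depend on the samples being equally spaced in time.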

Do I need to add specific code for this for every file format or can I do
it globally after the data is loaded (preferable)? That is, where is it
best to add code for this?

/Fredrik

Mark Liversedge

Jun 3, 2010, 8:30:18 AM
to golden-cheetah-users
Hi Fredrik,

I have coded up a 'Data processor' class which post-processes ride
file data. It gets called when a file is loaded (if the user has
configured it in the config pane to be 'automatic'). I have also added
them to the main window's Tools menu so you can run them manually.

So far I have written 3 data processors: fix gaps in recording, fix
power spikes and compute .cpi files. I'm working on the ride editor at
the moment but also plan to go back and add a new 'cpi' type file (and
associated dataprocessor) to support distribution charts across
multiple ride files (i.e. for each ride a file that says how many
seconds were spent at 0w, 1w, 2w ... Nw, or 0 bpm, 1bpm, 2bpm ...
Nbpm). I was going to use these files to update the Histogram tab to
allow you to plot distribution charts for a season (or any other time
period). I'm also mucking about with a stacked graph (like this
http://manyeyes.alphaworks.ibm.com/manyeyes/page/Stack_Graph.html) to
support the Greg Steele Bottle of Sand chart.
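
A minimal sketch of the per-ride distribution idea, in Python for illustration only. It shows the kind of data described (seconds spent at each whole watt) and how rides could be summed for a season. It is not the .cpi file format, which is not described in this thread, and the names are hypothetical:

```python
# Illustrative sketch only; not the .cpi format or GoldenCheetah code.
from collections import Counter


def seconds_per_watt(watts, durations):
    """Seconds spent at each whole-watt value for one ride."""
    dist = Counter()
    for w, d in zip(watts, durations):
        dist[int(round(w))] += d
    return dist


ride_a = seconds_per_watt([0, 100, 100, 250], [1.0, 1.0, 1.0, 1.0])
ride_b = seconds_per_watt([100, 250, 250], [1.26, 1.26, 1.26])

# Summing the per-ride distributions gives the distribution for a period.
season = ride_a + ride_b
```

Weighting each sample by its own duration means rides recorded at 1s and at 1.26s can be combined without resampling either of them.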

I wonder if the bigger question is how we store these 'auxiliary' or
'derived' datapoint values? Currently, a datapoint has a pre-defined
set of series (power, cadence, speed, lat, lon et al) and this will
add another data-series for gradient, rather like the recent patch for
headwind. My instinct says this is probably the right approach.

I really need to start issuing patches for some of the stuff I've been
working on, but in the meantime you can clone my repo from
git://github.com/liversedge/VirtualCheetah.git and look for the
DataProcessor.{h,cpp} for the base class definition and then
FixGaps.cpp or FixSpikes.cpp for an example.

PM me if you want to play with it and need help getting to grips with
the new dependencies (e.g. qtsoap is used for training peaks upload/
download code, new versions of the qwtplot3d libs to fix font problems
on Linux). Alternatively, just wait for me to send out a stack of
patches, I'll try and sort that out this month. Writing a
Dataprocessor for gradient would be very easy.

FWIW I have added gradient as a metric in my repo (and also Ferrari's
VAM), since I think they are really most interesting for interval
analysis rather than the ride plot. But that is just my opinion :-)

Cheers,
Mark

greg steele

Jun 3, 2010, 10:20:08 AM
to golden-cheetah-users List
You mean it's not dead? I thought it was totally dead....

On Jun 3, 2010, at 6:30 AM, Mark Liversedge wrote:
> I'm also mucking about with a stacked graph (like this
> http://manyeyes.alphaworks.ibm.com/manyeyes/page/Stack_Graph.html) to
> support the Greg Steele Bottle of Sand chart.

> Cheers,
> Mark

Mark Liversedge

Jun 3, 2010, 10:38:49 AM
to golden-cheetah-users
I've got the hang of QT's Painter and reckon I can have a go. Might
look like it was drawn by a 2 year old, but hey we all have to start
somewhere!

Fredrik Lingvall

Jun 4, 2010, 4:47:54 AM
to Mark Liversedge, golden-cheetah-users

Hi Mark,

I'm building GC from your repo right now and I will have a look at your 'Data processor' class.

Roughly, I'm thinking of adding the gradient to the Ride Plot tab and it would also be nice to have color-coded gradient zones in the Google map (so one can easily spot the nasty sections on Galibier :-)

/Fredrik


Fredrik Lingvall

Jun 4, 2010, 7:33:06 AM
to Mark Liversedge, golden-cheetah-users
Hi Mark,

I have attached a patch for some initial gradient stuff. I have not
used git before so be careful...

I have done this:

* Added a new data type for the gradient which is mostly a copy of the
altitude code (I think that this makes sense even though it's "derived"
data).

* Added a new member function appendPoint which takes one more
argument (the grad arg) than the old appendPoint function.

* Made the old appendPoint function call the new appendPoint with grad
= 0.0 (since we don't want to add/compute the gradient info in every
*RideFile.cpp file for the different file formats - right?)

grep for FIXME and TODO and you will find where I'm uncertain what to do.

Ideally I want to work on the raw (unprocessed, non-interpolated)
data. Which data type should I use then?

And, the patch is for your git repo, Mark.

Regards,

/Fredrik

0001-Added-some-initial-support-for-gradient-data-plots.patch

Mark Liversedge

Jun 4, 2010, 5:57:27 PM
to golden-cheetah-users
Hi Fredrik,

The raw data is in RideFile::dataPoints_ which can be accessed as a
const via the dataPoints() member. To get a mutable reference to the
QVector I was thinking of adding a dataPointsRef() member to return a
reference to the private data, but whilst I'm hacking I just made it
public.

I know that Sean will not accept the code as it stands (and rightly
so) but it's just a hacker repo anyway. When I push patches to the list
I tend to tidy this stuff up. I tend to address const and private
'issues' once I'm happy that I really do need to do something (i.e.
after a while the OO design emerges rather than being fully
established whilst I'm hacking and generally the desire to change
stuff disappears). I think with the dataprocessor and ride editor that
is unlikely though.

I'll play with your patch this weekend if that is ok?

ian

Jun 12, 2010, 6:52:33 AM
to golden-che...@googlegroups.com
On Thu, 3 Jun 2010 05:30:18 -0700 (PDT)
Mark Liversedge <liver...@gmail.com> wrote:


> in the meantime you can clone my repo from
> git://github.com/liversedge/VirtualCheetah.git

I like Virtual Cheetah a lot and see there is also a Peak WPK for
various times. I assume this is watts per kilo. I have my weight
entered in the options section but just get "inf" shown as a value.
Does this metric work, am I doing something wrong, or does WPK mean
something else?
