GoldenCheetah to Google BigQuery?


Steve Mansfield

Jun 15, 2016, 10:29:47 PM
to golden-cheetah-users
G'day

With the recent release of a tool from OWOX, you can now query data in Google Cloud Storage via BigQuery, and pull it into Google Sheets.

Works well for the test data sets.

Wondering if there is a good, i.e. straightforward, way to get GC data into BigQuery tables?

TIA

Steve

Mark Liversedge

Jun 16, 2016, 2:25:25 AM
to golden-cheetah-users
If you have paid for your GCP then you should be able to just load in activities; as they are only JSON, it should be a snap.

We have a GoldenCheetah GCP to host cloud charts and will shortly be using it to create an Open Data Store of downsampled ride data for researchers and others to use. But we are not likely to provide a generic query service on top of the big tables since we do not have funding for that and it could become very costly, very quickly.

Cheers  

Mark

Steve Mansfield

Jun 17, 2016, 7:53:16 AM
to golden-cheetah-users
Yes you're right, I can do that.

I had a go at uploading a GC .json file as a new table, but the file gets rejected. Do you know which bits I need to remove? I created a schema based on the ordering of the fields on the JSON file.

The error I get is

Errors:
file-00000000: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. (error code: invalid)
JSON parsing error in row starting at position 0 at file: file-00000000. Parser terminated before end of string (error code: invalid)

which I get even if I remove all the non-data bits below

Here's the first few lines of a sample file:

{
"RIDE":{
"STARTTIME":"2016\/06\/17 06:28:11 UTC ",
"RECINTSECS":1,
"DEVICETYPE":"Garmin FR920XT ",
"IDENTIFIER":" ",
"TAGS":{
"Athlete":"PowerTap ",
"Calendar Text":" ",
"Data":"TDSPHC-AGL----- ",
"Device":"Garmin FR920XT ",
"File Format":"FIT (*.fit) ",
"Filename":"2016_06_17_16_28_11.json ",
"Month":"June ",
"Source Filename":"2016-06-17-16-28-11 - 920xt_2016_06_17_16_28_11.FIT ",
"Sport":"Bike ",
"Weekday":"Fri ",
"Year":"2016 "
},
"INTERVALS":[
{ "NAME":"Lap 1 ", "START": 0, "STOP": 1441 },
{ "NAME":"Lap 2 ", "START": 1442, "STOP": 2631 },
{ "NAME":"Lap 3 ", "START": 2632, "STOP": 3873 },
{ "NAME":"Lap 4 ", "START": 3874, "STOP": 4009 }
],
"SAMPLES":[


Steve

Mark Liversedge

Jun 17, 2016, 8:15:38 AM
to golden-cheetah-users
On Friday, 17 June 2016 12:53:16 UTC+1, Steve Mansfield wrote:
I had a go at uploading a GC .json file as a new table, but the file gets rejected. Do you know which bits I need to remove? I created a schema based on the ordering of the fields on the JSON file.

The error I get is

Errors:
file-00000000: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. (error code: invalid)
JSON parsing error in row starting at position 0 at file: file-00000000. Parser terminated before end of string (error code: invalid)

It's probably the BOM at the beginning of the file (text strings in the metadata need this for i18n).
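
If it is the BOM, it is easy to check for and strip before uploading. A minimal sketch (the filename is a placeholder):

```python
# Strip a leading UTF-8 byte order mark (BOM) from a file, if present,
# so that a parser which does not expect one will accept the file.
BOM = b"\xef\xbb\xbf"

def strip_bom(path):
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(BOM):
        # Rewrite the file without the three BOM bytes
        with open(path, "wb") as f:
            f.write(data[len(BOM):])
```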

Cheers and good luck !!

Mark

Steve Mansfield

Jun 21, 2016, 6:11:34 AM
to golden-cheetah-users
If anyone else is interested...

1. you need to remove all the stuff at the start of the GC JSON activity file so it starts with the actual ride data
2. then remove all the stuff at the end of the file after the last line of ride data; I think the last line needs to be blank
3. the spacing of fields and text elements in the GC JSON file is not accepted by BigQuery, so you'll need to modify them - see the example below
4. you'll need the schema, or create a JSON file that has the schema, and paste it into the schema window in BQ. To save you the work, I've attached one.

A successful file will look like this:

{"SECS": 1, "KM": 0, "WATTS": 0, "CAD": 0, "KPH": 0, "HR": 104, "ALT": 24.8, "LAT": -32.001868, "LON": 141.13594, "SLOPE": 0 }
{"SECS": 2, "KM": 0, "WATTS": 0, "CAD": 0, "KPH": 0, "HR": 104, "ALT": 24.6, "LAT": -33.998934, "LON": 141.13234, "SLOPE": 0 }

Finally, BQ is very sensitive to file names. I don't know all the rules, but I do know that having a "-" in the filename will cause a parse error.
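
The hand edits in the steps above can also be scripted. A minimal sketch, assuming the RIDE/SAMPLES layout shown earlier in the thread (filenames are placeholders; `utf-8-sig` drops the BOM on read):

```python
import json

# Convert a GoldenCheetah JSON activity file into newline-delimited
# JSON (one sample object per line), which BigQuery will load.
def gc_to_ndjson(src, dst):
    # utf-8-sig silently strips a leading BOM if one is present
    with open(src, encoding="utf-8-sig") as f:
        samples = json.load(f)["RIDE"].get("SAMPLES", [])
    with open(dst, "w", encoding="utf-8") as out:
        for s in samples:
            # json.dumps re-serialises each sample compactly, which
            # also normalises the spacing BQ objected to
            out.write(json.dumps(s) + "\n")
```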

My next challenge is to try to upload data into date partitioned tables so that I can have a whole month in the same table. Should make queries quicker (and cheaper).

PS There's some good fun to be had with Google Cloud, and when you sign up and start you get $300 credit.
RideDataSchema.json.zip

Mark Liversedge

Jun 21, 2016, 8:49:47 AM
to golden-cheetah-users
On Tuesday, 21 June 2016 11:11:34 UTC+1, Steve Mansfield wrote:
[...]

Thanks Steve.

For the record, our JSON files are fully compliant with RFC 7159. Use of white space is valid.

The UTF-8 encoding with a Byte order mark (BOM) breaks parsers that do not expect it, and should be removed, but of course then you will need to re-encode unicode strings into some other encoding too.

Mark
 

Steve Mansfield

Jun 21, 2016, 8:28:11 PM
to golden-cheetah-users
For the record, our JSON files are fully compliant with RFC 7159. Use of white space is valid.

Yes, they certainly look OK, so it's the BigQuery parse engine that has issues :-)

Karl Billeter

Jun 21, 2016, 9:14:19 PM
to golden-cheetah-users
On Tue, Jun 21, 2016 at 05:49:47AM -0700, Mark Liversedge wrote:


> For the record, our JSON files are fully compliant with RFC 7159. Use of
> white space is valid.
>
> The UTF-8 encoding with a Byte order mark (BOM) breaks parsers that do not
> expect it, and should be removed, but of course then you will need to
> re-encode unicode strings into some other encoding too.

Not sure I'd agree here, as RFC 7159 (section 8.1) specifies that encoders
MUST NOT add a BOM (but a parser MAY ignore one). Obviously not my call,
and pretty easy to strip if required in any case :-)


Karl

Karl Billeter

Jun 21, 2016, 9:25:43 PM
to Steve Mansfield, golden-cheetah-users
On Tue, Jun 21, 2016 at 03:11:34AM -0700, Steve Mansfield wrote:

> If anyone else is interested...
>
> 1. you need to remove all the stuff at the start of the GC JSON activity
> file so it starts with the actual ride data
> 2. then remove all the stuff at the end of the JSON file after the last
> line of ride data, I think you need the last line to be blank
> 3. the spacing of fields and text elements in the GC JSON file is not
> accepted by Big Query, so you'll need to modify them - see example below
> 4. you'll need the schema, or create a JSON that has the schema, and use it
> to paste into the schema window in BQ. To save you the work, I've attached
> one.
>
> A successful file will look like this:
>
> {"SECS": 1, "KM": 0, "WATTS": 0, "CAD": 0, "KPH": 0, "HR": 104, "ALT":
> 24.8, "LAT": -32.001868, "LON": 141.13594, "SLOPE": 0 }
> {"SECS": 2, "KM": 0, "WATTS": 0, "CAD": 0, "KPH": 0, "HR": 104, "ALT":
> 24.6, "LAT": -33.998934, "LON": 141.13234, "SLOPE": 0 }

Another option if you're going to only use the samples in the ride is to use
the REST API. That way you don't need to do any hand editing. Something like

curl 'localhost:12021/Karl%20Billeter/activity/2016_06_22_08_32_18.json?format=csv' > 20160622T083218.csv

should work.


Karl

Mark Liversedge

Jun 22, 2016, 3:18:59 AM
to golden-cheetah-users, kbil...@gmail.com
I agree. I raised the issue of a BOM to highlight this; I should have used the word "except" :)
Stripping it is a lot easier than post-processing strings to un-escape/escape unicode characters!

Mark 

Steve Mansfield

Jun 30, 2016, 6:46:42 AM
to golden-cheetah-users, quit...@gmail.com, kbil...@gmail.com
Thanks, I've gotten that going

The only thing is that the CSV has many fields I don't need, all with null values:

secs, cad, hr, km, kph, nm, watts, alt, lon, lat, headwind, slope, temp, interval, lrbalance, lte, rte, lps, rps, smo2, thb, o2hb, hhb

so I would need to write a processor script or pipeline to remove the extra columns.
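
A sketch of such a processor, assuming the unused columns come through blank or as zeros (the `empty` tuple is an assumption; adjust it to match what the export actually writes):

```python
import csv

# Rewrite a CSV without any column whose every value looks "empty".
def drop_empty_columns(src, dst, empty=("", "0")):
    with open(src, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return
    # keep a column if at least one row has a real value in it
    keep = [c for c in rows[0] if any(r[c] not in empty for r in rows)]
    with open(dst, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=keep, extrasaction="ignore")
        w.writeheader()
        w.writerows({c: r[c] for c in keep} for r in rows)
```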

