GoldenCheetah to Google BigQuery?


Steve Mansfield

Jun 15, 2016, 10:29:47 PM
to golden-cheetah-users
G'day

With the recent release of a tool from OWOX, you can now query data in Google Cloud Storage via BigQuery, and pull it into Google Sheets.

Works well for the test data sets.

Wondering if there is a good, i.e. straightforward, way to get GC data into BigQuery tables?

TIA

Steve

Mark Liversedge

Jun 16, 2016, 2:25:25 AM
to golden-cheetah-users
If you have paid for your GCP then you should be able to just load in activities; as they are only JSON, it should be a snap.

We have a GoldenCheetah GCP to host cloud charts and will shortly be using it to create an Open Data Store of downsampled ride data for researchers and others to use. But we are not likely to provide a generic query service on top of the big tables since we do not have funding for that and it could become very costly, very quickly.

Cheers  

Mark

Steve Mansfield

Jun 17, 2016, 7:53:16 AM
to golden-cheetah-users
Yes you're right, I can do that.

I had a go at uploading a GC .json file as a new table, but the file gets rejected. Do you know which bits I need to remove? I created a schema based on the ordering of the fields on the JSON file.

The error I get is

Errors:
file-00000000: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. (error code: invalid)
JSON parsing error in row starting at position 0 at file: file-00000000. Parser terminated before end of string (error code: invalid)

which I get even if I remove all the non-data bits below

Here's the first few lines of a sample file:

{
"RIDE":{
"STARTTIME":"2016\/06\/17 06:28:11 UTC ",
"RECINTSECS":1,
"DEVICETYPE":"Garmin FR920XT ",
"IDENTIFIER":" ",
"TAGS":{
"Athlete":"PowerTap ",
"Calendar Text":" ",
"Data":"TDSPHC-AGL----- ",
"Device":"Garmin FR920XT ",
"File Format":"FIT (*.fit) ",
"Filename":"2016_06_17_16_28_11.json ",
"Month":"June ",
"Source Filename":"2016-06-17-16-28-11 - 920xt_2016_06_17_16_28_11.FIT ",
"Sport":"Bike ",
"Weekday":"Fri ",
"Year":"2016 "
},
"INTERVALS":[
{ "NAME":"Lap 1 ", "START": 0, "STOP": 1441 },
{ "NAME":"Lap 2 ", "START": 1442, "STOP": 2631 },
{ "NAME":"Lap 3 ", "START": 2632, "STOP": 3873 },
{ "NAME":"Lap 4 ", "START": 3874, "STOP": 4009 }
],
"SAMPLES":[


Steve

Mark Liversedge

Jun 17, 2016, 8:15:38 AM
to golden-cheetah-users
On Friday, 17 June 2016 12:53:16 UTC+1, Steve Mansfield wrote:
I had a go at uploading a GC .json file as a new table, but the file gets rejected. Do you know which bits I need to remove? I created a schema based on the ordering of the fields on the JSON file.

The error I get is

Errors:
file-00000000: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. (error code: invalid)
JSON parsing error in row starting at position 0 at file: file-00000000. Parser terminated before end of string (error code: invalid)

It's probably the BOM at the beginning of the file (text strings in the metadata need this for i18n).
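
If it is the BOM, it is easy to check for and strip before uploading. A minimal sketch (the filename is a placeholder):

```python
# Strip a leading UTF-8 byte order mark (BOM) from a file, if present,
# so that a parser which does not expect one will accept the file.
BOM = b"\xef\xbb\xbf"

def strip_bom(path):
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(BOM):
        # Rewrite the file without the three BOM bytes
        with open(path, "wb") as f:
            f.write(data[len(BOM):])
```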

Cheers and good luck !!

Mark

Steve Mansfield

Jun 21, 2016, 6:11:34 AM
to golden-cheetah-users
If anyone else is interested...

1. you need to remove all the stuff at the start of the GC JSON activity file so it starts with the actual ride data
2. then remove all the stuff at the end of the file after the last line of ride data; I think the last line needs to be blank
3. the spacing of fields and text elements in the GC JSON file is not accepted by BigQuery, so you'll need to modify them - see the example below
4. you'll need the schema, or create a JSON file that has the schema, and paste it into the schema window in BQ. To save you the work, I've attached one.

A successful file will look like this:

{"SECS": 1, "KM": 0, "WATTS": 0, "CAD": 0, "KPH": 0, "HR": 104, "ALT": 24.8, "LAT": -32.001868, "LON": 141.13594, "SLOPE": 0 }
{"SECS": 2, "KM": 0, "WATTS": 0, "CAD": 0, "KPH": 0, "HR": 104, "ALT": 24.6, "LAT": -33.998934, "LON": 141.13234, "SLOPE": 0 }

Finally, BQ is very sensitive to file names. I don't know all the rules, but I do know that having a "-" in the filename will cause a parse error.
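
The hand edits in the steps above can also be scripted. A minimal sketch, assuming the RIDE/SAMPLES layout shown earlier in the thread (filenames are placeholders; `utf-8-sig` drops the BOM on read):

```python
import json

# Convert a GoldenCheetah JSON activity file into newline-delimited
# JSON (one sample object per line), which BigQuery will load.
def gc_to_ndjson(src, dst):
    # utf-8-sig silently strips a leading BOM if one is present
    with open(src, encoding="utf-8-sig") as f:
        samples = json.load(f)["RIDE"].get("SAMPLES", [])
    with open(dst, "w", encoding="utf-8") as out:
        for s in samples:
            # json.dumps re-serialises each sample compactly, which
            # also normalises the spacing BQ objected to
            out.write(json.dumps(s) + "\n")
```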

My next challenge is to try to upload data into date partitioned tables so that I can have a whole month in the same table. Should make queries quicker (and cheaper).

PS There's some good fun to be had with Google Cloud, and when you sign up and start you get $300 credit.
RideDataSchema.json.zip

Mark Liversedge

Jun 21, 2016, 8:49:47 AM
to golden-cheetah-users
On Tuesday, 21 June 2016 11:11:34 UTC+1, Steve Mansfield wrote:
[...]

Thanks Steve.

For the record, our JSON files are fully compliant with RFC 7159. Use of white space is valid.

The UTF-8 encoding with a Byte order mark (BOM) breaks parsers that do not expect it, and should be removed, but of course then you will need to re-encode unicode strings into some other encoding too.

Mark
 

Steve Mansfield

Jun 21, 2016, 8:28:11 PM
to golden-cheetah-users
For the record, our JSON files are fully compliant with RFC 7159. Use of white space is valid.

Yes, they certainly look OK, so it's the BigQuery parse engine that has issues :-)

Karl Billeter

Jun 21, 2016, 9:14:19 PM
to golden-cheetah-users
On Tue, Jun 21, 2016 at 05:49:47AM -0700, Mark Liversedge wrote:


> For the record, our JSON files are fully compliant with RFC 7159. Use of
> white space is valid.
>
> The UTF-8 encoding with a Byte order mark (BOM) breaks parsers that do not
> expect it, and should be removed, but of course then you will need to
> re-encode unicode strings into some other encoding too.

Not sure I'd agree here, as RFC 7159 (section 8.1) specifies that encoders
MUST NOT add a BOM (but a parser MAY ignore one). Obviously not my call,
and pretty easy to strip if required in any case :-)


Karl

Karl Billeter

Jun 21, 2016, 9:25:43 PM
to Steve Mansfield, golden-cheetah-users
On Tue, Jun 21, 2016 at 03:11:34AM -0700, Steve Mansfield wrote:

> If anyone else is interested...
>
> 1. you need to remove all the stuff at the start of the GC JSON activity
> file so it starts with the actual ride data
> 2. then remove all the stuff at the end of the JSON file after the last
> line of ride data, I think you need the last line to be blank
> 3. the spacing of fields and text elements in the GC JSON file is not
> accepted by Big Query, so you'll need to modify them - see example below
> 4. you'll need the schema, or create a JSON that has the schema, and use it
> to paste into the schema window in BQ. To save you the work, I've attached
> one.
>
> A successful file will look like this:
>
> {"SECS": 1, "KM": 0, "WATTS": 0, "CAD": 0, "KPH": 0, "HR": 104, "ALT":
> 24.8, "LAT": -32.001868, "LON": 141.13594, "SLOPE": 0 }
> {"SECS": 2, "KM": 0, "WATTS": 0, "CAD": 0, "KPH": 0, "HR": 104, "ALT":
> 24.6, "LAT": -33.998934, "LON": 141.13234, "SLOPE": 0 }

Another option if you're going to only use the samples in the ride is to use
the REST API. That way you don't need to do any hand editing. Something like

curl 'localhost:12021/Karl%20Billeter/activity/2016_06_22_08_32_18.json?format=csv' > 20160622T083218.csv

should work.


Karl

Mark Liversedge

Jun 22, 2016, 3:18:59 AM
to golden-cheetah-users, kbil...@gmail.com
I agree. I raised the issue of a BOM to highlight this; I should have used the word "except" :)
Stripping it is a lot easier than post-processing strings to un-escape/escape unicode characters!

Mark 

Steve Mansfield

Jun 30, 2016, 6:46:42 AM
to golden-cheetah-users, quit...@gmail.com, kbil...@gmail.com
Thanks, I've gotten that going

The only thing is that the CSV has many fields I don't need, all with null values:

secs, cad, hr, km, kph, nm, watts, alt, lon, lat, headwind, slope, temp, interval, lrbalance, lte, rte, lps, rps, smo2, thb, o2hb, hhb

so I would need to write a processor script or pipeline to remove the extra columns.
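
A sketch of such a processor, assuming the unused columns come through blank or as zeros (the `empty` tuple is an assumption; adjust it to match what the export actually writes):

```python
import csv

# Rewrite a CSV without any column whose every value looks "empty".
def drop_empty_columns(src, dst, empty=("", "0")):
    with open(src, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return
    # keep a column if at least one row has a real value in it
    keep = [c for c in rows[0] if any(r[c] not in empty for r in rows)]
    with open(dst, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=keep, extrasaction="ignore")
        w.writeheader()
        w.writerows({c: r[c] for c in keep} for r in rows)
```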

