uBiome raw data files


Samantha Clark

Sep 25, 2014, 7:30:13 PM
to openSNP Dev List
Beta users of uBiome can now download their raw microbiome sequencing data. The current implementation returns a plain JSON response in the browser, formatted like this (whitespace added):

{ "ubiome_bacteriacounts" : [{"taxon":"2","parent":"131567","count":"200967","count_norm":"1000000","avg":null,"tax_name":"Bacteria","tax_rank":"superkingdom","tax_color":null},
{"taxon":"1239","parent":"2","count":"132648","count_norm":"660049","avg":null,"tax_name":"Firmicutes","tax_rank":"phylum","tax_color":"5E6591"},
{"taxon":"186801","parent":"1239","count":"56235","count_norm":"279822","avg":null,"tax_name":"Clostridia","tax_rank":"class","tax_color":null},
{"taxon":"186802","parent":"186801","count":"56226","count_norm":"279777","avg":null,"tax_name":"Clostridiales","tax_rank":"order","tax_color":null},
{"taxon":"909929","parent":"909932","count":"51425","count_norm":"255888","avg":null,"tax_name":"Selenomonadales","tax_rank":"order","tax_color":null},

[...]

Phylogenetic structure is preserved through the "parent": id relationships, and "count_norm" appears to be count / (count of the "tax_rank":"superkingdom" entry), scaled to parts per million and rounded to a whole number, i.e. each taxon's share of the entire result set (1000000 = 100%).
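
A minimal Python sketch of both observations, assuming the export keeps every numeric field as a string (as in the excerpt) and that count_norm is count scaled against the superkingdom row. It recomputes count_norm for the five quoted rows (trimmed to the fields used here) and groups taxa under their parent ids. The function names are illustrative only, not part of any uBiome API.

```python
import json
from collections import defaultdict

# The five rows quoted above, trimmed to the fields this sketch uses.
RAW = """{"ubiome_bacteriacounts": [
  {"taxon": "2", "parent": "131567", "count": "200967", "count_norm": "1000000", "tax_name": "Bacteria", "tax_rank": "superkingdom"},
  {"taxon": "1239", "parent": "2", "count": "132648", "count_norm": "660049", "tax_name": "Firmicutes", "tax_rank": "phylum"},
  {"taxon": "186801", "parent": "1239", "count": "56235", "count_norm": "279822", "tax_name": "Clostridia", "tax_rank": "class"},
  {"taxon": "186802", "parent": "186801", "count": "56226", "count_norm": "279777", "tax_name": "Clostridiales", "tax_rank": "order"},
  {"taxon": "909929", "parent": "909932", "count": "51425", "count_norm": "255888", "tax_name": "Selenomonadales", "tax_rank": "order"}
]}"""


def load_counts(text):
    """Parse the export and convert the string-typed numbers to int."""
    rows = json.loads(text)["ubiome_bacteriacounts"]
    return [{**r, "count": int(r["count"]), "count_norm": int(r["count_norm"])} for r in rows]


def recompute_count_norm(rows):
    """Assumed formula: count per million of the superkingdom row's count."""
    total = next(r["count"] for r in rows if r["tax_rank"] == "superkingdom")
    return {r["taxon"]: round(r["count"] * 1_000_000 / total) for r in rows}


def children_by_parent(rows):
    """Rebuild the tree edges: parent taxon id -> list of child taxon ids."""
    tree = defaultdict(list)
    for r in rows:
        tree[r["parent"]].append(r["taxon"])
    return tree


rows = load_counts(RAW)
stored = {r["taxon"]: r["count_norm"] for r in rows}
print(recompute_count_norm(rows) == stored)  # -> True
print(dict(children_by_parent(rows)))
```

For these five rows the recomputed values match the stored count_norm exactly, which supports the per-million reading. Note that a parent id can be missing from a partial excerpt (909932 here), so code walking the tree should not assume every parent has its own row.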

"tax_color" seems to be for internal use, so I'm not sure why it's included, unless there is an ISO standard mapping taxonomic nomenclature to hexadecimal colour codes that I'm unaware of. I'm assuming "avg" is where they intend(ed?) to put the average "count_norm" of all users. That may be useful for those wanting to create their own visualizations/population analyses, less so for those who just have a general data-oversharing fetish, cough cough. ;)
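
If "avg" really is meant to be the mean count_norm across users (a guess, not confirmed by uBiome), it could be computed roughly like this. Two assumptions: each user's sample is reduced to a {taxon id: count_norm} dict, and a taxon missing from a sample counts as 0 for that sample. The second user's values below are invented for illustration.

```python
from collections import defaultdict


def average_count_norm(samples):
    """Mean count_norm per taxon across several users' samples.

    `samples` is a list of {taxon_id: count_norm} dicts, one per user.
    Assumption: a taxon absent from a sample contributes 0 for that
    sample, so rare taxa are not inflated by averaging only where present.
    """
    if not samples:
        return {}
    totals = defaultdict(int)
    for sample in samples:
        for taxon, norm in sample.items():
            totals[taxon] += norm
    return {taxon: total / len(samples) for taxon, total in totals.items()}


# First dict uses values from the excerpt above; the second is made up.
users = [
    {"2": 1000000, "1239": 660049},
    {"2": 1000000, "1239": 500000, "186801": 100000},
]
print(average_count_norm(users))
# -> {'2': 1000000.0, '1239': 580024.5, '186801': 50000.0}
```

Treating absent taxa as zero is a design choice; averaging only over samples where a taxon was detected would give a different (higher) figure for rare taxa.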

Noticeably absent is anything labelling the sample source. Predefined options for this are "gut", "mouth", "nose", "skin", and "genitals", but they also send out a "spare" swab with kits, which you can use on anything (I somehow resisted the urge to use mine for "anus of local Felis catus").

While I know you are all (very) busy, I just want to put this out there as a potential additional raw data type which could be uploaded and attached to an openSNP user profile. There was a positive reception on this list when the uBiome project was first announced, so I thought I'd follow up on that now that the data are accessible (to some). Alternatively, you could just wait until they provide an API, but I expect that to be in the far distant future as they seem to be developing slowly at this time.

I have attached my full "gut" data as a more comprehensive sample for anyone interested.

Cheers!

Samantha
gut

Philipp Bayer

Sep 25, 2014, 8:19:45 PM
to snpr-dev...@googlegroups.com
Hi Samantha,

thank you very much for the data! That looks very cool.

I guess we'd have to introduce a group of models: the different tax ranks and their names, the different species, and the user-associated counts. That's a bit of work but doable, and then we can do user-wide counts! We could even link SNPs to specific outliers; I'm pretty sure there should be some connection, even though you might just pick up ancestry SNPs via ancestral diets (e.g., a milk-rich diet's microbiome would get linked to European ancestral SNPs). Nice Lactobacillus casei btw, does that come in bags too?
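
A rough sketch of that split, in plain Python rather than as openSNP's actual models; every class and function name below is hypothetical. It folds ranks, names and species into one shared Taxon record (species being just another rank), stores counts per user, and illustrates one possible reading of "user-wide counts": the number of users in whom each taxon was detected.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    """One node of the shared taxonomy (any rank, species included)."""
    taxon_id: str
    parent_id: str
    name: str
    rank: str


@dataclass(frozen=True)
class UserTaxonCount:
    """One user's count for one taxon."""
    user_id: int
    taxon_id: str
    count: int
    count_norm: int


def import_export(user_id, rows, taxa):
    """Store each taxon once in `taxa` and return this user's counts.

    `rows` are the dicts from the "ubiome_bacteriacounts" array;
    `taxa` is a dict shared across all imports (taxon_id -> Taxon).
    """
    counts = []
    for r in rows:
        taxa.setdefault(r["taxon"], Taxon(r["taxon"], r["parent"], r["tax_name"], r["tax_rank"]))
        counts.append(UserTaxonCount(user_id, r["taxon"], int(r["count"]), int(r["count_norm"])))
    return counts


def users_per_taxon(all_counts):
    """User-wide view: how many distinct users each taxon was found in."""
    pairs = {(c.user_id, c.taxon_id) for c in all_counts}
    return Counter(taxon_id for _, taxon_id in pairs)
```

Keeping taxa in one shared table means a second user's upload only adds count rows for taxa already seen, which is what makes cross-user queries like the one above cheap.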

I'm pretty sure tax_color is just an internal thing; it could be that the display mechanism queries the same data and they haven't bothered to remove these internal extras yet. I assume 'count' is the actual raw cell count and not some other kind of variable?

It looks like GitHub could even be a good place for this data, for now.

P.S.: This came out today, http://opensource.com/life/14/9/genotype-tests-open-snp

P.P.S.: You should use that spare kit on your keyboard and be disgusted.


--
You received this message because you are subscribed to the Google Groups "SNPr development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snpr-developme...@googlegroups.com.
To post to this group, send email to snpr-dev...@googlegroups.com.
Visit this group at http://groups.google.com/group/snpr-development.
For more options, visit https://groups.google.com/d/optout.

Bastian Greshake

Sep 26, 2014, 3:36:39 AM
to snpr-dev...@googlegroups.com
Hi,
sorry to be that guy, but I just want to make sure the terminology doesn’t get watered down: What uBiome offers for now and what we see here is not the raw sequencing data but rather the results of the taxonomic assignment for the sequencing reads. So it’s not really raw data, because we have to take their taxonomic assignment at face value.

You can't download the actual 16S sequencing reads for now, or at least not directly. What you can do is contact them via email and nicely ask them for the actual sequences; in that case they might send you the FASTQ files. At least that worked for me: https://github.com/gedankenstuecke/microbiome
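
For anyone who does get FASTQ files: each read is normally stored as four lines, an "@" header with the read id, the sequence, a "+" separator, and a quality string of the same length. The sketch below assumes that four-line layout (no wrapped sequences) and Phred+33 quality encoding, which is common for current Illumina output but worth checking against whatever files uBiome actually sends. The two reads in SAMPLE are made up.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ text stream.

    Assumes the common four-line layout (header, sequence, '+', quality)
    with no line-wrapped sequences.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].rstrip("\r\n"), seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred quality of one read, assuming Phred+33 ASCII encoding."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


# Two made-up reads, only to exercise the parser.
SAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n#5\n"
for read_id, seq, qual in read_fastq(io.StringIO(SAMPLE)):
    print(read_id, seq, mean_phred(qual))
```

Unlike the JSON counts, these reads would let you redo the taxonomic assignment yourself instead of taking uBiome's at face value.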

But I agree, having the uBiome bacteria counts added to openSNP would be nice. :-)

cheers,
Bastian
