Hey there,
I’m nearly back from this month’s conference circus and would like to give you a small update. (It turned out to be not that small, so there’s a tl;dr at the end.)
# PCA
Last weekend I was in Zurich for the research data hackdays. The group I led decided to try a PCA to cluster the data sets that are available on openSNP and to see where people originally come from. This is how far we got:
http://make.opendata.ch/wiki/project:opensnp#to-do
As you can see we didn’t get too far, but we managed to reformat all the 23andMe data sets, clean them up a bit and run the PCA on the openSNP data. It’s not super informative yet, but right now I’m trying to add the 1000 Genomes data into the mix. These samples have known ancestry and could work as reference populations to compare the openSNP data sets against.
If that works out better: do you think we should somehow integrate this into the website? It would offer a sweet visualization, and you could also use it to a) color the data sets by phenotype to see whether some phenotypes cluster according to ancestry and b) link to users. So on individual user pages you could show the PCA neighborhood to find closely related data sets.
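To make the PCA step a bit more concrete, here is a minimal sketch in Python with NumPy. It assumes the genotypes have already been encoded as a samples × SNPs matrix of minor-allele counts (0, 1 or 2). The function name `pca_project`, the encoding and the toy matrix are illustrative assumptions, not the hackdays code.

```python
import numpy as np

def pca_project(genotypes, n_components=2):
    """Project samples onto their top principal components.

    genotypes: 2-D array, rows = data sets, columns = SNPs,
    values = minor-allele counts (0, 1 or 2). Assumed encoding.
    """
    X = np.asarray(genotypes, dtype=float)
    # Center each SNP column so the components describe variation, not the mean.
    X = X - X.mean(axis=0)
    # SVD of the centered matrix: rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Coordinates of each sample on the first n_components axes.
    coords = U[:, :n_components] * S[:n_components]
    # Fraction of the total variance captured by each kept component.
    total = (S ** 2).sum()
    if total > 0:
        explained = (S[:n_components] ** 2) / total
    else:
        explained = np.zeros(n_components)
    return coords, explained

# Toy data (made up): two groups of three samples that differ
# mostly at the first three SNPs.
toy = np.array([
    [2, 2, 2, 0, 1, 0],
    [2, 2, 1, 0, 1, 1],
    [2, 1, 2, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
])
coords, explained = pca_project(toy)
```

In this toy example the two groups end up on opposite sides of the first component. For the website, `coords` would be what gets plotted and colored by phenotype, and the nearest rows in `coords` would be a user's "PCA neighborhood". With the full openSNP matrix you would probably also need to handle missing genotypes first, which this sketch ignores.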
# Studies w/ openSNP data
While in Zurich I also talked to Ulrich (with whom we’re planning the anosmia study) and to Effy (with whom we’re doing the survey amongst openSNP users). There’s little progress on the former study (largely because the ethics stuff still isn’t sorted out, but I’m still optimistic that we’ll get there), but there’s more on the latter. Effy and I spent a morning going over the latest draft of the survey and it should be final pretty soon. But we think it might be best not to send out the survey emails over the summer, as people are likely on vacation and might miss them.
# Global Alliance: Lighting a Beacon
From Monday until today I was in Leiden at the Plenary Meeting of the “Global Alliance for Genomics & Health”. They have this little API thingy called “Beacons”, which is basically just a proof of concept. The idea was to create the simplest possible genomic API, so all you can ask is whether a database has a data set that contains a given allele at a given position.
So you can ask “does openSNP have a data set with an A at chromosome 3, position 565343?” and the answer is YES/NO/NONE (in case things went wrong somewhere). I think it would be fun if openSNP offered this, because it’s easily done and shows our support for the GA. I’ve already implemented it in a new branch:
https://github.com/gedankenstuecke/snpr/pull/177 Would be nice to get comments from you on it.
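To illustrate the query semantics, here is a minimal Python sketch of such a lookup against an in-memory index. The names `build_index` and `beacon_query`, the index layout and the toy genotypes are hypothetical. Nothing here is taken from the pull request or from the official Beacon specification. It only mirrors the YES/NO/NONE answers described above.

```python
VALID_ALLELES = {"A", "C", "G", "T"}

def build_index(datasets):
    """Collapse per-user genotypes into one set of alleles per position.

    datasets: iterable of dicts mapping (chromosome, position) -> genotype
    string such as "AG" (hypothetical layout).
    """
    index = {}
    for genotypes in datasets:
        for (chromosome, position), genotype in genotypes.items():
            key = (str(chromosome).upper(), int(position))
            # Keep only real bases; no-calls like "--" are skipped.
            alleles = {a for a in genotype.upper() if a in VALID_ALLELES}
            index.setdefault(key, set()).update(alleles)
    return index

def beacon_query(index, chromosome, position, allele):
    """Return "YES" or "NO", or "NONE" if the query cannot be answered."""
    try:
        chromosome = str(chromosome).upper()
        position = int(position)
        allele = str(allele).upper()
    except (TypeError, ValueError):
        return "NONE"
    if position < 1 or allele not in VALID_ALLELES:
        return "NONE"
    seen = index.get((chromosome, position), set())
    return "YES" if allele in seen else "NO"

# Toy data (made up): two data sets sharing one position.
toy_datasets = [
    {("3", 565343): "AG", ("1", 1000): "CC"},
    {("3", 565343): "GG"},
]
index = build_index(toy_datasets)
print(beacon_query(index, 3, 565343, "A"))
```

The point of the design is that the answer never reveals which data set carries the allele, only whether at least one does.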
# TL;DR
- We did a PCA in Zurich, shall we include this somehow in openSNP?
- All current studies are still work in progress
- I did a reference implementation of the GA Beacon API, shall we put this live?
Cheers,
Bastian