Jerven Bolleman
Apr 27, 2014, 6:40:40 PM
to Chris Mungall, Michel Dumontier, faldo, Joachim Baran, Toshiaki Katayama, Robert Buels, Robert Hoehndorf, Raoul Bonnal, Takatomo Fujisawa, Peter Cock, Francesco Strozzi
Hi Chris,
On 27 Apr 2014, at 20:31, Chris Mungall <cjmu...@lbl.gov> wrote:
> At the Network of Biothings hackathon yesterday I started a translation from ClinVar to triples using FALDO to represent the position of the variant.
>
> This is not unusual in that the source data here stores position as tuples
>
> (GenomeBuild, Chromosome, Begin, End)
>
> If FALDO is to take off, others will be producing triples from similar tuples.
>
> We are a little bit silent on how to generate the value of the reference predicate. The manuscript states "FALDO makes very few assumptions about the representation of the reference sequence". I think avoiding overspecification and allowing flexibility is good, but we should give more guidance here.
>
> For example, is it up to my infrastructure to perform a lookup on (GenomeBuild, Chromosome) -> ReferenceURI?
>
> For example (GRCh37, Chr8) --> http://www.ebi.ac.uk/genomes/CM000670.html
I think we need a "Cool URIs for biological records" document. But yes, at the moment reference URIs are not cool, which means you do need to do lookups :(
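
For illustration, the lookup and the reverse lookup discussed here could be sketched in Python as below. This is only a minimal sketch: the table and function names are hypothetical, and the single entry is the GRCh37/Chr8 example from this thread. A real table would have to be curated per database.

```python
# Hypothetical lookup table; only the GRCh37/Chr8 entry comes from this thread.
REFERENCE_URIS = {
    ("GRCh37", "Chr8"): "http://www.ebi.ac.uk/genomes/CM000670.html",
}

# Reverse index so a consumer can recover build and chromosome from the URI.
BUILD_AND_CHROMOSOME = {uri: key for key, uri in REFERENCE_URIS.items()}


def reference_uri(build, chromosome):
    """Return the reference URI for (build, chromosome), or raise KeyError."""
    return REFERENCE_URIS[(build, chromosome)]


def build_and_chromosome(uri):
    """Return the (build, chromosome) pair for a known reference URI."""
    return BUILD_AND_CHROMOSOME[uri]
```

Both directions raise KeyError for unknown inputs, so a missing mapping fails loudly instead of producing triples with a wrong reference.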
>
> Let's say this is not a pain to do. Is it then up to the consumer of my triples to do some kind of reverse lookup when they want to expose the build and chromosome in their system?
>
> Is it considered good practice for me to also produce triples such as:
>
> <http://www.ebi.ac.uk/genomes/CM000670> hasChromosome :Chr8 .
> <http://www.ebi.ac.uk/genomes/CM000670> hasBuild :GRCh37 .
I think that is biologist talk ;) not semantics.
I would go more for something like this.
> <http://www.ebi.ac.uk/genomes/CM000670> representsChromosome :Chr8 .
> <http://www.ebi.ac.uk/genomes/CM000670> assembly build:GRCh37 .
In the end it's down to a practical decision: we can't model databases we don't control.
And before you know it, FALDO ends up doing Ensembl/ENA/DDBJ and RefSeq in RDF.
Secondly, the reference sequence is one of the 6 contig sequences, so it would be
<http://www.ebi.ac.uk/ena/data/view/GL000062> a ena:Contig .
<http://www.ebi.ac.uk/ena/data/view/GL000062> partOf <http://www.ebi.ac.uk/genomes/CM000670> .
_:1 a faldo:ExactPosition ;
    faldo:position 1 ;
    faldo:reference <http://www.ebi.ac.uk/ena/data/view/GL000062> .
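
As a rough sketch of how a producer might turn a (reference, begin, end) tuple into triples of this shape, the Python below emits a faldo:Region whose begin and end are faldo:ExactPosition nodes. The function name is hypothetical, and the namespace is the one written in the JSON-LD context later in this message, so treat it as an assumption and check it against the published ontology.

```python
# Namespace as written in the JSON-LD context later in this message (assumption).
FALDO = "http://biohackathon.org/faldo#"


def region_to_turtle(reference_uri, begin, end):
    """Serialise one (reference, begin, end) tuple as a FALDO region in Turtle."""
    lines = [
        f"@prefix faldo: <{FALDO}> .",
        "",
        "[] a faldo:Region ;",
        "    faldo:begin [ a faldo:ExactPosition ;",
        f"        faldo:position {int(begin)} ;",
        f"        faldo:reference <{reference_uri}> ] ;",
        "    faldo:end [ a faldo:ExactPosition ;",
        f"        faldo:position {int(end)} ;",
        f"        faldo:reference <{reference_uri}> ] .",
    ]
    return "\n".join(lines)


print(region_to_turtle("http://www.ebi.ac.uk/ena/data/view/GL000062", 1, 2))
```

Blank nodes are used for the region and both positions so that no extra URIs need to be minted; a producer who wants addressable regions would substitute its own URIs.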
>
> This way consumers can easily get at what they want. But FALDO is silent on this.
It's because this is already covered in basic URI design… We can put something on the wiki.
FALDO is limited in scope (maybe too limited).
>
> It is tempting to produce the data as JSON with a simple object for producing the location:
>
> {
> "build" : …,
> "chromosome" : …,
> "begin" : …,
> "end" : ..
> }
It's late here and I can’t think of anything better than this for now.
{
  "@context": {
    "faldo" : "http://biohackathon.org/faldo#",
    "build" : {"@id":"faldo:reference", "@type":"@id"},
    "begin" : {"@id":"faldo:begin"},
    "end" : {"@id":"faldo:end"},
    "pos" : {"@id":"faldo:position"}
  },
  "@type" : "faldo:Region",
  "build" : "GRCh37",
  "chromosome" : "Chr09",
  "begin" : {"pos" : 1},
  "end" : {"pos" : 2}
}
You can play more with it here.
http://json-ld.org/playground/#startTab=tab-nquads&json-ld=%7B%22%40context%22%3A%7B%22faldo%22%3A%22http%3A%2F%2Fbiohackathon.org%2Ffaldo%23%22%2C%22build%22%3A%7B%22%40id%22%3A%22faldo%3Areference%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22begin%22%3A%7B%22%40id%22%3A%22faldo%3Abegin%22%7D%2C%22end%22%3A%7B%22%40id%22%3A%22faldo%3Aend%22%7D%2C%22pos%22%3A%7B%22%40id%22%3A%22faldo%3Aposition%22%7D%7D%2C%22%40type%22%3A%22faldo%3ARegion%22%2C%22build%22%3A%22GRCh37%22%2C%22chromosome%22%3A%22Chr09%22%2C%22begin%22%3A%7B%22pos%22%3A1%7D%2C%22end%22%3A%7B%22pos%22%3A2%7D%7D
>
> And include a JSON-LD context for mapping this to an RDF model that is not FALDO but has a defined translation to FALDO. This avoids creating all the additional URIs, and makes it easier for consumers of the data to get what they need.
I agree, we avoid all the references, and we have a section about that for DDBJ, which applies here as well.
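
To make the "simple object plus context" idea concrete, here is a small Python sketch that wraps a flat location record in the context from the example above. The function name and the flat input shape are assumptions for illustration. Note that "chromosome" has no mapping in that context, so a JSON-LD processor would be expected to drop it on expansion unless a term for it is added.

```python
import json

# Context copied from the message above; "chromosome" has no mapping there.
FALDO_CONTEXT = {
    "faldo": "http://biohackathon.org/faldo#",
    "build": {"@id": "faldo:reference", "@type": "@id"},
    "begin": {"@id": "faldo:begin"},
    "end": {"@id": "faldo:end"},
    "pos": {"@id": "faldo:position"},
}


def to_json_ld(location):
    """Wrap a flat {build, chromosome, begin, end} dict as a JSON-LD document."""
    return {
        "@context": FALDO_CONTEXT,
        "@type": "faldo:Region",
        "build": location["build"],
        "chromosome": location["chromosome"],
        "begin": {"pos": location["begin"]},
        "end": {"pos": location["end"]},
    }


doc = to_json_ld({"build": "GRCh37", "chromosome": "Chr09", "begin": 1, "end": 2})
print(json.dumps(doc, indent=2))
```

Producers keep emitting the flat record they already have, and the context carries the mapping to FALDO terms, so consumers who only want build and coordinates can ignore the RDF layer entirely.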