--
You received this message because you are subscribed to the Google Groups "bio2rdf" group.
To post to this group, send email to bio...@googlegroups.com.
To unsubscribe from this group, send email to bio2rdf+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/bio2rdf?hl=en.
Sorry for the delay in answering; I was on vacation without any
Internet access for the last two weeks.
For all endpoints except OMIM and HGNC, the update process is manual.
That means I have to fetch the new data myself, convert it to RDF
or N-Quads, and load it into the corresponding Virtuoso server. Once that
is done, I send a copy of the updated virtuoso.db file to the various
Bio2RDF mirrors.
For OMIM and HGNC, I use a pipeline script that does the whole update
process in one shell command. However, I still need to prepare the
virtuoso.db file manually to update the mirrors.
I also have an endpoint where I publish information about releases:
http://release.bio2rdf.org/sparql . You can see which releases have
information with this query:

select distinct ?release
where { ?release a <http://bio2rdf.org/release_resource:release> }

Only a few releases are listed there, so you can assume that any
endpoint not listed has not been updated in a while.
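As a sketch of how one might run that query programmatically, assuming the endpoint is a standard Virtuoso SPARQL endpoint that accepts the query via an HTTP GET `query` parameter and a `format` parameter for the result serialization (the parameter names and the JSON result shape are my assumptions, not something confirmed in this thread):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://release.bio2rdf.org/sparql"

QUERY = """
SELECT DISTINCT ?release
WHERE { ?release a <http://bio2rdf.org/release_resource:release> }
"""

def build_request_url(endpoint, query):
    # Virtuoso endpoints typically accept a GET request with the query
    # and a format parameter selecting the result serialization.
    params = urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return endpoint + "?" + params

def list_releases(endpoint=ENDPOINT):
    # Network call: returns the release URIs reported by the endpoint,
    # assuming the standard SPARQL JSON results layout.
    with urlopen(build_request_url(endpoint, QUERY)) as resp:
        results = json.load(resp)
    return [b["release"]["value"] for b in results["results"]["bindings"]]
```

Comparing the list this returns against the full set of Bio2RDF endpoints would give a quick picture of which datasets have documented releases.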
Bye !!
Marc-Alexandre
2011/8/15 Joerg Kurt Wegner <joergku...@gmail.com>:
They *do* actually all have a version and a date encoded using Dublin
Core, although the version may simply be the date if the dataset
doesn't have a numbered versioning scheme. Do you have an alternative
strategy for encoding this information?
> Just to put it in perspective, at this point I consider Bio2RDF
> non-maintainable and not integrate-able in an organizational context.
> Unless we get a clearer process on this I will hold this position and
> cannot recommend it as a data source for use.
> At this point it looks easier to integrate data right from the start
> without making any use of Bio2RDF, but I might be too naive on this? ;-)
There are a number of challenges related to integrating data that you
would want to think about before embarking on a solo/new effort.
The first one would be that if you want to provide third-party Linked
Data, where the data is published by someone else but converted to RDF
by you, (as opposed to just a SPARQL endpoint, or some other RDF
resource), you need to have a clear strategy for resolving the HTTP
requests to either your infrastructure, or some federated
infrastructure. This is necessary even before you think about how to
easily maintain the resulting datasets and maintain uptime on the
Linked Data endpoints. In Bio2RDF we have had a single host for this
from the very beginning, although we have branched from a single
physical server into multiple physical servers at different locations.
The important thing was that the URI design from the very beginning
did not hamper us in this goal. In addition, we were lucky in the URI
design that we are now able to offer different REST services at
http://bio2rdf.org/* without interfering with the Linked Data
resolution. For example, you can perform searches using
http://bio2rdf.org/searchns/hgnc/abcc4 without interfering with the
hgnc namespace:identifier pattern.
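To make the two URI shapes concrete, here is a small sketch inferred from the examples above; the exact path layout of the `searchns` service beyond the one example URL is my assumption:

```python
def entity_uri(namespace, identifier):
    # Linked Data pattern described in the text:
    # http://bio2rdf.org/namespace:identifier
    return "http://bio2rdf.org/%s:%s" % (namespace, identifier)

def searchns_uri(namespace, term):
    # REST search service pattern, per the example in the text:
    # http://bio2rdf.org/searchns/hgnc/abcc4
    return "http://bio2rdf.org/searchns/%s/%s" % (namespace, term)
```

Because the search service lives under its own path segment (`searchns/...`) while entity URIs carry a colon in their single path segment (`hgnc:ABCC4`), the two patterns can never collide, which is the non-interference property described above.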
When you get past that challenge, you need to decide what the limit to
the number of providers you are going to support is. For example, do
your clients have the money necessary to maintain an entire mirror of
NCBI in RDF form? Have you talked to the UniProt RDF team about their
experiences? They embarked on the same endeavour at about the same
time Bio2RDF started.
We would be more than happy for you to develop maintainable RDFisers
for us, as we (Marc and I) have so far been driven by our personal
PhDs, which only require prototypes to back up our assertions. In
doing so, you could avoid both of the challenges I just posed and
start doing the actual RDFisation straight away.
Peter
Mark
Thank you, Jose; very helpful. I do have a related question about OMIM
data in Bio2RDF, although I am not 100% sure if this is the appropriate
forum to ask.

1.) As part of the OMIM download, one can get access to the Morbid Map,
which basically provides the gene-phenotype relationships
(ftp://grcf.jhmi.edu/OMIM/morbidmap). For example, the text below shows
the morbid map entries for Type 2 Diabetes:

Diabetes mellitus, type 2, 125853 (3)|PAX4, MODY9, KPD|167413|7q32.1
Diabetes mellitus, type II, 125853 (3)|AKT2|164731|19q13.2

This same information is also available via the OMIM web browser under
the "phenotype-gene relationships" table: http://omim.org/entry/125853

However, when I browse the RDF for OMIM ID 125853
(http://bio2rdf.org/omim:125853), I cannot find this data. Am I doing
something incorrect?

2.) OMIM also has links to SNOMED and ICD codes for the phenotypes
(please see the attached screenshot, right-hand corner). As an example,
Type 2 Diabetes (OMIM ID 125853) is mapped to SNOMED CT code 44054006. I
am not able to find this information either.

Could someone please advise?

Thanks, Jyoti
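For reference, each morbidmap line quoted above is pipe-delimited with four fields: the disorder (with the phenotype MIM number and mapping key in parentheses), the gene symbols, the gene MIM number, and the cytogenetic location. A minimal parsing sketch (the field names are my own labels, not an official OMIM schema):

```python
def parse_morbidmap_line(line):
    # Split the four pipe-delimited fields of an OMIM morbidmap record.
    # The disorder field may itself contain commas, so we split on "|" only.
    disorder, symbols, gene_mim, location = line.rstrip("\n").split("|")
    return {
        "disorder": disorder,
        "gene_symbols": [s.strip() for s in symbols.split(",")],
        "gene_mim": gene_mim,
        "location": location,
    }

line = "Diabetes mellitus, type 2, 125853 (3)|PAX4, MODY9, KPD|167413|7q32.1"
record = parse_morbidmap_line(line)
# record["gene_symbols"] is ["PAX4", "MODY9", "KPD"]
```

Such a parser recovers exactly the gene-phenotype relationships the question is about, which is useful for checking whether the corresponding triples exist in the Bio2RDF OMIM endpoint.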