about DataSource and Miriam
From: Igor Rodchenkov <rod...@gmail.com>
Date: Thu, 25 Mar 2010 10:51:23 -0400
Local: Thurs, Mar 25 2010 10:51 am
Subject: about DataSource and Miriam

Hi folks!

First of all (recent emotions), the more I get into the BridgeDb code, the
more I like it. Excellent job; I'm especially impressed by DataSource! Thank
you.

Here, I would like to suggest something about Miriam in BridgeDb (I'm not a
member of the Miriam team, by the way, but I have built on it quite a lot). As
you know, org.bridgedb.bio.BioDataSource has an init() method that
reads datasource attributes from a text file
(org/bridgedb/bio/datasources.txt). In the same class, there are regexp
patterns hard-coded, e.g.:

DataSourcePatterns.registerPattern(
    BioDataSource.UNIPROT,
    Pattern.compile(
        "([A-N,R-][0-9][A-Z][A-Z,0-9][A-Z,0-9][0-9])|([O,P,Q][0-9][A-Z,0-9][A-Z,0-9][A-Z,0-9][0-9])"
    )
);
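
For illustration only, here is a minimal sketch of what registering and applying such a pattern amounts to. It does not use the BridgeDb classes: PatternRegistrySketch and its methods are hypothetical names, a plain map stands in for DataSourcePatterns, and the regexp is the one quoted in this mail, taken as-is with the stray space from the mail line-wrap removed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class PatternRegistrySketch {
    // Hypothetical stand-in for a pattern registry: system name -> ID pattern.
    private static final Map<String, Pattern> PATTERNS = new HashMap<>();

    public static void registerPattern(String system, Pattern pattern) {
        PATTERNS.put(system, pattern);
    }

    // True only if a pattern is registered for the system
    // and the whole identifier matches it.
    public static boolean looksValid(String system, String id) {
        Pattern p = PATTERNS.get(system);
        return p != null && p.matcher(id).matches();
    }

    static {
        // Pattern copied from the mail; not verified against UniProt itself.
        registerPattern("UNIPROT", Pattern.compile(
            "([A-N,R-][0-9][A-Z][A-Z,0-9][A-Z,0-9][0-9])"
            + "|([O,P,Q][0-9][A-Z,0-9][A-Z,0-9][A-Z,0-9][0-9])"));
    }

    public static void main(String[] args) {
        System.out.println(looksValid("UNIPROT", "P12345")); // true
        System.out.println(looksValid("UNIPROT", "12345"));  // false
    }
}
```

Every change to an ID format then means editing and recompiling a class like this one, which is the maintenance issue raised here.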

These are potential problems (not least when the patterns need updating). An
alternative? Well, as we know, Miriam is good at being a standard: it has an
XML schema and data export to XML, is a very small "database", and has
standard DB names and synonyms, *ID patterns*, etc. There is a Java library
(MiriamLink 1.1.1, though it relies heavily on a web service) capable of
returning the list of data sources and resources, converting db:id pairs to
data URNs and URLs, etc. So I encourage BridgeDb to use that independently
supported and quite easy stuff instead of (or in addition to) the internal
text file and Pattern.compile(
"([A-N,R-][0-9][A-Z][A-Z,0-9][A-Z,0-9][0-9])|([O,P,Q][0-9][A-Z,0-9][A-Z,0-9][A-Z,0-9][0-9])")
(although there are data sources in datasources.txt that are absent from
Miriam; *this is a good point to ask them to add those*!). So, the init()
method could get all the required datasources with attributes from the latest
Miriam XML automatically! The entire (small!) Miriam can be loaded and
unmarshalled only once and used for all DataSource- and Xref-related things.
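
A rough sketch of the "load once, use everywhere" idea, under loud assumptions: it supposes an XML export in which each <datatype> element carries a pattern attribute and a <name> child, which should be checked against the real Miriam schema before relying on it. It uses the JDK's DOM parser rather than JAXB/xjc just to stay dependency-free, and MiriamPatternLoaderSketch, load and SAMPLE_XML are made-up names with made-up sample data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MiriamPatternLoaderSketch {
    // Made-up sample data in an ASSUMED layout (not the verified Miriam schema).
    public static final String SAMPLE_XML =
        "<miriam>"
        + "<datatype pattern=\"^\\d{4}$\"><name>Example DB</name></datatype>"
        + "<datatype pattern=\"^EX:[A-Z]{3}$\"><name>Other DB</name></datatype>"
        + "</miriam>";

    // Parses the XML once and returns data source name -> compiled ID pattern.
    public static Map<String, Pattern> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("datatype");
            Map<String, Pattern> patterns = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String name = e.getElementsByTagName("name").item(0)
                    .getTextContent().trim();
                patterns.put(name, Pattern.compile(e.getAttribute("pattern")));
            }
            return patterns;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse data source XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = load(SAMPLE_XML);
        System.out.println(patterns.get("Example DB").matcher("1234").matches()); // true
    }
}
```

An init() method could then register each entry of the returned map instead of hard-coding the patterns.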

So far, I've made a new MiriamLink version that I'd love to share. It's
actually one class, a schema, and a pom.xml (which helps xjc generate
sources), with almost no dependencies (when using Java 6). The Miriam team is
willing to help develop and release an official version of this idea:

- browse src here:
http://biopax.hg.sourceforge.net/hgweb/biopax/validator/file/ (go to
"miriam-lib" there), e.g.:

--
http://biopax.hg.sourceforge.net/hgweb/biopax/validator/file/fcf9ddba...
(caution:
the revision number may change)
--
http://biopax.hg.sourceforge.net/hgweb/biopax/validator/file/5a3a1132...

--
http://biopax.hg.sourceforge.net/hgweb/biopax/validator/file/fcf9ddba...

- at a maven2 repository:
http://biopax.sourceforge.net/m2repo/snapshots/org/biopax/miriam-lib/ is the
re-engineered one we're talking about, *compiled from the above source*
(do not be confused: there is also another version of miriam-lib in the
m2repo, http://biopax.sourceforge.net/m2repo/snapshots/uk/ac/ebi/miriam-lib/ -
which is simply the unmodified miriam-lib v1.1.1 release, manually
"mavenized" and deployed there)

PS:
To check out the sources, one should have a Mercurial client ("hg") installed
and then run:
hg clone http://biopax.hg.sourceforge.net:8000/hgroot/biopax/validator biopax-validator
Then *take only the biopax-validator/miriam-lib* directory
(unfortunately, you get all the "validator" modules, but feel free to
ignore/remove the unrelated ones).

Cheers,
--
Igor Rodchenkov
baderlab.org


 
From: Martijn van Iersel <mvanier...@gmail.com>
Date: Thu, 25 Mar 2010 17:28:14 +0100
Local: Thurs, Mar 25 2010 12:28 pm
Subject: Re: [bridgedb] about DataSource and Miriam
Hi Igor

See my answers in between below...

Yes, we're aware of Miriam (in fact, the Xref.getURN() method will
return a Miriam-based URN if available).
The intention has been to rely on Miriam for this type of data as much
as we can, but in practice this hasn't happened everywhere yet, due to
time constraints. But I'm hoping to improve this in the future (and
patches are always welcome, of course).

> There is a Java library (MiriamLink 1.1.1, though it relies heavily on
> a web service) capable of returning the list of data sources and
> resources, converting db:id pairs to data URNs and URLs, etc. So I
> encourage BridgeDb to use that independently supported and quite easy
> stuff instead of (or in addition to) the internal text file
> and Pattern.compile("([A-N,R-][0-9][A-Z][A-Z,0-9][A-Z,0-9][0-9])|([O,P,Q][0-9][A-Z,0-9][A-Z,0-9][A-Z,0-9][0-9])")

Yes, this is a good idea, but I still want to keep datasources.txt
around as a fallback solution (i.e. as a cache), for two reasons.

Reason 1: Web services can be unreliable (as we're seeing now with the
CRONOS WSDL, for example). Since we're dealing with very little data,
hundreds of records at most, it's a piece of cake to maintain such a cache.

Reason 2: There is another problem with Miriam: they don't want to add some
of the DataSources that we consider very important. Examples of
these are Affymetrix probeset IDs and Agilent. Affymetrix and Agilent
don't provide online pages for each possible identifier, so Miriam
doesn't want to consider those. Yet mapping to and from microarray
reporters is one of the primary use-cases for BridgeDb. I've had a long
discussion with the Miriam folks over this, and I think the end result
is that we agreed to disagree. Miriam is for linking to identifiers,
BridgeDb is for mapping between identifiers, so even though there is
overlap there is also some tension there.
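
The fallback idea (try the live Miriam data first, fall back to the bundled file when the service is down) could be sketched roughly as follows. Everything here is hypothetical: FallbackLoaderSketch, loadWithFallback and the two suppliers are placeholders, not BridgeDb or Miriam API.

```java
import java.util.function.Supplier;

public class FallbackLoaderSketch {
    // Returns the primary result if it loads; otherwise the fallback result.
    public static <T> T loadWithFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            T result = primary.get();
            if (result != null) {
                return result;
            }
        } catch (RuntimeException e) {
            // e.g. the web service is unreachable; fall through to the cache.
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        // Simulate an unreachable service: the cached source wins.
        String source = FallbackLoaderSketch.<String>loadWithFallback(
            () -> { throw new IllegalStateException("service down"); },
            () -> "datasources.txt");
        System.out.println(source); // datasources.txt
    }
}
```

Because the fallback is only consulted on failure, data sources that Miriam does not cover would still need to be merged in from the local file separately.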

> (although there are data sources in datasources.txt that are absent
> from Miriam; *this is a good point to ask them to add those*!)

Indeed I've already submitted a few datasources to Miriam myself, but
like I said some are considered out of their scope. That's fine, I would
still like to use Miriam data wherever possible, but it's not an end-all
solution unfortunately.

Ok, thanks for the info, this is very useful. I'll take a look at this
and I'll try to do what I can to improve the cohesion between Miriam and
BridgeDb.

regards,
Martijn


 