Contents of the large_corpus.bel and small_corpus.bel files

Roberto Mosca

Dec 9, 2015, 11:33:42 AM
to openbel-discuss, Julia.K...@thomsonreuters.com
Dear all,

we are currently testing some SPARQL queries on BEL2RDF-translated content. To test the queries we would like to use the small_corpus.bel and large_corpus.bel statement files from the knowledge directory in the OpenBEL Framework.

Is there a description of how these two corpora were created? Where do the statements come from? Do they form a connected network, or are they more a collection of independent facts? What type of biological knowledge do they encode? Do they refer to a single biological process, to several processes, or to one or more canonical pathways?

Any information would really be helpful!

Thanks in advance for your help.

Sincere regards,
Roberto Mosca

Natalie Catlett

Dec 9, 2015, 11:35:18 PM
to openbel...@googlegroups.com, Julia.K...@thomsonreuters.com

Hi Roberto,

As Dexter noted, the large corpus is a subset of the Selventa knowledge base. It is essentially a collection of independent facts – I don’t believe it was selected to represent any specific biological process(es) or signaling pathway(s). The RCR methods paper (http://www.biomedcentral.com/1471-2105/14/340) has some information about the contents of the large corpus since it used this corpus as a knowledge source. In particular, the supplemental data section provides a file listing the “mechanisms” (upstream controllers of multiple RNAs) that could be generated from the large corpus.

Best,

Natalie

--
You received this message because you are subscribed to the Google Groups "openbel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openbel-discu...@googlegroups.com.
To post to this group, send email to openbel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openbel-discuss/daf3705c-b982-48b6-bd6f-29317efc2643%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert...@thomsonreuters.com

Dec 10, 2015, 4:20:08 AM
to openbel...@googlegroups.com, Julia.K...@thomsonreuters.com
Thank you Natalie,

this gives me some useful information about how to use it for testing.

Best,
Roberto

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 
Roberto Mosca, PhD
Software Developer, Systems Biology Informatics

IP&Science
Thomson Reuters

Robert...@thomsonreuters.com

Dec 14, 2015, 11:31:19 AM
to openbel...@googlegroups.com, Julia.K...@thomsonreuters.com
Dear All,

While working on small_corpus.bel I found a typo at line 312.

p(HGNC:HIF1A,pmod(P,H))

should be

p(HGNC:HIF1A,pmod(H,P))

The statement describes hydroxylation of a proline, not phosphorylation of a histidine.
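
For readers less familiar with the notation: in BEL 1.0 the first pmod() argument is the modification type and the second is the one-letter amino acid code, which is why swapping them changes the meaning while staying syntactically valid. A minimal Python sketch that decodes the two arguments might look like this. It is not part of the OpenBEL tooling; the names describe_pmod, PMOD_TYPES and AMINO_ACIDS are hypothetical, and the regular expression only handles the simple pmod(type, residue, position) form.

```python
import re

# BEL 1.0 protein modification type codes.
PMOD_TYPES = {
    "A": "acetylation", "F": "farnesylation", "G": "glycosylation",
    "H": "hydroxylation", "M": "methylation", "P": "phosphorylation",
    "R": "ribosylation", "S": "sumoylation", "U": "ubiquitination",
}

# Standard one-letter amino acid codes.
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "E": "glutamate", "Q": "glutamine", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

# Assumed shape: pmod(<type>[, <residue>[, <position>]])
PMOD_RE = re.compile(
    r"pmod\(\s*([A-Z])\s*(?:,\s*([A-Z])\s*)?(?:,\s*(\d+)\s*)?\)"
)


def describe_pmod(term):
    """Return a plain-English reading of the first pmod() in a BEL term, or None."""
    match = PMOD_RE.search(term)
    if match is None:
        return None
    kind, residue, position = match.groups()
    text = PMOD_TYPES.get(kind, f"unknown modification {kind!r}")
    if residue:
        text += " of " + AMINO_ACIDS.get(residue, f"residue {residue!r}")
    if position:
        text += f" at position {position}"
    return text
```

Calling describe_pmod on the two terms above gives "phosphorylation of histidine" for the original line and "hydroxylation of proline" for the corrected one. Because both forms parse, a check like this can only surface the reading for a human to review; it cannot decide which one is biologically intended.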

Sincere regards,
Roberto Mosca

Anthony Bargnesi

Dec 14, 2015, 2:59:24 PM
to openbel...@googlegroups.com, Julia.K...@thomsonreuters.com

Roberto,

Great catch! Please go ahead and submit a Pull Request to the https://github.com/OpenBEL/openbel-framework-resources repository and we'll merge it in.

In the future, feel free to open an issue and submit a Pull Request as you see fit.

Thanks!

Anthony Bargnesi
Application Architect
Selventa  |  T: +1 617.547.5421 x216


Robert...@thomsonreuters.com

Dec 15, 2015, 6:12:47 AM
to openbel...@googlegroups.com, Julia.K...@thomsonreuters.com
Thanks Anthony,

I have submitted a pull request on the issue.

While continuing to work with the dataset, I found another point that I cannot understand. Before submitting another pull request, I would like to check whether it is an error or whether I am misreading the file.

I believe that the lines

UNSET STATEMENT_GROUP

SET STATEMENT_GROUP = "Group 5"

at line 868 of the BEL file are misplaced. Also, the SET Evidence that follows

SET TextLocation = Review
SET Evidence = "Originally identified by high-throughput screening of small molecules against C-Raf
kinase, sorafenib was found to be a potent competitive inhibitor of ATP binding
in the catalytic domains of C-Raf, wild-type B-Raf, and V599EB-Raf mutant."
SET Citation = {"PubMed","The Biochemical journal","12444918","","Houslay MD|Adams DR",""}

refers to the previous statement, and I cannot establish exactly which Evidence applies to the next statement:

act(p(HGNC:ADRB2)) directlyIncreases complex(p(HGNC:ADRB2),p(SFAM:"PDE4 Family"),p(SFAM:"ARRB Family"))

Can someone help me out with this?

Thanks in advance for your help!
Roberto

Anthony Bargnesi

Dec 15, 2015, 7:30:00 AM
to openbel-discuss, Julia.K...@thomsonreuters.com
Roberto,

Thanks, the PR was merged.

Regarding the STATEMENT_GROUP separators, I don't think they are misplaced. Statement group "Group 4" describes PubMed article 16170185, while "Group 5" describes PubMed article 12444918.

I do think the BEL statement (line 878):

    act(p(HGNC:ADRB2)) directlyIncreases complex(p(HGNC:ADRB2),p(SFAM:"PDE4 Family"),p(SFAM:"ARRB Family"))

has the wrong Evidence annotation.

The correct Evidence may be the one at lines 951-957:

SET Evidence = "All 3 categories of PDE4 isoforms from all four subfamilies can interact with β-arrestin1/2,
implying that a common region in the PDE4 catalytic unit provides a
binding site for β-arrestins [65]. Thus challenge of cells with a β-agonist has
been shown to cause recruitment of a PDE4–arrestin complex to the β2- AR."

complex(p(SFAM:"PDE4 Family"),p(SFAM:"ARRB Family"))

act(p(HGNC:ADRB2)) directlyIncreases complex(p(HGNC:ADRB2),p(SFAM:"PDE4 Family"),p(SFAM:"ARRB Family"))
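
To spot mismatches like this one, it can help to list which STATEMENT_GROUP, Citation and Evidence are in effect for each statement. The following is a rough Python sketch rather than the OpenBEL parser; the function names logical_lines and annotate are hypothetical. It assumes that quoted values contain no escaped double quotes, so a multi-line Evidence ends where the quotes balance, and that UNSET STATEMENT_GROUP drops every annotation set inside the group.

```python
import re

SET_RE = re.compile(r"^SET\s+(\w+)\s*=\s*(.*)$")
UNSET_RE = re.compile(r"^UNSET\s+(\w+)\s*$")


def logical_lines(text):
    """Yield (first_line_number, joined_text), merging lines until quotes balance."""
    buffer, start = [], None
    for number, raw in enumerate(text.splitlines(), start=1):
        if start is None:
            start = number
        buffer.append(raw.strip())
        joined = " ".join(buffer)
        # Assumption: an odd number of double quotes means an open quoted value.
        if joined.count('"') % 2 == 0:
            if joined:
                yield start, joined
            buffer, start = [], None


def annotate(text):
    """Return (line_number, statement, annotations_in_effect) for each statement."""
    context = {}
    results = []
    for number, line in logical_lines(text):
        if line.startswith("#"):
            continue  # comment line
        match = SET_RE.match(line)
        if match:
            context[match.group(1)] = match.group(2).strip().strip('"')
            continue
        match = UNSET_RE.match(line)
        if match:
            if match.group(1) == "STATEMENT_GROUP":
                context.clear()  # assumption: group scope ends here
            else:
                context.pop(match.group(1), None)
            continue
        results.append((number, line, dict(context)))
    return results
```

Running annotate over the text of small_corpus.bel and printing the Evidence recorded for the statement at line 878 would show the sorafenib passage, which makes the mismatch with a PDE4 and β-arrestin statement easy to see.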

Thanks,

Tony

Robert...@thomsonreuters.com

Dec 15, 2015, 7:53:30 AM
to openbel...@googlegroups.com
Thanks Anthony,

this clarifies things.

I have submitted another pull request.

Best,
Roberto
