Re: [Genome Informatics] Reactome Search question

David Croft

Apr 26, 2013, 5:49:35 AM

to ppar...@gmail.com, genome-in...@googlegroups.com

Hi Prasath,
>
>
> My name is Prasath Pararajalingam and I'm interested in applying for
> the Reactome search engine project. I have a Bachelor of Science in
> Biology from the University of Western Ontario and a Graduate
> Certificate in Bioinformatics from Seneca College. I've read the other
> threads for this project and I have some questions about the project.
>
> I just want to make sure I have the deliverables correct. Here is
> what I believe needs to be created for this project:
>
> 1) Script to organize Reactome data into domains.

Reactome data is already pretty well organized - what do you mean by
domains? We do already have ranking heuristics, see below for more details.
>
> 2) Script to index domains for the Lucene search engine.
>
> 3) Search script which incorporates the domain of the search hit into
> its ranking algorithm.
>
> 4) Client-side script to display results (optional).

That seems like a reasonable breakdown of what we want.

We would like to use a Lucene-based search platform called EBeye to
search our database. It is currently being used for the databases at
the European Bioinformatics Institute (EBI), but it does a very bad job
with Reactome data, because it is not using any domain-specific
heuristics for sorting results.

You can try EBeye on this page:

http://www.ebi.ac.uk/s4/

There is more detail about EBeye here:

http://www.ebi.ac.uk/ebisearch/documentation.ebi

A full research paper can be found here:

http://bib.oxfordjournals.org/content/early/2010/02/11/bib.bbp065.full

You can download the Reactome source code bundle from:

http://www.reactome.org/download/current/GKB.tar.gz

You will find our current sorting heuristics under:

GKB/modules/GKB/SearchUtils/ResultsRanker.pm

As you will see, these are hard-coded in Perl, and we would like to
migrate from Perl to a pure Java environment.
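
To give a rough idea of what a Java version of such heuristics might look like, here is a minimal sketch. It assumes each hit carries a text-match score from the search engine plus the name of its Reactome instance class, and it multiplies the score by a per-class factor before sorting. All class, method, and field names are hypothetical, and the factor values are made up for illustration; the real ones are in ResultsRanker.pm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from ResultsRanker.pm or EBeye.
class ClassRankSketch {

    // A search hit: the engine's text-match score plus the instance class
    // of the matched record.
    static final class Hit {
        final String name;
        final String instanceClass;
        final double textScore;

        Hit(String name, String instanceClass, double textScore) {
            this.name = name;
            this.instanceClass = instanceClass;
            this.textScore = textScore;
        }
    }

    // Illustrative factors only (assumed values, not the real ones).
    static Map<String, Double> exampleFactors() {
        Map<String, Double> factors = new HashMap<>();
        factors.put("Pathway", 3.0);
        factors.put("Reaction", 2.0);
        factors.put("LiteratureReference", 0.5);
        return factors;
    }

    // Text score scaled by the class factor; unknown classes get factor 1.0.
    static double adjustedScore(Hit hit, Map<String, Double> factors) {
        return hit.textScore * factors.getOrDefault(hit.instanceClass, 1.0);
    }

    // Returns a new list sorted by adjusted score, best hit first.
    static List<Hit> rerank(List<Hit> hits, Map<String, Double> factors) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingDouble((Hit h) -> adjustedScore(h, factors))
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("A paper on apoptosis", "LiteratureReference", 4.0));
        hits.add(new Hit("Apoptosis", "Pathway", 1.0));
        hits.add(new Hit("Caspase activation", "Reaction", 1.2));

        for (Hit h : rerank(hits, exampleFactors())) {
            System.out.println(h.name + " (" + h.instanceClass + ")");
        }
    }
}
```

Keeping the factors in a map rather than in the sorting code would let them be loaded from a configuration file later instead of being hard-coded.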

As you already noted, the client side of things is optional. It's a
nice-to-have feature if there is time. We are using the Google Web
Toolkit (GWT) to build our web pages, and this is also what we would use
for the client.

I hope this gives you something to get started on. Please let me know if
you have any questions.
>
> I am excited to work on this project. Thank you for reading this email.
>
Thanks for your interest in the search project!

Cheers,

David Croft.

cr...@ebi.ac.uk

Apr 29, 2013, 1:48:41 PM

to genome-in...@googlegroups.com, ppar...@gmail.com

Hi Prasath,

> I went through the EBI-eye help documentation and it explains how related
> biological data are organized into domains and categories; I assumed that
> Reactome's data needed to be organized in similar fashion.

It's an open question how to organize the results. My preference would be
to keep it as simple as possible; I'm not sure that categorizing is
necessarily the best way to go. That would require further thought and
discussion, so the pros and cons of the different approaches would be worth
putting into the project proposal.
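
As a starting point for that comparison, the two layouts could be sketched as follows: a single flat list ordered by score, versus the same hits grouped by category. This is only an illustration under assumed names (the Hit class, its fields, and the category strings are hypothetical), not a description of how EBeye or Reactome present results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of two ways to lay out the same search results.
class ResultLayoutSketch {

    static final class Hit {
        final String name;
        final String category;
        final double score;

        Hit(String name, String category, double score) {
            this.name = name;
            this.category = category;
            this.score = score;
        }
    }

    // Copy of the hits sorted by score, best first.
    static List<Hit> byScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.score, a.score));
        return sorted;
    }

    // Option A: one flat list, ordered by score alone.
    static List<String> flat(List<Hit> hits) {
        List<String> names = new ArrayList<>();
        for (Hit h : byScore(hits)) {
            names.add(h.name);
        }
        return names;
    }

    // Option B: hits grouped by category. Categories appear in the order of
    // their best-scoring hit, and hits stay score-ordered within each group.
    static Map<String, List<String>> grouped(List<Hit> hits) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Hit h : byScore(hits)) {
            groups.computeIfAbsent(h.category, k -> new ArrayList<>()).add(h.name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("A", "Pathway", 0.9));
        hits.add(new Hit("B", "Protein", 0.8));
        hits.add(new Hit("C", "Pathway", 0.7));

        System.out.println(flat(hits));
        System.out.println(grouped(hits));
    }
}
```

The flat list is simpler and preserves the overall ranking; grouping makes browsing by type easier but can push a strong hit below weaker ones from another category.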

> In the project description, you mention that domain-information needed to
> be taken into account during ranking. What do you mean by
> domain-information? Are you referring to the "instance class rank factors"
> in ResultsRanker.pm?

That's right.

Cheers,

David.
