Hi Prasath,
>
>
> My name is Prasath Pararajalingam and I'm interested in applying for
> the Reactome search engine project. I have a Bachelor of Science in
> Biology from the University of Western Ontario and a Graduate
> Certificate in Bioinformatics from Seneca College. I've read the other
> threads for this project and I have some questions about the project.
>
> I just want to make sure I have the deliverables correct. Here is
> what I believe needs to be created for this project:
>
> 1) Script to organize Reactome data into domains.
Reactome data is already pretty well organized - what do you mean by
domains? We do already have ranking heuristics, see below for more details.
>
> 2) Script to index domains for the Lucene search engine.
>
> 3) Search script which incorporates the domain of the search hit into
> its ranking algorithm.
>
> 4) Client-side script to display results (optional).
That seems like a reasonable breakdown of what we want.
We would like to use a Lucene-based search platform called EBeye to
search our database. It is currently used for the databases at the
European Bioinformatics Institute (EBI), but it ranks Reactome results
poorly, because it does not apply any domain-specific heuristics when
sorting them.
You can try EBeye on this page:
http://www.ebi.ac.uk/s4/
There is more detail about EBeye here:
http://www.ebi.ac.uk/ebisearch/documentation.ebi
A full research paper can be found here:
http://bib.oxfordjournals.org/content/early/2010/02/11/bib.bbp065.full
You can download the Reactome source code bundle from:
http://www.reactome.org/download/current/GKB.tar.gz
You will find our current sorting heuristics under:
GKB/modules/GKB/SearchUtils/ResultsRanker.pm
As you will see, these are hard-coded in Perl, and we would like to
migrate from Perl to a pure Java environment.
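To give a feel for what a domain-specific sorting heuristic might look
like on the Java side, here is a minimal sketch. It is not a port of
ResultsRanker.pm: the class name ResultRankerSketch, the Hit fields and
the type weights are all hypothetical, chosen only for illustration. The
idea is simply to order hits by the kind of record first and by the
search engine's text score second.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ResultRankerSketch {

    /** One search hit: a display name, the record type, and the engine's text score. */
    public static final class Hit {
        public final String name;
        public final String type;
        public final double textScore;

        public Hit(String name, String type, double textScore) {
            this.name = name;
            this.type = type;
            this.textScore = textScore;
        }
    }

    /** Hypothetical weights; the real heuristics would come from ResultsRanker.pm. */
    static int typeWeight(Hit hit) {
        switch (hit.type) {
            case "Pathway":  return 3;
            case "Reaction": return 2;
            default:         return 1;
        }
    }

    /** Returns a new list sorted by type weight, then by text score, both descending. */
    public static List<Hit> rank(List<Hit> hits) {
        Comparator<Hit> byWeight = Comparator.comparingInt(ResultRankerSketch::typeWeight);
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.textScore);

        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byWeight.reversed().thenComparing(byScore.reversed()));
        return sorted;
    }
}
```

With weights like these, a pathway hit with a modest text score would
still be listed ahead of a protein hit with a high one. Whether that is
the right trade-off is exactly the kind of thing the project would need
to work out.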
As you already noted, the client side of things is optional. It's a
nice-to-have feature, if there is time. We are using the Google Web
Toolkit (GWT) to build our web pages, and this is also what we would use
for the client.
I hope this gives you something to get started on. Please let me know if
you have any questions.
>
> I am excited to work on this project. Thank you for reading this email.
>
Thanks for your interest in the search project!
Cheers,
David Croft.