Hi Nilesh,
Hi,
I am a third-year undergraduate student of computer science,
pursuing my B.Tech degree at RCC Institute of Information
Technology. I am proficient in Java, PHP and C#.
Among the project ideas on the GSoC 2013 ideas page, the one that
seemed really interesting to me is the one titled "Reactome
Search". I want to work on it, and I think my experience will be
of good use in this project.
Thanks for your interest in the Reactome search project!
We would like to use a Lucene-based search platform called EBeye to
search our database. It is currently being used for the databases
at the European Bioinformatics Institute (EBI), but it does a very
bad job of ranking Reactome results, because it does not use any
domain-specific heuristics for sorting them.
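As a rough illustration of what sorting with a domain-specific heuristic could look like, here is a minimal sketch. The Hit class, the type names and the boost values are all made up for the example; they are not taken from EBeye or from ResultsRanker.pm.

```python
from dataclasses import dataclass

# Hypothetical per-type boosts, chosen only for illustration.
# The real Reactome heuristics are not reproduced here.
TYPE_BOOST = {"Pathway": 3.0, "Reaction": 2.0}


@dataclass
class Hit:
    name: str
    type: str
    score: float  # relevance score as returned by the text search engine


def rerank(hits, type_boost=TYPE_BOOST, default_boost=1.0):
    """Sort hits by engine score multiplied by a per-type boost, best first."""
    def boosted(hit):
        return hit.score * type_boost.get(hit.type, default_boost)
    return sorted(hits, key=boosted, reverse=True)


hits = [
    Hit("protein record", "Protein", 0.9),
    Hit("Apoptosis", "Pathway", 0.5),
    Hit("caspase activation", "Reaction", 0.6),
]
ranked = rerank(hits)
print([h.name for h in ranked])
```

With these assumed boosts, a pathway with a modest text score can outrank a protein record with a higher one, which is the kind of reordering a generic ranker cannot do on its own.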
You can try EBeye on this page:
http://www.ebi.ac.uk/s4/
There is more detail about EBeye here:
http://www.ebi.ac.uk/ebisearch/documentation.ebi
A full research paper can be found here:
http://bib.oxfordjournals.org/content/early/2010/02/11/bib.bbp065.full
I am passionate about data mining, big data, search and
recommendation engines, so this idea naturally appeals to me a
lot. I have experience building search functionality into a live
production site at the company where I'm interning. I used Sphinx
with MySQL and was responsible for all the database configuration,
trigger and index creation, and full-text search configuration.
I have thorough experience with Sphinx (a very capable full-text
search engine with many matching and ranking algorithms and
different fuzzy matching options) and am willing to dig deeper
into Lucene or learn Solr if the need arises. I have a little
experience with Lucene and its DefaultSimilarity (TF-IDF scoring
based on cosine similarity).
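For reference, cosine similarity between a query and a document can be sketched as below. This is a simplified illustration using plain term counts and whitespace tokenization; it is not Lucene's actual scoring formula, which also applies IDF weighting and length normalization. The function name is made up for the example.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the terms that both texts share.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


print(cosine_similarity("apoptosis pathway", "apoptosis signalling"))
```

Texts that share no terms score 0, and identical texts score 1 (up to floating-point rounding).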
I would like to download the Reactome source code and set it up
on my local machine, but the only reference to a source code
repository I could find says that it uses CVS. As suggested in
http://wiki.reactome.org/index.php/Reactomes,
I'm CC'ing David to help me out. It'd be great if I could
examine the code in the Perl CGI script (search2) and the code
in
GKB/modules/GKB/SearchUtils/ResultsRanker.pm
to see how I can integrate it with a search platform like
Lucene.
You can download the Reactome source code bundle from:
http://www.reactome.org/download/current/GKB.tar.gz
You will find our current sorting heuristics under:
GKB/modules/GKB/SearchUtils/ResultsRanker.pm
I hope this gives you something to get started on. Please let me
know if you have any questions.
Cheers,
David Croft.