Re: [Genome Informatics] [GSoC 2013] Interested in working on Reactome Search

David Croft

Apr 23, 2013, 8:44:32 AM
to genome-in...@googlegroups.com, Nilesh Chakraborty
Hi Nilesh,
Hi,

I am a 3rd year undergraduate student of computer science, pursuing my B.Tech degree at RCC Institute of Information Technology. I am proficient in Java, PHP and C#.

Among the project ideas on the GSoC 2013 ideas page, the one that interests me most is "Reactome Search", and I would like to work on it. I think my experience will be of good use in this project.

Thanks for your interest in the Reactome search project!

We would like to use a Lucene-based search platform called EBeye to search our database. It is currently used for the databases at the European Bioinformatics Institute (EBI), but it ranks Reactome results poorly, because it does not use any domain-specific heuristics for sorting results.
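
As a rough illustration of what a domain-specific sorting heuristic could look like, here is a minimal Python sketch that re-ranks hits from a full-text engine. Everything in it is an assumption made for the example: the `Hit` class, the `rerank` function, and the boost values in `TYPE_BOOST` are hypothetical and are not taken from EBeye or from ResultsRanker.pm.

```python
from dataclasses import dataclass

# Hypothetical per-type boosts, chosen only for illustration.
# They are NOT the weights used by ResultsRanker.pm or EBeye.
TYPE_BOOST = {"Pathway": 3.0, "Reaction": 2.0, "PhysicalEntity": 1.0}


@dataclass
class Hit:
    name: str          # display name of the record
    record_type: str   # e.g. "Pathway", "Reaction"
    text_score: float  # relevance score reported by the full-text engine


def rerank(hits, query):
    """Sort hits so that exact name matches come first, then by boosted score."""
    normalized_query = query.strip().lower()

    def sort_key(hit):
        exact_match = 1 if hit.name.lower() == normalized_query else 0
        # Unknown record types get a small default boost (also an assumption).
        boost = TYPE_BOOST.get(hit.record_type, 0.5)
        return (exact_match, hit.text_score * boost)

    return sorted(hits, key=sort_key, reverse=True)


if __name__ == "__main__":
    hits = [
        Hit("apoptosis regulator", "PhysicalEntity", 0.9),
        Hit("Apoptosis", "Pathway", 0.5),
        Hit("Caspase activation", "Reaction", 0.6),
    ]
    for hit in rerank(hits, "apoptosis"):
        print(hit.record_type, hit.name)
```

The point of the sketch is only that the text engine's raw score is combined with knowledge about the data (record type, exact name match) before sorting; the real heuristics may differ considerably.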

You can try EBeye on this page:

http://www.ebi.ac.uk/s4/

There is more detail about EBeye here:

http://www.ebi.ac.uk/ebisearch/documentation.ebi

A full research paper can be found here:

http://bib.oxfordjournals.org/content/early/2010/02/11/bib.bbp065.full


I am passionate about data mining, big data, search and recommendation engines, so this idea naturally appeals to me a lot. I have experience building search functionality into a live production site at the company where I am interning. I used Sphinx with MySQL and was responsible for all the database configuration, trigger and index creation, and full-text search configuration. I have thorough experience with Sphinx (a very capable full-text search engine with many matching and ranking algorithms and different fuzzy matching options) and am willing to dig deeper into Lucene or learn Solr if the need arises. I have a little experience with Lucene and with its DefaultSimilarity class (which is based on TF-IDF and cosine similarity).

I would like to download the Reactome source code and set it up on my local machine, but I couldn't find any reference to a source code repository other than a mention that it uses CVS. As suggested in http://wiki.reactome.org/index.php/Reactomes, I'm CC'ing David to help me out. It would be great if I could examine the Perl CGI script (search2) and the code in GKB/modules/GKB/SearchUtils/ResultsRanker.pm to see how I can integrate it with a search platform like Lucene.

You can download the Reactome source code bundle from:

http://www.reactome.org/download/current/GKB.tar.gz

You will find our current sorting heuristics under:

GKB/modules/GKB/SearchUtils/ResultsRanker.pm
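
If you only want to read that one file without unpacking the whole bundle, something like the following Python sketch should work. It assumes the bundle has already been downloaded to a local file (the name `GKB.tar.gz` in the usage line is just the name from the URL above), and it matches on the path suffix because the exact top-level directory layout inside the archive is an assumption. The function name `read_member` is hypothetical.

```python
import tarfile

# Path of the ranking module as given above.
RANKER_PATH = "GKB/modules/GKB/SearchUtils/ResultsRanker.pm"


def read_member(archive_path, member_path=RANKER_PATH):
    """Return the text of one file inside a .tar.gz bundle without extracting the rest."""
    with tarfile.open(archive_path, "r:gz") as bundle:
        for member in bundle.getmembers():
            # Suffix match: the archive may or may not add a leading directory.
            if member.isfile() and member.name.endswith(member_path):
                data = bundle.extractfile(member).read()
                return data.decode("utf-8", errors="replace")
    raise FileNotFoundError(member_path)


# Usage, assuming the bundle was saved as GKB.tar.gz in the current directory:
#   source = read_member("GKB.tar.gz")
#   print(source[:500])
```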

I hope this gives you something to get started on. Please let me know if you have any questions.

Cheers,

David Croft.