Re: [Genome Informatics] Reactome Search

Robin Haw

Apr 16, 2013, 9:52:52 PM
to genome-in...@googlegroups.com, David Croft
Dear Nityata,
Thank you for your interest in the search engine project. If you have any specific questions, please direct them to David Croft, cc'd on this email.
Good luck with your proposal application.
Regards,
Robin Haw


On Sun, Apr 14, 2013 at 4:13 PM, nityata kumar <nityatak...@gmail.com> wrote:
Hi all!

My name is Nityata Nagendra Kumar and I am a graduate student in Information and Computer Sciences at the University of California, Irvine. I have built a web-based search engine over the ics.uci.edu domain as part of my graduate coursework. The project can be found at https://github.com/nityata/SearchEngine.

The project entails: 
- Pages from the above domain are crawled and selected content is stored.
- An inverted index is built using the stored pages, using Apache Lucene APIs, to enable quick search.
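As a rough illustration of the two steps above, here is a minimal pure-Python sketch of an inverted index with a Boolean AND lookup. It is not the Lucene-based code from the project; the function names and the sample pages are invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each token to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search_all(index, query):
    """Return the ids of pages that contain every query token (Boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical crawled pages, invented for illustration only.
pages = {
    "p1": "Graduate admissions for computer science",
    "p2": "Computer science faculty directory",
    "p3": "Campus parking information",
}
index = build_inverted_index(pages)
print(sorted(search_all(index, "computer science")))
```

Looking up a term is a dictionary access rather than a scan over every stored page, which is what makes the search quick.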
We incorporated several of the query options that Lucene provides, such as Boolean Query, to get better results. We also used NDCG as the metric to measure our search engine's performance, taking Google's search results for the domain ics.uci.edu as the ideal ranking (IDCG).
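To make the NDCG measurement concrete, here is a small sketch of how such a score could be computed against a reference ranking. The relevance grading is an assumption made for this example (a document at position i of the top-k reference list gets grade k - i, anything else gets 0); the message does not say how relevance was graded in the actual project.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_ids, reference_ids, k=5):
    """NDCG@k of ranked_ids, treating reference_ids as the ideal ordering.

    Assumes both lists contain unique ids. The grading scheme below is an
    illustrative assumption, not taken from the original project.
    """
    reference = reference_ids[:k]
    n = len(reference)
    # Earlier positions in the reference list get higher grades.
    grade = {doc: n - i for i, doc in enumerate(reference)}
    gains = [grade.get(doc, 0) for doc in ranked_ids[:k]]
    ideal_gains = sorted(grade.values(), reverse=True)
    idcg = dcg(ideal_gains)
    return dcg(gains) / idcg if idcg > 0 else 0.0


# Hypothetical result lists, invented for illustration only.
reference = ["a", "b", "c", "d", "e"]
ours = ["b", "a", "x", "c", "y"]
print(ndcg(ours, reference))
```

A result list identical to the reference scores 1.0, a list with no overlap scores 0.0, and partially matching or reordered lists fall in between.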

I really enjoyed that project and was sorry to see it end. After seeing the Reactome search project in GSoC, I am very enthusiastic about going forward and would like to know of any tasks that you want me to do.

Hoping for a reply,
Best Regards,
Nityata


David Croft

Apr 23, 2013, 8:39:28 AM
to genome-in...@googlegroups.com
Hi Nityata,

Thanks for your interest in the Reactome search project!

We would like to use a Lucene-based search platform called EBeye to search our database. It is currently being used for the databases at the European Bioinformatics Institute (EBI), but it does a very bad job with Reactome data, because it does not use any domain-specific heuristics for sorting results.
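To give a feel for what a domain-specific sorting heuristic could look like, here is a small sketch that re-ranks generic text-search hits using the record type and exact name matches. The weights, field names and sample hits are all hypothetical; this is not the logic in ResultsRanker.pm and not an EBeye API.

```python
# Hypothetical type weights, invented for illustration; these are not
# Reactome's actual heuristics.
TYPE_BOOST = {"Pathway": 3.0, "Reaction": 2.0}


def rerank(hits, query):
    """Sort hits by text score times a type boost, doubling exact name matches."""
    normalized_query = query.strip().lower()

    def adjusted_score(hit):
        boost = TYPE_BOOST.get(hit["type"], 1.0)
        exact = 2.0 if hit["name"].lower() == normalized_query else 1.0
        return hit["score"] * boost * exact

    return sorted(hits, key=adjusted_score, reverse=True)


# Hypothetical hits as a generic search engine might return them.
hits = [
    {"name": "Apoptosis regulator paper", "type": "LiteratureReference", "score": 1.2},
    {"name": "Apoptosis", "type": "Pathway", "score": 1.0},
    {"name": "Caspase activation", "type": "Reaction", "score": 0.9},
]
for hit in rerank(hits, "apoptosis"):
    print(hit["name"])
```

In this made-up example the pathway named exactly like the query moves to the top, even though its raw text score is lower than that of the literature reference.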

You can try EBeye on this page:

http://www.ebi.ac.uk/s4/

There is more detail about EBeye here:

http://www.ebi.ac.uk/ebisearch/documentation.ebi

A full research paper can be found here:

http://bib.oxfordjournals.org/content/early/2010/02/11/bib.bbp065.full

If you download the Reactome source code bundle, you will find our current sorting heuristics under:

GKB/modules/GKB/SearchUtils/ResultsRanker.pm

I hope this gives you something to get started on. Please let me know if you have any questions.

Cheers,

David Croft.
