configurable indexing


iain.d...@gsk.com

Mar 17, 2005, 11:44:21 AM
to cp...@googlegroups.com

After our conversations about indexing I put my mind to generalising the indexing process.

I've extended the attached indexing class so that it uses Apache Commons JXPath to walk the tree of Castor-generated entries. I have put the XPath expressions into a config file, which is read in on each index run. Writing the expressions requires some knowledge of the Castor-generated classes, but that is easy to find in the generated Javadoc.

Net result: you can define any field you want to index with an xpath expression, and it will be added to the index under the field name you specify.

For example, we use the interactor attribute list for a number of extra fields, so I have created some xpath expressions to pull them out. The entry in the config file is:
field4=EFFECT
xpath4=interactionList/interaction/attributeList/attribute[name='EFFECT']/content
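A minimal sketch of how numbered `fieldN`/`xpathN` pairs like the one above could be read with `java.util.Properties`. The attached ConfigurableIndexCollector is not reproduced here, so the class and method names are hypothetical, and matching pairs by numeric suffix (rather than requiring a contiguous sequence starting at 1) is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical reader for fieldN/xpathN pairs in an indexConfig.properties-style file. */
class IndexConfigReader {

    /** Returns field name -> xpath, ordered by the numeric suffix N. */
    static Map<String, String> readFieldXPaths(Properties props) {
        TreeMap<Integer, String[]> byIndex = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.matches("field\\d+")) {
                continue; // only fieldN keys start a pair
            }
            String suffix = key.substring("field".length());
            String xpath = props.getProperty("xpath" + suffix);
            if (xpath == null) {
                continue; // assumption: a field without a matching xpath is ignored
            }
            byIndex.put(Integer.parseInt(suffix),
                    new String[] {props.getProperty(key).trim(), xpath.trim()});
        }
        // Note: a field name repeated under two numbers keeps only its last xpath here.
        Map<String, String> result = new LinkedHashMap<>();
        for (String[] pair : byIndex.values()) {
            result.put(pair[0], pair[1]);
        }
        return result;
    }

    /** Convenience overload for properties text already held in memory. */
    static Map<String, String> readFieldXPaths(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return readFieldXPaths(props);
    }
}
```

Each xpath would then be evaluated against the Castor `Entry` object with JXPath, for example via `JXPathContext.newContext(entry).iterate(xpath)`, with every returned value added to the Lucene document under the corresponding field name.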

Once the indexer has run, I can use the cPath search box like this: "EFFECT:someEffect". Very useful indeed, as we keep adding new attributes! I have included values in the properties file as examples of how to use it.

You may wish to alter the code so it's a better architectural fit, but it works well on our system. You also have to add commons-jxpath-1.2.jar to your libs.

Enjoy.

PS the config file needs to be placed in the admin/bin directory.

Iain K

PsiInteractionToIndex.java
TestConfigurableIndexCollector.java
ConfigurableIndexCollector.java
indexConfig.properties

Ethan Cerami

Mar 17, 2005, 1:04:13 PM
to cp...@googlegroups.com
Iain,

I like your idea a lot. Each data format would get its own config file,
and we are done. I had a few questions:

1. Can you associate more than one xpath expression with each field?
For example, we currently lump name, id, and xrefs all into one
interactor field.

2. is it possible that you could generate a config file that matches
the current functionality, so that the current unit tests re: indexing
will pass?

Assuming your new implementation works with the existing unit tests,
would you like to incorporate these changes into our core code? How is
your cvs access working?

Ethan
--
Ethan Cerami
Computational Biology Center
Memorial Sloan-Kettering Cancer Center
http://cbio.mskcc.org
Email: cer...@cbio.mskcc.org
Direct phone: (646) 735-8082
cerami.vcf

Gary Bader

Mar 17, 2005, 8:11:24 PM
to cp...@googlegroups.com
Hi guys,
Iain - thanks! That looks cool. There is one issue to think about
when we move to BioPAX: BioPAX is OWL-based, not XML Schema-based, so
xpath expressions might not work in a general way with the OWL format.
So it sounds great for PSI-MI, but we'll have to evaluate it for BioPAX
later - we might need to massage BioPAX in some way before we store it
in the database.

Cheers,
Gary

iain.d...@gsk.com

Mar 18, 2005, 4:05:12 AM
to cp...@googlegroups.com

Hi Ethan,

Glad it'll be useful. It dawned on me yesterday that I hadn't run the indexing code through the web interface yet as I do all my indexing through the Admin application.  I'll try it today.

1) I assume Lucene will let you add terms to a field more than once, but I have not tried it. I will have a play. I have left the current indexing untouched, so you can keep the core indexing fields hard-coded; otherwise people could accidentally disable some of your web services.

2) As my indexing extends the current indexing, the tests should still work.  I have to admit I haven't yet got your tests to work, but I have not tried them using the 0.3.2 build that I use.
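On question 1, a minimal sketch of one way the config could carry several xpath expressions per field: repeat the same field name under different numbers and group the xpaths by name, so name, id and xrefs could all feed a single interactor field. The convention and the class name are hypothetical, not part of the attached code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical config variant: repeated field names accumulate xpaths instead of overwriting. */
class MultiXPathFieldConfig {

    /** Returns field name -> xpaths, in the order of their numeric suffixes. */
    static Map<String, List<String>> groupByField(Properties props) {
        TreeMap<Integer, String[]> byIndex = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.matches("field\\d+")) {
                continue; // only fieldN keys start a pair
            }
            String suffix = key.substring("field".length());
            String xpath = props.getProperty("xpath" + suffix);
            if (xpath != null) { // a fieldN without xpathN contributes nothing
                byIndex.put(Integer.parseInt(suffix),
                        new String[] {props.getProperty(key).trim(), xpath.trim()});
            }
        }
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] pair : byIndex.values()) {
            grouped.computeIfAbsent(pair[0], name -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }
}
```

Each xpath in a list would be evaluated in turn and every value added under the same field name. Lucene's `Document.add` documentation states that several fields may be added with the same name, and that indexed fields sharing a name are searched as though their text were appended.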

I'm afraid I'm still in a stalemate position regarding cvs access, and have not tried using my home machine to get the codebase yet.

Regards

Iain K



In reply to: "Ethan Cerami" <cer...@cbio.mskcc.org>, 17-Mar-2005 18:04, Subject: Re: configurable indexing

iain.d...@gsk.com

Mar 18, 2005, 4:17:46 AM
to cp...@googlegroups.com

The JXPath libraries aren't really XML-centric. They use the XPath language as a way to drive standard Java reflection, so as long as the BioPAX objects are written as beans, JXPath uses reflection to call their getXxx/setXxx methods. That's one of the reasons I passed you some example xpath expressions for the Castor-generated code: initially I made the mistake of writing my xpath expressions based on the XML, which is in fact slightly different from the objects that Castor creates.

If you are curious, you could take the JUnit test I gave you and swap the Entry object for one of the BioPAX objects in the call "configurableIndexCollector.setContext(...)"; then you can try some xpath expressions on the BioPAX object.
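To illustrate the bean-walking idea only (this is not how JXPath is implemented), here is a simplified, standard-library sketch that resolves a slash-separated path by calling the matching public no-argument `getXxx()` method at each step and fanning out over arrays and collections, much as an XPath step yields a node-set. `BeanPathWalker` is a hypothetical name, and it deliberately ignores predicates such as `[name='EFFECT']`, `isXxx()` getters and everything else JXPath supports.

```java
import java.lang.reflect.Array;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Simplified illustration, NOT JXPath: "a/b" calls getA(), then getB() on each result. */
class BeanPathWalker {

    static List<Object> select(Object root, String path) {
        List<Object> current = new ArrayList<>();
        current.add(root);
        for (String step : path.split("/")) {
            if (step.isEmpty()) {
                continue; // tolerate leading or doubled slashes
            }
            List<Object> next = new ArrayList<>();
            for (Object bean : current) {
                addFlattened(next, readProperty(bean, step));
            }
            current = next;
        }
        return current;
    }

    /** Calls the public no-arg getter for the property, or returns null if there is none. */
    private static Object readProperty(Object bean, String name) {
        String getterName = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method getter = bean.getClass().getMethod(getterName);
            return getter.invoke(bean);
        } catch (NoSuchMethodException e) {
            return null; // unknown property: this step selects nothing from this bean
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(
                    "Cannot call " + getterName + " on " + bean.getClass().getName(), e);
        }
    }

    /** Arrays and collections contribute one node per non-null element. */
    private static void addFlattened(List<Object> out, Object value) {
        if (value == null) {
            return;
        }
        if (value.getClass().isArray()) {
            for (int i = 0; i < Array.getLength(value); i++) {
                addFlattened(out, Array.get(value, i));
            }
        } else if (value instanceof Collection) {
            for (Object element : (Collection<?>) value) {
                addFlattened(out, element);
            }
        } else {
            out.add(value);
        }
    }
}
```

Under this sketch, a path such as `interactionList/interaction` would call `getInteractionList()` and then `getInteraction()`, which shows why the expressions have to follow the generated classes rather than the XML element names.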

Regards



In reply to: "Gary Bader" <ba...@cbio.mskcc.org>, 18-Mar-2005 01:11, Subject: Re: configurable indexing


iain.d...@gsk.com

Mar 18, 2005, 6:03:55 AM
to cp...@googlegroups.com

Ethan,

I just tried out the configurable indexing code from the web interface. You just need to put indexConfig.properties in the directory one above the web app, which is the "current" directory as far as the web app is concerned. I know this is not ideal; if I had proper CVS access I would probably add it as an initialisation parameter to the servlet, then use your property manager to read it in the correct class.

For the moment it will have to go on my wish list, along with adding a "database host" parameter as a servlet initialisation param.
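A minimal sketch of that wish-list item: prefer an explicitly configured location (for example, a servlet init parameter) and fall back to the current behaviour of resolving indexConfig.properties against the JVM's working directory. `IndexConfigLocator` and the init-parameter name `indexConfigPath` are hypothetical; inside a servlet the value could come from `getServletConfig().getInitParameter("indexConfigPath")`.

```java
import java.io.File;

/** Hypothetical helper for deciding where indexConfig.properties is read from. */
class IndexConfigLocator {

    static final String DEFAULT_FILE_NAME = "indexConfig.properties";

    /**
     * @param configuredPath explicit location, e.g. from a servlet init parameter; may be null or blank
     * @return the explicit location if given, otherwise the default file in the working directory
     */
    static File locate(String configuredPath) {
        if (configuredPath != null && !configuredPath.trim().isEmpty()) {
            return new File(configuredPath.trim());
        }
        // Fallback mirrors today's behaviour: resolve against user.dir, which in
        // Iain's setup is the directory one above the web app.
        return new File(System.getProperty("user.dir"), DEFAULT_FILE_NAME);
    }
}
```

Keeping the working-directory fallback means existing installations that already place the file there would keep working unchanged.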

Regards

Iain K

Gary Bader

Mar 18, 2005, 10:55:58 AM
to cp...@googlegroups.com
Hi Iain,
    Ok - I wasn't familiar with JXPath, so I just assumed it was done on the XML level.  Sorry about that.  After looking at the JXPath page, it looks like a very useful tool and will probably work well with the generated code from the BioPAX OWL file.

Cheers,
Gary