[Fedora-commons-users] Mulgara query failure: Misleading error message or bug?

Janna Wemekamp

Apr 1, 2010, 11:12:10 PM
to Fedora Users
Hi,

In Fedora Commons 3.3, the following iTQL query in http://.../fedora/risearch generates an error:

select  $pid
from    <#ri>
where   ( $pid <dc:identifier> 'c4oc:*' in <#ri-fullText>
          or $pid <dc:identifier> 'wip*' in <#ri-fullText> )
minus   ( $pid <fedora-view:disseminates> $ds1 in <#ri>
          and $ds1 <fedora-view:disseminationType> <info:fedora/*/PDF2TEXT> in <#ri> )
order by $pid

The query succeeds as long as there's no colon in the first literal. Is this a bug or just a misleading error message?
Is there any way to 'escape' the colon?

The error message is shown below.
org.trippi.TrippiException: Query failed: Cannot parse 'c4oc:*': '*' or '?' not allowed as first character in WildcardQuery
	at org.trippi.impl.mulgara.MulgaraSession.query(MulgaraSession.java:148)
	at org.trippi.impl.base.ConcurrentTriplestoreReader.findTuples(ConcurrentTriplestoreReader.java:79)
	at fedora.server.resourceIndex.ResourceIndexImpl.findTuples(ResourceIndexImpl.java:279)
	at fedora.server.resourceIndex.ResourceIndexModule.findTuples(ResourceIndexModule.java:296)
	at org.trippi.server.TrippiServer.find(TrippiServer.java:119)
	at org.trippi.server.http.TrippiServlet.doFind(TrippiServlet.java:514)
	at org.trippi.server.http.TrippiServlet.doGet(TrippiServlet.java:379)
	at fedora.server.access.RISearchServlet.doGet(RISearchServlet.java:103)
	at org.trippi.server.http.TrippiServlet.doGet(TrippiServlet.java:271)
	at org.trippi.server.http.TrippiServlet.doPost(TrippiServlet.java:576)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at fedora.server.security.servletfilters.FilterSetup.doFilter(FilterSetup.java:234)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at fedora.server.security.servletfilters.FilterSetup.doFilter(FilterSetup.java:234)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at fedora.server.security.servletfilters.FilterSetup.doFilter(FilterSetup.java:234)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
	at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
	at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769)
	at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698)
	at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
	at java.lang.Thread.run(Thread.java:619)
Caused by: org.mulgara.query.QueryException: Query failed: Cannot parse 'c4oc:*': '*' or '?' not allowed as first character in WildcardQuery
	at org.mulgara.resolver.DatabaseSession.execute(DatabaseSession.java:754)
	at org.mulgara.resolver.DatabaseSession.query(DatabaseSession.java:464)
	at org.trippi.impl.mulgara.MulgaraSession.query(MulgaraSession.java:146)
	... 36 more
Caused by: org.mulgara.query.MulgaraTransactionException: Transaction rollback triggered
	at org.mulgara.resolver.MulgaraInternalTransaction.implicitRollback(MulgaraInternalTransaction.java:516)
	at org.mulgara.resolver.MulgaraInternalTransaction.execute(MulgaraInternalTransaction.java:629)
	at org.mulgara.resolver.DatabaseSession.execute(DatabaseSession.java:751)
	... 38 more
Caused by: org.mulgara.query.QueryException: Error resolving LC{subj=$pid, pred=http://purl.org/dc/elements/1.1/identifier, obj="c4oc:*", score=null, binder=null} from rmi://localhost/fedora#ri
	at org.mulgara.resolver.LocalQueryResolver.resolve(LocalQueryResolver.java:194)
	at org.mulgara.resolver.DefaultConstraintHandlers$3.resolve(DefaultConstraintHandlers.java:131)
	at org.mulgara.resolver.ConstraintOperations.resolveModelExpression(ConstraintOperations.java:160)
	at org.mulgara.resolver.lucene.LuceneConstraintDescriptor.resolve(LuceneConstraintDescriptor.java:55)
	at org.mulgara.resolver.ConstraintOperations.resolveConstraintExpression(ConstraintOperations.java:187)
	at org.mulgara.resolver.LocalQueryResolver.resolveConstraintOperation(LocalQueryResolver.java:112)
	at org.mulgara.resolver.DefaultConstraintHandlers$6.resolve(DefaultConstraintHandlers.java:176)
	at org.mulgara.resolver.ConstraintOperations.resolveConstraintExpression(ConstraintOperations.java:187)
	at org.mulgara.resolver.LocalQueryResolver.resolveConstraintOperation(LocalQueryResolver.java:112)
	at org.mulgara.resolver.DefaultConstraintHandlers$7.resolve(DefaultConstraintHandlers.java:190)
	at org.mulgara.resolver.ConstraintOperations.resolveConstraintExpression(ConstraintOperations.java:187)
	at org.mulgara.resolver.LocalQueryResolver.resolveE(LocalQueryResolver.java:269)
	at org.mulgara.resolver.DatabaseOperationContext.doQuery(DatabaseOperationContext.java:798)
	at org.mulgara.resolver.QueryOperation.execute(QueryOperation.java:136)
	at org.mulgara.resolver.MulgaraInternalTransaction.execute(MulgaraInternalTransaction.java:625)
	... 39 more
Caused by: org.mulgara.query.QueryException: Failed to query string index
	at org.mulgara.resolver.lucene.LuceneResolver.resolve(LuceneResolver.java:367)
	at org.mulgara.resolver.InternalResolver.resolve(InternalResolver.java:180)
	at org.mulgara.resolver.DatabaseOperationContext.resolve(DatabaseOperationContext.java:653)
	at org.mulgara.resolver.LocalQueryResolver.resolve(LocalQueryResolver.java:187)
	... 53 more
Caused by: org.mulgara.query.TuplesException: Couldn't generate answer from text index: subject='null', predicate='http://purl.org/dc/elements/1.1/identifier', object='c4oc:*'
	at org.mulgara.resolver.lucene.FullTextStringIndexTuples$SearchHitsTuples.<init>(FullTextStringIndexTuples.java:387)
	at org.mulgara.resolver.lucene.FullTextStringIndexTuples.<init>(FullTextStringIndexTuples.java:184)
	at org.mulgara.resolver.lucene.LuceneResolver.resolve(LuceneResolver.java:361)
	... 56 more
Caused by: org.mulgara.resolver.lucene.FullTextStringIndexException: Unable to parse query 'c4oc:*'
	at org.mulgara.resolver.lucene.FullTextStringIndex.find(FullTextStringIndex.java:594)
	at org.mulgara.resolver.lucene.FullTextStringIndexTuples$SearchHitsTuples.<init>(FullTextStringIndexTuples.java:385)
	... 58 more
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'c4oc:*': '*' or '?' not allowed as first character in WildcardQuery
	at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
	at org.mulgara.resolver.lucene.FullTextStringIndex.find(FullTextStringIndex.java:590)
	... 59 more



Janna Wemekamp

Steve Bayliss

Apr 2, 2010, 4:37:05 AM
to Janna Wemekamp, Fedora Users
Hi Janna
 
The Lucene query parser that the full-text model uses interprets the ":" character as a field delimiter, i.e. your query is searching for "*" in the field "c4oc" (which won't exist; the only full-text field is stemmedliteral). And you can't search with * as the first character, hence the error message.
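To make the parsing step concrete, here is a rough sketch in Python. It is a toy model of the behaviour described above, not Lucene's actual parser; the function names are made up for illustration, and the default field name is taken from the note about stemmedliteral.

```python
def split_field_clause(clause, default_field="stemmedliteral"):
    """Toy model of 'field:term' splitting; NOT Lucene's real parser.

    The first unescaped ':' separates the field name from the term.
    Without one, the whole clause is a term in the default field.
    """
    i = 0
    while i < len(clause):
        if clause[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if clause[i] == ":":
            return clause[:i], clause[i + 1:]
        i += 1
    return default_field, clause


def check_wildcard(term):
    """Reject terms that start with a wildcard, as the error message describes."""
    if term[:1] in ("*", "?"):
        raise ValueError(
            "'*' or '?' not allowed as first character in WildcardQuery")
    return term


field, term = split_field_clause("c4oc:*")
# field is "c4oc" and term is "*", so check_wildcard(term) would raise
```

Under this model 'wip*' has no colon, stays in the default field, and passes the wildcard check, which matches the observation that the query works without the colon.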
 
When indexing, the Lucene analyzer will in fact split text separated by a colon into separate "words" in the full-text index so that for instance "abc:def" will result in index entries for "abc" and "def", and not "abc:def".  So the entry you are searching for won't actually exist in the index; there will be separate full-text entries for the pid namespace and the pid without a namespace.
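As a rough illustration of that splitting, the sketch below uses a deliberately simplified tokenizer (split on anything that is not a letter or digit, then lowercase). This is an assumption for demonstration only; the real analyzer applies its own rules, including stemming.

```python
import re


def toy_tokenize(text):
    """Simplified stand-in for an analyzer: break on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


tokens = toy_tokenize("abc:def")
# tokens holds "abc" and "def" as separate entries; "abc:def" is never indexed whole
```

The same happens to a value like "c4oc:123": the namespace and the rest of the pid end up as two unrelated "words".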
 
You can escape characters using "\\", so the literal becomes 'c4oc\\:*' in your example, but as Lucene won't have indexed the full namespace:pid string, you won't actually match anything.
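If you build such literals in code, a small helper along these lines could do the escaping. This is only a sketch: the function name is hypothetical, it handles just the colon and the backslash, and it assumes (following the "\\" form above) that each backslash has to be doubled once more inside the iTQL literal.

```python
def escape_for_itql_fulltext(term):
    """Escape ':' for the Lucene parser, then double backslashes for the iTQL literal.

    Assumption: only ':' and '\\' are handled; other Lucene special
    characters are left untouched so that a trailing wildcard still works.
    """
    # Step 1: Lucene-level escaping (existing backslashes first, then colons).
    lucene_escaped = term.replace("\\", "\\\\").replace(":", "\\:")
    # Step 2: double every backslash so it survives inside the iTQL literal.
    return lucene_escaped.replace("\\", "\\\\")


literal = escape_for_itql_fulltext("c4oc:*")
clause = "$pid <dc:identifier> '%s' in <#ri-fullText>" % literal
```

This only gets the query past the parser; because the index holds no "c4oc:..." entries, the clause would still match nothing.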
 
What you are trying to do - find objects based on the pid namespace - seems entirely reasonable to me; but Fedora doesn't currently index the pid namespace (unlike, for example, the content model and the owner ID).
 
So it would seem that a solution would be for Fedora to create a triple for the pid namespace, so that you could for example query the main <#ri> model using something like:
 
$pid <fedora-model:hasPIDNamespace> 'c4oc'
 
Is that something you would like to see? 
 
If so I can create a JIRA ticket for that for consideration in a future release of Fedora (or you can create one yourself).  This would also mean that you wouldn't need to use the full text model, which should mean that the query is more performant.  And if this is the only reason you need the full text model then you'd be able to turn off full-text indexing, which will improve ingest performance.
 
In the meantime, you should find that a query with a where clause of...
 
$pid <dc:identifier> 'c4oc' in <#ri-fullText>
(ie without a wildcard)
 
... should work; however, this could pick up spurious results if it finds 'c4oc' as a "word" in some other dc:identifier field (for instance, if you had a dc:identifier value of somenamespace:c4oc then this would also be picked up).  So the success of this will depend on how you're populating dc:identifier.
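One way to guard against those spurious hits is to filter the returned pids on the client side. The sketch below assumes the results arrive either as bare pids ("c4oc:123") or as info:fedora/ URIs; the function names are made up for illustration.

```python
def pid_namespace(pid_or_uri):
    """Return the namespace part of a pid, or None if there is no ':' separator."""
    prefix = "info:fedora/"
    pid = pid_or_uri
    if pid.startswith(prefix):
        pid = pid[len(prefix):]
    namespace, separator, _ = pid.partition(":")
    return namespace if separator else None


def filter_by_namespace(pids, namespaces):
    """Keep only the pids whose namespace is one of the wanted namespaces."""
    wanted = set(namespaces)
    return [pid for pid in pids if pid_namespace(pid) in wanted]


results = ["info:fedora/c4oc:1", "info:fedora/other:2", "c4oc:3"]
kept = filter_by_namespace(results, ["c4oc"])
# kept drops "info:fedora/other:2", which matched only via its dc:identifier
```

This checks the pid itself rather than dc:identifier, so an object in another namespace that merely mentions 'c4oc' in an identifier is discarded.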
 
Regards
Steve

Janna Wemekamp

Apr 2, 2010, 4:14:08 PM
to Steve Bayliss, Fedora Users
Thanks Steve!

You're correct - I'm attempting to find objects by pid namespace and the dc:identifier search in the full-text model was the only mechanism available. A triple for the pid namespace in the main <#ri> model as you describe would be _very_ helpful!

I don't yet have a JIRA login so if you'd create the ticket I'd be very appreciative!

thanks again.


Janna

James, Eric

Apr 5, 2010, 1:47:21 PM
to Janna Wemekamp, Steve Bayliss, Fedora Users
The following SQL query should also return the pids you're looking for:

select token from objectPaths where token like "c4oc%" or token like "wip%"
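For anyone wanting to try the pattern matching in isolation, here is a small sketch using an in-memory SQLite table as a stand-in. The one-column table layout and the sample rows are assumptions for the demo, not the real Fedora schema or data. It also shows that a pattern of 'c4oc%' matches any namespace that merely starts with "c4oc", while 'c4oc:%' is tighter.

```python
import sqlite3

# Stand-in table: only the "token" column named in the query above is modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objectPaths (token TEXT)")
conn.executemany(
    "INSERT INTO objectPaths (token) VALUES (?)",
    [("c4oc:1",), ("c4oc2:7",), ("wip:3",), ("other:9",)],  # made-up pids
)


def pids_like(pattern):
    """Return the tokens matching a SQL LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT token FROM objectPaths WHERE token LIKE ?", (pattern,))
    return sorted(row[0] for row in rows)


loose = pids_like("c4oc%")   # also picks up the "c4oc2" namespace
tight = pids_like("c4oc:%")  # only the "c4oc" namespace
```

Passing the pattern as a bound parameter also avoids quoting differences between databases.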


Steve Bayliss

Apr 3, 2010, 4:47:11 AM
to Janna Wemekamp, Fedora Users
Hi Janna - I've created http://www.fedora-commons.org/jira/browse/FCREPO-676 for this.
 
Regards
Steve

Janna Wemekamp

Apr 5, 2010, 5:58:34 PM
to James, Eric, Fedora Users
Hi Eric,

Yes, an SQL query on objectPaths (or doRegistry) would give me a list of pids. However, I was specifically after the pids of objects which did not have a PDF2TEXT datastream. That's a more complex SQL query; I haven't tried to construct it although it's probably possible using the datastreamPaths table.
I prefer not to have my apps depend on SQL queries if I can possibly avoid them so a <fedora-model:hasPIDNamespace> predicate as Steve suggests would be very useful to query the triplestore instead (w/o enabling full-text indexing).


Cheers!

Janna
