[Dspace-tech] Errors encountered when running update-discovery-index

105 views
Skip to first unread message

Ian Boston

unread,
Aug 26, 2015, 10:11:45 AM8/26/15
to DSpac...@lists.sourceforge.net
Hi,
I am seeing issues when I run dspace update-discovery-index using the
3.0-rc3 tag.

The dataset is the full DSpace@Cambridge dataset (217K items) migrated
from 1.6 up to 3.0-rc3 and visually checked by people who know the
data.

I run dspace update-discovery-index and see about 70% of the Solr
update index operations failing. (of the POST operations to Solr in
the access log file, over 70% have a status code of 400 (Bad
Request)).

On the SolrJ client site I see 1000s of traceback messages like [1]
and on the Solr server side I see a matching number of error messages
like [2]. In the second traceback the names of the fields eg "[g18,
g0, g5]", vary. I cant see any other errors in the logs and the
Discovery part of the UI appears to be working.

The code base is the 3.0-rc3 tag. I have made no changes to the DSpace
code. The only changes are to configuration files, and there are no
changes to the schema.xml for any of the Solr indexes, and no changes
to the config in the solr area other than to enable Discovery.

AFAIK, the metadata scheme is standard Dublin Core with no extensions.

I searched the mailing list but could not find anything that pointed
to this problem.

Any ideas ?
(Looks like one or more of the Solr fields should be multiValued,
probably an automated one).
Ian



1. ( from dspace.log.2012-12-02)

request: http://localhost:8080/solr/search/update?wt=javabin&version=2
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://localhost:8080/solr/search/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
at org.dspace.discovery.SolrServiceImpl.writeDocument(SolrServiceImpl.java:601)
at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:1270)
at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:162)
at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:332)
at org.dspace.discovery.IndexClient.main(IndexClient.java:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
2012-12-02 00:16:06,215 ERROR org.dspace.discovery.SolrServiceImpl @
Error while writing item to discovery index: 1810/433 message:Bad
Request




2. (from catalina.out)

02-Dec-2012 22:24:21 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=4-10]
multiple values encountered for non multiValued field read: [g18, g0,
g5]
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.dspace.solr.filters.LocalHostRestrictionFilter.doFilter(LocalHostRestrictionFilter.java:60)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1653)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

helix84

unread,
Aug 26, 2015, 10:11:46 AM8/26/15
to Ian Boston, DSpac...@lists.sourceforge.net
Hi Ian,

did you follow the upgrading instructions [1] and ran
update-discovery-index with the -f parameter? Solr was upgraded, so
the index needs to be recreated.

Just in case that doesn't help, can you also try with the -b parameter
and report back?

[1] https://wiki.duraspace.org/display/DSDOC3x/Upgrading+From+1.8.x+to+3.x


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Ian Boston

unread,
Aug 26, 2015, 10:11:49 AM8/26/15
to hel...@centrum.sk, DSpac...@lists.sourceforge.net
Hi Ivan,

On 3 December 2012 10:39, helix84 <hel...@centrum.sk> wrote:
> Hi Ian,
>
> did you follow the upgrading instructions [1] and ran
> update-discovery-index with the -f parameter? Solr was upgraded, so
> the index needs to be recreated.

The migration was done with the -f on a clean solr subtree with no
index files present.

If I do it again (presuably to force a rebuild of the index) I get
still get errors when rebuilding.


>
> Just in case that doesn't help, can you also try with the -b parameter
> and report back?

Still the same with a -b on the command.

I did see this:

2012-12-03 06:11:36,610 INFO org.dspace.discovery.SolrServiceImpl @
Wrote Item: 1810/1587 to Index
2012-12-03 06:11:36,618 ERROR org.dspace.browse.SolrBrowseCreateDAO @
Null metadata value for item 1542, field: dc.contributor.advisor
2012-12-03 06:11:36,623 ERROR org.dspace.discovery.SolrServiceImpl @
Error while writing item to discovery index: 1810/1470 message:Bad
Request

Bad Request

request: http://localhost:8080/solr/search/update?wt=javabin&version=2
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://localhost:8080/solr/search/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
at org.dspace.discovery.SolrServiceImpl.writeDocument(SolrServiceImpl.java:601)
at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:1270)
at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:162)
at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:332)
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:294)
at org.dspace.discovery.IndexClient.main(IndexClient.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
2012-12-03 06:11:36,636 INFO org.dspace.discovery.SolrServiceImpl @
Wrote Item: 1810/23948 to Index
2012-12-03 06:11:36,648 INFO org.dspace.discovery.SolrServiceImpl @
Wrote Item: 1810/19183 to Index
2012-12-03 06:11:36,661 INFO org.dspace.discovery.SolrServiceImpl @
Wrote Item: 1810/3642 to Index


So I think [1] is one of the records that is failing. The link is from
our live system but the IDs and the metadata is identical as I took a
copy of the live DB before going through the upgrade scripts (which
worked like a dream :)).


1 http://www.dspace.cam.ac.uk/handle/1810/1470?mode=full&submit_simple=Show+full+item+record

Ian Boston

unread,
Aug 26, 2015, 10:11:50 AM8/26/15
to hel...@centrum.sk, DSpac...@lists.sourceforge.net
On 3 December 2012 19:51, helix84 <hel...@centrum.sk> wrote:
> On Mon, Dec 3, 2012 at 7:20 AM, Ian Boston <ib...@cam.ac.uk> wrote:
>> 2012-12-03 06:11:36,618 ERROR org.dspace.browse.SolrBrowseCreateDAO @
>> Null metadata value for item 1542, field: dc.contributor.advisor
>
> This does look strange, assuming you really have a text_value for that
> row there.
>
> Both updating index and reindexing worked fine on my DSpace 3.0
> instance upgraded from 1.8.
>
> I'm wondering about that SolrBrowseCreateDAO part, not sure if it's
> supposed to be there, or if it's supposed to use the Postgres/Oracle
> DAO. This is new in DSpace 3.0:
> https://github.com/DSpace/DSpace/blob/dspace-3.0/dspace/config/dspace.cfg#L872
>
>
> Maybe if you could change this part to log what document it's trying
> to write, we'd see what's wrong:
> https://github.com/DSpace/DSpace/blob/dspace-3.0/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L601
>
>
> Another thing I'd look at is changes in the value of the multiValued
> attribute in the Solr search schema, see if that gives you a clue:
> git diff --ignore-space-at-eol dspace-1_8_x dspace-3.0
> dspace/solr/search/conf/schema.xml
> But I can't explain why any such error would occur when rebuilding the
> index from scratch.


I am not massively experienced with DSpace, but have a reasonable
amount of exposure to Solr (especially Solr4). The error normally
happens when a SolrJ client tried to put multiple values into a fields
that was declared with no multiValued attribute or multiValued=false.

Normally the error message tells you exactly which one it is, but the
log messages on the Solr sever side dont look very normal. I am going
to guess the fields is being defined by one of the field templates...
but which one?

In the offending line that causes the exception (not the one reporting
null), there is a reference to a handle that contains several fields
that are multi values.

Perhaps the solr schema is not expecting subject to be multi valued ?

I dont think this is a problem with a normal DSpace data set. Its more
likely to be a problem with the DSpace@Cambridge dataset which has
been living in a modified DSpace that forked 5 or 6 years ago probably
at 1.4 or maybe 1.5. There are a lot of items, and hence a lot of
opportunity for all sorts of weird metadata. I was impressed that the
data migration to an unpatched DS3 worked.

I'll hack the code a bit to see what exactly the problematic field is.

Ian

helix84

unread,
Aug 26, 2015, 10:11:50 AM8/26/15
to Ian Boston, DSpac...@lists.sourceforge.net
On Mon, Dec 3, 2012 at 7:20 AM, Ian Boston <ib...@cam.ac.uk> wrote:
> 2012-12-03 06:11:36,618 ERROR org.dspace.browse.SolrBrowseCreateDAO @
> Null metadata value for item 1542, field: dc.contributor.advisor

This does look strange, assuming you really have a text_value for that
row there.

Both updating index and reindexing worked fine on my DSpace 3.0
instance upgraded from 1.8.

I'm wondering about that SolrBrowseCreateDAO part, not sure if it's
supposed to be there, or if it's supposed to use the Postgres/Oracle
DAO. This is new in DSpace 3.0:
https://github.com/DSpace/DSpace/blob/dspace-3.0/dspace/config/dspace.cfg#L872


Maybe if you could change this part to log what document it's trying
to write, we'd see what's wrong:
https://github.com/DSpace/DSpace/blob/dspace-3.0/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L601


Another thing I'd look at is changes in the value of the multiValued
attribute in the Solr search schema, see if that gives you a clue:
git diff --ignore-space-at-eol dspace-1_8_x dspace-3.0
dspace/solr/search/conf/schema.xml
But I can't explain why any such error would occur when rebuilding the
index from scratch.


Kevin Van de Velde

unread,
Aug 26, 2015, 10:11:52 AM8/26/15
to Ian Boston, DSpac...@lists.sourceforge.net
Hi Ian,

There was an issue with the read field in rc3. This has been fixed for the DSpace 3.0 release with the following commit: https://github.com/DSpace/DSpace/commit/d9df145a4e9628fcb649d8bdeb7454085ae2cc53#diff-4


Kind regards,

logo   
Kevin Van de Velde @mire 
2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010
Esperantolaan 4, Heverlee 3001, Belgium
www.atmire.com



------------------------------------------------------------------------------
Keep yourself connected to Go Parallel:
BUILD Helping you discover the best ways to construct your parallel projects.
http://goparallel.sourceforge.net
_______________________________________________
DSpace-tech mailing list
DSpac...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

@mire_web_2.jpg

helix84

unread,
Aug 26, 2015, 10:11:52 AM8/26/15
to Ian Boston, DSpac...@lists.sourceforge.net
On Mon, Dec 3, 2012 at 10:38 AM, Ian Boston <ib...@cam.ac.uk> wrote:
> Perhaps the solr schema is not expecting subject to be multi valued ?

I can't find subject in the schema (it seems to be created
dynamically), but the Solr admin interface tells me it is multivalued:
Field: subject
Dynamically Created From Pattern: *
Field Type: TEXT
Schema: Indexed, Tokenized, Stored, Multivalued
Index: Indexed, Tokenized, Stored

You could check that this is true also for other fields in your
dataset, but I suspect they will all be multivalued:
<!--Used for matching on all other fields -->
<dynamicField name="*" type="text" multiValued="true"/>

> I dont think this is a problem with a normal DSpace data set. Its more
> likely to be a problem with the DSpace@Cambridge dataset which has
> been living in a modified DSpace that forked 5 or 6 years ago probably
> at 1.4 or maybe 1.5. There are a lot of items, and hence a lot of
> opportunity for all sorts of weird metadata. I was impressed that the
> data migration to an unpatched DS3 worked.

I see.

> I'll hack the code a bit to see what exactly the problematic field is.

If you'll figure it out, more catch clauses to report all kinds of
schema abuse are always welcome! :)

Ian Boston

unread,
Aug 26, 2015, 10:12:17 AM8/26/15
to Kevin Van de Velde, DSpac...@lists.sourceforge.net
Excellent, thank you (both), I'll upgrade and get back to you. Should be easy now we have a null patch set against the release.
Ian
@mire_web_2.jpg

Ian Boston

unread,
Aug 26, 2015, 10:12:23 AM8/26/15
to Kevin Van de Velde, DSpac...@lists.sourceforge.net
Hi,
The upgrade to the 3.0 release fixed it.
All indexes rebuilt with no errors other than data format errors, which will have to be fixed in the db.
Thanks
Ian
@mire_web_2.jpg
Reply all
Reply to author
Forward
0 new messages