Indexing not working properly in DSpace 6 after upgrade from DSpace 5 to 6

Bhavesh Patel

May 1, 2017, 9:48:48 AM
to DSpace Technical Support
Dear all,

We have upgraded from DSpace 5 to DSpace 6 and imported the database and assetstore. When we run index-discovery, it does not list all the records. The most recently entered items appear, but some older items are still not indexed.

What could be the issue? What are the proper steps to run a full index?

OS: CentOS 7
DSpace: 6

Thanks & Regards,
Bhavesh R. Patel

"Learning is a never ending process"

Tran Huu Trung (TTTV.ICT)

May 1, 2017, 11:45:46 PM
to DSpace Technical Support
Please check the anonymous authorization of these items, then try running index-discovery -b -f

On Monday, May 1, 2017 at 20:48:48 UTC+7, Bhavesh Patel wrote:

Bhavesh Patel

May 3, 2017, 7:13:21 AM
to Tran Huu Trung (TTTV.ICT), DSpace Technical Support
Thanks for your reply. We did as you suggested, but the number of indexed records did not change.

We cross-checked in the database (PostgreSQL): in the metadatavalue table we set a condition on item type and counted, and it shows the correct number of records. But after indexing, not all the titles are shown. We don't know what is missing, since some of the most recent submissions are there.

Can you please suggest how to get a collection-wise list and count through a query in DSpace 6?

What are the possible reasons for not indexing all the items? Please help us.

Thanks & regards,
Bhavesh


--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscribe@googlegroups.com.
To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Tom Desair

May 3, 2017, 7:22:59 AM
to Bhavesh Patel, Tran Huu Trung (TTTV.ICT), DSpace Technical Support
Hi Bhavesh,

To check the number of items in the database that need to be indexed, execute:
select count(*) from item where in_archive = true or withdrawn = true;

Then compare this to the number of items that are in the discovery index:
curl "http://localhost:8080/solr/search/select?q=search.resourcetype%3A2&rows=0&wt=json&indent=true"
(check the value after "numFound")

Are the counts different?

 
Tom Desair
250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
Gaston Geenslaan 14, Leuven 3001, Belgium
www.atmire.com

Bhavesh Patel

May 3, 2017, 7:37:22 AM
to Tom Desair, Tran Huu Trung (TTTV.ICT), DSpace Technical Support
Dear Tom Desair,

Both counts are different,

SQL Query in database: 18072

Through Solr: 17058

Through Solr:
-----------------------
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"search.resourcetype:2",
      "indent":"true",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":17058,"start":0,"docs":[]
  }}
-------------------
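
The comparison requested above can be sketched in a few lines of Python. This is only an illustration, not part of DSpace: the function names are made up, the database count is typed in by hand from the SQL result, and the Solr response is pasted as text rather than fetched.

```python
import json


def solr_num_found(response_text: str) -> int:
    """Extract response.numFound from a Solr select response (wt=json)."""
    return int(json.loads(response_text)["response"]["numFound"])


def items_missing_from_index(db_count: int, solr_count: int) -> int:
    """Number of items counted in the database but absent from the index."""
    return db_count - solr_count


# Solr response as posted in this message (responseHeader shortened).
solr_response = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {"numFound": 17058, "start": 0, "docs": []}
}
"""

db_count = 18072  # result of the SQL count reported in this message
solr_count = solr_num_found(solr_response)
print(items_missing_from_index(db_count, solr_count))  # prints 1014
```

With the numbers reported here, 1014 items counted in the database are missing from the discovery index.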


Thanks & Regards,
Bhavesh R. Patel

"Learning is a never ending process"


Tom Desair

May 3, 2017, 7:48:19 AM
to Bhavesh Patel, Tran Huu Trung (TTTV.ICT), DSpace Technical Support
DSpace 6.0 has some known memory issues, so the indexing process may be stopping before it can complete. These will be fixed in DSpace 6.1.

For now, can you try this:
$ export JAVA_OPTS="-Xmx4G -Dfile.encoding=UTF-8"
$ bin/dspace index-discovery -bf

Do you see any errors in the dspace.log file when running "dspace index-discovery -bf" or when it completes?


 

Bhavesh Patel

May 3, 2017, 9:55:23 AM
to Tom Desair, Tran Huu Trung (TTTV.ICT), DSpace Technical Support
Dear Tom Desair,

We followed the instructions in your reply. The index command executed successfully, but the count remains the same.

We looked into the dspace.log file (dspace.log.2017-05-03). It is around 26 MB, so I have copied some of the lines that show errors while executing the command into the attached .txt file.

It gives errors like:
"ERROR org.dspace.discovery.SolrServiceImpl @ No choices plugin was configured for  field "dc_contributor_author".
java.lang.IllegalArgumentException: No choices plugin was configured for  field "dc_contributor_author"."

What could be the issue? Does it not recognize the author field?
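
For a log file of this size, grouping the ERROR entries by message can make the dominant problem easier to spot. The following is a minimal Python sketch, assuming each entry has the shape "<timestamp> ERROR <logger> @ <message>" as in the excerpt above. The function name is hypothetical and the timestamps in the sample lines are made up.

```python
import re
from collections import Counter

# Assumed line shape, based on the excerpt quoted above:
#   "<timestamp> ERROR <logger> @ <message>"
ERROR_LINE = re.compile(r"\bERROR\s+(\S+)\s+@\s+(.*)")


def summarize_errors(lines):
    """Count ERROR entries per (logger, message) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            logger, message = match.groups()
            counts[(logger, message.strip())] += 1
    return counts


# Sample lines: the message text is from this thread, the timestamps are made up.
sample = [
    '2017-05-03 09:00:00,000 ERROR org.dspace.discovery.SolrServiceImpl @ '
    'No choices plugin was configured for  field "dc_contributor_author".',
    'java.lang.IllegalArgumentException: No choices plugin was configured '
    'for  field "dc_contributor_author".',
    '2017-05-03 09:00:01,000 ERROR org.dspace.discovery.SolrServiceImpl @ '
    'No choices plugin was configured for  field "dc_contributor_author".',
]

for (logger, message), count in summarize_errors(sample).most_common():
    print(count, logger, message)
```

To run it against the real log, pass an open file object (for example the result of open("dspace.log.2017-05-03")) to summarize_errors instead of the sample list.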

Thanks
Bhavesh





dspace_log.txt

Tom Desair

May 3, 2017, 11:00:21 AM
to Bhavesh Patel, Tran Huu Trung (TTTV.ICT), DSpace Technical Support
Hi Bhavesh,

It looks like you have dc.contributor.author metadata values for which the authority column has a value. But your dspace.cfg file doesn't contain any valid authority configuration for dc.contributor.author.

That means you have two options:
  1. Configure authority control for dc.contributor.author: To do this, you have to uncomment the following section https://github.com/DSpace/DSpace/blob/dspace-6.0/dspace/config/dspace.cfg#L1473
  2. OR ignore authorities for the browse index: Add these lines to config/modules/discovery.cfg
discovery.browse.authority.ignore-prefered.author = true
discovery.browse.authority.ignore-variants.author = true

That should make that error disappear and index all items.
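
For option 2, a quick way to confirm that both lines really ended up in the file is sketched below in Python. This is an illustration only: the helper names are hypothetical, the parser handles only simple "key = value" lines, and the sample content is made up. The key names are spelled exactly as given above.

```python
REQUIRED_KEYS = (
    "discovery.browse.authority.ignore-prefered.author",
    "discovery.browse.authority.ignore-variants.author",
)


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def keys_not_enabled(text):
    """Return the required keys that are absent or not set to true."""
    props = parse_properties(text)
    return [k for k in REQUIRED_KEYS if props.get(k, "").lower() != "true"]


# Hypothetical file content with only one of the two lines added.
sample_cfg = """
# discovery.cfg excerpt (made up for illustration)
discovery.browse.authority.ignore-prefered.author = true
"""

print(keys_not_enabled(sample_cfg))
```

An empty list means both settings are present and set to true. In the sample, the ignore-variants key is reported as missing.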

 

Bhavesh Patel

May 3, 2017, 1:25:24 PM
to Tom Desair, Tran Huu Trung (TTTV.ICT), DSpace Technical Support
Dear Tom Desair,

We configured authority control for dc.contributor.author in the dspace.cfg file and ran the indexing again using the following commands:

[Inline image 1]

The commands executed successfully, but the count did not change.

The dspace.log file shows an error message (please see the attached image).

Regards,
Bhavesh





dspace-log.PNG

Tran Huu Trung (TTTV.ICT)

May 3, 2017, 2:13:10 PM
to DSpace Technical Support, tom.d...@atmire.com, tru...@hpu.edu.vn
It seems the error is at line 72 of SolrServiceResourceRestrictionPlugin.java, so please check the policies of these items:

select * from metadatavalue where dspace_object_id in (select distinct dspace_object from resourcepolicy where eperson_id is not null)

You can then add a READ policy for the anonymous group to these older items. The Solr index works well with items that have a READ policy for the anonymous group.


On Thursday, May 4, 2017 at 00:25:24 UTC+7, Bhavesh Patel wrote:

Bhavesh Patel

May 8, 2017, 5:02:54 AM
to Tran Huu Trung (TTTV.ICT), DSpace Technical Support, Tom Desair
Executing the following query returns 5712 records:

select distinct dspace_object from resourcepolicy where eperson_id is not null

We have some collections that don't have any policy, but some records are still not indexed.

After following Tom's suggestion to enable authority control for dc.contributor.author in the dspace.cfg file, the log file no longer shows any error, but the count remains the same (indexing is still incomplete). What could be the issue?

Regards,
Bhavesh




Bhavesh Patel

May 11, 2017, 7:37:33 AM
to Tom Desair, Tran Huu Trung (TTTV.ICT), DSpace Technical Support
Dear Tom Desair,

Thanks for your valuable suggestion. We have now fixed the issue, and the full data is being indexed.

We followed the 2nd option, ignoring authorities for the browse index, by adding these lines to config/modules/discovery.cfg:
discovery.browse.authority.ignore-prefered.author = true
discovery.browse.authority.ignore-variants.author = true


That helped us resolve the issue.

Once again, thanks to the DSpace technical community, and especially Tom, for the help.

Thanks & regards,
Bhavesh



