I recently discovered that I need to run `dspace filter-media` to make the full text of items searchable, but even after doing that I am getting what seem to be inconsistent results. Some items are returned by full-text searches, so their text is clearly indexed. Others inexplicably do not show up when I search for them.
My prime example is item 196004. I run the command for that item and get the following output:
2015-12-10 09:50:35,424 INFO org.dspace.core.ConfigurationManager @ Loading from classloader: file:/E:/dspace/config/dspace.cfg
2015-12-10 09:50:35,455 INFO org.dspace.core.ConfigurationManager @ Using dspace provided log configuration (log.init.config)
2015-12-10 09:50:35,455 INFO org.dspace.core.ConfigurationManager @ Loading: E:/dspace/config/log4j.properties
2015-12-10 09:50:39,658 INFO org.dspace.storage.rdbms.DatabaseManager @ DBMS is 'PostgreSQL'
2015-12-10 09:50:39,658 INFO org.dspace.storage.rdbms.DatabaseManager @ DBMS driver version is '9.4.1'
2015-12-10 09:50:39,736 INFO org.dspace.storage.rdbms.DatabaseUtils @ Loading Flyway DB migrations from: filesystem:E:/dspace/etc/postgres, classpath:org.dspace.storage.rdbms.sqlmigration.postgres, classpath:org.dspace.storage.rdbms.migration
2015-12-10 09:50:39,799 INFO org.flywaydb.core.internal.dbsupport.DbSupportFactory @ Database: jdbc:postgresql://localhost:5432/dspace (PostgreSQL 9.4)
2015-12-10 09:50:39,924 INFO org.dspace.storage.rdbms.DatabaseUtils @ DSpace database schema is up to date
2015-12-10 09:50:40,408 INFO org.dspace.content.MetadataField @ Loading MetadataField elements into cache.
2015-12-10 09:50:40,440 INFO org.dspace.content.MetadataSchema @ Loading schema cache for fast finds
2015-12-10 09:50:43,862 INFO org.dspace.content.Bitstream @ anonymous::create_bitstream:bitstream_id=72780
2015-12-10 09:50:43,877 INFO org.dspace.content.Bundle @ anonymous::add_bitstream:bundle_id=19615,bitstream_id=72780
2015-12-10 09:50:44,174 INFO org.dspace.content.Bitstream @ anonymous::update_bitstream:bitstream_id=72780
2015-12-10 09:50:44,330 INFO org.dspace.content.Bundle @ anonymous::remove_bitstream:bundle_id=19615,bitstream_id=72778
2015-12-10 09:50:44,346 INFO org.dspace.content.Item @ anonymous::update_item:item_id=13562
2015-12-10 09:50:44,346 INFO org.dspace.content.Bitstream @ anonymous::update_bitstream:bitstream_id=72780
2015-12-10 09:50:44,362 INFO org.dspace.content.Bitstream @ anonymous::delete_bitstream:bitstream_id=72778
2015-12-10 09:50:44,377 INFO org.dspace.content.Item @ anonymous::update_item:item_id=13562
2015-12-10 09:50:44,580 INFO org.dspace.content.Bitstream @ anonymous::create_bitstream:bitstream_id=72781
2015-12-10 09:50:44,596 INFO org.dspace.content.Bundle @ anonymous::add_bitstream:bundle_id=19615,bitstream_id=72781
2015-12-10 09:50:44,612 INFO org.dspace.content.Bitstream @ anonymous::update_bitstream:bitstream_id=72781
2015-12-10 09:50:44,627 INFO org.dspace.content.Bundle @ anonymous::remove_bitstream:bundle_id=19615,bitstream_id=72779
2015-12-10 09:50:44,643 INFO org.dspace.content.Item @ anonymous::update_item:item_id=13562
2015-12-10 09:50:44,643 INFO org.dspace.content.Bitstream @ anonymous::update_bitstream:bitstream_id=72781
2015-12-10 09:50:44,643 INFO org.dspace.content.Bitstream @ anonymous::delete_bitstream:bitstream_id=72779
2015-12-10 09:50:44,643 INFO org.dspace.content.Item @ anonymous::update_item:item_id=13562
2015-12-10 09:50:44,643 INFO org.dspace.event.EventManager @
2015-12-10 09:50:48,080 INFO org.dspace.discovery.SolrServiceImpl @ Wrote Item: 123456789/196004 to Index
What gives? I can find the item by its metadata, but not by a fairly distinctive word from the full text of the PDF.
I even ran `discovery-index -bfo` at one point to try to force a full reindex, with the full text having already been built (I think).
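To double-check what the log above says happened to the bitstreams, here is a minimal sketch that groups the logged bitstream IDs by action. It assumes the `anonymous::action:key=value,...` line format shown in the output; the function name `summarize_events` and the regular expression are my own and not part of DSpace.

```python
import re
from collections import defaultdict

# Matches the event part of a log line, e.g.
# "... @ anonymous::add_bitstream:bundle_id=19615,bitstream_id=72780"
# (assumed format, taken from the output pasted above)
EVENT_RE = re.compile(r"@ \S*?::(?P<action>\w+):(?P<params>\S+)")


def summarize_events(log_lines):
    """Return {action: [bitstream ids]} for lines that log a bitstream_id."""
    events = defaultdict(list)
    for line in log_lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # startup messages, "Wrote Item", etc.
        params = dict(
            pair.split("=", 1)
            for pair in match.group("params").split(",")
            if "=" in pair
        )
        if "bitstream_id" not in params:
            continue  # e.g. update_item lines only carry item_id
        bitstream_id = int(params["bitstream_id"])
        if bitstream_id not in events[match.group("action")]:
            events[match.group("action")].append(bitstream_id)
    return dict(events)


# A few lines copied from the output above.
SAMPLE = [
    "2015-12-10 09:50:43,862 INFO org.dspace.content.Bitstream @ anonymous::create_bitstream:bitstream_id=72780",
    "2015-12-10 09:50:43,877 INFO org.dspace.content.Bundle @ anonymous::add_bitstream:bundle_id=19615,bitstream_id=72780",
    "2015-12-10 09:50:44,330 INFO org.dspace.content.Bundle @ anonymous::remove_bitstream:bundle_id=19615,bitstream_id=72778",
    "2015-12-10 09:50:44,346 INFO org.dspace.content.Item @ anonymous::update_item:item_id=13562",
    "2015-12-10 09:50:44,362 INFO org.dspace.content.Bitstream @ anonymous::delete_bitstream:bitstream_id=72778",
    "2015-12-10 09:50:48,080 INFO org.dspace.discovery.SolrServiceImpl @ Wrote Item: 123456789/196004 to Index",
]

if __name__ == "__main__":
    for action, ids in summarize_events(SAMPLE).items():
        print(action, ids)
```

Run against the full output, this should make it easy to see that the bitstreams being created are the ones replacing the deleted ones in the same bundle, which is all the log itself can confirm. It says nothing about whether the extracted text actually contains the word I am searching for.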