[
http://jira.dspace.org/jira/browse/DS-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=10417#action_10417 ]
Mark Diggory commented on DS-208:
---------------------------------
Stuart, you are correct to a degree...
I would be cautious about storing the full text. Unless you're planning on presenting fragments of it in context like Google does, it is going to create a very large index. Indexing is also going to become very memory intensive if you are pulling the full text into strings; we will begin to risk out-of-memory errors if the indexing process is not streamed using readers (I did the rewrite to optimize this when I first started at MIT).
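To illustrate the memory point, here is a minimal plain-Java sketch (no Lucene involved) of consuming text through a Reader with a fixed-size buffer instead of pulling it into one large String. The class name StreamingTokenCount and the whitespace tokenizing are assumptions made up for illustration; this is not what DSpace or a Lucene analyzer actually does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
public class StreamingTokenCount {

    // Counts whitespace-separated tokens while holding only a
    // fixed-size buffer in memory, however large the input is.
    public static long countTokens(Reader reader) {
        char[] buffer = new char[8192];
        long tokens = 0;
        boolean inToken = false; // carries token state across buffer refills
        try {
            int read;
            while ((read = reader.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (Character.isWhitespace(buffer[i])) {
                        inToken = false;
                    } else if (!inToken) {
                        inToken = true;
                        tokens++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a reader over a bitstream's extracted text.
        System.out.println(countTokens(new StringReader("full text of a bitstream"))); // prints 5
    }
}
```

The same shape applies when the Reader wraps a file or a bitstream: memory use is bounded by the buffer, not by the document size.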
You might experiment with using the setter for the value after constructing the Field with some default string, like:
Field field = new Field(name, "junk", Field.Store.YES, Field.Index.TOKENIZED);
field.setValue(reader); // reader is a java.io.Reader over the full text
You might then be able to get away with a stored, tokenized field whose value is parsed from a reader (but I don't know how much more efficient that would be).
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/document/Field.html#setValue(java.io.Reader)
P.S. How are you highlighting when the values presented in the search results are pulled directly from the Item metadata? (That is a loaded question. I'm hoping your answer is: we don't use the Item metadata directly anymore and instead render the Lucene record directly, with hit highlighting present?!) ;-)