On Thu, May 22, 2025 at 07:04:17PM +0000, Keese, Brian W wrote:
> More information... in my test sample of one, just now, I changed
> "textextractor.use-temp-file = true" to
> "textextractor.use-temp-file = false" in dspace.cfg and then the pdf
> text was parsed successfully. I'll dig into the temp file code to see
> if I can nail down the root cause. I'm guessing something about the
> parser plug-in interface has changed.
Interesting. I may try that.
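
For reference, the change described in the quoted message amounts to a one-line edit in dspace.cfg. The property name and values below are taken verbatim from that message; the comment lines are added here for illustration only, and whether a restart or re-index is needed afterwards is not stated in this thread.

```properties
# dspace.cfg
# Was: textextractor.use-temp-file = true
# Setting reported to let the PDF text parse successfully (sample of one):
textextractor.use-temp-file = false
```
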
More data: I fetched tika-app 3.1.0 and opened one of the offending
files. It warns twice about "Empty COSName at offset blah" but has no
trouble reading the file or displaying content.
> On Thursday, May 22, 2025 at 10:32:41 AM UTC-5 mw...@iu.edu wrote: