solr error importing stats


James Holobetz

Sep 11, 2023, 4:46:30 PM
to dspac...@googlegroups.com
I am moving data from our production DSpace 7.6 server to our development DSpace 7.6 server, and I repeatedly receive this error:

holobetj dspace $ dsp /opt/dspace/bin/dspace solr-import-statistics -c
No index name provided, defaulting to "statistics".
Exception: Error from server at http://localhost:8983/solr/statistics: Exception writing document id 01072706-6b8a-420d-9bc0-cc637bce3df4 to the index; possible analysis error: Document contains at least one immense term in field="query" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[117, 110, 101, 120, 105, 115, 116, 105, 110, 103, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46]...', original message: bytes can be at most 32766 in length; got 34396. Perhaps the document has an indexed string field (solr.StrField) which is too large
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/statistics: Exception writing document id 01072706-6b8a-420d-9bc0-cc637bce3df4 to the index; possible analysis error: Document contains at least one immense term in field="query" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[117, 110, 101, 120, 105, 115, 116, 105, 110, 103, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46, 46, 47, 46]...', original message: bytes can be at most 32766 in length; got 34396. Perhaps the document has an indexed string field (solr.StrField) which is too large


Searching the forums here, I have found very few reports of this error:

James Holobetz

Sep 11, 2023, 4:49:56 PM
to dspac...@googlegroups.com

DSpace Technical Support

Sep 12, 2023, 1:12:40 PM
to DSpace Technical Support
Hi James,

I have to admit, I've never seen that error before.  My guess is there's something odd/different (or incorrect) with the data that you are trying to import.  But, I don't know what it could be.  That error mentions the "query" field is the problematic one.  Have you looked at the data you are trying to import to see why that "query" field is so long?  Maybe something is incorrect in that import data, or maybe it's encoded improperly and the script is stumbling over it?

Tim
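
The byte values quoted in the error message are the UTF-8 encoding of the start of the offending term, so decoding them gives a quick look at what the "query" value begins with. A minimal Python sketch, using only the prefix printed in the error above (the function name `decode_prefix` is made up for illustration):

```python
# Prefix of the "immense term", copied from the Solr error message.
prefix_bytes = [
    117, 110, 101, 120, 105, 115, 116, 105, 110, 103,
    47, 46, 46, 47, 46, 46, 47, 46, 46, 47,
    46, 46, 47, 46, 46, 47, 46, 46, 47, 46,
]


def decode_prefix(values):
    """Decode a list of byte values (as printed by Lucene) into text."""
    return bytes(values).decode("utf-8", errors="replace")


print(decode_prefix(prefix_bytes))
```

The decoded text starts with "unexisting" followed by repeated "/.." segments, which is only the first 30 bytes of the 34396-byte value.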

James Holobetz

Sep 12, 2023, 4:15:19 PM
to DSpace Technical Support
I have found the "query" string in question in the particular CSV file that was dumped (solr-export-statistics) from our DSpace 7.6 production machine. I have attached the relevant files in case they offer any clue as to what may be happening.

Thank you,

James

solr_error-mon.txt
immense_field_query_solr_csv_entry.xlsx
solr_query_text.txt

James Holobetz

Sep 12, 2023, 5:56:51 PM
to DSpace Technical Support
During our move from DSpace 6.x to DSpace 7.x we had to combine Solr shards and then use the UUIDfix tool to convert the old DSpace Object IDs to UUIDs. I saved all the CSV files for Solr ingest and went looking through them for clues about the "query" in question. For that query, the solr-export-statistics dump from 6.3 looks different from the solr-export-statistics dump from 7.6.


UUIDfixed-tarball-solr_query_text.txt
UUIDFixed-tarball-immense_field_query_solr_csv_entry.xlsx

DSpace Technical Support

Sep 13, 2023, 12:25:17 PM
to DSpace Technical Support
Hi James,

That long query looks suspiciously like a "directory traversal" attack that someone tried to (unsuccessfully) run against your system in the past using the search page.  For example: https://www.acunetix.com/websitesecurity/directory-traversal/  (Notice how the query you shared had "win.ini" which is a common directory traversal attack attempt)

This sort of attack won't work on DSpace (so there's nothing to worry about). But it might have been logged in your statistics because it was attempted from the DSpace search page.

Overall, my opinion is you may just want to *delete* this entry from your exported CSV.  It doesn't look like a valid statistical entry that you'd want to "count".  It looks like someone was attempting to attack your site (and failing to do so).

Tim
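
One way to remove such an entry from an exported CSV without editing it by hand is to filter out rows whose "query" value exceeds the limit quoted in the error. This is only a sketch under assumptions: it assumes the export has a header row with a column named `query`, the file paths are placeholders, and `filter_csv` is a hypothetical helper, not part of DSpace. It measures the whole cell, which is a conservative check.

```python
import csv

MAX_TERM_BYTES = 32766  # limit quoted in the Solr error message


def is_oversized(value, limit=MAX_TERM_BYTES):
    """True if the UTF-8 encoding of value is longer than limit bytes."""
    return len((value or "").encode("utf-8")) > limit


def filter_csv(src_path, dst_path, field="query", limit=MAX_TERM_BYTES):
    """Copy src_path to dst_path, skipping rows with an oversized field.

    Returns the number of rows that were dropped.
    """
    csv.field_size_limit(10**7)  # allow very long cells to be read
    dropped = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if is_oversized(row.get(field), limit):
                dropped += 1
            else:
                writer.writerow(row)
    return dropped


# Example call (paths are placeholders):
# filter_csv("statistics_export.csv", "statistics_export.filtered.csv")
```

Writing to a new file keeps the original export untouched, so the dropped row can still be inspected afterwards.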

James Holobetz

Sep 13, 2023, 2:06:48 PM
to DSpace Technical Support
After evaluating it, I suspected the same thing. The main issue is that DSpace 7.6, while exporting the statistics, was apparently adding an escape character (\) before each path character (also \), which probably increased the size of the "query" value.

I am just going to delete that record altogether in our production system so that further Solr exports do not produce the same error when syncing our development machine.

Thanks for your help Tim!

James 
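
Deleting a single record from a Solr core can be done with a delete-by-id command sent to the core's update handler. The sketch below only builds the request. It assumes that the "document id" printed in the error is the core's unique key, that the core URL matches the one in the error message, and `build_delete_request` is a hypothetical helper name. Check both assumptions against your own installation before sending anything.

```python
import json
import urllib.request


def build_delete_request(core_url, doc_id):
    """Build a POST request asking Solr to delete one document by id."""
    body = json.dumps({"delete": {"id": doc_id}}).encode("utf-8")
    return urllib.request.Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Values taken from the error message; adjust for the production server.
req = build_delete_request(
    "http://localhost:8983/solr/statistics",
    "01072706-6b8a-420d-9bc0-cc637bce3df4",
)
print(req.full_url)

# To actually send it (not done here):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Taking a backup or export of the core first is a sensible precaution, since the delete is committed immediately.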
