After upgrading AtoM from 2.6 to 2.7.3, we are getting an Elasticsearch error: Caused by: java.lang.NumberFormatException: For input string: "198438,198437"


deepak chauhan

Apr 26, 2024, 11:39:58 AM
to AtoM Users
We recently migrated to AtoM 2.7.3, and our Elasticsearch logs are filling up very quickly. Our environment:
AtoM 2.7.3
PHP 7.4
Elasticsearch 5.6.16
Java 1.8.0_402

We see the following in the logs:
Caused by: java.lang.NumberFormatException: For input string: "198438,198437"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043) ~[?:?]
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) ~[?:?]
    at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_402]
    at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType.hasDecimalPart(NumberFieldMapper.java:890) ~[elasticsearch-5.6.16.jar:5.6.16]
    at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$6.termQuery(NumberFieldMapper.java:605) ~[elasticsearch-5.6.16.jar:5.6.16]
    at org.elasticsearch.index.mapper.NumberFieldMapper$NumberFieldType.termQuery(NumberFieldMapper.java:958) ~[elasticsearch-5.6.16.jar:5.6.16]
    at org.elasticsearch.index.query.TermQueryBuilder.doToQuery(TermQueryBuilder.java:143) ~[elasticsearch-5.6.16.jar:5.6.16]
    at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:96) ~[elasticsearch-5.6.16.jar:5.6.16]
    at org.elasticsearch.index.query.BoolQueryBuilder.addBooleanClauses(BoolQueryBuilder.java:444) ~[elasticsearch-5.6.16.jar:5.6.16]
    at org.elasticsearch.index.query.BoolQueryBuilder.doToQuery(BoolQueryBuilder.java:418) ~[elasticsearch-5.6.16.jar:5.6.16]
    at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:96) ~[elasticsearch-5.6.16.jar:5.6.16]
    at org.elasticsearch.index.query.QueryShardContext.lambda$toQuery$1(QueryShardContext.java:313) ~[elasticsearch-5.6.16.jar:5.6.16]

Dan Gillean

Apr 26, 2024, 3:25:50 PM
to ica-ato...@googlegroups.com
Hi Deepak, 

Searching the forum, I found one other similar instance - it turned out to be caused by an aggressive web crawler hitting the site. See: 
You can do as the original poster did and try to confirm this by looking at the Nginx access logs, and then decide how to proceed - for example, by adding a robots.txt file, or by adding rules in Nginx to block the IP of the problem bot. 
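
One way to check the access logs for a dominant crawler is to tally requests per user agent. The following is a minimal Python sketch, assuming the log uses Nginx's default "combined" format, where the user agent is the last quoted field on each line. The function name, the sample lines, and the log path in the comment are illustrative assumptions, not part of AtoM.

```python
import re
from collections import Counter

# In the "combined" format the user agent is the final double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Return a Counter mapping each user agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines; on a real server, read the log file instead, e.g.
#   with open("/var/log/nginx/access.log") as f:   # path is an assumption
#       counts = count_user_agents(f)
sample = [
    '203.0.113.5 - - [26/Apr/2024:11:00:00 -0400] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [26/Apr/2024:11:00:01 -0400] "GET /index.php/informationobject/browse HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [26/Apr/2024:11:00:02 -0400] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Print the five busiest user agents, most frequent first.
for agent, hits in count_user_agents(sample).most_common(5):
    print(hits, agent)
```

If a single bot accounts for most of the requests around the time the Elasticsearch errors appear, that is a reasonable hint (though not proof) that the crawler is the trigger.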

Otherwise, you can review and update the logging configuration to reduce verbosity, and even set up automatic log rotation in Elasticsearch - here is a link to the relevant docs: 
Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



deepak chauhan

May 2, 2024, 4:25:24 PM
to AtoM Users
Thanks, Dan. According to our access_ssl.log, the bot scraping our site is ClaudeBot/1.0, which does not match what the other user, Cindy, reported: 'elasticsearch log is generated when a web crawler( Go-http-client/1.1) is crawling on the website. ' However, I will try to block this bot using a robots.txt file and see what happens.

Our /var/log/nginx/access_ssl.log shows:
132.216.177.190 - - [02/May/2024:16:10:07 -0400] "GET /index.php/informationobject/browse?subjects=195237&genres=397&sort=endDate&repos=&view=table&levels=223&%253Bcollection=1766&%253BtopLod=0&%253Bsort=alphabetic&sortDir=desc HTTP/1.1" 200 33623 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot'atherate'anthropic)"
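
To see what the crawler is actually requesting, the query string from a log line like the one above can be decoded with Python's standard library. This is only an illustrative sketch, and the helper name is made up for this example. After one round of decoding, several parameter names still contain a percent-encoded `;` prefix (`%3Bcollection`, `%3BtopLod`, `%3Bsort`), which do not look like normal browse parameters.

```python
from urllib.parse import urlsplit, parse_qsl

def decode_request_params(request_target):
    """Split a request target into (path, list of decoded (name, value) pairs)."""
    parts = urlsplit(request_target)
    # keep_blank_values=True preserves empty parameters such as "repos="
    return parts.path, parse_qsl(parts.query, keep_blank_values=True)

# Request target copied from the access log line above.
target = ("/index.php/informationobject/browse?subjects=195237&genres=397"
          "&sort=endDate&repos=&view=table&levels=223&%253Bcollection=1766"
          "&%253BtopLod=0&%253Bsort=alphabetic&sortDir=desc")

path, params = decode_request_params(target)
print(path)
for name, value in params:
    # Names that still contain '%' after one round of decoding were double-encoded.
    flag = "  <-- double-encoded name" if "%" in name else ""
    print(f"{name}={value}{flag}")
```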

So far, we have reduced the log verbosity, and that is working. I'll let you know whether robots.txt helps.

Thanks

Dan Gillean

May 3, 2024, 8:22:32 AM
to ica-ato...@googlegroups.com
Hi Deepak,

Keep in mind that following robots.txt directives is OPTIONAL for bots, unfortunately - meaning that if ClaudeBot is already misbehaving, it may also be the kind of bot that ignores a block request in a robots.txt file. 
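
For reference, a robots.txt rule aimed at this crawler would name its user-agent token. The Python sketch below uses the standard library's `urllib.robotparser` to show how a compliant crawler would interpret such a file. The rules shown and the `OtherBot` name are illustrative assumptions, and nothing here forces a non-compliant bot to obey.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content: deny everything to ClaudeBot only.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "/index.php/informationobject/browse"
# A well-behaved crawler performs this check itself before requesting a URL.
print("ClaudeBot allowed:", parser.can_fetch("ClaudeBot", url))
print("OtherBot allowed:", parser.can_fetch("OtherBot", url))
```

Because the check runs on the crawler's side, the web server has no way to enforce it, which is why server-level blocking is the fallback.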

If you need to block it more aggressively, I shared the following Nginx configuration block in another thread, that our Support team uses when configuring AtoM sites for clients. See: 
There are further details on its use, including how to configure protection against bots, in the README:
You will also find a number of external Nginx configuration examples and articles linked in that README. Here is one more that our Support team told us they have been referencing and exploring a lot recently:
Finally, in another thread another AtoM user shared some helpful suggestions for how to configure bad bot blocking in Nginx - see: 
Cheers, and do let us know how it goes!

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

deepak chauhan

May 8, 2024, 11:57:35 AM
to AtoM Users
Hi Dan,

Thanks for your replies. We have reduced the verbosity of the Elasticsearch logs and blocked the aggressive web crawler bots. 

Just a question out of curiosity:

Could this Elasticsearch issue be caused by a code change, given that we upgraded AtoM from 2.6 to 2.7.3 about 2-3 weeks ago?

Thank you

Dan Gillean

May 8, 2024, 3:02:20 PM
to ica-ato...@googlegroups.com
Hi Deepak, 

If the suggested fixes work, then the issue is not code-based, since bot blocking and log settings are configuration changes and not underlying application code changes. 

I am seeing a lot of reports about this after upgrades, but I think this has to do with the fact that these bots are web crawlers - it's likely that if changes are detected on a site, they will recrawl it to update whatever index they are feeding. Nevertheless, we will keep an eye on this! So far, others with similar reports have confirmed that updating Nginx and blocking problem bots has resolved the issues. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him
