Can anybody suggest what I could do about this?

gi...@shaw.ca

Jul 7, 2022, 1:04:49 PM
to islandora
Thanks folks.
Solr Error.png

ysu...@berklee.edu

Jul 8, 2022, 9:23:48 AM
to islandora
Hello,

I only know the basics of Solr, but I wanted to suggest that you provide additional info about your installation and the issue you are having, in case someone else can help.

For example...

1) what is your Islandora version?

2) server operating system?

3) what steps does it take to reproduce the issue?

4) did you change anything recently in your configurations or upgrade any components?

Good luck,
Yamil

gi...@shaw.ca

Jul 8, 2022, 5:05:46 PM
to islandora
Islandora 7.61
Ubuntu server 14.04.6
No steps to reproduce; I just noticed the error.
No changes that I know of, but that does not mean the repository team did not import some MODS records or something.

Nobody has reported any problems, and the site seems to be working fine. I caught this during one of my routine checks.

Thanks

gi...@shaw.ca

Jul 19, 2022, 1:24:27 PM
to islandora
Hi folks,

Just an update: the errors seem to happen in the wee hours of the morning. They are definitely not caused by any user input.

Attached is the catalina.out excerpt.

Can somebody suggest how I could write a Solr query to find the offending record?

Thanks
SOLR.txt

Don Richards

Jul 20, 2022, 2:17:23 PM
to islandora
This behavior and query string look like a bot scraping your site via Solr. I experienced this a few years back. There are a couple of options for preventing this activity. You can use an automated solution like the Apache Ultimate Bad Bot Blocker (https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker), which should block the bot from interacting with the Solr server.

The other option is fail2ban (overview at https://www.booleanworld.com/blocking-bad-bots-fail2ban/), which blocks potentially malicious queries. It takes a lot of manual filter creation up front, but it's my preferred solution.
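
Fail2ban works by matching log lines against a filter regex and banning IPs that match too often. The minimal Python sketch below only illustrates that idea against an Apache access log; it is not fail2ban itself. The search path `/islandora/search` and the hit threshold are assumptions to tune for your site, and unlike fail2ban's `findtime` setting, this sketch ignores the time window.

```python
import re
from collections import Counter

# Apache common/combined log lines start with: IP, identd, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (?P<path>\S+)')


def suspicious_ips(lines, path_prefix="/islandora/search", max_hits=100):
    """Return {ip: hits} for IPs that requested path_prefix more than max_hits times.

    path_prefix and max_hits are assumptions; adjust them to your site's traffic.
    """
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").startswith(path_prefix):
            hits[m.group("ip")] += 1
    return {ip: n for ip, n in hits.items() if n > max_hits}


# Made-up sample lines (documentation IP ranges), not taken from the thread.
sample = [
    '203.0.113.5 - - [20/Jul/2022:03:12:01 -0700] "GET /islandora/search/*:*?type=dismax HTTP/1.1" 200 512',
] * 3 + ['198.51.100.7 - - [20/Jul/2022:03:12:05 -0700] "GET /node/1 HTTP/1.1" 200 100']

print(suspicious_ips(sample, max_hits=2))  # {'203.0.113.5': 3}
```

In practice you would feed it `open("/var/log/apache2/access.log")` (path depends on your distribution) and then decide whether the flagged IPs belong in a firewall or fail2ban rule.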

Otherwise, if the error is caused by something like a cron update, it might indicate a control character in a field that can't handle it. I'd need more information. Can you get to the Solr dashboard to run a query directly? If so, can you run a query for "*:*" and look at the Solr output for an object that has the mods_genre_ms field filled in?
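
To follow up on the control-character idea, the Python sketch below builds a query for documents where `mods_genre_ms` has any value (Solr's `field:[* TO *]` range syntax) and scans the returned values for control characters that XML 1.0 forbids. It assumes Solr's select handler is at `http://localhost:8080/solr/select`, that it returns JSON with `wt=json`, and that the identifier field is named `PID`; all three are assumptions to check against your installation. It can only find characters that actually made it into the index.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Solr under Tomcat on port 8080 with its default core.
SOLR_SELECT = "http://localhost:8080/solr/select"

# Control characters that XML 1.0 does not allow (tab, LF and CR are allowed).
BAD_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def docs_with_field_url(field, rows=100, base=SOLR_SELECT):
    """URL for a query returning documents in which `field` has any value."""
    params = {"q": f"{field}:[* TO *]", "fl": f"PID,{field}", "rows": rows, "wt": "json"}
    return f"{base}?{urlencode(params)}"


def control_chars(value):
    """Return the forbidden control characters found in one field value."""
    return BAD_CHARS.findall(value)


def find_bad_docs(field="mods_genre_ms", rows=100):
    """Yield (PID, value) pairs whose value contains a forbidden control character."""
    with urlopen(docs_with_field_url(field, rows)) as resp:
        docs = json.load(resp)["response"]["docs"]
    for doc in docs:
        values = doc.get(field, [])
        if isinstance(values, str):  # single-valued fields come back as a plain string
            values = [values]
        for value in values:
            if control_chars(value):
                yield doc.get("PID"), value
```

Run it on the server with `for pid, value in find_bad_docs(): print(pid, repr(value))`. Raise `rows`, or page through results with Solr's `start` parameter, to scan more than 100 documents.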

Also, you'll usually get feedback much more quickly on the Islandora Slack channel, if you're interested: https://islandora.slack.com

gi...@shaw.ca

Jul 21, 2022, 6:47:25 PM
to islandora
Thanks for that. Would I be correct in presuming that, if it is indeed a bot scraping my site, it does so via port 8080? In that case, could I block it with the local firewall, or block it altogether?
Yes, I can get to the Solr dashboard.
Attached is a screenshot of my query window. Can you suggest how to use it? I am not too familiar with Solr.
Thanks a lot.
solr.png
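
One way to check the port 8080 question is to test, from a machine outside the server, whether that port accepts TCP connections at all. This is a minimal Python sketch, assuming Solr runs under Tomcat on port 8080 (suggested by the catalina.out log, but not confirmed in the thread).

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False
```

For example, `port_open("repository.example.org", 8080)` (placeholder hostname) run from outside your network. If it returns True, Tomcat/Solr is publicly reachable; when Solr and Drupal share a host, Drupal typically only needs local access to Solr, so that port is a candidate for a firewall rule.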

Don Richards

Jul 21, 2022, 9:02:06 PM
to islandora
The query should be coming in through Drupal on port 80, and I would block them altogether. To use Solr's dashboard, execute the query with *:* to get a list of objects. It might help to increase the number of results by changing the "rows" field from its default of 10 to 100. You can replace *:* with a Solr field name, a colon, and a field value, in the form field:value. The asterisk is a wildcard character, so *:* returns objects from the Solr index without filtering them; how many it returns is set by the "rows" field, which defaults to 10.
So I usually start with *:* and increase the rows to 100. You can also query just one object to keep the page simple to read. For instance, if there's a Solr field "PID" with the value "42", replace *:* with PID:42 (without quotes) and execute the query. It should return the one object you were looking for.
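
Behind the scenes, the dashboard sends an HTTP request to Solr's select handler with the query in the `q` parameter and the result count in `rows`. The Python sketch below builds the same kind of URL, assuming the handler lives at `http://localhost:8080/solr/select`. It also escapes special characters in the value: Islandora 7 PIDs usually carry a namespace (the `islandora:42` below is illustrative), and that colon must be escaped, or the value quoted, so Solr matches it literally.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8080/solr/select"  # assumed endpoint

# Characters with special meaning in Solr/Lucene query syntax, plus whitespace.
SPECIAL = set('+-&|!(){}[]^"~*?:\\/ ')


def escape_value(value):
    """Backslash-escape query-syntax characters so the value is matched literally."""
    return "".join("\\" + c if c in SPECIAL else c for c in value)


def select_url(field=None, value=None, rows=10, base=SOLR_SELECT):
    """Build a select URL; with no field/value it queries *:* (all documents)."""
    q = "*:*" if field is None else f"{field}:{escape_value(value)}"
    return f"{base}?{urlencode({'q': q, 'rows': rows, 'wt': 'json'})}"


print(select_url(rows=100))
# http://localhost:8080/solr/select?q=%2A%3A%2A&rows=100&wt=json
print(select_url("PID", "islandora:42"))
# http://localhost:8080/solr/select?q=PID%3Aislandora%5C%3A42&rows=10&wt=json
```

Pasting either URL into a browser on the server should return the same results as the dashboard, which makes it easy to save or script a query once it works.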

That dashboard page is used for testing queries and/or seeing what a Solr query would return.

gi...@shaw.ca

Jul 27, 2022, 2:03:28 PM
to islandora
Thanks!