Solr REST no longer likes the ":" colon or "\" character unencoded


Brian Gillingham

Jan 22, 2018, 3:35:33 PM
to islandora-dev
After a round of system patches was applied to our servers, we noticed that some Solr queries no longer worked.  Those requests now result in a 400 error.  The failure occurs whenever a value contains a ":" character that is not encoded as "%3A", or a "\" backslash that is not encoded as "%5C".

The only packages that were patched on our Solr server were:
  • rhncfg-5.10.99-1.0.3.el6.noarch
  • rhncfg-actions-5.10.99-1.0.3.el6.noarch
  • webmin-1.870-1.noarch
  • rhncfg-client-5.10.99-1.0.3.el6.noarch
I did not perform these patches.  They were all a part of "critical security updates" as noted by our Systems Admins.

We used to be able to send Solr REST queries (or use a browser) like:
  {solr_url}?PID:(pitt\:*) AND RELS_EXT_isMemberOfSite_uri_ms:info\:fedora\/pitt\:site.historic-pittsburgh

They only work now when the backslash and colon are encoded as %5C%3A.
  {solr_url}?PID:(pitt%5C%3A*) AND RELS_EXT_isMemberOfSite_uri_ms:info%5C%3Afedora%5C/pitt%5C%3Asite.historic-pittsburgh
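
One way to avoid hand-encoding is to let a client library do it. The following is a minimal Python sketch, not a confirmed fix. It assumes the query is sent in Solr's `q` parameter and uses a placeholder base URL; neither appears in the post above, and `build_select_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url: str, query: str) -> str:
    """Return base_url with the Solr query percent-encoded in the q parameter."""
    # urlencode percent-encodes ":" as %3A and "\" as %5C (spaces become "+").
    return f"{base_url}?{urlencode({'q': query})}"


# Placeholder URL (assumption); substitute the real Solr select handler.
SOLR_SELECT = "http://localhost:8080/solr/select"

# The query keeps its Lucene escapes (\:); only the transport encoding changes.
raw_query = (
    r"PID:(pitt\:*) AND "
    r"RELS_EXT_isMemberOfSite_uri_ms:info\:fedora\/pitt\:site.historic-pittsburgh"
)
url = build_select_url(SOLR_SELECT, raw_query)
print(url)

# Decoding the URL gives back the original query string unchanged.
assert parse_qs(urlsplit(url).query)["q"] == [raw_query]
```

This encodes more characters than strictly required (parentheses and the asterisk, for example), which is harmless because the server decodes them before Solr parses the query.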


Has anybody had this issue with the encoded colon and backslash characters being required by Solr?

thank you,

Brian Gillingham

University of Pittsburgh | University Library System

natk...@gmail.com

Jan 22, 2018, 4:27:27 PM
to islandora-dev

dp...@metro.org

Jan 22, 2018, 6:51:25 PM
to islandora-dev
Brian, 

Are those calls made manually or by custom code outside vanilla Islandora, or have you found core Islandora modules that neglect to encode special characters? If so, can you point us to where?

We have found some issues with recent updates and the different ways Tomcat encodes (or does not encode) URLs and arguments. We recently moved our Islandora deployment strategy to Tomcat 8 and Solr 5.5.5 (running on its own server, Jetty), and we have had no issues other than adjusting to the new world that Solr 5.5.5's API opens for schema management, and to how strict it is. We will move all our servers to these new versions soon because the new setup is simply faster, and because 4.x, which I like, can't handle GIS as well as Solr 5.x. I know Giancarlo Birello (CNR) has also deployed Solr 4.10 as a standalone server (see his v2p2 guide) with very good results. That could help you circumvent Tomcat's new rules and security updates, if those are the issue and you can't solve the problem by fixing the offending code. (Offending because URL encoding is the right way of doing things, but for a long time we were used to letting browsers and servers deal with it for us.)

By the way, this whole URL encoding/decoding thing finally broke (*exploding sound*) adore-djatoka for us on Tomcat 8. I tried for weeks to make OpenURI resolving work, without success. Maybe I was just too tired, but I tried every combination of URL encoding, connector, etc. before I gave up. So those machines now run the imperfect but working Cantaloupe image server. A different symptom of a similar illness.

Best

Diego

Giancarlo Birello

Jan 23, 2018, 8:53:11 AM
to island...@googlegroups.com, dp...@metro.org

On 23/01/2018 00:51, dp...@metro.org wrote:
Brian, 

Are those calls done manually/custom/ outside vanilla islandora or have you found core Islandora modules that neglect encoding special characters? If so can you point us to where?

We have found some issues with some recent updates and different ways Tomcat has to encode (or not) URLs and arguments. We recently moved our Islandora deployment strategy to Tomcat 8 
and Solr 5.5.5 (using its own server, jetty) and we have had no issues other than the new world that Solr 5.5.5 's API opens for schema management and how strict it is. We will move all our servers to this new versions soon because it is just faster and 4.x which i like, can't handle GIS as good as Solr 5.x. I know Giancarlo Birello (CNR) has deployed Solr 4.10 as a standalone server also (see his v2p2 guide) with very good results, so that could help you to circumvent Tomcat's new rules and security updates if those are the issue if you can't solve the problem by fixing the offending code (offending because URL encoding is the right way of doing stuff, but we were used for a long time to let browsers and servers to deal with that for us).
The updated guide is here http://dev.digibess.it/doku.php?id=reloaded .


 By the way, this whole URL encoding/decoding thing broke *xploding sound* finally adore-djatoka for us on Tomcat 8. I tried for weeks to make OpenURI resolving to work without success or I was just too tired, but I tried every combination or URL encoding, connector, etc. I gave up. So those machines now run the imperfect but working Cantaloupe image server. A different symptom of a similar illness.
I can confirm that the 3-server architecture (Fedora/Islandora/Cantaloupe) is working well, completely adore-djatoka free (= without adore-djatoka).
The guide linked above covers the installation steps.
There were some issues upgrading Cantaloupe from 3.3.x to 3.4; they are solved here: http://dev.digibess.it/doku.php?id=reloaded:is_cantsa . You also need islandora_openseadragon HEAD + openseadragon 2.3.1 (see http://dev.digibess.it/doku.php?id=reloaded:is_req).

Best
Have a nice IIIF :D

Giancarlo

Diego

On Monday, January 22, 2018 at 4:27:27 PM UTC-5, natk...@gmail.com wrote:
This issue seems to be similar to the one we ran into here: https://groups.google.com/forum/#!searchin/islandora-dev/Natkeeran%7Csort:date/islandora-dev/5VHC3k53obo/9lS77FI1AwAJ




Brian Gillingham

Jan 23, 2018, 9:24:03 AM
to islandora-dev
Diego,

From the day I started working with Islandora here at Pitt, the browser query has been a tool I've used for all kinds of things... there are thousands of URLs in my browser history that used the unencoded but escaped colon characters... but now they only work if I encode the colon and encode the backslash.

There are no modules that are blowing up (to my knowledge) with respect to this, but I did have a CRON job that ran a Solr query in order to tabulate daily stats on our collections and models (like # of images per day, etc).  That was not encoded before, but now it works since I changed the "\:" to "%5C%3A".
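
A script like that stats job deals with two separate layers: Solr's own query escaping (the backslash before the colon) and URL encoding of the finished query. Below is a minimal Python sketch of both steps. The helper names are hypothetical, the `q` parameter is an assumption, and the character list follows the special characters of the Lucene/Solr standard query parser as I understand them; adjust it if your parser differs.

```python
import re
from urllib.parse import urlencode

# Characters treated as special by the Lucene/Solr standard query parser
# (assumed list; && and || are covered by escaping each & and | separately).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_solr_value(value: str) -> str:
    """Backslash-escape query-parser special characters in a literal value."""
    return _SPECIAL.sub(r"\\\1", value)


def encoded_query(field: str, value: str) -> str:
    """Build an escaped field:value query, then URL-encode it as q=..."""
    return urlencode({"q": f"{field}:{escape_solr_value(value)}"})


print(encoded_query("PID", "islandora:19"))
# prints: q=PID%3Aislandora%5C%3A19
```

Keeping the two steps in separate functions means the escaped query stays readable in logs, while the encoding is applied only at the moment the request is built.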

I don't know what patch could have even caused this, but I suspect that something inside Java (with respect to Connector) is not the same.

In the Solr's Tomcat server.xml configuration file, the comment above the Connector states: 

A "Connector" represents an endpoint by which requests are received and responses are returned. Documentation at :

         Java HTTP Connector: /docs/config/http.html (blocking & non-blocking)

         Java AJP  Connector: /docs/config/ajp.html

         APR (HTTP/AJP) Connector: /docs/apr.html

         Define a non-SSL HTTP/1.1 Connector on port 8080

 

Could this all be caused by the “Java HTTP Connector”, maybe “Java AJP  Connector”: /docs/config/ajp.html, or “APR (HTTP/AJP) Connector”: /docs/apr.html (whatever these are)?


What seems to make all of this a moot point is a note on the Solr wiki's Tomcat page (https://wiki.apache.org/solr/SolrTomcat), which indicates that Solr does not need this setting:

“Solr4.1 Solr now parses request parameters (in URL or sent with POST using content-type application/x-www-form-urlencoded) in its dispatcher code. It no longer relies on special configuration settings in Tomcat or other web containers to enable UTF-8 encoding, which is mandatory for correct Solr behaviour.”


I tested this out on my VM.  Its Tomcat configuration included the URIEncoding="UTF-8" parameter.  Both http://infost01-02.library.pitt.edu:8080/solr/select?q=PID:islandora%5C%3A19 and http://infost01-02.library.pitt.edu:8080/solr/select?q=PID:islandora\:19 return valid results.  After removing the parameter from the server configuration and restarting Tomcat, BOTH queries still work.  I cannot reproduce the issue on my VM, which could have something to do with the VM being a single server where Solr and Fedora are literally running on the same Java.

 

The same version of Solr is running in my VM.  The Connector for the VM is set to:

      <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" enableLookups="true" acceptCount="100" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" URIEncoding="UTF-8"/>

 

On pa-solr-01, the Connector value is set to:

    <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
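
To see at a glance how the two Connector definitions differ, the attributes can be compared programmatically. This is a small Python sketch using only the two Connector lines quoted in this message; `connector_diff` is a hypothetical helper, and the comparison says nothing about which attribute (if any) is responsible for the 400 errors.

```python
import xml.etree.ElementTree as ET

# Connector elements copied from the VM and from pa-solr-01, as quoted above.
VM_CONNECTOR = (
    '<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" '
    'redirectPort="8443" enableLookups="true" acceptCount="100" maxThreads="150" '
    'minSpareThreads="25" maxSpareThreads="75" URIEncoding="UTF-8"/>'
)
PROD_CONNECTOR = (
    '<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" '
    'redirectPort="8443" />'
)


def connector_diff(a_xml: str, b_xml: str) -> dict:
    """Map each attribute that differs to its (a, b) values; None means absent."""
    a = ET.fromstring(a_xml).attrib
    b = ET.fromstring(b_xml).attrib
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


for name, (vm_value, prod_value) in connector_diff(VM_CONNECTOR, PROD_CONNECTOR).items():
    print(f"{name}: VM={vm_value!r} pa-solr-01={prod_value!r}")
```
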

 

I will update this thread when I have more actionable facts.


Thanks again Diego.


Brian Gillingham

Nitish Sainani | CrewKarma

Feb 11, 2020, 7:20:04 AM
to islandora-dev
http://localhost:8983/solr/industry/select?q=*:*&fq=industry_name:("adver%5C")&wt=json

"error":{
  "metadata":[
    "error-class","org.apache.solr.common.SolrException",
    "root-error-class","org.apache.solr.parser.TokenMgrError"],
  "msg":"org.apache.solr.search.SyntaxError: Cannot parse 'industry_name:(\"adver\\\")': Lexical error at line 1, column 25.  Encountered: <EOF> after : \"\\\"adver\\\\\\\")\"",
  "code":400}}

I changed it, but the error is still occurring. Does anyone have any clues?
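
A possible reading of this error, offered as an assumption rather than a confirmed diagnosis: `%5C` decodes to a backslash, and inside a quoted phrase the query parser treats `\"` as an escaped quote, so the phrase is never closed and the parser runs into the end of the input. If the goal is to match a literal trailing backslash, the backslash itself would need escaping (`\\`, sent as `%5C%5C`). A minimal Python sketch, where `quoted_phrase` is a hypothetical helper:

```python
from urllib.parse import urlencode


def quoted_phrase(value: str) -> str:
    """Wrap value in double quotes, escaping backslashes and embedded quotes."""
    # Escape backslashes first so the escapes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


value = "adver\\"  # the five letters followed by one literal backslash
fq = "industry_name:(" + quoted_phrase(value) + ")"
params = urlencode({"q": "*:*", "fq": fq, "wt": "json"})

print(fq)      # the filter query as Solr's parser will see it
print(params)  # the URL-encoded query string to append after "select?"
```
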