RSolr and String encoding

Anders

Nov 5, 2009, 12:27:02 PM
to rsolr

Is submitting queries using UTF-8-encoded strings as parameters
supported? I'm asking as I'm seeing this error when querying from the
console:

/Users/anders/.rvm/gems/ruby/1.9.1/gems/rsolr-0.9.6/lib/rsolr/http_client.rb:105: warning: regexp match /.../n against to UTF-8 string

The message is repeated three times. As the path in the message shows,
I'm using RSolr 0.9.6 on Ruby 1.9.1. I suspect that the Ruby version
is what is causing the trouble.

Any hints are greatly appreciated.

Anders

matt mitchell

Nov 5, 2009, 10:01:18 PM
to rsolr
Hi Anders,

Looks like String#size for multibyte chars is different between 1.8
and 1.9. I've added a fix in the latest RSolr and just released it as
0.9.7.1:

http://gemcutter.org/gems/rsolr/versions/0.9.7.1

Try that out and let me know how it works for you. Thanks for the
report!

Matt
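
For readers following along, here is a minimal sketch of the String#size difference mentioned above. On Ruby 1.8 String#size counts bytes, on 1.9 and later it counts characters, while String#bytesize counts bytes on both. The escape_param helper is hypothetical and only illustrates byte-wise percent-encoding of a UTF-8 query parameter; it is not RSolr's actual code, and the thread does not say what the 0.9.7.1 fix looks like.

```ruby
# Hypothetical helper for illustration only; this is not RSolr's code.
# It percent-encodes every byte outside a small unreserved set, working on
# a binary copy of the string so multibyte characters are handled per byte.
def escape_param(value)
  bytes = value.to_s.dup.force_encoding(Encoding::BINARY)
  bytes.gsub(/[^a-zA-Z0-9_.~-]/n) { |byte| format("%%%02X", byte.ord) }
end

query = "s\u00F8gning"      # Danish for "search"; U+00F8 takes two bytes in UTF-8

puts query.size             # 7 on Ruby 1.9+ (characters)
puts query.bytesize         # 8 (bytes)
puts escape_param(query)    # s%C3%B8gning
```

Matching with the /n flag against a binary copy avoids the "regexp match /.../n against to UTF-8 string" warning, because the string no longer claims to be UTF-8.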

Anders Johannsen

Nov 6, 2009, 5:14:01 AM
to rs...@googlegroups.com

Hi Matt,

Thank you for the speedy reply. There seems to be a problem with the new gem, perhaps a missing file somewhere?

ruby-1.9.1-p243 > require 'rsolr'
LoadError: no such file to load -- rsolr/client
from /Users/anders/.rvm/gems/ruby/1.9.1/gems/rsolr-0.9.7.1/lib/rsolr.rb:60:in `<module:RSolr>'
from /Users/anders/.rvm/gems/ruby/1.9.1/gems/rsolr-0.9.7.1/lib/rsolr.rb:9:in `<top (required)>'
from (irb):1:in `require'
from (irb):1
from /Users/anders/.rvm/ruby-1.9.1-p243/bin/irb:15:in `<main>'

Loading fails in exactly the same way on Ruby 1.8.7.

Anders




matt mitchell

Nov 6, 2009, 8:46:23 AM
to rsolr
Whoops, sorry about that. I just published 0.9.7.2
(http://gemcutter.org/gems/rsolr/versions/0.9.7.2), which should fix
that problem.

Thanks again,
Matt

Anders Johannsen

Nov 9, 2009, 5:05:52 AM
to rs...@googlegroups.com

Thank you. That certainly gets rid of both the error and the warnings. Pursuing the Ruby 1.9.1 compatibility quest further, I discovered that the responses from the rsolr-lib are always encoded in "Encoding:US-ASCII" even though the HTTP response headers state that the response text should be interpreted as "text/plain; charset=utf-8".

This may be an issue with the underlying http client though. As far as I can tell, net/http does not try to be clever about the string encoding of the response and just returns the default Encoding:ASCII-8BIT. If we are reasonably sure that a response from Solr is always utf-8 encoded - which I suspect - it will be safe to do something like

response.force_encoding(Encoding::UTF_8) if response.respond_to?(:force_encoding)

on the output from net/http in rsolr. But I'm not well enough acquainted with Solr to tell if that is really the case.

Anders
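
A slightly fuller sketch of the idea in the message above, for illustration. The decode_body helper is hypothetical and is not part of RSolr or net/http. It assumes the response body arrives as a binary (ASCII-8BIT) string, reads the charset from the Content-Type header when one is given, and otherwise falls back to UTF-8, which is the assumption under discussion rather than a confirmed property of Solr.

```ruby
# Hypothetical helper; not part of RSolr or net/http.
# Relabels a binary response body with the charset named in the
# Content-Type header, falling back to UTF-8 (an assumption).
def decode_body(body, content_type)
  # Ruby 1.8 strings carry no encoding, so there is nothing to relabel.
  return body unless body.respond_to?(:force_encoding)

  charset = content_type.to_s[/charset="?([^;"\s]+)/i, 1] || "UTF-8"
  encoding = begin
    Encoding.find(charset)
  rescue ArgumentError
    Encoding::UTF_8 # unknown charset name: fall back to the assumed default
  end

  text = body.dup.force_encoding(encoding)
  # Keep the untouched bytes if they are not valid in the claimed encoding.
  text.valid_encoding? ? text : body
end

# Simulated body: the UTF-8 bytes for "caf" plus e-acute, labelled as binary.
raw = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)
text = decode_body(raw, "text/plain; charset=utf-8")

puts raw.length      # 5 (bytes)
puts text.encoding   # UTF-8
puts text.length     # 4 (characters)
```

force_encoding only changes the label on the string and leaves the bytes alone, so the valid_encoding? check guards against mislabelling a body that is not actually in the claimed charset.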

matt mitchell

Nov 9, 2009, 9:38:51 AM
to rsolr
Hi Anders. I'll look into this as soon as I get a chance, maybe
tonight sometime. Thanks again,

Matt
