How does Redis deal with non-English characters?

LO Yi

Aug 12, 2016, 9:39:12 AM
to Redis DB
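I am using Sinatra + Redis to build a tiny personal blog. Basically it has two pages:

  1. The page '/' shows people my posts.
  2. The page '/write' is the back end where I write my posts.

The code looks like this:
<https://lh3.googleusercontent.com/-lS9BpBOla5U/V62yMlhskyI/AAAAAAAAABs/LSWqX5VDP6EzPSWtH_-nJNKdDdH4upDJgCLcB/s1600/0DFF426E-B9ED-4A49-BE19-215830F6D908.png>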


Line 21 is where my Chinese post is. As I understand it, Redis cannot save the original Chinese text in the database and stores an encoded form instead.

For example, when I type in "哈哈", Redis shows it as "\xe5\x93\x88\xe5\x93\x88".

So when I read this value back from Redis with the code on line 21, Ruby does not seem to realize that it is Chinese, and I get the error "incompatible character encodings: ASCII-8BIT and UTF-8"
<http://stackoverflow.com/questions/5286117/incompatible-character-encodings-ascii-8bit-and-utf-8>

I have tried putting "#encoding:UTF-8" on the first line; it doesn't work.
I have also tried putting set :default_encoding, "utf-8" in my configure block, and that doesn't work either.
I found that calling .force_encoding('utf-8') on the value solves the problem, but I think there should be a better way. Otherwise I have to call this method on every value that I get from Redis.

Thanks in advance!

   

Jan-Erik Rediger

Aug 12, 2016, 9:58:03 AM
to redi...@googlegroups.com
First off: Redis doesn't care. All it sees are some bytes; it doesn't
know anything about encodings.

What you are seeing is how your client library deals with it. At least
the pure-Ruby connection will force an encoding back onto the string ([1] [2]).
(I'd like to get all the encoding stuff out of redis-rb though, so that it always just returns bytes.)

Why it fails for you could be that something goes wrong in between, that
your external encoding is different, or any of a dozen other reasons.

If you know that you will get UTF-8, just force it to be interpreted as UTF-8.
You can always write a wrapper method to deal with it.


[1]: https://github.com/redis/redis-rb/blob/master/lib/redis/connection/ruby.rb#L396
[2]: https://github.com/redis/redis-rb/blob/master/lib/redis/connection/command_helper.rb#L35
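
A wrapper along those lines might look like the sketch below. The names Utf8Redis and FakeClient are made up for illustration and are not part of redis-rb; FakeClient only stands in for a real client so the example runs without a server. It assumes every stored value really is UTF-8.

```ruby
# Hypothetical wrapper: tags every string read from the client as UTF-8.
# It does not change the bytes, it only changes how Ruby interprets them.
class Utf8Redis
  def initialize(client)
    @client = client
  end

  def get(key)
    value = @client.get(key)
    return value unless value.is_a?(String)
    # dup first so a frozen or shared string is never modified in place
    value.dup.force_encoding(Encoding::UTF_8)
  end
end

# Stand-in for a real Redis client (assumption: it only needs #get here).
class FakeClient
  def initialize(data)
    @data = data
  end

  def get(key)
    @data[key]
  end
end

# The same six bytes shown in the question, tagged as binary (ASCII-8BIT).
raw = "\xE5\x93\x88\xE5\x93\x88".b

store = Utf8Redis.new(FakeClient.new({ "content:1" => raw }))
post = store.get("content:1")

puts post.encoding
puts post.valid_encoding?
```

With a real client you would pass Redis.new instead of the FakeClient, and the rest of the application would only ever call the wrapper.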

LO Yi

Aug 12, 2016, 2:57:14 PM
to Redis DB, jan...@fnordig.de
I think this has to do with the environment. I created a small "test.rb" and ran it on both my server and my local machine.

require 'redis'

redis = Redis.new
redis.set("content","哈哈")
puts redis.get("content").encoding


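Guess what? On my Debian (4.8.4 -1) server I got the output in this screenshot:
<https://lh3.googleusercontent.com/-2uDArO8TOuU/V64bL0SCvEI/AAAAAAAAACA/4QdlmVKzlhglAwAJI_FRDjnOS27fq5QEwCLcB/s1600/303F1718-AEBA-4520-A579-A35086D9CEDA.png>

However, when I ran it on my local OS X 10.11.5 machine, the output was UTF-8.

I am confused about why the two differ.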

Jan-Erik Rediger

Aug 12, 2016, 3:06:14 PM
to LO Yi, Redis DB
Because your shell environment is different.

Small reminder: post shell output/error messages/commands as text, not
as images. It's accessible, easy to read and searchable.
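
To see what Ruby derived from the shell environment, something like the following sketch can be run on both machines. It assumes the client tags replies with Encoding.default_external, which you should check against your redis-rb version; the last line is an optional override for when you know all of your data is UTF-8.

```ruby
# Print the locale variables Ruby saw at startup and the encodings it chose.
puts "LANG=#{ENV['LANG'].inspect} LC_ALL=#{ENV['LC_ALL'].inspect}"
puts "default_external: #{Encoding.default_external}"
puts "default_internal: #{Encoding.default_internal.inspect}"

# Optional override, independent of the shell environment.
# Do this once, early, before any strings are read from outside.
Encoding.default_external = Encoding::UTF_8
puts "default_external now: #{Encoding.default_external}"
```

If the first default_external line differs between the two machines, that difference comes from the locale settings and not from Redis.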

LO Yi

Aug 13, 2016, 11:55:44 AM
to Redis DB, yiiiii...@gmail.com, jan...@fnordig.de
Haha, this is my first time asking for help on a mailing list, so thank you for the reminder. Could you shed some more light on this shell environment? I ran this command in my terminal:

LANG="UTF-8"


And it looks like everything is set already:

xxx@XXX:~# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=UTF-8
LANGUAGE=
LC_CTYPE=UTF-8
LC_NUMERIC="UTF-8"
LC_TIME="UTF-8"
LC_COLLATE="UTF-8"
LC_MONETARY="UTF-8"
LC_MESSAGES="UTF-8"
LC_PAPER="UTF-8"
LC_NAME="UTF-8"
LC_ADDRESS="UTF-8"
LC_TELEPHONE="UTF-8"
LC_MEASUREMENT="UTF-8"
LC_IDENTIFICATION="UTF-8"
LC_ALL=

I don't know much about the shell, so I don't know whether this is the correct way to set up the environment. In any case, the problem is still there.
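
One possible reading of the "Cannot set LC_CTYPE to default locale" lines above is that "UTF-8" on its own is not a locale name this system knows. On Debian, locale names usually have the form language_TERRITORY.codeset, for example en_US.UTF-8, and the locale also has to be generated on the machine. The sketch below only checks the shape of the names; the helper utf8_locale_name? and its pattern are made up for illustration and cannot tell whether a locale is actually installed.

```ruby
# Hypothetical check: does a value look like "en_US.UTF-8" or "C.UTF-8"?
# This is a shape check only and deliberately narrow.
LOCALE_WITH_UTF8 = /\A(?:C|[a-z]{2,3}_[A-Z]{2})\.(?:UTF-8|utf8)\z/

def utf8_locale_name?(value)
  !value.nil? && LOCALE_WITH_UTF8.match?(value)
end

# Report the variables that usually decide the character encoding.
%w[LC_ALL LC_CTYPE LANG].each do |name|
  value = ENV[name]
  verdict = utf8_locale_name?(value) ? "looks like a UTF-8 locale" : "does not look like a UTF-8 locale"
  puts "#{name}=#{value.inspect}: #{verdict}"
end
```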

Arnaud GRANAL

Aug 23, 2016, 2:51:16 PM
to redi...@googlegroups.com, yiiiii...@gmail.com, jan...@fnordig.de
The problem is not in Redis.

\xe5\x93\x88\xe5\x93\x88 is u'哈哈'

What you see in Redis is the reality. What you see on your screen is a UTF-8 *interpretation* of these bytes.

What Ruby is telling you is that it can't mix "content:" and some Chinese characters because they are not in the same encoding.
On the left you have a banana, on the right you have a cucumber. They look the same, but they taste different.

Make sure you have bananas on both sides before you call Redis.

Converting the encoding (iconv or force_encoding) is fine.
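
The banana and cucumber point can be reproduced without Redis at all. The sketch below is only an illustration: the strings are written with \u escapes ("\u54C8\u54C8" is 哈哈) so it does not depend on the source file's encoding, and the heading text is an invented example.

```ruby
utf8 = "\u54C8\u54C8"   # tagged UTF-8
raw  = utf8.b           # same bytes, tagged ASCII-8BIT (binary)

# Identical bytes, different labels.
same_bytes = raw.bytes == utf8.bytes
puts "same bytes: #{same_bytes}"
puts "encodings: #{utf8.encoding} vs #{raw.encoding}"

# An ASCII-only string mixes with the binary string without complaint.
ok = "content:1 " + raw
puts "ascii + raw is tagged #{ok.encoding}"

# Non-ASCII text on both sides with different labels raises.
heading = "\u6807\u9898: "
error = begin
  heading + raw
  nil
rescue Encoding::CompatibilityError => e
  e.message
end
puts "error: #{error}"

# Relabel the bytes as UTF-8 and the concatenation works.
fixed  = raw.dup.force_encoding(Encoding::UTF_8)
joined = heading + fixed
puts "joined is tagged #{joined.encoding}"
```

Note that force_encoding only relabels the bytes, so it is the right tool when the bytes already are UTF-8; converting between genuinely different encodings is a job for String#encode.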

LO Yi

Sep 7, 2016, 5:06:31 PM
to Redis DB, yiiiii...@gmail.com, jan...@fnordig.de
Thanks for your reply.

Well, the thing is that I didn't mix any Chinese characters with "content:". The original code puts a number after "content:", so the Chinese text is only in the value. Here's an example.

key => value
"content:1" => "哈哈"

My Chinese article is the value of that key, but the key doesn't include any Chinese characters.