Character Encoding


Jamal

Nov 22, 2012, 2:34:02 PM
to light...@googlegroups.com
Hi, I am not sure if this is the right place to ask. I am using LightCouch to connect to CouchDB. I am creating some documents with fields containing Arabic text. In the Futon web app the text is displayed correctly, but with LightCouch, after querying the document, I get strings with characters like this "محمود درويش".
Has anyone come across this problem? I am not sure whether the problem is in my CouchDB settings or whether I have to do something extra to make these strings display correctly.
Can you help, please?

Thanks
Jamal

Ahmed Yehia

Nov 23, 2012, 10:55:50 AM
to light...@googlegroups.com
Hi Jamal,
LightCouch internally uses UTF-8, which should handle Arabic fine.
The issue might be elsewhere, e.g. where the text is displayed after it is fetched from the database. Is it a JSP page?

--
- Ahmed


Ahmed Yehia

Nov 29, 2012, 7:34:22 PM
to light...@googlegroups.com
Hi Jamal,

After some testing, I can't confirm whether the issue is a missing encoding or something else. I tried the library in a web application with several languages, including Arabic, and it worked without any issues (both save and find operations).

Please let me know if you have any updates on the issue.

Regards,
Ahmed


On Sun, Nov 25, 2012 at 12:47 PM, Jamal <jamal....@gmail.com> wrote:
Hi Ahmed,
Thanks for the reply and for this great work. I managed to solve this problem, but I am not sure if this is the right approach. After looking at many parts of the application (I am using GWT on the client side, with RPC remote servlets), I realized many things could be causing it. But when I tried using Gson on its own, the Arabic text coming from CouchDB displayed the correct characters. Here's an example I did using Gson only:

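// Passing "UTF-8" makes the reader decode the CouchDB response as UTF-8 instead of the platform default charset
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
Poet results = new Gson().fromJson(reader, Poet.class);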

Note the InputStreamReader: if I don't pass the second argument, the charsetName "UTF-8", I get Arabic strings with these funny shapes, but if I do, they display correctly, with the same data saved in the database.
I checked out the LightCouch source code from GitHub and modified, for example, CouchDbClientBase at line 166, basically the method used for getting data. Inside that method, I had to pass the charsetName "UTF-8" to the InputStreamReader, and now all characters display correctly.
I don't know if this is the right approach, but it has solved my problem for now.
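A minimal, self-contained sketch of the symptom described above, using only the JDK: the same UTF-8 bytes decoded through an InputStreamReader with the wrong charset and with the right one. ISO-8859-1 is an assumption standing in for a non-UTF-8 platform default (on Windows the default is typically a code page such as windows-1252 or windows-1256); the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decodes the bytes through an InputStreamReader using the given charset,
    // the same way a reader wrapped around url.openStream() would.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "محمود درويش" written as Unicode escapes so the source file's encoding does not matter
        String original = "\u0645\u062D\u0645\u0648\u062F \u062F\u0631\u0648\u064A\u0634";
        // A UTF-8 JSON response carries the text as UTF-8 bytes
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // ISO-8859-1 maps each byte to one (wrong) character, producing the "funny shapes"
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println(original.equals(wrong)); // false
        System.out.println(original.equals(right)); // true
        System.out.println(utf8Bytes.length + " bytes -> " + wrong.length() + " garbled chars");
    }
}
```

Each Arabic letter takes two bytes in UTF-8, so a single-byte charset turns every letter into two unrelated characters, which is why the garbled string is roughly twice as long as the original.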

Thanks again for this great work.

Jamal




Mladen Dryankov

Nov 16, 2014, 6:11:31 PM
to light...@googlegroups.com
Hi, 

I was curious why this happens to me as well, and after a bit more digging, here is the answer.
The Java VM takes its default charset from system settings such as the Windows system locale and regional settings.

If you need UTF-8 I/O but aren't getting it, start Java with the -Dfile.encoding=UTF-8 argument. Setting it later through System.setProperty("file.encoding", "UTF-8"); is usually too late, because the JVM determines its default charset at startup.
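To see which charset the JVM actually picked, a small check like the following can help. It is a sketch; the class and method names are made up for illustration. Run it once plainly and once as `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` to compare. On JDK 18 and later the default is UTF-8 unless overridden (JEP 400), so this mostly matters on older JVMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    // True if readers/writers created without a charset argument will use UTF-8
    static boolean isUtf8Default() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // The charset that new InputStreamReader(in) falls back to
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
        // The property the JVM was started with (e.g. via -Dfile.encoding=UTF-8)
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
        if (!isUtf8Default()) {
            System.out.println("Default is not UTF-8: pass a charset to InputStreamReader explicitly");
        }
    }
}
```

Relying on the JVM flag fixes every reader in the process at once, but passing the charset explicitly in code keeps the behavior independent of how the JVM was launched.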

Hope this helps someone!
Mladen

Ahmed Yehia

Nov 17, 2014, 6:07:50 PM
to light...@googlegroups.com
Thanks for the info.


Sergey Korolev

Feb 15, 2015, 4:35:55 PM
to light...@googlegroups.com
Hello,
I have the same problem.
This seems to be a bug in CouchDbClientBase.java, in the function <T> T get(URI uri, Class<T> classType).
You should explicitly specify the charset:

...return getGson().fromJson(new InputStreamReader(in, "UTF-8"), classType);


On Tuesday, November 18, 2014, 2:07:50 UTC+3, Ahmed Yehia wrote:

Ahmed Yehia

Apr 25, 2015, 11:07:44 AM
to light...@googlegroups.com
The encoding issue is fixed in 0.1.8.

Ankur Prakash Srivastava

Jul 10, 2015, 4:17:30 AM
to light...@googlegroups.com
Thanks, man, you are a lifesaver!

Harsh Gupta

Jul 29, 2016, 11:16:21 PM
to LightCouch
Hi Ahmed,

There seems to be a bug in LightCouch API version 0.1.8: the UTF-8 charset is not set when reading data from the InputStream. The fix seems to belong in the two methods below, passing "UTF-8" as an additional argument, i.e. new InputStreamReader(instream, "UTF-8"). I have tested the code extensively and reached this conclusion. What is your opinion on this, and if a fix is needed, how and when can it be made? Please respond.

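public <T> List<T> query(Class<T> classOfT) {
    InputStream instream = null;
    try {
        // No charset argument: the stream is decoded with the platform default
        // (proposed fix: new InputStreamReader(instream = queryForStream(), "UTF-8"))
        Reader reader = new InputStreamReader(instream = queryForStream());
        // ...

private <V> V queryValue(Class<V> classOfV) {
    InputStream instream = null;
    try {
        // Same problem: no charset is passed to InputStreamReader
        Reader reader = new InputStreamReader(instream = queryForStream());
        // ...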
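One way to apply the proposed change consistently is to route every stream-to-Reader conversion through a single helper. This is a hypothetical sketch, not LightCouch's actual code or its 0.1.8 fix: `utf8Reader` and `readAll` are made-up names, and `readAll` stands in for handing the Reader to Gson's `fromJson(Reader, Class)`, which is what the methods above would do with it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8Readers {
    // Hypothetical helper: every CouchDB response stream becomes a Reader here,
    // so the charset is never left to the platform default.
    static Reader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Reads the whole stream as text via utf8Reader (stand-in for passing the Reader to Gson).
    static String readAll(InputStream in) {
        try (Reader reader = utf8Reader(in)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated response body {"name":"محمود درويش"}, encoded as UTF-8
        String json = "{\"name\":\"\u0645\u062D\u0645\u0648\u062F \u062F\u0631\u0648\u064A\u0634\"}";
        InputStream stream = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(stream).equals(json)); // true
    }
}
```

Using `StandardCharsets.UTF_8` (Java 7+) instead of the string "UTF-8" also avoids the checked `UnsupportedEncodingException` that the `InputStreamReader(InputStream, String)` constructor declares.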

Thanks
Harsh