How to control encoding when a page is served from a couchapp (on cloudant)

Alexander Gabriel

Sep 23, 2013, 5:54:42 AM
to couc...@googlegroups.com
Hi experts

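I have a weird problem. I'm not really sure where to ask for help, so please point me to the right place in case this is the wrong group.
I first posted it here: http://stackoverflow.com/questions/18768047/encodeuricomponent-encodes-differently-depending-on-environment?noredirect=1

A little background first:
I have this couchapp: https://barbalex.cloudant.com/artendb/_design/artendb/index.html
Besides containing a lot of helpful information on Swiss species, it enables biologists to export the data, import their own data, and re-export it combined with other data in the couchapp.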
Because the app is Swiss, the language used is German, which contains lots of special characters like ä, ö, ü, Ä, Ö, Ü.
So the data in this couchapp is full of them, and so are the field names used in the JSON, for instance: "Artname vollständig": "Pulsatilla vulgaris Mill. (Gewöhnliche Küchenschelle)".
Coming from a relational background, I realize that these characters can be troublesome, especially when used as field names. But in this use case, avoiding them would make the app a whole lot more complicated and less intuitive for the users.

The issue
In short:
When the couchapp is served from cloudant and the field name "Artname vollständig" is read from an input using "getAttribute" and encoded using "encodeURIComponent" (this happens in the browser), the result is: "Artname%20vollsta%CC%88ndig". This is wrong and breaks my app. When the couchapp is served on my local machine the result is: "Artname%20vollst%C3%A4ndig" which is the desired result.

More detailed:
I have code that runs on an event after the user has chosen data to export. The code loops through checkboxes and creates an array of objects that contain, among other things, the field names. The code gets the field names from an attribute named "feld" of the checkbox (I've omitted irrelevant classes and additional attributes of the input):
<div class="checkbox">
  <label>
    <input type="checkbox" feld="Artname vollständig">Artname vollständig
  </label>
</div>
running this code:
console.log("this.getAttribute('feld') = " + this.getAttribute('feld'));
gives, as expected: "this.getAttribute('feld') = Artname vollständig"
If, while looping, I run:
console.log('encodeURIComponent("Artname vollständig") = ' + encodeURIComponent("Artname vollständig"));
the answer is correct: 'encodeURIComponent("Artname vollständig") = Artname%20vollst%C3%A4ndig'
But if I run:
console.log("encodeURIComponent(this.getAttribute('feld')) = " + encodeURIComponent(this.getAttribute('feld')));
the answer is: "encodeURIComponent(this.getAttribute('feld')) = Artname%20vollsta%CC%88ndig". But only when the app is served from cloudant. On my local development machine the result is correct.
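
The two results can be compared code point by code point to see exactly where they differ. The following is a minimal sketch, not taken from the app: the helper name codePoints and the two sample strings are assumptions for illustration, and the helper needs a reasonably recent JavaScript engine.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX.
function codePoints(str) {
  return Array.from(str).map(function (ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  });
}

// Two spellings of the same visible text.
var composed = 'vollst\u00E4ndig';    // ä as a single code point
var decomposed = 'vollsta\u0308ndig'; // a followed by a combining diaeresis

console.log(codePoints(composed).join(' '));
console.log(codePoints(decomposed).join(' '));

console.log(encodeURIComponent(composed));   // vollst%C3%A4ndig
console.log(encodeURIComponent(decomposed)); // vollsta%CC%88ndig
```

Passing this.getAttribute('feld') to such a helper in both environments would show which of the two spellings the browser actually received.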

The environment
On my local machine:
Windows 8
apache couchdb 1.4.0
chrome 29.0.1547.76 m

Server side the couchapp runs on www.cloudant.com (bigcouch)

Proposed solution
offered by jasonslyvia on the above mentioned stackoverflow thread: "seems like erlang server, can you manage to add a Content-Type:text/html;charset=utf-8 header when your server send responses"
Unfortunately I'm too much of a noob to understand how to do this under the circumstances (couchapp hosted on cloudant.com).

I've searched quite a while but haven't found any information relating to couchapps that seems to nail the problem (mainly information on how to control encoding when serving list-functions).
Help is much appreciated!
Alex




Dave Cottlehuber

Sep 23, 2013, 6:48:40 AM
to couc...@googlegroups.com, Alexander Gabriel
Hey Alex,

I think you should be able to wrap this into a list (or show for a doc) and send custom type as in http://docs.couchdb.org/en/latest/ddocs.html#list-functions

Anyway, I'd suggest you ask Cloudant support on IRC at #cloudant or sup...@cloudant.com to see if they have a way to encourage Cloudant's stack to do that directly.

MfG/Cheers
Dave Cottlehuber


On 23. September 2013 at 11:54:45, Alexander Gabriel (alexande...@bluewin.ch) wrote:
>
>Hi experts
>
>I have a weird problem. I'm not really sure where to ask for help so please
>point me to the right place in case this is the wrong group.
>I first posted it here:
>http://stackoverflow.com/questions/18768047/encodeuricomponent-encodes-differently-depending-on-environment?noredirect=1
>
>*A little explaining first:*
>I have this
>couchapp: https://barbalex.cloudant.com/artendb/_design/artendb/index.html

Benoit Chesneau

Sep 23, 2013, 7:04:47 AM
to couc...@googlegroups.com
Are you storing the data as plain utf8?

- benoit

Alexander Gabriel

Sep 24, 2013, 4:17:00 AM
to couc...@googlegroups.com
Hm. How would I know? How can I influence it?

It's stored as JSON, which I thought always uses utf8.
I mostly use the jquery.couch.js library for storing.
The web app is a single page and includes <meta charset="utf-8"> in the head.

Alex




llabball

Oct 22, 2013, 5:10:16 AM
to couc...@googlegroups.com
Browser

If you are requesting via JSONP with jQuery, you can do something like:

var req = $.ajax({
  url: url,
  dataType: 'jsonp',
  scriptCharset: 'utf-8'
});

CouchDB

Inside a list you can call

start({'headers': {'Content-Type': 'application/json; charset=utf-8'}, 'code': 200});

Inside a list or show you can also

return {
  'headers': {'Content-Type': 'application/json; charset=utf-8'},
  'code': 200,
  'body': 'Artname vollständig'
};
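
Put together as a complete show function, the same idea might look like the sketch below. This is an illustration rather than code from the thread: the variable name page and the HTML body are assumptions, and in a real design document the function would be stored as a string under "shows". Note that a charset header only tells the browser how to decode the bytes it receives; it does not change which code points are stored.

```javascript
// Sketch of a CouchDB show function that sets the charset explicitly.
// The name `page` and the HTML body are assumptions for illustration.
var page = function (doc, req) {
  return {
    code: 200,
    headers: { 'Content-Type': 'text/html; charset=utf-8' },
    body: '<!DOCTYPE html><html><head><meta charset="utf-8"></head>' +
          '<body>Artname vollst\u00E4ndig</body></html>'
  };
};

// Local check outside CouchDB: call the function the way the query
// server would, with a (here empty) document and request object.
var response = page(null, {});
console.log(response.headers['Content-Type']);
```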

Alexander Gabriel

Nov 21, 2013, 9:35:41 AM
to couc...@googlegroups.com
Hi guys, just a follow up in case somebody runs into this issue later.

It seems to be unique to cloudant. This is the answer I got from their very helpful support:

**********

OK - I think I've found the culprit. The issue is that, due to internal optimisations (which are not present in CouchDB), the form of unicode strings can get changed. In this case, ä is represented as:

U+0061 LATIN SMALL LETTER A character
U+0308 COMBINING DIAERESIS character (&#x0308;)

instead of

U+00E4 LATIN SMALL LETTER A WITH DIAERESIS character (&#x00E4;)

Both are semantically equivalent, so the fix is to normalize your unicode strings before comparison. Unfortunately, JavaScript has no built-in unicode normalization, but you can use a library such as https://github.com/walling/unorm.

**********
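
For readers arriving later: JavaScript engines that implement ES2015 or newer provide String.prototype.normalize, so the normalization step suggested by support can be sketched without a library. The helper name encodeFieldName and the sample strings below are assumptions for illustration; on older browsers a library such as unorm would be needed instead.

```javascript
// Assumes a runtime with String.prototype.normalize (ES2015 or later).

// Hypothetical helper: normalize to NFC (composed form) before encoding,
// so both spellings of "ä" produce the same URL component.
function encodeFieldName(name) {
  return encodeURIComponent(name.normalize('NFC'));
}

var decomposedName = 'Artname vollsta\u0308ndig'; // a + combining diaeresis
var composedName = 'Artname vollst\u00E4ndig';    // precomposed ä

console.log(decomposedName === composedName);                   // false
console.log(decomposedName.normalize('NFC') === composedName);  // true
console.log(encodeFieldName(decomposedName)); // Artname%20vollst%C3%A4ndig
console.log(encodeFieldName(composedName));   // Artname%20vollst%C3%A4ndig
```

Normalizing at the point where the attribute is read keeps the rest of the code free of special cases, whichever form the server delivers.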

It's not an issue for me any more as I changed to a virtual server running on digitalocean.com with vanilla couchdb (and am very happy with it).
But I do think this could hit others developing apps in German or other languages needing utf8.

Thanks again for your great help.

Alex

