Non-English Label Support


Thanomsak Ajjanapanya

Dec 24, 2014, 6:40:34 AM
to cesiu...@googlegroups.com
Hi Cesium-Dev Team,

First, thanks for contributing this fantastic library.
I'm a Thai engineer and am now starting to adopt Cesium for 3D visualization.
I followed the Sandcastle Label example to display Thai text and found that some Thai vowel and tone marks in the upper (superscript) position were invisible in the scene (e.g. "ตึก" is shown as "ตก").

I checked and found that Glyph.billboard for the missing character is undefined.
Any ideas or suggestions on how to display the label correctly?

Thanks in advance.

Matthew Amato

Dec 29, 2014, 4:48:58 PM
to cesiu...@googlegroups.com
This should work.  I'm not super familiar with language encoding; but if you're using Sandcastle, could this possibly be caused by the page encoding itself not matching the language you're using?

Can you post your code to recreate the issue?  Thanks.


Thanomsak Ajjanapanya

Jan 7, 2015, 7:27:21 AM
to cesiu...@googlegroups.com
Hi Matthew,

Thanks for your reply.
I tried changing the page encoding from UTF-8 to TIS-620 on my localhost, but the result is still the same.
I've put simple code to reproduce the issue on jsfiddle here:
http://jsfiddle.net/mjsvpoyg/

Thanomsak.

gowtham.m...@gmail.com

Feb 27, 2015, 9:51:05 AM
to cesiu...@googlegroups.com
Hi Cesium Dev Team,

I'm facing the same problem as mentioned above. When I use label text with Hindi characters, it shows extra characters along with the Hindi characters. If you could provide a solution, that would be great.

Thanks,
Gowtham.

Matthew Amato

Feb 27, 2015, 2:41:01 PM
to cesiu...@googlegroups.com
So I looked into this some and I think the main problem is the way Unicode works in JavaScript.  Here's a fairly extensive article on the subject: https://mathiasbynens.be/notes/javascript-unicode 

The problem is specifically the way JavaScript strings deal with combining characters (a base character followed by one or more marks that together render as a single glyph). For example, alert('ตึก'.length) reports 3, even though visually there are clearly 2 characters. In order to support a large number of strings, the LabelCollection renders each glyph as a separate image, so when we iterate over the string, we actually create separate images for what should be one glyph (because JavaScript itself treats the base character and its combining mark as two characters).
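
To make the counts above concrete, here is a minimal sketch in plain JavaScript (no Cesium involved) that counts the same string three ways. The regular expression used for the visual count is an assumption chosen for illustration; it only handles a base character followed by combining marks and ignores harder cases such as conjuncts or emoji sequences. The last check shows that this particular string contains no surrogates at all: the extra unit is a combining vowel mark.

```javascript
// Sketch only: compares three ways of counting the "characters" in a string.
const text = '\u0E15\u0E36\u0E01'; // 'ตึก'

// 1. UTF-16 code units, which is what String.prototype.length reports.
const codeUnits = text.length;

// 2. Unicode code points (Array.from iterates by code point).
const codePoints = Array.from(text).length;

// 3. Approximate visual characters: one non-mark followed by any number of
//    combining marks. Illustrative assumption, not a full segmentation rule.
const clusters = text.match(/\P{M}\p{M}*/gu) || [];

// A surrogate is a code unit in the range 0xD800-0xDFFF.
const hasSurrogates = /[\uD800-\uDFFF]/.test(text);

console.log(codeUnits, codePoints, clusters.length, hasSurrogates);
```

A renderer that created one image per entry in clusters, rather than one per code unit, would keep the vowel mark attached to its base consonant.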

I'm not sure there's an easy solution for this problem using our current label rendering techniques.  The good news is that it's incredibly easy to work around by just using Billboards instead (which is how labels are implemented anyway). Below is an example of doing that in Cesium.  Basically, rename LabelCollection to BillboardCollection and, instead of assigning a text property, use writeTextToCanvas to create an image from the text and assign it to the image property.

The only downside to this approach is that it will use more texture memory and if you have dynamic textures, you could run out.  Give it a try and let me know how it works out.  I also opened a GitHub issue so that we look at this in the future: https://github.com/AnalyticalGraphicsInc/cesium/issues/2521


var viewer = new Cesium.Viewer('cesiumContainer');
var scene = viewer.scene;
var camera = viewer.scene.camera;
// Look at the label area from a point 50 km up and slightly to the north.
camera.lookAt(Cesium.Cartesian3.fromDegrees(100.5382368, 13.8, 50000),
              Cesium.Cartesian3.fromDegrees(100.5382368, 13.7242002, 0), Cesium.Cartesian3.UNIT_Z);
// Use a BillboardCollection in place of a LabelCollection.
var labels = scene.primitives.add(new Cesium.BillboardCollection());
labels.add({
    position : Cesium.Cartesian3.fromDegrees(100.545624, 13.743179),
    // Draw the whole string onto one canvas so the marks stay attached to their base characters.
    image : Cesium.writeTextToCanvas('ตึก', { font : '24px sans-serif' })
});

Let me know how it works out.

Thanks,
Matt
    

Hyper Sonic

Feb 27, 2015, 4:28:00 PM
to cesiu...@googlegroups.com
You can replace the text with this and get the same result:
text: String.fromCharCode(0xe15,0xe36,0xe01)

The codes come from:
console.log('ตึก'.charCodeAt(0).toString(16));
console.log('ตึก'.charCodeAt(1).toString(16));
console.log('ตึก'.charCodeAt(2).toString(16));
console.log(String.fromCharCode(0xe15,0xe36,0xe01));

Doing this, you can see the individual code units that make up the string:
console.log('ตึก'.charAt(0));
console.log('ตึก'.charAt(1));
console.log('ตึก'.charAt(2));

The first glyph is a combination of 0xe15 (the bottom of the first glyph) and 0xe36 (the top of the first glyph). For some reason the label isn't showing the top part of the first glyph. Though I'm not sure how one is supposed to know that the two glyphs are supposed to be printed at the same character position.
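
One way to tell that two code points belong at the same character position is the Unicode general category: 0xe36 is a combining mark (category Mark), while 0xe15 and 0xe01 are ordinary letters. The sketch below uses Unicode property escapes in a regular expression to test for that. Both isCombiningMark and groupMarks are hypothetical helpers written for this illustration; they are not part of Cesium, and the grouping rule is a simplification of full text segmentation.

```javascript
// Hypothetical helper (not part of Cesium): true when the given single code
// point has a Unicode general category of Mark (Mn, Mc or Me).
function isCombiningMark(ch) {
    return /^\p{M}$/u.test(ch);
}

// Hypothetical helper: attaches each combining mark to the base character
// before it, so that every group can be drawn at one character position.
function groupMarks(text) {
    const groups = [];
    for (const ch of text) { // for...of iterates by code point
        if (isCombiningMark(ch) && groups.length > 0) {
            groups[groups.length - 1] += ch;
        } else {
            groups.push(ch);
        }
    }
    return groups;
}

const groups = groupMarks(String.fromCharCode(0xe15, 0xe36, 0xe01));
console.log(groups.length);                   // number of character positions
console.log(isCombiningMark('\u0E36'));       // the vowel mark on top
console.log(isCombiningMark('\u0E15'));       // the base consonant
```

The same test applies to the Hindi case mentioned earlier in the thread, since Devanagari vowel signs are also in the Mark category.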

Matthew Amato

Feb 27, 2015, 4:36:27 PM
to cesiu...@googlegroups.com
> Though I'm not sure how one is supposed to know that the two glyphs are supposed to be printed at the same character position.

Exactly, that's the root of the problem.  It appears that they are trying to fix this for ES6 and we may be able to use the code from here to fix it:



Hyper Sonic

Feb 27, 2015, 5:37:04 PM
to cesiu...@googlegroups.com
I was reading through some of the links. There are some wild Unicode characters out there from character combining! This one is 75 characters long, yet it only takes 6 character slots (well, horizontally; vertically it takes about 5).



console.log("ZALGO!".length);


They also talk about the new methods codePointAt() and String.fromCodePoint(). While charCodeAt() returns a single 16-bit code unit, I presume codePointAt() can return a full code point that spans more than one code unit.
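
As a small illustration of the difference, the sketch below compares the two methods on U+1F600, a code point outside the Basic Multilingual Plane that was picked only as an example. UTF-16 stores it as a surrogate pair, so charCodeAt() sees two 16-bit units while codePointAt() reads the pair as one value. For the Thai characters earlier in the thread, which all fit in a single code unit, the two methods return the same number.

```javascript
// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it as
// a surrogate pair of two 16-bit code units.
const face = '\u{1F600}';

console.log(face.length);                      // number of code units
console.log(face.charCodeAt(0).toString(16));  // high surrogate
console.log(face.charCodeAt(1).toString(16));  // low surrogate
console.log(face.codePointAt(0).toString(16)); // the whole code point

// String.fromCodePoint is the inverse and accepts values above 0xFFFF.
const roundTrip = String.fromCodePoint(0x1F600) === face;

// For a character inside the Basic Multilingual Plane both methods agree.
const sameForThai = '\u0E36'.codePointAt(0) === '\u0E36'.charCodeAt(0);

console.log(roundTrip, sameForThai);
```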
