
arabic and european character in one string end up in wrong element position


Lion_b1

Jan 29, 2005, 1:44:02 PM
Hello, when I mix Arabic and European characters in a string where the
European characters are numeric, CF mixes up the resulting string. On the real
website the list is passed to a Java class that builds PDF documents from the
comma-separated values. The content of the generated PDF is mixed Arabic /
English. The order of the list elements is important because each element maps
to a variable in an RTF document that the Java class uses. Here is the code
example:
<cfprocessingdirective pageencoding='utf-8'>
<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN'>
<html>
<head>
  <title>test arabic / ascii character</title>
  <meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
</head>
<body>
<cffunction name='testlist' output='true'>
  <cfset var stTest1 = StructNew() />
  <cfset var stTest2 = StructNew() />
  <cfset var teststring1 = '???? ???;???? ?????? ????? ?????? ???? ??? ?????? ????,001,???? ???,???,???? ????,002' />
  <cfset var teststring2 = '???? ???;???? ?????? ????? ?????? ???? ??? ?????? ????,e001,???? ???,???,???? ????,e002' />
  <cfset var testlist1 = '' />
  <cfset var testlist2 = '' />

  <cfset stTest1.str1 = '???? ???' />
  <cfset stTest1.str2 = '???? ?????? ????? ?????? ???? ??? ?????? ????' />
  <cfset stTest1.str3 = '001' />
  <cfset stTest1.str4 = '???? ???,???,???? ????' />
  <cfset stTest1.str5 = '002' />

  <cfset stTest2.str1 = '???? ???' />
  <cfset stTest2.str2 = '???? ?????? ????? ?????? ???? ??? ?????? ????' />
  <cfset stTest2.str3 = 'e001' />
  <cfset stTest2.str4 = '???? ???,???,???? ????' />
  <cfset stTest2.str5 = 'e002' />

  <cfset testlist1 = testlist1 & stTest1.str1 & ';' />
  <cfset testlist1 = testlist1 & stTest1.str2 & ';' />
  <cfset testlist1 = testlist1 & stTest1.str3 & ';' />
  <cfset testlist1 = testlist1 & stTest1.str4 & ';' />
  <cfset testlist1 = testlist1 & stTest1.str5 & ';' />

  <cfset testlist2 = testlist2 & stTest2.str1 & ';' />
  <cfset testlist2 = testlist2 & stTest2.str2 & ';' />
  <cfset testlist2 = testlist2 & stTest2.str3 & ';' />
  <cfset testlist2 = testlist2 & stTest2.str4 & ';' />
  <cfset testlist2 = testlist2 & stTest2.str5 & ';' />

  <cfoutput><p><strong>Testlist-1:</strong> #testlist1#</p></cfoutput>
  <cfoutput><p><strong>Testlist-2:</strong> #testlist2#</p></cfoutput>
  <cfoutput><p><strong>Teststring-1:</strong> #testlist1#</p></cfoutput>
  <cfoutput><p><strong>Teststring-2:</strong> #testlist2#</p></cfoutput>
  <cfreturn>
</cffunction>
<cfset testlist() />
</body>
</html>

This page gives the following output:

Testlist-1: ???? ???;???? ?????? ????? ?????? ???? ??? ?????? ????;001;???? ???,???,???? ????;002;
Testlist-2: ???? ???;???? ?????? ????? ?????? ???? ??? ?????? ????;e001;???? ???,???,???? ????;e002;
Teststring-1: ???? ???;???? ?????? ????? ?????? ???? ??? ?????? ????;001;???? ???,???,???? ????;002;
Teststring-2: ???? ???;???? ?????? ????? ?????? ???? ??? ?????? ????;e001;???? ???,???,???? ????;e002;

So the string that contains only the numeric characters '001' and '002' comes
out in the wrong order, and in that case my list for the called Java method is
wrong as well. Any help toward a solution of the problem is highly appreciated.
Thanks. Bernhard

PaulH

Jan 30, 2005, 10:46:36 PM
You have "directionally ambiguous" text (the digits) embedded in an RTL text
stream. Pretty much any application that uses some form of the "Unicode
Bidirectional Algorithm" will have some problems with that (appending LTR text
to RTL text puts it after the leftmost char). Try editing the full string in
Notepad to see what I mean; even after switching to an Arabic locale it's still
pretty frustrating.
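
One way to see what the algorithm does with the two kinds of element is to ask Java's `java.text.Bidi` for the resolved embedding levels. The sketch below is only an illustration, not the poster's code: the class name, the helper names, and the Arabic placeholder word are assumptions (the Arabic text in the original post was lost to `????`), and a left-to-right base direction is assumed because the test page sets no `dir` attribute. Java is used because the list is handed to a Java class.

```java
import java.text.Bidi;
import java.util.Arrays;

class BidiLevelsSketch {
    // Placeholder Arabic word, written with escapes so the file encoding does not matter.
    // It stands in for the original strings, which are not recoverable.
    static final String AR = "\u0639\u0631\u0628\u064A";

    // Resolved embedding level of the character at `index`,
    // treating the text as a paragraph with left-to-right base direction.
    static int levelAt(String text, int index) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        return bidi.getLevelAt(index);
    }

    // Splits the semicolon-delimited list in logical (stored) order.
    static String[] elements(String list) {
        return list.split(";");
    }

    public static void main(String[] args) {
        String digitsOnly = AR + ";" + AR + ";001;" + AR + ";002;";
        String withLetter = AR + ";" + AR + ";e001;" + AR + ";e002;";

        // Level 0 means "laid out left to right at the base level";
        // a higher level means the character belongs to an embedded run.
        System.out.println("level of first digit in 001: "
                + levelAt(digitsOnly, digitsOnly.indexOf("001")));
        System.out.println("level of 'e' in e001:        "
                + levelAt(withLetter, withLetter.indexOf("e001")));

        // The stored element order does not depend on how the string is drawn.
        System.out.println(Arrays.toString(elements(digitsOnly)));
    }
}
```

If the digits report a level above 0 while the "e" reports 0, the digits are being laid out as part of the surrounding RTL run, which would explain why they appear to move. The `split` call shows that the stored (logical) order of the elements is independent of how the string is displayed.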

As you have already shown (by putting an "e" in front of your 001/002 text),
you could swap to using something that's directionally unambiguous like "abc"
or "a01". If you can't do that, you might try inserting the Unicode RLE
(U+202B, decimal 8235) or LRE (U+202A, decimal 8234) control chars in your text
stream. This is, however, generally frowned upon (though I do it myself when I
have to mix RTL/LTR text in things like form selects, etc.).
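
The control-character suggestion can be sketched as follows. This is a minimal illustration under stated assumptions, not a tested fix for the original page: the class and method names are hypothetical, and closing each embedding with PDF (U+202C, decimal 8236) is an addition to what the reply mentions. An embedding opened with LRE or RLE stays in effect until a PDF or the end of the paragraph.

```java
class DirectionalWrapSketch {
    static final char LRE = '\u202A'; // Left-to-Right Embedding, decimal 8234
    static final char RLE = '\u202B'; // Right-to-Left Embedding, decimal 8235
    static final char PDF = '\u202C'; // Pop Directional Formatting, decimal 8236

    // Marks an element as an explicit left-to-right embedding.
    static String wrapLtr(String element) {
        return LRE + element + PDF;
    }

    // Removes the embedding controls again, e.g. before the value is used as data.
    static String stripControls(String text) {
        return text.replace(String.valueOf(LRE), "")
                   .replace(String.valueOf(RLE), "")
                   .replace(String.valueOf(PDF), "");
    }

    public static void main(String[] args) {
        // Placeholder Arabic word; the original strings are not recoverable.
        String ar = "\u0639\u0631\u0628\u064A";

        // Only the numeric elements are wrapped; the Arabic elements are left alone.
        String list = ar + ";" + wrapLtr("001") + ";" + ar + ";" + wrapLtr("002") + ";";

        System.out.println("length with controls:    " + list.length());
        System.out.println("length without controls: " + stripControls(list).length());
    }
}
```

The control characters become part of the string, so any code that compares or parses the elements (for example the Java class that receives the list) would see them unless they are stripped first. That side effect is one plausible reason the approach is frowned upon.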

