How to determine length of a UTF-8 input string using CFML

34 views
Skip to first unread message

Dave Hatz

unread,
Feb 9, 2016, 10:34:56 PM2/9/16
to Lucee
This is more of a general design issue than a CFML issue but since we are using Lucee, I figured I would give it a try on this forum.   

The problem that we are having is that we have just started accepting Japanese characters into our Oracle database.   We have set our web page encoding to UTF-8.   The problem that we are having is that we have database field lengths set to specific length, for example first name field is set to VARCHAR2(50).   But the Japanese characters are longer than 50 characters and our validation to trap for field lengths are not working and we are getting Insert errors because the field length is longer than 50 characters. 

Trying Left(firstname,50) isn't working.   

Do I just keep increasing the size of my database fields?    Or is there a way to test the length of the Japanese string?  We are running on IIS if that matters.

Any help would be greatly appreciated.

Michael Offner

unread,
Feb 10, 2016, 1:42:53 AM2/10/16
to lucee

Problem is that the jvm (where Lucee is on top) is handling all strings as unicode, so the charset only comes into the game when you read strings from outside the jvm or write them somewhere outside the jvm. Inside the jvm every letter uses one position, so that a certain letter uses 2 position when stored with a certain charset is completely independent. So what can you do? The easiest way is to convert the string to a byte array and get the length of that array like this:
// not tested
function bytelength(required string str, required string charset){
   return arrayLen( arguments.str.getBytes(arguments. charset));
}

Then you can use this function to terminate the length and cut pieces of the string until you have the right length.

We should consider to extend the function stringLen with this...

Micha

--
Love Lucee? Become a supporter and be part of the Lucee project today! - http://lucee.org/supporters/become-a-supporter.html
---
You received this message because you are subscribed to the Google Groups "Lucee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+un...@googlegroups.com.
To post to this group, send email to lu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/65c3b19b-9a9a-4843-9ad5-b2282bc949ba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages