refind regexp not working anymore ...

Stéphane MERLE

Feb 22, 2016, 10:51:14 AM
to Lucee
Hi,

I've just discovered that one of my regexp checks is not working anymore. No changes have been made to the code, only an upgrade from Railo to Lucee (I've used Railo for a long time, and now Lucee ;) )

The regexp is:

nom:             {regexp:"^[éèëêàäâûüùïîöôÿç'a-zA-Z\ _-]{2,50}$",erreur="invalid_nom", msg="Votre nom n'est pas valide. Sa longueur doit être comprise entre 2 et 50 caracteres."},

<cfif (not refind(CHKregexp[nom].regexp, ARGUMENTS.nom)) >
    <cfset erreur=CHKregexp[nom].erreur>
    <cfset erreur_detail=CHKregexp[nom].msg>
    <cfset callit=application.mythrow(errorCode="400", erreur="#erreur#", detail="#erreur_detail#") >
</cfif>


If I try a name with an accent, like 'Lashât', it fails with the error message.

This check is in an API (REST) and is called from a PHP page via AJAX.

Have I done something wrong?

Stéphane

Paul Klinkenberg

Feb 22, 2016, 4:28:55 PM
to lu...@googlegroups.com
Hi Stéphane,

Having special characters inside a cfm/cfc file can be problematic when files are read back in, and should be avoided imho. It's better to have the regex something like this:
"^[\x8C\x9C\xC0\xC2\xC6-\xCB\xCE\xCF\xD4\xD9\xDB\xDC\xE0\xE2\xE6-\xEB\xEE\xEF\xF4\xF9\xFB\xFC 'a-zA-Z_-]{2,50}$"
Each \x escape is the hexadecimal character code of one of these characters: http://character-code.com/french-html-codes.php

Also, I see your regex is currently not checking for the uppercase variants of the special characters. The regex I suggested here will check for the uppercase variant as well.
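
A minimal sketch of how the escaped pattern above could be tried out, assuming Lucee's `reFind` and building the test name with `chr()` so the template itself stays pure ASCII. The variable names `namePattern` and `testName` are hypothetical and not part of the original code.

```cfml
<cfscript>
// Hypothetical variable names; the pattern is the one suggested above.
// CFML does not treat the backslash as a string escape, so "\xE2" reaches
// the regex engine unchanged.
namePattern = "^[\x8C\x9C\xC0\xC2\xC6-\xCB\xCE\xCF\xD4\xD9\xDB\xDC\xE0\xE2\xE6-\xEB\xEE\xEF\xF4\xF9\xFB\xFC 'a-zA-Z_-]{2,50}$";

// chr(226) is "â", so this is "Lashât" without a literal accent in the file.
testName = "Lash" & chr(226) & "t";

// reFind returns the position of the match, or 0 when nothing matches.
if (reFind(namePattern, testName) eq 0) {
    writeOutput("rejected");
} else {
    writeOutput("accepted");
}
</cfscript>
```

Two things may be worth checking against your own requirements. The original class also contained ä, ö and ÿ, which would be \xE4, \xF6 and \xFF and are not in this list. And \x8C and \x9C are the Windows-1252 positions of Œ and œ (their Unicode code points are U+0152 and U+0153), so whether those two escapes match the ligatures may depend on the regex engine.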

Kind regards,

Paul Klinkenberg


--
Love Lucee? Become a supporter and be part of the Lucee project today! - http://lucee.org/supporters/become-a-supporter.html
---
You received this message because you are subscribed to the Google Groups "Lucee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+un...@googlegroups.com.
To post to this group, send email to lu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/a3220998-adad-48de-a46e-4052f6820aba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Adam Cameron

Feb 22, 2016, 6:24:58 PM
to Lucee


On Monday, 22 February 2016 21:28:55 UTC, Paul Klinkenberg wrote:
Having special characters inside a cfm/cfc file can be problematic when files are read back in, and should be avoided imho.


What?

Why? 

Paul Klinkenberg

Feb 23, 2016, 4:32:43 AM
to lu...@googlegroups.com
Well, I have seen numerous occasions where special characters were garbled in a CFML template (or any text document, for that matter).
That usually occurred after the file was FTP'd, updated in an editor with a different character set, or subjected to any other action that causes the file to change character set.

Of course, measures can be taken to prevent this from happening, but I have had my fair share of bugs at different companies and platforms with this exact problem.
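
To illustrate the kind of garbling described above, here is a rough sketch that simulates one possible mismatch: a character saved as UTF-8 and then read back as ISO-8859-1. The choice of charsets and the variable names are assumptions for illustration only. It relies on `charsetDecode` (string to binary) and `charsetEncode` (binary to string).

```cfml
<cfscript>
// "â" built from its code point so this file stays ASCII-only.
original = chr(226);

// The bytes the character occupies when the file is saved as UTF-8.
utf8Bytes = charsetDecode(original, "utf-8");

// The same bytes interpreted as ISO-8859-1 turn into two characters.
misread = charsetEncode(utf8Bytes, "iso-8859-1");

// The single accented letter no longer exists in the misread text,
// so a character class written with it would contain different characters.
writeOutput("original length: " & len(original) & ", misread length: " & len(misread));
</cfscript>
```

One of the measures that can help is declaring the template's encoding explicitly, for example with `<cfprocessingdirective pageencoding="utf-8">` at the top of the file, so the engine does not have to guess how the file was saved.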


Paul

Stéphane MERLE

Feb 23, 2016, 9:46:50 AM
to Lucee
Hi Paul,

It has indeed solved the matter!

Thanks for the tip!

Stéphane

Adam Cameron

Feb 27, 2016, 3:08:26 PM
to Lucee
On Monday, 22 February 2016 21:28:55 UTC, Paul Klinkenberg wrote:
Having special characters inside a cfm/cfc file can be problematic when files are read back in, and should be avoided imho.


What?

Why? 


On Tuesday, 23 February 2016 09:32:43 UTC, Paul Klinkenberg wrote:
Well, I have seen numerous occasions where special characters were garbled in a CFML template (or any text document, for that matter).
That usually occurred after the file was FTP'd, updated in an editor with a different character set, or subjected to any other action that causes the file to change character set.


Well yeah, all valid observations. I think it's more just something to be mindful of, than actively avoid. That said... having this sort of content in a code file kinda suggests there's hard-coded content in the code - I imagine this is where this sort of thing mostly comes from - which is probably rather more an issue.

I do find that charset encoding is a topic that a lot of CFML devs (perhaps not just CFML ones) do seem to struggle with.

I s'pose it's just *more* complexity and "moving parts" that can contribute to possible problems.

Nando Breiter

Feb 27, 2016, 5:37:46 PM
to lu...@googlegroups.com
Of course, for nearly any language besides English, "special" characters are normal. English is the outlier here. Interfaces will have words containing them, and these characters simply can't be avoided without misspellings - or using images in place of text - for Chinese words as an example. Neither workaround is at all ideal. Dealing with various and varying character sets is a pain, but unavoidable in my opinion - unless one works only in English.

The issue I typically run across in Switzerland is that someone will give me a text in an encoding other than UTF-8, and then I have to convert it. If it's data, I'll import it into MySQL and convert the charset there.
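
For a plain text file, the conversion could also be done in CFML itself. This is only a sketch: the file names and the source charset (`windows-1252`) are hypothetical, and the real charset of the incoming file has to be known or worked out first. It uses `fileRead` and `fileWrite` with their optional charset arguments.

```cfml
<cfscript>
// Hypothetical paths and source charset; adjust to the file actually received.
sourcePath = expandPath("./incoming.txt");
targetPath = expandPath("./incoming-utf8.txt");

// Read the file using the charset it was really written in...
content = fileRead(sourcePath, "windows-1252");

// ...and write the same text back out as UTF-8.
fileWrite(targetPath, content, "utf-8");
</cfscript>
```

Writing to a separate target file keeps the original around in case the guessed source charset turns out to be wrong.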



Aria Media Sagl
+41 (0)76 303 4477 cell
skype: ariamedia


Adam Cameron

Feb 28, 2016, 2:30:02 AM
to Lucee


On Saturday, 27 February 2016 22:37:46 UTC, Nando Breiter wrote:
Of course, for nearly any language besides English, "special" characters are normal.


Yeah, I do wish people would stop using such jingoistic terms. I assure you to a lot of English speakers, there's nothing "special" about some other language's character set.

It's even worse when some muppets - usually when talking about password strength - refer to punctuation as "special characters". I know IT people are - on the whole - reasonably poor at written communication, but even to them, how are things like commas and full stops "special"?

[sigh]

Kai Koenig

Feb 28, 2016, 2:28:20 PM
to lu...@googlegroups.com
I get your point, Adam, but you have to admit that, particularly in the single-language English-speaking lands of the UK, AU, NZ and the US, “special” (non-ASCII) characters are NOT the norm for a lot of English speakers, and quite frankly a lot of applications are not built to handle non-ASCII character sets.

Hence the point Nando makes is absolutely correct.

Cheers
Kai



Adam Cameron

Feb 28, 2016, 4:15:20 PM
to Lucee


On Sunday, 28 February 2016 19:28:20 UTC, Kai Koenig wrote:

Hence the point Nando makes is absolutely correct.


Which is why... uh... I was agreeing with him.