Filenames with German umlauts behave strangely on upload and rename


C.A.

Dec 13, 2016, 9:51:23 AM
to Lucee
Hi,

I have a Railo and a Lucee server (versions 4.5 and 5.1) that do strange things with filenames containing German umlauts such as ä and ü, but not with "ß".
When uploading from Windows, Android, and Linux, everything works as we need it to. But with macOS / OS X and Chrome or Firefox, the umlauts cannot be replaced by the script...

Is this a known issue?

A demo link can be provided. The code is nothing special: just a form with a file field, a cffile upload, and a rename using replace.

Regards
Christian

Michael Hnat

Dec 13, 2016, 10:09:25 AM
to lu...@googlegroups.com

Well, this sounds like a client/browser issue. Maybe the default charset in the browser is set differently.

Could you check this?

 

But I do know that an 'ä' on a Mac is a different character than the 'ä' on Windows. Could you hash the filename on both systems (after checking the first point)? The hashes should be the same.
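The hash check can be sketched in a few lines. The example is in Python rather than CFML, and it assumes (the thread does not confirm this) that the Mac-side name arrives in decomposed Unicode form (NFD), where ü is stored as u plus a combining diaeresis, while other clients send the precomposed form (NFC). `name_hash` is a hypothetical helper, not part of any library.

```python
import hashlib
import unicodedata

# The same visible name in two Unicode normalization forms.
composed = unicodedata.normalize("NFC", "\u00fcmlaut.jpg")    # ü as one code point (U+00FC)
decomposed = unicodedata.normalize("NFD", "\u00fcmlaut.jpg")  # u followed by U+0308


def name_hash(name: str) -> str:
    """MD5 hex digest of the UTF-8 bytes of a file name (for comparison only)."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


print(len(composed), len(decomposed))                  # 10 11
print(composed == decomposed)                          # False
print(name_hash(composed) == name_hash(decomposed))    # False

# ß has no canonical decomposition, so NFD leaves it untouched.
print(unicodedata.normalize("NFD", "\u00df") == "\u00df")  # True
```

If the two hashes differ for the same visible name, the names are different sequences of code points, and a replace that looks for the precomposed ü would miss the decomposed variant. That ß has no decomposed form would fit the observation that it is unaffected.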

 

Best,

Michi

--
You received this message because you are subscribed to the Google Groups "Lucee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+un...@googlegroups.com.
To post to this group, send email to lu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/2e818a3e-8a5d-4f87-8735-e51894bfa7e4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nando Breiter

Dec 13, 2016, 10:58:28 AM
to lu...@googlegroups.com
Sounds like this may be an encoding issue. The problem I faced with file names containing German, Chinese, or French characters was that the file names were not encoded consistently. Macs will use UTF-8, I believe; Windows may use a different encoding. So if you are searching for ä using ISO-8859 or Windows-1252 encoding, your search and replace code may not find an ä encoded with UTF-8. Remember that the computer is searching for a series of 1s and 0s, so using a consistent encoding is essential. There is some overlap between different encodings, hence ß may work, sometimes.
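To make the "series of 1s and 0s" point concrete, here is a small Python sketch (not CFML) comparing the bytes that different encodings produce for ä. The file name `Bär.jpg` is made up for illustration.

```python
umlaut = "\u00e4"  # ä

utf8_bytes = umlaut.encode("utf-8")         # two bytes: 0xC3 0xA4
cp1252_bytes = umlaut.encode("cp1252")      # one byte: 0xE4
latin1_bytes = umlaut.encode("iso-8859-1")  # one byte: 0xE4

# A UTF-8 encoded file name (hypothetical example name).
filename_utf8 = "B\u00e4r.jpg".encode("utf-8")

# Searching UTF-8 data for the Windows-1252 byte pattern finds nothing.
print(cp1252_bytes in filename_utf8)  # False
print(utf8_bytes in filename_utf8)    # True
```

A byte-level search only succeeds when the search pattern and the data use the same encoding.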

The point to keep in mind is that you cannot control the encoding of a file being uploaded. To do what you want to do completely and correctly, your find and replace code should take into account all possible encodings, in all languages the files might be named in. 

So like I said in another thread you started on this same topic, what I've done is rename uploaded files to a name that I control, using CreateUUID() plus the extension of the upload, and give the user a form field so that they can fill in whatever name they want. In this case, I fully control the encoding of the name they type into the form field, so I avoid the potential for encoding issues. If you want to adapt that approach so that you rename the file using the name the user types in, then I believe it would be much easier to search and replace characters accurately, because you would need to work in only one encoding, the one you specify. 

I store the name the user types into that text field in the database, and display that name, not the UUID file name, to the user. If they download the file, it can be renamed to the user-assigned name on the way out - the user and/or their file system can deal with any name conflicts that arise.
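A minimal sketch of this rename-on-upload approach, written in Python for illustration rather than CFML. `stored_name`, `register_upload`, and the `display_names` dictionary (standing in for the database table) are hypothetical names, not part of Lucee or of the setup described here.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical in-memory stand-in for the database table that maps
# stored file names to the names users typed into the form field.
display_names = {}


def stored_name(upload_name):
    """Build a server-controlled name: a random UUID plus the upload's extension."""
    extension = PurePosixPath(upload_name).suffix.lower()
    return f"{uuid.uuid4()}{extension}"


def register_upload(upload_name, user_title):
    """Record the user-assigned title and return the name to store on disk."""
    name = stored_name(upload_name)
    display_names[name] = user_title
    return name


on_disk = register_upload("\u00fcmlaut.JPG", "\u00dcmlaut photo")
print(on_disk)                 # a UUID followed by ".jpg"
print(display_names[on_disk])  # the title the user typed
```

The name on disk is pure ASCII regardless of what the client sent, so no search and replace on the uploaded name is needed.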





Aria Media Sagl
+41 (0)76 303 4477 cell
skype: ariamedia


C.A.

Dec 13, 2016, 11:10:11 AM
to Lucee
Thanks for your reply.

@Michael Hnat: The charset is UTF-8, as are the locales on the Debian boxes and all Lucee encoding settings.
The problem occurs only on macOS and OS X, with any browser (as far as we know) except Safari! So I think the "different character" is not the problem.

So, hashing the file name:
File 1 on my Mac: ümlaut.jpg
File 1 on my Debian box, uploaded via Railo and Lucee: u?mlaut.jpg

@Nando Breiter: I thought this only affected Railo, but now I can see it is a Lucee "problem" too.
The names should simply be free of umlauts for SEO reasons. And it worked for months. There were no changes in the code, no Lucee updates, no Debian updates. Maybe a browser update or an OS X update.
I am searching for the ä in UTF-8, in a UTF-8 encoded file, with all server settings set to UTF-8 (de_DE.UTF-8 on Debian).
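One thing that might be worth trying, sketched in Python rather than CFML: normalize the uploaded name to NFC before running the replace. This rests on the unconfirmed assumption that the affected clients send the name in decomposed form. `seo_name` and the `REPLACEMENTS` table are illustrative names only.

```python
import unicodedata

# Transliterations for German umlauts and ß (keys are precomposed code points).
REPLACEMENTS = {
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
    "\u00df": "ss",
}


def seo_name(filename):
    """Compose the name to NFC first, then replace umlauts character by character."""
    composed = unicodedata.normalize("NFC", filename)
    return "".join(REPLACEMENTS.get(ch, ch) for ch in composed)


decomposed = "u\u0308mlaut.jpg"  # u + combining diaeresis

# A plain replace for the precomposed ü does not match the decomposed name.
print(decomposed.replace("\u00fc", "ue") == decomposed)  # True

# After normalization both variants give the same result.
print(seo_name(decomposed))        # uemlaut.jpg
print(seo_name("\u00fcmlaut.jpg"))  # uemlaut.jpg
```

Lucee runs on the JVM, where `java.text.Normalizer` provides the same NFC normalization; whether that resolves this particular case is untested here.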

I have no clue where to look. A demo can be found here:



Best regards

Nando Breiter

Dec 13, 2016, 11:33:48 AM
to lu...@googlegroups.com
> So hashing the file name:
> File 1 on my mac: ümlaut.jpg
> File 1 on my debian box uploaded via railo and lucee: u?mlaut.jpg

Looks like an encoding issue to me. The ? is the encoder saying "I don't know how to render this series of bits into a character." 
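A possible reading of the u?mlaut.jpg result, sketched in Python: if the name arrived with ü in decomposed form (u plus a combining diaeresis, which is an assumption and not something established in this thread), then any step that cannot represent the combining mark would keep the u and replace only the mark with a question mark. Encoding to ASCII with replacement is used here purely to simulate such a step.

```python
decomposed = "u\u0308mlaut.jpg"  # u followed by U+0308 (combining diaeresis)
composed = "\u00fcmlaut.jpg"     # precomposed ü (U+00FC)

# errors="replace" substitutes "?" for every character ASCII cannot represent.
print(decomposed.encode("ascii", errors="replace"))  # b'u?mlaut.jpg'
print(composed.encode("ascii", errors="replace"))    # b'?mlaut.jpg'
```

Only the decomposed form reproduces the "u" followed by "?" pattern; the precomposed form would lose the letter entirely.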

 

> @Nando Breiter: This was just for railo (I thought) now I can see it is a lucee "problem" too.
> The name shall be just without umlauts for SEO reason. And it worked for months. There were no changes in the code, no lucee updates, no debian updates. Maybe some browser updates or OS X update.
> I am searching for the ä in UTF-8, in a UTF-8 encoded file with all server settings set to UTF-8 (de_DE.UTF-8 in debian)
>
> I have no clue where to look.

Does the upload work if you leave the name alone? (Some file systems can't handle special characters.)

I would suggest revisiting the idea that umlauts necessarily have a negative effect on search results.  


Reflexively, I tend to work around encoding issues when I can rather than try to tackle issues like this head on. There could be some bit of code in the entire process somewhere that a developer left in by mistake that is mucking up the encoding in this case, for instance. 

Zac Spitzer

Dec 13, 2016, 7:24:47 PM
to lu...@googlegroups.com
If it's only happening on OS X with Chrome and Firefox, have you looked at their open bug trackers to see whether there is a known issue?

Both are keen to achieve consistent functionality between browsers.






--
Zac Spitzer
+61 405 847 168

Bilal

Dec 14, 2016, 9:47:35 AM
to Lucee
Can you clarify a bit?
Are you using the same server platform (OS) with different client OSes for your tests, or are you installing your software on a different OS each time and running the test?

C.A.

Dec 14, 2016, 10:25:37 AM
to Lucee
All the servers are Debian boxes (8.x and 7.6).
And yes, different client OSes (Mac OS X, Windows 10, Mint, and Debian).
Do you think this is a Debian thing?

I opened a bug ticket for Chrome, as I do not think that Debian says: Oh, this is OS X and Chrome - hmmm, why not mix up some characters? ;-)