Hi Mariano,
thanks for the tips. If I could rely on a BOM, things would be easy, as you say. Unfortunately, I can't. The streams I have to handle come from sources that do not respect any standards (i.e. German banks) ;-)
IIUC, 'file' will only look for magic numbers, right? There are no magic numbers in these files. They're really just text files that may or may not be encoded in UTF-8. The only way to find out is to look for the first occurrence of a UTF-8 encoded character, and even then it is only a guess that the file might be UTF-8. Since I receive these files as uploads from the browser, I would have to save them to disk before I could use 'file'. Plus, 'file' is not available on Windows. So I really look forward to 9.3 ;-)
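To make that guess concrete, here is a minimal sketch in Python, purely for illustration. The function name `guess_encoding` and its three return labels are made up for this example. It works on the uploaded bytes in memory, so nothing has to be saved to disk, and it assumes the whole upload fits in memory.

```python
def guess_encoding(data: bytes) -> str:
    """Guess whether uploaded bytes are UTF-8 (hypothetical helper, heuristic only)."""
    try:
        # Strict decoding raises on any byte sequence that is not well-formed UTF-8.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"
    # Pure ASCII decodes fine too, but gives no evidence either way.
    if all(ord(ch) < 0x80 for ch in text):
        return "ascii"
    # At least one multi-byte character decoded cleanly: probably UTF-8.
    return "utf-8"
```

The "utf-8" answer remains a guess: a file in another 8-bit encoding can, by coincidence, contain only byte sequences that are also valid UTF-8.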
For now, I'll brush up my limited knowledge of reading Streams and implement a naive search for UTF-8 encoded German umlaut sequences in the uploads. Far from perfect, I know. There will be lots of "but that won't work for X (like French characters)".
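The naive umlaut search could look roughly like this, again sketched in Python only to show the idea. The names `UMLAUT_SEQUENCES` and `first_umlaut_offset` are invented for this example, and including ß alongside the umlauts is my own assumption.

```python
# UTF-8 byte sequences for the German umlauts plus sharp s (assumed character set).
UMLAUT_SEQUENCES = tuple(
    ch.encode("utf-8")
    for ch in "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df"  # ä ö ü Ä Ö Ü ß
)


def first_umlaut_offset(data: bytes) -> int:
    """Return the byte offset of the first UTF-8 encoded umlaut, or -1 if none is found."""
    hits = [pos for pos in (data.find(seq) for seq in UMLAUT_SEQUENCES) if pos >= 0]
    return min(hits) if hits else -1
```

A result of -1 does not prove the upload is not UTF-8. A file containing only "café", for instance, has no German umlaut, so this check misses it, which is exactly the "won't work for X" limitation.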
Anyway, it's great that Instantiations has this area on their radar and that we'll get a libicu-based solution in 9.3! Thanks for that, keep up the good work!
Joachim