On Thu, 06 Jun 2013 18:59:37 +0200, Simon Krahnke <over...@gmx.li>
wrote in <877gi7b...@xts.gnuu.de>:
>* Charles Calvert <cb...@yahoo.com> (18:58) wrote:
>
>>On Sun, 02 Jun 2013 15:08:04 +0200, Simon Krahnke <over...@gmx.li>
>>wrote in <87fvx0b...@xts.gnuu.de>:
>>
>>>* Charles Calvert <cb...@yahoo.com> (17:00) wrote:
>>>
>>>> On Fri, 31 May 2013 07:43:03 +0200, Simon Krahnke <over...@gmx.li>
>>>>
>>>>>* Charles Calvert <cb...@yahoo.com> (2013-05-30) wrote:
>>>>>
>>>>>> 1.9, on the other hand, has built-in support for encoded strings and
>>>>>> conversion for file i/o. Here's some demo code that I wrote for a
>>>>>> talk that I gave on Unicode in Ruby:
>>>>>>
>>>>>> #!/usr/bin/env ruby
>>>>>> # encoding: UTF-8
>>>>>>
>>>>>> File.open('utf8.txt', 'w') do |file|
>>>>>>   puts "Writing a UTF-8 file"
>>>>>>   file.write('Tomás')
>>>>>
>>>>> That String is UTF-8 because of the default encoding specified in the
>>>>> encoding magic comment above.
>>>>
>>>> Correct.
>>>>
>>>>> But why is the file written in UTF-8? For the same reason?
>>>>
>>>> I believe so, though I haven't checked the source to verify.
>
>I've looked through the code, and it looks to me like the default is
>Encoding.default_external, which seems to be initialized from the locale,
>not from the source file's encoding. I can't find a way to get the source
>file's encoding from within Ruby.

That makes sense from what I've seen. Detecting the encoding of a
file without a BOM is a tricky process, and there are libraries to do
it, so building it into the core seems like overkill.

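For reference, Ruby 1.9 and later expose the source file's encoding through the `__ENCODING__` keyword, which follows the magic comment, while `Encoding.default_external` comes from the locale (or the `-E` command-line option). The sketch below prints these settings side by side. The helper name `encoding_report` is hypothetical, and the values you see will depend on your locale and Ruby version.

```ruby
# encoding: UTF-8

# Hypothetical helper: gathers the encodings that are in play for this script.
def encoding_report
  {
    source:   __ENCODING__,              # set by the magic comment (or the default)
    literal:  'Tomás'.encoding,          # non-ASCII literals take the source encoding
    external: Encoding.default_external, # derived from the locale or -E
    locale:   Encoding.find('locale')    # the locale's character map
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc}" }
```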
>>> But you can make it explicit, like you did for reading, can't you?
>>
>> Yes, as well as specifying an in-memory encoding that is different
>> from the file's encoding on disk.
>
>puts and the like seem to just write the string out in that internal
>encoding, right?

The internal encoding of the string, yes.

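Making the encodings explicit, including an in-memory encoding that differs from the on-disk one, uses the `external:internal` form of the mode string. A minimal sketch, assuming a scratch file in a temporary directory (the helper name `latin1_round_trip` is hypothetical): the UTF-8 text is transcoded to ISO-8859-1 on write and converted back to UTF-8 on read.

```ruby
require 'tmpdir'

# Hypothetical helper: stores 'Tomás' as ISO-8859-1 and reads it back as UTF-8.
def latin1_round_trip
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'latin1.txt')

    # 'w:ISO-8859-1' sets the external (on-disk) encoding, so the UTF-8
    # string is transcoded when written.
    File.open(path, 'w:ISO-8859-1') { |f| f.write("Tom\u00E1s") }

    on_disk = File.binread(path) # raw bytes, no decoding

    # 'r:ISO-8859-1:UTF-8' means external:internal, so data is converted on read.
    in_memory = File.open(path, 'r:ISO-8859-1:UTF-8', &:read)

    [on_disk.bytesize, in_memory.encoding, in_memory]
  end
end

p latin1_round_trip
```

`Dir.mktmpdir` returns the block's value and removes the directory afterwards, so the sketch leaves nothing behind.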
>>> I think that would be a good idea, to keep things local. Someone
>>> might change the encoding of the source file, and then the output
>>> file will have a different encoding.
>>
>> Except that specifying the encoding doesn't transform the data if the
>> actual encoding is something other than what you specified. Maybe I
>> misunderstood you.
>
>That was based on false premises anyway. The internal encoding doesn't
>determine the default encoding of files written; the locale does.

From my testing, it appears to be the encoding of the string written
to the file, rather than the locale.

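That observation matches the documented behaviour of `IO#external_encoding`: an IO opened for writing with no encoding reports `nil`, and strings are written out as their raw bytes. A sketch that writes the same word tagged as UTF-8 and as ISO-8859-1, assuming `Encoding.default_internal` is unset (the default). The helper name `bytes_written_as_is` is hypothetical. The UTF-8 file should come out one byte longer, since `á` takes two bytes in UTF-8 and one in Latin-1.

```ruby
require 'tmpdir'

# Hypothetical helper: writes "Tomás" in two encodings through IOs opened
# without any encoding, and reports what ended up on disk.
def bytes_written_as_is
  Dir.mktmpdir do |dir|
    %w[UTF-8 ISO-8859-1].map do |name|
      path = File.join(dir, "#{name}.txt")
      ext = File.open(path, 'w') do |f|
        f.write("Tom\u00E1s".encode(name)) # bytes pass through unchanged
        f.external_encoding                # nil: no encoding was requested
      end
      [name, ext, File.binread(path).bytesize]
    end
  end
end

bytes_written_as_is.each { |row| p row }
```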
>>> Some other application might try to read the file as UTF-8, though.
>>
>> Yes. You have to be careful with encodings. :)
>
>Which should likewise expect the file to be encoded in whatever the
>locale says.

I never assume when it comes to user input. :)

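Declaring an encoding only labels the data; it does not convert it. A sketch, assuming `Encoding.default_internal` is unset: Latin-1 bytes read with `encoding: 'UTF-8'` come back tagged as UTF-8 but fail `valid_encoding?`. The helper name `mislabeled_read` is hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: the bytes on disk are Latin-1, but we claim UTF-8.
def mislabeled_read
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'latin1.txt')
    File.binwrite(path, "Tom\xE1s".b) # 0xE1 is 'á' in ISO-8859-1

    # Only an external encoding is given, so nothing is transcoded:
    # the string is merely tagged as UTF-8.
    str = File.read(path, encoding: 'UTF-8')
    [str.encoding, str.valid_encoding?, str.bytesize]
  end
end

p mislabeled_read
```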
>>> For string literals there is no way to declare the encoding locally,
>>
>> No, but you can escape them (e.g. "\x00\x50\x00\x65\x00\xF1\x00\x61")
>> if you need a literal in an encoding other than the default.
>
>But that string will still be tagged with the source file's encoding.

String#force_encoding is useful there.
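A sketch of that approach, assuming a UTF-8 source file: the escaped literal from the thread spells "Peña" in UTF-16BE, `force_encoding` relabels the bytes without changing them, and `encode` then performs an actual conversion. The constant `RAW` and the helper `relabel_utf16` are hypothetical names.

```ruby
# encoding: UTF-8

# The escapes spell "Peña" in UTF-16BE, but the literal itself is tagged
# with the source file's encoding.
RAW = "\x00\x50\x00\x65\x00\xF1\x00\x61"

# Hypothetical helper: relabel as UTF-16BE, then really convert to UTF-8.
def relabel_utf16(raw)
  # force_encoding changes only the label; dup keeps the caller's string
  # untouched in case literals are frozen.
  utf16 = raw.dup.force_encoding(Encoding::UTF_16BE)
  utf16.encode(Encoding::UTF_8) # new bytes, converted from UTF-16BE
end

puts RAW.encoding       # the source encoding
puts relabel_utf16(RAW) # prints Peña
```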