I made some progress.
[By the way, NetBeans's console displays *everything* 100% fine.
I deliberately chose one of the worst REPL consoles, IntelliJ's,
because I want to make sure I really understand the point behind
all this.]
(import '(java.io PrintWriter PrintStream FileInputStream)
        '(java.nio CharBuffer ByteBuffer)
        '(java.nio.charset Charset CharsetDecoder CharsetEncoder)
        '(org.xml.sax InputSource))
(require 'clojure.xml)
(def utf8 "UTF-8")
(def d-utf8 (.newDecoder (Charset/forName utf8)))
(def e-utf8 (.newEncoder (Charset/forName utf8)))
(def latin1 "ISO-8859-1")
(def d-latin1 (.newDecoder (Charset/forName latin1)))
(def e-latin1 (.newEncoder (Charset/forName latin1)))
(defmacro with-out-encod
  [encoding & body]
  `(binding [*out* (PrintWriter.
                     (PrintStream. System/out true ~encoding) true)]
     ~@body
     (flush)))
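To separate what the PrintStream does from what the console does, here is a sketch in plain Java (class name is mine) of the byte-level effect of that macro: the stream picks the bytes, the console picks the glyphs. Capturing the bytes instead of printing them shows the two encodings really do emit different output for the same string.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class OutEncoding {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "québécois français"; // 18 chars, 3 of them accented
        // Same wrapping as the macro's (PrintStream. System/out true encoding),
        // but over a byte buffer so we can inspect what was written.
        ByteArrayOutputStream latin1Bytes = new ByteArrayOutputStream();
        ByteArrayOutputStream utf8Bytes = new ByteArrayOutputStream();
        new PrintStream(latin1Bytes, true, "ISO-8859-1").print(s);
        new PrintStream(utf8Bytes, true, "UTF-8").print(s);
        // latin-1 is one byte per char; UTF-8 uses two bytes for é, é, ç.
        System.out.println(latin1Bytes.size()); // 18
        System.out.println(utf8Bytes.size());   // 21
    }
}
```

If both byte streams are correct and the console still shows `?`, the damage happened before printing, when the string itself was built.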
(def s "québécois français")
(print s) ;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print s)) ;qu?b?cois fran?aisnil
(with-out-encod utf8 (print s)) ;qu?b?cois fran?aisnil
(def encoded (.encode e-utf8 (CharBuffer/wrap s)))
(def s-d
(.toString (.decode d-utf8 encoded)))
(print s-d) ;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print s-d)) ;qu?b?cois fran?aisnil
(with-out-encod utf8 (print s-d)) ;qu?b?cois fran?aisnil
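That `s-d` prints exactly like `s` is expected: an encode/decode round trip through the same charset is lossless, so it returns the characters it was given, garbage included. A minimal Java sketch of the same CharsetEncoder/CharsetDecoder round trip (class name is mine):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class RoundTrip {
    public static void main(String[] args) throws CharacterCodingException {
        String s = "québécois français";
        Charset utf8 = Charset.forName("UTF-8");
        // chars -> bytes -> chars through the same charset.
        ByteBuffer encoded = utf8.newEncoder().encode(CharBuffer.wrap(s));
        String decoded = utf8.newDecoder().decode(encoded).toString();
        // Lossless: whatever characters s held, decoded holds them too.
        System.out.println(decoded.equals(s)); // true
    }
}
```

So the round trip can't repair a string that was already mis-decoded when the REPL read the source file.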
(def f-d
  (:content (let [x (InputSource. (FileInputStream. "french.xml"))]
              (.setEncoding x latin1)
              (clojure.xml/parse x))))
(print f-d) ;quÔøΩbÔøΩcois franÔøΩaisnil
(with-out-encod latin1 (print f-d)) ;québécois français
(with-out-encod utf8 (print f-d)) ;québécois français
So my theory, which is still almost certainly wrong, is:
1. When the input is a file whose encoding is known (say, latin-1),
it's easy to decode it and then re-encode it however one wants.
2. When the input is a literal string in the source file, it seems
impossible to encode it correctly unless one first decodes it using
the source file's encoding. But I don't yet know how to do that
without actually reading the source file. :\
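For what it's worth, there is one case where a mis-decoded literal can be repaired after the fact without rereading the file: when the wrong decoding was lossless. Reading UTF-8 bytes as latin-1 is lossless, since every byte maps to some latin-1 character, so the original bytes survive inside the string and can be re-extracted. A sketch (class name is mine):

```java
import java.io.UnsupportedEncodingException;

public class RepairLiteral {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Simulate a UTF-8 source file read as latin-1: lossless mojibake.
        String correct = "québécois français";
        String mangled = new String(correct.getBytes("UTF-8"), "ISO-8859-1");
        // Undo it: recover the raw bytes, then decode them as UTF-8.
        String repaired = new String(mangled.getBytes("ISO-8859-1"), "UTF-8");
        System.out.println(repaired.equals(correct)); // true
        // Caveat: if the reader replaced unrecognized bytes with U+FFFD
        // (which the '?' output suggests), the original bytes are gone
        // and no amount of re-decoding can bring them back.
    }
}
```

Whether this trick applies here depends on how the REPL read the file, which is exactly the open question in point 2.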