I wrote some UTF-8 encoding/decoding code based on a state machine a while back, using principles like this:
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ but never completed it. I think it's in some never-applied patch to Apache Avro somewhere; the speedup was not big enough to justify the additional code complexity in the project. If I recall correctly, it helped quite a bit more in one direction than the other. Also, at the time, passing the string "UTF-8" to the JRE was faster than passing a Charset constant, because the former was better optimized and created less garbage. I don't know whether Java 8 has cleaned that up.
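
To make the state-machine idea concrete, here is a minimal sketch in Java. It is not the Avro patch and not the table-driven DFA from the page linked above; it is a simplified hand-written variant where the state is just the number of continuation bytes still expected. The class and method names (Utf8StateMachine, decode) are made up for illustration, and no performance claim is implied.

```java
final class Utf8StateMachine {
    /**
     * Decodes UTF-8 bytes into a String.
     * Throws IllegalArgumentException on malformed input
     * (bad lead byte, bad continuation byte, overlong form,
     * encoded surrogate, out-of-range code point, truncation).
     */
    static String decode(byte[] in) {
        StringBuilder out = new StringBuilder(in.length);
        int remaining = 0; // state: continuation bytes still expected
        int codePoint = 0; // bits accumulated for the current sequence
        int min = 0;       // smallest code point allowed for this length

        for (byte raw : in) {
            int b = raw & 0xFF;
            if (remaining == 0) {
                // Start state: classify the lead byte.
                if (b < 0x80) {
                    out.append((char) b);          // ASCII, one byte
                } else if (b >= 0xC2 && b <= 0xDF) {
                    remaining = 1; codePoint = b & 0x1F; min = 0x80;
                } else if ((b & 0xF0) == 0xE0) {
                    remaining = 2; codePoint = b & 0x0F; min = 0x800;
                } else if (b >= 0xF0 && b <= 0xF4) {
                    remaining = 3; codePoint = b & 0x07; min = 0x10000;
                } else {
                    throw new IllegalArgumentException("invalid lead byte");
                }
            } else {
                // Continuation state: byte must look like 10xxxxxx.
                if ((b & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("invalid continuation byte");
                }
                codePoint = (codePoint << 6) | (b & 0x3F);
                if (--remaining == 0) {
                    boolean overlong = codePoint < min;
                    boolean surrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
                    if (overlong || surrogate || codePoint > 0x10FFFF) {
                        throw new IllegalArgumentException("invalid code point");
                    }
                    out.appendCodePoint(codePoint);
                }
            }
        }
        if (remaining != 0) {
            throw new IllegalArgumentException("truncated sequence");
        }
        return out.toString();
    }
}
```

For well-formed input this should agree with new String(bytes, StandardCharsets.UTF_8). The one deliberate difference is on malformed input: this sketch throws, while the JDK constructor substitutes replacement characters.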