The new encoding implementation in JDK7, and I want to use it.


罗勇刚(Yonggang Luo)

Jul 24, 2009, 9:26:15 AM
to Google Groups
Some time ago I implemented the GBK charset converter in Falcon, using
the same method as the JDK. At the time I wondered why the charset
converters in the JDK7 repo were changing so much, and why many
encodings, such as ISO_8859_2 through ISO_8859_16, had been deleted
from the repo. I had to get them from an earlier version of the JDK7
repo. Now I know why: the JDK7 tl (Tools and Libraries) team is
implementing a new method for the charset converters, which makes
their source code much smaller. I want to use this new method; it is
not very mature yet, but at least it works :).
Also, at that time I did not know why the JDK7 charset converters keep
the "Encoder" and "Decoder" in separate classes. Now I think I
understand: it is to save space. We do not always need to convert in
both directions. For example, sometimes we only need to convert GBK to
UTF-32 and never UTF-32 to GBK, so we only need to load one of the two
tables. The current implementation, however, stores both tables in the
same converter class. This is not very important right now, but it
should be considered once we need to optimize Falcon :)
The new method from JDK7 tl is very similar to an idea Giancarlo
Niccolai once told me about. It stores the table data on the local
drive instead of keeping it in memory as we did before. When we need a
table we load it once, and because the tables are read-only we do not
need to load multiple copies of them.
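
A minimal C++ sketch of the load-once idea, to make it concrete. All
names here (`TableCache`, `Table`, `Loader`) are hypothetical and are
not taken from Falcon or the JDK, and the loader is only a stand-in for
reading a table file from disk. Decode and encode tables are registered
under separate keys, so a program that only decodes never loads the
encode table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical types: one 256-entry table per charset and direction.
using Table = std::array<std::uint16_t, 256>;
using Loader = std::function<Table()>;

class TableCache {
public:
    // Register how a table is produced (assumed: by reading a file on disk).
    void registerLoader(const std::string& key, Loader loader) {
        assert(static_cast<bool>(loader));
        std::lock_guard<std::mutex> lock(mutex_);
        loaders_[key] = std::move(loader);
    }

    // Return the shared read-only table, running its loader on first use only.
    // Returns nullptr when no loader is registered for the key.
    std::shared_ptr<const Table> get(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto cached = tables_.find(key);
        if (cached != tables_.end()) return cached->second;

        auto loader = loaders_.find(key);
        if (loader == loaders_.end()) return nullptr;

        std::shared_ptr<const Table> table =
            std::make_shared<Table>(loader->second());
        tables_[key] = table;
        return table;
    }

    // Number of tables actually loaded so far.
    std::size_t loadedCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return tables_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, Loader> loaders_;
    std::map<std::string, std::shared_ptr<const Table>> tables_;
};
```

Every converter that asks for the same key receives the same
`shared_ptr`, so only one copy of each table exists in memory.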
The new method has two main variants: SingleByte and DoubleByte.
ISO_8859_2 through ISO_8859_16 belong to SingleByte, and GBK belongs
to DoubleByte.
The implementation is very simple. I have attached these two
programs :)
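
For readers without the attachments, here is a rough C++ sketch of the
two lookup styles. It is not the attached code and not the JDK's
implementation; the class layout and method names are assumptions made
for illustration. A single-byte charset needs one 256-entry table. A
double-byte charset such as GBK uses the lead byte to select a row and
the trail byte to select a column (GBK trail bytes lie in 0x40 to 0xFE).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr char32_t kUnmappable = 0xFFFD;  // U+FFFD REPLACEMENT CHARACTER

// Single-byte charset: the byte value indexes one flat table.
class SingleByteDecoder {
public:
    SingleByteDecoder() {
        // Assumption: the low half is ASCII, the high half starts unmapped.
        for (std::size_t i = 0; i < table_.size(); ++i)
            table_[i] = i < 0x80 ? static_cast<char32_t>(i) : kUnmappable;
    }
    void map(std::uint8_t byte, char32_t cp) { table_[byte] = cp; }

    std::u32string decode(const std::vector<std::uint8_t>& in) const {
        std::u32string out;
        for (std::uint8_t b : in) out.push_back(table_[b]);
        return out;
    }

private:
    std::array<char32_t, 256> table_;
};

// Double-byte charset: rows_[lead][trail - trailMin_] holds the code point.
class DoubleByteDecoder {
public:
    DoubleByteDecoder(std::uint8_t trailMin, std::uint8_t trailMax)
        : trailMin_(trailMin), trailMax_(trailMax), rows_(256) {
        assert(trailMin <= trailMax);
    }

    void map(std::uint8_t lead, std::uint8_t trail, char32_t cp) {
        assert(trail >= trailMin_ && trail <= trailMax_);
        auto& row = rows_[lead];
        // Rows are allocated only for lead bytes that are actually used.
        if (row.empty())
            row.assign(static_cast<std::size_t>(trailMax_ - trailMin_ + 1),
                       kUnmappable);
        row[trail - trailMin_] = cp;
    }

    std::u32string decode(const std::vector<std::uint8_t>& in) const {
        std::u32string out;
        for (std::size_t i = 0; i < in.size(); ++i) {
            const std::uint8_t b1 = in[i];
            if (b1 < 0x80) {  // ASCII passes through as one byte
                out.push_back(b1);
                continue;
            }
            if (i + 1 >= in.size()) {  // truncated two-byte sequence
                out.push_back(kUnmappable);
                break;
            }
            const std::uint8_t b2 = in[i + 1];
            const auto& row = rows_[b1];
            if (row.empty() || b2 < trailMin_ || b2 > trailMax_) {
                out.push_back(kUnmappable);  // skip only the lead byte
                continue;
            }
            out.push_back(row[b2 - trailMin_]);
            ++i;  // the trail byte has been consumed
        }
        return out;
    }

private:
    std::uint8_t trailMin_;
    std::uint8_t trailMax_;
    std::vector<std::vector<char32_t>> rows_;
};
```

For example, mapping byte 0xA1 to U+0104 gives the ISO-8859-2 letter
"Ą", and mapping the pair 0xD6 0xD0 to U+4E2D gives the GBK character
"中". A real table would be filled from the charset's mapping data
rather than entry by entry.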

--
        Regards,

罗勇刚
Yonggang Luo

DoubleByte.java
SingleByte.java

Giancarlo Niccolai

Jul 24, 2009, 9:44:19 AM
to FalconPL
Hello Yonggang!
Nice to hear from you!

On 24 Jul, 15:26, 罗勇刚(Yonggang Luo) <luoyongg...@gmail.com> wrote:
> Some time ago I implemented the GBK charset converter in Falcon, using
> the same method as the JDK. At the time I wondered why the charset
> converters in the JDK7 repo were changing so much, and why many
> encodings, such as ISO_8859_2 through ISO_8859_16, had been deleted
> from the repo.

No, they're not deleted at all. I just moved the UTF* classes to a
place where they can be instantiated directly, while all the other
transcoders can be created only through the factory function
TranscoderFactory. ISO-8859* are fully supported.


> I had to get them from an earlier version of the JDK7 repo. Now I
> know why: the JDK7 tl (Tools and Libraries) team is implementing a
> new method for the charset converters, which makes their source code
> much smaller. I want to use this new method; it is not very mature
> yet, but at least it works :).

Fine; if we can spare some space, it's a welcome update.

> Also, at that time I did not know why the JDK7 charset converters
> keep the "Encoder" and "Decoder" in separate classes. Now I think I
> understand: it is to save space.

A bit of a "java-ism", probably. Being in separate source files, they
NEED to be separate classes.
In C++ you don't need to do that (see the Virtual Machine, which is
divided into vm.h for the inlines, vm.cpp for the service methods and
vm_run.cpp for the main loop and opcode parsing).
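
As a rough illustration of that layout (a toy class with made-up names,
not Falcon's actual VM code), one C++ class can be declared once in a
header and have its methods defined across several source files. The
three files are shown in a single block here, separated by comments:

```cpp
// ---- toy_vm.h (hypothetical): declaration plus inline accessors ----
class ToyVM {
public:
    int pc() const { return pc_; }  // inline, lives in the header
    void reset();                   // defined in toy_vm.cpp
    void run(int steps);            // defined in toy_vm_run.cpp

private:
    int pc_ = 0;  // program counter
};

// ---- toy_vm.cpp (hypothetical): service methods ----
void ToyVM::reset() { pc_ = 0; }

// ---- toy_vm_run.cpp (hypothetical): main loop ----
#include <cassert>

void ToyVM::run(int steps) {
    assert(steps >= 0);
    // Stand-in for fetching and executing one opcode per step.
    for (int i = 0; i < steps; ++i) ++pc_;
}
```

It is still a single class, so encoder and decoder code could live in
different files without being forced into different classes.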

> We do not always need to convert in both directions. For example,
> sometimes we only need to convert GBK to UTF-32 and never UTF-32 to
> GBK, so we only need to load one of the two tables. The current
> implementation, however, stores both tables in the same converter
> class. This is not very important right now, but it should be
> considered once we need to optimize Falcon :)
> The new method from JDK7 tl is very similar to an idea Giancarlo
> Niccolai once told me about. It stores the table data on the local
> drive instead of keeping it in memory as we did before. When we need
> a table we load it once, and because the tables are read-only we do
> not need to load multiple copies of them.
> The new method has two main variants: SingleByte and DoubleByte.
> ISO_8859_2 through ISO_8859_16 belong to SingleByte, and GBK belongs
> to DoubleByte.
> The implementation is very simple. I have attached these two
> programs :)
>
> --

You're a well-respected developer in our project, so we do trust your
skills. If you think the changes are working (i.e. you can test them
and they work), just commit them to the trunk, and the other
developers will double check them too; if you think you need a bit of
support, open an svn branch and do your experimentation there, inviting
the other developers to test them.

(This is valid in general for all the developers in our project).

Gian.
