RLE performs so poorly compared to modern compression algorithms, that
I don't think there is much interest in them today.
Why don't you just use regular zip (deflate) algorithm, which is
build into Java.
Arne
That could just as well have been in Chinese - I completely don't get it.
Arne
Arne Vajhøj wrote:
> That could just as well have been in Chinese - I completely don't get it.
+1
Also, librtchx, please use "reply" to answer posts instead of beginning a new
thread each time. And quit with the tildes if that's not too much to request.
--
Lew
Ceci n'est pas une pipe.
Again, you couldn't use "reply" to keep the thread intact?
Why not?
ATTRIBUTE YOUR CITATIONS!
Arne Vajhøj wrote:
>> That could just as well have been in Chinese - I completely don't get it.
lbrt chx _ gemale kom wrote:
> OK. The technical term (someone's) "finger" (or was it "hand"?)
> was coined during WW2 by "intelligence" services wiretapping each
> other's morse [sic] code messages. They didn't know the actual persons
> on the other end but they did know "their finger" by the slight, but
> very much identifying, difference in which they tapped on their telegraph,
The term is "fist", not "finger".
http://en.wikipedia.org/wiki/Fist
http://en.wiktionary.org/wiki/fist #3
Perhaps you're thinking of
http://en.wikipedia.org/wiki/Finger_(gesture)
It is not clear what value RLE provide in this context.
Arne
On 12/26/2010 09:32 PM, lbrt chx _ gemale kom wrote:
> Actually, I just coded an implementation myself, but as I was testing my code, I got an Exception that to me looks like a bug more than an error from myself
> ~
Stupid tildes.
Since you don't show the code "from yourself" we have to remain suspicious of
your conclusion.
> Isn't java.lang.StringBuffer supposed to do its internal buffering by itself?
As it does, but one question - why are you using 'StringBuffer' instead of
'StringBuilder' instead of letting the JVM handle it?
> ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
> ~
You sure do love your tildes, don't'cha there, sport?
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 16
> at java.lang.AbstractStringBuilder.insert(AbstractStringBuilder.java:979)
> at java.lang.StringBuffer.insert(StringBuffer.java:446)
> at AUtils00Test.main(AUtils00Test.java:104)
Show us your code. USE REPLY! If it isn't /too/ much trouble. Yeesh!
lbrt chx _ gemale kom wrote:
>> Actually, I just coded an implementation myself, but as I was testing my
>> code, I got an Exception that to me looks like a bug more than an error from
>> myself
>> ~
Lew wrote:
> Since you don't show the code "from yourself" we have to remain suspicious of
> your conclusion.
lbrt chx _ gemale kom wrote:
>> Isn't java.lang.StringBuffer supposed to do its internal buffering by itself?
Had you actually read the API documentation for the 'StringBuffer#insert()'
family of methods, you'd've noticed that they're *documented* to throw
'IndexOutOfBoundsException' under certain usage patterns. Does your code use
those patterns?
Lew wrote:
> As it does, but one question - why are you using 'StringBuffer' instead of
> 'StringBuilder' instead of letting the JVM handle it?
...
It is supposed to increase the buffer size by itself when you append.
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 16
> at java.lang.AbstractStringBuilder.insert(AbstractStringBuilder.java:979)
> at java.lang.StringBuffer.insert(StringBuffer.java:446)
> at AUtils00Test.main(AUtils00Test.java:104)
> ~
> /java/lang/StringBuffer.java's insert should "ensureCapacity_unsynchronized" before doing its business as it does for all other methods; which you can see from:
It should expand the buffer if necessary to hold the extra
characters being pushed out by the insert.
If you try and insert beyond the end it is supposed to throw
an exception.
> http://www.opensource.apple.com/source/gccfast/gccfast-1622/libjava/java/lang/StringBuffer.java
> ~
> /** Insert the<code>String</code> argument into this<code>StringBuffer</code>.
> * @param offset the place to insert.
> * @param str the<code>String</code> to insert.
> * @return this<code>StringBuffer</code>.
> * @exception IndexOutOfBoundsException if<code>offset</code> is out
> * of range for this<code>StringBuffer</code>.
which is documented here!
> */
> public synchronized StringBuffer insert (int offset, String str)
> {
> if (offset< 0 || offset> count)
> throw new StringIndexOutOfBoundsException (offset);
and done here!
> // Note that using `null' is from JDK 1.2.
> if (str == null)
> str = "null";
> int len = str.length();
> ensureCapacity_unsynchronized (count+len);
And making this call *after* the test means that the
implementation follow what it is documented to do (and
not your expectations).
> System.arraycopy(value, offset, value, offset+len, count-offset);
> str.getChars(0, len, value, offset);
> count += len;
> return this;
> }
Arne
RLE is quite easy to implement in Java.
You might look into a more sophisticated version using controlbytes or
control sequences.
You chose the byte that is the least likely to occur in the data stream
as control byte and encode runs like follows ("#" being the
controlbyte):
#12WB#12W#3B#24WB#14W
instead of: 12W1B12W3B24W1B14W
That approach will waste some bytes for each run, but it will save on
random data, because you'd get:
WBWBWBWBWBWBWB
Instead of: 1W1B1W1B1W1B1W1B1W1B1W1B1W1B
Controlbytes itself can be encoded like: "#1").
An even shorter method is to substitute the controlbytes by using the
first two characters of a run to act as control sequence, so you'd get:
WW10BWW10BB1WW22BWW12
Any time your decoder finds two subsequent characters, it assumes the
next byte is a count for the same characters to follow.
Controls bytes offer more flexibility though, because for 2-byte-runs
you might introduce a second controlbyte for another coding scheme.
Imagine 2-byte runs:
$7WB
Anyway, since it's hard to know the optimum chunk-size, which may be
locally different, you're usually better off with using an LZSS
(Lempel,Ziv,Storer,Szymanski variant of LZ77 with controlbyte)
for anything else but simple runs.
LZSS encodes using references to previous ocurrences, using a
controlbyte:
The whiskymixer mixes whisky in a whiskymixer.
becomes
The whiskymixer $6,4s$18,7 in a$30,12.
where
$6,4
reads: copy from cursor position minus 6, 4 characters.
Any sequence smaller than 3 bytes gets ignored, any controlbyte gets
encoded as $1.
The end of file can be denoted by a zero offset or a zero run length:
#0 or $0 respectively.
Please note that all numbers in the schemes are raw values, no text, and
that the comma I used in the LZSS sequences is just a separator for the
human eye.
So you could imagine a hybrid approach using different controlbytes (or
"escapes", as they are also called) for different encodings:
#=RLE (run length, byte)
$=RLE2 (run length, 2 bytes)
%=LZSS (displacement, n)
&=LZSS+run (displacement, n, run length)
There have been several compressors which use a hybrid approach and more
sophisticated encoding of the encoding sequence, for example with Elias
Gamma codes, which encode near sequences with less bits than far
sequences.
Be warned that LZSS is not quite the fastest way to compress data, but
it's rather easy to implement and decompression is tivial.
This could be an interesting read on a hybrid approach, there is:
http://www.cs.tut.fi/~albert/Dev/pucrunch/
http://www.cs.tut.fi/~albert/Dev/pucrunch/packing.html
The latter being a link with good explainations to different algorithms.
As as alternative you might like to dive into arithmetic coding:
http://en.wikipedia.org/wiki/Arithmetic_coding
http://www.dogma.net/markn/articles/arith/part1.htm
http://www.colloquial.com/ArithmeticCoding/
With the first link being a more abstract explanation, the second link
being a very good introduction with some c-sourcecode and the last one
some Java implementation.
Kind regards,
Wanja
--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---
Why LZSS and not LZH?
I would expect LZH to always give better compression and LZH comes
with Java so no need for a third party library.
(OK - deflate is a not a simple LZH but has some tweaks, but those
tweaks should not be a problem)
Arne
LZH ist, as far as I know, a hybrid of LZSS and adaptive Huffman coding,
which means that the escape sequence, the reference and the message
itself are decoded from an apadptive huffman tree. Since I guess the
original poster is quite new to the topic, I would not recommend to try
this for a start.
LZH is, as far as I know, implemented in Winzip. Java already knows the
ZipOutputStream and the ZipInputStream. No need to reinvent the wheel.
;-)
Kind Regards,
Exactly.
Even a new to the topic could use it.
Arne
> > // Note that using `null' is from JDK 1.2.
> > if (str == null)
> > str = "null";
> > int len = str.length();
> > ensureCapacity_unsynchronized (count+len);
> And making this call *after* the test means that the
> implementation follow what it is documented to do (and
> not your expectations).
~
Arne you tease me (as many other people) telling me I code java as if I was coding C, but here I think you are talking like a politician ;-)
~
I do understand it is per spec as part of the implementation of java.lang.StringBuffer being append("...") its primary method, but to me it comes as more natural to do the buffering strategy first in any case if possible or throw a run time exception if is not. But then again it is my opinion. To me StringBuffer is just a facade to an underlying character buffering strategy
~
I just went ahead and coded the buffering myself in a C-style way
~
lbrtchx
Why do you not answer the questions pertaining to your posts?
lbrt chx _ gemale kom wrote:
(Why did you not attribute the citation? Especially without the threading
provided by "reply", it's nearly impossible to follow the thread the way you
post.)
[attribution restored] Arne Vajhøj wrote:
>> And making this call *after* the test means that the
>> implementation follow what it is documented to do (and
>> not your expectations).
lbrt chx _ gemale kom:
> Arne you tease me (as many other people) telling me I code java [sic]
> ~as if I was coding C, but here I think you are talking like a politician ;-)
What does that mean, "talking like a politician"? He's telling you that the
code works as it's documented to do. That's talking like an engineer.
> I do understand it is per spec as part of the implementation of
> ~java.lang.StringBuffer being append("...") its primary method,
> but to me it comes as more natural to do the buffering strategy first
> in any case if possible or throw a run time exception if is not.
It's vital that the code work as documented, hence why it's coded that way.
> But then again it is my opinion. To me StringBuffer is just a facade to an
> underlying character buffering strategy
And what is 'StringBuilder' to you?
> ~I just went ahead and coded the buffering myself in a C-style way
Sure, because re-inventing the wheel is always better, particularly when you
cannot bring yourself to use the standard API as it's documented to work.
You never did show us your code that had the bug, despite our requests. Now
that's participatory!
Lars Enderin wrote:
> He doesn't seem to use a normal newsreader. Maybe he's written it
> himself? I found the header X-Newsreader: NetComponents.
> When I google that, the first hit refers to a Java example on using
> PostMessage.java. Whatever program he's using may not be able to post a
> proper reply.
Interesting speculation. I wonder if the OP would confirm that if he ever
/actually answered the question/.
If true, then it's incumbent on the OP to use one of the plethora of readers,
many free, that do support "reply". It ain't much of a newsreader if it
doesn't, and librtchx would be well-advised in their own interest to find a
way to keep the threads connected.
Between that and refusal to answer germane questions, the OP might find
themself alone on the playing field. This is a discussion group, not a bunch
of lackeys whose job it is to hunt around for connected messages to help a
non-participatory librtchx.
> >> Why LZSS and not LZH?
[..]
> > LZH is, as far as I know, implemented in Winzip. Java already knows
the
> > ZipOutputStream and the ZipInputStream. No need to reinvent the wheel.
>
> Exactly.
>
> Even a new to the topic could use it.
Well, but that's boring isn't it? If you like to get into the topic, why
not start with something simple?
I find teh compression stuff really interesting!
Cheers,
-Wanja-
I did too. 20 years ago. Implementing LZW and LZSS in Fortran
is so fun.
:-)
Arne
Perhaps comp.compression or comp.compression.research would be more
useful to you than comp.lang.java.programmer or alt.undersea.baking ...
--
Eric Sosman
eso...@ieee-dot-org.invalid
I know comp.compression, no worries.
Kind regards
Wanja
Then what are you doing here? No, don't explain: Just go away.
--
Eric Sosman
eso...@ieee-dot-org.invalid
I believe that he answered a question that somebody posted
at cljp.
An answer that did talk about Java.
The thread then drifted a little bit off topic, but that
is seen before.
Arne
> Then what are you doing here? No, don't explain: Just go away.
Eric, have you been taking Lew's pills ?!
-- chris
Chris Uppal wrote:
> Eric, have you been taking Lew's pills ?!
I have no beef with Wanja. Nor would I say that algorithm discussions are off
topic here. Don't Java programmers need to know about algorithms?
I am a little disappointed in you, Chris.
> >>> [...]
> >>> I find teh compression stuff really interesting!
> >>
> >> Perhaps comp.compression or comp.compression.research would be more
> >> useful to you than comp.lang.java.programmer or alt.undersea.baking ...
> >
> > I know comp.compression, no worries.
>
> Then what are you doing here? No, don't explain: Just go away.
Well, Eric, I guess other readers of this newsgroup will judge better
whose manners are inappropriate here.
As for me, I've had enough of you. "Thank you" for wasting my time.
*plonk*
-Wanja-
Mistake.
Eric is not like that in general and often provides
good information.
I suspect that he got you mixed up with somebody else.
Arne