Can anyone suggest one or two good compression for dictionary
database?
Thanks alot!
A prefix tree, since dictionary entries usually come sorted,
and many words in a dictionary are prefixes of others.
> Thanks alot!
Thanks for advise!
How about using LZW then use prefix tree searching algorithm?
Since many papers said that LZW is the best compression for text data.
Btw, is prefix tree a compression algorithm or searching algorithm? (or both?)
Thanks in advance!
Regards,
Sara
No, it isn't even remotely so. The markov-modeling methods combined
with arithmetic coders are the best performing at the moment.
But typically to do LZW or LZ78 compression algorithms, it is
common (but not essential) to build a prefix tree to implement them.
> Btw, is prefix tree a compression algorithm or searching algorithm? (or both?)
It is a data structure which supports efficient algorithms to perform
queries.
Once you construct one, it is easy to search whether any given word
is in the structure or not.
The point is that if you store a dictionary it is usually in alphabetical
order, and you want to search it in alphabetical order.
You do not have to compress arbitrary sequences of words in arbitrary
orders, and thus you can gain.
Then you need an efficient way to serially encode a prefix tree
but that should not be hard at all.
Yes, quite true; in fact, to the OP, what papers actually said
that LZW is the best compression for text data? I'd be interested to take
a look at one or two, just for interest sake...
Ray
Thanks very much!
To do what?
Generally you have to say what words are in the dictionary, write
algorithms to search it, and write algorithms to serialize and
deserialize it efficiently if you want to 'compress' a dictionary.
You may need a slight modifcation to a prefix tree.
In a prefix tree, for every word which is in a tree, all prefixes of
it are also in a tree. Like if ABC is in a tree then AB and A is in a
tree. Think of a tree with nodes descending for each extant character
successively forward in sequence. (a suffix tree goes backwards).
In a list of words like a dictionary of natural language, it is common
for many words which are in the dictionary to have prefixes which are
also in the dictionary, but it is not always so.
You will probably need additional indicators, maybe a boolean flag per
node to say 'yes this element in the tree is also in the dictionary'.
Thus the nodes of the prefix tree will be a superset of the words
in the dictionary.
The hope is that given the empirical structure of dictionaries of
natural language, this excess will not be too significant.
> Do you know where can I get the information or theory? (e.g. web
> sites) Since I can hardly find them on internet...
I think you may need a textbook on algorithms. A prefix tree is
fairly simple.
There are probably lecture notes on line which may get you started but
a real book is better.
> Thanks very much!
"caprice" <sara_...@hotmail.com> wrote in message
news:dd26c21.03082...@posting.google.com...
well, it's faster compared to some other techniques and is commonly used.
maybe that's what is meant by 'best'. 'best' is pretty subjective.
I've borrowed the book today and started to read.
Actually I'm really not familiar with those compression technique and
searching algorithm.
From your professional point of view, for a dictionary, is that
different algorithms make a big difference on performance? (in terms
of searching rate)
As I have to include background study in my report (school
assignment), I wonder if I should study and compare all kinds of
algorithms!
pardon me my poor English..