Hello!
Thank you for your answer.
And yes, a huge uncompressed dictionary may take a very long time to
load now, because this version loads it into memory record by record.
Please try to gzip your .idx file and load the same dictionary again.
I'm adding .idx loading optimization to my v0.4 to-do list.
--
Serge Matveenko
mailto:se...@matveenko.ru
http://serge.matveenko.ru/
> On Thu, Dec 18, 2008 at 10:54 AM, cocobear <cocob...@gmail.com>
> wrote:
> > Hi,
> > I found that it took more than 13 seconds to load a
> > dictionary:
> > -rw-r--r-- 1 root root 10651674 2003-11-14 langdao-ec-gb.idx
>
> Hello!
>
> Thank you for your answer.
>
> And yes, a huge uncompressed dictionary may take a very long time to
> load now, because this version loads it into memory record by record.
> Please try to gzip your .idx file and load the same dictionary again.
>
I tried this, but the load time is the same as with the ungzipped file.
> I'm adding .idx loading optimization to my v0.4 to-do list.
>
>
I think it's very IMPORTANT; no one wants to wait 12 seconds to look
up a word.
It took only 0.017s in sdcv (http://sdcv.sourceforge.net).
I think we should make "lookup" finish within 1 second.
I agree,
but it is important to understand that loading the dictionary index and
looking up a word are two different operations.
It would be great if you could modify demo.py from the examples to run
with your dictionary and then post the results here.
Thank you for your help!
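
To make the distinction concrete, here is a rough sketch that times the two steps separately. It does not use the library from this thread: `load_index` and `lookup` are hypothetical stand-ins, and the index is a plain dict built from synthetic words, so the absolute numbers mean nothing for a real .idx file.

```python
import time

def load_index(words):
    """Build an in-memory index once (hypothetical stand-in for reading an .idx file)."""
    # Each word maps to a made-up (offset, size) pair.
    return {word: (i * 10, 10) for i, word in enumerate(words)}

def lookup(index, word):
    """Look a word up in the already loaded index."""
    return index.get(word)

words = ["word%d" % i for i in range(200000)]

# Step 1: loading the index, paid once per dictionary.
start = time.perf_counter()
index = load_index(words)
load_time = time.perf_counter() - start

# Step 2: a single lookup, paid on every query.
start = time.perf_counter()
entry = lookup(index, "word12345")
lookup_time = time.perf_counter() - start

print("load:   %.6f s" % load_time)
print("lookup: %.6f s" % lookup_time)
```

Timing the two steps separately like this shows whether a slow "lookup" is really a slow load.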
OK, after profiling I've found that most of the time is spent
unpacking data from records.
I'm going to rewrite some code to make one unpack per record instead
of three. This rewrite will also let me drop some lists.
Then we could use NumPy's array interface
http://numpy.scipy.org/array_interface.shtml instead of the unpack method
from the struct module, as Alexey Smirnov advised me.
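
As an illustration of merging several unpack calls into one, here is a minimal sketch. The actual code is not shown in this thread, so this is only an assumption: it uses the fixed-size tail of a StarDict .idx entry (two big-endian unsigned 32-bit integers, data offset and data size), and `unpack_separately` is a made-up stand-in for the multi-call version.

```python
import struct

# Assumed record tail: data offset and data size, both big-endian
# unsigned 32-bit integers. Precompiling the format avoids re-parsing
# the format string for every record.
TAIL = struct.Struct(">II")

def unpack_separately(buf, pos):
    """One struct.unpack call per field (hypothetical 'before' version)."""
    offset = struct.unpack(">I", buf[pos:pos + 4])[0]
    size = struct.unpack(">I", buf[pos + 4:pos + 8])[0]
    return offset, size

def unpack_once(buf, pos):
    """A single unpack for the whole tail, without slicing the buffer."""
    return TAIL.unpack_from(buf, pos)

record_tail = struct.pack(">II", 1024, 74)
assert unpack_separately(record_tail, 0) == unpack_once(record_tail, 0)
```

Both functions return the same tuple; the second one makes fewer Python-level calls and creates no temporary slices, which is where the saving would come from.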
Rewritten. Anyone can check out the 'speedup' tag.
We now have one big unpack instead of three small ones.
I got a speedup from 4 seconds to 2.7 seconds on my PC.
> Then we could use NumPy's array interface
> http://numpy.scipy.org/array_interface.shtml instead of the unpack method
> from the struct module, as Alexey Smirnov advised me.
I will look at it later.
>
> On Thu, Dec 18, 2008 at 2:06 PM, Serge Matveenko <se...@matveenko.ru>
> wrote:
> > On Thu, Dec 18, 2008 at 11:37 AM, Serge Matveenko
> > <se...@matveenko.ru> wrote:
> >> I'm adding .idx loading optimization to my v0.4 to-do list.
> >
> > OK, after profiling I've found that most of the time is spent
> > unpacking data from records.
> >
> > I'm going to rewrite some code to make one unpack per record instead
> > of three. This rewrite will also let me drop some lists.
>
> Rewritten. Anyone can check out the 'speedup' tag.
> We now have one big unpack instead of three small ones.
> I got a speedup from 4 seconds to 2.7 seconds on my PC.
>
1 dicts load: 0:00:10.023205
(5887265, 74)
1 cords getters: 0:00:00.000241
*[hi:]
pron. 他
n. 男孩, 男人, 雄性动物
【医】 氦(2号元素)
1 direct data getters (w'out cache): 0:00:00.114292
*[hi:]
pron. 他
n. 男孩, 男人, 雄性动物
【医】 氦(2号元素)
1 high level data getters (not cached): 0:00:00.113353
*[hi:]
pron. 他
n. 男孩, 男人, 雄性动物
【医】 氦(2号元素)
1 high level data getters (cached): 0:00:00.000213
About 2 seconds on my PC.
Thank you for another test.
It looks unbelievably slow, especially on your really fast configuration.
I will write a test script using the profiler to look deeper into the problem.
However, 10 seconds is a fair result for loading such a large amount of data
with fields of varying size into memory.
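
A profiling script of that kind could look roughly like the sketch below, using the standard cProfile and pstats modules. `load_index` here is a hypothetical placeholder that just builds dummy records; the real dictionary loader would go in its place.

```python
import cProfile
import io
import pstats

def load_index(n):
    """Placeholder for the real loader (hypothetical): builds n dummy records."""
    return [(str(i), i, 1) for i in range(n)]

def profile_call(func, *args):
    """Run func under the profiler and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()

records, report = profile_call(load_index, 1000)
print(report)
```

The report lists call counts and cumulative time per function, which is enough to see whether the time goes into unpacking, string handling, or something else.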
Probably there is a better way.
There is no byte-by-byte file reading.
There is only byte-by-byte parsing, directly in memory, of the whole file,
which is read at once into a byte buffer.
> Probably there is a better way.
I will be glad to know it.
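
One possible direction, offered only as a sketch and not as what either poster implemented: keep the whole file in one byte buffer, but let `bytes.index` find each word terminator instead of testing bytes one at a time in a Python loop. The layout assumed here is the StarDict .idx one (NUL-terminated UTF-8 word, then two big-endian unsigned 32-bit integers); `parse_idx` is a made-up name.

```python
import struct

# Assumed fixed-size part of each entry: data offset and data size.
TAIL = struct.Struct(">II")

def parse_idx(buf):
    """Parse an in-memory .idx buffer into (word, offset, size) tuples."""
    entries = []
    pos = 0
    while pos < len(buf):
        # The search for the NUL terminator runs in C, not byte by byte in Python.
        end = buf.index(b"\0", pos)
        offset, size = TAIL.unpack_from(buf, end + 1)
        entries.append((buf[pos:end].decode("utf-8"), offset, size))
        pos = end + 1 + TAIL.size
    return entries

# Two hand-built entries, only to show the call.
sample = (b"he\0" + struct.pack(">II", 0, 74) +
          b"hi\0" + struct.pack(">II", 74, 10))
print(parse_idx(sample))
```

Whether this is actually faster on a 10 MB index would have to be measured; the loop still runs once per record, just not once per byte.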
>
> On Wed, Dec 24, 2008 at 8:58 AM, cocobear <cocob...@gmail.com>
> wrote:
> > I think reading one byte at a time is wrong; even if I do nothing
> > else, it takes 5 seconds to finish when reading a large dictionary.
>
> There is no byte-by-byte file reading.
> There is only byte-by-byte parsing, directly in memory, of the whole file,
> which is read at once into a byte buffer.
>
That's what I meant; I just didn't express it precisely.
> > Probably there is a better way.
>
> I will be glad to know it.
>
I'm working on it.
>