Greetings all,
I did some profiling of the proposed changes to QueryData, using one
of the test files in WordNet::Similarity (t/pairs.t). I ran the
following with both the old (currently released) and new (proposed)
versions of QueryData:
perl -d:DProf t/pairs.t
dprofpp -u
We do see a slowdown with the new changes even on a single test file
(user time goes from about 72 seconds to about 96 seconds), and so I'd
be concerned about making "noload" the default, since most users of
WordNet::Similarity simply install QueryData without making
modifications.
My own experience of QueryData has been that typically there is one
load followed by lots of queries, and so loading the index has
generally been the right thing to do (and the cost in terms of RAM at
least is fairly negligible).
I suppose the goal in general is to "make the common case fast". The
question then is which is the common case: a load followed by a few
queries, or a load followed by many queries. From the point of view of
WordNet::Similarity at least, the common case is a single load
followed by many queries, so I think we'd be happiest if that remained
the default behavior of WordNet::QueryData.
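
To make the trade-off a bit more concrete, here is a rough cost model
sketched in Python. It is not part of QueryData: the function name and
the per-query costs are hypothetical placeholders rather than
measurements, and the model simply assumes that each query is cheaper
once the index is in memory.

```python
def break_even_queries(load_cost, query_cost_loaded, query_cost_unloaded):
    """Number of queries at which loading the index up front starts to pay off.

    With an up-front load:  total = load_cost + n * query_cost_loaded
    Without a load:         total = n * query_cost_unloaded
    """
    saving_per_query = query_cost_unloaded - query_cost_loaded
    if saving_per_query <= 0:
        return float("inf")  # loading never pays off in this model
    return load_cost / saving_per_query


# Hypothetical costs in seconds, chosen only for illustration.
n = break_even_queries(load_cost=2.5,
                       query_cost_loaded=0.0001,
                       query_cost_unloaded=0.001)
print(f"loading pays off after about {n:.0f} queries")
```

Under made-up costs like these, the break-even point is a few thousand
queries. A workload with hundreds of thousands of queries per run sits
well past that, while a one-shot command line lookup sits well before
it.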
The profiling results...
The old (currently released) version...
Total Elapsed Time = 78.50668 Seconds
User Time = 71.62668 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
 34.9   25.01 31.700 295828   0.0001 0.0001 WordNet::QueryData::getSensePointers
 12.7   9.133 42.404 304699   0.0000 0.0001 WordNet::QueryData::querySense
 11.0   7.929  9.715 297028   0.0000 0.0000 WordNet::QueryData::getSense
 4.70   3.368  3.368 851828   0.0000 0.0000 WordNet::QueryData::lower
 4.62   3.312  4.222 199220   0.0000 0.0000 WordNet::QueryData::offset
 3.85   2.760  2.760      3   0.9200 0.9200 WordNet::Similarity::ICFinder::configure
 3.82   2.734 42.626  63623   0.0000 0.0007 WordNet::Similarity::hso::_getDownwardOffsetsPOS
 3.70   2.652  2.793  23376   0.0001 0.0001 WordNet::QueryData::getWordPointers
 3.55   2.540  2.540      2   1.2700 1.2700 WordNet::Similarity::DepthFinder::_processSynsetsFile
 3.52   2.520  2.520      1   2.5200 2.5200 WordNet::QueryData::loadIndex
 2.90   2.080  3.110      9   0.2311 0.3455 WordNet::Tools::new
 2.68   1.920 50.714 189836   0.0000 0.0003 WordNet::Similarity::hso::_medStrong
 1.44   1.030  1.030     36   0.0286 0.0286 WordNet::QueryData::listAllWords
 1.37   0.979  0.979 301422   0.0000 0.0000 WordNet::QueryData::delMarker
 1.20   0.860  3.740  23376   0.0000 0.0002 WordNet::QueryData::queryWord
The new (proposed) version ....
Total Elapsed Time = 205.4782 Seconds
User Time = 96.26824 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
 36.2   34.92 43.965 295828   0.0001 0.0001 WordNet::QueryData::getSensePointers
 12.6   12.20 58.032 304699   0.0000 0.0002 WordNet::QueryData::querySense
 11.2   10.86 13.387 297028   0.0000 0.0000 WordNet::QueryData::getSense
 4.41   4.244  4.244 851828   0.0000 0.0000 WordNet::QueryData::lower
 4.33   4.171  5.401 199220   0.0000 0.0000 WordNet::QueryData::offset
 4.12   3.967 58.153  63623   0.0001 0.0009 WordNet::Similarity::hso::_getDownwardOffsetsPOS
 3.81   3.670  3.670      3   1.2233 1.2233 WordNet::Similarity::ICFinder::configure
 3.70   3.566  3.702  23376   0.0002 0.0002 WordNet::QueryData::getWordPointers
 3.45   3.320  3.320      1   3.3200 3.3200 WordNet::QueryData::loadIndex
 3.34   3.220  3.220      2   1.6100 1.6100 WordNet::Similarity::DepthFinder::__processSynsetsFile
 2.98   2.865 69.397 189836   0.0000 0.0004 WordNet::Similarity::hso::_medStrong
 2.84   2.730  3.870      9   0.3033 0.4300 WordNet::Tools::new
 1.45   1.399  1.399 301422   0.0000 0.0000 WordNet::QueryData::delMarker
 1.16   1.120  1.120     36   0.0311 0.0311 WordNet::QueryData::listAllWords
 1.10   1.055  4.895  23376   0.0000 0.0002 WordNet::QueryData::queryWord
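
For anyone who wants the two reports side by side, here is a small
Python sketch that computes new/old ratios. The figures are copied from
the profiler output above; the dictionary layout and the slowdown()
helper are just my own illustration.

```python
# Seconds, copied from the two profiler reports above.
old = {
    "elapsed": 78.50668,
    "user": 71.62668,
    "getSensePointers": 25.01,
    "querySense": 9.133,
    "getSense": 7.929,
}
new = {
    "elapsed": 205.4782,
    "user": 96.26824,
    "getSensePointers": 34.92,
    "querySense": 12.20,
    "getSense": 10.86,
}


def slowdown(before, after):
    """Return the after/before ratio for every key present in `before`."""
    return {name: after[name] / before[name] for name in before}


ratios = slowdown(old, new)
for name, ratio in ratios.items():
    print(f"{name:18s} {ratio:.2f}x")
```

User time and the exclusive times of the top QueryData routines all
grow by roughly a third. Elapsed time grows by much more than that, but
since it is wall-clock time it may partly reflect other activity on the
machine, so I would lean on the user time figures.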
We actually ran into this load-time issue some time ago with the
command line interface to WordNet::Similarity (similarity.pl). Our
answer was an interactive mode for similarity.pl that loads the
database once and then lets the user proceed in an interactive session
without reloading. We also added a --file option that lets a user
submit a number of different pairs of concepts all at once (and
thereby requires only a single load of the database). So, I'm
wondering whether a solution like that might work for a command line
oriented application?
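
The general shape of that idea can be sketched as follows. This is a
toy in Python, not how similarity.pl is implemented: ToyDatabase, its
shared-prefix score, and the one-pair-per-line input format are all
hypothetical stand-ins. The only point is that the costly load happens
once, however many pairs follow.

```python
import io


class ToyDatabase:
    """Hypothetical stand-in for a database that is expensive to load."""

    load_count = 0  # how many times the costly load has run

    def __init__(self):
        ToyDatabase.load_count += 1
        # A real loader would read its index files here.

    def score(self, a, b):
        # Toy measure: length of the shared prefix of the two strings.
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n


def score_pairs(lines, db):
    """Score every 'word1 word2' line using one already-loaded database."""
    results = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        results.append((parts[0], parts[1], db.score(parts[0], parts[1])))
    return results


db = ToyDatabase()  # load once
pairs_file = io.StringIO("car cart\ndog cat\n\nhouse home\n")
results = score_pairs(pairs_file, db)
for a, b, s in results:
    print(a, b, s)
```

An interactive mode is the same loop with the lines coming from the
terminal instead of a file.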
Anyway, WordNet::Similarity is here:
http://search.cpan.org/dist/WordNet-Similarity/
and similarity.pl is here:
http://search.cpan.org/dist/WordNet-Similarity/utils/similarity.pl
Just some thoughts, very much from the WordNet::Similarity point of
view...I'll be curious to hear what other users have to say about how
they are using QueryData, and what they might have done about this
database load issue in the past (and what they think about the
future...)
Thanks!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse