noload?

csaba

Jan 30, 2010, 1:04:07 PM
to wn-perl
I have a strange problem: WordNet::QueryData seems to work
fine, except that I see no difference at all in startup time when I
include noload => 1.
I have upgraded to the latest version as far as I can see:
WordNet::QueryData is up to date (1.49).

I am on a Mac OSX Snow Leopard.

I tried with verbose => 1
and I get this:

Verbose = 1
Noload = 1
Loading WordNet data...
(loadIndex) at /Library/Perl/5.10.0/WordNet/QueryData.pm line 353.
(openData) at /Library/Perl/5.10.0/WordNet/QueryData.pm line 385.

"Noload" followed by "Loading"!

Am I missing something?

Danny Brian

Jan 30, 2010, 4:28:10 PM
to wn-...@googlegroups.com
You're probably expecting more of a difference than you're seeing,
especially if you're running a fast computer. You'd need to time it to
see the difference. On my MacBook Pro loading indexes (noload => 0)
takes a little over 2 seconds.

You'll still get the loadIndex verbose message in either mode.

Danny Brian

Jan 30, 2010, 4:39:34 PM
to wn-...@googlegroups.com
Run this:

use strict;
use warnings;
use WordNet::QueryData;
use Benchmark;

# Construct the object once with the index loaded and once without
timethese ( 1, {
'noload => 0' => sub { my $wn = WordNet::QueryData->new( noload => 0 ) },
'noload => 1' => sub { my $wn = WordNet::QueryData->new( noload => 1 ) }
} );

csaba

Jan 31, 2010, 6:41:29 AM
to wn-perl
That's cute, thanks for that :-)

I am not a Perl programmer (I hate to admit it), so it is nice to pick up
little tips. Anyway I ran it and got this:

Benchmark: timing 1 iterations of noload => 0, noload => 1...
noload => 0: 3 wallclock secs ( 2.77 usr + 0.03 sys = 2.80 CPU) @ 0.36/s (n=1)
(warning: too few iterations for a reliable count)
noload => 1: 0 wallclock secs ( 0.22 usr + 0.01 sys = 0.23 CPU) @ 4.35/s (n=1)
(warning: too few iterations for a reliable count)

I guess this confirms that noload is working?

But the full query script takes about the same time to execute with
noload set to 0 or 1 (about 6 seconds). I guess this means that the
initialization time is negligible compared to the execution of the
full script. But here is the problem. A friend who is a good Perl
programmer wrote a wrapper to my script that basically makes it
initialize WordNet::QueryData and wait for input on a socket on
localhost. It takes input from a client script and then runs
the "server" script (which is basically the same queries as the
original program). This returns the result almost instantly!

My understanding is that loading the WordNet index does not speed up
queries that much, so some sort of time consuming initialization must
still be happening with noload => 1 ???

Ben Haskell

Jan 31, 2010, 11:23:37 AM
to wn-perl
When using the WordNet::Similarity modules, I did the same thing your
friend did (wrote a wrapper to serve WN::QD data via a socket). It was
before the 'noload' parameter existed, though.

As the output of that test indicates, one iteration isn't enough to get
a reliable comparison. Replace the following:

use Benchmark; -> use Benchmark ':all';
timethese(1, -> cmpthese(-30,

to have each test run as many times as possible in ~30 seconds, and give
(IMO) a more useful output for comparison. (See perldoc Benchmark for
more info.)
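Put together, Ben's two substitutions turn Danny's script into a single cmpthese(-30, {...}) call. The sketch below keeps that shape, but as an assumption for illustration it swaps the two WordNet::QueryData->new calls for hypothetical stand-in subs (build_eagerly, build_lazily) and uses -1 instead of -30, so it runs in a few seconds without WordNet installed. The real WordNet version is shown in the comment.

```perl
use strict;
use warnings;
use Benchmark ':all';

# Applied to Danny's script, Ben's change would read:
#   cmpthese(-30, {
#   'noload => 0' => sub { $wn = WordNet::QueryData->new( noload => 0 ) },
#   'noload => 1' => sub { $wn = WordNet::QueryData->new( noload => 1 ) }
#   });
# Below, the constructors are replaced by hypothetical stand-ins.

# Stand-in for a constructor that does its work up front
# (builds a 20,000-entry lookup table every time it is called).
sub build_eagerly {
    my %index = map { ("word$_" => $_) } 1 .. 20_000;
    return \%index;
}

# Stand-in for a constructor that defers that work.
sub build_lazily { return {} }

# A negative count means "run each test for at least this many CPU seconds".
# cmpthese prints the comparison chart and also returns it as rows of cells.
my $rows = cmpthese(-1, {
    eager => \&build_eagerly,
    lazy  => \&build_lazily,
});
```

With -30, each test gets about 30 CPU seconds, which is what makes the averages in the chart trustworthy; -1 is only used here to keep the sketch quick.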

On my machine, for example, that shows an almost-100% speedup:

            s/iter noload=>0 noload=>1
noload=>0     2.94        --     -100%
noload=>1 1.64e-04  1796315%        --

(With noload=>1, it takes an average of about 0.16 *milli*seconds, whereas
noload=>0 takes an average of 2.94 seconds. The percentages indicate
that noload=>0 is almost 100% slower than noload=>1, or that noload=>1
is ~18,000 times faster than noload=>0.)
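To see where those percentages come from: cmpthese compares rates (iterations per second, i.e. 1 divided by s/iter), and each cell is 100 * (row rate / column rate) - 100. A small check using the s/iter figures from the chart (percent_faster is just a helper name for this illustration):

```perl
use strict;
use warnings;

# cmpthese compares rates (iterations per second = 1 / seconds per iteration).
# Each cell reports how much faster (+) or slower (-) the row's test is
# than the column's, as a percentage.
sub percent_faster {
    my ($row_s_per_iter, $col_s_per_iter) = @_;
    my $row_rate = 1 / $row_s_per_iter;
    my $col_rate = 1 / $col_s_per_iter;
    return 100 * $row_rate / $col_rate - 100;
}

# s/iter figures from the chart (already rounded for display)
my ($noload0, $noload1) = (2.94, 1.64e-04);
printf "noload=>0 vs noload=>1: %.3f%%\n", percent_faster($noload0, $noload1);
printf "noload=>1 vs noload=>0: %.0f%%\n", percent_faster($noload1, $noload0);
printf "noload=>1 is about %.0f times faster\n", $noload0 / $noload1;
```

Because the s/iter values in the chart are rounded, this gives roughly 1792583% rather than exactly 1796315%, and about -99.994% for the other cell, which the chart rounds to -100%.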

Unless I'm mistaken about what noload=>1 does, even based on your
Benchmark run you should see a difference of about 2.5 seconds.

It might be interesting to see what you're doing in the full script. My
off-hand guess is that the socket-based approach has better disk-cache
characteristics (i.e. running for a long time, the entire WordNet data
files could be in cache). Do you get different results if you run the
non-server script back-to-back? (So that the portions of the data files
that your script accesses might be in cache after the first run.)
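One way to try the back-to-back comparison is to time a few consecutive runs of the non-server script from Perl itself. This is a minimal sketch: time_runs is a made-up helper, and query.pl with the argument dog is a hypothetical stand-in for the actual script.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run the same command several times in a row and return the wall-clock
# seconds for each run. If disk caching matters, the first run might be
# noticeably slower than the ones after it.
sub time_runs {
    my ($runs, @cmd) = @_;
    my @elapsed;
    for (1 .. $runs) {
        my $start = time;
        system(@cmd) == 0 or die "command failed (status $?): @cmd\n";
        push @elapsed, time - $start;
    }
    return @elapsed;
}

# Hypothetical usage (query.pl is a placeholder name):
#   my @times = time_runs(3, 'perl', 'query.pl', 'dog');
#   printf "run %d: %.2f s\n", $_ + 1, $times[$_] for 0 .. $#times;
```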

--
Best,
Ben Haskell

Danny Brian

Jan 31, 2010, 2:19:42 PM
to wn-...@googlegroups.com
What Ben said.

> My understanding is that loading he wordnet index does not speed up
> queries that much, so some sort of time consuming initialization must
> still be happening with noload => 1 ???

Yes, unless I misunderstand your question. See the perldoc:

http://search.cpan.org/dist/WordNet-QueryData/QueryData.pm#CACHING_VERSUS_NOLOAD

It gives sample benchmarks for various queries, as well as initialization.

Any object constructor is going to consume *some* time. It's code, after all.

csaba

Feb 2, 2010, 2:07:10 PM
to wn-perl
Yes, my results are similar:

              s/iter noload => 0 noload => 1
noload => 0     3.05          --      -100%
noload => 1 2.20e-04    1385399%          --

I am running a MacBook Pro with a 2.53 GHz Core Duo. I am going to try
it on my i7-powered Linux server next :-)

So I guess I am surprised at the difference when the script is running
as a service.

The script itself is pretty basic. It takes either one argument or two.
The first argument is a word, and the script returns every synset the
word appears in, the gloss, and the frequency.
The optional second argument is a word sense, and the script returns
all the information above, plus the relatedness of each sense to the
word sense given as the second argument.
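For readers following along, here is a hypothetical reconstruction of a script like the one described, not csaba's actual code. It assumes the WordNet::QueryData methods querySense (word to word#pos, word#pos to senses, and the 'glos' and 'syns' relations) and frequency, plus the WordNet::Similarity getRelatedness method; check the respective perldocs before relying on it. The objects are passed in so they can be built once and reused.

```perl
use strict;
use warnings;

# Hypothetical core of the query script. $wn should be a WordNet::QueryData
# object; $sim (optional) a WordNet::Similarity measure such as
# WordNet::Similarity::vector_pairs.
sub report {
    my ($wn, $sim, $word, $other_sense) = @_;
    my @lines;
    # querySense("dog") gives forms like dog#n; querySense("dog#n") gives senses.
    for my $form ($wn->querySense($word)) {
        for my $sense ($wn->querySense($form)) {
            my ($gloss) = $wn->querySense($sense, 'glos');
            my @cols = (
                $sense,
                $wn->frequency($sense) // 0,
                join(',', $wn->querySense($sense, 'syns')),   # synset members
                $gloss // '',
            );
            # Relatedness column only when a second word sense was given.
            if ($sim && defined $other_sense) {
                push @cols, $sim->getRelatedness($sense, $other_sense) // 'n/a';
            }
            push @lines, join("\t", @cols);
        }
    }
    return @lines;
}
```

A command-line driver would construct the objects, then print each line returned by report($wn, $sim, @ARGV).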

Oh, and running it back-to-back seems to have no effect.

Danny Brian

Feb 2, 2010, 2:15:34 PM
to wn-...@googlegroups.com
> So I guess I am surprised at the difference when the script is running
> as a service.

I'm not following your concerns at all. Why are you surprised by this?
Queries are very fast. Starting an interpreter and loading a database
are not. A service has already started and done whatever
initialization it needs, so it avoids the cost of anything other than
the query.
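As a rough illustration of that point, here is a minimal single-client sketch of such a service using the core IO::Socket::INET module. The expensive setup sits in main and runs once; each request then only pays for the query. The names main, serve_one and make_server, the port 7777, and the querySense-based handler are illustrative assumptions, not code from this thread, and there is no concurrency or error recovery.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Open a listening TCP socket on localhost (port 0 lets the OS pick one).
sub make_server {
    my ($port) = @_;
    my $server = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Listen    => 5,
        ReuseAddr => 1,
        Proto     => 'tcp',
    ) or die "Cannot listen on port $port: $!";
    return $server;
}

# Accept one client, read one line (the query), and send back one line
# produced by the handler. Anything expensive lives in the handler's
# closure, so it is paid once per server process, not once per request.
sub serve_one {
    my ($server, $handler) = @_;
    my $client = $server->accept or return;
    my $query = <$client>;
    if (defined $query) {
        chomp $query;
        print {$client} $handler->($query), "\n";
    }
    close $client;
}

# Hypothetical entry point: build the WordNet object once, then loop forever.
sub main {
    require WordNet::QueryData;
    my $wn = WordNet::QueryData->new( noload => 0 );  # startup cost paid once
    my $server = make_server(7777);                    # arbitrary port
    my $handler = sub {
        my ($word) = @_;
        # word -> word#pos forms -> word#pos#n senses
        return join ' ', map { $wn->querySense($_) } $wn->querySense($word);
    };
    serve_one($server, $handler) while 1;
}
```

To use it, call main() from a script; a client then connects to 127.0.0.1:7777, sends one word followed by a newline, and reads one line back.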

Maybe you could explain what you expect to have happen, since your own
benchmarks show that everything is working correctly. There is no way
your script should take the same length of time running with either
noload => 0|1, per your own data.

csaba

Feb 2, 2010, 2:22:35 PM
to wn-perl
OK... I just realized something which, if true, would make me feel
really dumb!

In my script I use

use WordNet::Similarity::vector_pairs;

I wonder if that has startup issues?

sidd

Feb 2, 2010, 2:27:46 PM
to wn-...@googlegroups.com
Yes... vector_pairs has its own initialization (loads a bunch of word
vectors into memory), which most likely accounts for the startup time.
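Since the relatedness measure is only needed when a second argument is given, one option (assuming the one-argument mode really never uses it) is to construct it lazily, so single-word lookups skip its initialization entirely. lazy is a hypothetical helper name, and the WordNet usage is shown only as comments.

```perl
use strict;
use warnings;

# Wrap an expensive constructor so it only runs the first time the
# object is asked for; later calls return the cached object.
sub lazy {
    my ($build) = @_;
    my $obj;
    return sub { $obj //= $build->() };
}

# Hypothetical usage in the query script, where $wn is the
# WordNet::QueryData object:
#   my $sim = lazy(sub {
#       require WordNet::Similarity::vector_pairs;
#       WordNet::Similarity::vector_pairs->new($wn);
#   });
#   ... $sim->()->getRelatedness($sense, $ARGV[1]) if @ARGV == 2;
```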

-- Sid.

csaba

Feb 3, 2010, 4:00:04 AM
to wn-perl
Thanks for all the comments everyone.

Danny, the reason I was puzzled was that I (stupidly perhaps) assumed
that the startup time of WordNet::QueryData was by far the largest
cost, since that is where the large database is loaded, so using
noload => 1 should have avoided it. As you point out, it does in fact
avoid that, but the other initialization still needs to be carried
out.

I guess that answers my concerns.

Thanks again everyone for really helpful comments.

Danny Brian

Feb 3, 2010, 1:37:07 PM
to wn-...@googlegroups.com
Not stupid.

You can trust benchmarks and profiling. Enclose blocks of your code in
timethis() to see how long they take and where problem areas might
occur.
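A minimal sketch of that advice: wrap each phase of a script in timethis() and compare the reports. The two phases here are hypothetical stand-ins (plain numeric busy work) so the example runs on its own; in a real script the bodies would be things like the WordNet::QueryData and vector_pairs constructors and the queries themselves.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Hypothetical stand-ins for the phases of a script.
my @phases = (
    [ 'setup' => sub { my $sum = 0; $sum += sqrt($_) for 1 .. 500_000; $sum } ],
    [ 'query' => sub { my @hits = grep { $_ % 7 == 0 } 1 .. 10_000; scalar @hits } ],
);

my %timing;
for my $phase (@phases) {
    my ($label, $code) = @$phase;
    # timethis(COUNT, CODE, TITLE) prints a one-line report for the block
    # and returns a Benchmark object holding the measured times.
    $timing{$label} = timethis(1, $code, $label);
}
```

A single iteration triggers the same "too few iterations" warning seen earlier in the thread; that is fine for spotting which phase dominates, but use a larger or negative count for precise numbers.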

Regards,
Danny
