Declining relationship autoloading on first class use


Kevin

Oct 20, 2009, 12:00:48 PM
to Rose::DB::Object
Hi everyone - we are experimenting with moving to Rose::DB as the
backend for our large Perl-based database system. We have roughly 500
tables and track samples through a biology lab using an event-based
transitioning system. We have been impressed with the ease of
managing conventions and metadata, but have run into the following
problem, which auto_load_related_classes() doesn't seem to help with.

Because about 80% of our entities are linked to our event recording
tables, we run into trouble as soon as we use the event class in any
fashion: it loads all relationships. We have used
auto_load_related_classes, which allows us to "use Event;" without
too much overhead, but even "Event->isa('X');" triggers the
relationship generation, which loads >300 other classes and takes 15
seconds to complete.

Is there a way to say, "Don't load relationships until they are
actually requested"?

Kevin Crouse
kcrou...@gmail.com

Abaddon Daemon

Oct 20, 2009, 12:17:05 PM
to rose-db...@googlegroups.com
I solved this problem in a slightly different way.

In my apache configuration for our site, we have this section:

PerlModule            ICA::Startup
PerlOpenLogsHandler   ICA::Startup::open_logs
PerlPostConfigHandler ICA::Startup::post_config
PerlChildInitHandler  ICA::Startup::child_init
PerlChildExitHandler  ICA::Startup::child_exit

ICA::Startup is a class that we have. Each of these handlers corresponds to one of its functions. The relevant one is post_config, which looks like this:


  package ICA::Startup;

  use strict;
  use warnings;

  use Apache2::Log ();
  use Apache2::ServerUtil ();

  use Fcntl qw(:flock);
  use File::Spec::Functions;

  use Apache2::Const -compile => 'OK';

  my $log_path = catfile Apache2::ServerUtil::server_root,
      "logs", "startup_log";
  my $log_fh;

... some functions ...

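  sub post_config {
      # "use" is processed at compile time, so these modules are loaded
      # when ICA::Startup itself is compiled, not on each handler call.
      use ICA::DB::Object::Classes;
      use CGI();
      CGI->compile();

      my ($conf_pool, $log_pool, $temp_pool, $s) = @_;
      say("configuration is completed");
      return Apache2::Const::OK;
  }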

... more functions ...

  sub say {
      my ($caller) = (caller(1))[3] =~ /([^:]+)$/;
      if (defined $log_fh) {
          flock $log_fh, LOCK_EX;
          printf $log_fh "[%s] - %-11s: %s\n",
              scalar(localtime), $caller, $_[0];
          flock $log_fh, LOCK_UN;
      }
      else {
          # when the log file is not open
          warn __PACKAGE__ . " says: $_[0]\n";
      }
  }

  my $parent_pid = $$;
  END {
      my $msg = "process $$ is shutdown";
      $msg .= "\n". "-" x 20 if $$ == $parent_pid;
      say($msg);
  }


The module ICA::DB::Object::Classes contains an enumerated list of all the classes that we want loaded into our Apache threads as they spawn. This causes all Perl compiling to be done at thread creation time, rather than at user request time. This doesn't just defer the buildup time until the user needs it; it removes it from the user experience completely. =)

We've had quite a bit of success with this.
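
For illustration, a preload module along these lines might look like the sketch below. The real ICA::DB::Object::Classes was not posted, so the package name My::Preload and the class list are hypothetical; core modules stand in for the table classes only so that the sketch runs anywhere.

```perl
package My::Preload;    # hypothetical stand-in for ICA::DB::Object::Classes

use strict;
use warnings;

# Assumption: in a real setup this would be the enumerated list of
# table classes. Core modules are used here purely as placeholders.
our @CLASSES = qw(File::Spec List::Util Scalar::Util);

# Load every listed class once. require() dies on failure, so a broken
# class stops server startup instead of failing later during a request.
sub load_all {
    my @loaded;
    for my $class (@CLASSES) {
        (my $file = "$class.pm") =~ s{::}{/}g;
        require $file;
        push @loaded, $class;
    }
    return @loaded;
}

package main;

my @loaded = My::Preload::load_all();
print "preloaded: @loaded\n";
```

Calling something like load_all() from the post_config handler would put the load cost in the parent process, before any request is served.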

John Siracusa

Oct 20, 2009, 1:55:48 PM
to rose-db...@googlegroups.com
On Tue, Oct 20, 2009 at 12:00 PM, Kevin <kcrou...@gmail.com> wrote:
> Is there a way to say, "Don't load relationships until they are
> actually requested"?

No, and if you think about it, it'd be pretty complex to do since
"loading relationships" really means creating new methods and loading
other classes which themselves have relationships to other classes and
will need methods created and so on. It's pretty easy to end up
triggering a massive class load and method creation spree with, say, a
single Manager query.
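
A toy sketch of that cascade, assuming a made-up relationship graph (none of these class names come from the thread, and this is not RDBO's actual loading code): touching one class pulls in everything reachable from it.

```perl
use strict;
use warnings;

# Hypothetical relationship graph: class => classes it relates to.
my %related = (
    Event   => [qw(Sample User Plate)],
    Sample  => [qw(Project Event)],
    Plate   => [qw(Sample Freezer)],
    User    => [],
    Project => [qw(User)],
    Freezer => [],
);

# Breadth-first walk over relationships, the way an eager loader would
# have to follow them before it can create relationship methods.
sub classes_pulled_in {
    my ($start) = @_;
    my %seen  = ($start => 1);
    my @queue = ($start);
    while (defined(my $class = shift @queue)) {
        for my $other (@{ $related{$class} || [] }) {
            next if $seen{$other}++;    # already scheduled
            push @queue, $other;
        }
    }
    return sort keys %seen;
}

my @pulled = classes_pulled_in('Event');
print scalar(@pulled), " classes loaded: @pulled\n";
```

In this small graph, starting from Event reaches every class, which is the same effect described above at a much larger scale.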

Even for 500 classes, a 15-second load time seems way too long. Have
you profiled it? What's the bottleneck? Disk? CPU?

-John

Kevin

Oct 20, 2009, 3:46:25 PM
to Rose::DB::Object
Hi John,

It's not likely that we'd have a manager query that would need to
trigger the loading of more than 10% of our tables. Such a thing
would likely be a wide-ranging quality or usage report, and we have an
OLAP database for such things. Is there a way to generate a static
class file (after the meta class makes all of its methods) instead of
having all of the Object classes rely on dynamic method generation? I
expect that would greatly speed up our performance as well.

We've been doing some more investigation and made some progress. It
pegs a CPU at 100%.

Initially we were using Class::Autouse->autouse() for the set of table-
based classes, and this led to the 15 s runtime on first use of the
Event class, and almost the entirety of this time happening during
Metaclass->add_relationship(). I was perusing your Metadata.pm code
and I'm wondering if it doesn't play nicely with Class::Autouse
because autouse registers the potential classes in the same place that
Rose looks to see if the class is already loaded.

When we simplified our test, removed autouse, and we just used a
single "use" statement for the Event class, it takes 5 s, which is
okay for our persistent applications but not ideal for the arsenal of
quick response programs and scripts. This test does not include
overhead to connect to the database. When we profile this test
script, we find that 700 calls to retry_deferred_foreign_keys()
inclusively account for roughly 1/2 of the total time, and .87s not
including the function calls it makes; 140,000 calls to meta() and the
dynamic functions in MakeMethods::Generic appear to take over 2s as
well. Although we use nytprof, I have included the output from DProf
since it is easier for you to view. Most of the methods below are also
listed on the 'top 10' list, but the calculated time is actually quite
different.

Total Elapsed Time = 4.653712 Seconds
  User+System Time = 4.503712 Seconds
Exclusive Times
%Time ExclSec CumulS  #Calls sec/call Csec/c  Name
 11.3   0.510  0.767  372248   0.0000 0.0000  Rose::Object::MakeMethods::Generic::__ANON__
 8.17   0.368  0.626     736   0.0005 0.0008  Rose::DB::Object::Metadata::retry_deferred_foreign_keys
 7.46   0.336  0.432    4396   0.0001 0.0001  Rose::Object::new
 6.71   0.302  0.462    2442   0.0001 0.0002  Rose::DB::Object::MakeMethods::Generic::object_by_key
 6.11   0.275  0.305    1451   0.0002 0.0002  Rose::DB::Object::MakeMethods::Generic::scalar
 5.53   0.249  5.792     368   0.0007 0.0157  Rose::DB::Object::Metadata::make_relationship_methods
 5.02   0.226  0.493    3846   0.0001 0.0001  Rose::DB::Object::Metadata::Column::apply_method_triggers
 4.66   0.210  0.210   20394   0.0000 0.0000  Rose::Class::MakeMethods::Generic::__ANON__
 3.77   0.170  0.175  139594   0.0000 0.0000  Rose::DB::Object::meta
 3.44   0.155  2.021    3257   0.0000 0.0006  Rose::DB::Object::Metadata::MethodMaker::make_methods
 3.37   0.152  0.534    5570   0.0000 0.0001  Rose::DB::Object::Metadata::MethodMaker::method_maker_arguments
 2.69   0.121  5.809     368   0.0003 0.0158  Rose::DB::Object::Metadata::make_foreign_key_methods
 2.38   0.107  0.115    1638   0.0001 0.0001  Rose::DB::Object::MakeMethods::Generic::objects_by_key
 2.24   0.101  0.099   93593   0.0000 0.0000  Rose::DB::Object::Metadata::Column::name
 2.18   0.098  0.098   24369   0.0000 0.0000  UNIVERSAL::can


Thanks,
Kevin Crouse

John Siracusa

Oct 20, 2009, 4:02:28 PM
to rose-db...@googlegroups.com
On Tue, Oct 20, 2009 at 3:46 PM, Kevin <kcrou...@gmail.com> wrote:
> Is there a way to generate a static class file (after the meta class makes all
> of its methods) instead of having all of the Object classes rely on dynamic
> method generation?

The methods are created using anonymous subs that act as closures,
capturing values of surrounding lexical variables. I'm not aware of
any good way to serialize such subs into source code, though I believe
there are some experimental approaches on CPAN.
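
A toy illustration of the problem, not RDBO's actual method-making code (the Toy::Event package and its column names are invented for the example): accessors are installed as closures over a loop variable, and deparsing one of them recovers the code but not the captured value.

```perl
use strict;
use warnings;
use B::Deparse ();

package Toy::Event;    # hypothetical class, for illustration only
sub new { return bless {}, shift }

package main;

# Generate one accessor per column. Each sub closes over its own $col.
for my $col (qw(id name)) {
    no strict 'refs';
    *{"Toy::Event::$col"} = sub {
        my $self = shift;
        $self->{$col} = shift if @_;
        return $self->{$col};
    };
}

my $e = Toy::Event->new;
$e->name('centrifuge');
print $e->name, "\n";

# The deparsed source still refers to the variable $col; the value it
# held when the closure was created is not part of the text.
my $text = B::Deparse->new->coderef2text(\&Toy::Event::name);
print $text, "\n";
```

Writing that deparsed text to a static class file would produce a method that no longer knows which column it belongs to, which is why a straightforward dump of the generated methods does not work.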

> Initially we were using Class::Autouse->autouse() for the set of table-
> based classes, and this led to the 15 s runtime on first use of the
> Event class, and almost the entirety of this time happening during
> Metaclass->add_relationship(). I was perusing your Metadata.pm code
> and I'm wondering if it doesn't play nicely with Class::Autouse
> because autouse registers the potential classes in the same place that
> Rose looks to see if the class is already loaded.

Which place is that? %INC? (I'm not familiar with Class::Autouse.)

> When we simplified our test, removed autouse, and we just used a
> single "use" statement for the Event class, it takes 5 s, which is
> okay for our persistent applications but not ideal for the arsenal of
> quick response programs and scripts.

RDBO is not a good fit for "non-persistent" environments where you
need very fast process startup time. Its design is slanted heavily
towards doing a lot upfront (and trading memory usage for speed) which
is a good fit for persistent, load-in-parent-then-fork environments
like mod_perl. There are, however, ways to get this same kind of
performance outside of mod_perl by wrapping your scripts in a
persistence layer so they, too, only have to load once. I forget the
modules, but I think Perrin(?) posted a few to this list earlier. If
not, maybe search or ask on stackoverflow.com.
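
The load-in-parent-then-fork idea can be sketched in a few lines. This is only a toy under stated assumptions: expensive_setup() is a hypothetical stand-in for loading the real classes, and a real persistence layer would accept jobs over a socket rather than run a fixed loop.

```perl
use strict;
use warnings;

# Stand-in for the slow part (loading ~500 classes, building methods).
my %methods;

sub expensive_setup {
    for my $n (1 .. 500) {
        $methods{"Class$n"} = sub { "handled by Class$n" };
    }
    return scalar keys %methods;
}

my $built = expensive_setup();    # paid once, in the parent

my $ok = 0;
for my $job (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: the generated subs are already in memory, so there is
        # no setup cost here.
        my $answer = $methods{"Class$job"}->();
        exit($answer eq "handled by Class$job" ? 0 : 1);
    }
    waitpid($pid, 0);
    $ok++ if $? == 0;
}

print "built $built methods once; $ok of 3 children reused them\n";
```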

> This test does not include overhead to connect to the database. When we
> profile this test script, we find that 700 calls to
> retry_deferred_foreign_keys() inclusively account for roughly 1/2 of the
> total time, and .87s not including the function calls it makes; 140,000
> calls to meta() and the dynamic functions in MakeMethods::Generic appear to
> take over 2s as well. Although we use nytprof, I have included the output
> from DProf since it is easier for you to view. Most of the methods below are
> also listed on the 'top 10' list, but the calculated time is actually quite
> different.

Those call count numbers don't surprise me, but there could be room
for some optimization. Have you looked into the relevant code?

-John

Kevin

Oct 20, 2009, 4:35:22 PM
to Rose::DB::Object
Hi John,

Thanks for the fast responses.

> The methods are created using anonymous subs that act as closures,
> capturing values of surrounding lexical variables. I'm not aware of
> any good way to serialize such subs into source code

We're very familiar with this as well :).

> Which place is that?  %INC?  (I'm not familiar with Class::Autouse.)

Yes - I believe it routes the listed classes to itself and then loads
the class on first use (there is also a directory-recursive function
to do an entire module tree at a time). We find it useful because the
users are so heavily segmented and our frameworks are very general.
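
To show why a shared %INC could matter, here is a small sketch of Perl's own behaviour. It makes no claim about what Class::Autouse or Rose actually do internally; My::Lazy::Class and the stub path are invented. If anything puts an entry in %INC before the file is compiled, require() treats the class as already loaded.

```perl
use strict;
use warnings;

# Assumption for illustration: something registers a class in %INC
# without compiling it. The class name and path are hypothetical.
$INC{'My/Lazy/Class.pm'} = '/hypothetical/stub';

# require() sees the %INC entry and returns without loading any file.
my $ok = eval { require My::Lazy::Class; 1 };
print "require succeeded: ", ($ok ? "yes" : "no"), "\n";

# The package has no methods, though, because nothing was compiled.
my $has_new = My::Lazy::Class->can('new') ? 1 : 0;
print "class has new(): ", ($has_new ? "yes" : "no"), "\n";
```

Any code that uses %INC alone to decide whether a class is ready would be misled in this situation.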

> like mod_perl.  There are, however, ways to get this same kind of
> performance outside of mod_perl by wrapping your scripts in a
> persistence layer so they, too, only have to load once.  

This is definitely possible, and we probably have all of the
infrastructure already there to handle it. It doesn't have a lot of
support in the developer group, however. There's a certain
decentralized unixian ethic that opposes it and relies on stringing
together simple and fast programs to get immediate ad-hoc production
information. We'll see what we can come up with.

> Those call count numbers don't surprise me, but there could be room
> for some optimization.  Have you looked into the relevant code?

Not too deeply yet, though I hope to in the upcoming weeks. We're
doing a survey of the more mature ORMs to replace an increasingly
outdated home-grown one.

Thanks for your help.

Kevin Crouse

Peter Karman

Oct 20, 2009, 7:14:52 PM
to Rose::DB::Object


On Oct 20, 3:35 pm, Kevin <kcrouse...@gmail.com> wrote:

> > like mod_perl.  There are, however, ways to get this same kind of
> > performance outside of mod_perl by wrapping your scripts in a
> > persistence layer so they, too, only have to load once.  
>
> This is definitely possible, and we probably have all of the
> infrastructure already there to handle it.  It doesn't have a lot of
> support in the developer group, however.  There's a certain
> decentralized unixian ethic that opposes it and relies on stringing
> together simple and fast programs to get immediate ad-hoc production
> information.  We'll see what we can come up with.

One way I have handled a similar situation is to make my "client" code
do RPC using HTTP+JSON to talk to a server that has loaded all my RDBO
classes. It's "decentralized" in the sense that the clients are
lightweight and can be piped together in typical unix fashion; instead
of talking to the database directly, however, they talk to an HTTP
proxy for the database.
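
A minimal sketch of such a lightweight client, using only core modules (HTTP::Tiny and JSON::PP). The {method, params} request shape, the events_for_sample call, and the RDBO_RPC_URL environment variable are all assumptions made up for this example, not an existing protocol.

```perl
use strict;
use warnings;
use HTTP::Tiny ();
use JSON::PP   ();

my $json = JSON::PP->new->canonical(1);

# Build the JSON body for one call. The request shape is hypothetical.
sub build_request {
    my ($method, %params) = @_;
    return $json->encode({ method => $method, params => \%params });
}

# POST the request to the server that holds the loaded RDBO classes.
sub call_server {
    my ($url, $method, %params) = @_;
    my $res = HTTP::Tiny->new(timeout => 5)->request(
        'POST', $url,
        {
            headers => { 'Content-Type' => 'application/json' },
            content => build_request($method, %params),
        },
    );
    die "RPC failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $json->decode($res->{content});
}

my $body = build_request('events_for_sample', sample_id => 42);
print "$body\n";

# Only contact a server if one is configured (hypothetical variable).
if (my $url = $ENV{RDBO_RPC_URL}) {
    my $reply = call_server($url, 'events_for_sample', sample_id => 42);
    print $json->encode($reply), "\n";
}
```

The client starts quickly because it loads no RDBO classes at all, and since it prints one JSON document per call, its output can still be piped into other small tools.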

I've encountered (and felt myself) the resistance to centralized
services. One way I've sold it (to myself first of all) is to remember
that the database is a centralized service, even if it is replicated,
and that adding the code-driven interface to it has some added
advantages in terms of validation and business logic implementation
that can't reasonably be done at the db storage level.