Refactoring PyKE

Jeremy

Jan 9, 2009, 5:05:13 PM
to PyKE
Integrating PyKE into Zope, and indeed paving the way for more
flexible persistence, probably first requires factoring out the
filesystem dependent code into separate routines. I think it might be
time to cut a branch.

My plan-A would be to develop the branch so that all non-Zope specific
changes to the PyKE codebase could easily be merged back into the
trunk, and the Zope specific stuff would exist as separate files. The
code that handles files would sit alongside the Zope stuff, and maybe
we want to get pedantic and implement a third storage method.

dangyogi

Jan 10, 2009, 1:56:40 PM
to PyKE

On Jan 9, 5:05 pm, Jeremy <jeremy.mcmil...@gmail.com> wrote:
> Integrating PyKE into Zope, and indeed paving the way for more
> flexible persistence, probably first requires factoring out the
> filesystem dependent code into separate routines. I think it might be
> time to cut a branch.

I am glad that you are excited about this, but there are only a total
of 7 calls to the built-in "open" function in all of the PyKE source
code that need to be looked at:

- 3 calls in pyke/krb_compiler/k*parser.py to open the 3 kinds of
source files (.krb, .kfb and .kqb)
- 3 calls in pyke/krb_compiler/__init__.py to write .py, .fbc and .qbc
files, and
- 1 call in pyke/knowledge_engine.py to open .fbc and .qbc (pickle)
files.

Each of these is rather trivial to change.
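
A minimal sketch of how those "open" calls could be routed through a small storage object, so that a non-file backend can be swapped in. The names FileStore, MemoryStore, open_read, open_write and copy_source are hypothetical and are not part of PyKE:

```python
import io
import os


class FileStore:
    """Reads and writes named resources as files under a root directory."""

    def __init__(self, root):
        self.root = root

    def open_read(self, name):
        return open(os.path.join(self.root, name), "r")

    def open_write(self, name):
        return open(os.path.join(self.root, name), "w")


class MemoryStore:
    """Keeps resources in a dict; a stand-in for any non-file backend."""

    def __init__(self):
        self.data = {}

    def open_read(self, name):
        return io.StringIO(self.data[name])

    def open_write(self, name):
        store = self

        class _Writer(io.StringIO):
            def close(self):
                # Capture the text before the buffer is discarded.
                if not self.closed:
                    store.data[name] = self.getvalue()
                super().close()

        return _Writer()


def copy_source(store, src, dst):
    """Example caller: it only sees the store, never the built-in open."""
    with store.open_read(src) as f_in, store.open_write(dst) as f_out:
        f_out.write(f_in.read())
```

With this shape, each of the seven call sites would take a store argument instead of calling "open" itself, and a Zope-backed store would only need to provide the same two methods.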

I don't think that we are quite ready to start a branch yet. And
besides, an error has been reported that is ending up touching a lot
of this code dealing with files that hasn't been committed yet. So it
might be better to wait a few more days. OTOH, if you really want to
start immediately, I can create the branch now and then you can merge
the bug fix changes into the branch later. Let me know.

> My plan-A would be to develop the branch so that all non-Zope specific
> changes to the PyKE codebase could easily be merged back into the
> trunk, and the Zope specific stuff would exist as separate files. The
> code that handles files would sit alongside the Zope stuff, and maybe
> we want to get pedantic and implement a third storage method.

I plan to branch the whole trunk (including the documentation, unit
tests and examples). While the Zope work is ongoing, changes on the
trunk can be merged into the branch (luckily, subversion 1.5 makes
this much easier and SourceForge is using 1.5 svn servers -- you'll
need to make sure that you have a 1.5 or later svn client).

Then when the Zope work is done, we can merge the entire branch back
into trunk. This should put everything back together in one place.

I envision that the Zope work would then be included in all PyKE
releases from that point on. There may be a pyke/zope directory that
isn't used by "Plain-PyKE" users, but is still part of the release.
My assumption here is that the pyke/zope stuff would not grow the PyKE
release by more than about 50%. If the pyke/zope stuff turns into a
monster, then we may need to find a Plan B.

So the merge of the branch back into the trunk at the end would bring
the pyke/zope directory into the trunk.

Does this make sense?

On another topic --- Do objects destined for the ZODB need to be
derived from some kind of zope.persistent class? And, if so, can this
be done by multiple inheritance:

pyke/knowledge_engine.py:
    class engine(object):
        ...

pyke/zope/knowledge_engine.py:
    import pyke.knowledge_engine

    class engine(zope.persistent, pyke.knowledge_engine.engine):
        ...

Jeremy

Jan 12, 2009, 10:35:37 AM
to PyKE
PyKE, to be a very reusable component, would ideally not require
filesystem access any more than it would require Zope or ZODB. At this
point the logic that handles finding a collection of rule/fact/
question files is a filesystem walker.

On Jan 10, 12:56 pm, dangyogi <dangy...@gmail.com> wrote:
> On Jan 9, 5:05 pm, Jeremy <jeremy.mcmil...@gmail.com> wrote:
>
> > Integrating PyKE into Zope, and indeed paving the way for more
> > flexible persistence, probably first requires factoring out the
> > filesystem dependent code into separate routines. I think it might be
> > time to cut a branch.
>
> I am glad that you are excited about this, but there are only a total
> of 7 calls to the built-in "open" function in all of the PyKE source
> code that need to be looked at:
>
> - 3 calls in pyke/krb_compiler/k*parser.py to open the 3 kinds of
> source files (.krb, .kfb and .kqb)
> - 3 calls in pyke/krb_compiler/__init__.py to write .py, .fbc and .qbc
> files, and
> - 1 call in pyke/knowledge_engine.py to open .fbc and .qbc (pickle)
> files.
>
> Each of these is rather trivial to change.

There are those, and they don't bother me at all.

The constructor does a lot of stuff that scans the filesystem and
automatically (re)compiles anything that needs compiling. I was
thinking of hoisting all of that up into knowledge_base/rule_base
classes to abstract the interface between the engine and the file/zope/
whatever persistent store.

The idea I'm planning to try is that knowledge_engine.engine initializes
itself, but will accept knowledge_bases and rule_bases as arguments,
à la

    knowledge_engine.engine(rule_base=joe_rules, knowledge_base=joe_kb)

In the default behavior, it's expected that a new engine will walk the
filesystem starting at '.' or any other given path and compile
everything it finds. I envision an implementation where the engine
doesn't care how the rule_base or knowledge_base got itself ready for
inference; it asks the objects that provide the data to do that work
for it. They would figure out how to find the stored data they need,
and then use the compiler to load it.

Does that make sense?

Overall, I'm trying to think OOPish and pythonic about how to abstract
the persistence away from the engine, but anything like what I'm
suggesting would need to make sense in a big picture way. Otherwise it
would be stumbling towards a rewrite.

> I don't think that we are quite ready to start a branch yet.  And
> besides, an error has been reported that is ending up touching a lot
> of this code dealing with files that hasn't been committed yet.  So it
> might be better to wait a few more days.  OTOH, if you really want to
> start immediately, I can create the branch now and then you can merge
> the bug fix changes into the branch later.  Let me know.

I thought the code would be mostly stable now since you were going to
be otherwise occupied, but I will take your advice and wait for a
bit.

> > My plan-A would be to develop the branch so that all non-Zope specific
> > changes to the PyKE codebase could easily be merged back into the
> > trunk, and the Zope specific stuff would exist as separate files. The
> > code that handles files would sit alongside the Zope stuff, and maybe
> > we want to get pedantic and implement a third storage method.
>
> I plan to branch the whole trunk (including the documentation, unit
> tests and examples).  While the Zope work is ongoing, changes on the
> trunk can be merged into the branch (luckily, subversion 1.5 makes
> this much easier and SourceForge is using 1.5 svn servers -- you'll
> need to make sure that you have a 1.5 or later svn client).

I need to upgrade svn from ports. Doing that now.

> Then when the Zope work is done, we can merge the entire branch back
> into trunk.  This should put everything back together in one place.
>
> I envision that the Zope work would then be included in all PyKE
> releases from that point on.  There may be a pyke/zope directory that
> isn't used by "Plain-PyKE" users, but is still part of the release.
> My assumption here is that the pyke/zope stuff would not grow the PyKE
> release by more than about 50%.  If the pyke/zope stuff turns into a
> monster, then we may need to find a Plan B.
>
> So the merge of the branch back into the trunk at the end would bring
> the pyke/zope directory into the trunk.
>
> Does this make sense?

Yes. Agreed.

> On another topic --- Do objects destined for the ZODB need to be
> derived from some kind of zope.persistent class?  And, if so, can this
> be done by multiple inheritance:
>
> pyke/knowledge_engine.py:
>     class engine(object):
>         ...
>
> pyke/zope/knowledge_engine.py
>     import pyke.knowledge_engine
>
>     class engine(zope.persistent, pyke.knowledge_engine.engine):
>         ...

That's exactly how it is supposed to work. However, that is only
necessary when you want/need your classes to be transaction aware. In
practice, I think most implementations just make sure their classes
are pickle-safe and then handle the Zope stuff outside.
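
As an illustration of what "pickle-safe" can mean in practice, here is a small sketch using only the standard library. PickleSafeEngine is a hypothetical stand-in and not PyKE's engine class; the lock simply represents any attribute that pickle cannot serialize:

```python
import pickle
import threading


class PickleSafeEngine:
    """Hypothetical stand-in for an engine-like object; not PyKE's class."""

    def __init__(self):
        self.facts = {}                  # plain data pickles fine
        self._lock = threading.Lock()    # locks cannot be pickled

    def add_fact(self, name, args):
        with self._lock:
            self.facts.setdefault(name, []).append(tuple(args))

    def __getstate__(self):
        # Copy the instance dict and drop what pickle cannot handle.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain data, then rebuild the transient attribute.
        self.__dict__.update(state)
        self._lock = threading.Lock()


engine = PickleSafeEngine()
engine.add_fact("son_of", ("david", "bruce"))
restored = pickle.loads(pickle.dumps(engine))
```

An object written this way can be stored by anything that relies on pickle, without the class itself inheriting from a persistence base class.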

Here's my favorite intro:
http://www.zope.org/Members/adytumsolutions/HowToLoveZODB_PartI

Bruce Frederiksen

Jan 13, 2009, 10:25:57 AM
to py...@googlegroups.com
Jeremy wrote:
> The idea I'm planning to try is that knowledge_engine.engine initializes
> itself, but will accept knowledge_bases and rule_bases as arguments,
> à la
>
>     knowledge_engine.engine(rule_base=joe_rules, knowledge_base=joe_kb)

This sounds like moving the filesystem walking/compiling/loading code into a separate function.  So Plain-PyKE users would then do:

    engine = some_pyke_module.load_engine(...)

rather than calling the engine constructor directly.

This moves the initialization stuff out of the way so that you can do your own initialization in Zope.

I think that this makes a lot of sense.

But, in the meantime, I've refactored the initialization logic and just checked in a first cut at it.  I've created a new target_pkg class (in pyke/target_pkg.py) that deals with walking/compiling/loading all of the knowledge bases related to a single 'compiled_krb' directory.  I've also changed the initialization logic to allow multiple compiled_krb directories (and hence multiple instances of target_pkg).  This was done so that, for example, the web_framework example could just directly use the sqlgen compiled_krb without having to compile its own copy of the sqlgen knowledge base in its own compiled_krb directory.

I hope that this is a first step towards allowing a mix of filesystem and ZODB knowledge bases in the same engine (something I want to promote to allow for Zope users to still use filesystem based rule bases -- as well as encouraging Zope developers to consider packaging their rule bases as files so that non-Zope users can also benefit from their work).

This also changes the parameters to engine.__init__ somewhat.  Rather than paths (as a tuple) and generated_root_pkg (where the compiled_krb directory is), the new format is simply *paths, where each path component has its own compiled_krb directory (but multiple path components may specify the same compiled_krb directory).  I'm also now requiring that the pyke sources be on the Python Path.  Thus, each path argument takes one of the following forms:
  1. python_package
    • import foo.bar
    • engine = knowledge_engine.engine(foo.bar)
  2. 'dotted.package.name'
    • engine = knowledge_engine.engine('foo.bar')
  3. (python_package|'dotted.package.name'|None, 'dotted.package.name.of.compiled_krb')
    • engine = knowledge_engine.engine(('foo.bar', '..compiled_krb'))
Note that the compiled_krb module path may use the new relative import notation.  So the default compiled_krb package, if none is specified, is '.compiled_krb', which places the compiled_krb package in the source package.
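
To make the three forms concrete, here is a sketch of how one path argument might be reduced to a pair of package names. normalize_path_arg is a hypothetical helper, not the code checked into svn, and it assumes that leading dots in the compiled_krb name are resolved against the source package the way Python resolves relative imports:

```python
import importlib.util
import types


def normalize_path_arg(arg):
    """Return (source_package_name, compiled_krb_package_name).

    Hypothetical helper; PyKE's real handling may differ.
    """
    if isinstance(arg, tuple):
        source, compiled = arg                    # form 3: explicit pair
    else:
        source, compiled = arg, ".compiled_krb"   # default named in the post

    if isinstance(source, types.ModuleType):
        source = source.__name__                  # form 1: imported package

    if compiled.startswith("."):
        if source is None:
            raise ValueError("relative compiled_krb needs a source package")
        # Assumption: leading dots follow relative-import rules, so one
        # dot means "inside the source package".
        compiled = importlib.util.resolve_name(compiled, source)
    return source, compiled


pairs = [
    normalize_path_arg("foo.bar"),
    normalize_path_arg(("foo.bar", "..compiled_krb")),
    normalize_path_arg((None, "shared.compiled_krb")),
]
```

Under that assumption, 'foo.bar' maps to 'foo.bar.compiled_krb', while ('foo.bar', '..compiled_krb') maps to 'foo.compiled_krb', which is how several source packages could share one compiled_krb directory.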

So what I now wonder is whether you could add a fourth path option for Zope?  This would allow the engine object to be initialized with any combination of filesystem rule bases and Zope bases.  If this ends up workable, then maybe we don't need to pull the initialization out of engine.__init__ as a separate function???

OK, all of that said, I've just checked the first version of all of this into svn (rev 172).  Give me your SourceForge user id, and I'll add you to the project.  This commit isn't quite soup yet, but the changes from here should be small.  I've gone ahead and created the zope branch:

    $ svn checkout https://pyke.svn.sourceforge.net/svnroot/pyke/branches/zope what_you_call_it

This is a complete copy of trunk as of rev 172.

I would suggest placing your what_you_call_it directory on your Python Path using a .pth file in the site-packages directory rather than setting PYTHONPATH.  The problem with PYTHONPATH is that it is still seen by virtualenv copies of Python.  You will have to delete any prior installation of PyKE to avoid confusion (or do all of your Zope stuff in a virtualenv?).
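
For anyone who hasn't used .pth files, the mechanism can be tried out safely with temporary directories. In this sketch the directory names and the file name pyke_dev.pth are made up, and site.addsitedir is called by hand to mimic what Python does for the real site-packages directory at startup:

```python
import os
import site
import sys
import tempfile

# Stand-ins for site-packages and the svn working copy (both hypothetical).
fake_site_packages = tempfile.mkdtemp()
checkout_dir = os.path.join(tempfile.mkdtemp(), "what_you_call_it")
os.makedirs(checkout_dir)

# A .pth file is plain text: one directory per line.
with open(os.path.join(fake_site_packages, "pyke_dev.pth"), "w") as f:
    f.write(checkout_dir + "\n")

# Process the .pth file in the stand-in directory, as Python would do
# for the real site-packages at interpreter startup.
site.addsitedir(fake_site_packages)

on_path = os.path.abspath(checkout_dir) in sys.path
```

In a real setup the only step needed is creating the one-line .pth file in site-packages; the rest of the sketch exists just to demonstrate the effect without touching an actual installation.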

Are you developing on Linux?  My test scripts and documentation generation scripts are bash scripts...

You will need to be made a project member before you can do the above checkout (using https).  You have to do this form of checkout before you can do commits.

Are you a pro with subversion?  You will need to regularly merge changes from the trunk back into your branch.  I'll try to keep you posted on when I do commits to trunk.  Subversion 1.5 has a special way to do this that keeps track of where it is.  So you need to be sure to use the right incantation of merge.  Ask if you are new to svn.

> > On another topic --- Do objects destined for the ZODB need to be
> > derived from some kind of zope.persistent class?  And, if so, can this
> > be done by multiple inheritance:
> >
> > pyke/knowledge_engine.py:
> >     class engine(object):
> >         ...
> >
> > pyke/zope/knowledge_engine.py
> >     import pyke.knowledge_engine
> >
> >     class engine(zope.persistent, pyke.knowledge_engine.engine):
> >         ...
>
> That's exactly how it is supposed to work. However, that is only
> necessary when you want/need your classes to be transaction aware. In
> practice, I think most implementations just make sure their classes
> are pickle-safe and then handle the Zope stuff outside.
>
> Here's my favorite intro:
> http://www.zope.org/Members/adytumsolutions/HowToLoveZODB_PartI

I guess that you can still provide TTW (through-the-web) access to persistent objects that are not derived from Zope.Persistent.  So maybe only the engine and/or knowledge_bases have to be Zope.Persistent?  I'm interested to see how this pans out!