[ANN] Python Pairtree library

Ben O'Steen

Oct 15, 2009, 6:47:35 AM
to Digital Curation
An announcement of sorts (more to do with the library being kinda
mature, and the documentation being verbose enough!)

Please see the following for a blogpost about it:
http://oxfordrepo.blogspot.com/2009/10/python-in-pairtree.html
[http://bit.ly/49zfsm]

In a nutshell:

Link to the Pairtree specification from John Kunze et al at the CDL
http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html

Link to pairtree python package:
http://pypi.python.org/pypi/Pairtree/

Link to API docs and quick start:
http://packages.python.org/Pairtree/

Debian/ubuntu install:
[sudo apt-get install python-setuptools]
sudo easy_install pairtree

Code example:
[foo] $ python
--------------------------------------------------
from pairtree import *
f = PairtreeStorageFactory()

# Open (or create) a pairtree store rooted at ./objects
fedora = f.get_store(store_dir="objects", uri_base="info:fedora/")

obj = fedora.create_object('changeme:1')

# Bytestreams can be added from an open file...
with open('somefileofdublincore.xml', 'r') as dc:
    obj.add_bytestream('DC', dc)

# (open binary files such as PDFs in 'rb' mode)
with open('somearticle.pdf', 'rb') as pdf:
    obj.add_bytestream('PDF', pdf)

# ...or from a string
obj.add_bytestream('RELS-EXT', """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/changeme:1">
    <rel:isMemberOf rdf:resource="info:fedora/type:article"/>
  </rdf:Description>
</rdf:RDF>""")
--------------------------------------------------

Which will create the following on disk:

objects
|-- pairtree_prefix   # (contains 'info:fedora/')
|-- pairtree_root
|   `-- ch
|       `-- an
|           `-- ge
|               `-- me
|                   `-- +1
|                       |-- DC
|                       |-- PDF
|                       `-- RELS-EXT
`-- pairtree_version0_1
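
For anyone wondering how 'changeme:1' ends up as ch/an/ge/me/+1, here is a minimal sketch of the identifier-to-path mapping as described in the Pairtree specification: hex-encode the awkward characters as ^hh, swap "/", ":" and "." for "=", "+" and ",", then cut the result into two-character directory names. The function name id_to_ppath is made up for illustration and is not the pairtree library's API; treating non-ASCII input as UTF-8 bytes is also an assumption.

```python
import re

# Bytes that get hex-encoded as ^hh: a fixed punctuation set plus anything
# outside visible ASCII (0x21-0x7e), which includes the space character.
_HEX_ENCODE = re.compile(rb'["*+,<=>?\\^|]|[^\x21-\x7e]')

# Single-character swaps applied after hex-encoding: / -> =, : -> +, . -> ,
_SWAP = bytes.maketrans(b"/:.", b"=+,")


def id_to_ppath(identifier):
    """Hypothetical helper: map an identifier to a relative pairpath."""
    raw = identifier.encode("utf-8")  # assumption: non-ASCII ids are UTF-8
    cleaned = _HEX_ENCODE.sub(lambda m: b"^%02x" % m.group(0)[0], raw)
    cleaned = cleaned.translate(_SWAP).decode("ascii")
    # Cut into two-character "shorties"; the last one may be a single character.
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


print(id_to_ppath("changeme:1"))         # ch/an/ge/me/+1
print(id_to_ppath("ark:/13030/xt12t3"))  # ar/k+/=1/30/30/=x/t1/2t/3
```

The resulting path sits under pairtree_root, and the object's files (DC, PDF, RELS-EXT above) live in the final directory.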



Ben

Erik Hetzner

Oct 15, 2009, 1:20:01 PM
to digital-...@googlegroups.com, Ben O'Steen
At Thu, 15 Oct 2009 03:47:35 -0700 (PDT),

Ben O'Steen wrote:
>
>
> An announcement of sorts (more to do with the library being kinda
> mature, and the documentation being verbose enough!)
>
> Please see the following for a blogpost about it:
> http://oxfordrepo.blogspot.com/2009/10/python-in-pairtree.html
> [http://bit.ly/49zfsm]

Hi Ben -

This is wonderful work! I have a half-written Pairtree library around,
but I am pleased not to have to document & release it.

I previously turned (half) the test cases for Perl pairtree into
testcases for Python. I have attached my test.py script modified to
use your new library. Most everything seems to work, except for a
problem with space and a problem with unicode.

John - I think that it would be nice to begin to define a set of test
cases for Pairtree to try to achieve interoperability.

I left out a few that I felt were inappropriate,
namely:

i2p2i('', '//', '0-char edge case');

because I think that a Pairtree should refuse to process a 0 character
id.

I also added an explicit test of the space mapping.

I have attached my Python pairtree library, if you are interested in
the hex encoding.

Thanks again for releasing this!

best,
Erik

test.py
ppath.py

Ben O'Steen

Oct 16, 2009, 10:16:09 AM
to Digital Curation

Erik,

Thanks for sending me the magic regexes and tests :) I've since
incorporated them into the code, added a few tests and made sure that
it can roundtrip all of the existing tests - "What you put in, you get
out again"
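
A rough sketch of what such a round-trip check can look like, including the space and unicode cases Erik mentioned and a refusal of the empty identifier as he suggested. This is not the test suite shipped with the library: encode_id and decode_id are hypothetical names, the character rules are taken from the Pairtree specification, and UTF-8 for non-ASCII identifiers is an assumption.

```python
import re

_ENCODE = re.compile(rb'["*+,<=>?\\^|]|[^\x21-\x7e]')
_DECODE = re.compile(rb"\^([0-9a-fA-F]{2})")
_TO_PATH = bytes.maketrans(b"/:.", b"=+,")
_FROM_PATH = bytes.maketrans(b"=+,", b"/:.")


def encode_id(identifier):
    """Hypothetical: identifier -> cleaned string (before splitting into pairs)."""
    if not identifier:
        # Assumption, following the suggestion in this thread: refuse empty ids.
        raise ValueError("refusing to encode an empty identifier")
    raw = identifier.encode("utf-8")
    hexed = _ENCODE.sub(lambda m: b"^%02x" % m.group(0)[0], raw)
    return hexed.translate(_TO_PATH).decode("ascii")


def decode_id(cleaned):
    """Hypothetical inverse of encode_id."""
    raw = cleaned.encode("ascii").translate(_FROM_PATH)
    # One pass over the ^hh escapes, so a decoded "^" is never re-read as an escape.
    raw = _DECODE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return raw.decode("utf-8")


cases = ["changeme:1", "a b", "ark:/13030/xt12t3",
         "Ann\u00e9es", "what-the-*@?#!^!?"]
for case in cases:
    # "What you put in, you get out again"
    assert decode_id(encode_id(case)) == case
```

A shared list of cases like this is the kind of thing that could be reused across the Perl and Python implementations to check interoperability.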

How do you want me to attribute you?

To upgrade, "sudo easy_install -U pairtree" is the quickest way.

If you've got the source from pypi, you can run the testsuite by
running "python setup.py test" in the root directory, as well as "sudo
python setup.py install" to install it.

Ben
> test.py
> ppath.py
>
> ;; Erik Hetzner, California Digital Library
> ;; gnupg key id: 1024D/01DB07E3

John A. Kunze

Oct 16, 2009, 10:42:50 AM
to Digital Curation
Good stuff, guys! It's really nice to see these developments.
Ben, I take it you'd have no trouble if we pointed publicly
to your work?

-John

Ben O'Steen

Oct 16, 2009, 10:47:06 AM
to Digital Curation
No problem at the pointing, naming and - due to inevitable bugs -
shaming. Ed Summers raised a good point about putting the code out
into a publicly accessible repository like github, so I'll look into
that too.

Ben

Michael J. Giarlo

Oct 16, 2009, 11:11:47 AM
to digital-...@googlegroups.com
Ben said it was kosher to do a wee bit of threadjacking, so here I am.

Ed Summers and I have worked on two Python libraries for the Dflat,
ReDD, and Namaste micro-service specifications. Mostly they were an
experiment to see how easy in practice it would be to implement and
use tools built on such lightweight specs, but it could be they're
useful for other folks too. Caveats aside, the code is all up on
github for folks who're interested in collaborating, and they're also
on pypi for easy_install purposes:

http://github.com/edsu/dflat
http://github.com/mjgiarlo/namaste

Each library has test cases which can be run with "python setup.py
test". I'd be glad to add test cases to boost interop between related
codesets, as Ben and Erik worked on for Pairtree. Oh, and I should
add that the Namaste library is largely a port of the Perl library up
on CPAN.

Cheers,

-Mike

Ben O'Steen

Oct 16, 2009, 11:13:29 AM
to Digital Curation
To add to the git love - just put pairtree up there too:

http://github.com/benosteen/pairtree/

On Oct 16, 4:11 pm, "Michael J. Giarlo" <leftw...@alumni.rutgers.edu>
wrote:

Erik Hetzner

Oct 16, 2009, 12:59:38 PM
to digital-...@googlegroups.com, Ben O'Steen
Hi Ben -

At Fri, 16 Oct 2009 07:16:09 -0700 (PDT),


Ben O'Steen wrote:
> Erik,
>
> Thanks for sending me the magic regexes and tests :) I've since
> incorporated them into the code, added a few tests and made sure
> that it can roundtrip all of the existing tests - "What you put in,
> you get out again"

Thanks, I forgot to do that.

> How do you want me to attribute you?

My work is (c) 2009 UC Regents. It is derived from John Kunze’s work,
which is also (c) UC Regents and released under the Apache license,
but I am sure we could relicense as BSD or something else.

And thanks for putting the code on github!

We are working on getting an open wiki up for curation microservices
work; expect that in the next few days.

-Erik

Ben O'Steen

Oct 16, 2009, 1:14:27 PM
to Digital Curation
Yeah, the explicit attribution/licence boilerplate is something it's
currently missing - I'm fine with an Apache licence too.

Will make the changes shortly...

Ben

Erik Hetzner

Oct 16, 2009, 1:16:44 PM
to digital-...@googlegroups.com, Ben O'Steen
At Fri, 16 Oct 2009 10:14:27 -0700 (PDT),

Ben O'Steen wrote:
>
>
> Yeah, the explicit attribution/licence boilerplate is something it's
> currently missing - I'm fine with an Apache licence too.
>
> Will make the changes shortly...

Hi Ben -

I prefer BSD to Apache, and GPL over both, but whatever works for you,
of course.

best,
Erik

Ben O'Steen

Oct 21, 2009, 11:22:30 AM
to Digital Curation
Added a ppath binary to the 0.4.5 distribution and amended the
documentation to reflect this:

http://packages.python.org/Pairtree/

("sudo easy_install -U pairtree" to update)

Relevant docs on ppath follow:

The ppath script

A ppath script is included for convenience to be used in shell scripts
or similar. Eg:

ppath topath examples:

$ vim mystore/pairtree_root/`ppath topath document:105/data/doc.txt`
(Opens the file at mystore/pairtree_root/do/cu/me/nt/+1/05/data/doc.txt)
$ cp `ppath topath foo:bar/1.txt` `ppath topath bar:foo/2.txt`

ppath toid examples:

data/subjects/pairtree_root/HA/SS/ET/ROOT$ ppath toid `pwd`
HASSET/ROOT
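
To illustrate the idea behind toid (this is not the ppath script's actual code), here is a minimal sketch that splits a filesystem path into the still-encoded identifier and whatever follows it. It assumes, per the Pairtree specification, that the pairpath is the run of directory names of at most two characters directly under pairtree_root, and that the first longer name starts the object's own contents. The function name split_pairpath is hypothetical, and decoding the ^hh escapes and the "=", "+", "," swaps back into the original identifier is left out.

```python
def split_pairpath(path):
    """Hypothetical helper: return (encoded_id, remainder) for a path in a pairtree."""
    parts = [p for p in path.split("/") if p]
    if "pairtree_root" not in parts:
        raise ValueError("no pairtree_root component in path")
    parts = parts[parts.index("pairtree_root") + 1:]

    shorties = []
    while parts and len(parts[0]) <= 2:
        piece = parts.pop(0)
        shorties.append(piece)
        if len(piece) < 2:
            # A one-character directory always ends the pairpath.
            break
    return "".join(shorties), "/".join(parts)


print(split_pairpath("data/subjects/pairtree_root/HA/SS/ET/ROOT"))
# ('HASSET', 'ROOT')
```

Joining the two parts with "/" gives the HASSET/ROOT shown above.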

(NB I know the command names are long, but hopefully they are long
enough to make the intent of each command clear. There is always alias,
should you want i2p/p2i instead :) )

Ben