How are people storing metadata about their data on the back end?
Say I have a set of 100,000 files and I want to store a set of (name,
value) tuples for each file, with the added complication that some
subset of the files change over time, so it needs to be easy to do
queries like "show me the parts of files that have changed since a
certain (name, value) tuple was set".
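
To make the requirement concrete, the data model and the query could be sketched as below. This is only an illustration, assuming SQLite through Python's sqlite3 module; the table names, column names, the integer revision counter, and the example property ("license-status") are all hypothetical and not taken from any existing tool.

```python
import sqlite3

# Hypothetical schema: every table and column name is an assumption.
SCHEMA = """
CREATE TABLE file_change (
    path     TEXT NOT NULL,
    revision INTEGER NOT NULL,  -- increasing counter, one per change
    hunk     TEXT NOT NULL      -- which part of the file changed
);
CREATE TABLE file_property (
    path     TEXT NOT NULL,
    name     TEXT NOT NULL,
    value    TEXT NOT NULL,
    revision INTEGER NOT NULL   -- revision at which the tuple was set
);
"""

def open_store():
    """Create an in-memory store with the two tables above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def record_change(conn, path, revision, hunk):
    conn.execute("INSERT INTO file_change VALUES (?, ?, ?)",
                 (path, revision, hunk))

def set_property(conn, path, name, value, revision):
    conn.execute("INSERT INTO file_property VALUES (?, ?, ?, ?)",
                 (path, name, value, revision))

def changed_since(conn, name, value):
    """Parts of files changed after (name, value) was last set on them."""
    query = """
        SELECT c.path, c.revision, c.hunk
        FROM file_change AS c
        JOIN (SELECT path, MAX(revision) AS set_at
              FROM file_property
              WHERE name = ? AND value = ?
              GROUP BY path) AS p ON p.path = c.path
        WHERE c.revision > p.set_at
        ORDER BY c.path, c.revision
    """
    return conn.execute(query, (name, value)).fetchall()

if __name__ == "__main__":
    conn = open_store()
    record_change(conn, "drivers/net/foo.c", 1, "lines 1-200")
    set_property(conn, "drivers/net/foo.c", "license-status", "verified", 2)
    record_change(conn, "drivers/net/foo.c", 3, "lines 40-52")
    print(changed_since(conn, "license-status", "verified"))
    # prints [('drivers/net/foo.c', 3, 'lines 40-52')]
```

A single database like this covers the query itself, but not the distributed sync / merge side of the problem.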
My first thought was to use Subversion and svn properties -
http://svnbook.red-bean.com/en/1.1/ch07s02.html - does that seem
reasonable to people? Has anyone had experience with this?
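
For reference, the Subversion approach would look roughly like the following. This is a minimal sketch that only builds the svn command lines rather than running them; the property name "kfv:status", the path, and the revision number are hypothetical, and working out the revision at which a property was set is left out.

```python
def propset_cmd(name, value, path):
    """Command to attach a (name, value) property to a versioned file."""
    return ["svn", "propset", name, value, path]

def propget_cmd(name, path):
    """Command to read a property back from a versioned file."""
    return ["svn", "propget", name, path]

def diff_since_cmd(revision, path):
    """Command to show what changed in a file between a revision and HEAD."""
    return ["svn", "diff", "-r", f"{revision}:HEAD", path]

if __name__ == "__main__":
    # Hypothetical property name, path, and revision.
    print(" ".join(propset_cmd("kfv:status", "verified", "drivers/net/foo.c")))
    print(" ".join(diff_since_cmd(120, "drivers/net/foo.c")))
```

Each list could be passed to subprocess.run(cmd, check=True) from inside a working copy; a property change made with propset only reaches the repository after a commit.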
I really wanted to use a distributed version control system, but
apparently none of them support this feature, and the only one that
looks like it has a good chance of supporting it in the future is
Bazaar -
http://bazaar-vcs.org/VersionedProperties - although patches
that add the functionality keep being rejected (the latest was in
April 2008).
I understand on a superficial level that there are standard XML
formats to store this kind of information - RDFa, OWL - but they seem
like they would be the kind of thing that would be generated from a
database or version control system and then used as an interchange /
searchable format by machines, not something people would edit by
hand.
The context for anyone who is interested is moving the back end for
gNewSense kernel freedom verification [1] from a wiki to something
that has an actual API, data integrity, et cetera [2]. Currently the
process depends too much on a single tool and on people not making
typos when editing metadata via a freeform text interface (i.e. the
wiki).
There are actually at least 4 different projects doing similar work,
often on the same large sets of files, so part of the objective would
be to find something that is as agnostic about higher-level tools and
as distributed / easy to sync / merge as possible.
[1] gNewSense wiki: Kernel / Documenting Your Work
http://wiki.gnewsense.org/Kernel/DocumentingYourWork?from=Main.DocumentingYourWorkKernel
[2] [gNewSense-users] KFV back end / Code Review Programs
http://lists.gnu.org/archive/html/gnewsense-users/2008-07/msg00042.html
Thanks for any thoughts on this problem,
--
Danny Clark # Sys Admin, Free Software Foundation
#
http://www.fsf.org #
http://opensysadmin.com