Metadata storage

Danny

Jul 24, 2008, 12:38:19 AM
to process.theinfo
How are people storing metadata about their data on the back end?

Say I have a set of 100,000 files and I want to store a set of (name,
value) tuples for each file, with the added complication that some
subset of the files change over time, so it needs to be easy to do
queries like "show me the parts of files that have changed since a
certain (name, value) tuple was set".
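
To make the requested query concrete, here is a minimal in-memory sketch, not tied to any real tool. It assumes a hypothetical `TrackedFile` record that snapshots a file's text whenever a (name, value) tuple is set, so "the parts of files that have changed since" can be produced as a unified diff with Python's standard `difflib`. The property names used below are made up for illustration.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class TrackedFile:
    """Hypothetical record: a file's current text plus a snapshot of the
    text taken each time a (name, value) tuple was set on it."""
    path: str
    text: str
    # (name, value) -> file text at the moment that tuple was set
    snapshots: dict[tuple[str, str], str] = field(default_factory=dict)

    def set_property(self, name: str, value: str) -> None:
        # Remember what the file looked like when this tuple was set.
        self.snapshots[(name, value)] = self.text

    def changes_since(self, name: str, value: str) -> list[str]:
        """Unified-diff lines between the snapshot and the current text.
        Returns [] if nothing changed; raises KeyError if never set."""
        old = self.snapshots[(name, value)]
        return list(difflib.unified_diff(
            old.splitlines(), self.text.splitlines(),
            fromfile=f"{self.path} ({name}={value})",
            tofile=self.path, lineterm=""))


if __name__ == "__main__":
    f = TrackedFile("drivers/example.c", "line1\nline2\n")
    f.set_property("status", "clean")   # hypothetical property
    f.text = "line1\nline2 changed\n"   # the file changes later
    print("\n".join(f.changes_since("status", "clean")))
```

Keeping a full snapshot per tuple is the simplest thing that works; a version control system gets the same effect more cheaply by recording the revision at which the tuple was set and diffing against that revision.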

My first thought was to use Subversion and svn properties -
http://svnbook.red-bean.com/en/1.1/ch07s02.html - does that seem
reasonable to people? Any experiences?

I really wanted to use a distributed version control system, but
apparently none of them support this feature, and the only one that
looks like it has a good chance of supporting it in the future is
Bazaar - http://bazaar-vcs.org/VersionedProperties - although patches
that add the functionality keep being rejected (the latest in April 2008).

I understand on a superficial level that there are standard XML
formats to store this kind of information - RDFa, OWL - but they seem
like they would be the kind of thing that would be generated from a
database or version control system and then used as an interchange /
searchable format by machines, not something people would edit by
hand.

The context for anyone who is interested is moving the back end for
gNewSense kernel freedom verification [1] from a wiki to something
that has an actual API, data integrity, et cetera [2]. Currently the
process depends too heavily on a single tool, and on people not making
typos when editing metadata through a freeform text interface (a wiki).

There are actually at least 4 different projects doing similar work,
often on the same large sets of files, so part of the objective would
be to find something that is as tool-agnostic at the higher level, as
distributed, and as easy to sync and merge as possible.

[1] gNewSense wiki: Kernel / Documenting Your Work
http://wiki.gnewsense.org/Kernel/DocumentingYourWork?from=Main.DocumentingYourWorkKernel

[2] [gNewSense-users] KFV back end / Code Review Programs
http://lists.gnu.org/archive/html/gnewsense-users/2008-07/msg00042.html

Thanks for any thoughts on this problem,
--
Danny Clark # Sys Admin, Free Software Foundation
# http://www.fsf.org # http://opensysadmin.com

Philip (flip) Kromer

Jul 25, 2008, 9:26:30 AM
to process...@googlegroups.com
A cursory tour of
http://en.wikipedia.org/wiki/Comparison_of_revision_control_software
and the comparisons it links to
http://better-scm.berlios.de/comparison/
http://www.relisoft.com/co_op/vcs_breakdown.html
seems to confirm that none of the free/open distributed VCSs support
file properties. This post
http://www.daizucms.org/blog/2007/07/git-lacks-properties/
explains a bit about why it's hard.

Could you do this with git-svn
(http://www.robbyonrails.com/articles/2008/04/10/git-svn-is-a-gateway-drug)
and post-commit hooks
(http://www.kernel.org/pub/software/scm/git/docs/v1.3.3/hooks.html)?

Perhaps best: limp along with SVN for a while, knowing that whichever
DVCS proves worthy will provide an SVN migration tool out of market
necessity.

flip


--
http://www.infochimps.org
Connected Open Free Data

Stuart Sierra

Jul 29, 2008, 3:35:45 PM
to process.theinfo
A slightly more complicated solution: store your metadata in separate
files, so for "foo.txt" you would also have ".meta.foo.txt" or
something similar. Then you could implement the queries you need on
top of any version control system.
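
A minimal sketch of the sidecar idea, assuming JSON as the metadata format (the post does not specify one) and the ".meta." prefix from the example above. The helper names `meta_path`, `write_meta`, `read_meta` and `files_with` are hypothetical. Writing one key per line with sorted keys keeps the sidecar files line-diffable, so any VCS can merge them like ordinary text.

```python
import json
import os
import tempfile


def meta_path(path: str) -> str:
    """Sidecar location: dir/foo.txt -> dir/.meta.foo.txt."""
    directory, name = os.path.split(path)
    return os.path.join(directory, ".meta." + name)


def write_meta(path: str, props: dict) -> None:
    # sort_keys + indent: one (name, value) pair per line, stable ordering,
    # so diffs and merges in the VCS stay line-based.
    with open(meta_path(path), "w", encoding="utf-8") as fh:
        json.dump(props, fh, sort_keys=True, indent=2)
        fh.write("\n")


def read_meta(path: str) -> dict:
    """Return the file's metadata, or {} if it has no sidecar yet."""
    try:
        with open(meta_path(path), encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def files_with(root: str, name: str, value: str) -> list[str]:
    """Example query: every data file under root whose metadata has name == value."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if fname.startswith(".meta."):
                continue  # skip the sidecars themselves
            path = os.path.join(dirpath, fname)
            if read_meta(path).get(name) == value:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        data = os.path.join(root, "foo.txt")
        open(data, "w").close()
        write_meta(data, {"status": "clean"})  # hypothetical property
        print(files_with(root, "status", "clean"))
```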

Another alternative: store your metadata in an SQL database, with
timestamped rows.
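
A minimal sketch of the timestamped-rows approach using Python's built-in `sqlite3`. The schema (a `property` table and a `file_change` table, integer timestamps) is an assumption for illustration. Rows are only ever appended, never updated, so history is preserved, and Danny's query becomes "files whose latest change is newer than the latest time (name, value) was set on them". Note that this tracks whole-file changes; showing *which parts* changed still needs a diff against the stored revision.

```python
import sqlite3

# Hypothetical schema: append-only rows, timestamps as integers (e.g. Unix seconds).
SCHEMA = """
CREATE TABLE property (
    path   TEXT NOT NULL,
    name   TEXT NOT NULL,
    value  TEXT NOT NULL,
    set_at INTEGER NOT NULL      -- when this (name, value) was set on path
);
CREATE TABLE file_change (
    path       TEXT NOT NULL,
    changed_at INTEGER NOT NULL  -- one row per modification of path
);
"""


def changed_since_property(conn: sqlite3.Connection, name: str, value: str) -> list[str]:
    """Paths modified after the most recent time (name, value) was set on them."""
    rows = conn.execute(
        """
        SELECT p.path
        FROM property AS p
        JOIN file_change AS c ON c.path = p.path
        WHERE p.name = ? AND p.value = ?
        GROUP BY p.path
        HAVING MAX(c.changed_at) > MAX(p.set_at)
        ORDER BY p.path
        """,
        (name, value),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO property VALUES (?, ?, ?, ?)",
                     [("a.c", "status", "clean", 100),
                      ("b.c", "status", "clean", 100)])
    conn.executemany("INSERT INTO file_change VALUES (?, ?)",
                     [("a.c", 50), ("a.c", 150), ("b.c", 90)])
    print(changed_since_property(conn, "status", "clean"))
```

Here `a.c` was modified at 150, after `status=clean` was set at 100, so it is reported; `b.c` was last modified at 90 and is not.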

-Stuart