Incremental META information


Paul Stubbe

Jun 11, 2014, 7:46:04 AM
to dezi-...@googlegroups.com
Hi,

   First of all: THANKS for the Dezi 3.0 release.

   I have the following scenario:

      A user can add a file to my repository
      and then I add this file to the DEZI search infrastructure.
      After this the file is never changed, except when the file is deleted at the end of its lifetime.

      The META information about the file, however, keeps changing.





Paul Stubbe

Jun 11, 2014, 7:50:12 AM
to dezi-...@googlegroups.com
The META information can grow with every user of the system.
    Every user can add a 'tag' with a 'description' that others can use when searching.

    Every time someone adds meta information,
    do I need to reload the original file and all the META information already loaded into the Dezi system,
    or is there a way to do this incrementally?

Thanks for any advice.

Paul

Peter Karman

Jun 13, 2014, 8:38:35 AM
to dezi-...@googlegroups.com
Hi Paul,

Conceptually, a "document" in Dezi (or any IR library: Lucene, Xapian,
etc.) is whatever you make it. It could be a file on disk, a URI
(resource), the serialization of a db record, or whatever. It's up to
you.

What Dezi is *not* is a relational database, so if you have metadata
about the document that you want stored, you must update the entire
document in the index. It's a "replace" not an "update". The HTTP method
is PUT for Dezi.
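
With a single index, the "replace" means the client has to rebuild the whole document (original content plus every tag collected so far) each time a tag is added. A minimal Python sketch of that, not a definitive client: the `/index/<uri>` path mirrors the GET URL used elsewhere in this thread, while the `application/xml` content type, the helper names, and the second tag are assumptions for illustration. The request is only prepared, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_full_document(title, body, tags):
    """Serialize the file content plus ALL of its tags into one <doc>."""
    doc = ET.Element("doc")
    ET.SubElement(doc, "title").text = title
    ET.SubElement(doc, "body").text = body
    for tag, description in tags:
        ET.SubElement(doc, "tag").text = tag
        ET.SubElement(doc, "description").text = description
    return ET.tostring(doc, encoding="unicode")


def build_put_request(server, uri, xml_text):
    """Prepare (but do not send) a PUT that replaces the document at `uri`."""
    return urllib.request.Request(
        f"{server}/index/{uri}",  # assumed path, mirroring the GET URL form
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},  # assumed content type
        method="PUT",
    )


# Hypothetical data: one existing tag, then a second user adds another.
tags = [("foo", "the important stuff is very foo")]
tags.append(("bar", "a second user's note"))

# The whole document is rebuilt, not just the new tag.
xml_text = build_full_document("my document", "important stuff", tags)
request = build_put_request("http://localhost:5000", "myfile.xml", xml_text)
# urllib.request.urlopen(request) would send it to a running server.
```

The cost grows with the size of the original file, since its body travels with every tag change.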

For what you're describing, though, it might make sense to have 2
indexes: one for the "file" and then one for each tag+description about
the file. While Dezi isn't relational in the sense of RDBMS, you can
still refer to things by their URI, just like HTML does.

You could serve these indexes from your Dezi instance with the
MultiTenant feature:

https://metacpan.org/release/Dezi-MultiTenant

with an (example, untested) config like:

{
    '/files' => { },    # index-specific config here
    '/tags'  => {
        indexer_config => {
            config => {
                MetaNames     => 'file tag description author',
                PropertyNames => 'file tag description author',
            },
        },
    },
}

Example:

% cat myfile.xml
<doc>
<title>my document</title>
<body>important stuff</body>
</doc>

% dezi-client --server http://localhost:5000/files myfile.xml
# you can now GET http://localhost:5000/files/index/myfile.xml

% cat myfile-tag.xml
<doc>
<tag>foo</tag>
<description>the important stuff is very foo</description>
<author>someone</author>
<file>myfile.xml</file>
</doc>

% dezi-client --server http://localhost:5000/tags myfile-tag.xml
# you can now GET http://localhost:5000/tags/index/myfile-tag.xml

% dezi-client --server http://localhost:5000/tags -q file:myfile.xml
# returns all the tags for myfile.xml
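
The same two-index flow can be sketched from a script: each new tag becomes its own small document, so adding a tag never touches the file's document. This is a rough sketch under assumptions: the per-tag URI naming scheme and the helper names are hypothetical, and the `/search?q=...` endpoint under the `/tags` mount is assumed rather than taken from this thread. It only builds the payload and URLs; nothing is sent.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def build_tag_document(file_uri, tag, description, author):
    """Serialize one tag record that points back at its file by URI."""
    doc = ET.Element("doc")
    fields = (
        ("tag", tag),
        ("description", description),
        ("author", author),
        ("file", file_uri),
    )
    for name, value in fields:
        ET.SubElement(doc, name).text = value
    return ET.tostring(doc, encoding="unicode")


def tag_document_uri(file_uri, author, tag):
    """Hypothetical naming scheme: one URI per (file, author, tag),
    so a new tag never overwrites an earlier one."""
    stem = file_uri.rsplit(".", 1)[0]
    return f"{stem}-{author}-{tag}.xml"


def tag_search_url(server, file_uri):
    """Build the query URL for all tags of one file (assumed endpoint)."""
    query = urllib.parse.urlencode({"q": f"file:{file_uri}"})
    return f"{server}/search?{query}"


tag_xml = build_tag_document(
    "myfile.xml", "foo", "the important stuff is very foo", "someone"
)
tag_uri = tag_document_uri("myfile.xml", "someone", "foo")
search_url = tag_search_url("http://localhost:5000/tags", "myfile.xml")
```

Deleting a file at the end of its lifetime then also means deleting its tag documents, which the `file` query above can enumerate.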

--
Peter Karman . http://peknet.com/ . pe...@peknet.com