Alfresco Bulk Upload Auto Tagging

john.s...@ibo.org

Oct 5, 2011, 5:03:12 AM
to Alfresco Bulk Filesystem Import
Hi all,

I am planning to use the bulk upload tool to transfer a large number
of documents from one storage system to another handled by Alfresco
and I wanted to understand a little better the topic of automatically
adding tags to a document during this process.

The plan is to extract existing metadata from the current database and
create the shadow file of type <filename>.metadata.properties to
assist in this migration process.

Our tagging information can be extracted from the file name as there
is an existing strict definition for files which in effect can be
reverse engineered to create the tag data for the shadow metadata
file.
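
As a rough illustration of that reverse engineering, here is a minimal Python sketch. The naming convention it assumes (`<language>-<programme>-<identifier>.<extension>`) is purely hypothetical, as are the names `FILENAME_PATTERN` and `tags_from_filename`; the real file name definition is not described in this thread and the pattern would need to be adapted to it.

```python
import re

# Hypothetical convention: <language>-<programme>-<identifier>.<extension>
# e.g. "English-Programme_X-505299.pdf". Adapt the pattern to the real rules.
FILENAME_PATTERN = re.compile(
    r"^(?P<language>[^-]+)-(?P<programme>[^-]+)-(?P<identifier>\d+)\.[^.]+$"
)


def tags_from_filename(filename):
    """Return the tag names encoded in a file name, or raise ValueError."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {filename}")
    return [match.group("language"), match.group("programme")]
```

Failing loudly on a non-conforming name is a deliberate choice here: a silently untagged document is harder to find after the import than a rejected one before it.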

The bulk upload tool can then walk the tree of documents and create
the document structure in the repository. The created node reference
information can then be used to add in the tag data to that node.

My original question was around issue 57, and the original properties
file is reproduced here for context setting.

type=cm:content
aspects=cm:dublincore, cm:taggable
cm\:title=XXX
cm\:description=Work
cm\:identifier=505299
cm\:taggable=tag1

In the reply thread it seems that the cm:taggable field should be set
to the node being accessed, as per the following response in the
thread:

cm\:taggable=workspace://SpacesStore/3da6c395-3a4b-4a57-836d-8e5b4fdfc332

What I wanted to understand is how to represent tagging data to be
applied to that node reference in the shadow file?

So for example if I wanted to add tags such as "English" "Programme_X"
to the referenced node how would I represent that in the shadow file?

Any examples would certainly assist me.

Regards John

Zhihai Liu

Oct 7, 2011, 1:17:54 PM
to alfresco-bulk-f...@googlegroups.com
John,

I ran into the same problem. Please see Peter's comments on Issue 88 - http://code.google.com/p/alfresco-bulk-filesystem-import/issues/detail?id=88. Basically, you would need to find out the node ref of tag "English" in your Alfresco repository and use that as value in the metadata file.

That said, I would love to see the Bulk Import Tool do a bit extra: when given a plain-text tag, it could query Alfresco for the tag's node ref and then carry on as usual. That would be a real improvement in usability.

Peter, do you foresee this as a useful feature? Can we add it to the "wish list"? :-)

Thanks,
Zhihai




Peter Monks

Oct 7, 2011, 1:20:02 PM
to Alfresco Bulk Filesystem Import
G'day John,

> In the reply thread it seems that the cm:taggable field should be set
> to the node being accessed, as per the following response in the
> thread:
>
> cm\:taggable=workspace://SpacesStore/3da6c395-3a4b-4a57-836d-8e5b4fdfc332

Actually the NodeRef(s) in the cm:taggable property are the NodeRef(s)
of the tag(s) you wish to associate the content with. In Alfresco
tags are stored as independent content objects (nodes) in the
repository, and (as with all nodes), they have their own globally
unique NodeRef. It's the tags' own NodeRefs that need to be placed
into the cm:taggable property in the shadow metadata file.

> What I wanted to understand is how to represent tagging data to be
> applied to that node reference in the shadow file?
>
> So for example if I wanted to add tags such as "English" "Programme_X"
> to the referenced node how would I represent that in the shadow file?
>
> Any examples would certainly assist me.

Let's say for example that you've already created the "English" and
"Programme_X" tags in your repository (a requirement if you wish to
tag content that is being bulk imported). Let's also say that these
two tags happen to have the following NodeRefs:

English -> workspace://SpacesStore/fda6c397-3a42-4ab7-836d-8ec74f86cd84
Programme_X -> workspace://SpacesStore/429e06a7-6478-418b-8194-9280060bd22b

(note: if you were to create these two tags in your own Alfresco
installation, the NodeRefs for the tags would be different to what I'm
showing here - these NodeRef values are for illustration purposes
only)

Furthermore, let's say that you want to tag a file called
"Shakespeare.txt" with both of these tags. In this case there would
also be a file called "Shakespeare.txt.metadata.properties.xml" in the
same directory as "Shakespeare.txt", and it would have at least the
following content:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="aspects">cm:taggable</entry>
<entry key="cm:taggable">workspace://SpacesStore/fda6c397-3a42-4ab7-836d-8ec74f86cd84,workspace://SpacesStore/429e06a7-6478-418b-8194-9280060bd22b</entry>
</properties>

In summary the key steps are:
1. You have to pre-create your tags prior to preparing your source
data or importing it. The bulk import tool does not create or
manipulate tags.
2. You need to find out the NodeRefs of all of the tags you wish to
associate with the imported content, since it's those NodeRefs that
need to be put into the shadow metadata files.
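
Once both steps are done, generating the shadow files can be scripted. The following is a minimal Python sketch, not part of the bulk import tool: the function name `write_shadow_xml` is made up for illustration, and it assumes you already hold the NodeRefs of the tags you want to apply. It writes the same XML properties layout as the "Shakespeare.txt" example above.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def write_shadow_xml(content_file, tag_noderefs):
    """Write <content_file>.metadata.properties.xml next to content_file.

    tag_noderefs is a list of NodeRef strings for tags that already exist
    in the repository (assumed to have been looked up beforehand).
    """
    content_path = Path(content_file)
    shadow_path = content_path.with_name(
        content_path.name + ".metadata.properties.xml"
    )
    # Multiple tags go into one comma-separated cm:taggable value.
    value = ",".join(tag_noderefs)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">',
        "<properties>",
        '<entry key="aspects">cm:taggable</entry>',
        f'<entry key="cm:taggable">{escape(value)}</entry>',
        "</properties>",
    ]
    shadow_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return shadow_path
```

For example, `write_shadow_xml("Shakespeare.txt", [english_ref, programme_x_ref])` would produce "Shakespeare.txt.metadata.properties.xml" in the same directory, where the two variables hold the NodeRefs from your own repository.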

Cheers,
Peter

Peter Monks

Oct 7, 2011, 1:42:43 PM
to Alfresco Bulk Filesystem Import
G'day Zhihai,

Jinx! ;-)

> That said, I would love to see that the Bulk Import Tool could do a bit
> extra, such as, in case of plain text tag, query Alfresco to get node ref
> then does its business as usual. It would be so sleek regarding to
> usability.
>
> Peter, do you foresee this as a useful feature? Can we add it to the "wish
> list"? :-)

This is a specific example of the more general issue described at
issue #10 [1], so it's already on the roadmap. That said, it's a low
priority at the moment as there's a workaround (use NodeRefs) and
because the performance implications need careful consideration
(mixing up reads and writes within the same transaction hurts
performance).

To put this in context, my guiding principles for the import tool are:
1. to be performant
2. to be convenient to use, but only when that's not at odds with
principle #1

The primary audience of the tool is Alfresco administrators, so I've
made the assumption of familiarity with concepts like NodeRefs,
associations, the default dictionary & content model, etc. Am I
overestimating the level of expertise that the typical Alfresco
administrator has, do you think?

Cheers,
Peter

[1] http://code.google.com/p/alfresco-bulk-filesystem-import/issues/detail?id=10

john.s...@ibo.org

Oct 10, 2011, 5:50:44 AM
to Alfresco Bulk Filesystem Import
Hi all,

Peter's explanation of the tag architecture confirms what I thought
was the case with the node reference model.

The discussion raised by Zhihai is along the lines of what I was
thinking as a future enhancement, and again thanks for all the input,
as I appreciate examples.

In essence at this juncture I will have to do some pre-processing
before the bulk importation tool is run to build the shadow data files
to contain the pre-existing tag data node references.

The process will extract information from the existing database
metadata, query the Alfresco repository for the NodeRefs of the
existing tags (say "English" and "Programme_X"), and insert those
NodeRefs into the cm:taggable field of the shadow file, to be applied
during the bulk upload.
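
A minimal Python sketch of that pre-processing step follows. It assumes the tag name to NodeRef lookup has already been collected from the repository into a plain dictionary (how that query is made is left open here), and the names `resolve_tag_noderefs`, `escape_key` and `build_shadow_properties` are hypothetical. It emits the plain `.metadata.properties` layout from the first post, and assumes keys and values are plain ASCII with no characters needing further escaping.

```python
def resolve_tag_noderefs(tag_names, tag_index):
    """Map tag names to NodeRefs, failing if a tag was not pre-created."""
    missing = [name for name in tag_names if name not in tag_index]
    if missing:
        raise KeyError(f"tags not found in the repository index: {missing}")
    return [tag_index[name] for name in tag_names]


def escape_key(key):
    """Escape ':' in a key, since it is a key/value separator in .properties."""
    return key.replace(":", "\\:")


def build_shadow_properties(properties, tag_names, tag_index, aspects=()):
    """Return the text of a .metadata.properties shadow file.

    properties: dict of extra metadata, e.g. {"cm:title": "XXX"}
    tag_index:  dict of tag name -> NodeRef (assumed to be built beforehand)
    aspects:    extra aspects to apply besides cm:taggable
    """
    noderefs = resolve_tag_noderefs(tag_names, tag_index)
    all_aspects = list(aspects) + ["cm:taggable"]
    lines = ["type=cm:content", "aspects=" + ", ".join(all_aspects)]
    for key, value in properties.items():
        lines.append(f"{escape_key(key)}={value}")
    # All tag NodeRefs go into one comma-separated cm:taggable value.
    lines.append(f"{escape_key('cm:taggable')}={','.join(noderefs)}")
    return "\n".join(lines) + "\n"
```

Raising on an unknown tag name before anything is written keeps the import from running against shadow files that reference tags which were never created.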

Thank you all for the assistance, I appreciate it and I learnt another
thing today.

Regards John