Metadata extractor?

RW Shore

Feb 24, 2018, 12:47:16 PM
to mayan...@googlegroups.com
Does anyone have a transformer that maps the embedded JPEG metadata (date/time taken, size, ...) into Mayan-EDMS metadata? Is such a transformation possible?
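The mapping itself is possible in principle. exiftool can print a file's tags as JSON (exiftool -json), which turns the mapping step into a plain dictionary lookup. Below is a minimal Python 3.7+ sketch. It assumes exiftool is on the PATH. The metadata names in TAG_TO_METADATA (date_taken, image_width, image_height) are hypothetical placeholders. This is not how mayan-exif or document_analyzer are implemented.

```python
import json
import subprocess

# Hypothetical mapping from exiftool tag names to Mayan metadata type names.
# The names on the right are placeholders; matching metadata types would have
# to exist in Mayan before values could be stored under them.
TAG_TO_METADATA = {
    "DateTimeOriginal": "date_taken",
    "ImageWidth": "image_width",
    "ImageHeight": "image_height",
}


def parse_exiftool_json(raw_json, mapping=TAG_TO_METADATA):
    """Turn `exiftool -json` output for one file into {metadata_name: value}."""
    records = json.loads(raw_json)  # exiftool -json emits a list, one dict per file
    if not records:
        return {}
    record = records[0]
    # Keep only the tags we know how to map; values are stored as strings.
    return {
        metadata_name: str(record[tag])
        for tag, metadata_name in mapping.items()
        if tag in record
    }


def read_exif(path):
    """Run exiftool on one file and return the mapped values (needs exiftool on PATH)."""
    output = subprocess.run(
        ["exiftool", "-json", path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_exiftool_json(output)
```

For example, if the output contains only DateTimeOriginal and ImageWidth, parse_exiftool_json returns {"date_taken": "2018:02:24 12:47:16", "image_width": "4000"}. Writing those values into Mayan's own metadata is a separate step and is not shown here.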

lonevi...@gmail.com

Feb 27, 2018, 7:21:33 PM
to Mayan EDMS
Found this searching the web. It seems to be exactly what you are looking for:

https://pypi.python.org/pypi/mayan-exif

RW Shore

Feb 28, 2018, 1:08:27 PM
to mayan...@googlegroups.com
Thanks. I feel stupid for not finding it myself.

lonevi...@gmail.com

Feb 28, 2018, 8:03:53 PM
to Mayan EDMS
Cheers, mate, it happened to me too :) There is a veritable wealth of extensions, plugins, and apps written for Mayan, but they are scattered all over the web. It would be nice to have all of them in a single place, like an app store.

Matthias Löblich

Mar 2, 2018, 6:10:59 AM
to Mayan EDMS
You can also use https://gitlab.com/mayan-edms/document_analyzer, which includes the EXIF functionality.

br
Matthias

RW Shore

Mar 2, 2018, 12:01:19 PM
to mayan...@googlegroups.com
Thank you for the suggestion. Unfortunately, I can't get the document_analyzer app to install. My situation is the following:

* I'm starting with the Docker container: not the NG one (yet), but mayanedms/mayanedms:latest. The image is running as a Swarm-based service.

* The Dockerfile is attached. The only deviation from the installation instructions in the GitLab README is that I copy the document_analyzer code into the mayan/apps directory instead of symlinking it, only because I wasn't sure whether the symlink was causing my problem.

* This local.py works fine:
from __future__ import absolute_import, unicode_literals

from .base import *

SECRET_KEY = '<redacted>'

EMAIL_HOST = 'smtp.gmail.com'
EMAIL_PORT = 587
EMAIL_HOST_USER = 'r...@shore.support'
EMAIL_HOST_PASSWORD = '<redacted>'
EMAIL_USE_TLS = True
# INSTALLED_APPS += (
#  'document_analyzer',
# )

* By "works fine" I mean I can log in as admin, see the document types (only one), upload a zip file of JPEGs, etc.

* As soon as I uncomment the INSTALLED_APPS lines, nothing works. If I cycle the service, it refuses to come up. If I change local.py while the app is running and execute "mayan-edms.py migrate", I get a stack trace (attached) that ends with the following:
  File "/usr/local/lib/python2.7/dist-packages/django/apps/registry.py", line 237, in get_containing_app_config
    self.check_apps_ready()
  File "/usr/local/lib/python2.7/dist-packages/django/apps/registry.py", line 124, in check_apps_ready
    raise AppRegistryNotReady("Apps aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.

* I've verified that exiftool is installed and appears to run properly, although the only check I actually ran was "exiftool -ver".
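Running "exiftool -ver" only shows that the binary works in the shell where it was typed. A slightly stronger check is to run the same command from Python and treat any failure as "not available". Below is a minimal Python 3.7+ sketch. The function name exiftool_version is hypothetical and is not part of document_analyzer. The container's Python 2.7 lacks subprocess.run and would need subprocess.check_output instead.

```python
import subprocess


def exiftool_version(executable="exiftool"):
    """Return exiftool's version string, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [executable, "-ver"],
            check=True,           # raise CalledProcessError on a non-zero exit
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,            # decode the output to str
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError covers FileNotFoundError (not on PATH) and PermissionError.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    print(exiftool_version() or "exiftool could not be run")
```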

Any suggestions?
 

trace.txt
Dockerfile

Matthias Löblich

Mar 5, 2018, 11:40:53 AM
to Mayan EDMS
Hi,
Please try the document_analyzer version from my repository:

https://gitlab.com/startmat/document_analyzer

br
Matthias

RW Shore

Mar 6, 2018, 5:44:32 AM
to mayan...@googlegroups.com
Thank you for the reply. Using the analyzer from your repository, I was able to extend the Docker container and get the service running. My next questions are about setup. I assume I need to create a new analyzer. When I open the "create analyzer" panel, I see "GetExifData" as one of the analyzers in the drop-down at the bottom of the panel, and I suppose I can just make up names for the label and the slug. However, I can't create a new analyzer without putting something in the "Parameters" field.

What should the Parameters field contain for an EXIF analyzer? Also, do I need to pre-define metadata types for the extracted EXIF information, or are the types created automagically?


Matthias Löblich

Mar 6, 2018, 2:07:12 PM
to mayan...@googlegroups.com
Hi,
Just put the string None as the parameter for the EXIF analyzer.
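This thread doesn't show how document_analyzer interprets the Parameters field. The hypothetical helper below only illustrates why the exact text can matter. Typed without quotes, the field holds the four characters None. Typed with quotes (like the "None" value stored later in this thread), it holds six characters. A backend that compared the raw text against None would treat the two differently. normalize_parameter is an invented name and is not part of the app.

```python
def normalize_parameter(raw):
    """Hypothetical: map text typed into a Parameters field to a Python value.

    Strips surrounding whitespace and one pair of matching quotes, then turns
    the bare word None into Python's None. Illustration only; this is not
    document_analyzer's actual code.
    """
    text = raw.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1]
    return None if text == "None" else text


# As raw strings the two spellings differ...
print("None" == '"None"')                 # False
# ...but a normalizing backend would treat them the same.
print(normalize_parameter("None"))        # None
print(normalize_parameter('"None"'))      # None
print(normalize_parameter("some value"))  # some value
```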

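The extracted EXIF information is not stored in the Mayan metadata. It is kept in a separate data structure, which you can find here:

[Screenshot: where to find the analyzer results (image not preserved)]

The result page looks like this:

[Screenshot: analyzer result page, with the FileType entry marked by a red square (image not preserved)]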
You can set up Mayan indexes based on the analyzer results, similar to indexes based on metadata:

Menu path: System/Setup/Indexes -> Create index -> Save -> Tree Template

Add a Django template expression that points to the name of a document_analyzer result parameter.
For example, this expression creates an index based on FileType (see the red square in the screenshot above):

{{ document.analyzer_value_of.FileType }}
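Django's template engine resolves each dot in {{ document.analyzer_value_of.FileType }}. The documented order is: dictionary lookup, then attribute lookup, then list-index lookup. A callable found along the way is called with no arguments. The pure-Python sketch below follows that order, which can help when reasoning about why an index level comes out empty. It simplifies the real engine and assumes analyzer_value_of behaves like a dictionary. FakeDocument is a hypothetical stand-in, not Mayan's Document model.

```python
def resolve(context, expression):
    """Resolve a dotted template variable such as 'document.analyzer_value_of.FileType'.

    Follows Django's documented dot-lookup order: dictionary lookup, then
    attribute lookup, then list-index lookup. Callables found along the way
    are called with no arguments. A failed lookup yields '', which is
    Django's default rendering for an invalid variable.
    """
    current = context
    for bit in expression.split("."):
        try:
            current = current[bit]  # 1. dictionary lookup
        except (TypeError, KeyError, AttributeError, IndexError):
            try:
                current = getattr(current, bit)  # 2. attribute lookup
            except AttributeError:
                try:
                    current = current[int(bit)]  # 3. list-index lookup
                except (TypeError, ValueError, KeyError, IndexError):
                    return ""
        if callable(current):
            current = current()
    return current


class FakeDocument:
    """Hypothetical stand-in for a Mayan document; assumes a dict-like analyzer_value_of."""

    def __init__(self, values):
        self.analyzer_value_of = values


doc = FakeDocument({"FileType": "JPEG"})
context = {"document": doc}
print(resolve(context, "document.analyzer_value_of.FileType"))        # JPEG
print(repr(resolve(context, "document.analyzer_value_of.ImageSize")))  # ''
```

A name that is missing from the results silently renders as an empty string rather than raising an error. This is why a misspelled parameter name in the tree template is easy to miss.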




Hope that helps.

br
Matthias



RW Shore

Mar 6, 2018, 3:33:42 PM
to mayan...@googlegroups.com
It helps, but still no joy. I changed the parameter from None to "None" and submitted a document. I found the analyzer results page, but nothing shows up there. I've done the following:

a. Running cd /var/lib/mayan/document_storage && exiftool one-of-the-files produces a listing of JPEG metadata, so I conclude that exiftool is installed properly. In case it matters, exiftool lives at /usr/local/bin/exiftool, and it appears to be on the executable path, at least when I exec /bin/bash in the container. Also, all processes in the Docker container run as root _except_ the nginx workers, which run as www-data. That is how the original container was set up; I haven't touched the Mayan startup command embedded in the mayanedms/mayanedms:latest image.

b. MariaDB [mayan]> select * from document_analyzer_analyzer;
+----+---------------+--------------+----------------------------------------------+-----------+
| id | label         | slug         | type                                         | parameter |
+----+---------------+--------------+----------------------------------------------+-----------+
|  1 | Exif Analyzer | exifAnalyzer | document_analyzer.backends.exiftool.EXIFTool | "None"    |
+----+---------------+--------------+----------------------------------------------+-----------+
1 row in set (0.00 sec)

c. MariaDB [mayan]> select * from document_analyzer_analyzer_document_types;
+----+-------------+-----------------+
| id | analyzer_id | documenttype_id |
+----+-------------+-----------------+
|  1 |           1 |               1 |
+----+-------------+-----------------+
1 row in set (0.00 sec)
(document type #1 is Default)

d. MariaDB [mayan]> select * from document_analyzer_result;
Empty set (0.00 sec)
(this is after at least one document was submitted from the GUI)
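About point (a): exiftool working in an interactive bash shell shows that it is on that shell's PATH. A background worker may start with a different environment, though. Presumably such a worker consumes the document_analyzer queue, but that is an assumption and is not confirmed in this thread. shutil.which accepts an explicit search path, so the two cases can be compared directly. Below is a minimal Python 3 sketch. where_is is a hypothetical helper.

```python
import os
import shutil


def where_is(name, search_path):
    """Return the full path `name` resolves to under `search_path`, or None.

    shutil.which() only looks in the directories listed in `search_path`
    (an os.pathsep-separated string), mimicking how a process started with a
    different PATH would see the same file system.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # Compare the current PATH with a minimal PATH that lacks /usr/local/bin.
    minimal = os.pathsep.join(["/usr/bin", "/bin"])
    print("current PATH:", where_is("exiftool", os.environ.get("PATH", "")))
    print("minimal PATH:", where_is("exiftool", minimal))
```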

* I don't see any events for the submitted document saying that document_analyzer started or finished, although when I submit a document I do get a pop-up saying it was added to the document_analyzer queue. Is the lack of events expected?

* Are there any error logs that might give me a clue about what's going on?

* Any other thoughts or comments?
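On error logs: Django reads a LOGGING dictionary from the settings module and passes it to logging.config.dictConfig. One option is therefore to add a console logger in local.py. This sketch rests on two assumptions. First, document_analyzer logs through logging.getLogger(__name__), so its loggers sit under the document_analyzer name. Second, the analysis runs in a worker whose output ends up in the container logs; for a swarm service, docker service logs <service> shows them. Defining LOGGING in local.py also replaces any LOGGING inherited from .base.

```python
# Hypothetical addition to local.py: send log records from the document_analyzer
# package (assumed logger namespace) and Django request errors to the console.
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "verbose"},
    },
    "loggers": {
        "document_analyzer": {"handlers": ["console"], "level": "DEBUG"},
        "django.request": {"handlers": ["console"], "level": "ERROR"},
    },
}

if __name__ == "__main__":
    # Outside Django, apply the same dict directly to check that it is valid.
    logging.config.dictConfig(LOGGING)
    logging.getLogger("document_analyzer.backends").debug("logging is wired up")
```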