I need help with document_analyzer basics

David Reagan

Feb 14, 2018, 8:26:38 PM
to Mayan EDMS
Hey all,

I finally have time to experiment with Mayan-EDMS some more. So I'm back at trying to get https://gitlab.com/startmat/document_analyzer working the way I want.

Unfortunately, I can't seem to figure it out.

I'm currently testing on a vagrant instance. See: https://gitlab.com/mayan-edms/mayan-edms-vagrant

I ended up copying the document_analyzer app into the apps directory to get it loading.

I am using an Albertsons receipt to test with. The first two lines of the OCR text look like this:

4S Albertsons
It's just better.

I made an analyzer and assigned the 'receipt' document type to it. (That's the type I added, and it's the type the Albertsons receipt's properties page shows.)

Parameter:
first;(?ims)(?P<albertsons>(.*Albertsons.*))


This should cause document_analyzer to add an "albertsons" field to either the metadata or the properties of the document. Am I wrong?
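One way to check what the first parameter would capture is to run its regex part through Python's re module. This is a sketch under two assumptions: that document_analyzer treats the text after "first;" as a Python regular expression, and that it searches the OCR text quoted above. Note that the "s" (DOTALL) flag lets the greedy ".*" run across line breaks, so the named group captures the whole text rather than just the first line.

```python
import re

# OCR text of the receipt's first two lines, as quoted in the post.
ocr_text = "4S Albertsons\nIt's just better."

# Regex part of the first analyzer parameter (everything after "first;").
# Assumption: document_analyzer hands this string to Python's re module.
pattern = r"(?ims)(?P<albertsons>(.*Albertsons.*))"

match = re.search(pattern, ocr_text)
# groupdict() returns only named groups; the inner unnamed group is ignored.
# Because of DOTALL, the captured value is the whole two-line text.
print(match.groupdict())

# Dropping the "s" flag keeps ".*" on a single line.
single_line = re.search(r"(?im)(?P<albertsons>.*Albertsons.*)", ocr_text)
print(single_line.group("albertsons"))  # 4S Albertsons
```

If the goal is just to detect the store name, the single-line form (or no ".*" at all) gives a much shorter captured value.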

I also made an analyzer based on the document_analyzer's README.

Parameter:
first;(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)

I just added "Albertsons" to the list of words to look for.


This should cause document_analyzer to add a "Creator" field to either the metadata or properties of the document. Am I wrong?
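The README-style parameter can be checked the same way. Again this assumes the part after "first;" is an ordinary Python regex searched against the OCR text. The "(?i)" flag makes the match case-insensitive, but the captured value keeps the casing found in the document, so OCR output in capitals yields "ALBERTSONS" rather than "Albertsons".

```python
import re

ocr_text = "4S Albertsons\nIt's just better."

# Regex part of the README-style parameter, with "Albertsons" added.
pattern = r"(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)"


def creator_of(text):
    """Return the text captured by the Creator group, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group("Creator") if match else None


print(creator_of(ocr_text))                 # Albertsons
print(creator_of("ALBERTSONS STORE 1234"))  # ALBERTSONS
print(creator_of("Corner Shop"))            # None
```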


I used the menu item "Submit to analyze" http://localhost:8080/document_analyzer/analyzer/1/submit/ to run document_analyzer.


All I can see in the logs is that I clicked that menu item. Nothing is added to either the metadata or the properties of the document.


If I test:


(?ims).*albertsons.*


on http://www.pyregex.com/ with the first two lines of the document, it reports a success.
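One difference from the pyregex test: "(?ims).*albertsons.*" matches, but it has no named group. If the field name comes from the "(?P<name>...)" group, as the README's Creator example suggests, a match without named groups presumably has nothing to store. The helper below is hypothetical (not part of document_analyzer) and assumes the parameter format is "<mode>;<regex>", split at the first ";".

```python
import re

ocr_text = "4S Albertsons\nIt's just better."


def named_groups(parameter, text):
    """Hypothetical helper, not part of document_analyzer.

    Splits a parameter such as "first;(?i)(?P<Creator>...)" at its first ";"
    (an assumption about the format) and returns the named groups the regex
    captures, or None if the regex does not match at all.
    """
    _mode, _sep, regex = parameter.partition(";")
    match = re.search(regex, text)
    return match.groupdict() if match else None


# The pyregex test pattern matches, but captures no named group.
print(named_groups("first;(?ims).*albertsons.*", ocr_text))  # {}
print(named_groups("first;(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)", ocr_text))
print(named_groups("first;(?i)(?P<Creator>Tele2|Apple)", ocr_text))  # None
```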


/usr/share/mayan-edms/mayan/settings/local.py looks like:


from __future__ import absolute_import, unicode_literals

from .base import *

SECRET_KEY = '5(kv&ow31r2m9e^#c65v%ppiwiv9epu-hxa*1jsa1#m5bi!g7+'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mayan_edms',
        'USER': 'mayan',
        'PASSWORD': 'test123',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
INSTALLED_APPS += (
    'document_analyzer',
)

BROKER_URL = 'redis://127.0.0.1:6379/0'
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/0'

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(name)s %(process)d %(thread)d %(message)s'
        },
        'intermediate': {
            'format': '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'intermediate'
        }
    },
    'loggers': {
        # 'documents': {
        #     'handlers': ['console'],
        #     'propagate': True,
        #     'level': 'DEBUG',
        # },
        # 'common': {
        #     'handlers': ['console'],
        #     'propagate': True,
        #     'level': 'DEBUG',
        # },
        'document_analyzer': {
            'handlers': ['console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    }
}
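To confirm that the LOGGING setting above routes document_analyzer messages to the console, the relevant part can be exercised outside Mayan with the standard library's logging.config.dictConfig. This is a standalone sketch: the logger name "document_analyzer.tasks" is a hypothetical module name, and the handler writes to an in-memory buffer so the output can be inspected. It also shows that, in this setup, DEBUG messages from a logger outside the "document_analyzer" tree (such as the commented-out "documents" entry) are dropped.

```python
import io
import logging
import logging.config

# Test-only: capture log output in memory instead of writing to stderr.
captured = io.StringIO()

# Trimmed copy of the LOGGING setting from local.py: only the formatter,
# handler and logger that are actually in use.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'intermediate': {
            'format': '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'intermediate',
            'stream': captured,  # extra keyword passed to StreamHandler
        },
    },
    'loggers': {
        'document_analyzer': {
            'handlers': ['console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    },
}

logging.config.dictConfig(LOGGING)

# 'document_analyzer.tasks' is a hypothetical module logger name: any logger
# under the 'document_analyzer' prefix inherits its DEBUG level and handler.
logging.getLogger('document_analyzer.tasks').debug('analysis started')

# 'documents' is not configured, so it falls back to the root logger's
# default WARNING level and this DEBUG message is dropped.
logging.getLogger('documents').debug('not shown')

print(captured.getvalue())
```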


Does anyone have any tips? Am I missing a step somewhere?

Matthias Löblich

Feb 17, 2018, 7:17:29 AM
to Mayan EDMS
Hi David,
you can find the document_analyzer result by going to the document's version page and selecting "Analyzer result" from the "Actions" menu of the relevant document version.




The analyzer result is not stored as metadata; it uses its own structure. You can build Mayan indexes based on the analyzer result.

For your example, you can build an index like this: {{ document.analyzer_value_of.Creator }}

br
Matthias

David Reagan

Feb 17, 2018, 11:24:34 AM
to mayan...@googlegroups.com
Thanks Matthias.

Now I know where to look.

When I read the docs the other day, I thought indexes seemed similar to a folder structure. Is that an OK way to think of them?

Is there a way to use document_analyzer to add tags, metadata, or properties?

For example, if I upload a receipt from Amazon.com, I'd like to add it to the "2018->Amazon" index, tag it with something pulled from the Items Ordered section, and add metadata that includes the total, billed date, ordered date, Amazon.com order number, and what card I used.
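Whether document_analyzer can write tags or metadata isn't answered in this thread. As a sketch of the extraction side only, the regexes below use one named group per field for some of the values mentioned (order number, ordered date, total). The sample text and the layout it implies are hypothetical, as is the 3-7-7 digit order number format; real receipts would need checking before relying on these patterns.

```python
import re

# Hypothetical receipt text; real Amazon receipts may be laid out differently.
sample = """Order Placed: February 10, 2018
Amazon.com order number: 111-2223334-5556667
Order Total: $42.17"""

# One named group per field. These patterns are illustrative assumptions,
# not tested against real receipts.
FIELD_PATTERNS = {
    "order_number": r"order number:\s*(?P<order_number>\d{3}-\d{7}-\d{7})",
    "ordered_date": r"Order Placed:\s*(?P<ordered_date>[A-Za-z]+ \d{1,2}, \d{4})",
    "total": r"Order Total:\s*\$(?P<total>\d+\.\d{2})",
}


def extract_fields(text):
    """Return {field: value} for every pattern that matches the text."""
    values = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            values[name] = match.group(name)
    return values


print(extract_fields(sample))
```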

--
- David Reagan

Alan

Apr 25, 2018, 9:18:23 PM
to Mayan EDMS
Bump