OCR and document names


Douglas Van Es

May 28, 2017, 12:30:01 PM
to mayan...@googlegroups.com
hello. just installed mayan via docker, it's up and running, and it looks
like it is going to work great.

i've read through the documentation, but i do have one question before i
continue the set up of users, document types, etc. and roll this out to
our users.

will i be able to use OCR to grab an invoice number from a scanned or
emailed document and have mayan name the document based on the results of
the OCR?

would that be set up as a transformation, or some other way?

i am basically looking to really minimize the workload on our clerks who
will be scanning the invoices into mayan.

thank you all for your time, and the project looks amazing by the way!

doug van es

Douglas Van Es

Jun 6, 2017, 7:39:37 PM
to mayan...@googlegroups.com
i've now set up a couple of users, a group, a role, some metadata,
watched and staging folders.

test document uploads are working great.

after reading through the docs and website, i still can't figure out how
to set up OCR to capture an invoice number and rename the document based
on the result.

can anyone tell me if this is possible with mayan? any hints on how to
implement it?

thanks in advance!

David Kornahrens

Jun 6, 2017, 8:13:08 PM
to Mayan EDMS
I'm currently trying to walk myself through the program as well.  We really see the potential here, but help doesn't come quick.  I'm interested in getting a support plan, but not if the support speed doesn't increase.

Roberto has answered a few questions, but it's more of a waiting game really.  I posted some issues in the GitLab repository, but nothing on that yet either.  Let me know if you figure it out; we are looking into the same thing.

Douglas Van Es

Jun 8, 2017, 4:54:09 PM
to mayan...@googlegroups.com

if i crack this or hear from anyone at mayan i'll be sure to let you know.

i'm in the same boat, if i can be sure mayan is going to work for us a
support plan is in our future as well.

Matthias Löblich

Jun 9, 2017, 4:42:16 AM
to Mayan EDMS
Hi,
I wrote an extension for Mayan called document_analyzer:

https://gitlab.com/mayan-edms/document_analyzer

The idea behind it is to analyze a document and store the result in a generic way (similar to the metadata structure). At the moment two "analyzers" are implemented: one reads the EXIF data, and one lets you configure regular expressions that are used to parse the OCR result of a document.
If you can write a regular expression that parses the invoice number (be aware that the OCR quality is very important!), you can use the extension to store the invoice number in a metadata-like structure. You can also configure a Mayan index on it.
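
As a rough illustration of the regular-expression approach, here is a standalone Python sketch. It is not the document_analyzer configuration itself: the pattern, the sample text, and the helper name `find_invoice_number` are all assumptions made up for this example, and a real pattern has to be adapted to the actual invoice layout and OCR quality.

```python
import re

# Hypothetical pattern: the word "invoice" (any case), an optional
# "No."/"Number"/"#", an optional ":" or "#", then an ID made of
# uppercase letters, digits and dashes that ends in a digit.
INVOICE_RE = re.compile(
    r"(?i:invoice\s*(?:no\.?|number|#)?)\s*[:#]?\s*([A-Z0-9][A-Z0-9-]*\d)"
)


def find_invoice_number(ocr_text):
    """Return the first invoice number found in the OCR text, or None."""
    match = INVOICE_RE.search(ocr_text)
    return match.group(1) if match else None


sample = "ACME Corp\nInvoice No: INV-2017-0042\nTotal due: 199.00"
print(find_invoice_number(sample))  # prints INV-2017-0042
```

Requiring the ID to end in a digit is a deliberate choice in this sketch: it keeps ordinary words that follow "invoice" from being picked up as a number.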

br
Matthias

Douglas Van Es

Jun 15, 2017, 1:18:38 PM
to mayan...@googlegroups.com
wow thank you matthias, this looks like it may work for me.

i have a couple of questions based on the docs at the gitlab site, and am
wondering if you could help me out with them. what would my mayan root
folder be on an install using docker? i've looked around /var/lib/docker
and can't quite figure out the correct place to create a link to
document_analyzer...

would it be something like this: /var/lib/docker/aufs/mnt/HASHEDNAME/usr/local/bin/ ?
i don't have an apps folder in there.

i've found local.py in /var/lib/docker/volumes/mayan_settings/_data/ and
so will be able to edit that file to include document_analyzer in the
list of installed apps, but can't find a /mymayanroot/apps folder.

will the migrations step shown on the git page be the same for a docker
install? eg: mayan-edms.py migrate ? i suppose i would execute that from
/var/lib/docker/aufs/mnt/HASHEDNAME/usr/local/bin/ right?

thank you for the help so far!

Matthias Löblich

Jun 19, 2017, 7:25:38 AM
to mayan...@googlegroups.com
Hi Douglas,
I have not done any stuff on Docker with the document_analyzer, but if I look into the mayan docker file:

https://gitlab.com/mayan-edms/mayan-edms-docker/blob/master/Dockerfile

It uses the ubuntu:16.04 image and installs Mayan with "RUN pip install mayan-edms==2.3", so I guess Mayan ends up in site-packages.

How to find the site-packages folder:

My laptop runs:
~$ lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 16.04.2 LTS
Release:    16.04
Codename:    xenial

Start python:
~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Run:
>>> import site; site.getsitepackages()
['/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']
>>>
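
A more direct check is to ask Python where a given package lives. The helper below is only a sketch (the name `package_dir` is made up for illustration). `json` is used as a stand-in so the example runs anywhere; inside a Mayan environment one would presumably pass "mayan" instead, and the "apps" folder under that directory would be the one the extension's instructions call /yourmayanroot/apps.

```python
import importlib
import os


def package_dir(package_name):
    """Return the directory that contains an importable package."""
    module = importlib.import_module(package_name)
    return os.path.dirname(os.path.abspath(module.__file__))


# "json" is only a placeholder for a package that is always available.
# With Mayan installed, package_dir("mayan") would be used instead.
print(os.path.join(package_dir("json"), "apps"))
```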


But this might be a good question for Roberto: how do you integrate an extension into the Mayan Docker image?


br
Matthias





Douglas Van Es

Jun 21, 2017, 3:42:30 PM
to mayan...@googlegroups.com
yes any tips on installing an extension into a mayan docker container
roberto?

thanks again matthias! i really appreciate the help

doug

Roberto Rosario

Jun 21, 2017, 3:52:25 PM
to Mayan EDMS
Check the section "Customizing the image" here: https://hub.docker.com/r/mayanedms/mayanedms/

It is not the easiest thing to do but it is the way Docker images are officially customized.

However, after the next version, I plan to work on finding ways to customize the image without having to rebuild a new image.
One idea I want to try is providing an environment variable called MAYAN_PIP_PACKAGES or similar that contains
a comma delimited list of packages to download and install from the web. The disadvantage of this approach is that 
the installed packages are not persistent and need to be downloaded and installed every time the image starts.

Also planning on trying something like MAYAN_APT_PACKAGES too to allow installing Ubuntu packages like extra 
OCR language packs at runtime.

Docker provides a command called "commit" which could be the answer to the non persistent issue. 

These are all untested ideas at the moment and for now the only official way to customize an image is the one provided in the link above. 
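
To make the MAYAN_PIP_PACKAGES idea concrete, here is a purely hypothetical Python sketch of what a container start-up script could do with such a variable. Nothing in it is part of the Mayan image: the variable is only a proposal at this point, and the function names are invented for illustration.

```python
import os
import subprocess
import sys


def parse_package_list(raw):
    """Split a comma-delimited string into clean package names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def install_packages(packages):
    """Install each package with pip, using the running interpreter."""
    for package in packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])


if __name__ == "__main__":
    # MAYAN_PIP_PACKAGES is the proposed (not yet existing) variable name.
    packages = parse_package_list(os.environ.get("MAYAN_PIP_PACKAGES", ""))
    install_packages(packages)
```

As noted above, anything installed this way would be lost when the container is recreated, so the install would run again on every start.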

Douglas Van Es

Jun 23, 2017, 9:37:25 AM
to mayan...@googlegroups.com
thank you for the response roberto. great work on mayan, it looks like an
amazing tool. i think it will fill my organization's requirements for edm
rather well, if i can pull an invoice name out of the scanned documents
using OCR and then name the document using the invoice name, or at a
minimum populate a metadata field.

it seems like matthias has created an extension that will fit the bill
for my OCR needs, but i am having a little difficulty finding my way
around the docker container's environment.

do i need to customize the image in this case? or can i just install
matthias' document_analyzer extension by placing it in mayan's root
folder?

i just need to know what paths to use in the instructions on the
extension's git site and quoted below:

> Installation
>
> clone the sources from gitlab to your local env.
>
> add a link from your mayan/apps folder to the document_analyzer folder:
> cd /yourmayanroot/apps
> ln -s /yourgitroot/document_analyzer/document_analyzer/ .
>
> In your settings/local.py file add document_analyzer to your INSTALLED_APPS list:
> INSTALLED_APPS += (
> 'document_analyzer',
> )
>
> Run the migrations for the app:
> mayan-edms.py migrate

i'm pretty sure local.py sits in /var/lib/docker/volumes/mayan_settings/_data/
and that i can make the mentioned changes there.

it's figuring out what to substitute for "yourmayanroot" that has me
stumped. i don't have an apps folder in
/var/lib/docker/volumes/mayan_settings/_data/

problem is there are duplicates of these files and folders sprinkled
around the image: in hashed folders at /var/lib/docker/aufs/mnt and so on.

thanks again for your time!

doug van es

Roberto Rosario

Jul 31, 2017, 6:54:35 PM
to Mayan EDMS
Hi David,

I created a support plan subscription for governments with a reduced price. Apart from the price, it offers the same benefits as the commercial plan. Details at the website: https://www.mayan-edms.com/providers/ 

I hope the answers and the support provided for free are a good introduction to the greater level of support you will receive for a paid plan.

Thank you.

Morgan Boyd

Jan 6, 2018, 1:30:54 PM
to Mayan EDMS
Hey Doug - I managed to get this configured with my Docker container.

Here's how I did it.

1: created a docker volume for "dms_apps"
2: mounted volume to Mayan container to a local path of /etc/mayan (my version is configured with the MySQL option, so this dir was empty)
3: created the symlink to /usr/local/lib/python2.7/dist-packages/mayan/apps
4: Ran mayan-edms.py migrate
5: Restart container (may take a bit longer to initialize)

After the restart, the Analyzers option is available within Setup.
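
Step 3 might look roughly like the following Python sketch. The paths in the comment are assumptions pieced together from this thread: the apps path is the one named in step 3, while the source path under /etc/mayan is only a guess at where the cloned extension would sit. `link_extension` is a made-up helper name.

```python
import os


def link_extension(source_dir, apps_dir):
    """Symlink an extension folder into the Mayan apps directory."""
    name = os.path.basename(source_dir.rstrip(os.sep))
    link_path = os.path.join(apps_dir, name)
    # Skip creation if the link is already there, so reruns are harmless.
    if not os.path.islink(link_path):
        os.symlink(source_dir, link_path)
    return link_path


# Assumed invocation inside the container (paths not verified):
# link_extension(
#     "/etc/mayan/document_analyzer/document_analyzer",
#     "/usr/local/lib/python2.7/dist-packages/mayan/apps",
# )
```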

On Friday, June 23, 2017 at 8:37:25 AM UTC-5, Douglas Van Es wrote:

Douglas Van Es

Jan 16, 2018, 12:15:55 PM
to mayan...@googlegroups.com
thanks for the little roadmap morgan!

it'll take a little more reading on my part, but i'll give it a go when i
get a little more free time!