This would be a specific directory containing only a specific type of file.
It seems like a good idea, but how would you handle metadata? I thought about using a script to automate uploads for specific documents, but I'm afraid that poorly documented files would render my collection useless in the end.
--
---
You received this message because you are subscribed to a topic in the Google Groups "Mayan EDMS" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mayan-edms/L2RnhallmnM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mayan-edms+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
This feature was actually started some time ago (https://github.com/mayan-edms/mayan-edms/blob/master/mayan/apps/sources/models.py#L194) but is not yet enabled because it depends on some scheduling updates that have not yet made it into the master branch.
As for metadata, I came up with some ideas, but none are implemented. One was to let users set default metadata values as well as a document type for each watch folder. Another idea was, when a document is being imported from a watch folder, to look for a file with the same name but with the .metadata extension. No design decision has been reached yet, so any ideas are welcome.
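To make the two ideas concrete, here is a minimal sketch of how they could be combined: per-folder defaults are applied first, and a sidecar file overrides them if present. Nothing here is Mayan EDMS code. The function name `load_metadata`, the naming convention (`invoice.pdf` paired with `invoice.pdf.metadata`) and the JSON format of the sidecar are all assumptions, since no design decision has been made.

```python
import json
from pathlib import Path


def load_metadata(document_path, folder_defaults=None):
    """Merge a watch folder's default metadata with an optional sidecar file.

    Hypothetical helper, not part of Mayan EDMS.
    """
    document_path = Path(document_path)
    # Start from the defaults configured for the watch folder (may be empty).
    metadata = dict(folder_defaults or {})

    # Assumed convention: invoice.pdf -> invoice.pdf.metadata
    sidecar = document_path.with_name(document_path.name + ".metadata")
    if sidecar.is_file():
        # Assumed format: a flat JSON object of metadata name -> value.
        # Values from the sidecar win over the folder defaults.
        metadata.update(json.loads(sidecar.read_text(encoding="utf-8")))
    return metadata
```

With this ordering, a folder can supply sensible defaults while a scanner or export script can still be specific per document by dropping a sidecar file next to it.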
* Roberto Rosario: " Re: [Mayan EDMS: 761] Automatic upload from certain
staging folder" (Wed, 30 Jul 2014 13:36:51 -0400):
> This feature was actually started some time ago (
> https://github.com/mayan-edms/mayan-edms/blob/master/mayan/apps/sources/models.py#L194)
> but is not yet enabled because it depends on some scheduling update that
> have not made it into the master branch.
>
> As for metadata, I came up with some ideas but none are implemented. One
> was to let users set default metadata values as well as document type for
> each watch folder. Another idea was when a document is being imported from
> a watch folder to look for a file with the same name but with the .metadata
> extension. No design decision has been reached yet so any ideas are
> welcomed.
Both possibilities could have their individual use cases, for which they fit
best. The most flexible approach is the second.
What I found when evaluating other DMS software:
- Inclusion of some identifier on the document (could be a barcode, or some
specially formatted string, or...). This identifier does not necessarily have
to be fixed on the document itself, but could be the first page of a scan or
a sheet of paper scanned together with the document. This method is best
suited to scanned documents.
- A fairly straightforward option is a kind of template recognition, where
templates define regions that are laid out in a fixed way. E.g. if a supplier
has a custom invoice format that shows the invoice number, date and amount at
fixed places, those regions could be put on such a template and the software
can check whether the document contains them.
Perhaps this could be used in a slightly modified but simpler form by defining
string patterns that are matched against the OCR result. That way, at least
recurring patterns could be used to extract metadata.
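As a rough sketch of the string-pattern idea: a set of named regular expressions is run over the OCR text, and the first match of each becomes a metadata value. The pattern names, the expressions themselves and the function `extract_metadata` are illustrative assumptions only; real invoices would need patterns tuned per supplier or document type.

```python
import re

# Hypothetical patterns; each has one capture group holding the value.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No\.?\s*:?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "amount": re.compile(r"Total\s*:?\s*([0-9]+(?:[.,][0-9]{2})?)", re.IGNORECASE),
}


def extract_metadata(ocr_text, patterns=PATTERNS):
    """Return a dict of metadata found in the OCR text.

    Only the first match of each pattern is used; patterns that do not
    match are simply left out of the result.
    """
    found = {}
    for name, pattern in patterns.items():
        match = pattern.search(ocr_text)
        if match:
            found[name] = match.group(1)
    return found
```

Fields that stay empty after this step are exactly the ones a person would still have to fill in by hand.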
In any case, I would find it useful to have a document queue containing
documents that are already processed (OCR available) but still need to be
completed with metadata. This is, so to speak, an inversion of the current
workflow (where metadata are defined first).
As already discussed (https://github.com/mayan-edms/mayan-edms/issues/9), I
think it would be best for the manual completion of metadata to have a view of
the document together with its OCR data available directly on the metadata form.
I am imagining a staging folder, from which the documents are processed
immediately. If after the initial processing no metadata are available
for the document, it is added to the postprocessing queue. When finally
(manually) processed, those documents are removed from the queue.
There should be some configuration options:
- Which metadata are required to be filled for a document to be able to leave
the queue?
- Should only documents missing the required metadata be added to the queue or
just all (if postprocessing control for all processed documents is desired)?
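To illustrate how the two configuration options could interact, here is a small in-memory sketch of such a queue. All names (`Document`, `PostprocessingQueue`, `required_fields`, `queue_all`) are made up for this example and do not correspond to existing Mayan EDMS models.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    metadata: dict = field(default_factory=dict)


@dataclass
class PostprocessingQueue:
    required_fields: tuple   # metadata needed before a document may leave the queue
    queue_all: bool = False  # True: every processed document is queued for review
    pending: list = field(default_factory=list)

    def missing(self, document):
        """List the required fields that are absent or empty on the document."""
        return [f for f in self.required_fields if not document.metadata.get(f)]

    def submit(self, document):
        """Call after initial processing; returns True if the document was queued."""
        if self.queue_all or self.missing(document):
            self.pending.append(document)
            return True
        return False

    def complete(self, document):
        """Remove a manually completed document; refuse while fields are missing."""
        missing = self.missing(document)
        if missing:
            raise ValueError("missing required metadata: " + ", ".join(missing))
        self.pending.remove(document)
```

The first option maps to `required_fields`, the second to `queue_all`; a document only leaves the queue once `missing()` returns an empty list.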
That is my brainstorming so far; comments are, as always, very welcome.
> > - Rather straightforward is a sort of recognition, where templates can be
> > defined containing regions formatted in an individual way. E.g. if you
> > have a
> > supplier with his custom invoice format displaying the invoice number,
> > date,
> > amount at fixed places, they could be used on such a template and the
> > software
> > can check, if the document contains such a region.
> >
>
>
> Regional OCR is a must have feature and usually a defining feature of the
> commercial offerings, I don't know how accurate OCRing a rectangle of text
> would be but if there is a need for the feature let's do it. I see some
> requirements, we need a way to let users mark/highlight the fields they
> want scanned and entered as metadata. This would require some design
> decisions (do we store the cursor's x and y positions of the square to be
> scanned or the x and y % in relation to the current zoom level)
The more agnostic of the zoom level, the better. So I would think x and y in
relation to X and Y (where X and Y are the dimensions of the whole page).
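A short sketch of what storing the region relative to the page could look like: the rectangle is saved as fractions of the page width and height, and converted back to pixels for whatever resolution the page is rendered at. The function names and the (left, top, right, bottom) tuple layout are assumptions for illustration.

```python
def to_relative(box, page_width, page_height):
    """Convert a pixel rectangle (left, top, right, bottom) to page fractions."""
    left, top, right, bottom = box
    return (
        left / page_width,
        top / page_height,
        right / page_width,
        bottom / page_height,
    )


def to_pixels(rel_box, page_width, page_height):
    """Convert page fractions back to whole pixels for a given rendering size."""
    left, top, right, bottom = rel_box
    return (
        round(left * page_width),
        round(top * page_height),
        round(right * page_width),
        round(bottom * page_height),
    )
```

A region marked on a 1000 x 500 pixel preview can then be applied unchanged to a 2000 x 1000 pixel rendering of the same page before it is handed to the OCR step.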
> and a rich client w/ corresponding API endpoints to talk to the backend.
Do you mean that a separate client is needed for that purpose?