scanningcabinet and pdfs?


Gina White

Dec 30, 2022, 8:30:39 PM
to per...@googlegroups.com
Hi,

I've been poking at the scanningcabinet app and think I want something similar to it, but with a few changes.

1. Add support for pdfs. Offhand, I think this would mean modifying the upload code to accept pdfs and likewise the display code, which I suspect is hard-coded to expect multiple pages with each page as an image.
2. Add support for a 'who' attribute which I imagine would be somewhat orthogonal to the tags attribute. In my view, 'who' would let one tag a document with the senders/receivers where the tags attribute would be more related to the type of document. So a document might be tagged 'phone, bill' with a who attribute of 'tmobile', for instance.

Would you be open to me modifying scanningcabinet to support these things?  If so, do you have any thoughts about the direction you would like me to go?

Alternatively, I could create a separate app, which could be plugged into devcam, or not. In that case I'd be somewhat tempted to write the separate app in Python, as I haven't really written Go in quite a while now.

- Gina

Ralph Corderoy

Dec 31, 2022, 5:37:55 AM
to per...@googlegroups.com
Hi Gina,

> Add support for a 'who' attribute which I imagine would be somewhat
> orthogonal to the tags attribute. In my view, 'who' would let one tag
> a document with the senders/receivers where the tags attribute would
> be more related to the type of document. So a document might be
> tagged 'phone, bill' with a who attribute of 'tmobile', for instance.

Does a compound tag like ‘who:tmobile’ give that function?
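
For illustration, here is a minimal Go sketch of how an app could recover a separate 'who' list from compound tags. The `who:` prefix convention and the `splitTags` helper are assumptions made for this example, not anything scanningcabinet or the Perkeep schema defines.

```go
package main

import (
	"fmt"
	"strings"
)

// whoPrefix is a hypothetical naming convention for compound tags.
const whoPrefix = "who:"

// splitTags separates "who:" compound tags from ordinary tags.
// A bare "who:" with no value is left among the ordinary tags.
func splitTags(all []string) (tags, who []string) {
	for _, t := range all {
		if strings.HasPrefix(t, whoPrefix) && len(t) > len(whoPrefix) {
			who = append(who, strings.TrimPrefix(t, whoPrefix))
		} else {
			tags = append(tags, t)
		}
	}
	return tags, who
}

func main() {
	tags, who := splitTags([]string{"phone", "bill", "who:tmobile"})
	fmt.Println(tags, who) // prints: [phone bill] [tmobile]
}
```

With this approach the UI could still show 'who' and 'tag' separately while the stored attribute stays a single list.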

--
Cheers, Ralph.

Gina White

Dec 31, 2022, 10:51:19 AM
to Ralph Corderoy, per...@googlegroups.com
Hi Ralph,

I think there are two contexts to consider:

1. In the context of the UI, I believe separating 'who' from 'tag' makes sense. In my current system for managing pdfs, I think of these two concepts differently, sometimes searching on one or on the other, depending. Likewise, I'll do 'tag gardening' a lot more aggressively than trying to manage all the possible who values, which are largely out of my control.

2. In the context of the schema, I don't really know. I lean towards a separate field there too, but I'm having trouble articulating why and maybe it is less important. I guess it mostly depends on how it would affect other apps. If other apps are using the tags attribute too, will it be a problem or a benefit if there are a few hundred 'who:' entries in there (in my case)?

FWIW, I'd be totally fine with naming the who attribute in the schema something along the lines of scanningcabinet_who or pdfcabinet_who or whatever. While I think some concept of 'who' could be useful to other apps (e.g. an email importer, the twitter importer, etc.), that probably balloons into a larger problem that ends with separate who permanodes with their own attributes like email address and twitter handle. And tags, lol.

- Gina
-- 
You received this message because you are subscribed to the Google Groups "Perkeep" group.
To unsubscribe from this group and stop receiving emails from it, send an email to perkeep+u...@googlegroups.com.


Gina White

Jan 2, 2023, 2:33:54 PM
to per...@googlegroups.com
I've since created PR #1643 (https://github.com/perkeep/perkeep/pull/1643), where I went with the 'creating a separate app' option (in Go).

In a PR comment, Micah asked if it would make sense to instead build the pdfcabinet upon/into scanningcabinet.  I feel like it is better to try and continue this discussion here rather than in the PR.

I guess I'd like to get a sense of whether anyone actually cares about scanningcabinet. I'm not sure, since it was sitting somewhat broken when I started playing with it recently (https://github.com/perkeep/perkeep/issues/1635).

If nobody cares about scanningcabinet, perhaps the best thing to do is to remove it.

If someone does care about scanningcabinet, how would you feel about merging pdf functionality into it? The primary difference, I think, is the 1:1 nature of pdfs and documents, where scanningcabinet expects multiple images (pages) per document.

That will show up in the UI, especially in the creation of documents.  In scanningcabinet, you are expected to select some images, then click the button to turn that into a document.  In pdfcabinet, you click the button associated with the pdf you want.  Each flow should ideally be optimized for the user, to make it easy to create documents quickly.

If we were to try to merge pdfcabinet/scanningcabinet together, I lean towards some kind of mode the user can set to decide how they want to create documents (selecting multiple items vs. a single item).

The display of documents will also be affected by this. scanningcabinet lays out multiple images (using img tags, I assume) where pdfcabinet uses an object to embed the pdf into the html page. This seems manageable. Just thinking out loud, but I lean towards marking the document permanode, at document creation time, with an attribute to indicate which kind it is.
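
As a rough sketch of that idea in Go: a document carries a kind marker and the display code switches on it. The `Document` type, the `Kind` field, and its values "images" and "pdf" are all hypothetical stand-ins for this example, not real scanningcabinet or pdfcabinet code or Perkeep attribute names.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Document is a simplified stand-in for a document permanode.
// Kind and its values are hypothetical, chosen only for this sketch.
type Document struct {
	Kind string   // "images" or "pdf"
	URLs []string // page image URLs, or exactly one pdf URL
}

// renderDocument picks the HTML layout based on the document's kind.
func renderDocument(d Document) (string, error) {
	switch d.Kind {
	case "images":
		// One img tag per scanned page.
		var b strings.Builder
		for _, u := range d.URLs {
			fmt.Fprintf(&b, "<img src=\"%s\">\n", html.EscapeString(u))
		}
		return b.String(), nil
	case "pdf":
		// A pdf document is 1:1 with its file, embedded via an object tag.
		if len(d.URLs) != 1 {
			return "", fmt.Errorf("pdf document needs exactly one URL, got %d", len(d.URLs))
		}
		return fmt.Sprintf("<object data=\"%s\" type=\"application/pdf\"></object>\n",
			html.EscapeString(d.URLs[0])), nil
	default:
		return "", fmt.Errorf("unknown document kind %q", d.Kind)
	}
}

func main() {
	out, err := renderDocument(Document{Kind: "pdf", URLs: []string{"/files/bill.pdf"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Setting the marker once at creation time means the display code never has to guess the kind from the attached files.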

Does anyone else have thoughts related to this?

- Gina

Ralph Corderoy

Jan 3, 2023, 2:13:55 AM
to per...@googlegroups.com
Hi Gina,

> Does anyone else have thoughts related to this?

I'm not a Perkeep user, but since you asked...

> The primary difference, I think, is the 1:1 nature of pdfs and
> documents, where scanningcabinet expects multiple images (pages) per
> document.

If I had a sheet-feeding scanner, I might stack fifty bits of paper in
it, come back to a single PDF, and then want to identify which pages
belong to which document: some are a single page, some a run of three.

I don't want to stack one, get a PDF, stack three, wait, get a PDF, ...
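
To make that concrete, here is a small Go sketch that turns per-document page counts into page ranges within one big scan. The `PageRange` type and `splitByLengths` function are invented for this example; it only does the bookkeeping and does not read or split an actual PDF.

```go
package main

import "fmt"

// PageRange is an inclusive, 1-based range of pages within the scanned PDF.
type PageRange struct{ First, Last int }

// splitByLengths converts per-document page counts into page ranges and
// checks that together they cover exactly the total number of scanned pages.
func splitByLengths(total int, lengths []int) ([]PageRange, error) {
	ranges := make([]PageRange, 0, len(lengths))
	next := 1 // first page of the next document
	for i, n := range lengths {
		if n < 1 {
			return nil, fmt.Errorf("document %d has invalid page count %d", i, n)
		}
		ranges = append(ranges, PageRange{First: next, Last: next + n - 1})
		next += n
	}
	if next-1 != total {
		return nil, fmt.Errorf("lengths cover %d pages, scan has %d", next-1, total)
	}
	return ranges, nil
}

func main() {
	// A five-page scan holding a single page, a run of three, and another single page.
	ranges, err := splitByLengths(5, []int{1, 3, 1})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ranges) // prints: [{1 1} {2 4} {5 5}]
}
```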

--
Cheers, Ralph.

Michael Hoffmann

Jan 3, 2023, 6:25:24 AM
to Perkeep
Hi,

By chance I was starting to check out scanningcabinet lately to organize my documents in some way (I have not yet done anything in production, though). PDFs would be more useful to me, I think. If you have a separate app and are going to dogfood it, that would be pretty cool too.

Michael

Gina White

Jan 3, 2023, 8:21:12 AM
to per...@googlegroups.com
FWIW, I scan everything at once, then later create pdfs out of the stacks using gscan2pdf. I like that it lets me rotate and OCR as part of the process. I also find that, as time goes on, more and more folks are emailing me pdfs instead of sending me paper to be scanned, so I need a system for managing pdfs.

Still, if anyone is using scanningcabinet, I don't want to impact their use negatively.

