Document Viewer Options for Archipelago: Unify to single format or preserve original when displaying? Or both?

Diego Pino

May 4, 2020, 10:33:36 AM
to archipelago commons
Good Morning folks,

It so happens that uploading documents in formats other than the famous (and infamous) PDF (and PDF/X, etc.) is an actual use case and need, also in our current reality! In old times (my times) people used to upload documents in PDF/A to ensure archival preservation, but right now there is a trend toward letting all media conversion happen in the browser (thank you, Google Docs, for not making your code available but making what you do the standard!). So, I was thinking:

A) I'm gonna ask people what they want?
B) I'm gonna propose some options
C) I'm gonna code

So for A): what is it that you want, people? What are your use cases? Do you want to upload DOC/DOCX/PPTX/PPT, etc., and:

Option 1: Allow users to see them as they were created (in their original form, or close to that... remember, many formats can be commercial/proprietary)
Option 2: Obscure the fact/format and unify into a single format (like HTML, e.g., a favorite of the self-publishing movement, or PDF? Hated by many/loved by the other 50%)
Option 3: Other? Both? None?

For B), I propose what can be done without becoming Google Inc. ourselves.

- 1. A viewer/formatter that uses the Google and Microsoft Office online viewers to display/render any proprietary MS Office format inline (iframe...). Tested, and it works if the document in Archipelago is publicly accessible.

- 2. A more generic local viewer that requires you to upload the same MS documents in their respective open standard formats (this can be done in MS Word, OpenOffice, etc.) and can render them online. Benefit: a unified viewer experience. Of course there could be edge cases and things we cannot control (like password-protected documents, WordStar or WordPerfect documents from 1991, etc.).

- 3. A post-processor that unifies file formats/standards. This, given the fact that it is a binary (and a super cool one named... wait for it... no, I won't share it here!.. oh well, all right, it's named pandoc), can read almost any format and output any format. From EPUB to Emacs. The question would be: unify to which one? I like the idea of rendering HTML directly. What is even more portable than that? But I can be convinced otherwise, especially if people care more about formatting/layout than about just having the content in a readable form.
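To make option 3 a bit more concrete, here is a minimal sketch of how a post-processor could call pandoc to produce an HTML derivative. The function names and the convention of writing the output next to the source file are assumptions made for illustration only; this is not how Archipelago or Strawberry Runners actually does it. Only the standard pandoc flags `-t` (target format), `-s` (standalone document) and `-o` (output file) are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(source: Path, target_format: str = "html") -> list[str]:
    """Build a pandoc command that converts `source` into `target_format`.

    Hypothetical helper: the output is written next to the source file, with
    the target format as its extension (a convention assumed for this sketch).
    """
    output = source.with_suffix(f".{target_format}")
    return [
        "pandoc",
        str(source),
        "-t", target_format,  # target format, e.g. html or epub
        "-s",                 # standalone: a complete document, not a fragment
        "-o", str(output),    # output file
    ]


def convert(source: Path, target_format: str = "html") -> Path:
    """Run pandoc on `source` and return the path of the converted file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(build_pandoc_command(source, target_format), check=True)
    return source.with_suffix(f".{target_format}")
```

Calling `convert(Path("report.docx"))` would then leave a `report.html` next to the original, which stays untouched for preservation.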

C) I will do 1 and 2, and will bring 3 into Strawberry Runners as an option. Want to help? Please!!

All this said, I have an additional question:

How would you like Archipelago to decide which of these options applies when someone hits a digital object page? Basically, when and how to use 1 or 2? Some automatic display? Based on the files that are present? Based on the RDF type? Like if it is of schema.org type Document?
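One possible answer to the question above, sketched as code: pick the viewer from the file extensions that are present, gated by the object's type and by whether the file is publicly reachable (which the external viewers from option 1 need). Everything here is hypothetical: the function names, the returned labels and the extension lists are made up for illustration, and the two embed URL patterns are the commonly used public endpoints as far as I know, not something confirmed in this thread.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Illustrative extension lists, not exhaustive.
PROPRIETARY_OFFICE = {".doc", ".docx", ".ppt", ".pptx"}
OPEN_DOCUMENT = {".odt", ".odp"}


def external_viewer_urls(public_file_url):
    """Return iframe-ready URLs for the external online viewers (option 1).

    The endpoint patterns are assumptions; check them before relying on them.
    """
    encoded = quote(public_file_url, safe="")
    return {
        "google": f"https://docs.google.com/viewer?url={encoded}&embedded=true",
        "office": f"https://view.officeapps.live.com/op/embed.aspx?src={encoded}",
    }


def choose_viewer(filenames, rdf_type="Document", publicly_accessible=False):
    """Pick a display strategy for a digital object (hypothetical labels)."""
    if rdf_type != "Document":
        return "default_display"   # not a document: leave it to other formatters
    extensions = {PurePosixPath(name).suffix.lower() for name in filenames}
    if extensions & OPEN_DOCUMENT:
        return "local_viewer"      # option 2: open-standard file rendered locally
    if extensions & PROPRIETARY_OFFICE and publicly_accessible:
        return "external_viewer"   # option 1: external viewer in an iframe
    if ".pdf" in extensions:
        return "pdf_viewer"
    return "download_only"         # nothing we can render inline
```

With these rules, an object that holds both `plan.odt` and `plan.docx` gets the local viewer, while a lone, non-public `plan.docx` falls back to a download link.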

If any of you have time for a comment here, or some ideas, I would really appreciate that. (Please reply to all: I get super nice replies that go directly into my inbox but that sadly nobody else sees. We are all here to learn and share; there are no wrong answers or reasons to be shy, really.)

Thanks a lot

Diego Pino
Metro.org

Nate Hill

May 4, 2020, 10:46:36 AM
to Diego Pino, archipelago commons
I've been thinking, based on the work that IMLS is doing right now with their COVID research partnership, that a repository for locally produced reopening plans and guidelines might be useful. This could ultimately feed whatever resource OCLC builds for that project. Alternately or additionally, it could harvest relevant information from there or any other resource. I could see a New York State version of this functioning as a nice prototype for a no frills document repository.

One key to getting folks to actually use a repository like this would be to make it as simple as using Drive, Onebox, Dropbox, whatever. Sign up, get a username, add your doc, minimal metadata, and go. Simple, constrained workflows.

--
You received this message because you are subscribed to the Google Groups "archipelago commons" group.
To unsubscribe from this group and stop receiving emails from it, send an email to archipelago-com...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/archipelago-commons/af793aa7-5ba4-4033-8997-c5ed74c918d7%40googlegroups.com.


--
Nate Hill
Executive Director
Metropolitan New York Library Council

Tim Spindler

May 4, 2020, 12:50:48 PM
to Nate Hill, Diego Pino, archipelago commons
Nate,

I think that is a cool idea about a COVID repository of opening plans.  It could be useful to show Archipelago capabilities.


Tim Spindler
Executive Director | Long Island Library Resources Council
tspi...@lilrc.org | 631-675-1570 x2000
http://www.lilrc.org



LILRC is a member of the Empire State Library Network (ESLN)



CSHL Library

May 11, 2020, 2:34:43 PM
to archipelago commons
At least for the MS formats, it would be really nice to be able to edit the documents stored in Archipelago and have the changed document uploaded as the next version. Perhaps implementing some sort of connector to Office 365. This would give Archipelago a little document management flavor.

-Tom

Diego Pino

May 19, 2020, 9:53:26 AM
to archipelago commons
Hi folks, thanks for your feedback. We are still processing (coding) some of the options we have, and are also thinking about building some 'smart' field formatters that should be able to deal with more file options and decide for you which viewer fits best. Tom, the Office 365 connector is something I have been looking at, but I also feel that allowing a deposited repository item to be edited inline is a more complex use case that we should research more. The number of moving parts (versions, encodings, provenance, etc.) that could be affected by allowing that is large, and I want to be sure we don't introduce too many setup steps and dependencies. It is in the scope of our project, of course, and I will keep it in a separate issue as a use case.

Best

Diego

Diego Pino

Jun 24, 2020, 9:54:04 AM
to archipelago commons
A tiny update/question regarding this:

I have been thinking about and testing this. I have been watching how some US institutions have been publishing their research/findings/COVID recommendations as PDFs. PDFs are an archivist favorite, but they are not really made for discovery or, wait for it, for accessibility. So basically they preserve (great, we want that), but they do not expose content in HTML, which is really what search engines, phone browsers, readers, voice readers, RSS, etc., love and are made for. So I would say that if going for a public unifying format is desired here, I will recommend anything that can be natively rendered as HTML. Of course, uploaded PDFs will still be viewable and downloadable (we already allow this).

Does anyone have strong feelings about this? Would anyone love to argue against it?

I feel i keep returning to 1996, when things were easier in some aspects and one of those aspects was actually pure HTM(L)

Also: i have these two ISSUEs open if anyone wants to chime in (good motivator to make use of your github account) 
And here (unveils secret room)

Cheers

Diego

Nate Hill

Jun 24, 2020, 9:59:04 AM
to Diego Pino, archipelago commons
Diego, the main argument I have heard against this was related to the NYC Internet Master Plan, a PDF.
There was a discussion of "why not HTML" or "why not an EPUB" for this document.
My understanding is that the concerns were related to consistency in layout and design, especially as related to printing it.
Not sure if this addresses your question....?

Diego Pino

Jun 24, 2020, 5:04:58 PM
to archipelago commons
Nate, I agree. Your answer addresses one of the angles of my question. There is a level of visual inconsistency when automagically transforming PDFs to, e.g., HTML. The printing issue you are touching on is a quite peculiar local problem/reality. Printing things is still very common in the US; actually, I have never before seen so many test prints, shredded pages and old printers in the garbage as here! Probably because keeping physical copies of tax and legal documents is required by law, but also because of a love for paper, like checks!!! So I do see that, in the context of this scenario (and I believe it is also true for archives), PDFs are OK and needed, and we will always preserve them. But I also believe that they cannot be the only source: they should hopefully serve as sources of 1:1 representations, but not as the main content of a repository. Accessibility is important. I also agree that design vs. readable content will clash many times; for a repo I feel we need both. And more metadata, of course.

Thanks