How to preserve newsletters in RSS feed


daniela_w...@subud.org

Aug 23, 2019, 11:48:27 AM
to AtoM Users
This is not a technical question but more about how to put certain types of documents on the AtoM database.

I am currently an archivist for a non-profit organization. My organization has newsletters from various wings of our organization and national newsletters from our member countries. I have collected these newsletters for many years in paper and PDF format and put them on our AtoM database. Now our members are publishing their newsletters as RSS feeds. I need to find a way to continue to collect, preserve, and provide access to these newsletters, but I don't know how to preserve publications distributed by RSS feed. Does anyone have experience capturing RSS feed newsletters as PDFs?


Thank you for your help,


Daniela Moneta, MLIS

World Subud Association

Dan Gillean

Aug 26, 2019, 11:13:45 AM
to ICA-AtoM Users
Hi Daniela, 

Tricky question! I'm not personally very experienced with web crawling (so I could be wrong about some of the following!), but essentially that's what I believe would be required to capture this content. 

RSS is essentially an XML-based format that is used to expose web content to aggregators (such as RSS readers). It passes a bit of metadata about the content being shared, which generally remains in HTML form. Are they still sharing the newsletter in PDF form via the RSS feed, or is the newsletter now fully HTML and shared via RSS?

If the former, then I believe the source PDFs should still be discoverable on their website. RSS is just a way of automating the sharing of website updates; it won't change the content from HTML to PDF on its own. 
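One way to check which case applies is to look at the feed itself. The following is a minimal sketch in Python, assuming the feed is RSS 2.0 and that any attached PDF is declared with an <enclosure> element of type application/pdf. The sample feed, its URLs, and the function name list_items are invented for illustration; a real feed may be structured differently (for example Atom rather than RSS).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be downloaded from the publisher's site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Newsletter</title>
    <item>
      <title>Issue 12</title>
      <link>https://example.org/newsletter/issue-12</link>
      <pubDate>Mon, 05 Aug 2019 09:00:00 GMT</pubDate>
      <enclosure url="https://example.org/files/issue-12.pdf"
                 length="123456" type="application/pdf"/>
    </item>
    <item>
      <title>Issue 13</title>
      <link>https://example.org/newsletter/issue-13</link>
      <pubDate>Mon, 19 Aug 2019 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def list_items(feed_xml):
    """Return one dict per <item>, noting any PDF enclosure it carries."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        pdf_url = None
        # An item may carry attached files as <enclosure> elements.
        for enclosure in item.findall("enclosure"):
            if enclosure.get("type") == "application/pdf":
                pdf_url = enclosure.get("url")
                break
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
            "pdf_url": pdf_url,  # None means the item only links to a web page
        })
    return items


if __name__ == "__main__":
    for entry in list_items(SAMPLE_FEED):
        kind = "PDF attached" if entry["pdf_url"] else "HTML page only"
        print(entry["published"], "|", entry["title"], "|", kind)
```

Items with a pdf_url can be downloaded and uploaded to AtoM as before; items without one fall into the HTML-only case.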

If, however, they have switched to using HTML for the newsletter content and are using RSS to share that, then it's trickier. At that point, I think you need to look into web crawling tools, such as webrecorder.io, wget, Heritrix, or many others out there, if you want to automate saving the content directly from the RSS feed. Either way, the format you'll get (such as a WARC file, or JSON/XML/CSV, depending on the tool and the settings) will not be the PDF you expect, and if you upload these files to AtoM, it won't be able to generate a thumbnail for them, and users will need to download them locally to view. 
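To give a rough idea of what automating this could look like, here is a small Python sketch that saves the page behind each feed item as a local HTML file. It is not a replacement for a real crawler: it stores only the raw HTML (no images or stylesheets) and does not produce a WARC file. The function names, the item dictionaries, and the output folder are all hypothetical.

```python
import re
import urllib.request
from pathlib import Path


def slugify(text):
    """Turn a title into a safe file name stem, e.g. 'Issue 13!' -> 'issue-13'."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "untitled"


def fetch_bytes(url):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_pages(items, out_dir, fetch=fetch_bytes):
    """Save the page behind each item's link as <title-slug>.html in out_dir.

    `items` is assumed to be a list of dicts with 'title' and 'link' keys.
    `fetch` can be swapped out, which makes it possible to try this offline.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for item in items:
        path = out / (slugify(item["title"]) + ".html")
        path.write_bytes(fetch(item["link"]))
        saved.append(path)
    return saved


if __name__ == "__main__":
    # Hypothetical item; in practice these would come from parsing the feed.
    demo_items = [
        {"title": "Issue 13", "link": "https://example.org/newsletter/issue-13"},
    ]
    for saved_path in save_pages(demo_items, "newsletter_pages"):
        print("saved", saved_path)
```

The saved HTML files have the same limitation noted above: AtoM will not generate a thumbnail for them, so converting each page to PDF may still be the more practical route for access copies.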

If you're willing to do some manual work, and have a PDF creator (such as Acrobat, NitroPDF, etc), then you could manually find the Newsletter web page, and save it locally as a PDF. It might not have the same look as previous newsletters that were formatted specifically for PDF, but it would be something you could upload to AtoM as before. 

Good luck! Hopefully someone else on the list with more knowledge than I on this topic might have other suggestions! 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

