Is there a way to scrape a web page and put it in a tiddler?

Andy Pastuszak

Aug 1, 2016, 2:55:49 PM
to TiddlyWiki
As the title says, is there a way to scrape a web page and throw all its contents (including images) into a tiddler?

Marcus Baw

Aug 1, 2016, 3:32:41 PM
to TiddlyWiki
+1 for this functionality. I would be willing to pay for a browser extension that did exactly this. It would be similar to the Evernote Web Clipper, which is now about the only thing that's good about Evernote... :-|

SteveSchneider DesignWriteX

Aug 1, 2016, 5:07:07 PM
to TiddlyWiki
The best Firefox extension that captures HTML and rewrites links to images as local ones is Scrapbook:

https://addons.mozilla.org/en-US/firefox/addon/scrapbook/

It's old and no longer updated.

But it saves nicely archived local HTML versions of web pages, each with a stable URL based on a timestamp. Your repository can be served via Dropbox or some other way, so you have a link to a stable archived web page.

Use an iframe, or explore importing into TW. This is the best I've been able to do in many years of searching.

I have some macros left over from my TWC (TiddlyWiki Classic) days if you want to see the possibilities. You might also explore an old implementation from around 2012, a "Lightweight Web Archiving" tool, which has some instructions and documentation to get you started: https://drive.google.com/open?id=0B6pMEe8dCtrQYTJmOTkzMzMtYjdlZS00ZDczLWIyNjItNzBkYjc3OTE4NTgy

Let me know if this is at all helpful,

//steve.

RichardWilliamSmith

Aug 1, 2016, 6:47:13 PM
to TiddlyWiki
Hi All,

If you haven't seen it, you need to look at Tiddlyclip - http://tiddlyclip.tiddlyspot.com/ - it's a TW plugin + browser extension that lets you define 'clipping', 'snipping' or 'pinning' type actions to ingest web content into TW.

It is very customisable but lacks a polished interface: you have to get in and edit the config tiddlers. It can copy plain text, HTML, tiddlers, images, etc. and format them to suit your needs (e.g. it is very useful to have an "attribution" field pointing to the original source of the material).

Regards,
Richard

Uwe

Aug 12, 2018, 3:36:49 AM
to TiddlyWiki
Hi,

Sadly, none of the solutions described here for copying and pasting websites into a tiddler in TiddlyWiki work anymore.

Not compatible:


My workaround at the moment:
  1. Select the desired area on the website.
  2. Right-click in Firefox and choose "Inspect Element (Q)" ("Element untersuchen (Q)" in the German UI).
  3. In the inspector, select the line above the desired text (difficult to describe).
  4. Right-click it and choose Copy > "Outer HTML" ("äußeres HTML" in the German UI).
  5. Paste the code into your tiddler.
  6. At the top, add the source website as <base href="http://thewebsite.de/"/> so that images from the website are displayed and links work.
  7. Close the tiddler and save. Done.
(Yes, this is no fun for hundreds of tiddlers.)
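
Steps 5 and 6 are the part that lends itself to scripting when there are many clippings. The following is only a minimal Python sketch of that idea, not part of any TiddlyWiki tool: the function name `add_base_href` is made up for illustration, and it assumes you already have the copied outer HTML as a string.

```python
from html import escape


def add_base_href(fragment: str, source_url: str) -> str:
    """Return the copied HTML with a <base> tag placed in front of it.

    `fragment` is the outer HTML copied from the browser inspector and
    `source_url` is the address of the page it came from. Both names are
    illustrative assumptions, not an existing API.
    """
    # Escape the URL so characters such as quotes or "&" cannot break the attribute.
    base_tag = f'<base href="{escape(source_url, quote=True)}"/>'
    # The <base> tag goes first, as in step 6, so relative links and
    # image paths in the fragment resolve against the original site.
    return base_tag + "\n" + fragment
```

For example, `add_base_href('<img src="/logo.png">', "http://thewebsite.de/")` returns text that can be pasted straight into the tiddler body.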

Two notebook tools use a web clipper: Joplin with the "Joplin web clipper" and OneNote with the "OneNote web clipper". The best thing would be an update for these "TiddlyWiki web clippers", but perhaps in the meantime there are new solutions out there?

Greetings,

Uwe

Ton Gerner

Aug 12, 2018, 3:55:33 AM
to TiddlyWiki
Hi Uwe,

There is a new version (prerelease) of TiddlyClip: https://github.com/buggyj/tiddlyclip/releases

Cheers,

Ton

TonyM

Aug 12, 2018, 4:21:03 AM
to TiddlyWiki
Andy,

It can depend on what you want to achieve, on the quality of the source, and on whether the site has actively tried to stop you copying its page.
Is it reference material? Is it code you want? Are you after its media? Do you want to direct people to the information within a page? Do you want a thumbnail preview and a link to the source?
A lot of web pages fetch their data from a back-end server and won't give you that data without effort.
  • Out of the box you can copy HTML, paste it into a tiddler and set the type to text/html.
  • You can use browser tools to copy as text and in other forms.
  • Online wikis can use iframes, and even offline ones can if you save the page to your local system.
  • You can print a browser page to PDF and drop it on your wiki (or link to it). I have a tool that does this and can retain hyperlinks.
  • You can view the source and copy it into a tiddler, as a lot of HTML works in tiddlers.
  • You can take screenshots and drop them into TiddlyWiki.
And if you still need to programmatically scrape data from a site, consider using another tool to do so, then import what you want into your wiki.
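
As a rough illustration of that "scrape elsewhere, then import" route, here is a minimal Python sketch using only the standard library. It fetches a page, wraps the HTML in a tiddler with the type text/html, and produces JSON that can be saved to a file and dropped onto a wiki for import. The function names and the `source` field are assumptions made up for this example, and images are not downloaded; they stay as links to the original site.

```python
import json
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset the server reports."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def make_tiddler(title: str, html: str, source_url: str) -> dict:
    """Build one tiddler as a dict of fields.

    "title", "type" and "text" are standard tiddler fields; "source" is a
    custom field name chosen here to record where the material came from.
    """
    return {"title": title, "type": "text/html", "text": html, "source": source_url}


def tiddlers_to_json(tiddlers: list) -> str:
    """Serialise tiddlers as a JSON array, ready to save as a .json file."""
    return json.dumps(tiddlers, ensure_ascii=False, indent=2)


# Example usage (performs a network request, so it is left commented out):
# html = fetch_html("http://thewebsite.de/")
# with open("clipped.json", "w", encoding="utf-8") as f:
#     f.write(tiddlers_to_json([make_tiddler("Clipped page", html, "http://thewebsite.de/")]))
```

Sites that load their content from a back-end server after the page opens will return little useful HTML this way, so a tool that runs a real browser may still be needed for those.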


TiddlyWiki supports most web standards well. Depending on your needs you can spend more or less time capturing web resources, but there are plenty of tools for this, even outside TiddlyWiki, that may help you extract exactly what you want to put in your wiki.

Regards
Tony