Import an article from Wikipedia

Diego Mesa

Jun 7, 2018, 9:41:53 AM
to TiddlyWiki
Hello all,

I've come across some previous posts about this topic, but those were about importing the entire Wikipedia database (awesome!). I want a simple way to import just one article, or one section of an article. Right now I copy/paste and then fix up the wiki markup myself in a text editor (KaTeX markup, bold, etc.).

Does anyone know of a better way? Or has anyone written any personal scripts for this that they wouldn't mind sharing?

Thanks!

Thomas Elmiger

Jun 7, 2018, 2:30:50 PM
to TiddlyWiki
Hi Diego,

When I copy text from websites like Wikipedia I copy HTML.
For this I inspect the beginning of the part I want to copy using the browser’s developer tools, which I open from the right mouse button menu.
I try to locate the tag that contains the desired content, (right) click it and choose "Copy outer HTML", which should copy everything including the surrounding tag.
I paste that into a tiddler and already have basic formatting.

Sometimes I even imitate some CSS of the source to re-implement e.g. boxes with background colour.

As my HTML and CSS understanding is sufficient for most parts of the web I find this process very efficient – surely it isn’t for everyone.

Cheers,
Thomas

Diego Mesa

Aug 4, 2020, 5:37:23 PM
to TiddlyWiki
Thanks for your answer Thomas.

I just want to revive this thread in case anyone else has had some luck importing the actual Wikipedia/MediaWiki markup syntax.
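
For a sense of how much of the hand editing could be automated, here is a minimal sketch that converts a small subset of MediaWiki markup to TiddlyWiki wikitext. The function name is hypothetical, only headings, bold, italic, piped links and `<math>` tags are handled, and the `$$...$$` output assumes the KaTeX plugin is installed. Templates, tables, references and combined bold-italic are not covered.

```python
import re


def mediawiki_to_tiddlywiki(text: str) -> str:
    """Convert a small subset of MediaWiki markup to TiddlyWiki wikitext."""

    # Headings: "== Title ==" becomes "!! Title" (one "!" per "=").
    def heading(match: re.Match) -> str:
        return "!" * len(match.group(1)) + " " + match.group(2)

    text = re.sub(
        r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", heading, text, flags=re.MULTILINE
    )

    # Bold and italic are handled in one pass, so the '' produced for
    # bold is never mistaken for MediaWiki italic markers afterwards.
    def emphasis(match: re.Match) -> str:
        if match.group(1) is not None:
            return "''" + match.group(1) + "''"  # '''bold''' -> ''bold''
        return "//" + match.group(2) + "//"      # ''italic'' -> //italic//

    text = re.sub(r"'''(.+?)'''|''(.+?)''", emphasis, text)

    # Piped links swap order: [[Target|label]] -> [[label|Target]].
    text = re.sub(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]", r"[[\2|\1]]", text)

    # Math: <math>...</math> -> $$...$$ (assumes the KaTeX plugin).
    text = re.sub(
        r"<math>(.*?)</math>",
        lambda m: "$$" + m.group(1) + "$$",
        text,
        flags=re.DOTALL,
    )
    return text


sample = "== History ==\n'''Pandoc''' is a ''converter''."
print(mediawiki_to_tiddlywiki(sample))
```

Plain `[[Target]]` links are left alone, since that form means the same thing in both syntaxes.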

bimlas

Aug 10, 2020, 8:03:36 AM
to TiddlyWiki
Diego,

To preserve references (e.g. web pages) in their entirety in TiddlyWiki, you have to import not only the copied text but also the images in them, and convert the formatting to TiddlyWiki syntax. It's easier to export the web page to a separate file instead.

The list of possible document formats is relatively long, but TiddlyWiki supports only a few of them, with PDF and EPUB being the most preferable in my opinion.
  • The file can be imported into the single HTML wiki
    • When you download the wiki, all images and documents are downloaded along with it
    • If you create a password-protected wiki, the files in it will also be encrypted
    • The larger the HTML size, the slower it will load and save over the web
      • A document or image makes an HTML file as large as adding hundreds of regular tiddlers
      • Convert images to another format, resize them, delete metadata to reduce their size to a minimum
      • It is a good idea to store documents in EPUB format instead of PDF, as this is the smallest size
        • In this case, you need external software to view the documents
  • You can also store the file separately
    • The size of the single HTML will be much smaller, so loading and saving will be faster
      • You can link to a virtually unlimited number of images and documents, with a minimal increase in wiki size
    • Since the files are handled separately, it might be worth considering switching to TiddlyWiki on Node.js: with a single HTML file, the files are uploaded through an interface independent of TiddlyWiki, whereas on Node.js TiddlyWiki itself takes care of saving them to separate files
    • If the files are missing, the wiki will be incomplete and the links to them will be broken; because of the relative addressing, it is not even possible to view them online
    • Since the size of the files is no longer a problem, it's a good idea to store the documents as PDF, because you can then view them directly in the wiki without downloading each file and opening it in separate software for that format
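
The "store the file separately" option can be sketched roughly as follows. This assumes TiddlyWiki's `_canonical_uri` field is used to point a tiddler at an external file; the helper name, tiddler title and `files/` path are hypothetical.

```python
def external_file_tiddler(
    title: str, uri: str, mime_type: str = "application/pdf"
) -> str:
    """Return the contents of a .tid file whose body lives in an external file."""
    fields = {
        "title": title,
        "type": mime_type,
        # Relative path or URL of the file kept outside the wiki (assumed layout).
        "_canonical_uri": uri,
    }
    header = "\n".join(f"{name}: {value}" for name, value in fields.items())
    # A .tid file is a block of "field: value" lines, a blank line,
    # then the text field, which stays empty for an external file.
    return header + "\n\n"


print(external_file_tiddler("Pandoc article (PDF)", "files/pandoc.pdf"))
```

The wiki then stores only these few fields, while the PDF itself stays in the `files/` folder next to it.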

bimlas

Aug 10, 2020, 8:08:48 AM
to TiddlyWiki
Forgot to mention how to save a webpage as a PDF: https://www.digitaltrends.com/computing/how-to-save-a-webpage-as-a-pdf/

Diego Mesa

Aug 10, 2020, 11:22:30 AM
to TiddlyWiki
Thanks for the response bimlas!

I'm not so much interested in completeness as in being able to edit my local copies of Wikipedia articles. I still think integrating TW into pandoc is the way to go.
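
As a rough sketch of the Pandoc route, assuming `pandoc` is installed and on the PATH: Pandoc can read MediaWiki markup (`--from mediawiki`) and write GitHub-flavoured Markdown (`--to gfm`), and the result could then be pasted into a Markdown tiddler. The helper names and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def pandoc_command(source: str, target: str) -> List[str]:
    """Build the argument list for a MediaWiki -> Markdown conversion."""
    return [
        "pandoc",
        "--from", "mediawiki",
        "--to", "gfm",
        "--output", target,
        source,
    ]


def convert(source: str, target: str) -> bool:
    """Run pandoc if it is available; return False when it is not installed."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_command(source, target), check=True)
    return True
```

With the raw wikitext of an article saved as `article.wiki`, calling `convert("article.wiki", "article.md")` would write the Markdown version next to it.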

Mark S.

Aug 10, 2020, 12:34:57 PM
to TiddlyWiki
With 5.1.22 the support for Markdown is greatly improved and allows some Wikitext.

The problem with Wikipedia is that it uses a lot of floating boxes and complicated CSS to get its look. Getting that look into TW wouldn't be easy, no matter what approach you used. But grabbing the essence of a page isn't too hard.

Using the extension copycat, you can copy the markdown text of an article, and paste it into a markdown tiddler. It's fast, and editable, and communicates the main features of the original page. The styling with floating images and boxes, that's going to be different.

There seems to be a lot of denial in the TW community re the status of Wikitext vs. Markdown. These conversations remind me (showing my age) of all those people who maintained that Beta was better than VHS. It doesn't matter which is better; it matters which is supported and maintainable. Markdown has won, for better or worse. There is lots of support for it everywhere. TW wikitext is only supported inside of the TW project.

Joshua Fontany

Aug 10, 2020, 6:47:46 PM
to TiddlyWiki
https://github.com/jeffrey4l/pandoc-addons/blob/master/md2tid.lua 


https://github.com/tonywoode/unslack/blob/master/tiddlyToMd.js

Found these scripts on GitHub. If I poke around, I might be able to make a button that converts a type:text/markdown tiddler to a type:text/vnd.tiddlywiki tiddler (and/or vice versa). I am also aiming to import Wikipedia articles in order to re-mark them up with internal TW links.
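
The core of such a Markdown-to-wikitext conversion could look something like the sketch below. It is not taken from either of the linked scripts; the function name is hypothetical, and only headings, bold, italic, inline links and top-level lists are handled.

```python
import re


def markdown_to_tiddlywiki(text: str) -> str:
    """Convert a small subset of Markdown to TiddlyWiki wikitext."""

    # Headings first, so the "#" later produced for numbered lists
    # is never read as a Markdown heading.
    def heading(match: re.Match) -> str:
        return "!" * len(match.group(1)) + " " + match.group(2)

    text = re.sub(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", heading, text, flags=re.MULTILINE)

    # Bold and italic in a single pass: **bold** -> ''bold'', *italic* -> //italic//.
    # "* " at the start of a bullet is skipped because a space follows the star.
    def emphasis(match: re.Match) -> str:
        if match.group(1) is not None:
            return "''" + match.group(1) + "''"
        return "//" + match.group(2) + "//"

    text = re.sub(r"\*\*(.+?)\*\*|\*(?!\s)(.+?)\*", emphasis, text)

    # Inline links: [text](url) -> [[text|url]]; images (![alt](url)) are left alone.
    text = re.sub(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)", r"[[\1|\2]]", text)

    # Top-level lists: "- item" -> "* item", "1. item" -> "# item".
    text = re.sub(r"^[-+][ \t]+", "* ", text, flags=re.MULTILINE)
    text = re.sub(r"^\d+\.[ \t]+", "# ", text, flags=re.MULTILINE)
    return text


print(markdown_to_tiddlywiki("## Notes\n\n- **bold** and *italic*"))
```

A button inside TiddlyWiki would have to do the equivalent in JavaScript and also switch the tiddler's type field; this only illustrates the text transformation.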

Best,
Joshua Fontany

TW Tones

Aug 10, 2020, 7:53:21 PM
to TiddlyWiki
Folks,

Given that Wikipedia goes beyond a single edit field and constructs boxes, tabs and sections, it would make sense to build a tiddler view template for this purpose. Add to that some features to copy the result to the clipboard.

Tiddlywiki's templating options are mature and feature rich. 

You could make a template that looks almost identical to a Wikipedia page, at least the content view, and then you could have a "local to tiddlywiki talk or notes tab".

Another approach I use, as Thomas points out, is to copy a selection as HTML and paste the HTML into a tiddler. One advantage is that the links can be retained if you set a value for the domain that the relative references refer to. The HTML can also be parsed, and the HTML tags used to extract data programmatically.
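
One way to keep the links working is to rewrite the relative references against the page they were copied from before pasting. A minimal sketch, assuming the copied HTML uses quoted `href`/`src` attributes; the function name and the example article URL are hypothetical.

```python
import re
from urllib.parse import urljoin


def absolutize_links(html: str, base_url: str) -> str:
    """Rewrite relative href/src values so they resolve against base_url."""

    def fix(match: re.Match) -> str:
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        return f"{attr}={quote}{urljoin(base_url, url)}{quote}"

    return re.sub(r'\b(href|src)=(["\'])(.*?)\2', fix, html)


copied = '<a href="/wiki/Pandoc">Pandoc</a>'
print(absolutize_links(copied, "https://en.wikipedia.org/wiki/TiddlyWiki"))
```

Passing the full article URL rather than just the domain means that in-page anchors such as `#History` also end up pointing at the right article.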

Regards
Tony

Mark S.

Aug 10, 2020, 11:10:06 PM
to TiddlyWiki
Once upon a time I made a Lua filter for conversion:


But Pandoc itself uses Haskell (why do they use a separate language for their filters?). So this is not the same as having a native pandoc converter.

Another approach is to convert HTML to wikitext, as done at this site:
