[TW5] Custom import of (large) XML file and creating tiddlers from its contents


TheDiveO

Feb 23, 2017, 2:32:49 PM
to TiddlyWikiDev
Hopefully, someone can shed some more light on whether this is feasible in TW5, or whether I should rather go for a stand-alone converter tool...

I would like to be able to drop an XML file, which conforms to a certain schema, onto a TW5 instance. It should then be parsed and tiddlers created or updated based on the XML file's contents.

  1. Is it possible to write custom "import modules"? If so, where can I find examples, such as existing import modules?
  2. My XML file is going to be huge, several MB in size, and it would result in around 10,000 small tiddlers. Is parsing a file of that size feasible inside TW5?
Best regards,
TheDiveO

David Szego

Feb 23, 2017, 9:03:55 PM
to TiddlyWikiDev

I wrote a plugin to make an AJAX call to a proxy server, send username/password/address (and some other parameters), and parse back a JSON response of messages in an IMAP mailbox....

$:/plugins/Cardo/macros/loadJSONbyAJAX.js

which can be found at:

http://cardo.wiki/#%24%3A%2Fplugins%2FCardo%2Fmacros%2FloadJSONbyAJAX.js:%24%3A%2Fplugins%2FCardo%2Fmacros%2FloadJSONbyAJAX.js

It's probably a good enough base for what you're trying to do.

Also, there are the tm-import-tiddlers and tm-perform-import messages in the TW core, which you could base an importer on.

http://tiddlywiki.com/upgrade.html itself seems to be a version of what you're looking to do, in a sense.

Jeremy Ruston

Feb 24, 2017, 3:34:00 AM
to tiddly...@googlegroups.com
Hi TheDiveO

> I would like to be able to drop an XML file, which conforms to a certain schema, onto a TW5 instance. It should then be parsed and tiddlers created or updated based on the XML file's contents.

The new bibtex plugin does exactly that:


It uses a third party library to actually parse the bibtex.

The main gotcha is that the deserializer is chosen based on the extension of the incoming file; there’s no way to give the user a choice over how an incoming XML file, say, should be interpreted. So you’ll have to adopt a file extension for your XML files.

> 1. Is it possible to write custom "import modules"? If so, where can I find examples, such as existing import modules?
“Deserializers” are the modules responsible for extracting tiddlers from a typed block of text.
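
To make the idea concrete, here is a minimal sketch of what such a deserializer module might look like. It assumes a hypothetical content type "application/x-rfc-index" and a hypothetical XML layout of <rfc-entry> elements with <doc-id> and <title> children; neither comes from Jeremy's reply. The regex-based extraction is only a stand-in for a real XML parser, chosen so the sketch runs without any library.

```javascript
"use strict";

// Hypothetical content type for the custom XML format (an assumption).
const RFC_INDEX_TYPE = "application/x-rfc-index";

// Decode the five predefined XML entities in a single pass.
function decodeEntities(s) {
  const table = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };
  return s.replace(/&(lt|gt|amp|quot|apos);/g, (whole, name) => table[name]);
}

// Text of the first <tag>...</tag> inside `xml`, or "" if the tag is absent.
function childText(xml, tag) {
  const m = new RegExp("<" + tag + ">([\\s\\S]*?)</" + tag + ">").exec(xml);
  return m ? decodeEntities(m[1].trim()) : "";
}

// A deserializer receives the raw text plus caller-supplied default fields
// (unused here) and returns an array of tiddler field objects.
function deserializeRfcIndex(text, fields) {
  const tiddlers = [];
  const entryPattern = /<rfc-entry>([\s\S]*?)<\/rfc-entry>/g;
  let match;
  while ((match = entryPattern.exec(text)) !== null) {
    const docId = childText(match[1], "doc-id");
    if (!docId) continue; // skip entries without an identifier
    tiddlers.push({
      title: docId,
      text: childText(match[1], "title"),
      type: "text/vnd.tiddlywiki"
    });
  }
  return tiddlers;
}

// TW5 hands each module an `exports` object; for a module of module-type
// "tiddlerdeserializer" the key is the content type the function handles.
if (typeof exports !== "undefined") {
  exports[RFC_INDEX_TYPE] = deserializeRfcIndex;
}
```

Inside a wiki, this code would live in a tiddler of type "application/javascript" with module-type "tiddlerdeserializer"; a real implementation would probably hand the text to a proper XML parser instead of regular expressions.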

> 2. My XML file is going to be huge, several MB in size, and it would result in around 10,000 small tiddlers. Feasible inside TW5 as to parsing?
Interesting. On another project I have been experimenting with importing large files and been somewhat surprised by how gracefully browsers cope with large amounts of data. But working under Node.js is much more reliable.

Best wishes

Jeremy

TheDiveO

Feb 24, 2017, 12:55:04 PM
to TiddlyWikiDev
Jeremy,

thank you very much for the pointers!

While I can perfectly understand why the current architecture chooses the deserializer based on the extension of the incoming file, this quickly hits a dead end for generic file formats such as XML/.xml, which lack a clearly differentiating file extension. Of course, a quick workaround would be to rename the import file. But the next time I or someone else imports an updated version of the XML file, the question comes up again: which file extension do I need to use?

Also, over time I suspect that we may see more .xml files with different XML schemas.

Do you see a way to allow multiple XML deserializers, based on the root element -- or maybe just based on the beginning of the XML file to be imported? Kind of a mime-type detection, just on an XML file signature?

Best regards,
TheDiveO

TheDiveO

Feb 24, 2017, 1:03:50 PM
to TiddlyWikiDev
Another question regarding XML parsing: are there any incremental XML parser libraries for JavaScript? Or would it be better to use the built-in XML parser?

Jeremy Ruston

Feb 24, 2017, 1:14:10 PM
to TiddlyWikiDev
Hi TheDiveO


> While I can perfectly understand why the current architecture chooses the deserializer based on the extension of the incoming file, this quickly hits a dead end for generic file formats such as XML/.xml, which lack a clearly differentiating file extension. Of course, a quick workaround would be to rename the import file. But the next time I or someone else imports an updated version of the XML file, the question comes up again: which file extension do I need to use?
>
> Also, over time I suspect that we may see more .xml files with different XML schemas.
>
> Do you see a way to allow multiple XML deserializers, based on the root element -- or maybe just based on the beginning of the XML file to be imported? Kind of a mime-type detection, just on an XML file signature?

Yes, I recognise the problems here. One solution would indeed be to have multiple deserializers registered to a file type and give each a chance to examine the incoming data, and if necessary prompt the user to choose the best one to use.
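
As a rough illustration of that idea (not something the TW5 core does, as far as this thread states), the sketch below sniffs the root element name from the start of an XML file and looks up a matching handler. The function names and the shape of the handler table are hypothetical, and the prolog handling is deliberately simple: it copes with an XML declaration, comments and a plain DOCTYPE, but not with a DOCTYPE that carries an internal subset.

```javascript
"use strict";

// Return the name of the root element, or null if none is found.
// Only the first 4096 characters are inspected, like a file signature check.
function rootElementName(text) {
  const head = text.slice(0, 4096)
    .replace(/^\uFEFF/, "")               // byte order mark
    .replace(/<\?[\s\S]*?\?>/g, "")       // XML declaration, processing instructions
    .replace(/<!--[\s\S]*?-->/g, "")      // comments
    .replace(/<!DOCTYPE[^>]*>/gi, "");    // simple DOCTYPE (no internal subset)
  const match = /<([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)/.exec(head);
  return match ? match[1] : null;
}

// Pick a handler from a table keyed by root element name (hypothetical shape).
// Returns null when no registered handler claims the file.
function chooseDeserializer(text, handlersByRoot) {
  const root = rootElementName(text);
  if (root !== null && Object.prototype.hasOwnProperty.call(handlersByRoot, root)) {
    return handlersByRoot[root];
  }
  return null;
}
```

A caller could fall back to prompting the user whenever chooseDeserializer returns null, or when several schemas share the same root element name.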

If you don’t want to go down a rabbit hole of core modifications I’d suggest sticking with the custom file extension workaround for the moment.

> Another question regarding XML parsing: are there any incremental XML parser libraries for JavaScript? Or would it be better to use the built-in XML parser?

I believe there are several incremental XML JS parsers; the xmldom.js parser that is used in a few of the core plugins however only works in “bulk” mode.
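
To illustrate the difference between "bulk" and incremental processing, here is a small sketch that accepts the file in arbitrary chunks and hands back each complete record as soon as its closing tag has arrived. It is not a real XML parser and does not reflect any particular library's API: it assumes hypothetical, non-nested record elements (here <rfc-entry>) and simply splits on their opening and closing tags.

```javascript
"use strict";

// Create a splitter for non-nested <tagName>...</tagName> records.
// feed(chunk) returns the records completed by that chunk, in order.
function createRecordSplitter(tagName) {
  const open = "<" + tagName;
  const close = "</" + tagName + ">";
  let buffer = "";
  return {
    feed(chunk) {
      buffer += chunk;
      const records = [];
      for (;;) {
        const start = buffer.indexOf(open);
        if (start === -1) {
          // Keep only a tail that could still be the start of an opening tag.
          buffer = buffer.slice(-(open.length - 1));
          break;
        }
        const end = buffer.indexOf(close, start);
        if (end === -1) {
          // Record is incomplete: drop what precedes it and wait for more data.
          buffer = buffer.slice(start);
          break;
        }
        records.push(buffer.slice(start, end + close.length));
        buffer = buffer.slice(end + close.length);
      }
      return records;
    }
  };
}
```

Memory use stays proportional to the largest single record rather than to the whole file, which is the main attraction of incremental parsing for a multi-megabyte import.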

Best wishes

Jeremy



TheDiveO

Feb 25, 2017, 11:52:39 AM
to TiddlyWikiDev
Hi Jeremy,

when going with a custom file extension, would it be okay to call $tw.utils.registerFileType("application/x-rfc-index","utf8",".rfcindex") from inside my tiddler deserializer? This module would have a module-type "tiddlerdeserializer" and type "application/javascript".

Best regards,
TheDiveO

PS: A good chance to eat my own dog food and use my ThirdFlow plugin.

TheDiveO

Feb 25, 2017, 2:32:19 PM
to TiddlyWikiDev
A first simple test using the full-size 11MB XML import file shows an acceptable speed. After dropping the index file onto a TiddlyWiki browser window, it took maybe 20 to 30 seconds for the full import list with 10,000 tiddlers to appear. Quite acceptable I think.