[TW5] Community search to aggregate specific #tiddlers, @Erwan ?

Mat

Dec 27, 2014, 10:54:57 AM

to tiddl...@googlegroups.com, Jeremy Ruston, Daniel Baird

Erwan,

Do you think you (or anyone) could modify your search engine to aggregate specific tiddlers, such as all those tagged with #tag or @modifier?

I think you can see where my question is going... Can we make a Twitter-type tool, so that TiddlyWikis could be set up (e.g. on Tiddlyspot) with a title like #rocknroll.tiddlyspot.com, each aggregating everything tagged as such?

This would be absolutely fantastic.

<:-)

Felix Küppers

Dec 27, 2014, 11:33:16 AM

to tiddl...@googlegroups.com, jeremy...@gmail.com, feed...@tiddlyspot.com

Wow Mat, nice idea.

Erwan

Dec 27, 2014, 8:25:19 PM

to tiddl...@googlegroups.com

Hi Mat,

Filtering the content by tag (or in any other way) is technically rather easy. The difficulty lies in the heavy lifting of offline scraping, that is, the computational power needed for the offline part.
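To illustrate why the filtering itself is the easy part, here is a minimal Python sketch. It assumes each tiddler is already available as a dictionary and that its `tags` field is a space-separated string in which tags containing spaces are wrapped in `[[...]]`. The function names and sample data are hypothetical and are not taken from the aggregator.

```python
import re

# Matches either a [[bracketed tag]] or a run of non-space characters.
_TAG_PATTERN = re.compile(r"\[\[(.*?)\]\]|(\S+)")


def parse_tags(tags_field):
    """Split a tags string into a list of tag names (hypothetical helper)."""
    return [bracketed or plain
            for bracketed, plain in _TAG_PATTERN.findall(tags_field)]


def tagged_with(tiddlers, tag):
    """Return the tiddlers whose tags field contains the given tag."""
    return [t for t in tiddlers if tag in parse_tags(t.get("tags", ""))]


# Made-up sample data for illustration only.
tiddlers = [
    {"title": "Elvis", "tags": "#rocknroll [[live music]]"},
    {"title": "Bach", "tags": "#baroque"},
    {"title": "Untagged"},
]

print([t["title"] for t in tagged_with(tiddlers, "#rocknroll")])  # prints ['Elvis']
```

The expensive part described next happens before this step, when the tiddlers have to be obtained in the first place.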

This is not visible when you look at the resulting wiki, but behind the scenes the (full) wikis are downloaded as standalone HTML pages, then converted to the Node.js version. These steps alone take about 10 to 15 minutes on my machine for around 30 wikis. I configured a daily update, which is enough for the current usage of the wiki, but it's ridiculous compared to the real-time updates that you would probably need for Twitter-like behaviour. And of course the servers in charge would also have to be able to deal with a much larger number of tiddlers. So basically I don't think the method is really scalable, but of course if somebody wants to buy a massive number of servers we could try! ;-)
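A back-of-the-envelope estimate makes the scaling concern concrete. The sketch below uses only the figures quoted above (10 to 15 minutes for about 30 wikis). It assumes the cost grows linearly with the number of wikis on a single machine, which is a simplification and not a measurement; the larger wiki counts are arbitrary examples.

```python
WIKIS_MEASURED = 30
MINUTES_LOW, MINUTES_HIGH = 10, 15


def seconds_per_wiki(total_minutes, wikis=WIKIS_MEASURED):
    """Average processing time per wiki, in seconds."""
    return total_minutes * 60 / wikis


def hours_for(wiki_count, total_minutes):
    """Extrapolate linearly from the 30-wiki figure (an assumption)."""
    return seconds_per_wiki(total_minutes) * wiki_count / 3600


print(seconds_per_wiki(MINUTES_LOW), seconds_per_wiki(MINUTES_HIGH))  # prints 20.0 30.0

# Hypothetical wiki counts, chosen only to show the trend.
for count in (1_000, 100_000):
    low = hours_for(count, MINUTES_LOW)
    high = hours_for(count, MINUTES_HIGH)
    print(f"{count} wikis: roughly {low:.1f} to {high:.1f} hours per update")
```

At 20 to 30 seconds per wiki, a single pass over a thousand wikis would already take several hours under this assumption, which is far from real time.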

Anyway this is only my very "DIY" approach, maybe Jeremy or others have ideas about how to do this in a smarter way. I'm not at all an expert in this domain to be honest :-)

Also, thank you to all the wiki authors for your agreement and to everybody for the suggestions; this helps me figure out which direction to take. I made a list of the next possible improvements at https://github.com/erwanm/tw-aggregator/issues. Don't hesitate to add or comment (but please be very patient for things to be implemented!).

Regards
Erwan

Mat

Dec 29, 2014, 8:02:28 AM

to tiddl...@googlegroups.com

Erwan, thanks for your reply. Very interesting. Some thoughts arise:

I don't know how long a conversion from TW into TW Node.js takes, but the other steps you describe should be pretty fast if automated (right?). In other words, if the TWs were already in Node.js form then this would be much simpler, correct?

But how about this for the Twitter-type idea, if at all possible: maybe it'd be enough to start off by looking at the tiddler $:/core/ui/MoreSideBar/All and picking out the tiddlers that start with # or @. For Twitter, you're not interested in all tiddlers, after all. Would this simplify and speed up your method?

Further, if your engine is included in the local wiki, it could look at the date for the last update and filter not only the # and @ tiddlers, but also only include those added or modified since last update!
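The selection described in the two paragraphs above could look like the following Python sketch. It assumes that "start with # and @" refers to tiddler titles, and that the `modified` field is a 17-digit UTC timestamp string (`YYYYMMDDHHMMSSmmm`), so that plain string comparison orders timestamps chronologically. All names and sample values are hypothetical.

```python
def is_hash_or_at(tiddler):
    """True if the title starts with '#' or '@' (assumed naming convention)."""
    return tiddler["title"].startswith(("#", "@"))


def changed_since(tiddler, last_update):
    """True if the tiddler was modified after last_update.

    Both values are assumed to be equal-length digit strings, so comparing
    them as strings is the same as comparing them as points in time.
    """
    return tiddler.get("modified", "") > last_update


def select_updates(tiddlers, last_update):
    """Keep only # and @ tiddlers added or modified since the last update."""
    return [t for t in tiddlers
            if is_hash_or_at(t) and changed_since(t, last_update)]


# Made-up sample data for illustration only.
tiddlers = [
    {"title": "#rocknroll", "modified": "20141229080228000"},
    {"title": "@Mat", "modified": "20141220120000000"},
    {"title": "Shopping list", "modified": "20141229090000000"},
]

print([t["title"] for t in select_updates(tiddlers, "20141228000000000")])
# prints ['#rocknroll']
```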

BTW, I just posted a question on the dev group asking whether it is possible to include a filter straight in the URL, which would be another approach to this.

<:-)

Erwan

Dec 30, 2014, 2:49:03 PM

to tiddl...@googlegroups.com

Hi Mat,

On 29/12/14 14:02, Mat wrote:

> Erwan, thanks for your reply. Very interesting. Some thoughts arise:
>
> I don't know how long a conversion from TW into TW Node.js takes, but the other steps you describe should be pretty fast if automated (right?). In other words, if the TWs were already in Node.js form then this would be much simpler, correct?

Sorry, maybe I wasn't clear: my process is already totally automated. My script does all the downloading, converting to Node.js, filtering (keeping only regular tiddlers), adding to the global wiki and finally converting back to standalone HTML. The script is also called automatically every day by a cron job on a local machine.
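As an illustration of the filtering step only, here is a minimal Python sketch. It assumes that the Node.js version stores each tiddler as a `.tid` file (header lines of the form `field: value`, a blank line, then the text), and that "regular" means a title that does not start with the system prefix `$:/`. This is not the aggregator's actual code, and the helper names are hypothetical.

```python
def parse_tid(content):
    """Parse the content of a .tid file into a dict of fields plus 'text'."""
    header, _, text = content.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        # Split on the first colon only, since values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    fields["text"] = text
    return fields


def is_regular(tiddler):
    """True for tiddlers that are not system tiddlers (assumed definition)."""
    return not tiddler.get("title", "").startswith("$:/")


# Made-up sample file contents for illustration only.
samples = [
    "title: Hello\ntags: #rocknroll\n\nSome text.",
    "title: $:/core/ui/MoreSideBar/All\n\nSystem content.",
]

regular = [t for t in map(parse_tid, samples) if is_regular(t)]
print([t["title"] for t in regular])  # prints ['Hello']
```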

I'm not sure, but I suspect that publishing public wikis in Node.js form on the web would not make much sense, because then they wouldn't be readable by a standard browser (and this is quite a convenient feature :) ). Anyway, in my opinion the conversion is only one of the compute-intensive parts, but to be honest I didn't measure which part takes how much time (downloading the wikis is also very significant).

> But how about this for the Twitter-type idea, if at all possible: maybe it'd be enough to start off by looking at the tiddler $:/core/ui/MoreSideBar/All and picking out the tiddlers that start with # or @. For Twitter, you're not interested in all tiddlers, after all. Would this simplify and speed up your method?
>
> Further, if your engine is included in the local wiki, it could look at the date for the last update and filter not only the # and @ tiddlers, but also only include those added or modified since last update!

Yes, I see your point, but I can't do that technically (or I don't know how): I need to download the full HTML page, and I can only decode it by converting it to Node.js. It's only at this stage that I can start manipulating and filtering tiddlers.

This whole process is cumbersome because it takes place outside TW. I have no idea if it could be done from inside (I know very little about JavaScript and about web development in general), but this would indeed be the ideal situation: a wiki A would be able to "communicate" with another wiki B, so A could send a filter request to B, which would return only the required tiddlers, and A could then transclude them the way it wants. In this case no conversion is needed at all. Storage might even be unnecessary, unless the workload is too much for the network. Even in that case, if the tiddlers are copied, the update process becomes much lighter because we only need to obtain the most recent tiddlers, as you suggested.
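To make the idea concrete, here is a purely hypothetical Python sketch of such an exchange, simulated in memory. No such protocol between wikis is implied to exist: the request format (a tag plus a `since` timestamp), the function names and the data are all assumptions made for illustration.

```python
import json


def answer_request(tiddlers, request_json):
    """Wiki B's side: return matching tiddlers newer than 'since' as JSON."""
    request = json.loads(request_json)
    matches = [t for t in tiddlers
               if request["tag"] in t.get("tags", [])
               and t["modified"] > request["since"]]
    return json.dumps(matches)


def merge_response(cache, response_json):
    """Wiki A's side: store tiddlers by title, keeping the newest copy."""
    for tiddler in json.loads(response_json):
        current = cache.get(tiddler["title"])
        if current is None or tiddler["modified"] > current["modified"]:
            cache[tiddler["title"]] = tiddler
    return cache


# Made-up content of wiki B, with tags already split into lists.
wiki_b = [
    {"title": "Elvis", "tags": ["#rocknroll"], "modified": "20141229080000000"},
    {"title": "Chuck", "tags": ["#rocknroll"], "modified": "20141201080000000"},
    {"title": "Bach", "tags": ["#baroque"], "modified": "20141230080000000"},
]

# Wiki A asks only for what changed since its last update.
request = json.dumps({"tag": "#rocknroll", "since": "20141228000000000"})
cache = merge_response({}, answer_request(wiki_b, request))
print(sorted(cache))  # prints ['Elvis']
```

Only the tiddlers changed since the last update cross the network here, which is the lighter update process described above.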

Regards
Erwan
