TW community wikis aggregator

Erwan

Dec 14, 2014, 12:15:24 PM
to tiddl...@googlegroups.com
Hi everyone,

I wrote a quick-and-dirty script which aggregates a bunch of wikis
(mostly those which appear in http://tiddlywiki.com/#Community) into a
single big wiki. The result can be seen here:

https://rawgit.com/erwanm/tw-aggregator/master/tw-aggregator/output/tw-aggregator.html

The idea is to have a big collection of tiddlers which can be searched
without browsing all the individual wikis one by one. The disadvantage
is that it is messy, contains inconsistencies and missing parts (but you
can always click on the source link to go to the original wiki).

Do you think this is useful? There is also the issue that I didn't ask
the authors if they agree to this: I assume that the proper way would be
to ask every author individually, but before doing that I just want to
see if there is some interest. For those who are on this list, is it ok
with you please?

If there is some interest I can configure an automatic daily rebuild, in
order to keep it up to date.

Regards
Erwan

PMario

Dec 14, 2014, 1:24:19 PM
to tiddl...@googlegroups.com
While I think it is useful to be able to search TW content, I see three major issues.

a)
The inconsistencies that you mention. Some elements simply don't work in the aggregated TW, which will result in unnecessary support requests. If things don't work in the aggregation but do work in the original wiki, the aggregation may still damage the original author's reputation. I think this is problematic.

b)
You create a copyright / licensing issue. Some aggregated TWs don't have a license that allows modification and redistribution, and therefore you are not allowed to modify and redistribute them in the way you do. I know I'm nitpicking here, but that's the truth.

c)
By design, TW isn't very friendly to search engines like Google. That's why TW5 can create standalone TWs that are pure HTML and SEO friendly.
Your TW is still an app and not SEO friendly, so we don't win. On the contrary: Google may penalize link farms and duplicated content, so both sides will probably lose here, your link farm and the original content author.

Also, if Google publishes search results with my content, I personally don't want it to be a rawgit.com link. It should link to the "real source".

have fun!
mario

Jed Carty

Dec 14, 2014, 2:11:41 PM
to tiddl...@googlegroups.com
I am not sure what the point is in this context. Just scraping all the tiddlers from TW sites without context is going to result in a largely meaningless pile of things. For example, you pulled the tiddler 'Add a reset button that clears the form inputs without you having to make a new whatever with the form, or clear them manually for all forms.' from my site, which is just an entry on my task list that makes absolutely no sense out of context. Even with the link to the original it still has no context, because the original is never seen by itself. So following the link to the source wiki isn't at all useful for many of the tiddlers.
On top of that, almost none of the examples work, at least those from my site. I use a lot of system tags and plugins in the examples, so almost nothing of any use comes through (aside from basic text markup examples, which you can get on tiddlywiki.com or probably on any other site you pull from). In fact, since the examples don't work, it is probably counterproductive: it presents what are supposed to be examples of how to make something work, and then they don't work.
If it were a curated list then it would probably be very helpful, with something like a list of places to go to see how different people talk about using a table of contents or other topics. As it is, being messy and inconsistent makes it almost impossible to use. Your solution of clicking on the source link doesn't help at all when it is something like 'Add a recepie book to the dashboard', which out of context sounds like it tells you how to do it, but is just an entry on my task list.

So, a listing of all or most available resources with a list of what is available at each one would be great, but scraping the wikis and posting everything without any plugins or system tiddlers in another place isn't going to do anything other than result in a confusing jumble of mostly broken tiddlers.

Mat

Dec 14, 2014, 2:31:33 PM
to tiddl...@googlegroups.com
Well... I think this is a first step towards something that can be absolutely fantastic! This is very much in line with the vision that I have for the future, with "loose tiddlers" from arbitrary sources being aggregated into separate tiddlywikis. Of course not everything is perfect at this first step, so let's see what needs improvement first and what to do about it.

The more people we get into the community, or at least the more TW5s that get out there, the more value in aggregating combinations into "cross-disciplinary" areas of interest. This is very exciting!

<:-)

Erwan

Dec 14, 2014, 8:40:27 PM
to tiddl...@googlegroups.com

Thank you for the comments. I wasn't very satisfied with the loss of functionality/design either, so I thought of something slightly different: the system is now presented as a "community search engine", which is not meant to give access to the content directly but only to point to the original wikis. It can be found here:

https://rawgit.com/erwanm/tw-aggregator/master/tw-aggregator.html

Remark: some "syncer-browser" errors appear, but they don't seem to have any consequence (?). I don't know what this is; does anybody have an idea? (It doesn't happen with the local version.)

It now works like this: all the imported tiddlers are converted into system tiddlers, which makes them (and their tags) invisible to the user. A custom search box is provided which searches only among these tiddlers and gives the link to the original wiki directly (so the user never sees the content inside this wiki)... except that this part doesn't work yet: in the current version I didn't succeed in generating the links, probably because I'm not very comfortable yet with all the widgets/macros stuff. My code is this:

\define build-link()
  <$view tiddler=TWAggregatorSources field={{!!source-wiki-id}}/>#{{!!source-tiddler-title}}
\end


<div class="tc-menu-list-item">
  <$link to=<<build-link>>>
    <$view field="source-tiddler-title"/> @<$view field="source-wiki-id"/> (address=<<build-link>>)<br/>
  </$link>
</div>

You can see it in https://rawgit.com/erwanm/tw-aggregator/master/tw-aggregator.html#%24%3A%2FCommunitySearchListItemTemplate

Basically I'm trying to concatenate the two parts of the link, where the first part is extracted from another tiddler based on the field "source-wiki-id". I can obtain the right string but cannot make a link out of it (hence the temporary "address=..."). If some of the experts could give me a little help on that, that would be great!
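One possible way to sidestep the concatenation inside the template is to build the complete link in the aggregation script and store it in a field of each imported tiddler, so the template only has to output one value. The following is a minimal Python sketch under that assumption, not the aggregator's actual code: the function names and the `source-url` field are hypothetical, and only `source-wiki-id` and `source-tiddler-title` come from the post. The title is percent-encoded in the same way as the permalink shown above (`%24%3A%2F` for `$:/`).

```python
from urllib.parse import quote


def build_source_link(wiki_url: str, tiddler_title: str) -> str:
    """Return <wiki_url>#<percent-encoded title>."""
    # safe="" also encodes "/" and ":", e.g. "$:/" becomes "%24%3A%2F".
    return wiki_url + "#" + quote(tiddler_title, safe="")


def add_source_url(tiddler: dict, sources: dict) -> dict:
    """Return a copy of the tiddler with a precomputed link field.

    `sources` maps a source-wiki-id to the address of that wiki;
    "source-url" is a hypothetical field name.
    """
    enriched = dict(tiddler)
    wiki_url = sources[tiddler["source-wiki-id"]]
    enriched["source-url"] = build_source_link(
        wiki_url, tiddler["source-tiddler-title"]
    )
    return enriched
```

With the link stored in a single field, the list item template would not need to combine two values at render time.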

Regards
Erwan
--
You received this message because you are subscribed to the Google Groups "TiddlyWiki" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tiddlywiki+...@googlegroups.com.
To post to this group, send email to tiddl...@googlegroups.com.
Visit this group at http://groups.google.com/group/tiddlywiki.
For more options, visit https://groups.google.com/d/optout.

BJ

Dec 14, 2014, 8:51:13 PM
to tiddl...@googlegroups.com
The "syncer-browser" error probably comes from including $:/plugins/tiddlywiki/tiddlyweb

cheers

BJ

PMario

Dec 15, 2014, 7:03:51 AM
to tiddl...@googlegroups.com
On Monday, December 15, 2014 2:40:27 AM UTC+1, Erwan wrote:

> Thank you for the comments. I wasn't very satisfied either with the loss of functionality/design, then I thought of something slightly different: the system is now presented as a "community search engine", which is not meant to give access to the content directly but only to point to the original wikis. It can be found here:
>
> https://rawgit.com/erwanm/tw-aggregator/master/tw-aggregator.html

Hi Erwan ,

I'm much more in favour of this approach. ....

Some time ago I looked a little into TW searchability for TiddlySpace-based interconnected TWs.

eg:
 - phonetic search   ...   where you would get a result if the tiddler contains "phonetic" but the user searches for "fonetik"
 - search for the word stem   ...   eg: a tiddler contains "cat" but the user searches for "cats" ...
 - lookup of related words   ...   eg: you search for "child" and get hits for "kid, youngster, minor, shaver, nipper, small fry, tiddler, tike, tyke, fry, nestling" in the text. (Sorting by relevance would be nice here. Limiting the related words too :)
 - suggesting useful search terms that are guaranteed to produce hits, when only 2 characters have been typed so far ...

and so on.
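To make the first two ideas concrete, here is a deliberately crude Python sketch. It does not use the natural library and is not Soundex, Metaphone or a Porter stemmer; the normalisation rules and all function names are assumptions chosen only so that the two examples above match.

```python
import re


def phonetic_key(word: str) -> str:
    """Very crude phonetic normalisation (assumed rules, not a real algorithm)."""
    key = word.lower()
    key = key.replace("ph", "f").replace("ck", "k").replace("c", "k")
    # Collapse runs of the same letter, e.g. "tiddler" -> "tidler".
    return re.sub(r"(.)\1+", r"\1", key)


def crude_stem(word: str) -> str:
    """Strip a plural "s"; a real stemmer handles far more cases."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def matches(query: str, text: str) -> bool:
    """True if any word of the text matches the query phonetically or by stem."""
    words = re.findall(r"[A-Za-z]+", text)
    q_phon, q_stem = phonetic_key(query), crude_stem(query)
    return any(
        phonetic_key(w) == q_phon or crude_stem(w) == q_stem for w in words
    )
```

For example, `matches("fonetik", "a phonetic search")` and `matches("cats", "my cat sleeps")` both return `True` with these rules.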

The library I was thinking of is: natural [1]. It provides all the needed components.

Some components need a server side or preprocessing, some components may be part of the published TW.

To be useful, some components need preprocessing with large "lookup databases". So it isn't practical to include them in the published TW.

... Since you need a preprocessing step anyway, I think it would fit very well for an aggregated TW search index.

-----

This would remove the need to scrape and store the whole tiddlers. Instead, you would store and publish the aggregated meta data with the corresponding source links.

... (still an issue) Storing full text 3rd party tiddlers into the TW system area doesn't remove the licensing problem, it just makes it less visible.

... on the other hand:
Scraping, aggregating and publishing the described meta data creates new and useful content, which is similar to well known search engines. There are still some rules to follow, but they are much less critical.
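As a rough illustration of publishing meta data instead of whole tiddlers, the Python sketch below turns a scraped tiddler into an index entry that keeps the title, the tags, a set of search terms and the source link, but not the text itself. The entry layout, the `terms` and `source-url` names and the function name are assumptions; whether a word list is an acceptable level of detail is exactly the kind of rule that would still need checking.

```python
import re


def to_index_entry(tiddler: dict, source_url: str) -> dict:
    """Build a search-index entry that drops the full tiddler text."""
    # Keep only the distinct lowercase words, not the text itself.
    words = re.findall(r"[a-z0-9]+", tiddler.get("text", "").lower())
    return {
        "title": tiddler["title"],
        "tags": list(tiddler.get("tags", [])),
        "terms": sorted(set(words)),
        "source-url": source_url,
    }
```

The search box would then match queries against `title`, `tags` and `terms`, and every hit would point to `source-url`.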

have fun!
mario

[1] https://github.com/NaturalNode/natural

whatever

Dec 15, 2014, 10:53:14 AM
to tiddl...@googlegroups.com
Does TW5 have the ability to mark your wiki as non-searchable? Or to mark individual tiddlers as non-searchable? So they would not get included in the aggregator's results?
w


On Sunday, December 14, 2014 6:15:24 PM UTC+1, Erwan wrote:

Tobias Beer

Dec 15, 2014, 11:36:58 AM
to tiddl...@googlegroups.com
@Erwan,

I think this does have some potential and isn't as awkward as it first comes off.

What I would criticize most right now is...
  1. how ugly and unreadable the presentation of the results currently is
  2. that the aggregated tiddler opens, and not the original article in a new tab
What this aggregator can indeed do well is render search results as titles that directly link to the original source. No preview, because chances are that it's broken beyond what is acceptable, so that's not helping at all.

If available, the result list should also show tags, modified, created, and a link to the source wiki ...perhaps just some form of name linking to a tiddler containing some information about the source, e.g. tb5.tiddlyspot.com or tb5@tiddlyspot.

It would be good if one could click on a tag and then see search results for that tag. Easily done, I'd say.

It would be good if search results were grouped by "in tags", "in title", "in text"... and sorted alphabetically by the full URL, which naturally groups them by origin.
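The grouping and sorting described here could look roughly like the following Python sketch. It is only an illustration of the idea: the entry layout (`title`, `tags`, `text`, `source-url`) and the function name are assumptions, and the matching is a plain case-insensitive substring test.

```python
def group_results(query: str, entries: list) -> dict:
    """Group matching entries by where the query was found."""
    q = query.lower()
    groups = {"in tags": [], "in title": [], "in text": []}
    for entry in entries:
        if any(q in tag.lower() for tag in entry.get("tags", [])):
            groups["in tags"].append(entry)
        if q in entry.get("title", "").lower():
            groups["in title"].append(entry)
        if q in entry.get("text", "").lower():
            groups["in text"].append(entry)
    # Sorting by the full URL keeps hits from the same wiki together.
    for hits in groups.values():
        hits.sort(key=lambda entry: entry["source-url"])
    return groups
```

An entry that matches in several places appears in each of the corresponding groups.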

@whatever,
 
> Does TW5 have the ability to mark your wiki as non-searchable? Or to mark individual tiddlers as non-searchable? So they would not get included in the aggregator's results?

That sounds a lot like putting the cart before the horse.

Best wishes, Tobias.

Tobias Beer

Dec 15, 2014, 12:40:23 PM
to tiddl...@googlegroups.com
One more thing... Is there a way to avoid:

?

Best wishes, Tobias.

Erwan

Dec 15, 2014, 7:51:44 PM
to tiddl...@googlegroups.com

Hi Mario,

I like your idea, and I happen to know a few things about approximate string matching techniques, so I'd be interested in looking into it at some point. I could probably do the offline analysis part, but I know nothing about JavaScript, and even less about node.js and the internals of TW. Additionally, I think this is quite a complex problem, because it would have to be efficient in time and space, which is not a given with this kind of algorithm.

Erwan

Erwan

Dec 15, 2014, 7:56:05 PM
to tiddl...@googlegroups.com

Thank you very much BJ, that was it and it should be fixed now.

Erwan

Dec 15, 2014, 7:56:56 PM
to tiddl...@googlegroups.com

that's fixed now, thanks to BJ.

Erwan

Dec 15, 2014, 8:09:22 PM
to tiddl...@googlegroups.com

On 15/12/14 16:36, Tobias Beer wrote:
> @Erwan,
>
> I think this does have some potential and isn't as awkward as it first comes off.
>
> What I would criticize most right now is...
>   1. how ugly and unreadable the results are currently presented
>   2. that that tiddler opens and not the original article in a new tab

Well, these two problems exist mostly because I wasn't able to solve my widget/macro problem, as explained here: https://groups.google.com/d/msg/Tiddlywiki/qGk3aH731Pc/TtA9P9FgLFMJ
The goal is indeed to link directly to the original article. If anyone has an idea, that would be very helpful.



> What this aggregator can indeed do well is render search results as titles that directly link to the original source. No preview, because chances are that it's broken beyond acceptable, so that's not helping at all.
>
> If available, the result list should also show tags, modified, created, and a link to the source wiki ...perhaps just some form of name linking to a tiddler containing some information about the source, e.g. tb5.tiddlyspot.com or tb5@tiddlyspot.
>
> It would be good if one could click on a tag and then see search results for that tag. Easily done, I'd say.
>
> It would be good if search results were grouped by "in tags", "in title", "in text"... and sorted alphabetically, by the full url, which thus naturally group by origin.

As a first version I'm trying to show the items as "<tiddler title> @<source id>", as a link to the original <site>#<tiddler title>. Once this works, I'll try to add the tags, but I'm not sure my current level of expertise allows more than that!

Erwan



> @whatever,
>
> Does TW5 have the ability to mark your wiki as non-searchable? Or to mark individual tiddlers as non-searchable? So they would not get included in the aggregator's results?
>
> That sounds a lot like putting the cart before the horse.
>
> Best wishes, Tobias.

RichShumaker

Dec 16, 2014, 12:17:59 AM
to tiddl...@googlegroups.com
Love your idea, and I agree with everyone about the mixing of TW5s and the privacy (copyright) issues.
I grab stuff from Tobias all the time (thanks Tobias ;) and his pages always get mangled because he uses 'best practices' and really cool add-ons and tricks.
Also, the data is not tracked back to the original site like it would be inside TW5 when you use the 'New Here' drop-down.

My idea, which others have had as well, is a Social TW5.
A way to easily share content between TW5 users.

My end goal is a "FB for You".
Napster + FB = TW5 Social

Part of what I describe above requires a "scraping tool" that would allow you to grab <tags> and [[Tiddlers]] from all (Public) TW5 Social sites.
TW5 Social would be a platform where you Share what you want to Share the same way you would in a Public setting, keeping ownership of images and data just as other Social Networks do.
Private TW5 would come in the future, as that requires more work to keep out the people you don't want seeing your content.

If you created a TW5 Social Site you expect people to PIN or POST or Share your pages.

Also your Public TW5 would use a base line of code so you could "Scrape", "Share", "Link-Back" automagically.
Hell right now I can't even Tweet my Tiddlers from TW5, although Tobias has done a bunch on this and wow, seriously Thanks Tobias.

Here are some ideas that might help with the issues that we are running into on this
For Privacy and Copyright Issues
Is there a way to create a generic [['I AGREE' Tiddler]] that everyone (who wants to) adds to their site if they want people to 'Share' their stuff? As with social networks, you don't OWN the images and so on, you can just share them.
If the TW5 site has no [['I AGREE' tiddler]], then the scraper skips your site?
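A scraper honouring such an opt-in could be sketched in Python as below. This is only an illustration under assumptions: the exact tiddler title `I AGREE`, the data layout (each site as a list of tiddler dicts) and the function names are hypothetical, since no such convention exists yet.

```python
CONSENT_TITLE = "I AGREE"  # hypothetical title of the opt-in tiddler


def has_consent(tiddlers: list) -> bool:
    """True if the wiki contains the opt-in tiddler."""
    return any(t.get("title") == CONSENT_TITLE for t in tiddlers)


def scrapable_sites(sites: dict) -> dict:
    """Keep only the sites whose owners opted in; all others are skipped."""
    return {
        url: tiddlers for url, tiddlers in sites.items() if has_consent(tiddlers)
    }
```

Sites without the tiddler are simply left out of the aggregation, so nothing is scraped by default.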

For Mangled Mixture MashUp Styles from many different sites.
Maybe create a basic Tiddler List and Tag List with links to the actual sites.
So limited or no major content right now just links to stuff.
You control the view of this so you could create it in <Tables> or use <Tags> whatever way you wanted to list it.
You don't actually have the content until we get an "I AGREE" system in place.

Let me know what you guys think and thanks for all the help.

Rich Shumaker

Jeremy Ruston

Dec 16, 2014, 3:31:13 AM
to TiddlyWiki
Hi Erwan

> I wrote a quick-and-dirty script which aggregates a bunch of wikis (mostly those which appear in http://tiddlywiki.com/#Community) into a single big wiki. 

Well done, this is a very interesting experiment; it's something that I've been wanting to explore for a while.

> Do you think this is useful?

I think this technique may be useful for a number of scenarios:

* Gathering together material for tiddlywiki.com. For example, we could aggregate the plugin descriptions from their originating sites
* Searchability, as you're exploring at the moment. As Mario points out, there is an opportunity to build an index and support sophisticated fuzzy searches
* Updating federated plugins; TiddlyWiki could retrieve the latest version of loaded plugins from their originating sites
* Federated discussion group. Participants would register their own TW's for regular scraping, and the aggregation site would thread tiddlers into discussions via a "comment.to" field identifying the target of a comment
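The threading step of the federated discussion idea could be sketched as follows in Python. It assumes that the "comment.to" field holds the title of the tiddler being commented on and that the "created" field is a sortable timestamp string; both are assumptions, as is the function name.

```python
from collections import defaultdict


def thread_comments(tiddlers: list) -> dict:
    """Map each target title to the comments pointing at it, oldest first."""
    threads = defaultdict(list)
    for tiddler in tiddlers:
        target = tiddler.get("comment.to")
        if target:
            threads[target].append(tiddler)
    for comments in threads.values():
        # Timestamps such as "20141216033113000" sort correctly as strings.
        comments.sort(key=lambda t: t.get("created", ""))
    return dict(threads)
```

Tiddlers without a "comment.to" field are ignored by this step; they would be the potential starting points of discussions.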

Best wishes

Jeremy





--
Jeremy Ruston
mailto:jeremy...@gmail.com

Tobias Beer

Dec 16, 2014, 6:24:35 AM
to tiddl...@googlegroups.com
Hi Erwan,
 
> As a first version I'm trying to show the items as "<tiddler title> @<source id>", as a link to the original <site>#<tiddler title>. Once this works, I'll try to add the tags, but I'm not sure my current level of expertise allows more than that!

I haven't yet taken a look at how this actually works. I'll see if I can come up with some suggestions. Perhaps the more knowledgeable among us could do that as well, so as to help you turn the aggregator into the useful resource your gut suggested it might be when you baked these first samples.

Best wishes, Tobias.

Alberto Molina

Dec 16, 2014, 6:35:15 AM
to tiddl...@googlegroups.com
Hi Erwan,

I don't know how you did it, but you made me think about something. If it is possible to access other public TWs, then it might be possible to read the version number of a plugin from its official page. Thus, it should be possible to develop a plugin that checks whether an installed plugin is up to date, by comparing its version number with the one on the official page. What do you think?
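The comparison itself is the easy part. A minimal Python sketch, assuming that plugin versions are dotted numbers such as "0.0.10", optionally followed by a suffix like "-beta" that is ignored here; the function names are hypothetical:

```python
def parse_version(version: str) -> tuple:
    """Turn "5.1.5" or "0.0.10-beta" into a tuple of integers."""
    core = version.split("-", 1)[0]  # drop any "-beta" style suffix
    return tuple(int(part) for part in core.split("."))


def is_outdated(installed: str, published: str) -> bool:
    """True if the published version is newer than the installed one."""
    return parse_version(installed) < parse_version(published)
```

Comparing tuples of integers rather than strings matters because "0.0.10" is newer than "0.0.9", although it sorts first as a string.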

Alberto