Adding indexes to the tiddler store

78 views
Skip to first unread message

Jeremy Ruston

unread,
May 18, 2019, 1:41:23 PM5/18/19
to TiddlyWikiDev
Apologies for being quiet on the groups over the last week or two. One thing I’ve been working on is exploring performance improvements for large wikis. As part of that work, there’s now a nearly-complete pull request for adding indexes to the tiddler store.


Currently, the core evaluates a lot of filter expressions like [all[shadows+tiddlers]tag[$:/tags/ViewTemplate]!has[draft.of]] to construct the user interface. These filters are slow because the “tag” operator has to iterate through every tiddler in the store to find the tiddlers with that tag. With this PR, the wiki store maintains an index of tiddlers with each tag, updating it every time a tiddler changes. There’s also a fair amount of refactoring to get the filter processing logic using the new indexes.

The implementation introduces a new type of module called an “indexer”. It has some standard methods: rebuild() for when the entire index must be reconstructed, update(oldTiddler,newTiddler) when a tiddler is modified, created or deleted. There are indexer-specific methods for performing lookups. There is also a new convention whereby tiddler iterators can have additional properties defining “index methods” that perform fast lookups on the tiddlers in the iterator — this is how the filter operators find the indexers that they know about.

The TagIndexer speeds up the “tag” filter operator when it is used immediately after one of [all[tiddlers]], [all[shadows]], [all[shadows+tiddlers]], [all[tiddlers+shadows]] (note that using the tag operator at the start of a filter run is equivalent to preceding it with [all[tiddlers]]).

There is also a FieldIndexer which speeds up the “field” and “has” operators, with the same restrictions as to the input selection. The field indexer is reasonably clever, and only constructs indexes for fields for which it has encountered queries.

On very large wikis (>60,000 tiddlers) I’m seeing startup times reduced by 25% with these optimisations, and refresh time reduced by a factor of three. It will be interesting to get reports from other people working with large or complex wikis.

Any comments/questions welcome,

Best wishes

Jeremy.

Jeremy Ruston

unread,
May 18, 2019, 2:09:16 PM5/18/19
to TiddlyWikiDev
I’m doing further tests and timings, comparing that 60K tiddler wiki before and after the improvements. One interesting datapoint is that using the advanced search box, the time to refresh the page on each keypress reduces a full order of magnitude, from approx 235ms to approx 22ms. For this wiki, this is a very consistent result, and makes an enormous difference in usability.

Again, I’ll be interested to see how well these improvements work with other peoples wikis.

Best wishes

Jeremy 

Hans Wobbe

unread,
May 18, 2019, 3:48:00 PM5/18/19
to TiddlyWikiDev
Jeremy:

Thank you very much of this significant enhancement !!

Best regards,
Hans

PMario

unread,
May 18, 2019, 8:47:48 PM5/18/19
to TiddlyWikiDev
Hi Jeremy,

Did you test the tag and field dropdown menus in tiddler edit-mode. Any performance points here?

I did also post a reply in the github PR.

-m

TonyM

unread,
May 18, 2019, 9:13:13 PM5/18/19
to TiddlyWikiDev
Jeremy,

Thanks for this development. I do see a need going forward. Can I please ask?
  • Will this mechanism be effectively invisible to most users/designers?
  • With the code to support this mechanism in tiddlywiki, could it be also surfaced to provide other indexing methods?
    • Such as indexing large data tiddlers, or a subset of tiddlers, tags? A kind of indexedlookup operator?
  • In many cases [is[current] is the first operator in filters especially on view tiddler, should this be included in the application of the tagindexer? 
  • Will the indexes be saved in tiddlers for the next reload?, perhaps this could be optional allowing indexes to be forced to be regenerated in wikis on reload in cases where there are external/federated wikis, and stored for large indexes on wikis primarily used for search and lookups (with less changes).
  • Perhaps an option to store such indexes in local storage and regenerate them if missing would help keep total size down for wikis.
Love your work
Tony

Jeremy Ruston

unread,
May 19, 2019, 7:13:09 AM5/19/19
to TiddlyWikiDev
Hi Tony

  • Will this mechanism be effectively invisible to most users/designers?
Yes. It’s a big change so in due course when the PR is merged we will need everyone to pay attention to testing it.

  • With the code to support this mechanism in tiddlywiki, could it be also surfaced to provide other indexing methods?
    • Such as indexing large data tiddlers, or a subset of tiddlers, tags? A kind of indexedlookup operator?
Yes. The current PR has two indexers for tag and field lookups, but the mechanism is extensible so that others can be added.

  • In many cases [is[current] is the first operator in filters especially on view tiddler, should this be included in the application of the tagindexer? 
Yes, as you know it is much less efficient to start with [is[current]] than [all[current]] because the former scans each tiddler to check whether it is the current tiddler, while the second just selects the current tiddler without scanning.

But, we might indeed be able to adapt the indexing mechanism to optimise the is[current] operator, and possibly some of the other variants.
  • Will the indexes be saved in tiddlers for the next reload?, perhaps this could be optional allowing indexes to be forced to be regenerated in wikis on reload in cases where there are external/federated wikis, and stored for large indexes on wikis primarily used for search and lookups (with less changes).
No, I don’t think it’s worth it. These are very simple indexes that are pretty fast to build. It’s only if we were doing full text indexing that I think we might want to persistently cache the indexes.

  • Perhaps an option to store such indexes in local storage and regenerate them if missing would help keep total size down for wikis.
As I say, generating the indexes is pretty fast, even with 60,000 tiddlers. It’s only done once, and thereafter it is selectively updated as required.

Best wishes

Jeremy


Love your work
Tony

On Sunday, May 19, 2019 at 3:41:23 AM UTC+10, Jeremy Ruston wrote:
Apologies for being quiet on the groups over the last week or two. One thing I’ve been working on is exploring performance improvements for large wikis. As part of that work, there’s now a nearly-complete pull request for adding indexes to the tiddler store.


Currently, the core evaluates a lot of filter expressions like [all[shadows+tiddlers]tag[$:/tags/ViewTemplate]!has[draft.of]] to construct the user interface. These filters are slow because the “tag” operator has to iterate through every tiddler in the store to find the tiddlers with that tag. With this PR, the wiki store maintains an index of tiddlers with each tag, updating it every time a tiddler changes. There’s also a fair amount of refactoring to get the filter processing logic using the new indexes.

The implementation introduces a new type of module called an “indexer”. It has some standard methods: rebuild() for when the entire index must be reconstructed, update(oldTiddler,newTiddler) when a tiddler is modified, created or deleted. There are indexer-specific methods for performing lookups. There is also a new convention whereby tiddler iterators can have additional properties defining “index methods” that perform fast lookups on the tiddlers in the iterator — this is how the filter operators find the indexers that they know about.

The TagIndexer speeds up the “tag” filter operator when it is used immediately after one of [all[tiddlers]], [all[shadows]], [all[shadows+tiddlers]], [all[tiddlers+shadows]] (note that using the tag operator at the start of a filter run is equivalent to preceding it with [all[tiddlers]]).

There is also a FieldIndexer which speeds up the “field” and “has” operators, with the same restrictions as to the input selection. The field indexer is reasonably clever, and only constructs indexes for fields for which it has encountered queries.

On very large wikis (>60,000 tiddlers) I’m seeing startup times reduced by 25% with these optimisations, and refresh time reduced by a factor of three. It will be interesting to get reports from other people working with large or complex wikis.

Any comments/questions welcome,

Best wishes

Jeremy.

--
You received this message because you are subscribed to the Google Groups "TiddlyWikiDev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tiddlywikide...@googlegroups.com.
To post to this group, send email to tiddly...@googlegroups.com.
Visit this group at https://groups.google.com/group/tiddlywikidev.
To view this discussion on the web visit https://groups.google.com/d/msgid/tiddlywikidev/22681694-5fa1-47e9-9a9f-0c9df73d70a3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jeremy Ruston

unread,
May 19, 2019, 7:18:54 AM5/19/19
to tiddly...@googlegroups.com
Hi Mario

Did you test the tag and field dropdown menus in tiddler edit-mode. Any performance points here?

I’ve just done some testing with the tag dropdown on a new tiddler in the prerelease; performance improves from 450ms to 350ms — in other words, not a great deal. Some quick checks with the browser performance tools suggests that most of the rendering time for the tag dropdown is actually spent initialising the parser. I’ll have to investigate more.

I did also post a reply in the github PR.

Thanks,

Best wishes

Jeremy


-m

--
You received this message because you are subscribed to the Google Groups "TiddlyWikiDev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tiddlywikide...@googlegroups.com.
To post to this group, send email to tiddly...@googlegroups.com.
Visit this group at https://groups.google.com/group/tiddlywikidev.

TonyM

unread,
May 19, 2019, 11:16:57 PM5/19/19
to TiddlyWikiDev
Jeremy,

My bad saying [is[current] vs [all[current] however the english would suggest it is the other way around. Ie is is a specific case, all means from all get current. 

  • In many cases [is[current] is the first operator in filters especially on view tiddler, should this be included in the application of the tagindexer? 
Yes, as you know it is much less efficient to start with [is[current]] than [all[current]] because the former scans each tiddler to check whether it is the current tiddler, while the second just selects the current tiddler without scanning.

It is a subtle difference that most will miss and they must read or never know. Perhaps it would make sense to provide an alternate here[] operator that is equivalent to all[current]. The parameter could also allow here[tiddlername] or here<variablecontainingtitle> to override currentTiddler. Unlike the all[tiddlername] operation which is bad english and does not work.

I think this would make the filters more readable, especially when acting on the current tiddler.

So as you say
 

But, we might indeed be able to adapt the indexing mechanism to optimise the is[current] operator, and possibly some of the other variants.


Perhaps a here[] operator could be one of these variants. 

An aside
Since the suggested here[] operator by design results in a single title perhaps we could explore the use of another operator that handles multiple without deduplication for the maths operators. No need to respond, there has already being some discussion on this.

Post Script: with current and foreshadowed developments in tiddlywiki I am looking at making TiddlyWiki a key feature of a business I am starting. Perhaps we could start a forum for this and have some hangouts (equivalent), becuae there is a lot to discuss, this can inform development and help us address open source and licencing issues in advance.

Regards
Tony 

TonyM

unread,
May 19, 2019, 11:27:59 PM5/19/19
to TiddlyWikiDev
Jeremy,

One other question of clarification;

  • Will the indexes be saved in tiddlers for the next reload?, perhaps this could be optional allowing indexes to be forced to be regenerated in wikis on reload in cases where there are external/federated wikis, and stored for large indexes on wikis primarily used for search and lookups (with less changes).
No, I don’t think it’s worth it. These are very simple indexes that are pretty fast to build. It’s only if we were doing full text indexing that I think we might want to persistently cache the indexes.

  • Perhaps an option to store such indexes in local storage and regenerate them if missing would help keep total size down for wikis.

My interest here is not only or even for performance, it is leveraging the indexing process for my own use. For example I have an application I am building on tiddlywiki for a client. Each of there 30+ Offices have 1 to 30 post codes associated with each office. If there were a way to access the indexed array and copy it to a tiddler, ie surface the result, it could prove useful for various applications.   I know I can build equivalents myself but they will be wikitext based and never as efficient as the ones in the indexing system you propose. Perhaps providing a widget or variable name that can reference the content of the indexed array. Extracting such arrays at a point in time would allow new lists to be created with additions and deletions for further use. In effect capturing the result in an index in a moment in time, useful for database, reporting and dashboard applications that can highlight new and deleted items/relationships relative to a prior state.

Regards
Tony

Reply all
Reply to author
Forward
0 new messages