Implications of "Tag Pollution"


Stobot

Jul 14, 2021, 11:29:38 AM
to TiddlyWiki
I've seen this come up from time to time on the forums, and Tones mentioned it recently in "How to make a more convenient method for selecting tags?" (google.com).

I agree with most of the points about where fields are more useful than tags as you get deeper and deeper into organization, and the methods make sense. But there's often talk of a performance implication that I don't understand, and I'd like to, as I'm kind of a performance junkie.

So my questions are:
1. Is the "bad" type of "tag pollution" related to number of unique tags, or number of tiddlers *with* tags? Or both? For example I use my main TiddlyWiki for project organization and most details are in fields, but almost every tiddler is tagged as either a task, project, meeting, etc. So, I have a small number of unique tags, but high number of tagged items.
2. What is the source of the associated negative performance impact? I assume it has something to do with indexing - and if so there's probably a speed vs. memory tradeoff. 

David Gifford

Jul 14, 2021, 11:54:43 AM
to TiddlyWiki
I am no expert. But my experience has been that when I have used large numbers of tags the performance slows.

Here are two of my projects that show some slowdown:

1. https://giffmex.org/gospels.bubbles.html (1600 tiddlers, 1155 tags): very slow. I gave up on finishing it because adding tags to each tiddler was taking too long. Yet there is no content in most of the text fields.
2. https://giffmex.org/gifts/dictionaryarticles1.html (17,000 tiddlers, 214 tags): this one is much better. New tiddler creation, while slow, is still tolerable. Even the all tiddlers tab opens fairly fast. Again, no content in most text fields. But yes, even at 214 tags this one is slowing down.

Again, I am no expert on the behind-the-scenes processes, so it is hard for me to interpret what is happening, but I am guessing three things are the culprits for performance on files with a lot of tags:

1. The use of complicated list filters by tag, and view templates displaying those list elements with involved CSS, could slow it down.
2. The more sidebar lists are visible, the more must be rendered. This perhaps could affect performance.
3. The tag pills themselves must be rendered with CSS and dropdowns in multiple default places, so perhaps that is affecting performance as the number grows?

There is also the issue of having to wade through tags to add them.
And another issue: trying to use one system (tags) to handle multiple functions (tags for topic, tags for format, tags for tracking "meta" tiddlers, etc.).

I now tend to use links for topics, tags for functionality - adding things to, say, the sidebar or pagecontrol buttons, etc, and fields for tiddler-specific information I want to track or organize (contact info, etc)

PMario

Jul 14, 2021, 12:22:33 PM
to TiddlyWiki
On Wednesday, July 14, 2021 at 5:29:38 PM UTC+2 Stobot wrote:

I agree with most of the points about where fields are more useful than tags as you get deeper and deeper into organization, and the methods make sense. But there's often talk of a performance implication that I don't understand, and I'd like to, as I'm kind of a performance junkie.

Field values and tag values have been internally indexed since TW version 5.1.20.

Backlinks have been indexed since 5.1.22.

See: https://tiddlywiki.com/#Release%205.1.20 and search for "indexer" to see the release message and the link to the PR in the repo.
 
So my questions are:
1. Is the "bad" type of "tag pollution" related to number of unique tags, or number of tiddlers *with* tags?

At the moment it should only be a "user management" problem, not a performance problem any more.
 
Or both? For example I use my main TiddlyWiki for project organization and most details are in fields, but almost every tiddler is tagged as either a task, project, meeting, etc. So, I have a small number of unique tags, but high number of tagged items.

Filters should be reasonably performant for your use-case

2. What is the source of the associated negative performance impact?

It was the number of _all_ tiddlers. As Jeremy mentioned in the release note, with 60,000 tiddlers the refresh time improved by a factor of 3.
Refresh time matters if you, e.g., switch tabs ...
 
I assume it has something to do with indexing - and if so there's probably a speed vs. memory tradeoff. 

Yes, ... We trade more memory consumption for better speed _now_
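
To make that tradeoff concrete, here is a rough Python sketch. It is not TiddlyWiki's actual indexer code; the store layout and all function names are made up for illustration. Without an index, every tag lookup walks all tiddlers; with an index, the lookup is a single dictionary access, at the cost of keeping an extra tag-to-titles map in memory.

```python
from collections import defaultdict

# Hypothetical, simplified tiddler store: title -> list of tags.
tiddlers = {
    "Buy paint": ["task"],
    "Kitchen remodel": ["project"],
    "Weekly sync": ["meeting", "project"],
    "Call plumber": ["task"],
}


def titles_by_scan(store, tag):
    """No index: examine every tiddler on every lookup."""
    return [title for title, tags in store.items() if tag in tags]


def build_tag_index(store):
    """Build tag -> titles once; this extra mapping is the memory cost."""
    index = defaultdict(list)
    for title, tags in store.items():
        for tag in tags:
            index[tag].append(title)
    return index


def titles_by_index(index, tag):
    """Indexed lookup: one dictionary access instead of a full scan."""
    return index.get(tag, [])


tag_index = build_tag_index(tiddlers)
```

Both lookups return the same titles; only the amount of work per lookup differs. A real index also has to be updated or rebuilt whenever a tiddler changes, which this sketch leaves out.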

-mario
 

Mark S.

Jul 14, 2021, 1:29:10 PM
to TiddlyWiki
In my case, a TW with only 30k entries slowed to a crawl when using the tag filter. This was AFTER indexing had been added. Most of the tiddlers in the TW had the same tag. My workaround was to change my filters to use the search filter instead of the tag filter. As best as I understand it, the indexing is set up to look for a small subset of tiddlers with a certain tag, not a large number of tiddlers with the same tag.

In terms of user management, the problem is that you can have tags that are metadata, tags that are semantic, and tags that are functional. My solution is to have a selectable filter in the editor that changes which set of tags I see. This could be expanded in other ways. For instance, you might want to see only tags that are also contacts, for labeling correspondence.

TW Tones

Jul 14, 2021, 8:49:57 PM
to TiddlyWiki
Folks,

Good to see this conversation continue;

Tags are beautiful and easy to adopt and should remain free and unconstrained. However, as they build in number they can get messy and complex.

To me the main issue with tag pollution is "user management", but tags are also the wrong solution, and result in more complexity, when for example you want a tiddler to have only one of a set of tags at the same time. Every time you manipulate one of these tags you have to take account of the others, and people can manually bypass your "rules", e.g. add both do and done tags to a tiddler.
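
A small Python sketch of that difference, under made-up assumptions: the `do`/`done` pair comes from the example above, while the `status` field name and the helper functions are hypothetical. With tags, each status change has to strip the sibling tags, and nothing prevents a manual conflict. With a single field, overwriting the value is enough.

```python
# Mutually exclusive status values (the do/done pair from the example above).
STATUS_TAGS = {"do", "done"}


def set_status_with_tags(tags, new_status):
    """Tag approach: every change must also strip the sibling status tags."""
    if new_status not in STATUS_TAGS:
        raise ValueError(f"unknown status: {new_status}")
    kept = [tag for tag in tags if tag not in STATUS_TAGS]
    return kept + [new_status]


def conflicting_status_tags(tags):
    """Detect manual rule-breaking, e.g. a tiddler tagged both do and done."""
    found = sorted(STATUS_TAGS.intersection(tags))
    return found if len(found) > 1 else []


def set_status_with_field(fields, new_status):
    """Field approach: one field holds one value, so overwriting is enough."""
    if new_status not in STATUS_TAGS:
        raise ValueError(f"unknown status: {new_status}")
    updated = dict(fields)
    updated["status"] = new_status  # "status" is a hypothetical field name
    return updated
```

The tag version needs both the cleanup logic and a separate conflict check; the field version cannot hold two values at once, so the conflict never arises.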

Long drop-downs because of many tags when tagging force you to remember part of the tag name at least and browsing is made complex.

Yes, the possible impacts of many tags can be managed to reduce the effect of tag pollution. My preference, however, is to choose the best technique as soon as possible for the functionality I need, and to leave tags available for instant and ad hoc groupings.

Regards
Tones

Jul 15, 2021, 12:28:32 AM
to TiddlyWiki
Hi all,

> Long drop-downs because of many tags when tagging force you to remember part of the tag name at least and browsing is made complex.

Long drop-downs are quite the bother. Displaying them over several columns beyond a certain number would be a start, I guess.

Moving away from TW itself and into user practices, and regarding the point about remembering part of the tag name, I use **numbers as tag prefixes** on my PIM (personal information manager), which essentially solves this issue. My tiddlers there require a certain number of tags, and I have a number of optional tags too. I then just have to cycle through the numbers, so all my tiddlers are always properly tagged and I never forget to apply an important tag. I make extensive use of prepopulated tiddlers and cloning so there is little extra friction, except when starting with a brand-new kind of note, and I actually view that as a positive investment in getting my taxonomy right. If I really lack time, I just slap a big "8requiring-maintenance" tag on it and leave it for later.

The overall logic could possibly make sense in situations other than PIMs. A few more details on my exact system, in case it may be of interest to anyone using TW as a PIM or otherwise:

I cycle through a minimum of 7 tag categories and a maximum of 12, usually checking only one of each: .domain (e.g. .Playbook, .MyBizName1, .Domos, .Zettelkasten) → 0core-theme (e.g. 0inventory-management, 0project-opportunity…) → 1key-motive (e.g. 1manage-daily-life, 1handle-ideas, 1improve-skills, 1manage-property…) → 2next-action (e.g. 2keep, 2experience, 2solve, 2ponder…) → 3object (e.g. 3heuristic, 3contact, 3PIM-tool, 3admin-info…) → 4ticket (e.g. 4next, 4later…) → 5development-stage (e.g. 5idea, 5draft, 5thorough, 5final…) → 6filing-type (e.g. 6research-note, 6contact-card; typically controls formatting) → [optional] 7confidentiality-class, 8extra-parameters (mostly 8requiring-maintenance and 8sandbox) → [optional] 9subtheme → [optional] @location/person. I find all of these categories very useful, except 0core-theme, which tends to be fairly redundant with either .domain or 1key-motive and which I am considering simply removing (shifting .domain to 0domain thanks to TiddlyCommander, my go-to tool for this kind of maintenance work). I don't feel anything is missing either. 8extra-parameters was supposed to serve as a catch-all in that regard, but beyond 8requiring-maintenance and 8sandbox I have only 2 experimental tags there, 8link-to-local-file being seemingly useful.
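
As an illustration of "cycling through the numbers", here is a minimal Python sketch of a completeness check. The exact set of mandatory prefixes is an assumption for the example, and the function name is made up; this is not a TiddlyWiki feature.

```python
# Assumed mandatory category prefixes, loosely based on the scheme above.
REQUIRED_PREFIXES = [".", "0", "1", "2", "3", "4", "5", "6"]


def missing_categories(tags, required=REQUIRED_PREFIXES):
    """Return the required prefixes for which the note has no tag yet."""
    present = {tag[0] for tag in tags if tag}
    return [prefix for prefix in required if prefix not in present]


# Example note that still lacks a 0core-theme and a 4ticket tag.
note_tags = [
    ".Zettelkasten",
    "1handle-ideas",
    "2ponder",
    "3heuristic",
    "5idea",
    "6research-note",
]
print(missing_categories(note_tags))  # ['0', '4']
```

Because each category is identified by the first character of the tag, checking a note only needs the set of first characters it already carries.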

I expect maintenance work to remain minimal over time as tag inflation is virtually nil by design for almost half of these tags and extremely low for another half or so, leaving me with inflation only in @location and 9subtheme. 9subtheme is doing fine, however, as I seem to be using subtheme tags mostly to identify specific projects and project inflation itself is gradual and low enough. 3object is probably my only concern, pollution-wise, as I currently have 40ish tags there, and seem to be regularly creating new ones. I am experimenting with a few maintenance tiddlers, mostly to help me identify tags that I thought were good ideas at one point but never caught on, so that I can keep this under control without putting in too much effort.
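
The idea behind such a maintenance check can be sketched in Python as follows. This is only an illustration with invented sample data and a made-up function name, not how the maintenance tiddlers are actually written: count how many notes carry each tag with a given prefix, then list the ones that are barely used.

```python
from collections import Counter


def rarely_used_tags(store, prefix, max_uses=1):
    """List tags with the given prefix used on at most max_uses notes."""
    counts = Counter(
        tag
        for tags in store.values()
        for tag in set(tags)  # count each tag once per note
        if tag.startswith(prefix)
    )
    return sorted(tag for tag, uses in counts.items() if uses <= max_uses)


# Invented sample data: title -> tags.
notes = {
    "Note A": ["3heuristic", "5idea"],
    "Note B": ["3heuristic", "3contact"],
    "Note C": ["3PIM-tool", "3heuristic"],
}
```

Calling `rarely_used_tags(notes, "3")` on this sample lists the two 3object tags that appear on a single note each, which are the candidates for review or merging.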

Regards,