Hi Wolfgang,
I missed this post too! Thanks for the pointer to your experience; I
found it very helpful.
For my needs, I will have hundreds of structured datasources, each
roughly the size of your dictionary, usually representing the metadata
attached to an experiment log. I still want to easily link to parts of
the datasources when I write something, or to search them, but I'm not
sure the resources need to be in tiddler format (a TW file) unless I
update/add to the datasource.
To continue your investigation, in an attempt to make TW useful for
working with larger data structures, I wonder if anyone has any
comments on the following approaches:
*Storing data in SQLite (the database included with Firefox):
Zotero (zotero.org), a references datastore with page scrapers that is
already used by many academics, could probably handle your dictionary
example. I once considered creating a TW that interacts with the
Zotero tables using Jack's SQLQueryPlugin, but some scientists I know
have complained that the growing volume of their references has
considerably slowed their interactions with the tool, so I did not
pursue it - my interests have moved toward Solr, or toward creating a
TiddlyWiki version that shards data according to the domains I work
between.
*Storing data in Solr:
Solr will easily handle your dictionary and several hundred more, it
can pass JSON structures, and it is simple to batch updates to Solr
from the TW RSS feed. Solr can be thought of as a flat database, so
tiddler objects map onto it easily (a rough sketch follows below). I
have some bias though, since I have already invested considerable time
in the project.
One issue with Solr is its requirement for a Java runtime - for part
of the user base I would like to help, it is probably not feasible to
expect one.
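To illustrate the mapping, here is a rough sketch of pushing tiddler
objects to Solr's JSON update handler. The field names and the core
name "tiddlers" are my own assumptions, not an established schema, and
this assumes a Solr build with JSON update support:

  // A tiddler is essentially a flat record, so it maps naturally
  // onto a Solr document. Field and core names are assumptions.
  interface Tiddler {
    title: string;
    text: string;
    tags: string[];
    modified: string; // ISO date
  }

  async function pushToSolr(tiddlers: Tiddler[]): Promise<void> {
    // POST an array of documents to the JSON update handler;
    // commit=true makes them searchable immediately.
    const res = await fetch(
      "http://localhost:8983/solr/tiddlers/update?commit=true",
      {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(tiddlers),
      }
    );
    if (!res.ok) throw new Error("Solr update failed: " + res.status);
  }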
*Enhance TW to actively partition notebooks and create/load
precomputed indexes based on rule sets derived from tags, alphanumeric
range, username, date, etc.:
As noted in the Pali dictionary example, loads and edits are slow with
larger collections, but searches are relatively fast. This has been my
experience too, even with the 8MB references file I maintain; I rarely
want to edit this file, just search it. So, when priming the index for
the TW search command, perhaps a parameter could be referenced that
defines a set of additional precomputed index structures to load into
scope. The indexes would be derived from a datasource such as other TW
files (e.g. the Pali dictionary) or another file format (e.g. CSV). If
I need to create/update a reference (e.g. a Pali dictionary entry), a
tiddler object is created in my primary notebook as needed. Changes
are then propagated back to the datasource (e.g. a TW document, CSV
file, or server) based on user preference (batched as a monthly sync,
lazy loaded during minimal activity, etc.) - a sketch of such a rule
set follows below.
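A minimal sketch of the shape of the idea - every name here is
hypothetical, since nothing like this exists in TW today:

  // Hypothetical declaration of precomputed indexes for the TW
  // search command to load into scope, plus a sync preference
  // for propagating edits back to the datasource.
  interface IndexRule {
    name: string;
    source: string;              // a TW file, CSV, or server URL
    partitionBy: "tag" | "alphaRange" | "username" | "date";
    syncPolicy: "monthly-batch" | "lazy-on-idle";
  }

  const indexRules: IndexRule[] = [
    { name: "pali-dictionary", source: "pali_dictionary.html",
      partitionBy: "alphaRange", syncPolicy: "monthly-batch" },
    { name: "references", source: "references.csv",
      partitionBy: "tag", syncPolicy: "lazy-on-idle" },
  ];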
The idea is to keep your primary notebook small, and thus able to
persist edits quickly. It is a short-term memory of sorts, always in
RAM - a bag of essential objects (tasks, reminders, etc.), plugins,
and themes. This primary notebook would also manage profiles pointing
to sets of index structures appropriate for the various realms I may
exist in, e.g. all tiddlers I authored over the last year, my wife's
website, my research and experiments, TiddlyWiki dev bits - perhaps
using iframes to load a set of objects as a new TW file (a sketch of
such profiles follows below). This also sounds a little like what the
CognyWiki authors had in mind, in that at any given point in time my
brain is likely pointed along some task axis; it's just that
occasionally my tasks have large collections of connected objects I
would like to maintain local references to.
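Again purely hypothetical, a profile might be no more than a named set
of those indexes:

  // Hypothetical realm profiles: the primary notebook stays small,
  // and switching realms swaps which precomputed indexes are in
  // scope.
  const profiles: Record<string, string[]> = {
    biomedical: ["molecules", "patients", "references"],
    home: ["energy-clippings", "neighbor-emails"],
    "tw-dev": ["tiddlywiki-dev-bits"],
  };

  function indexesFor(realm: string): string[] {
    return profiles[realm] ?? [];
  }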
Does it seem like this might improve user interaction when working
with sets of large objects?
It is likely more than you need for the dictionary example, but it
should scale to handle many more dictionaries.
To further explain my reasoning:
Even for a lexically and data-rich domain such as the life sciences,
one is provided many reasonable dimensions along which to split
information into smaller chunks. There are many molecules in
nature, but I am often only interested in molecules of a particular
mass range (e.g. 'small molecules', or proteins), and usually just the
set that influences the particular biochemical pathways I am working
with. Or, if I am looking at a mass spec dataset for a biomarker
search of patient plasma, I might only need the small-peptide part of
my molecule datastore, where I may have entered several new peptide
entries based on some external tool's sifting of the dataset. Later, I
might want to sync the new entries with some data authority after I
publish a paper. The algorithms in the external tool may have actually
identified a few hundred different peptides in the sample, in addition
to my three new unidentified peptides - all of which are now
referenced in my molecule datastore. Other times, when I am in my
HomeRealm, I have MBs of statistics and clippings on electric tankless
water heaters and other energy topics, notes, emails from a neighbor,
etc. I do not need this information in RAM when I am in the biomedical
branch of my task realm.
Based on a quick review of the plugins available, perhaps this might
serve as the foundation for the solution I am proposing (again
comments welcome):
It wasn't entirely clear to me from the post here:
http://groups.google.com/group/TiddlyWiki/browse_thread/thread/640aba056052fffc
but I think this is the direction the author was headed in, toward
persisting index structures - maybe some mix of that plugin,
SearchPlusPlugin, and the Includes/MasterIncludes plugin will provide
the building blocks for a solution(?)
I wonder at what point on such a path I would be recreating a large
part of Solr in JavaScript - not something I want to do. So, maybe it
is better to stuff everything into Solr and only use TW for the things
I regularly edit (appointments, etc.)? I'd still apply the concept of
a short-term memory mapped to a TW file, but as my task realms change,
alternately load and query various Solr indexes with HTTP/JSON
(localhost.molecules, localhost.patients, university_server, etc.),
something like the query sketch below.
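This assumes Solr's standard select handler with wt=json; the core
names are just my examples:

  // Query whichever Solr core matches the current task realm.
  async function querySolr(core: string, q: string): Promise<any[]> {
    const url = "http://localhost:8983/solr/" + core + "/select?" +
      new URLSearchParams({ q, wt: "json", rows: "20" });
    const res = await fetch(url);
    const data = await res.json();
    return data.response.docs; // standard Solr JSON response shape
  }

  // e.g. querySolr("molecules", "mass:[100 TO 1000]")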
ian
On Jul 5, 5:11 pm, wolfgang <
wolfgangl...@gmail.com> wrote:
> Hi Ian,
>
> when I tried to create a simple dictionary I soon met TiddlyWiki's
> database limits. This caused me to plea for "an incredible better
> DataTiddlerPlugin" in the following thread:
http://groups.google.com/group/TiddlyWikiDev/browse_thread/thread/b05...