Test data proposal and calls for contributions


TonyM

Nov 24, 2018, 10:31:24 PM
to TiddlyWiki
Folks,

With the advent of 5.1.18 I plan to build some test data and test data generators and publish them so we can share some standard test data sets.

When people struggle to use complex filters, or to interrogate large TOCs, data structures or even simple datasets, they are often either reluctant to share their own data or have to develop an alternative dataset to share with the community.

So I propose we build and share a set of test data examples that can be used as standard examples when describing problems or solutions. To make it even more practical, I think it should be hosted on a NoteSelf instance so you can run tests and save them in your own browser session.

A similar but separate resource could also publish useful datasets, like lists of country or airport codes, or US or Australian states, which are not just test data but which people can use to build their solutions.

Such resources are easy to build over time using a crowdsourcing approach, and they will grow rapidly. Packaging them as plugins or tiddler bundles could keep the wiki size small while still building a substantial library.

What test or reference data do you want?
What test or reference data can you share?
Do you have suitable data sets, import processes or generation macros?
Any other ideas?

Please contribute in this thread.

Thanks in advance
Tony

TonyM

Nov 25, 2018, 12:48:56 AM
to TiddlyWiki
Test data I am planning, or hope people can provide, includes:

Some scientific data like geological epochs, genus and species; a year of tiddlers with date fields set, before and after today; a random network of nodes, randomly tagged; a hierarchy of nodes; a genealogical tree; a network of clustered networks; some kind of linked or hierarchical reference work; lists of lists, both unique and cross-linked.

International airport codes, Australian post codes.

Internet top-level country domains, international greenhouse emissions (yes, shame on Australia and the US).

This will not happen overnight, so help would be appreciated; feel free to send some datasets or JSON files of tiddlers.
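As a starting point for the "year of tiddlers with date fields" idea, a JSON file of tiddlers like the ones requested here could be generated outside the wiki. The Python sketch below is only an illustration: the `due` field name, the `Day YYYY-MM-DD` titles and the `TestData` tag are assumptions, not an agreed standard. It does follow TiddlyWiki's usual date-field format (`YYYYMMDDHHMMSSmmm`, UTC).

```python
import json
from datetime import date, timedelta


def year_of_dated_tiddlers(today: date, days: int = 365) -> list:
    """One tiddler per day, centred on `today`, so about half the dates fall
    before today and half after. Field, tag and title names are hypothetical."""
    start = today - timedelta(days=days // 2)
    tiddlers = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        tiddlers.append({
            "title": f"Day {day.isoformat()}",
            "tags": "TestData",
            # TiddlyWiki date format: YYYYMMDDHHMMSSmmm (here midnight UTC)
            "due": day.strftime("%Y%m%d") + "000000000",
            "text": f"Test tiddler dated {day.isoformat()}.",
        })
    return tiddlers


if __name__ == "__main__":
    # Redirect to a file, e.g. `python year.py > year.json`, then drag the
    # file onto the wiki to import it.
    print(json.dumps(year_of_dated_tiddlers(date.today())))
```

Passing a fixed `today` rather than `date.today()` would make the set reproducible, which matters if it is meant to serve as a shared reference.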

Regards
Tony

Mohammad

Nov 25, 2018, 3:07:41 PM
to TiddlyWiki
Hello Tony,

In response to your post, I would like to share the bulk tiddler creator code! It is not directly related to the test data you asked for (meaningful datasets), but it can be used for some intensive tests on TW 5.1.18.
The code can create thousands (or millions) of new tiddlers in three levels, in hierarchical order. A table of contents in the sidebar lists them.

New tiddlers have lorem ipsum text and a sample field, and are tagged with test.
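Mohammad's generator itself is a wiki macro that isn't reproduced here. For anyone who would rather produce an importable JSON file outside the browser, a rough Python equivalent might look like the sketch below. It assumes the hierarchy is expressed by tagging each tiddler with its parent's title (the convention TiddlyWiki's `toc` macros follow); the `TestRoot` root tag, the `sample` field value and the `Test 1.2.3` title scheme are all hypothetical.

```python
import json

LOREM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."


def tags_field(tags: list) -> str:
    """Format a TiddlyWiki tags field: titles containing spaces go in [[...]]."""
    return " ".join(f"[[{t}]]" if " " in t else t for t in tags)


def bulk_tiddlers(l1: int, l2: int, l3: int, base: str = "Test",
                  root: str = "TestRoot") -> list:
    """Three-level hierarchy; each tiddler is tagged `test` plus its parent."""
    tiddlers = []

    def make(title: str, parent: str) -> None:
        tiddlers.append({
            "title": title,
            "tags": tags_field(["test", parent]),
            "sample": "sample value",  # hypothetical sample field
            "text": LOREM,
        })

    for i in range(1, l1 + 1):
        t1 = f"{base} {i}"
        make(t1, root)  # top level hangs off the root tag
        for j in range(1, l2 + 1):
            t2 = f"{t1}.{j}"
            make(t2, t1)
            for k in range(1, l3 + 1):
                make(f"{t2}.{k}", t2)
    return tiddlers


if __name__ == "__main__":
    print(json.dumps(bulk_tiddlers(3, 3, 3), indent=1))
```

After importing, `<<toc "TestRoot">>` in a tiddler should render the tree, since each level is found through the tag that names its parent.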

You can find my Test Data wiki here at tiddlyspot:




Cheers
Mohammad

TonyM

Nov 25, 2018, 6:18:04 PM
to TiddlyWiki
Thanks heaps Mohammad,

Just the kind of thing we need.

Regards
Tony

@TiddlyTweeter

Nov 26, 2018, 6:19:36 AM
to TiddlyWiki
Ciao Mohammad & Tony

I think, overall, for good basic testing Mohammad's approach is optimal. Far better to have a generator than lots of more static solutions.

Comments ...

I could successfully generate hundreds of Tiddlers with it. BUT when I asked it to make 6000, it ground to a halt and locked up the browser for too long.

But I still think the approach is good.

I can suggest ideas to hone it IF you want. Just ask.

Best wishes
Josiah

@TiddlyTweeter

Nov 26, 2018, 6:32:51 AM
to tiddl...@googlegroups.com
TonyM

Regarding test data ...

1 - a GENERATOR of test tiddlers with a flexible specification seems best for most cases. Mohammad's tool is heading well in that direction.

2 - a data set for SEQUENTIAL documents. For instance a novel chunked to paragraphs with an ordering field filled in with an INDEX value. Here the issue is a data set you can use to hone presentation order.

3 - a data set for an ENCYCLOPEDIA wiki. A large set of field-rich tiddlers that collectively comprise a complete knowledge system for a specialist field of endeavour. Here the issue is organising optimal structured access.

4 - individual Tiddlers aimed at understanding CSS, as you asked me for in another thread, which I will work on over the coming weeks.

These seem to cover most cases. Though I think (1) is likely the most universally useful, since it's likely a better Swiss Army Knife for the issues being tested.
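For the SEQUENTIAL case in (2), a test set could be made by splitting any public-domain text on blank lines. A minimal Python sketch, assuming a hypothetical `index` field and `Chapter 1 para 01`-style titles. The index is zero-padded so that a plain alphabetical sort matches reading order (TiddlyWiki's `nsort` filter operator would also cope with unpadded numbers).

```python
import json


def chunk_text(text: str, chapter: str = "Chapter 1") -> list:
    """Split text on blank lines into paragraph tiddlers tagged with the
    chapter, each carrying a zero-padded `index` field for ordering."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    width = len(str(len(paragraphs)))  # e.g. 2 digits for 10-99 paragraphs
    tag = f"[[{chapter}]]" if " " in chapter else chapter
    tiddlers = []
    for n, para in enumerate(paragraphs, start=1):
        index = f"{n:0{width}d}"
        tiddlers.append({
            "title": f"{chapter} para {index}",
            "tags": tag,
            "index": index,  # hypothetical ordering field
            "text": para,
        })
    return tiddlers


if __name__ == "__main__":
    sample = "It was a dark night.\n\nThe wind howled.\n\nThen it stopped."
    print(json.dumps(chunk_text(sample), indent=1))
```

After import, a filter such as `[tag[Chapter 1]sort[index]]` should list the paragraphs in reading order, which gives a fixed baseline for comparing TOC and rendering approaches.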

Buongiorno
Josiah

Mohammad

Nov 26, 2018, 6:40:43 AM
to TiddlyWiki
Hello Josiah,
I think you should be able to create around 10,000 tiddlers with no problem, but it takes a little more time. For example, use L1, L2, L3 = 10, 50, 20:

10 + 10*50 + 10*50*20 = 10510 tiddlers
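The total is each level's count multiplied through: level 1 has L1 tiddlers, level 2 has L1*L2 and level 3 has L1*L2*L3. A tiny helper (hypothetical, not part of Mohammad's tool) makes it easy to check a combination before running it:

```python
def total_tiddlers(l1: int, l2: int, l3: int) -> int:
    """Level 1 has l1 tiddlers, level 2 has l1*l2, level 3 has l1*l2*l3."""
    return l1 + l1 * l2 + l1 * l2 * l3


print(total_tiddlers(10, 50, 20))  # 10510
print(total_tiddlers(10, 10, 10))  # 1110
```

The third level dominates the count (10,000 of the 10,510 in the example above), so L3 is the value to reduce first when a batch needs to stay small.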

When you want a higher number of tiddlers, TiddlyWiki seems to take a lot of time and the browser halts.

This is something Jeremy could explain. It may be related to TW itself, in the way it schedules jobs and manages memory.
On Windows I noticed TiddlyDesktop and NW.js keep working in the background, but take a lot of time to produce the results.

I think this code is good for intensive testing.

One more point: in the demo I have added a line of code that deletes previously created tiddlers. You can remove it, so that you can produce a large number of tiddlers in several smaller batches.
Mohammad

TonyM

Nov 26, 2018, 8:28:11 AM
to TiddlyWiki
Folks,

As my original post states, test data generators are great, but we also need them to produce common or standard sets that can be used as a reference.

Further, some standard sets with realistic or even useful content are required, as suggested in my second post.

An example I may provide is a set of domains, containing a set of projects with a set of tasks with a set of different status settings in each task.

Another may be a random network of nodes linked by tags and/or field values.

And another, a genealogical tree with two parents per node.
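The genealogical tree could also come from a generator rather than being typed by hand. This Python sketch is one assumed way to model it: each person after the first generation gets two distinct parents drawn from the previous generation, stored in hypothetical `parent1`/`parent2` fields. A fixed random seed means the same call should give everyone the same dataset (on the same Python version), which is what a shared reference set needs.

```python
import json
import random


def family_tree(generations: int, per_generation: int, seed: int = 1) -> list:
    """People grouped by generation; everyone after generation 0 gets two
    distinct parents chosen at random from the generation before."""
    if generations > 1 and per_generation < 2:
        raise ValueError("need at least 2 people per generation to pick 2 parents")
    rng = random.Random(seed)  # fixed seed -> reproducible dataset
    tiddlers, previous = [], []
    for g in range(generations):
        current = []
        for i in range(1, per_generation + 1):
            person = {"title": f"Person G{g}-{i}", "tags": "Person",
                      "generation": str(g)}
            if previous:
                person["parent1"], person["parent2"] = rng.sample(previous, 2)
            tiddlers.append(person)
            current.append(person["title"])
        previous = current
    return tiddlers


if __name__ == "__main__":
    print(json.dumps(family_tree(3, 4), indent=1))
```

Keeping the links in fields rather than tags means children can be found with ordinary field filters, e.g. `[field:parent1[Person G0-1]]` combined with the same query on `parent2`.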


Thanks for the contributions so far
Regards

Magnus

Nov 26, 2018, 9:12:00 AM
to TiddlyWiki
I have a TW with all genus & species from Orchidaceae (orchids), some 29000 tiddlers. Most are empty, but for testing it might suffice. I hope it's not corrupt; IE10 wanted to (and did) save it as .html.


Mohammad

Nov 27, 2018, 12:34:10 AM
to TiddlyWiki
Hello Josiah,
I revised the code and added the following capabilities:

- a checkbox lets you keep or delete previously created tiddlers
- a base name lets you change the name prefix of all three levels of created tiddlers

This way you can create a huge number of tiddlers in several steps, each step under 10,000, to prevent the browser from freezing.


Warning: creating tiddlers with the same base name several times, without deleting the previously created ones, causes a numerical counter to be appended to the title and incremented until the title is unique. As a result, the table of contents in the sidebar will be affected.
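The counter behaviour described in the warning can be pictured with a small Python model. This is not TiddlyWiki's or Mohammad's code; the assumption is simply that a space and an incrementing number are appended until the title is free.

```python
def unique_title(base: str, existing: set) -> str:
    """Append ' 1', ' 2', ... to `base` until the title is not already taken
    (a model of the renaming behaviour, not TiddlyWiki's own code)."""
    title, counter = base, 0
    while title in existing:
        counter += 1
        title = f"{base} {counter}"
    return title


taken = {"Test 1", "Test 1 1"}
print(unique_title("Test 1", taken))  # Test 1 2
print(unique_title("Test 2", taken))  # Test 2
```

A renamed tiddler no longer has the title that other tiddlers' tags (and hence the TOC) expect, which is presumably why the sidebar tree changes after repeated runs.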



Best
Mohammad

Mohammad

Nov 27, 2018, 5:40:31 AM
to tiddl...@googlegroups.com
Hello again

Further input,
Right now test-data.tiddlyspot.com has 31470 tiddlers and is around 100 MB.
It seems the browser (Chrome, in my case) struggles to handle that much data.

That said, TiddlyWiki 5.1.18 still shows the results, and when only a few tiddlers are open it handles the job with ease, if a bit slowly!


Warning: it is 100 MB, so be careful if you want to open it on your cell phone!
You can simply delete all tiddlers and download a light copy!

Mohammad

@TiddlyTweeter

Nov 27, 2018, 5:49:06 AM
to TiddlyWiki
TonyM wrote:
.... we need them also to produce common or standard sets that can be used as a reference.

Could you say a bit more about this? I'm trying to get clearer about what needs testing ... What are some typifying cases you can think of? What makes a "reference point" in this?

My own interest is around optimizing performance for ...

(1) longer wikis structured like novels & screenplays ... i.e. sequential texts ... One issue here is TOC structure. Another is the size of a "chunk" (a chapter? a paragraph?).
Another is how much to render at a time. But the structure itself is simple.

(2) big wikis consisting of zillions of fragments ... e.g. tweet-length Tiddlers that need good search & richer tagging to be found, i.e. potentially multi-structured

Best
Josiah

TonyM

Nov 27, 2018, 6:02:32 AM
to TiddlyWiki
Josiah,

On one hand

Some time ago someone asked a question and, to illustrate it, had some example data: a very small set relating to fruits. It took a bit of the conversation to get a JSON file posted, but then more than one person put it in their own wiki and provided a solution that worked against the test data. We were then all on the same page, and the solution was tested.

It would be helpful, when trying to do simple or complex things, to point to an existing test dataset and build the solution against it. If you have a problem, you can share your plugin/macro, reference the dataset, and show the results it generates.

It would also save time to be able to grab sample data when testing an idea. And if there is a hard-to-solve problem, it would be trivial to state "When using test data setname and the following macro the result is xyz; is this a bug or am I doing something wrong?" Then it is easy for anyone to replicate the problem and work on their own fix, and if they publish their solution they can quote the result they get from the same shared dataset.

On the other hand, many of us may just want to test an idea without sharing it at all; having datasets or test data generators will be helpful there too.

Regards
Tony

TonyM

Nov 27, 2018, 6:03:25 AM
to TiddlyWiki
Magnus,

That is great. I have not converted it yet, but it's promising.

Regards
Tony

@TiddlyTweeter

Nov 27, 2018, 6:39:58 AM
to TiddlyWiki
Ciao Mohammad

I'll comment in more detail in a day or two. I'm really using your tool already!

Maybe these points for potential users could be helpful to add to a Tiddler shown on first use?

  -- For testing a wiki, don't add test data to the original, only to a copy of your wiki!

  -- If you create Test Tiddlers in large numbers (thousands) your browser may slow down or freeze for some time.

  -- If you are doing a lot of testing, it's advisable to use a browser instance just for that wiki so it runs separately from other pages.

Just thoughts
Josiah

@TiddlyTweeter

Nov 27, 2018, 6:58:33 AM
to tiddl...@googlegroups.com
TonyM wrote:
Some time ago someone presented a Question, to illustrate they had some example data, it was a very small set, it related to fruits, it took a bit of the conversation to get a json file posted and more than one person put  it in their own wiki and provided a solution that worked against the test data. We were then all on the same page, the solution was tested.

Picking up on one aspect of this ... I'm wondering if someone has the skill to create a macro that interrogates the wiki and produces a Wiki Spec Report ... using https://tiddlywiki.com/#InfoMechanism to detect the environment, plus global totals like those found in ControlPanel/Info ...



Footnote: the report should not be dynamic ... by which I mean its output should be a "snapshot tiddler" which can be shared.
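A macro inside the wiki is what's being asked for. As a swapped-in alternative that works from outside, here is a Python sketch that turns a JSON export of a wiki's tiddlers into a static snapshot tiddler with a few global totals. It cannot see browser or environment details (the kind of thing InfoMechanism exposes), and the report title and the chosen totals are assumptions.

```python
import json
import re
from collections import Counter
from datetime import datetime, timezone
from typing import Optional


def parse_tags(field: str) -> list:
    """Split a TiddlyWiki tags field; [[...]] groups titles containing spaces."""
    return [a or b for a, b in re.findall(r"\[\[(.*?)\]\]|(\S+)", field)]


def spec_report(tiddlers: list, now: Optional[datetime] = None) -> dict:
    """Build a static snapshot tiddler summarising an exported tiddler list."""
    now = now or datetime.now(timezone.utc)
    tags = Counter(tag for t in tiddlers for tag in parse_tags(t.get("tags", "")))
    fields = {name for t in tiddlers for name in t}
    lines = [
        f"* Tiddlers: {len(tiddlers)}",
        f"* Distinct tags: {len(tags)}",
        f"* Distinct field names: {len(fields)}",
        f"* Characters of text: {sum(len(t.get('text', '')) for t in tiddlers)}",
    ]
    return {
        "title": f"Wiki Spec Report {now:%Y-%m-%d}",  # hypothetical title scheme
        "created": now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 1000:03d}",
        "text": "\n".join(lines),
    }


if __name__ == "__main__":
    sample = [
        {"title": "A", "tags": "test [[Test 1]]", "text": "abc"},
        {"title": "B", "tags": "test", "sample": "x", "text": "de"},
    ]
    print(json.dumps(spec_report(sample), indent=1))
```

Because the result is a plain tiddler rather than a live macro, it can be imported and shared alongside a question as a fixed snapshot.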


Just a thought

Josiah


@TiddlyTweeter

Nov 27, 2018, 7:22:51 AM
to TiddlyWiki
Ciao Tony

Thanks. Now I better understand what you meant. Which is somewhat like "grow knowledge through triangulation"? I.e. tests done using known data sets. Right?

What I was thinking about was whether we yet have general "Rules-Of-Thumb" for performance-critical factors. Over time I've read threads about the relative efficiency of standard fields over tag fields, and about balancing the needs of organisation (rendering overhead) against having only "just-enough" organisation (reducing recursive rendering as much as possible) and no more.

I can't say I'm clear yet. And maybe, given TW's richness, it's kinda hard to posit anything conclusive as "rules". But, IME, once you get to scale (thousands of Tiddlers) some generic issues get clearer. And are perhaps worth articulating?

Just background thoughts
Josiah

@TiddlyTweeter

Nov 27, 2018, 7:47:07 AM
to TiddlyWiki
I agree. It's very useful. It's substantial and well populated with fields that have content. I don't think it matters that the "text" field is empty in most Tiddlers.

Great for testing "complex schema" ... I.e. ...

... very good for what one might call the ENCYCLOPEDIA scenario?

Best wishes
Josiah

TonyM wrote:
That is great ...

Magnus wrote:
I have a TW with all genus & species from Orchidaceae (orchids), some 29000 tiddlers, most are empty but for testing might suffice.

@TiddlyTweeter

Nov 27, 2018, 8:34:15 AM
to TiddlyWiki
Ciao Tony & all

Great thread. Many thanks for it!

What emerged more clearly from this thread is the issue: "How Do I Measure This?" How do I objectify my intuitions on performance?

What I mean is ... okay we have data sets to test. But how can I ...

-- measure performance reliably & comparably?

In a previous discussion I asked about the viability of using "now" to log start and end times of a render as a simple measure. The answer was, basically, "it's not accurate enough" ... "you should use the in-browser debug console" ...

... but what I'm still unclear about is what to look for in the in-browser console and how to relate it back to the TW under study and improve it.

Can someone provide an example?


