Tiddlywiki and regexp examples part ii: Working with fields


Mohammad

Aug 27, 2019, 7:01:07 AM
to TiddlyWiki
In this part, the questions are about finding and matching a substring in a tiddler's fields. The text field is NOT covered here;
we will discuss it in another thread, as it seems to need more work.

Target fields: created modified

Write regexp patterns to match:

  1. created/modified in August 2019
  2. created/modified on Wednesdays
  3. created/modified on the 3rd day of each month
  4. created/modified on the 1st of December of each year
  5. created/modified in January of each year


Target fields: other user fields
  1. store a url in format http://xx.yy... or https://xx.yy...
  2. store an email: like jeremy...@tiddlywiki.com or jer...@tw5.org
  3. store an integer number only: like 598012543
  4. store a negative integer number only: like -598012543
  5. store an image filename like .png/.jpeg/.tiff/.gif
  6. store a transcluded value in the form of {{aaa}}
  7. store a transcluded value in the form of {{aaa!!bb}}
  8. store an html tag like <div class="" style="">content </div>
  9. store image like [img[source]] or [img width=xx [source]]
  10. store a floating number 1.2365
  11. store time in format hh:mm:ss or hh:mm:ss am/pm

Please give solutions! Please suggest other use cases too.
Focus only on user fields and system fields, excluding Text, Tags and Title.

Cheers
Mohammad


@TiddlyTweeter

Aug 27, 2019, 8:57:37 AM
to TiddlyWiki
Mohammad

One issue is how to handle languages other than English. Regex shorthand classes like \w don't include accented letters. But there are ways to deal with that.
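
A minimal JavaScript sketch of that limitation, assuming the pattern is compiled without the "u" flag (as a plain new RegExp(string) does). The extended class and the sample words are illustrative assumptions, not something from TW itself.

```javascript
// Without the "u"/"v" flags, \w in JavaScript is exactly [A-Za-z0-9_].
const asciiWord = new RegExp("^\\w+$");

// Hypothetical class: \w extended with the German letters listed later in this thread.
const germanWord = new RegExp("^[\\wßÄÖÜẞäöü]+$");

// Illustrative sample words.
const samples = ["uber", "über", "Straße"];
for (const s of samples) {
  // Prints the word, whether plain \w covers it, and whether the extended class does.
  console.log(s, asciiWord.test(s), germanWord.test(s));
}
```

Only "uber" passes the plain \w test; all three pass the extended class.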

TT

Mohammad

Aug 28, 2019, 1:48:13 AM
to TiddlyWiki
Some solutions to the first group:
modified in 2019: ^2019
modified in January 2019: ^201901
modified in August (any year): ^\d{4}08
modified on 3rd of each month: ^\d{6}03
modified on 1st of December (any year): ^\d{4}1201
modified between 1st and 9th of December (any year): ^\d{4}120

The question of matching created/modified on Wednesdays seems tricky and needs some scripting.
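
As a rough idea of what such scripting could look like, here is a JavaScript sketch. It assumes the field holds the 17-digit value YYYYMMDDhhmmssmil and that the value is in UTC; the function names parseTwDate and isWednesdayUtc are made up for this example and are not TW APIs.

```javascript
// Parse a 17-digit TiddlyWiki-style date string (YYYYMMDDhhmmssmil) as a UTC Date.
// Returns null when the string does not have that shape.
function parseTwDate(stamp) {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{3})$/.exec(stamp);
  if (!m) return null;
  const [y, mo, d, h, mi, s, ms] = m.slice(1).map(Number);
  return new Date(Date.UTC(y, mo - 1, d, h, mi, s, ms));
}

// True when the stamp falls on a Wednesday in UTC (getUTCDay() returns 3 for Wednesday).
function isWednesdayUtc(stamp) {
  const date = parseTwDate(stamp);
  return date !== null && date.getUTCDay() === 3;
}

console.log(isWednesdayUtc("20190828014813000")); // 28 August 2019 -> true
console.log(isWednesdayUtc("20190827150116448")); // 27 August 2019 -> false
```

The weekday is computed from the calendar rather than read off the digits, which is why a plain pattern on the field cannot do it.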

Cheers
Mohammad

@TiddlyTweeter

Aug 28, 2019, 1:58:05 AM
to TiddlyWiki
Target fields: created modified

write regexp patterns to match
  1. created/modified  in August 2019
  2. created/modified on Wednesdays
  3. created/modified on 3rd day of each month
  4. created/modified on 1st of December of each year
  5. created/modified on January of each year
You mean, match against the number in "modified"?

For No. 2, I can't see a way in regex to do that from date numbers.

Modified / Created fields contain a number, meaning
YYYYMMDDhhmmssmil
20190827150116448


August 2019
^201908
"^" = match from start of field

3rd day of any month, any year
^......03
"." = match any single character except a line break

1st of December of any year
^....1201
  or 
^\d{4}1201
"\d" = shorthand for [0-9]
"{4}" = repeat preceding pattern exactly 4 times

January of any year
^....01
  or
^\d{4}01

The 2nd & 7th of March & April 2017
^20170[34]0[27]
  or
^20170(3|4)0(2|7)
"[..]" = Character Class
"(..)" = Capturing Group; "|" = alternation 

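The patterns above can be tried outside TW with a short JavaScript sketch. The labels and the second and third sample values are assumptions for illustration; the first sample is the value quoted in this post. Backslashes are doubled because the patterns are written as JavaScript strings.

```javascript
// Patterns from this post, keyed by an illustrative label.
const patterns = {
  "August 2019": "^201908",
  "3rd of any month": "^\\d{6}03",
  "1st of December": "^\\d{4}1201",
  "January of any year": "^\\d{4}01",
  "2nd/7th of Mar/Apr 2017": "^20170[34]0[27]",
};

// Return the labels of all patterns that match a given field value.
function matchingLabels(stamp) {
  return Object.keys(patterns).filter(
    (label) => new RegExp(patterns[label]).test(stamp)
  );
}

console.log(matchingLabels("20190827150116448")); // [ 'August 2019' ]
console.log(matchingLabels("20170407093000000")); // [ '2nd/7th of Mar/Apr 2017' ]
console.log(matchingLabels("20181201000000000")); // [ '1st of December' ]
```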
---

Q: Anyone want to try 4-9pm on all days from the 1st of August to the 25th? :-)
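
One possible reading of that quiz, as a JavaScript sketch. It assumes "4-9pm" means the stored hour is 16 through 20 (16:00:00 to 20:59:59), that any year counts, and that the value is taken as stored, with no time zone conversion. The sample values are made up.

```javascript
// ^\d{4}            any year
// 08                August
// (0[1-9]|1\d|2[0-5])  day 01 to 25
// (1[6-9]|20)       hour 16 to 20
const quiz = new RegExp("^\\d{4}08(0[1-9]|1\\d|2[0-5])(1[6-9]|20)");

console.log(quiz.test("20190815163000000")); // true: 15 August, 16:30
console.log(quiz.test("20190825205959999")); // true: 25 August, 20:59
console.log(quiz.test("20190826170000000")); // false: 26 August
console.log(quiz.test("20190801210000000")); // false: 21:00
```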

TT

Mohammad

Aug 28, 2019, 2:07:44 AM
to TiddlyWiki
Hi TT,
 By modified, I mean the date value stored in the modified field of a tiddler, which is set automatically by TW.
Thanks for providing alternative solutions and extra examples.

Cheers
Mohammad

Mohammad

Aug 28, 2019, 2:17:25 AM
to TiddlyWiki
Q: Anyone want to try 4-9pm on all days from the 1st of August to the 25th? :-)

TT

Is this a quiz for Mark?

:-) 

Mark S.

Aug 28, 2019, 10:32:24 AM
to TiddlyWiki
The 3rd of the month AT Greenwich, or somewhere else?

You would need to make a different match for every target locale. One location's 3rd is another location's 2nd and another location's 4th.

I don't think regex is a good match for this kind of date comparison. What we need are more tools that will allow us to access and compare
date stamps. More ways to convert local dates into UTC, add/subtract days, and then convert them back. Filters that understand days of
the week, month, year. That sort of thing.

Thanks!

Mohammad

Aug 28, 2019, 10:39:00 AM
to TiddlyWiki
Hi Mark,
 By third I mean what is stored in the modified field, so if it is yyyymm03??.. I interpret it as the 3rd of the month!

--Mohammad

@TiddlyTweeter

Aug 28, 2019, 10:41:21 AM
to TiddlyWiki
It is.

@TiddlyTweeter

Aug 28, 2019, 10:42:16 AM
to TiddlyWiki
What's Greenwich got to do with it?

Mark S.

Aug 28, 2019, 12:11:05 PM
to TiddlyWiki
What, have I stumbled into some political landmine? GMT <> UTC ?

It can be the 3rd of the month (with an easy regexp) in one place, but the 2nd of the month somewhere else. In that somewhere else, it would
require a more complicated regexp that matches say 03(00|02|03) to the 2nd of the month.

But if you wanted to stick to only UTC, then I think Mohammad's #2 request should be possible, though painful. There's only 14 possible yearly calendars. So you could
match the year against a "wednesday" calendar. Then match the month and day to determine if the date was a wednesday. There's about 4 Wednesdays per
month, so there's about 672 elements that would have to be programmed (not counting the years, which would be limited to however many you wanted
to plug in). Whew. That's why date filter operators are needed.
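
To make the locale point concrete, here is a small JavaScript sketch. It assumes the stored value is in UTC and uses a fixed whole-hour offset (the offsetHours parameter and the function name are hypothetical; real time zones with daylight saving need more care).

```javascript
// Day of the month as seen at a fixed offset from UTC.
// Date.UTC normalises out-of-range hours, so h + offsetHours may be negative or above 23.
function dayOfMonthAtOffset(stamp, offsetHours) {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})/.exec(stamp);
  if (!m) return null;
  const [y, mo, d, h, mi] = m.slice(1).map(Number);
  const shifted = new Date(Date.UTC(y, mo - 1, d, h + offsetHours, mi));
  return shifted.getUTCDate();
}

const stamp = "20190903020000000"; // 02:00 UTC on 3 September 2019 (made-up value)
console.log(/^\d{6}03/.test(stamp));        // true: the stored digits say "03"
console.log(dayOfMonthAtOffset(stamp, 0));  // 3
console.log(dayOfMonthAtOffset(stamp, -7)); // 2: still the 2nd, seven hours behind UTC
```

The regex only ever sees the stored digits, so it answers "the 3rd as stored", not "the 3rd where I live".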

@TiddlyTweeter

Aug 28, 2019, 12:38:58 PM
to TiddlyWiki
May I hesitate and point out that a TW date (singleton wiki) can ONLY be a number of finite length?

And that number HAS to be something. Mohammad & I tend to believe in those numbers.

When does GMT <> UTC  change that?

I'm struggling to see your point.

TT

@TiddlyTweeter

Aug 30, 2019, 4:30:43 AM
to TiddlyWiki
Find titles with at least one German accented character ...
^.*?[ßÄÖÜẞäöü]+?

Field: Title

"^" = start of "scope" (in this case the start of the Title field)

".*?" = match any character except line-breaks, but as few times as possible ("lazy" matching).
Note: A greedy ".*" would first run to the end of the title and then backtrack, so the match would extend to the last accented character instead of stopping at the first. Lazy matching keeps the match minimal.

"[ßÄÖÜẞäöü]+?" = Match at least one German accented character.
Note: Once the first German accented character is matched, there is no need to continue to the end of the field.

Matches these Titles, ->match<- ...
->"Ü<-ber" is a German word.
->The word "ü<-ber" is used in German.


@TiddlyTweeter

Sep 2, 2019, 4:32:27 AM
to TiddlyWiki
Footnote. This could be simplified further to just ...

Match any single German accented character ...
[ßÄÖÜẞäöü]
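
A quick JavaScript check of that simplification (a sketch; the sample titles are made up). For a yes/no test the two patterns agree; they differ only in how much text the match covers.

```javascript
const lazy = new RegExp("^.*?[ßÄÖÜẞäöü]+?"); // earlier, longer form
const simple = new RegExp("[ßÄÖÜẞäöü]");     // simplified form

// Illustrative titles.
const titles = ["Über uns", "The word über", "Strasse", "Straße"];

// Both patterns give the same yes/no answer for every title.
const sameAnswer = titles.every((t) => lazy.test(t) === simple.test(t));
console.log(sameAnswer); // true

// The matched text differs: the lazy form includes everything before the letter.
console.log(lazy.exec("The word über")[0]);   // "The word ü"
console.log(simple.exec("The word über")[0]); // "ü"
```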

Note: because it uses a "[...]" character class, and square brackets are "reserved characters" in TW filters, the regular expression needs to be placed in a variable and the variable called by the TW regexp operator. For example ...

<$set name="german-accented" value="[ßÄÖÜẞäöü]">
<<list-links "[regexp:title<german-accented>]">>
</$set>

TT

HansWobbe

Sep 2, 2019, 1:30:49 PM
to tiddl...@googlegroups.com

@TiddlyTweeter: Neat!

It simply had not occurred to me to use this simple method, in spite of the fact that I make extensive use of the characters as leading or trailing sigils in my titles, from Unicode ranges like https://unicode-table.com/en/blocks/mathematical-alphanumeric-symbols/

After all, if mathematicians can agree to reserve ranges of characters for the formulas used in their specific fields of study, perhaps we could; especially given the powerful tagging capabilities of TW.

Cheers,
Hans


@TiddlyTweeter

Sep 2, 2019, 1:50:58 PM
to TiddlyWiki
Hi Hans

I find regular expressions are very economical for working in languages that need more than the English a-z. 

It just needs a bit of work to setup the character classes.

Do ask me if you ever have a language issue on matching. I like playing with regex.

TT 

HansWobbe

Sep 2, 2019, 2:58:44 PM
to TiddlyWiki
If you like regExp and ways of exploiting it, you might enjoy these...


Chinese Ideographic Telegraph Symbol for Day One ...


https://unicode-table.com/en/33E0/

㏠ ㏡ ㏢ ㏣ ㏤ ㏥ ㏦ ㏧ ㏨ ㏩ ㏪ ㏫ ㏬ ㏭ ㏮ ㏯ ㏰ ㏱ ㏲ ㏳ ㏴ ㏵ ㏶ ㏷ ㏸ ㏹ ㏺ ㏻ ㏼ ㏽ ㏾

I often combine them with the equivalent Month symbols to create a dense date encoding that is useful as Tags and Search targets.

Cheers,
Hans



@TiddlyTweeter

Sep 2, 2019, 3:20:54 PM
to TiddlyWiki
Very interesting!

I'll think about it a bit!

@TiddlyTweeter

Sep 5, 2019, 6:19:02 AM
to tiddl...@googlegroups.com
Match titles that within words include 3 or more repeated "word" characters

\B(\w)\1{2,}\B

"\B" = NOT a word-boundary. It is an "anchor" of no size. It is complementary to "\b" the anchor for word-boundary. "\B" will ONLY match inside "words".
"\w" = any single "word" character, i.e. [a-zA-Z0-9_]
"\1" = repeat match using character captured by group "(\w)"
"{2,}" = repeat 2 or more times (i.e. 3 or more in total)
"\B" = NOT a word-boundary. 

Matches ...
The Dark Woooood
Index9996no

No matches ...
The band played "Ooom-pa-pa"
The Item 999 is in stock
The Item999 is in stock
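
The match/no-match lists above can be checked with a short JavaScript sketch. The titles are the ones from this post; only the variable names are made up.

```javascript
// \B(\w)\1{2,}\B written as a JavaScript string (backslashes doubled).
const repeated = new RegExp("\\B(\\w)\\1{2,}\\B");

const titles = [
  "The Dark Woooood",
  "Index9996no",
  'The band played "Ooom-pa-pa"',
  "The Item 999 is in stock",
  "The Item999 is in stock",
];

// Keep only the titles where a character repeats 3+ times strictly inside a word.
const hits = titles.filter((t) => repeated.test(t));
console.log(hits); // [ 'The Dark Woooood', 'Index9996no' ]

// The capture group holds the repeated character.
console.log(repeated.exec("Index9996no")[1]); // "9"
```

"Ooom" fails because the match is case-sensitive ("O" is not "o"), and "Item999" fails because the run of 9s ends at a word boundary.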

@TiddlyTweeter

Sep 5, 2019, 6:36:24 AM
to TiddlyWiki
Example of using the Negated Character Class

Very useful regex syntax. Often much more economical than using a positive character class.

Match titles NOT starting "$"
^[^\$]

"^" = start of scope, in this case the start of the title field
"[^" = inside a character class, in first position, "^" means "match the negation" of the following character(s)
"\$" = match the character "$" literally
"]" = close character class

To use this in a filter the regex pattern needs to be put into a variable and then invoked. See example here: https://groups.google.com/d/msg/tiddlywiki/TOUdt8ZjTa4/5v3wiF6fAQAJ

You can test it at Mohammad's regex documentation site: http://tw-regexp.tiddlyspot.com/#RegExp%20Experimentation%20with%20Title
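
Outside TW, the same pattern can be tried with a small JavaScript sketch. The titles are illustrative assumptions. Inside a character class the "$" needs no escape, so "^[^$]" behaves the same way.

```javascript
// ^[^\$] written as a JavaScript string (backslash doubled).
const notDollarFirst = new RegExp("^[^\\$]");

// Illustrative titles, including an empty string.
const titles = ["$:/core/ui/PageTemplate", "HelloThere", "Price in $", ""];

const ordinary = titles.filter((t) => notDollarFirst.test(t));
console.log(ordinary); // [ 'HelloThere', 'Price in $' ]
```

The empty string is rejected too, because a negated class still has to consume one character.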

TT

@TiddlyTweeter

Sep 5, 2019, 7:10:49 AM
to TiddlyWiki
Slightly more advanced use of the Negated Character Class

Match titles with only ONE "/" slash
^[^/]+?/[^/]+?$

This is useful for people who use the "/" hierarchy in TW for naming Tiddlers.

"^" = start of scope 
"[^/]+?" = match one or more characters that are NOT "/" (lazy)
"/" = match "/" once
"[^/]+?" = match one or more characters that are NOT "/" (lazy)
"$" = end of scope

@TiddlyTweeter

Sep 5, 2019, 7:35:50 AM
to tiddl...@googlegroups.com
Advanced use of the Negated Character Class

Match titles with defined numbers of "/" slash
^(([^/]+?)/){1}[^/]+?$

The difference here is the {1} quantifier.

If you change the number then it will change the number of "/" permitted in the match.
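
A JavaScript sketch of changing that number. The helper names and the sample titles are assumptions for illustration; the pattern itself is the one above with the count substituted in.

```javascript
// Build the pattern for exactly n "/" separators (n is a hypothetical parameter).
function slashCountPattern(n) {
  return new RegExp("^(([^/]+?)/){" + n + "}[^/]+?$");
}

// Illustrative titles.
const titles = ["Home", "Projects/Alpha", "Projects/Alpha/Notes", "Projects/"];

function titlesWithSlashes(n) {
  return titles.filter((t) => slashCountPattern(n).test(t));
}

console.log(titlesWithSlashes(0)); // [ 'Home' ]
console.log(titlesWithSlashes(1)); // [ 'Projects/Alpha' ]
console.log(titlesWithSlashes(2)); // [ 'Projects/Alpha/Notes' ]
```

"Projects/" never matches here, because every segment, including the last, must contain at least one non-slash character.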


TT

Mohammad Rahmani

Sep 5, 2019, 1:26:01 PM
to tiddl...@googlegroups.com
Thanks TT.
I will add these to tw-regexp.

Yes, we did not focus on negated character classes, which can be helpful in many use cases.


Best wishes
Mohammad


--
You received this message because you are subscribed to the Google Groups "TiddlyWiki" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tiddlywiki+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tiddlywiki/166333e8-b354-47a0-b685-b47886566827%40googlegroups.com.

Mohammad Rahmani

Sep 5, 2019, 1:30:13 PM
to tiddl...@googlegroups.com
It is great to be able to experiment at tw-regexp when you give a new example.
So I highly recommend testing each example there and letting users try it at tw-regexp.

Cheers
Mohammad




@TiddlyTweeter

Sep 5, 2019, 3:42:45 PM
to TiddlyWiki
If you test this ...
^(([^/]+?)/){0}[^/]+?$

With "{0}" you will see only titles that contain no "/" slash at all.

The combination of negated classes, [^...] with exact match numbers {n}, is often very useful.

TT

Mohammad Rahmani

Sep 5, 2019, 11:47:03 PM
to tiddl...@googlegroups.com
TT,
 How can we use this in a real case? Not starting with a $ sign means all ordinary tiddlers!


Best wishes
Mohammad


--
You received this message because you are subscribed to the Google Groups "TiddlyWiki" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tiddlywiki+...@googlegroups.com.

Mohammad Rahmani

Sep 5, 2019, 11:48:08 PM
to tiddl...@googlegroups.com
TT,
 At https://regex101.com/ the syntax below returns an error: it needs the slash character to be escaped!

Please have a look


Best wishes
Mohammad



Mohammad Rahmani

Sep 6, 2019, 12:01:20 AM
to tiddl...@googlegroups.com
One more question:
Does this pattern allow trailing slashes?


Best wishes
Mohammad



Mark S.

Sep 6, 2019, 12:29:13 AM
to TiddlyWiki
There's a difference in JavaScript between a regular expression literal and a string that can be interpreted as a regular expression.

If you notice at regex101, the input box has / at the start and end. So it's assuming a direct regular expression, like:

/^(([^/]+?)/){1}[^/]+?$/gm

The slashes are used to indicate the start and end of the expression, and so any slashes in the middle not part of a character class throw an error.

But we're actually passing a string here. So internally something like this is happening:

var patt = new RegExp("^(([^/]+?)/){1}[^/]+?$") ;

Since the forward slash is not needed to delimit the expression when the regular expression is created this way, it doesn't throw an error inside of TW.
It's unfortunate that the tool at regex101 doesn't allow you to enter the expression as a string.
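
The difference can be seen in a few lines of JavaScript (a sketch; the variable names are made up, and new Function is used here only to show the parse error without breaking the script itself).

```javascript
// String form: "/" has no special meaning, so nothing needs escaping.
const source = "^(([^/]+?)/){1}[^/]+?$";
const fromString = new RegExp(source);

// Literal form: "/" delimits the expression, so the slash outside the
// character class has to be written as "\/".
const fromLiteral = /^(([^/]+?)\/){1}[^/]+?$/;

console.log(fromString.test("a/b"), fromLiteral.test("a/b")); // true true
console.log(fromString.test("a/b/c"), fromLiteral.test("a/b/c")); // false false

// Using the unescaped text as a literal fails when the code is parsed.
let literalError = null;
try {
  new Function("return /^(([^/]+?)/){1}[^/]+?$/;");
} catch (e) {
  literalError = e.name;
}
console.log(literalError); // SyntaxError
```

So escaping the slash for regex101 and leaving it unescaped in TW describe the same expression.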

-- Mark




Mohammad Rahmani

Sep 6, 2019, 12:34:30 AM
to tiddl...@googlegroups.com
Hi Mark,

Thanks for clarification!


Best wishes
Mohammad



@TiddlyTweeter

Sep 6, 2019, 4:35:12 AM
to TiddlyWiki
Ciao Mohammad

As it stands, it has little utility. I included it for learning purposes: a very simple case that illustrates what negated classes do.

I don't know how much regex experience users have, so I think for docs it's useful to give some really simple examples and then build from them.

Best wishes
TT

Mohammad wrote:
 How we can use this in a real case? Not starting with $ sign means all ordinary tiddlers!

@TiddlyTweeter

Sep 6, 2019, 5:38:24 AM
to TiddlyWiki
Mohammad wrote:
Does this pattern allows trailing slashes?

No. To do that you could use ...

^(([^/]*?)/){1,}[^/]*?$
This makes the negation classes "[^/]*?" matching "not /" of 0 or more length (rather than 1 or more)
This will match tiddlers ending "/", as well as cases where the title could just be "///"

By the way, if you want to see all titles with "/" use {1,} = 1 or more

---

But to match that use case only, where you only wanted to list tiddlers with a trailing "/"  use this simple pattern :-) ...

/$
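
A JavaScript sketch comparing the three patterns from this exchange on a few made-up titles. The variable names are assumptions for illustration.

```javascript
const strict = new RegExp("^(([^/]+?)/){1}[^/]+?$");  // exactly one "/", no empty segments
const loose = new RegExp("^(([^/]*?)/){1,}[^/]*?$");  // one or more "/", empty segments allowed
const trailing = new RegExp("/$");                    // title ends with "/"

// Illustrative titles.
const titles = ["Projects/Alpha", "Projects/", "///", "Home"];

const table = titles.map((t) => [t, strict.test(t), loose.test(t), trailing.test(t)]);
for (const row of table) console.log(row.join("  "));
// Projects/Alpha  true  true  false
// Projects/  false  true  true
// ///  false  true  true
// Home  false  false  false
```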

Part of the art with regex is determining when to be minimal and when to go for something with wider matching power but more complexity.
The more complex it gets the more important it gets to test against data to be sure it works as expected.

TT

@TiddlyTweeter

Sep 6, 2019, 5:49:45 AM
to TiddlyWiki
Mohammad wrote:
 At https://regex101.com/ the below syntax return errors it needs the slash character to be escaped!

Mark's detailed answer to why that happens (and why it is not an issue in TW) is really clear.

I think it may be worth adding a note about the way that TW relates to the underlying JS regex engine. 
Tools like TW that access the engine usually use an interface that is defined  by the end programmers.

It's part of the same issue of understanding how the "scope" flags "g" and "m" are invoked in TW.
This can be important in matching in the text field, which TW can do, but needs a bit more documentation to be optimally used.

TT 


@TiddlyTweeter

Sep 6, 2019, 8:03:55 AM
to TiddlyWiki
Regex is very powerful and often confusing :-)

For instance ...

^(([^/]*?)/){1,}[^/]*?$

Is actually, functionally, the same as the very simple ...

/
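
For a plain yes/no test this equivalence is easy to check in JavaScript (a sketch over made-up titles, not a proof).

```javascript
const complex = new RegExp("^(([^/]*?)/){1,}[^/]*?$");
const simple = new RegExp("/");

// Illustrative titles, including edge cases such as "/" alone and the empty string.
const titles = ["Home", "a/b", "a/b/c", "/", "trailing/", "/leading", ""];

// Both patterns accept exactly the titles that contain at least one "/".
const agree = titles.every((t) => complex.test(t) === simple.test(t));
console.log(agree); // true
console.log(titles.filter((t) => simple.test(t))); // [ 'a/b', 'a/b/c', '/', 'trailing/', '/leading' ]
```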


The point is whether to use a more complex regex that can be tuned precisely by changing {1,}, or simply to match the immediate need: that a tiddler title contains a "/" slash.

It's a pragmatic tool.

TT