Query: Are Regular Expressions Regular in TW?


@TiddlyTweeter

Aug 23, 2017, 2:19:35 PM
to tiddl...@googlegroups.com
Over in another thread I finally got to understand filters better. And, thanks to Mark S., achieved what I needed to do.

HOWEVER.

I feel slightly diminished. I am crap with computers. But one thing I know well is Regular Expressions. I was surprised at the very complex loops that I had to go through to CHANGE stuff in the way I needed to in that thread.

Let me give an example ...

<$set name=test filter="
[list[!!text]prefix[#]]
[list[!!text]prefix[#]
removesuffix['s]] +[!suffix['s]]
[list[!!text]prefix[#]
removesuffix[.]] +[!suffix[.]]
[list[!!text]prefix[#]
removesuffix[...]] +[!suffix[...]]
[list[!!text]prefix[#]
removesuffix[,]] +[!suffix[,]]
[list[!!text]prefix[#]
removesuffix[;]] +[!suffix[;]]
[list[!!text]prefix[#]
removesuffix[:]] +[!suffix[:]]
[list[!!text]prefix[#]
removesuffix[!]] +[!suffix[!]]
[list[!!text]prefix[#]
removesuffix[?]] +[!suffix[?]]
[list[!!text]prefix[#]
removesuffix[--]] +[!suffix[--]]
[list[!!text]prefix[...#]
removeprefix[...]] +[!prefix[...]]
[list[!!text]prefix[--#]
removeprefix[--]] +[!prefix[--]]
">

Here is sample data this deals with ...

left alone - #buddha #buddha #buddha
remove apostrophised ending - #BuddHA's
remove fullstop - #buddha.
remove comma - #buddha,
remove semicolon - #buddha;
remove colon - #buddha:
remove exclamation mark - #buddha!
remove question mark - #buddha?
remove 2 trailing dashes - #buddha--
remove 3 trailing stops - #BUDDHA...
remove 2 leading dashes - --#Buddha
remove 3 leading stops - ...#Buddha
left alone - #notBuddha
#BuddhaDANGER" (NOT YET DEALT WITH)
left alone - #karma

In JavaScript Regular Expressions ALL these cases, and more---discarding the cruft of punctuation that all that code in the first box has to deal with---would be matched by simply: "#\w+\b". Nothing more would be needed.

If you could just transfer the match you would not have any of that prefix/suffix malarky to cope with.
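As a rough illustration in plain JavaScript (outside TiddlyWiki), the pattern pulls the bare hashtag out of each of the punctuated cases. The sample string and the function name extractHashtags are made up for this sketch; they are not part of TW.

```javascript
// Sample text modelled on the data in this post (illustrative only).
const sampleText = "#buddha #BuddHA's #buddha. #buddha-- --#Buddha ...#Buddha #karma";

// Return every substring matching #\w+\b, in order of appearance.
function extractHashtags(text) {
  // String.prototype.match returns null when a global regex finds nothing.
  return text.match(/#\w+\b/g) || [];
}

console.log(extractHashtags(sampleText));
// → ["#buddha", "#BuddHA", "#buddha", "#buddha", "#Buddha", "#Buddha", "#karma"]
```

The apostrophe, stops and dashes never appear in the result because \w+ only consumes letters, digits and underscores.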

WHY is it SO difficult in TW to do that?

It seems nuts.

I am aware there is a regexp operator, but can it RETURN its exact match for processing, or is it what it looks like: a match with something IN a field it adds to a list to display but then the match itself is discarded?

All of this is a long way of saying two things ...

(1) Regular Expressions (BOTH match & replace) are natural TW allies

(2) I don't understand why we don't have a more friendly relationship with them.

Please ask if anything is unclear.

Best wishes
Josiah

codacoder...@outlook.com

Aug 23, 2017, 3:56:29 PM
to TiddlyWiki


On Wednesday, August 23, 2017 at 1:19:35 PM UTC-5, @TiddlyTweeter wrote:

I am aware there is a regexp operator, but can it RETURN its exact match for processing, or is it what it looks like: a match with something IN a field it adds to a list to display but then the match itself is discarded?


Timely.  I was looking to do this yesterday.  No, I didn't find a solution either.  What's happening is, TW5 is using the regexp op to filter in-or-out the tiddler during that run, not the matched string.  Be a nice addition to have at some point, though.

regexps -- matches the specified string of text and //returns it//.

However, I'm not sure there would be enough people voting it up - but I could be wrong.
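The difference between the two behaviours can be sketched in plain JavaScript. This is only an analogy, not TW5's actual code; the names keepMatching and returnMatches and the sample items are invented for the example.

```javascript
const titles = ["#buddha.", "plain word", "--#Buddha"];
const pattern = /#\w+\b/;

// Filtering behaviour: keep or drop each whole item, unchanged.
function keepMatching(items, regex) {
  return items.filter((item) => regex.test(item));
}

// Requested behaviour: return the matched substring itself.
function returnMatches(items, regex) {
  return items
    .map((item) => item.match(regex))
    .filter((m) => m !== null)
    .map((m) => m[0]);
}

console.log(keepMatching(titles, pattern));  // → ["#buddha.", "--#Buddha"]
console.log(returnMatches(titles, pattern)); // → ["#buddha", "#Buddha"]
```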


Mark S.

Aug 23, 2017, 5:18:23 PM
to TiddlyWiki
I'm playing with a version that does replacement with regexp. Based on original regexp code with a couple of tweaks. Seems to work. Absolutely make backups, because sometimes one gets blind-sided by subtle discrepancies, as I discovered with the seemingly simple tolower filter.

After install, your hashtag2tag filter could look like

[list[!!text]regexps[#\w+\b]tolower[]]

if you also installed the latest edition of tolower (in the other thread; the first version had a subtle bug).

I suppose the problem with this simple approach is that a regular expression could return many results (I didn't try the global options). Some people might expect all those hits to be returned. That might pose complications.

Thanks,
Mark
$__core_modules_filters_regexps.js.json

@TiddlyTweeter

Aug 23, 2017, 8:38:24 PM
to TiddlyWiki
Ciao Mark S.

WHOAH! Utterly BRILLIANT. It's absolutely spot-on for what I need. It also seamlessly dealt with the tricky case of double quotes I couldn't figure out. Perfect for this kind of job. Can't thank you enough.

For readers who may not quite understand what is going on, Mark S. just worked on a bit of JavaScript that means the user now only needs to write this ...

<$set name=test filter="

[list[!!text]regexps[#\w+\b]tolower[]]
">

Rather than this (and more) ...

<$set name=test filter="
[list[!!text]prefix[#]tolower[]]
[list[!!text]prefix[#]removesuffix['s]tolower[]] +[!suffix['s]]
[list[!!text]prefix[#]removesuffix[.]tolower[]] +[!suffix[.]]
[list[!!text]prefix[#]removesuffix[...]tolower[]] +[!suffix[...]]
[list[!!text]prefix[#]removesuffix[,]tolower[]] +[!suffix[,]]
[list[!!text]prefix[#]removesuffix[;]tolower[]] +[!suffix[;]]
[list[!!text]prefix[#]removesuffix[:]tolower[]] +[!suffix[:]]
[list[!!text]prefix[#]removesuffix[!]tolower[]] +[!suffix[!]]
[list[!!text]prefix[#]removesuffix[?]tolower[]] +[!suffix[?]]
[list[!!text]prefix[#]removesuffix[--]tolower[]] +[!suffix[--]]
[list[!!text]prefix[...#]removeprefix[...]tolower[]] +[!prefix[...]]
[list[!!text]prefix[--#]removeprefix[--]tolower[]] +[!prefix[--]]
">

This deserves the Batman theme tune: https://www.youtube.com/watch?v=kK4H-LkrQjQ&feature=youtu.be

Buona Notte
Josiah

PMario

Aug 23, 2017, 8:55:14 PM
to TiddlyWiki
On Wednesday, August 23, 2017 at 11:18:23 PM UTC+2, Mark S. wrote:
After install, your hashtag2tag filter could look like

[list[!!text]regexps[#\w+\b]tolower[]]

IMO you should create a PR at github.

-m

Mark S.

Aug 24, 2017, 3:59:52 PM
to TiddlyWiki
Another version of regexps to play with. This version will pull out all matches at once when the global flag is used. So the filter could look like:

"[<currentTiddler>get[text]regexps[(?g)#\w+\b]tolower[]]"

It might not make a difference in this use-case, but perhaps in some other situation.
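A minimal sketch of the idea in plain JavaScript, assuming the flags arrive as a "(?g)"-style prefix on the operand as in the filter above. The names parseOperand and regexpsSketch are hypothetical and this is not the attached module's code.

```javascript
// Split an operand such as "(?g)#\\w+\\b" into flags and pattern source.
function parseOperand(operand) {
  const m = /^\(\?([gim]+)\)/.exec(operand);
  return m
    ? { flags: m[1], source: operand.slice(m[0].length) }
    : { flags: "", source: operand };
}

// Return the first match only, or every match when the g flag is present.
function regexpsSketch(text, operand) {
  const { flags, source } = parseOperand(operand);
  const result = text.match(new RegExp(source, flags));
  if (result === null) return [];
  return flags.includes("g") ? Array.from(result) : [result[0]];
}

console.log(regexpsSketch("#a and #b", "#\\w+\\b"));     // → ["#a"]
console.log(regexpsSketch("#a and #b", "(?g)#\\w+\\b")); // → ["#a", "#b"]
```

Note that this sketch does no validation: a repeated flag such as "(?gg)" would make the RegExp constructor throw.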

Notes:

The inverse function (!regexps) still does whatever (!regexp) does.

Regular expressions in javascript can return sub-groups for non-global expression searches. It might be possible to implement this in regexps, if someone can think of a good use-case. Mostly, instead of "removeprefix" and "removesuffix", you could grab whatever text you wanted from the middle of a tiddler title or filter stream. You could even make a single title (or field) split into multiple filter outputs. Once again -- what would be a use-case?

Have fun,

Mark



$__core_modules_filters_regexps.js(1).json

codacoder...@outlook.com

Aug 24, 2017, 4:28:44 PM
to TiddlyWiki
Hi Mark


On Thursday, August 24, 2017 at 2:59:52 PM UTC-5, Mark S. wrote:
Regular expressions in javascript can return sub-groups for non-global expression searches. It might be possible to implement this in regexps, if someone can think of a good use-case. Mostly, instead of "removeprefix" and "removesuffix", you could grab whatever text you wanted from the middle of a tiddler title or filter stream. You could even make a single title (or field) split into multiple filter outputs. Once again -- what would be a use-case?


Okay, I'm trying my best to not make this long-winded.  My use case is as follows...

I'm writing a documentation TW for an app.  It has logical chapters and sections within chapters called chapsecs (1 chapsec = 1 tiddler).  I'm trying to take the approach that the system will be a good authoring tool for... well, authoring anything "book-like".

Within chapsec tiddlers, I use annotation macros to annotate the text. Annotations can be enabled (made visible) and disabled (made invisible) globally. The annotation macro is named bk-ann.

What I'd like to do is, for each chapsec, find all bk-anns in the text field and list their content.  An example might be...

                      <<bk-ann "Explain the use of the Tiddlywiki's macro system">>

and

                      <<bk-ann "Explain the scope rules of Tiddlywiki macros">>


As you can see, this text could become a synopsis of each chapsec if I could figure out a way to re-use the annotation text.  The output would render:

  Section 1-010 synopsis:

    Explain the use of the Tiddlywiki's macro system

    Explain the scope rules of Tiddlywiki macros

I'm guessing we're not quite there yet?  But hoping I'm wrong :)


Thomas Elmiger

Aug 24, 2017, 5:50:23 PM
to TiddlyWiki
Hi Codacoder

Something like synopsis extraction exists in an experiment I developed some time ago:

https://tid.li/tw5/numbers.html#Chapter%201:%5B%5BChapter%201%5D%5D%20Footnotes

Here I extract footnotes from a text using my extract macro based on existing TW5 technology.

https://tid.li/tw5/hacks.html#Extract%20Macro

Problem: It does not work for transcluded sections … that would be the same with a regexp solution.

Cheers,
Thomas

codacoder...@outlook.com

Aug 24, 2017, 6:31:31 PM
to TiddlyWiki
Excellent Thomas - thank you.  I'll grab it later (or tomorrow, more likely) and let you know how it goes.

Danke!

@TiddlyTweeter

Aug 25, 2017, 5:17:58 AM
to TiddlyWiki
Ciao Mark S.

Many thanks. I'll test it out and get back to you...


Mark S. wrote:
Another version of regexps to play with. This version will pull out all matches at once when the global flag is used. So the filter could look like:

"[<currentTiddler>get[text]
regexps[(?g)#\w+\b]tolower[]]"
 
Regular expressions in javascript can return sub-groups for non-global expression searches. It might be possible to implement this in regexps, if someone can think of a good use-case.

I definitely could use that. Let me go look at what I'm currently working on and find a real use case. Sub-groups allow a lot of things, like (1) moving bits of the matched string around; (2) discarding bits of it you don't need; (3) duplicating a sub-group, say as a heading.

Plus. With what you have developed so far is it possible to return a MATCH COUNT? If that were possible I have definite use cases I can lay out.

This is exciting.

Best wishes
Josiah

codacoder...@outlook.com

Aug 25, 2017, 8:47:46 AM
to tiddl...@googlegroups.com
Thomas!!!

Awesome (not a word I throw around like confetti).  Just AWESOME!!!

I recall we talked about textstretch when it was "new", quite some time ago. I ended up going my own route with bk-ann (and a couple of others, like bk-note and bk-problem which use different colors and are "transient" since they're like TODOs).  But extract was the missing "elephant in the room" ;) 

I'm so pleased and thankful!  Thank you!

Coda

Mark S.

Aug 25, 2017, 10:18:05 AM
to TiddlyWiki
This version will return  sub-groups for non-global searches. So in codacoder's example, this search:

<$list filter='[tag[TestGroups]get[text]regexps[(?g)bk-ann ".+"]regexps[bk-ann "(.+)"]]'  >

will change this:


<<bk-ann "Explain the use of the Tiddlywiki's macro system">>X


and

                      <<bk-ann "Explain the scope rules of Tiddlywiki macros">>


Into



Explain the use of the Tiddlywiki's macro system
Explain the scope rules of Tiddlywiki macros

Notice that it takes 2 regexps filters to accomplish this (the first global one finds the items, the second trims them).
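The same two-step idea can be sketched in plain JavaScript. This is only an analogy for the filter, with the annotations placed on separate lines; extractAnnotations is a name invented for the example.

```javascript
const tiddlerText = [
  '<<bk-ann "Explain the use of the Tiddlywiki\'s macro system">>',
  "and",
  '<<bk-ann "Explain the scope rules of Tiddlywiki macros">>',
].join("\n");

// Pass 1: a global search finds every whole annotation.
// Pass 2: a non-global search returns capture group 1 of each item.
function extractAnnotations(source) {
  const found = source.match(/bk-ann ".+"/g) || [];
  return found
    .map((item) => item.match(/bk-ann "(.+)"/))
    .filter((m) => m !== null)
    .map((m) => m[1]);
}

const annotations = extractAnnotations(tiddlerText);
console.log(annotations);
console.log(annotations.length); // → 2
```

The length of the returned array plays the role of a match count.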

For counts, you can just use the count[] filter after the regexps filter you want.

Backup before trying.

Good luck!
Mark
$__core_modules_filters_regexps.js(2).json

Mark S.

Aug 25, 2017, 6:28:42 PM
to TiddlyWiki

PR #2963


Mark

@TiddlyTweeter

Aug 26, 2017, 4:30:07 PM
to TiddlyWiki
Ciao Mark S. & all ...

I have an issue using character sets in square brackets. A case I overlooked came up in the #buddha stuff covered earlier in this thread. Occasionally I'd include a web address in a Tweet ...

Under the current regex "#\w+\b", it would capture "#BuddhaLink". It shouldn't, as it's NOT a Twitter hashtag; it's just part of an address. There are several regex solutions. An economical one looks to me like "[^a-z](#\w+\b)".

In this case the capture group is what needs returning, not the whole match.
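In plain JavaScript the idea looks roughly like this. The web address is a made-up stand-in and extractStandaloneHashtags is a name invented for the sketch.

```javascript
// Hypothetical tweet text; the URL is an invented example.
const tweet = "See http://example.com/page#BuddhaLink about #buddha.";

// Match a hashtag only when the preceding character is not a lowercase
// letter, and return capture group 1 rather than the whole match.
function extractStandaloneHashtags(text) {
  return Array.from(text.matchAll(/[^a-z](#\w+\b)/g), (m) => m[1]);
}

console.log(extractStandaloneHashtags(tweet)); // → ["#buddha"]
```

One caveat with this pattern: a hashtag at the very start of the text has no preceding character, so it is skipped. Something like "(?:^|[^a-z])(#\w+\b)" should cover that case too.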

I have an issue ... I simply can't figure out how to get the character set in square brackets into the regex match formula.

Any help much appreciated
Best wishes
Josiah

Mark S.

Aug 26, 2017, 7:26:31 PM
to TiddlyWiki
According to the regexp operator tiddler at tiddlywiki.com:

The filter syntax makes it impossible to directly specify a regular expression that contains square brackets. The solution is to store the expression in a variable. See the examples.

In the examples:


<$set name="digit-pattern" value="[0-9]{2}">
<<list-links "[regexp:title<digit-pattern>]">>
</$set>

HTH
Mark

codacoder...@outlook.com

Sep 29, 2017, 7:37:58 PM
to TiddlyWiki
Sorry I'm so late coming to this Mark, but I think I've found an issue...


On Friday, August 25, 2017 at 9:18:05 AM UTC-5, Mark S. wrote:
This version will return  sub-groups for non-global searches. So in codacoder's example, this search:

<$list filter='[tag[TestGroups]get[text]regexps[(?g)bk-ann ".+"]regexps[bk-ann "(.+)"]]'  >

will change this:


<<bk-ann "Explain the use of the Tiddlywiki's macro system">>X

and

                      <<bk-ann "Explain the scope rules of Tiddlywiki macros">>


Into


Explain the use of the Tiddlywiki's macro system
Explain the scope rules of Tiddlywiki macros

Notice that it takes 2 regexps filters to accomplish this (the first global one finds the items, the second trims them).


Understand that there may be zero or more (usually more than one) bk-ann in any given tiddler.  So, in the case of the example (if both are included in one tiddler), I see only

    Explain the use of the Tiddlywiki's macro system

IOW, once a match is found, the rest of the text is ignored.  Is there a way to make the regex less greedy and carry on to do the rest?

 Coda

Mark S.

Sep 29, 2017, 8:46:54 PM
to TiddlyWiki
Thank you for the feedback.

There was a bug in the javascript macro. I've attached a new version.

Also, for your finder filter, you (probably) need to add the non-greedy operator ("?").  So your filter looks like:

<$list filter='[tag[TestGroups]get[text]regexps[(?g)bk-ann ".+?"]regexps[bk-ann "(.+?)"]]'  >
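To see why the "?" can matter, here is a small plain-JavaScript sketch with two annotations on the same line. The sample line is invented for the illustration; whether a given tiddler needs the non-greedy form depends on how its annotations are laid out.

```javascript
const line = '<<bk-ann "first note">> some text <<bk-ann "second note">>';

// Greedy: .+ runs to the LAST double quote on the line, giving one long match.
const greedy = line.match(/bk-ann ".+"/g);

// Lazy: .+? stops at the NEXT double quote, giving one match per annotation.
const lazy = line.match(/bk-ann ".+?"/g);

// Second pass, as in the filter: keep only capture group 1 of each item.
const contents = lazy.map((item) => item.match(/bk-ann "(.+?)"/)[1]);

console.log(greedy.length); // → 1
console.log(lazy.length);   // → 2
console.log(contents);      // → ["first note", "second note"]
```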

HTH
Mark
$__core_modules_filters_regexps.js(3).json

codacoder...@outlook.com

Sep 29, 2017, 11:01:31 PM
to TiddlyWiki
Bingo!

Thank you Mark, that's a really useful addition to my toolbox.