Query: How do I filter to extract content in <russ>CONTENT</russ> in the text field?

@TiddlyTweeter

Sep 16, 2019, 7:21:15 AM

to TiddlyWiki

I like regular expressions, but I get lost converting them to TW filter syntax. Here is an issue.

For the TEXT field below how would I use "splitregexp" & "regexp" to EXTRACT only the text between the tags?

Morbi non enim facilisis, lacinia odio volutpat, congue arcu. Sed vel ullamcorper 
magna, maximus malesuada nulla. Fusce pharetra commodo facilisis. <russ class="fred">---- #1 This is not
 Latin.----</russ> Integer in justo ac diam <russ class="fred">---- #2 This is not Latin either.----</russ>
lobortis eleifend. Nullam vitae sollicitudin risus. Etiam ut aliquet nulla. 
Morbi facilisis urna id lacus feugiat suscipit.

Quisque a nulla luctus lacus tincidunt euismod. Duis condimentum luctus leo a tristique. 
Donec quis vulputate arcu, non lacinia purus. Nullam sit amet interdum 
lorem. <russ class="fred">---- #3 Nor is this Latin.----</russ>

The output should look like ...

---- #1 This is not Latin.----

---- #2 This is not Latin either.----

---- #3 Nor is this Latin.----

Any help appreciated!

TT

@TiddlyTweeter

Sep 16, 2019, 7:23:44 AM

to TiddlyWiki

repeat for email users ...

Mohammad

Sep 16, 2019, 9:48:22 AM

to TiddlyWiki

Hi TT,

This may be the answer

http://tw-regexp.tiddlyspot.com/#Find%20All%20List%20Items%20in%20a%20Tiddler

Mark explained why he uses splitregexp and then join to turn a tiddler's text into a single line.

--Mohammad

@TiddlyTweeter

Sep 16, 2019, 10:52:37 AM

to tiddl...@googlegroups.com

Hi Mohammad

Unfortunately I can't get that solution to work when there is text between the matches.

It works fine for sequential <li> or \define, but it fails for me when there is text to discard between matches, as in the example.

So I need CONTENT 1-3 returning, but nothing else...

<tag>CONTENT 1</tag> text not wanted <tag>CONTENT 2</tag>  text not wanted <tag>CONTENT 3</tag>

I could not work out how to do that.

Best wishes

TT

@TiddlyTweeter

Sep 17, 2019, 7:11:58 AM

to TiddlyWiki

I commented on the problem here too: https://groups.google.com/d/msg/tiddlywiki/43x2tbA4ALE/kiePQiIJAgAJ

TT

@TiddlyTweeter

Sep 17, 2019, 10:29:36 AM

to TiddlyWiki

I'm not really getting anywhere on this.

Maybe TW filters are INCAPABLE of discarding text?

I'd like to know. The issue is this ...

I need CONTENT 1-3 returning, but nothing else...

<tag>CONTENT 1</tag> text not wanted <tag>CONTENT 2</tag> text not wanted <tag>CONTENT 3</tag>

I could not work out how to do that.

TT

Mohammad Rahmani

Sep 17, 2019, 10:34:35 AM

to tiddl...@googlegroups.com

TT

This needs a little script.

My find macro already does this. But it is possible with regexp.

Let's see what Mark has to say.


@TiddlyTweeter

Sep 17, 2019, 10:48:20 AM

to TiddlyWiki

Okay. Let's hope. xxx


Mark S.

Sep 17, 2019, 12:21:08 PM

to TiddlyWiki

Actually, now I wonder if the original code was undertested.

You need a nested list inside the first one:

<$vars realchars="[^\s]+">
<$list filter="[{mas01}splitregexp[\n]join[ ]splitregexp[<russ.*?>]butfirst[1]]" variable=item>
<$list filter="[<item>splitregexp[</russ>]butlast[1]]" variable=item2>
<$text text=<<item2>>/><br/>
</$list>
</$list>
</$vars>

In this example {mas01} refers to a tiddler with your test text.

Mark S.

Sep 17, 2019, 12:58:49 PM

to TiddlyWiki

You don't seem to need realchars for this. So:

<$list filter="[{mas01}splitregexp[\n]join[ ]splitregexp[<russ.*?>]butfirst[1]]" variable=item>
<$list filter="[<item>splitregexp[</russ>]butlast[1]]" variable=item2>
<$text text=<<item2>>/><br/>
</$list>
</$list>
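
For anyone who wants to see what each filter step produces, here is a rough trace of the two nested lists in Python. It is only a sketch of the split/join/split logic, not TiddlyWiki code: the SAMPLE string is made up for illustration, and TiddlyWiki-specific behaviour (for example how filter results are collected and rendered) is ignored.

```python
import re

# Hypothetical sample: one tag's content is broken across a line, as in the test text.
SAMPLE = 'intro <russ class="fred">one\ntwo</russ> filler <russ class="fred">three</russ> tail'

# Outer list: splitregexp[\n] then join[ ] puts the whole text on one line.
lines = re.split(r"\n", SAMPLE)
one_line = " ".join(lines)

# splitregexp[<russ.*?>] cuts at every opening tag; butfirst[1] drops the
# text that came before the first tag.
items = re.split(r"<russ.*?>", one_line)[1:]

# Inner list: splitregexp[</russ>] cuts each item at the closing tag;
# butlast[1] drops the unwanted text that followed it.
contents = []
for item in items:
    contents.extend(re.split(r"</russ>", item)[:-1])

print(items)     # ['one two</russ> filler ', 'three</russ> tail']
print(contents)  # ['one two', 'three']
```

Each item handed to the inner list still carries the closing tag and the trailing text, which is why the second split is needed.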

coda coder

Sep 17, 2019, 1:57:22 PM

to TiddlyWiki

Thanks to Josiah for proxy-handling this!

Thanks Mark - that's pretty much nailed it.

Can you think of/see a way to condense this into one macro?

\define q-f1(tid, tagname) [{$tid$}splitregexp[\n]join[ ]splitregexp[<$tagname$.*?>]butfirst[1]]
\define q-f2(tagname) [<item>splitregexp[</$tagname$>]butlast[1]]


\define q(tid, tagname)
<$list filter=<<q-f1 $tid$ $tagname$>> variable=item>
<$list filter=<<q-f2 $tagname$>> variable=item2>


<$text text=<<item2>>/><br/>
</$list>
</$list>

\end

<<q 2-020 sauron>>


Mark S.

Sep 17, 2019, 2:16:15 PM

to TiddlyWiki

If there were a regular expression filter operator for extracting matches (written as regexps here), it could be done in one list statement:

<$list filter="[{sample}splitregexp[\n]join[ ]regexps[(?g)<russ.*?>.*?</russ>]regexps[<russ.*?>(.*?)</russ>]]">
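
To show what those two patterns would pick out, here is a minimal Python sketch of the same two-step idea: first collect every complete element, then keep only the captured content. It does not implement or describe a regexps operator; the function name and the sample string are made up for illustration.

```python
import re

def tag_contents(text, tag="russ"):
    """Return the text between each <tag ...> and </tag> pair."""
    # Same preparation as the filter: put the whole text on one line.
    one_line = " ".join(text.split("\n"))
    # Step 1: every complete element, like a global match on <russ.*?>.*?</russ>.
    elements = re.findall(rf"<{tag}.*?>.*?</{tag}>", one_line)
    # Step 2: keep only the capture group from <russ.*?>(.*?)</russ>.
    return [re.search(rf"<{tag}.*?>(.*?)</{tag}>", e).group(1) for e in elements]

# Hypothetical sample with unwanted text between the tags.
SAMPLE = 'intro <russ class="fred">one\ntwo</russ> filler <russ class="fred">three</russ> tail'
print(tag_contents(SAMPLE))  # ['one two', 'three']
```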

@TiddlyTweeter

Sep 17, 2019, 2:21:16 PM

to TiddlyWiki

Mark S. wrote:

If there was a regular expression filter, it could be done in one list statement:

<$list filter="[{sample}splitregexp[\n]join[ ]regexps[(?g)<russ.*?>.*?</russ>]regexps[<russ.*?>(.*?)</russ>]]">

And there we have it. Don't die soon. Or we'd be in a bad mess.

Mat

Sep 17, 2019, 2:25:01 PM

to TiddlyWiki

I'm following this with interest. Thank you, all involved.

<:-)

Mat

Sep 17, 2019, 2:26:29 PM

to TiddlyWiki

BTW, I'd say this is (should be) of general enough interest that it belongs in the docs.

<:-)

Mark S.

Sep 17, 2019, 2:36:38 PM

to TiddlyWiki

Perhaps ...

\define q(tid, tagname)
<$list filter="""[{$tid$}splitregexp[\n]join[ ]splitregexp[<$tagname$.*?>]butfirst[1]]""" variable=item>
<$list filter="""[<item>splitregexp[</$tagname$>]butlast[1]]""" variable=item2>


<$text text=<<item2>>/><br/>
</$list>
</$list>

\end

coda coder

Sep 17, 2019, 3:25:07 PM

to TiddlyWiki

Bango!

(Like bingo but with more bang).

Mohammad

Sep 17, 2019, 4:11:47 PM

to TiddlyWiki

Added to tw-regexp

--Mohammad

Mohammad

Sep 17, 2019, 4:26:44 PM

to TiddlyWiki

Now in

http://tw-regexp.tiddlyspot.com/#Extract%20Contents%20between%20Html%20Tags

--Mohammad


@TiddlyTweeter

Sep 19, 2019, 8:15:54 AM

to TiddlyWiki

Ciao Mark

Tweaking the baby ...

... erm, is "splitregexp[\n]join[ ]" needed?

TT

Mark S.

Sep 19, 2019, 9:38:36 AM

to TiddlyWiki

If you try it on your original suggested data, you will find it messes up on the first "This is not Latin" if you remove the join.

This is because the regular expression only works on one tiddler title at a time. Each fragment became its own title in the first split, so they have to be joined up again so that the next expression can see the entirety of the text as one pseudo-tiddler.

If you knew in advance that your HTML was well formed and there were no line breaks between tags, then you wouldn't need the join.
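
The effect of leaving out the join can be traced with a rough Python model. It assumes that splitregexp splits each incoming title separately and pools the pieces, as described above. The helper names and the sample string are made up for illustration, and this is a sketch rather than TiddlyWiki's implementation.

```python
import re

def splitregexp(titles, pattern):
    """Split every input title on the pattern and pool all the pieces."""
    pieces = []
    for title in titles:
        pieces.extend(re.split(pattern, title))
    return pieces

def extract(text, use_join=True):
    titles = splitregexp([text], r"\n")            # one title per line
    if use_join:
        titles = [" ".join(titles)]                # back to a single title
    items = splitregexp(titles, r"<russ.*?>")[1:]  # butfirst[1]
    found = []
    for item in items:                             # inner list, one item at a time
        found.extend(splitregexp([item], r"</russ>")[:-1])  # butlast[1]
    return found

# Hypothetical sample: the first tag's content is broken across a line.
SAMPLE = 'intro <russ class="fred">one\ntwo</russ> filler <russ class="fred">three</russ> tail'
print(extract(SAMPLE, use_join=True))   # ['one two', 'three']
print(extract(SAMPLE, use_join=False))  # ['two', 'three']
```

Without the join, the piece "one" never meets its closing tag, so the inner split has nothing to cut and butlast discards it. Only the second half of the first match survives.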

HTH

Mark

@TiddlyTweeter

Sep 19, 2019, 9:57:53 AM

to TiddlyWiki

Thanks Mark!

It's useful to understand what is going on. The "prep" needed for matching is unusual.

TT
