Query: How do I filter to extract content in <russ>CONTENT</russ> in the text field?

@TiddlyTweeter

Sep 16, 2019, 7:21:15 AM
to TiddlyWiki
I like regular expressions, but I get lost converting them to TW filter syntax. Here is an issue.

For the TEXT field below how would I use "splitregexp" & "regexp" to EXTRACT only the text between the tags?

Morbi non enim facilisis, lacinia odio volutpat, congue arcu. Sed vel ullamcorper magna, maximus malesuada nulla. Fusce pharetra commodo facilisis. <russ class="fred">---- #1 This is not
 
Latin.----</russ> Integer in justo ac diam <russ class="fred">---- #2 This is not Latin either.----</russ> lobortis eleifend. Nullam vitae sollicitudin risus. Etiam ut aliquet nulla.
Morbi facilisis urna id lacus feugiat suscipit.

Quisque a nulla luctus lacus tincidunt euismod. Duis condimentum luctus leo a tristique.
Donec quis vulputate arcu, non lacinia purus. Nullam sit amet interdum lorem. <russ class="fred">---- #3 Nor is this Latin.----</russ>

The output should look like ... 

---- #1 This is not Latin.----
---- #2 This is not Latin either.----
---- #3 Nor is this Latin.---- 
 
Any help appreciated!
TT
 

@TiddlyTweeter

Sep 16, 2019, 7:23:44 AM
to TiddlyWiki
repeat for email users ...

Mohammad

Sep 16, 2019, 9:48:22 AM
to TiddlyWiki
Hi TT,
 This may be the answer


Mark explained why he uses splitregexp and then join to turn a tiddler's text into a single line.

--Mohammad

@TiddlyTweeter

Sep 16, 2019, 10:52:37 AM
to tiddl...@googlegroups.com
Hi Mohammad

Unfortunately I can't get that solution to work when you have text between the matches.

It works fine for sequential <li> or \define, but it fails for me when you have text you need to discard between matches, as in the example.

So I need CONTENT 1-3 returning, but nothing else... 

<tag>CONTENT 1</tag> text not wanted <tag>CONTENT 2</tag>  text not wanted <tag>CONTENT 3</tag>

I could not work out how to do that.

Best wishes
TT

@TiddlyTweeter

Sep 17, 2019, 10:29:36 AM
to TiddlyWiki
I'm not really getting anywhere on this.

Maybe TW filters are INCAPABLE of discarding text?

I'd like to know. The issue is this ... 

I need CONTENT 1-3 returning, but nothing else... 
 
<tag>CONTENT 1</tag> text not wanted <tag>CONTENT 2</tag>  text not wanted <tag>CONTENT 3</tag>
  
I could not work out how to do that.

TT


Mohammad Rahmani

Sep 17, 2019, 10:34:35 AM
to tiddl...@googlegroups.com
TT
This needs a little script.
My find macro already does this, but it is possible with regexp.

Let's see what Mark has to say.

@TiddlyTweeter

Sep 17, 2019, 10:48:20 AM
to TiddlyWiki
Okay. Let's hope. xxx


Mark S.

Sep 17, 2019, 12:21:08 PM
to TiddlyWiki
Actually, now I wonder if the original code was undertested.

You need a nested list inside the first one:

<$vars realchars="[^\s]+">
<$list filter="[{mas01}splitregexp[\n]join[ ]splitregexp[<russ.*?>]butfirst[1]]" variable=item>
<$list filter="[<item>splitregexp[</russ>]butlast[1]]" variable=item2>
<$text text=<<item2>>/><br/>
</$list>
</$list>
</$vars>

In this example {mas01} refers to a tiddler with your test text.

Mark S.

Sep 17, 2019, 12:58:49 PM
to TiddlyWiki
You don't seem to need realchars for this. So:

<$list filter="[{mas01}splitregexp[\n]join[ ]splitregexp[<russ.*?>]butfirst[1]]" variable=item>
<$list filter="[<item>splitregexp[</russ>]butlast[1]]" variable=item2>
<$text text=<<item2>>/><br/>
</$list>
</$list>
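
For readers more at home in a general-purpose language, the steps of the nested lists above can be sketched in Python. This is only an illustration, not TiddlyWiki code: the function name extract_russ and the shortened SAMPLE text are made up for this sketch, and Python's re.split stands in for splitregexp on the assumption that both split a string on every match of the pattern.

```python
import re

# Shortened stand-in for the test text (hypothetical, not the original tiddler).
SAMPLE = (
    'Sed vel <russ class="fred">---- #1 This is not\n'
    'Latin.----</russ> Integer in justo '
    '<russ class="fred">---- #2 This is not Latin either.----</russ>\n'
    'lobortis. <russ class="fred">---- #3 Nor is this Latin.----</russ>'
)

def extract_russ(text):
    # splitregexp[\n] then join[ ]: turn the text into one long line.
    one_line = " ".join(re.split(r"\n", text))
    # splitregexp[<russ.*?>] then butfirst[1]: drop the text before the first opening tag.
    items = re.split(r"<russ.*?>", one_line)[1:]
    results = []
    for item in items:
        # splitregexp[</russ>] then butlast[1]: drop whatever follows the closing tag.
        results.extend(re.split(r"</russ>", item)[:-1])
    return results

for line in extract_russ(SAMPLE):
    print(line)
```

Each item left after the first split starts with wanted content and ends with unwanted trailing text, which is why the inner step only has to throw away the last piece.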



coda coder

Sep 17, 2019, 1:57:22 PM
to TiddlyWiki
Thanks to Josiah for proxy-handling this!

Thanks Mark - that's pretty much nailed it.

Can you think of/see a way to condense this into one macro?

\define q-f1(tid, tagname) [{$tid$}splitregexp[\n]join[ ]splitregexp[<$tagname$.*?>]butfirst[1]]
\define q-f2(tagname) [<item>splitregexp[</$tagname$>]butlast[1]]


\define q(tid, tagname)
<$list filter=<<q-f1 $tid$ $tagname$>> variable=item>
<$list filter=<<q-f2 $tagname$>> variable=item2>

<$text text=<<item2>>/><br/>
</$list>
</$list>
\end

<<q 2-020 sauron>>




Mark S.

Sep 17, 2019, 2:16:15 PM
to TiddlyWiki
If there were a regular-expression extraction filter (written regexps below), it could be done in one list statement:

<$list filter="[{sample}splitregexp[\n]join[ ]regexps[(?g)<russ.*?>.*?</russ>]regexps[<russ.*?>(.*?)</russ>]]">
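
The one-pass idea can be tried out in Python, where a capture group plus re.findall plays the role of the wished-for filter. This is a sketch under assumptions: extract_in_one_pass and SAMPLE are hypothetical names, and the (?g) flag from the filter above has no Python equivalent, since findall already returns every match.

```python
import re

# Shortened stand-in for the test text (hypothetical).
SAMPLE = (
    'Sed vel <russ class="fred">---- #1 This is not\n'
    'Latin.----</russ> Integer <russ class="fred">---- #2 This is not Latin either.----</russ>\n'
    'lobortis. <russ class="fred">---- #3 Nor is this Latin.----</russ>'
)

def extract_in_one_pass(text):
    # Same preparation as in the filter: replace line breaks with spaces.
    one_line = " ".join(text.split("\n"))
    # With exactly one capture group, findall returns only the captured text,
    # so the tags and everything between the matches are discarded.
    return re.findall(r"<russ.*?>(.*?)</russ>", one_line)

print(extract_in_one_pass(SAMPLE))
```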

@TiddlyTweeter

Sep 17, 2019, 2:21:16 PM
to TiddlyWiki
Mark S. wrote:
If there was a regular expression filter, it could be done in one list statement:

<$list filter="[{sample}splitregexp[\n]join[ ]regexps[(?g)<russ.*?>.*?</russ>]regexps[<russ.*?>(.*?)</russ>]]">

And there we have it. Don't die soon. Or we'd be in a bad mess. 

Mat

Sep 17, 2019, 2:25:01 PM
to TiddlyWiki
I'm following this with interest. Thank you, all involved.
<:-)

Mat

Sep 17, 2019, 2:26:29 PM
to TiddlyWiki
BTW, I'd say this is (should be) of general enough interest that it belongs in the docs.
<:-)

Mark S.

Sep 17, 2019, 2:36:38 PM
to TiddlyWiki

Perhaps ...

\define q(tid, tagname)
<$list filter="""[{$tid$}splitregexp[\n]join[ ]splitregexp[<$tagname$.*?>]butfirst[1]]""" variable=item>
<$list filter="""[<item>splitregexp[</$tagname$>]butlast[1]]""" variable=item2>

<$text text=<<item2>>/><br/>
</$list>
</$list>
\end
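
The same parameterisation, with the tag name passed in rather than hard-coded, might look like this in Python. It is a rough analogue only: the function q here takes the text itself instead of a tiddler title, and the re.escape call is an addition of this sketch, since the macro substitutes $tagname$ into the pattern as-is.

```python
import re

def q(text, tagname):
    # Escape the tag name so characters special to regexps are taken literally
    # (an extra safeguard in this sketch; the macro does not do this).
    tag = re.escape(tagname)
    # Collapse line breaks, as splitregexp[\n]join[ ] does in the macro.
    one_line = " ".join(text.split("\n"))
    # Split on the opening tag and drop the leading text (butfirst[1]).
    items = re.split(rf"<{tag}.*?>", one_line)[1:]
    # Split each item on the closing tag and drop the trailing text (butlast[1]).
    return [part for item in items for part in re.split(rf"</{tag}>", item)[:-1]]

text = 'a <sauron id="1">one\nring</sauron> b <sauron>two</sauron>'
print(q(text, "sauron"))
```

As in the macro, a tag name that is a prefix of another tag name would also match the longer tag, because the pattern only requires the name to be followed by any characters up to the next ">".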


coda coder

Sep 17, 2019, 3:25:07 PM
to TiddlyWiki
Bango!

(Like bingo but with more bang).

Mohammad

Sep 17, 2019, 4:11:47 PM
to TiddlyWiki
Added to tw-regexp

--Mohammad

Mohammad

Sep 17, 2019, 4:26:44 PM
to TiddlyWiki

On Tuesday, September 17, 2019 at 11:06:38 PM UTC+4:30, Mark S. wrote:

@TiddlyTweeter

Sep 19, 2019, 8:15:54 AM
to TiddlyWiki
Ciao Mark

Tweaking the baby ...

... erm, is "splitregexp[\n]join[ ]" needed?

TT

Mark S.

Sep 19, 2019, 9:38:36 AM
to TiddlyWiki
If you try it on your original suggested data, you will find it messes up on the first "this is not latin" if you remove the join.
This is because the regular expression only works on a single tiddler name. Each sentence fragment became its own
tiddler name in the first split, so they have to be joined up again so that the next expression can see the entirety of the text as one pseudo-tiddler.

If you knew in advance that your HTML was well formed and there were no line breaks between tags, then you wouldn't need the join.
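
The effect of dropping the join can be imitated in Python. This sketch rests on an assumed model of the filter run: each step is applied to every item separately and the results are flattened into one list. The names with_join, without_join and SAMPLE are made up for the illustration.

```python
import re

# Shortened stand-in for the test text; the first tag's content spans a line break.
SAMPLE = (
    'Sed vel <russ class="fred">---- #1 This is not\n'
    'Latin.----</russ> Integer <russ class="fred">---- #2 This is not Latin either.----</russ>\n'
    'lobortis. <russ class="fred">---- #3 Nor is this Latin.----</russ>'
)

def close_tags(items):
    # splitregexp[</russ>] then butlast[1], applied to each item in turn.
    return [part for item in items for part in re.split(r"</russ>", item)[:-1]]

def with_join(text):
    one_line = " ".join(re.split(r"\n", text))
    return close_tags(re.split(r"<russ.*?>", one_line)[1:])

def without_join(text):
    # Every line stays a separate item, so the tag split runs line by line.
    lines = re.split(r"\n", text)
    items = [piece for line in lines for piece in re.split(r"<russ.*?>", line)]
    return close_tags(items[1:])

print(with_join(SAMPLE)[0])
print(without_join(SAMPLE)[0])
```

Without the join, the opening tag and the closing tag of the first match end up in different items, so only the fragment sitting before the closing tag survives.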

HTH
Mark

@TiddlyTweeter

Sep 19, 2019, 9:57:53 AM
to TiddlyWiki
Thanks Mark!

It's useful to understand what is going on. The "prep" needed for matching is unusual.

TT