Wildcards in Search/replace

nmyshkin

Jul 4, 2021, 1:04:14 PM
to Tasker
I'm trying to fix a routine that extracts just the text from a web page and formats it as simple HTML for local viewing. The host (BBC News) has recently done an extreme overhaul of its HTML, which is now dense with CSS class names.

This is the string:

<p class="ssrcss-1q0x1qg-Paragraph eq5iqo00">

I want to replace it with:

<p>

From there I build my simplified text document with slicing and dicing.

The problem: while every example I have looked at so far has the full CSS class string after "<p", I am not confident that this will always be the case, so I have tried to use a wildcard in the search:

<p class=*>

This has not worked, although searching on the entire string does. Using wildcards in Tasker has always been hit-or-miss for me. Is it possible in this context, and if so, what have I done wrong?
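A hedged illustration of why the wildcard may fail: in a regular expression, `*` is not a stand-alone "match anything" wildcard as it is in file-name (glob) patterns; it means "zero or more of the preceding item". If the Search field is treated as a regex, `<p class=*>` asks for `<p class`, then any number of `=` characters, then `>`, which never matches the BBC tag. The sketch below uses Python's `fnmatch` and `re` modules, not Tasker, to contrast the two readings of the same pattern.

```python
import fnmatch
import re

tag = '<p class="ssrcss-1q0x1qg-Paragraph eq5iqo00">'
pattern = '<p class=*>'

# As a shell-style (glob) wildcard, "*" means "any run of characters": match.
print(fnmatch.fnmatchcase(tag, pattern))  # True

# As a regular expression, "=*" means "zero or more '=' characters",
# so the quoted class text between '=' and '>' cannot be matched.
print(re.search(pattern, tag))  # None

# What the regex does match: "<p class", any number of '=', then '>'.
print(bool(re.fullmatch(pattern, '<p class>')))    # True
print(bool(re.fullmatch(pattern, '<p class==>')))  # True
```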

jmjc...@gmail.com

Jul 4, 2021, 10:33:27 PM
If you use the Variable Search Replace action, you can use regex in the Search field:

A1: Variable Set %string To <p class="ssrcss-1q0x1qg-Paragraph eq5iqo00">

A2: Variable Search Replace
Variable: %string
Search: <p class=.*
Replace Matches: On
Replace With: <p>

A3: Flash %string
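A rough Python equivalent of A1 to A3, for readers who want to try the pattern outside Tasker. Python's `re.sub` stands in for Variable Search Replace here; this is an analogy, not Tasker code. When the variable holds only the single tag, the greedy `.*` consumes the rest of the tag and the result is just `<p>`.

```python
import re

# A1: Variable Set %string
string = '<p class="ssrcss-1q0x1qg-Paragraph eq5iqo00">'

# A2: Variable Search Replace, Search "<p class=.*", Replace With "<p>".
# ".*" is greedy: it consumes everything from "<p class=" to the end of the line.
string = re.sub(r'<p class=.*', '<p>', string)

# A3: Flash %string
print(string)  # <p>
```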

nmyshkin

Jul 5, 2021, 4:35:36 PM
Thanks for responding. I am still confused. From my reading of your suggestion, searching on <p class=.* would find the first instance of <p class= and select it and everything that follows, then replace all of that with <p>. All the text would be lost! It seems the closing ">" is needed somewhere so that the entire file is not replaced.
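The concern can be checked with a small Python sketch (the `page` string is invented for illustration). Two details matter: `.*` is greedy, so it runs as far as it can; and in Python's `re`, as in Java's default mode, `.` does not match a line break, so each match stops at the end of its line. On a page whose paragraphs all sit on one line, that is effectively the rest of the page; with the `re.DOTALL` flag, `.` also matches line breaks and a single match swallows the whole text.

```python
import re

# Hypothetical sample: two paragraphs on the first line, one on the second.
page = ('<p class="a1">First paragraph.</p><p class="b2">Second.</p>\n'
        '<p class="c3">Third, on a new line.</p>')

# Greedy ".*" runs to the end of each line, swallowing the text and later tags.
greedy = re.sub(r'<p class=.*', '<p>', page)
print(repr(greedy))  # '<p>\n<p>'

# With re.DOTALL, "." also matches "\n", so one match covers the whole page.
greedy_all = re.sub(r'<p class=.*', '<p>', page, flags=re.DOTALL)
print(repr(greedy_all))  # '<p>'
```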

jmjc...@gmail.com

Jul 5, 2021, 9:33:23 PM
You are right. I misread your post and thought you only wanted to search and replace a single string. If you need to replace all of the <p class=...> opening tags in a file, use this instead:

Search: <p class=.*?>
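A short Python sketch of the lazy version on a two-paragraph sample built from the tag in the original post (again `re.sub` standing in for Variable Search Replace). Adding `?` after `*` makes the quantifier lazy, so the match ends at the first `>` after `<p class=`, and only each opening tag is replaced while the paragraph text survives.

```python
import re

page = ('<p class="ssrcss-1q0x1qg-Paragraph eq5iqo00">First paragraph.</p>'
        '<p class="ssrcss-1q0x1qg-Paragraph eq5iqo00">Second paragraph.</p>')

# "*?" is the lazy (non-greedy) form of "*": it stops at the first ">"
# after "<p class=", so only the opening tag is replaced.
clean = re.sub(r'<p class=.*?>', '<p>', page)
print(clean)  # <p>First paragraph.</p><p>Second paragraph.</p>
```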

nmyshkin

Jul 6, 2021, 12:35:52 PM
Thanks, that did it! Hopefully that will allow my routine to survive future changes in their HTML/CSS structure, at least for a while.
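How well the lazy pattern survives depends on what kind of change the site makes. A quick Python check with invented sample tags (hypothetical, not taken from the BBC site): new class names and extra attributes after `class` are handled, and an already-plain `<p>` is left alone. One edge case it does not handle is a `>` inside a quoted attribute value, where the lazy match stops too early and leaves debris behind.

```python
import re

PATTERN = r'<p class=.*?>'

samples = [
    '<p class="ssrcss-NEW-Paragraph zz99">Text</p>',  # new class names
    '<p class="a b" data-x="1">Text</p>',              # extra attribute after class
    '<p>Text</p>',                                     # already plain, no match
    '<p class="a" title="x>y">Text</p>',               # ">" inside a quoted value
]

results = [re.sub(PATTERN, '<p>', s) for s in samples]
for r in results:
    print(r)
# <p>Text</p>
# <p>Text</p>
# <p>Text</p>
# <p>y">Text</p>
```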

I had some documentation that described the use of ., *, and ? in regex, but I would not have tried the combination you suggested. I still have a lot more to learn about those arcane structures!
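For reference, `<p class=.*?>` reads as: the literal text `<p class=`, then `.` (any character except a line break), `*` (repeated zero or more times), `?` (as few times as possible, which makes the `*` lazy), then a literal `>`. A commonly used alternative, offered here as an option and not something suggested in the thread, is a negated character class `[^>]*` ("any run of characters other than `>`"), which cannot run past the end of a tag even without `?`. The hypothetical pattern `<p\b[^>]*>` below also drops the requirement that `class` be the first attribute; `\b` is a word boundary that keeps tags such as `<pre>` from matching. The sample `page` string is invented.

```python
import re

page = '<p id="k" class="a">One</p><p class="b">Two</p><pre>code</pre>'

# Lazy pattern from the thread: needs "class" right after "<p ".
lazy = re.sub(r'<p class=.*?>', '<p>', page)

# Negated class: "[^>]*" cannot cross a ">", so no lazy "?" is needed;
# "\b" requires a word boundary after "p", which rules out "<pre>".
negated = re.sub(r'<p\b[^>]*>', '<p>', page)

print(lazy)     # <p id="k" class="a">One</p><p>Two</p><pre>code</pre>
print(negated)  # <p>One</p><p>Two</p><pre>code</pre>
```

Since replacing a plain `<p>` with `<p>` changes nothing, the broader pattern is harmless on tags that are already simple.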