<$select tiddler="myregexp">
<option value="^[0-9]*$">Only digits</option>
<option value="^[a-z]*$">Only lower case</option>
<option value="^[A-Z]*$">Only upper case</option>
<option value="^[\w-_]*$">Only alphanumeric, _, and -</option>
<option value="^[\w]{3,15}$">Only alphanum len 3-15</option>
<option value="^[A-Z]+.*$">Starts with capital</option>
<option value="^[0-9]+.*$">Starts with digit</option>
<option value="^.+\.[a-zA-Z]{3,4}$">Extensions only</option>
<option value="^.+(\.jpg|\.gpeg)$">Extension jpg gpeg</option>
</$select>
<$list filter="[regexp{myregexp}sort[]]">
</$list>
<option value="^[0-9]*$">Only digits</option>
<option value="^[a-z]*$">Only lower case</option>
<option value="^[A-Z]*$">Only upper case</option>
<option value="^[0-9]+$">Only digits</option>
or use "\d", shorthand for [0-9]
<option value="^\d+$">Only digits</option>
<option value="^[a-z]+$">Only lower case</option>
<option value="^[A-Z]+$">Only upper case</option>
<option value="^[\w-_]*$">Only alphanumeric, _, and -</option>
<option value="^(\w|-)+$">Only alphanumeric, _, and -</option>
or
<option value="^[-a-zA-Z0-9_]+$">Only alphanumeric, _, and -</option>
or, it's good practice to make explicit the need for literal "-", like so ...
<option value="^[\-a-zA-Z0-9_]+$">Only alphanumeric, _, and -</option>
or, less kosher
<option value="^[\-\w]+$">Only alphanumeric, _, and -</option>
<option value="^[\w]{3,15}$">Only alphanum len 3-15</option>
<option value="^\w{3,15}$">Only alphanum len 3-15</option>
<option value="^[A-Z]+.*$">Starts with capital</option>
<option value="^[0-9]+.*$">Starts with digit</option>
<option value="^[A-Z].*$">Starts with capital</option>
<option value="^[0-9].*$">Starts with digit</option>
<option value="^.+\.[a-zA-Z]{3,4}$">Extensions only</option>
<option value="^.+(\.jpg|\.gpeg)$">Extension jpg gpeg</option>
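To see what the switch from "*" to "+" buys, here is a small sketch. It uses Python's re module purely as a stand-in for the JavaScript engine TiddlyWiki runs on (my assumption; these particular patterns should behave the same in both). The helper name `matches` is made up for illustration.

```python
import re

def matches(pattern: str, title: str) -> bool:
    """True if the pattern is found in the title (search semantics)."""
    # re.ASCII keeps \w and \d limited to ASCII, closer to JavaScript's default.
    return re.search(pattern, title, flags=re.ASCII) is not None

# "*" allows zero repetitions, so an empty title passes the "only digits" test.
star_accepts_empty = matches(r"^[0-9]*$", "")
# "+" requires at least one digit, so the empty title is rejected.
plus_accepts_empty = matches(r"^[0-9]+$", "")

# "^[A-Z]+.*$" and "^[A-Z].*$" accept the same titles; the "+" is redundant
# because ".*" already absorbs any further capitals.
samples = ["Hello", "HELLO world", "hello", "9lives"]
same_result = all(
    matches(r"^[A-Z]+.*$", s) == matches(r"^[A-Z].*$", s) for s in samples
)

# Putting the literal "-" first in the class avoids any range ambiguity.
hyphen_ok = matches(r"^[-\w]+$", "my-tiddler_01")
```

So the "*" versions quietly accept an empty string, which is rarely what a "only digits" test is meant to do.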
Mark, your solutions look good on dates, with a good basic match that excludes a lot of things that would not be proper dates. It's probably all that is needed practically.
FYI, it is actually possible to use regex to correctly match dates. I know because I've done it to accurately match dates, including leap years, under both Gregorian & Julian calendars. It's just enormously complex :-). Yeah, coding is better suited for that.
Mohammad wrote:
- are a date in format like Jan 06 2019
- are a date in format like 2019.08.25
Mark wrote:
Actual validation of dates would take real code massaging.
TT
Love your work. I suggest we make a small bundle of tiddlers and macros, perhaps even some filters for use in subfilter operator for this collection.
Of course these would often be used for searching but they can be used for validation.
It is however important to remember the html tag can be set on the edit-text widget for rudimentary validation as well.
A supporting macro that allows you to test a variable or text reference against one or more of these tests would be helpful, especially when using the edit-text widget. It may be as simple as displaying a message when the result is not what the regex tests for. Search a single value for the regex pattern and indicate when it is not found, e.g. not number only.
Regards
Tony
- from X until the end of the sentence
- ...until the end of the paragraph
- ...until the end of the text field
This is the content of a field containing X and the rest of it.
<option value="X(.+)$">Capture match after FIRST "X"</option>
This is the content of a field X containing X again and the rest of it.
<option value="^.*X(.+)$">Capture match after LAST "X"</option>
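A quick way to convince yourself of the FIRST versus LAST behaviour is to run both patterns on the sample sentence. This is only a sketch in Python's re module, used here as a stand-in for TiddlyWiki's JavaScript regex (an assumption; the greedy behaviour shown is common to both).

```python
import re

text = "This is the content of a field X containing X again and the rest of it."

# Leftmost "X", then everything up to the end of the line.
after_first = re.search(r"X(.+)$", text).group(1)

# The greedy "^.*" swallows as much as it can, so the "X" that finally
# matches is the last one on the line.
after_last = re.search(r"^.*X(.+)$", text).group(1)

print(repr(after_first))
print(repr(after_last))
```

The only difference between the two is the leading greedy "^.*", which pushes the literal "X" as far right as possible.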
- all possible characters that typically follow a word in common text i.e:
. OR , OR (space) OR ! OR ? OR : OR ; OR (spacechar) OR (end of text field) OR .......I don't even know
([ .,;:!?]|$)
Here's a probably inefficient date VALIDATOR for dates starting 0000-01-01 following format yyyy-mm-dd (which I find to be the most generally useful). Ok, I didn't check the rules. I think there's something about a surprise leap year every 400 years, so there's probably more tweaking to be done to match the Gregorian calendar precisely.
<option value="^(?=\d{4})(((?!\d\d(00|04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96))\d{4}-(((0[13578]|10|12)-(0[1-9]|[12]\d|30|31))|((04|06|09|11)-(0[1-9]|[12]\d|30))|((02)-(0[1-9]|1\d|2[1-8]))))|((?=\d\d(00|04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96))\d{4}-(((0[13578]|10|12)-(0[1-9]|[12]\d|30|31))|((04|06|09|11)-(0[1-9]|[12]\d|30))|((02)-(0[1-9]|1\d|2[1-9])))))">Experimental VALIDATE yyyy-mm-dd</option>
Match Gregorian dates 1800 -> 9999
<option value="^((?:(?:1[8-9]|[2-9]\d)\d{2}([-/.]))(?:(?:(?:0[13578]|1[02])\2(?:0[1-9]|[12][0-9]|3[01]))|(?:(?:0[469]|11)\2(?:0[1-9]|[12][0-9]|30))|(?:(?:02)\2(?:0[1-9]|1[1-9]|2[1-8])))|(?:(?:1[8-9]|[2-9]\d(?:04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96)([-/.])(?:02)\3(?:29))|(?:(?:[2468][048]|[3579][26])(?:00)([-/.])(?:02)\4(?:29)))$">Match dates 1800-9999 in "yyyy[-/.]mm[-/.]dd" format</option>
"([-/.])" means the date separator can be "-", "." or "/".
"(?: ...)", a group starting with "?:", is a non-capturing group: it groups the sub-pattern but does not retain what it matched. It makes complex regular expressions easier to work with.
Note: the regex needs to be one line, not broken by line-breaks as Google does.
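The separator trick is easier to see in a cut-down pattern. The sketch below is not the full date regex above, just a simplified illustration of a captured separator plus a back-reference, written in Python's re as a stand-in for TiddlyWiki's JavaScript regex. The names `SAME_SEPARATOR` and `consistent` are hypothetical.

```python
import re

# Group 1 captures the separator; "\1" then demands the same one again.
# "(?:...)" groups the month alternatives without creating a numbered group,
# so the separator stays group 1.
SAME_SEPARATOR = re.compile(r"^\d{4}([-/.])(?:0[1-9]|1[0-2])\1\d{2}$")

def consistent(date_text: str) -> bool:
    """True if the text is yyyy?mm?dd with the same separator used twice."""
    return SAME_SEPARATOR.match(date_text) is not None

print(consistent("2019-08-25"))  # same separator twice
print(consistent("2019-08/25"))  # mixed separators
```

If the month group were a capturing group, the back-reference number would shift, which is exactly the bookkeeping "(?: ...)" avoids.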
Anything between "@@" and "@@"
<option value="@@(.+?)@@">Match between "@@" pairs</option>
<option value="(\b\w+\b)(?:\s+\1)+">Duplicate Words (in Sequence)</option>
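Here is a small sketch of what the "in sequence" pattern picks up, again in Python's re as a stand-in for TiddlyWiki's JavaScript regex (my assumption). The function name `repeated_runs` is made up for illustration.

```python
import re

REPEATED = re.compile(r"(\b\w+\b)(?:\s+\1)+")

def repeated_runs(text: str) -> list[str]:
    """Return each run of an immediately repeated word, e.g. 'is is'."""
    return [m.group(0) for m in REPEATED.finditer(text)]

runs = repeated_runs("This is is a test of the the the pattern")
print(runs)
```

Note the match is case-sensitive as written, so "The the" would not be reported unless the engine is told to ignore case.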
1234_dp
12.34_dp
1234._dp
1234e5_dp
1.234e5_dp
By "number" do you mean everything before the "_"? So "1.234e5" is a number?
I see "e" in the number -- is it a hexadecimal number? Or is the "e" for something else?
Mark S. wrote:
Here's a probably inefficient date VALIDATOR for dates starting 0000-01-01 following format yyyy-mm-dd (which I find to be the most generally useful). Ok, I didn't check the rules. I think there's something about a surprise leap year every 400 years, so there's probably more tweaking to be done to match the Gregorian calendar precisely.
<option value="^(?=\d{4})(((?!\d\d(00|04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96))\d{4}-(((0[13578]|10|12)-(0[1-9]|[12]\d|30|31))|((04|06|09|11)-(0[1-9]|[12]\d|30))|((02)-(0[1-9]|1\d|2[1-8]))))|((?=\d\d(00|04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96))\d{4}-(((0[13578]|10|12)-(0[1-9]|[12]\d|30|31))|((04|06|09|11)-(0[1-9]|[12]\d|30))|((02)-(0[1-9]|1\d|2[1-9])))))">Experimental VALIDATE yyyy-mm-dd</option>
Lol! Nice one! I'd put that in the REALLY ADVANCED category for regex! Most users would have no clue how that works!
Anyway I think you got it off the net? It's not bad, but faulty :-).
Mark S. wrote:
TT: Lol! Nice one! I'd put that in the REALLY ADVANCED category for regex!
Most users would have no clue how that works! Anyway I think you got it off the net? It's not bad, but faulty :-).
Actually, I rolled it myself.
- from X until the end of the sentence
- ...until the end of the paragraph
- ...until the end of the text field
The exact regex may differ according to the context (type of field, does it have line-breaks?, what are the regex settings?) ...
([ .,;:!?]|$)
Note that in character classes the meta-characters "." and "?" no longer have any special meaning, so they do not need to be escaped.
"|" is "alternation" within a "capture group" "(...)".
"$" is "end-of-scope" (it depends on the regex settings whether that is end-of-field or end-of-line).
(The attached image doesn't seem to work tho.)
<option value="@@(.+?)@@">Match between "@@" pairs</option>
The "+?" is to make the match "lazy" so it won't extend beyond the second @@ to a third pair of @@.
The content between them is passed to a "capturing group".
If the match needs to span lines regex settings may need tweaking.
@@.mystyle
foo bar
@@
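The line-spanning problem can be reproduced outside TiddlyWiki. The sketch below uses Python's re as a stand-in for the JavaScript engine (an assumption; in both, "." does not match a newline by default) and shows one way around it that needs no special settings.

```python
import re

block = "@@.mystyle\nfoo bar\n@@"

# "." does not match a newline by default, so the lazy ".+?" cannot
# reach the closing "@@" on a later line.
dot_match = re.search(r"@@(.+?)@@", block)

# "[\s\S]" matches any character including newlines, so this version spans lines.
any_char_match = re.search(r"@@([\s\S]+?)@@", block)
inner = any_char_match.group(1) if any_char_match else None

# On a single line, the lazy quantifier stops at the first closing "@@".
lazy_pairs = re.findall(r"@@(.+?)@@", "@@a@@ and @@b@@")
```

Using a character class like "[\s\S]" keeps the whole content in one capture group, whereas a repeated "(\s|\S)" group would only retain its last character.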
/<\/?[\w\s]*>|<.+[\W]>/
{{{ [regexp:text[..........]] }}}
Mat: the first thing the TW docs should answer, i.e "How to reformat regexps so they can be used in TW"
<$set name="digit-pattern" value="[0-9]{2}">
<<list-links "[regexp:title<digit-pattern>]">>
</$set>
The filter syntax makes it impossible to directly specify a regular expression that contains square brackets. The solution is to store the expression in a variable. See the examples.
Yes, as Eric explained these are scientific notation. I forgot to add they can have positive or negative sign like
+1.23e4_dp
-1.23e4_dp
1.236e+5_dp
-1.23e-5_wp
^([\-+.0-9e]+_[A-Za-z]+)$
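The character-class pattern above accepts all the sample numbers, but it also accepts strings that are not numbers. Below is a tighter sketch, assuming the shape is: optional sign, digits, optional fraction, optional "e" exponent with optional sign, then "_" and a letter suffix. That shape is my guess from the samples, not a confirmed rule. Python's re is used as a stand-in for TiddlyWiki's JavaScript regex.

```python
import re

# Assumed shape: optional sign, digits, optional fraction, optional exponent,
# then "_" and a kind suffix made of letters.
NUMBER_WITH_KIND = re.compile(r"^[+-]?\d+(?:\.\d*)?(?:e[+-]?\d+)?_[A-Za-z]+$")

# The looser pattern from the thread, for comparison.
LOOSE = re.compile(r"^([\-+.0-9e]+_[A-Za-z]+)$")

samples = [
    "1234_dp", "12.34_dp", "1234._dp", "1234e5_dp", "1.234e5_dp",
    "+1.23e4_dp", "-1.23e4_dp", "1.236e+5_dp", "-1.23e-5_wp",
]
all_match = all(NUMBER_WITH_KIND.match(s) is not None for s in samples)

# A string built only from allowed characters, but clearly not a number.
junk = "e.e-+_dp"
loose_accepts_junk = LOOSE.match(junk) is not None
strict_accepts_junk = NUMBER_WITH_KIND.match(junk) is not None
```

Whether the stricter version is right depends on where "e" and "." are really allowed to appear, so treat it as a starting point.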
My trick is to "flatten" the text before applying the target regular expression...tiddlers...splitregexp[\n]join[ ]splitregexp<myfilter> ...
Is your aim here to CHANGE them?
.... (<\/?[\w\s]*>|<.+[\W]>) ...
\define magnitude3() [regex[blah]]
{{{ [<var>subfilter<magnitude3>else[number too big]] }}}
Ciao TonyM
Regex is best developed with concrete data. It has no maths ability. Everything is just a string of characters to it.
But it's often possible to match using a pattern. It depends on working with example test data to ensure where it might work.
So could you give a paragraph or two of test data?
TT
On Saturday, 24 August 2019 07:15:47 UTC+2, TonyM wrote:
Mark/Josiah,
Is there a simple way to test whether a number is in a range, and/or greater than or less than another number?
It would be nice to have a pattern to test if a number lies between or is equal to a number, even if we simply follow it with the new then or else operators and/or make use of the emptyMessage on the list. Sadly the reveal greater-than, less-than and equal-to tests are somewhat limited, and we do not yet have greater-than or less-than filter operators, although match is now a form of equals.
We may be able to have some tests like this:
{{{ [<number>regex[input>A$<B]else[out of range]] }}}
Regards
Tony
Mark S. wrote:My trick is to "flatten" the text before applying the target regular expression...tiddlers...splitregexp[\n]join[ ]splitregexp<myfilter> ...
These nuggets should probably be in the official docs. While it "is about regex, not TW" it IS about regex in a TW context which does mean extra demands and quirks.
+1.23e4_dp
-1.23e4_dp
1.236e+5_dp
-1.23e-5_wp
To get a more precise match I'd like to know where in the number "e" can appear.
_Number 000 -> 799 _ "^([0-7][0-9][0-9])$" (must be exactly three numbers long)
or, more compact ...
_Number 000 -> 799 _ "^([0-7]\d\d)$" (must be exactly three numbers long)
... These should NOT match
800
27
8
... These should match
->799
->435
->000
->127
P.S What is that noise? Is that Josiah screaming curses, pulling his hair and smashing his computer? ;-)
_Age 22 -> 55_ regex: "^([2][2-9]|[3-4][0-9]|[5][0-5])$"

... These should NOT match
21
56
300
4
05
... These should match
->22
->29
->30
->37
->40
->45
->55
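A range pattern like this is easy to get subtly wrong, so it helps to cross-check it against ordinary number comparison. The sketch below does that in Python (a stand-in for TiddlyWiki's JavaScript regex; the names are hypothetical), trying every integer from 0 to 999.

```python
import re

AGE_22_TO_55 = re.compile(r"^([2][2-9]|[3-4][0-9]|[5][0-5])$")

def in_range_by_regex(value: str) -> bool:
    """True if the string is accepted by the 22 -> 55 pattern."""
    return AGE_22_TO_55.match(value) is not None

# Compare the pattern with plain integer comparison for 0..999.
# Any number where the two disagree would land in this list.
mismatches = [
    n for n in range(1000)
    if in_range_by_regex(str(n)) != (22 <= n <= 55)
]
print(mismatches)
```

The same brute-force comparison works for any other numeric range you try to express as a pattern.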
^\d{2}(st|nd|rd|th)\s{1}(January|February|March|April|May|June|July|August|September|October|November|December)\s{1}\d{4}$
[02][0-255][0-255][0-255][192][168][0-255][0-255]
Ok, with duplicates and date formats. Note that the date formats only check for the format. You could still create nonsensical dates that actually match the formats (Jan 55 9999, 1111.15.55). Actual validation of dates would take real code massaging.
<$vars digonly="^[0-9]*$">
<$vars useme=<<digonly>>>
</$vars>
</$vars>
<$select tiddler="myregexp">
<option value="^[0-9]*$">Only digits</option>
<option value="^[a-z]*$">Only lower case</option>
<option value="^[A-Z]*$">Only upper case</option>
<option value="^[\w-_]*$">Only alphanumeric, _, and -</option>
<option value="^[\w]{3,15}$">Only alphanum len 3-15</option>
<option value="^[A-Z]+.*$">Starts with capital</option>
<option value="^[0-9]+.*$">Starts with digit</option>
<option value="^.+\.[a-zA-Z]{3,4}$">Extensions only</option>
<option value="^.+(\.jpg|\.gpeg)$">Extension jpg gpeg</option>
<option value="^\b(\w{2,})\b.*\b\1\b.*$">Duplicate words</option>
<option value="^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s{1}\d{2}\s\d{4}$">Date like Jan 06 2019</option>
<option value="^\d{4}\.[0-1]\d\.[0-3]\d$">Date like 2019.08.25</option>
</$select>
<$list filter="[regexp{myregexp}sort[]]">
</$list>
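To illustrate the gap between "matches the format" and "is a real date", here is a minimal sketch in Python (a stand-in for whatever code would do the job in TiddlyWiki; all names are hypothetical). The pattern has the same shape as the "Date like Jan 06 2019" option, with the day and year captured so they can be handed to a calendar check.

```python
import re
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Format check only: month abbreviation, two-digit day, four-digit year.
FORMAT = re.compile(
    r"^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(\d{2})\s(\d{4})$"
)

def has_date_format(text: str) -> bool:
    return FORMAT.match(text) is not None

def is_real_date(text: str) -> bool:
    """Format check first, then let the calendar decide if the day exists."""
    m = FORMAT.match(text)
    if m is None:
        return False
    month = MONTHS.index(m.group(1)) + 1
    try:
        date(int(m.group(3)), month, int(m.group(2)))
    except ValueError:
        return False
    return True
```

With this, "Jan 55 9999" passes `has_date_format` but fails `is_real_date`, which is the distinction made above.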
On Friday, August 23, 2019 at 12:11:07 AM UTC-7, Mohammad wrote:
I am looking for examples and use cases of regexp in TiddlyWiki!
Cases that can already be handled by current filter operators like prefix, search, ... are not recommended to be done with regexp.
I appreciate your help, cases and examples on this. Just give what you want to do.
Some cases: give a regexp pattern in TiddlyWiki to match all tiddlers whose names are:
- only digits
- only lowercase letters
- only uppercase letters
- only alphanumeric and underscore and hyphen
- only alphanumeric with length between 3 and 15
- start with a capital letter
- start with a digit
- have an extension like mytiddler.ext
- have jpg or jpeg extension like mytiddler.jpg or mytiddler.gpeg
- are a date in format like Jan 06 2019
- are a date in format like 2019.08.25
- have duplicate words
- have a valid url
[This list will grow by more examples]
Please give your use case.
-- Mohammad
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
(\b\w{2,}\b)(.*)\1
(\b\w{2,}\b)(.*)\1 ... spaced duplicate words to remove

This is a Tiddler ->This<- is Nice
Nice Tiddler is This ->Tiddler<-
Remove this repeat of ->this<- Tiddler repeat of ->repeat<-.
It seems the problem is with ^$.
I think he means "02" literally. Usually IP numbers aren't padded, so not sure.
It's the range 0-255 that's problematic. Here's what I have for the range:
<option value="^(\b\d\b|\b\d\d\b|1\d\d|2[0-4]\d|25[0-5])">IP range 0-255</option>
Hmm, I guess with an IP you could add the mandatory delimiter (usually ".") and repeat the group. But you would have to manually repeat the group at the end where the delimiter must not be.
And then there's zero padding. Most of the IP numbers I've seen are not zero-padded, but ...
I think the first thing I would do is see what the internet says.
A search for "regular expression ip address" immediately turns up a page from O'Reilly, with both a simple version and an accurate version for checking IP. As I expected, they're able to do a repeat on the structure 3 times, but have to do the last one by hand. They've figured out the 0 padding:
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
So ... no need to rebuild the wheel for most common use cases. Hmm, I wonder about IPv6 ?
Ok, sorry for the stream-of-consciousness problem-working.
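For anyone who wants to poke at the IP pattern above, here is a small sketch in Python's re (a stand-in for TiddlyWiki's JavaScript regex; the function name `is_ipv4` is made up). It checks a few addresses, including a zero-padded one.

```python
import re

IPV4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

def is_ipv4(text: str) -> bool:
    """True if the text is four dot-separated octets, each 0-255."""
    return IPV4.match(text) is not None

for candidate in ["192.168.1.1", "010.001.000.002", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate))
```

The "{3}" handles the three octets that are followed by a dot, and the final octet is written out once more without the dot, as described above.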
<p style="text-transform: capitalize;"><<currentTiddler>></p>
{{{ [{!!title}lowercase[]titlecase[]] }}}
<$text text={{{ [{!!title}lowercase[]titlecase[]] }}}/>
{{{ [fields[]lowercase[]titlecase[]] }}}
<option value="@@(\s|\S)+?@@">Match between "@@" pairs</option>
@TiddlyTweeter wrote:
<option value="@@(.+?)@@">Match between "@@" pairs</option>
The "+?" is to make the match "lazy" so it won't extend beyond the second @@ to a third pair of @@.
The content between them is passed to a "capturing group".
Thank you - but:
If the match needs to span lines regex settings may need tweaking.
Indeed, this common case is not found AFAICT:
@@.mystyle
foo bar
@@
<:-)
Mohammad,
http://tw-regexp.tiddlyspot.com/ is very good!!
I doubt I have to make my own version now :-). In a way it's better there is ONE resource, not two.
M: May this include your previous tutorial.
... check for 10.*.*.* 192.168.*.* 127.*.*.* or if equal to 1.1.1.1 or 0.0.0.0. to determine if they are local or public addresses.
<option value="\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b">0-255 Decimal, no leading zeros</option>
<option value="\b[01]{8}\b">00000000-11111111, Binary Byte</option>
<option value="\b(0|1){8}\b">00000000-11111111, Binary Byte</option>
Now to test a fixed number such as 127 I imagine that's just a string?
tony
TonyM wrote:
... check for 10.*.*.* 192.168.*.* 127.*.*.* or if equal to 1.1.1.1 or 0.0.0.0. to determine if they are local or public addresses.
(\b192\.168(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b){2})
(\b10(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b){3})
(\b172\.(1[6-9]|2[0-9]|3[01])(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b){2})
(\b(192\.168|10|172\.(1[6-9]|2[0-9]|3[01]))(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b){2,3})
_Should NOT match_
9.255.255.255
10.0.0.00
147.168.255.255
172.32.255.255
172.14.255.255
192.168.1.256
192.27.255.255

_Should match_
->10.0.0.0
->172.16.89.125
->192.168.1.1
->192.168.1.255
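The two lists above can be run as a test. The sketch below uses the three per-range patterns (192.168.x.x, 10.x.x.x, 172.16-31.x.x) in Python's re, as a stand-in for TiddlyWiki's JavaScript regex. Two things are my own assumptions: the octet sub-pattern is factored into a hypothetical `OCTET` constant, and `fullmatch` is used so the whole string must be an address, whereas the patterns in the thread are unanchored.

```python
import re

# One octet, 0-255, no leading zeros (same alternatives as in the thread).
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

PRIVATE_RANGES = [
    re.compile(r"(\b192\.168(\." + OCTET + r"\b){2})"),
    re.compile(r"(\b10(\." + OCTET + r"\b){3})"),
    re.compile(r"(\b172\.(1[6-9]|2[0-9]|3[01])(\." + OCTET + r"\b){2})"),
]

def is_private(address: str) -> bool:
    """True if the whole string matches one of the private-range patterns."""
    # fullmatch anchors the pattern to the entire string (my assumption).
    return any(p.fullmatch(address) is not None for p in PRIVATE_RANGES)

should_not_match = [
    "9.255.255.255", "10.0.0.00", "147.168.255.255", "172.32.255.255",
    "172.14.255.255", "192.168.1.256", "192.27.255.255",
]
should_match = ["10.0.0.0", "172.16.89.125", "192.168.1.1", "192.168.1.255"]

false_positives = [a for a in should_not_match if is_private(a)]
false_negatives = [a for a in should_match if not is_private(a)]
```

Both result lists come out empty for these samples, so under those assumptions the patterns agree with the lists.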
Now to test a fixed number such as 127 I imagine that's just a string?