I'm struggling to get to grips with the arcane ways of regular expressions and an assortment of things in the workings of TW plugins. I know once cracked it'll be easy, but in the meantime...
I also suspect it's a darn sight easier to construct a Reg Exp when you know what you want to do as opposed to deconstructing one to see what it does.
And so on to the IntelliTaggerPlugin [edit] feature
The ability to edit tags without needing to edit the tiddler is very appealing, as is (are?) TagglyTags
I'd like to be able to incorporate IT's [edit] in the tag list style used by QuickOpenTagsPlugin.
It's kind-of do-able by replacing the TagglyTags macro hideSomeTags by "tags" but I'm not sure if that will cause unwanted side effects. I've not spotted the tags that it is supposed to hide but it'd be better to keep to the original where possible.
Also I'm not happy with having [edit] alongside the tags; it takes up space and feels wrong.
It'd be better (IMO) if it could appear at the top right of the blue drop-down panel (not asking for much ;)
Getting this to happen is completely defeating me as I cannot yet follow how Intellitags inserts its [edit] in the original Tags box.
You will find it is called within the overloaded config.macros.tags.handler function, to add the "[edit]" button behind the items already created by the original tags.handler function. It is possible to add this button at other places (e.g. to the right of the "tags:" text), but most of these locations require some assumption about the structure of the list that I didn't want to rely on. But of course you may modify the code and place the button at other locations if you like. Just call the createEditTagsButton function and pass the proper place.
On 6/16/06, Chris Lawley <ch...@art-en-soul.cix.co.uk> wrote:
> I also suspect it's a darn sight easier to construct a Reg Exp when you know what you want to do as opposed to deconstructing one to see what it does.
It's sometimes possible to translate one to english though. If you post the regex here I'll have a go.. hopefully it'll be fun and educational for both of us :)
;Daniel
-- Daniel Baird http://danielbaird.com (TiddlyW;nks! :: Whiteboard Koala :: Blog :: Things That Suck)
Getting OT here but.. I was curious about this. It seems insane. The page doesn't mention that the re doesn't appear in the code. Here is the actual code which constructs it. I bet some of the regexps in the pre-2.0 wikifier would have been similarly long and unreadable.
# Preloaded methods go here.
my $lwsp = "(?:(?:\\r\\n)?[ \\t])";

my $char = '[\\000-\\177]';

sub make_rfc822re {
    # Basic lexical tokens are specials, domain_literal, quoted_string, atom, and
    # comment. We must allow for lwsp (or comments) after each of these.
    # This regexp will only work on addresses which have had comments stripped
    # and replaced with lwsp.

    my $specials = '()<>@,;:\\\\".\\[\\]';
    my $controls = '\\000-\\037\\177';

    my $dtext = "[^\\[\\]\\r\\\\]";
    my $domain_literal = "\\[(?:$dtext|\\\\.)*\\]$lwsp*";

    my $quoted_string = "\"(?:[^\\\"\\r\\\\]|\\\\.|$lwsp)*\"$lwsp*";

    # Use zero-width assertion to spot the limit of an atom. A simple
    # $lwsp* causes the regexp engine to hang occasionally.
    my $atom = "[^$specials $controls]+(?:$lwsp+|\\Z|(?=[\\[\"$specials]))";
    my $word = "(?:$atom|$quoted_string)";
    my $localpart = "$word(?:\\.$lwsp*$word)*";

    my $sub_domain = "(?:$atom|$domain_literal)";
    my $domain = "$sub_domain(?:\\.$lwsp*$sub_domain)*";

    my $addr_spec = "$localpart\@$lwsp*$domain";

    my $phrase = "$word*";
    my $route = "(?:\@$domain(?:,\@$lwsp*$domain)*:$lwsp*)";
    my $route_addr = "\\<$lwsp*$route?$addr_spec\\>$lwsp*";
    my $mailbox = "(?:$addr_spec|$phrase$route_addr)";

    my $group = "$phrase:$lwsp*(?:$mailbox(?:,\\s*$mailbox)*)?;\\s*";
    my $address = "(?:$mailbox|$group)";
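The same build-it-from-named-pieces trick carries over to JavaScript. A minimal sketch, assuming a deliberately simplified address grammar: the names atom, localPart, domain and addrSpec are made up for illustration, and this is nowhere near a real RFC 822 matcher.

```javascript
// Small, named building blocks, kept as strings so they can be nested.
const atom = "[A-Za-z0-9_-]+";               // one "word" (simplified, NOT RFC 822)
const localPart = `${atom}(?:\\.${atom})*`;  // words joined by dots
const domain = `${atom}(?:\\.${atom})+`;     // requires at least one dot

// The long regex only ever exists at runtime, never in the source file.
const addrSpec = new RegExp(`^${localPart}@${domain}$`);

console.log(addrSpec.source);
console.log(addrSpec.test("daniel.baird@example.com")); // true
console.log(addrSpec.test("not-an-address"));           // false
```

Each piece stays short enough to read on its own, which is presumably what keeps the real thing maintainable.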
Ahh, that explains how anyone could maintain that insane regex.
;D
> It's sometimes possible to translate one to english though. If you post the regex here I'll have a go.. hopefully it'll be fun and educational for both of us :)
Well I _was_ hoping to have a laugh at your cartoons of funny, humpless camels <sigh> it just does a 'white page' on me :( or is it "albino in the snow"?
It'd help to have some context; it's from NewerTiddlerPlugin, line 69.
I'm sure it's intended to split the function's supplied parameters into an array (fairly obvious from the context). Where it gets tricksy is when someone who hasn't looked seriously at JavaScript for several years wants to amend it!
chris :-) (being told he should get out and cut the grass)
The only chance to analyse such a RegEx is to split it up, e.g. like this:
all: / A /g;
A: B (?::B2|:C(D*(?:\\.E)*)C)?(?=\s|$)
B: ([^:\'\"\s]+) A sequence of one or more characters that are neither :, ', " nor whitespace
B2: ([^\'\":\s]+) (ditto)
C: [\'\"] Either ' or "
D: [^\'\"\\] A single character that is neither ' nor " nor \
E: [^\'\"\\]* A sequence of zero or more characters that are neither ' nor " nor \
?: identifies a "non-capturing group". I.e. typically every text matched by a (...) expression is "captured" and can be retrieved individually in the result of the match. For a (...) that doesn't need this feature one makes it "non-capturing".
When analysing a RE one can ignore the ?: (in the first pass).
I.e. the expression looks like this
A2: B (:B2|:C(D*(\\.E)*)C)?(?=\s|$)
?= identifies a "positive lookahead". In our case this means behind the term we are looking for there must either be a whitespace (that is not matched) or the end of the text.
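A quick way to see both features in action. This is only a sketch with made-up patterns and inputs, not anything taken from the plugin:

```javascript
// ( ... ) groups AND captures; (?: ... ) only groups.
const capturing = /(ab)+(c)/.exec("ababc");
const nonCapturing = /(?:ab)+(c)/.exec("ababc");
console.log(capturing.length);    // 3: whole match, "ab", "c"
console.log(nonCapturing.length); // 2: whole match, "c"

// (?= ... ) checks what follows without making it part of the match.
const word = /tag(?=\s|$)/;
console.log(word.exec("tag next")[0]); // "tag" (the space is not included)
console.log(word.test("tagged"));      // false: "tag" is followed by "g"
```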
So now looking at the result
A3: B (:B2|:C(D*(\\.E)*)C)?

one would say in English:

* match a word (not containing any ', ", : or whitespace), optionally followed by a colon (:) and a value.
* Behind that colon there is either another word (not containing any ', ", : or whitespace) or a quote (' or ") followed by text (without ', " or \) and a sequence of zero or more "backslash, any character, text" groups (so an escaped character such as \" may appear inside the quotes), all terminated with a quote
Behind that match there must be a whitespace or the end of the text.
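To check the translation, the pieces can be glued back together and run. A sketch only: the regex below is reassembled from B, B2, C, D and E as listed above (assuming the spaces in "A" were just there for readability), and the parameter string is invented.

```javascript
// Reassembled from the breakdown; not copied from the plugin source.
const paramRegex = /([^:'"\s]+)(?::([^'":\s]+)|:['"]([^'"\\]*(?:\\.[^'"\\]*)*)['"])?(?=\s|$)/g;

// Hypothetical parameter string, made up for illustration.
const params = String.raw`name:value title:"two words" note:'say \"hi\"' plain`;

for (const m of params.matchAll(paramRegex)) {
  // m[1] = word before the colon, m[2] = unquoted value, m[3] = quoted value
  console.log(m[1], "=>", m[2] || m[3] || "(no value)");
}
```

The note entry is the interesting one: the \\. part lets a backslash-escaped quote sit inside the quoted value without ending it.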
On 6/18/06, Udo Borkowski <Udo.Borkow...@gmx.de> wrote:
> > taken all the fun out of it.
> Sorry, Daniel, I forgot....
> Next time...
> Or maybe: What about this:
> new RegExp("\\[([<]{0,1})([>]{0,1})[Ii][Mm][Gg]\\[(?:([^\\|\\]]+)\\|)?([^\\[\\]\\|]+)\\](?:\\[([^\\]]*)\\]?)?(\\])")
Sweet!
I remember reading through some worked regex-to-english translations when I was learning regexps, and they were very helpful. So here's me giving back to the internet community:
Firstly, the regex is written in a Javascript string. When you type certain chars in a JS string they need to be escaped: you write \\ to get a \, you write \" to get a ", etc. So the first thing to do is to un-escape the string, so we can see the actual regex itself:
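The un-escaping can be left to JavaScript itself. A small sketch, assuming the stray space inside the third character class of the quoted string was only a line-wrap artifact (the later breakdown shows that class without a space):

```javascript
// The string as quoted in the message, with JavaScript string escaping.
const escaped = "\\[([<]{0,1})([>]{0,1})[Ii][Mm][Gg]\\[(?:([^\\|\\]]+)\\|)?([^\\[\\]\\|]+)\\](?:\\[([^\\]]*)\\]?)?(\\])";

// Printing the string shows what the regex engine actually receives:
// every \\ in the source has collapsed to a single \.
console.log(escaped);
// \[([<]{0,1})([>]{0,1})[Ii][Mm][Gg]\[(?:([^\|\]]+)\|)?([^\[\]\|]+)\](?:\[([^\]]*)\]?)?(\])
```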
As Udo mentions in a previous message, brackets in a regex capture their contents so that you can use them later, but starting the brackets with (?: turns off capturing. We don't care about capturing at the moment, so we can just remove ?: following a (.
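Stripping the ?: markers is mechanical enough to script. A sketch for reading purposes only, since the stripped version numbers its groups differently and should not replace the real regex:

```javascript
// The un-escaped regex text, written with String.raw so backslashes stay as typed.
const pattern = String.raw`\[([<]{0,1})([>]{0,1})[Ii][Mm][Gg]\[(?:([^\|\]]+)\|)?([^\[\]\|]+)\](?:\[([^\]]*)\]?)?(\])`;

// Turn every "(?:" into a plain "(" to make the structure easier to read.
const simplified = pattern.replace(/\(\?:/g, "(");

console.log(simplified);
// \[([<]{0,1})([>]{0,1})[Ii][Mm][Gg]\[(([^\|\]]+)\|)?([^\[\]\|]+)\](\[([^\]]*)\]?)?(\])
```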
Now try translating some of the bracketed groups into english.
- chars inside square brackets like [this] mean "any one of these chars". So, [Aa] means either a capital "A" or lower case "a", and [Ii][Mm][Gg] means any way you can write "img" in caps or lowercase.
- numbers in curly brackets after something mean some number of repetitions of that thing, so a{3,5} means either three, four or five "a"s in a row (aaa, aaaa, or aaaaa). So, [<]{0,1} means zero or one less-than sign.. or to put it another way, an optional <.
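Both rules are easy to try out. A sketch with throwaway patterns (the ^ and $ anchors are added here just so each test looks at the whole string):

```javascript
// [Ii][Mm][Gg]: one char from each set, so any mix of upper and lower case.
const anyCaseImg = /^[Ii][Mm][Gg]$/;
console.log(["img", "IMG", "iMg"].every((s) => anyCaseImg.test(s))); // true
console.log(anyCaseImg.test("ing")); // false

// a{3,5}: between three and five "a"s.
const threeToFive = /^a{3,5}$/;
console.log(["aa", "aaa", "aaaaa", "aaaaaa"].map((s) => threeToFive.test(s)));
// [ false, true, true, false ]

// [<]{0,1}: an optional <.
const optionalLt = /^[<]{0,1}img$/;
console.log(optionalLt.test("img"), optionalLt.test("<img"), optionalLt.test("<<img"));
// true true false
```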
At this point it's clear that we're looking at tiddlywiki's img tag. We have something like this:
[img[ stuff morestuff ] otherstuff ]
So concentrating on the "stuff" for now:
(([^\|\]]+)\|) ?
- square brackets normally mean "one of these chars", but a caret ^ at the start changes the meaning to "anything BUT one of these chars". So this bit [^\|\]] means "not a pipe | and not an end square bracket ]" (the pipe and end bracket are each escaped with a preceding backslash).
(( not | or ]+)\|) ?
- a plus following a thing means one or more of that thing. So [^\|\]]+ means "one or more chars that aren't | or ]".
( one or more chars that are not | or ] \| ) ?
- a question mark following a thing means zero or one of that thing (or to put it another way, an optional thing).
- instead of saying "one or more things that aren't | or ]" let's just say "a phrase". So, our final translation of "stuff" is:
optionally, a phrase followed by a |
"morestuff" is quite similar.
([^\[\]\|]+)
this is a square bracketed list with a caret, followed by a plus, so that translates to "one or more chars that aren't [ or ] or |".. or put slightly more loosely:
a phrase
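Here are "stuff" and "morestuff" on their own, as a sketch. The anchors and the sample text are additions for the demo, and the non-capturing (?: is kept so the group numbers stay simple:

```javascript
// "stuff" (optional phrase plus |) followed by "morestuff" (a phrase).
const stuffAndMore = /^(?:([^\|\]]+)\|)?([^\[\]\|]+)$/;

const withPipe = stuffAndMore.exec("A cat|cat.png");
console.log(withPipe[1], "/", withPipe[2]);       // A cat / cat.png

const withoutPipe = stuffAndMore.exec("cat.png");
console.log(withoutPipe[1], "/", withoutPipe[2]); // undefined / cat.png
```

When there is no | the optional group simply matches nothing, and the whole text lands in the second phrase.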
Okay one more bit to go before we can reassemble. "otherstuff" is:
(\[([^\]]*)\]?) ?
- an asterisk after a thing means any number of that thing, including zero. So [^\]]* means any number of chars that aren't ]. So we end up with:
optionally ( a [, any number of chars but not ], followed by an optional ] )
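The same kind of check for "otherstuff", again only a sketch with anchors and sample inputs added for the demo:

```javascript
// "otherstuff": optionally ( a [, chars that aren't ], an optional ] )
const otherStuff = /^(?:\[([^\]]*)\]?)?$/;

for (const s of ["[CatPage]", "[CatPage", ""]) {
  const m = otherStuff.exec(s);
  console.log(JSON.stringify(s), "->", m ? m[1] : "no match");
}
```

All three inputs match. The first two capture CatPage, with or without the closing ], and the empty string matches with nothing captured.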
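Putting the pieces back together, the whole regex reads:

a [
optionally, a <
optionally, a >
the letters "img", in any mix of capitals and lower case
a [
optionally, a phrase not including | or ], ending with a |
a phrase not including [ or ] or |
a ]
optionally, a [ followed by an optional phrase not including ], followed by an optional ]
a ]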
..and that's it.
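To see the breakdown line up with the captured groups, here is a sketch that runs the regex on two made-up tags. The field names (left, right, title, source, link) are guesses at what each group is for, not names from the TiddlyWiki source, and describeImg is a hypothetical helper:

```javascript
const imgRegex = new RegExp("\\[([<]{0,1})([>]{0,1})[Ii][Mm][Gg]\\[(?:([^\\|\\]]+)\\|)?([^\\[\\]\\|]+)\\](?:\\[([^\\]]*)\\]?)?(\\])");

// Hypothetical helper: label the captured groups with guessed meanings.
function describeImg(text) {
  const m = imgRegex.exec(text);
  if (!m) return null;
  return {
    left: m[1] === "<",   // optional <
    right: m[2] === ">",  // optional >
    title: m[3],          // phrase before the |, if any
    source: m[4],         // the phrase that is always required
    link: m[5],           // contents of the optional second [...]
  };
}

console.log(describeImg("[img[photo.jpg]]"));
console.log(describeImg("[<img[A cat|cat.png][CatPage]]"));
console.log(describeImg("just some text")); // null
```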
Now, I'm pretty sure I've got something wrong in there, because I think the optional trailing ] on the second last line of my final breakdown would allow wrong syntax for the img tag. I should probably review my working, but after spending an hour on this my wife is nagging me to mow the lawn :)
Cheers all
;Daniel
> Now, I'm pretty sure I've got something wrong in there, because i think the optional trailing ] on the second last line of my final breakdown would allow wrong syntax for the img tag.
GOOD JOB! And you found a bug!!
You did nothing wrong. The RE is covering the wrong syntax. I.e. the following lines are both possible, but the second one is probably not intended.
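The example lines themselves did not survive in this archive. A sketch of what such a pair could look like, with made-up file and tiddler names, plus a hypothetical stricter variant that drops the ? after the link's closing ]:

```javascript
// The regex as posted (un-escaped form).
const original = /\[([<]{0,1})([>]{0,1})[Ii][Mm][Gg]\[(?:([^\|\]]+)\|)?([^\[\]\|]+)\](?:\[([^\]]*)\]?)?(\])/;

// Hypothetical fix, not taken from any TiddlyWiki release:
// the ] closing the link part is no longer optional.
const stricter = /\[([<]{0,1})([>]{0,1})[Ii][Mm][Gg]\[(?:([^\|\]]+)\|)?([^\[\]\|]+)\](?:\[([^\]]*)\])?(\])/;

const wellFormed = "[img[cat.png][CatPage]]";
const malformed = "[img[cat.png][CatPage]"; // one closing ] short

console.log(original.test(wellFormed), original.test(malformed)); // true true
console.log(stricter.test(wellFormed), stricter.test(malformed)); // true false
```

In the original, the optional ] lets the final (\]) borrow the bracket that should have closed the link part, so the tag with one ] missing still matches.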
Actually, I'd rather keep Trac focused on the core code and the cooking tools for the moment. We will be opening up Subversion to third party plugins, and that would be the time to start tracking issues for them.
On 6/19/06, Jeremy Ruston <jeremy.rus...@gmail.com> wrote:
> > Daniel: I suggest that you post this as a bug ticket in Trac (i.e. the second last ? is too much).
> > (http://trac.tiddlywiki.org/tiddlywiki)
> Actually, I'd rather keep Trac focused on the core code and the cooking tools for the moment. We will be opening up Subversion to third party plugins, and that would be the time to start tracking issues for them.
I'm confused now.. I thought that regex was from the standard wikifier? I never did get the lawn mowed, I'd like to think all that work was helping TiddlyWiki in some way ;)
And, I'm now certain this thread should be on the dev list, not the main list!
;Daniel