RegEx for matching a string of tags...


David Szego

Feb 21, 2012, 9:54:56 PM
to TiddlyWiki
Hi all, if anyone needs a RegEx to take a string full of tags and
break it into an array of valid tags, using:

var tagList = tagString.match(/(\[{2}(\w+(\.?|\S?)(\s?)(\w*))+\]{2})|(\[(\w+((\S)\w+)*(\S?)))|(\w+((\S)\w+)*(\S?))/gi);

works on a complex example string like:

Tag1 [[another tag]] moretags tag.one [[her's]] paul's [[david's
meeting]][[feb. 16th asdf]] something [[somethingelse]] somethingelse?
[[something's got's to. No? give!]] period. [[three word tag]]
[wrongtag] wonder's [[[three brackets]]] [one bracket] these's
those.twice.again periods.twice.

tagList[0] = "Tag1"
tagList[1] = "[[another tag]]"
tagList[2] = "moretags"
tagList[3] = "tag.one"
tagList[4] = "[[her's]]"
tagList[5] = "paul's"
tagList[6] = "[[david's meeting]]"
tagList[7] = "[[feb. 16th asdf]]"
tagList[8] = "something"
tagList[9] = "[[somethingelse]]"
tagList[10] = "somethingelse?"
tagList[11] = "[[something's got's to. No? give!]]"
tagList[12] = "period."
tagList[13] = "[[three word tag]]"
tagList[14] = "[wrongtag]"
tagList[15] = "wonder's"
tagList[16] = "[[three brackets]]"
tagList[17] = "[one"
tagList[18] = "bracket]"
tagList[19] = "these's"
tagList[20] = "those.twice.again"
tagList[21] = "periods.twice."

Note that the "[one bracket]" gets parsed to two separate tags, "[one"
and "bracket]" which is how TW would see it.
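
For anyone who wants to try this outside a plugin, here is a minimal sketch that wraps the expression above in a function. The name splitTags and the short sample string are only for illustration; the regex itself is the one from this post, joined onto one line.

```javascript
// The expression from the post, written on a single line.
const TAG_PATTERN = /(\[{2}(\w+(\.?|\S?)(\s?)(\w*))+\]{2})|(\[(\w+((\S)\w+)*(\S?)))|(\w+((\S)\w+)*(\S?))/gi;

// splitTags is a hypothetical helper name, not part of TiddlyWiki.
// match() returns null when nothing matches, so fall back to an empty array.
function splitTags(tagString) {
  return tagString.match(TAG_PATTERN) || [];
}

const tagList = splitTags("Tag1 [[another tag]] [one bracket]");
console.log(tagList);
// → [ 'Tag1', '[[another tag]]', '[one', 'bracket]' ]
```

Note that the double brackets stay in the matched strings, so they would still need to be stripped before using the values as tag names.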

Oh, and thus is fixed a bug in my mGSDMeetingEnhancementsPlugin!
(http://thinkcreatesolve.biz/mGSDEnhancements.html - shameless plug!)

Cheers,
David Szego

Eric Shulman

Feb 21, 2012, 11:15:01 PM
to TiddlyWiki
> Hi all, if anyone needs a RegEx to take a string full of tags and
> break it into an array of valid tags, using:
>     var tagList = tagString.match(/(\[{2}(\w+(\.?|\S?)(\s?)(\w*))+\]{2})|(\[(\w+((\S)\w+)*(\S?)))|(\w+((\S)\w+)*(\S?))/gi);
>
> works on a complex example string like:
>
>     Tag1 [[another tag]] moretags tag.one [[her's]] paul's [[david's
> meeting]][[feb. 16th asdf]] something [[somethingelse]] somethingelse?
> [[something's got's to. No? give!]] period. [[three word tag]]
> [wrongtag] wonder's [[[three brackets]]] [one bracket] these's
> those.twice.again periods.twice.

Interesting work on the regexp. I like the idea of parsing the tags
string with regexp because it is very efficient. However, the TWCore
defines a string method, .readBracketedList(), which is used to parse
tag string input and populate the internal tiddler data structure in
the 'store'. Thus, you would write:
var tagList = tagString.readBracketedList();
to get the actual tags that the core will use. You should do a little
comparison to make sure your regexp produces the same results.
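
A minimal sketch of how that call might be wrapped so the same code also runs outside TiddlyWiki. The name parseTags and the fallback regex are assumptions made for illustration: the fallback is a simplified stand-in, not the core's implementation, and it assumes the enclosing double brackets should be dropped from the returned names.

```javascript
// parseTags is a hypothetical wrapper, not a TiddlyWiki function.
function parseTags(tagString) {
  // Inside TiddlyWiki the core adds readBracketedList() to strings; prefer it when present.
  if (typeof tagString.readBracketedList === "function") {
    return tagString.readBracketedList();
  }
  // Simplified stand-in: either [[anything up to the next ]]]] or a run of non-space characters.
  const pattern = /\[\[(.*?)\]\]|(\S+)/g;
  const tags = [];
  let m;
  while ((m = pattern.exec(tagString)) !== null) {
    // Group 1 holds the text inside [[...]], group 2 a bare word.
    tags.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return tags;
}

const parsed = parseTags("Tag1 [[another tag]] moretags");
console.log(parsed);
// Outside TiddlyWiki (fallback path) → [ 'Tag1', 'another tag', 'moretags' ]
```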

enjoy,
-e
Eric Shulman
TiddlyTools / ELS Design Studios

PMario

Feb 22, 2012, 3:09:51 AM
to TiddlyWiki
TW interprets
[[[three brackets]]]
like
tag[0] = [[[three brackets]] shown as [three brackets
tag[1] = ] shown as ]

see your
> tagList[16] = "[[three brackets]]"
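
To make that difference easy to spot, here is a small sketch that compares two tag lists. The helper names stripBrackets and diffTagLists are made up for illustration, and the second array is typed in by hand from the TW result described above rather than produced by the core.

```javascript
// Drop one pair of enclosing double brackets so "[[a b]]" compares equal to "a b".
function stripBrackets(tag) {
  const m = /^\[\[(.*)\]\]$/.exec(tag);
  return m ? m[1] : tag;
}

// Report the tags that appear in only one of the two lists.
function diffTagLists(regexTags, coreTags) {
  const a = regexTags.map(stripBrackets);
  const b = coreTags.map(stripBrackets);
  return {
    onlyInRegex: a.filter(function (t) { return !b.includes(t); }),
    onlyInCore: b.filter(function (t) { return !a.includes(t); })
  };
}

// Regex result for [[[three brackets]]] versus the two tags TW is reported to see.
const diff = diffTagLists(["[[three brackets]]"], ["[three brackets", "]"]);
console.log(diff);
// → { onlyInRegex: [ 'three brackets' ], onlyInCore: [ '[three brackets', ']' ] }
```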

As Eric said: there is a core function that does the same. The core
extends the string prototype [1] with readBracketedList(unique). So if
you set unique to true, duplicate tags are filtered out.
readBracketedList calls this.parseParams() [2] which does the regexp
stuff.
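
As a rough picture of what the unique flag does, here is a sketch that removes repeated tags while keeping the first occurrence of each. uniqueTags is a hypothetical name and only approximates the effect; the core's own code is at the links below.

```javascript
// Keep the first occurrence of each tag and drop later repeats.
function uniqueTags(tags) {
  const seen = new Set();
  const result = [];
  for (const tag of tags) {
    if (!seen.has(tag)) {
      seen.add(tag);
      result.push(tag);
    }
  }
  return result;
}

const deduped = uniqueTags(["something", "another tag", "something"]);
console.log(deduped);
// → [ 'something', 'another tag' ]
```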

-m

[1] https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/Strings.js#L184
[2] https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/Strings.js#L96

Alex Hough

Feb 22, 2012, 3:39:13 AM
to tiddl...@googlegroups.com
Mario,

I like the way you have linked to the lines of code in GitHub, I think
it is a really useful practice and an opportunity to direct people
towards the code.


It is evident that Strings.js is the home of many functions for
manipulating strings. As it says at the top of the file ...

//-- Augmented methods for the JavaScript String() object


Intermediate-level TiddlyScholars might want to look at the JavaScript
String object [1] and the String prototype property [2]

thanks for starting a mini learning loop (TiddlyLearningLoop?) for me ;)

Alex
[1] http://www.w3schools.com/jsref/jsref_obj_string.asp
[2] http://www.w3schools.com/jsref/jsref_prototype_string.asp


David Szego

Feb 22, 2012, 10:24:50 AM
to tiddl...@googlegroups.com
Thanks for the readBracketedList tip! Learn something new every day!

David Szego

Feb 28, 2012, 3:25:27 PM
to TiddlyWiki
FYI, I noticed my regex didn't handle parentheses nicely, so I added
some more characters to it.

tags.match(/(\[{2}((\w*|\.*|\s?|\?*|\!*\(*|\)*|\'*\"*\&*)\s?)+\]{2})|((\w*|\.*|\?*|\!*\(*|\)*|\'*\"*\&*)+)/gi)

now works well on a set of tags like this:

Tag1 (Something) (something2 something3) [[Something (else)]]
[[(something else)]] [[(something)else (no?)]][[another tag]] moretags
tag.one [[her's]] paul's [[david's meeting]][[feb. 16th asdf]]
something [[somethingelse]] somethingelse? "this" us&them
[[something's got's to. No? give!]] period. [[three word tag]]
[wrongtag] wonder's [[[three brackets]]] [one bracket] these's
those.twice.again periods.twice.

Separates into:

Tag1
(Something)
(something2
something3)
[[Something (else)]]
[[(something else)]]
[[(something)else (no?)]]
[[another tag]]
moretags
tag.one
[[her's]]
paul's
[[david's meeting]]
[[feb. 16th asdf]]
something
[[somethingelse]]
somethingelse?
"this"
us&them
[[something's got's to. No? give!]]
period.
[[three word tag]]
wrongtag
wonder's
[[three brackets]]
one
bracket
these's
those.twice.again
periods.twice.

In fact, it parses better than readBracketedList! So, give it a try!

Cheers,
David.