URL pattern: false positives


FND

Jul 8, 2007, 6:14:59 AM
to Tiddly...@googlegroups.com
Hey all,

The URL RegEx pattern rather often creates false positives - for
example, "foo:<br>bar" will be rendered as a link.
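
A minimal sketch of how such a false positive can arise. The pattern below is a hypothetical, deliberately loose one ("some letters, a colon, then anything up to whitespace"); it is not the actual core regexp, and the name `looseUrlPattern` is made up for illustration.

```javascript
// Hypothetical loose pattern, NOT the real TiddlyWiki core regexp:
// any run of letters, a colon, then everything up to the next whitespace.
var looseUrlPattern = /[a-z]+:[^\s]+/i;

// Ordinary text with a colon in it is treated as "scheme:rest".
var falsePositive = looseUrlPattern.exec("foo:<br>bar");

// A genuine URL is matched by exactly the same rule.
var realLink = looseUrlPattern.exec("see http://www.tiddlywiki.com for details");

console.log(falsePositive && falsePositive[0]);
console.log(realLink && realLink[0]);
```

Under this assumed pattern both inputs match, which is why a space after the colon is enough to suppress the link.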

Isn't there a standardized RegEx for such a frequently-used pattern?
I've checked RegExLib.com, but that's not really a great resource. There
also seems to be a "Regexp::Common" library for Perl, but I'm not sure
whether those patterns could be used in JavaScript.

Is anyone familiar with this sort of thing? If there's a proven,
JS-compatible RegEx for URLs (or URIs), it should be easy to integrate
that into the core...


-- F.

Jeremy Ruston

Jul 9, 2007, 3:48:07 PM
to Tiddly...@googlegroups.com
There have been a few twists and turns to the URL regexp that we use -
I believe the current version was contributed by Martin, but I could
be wrong.

Generally, the motivation behind the changes has been to accommodate a
wider range of protocols (e.g. skype and outlook). I share your desire
for a standardised format, but I don't think it exists. I suspect that
we might be better off making the regexp more conservative and
specific, but perhaps providing some new markup to explicitly force an
ambiguous external link to be recognised.
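
A rough sketch of what a more conservative, more specific pattern could look like. The scheme list, the exact scheme spellings, and the names `conservativeUrlPattern` and `findAutoLinks` are assumptions for illustration, not the core implementation; anything outside the list would then need explicit markup to become a link.

```javascript
// Assumed whitelist: hierarchical schemes must be followed by "//",
// the remaining schemes only by a colon. Which schemes belong here
// is a project decision, not something this sketch settles.
var conservativeUrlPattern =
  /\b(?:(?:https?|ftp|file):\/\/|(?:mailto|skype|outlook):)[^\s<>"]+/gi;

// Return every substring that would be auto-linked, or an empty array.
function findAutoLinks(text) {
  return text.match(conservativeUrlPattern) || [];
}

console.log(findAutoLinks("foo:<br>bar"));
console.log(findAutoLinks("Visit http://www.tiddlywiki.com or skype:echo123?call"));
```

With this whitelist, "foo:<br>bar" yields no links, while the http and skype examples are still picked up.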

Cheers

Jeremy


--
Jeremy Ruston
mailto:jer...@osmosoft.com
http://www.tiddlywiki.com

FND

Jul 9, 2007, 4:28:30 PM
to Tiddly...@googlegroups.com
> There have been a few twists and turns to the URL regexp that we use -
> I believe the current version was contributed by Martin, but I could
> be wrong.

Shifting the blame, eh? ;)

> we might be better off making the regexp more conservative and
> specific, but perhaps providing some new markup to explicitly force an
> ambiguous external link to be recognised.

Actually, I don't think this is a huge problem (it's usually easy to
prevent linkification by adding a simple space), so that effort might
rather be invested in fixing more important issues.
However, we might keep this in mind for when we run out of things that
need fixing...


-- F.

Udo Borkowski

Jul 9, 2007, 6:27:06 PM
to Tiddly...@googlegroups.com
If anybody is interested in writing the RegExp for URIs / URLs, have a look at the syntax definition in the URI RFC, "Universal Resource Identifiers in WWW" by T. Berners-Lee (http://www.ietf.org/rfc/rfc1630.txt, pages 22ff).
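
For reference, the later URI specification, RFC 3986, gives a regular expression in its Appendix B for splitting an already-isolated URI into components. The sketch below wraps that expression in a helper; the name `splitUri` and the shape of the returned object are assumptions for illustration. Note that it parses a URI, it does not find URIs inside running text.

```javascript
// Component-splitting expression as given in RFC 3986, Appendix B.
var uriParts = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Every group is optional, so exec() always returns a match object.
function splitUri(uri) {
  var m = uriParts.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

console.log(splitUri("http://www.ietf.org/rfc/rfc1630.txt"));
console.log(splitUri("foo:<br>bar"));
```

The second call reports the scheme "foo" and the path "<br>bar", which suggests that following the generic syntax alone would not remove the false positive: the grammar accepts almost any scheme name.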

Udo

----------
Udo Borkowski
http://www.abego-software.de





Paul Petterson

Jul 11, 2007, 4:57:18 PM
to Tiddly...@googlegroups.com
I've used one in the past that was very good with many of the standard protocols - but the actual protocols were also hard-coded into the expression. 
 
I should point out, since I'm replying to Udo's message on the URI standard, that this expression doesn't follow the standard strictly. One rather nice 'feature' of the expression was that it forced the final character of the URL to be an alphanumeric or special character, excluding standard punctuation. So you could embed a URL in text, and if it was followed by a period, semicolon, etc., that last character was excluded.
 
But to use it - you really need a set of protocols or it chokes.
 
We could extract the protocols into a separate list, stored in config and easily updated?
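
A sketch of how that might look, assuming a hypothetical config array `linkProtocols` and a hypothetical helper `buildUrlPattern`; neither exists in the core. This is not the expression described above, only an illustration of the two ideas: the pattern is built from the list, and its last character may not be common punctuation.

```javascript
// Hypothetical config entry; the name and the contents are assumptions.
var linkProtocols = ["http", "https", "ftp", "mailto"];

// Escape characters that have a special meaning inside a regexp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern from the protocol list. The body is any run of
// non-space characters; the final character must not be one of
// . , ; : ! ? ) ] so trailing sentence punctuation is left out.
function buildUrlPattern(protocols) {
  var schemes = protocols.map(escapeRegExp).join("|");
  return new RegExp(
    "\\b(?:" + schemes + "):[^\\s<>\"]*[^\\s<>\".,;:!?)\\]]",
    "gi"
  );
}

var urlPattern = buildUrlPattern(linkProtocols);
console.log("See http://www.tiddlywiki.com.".match(urlPattern));
console.log("foo:<br>bar".match(urlPattern));
```

The first call matches the URL without the closing period; the second returns null because "foo" is not in the list. Adding a protocol then means editing the array rather than the expression.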
 
Paul

 

Martin Budden

Jul 13, 2007, 7:06:05 AM
to Tiddly...@googlegroups.com
I've created a ticket for this, see:

http://trac.tiddlywiki.org/ticket/363

Martin

FND

Jul 13, 2007, 7:09:03 AM
to Tiddly...@googlegroups.com
> I've created a ticket for this, see:
> http://trac.tiddlywiki.org/ticket/363

Thanks Martin.
The URL doesn't work though; here's the proper one:
http://tinyurl.com/2qg635
(http://groups.google.com/group/TiddlyWikiDev/browse_thread/thread/19b3d75214600926)


-- F.

Martin Budden

Jul 13, 2007, 7:18:40 AM
to Tiddly...@googlegroups.com
Thanks, now fixed.

(I rather stupidly copied the link from my gmail client, which I use to browse the group).

Martin

FND

Jul 17, 2007, 8:04:14 AM
to Tiddly...@googlegroups.com
> Isn't there a standardized RegEx for such a frequently-used pattern?

While we're at it; what about e-mail addresses? Shouldn't those be
transformed automatically as well, even without the "mailto:" prefix?
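
A minimal sketch of what detecting bare addresses might involve. The pattern is deliberately simple and is an assumption, not a proposal for the core; real address syntax is considerably broader, and the names `emailPattern` and `findEmails` are made up for illustration.

```javascript
// Simplified pattern: a local part, "@", then at least two
// dot-separated domain labels. Many valid addresses are not covered.
var emailPattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\b/g;

// Return every address-like substring, or an empty array.
function findEmails(text) {
  return text.match(emailPattern) || [];
}

console.log(findEmails("Contact jane@example.org or bob.smith@mail.example.com."));
console.log(findEmails("foo:<br>bar"));
```

The sentence-ending period after the second address is left out of the match, and text without an "@" produces no matches.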

(This might be slightly off-topic, but it's probably not worth starting
a new thread for... )


-- F.
