Unicode URIs


chris...@gmail.com

Dec 7, 2010, 10:49:18 AM
to WikklyText
The CATCH_URL regexp only accepts ASCII in the hostnames of URLs, so
something like:

http://➔.ws/ෟ

is interpreted as the start of an unclosed italic phrase.
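
To illustrate the failure mode, here is a minimal sketch. The ASCII_URL pattern below is a hypothetical stand-in written for this example, not the actual CATCH_URL definition from WikklyText. It only shows how an ASCII-only hostname class rejects the arrow URL, which would leave the "//" after "http:" free to be read as TiddlyWiki italic markup.

```python
import re

# Hypothetical ASCII-only URL pattern, for illustration only.
# This is NOT the real CATCH_URL regexp from WikklyText.
ASCII_URL = re.compile(r"https?://[A-Za-z0-9.-]+(?:/\S*)?")

plain = "http://tinyarro.ws/"
arrow = "http://➔.ws/ෟ"

# The ASCII hostname matches as expected.
print(bool(ASCII_URL.match(plain)))  # True

# The hostname starts with a non-ASCII character, so nothing matches
# and a tokenizer would move on to other rules (such as italics).
print(bool(ASCII_URL.match(arrow)))  # False
```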

It seems that in these modern times the above URL is legit, since it
does resolve and is used by these guys http://tinyarro.ws/ for their
URL shortening service.

I guess the general problem here is that instead of CATCH_URL there
needs to be a CATCH_IRI. Some related Stack Overflow chatter:

http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url

I've not yet tried to cook up a patch, but will do some digging and
report back if I find something.

Frank McIngvale

Dec 7, 2010, 2:56:00 PM
to wikkl...@googlegroups.com
Ah, thanks. I was vaguely aware/afraid there might be some trouble there.

Speaking of which ... it's probably painfully obvious that I've had
little to no time to work on WikklyText for a while now. Just a
combination of WikklyText being fairly feature-complete as far as my
needs, plus the usual life distractions. Would you or anyone be
interested in me just sticking it out on e.g. github so it could get
patched without waiting on slow me? (I mention github since I use git
and github lets you do branching super easy, so that would be the
smoothest transition, but I'm open to alternatives.)

frank


chris...@gmail.com

Dec 7, 2010, 3:39:44 PM
to wikkl...@googlegroups.com
On Tuesday, December 7, 2010 7:56:00 PM UTC, Frank McIngvale wrote:

> Speaking of which ... it's probably painfully obvious that I've had
> little to no time to work on WikklyText for a while now. Just a
> combination of WikklyText being fairly feature-complete as far as my
> needs, plus the usual life distractions. Would you or anyone be
> interested in me just sticking it out on e.g. github so it could get
> patched without waiting on slow me? (I mention github since I use git
> and github lets you do branching super easy, so that would be the
> smoothest transition, but I'm open to alternatives.)

Yeah, that would be great. GitHub would be best for me, as that's where all the TiddlyWeb etc. stuff is happening.

Just so you're aware, we're still only using the wikifier from WikklyText, and we keep bouncing around ideas for transcoding the TiddlyWiki JavaScript-based wikifier into something that can run server-side, perhaps with node.js or similar.

chris...@gmail.com

Jan 3, 2011, 10:05:33 AM
to WikklyText

Just an update on this. It's proving more difficult than I expected. I
had assumed that I could just change the t_CATCH_URL definition to use
\w, turn on the re.UNICODE flag, and voilà, I'd be in business.

Debugging, however, shows that this is not happening. An IRI is not
matching in the tokenizing phase.

I'm stymied.
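
One possible contributing factor, offered as a guess and not confirmed anywhere in this thread: \w only matches word characters (letters, digits, underscore), even with re.UNICODE, and "➔" is a symbol, not a letter. The sketch below uses Python 3 and two hypothetical patterns (WORD_URL and IRI_LIKE are names made up for this example, not the real t_CATCH_URL). The tokenizer problem may well lie elsewhere, for example in how the lexer compiles its rules or in byte strings versus unicode strings.

```python
import re
import unicodedata

url = "http://➔.ws/ෟ"

# Hypothetical \w-based pattern, NOT the real t_CATCH_URL.
# re.UNICODE is already the default for str patterns in Python 3;
# it is spelled out here only to mirror the attempt described above.
WORD_URL = re.compile(r"https?://[\w.-]+(?:/\S*)?", re.UNICODE)

# Hypothetical looser pattern: the host is any run of characters that
# are not whitespace and not one of the delimiters "/", "?", "#".
IRI_LIKE = re.compile(r"https?://[^\s/?#]+(?:[/?#]\S*)?")

# The arrow is in Unicode category "So" (Symbol, other), so \w skips it.
print(unicodedata.category("➔"))  # So
print(WORD_URL.match(url))         # None

# The looser pattern accepts the whole string.
print(IRI_LIKE.match(url).group(0) == url)  # True
```

The looser pattern is deliberately permissive: it would also swallow trailing punctuation in running text, so a real fix would likely need something stricter.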