[cl-ppcre-devel] Fwd: string length limit ?

Mario Maio
Apr 9, 2011, 11:41:51 AM
to cl-ppcr...@common-lisp.net
Sorry if this is a trivial issue; I'm a Common Lisp newbie.

If I apply the following very simple command (replacing one or more
consecutive CR chars with one LF char)

(cl-ppcre:regex-replace-all (concatenate 'string (string #\return) "+")
                            mystring
                            (string #\linefeed))

to my string of 455079 characters (loaded from a UTF-8 file), some of
the last #\return characters are not substituted, even though they
should be: if I apply the command again to the resulting string, they
ARE substituted.
It looks as if the search has some sort of length limit, or maybe there is a string-length mistake connected to the multi-byte character representation?

Cheers.

Mario


_______________________________________________
cl-ppcre-devel site list
cl-ppcr...@common-lisp.net
http://common-lisp.net/mailman/listinfo/cl-ppcre-devel

Edi Weitz
Apr 9, 2011, 12:13:09 PM
to General interest list about cl-ppcre and cl-unicode
This is an issue of the "This should not happen" variety. Certainly,
there is no such limit in CL-PPCRE. If you could provide us (i.e. the
mailing list) with a self-contained test case that demonstrates the
problem in a reproducible way, I'll look into it. Please also make
sure to let us know which Lisp on which OS you are using and which
version of CL-PPCRE.

Thanks,
Edi.
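A self-contained test case of the kind Edi asks for might look like the sketch below. It is an illustration only, not Mario's actual data: the helper names `make-cr-test-string` and `collapse-crs` are hypothetical, the input is a synthetic string of 600,000 characters (chosen to exceed the 455079 characters in the report), and it does not reproduce loading a UTF-8 file. It assumes CL-PPCRE is already loaded and prints the Lisp implementation and version, as Edi requested.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. with (ql:quickload "cl-ppcre").

(defun make-cr-test-string (n)
  "Hypothetical helper: N copies of \"line\", each followed by two #\\Return chars."
  (with-output-to-string (out)
    (loop repeat n
          do (write-string "line" out)
             (write-char #\Return out)
             (write-char #\Return out))))

(defun collapse-crs (string)
  "Hypothetical helper: replace every run of #\\Return characters with one #\\Linefeed."
  (cl-ppcre:regex-replace-all (concatenate 'string (string #\Return) "+")
                              string
                              (string #\Linefeed)))

(let* ((input (make-cr-test-string 100000))   ; 600,000 characters
       (output (collapse-crs input)))
  ;; If there were a length limit, some #\Return characters would survive.
  (format t "~&Lisp: ~A ~A~%Input length: ~D~%Output length: ~D~%Leftover #\\Return: ~D~%"
          (lisp-implementation-type) (lisp-implementation-version)
          (length input) (length output) (count #\Return output)))
```

If the last line reports any leftover `#\Return` characters, the output of this form (plus the CL-PPCRE version) is the kind of report Edi is asking for.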

Mario Maio
Apr 14, 2011, 7:12:47 AM
to General interest list about cl-ppcre and cl-unicode
Well, I reinstalled my CLISP/Emacs/SLIME bundle, switching to Lisp
Cabinet, and I was not able to replicate the problem, so everything's
fine in that regard.

But I have another question: how do I enter Unicode chars in the regexp?
For example, I need to replace "whatever" with “whatever”. I tried to replace

"([^"\r\n]*)"

with

\u201c\1\u201d

but it didn't work.

I know I could generate and concatenate Unicode chars with Lisp, e.g.
(code-char #x201c), but it'd be cleaner to do it directly inside the regexp.
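The `code-char` route mentioned above can be sketched as follows, assuming CL-PPCRE is already loaded; `straight-to-curly-quotes` is a hypothetical helper name. It inserts the actual CR, LF and curly-quote characters with `format` and `~C`, because a backslash inside a Common Lisp string literal only escapes the next character (so `"\r"` reads as the one-character string `"r"`). The `\1` in the replacement is CL-PPCRE's reference to the first register.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. with (ql:quickload "cl-ppcre").

(defun straight-to-curly-quotes (string)
  "Hypothetical helper: turn \"text\" on a single line into curly-quoted text."
  (let ((regex
          ;; Produces "([^"<CR><LF>]*)" with literal CR and LF characters.
          (format nil "\"([^\"~C~C]*)\"" #\Return #\Linefeed))
        (replacement
          ;; Produces “\1”, where \1 is the text matched by the first register.
          (format nil "~C\\1~C" (code-char #x201C) (code-char #x201D))))
    (cl-ppcre:regex-replace-all regex string replacement)))
```

For example, `(straight-to-curly-quotes "say \"hi\" now")` should return `say “hi” now`.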

Thanks.

Mario

Edi Weitz
Apr 14, 2011, 8:52:43 AM
to General interest list about cl-ppcre and cl-unicode
On Thu, Apr 14, 2011 at 1:12 PM, Mario Maio <mario...@libero.it> wrote:

> But I have another question: how do I enter Unicode chars in the regexp?
> For example, I need to replace "whatever" with “whatever”. I tried to replace
>
> "([^"\r\n]*)"
>
> with
>
> \u201c\1\u201d
>
> but it didn't work.
>
> I know I could generate and concatenate Unicode chars with Lisp, e.g.
> (code-char #x201c), but it'd be cleaner to do it directly inside the regexp.

For a portable solution, you could give this a try:

http://weitz.de/cl-interpol/

Edi.
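With CL-INTERPOL, the characters can be written inline in the string literals. The sketch below assumes both CL-PPCRE and CL-INTERPOL are already loaded and relies on CL-INTERPOL's `#?"..."` reader syntax with Perl-style escapes (`\r`, `\n`, `\"`, `\\` and `\x{...}` for hex character codes); `straight-to-curly-quotes*` is a hypothetical helper name. Please check the details against the CL-INTERPOL documentation linked above.

```lisp
;; Assumes CL-PPCRE and CL-INTERPOL are already loaded,
;; e.g. with (ql:quickload '("cl-ppcre" "cl-interpol")).

;; Enable the #? reader syntax. This changes the readtable, so the forms
;; below must be read after this one has been evaluated (e.g. at the REPL).
(cl-interpol:enable-interpol-syntax)

(defun straight-to-curly-quotes* (string)
  "Hypothetical helper: same replacement, with the characters written inline."
  (cl-ppcre:regex-replace-all
   ;; Inside #?"...", \r and \n become CR and LF, and \" is a double quote.
   #?"\"([^\"\r\n]*)\""
   string
   ;; \x{201c} and \x{201d} are the curly quotes; \\1 yields the two
   ;; characters \1, CL-PPCRE's reference to the first register.
   #?"\x{201c}\\1\x{201d}"))
```

The result should be the same as building the strings with `code-char`; the difference is only in how the literals are written.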
