(cperl-mode) Problems with syntax highlighting after __DATA__ !

LanX

Oct 13, 2010, 8:22:09 AM

Hi

I have a Perl script that processes a large HTML file (>5000 lines)
included after a __DATA__ directive.

I can't scroll through the buffer anymore because Emacs always hangs when
trying to fontify the HTML code according to Perl syntax.

I have no problems when I toggle font-lock-mode off.

Shouldn't cperl stop highlighting after __DATA__ ?

Cheers
rolf

Ilya Zakharevich

Oct 15, 2010, 2:57:53 AM

No. What follows __DATA__ is usually code. ;-) Still, this should
better not happen. If you can reproduce the problem with v21 and "my"
CPerl, I may try to look into it. (Highlighting in v22 and in v23 is
more or less completely broken.)

Ilya

LanX

Oct 15, 2010, 6:40:39 AM

Hi

>  If you can reproduce the problem with v21 and "my"
> CPerl, I may try to look into it.  (Highlighting in v22 and in v23 is
> more or less completely broken.)

What exactly has to be reproduced, highlighting in a DATA-section or
emacs hanging because it tries to render <tr> like tr/// ?

I'm using (your) cperl v6.2

See also:

http://perlmonks.org/?node_id=865260

Cheers
rolf

LanX

Oct 15, 2010, 9:14:37 AM

Hi

I started

emacs21 -l emacs.d/perl/6.2/cperl-mode.el

and reproduced the same problems.

a) Perl code after __DATA__ is highlighted
b) Inserting a big HTML page makes Emacs hang


Cheers
Rolf

LanX

Oct 15, 2010, 9:35:30 AM

To be completely sure, I downloaded your version again and ran

emacs21 -q --no-site-file -l /tmp/cperl-mode.el.6.2 /tmp/tst.pl

with the same effect: startup is very slow, and scrolling through the DATA
section makes Emacs hang for many seconds; sometimes it's even
necessary to kill Emacs from the shell.

emacs21 -version
GNU Emacs 21.4.1

Stefan Monnier

Oct 15, 2010, 10:25:24 AM

> I can't scroll anymore thru the buffer because emacs always hangs when
> trying to fontify the HTML code according to perl-syntax.

> I have no problems when I toggle font-lock-mode off.

What happens if you use perl-mode instead of cperl-mode?


Stefan

Ilya Zakharevich

Oct 15, 2010, 6:40:23 PM

Sorry, not good enough for me. If you have *specific* examples of
*short* stuff which causes a hanging, I may have a chance to try to
put some workarounds.

And BTW "hanging" means what? One needs to press C-g, or what?

Ilya

LanX

Oct 16, 2010, 10:20:33 AM

Hi

> Ilya wrote:
> Sorry, not good enough for me.  If you have *specific* examples of
> *short* stuff which causes a hanging, I may have a chance to try to
> put some workarounds.

As I said my case consists of 5000 lines of HTML-code. So a short
example is not possible.

I would greatly appreciate a possibility to switch off perl
highlighting behind __DATA__.

Some people put Perl code there; most don't.

IMHO some HTML causes the highlighter to go astray in very deep
backtracking recursion.

> And BTW "hanging" means what?  One needs to press C-g, or what?

As I said slowing down when scrolling, till the point of
unresponsiveness and the need to kill emacs from shell.

Of course I tried C-g and ESC ESC ESC, but it didn't help, and at the
beginning there is a busy animation shown as mouse pointer.

> Stefan wrote:
> What happens if you use perl-mode instead of cperl-mode?

perl-mode also tries to highlight behind __DATA__, and also slows down
when scrolling sometimes showing a busy mouse pointer. But it doesn't
really hang for more than a second!

(which is no surprise since perl-mode does less highlighting than
cperl-mode - perl has a very rich syntax, maybe the richest of all
mainstream languages)
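
One way to hunt for a short reproducing example without publishing the file is to list the lines whose tags could be mistaken for Perl quote-like operators, as in the `<tr>` versus tr/// guess earlier in the thread. The sketch below is only a heuristic: the helper name `suspicious_lines` is hypothetical, and it is an assumption, not something confirmed here, that such tags are what slows the fontifier down.

```perl
use strict;
use warnings;

# Hypothetical helper: list lines containing an HTML tag whose name is also
# a Perl quote-like operator (q, qq, qw, qx, qr, m, s, y, tr).
# Assumption: such tags are candidates for confusing a Perl fontifier.
sub suspicious_lines {
    my ($fh) = @_;
    my @hits;
    while ( my $line = <$fh> ) {
        chomp $line;
        # $. holds the line number of the filehandle read last
        push @hits, [ $., $line ]
            if $line =~ m{</?(?:tr|qq|qw|qx|qr|q|m|s|y)\b}i;
    }
    return @hits;
}

# Demonstration on an in-memory filehandle; a real script would pass \*DATA.
my $sample = "<table>\n<tr><td>a</td></tr>\n<p>text</p>\n<s>old</s>\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
printf "%d: %s\n", @$_ for suspicious_lines($fh);
close $fh;
```

Pasting the reported lines one at a time into a small test file after __DATA__ might narrow a 5000-line case down to something short enough to post.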

Stefan Monnier

Oct 16, 2010, 10:56:00 PM

> perl-mode also tries to highlight behind __DATA__,

Yes, of course: short of using multiple major modes (which is something
Emacs doesn't support very well for now), there's not much highlighting
we can do other than assuming Perl syntax.

I guess we could have perl-mode look at the first line after __DATA__
and if it looks like "# -*- perl -*-" highlight the rest as Perl code,
and otherwise highlight it as some sort of string/comment.

> and also slows down when scrolling sometimes showing a busy mouse
> pointer. But it doesn't really hang for more than a second!

That sounds pretty slow. Speed of perl-mode highlighting non-perl code
is not a very serious concern, but maybe this is a legitimate
performance bug. You might/should be able to reproduce the bug simply
by opening a similar html file and doing M-x perl-mode.

In any case, please report it via M-x report-emacs-bug. You'll probably
need to include a sample file, since I can't reproduce this in my tests
(except when going straight to the end of the file, but there's not
much we can do about that case, other than speed up highlighting.
And that delay is already present in normal cases: it just depends on
the size of the file).

> (which is no surprise since perl-mode does less highlighting than
> cperl-mode - perl has a very rich syntax, maybe the richest of all
> mainstream languages)

Indeed perl-mode does a bit less work here. Tho the only relevant work
here is in making sure the highlighting is correct rather than in
performing the highlighting per-se, so having fewer distinct elements
highlighted does not explain the speed difference. OTOH if perl-mode
gets the highlighting wrong, that could be a good explanation for the
speed difference.


Stefan
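
The rule suggested above (look at the first line after the marker and treat the rest as Perl only if it carries a "-*- perl -*-" cookie) can be sketched as a plain decision function. This is an illustration of the rule in Perl, not code from perl-mode or cperl-mode; the name `classify_data_section` is hypothetical, and the sketch assumes the marker stands alone on its line and does not appear earlier inside a string or here-doc.

```perl
use strict;
use warnings;

# Hypothetical helper sketching the proposed rule. Returns:
#   'none' if the source has no __DATA__/__END__ line,
#   'perl' if the first line after the marker has a "-*- perl -*-" cookie,
#   'data' otherwise (to be shown as one string/comment-like block).
# Simplification: the marker is only recognised when alone on its line.
sub classify_data_section {
    my ($source) = @_;
    return 'none'
        unless $source =~ /^__(?:DATA|END)__[ \t]*(?:\n([^\n]*)|\z)/m;
    my $first_line = defined $1 ? $1 : '';
    return $first_line =~ /-\*-\s*perl\s*-\*-/i ? 'perl' : 'data';
}

print classify_data_section("print 1;\n__DATA__\n# -*- perl -*-\nsub f {}\n"), "\n";
print classify_data_section("print 1;\n__DATA__\n<html>\n"), "\n";
```

An editor mode would need the same decision only once per buffer change near the marker, which keeps the check cheap compared with fontifying the whole section.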

Ilya Zakharevich

Oct 17, 2010, 7:03:15 AM

On 2010-10-16, LanX <lanx...@googlemail.com> wrote:
>> Sorry, not good enough for me.  If you have *specific* examples of
>> *short* stuff which causes a hanging, I may have a chance to try to
>> put some workarounds.

> As I said my case consists of 5000 lines of HTML-code. So a short
> example is not possible.

I do not see why you see this as a logical inference...

> I would greatly appreciate a possibility to switch off perl
> highlighting behind __DATA__.

Last time I looked into font-lock.el, it was disabling narrowing, so I
do not see how one would achieve this. Deducing the culprit for
slow-down, and working around it looks much more plausible...

Ilya

Ted Zlatanov

Oct 19, 2010, 11:38:28 AM

On Sat, 16 Oct 2010 22:56:00 -0400 Stefan Monnier <mon...@iro.umontreal.ca> wrote:

>> perl-mode also tries to highlight behind __DATA__,

SM> Yes, of course: short of using multiple major modes (which is something
SM> Emacs doesn't support very well for now), there's not much highlighting
SM> we can do other than assuming Perl syntax.

I think it's not that bad. __DATA__ is the same content (semantically)
as a here-file, which perl-mode and cperl-mode can handle. It always
ends with EOF or with another __X__ marker on a new line.

Ted

Ilya Zakharevich

Oct 20, 2010, 3:42:44 PM

On 2010-10-19, Ted Zlatanov <t...@lifelogs.com> wrote:
> SM> Yes, of course: short of using multiple major modes (which is something
> SM> Emacs doesn't support very well for now), there's not much highlighting
> SM> we can do other than assuming Perl syntax.
>
> I think it's not that bad. __DATA__ is the same content (semantically)
> as a here-file, which perl-mode and cperl-mode can handle.

Are you sure? I think if one would put the same breaks-CPerl content
inside a here-doc, CPerl would slow down as well. AFAIK, here-docs
are still facified; the result is just ignored.

Hmm, on the other hand, if it is syntaxification which is slowed down,
then yes - it should not be a lot of problem to skip stuff after
__DATA__, since CPerl does syntaxification in one pass.

> It always ends with EOF or with another __X__ marker on a new line.

Eh??? AFAIK, it ends with EOF period.

(The code to parse <DATA> may have some special logic to find your
__X__; but it may look for whatever it wants as well...)

Yours,
Ilya
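
To illustrate the point that any __X__ handling lives in the code reading the data rather than in Perl itself, here is a minimal sketch of a reader that treats __NAME__ lines as section separators. The helper name `split_sections` and the marker pattern are assumptions made for the example; this is not how Inline::Files or any other CPAN module is implemented.

```perl
use strict;
use warnings;

# Hypothetical reader: lines of the form __NAME__ inside the data are taken
# as section separators. Perl attaches no meaning to them after the first
# __DATA__/__END__; the convention exists only in this reading code.
sub split_sections {
    my ($fh) = @_;
    my %section;
    my $current = 'DATA';    # text before any further marker
    while ( my $line = <$fh> ) {
        if ( $line =~ /^__([A-Z][A-Z0-9_]*?)__\s*$/ ) {
            $current = $1;
            $section{$current} = '' unless exists $section{$current};
            next;
        }
        $section{$current} .= $line;
    }
    return %section;
}

# Demonstration on an in-memory filehandle; a real script would pass \*DATA.
my $text = "a\n__END__\nb\n__CONFIG__\nkey=value\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my %s = split_sections($fh);
print "$_ => $s{$_}" for sort keys %s;
close $fh;
```

A different program could just as well split on blank lines or XML tags, which is why special-casing such markers in a highlighter is a matter of taste rather than of Perl syntax.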

LanX

Oct 21, 2010, 7:43:43 AM

Hi

Back again; I was travelling the last few days...

> SM> Yes, of course: short of using multiple major modes (which is something
> SM> Emacs doesn't support very well for now), there's not much highlighting
> SM> we can do other than assuming Perl syntax.
>
> I think it's not that bad.  __DATA__ is the same content (semantically)
> as a here-file,

or a POD-section , i.e. natural solution would be comment-face.
(like most other editors I tested do, just check vim)

__END__ is practically the same as __DATA__ and Ilya is right those
sections are terminated only by EOF.


@Stefan: I'm not talking about highlighting according to HTML; I just
want to keep the Perl parser from hanging the system.

@Ilya: Sorry, the HTML is confidential. I'm not even allowed to publish
parts of it, e.g. by stripping the text between the markup. :( And I
wasn't able to reproduce it.

FWIW my workaround at the moment is to start the __DATA__ section
with
----------

=pod

-----------
to avoid syntaxification.
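
With this workaround the "=pod" line becomes part of what <DATA> returns, so the script has to skip it before processing the HTML. A minimal sketch, assuming the marker is the first non-blank line of the section; the helper name `read_payload` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the payload lines of a data section, dropping
# leading blank lines and one leading "=pod" line that was added only to
# keep the editor from fontifying the payload as Perl.
# Note: leading blank lines are dropped even when no "=pod" line is present.
sub read_payload {
    my ($fh) = @_;
    my @lines = <$fh>;
    shift @lines while @lines && $lines[0] =~ /^\s*$/;
    if ( @lines && $lines[0] =~ /^=pod\s*$/ ) {
        shift @lines;                                        # the marker
        shift @lines while @lines && $lines[0] =~ /^\s*$/;   # blanks after it
    }
    return @lines;
}

# Demonstration on an in-memory filehandle; a real script would pass \*DATA.
my $sample = "\n=pod\n\n<html>\n<tr><td>x</td></tr>\n</html>\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
print read_payload($fh);
close $fh;
```

Reading the whole section into memory is fine for a few thousand lines; a streaming loop would do the same job for much larger payloads.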

Ted Zlatanov

Oct 22, 2010, 11:57:54 AM

On Thu, 21 Oct 2010 04:43:43 -0700 (PDT) LanX <rolf.la...@googlemail.com> wrote:

>> I think it's not that bad.  __DATA__ is the same content (semantically)
>> as a here-file,

L> or a POD-section , i.e. natural solution would be comment-face.
L> (like most other editors I tested do, just check vim)

L> __END__ is practically the same as __DATA__ and Ilya is right those
L> sections are terminated only by EOF.

Theoretically they are practically the same, but in practice they are not.

__DATA__ begins a section that can be used through the DATA filehandle.
It's a true here-file without interpolation and has no syntax.

__END__ ends the Perl program and any __DATA__ effects. Usually POD
will follow but the Perl parser doesn't care (unlike =cut markers, which
do matter to the Perl parser). The POD extractor, usually `perldoc',
will care. So it's nice to the user to highlight it as POD.

On Wed, 20 Oct 2010 19:40:30 +0000 (UTC) Ilya Zakharevich <nospam...@ilyaz.org> wrote:

IZ> On 2010-10-19, Ted Zlatanov <t...@lifelogs.com> wrote:
>> I think it's not that bad. __DATA__ is the same content (semantically)
>> as a here-file, which perl-mode and cperl-mode can handle.

IZ> Are you sure? I think if one would put the same breaks-CPerl content
IZ> inside a here-doc, CPerl would slow down as well. AFAIK, here-docs
IZ> are still facified; the result is just ignored.

Well, you could try it... I will not presume to know the cperl-mode
internals, I was just talking about parsing the sections when I said
"it's not that bad."

IZ> Hmm, on the other hand, if it is syntaxification which is slowed down,
IZ> then yes - it should not be a lot of problem to skip stuff after
IZ> __DATA__, since CPerl does syntaxification in one pass.

Right. Perl won't parse after __DATA__ or __END__ no matter what.

>> It always ends with EOF or with another __X__ marker on a new line.

IZ> Eh??? AFAIK, it ends with EOF period.

IZ> (The code to parse <DATA> may have some special logic to find your
IZ> __X__; but it may look for whatever it wants as well...)

There are at least a few CPAN modules that care, e.g. Inline::Files and
company. So I think it's nice to highlight every __X__ marker instead
of special-casing __DATA__ (__END__ has to be special because it means
POD will start, usually).

Ted

Ted Zlatanov

Oct 22, 2010, 12:33:43 PM

On Thu, 21 Oct 2010 04:43:43 -0700 (PDT) LanX <rolf.la...@googlemail.com> wrote:

L> __END__ is practically the same as __DATA__ and Ilya is right those
L> sections are terminated only by EOF.

btw you should definitely check out `perldoc perldata' and `perldoc
SelfLoader'. That is some nasty stuff done to preserve backwards
compatibility :)

Ted

LanX

Oct 22, 2010, 5:15:20 PM

On 22 Oct., 17:57, Ted Zlatanov <t...@lifelogs.com> wrote:
> __DATA__ begins a section that can be used through the DATA filehandle.
> It's a true here-file without interpolation and has no syntax.
>
> __END__ ends the Perl program and any __DATA__ effects.  Usually POD
> will follow but the Perl parser doesn't care (unlike =cut markers, which
> do matter to the Perl parser).  The POD extractor, usually `perldoc',
> will care.  So it's nice to the user to highlight it as POD.

Ted, everything after the first __DATA__ or __END__ is just data, even
another __DATA__ or __END__.

Just try this code:
----------------------
print while (<DATA>);
__DATA__
a
__END__
b
---------------------
you will see that everything after __DATA__ will be printed.

There are differences between __DATA__ and __END__ regarding package/module
scope, but this has no effect on highlighting or cperl.

And yes, modules like SelfLoader can be used to eval code in the DATA
section on demand, but IMHO SelfLoader is an edge case, NOT the rule.

Anyway my preferred solution would be a syntax possibility or local
cperl-variable to decide if the DATA-Section should be highlighted as
perl code or not.

maybe something like

----------------------
package FOOBAR;
use SelfLoader;

__DATA__ #perl-code
sub bla {
...
}
---------------------

Ilya Zakharevich

Oct 24, 2010, 1:59:07 AM

On 2010-10-22, Ted Zlatanov <t...@lifelogs.com> wrote:
> L> or a POD-section , i.e. natural solution would be comment-face.
> L> (like most other editors I tested do, just check vim)
>
> L> __END__ is practically the same as __DATA__ and Ilya is right those
> L> sections are terminated only by EOF.
>
> Theoretically they are practically the same, but in practice they are not.

[Ignoring that, formally, this is content-free,] I believe you are mistaken.

> __END__ ends the Perl program and any __DATA__ effects.

No.

> Right. Perl won't parse after __DATA__ or __END__ no matter what.

Nevertheless, most of the time, it will.

> IZ> (The code to parse <DATA> may have some special logic to find your
> IZ> __X__; but it may look for whatever it wants as well...)
>
> There are at least a few CPAN modules that care, e.g. Inline::Files and
> company. So I think it's nice to highlight every __X__ marker instead
> of special-casing __DATA__ (__END__ has to be special because it means
> POD will start, usually).

I'm very wary about handling "quirks of Perl modules" - having handled
quirks of Perl itself. ;-)

Yours,
Ilya

Stefan Monnier

Oct 29, 2010, 2:41:24 PM

>> Right. Perl won't parse after __DATA__ or __END__ no matter what.
> Nevertheless, most of the time, it will.

Is the meaning of __DATA__, __END__ (and maybe even __<FOO>__)
documented somewhere?


Stefan

LanX

Oct 29, 2010, 4:07:32 PM

> Is the meaning of __DATA__, __END__ (and maybe even __<FOO>__)
> documented somewhere?

http://perldoc.perl.org/perldata.html#Special-Literals
