Re: nytprofhtml generating "File format error"

8 views
Skip to first unread message

Tim Bunce

unread,
Jul 30, 2008, 5:17:11 PM7/30/08
to Nicholas Clark, Tim Bunce, develnyt...@googlegroups.com
[CC'd to develnyt...@googlegroups.com]

On Wed, Jul 30, 2008 at 09:47:22AM +0100, Nicholas Clark wrote:
> On Tue, Jul 29, 2008 at 05:45:46PM +0100, Tim Bunce wrote:
> > On Tue, Jul 29, 2008 at 02:50:51PM +0100, Nicholas Clark wrote:
> > > On Tue, Jul 29, 2008 at 02:41:01PM +0100, Nicholas Clark wrote:
> >
> > > Annoyingly I can't recreate it.
> >
> > Umm. That means it's less likely to be due to strings containing newline.
> >
> > If there's forking involved then perhaps the reinit_if_forked()
> > isn't spotting the fork soon enough.
> >
> > Could you send me a zip'd copy of the original nytprof.out that has the
> > corruption? I could probably work out the cause from that.

Looking at copyright.out, prior to it failing with
Read 17:810 15 ticks
Fid 29 is /home/nclark/svn_working/netbanx/trunk/x
File format error: token 129 ('<81>'), chunk 12043 ...
it says
Read 19:60 14 ticks
Fid 26 is /home/nclark/svn_working/netbanx/trunk0^U^H^C

Note the trailing garbage in the filename.

Fid 17 is /usr/lib/perl5/DBI.pm
what's on/after line 810 of your DBI.pm?

Fid 19 is /usr/share/perl5/DBD/SQLite.pm
what's on/after line 60 of your SQLite.pm?

How stable/reliable is OutCopFILE(cop)?
What's the lifespan of the pointer returned?
What situations could cause the pointed-to data to change?
Is it reliably null-terminated?
Plus the same questions for CvFILE(cv).
(I'm glad you're the right man to ask these questions of :)

The fact it's hard to reproduce point to these as possible culprits.

Tim.

Nicholas Clark

unread,
Jul 31, 2008, 7:23:43 AM7/31/08
to Tim Bunce, develnyt...@googlegroups.com
On Wed, Jul 30, 2008 at 10:17:11PM +0100, Tim Bunce wrote:

> Looking at copyright.out, prior to it failing with
> Read 17:810 15 ticks
> Fid 29 is /home/nclark/svn_working/netbanx/trunk/x
> File format error: token 129 ('<81>'), chunk 12043 ...
> it says
> Read 19:60 14 ticks
> Fid 26 is /home/nclark/svn_working/netbanx/trunk0^U^H^C
>
> Note the trailing garbage in the filename.

Aha.

> Fid 17 is /usr/lib/perl5/DBI.pm
> what's on/after line 810 of your DBI.pm?

# $Id: DBI.pm 10087 2007-10-16 12:42:37Z timbo $

Line 810 is "my $class..." in

sub setup_driver {
my ($class, $driver_class) = @_;
my $type;
foreach $type (qw(dr db st)){
my $class = $driver_class."::$type";
no strict 'refs';
push @{"${class}::ISA"}, "DBD::_::$type"
unless UNIVERSAL::isa($class, "DBD::_::$type");
my $mem_class = "DBD::_mem::$type";
push @{"${class}_mem::ISA"}, $mem_class
unless UNIVERSAL::isa("${class}_mem", $mem_class)
or $DBI::PurePerl;
}
}

> Fid 19 is /usr/share/perl5/DBD/SQLite.pm
> what's on/after line 60 of your SQLite.pm?

# $Id: SQLite.pm,v 1.52 2006/04/10 01:50:05 matt Exp $

Line 60 is

return $dbh;

in &connect.


> How stable/reliable is OutCopFILE(cop)?
> What's the lifespan of the pointer returned?
> What situations could cause the pointed-to data to change?
> Is it reliably null-terminated?

I can't remember for sure, but I think that there were issues where it's
(still) pointing to an OP that has just been freed, thanks to a DESTROY call
or somesuch. I think that they got fixed by 5.10, and might well be in 5.8.9

Looking at the core code, I think that the value pointed at is not going to
"go away", as for ithreads it's a copy of a string, and for not-ithreads it's
a reference to a GV.

[Hmm, which reminds me, I think that not having that copy was a space
optimisation I was considering trying in 5.11. One could store them as
indices into an array, which is cloned at thread creation, plus a hash to
give the lookup into the array. That way there would be one copy of the
string of the filename per file, rather than one per COP]

So, yes, I think that it goes wrong if the COP in question gets freed
underneath you.

> Plus the same questions for CvFILE(cv).

Not sure. Looks like under ithreads it's safe, as it's a copy made at CV
construction time. For not-ithreads, it's set from this:

# define CopFILE(c) (CopFILEGV(c) && GvSV(CopFILEGV(c)) \
? SvPVX(GvSV(CopFILEGV(c))) : NULL)

so it's just a pointer to the PV buffer of the SV of the GV.
Which is fine. Only it doesn't hold a reference to it. So conceivably it
could go away.

Maybe that should change for non-ithreads to a SV*, owning a reference,
with an accessor macro that hides this.

> (I'm glad you're the right man to ask these questions of :)

Am I? :-)

Now, what you didn't ask about was GvFILE(), which IIRC is a complete can of
worms in 5.8.x, because it is a char * pointing somewhere else which is
*assumed* to stay around. Only in some situations it doesn't. Specifically,
I think, it was BEGIN blocks. Or something to do with code loaded by XS
modules.

Whatever it was, it gave me pain with Devel::DProf in late 2006. So I tried
to nail all the bugs, and I think that they're all fixed in 5.10
(but can't be in 5.8.x, because the C struct had to change)

> The fact it's hard to reproduce point to these as possible culprits.

Yes, seems plausible. Until profilers came along, nothing much was reading
these parts of the data structures in bulk, and hence spotting the problems.

Nicholas Clark

Adam Kaplan

unread,
Jul 31, 2008, 10:36:54 AM7/31/08
to develnyt...@googlegroups.com, Tim Bunce
Nicholas,

ni...@ccl4.org is added to the project member list at http://perl-devel-nytprof.googlecode.com

You have SVN write access now.  Thanks for all the help!

(if you'd rather have a proper google account added instead, let me know)
--
Adam J. Kaplan
Software Engineer, Code Critic and Cocoa Drinker.
The New York Times Company

Tim Bunce

unread,
Aug 1, 2008, 9:07:48 AM8/1/08
to develnyt...@googlegroups.com, Tim Bunce
On Thu, Jul 31, 2008 at 12:23:43PM +0100, Nicholas Clark wrote:
>
> On Wed, Jul 30, 2008 at 10:17:11PM +0100, Tim Bunce wrote:
>
> > Looking at copyright.out, prior to it failing with
> > Read 17:810 15 ticks
> > Fid 29 is /home/nclark/svn_working/netbanx/trunk/x
> > File format error: token 129 ('<81>'), chunk 12043 ...
> > it says
> > Read 19:60 14 ticks
> > Fid 26 is /home/nclark/svn_working/netbanx/trunk0^U^H^C
> >
> > Note the trailing garbage in the filename.
>
> Aha.

With recent changes the data file should now be unaffected by garbage in
file or sub names.

I've also added a flag in the fid to indicate if the fid was first seen
via the sub or stmt profiler. That may help pinpoint the cause.

> > How stable/reliable is OutCopFILE(cop)?
> > What's the lifespan of the pointer returned?
> > What situations could cause the pointed-to data to change?
> > Is it reliably null-terminated?
>

> [...]

Before I dig deeper into those (very helpful) answers, can you try to
recreate the problem using the current trunk NYTProf? Then let me fetch
the zipped data file. Thanks!

Tim.

Tim Bunce

unread,
Aug 1, 2008, 4:00:43 PM8/1/08
to develnyt...@googlegroups.com, Nicholas Clark, Tim Bunce

Where by "recreate the problem" I now mean see corrupt filenames
(rather than the "File format error" you saw before).

Umm, maybe I'll add detection of non-print-no-space chars
so we get an immediate warning when it happens rather than
have to manually scan many filenames.

Tim.

Nicholas Clark

unread,
Aug 4, 2008, 4:24:51 AM8/4/08
to Tim Bunce, develnyt...@googlegroups.com
On Fri, Aug 01, 2008 at 09:00:43PM +0100, Tim Bunce wrote:
> On Fri, Aug 01, 2008 at 02:07:48PM +0100, Tim Bunce wrote:

> > Before I dig deeper into those (very helpful) answers, can you try to
> > recreate the problem using the current trunk NYTProf? Then let me fetch
> > the zipped data file. Thanks!
>
> Where by "recreate the problem" I now mean see corrupt filenames
> (rather than the "File format error" you saw before).

Well, I re-profiled our unit tests, and there were no reported problems.
They have moved on - I was profiling HEAD on trunk again, but I'm not sure
which revision I profiled before, so I can't re-create perfect. Sorry :-(

However, I've zipped up the two profiles, made with NYProf r386, and will
mail you the URL privately.

Nicholas Clark

Reply all
Reply to author
Forward
0 new messages