In particular, I'm curious about the rate of commits (to blead, and in total)
over time, and the average size of commits, for the past five years.
Nicholas Clark
It would be fairly straightforward to script something that generated
the raw data.
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
> 2009/4/4 Nicholas Clark <ni...@ccl4.org>:
>> Are there any ready made git tools to graph commits?
>>
>> In particular, I'm curious about the rate of commits (to blead, and
>> in total)
>> over time, and the average size of commits, for the past five years.
>
> It would be fairly straightforward to script something that generated
> the raw data.
Since I worked on URI::GoogleChart recently, let me demonstrate with a
graphing example using it. Just need to enhance the module a bit so
that it's easier to get some labels on the Y-axis.
#!/usr/bin/perl
use strict;
my %count;
open(LOG, "git log --pretty=fuller --date=iso|") || die;
while (<LOG>) {
if (/^CommitDate:\s*(\d{4}-\d{2})/) {
$count{$1}++;
}
}
use URI::GoogleChart;
my $uri = URI::GoogleChart->new("lines", 500, 150,
title => "Blead commits per month",
data => [map $count{$_}, sort keys %count],
range_round => 1,
range_show => "left",
encoding => "s",
);
system("/usr/bin/open", $uri) if -x "/usr/bin/open";
print "$uri\n";
__END__
> On Apr 5, 2009, at 10:23 , demerphq wrote:
>
>> 2009/4/4 Nicholas Clark <ni...@ccl4.org>:
>>> Are there any ready made git tools to graph commits?
>>>
>>> In particular, I'm curious about the rate of commits (to blead,
>>> and in total)
>>> over time, and the average size of commits, for the past five years.
>>
>> It would be fairly straightforward to script something that
>> generated
>> the raw data.
>
> Since I worked on URI::GoogleChart recently, let me demonstrate with
> a graphing example using it. Just need to enhance the module a bit
> so that it's easier to get some labels on the Y-axis.
With X-axis labels the script to generate the graph isn't as
short&sweet :-(
#!/usr/bin/perl
use strict;
my %count;
open(LOG, "git log --pretty=fuller --date=iso|") || die;
while (<LOG>) {
if (/^CommitDate:\s*(\d{4}-\d{2})/) {
$count{$1}++;
}
}
# fill in missing months
my @m = sort keys %count;
my @m_all = ($m[0]);
while ($m_all[-1] ne $m[-1]) {
push(@m_all, next_month($m_all[-1]));
}
@m = @m_all;
# calculate year label positions
my @lab;
my @pos;
for (my $i = 0; $i < @m; $i++) {
my($y) = split(/-/, $m[$i]);
if (!@lab || $lab[-1] != $y) {
push(@lab, $y);
push(@pos, sprintf "%.1f", $i / @m * 100);
}
}
shift(@lab); shift(@pos); # drop the first one
use URI::GoogleChart;
my $uri = URI::GoogleChart->new("lines", 700, 150,
title => "Blead commits per month",
data => [map $count{$_}, @m],
range_round => 1,
range_show => "left",
encoding => "s",
# no direct support for labels yet
chxt => "x",
chxl => join("|", "0:", @lab),
chxp => join(",", 0, @pos),
);
system("/usr/bin/open", $uri) if -x "/usr/bin/open";
print "$uri\n";
sub next_month {
my($y, $m) = split(/-/, shift);
$m++;
if ($m > 12) {
$m = 1;
$y++;
}
return sprintf "%04d-%02d", $y, $m;
}
__END__
GitStats <http://gitstats.sourceforge.net/> looked promising but the example
graphs provided on the site are a bit disappointing.
Yes. That's really the biggest thing that it's missing. Although, then what
I'd like is the ability to annotate it with labels showing the various stable
releases... :-)
At the centre of the graph are two spiked peaks, with a lower flat-topped
peak between them. The left peak is 5.6.0 in March 2000. The right peak is
just before 5.8.0, March to May 2002. There's then a massive valley, before
the next peak, which is just before 5.8.1 in August 2003.
The next peak is June 2005, which doesn't correlate with a release, and after
that I can't see much correlation between release dates and spikes in the
graph. Of course, now I want multi-line graphs with commits per-person. I'm
sure the lines for Jarkko and Sarathy would be scary...
Nicholas Clark
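A minimal sketch of how the per-person counts mentioned above might be gathered, assuming input lines of the form "author name, tab, ISO commit date". The sub name commits_per_author and the sample authors are hypothetical; the git invocation in the comment uses the %an, %x09 and %ci placeholders of git log's format strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally commits per author per month from lines of the form
# "Author Name<TAB>2009-04-05 10:23:00 +0200".
sub commits_per_author {
    my %count;    # $count{author}{"YYYY-MM"}
    for my $line (@_) {
        next unless $line =~ /^([^\t]+)\t(\d{4}-\d{2})-\d{2}/;
        $count{$1}{$2}++;
    }
    return \%count;
}

# Hypothetical sample data; in a checkout one might instead use
#   my @lines = `git log --pretty=format:%an%x09%ci origin/blead`;
my @lines = (
    "Alice\t2009-04-05 10:23:00 +0200",
    "Alice\t2009-04-01 09:00:00 +0200",
    "Bob\t2009-04-03 12:00:00 +0100",
    "Bob\t2009-03-30 18:30:00 +0100",
);

my $count = commits_per_author(@lines);
for my $author (sort keys %$count) {
    for my $month (sort keys %{ $count->{$author} }) {
        print "$author $month $count->{$author}{$month}\n";
    }
}
```

Each inner hash could then be turned into one data series per author, in the same way the single %count hash is used in the scripts above.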
Maybe:
git log --pretty="format:%ci"
is a teeny bit more convenient. :-)
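A minimal sketch of the counting loop with that format, assuming each line starts with the ISO-style committer date that %ci produces. The sub name commits_per_month and the sample dates are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count commits per month from lines such as "2009-04-05 10:23:00 +0200".
sub commits_per_month {
    my %count;
    for my $line (@_) {
        $count{$1}++ if $line =~ /^(\d{4}-\d{2})-\d{2} /;
    }
    return %count;
}

# Hypothetical sample data; in a checkout one might instead use
#   my @lines = `git log --pretty=format:%ci`;
my @lines = (
    "2009-04-05 10:23:00 +0200",
    "2009-04-01 09:00:00 +0200",
    "2009-03-30 18:30:00 +0100",
);

my %count = commits_per_month(@lines);
print "$_ $count{$_}\n" for sort keys %count;
```

With one date per line there is no need to search for the CommitDate: header, so the regex only has to pick out the year and month.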
Wait till you see git for-each-ref. ;-)
There are at least two templating systems in git: for-each-ref, which
knows how to deal with tags and a few other things, and git log,
which doesn't understand tags and has some other options f-e-r doesn't.
And just to make life interesting, the template systems are sort of
different: not totally, but in that really annoying grey zone of close
enough to merge them in your memory, but different enough that you
always check the docs every time. Even more fun, for-each-ref has a
tendency to segv on older gits on certain data. :-( If I ever find
copious free time again I want to write a git sprintf that unifies the
two.
cheers,
> > open(LOG, "git log --pretty=fuller --date=iso|") || die;
> > while (<LOG>) {
> > if (/^CommitDate:\s*(\d{4}-\d{2})/) {
> > $count{$1}++;
> > }
> > }
>
> Maybe:
>
> git log --pretty="format:%ci"
>
> is a teeny bit more convenient. :-)
Independent of that, I'm finding the -z option rather useful, as a record
separator of "\0" makes parsing it much, much easier.
git log does have an insanely flexible set of output options.
Nicholas Clark
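A minimal sketch of reading "\0"-separated records, assuming input shaped like the output of git log -z --pretty=fuller --date=iso. The sub name months_from_log and the sample commits are hypothetical, and an in-memory filehandle stands in for the pipe from git.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the "YYYY-MM" commit month of each record read from $fh,
# where records are separated by "\0" as with git log -z.
sub months_from_log {
    my ($fh) = @_;
    local $/ = "\0";    # one whole commit per read
    my @months;
    while (my $record = <$fh>) {
        chomp $record;  # strips the trailing "\0", if any
        push @months, $1
            if $record =~ /^CommitDate:\s*(\d{4}-\d{2})/m;
    }
    return @months;
}

# Hypothetical sample data standing in for
#   open(my $fh, "git log -z --pretty=fuller --date=iso|")
my @records = (
    join("\n", 'commit 1111111', 'Author:     Alice',
               'AuthorDate: 2009-04-05 10:20:00 +0200',
               'Commit:     Alice',
               'CommitDate: 2009-04-05 10:23:00 +0200',
               '', '    First message', ''),
    join("\n", 'commit 2222222', 'Author:     Bob',
               'AuthorDate: 2009-03-30 18:00:00 +0100',
               'Commit:     Bob',
               'CommitDate: 2009-03-30 18:30:00 +0100',
               '', '    Second message', ''),
);
my $sample = join("\0", @records);
open(my $fh, '<', \$sample) or die "Cannot open in-memory handle: $!";
print join(" ", months_from_log($fh)), "\n";
```

Because each read returns a complete commit, header fields and the message body can be matched with /m regexes on a single string instead of tracking state line by line.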
Right, I've tweaked things somewhat. Script appended.
Graph is http://xrl.us/benyh2 or, the longer version:
git isn't handist (which I approve of) but it has this "interesting" problem
that git log shows you all the log entries, from both the left and right
parents of each merge commit. So I added a heuristic to pick one linear path
back: at each merge, trace back along the parent with the smaller diff. It
all works out in the end, because all the branches (that matter) fork off the
1.0 checkin somewhere, and I'm after the diffs between each pair of commits on
a linear path. To check this isn't crazy, the net diff from the child to
whichever parent was chosen is shown in navy. It is never close to the total
net code change for the month.
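A minimal sketch of the parent-picking heuristic on its own, assuming the diff sizes (insertions plus deletions against the merge commit) have already been collected into a hash. The sub name pick_parent and the sample shas are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given a hash ref mapping parent sha => diff size against the merge
# commit, and the parents in order, return the parent with the smallest
# diff. Ties go to the earlier parent.
sub pick_parent {
    my ($delta, @parents) = @_;
    my ($best, @rest) = @parents;
    for my $try (@rest) {
        $best = $try if $delta->{$try} < $delta->{$best};
    }
    return $best;
}

# Hypothetical diff sizes for a merge with three parents.
my %delta = (aaa111 => 120, bbb222 => 15, ccc333 => 40);
print pick_parent(\%delta, qw(aaa111 bbb222 ccc333)), "\n";
```

Following the smallest diff tends to stay on the line of history that the merge changed least, which is one way of approximating a single linear path.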
Problem is that there are some big spikes that swamp the rest, such as a net
200,000 lines added in March 2002 by Encode, so I felt that I needed to go for
a logarithmic scale. It's somewhat cheating: because one can't have
fractional values, the logarithm of a count is never negative, so negative
numbers are free to represent logarithmic net deletions (where needed).
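A minimal sketch of that signed logarithmic scale, assuming whole-number inputs. The sub name signed_log10 is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a whole number to sign(x) * log10(|x|), with 0 staying at 0.
# Values of exactly 1 or -1 also map to 0.
sub signed_log10 {
    my ($x) = @_;
    return 0 if $x == 0;
    my $sign = $x <=> 0;
    return $sign * log(abs $x) / log(10);
}

# A net addition of 200,000 lines, a net deletion of 100 lines, no change.
printf "%.1f\n", signed_log10($_) for 200000, -100, 0;
# prints 5.3, -2.0 and 0.0, one per line
```

A spike of 200,000 lines therefore only reaches about 5.3 on the y axis, which keeps the smaller months visible.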
Rather than having one line go wildly positive and negative for net code
change, I elected to split it into red for months of code addition (the usual)
and green for months of net code deletion (sometimes happens). There's a
purple line for total code churn (lines added + lines deleted), and an acme
approved orange line for the total commits, scaled to be 0-7 like everything
else.
I think that the little spike in total churn on the penultimate month plotted
is me moving things around in ext/
My general conclusions are that
1: Our data are very noisy
2: The switch to git hasn't (yet) had any real effect on the number of commits,
or the size of the commits.
Nicholas Clark
#!/usr/bin/perl -w
use strict;
my (%count, %ins, %del, %merge_in, %merge_out);
if (@ARGV) {
while (<>) {
my ($month, @data) = /^([^:]+):\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;
die $_ unless defined $month;
$count{$month} = shift @data;
$ins{$month} = shift @data;
$del{$month} = shift @data;
$merge_in{$month} = shift @data;
$merge_out{$month} = shift @data;
}
} else {
open(LOG, "git log -z --parents --shortstat --pretty=fuller --date=iso origin/blead|") || die;
my $looking_for = `git rev-list -n 1 HEAD`;
chomp $looking_for;
local $/ = "\0";
my $shortstat = qr/^ \d+ files changed, (\d+) insertions\(\+\), (\d+) deletions\(-\)/m;
while (<LOG>) {
chomp;
my ($commitline) = /\Acommit +([ 0-9a-f]+)/;
my ($month) = /^CommitDate:\s*(\d{4}-\d{2})/m;
die $_ unless defined $commitline && defined $month;
my ($self, @parents) = split / /, $commitline;
next unless $self eq $looking_for;
$count{$month}++;
if (@parents == 1) {
my ($ins, $del) = /$shortstat/o;
if (defined $del) {
$ins{$month}+= $ins;
$del{$month}+= $del;
}
$looking_for = $parents[0];
print STDERR "From $self to $looking_for\n";
} elsif (@parents == 0) {
print STDERR "Stopping at $self\n";
} else {
my (%delta, %in, %out);
foreach my $try (@parents) {
my $stats = `git diff --shortstat $try $self`;
unless (defined $stats) {
warn "Problem for $try $self\n$_";
next;
}
my ($ins, $del) = $stats =~ /$shortstat/o;
unless (defined $ins) {
print STDERR "No diff $try $self for:\n$_";
$ins = $del = 0;
}
$delta{$try} = $ins + $del;
$in{$try} = $ins;
$out{$try} = $del;
}
# Pick the parent with the smallest diff against this merge commit.
my ($best, @rest) = @parents;
my $min = $delta{$best};
foreach my $try (@rest) {
next unless defined $delta{$try} && $delta{$try} < $min;
$min = $delta{$try};
$best = $try;
}
$merge_in{$month}+= $in{$best};
$merge_out{$month}+= $out{$best};
$looking_for = $best;
print STDERR "At $self from @parents pick $best\n"
}
}
foreach (sort keys %ins) {
printf "$_: $count{$_} $ins{$_} $del{$_} %d %d\n",
$merge_in{$_}||0, $merge_out{$_}||0;
}
}
# fill in missing months
my @m = sort keys %count;
my @m_all = ($m[0]);
while ($m_all[-1] ne $m[-1]) {
push(@m_all, next_month($m_all[-1]));
}
@m = @m_all;
# calculate year label positions
my @lab;
my @pos;
for (my $i = 0; $i < @m; $i++) {
my($y) = split(/-/, $m[$i]);
if (!@lab || $lab[-1] != $y) {
push(@lab, $y);
push(@pos, sprintf "%.1f", $i / @m * 100);
}
}
shift(@lab); shift(@pos); # drop the first one
# Drop the last month as it's probably incomplete.
pop @m;
sub loggit {
map {
defined $_ ? do {
my $sign = $_ <=> 0;
$sign ? $sign * log(abs $_) / log(10): 0;
} : undef;
} @_;
}
sub only_pos {
map { defined $_ && $_ > 0 ? $_ : undef } @_;
}
sub invert {
map { defined $_ && -$_ } @_;
}
# Ensure that the value before every (defined) value is defined and zero.
# Somewhat a con, as Google won't plot single points, and zero logically is
# correct for all the missing values, but the graph gets messy if everything
# is running along the zero line.
sub smear {
my @vals = (0, @_);
for my $i (1 .. $#vals) {
$vals[$i-1] ||= 0 if defined $vals[$i] and !defined $vals[$i + 1];
}
shift @vals;
@vals;
}
require URI::GoogleChart;
my @count = map $count{$_} ? $count{$_} / 100 : undef, @m;
my @merges = map {$merge_in{$_} ? $merge_in{$_} - $merge_out{$_} : undef} @m;
# Parenthesise each "|| 0" default: + binds tighter than ||.
my @net = map {($ins{$_} || $merge_in{$_})
? (($ins{$_}||0) + ($merge_in{$_}||0)) - (($del{$_}||0) + ($merge_out{$_}||0))
: undef} @m;
my @abs = map {($ins{$_} || $merge_in{$_})
? (($ins{$_}||0) + ($merge_in{$_}||0)) + (($del{$_}||0) + ($merge_out{$_}||0))
: undef} @m;
my $uri = URI::GoogleChart->new("lines", 1000, 300,
title => 'red: net increase, green: net decrease, navy: net change from merges, purple: abs change. y axis is sign($x) * 10**(abs $x) lines. orange: commits / 100',
color => [qw(red green navy purple FE9900)],
data => [[smear(loggit(only_pos(@net)))],
[smear(loggit(only_pos(invert(@net))))],
[smear(loggit(@merges))],
[smear(loggit(@abs))],
[smear(@count)],
],
range_round => 1,
range_show => "left",
encoding => "s",
# no direct support for labels yet
chxt => "x",
chxl => join("|", "0:", @lab),
chxp => join(",", 0, @pos),
);
system("/usr/bin/open", $uri) if -x "/usr/bin/open";
print STDERR "$uri\n";
It did make cherrypicking for the DarkPAN much easier.
Josh
What do you mean? Making a private patched version for work?
Nicholas Clark
Yes, prior to just settling on a .deb with 5.10.0 + a pile of modules,
part of the build instructions were "check out maint-5.x, then
cherrypick a list of shas".
In particular, I started with 5.8.7 plus several patches I was able to
locate through git-fu to help get it built on the Ubuntu 64 bit
machine I was targeting.
Later, I moved to 5.10.0 + some commits from maint-5.10. Again,
locating and integrating the commits through simple git-fu was a
breeze.
Now we're just plain old 5.10.0 but could easily incorporate things in
if needed.
Josh