
graphing commits


Nicholas Clark

Apr 4, 2009, 5:34:25 PM
to perl5-...@perl.org
Are there any ready made git tools to graph commits?

In particular, I'm curious about the rate of commits (to blead, and in total)
over time, and the average size of commits, for the past five years.

Nicholas Clark

Demerphq

Apr 5, 2009, 4:23:32 AM
to perl5-...@perl.org
2009/4/4 Nicholas Clark <ni...@ccl4.org>:

> Are there any ready made git tools to graph commits?
>
> In particular, I'm curious about the rate of commits (to blead, and in total)
> over time, and the average size of commits, for the past five years.

It would be fairly straightforward to script something that generated
the raw data.
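As an illustration of how little scripting the raw data needs, here is a minimal sketch for the "average size of commits" half of the question. It assumes the summary lines that `git log --shortstat` prints (for example ` 3 files changed, 10 insertions(+), 2 deletions(-)`); the helper name `average_commit_size` is hypothetical. The regexes accept singular forms and a missing insertion or deletion count, since that wording may vary between git versions.

```perl
use strict;
use warnings;

# Hypothetical helper: given the lines of `git log --shortstat` output,
# return the number of commits that have a summary line and the mean
# number of changed lines (insertions + deletions) per such commit.
sub average_commit_size {
    my ($commits, $changed) = (0, 0);
    for my $line (@_) {
        # Only the summary lines say "N file(s) changed"; skip the rest.
        next unless $line =~ /^\s*\d+ files? changed/;
        $commits++;
        # Either count may be absent, and the nouns may be singular.
        $changed += $1 if $line =~ /(\d+) insertions?\(\+\)/;
        $changed += $1 if $line =~ /(\d+) deletions?\(-\)/;
    }
    return ($commits, $commits ? $changed / $commits : 0);
}
```

Feeding it the lines of `git log --shortstat origin/blead` would give the mean for blead; restricting the input to a date range (or grouping by month first) gives the "over time" view.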

Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"

Gisle Aas

Apr 5, 2009, 5:02:07 AM
to demerphq, perl5-...@perl.org
On Apr 5, 2009, at 10:23 , demerphq wrote:

> 2009/4/4 Nicholas Clark <ni...@ccl4.org>:
>> Are there any ready made git tools to graph commits?
>>
>> In particular, I'm curious about the rate of commits (to blead, and
>> in total)
>> over time, and the average size of commits, for the past five years.
>
> It would be fairly straight forward to script something that generated
> the raw data.

Since I worked on URI::GoogleChart recently, let me demonstrate with a
graphing example using it. Just need to enhance the module a bit so
that it's easier to get some labels on the Y-axis.

#!/usr/bin/perl

use strict;
use warnings;

# Count commits per month (YYYY-MM) from the CommitDate header.
my %count;
open(my $log, '-|', 'git', 'log', '--pretty=fuller', '--date=iso')
    or die "Can't run git log: $!";
while (<$log>) {
    if (/^CommitDate:\s*(\d{4}-\d{2})/) {
        $count{$1}++;
    }
}
close($log);

use URI::GoogleChart;
my $uri = URI::GoogleChart->new("lines", 500, 150,
    title       => "Blead commits per month",
    data        => [map $count{$_}, sort keys %count],
    range_round => 1,
    range_show  => "left",
    encoding    => "s",
);

# Open the chart in a browser on Mac OS X; always print the URL.
system("/usr/bin/open", $uri) if -x "/usr/bin/open";
print "$uri\n";
__END__

<http://chart.apis.google.com/chart?cht=lc&chs=500x150&chd=s:DEDDDEDDEEEDDDDEEEDDDDDDDDDEDEDEDDDEKLMDFIPMNKHLPOSSKIEOOJPHIQMINTiMNURKLXLHVHaVUWLPTouNQNRsPckdaVmim5lZnpxxqi720lYPMOJLQQVUYUfkZNNOONQKKMMJHJKMQJOQXiULPYZddZbeYMROOYXVYSXXTSQMVUTVcSNNMLJJLPQXdXRE&chtt=Blead+commits+per+month&chxr=0,-50,7e%2B02&chxt=y>
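The `chd=s:...` part of that URL is the Chart API's simple encoding: each value becomes one character from `A`-`Z`, `a`-`z`, `0`-`9` (standing for 0 to 61), and `_` marks a missing value. The sketch below shows the idea under the assumption of linear scaling between a caller-supplied minimum and maximum; it is not URI::GoogleChart's actual code, and `encode_simple` is a hypothetical name.

```perl
use strict;
use warnings;

# The 62 symbols of the simple encoding, in value order 0..61.
my @SIMPLE = ('A' .. 'Z', 'a' .. 'z', '0' .. '9');

# Hypothetical helper: map each value linearly from [$min, $max] onto
# 0..61 and emit one character per value; undef becomes '_' (missing).
sub encode_simple {
    my ($min, $max, @values) = @_;
    my $span = $max - $min or die "min and max must differ\n";
    my $out = '';
    for my $v (@values) {
        if (!defined $v) { $out .= '_'; next; }        # missing value
        my $i = int(($v - $min) / $span * $#SIMPLE + 0.5);
        $i = 0        if $i < 0;                        # clamp below range
        $i = $#SIMPLE if $i > $#SIMPLE;                 # clamp above range
        $out .= $SIMPLE[$i];
    }
    return $out;
}

print encode_simple(0, 61, 0, 61, undef, 26), "\n";    # prints "A9_a"
```

With only 62 levels per point the resolution is coarse, which is why the same letters repeat along the flat stretches of the URL above.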

Gisle Aas

Apr 5, 2009, 6:02:47 AM
to perl5-...@perl.org

On Apr 5, 2009, at 11:02 , Gisle Aas wrote:

> On Apr 5, 2009, at 10:23 , demerphq wrote:
>
>> 2009/4/4 Nicholas Clark <ni...@ccl4.org>:
>>> Are there any ready made git tools to graph commits?
>>>
>>> In particular, I'm curious about the rate of commits (to blead,
>>> and in total)
>>> over time, and the average size of commits, for the past five years.
>>
>> It would be fairly straight forward to script something that
>> generated
>> the raw data.
>
> Since I worked on URI::GoogleChart recently, let me demonstrate with
> a graphing example using it. Just need to enhance the module a bit
> so that it's easier to get some labels on the Y-axis.

With X-axis labels the script to generate the graph isn't as
short and sweet :-(

#!/usr/bin/perl

use strict;
use warnings;

# Count commits per month (YYYY-MM) from the CommitDate header.
my %count;
open(my $log, '-|', 'git', 'log', '--pretty=fuller', '--date=iso')
    or die "Can't run git log: $!";
while (<$log>) {
    if (/^CommitDate:\s*(\d{4}-\d{2})/) {
        $count{$1}++;
    }
}
close($log);

# fill in missing months
my @m = sort keys %count;
my @m_all = ($m[0]);
while ($m_all[-1] ne $m[-1]) {
    push(@m_all, next_month($m_all[-1]));
}
@m = @m_all;

# calculate year label positions (as a percentage along the x axis)
my @lab;
my @pos;
for (my $i = 0; $i < @m; $i++) {
    my ($y) = split(/-/, $m[$i]);
    if (!@lab || $lab[-1] != $y) {
        push(@lab, $y);
        push(@pos, sprintf "%.1f", $i / @m * 100);
    }
}
shift(@lab); shift(@pos); # drop the first one

use URI::GoogleChart;
my $uri = URI::GoogleChart->new("lines", 700, 150,
    title       => "Blead commits per month",
    data        => [map $count{$_}, @m],   # undef for months with no commits
    range_round => 1,
    range_show  => "left",
    encoding    => "s",

    # no direct support for labels yet
    chxt => "x",
    chxl => join("|", "0:", @lab),
    chxp => join(",", 0, @pos),
);

system("/usr/bin/open", $uri) if -x "/usr/bin/open";
print "$uri\n";

# Return the month after a "YYYY-MM" string, in the same format.
sub next_month {
    my ($y, $m) = split(/-/, shift);
    $m++;
    if ($m > 12) {
        $m = 1;
        $y++;
    }
    return sprintf "%04d-%02d", $y, $m;
}

__END__

<http://chart.apis.google.com/chart?cht=lc&chs=700x150&chd=s:DED___D_______________DED_DE____E_ED_D_DD_E____E______E_______D_______DDD__DDD____D_DEDEDED___DDEKLMDFIPMNKHLPOSSKIEOOJPHIQMINTiMNURKLXLHVHaVUWLPTouNQNRsPckdaVmim5lZnpxxqi720lYPMOJLQQVUYUfkZNNOONQKKMMJHJKMQJOQXiULPYZddZbeYMROOYXVYSXXTSQMVUTVcSNNMLJJLPQXdXRE&chtt=Blead+commits+per+month&chxl=0:%7C1988%7C1989%7C1990%7C1991%7C1992%7C1993%7C1994%7C1995%7C1996%7C1997%7C1998%7C1999%7C2000%7C2001%7C2002%7C2003%7C2004%7C2005%7C2006%7C2007%7C2008%7C2009&chxp=0,0.4,5.1,9.7,14.4,19.1,23.7,28.4,33.1,37.7,42.4,47.1,51.8,56.4,61.1,65.8,70.4,75.1,79.8,84.4,89.1,93.8,98.4&chxr=1,-50,7e%2B02&chxt=x,y>

Torsten Schoenfeld

Apr 5, 2009, 6:16:04 AM
to perl5-...@perl.org
Nicholas Clark wrote:
> Are there any ready made git tools to graph commits?

GitStats <http://gitstats.sourceforge.net/> looked promising but the example
graphs provided on the site are a bit disappointing.

Nicholas Clark

Apr 5, 2009, 6:03:00 AM
to Gisle Aas, demerphq, perl5-...@perl.org
On Sun, Apr 05, 2009 at 11:02:07AM +0200, Gisle Aas wrote:
> On Apr 5, 2009, at 10:23 , demerphq wrote:
>
> >2009/4/4 Nicholas Clark <ni...@ccl4.org>:
> >>Are there any ready made git tools to graph commits?
> >>
> >>In particular, I'm curious about the rate of commits (to blead, and
> >>in total)
> >>over time, and the average size of commits, for the past five years.
> >
> >It would be fairly straight forward to script something that generated
> >the raw data.
>
> Since I worked on URI::GoogleChart recently, let me demonstrate with a
> graphing example using it. Just need to enhance the module a bit so
> that it's easier to get some labels on the Y-axis.

Yes. That's really the biggest thing that it's missing. Although, then what
I'd like is the ability to annotate it with labels showing the various stable
releases... :-)

> <http://chart.apis.google.com/chart?cht=lc&chs=500x150&chd=s:DEDDDEDDEEEDDDDEEEDDDDDDDDDEDEDEDDDEKLMDFIPMNKHLPOSSKIEOOJPHIQMINTiMNURKLXLHVHaVUWLPTouNQNRsPckdaVmim5lZnpxxqi720lYPMOJLQQVUYUfkZNNOONQKKMMJHJKMQJOQXiULPYZddZbeYMROOYXVYSXXTSQMVUTVcSNNMLJJLPQXdXRE&chtt=Blead+commits+per+month&chxr=0,-50,7e%2B02&chxt=y
> >

At the centre of the graph are two spiked peaks, with a lower flat-topped
peak between them. The left peak is 5.6.0 in March 2000. The right peak is
just before 5.8.0, March to May 2002. There's then a massive valley, before
the next peak, which is just before 5.8.1 in August 2003.
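Annotating the graph with stable releases could reuse the trick the year-label code uses: a release's x-axis position is its month's index divided by the number of months, as a percentage. A minimal sketch, assuming the month list is already gap-free (as after the "fill in missing months" step); `release_positions` is a hypothetical name, and the release month in the usage line is only an illustrative input.

```perl
use strict;
use warnings;

# Hypothetical helper: given a gap-free, sorted list of "YYYY-MM" months and
# a hash of label => "YYYY-MM", return [label, position%] pairs in chart
# order, ready to join into chxl/chxp. Labels outside the range are skipped.
sub release_positions {
    my ($months, $releases) = @_;
    my %index;
    @index{@$months} = 0 .. $#$months;    # month => column number
    my @pairs;
    for my $label (sort { $releases->{$a} cmp $releases->{$b} } keys %$releases) {
        my $i = $index{ $releases->{$label} };
        next unless defined $i;
        push @pairs, [$label, sprintf "%.1f", $i / @$months * 100];
    }
    return @pairs;
}

my @m = ('2000-01', '2000-02', '2000-03', '2000-04');
for my $p (release_positions(\@m, { '5.6.0' => '2000-03' })) {
    print "$p->[0] at $p->[1]%\n";    # prints "5.6.0 at 50.0%"
}
```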

The next peak is June 2005, which doesn't correlate with a release, and after
that I can't see much correlation between release dates and spikes in the
graph. Of course, now I want multi-line graphs with commits per-person. I'm
sure the lines for Jarkko and Sarathy would be scary...
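For the per-person version, one option is to put the author name and commit date on one line with a pretty format such as `--pretty=format:%an%x09%ci` (author name, a tab, then the committer date) and tally by author and month. A minimal sketch of the tallying; `commits_by_author` is a hypothetical name, and turning each author's hash into a separate chart series is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: tally lines of the form "Author Name\tYYYY-MM-DD ..."
# into a hash of author => { "YYYY-MM" => commit count }.
sub commits_by_author {
    my %tally;
    for my $line (@_) {
        my ($author, $month) = $line =~ /^([^\t]+)\t(\d{4}-\d{2})/ or next;
        $tally{$author}{$month}++;
    }
    return \%tally;
}
```

Each author's inner hash can then go through the same "fill in missing months" step before being handed to the chart as its own line.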

Nicholas Clark

Demerphq

Apr 5, 2009, 9:23:04 AM
to Gisle Aas, perl5-...@perl.org
2009/4/5 Gisle Aas <gi...@activestate.com>:

> On Apr 5, 2009, at 10:23 , demerphq wrote:
>
>> 2009/4/4 Nicholas Clark <ni...@ccl4.org>:
>>>
>>> Are there any ready made git tools to graph commits?
>>>
>>> In particular, I'm curious about the rate of commits (to blead, and in
>>> total)
>>> over time, and the average size of commits, for the past five years.
>>
>> It would be fairly straight forward to script something that generated
>> the raw data.
>
> Since I worked on URI::GoogleChart recently, let me demonstrate with a
> graphing example using it. Just need to enhance the module a bit so that
> it's easier to get some labels on the Y-axis.
>
> #!/usr/bin/perl
>
> use strict;
>
> my %count;
> open(LOG, "git log --pretty=fuller --date=iso|") || die;
> while (<LOG>) {
> if (/^CommitDate:\s*(\d{4}-\d{2})/) {
> $count{$1}++;
> }
> }

Maybe:

git log --pretty="format:%ci"

is a teeny bit more convenient. :-)
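With `--pretty=format:%ci` each commit is a single line beginning with the committer date (e.g. `2009-04-05 15:23:04 +0200`), so the month is just its first seven characters. A minimal sketch of the counting loop rewritten around that format; the sub name `count_months` is hypothetical, and logging `origin/blead` rather than the current branch is an assumption.

```perl
use strict;
use warnings;

# Count commits per "YYYY-MM" from a filehandle yielding one
# `git log --pretty=format:%ci` line per commit.
sub count_months {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$1}++ if $line =~ /^(\d{4}-\d{2})-\d{2} /;
    }
    return \%count;
}

# Wiring it to git (a list-form pipe open avoids the shell):
# open(my $log, '-|', 'git', 'log', '--pretty=format:%ci', 'origin/blead')
#     or die "Can't run git log: $!";
# my $count = count_months($log);
```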

Demerphq

Apr 5, 2009, 9:36:02 AM
to perl5-...@perl.org
2009/4/5 Nicholas Clark <ni...@ccl4.org>:
> On Sun, Apr 05, 2009 at 03:23:04PM +0200, demerphq wrote:
>> 2009/4/5 Gisle Aas <gi...@activestate.com>:

>
>> > open(LOG, "git log --pretty=fuller --date=iso|") || die;
>> > while (<LOG>) {
>> > if (/^CommitDate:\s*(\d{4}-\d{2})/) {
>> > $count{$1}++;
>> > }
>> > }
>>
>> Maybe:
>>
>> git log --pretty="format:%ci"
>>
>> is a teeny bit more convenient. :-)
>
> Independent of that, I'm finding the -z option rather useful, as a record
> separator of "\0" makes parsing it much much easier.
>
> git log does have an insanely flexible amount of output options.

Wait till you see git for-each-ref. ;-)

There are at least two templating systems in git: for-each-ref, which
knows how to deal with tags and a few other things, and git log,
which doesn't understand tags and has some other options f-e-r doesn't.
And just to make life interesting, the template systems are sort of
different: not totally, but in that really annoying grey zone of close
enough to merge them in your memory, yet different enough that you
always check the docs every time. Even more fun, for-each-ref has a
tendency to segfault on older gits on certain data. :-( If I ever find
copious free time again, I want to write a git sprintf that unifies the
two.

cheers,

Nicholas Clark

Apr 5, 2009, 9:30:35 AM
to demerphq, Gisle Aas, perl5-...@perl.org
On Sun, Apr 05, 2009 at 03:23:04PM +0200, demerphq wrote:
> 2009/4/5 Gisle Aas <gi...@activestate.com>:

> > open(LOG, "git log --pretty=fuller --date=iso|") || die;
> > while (<LOG>) {
> > if (/^CommitDate:\s*(\d{4}-\d{2})/) {
> > $count{$1}++;
> > }
> > }
>
> Maybe:
>
> git log --pretty="format:%ci"
>
> is a teeny bit more convenient. :-)

Independent of that, I'm finding the -z option rather useful, as a record
separator of "\0" makes parsing it much, much easier.

git log does have an insanely flexible amount of output options.
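With `-z`, git log terminates each commit with a NUL byte, so setting Perl's input record separator `$/` to `"\0"` makes each `<$fh>` read return one whole commit, however many lines it spans. A minimal sketch; `split_commits` is a hypothetical name.

```perl
use strict;
use warnings;

# Read NUL-separated records (as from `git log -z`) from a filehandle and
# return them with the trailing NUL removed.
sub split_commits {
    my ($fh) = @_;
    local $/ = "\0";    # one read == one commit record
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;     # chomp removes the current $/, i.e. the NUL
        push @records, $rec;
    }
    return @records;
}
```

Because `$/` is localised inside the sub, line-at-a-time reads elsewhere in the program are unaffected.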

Nicholas Clark

Nicholas Clark

Apr 5, 2009, 6:32:05 PM
to Gisle Aas, demerphq, perl5-...@perl.org

Right, I've tweaked things somewhat. Script appended.

Graph is http://xrl.us/benyh2 or, the longer version:

http://chart.apis.google.com/chart?cht=lc&chs=1000x300&chco=FF0000,008000,000080,800080,FE9900&chd=s:nh__Vt_______________vml_mq___Vs_qoVn_rmVp___Vr_____Vt______Vu_______wtl__kwl_______sqsmpo___ruuusg__________Vk_ifkjjnqpjtqoqwvrtptjittjvtvqttrt_utssotvrwuttvwvvyvvvwwvtv2zutr_quourutswrvtrvrr__uw_qpvvpqmswsvtwuprxrxwsxrnuVs_vurstryusrrstswptptoqrkvspurpv,________________________________________________________________________________VvVo________________Vm_________________________________________Vl_____________________________Vs________________tp_Vy________________________VtVt______________________________,_______________________________________________________________________________________________________________V0___stCqptpjvPyrcpt_srlllmtuxpoqqqplpet__bqpslqlincV_lkktoilmkiY_Vf_je__gY_________________________________________________________________rnbl,oj__Vv_______________xpn_or___Vt_spVo_xoVr___Vs_____Vu______Vu_______0yr__www___V1_rusvsvr___uxvvxu_Vs_______Vn_nhnlortrqxrxtyzswvvmmvuqxyytuwtyxwzutqw0xzxwv0zxw2y3z0zy1w53xv2xvwrwtvzvxxyxvwtuxwwxztswvuttvwxxwxzsvywzyv1wzxyvyxxzvxx0wwutuw22wwtvtwtyvvvvv3z,VV__VV_______________VVV_VV___VV_VVVV_VVVV___VV_____VW______VV_______VVV__VVV___VV_VVVVVVV___VVVaab_VW_______VV_VVVVWXYWWbXWXgqWcdcVWeZWfXfYadXdelnbaWczemqligoqt5tlvv20tp832sjdbcZbdeihihqulccddceaaccZYZaceZdejsibdkloolmpkcfddkkikgkkggebihgingccbbZZbdehljd&chtt=red:+net+increase,+green:+net+decrease,+navy:+net+change+from+merges,+purple:+abs+change.+y+axis+is+sign(%24x)+*+10**(abs+%24x)+lines.+orange:+commits+%2F+100&chxl=0:|1989|1990|1991|1992|1993|1994|1995|1996|1997|1998|1999|2000|2001|2002|2003|2004|2005|2006|2007|2008|2009&chxp=0,4.7,9.4,14.1,18.8,23.4,28.1,32.8,37.5,42.2,46.9,51.6,56.2,60.9,65.6,70.3,75.0,79.7,84.4,89.1,93.8,98.4&chxr=1,-3.5,6.5&chxt=x,y

git isn't handist (which I approve of), but it has this "interesting" problem
that git log shows you all the log entries, from both the left and right
parents of each merge commit. So I added a heuristic to pick one linear path
back: at each merge, trace back along the parent with the smaller diff. It all
works out in the end, because all the branches (that matter) fork off the 1.0
checkin somewhere, and I'm after the diffs between each pair of commits on a
linear path. To check this isn't crazy, the net diff from the child to
whichever parent was chosen is shown in navy. It is never close to the total
net code change for the month.
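The parent-selection heuristic boils down to "take the parent whose diff to the merge commit is smallest". A minimal standalone sketch, assuming the per-parent sizes (insertions plus deletions, e.g. from `git diff --shortstat PARENT COMMIT`) have already been collected into a hash; `pick_parent` is a hypothetical name. Ties keep the earlier parent in the list, normally the first parent.

```perl
use strict;
use warnings;

# Return the parent with the smallest diff size; %$delta maps sha => size.
# Ties keep the earlier parent.
sub pick_parent {
    my ($delta, @parents) = @_;
    my ($best, @rest) = @parents;
    for my $try (@rest) {
        $best = $try if $delta->{$try} < $delta->{$best};
    }
    return $best;
}
```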

Problem is that there are some big spikes that swamp the rest, such as the net
200,000 lines added in March 2002 by Encode, so I felt that I needed to go for
a logarithmic scale. It's somewhat cheating: because one can't have fractional
line counts, the logarithm is never negative, so negative numbers are free to
represent logarithmic net deletions (where needed).
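Concretely, each plotted value y is sign(x) * log10(|x|) for a monthly line count x, so 1,000 lines added plots at 3 and 100 lines net deleted at -2; reading a point back means computing sign(y) * 10**|y|. Any nonzero count has |x| >= 1, so its logarithm is never negative, which is what leaves the lower half of the axis free for deletions. A sketch of the transform and its inverse, with hypothetical names; the forward direction does the same thing as the script's `loggit`.

```perl
use strict;
use warnings;

# Signed base-10 logarithm: sign(x) * log10(|x|), with 0 mapped to 0 and
# undef (no data) passed through.
sub signed_log10 {
    my ($x) = @_;
    return undef unless defined $x;
    return 0 if $x == 0;
    my $sign = $x <=> 0;
    return $sign * log(abs $x) / log(10);
}

# Inverse, for reading a value back off the y axis: sign(y) * 10**|y|.
sub from_signed_log10 {
    my ($y) = @_;
    my $sign = $y <=> 0;
    return $sign * 10 ** abs($y);
}
```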

Rather than having one line go wildly positive and negative for net code
change, I elected to split it into red for months of code addition (the usual)
and green for months of net code deletion (sometimes happens). There's a
purple line for total code churn (lines added + lines deleted), and an acme
approved orange line for the total commits, scaled to be 0-7 like everything
else.
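Per month, the net change is (insertions + merge insertions) - (deletions + merge deletions), and the churn is the sum of all four. One detail when writing this in Perl: `||` binds more loosely than `+`, so each "default to 0" needs its own parentheses or a separate step. A minimal sketch with a hypothetical name, taking four per-month counts, any of which may be undef.

```perl
use strict;
use warnings;

# Net change and total churn for one month from four possibly-undef counts:
# insertions, deletions, merge insertions, merge deletions.
# Pitfall this avoids: `$ins||0 + $merge_in||0` parses as
# `$ins || (0 + $merge_in) || 0`, which drops $merge_in whenever $ins is set.
sub net_and_churn {
    my ($ins, $del, $merge_in, $merge_out) = map { $_ // 0 } @_[0 .. 3];
    my $added   = $ins + $merge_in;
    my $removed = $del + $merge_out;
    return ($added - $removed, $added + $removed);
}
```

Splitting the net value into a positive series (red) and a negated negative series (green) then only needs a filter on the sign of the first return value.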

I think that the little spike in total churn in the penultimate month plotted
is me moving things around in ext/.

My general conclusions are that

1: Our data are very noisy
2: The switch to git hasn't (yet) had any real effect on the number of commits,
or the size of the commits.

Nicholas Clark

#!/usr/bin/perl -w

use strict;

my (%count, %ins, %del, %merge_in, %merge_out);
if (@ARGV) {
    # Re-read previously saved per-month totals instead of walking git again.
    while (<>) {
        my ($month, @data) = /^([^:]+):\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;
        die $_ unless defined $month;

        $count{$month} = shift @data;
        $ins{$month} = shift @data;
        $del{$month} = shift @data;
        $merge_in{$month} = shift @data;
        $merge_out{$month} = shift @data;
    }
} else {
    open(LOG, "git log -z --parents --shortstat --pretty=fuller --date=iso origin/blead|") || die;

    # Start from the tip of the same branch that is being logged.
    my $looking_for = `git rev-list -n 1 origin/blead`;
    chomp $looking_for;

    local $/ = "\0";    # -z: one NUL-terminated record per commit
    my $shortstat = qr/^ \d+ files changed, (\d+) insertions\(\+\), (\d+) deletions\(-\)/m;

    while (<LOG>) {
        chomp;
        my ($commitline) = /\Acommit +([ 0-9a-f]+)/;
        my ($month) = /^CommitDate:\s*(\d{4}-\d{2})/m;
        die $_ unless defined $commitline && defined $month;

        # --parents puts the parent SHAs after the commit's own SHA.
        my ($self, @parents) = split / /, $commitline;
        # Skip commits that are not on the single path being traced.
        next unless $self eq $looking_for;

        $count{$month}++;
        if (@parents == 1) {
            my ($ins, $del) = /$shortstat/o;
            if (defined $del) {
                $ins{$month} += $ins;
                $del{$month} += $del;
            }
            $looking_for = $parents[0];
            print STDERR "From $self to $looking_for\n";
        } elsif (@parents == 0) {
            print STDERR "Stopping at $self\n";
        } else {
            # Merge: measure the diff from each parent to this commit.
            my (%delta, %in, %out);
            foreach my $try (@parents) {
                my $stats = `git diff --shortstat $try $self`;
                unless (defined $stats) {
                    warn "Problem for $try $self\n$_";
                    next;
                }
                my ($ins, $del) = $stats =~ /$shortstat/o;
                unless (defined $ins) {
                    print STDERR "No diff $try $self for:\n$_";
                    $ins = $del = 0;
                }
                $delta{$try} = $ins + $del;
                $in{$try} = $ins;
                $out{$try} = $del;
            }
            # Follow the parent with the smallest diff.
            my ($best, @rest) = @parents;
            my $min = $delta{$best};

            foreach my $try (@rest) {
                next unless $delta{$try} < $min;
                $min = $delta{$try};
                $best = $try;
            }
            $merge_in{$month} += $in{$best};
            $merge_out{$month} += $out{$best};
            $looking_for = $best;
            print STDERR "At $self from @parents pick $best\n";
        }
    }
    close(LOG);

    # Save the per-month totals; pass this output as an argument next time.
    foreach (sort keys %count) {
        printf "%s: %d %d %d %d %d\n", $_, $count{$_},
            $ins{$_} || 0, $del{$_} || 0,
            $merge_in{$_} || 0, $merge_out{$_} || 0;
    }
}

# fill in missing months
my @m = sort keys %count;
my @m_all = ($m[0]);
while ($m_all[-1] ne $m[-1]) {
    push(@m_all, next_month($m_all[-1]));
}
@m = @m_all;

# calculate year label positions
my @lab;
my @pos;
for (my $i = 0; $i < @m; $i++) {
    my ($y) = split(/-/, $m[$i]);
    if (!@lab || $lab[-1] != $y) {
        push(@lab, $y);
        push(@pos, sprintf "%.1f", $i / @m * 100);
    }
}
shift(@lab); shift(@pos); # drop the first one

# Drop the last month as it's probably incomplete.
pop @m;

# Return the month after a "YYYY-MM" string (as in Gisle's script).
sub next_month {
    my ($y, $m) = split(/-/, shift);
    $m++;
    if ($m > 12) {
        $m = 1;
        $y++;
    }
    return sprintf "%04d-%02d", $y, $m;
}

# Map each value to sign($x) * log10(abs $x); 0 stays 0, undef stays undef.
sub loggit {
    map {
        defined $_ ? do {
            my $sign = $_ <=> 0;
            $sign ? $sign * log(abs $_) / log(10) : 0;
        } : undef;
    } @_;
}

# Keep positive values; everything else becomes undef (not plotted).
sub only_pos {
    map { defined $_ && $_ > 0 ? $_ : undef } @_;
}

# Negate defined values; undef stays undef.
sub invert {
    map { defined $_ ? -$_ : undef } @_;
}

# Ensure that the value before every defined value that ends a run is
# defined, using zero if it was missing. Somewhat of a con, as Google won't
# plot single points; zero is logically correct for all the missing values,
# but the graph gets messy if everything is running along the zero line.

sub smear {
    my @vals = (0, @_);
    for my $i (1 .. $#vals) {
        $vals[$i-1] ||= 0 if defined $vals[$i] and !defined $vals[$i + 1];
    }
    shift @vals;
    @vals;
}

require URI::GoogleChart;

# Commits per month, divided by 100 for the orange line.
my @count = map $count{$_} ? $count{$_} / 100 : undef, @m;
my @merges = map {$merge_in{$_} ? $merge_in{$_} - $merge_out{$_} : undef} @m;
# Parenthesise each "|| 0": || binds more loosely than +, so
# $a||0 + $b||0 would parse as $a || (0 + $b) || 0.
my @net = map {($ins{$_} || $merge_in{$_})
    ? (($ins{$_} || 0) + ($merge_in{$_} || 0)) - (($del{$_} || 0) + ($merge_out{$_} || 0))
    : undef} @m;
my @abs = map {($ins{$_} || $merge_in{$_})
    ? (($ins{$_} || 0) + ($merge_in{$_} || 0)) + (($del{$_} || 0) + ($merge_out{$_} || 0))
    : undef} @m;

my $uri = URI::GoogleChart->new("lines", 1000, 300,
    title => 'red: net increase, green: net decrease, navy: net change from merges, purple: abs change. y axis is sign($x) * 10**(abs $x) lines. orange: commits / 100',
    color => [qw(red green navy purple FE9900)],
    data => [[smear(loggit(only_pos(@net)))],
             [smear(loggit(only_pos(invert(@net))))],
             [smear(loggit(@merges))],
             [smear(loggit(@abs))],
             [smear(@count)],
            ],
    range_round => 1,
    range_show => "left",
    encoding => "s",

    # no direct support for labels yet
    chxt => "x",
    chxl => join("|", "0:", @lab),
    chxp => join(",", 0, @pos),
);

system("/usr/bin/open", $uri) if -x "/usr/bin/open";

print STDERR "$uri\n";

Joshua ben Jore

Apr 6, 2009, 12:35:05 PM
to Gisle Aas, demerphq, perl5-...@perl.org
On Sun, Apr 5, 2009 at 3:32 PM, Nicholas Clark <ni...@ccl4.org> wrote:
> My general conclusions are that
>
> 1: Our data are very noisy
> 2: The switch to git hasn't (yet) had any real effect on the number of commits,
>   or the size of the commits.

It did make cherrypicking for the DarkPAN much easier.

Josh

Nicholas Clark

Apr 6, 2009, 12:36:13 PM
to Joshua ben Jore, Gisle Aas, demerphq, perl5-...@perl.org

What do you mean? Making a private patched version for work?

Nicholas Clark

Joshua ben Jore

Apr 6, 2009, 8:58:49 PM
to Joshua ben Jore, Gisle Aas, demerphq, perl5-...@perl.org

Yes, prior to just settling on a .deb with 5.10.0 + a pile of modules,
part of the build instructions were "check out maint-5.x, then
cherrypick a list of shas".

In particular, I started with 5.8.7 plus several patches I was able to
locate through git-fu to help get it built on the Ubuntu 64 bit
machine I was targeting.

Later, I moved to 5.10.0 + some commits from maint-5.10. Again,
locating and integrating the commits through simple git-fu was a
breeze.

Now we're just plain old 5.10.0 but could easily incorporate things in
if needed.

Josh
