Rank developers by files touched?

Cedric Teyton

Nov 19, 2013, 1:39:49 AM
to boa-...@googlegroups.com
Hello,

I was wondering whether Boa lets you focus on the expertise of particular developers.
For instance, suppose I want the top 10 committers who touched the most Java files (or even pom.xml files).

My intuition is that Boa can handle this somehow; I'm just not familiar enough with it to answer such questions myself.

Thank you for your help!

Best,

Cédric Teyton

Hoan Nguyen

Nov 19, 2013, 1:18:55 PM
to boa-...@googlegroups.com
Hi Cedric,

This Boa program could be a starting point for answering your question.
It counts the number of files each committer has touched. Currently in our data, the committers' names are anonymized.

Please refer to the Programming Guide http://boa.cs.iastate.edu/docs/index.php if you want to change the file matching or use other data types to filter the "touch".

Hope this helps.
Best,
Hoan

Robert Dyer

Nov 19, 2013, 5:26:49 PM
to boa-...@googlegroups.com
I believe Hoan gave the wrong URL; here is the correct one:


And just to build on what he provided, here is my version:


This job computes the top 10 committers. It is also written so that you can easily change which file names you match against (the array 'searchNames' at the top).

The basic intuition is to iterate over each code repository (line 6), then over each revision (line 7), then over each file (line 8), and if a file matches (line 9) any of the provided regexes (line 4), we increase the count for that particular user. This is done by emitting their name and a weight of '1' (line 10). The top aggregator (line 2) then groups by each user's name, sums all their weights, sorts by the total weight, and gives the top (in this case, 10) results.
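
To make the aggregation step concrete, here is a minimal Python sketch of the same logic. It is not Boa code and not the job from this post: the nested dictionaries, the sample names, the two patterns, and the function `top_committers` are all hypothetical stand-ins, used only to show the "emit a weight of 1, then group, sum, sort, and cut" behaviour described above.

```python
import re
from collections import Counter

# Hypothetical stand-in data: projects -> code repositories -> revisions -> files.
projects = [
    {"code_repositories": [
        {"revisions": [
            {"committer": "alice", "files": ["src/Main.java", "pom.xml"]},
            {"committer": "bob", "files": ["README.md"]},
            {"committer": "alice", "files": ["src/Util.java"]},
        ]},
    ]},
    {"code_repositories": [
        {"revisions": [
            {"committer": "bob", "files": ["app/App.java"]},
            {"committer": "carol", "files": ["notes.txt"]},
        ]},
    ]},
]

# Illustrative patterns (assumed, not the original searchNames array).
search_names = [r"\.java$", r"(^|/)pom\.xml$"]


def top_committers(projects, search_names, k=10):
    """Count matching file touches per committer and return the k largest."""
    counts = Counter()
    for project in projects:
        for repo in project["code_repositories"]:
            for rev in repo["revisions"]:
                for file_name in rev["files"]:
                    if any(re.search(p, file_name) for p in search_names):
                        # Equivalent of emitting the name with a weight of 1.
                        counts[rev["committer"]] += 1
    # Group-by and sum happened in the Counter; now sort and keep the top k.
    return counts.most_common(k)


print(top_committers(projects, search_names))  # [('alice', 3), ('bob', 1)]
```

Note that a committer who never touches a matching file (carol here) never emits anything and so does not appear in the result at all.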

- Robert

Robert Dyer

Nov 19, 2013, 5:32:11 PM
to boa-...@googlegroups.com
And here is the same query, but using our visitor syntax:


I personally think this is easier to understand! :-)

- Robert

Robert Dyer

Nov 19, 2013, 6:15:08 PM
to boa-...@googlegroups.com
A couple of minor changes to my previous example:


1) I made the match case-insensitive (by lowercasing everything)
2) The searchNames are regex strings, so I escaped the period '.' so that it matches a literal dot rather than any character
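
Both changes are easy to see in a small Python sketch. This is an illustration of the regex behaviour only, not Boa code; the helper `matches` and the sample file names are hypothetical, and lowercasing just the file name (with patterns assumed to be written in lowercase) is one way to read "lowercasing everything".

```python
import re


def matches(pattern, file_name):
    """Case-insensitive match: lowercase the name, assume a lowercase pattern."""
    return re.search(pattern, file_name.lower()) is not None


unescaped = r".java$"   # '.' matches any single character
escaped = r"\.java$"    # '\.' matches only a literal dot

# The unescaped pattern accepts a name that merely ends in "<any char>java".
print(matches(unescaped, "guavajava"))   # True
print(matches(escaped, "guavajava"))     # False

# Lowercasing makes an upper-case extension match as well.
print(matches(escaped, "Main.JAVA"))     # True
print(re.search(escaped, "Main.JAVA") is not None)  # False without lowercasing
```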

- Robert

Cedric Teyton

Nov 20, 2013, 3:00:04 AM
to boa-...@googlegroups.com
Thank you for being so responsive! :)

I understand your answers, and I think they answer my question.
Just one last question: in your domain-specific types I see an entry
"Person: A unique person's information".

I just wonder how it relates to the committer's name you
extract in this query?

Best,

Cédric

Robert Dyer

Nov 20, 2013, 10:39:03 AM
to boa-...@googlegroups.com
In my last query, the visitor for the Revision type contains 'rev.author.username'. In this expression, 'rev' is the Revision, the attribute 'author' gives you a Person (the Person who authored the commit), and the attribute 'username' belongs to that Person type, giving you that Person's user name.
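
As a rough analogy in Python (not Boa code), the chain of attributes can be modelled with two small classes. Only the 'author' and 'username' attributes named above are included; the class definitions and the sample value are hypothetical, and any other fields the real types have are left out.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Only the attribute mentioned in this thread; other fields are omitted.
    username: str


@dataclass
class Revision:
    # The Person who authored the commit.
    author: Person


rev = Revision(author=Person(username="alice"))

# rev is a Revision, rev.author is a Person, rev.author.username is a string.
print(rev.author.username)  # alice
```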

- Robert