Bumi, a project for loading a git repo into a Titan graph database

Zack Maril

Feb 21, 2013, 11:00:03 AM
to clo...@googlegroups.com
https://github.com/zmaril/bumi

Bumi loads a git repo into a Titan graph database[0]. You can then ask questions about the history of the project with Faunus[1]. I've successfully loaded the Linux kernel onto an AWS instance. I'm working now to start asking good questions and see if I can't find out anything neat. Any suggestions about paths of investigation to pursue would fall on open ears. I'm open sourcing this now in case anybody wants to play around with it. In theory, it should work with any git repo. In practice, I've only tested it on the Linux kernel so far. It's written in Clojure, using Hermes[2]. 
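
As a rough illustration of the idea (not Bumi's actual schema, which this post doesn't describe), commits can be treated as vertices and parent links as edges. The sketch below is plain Python over in-memory dicts rather than Titan or Hermes. The input format (one line per commit: the hash, then its parent hashes, as printed by `git log --format='%H %P'`), the short sample hashes, and every function name are assumptions made for illustration.

```python
from collections import defaultdict

# Sample input in the shape produced by `git log --format='%H %P'`:
# a commit hash followed by zero or more parent hashes.
# The short hashes here are made up for the example.
SAMPLE_LOG = """\
d4 b2 c3
c3 a1
b2 a1
a1
"""

def build_commit_graph(log_text):
    """Return (parents, children) adjacency maps keyed by commit hash."""
    parents = {}
    children = defaultdict(list)
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        commit, commit_parents = fields[0], fields[1:]
        parents[commit] = commit_parents
        for parent in commit_parents:
            children[parent].append(commit)
    return parents, dict(children)

def merge_commits(parents):
    """Commits with more than one parent, i.e. merges."""
    return sorted(c for c, ps in parents.items() if len(ps) > 1)

def ancestors(parents, commit):
    """All commits reachable from `commit` by following parent edges."""
    seen = set()
    stack = list(parents.get(commit, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

parents, children = build_commit_graph(SAMPLE_LOG)
```

With the history in this shape, questions such as "which commits are merges?" or "what led up to this commit?" become graph traversals, which is the kind of query a graph database is meant to run at scale.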

-Zack

Rich Morin

Feb 21, 2013, 11:19:59 AM
to clo...@googlegroups.com
On Feb 21, 2013, at 08:00, Zack Maril wrote:
> Bumi loads a git repo into a Titan graph database[0]. You can then ask questions
> about the history of the project with Faunus[1]. I've successfully loaded the
> Linux kernel onto an AWS instance. I'm working now to start asking good questions
> and see if I can't find out anything neat. Any suggestions about paths of
> investigation to pursue would fall on open ears. ...

Rich Hickey's Codeq project

http://blog.datomic.com/2012/10/codeq.html

extracts both metadata and "code quanta" (semantically meaningful code snippets)
from Git repos. This allows it to answer questions about (say) code churn and hot
spots in terms of individual functions, etc. It also (implicitly) opens the door
to other information sources (eg, dynamically harvested metadata).

I would urge you to consider (a) what can be learned from Codeq and (b) whether
any sort of cooperation and/or interoperability might be possible.

-r

--
http://www.cfcl.com/rdm Rich Morin
http://www.cfcl.com/rdm/resume r...@cfcl.com
http://www.cfcl.com/rdm/weblog +1 650-873-7841

Software system design, development, and documentation


Zack Maril

Feb 21, 2013, 12:34:17 PM
to clo...@googlegroups.com
codeq looks fantastic and I've looked into using it before. The project seems to have undergone a flurry of activity last October/November and then nothing has really happened with it since then. I haven't seen anybody actually do anything impressive with it, so I decided to write Bumi instead. If you can point me towards an example of someone doing something nontrivial with the project, then I'd happily reconsider starting another code analysis project in Clojure. codeq looks fantastically powerful from the outside, but nobody has done anything yet with it that would actually exhibit this perceived power. Which worries me, since if it were so powerful, somebody would have easily done something neat with it by now and talked about it. The lack of results implies to me that it might not be as powerful or useful as people say.
-Zack

Rich Morin

Feb 27, 2013, 3:35:29 AM
to clo...@googlegroups.com
On Feb 21, 2013, at 09:34, Zack Maril wrote:
> codeq looks fantastic and I've looked into using it before. The project seems to
> have undergone a flurry of activity last October/November and then nothing has
> really happened with it since then.

Work continues, but getting from Rich Hickey's blog and demo code to a production
system (or even a splashy demo :-) is going to take some Real Work (TM).

I know that a few folks (at least) are playing with ideas and code, but I don't
know of any coordinated project. (I would love to pull together such an effort;
if folks are interested, please get in touch!)

Meanwhile, here are some other excuses for the lack of visible progress:

* Although Rich has given a couple of talks on Codeq, no videos are online.
So, the "official" exposure is mostly limited to a blog entry and a demo.

* Moving Codeq from a proof of concept demo to a production app is not a small
project. So, some possible contributors may have been scared off.

* The codeq mailing list has a configuration problem which keeps new submissions
from being accepted. So, some folks may have been unable to participate.

* I imagine that Rich has had more pressing issues to deal with (:-).

* I've been thinking and writing quite a bit about Codeq, but I haven't had the
clues, time, and tuits to create a compelling demo for a mass audience.


> Which worries me, since if it were so powerful, somebody would have easily done
> something neat with it by now and talked about it.

I actually found Rich's demo to be quite compelling, demonstrating major advances
over the state of the art in production documentation generators. Specifically:

* The use of a database for storage of harvested data allows arbitrary queries
to be made about the code base, encourages post-processing and analysis, etc.

* The use of Datomic allows queries to consider the code base's state over time.

* The use of data from multiple sources (specifically, analyzers and Git metadata)
breaks with the typical "data silo" approach to documentation generators. I'm
VERY interested in the possibility of adding other data sources.

* The concept of a "code quantum" is both novel (AFAIK) and useful (IMNSHO). It
allows queries to be made about semantically interesting chunks of code, rather
than files, sets of lines, etc.

So, despite the fact that the demo uses an extremely limited Clojure analyzer, it
can report on:

* codeqs (eg, functions) that have had a lot of churn

* codeqs that were modified during a specified period
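
The two reports above can be sketched without Datomic. The following Python is only an illustration of the idea, not Codeq's schema or query API: the record layout (commit date, codeq name, hash of the code text), the sample data, and all function names are assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical harvested records: (commit date, codeq name, hash of its code text).
# A new hash for the same name means the definition changed in that commit.
RECORDS = [
    (date(2012, 10, 1), "core/parse", "h1"),
    (date(2012, 10, 1), "core/emit", "h2"),
    (date(2012, 10, 9), "core/parse", "h3"),
    (date(2012, 11, 2), "core/parse", "h4"),
    (date(2012, 11, 2), "core/emit", "h2"),  # unchanged text, so not churn
]

def churn(records):
    """Number of distinct versions seen for each codeq name."""
    versions = defaultdict(set)
    for _, name, text_hash in records:
        versions[name].add(text_hash)
    return {name: len(hashes) for name, hashes in versions.items()}

def modified_between(records, start, end):
    """Names that gained a previously unseen code hash within [start, end].

    The first appearance of a codeq counts as a change.
    """
    seen = defaultdict(set)
    changed = set()
    for when, name, text_hash in sorted(records):
        if text_hash not in seen[name]:
            seen[name].add(text_hash)
            if start <= when <= end:
                changed.add(name)
    return sorted(changed)
```

Keying on a hash of the code text rather than on file and line positions is what lets the report follow a function as a unit, even when surrounding edits move it around within a file.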


> I haven't seen anybody actually do anything impressive with it, so I decided to write
> Bumi instead.

I'm pretty agnostic about implementation specifics, but I'd hate to see Rich's design
decisions (and their benefits) go away.


FWIW, most of my Codeq-related thoughts are written up (or linked from) these pages:

http://wiki.cfcl.com/bin/view/Projects/Codeq/SS
http://wiki.cfcl.com/bin/view/Projects/Codeq/WebHome