Re: codeq: add support for importing from github, importing tags & branches


Rich Morin

May 19, 2013, 7:08:19 PM
to clo...@googlegroups.com
On May 18, 2013, at 16:36, Dan Burkert wrote:
> Attached are two patches for codeq. The first adds support
> for importing repositories into codeq directly from github
> through the github API, as well as an improved CLI for codeq
> (necessary for specifying a github import). The second
> patch builds on the first and adds the ability to import git
> ref types (tags and branches) into codeq.

Very cool! These look like very useful extensions.

I've been speculating about the possibility of using Storm to
run a production Codeq installation. Basically, it would be
called into action whenever a monitored event (eg, git commit
or push, test run, Clojars update) occurred.

I have a few specific questions, but more generally, I'd like
to know what you've found out in extending and using Codeq:

What mechanisms are you using to manage and run Codeq?

What support would you want in a production release?

What other import facilities would you like to have?

Like that...

-r

--
http://www.cfcl.com/rdm Rich Morin
http://www.cfcl.com/rdm/resume r...@cfcl.com
http://www.cfcl.com/rdm/weblog +1 650-873-7841

Software system design, development, and documentation


Dan Burkert

May 19, 2013, 10:55:21 PM
to clo...@googlegroups.com
On Sunday, May 19, 2013 7:08:19 PM UTC-4, Rich Morin wrote:
 
> What mechanisms are you using to manage and run Codeq?
 
Right now I don't do a whole lot with Codeq (beyond working with its internals).  I've had the idea (like a lot of people, I think) of pulling in the corpus of clojure projects on github and putting a nice web interface in front of it.  Having support for analyzing repositories directly from github makes this significantly easier, since you then don't have to actually store the repositories anywhere.  As far as putting storm in front of codeq, I think it would be possible, but probably overkill.  I don't have any direct experience with storm, but I am aware of its use cases, and I have a lot of Hadoop experience.  I just don't think the data set size is nearly big enough to warrant the distribution.  It would probably only take a few days for a codeq instance to crawl github and import the clojure projects that 99% of potential users would want to see.  Once the initial import is done, taking care of updates would be relatively easy.  I would ballpark that there aren't more than 1000 commits to important clojure projects on github daily.  I may be way off-base though.
 
> What support would you want in a production release?
 
#1 - A better analyzer.  If codeq could analyze down to the s-expression level and determine what function is being called in each expression it would open up a world of opportunities.  Many smart people have discussed this, and I'm not sure I really have anything to add about the feasibility or how it could be done.

#2 - Building on #1, determine not only what function is being called in each expression (i.e., the namespace and symbol), but also what git repository it came from and what commit in that repository.  Obviously this requires, at a minimum, the ability to parse the dependency information from the project.clj.  It would probably also require Clojars integration to determine the git repository and commit from the version.


> What other import facilities would you like to have?

When codeq was first released there were thoughts of replacing the shelling-out behavior with a faster way to read git repository data.  That should be significantly easier now with the repository protocol.  I'm not going to tackle it myself because I don't need local imports to be faster, but it's an open problem.

I think there is a lot of room to pull in more metadata about repositories, and that could be very useful for certain use cases.  For example, this commit in my codeq fork adds a parent attribute to repository entities, which is a pointer to where the repository was forked from.  This is easy to get through the github API; for local repos it uses the "upstream" remote, as that seems to be somewhat standard (at least as standard as treating the "origin" remote as the uri for the repo).  Obviously not all repositories are forks, so not all repositories will have parents.  If a repo does have a parent which is not already imported, the import fails (this could be changed though).
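
The parent lookup for local repositories described above can be sketched as follows. This is only an illustration in Python, not the code from the fork: the function name `resolve_parent`, the exception `ParentNotImported`, and the plain dict/set data shapes are all hypothetical stand-ins for the remotes of a local repo and the set of already-imported repository URIs.

```python
from typing import Mapping, Optional, Set


class ParentNotImported(Exception):
    """Raised when a fork's parent repository has not been imported yet."""


def resolve_parent(remotes: Mapping[str, str], imported: Set[str]) -> Optional[str]:
    """Return the parent repository URI for a local repo, or None.

    remotes  -- hypothetical mapping of remote name to URI, e.g. {"origin": ...}
    imported -- hypothetical set of repository URIs already in the database
    """
    # By the convention described above, "upstream" points at the fork's parent.
    parent = remotes.get("upstream")
    if parent is None:
        return None  # not a fork, so there is no parent attribute
    if parent not in imported:
        # Mirrors the stated behaviour: the import fails if the parent is missing.
        raise ParentNotImported(parent)
    return parent
```

A more forgiving variant could import the parent first instead of raising, which is the change the post hints at.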

Finally, I forgot to mention in my original email that you should not import projects into an already existing codeq database with the patches, because I changed the format of repository URIs.  Instead of the raw address of the "origin" remote, e.g., "https://github.com/Datomic/codeq.git", I use a transformed version, "github.com/Datomic/codeq".  I feel this is a better way of doing it, because the following are all valid git URIs for the same repository, and mixing them could result in importing a project more than once:

g...@github.com:Datomic/codeq.git
g...@github.com:Datomic/codeq

With my patch all of these URIs are transformed to "github.com/Datomic/codeq", so it is impossible to multiply import a repo.
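
A minimal sketch of this kind of normalization, written in Python purely for illustration (the patch itself is not shown in this thread, and the function name `normalize_git_uri` is hypothetical). It assumes the user part of the scp-style addresses is the conventional `git`, and it does not handle URIs that carry a port number.

```python
def normalize_git_uri(uri: str) -> str:
    """Reduce common spellings of a git remote to 'host/owner/repo'."""
    s = uri.strip()
    if "://" in s:
        # URL form: drop the scheme (https://, git://, ssh://, ...).
        s = s.split("://", 1)[1]
    else:
        # scp-like form: user@host:path becomes user@host/path.
        host, sep, path = s.partition(":")
        if sep and "/" not in host:
            s = host + "/" + path
    # Drop any user info in front of the host name.
    host_part, slash, rest = s.partition("/")
    host_part = host_part.rsplit("@", 1)[-1]
    s = host_part + slash + rest
    # Drop a trailing slash and the optional ".git" suffix.
    s = s.rstrip("/")
    if s.endswith(".git"):
        s = s[: -len(".git")]
    return s


variants = [
    "https://github.com/Datomic/codeq.git",
    "git@github.com:Datomic/codeq.git",
    "git@github.com:Datomic/codeq",
]
# Every variant collapses to the same key: github.com/Datomic/codeq
normalized = {normalize_git_uri(v) for v in variants}
```

Using the normalized string as the repository's unique key is what prevents the same project from being imported under several spellings.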

-- Dan

Dan Burkert

May 19, 2013, 11:06:30 PM
to clo...@googlegroups.com
One more thing before I go to bed -- today I made a fix for what I would consider to be a bug in codeq importing.  In the case where you are importing a fork of a project that you have already imported, the forked project is not associated with the commits already in the parent repo.  In the attached diagram this corresponds to the purple area.  A more specific example:  I import Datomic/codeq into my codeq db, which results in ~50 imported commits.  I then import my fork, danburkert/codeq, which results in ~30 new commits.  The original 50 commits are not associated with my fork, so they will not appear in a query asking for all the commits of danburkert/codeq.  This commit fixes the issue; I can make a patch if there is interest.
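
The intended behaviour after the fix can be illustrated with a small in-memory model. This is a sketch in Python under simplifying assumptions, not codeq's actual schema or the code in the commit: the class `CommitStore`, its fields, and the made-up commit ids are all hypothetical, and the counts of 50 and 30 are taken from the example above.

```python
from collections import defaultdict


class CommitStore:
    """Toy model: commits are stored once, repos are linked to commits."""

    def __init__(self):
        self.commits = set()                  # every commit id seen so far
        self.repo_commits = defaultdict(set)  # repo name -> commit ids

    def import_repo(self, repo, shas):
        """Import a repo and return how many commits were new to the store."""
        shas = set(shas)
        new = shas - self.commits
        self.commits |= new
        # The fix: link *every* reachable commit to the repo, not only the
        # new ones. Linking just `new` reproduces the bug described above.
        self.repo_commits[repo] |= shas
        return len(new)


store = CommitStore()
parent_history = {f"p{i}" for i in range(50)}                # hypothetical ids
fork_history = parent_history | {f"f{i}" for i in range(30)}

imported_parent = store.import_repo("Datomic/codeq", parent_history)   # 50 new
imported_fork = store.import_repo("danburkert/codeq", fork_history)    # 30 new
fork_total = len(store.repo_commits["danburkert/codeq"])               # 80 linked
```

With the association made for all reachable commits, a query for the fork's commits returns the shared history as well as the fork-only commits.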

-- Dan
codeq-import.svg