No output with small GitHub dataset?

Anne Ore

May 9, 2016, 4:02:22 PM
to Boa Language and Infrastructure User Forum
I've found several examples where there is output for "2013 September/SF (small)", but not for "2015 September/GitHub (small)". This is the case for at least the following examples:
  • In which year was SVN added to Java projects the most?
  • What are the five most supported operating systems?
  • What are the 5 largest projects, in terms of AST nodes?
  • What are the five most popular databases?
Some others, like "What are the ten most used programming languages?", provide output for both small datasets. 

Why is this? Is this expected, or is something broken? 

Anne Ore

May 9, 2016, 4:10:59 PM
to Boa Language and Infrastructure User Forum
Related - the output for "How many projects are created each year?" is strange for the GitHub dataset, but not the SF dataset. 

Excerpt from GH output, which goes on for many lines:

counts[40183] = 1
counts[40288] = 1
counts[40330] = 1
counts[40351] = 1

SF output:

counts[1999] = 7
counts[2000] = 56
counts[2001] = 97
counts[2002] = 140
counts[2003] = 156
counts[2004] = 190
counts[2005] = 222
counts[2006] = 304
counts[2007] = 252
counts[2008] = 301
counts[2009] = 417
counts[2010] = 746
counts[2011] = 2214
counts[2012] = 1927

Robert E Dyer

May 9, 2016, 5:14:28 PM
to boa-...@googlegroups.com
Hi Anne,

Any query looking for SVN data on the GitHub dataset won’t find results, because GitHub contains only Git data.

The same goes for other queries, like those on operating systems and databases: GitHub does not provide such information.

The GH dataset also has a bug in the project creation time (it is 1k bigger than it should be), which leads to the strange-looking years.
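
As a rough illustration of that last point, here is a minimal Python sketch of the arithmetic. It assumes that "1k bigger" means the creation timestamp is multiplied by 1000 and that timestamps count from the Unix epoch (1970); neither detail is confirmed here, and `recover_year` is a hypothetical helper, not part of Boa.

```python
EPOCH_YEAR = 1970  # assumption: timestamps are measured from the Unix epoch


def recover_year(inflated_year: int, factor: int = 1000) -> int:
    """Approximate the real year from a year computed on an inflated timestamp.

    Assumes the timestamp was scaled by `factor`, so the elapsed time since
    the epoch (and therefore the number of elapsed years) is scaled by
    roughly the same factor.
    """
    inflated_elapsed = inflated_year - EPOCH_YEAR
    return EPOCH_YEAR + inflated_elapsed // factor


if __name__ == "__main__":
    # Years taken from the GH output excerpt in this thread.
    for year in (40183, 40288, 40330, 40351):
        print(year, "->", recover_year(year))
```

Under those assumptions, each of the four years in the excerpt maps back to 2008, and every real year is spread across about a thousand distinct inflated years, which would explain why the GH output runs to many lines with a count of 1 each.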

So everything you have seen is expected/known!

- Robert

________________________________________________
Robert Dyer | Assistant Professor | Department of Computer Science
BGSU | rd...@bgsu.edu | 419.372.3469 | 244 Hayes | Bowling Green, OH

Want to mine ultra-large-scale software repositories with minimal initial
investment? Check out Boa! http://boa.cs.iastate.edu/