What is a project and what is a repository in Boa?

chupanw

Apr 11, 2016, 8:46:11 PM
to Boa Language and Infrastructure User Forum
The Dataset Statistics of Boa shows that the GitHub September 2015 Full dataset has 7,830,023 projects but only 380,125 code repositories. Each project on GitHub should correspond to at least one repository, so I would expect the number of code repositories to be equal to or greater than the number of projects. What exactly is a project (repository) in the Boa dataset? Or did I miss something in the documentation? If that's the case, any pointer would be very much appreciated.

Thanks! 

Robert E Dyer

Apr 11, 2016, 11:34:34 PM
to boa-...@googlegroups.com
Sorry for the confusion. The terminology is exactly what you are picturing. We just haven't managed to clone every project on GitHub (yet). Instead, we focused first on cloning only the Java projects, which is why we have around 380k repositories cloned.

Future datasets should close this gap, and eventually, for GitHub datasets, there should be a 1:1 correspondence between projects and code repositories.

- Robert

________________________________________________
Robert Dyer | Assistant Professor | Department of Computer Science
BGSU | rd...@bgsu.edu | 419.372.3469 | 244 Hayes | Bowling Green, OH

Want to mine ultra-large-scale software repositories with minimal initial
investment? Check out Boa! http://boa.cs.iastate.edu/

chupanw

Apr 12, 2016, 1:15:01 AM
to Boa Language and Infrastructure User Forum
Thanks a lot for the prompt reply. Now those numbers make much more sense.