Hello,
I wrote a script (below) to collect all GitHub projects that contain files with a specific extension.
I ran it on the "2015 August/GitHub" dataset. However, when I compare the results with those from the GitHub Search feature, I see that some projects are missing from the Boa output.
I'd like to know whether the dataset covers only a portion of GitHub, or whether Boa limits the number of projects that can be retrieved.
p: Project = input;
# Set of project URLs, indexed by project id.
o: output set[string] of string;

repos := p.code_repositories;
for (i := 0; i < len(repos); i++) {
    repo := repos[i];
    for (j := 0; j < len(repo.revisions); j++) {
        revision := repo.revisions[j];
        # Record the project when a revision has a file with the atl extension.
        if (hasfiletype(revision, `atl`))
            o[p.id] << p.project_url;
    }
}
Cheers,
Valerio