How to locate Scala feature stats


Jing Vergara

Nov 11, 2013, 12:11:06 PM
to boa-...@googlegroups.com
Hi,

I'd like to find the number of classes and interfaces in Scala programs. I tried the program below, but it still returns output for other languages. What am I missing?

Thanks,
Jing

# Computes Number of Attributes (NOA) for each project, per-type
# Output is: NOA[Project Language][Project Name][TypeName][Type Kind] = NOA value
p: Project = input;
NOA: output sum[string][string][string][string] of int;

visit(p, visitor {
    # skip projects that do not list Scala as a programming language
    before n: Project -> ifall (i: int; !match(`^scala$`, lowercase(n.programming_languages[i]))) stop;
    # only look at the latest snapshot
    before n2: CodeRepository -> {
        if (hasfiletype(n2, "scala")) {
            snapshot := getsnapshot(n2);
            foreach (i: int; def(snapshot[i]))
                visit(snapshot[i]);
            stop;
        }
    }
    before node: Declaration -> {
        if (node.kind == TypeKind.CLASS)
            NOA[p.programming_languages[0]][p.name][node.name]["class"] << len(node.fields);
        if (node.kind == TypeKind.ANNOTATION)
            NOA[p.programming_languages[0]][p.name][node.name]["annotation"] << len(node.fields);
        if (node.kind == TypeKind.ANONYMOUS)
            NOA[p.programming_languages[0]][p.name][node.name]["anonymous"] << len(node.fields);
        if (node.kind == TypeKind.DELEGATE)
            NOA[p.programming_languages[0]][p.name][node.name]["delegate"] << len(node.fields);
        if (node.kind == TypeKind.ENUM)
            NOA[p.programming_languages[0]][p.name][node.name]["enum"] << len(node.fields);
        if (node.kind == TypeKind.GENERIC)
            NOA[p.programming_languages[0]][p.name][node.name]["generic"] << len(node.fields);
        if (node.kind == TypeKind.INTERFACE)
            NOA[p.programming_languages[0]][p.name][node.name]["interface"] << len(node.fields);
        if (node.kind == TypeKind.OTHER)
            NOA[p.programming_languages[0]][p.name][node.name]["other"] << len(node.fields);
        if (node.kind == TypeKind.STRUCT)
            NOA[p.programming_languages[0]][p.name][node.name]["struct"] << len(node.fields);
    }
});

Robert Dyer

Nov 12, 2013, 1:03:01 AM
to boa-...@googlegroups.com
Hi Jing,

I believe you are running into a combination of two issues:

1) Projects manually set which programming languages they use and can select more than one. As such, some projects that select Scala (probably) also select Java, etc. So your source code analysis would include non-Scala code as well, if those projects have it.

2) Currently our dataset only has source code information for Java source files (.java file extensions).

So these two issues, acting together, mean you are mining the *Java* source code from *Scala* projects!
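
To see the first issue in the data, a minimal sketch along these lines should tally which languages are declared alongside Scala. It uses only the project metadata, so it does not depend on which source files the dataset can parse. The output name CoLanguages is made up for this example.

```boa
# For projects that list Scala, count every language they declare.
# CoLanguages is a hypothetical output name chosen for illustration.
p: Project = input;
CoLanguages: output sum[string] of int;

exists (i: int; match(`^scala$`, lowercase(p.programming_languages[i])))
    foreach (j: int; def(p.programming_languages[j]))
        CoLanguages[lowercase(p.programming_languages[j])] << 1;
```

If "java" shows up with a large count next to "scala", that is consistent with the mixed-language explanation above.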

We are hoping to add support for Scala source code in the near future.  We just need a good (read: fast and robust to errors) parser for Scala files! :-)

Note that in the future, if you want to filter a specific kind of file[1], you can do that by adding a visitor for ChangedFile.  For example:

  before node: ChangedFile -> if (node.kind != FileKind.SOURCE_JAVA_JLS2) stop;

You can also make use of the 'iskind' function[2] to easily match several kinds (in this case, any parseable Java file):

  before node: ChangedFile -> if (!iskind("SOURCE_JAVA", node.kind)) stop;
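
Put together with the original goal (counting classes and interfaces), the file filter might be used roughly as in the sketch below. This is only an illustration under the current dataset, so it counts Java declarations inside projects that list Scala, not Scala declarations. The output name Types is made up for this example, and a future file kind for Scala sources is deliberately not guessed at here.

```boa
# Counts classes and interfaces per project, restricted to parseable Java files.
# Types is a hypothetical output name chosen for illustration.
p: Project = input;
Types: output sum[string][string] of int;

visit(p, visitor {
    # skip projects that do not list Scala as a programming language
    before n: Project -> ifall (i: int; !match(`^scala$`, lowercase(n.programming_languages[i]))) stop;
    # only look at the latest snapshot
    before n2: CodeRepository -> {
        snapshot := getsnapshot(n2);
        foreach (i: int; def(snapshot[i]))
            visit(snapshot[i]);
        stop;
    }
    # skip any file that is not a parseable Java source file
    before node: ChangedFile -> if (!iskind("SOURCE_JAVA", node.kind)) stop;
    before node: Declaration -> {
        if (node.kind == TypeKind.CLASS)
            Types[p.name]["class"] << 1;
        if (node.kind == TypeKind.INTERFACE)
            Types[p.name]["interface"] << 1;
    }
});
```

Keying the output on p.name alone, rather than on p.programming_languages[0], avoids labeling results with whichever language happens to be listed first.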

- Robert
