Scraping all methods/functions in a Scala project

Adelbert Chang

Feb 18, 2015, 3:01:07 PM
to scala-i...@googlegroups.com
Hey all,

I'm currently trying to scrape/crawl the compiler's symbol table for the non-synthetic methods/functions, and their accompanying type signatures, of a Scala (SBT?) project. Asking around, I was told a compiler plugin may be a good first avenue to explore, so I'm playing around with that and hooking in right after the `typer` phase.

I'm looking at the root mirror and inspecting the info.decls field of it, something along the lines of:

def newPhase(prev: Phase): Phase =
  new StdPhase(prev) {
    def apply(unit: CompilationUnit): Unit = {
      // Find the top-level `scalaz` package among the root's declarations
      val decls = global.RootClass.info.decls
      val scalaz = decls.find(_.toString contains "scalaz")
      println(scalaz.get.info.decls.filter(x => !x.hasMeaninglessName))
    }
  }

It dumps quite a bit of stuff, but I still get a fair bit of weird output like `class anonfun$reduceUnordered$1 extends ;`

Wondering if I'm going in the right direction at all, and if so what should I be looking more at to just get the type signatures of methods in the project?

Thanks!

Eugene Burmako

Feb 18, 2015, 3:05:24 PM
to <scala-internals@googlegroups.com>
In a compiler plugin, your Phase.apply method is called once for each CompilationUnit object; together these represent all files in the compilation run.

Traversing through CompilationUnit.body of those objects with a Traverser and matching against all DefDef nodes will give you all methods defined in the project. Looking into Tree.symbol.info of those nodes will give you their signatures.
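
A minimal sketch of that approach, assuming scala-compiler is on the classpath. The plugin name `methoddump` and the class names are made up for illustration, and the synthetic/artifact filter is borrowed from elsewhere in this thread rather than being part of the suggestion above:

```scala
import scala.tools.nsc.{Global, Phase}
import scala.tools.nsc.plugins.{Plugin, PluginComponent}

// Hypothetical plugin: prints the signature of every method defined in the run.
class MethodDump(val global: Global) extends Plugin {
  val name = "methoddump"
  val description = "prints signatures of methods defined in the compilation run"
  val components: List[PluginComponent] = List(Component)

  private object Component extends PluginComponent {
    val global: MethodDump.this.global.type = MethodDump.this.global
    import global._

    val phaseName = "methoddump"
    val runsAfter = List("typer") // trees carry symbols and types after typer

    def newPhase(prev: Phase): Phase = new StdPhase(prev) {
      def apply(unit: CompilationUnit): Unit = collector.traverse(unit.body)
    }

    // Visits every tree in the unit and reports each DefDef it finds.
    private object collector extends Traverser {
      override def traverse(tree: Tree): Unit = {
        tree match {
          case dd: DefDef
              if dd.symbol != null && dd.symbol != NoSymbol &&
                !dd.symbol.isSynthetic && !dd.symbol.isArtifact =>
            // symbol.info holds the signature; defString renders name plus signature
            println(dd.symbol.defString)
          case _ =>
        }
        super.traverse(tree) // keep descending to reach nested methods
      }
    }
  }
}
```

Note this only sees methods defined in source files of the current compilation run, not separately compiled classes on the classpath.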

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Adriaan Moors

Feb 18, 2015, 3:42:02 PM
to scala-i...@googlegroups.com
I experimented with something similar a while ago using the REPL's power mode (should work in sbt console) to get all members under scala.collection.

There are some issues with reflection that can cause you to get the erased signature, but I can't seem to remember the details. Jason, you figured it out last time, maybe you remember?

:power

def explode(p: Symbol): Stream[Symbol] = {
  if (p.isClass)  p.info.decls.toStream.flatMap(explode)
  else if (p.isPackage || p.isModule) explode(p.moduleClass)
  else Stream(p)
}

val coll = findMemberFromRoot(TermName("scala.collection"))
val mems = explode(coll.moduleClass)

object dealiasFull extends TypeMap {
  def apply(tp: Type): Type = mapOver(tp.dealias)
}

mems filterNot (mem => mem.isSynthetic || mem.isArtifact ) foreach (mem => println(mem.defString))

Grzegorz Kossakowski

Feb 18, 2015, 4:22:18 PM
to scala-internals
Hi Adelbert!

The `API` phase of the incremental compiler does exactly that.


--
Grzegorz Kossakowski
Scalac hacker at Typesafe
twitter: @gkossakowski

Haoyi Li

Feb 18, 2015, 6:51:19 PM
to scala-internals

Jason Zaugg

Feb 18, 2015, 11:09:23 PM
to scala-i...@googlegroups.com
The solutions presented so far differ as to whether they enumerate definitions in the current compilation run (as is done in xsbt/API.scala) or the entire contents of a package, including separately compiled classes (as is done in Adriaan's snippet).

I've tweaked Adriaan's code to cut out some noise. The key part is to exclude non-Scala classes.

Some background:

Top level symbols in Scala come in four flavours:
  1. Backed by a tree in a .scala source file
  2. Backed by a tree in a .java source file
  3. Backed by a classfile that contains Scala pickled type information stored in a custom attribute
  4. Backed by any other class file
(Actually, there are also package symbols, and runtime reflection symbols, but for this discussion those aren't relevant.)

When we have the products of a previous scalac run in the current compilation classpath, we end up with Symbols that presuppose that, e.g., Foo$class originates from javac, i.e. is in category #4. The same applies to lifted anonymous functions/classes.

We could exclude them in a few ways:
  1. As I have done here, just prune all Java symbols. This only works because we know the entire package we are enumerating is Scala-defined.
  2. Excluding classes with '$' in the name. (But this will also get rid of classes such as `class ::`, whose encoded name `$colon$colon` contains '$'.)
  3. Poking into the bytecode of the class to see if it is actually an inner or local class. The compiler's classpath implementation doesn't do this, as it tries to avoid parsing class files eagerly; that is only done if and when the class is referred to in code.
-jason
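
To make the caveat in option 2 concrete, here is a small sketch using `scala.reflect.NameTransformer` from the standard library. The object and method names (`NameFilters`, `naiveIsGenerated`, `decodedIsGenerated`) are hypothetical, and the second filter is only an assumed heuristic, not something proposed in this thread: it decodes operator escapes before looking for '$', so `::` survives while names like `anonfun$reduceUnordered$1` are still rejected. It can misfire on names where '$' happens to be followed by an operator escape word.

```scala
import scala.reflect.NameTransformer

object NameFilters {
  // Option 2 taken literally: reject any encoded name containing '$'.
  // This also rejects symbolic names, since `::` is encoded with '$' escapes.
  def naiveIsGenerated(encodedName: String): Boolean =
    encodedName.contains('$')

  // Assumed refinement: turn operator escapes back into symbols first,
  // then treat any '$' that is still left as a sign of a generated name.
  def decodedIsGenerated(encodedName: String): Boolean =
    NameTransformer.decode(encodedName).contains('$')

  def main(args: Array[String]): Unit = {
    val names = List(NameTransformer.encode("::"), "anonfun$reduceUnordered$1", "Foo$class")
    for (n <- names)
      println(s"$n naive=${naiveIsGenerated(n)} decoded=${decodedIsGenerated(n)}")
  }
}
```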

Adelbert Chang

Feb 19, 2015, 1:03:56 AM
to scala-i...@googlegroups.com
Woah so many responses, this is awesome :-)

I think I'll need an approach that allows me to do this programmatically, if this is at all possible. Ideally I'd like to point it at some source files, or maybe some class files, or hopefully even just a JAR, give it some namespace prefix (e.g. "scala.collection"), and get access to the functions/methods that reside in there. Ideally I'd be able to canonicalize the output as well, for instance something like

drop: List[A] => Int => List[A]
fill: Int => ( => A) => List[A]

Adriaan and Jason's code looks quite interesting, but it looks like it relies on the special `Symbol` import you get from dropping into power mode in the REPL. Is it possible to do something similar programmatically?

Haoyi's Pressy example also looks very interesting.. will explore that front.

Jason Zaugg

Feb 19, 2015, 1:33:01 AM
to scala-i...@googlegroups.com
If you run as a compiler plugin you can access the same APIs as in :power mode.

Otherwise, you can spin up a new compiler instance if you have a list of JARs to feed in as a classpath, as shown in https://gist.github.com/retronym/f80d125e99fdbb1389e6.

-jason 
