Next Scoobi release 0.7.0

Eric Torreborre

Feb 20, 2013, 12:18:22 AM
to scoobi...@googlegroups.com
Hi all,

The GitHub master branch now contains a refactored version of Scoobi, which harnesses the power of the Kiama library to do some of its processing (and so removes a few hundred lines of code).

Most of the API stays the same (or has been extended, as in #194 for example), but there are some changes around the "Persist" API. The main one is that you can now specify sinks directly on DLists and use the "persist" and "run" methods to save the data to a file or retrieve it in memory:

    val list: DList[Int]    = DList(1, 2, 3)
    val plusOne: DList[Int] = list.map(_ + 1)

    // the sum of all values
    val sum: DObject[Int] = list.sum
    // the max of all values
    val max: DObject[Int] = list.max

    // execute the computation graph for the 2 DObjects and one DList (save the DList result to a file)
    persist(sum, max, plusOne.toTextFile("plus_one.txt"))

    // collect results
    // (6, 3, Seq(2, 3, 4))
    (sum.run, max.run, plusOne.run)

Please read the corresponding User Guide page for more details.

Also, since this is a vast refactoring, and despite a growing number of tests, this new version has not yet been battle-tested and is likely to choke on some applications. We are going to migrate our own applications to this new version progressively, but any issue you can report from your own testing will be valuable as well.

Thanks,

Eric.

Eric Torreborre

Feb 20, 2013, 10:42:43 PM
to scoobi...@googlegroups.com
I must also mention that one motivation for this refactoring was to be able to pack more computations into a smaller number of MapReduce jobs. As a result, you should see some performance improvements.

But as with any kind of optimisation, this needs to be backed up by experiment (starting with this issue)! If anyone has already done such benchmarks and is willing to open-source them, that'd be really helpful.

In any case, please report any performance improvement or degradation you observe.
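
For a rough first comparison between the old and new versions, something as simple as timing the same job under each one may be enough. The sketch below is plain Scala only: `TimeIt` and `timed` are made-up names, not part of Scoobi, and the body being timed is a stand-in for a real job (for instance a `persist(...)` call). Wall-clock time for a single run is noisy, so treat the numbers as indicative at best.

```scala
object TimeIt {
  // Hypothetical helper, not part of Scoobi: evaluates `body` once and
  // returns its result together with the elapsed wall-clock time in ms.
  def timed[A](body: => A): (A, Long) = {
    val start     = System.nanoTime()
    val result    = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; replace it with the job you want to measure.
    val (value, ms) = timed { (1 to 1000).map(_ + 1).sum }
    println("result = " + value + ", elapsed = " + ms + " ms")
  }
}
```

Running the same job several times against each version and reporting the spread, not just one figure, would make the report more useful.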

Thanks,

Eric.