Re: [disco-dev] roadmap?

Prashanth Mundkur

Nov 13, 2012, 2:20:23 PM
to disc...@googlegroups.com

Answering your questions in a slightly different order:

On 08:21 Tue 13 Nov, Bart van Deenen wrote:

> Another thing; disco has over a hundred open bugs on github, many of them
> are months (or years) old. Is closing these a manpower issue, or has the
> project spread into many independent clones, where everyone fixes just
> their own bugs?

Yes, this is a manpower issue. On the other hand, our team at Nokia is
hiring. Disco hackers are most welcome! There are certainly clones, but
as far as I can tell, folks frequently contribute fixes and improvements back.

Also, the 0.5 model was actually motivated by looking for ways to fix
some of the existing bugs.

> Looking through the current documentation, I notice both disco.worker and
> disco.worker.classic. I've also seen some mention in this list of the
> future version 0.5 having an incompatible api change. We use disco in our
> company (Spilgames.com) for aggregating events from large amounts of web
> clients, but I find that there is not a lot of momentum in using disco. To
> me that seems to be mostly because of the limited number of tutorials and
> generally not so much overview documentation (you end up at the python
> class documentation pretty quickly).
>
> We will have to develop skills in this company to really use map-reduce
> frameworks anyway, and we're not a Java shop (PHP, Python and Erlang
> mostly), so there is not a lot of enthusiasm for Hadoop. I like the
> compact code-base of disco, and the ease with which you can get something
> going. I also have confidence in its performance.
>
> We are absolutely willing to put quite some effort into writing
> tutorials and other documentation, and we will make those available to the
> project, but I'd like to know a bit about the API roadmap.
> Is the 'classic' pattern on the way to obsolescence? Should we really
> focus on the 'new' mechanism?

More tutorials and documentation would certainly help, especially of
the cookbook kind. Any doc contributions will be very gratefully
merged!

There will be some changes in 0.5, but the goal is to support as much
of the current API as possible. 0.5 allows a more flexible 'pipeline'
approach to computation [1]; this flexibility means that the current
'classic' map-reduce API can essentially be supported. There may be
slightly different ways of doing the same thing, so minor code changes
_might_ be needed, but you will still be able to do map-reduce style
processing. In any case, the goal is definitely to minimize the code
changes required.
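
To illustrate why a pipeline of stages can subsume classic map-reduce, here is a minimal single-process simulation in plain Python. It is a sketch under stated assumptions, not Disco's API: `run_pipeline`, the `(grouping, task_fn)` stage tuples, and the grouping names `split`, `group_label`, and `group_all` are all hypothetical names chosen for this example. A classic job then becomes a two-stage pipeline: a map stage run once per input partition, followed by a reduce stage run once per distinct key.

```python
from collections import defaultdict
from itertools import chain


def run_pipeline(stages, partitions):
    """Simulate a staged pipeline in one process (hypothetical, not Disco's API).

    stages: list of (grouping, task_fn) pairs; each task_fn receives a list of
    (key, value) pairs and yields (key, value) pairs.
    partitions: list of input partitions, each a list of (key, value) pairs.
    """
    for grouping, task_fn in stages:
        if grouping == "split":
            # One task per existing partition, like a classic map over inputs.
            groups = partitions
        elif grouping == "group_label":
            # One task per distinct key, like a classic reduce after a shuffle.
            by_label = defaultdict(list)
            for pair in chain.from_iterable(partitions):
                by_label[pair[0]].append(pair)
            groups = [by_label[label] for label in sorted(by_label)]
        elif grouping == "group_all":
            # A single task sees every pair produced by the previous stage.
            groups = [list(chain.from_iterable(partitions))]
        else:
            raise ValueError(f"unknown grouping: {grouping!r}")
        partitions = [list(task_fn(group)) for group in groups]
    return list(chain.from_iterable(partitions))


def word_map(records):
    # Classic-style map: emit (word, 1) for every word in every line.
    for _, line in records:
        for word in line.split():
            yield word, 1


def word_reduce(pairs):
    # Classic-style reduce: every pair in this group shares the same key.
    yield pairs[0][0], sum(count for _, count in pairs)


def word_total(pairs):
    # A one-task stage that sums all the counts it receives.
    yield "total", sum(count for _, count in pairs)


inputs = [[(None, "a b a")], [(None, "b c")]]

# Classic map-reduce expressed as a two-stage pipeline.
result = run_pipeline([("split", word_map), ("group_label", word_reduce)], inputs)
print(result)  # [('a', 2), ('b', 2), ('c', 1)]

# A third stage extends the same computation beyond a single map and reduce.
stages = [("split", word_map), ("group_label", word_reduce), ("group_all", word_total)]
print(run_pipeline(stages, inputs))  # [('total', 5)]
```

The design point the sketch tries to capture is that "map" and "reduce" are just two choices of how a stage's input is grouped into tasks, so a map-reduce API can be layered on top of a more general stage list.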

The Erlang support for the pipeline model for 0.5 is basically done
[2] and lightly tested, but needs serious pounding in a large cluster.

The Python user library needs work, both to natively exploit the new
model, and to support as much of the current 'classic' API on top of
the new model. But a lot of documentation for the 'classic' API
should still ideally hold for its port to the new model.

In addition, the Web UI obviously also needs work to show appropriate
and useful job information.

This means that 0.5 is still a ways away; in the meantime, the
discoproject github master branch will still point to the stable 0.4
line.

Note that the changes are mainly targeting the compute portion of
Disco; there are no major changes in the roadmap for the DDFS storage
layer. The main things to be done for DDFS are known [3] and well
specified; it is primarily an issue of time and manpower to get them
done.

If you want to play around with the new pipeline model right now, and
don't mind using OCaml, you can already do so [4].

[1] https://github.com/pmundkur/disco/blob/devel/scheduler/master/include/pipeline.hrl
[2] https://github.com/pmundkur/disco/commits/devel/scheduler
[3] https://github.com/discoproject/disco/wiki/DDFS-Evolution
[4] https://github.com/pmundkur/odisco/tree/devel/pipeline

--
prashanth

Bart van Deenen

Nov 14, 2012, 4:07:26 AM
to disc...@googlegroups.com
Thanks for the detailed replies.

I will recommend that we put effort into writing cookbooks and tutorials. We'll focus on the current stable version. We're already using Sphinx, so that's fine. I'll be in touch with you guys on IRC (probably starting next week).

Greetings

Bart van Deenen

P.S. Spilgames is hiring; for instance, we're looking for a Big Data Engineer. You can meet us at Techmesh in London next month.

Ville Tuulos

Nov 15, 2012, 6:20:18 PM
to disc...@googlegroups.com
Hi all,

Just wanted to chime in regarding the roadmap for Disco.

I just had a chat with Prashanth a few days ago related to what
Bitdeli could contribute back to Disco. It would be great to see our
Linux container (LXC) -based workers integrated in the upcoming 0.5.
This would allow much more robust and fine-grained control of CPU /
memory / IO resources in workers.

Ville

On Wed, Nov 14, 2012 at 1:23 AM, Ebot Tabi <ebot...@gmail.com> wrote:
> Hi Bart,
> It's awesome that you guys are looking to contribute to Disco; the how-to
> and cookbook tutorials are a good start. I find Disco interesting to work
> with, but there is fairly limited information on things like integrating
> with external storage systems such as Cassandra. For example, I am working
> on a distributed crawler that pulls data from Twitter and analyzes it (a
> kind of social analytics). I have been looking at both Hadoop and Disco, but
> Disco seems much easier to get going with, and I am also willing to
> contribute more to Disco development. It would be a pleasure to exchange
> some working experience with you.
>
> Greetings

Scott Robertson

Nov 17, 2012, 9:27:51 PM
to disc...@googlegroups.com
> Just wanted to chime in regarding the roadmap for Disco.
>
> I just had a chat with Prashanth a few days ago related to what
> Bitdeli could contribute back to Disco. It would be great to see our
> Linux container (LXC) -based workers integrated in the upcoming 0.5.
> This would allow much more robust and fine-grained control of CPU /
> memory / IO resources in workers.

That's interesting: sandboxing Disco workers with LXC is on my roadmap, and it's something I am looking to start on in the next week or so. Need help?