Migrate build system from rake to pyinvoke


Leo Urbina

Apr 19, 2014, 12:45:06 AM
to edx-...@googlegroups.com
During the PyCon 2014 sprints, Nate Gentile and I took on the task of moving the edx-platform build system away from rake to a Python-based one. When we started, some work had already been done towards this, and the chosen tool for the migration was paver. After working with paver for some time we noticed that it had significant drawbacks, including:

1) No support for namespaces, making grouping of tasks cumbersome
2) Buggy: while using it we came across several trivial bugs, including improper grouping of tasks when displaying help. We submitted pull requests to fix some of these issues and have yet to hear back.
3) It seems like there has been no activity in the paver repo for at least a year.

Instead, we decided to use invoke. It claims to be fabric's successor and takes some design elements from Rake. It provides a very concise and clean API for declaring tasks and their dependencies. Even though it is not perfect, there is active development going on (the maintainer, Jeff Forcier, was at PyCon and is pretty responsive on both GitHub and IRC). Also, the docs are fairly thorough and provide enough depth to understand most of the API (some features are poorly documented, such as per-task flag help, but I'm in the process of creating a pull request to fix that). Overall I'm pretty happy with the tool and have started the process of migrating things over. In what follows I quickly describe the work done so far:

1) Created invoke tasks mirroring those currently available through paver. These tasks are under the tasks module.
2) Added a @deprecated decorator under pavelib/util that, when applied to a paver task, redirects execution to the corresponding invoke task. One caveat: it cannot deal with paver tasks that take positional arguments. In those cases it only prints that the task is deprecated and shows the help for the invoke task. Not sure if this is reasonable.
3) Moved the i18n rake tasks into invoke tasks.
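
To make item 2 concrete, here is a minimal sketch of how such a decorator might behave. It is written in plain Python and does not use the real paver or invoke APIs. The names `deprecated`, `run_invoke`, `show_invoke_help` and the task name `i18n.extract` are hypothetical stand-ins for illustration, not the code in the fork:

```python
import functools


def run_invoke(task_name, **options):
    """Hypothetical stand-in for handing execution over to an invoke task."""
    return "invoke {} {}".format(task_name, sorted(options.items()))


def show_invoke_help(task_name):
    """Hypothetical stand-in for printing the help of an invoke task."""
    return "invoke --help {}".format(task_name)


def deprecated(invoke_task):
    """Mark a legacy task as deprecated in favor of `invoke_task`."""
    def decorator(legacy_task):
        @functools.wraps(legacy_task)
        def wrapper(*args, **options):
            print("'{}' is deprecated; use 'invoke {}' instead.".format(
                legacy_task.__name__, invoke_task))
            if args:
                # Positional arguments cannot be mapped automatically,
                # so only point the user at the new task's help.
                return show_invoke_help(invoke_task)
            # Keyword-style options are forwarded unchanged.
            return run_invoke(invoke_task, **options)
        return wrapper
    return decorator


@deprecated("i18n.extract")
def i18n_extract(*args, **options):
    """Legacy task body; never reached because the wrapper redirects."""
    raise RuntimeError("legacy body should not run")
```

Calling `i18n_extract(verbose=True)` prints the deprecation notice and forwards the option, while `i18n_extract("en")` only prints the notice and falls back to the help path, which is the caveat described above.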

Currently I'm testing the new invoke tasks to make sure that they all behave the same way the paver tasks did. On top of this I have cleaned up the verbosity of the tasks' stdout/stderr output and added colorization via colorama to make the output more readable. The remaining work is to move what is left as rake tasks into invoke tasks.

The intent is to submit a pull request with invoke tasks that mirror all the current rake/paver tasks, and deprecate the latter. When everyone is happy with the state of invoke, we can go ahead and delete the unused rake/paver stuff. All the work is being done in my fork of the edx-platform in the invokelib/develop branch. I want to hear any thoughts/concerns/suggestions you may have about this move.

Best,
-Leo






Yarko Tymciurak

Apr 19, 2014, 1:50:40 AM
to edx-...@googlegroups.com
Perhaps you could move this into the openedx-ops group...

Ned Batchelder

Apr 19, 2014, 6:32:32 AM
to edx-...@googlegroups.com
Yarko, this is a fine topic for here, since it's a significant change to the source tree, and will affect all developers.

--Ned.

Ned Batchelder

Apr 19, 2014, 6:36:10 AM
to edx-...@googlegroups.com
Leo, your dedication to this job is remarkable, thanks so much! It sounds like you have chosen a good strategy. We are currently in a rake/paver world; if you could submit a pull request making us rake/invoke, then we are definitely moving in a positive direction. Then eventually we could get to pure invoke.

Thanks so much for taking this on and sticking with it!

--Ned.

Yarko Tymciurak

Apr 19, 2014, 3:21:34 PM
to edx-...@googlegroups.com


On Friday, April 18, 2014 11:45:06 PM UTC-5, Leo Urbina wrote:
> During the pycon sprints of 2014 Nate Gentile and I took on the task of moving the edx-platform build system away from rake and use python based one instead. When we started there had already been some work done towards this task, and chosen tool for this migration was paver. After working with paver for some time we noticed that it had significant drawbacks, including:
>
> 1) No support for namespaces, making grouping of tasks cumbersome

Is this a problem? (Would we have been better off just staying with rake after all? See http://martinfowler.com/articles/rake.html for some perspective.)

> 2) Buggy: While using it we came across several trivial bugs, including improper grouping of tasks when displaying help. We submitted pull requests to fix some of these issues and are yet to hear back.

I would expect that, as with edx, you would want to pull a decision maker into the PR when you are ready to ask for their attention. In https://github.com/paver/paver/pull/123 you call out PR #100, where a repo owner commented - try calling him out.

Side note: your PR change seems dense, and it is not immediately obvious what its purpose is or what it's doing (obfuscated) - you might want to comment and motivate it, or future maintenance will be hard.

> 3) It seems like there has been no activity in the paver repo since at least one year.

In fact, when the activity is limited to removing support for Python 2.5, an api and a few doc tweaks, I see this as active _and_ stable.

> Instead, we decided to use invoke. It claims to be fabric's successor, and takes some design elements from Rake. It provides a very concise and clean API to declare tasks and its dependencies. Even though it is not perfect, there is active development going on (the maintainer, Jeff Forcier was at the pycon and is pretty responsive both in github and irc). Also, the docs are fairly thorough and provide enough depth to understand most of the API (there are some poorly documented features, such as per-task flags help, however I'm in the process of creating a pull request to fix that). Nonetheless, I'm overall pretty happy with the tool and have started the process of migrating things over. On what follows I quickly describe the work done so far:

So - if we're opening this up even as we haven't really settled into rake => paver transition, what about a discussion (or +/- spreadsheet, or...):

Perhaps the transparent discussion in rake=>paver is what was missing (or a cataloging of motivations / reasons / +'s & -'s)...

Best regards,
- Yarko

Yarko Tymciurak

Apr 19, 2014, 3:45:30 PM
to edx-...@googlegroups.com


BTW - I have no "dog" in this fight:   as long as we pick a tool that is solid enough that we don't churn the underlying devops support environment in (say) something like 2 years, that's pretty good.

What I don't want to see is an incessant round-robin among this (or a growing) list, without ever fully settling on one.

rake was just fine with me, and I'm not sure the effort (to date) to get off of it shows a comparative payoff.

Bertrand Marron

Apr 19, 2014, 6:35:13 PM
to edx-...@googlegroups.com
On Sat, Apr 19, 2014 at 9:45 PM, Yarko Tymciurak <yar...@gmail.com> wrote:
>
> What I don't want to see is an incessant round-robin among this (or a
> growing) list, without ever fully settling on one.
>
> rake was just fine with me, and I'm not sure the effort (to date) to get off
> of it shows a comparative payoff.
>

While I agree with you, rake has its issues too.

It completely depends on the `aws` environment, and on the fact that
it uses JSON files for extra configuration.
I’d be happier if there was a tool that could load Django settings
instead of doing this :

ENV_FILE = File.join(ENV_ROOT, CONFIG_PREFIX + "env.json")

--
Bertrand Marron

Yarko Tymciurak

Apr 19, 2014, 8:17:11 PM
to edx-...@googlegroups.com
Wait - was this _just_ rake that causes this *.env.json use?
Just because this is set in aws.py?
It's not that rake depends on it (is it?), I think - since by that definition,  "devstack.py" also "depends" on aws, in that it builds on it.
This is just the gathering point (it seems) for ansible-playbooks to drop their settings into one place for _various_ servers, commands, etc. to gather as they need, when they need.

To have it otherwise, I think, would make the ansible settings of these parameters too coupled - needing to know too much about "where is that django settings file that needs this; oh! and that coffeescript too; and....." - eeek!

Rather, ansible collects it all into a "standard" format, and it's up to the command(s) to check / grab settings they need.

This is what happens in django settings (and aws.py just happens to be a convenient place to catch it, for all the stack / envs you might use, aws is the "bottom-most").
This is what happened in rake (not because of rake - but because it was convenient);
This is what happens in paver now (see edx-platform/pavelib/utils/envs.py);

This is not about rake.
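
For reference, the lookup done by the Ruby ENV_FILE line quoted earlier in the thread can be sketched in Python, which is roughly what any Python-based task runner would need to do as well. This is an illustration only: `load_env_config`, its parameter names, and the sample setting are hypothetical, and the actual logic in pavelib/utils/envs.py may well differ.

```python
import json
import tempfile
from pathlib import Path


def load_env_config(env_root, config_prefix=""):
    """Read <env_root>/<config_prefix>env.json, or return {} if it is absent."""
    env_file = Path(env_root) / (config_prefix + "env.json")
    if not env_file.is_file():
        return {}
    with env_file.open() as handle:
        return json.load(handle)


# Demonstration against a throwaway directory with made-up settings.
with tempfile.TemporaryDirectory() as env_root:
    sample = Path(env_root) / "lms.env.json"
    sample.write_text(json.dumps({"PLATFORM_NAME": "demo"}))
    lms_config = load_env_config(env_root, "lms.")
    missing_config = load_env_config(env_root, "cms.")
```

The point of the sketch is that the task tool only reads a file in an agreed format. It does not need to know which playbook wrote it, so the same approach works under rake, paver or invoke.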



--
Bertrand Marron

Ned Batchelder

Apr 20, 2014, 7:21:12 AM
to edx-...@googlegroups.com
Let's keep the issues straight. Rake is a general-purpose tool for scripting tasks. The way we've used Rake is very Amazon-specific. The two halves to this discussion are: 1) what should dev tasks do, and 2) what tool should we use to implement them.

--Ned.

Leo Urbina

Apr 20, 2014, 6:06:00 PM
to edx-...@googlegroups.com
Yarko, Ned, et al,

Thanks for all the feedback. During the sprints the original intent was to move everything from rake to paver. After dealing with paver for a while it became clear that it was very limited in comparison with rake. Furthermore, it was buggy in some very obvious ways (I currently have a pull request to fix one of those bugs here, and I have yet to hear any response). Finally, it seemed like the project hadn't had any activity for the last year.

Yarko, when we started to entertain the idea of switching to invoke I had some reluctance as well. This being my first attempt at contributing to Open edX, I was not part of the discussion that went into choosing paver as a suitable replacement for rake. This was something that Nate and I discussed openly during the sprints, and it is perhaps on us for not asking the people on IRC, and you, as it seems you are somewhat opinionated on this issue, for advice.

That aside, the bottom line is that we found invoke to be much better suited for the job: it is actively maintained and takes a lot of its design elements from rake, making the migration more transparent. It is far from perfect and has its own quirks. I did find a couple of small annoyances, such as the inability to call tasks programmatically from within other tasks, but after filing a bug on GitHub I promptly received a reply, and we are currently trying to sort it out.

Finally, I just want to say that this is not only my first attempt at contributing to Open edX, but also to open source as a whole. I think this is a very exciting opportunity, and I look forward to any pointers, advice and comments to help me navigate the ecosystem and focus my efforts in the most efficient way possible. Thanks,

-Leo

Yarko Tymciurak

Apr 20, 2014, 7:02:34 PM
to edx-...@googlegroups.com


On Apr 20, 2014 5:06 PM, "Leo Urbina" <leo.a....@gmail.com> wrote:
>
> Yarko, Ned, et al,
>
> Thanks for all the feedback. During the sprints the original intent was move everything from rake to paver. After dealing with paver for a while it became clear that it was very limited in comparison with rake.

It would seem...

> Furthermore, it was buggy in some very obvious ways (currently I have a pull request to fix one of those bugs here, and I'm yet to hear any responses). Finally, it seemed like the project hasn't had any activity for the last year. 

As I showed in their github repo, there has clearly been activity just within the past 3-4 months (I wish you'd stop saying it hasn't had activity in the past year, even more so after I post a link to show recent commit logs!).

To get a response, ask a maintainer to review (as you would in edx repos).  For example, try adding a comment on your PR, e.g.:

"@Almad, would you please review / comment?"

>
> Yarko, when we started to entertain the idea of switching to invoke I had some reluctance as well. Being this my first attempt at contributing to Open edX, I was not part of the discussion that went into choosing paver as a suitable replacement for rake.

Indeed, I don't recall seeing a discussion on Open edX (what I am opinionated on, you could say - I want to see this happening more).

> This was something that Nate and I discussed openly during the sprints, and it is perhaps on us for not asking the people on IRC, and you as it seems you are somewhat opinionated on this issue, for advice.

The advice I would have given is likely the same I gave privately - that is, to start a transparent discussion which would invite and be open to participation & feedback.

Which you are doing now - thank you !

>
> That aside, the bottom line is that we found invoke to be much better suited for the job, it is actively maintained, and takes a lot of its design elements from rake, making the migration more transparent.

Specifics on how you came to this would help, for the sake of discussion & transparency.

> It is far from perfect, and it has its own quirks (I did find a couple of small annoyances, such as the inability to call tasks programmatically from within other tasks, alas, after creating a bug on github I promptly received a reply, and we are currently trying to sort it out).

>
> Finally, I just want to say that not only this is my first attempt at contributing to Open edX, but also open source as a whole. I think this is a very exciting opportunity, and I look forward to any pointers, advice and comments to help me navigate the ecosystem, and to focus my efforts in the most efficient possible way.

Thanks for your efforts and initiative. I hope you'll have lots of fun contributing to open source!

> Thanks,
>
> -Leo

Kind regards,
Yarko

Jay Zoldak

Apr 22, 2014, 3:11:40 PM
to edx-code
Not sure it matters, but to fill in the gap - a number of devs wanted to move _from_ rake because ruby is a misfit in the edx-platform code base and was causing pain because only a small subset of people felt comfortable working in ruby. The move _to_ paver was initiated by a team of edX devs in an internal hackathon after a quick evaluation of existing python tools at that time.

Leo - are you working on a fork? Is there a lot more to do? Part of the holdup with getting the paver code merged in originally was that we needed to also make sure that the converted tasks worked equivalently on our internal jenkins ci server. I can help with any changes that need to be co-ordinated there.

-- JZ

Leo Urbina

Apr 22, 2014, 4:19:49 PM
to edx-...@googlegroups.com
Hi Jay,

Yes, I'm currently working on a fork: leourbina/edx-platform. I have thus far migrated all the tasks that were based on paver to invoke, as well as the i18n tasks and a couple of others. Any help getting things working with your Jenkins server would be appreciated. If you need access to the repo, give me your GitHub account and I'll add you as a contributor. I have not devoted much more time to development over the last couple of days, given that I wasn't sure how set edx is on using paver.

I admit that I overlooked the activity on paver (not sure where I saw that it was inactive, but I apologize for the misinformation), and the maintainers have since reached out to me and merged my fixes. My take, from limited exposure to both systems, is that I prefer invoke, but I'm not married to it. If edx is set on migrating to paver, I will gladly help migrate everything else. At this point all the tasks that are in paver are also available in invoke, plus there are some extra things that Nate migrated from rake to invoke. Does anyone have any strong preferences?

-Leo

Calen Pennington

Apr 22, 2014, 4:31:07 PM
to edx-...@googlegroups.com
When I compared the two back around the time of ICFP last year, I preferred pyinvoke as well.

-Cale

Jay Zoldak

Apr 22, 2014, 4:38:36 PM
to edx-code
Leo --

Cool. If you submit a PR from the branch on your fork into master of edx/edx-platform, I can look at it that way and figure out next steps regarding jenkins.

Speaking for myself, if all else is roughly equal my preference is whichever is going to get us converted over to a single python-based tool quickly and in a reliable and maintainable way. If pyinvoke is now ahead of paver that is fine with me.

-- JZ

Yarko Tymciurak

Apr 22, 2014, 7:43:12 PM
to edx-...@googlegroups.com

Leo -

Given this support, I suggest getting a PR in sooner - it doesn't have to be ready to be seen and commented on (and helped with).

As an external contributor, I concur with Jay Zoldak - whatever gets us there and is stable (I'll be happy to work on the two files I've now PRd for both rake & paver).

Thanks again,
Yarko

Ned Batchelder

Apr 22, 2014, 9:53:18 PM
to edx-...@googlegroups.com
Leo, to echo some sentiments here: if you've already converted all of paver, then your two-tool solution (rake and invoke) can be reviewed and merged now if it works; it's an improvement over the two-tool solution on master now (rake and paver).

--Ned.

Leo Urbina

Apr 23, 2014, 7:53:34 AM
to edx-...@googlegroups.com
Sounds good. I'll file a pull request. Best,

-Leo

David Glance

Apr 23, 2014, 8:23:42 AM
to edx-...@googlegroups.com
As the person who did the initial work to port Rake to Paver, can I say that it was a long road from submitting the initial PR to getting the first version of it accepted and pulled into master.

To be absolutely clear, I did the entire port - deprecating all of the Rake tasks and changing the documentation, updating tests etc. Only part of what I did has so far been pulled in after efforts from a number of people on the edx team. And this only happened after someone internally was put on the job.

Given the dependence on Rake for the edx.org production system and the other priorities of the team, in hindsight this is understandable.

So - I can understand the enthusiasm to get the job completed and I can understand the desire to pick yet another technology to do this - could I suggest however that the edx team could possibly finish the job I started and the community concentrate on something more worthwhile - like the comment service (still ruby on rails) - or almost anything else?

I speak only as someone who has been there and done that.

Regards

David

Paul-Olivier Dehaye

Apr 23, 2014, 5:47:00 PM
to edx-...@googlegroups.com
Meta-comment, that could sound snarky but this is meant to really be constructive:

You can suggest, but we have no way to constructively concentrate suggestions, organise, and prioritise as a community outside of the edx consortium.  I wish we had.

I am speculating the consortium would be afraid that initiating this would be misunderstood as a roadmap set by the consortium on their own work (is this speculation correct?). 

Do people think something like this would be useful? If so, I can start a google moderator site and seed it with obvious suggestions (theming, we have all been there done that!) so we can vote up/down and constructively discuss solutions. Google moderator is the easiest way I see, but I am open to alternatives. One would be "fake" issues on github but that does not sound right to me.

Paul

Xavier Antoviaque

Apr 23, 2014, 6:06:07 PM
to edx-...@googlegroups.com
Sef had started one - could be worth updating / revoting, it was some time ago:
 
 
--
Xavier.
 

Paul-Olivier Dehaye

Apr 23, 2014, 6:09:47 PM
to edx-code
Yes, that's why I suggested Google Moderator. A lot has changed since, both in terms of code and community. I think a culling of this to re-seed a forum would be good, and then we can have a round of voting again.
If there is no objection to this plan within 24 hours, I will start that and then repost a link.
Paul

Paul-Olivier Dehaye
skype: lokami_lokami (preferred)

Ned Batchelder

Apr 24, 2014, 5:42:42 PM
to edx-...@googlegroups.com
Everyone, this has been a difficult process.  We aren't picking new technologies just for the sake of it.  Two independent evaluators felt that invoke was a better foundation.  We had originally started on a Paver migration at a hackathon nearly a year ago, when invoke was not yet a viable option.  Since then, it has become a clearer choice.

I'm sorry that we had Paver partially started, enticing others to do work on it.  I would like to get a transition completed so that we can move on to other things.

Thanks to everyone for their efforts. Even if Paver is not in use eventually, the effort has not been wasted: it has kept the issue at the forefront of our minds, and enabled us to pull in new contributors to help.

--Ned.

David Glance

Apr 25, 2014, 12:41:57 AM
to edx-...@googlegroups.com
Somewhat ironically, Leo's (@leourbina) PR for Paver was merged 3 days ago 

Yarko Tymciurak

Apr 25, 2014, 11:01:25 AM
to edx-...@googlegroups.com

I don't think that's ironic - just contributing to keeping the system improving (OK to do that on two fronts).

I think David B. is making a good move with the doc update ( https://github.com/edx/edx-platform/pull/3434 ) - although with unreviewed (but highly used) wikis, it might be (?) good to have a wiki for a doc via link in a case like this.

As for confusions etc., transparency & communication will go a long way. When there is too much of everything, a structure will evolve as needed.
