Re: Initial Thoughts

Jordan Boyd-Graber

Jul 17, 2017, 2:09:17 PM

to Shravan Sanjiv, feet-t...@googlegroups.com, hc...@googlegroups.com

On Wed, Jul 12, 2017 at 3:10 PM, Shravan Sanjiv <sanjiv...@gmail.com> wrote:

I read a little about Luigi in its documentation, and it makes a lot more sense now. Here is some feedback on the Qanta documentation, as well as a question:

1) I know that Qanta is meant to be run on AWS, but a quick start guide for non-AWS users would be helpful as well; this could be added later. Most of the information is already in the current readme, but some of it is easy to skip over, such as the crucial “Quanta on Path” section.

100% agree. I'm going through the process of running everything locally non-AWS and trying to update things as I go.

2) The “running on batch mode” instructions should be updated. In step 2, the “CreateGuesses” task is missing from the pipeline itself, and in step 3, the “AllSummaries” task needs updating according to a comment in the file itself. I’m not sure whether the “AllSummaries” task works as expected or whether that comment should be deleted.

Thanks! Feel free to submit pull requests that fix specific issues you find.

3) This is pretty minor, but it might be helpful to mention adding “--local-scheduler” to the Luigi targets for those running Qanta without Spark.

While I think it would be useful to give this as a hint, Spark speeds things up enough on multicore machines that it should be part of the "standard" use case we suggest.

(For the simplified system we hope to put out in the near future, we'd probably want to ditch both Spark and Luigi.)

4) Could I get some documentation for the Luigi parameters in the TrainGuesser and GenerateGuesses tasks? It seems that GenerateGuesses specifies TrainGuesser as a dependency, so is there any way to save the model and reuse it for GenerateGuesses?