[Scalding] Trying Execution, getting too much log output


Vesa Muhonen

2 Mar 2015, 09:48:22
to cascadi...@googlegroups.com
Hi

I was just trying to get Execution working with a really simple job (WordCount, of course). It runs, but every log line appears twice. Is that a bug, or am I just doing something wrong?

(scalding 0.13.1 and Scala 2.11.5)

Here's the code:

import com.twitter.scalding._

class WordCount {

  def theJob: Execution[Unit] =
    TypedPipe.from(TextLine("tmp/alice.txt"))
      .flatMap { _.split("\\s+") }
      .map { word => (word, 1) }
      .sumByKey
      .writeExecution(TypedTsv("tmp/wc_out.text"))

}

And this is the code that uses it:

object WordCountJob extends ExecutionApp {

  val jobClass = new WordCount
  val job = jobClass.theJob

  val u: Unit = job.waitFor(Config.default, Local(strictSources = true))

}

And this is the output:

15/03/02 15:44:57 INFO property.AppProps: using app.id: 3CBCB5082FF94B54B7C6EBDD97E9893B
15/03/02 15:44:57 INFO util.Version: Concurrent, Inc - Cascading 2.6.1
15/03/02 15:44:57 INFO flow.Flow: [] starting
15/03/02 15:44:57 INFO flow.Flow: []  source: FileTap["TextLine[['offset', 'line']->[ALL]]"]["tmp/alice.txt"]
15/03/02 15:44:57 INFO flow.Flow: []  sink: FileTap["TextDelimited[[0:1]]"]["tmp/wc_out.text"]
15/03/02 15:44:57 INFO flow.Flow: []  parallel execution is enabled: true
15/03/02 15:44:57 INFO flow.Flow: []  starting jobs: 1
15/03/02 15:44:57 INFO flow.Flow: []  allocating threads: 1
15/03/02 15:44:57 INFO flow.FlowStep: [] starting step: local
15/03/02 15:44:58 INFO flow.Flow: [] starting
15/03/02 15:44:58 INFO flow.Flow: []  source: FileTap["TextLine[['offset', 'line']->[ALL]]"]["tmp/alice.txt"]
15/03/02 15:44:58 INFO flow.Flow: []  sink: FileTap["TextDelimited[[0:1]]"]["tmp/wc_out.text"]
15/03/02 15:44:58 INFO flow.Flow: []  parallel execution is enabled: true
15/03/02 15:44:58 INFO flow.Flow: []  starting jobs: 1
15/03/02 15:44:58 INFO flow.Flow: []  allocating threads: 1
15/03/02 15:44:58 INFO flow.FlowStep: [] starting step: local

Oscar Boykin

3 Mar 2015, 20:49:49
to cascadi...@googlegroups.com
Yeah, I agree the default Cascading logging is quite verbose. Often that is pretty nice; sometimes it is annoying.

I'm no log4j/slf4j master, but these can be configured to quiet the logs (I'm sure Stack Overflow has some help here).
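A minimal sketch of quieting the Cascading INFO lines from code. It assumes log4j 1.x (`org.apache.log4j`) is the logging backend on the classpath; the `yy/MM/dd HH:mm:ss INFO` format in the output above looks like Hadoop's default log4j setup, but that is an inference, not something stated in the thread. `QuietCascading` is a hypothetical helper name.

```scala
import org.apache.log4j.{Level, Logger}

// Hypothetical helper; assumes log4j 1.x is the logging backend.
object QuietCascading {
  // Loggers such as cascading.flow.Flow and cascading.property.AppProps are
  // children of the "cascading" logger, so raising its level to WARN drops
  // their INFO lines while still letting warnings and errors through.
  def apply(): Unit =
    Logger.getLogger("cascading").setLevel(Level.WARN)
}
```

Call `QuietCascading()` early, before the Execution runs. With log4j 1.x the same effect can usually be had without code via a `log4j.logger.cascading=WARN` line in `log4j.properties`.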

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/67953143-e0aa-4842-a7e0-9115529cd4f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Oscar Boykin :: @posco :: http://twitter.com/posco

Vesa Muhonen

5 Mar 2015, 08:28:55
to cascadi...@googlegroups.com
They are indeed on the loquacious side of things, but it's not that bad. I was more concerned that I got the exact same log twice, and I was wondering whether I had made some mistake in the code that actually makes it run twice for some reason...

Cheers,
Vesa



Chris K Wensel

5 Mar 2015, 11:27:25
to cascadi...@googlegroups.com
Providing a flow name will help distinguish units of work: [] would become [your flow].



Chris K Wensel




Ian O'Connell

5 Mar 2015, 11:38:31
to cascadi...@googlegroups.com
The logs are telling the truth here: you are indeed running it twice.



When you override the job method of ExecutionApp, it will be run. It looks like you're also running it in the constructor, which was more of a pattern for the older Job class. I'd also recommend adding override to those members to make it clear you're supplying something from a superclass.
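The control flow Ian describes can be reproduced without Scalding at all. The sketch below uses hypothetical stand-ins (`ToyExecution`, `ToyExecutionApp`, `RunsTwice`, `RunsOnce`) that mimic only the relevant behaviour: the app's `main` runs `job`, so an extra `waitFor` in the object body runs the same work a second time.

```scala
// Hypothetical stand-ins that mimic only the control flow of Scalding's
// Execution and ExecutionApp; they are not the real API.
final class ToyExecution(work: () => Unit) {
  // The real waitFor takes a Config and a Mode; both are omitted here.
  def waitFor(): Unit = work()
}

trait ToyExecutionApp {
  def job: ToyExecution
  // As Ian describes, the app's main runs `job` for you.
  def main(args: Array[String]): Unit = job.waitFor()
}

// Pattern from the original post: the object body calls waitFor, so the work
// runs once during initialisation and again when main runs `job`.
object RunsTwice extends ToyExecutionApp {
  var runs = 0
  val job = new ToyExecution(() => runs += 1)
  job.waitFor()
}

// Suggested pattern: only supply `job` (marked override) and let main run it.
object RunsOnce extends ToyExecutionApp {
  var runs = 0
  override val job = new ToyExecution(() => runs += 1)
}
```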

Oscar Boykin

5 Mar 2015, 12:59:45
to cascadi...@googlegroups.com
Good catch, Ian. I didn't even notice the issue.

To be concrete, the simplest way to go would be:
object WordCount extends ExecutionApp {

  override def job =
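Oscar's snippet is cut off in the archive. One way the single-object version might look, as a sketch that assumes Scalding 0.13's `ExecutionApp` exposes an abstract `def job: Execution[Unit]` and reuses the pipeline and paths from the original post (the explicit `.toTypedPipe` and `TypedTsv[(String, Int)]` type are small additions for clarity):

```scala
import com.twitter.scalding._

// Sketch: the pipeline from the original post, supplied directly as `job`.
// ExecutionApp.main works out the Config and Mode itself and runs this
// Execution once, so there is no waitFor call here.
object WordCount extends ExecutionApp {
  override def job: Execution[Unit] =
    TypedPipe.from(TextLine("tmp/alice.txt"))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .sumByKey
      .toTypedPipe
      .writeExecution(TypedTsv[(String, Int)]("tmp/wc_out.text"))
}
```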


Vesa Muhonen

6 Mar 2015, 05:19:40
to cascadi...@googlegroups.com
Thanks for the help. What I was aiming for was to have the job defined in one class that I could then easily call from another piece of code...

Oscar Boykin

6 Mar 2015, 18:30:18
to cascadi...@googlegroups.com
You can call WordCount.job from another job, or you can keep it as it was; just don't call waitFor.

(You should never need waitFor when using ExecutionApp).
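For Vesa's goal of defining the job in one class and calling it from other code, a minimal sketch along the lines Oscar suggests: the class only builds the `Execution`, each app only overrides `job`, and nothing calls `waitFor`. The constructor parameters, the `TwoCounts` object and the `tmp/a.txt`/`tmp/b.txt` paths are hypothetical additions for illustration, and the sketch assumes Scalding 0.13's `Execution.flatMap` for sequencing.

```scala
import com.twitter.scalding._

// Reusable job definition. The input/output parameters are a hypothetical
// generalisation of the hard-coded paths in the original post.
class WordCount(input: String, output: String) {
  // Building an Execution only describes the work; nothing runs yet.
  def theJob: Execution[Unit] =
    TypedPipe.from(TextLine(input))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .sumByKey
      .toTypedPipe
      .writeExecution(TypedTsv[(String, Int)](output))
}

// Entry point: supply `job` and let ExecutionApp.main run it exactly once.
object WordCountJob extends ExecutionApp {
  override def job: Execution[Unit] =
    new WordCount("tmp/alice.txt", "tmp/wc_out.text").theJob
}

// Hypothetical reuse from other code: chain two word counts so the second
// starts after the first finishes, still without any waitFor.
object TwoCounts extends ExecutionApp {
  override def job: Execution[Unit] =
    new WordCount("tmp/a.txt", "tmp/a_out.tsv").theJob
      .flatMap(_ => new WordCount("tmp/b.txt", "tmp/b_out.tsv").theJob)
}
```

Because `theJob` returns a plain `Execution`, other code can combine it with further Executions; only the outermost `ExecutionApp` actually runs it.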

