val vs lazy val vs def in Scalding


karthik

Apr 15, 2014, 6:11:42 PM
to cascadi...@googlegroups.com
Hi,

I have a couple of really basic questions, as I am new to Scalding.
1. While defining a series of steps in a Scalding script, if I need to reuse a previous pipe, what should that pipe be stored as: a lazy val or a val?

2. Also, where can I find the job flow DAG in Scalding? In Pig, I used to be able to see the logical plan. However, if I define different pipes as vals, they seem to be executed over and over again.

Any help would be appreciated. 
Cheers!

m.orazow

Apr 24, 2014, 9:49:24 AM
to cascadi...@googlegroups.com
Hello,



> 1. While defining a series of steps in a Scalding script, if I need to reuse a previous pipe, what should that pipe be stored as: a lazy val or a val?
Correct me if I am wrong, but by reusing do you mean something like joining the pipe with other pipes? If so, take a look at .forceToDisk, which keeps the pipe from being recomputed.
For instance,
val pipe1 = input1.read /* do some more work */ .write(Tsv(args("output-pipe1"))).forceToDisk
val pipe2 = input2.read /* do some more work */ .write(Tsv(args("output-pipe2")))
val res = pipe1.joinWithSmaller('field -> 'field, pipe2).write(Tsv(args("joined")))


But usually defining a job pipe with val should suffice.


> 2. Also, where can I find the job flow DAG in Scalding? In Pig, I used to be able to see the logical plan. However, if I define different pipes as vals, they seem to be executed over and over again.
Running a Scalding job with the "--tool.graph" option creates a dot file in which you can see the structure of your flow.

-cheers

Marius Soutier

Apr 25, 2014, 6:48:53 AM
to cascadi...@googlegroups.com
When I tried that with my jobs (where I reuse a lot of pipes), it actually slowed down everything considerably.

cvk

Apr 29, 2014, 3:27:31 PM
to cascadi...@googlegroups.com
@m.orazow: Thanks! I did not know about the tool graph option. 

@Marius: Yes, I observe that too. Try using lazy val in that scenario.

Marius Soutier

Apr 30, 2014, 4:33:34 AM
to cascadi...@googlegroups.com
That doesn’t change anything, makes my jobs slow as well.

Oscar Boykin

Apr 30, 2014, 12:01:07 PM
to cascadi...@googlegroups.com
On Tue, Apr 15, 2014 at 3:11 PM, karthik <cvkrish...@gmail.com> wrote:
> Hi,
>
> I have a couple of really basic questions, as I am new to Scalding.
> 1. While defining a series of steps in a Scalding script, if I need to reuse a previous pipe, what should that pipe be stored as: a lazy val or a val?

val is fine here. lazy is not useful unless you might not actually access the result.

Also, if you make everything lazy, then there will be no resulting job. Something must be strict to trigger the definition of the FlowDef.
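
For readers who want to see the difference concretely, here is a minimal plain-Scala sketch of how val, lazy val, and def are evaluated. It does not use Scalding at all: the EvalCounter class, its counters, and the integer bodies are made up for illustration. In a real job the bodies would build pipes, but the evaluation rules for the three keywords are the same Scala rules.

```scala
// Hypothetical demo class (not part of Scalding): each counter records
// how many times the corresponding body has been evaluated.
class EvalCounter {
  var valRuns = 0
  var lazyRuns = 0
  var defRuns = 0

  val strict: Int = { valRuns += 1; 1 }          // evaluated once, when the instance is constructed
  lazy val deferred: Int = { lazyRuns += 1; 2 }  // evaluated once, on first access only
  def recomputed: Int = { defRuns += 1; 3 }      // evaluated again on every call
}

object ValLazyDefDemo {
  def main(args: Array[String]): Unit = {
    val c = new EvalCounter
    // Nothing has been accessed yet: only the strict val has run.
    println(s"before access: val=${c.valRuns} lazy=${c.lazyRuns} def=${c.defRuns}")
    // prints "before access: val=1 lazy=0 def=0"

    // Access each member twice.
    val total = c.strict + c.strict + c.deferred + c.deferred + c.recomputed + c.recomputed
    println(s"after two accesses each: val=${c.valRuns} lazy=${c.lazyRuns} def=${c.defRuns} sum=$total")
    // prints "after two accesses each: val=1 lazy=1 def=2 sum=12"
  }
}
```

The lazy counter stays at zero until something touches the value, which is the plain-Scala reason a script made up only of lazy definitions never gets around to defining anything.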


--
Oscar Boykin :: @posco :: http://twitter.com/posco

cvk

Sep 22, 2014, 9:48:05 PM
to cascadi...@googlegroups.com
> val is fine here. lazy is not useful unless you might not actually access the result.
> Also, if you make everything lazy, then there will be no resulting job. Something must be strict to trigger the definition of the FlowDef.

Cool. Thanks, Oscar. Does using a lazy val cache anything on the heap? What is the impact on heap space?