Turn on cli debugging

Andrew Stevenson

Oct 6, 2015, 2:43:42 PM

to cdk...@cloudera.org

Guys,

How do I turn on more logging at the CLI? I've added the debug flag, but I only get the environment variables and classpath, plus the Parquet writer output.

The reason I ask is that it takes a long time to spin up the Crunch job, even in local mode.

Regards

Andrew

Ryan Blue

Oct 6, 2015, 2:49:39 PM

to Andrew Stevenson, cdk...@cloudera.org

Add -v before the command:

kite-dataset -v copy d1 d2

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.

Andrew Stevenson

Oct 6, 2015, 3:23:25 PM

to Ryan Blue, cdk...@cloudera.org

I've done that but get nothing extra.

Regards

Andrew


Ryan Blue

Oct 6, 2015, 3:53:47 PM

to Andrew Stevenson, cdk...@cloudera.org

In that case, you would probably need to change the logging settings,
which are controlled by a log4j properties file in the command's
embedded jar. There might also be a way to control this on the command
line using the flags= variable that passes arguments to the hadoop command.
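
As a rough illustration of the kind of log4j properties file mentioned above, here is a minimal sketch assuming log4j 1.x syntax. It is not the file shipped inside the Kite jar; the appender name "console" and the choice of loggers (org.kitesdk, org.apache.crunch) are assumptions made for illustration.

```properties
# Sketch only: assumes log4j 1.x; not the actual file embedded in the Kite jar.
# Keep everything at INFO by default and send output to a console appender.
log4j.rootLogger=INFO, console

# Console appender writing to stderr (the appender name "console" is arbitrary).
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{2}: %m%n

# Raise verbosity only for the packages of interest (assumed package names).
log4j.logger.org.kitesdk=DEBUG
log4j.logger.org.apache.crunch=DEBUG
```

How to make the CLI pick up an alternate file is not confirmed in this thread. With log4j 1.x the usual mechanism is the log4j.configuration system property, but whether the kite-dataset script passes it through would need to be verified.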

The Crunch spin-up time just got a lot better when I fixed the classpath
issues in KITE-1083. When using CDH installed by parcels, Kite was
loading way too many jars into the classpath and the search time was
really ugly. Startup for copy jobs went down dramatically after that
patch. Maybe you could try building the current master and seeing if
that fixes it?

rb

Andrew Stevenson

Oct 6, 2015, 4:13:25 PM

to Ryan Blue, cdk...@cloudera.org

I'll try that. I did see that the Hadoop classpath was rather large.

Regards

Andrew


Andrew Stevenson

Oct 7, 2015, 3:16:26 AM

to Ryan Blue, cdk...@cloudera.org

Hi Ryan,

I was wondering if the contents of hbase/lib are also added unnecessarily?

Regards

Andrew


Ryan Blue

Oct 7, 2015, 12:13:00 PM

to Andrew Stevenson, cdk...@cloudera.org

Only if you're using a dataset in HBase. Maybe we could do the same sort
of logic we use for Hive when submitting jobs. We currently check
whether the dataset is Hive and will run via MR and then add
dependencies for it. We could do the same for HBase, though the initial
classpath needs to include everything to know where to pull the jars from.

rb

