cascading loganalysis example


Robert Henry

Nov 9, 2009, 7:49:04 PM
to cascadi...@googlegroups.com
The cascading loganalysis example seems to produce data sorted on the
timestamp, at least for small input files and a single shard of
output. There seems to be some implicit typing going on under the
covers so that the timestamps are treated as longs and sorted as such;
where does this coercion happen?

Is there a way to disable the sorting by timestamp, so that the
loganalysis benchmark is unconstrained as to the output order, and
thus presumably make it run faster? This is to say, treat the log
data as a set of events, rather than a sequence of events.

Robert Henry

Chris K Wensel

Nov 10, 2009, 12:07:34 PM
to cascadi...@googlegroups.com

In MapReduce, sorting happens in order to support grouping on key values.

So the results are sorted on the fields that are grouped upon.

In this example, we are grouping on timestamps (minute and second
intervals) in order to get the metrics for each.
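To illustrate why grouping implies ordering, here is a minimal plain-Java sketch of sort-based grouping. It does not use Cascading or Hadoop; the class name SortedGroupingSketch and the method countPerKey are hypothetical names made up for this example. The keys are sorted first so that equal keys sit next to each other, and the per-key counts then come out in key order as a side effect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Cascading or Hadoop code.
public class SortedGroupingSketch {

    // Counts events per key by sorting first, then walking the sorted keys.
    // Equal keys become adjacent after the sort, which is what makes
    // grouping possible in a single pass.
    static Map<Long, Integer> countPerKey(List<Long> keys) {
        List<Long> sorted = new ArrayList<>(keys);
        Collections.sort(sorted); // stands in for the shuffle-phase sort

        // LinkedHashMap keeps insertion order, so the result is in key order.
        Map<Long, Integer> counts = new LinkedHashMap<>();
        for (Long key : sorted) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Minute-bucket timestamps (milliseconds) arriving in arbitrary order.
        List<Long> minutes = Arrays.asList(120000L, 60000L, 120000L, 0L, 60000L, 120000L);
        System.out.println(countPerKey(minutes)); // prints {0=1, 60000=2, 120000=3}
    }
}
```

The input order is arbitrary, yet the output is ordered by key, because the ordering is a by-product of how the groups are formed rather than a separate step.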

cheers,
chris
> --~--~---------~--~----~------------~-------~--~----~
> You received this message because you are subscribed to the Google
> Groups "cascading-user" group.
> To post to this group, send email to cascadi...@googlegroups.com
> To unsubscribe from this group, send email to cascading-use...@googlegroups.com
> For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en
> -~----------~----~----~----~------~----~------~--~---
>

--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com

Robert Henry

Nov 10, 2009, 12:48:07 PM
to cascadi...@googlegroups.com
How is the type of the key values determined? The key values must be
Longs or Dates, somehow, to be sorted correctly. Which of the
pipeline builders knows that it will be dealing with Longs or Dates?
Is there some magic involved with the use of the DateParser object, or
the field named "ts"?

Thanks.

Chris K Wensel

Nov 10, 2009, 12:58:14 PM
to cascadi...@googlegroups.com
The operations that create ts and tm are responsible.

new DateParser( new Fields( "ts" ), "dd/MMM/yyyy:HH:mm:ss Z" );

new ExpressionFunction( new Fields( "tm" ), "ts - (ts % (60 * 1000))", long.class );
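
As a rough picture of what those two operations compute, here is a plain-Java sketch that does not use Cascading. It assumes the date pattern is interpreted the way java.text.SimpleDateFormat interprets it, and that "ts" is a long count of milliseconds since the epoch. The class TimestampSketch and its methods are hypothetical names for this example, and the sample log timestamp is made up.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical illustration only: mirrors the date pattern and the
// expression from the message above, without using Cascading.
public class TimestampSketch {

    // Parses an Apache-style log timestamp into milliseconds since the epoch.
    // This is the step where the text becomes a long.
    static long parseTs(String text) {
        SimpleDateFormat format =
            new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable timestamp: " + text, e);
        }
    }

    // Same arithmetic as the expression "ts - (ts % (60 * 1000))":
    // drops the seconds and milliseconds, leaving the start of the minute.
    static long truncateToMinute(long ts) {
        return ts - (ts % (60 * 1000));
    }

    public static void main(String[] args) {
        long ts = parseTs("09/Nov/2009:19:49:04 -0800"); // made-up sample value
        long tm = truncateToMinute(ts);
        System.out.println("ts = " + ts);
        System.out.println("tm = " + tm);
        System.out.println("seconds into the minute = " + (ts - tm) / 1000);
    }
}
```

Because both values are longs by the time they are used as grouping keys, they compare numerically rather than as text.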