Error creating SequenceFile Tap, but works with TextLine?


Chris Curtin

Feb 6, 2009, 9:56:59 AM
to cascading-user
Hi,

I am using Cascading 1.0.1 with Hadoop 0.19.0.

I am having problems with a SequenceFile output Tap. The logic works
with a TextLine(), but not the SequenceFile.

I have my own Aggregator, which defines its Fields in the constructor
using the single-Fields version. It is called after a GroupBy like
this:

formatPipe = new GroupBy(formatPipe, new Fields(MetricColumnDefinition.RECIPIENT_ID_NAME));
formatPipe = new Every(formatPipe, Fields.ALL, aggr);

In the complete method, when I add the Tuple via
getOutputCollector().add(tuple), I receive the error below. If I
change the Tap to use a TextLine() instead of a SequenceFile, it works
fine.

The SequenceFile is being created with the same Fields array as the
custom Aggregator.

Looking at the error it appears that the SequenceFile is looking at
the Fields from the GroupBy, since ‘Recipient Id’ is the column I used
there.

Caused by: cascading.tuple.TupleException: field not found: ''Mailing Id'', available fields: ['Recipient Id', 1:65]



Any ideas?

09/02/06 09:48:44 WARN mapred.LocalJobRunner: job_local_0001
cascading.pipe.OperatorException: [reformat][com.misc.cmc.reports.mailingreport.MailingReport.processData(MailingReport.java:133)] operator Every failed completing aggregator
at cascading.pipe.Every$EveryAggregatorHandler.complete(Every.java:420)
at cascading.flow.stack.EveryAggregatorReducerStackElement.operateEveryHandler(EveryAggregatorReducerStackElement.java:88)
at cascading.flow.stack.EveryAggregatorReducerStackElement.collect(EveryAggregatorReducerStackElement.java:81)
at cascading.flow.stack.EveryAllAggregatorReducerStackElement.operateEveryHandlers(EveryAllAggregatorReducerStackElement.java:98)
at cascading.flow.stack.EveryAllAggregatorReducerStackElement.collect(EveryAllAggregatorReducerStackElement.java:64)
at cascading.flow.stack.GroupReducerStackElement.operateGroup(GroupReducerStackElement.java:76)
at cascading.flow.stack.GroupReducerStackElement.collect(GroupReducerStackElement.java:60)
at cascading.flow.stack.FlowReducerStack.reduce(FlowReducerStack.java:152)
at cascading.flow.FlowReducer.reduce(FlowReducer.java:77)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
Caused by: cascading.tuple.TupleException: unable to select from: ['Recipient Id', 1:65], using selector: [['Mailing Id'], ['Report Id'], ['Campaign Id'], ['Recipient Id'], ['Recipient Type'], ['Email'], ['DOMAIN'], ['Suppression Reason'], ['SENT'], ['DAYS_IN_LIST'], ['DAYS_IN_LIST_BUCKET'], ['OPENS_TOTAL'], ['OPENS_HTML'], ['OPENS_AOL'], ['OPENS_TEXT'], ['OPENS_WEB'], ['OPENS_FIRST'], ['OPENS_LAST'], ['CLICK_TOTAL_ANY'], ['CLICK_TOTAL_HTML'], ['CLICK_TOTAL_AOL'], ['CLICK_TOTAL_TEXT'], ['CLICK_TOTAL_WEB'], ['CLICK_ANY_FIRST'], ['CLICK_ANY_LAST'], ['FTF_TOTAL'], ['FTF_FIRST'], ['FTF_LAST'], ['CONVERSIONS_TOTAL'], ['CONVERSIONS_AMOUNT'], ['CONVERSION_FIRST'], ['CONVERSION_LAST'], ['HARD_BOUNCE'], ['SOFT_BOUNCE'], ['BOUNCE_DATE'], ['OPTED_OUT_FROM_MAILING'], ['OPTED_OUT_FROM_MAILING_DATE'], ['OPTED_OUT_VIA_ABUSE'], ['OPTED_OUT_VIA_ABUSE_DATE'], ['REPLY_MAIL_BLOCK'], ['REPLY_MAIL_BLOCK_DATE'], ['REPLY_MAIL_RESTRICTION'], ['REPLY_MAIL_RESTRICTION_DATE'], ['REPLY_COUNT'], ['REPLY_FIRST'], ['REPLY_LAST'], ['Opted Out'], ['frequency'], ['Opted Out Date'], ['Opt In Date'], ['Opt Out Details'], ['google_total'], ['google_html'], ['google_aol'], ['google_text'], ['google_web'], ['google_first'], ['google_last'], ['other_total'], ['other_html'], ['other_aol'], ['other_text'], ['other_web'], ['other_first'], ['other_last']]
at cascading.tuple.TupleEntry.selectTuple(TupleEntry.java:372)
at cascading.scheme.SequenceFile.sink(SequenceFile.java:90)
at cascading.tap.Tap.sink(Tap.java:250)
at cascading.flow.stack.TapReducerStackElement.operateSink(TapReducerStackElement.java:87)
at cascading.flow.stack.TapReducerStackElement.collect(TapReducerStackElement.java:67)
at cascading.pipe.Every$EveryAggregatorHandler$1.collect(Every.java:370)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:71)
at com.misc.cmc.reports.mailingreport.RowAggregator.complete(RowAggregator.java:162)
at cascading.pipe.Every$EveryAggregatorHandler.complete(Every.java:416)
... 10 more
Caused by: cascading.tuple.TupleException: field not found: ''Mailing Id'', available fields: ['Recipient Id', 1:65]
at cascading.tuple.Fields.indexOf(Fields.java:699)
at cascading.tuple.Fields.translatePos(Fields.java:641)
at cascading.tuple.Fields.getPos(Fields.java:625)
at cascading.tuple.Tuple.get(Tuple.java:370)
at cascading.tuple.TupleEntry.selectTuple(TupleEntry.java:368)
... 18 more
09/02/06 09:48:48 WARN flow.FlowStep: [reformat] completion events count: 0
09/02/06 09:48:48 WARN flow.Flow: stopping jobs
09/02/06 09:48:48 INFO flow.FlowStep: [reformat] stopping: 1/1
09/02/06 09:48:48 WARN flow.Flow: stopped jobs
09/02/06 09:48:48 WARN flow.Flow: shutting down job executor
09/02/06 09:48:48 WARN flow.Flow: shutdown complete
09/02/06 09:48:48 WARN cascade.Cascade: [reformat] flow failed: reformat
cascading.flow.FlowException: step failed: 1/1
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:449)
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:380)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
09/02/06 09:48:48 WARN cascade.Cascade: [reformat] stopping flows
09/02/06 09:48:48 INFO cascade.Cascade: [reformat] stopping flow: reformat
09/02/06 09:48:48 WARN cascade.Cascade: [reformat] stopped flows
09/02/06 09:48:48 WARN cascade.Cascade: [reformat] shutting down flow executor
09/02/06 09:48:48 WARN cascade.Cascade: [reformat] shutdown complete
cascading.cascade.CascadeException: flow failed: reformat
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:413)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:354)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: cascading.flow.FlowException: step failed: 1/1
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:449)
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:380)
... 5 more

Chris K Wensel

Feb 6, 2009, 10:26:57 AM
to cascadi...@googlegroups.com
Do you get this error in 1.0.0?
--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/

Chris K Wensel

Feb 6, 2009, 10:43:02 AM
to cascadi...@googlegroups.com
Starting to wrap my head around this, still early.

SequenceFile is stricter: it selects against the incoming Tuple and
writes only those fields. TextLine is relaxed: by default, it writes
Fields.ALL.

If you write out the dot file, you should be able to see the fields
SequenceFile wants to sink.

ckw
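
To make the strict versus relaxed distinction above concrete, here is a minimal sketch in plain Java. It is not the Cascading API: the class name SinkSelectionSketch and the methods sinkAll and sinkSelected are hypothetical, and the lookup-by-name logic is an assumption about the behaviour described above, not Cascading's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the two sink behaviours; none of these names are Cascading classes.
class SinkSelectionSketch {

    // Relaxed sink (modelled on the TextLine default): write every incoming value.
    static List<Object> sinkAll(List<Object> values) {
        return new ArrayList<>(values);
    }

    // Strict sink (modelled on SequenceFile): look up each declared field by name
    // in the incoming tuple and write only those values.
    static List<Object> sinkSelected(List<String> declared,
                                     List<String> incomingNames,
                                     List<Object> values) {
        List<Object> out = new ArrayList<>();
        for (String name : declared) {
            int pos = incomingNames.indexOf(name);
            if (pos < 0) {
                // Assumed failure mode: a declared name is missing from the incoming tuple.
                throw new IllegalArgumentException(
                    "field not found: '" + name + "', available fields: " + incomingNames);
            }
            out.add(values.get(pos));
        }
        return out;
    }

    public static void main(String[] args) {
        // Simplification: the incoming tuple exposes only the grouping field by name.
        List<String> incoming = Arrays.asList("Recipient Id");
        List<Object> values = Arrays.<Object>asList("r-42");
        List<String> declared = Arrays.asList("Mailing Id", "Recipient Id");

        System.out.println("relaxed: " + sinkAll(values));
        try {
            sinkSelected(declared, incoming, values);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

In this model the relaxed sink never consults field names, so a mismatch between the declared and incoming fields only surfaces with the strict sink.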


Chris Curtin

Feb 6, 2009, 11:48:51 AM
to cascading-user
Yes. I upgraded to 1.0.1 this morning in case it had been fixed.

I'll generate the .dot file after lunch.

Chris Curtin

Feb 6, 2009, 12:47:41 PM
to cascading-user
Here is the .dot file.

One thing looks strange: on line 2, there are 3 levels of [ in the
definition of SequenceFile. I've noticed in a few of the error dumps
that the field names are sometimes enclosed in [ ] and sometimes not.
Does that indicate some unexpected level of nesting?

digraph G {
1 [label = "Every('reformat')[RowAggregator[decl:['Mailing Id'],
['Report Id'], ['Campaign Id'], ['Recipient Id'], ['Recipient Type'],
['Email'], ['DOMAIN'], ['Suppression Reason'], ['SENT'],
['DAYS_IN_LIST'], ['DAYS_IN_LIST_BUCKET'], ['OPENS_TOTAL'],
['OPENS_HTML'], ['OPENS_AOL'], ['OPENS_TEXT'], ['OPENS_WEB'],
['OPENS_FIRST'], ['OPENS_LAST'], ['CLICK_TOTAL_ANY'],
['CLICK_TOTAL_HTML'], ['CLICK_TOTAL_AOL'], ['CLICK_TOTAL_TEXT'],
['CLICK_TOTAL_WEB'], ['CLICK_ANY_FIRST'], ['CLICK_ANY_LAST'],
['FTF_TOTAL'], ['FTF_FIRST'], ['FTF_LAST'], ['CONVERSIONS_TOTAL'],
['CONVERSIONS_AMOUNT'], ['CONVERSION_FIRST'], ['CONVERSION_LAST'],
['HARD_BOUNCE'], ['SOFT_BOUNCE'], ['BOUNCE_DATE'],
['OPTED_OUT_FROM_MAILING'], ['OPTED_OUT_FROM_MAILING_DATE'],
['OPTED_OUT_VIA_ABUSE'], ['OPTED_OUT_VIA_ABUSE_DATE'],
['REPLY_MAIL_BLOCK'], ['REPLY_MAIL_BLOCK_DATE'],
['REPLY_MAIL_RESTRICTION'], ['REPLY_MAIL_RESTRICTION_DATE'],
['REPLY_COUNT'], ['REPLY_FIRST'], ['REPLY_LAST'], ['Opted Out'],
['frequency'], ['Opted Out Date'], ['Opt In Date'], ['Opt Out
Details'], ['google_total'], ['google_html'], ['google_aol'],
['google_text'], ['google_web'], ['google_first'], ['google_last'],
['other_total'], ['other_html'], ['other_aol'], ['other_text'],
['other_web'], ['other_first'], ['other_last']]]"];
2 [label = "Lfs['SequenceFile[[['Mailing Id'], ['Report Id'],
['Campaign Id'], ['Recipient Id'], ['Recipient Type'], ['Email'],
['DOMAIN'], ['Suppression Reason'], ['SENT'], ['DAYS_IN_LIST'],
['DAYS_IN_LIST_BUCKET'], ['OPENS_TOTAL'], ['OPENS_HTML'],
['OPENS_AOL'], ['OPENS_TEXT'], ['OPENS_WEB'], ['OPENS_FIRST'],
['OPENS_LAST'], ['CLICK_TOTAL_ANY'], ['CLICK_TOTAL_HTML'],
['CLICK_TOTAL_AOL'], ['CLICK_TOTAL_TEXT'], ['CLICK_TOTAL_WEB'],
['CLICK_ANY_FIRST'], ['CLICK_ANY_LAST'], ['FTF_TOTAL'], ['FTF_FIRST'],
['FTF_LAST'], ['CONVERSIONS_TOTAL'], ['CONVERSIONS_AMOUNT'],
['CONVERSION_FIRST'], ['CONVERSION_LAST'], ['HARD_BOUNCE'],
['SOFT_BOUNCE'], ['BOUNCE_DATE'], ['OPTED_OUT_FROM_MAILING'],
['OPTED_OUT_FROM_MAILING_DATE'], ['OPTED_OUT_VIA_ABUSE'],
['OPTED_OUT_VIA_ABUSE_DATE'], ['REPLY_MAIL_BLOCK'],
['REPLY_MAIL_BLOCK_DATE'], ['REPLY_MAIL_RESTRICTION'],
['REPLY_MAIL_RESTRICTION_DATE'], ['REPLY_COUNT'], ['REPLY_FIRST'],
['REPLY_LAST'], ['Opted Out'], ['frequency'], ['Opted Out Date'],
['Opt In Date'], ['Opt Out Details'], ['google_total'],
['google_html'], ['google_aol'], ['google_text'], ['google_web'],
['google_first'], ['google_last'], ['other_total'], ['other_html'],
['other_aol'], ['other_text'], ['other_web'], ['other_first'],
['other_last']]]']['c:/temp/clouds/test_data/local/output/
7_flat']']"];
3 [label = "GroupBy('reformat')[by:['Recipient Id']]"];
4 [label = "Each('reformat')[MetricsParser[decl:'Recipient Id',
'Recipient Type', 'Mailing Id', 'Report Id', 'Campaign Id', 'Email',
'Event Type', 'Event Timestamp', 'Body Type', 'Content Id', 'Click
Name', 'URL', 'Conversion Action', 'Conversion Detail', 'Conversion
Amount', 'Suppression Reason', 'Opted Out', 'frequency', 'Opted Out
Date', 'Opt In Date', 'Opt Out Details']]"];
5 [label = "MultiTap[[Lfs['TextLine[['offset', 'line']->[ALL]]']['c:/
temp/clouds/test_data/local/source/7_sent']'], Lfs['TextLine
[['offset', 'line']->[ALL]]']['c:/temp/clouds/test_data/local/source/
7_metrics']']]]"];
6 [label = "[head]"];
7 [label = "[tail]"];
1 -> 2 [label = "['Recipient Id', 1:65]\n['Recipient Id', 'Recipient
Type', 'Mailing Id', 'Report Id', 'Campaign Id', 'Email', 'Event
Type', 'Event Timestamp', 'Body Type', 'Content Id', 'Click Name',
'URL', 'Conversion Action', 'Conversion Detail', 'Conversion Amount',
'Suppression Reason', 'Opted Out', 'frequency', 'Opted Out Date', 'Opt
In Date', 'Opt Out Details']"];
5 -> 4 [label = "['offset', 'line']\n['offset', 'line']"];
4 -> 3 [label = "['Recipient Id', 'Recipient Type', 'Mailing Id',
'Report Id', 'Campaign Id', 'Email', 'Event Type', 'Event Timestamp',
'Body Type', 'Content Id', 'Click Name', 'URL', 'Conversion Action',
'Conversion Detail', 'Conversion Amount', 'Suppression Reason', 'Opted
Out', 'frequency', 'Opted Out Date', 'Opt In Date', 'Opt Out Details']
\n['Recipient Id', 'Recipient Type', 'Mailing Id', 'Report Id',
'Campaign Id', 'Email', 'Event Type', 'Event Timestamp', 'Body Type',
'Content Id', 'Click Name', 'URL', 'Conversion Action', 'Conversion
Detail', 'Conversion Amount', 'Suppression Reason', 'Opted Out',
'frequency', 'Opted Out Date', 'Opt In Date', 'Opt Out Details']"];
3 -> 1 [label = "reformat['Recipient Id']\n['Recipient Id',
'Recipient Type', 'Mailing Id', 'Report Id', 'Campaign Id', 'Email',
'Event Type', 'Event Timestamp', 'Body Type', 'Content Id', 'Click
Name', 'URL', 'Conversion Action', 'Conversion Detail', 'Conversion
Amount', 'Suppression Reason', 'Opted Out', 'frequency', 'Opted Out
Date', 'Opt In Date', 'Opt Out Details']"];
6 -> 5 [label = ""];
2 -> 7 [label = ""];
}


Chris K Wensel

Feb 6, 2009, 1:08:04 PM
to cascadi...@googlegroups.com
Can you resend it as an attachment? dot files are \n sensitive.

ckw

Chris K Wensel

Feb 6, 2009, 5:48:38 PM
to cascadi...@googlegroups.com
FYI, we resolved this out of band.

It turns out Cascading should be throwing an error instead of allowing
Fields instances to be nested.

That is, new Fields( new Fields( "..." ) ); should be prevented.

A fix should be in the next minor release.

ckw
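
For readers wondering what "should be prevented" could look like, here is a minimal sketch of a constructor-time guard in plain Java. FieldNames is a hypothetical stand-in, not cascading.tuple.Fields, and the check shown is an assumption about one way to reject nesting, not the fix that shipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a field-name container; not cascading.tuple.Fields.
final class FieldNames {
    private final List<Object> names = new ArrayList<>();

    // Accepts names (or positions), but rejects another FieldNames instance so that
    // a nested declaration fails at construction time instead of later at sink time.
    FieldNames(Object... fields) {
        for (Object field : fields) {
            if (field instanceof FieldNames) {
                throw new IllegalArgumentException(
                    "FieldNames may not be nested inside another FieldNames");
            }
            names.add(field);
        }
    }

    int size() {
        return names.size();
    }

    @Override
    public String toString() {
        return names.toString();
    }
}

class NestedFieldsSketch {
    public static void main(String[] args) {
        // A flat declaration is accepted.
        FieldNames flat = new FieldNames("Mailing Id", "Report Id");
        System.out.println(flat);

        // A nested declaration is rejected immediately.
        try {
            new FieldNames(new FieldNames("Mailing Id"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in the constructor puts the error next to the line that built the nested instance, rather than deep inside the reducer stack as in the trace earlier in this thread.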