ignoreInvalidRows in Hadoop indexing task

Benjamin Angelaud

Jun 22, 2016, 5:26:45 AM
to Druid Development
Hi guys,

It seems like ignoreInvalidRows doesn't work for me, or maybe it's a more general problem. Just putting this here.

In my JSON task spec:

 "tuningConfig" : {
"type": "hadoop",
"jobProperties" : {
"fs.s3.awsAccessKeyId" : xxxx,
"fs.s3.awsSecretAccessKey" :xxxx,
"fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
"fs.s3n.awsAccessKeyId" : xxxx,
"fs.s3n.awsSecretAccessKey" : xxxx,
"fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
"io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
},
"partitionsSpec": {
"type": "hashed",
"numShards": 1,
"assumeGrouped": true
},
"ignoreInvalidRows": true
}

And here's the error

Error: com.metamx.common.RE: Failure on row[20160619 562498 733291 2536 7 2 -8817085814308377886 2 0 0 \N 255 0 0 3 ad_view::photos \N ad_view ad_view 2 \N \N]
at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:88)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.metamx.common.parsers.ParseException: Unable to parse metrics[PageDuration], value[\N]
at io.druid.data.input.MapBasedRow.getLongMetric(MapBasedRow.java:154)
at io.druid.segment.incremental.IncrementalIndex$1$2.get(IncrementalIndex.java:108)
at io.druid.query.aggregation.LongSumAggregator.aggregate(LongSumAggregator.java:60)
at io.druid.indexer.InputRowSerde.toBytes(InputRowSerde.java:94)
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorMapper.innerMap(IndexGeneratorJob.java:292)
at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:84)
... 8 more
Caused by: java.lang.NumberFormatException: For input string: "\N"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.valueOf(Double.java:502)
at io.druid.data.input.MapBasedRow.getLongMetric(MapBasedRow.java:151) 
... 13 more 
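
For what it's worth, the bottom of that trace is just the JDK refusing the Hive-style NULL marker \N. A minimal, hypothetical sketch using only plain JDK (no Druid classes) that reproduces the same failure:

public class NullMarkerParse
{
    public static void main(String[] args)
    {
        // "\N" (backslash N) is how Hive/MySQL TSV exports write NULL.
        // Double.valueOf() cannot parse it, which is the NumberFormatException
        // that MapBasedRow.getLongMetric() ends up throwing in the trace above.
        try {
            Double.valueOf("\\N");
        }
        catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "\N"
        }
    }
}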

Nishant Bangarwa

Jun 22, 2016, 12:16:18 PM
to Druid Development
Hi Benjamin, 
This looks like a bug. I've created https://github.com/druid-io/druid/pull/3171, which should hopefully fix it.
Can you try with these changes and see if it works for you?

Benjamin Angelaud

Jun 23, 2016, 4:44:39 AM
to Druid Development
Hi Nishant,
We already worked around it by fixing the data that was causing the problem, so I can't test it anymore. I'll probably need it again soon though, so I'll let you know.
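
The kind of cleanup I mean is roughly the sketch below (not the exact script we ran; the metric column index is just a placeholder). It assumes a TSV export where \N marks NULLs and rewrites those cells to 0 before the files go to the indexer:

import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReplaceNullMarkers
{
    // Placeholder: index of the PageDuration column in the TSV dump (adjust to your layout).
    private static final int[] METRIC_COLUMNS = {10};

    public static void main(String[] args) throws Exception
    {
        try (Stream<String> lines = Files.lines(Paths.get(args[0]));
             BufferedWriter writer = Files.newBufferedWriter(Paths.get(args[1]))) {
            for (String line : (Iterable<String>) lines::iterator) {
                String[] cols = line.split("\t", -1);
                for (int i : METRIC_COLUMNS) {
                    // \N is the Hive/MySQL convention for NULL; longSum can't parse it.
                    if (i < cols.length && "\\N".equals(cols[i])) {
                        cols[i] = "0";
                    }
                }
                writer.write(String.join("\t", cols));
                writer.newLine();
            }
        }
    }
}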
Many thanks!