ignoreInvalidRows in Hadoop indexing task

Benjamin Angelaud

Jun 22, 2016, 5:26:45 AM
to Druid Development
Hi guys,

It seems like ignoreInvalidRows doesn't work for me, or maybe it's a more general problem. Just putting this here.

In my JSON task spec:

 "tuningConfig" : {
"type": "hadoop",
"jobProperties" : {
"fs.s3.awsAccessKeyId" : xxxx,
"fs.s3.awsSecretAccessKey" :xxxx,
"fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
"fs.s3n.awsAccessKeyId" : xxxx,
"fs.s3n.awsSecretAccessKey" : xxxx,
"fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
"io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
},
"partitionsSpec": {
"type": "hashed",
"numShards": 1,
"assumeGrouped": true
},
"ignoreInvalidRows": true
}

And here's the error

Error: com.metamx.common.RE: Failure on row[20160619 562498 733291 2536 7 2 -8817085814308377886 2 0 0 \N 255 0 0 3 ad_view::photos \N ad_view ad_view 2 \N \N]
at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:88)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.metamx.common.parsers.ParseException: Unable to parse metrics[PageDuration], value[\N]
at io.druid.data.input.MapBasedRow.getLongMetric(MapBasedRow.java:154)
at io.druid.segment.incremental.IncrementalIndex$1$2.get(IncrementalIndex.java:108)
at io.druid.query.aggregation.LongSumAggregator.aggregate(LongSumAggregator.java:60)
at io.druid.indexer.InputRowSerde.toBytes(InputRowSerde.java:94)
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorMapper.innerMap(IndexGeneratorJob.java:292)
at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:84)
... 8 more
Caused by: java.lang.NumberFormatException: For input string: "\N"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.valueOf(Double.java:502)
at io.druid.data.input.MapBasedRow.getLongMetric(MapBasedRow.java:151) 
... 13 more 
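
For what it's worth, the bottom of that trace is just the JDK refusing the Hive-style NULL marker \N. A minimal, hypothetical sketch using only plain JDK (no Druid classes) that reproduces the same failure:

public class NullMarkerParse
{
    public static void main(String[] args)
    {
        // "\N" (backslash N) is how Hive/MySQL TSV exports write NULL.
        // Double.valueOf() cannot parse it, which is the NumberFormatException
        // that MapBasedRow.getLongMetric() ends up throwing in the trace above.
        try {
            Double.valueOf("\\N");
        }
        catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "\N"
        }
    }
}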

Nishant Bangarwa

Jun 22, 2016, 12:16:18 PM
to Druid Development
Hi Benjamin, 
This looks like a bug. I've created https://github.com/druid-io/druid/pull/3171, which should hopefully fix it.
Can you try with these changes and see if it works for you?

Benjamin Angelaud

Jun 23, 2016, 4:44:39 AM
to Druid Development
Hi Nishant,
We already worked around it by fixing the data that was causing the problem, so I can't test it anymore. I'll probably need it again soon though, so I'll let you know.
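
The kind of cleanup I mean is roughly the sketch below (not the exact script we ran; the metric column index is just a placeholder). It assumes a TSV export where \N marks NULLs and rewrites those cells to 0 before the files go to the indexer:

import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReplaceNullMarkers
{
    // Placeholder: index of the PageDuration column in the TSV dump (adjust to your layout).
    private static final int[] METRIC_COLUMNS = {10};

    public static void main(String[] args) throws Exception
    {
        try (Stream<String> lines = Files.lines(Paths.get(args[0]));
             BufferedWriter writer = Files.newBufferedWriter(Paths.get(args[1]))) {
            for (String line : (Iterable<String>) lines::iterator) {
                String[] cols = line.split("\t", -1);
                for (int i : METRIC_COLUMNS) {
                    // \N is the Hive/MySQL convention for NULL; longSum can't parse it.
                    if (i < cols.length && "\\N".equals(cols[i])) {
                        cols[i] = "0";
                    }
                }
                writer.write(String.join("\t", cols));
                writer.newLine();
            }
        }
    }
}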
Many thanks!