Hi,
I am trying to read a file that contains an integer field. Some of the integer values contain trailing spaces. While executing, I got the exception java.lang.NumberFormatException: For input string: "2011 ".
I tried the same thing with the BigDecimal type and got the same exception. Below are the details:
public static void main(String[] args) {
    // field1 is declared as Integer, so each value goes through Integer.parseInt during coercion
    Scheme sourceScheme = new TextDelimited(
        new Fields("field1", "field2").applyTypes(new Type[] {
            Integer.class, String.class }), null, false, true,
        ",", false, "\"", null, false);
    String inputPath = "datafiles/input/in";
    Tap source = new Hfs(sourceScheme, inputPath);
    Pipe pipe = new Pipe("pipe");
    Scheme sinkScheme = new TextDelimited(new Fields("field1", "field2") /* , ... rest of this line was cut off in the post */);
    String outputPath = "datafiles/output/out";
    Tap sink = new Hfs(sinkScheme, outputPath, SinkMode.REPLACE);
    Properties properties = new Properties();
    AppProps.setApplicationJarClass(properties, Test.class);
    FlowConnector flowConnector = new HadoopFlowConnector(properties);
    Flow flow = flowConnector.connect("test", source, sink, pipe);
    flow.complete(); // the exception is thrown while the flow runs
}
Input file data:
2011,s
2011 ,s
2012,s
2013 ,s
2011 ,s
Output:
cascading.tuple.TupleException: unable to read from input identifier: file:/C:/Projects/CascadingTest/datafiles/input/in111
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:127)
at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: cascading.tap.TapException: field field1 cannot be coerced from : 2011 to: java.lang.Integer
at cascading.scheme.util.DelimitedParser.coerceParsedLine(DelimitedParser.java:370)
at cascading.scheme.util.DelimitedParser.parseLine(DelimitedParser.java:345)
at cascading.scheme.hadoop.TextDelimited.source(TextDelimited.java:1008)
at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:140)
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:120)
Caused by: java.lang.NumberFormatException: For input string: "2011 "
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at cascading.tuple.coerce.IntegerObjectCoerce.coerce(IntegerObjectCoerce.java:50)
at cascading.tuple.coerce.IntegerObjectCoerce.coerce(IntegerObjectCoerce.java:29)
at cascading.tuple.coerce.Coercions$Coerce.canonical(Coercions.java:64)
at cascading.scheme.util.DelimitedParser.coerceParsedLine(DelimitedParser.java:363)
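The innermost cause is plain Java behavior: Integer.parseInt does not strip whitespace, so "2011 " is rejected before Cascading can do anything else with it. A minimal standalone check (the class and method names are made up for illustration):

```java
public class ParseIntTrailingSpace {
    // Returns true if Integer.parseInt rejects the given string.
    public static boolean rejects(String s) {
        try {
            Integer.parseInt(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String raw = "2011 "; // value as it appears in the second input line
        System.out.println("parseInt throws on raw value: " + rejects(raw));
        // Trimming first removes the trailing space, so parsing succeeds.
        System.out.println("parseInt(trimmed) = " + Integer.parseInt(raw.trim()));
    }
}
```

This prints true for the raw value and 2011 for the trimmed one, which matches the "For input string" message in the trace.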
I looked at BigDecimalCoerce.java, which coerces a value to BigDecimal. Here is the code:
@Override
public BigDecimal coerce(Object value) {
    if (value instanceof Double)
        return BigDecimal.valueOf((Double) value);
    else if (value instanceof Long)
        return BigDecimal.valueOf((Long) value);
    else if (value == null || value.toString().isEmpty())
        return null;
    else
        return new BigDecimal(value.toString());
}
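A sketch of the change being asked about: the same branches as the quoted coerce(), but with the string trimmed before both the empty check and the BigDecimal constructor. TrimmingBigDecimalCoerce is a hypothetical standalone class written for illustration; it is not Cascading's implementation and does not extend any Cascading type.

```java
import java.math.BigDecimal;

// Hypothetical helper mirroring the quoted BigDecimalCoerce.coerce logic,
// with whitespace trimmed before parsing. Not part of Cascading.
public class TrimmingBigDecimalCoerce {
    public static BigDecimal coerce(Object value) {
        if (value == null)
            return null;
        if (value instanceof Double)
            return BigDecimal.valueOf((Double) value);
        if (value instanceof Long)
            return BigDecimal.valueOf((Long) value);
        // Trim before the empty check so that a blank value like "   " maps to null.
        String text = value.toString().trim();
        return text.isEmpty() ? null : new BigDecimal(text);
    }

    public static void main(String[] args) {
        System.out.println(coerce("2011 "));   // 2011
        System.out.println(coerce(" 12.50 ")); // 12.50
        System.out.println(coerce("   "));     // null
    }
}
```

Treating an all-blank value as null follows the original's handling of the empty string; whether that is the right choice for a given dataset is a separate decision.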
Q:
Is it possible to trim the value in the coerce() method for BigDecimal, Double, Integer, Long and Short? If the value were trimmed there, we would not get a NumberFormatException during coercion.
Please let me know how we can read numbers that contain spaces without an exception.
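If changing the coercion is not an option, one workaround is to keep the raw value as a String and trim it before converting. Below is a minimal plain-Java sketch of that idea, independent of Cascading: parseTrimmed is a hypothetical helper, and the lines array simply reuses the sample input above. Inside a flow, the same logic would have to live in your own operation applied after the source (with field1 typed as String); that wiring is not shown here.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TrimmedNumberParsing {
    // Hypothetical helper: trim first, treat blank as null, then delegate to the type's parser.
    public static <T> T parseTrimmed(String raw, Function<String, T> parser) {
        if (raw == null)
            return null;
        String text = raw.trim();
        return text.isEmpty() ? null : parser.apply(text);
    }

    public static void main(String[] args) {
        // Sample rows from the input file in the post.
        String[] lines = {"2011,s", "2011 ,s", "2012,s", "2013 ,s", "2011 ,s"};
        List<Integer> years = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",", -1);
            years.add(parseTrimmed(parts[0], Integer::valueOf));
        }
        System.out.println(years); // [2011, 2011, 2012, 2013, 2011]

        // The same helper covers the other types listed in the question.
        Short s = parseTrimmed(" 7 ", Short::valueOf);
        Long l = parseTrimmed("42 ", Long::valueOf);
        Double d = parseTrimmed(" 1.5", Double::valueOf);
        BigDecimal b = parseTrimmed("2011 ", BigDecimal::new);
        System.out.println(s + " " + l + " " + d + " " + b); // 7 42 1.5 2011
    }
}
```

The results are assigned to typed locals before printing so the compiler can infer each target type without an ambiguous println overload.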
Thanks,
Bhavesh Shah