java.lang.NumberFormatException: For input string: "2011 "


Bhavesh Shah

Jul 15, 2015, 11:37:16 AM
to cascadi...@googlegroups.com
Hi,

I am trying to read a file that contains an integer field. Some of the integer values contain spaces. While executing, I got the exception java.lang.NumberFormatException: For input string: "2011 ".

I tried the same thing using the BigDecimal type and got the same exception. Below are the details:

Code:
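import java.lang.reflect.Type;
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class Test {
    public static void main(String[] args) {
        // field1 is declared as Integer, so TextDelimited coerces each value while
        // parsing the line (DelimitedParser.coerceParsedLine in the trace below)
        Scheme sourceScheme = new TextDelimited(
                new Fields("field1", "field2").applyTypes(new Type[] {
                        Integer.class, String.class }), null, false, true,
                ",", false, "\"", null, false);
        String inputPath = "datafiles/input/in";
        Tap source = new Hfs(sourceScheme, inputPath);

        Pipe pipe = new Pipe("pipe");

        Scheme sinkScheme = new TextDelimited(new Fields("field1", "field2"), ",");
        String outputPath = "datafiles/output/out";
        Tap sink = new Hfs(sinkScheme, outputPath, SinkMode.REPLACE);

        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, Test.class);

        FlowConnector flowConnector = new HadoopFlowConnector(properties);
        Flow flow = flowConnector.connect("test", source, sink, pipe);
        flow.complete();
    }
}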


Input file data:
2011,s
2011 ,s
 2012,s
2013 ,s
2011   ,s


Output:
cascading.tuple.TupleException: unable to read from input identifier: file:/C:/Projects/CascadingTest/datafiles/input/in111
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:127)
at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: cascading.tap.TapException: field field1 cannot be coerced from : 2011  to: java.lang.Integer
at cascading.scheme.util.DelimitedParser.coerceParsedLine(DelimitedParser.java:370)
at cascading.scheme.util.DelimitedParser.parseLine(DelimitedParser.java:345)
at cascading.scheme.hadoop.TextDelimited.source(TextDelimited.java:1008)
at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:140)
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:120)
... 6 more
Caused by: java.lang.NumberFormatException: For input string: "2011 "
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at cascading.tuple.coerce.IntegerObjectCoerce.coerce(IntegerObjectCoerce.java:50)
at cascading.tuple.coerce.IntegerObjectCoerce.coerce(IntegerObjectCoerce.java:29)
at cascading.tuple.coerce.Coercions$Coerce.canonical(Coercions.java:64)
at cascading.scheme.util.DelimitedParser.coerceParsedLine(DelimitedParser.java:363)
... 10 more
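The innermost cause is Integer.parseInt rejecting the surrounding space. A minimal plain-Java check (no Cascading involved; the class name ParseWhitespaceDemo is just for illustration) that runs the first-column values from the input file through Integer.parseInt and new BigDecimal(String), with and without trim():

```java
import java.math.BigDecimal;

class ParseWhitespaceDemo {
    // First-column values copied from the sample input file above
    static final String[] VALUES = {"2011", "2011 ", " 2012", "2013 ", "2011   "};

    // True if Integer.parseInt accepts the string exactly as given
    static boolean parsesAsInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if new BigDecimal(String) accepts the string exactly as given
    static boolean parsesAsBigDecimal(String s) {
        try {
            new BigDecimal(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String v : VALUES) {
            System.out.printf("\"%s\": int=%b, bigdecimal=%b, int after trim=%b%n",
                    v, parsesAsInt(v), parsesAsBigDecimal(v), parsesAsInt(v.trim()));
        }
    }
}
```

Only the untouched "2011" parses as-is with either type; every value parses once it is trimmed, which matches both the Integer and the BigDecimal failures.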


I was checking the code of BigDecimalCoerce.java, which coerces the value to its type. Below is the code:
    @Override
    public BigDecimal coerce(Object value) {
        if (value instanceof Double)
            return BigDecimal.valueOf((Double) value);
        else if (value instanceof Long)
            return BigDecimal.valueOf((Long) value);
        else if (value == null || value.toString().isEmpty())
            return null;
        else
            return new BigDecimal(value.toString());
    }

Q: 
Is it possible to trim the value in the coerce method for BigDecimal, Double, Integer, Long and Short? If we could call trim() on the value there, we would not get a NumberFormatException during coercion.
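To make the question concrete, here is a standalone sketch of the proposed change, mirroring the branch structure of the coerce method above. The class name TrimmingBigDecimalCoerce is hypothetical; this is not Cascading's code, and whether Cascading lets you plug such a coercion in without patching the library is not shown here.

```java
import java.math.BigDecimal;

final class TrimmingBigDecimalCoerce {
    // Same branches as BigDecimalCoerce.coerce above, plus a trim() on string input
    static BigDecimal coerce(Object value) {
        if (value instanceof Double)
            return BigDecimal.valueOf((Double) value);
        if (value instanceof Long)
            return BigDecimal.valueOf((Long) value);
        if (value == null)
            return null;
        // Trim before the empty check and before parsing, so "2011 " and " 2012" succeed
        String text = value.toString().trim();
        // A whitespace-only field is treated like an empty one and becomes null
        return text.isEmpty() ? null : new BigDecimal(text);
    }
}
```

One behavioural difference from the original: a field containing only spaces now yields null instead of throwing. The same trim-before-parse change would apply to the Integer, Long, Short and Double coercions.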


Please let me know how we can read numbers that contain spaces without getting an exception.



Thanks,
Bhavesh Shah

Elliot West

Jul 15, 2015, 12:44:47 PM
to cascadi...@googlegroups.com
Hi Bhavesh,

I believe the error you are seeing is expected behaviour. The clearest approach here is probably to read these values as Strings (with the spaces they are technically not numeric), trim them with an ExpressionFunction or similar, and then use the Coerce subassembly to convert them to numeric types. Alternatively, you could try subclassing DelimitedParser, adding your trimming functionality there, and passing it into your TextDelimited. Perhaps override cleanSplit(...) like so:

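// In a subclass of DelimitedParser (Pattern is java.util.regex.Pattern)
@Override
public Object[] cleanSplit( Object[] split, Pattern cleanPattern, Pattern escapePattern, String quote ) {
  // Let the default implementation clean up quotes and escapes first
  Object[] values = super.cleanSplit(split, cleanPattern, escapePattern, quote);
  // Then strip leading and trailing whitespace from every column
  for (int i = 0 ; i < values.length ; i++) {
    values[i] = values[i] == null ? null : ((String) values[i]).trim();
  }
  return values;
}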

Note that this will trim all columns, not just those whose fields are numeric, which may be limiting or undesirable.
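The two suggestions can be combined to avoid trimming every column: keep the raw split values as Strings, trim only the positions whose declared type is numeric, and coerce afterwards. A minimal plain-Java sketch, assuming the declared column types are available as a Class[] (how a DelimitedParser subclass would obtain them is not shown and may depend on the Cascading version); SelectiveTrimCoerce and its methods are hypothetical names, not Cascading API.

```java
final class SelectiveTrimCoerce {
    // Numeric column types: Number subclasses (Integer, Long, BigDecimal, ...) or numeric primitives
    static boolean isNumeric(Class<?> type) {
        if (type == null)
            return false;
        if (Number.class.isAssignableFrom(type))
            return true;
        return type.isPrimitive() && type != boolean.class && type != char.class && type != void.class;
    }

    // Trims values[i] in place only where types[i] is numeric; other columns keep their spaces
    static Object[] trimNumericColumns(Object[] values, Class<?>[] types) {
        for (int i = 0; i < values.length && i < types.length; i++) {
            if (values[i] != null && isNumeric(types[i]))
                values[i] = values[i].toString().trim();
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical split row whose String column also has spaces that should survive
        Object[] values = {"2011   ", " s "};
        Class<?>[] types = {Integer.class, String.class};
        trimNumericColumns(values, types);
        // Only after trimming is the numeric column converted
        Integer field1 = Integer.valueOf((String) values[0]);
        System.out.println(field1 + " [" + values[1] + "]"); // 2011 [ s ]
    }
}
```

In a cleanSplit override like the one above, trimNumericColumns could be applied to the array returned by super.cleanSplit(...) in place of the unconditional loop.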

Cheers - Elliot.



--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/60d86dae-fa0d-452e-a467-bbc1f90794bf%40googlegroups.com.
