Hello,
My question may be in line with the "TextDelimited, strictness and error trap" thread, but I don't want to hijack that thread.
I have a set of CSV files that all contain the same domain data. When we first go to production there will be 93 columns. In the next release there will be 95 columns. The rules of the file format are such that all new columns go at the end of the row. The older 93-column data is still valid; it just doesn't have the extra 2 columns of data.
What I want to do is read in a standard set of fields that are defined statically. The Tap will use a TextDelimited scheme to read the files in, and this scheme will be set to strict = false. If the file has 93 columns, the last two fields should be set to null. If the file has 95 columns, the last two columns will be read in.
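To make the intended behavior concrete, here is a minimal plain-Java sketch of what I expect to happen to a short row. This is not Cascading code and not how TextDelimited is implemented; RowPadder and parseLenient are hypothetical names used only for illustration, and the naive split on "," ignores quoting.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only: it is not part of Cascading.
final class RowPadder {

    private RowPadder() {
    }

    // Splits a comma-delimited line and pads the result with nulls so that
    // it always has exactly expectedColumns entries.
    static String[] parseLenient(final String line, final int expectedColumns) {
        // A limit of -1 keeps trailing empty values instead of dropping them.
        final String[] parsed = line.split(",", -1);
        if (parsed.length > expectedColumns) {
            throw new IllegalArgumentException(
                "Row has " + parsed.length + " columns, expected at most " + expectedColumns);
        }
        // Arrays.copyOf fills the newly added trailing slots with null.
        return Arrays.copyOf(parsed, expectedColumns);
    }
}
```

With this sketch, a 93-column line passed as parseLenient(line, 95) would come back with entries 93 and 94 set to null, while a 95-column line would come back unchanged.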
My unit test method where I'm trying to prove this out is below.
@Test
@SuppressWarnings("rawtypes")
public void load93File() throws IOException {
    final String intest = TestHelper.getPathFromCP("/cascading/93columns.dat");
    final Tap _93File = getPlatform().getTap(Helper.getLocalEnhancedClaimLineScheme(), intest, SinkMode.UPDATE);
    final Tap _93Out = getPlatform().getTap(Helper.getLocalEnhancedClaimLineScheme(), "/tmp/93columnsOut.dat", SinkMode.REPLACE);

    // Pass-through function: checks the field count, prints each value,
    // then re-emits the incoming tuple.
    class CheckFunction extends BaseOperation<Tuple> implements Function<Tuple> {
        private static final long serialVersionUID = 1L;

        @Override
        public void operate(final FlowProcess arg0, final FunctionCall<Tuple> call) {
            final TupleEntry arguments = call.getArguments();
            assertEquals(95, call.getArgumentFields().size());
            //assertEquals("setEnhancementId", arguments.getString(EnhancedClaimLine.ENHANCEMENT_ID));
            for (int i = 0; i < 95; i++) {
                System.out.println(arguments.getObject(i));
            }
            call.getOutputCollector().add(arguments);
        }

        @Override
        public Fields getFieldDeclaration() {
            return EnhancedClaimLine.getEnhancedClaimLineFields();
        }
    }

    final Pipe p = new Each("JustChecking", new CheckFunction());
    final FlowDef flowDef = FlowDef.flowDef();
    flowDef.addSource(p, _93File);
    flowDef.addTailSink(new Pipe("out", p), _93Out);

    final FlowConnector flowConnector = getPlatform().getFlowConnector();
    final Flow connect = flowConnector.connect(flowDef);
    connect.complete();
}
Helper.getLocalEnhancedClaimLineScheme() is defined as:
@SuppressWarnings("rawtypes")
public static Scheme getLocalEnhancedClaimLineScheme() {
    // Arguments: fields, skipHeader, writeHeader, delimiter, strict, quote, types, safe
    return new cascading.scheme.local.TextDelimited(EnhancedClaimLine.getEnhancedClaimLineFields(), true, false, ",", false, "", null, true);
}
Whenever I run this with data, the output is a row in which every value is null, like this:
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
The point of the unit test is simply to read the file in and write it back out unchanged.
What am I missing?