I'm new to StreamSets, so I apologize in advance if I am doing something goofy. All I want to do is parse an XML file with the following format:
.
.
.
I've tried multiple combinations of the directory reader with the XML data format, including the XPath /ordata/row/ as the record delimiter, row as the record delimiter, and no record delimiter at all. I'm wondering whether it fails because all the fields are attributes, or because there's no explicit end tag. In preview, all I get back is:
2017-11-06 17:29:32,251 [user:*admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:preview-pool-1-thread-1] INFO Pipeline - Processing lifecycle start event with stage
2017-11-06 17:29:32,254 [user:*admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:preview-pool-1-thread-1] ERROR SpoolDirSource - Failed to process file '/STREAMSETS/SO/source/Data.xml' at position '-1': com.streamsets.pipeline.stage.origin.spooldir.BadSpoolFileException: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_00 - Cannot advance reader 'Data.xml' to offset '0'
com.streamsets.pipeline.stage.origin.spooldir.BadSpoolFileException: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_00 - Cannot advance reader 'Data.xml' to offset '0'
at com.streamsets.pipeline.stage.origin.spooldir.SpoolDirSource.produce(SpoolDirSource.java:652)
at com.streamsets.pipeline.stage.origin.spooldir.SpoolDirSource.produce(SpoolDirSource.java:510)
at com.streamsets.pipeline.configurablestage.DSource.produce(DSource.java:38)
at com.streamsets.datacollector.runner.StageRuntime$2.call(StageRuntime.java:228)
at com.streamsets.datacollector.runner.StageRuntime$2.call(StageRuntime.java:222)
at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:180)
at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:249)
at com.streamsets.datacollector.runner.StagePipe.process(StagePipe.java:231)
at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.runPollSource(PreviewPipelineRunner.java:315)
at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.run(PreviewPipelineRunner.java:214)
at com.streamsets.datacollector.runner.Pipeline.run(Pipeline.java:510)
at com.streamsets.datacollector.runner.preview.PreviewPipeline.run(PreviewPipeline.java:51)
at com.streamsets.datacollector.execution.preview.sync.SyncPreviewer.start(SyncPreviewer.java:206)
at com.streamsets.datacollector.execution.preview.async.AsyncPreviewer.lambda$start$0(AsyncPreviewer.java:94)
at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(SafeScheduledExecutorService.java:249)
at com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:33)
at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.call(SafeScheduledExecutorService.java:245)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_00 - Cannot advance reader 'Data.xml' to offset '0'
at com.streamsets.pipeline.lib.parser.xml.XmlDataParserFactory.createParser(XmlDataParserFactory.java:80)
at com.streamsets.pipeline.lib.parser.xml.XmlDataParserFactory.getParser(XmlDataParserFactory.java:60)
at com.streamsets.pipeline.lib.parser.WrapperDataParserFactory.getParser(WrapperDataParserFactory.java:65)
at com.streamsets.pipeline.stage.origin.spooldir.SpoolDirSource.produce(SpoolDirSource.java:585)
... 22 more
Caused by: java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
at com.streamsets.pipeline.lib.parser.xml.XmlCharDataParser.<init>(XmlCharDataParser.java:89)
at com.streamsets.pipeline.lib.parser.xml.XmlDataParserFactory.createParser(XmlDataParserFactory.java:77)
... 25 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)
at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:276)
at javax.xml.stream.util.EventReaderDelegate.peek(EventReaderDelegate.java:104)
at com.streamsets.pipeline.lib.xml.StreamingXmlParser.skipIgnorable(StreamingXmlParser.java:232)
at com.streamsets.pipeline.lib.xml.StreamingXmlParser.hasNext(StreamingXmlParser.java:238)
at com.streamsets.pipeline.lib.xml.StreamingXmlParser.<init>(StreamingXmlParser.java:113)
at com.streamsets.pipeline.lib.xml.OverrunStreamingXmlParser.<init>(OverrunStreamingXmlParser.java:59)
at com.streamsets.pipeline.lib.parser.xml.XmlCharDataParser.<init>(XmlCharDataParser.java:80)
... 26 more
2017-11-06 17:29:32,254 [user:*admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:preview-pool-1-thread-1] ERROR DirectorySpooler - Leaving file in error '/STREAMSETS/SO/source/Data.xml' in spool directory
2017-11-06 17:29:32,254 [user:*admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:preview-pool-1-thread-1] INFO Pipeline - Destroying pipeline with reason=UNKNOWN
2017-11-06 17:29:32,255 [user:*admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:preview-pool-1-thread-1] INFO Pipeline - Processing lifecycle stop event
2017-11-06 17:29:32,255 [user:*admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:preview-pool-1-thread-1] INFO Pipeline - Pipeline finished destroying with final reason=FAILURE
2017-11-06 17:29:33,444 [user:admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:webserver-127] WARN StandaloneAndClusterPipelineManager - Evicting idle previewer 'SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e::0'::'47ee8166-ccaf-4e87-b576-e030695edc91' in status 'FINISHED
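For what it's worth, the innermost exception in the trace ("Content is not allowed in prolog" at [row,col]:[1,1]) means the XML parser hit something other than '<' as the very first character. That often points to a byte order mark or an encoding mismatch rather than to the attribute-only layout, though I can't confirm that is the cause here. Below is a minimal Python sketch for checking the first bytes of the file outside of StreamSets. The helper names describe_leading_bytes and check_file are hypothetical and are not part of any StreamSets API.

```python
import codecs

# Known byte order marks. UTF-32 is listed before UTF-16 because the
# UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE BOM"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE BOM"),
    (codecs.BOM_UTF8, "UTF-8 BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE BOM"),
]


def describe_leading_bytes(head: bytes) -> str:
    """Classify the first bytes of a file that is expected to be XML."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    # Leading whitespace is tolerated here; the first real byte should be '<'.
    if head.lstrip(b" \t\r\n").startswith(b"<"):
        return "starts with '<'"
    return "unexpected leading content"


def check_file(path: str) -> str:
    """Read the first few bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return describe_leading_bytes(f.read(16))
```

Calling check_file("/STREAMSETS/SO/source/Data.xml") (the path from the log above) would report whether the file begins with a BOM, with '<', or with something else. If it reports a BOM or unexpected content, the file encoding or the charset configured on the origin would be the first thing to look at.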