Issue 221 in any23: N3/NQ parsers ignoring stopAtFirstError flag

an...@googlecode.com

Feb 17, 2012, 4:59:25 AM
to any2...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 221 by hannes.m...@gmail.com: N3/NQ parsers ignoring
stopAtFirstError flag
http://code.google.com/p/any23/issues/detail?id=221

The base interface for all RDF parsers (org.openrdf.rio.RDFParser) defines
a method setStopAtFirstError. The documentation for this method reads:
"Sets whether the parser should stop immediately if it finds an error in
the data". This is indeed very useful, as many data sets "out there"
contain some malformed entries.

However, as far as I can tell from the current source code (0.6.1 and SVN
trunk), both the NTriples parser
(org.openrdf.rio.ntriples.NTriplesParser.NTriplesParser) and the
NQuadsParser (org.deri.any23.parser.NQuadsParser) ignore this flag. In
their respective implementations, they run through the entire file in an
unchecked loop (see
http://code.google.com/p/any23/source/browse/trunk/any23-core/src/main/java/org/deri/any23/io/nquads/NQuadsParser.java#100).

    while (parseLine(fileReader)) {
        nextRow();
    }

Now, if parsing any line in a potentially huge file throws an exception,
the entire parsing process stops, regardless of the setting of the
"stopAtFirstError" flag. I propose that these loops be changed to honor
this flag, so that when it is set to "false", the rest of the broken line
is discarded and parsing continues with the next line.
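
To illustrate the proposed behavior, here is a minimal, self-contained sketch of a line loop that honors such a flag. It is not the Any23 or Sesame code: the class LenientLineLoop, its Result type, and the toy parseLine() rule (five whitespace-separated tokens ending in ".") are all hypothetical and only stand in for a real line parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a line-oriented parser; not the Any23 or Sesame API. */
final class LenientLineLoop {

    /** Outcome of one run: accepted lines and the errors that were reported. */
    static final class Result {
        final List<String> statements = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    /** Toy rule (assumption): a valid line has five tokens, the last being ".". */
    static String parseLine(String line) {
        String trimmed = line.trim();
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length != 5 || !tokens[4].equals(".")) {
            throw new IllegalArgumentException("Malformed line: " + trimmed);
        }
        return trimmed;
    }

    /** Parses all lines; a malformed line aborts the run only if stopAtFirstError is true. */
    static Result parse(BufferedReader reader, boolean stopAtFirstError) throws IOException {
        Result result = new Result();
        String line;
        int row = 0;
        while ((line = reader.readLine()) != null) {
            row++;
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            try {
                result.statements.add(parseLine(line));
            } catch (IllegalArgumentException e) {
                if (stopAtFirstError) {
                    // strict mode: give up on the whole input
                    throw new IOException("Parse error at row " + row, e);
                }
                // lenient mode: record the problem and move on to the next line
                result.errors.add("row " + row + ": " + e.getMessage());
            }
        }
        return result;
    }
}
```

With the flag set to false, an input with one broken line between two valid ones yields two statements and one recorded error. With the flag set to true, the same input aborts at the broken line.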

an...@googlecode.com

Feb 17, 2012, 5:22:47 AM
to any2...@googlegroups.com

Comment #1 on issue 221 by hannes.m...@gmail.com: N3/NQ parsers ignoring
stopAtFirstError flag
http://code.google.com/p/any23/issues/detail?id=221

I have implemented this behavior on the latest version of NQuadsParser from
SVN (r1601); the source file is attached. I have changed the parseLine()
method as follows:
{{{
private boolean parseLine(BufferedReader br) throws IOException,
        RDFParseException, RDFHandlerException {
    // [...]
    try {
        // [...]
        // notifyStatement moved into try block
        notifyStatement(sub, pred, obj, graph);
    } catch (EOS eos) {
        reportFatalError("Unexpected end of line.", row, col);
        throw new IllegalStateException();
    } catch (IllegalArgumentException iae) {
        if (!stopAtFirstError()) {
            // remove remainder of broken line
            consumeBrokenLine(br);
            // notify parse error listener
            reportError(iae.getMessage(), row, col);
        } else {
            throw new RDFParseException(iae);
        }
    }
    // [...]
}

// Reads and discards characters up to and including the next newline.
private void consumeBrokenLine(BufferedReader br) throws IOException {
    char c;
    while (true) {
        mark(br);
        c = readChar(br);
        if (c == '\n') {
            return;
        }
    }
}
}}}

It would be great if this or a similar change found its way into the
Any23 parsers.
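
One case worth checking in a helper like consumeBrokenLine() is a broken last line with no trailing newline. How the loop above behaves there depends on readChar(), which is not shown here. The following standalone sketch shows the same "discard the rest of the line" step with explicit end-of-stream handling, using only java.io; the class and method names are hypothetical and not part of NQuadsParser.

```java
import java.io.BufferedReader;
import java.io.IOException;

/** Hypothetical helper; illustrates skipping a broken line with EOF handling. */
final class BrokenLineSkipper {

    /**
     * Discards characters up to and including the next '\n'.
     * Returns true if a newline was consumed, false if the stream ended first.
     */
    static boolean skipRestOfLine(BufferedReader br) throws IOException {
        int c;
        // BufferedReader.read() returns -1 at end of stream
        while ((c = br.read()) != -1) {
            if (c == '\n') {
                return true;
            }
        }
        return false;
    }
}
```

The boolean return value lets the caller tell "continue with the next line" apart from "input exhausted", so the outer loop can terminate cleanly instead of reading past the end.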

Attachments:
RobustNquadsParser.java 16.7 KB

an...@googlecode.com

Feb 17, 2012, 5:41:55 AM
to any2...@googlegroups.com

Comment #2 on issue 221 by lewis.mc...@gmail.com: N3/NQ parsers ignoring
stopAtFirstError flag
http://code.google.com/p/any23/issues/detail?id=221

Hi Hannes,

First of all, thank you very much for reporting this.
We are now using a Jira instance to track bugs and other related issues;
you can find it here [1]. Would you be kind enough to open an issue over
there and attach your contribution as a patch, please?

Thanks again.

[1] https://issues.apache.org/jira/browse/ANY23

an...@googlecode.com

Feb 17, 2012, 7:18:47 AM
to any2...@googlegroups.com

Comment #3 on issue 221 by hannes.m...@gmail.com: N3/NQ parsers ignoring
stopAtFirstError flag
http://code.google.com/p/any23/issues/detail?id=221

Copied to https://issues.apache.org/jira/browse/ANY23-49
