Scardf N-Triples parser

Leif Warner

Dec 29, 2010, 4:29:13 AM
to scardf
Since the stand-alone scardf was missing the ability to round-trip RDF
without being able to read N-Triples, I had a go at making a parser
for that.

I thought using Scala's parser combinators would be overkill for
something as trivial as N-Triples. There is that Scanner class in
Java, also... What I have here is perhaps going overboard with low-
level, character at a time streaming, mutable state abuse. I think I
was trying to go for performance, inline everything, and not create
any new objects unnecessarily.

But it seems to work. Just haven't finished the support for language
tags yet - I haven't seen many of those.

The parser can be downloaded from:
https://github.com/LeifW/scardf/raw/master/src/main/scala/org/scardf/Parse.scala

And here's the svn diff of graph.scala:
Index: src/main/scala/org/scardf/graph.scala
===================================================================
--- src/main/scala/org/scardf/graph.scala (revision 200)
+++ src/main/scala/org/scardf/graph.scala (working copy)
@@ -221,7 +221,10 @@
sw.toString
}

- def readFrom( r: java.io.Reader ): Graph = throw new UnsupportedOperationException()
+ def readFrom( r: java.io.Reader ): Graph = sf match {
+ case NTriple => Parse(r)
+ case _ => throw new UnsupportedOperationException()
+ }
}

abstract class SerializationFormat

A question: How do you build this?
When I type mvn compile, the only thing that gets built is
net.croz.scardf.query.NTripleHelper.class
I just made it an sbt project, and it seems to build fine - I'm more
familiar with sbt anyways.

-Leif

Hrvoje Simic

Dec 29, 2010, 6:54:50 AM
to sca...@googlegroups.com
Hi Leif,

First of all, great job! Here's my initial impression - I'll have to
take some time to try it out properly, though.

> Since the stand-alone scardf was missing the ability to round-trip RDF
> without being able to read N-Triples, I had a go at making a parser
> for that.

Don't know if you saw the swap-scala project. Dan Connolly made a
model and multiple parsers in Scala. Here's his N-Triples parser, for
reference:

http://code.google.com/p/swap-scala/source/browse/src/main/scala/ntriples.scala?r=b94a4672614017649da77ec24cf2b2772e91d8b5

> I thought using Scala's parser combinators would be overkill for
> something as trivial as N-Triples.  There is that Scanner class in
> Java, also...  What I have here is perhaps going overboard with low-
> level, character at a time streaming, mutable state abuse.  I think I
> was trying to go for performance, inline everything, and not create
> any new objects unnecessarily.

I have some personal experience with combinator parsing in 2.7. Most
recently I've been parsing 20+ GB log files and found them to be quite
fast, easy on the memory, and great for rapid development.

But N-Triples has a trivial syntax and so your approach is probably
better in the long term. I'll try it out and let you know what I
think.

> But it seems to work.  Just haven't finished the support for language
> tags yet - I haven't seen many of those.

You're an American, right? Well, they are kind of a big deal outside
of the US ;)

> A question: How do you build this?
> When I type mvn compile, the only thing that gets built is
> net.croz.scardf.query.NTripleHelper.class

*~* (or whatever is the emoticon for blushing)

You can type "mvn scala:compile". Or just download the latest version
of the POM file.

> I just made it an sbt project, and it seems to build fine - I'm more
> familiar with sbt anyways.

Never got around to learning how to use sbt. Maybe you could contribute
your sbt file/script/whatchamacallit?

Hrvoje

Leif Warner

Dec 31, 2010, 4:10:31 AM
to scardf


On Dec 29, 3:54 am, Hrvoje Simic <hrvoje.si...@gmail.com> wrote:
> Hi Leif,
>
> First of all, great job! Here's my initial impression - I'll have to
> take some time to try it out properly, though.
>
> > Since the stand-alone scardf was missing the ability to round-trip RDF
> > without being able to read N-Triples, I had a go at making a parser
> > for that.
>
> Don't know if you saw the swap-scala project. Dan Connolly made a
> model and multiple parsers in Scala. Here's his N-Triples parser, for
> reference:
>
> http://code.google.com/p/swap-scala/source/browse/src/main/scala/ntri...
>

Oh, that would've been nice... Should be easy to adapt that to build
a Scardf graph, too.
I started off thinking, "Oh, N-Triples are so simple, writing a
'parser' would be nothing.", but so far it turned out literals are
more complicated than I expected.
I noticed it would fall over on literals that had escaped quote marks
in them, because it was just looking for the next quote marks. So now
it handles escapes in strings, like unicode sequences, tabs, newlines,
quotes, etc.
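
For illustration, the escape handling can be sketched roughly like this. It is a minimal stand-alone sketch, not the code in Parse.scala: the names NTriplesEscapes and unescape are made up for the example, and it assumes the escapes listed in the N-Triples grammar (tab, newline, carriage return, quote, backslash, and the 4- and 8-digit unicode forms).

```scala
object NTriplesEscapes {
  /** Decodes the backslash escapes in the body of an N-Triples string
    * (the text between the surrounding quote marks). */
  def unescape(s: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c != '\\') {
        out.append(c)
        i += 1
      } else {
        if (i + 1 >= s.length)
          throw new IllegalArgumentException("dangling backslash")
        s.charAt(i + 1) match {
          case 't'  => out.append('\t'); i += 2
          case 'n'  => out.append('\n'); i += 2
          case 'r'  => out.append('\r'); i += 2
          case '"'  => out.append('"'); i += 2
          case '\\' => out.append('\\'); i += 2
          // 4-digit and 8-digit hexadecimal code point escapes
          case 'u'  => out.append(hex(s, i + 2, 4)); i += 6
          case 'U'  => out.append(hex(s, i + 2, 8)); i += 10
          case other =>
            throw new IllegalArgumentException("unknown escape: \\" + other)
        }
      }
    }
    out.toString
  }

  // Reads `digits` hex digits starting at `from` and returns the code point as a String.
  private def hex(s: String, from: Int, digits: Int): String = {
    if (from + digits > s.length)
      throw new IllegalArgumentException("truncated unicode escape")
    val codePoint = Integer.parseInt(s.substring(from, from + digits), 16)
    new String(Character.toChars(codePoint))
  }
}
```

The point is that the scanner can no longer stop at the next quote mark; it has to look at the character after every backslash first.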

> > I thought using Scala's parser combinators would be overkill for
> > something as trivial as N-Triples.  There is that Scanner class in
> > Java, also...  What I have here is perhaps going overboard with low-
> > level, character at a time streaming, mutable state abuse.  I think I
> > was trying to go for performance, inline everything, and not create
> > any new objects unnecessarily.
>
> I have some personal experience with combinator parsing in 2.7. Most
> recently I've been parsing 20+ GB log files and found them to be quite
> fast, easy on the memory, and great for rapid development.
>
> But N-Triples has a trivial syntax and so your approach is probably
> better in the long term. I'll try it out and let you know what I
> think.
>
> > But it seems to work.  Just haven't finished the support for language
> > tags yet - I haven't seen many of those.
>
> You're an American, right? Well, they are kind of a big deal outside
> of the US ;)
>

Yeah. I think I remember them mentioning something about that back in
school, on the existence of "other languages" out there, but I can't
quite remember all the details... :/
The language tags it'll handle now are some lowercase chars,
optionally followed by a dash and some alphaNumeric chars.
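
As a rough sketch, that rule can be written as a regular expression. The object name LangTag and the exact character classes are assumptions for illustration, not what Parse.scala does. As far as I can tell, the N-Triples grammar is slightly different: it allows any number of dash-separated subtags and only lowercase letters and digits in them.

```scala
object LangTag {
  // Lowercase letters, optionally followed by one dash and some
  // alphanumeric characters (the character classes are an assumption).
  private val TagPattern = "[a-z]+(-[a-zA-Z0-9]+)?".r

  /** True if the whole string is a language tag of the shape described above. */
  def isValid(tag: String): Boolean =
    TagPattern.pattern.matcher(tag).matches()
}
```

So "en" and "en-US" pass, while "EN" or a tag ending in a bare dash do not.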

I tested it on the AGROVOC SKOS, but all the language tags in there
just look like simply "en", "es", "sk", etc...

Speaking of testing, I made a round-trip test for the serialization,
up at
https://github.com/LeifW/scardf/raw/master/src/test/scala/org/scardf/serialize.spec.scala

Nothing blows up if I run mvn test, so I think it works?

I ran into an issue with what the parser was expecting from the
serialized N-Triples, though. It complained about expecting an EOL
character at the end of the file...
The N-Triples BNF does say all lines must end with an eoln, so I added
one.
http://www.w3.org/TR/rdf-testcases/#ntriples

Just added "+ '\n'" on line 214:
Index: graph.scala
===================================================================
--- graph.scala (revision 201)
+++ graph.scala (working copy)
@@ -211,7 +211,7 @@
}

def write( g: Graph, w: java.io.Writer ): Unit = sf match {
- case NTriple => w write g.triples.map{ _.rend }.mkString( "\n" )
+ case NTriple => w write (g.triples.map{ _.rend }.mkString( "\n" ) + '\n')
case _ => throw new UnsupportedOperationException()
}

> > A question: How do you build this?
> > When I type mvn compile, the only thing that gets built is
> > net.croz.scardf.query.NTripleHelper.class
>
> *~* (or whatever is the emoticon for blushing)
>
> You can type "mvn scala:compile". Or just download the latest version
> of the POM file.
>
> > I just made it an sbt project, and it seems to build fine - I'm more
> > familiar with sbt anyways.
>
> Never got around to learning how to use sbt. Maybe you could contribute
> your sbt file/script/whatchamacallit?
>

Sure, I've got all these changes on github,
https://github.com/LeifW/scardf
The sbt config is in the /project directory, with
/project/build.properties containing the project name and version number,
and /project/build/ScardfProject.scala having the dependencies.

You wouldn't happen to know anything about setting up a maven / ivy
repo for this, would you?

-Leif

Hrvoje Simic

Dec 31, 2010, 1:52:01 PM
to sca...@googlegroups.com
>> Don't know if you saw the swap-scala project. Dan Connolly made a
>> model and multiple parsers in Scala.
> Oh, that would've been nice...  Should be easy to adapt that to build
> a Scardf graph, too.

Probably. I haven't looked at it in detail. There have been no updates
since February, and Connolly left W3C in June. The code is MIT-licensed,
so there shouldn't be a problem adopting parts of it.

> Speaking of testing, I made a round-trip test for the serialization...


> Nothing blows up if I run mvn test, so I think it works?

Umph, not quite. Pure org.scardf has an incomplete implementation of
the is-isomorphic-to method (=~). Currently, if it returns true that
doesn't mean the graphs are isomorphic for sure (false does mean
they're not). I've been working on a full RDF isomorphism algorithm in
Scala but I've never finished it.

The workaround is to use Jena. If both A and B are JenaGraphs, then
A=~B will delegate to Jena's isIsomorphicWith method, which is tried
and true. So I do a conversion like this:

val g = new jena.JenaGraph ++ Doe.graph

And your mustVerify test now really works!

I tried to make a ScalaCheck test for it, but I got stuck. I'll try to
make additional tests, ScalaCheck or not. Also, it would be good to
add blank lines and comment lines so that it becomes an
almost-complete implementation. It got me thinking - it could be used
as a low-tech bridge between frameworks.

> The N-Triples BNF does say all lines must end with an eoln, so I added
> one.

OK, great.

>> Never got around to learning how to use sbt. Maybe you could contribute
>> your sbt file/script/whatchamacallit?
>
> Sure, I've got all these changes on github,
> https://github.com/LeifW/scardf
> The sbt config is in the /project directory, with
> /project/build.properties containing the project name and version number,
> and /project/build/ScardfProject.scala having the dependencies.

I will check it out.

> You wouldn't happen to know anything about setting up a maven / ivy
> repo for this, would you?

No, not really. Are you talking about setting up a private Maven
repository to host scardf, publishing scardf to a well-known
repository, or something else?

cheers,
Hrvoje

Hrvoje Simic

Dec 31, 2010, 2:18:16 PM
to sca...@googlegroups.com
> Also, it would be good to
> add blank lines and comment lines so that it becomes an
> almost-complete implementation.

Just so you know, I've just added blank lines and comments to the parser.

Hrvoje Simic

Jan 3, 2011, 7:47:07 AM
to sca...@googlegroups.com
Committed. I've also added support for non-EOL'd documents and additional tests:

http://code.google.com/p/scardf/source/detail?r=202
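
To give an idea of what that covers, here is a stand-alone sketch, not the committed code. The names NTriplesLines, isSkippable and tripleLines are hypothetical, and it assumes a comment line is one whose first non-whitespace character is '#'.

```scala
object NTriplesLines {
  /** True for lines a parser should skip: blank lines and '#' comment lines. */
  def isSkippable(line: String): Boolean = {
    val trimmed = line.trim
    trimmed.isEmpty || trimmed.startsWith("#")
  }

  /** Splits a document into the lines that should hold triples.
    * The last line does not need a terminating end-of-line. */
  def tripleLines(doc: String): List[String] =
    doc.split("\r\n|\r|\n").toList.filterNot(isSkippable)
}
```

Each remaining line would then be handed to the triple parser on its own.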

Leif Warner

Jan 3, 2011, 10:08:36 AM
to sca...@googlegroups.com
Cool!  Ah, this whole time I was thinking that by "add blank lines and comment lines" you meant add comments and formatting to the Scala source code... and somehow I went ahead and added support for parsing blank lines and comment lines without realizing that's what you were talking about. *facepalm*  

About writing out the serialized form; I was thinking, would it be more stream-like to write a triple at a time rather than using mkString?
Say line 217 could instead say something like
case NTriple => g.triples.foreach{ t=> w write t.rend }
and the '\n' could be part of the triple's rend method?
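
A self-contained sketch of that idea, assuming each triple has already been rendered to a string. It does not use scardf's Graph or rend, and StreamingWrite and writeLines are hypothetical names.

```scala
import java.io.Writer

object StreamingWrite {
  /** Writes one rendered triple per line, each followed by an end-of-line,
    * without first building the whole document as a single string. */
  def writeLines(rendered: Iterator[String], w: Writer): Unit =
    rendered.foreach { line =>
      w.write(line)
      w.write("\n")
    }
}
```

Written this way, every line ends with an eoln, the last one included, so there is no separate "+ '\n'" at the end.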

I was just thinking it'd be nice to have scardf on a private repo for projects that depend on it.  I looked into it a little, and apparently you just need access to a directory on an http server over "SCP, SFTP, FTP, WebDAV, or the filesystem" that'll host the pom and jar files generated with "mvn deploy".
I found this blog entry on using google code's svn repo as a maven repo, pushing over WebDAV:

But with the build tool I'm using at the moment (SBT), I can also just directly specify the URL of a jar file as a dependency, which I'm doing right now for scardf.  In that case it just doesn't read in the pom.xml and resolve the transitive dependencies, like Joda-Time, so I list that dependency in my project, too.
-Leif