Hi Stephen,
I hadn't committed anything before I broke things by pulling in MemorySailStore and related classes without properly integrating them, but I will put the repo out there once the tests pass. In the meantime, I can tell you a little about the approach:
1) It is a straightforward mapping from RDF 1.1 to property graphs and back, similar to the old SailGraph mapping [1] but with some small differences. Instead of vertex "kind" (URI, BNode, Literal) we have vertex labels (IRI, BNode, Literal). Every statement is an edge, and the label of the edge is the full IRI of the predicate. For example:
ex:Arthur foaf:name "Arthur Dent"
becomes an edge from IRI vertex Arthur to Literal vertex "Arthur Dent", with the label "http://xmlns.com/foaf/0.1/name". If the statement is in a named graph rather than the default graph, the edge has a "context" property with the IRI of the graph as its value. The Literal vertex "Arthur Dent" also has an IRI (String)-valued "datatype" property, and other literals may have a "language" property.
As in SailGraph, the "value" property is used for the string value of a vertex (IRI or literal value, BNode id).
2) Every vertex is added to a value index. IRI and BNode vertices are unique in the index, while Literals with the same string value may differ by datatype or language. (S,?,?,?), (?,?,O,?) and (S,?,O,?) query patterns are relatively fast because we can look up S or O in the index. The (less common) (?,P,?,?) pattern is slow because predicates are not indexed; it requires a full scan over all statement edges. The same goes for (?,?,?,C) or (?,P,?,C) queries (slightly more common -- "give me all the statements in named graph C"). The basic strategy is to choose whichever of subject and object yields the smaller vertex iterator, build the corresponding statements, and filter the statements according to the other constraints of the query.
3) The experimental feature of statement indexes (i.e. indexes on subject-predicate-object-context patterns) is gone from this implementation, which makes query evaluation simpler and presumably faster.
4) Namespaces are not stored "in the graph" (which means we don't need a special reference vertex for them). They are a convenience that is kept in memory and disappears between sessions.
5) GraphSail again supports an optional "unique statements" constraint, for which the performance penalty should be fairly small
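To make points 1) and 2) a bit more concrete, here is a rough sketch of the mapping and the lookup strategy in plain Python. This is not the actual GraphSail code and it does not use any TinkerPop API: TinyGraph, vertex, add_statement and match are names made up for illustration, the per-vertex edge lists stand in for whatever adjacency the underlying graph provides, and http://example.org/Arthur is only an assumed expansion of ex:Arthur.

```python
from collections import defaultdict

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


class TinyGraph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.value_index = {}               # index key -> vertex properties
        self.out_edges = defaultdict(list)  # subject key -> statement edges
        self.in_edges = defaultdict(list)   # object key -> statement edges
        self.edges = []                     # every statement edge

    def vertex(self, label, value, datatype=None, language=None):
        # IRI and BNode vertices are unique by value; Literal vertices
        # are additionally distinguished by datatype and language.
        key = (label, value, datatype, language)
        if key not in self.value_index:
            props = {"label": label, "value": value}
            if datatype is not None:
                props["datatype"] = datatype
            if language is not None:
                props["language"] = language
            self.value_index[key] = props
        return key

    def add_statement(self, s, p, o, context=None):
        # One statement becomes one edge labelled with the predicate IRI.
        edge = {"out": s, "label": p, "in": o}
        if context is not None:             # only set for named graphs
            edge["context"] = context
        self.edges.append(edge)
        self.out_edges[s].append(edge)
        self.in_edges[o].append(edge)
        return edge

    def match(self, s=None, p=None, o=None, c=None):
        # None is a wildcard. Start from the smaller of the subject's and
        # object's edge lists; with neither bound, scan all edges.
        by_s = self.out_edges.get(s, []) if s is not None else None
        by_o = self.in_edges.get(o, []) if o is not None else None
        if by_s is not None and by_o is not None:
            candidates = by_s if len(by_s) <= len(by_o) else by_o
        elif by_s is not None:
            candidates = by_s
        elif by_o is not None:
            candidates = by_o
        else:
            candidates = self.edges         # full scan, e.g. (?,P,?,?)
        # Filter by the remaining constraints of the pattern.
        return [
            e for e in candidates
            if (s is None or e["out"] == s)
            and (p is None or e["label"] == p)
            and (o is None or e["in"] == o)
            and (c is None or e.get("context") == c)
        ]


g = TinyGraph()
arthur = g.vertex("IRI", "http://example.org/Arthur")
name = g.vertex("Literal", "Arthur Dent", datatype=XSD_STRING)
g.add_statement(arthur, FOAF_NAME, name)
statements_about_arthur = g.match(s=arthur)   # (S,?,?,?): index lookup
statements_with_name = g.match(p=FOAF_NAME)   # (?,P,?,?): full scan
```

In this sketch a statement in the default graph simply has no "context" key on its edge, so match(c=...) only ever returns statements from named graphs.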
Josh