Project Status and Futures?

Gary Kopp

Sep 9, 2021, 2:56:14 PM
to HyperGraphDB
I'm seriously considering developing an open-source hypergraph-based application. HypergraphDB looks to be a perfect foundation for this application, even though I would need to extend and polish it. Although I'm extremely impressed with HypergraphDB's architecture and the quality of the code, I note that development appears to be stalled. There is a 1.4 branch that has been open for some time but has not been merged into the master as yet. How would you characterize the current status and future of this project? Would I be likely to encounter serious issues if I tried to make the HypergraphDB code base production-ready? BTW, I'm only interested in the core, storage, P2P, management, and Prolog components; the other HypergraphDB components aren't applicable to my project.

One issue in particular has me confused about the project's status. Issue #134 was opened over a year ago, concerning replacing the use of Berkeley DB with LMDB, and is unassigned. However, there is an LMDB implementation in the HypergraphDB distribution. Is the LMDB implementation ready for use?

Any and all comments would be appreciated.

--Gary

Borislav Iordanov

Sep 13, 2021, 3:06:16 AM
to HyperGraphDB
Hi Gary,

Let me answer briefly and then if you want to talk further, drop me a
note privately and we could have a chat.

HyperGraphDB has been in slow maintenance mode for years already, for
many reasons that are probably irrelevant. However, it is still
actively being used, certainly by myself in a commercial, production
setting and by some others as well. I haven't had the time to make a
serious push, but given resources (e.g. someone getting serious about
putting in some man hours, or perhaps paying for its development) I
certainly would. I think the whole industry around knowledge graphs
that's booming at the moment will eventually come to its senses and
evolve into a hypergraph-based model. It's hard to say what the future
of HGDB will be, because I've been wanting to make serious progress
for years and have been saying so, but have been consistently
distracted by other priorities.

The 1.4 branch is pretty stable and used in production. There has been
no release simply because I haven't bothered to make one. I've just
been fixing bugs there and creating snapshot builds.

There was an LMDB implementation done by Alain Picard and shared at
some point, but it's never been properly integrated into the main
project (properly tested, etc.). These days there seem to be some
other attractive alternatives, like RocksDB for example.

Of the modules you mentioned, I should note that while the Prolog
implementation works, it has its limitations, especially when it comes
to running transactions concurrently (i.e. it doesn't work in that
context). The P2P module works, but looking back at the API design,
it's not so easy to work with; one might consider keeping the
foundation but simplifying the high-level protocol (which was
originally inspired by the KQML agent communication language that no
one cares about anymore).

Anyway, if you are serious about putting in time, as I said, drop me
an email privately and let's talk.

Best,
Boris



--

"Damn! The world is big!"

-- Heleni Daly

Gary Kopp

Sep 13, 2021, 6:11:31 PM
to HyperGraphDB
Hello Boris,

Thanks for the thorough and informative response. Let me think about this for a while. I just might want to invest time in this project. I'll contact you privately as my plans firm up. One thing I have to work out in my mind is how one might create a hypergraph "schema," or in other words how to approach the design of a hypergraph solution. At the moment I'm experimenting with object-role modeling, which has a pretty natural translation to a hypergraph, but I'm early in that analysis.
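To make that last point concrete, my working assumption is that each ORM fact type maps to a single n-ary link whose targets play the fact type's roles. A rough sketch against the stock HGDB Java API (the fact type and names are made up for illustration, and I may well be misusing HGValueLink):

    import org.hypergraphdb.HGEnvironment;
    import org.hypergraphdb.HGHandle;
    import org.hypergraphdb.HGValueLink;
    import org.hypergraphdb.HyperGraph;

    public class OrmToHypergraphSketch {
        public static void main(String[] args) {
            // Open (or create) a graph instance at the given location.
            HyperGraph graph = HGEnvironment.get("/tmp/orm-demo");
            try {
                // Each ORM object instance becomes an atom...
                HGHandle alice = graph.add("Alice");
                HGHandle acme = graph.add("Acme Corp");
                HGHandle engineer = graph.add("Engineer");
                // ...and the ternary fact type "Person works at Company in Role"
                // becomes one 3-ary link whose payload names the fact type.
                graph.add(new HGValueLink("worksAt", alice, acme, engineer));
            } finally {
                graph.close();
            }
        }
    }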

BTW, in case you are not aware of it, there is a relatively new product, TypeDB (previously Grakn), that is based on hypergraphs. I'm not seriously considering it for anything at this time because of a number of limitations it currently has, but it does appear it might have a bright future.

--Gary

Borislav Iordanov

Sep 15, 2021, 1:30:53 AM
to HyperGraphDB
Yes, agreed - there is a need for a schema, a proper formal model, etc.
That's one takeaway after years of using HGDB with its original API,
which very much wanted to avoid creating a formal model, but rather to
be truly multi-model by incorporating other meta-models as modules -
hence you have the ability to have an RDF triplestore that you query
via a Prolog interpreter and integrate that with some vanilla JSON
data, writing an HGDB query that spans all those models or talking to
each independently. That actually works at the moment, but it's less
practical.

Yeah, I'm pretty well aware of TypeDB as I'm very much an integral
part of its origin story, having suggested to the team there that they
look into Topic Maps as an OWL alternative and also having led the
team for the very first version of Grakn (it used to be called
Mindmaps, then Grakn and now TypeDB).

Hope you find motivation to take on the project, would be really cool
to revive it.

Best,
Boris

Gary Kopp

Sep 15, 2021, 12:38:55 PM
to HyperGraphDB
I don't seem to have permission to "reply to author" (private replies) in this group. I'd like to take this conversation offline. How do I do that?
--Gary

Borislav Iordanov

Sep 15, 2021, 12:40:28 PM
to HyperGraphDB
Copy & paste the email address from the "from" field? :)

I can see your email address this way...

Gary Kopp

Sep 16, 2021, 9:43:28 AM
to HyperGraphDB
I did send you an email (from my primary email, not my Google account email). I hope you received it.

Axiatropic Semantics

Sep 17, 2021, 2:07:51 PM
to HyperGraphDB
I followed this thread with interest. I'd like to make a comment which I think is relevant and then ask a question. NB: I'm especially interested in a C++ hypergraph db engine which would be easy to incorporate into other projects (ideally developers would need little more than to include the db files in their code base).

AFAICT the distinguishing feature of a hypergraph db schema (compared with other graph database architectures) would be that information could be "extracted" from a node in numerous different ways. E.g., if x is a node and y is a "field" of some kind (person.name and such), then "y" could be a "projection" in the sense of HGAbstractCompositeType.Projection, or a "property" in the Neo4j (etc.) sense, or the relation node of a triple where x is the source, or a node "contained in" x (qua hypernode), or a sibling of x in a multi-part relation (cf. Grakn). I assume that schemata should therefore distinguish each of these cases (which would be more complex than metamodels just for RDF, or just for property graphs, or for non-directed hypergraphs, etc.).

I think this also points toward possible implementations: assuming the different "flavors" of "x.y"-style associations each have a different bit pattern, the whole database could in principle be just a key-value collection. Relations could likewise be stored as hypernodes where each contained node is assigned a unique role. Such an approach might not be optimized, because for instance one may want to lay out each node's projections in contiguous memory, or index property values, or quickly traverse hypernodes used as multi-part relations. However, it would seem that the key-value model would at least provide a minimal behavioral contract for a database seeking to conform to the relevant kind of schema: that is, a hypergraph db engine should be able to retrieve values given constructions of the form "x.y", where "." is a placeholder for multiple sorts of node/value and node/node associations, and the behavioral requirements could be defined via a demonstrative key-value store for reference.

My question, then, is: have you considered codifying the different constructions whereby one node can be the basis for a pattern to extract a particular piece of information, given the diversity of strategies in a general-purpose hypergraph model for associating data with hypernodes, such that each strategy has a unique signature that could be disambiguated amongst keys in a key-value store (modulo individuation of string/numeric labels, so that each key is unique even across multiple association strategies, assuming its label is unique in context)? Also, how much performance fine-tuning might be necessary on top of a basic key-value strategy? Currently I'm working on an implementation along these lines using Tkrzw (the successor to Tokyo Cabinet) and would be interested in supporting a canonical key-encoding format if one exists (or could be standardized) for hypergraph db engines built in different environments (programming languages, back-ends, etc.).
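To illustrate what I mean by a unique signature per association flavor, here is a toy sketch (in Java rather than C++, to match the rest of the project; the tag values and the string-based key encoding are made up for demonstration, not a proposal for an actual format):

    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    // Toy key encoding: a one-byte tag distinguishes the "flavor" of an
    // "x.y" association, so lookups with different semantics never collide.
    public class KeyEncodingSketch {
        static final byte PROJECTION = 0x01; // typed field, HGAbstractCompositeType.Projection style
        static final byte PROPERTY   = 0x02; // Neo4j-style ad hoc property
        static final byte TRIPLE     = 0x03; // x --relation--> object
        static final byte CONTAINS   = 0x04; // y contained in hypernode x
        static final byte ROLE       = 0x05; // y plays a named role in a multi-part relation with x

        static String encode(byte flavor, String nodeId, String label) {
            // A real engine would use length-prefixed binary keys, not strings.
            return nodeId + '\u0000' + (char) flavor + label;
        }

        public static void main(String[] args) {
            // Stand-in for Tkrzw or any other key-value back-end.
            Map<String, byte[]> store = new HashMap<>();
            store.put(encode(PROPERTY, "node:42", "name"), "Alice".getBytes(StandardCharsets.UTF_8));
            store.put(encode(TRIPLE, "node:42", "worksAt"), "node:99".getBytes(StandardCharsets.UTF_8));
            // The same label under a different flavor is a different key:
            System.out.println(store.containsKey(encode(PROPERTY, "node:42", "worksAt"))); // false
        }
    }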

Thanks!      

Alain

Sep 19, 2021, 1:13:32 PM
to hyperg...@googlegroups.com
Hi,

Just catching up on emails and noticed this thread. I would like to mention that we are still very actively using HypergraphDB. On our end we use it as a persistence layer on top of Eclipse EMF/CDO.

As for the LMDB storage, we did put that together, as well as the Java Berkeley version a long time ago.

At some point we moved off LMDB and switched to libmdbx (https://github.com/erthink/libmdbx), which is essentially a drop-in replacement for LMDB but which we have found to be much more stable; it also addresses a number of LMDB's shortcomings.

In the last year or so we have added support for schema migration in HG, mainly focused for us on migration of EMF models.

So very much still in use in commercial deployment.

Alain

Alain

Sep 20, 2021, 6:47:15 AM
to hyperg...@googlegroups.com
BTW, I forgot to mention two other important feature changes:
- Multi-tenant, data-level encryption support. This allows encrypting the data, possibly with different keys per tenant, while still allowing access to the metadata unencrypted.
- Support for full auditing and branching. This is partly tied to EMF/CDO but is a general approach that can easily be supported in "native" mode. It works with VersionedAtom and, if enabled, affects the operation of pretty much every query; it also defines how saves and deletes operate at a very low level.

HTH,
Alain

Borislav Iordanov

Sep 27, 2021, 3:31:43 AM
to HyperGraphDB
Hi there,

C++ would be desirable as it would open up the possibility of using it
outside the JVM, for sure. And of course, it would potentially be more
performant.

I think your view on what information is carried by a node, how it is
to be extracted, and how that relates to schemata is right. Given a
schema language for a graph database where you can model something
either as a property of a node or as a relationship with another node,
it's not easy to make that choice and it feels quite artificial. RDF
doesn't have that problem, but of course that simplicity in the model
yields complexity in its use (because everything has to be a node).

So yeah, the details matter of course, and IMO it'll be some work to
come up with the right abstraction to cover this sort of flexibility,
but your vision seems on the right track. There is the notion of
"hyper traversal" in HGDB which still has to be developed, and there
is the notion of complex types with projections, but those are not
part of the graph proper and they need to be. I think there needs to
be a way to look at a node as a strongly typed value with properties,
or as a mini-graph (a la RDF) with nodes and values, and one should be
able to take those perspectives at will. And then there is the idea of
hypernodes, which are about graphs-as-nodes, which also needs to be
elaborated. Trying to put all those things together conceptually in a
single model is not easy.

At the moment HGDB is built on top of a key-value store, which gives
great flexibility to tinker on top of it, but it suffers quite a bit
performance-wise. I'm not sure I understand your idea of "codifying",
but there is no universal way to do it - there is a storage layer on
top of the key-value store, which is a sort of primitive graph, and
then the actual model is built on top of that. Types of nodes and
edges are also represented in the graph, so that's a codifying
mechanism. But there is no language or even a single API to "codify"
stuff, and over the years I've come to realize the limitations of some
of the interfaces that are very core to the framework (e.g. the
HGAtomType interface); given the opportunity I'd like to change that.
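
Very roughly, the layering I mean could be pictured with a toy sketch like this (illustrative only - this is not the actual HGDB storage interface):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.UUID;

    // A "primitive graph" layered on a plain key-value map: handles point
    // either to raw data or to arrays of other handles (links).
    public class PrimitiveGraphSketch {
        private final Map<UUID, UUID[]> links = new HashMap<>(); // handle -> target handles
        private final Map<UUID, byte[]> data  = new HashMap<>(); // handle -> raw value bytes

        UUID storeData(byte[] bytes) {
            UUID handle = UUID.randomUUID();
            data.put(handle, bytes);
            return handle;
        }

        UUID storeLink(UUID... targets) {
            UUID handle = UUID.randomUUID();
            links.put(handle, targets);
            return handle;
        }

        UUID[] getLink(UUID handle) { return links.get(handle); }
        byte[] getData(UUID handle) { return data.get(handle); }
    }

The actual model (types, indices, querying) is then built on top of that kind of primitive.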

In short, if I understand you correctly, HGDB is doing some of what
you are describing, but not all. And I agree it would be nice to do
all of it :)

Cheers,
Boris

Axiatropic Semantics

Sep 27, 2021, 2:24:40 PM
to hyperg...@googlegroups.com
Here are a few more comments for this fascinating discussion (thanks for your detailed reply): I envision a "schema" as something like what "shape constraint" languages try to do for RDF. I agree with and like the idea of "looking at" a node in different ways, e.g. as a larger structure with properties, or as a single unit at the center of a (maybe shape-constrained) neighborhood, or (say) as a container for a subgraph. I'd also add: why not be able to look at a node as a list, or another kind of array-like data structure (stack, queue)?

In general, consider the use cases for "attaching" information to a node. Do I have certain named fields which (for a given type) should always be defined/accessible from that node (maybe defined as a null value, but querying the node for the field by its label should always succeed)? For such info the node would seem to be, in effect, a tuple of values (e.g., a simple kind of nested graph). Or perhaps one wants to assert metadata which may be applicable only to some subset of a type's instances, which is somehow outside the type's natural logic, which is not part of instances' serialization, etc. This would seem to call for properties. Or there's data connected to the node where the connection itself has additional structure -- maybe it's not just a string label but something enumerated by a controlled vocabulary (or ontology, etc.) and/or it's part of a multi-sided relation. This would seem to call for a subject/relation/object triple, where the relation can perhaps be reified to a node if needed.

Conceptually the cases seem different to me, and maybe that can be part of the semantics as well (e.g. queries could return different "null"-like values for absent properties vs present ones with null values, but just use a single null value for tuple members).  Tuples could be generalized to resizable arrays (stacks, queues, and so forth) and maybe nested graphs.  My point is these various constructions are not redundant; they seem to serve different use-cases and might coexist in a single graph.  
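As a toy example of the distinction I have in mind for properties (tuple members would instead collapse both cases to a single null):

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: a lookup result that distinguishes "no such property"
    // from "property present but set to null".
    public class NullSemanticsSketch {
        enum Status { ABSENT, NULL_VALUE, PRESENT }

        static Status propertyStatus(Map<String, Object> properties, String label) {
            if (!properties.containsKey(label)) return Status.ABSENT;
            return properties.get(label) == null ? Status.NULL_VALUE : Status.PRESENT;
        }

        public static void main(String[] args) {
            Map<String, Object> props = new HashMap<>();
            props.put("nickname", null);
            System.out.println(propertyStatus(props, "nickname")); // NULL_VALUE
            System.out.println(propertyStatus(props, "shoeSize")); // ABSENT
        }
    }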

The reason I spoke of "codifying" is with the hope that different (hyper)graph engines even with divergent implementations could support a common set of data models (common query semantics, schema-checking options, and so forth) which recognize these various constructions.  That could then serve as a target for new implementations.  I.e., if I'm writing a new db engine which supports properties, hypernodes, multi-relations with roles, nested graphs, etc., then here's what my properties need to do, here's what my hypernodes need to do, etc.  I'd rather design these requirements around a common hypergraph model than implement something with idiosyncratic features that doesn't translate to other solutions (for developers that want to reuse query code, or simply apply their existing knowledge to new implementations, etc.).   

Cheers ...

Borislav Iordanov

Sep 27, 2021, 5:55:30 PM
to HyperGraphDB
Hi Alain,

Really happy to hear that :) ! libmdbx sounds amazing, and the
announced C++17 rewrite sounds like what Axiatropic Semantics is
looking for as a key-value store foundation.

Cheers,
Boris


Borislav Iordanov

Oct 15, 2021, 12:38:20 AM
to hyperg...@googlegroups.com
Hi,


On Mon, Sep 27, 2021 at 2:24 PM Axiatropic Semantics <axiatropic...@gmail.com> wrote:
> Here are a few more comments for this fascinating discussion (thanks for your detailed reply): I envision a "schema" as something like what "shape constraint" languages try to do for RDF. I agree with and like the idea of "looking at" a node in different ways, e.g. as a larger structure with properties, or as a single unit at the center of a (maybe shape-constrained) neighborhood, or (say) as a container for a subgraph. I'd also add: why not be able to look at a node as a list, or another kind of array-like data structure (stack, queue)?

Sure. Fundamentally what HGDB does is make this sharp distinction between values and atoms - only the latter are first-class citizens in the graph, even though the structure of values can be arbitrarily complex and they form their own low-level graph in a sense. And other models do something similar - the value, the payload of a node or an edge, is not part of the graph. In the case of the so-called property graphs, it's just a key-value map. Doesn't seem like too much thought went into designing that. RDF is more promising of course because it starts with a very raw model, and now RDF* is starting to move in the right direction; it's a simple step and it'll probably take a lot more to increase the modeling capabilities, but the promising bit is that you always stay in the graph - there are not two separate layers of data, one for complex values and another for the graph itself.


> In general, consider the use cases for "attaching" information to a node. Do I have certain named fields which (for a given type) should always be defined/accessible from that node (maybe defined as a null value, but querying the node for the field by its label should always succeed)? For such info the node would seem to be, in effect, a tuple of values (e.g., a simple kind of nested graph). Or perhaps one wants to assert metadata which may be applicable only to some subset of a type's instances, which is somehow outside the type's natural logic, which is not part of instances' serialization, etc. This would seem to call for properties. Or there's data connected to the node where the connection itself has additional structure -- maybe it's not just a string label but something enumerated by a controlled vocabulary (or ontology, etc.) and/or it's part of a multi-sided relation. This would seem to call for a subject/relation/object triple, where the relation can perhaps be reified to a node if needed.

All of this is already possible in HGDB (and let me know if I’m missing or misunderstanding something) because you can create types and type constructors to do pretty much anything. 

> Conceptually the cases seem different to me, and maybe that can be part of the semantics as well (e.g. queries could return different "null"-like values for absent properties vs present ones with null values, but just use a single null value for tuple members). Tuples could be generalized to resizable arrays (stacks, queues, and so forth) and maybe nested graphs. My point is these various constructions are not redundant; they seem to serve different use-cases and might coexist in a single graph.

> The reason I spoke of "codifying" is with the hope that different (hyper)graph engines even with divergent implementations could support a common set of data models (common query semantics, schema-checking options, and so forth) which recognize these various constructions. That could then serve as a target for new implementations. I.e., if I'm writing a new db engine which supports properties, hypernodes, multi-relations with roles, nested graphs, etc., then here's what my properties need to do, here's what my hypernodes need to do, etc. I'd rather design these requirements around a common hypergraph model than implement something with idiosyncratic features that doesn't translate to other solutions (for developers that want to reuse query code, or simply apply their existing knowledge to new implementations, etc.).

Right, I agree with that. And a sufficiently complex metamodel will afford many different possible implementations with many different characteristics. Defining a model and then allowing different implementations to be swapped at will, based on use cases and the desired performance profile, would be a killer of course. That technology doesn't exist today and it would be revolutionary. One strategy for this might be to create a library of high-level storage primitives (key-value stores, graph stores, tabular stores, bitmaps, whatever), a sort of "assembly" language for managing data using those storage primitives, then create a very high-level generalized hypergraph model and allow implementations to be produced using that assembly language and those primitives. It would have to be an open system, with the ability to easily interject at different layers in a relatively localized way. I'm not super familiar with the latest compiler technology, but it's been maturing this way, with highly modularized pieces in the compiler pipeline. Except that, because languages are fairly static and their compilation converges towards a single optimization strategy, not much is done beyond maybe making it easier to implement new languages.

The above sounds highly unrealistic, but maybe it’s just so until one tries to actually build it and then it becomes within reach :) 
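
Just to gesture at what I mean by storage primitives and an "assembly" layer, a purely hypothetical sketch (none of these interfaces exist in HGDB or anywhere else; the names are invented):

    import java.util.Iterator;

    // Hypothetical catalog of storage primitives from which a higher-level
    // hypergraph model could be assembled.
    interface KeyValueStore<K, V> {
        void put(K key, V value);
        V get(K key);
    }

    interface GraphStore<H> {
        H addLink(H[] targets);
        H[] targets(H link);
        Iterator<H> incident(H atom); // links pointing at a given atom
    }

    interface TabularStore<R> {
        void append(R row);
        Iterator<R> scan();
    }

An implementation of the high-level model would then pick and combine whichever primitives fit the desired performance profile.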

Best,
Boris

Zach Cuneo

Oct 15, 2021, 11:55:06 AM
to hyperg...@googlegroups.com
Hi all,

I've been on a bit of an exploratory journey this year in search of a simple, flexible data model that maps well to conceptual/domain modeling. The HGDB paper and project have been insightful and inspiring, so thank you for that!

> In the case of the so-called property graphs, it's just a key-value map. Doesn't seem like too much thought went into designing that.

Out of curiosity, has anybody read the Algebraic Property Graphs paper? Despite the title, HGDB is cited and hyper(graphs/elements) are discussed. https://arxiv.org/abs/1909.04881

There's also a presentation on YouTube where the authors go over motivations and the paper: https://www.youtube.com/watch?v=W3rpPnhw-nM

Frankly, I do not have the formal background to fully appreciate the math and implications, so I'm curious to hear your thoughts (if APGs are even applicable here).

Thank you,
Zach


Borislav Iordanov

Oct 16, 2021, 2:27:57 AM
to hyperg...@googlegroups.com
Hi Zach,

Yes, I have read the APG paper, thanks for bringing it up. It is very relevant as a solid formal foundation. I regret not being familiar enough with category theory to appreciate the benefits of using it in that context, though I think more and more mathematicians are exploring that natural connection between graphs and categories, for obvious reasons. And I've had a few brief interactions with one of the authors (Joshua) in the context of designing a schema language for property graphs.

I've been saying this for years :), but sooner or later people will realize that hypergraphs are really the way to go for data management (= modeling + storage + querying + etc.). For the moment, a paper with a solid theory about generalized graphs has to use words like "labeled" and "property" in its title. But the essence of it hasn't much to do with property graphs. And I think the "labeled" part could be dispensed with as well.

But yeah, I think that could be a great resource/starting point  to develop a hypergraph data model.

In short, the ideas in the APG paper are certainly applicable and that type of approach is probably the right way to formalize anything hypergraph-based. But before that, I think it would be more productive to take a user perspective and think about what types of constructs and features would be desirable. As a long-time HGDB user, my list of foundational items that I would like to see in a new and improved model is actually not very long:

1) Handling of nested graphs (an atom as a graph) efficiently, at the storage level.
2) Better handling of sub-graphs (they are an afterthought at the moment), also at the primitive storage level.
3) Lifting the storage graph (a.k.a. the value graph) to the atom level, or at least making it possible for the type system to map a graph to a complex value, so that one can see a complex value as a graph or a graph as a complex value. This is simpler than it sounds: at the moment, the way to get it is to have all the attributes of a record be atoms themselves (see the sketch right after this list).
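
To illustrate item 3 with today's API: instead of storing a record as one opaque value, every attribute is added as its own atom and a link ties them together. A minimal sketch (the names are made up; what I'd like is for the type system to do this mapping automatically, so the same data can be read back either as a plain value or as a graph):

    import org.hypergraphdb.HGEnvironment;
    import org.hypergraphdb.HGHandle;
    import org.hypergraphdb.HGValueLink;
    import org.hypergraphdb.HyperGraph;

    public class RecordAsGraphSketch {
        public static void main(String[] args) {
            HyperGraph graph = HGEnvironment.get("/tmp/record-demo");
            try {
                // Today: to see a record's structure in the graph proper,
                // each attribute has to be added as its own atom...
                HGHandle name = graph.add("Alice");
                HGHandle age = graph.add(42);
                // ...and the record itself becomes a link over those atoms.
                graph.add(new HGValueLink("Person", name, age));
            } finally {
                graph.close();
            }
        }
    }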

Those are things that I’ve come across again and again. 

There are many other topics that can be opened for discussion, for example immutability, versioning…. 

Btw, security - storage-level encryption in particular - has become quite important recently, and IMO a fair amount can be done without rearchitecting at all, since all primitive types are customizable and one could plug in versions that encrypt and decrypt the data. But I digress... that's more of a concern for us poor souls who work in the enterprise world :)
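
As a sketch of the idea - PrimitiveCodec below is a made-up stand-in for a primitive type's byte-level serialization hook, not the real HGAtomType API, so read it as shape only:

    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.security.SecureRandom;
    import java.util.Arrays;

    // Hypothetical stand-in for a primitive type's byte-level codec.
    interface PrimitiveCodec {
        byte[] toBytes(Object value);
        Object fromBytes(byte[] bytes);
    }

    // Decorator that encrypts whatever the wrapped codec produces (AES-GCM,
    // fresh IV per value), e.g. with a per-tenant key.
    class EncryptingCodec implements PrimitiveCodec {
        private final PrimitiveCodec inner;
        private final SecretKeySpec key;
        private final SecureRandom random = new SecureRandom();

        EncryptingCodec(PrimitiveCodec inner, byte[] rawKey) {
            this.inner = inner;
            this.key = new SecretKeySpec(rawKey, "AES");
        }

        public byte[] toBytes(Object value) {
            try {
                byte[] iv = new byte[12];
                random.nextBytes(iv);
                Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
                c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
                byte[] ct = c.doFinal(inner.toBytes(value));
                // Prepend the IV so decryption is self-contained.
                byte[] out = new byte[iv.length + ct.length];
                System.arraycopy(iv, 0, out, 0, iv.length);
                System.arraycopy(ct, 0, out, iv.length, ct.length);
                return out;
            } catch (Exception e) { throw new RuntimeException(e); }
        }

        public Object fromBytes(byte[] bytes) {
            try {
                Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
                c.init(Cipher.DECRYPT_MODE, key,
                       new GCMParameterSpec(128, Arrays.copyOfRange(bytes, 0, 12)));
                return inner.fromBytes(c.doFinal(Arrays.copyOfRange(bytes, 12, bytes.length)));
            } catch (Exception e) { throw new RuntimeException(e); }
        }
    }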

Best,
Boris
