What about a C++ version?


Вася Пупкин

Jan 18, 2014, 3:13:23 AM
to hyperg...@googlegroups.com
Hi Borislav! I use HyperGraphDB in my current project. It is cool, respect! But I think a C++ port would be very useful. What is the status of that work? Is anyone doing it? If not, I want to try, or at least make a start. So can you tell me what you think about a C++ version? Clearly it would differ from the Java one. For example, it should work as a standalone server, not only as an embeddable library like the Java version, so some query language would be necessary. And that is only one aspect of a C++ port. You have probably thought about features for a C++ version. Can you tell me about them?

I also need a HyperGraphDB viewer. I tried to run HGViewer, but it does not work with a version 1.2 database. For now I use my own viewer, but it is tied very closely to my own database structure, so it is a very specialized tool. Are you thinking about a next version of HGViewer? Can you tell me about it? Maybe I could make a universal viewer out of my working tool.
Regards
Eugene.  

Axiatropic Semantics

Sep 10, 2020, 9:54:15 PM
to HyperGraphDB
I've seen several posts about C++, mostly from a few years ago.  What's the status of a C++ port now?  Is anyone interested in collaborating on a C++ version of HyperGraphDB, or at least something similar?

I've been working on a Qt-oriented C++ persistence module based on WhiteDB (an embeddable pure-C database sometimes described as a "tuple store").  Qt's native binary serialization capabilities can encode many data types automatically, providing functionality analogous to HGAtomType.store.  When someone wants to query over a projection, it can be factored into a separate WhiteDB record "field" (instead of existing only in a binarized "blob").  WhiteDB records can have varying numbers of fields, so they can hold lists/vectors/stacks/queues/etc. if one wants to query individual items.

There are a lot more details, but that's the backbone; in general I think this setup provides a reasonable basis for a C++ engine structurally similar to HyperGraphDB.  Still, I'd like to ask what other approaches, if any, have been taken toward a possible C++ port.
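A minimal sketch of the record layout described above, written in plain C++17 rather than against the real WhiteDB or Qt APIs. Everything here is an assumption made for illustration: `Field`, `Record`, `Sample`, `to_blob`, `make_record` and `find_by_count` are hypothetical names, and `to_blob` is only a placeholder for whatever Qt serialization would produce.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a tuple-store record: a variable-length list of fields.
using Field = std::variant<std::int64_t, std::string, std::vector<std::uint8_t>>;
using Record = std::vector<Field>;

// Hypothetical application object to persist.
struct Sample {
    std::string label;
    std::int64_t cell_count;
};

// Placeholder serializer: the whole object becomes one opaque blob.
// (In the setup described above, Qt's binary serialization would do this.)
std::vector<std::uint8_t> to_blob(const Sample& s) {
    std::vector<std::uint8_t> blob(s.label.begin(), s.label.end());
    blob.push_back(0);  // terminator between the label and the count
    const auto count = static_cast<std::uint64_t>(s.cell_count);
    for (int shift = 0; shift < 64; shift += 8) {
        blob.push_back(static_cast<std::uint8_t>((count >> shift) & 0xFF));
    }
    return blob;
}

// Field 0 holds the blob; field 1 holds the projection (cell_count)
// factored out so that it can be queried without decoding the blob.
Record make_record(const Sample& s) {
    return Record{to_blob(s), s.cell_count};
}

// Linear scan over the projected field only; blobs are never decoded.
std::vector<std::size_t> find_by_count(const std::vector<Record>& db,
                                       std::int64_t wanted) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < db.size(); ++i) {
        if (db[i].size() < 2) continue;
        const auto* value = std::get_if<std::int64_t>(&db[i][1]);
        if (value != nullptr && *value == wanted) hits.push_back(i);
    }
    return hits;
}
```

Because a `Record` is just a vector, a list-valued member could be spread over additional fields in the same way when its individual items need to be queryable.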

Borislav Iordanov

Oct 15, 2020, 12:43:43 AM
to HyperGraphDB
Hi there,

And apologies for the delayed response. There has never been a real
attempt at a C++ version. The main motivation for a C++ version was
integration with OpenCog, but there has never been a serious push for
that.

Outside of the OpenCog context, it's a bit harder to justify a C++
rewrite given how many other, more beneficial things could be done with
the project. The main benefit would be embedding in C++-based
projects, or in other natively compiled languages (like Go maybe, I'm not
sure), but the JVM still has a much bigger user base overall.

That said, WhiteDB seems like a pretty cool project and possibly a
promising storage engine for HyperGraphDB. The only red flag I see
from the project description is the locking-based concurrency, which
is also kind of surprising for a new DB, given how much better the
MVCC approach is :)

What is your interest in a C++ move?

Cheers,
Boris

--

"Damn! The world is big!"

-- Heleni Daly

Axiatropic Semantics

Oct 19, 2020, 1:00:05 PM
to hyperg...@googlegroups.com
Thanks a lot for writing back.  My goal is (first and foremost) a combined property graph/hypergraph db engine which could be embedded in C++ applications, particularly those used for science and medicine.  I want to minimize external dependencies (OpenCog seems to have an elaborate compile process).  The db should be trivial to compile given a minimal C++ dev environment; WhiteDB and any layers I'm adding are basically self-contained (their only major prerequisite is Qt, for GUI integration).  Use-cases might be persisting application state, storing and executing computational workflows, and representing metadata about data sets.

Scientific research often involves data *sets* instead of data *bases* -- the data is stored in files with special formats, often requiring special parser/client libraries -- but a database could be used as a curation tool, keeping metadata, file lists, and versioning up-to-date.  With one co-author I'm currently writing a book to be published by Elsevier next year about "data integration for Covid-19" where we analyze Covid-19 data sets and the technologies (lab equipment and software) used to generate them.  We're planning to build an archive of republished Covid-19 data sets and intend to distribute the source code for a lightweight database engine that could be used to query and manipulate those data sets.  We're using C++ so that this engine could be available as a source-code drop-in library for C++ applications in fields like bioimaging, cytometry/microscopy, simulations, etc., where C++ is prevalent.

More broadly I'm trying to combine the features of property graphs and hypergraphs, which involves implementing something like a Gremlin Virtual Machine that would recognize hypergraph constructs (e.g. multi-vertex edges and "payloads" in the sense of serializing objects in binary data accessible through hypernodes).  There is a partial header-only C++ implementation of Gremlin, called "BitGraph" (it was an MS thesis, apparently, by an Alex Barghi who's now at the Johns Hopkins Applied Physics Laboratory) which is a good starting point. 
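To make the combination concrete, here is a minimal in-memory sketch, assuming one possible reading of the description above: nodes carry property-graph style key/value properties plus an opaque binary payload, and edges may connect any number of nodes. The type and member names (`HyperNode`, `HyperEdge`, `HyperGraph`, `incident_edges`) are hypothetical and are not taken from BitGraph, Gremlin, or HyperGraphDB.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using NodeId = std::size_t;
using EdgeId = std::size_t;
using Properties = std::map<std::string, std::string>;

// A node with property-graph style properties and an opaque serialized payload.
struct HyperNode {
    Properties properties;
    std::vector<std::uint8_t> payload;
};

// A labeled edge that may connect any number of nodes (a multi-vertex edge).
struct HyperEdge {
    std::string label;
    std::vector<NodeId> members;
    Properties properties;
};

class HyperGraph {
public:
    NodeId add_node(HyperNode node) {
        nodes_.push_back(std::move(node));
        return nodes_.size() - 1;
    }

    EdgeId add_edge(HyperEdge edge) {
        edges_.push_back(std::move(edge));
        return edges_.size() - 1;
    }

    const HyperNode& node(NodeId id) const { return nodes_.at(id); }
    const HyperEdge& edge(EdgeId id) const { return edges_.at(id); }

    // All edges that list the given node among their members (linear scan).
    std::vector<EdgeId> incident_edges(NodeId id) const {
        std::vector<EdgeId> result;
        for (EdgeId e = 0; e < edges_.size(); ++e) {
            const auto& m = edges_[e].members;
            if (std::find(m.begin(), m.end(), id) != m.end()) result.push_back(e);
        }
        return result;
    }

private:
    std::vector<HyperNode> nodes_;
    std::vector<HyperEdge> edges_;
};
```

An ordinary property-graph edge is then just the special case of a `HyperEdge` with exactly two members.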

I'd be curious whether anyone has ideas about designs for a query language that could traverse hypergraphs as well as property graphs.  Gremlin has several dozen values of "traverser state" and "steps" which transition between states.  Depending on the current state, queries can get a value, step to another state, or "call a lambda", so the idea is to model traversal and query operations in terms of these primitives (states, steps, and lambdas).  Unlike HyperGraphDB and TinkerPop, there aren't well-known JVM-like languages in the C++ context, but there are some scripting platforms explicitly designed for C++ embedding, e.g. AngelScript, ECL and Clasp.  What I've found is that a Gremlin-style VM is an elegant basis for an AngelScript interface because you only need to expose a small set of traversal primitives to the AngelScript runtime, and query scripts can navigate through a graph via these primitives.  Ideally, query VMs for different hypergraph engines could transpile so that queries written for one engine would work for others as well (which is a selling point for the suite of distinct projects which all use a TinkerPop back-end).
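As a rough illustration of the states/steps/lambdas idea, the sketch below models a traverser that is either at a vertex or at a hyperedge, two built-in steps that move between those states, and a lambda step that filters traversers. This is a toy under stated assumptions, not Gremlin's or BitGraph's actual machine: it has only two states, and `Traverser`, `Step`, `out_edges`, `members`, `filter` and `run` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy traverser state: real Gremlin-style machines have many more states.
enum class State { AtVertex, AtEdge };

struct Traverser {
    State state;
    std::size_t id;  // vertex id or edge id, depending on state
};

// Each hyperedge is the list of vertex ids it connects.
using Edges = std::vector<std::vector<std::size_t>>;

// A step maps one traverser to zero or more successor traversers.
using Step = std::function<std::vector<Traverser>(const Edges&, const Traverser&)>;

// Step: from a vertex to every hyperedge that contains it.
std::vector<Traverser> out_edges(const Edges& g, const Traverser& t) {
    std::vector<Traverser> next;
    if (t.state != State::AtVertex) return next;
    for (std::size_t e = 0; e < g.size(); ++e) {
        for (std::size_t v : g[e]) {
            if (v == t.id) {
                next.push_back({State::AtEdge, e});
                break;
            }
        }
    }
    return next;
}

// Step: from a hyperedge to every vertex it connects.
std::vector<Traverser> members(const Edges& g, const Traverser& t) {
    std::vector<Traverser> next;
    if (t.state != State::AtEdge) return next;
    for (std::size_t v : g[t.id]) next.push_back({State::AtVertex, v});
    return next;
}

// "Call a lambda": wrap a user predicate as a filtering step.
Step filter(std::function<bool(const Traverser&)> keep) {
    return [keep](const Edges&, const Traverser& t) {
        return keep(t) ? std::vector<Traverser>{t} : std::vector<Traverser>{};
    };
}

// Run a program (a sequence of steps) over an initial set of traversers.
std::vector<Traverser> run(const Edges& g, std::vector<Traverser> frontier,
                           const std::vector<Step>& program) {
    for (const Step& step : program) {
        std::vector<Traverser> next;
        for (const Traverser& t : frontier) {
            std::vector<Traverser> produced = step(g, t);
            next.insert(next.end(), produced.begin(), produced.end());
        }
        frontier = std::move(next);
    }
    return frontier;
}
```

A scripting layer would only need to see `run` and the handful of step constructors, which is the sense in which a small set of primitives could be exposed to an embedded runtime.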

Since interoperability is a goal in that sense, I'd be eager to hear feedback from developers working in a Java rather than C++ context.  If there were some language-neutral intermediate representation for hypergraph traversals then it should be possible to translate BitGraph queries into HyperGraphDB queries, e.g., and vice-versa. 

Thanks again --  Nathaniel

Borislav Iordanov

Oct 29, 2020, 2:20:41 AM
to HyperGraphDB
Hi Nathaniel,

See replies inline:

On Mon, Oct 19, 2020 at 1:00 PM Axiatropic Semantics <axiatropic...@gmail.com> wrote:
> Thanks a lot for writing back.  My goal is (first and foremost) a combined property graph/hypergraph db engine which could be embedded in C++ applications, particularly those used for science and medicine.  I want to minimize external dependencies (OpenCog seems to have an elaborate compile process).  The db should be trivial to compile given a minimal C++ dev environment; WhiteDB and any layers I'm adding are basically self-contained (their only major prerequisite is Qt, for GUI integration).  Use-cases might be persisting application state, storing and executing computational workflows, and representing metadata about data sets.

Ok, that makes sense. OpenCog does have a lot of extra complexity. So I take it that you'd want that flexible hypergraph model easily available to C++ projects. You are not looking into C++ for an ultra-high-performance, big-data, distributed engine etc., at least not as the immediate goal. That's interesting. That's more or less how HGDB has been and continues to be used so far.
 
> Scientific research often involves data *sets* instead of data *bases* -- the data is stored in files with special formats, often requiring special parser/client libraries -- but a database could be used as a curation tool, keeping metadata, file lists, and versioning up-to-date.  With one co-author I'm currently writing a book to be published by Elsevier next year about "data integration for Covid-19" where we analyze Covid-19 data sets and the technologies (lab equipment and software) used to generate them.  We're planning to build an archive of republished Covid-19 data sets and intend to distribute the source code for a lightweight database engine that could be used to query and manipulate those data sets.  We're using C++ so that this engine could be available as a source-code drop-in library for C++ applications in fields like bioimaging, cytometry/microscopy, simulations, etc., where C++ is prevalent.

So if I'm understanding correctly, you are trying to build a C++ data integration framework that is geared towards high-performance applications written in C++? That sounds like an interesting problem and a very useful product to have. Data integration is typically done for things like reporting in an enterprise setting, and linked data is gaining lots of ground there, but this seems different. I'd be curious to learn more if you have papers or online resources to share.
 

> More broadly I'm trying to combine the features of property graphs and hypergraphs, which involves implementing something like a Gremlin Virtual Machine that would recognize hypergraph constructs (e.g. multi-vertex edges and "payloads" in the sense of serializing objects in binary data accessible through hypernodes).  There is a partial header-only C++ implementation of Gremlin, called "BitGraph" (it was an MS thesis, apparently, by an Alex Barghi who's now at the Johns Hopkins Applied Physics Laboratory) which is a good starting point.

I think the idea of having a virtual machine to compile to, and potentially go down to the metal, for graph databases is fundamentally sound.  Can you elaborate on "combine features of property graphs and hypergraphs"? When I read something like this, I wonder what the fundamental model is. But I suppose you'd want the ability to specify traversal patterns in the style of Gremlin and do it over a hypergraph model, with a similar style of processing.
 

> I'd be curious whether anyone has ideas about designs for a query language that could traverse hypergraphs as well as property graphs.  Gremlin has several dozen values of "traverser state" and "steps" which transition between states.  Depending on the current state, queries can get a value, step to another state, or "call a lambda", so the idea is to model traversal and query operations in terms of these primitives (states, steps, and lambdas).

I wouldn't take the above as a very solid foundation. I am not very familiar with it, and I might be wrong, but from the hints that you are giving it seems a bit ad hoc. I think the initial sin of Gremlin (TinkerPop) was to abstract over the data structure rather than over the traversal operations. IMO, the way to go would be to design primitives that are completely independent of the structure, but that have a clean and expressive algebraic foundation, which would make it possible to optimize and rewrite queries & traversals to match the data and its structure in an optimal way. A highly abstracted notion of traversing would cover hyper-traversals. I don't know what exactly it would look like; it's hard to think in terms of operations without grounding them in some sort of structure. As an example, I'm thinking of something analogous to a standard Iterator interface, which is so powerful because it is independent of the underlying structure. It's trivial to think of an iterator as an abstract set of operations, because we are so familiar with sequential structures.
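One way to read the Iterator analogy, offered purely as an illustration and not as a design proposed in this thread: write the traversal against a single abstract operation ("give me the neighbors of this element") and let each structure supply its own adapter. In the sketch below, `reachable`, `graph_neighbors` and `hyper_neighbors` are hypothetical names, and the breadth-first search is just one example of an algorithm that never looks at the underlying structure.

```cpp
#include <cstddef>
#include <queue>
#include <set>
#include <vector>

// Structure-independent traversal: it only needs a callable that returns
// the neighbors of a node, much as an iterator only needs "next".
template <typename Node, typename NeighborsFn>
std::set<Node> reachable(const Node& start, NeighborsFn neighbors) {
    std::set<Node> seen{start};
    std::queue<Node> todo;
    todo.push(start);
    while (!todo.empty()) {
        Node current = todo.front();
        todo.pop();
        for (const Node& n : neighbors(current)) {
            if (seen.insert(n).second) todo.push(n);
        }
    }
    return seen;
}

// Adapter for an ordinary directed graph stored as adjacency lists.
std::vector<int> graph_neighbors(const std::vector<std::vector<int>>& adjacency,
                                 int v) {
    return adjacency[static_cast<std::size_t>(v)];
}

// Adapter for a hypergraph stored as a list of hyperedges: the neighbors of v
// are all other members of every hyperedge that contains v.
std::vector<int> hyper_neighbors(const std::vector<std::vector<int>>& hyperedges,
                                 int v) {
    std::vector<int> out;
    for (const auto& edge : hyperedges) {
        bool contains_v = false;
        for (int m : edge) {
            if (m == v) contains_v = true;
        }
        if (!contains_v) continue;
        for (int m : edge) {
            if (m != v) out.push_back(m);
        }
    }
    return out;
}
```

The same `reachable` call then works over either structure by passing a lambda that wraps the matching adapter; what a richer, algebraically grounded set of such primitives would look like is left open above.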
 
> Unlike HyperGraphDB and TinkerPop, there aren't well-known JVM-like languages in the C++ context, but there are some scripting platforms explicitly designed for C++ embedding, e.g. AngelScript, ECL and Clasp.  What I've found is that a Gremlin-style VM is an elegant basis for an AngelScript interface because you only need to expose a small set of traversal primitives to the AngelScript runtime, and query scripts can navigate through a graph via these primitives.  Ideally, query VMs for different hypergraph engines could transpile so that queries written for one engine would work for others as well (which is a selling point for the suite of distinct projects which all use a TinkerPop back-end).
>
> Since interoperability is a goal in that sense, I'd be eager to hear feedback from developers working in a Java rather than C++ context.  If there were some language-neutral intermediate representation for hypergraph traversals then it should be possible to translate BitGraph queries into HyperGraphDB queries, e.g., and vice-versa.


A "language-neutral representation" of hypergraph traversals is just another way of saying a "language" of hypergraph traversals :) But there are different possible hypergraph models as well. HyperGraphDB's original sin is to not commit enough to such a model and go with it fully. There are two thoroughly worked out formalisms of a hypergraph data model that I know of: one is the all but forgotten Topic Maps, and the other is PSOA https://wiki.ruleml.org/index.php/PSOA_RuleML_Bridges_Graph_and_Relational_Databases, which is more recent and actively being worked on.

Perhaps a fruitful approach would be to simplify the current HGDB model even further, get rid of types for example, and try to build other formalisms on top of that simpler model. 
 
I'm probably not helping much here. I'd be happy to create a POC with a small C++ port of HGDB, based on WhiteDB or LMDB or whatever, but the end goal and a clear conception of the ultimate architecture are important. If the goal is high-performance data integration for C++ scientific applications, that already gives quite a bit of direction.

Cheers,
Boris
