weak hash for (MongoDb like) data server

162 views
Skip to first unread message

Basile Starynkevitch

unread,
Feb 14, 2017, 10:21:48 AM2/14/17
to golang-nuts
Hello list,

I'm just starting to learn  and code in Go (but I know well C, C++, Ocaml, Scheme and I am the main designer & implementor of GCC MELT, a Lisp-y domain specific language to customize GCC).
I'm using Go 1.8rc3 on Debian/Linux/Sid/amd64 (because the plugin facilities are essential to me). I don't care yet about other platforms, but I am interested in coding something which could be compilable both with Go 1.8 native compiler
and with future GCC GO 7.0.

I did read the following thread on The feature I want most, weak https://groups.google.com/forum/#!topic/golang-nuts/MMWXRANh0-g I am familiar with GC techniques, and I do know about the SetFinalizer routine of the standard runtime package.

I want to code in Go something with would be similar to a dynamic data-like server a bit inspired by e.g. MongoDb. Nothing very extraordinary in principle!



Context & motivations

It would be a server (probably some JsonRpc one, but the exact protocol is not yet defined and is an unimportant implementation detail; I could switch to some HTTP or REST thing).
Probably each connected client would be managed by one (or several) goroutines. I don't expect a lot of simultaneous clients (I'm thinking of a dozen at most).

It would manage persistent data organized in data items. So at shutdown time it would persist the set of alive items (and all the data and items reachable from it) into e.g. some Sqlite3 database, probably in JSON format.
I don't expect a lot of data. It would easily fit into RAM on a powerful desktop machine (so think of several gigabytes at most and a few million items at most, and probably a lot less: dozens of megabytes and a few thousand of items).

Each data item has a fixed  (globally) unique id (inspired by UUID-s) which is randomly generated.
That item id is a 126 bit number, represented by two 63 bits (of course represented as uint64_t in Go) serial numbers in base 62 notation starting with an underscore then a digit, so both _136vhwQ6Q5W &
_3FirKslewA3 are serials, and their concatenation _136vhwQ6Q5W_3FirKslewA3 is the textual representation of some id; the underlying uint64_t number pair of that id is (881330831883687952,3076946490273043099)

So data items are both hash-able (their hash code is some hash of their id) and comparable (to compare two items, e.g. for sorting purposes or for dichotomical access, we would compare their id).

There exist some root data items (whose ids are publicly known, without any interaction with the server) and which are always alive.
Without loss of generality we might pretend that there is a single root item of id _123456789AB_987654321CD (so that particular root id is publicly known and is documented with the protocol) and
 that root data item is always alive.

Each data item contain data values, organized as attributes and components (both explained in more detail below) of that item, and these data values are always  immutable. So one can replace a data value by another one (e.g. in attributes or components), but cannot change the content of a given data value.

A data value is one of:
  • null (which also represent "absence" of value)
  • an integer
  • a string
  • a reference to a data item (represented by the id)
  • a tuple (finite sequence of data item references, represented by their id; repetitions are possible) of components
  • a set (ascending ordered finite sequence of data item references, represented by their id, without repetition) of elements, so membership (of some data item in some given set) can be tested dichotomically
The attributes  of a data item are conceptually a (finite) association from data items (keys) to non-nil data values (so fetching a missing attribute gives null).
The components of a data item are conceptually a (finite) vector of data values. That vector could be resized (with new slots having null by default).

A request to the data server would be an (atomicly processed) sequence (ordered) or composition of elementary operations including
  • accessing operations, notably
    • fetching a data item of given id (or null if none)
    • get the attribute in some data item with some given data item Id as key
    • get the N-th component of some data item
    • get the I-th item inside a tuple or a set
    • test membership inside a set
    • get the component size of a data item
    • get the attribute size (cardinality of attribute association) in a data item
    • etc...
  • creating operations, notably
    • make a fresh data item (giving its new id)
    • make an integer data value, or a string data value
    • make a tuple data value, from the ids of its tuple-compoents
    • make a set data value, from the sequence of ids of its set-elements
  • modifying operations, notably
    • adding (or removing) some attribute inside some data item, with a given key data item (and added data value, if relevant)
    • putting some component inside some data item
    • resizing the component vector of some data item

And that request would be processed by the server which would return some serialization of the resulting data values (with data item references in results transmitted by their id)


 Q1: How to associate ids to data items?


The id of data items is the only way to refer to data items on the protocol side. So I need a weak association between an id and the data item (if any) of that id. I do not want to keep (in some ordinary Go map, for example) the association between ids and their data items (otherwise, no data item would be garbage collected). I accept the fact that eventually (and perhaps quite later), a data item which is not reachable from the root data item would be garbage collected.

I was thinking of using unsafe.Pointer-s with runtime.SetFinalizer. A simplistic approach could be to use a Go map (module variable) from pairs of uint64_t to such unsafe pointers. At data item creation, I would register in that map, using the pair of uint64_t-s as key, the pointer of that data item suitably converted to an unsafe.Pointer. I would also pass to SetFinalizer some finalization routine which would unregister (delete) from that map. That scheme would work only if Go garbage collector does not move objects (i.e. struct-s). If Go's GC is a conservative GC à la Boehm GC, I'll probably even better "hide" the pointer (e.g. by keeping not a Pointer, but some uintptr suitably "hidden" e.g. by complementing all its bits à la GC_HIDE_POINTERS in Boehm GC). But that scheme would work only if Go's garbage collector is not moving values (e.g. struct-s). Is Go's garbage collector guaranteed to be some marking (not moving or generational copying, à la Ocaml) one? FWIW, I am familiar with the terminology of the GC handbook.

This Q1 is my main question. If you answer only to one question, please let that be Q1!


Q2: How to represent attribute associations?

I'm tempted to use a Go map with data item pointers as keys. But then I don't take advantage that ids are hashable. Maybe that is ok. Or should I find or implement my self some explicit manually coded hash table?
(I believe I could experiment both, so if you don't have time for Q2, please focus on Q1).


Q3: How to represent data values?

I'm sure to be able to find out how sum types should be represented in Go (a canonical example is of course the AST in the Go compiler itself). But if you have time, good hints are welcome. But I prefer answers to Q1 and perhaps to Q2 before that Q3.
I might have as additional wish the ability to dynamically load some plugin which would represent an additional kind of value (think of floating point, or bignums, etc...)

Thanks for reading.
Regards.

PS. Most important question is Q1.
-- 

Basile Starynkevitch http://starynkevitch.net/Basile/   - email is  basile at starynkevitch dot net

Bourg La Reine, France






Tamás Gulácsi

unread,
Feb 15, 2017, 3:01:58 PM2/15/17
to golang-nuts

The id of data items is the only way to refer to data items on the protocol side. So I need a weak association between an id and the data item (if any) of that id. I do not want to keep (in some ordinary Go map, for example) the association between ids and their data items (otherwise, no data item would be garbage collected). I accept the fact that eventually (and perhaps quite later), a data item which is not reachable from the root data item would be garbage collected.

I was thinking of using unsafe.Pointer-s with runtime.SetFinalizer. A simplistic approach could be to use a Go map (module variable) from pairs of uint64_t to such unsafe pointers. At data item creation, I would register in that map, using the pair of uint64_t-s as key, the pointer of that data item suitably converted to an unsafe.Pointer. I would also pass to SetFinalizer some finalization routine which would unregister (delete) from that map. That scheme would work only if Go garbage collector does not move objects (i.e. struct-s). If Go's GC is a conservative GC à la Boehm GC, I'll probably even better "hide" the pointer (e.g. by keeping not a Pointer, but some uintptr suitably "hidden" e.g. by complementing all its bits à la GC_HIDE_POINTERS in Boehm GC). But that scheme would work only if Go's garbage collector is not moving values (e.g. struct-s). Is Go's garbage collector guaranteed to be some marking (not moving or generational copying, à la Ocaml) one? FWIW, I am familiar with the terminology of the GC handbook.

Why do you need this?
You want the GC do the housekeeping for you, but I'm sure you won't be happy with the result, as the GC's policy differs from what you await...

What kind of eviction policy would you want?
Storing the data in an append-only file (or LevelDB), the deleted items in SQLite, evicting regularly and on close?

Basile Starynkevitch

unread,
Feb 16, 2017, 12:42:05 AM2/16/17
to golang-nuts


On Wednesday, February 15, 2017 at 9:01:58 PM UTC+1, Tamás Gulácsi wrote:


Why do you need this?
You want the GC do the housekeeping for you, but I'm sure you won't be happy with the result, as the GC's policy differs from what you await...

What make you believe I won't be happy with the GC's policy. I'm quite open on that. My understanding is that the GC's policy is what I want.

What kind of eviction policy would you want?
Storing the data in an append-only file (or LevelDB), the deleted items in SQLite, evicting regularly and on close?

Persistence to disk will only happen explicitly at shutdown (process exit) time. So the file aspect don't matter much for my Q1.

So my Q1 is basically: how can I make a weak hash table (in the sense I have described, of having an association from strong keys -pairs of uint64- to weak pointers to items).
 

Basile Starynkevitch

unread,
Feb 16, 2017, 12:45:37 AM2/16/17
to golang-nuts
BTW, I have found https://github.com/fx5/weakref/blob/master/weakref.go but I am not sure to understand how it works (and even if it works).

Ian Lance Taylor

unread,
Feb 16, 2017, 3:11:08 PM2/16/17
to Basile Starynkevitch, golang-nuts
Hi Basile.

Go does not support any sort of weak pointer. You've already
identified one approach using finalizers. That approach will work
today as Go's garbage collector never moves items. There are no
current plans to ever move items, but it is possible that some Go
implementation will do so, in which case hiding the pointer as a
uintptr will break. (By the way, Go's garbage collector is precise,
so there is no need to invert the bits in the uintptr or anything like
that--simply storing a pointer value in a uintptr is enough to hide it
from the garbage collector.)

Other than that, I think you will have to implement garbage collection
yourself. That doesn't sound too hard with your data structure, which
doesn't seem to have cycles. Basically traverse the data structure
copying data from the current map to a new one, then replace the
official map with the new one. With judicious locking this could even
be done concurrently.

Sorry I don't have anything else to suggest.

Ian

Basile Starynkevitch

unread,
Feb 27, 2017, 7:27:18 AM2/27/17
to golang-nuts, bas...@starynkevitch.net


On Thursday, February 16, 2017 at 9:11:08 PM UTC+1, Ian Lance Taylor wrote:
identified one approach using finalizers.  That approach will work
today as Go's garbage collector never moves items. 

Thanks Ian for your help. FWIW, I just have coded this objvalmo.go file and function FindOrMakeObjectById commit 15a7fbc61e141 so I hope to have understood your advice correctly.

Regards.

Basile Starynkevitch
 
Reply all
Reply to author
Forward
0 new messages