[ann] hamsterdb Transactional Storage (thanks to all of you)

Christoph Rupp

unread,

Jul 20, 2009, 5:26:58 AM7/20/09

to

Dear comp.databases.theory Group,

over the last one or two years you helped me a couple of times when i
asked silly questions, mainly about transactions, isolation levels
etc.

Now i'm really happy to tell you that i finally released the first
beta version of hamsterdb Transactional Storage. if you're curious
about implementation details, feel free to ask, look at the source or
read the whitepaper (link is below).

In a nutshell:
* ACID compliant transactions in a lock-free architecture;
Transactions do not hold locks during their life-time. Conflicts are
resolved in memory, without accessing the disk
* Fast; performance-relevant functions are moved to the background
* Logical, idempotent logging and Recovery
* Multithreaded; every handle can be used from arbitrary threads
* portable ANSI-C code (Linux/Unix/Win32)
* Easy to use

Source:
http://hamsterdb.com/public/dl/hamsterdb2-0.0.1.tar.gz

The linux/unix version is stable - but win32 still has some issues
(although it works most of the time). This release is not intended for
production use.

The next releases will improve performance, improve win32 and add more
features (Cursors, duplicate keys, more transaction isolation
levels...)

Whitepaper:
http://hamsterdb.com/public/dl/hamsterdb2_technical_overview.pdf

Best regards
Christoph

Roy Hann

unread,

Jul 20, 2009, 8:13:47 AM7/20/09

to

Christoph Rupp wrote:

> Dear comp.databases.theory Group,
>
> over the last one or two years you helped me a couple of times when i
> asked silly questions, mainly about transactions, isolation levels
> etc.

You are welcome.

> The next releases will improve performance, improve win32 and add more
> features (Cursors, duplicate keys, more transaction isolation
> levels...)

Not supporting cursors would actually be a good *feature* IMO.

"Duplicate keys" sounds like it could be a contradiction in terms,
unless you are talking about duplicate (non-unique) *physical* keys.

--
Roy

com...@hotmail.com

unread,

Jul 22, 2009, 7:21:18 AM7/22/09

to

On Jul 20, 2:26 am, Christoph Rupp <cruppst...@gmail.com> wrote:
> Whitepaper:http://hamsterdb.com/public/dl/hamsterdb2_technical_overview.pdf

quoting now from the white paper:

>> (a conflict occurs if another active Transaction is modifying the same Database key.
>>If a conflict is discovered, the operation returns with an error.
in other words, a transaction locks a key.

>> only one Transaction isolation level is supported: "read committed"
no, it is not.
yes you prevent dirty reads, but concurrent transactions must all
finish
(unless one rolls itself back on a constraint violation).
you have to support concurrency before "isolation level" makes any
sense.

you still don't understand the basic concepts.

philip

Christoph Rupp

unread,

Jul 31, 2009, 10:10:00 AM7/31/09

to

feel free to explain them to me.

Christoph

com...@hotmail.com

unread,

Aug 2, 2009, 3:41:52 AM8/2/09

to

On Jul 31, 7:10 am, Christoph Rupp <cruppst...@gmail.com> wrote:
> On Jul 22, 1:21 pm, com...@hotmail.com wrote:
> > you still don't understand the basic concepts.

> feel free to explain them to me.

In my previous message I meant to bear bad tidings,
but I didn't mean to have an attitude.

To expand on my previous message:
1. You have locks.
"Locking" just means others' access is constrained.
Latches and mutexes are locks.
2. A system only has concurrency if each user's sequence
of transactions proceeds despite other transactions.
Isolation level is one way to define the kind of atomic
changes that a transaction is a sequence of from
the user's point of view.
(You don't seem to understand that isolation levels other
than serializable violate ACID, unless you interpret the
atomic changes as nested transactions.)
The system must produce some interleaving of these
atomic changes.
Rollback and commitment are implementation
concepts dealing with (user-invisible) attempted executions
of individual transactions or atomic changes.
Each user must see their transactions advance
(observing also non-deterministic atomic changes from
other transactions between their own atomic changes).

I said these are basic concepts. If I haven't motivated
you to re-examine some basic expositions, so be it.

philip

Christoph Rupp

unread,

Aug 2, 2009, 6:41:08 AM8/2/09

to

Philip,

On Aug 2, 9:41 am, com...@hotmail.com wrote:

> 1. You have locks.
> "Locking" just means others' access is constrained.
> Latches and mutexes are locks.

My "lock-free" described the implementation and the fact that i do not
lock pages. I'll make sure that next time i'm understood better

> 2. A system only has concurrency if each user's sequence
> of transactions proceeds despite other transactions.
> Isolation level is one way to define the kind of atomic
> changes that a transaction is a sequence of from
> the user's point of view.
> (You don't seem to understand that isolation levels other
> than serializable violate ACID, unless you interpret the
> atomic changes as nested transactions.)
> The system must produce some interleaving of these
> atomic changes.
> Rollback and commitment are implementation
> concepts dealing with (user-invisible) attempted executions
> of individual transactions or atomic changes.
> Each user must see their transactions advance
> (observing also non-deterministic atomic changes from
> other transactions between their own atomic changes).

i think i understand your points. I also understand that the 'i' in
ACID is violated if the isolation level is not SERIALIZABLE. on the
other hand even ANSI suggests different/weaker isolation levels. I
guess i follow a more pragmatic approach - also in case of
concurrency, .

>
> I said these are basic concepts. If I haven't motivated
> you to re-examine some basic expositions, so be it.
>
> philip

Actually i'm grateful for your comments. my database is a hobby
project and i'm happy to learn and improve my knowledge. I'd
appreciate book recommendations.

Christoph

com...@hotmail.com

unread,

Aug 2, 2009, 11:49:32 PM8/2/09

to

> on the
> other hand even ANSI suggests different/weaker isolation levels.

ANSI is just committees of interested parties.
In the case of databases that's SQL companies.
Standard doesn't mean good.

> guess i follow a more pragmatic approach

I find your use of this word interesting.
Perhaps as if theoretical or abstract meant impractical,
instead of "sound", "clear" and "fundamental".

A classic & good beginning is Date's Intro to DB Systems.
The classic reference is Gray and Reuter
Transaction Processing: Concepts and Techniques.
On the other hand Gray never understood the relational model.
Berstein has a 2009 edition of Principles of Transaction Processing.

philip

Christoph Rupp

unread,

Aug 3, 2009, 3:10:13 AM8/3/09

to

On Aug 3, 5:49 am, com...@hotmail.com wrote:
> > guess i follow a more pragmatic approach
>
> I find your use of this word interesting.
> Perhaps as if theoretical or abstract meant impractical,
> instead of "sound", "clear" and "fundamental".

No, not at all. Read it as "Accept something imperfect if it's good
enough to solve most problems effectively (and then improve the
solution step by step)". I think this comes from my development style
which is based on iterations.

Thanks for the book recommendations and your other comments.

Christoph

noyb

unread,

Sep 25, 2009, 11:46:19 PM9/25/09

to

com...@hotmail.com wrote:
> On the other hand Gray never understood the relational model.

That's interesting. Can you point me to some evidence of that?
Just out of curiosity, mind, I'm not saying I disbelieve you.
But it seems rather surprising...

Clifford Heath, Data Constellation, http://dataconstellation.com
Agile Information Management and Design

com...@hotmail.com

unread,

Sep 29, 2009, 8:59:12 PM9/29/09

to

On Sep 25, 8:46 pm, noyb <no.s...@please.net> wrote:
> com...@hotmail.com wrote:
> > On the other hand Gray never understood the relational model.
>
> That's interesting. Can you point me to some evidence of that?

I might have come to this conclusion because of "A Call to Arms".

philip

Clifford Heath

unread,

Sep 30, 2009, 4:02:54 AM9/30/09

to

Thanks, I hadn't read that. I found his big black book (with Reuter)
hugely valuable; it's one of my most-prized books.

In "A Call to Arms", he seems to be mostly describing the evolving
global data scenario, not necessarily approving it. The only thing
I found surprising was his cheerfulness about the whole mess, and
the only thing he approved of that I thought was a really bad idea
was running user code inside the database, where it can hide inside
a query that's now impossible to optimize. But I guess that was his
employer's idea, not his...

If we had an interface with proper nested relation (rel-valued attrs)
support, a lot of the push to drive code into the DBMS would vanish,
and that'd be wonderful. But for that, a new query language has to
emerge. This group ought to be a good place to discuss it, but...

--

paul c

unread,

Sep 30, 2009, 4:32:22 AM9/30/09

to

Clifford Heath wrote:
> com...@hotmail.com wrote:
>> On Sep 25, 8:46 pm, noyb <no.s...@please.net> wrote:
>>> com...@hotmail.com wrote:
>>>> On the other hand Gray never understood the relational model.
>>> That's interesting. Can you point me to some evidence of that?
>> I might have come to this conclusion because of "A Call to Arms".
>
> Thanks, I hadn't read that. I found his big black book (with Reuter)
> hugely valuable; it's one of my most-prized books.

> ...

Gray has often been mis-classified. That book was unique, a bit of a
cross-over, but the fact remains he was basically an operating system
guy. Thirty years ago, he was ahead of his time when he picked up terms
like "predicate" but used it for a more narrow purpose than it's used
today. His actual field was concurrency, maybe more accurate to say
physical concurrency. Sometimes I imagine that the logical concurrency
field is yet to be invented.

com...@hotmail.com

unread,

Oct 1, 2009, 12:08:16 AM10/1/09

to

On Sep 30, 1:02 am, Clifford Heath <no.s...@please.net> wrote:
> If we had an interface with proper nested relation (rel-valued attrs)
> support, a lot of the push to drive code into the DBMS would vanish,
> and that'd be wonderful. But for that, a new query language has to
> emerge.

??
Date and Darwen have proper nested relations.
The algebra is technically speaking slightly different as whenever
you have a recursively nested version of a type.
To NEST is to allow relations as attribute types.
UNNEST is primitive (D&D don't address this correctly).
If you have a generalized aggregate corresponding to series notation
you can express UNNEST as UNION over a set of tuples.

> If we had [X] a lot of the push to drive [Y] into the DBMS would vanish,

> and that'd be wonderful.

X need only be "the relational model being used".
Y is "NOT X" (see "A Call to Arms" for examples).

philip

Clifford Heath

unread,

Oct 1, 2009, 5:47:15 AM10/1/09

to

com...@hotmail.com wrote:
> On Sep 30, 1:02 am, Clifford Heath <no.s...@please.net> wrote:
>> If we had an interface with proper nested relation (rel-valued attrs)
>> support, a lot of the push to drive code into the DBMS would vanish,
>> and that'd be wonderful. But for that, a new query language has to
>> emerge.

> Date and Darwen have proper nested relations.

D is not a language; it's a set of requirements for a language.
Tutorial D is only offered as an example of what a D language
might look like.

I maintain there needs to be a language, then an implementation,
then the implementation must be deployable. This last requirement
probably means that it needs to run on an SQL backend provided
by all the major RDBMs. The implementations must be multi-sourced
to be adopted; enterprises won't trust their data to a single
vendor any more. Finally there needs to be universities cranking
out graduates able to use it, and all the end-user tools that are
needed for enterprises to build, deploy and maintain applications.

We're still decades away at the current rate of progress. Maybe
centuries... and it seems to be getting further way :-(.

>> If we had [X] a lot of the push to drive [Y] into the DBMS would vanish,
>> and that'd be wonderful.
> X need only be "the relational model being used".
> Y is "NOT X" (see "A Call to Arms" for examples).

Yes, arguably; but for it to be used, there must be solid, deployable
and well-supported implementations available from more than one vendor.

rpost

unread,

Oct 5, 2009, 1:56:49 PM10/5/09

to

philip wrote:

>To NEST is to allow relations as attribute types.

No, it is an operator that creates nested relations from flat relations.

>UNNEST is primitive (D&D don't address this correctly).
>If you have a generalized aggregate corresponding to series notation
>you can express UNNEST as UNION over a set of tuples.

I don't understand this ...

--
Reinier

com...@hotmail.com

unread,

Oct 7, 2009, 6:59:01 PM10/7/09

to

On Oct 5, 10:56 am, rp...@pcwin518.campus.tue.nl (rpost) wrote:
> >To NEST is to allow relations as attribute types.
> No, it is an operator that creates nested relations from flat relations.

Sure. So does EXTENDing by a relation attribute. I meant that if we
have relation attributes we can write R NEST A using other operators.
See Date & Darwen's TTM 3rd ed end of appendix A re GROUP.

> >If you have a generalized aggregate corresponding to series notation
> >you can express UNNEST as UNION over a set of tuples.

If we could define aggregates like SUM(R, X) and PRODUCT(R, X+Y) for
arbitrary operators via a generalization of standard math SIGMA and PI
series notation over elements of a set then we could define R UNNEST A
as the iterated UNION over tuples t in R of expression (relation{t}
REMOVE A) JOIN t.A. So generalized aggregates give it for free.

philip

Bob Badour

unread,

Oct 7, 2009, 7:26:32 PM10/7/09

to

com...@hotmail.com wrote:

Something I have been saying for well over 15 years: Give me adequate
domain support and I don't need all the other crap folks want to throw
into the database language.

paul c

unread,

Oct 13, 2009, 10:25:09 AM10/13/09

to

I've been puzzling over this. Is it a way of allowing user-defined
aggregate ops? And what "detritus" is avoided? Also does it mean a
"tuple" programming interface? Or does it simply mean that most
implementations don't define all the possible ops for each domain? I
might be able to get the point with an answer to just one of those
questions.

(I take it that UNNEST here has a tuple operand, unlike UNGROUP. I've
presumed that the conventional aggregate definitions involve UNGROUP as
well as EXTEND.)

Bob Badour

unread,

Oct 13, 2009, 11:26:06 AM10/13/09

to

paul c wrote:

> Bob Badour wrote:
>
>> com...@hotmail.com wrote:
>>
>>> On Oct 5, 10:56 am, rp...@pcwin518.campus.tue.nl (rpost) wrote:
>>>
>>>>> If you have a generalized aggregate corresponding to series notation
>>>>> you can express UNNEST as UNION over a set of tuples.
>>>
>>> If we could define aggregates like SUM(R, X) and PRODUCT(R, X+Y) for
>>> arbitrary operators via a generalization of standard math SIGMA and PI
>>> series notation over elements of a set then we could define R UNNEST A
>>> as the iterated UNION over tuples t in R of expression (relation{t}
>>> REMOVE A) JOIN t.A. So generalized aggregates give it for free.
>>
>> Something I have been saying for well over 15 years: Give me adequate
>> domain support and I don't need all the other crap folks want to throw
>> into the database language.
>
> I've been puzzling over this. Is it a way of allowing user-defined
> aggregate ops?

Of course, and a whole lot more.

> And what "detritus" is avoided?

What detritus or "features" do we have now that one could more
accurately and more robustly model with user-defined types and type
generators?

paul c

unread,

Oct 13, 2009, 11:57:03 AM10/13/09

to

Bob Badour wrote:
> paul c wrote:
>
>> Bob Badour wrote:
>>
>>> com...@hotmail.com wrote:

...

>> And what "detritus" is avoided?
>
> What detritus or "features" do we have now that one could more
> accurately and more robustly model with user-defined types and type
> generators?

Thanks, I see it was about user-defined types all along. Call me a
language philistine, I've always been unhappy about integrated user type
support because I think that makes user languages more unwieldy than
they need to be, way too much so. In the old days, most customization
of that sort was labelled "exits", in the late 1970's people started
calling the same thing "drivers". Today I don't know what products, if
any, support an add-on layer and probably the only "exit" most people
are aware of are simple functions to handle sort comparisons. I've
never seen much advantage in having dynamic binding for type operators
and there is huge practical work to make sure a type isn't redefined in
a way that distorts the original data. Maybe if the big-name products
had had a way of linking type "drivers" from day one we'd now have a
viable and competitive sub-industry supplying alternative types that
users could choose from. 'Lord knows there are plenty of encoding
standards. Personally I think it better to apply type theory in a
separate implementation language which most users could ignore and which
wouldn't need to be optimized by the dynamic engine.

Bob Badour

unread,

Oct 13, 2009, 12:48:52 PM10/13/09

to

paul c wrote:

Sort of a "type sub-language"? ;)

paul c

unread,

Oct 13, 2009, 1:17:50 PM10/13/09

to

Bob Badour wrote:
...

> Sort of a "type sub-language"? ;)

yes, and not necessarily a "sub-type sub-language"!

Bob Badour

unread,

Oct 13, 2009, 1:19:42 PM10/13/09

to

paul c wrote:

Not necessarily but not necessarily not either.

paul c

unread,

Oct 13, 2009, 2:29:22 PM10/13/09

to

I guess so (even though I still haven't gotten over the metric switch
and still prefer B&W tv, nothing wrong with that whisky either).

Bob Badour

unread,

Oct 13, 2009, 10:23:24 PM10/13/09

to

paul c wrote:

I am bimetrical or bi-unital or something. I was 9 when Canada switched
over so I was old enough to have fully internalized imperial units and
still young enough to fully internalize S.I. At least the day-to-day
units. I have spent my life since then translating between people older
and younger than myself.

While I would only ever report force in slug-rods per fortnight squared
as an act of perverse defiance, I notice google will happily convert
them to newtons or pounds if asked.

paul c

unread,

Oct 13, 2009, 10:33:34 PM10/13/09

to

Bob Badour wrote:
> paul c wrote:
>
>> Bob Badour wrote:
...

> I am bimetrical or bi-unital or something. I was 9 when Canada switched
> over so I was old enough to have fully internalized imperial units and
> still young enough to fully internalize S.I. At least the day-to-day
> units. I have spent my life since then translating between people older
> and younger than myself.
>
> While I would only ever report force in slug-rods per fortnight squared
> as an act of perverse defiance, I notice google will happily convert
> them to newtons or pounds if asked.

When that switch happened, the price of milk and paint jumped overnight.
Today, it is hard not to pay double for a wrench or socket set because
they usually have both SAE and metric sizes. The funny part is that
many torque wrenches have dual scales because many mechanics are not
able to multiple by four and divide by three nor vice-versa, ironic
because those are usually the mechanics who think they don't need a
torque wrench.

Bob Badour

unread,

Oct 13, 2009, 10:58:26 PM10/13/09

to

paul c wrote:

> Bob Badour wrote:
>
>> paul c wrote:
>>
>>> Bob Badour wrote:
>
> ...
>
>> I am bimetrical or bi-unital or something. I was 9 when Canada
>> switched over so I was old enough to have fully internalized imperial
>> units and still young enough to fully internalize S.I. At least the
>> day-to-day units. I have spent my life since then translating between
>> people older and younger than myself.
>>
>> While I would only ever report force in slug-rods per fortnight
>> squared as an act of perverse defiance, I notice google will happily
>> convert them to newtons or pounds if asked.
>
> When that switch happened, the price of milk and paint jumped overnight.

It's not like that's the only time they jumped overnight, though.

> Today, it is hard not to pay double for a wrench or socket set because
> they usually have both SAE and metric sizes. The funny part is that
> many torque wrenches have dual scales because many mechanics are not
> able to multiple by four and divide by three nor vice-versa, ironic
> because those are usually the mechanics who think they don't need a
> torque wrench.

That reminds me: The mechanic had my driver's side rear dually off
tonight for inspection so I have to torque it again tomorrow before I
drive it very far.

com...@hotmail.com

unread,

Oct 15, 2009, 8:26:17 PM10/15/09

to

On Oct 13, 7:25 am, paul c <toledobythe...@oohay.ac> wrote:

>Is it a way of allowing user-defined
> aggregate ops?

Yes.

> Also does it mean a
> "tuple" programming interface?

Only in exactly the sense that R WHERE X+Y involves a tuple interface.

> And what "detritus" is avoided?

> Or does it simply mean that most

> implementations don't define all the possible ops for each domain?

They don't let you write arbitrary operators on relations (a special
case being applying other operators to tuples collectively). Generic
aggregation does.

> (I take it that UNNEST here has a tuple operand, unlike UNGROUP.

By NEST I just meant Date & Darwen's GROUP. The operands are a
relation and a quoted-expression.

> I've
> presumed that the conventional aggregate definitions involve UNGROUP as
> well as EXTEND.)

No, they involve "standard math series notation". It's operators like
SUMMARIZE, EXTEND and SELECT that traditionally take
aggregates that also use EXTEND and traditionally are the only place
aggregates (column functions) can be used.

SUM(s, e) // set s, quoted-expression e binds name i
= SIGMA for i in s of e
SUM({1, 2, 3}, i**2) = 1**2 + 2**2 + 3**2 = 1 + 4 + 9 = 15

SUM(r, e)
// relation expression r, quoted-expression e binds attribute names
of r
= SIGMA for <a1, a2, ...> in r of e
SUM({<X 1, Y 2>, <X 5 Y 6>}, 2*(Y-X))
= 2*(2-1) + 2*(6-5) = 2 + 2 = 4

philip

paul c

unread,

Oct 15, 2009, 9:22:58 PM10/15/09

to

com...@hotmail.com wrote:
...

> By NEST I just meant Date & Darwen's GROUP. The operands are a
> relation and a quoted-expression.

> ...

If you're going to refer to those guys, you should stick to what they
say. Their GROUP operands are relations/relvars and attributes. Their
NEST involves tuples.

com...@hotmail.com

unread,

Oct 20, 2009, 9:39:11 PM10/20/09

to

To "nest" is the generic term in the sense of nested tuples and
relations.
D&D GROUP attributes into nested relations
and WRAP attributes into nested tuples.

I started off writing about nested relations and though I mentioned
D&D
I didn't think it mattered what they happened to call NEST/UNNEST.
Later I pointed to an example by them so there
I tried to indicate they use GROUP/UNGROUP not NEST/UNNEST.
The operator whose operands were in question was UNGROUP/UNNEST.
For unnesting a single attribute its operands
are a relation and an attribute name.
The value of any relation expression, possibly involving relvar names,
denotes the relation operand.
I referred to the attribute name as being a quoted expression because
it is passed on as-is without evaluation to the series operator.

Sorry about not being clear.

philip

paul c

unread,

Oct 21, 2009, 12:03:06 AM10/21/09

to

com...@hotmail.com wrote:
> On Oct 15, 6:22 pm, paul c <toledobythe...@oohay.ac> wrote:
>> com...@hotmail.com wrote:
>>
>> ...
>>
>>> By NEST I just meant Date & Darwen's GROUP. The operands are a
>>> relation and a quoted-expression.
>>> ...
>> If you're going to refer to those guys, you should stick to what they
>> say. Their GROUP operands are relations/relvars and attributes. Their
>> NEST involves tuples.
>
> To "nest" is the generic term in the sense of nested tuples and
> relations.
> D&D GROUP attributes into nested relations
> and WRAP attributes into nested tuples.

> ...

Sorry, I had nest and WRAP mixed-up. Maybe because I'd rather people
said either wrap or group when they're referring to D&D, otherwise it's
hard to know what they're talking about.