Suggestion for Happstack State


morten...@gmail.com

Jun 21, 2009, 2:57:12 PM
to HAppS
My main motivation for considering Happstack was the memory state
system. I think that
is what makes Happstack special. And Haskell of course.

However, it seems like State is not really about to be done, and there
is a lot of work in it.
Part of State is the Spread toolkit, which means that you accept that
other libraries can be used. Then why not go all the way, and use
another tool kit and just make a Haskell binding for that.

My suggestion is Scalaris. There might be others. I am not so familiar
with all these new data stores. Scalaris runs in memory, and it has
automatic replication and sharding.
It is written in Erlang. It seems like it is doing exactly what State
should be doing.
It would be so much simpler to make a Haskell binding to it. Scalaris
does not serialize to
disk. There is no persistent storage. They claim that it is not easy
to consistently write to disk. The whole system is supposed to be
'always on'. I guess one could make an occasional query for all key/
values and dump them to disk. That would give some persistence, but
ACID would not be totally guaranteed on disk. It makes sense to me
that the serialization is done, if at all, by an outside script
similar to monitoring.

State is probably not serialized consistently right now in Happstack
State either; suppose the multi masters lose their internal connection
and each continue updating state. Then what do the disk files mean?

On a single server it is possible to serialize consistently of
course.

The Scalaris people tested it on all of Wikipedia's data, and they
could run it on way fewer servers than the real wikipedia with a
similar request rate.

One might also make bindings to other tool kits, including some that
write to disk. Scalaris is just one possibility. But the most
interesting I have seen.

So instead of writing all the code yourself, Happstack could become a
"binder" for other good toolkits.

On another note, I looked into Hyena. It looked very simple. I think
one of the barriers for Happstack is the complexity of the syntax.
Hyena's application is just a function like

Environment -> IO Response

It is extremely simple to understand. To become more popular,
Happstack needs to be simpler as well.
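To make the "application is just a function" idea concrete, here is a minimal sketch. The Environment and Response records below are hypothetical stand-ins invented for illustration; they are not Hyena's actual types.

```haskell
import qualified Data.Map as Map

-- Hypothetical request/response records (NOT Hyena's real definitions).
data Environment = Environment
  { requestMethod  :: String
  , requestPath    :: String
  , requestHeaders :: Map.Map String String
  }

data Response = Response
  { statusCode   :: Int
  , responseBody :: String
  } deriving (Eq, Show)

-- The whole application is one function of this shape.
type Application = Environment -> IO Response

helloApp :: Application
helloApp env
  | requestPath env == "/" = pure (Response 200 "hello")
  | otherwise              = pure (Response 404 ("no such page: " ++ requestPath env))

main :: IO ()
main = do
  r1 <- helloApp (Environment "GET" "/" Map.empty)
  r2 <- helloApp (Environment "GET" "/missing" Map.empty)
  print r1
  print r2
```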

Why not refocus Happstack to become a "binder" between other
libraries, giving a full framework all the way from Javascript
libraries to data storage.

For instance:

Qooxdoo, jQuery, etc -> Apache, nginx etc -> Hyena, Happstack-Server
etc -> Scalaris, memcachedb, CouchDB etc.

Users (application writers) could write solely in Haskell, (and
Javascript), and the Happstack code development itself would be simple
since it leverages other people's work.

Compare the ease of writing a Haskell binding to Scalaris with
writing, from scratch, a State module with automatic sharding and
replication, AND convincing people that they should trust such a
memory based State system. Erlang might also be a better language for
such a fail safe system than Haskell.

Cheers and thanks for all the good work and all the good discussions
in this group.


PS. Another useful library would be a JSON - bytestring library.

thomas hartman

Jun 21, 2009, 3:35:04 PM
to HAppS
> State is probably not serialized consistently right now in Happstack
> State either; suppose the multi masters lose their internal connection
> and each continue updating state. Then what do the disk files mean?

could somebody answer this excellent question?



scalaris certainly seems interesting. going to a never-written-to-hd
system seems like quite a conceptual leap to me, though.

do you have a favorite demo / quickstart / tutorial or toy app that
showcases scalaris?


stepcut

Jun 21, 2009, 5:21:40 PM
to HAppS
On Jun 21, 1:57 pm, "morten.kr...@amberbio.com"
<morten.kr...@gmail.com> wrote:
> It would be so much simpler to make a Haskell binding to it. Scalaris
> does not serialize to
> disk. There is no persistent storage. They claim that it is not easy
> to consistently write to disk. The whole system is supoosed to be
> 'always on'.

Happstack-state already supports this mode of operation. Simply use
the null-saver to avoid journaling any events.

> I guess one could make an occasional query for all key/
> values and dump them to disk. That would give some persistence, but
> ACID would not be totally guaranteed on disk.

That is exactly what happens when you call 'createCheckpoint'. I
believe you can already use the nullSaver to avoid saving individual
events to disk, and still use createCheckpoint to create a checkpoint
to disk, S3, etc, on occasion. However, I am not sure why you would
want to do that. Saving the events seems better.

> It makes sense to me
> that the serialization is done, if at all, by an outside script
> similar to monitoring.

How does this script get access to the RAM in the process where the
values are stored? With happstack, you could have your normal
multimasters, which handle incoming requests, run without saving any
events or checkpoints to disk. But you could also have an additional
master or two, which did not handle any incoming requests, but did
save events and checkpoints to disk.

In this way, you could have a bunch of diskless machines with fast
processors for handling requests, and different machines with fast
RAID arrays for providing persistent storage to disk. Ideally one or
two RAID machines per shard?

> State is probably not serialized consistently right now in Happstack
> State either; suppose the multi masters lose their internal connection
> and each continue updating state. Then what do the disk files mean?

That would never happen in the current architecture. We use the spread
mode which has the following guarantees:

1. a message is either delivered to all clients or no clients
2. all clients receive messages sent to the network in the same order
3. there is only one spread network, and it can never become
fragmented

In multimaster-mode, a client does not directly perform an update on
its local state. Instead it creates an update event which it sends to
the network. Later, it will receive this message back from the spread
network, just like all the other clients, and will perform the update
in the same way that all the other clients do. If a multimaster were
to get disconnected from the network, then it would not receive the
update event, and would therefore not commit the update. When a
multimaster rejoins the network, it will request the latest state.
Note that the lost update is consistent with ACID principles, because
the event is lost before the transaction is committed:
http://www.nusphere.com/products/library/acid_transactions.htm
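The update path described above can be modelled in a few lines. This is only a pure sketch under stated assumptions: the Update type and the replay function are hypothetical, and the ordered list stands in for whatever Spread actually delivers. It is not happstack-state's real implementation.

```haskell
import Data.List (foldl')

-- Hypothetical update events for a counter-like state.
data Update = Add Int | Reset
  deriving (Eq, Show)

type State = Int

applyUpdate :: State -> Update -> State
applyUpdate s (Add n) = s + n
applyUpdate _ Reset   = 0

-- A replica never mutates its state directly. It only folds over the
-- events delivered to it by the (assumed) totally ordered broadcast.
replay :: State -> [Update] -> State
replay = foldl' applyUpdate

main :: IO ()
main = do
  let delivered = [Add 5, Add 2, Reset, Add 3]  -- same order on every node
      replicaA  = replay 0 delivered
      replicaB  = replay 0 delivered
      -- Replica C was disconnected after the first two events, so it
      -- never saw (and never committed) the later ones.
      replicaC  = replay 0 (take 2 delivered)
      -- On rejoining, C discards its stale state and adopts the latest.
      replicaC' = replicaA
  print (replicaA, replicaB, replicaC, replicaC')
```

Because every connected replica folds the same events in the same order, they cannot diverge; a disconnected replica is merely stale until it rejoins.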

I believe there is a weakness right now that if all the nodes go down,
then the state is restored from whichever node joins first. If that
node happened to be disconnected from the network prior to everything
going down, then it could be missing events that other servers have.
This is not an unfixable flaw however. Certainly, any solution that
would work for scalaris should work here. (Assuming you have some way
to recover from all nodes going down in scalaris at all).

Scalaris seems to be based around storing (key, value) pairs -- but
what if your state is not really based around (key, value) pairs?
Happstack-state allows you to use almost any Haskell data type, so you
can choose to use (key,value) pairs only if they are right for your
application (via IxSet, or whatever else you want). For example,
storing a tree (such as a threaded message board) as key/value pairs
requires a lot more work than just storing the tree. With happstack-
state, you just use your basic tree type, and normal tree manipulation
functions, and all is good. To use key/value pairs, you either have to
convert between a Tree type and key/value pair types (which means
writing and debugging more code), or you have to write a bunch of
functions for doing tree-like manipulations on data stored in key/
value pairs (more extra code, testing, and debugging). This is the
same reason why on a single server system, happstack-state is still
nicer than BerkeleyDB for many applications.
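A rough sketch of the extra work involved, using Data.Tree from the containers package. The path-based key scheme (a key is the list of child indices from the root) is just one assumed encoding chosen for illustration; the helper names are hypothetical.

```haskell
import qualified Data.Map as Map
import Data.Tree (Tree (..))

type Message = String

-- Stored directly: a reply is just another child node.
addReply :: Message -> Message -> Tree Message -> Tree Message
addReply parent reply (Node m kids)
  | m == parent = Node m (kids ++ [Node reply []])
  | otherwise   = Node m (map (addReply parent reply) kids)

-- Key/value encoding: the key is the path of child indices from the root.
type Path = [Int]

toPairs :: Tree Message -> Map.Map Path Message
toPairs = go []
  where
    go path (Node m kids) =
      Map.insert path m
        (Map.unions [go (path ++ [i]) k | (i, k) <- zip [0 ..] kids])

-- Rebuilding the tree is the conversion code you must write and debug.
fromPairs :: Map.Map Path Message -> Maybe (Tree Message)
fromPairs kv = build []
  where
    build path = do
      m <- Map.lookup path kv
      let kids = takeWhileJust [build (path ++ [i]) | i <- [0 ..]]
      pure (Node m kids)
    takeWhileJust (Just x : xs) = x : takeWhileJust xs
    takeWhileJust _             = []

main :: IO ()
main = do
  let thread  = Node "first post" [Node "reply 1" [], Node "reply 2" []]
      thread' = addReply "reply 1" "nested reply" thread
  print (Map.toList (toPairs thread'))
  print (fromPairs (toPairs thread') == Just thread')
```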

Also, scalaris does not allow you to delete keys.

And, perhaps most importantly, Scalaris is based on Paxos -- and I
have not heard good things about Paxos scaling. Do you have some
reason to believe that Paxos scales better than Spread?

So, to answer your question more directly, there is more to
happstack-state than replication. In fact, most (perhaps all) users of
happstack-state today do not use replication. So, on a single-server
setup, why would people want to use happstack-state instead of MySQL,
SQLite (a relational database) or BerkeleyDB (a key/value store),
which all support running entirely in memory rather than on disk? One
answer is that happstack-state lets you use Haskell data types
directly, without having to write any marshaling code by hand, and
that you can write your 'queries' in Haskell instead of SQL. These
features ought to allow you to write shorter and simpler code in less
time, with fewer bugs, and less testing. So, for happstack-state on a
single-server system, it is not clear that there is a pre-existing
library which provides similar benefits.

When introducing replication and sharding into the mix, we, of course,
want to retain the benefits of happstack-state, and get more
scalability. We leverage spread, because spread *does* provide
functionality that we can use out of the box to extend happstack-
state. There is little benefit to writing something like spread, when
spread already provides the exact functionality that we need. The
functionality that spread provides is low-level, and can be used to
extend the benefits of happstack-state in a fairly transparent way. If
we pick something higher-level, then it would need to be something
that could provide the same benefits that happstack-state provides.
However, most of the higher level libraries seem like a step
backwards, since they will create more work for people building on our
platform (in the form of having to write marshaling code, or only
use a restricted set of types).

morten...@gmail.com

Jun 22, 2009, 7:07:02 AM
to HAppS
Hi Stepcut

Thanks for your very good reply.


> How does this script get access to the RAM in the process where the
> values are stored? With happstack, you could have your normal
> multimasters, which handle incoming requests, run with out saving any
> events or checkpoints to disk. But you could also have an additional
> master or two, which did not handle any incoming requests, but did
> saving events and checkpoints to disk.

That was exactly what I meant. The checkpointing is separated from
the web serving.


> the network. Later, it will receive this message back from the spread
> network, just like all the other clients, and will perform the update
> in the same way that all the other clients do. If a multimaster were
> to get disconnected from the network, then it would not received the
> update event, and would therefore not commit update. When a
> multimaster rejoins the network, it will request the latest state.
> Note that the lost update is consistent with ACID principles, because
> the event is lost before the transaction is committed:
> http://www.nusphere.com/products/library/acid_transactions.htm

So what happens in the following scenario. Two multimasters that lose
their internal connection but each continue to receive requests.
How does Spread avoid fragmentation?


> I believe there is a weakness right now that if all the nodes go down,
> then the state is restored from which ever node joins first. If that
> node happened to be disconnected from the network prior to everything
> going down, then it could be missing events that other servers have.
> This is not an unfixable flaw however. Certainly, any solution that
> would work for scalaris should work here. (Assuming you have some way
> to recover from all nodes going down in scalaris at all).

Scalaris cannot recover. They assume that the whole network will never
crash simultaneously.


> Scalaris seems to be based around storing (key, value) pairs -- but
> what if your state is not really based around (key, value) pairs?
> Happstack-state allows you to use almost any Haskell data type, so you
> can choose to use (key,value) pairs only if they are right for your
> application (via, IxSet, or whatever else you want). For example,
> storing a tree (such as a threaded message board) as key/value pairs
> requires a lot more work than just storing the tree. With happstack-
> state, you just use your basic tree type, and normal tree manipulation
> functions, and all is good. To use key value pairs, you either have to
> convert between a Tree type and a key/value pair types (which means
> writing and debugging more code), or you have to write a bunch of
> functions for doing tree-like manipulations on data stored in key/
> value pairs (more extra code, testing, and debugging). This is the
> same reason why on a single server system, happstack-state is still
> nicer than BerkeleyDB for many applications.
>
> Also, scalaris does not allow you to delete keys.
>
> And, perhaps most importantly, Scalaris is based on Paxos -- and I
> have not heard good things about Paxos scaling. Do you have some
> reason to believe that Paxos scales better than Spread?

I didn't know that Spread was better than Paxos. That is good for
Happstack.


So basically, the argument for the current Happstack State is that
marshalling of Haskell types should be minimized for the application
developer.
I think there are many drawbacks with that approach relative to the
key/value approach:

It is more difficult to program the application because of all the
serializable, and version, mkMethods, update/query monads.
Lots of newcomers will stop here.
The difficulty of converting a tree to a bytestring value is not that
high, and everybody understands the problem. And most web
applications
don't even have trees or more complicated data structures. Maps are
perfect for many applications.

It is impossible to replace the backend with another. Modularity is
lost. What if one doesn't feel 100% sure about Happstack State. Will
it leak memory etc.


What about the following approach then.

Happstack State gives the user a universal Data.Map.Map with all the
methods from Map. The user will not need to define State because State
is just
Map ByteString ByteString

There are no versions. No migrations. No mkMethods etc. Very little
uncertainty. Simplicity is important. Very important, if Happstack
should ever become very popular.

Now, conversions from arbitrary Haskell types to the key/value could
be done by some TH code living in its own modules. But the important
part is people can start using
State immediately


Programming would just be like this

import Happstack.StateMap (universalMap)
-- universalMap is of type Map ByteString ByteString

parse request
transaction begin
universalMap.lookup(x)
universalMap.insert(y,z)
etc.
transaction commit.
Send a response.

That would be easy!
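For what it is worth, the in-memory part of this proposal can be sketched with plain STM. Everything here is an assumption made for illustration: Happstack.StateMap does not exist, so the UniversalMap type and the lookupKey, insertKey and recordVisit helpers are hypothetical, and the sketch gives atomic transactions only, with no persistence, replication or sharding.

```haskell
import Control.Concurrent.STM (STM, TVar, atomically, modifyTVar', newTVarIO, readTVar)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as Map

-- Hypothetical stand-in for the proposed universalMap.
type UniversalMap = TVar (Map.Map B.ByteString B.ByteString)

lookupKey :: UniversalMap -> B.ByteString -> STM (Maybe B.ByteString)
lookupKey um k = Map.lookup k <$> readTVar um

insertKey :: UniversalMap -> B.ByteString -> B.ByteString -> STM ()
insertKey um k v = modifyTVar' um (Map.insert k v)

-- One request handler. Everything inside 'atomically' commits together
-- or not at all, which plays the role of "transaction begin/commit".
recordVisit :: UniversalMap -> B.ByteString -> IO B.ByteString
recordVisit um user = atomically $ do
  old <- lookupKey um user
  let n   = maybe 0 (maybe 0 fst . B.readInt) old + 1 :: Int
      new = B.pack (show n)
  insertKey um user new
  pure new

main :: IO ()
main = do
  um <- newTVarIO Map.empty
  _  <- recordVisit um (B.pack "john")
  v  <- recordVisit um (B.pack "john")
  B.putStrLn v
```

Note that the handler still has to parse and print the counter by hand: the marshalling has moved into application code rather than disappeared.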

Everybody could get started with Happstack immediately.
Then there could be some libraries for complicated marshalling/
serialization etc, which most people might just ignore.
In the end, isn't the tree converted to a ByteString by the Binary
instance anyway.

My planned applications would be easy to formulate in terms of key/
value, and I would like the possibility of being able to replace the
backend easily.

Imagine if the Happstack documentation looked like this:

"Your application is a function Request -> IO Response, where Request
= ........
and Response = .....

Inside your application function, you have access to a Data.Map.Map
called universalMap which follows the standard Data.Map interface.
UniversalMap is persistent across requests, and is replicated/sharded
if you so wish.

You use it like this

import ..........
transaction begin
universalMap.lookup etc...
universalMap.delete etc..
commit.


To make a replicated and/or sharded application, you will need to start
the Spread daemon on all your servers, and start Happstack.

The configuration files for ports, ipV4/ipV6 and multimaster look like
this:

Now, you know enough. Go ahead and implement your application!

By the way, we have a lot of helper functions in separate modules,
which will assist you in url parsing, response generation, serializing
arbitrary Haskell data types etc,
but you don't need to know anything about these helper functions to
use Happstack."


If the interface and documentation were like this, Happstack would not
just be one of the most impressive frameworks, it would also be one of
the easiest to use. And it would get a lot more users, I believe. They
don't need to care about Serializable, Version, mkMethods, Query/
Update events etc.
This interface would give people the option of replacing State with
any other data store. Right now, people need to take the risk that
State is good enough. There is no exit strategy.

Personally, I think I will write my application with one universal
map, and let State=Map ByteString ByteString. If it doesn't work, I
can replace State with any other data store. I can even compare
different data stores.

Cheers.

morten...@gmail.com

Jun 22, 2009, 7:49:23 AM
to HAppS
Thomas,

I don't know any demos of Scalaris. One can download it, of course.

But Scalaris was just meant as one example. If State had an interface
of a map as described above, one might be able to
replace one store with another.


stepcut

Jun 22, 2009, 12:37:06 PM
to HAppS
On Jun 22, 6:07 am, "morten.kr...@amberbio.com"
<morten.kr...@gmail.com> wrote:
> Hi Stepcut

> So what happens in the following scenario. Two multimasters that lose
> their internal connection but each continue to receive requests.
> How does Spread avoid fragmentation?

Actually, Spread itself does not avoid network partitioning. It does,
however, detect it, and provide each partition with a new member list,
and the information about which messages got delivered.

When a cluster becomes disconnected, it should have a policy for
determining what to do. For large sites, this policy is likely to be
application specific. Happstack should provide a reasonable default.

> So basically, the argument for the current Happstack State is that
> marshalling of Haskell types should be minimized for the application
> developer.
>
>  I think there are many drawbacks with that approach relative to the
> key/value approach:

> It is more difficult to program the application because of all the
> serializable, and version, mkMethods, update/query monads.

I would argue that it is easier to develop an application because of
those things.

> Lots of newcomers will stop here.

A lot of newcomers will stop at the 'Haskell' part. Almost everything
you need to know to get started is in this short tutorial (which needs
to be updated to use Happstack instead of HAppS):

http://nhlab.blogspot.com/2008/07/extending-asterisk-with-happs.html

If newcomers are not willing to learn that little, then I don't really
care. The stated goals of HAppS are:

"A web framework for developers to prototype quickly, deploy
painlessly, scale massively, operate reliably, and change easily."

I am not sure that we can deliver on those promises and avoid the need
for developers to learn new things.

Compared to learning MySQL, learning happstack-state seems like a
breeze, yes?

> It is impossible to replace the backend with another. Modularity is
> lost. What if one doesn't feel 100% sure about Happstack State. Will
> it leak memory etc.

Impossible in what way? Many happstack users today use MySQL instead
of happstack-state. If you wish to change ships midstream, then you
need only export your data from happstack-state, and import it into
the new system. Yes, that will require some code to convert your data
from one format to another, but even if you are just switching from
MySQL to Postgres, you are going to have to jump through similar
hurdles, despite them both being SQL. I don't see how using Scalaris,
or anything else, makes things more modular.

> What about the following approach then.
>
> Happstack State gives the user a universal Data.Map.Map with all the
> methods from Map. The user will not need to define State because State
> is just Map ByteString ByteString
>
> There are no versions. No migrations. No mkMethods etc.  Very little
> uncertainty. Simplicity is important. Very important, if Happstack
> should ever become very popular.

Ok, so let's consider the very simple guestbook application that ships
with happstack. It has a form with two fields:

- user name
- your message

In the state, we also store the date that the user left the message.

So we basically have two types:

data Entry = Entry { username :: String, message :: String, date :: UTCTime }
newtype Guestbook = Guestbook [Entry]

Our top-level state, Guestbook, is just a list of guestbook entries,
with the most recent entries added to the beginning of the list.

- how would we store this same data as, Map ByteString ByteString

- what happens if we have a bunch of data in our guestbook, and then
we want to add a new field, messageLength, which contains the number
of characters in the message?
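To make the second question concrete: with typed state, adding the field amounts to writing one conversion from the old type to the new one. The sketch below uses plain functions only; the names EntryV0, migrateEntry and migrateGuestbook are hypothetical, and it deliberately does not use happstack-state's own versioning machinery.

```haskell
import Data.Time (UTCTime (..), fromGregorian)

-- The old shape of the state, kept around only for migration.
data EntryV0 = EntryV0 { username0 :: String, message0 :: String, date0 :: UTCTime }
newtype GuestbookV0 = GuestbookV0 [EntryV0]

-- The new shape, with the extra messageLength field.
data Entry = Entry
  { username      :: String
  , message       :: String
  , date          :: UTCTime
  , messageLength :: Int
  } deriving (Eq, Show)

newtype Guestbook = Guestbook [Entry]
  deriving (Eq, Show)

-- The new field is derived from data that already exists.
migrateEntry :: EntryV0 -> Entry
migrateEntry (EntryV0 u m d) = Entry u m d (length m)

migrateGuestbook :: GuestbookV0 -> Guestbook
migrateGuestbook (GuestbookV0 es) = Guestbook (map migrateEntry es)

main :: IO ()
main = do
  let day = UTCTime (fromGregorian 2009 6 22) 0
      old = GuestbookV0 [EntryV0 "john" "hello" day]
  print (migrateGuestbook old)
```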

- jeremy

Lemmih

Jun 22, 2009, 1:29:43 PM
to HA...@googlegroups.com

You could keep user names, messages and dates in separate maps. The
Entry map would be virtual and constructed with parallel lookups in
the field maps.
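A minimal sketch of that layout, assuming entries are keyed by a hypothetical integer EntryId and simplifying the date to a String so the example needs only containers:

```haskell
import qualified Data.Map as Map

type EntryId = Int

-- Date is a String here purely to keep the sketch small.
data Entry = Entry { username :: String, message :: String, date :: String }
  deriving (Eq, Show)

-- One map per field, all sharing the same key.
data Fields = Fields
  { usernames :: Map.Map EntryId String
  , messages  :: Map.Map EntryId String
  , dates     :: Map.Map EntryId String
  }

emptyFields :: Fields
emptyFields = Fields Map.empty Map.empty Map.empty

insertEntry :: EntryId -> Entry -> Fields -> Fields
insertEntry i e fs = Fields
  { usernames = Map.insert i (username e) (usernames fs)
  , messages  = Map.insert i (message e) (messages fs)
  , dates     = Map.insert i (date e) (dates fs)
  }

-- The "virtual" Entry: rebuilt from parallel lookups, and missing
-- entirely if any one field map lacks the key.
lookupEntry :: Fields -> EntryId -> Maybe Entry
lookupEntry fs i =
  Entry <$> Map.lookup i (usernames fs)
        <*> Map.lookup i (messages fs)
        <*> Map.lookup i (dates fs)

main :: IO ()
main = do
  let fs = insertEntry 1 (Entry "john" "hello" "2009-06-22") emptyFields
  print (lookupEntry fs 1)
  print (lookupEntry fs 2)
```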

--
Cheers,
Lemmih

MightyByte

Jun 22, 2009, 1:51:22 PM
to HA...@googlegroups.com
On Mon, Jun 22, 2009 at 12:37 PM, stepcut<jer...@n-heptane.com> wrote:
>
> On Jun 22, 6:07 am, "morten.kr...@amberbio.com"
> <morten.kr...@gmail.com> wrote:
>> It is more difficult to program the application because of all the
>> serializable, and version, mkMethods, update/query monads.
>
> I would argue that it is easier to develop and application because of
> those things.

Also, Alex mentions in this thread

http://groups.google.com/group/HAppS/msg/d6a754a0a5c49371?hl=en

that advances in TH may make it possible to reduce this to a single
mkMethods on the top-level type. In my opinion writing these
instances isn't a very big obstacle--and certainly not one that
involves comprehending difficult concepts. But if this TH improvement
can be done, the obstacle is all but gone.

morten...@gmail.com

Jun 22, 2009, 2:55:28 PM
to HAppS
On Jun 22, 6:37 pm, stepcut <jer...@n-heptane.com> wrote:
> On Jun 22, 6:07 am, "morten.kr...@amberbio.com"
>
> <morten.kr...@gmail.com> wrote:
> > Hi Stepcut
> > So what happens in the following scenario. Two multimasters that lose
> > their internal connection but each continue to receive requests.
> > How does Spread avoid fragmentation?
>
> Actually, Spread itself does not avoid network partitioning. It Does,
> however, detect it, and provide each partition with a new member list,
> and the information about which messages got delivered.
>
> When a cluster becomes disconnected, it should have a policy for
> determining what to do. For large sites, this policy is likely to be
> application specific. Happstack should provide a reasonable default.

Impossible as I see it. Both multimasters will have to continue.
From the point of view of each of them, the other one is gone, so they
have to continue serving requests.
Both of them believe that. The checkpoints and event logs will now
need a manual conflict resolution
in order to repair the system.

>
> > So basically, the argument for the current Happstack State is that
> > marshalling of Haskell types should be minimized for the application
> > developer.
>
> >  I think there are many drawbacks with that approach relative to the
> > key/value approach:
> > It is more difficult to program the application because of all the
> > serializable, and version, mkMethods, update/query monads.
>
> I would argue that it is easier to develop and application because of
> those things.

Only if you know the syntax to start with, and you are sure that
Happstack-State will work,
and you don't need to plan for a migration to another backend.


> > Lots of newcomers will stop here.
>
> A lot of newcomers will stop at the 'Haskell' part. Almost everything
> you need to know to get started is in this short tutorial (which needs
> to be updated to use Happstack instead of HAppS):
>
> http://nhlab.blogspot.com/2008/07/extending-asterisk-with-happs.html

Those newcomers who learn or know Haskell will want to feel they
understand
what is going on. What you mean by need to know is that they can copy
from your example.
But many people want to understand the API they program against. It
should make sense.

What is this:

instance Component HitCounter where
  type Dependencies HitCounter = End
  initialValue = HitCounter 0

entryPoint :: Proxy HitCounter
entryPoint = Proxy

many people, including myself, would ask. I can't just copy that and go
on. I need to understand it. That means that one
has to go into the source code and understand it, and then later
realize that, entryPoint=Proxy, is unimportant. Just copy/paste it.
Why type Dependencies = End? Why Component?

All commands should have an intuitive meaning like when you use a
database driver. There is nothing equivalent to entryPoint=Proxy
suddenly popping
up.


> If newcomers are not willing to learn that little, then I don't really
> care. The stated goals of HAppS are:
>
>  "A web framework for developers to prototype quickly, deploy
> painlessly, scale massively, operate reliably, and change easily."
>
> I am not sure that we can deliver on those promises and avoid the need
> for developers to learn new things.
>
> Compared to learning MySQL, learning happstack-state seems like a
> breeze, yes?

Even if you don't like MySQL, there is nothing in the API as strange
and meaningless
as
type Dependencies HitCounter = End
or
entryPoint :: Proxy HitCounter
entryPoint = Proxy


> > It is impossible to replace the backend with another. Modularity is
> > lost. What if one doesn't feel 100% sure about Happstack State. Will
> > it leak memory etc.
>
> Impossible in what way? Many happstack users today use MySQL instead
> of happstack-state. If you wish to change ships midstream, then you
> need only export your data from happstack-state, and import it into
> the new system. Yes, that will require some code to convert your data
> from one format to another, but even if you are just switching from
> MySQL to Postgres, you are going to have to jump through similar
> hurdles, despite them both being SQL.

You are right. It is not impossible. But the point is that if you have
a
new product, it is easier to gain market share if the API isn't new as
well.
If a key/value Data.Map API was used for State, it would be seamless
to write an
application to State, and then replace it with another key/value store
if State turned out to
leak memory, or if it was decided to use the hard disk instead of pure
memory. Maybe most users
never log on, and hence their data can be sitting on the hard disk.

>I don't see how using Scalaris,
> or anything else, makes things more modular.

Not Scalaris in itself, but making the key/value API would. Then the
user could plugin Scalaris if needed or something else.


> > What about the following approach then.
>
> > Happstack State gives the user a universal Data.Map.Map with all the
> > methods from Map. The user will not need to define State because State
> > is just  Map ByteString ByteString
>
> > There are no versions. No migrations. No mkMethods etc.  Very little
> > uncertainty. Simplicity is important. Very important, if Happstack
> > should ever become very popular.
>
> Ok, so let's consider the very simple guestbook application that ships
> with happstack. It has a form with two fields:
>
>   - user name
>   - your message
>
> In the state, we also store the date that the user left the message.
>
> So we basically have two types:
>
> data Entry = Entry { username :: String, message :: String, date ::
> UTCTime }
> newtype Guestbook = Guestbook [Entry]
>
> Our top-level state, Guestbook, is just a list of guestbook entries,
> with the most recent entries added to the beginning of the list.
>
>  - how would we store this same data as, Map ByteString ByteString

At least two ways,

key=username, value=username SEPARATOR message SEPARATOR date

or a lot of key/value pairs
Suppose the username="John"

key = message.John value = the message
etc.

The first is probably more or less what Data.Binary does in any case.
It might just be better to get it out of the API, and into a helper
function
for those who want it.

>  - what happens if we have a bunch of data in our guestbook, and then
> we want to add a new field, messageLength, which contains the number
> of characters in the message?

In the first case, by appending to the end.
In the second case by a new key/value

key = messageLength.John Value=45

Alternatively, one could have more than one map, and get rid of the
silly prefixes.
Probably multiple maps is better than one.
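Both encodings can be sketched roughly as follows. The separator byte, the field order and the helper names (encodeEntry, decodeEntry, fieldKey) are assumptions made for illustration; the sketch also assumes ASCII text that never contains the separator, and simplifies the date to a String.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as Map

data Entry = Entry
  { username      :: String
  , message       :: String
  , date          :: String
  , messageLength :: Int
  } deriving (Eq, Show)

-- Assumed separator: a NUL byte that must not occur inside any field.
sep :: Char
sep = '\0'

-- First scheme: one value holding all fields, joined by the separator.
encodeEntry :: Entry -> B.ByteString
encodeEntry e =
  B.intercalate (B.singleton sep)
    (map B.pack [username e, message e, date e, show (messageLength e)])

-- The decoder has to recognise both the old three-field records and the
-- new four-field ones, which is where "versioning" reappears.
decodeEntry :: B.ByteString -> Maybe Entry
decodeEntry bs = case map B.unpack (B.split sep bs) of
  [u, m, d]    -> Just (Entry u m d (length m))
  [u, m, d, n] -> case reads n of
    [(len, "")] -> Just (Entry u m d len)
    _           -> Nothing
  _            -> Nothing

-- Second scheme: one key per field, e.g. "messageLength.John".
fieldKey :: String -> String -> B.ByteString
fieldKey field user = B.pack (field ++ "." ++ user)

main :: IO ()
main = do
  let e         = Entry "John" "hello" "2009-06-22" 5
      oldRecord = B.intercalate (B.singleton sep)
                    (map B.pack ["John", "hello", "2009-06-22"])
      store     = Map.fromList
        [ (fieldKey "message" "John", B.pack "hello")
        , (fieldKey "messageLength" "John", B.pack "5")
        ]
  print (decodeEntry (encodeEntry e) == Just e)
  print (decodeEntry oldRecord)
  print (Map.lookup (fieldKey "messageLength" "John") store)
```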


Anyway, what I am saying is maybe just that it makes sense to give
State an established API,
instead of a new one, especially when the new one is strange.


Cheers Morten.

stepcut

Jun 22, 2009, 3:29:14 PM
to HAppS
On Jun 22, 12:29 pm, Lemmih <lem...@gmail.com> wrote:
> You could keep user names, messages and dates in separate maps. The
> Entry map would be virtual and constructed with parallel lookups in
> the field maps.

Right. That is similar to how you would use something like BerkeleyDB.
And that is basically how Facebook uses MySQL -- their tables only have
two columns.

But the proposal was for *one* universal map:

"Personally, I think I will write my application with one universal
map, and let State=Map ByteString ByteString."

Now, I happen to think BerkeleyDB is pretty nifty. And, as you are
well aware, there is even a Data.Map-like interface to BerkeleyDB on
hackage,

http://hackage.haskell.org/package/berkeleydb

But, that does not lessen my interest in happstack-state. After all,
using Map ByteString ByteString, is not a magical cure for having to
worry about serialization, versions, or migration. That berkeleydb
package is definitely the right solution for some sites though. It
might even be used in conjunction with happstack-state...

- jeremy

morten...@gmail.com

Jun 22, 2009, 4:07:49 PM
to HAppS
Jeremy,

I gave an example of how to put it in one map above using prefixes.
But multiple maps are fine as well.
No, you still have to worry about it. The question is just where in
the API it is sitting.
Do you think the current API is good for a data store?

stepcut

Jun 22, 2009, 4:40:25 PM
to HAppS
On Jun 22, 1:55 pm, "morten.kr...@amberbio.com"
<morten.kr...@gmail.com> wrote:

> Impossible as I see it. Both multimasters will have to continue.
> From the point of view of each of them, the other one is gone, so they
> have to continue serving requests.
> Both of them believe that. The checkpoints and event logs will now
> need a manual conflict resolution
> in order to repair the system.

There are many possible resolution mechanisms. There is not *one*
mechanism that is optimal for all sites. Examples:

1. one machine is designated the master. It will always serve
requests. The other machine will only serve requests when it can talk
to the master. Since the second machine serves no requests while the
network is partitioned, when the network heals, it can dump its
internal state and get the latest from the master. Obviously, if the
master is not accessible, then no requests are served.

2. neither machine will allow update events unless they can see each
other, but they can continue to process read-only queries.

3. If you have more than 2 machines, and the number of machines is
known/fixed, then if the network gets partitioned, only the group with
a majority of the members can respond to requests.

4. If the network gets partitioned the machines can talk to a third
party arbitrator which tells them which one should continue processing
requests.

5. you can design your application so that each partition is allowed
to keep operating, and when the network heals an automatic merge
operation is performed. (ie, design it so that merges never result in
conflicts).

6. a specific machine can be declared the designator in advance of a
partition. After the partition, whichever machines are in the group
with the designator can continue. If only the designator is lost, then
a new designator is elected.

Obviously, each of these solutions has pros and cons. The right
solution will be application specific. I don't see how this is
different for any other system. Either you have to handle it in your
application, or the system picks a solution and forces you to use it.
By default happstack could provide a system which uses a combination
of a designator and majority rule.
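
A rough sketch of what such a designator-plus-majority rule could look
like as a pure decision function. This is only an illustration, not
happstack code: NodeId, ClusterView and mayServeUpdates are made-up
names, and it assumes the cluster membership is fixed and the
designator is one of the members.

```haskell
import qualified Data.Set as Set

-- Hypothetical machine identifier; not a happstack-state type.
type NodeId = String

-- What a node knows when it must decide whether to keep serving updates.
data ClusterView = ClusterView
  { allNodes   :: Set.Set NodeId  -- fixed, known cluster membership
  , reachable  :: Set.Set NodeId  -- nodes visible in this partition, itself included
  , designator :: NodeId          -- machine chosen in advance as tie-breaker
  }

-- Number of known members that this partition can currently see.
liveCount :: ClusterView -> Int
liveCount v = Set.size (Set.intersection (reachable v) (allNodes v))

-- Mechanism 3: a strict majority of the known membership.
hasMajority :: ClusterView -> Bool
hasMajority v = 2 * liveCount v > Set.size (allNodes v)

-- Mechanism 6, simplified: is the designator inside this partition?
hasDesignator :: ClusterView -> Bool
hasDesignator v = designator v `Set.member` reachable v

-- One possible combination: a majority always continues, and an exact
-- even split is settled by whichever side holds the designator.
mayServeUpdates :: ClusterView -> Bool
mayServeUpdates v
  | hasMajority v = True
  | evenSplit     = hasDesignator v
  | otherwise     = False
  where
    evenSplit = 2 * liveCount v == Set.size (allNodes v)
```

With this rule at most one partition can accept updates at a time: two
partitions cannot both hold a strict majority, and in an even split
only one side contains the designator. Re-electing a lost designator
is left out of the sketch.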

> Only if you know the syntax to start with, and you are sure that
> Happstack-State will work,
> and you don't need to plan for a migration to another backend.

Yes, you will be able to learn Happstack much faster if you have a
strong grasp on the Haskell language. The promise of happstack is that
even if you have to do a bit of upfront work to learn the platform,
that investment will pay off many fold in the long term.

I don't understand the migration thing. Do people design their systems
for MySQL, but plan to migrate to Google's BigTable later?

There are no guarantees that any system will scale. Let's say you pick
MySQL because it will be 'easy' to migrate to Postgres if MySQL is too
slow. And then it turns out that all RDBMS are too slow and you have
to switch to BigTable. Are you really any better off than if you
started with happstack-state?


> > Compared to learning MySQL, learning happstack-state seems like a
> > breeze, yes?
>
> Even if you don't like MySQL, there is nothing in the API as strange
> and meaningless
> as
> type Dependencies HitCounter = End
> or
> entryPoint :: Proxy HitCounter
> entryPoint = Proxy

Right. INNER JOINS are completely obvious. You don't have to read any
documentation to get started.

> If a key/value Data.Map API was used for State, it would be seamless
> to write an
> application to State, and then replace it with another key/value store
> if State turned out to
> leak memory, or if it was decided to use the hard disk instead of pure
> memory.

Seamless? I highly doubt that, unless the product you are switching to
is designed to be a drop in replacement.


> At least two ways,
>
> key=username, value=username SEPARATOR message SEPARATOR date

This is basically serializing the Entry type into a String-based
format, yes?

> or a lot of key/value pairs
> Suppose the username="John"
>
> key = message.John   value = the message
> etc.

Another Serialization scheme.
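
To make that concrete, here is roughly what the second scheme amounts
to when written out. It is only a sketch under assumptions: the Entry
type and the putEntry/getEntry helpers are made up for illustration,
String stands in for ByteString, and it allows just one entry per
username.

```haskell
import qualified Data.Map as Map

-- The "one universal map" (String stands in for ByteString here).
type Store = Map.Map String String

-- Hypothetical guestbook entry.
data Entry = Entry
  { username :: String
  , message  :: String
  , date     :: String
  } deriving (Eq, Show)

-- Serialize: one key/value pair per field, keyed by "<field>.<username>".
putEntry :: Entry -> Store -> Store
putEntry e =
    Map.insert ("message." ++ u) (message e)
  . Map.insert ("date." ++ u) (date e)
  where
    u = username e

-- Deserialize: succeeds only if every field of the entry is present.
getEntry :: String -> Store -> Maybe Entry
getEntry u s =
  Entry u <$> Map.lookup ("message." ++ u) s
          <*> Map.lookup ("date." ++ u) s
```

The pair putEntry/getEntry is a hand-written serializer and
deserializer for Entry, just spread over several keys instead of one
string.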

> >  - what happens if we have a bunch of data in our guestbook, and then
> > we want to add a new field, messageLength, which contains the number
> > of characters in the message?
>
> In the first case, by appending to the end.
> In the second case by a new key/value

So, these are migration schemes. You also need some way to tell what
format the old data was using so you can migrate it to the new format.
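
For example, one ad hoc way to do that is to put a version tag in front
of every stored value and migrate on read. The sketch below is an
assumption-laden illustration, not the happstack-data API: the "v1" and
"v2" tags, the tab separator, and all function names are made up, and
it assumes the separator never occurs inside a field.

```haskell
import Data.List (intercalate)

-- Hypothetical guestbook entry in its current (version 2) shape.
data Entry = Entry
  { username      :: String
  , message       :: String
  , date          :: String
  , messageLength :: Int     -- the field added in version 2
  } deriving (Eq, Show)

-- Field separator; assumed never to appear inside a field.
sep :: Char
sep = '\t'

-- Split a stored value into its fields at every separator.
splitFields :: String -> [String]
splitFields s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitFields rest

-- New data is always written in the newest format, tagged "v2".
encodeEntry :: Entry -> String
encodeEntry e =
  intercalate [sep]
    ["v2", username e, message e, date e, show (messageLength e)]

-- Reading dispatches on the tag; old "v1" values are migrated on the fly.
decodeEntry :: String -> Maybe Entry
decodeEntry s = case splitFields s of
  ["v1", u, m, d]    -> Just (Entry u m d (length m))
  ["v2", u, m, d, n] -> case reads n of
                          [(len, "")] -> Just (Entry u m d len)
                          _           -> Nothing
  _                  -> Nothing
```

Every application that stores raw strings ends up carrying some
version of this tag-and-dispatch code for each type it persists.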

> Anyway, what I am saying is maybe just that it makes sense to give
> State an established API,
> instead of a new one, especially when the new one is strange.

So far I am not convinced. It's not like there is an existing key/
value API that all key/value stores use, such that you can just
replace one with another. Most of them have significant architectural
differences that permeate your entire code base. I don't think you are
going to be able to replace BerkeleyDB with BigTable 'seamlessly',
even though they are both key/value stores.

Additionally, the suggestion of providing just a Map ByteString
ByteString does not seem to really solve things like the fact that
you need to serialize and migrate data. What it mostly seems to do is
require each developer to come up with their own ad hoc mechanisms,
instead of using existing functionality.

It's possible this lowers the barrier to entry, but if that was our
chief concern, we would be writing PHP or Python. Our goal is
designing happstack so that a user who is *familiar with the platform*
can:

- prototype quickly
- deploy painlessly
- scale massively
- operate reliably
- change easily

It is our belief that these features will be compelling enough that
people will be willing to do a bit of learning to gain the benefits.

The current state API has gone through many iterations in an attempt
to deliver those features. And, it is likely to go through more.

As a counter argument, Ruby on Rails introduced an API that was very
different from what people were used to (php+mysql). Yet, it got very
popular. In fact it was the new API that made Ruby on Rails
attractive, because that API offered significant productivity gains.

- jeremy

stepcut

Jun 22, 2009, 5:25:36 PM6/22/09
to HAppS
On Jun 22, 3:07 pm, "morten.kr...@amberbio.com"
<morten.kr...@gmail.com> wrote:
> No, you still have to worry about it. The question is just where in
> the API it is sitting.
> Do you think the current API is good for a data store?

I think it is pretty good, though it can use incremental improvements.
There is not that much to it:

Serialize/Version
-------------------------

This is actually not part of happstack-state. It is part of happstack-
data, which happstack-state builds on top of. The Serialize/Version
instances are very much in line with the Data, Typeable instances that
many types provide. These instances are useful for many applications
which have nothing to do with happstack-state.

happstack-state
-----------------------

The basic unit of state is a Component. Each Component holds values of
a particular type. The type can be almost any valid Haskell type, with
a few restrictions. If you are familiar with SQL, then a component is
somewhat similar to a SQL database, and the type of a Component is
similar to the schema of a SQL database.

To perform a query on a Component, you simply write a normal Haskell
function that operates on the type stored in the Component. If your
query only needs to read values, then you can use the normal Reader
Monad to get access to the Component value. If you want to write as
well, then you use the normal State Monad.
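
As a minimal sketch of that style, using the plain mtl Reader and State
monads as stand-ins for whatever monads happstack-state actually runs
queries and updates in. The HitCounter type and the function names are
assumptions for illustration, and the glue code that registers them
with a Component is left out.

```haskell
import Control.Monad.Reader (Reader, asks, runReader)
import Control.Monad.State  (State, gets, modify, runState)

-- Hypothetical component type: the whole state is one counter.
newtype HitCounter = HitCounter { hits :: Int }
  deriving (Eq, Show)

-- A read-only query only needs Reader access to the component value.
peekHits :: Reader HitCounter Int
peekHits = asks hits

-- An update uses State: bump the counter and return the new total.
addHits :: Int -> State HitCounter Int
addHits n = do
  modify (\c -> c { hits = hits c + n })
  gets hits

-- Running both against an ordinary in-memory value.
currentHits :: Int
currentHits = runReader peekHits (HitCounter 3)

afterUpdate :: (Int, HitCounter)
afterUpdate = runState (addHits 2) (HitCounter 3)
```

Nothing here is specific to persistence; these are the same functions
you would write for an in-memory counter.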

There is also a bit of glue code that associates the queries with the
components. Because Template Haskell is better than it used to be,
some of this glue code could be eliminated.

So, overall, the thought patterns needed to write code which uses
happstack-state should be very familiar to a Haskell developer. You
simply declare normal Haskell types. And then you write normal
functions that operate in the normal State or Reader monads to operate
on those values.

The cool part is that you are writing code pretty much the same way
you would for a normal, single threaded, Haskell application. You use
the same type of data types that you normally would, and you write
normal functions that operate over those values. So, most of what you
already know about writing Haskell code should already apply. You
don't have to learn about database normalization, you don't have to
learn some other query language, you don't have to convert data from
some flat format to a nice pretty Haskell data type. You just write
very normal looking Haskell code, but, your code actually has
persistence, thread-safe transactions, ACID, and (in progress)
multimaster and sharding.

Haskell programmers know how to map a function over a Tree or list,
etc. Having to think about how to take the data structures they are
familiar with, split them into key/value pairs, and then apply the
functions they already know to key/value pairs does not seem like a
way to leverage their existing expertise. Additionally, even if they
become experts at thinking of things as key/value pairs, it does not
seem obvious that they will be faster at doing things that way. It
seems like more lines of code to do it that way.

So, yes, I think that happstack-state has a lot to offer in terms of
writing web applications quickly, reliably, etc.

I also do not think it is the optimal solution for all state
management. There are definitely cases where something like BerkeleyDB
would be more appropriate.

I also believe that if you have to switch persistent stores midstream,
you are in for a world of pain no matter what. If you are having
problems with your persistent storage, your problem is likely not that
you need it to go twice as fast. You probably need it to go 10-100
times as fast. And that probably means that you need a major
architecture change. At that point in time, exporting the old data (to
XML or something) and reimporting it is going to be a minor issue.

- jeremy

MightyByte

Jun 23, 2009, 6:55:11 AM6/23/09
to HA...@googlegroups.com

This post could provide the basis for some better Happstack
documentation. A Happstack book anyone?

Xabi Tapia

Jun 23, 2009, 7:04:53 AM6/23/09
to HA...@googlegroups.com
Why not a Happstack book project written in latex hosted on Patch-tag?

On 23 June 09 at 12:55, MightyByte wrote:

John MacFarlane

Jun 23, 2009, 12:05:00 PM6/23/09
to HA...@googlegroups.com
+++ Xabi Tapia [Jun 23 09 13:04 ]:

>
> Why not a Happstack book project written in latex hosted on Patch-tag?

What about a happstack book project hosted on a gitit wiki (running
on happstack)? Articles/chapters could be stored in a darcs repository,
and you'd have an up-to-date, browsable, searchable version on the web.
Gitit can export pages to LaTeX (as well as several other formats); in
fact, the dev version allows you to write pages in a restricted subset
of LaTeX instead of the default enhanced markdown. You can also write
pages in literate Haskell...

John

Matthew Elder

Jun 24, 2009, 2:37:14 AM6/24/09
to HA...@googlegroups.com
A book sounds great ;)
--
Sent from my mobile device

Need somewhere to put your code? http://patch-tag.com
Want to build a webapp? http://happstack.com