Re: [Yesod] Strengthening Yesod's RDBMS capabilities


Mats Rauhala

Aug 14, 2012, 7:58:49 AM
to yeso...@googlegroups.com
If you forget about schema handling, HaskellDB is an excellent tool. For
the schema and schema migration you should have some other tool available,
but for queries HaskellDB is great. It lets you bring the
database tier into your logic tier without getting in the way of your
queries.

As for the bugs and missing features, HaskellDB is in need of some
tender loving care. One of my troubles was getting the last inserted id,
but I think it was Chris Done who said it would be trivial to implement.
And if HaskellDB has active users, I believe there will also be some
advancement, if not from the maintainer, then perhaps from yourself? It's not
a difficult codebase.

As far as I see it, you have a couple of choices:

1. Direct SQL
- No, or very little, type safety
- No, or very little, compile-time checking
- Raw queries are ugly
+ Full power of SQL
2. ORM
+ Type safety (à la Persistent)
- No control over queries
- No proper join support (?, Persistent)
+ Abstract
+- High-level
3. Relational algebra
+ Type safety (à la HaskellDB)
+ Great control over queries
+ Good control over abstractions
+ It's algebra, therefore it fits Haskell (tongue in cheek)
- Forgotten (HaskellDB)


As for the schema and migrations, it's an interesting subject. If it were
me, I would probably just take the easy way out, at least until there is
a need for more: a numbered list of database updates which are run
sequentially. Keep a record of the latest update, and the next time you need to
update your schema, just start from `previous+1`.
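That numbered scheme fits in a few lines of Haskell. A minimal sketch (the `Migration` type and the `runPending` name are mine, illustrative only, not from any library):

```haskell
import Data.List (sortOn)

-- Each migration pairs a sequence number with the action that applies it
-- (in practice, running an ALTER TABLE or similar against the database).
type Migration = (Int, IO ())

-- Run every migration numbered above the last applied one, in ascending
-- order, and return the new latest number to be recorded.
runPending :: Int -> [Migration] -> IO Int
runPending latest migs = do
    let pending = sortOn fst [m | m@(n, _) <- migs, n > latest]
    mapM_ snd pending
    return (if null pending then latest else fst (last pending))
```

The caller persists the returned number (e.g. in a one-row bookkeeping table) and passes it back in on the next run.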

For the more interesting solutions, I think next comes a variant of
the previous one. Instead of a numbered list of updates, have a list
of updates with dependencies, and create a dependency solver which
runs the updates in the correct order. I created a proof-of-concept
solution in ~30-50 lines of code, nothing too bad.
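For reference, the dependency-driven variant really is only a handful of lines. A sketch under my own names and types (not the proof of concept mentioned above):

```haskell
import Data.List (partition)

-- A migration identified by name, listing the names it depends on.
data Migration = Migration
    { migName :: String
    , migDeps :: [String]
    } deriving (Show)

-- Order migrations so that each one runs after all of its dependencies.
-- Returns Nothing on a dependency cycle or a missing dependency.
solveOrder :: [Migration] -> Maybe [Migration]
solveOrder = go []
  where
    go done []   = Just done
    go done todo =
        case partition ready todo of
            ([], _)    -> Nothing            -- cycle or unknown dependency
            (ok, rest) -> go (done ++ ok) rest
      where
        ready m = all (`elem` map migName done) (migDeps m)
```

Repeatedly peeling off the migrations whose dependencies are already satisfied is just Kahn's topological sort; quadratic, but fine for migration-sized inputs.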

And then comes the DSL for entity declarations. I haven't done proper
research yet, only skimmed some materials, so take this with a grain of
salt. You need to separate the DSL into several parts. One part is the
syntax itself. Then comes the Haskell-side declaration. From this
declaration you need to be able to find the differences between the
database-tier version and your logic-tier version. If there are no
differences, create the initial schema; that's the easy part. If there are
differences, create the ALTER TABLE declarations. And even then, how do
you represent the different data types, constraints, encodings, engines,
etc.? I can't recall any ORM that could create a schema, handle
migrations, and not be limited in some way. It could be as simple as requiring
that keys be numeric and named 'id'. I have many situations
where not only is a natural key more sensible, but I might need to
declare multiple keys, or at least make multiple keys unique together.

--
Mats Rauhala
MasseR

Greg Weber

Aug 14, 2012, 10:24:21 AM
to yeso...@googlegroups.com
You should consider using Persistent to declare your schema and run migrations.
I think you could have HaskellDB generate SQL queries and have
Persistent execute them and return results in the form of RawSql.

Another approach to this whole issue is to use SQL strings, but to
verify that they are correct. This avoids the need to learn a new
abstraction at the cost of SQL strings being difficult to compose.
https://gitorious.org/persistent-hssqlppp

Vagif Verdi

Aug 14, 2012, 11:21:05 AM
to yeso...@googlegroups.com

On Tuesday, August 14, 2012 04:36:36 AM Tero Laitinen wrote:
> I'm intending to use an RDBMS ...

Which one?

Tero Laitinen

Aug 15, 2012, 12:59:26 AM
to yeso...@googlegroups.com
You managed to encourage me with HaskellDB, so I'm going to try it out. As long as I don't get any nasty surprises, I'm happy to leave SQL query generation to an existing library.
I would like to be aware of the table and column names, so if needed, I can write some stored procedures as well.

For now, I'm happy with the way Database.Persist handles migrations. The only thing I'd need in addition is support for creating and dropping indexes.

As for the database identifiers, integers are fine for me, but if a more general library is sought, it requires a more elaborate solution. I'm wondering what the overhead is of having a "redundant" numeric identifier alongside a multi-column unique index (which would be the natural primary key).

Tero Laitinen 

Tero Laitinen

Aug 15, 2012, 1:03:16 AM
to yeso...@googlegroups.com
Using Persistent for migrations sounds like a good idea, since Database.Persist already integrates so nicely with Yesod. I'll have a look at whether the SQL generated by HaskellDB is "compatible" with Database.Persist.

I'll try to avoid generating raw SQL myself as much as possible.

Tero Laitinen

Aug 15, 2012, 1:04:53 AM
to yeso...@googlegroups.com
I'm using PostgreSQL, but I can imagine that there are people out there using MySQL and other databases, so it might be better not to restrict the library to a certain database engine.

Mats Rauhala

Aug 15, 2012, 1:52:22 AM
to yeso...@googlegroups.com
In that case, check out this link too:
http://users.utu.fi/machra/posts/2011-07-15-haskelldb.html

--
Mats Rauhala
MasseR

Michael Snoyman

Aug 15, 2012, 11:45:42 AM
to yeso...@googlegroups.com


On Tue, Aug 14, 2012 at 2:36 PM, Tero Laitinen <tola...@gmail.com> wrote:
Hello,

I recently started using Yesod in a commercial context. So far I've been very pleased with the framework. However, I'm intending to use an RDBMS as the database backend, so the non-relational Database.Persist layer falls a bit short. HaskellDB seems more suitable for use with a relational database, but I read an alarming post (http://chrisdone.com/posts/2011-11-06-haskelldb-tutorial.html) commenting on its bugs, so I don't know yet which database layer to tap into.

I'll describe a bit of my use case, so my approach hopefully makes more sense. I'm building data-intensive applications that can have dozens or even hundreds of tables. I want to automate most database tasks like database migrations (including indexes), validation of fields' data, versioning, per-user-limited database views, etc. Also, I may need to access the database from a Python or C++ application, so building on top of Template Haskell might incur additional trouble. Finally, I have no idea how to write an EDSL in Haskell, so at the moment I'm considering a high-level DSL and a code generator to do the heavy lifting.

I imagine that there are a number of people who might benefit from a library that integrates Yesod more tightly with an RDBMS, so there could be potential for fruitful collaboration.

I did some preliminary hacking, experimenting with a DSL aimed at defining a database in a modular way. The parser for the work-in-progress DSL is at https://github.com/tlaitinen/enterdsl .
There's not much to see, so I'll explain the core ideas here.

The database schema is composed of database definition modules. Let's call them dbdef-files from now on. There are three kinds of objects in a dbdef-file: entities, relations, and interfaces. The first two are easier to understand with some background in the ER model, while the third one offers a polymorphic interface to a number of entities.

For example, the top-level dbdef-file can be the following.

import "module.dbdef"; -- entities and interfaces from "module.dbdef" can now be used
  
entity Bar {
    -- string field with a default value, and whenever
    -- the field value is changed, the function longer3 has to return True
    string name (default "foo", check longer3);

    -- one-to-one relation between Bar and Foo. The identifier 'bar' in parentheses 
    -- is "attached" to the entity "Foo", e.g. Foo.bar can be used to retrieve the 
    -- corresponding Bar.
    Foo foo (bar);

    -- defines a unique index with one field 
    unique index (name);
}
entity Foo {
    -- implements-keyword is used to inherit all fields and relations
    -- of an interface, see below
    implements Versioned;

    optional string name;
}
interface Versioned {
    -- this field is embedded in every table implementing the interface
    integer version;
}

-- the "relation" keyword typically specifies a many-to-many relationship
-- implemented as a single table, but when used with interfaces,
-- it can also be a one-to-many relationship.
-- If an interface is used in the relation, the relation is implemented 
-- using a number of tables - one for each implements-instance. 
relation Changes {
    -- in this case, there will be a table referencing Foo and ChangeRecord
    -- since Foo implements Versioned-interface.
    -- the notation [] in parentheses denotes the (many-)multiplicity of the relation.
    -- Foo.changes can now be used to retrieve all ChangeRecord's of a number of Foo entities
    Versioned versioned ([] changes);
    ChangeRecord change;
}

entity ChangeRecord {
    string field;
    string oldValue;
    string newValue;   
}

relation FooBar {
    Foo foo ([] fooBars);
    Bar bar (bar);
}
          
Please bear with the syntax. It was just easier to write a parser with explicit blocks and statements. The idea is that the database interfaces can also be used to perform operations on objects implementing an interface, e.g. 
 - selecting some objects implementing a specific interface
 - creating "business layers" on top of the database, e.g. defining a layer to update an entity implementing Versioned-interface or to display an entity implementing a Renderable-interface.  

I can already confess that I have not extensively searched for related work on this, so I would like to hear your comments on:
 - related work
 - how Yesod's RDBMS capabilities could be enhanced and whether there is any work-in-progress on this
 - which database layer should be used for this
 - whether template Haskell or a separate DSL is the way to go
 - potential pitfalls of the proposed entity/relation/interface paradigm
     
Best Regards,
Tero Laitinen 

I've actually been kicking around some ideas in my head for a more powerful SQL generation library built on top of Persistent. I considered trying to get it written for 1.0, but I ran out of time and didn't yet have a completely fleshed-out design. Here's a sketch of what I've been thinking of, maybe it will be a good basis for some brainstorming.

I first wanted to start off with a way to select individual fields. I was thinking of some kind of function that you could call like:

    rows <- runQuery $ singleTable (PersonName, PersonAge) [PersonName !=. "foo"] [Asc PersonAge]
    print (rows :: [(Text, Int)])

Next was going to be some way to perform joins between two tables based on explicitly stated IDs:

    let personQuery = singleTable PersonName [] []
        carQuery = singleTable EntireEntity [] []
        fullQuery = innerJoin personQuery carQuery PersonId CarOwner
    rows <- runQuery fullQuery
    print (rows :: [(Text, Car)])

That initial design doesn't really support querying the same table multiple times, which is a limitation I'd like to overcome. I'd also like to think about a way to implement aggregates and HAVING clauses. But as a core concept, I think Persistent's existing infrastructure might be able to get us pretty far.

Michael

Tero Laitinen

Aug 16, 2012, 5:53:41 AM
to yeso...@googlegroups.com
Thanks for the link. Do you have any tips on how to integrate Database.Persist and HaskellDB?

Tero Laitinen

Aug 16, 2012, 6:03:17 AM
to yeso...@googlegroups.com, mic...@snoyman.com


It's excellent news that you have already considered this issue.

In the case of tree-like relations (e.g. Person.parentId referencing another Person), joining the same table a second time would be needed. Is the code snippet you pasted functional to some degree? I also need outer joins for cases with optional attributes.

Tero 

Mats Rauhala

Aug 16, 2012, 6:56:26 AM
to yeso...@googlegroups.com
Not really. You need to somehow tell HaskellDB the schema, so you need
to keep it up to date. Then again, Persist creates the schema for you.
You can try having a pre-step that checks the schema and creates the
necessary files. [1] can create the HaskellDB schema files (DBInfo) for
you from what Persist makes; it's clumsy because it needs to be run
beforehand, but it's still probably your best bet.

[1] http://hackage.haskell.org/packages/archive/haskelldb/2.1.1/doc/html/Database-HaskellDB-DBSpec-DatabaseToDBSpec.html

--
Mats Rauhala
MasseR

Michael Snoyman

Aug 16, 2012, 8:16:39 AM
to yeso...@googlegroups.com
If by "functional" you mean that the code to implement it is already written: no. If you mean that it's reasonably simple to implement the code I described, I *think* the answer is yes. I also think that left, right, and outer joins should be easy to implement (though we should probably avoid right joins, since SQLite doesn't like them and they can always be replaced with left joins).

A basic idea I had for implementing multiple tables and aggregates: perhaps we need to stick this all in a Monad which provides a unique identifier for each time a table is used, and some unique name for each aggregate. Then perhaps we could build up more complicated queries with something like:

    person1 <- getTableName
    totalIncome <- getAggregateName $ SqlSum PersonIncome
    query1 <- mkQuery $ singleTable { fields = (PersonAge, totalIncome), having = [totalIncome >=. 50000] }
    ...

It's not really a fleshed-out idea, though.
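The unique-identifier idea above could be prototyped with a plain State monad handing out fresh aliases. A minimal sketch under my own names, using the mtl package (none of this is Persistent's actual API):

```haskell
import Control.Monad.State

-- A query-building monad whose state is a counter used to mint
-- unique SQL aliases for tables and aggregates.
type QueryM = State Int

-- Hand out a fresh alias derived from the given base name.
freshName :: String -> QueryM String
freshName base = do
    n <- get
    put (n + 1)
    return (base ++ show n)

-- Two uses of the same table get distinct aliases, so self-joins
-- (e.g. Person joined to Person) no longer collide.
selfJoinAliases :: QueryM (String, String)
selfJoinAliases = (,) <$> freshName "person" <*> freshName "person"
```

Here `evalState selfJoinAliases 0` yields `("person0", "person1")`; the same counter could name aggregates like the `totalIncome` above.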

Michael

Felipe Almeida Lessa

Aug 16, 2012, 8:42:35 AM
to yeso...@googlegroups.com
On Thu, Aug 16, 2012 at 9:16 AM, Michael Snoyman <mic...@snoyman.com> wrote:
> A basic idea I had for implementing multiple tables and aggregates: perhaps
> we need to stick this all in a Monad which provides a unique identifier for
> each time a table is used, and some unique name for each aggregate. Then
> perhaps we could build up more complicated queries with something like:
>
> person1 <- getTableName
> totalIncome <- getAggregateName $ SqlSum PersonIncome
> query1 <- mkQuery $ singleTable { fields = (PersonAge, totalIncome),
> having = [totalIncome >=. 50000] }
> ...
>
> It's not really a fleshed-out idea, though.

Nice! It reminds me of LINQ. Does anyone here have experience
using it? I've only read about it, never used it. Maybe persistent could
become Haskell's LINQ-to-SQL?

Cheers,

--
Felipe.

Tero Laitinen

Aug 16, 2012, 11:49:03 AM
to yeso...@googlegroups.com, mic...@snoyman.com
Okay, for the time being, I'll probably try to find a way to combine Persistent and HaskellDB. However, I'll be happy to try out new functionality in Persistent when proper support for joins is working. 
 
Tero