sqlpp11 - compile time SQL expressions instead of a string-based approach

2,192 views
Skip to first unread message

Roland Bock

unread,
Feb 16, 2014, 1:22:49 PM2/16/14
to data...@isocpp.org
Hi everybody,

I would like to contrast the string-based approach of N3886 and many others with sqlpp11 [1], an embedded domain specific language for SQL in C++:

sqlpp11 is a template library that creates SQL expression trees at compile time in a type-safe manner. These expression trees can be evaluated by connector libraries, which interact with the actual database.

Databases to connect to are in no way limited to SQL databases. Think LINQ[3]. You use sqlpp11 as a frontend to connect to pretty much anything. For instance, there are existing connectors to MySQL, SQLite3, but there is also an SQL interface to std::vector[2].

The benefits of the compile time expressions versus the string-based approach are huge in my opinion:

  1. Type safety during statement construction
  2. Type safety when accessing result rows
  3. Immediate feedback for incorrect queries at compile time instead of runtime during tests or even production
  4. The SQL statements are already present as an expression tree at compile time. There is no requirement for the database to parse an SQL string
  5. The presence of the compile time tree allows to transform statements at compile time for databases of many different kinds, no only SQL databases.


For instance, this code snipplet is taken from a test for SQL with std::vector:

    using container = std::vector<sample>;
    container data;

    sql::connection<container> db(data);
    TabSample tab;

    db.run(insert_into(tab).set(tab.alpha = 17));
    db.run(insert_into(tab).set(tab.beta = "cheesecake"));
    db.run(insert_into(tab).set(tab.alpha = 42, tab.beta = "hello", tab.gamma = true));
    db.run(insert_into(tab).set(tab.gamma = true));

    for (const sample& row: db.run(select(all_of(tab(tab))
                                     .from(tab)
                                     .where(tab.alpha < 18)))
    {
        std::cerr << "alpha=" << row.alpha << 
                   ", beta=" << row.beta << 
                   ", gamma=" << row.gamma << std::endl;
    }
The syntax is the same as with any "normal" SQL database, but here the SQL statements are transformed into functors at compile time, which do the appropriate things at runtime.

There is no formal proposal yet, but I believe that an approach with strongly typed compile time expressions is superior to any string based approach.


Regards,

Roland


[1]: https://github.com/rbock/sqlpp11
[2]: https://github.com/rbock/sqlpp11-connector-stl
[3]: http://en.wikipedia.org/wiki/Language_Integrated_Query

Klaim - Joël Lamotte

unread,
Feb 16, 2014, 1:51:18 PM2/16/14
to data...@isocpp.org

On Sun, Feb 16, 2014 at 7:22 PM, Roland Bock <rb...@eudoxos.de> wrote:
There is no formal proposal yet, but I believe that an approach with strongly typed compile time expressions is superior to any string based approach.

I believe that without a proposal, not a lot more can be done by this group for now (other than giving feedback on the library, like on the boost mailing list).
Therefore, are you currently working on that proposal or not yet? Or maybe you are looking for someone to help writing one? (I'm not available but making it clear might help finding someone here).

Johann Anhofer

unread,
Feb 16, 2014, 3:45:58 PM2/16/14
to data...@isocpp.org
A few questions/comments:

ad benefit 3.) This only helps against syntax errors in sql queries. It doesn't help against a lot of other problems which arise in combination with databases, at runtime (Missing tables, incorrect database schema versions, incorrect database versions, ...), and no untested sql statements should go into production code. 

ad benefit 4.) This sounds interesting, if I unterstand this correct. Does this mean you are able to access any relational database without SQL ? Only using your compile time expression tree? How does this work?

ad benefit 5.) There are no such things as "SQL databases", there are also no such things as "No SQL Databases". There are "relational  Databases" for which SQL is used to communicate with them, and there are highly optimized (for speed) key/value stores which use other interfaces. 

Question 1.) In your snippet, how are the types sample and TabSample defined?

Question 2.) Have plans to support any kind of SQL statements (e.g. Common Table Expressions), 
or only a limited subset?

The problem with Database abstractions at this level (like with entity frameworks) IMHO is, that
the user/programmer of such libraries loose the understanding of a relational database and how to
use it in a performant manner. So they would never replace SQL completely.


Regards, Johann

Roland Bock

unread,
Feb 17, 2014, 1:25:57 AM2/17/14
to data...@isocpp.org
On 2014-02-16 19:51, Klaim - Joël Lamotte wrote:

On Sun, Feb 16, 2014 at 7:22 PM, Roland Bock <rb...@eudoxos.de> wrote:
There is no formal proposal yet, but I believe that an approach with strongly typed compile time expressions is superior to any string based approach.

I believe that without a proposal, not a lot more can be done by this group for now (other than giving feedback on the library, like on the boost mailing list).
Sure, no decision can be made without a proposal, but the database topic is a complex one and discussions will help both in preparing and assessing proposals, IMHO. The discussions so far (e.g. on boost or std-proposals or at my day-job) have helped a lot in understanding requirements way beyond what I had in mind when I started development of sqlpp11.
It is important to me to get lots of different feedback before making a formal proposal for the standard and/or for boost.

I certainly hope that such discussions are useful to others as well.


Therefore, are you currently working on that proposal or not yet? Or maybe you are looking for someone to help writing one? (I'm not available but making it clear might help finding someone here).
I am not working on a proposal document yet, and I would be very glad to find someone to help writing one :-)

Regards,

Roland


Roland Bock

unread,
Feb 17, 2014, 1:54:42 AM2/17/14
to data...@isocpp.org
Hi Johann,

Thanks for your detailed feedback and questions!


On 2014-02-16 21:45, Johann Anhofer wrote:
A few questions/comments:

ad benefit 3.) This only helps against syntax errors in sql queries. It doesn't help against a lot of other problems which arise in combination with databases, at runtime (Missing tables, incorrect database schema versions, incorrect database versions, ...), and no untested sql statements should go into production code.
True, and if test system and production system are out of sync, then even testing doesn't help much. There is no silver bullet. But the compiler can help reducing the code/test cycles by telling you about incorrect statements before they enter testing.
It also helps if for whatever reason there are no tests. This shouldn't happen, but it does.



ad benefit 4.) This sounds interesting, if I unterstand this correct. Does this mean you are able to access any relational database without SQL ? Only using your compile time expression tree? How does this work?
The important point is: You don't need a string to express and SQL statement. The string is just one possible representation. The string is a serialized form of the SQL statement (you serialize the expression into a string) which needs to be parsed afterwards by the database.
With a concept like sqlpp11 you don't serialize into a string but construct an expression tree instead. This tree can be interpreted, even at compile time. That's why I wrote the SQL-interface to std::vector. It shows how to rewrite the expression tree at compile time into combination of functors which handle INSERT and SELECT on a std::vector. No SQL is parsed at runtime in that code.

Thus, the expression trees make you more flexible, for instance:

  • You can serialize the tree into a string, if your database backend requires a string. But even in this step, the tree has a major benefit: sqlpp11 comes with a default serializer, but database connectors can override it for those parts, where their database expects strings that differ from the standard. For instance, MySQL uses the CONCAT method instead of the ||-Operator. So if the expression tree contains string concatenation, the MySQL serializer can translate it in a way that the database comprehends.
    As a result, the developer does not have to care so much about vendor specifics.
  • You can translate the expression tree at compile time
  • You can translate the expression tree at runtime

In the latter cases, you skip the serialization/parsing steps which improves runtime performance and lowers the barrier for using SQL syntax with databases which do not have a string-based SQL interface like containers or XML or JSON, etc.

Of course, someone has write libraries to perform this translation of expression trees. This does not happen automatically. But it is much simpler than I anticipated a few months ago. I had a first version of the partial SQL-interface for std::vector ready in about a day.



ad benefit 5.) There are no such things as "SQL databases", there are also no such things as "No SQL Databases". There are "relational  Databases" for which SQL is used to communicate with them, and there are highly optimized (for speed) key/value stores which use other interfaces.
Right. I meant databases with an SQL interface.
But there are a lot of databases and in memory structures that have operations that correspond to aspects of SQL. A library like sqlpp11 can be used to create (partial) SQL interfaces to those functions.




Question 1.) In your snippet, how are the types sample and TabSample defined?
sample is a simple struct:
struct sample
{
    int64_t alpha;
    std::string beta;
    bool gamma;
};
The class TabSample describes this struct as a database table to the library sqlpp11. I attached the header file for TabSample. Most of it is boilerplate which could easily be generated by a simple macro or a small code generator.

You can also find the slowly improving SQL<->std::vector library along with the examples and table definitions here:
https://github.com/rbock/sqlpp11-connector-stl/tree/develop (the develop branch is slightly more advanced than the master).

BTW: You could add an SQL interface to git and then do stuff like

select(commit.id).from(commits).where(commit.branch == 'develop' and commit.message.like('vector')).

That shouldn't be hard to do, as well.



Question 2.) Have plans to support any kind of SQL statements (e.g. Common Table Expressions), 
or only a limited subset?
I intend to support any kind of SQL statement, including vendor specific extensions. The level of support will vary, though:

The library will support a major subset "natively". It also allows for extensions in quite a few ways. For instance, you can add unknown functions or datatypes with rather little overhead.

Also (this is not done yet and one reason I haven't started to write a proposal yet), there will be support for extending the structure of statements, e.g. Hierarchical Queries for Oracle. The extensions themselves would not be part of sqlpp11, but the ability to extend the syntax would.



The problem with Database abstractions at this level (like with entity frameworks) IMHO is, that
the user/programmer of such libraries loose the understanding of a relational database and how to
use it in a performant manner. So they would never replace SQL completely.
I believe there is a misunderstanding. sqlpp11 is not an abstraction layer or an ORM. It just offers a different way of expression SQL statements. You still have to deal with all the gory details like how to join your tables, which condition to put into the WHERE and which condition to put into the HAVING part, for example. sqlpp11 does not relieve you of any of those decisions.

But it can tell you at compile time if your syntax is wrong, or if you forgot to add a table to the FROM, or if your database does not support one of the features you are using. It also helps to write vendor neutral code (where there are equivalent but slightly different implementations between vendors) or to isolate vendor specific code, which makes it easier to migrate.

It most likely improves performance, if the parsing of strings can be omitted. And it makes it easier to use SQL on databases, data structures, streams, etc, which do not have a "natural" SQL interface.


Sorry, if the answers are a bit lengthy. I'm still struggling with finding the best way to explain how these expression trees can be utilized :-)


Regards,

Roland

--
You received this message because you are subscribed to the Google Groups "SG11 - Databases" group.
To unsubscribe from this group and stop receiving emails from it, send an email to databases+...@isocpp.org.
To post to this group, send email to data...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/databases/.
For more options, visit https://groups.google.com/a/isocpp.org/groups/opt_out.

TabSample.h

Csaba Csoma

unread,
Feb 17, 2014, 1:42:38 PM2/17/14
to data...@isocpp.org
The "ad benefit 3.)" comment is important: schema management is not optional.

For example the Scala/Lift Mapper has a "Schemifier.schemify" function.
You can call this when your application starts - it will figure out the difference between the schema encoded in your app and the actual database.
It will emit the requires SQL commands if there's any difference.

So ideally in your C++ code you could just a add a new field to Users class, called "timestamp FirstContacted", build and run.
The "schemifier" would take care of "ALTER TABLE users ADD COLUMN first_contacted timestamp" query.

This way you can have relationships. data types and widths, etc all in your code, no additional tool required.

Csaba

Roland Bock

unread,
Feb 17, 2014, 4:25:48 PM2/17/14
to data...@isocpp.org
On 2014-02-17 19:42, Csaba Csoma wrote:
The "ad benefit 3.)" comment is important: schema management is not optional.

For example the Scala/Lift Mapper has a "Schemifier.schemify" function.
You can call this when your application starts - it will figure out the difference between the schema encoded in your app and the actual database.
Finding differences is certainly possible. And I agree that that is a very important aspect.


It will emit the requires SQL commands if there's any difference.
Nice idea, it sounds cool but also bit scary :-)

For instance, today, I had to "move" some columns from one table into a new table, using ALTER TABLE T..., UPDATE T JOIN SELECT T..., CREATE TABLE N, insert into N..select, ALTER TABLE T. There is no way the software can guess what to do just by analyzing the difference.

Also, there might be several services operating on the same database. What if one service is updated and finds a non-matching database. Should it modify the database (leading to undefined behavior in the other services) or should it report the inconsistency and halt?

Personally, I'd prefer the second option: Report the problem and let the admin figure it out.

Can you give me an example of a setup where the automated update of the database would be preferable?



So ideally in your C++ code you could just a add a new field to Users class, called "timestamp FirstContacted", build and run.
The "schemifier" would take care of "ALTER TABLE users ADD COLUMN first_contacted timestamp" query.

This way you can have relationships. data types and widths, etc all in your code, no additional tool required.
I guess it would be possible to do that but it would only work if you really just want to modify your tables without any complex logic for initializing new columns for instance, right?

So I guess you would call schemify() in some programs and not in others...

Hmm. Another question that comes to mind: How would schemify "know" which tables to check? As of now, there is no registry for tables (although that could be changed easily in the table classes).


Regards,

Roland

Csaba Csoma

unread,
Feb 17, 2014, 6:31:34 PM2/17/14
to data...@isocpp.org
Roland,

> Nice idea, it sounds cool but also bit scary

That's why you call the schemify function and it does not just happen automatically.

Of course moving fields and other advanced changes you still have to do manually - but that does not happen often.
In past 3-4 years maybe we had 2 manual changes and hundreds of automatic updates.
"add new field x", "increase to string 50" etc are all automatic schema migrations.

Especially with the newer API based architectures (Amazon, Netflix) it makes sense the API itself to take care of the database too.
There's just one "authority", no multiple apps changing the same schema.

Report the problem and let the admin figure it out.

Yes, you could report the problem and let somebody else figure it out, but that's not developer friendly.
Solving most of the issues and leaving only just the few exception to manual intervention should be the way to go.
Manual changes are error prone, we should avoid as much as possible, especially when there's a trivial automation.

BTW, that means there MUST be a way to manually override whatever the library can't handle.
For example check if column X exists and if not, execute an SQL command like the move you described.
Now just deploy your app to development server, test, staging, production etc and takes care of itself.

 just want to modify your tables without any complex logic for initializing new columns

There's no reason why you can't have any complex logic as a "Default" string that gets handed over to the database as it is.

How would schemify "know" which tables to check?

In Lift Schemify expects a list of tables, does not touch anything that's not on this list.
It means an app can be "in charge" only for a part of the database and just consume the other tables.

Csaba

Roland Bock

unread,
Feb 17, 2014, 11:35:20 PM2/17/14
to data...@isocpp.org
On 2014-02-18 00:31, Csaba Csoma wrote:
Roland,

> Nice idea, it sounds cool but also bit scary

That's why you call the schemify function and it does not just happen automatically.

Of course moving fields and other advanced changes you still have to do manually - but that does not happen often.
In past 3-4 years maybe we had 2 manual changes and hundreds of automatic updates.
"add new field x", "increase to string 50" etc are all automatic schema migrations.
The ratio is a bit different for me, but it certainly makes sense to have automated updates for certain cases.

I've added schemify and schema_report to my todo list.

Thanks and regards,

Roland

Johann Anhofer

unread,
Feb 19, 2014, 8:25:17 AM2/19/14
to data...@isocpp.org
How do I build the library?

If I try to build the library, I got an error from the ddl2cpp script. (I've called cmake . and then make)

It yells "ImportError: No module named pyparsing"

On my Ubuntu 13.10 with gcc 4.8.1, python 2.7.5 and cmake 2.8.11.2 

Any ideas?

Johann

William La Cholter

unread,
Jun 29, 2014, 5:22:33 PM6/29/14
to data...@isocpp.org

I think a query builder could be a great thing. And if fused to a database back-end could enable code considerably faster than the current method of client: generate text -> client: bind values -> server: parse text -> server: extract values -> server execute query

High-level comments: in a similar vein to other feedback, and without downloading the code yet:
  • I think there needs to be support for SQL dialects, represented as separate type collections (e.g., postgresql::statement, postgresql::database, sqlite::database, etc.). There may be a base dialect from which the others can be built (helps with reuse and reduces burden to add a dialect).
  • If/when there's reflection and an externalized representation of column, table, view, etc. names, then the query expressions should also be constructed from them (e.g., insert<Reflection>(stmt, source) could be replaced by insert<Reflection>(database, source), where the statement could be created from a query expression, which is created from the Reflection).
  • I'd be very interested in seeing a full DSL, akin to Boost Fusion, but I'm not sure that something like that could ever fly at the C++ Standards level.
  • Does this work with expressions being deferred? E.g., does ".where(tab.alpha < 18)" result in an expression functor that renders "WHERE $1 < 18"?
  • Ultimately it'd be nice to have the compiler generate the query test. I'm not sure that there's a sensible approach with constexpr (there have been lots of suggestions on text processing at compile-time, but they are messy).

Roland Bock

unread,
Jun 30, 2014, 1:04:34 AM6/30/14
to data...@isocpp.org
That is one of the great potential benefits indeed. The expression tree can be translated directly into code understood by the back-end. I have not gotten to do that with any "classic" database yet, but I have a proof of concept for inserting/selecting structs to/from a vector [1]. Here the SQL expression tree is translated at compile time into a combination of functors that perform the insert/select.


High-level comments: in a similar vein to other feedback, and without downloading the code yet:
  • I think there needs to be support for SQL dialects, represented as separate type collections (e.g., postgresql::statement, postgresql::database, sqlite::database, etc.). There may be a base dialect from which the others can be built (helps with reuse and reduces burden to add a dialect).
Right. I just released that :-)
sqlpp11 aims to represent the standard, but you can easily change the structure of statements to support certain dialects. All of this at compile time, of course, so that the compiler can tell you if you are writing correct statements for your current database.

For example, this is how a blank select statement is defined:
  template<typename Database>
    using blank_select_t = statement_t<Database,
          select_t,
          no_select_flag_list_t,
          no_select_column_list_t,
          no_from_t,
          no_extra_tables_t,
          no_where_t<true>,
          no_group_by_t,
          no_having_t,
          no_order_by_t,
          no_limit_t,
          no_offset_t>;

Behind the curtains, when you declare the columns you want to select, the no_select_column_list_t is replaced by an appropriate type, when you declare which tables you want to select from, the no_from_t is replaced, etc.

Each part of a statement requires some boilerplate code but is actually quite simple to write so that you can easily add vendor specific features to, say postgresql::blank_select_t.

Removing features is even simpler: You can specify in the interpreter (the template that translates the expression tree into whatever your database needs) if a feature is not supported.

Similar, if you are using an intermediate string representation, this representation might differ from the standard in the dialect your database supports. This can also be taken care of in the interpreter.


  • If/when there's reflection and an externalized representation of column, table, view, etc. names, then the query expressions should also be constructed from them (e.g., insert<Reflection>(stmt, source) could be replaced by insert<Reflection>(database, source), where the statement could be created from a query expression, which is created from the Reflection).
To be honest, I am having trouble to grok that. Can you elaborate on what Reflection would contain?


  • I'd be very interested in seeing a full DSL, akin to Boost Fusion, but I'm not sure that something like that could ever fly at the C++ Standards level.
Well, who can be sure about the future? I am certainly willing to try! And I am looking for people to join that quest :-)

I am also going to give talks about sqlpp11 at CppCon in September and at MeetingC++ in Dezember and the feedback I received up front is quite encouraging.


  • Does this work with expressions being deferred? E.g., does ".where(tab.alpha < 18)" result in an expression functor that renders "WHERE $1 < 18"?
Hmm? For prepared statements, you can write

.where(t.alpha == parameter(t.alpha))

This will translate to "WHERE tab_sample.alpha == $1" (or whatever your database requires). And then you can set the parameter by name (type checked, of course), e.g.

auto s = select(all_of(t)).from(t).where(t.alpha == parameter(t.alpha));
auto p = db.prepare(s);
p.params.alpha = 7;
for (const auto& row : db(p))
{
  std::cout << row.alpha << std::endl;
}
Is that what you are looking for? Otherwise please explain in more detail :-)


  • Ultimately it'd be nice to have the compiler generate the query test. I'm not sure that there's a sensible approach with constexpr (there have been lots of suggestions on text processing at compile-time, but they are messy).
If you want to use a text representation that would be nice, of course. But I have no idea how to do that at compile time. Maybe there is a way to use boost karma?


Best regards,

Roland

Roland Bock

unread,
Jun 30, 2014, 1:07:40 AM6/30/14
to data...@isocpp.org
On 2014-06-30 07:04, Roland Bock wrote:
On 2014-06-29 23:22, William La Cholter wrote:

I think a query builder could be a great thing. And if fused to a database back-end could enable code considerably faster than the current method of client: generate text -> client: bind values -> server: parse text -> server: extract values -> server execute query
That is one of the great potential benefits indeed. The expression tree can be translated directly into code understood by the back-end. I have not gotten to do that with any "classic" database yet, but I have a proof of concept for inserting/selecting structs to/from a vector [1]. Here the SQL expression tree is translated at compile time into a combination of functors that perform the insert/select.

Sorry, forgot to add the link:

[1] https://github.com/rbock/sqlpp11-connector-stl
Reply all
Reply to author
Forward
0 new messages