Google Bigquery support

936 views
Skip to first unread message

rla...@fastly.com

unread,
Aug 15, 2016, 9:01:53 AM8/15/16
to sqlalchemy
I would be interested in contributing towards implementing a dialect for BigQuery. Is there a minimal dialect implementation that I could use as a guide on how to start doing this for BigQuery?

Thanks
Raul

Mike Bayer

unread,
Aug 15, 2016, 9:54:11 AM8/15/16
to sqlal...@googlegroups.com
I took a peek at BigQuery and the first issue is that it does not seem
to have a pep-249-style (https://www.python.org/dev/peps/pep-0249/)
DBAPI available, looking at google's docs I just see this:
https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/python/latest/
. So the first step would likely be creating some kind of DBAPI-like
facade that approximates the connection / cursor / execute semantics of
pep 249. That would make construction of a SQLAlchemy dialect more
straightforward. For reflection it looks like the APIs at
https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/python/latest/bigquery_v2.tables.html
would handle that.

BigQuery does not seem to use SQL for inserts/updates/deletes and
instead has direct API commands. It still may be possible to
approximate these with a SQLAlchemy dialect, if the SQL compiler here
returned special command tokens (or just python functions) rather than
strings which were then interpreted by the dialects execution mechanics
to route those statements into the specific API calls, rather than
SQL-strings (see the CALCHIPAN example below for one way to do this).

I think a SQLAlchemy dialect here is likely feasible but it would be
obviously extremely limited in its behaviors. To start this, I'd
first start with the README for writing dialects:

https://bitbucket.org/zzzeek/sqlalchemy/src/94a95b3e8fa0e6c8f9201f85a5f119cd424d72ac/README.dialects.rst?fileviewer=file-view-default

that will point you to a very undeveloped dialect for MS Access as the
"example", but that example is only to illustrate file layout, not
implementation. For implementation, I'd first familiarize roughly with
a "normal" dialect, e.g. look in
https://bitbucket.org/zzzeek/sqlalchemy/src/94a95b3e8fa0e6c8f9201f85a5f119cd424d72ac/lib/sqlalchemy/dialects/?at=master
and then check out sqlite, mysql, and postgresql a little bit.

Then, because you're writing a very "alternative-" style dialect, maybe
look at some of the more weird ones externally, which are listed at
docs.sqlalchemy.org/en/latest/dialects/index.html#external-dialects. In
particular you can check out my own CALCHIPAN dialect
https://bitbucket.org/zzzeek/calchipan/ which is a dialect for Pandas
dataframes - the Google BigQuery API here would be a little bit like
this in some ways (though not as complicated). In particular with
CALCHIPAN you can see how the SQL compiler generates callable functions,
rather than strings, in order to invoke Pandas APIs in response to
SQLAlchemy Core elements, rather than sending SQL strings to a database.
It also makes a "fake" pep-249 DBAPI so you can see how that looks too.

I think this is doable so let me know how it goes!







>
> Thanks
> Raul
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

rla...@fastly.com

unread,
Aug 15, 2016, 12:11:33 PM8/15/16
to sqlalchemy
Hey Mike,
Thanks for the thoughtful response. Will have a look in the resources you suggested.

Many thanks!
Raul

rla...@fastly.com

unread,
Aug 19, 2016, 6:37:36 AM8/19/16
to sqlalchemy
Hello all,

I have implemented the first stab at a PEP 249 adaptor layer for BigQuery and it seems to work well. It is possible to create an engine/connection/cursor, submit an SQL query and get results back (only SELECT statements for now, API commands will come later). I have moved on to changing the DDL and statement compilers to conform to BigQuery's standard SQL dialect:


I hit an immediate hurdle and I am not sure if it is because the Dialect object I created is incorrect or because the PEP 249 adapter is behaving in an unexpected way. Basically, upon connection SQL alchemy will fire some test queries (afaiu to detect whether column names support unicode), one of which is being rendered as 

SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1


The problem is that BigQuery does not support VARCHAR. I have already added a colspecs dictionary to my new dialect object, with many common data types mapped to their BgQuery equivalents:


colspecs = {

        types.Unicode: BQString,

        types.Integer: BQInteger,

        types.SmallInteger: BQInteger,

        types.Numeric: BQFloat,

        types.Float: BQFloat,

        types.DateTime: BQTimestamp,

        types.Date: BQTimestamp,

        types.String: BQString,

        types.LargeBinary: BQBytes,

        types.Boolean: BQBoolean,

        types.Text: BQString,

        types.CHAR: BQString,

        types.TIMESTAMP: BQTimestamp,

        types.VARCHAR: BQString

    }


I was under the impression that this would be enough to define a behaviour where sqlalchemy queries using e.g. VARCHAR would be compiled using the BQString class, which should render as 'STRING' as defined in its get_col_spec method. This is in accordance to BigQuery's basic type system:


However, the query renders as above and the underlying PEP 249 throws an exception. I could however envisage a type of operation where 2 queries a fired to the DB, one using VARCHAR and another using unicode, in order to detect which one succeeds. If this is the case, maybe the query rendering is fine and the problem is the underlying library returning an exception instead of some standard failure signal.

Does anybody know if

1) defining a colspecs object as above will be enough for objects of e.g. types.VARCHAR to be rendered as e.g. 'STRING' in generated SQL?
2) Does the underlying PEP 249 implementation need to signal failure in a particular way, or is throwing exceptions the expected behaviour?


Thanks,
Raul

Mike Bayer

unread,
Aug 19, 2016, 10:14:00 AM8/19/16
to sqlal...@googlegroups.com


On 08/19/2016 06:37 AM, rla...@fastly.com wrote:
> Hello all,
>
> I have implemented the first stab at a PEP 249 adaptor layer for
> BigQuery and it seems to work well. It is possible to create an
> engine/connection/cursor, submit an SQL query and get results back (only
> SELECT statements for now, API commands will come later). I have moved
> on to changing the DDL and statement compilers to conform to BigQuery's
> standard SQL dialect:
>
> https://cloud.google.com/bigquery/sql-reference/query-syntax
>
> I hit an immediate hurdle and I am not sure if it is because the Dialect
> object I created is incorrect or because the PEP 249 adapter is behaving
> in an unexpected way. Basically, upon connection SQL alchemy will fire
> some test queries (afaiu to detect whether column names support
> unicode), one of which is being rendered as
>
> SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
>
>
> The problem is that BigQuery does not support VARCHAR. I have already
> added a colspecs dictionary to my new dialect object, with many common
> data types mapped to their BgQuery equivalents:


The method that's calling this test and others is in
sqlalchemy/engine/default -> DefaultDialect.initialize(). You should
override that whole method and do away with all the things it's trying
to check there, as calchipan does:

https://bitbucket.org/zzzeek/calchipan/src/86ef380c572b9c1b8186278446a9b4952a538f97/calchipan/base.py?at=master&fileviewer=file-view-default#base.py-45

Although I would say that on the SQLAlchemy side,
_check_unicode_returns() should likely be a method that can raise
NotImplementedError() individually like the rest of the tests called
within the base initialize().

rla...@fastly.com

unread,
Aug 22, 2016, 5:21:49 AM8/22/16
to sqlalchemy
Thanks! Will do.
All the best
Raul

rla...@fastly.com

unread,
Aug 25, 2016, 11:26:22 AM8/25/16
to sqlalchemy
Hello all,

I overrided that method as you suggested, and included placeholders for the other ones being overloaded by calchipan. The reflection functions now work, and I decided to use the same mechanism as calchipan for DDL compilation (instantiating a Resolver object that is then passed to the dbapi implementation). Hoever I've hit another snag that maybe you guys have seen before.

The current state of the code can be checked out here:


To start filling in the CREATE functionality, I'm using this toy example:


metadata = MetaData(engine)

users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('age', Integer),
    Column('fullname', String),
)
users.create()

I expected this to trigger visit_create_table in the DDL compiler, which would return a Resolver object that would be provided to the execute method in the Cursor implementation in dbapi. However, this is not happening - instead, it seems that the default execute method is being triggered, instead of that in dbapi. Since that method expects a string instead of the Resover object, it complains. The exception I am getting is:


Traceback (most recent call last):

  File "./test_bq.py", line 39, in <module>

    users.create()

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/sql/schema.py", line 747, in create

    checkfirst=checkfirst)

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1920, in _run_visitor

    conn._run_visitor(visitorcallable, element, **kwargs)

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1529, in _run_visitor

    **kwargs).traverse_single(element)

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/sql/visitors.py", line 126, in traverse_single

    return meth(obj, **kw)

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/sql/ddl.py", line 767, in visit_table

    include_foreign_key_constraints=include_foreign_key_constraints

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/engine/base.py", line 947, in execute

    return meth(self, multiparams, params)

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/sql/ddl.py", line 68, in _execute_on_connection

    return connection._execute_ddl(self, multiparams, params)

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1004, in _execute_ddl

    compiled

  File "/Users/rlanda/Workspace/Python/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1125, in _execute_context

    util.text_type(statement), parameters,

TypeError: coercing to Unicode: need string or buffer, CreateTableResolver found



Any ideas on how invoke the correct execute method?

Thanks again!
Raul



On Friday, August 19, 2016 at 3:14:00 PM UTC+1, Mike Bayer wrote:

Mike Bayer

unread,
Aug 25, 2016, 4:28:20 PM8/25/16
to sqlal...@googlegroups.com


On 08/25/2016 11:26 AM, rla...@fastly.com wrote:
> Hello all,
>
> I overrided that method as you suggested, and included placeholders for
> the other ones being overloaded by calchipan. The reflection functions
> now work, and I decided to use the same mechanism as calchipan for DDL
> compilation (instantiating a Resolver object that is then passed to the
> dbapi implementation). Hoever I've hit another snag that maybe you guys
> have seen before.
>
> The current state of the code can be checked out here:
>
> https://github.com/rlanda/sqlalchemy-bigquery/tree/master/sqlalchemy_bigquery
>
> To start filling in the CREATE functionality, I'm using this toy example:
>
>
> metadata = MetaData(engine)
>
> users = Table('users', metadata,
> Column('id', Integer, primary_key=True),
> Column('name', String),
> Column('age', Integer),
> Column('fullname', String),
> )
> users.create()
>
> I expected this to trigger *visit_create_table *in the DDL compiler,
> which would return a *Resolver* object that would be provided to the
> *execute* method in the *Cursor* implementation in *dbapi*. However,
that text_type() call is inside of the part where it's trying to throw
an exception. you'll want to give your Resolver object a __str__()
method so that unicode() reports something meaningful.
> > an email to sqlalchemy+...@googlegroups.com <javascript:>
> > <mailto:sqlalchemy+...@googlegroups.com <javascript:>>.
> > To post to this group, send email to sqlal...@googlegroups.com
> <javascript:>
> > <mailto:sqlal...@googlegroups.com <javascript:>>.
> <https://groups.google.com/group/sqlalchemy>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
Reply all
Reply to author
Forward
0 new messages