psycopg2 or pg8000

4,536 views
Skip to first unread message

Marco Túlio Cícero de M. Porto

unread,
May 1, 2013, 7:59:26 PM5/1/13
to web...@googlegroups.com
Are there any advantages on one or another or are they basically the same thing?
I'm using psycopg2 atm.

--
[]'s
Marco Tulio

Ovidio Marinho

unread,
May 1, 2013, 8:13:06 PM5/1/13
to web...@googlegroups.com

      


         Ovidio Marinho Falcao Neto
                 Web Developer
             ovid...@gmail.com 
               83   8826 9088 - Oi
               83   9336 3782 - Claro
                        Brasil
              


2013/5/1 Marco Túlio Cícero de M. Porto <mtcp...@gmail.com>

--
 
---
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Marco Túlio Cícero de M. Porto

unread,
May 1, 2013, 8:52:14 PM5/1/13
to web...@googlegroups.com
First link (the comparison), was written on May 20 2008 (5 years ago), so I believe a few things have changed from that point to now.

Second link does not provide a comparison that could help. Although it's interesting in itselft.

Thanks anyway for your time and help.

Cheers,
Marco Tulio


2013/5/1 Ovidio Marinho <ovid...@gmail.com>



--
[]'s
Marco Tulio

Marin Pranjić

unread,
May 2, 2013, 4:02:26 AM5/2/13
to web2py-users
I remember pg8000 having some bugs and I'm not sure if they are fixed.

psycopg2 should be better choice

Marin


--

Massimo Di Pierro

unread,
May 2, 2013, 9:13:55 AM5/2/13
to web...@googlegroups.com
I do not remember what the bugs were but they were related to escaping and this poses a security issue. It has not been fixed to my knowledge.

Mariano Reingart

unread,
May 2, 2013, 11:35:58 AM5/2/13
to web...@googlegroups.com
There is new activity in the "original" pg8000 repo, but AFAIK some of
my changes where not merged, and others where not addressd, but you
could try that:

https://github.com/mfenniak/pg8000

About escaping, the difference with psycopg2 is that pg8000 uses the
more advanced query protocol (Prepared Statement) that don't requires
escaping and inlining the parameters in the SQL query (so, in theory,
it is even safer than psycopg2. and more straightforward as it allows
binary parameters directly).
Another advantages of de prepared statement protocol is that it could
be parsed/analyzed once, and executed many times, that should be
faster in some circumstances.

IIRC, psycopg2 uses the simple query protocol, with inlined parameters
directly in the SQL.
It has a mogrify function that escapes the parameters, that should be
migrated to pg8000 to avoid this kind of issues, or better, just use
the DBAPI call with parameters, so no scaping will be required (IIRC,
this is not supported by web2py right now).
Sadly, many connectors have different implentation of the DBAPI, so it
would not be trivial to adapt, see for example:
http://www.python.org/dev/peps/pep-0249/#paramstyle

Also, psycopg2 uses libpq (the postgresql client C library), so it
will be a lot faster (but a bit harder to install in some scenarios).
Its depends highly on which kind of application you're writting (for
simple apps, there wouldn't be a significative difference).

BTW, if you have a test case with web2py, I could help to fix the
issues with pg8000.

Best regards,

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Marco Túlio Cícero de M. Porto

unread,
May 2, 2013, 1:54:56 PM5/2/13
to web...@googlegroups.com
Thanks everyone for your time and knowledge. 

Mariano, I think I'll make a test server where I can compare both drivers. When I have that done, if I have any problems, would it be ok to ask for your help?


See you around.
Marco Tulio


2013/5/2 Mariano Reingart <rein...@gmail.com>



--
[]'s
Marco Tulio

Mariano Reingart

unread,
May 2, 2013, 2:39:47 PM5/2/13
to web...@googlegroups.com
Marco, yes, I could help you with this, and that would be great to
test fixes and improvements in the DAL ;-)
On Thu, May 2, 2013 at 2:54 PM, Marco Túlio Cícero de M. Porto

Massimo Di Pierro

unread,
May 3, 2013, 9:21:46 AM5/3/13
to web...@googlegroups.com
OK. I think I know what to do now. I opened a ticket and will review this asap. Thanks Mariano.

Massimo Di Pierro

unread,
May 5, 2013, 3:00:34 PM5/5/13
to web...@googlegroups.com
I need help with this:

1) the pg8000 in web2py says version 1.10. The only you link says 1.08 but seems more recent. there are two 1.08 the on github and the one here: http://pybrary.net/pg8000/ which one should I use?

2) which patches did you make which are not merged?

3) In postgresqladapter we set:

   SET standard_conforming_strings=on;

pg8000 ignores this and interprets % as an escape chars. This is a security issue. How can we make sting escaping conforming to SQL in pg8000?

I may be able to fix 3 if you can send me a patch for 1 and 2.

Massimo


On Thursday, 2 May 2013 10:35:58 UTC-5, Mariano Reingart wrote:

Mariano Reingart

unread,
May 5, 2013, 4:21:32 PM5/5/13
to web...@googlegroups.com
On Sun, May 5, 2013 at 4:00 PM, Massimo Di Pierro
<massimo....@gmail.com> wrote:
> I need help with this:
>
> 1) the pg8000 in web2py says version 1.10. The only you link says 1.08 but
> seems more recent. there are two 1.08 the on github and the one here:
> http://pybrary.net/pg8000/ which one should I use?

The github / pubrary were unmaitained, so I begin a separate fork:

https://code.google.com/p/pg8000

The "official" repo is:

https://github.com/mfenniak/pg8000

I don't know it the official repo work out of the box with web2py, I
couldn't test it.


> 2) which patches did you make which are not merged?

Changelog:
----------

- Added Two Phase Commit support, based on prior TPC
connection wrapper for psycopg2 developed originally for
http://pgfoundry.org/projects/pyreplica/

- Added autocommit attribute to avoid to send a BEGIN automatically.
This is needed for some commands (e.g. CREATE DATABASE, VACUUM...)
that require to be run outside any transaction. Thanks funkybob for
the original patch (adapted from a method to a property to be
compatible with other python connectors)

- Fixed NUMERIC representation issue (truncation on string
conversion). Thanks zeha for the report and patch.

- Fixed "Empty array issue" that raised an exception when SELECTing an
empty array field (with any primitive type into). Thanks franklx for
the original patch (modified to return a true empty list)

- Fixed unicode issue in textout that caused an exception when using
PostgreSQL errors in non-English Locale and, in some cases, when using
non ASCII chars.Now, the text is encoded correctly using the
client_encoding.

- Added server_version attribute as reported by the backend. Thanks
Ulrich Petri for the patch, docs and test case.

- Added OID 194 mapping for "string representing an internal node
tree" to support sqlalchemy introspection with PostgreSQL 9.1.
Thanks dave42 for the original patch.

- Added OID 142 mapping for "xml" type basic support. Thanks dahilia
for the report.

- Fixed issue with python 2.5 missing ssl (no ssl support if package
is not installed)

- Added basic automatic test connection setup (using loged-in username)

There are more changesets:

https://code.google.com/p/pg8000/source/list

Some were merged, but not my changes.
The problem is that the "official" pg8000 source changed a lot of
things (mainly lookig for speed ups), so both code bases are now
incompatible.
They also implemented some of my changes in a different way.

> 3) In postgresqladapter we set:
>
> SET standard_conforming_strings=on;
>
> pg8000 ignores this and interprets % as an escape chars. This is a security
> issue. How can we make sting escaping conforming to SQL in pg8000?

Maybe this helps:

https://github.com/mfenniak/pg8000/commit/b59841124f474fe14c69f7ff671e98f814447717

Anyway, a more permanent and secure solution would be using parameters
and the correct paramstyle, and not inlining them in the SQL

> I may be able to fix 3 if you can send me a patch for 1 and 2.

If you can send me the tests case for the issues, I could debug them
and make the patches.
Please give me some weeks because I'm busy with personal issues and
other projects.

Joe Barnhart

unread,
May 30, 2013, 3:41:55 PM5/30/13
to web...@googlegroups.com
I have just tried both drivers -- but in an apples-and-oranges comparison.  I used pg8000 with pypy and web2py because it is pure Python and can be used with pypy.  I used psycopg2 with python 2.7 on the same database and application.

My application begins with a bulk-load of a CSV file.  The file has about 450,000 records of about 10 fields each.  Inserting the file using psycopg2 and python 2.7 took about 4-5 minutes on a quad-core i7 iMac.  The memory used was about 20M for postgres (largest thread) and about an equal amount for python.  The task was handled by the web2py scheduler.

The pypy-pg8000 version of the file load took almost an hour, but that is deceptive.  The problem is that it overwhelmed the 12GB of memory in the computer.  Both the pypy task and the postgres task ran amok with memory requirements.  The postgres task took >8GB and forced the computer into swapping, killing the response time.

Pypy is known for being somewhat of a memory hog (I was trying version 2.0.2).  It worked darned well in web2py, with this being the only problem I encountered.  Since my code heavily relies on modules, the speedup was noticible using pypy.  Some of my longer tasks include creating pdf files and this took about 1/3 to 1/5 the time under pypy as compared to cpython 2.7.1.

I know this is not an accurate comparison (because of the pypy component), but the runaway memory use of postgres under pg8000 concerned me so I thought I'd mention it.

-- Joe B.

Massimo Di Pierro

unread,
May 30, 2013, 5:33:00 PM5/30/13
to web...@googlegroups.com
Mind I have security concern about pg8000. It is vulnerable to SQL injections in web2py.

Mariano Reingart

unread,
May 30, 2013, 6:05:30 PM5/30/13
to web...@googlegroups.com
Hi Massimo, do you have a link to the SQL injection issue?

I couldn't reproduce it, nor the communication problem (there were an
out of sync statement issue under high loads, IIRC)

BTW, I was given access to the pg8000 official repository (now it is
being maintained again), so I'm planning to merge my version with the
latest updates (including some performance enhancements).

Joe: I attended the pypy tutorial at PyCon US 2012, seeking to speed
up pg8000 without luck. Not only there was no improvement, also I got
stuck by a pypy unsuported feature in Windows. Maybe pypy has better
support now, maybe the new enhancements in pg8000 are better for its
JIT compiler.

If you just have to upload a CSV file, see the COPY statement, it is unbeatable.

Massimo Di Pierro

unread,
Jun 1, 2013, 12:39:00 AM6/1/13
to web...@googlegroups.com
Can you try this? With postgres and pg8000

db.define_table('thing',Field('name'))
value = r"\'"
db.thing.insert(name=value)

It should insert the thing but I suspect you will get an error

You can also try:

id = db.thing.insert(name='%')
print db.thing[id].name

do you get '%' or '%%'?

Massimo

Mariano Reingart

unread,
Jun 1, 2013, 10:09:54 AM6/1/13
to web...@googlegroups.com
I don't get errors nor any difference:

db = DAL('postgres:pg8000://reingart:1234@localhost/pg8000',pool_size=1,check_reserved=['all'])

db.define_table('thing',Field('name'))

def test1():
value = r"\'"
id = db.thing.insert(name=value)
value = db(db.thing.id==id).select().first().name
return dict(id=id, value=value, lenght=len(value),
adapter=db._adapter.__version__)

def test2():
id = db.thing.insert(name='%')
value = db(db.thing.id==id).select().first().name
return dict(id=id, value=value, lenght=len(value),
adapter=db._adapter.__version__)

def test3():
id = db.thing.insert(name='%%')
value = db(db.thing.id==id).select().first().name
return dict(id=id, value=value, lenght=len(value),
adapter=db._adapter.__version__)


Test1

adapter:gluon.contrib.pg8000.dbapi 1.10
id:14L
lenght:2
value:\'

Test2

adapter:gluon.contrib.pg8000.dbapi 1.10
id:15L
lenght:1
value:%

Test3

adapter:gluon.contrib.pg8000.dbapi 1.10
id:16L
lenght:2
value:%%

I'm missing something?

Regards
On Sat, Jun 1, 2013 at 1:39 AM, Massimo Di Pierro

Massimo Di Pierro

unread,
Jun 1, 2013, 3:03:28 PM6/1/13
to web...@googlegroups.com
Looks like I am wrong and there is no problem with pg8000.

Ovidio Marinho

unread,
Jun 1, 2013, 9:13:57 PM6/1/13
to web...@googlegroups.com
The books Learning Python in chapter Execution Optimization Tools  pag.30
...Execution Optimization Tools
CPython, Jython, and IronPython all implement the Python language in similar ways:
by compiling source code to byte code and executing the byte code on an appropriate
virtual machine. Still other systems, including the Psyco just-in-time compiler and the
Shedskin C++ translator, instead attempt to optimize the basic execution model. These
systems are not required knowledge at this point in your Python career, but a quick
look at their place in the execution model might help demystify the model in general.
The Psyco just-in-time compiler
The Psyco system is not another Python implementation, but rather a component that
extends the byte code execution model to make programs run faster. In terms of
Figure 2-2, Psyco is an enhancement to the PVM that collects and uses type information
while the program runs to translate portions of the program’s byte code all the way
down to real binary machine code for faster execution. Psyco accomplishes this
† Jython and IronPython are completely independent implementations of Python that compile Python source
for different runtime architectures. It is also possible to access Java and .NET software from standard CPython
programs: JPype and Python for .NET systems, for example, allow CPython code to call out to Java and .NET
components. 
Translation without requiring changes to the code or a separate compilation step during
development.
Roughly, while your program runs, Psyco collects information about the kinds of objects
being passed around; that information can be used to generate highly efficient
machine code tailored for those object types. Once generated, the machine code then
replaces the corresponding part of the original byte code to speed your program’s overall
execution. The net effect is that, with Psyco, your program becomes much quicker
over time and as it is running. In ideal cases, some Python code may become as fast as
compiled C code under Psyco.
Because this translation from byte code happens at program runtime, Psyco is generally
known as a just-in-time (JIT) compiler. Psyco is actually a bit different from the JIT
compilers some readers may have seen for the Java language, though. Really, Psyco is
a specializing JIT compiler—it generates machine code tailored to the data types that
your program actually uses. For example, if a part of your program uses different data
types at different times, Psyco may generate a different version of machine code to
support each different type combination.
Psyco has been shown to speed Python code dramatically. According to its web page,
Psyco provides “2x to 100x speed-ups, typically 4x, with an unmodified Python interpreter
and unmodified source code, just a dynamically loadable C extension module.”
Of equal significance, the largest speedups are realized for algorithmic code written in
pure Python—exactly the sort of code you might normally migrate to C to optimize.
With Psyco, such migrations become even less important.
Psyco is not yet a standard part of Python; you will have to fetch and install it separately.
It is also still something of a research project, so you’ll have to track its evolution online.
In fact, at this writing, although Psyco can still be fetched and installed by itself, it
appears that much of the system may eventually be absorbed into the newer “PyPy”
project—an attempt to reimplement Python’s PVM in Python code, to better support
optimizations like Psyco.
Perhaps the largest downside of Psyco is that it currently only generates machine code
for Intel x86 architecture chips, though this includes Windows and Linux boxes and
recent Macs. For more details on the Psyco extension, and other JIT efforts that may
arise, consult http://www.python.org; you can also check out Psyco’s home page, which
currently resides at http://psyco.sourceforge.net. ...

I think this is enough to say that psyco is faster.by.

      


         Ovidio Marinho Falcao Neto
                 Web Developer
             ovid...@gmail.com 
               83   8826 9088 - Oi
               83   9336 3782 - Claro
                        Brasil
              


2013/6/1 Massimo Di Pierro <massimo....@gmail.com>

Joe Barnhart

unread,
Jun 6, 2013, 2:31:02 AM6/6/13
to web...@googlegroups.com
Hi Ovidio --

You posted interesting (if somewhat old) information about psyco, but I'm not sure how that fits in.  I'm sure you realize that psyco is not the same as psycopg2 -- the two are completely different and have no relation to one another.

Also, the psyco library is unmaintained and is now considered "dead".  See http://psyco.sourceforge.net/ for details.  It seems the main psyco inspiration has been swayed to the dark side of pypy and is contributing to it.  (Which of course is fantastic news to those of us who are secretly cheering for pypy and its eventual world domination.)

Warm regards,

Joe
Reply all
Reply to author
Forward
0 new messages