So… I did a crazy thing… I created an entirely new JDBC driver.
(For those who care there's an actual question at the bottom… this isn't just bragging)
About 10 days ago I set out to fix the current driver's support for composite and array objects. Currently they are only ever retrieved as string encoding due to the use of the TEXT format for those types; parsing these encoded strings seemed just short of a nightmare. Ambition got the better of me and before I knew it I had a "working" implementation of the entire FE/BE protocol (save for Copy In/Out).
I began by hacking the driver to force it into binary mode and pulling the raw data with getBytes. Then I started building a framework for recognizing, encoding and decoding all the possible types PostgreSQL knows about. Once that was working well, my mission was to rework this into the current driver. This proved almost impossible due to 1) my limited familiarity with the code and 2) the assumptions it makes about the formats of things like parameters. In the end it seemed too time-consuming for ME to do this. So, partly for fun, I decided to just implement the FE/BE protocol and see where it got me. Next thing I knew I was running queries and retrieving data. Basically it's just a side project, born of a retrofit, that went wrong and has spiraled out of control ;)
As outlined above I started the project to support decoding of Composite and Array types. To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection. I then create a type registry that holds all the required details of all the types. Procedures, for both TEXT and BINARY protocols, are looked up and matched by name (e.g. "bool_send", "money_recv", etc) by a list of "Procedure Providers". When a DataRow message is received it looks up the type in the registry and calls the appropriate TEXT or BINARY decoder to decode the row. When sending parameter data the type is located and its encoder is called to encode the data. Reading through the driver mailing lists, it seems using binary only has some ramifications as far as type coercion and such are concerned; currently all user-initiated queries use the Extended Protocol & Statement Describe to ensure parameter types/values are correct.
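The registry lookup described above could look roughly like the following Java sketch. Every class and method name here (TypeRegistrySketch, ProcedureProvider, TypeInfo, builtins) is hypothetical and not taken from the pgjdbc-ng source. The procedure names are the ones pg_type is assumed to list for bool and int4 in its typsend and typoutput columns, and only two types are covered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TypeRegistrySketch {

    /** Hypothetical "procedure provider": returns a decoder for a pg_proc name, or null. */
    interface ProcedureProvider {
        Function<byte[], Object> decoderFor(String procName);
    }

    /** Small subset of a pg_type row: the OID plus its binary (send) and text (output) procs. */
    static final class TypeInfo {
        final int oid;
        final String sendProc;
        final String outProc;

        TypeInfo(int oid, String sendProc, String outProc) {
            this.oid = oid;
            this.sendProc = sendProc;
            this.outProc = outProc;
        }
    }

    private final Map<Integer, TypeInfo> typesByOid = new HashMap<>();
    private final List<ProcedureProvider> providers = new ArrayList<>();

    void addType(TypeInfo type) { typesByOid.put(type.oid, type); }

    void addProvider(ProcedureProvider provider) { providers.add(provider); }

    /** Decodes one column value by asking each provider for a decoder matching the proc name. */
    Object decode(int oid, boolean binary, byte[] data) {
        TypeInfo type = typesByOid.get(oid);
        if (type == null) {
            throw new IllegalArgumentException("unknown type oid " + oid);
        }
        String proc = binary ? type.sendProc : type.outProc;
        for (ProcedureProvider provider : providers) {
            Function<byte[], Object> decoder = provider.decoderFor(proc);
            if (decoder != null) {
                return decoder.apply(data);
            }
        }
        throw new IllegalStateException("no decoder registered for " + proc);
    }

    /** Illustrative provider covering bool and int4 in both wire formats. */
    static ProcedureProvider builtins() {
        Map<String, Function<byte[], Object>> byName = new HashMap<>();
        byName.put("boolsend", b -> b[0] != 0);                        // binary: one byte
        byName.put("boolout", b -> new String(b, StandardCharsets.UTF_8).equals("t"));
        byName.put("int4send", b -> ByteBuffer.wrap(b).getInt());      // binary: 4 bytes, big-endian
        byName.put("int4out", b -> Integer.parseInt(new String(b, StandardCharsets.UTF_8)));
        return byName::get;
    }
}
```

With OID 16 registered for bool and 23 for int4, `decode(23, true, new byte[]{0, 0, 0, 42})` and `decode(23, false, "42".getBytes(...))` both go through the same registry and differ only in which procedure name is matched.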
Where to go from here…
The major question I would like to ask is…
Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?
I am no dummy. I understand the years of experience the current driver has to ensure it works well in an extremely large number of cases. At the same time, anybody who has peeked around in there (and I have done quite a bit of it) knows it's showing its age. My driver is 100% new code… not a stitch of the old was used. Given this, it seems like transplanting my new "core" into the current project would be like giving it a brain transplant just after a fresh head transplant; in other words… a rewrite.
I'd love it if some folks in the know could take a look at my code and see if it stirs up any ideas on integration or just makes you want to jump off a bridge.
If you read this far you get a cookie…
Here is the GitHub project… https://github.com/kdubb/pgjdbc-ng
Hi Kevin,
I'm excited to try it out - I've been thinking of doing the same quite a few times while hacking
yet another hotspot out of the driver, and you're right: It's showing its age, but also its
experience.
Kevin Wooten wrote on 12.03.2013 02:19:
> So… I did a crazy thing… I created an entirely new JDBC driver.
> (For those who care there's an actual question at the bottom… this
> isn't just bragging)
>
> I'd love it if some folks in the know could take a look at my code
> and see if it stirs up any ideas on integration or just makes you
> want to jump off a bridge.
From: Kevin Wooten <kd...@me.com>
Subject: A new JDBC driver...
Date: 12 March 2013 02:19:11 CET
About 10 days ago I set out to fix the current driver's support for composite and array objects. Currently they are only ever retrieved as string encoding due to the use of the TEXT format for those types; parsing these encoded strings seemed just short of a nightmare. Ambition got the better of me and before I knew it I had a "working" implementation of the entire FE/BE protocol (save for Copy In/Out).
* Can decode any recognized type to a Java object (this includes any imaginable composite or array type)
* Connection.setTypeMap and ResultSet.getObject(int idx, Map) are both fully supported
* Requests for composite objects that have no custom mapping are returned as HashMap
* Arrays can be decoded as a List, Map or native array (e.g. Object[], int[])
* As an extension it can decode whole rows into POJOs as well (acts a tiny bit like MyBatis)
* Asynchronous I/O engine provided by Netty
* All connections share a single group of worker threads
* LISTEN/NOTIFY and notifications can come through asynchronously
* Netty has a great system for managing buffers and reading/writing messages that shows increased speed
* Performance wasn't a goal of this project but it's a nice side effect
BINARY SUPPORT
As outlined above I started the project to support decoding of Composite and Array types. To accomplish this I download "pg_type", "pg_attribute" and "pg_proc" to the client upon connection. I then create a type registry that holds all the required details of all the types. Procedures, for both TEXT and BINARY protocols, are looked up and matched by name (e.g. "bool_send", "money_recv", etc) by a list of "Procedure Providers". When a DataRow message is received it looks up the type in the registry and calls the appropriate TEXT or BINARY decoder to decode the row. When sending parameter data the type is located and its encoder is called to encode the data.
Should I continue on this path of a new driver and see if people join or should I take what I have learned and try to refit it into the current code?
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Craig, thanks a lot for the information. I read your SO question and related info, and then did a bit of researching on my own. I have come up with a few things…
Starting your own threads is discouraged because there was no well-defined startup/shutdown point (until more recent JavaEE versions) and any thread you start cannot use any of the services of the container. Everything else I have read just says "IF" you don't shut the threads down properly you could wreak havoc, eat up resources, etc.
The first issue isn't an issue at all. The threads handle the I/O only and all data is delivered back to an application level thread for processing. Basically the fact that threads are used is completely transparent to the application code; it treats the calls synchronously.
Secondly, I have actually paid a bit of attention to the issue of threads shutting down because any abandoned connection causes the threads to remain active and the program to, at the very least, not be able to shut down.
Good idea. The driver *must* continue to function when connections aren't closed, as this is unfortunately extremely common. Relying on finalizers won't cut it in my opinion; they're just too unreliable to use for anything except logging warnings to say you forgot to close something.
I used a reference counting system to share the thread pool between connections. This guarantees that if the connections are properly closed, the threads will be killed. I did a bit of experimentation last night with using weak references everywhere and trying to handle the case where a person forgets to close a connection.
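For readers unfamiliar with the pattern, a reference-counted shared pool can be sketched in Java as below. This is an illustration only: SharedWorkerPool is a hypothetical name, a plain ExecutorService stands in for the Netty worker group, and the pool size of 2 is arbitrary. It is not the driver's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SharedWorkerPool {
    private static ExecutorService pool;   // shared by all connections, created on first use
    private static int refCount;           // number of connections currently holding the pool

    private SharedWorkerPool() {
    }

    /** Called when a connection opens: creates the pool for the first holder. */
    static synchronized ExecutorService acquire() {
        if (refCount == 0) {
            pool = Executors.newFixedThreadPool(2);
        }
        refCount++;
        return pool;
    }

    /** Called when a connection closes: the last holder shuts the worker threads down. */
    static synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        refCount--;
        if (refCount == 0) {
            pool.shutdown();
            pool = null;
        }
    }

    static synchronized int activeReferences() {
        return refCount;
    }
}
```

The guarantee only holds when every acquire() is paired with a release(), which is exactly why an abandoned connection keeps the threads alive.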
PostgreSQL's cancel system is pretty interesting. You have to open a new socket and send a cancel packet.
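As a rough Java illustration of that cancel packet: the CancelRequest message in the FE/BE protocol is 16 bytes, consisting of the length, the cancel request code 80877102, and the backend process ID and secret key that the server sent in BackendKeyData at startup. The class and method names below are hypothetical, and the sketch ignores SSL and error handling.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.ByteBuffer;

final class CancelRequestSketch {
    /** Cancel request code: 1234 in the high 16 bits, 5678 in the low 16 bits. */
    static final int CANCEL_REQUEST_CODE = (1234 << 16) | 5678;   // 80877102

    private CancelRequestSketch() {
    }

    /** Builds the 16-byte CancelRequest message (ByteBuffer is big-endian by default). */
    static byte[] encode(int backendPid, int secretKey) {
        return ByteBuffer.allocate(16)
                .putInt(16)                    // total message length, including this field
                .putInt(CANCEL_REQUEST_CODE)
                .putInt(backendPid)            // from BackendKeyData
                .putInt(secretKey)             // from BackendKeyData
                .array();
    }

    /** Opens a fresh socket to the server, sends the request, and closes the socket. */
    static void send(String host, int port, int backendPid, int secretKey) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(encode(backendPid, secretKey));
            out.flush();
        }
    }
}
```

The server sends no reply on the cancel socket; whether the cancel took effect is only visible on the original connection, where the running query fails with an error.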
Finally, with regard to your SO question, since there seems to be no answer, you could try, as I touched on earlier, implementing the query timeout by using non-blocking sockets, selectors and the like. I think you'll quickly grow to appreciate why others are using threads; Java has made something that was easy in C very hard. Also, in my journeys last night I discovered the "statement_timeout" connection parameter. If you didn't know about it already, the server will cancel any statement that takes longer than this value. It may be an easy solution to your problem.
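One way to apply statement_timeout from client code is to issue a SET on the session, sketched below in Java with plain JDBC. The class and method names are hypothetical. A bare integer is interpreted by the server as milliseconds, and 0 disables the timeout.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class StatementTimeoutSketch {

    private StatementTimeoutSketch() {
    }

    /** Builds the SET command; the value is an int, so no quoting or escaping is needed. */
    static String setTimeoutSql(int millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("timeout must be >= 0");
        }
        return "SET statement_timeout = " + millis;
    }

    /** Applies the timeout to the current session of an already-open connection. */
    static void apply(Connection connection, int millis) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(setTimeoutSql(millis));
        }
    }
}
```

After `apply(connection, 5000)`, a statement on that session that runs longer than five seconds is cancelled by the server and surfaces to the client as an SQLException.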
After a bit of messing around I finally settled on the much maligned "finalizer" to kill the connection if it's abandoned.
finalize() method is problematic.
System.gc(), System.runFinalization(), System.runFinalizersOnExit(), and Runtime.runFinalizersOnExit() either lack such guarantees or have been deprecated because of lack of safety and potential for deadlock.
One consequence is that slow-running finalizers can delay execution of other finalizers in the queue. Further, the lack of guaranteed ordering can lead to substantial difficulty in maintaining desired program invariants.
The Java programming language imposes no ordering on finalize() method calls. Finalizers [of different objects] may be called in any order, or even concurrently.
finalize() methods are invoked by the garbage collector from one or more threads of its choice; these threads are typically distinct from the main() thread, although this property is not guaranteed. When a finalizer is necessary, any required cleanup data structures must be protected from concurrent access. See the JavaOne presentation by Hans J. Boehm [Boehm 2005] for additional information.
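A commonly used alternative to finalize() for detecting abandoned objects is a PhantomReference registered with a ReferenceQueue, where the cleanup runs on a thread the code controls. The Java sketch below illustrates that alternative only; it is not what the driver in this thread does, and AbandonedConnectionTracker and its methods are hypothetical names.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class AbandonedConnectionTracker {

    /** Phantom reference that carries the cleanup to run once its referent is unreachable. */
    static final class Tracked extends PhantomReference<Object> {
        private final Runnable cleanup;

        Tracked(Object referent, ReferenceQueue<Object> queue, Runnable cleanup) {
            super(referent, queue);
            this.cleanup = cleanup;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the Tracked objects themselves strongly reachable until they are handled.
    private final Set<Tracked> live = ConcurrentHashMap.newKeySet();

    /**
     * Registers a connection. The cleanup must not reference the connection itself,
     * otherwise the connection can never become unreachable.
     */
    Tracked track(Object connection, Runnable cleanup) {
        Tracked tracked = new Tracked(connection, queue, cleanup);
        live.add(tracked);
        return tracked;
    }

    /** Called from a proper close(): nothing is left to clean up later. */
    void closed(Tracked tracked) {
        live.remove(tracked);
    }

    /** Runs cleanup for every enqueued reference that was never closed; returns the count. */
    int reapAbandoned() {
        int reaped = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Tracked tracked = (Tracked) ref;
            if (live.remove(tracked)) {
                tracked.cleanup.run();
                reaped++;
            }
        }
        return reaped;
    }
}
```

In practice reapAbandoned() would be called periodically or from a dedicated thread blocking on the queue. When the garbage collector enqueues a reference is still not guaranteed, but ordering and threading of the cleanup are under the caller's control.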