
TIP#308 Published: Tcl Database Connectivity (TDBC)


Donal K. Fellows

Nov 15, 2007, 5:45:22 PM
TIP #308: TCL DATABASE CONNECTIVITY (TDBC)
============================================
Version: $Revision: 1.2 $
Author: Kevin B. Kenny <kennykb_at_acm.org>
State: Draft
Type: Informative
Vote: Pending
Created: Thursday, 15 November 2007
URL: http://tip.tcl.tk/308.html
Post-History:

-------------------------------------------------------------------------

ABSTRACT
==========

This TIP defines a common database access interface for Tcl scripts.

INTRODUCTION
==============

There has been a fair amount of discussion, flaring up and dying back,
regarding the need for a "Tcl database connectivity layer" in the Tcl
core. This document attempts to specify what such a layer should
provide. At its present stage of development, it is to be considered
very much a draft; discussion is actively solicited.

WHAT IS TCL'S DATABASE CONNECTIVITY LAYER?
--------------------------------------------

If we look at other database connectivity layers such as ODBC/DAO,
JDBC, Perl's DBD/DBI, we find that there really isn't very much, if
anything, inside them. Rather than being a body of code, they consist
primarily of specifications of the interfaces to which the author of a
database connectivity module must conform. The real work of connecting
to the databases happens inside the connectivity modules, which are
generally speaking under the control of the database teams. In terms of
practical politics, there isn't really any other way to do it; the Tcl
maintainers are highly unlikely to want to take on the job of
connecting to arbitrary database API's.

In other languages, such as C++ and Java, it is often necessary to have
interface definitions that are understood by a compiler in order to get
the "pluggability" of arbitrary database connectivity. In Tcl, however,
an "interface" is best understood as an ensemble implementing a
predetermined set of commands. There is no counterpart to a Java or C++
interface definition, nor does there need to be. For this reason, the
work product of a "Tcl database connectivity" development effort is
likely (at least at the first stage) to consist primarily of a
specification document, perhaps with reference implementations for one
or a few popular databases. To be considered "in the core", the
specification should be included with the Tcl documentation, and be
under control of the TIP process. The database implementations should
be considered "extensions," and have their own configuration
management. This statement doesn't say that we can't choose from among
them a set that we will package with releases of the Tcl core. In fact,
I hope that this effort will be one driver for the TCT to sort out the
management of "bundled extensions."

MECHANICS OF THIS DOCUMENT
----------------------------

I write this document in "standards committee prose". (While turgid, it
at least is often well-understood; I offer no further defence.) In
particular:

* the word "MAY" is construed as allowing a given behaviour but
imposing no requirement other than that clients be prepared for
it;

* the word "MUST" (and conversely "MUST NOT") is construed as
requiring a given behaviour; implementations that fail one or
more requirements given by "*must*" are non-compliant;

* the word "SHOULD" (and conversely "SHOULD NOT") indicates that a
given behaviour is expected of an implementation unless there is
a compelling reason not to include it; while not formally
non-compliant, implementations that fail one or more requirements
given by "SHOULD" can be understood to have issues with respect
to "quality of implementation."

* the future of determination ("SHALL" or "WILL" according to the
usual customs of formal written English) is construed as a
promise to which the Tcl Core or the Tcl Core Team, as
appropriate, shall adhere. It describes requirements of the Tcl
Core, rather than of database connection modules.

* the term, "integer value" refers to any string acceptable to
*Tcl_GetBignumFromObj*; the term "native integer value" refers to
a value acceptable to *Tcl_GetIntFromObj*, and hence to a value
that can be represented by a C *int* on the target machine.

* the term, "boolean value" refers to any string acceptable to
*Tcl_GetBooleanFromObj* and hence includes at least '1', '0',
'on', 'off', 'yes', 'no', 'true', and 'false'.

SPECIFICATION
===============

CONNECTING TO A DATABASE
--------------------------

Obviously the first thing that any connectivity layer has to offer is
the ability to select a database. The way databases are named is quite
specific to the database manager, as is the way access is negotiated
(credentials such as user name and password may be required, session
keys may be negotiated for privacy and authentication, and so on). All
of this machinery is formally out of scope for this specification.
Similarly, the machinery of database administration (at least at the
level of creating/deleting entire databases, managing the physical
layer, and authorizing clients) is presumed to be already taken care
of. We need merely specify that a connectivity layer must provide at
least one command that accepts arguments describing the desired
connection and returns a /database handle/ - defined to be an ensemble
through which interactions with the given database instance will take
place. Here, /database instance/ means the database, or databases, that
the given handle can access; rather a circular definition. In many SQL
systems, it is possible for a single connection to access several
"databases" managed by SQL CREATE DATABASE statments, or several
"tablespaces" or similar constructs. We presume that database module
implementors will know what is appropriate for their systems, and
intentionally leave this particular matter somewhat vague.

BASIC MECHANICS OF DATABASE INTERFACES
----------------------------------------

Database handles are Tcl ensembles, meaning that they are commands that
support subcommands. Other ensembles, such as statement handles, may be
supported. Any of the ensembles may support abbreviation of its
subcommands according to the rules defined by *Tcl_GetIndexFromObj*;
nevertheless, code that uses the database interface should spell out
subcommands in full.

Many of the subcommands are expected to take options in Tcl's usual
syntax of:

?/-option/ ?/value/?? ?/-option value/?...

In all of the places where this syntax is expected, a database module
may support abbreviation of options according to the rules of
*Tcl_GetIndexFromObj()*; once again, code that uses the interface
should spell out options in full.

CONFIGURING A DATABASE HANDLE
-------------------------------

Once a handle is returned, there are a number of session-level
attributes that may be controllable. Every database handle MUST provide
a *configure* subcommand that takes the form:

/dbHandle/ *configure* ?/-option/ ?/value/?? ?/-option value/?...

This configuration process is analogous to configuring a Tk widget. If
there are no arguments presented to *configure*, the return value MUST
be a list of triples; each element of the list MUST comprise the name
of an available configuration parameter, its default value, and its
current value. If a single argument is presented, it MUST be the name
of a configuration parameter, and the return value MUST be the triple
of name, default value, and current value for that parameter. Finally,
if more than one argument is presented, they MUST be a list of
alternating parameter names and values. This last form is an order to
set the given parameters to the given values.
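The three forms of *configure* can be sketched as follows; the handle
name /$db/ and the particular options shown are illustrative only:

```tcl
# No arguments: all parameters as {name default current} triples
$db configure

# One argument: the {name default current} triple for one parameter
$db configure -autocommit

# Paired arguments: an order to set the given parameters
$db configure -autocommit 0 -readonly 1
```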

The connectivity layer SHOULD implement the following parameters, and
MAY implement others:

* *-autocommit* /boolean/

Requests a change in the 'auto-commit' behaviour of the
connection. The value MUST be a boolean value. If the value is
true, the connection SHOULD henceforward treat every SQL
statement as belonging to a transaction by itself, and commit
changes to the database when any statement is successfully
concluded. If the value is false, the connection SHOULD give the
caller control over transaction isolation by means of the
*commit* and *rollback* subcommands (q.v.). Single-process
database interfaces that do not support transaction isolation MAY
return an error if the caller requests *-autocommit true*; they
SHOULD accept *-autocommit false* silently.

* *-encoding* /name/

Requests that the encoding to be used in database communication
protocol be changed to the one given by /name/, which MAY be any
name acceptable to the [encoding] command. A well-designed
database interface SHOULD NOT require this command; however, some
backends make it virtually inevitable that mid-stream changes of
encodings will be required.

* *-timeout* /ms/

Requests that operations requested on the database SHOULD time
out after the given number of milliseconds, if such an option is
supported by the underlying connectivity layer.

* *-readonly* /boolean/

Notifies that the application will, or will not, limit its
activity to operations that do not modify the content of the
database. This option MAY have the effect of adjusting the
transaction isolation level.

The command that returns a database handle SHOULD also accept these
options.
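Since the command that creates a connection is module-specific, the
following is purely illustrative; the command name /mydb::connect/ and
its connection arguments are hypothetical, but the trailing options are
those defined above:

```tcl
set db [mydb::connect -host localhost -database shop \
            -user app -password mypass \
            -autocommit false -timeout 5000]
```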

TRANSACTION ISOLATION
-----------------------

A database handle MUST implement two commands:

* /dbHandle/ *commit*

Requests that the current transaction against the database be
committed.

* /dbHandle/ *rollback*

Requests that the current transaction against the database be
rolled back.

Both commands MUST take no action if the *-autocommit* option is true.
If an underlying database implementation does not support transaction
isolation, then the *commit* command SHOULD take no action and the
*rollback* command SHOULD return an error.
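A minimal sketch of manual transaction control under this
specification, assuming /$db/ is a database handle whose module
supports the *-autocommit* option:

```tcl
$db configure -autocommit 0
set failed [catch {
    # ... execute one or more statements against $db ...
} err]
if {$failed} {
    $db rollback
} else {
    $db commit
}
```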

CLOSING A DATABASE CONNECTION
-------------------------------

A database handle MUST implement the command:

* /dbHandle/ *close*

This command MUST dismiss the connection to the database and is
expected to clean up the system resources associated with it. If there
is an uncommitted transaction, it SHOULD be rolled back. Any handles to
other objects associated with the database SHOULD become invalid.

A database interface also SHOULD perform the same actions if a handle
is deleted by means of the *rename* command. (Interfaces that are
implemented in Tcl may be notified of this action by creating a
deletion trace with *trace add command*.) It is recognized that command
deletion traces present difficulties in situations like namespace and
interpreter deletion; the *close* subcommand shall therefore be
considered the preferred way to terminate connections.

A database interface SHOULD attempt to arrange, if possible, to
rollback unfinished transactions and clean up on process exit. In
particular, if the underlying database engine supports transactions, it
SHOULD be considered an error to commit any work that remains
uncommitted on process exit.

PREPARING STATEMENTS
----------------------

A database handle must support the *prepare* command, which has the
syntax:

* /dbHandle/ *prepare* /SQL-code/

The /SQL-code/ argument is a SQL statement that is to be executed
against the given database connection. This command does not execute
the statement directly; rather, it prepares to execute the statement,
possibly performing tasks such as code compilation and query
optimisation.

The database interface MUST support substitutions in /SQL-code/. Each
substitution request has the form /:variableName/. That is, each
substitution request begins with a literal colon (:), followed by a
letter or underscore, followed by zero or more letters, digits, or
underscores. The database interface is responsible for translating from
this syntax to whatever the underlying engine requires. Typical strings
required in database interfaces are /:name/, /:number/, /@name/,
/@number/, and /?/.

The return value from the *prepare* command is a /statement handle/,
discussed under "The statement interface" below.
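A sketch of preparing a statement with the colon substitution syntax;
the handle, table, and column names are hypothetical:

```tcl
set stmt [$db prepare {
    INSERT INTO t_accounts(name, balance) VALUES(:name, :balance)
}]
```

Note that the Tcl variables /name/ and /balance/ are not read at this
point; substitution happens when the statement is later executed, in
the scope of the caller of *execute*.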

/Rationale./ The choice of the colon deserves some discussion. It would
surely be more natural for Tcl to use a literal dollar sign to
introduce a variable name. This choice, however, seems unwise, since
several databases (most notably Oracle) allow the use of table and view
names that contain dollar signs. While it might be possible to continue
to use these while allowing for variable substitution (for instance, by
mandating that table or view names with dollar signs be enclosed in
double quotes), it seems unnatural. The colon is syntax that is
recognized by JDBC, ODBC, and Oracle's native API, and as such will be
familiar to most SQL programmers and unlikely to collide with native
syntax.

The requirement to support prepared statements is intended to guard
against SQL insertion attacks. An interface to a database whose native
API does not support prepared statements MUST simulate them. In
particular, when the *run* or *execute* commands are executed on a
statement, substitution must be performed in a safe fashion with
whatever magic quoting is required. In any case, magic quoting should
be regarded as an infelicitous expedient and avoided if at all
possible.

INTROSPECTING THE SETS OF HANDLES
-----------------------------------

A database handle MUST support the *statements* command:

* /dbHandle/ *statements*

This command MUST return a list of the statements that have been
prepared by means of [/dbHandle/ *prepare*] but not yet dismissed using
[/statementHandle/ *close*].

Likewise, a database handle MUST support the *resultsets* command:

* /dbHandle/ *resultsets*

This command MUST return a list of the result sets that have been
returned (by executing statements, or by querying metadata) and have
not yet been dismissed using [/resultSetHandle/ *close*].

QUERYING METADATA
-------------------

A database interface SHOULD provide a way of enumerating the tables in
the database. The syntax for querying tables MUST be:

* /dbHandle/ *tables* ?/matchPattern/?

The optional argument /matchPattern/, if supplied, is a pattern against
which the table names are to be matched. The database interface MUST
recognize the SQL wildcards *%* and *_* in the pattern.

A database interface SHOULD provide a way of enumerating the columns in
a database table. The syntax for querying columns MUST be:

* /dbHandle/ *columns* /tableName/ /?matchPattern?/

The return value from the *tables* and *columns* commands MUST be a
result set handle, and MUST function as all other result set handles
do; see "The result set interface" below for the API.

The result set of the *tables* command must include at least a column
called *name* in its result set; this column must contain the name of a
table.

Similarly, the result set of the *columns* command MUST include at
least the columns *name*, *type*, *scale*, *precision*, and *nullable*.
The *name* column MUST contain the column name. The *type* column MUST
be the data type of the column, and SHOULD be chosen from among the
standard types /BIGINT/, /BINARY/, /BIT/, /CHAR/, /DATE/, /DECIMAL/,
/DOUBLE/, /FLOAT/, /INTEGER/, /LONGVARBINARY/, /LONGVARCHAR/,
/NUMERIC/, /REAL/, /TIME/, /TIMESTAMP/, /SMALLINT/, /TINYINT/,
/VARBINARY/, and /VARCHAR/. The *scale* and *precision* SHOULD give the
scale and precision of the column, and *nullable* SHOULD give a boolean
value that represents whether the given column can contain NULL values.

Other columns MAY be included in the result sets of *tables* and
*columns*, and SHALL be added to this document (as optional columns) on
request from the implementors of database interfaces.
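Since *tables* returns an ordinary result set, it is consumed with
*nextrow* and then closed; a sketch, with the handle /$db/ and the
pattern purely illustrative:

```tcl
set rs [$db tables t_%]
while {[$rs nextrow row]} {
    puts "table: [dict get $row name]"
}
$rs close
```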

THE STATEMENT INTERFACE
-------------------------

The statement handle returned from the *prepare* command on a database
interface must itself be an ensemble, which means that it must be a Tcl
command that accepts a subcommand as its first parameter.

The following subcommands MUST be accepted:

* /statementHandle/ *params*

Requests a description of the names and expected data types of
the parameters to the given statement. The return value from the
*params* command MUST be a result set (See "The result set
interface" below for details of its API). The columns of the
result set MUST include /name/, /type/, /scale/, /precision/, and
/nullable/. They are interpreted in the same way as those of the
*columns* subcommand to a database interface (shown above).

* /statementHandle/ *execute*

Executes a statement against a database. Any variable
substitution present in the SQL that was provided when the
statement was created MUST be performed at this time, with the
variable values being obtained from the scope in which the
*execute* command was evaluated. Any variable that is undefined
in that scope must be replaced with a /NULL/ value. An array
variable provided to a substituent MUST result in an error. Read
traces against the substituted variables SHOULD fire, in
left-to-right order as they appeared in the SQL statement. The
result of the *execute* command SHOULD be a result set, as
defined under "The result set interface" below.

* /statementHandle/ *close*

Announces that a statement is no longer required, and frees all
system resources associated with it. The *close* command MAY
invalidate any result sets that were obtained by the *params* and
*execute* commands.

As with database connections, the database interface SHOULD also
clean up if a statement handle is removed with /[rename
$statement {}]/. Once again, it is recognized that the strange
side effects of namespace and interpreter deletion may make this
cleanup impossible in some interfaces, so *close* SHALL be
considered the standard means of discarding statements.
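The prepare/execute/close life cycle can be sketched as follows; the
handle, SQL, and variable name are illustrative:

```tcl
set stmt [$db prepare {
    SELECT name, price FROM t_products WHERE price <= :maxprice
}]
set maxprice 100        ;# read when [execute] performs substitution
set rs [$stmt execute]
# ... consume the result set $rs ...
$rs close
$stmt close
```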

DATA TYPES OF PARAMETERS TO PREPARED STATEMENTS

The syntax described so far presumes that the database interface can
determine the expected types of the variables that appear in a prepared
statement, or at the very least can accept some sort of variant type
and perform automatic type coercion. This requirement does not seem
horribly onerous at first inspection, since SQLite allows for
"everything is a string" parameters; ODBC offers parameter
introspection via the /SQLDescribeParam/ call; and JDBC offers it via
the /getParameterMetaData/ method of the /PreparedStatement/ interface.

Nevertheless, a deeper examination discovers that in at least ODBC, a
driver is allowed to fail to offer /SQLDescribeParam/. Inspection of
the JDBC-ODBC bridge reveals that in this case, JDBC will return a
/ParameterMetaData/ object that throws a /SQLException/ on any attempt
to query specific data. The result is that, while the APIs to
introspect parameter types are available, they may be unusable against
a particular database engine. In these cases, a backup is needed.

For this reason, a database interface MUST support allowing the user to
specify types of the parameters of a prepared statement. The syntax for
doing so MUST be:

* /statementHandle/ *paramtype* /paramName/ /type/ /scale/
/precision/

Defines that the parameter identified by /paramName/ in the given
statement is to be of type /type/. The /type/ MUST be chosen from
among the names /BIGINT/, /BINARY/, /BIT/, /CHAR/, /DATE/,
/DECIMAL/, /DOUBLE/, /FLOAT/, /INTEGER/, /LONGVARBINARY/,
/LONGVARCHAR/, /NUMERIC/, /REAL/, /TIME/, /TIMESTAMP/,
/SMALLINT/, /TINYINT/, /VARBINARY/, and /VARCHAR/.

(/Rationale:/ These types appear to suffice for ODBC, and we can
always come back and extend them later if needed.)

The /scale/ of a parameter defines the number of characters or
digits that it requires, and its /precision/ defines the number
of digits after the decimal point, if needed.

/Examples:/

$statement paramtype name varchar 40
$statement paramtype balance decimal 10 2
$statement paramtype transactionDate timestamp

Implementors of database APIs SHOULD make every effort to do
appropriate type introspection so that programmers can avoid needing to
include explicit type information in their SQL statements.

THE RESULT SET INTERFACE
--------------------------

Result sets represent the results of operations performed on the
database. A preferred implementation for large result sets is that they
be implemented as database cursors, so that it is possible to iterate
over result sets that will not fit in memory. A result set MUST be an
ensemble, that is to say, a Tcl command whose first parameter is a
subcommand. The following subcommands MUST be accepted:

* /resultSetHandle/ *rows*

Determines the number of rows affected by a SQL statement such as
*INSERT*, *DELETE* or *UPDATE*. This count MUST be returned as an
integer. It SHOULD NOT be confused with the number of rows in the
result set. A database interface need not provide any interface
to determine that number (often, the only way to determine it is
to read all the rows). For this reason, the *rows* command MAY
return an empty string for *SELECT* operations.

* /resultSetHandle/ *columns*

Determines the set of columns contained in the result set. The
set of columns is itself returned as a result set. The columns in
this second result set MUST include /name/ (the name of the
column in the first result set) and SHOULD include /type/,
/scale/, and /precision/. A database interface MAY include other
columns if additional information about the columns of the result
set is available.

* /resultSetHandle/ *nextrow* /variableName/

Fetches a row of data from the result set and stores it in the
given variable in the caller's context. The row MUST be
represented as a dictionary suitable for use with the *dict*
command. The keys in the dictionary SHALL be the column names,
and the values SHALL be the values of the cells. If no rows
remain, the *nextrow* command MUST store an empty dictionary. The
return value of /nextrow/ MUST be 1 if a row has been returned,
and 0 if no rows remain in the result set.

In the result set, values of type /BIGINT/, /BIT/, /DECIMAL/,
/DOUBLE/, /FLOAT/, /INTEGER/, /NUMERIC/, /REAL/, /SMALLINT/, and
/TINYINT/ MUST receive their natural representation as decimal
numbers. Ideally, they should be returned as "pure" numbers with
their string representations generated only on demand. Values of
type /CHAR/, /LONGVARCHAR/ and /VARCHAR/ MUST be returned as Tcl
strings. A database interface implemented in C MUST take care that all
strings are well-formed UTF-8. Values of type /DATE/
and /TIMESTAMP/ MUST be returned as a numeric count of seconds
from the Tcl epoch; if necessary, this count may have a decimal
point and an appropriate number of additional decimal places
appended to it. Values of type /TIME/ MUST be returned as an
integer count of seconds since midnight, to which MAY be appended
a decimal point and a fraction of a second. Values of type
/BINARY/, /LONGVARBINARY/ and /VARBINARY/ MUST be returned as Tcl
byte arrays.

If a cell in the row is NULL, the key MUST be omitted from the
dictionary. A database interface MUST NOT use a special value of
any kind to represent a NULL.

/Rationale:/ As far as the author of this TIP is aware, no
existing database API makes use of dictionaries for result sets;
they tend to prefer lists to represent them. Nevertheless, this
TIP proposes dictionaries because they allow for a ready
distinction between NULL values in a database and any other
string. With any scheme where values that can include NULLs can
appear in Tcl objects, the problem arises that NULL must be
distinguished from any other string, particularly including the
empty string and the word "NULL". The lack of such a distinction
has led to several ill-advised proposals, such as [TIP #185], for
representing NULLs in Tcl. These alternatives founder on the
principle of "everything is a string". The NULL value is not any
string.

* /resultSetHandle/ *close*

Dismisses a result set and releases any system resources
associated with it.

As with statements and database connections, the database
interface SHOULD also clean up if a result set handle is removed
with /[rename $resultSet {}]/. Once again, it is recognized that
the strange side effects of namespace and interpreter deletion
may make this cleanup impossible in some interfaces, so *close*
SHALL be considered the standard means of discarding result sets.
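The row-fetching loop and the absent-key convention for NULLs can be
sketched as follows, assuming /$rs/ is a result set containing a
nullable /price/ column:

```tcl
while {[$rs nextrow row]} {
    if {[dict exists $row price]} {
        puts "price: [dict get $row price]"
    } else {
        puts "price is NULL"    ;# NULL cells simply have no key
    }
}
$rs close
```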

CONVENIENCE PROCEDURES FOR THE PROGRAMMER
-------------------------------------------

Since the cognitive overhead of preparing a statement, executing the
prepared statement, and iterating over a result set is overkill for
many applications, the Tcl system SHALL provide convenience procedures.
The following procedures are envisioned: they can be obtained from
[package require tcl::db].

* *::tcl::db::foreach* /variableName/ /statementHandle/ /script/

Runs the given statement, obtaining a result set. Applies the
given /script/ to each row of the result set, substituting the
given variable with the dictionary returned from /$resultSet
nextrow/ prior to each iteration. When no rows remain, closes the
result set.

* *::tcl::db::allrows* /statementHandle/

Runs the given statement, obtaining a result set. Applies the
*nextrow* command to the result set repeatedly, and constructs a
Tcl list whose elements are the rows returned. When no rows
remain, closes the result set and returns the list of rows.

* *::tcl::db::execute* /dbHandle/ /SQL-code/ ?/variable/?
?/script/?

Prepares a statement with the given SQL code and executes it. If
a /script/ argument is supplied, retrieves rows from the result
set one at a time and executes the given script once per row with
the row dictionary stored in the given variable. If the /script/
argument is omitted, instead stores in /variable/ a list of all
the rows returned. If neither /variable/ nor /script/ is present,
the result set is simply discarded. In any case, the result set
and the statement are both closed, and the return value from
*execute* is the number of rows affected by the operation (for
*INSERT*, *UPDATE* and *DELETE* operations) and may be an empty
string for *SELECT* operations.

It is envisioned that database interfaces will add these conveniences
to the API as

* /dbHandle/ *execute* /SQL-code/ /?variableName ?script??/

* /statementHandle/ *foreach* /variableName/ /script/

* /statementHandle/ *allrows*

either by wrapping the /::tcl::db/ procedures or by implementing
equivalent functionality themselves.
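A sketch of the convenience layer in use; the handle, table, and
column names are hypothetical:

```tcl
# Collect every row into a list of dictionaries.
set stmt [$db prepare {SELECT name, price FROM t_products}]
set rows [::tcl::db::allrows $stmt]
foreach row $rows {
    puts "[dict get $row name]: [dict get $row price]"
}

# Or visit one row at a time without building the whole list:
::tcl::db::foreach row $stmt {
    puts [dict get $row name]
}
```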

SUPPORT PROCEDURES FOR IMPLEMENTORS OF DATABASE INTERFACES
------------------------------------------------------------

In addition to the convenience commands discussed above, the Tcl system
SHALL provide certain commands to aid the job of database implementors.

SQL TOKENISATION

The task of mapping variable substitutions in the form *:varName* into
whatever form that a native database API can handle is a somewhat
tricky one. For instance, substitutions that appear inside quoted
strings MUST NOT be mapped. In order to aid in this task, the Tcl
system SHALL provide a command, *::tcl::db::tokenize*. This command
SHALL accept a SQL statement as its sole parameter, and return a list
of alternating strings and variable names. Assuming that a native
database's lexical structure conforms with standard SQL, the variable
names (which SHALL always be at the odd indices in the list) can be
substituted with parameter numbers, question marks, or whatever the
database needs, to yield the native SQL that must be prepared.
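By way of illustration, the tokeniser might behave as follows; the
exact list shown is an assumption, consistent only with the stated
rules that variable names occupy the odd indices and that substitution
requests inside quoted strings are not mapped:

```tcl
::tcl::db::tokenize {SELECT a FROM t WHERE b = :x AND c = ':notvar'}
# might return a three-element list such as:
#   {SELECT a FROM t WHERE b = } x { AND c = ':notvar'}
```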

REFERENCES
============

This specification is largely built from studying existing
cross-platform database APIs and deriving a common set of requirements
from them. These include both popular C-level offerings (ODBC and JDBC)
and Tcl-level ones (notably the SQLite API and tclodbc).

"ODBC Programmer's Reference." Redmond, Wash.: Microsoft Corporation,
2007. [<URL:http://msdn2.microsoft.com/library/ms714177.aspx>].

"Java Platform Standard Edition 6 API Specification." Santa Clara,
Calif.: Sun Microsystems, 2007
[<URL:http://java.sun.com/javase/6/docs/api/>]; in particular the
package named, *java.sql*.

Hipp, D. Richard. "The Tcl interface to the Sqlite library."
[<URL:http://www.sqlite.org/tclsqlite.html>].

Nurmi, Roy. "Tclodbc v 2.3 Reference." Available as part of the Tclodbc
distribution at [<URL:http://sourceforge.net/projects/tclodbc/>], in
the file, *DOC/REFERENC.HTM*.

LICENSE
=========

This file is explicitly released to the public domain and the author
explicitly disclaims all rights under copyright law.

-------------------------------------------------------------------------

TIP AutoGenerator - written by Donal K. Fellows

Twylite

Nov 16, 2007, 10:42:44 AM
Kudos on putting together this spec :)

I'm going to jump right in and start making some comments, because I've
recently developed a DBI for in-house use and there are a couple of
things about this spec that concern me.

(Aside: Is there somewhere on the Wiki to maintain an ongoing
discussion/comments?)

I'm approaching this from the view of a developer using Tcl for desktop
applications that manage their data in a database, and server
applications that allow storage and retrieval of data by clients but
apply some set of business rules. In my experience this covers a
significant amount (possibly the majority) of small to medium sized
"database-driven applications".

I am not considering the requirements of enterprise applications here
(where things like optimising your access to the DB become increasingly
important), so I am putting more focus on ease-of-use and elegance in
"simple" scenarios, with a view that you should do extra work if you
want to develop an enterprise-capable application.

I am also representing a _commercial_ (not theoretical) software
engineering viewpoint: something that facilitates fast development and
quality assurance is good.

To summarise: simple app needs simple DBI, complex app needs complex
DBI. If there is a single DBI then it must be simple to use for simple
cases.

SO ...

A common pattern in database-driven applications is to display to a
user some filtered set of data in tabular form, whether a TkTable on
screen or an HTML table report.

Let's assume I have a table of products (name, description, price) and
I want to display/report all products under a given price. Here are
some example procs:

# Get all products under a given price using PostgreSQL.
# Returns the table of products as a list of tuples (lists).
# headingsVar will be set to contain an ordered list of columns that
# indicates the contents of the tuples.
package require pgintcl
proc pg_show_cheap_products {maxprice headingsVar} {

    # Connect to the database
    set opts "host = localhost port = 5432 dbname = shop user = app password = mypass"
    set dbhandle [pg_connect -conninfo $opts]

    set finally_script {}
    set failed [catch {
        # Get the table of products as a list of lists
        set stmt {SELECT name, description, price FROM t_products
                  WHERE price <= $0}
        set result [pg_exec $dbhandle $stmt $maxprice]
        set finally_script [list pg_result $result -clear]

        if { [pg_result $result -status] ne "PGRES_TUPLES_OK" } {
            error "sql query failed: [pg_result $result -error]"
        }

        set resulttable [pg_result $result -llist]

        # Get the headings / column names
        upvar $headingsVar headings
        set headings [pg_result $result -attributes]
    } e errorOpts]

    catch $finally_script
    pg_disconnect $dbhandle
    if { $failed } {
        return -options $errorOpts "couldn't get product list: $e"
    }

    return $resulttable
}

Some things to note about this:
1) In the absence of a Tcl try/finally I have used a workaround to
reduce if/then blocks but ensure clean-up.
2) I have to check the status of the result. It is likely that if Tcl
had a try/catch/finally syntax that I could avoid this extra check.
3) The pg interface uses ordered parameters ($0, $1, ...) to the SQL
statement, which requires some care from the developer.
4) pgintcl allows me to retrieve the resultset as a list of lists -
exceptionally convenient for this sort of query.

# Get all products under a given price using SQLite.
# Returns the table of products as a list of tuples (lists). headingsVar will
# be set to contain an ordered list of columns that indicates the contents of
# the tuples.
package require sqlite3
proc sqlite_show_cheap_products {maxprice headingsVar} {

# Connect to the database
set dbfile "shop.db"
sqlite3 dbhandle $dbfile

set failed [catch {
# Get the table of products as a list of lists
set stmt {SELECT name, description, price FROM t_products WHERE price <= $maxprice}
set resulttable {}
dbhandle eval $stmt dbrow {
set row {}
set _headings $dbrow(*)
foreach col $dbrow(*) {
lappend row $dbrow($col)
}
lappend resulttable $row
}

# Get the headings / column names
upvar $headingsVar headings
set headings $_headings
} e errorOpts]

dbhandle close
if { $failed } {
return -options $errorOpts "couldn't get product list: $e"
}

return $resulttable
}

Some things to note about this:
1) The structure of the sqlite interface doesn't require a finally handler. This is convenient, but not necessarily a good thing.
2) The interface returns the resultset in one of several ways:
2.a) in rows, as used here, where I can retrieve the ordered list of columns that are returned by the SELECT and build a list-of-lists using a nested loop. This is rather inefficient (in terms of performance and source code elegance) as the DB engine puts everything into an array and I must then pull it out again, and must also check or set the ordered list of headings from within a loop.
2.b) as a flat list (not used here) where I must know the ordered list of columns to be returned by SELECT as I cannot query it.

# Get all products under a given price using TDBC.
# Returns the table of products as a list of tuples (lists). headingsVar will
# be set to contain an ordered list of columns that indicates the contents of
# the tuples.
package require MyTdbcCompliantDatabase
proc tdbc_show_cheap_products {maxprice headingsVar} {

# Connect to the database
set opts [dict create -host localhost -port 5432 -dbname shop -user app -password mypass]
MyTdbcCompliantDatabase dbhandle {*}$opts

set finally_scripts [list dbhandle close]
set failed [catch {
# Get the table of products as a list of lists

# First we prepare the statement
set stmt {SELECT name, description, price FROM t_products WHERE price <= :maxprice}
set stmthandle [dbhandle prepare $stmt]
lappend finally_scripts [list $stmthandle close]

# Then we execute the statement (using this stack frame for variable
# substitution)
set resultset [$stmthandle execute]
lappend finally_scripts [list $resultset close]

# Then we need to get the ordered list of column names
set colresultset [$resultset columns]
lappend finally_scripts [list $colresultset close]

set _headings {}
while { [$colresultset nextrow row] != 0 } {
lappend _headings [dict get $row "name"]
}

# Iterate over the rows (of the main resultset) and build into a
# correctly ordered list of lists
set resulttable {}
while { [$resultset nextrow row] != 0 } {
set outrow {}
foreach col $_headings {
if { [dict exists $row $col] } {
lappend outrow [dict get $row $col]
} else {
lappend outrow {} ;# NULL value substitute - I don't care about them
}
}
lappend resulttable $outrow
}

# Get the headings / column names
upvar $headingsVar headings
set headings $_headings
} e errorOpts]

foreach script $finally_scripts {
catch $script
}
if { $failed } {
return -options $errorOpts "couldn't get product list: $e"
}

return $resulttable
}

Some things to note about this:
1) It's a lot longer and more complex to do a simple query. This is IMHO Not Good.
2) I have had to use 4 finally scripts, the equivalent of 4 nested try/catch/finally statements, in order to handle error conditions.
3) As with SQLite, the statement execution does something non-obvious when it substitutes variables in my stack frame into the SQL statement. This is likely to be a common source of errors.
4) By returning the columns as a resultset I have to do extra work to get them into a usable list. All other DBIs I have encountered make an ordered list of columns in the resultset straightforwardly available.
5) By using dicts to return the rows I have to:
5.a) Do a bunch of extra work to build an ordered list of tuples (lists). Since most DBMS return row data as an array (in the C sense, or a list in the Tcl sense) this means that the TDBC component is doing a bunch of work to put the row into a dict, which I must then do a bunch of work to undo. That's a massive performance hit PER ROW.
5.b) Handle NULLs. This will be another common source of errors, as developers assume that "SELECT a, b, c FROM t_xyz" will return a dict with keys a, b and c ... but it won't. So at the first [dict get $resultrow ...] they may have an error if they encountered a NULL. This is seriously non-obvious.
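The pitfall in 5.b can be seen in a short sketch (the row dict here is hand-built to imitate what a TDBC result row would look like when the description column is NULL; it is an illustration, not actual TDBC output):

```tcl
# Hypothetical row as proposed: a NULL column's key is simply absent.
set row [dict create name "Widget" price 9.99]

# The "obvious" access pattern throws on the NULL column:
#   dict get $row description
#   -> error: key "description" not known in dictionary

# ... so every column access must be guarded:
set description [expr { [dict exists $row description]
                        ? [dict get $row description] : "" }]
```

Nothing warns the developer until the first NULL actually arrives at runtime.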


DISCUSSION ...

You may be wondering why I'm doing the extra work to figure out the ordered list of columns returned by the SQL query (after all, the SQL statement is right there). It's because the majority of each of those functions is a reusable bit of logic that we could call "extract_report":

proc db_specific_extract_report {stmt headingsVar vars} {
...
}

proc show_cheap_products {maxprice headingsVar} {
set stmt {SELECT name, description, price FROM t_products WHERE price <= :maxprice}
uplevel 1 [list db_specific_extract_report $stmt $headingsVar \
[dict create maxprice $maxprice] \
]
}
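For concreteness, here is one possible body for db_specific_extract_report, sketched over the SQLite interface used earlier (it assumes an already-open sqlite3 connection named "dbhandle", and leans on SQLite's ability to bind :varname from Tcl variables; this is my illustration, not code from the TIP):

```tcl
# Minimal sketch: run $stmt, set the caller's headings variable, and
# return the resultset as a list of lists.  $vars is a dict of
# substitution variables, e.g. {maxprice 100}.
proc db_specific_extract_report {stmt headingsVar vars} {
    upvar 1 $headingsVar headings
    set resulttable {}
    # [dict with] turns each dict entry into a local variable, so
    # SQLite's :maxprice-style references can resolve against them.
    dict with vars {
        dbhandle eval $stmt dbrow {
            set headings $dbrow(*)
            set row {}
            foreach col $dbrow(*) { lappend row $dbrow($col) }
            lappend resulttable $row
        }
    }
    return $resulttable
}
```

The point is that everything except the SQL text and the substitution values is boilerplate that belongs in one place.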

Now I could just return a list of dicts and leave it to the calling code to sort out, but I have this rather convenient function "populate_table" that fills up a TkTable or ttk::treeview from a list of headings and a list of tuples; and another rather convenient function "create_html_report" that, well, creates an HTML table in pretty much the same way.

In fact I can create an enormously flexible reporting facility by extending this just a little:

proc define_report {title sqlstmt args} {
# each arg is a tuple of entry-title, entry-type and entry-variable
# which describes what a report UI must solicit from the user
# entry-type is a regexp
...
}

define_report "Cheap products" \
    {SELECT name, description, price FROM t_products WHERE price <= :maxprice} \
    {"Maximum product price" {\d+} maxprice}

# the report UI and execution is left as an exercise to the reader

If I leave column ordering to the calling code, then it must know what columns the query can return, and in what order they should be displayed or reported. So now I need to keep my column names and column order in multiple places in my code and write extra logic to re-order columns.

In the reporting example I'm going to have to add a list of column names and some reordering code (extra overhead and extra code to maintain) to give me absolutely zero extra functionality - I could have just reordered the fields in the SQL statement ...

As a _commercial_ software engineer, having interfaces that make coding easy (time efficient, obvious, easy to maintain, etc.) is really important to me.

I'm not assuming that queries are always this simple. In fact we have an application where the user can select up to three mutually independent filters, and the user's view of the world is so different from the data structure (unless we want to manage a heavy amount of redundancy) that two different options of the same filter can require a reordering of the SQL statement (rather than just a different value in a WHERE clause). Our options were to build the SQL statement dynamically, or to implement about 8 separate functions, plus one dispatcher to figure out which combination of filter options leads to which function. But when it comes to running that SQL query we hand it to something very similar to extract_report (above) and everything happens magically from there.


MY CONCERNS ...

So here is a list of specific concerns that arise from this exercise:

(1) The DBI is complex. The comparative lengths of my example functions illustrate that clearly.
If ::tcl::db::execute had a mechanism to report the ordered list of columns returned, this would make "simple" development a lot easier (comparable to SQLite). This change alone would also void my comments about error handling (at least to some degree).

(2) The amount of error handling this DBI requires is significant - 4 nested error handlers. Other DBIs require 1 to 3.
This level of error handling detracts from my task of writing functionality and makes quality assurance more difficult.

(3) Taking variables from the current stack frame is a recipe for trouble, and makes writing logic like extract_report more complex and slower (best case: [dict with ...]; worst case: foreach {name value} ... { set ... }).
I would prefer to see execute take a dict; then it is at least clear what variables you are (and are not) providing. Taking variables from the stack without clearly indicating them _in Tcl code_ (as opposed to another language like SQL) is a little too much of a DWIM for my liking.

(4) Returning the columns as a resultset is a real pain. Rather make the columns available as
4.a) a list of tuples; or
4.b) a list of column names, plus a function to get a dict of information about the column (given its name)

(5) The primitive function to retrieve a row needs to retrieve it as a list, IM(NS)HO. This is the natural structure to represent ordered data as retrieved from a DBMS, and avoids a bunch of overhead involved in creating and then parsing out a dict.

Every DBI I have used that uses an array instead of a list (like SQLite) just causes me extra work. It is easy for me to lassign out of a list, hard to lassign multiple values out of a dict (especially if you have to have error handling for NULL cases), and hard to build an ordered list from an array. It is of course hard to build a dict from pairwise ordered lists, so clearly there is a trade-off between the list/dict options.
Sidenote: I realise that "easy" and "hard" here amount to a difference of 2 or 3 lines of code, but it is (i) extra code to maintain, (ii) extra code obscuring what is actually going on, (iii) a loop that causes a performance hit which - over a 1000 row report - can add up to a user-perceptible difference (and should certainly be a concern for enterprise use!).

I understand the concerns around NULL, but returning this data as a dict is bad for at least two reasons:
5.a) The performance hit; AND
5.b) The implementation dangers you face by not having keys in the dict that the developer sanely expects to find there.

Furthermore, in a typical/simple database-driven application environment you either (i) don't expect to deal with NULL or (ii) expect NULL and an empty string to be equivalent, so having to deal with this problem explicitly (as I did in my example) is just added and unnecessary complexity.

Even further to that, the very concept of NULL is not Tclish and Tcl developers don't have to think about "dereferencing NULL"; so introducing a landmine like that is a Bad Idea.

I find SQLite's solution to this problem (nullvalue, see http://www.sqlite.org/tclsqlite.html#nullvalue) quite interesting as it covers "the typical cases", but perhaps it isn't a "good enough" solution.

Perhaps an acceptable solution may be a combination of nullvalue and isnull. The returned list has $nullvalue substituted for NULL; an implementation that needs to distinguish NULL from not-NULL can check each field that matches $nullvalue by calling a "$resultset isnull $colidx" function.

A better solution may be to have nullvalue plus nextrow_dict and nextrow_list methods. Hmm.
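The combined nullvalue/isnull idea might look like this in use (every command here - the -nullvalue configuration, a nextrow that yields an ordered list, and the isnull query - is hypothetical API sketched for this proposal, not anything in the TIP draft):

```tcl
# Hypothetical API: NULLs come back as a configurable placeholder,
# and code that must distinguish real NULLs can ask explicitly.
$resultset configure -nullvalue ""

while { [$resultset nextrow row] != 0 } {
    # $row is an ordered list; NULL columns hold the placeholder "".
    lassign $row name description price
    # Only where "" is ambiguous do we pay for the extra check
    # (column index 1 = description in this query).
    if { $description eq "" && [$resultset isnull 1] } {
        set description "(no description)"
    }
    # ... render name/description/price ...
}
```

The common case pays nothing; only code that genuinely cares about NULL-versus-empty does the second lookup.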

(6) I am concerned by the number of ensembles that a simple query must create (and clean up). This smells like a performance concern, especially for a pure-Tcl implementation for a specific database.

I am also worried that the "set handle [create_x] ; $handle" pattern can be a source of errors, is impossible to statically analyze, and is a pain to debug (think 'invalid command name ""' versus 'invalid result handle ... while executing pg_result') (and it's about the closest Tclers can get to a NULL dereference ;) ).

I suppose this is really a question of whether the API should be modelled in an OO or procedural (data-passing) style. There are benefits to each.

(7) A SELECT operation is fundamentally different from other commands that can be executed. This is clear, for example, when you read the spec for "resultset rows", which does not provide a meaningful value in the case of SELECT. How you query a resultset is thus dependent on the operation that was requested, which can make creating generic DB manipulation functions rather fun. Other DBIs have also struggled with this problem.

May I propose a distinction between a "query" function and an "execute" or "alter" function? We have made this distinction in our in-house DBI and it works well for us. I'd be interested to hear opinions from others.


Okay, that's my 2c and 400 lines.
Regards,
Twylite

Kevin Kenny

Nov 16, 2007, 10:58:15 AM
to
Donal K. Fellows wrote:
> TIP #308: TCL DATABASE CONNECTIVITY (TDBC)
It occurs to me that I owe the community a cover letter presenting a
little bit of background, and assigning due credit to the sources of
various ideas.

Tcl's database access has for years been in a mild state of disarray.
The trouble is not that Tcl can't talk to databases. In fact, Tcl has
an impressive array of database access codes available, and a good
many of them are quite easy to use and well maintained. Nor is the
trouble that database access is not in the core language. By and
large, database users recognize the need for a "database driver,"
supplied by the database vendor or a third party, to connect a given
database to a given interconnection fabric. People install Oracle, or
MySQL, or Postgres95, or whatever, and expect that the installer will
provide an ODBC driver, a JDBC driver, or a Perl DBI (according to
their applications' programming environments) that will allow
connection. Nobody expects the runtime environments of Java, Visual
Studio, or Perl to provide connectivity to all the databases.

What's missing, rather, is a certain sense that databases are
"supported by Tcl." There's a certain feel that they are tolerated as
second class citizens rather than embraced as partners. There are
recurring questions, "doesn't Tcl have anything like JDBC (or DBI)?"
Clearly, something is needed to make databases feel more at home.

One cause of the feeling that databases aren't fully embraced, it
seems to me, is that their application programming interfaces (API's)
are so varied, since each was invented by the implementor of the
corresponding database interface module. This variability gives Tcl's
database access the feel of being a collection of bits grafted onto
Tcl, rather than a unified whole. This is an area where, it appears,
only the Tcl Core Team can help. While, for example, Michael Cleverly
has wrought a tremendous unification by implementing and releasing his
fine 'nstcl' package, it enjoys limited acceptance -- programmers
either do not hear of it, or else do not consider its benefits to be
worth the effort of installing yet another package, even one
implemented in pure Tcl. Moreover, 'nstcl' places an unreasonable
burden on its maintainers, since they are the ones that are
responsible for grafting it atop the heterogeneous APIs of 'oratcl',
'mysqltcl', 'tclodbc', 'sqlite3', and so on. The Tcl Core Team needs,
therefore, to use the bully pulpit that the office provides to
promulgate a uniform database API.

Building a uniform, "Tcl Core-supported," database API will actually
require comparatively little code in the Tcl Core itself. Unlike most
of the other languages with database API's, Tcl enjoys a dynamic
nature that allows much of the "glue code" of a package like ODBC or
JDBC to vanish entirely. If the interfaces associated with a database
are represented with command ensembles, then any ensemble that
implements the specified interfaces is _ipso facto_ a database
connection. The body of code in ODBC and JDBC that is responsible for
relaying the programmer's request to the database driver therefore
vanishes; the driver simply talks directly to the program.

Even the "database open" primitive is left unspecified in TIP 308. The
reason, once again, is the dynamic nature of Tcl. While JDBC, ODBC,
and the like generally provide a single "open" entry point, which
specifies a "connection string" to specify the database instance, Tcl
has no need of such a beast. Connection strings are
database-dependent, and the "open" operations of JDBC and ODBC simply
function as interpreters of the little language of connection strings,
allowing them to specify the driver and parameters. Tcl has a
perfectly good interpreter already; the "connection string" most
naturally appears as a Tcl command to evaluate.

Much of the rest of the design rationale will become apparent in
reading the TIP, and I won't say more about it in this
message. Instead, let me move on to the process we should follow in
fleshing out the TIP and considering it for acceptance.

This TIP has already been written with a good deal of informal support
from several maintainers of database interfaces, notably D. Richard
Hipp (sqlite3), Brett Schwarz (pgtcl), and Tom Poindexter (the
original author of oratcl). If in limiting my consultations, I've
given the impression of giving short shrift to the others, I
apologize. At the stage of actually writing a first draft, I had the
distinct impression that it would suffer greatly from "design by
committee," and might easily devolve into an incoherent mess of
features. The time has now come for "many eyeballs" to examine the
draft, and I promise to make every effort to respond to the concerns
of the community. In particular, I'm eager to hear from the
implementors of the many database APIs that are out there, telling me
what's ill-considered, what's unimplementable, what's insecure, ... in
general, what needs to be improved.

In fact, I've already identified a couple of areas that I plan to
rewrite in the next couple of days. In particular, the section on
transactions needs to be reconsidered (I thank Donal Fellows for
pointing me to a better idea, which I'm now working on fleshing out);
and the [$statement execute] command, and the convenience procedures
that use it, need to be reworked to allow for more graceful handling
of dynamically generated queries (Thanks are due to Joe English for a
concrete suggestion of how to fix this one).

A few aspects of the specification, however, I consider
non-negotiable. The representation of result set rows as dictionaries
(which I believed was unique to this spec, but have since discovered
was anticipated by pgtcl), the mandate of support for prepared
statements (needed to guard against SQL insertion attacks) and the
representation of database objects as instance commands are among
these.

Moreover, I hope to be allowed to have the final say over the wording
of the specification as voted. I consider it essential to have a
single auctorial voice in such documents; I've seen too many of them
ruined by committees. I apologize for the presumption of arrogating
this role unto myself, and offer the weak defence that nobody else has
stepped up to do it.

I do commit not to call the vote until an implementation of the
specification, supporting at least one client-server and one
in-process database engine, is publicly available. I also commit to
withhold the vote while reasonable doubt remains that any of the
popular databases will be able to conform with the specification.
TIP 308 is therefore very much to be considered a work in progress for
some time yet.

--
73 de ke9tv/2, Kevin

Donal K. Fellows

Nov 16, 2007, 11:00:52 AM
to
Twylite wrote:
> (1) The DBI is complex. The comparative lengths of my example
> functions illustrate that clearly. If ::tcl::db::execute had a
> mechanism to report the ordered list of columns returned this would
> make "simple" development a lot easier (comparable to SQLite). This
> change alone would also void my comments about error handling (at
> least to some degree).

We hope that we can get it so that most of that gunk is hidden.

> (2) The amount of error handling this DBI requires is significant - 4
> nested error handlers. Other DBIs require 1 to 3. This level of
> error handling detracts from my task of writing in functionality
> makes quality assurance more difficult.

Definitely. It's an early revision of this TIP I think, a place to start
the serious discussion from and not the thing that will get done at the
end. Not yet. :-)

> (3) Taking variables from the current stack frame is a recipe for
> trouble, and makes writing logic like extract_report more complex and
> slower (best case: [dict with ...]; worst case: foreach {name value}
> { set ... }) I would prefer to see execute take a dict; then it is at
> least clear what variables you are (and are not) providing. Taking
> variables from the stack without clearly indicating them _in Tcl
> code_ (as opposed to another language like SQL) is a little to much
> of a DWIM for my liking.

That's planned. It was one of the first things that was pointed out as a
flaw. :-)

> (4) Returning the columns as a resultset is a real pain. Rather make
> the columns available as
> 4.a) a list of tuples; or
> 4.b) a list of column names, plus a function to get a dict of
> information about the column (given its name)

Now you're just talking about how *I* would do it too. :-D

> (5) The primitive function to retrieve a row needs to retrieve it as
> a list, IM(NS)HO. This is the natural structure to represent ordered
> data as retrieved from a DBMS, and avoids a bunch of overhead
> involved in creating and then parsing out a dict.

No. That *really* screws with Tcl's value system.

> (6) I am concerned by the number of ensembles that a simple query
> must create (and clean up). This smells like a performance concern,
> especially for a pure-Tcl implementation for a specific database.

Psst! Use an object!

> (7) A SELECT operation is fundamentally different from other commands
> that can be executed. This is clear for example when you read the
> spec for "resultset rows", which does not provide a meaningful value
> in the case of SELECT. How you query a resultset is thus dependant
> on the operation that was requested, which can make creating generic
> DB manipulation functions rather fun. Other DBIs have also struggled
> with this problem.

Actually, it's not that clean. Apparently Oracle can return a result set
from non-SELECT statements if asked nicely. (Yes, this surprised me a
lot too.)

Donal.

Don Porter

Nov 16, 2007, 11:54:20 AM
to
Twylite wrote:
> (Aside: Is there somewhere on the Wiki to maintain an ongoing
> discussion/comments?)

The TIP itself may be edited in the Wiki style. Look for the
Edit link at the bottom.

http://tip.tcl.tk/308

One common practice is to just add a "~ Comments" section at
the bottom, and have at it.

DGP

Wojciech Kocjan

Nov 16, 2007, 4:56:54 PM
to Donal K. Fellows
On 16-11-2007 at 17:00:52 Donal K. Fellows
<donal.k...@manchester.ac.uk> wrote:

>> (5) The primitive function to retrieve a row needs to retrieve it as
>> a list, IM(NS)HO. This is the natural structure to represent ordered
>> data as retrieved from a DBMS, and avoids a bunch of overhead
>> involved in creating and then parsing out a dict.
> No. That *really* screws with Tcl's value system.

I have to say I disagree here. A lot.

First of all, I think SQLite's syntax (flat list) really rocks:

foreach {id name value} [$db eval {SELECT id,name,value FROM table}] {
...
}

Also, a list of lists is a really useful construction. For example you can
pass that directly to a tablelist widget, so all you do is call delete 0 end
and do an insertlist $result on it.

Also, I'm not sure if that changed in 8.5, but [lsort] doesn't talk to
dicts nicely - so sorting a list of lists would work much better than
getting dictionaries.

While I know dicts would be the most elegant way, I don't think they're
the fastest (in terms of performance) or the most powerful option.

I think that query results should be returned as:
- flat list
- list of lists
- list of dicts

--
Wojciech Kocjan

Wojciech Kocjan

Nov 16, 2007, 5:04:36 PM
to
On 15-11-2007 at 23:45:22 Donal K. Fellows
<donal.k...@manchester.ac.uk> wrote:

> TIP #308: TCL DATABASE CONNECTIVITY (TDBC)

First of all, great work on all of this, Donal!

Some comments based on what is missing that's in tcldb package:

1/ There should also be an API (perhaps another package that would work on top of this one?) to allow quick access to trivial select/insert/update/delete statements. Something like:

objectName insert tablename fields values
objectName delete tablename qfields qvalues
objectName update tablename fields values qfields qvalues
objectName select tablename fields ?qfields? ?qvalues?

For example - to set count to 25 for ID=1, you can just
$db update mytable {count} {25} {id} {1}
(this could work nicer using dicts, probably)

2/ I don't see any support for serial/auto_increment type of fields - this
is DB dependent but I think it is used a lot. Most database APIs will allow
you to fetch it somehow assuming you know the field you're interested in.

3/ Inlined transactions - something like SQLite's [$db transaction] and a
similar part in tcldb

4/ Does your proposal handle any automatic reconnecting or handling of
dropped connections?

Perhaps some functionality is outside of this TIP's scope, but perhaps it would make sense to split the API into lower and higher levels - i.e. drivers providing some standard behaviour, and another pure-Tcl layer offering more advanced functionality that works on top of the db APIs?
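The shorthand in point 1/ could plausibly be layered on the TIP's prepared statements in pure Tcl. A hedged sketch of just the update case follows; the [$db prepare], [$stmt execute] and [$stmt close] calls follow the TIP draft (where execute binds :name variables from the calling frame), while the db_update command itself and its argument shape are invented for illustration:

```tcl
# Sketch of a convenience "update" built on the proposed prepare/execute.
# Generates: UPDATE tbl SET f1 = :v0, ... WHERE q1 = :w0 AND ...
proc db_update {db table fields values qfields qvalues} {
    set sets {}
    set i 0
    foreach f $fields v $values {
        lappend sets "$f = :v$i"
        set v$i $v              ;# bound from this frame by [execute]
        incr i
    }
    set conds {}
    set i 0
    foreach f $qfields v $qvalues {
        lappend conds "$f = :w$i"
        set w$i $v
        incr i
    }
    set sql "UPDATE $table SET [join $sets {, }] WHERE [join $conds { AND }]"
    set stmt [$db prepare $sql]
    $stmt execute
    $stmt close
}

# Usage, matching the example above:
#   db_update $db mytable {count} {25} {id} {1}
```

Because the values travel as bound parameters rather than interpolated SQL text, the shorthand inherits the TIP's protection against SQL injection for free.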

--
WK

Wojciech Kocjan

Nov 16, 2007, 5:04:59 PM
to
On 16-11-2007 at 17:00:52 Donal K. Fellows
<donal.k...@manchester.ac.uk> wrote:

>> (5) The primitive function to retrieve a row needs to retrieve it as
>> a list, IM(NS)HO. This is the natural structure to represent ordered
>> data as retrieved from a DBMS, and avoids a bunch of overhead
>> involved in creating and then parsing out a dict.
> No. That *really* screws with Tcl's value system.

I have to say I disagree here. A lot.

tom.rmadilo

Nov 16, 2007, 5:52:14 PM
to
On Nov 16, 2:04 pm, "Wojciech Kocjan" <wojciec...@gazeta.pl> wrote:
> I think that query results should be returned as:
> - flat list
> - list of lists
> - list of dicts

Actually, as I just happened to be looking at dict and comparing that
with lists, I would have to say that a dict is the worst possible
choice for database results. One of the important benefits of a
database is that you can (and always should!) order your results. For
instance, postgresql always returns rows in the order it finds them if
there isn't an order by statement. So, an updated row goes to the
bottom of the list, even if it was first just before an update.

And, of course SQL allows multiple result columns to have the same
name, which would cause the dict to create a sub-dict. The order of
the sub-dict would also not have any relation to the order it was
added, and it also destroys the width of the result set.

I would recommend a list of lists for one reason: you can't tell if a
list contains sublists. A single item could look like a list of
multiple items. However, another idea I have used is to return a list
of references. Since references will (should) be words, there is no
ambiguity as to the number of them in a list. Also, a list of
references allows you to append/lappend, and efficiently pass around
the list.

Donal K. Fellows

Nov 16, 2007, 5:59:49 PM
to
Wojciech Kocjan wrote:
>> TIP #308: TCL DATABASE CONNECTIVITY (TDBC)
> First of all, great work on all of this, Donal!

Not me! I'm just the Editor! This is Kevin Kenny's work. (I couldn't be
bothered to resurrect my TIP publisher code, so I posted it directly as
myself.)

Donal.

Roy Terry

Nov 16, 2007, 6:09:47 PM
to
Wojciech Kocjan wrote:
>
> I think that query results should be returned as:
> - flat list
> - list of lists
> - list of dicts
>
I must second this sentiment. To me dicts are convenient but exotic and high-overhead. The dict syntax is puzzling to many. Lists are first-class, well understood, and optimized objects.

Please support lists.

Roy

Twylite

Nov 17, 2007, 12:30:25 AM
to
On Nov 17, 12:52 am, "tom.rmadilo" <tom.rmad...@gmail.com> wrote:
> Actually, as I just happened to be looking at dict and comparing that
> with lists, I would have to say that a dict is the worst possible
> choice for database results. On of the important benefits of a
> database is that you can (and always should!) order your results.

Thanks for putting that so much more concisely than I did ;)

> And, of course SQL allows multiple result columns to have the same
> name, which would cause the dict to create a sub-dict. The order of
> the sub-dict would also not have any relation to the order it was
> added, and it also destroys the width of the result set.

I wasn't aware of this - it kind of destroys the dict option, doesn't it?

> I would recommend a list of lists for one reason: you can't tell if a
> list contains sublists.

Actually I'm going to recommend against a list of lists, and go with the nextrow-returns-a-list suggestion. I don't think the DBI should retrieve all results into memory unless you specifically ask it to (using a helper like pg's -llist), so the primitive operation for a resultset needs to fetch a single row (the next row in order). Building those return values (preferably lists) into a list is a relatively cheap operation, so a llist wrapper/helper would not be a problem to code in Tcl.
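The llist-style wrapper described above is only a few lines over a nextrow primitive (the [$resultset nextrow row] signature here mirrors the earlier TDBC example, with the assumption - not in the TIP - that the row comes back as an ordered list rather than a dict):

```tcl
# Hypothetical helper: drain a resultset whose [nextrow] primitive
# stores one row (as an ordered list) in a variable, returning 0
# when there are no more rows.
proc resultset_to_llist {resultset} {
    set rows {}
    while { [$resultset nextrow row] != 0 } {
        lappend rows $row
    }
    return $rows
}
```

So keeping the primitive row-at-a-time costs the "give me everything" use case almost nothing, while the reverse (recovering streaming from an all-in-memory API) is impossible.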

Twylite

Twylite

Nov 17, 2007, 1:02:49 AM
to
On Nov 16, 5:58 pm, Kevin Kenny <kenn...@acm.org> wrote:
> A few aspects of the specification, however, I consider
> non-negotiable. The representation of result set rows as dictionaries
> (which I believed was unique to this spec, but have since discovered
> was anticipated by pgtcl), the mandate of support for prepared
> statements (needed to guard against SQL insertion attacks) and the
> representation of database objects as instance commands are among
> these.

Ah, how the mighty shall fall ;)

I have serious concerns about two of these (dicts and prepared
statements), both on performance and usability grounds. I'm clearly
not the only one ;)

Ultimately one needs to go through the motions of connect, prepare,
execute, get-results; the question is how best to represent these. A
database object as an ensemble is a natural fit in Tcl, but I'm less
convinced about things that will return short-lived commands/ensembles/
objects - I'd like to see a performance comparison of a TclOO-
implemented DB versus a DB that returns and operates on data handles.

I have serious doubts about all aspects of dicts. They have no
natural relationship to database rows (which are ordered), they will
impose (unnecesary) performance overhead, for many queries they will
put extra work on the developer to reconstruct the data into a tabular
form (list of lists), and tom.rmadilo raised a concern about handling
multiple columns with the same name. The sole benefit I can see of
dicts is that they offer a better way than lists to handle NULLs, and
even then the approach invites Tcl exceptions (IMHO, especially for
the unwary - i.e. the majority).

I think the solution is to take a harder look at NULL handling, and
the possibility of having to support both a dict and a list interface :(

My 3c (inflation)
Twylite

Twylite

Nov 17, 2007, 1:08:40 AM
to
On Nov 17, 8:02 am, Twylite <twylite.cr...@gmail.com> wrote:
> I think the solution is to take a harder look at NULL handling, and
> the possibility of having to support both a dict and a list interface

wrt NULL handling, I note the following:
- Tcl has no NULL and there is no way to represent NULL as a string.
I agree that TIP#185 is flawed and trying to represent NULL in Tcl is a
Bad Approach.
- You could represent NULL as the lack of an expected element (as this
TDBC proposal does), but this is a landmine for the unwary (i.e. added
complexity and error handling for simple cases).
- Substituting NULL with a specific (possibly developer defined) value
will cover the extent to which many developers will interact with
NULL. Clearly this doesn't provide for the cases where there must be
certainty about NULL vs not-NULL, but it may be the starting point of
a middle ground.
- You could query for NULL as a separate step, either per cell or per
row. e.g.
  $resultset columns -> list of col names
  $resultset nextrow -> ordered list of fields
  $resultset nulls   -> ordered list of bools indicating NULL/not-NULL
                        fields for the most recently returned row; OR
  $resultset nulls   -> list of col names that contain NULLs for the
                        most recently returned row
- Using the above it is easy to convert from a list interface to a
dict interface:
  set rowdict [dict create]
  foreach name   [$resultset columns] \
          value  [$resultset nextrow] \
          isnull [$resultset nulls] {
      if { ! $isnull } {
          dict set rowdict $name $value
      }
  }
- Remember that to convert from a dict to a list is more complex (in
code and in performance):
  # first get the ordered list of cols; let's assume [$resultset columns]
  # does that
  set row [list]
  set dictrow [$resultset nextrow]
  foreach name [$resultset columns] {
      if { [dict exists $dictrow $name] } {
          lappend row [dict get $dictrow $name]
      } else {
          lappend row $::nullvalue
      }
  }
- A microbenchmark suggests that dict access is at least 30% slower
than lindex, and foreach is faster still than a for/lindex loop. With
two dict accesses per column you pay a high price to convert a row
from a dict to a list.
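For the curious, the claim above can be checked with Tcl's own [time]
command. This is a rough sketch of such a microbenchmark: the row data
and column names are invented for illustration, and absolute timings
will vary by build and machine.

```tcl
# Compare per-element access cost: list/lindex vs. dict/dict get.
set row  [list 1 Joe Schmoe Springfield]
set rowd [dict create id 1 forenames Joe surname Schmoe city Springfield]

# [time] returns average microseconds per iteration.
puts "lindex:   [time { lindex $row 2 } 100000]"
puts "dict get: [time { dict get $rowd surname } 100000]"
```

Both accesses fetch the same value ("Schmoe"); only the representation
differs, so the timing difference is the cost of the dict lookup.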

Regards,
Twylite

Wojciech Kocjan

unread,
Nov 17, 2007, 6:01:18 AM11/17/07
to
Dnia 16-11-2007 o 16:58:15 Kevin Kenny <ken...@acm.org> napisał(a):
> A few aspects of the specification, however, I consider
> non-negotiable. The representation of result set rows as dictionaries
> (which I believed was unique to this spec, but have since discovered
> was anticipated by pgtcl), the mandate of support for prepared
> statements (needed to guard against SQL insertion attacks) and the
> representation of database objects as instance commands are among
> these.

Like Twylite, I find both of these non-negotiable parts awful. To be
honest, both of them make me stick with my tcldb instead of adopting
the new "standard".

First of all, dicts are the worst possible way. A better choice is to
allow programmers to choose how they want the data provided - lists or
dicts. Joining into a flat list or a list of lists should also be done
at C level to speed things up. Perhaps even a tcldb-like foreach could
be implemented:

$db query foreach {name value} "SELECT name, value FROM mytable" {
    puts "$name='$value'"
}

Second of all, prepared statements are not a must. See how it's done in
tcldb - if you don't want a prepared statement, you just write

foreach {name value} [$db query flatlist "SELECT name, value FROM
        mytable WHERE name LIKE '@LIKE@'" like $like] {
    puts "$name='$value'"
}

In that case tcldb converts @LIKE@ to the value of $like, substituting
database-specific characters for you. It's then much harder to shoot
yourself in the foot.
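For readers who don't know tcldb, the substitution step might be
sketched like this. The proc name and the quoting rule below are
illustrative assumptions only, not tcldb's actual internals; in
particular, doubling single quotes is only correct for some SQL
dialects, and a real driver must use its engine's own quoting rules.

```tcl
# Hypothetical sketch of @NAME@ substitution with magic quoting.
proc subst_quoted {sql args} {
    foreach {name value} $args {
        # Naive quoting: double any single quote in the value.
        set quoted [string map {' ''} $value]
        # Replace the @NAME@ placeholder with the quoted value.
        set sql [string map [list @[string toupper $name]@ $quoted] $sql]
    }
    return $sql
}

puts [subst_quoted {SELECT * FROM t WHERE name LIKE '@LIKE@'} like O'Brien%]
# SELECT * FROM t WHERE name LIKE 'O''Brien%'
```

Whether this counts as "simulating a prepared statement" in the TIP's
sense is exactly the question under discussion.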

--
Wojciech Kocjan

Michael Schlenker

unread,
Nov 17, 2007, 7:24:24 AM11/17/07
to
tom.rmadilo schrieb:

> On Nov 16, 2:04 pm, "Wojciech Kocjan" <wojciec...@gazeta.pl> wrote:
>> I think that query results should be returned as:
>> - flat list
>> - list of lists
>> - list of dicts
>
> Actually, as I just happened to be looking at dict and comparing that
> with lists, I would have to say that a dict is the worst possible
> choice for database results. On of the important benefits of a
> database is that you can (and always should!) order your results. For
> instance, postgresql always returns rows in the order it finds them if
> there isn't an order by statement. So, an updated row goes to the
> bottom of the list, even if it was first just before an update.
Read the spec again. It's 'list of dicts', so you get the row ordering
from the list but don't get a column ordering. (The existence of a
column ordering in SQL is considered by some to be a flaw in
implementing the relational model.)

>
> And, of course SQL allows multiple result columns to have the same
> name, which would cause the dict to create a sub-dict.
Which RDBMS supports that abominable feature? Sounds like a really
stupid idea. (I know it allows multiple columns with names like
table_1.a and table_2.a, which is sane.)

Michael

Michael Schlenker

unread,
Nov 17, 2007, 7:39:31 AM11/17/07
to
Twylite schrieb:

> On Nov 17, 8:02 am, Twylite <twylite.cr...@gmail.com> wrote:
>> I think the solution is to take a harder look at NULL handling, and
>> the possibility of having to support both a dict and a list interface
>
> wrt NULL handling, I note the following:
> - You could represent NULL as the lack of an expected element (as this
> TDBC proposal does), but this is a landmine for the unwary (i.e. added
> complexity and error handling for simple cases).
NULLs are landmines. If you don't handle NULL you're in for trouble anyway.

> - Substituting NULL with a specific (possibly developer defined) value
> will cover the extent to which many developers will interact with
> NULL. Clearly this doesn't provide for the cases where there must be
> certainty about NULL vs not-NULL, but it may be the starting point of
> a middle ground.

This can be had trivially with a dict approach too. Just use dict merge
with the first dictionary containing your user-specified NULL
substitutes.

Basically if you want NULLs as empty strings:

set default [dict create ...]
set result [dict merge $default $dbresult]
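To make the sketch concrete (the column names here are invented for
illustration):

```tcl
# Defaults come first in the merge, so the actual result's values win
# and only the missing (NULL) columns keep their defaults.
set default  [dict create forenames "" surname "" phone ""]
set dbresult [dict create forenames Joe surname Schmoe] ;# phone was NULL
set result   [dict merge $default $dbresult]

puts [dict get $result phone]   ;# empty string instead of a missing key
```

After the merge every expected key is present, so naive code that does
[dict get] without checking [dict exists] no longer throws.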

> - You could query for NULL as a separate step, either per cell or per
> row. e.g.
> $resultset columns -> list of col names
> $resultset nextrow -> ordered list of fields
> $resultset nulls -> ordered list of bools indicating NULL/not-NULL
> fields for the most recently returned row; OR
> $resultset nulls -> list of col names that contain NULLs for the
> most recently returned row

Getting an ordered list of columns from [$resultset columns] seems like
a convenient idea. It's currently not specified as 'ordered' in the
spec. From a pure relational POV the ordering is insignificant.

But I see the value in specifying your column names and ordering in
just one place (the SQL statement) instead of doing double work by
specifying it first in the query and then again while iterating the
result set.

Michael

Michael Schlenker

unread,
Nov 17, 2007, 7:52:51 AM11/17/07
to
Wojciech Kocjan schrieb:

> Dnia 16-11-2007 o 16:58:15 Kevin Kenny <ken...@acm.org> napisał(a):
>> A few aspects of the specification, however, I consider
>> non-negotiable. The representation of result set rows as dictionaries
>> (which I believed was unique to this spec, but have since discovered
>> was anticipated by pgtcl), the mandate of support for prepared
>> statements (needed to guard against SQL insertion attacks) and the
>> representation of database objects as instance commands are among
>> these.

> Second of all, prepared statements are not a must. See how it's done in
> tcldb - if you don't want a prepared statement, you just write
>
> foreach {name value} [$db query flatlist "SELECT name, value FROM
> mytable WHERE name LIKE '@LIKE@'" like $like] {
> puts "$name='$value'"
> }
>
> In that case tcldb converts @LIKE@ to value of $like with substituting
> database-specific characters for you. It's then much easier not to shoot
> yourself in the leg.

How is this much different from the quoted part of TIP#308:


"An interface to a database whose native API does not support prepared

statements MUST simulate them. In particular, when the run or execute


commands are executed on a statement, substitution must be performed in
a safe fashion with whatever magic quoting is required. In any case,
magic quoting should be regarded as an infelicitous expedient and
avoided if at all possible."

If you say its not possible to support a prepared statement for all
types of databases, so just use this approach.

And from the current TIP i see also:
::tcl::db::executewithdictionary dbHandle SQL-code dictionary ?variable? ?script?

which is basically what your code does. The only thing that the spec
really mandates is 'protection against stupid sql injection attacks' and
it highly recommends prepared statements (like most of the rest of the
world does) to reach this goal.

Michael

Michael Schlenker

unread,
Nov 17, 2007, 8:06:34 AM11/17/07
to
Michael Schlenker schrieb:

> Getting an ordered list of columns from [$resultset columns] seems like
> a convenient idea. Its currently not specified as 'ordered' in the spec.
> From a pure relational POV the ordering is insignificant.
>
> But i see the value in specifing your column names and ordering in just
> one place (the sql statement) instead of having double work in specifing
> it first in the query and then again while iterating the result set.

I misread the spec. The column names are ordered in the resultset
retrieved from columns; it just takes extra steps to get them as a
simple list.

So building the list of lists style from a dict based resultset would
look like this:

proc lol {resultset nullvalue} {
    set cols [$resultset columns]
    set columns [list]
    set nulldict [dict create]
    while {[$cols nextrow col]} {
        set name [dict get $col name]
        lappend columns $name
        dict set nulldict $name $nullvalue
    }
    set results [list]
    while {[$resultset nextrow row]} {
        set rowlist [list]
        set rvalue [dict merge $nulldict $row]
        foreach col $columns {
            lappend rowlist [dict get $rvalue $col]
        }
        lappend results $rowlist
    }
    return $results
}

Michael

Eric Hassold

unread,
Nov 17, 2007, 10:05:20 AM11/17/07
to
Michael Schlenker wrote :

>> And, of course SQL allows multiple result columns to have the same
>> name, which would cause the dict to create a sub-dict.
> Which RDBMS supports that abominable feature? Sounds like a really stupid
> idea. (I know it allows multiple columns with names like table_1.a and
> table_2.a, which is sane).

% psql template1 -c "select 1 AS x, 2 AS x;"
x | x
---+---
1 | 2
(1 ligne)

% mysql
mysql> SELECT 1 AS x, 2 as x;
+---+---+
| x | x |
+---+---+
| 1 | 2 |
+---+---+
1 row in set (0.00 sec)

% sqlite3 :memory: "SELECT 1 AS x, 2 AS x;"
1|2

Also, to illustrate with a more usual case, two columns may end up
having the same name because of implicit naming:

psql mydb -c "SELECT cid,sum(stock),sum(price) FROM c GROUP BY cid"
cid | sum | sum
-----------------+---------+---------
XXXXXXXXXXXXXXX | -337.22 | -3041.4


Of course, explicit column naming with 'AS' would be recommended. But
to answer your question/comment "which RDBMS supports that abominable
feature?", I would say most if not all.

Eric

tom.rmadilo

unread,
Nov 17, 2007, 1:01:22 PM11/17/07
to
On Nov 16, 9:30 pm, Twylite <twylite.cr...@gmail.com> wrote:
> Actually I'm going to recommend against a list of lists, and go with
> the nextrow-returns-a-list suggestion. I don't think the DBI should
> retrieve all results into memory unless you specifically ask it to
> (using a helper like pg's -llist), so the primitive operation for
> resultset needs to fetch a single row (the next row in order).
> Building those return values (preferably lists) into a list is a
> relatively cheap operator, so a llist wrapper/helper would not be a
> problem to code in Tcl.

Yes, I agree with this. It is a five minute job to put everything into
whatever form you want, which is why the idea of going directly into a
dict is pointless. If all a dict has in it is one 'row' of data, what
is wrong with an array? In constructing an array, you could just as
easily lappend the data onto the end of an element.

What I suspect is that it looks like you can pass dicts around like
lists, as a value. Maybe this is seen as an advantage?

I wrote a DB API for this type of interface which returned a list of
array names. Using arrays, I realize the assumptions of ordered
columns and unique column names are lost, but that was _my_ decision.
But this happened to be a huge improvement over the previous result
set I had been using which maintained a bunch of other state, in a way
similar to a dict, but maybe uglier.

Now I'm rethinking this API, maybe an ordered list is a better idea.

Okay, back to the subject of why a list of lists. There is no need to
pull all the rows at once, and it is better left to the developer -
that is a thin helper proc's job. My reason for going straight to a
list of rows (whatever form) is that I used pooled database
connections.
Instead of hogging both memory and an open connection, I chose to hog
the memory, or rewrite my query to return less information, and
release the connection back to a pool. This probably isn't what you
would do with an interactive GUI application, but for a web app, there
is no reason to select any more data than needed, you only get one
shot at doing the query.

But if you can't loop over the results one row at a time, you might
not be able to do sub-queries, that is, for each row, do another
database query using a different dbhandle. My experience is that this
is usually a very slow alternative to a correctly structured database,
but it is also possible that you will make a connection to some other
database/tablespace/etc.

So the single row return and maintaining an open result set is still
useful, and is still the most basic requirement of the interface. What
happens above this level should be up to the developer, not the
interface designer. Nothing is ever gained by building in unnecessary
limitations.

tom.rmadilo

unread,
Nov 17, 2007, 1:33:48 PM11/17/07
to
On Nov 17, 5:06 am, Michael Schlenker <schl...@uni-oldenburg.de>
wrote:

> So building the list of lists style from a dict based resultset would
> look like this:
>
> proc lol {resultset nullvalue} {
>     set cols [$resultset columns]
>     set columns [list]
>     set nulldict [dict create]
>     while {[$cols nextrow col]} {
>         set name [dict get $col name]
>         lappend columns $name
>         dict set nulldict $name $nullvalue
>     }
>     set results [list]
>     while {[$resultset nextrow row]} {
>         set rowlist [list]
>         set rvalue [dict merge $nulldict $row]
>         foreach col $columns {
>             lappend rowlist [dict get $rvalue $col]
>         }
>         lappend results $rowlist
>     }
>     return $results
> }

If you have two columns with the same name, you can't do this; as a
matter of fact, you might have to test whether a value was another
dict, with sub-values, or just a value. The sub-values are unordered
because they are in a dict. Compare your code above with similar code
which uses two ordered lists... sorry, this is a long example, but it
illustrates dict vs lists:

# Comparison with Tcl's 8.5 dict command:

namespace eval ::tws::nvlist::testEmployees {

    variable employeeInfo [list]
    variable employeeInfo2 [list]
    variable employeeInfo2Fields [list]

    namespace import ::tws::nvlist::*

    addItem employeeInfo [list 12345-A forenames "Joe"] 0 2
    addItem employeeInfo [list 12345-A surname "Schmoe"] 0 2
    addItem employeeInfo [list 12345-A street "147 Short Street"] 0 2
    addItem employeeInfo [list 12345-A city "Springfield"] 0 2
    addItem employeeInfo [list 12345-A phone "555-1234"] 0 2

    addItem employeeInfo [list 98372-J forenames "Anne"] 0 2
    addItem employeeInfo [list 98372-J surname "Other"] 0 2
    addItem employeeInfo [list 98372-J street "32995 Oakdale Way"] 0 2
    addItem employeeInfo [list 98372-J city "Springfield"] 0 2
    addItem employeeInfo [list 98372-J phone "555-8765"] 0 2

    # Condensed Rep:

    createEnum employeeInfo2Fields id forenames surname street city phone

    addItem employeeInfo2 [list 12345-A "Joe" "Schmoe" \
                               "147 Short Street" "Springfield" \
                               "555-1234"]

    addItem employeeInfo2 [list 98372-J "Anne" "Other" \
                               "32995 Oakdale Way" "Springfield" "555-8765"]

}

proc ::tws::nvlist::testEmployees::printStuff { } {

    variable employeeInfo
    variable employeeInfo2
    variable employeeInfo2Fields

    set i 0
    set return_value ""
    set fieldNames [::tws::nvlist::toValue employeeInfo2Fields *]

    # Create a return value
    append return_value "There are [llength $employeeInfo2] employees\n"

    # Loop over each record
    foreach record $employeeInfo2 {
        foreach $fieldNames $record {
            append return_value "Employee \#[incr i]: $id
Name: $forenames $surname
Address: $street, $city
Telephone: $phone\n"
        }
    }

    set nameIndex [::tws::nvlist::toName employeeInfo2Fields forenames]

    # Select all of one field
    foreach name [::tws::nvlist::filterIndex employeeInfo2 $nameIndex *] {
        append return_value "Hello $name\n"
    }

    return $return_value
}

The data and operations above were taken from the Tcl 8.5 manpages for
dict. The addItem statements are similar to the [dict set] statements
in the example. The field names are maintained in a separate list, but
can be used in either a foreach or an lassign to pull values out of
the ordered lists. The regular Tcl list commands appear to be much
more efficient than the dict commands, and they probably take up much
less space than the dict structure. I think that [dict] might make it
easier to handle the data in the form of employeeInfo, but
employeeInfo2 is a much more natural form, and doesn't have to repeat
the employeeID for each field. I really wonder why this is used as an
example for [dict] in the manpages?

Kevin Kenny

unread,
Nov 17, 2007, 2:18:54 PM11/17/07
to
Eric Hassold, Twylite, and Wojciech Kocjan all request that rows from
a database query be returned as lists. Let me try to address the
concerns.

(1) Why a separate 'result set' object, as opposed to returning a list
of the resulting rows?

This question is addressed in the TIP, and is quite simple. I have
scripts that do things like flatten multi-gigabyte databases to ASCII
files. These tools can be a wonderful thing for backup, or for data
exchange among databases that use different engines. There is
something to be said for shipping a DVD+R with a Tcl script and an
ASCII file, having the recipient simply execute the script, and having
the database mirrored. In any case, for whatever good or bad reasons,
some people deal with result sets larger than memory. There needs to
be a way to iterate through these a row at a time.

Given the 'result set' interface, it's easy to create the 'list of
rows' representation, and in fact the '$statement execute' and '$db
execute' convenience procedures offer that possibility.

(2) Why choose dictionaries as opposed to lists?

The single reason that lists are undesirable is the possibility of
NULL values within a row. There is no graceful way to represent these.

This is a case that most of the database API's get wrong. For the
most part, they choose a string such as {}, {NULL} or {*null*} to
represent the NULL value. Using {} (the empty string) appears
natural, but I have had many cases where {} means something like
"Joe Blow has no middle name" while NULL means "I don't know Joe's
middle name". There needs to be a way to distinguish the cases.
Using a fixed string like {NULL} is only slightly better. Looking in
my local telephone directory, I find several individuals whose surname
is Null. Even a strange string like {*null*} is questionable, since
it opens another avenue that needs security audit - the possibility
that a malicious user will introduce such a string as content and
cause a failure in the app.
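The dict representation expresses this distinction without any
sentinel string at all. A small sketch (row values invented for
illustration):

```tcl
# Empty string vs. NULL, as the TIP's row dictionaries represent them:
set row1 [dict create forenames Joe middlename ""]  ;# known to have no middle name
set row2 [dict create forenames Joe]                ;# middle name is NULL (unknown)

foreach row [list $row1 $row2] {
    if {[dict exists $row middlename]} {
        puts "middle name: '[dict get $row middlename]'"
    } else {
        puts "middle name unknown (NULL)"
    }
}
```

The empty string survives as an ordinary value, and NULL is simply the
absence of the key - no string is reserved, so no content can collide
with the representation.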

The suggestion of using an invalid UTF-8 sequence or a special object
type to represent NULL is also extremely questionable. For several
reasons, it wreaks havoc with Tcl's value system (for instance,
[string index] will fail because it converts the string to UCS-2
before examining it). For well or ill, "everything is a string" in
the calculus of Tcl values. The NULL value is not any string.

The principal alternative to using dictionaries appears to be to
take the approach that JDBC, for example uses. We could substitute
NULL values with some sort of default: likely the empty string for
character fields and an appropriately formatted zero for numeric
ones. We'd then need to provide something analogous to JDBC's
'isNull' method to inquire of a result set whether a given value is an
actual zero or empty string, or whether it is a NULL. Having to
perform this sort of explicit test seems just as awkward as using
dictionaries.

(3) How can we deal with columns having duplicate names?

Here Eric Hassold raises a valid point. It is possible for a SQL query
to return multiple columns with the same name. I've had the good
fortune to deal with interfaces that will at least rename (x#1, x#2,
etc...) in that case, but I see that neither MySQL nor SQLite offers
that convenience.

My first inclination is to say, "This is a new interface. Since you'll
have to recode things anyway, Don't Do That." I don't know of any SQL
that fails to offer the 'AS' clause to rename a column in a result
set, so it's always at least possible to get distinct names. But I
realize that "Don't Do That" is not a reasonable position to
take. Some queries come from users, and may inadvertently duplicate
names. Some queries don't specify the names (SELECT * FROM a,b WHERE
a.foo=b.foo). And sometimes it's just inconvenient to write out the AS
clauses. The point is conceded.

Nevertheless, we have to reconcile the possibility of duplicate column
names with the possibility of NULLs. Here, I'm really tempted to
place the onus on the database driver to disambiguate the names, so
that dictionaries can still be used. I'd like to get some input from
actual driver implementors - if you don't maintain a database
interface, you probably don't have the necessary experience to comment
- about whether this constraint would be acceptable.
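A driver-side disambiguation pass along the x#1, x#2 lines mentioned
above might look like the sketch below. The proc name and the exact
renaming scheme are illustrative only; the TIP does not (yet) mandate
either.

```tcl
# Rename duplicate column names so each dict key is unique:
# x x sum sum  ->  x x#1 sum sum#1
proc disambiguate {names} {
    set seen [dict create]   ;# name -> number of duplicates seen so far
    set out [list]
    foreach n $names {
        if {[dict exists $seen $n]} {
            set k [dict get $seen $n]
            incr k
            dict set seen $n $k
            lappend out "$n#$k"
        } else {
            dict set seen $n 0
            lappend out $n
        }
    }
    return $out
}

puts [disambiguate {x x sum sum sum}]   ;# x x#1 sum sum#1 sum#2
```

With unique names guaranteed, the dict-per-row representation survives
queries like SELECT 1 AS x, 2 AS x without silently dropping a column.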

tom.rmadilo

unread,
Nov 17, 2007, 2:20:21 PM11/17/07
to
On Nov 17, 7:05 am, Eric Hassold <hass...@evolane.com> wrote:
> Michael Schlenker wrote :
>
> >> And, of course SQL allows multiple result columns to have the same
> >> name, which would cause the dict to create a sub-dict.
> > Which RDBMS supports that abdominal feature? Sounds like a really stupid
> > idea. (i know it allows multiple columns with names like table_1.a and
> > table_2.a, which is sane).

> Of course, explicit column naming with 'AS' would be recommended. But to


> answer your question/comment "which RDBMS supports that abdominal
> feature?, I would say most if not all.


Actually _EVERY_ RDBMS supports this feature. What none of the RDBMSs
support is ordered storage of data on disk (that is why you have
order-by, and should avoid select *). So, if you didn't enforce the
ordering of columns on select, you couldn't SELECT into a table using
a select statement.

tom.rmadilo

unread,
Nov 17, 2007, 2:41:34 PM11/17/07
to
On Nov 17, 11:18 am, Kevin Kenny <kenn...@acm.org> wrote:
> Nevertheless, we have to reconcile the possibility of duplicate column
> names with the possibility of NULLs. Here, I'm really tempted to
> place the onus on the database driver to disambiguate the names, so
> that dictionaries can still be used. I'd like to get some input from
> actual driver implementors - if you don't maintain a database
> interface, you probably don't have the necessary experience to comment
> - about whether this constraint would be acceptable.

NULL values have nothing to do with RDBMSs, at least with the Algebra
of Relations. This algebra is based upon perfect information. NULLs by
definition have no meaning. If the need to handle NULLs is the reason
for using a dict, man, what a bad reason. Even if a dict can
understand NULL values, Tcl doesn't. You still would have to
explicitly test for a NULL value, how can you eliminate this test? To
me it seems much more important to be able to insert a NULL value,
because either the interface or the database itself can substitute in
a default. In addition, assuming that you can use NULLs in a query,
you can specifically tell the database to insert whatever substitute
for NULL you want:

select
    b.*,
    rtrim(to_char(b.content_date, 'Day')) as day,
    to_char(b.content_date, 'Month DD, YYYY') as date,
    case
        when content_template_id is NULL
            then '$default_content_template'
        else
            '$template_root/' || content_template_id
    end as template
from
    cams_content b
where
    b.content_type_id = :content_type_id
and
    b.archive_p = 'f'
$ids_sql
$where_clause_limit
order by
    $sort_by
$limit_clause

This API ends up using arrays, so column ordering isn't possible. The
$vars are safe and :content_type_id is a bind var supplied by the user.
But also a NULL value is handled by substituting in a useful value, as
determined by the immediate circumstance.

Kevin Kenny

unread,
Nov 17, 2007, 3:00:39 PM11/17/07
to
Twylite wrote:

> So here is a list of specific concerns that arise from this
> exercise:
>
> (1) The DBI is complex. The comparative lengths of my example
> functions illustrate that clearly. If ::tcl::db::execute had a
> mechanism to report the ordered list of columns returned this would
> make "simple" development a lot easier (comparable to SQLite). This
> change alone would also void my comments about error handling (at
> least to some degree).

OK, so we start needing something like [$db execute $sql -columns
colsVar etc...]. That's easy enough to add; I can throw that in for the
next round.

> (2) The amount of error handling this DBI requires is significant -
> 4 nested error handlers. Other DBIs require 1 to 3. This level of
> error handling detracts from my task of writing in functionality and
> makes quality assurance more difficult.

If, as you say, the simple change of adding a place for [$db execute]
to return the column list fixes this, I'm not going to address it
for now. Feel free to follow up if you have further issues.

> (3) Taking variables from the current stack frame is a recipe for
> trouble, and makes writing logic like extract_report more complex
> and slower (best case: [dict with ...]; worst case: foreach {name
> value} { set ... }) I would prefer to see execute take a dict; then
> it is at least clear what variables you are (and are not) providing.
> Taking variables from the stack without clearly indicating them _in
> Tcl code_ (as opposed to another language like SQL) is a little too
> much of a DWIM for my liking.

The latest edits to TIP #308 allow for both versions. They are
both needed: fixed transactional queries like

select balance from accounts where account_number = :acct

really benefit from not having to fabricate a dictionary with the
substituents, while, as you observe, ad-hoc queries, often built
on the fly, need the extra isolation that a dictionary provides.

> (4) Returning the columns as a resultset is a real pain. Rather make
> the columns available as
> 4.a) a list of tuples; or
> 4.b) a list of column names, plus a function to get a dict of
> information about the column (given its name)

I'll think about this one a little bit. You're right that it's
most natural to the user to have it as list-of-tuples, and
that's easy to provide. It also seems more natural to the
database implementor to provide it as a result set, because it's
likely the result of a query against a system catalog. Probably
this is a spot where a little more glue is needed, to give both
sides of the interface their natural representation of the data.

> (5) The primitive function to retrieve a row needs to retrieve it as
> a list, IM(NS)HO.

I address this point in a separate message, and still consider it
an open item.

> (6) I am concerned by the number of ensembles that a simple query
> must create (and clean up). This smells like a performance concern,
> especially for a pure-Tcl implementation for a specific database.

Essentially, it's three: the database connection (which likely
persists, and likely has a fixed name), the statement, and the result
set. Given the fact that database drivers are free to overload [$db
execute] (and in so doing, eliminate the need for the statement and
result-set handles), and given that performance-critical applications
are likely to want the explicit statement handles anyway (because of
the gains achievable with prepared statements), do the tweaks to the
'execute' functionality that you discussed above address this concern?

If the concern is instead a general one with the "noun-verb" syntax of
[$handle command], I'd say, "it works for widgets, why shouldn't it
work for database connections?" This style has been around as long as
Tk has. While I agree that it remains controversial, I'd classify
under, "you can't please everyone."

> (7) A SELECT operation is fundamentally different from other
> commands that can be executed. This is clear for example when you
> read the spec for "resultset rows", which does not provide a
> meaningful value in the case of SELECT. How you query a resultset
> is thus dependent on the operation that was requested, which can
> make creating generic DB manipulation functions rather fun.

What's even more fun is that some things do both: Oracle's "UPDATE
... RETURNING ..." is a case in point. In any case, the 'statement'
and 'result set' interfaces proposed in the TIP are rather spare: it's
not immediately obvious to me that having separate 'execute query' and
'execute update' functions will actually simplify things.

Summary:

- [$db execute] needs a place to store the column list, so that
user code doesn't need the result set for that purpose. Will do.
- [$statement execute] and friends all need to take, optionally
a dictionary of substituents. In the TIP already.
- Returning metadata as a result set is unnatural. Conceded.
I'll fix it.

Eric Hassold

unread,
Nov 17, 2007, 3:25:57 PM11/17/07
to
Hi,

Kevin Kenny wrote :

> Eric Hassold, Twylite, and Wojciech Kocjan all request that rows from
> a database query all be returned as lists.

Actually, I was only pointing out a possible flaw, and commenting that
this so-called "abominable" feature is quite usual. I find the dict
approach elegant, but this has to be addressed. On the other hand,
rows as lists plus a method to retrieve null fields is more generic.
So I really have no definitive opinion (except maybe supporting both?)


tom.rmadilo wrote :

>> Of course, explicit column naming with 'AS' would be recommended. But to
>> answer your question/comment "which RDBMS supports that abominable
>> feature?", I would say most if not all.
>
>
> Actually _EVERY_ RDBMS supports this feature. What none of the RDBMSs
> support is ordered storage of data on disk (that is why you have
> order-by, and should avoid select *). So, if you didn't enforce the
> ordering of columns on select, you couldn't SELECT into a table using
> a select statement.
>

I find my way of formulating the answer, without uppercase and
emphasis, gentler and more appropriate for a public forum. And, to be
accurate, I was explicitly commenting on "RDBMSs supporting multiple
columns having the same name" (not on rows being ordered), so "every"
doesn't apply here. See e.g. MS Access:

SELECT 1 AS a, 2 AS a FROM Customers;
ERREUR: Alias de sortie 'a' repliqué

yes, my Access version is localized. Something like:
ERROR: duplicate output alias 'a'


Eric


tun...@yahoo.com

unread,
Nov 17, 2007, 4:20:25 PM11/17/07
to
On Nov 17, 2:18 pm, Kevin Kenny <kenn...@acm.org> wrote:
> Eric Hassold, Twylite, and Wojciech Kocjan all request that rows from
> a database query all be returned as lists. Let me try to address the
> concerns.
>
> (1) Why a separate 'result set' object, as opposed to returning a list
> of the resulting rows?
>
> This question is addressed in the TIP, and is quite simple. I have
> scripts that do things like flatten multi-gigabyte databases to ASCII
> files. These tools can be a wonderful thing for backup, or for data


I do appreciate your initiative on this and wish you good luck.
However, I too have some criticism of the TIP. I would like to be
brief in this message so I will stay within the concerns already
raised.


This is very similar to the argument you mentioned earlier, where
you did not have proper access to the databases, so you resorted to
copying large files around. No, this practice is not wonderful. I
wonder what kind of DBA will let you take a whole dump of the
database and transport it on DVDs or jump drives. (Not one concerned
about data privacy or security, but one more concerned about securing
the database server program itself!) How many times has this been in
the news? I know of several active investigations into such practices
that exposed the confidential data of millions of consumers.

When you do a large select, the database server manages the results
for you (all except sqlite which is admittedly not a database
server). Do not let popular clients like sql*plus fool you just because
they display the whole result on the screen. They are simple,
general-purpose, and rather rudimentary SQL clients. As a client, you can
check if you have results, retrieve some or all of it, or ignore the
whole thing completely and move on to your next query. It is your
choice. Just because the client issues a query to select terabytes
worth of data, it does not follow that it must have that much memory
or it must end up with/consume exactly that much data before doing
anything else. A client must be able to ask for a specific number of
rows (including zero) of that result at a time. Otherwise, the client
is not a client, it is a database server.


Sqlite does give you the whole result. And its API is different on
this point than all the others. I think it is wrong; however
considering its intended use, I see how it was a reasonable
assumption. However, it has caused me, and I am sure many others, to
come up with special mechanisms to deal with this so that the clients
do not get inundated with data they don't really need.


> exchange among databases that use different engines. There is
> something to be said for shipping a DVD+R with a Tcl script and an
> ASCII file, having the recipient simply execute the script, and having
> the database mirrored. In any case, for whatever good or bad reasons,


Not so fast! Apart from security, privacy, etc. concerns, what
happened to the cross references? Or to the foreign keys that must be
maintained internally or with another server altogether? How about
the sequences and auto generated keys? How are you tracking each use
and making sure that newly generated ones retain the structural
aspects among the data? There is a reason databases have formal
backup/recovery procedures.


I think it is important to maintain the distinction between the
database server and the client. The TIP should focus on how to enable
the communication between the two. No more, no less. (well, perhaps
a bit more if you'd like to offer creature comforts.)


On a related note, what is wrong with implementing the ODBC standard
interface? If this TIP can offer a stable tool with the same
functionality as TclODBC or snodbc on multiple platforms, I would be
happy. Many others and newcomers will be happy and feel right at
home. Many applications will not have to be re-written. Etc. etc.

> Given the 'result set' interface, it's easy to create the 'list of
> rows' representation, and in fact the '$statement execute' and '$db
> execute' convenience procedures offer that possibility.


This is the default/only behavior I have seen across the industry and
as you note, others have had the same experience too. Why make it a
second class possibility?

> (2) Why choose dictionaries as opposed to lists?
>
> The single reason that lists are undesirable is the possibility of
> NULL values within a row. There is no graceful way to represent these.
>
> This is a case that most of the database API's get wrong. For the
> most part, they choose a string such as {}, {NULL} or {*null*} to
> represent the NULL value. Using {} (the empty string) appears
> natural, but I have had many cases where {} means something like
> "Joe Blow has no middle name" while NULL means "I don't know Joe's
> middle name". There needs to be a way to distinguish the cases.


You are confusing theory and practice. This issue is fundamental, but
has been dealt with in relational algebra, in predicate calculus, in
logic programming languages, databases, etc. Some do it by ascribing
certain semantics to absence of facts. Some do it by ensuring that no
two null values are ever equal. It seems to me like the TIP is in the
early stages of reinventing the wheel. If you think your wheel is
really different, then you can handle it in your own client, any way
you like. This is purely a client-side issue. The database servers
allow you to do anything you want with no confusion.


Furthermore, a database consists of tables. A table consists of
records or rows. A client asking for a record is valid in expecting
to receive something that resembles a record. A list (of lists) fits
that to a tee. A dict, not so much.
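The contrast under discussion can be sketched in a few lines of Tcl. This is only an illustration of the two row shapes, using made-up data; neither representation here is the TIP's actual API.

```tcl
# Hypothetical row shapes for: SELECT first, middle, last FROM people
# Row as an ordered list: positional access; duplicate column names are harmless.
set rowAsList {Joe {} Blow}
set middle [lindex $rowAsList 1]   ;# "" -- but is it empty, or NULL?

# Row as a dict keyed by column name: a NULL column is simply absent.
set rowAsDict [dict create first Joe last Blow]   ;# middle was NULL
if {[dict exists $rowAsDict middle]} {
    set middle [dict get $rowAsDict middle]
} else {
    set middle "(no middle name on record)"
}
```

The list keeps the record shape; the dict trades that for an unambiguous NULL test.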

> Using a fixed string like {NULL} is only slightly better. Looking in


Again, you realize this is the client side representation of it,
dependent on the language and user preferences, right?

> (3) How can we deal with columns having duplicate names?
>
> Here Eric Hassold raises a valid point. It is possible for a SQL query
> to return multiple columns with the same name. I've had the good
> fortune to deal with interfaces that will at least rename (x#1, x#2,
> etc...) in that case, but I see that neither MySQL or SQLite offers
> that convenience.


It is an essential point. This is handled by the database servers as
you recognize. How/Why could an interface not allow it? I am
baffled.


> that dictionaries can still be used. I'd like to get some input from
> actual driver implementors - if you don't maintain a database
> interface, you probably don't have the necessary experience to comment
> - about whether this constraint would be acceptable.


How about you: have you implemented a driver interface? :-) This seems
like a strong opinion where you insist that you will build it, and
they will come. In any case, I believe I qualify to comment on it and
hope you agree. You know, this reminds me of the type of questions
one gets on, say, Oracle or SQL Server vs. MySQL or Sqlite usenet
groups. Especially, on a place like Oracle, users really know
something about relational databases and it comes across quite
clearly. When you look at, say, MySQL posters, on the other hand, it
seems like they are trying to write SQL after learning PHP the week
before.

Again, as I said earlier, I appreciate your work and wish you good
luck and I hope you recognize the validity of the concerns raised by
the critics based on industry tradition, theory, practice and
experience.

tom.rmadilo

Nov 17, 2007, 4:22:50 PM
On Nov 17, 12:25 pm, Eric Hassold <hass...@evolane.com> wrote:
> tom.rmadilo wrote :

> > Actually _EVERY_ RDBMS supports this feature.

> I find my way to formulate answer, without uppercases and emphasis, more


> gentle and adequate on a public forum.

Sorry about that, I didn't mean it as a scream. There are lots of
things that MS Access doesn't support; hopefully the DBI isn't
designed to make up for missing database features. This is an
interface, not an extension.

> And, to be accurate, I was
> explicitly commenting on "RDBMS supporting multiple columns having same
> name" (not on rows being ordered), so "every" doesn't apply here. See
> e.g. MS Access:
>
> SELECT 1 AS a, 2 AS a FROM Customers;
> ERREUR: Alias de sortie 'a' repliqué

Of course it isn't a question of finding a database which doesn't
support it, most real databases do support it. The reason is simple:
the names of the columns are also in a 'tuple', and they are presented
only once. Database driver vendors don't waste time sending the same
set of names for each row, the rows are an ordered list of values, no
names. Different database vendors will also select default names for
the columns in certain situations, like a function call, or a static
string. These names are not portable, so you can't rely on what name
you will get.

Again, you don't write a DBI to make up for deficiencies of a certain
database, or for how a particular database processes a query.

Eric Hassold

Nov 17, 2007, 6:05:48 PM
tom.rmadilo wrote :

>
> Sorry about that, I didn't mean it as a scream. There are lots of
> things that MS Access doesn't support, hopefully the DBI isn't
> designed to make up for missing database features, this is an
> interface, not an extension.
>
> ....

>
> Of course it isn't a question of finding a database which doesn't
> support it, most real databases do support it. The reason is simple:
> the names of the columns are also in a 'tuple', and they are presented
> only once. Database driver vendors don't waste time sending the same
> set of names for each row, the rows are an ordered list of values, no
> names. Different database vendors will also select default names for
> the columns in certain situations, like a function call, or a static
> string. These names are not portable, so you can't rely on what name
> you will get.
>
> Again, you don't write a DBI to make up for deficiencies of a certain
> database, or for how a particular database processes a query.

I totally agree with you. That's why I answered Michael's post,
pointing out that many not-so-"abominable" RDBMSs may return results
with several columns having the same name, that aggregate functions
will name columns in an undefined way, etc. While the dict suggestion
is elegant in some respects, I also believe that being able to access
rows as an ordered list of fields is the canonical and required way to
go. dict could be provided as an option, at the discretion of the
programmer when he knows each column will be explicitly and uniquely
named. But it seems to me that an ordered list is the only generic way
to return a row, especially in a DBI with many different DB backends.

Eric

Eric Hassold

Nov 17, 2007, 6:24:50 PM
Kevin Kenny wrote:

> (3) How can we deal with columns having duplicate names?
>
> Here Eric Hassold raises a valid point. It is possible for a SQL query
> to return multiple columns with the same name. I've had the good
> fortune to deal with interfaces that will at least rename (x#1, x#2,
> etc...) in that case, but I see that neither MySQL or SQLite offers
> that convenience.
>
> My first inclination is to say, "This is a new interface. Since you'll
> have to recode things anyway, Don't Do That." I don't know of any SQL
> that fails to offer the 'AS' clause to rename a column in a result
> set, so it's always at least possible to get distinct names.

The interface is new, but you can probably expect many users to try to
reuse the SQL they were using with their previous, backend-specific
interface in the new one. Lots of code out there does things like:

SELECT avg(price),min(price),max(price) from orders;

without naming columns explicitly. If you don't provide the standard
mechanism to let the user retrieve the average, minimum and maximum
values as the 1st, 2nd and 3rd fields of the row, respectively, but
instead force them to rewrite their SQL code too, you risk discouraging
them from adopting the new DBI. Once again, I find the dict approach
elegant as an option, but I believe providing access to rows as ordered
sets (i.e. lists) should be preserved as the canonical interface.
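What positional access buys in that example can be sketched as follows. The row value is a stand-in; the point is only that unnamed aggregate columns need no AS clause when rows come back as ordered lists.

```tcl
# Stand-in for one row of: SELECT avg(price),min(price),max(price) FROM orders
# With rows as ordered lists, the three unnamed aggregate columns are
# simply positions 0, 1 and 2 -- no renaming or name mangling required.
set row {12.50 3.99 49.95}
lassign $row average minimum maximum
puts "avg=$average min=$minimum max=$maximum"
```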

Eric

Darren New

Nov 17, 2007, 10:33:21 PM
Donal K. Fellows wrote:
> Single-process
> database interfaces that do not support transaction isolation MAY
> return an error if the caller requests *-autocommit true*; they
> SHOULD accept *-autocommit false* silently.

This seems backwards to me. If you're not capable of bundling five
statements into one transaction, it would seem I want to know that when
I turn off autocommit. Turning on autocommit should be trivial on any
connection that supports transactions: the extension simply adds a
"COMMIT" to the end of every statement. Default to autocommit true, and
throw an error when -autocommit false is set, or you risk really mucking
up your data.

> TRANSACTION ISOLATION
> Both commands MUST take no action if the *-autocommit* option is true.

Should there be something to see if a transaction is outstanding? Should
there be something to return the transaction isolation level (which is
AFAIK almost always chosen from a standard list)?

> particular, if the underlying database engine supports transactions, it
> SHOULD be considered an error to commit any work that remains
> uncommitted on process exit.

This is worded oddly. Who is doing the considering?

> PREPARING STATEMENTS
> ----------------------
> The database interface MUST support substitutions in /SQL-code/. Each
> substitution request has the form /:variableName/.

Personally, I'd use $ instead of :, since everyone using Tcl knows what
that means, and : is part of Tcl variable names. The places I've seen $
used in table names are system tables, or for namespace type things,
just where Tcl uses :. Either SYS\$NAMES for a table with dollar signs
in it, or use the @ so it doesn't conflict with Tcl names?

After all, how would one insert the value of the HOME environment variable?
select * where home=":::env(HOME)"
or something?

> The return value from the *prepare* command is a /statement handle/,
> discussed under "The statement interface" below.

> Likewise, a database handle MUST support the *resultsets* command:

Add in a way to get the list of open database connections?

> Other columns MAY be included in the result sets of *tables* and
> *columns*, and SHALL be added to this document (as optional columns) on
> request from the implementors of database interfaces.

Maybe an INDEXES command too, or some such. Is LONGVARCHAR the same as a
"text blob"?

> * /statementHandle/ *execute*
>
> Executes a statement against a database. Any variable
> substitution present in the SQL that was provided when the
> statement was created MUST be performed at this time, with the
> variable values being obtained from the scope in which the
> *execute* command was evaluated. Any variable that is undefined
> in that scope must be replaced with a /NULL/ value. An array
> variable provided to a substituent MUST result in an error.

It would be nice to see a subcommand that would take an array (or dict,
or keyed list) and use the values at particular indices as values for
the statement. I.e., if myValues is an array, the equivalent of
foreach {name val} [array get myValues] {set $name $val} ;
statementHandle execute
Then values don't have to be in a bunch of individual variables, which
is always kind of nice when you're working with mathematical sets.

This could be done as a procedure, if it could get into the ensemble, of
course.
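A rough pure-Tcl sketch of such a procedure. The proc name and the statement-handle calling convention here are assumptions for illustration, not part of the TIP:

```tcl
# Hypothetical helper: copy each key of a dict into a variable of the
# same name in the caller's scope, then run the statement there so its
# :var substitutions resolve against those values.
proc executeWithDict {statementHandle values} {
    dict for {name val} $values {
        # note: this clobbers any caller variable with the same name
        uplevel 1 [list set $name $val]
    }
    uplevel 1 [list $statementHandle execute]
}
```

Used as `executeWithDict $stmt {custId 42 region west}`, the values never need to live in individual variables at the call site.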

> THE RESULT SET INTERFACE
> --------------------------
>
> Result sets represent the results of operations performed on the
> database. A preferred implementation for large result sets is that they
> be implemented as database cursors, so that it is possible to iterate
> over result sets that will not fit in memory.

It might make sense to let this be controlled somewhere along the way,
such as at the time of the execute. MySQL allows it, because in many of
the physical layout engines, having a cursor open leaves all the rows
locked until you finish with it.

> /Rationale:/ As far as the author of this TIP is aware, no
> existing database API makes use of dictionaries for result sets;

fbsql :-)


> CONVENIENCE PROCEDURES FOR THE PROGRAMMER
> -------------------------------------------

> * *::tcl::db::allrows* /statementHandle/
>
> Runs the given statement, obtaining a result set. Applies the
> *nextrow* command to the result set repeatedly, and constructs a
> Tcl list whose elements are the rows returned. When no rows
> remain, closes the result set and returns the list of rows.

Actually, if you made this specifically an ensemble command, it would
provide an easy way of saying "don't use a cursor that locks the
database for long - suck it up into memory for me quickly."


> * *::tcl::db::execute* /dbHandle/ /SQL-code/ ?/variable/?
> ?/script/?
>
> Prepares a statement with the given SQL code and executes it. If
> a /script/ argument is supplied, retrieves rows from the result
> set one at a time and executes the given script once per row with
> the row dictionary stored in the given variable. If the /script/
> argument is omitted, instead stores in /variable/ a list of all
> the rows returned. If neither /variable/ nor /script/ is present,
> the result set is simply discarded. In any case, the result set
> and the statement are both closed, and the return value from
> *execute* is the number of rows affected by the operation (for
> *INSERT*, *UPDATE* and *DELETE* operations) and may be an empty
> string for *SELECT* operations.

::tcl::db::execute2 dbHandle SQL-code inDict variable script
prepares a statement with the given SQL code, taking the
values of variables to be substituted specifically and only
from the values of corresponding keys in inDict. Otherwise, the
same as execute.
(Bad name, though.)

Otherwise, very nice! I'll go read the rest of the comments now. :-)

--
Darren New / San Diego, CA, USA (PST)
Remember the good old days, when we
used to complain about cryptography
being export-restricted?

Darren New

Nov 17, 2007, 10:39:36 PM
Roy Terry wrote:
> I must second this sentiment. To me dicts
> are convenient but exotic and high overhead.

Or in commands that return a dict, have a
-returnDict varname
or
-returnList varname
or some such?

Darren New

Nov 17, 2007, 10:41:18 PM
tom.rmadilo wrote:
> And, of course SQL allows multiple result columns to have the same
> name, which would cause the dict to create a sub-dict.

Doctor, doctor, it hurts when I do this!

Donal K. Fellows

Nov 18, 2007, 1:38:18 PM
Eric Hassold wrote:
> I just totally agree with you. That's why I answered Michael's post,
> pointing out many not so "abominable" RDBMS may return results with
> several columns having same name, and aggreagate functions will name
> columns in an undefined way, etc... While dict suggestion is elegant
> under some aspect, I also believe being able to access rows as an
> ordered list of fields is the canonical and required way to go. dict
> could be provided as an option, at discretion of the programmer when he
> knows each column will be explictly and uniquely named. But it seems to
> me that ordered list is the only generic way to return a row, especially
> in a DBI with many different DB backends.

That's fine, but you'll have to have some way of defining how to handle
NULL in that case. And no, you'll not persuade me to let NULL go into
the Tcl language.

Donal.

Eric Hassold

Nov 18, 2007, 2:34:52 PM
Donal K. Fellows wrote :

I don't have any reason to try to persuade you, since I don't wish to
have such a NULL go into the Tcl language either, and I hope this DBI
won't require anything but current Tcl features.

I didn't follow the whole thread, but from your initial TIP, I would
think of something like:

* /resultSetHandle/ *next* ?n?
Skip to the nth next row of data from the result set. If n is omitted,
skip to the next row. The return value of /next/ MUST be 1 if a row
exists, and 0 if no rows remain in the result set.

* /resultSetHandle/ *row* ?variableName?
If variableName is omitted, returns the content of the current row as a
list of values. If variableName is given, stores the row in the given
variable in the caller's context, as a dict. ... (as described in the
initial TIP)

* /resultSetHandle/ *null* ?index?
If index is omitted, returns a list of column indices having a NULL
value in the current row. If index is provided, returns true if column
/index/ of the current row is NULL, false otherwise.
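A usage sketch of those three subcommands, with a mock result set standing in for a real driver. The subcommand names come from the proposal just above; everything else (the mock, the data) is invented for illustration:

```tcl
# Mock result set: one row {Joe {} Blow} in which column 1 (middle) is NULL.
set ::rowsLeft 1
proc rs {cmd args} {
    switch -- $cmd {
        next { if {$::rowsLeft > 0} { incr ::rowsLeft -1; return 1 }; return 0 }
        row  { return {Joe {} Blow} }
        null { return {1} }
    }
}

while {[rs next]} {
    set row [rs row]              ;# ordered list of field values
    foreach i [rs null] {         ;# indices of the NULL columns in this row
        puts "column $i is NULL (placeholder value: [lindex $row $i])"
    }
}
```

The row stays a plain list, and NULL-ness is answered by a separate query rather than by a magic value inside the list.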


Just my 2 cents

Eric

-----
Eric Hassold
Evolane - http://www.evolane.com/

Darren New

Nov 18, 2007, 3:15:16 PM
Eric Hassold wrote:
> * /resultSetHandle/ *null* ?index?
> If index is omitted, returns a list of column indices having a NULL
> value in the current row. If index is provided, returns true if column
> /index/ of the current row is NULL, false otherwise.

I would let the index also be the name of a column, personally.

thel...@gmail.com

Nov 18, 2007, 4:13:55 PM

Oratcl currently supports allowing the user to decide if the rows will
be returned as a list, 'or' to choose the results to be stored in a
Tcl hash, with a hash member for each column name.

Some have pointed out that SQL allows you to have multiple columns
with the same name.

Oracle does allow this.

SQL> select u_id, u_id, u_id from uni;

U_ID U_ID U_ID
---------- ---------- ----------
1 1 1
2 2 2
1 1 1
2 2 2
1 1 1
2 2 2

6 rows selected.

And my Oratcl extension using Tcl hashes (arrays) would break with
this kind of query. I never considered this before. Thanks for
bringing it up.

-Todd

Jeff Hobbs

Nov 18, 2007, 5:43:14 PM
Kevin Kenny wrote:
> (2) Why choose dictionaries as opposed to lists?
>
> The single reason that lists are undesirable is the possibility of
> NULL values within a row. There is no graceful way to represent these.
>
> This is a case that most of the database API's get wrong. For the
> most part, they choose a string such as {}, {NULL} or {*null*} to
> represent the NULL value. Using {} (the empty string) appears
> natural, but I have had many cases where {} means something like
> "Joe Blow has no middle name" while NULL means "I don't know Joe's
> middle name". There needs to be a way to distinguish the cases.

Why can there not be a -null $value option? Default it to auto, which
is {} for strings/blobs and 0 for numbers.

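What a -null substitution would look like from the script's side. The option name follows Jeff's suggestion; the command syntax and the marker string are assumptions, not anything specified in the TIP:

```tcl
# Hypothetical: after something like [$db execute $sql -null "*null*"],
# a row whose middle column is NULL would come back as a plain list
# with the chosen marker substituted in place of the NULL:
set row {Joe *null* Blow}
if {[lindex $row 1] eq "*null*"} {
    puts "middle name is NULL, not merely empty"
}
```

The weakness, as the surrounding discussion notes, is that any marker can collide with real data; the option only works when the caller picks a string known not to occur in the column.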
I do agree that dicts can represent null existence properly (the query
of existence by dynamic languages is their best handling of "null"), but
it seems decisions are being made to handle 100% of cases at the expense
of 80+% of "ordinary" database users. (that point applies to list v.
dict / result_obj as well). It seems a small thing to support that
would make most users find things "just work". A similar issue exists
for tktable, which is why it has a -sparsearray option. Having a couple
more options isn't a bad thing ... not having the right defaults for
most users is.

BTW, it is overall well thought out, I'm just arguing on the details.

Jeff

Donal K. Fellows

Nov 18, 2007, 6:50:10 PM
Eric Hassold wrote:
> I don't have any reason to try to persuade you, since I don't wish
> either to have such NULL go into the Tcl language, and hope this DBI
> won't require anything but current Tcl features.

Define "current", and realize that this is planned to be an interface
for 8.5 or 8.6; supporting older versions isn't really on our roadmap.

> I didn't follow the whole thread, but from your initial TIP, I would
> thing of something like:

I see you've not read the TIP. In particular, you've not read who the
author is. I wonder what else you've missed. :-)

Donal.

Eric Hassold

Nov 18, 2007, 8:08:05 PM
Donal K. Fellows wrote :

> Eric Hassold wrote:
>> I don't have any reason to try to persuade you, since I don't wish
>> either to have such NULL go into the Tcl language, and hope this DBI
>> won't require anything but current Tcl features.
>
> Define "current", and realize that this is planned to be an interface
> for 8.5 or 8.6; supporting older versions isn't really on our roadmap.

I had 8.5 in mind when writing "current". I know it isn't strictly
"current" yet, but... after waiting for it so long, I can't help
anticipating, now that it's a matter of weeks.

>
>> I didn't follow the whole thread, but from your initial TIP, I would
>> thing of something like:
>
> I see you've not read the TIP. In particular, you've not read who the
> author is. I wonder what else you've missed. :-)

Probably much. I should stop trying to comment when my brain is in
Sunday mode; my 2-cent contributions aren't even worth 1!

Actually, I read it 3 days ago. Then my brain forgot about the TIP
header and stupidly took the message sender (you), as displayed at the
top of the thread in the Google Groups interface, for the author of the
message content. My apologies, Kevin.

Eric

Kevin Kenny

Nov 18, 2007, 10:38:59 PM
Eric Hassold wrote:
> Actually, I read it 3 days ago. Then, my brain forgot about the TIP
> header, and stupidly considered the message sender (you), as displayed
> at the top of the thread on google groups interface, as the author of
> the message content. My apologize, Kevin.

If you're criticising the specification, I think it's perfectly
all right for you to throw brickbats at Donal instead of me. (No,
that's not true, I deserve all the brickbats that result from this
controversy. Throw the TIP 257 ones Donal's way.)

Twylite

Nov 19, 2007, 2:02:10 AM
On Nov 17, 3:06 pm, Michael Schlenker <schl...@uni-oldenburg.de>
wrote:

> So building the list of lists style from a dict based resultset would
> look like this:
>
> proc lol {resultset nullvalue} {
>     set cols [$resultset columns]
>     set columns [list]
>     set nulldict [dict create]
>     while {[$cols nextrow col]} {
>         set name [dict get $col name]
>         lappend columns $name
>         dict set nulldict $name $nullvalue
>     }
>     set results [list]
>     while {[$resultset nextrow row]} {
>         set rowlist [list]
>         set rvalue [dict merge $nulldict $row]
>         foreach col $columns {
>             lappend rowlist [dict get $rvalue $col]
>         }
>         lappend results $rowlist
>     }
>     return $results
> }

I think this neatly illustrates my performance and complexity
concerns ;)

Here is a copy of my example from above to go from row-as-list to row-
as-dict:

set rowdict [dict create]
foreach name [$resultset columns] \
        value [$resultset nextrow] \
        isnull [$resultset nulls] {
    if { ! $isnull } {
        dict set rowdict $name $value
    }
}

Twylite

Twylite

Nov 19, 2007, 2:33:09 AM
On Nov 17, 9:18 pm, Kevin Kenny <kenn...@acm.org> wrote:
> (1) Why a separate 'result set' object, as opposed to returning a list
> of the resulting rows?

Ick - leave my name out of this one ;)
I support nextrow as the primitive operation with list-of-lists as the
function built on top of it (for exactly this reason).
I questioned representing the resultset as an ensemble vs a data
handle that one can query by passing it to a function (like pgintcl's
pg_result) on performance grounds.

> This question is addressed in the TIP, and is quite simple. I have
> scripts that do things like flatten multi-gigabyte databases to ASCII
> files. These tools can be a wonderful thing for backup, or for data

...


> Given the 'result set' interface, it's easy to create the 'list of
> rows' representation, and in fact the '$statement execute' and '$db
> execute' convenience procedures offer that possibility.

Agreed completely.

> (2) Why choose dictionaries as opposed to lists?
>
> The single reason that lists are undesirable is the possibility of
> NULL values within a row. There is no graceful way to represent these.
>

> The principal alternative to using dictionaries appears to be to
> take the approach that JDBC, for example uses. We could substitute
> NULL values with some sort of default: likely the empty string for
> character fields and an appropriately formatted zero for numeric
> ones. We'd then need to provide something analogous to JDBC's
> 'isNull' method to inquire of a result set whether a given value is an
> actual zero or empty string, or whether it is a NULL. Having to
> perform this sort of explicit test seems just as awkward as using
> dictionaries.

set rowdict [dict create]
foreach name [$resultset columns] \
        value [$resultset nextrow] \
        isnull [$resultset nulls] {
    if { ! $isnull } {
        dict set rowdict $name $value
    }
}

Compare that to Michael Schlenker's dict-to-list example, then
consider which is awkward.
In this example I'm assuming some dbhandle-wide configurable value to
represent NULL (SQLite's nullvalue).

What's important to me is that we get the primitives right. IMHO it
is better (in ease of Tcl/C development, in performance and in
elegance) to build a dict from a list interface than a list from a
dict interface.

> My first inclination is to say, "This is a new interface. Since you'll
> have to recode things anyway, Don't Do That." I don't know of any SQL
> that fails to offer the 'AS' clause to rename a column in a result
> set, so it's always at least possible to get distinct names. But I
> realize that "Don't Do That" is not a reasonable position to
> take. Some queries come from users, and may inadvertently duplicate

Yay :)

> actual driver implementors - if you don't maintain a database
> interface, you probably don't have the necessary experience to comment
> - about whether this constraint would be acceptable.

Actually the expectations of a SQL developer are quite important
here. When I write a SQL statement "SELECT a AS x, b AS x, c AS
x ..." I expect to get an ordered set of results per row, and I expect
the columns to all be named "x". Doing clever things in the DBI is
not a good idea because it breaks the SQL developer's expectations.
It is not unusual in a report to have several columns entitled
"Quantity" or "Code", and the developer should expect that the DBI
doesn't arbitrarily retitle one of those columns.

An extra 2c
Twylite

Twylite

Nov 19, 2007, 2:59:18 AM
On Nov 17, 10:00 pm, Kevin Kenny <kenn...@acm.org> wrote:
> OK, so we start needing like [$db execute $sql -columns colsVar etc...]
> That's easy enough to add; I can throw that in for the next round.

Sounds about right. I'm thinking that this execute is starting to
take a lot of arguments, and an -option/value approach may feel nicer.
e.g. tcl::db::execute dbHandle -sql required_sql_stmt -params
optional_dict_or_use_locals -nullvalue optional_null_substitute ...
You could also provide -variable and -script, or leave them out to get
a list-of-lists returned ;)

I'll do some thinking about this and propose an uber-function that
will solve world poverty, etc ;)

> > (2) The amount of error handling this DBI requires is significant -

> If, as you say, the simple change of adding a place for [$db execute]
> to return the column list fixes this, I'm not going to address it
> for now. Feel free to follow up if you have further issues.

I think I can live with this ;) I would like to see it addressed
before the TIP is finalised as catch {} has horrible performance
implications.

> > (3) Taking variables from the current stack frame is a recipe for

> The latest edits to TIP #308 allow for both versions. They are

I like this solution - best of both worlds.

> > (4) Returning the columns as a resultset is a real pain. Rather make
> > the columns available as

> I'll think about this one a little bit. You're right that it's
> most natural to the user to have it as list-of-tuples, and
> that's easy to provide. It also seems more natural to the
> database implementor to provide it as a result set, because it's
> likely the result of a query against a system catalog. Probably
> this is a spot where a little more glue is needed, to give both
> sides of the interface their natural representation of the data.

I know I suggested list-of-tuples, but a list-of-names really is the
most common use of this information, so for reasons of (performance,
elegance, etc) I feel that the ordered list of names should be the
primitive. Not sure what to build around that to retrieve the other
information though, will do some thinking.

> > (5) The primitive function to retrieve a row needs to retrieve it as
> > a list, IM(NS)HO.
> I address this point in a separate message, and still consider it
> an open item.

Reply likewise separate.

> > (6) I am concerned by the number of ensembles that a simple query
> > must create(and clean up). This smells like a performance concern,
> > especially for a pure-Tcl implementation for a specific database.
> Essentially, it's three: the database connection (which likely
> persists, and likely has a fixed name), the statement, and the result
> set. Given the fact that database drivers are free to overload [$db
> execute] (and in so doing, eliminate the need for the statement and
> result-set handles), and given that performance-critical applications
> are likely to want the explicit statement handles anyway (because of
> the gains achievable with prepared statements), do the tweaks to the
> 'execute' functionality that you discussed above address this concern?

This is more of a concern about the use of ensembles than of splitting
the process into db -> statement -> resultset (which is, I agree,
completely necessary). Doesn't the use of ensembles/objects have a
far higher performance overhead than returning a data handle which is
queried by a function (like pgintcl's pg_result), especially when
those ensembles are short-lived (in particular the resultsets)?

> If the concern is instead a general one with the "noun-verb" syntax of
> [$handle command], I'd say, "it works for widgets, why shouldn't it
> work for database connections?" This style has been around as long as
> Tk has. While I agree that it remains controversial, I'd classify
> under, "you can't please everyone."

Point conceded. All objects in Tcl (whatever OO system) will work as
noun-verb syntax, so coding and QA approaches will have to deal with
it.

> > (7) A SELECT operation is fundamentally different from other

> What's even more fun is that some things do both: Oracle's "UPDATE
> ... RETURNING ..." is a case in point. In any case, the 'statement'
> and 'result set' interfaces proposed in the TIP are rather spare: it's
> not immediately obvious to me that having separate 'execute query' and
> 'execute update' functions will actually simplify things.

Erk! I knew there was a reason I avoided Oracle ;) Okay, sounds like
having a separate "query" doesn't buy anything.

> Summary:

- [$db execute] needs an option to store a column list, and could be
more generic.
- Column names should be directly available from the resultset as a
list.
- Concern about performance implications of representing resultsets as
ensembles/objects; both because of ensemble create/teardown overhead,
and additional error handling levels & complexity.

Regards,
Twylite

Twylite

unread,
Nov 19, 2007, 3:08:16 AM11/19/07
to
On Nov 18, 5:33 am, Darren New <d...@san.rr.com> wrote:
> Personally, I'd use $ instead of :, since everyone using Tcl knows what
> that means, and : is part of Tcl variable names. The places I've seen $
> used in table names are system tables, or for namespace type things,
> just where Tcl uses :. Either SYS\$NAMES for a table with dollar signs
> in it, or use the @ so it doesn't conflict with Tcl names?

Argh! Please no! This puts you straight into quoting hell,
_especially_ if you have to build your SQL query dynamically.

This character needs to be a reserved character in SQL if at all
possible, and/or a quasi-standard in DBIs (which KevinK seems to
indicate that it is in the case of a colon).

Regards,
Twylite


Twylite

unread,
Nov 19, 2007, 3:17:31 AM11/19/07
to
On Nov 19, 5:38 am, Kevin Kenny <kenn...@acm.org> wrote:
> that's not true, I deserve all the brickbats that result from this
> controversy. Throw the TIP 257 ones Donal's way.)

I've used ducktape to fix a whole can of whupass onto the end of a
brickbat for TIP 257 ;)

On this topic - I've brought up a couple of performance considerations
in which I have assumed that ensemble creation/teardown is
(significantly) slower than passing around a data reference. I wonder
if DKF could weigh in on this, with his knowledge of the Tcl core
and his TclOO implementation?

Also does anyone know the performance implications of multiple
embedded catch {} statements versus the trick that I used with a
finally script:
set finally_script {}
catch {
    do_x
    lappend finally_script [list some cleanup code]
    ...
}
foreach script $finally_script {
    eval $script
}

One catch for n-depth error handling, but multiple evals at the end
(which can't be byte code compiled if I'm not mistaken).

Regards,
Twylite

Twylite

unread,
Nov 19, 2007, 4:26:41 AM11/19/07
to
FWIW here is a sample implementation of tcl::db::execute. It uses an
option/value style for parameters, assumes a resultset interface
slightly different to the TIP (see below) and has not been tested ;)

Regards,
Twylite

# UTILITY FUNCTION: dictx pop dictVar name ?default?
# Finds name in the dict and returns the corresponding value, then removes
# the name/value pair from the dict. If name does not exist then the
# default is returned (if present), or an error is thrown.

# This example code uses the resultset interface proposed by Eric Hassold:
# * /resultSetHandle/ *next* ?n?
# * /resultSetHandle/ *row* ?variableName?
# * /resultSetHandle/ *null* ?index?
# The columns subcommand is also different to the TIP:
# * /resultSetHandle/ *columns* ?name?
# If name is omitted, returns an ordered list of columns in the resultset.
# If name is present, returns a dict of information about the column. The
# dict MUST include the key "name" and SHOULD include the keys "type",
# "scale" and "precision". Other keys MAY be included. Throws an error
# if the named column is not defined in the resultset.
# If the resultset does not include row/column information (such as the
# result of an INSERT, DELETE or UPDATE) then columns MUST return an
# empty list.


# Prepares and executes a statement with the given SQL code.
# Returns the number of rows affected or retrieved by the operation.
# Options:
# -sql stmt : (required) SQL statement to execute
# -params dict : (optional) parameters to the SQL statement, as a dict.
#   If not present and the statement has parameters the substituents
#   will be obtained from variables in the current scope.
# -columns varname : (optional) if present the indicated variable will be
#   populated with an ordered list of column names before the result rows
#   are processed.
# -listvar varname : (optional) variable to be populated with an ordered
#   list of fields in the result row.
#   (Alternative name: -fieldlist)
# -dictvar varname : (optional) variable to be populated with a dict of
#   fields (column name / value) in the result row.
#   (Alternative name: -fielddict)
# -script script : (optional) if present the results will be retrieved
#   one row at a time; for each row the listvar and/or dictvar will be
#   populated then the script will be executed. If absent _or empty_
#   then all rows are retrieved and will be stored into listvar as a
#   list-of-lists and/or into dictvar as a list-of-dicts.
proc ::tcl::db::execute {dbhandle args} {

    # Sanity check
    if { [info commands $dbhandle] eq {} } {
        error "can not find database instance \"$dbhandle\""
    }

    # Parse parameters
    set sql [dictx pop args "-sql"]
    set params [dictx pop args "-params" {}]
    set columnsVar [dictx pop args "-columns" {}]
    set listVar [dictx pop args "-listvar" {}]
    set dictVar [dictx pop args "-dictvar" {}]
    set script [dictx pop args "-script" {}]
    if { [dict size $args] > 0 } {
        error "unknown option \"[lindex $args 0]\""
    }

    # Error handling structure to ensure cleanup of resources
    set finally_scripts [list]
    set failed [catch {

        # Prepare the statement
        set stmthandle [$dbhandle prepare $sql]
        lappend finally_scripts [list $stmthandle close]

        # Execute the statement (in the caller's scope in case we need
        # substituents)
        set resultset [uplevel 1 [list $stmthandle execute {*}$params]]
        lappend finally_scripts [list $resultset close]

        # Get the result of the execution
        set rows [$resultset rows]

        # Provide column information if requested
        if { $columnsVar ne {} } {
            upvar $columnsVar columns
            set columns [$resultset columns]
            # WARNING: I am assuming that [$resultset columns] behaves
            # rationally for SQL statements that don't return rows
            # (INSERT, DELETE, UPDATE). "Rationally" probably involves
            # returning an empty list, not throwing an error.
        }

        # Before retrieving the rows link requested output variables to
        # this scope
        if { $listVar ne {} } {
            upvar $listVar fields
            set resulttablel [list] ;# only used if -script is empty
        }
        set dvref {}
        if { $dictVar ne {} } {
            upvar $dictVar fieldsdict
            set dvref fieldsdict
            set resulttabled [list] ;# only used if -script is empty
        }

        # Retrieve rows
        while { [$resultset next] != 0 } {

            set fields [$resultset row {*}$dvref]
            # Alternative: put the "set columns" above outside the if {}
            # block, and build the dvref dict if desired using
            # "foreach name $columns value $fields { ... }"

            if { $script ne {} } {
                # Execute the script in the caller's scope
                uplevel 1 $script
                # CAUTION: script could RETURN, ERROR, BREAK, CONTINUE,
                # etc. Allowing BREAK is probably a good thing, but
                # RETURN may get weird?
                continue
            }

            # No script - build the result table(s) instead
            if { $listVar ne {} } {
                lappend resulttablel $fields
            }
            if { $dictVar ne {} } {
                lappend resulttabled $fieldsdict
            }

        }

        # All rows retrieved successfully; if there was no script we must
        # get the result tables back to the caller
        if { $script eq {} } {
            if { $listVar ne {} } {
                set fields $resulttablel
            }
            if { $dictVar ne {} } {
                set fieldsdict $resulttabled
            }
        }

    } e errorOpts]

    # Run finally scripts
    foreach script $finally_scripts {
        catch $script
    }

    # Fail if there were errors
    if { $failed } {
        return -options $errorOpts "sql execute failed: $e"
    }

    # Return the number of rows affected, as reported by the DBI
    return $rows
}
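For what it's worth, a call to the sketch above might look like this (the handle, the "users" table and its columns are all invented for illustration; nothing here is TIP wording):

```tcl
# Hypothetical usage of the ::tcl::db::execute sketch above.  $db, the
# table and the column names are made up purely for illustration.
set n [::tcl::db::execute $db \
    -sql {SELECT id, name FROM users WHERE age > :minage} \
    -params {minage 21} \
    -columns cols \
    -dictvar row \
    -script {
        puts "[dict get $row id]: [dict get $row name]"
    }]
puts "$n rows processed; columns were: $cols"
```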

Donal K. Fellows

unread,
Nov 19, 2007, 5:13:16 AM11/19/07
to
Twylite wrote:
> On this topic - I've brought up a couple of performance considerations
> in which I have assumed that ensemble creation/teardown is
> (significantly) slower than passing around a data reference. I wonder
> if DKF could weight in one this, with his knowledge of the Tcl core
> and his TclOO implementation?

Ensemble setup/teardown is going to continue to be comparatively
expensive as it is aimed foremost at dealing with the static case
(FYI, 8.5b3 will support bytecode compilation of some ensembles, which
will make setup and teardown very expensive[*] but use otherwise
*very* fast). By contrast, I'm planning to put optimizations in for
TclOO to make creating objects of a class much cheaper in the normal
case (i.e. where no per-object configuration is done with
[oo::define]). Not done yet though; there's only so many hours in the
day. :-)

Donal.
[* Sufficiently expensive that it's a feature we won't be turning on
in most ensembles. ]

Donal K. Fellows

unread,
Nov 19, 2007, 5:25:03 AM11/19/07
to
Stephen wrote:
> How would one create a pool of database handles which could be shared
> between interps in different threads?

That's the sort of feature that I would view as being outside the scope
of the *standard* database interface. It's also another good reason for
connection creation to be outside the scope of the spec. Think of it
like this; we're talking about how you handle one of the things out of
the pool, not how you work with the pool itself.

Donal.

kie...@dunelm.org.uk

unread,
Nov 19, 2007, 7:38:42 AM11/19/07
to
Hello,

The standard database interface proposed in TIP #308 looks good,
just one small point ...

The proprietary Tcl <> IBM Informix library I use has a feature
which does not seem to be supported by the proposed interface.

The missing feature is the ability to get the value of the last
serial id generated by an insert into a table with a serial column.

To see what I mean, suppose I have a table defined as:

CREATE TABLE tFoo ( foo_id SERIAL NOT NULL, x INTEGER NOT NULL);

The foo_id column is a serial column (aka an auto-incrementing
or sequence column). If no foo_id value is specified when inserting
a row, the RDBMS will generate one automatically (typically with a
value one greater than the last value inserted, or 1 for the first
row).

So if the table is brand new, this statement:

INSERT INTO tFoo (x) VALUES (123);

will insert a row with a foo_id value of 1. Executing the statement
again would insert a row with a foo_id value of 2.

In some RDBMSs, the value generated can be retrieved with:

INSERT INTO tFoo (x) VALUES (123) RETURNING foo_id;

However, in IBM Informix this syntax is not available.

The proprietary interface I use provides an "inf_last_serial" command
which gives the value generated by a statement (if any).

Would it be possible to provide similar functionality in the
standard database interface?

I'll admit that workarounds do exist - e.g. a stored procedure could
be written to perform the insert and return the value using the
(rather nasty) syntax "return DBINFO('sqlca.sqlerrd1');".

Regards,
Kieran Elby

Kevin Kenny

unread,
Nov 19, 2007, 8:46:36 AM11/19/07
to
kie...@dunelm.org.uk wrote:
> The proprietary Tcl <> IBM Informix library I use has a feature
> which does not seem to be supported by the proposed interface.
>
> The missing feature is the ability to get the value of the last
> serial id generated by an insert into a table with a serial column.

The purpose of this specification is to provide a minimal set of
primitives that all the database interfaces must support. If a
particular database requires a non-standard extension to get
something done, it's free to implement additional primitives.
Since auto-serialisation of rows is something where database
APIs vary all over the place, right down to "no support,
get your serial numbers by UPDATE serial_table WHERE table_name
= 'my_table' SET sequence = sequence + 1", it's hard to imagine
managing it in a portable manner.

kie...@dunelm.org.uk

unread,
Nov 19, 2007, 9:26:17 AM11/19/07
to

OK, that's fair enough - I didn't realise until I did a little reading
just now quite how much this varied across different RDBMSs!

Incidentally, I like the "<dbhandle> transaction <script>" syntax,
though I'm less sure about not having commit and rollback primitives
available (though the -autocommit option still mentions them
in the TIP?).

In case it's useful to have another datapoint, the Tcl DB interface I
use (and like!) provides "begin", "commit" and "rollback" primitives.

In terms of TIP 308, "begin" requests "-autocommit false" behaviour
while "commit" and "rollback" restore the default "-autocommit true"
behaviour (in addition to committing/rolling back!).

The "transaction" command could be implemented very easily on top of
them:

proc transaction {db script} {
    $db begin
    # I'm being lazy & not handling continue/break/return codes properly.
    if { [catch {set r [uplevel 1 $script]} msg] } {
        $db rollback
        error $msg
    } else {
        $db commit
        return $r
    }
}
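If the continue/break/return codes do matter, Tcl 8.5's options dict on [catch] makes the non-lazy version almost as short. This is a sketch only, assuming the same begin/commit/rollback primitives as the proc above:

```tcl
# Sketch (Tcl 8.5+): like the proc above, but replays whatever return
# code the script produced - error, return, break or continue - after
# committing or rolling back.  $db begin/commit/rollback are assumed.
proc transaction {db script} {
    $db begin
    set code [catch {uplevel 1 $script} result options]
    if {$code == 1} {
        $db rollback
    } else {
        $db commit
    }
    # return -options re-raises the original return code in the caller
    return -options $options $result
}
```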

Regards,
Kieran Elby

Kevin Kenny

unread,
Nov 19, 2007, 9:34:28 AM11/19/07
to
kie...@dunelm.org.uk wrote:
> Incidentally, I like the "<dbhandle> transaction <script>" syntax,
> though I'm less sure about not having commit and rollback primitives
> available (though the -autocommit option still mentions them
> in the TIP?).

They're going back in. $db transaction {... script ...}
was proposed as a replacement (and will remain as a convenience
procedure!), but doesn't cover all the cases.

Darren New

unread,
Nov 19, 2007, 10:47:05 PM11/19/07
to
Twylite wrote:
> On Nov 18, 5:33 am, Darren New <d...@san.rr.com> wrote:
>> Personally, I'd use $ instead of :, since everyone using Tcl knows what
>> that means, and : is part of Tcl variable names. The places I've seen $
>> used in table names are system tables, or for namespace type things,
>> just where Tcl uses :. Either SYS\$NAMES for a table with dollar signs
>> in it, or use the @ so it doesn't conflict with Tcl names?
>
> Argh! Please no! This puts you straight into quoting hell,
> _especially_ if you have to build your SQL query dynamically.

And using : doesn't, if you want to use namespaced variables?

In any case, having the dict option gets around that problem, so I'm
happy there.

Darren New

unread,
Nov 19, 2007, 10:49:13 PM11/19/07
to
Kevin Kenny wrote:
> Since auto-serialisation of rows is something where database
> APIs vary all over the place, right down to "no support,
> get your serial numbers by UPDATE serial_table WHERE table_name
> = 'my_table' SET sequence = sequence + 1", it's hard to imagine
> managing it in a portable manner.

I think it's a useful enough feature to include. Otherwise, it basically
makes it impossible to usefully get the last sequence number created. If
your DB supports it, it should be presented at the Tcl level in the
standard way. If your DB doesn't support it, it should throw an error.
Just like getting the number of rows affected.

Kevin Kenny

unread,
Nov 19, 2007, 11:21:28 PM11/19/07
to
Darren New wrote:
> I think it's a useful enough feature to include. Otherwise, it basically
> makes it impossible to usefully get the last sequence number created. If
> your DB supports it, it should be presented at the Tcl level in the
> standard way. If your DB doesn't support it, it should throw an error.
> Just like getting the number of rows affected.

Uhm. If you do it in any database that supports it with a SQL
statement, of course it should Just Work. But if your database
needs some sort of magical API call, as Informix apparently does,
how is that to be represented in a portable Tcl interface in a
"standard way?"

Donal K. Fellows

unread,
Nov 20, 2007, 3:55:11 AM11/20/07
to
Darren New wrote:
> And using : doesn't, if you want to use namespaced variables?

Experience (with SQLite) suggests that the lack of namespaced
variables and even array elements is not disastrous. (It seems that
they're difficult to support due to parser ambiguities.)

> In any case, having the dict option gets around that problem, so I'm
> happy there.

Good. :-)

Donal.

Darren New

unread,
Nov 20, 2007, 2:18:39 PM11/20/07
to
Kevin Kenny wrote:
> Uhm. If you do it in any database that supports it with a SQL
> statement, of course it should Just Work. But if your database
> needs some sort of magical API call, as Informix apparently does,
> how is that to be represented in a portable Tcl interface in a
> "standard way?"

Well, the person responsible for writing the connector would have to
provide a routine to do it, yes? I mean, that's what an interface layer
is for. The same way you return the number of rows, which is *two* API
calls in MySql.

I don't understand what the problem is. If your windowing system needs
some sort of magical API call to set the focus to a particular window,
how does that get represented in a portable Tcl interface? Same idea.

--
Darren New / San Diego, CA, USA (PST)

It's not feature creep if you put it
at the end and adjust the release date.

Michael Schlenker

unread,
Nov 20, 2007, 3:56:21 PM11/20/07
to
Darren New schrieb:

> Kevin Kenny wrote:
>> Uhm. If you do it in any database that supports it with a SQL
>> statement, of course it should Just Work. But if your database
>> needs some sort of magical API call, as Informix apparently does,
>> how is that to be represented in a portable Tcl interface in a
>> "standard way?"
>
> Well, the person responsible for writing the connector would have to
> provide a routine to do it, yes? I mean, that's what an interface layer
> is for. The same way you return the number of rows, which is *two* API
> calls in MySql.
>
> I don't understand what the problem is. If your windowing system needs
> some sort of magical API call to set the focus to a particular window,
> how does that get represented in a portable Tcl interface? Same idea.
>
The problem is the amount of magic you want to have in the interface.

E.g. next the oracle users cry and want support for some feature (e.g.
async queries), then postgresql users want support for LISTEN and so
on..., and some point down the road you find a feature thats nearly
impossible to do. (oracles DESCRIBE is one of those, try to get the type
information for a query that does not return any rows from Microsoft SQL
Server or SQLite...)

Last serial is a common thing, so support would be nice to have for sure.

Michael

Donal K. Fellows

unread,
Nov 21, 2007, 4:09:20 AM11/21/07
to
Michael Schlenker wrote:
> Last serial is a common thing, so support would be nice to have for
> sure.

So maybe it's something to put in the standard API? Indeed, that's why
we have these sorts of discussions.

Donal.

tom.rmadilo

unread,
Nov 21, 2007, 10:24:48 AM11/21/07
to
On Nov 21, 1:09 am, "Donal K. Fellows"

Can anyone clarify the intent of this TIP? Is this API going to become
part of Tcl Core? And if so, are there any analogous APIs of similar
scope in Tcl Core?

I'm asking because there seems to be a major difference between
language elements ([expr], [string], [regexp], [dict]) and
application elements, which I can't find any examples of in Tcl/Tk
Core, with the exception of tclsh and wish.

Also, there were hints at some needed changes to Tcl Core to pull this
off, what are these changes, and are they going to receive a separate
TIP?

Finally, although only the Tcl scripting API is shown, it seems that
you would need Tcl C level support of the same features. Actually, C
level APIs are usually more complete, or fine-grained. Are there any
thoughts on this issue?

Darren New

unread,
Nov 21, 2007, 11:16:48 AM11/21/07
to
Michael Schlenker wrote:
> E.g. next the oracle users cry and want support for some feature (e.g.
> async queries), then postgresql users want support for LISTEN and so
> on...,

Obviously, if you can't support LISTEN then you can't support it. That
doesn't mean you shouldn't define the interface for when you *do*
support it.

I mean, there are primitive DB engines out there that don't support
returning rows one at a time either. I wouldn't want to leave out
support for that to accommodate brain-dead back ends either, when it's
so easy to emulate.

> Last serial is a common thing, so support would be nice to have for sure.

Right. I don't see any downside to saying "if your database connector
supports feature X, this is the syntax to use it from Tcl, and the
connector will translate it as appropriate."

We may need to figure out how to specify it. Perhaps you need to set a
flag in advance when you prepare the statement that says you'll want the
last ID back, in case you have to do something like "INSERT ...
RETURNING ..." or some such, rather than just having it at the low-level
interface like MySql presents it.

Darren New

unread,
Nov 21, 2007, 11:18:29 AM11/21/07
to
tom.rmadilo wrote:
> Also, there were hints at some needed changes to Tcl Core to pull this
> off, what are these changes, and are they going to receive a separate
> TIP?

I would think that async queries are likely to be the only part that
*really* needs to be in the core. Given how many DB APIs are already
out there as loadable modules, I can't imagine what core change you'd
need, other than a new channel type if you wanted asynchronous
non-blocking database queries. (And I'd still like to see *that* happen!)

tom.rmadilo

unread,
Nov 21, 2007, 12:51:05 PM11/21/07
to
On Nov 21, 8:18 am, Darren New <d...@san.rr.com> wrote:
> tom.rmadilo wrote:
> > Also, there were hints at some needed changes to Tcl Core to pull this
> > off, what are these changes, and are they going to receive a separate
> > TIP?
>
> I would think that async queries are likely to be the only part that
> *really* needs to be in the core. Given how many DB APIs are already
> out there as loadable modules, I can't imagine what core change you'd
> need, other than a new channel type if you wanted asynchronous
> non-blocking database queries. (And I'd still like to see *that* happen!)

I was thinking that the communication layer would be way below this
API. Shouldn't async be part of channel code, and isn't it already
there? The newest SQL standard SQL-2003 or ISO/IEC 9075, places
communication between the client and server as unspecified. But the
client in this case is the driver, the top layer is either the SQL-
agent, or even above that. In this scheme, there is a potential async
behavior because the SQL-agent communicates with the SQL-
implementation via a number of work/diagnostic areas which are
established and maintained by the SQL-client. The SQL-agent can be
relatively dumb, but the number of generic APIs would have to be
increased, with each API being much simpler.

I also noticed that in ISO/IEC 9075, they define how to present NULL
values (as parameters) to the underlying SQL-client. The method seems
pretty simple, and has other uses:

An indicator parameter is an integer host parameter that is specified
immediately following another host parameter. Its primary use is to
indicate whether the value that the other host parameter assumes or
supplies is a null value.

The other use for indicator parameters is to indicate whether string
data truncation occurred during a transfer between a host program and
an SQL-implementation in host parameters or host variables. If a
non-null string value is transferred and the length of the target is
sufficient to accept the entire source value, then the indicator
parameter or variable is set to 0 (zero) to indicate that truncation
did not occur. However, if the length of the target is insufficient,
the indicator parameter or variable is set to the length (in
characters or octets, as appropriate) of the source value to indicate
that truncation occurred and to indicate original length in characters
or octets, as appropriate, of the source.

There are other meta-data parameters that can be used for things like
getting an offset into a very long character or binary field. The
indicator parameter might serve as the main switch for processing long
data fields (both input and output). This is also a nice thing to
build into a top layer API: silent truncation of data does happen, or
it could throw an error and abort an entire operation.

After considering this, I thought of dicts. What if each value itself
was a dict, instead of each row? This would not preclude the use of a
dict to represent a row, because you could also optionally add in
additional data to each value, like the ordinal position in the result
set. If each value was a dict, then this type of structure would
represent a very portable method of constructing, sharing and saving
values. It would also more correctly reflect the information
associated with each value in a database or query.
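A value-as-dict scheme like that is easy to mock up at the script level (the key names here are invented purely to illustrate the shape):

```tcl
# Illustration only: each cell is a dict carrying its own metadata (name,
# null flag, ordinal position); a row is an ordered list of such dicts.
# The key names are my invention for this sketch.
set row {
    {name foo_id value 42 null 0 ordinal 1}
    {name note   value {} null 1 ordinal 2}
}
foreach cell $row {
    if {[dict get $cell null]} {
        puts "[dict get $cell name] = NULL"
    } else {
        puts "[dict get $cell name] = [dict get $cell value]"
    }
}
```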

Donal K. Fellows

unread,
Nov 23, 2007, 5:40:26 AM11/23/07
to
tom.rmadilo wrote:
> Can anyone clarify the intent of this TIP? Is this API going to become
> part of Tcl Core? And if so, are there any analogous APIs of similar
> scope in Tcl Core?

This TIP is something of a new departure in that it is not intended as
something to change the core. (OK, might eventually end up with some of
the support code going in the core, but that's a separate matter.)

The aim is to define a script-level interface that makes it easy for the
majority of people to write SQL queries that are resistant to security
howlers like injection attacks, and which can be transferred (or rather
the experience transferred) from one database system to another.

It's not about GUIs. It's not about exposing every feature (adding extra
methods for system-specific stuff to the ensembles is Not A Big Deal).
Nor is it about the core magically gaining the ability to talk to every
single kind of DB! It's about making DBs easier to use for the majority;
about encouraging the spread of best practices.

It'll likely involve work by DBI authors. Sorry about that! :-)

The vision is that I, as a database *user*, will be able to do this:

# This part lies outside the TDBC spec
package require foobardb
foobardb dbcon -server db.foobar.com

# This part lies inside the TDBC spec
dbcon execute {SELECT x,y FROM foo WHERE bar LIKE :pattern} v {
    puts "got x=[dict get $v x] and y=[dict get $v y]"
}
dbcon close

OK, it's a silly example. But it emphasizes what we're after;
commonality of user experience across DBs for basic operations.

> Also, there were hints at some needed changes to Tcl Core to pull this
> off, what are these changes, and are they going to receive a separate
> TIP?

I would imagine so.

> Finally, although only the Tcl scripting API is shown, it seems that
> you would need Tcl C level support of the same features. Actually, C
> level APIs are usually more complete, or fine-grained. Are there any
> thoughts on this issue?

I think there will be no C level API for this. I do not see the need.

Donal.

Donal K. Fellows

unread,
Nov 23, 2007, 5:41:31 AM11/23/07
to Darren New
Darren New wrote:
> I would think that async queries are likely to be the only part that
> *really* needs to be in the core. Given how many DB APIs are already
> out there as loadable modules, I can't imagine what core change you'd
> need, other than a new channel type if you wanted asynchronous
> non-blocking database queries. (And I'd still like to see *that* happen!)

Out of scope. Seriously. :-)

Donal.

Donal K. Fellows

unread,
Nov 23, 2007, 5:46:47 AM11/23/07
to
After reading through this thread again, I feel more and more like
there's a severe danger of the valuable parts of having a common DB API
being thrown out. It's time to say to ourselves "gloves!"

http://worsethanfailure.com/Articles/The_Complicator_0x27_s_Gloves.aspx

Donal.

Kevin Kenny

unread,
Nov 23, 2007, 9:42:35 AM11/23/07
to

I'm not *quite* ready to put the gloves on.

There are about half a dozen things that I know as of this moment that
I still need to do to the spec.

(1) The details of transaction handling still aren't quite right.
'-autocommit' is a redundant option. We still need explicit
starttransaction/commit/rollback options (for when your flow
of control comes from the event loop). We need transaction
isolation levels.

(2) Stored procedures (particularly ones with INOUT parameters)
still have to be addressed. Oh yes, and stored procedures
can return REFCURSORS.

(3) Batched updates could be addressed.

(4) Asynchronous queries could be addressed.

(5) Quite against my better judgment, I'm coming to the view that
both dicts and lists are needed for result sets. Dicts have
the advantages that columns can be located by name, and that
NULL has a natural representation. Lists have the advantage
that results with duplicate names, or unnamed results, can
be handled gracefully, and that report writing (with simple
rules like "leave columns containing null values blank")
is more convenient.
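The trade-off in point (5) is easy to see with a concrete row (column names invented for illustration):

```tcl
# Illustration: the same row in both shapes.  In the dict form a NULL is
# naturally representable by omitting the key; the list form needs an
# out-of-band marker (here an empty string, which is ambiguous).
set row_as_list {1 Jones {}}
set row_as_dict {id 1 surname Jones}   ;# "note" column is NULL: key absent
puts [lindex $row_as_list 0]           ;# access by position
puts [dict get $row_as_dict surname]   ;# access by name
puts [dict exists $row_as_dict note]   ;# 0: NULL without a sentinel value
```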

There's been just a week of public discussion; it's not reasonable
to expect that all the details are right in so short a time.

Russell Bell

unread,
Nov 23, 2007, 6:11:11 PM11/23/07
to

Async DB queries are a fairly common requirement. I'd like to
improve our application so long-running SQL queries don't tie up
the GUI, and so we could provide a cancel button in some places.

I maintain fbsql, a MySQL TCL extension. My brother wrote it in 2000.
The library is used extensively in our own application (there's about
4600 SQL queries/operations, spread through 180,000 lines of TCL) so
naturally I'm not about to break compatibility, without good reason.

I do intend to add support for async queries when I get the time.
If the TDBC defined an async API I would probably follow its design.

The fbsql extension is in the public domain. It's MySQL-specific
and likely to remain so. I imagine an implementation of TDBC would be
largely pure TCL, built on top of database-specific C extensions like
fbsql.

Glad to hear row-results as lists will be in TDBC. Generally speaking
the fbsql API takes a pragmatic approach; regular TCL substitution,
lists, default values for NULLs. Prepared statements are a solution
to a problem we don't have (insertion attack) and I like the convenience
of normal TCL substitution, for layering the database-aware code.

Russell

Darren New

unread,
Nov 24, 2007, 11:18:06 AM11/24/07
to
Kevin Kenny wrote:
> (1) The details of transaction handling still aren't quite right.
> '-autocommit' is a redundant option. We still need explicit
> starttransaction/commit/rollback options (for when your flow
> of control comes from the event loop). We need transaction
> isolation levels.

Seconded.

> (2) Stored procedures (particularly ones with INOUT parameters)
> still have to be addressed. Oh yes, and stored procedures
> can return REFCURSORS.

Never used 'em (well, not in Tcl), but I can see the need in a larger
system with multiple applications using the same database.

> (3) Batched updates could be addressed.

Other than transactioned updates, I'm not sure what you mean. Bulk load?

> (4) Asynchronous queries could be addressed.

Could be, but I can see how this can be difficult if the underlying
database API doesn't allow for it. Settle for ensuring thread safety?

> (5) Quite against my better judgment, I'm coming to the view that
> both dicts and lists are needed for result sets. Dicts have
> the advantages that columns can be located by name, and that
> NULL has a natural representation. Lists have the advantage
> that results with duplicate names, or unnamed results, can
> be handled gracefully, and that report writing (with simple
> rules like "leave columns containing null values blank")
> is more convenient.

Most database APIs I've used support both pretty seamlessly. I don't see
a need to not do so.

Just my $0.02.

Darren New

Nov 24, 2007, 11:21:08 AM
Russell Bell wrote:
> I maintain fbsql, a MySQL TCL extension.

For the record, I've been using this. It's nicely straightforward, and
it's easy to see just by examining the code that it isn't broken. :-)

But unless MySQL's DB layer supports non-blocking calls, I don't see it
happening in a convenient event-driven way.

I.e., I'd love to see asynch db access, while recognising that it's
really, really difficult to do in general. :-)

thel...@gmail.com

Nov 24, 2007, 6:38:33 PM

So, since we are in the TIP process and as such can propose and
implement anything, I'm curious about the following idea.

As mentioned, OraTcl can return rows as either a list of values or
store them directly into a Tcl array. The second option (the array)
is faster and more efficient, because the list has to be built, whereas
the array elements just get set to existing Tcl_Objs built from the
database return values (which come back as individual columns).

Either way, I've been looking at this for many years, and the
discussion here is about whether tuples need to be represented as a
list, a list of lists, a hash of lists, a list of dicts, or maybe even
a dict of dicts.

My $0.02 is that we are addressing the problem from the top down and
not the bottom up. Tcl doesn't have a standard DBI interface because
nobody has decided (mandated) what a data value looks like. In my
view, Tcl needs a new Tcl_Obj type called a 'column' that has
properties the current Tcl_Objs do not provide. In this case a
result tuple would be a list of 'columns', not a list of 'dicts'.
Mapping a database object, which can have about 300 different types
including user-defined types, into a language that treats all data as a
string has always had its advantages and its disadvantages. One of
these is trying to determine the difference between a NULL and an
empty string. A new Tcl_Obj could address many of these deficiencies
by adding additional methods to the object. Note also that there is
very little difference between a result column, a bind column, or a
stored procedure column.
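Until any such Tcl_Obj type exists in C, the idea can be prototyped at script level; the names below are hypothetical:

```tcl
# A column modelled as an {isnull value} pair, so NULL and "" differ.
proc column {value {isnull 0}} { list $isnull $value }
proc column-isnull {col}       { lindex $col 0 }
proc column-value  {col} {
    if {[lindex $col 0]} { error "attempt to read a NULL column" }
    return [lindex $col 1]
}

# A result tuple is then a list of columns, not a list of dict entries:
set row [list [column 42] [column {} 1]]
puts [column-isnull [lindex $row 1]]   ;# prints 1: NULL, not just ""
```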

I have some other concerns about certain aspects of this TIP, mostly in
the area of concurrency and consistency: knowing which rows are
committed and which are not. This falls into the realm of database
specifics. In my database of choice, an inserted row exists for the
connected session but is not visible to other sessions until the
commit; within that session, the inserted row is just as valid as any
other row in the table. I know of no way in Oracle to 'detect' which
rows in your table are not yet committed. Maybe those are just items
that are left up to the developer of the glue to figure out.

I would have recoded OraTcl to use ensembles ages ago if it would not
break backwards compatibility.

To handle multi-threading, asynchronous operation, and a few other
non-trivial features, I had to change exactly 2 non-backward-compatible
items between OraTcl 3 and OraTcl 4, and I still get questioned about
why. :)

-Todd

Donal K. Fellows

Nov 25, 2007, 5:49:53 AM
Darren New wrote:
>> (4) Asynchronous queries could be addressed.
>
> Could be, but I can see how this can be difficult if the underlying
> database API doesn't allow for it. Settle for ensuring thread safety?

The reason I'm not keen on requiring support in the base TDBC interface
for asynch queries is that they mandate that the database be remote.
In-process DBs like SQLite cannot support it. Requiring thread safety is
much better. (And having a standard way of writing an asynch query is OK
too, but it comes under the heading of a standard extension to TDBC.)

My $0.04 (£0.02 at current exchange rates! :-))

Donal.
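What such a standard extension might look like at script level can be sketched; the names are hypothetical, and `[after]` stands in for a non-blocking driver call:

```tcl
# A driver supporting async delivery would hand results to the event
# loop; here [after] simulates the server answering 50 ms later, so
# nothing blocks while the "query" runs.
proc async-query {db sql callback} {
    after 50 [list {*}$callback "rows for: $sql"]
}

async-query dbcon "SELECT x FROM foo" {apply {{result} {
    set ::done $result        ;# a GUI would update a widget here
}}}
vwait ::done                  ;# a real app just keeps servicing events
puts $::done
```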

Darren New

Nov 25, 2007, 1:06:02 PM
Donal K. Fellows wrote:
> The reason I'm not keen on requiring support in the base TDBC interface
> for asynch queries is that they mandate that the database be remote.

That's a bit of a surprising statement. They don't really *require* a
remote DB. You just won't get much asynchronicity without. :-)

I mean, there's no reason you couldn't have (say) an in-process database
that scans through a table and fires an event or a callback or something
for each row in the table. Or you could launch a separate (non-Tcl)
thread and pass events back that way. All the updates after a change in
packing in a Tk window are asynchronous, but all of them are
"in-process" as well.

Obviously, if you've chosen a database that doesn't support asynchronous
searching, you won't get much benefit.

But that's academic, really. And probably more work than it's worth.

> In-process DBs like SQLite cannot support it. Requiring thread safety is
> much better. (And having a standard way of writing an asynch query is OK
> too, but it comes under the heading of a standard extension to TDBC.)

OK. I guess what I'm saying is that having a standard mechanism that
says (say) whether the database connector is thread-safe, and having a
standard way of requesting asynchronous queries if it's supported, would
be a good idea. It will encourage people to say "Gee, some people might
want this".

Obviously, whatever you do with your database, you're still going to
have to do performance testing and such, and select your hardware and
underlying DB to support the power you need. Having a standard mechanism
would be more a "here's a good way to do it that works well" than
anything more, and entirely adequate.

Kind of like Ada: You might not be able to hook interrupts with your
compiler, or dynamically load code. But if you can, you know how to do it.

Koen Danckaert

Nov 26, 2007, 6:01:17 AM
On Nov 23, 11:40 am, "Donal K. Fellows" <donal.k.fell...@manchester.ac.uk> wrote:
> The aim is to define a script-level interface that makes it easy for the
> majority of people to write SQL queries that are resistant to security
> howlers like injection attacks, and which can be transferred (or rather
> the experience transferred) from one database system to another.
>
> The vision is that I, as a database *user*, will be able to do this:
>
> # This part lies outside the TDBC spec
> package require foobardb
> foobardb dbcon -server db.foobar.com
>
> # This part lies inside the TDBC spec
> dbcon execute {SELECT x,y FROM foo WHERE bar LIKE :pattern} v {
> puts "got x=[dict get $v x] and y=[dict get $v y]"
> }

But why not just:

dbcon select x,y -from foo -where bar -like $pattern

I realize this is probably a stupid question since I have no
experience with database systems, but I find that "execute
{SELECT ...}" statement really un-tclish... Plus it gives rise to
substitution problems. (The ":pattern" refers to $pattern, or not?)

-- Koen

Twylite

Nov 26, 2007, 3:18:46 PM
On Nov 25, 12:49 pm, "Donal K. Fellows" <donal.k.fell...@manchester.ac.uk> wrote:
> much better. (And having a standard way of writing an asynch query is OK
> too, but it comes under the heading of a standard extension to TDBC.)

Ooh, glad to see I'm not the only one thinking this ;) (see
http://groups.google.com/group/comp.lang.tcl/msg/c6dbe1fe0dad36e7).

Regards,
Twylite

Donal K. Fellows

Dec 3, 2007, 6:08:11 AM
Koen Danckaert wrote:
> But why not just:
> dbcon select x,y -from foo -where bar -like $pattern

That handles only the very simplest cases. Doing anything more complex
(and yes, SQL gets a lot more complex) will be horrible and end up with
us replicating SQL. Embedding the SQL is easier. :-)

Donal.

Kevin Kenny

Dec 3, 2007, 8:58:39 AM
Koen Danckaert wrote:
> But why not just:
>
> dbcon select x,y -from foo -where bar -like $pattern
>
> I realize this is probably a stupid question since I have no
> experience with database systems, but I find that "execute
> {SELECT ...}" statement really un-tclish... Plus it gives rise to
> substitution problems. (The ":pattern" refers to $pattern, or not?)

The principal reason is that SQL is a language unto itself. While
it isn't necessarily all that Tclish, it does a lot. I've been
sticking to simple examples for the purposes of illustration, but
I've got a lot of SQL statements that look more like:

    INSERT INTO transfer_request(reqid, matid, devid,
                                 time_needed, requestor, status)
    SELECT a.reqid, a.matid, a.devid, a.time_needed, :pid, ''
    FROM material_commitment a
    WHERE a.devid = :devid
      AND NOT EXISTS (SELECT 0
                      FROM server b, material_instance c
                      WHERE b.server_type = 'archive'
                        AND b.network_segment = a.network_segment
                        AND c.devid = b.devid
                        AND c.matid = a.matid)
      AND NOT EXISTS (SELECT 0
                      FROM transfer_request d
                      WHERE d.reqid = a.reqid
                        AND d.matid = a.matid
                        AND d.devid = a.devid)

Even that one isn't inordinately complex. Some statements span multiple
pages. Really.

tom.rmadilo

Dec 3, 2007, 12:08:51 PM

Kevin is 100% correct: you can't replace SQL with Tcl code. It is
easier to learn SQL than any programming-language equivalent, if for
no other reason (and there are many) than the fact that you can find
lots of documentation at every level of sophistication online, or at a
library or bookstore. Most frameworks have poor documentation, if any,
and they are always in a state of evolution. Also, it is just as easy
to write your own helper functions to handle the stuff that you find
boring or simple. The interface itself must handle arbitrary SQL
statements.

Now a digression having nothing to do with TIP #308:

Why not use a view, or multiple views? Every select clause above
should be replaced with a view named so that it is more obvious what
you are trying to do. Views are like functions: no sane programmer
would write page after page of a single procedure, but it seems to be
okay to do so with SQL. Another reason to write the SELECT as a view
is that you can then perform the select independently, just to make
sure that it produces what you expect before you start inserting
stuff into another table. Of course not every database supports views,
which is another reason you can't effectively write a Tcl'ish layer
which could avoid the need to support arbitrary query text.

I would also like to note that Kevin's example demonstrates that SQL
queries produce column ordered results, otherwise the insert shown
above wouldn't work.

Donal K. Fellows

Dec 3, 2007, 4:57:50 PM
tom.rmadilo wrote:
> I would also like to note that Kevin's example demonstrates that SQL
> queries produce column ordered results, otherwise the insert shown
> above wouldn't work.

FWIW, a fairly recent change to 8.5 is that its dictionaries are
order-preserving.

Donal.

tom.rmadilo

Dec 3, 2007, 5:39:37 PM
On Dec 3, 1:57 pm, "Donal K. Fellows" wrote:

I've heard that; I wonder why it hasn't made it into the
documentation (8.5b*). One thing I haven't been able to figure out is
how dictionaries represent NULL values. Besides being column ordered,
SQL queries produce values in the correct ordinal position, so each
result 'row' has the same number of columns.

Donal K. Fellows

Dec 3, 2007, 6:13:59 PM
tom.rmadilo wrote:
> I've heard that, I wonder why that hasn't made it into the
> documentation (8.5b*).

The change post-dates b3, the last beta. :-)

> One thing I haven't been able to figure out is
> how dictionaries represent NULL values. Besides being column ordered,
> SQL queries produce values in the correct ordinal position, so each
> result 'row' has the same number of columns.

They don't. NULLs are totally inimical to Tcl's type system, since NULLs
are *not* strings. The best you could hope for (other than the unordered
approach or the use of structured sub-dictionaries[*]) is some sort of
magic value that is set up in the database interface before the query.

Donal.

thel...@gmail.com

Dec 3, 2007, 6:28:22 PM
On Dec 3, 6:13 pm, "Donal K. Fellows" wrote:

And thusly was 'oraconfig nullvalue' born...

    nullvalue    Sets or returns the NULL value behavior. A value of
                 "default" causes orafetch to substitute zeros for NULLs
                 in numeric columns and null strings "{}" for NULLs in
                 character columns. Any other value causes that value to
                 be returned as a string for all NULLs. The default is
                 "default".

Nulls have forever been the bane of database <==> Tcl APIs... The
best suggestion I've heard yet is similar to what the Oracle Call
Interface does: have the DB layer populate a secondary array (or
whatever) with the is_null boolean and then set the returned value to
the empty string. This handles the 'SELECT' case. INSERT/UPDATE/DELETE
handle themselves via the quoting mechanism: INSERT INTO foo VALUES
(null); inserts a NULL value, and INSERT INTO foo VALUES ('null');
inserts a four-letter string with the letters 'n' 'u' 'l' 'l'.
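A minimal script-level sketch of that OCI-style convention; the driver call is faked, and all names are hypothetical:

```tcl
# The driver fills one array with values (NULL -> "") and a parallel
# array with is_null booleans, so "" and NULL stay distinguishable.
proc fakefetch {valuesVar nullsVar} {
    upvar 1 $valuesVar v $nullsVar n
    array set v {name Smith phone {}}
    array set n {name 0     phone 1}
}

fakefetch row isnull
if {$isnull(phone)} {
    puts "phone is NULL, not just empty"
}
```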

-Todd

tom.rmadilo

Dec 3, 2007, 6:58:42 PM
On Dec 3, 3:28 pm, "thelf...@gmail.com" <thelf...@gmail.com> wrote:
> Nulls have forever been the bane of database <==> Tcl APIs... The
> best suggestion I've heard yet is similar to what the Oracle Call
> Interface does: have the DB layer populate a secondary array (or
> whatever) with the is_null boolean and then set the returned value to
> the empty string. This handles the 'SELECT' case. INSERT/UPDATE/DELETE
> handle themselves via the quoting mechanism: INSERT INTO foo VALUES
> (null); inserts a NULL value, and INSERT INTO foo VALUES ('null');
> inserts a four-letter string with the letters 'n' 'u' 'l' 'l'.

This is how I do it in my API, I use setParam to set the value or
setParamNull to indicate a null value. This has the effect of
bypassing the quoting that takes place in setParam. Also, each param
can have a default value, which could be "null". If the default was an
actual string, you would use "'mydefault'".

One thing I noticed is that a dictionary might be a good way to
represent values: similar to what is required in the CLI interfaces,
you use an indicator variable. A dictionary would have entries for the
value, the null indicator, and any other implementation-specific
entries. A very thin layer over this could provide many different
useful transformations of this basic return value.

Btw, if NULLs are represented in a dict simply as a missing name/
value, this will make it very hard to trace errors. Going back to an
earlier example, if a 'select * from table_x' is used, you will not
get an error if a column is dropped or added. But if dropped, a missing
column would no longer indicate NULL.

Michael Schlenker

Dec 3, 2007, 7:01:25 PM
tom.rmadilo wrote:

> Btw, if NULLs are represented in a dict as simply a missing name/
> value, this will make it very hard to trace errors. Going back to an
> earlier example, if a 'select * from table_x' is used, you will not
> get a error if a column is dropped or added. But if dropped, a missing
> column would no longer indicate NULL.

That's one of the reasons most people suggest you don't use 'select *
from' anywhere but instead spell out the columns you want. Then you get
a decent error message from your db.

Michael

Darren New

Dec 3, 2007, 10:53:59 PM
tom.rmadilo wrote:
> I would also like to note that Kevin's example demonstrates that SQL
> queries produce column ordered results,

I don't think anyone disputed that. IIRC, it was more an academic
discussion of whether preserving column order was part of the relational
model or not.

tom.rmadilo

Dec 4, 2007, 1:53:55 AM
On Dec 3, 7:53 pm, Darren New <d...@san.rr.com> wrote:
> tom.rmadilo wrote:
> > I would also like to note that Kevin's example demonstrates that SQL
> > queries produce column ordered results,
>
> I don't think anyone disputed that. IIRC, it was more an academic
> discussion of whether preserving column order was part of the relational
> model or not.

You didn't appear to be interested in an academic discussion, I wrote:

> Maybe even
> write some regular SQL code so that you understand that select
> statements produce ordered fields.

And you responded:

> I've been doing database stuff since oracle cost $100K a seat. I've
> never relied on the order of columns in a result set. The relational
> model doesn't rely on the order of columns in a result set. Every
> tutorial I've ever seen says it's bad form to rely on the order of
> columns in a result set, especially with SELECT *.

Doing SELECT * obviously gives you the current order of the columns.
The order is fixed, until someone does a ddl which may reorder the
columns. Then they again remain fixed. This is a distinct concept from
relying on the order to remain the same as the database structure is
manipulated. But between ddl manipulations, the column order is always
the same. And obviously if you specify columns, you are relying on the
order of the columns in the result set. Your statement above indicates
you think this is bad form.

I've also provided a number of examples where ordinal numbers are used
in queries where the column is a function. SQL does not define names
for these columns. If you couldn't rely on the order of columns, SQL
would not work except for very simple operations. But even very simple
operations rely on this:

INSERT into table_x values (...);

This is a very reliable feature of SQL.

Darren New

Dec 4, 2007, 10:57:44 PM
tom.rmadilo wrote:
> On Dec 3, 7:53 pm, Darren New <d...@san.rr.com> wrote:
>> tom.rmadilo wrote:
>>> I would also like to note that Kevin's example demonstrates that SQL
>>> queries produce column ordered results,
>> I don't think anyone disputed that. IIRC, it was more an academic
>> discussion of whether preserving column order was part of the relational
>> model or not.
>
> You didn't appear to be interested in an academic discussion, I wrote:

Where did I say that SQL doesn't return ordered fields? Nobody disputed
that.

>> Maybe even
>> write some regular SQL code so that you understand that select
>> statements produce ordered fields.
>
> And you responded:
>
>> I've been doing database stuff since oracle cost $100K a seat.

Pointing out that I've written quite a lot of regular SQL code, and that
I understand select statements produce ordered fields.

>> I've
>> never relied on the order of columns in a result set.

A personal style, not a dispute that columns aren't ordered.

>> The relational
>> model doesn't rely on the order of columns in a result set.

What I said above. SQL != relational model.

>> Every
>> tutorial I've ever seen says it's bad form to rely on the order of
>> columns in a result set, especially with SELECT *.

This is true.

> Doing SELECT * obviously gives you the current order of the columns.
> The order is fixed, until someone does a ddl which may reorder the
> columns. Then they again remain fixed.

Correct. So? Where does that disagree with anything I've said?

BTW, it's the fact that the column order *can* change that makes me use
named columns. It's why I use variables like "number_of_users" instead
of "X9" also.

> This is a distinct concept from
> relying on the order to remain the same as the database structure is
> manipulated. But between ddl manipulations, the column order is always
> the same.

I didn't dispute that.

> And obviously if you specify columns, you are relying on the
> order of the columns in the result set.

You're also limiting the ability to pass things through layers of
libraries, to some extent, in the same sense that strong typing without
OO-style classification can limit the usability of libraries.

For example, if every SELECT I do lists all the columns, then I can no
longer add a column to a table, populate it, and have it automatically
show up in reports. The once-and-only-once bit becomes much harder to
maintain, because you have to start querying the database to see what
columns are available if you want to do something generic, instead of
just using "*".

For example, I recently had a routine where I had SQL of the form

    proc fetchrows {tag {tables tagtable} {extra ""}} {
        ...
        set sql "SELECT * FROM $tables WHERE tagtable.tag = '$tag' $extra"
        ...
        return $resultset
    }

This got used in several routines. The $tables and $extra let me join
the tagtable with other tables as needed at the time. At one point, I
needed the "ham" column off one of the other tables in the $tables list,
and I spent a few minutes trying to figure out how to get it, whether it
would be efficient to look it up again, where would be appropriate in
the call chain, until I actually followed the code all the way back to
fetchrows and discovered I'd already fetched the appropriate ham. I
avoided changing a routine used in a number of places, and simply
made use of extra data, which wouldn't have worked had I not used *.

Yeah, I could have passed in the list of columns to return, too, but
why? It's not noticeably more efficient (certainly parsing vs
transmission of the data for one or two rows isn't important, or at
least better not be), and I'd have to test that other pieces of code
that used the same results didn't break.

> Your statement above indicates you think this is bad form.

Bad form, or at least not my style.

> I've also provided a number of examples where ordinal numbers are used
> in queries where the column is a function. SQL does not define names
> for these columns. If you couldn't rely on the order of columns, SQL
> would not work except for very simple operations. But even very simple
> operations rely on this:
>
> INSERT into table_x values (...);
>
> This is a very reliable feature of SQL.

I didn't dispute that. Nor did I suggest that ordinal columns shouldn't
be supported. I merely disputed that named columns should be excluded.

Again, where did I say columns aren't ordered in SQL? I said they're not
ordered in the relational model, and I never relied on them being ordered.

tom.rmadilo

Dec 5, 2007, 2:06:57 AM
On Dec 4, 7:57 pm, Darren New <d...@san.rr.com> wrote:
> Again, where did I say columns aren't ordered in SQL? I said they're not
> ordered in the relational model, and I never relied on them being ordered.
>

Personally, it seems to me after reading over everything that both of
us are more practical than caught up in abstract definitions. Maybe we
have slightly different practicalities, but they seem pretty close
overall.

I like your comments about layers and OO style. I can't figure out
what the deal is with OO when you compare it to SQL. Every query in
SQL creates objects, in a much more dynamic and exact way than any OO
language could ever keep up with.

As far as layers, I don't know if anyone agrees with me or not, but my
concept is that each layer must have the same content. If this isn't
true, if one layer has more or less content than another, it means
that object state is not consistent. In the current situation in Tcl
we write a SQL statement. Instead of a list of columns we can
substitute a *. The database parser replaces the * with the current
list of columns. The content is the same. * means all columns, so if
that is what you want, that is what you should use.

Not all databases support roles which can restrict down to the column
level, but it is an important concept. In this situation a * is
replaced with the columns available to the current role. If you
specified the columns, any mismatch with a user role would lead to an
(unnecessary) error.

Unfortunately OO concepts and procedural languages in general tend to
think of objects as having behavior. If objects have the same content
on every layer, they can't include behaviors, although their data
content determines past, present and future behavior. Essentially
behaviors are external and environmental.

Darren New

Dec 5, 2007, 2:34:42 AM
tom.rmadilo wrote:
> I can't figure out
> what is the deal with OO when you compare it to SQL.

I meant that doing "SELECT *" lets you use essentially the same query
and the same handlers in a library routine regardless of what's in the
table. Just like in an OO collection library, you can do things with
collections regardless of what is in them, in a way you cannot in a less
flexibly-typed language. It was a very loose analogy.

I can pass to a library routine the name of the table whose rows should
go in the HTML template, and let the library do the formatting for me.
If I have to always give an explicit list of columns, then changing
what's in the report means changing both the list of columns in the code
and the template text. That's two places to update to change one thing.
I try to avoid that. If I have to use ordinal column numbers, then the
templates become very difficult to work with, because I have to
substitute in tags based on integers instead of tags based on meaningful
names. And when I change the columns in the database, I have to change
(or at least check) all the templates, including those that don't
reference columns I changed.

> Not all databases support roles which can restrict down to the column
> level, but it is an important concept. In this situation a * is
> replaced with the columns available to the current role. If you
> specified the columns, any mismatch with a user role would lead to an
> (unnecessary) error.

I think whether the error is unnecessary would depend on whether the
column you asked for is unnecessary. If you need that column for the
code to work, and your role denies it, then it's a good thing you get an
error. (Altho I must admit, I've never figured out a good way of
building a UI based on roles - I always wind up duplicating data between
what's permitted in the DB and what's presented in the UI. Hints
appreciated.)

There's a trade-off. If you specify columns, you catch errors earlier,
but things are less flexible. If you don't specify columns, you might
wind up passing around column objects/data/tuples/whatever you call
them that don't have the data you expect if you change the schema after
you write the code.

Since "prepared statements" in Tcl don't give you enough static
checking to verify you got things right at compile time (unlike in some
other languages, where the SQL prepared-statement machinery is actually
built into the syntax), the specification of exactly which columns you
want in what order doesn't seem to be an overall win to me.

USCode

Feb 19, 2008, 1:55:25 PM
Donal K. Fellows wrote:
> TIP #308: TCL DATABASE CONNECTIVITY (TDBC)
Have we received any feedback from D.Richard Hipp regarding this TIP?
DRH, as most folks know, is the author of the incredibly popular
database SQLite and has been a strong proponent of Tcl/Tk for years.
In fact, only the Tcl bindings are included with the core SQLite
package: http://www.tcl.tk/community/tcl2004/Papers/D.RichardHipp/drh.html

It would be interesting to hear what DRH has to say about TDBC.

Thanks!
