[set v ""] = bytearray object?


Tiago Dionizio

Jun 21, 2005, 7:43:55 AM
Hi,

I have been using Tcl8.5 (from CVS) for some time and noticed one
strange behaviour in my scripts.

I build a TclKit application (console version, no Tk), and when
executing the statement [set v ""] the value created (or reused, I
guess) and saved in <v> is a bytearray object.

Now this can break things when using SQLite3 tcl bindings. When binding
variables in my SQL statements, sqlite turns bytearray objects into
BLOB variables and comparing BLOBs and STRINGs in sqlite is always
false (0).

This only happens on some occasions (depending on which Tcl statements
get executed first), but I was still wondering why I get a bytearray
object when what I want is an empty string.

My question to those who are familiar with the internals of Tcl is whether
this is the expected behaviour, or whether it can happen on some occasions
and still be valid.

I hope to get an answer because this is annoying: I can't turn
bytearray objects into string objects explicitly from Tcl scripts, and
some bindings depend on the internal representation to determine what
to do with a value. I have been trying to figure out what is going on for
the past few days.

Regards,
Tiago

Ralf Fassel

Jun 21, 2005, 8:48:16 AM
* "Tiago Dionizio" <tng...@gmail.com>

| This only happens on some occasions (depending on what Tcl
| statements get executed first), but i was still wondering why do i
| get a bytearray object when i what i want is an empty string.

When you
catch {unset v}
set v ""
do you also get ByteArray Objs?

I would expect that
set v ""
append v {}

would turn v into a string Obj, but I haven't checked in the sources.

R'

Tiago Dionizio

Jun 21, 2005, 11:43:45 AM
Ralf Fassel wrote:
> * "Tiago Dionizio" <tng...@gmail.com>
> | This only happens on some occasions (depending on what Tcl
> | statements get executed first), but i was still wondering why do i
> | get a bytearray object when i what i want is an empty string.
>
> When you
> catch {unset v}
> set v ""
> do you also get ByteArray Objs?

Same thing. I believe it has something to do with reusing string literals
from a hash table (from my quick look at Tcl internals), but I'm not sure.

>
> I would expect that
> set v ""
> append v {}
>
> would turn v into a string Obj, but I haven't checked in the sources.

It turns it into a string object, but I'd like to know the exact
behaviour of creating an empty string (as in [set v ""]); I'm not
looking for workarounds unless I really have to.

Thanks for the reply.
Tiago

SM Ryan

Jun 21, 2005, 3:06:55 PM
"Tiago Dionizio" <tng...@gmail.com> wrote:

# Same thing. I believe it has something to reusing string literals on a
# hash table (from my quick look at Tcl internals), but not sure.

You can do something like [string index X 0], but this points to the
fact that the sqlite code is broken if it thinks the current object
type is the sole interpretation of the object. That's why you have to
encode the type in the command name.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
GERBILS
GERBILS
GERBILS

miguel sofer

Jun 21, 2005, 6:48:52 PM
Tiago Dionizio wrote:
> Hi,
>
> I have been using Tcl8.5 (from CVS) for some time and noticed one
> strange behaviour in my scripts.
>
> I build a TclKit application (console version, no Tk) and when
> executing the statement [set v ""] the variable created (or reused i
> guess) and saved in <v> is a bytearray object.
>
> Now this can break things when using SQLite3 tcl bindings. When binding
> variables in my SQL statements, sqlite turns bytearray objects into
> BLOB variables and comparing BLOBs and STRINGs in sqlite is always
> false (0).

Is that so? IMHO, this is a bug in SQLite's Tcl interface: Tcl is
typeless, the internal type of a Tcl_Obj should not be accorded a
meaning it does not have.

> This only happens on some occasions (depending on what Tcl statements
> get executed first), but i was still wondering why do i get a bytearray
> object when i what i want is an empty string.
>
> My question to those who are familiar with the internals of Tcl is if
> this is the expected behaviour, or if this can happen in some occasions
> but still valid?

Literals are shared - ie, separate uses of the same literal address the
same Tcl_Obj. If a previous command shimmered your {} literal to a
bytearray internal representation, it will keep it until it is shimmered
back.
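
A minimal sketch of the effect described above, assuming Tcl 8.6 or later,
where the (unsupported) command ::tcl::unsupported::representation is
available. The helper names repType and demo are made up for illustration,
and the types reported depend on the Tcl patchlevel, so no output is shown:

```tcl
# Sketch only: ::tcl::unsupported::representation is unsupported, and the
# format of its description may change between Tcl releases.
proc repType {value} {
    # The description begins "value is a <type> with a refcount of ...".
    set desc [::tcl::unsupported::representation $value]
    if {[regexp {^value is an? (pure string|\S+)} $desc -> type]} {
        return $type
    }
    return unknown
}

proc demo {} {
    set a ""                  ;# first use of the literal ""
    set before [repType $a]
    binary scan $a c dummy    ;# asks for a bytearray view of that same value
    set b ""                  ;# second use of the literal ""
    # If both uses share one Tcl_Obj, b may now report a bytearray rep.
    return [list $before [repType $b]]
}

puts [demo]
```

Whatever types are printed, [string equal $b ""] is still true: only the
cached representation differs, not the value.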

>
> I hope to get an answer because this is annoying since i can't turn
> bytearray objects into string objects explicitly from Tcl scripts and
> some bindings depend on their internal representation to determine what
> to do with them and have been trying to figure out what is going on for
> the past few days.

One thing you can do that I hope is future proof (independently of
further optimisations in the Tcl core) could be:

# define a variable that holds an empty string forever
set empty {} ;# just once

# shimmer empty to string type before using
string range $empty 0 2;# every time you need an empty
set v $empty

This is not likely to be optimised away anytime soon (famous last words?).

Miguel

Donal K. Fellows

Jun 21, 2005, 7:24:18 PM
Tiago Dionizio wrote:
> Now this can break things when using SQLite3 tcl bindings. When binding
> variables in my SQL statements, sqlite turns bytearray objects into
> BLOB variables and comparing BLOBs and STRINGs in sqlite is always
> false (0).

Yuck.

> I hope to get an answer because this is annoying since i can't turn
> bytearray objects into string objects explicitly from Tcl scripts and
> some bindings depend on their internal representation to determine what
> to do with them and have been trying to figure out what is going on for
> the past few days.

FWIW, this sort of behaviour (looking at the type of representation
cached in a value) is *not* recommended. You now know why! :-)

But you can use a few tricks to generate a clean empty object. The hard
part is that the base substring operations themselves know how to work
on a byte-array; you have to go to extreme lengths to get what you're
really after. The following should work though:

# Virtually any variable will do...
set cleanEmpty [string index [list $::errorCode] -1]

Other alternatives involve doing something with [regexp], etc.

Donal.

Donald Arseneau

Jun 21, 2005, 7:46:45 PM
"Tiago Dionizio" <tng...@gmail.com> writes:

> executing the statement [set v ""] the variable created (or reused i
> guess) and saved in <v> is a bytearray object.

I don't know why, and maybe it should change.

> Now this can break things when using SQLite3 tcl bindings. When binding
> variables in my SQL statements, sqlite turns bytearray objects into
> BLOB variables and comparing BLOBs and STRINGs in sqlite is always
> false (0).

Then this is an error in SQLite. The internal representation
of objects should have no effect on a Tcl program, except for
speed.

--
Donald Arseneau as...@triumf.ca

Tiago Dionizio

Jun 21, 2005, 7:43:36 PM
Donal K. Fellows wrote:

> Tiago Dionizio wrote:
> > I hope to get an answer because this is annoying since i can't turn
> > bytearray objects into string objects explicitly from Tcl scripts and
> > some bindings depend on their internal representation to determine what
> > to do with them and have been trying to figure out what is going on for
> > the past few days.
>
> FWIW, this sort of behaviour (looking at the type of representation
> cached in a value) is *not* recommended. You now know why! :-)

Yup. Learned the hard way! But I'm relieved to know this isn't an issue
with my build, but something that works as expected (and that I wasn't
aware of).

I'm trying to make the SQLite developers aware of this issue and hope it
will be solved soon!


> But you can use a few tricks to generate a clean empty object. The hard
> part is that the base substring operations themselves know how to work
> on a byte-array; you have to go to extreme lengths to get what you're
> really after. The following should work though:
>
> # Virtually any variable will do...
> set cleanEmpty [string index [list $::errorCode] -1]

I sure hope I don't have to go this way, or else it will make SQLite
virtually impossible to use.


Thanks for the responses, things are clear for me now.

Regards,
Tiago

Donald Arseneau

Jun 21, 2005, 8:06:14 PM
miguel sofer <mso...@users.sf.net> writes:

> Literals are shared - ie, separate uses of the same literal address the same
> Tcl_Obj. If a previous command shimmered your {} literal to a bytearray
> internal representation, it will keep it until it is shimmered back.

Oh dear! Does this mean that:

set initial_string ""
set initial_list [list]

(in a proc) shimmers one underlying object? That would be terrible!

--
Donald Arseneau as...@triumf.ca

miguel sofer

Jun 21, 2005, 8:13:59 PM

No; [list] (currently) returns an empty Tcl_Obj that
- is unshared
- does not have a list internal rep

The initial string value "" is (currently) shared with every other
literal empty value (ie, "" or {}) defined in the same interp.

What are shared are literal objects; in some cases, some command return
values (especially 0, 1, {} from bytecoded commands, in Tcl8.5) may be
shared too. This depends on many details, among them the Tcl patchlevel.

In principle, it shouldn't be "terrible": shimmering issues are very
much the central criterion for these internal optimisations. We might
make a mistake every now and then, but we try hard not to do it too often :)

Miguel

d...@hwaci.com

Jun 21, 2005, 8:56:15 PM
SQLite very much needs to look at the internal representation of
Tcl_Objs, especially in the case of ByteArrays (a.k.a. BLOBs).
For example, if I do this:

set fd [open image.gif]
fconfigure $fd -translation binary
set image [read $fd]
close $fd
sqlite3 db image.db
db eval {CREATE TABLE img(x)}
db eval {INSERT INTO img VALUES($image)}

On the last line, I most definitely do *not* want to insert the
text representation of the image. Doing so would break lots and
lots of code. That is just unacceptable.

I do not know the solution to Tiago's problem. But changing
SQLite so that it only uses the string representation of values
is not a reasonable approach.

miguel sofer

Jun 21, 2005, 9:17:28 PM

The real answer is: there is no way to ensure from a script that a
'value with a string representation that is an empty string' is
currently represented by a 'Tcl_Obj with string type'. Workarounds are
all that you can get, and they are likely to depend on the Tcl patchlevel.

Every value in Tcl is conceptually a string. The representation of
values in Tcl_Objs is just a performance optimisation: we cache a
suitable interpretation for the last usage of that string - in the hope
that it will be suitable the next time we use it.
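
A small sketch of that idea: the same value is used as a number, as a list
and as binary data, and its string value never changes. The variable names
are made up for illustration; which representation is cached after each
step is an internal detail and is deliberately not inspected here.

```tcl
set v "42"

set asNumber  [expr {$v + 0}]                        ;# 42
set asList    [llength $v]                           ;# 1 (a one-element list)
set asBytes   [string length [binary format a* $v]]  ;# 2 (two bytes)
set unchanged [string equal $v "42"]                 ;# 1

puts "$asNumber $asList $asBytes $unchanged"
```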

This allows in many cases a shunting of the parse/interpret cycle, and
is one of the reasons for the huge speed improvements of Tcl8 over Tcl7.
The other one is the bytecoding of scripts - which is also caching an
optimised structure suitable for the last usage of the script ... errh,
string.

Any script that changes behaviour according to the internal
representation of some value is actually a bug - either in Tcl (we are
busy trying to remove the few remaining cases) or in some C coded extension.

The C api exposes much more of the internals, and it does give you the
possibility to extract a string rep from any Tcl_Obj if that is what is
needed. C coded extensions should not have any expectations wrt the
internal representation of values they get from Tcl; if they need a
particular type, they should include code to ensure that they get it.

I hope I have addressed your concerns clearly. If not, maybe someone
else can explain better. If not, let's iterate.

Miguel

PS: the C api does allow creating empty values with string internal rep:
Tcl_NewObj() or Tcl_NewStringObj("", 0) will do that.

SM Ryan

Jun 21, 2005, 9:24:13 PM
d...@hwaci.com wrote:
# SQLite very much needs to look at the internal representation of
# Tcl_Objs, especially in the case of ByteArrays (a.k.a. BLOBs).
# For example, if I do this:
#
# set fd [open image.gif]
# fconfigure $fd -translation binary
# set image [read $fd]
# close $fd
# sqlite3 db image.db
# db eval {CREATE TABLE img(x)}
# db eval {INSERT INTO img VALUES($image)}
#
# On the last line, I most definitely do *not* want to insert the
# text representation of the image. Doing so would break lots and
# lots of code. That is just unacceptable.

You cannot expect Tcl to give you the type information like that. You
can do something like
SHOW COLUMNS OF img LIKE 'x'
parse the output, and convert the inserted value into the appropriate
representation.

You can also write formatters and call them explicitly, perhaps
INSERT INTO img VALUES([formatAsBlob $image])

# I do not know the solution to Tiago's problem. But changing
# SQLite so that it only uses the string representation of values
# is not a reasonable approach.

Tcl has inherent limitations.

So basically, you just trace.

miguel sofer

Jun 21, 2005, 9:36:49 PM

(With apologies for the approximate language: I do not know much about
SQLite, nor databases in general. Some vague recollections ...)

How are the field types defined in SQLite? I assume that the
fields/columns are typed. If that is the case, SQLite's Tcl api should
extract the correct value from the Tcl_Obj that is about to be inserted.
In this case, if we are about to insert a value into a BLOB field, the
api should call Tcl_GetByteArrayFromObj before the insertion.

OTOH, if SQLite's fields are typeless or auto-typed (as I might infer
from that piece of code), the api should provide some other way to
signal the fact that this particular piece of data is supposed to be
stored as a BLOB (or an integer, or a double).

Cheers
Miguel

d...@hwaci.com

Jun 21, 2005, 10:00:16 PM
SQL is strongly typed. SQLite used to be typeless like TCL, back
in version 2. But that created too many incompatibilities with
standard SQL, so version 3 of SQLite went to using manifest typing.
Manifest typing still has types, but the type is associated with
the value itself, not with the container that you put it in.
With strong typing, the type is associated with the container.

SQLite supports 5 basic types: NULL, Integer, Float, Text, and
BLOB. The mapping between Integer, Float, and Text in SQLite<->TCL
is not a problem. NULL is a bit trickier but it works - the
details are not relevant to this issue. The problem here is BLOB.

SQLite sorts all BLOBs to be greater than objects of any other
type. BLOBs themselves compare in memcmp() order. NULLs sort
first and are mutually incomparable. Integer and Float sort
in numeric order. Text values are always larger than numeric
values and less than BLOBs. Text sorts according to a collating
function defined by the user. (Chinese users expect different
sort order from Americans, for example.) Text can be stored as
UTF-8, UTF-16be or UTF-16le.

Going from SQLite to TCL is not a problem. SQLite can have multiple
simultaneous representations of the same value, just like TCL. But not
if the value is a BLOB or a NULL. BLOB and NULL cannot coexist with
Text, Int, Real. So it is easy to decide whether one should use
Tcl_NewIntObj(), Tcl_NewDoubleObj(), Tcl_NewStringObj() or
Tcl_NewByteArrayObj().

Going from TCL back to SQLite is more problematic because TCL
does allow a value to have both a BLOB and a TEXT representation
at the same time. When it does, the BLOB representation is chosen.
This is causing problems for Tiago. (I'm very curious to know how
he managed to shimmer a {} constant into a bytearray - but that
is a separate topic...)

One possible solution I am considering is to only interpret TCL
variables as BLOB if they have no string representation. That would
prevent string constants like "" or "abc" from ever going in as BLOBs
because their string representation can never be invalidated (right?).
There is a trick in the SQLite interface that could force the use of the
BLOB representation where it is really needed: represent the variable in
SQL as ":var" instead of as "$var" - with a ":" instead of a "$".
SQLite allows that.
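
The rule under consideration can be written down as a small decision
function. This is only a model of the proposal as stated in this message,
not SQLite's actual code; chooseBinding and its three arguments are
hypothetical names, and NOT_BLOB stands for whichever of Integer, Float or
Text would otherwise apply.

```tcl
# Hypothetical model of the proposed rule; not taken from the SQLite sources.
#   prefix       - "$" or ":" as written in the SQL text
#   isByteArray  - true if the Tcl value currently has a bytearray rep
#   hasStringRep - true if the Tcl value currently has a string rep
proc chooseBinding {prefix isByteArray hasStringRep} {
    if {$prefix eq ":"} {
        return BLOB          ;# caller explicitly asked for a BLOB
    }
    if {$isByteArray && !$hasStringRep} {
        return BLOB          ;# "pure" bytearray, e.g. fresh from [read]
    }
    return NOT_BLOB          ;# literals such as "" always keep a string rep
}

puts [chooseBinding {$} 1 1]   ;# a shimmered "" literal
puts [chooseBinding {$} 1 0]   ;# binary data that was never used as text
puts [chooseBinding {:} 1 1]   ;# explicit request
```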

Additional information:

http://www.sqlite.org/datatype3.html
http://www.sqlite.org/cvstrac/tktview?tn=1287

d...@hwaci.com

Jun 21, 2005, 10:31:41 PM
Let me explain this further. SQLite allows TCL variable
names to be embedded in the middle of SQL statements.
For example:

db eval {INSERT INTO table1 VALUES($a,$b(c),$d)}

Notice the curly-braces. SQLite's parser recognizes
the TCL variables in the middle of the SQL, reaches into
TCL and pulls out the values it needs. This is a *very*
powerful interface that works exceedingly well - far better
than any of the "database APIs" used by other systems
that typically require a small subroutine just to run a
simple SQL statement. It also makes SQLite very fast
since you can do things like this:

foreach x [some command] {
    db eval {INSERT INTO table2 VALUES($x)}
}

In this case, the SQL is compiled just once on the first
time through the loop then the same statement is
reused after replacing the value in the $x variable
on subsequent executions of the loop.

This tight coupling of SQLite and TCL makes the two
very pleasing to use together. Perhaps the TCL interface
on SQLite is bending the extension rules a little bit.
But the result is worth it, in my view. Changing SQLite
to only look at string representations would be a terrible
loss in functionality. We'll find a way to make this
work.

Ramon Ribó

Jun 22, 2005, 4:27:40 AM
d...@hwaci.com wrote in message news:<1119405616....@f14g2000cwb.googlegroups.com>...
...

> SQLite sorts all BLOBs to be greater than objects of any other
> type. BLOBs themselves compare in memcmp() order. NULLs sort
> first and are mutually incomparable. Integer and Float sort
> in numeric order. Text values are always larger than numeric
> values and less than BLOBs. Text sorts according to a collating
> function defined by the user. (Chinese users expect different
> sort order from Americans, for example.) Text can be stored as
> UTF-8, UTF-16be or UTF-16le.

What are the advantages of sorting BLOBs as greater than objects
of any other type? Would it not be possible to use memcmp to compare
a BLOB and a text value?

...

> One possible solution I am considering is to only interpret TCL
> variables as BLOB if they have no string representation. That would
> prevent string constants like "" or "abc" from ever going in as BLOBs
> because their string representation can never be invalidated (right?).
> There is a trick in the SQLite interface that could force the use of the
> BLOB representation where it is really needed: represent the variable in
> SQL as ":var" instead of as "$var" - with a ":" instead of a "$".
> SQLite allows that.

In this case:

set fd [open image.gif]
fconfigure $fd -translation binary
set image [read $fd]
close $fd
sqlite3 db image.db
db eval {CREATE TABLE img(x)}
db eval {INSERT INTO img VALUES($image)}

puts $image


db eval {INSERT INTO img VALUES($image)}

set retList [db eval { SELECT x FROM img where x=$image }]

What would be the length of retList? Would it be 2, as supposed?

Regards,

Ramon Ribó

d...@hwaci.com

Jun 22, 2005, 6:40:09 AM
SQLite can represent text internally as UTF-8 or UTF-16.
It might convert text from one representation to another
without notice. So if you compared BLOBs to Text using
memcmp() you would get different answers at different
times - not a desirable thing. Hence we made BLOBs
compare unequal to TEXT in all cases.

The length of the list that is returned from [db eval {SELECT ...}]
is the number of rows times the number of elements in
each row, as you suspected.

Kevin Kenny

Jun 22, 2005, 8:19:53 AM
d...@hwaci.com wrote:
> SQLite can represent text internally as UTF-8 or UTF-16.
> It might convert text from one representation to another
> without notice. So if you compared BLOBs to Text using
> memcmp() you would get different answers at different
> times - not a desirable thing. Hence we made BLOBs
> compare unequal to TEXT in all cases.

It might be too late to change it now, but you *could*
get consistent comparison semantics by allowing blobs
to be strings, as Tcl does. If Tcl needs to construct
the string representation of a bytearray, it converts it
to UTF-8 (you could use another encoding if you like)
by interpreting each byte of the array as if it were
an ISO8859-1 character.

The drawback of the scheme described is that you could
have a string that's equal to a blob; a blob consisting
of the two bytes 0x41 0x42, and the character string 'AB',
would compare as equal. Since conventional SQL doesn't
allow blobs in string context, or vice versa, anything
coded to conventional SQL rules would never encounter
the situation, and "everything is a string" seems to be
a good principle otherwise.
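
The Tcl side of this can be seen in a short sketch (the variable names are
made up for illustration): a byte array built with [binary format] compares
equal to the string whose characters have the same code points, and a byte
above 0x7F becomes the corresponding ISO8859-1 character.

```tcl
# Two bytes 0x41 0x42, built as binary data.
set blob [binary format H* 4142]
set equalsAB [string equal $blob "AB"]       ;# 1

# One byte 0xE9 is seen as the character U+00E9 (e with acute accent).
set high [binary format H* e9]
set isEacute [string equal $high \u00e9]     ;# 1

# Encoding that character as UTF-8 gives the two bytes C3 A9.
binary scan [encoding convertto utf-8 $high] H* utf8Hex   ;# c3a9

puts "$equalsAB $isEacute $utf8Hex"
```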

I realize that this would probably totally mess up your
B-tree indices, and I unfortunately don't have a really
good answer for deployed systems. Of course, you *could*
have a version number on the database file (or on the
index structure itself?) and keep the old way around for
"bugward compatibility."

Your idea of deciding that a bytearray is a blob if it's
"pure" - that is, if it has no string representation -
is not going to work very well, I'm afraid. It's far
too easy in Tcl for something to acquire a string
representation. Even inadvertently examining the value
inside the TDK debugger will give the thing a string
rep and change its semantics.

Using the internal representation of a Tcl object as
anything but a cache for performance purposes is an
extremely bad idea. Another example that I suspect
may have an impact on your code is that a thing that
looks like an integer may have an internal representation
as a 'double' - for instance:

set i 123
set s [binary format d $i]
# i is now a 'double'

For this reason, [expr] has to be very careful to
*not* use impure doubles that pass Tcl_LooksLikeInt.
There is a TIP (#249) that proposes to change this
behavior by always storing the internal rep in a
canonical form. There is a pitfall for you there,
too, though, because Tcl_GetDoubleFromObj will no
longer guarantee that the internal representation
contains a 'double' - it might be an integer.
Even Tcl_ConvertToType will not be able to make an
unambiguous guarantee, because if there is a path
that allows a double that LooksLikeInt ever to be
generated, all the code everywhere has to check for
it.
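
From the script level the example above stays invisible, which is the
point: whatever representation [binary format d] leaves cached in i, the
value still behaves as the string 123. A minimal sketch, with made-up
variable names:

```tcl
set i 123
set packed [binary format d $i]      ;# asks for a double view of $i

set sameString [string equal $i "123"]   ;# 1
set sum        [expr {$i + 1}]           ;# 124, not 124.0
binary scan $packed d roundTrip          ;# 123.0

puts "$sameString $sum $roundTrip"
```

Only C code that inspects the typePtr of the Tcl_Obj can observe the
difference.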

An obvious solution would be to stop registering
tclIntType and tclDoubleType so that nobody can
convert to them, but that would break code that
inspects internal representations for performance.
Arguably, such code is already broken, but I'm
not willing to make it simply crash.

--
73 de ke9tv/2, Kevin
