Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

python-noob - which container is appropriate for later exporting into mySql + matplotlib ?

65 views
Skip to first unread message

someone

unread,
Apr 10, 2013, 6:06:35 PM4/10/13
to
Hi,

Here's my data:
-----------------------
20130315T071500 39000. 10 26 48000. 1 40
20130315T071501 39000. 10 26 48000. 2 42
20130315T071501 39000. 10 26 47520. 15 69
20130315T071501 39000. 10 26 47160. 1 70
20130315T071501 39000. 10 26 47000. 1 72
20130315T071501 39000. 10 26 47000. 2 81
20130315T071501 39000. 10 26 47000. 6 85
20130315T071501 39000. 10 26 46520. 10 95
20130315T071501 43000. 10 36 46520. 10 95
20130315T071501 43200. 4 43 46520. 10 104
20130315T071501 44040. 1 45 46520. 10 108
20130315T071501 44080. 3 48 46520. 10 109
20130315T071501 44080. 3 48 46520. 11 113
20130315T071501 44080. 3 48 46400. 2 131
20130315T071501 45080. 1 51 46400. 2 145
20130315T071501 45080. 1 51 46200. 1 147
20130315T071501 45080. 1 60 46120. 1 182
20130315T071501 45520. 1 65 46120. 1 225
20130315T071501 45520. 1 73 46120. 2 247
20130315T080000 45760. 1 133 46120. 2 378
20130315T080241 45760. 2 199 46120. 2 453
20130315T080945 45760. 3 217 46120. 2 456
20130315T081103 45760. 3 217 46080. 1 457
20130315T081105 45760. 3 218 46080. 2 458
20130315T081106 45760. 4 222 46080. 2 458
20130315T081107 45800. 1 229 46080. 2 458
20130315T082754 45800. 8 266 46080. 2 514
.....
....
...
etc.
-----------------------

The first column is date + time. I know how to use simple tuples, list's
and dict's. But I don't think it's a good idea to make each line an
element in a list/dict ?

I want to put this table into an appropriate container such that
afterwards I want to:

1) Put the data into a mySql-table
2) Be able to easily plot column 1 vs. either of the other columns using
matplotlib etc...

I consider myself a python-newbie so I'm not that good with
containers... I found something here:
http://docs.python.org/dev/library/collections.html but I'm not really
sure what is a good pythonic way of storing this data?

I also think that maybe I ought to convert the first data+time column
into a single number, because right now it's a string (because there's a
"T" in the middle of column 1)... Letting the date+time-column (=column
1) become entirely numbers, makes it easier to plot in matplotlib
afterwards...

I'm not sure how to store this in an efficient manner... What (=how?)
would you do it?

Thanks...

Cousin Stanley

unread,
Apr 10, 2013, 9:39:07 PM4/10/13
to
someone wrote:

> ....
> I want to put this table into an appropriate container
> such that afterwards I want to:
>
> 1) Put the data into a mySql-table
> 2) Be able to easily plot column 1 vs. either of the other columns
> using matplotlib etc...
> ....

Consider editing your data file
into a csv file named someone.csv ....

20130315T071500,39000.,10,26,48000.,1,40
20130315T071501,39000.,10,26,48000.,2,42
20130315T071501,39000.,10,26,47520.,15,69
20130315T071501,39000.,10,26,47160.,1,70
20130315T071501,39000.,10,26,47000.,1,72
20130315T071501,39000.,10,26,47000.,2,81
20130315T071501,39000.,10,26,47000.,6,85
20130315T071501,39000.,10,26,46520.,10,95
20130315T071501,43000.,10,36,46520.,10,95
20130315T071501,43200.,4,43,46520.,10,104
20130315T071501,44040.,1,45,46520.,10,108
20130315T071501,44080.,3,48,46520.,10,109
20130315T071501,44080.,3,48,46520.,11,113
20130315T071501,44080.,3,48,46400.,2,131
20130315T071501,45080.,1,51,46400.,2,145
20130315T071501,45080.,1,51,46200.,1,147
20130315T071501,45080.,1,60,46120.,1,182
20130315T071501,45520.,1,65,46120.,1,225
20130315T071501,45520.,1,73,46120.,2,247
20130315T080000,45760.,1,133,46120.,2,378
20130315T080241,45760.,2,199,46120.,2,453
20130315T080945,45760.,3,217,46120.,2,456
20130315T081103,45760.,3,217,46080.,1,457
20130315T081105,45760.,3,218,46080.,2,458
20130315T081106,45760.,4,222,46080.,2,458
20130315T081107,45800.,1,229,46080.,2,458
20130315T082754,45800.,8,266,46080.,2,514

# -----------------------------------------------
#
# The csv data can be loaded using the csv module
#
# named tuples might be used
# for convenience to access
# individual columns


#!/usr/bin/env python

import csv

from collections import namedtuple as NT

file_source = open( 'someone.ssv' )

# -------------------> individual column names ---------------

nt = NT( 'csv_data' , 'date time col1 col2 col3 col4 col5 col6' )

list_tuples = [ ]

for this_row in csv.reader( file_source ) :

# unpack the current row

zed , one , two , tre , fur , fiv , six = this_row

# split the date and time

d , t = zed.split( 'T' )

# convert individual columns in row to a named tuple

this_tuple = nt( d ,
t ,
float( one ) ,
int( two ) ,
int( tre ) ,
float( fur ) ,
int( fiv ) ,
int( six ) )

# save the current named tuple into a list

list_tuples.append( this_tuple )

# update_data_base( this_tuple )
# .... or ....
# update_data_base( choose individual columns )


# individual elements of the named tuples
# can be accessed by name
#
# this might be convenient for settup up
# data for plots of diffeent columns


print

for row in list_tuples :

print ' ' , row.date , row.time , row.col1 , row.col3 , row.col4

file_source.close()


--
Stanley C. Kitching
Human Being
Phoenix, Arizona

someone

unread,
Apr 11, 2013, 4:49:59 AM4/11/13
to
On 2013-04-11 03:39, Cousin Stanley wrote:
> for row in list_tuples :
>
> print ' ' , row.date , row.time , row.col1 , row.col3 , row.col4
>
> file_source.close()

Oh, that's great - thank you - I didn't know this named-tuple container
before... I'm still wondering whether or not it's the optimal container
type for me, because I just added a bit of matplotlib-code:

-----------------
#!/usr/bin/env python

import csv
from collections import namedtuple as NT

file_source = open( 'someone.csv' )
# -------------------> individual column names ---------------
nt = NT( 'csv_data' , 'date time col1 col2 col3 col4 col5 col6' )
list_tuples = [ ]
for this_row in csv.reader( file_source ) :
# unpack the current row
zed , one , two , tre , fur , fiv , six = this_row
# split the date and time
d , t = zed.split( 'T' )
# convert individual columns in row to a named tuple
this_tuple = nt( d ,
t ,
float( one ) ,
int( two ) ,
int( tre ) ,
float( fur ) ,
int( fiv ) ,
int( six ) )
# save the current named tuple into a list
list_tuples.append( this_tuple )
# update_data_base( this_tuple )
# .... or ....
# update_data_base( choose individual columns )

file_source.close()
# individual elements of the named tuples
# can be accessed by name
#
# this might be convenient for settup up
# data for plots of diffeent columns

x=[]
y=[]
print
for row in list_tuples :
print ' ' , row.date , row.time , row.col1 , row.col3 , row.col4
x.append(row.col3)
y.append(row.col4)

import matplotlib.pyplot as plt
plt.plot(x,y)
plt.ylabel('some numbers')
plt.show()
-----------------


As you can see, in order for me to make the x- and y-vectors, I need to
make a for-loop to access the individual rows in list_tuples and then I
append to the x- and y- lists...

Is there any clever way of avoiding this for loop, for either this
container or another clever container type?

If there isn't, then this is absolutely also an acceptable/good solution
for me... I also use Matlab and for matrices you can type e.g. plot(
matrix(:,3), matrix(:,4) ) to plot columns 3 against column 4. But
Matlab also has this problem, that it cannot store strings and numbers
in the same matrix - matrices must entirely be numeric, which my data
isn't (due to 1st column)...

Thanks for any input, if someone has any good ideas...



someone

unread,
Apr 11, 2013, 9:38:50 AM4/11/13
to
On 2013-04-11 10:49, someone wrote:
> On 2013-04-11 03:39, Cousin Stanley wrote:

> Is there any clever way of avoiding this for loop, for either this
> container or another clever container type?

Ah, I see - I can also just add a numpy array, i.e:
------------------------------
import matplotlib.pyplot as plt

test=numpy.random.rand(8,2)
new_tuple = nt(d,t, float(one), int(two), int(tre), float(fur),
int(fiv), test)

#new_tuple is now:
#csv_data(date='20130315', time='071500', col1=39000.0, col2=10,
#col3=26, col4=48000.0, col5=1, col6=array([[ 0.77714064, 0.06729907],
# [ 0.20418563, 0.97882722],
# [ 0.39130897, 0.06611205],
# [ 0.94938335, 0.50254674],
# [ 0.82047434, 0.71624034],
# [ 0.66618477, 0.92025612],
# [ 0.2789423 , 0.19212809],
# [ 0.7048946 , 0.79112071]]))

x=new_tuple.col6[:,0]
y=new_tuple.col6[:,1]
plt.plot(x,y)
plt.show()
------------------------

I get it - THANKS!

Cousin Stanley

unread,
Apr 11, 2013, 1:58:26 PM4/11/13
to
someone wrote:

> ....
> I want to put this table into an appropriate container
> such that afterwards I want to:
>
> 1) Put the data into a mySql-table
> ....

You might consider using sqlite3 as a database manager
since it is "batteries included" with python ....

The stand-alone sqlite interpreter can first be used
to create an empty database named some.sql3
and create a table named xdata in that data base ....

sqlite3 some.sql3 '.read xdata_create.sql'

where the file xdata_create.sql contains ....

create table xdata
(
xdate integer ,
xtime integer ,
col1 real ,
col2 integer ,
col3 integer ,
col4 real ,
col5 integer ,
col6 integer
) ;


# -----------------------------------------------------------

The csv data file can then be inserted into the xdata table
in the some.sql3 database via python ....

import sqlite3 as DBM

fs = open( 'some.csv' )

ls = [ ]

dbc = DBN.connect( 'some.sql3' )

cur = dbc.cursor()

sql = 'insert into xdata values( ? , ? , ? , ? , ? , ? , ? , ? )'

for row in fs :

dt, col1, col2, col3, col4,col5, col6 = row.strip().split(',' )

xdate , xtime = dt.split( 'T' )

xtuple = ( xdate, xtime, col1, col2, col3, col4, col5, col6 )

cur.execute( sql , xtuple )

fs.close()

dbc.commit()

dbc.close()


# ----------------------------------------------------------------

# python data selection example
# for column 4 between 8 and 9


import sqlite3 as DBM

fs = open( 'some.csv' )

ls = [ ]

dbc = DBM.connect( 'some.sql3' )

dbc.row_factory = DBM.Row

cur = dbc.cursor()

list_sql = [
'select xtime , col4' ,
'from xdata' ,
'where xtime >= 80000 and xtime <= 90000 ; ' ]

str_sql = '\n'.join( list_sql )

cur.execute( str_sql )

for row in cur :

print row[ 'xtime' ] , row[ 'col4' ]

fs.close()

dbc.close()


# ----------------------------------------------------

You can be creative with the data selections
and pass them off to be plotted as needed ....

If mysql is used instead of sqlite3
you should only have to monkey with
the data type declarations in xdata_create.sql
and the dbc.connect strings in the python code ....

Cousin Stanley

unread,
Apr 11, 2013, 2:44:12 PM4/11/13
to
Cousin Stanley wrote:

> The stand-alone sqlite interpreter can first be used
> to create an empty database named some.sql3
> and create a table named xdata in that data base ....
>
> sqlite3 some.sql3 '.read xdata_create.sql'

This step can also be done in python
without using the stand-alone sqlite interpreter ....


# -----------------------------------------

import sqlite3 as DBM

dbc = DBM.connect( 'some.sql3' )

cur = dbc.cursor()

list_sql = [
'create table if not exists xdata ' ,
'( ' ,
' xdate integer , ' ,
' xtime integer , ' ,
' col1 real , ' ,
' col2 integer , ' ,
' col3 integer , ' ,
' col4 real , ' ,
' col5 integer , ' ,
' col6 integer ' ,
') ; ' ]

str_sql = '\n'.join( list_sql )

cur.execute( str_sql )

dbc.commit()

dbc.close()

someone

unread,
Apr 11, 2013, 3:42:49 PM4/11/13
to
On 04/11/2013 07:58 PM, Cousin Stanley wrote:
> someone wrote:
> You can be creative with the data selections
> and pass them off to be plotted as needed ....
>
> If mysql is used instead of sqlite3
> you should only have to monkey with
> the data type declarations in xdata_create.sql
> and the dbc.connect strings in the python code ....

Uh, thank you very much for providing me with this (+ also the example
in the other post)! Unfortunately, I'm struggling a bit with my code
(I'm making some python-class'es), so it'll take a few days before I
begin on the SQL-stuff...

I'll get back, if the SQL-code you suggested causes any problems - thank
you VERY much for both examples (in both posts)... I'll try it out ASAP,
when I've made my code object-oriented and well-organized :-)

Thanks!

someone

unread,
Apr 12, 2013, 10:03:49 AM4/12/13
to
Oh, thank you very much! Now I understand this (I haven't really worked
much with sql before, so this was/is new to me, thanks!).

> The csv data file can then be inserted into the xdata table
> in the some.sql3 database via python ....
........ and .......
> # python data selection example
> # for column 4 between 8 and 9

I combined both code snippets into:

==============================
#!/usr/bin/python

import sqlite3 as DBM
import ipdb

fs = open( 'some.csv' )
ls = [ ]
dbc = DBM.connect( 'some.sql3' )
cur = dbc.cursor()
if 0:
sql = 'insert into xdata values( ? , ? , ? , ? , ? , ? , ? , ? )'
for row in fs :
dt, col1, col2, col3, col4,col5, col6 = row.strip().split(',' )
xdate , xtime = dt.split( 'T' )
xtuple = ( xdate, xtime, col1, col2, col3, col4, col5, col6 )
cur.execute( sql , xtuple )
dbc.commit()
else:
list_sql = [
'select xtime , col4' ,
'from xdata' ,
'where xtime >= 80000 and xtime <= 90000 ; ' ]
str_sql = '\n'.join( list_sql )
cur.execute( str_sql )
for row in cur :
#ipdb.set_trace()
# I get: TypeError: "tuple indices must be integers, not str"
# "ipdb> row" says: "(80000, 46120.0)"
#print row[ 'xtime' ] , row[ 'col4' ]
print row[0] , row[1]

fs.close()
dbc.close()
==============================

I don't fully understand it yet, but it's nice to see that it works!
Thank you very much for that! Now I'll have to concentrate on figuring
out how/why it works :-)

> You can be creative with the data selections
> and pass them off to be plotted as needed ....

Yes, I understand. Thank you very much. As you can see, on my system I
had to use:

print row[0] , row[1]

instead of:

print row[ 'xtime' ] , row[ 'col4' ]

I'm not sure exactly why - is it because you have another version of
sqlite3 ? This is a bit strange, but anyway I can live with that - at
least for now...

> If mysql is used instead of sqlite3
> you should only have to monkey with
> the data type declarations in xdata_create.sql
> and the dbc.connect strings in the python code ....

Actually I didn't knew anything about this sqlite3 before now. This is
the first time I try it out, so I don't really know what's the
difference between sqlite3 and mysql...

But thank you very much for providing some code I can now study and
learn from !!! Much appreciated....



someone

unread,
Apr 12, 2013, 10:19:06 AM4/12/13
to
On 2013-04-11 20:44, Cousin Stanley wrote:
> Cousin Stanley wrote:
>
>> The stand-alone sqlite interpreter can first be used
>> to create an empty database named some.sql3
>> and create a table named xdata in that data base ....
>>
>> sqlite3 some.sql3 '.read xdata_create.sql'
>
> This step can also be done in python
> without using the stand-alone sqlite interpreter ....

Ah, that's great (and even better so I don't have to create the
xdata_create.sql file) - thank you!

I collected what you wrote and put together this script:

=====================
#!/usr/bin/python

import sqlite3 as DBM
import ipdb

# ls = [ ] # this seems to be un-used ?
dbc = DBM.connect( 'some.sql3' )
cur = dbc.cursor()

fs = open( 'some.csv' )
if 0: # select whether to create new database file or query from it?
if 0: # switch between "create table xdata" and "... if not exists"
list_sql = [
'create table xdata ',
'( ',
' xdate integer , ',
' xtime integer , ',
' col1 real , ',
' col2 integer , ',
' col3 integer , ',
' col4 real , ',
' col5 integer , ',
' col6 integer ',
') ;' ]
else:
list_sql = [
'create table if not exists xdata ' ,
'( ' ,
' xdate integer , ' ,
' xtime integer , ' ,
' col1 real , ' ,
' col2 integer , ' ,
' col3 integer , ' ,
' col4 real , ' ,
' col5 integer , ' ,
' col6 integer ' ,
') ; ' ]
# ---------------------------------------------
str_sql = '\n'.join( list_sql )
cur.execute( str_sql )
# ---------------------------------------------
# Insert data from input file fs ("some.csv")
sql = 'insert into xdata values( ? , ? , ? , ? , ? , ? , ? , ? )'
for row in fs :
dt, col1, col2, col3, col4,col5, col6 = row.strip().split(',' )
xdate , xtime = dt.split( 'T' )
xtuple = ( xdate, xtime, col1, col2, col3, col4, col5, col6 )
cur.execute( sql , xtuple )
dbc.commit()

else:
list_sql = [
'select xtime , col4' ,
'from xdata' ,
'where xtime >= 80000 and xtime <= 81104 ; ' ]
str_sql = '\n'.join( list_sql )
cur.execute( str_sql )
for row in cur :
#ipdb.set_trace()
# I get: TypeError: "tuple indices must be integers, not str"
# "ipdb> row" says: "(80000, 46120.0)"
#print row[ 'xtime' ] , row[ 'col4' ]
print row[0] , row[1]

fs.close()
dbc.close()
=====================

I think I can learn a lot from google, based on this code - I'm very
grateful for your help!

Now I just need to make a nice interface and couple it to matplotlib, so
it's easy to plot - I think there's a good chance that I can come up
with a good solution from here, based on the help I got from you people...

Thanks again!

Cousin Stanley

unread,
Apr 12, 2013, 12:58:15 PM4/12/13
to
someone wrote:

> As you can see, on my system I
> had to use:
>
> print row[0] , row[1]
>
> instead of:
>
> print row[ 'xtime' ] , row[ 'col4' ]
>
> I'm not sure exactly why

The magic there is setting up the row_factory
after the database connection ....

dbc = DBM.connect( 'some.sql3' )

dbc.row_factory = DBM.Row

> I don't really know what's the difference
> between sqlite3 and mysql...

MySQL is used through a client/server system
where the db server is always running
and client processes submit requests to it
in the form of sql statements ....

SQLite is used as a stand-alone single process
with no external server involved ....

Both speak sql but there are some differences
mostly in data base connection strings
and data type declarations ....

Basic sql selection is ....

select these fields
from these files
where these conditions are met

And that part of sql doesn't vary much
among different data base managers ....

someone

unread,
Apr 12, 2013, 4:52:40 PM4/12/13
to
On 04/12/2013 06:58 PM, Cousin Stanley wrote:
> someone wrote:
>
>> As you can see, on my system I
>> had to use:
>>
>> print row[0] , row[1]
>>
>> instead of:
>>
>> print row[ 'xtime' ] , row[ 'col4' ]
>>
>> I'm not sure exactly why
>
> The magic there is setting up the row_factory
> after the database connection ....
>
> dbc = DBM.connect( 'some.sql3' )
>
> dbc.row_factory = DBM.Row

Ah, thanks a lot - now it works! This is much more "user-friendly"...

>> I don't really know what's the difference
>> between sqlite3 and mysql...
>
> MySQL is used through a client/server system
> where the db server is always running
> and client processes submit requests to it
> in the form of sql statements ....
>
> SQLite is used as a stand-alone single process
> with no external server involved ....

Ok, I see... So SQLite is very good for "practicing"... I'll remember
that, thank you.

> Both speak sql but there are some differences
> mostly in data base connection strings
> and data type declarations ....
>
> Basic sql selection is ....
>
> select these fields
> from these files
> where these conditions are met
>
> And that part of sql doesn't vary much
> among different data base managers ....

Great, thank you very much...

Looks like everything is on track now... I just have to sit and play
with it and make a good interface with matplotlib, but I think I should
be able to come up with something nice, based on the help I god in this
thread...

Thanks again... I just love this python language - makes it possible to
do so much, in so little time and without being an expert at all...



Cousin Stanley

unread,
Apr 12, 2013, 7:26:05 PM4/12/13
to
someone wrote:

> ....
> So SQLite is very good for "practicing"
> ....

Yes it is but it is also very good
for much more than just practice ....

Check the wikipedia info ....

http://en.wikipedia.org/wiki/Sqlite

"It is arguably the most widely deployed database engine,
as it is used today by several widespread browsers,
operating systems, and embedded systems, among others"

The firefox browser keeps different sqlite database files
for various uses ....

If you use firefox check its default directory
and you will see several files with .sqlite
file type extensions ....

Under debian debian linux ....

~/.mozilla/firefox/*.default

Many programmers, including pythonistas,
use sqlite for a convenient and persistent
data store where data can be stashed now
and used later in many different ways
through the diversity of sql selections ....


> Thanks again ....

You're welcome ....

> I just love this python language

Me too .... :-)

someone

unread,
Apr 12, 2013, 8:00:40 PM4/12/13
to
On 04/13/2013 01:26 AM, Cousin Stanley wrote:
> someone wrote:
>
>> ....
>> So SQLite is very good for "practicing"
>> ....
>
> Yes it is but it is also very good
> for much more than just practice ....
>
> Check the wikipedia info ....
>
> http://en.wikipedia.org/wiki/Sqlite

Very interesting...

> "It is arguably the most widely deployed database engine,
> as it is used today by several widespread browsers,
> operating systems, and embedded systems, among others"
>
> The firefox browser keeps different sqlite database files
> for various uses ....

I should remember to use this in the future for my small apps...

> If you use firefox check its default directory
> and you will see several files with .sqlite
> file type extensions ....
>
> Under debian debian linux ....
>
> ~/.mozilla/firefox/*.default

You're right:

/home/myUser/.mozilla/firefox/pv079lxv.default/addons.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/chromeappsstore.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/content-prefs.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/cookies.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/downloads.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/extensions.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/formhistory.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/permissions.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/places.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/search.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/signons.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/urlclassifier3.sqlite
/home/myUser/.mozilla/firefox/pv079lxv.default/webappsstore.sqlite

Very interesting, I didn't knew that :-)

> Many programmers, including pythonistas,
> use sqlite for a convenient and persistent
> data store where data can be stashed now
> and used later in many different ways
> through the diversity of sql selections ....

I'll try to do this in the future also... I just have to practice a bit
more with the SQL commands, but now I can create, update, delete, query
and I guess that's the most important things to know :-)

Thanks for your help and for providing interesting background info :-)


Steven D'Aprano

unread,
Apr 12, 2013, 9:44:31 PM4/12/13
to
On Fri, 12 Apr 2013 23:26:05 +0000, Cousin Stanley wrote:

> The firefox browser keeps different sqlite database files for various
> uses ....

Yes, and I *really* wish they wouldn't. It's my number 1 cause of major
problems with Firefox. E.g.

http://kb.mozillazine.org/Bookmarks_history_and_toolbar_buttons_not_working_-_Firefox


Using a database for such lightweight data as bookmarks is, in my
opinion, gross overkill and adds to the complexity of Firefox. More
complexity leads to more bugs, e.g.:

https://bugzilla.mozilla.org/show_bug.cgi?id=465684#c11

https://bugzilla.mozilla.org/show_bug.cgi?id=431558


Please don't use a full-featured database if you don't need the overhead
of ACID compliance. And if you do, well, Sqlite is not fully ACID compliant.



--
Steven

someone

unread,
Apr 13, 2013, 7:08:14 AM4/13/13
to
On 04/13/2013 03:44 AM, Steven D'Aprano wrote:
> On Fri, 12 Apr 2013 23:26:05 +0000, Cousin Stanley wrote:
>
>> The firefox browser keeps different sqlite database files for various
>> uses ....
>
> Yes, and I *really* wish they wouldn't. It's my number 1 cause of major
> problems with Firefox. E.g.
>
> http://kb.mozillazine.org/Bookmarks_history_and_toolbar_buttons_not_working_-_Firefox

Oh, sorry to hear that... Actually I mostly use chromium (but I also
have firefox installed). I just opened a few of my sqlite files from
within sqlite3 - that was very interesting to see the contents of these
files without being anything like a "hacker"... :-)

> Using a database for such lightweight data as bookmarks is, in my
> opinion, gross overkill and adds to the complexity of Firefox. More
> complexity leads to more bugs, e.g.:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=465684#c11
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=431558

On the other hand, I guess it's in the spirit of "open source" that it's
easy for everyone to go in an see what's in the configuration files and
(if one wants) modify and/or make own improvements/programs that tamper
with these sql-files ?

> Please don't use a full-featured database if you don't need the overhead

Ok, you're saying there's overhead I should think of... Most of my
programs are rather small in comparison with commercial programs so I
think I don't have to worry about overhead (I don't have any real
speed-critical applications).

> of ACID compliance. And if you do, well, Sqlite is not fully ACID compliant.

I just had to google what ACID compliance means and accordingly to this:

http://en.wikipedia.org/wiki/SQLite

"SQLite is ACID-compliant and implements most of the SQL standard, using
a dynamically and weakly typed SQL syntax that does not guarantee the
domain integrity."

So you seem to disagree with wikipedia?

I however don't understand what it means "to not guarantee domain
integrity"... As I read this, I get the feeling that sqlite *IS* ACID
compliant (wikipedia however doesn't use the wording: "fully ACID
compliant", maybe this is the culprit) ?



Chris Angelico

unread,
Apr 13, 2013, 7:39:02 AM4/13/13
to pytho...@python.org
On Sat, Apr 13, 2013 at 9:08 PM, someone <news...@gmail.com> wrote:
> I just had to google what ACID compliance means and accordingly to this:
>
> http://en.wikipedia.org/wiki/SQLite
>
> "SQLite is ACID-compliant and implements most of the SQL standard, using a
> dynamically and weakly typed SQL syntax that does not guarantee the domain
> integrity."
>
> So you seem to disagree with wikipedia?

Disagreeing with Wikipedia doesn't mean much, but try this:

http://www.sqlite.org/atomiccommit.html

Note that there's a caveat: You have to tell SQLite to be ACID
compliant, effectively.

ChrisA

someone

unread,
Apr 13, 2013, 9:30:06 AM4/13/13
to
On 04/13/2013 01:39 PM, Chris Angelico wrote:
> On Sat, Apr 13, 2013 at 9:08 PM, someone <news...@gmail.com> wrote:
>> I just had to google what ACID compliance means and accordingly to this:
>>
>> http://en.wikipedia.org/wiki/SQLite
>>
>> "SQLite is ACID-compliant and implements most of the SQL standard, using a
>> dynamically and weakly typed SQL syntax that does not guarantee the domain
>> integrity."
>>
>> So you seem to disagree with wikipedia?
>
> Disagreeing with Wikipedia doesn't mean much, but try this:
>
> http://www.sqlite.org/atomiccommit.html

Ok, thanks - I didn't read it all, but interesting.

> Note that there's a caveat: You have to tell SQLite to be ACID
> compliant, effectively.

So, you're saying to me that by default SQLite isn't ACID compliant, if
I begin to use it in my own small programs?

I don't know so much about it - maybe it's a matter of definition... If
I just google for the 3 words: "sqlite acid compliance" I get:

Hit no. 1 is wikipedia.
Hit no. 3 says: "SQLite is an ACID-compliant embedded relational
database management system"
Hit no. 4 says: "SQLite implements ACID-compliance by way of a
transaction journal"
Hit no. 5 says: "SQLite transactions are fully ACID-compliant, allowing
safe access from.."
Hit no. 6 says: "Techopedia explains SQLite. SQLite is atomicity,
consistency, isolation, durability (ACID) compliant."
Hit no. 7: "Tell me what you know about SQLite, the ACID-compliant
embedded relational"
Hit no. 9: "SQLite is superior to Jet for the major reason that SQLite
is ACID-compliant whereas Jet, unfortunately, isn't..."
Hit no. 10: "SQLite for Linux 3.6.17. An ACID-compliant relational
database management system"

I think maybe being it's a question of definitions, i.e. "well, Sqlite
is not fully ACID compliant" vs. all the google hits that just tells
that sqlite is "ACID compliant"...

Do I understand you correct, that by "You have to tell SQLite to be ACID
compliant, effectively", you're saying that by default SQLite isn't ACID
compliant ?

Next question: Is it something I should worry about in my own programs
(I'm not sure, I'm an SQL noob)... ?

Thanks.

Chris Angelico

unread,
Apr 13, 2013, 10:03:25 AM4/13/13
to pytho...@python.org
On Sat, Apr 13, 2013 at 11:30 PM, someone <news...@gmail.com> wrote:
> On 04/13/2013 01:39 PM, Chris Angelico wrote:
>> Note that there's a caveat: You have to tell SQLite to be ACID
>> compliant, effectively.
>
>
> So, you're saying to me that by default SQLite isn't ACID compliant, if I
> begin to use it in my own small programs?
> ...
> Do I understand you correct, that by "You have to tell SQLite to be ACID
> compliant, effectively", you're saying that by default SQLite isn't ACID
> compliant ?
>

First off: I am NOT inherently familiar with sqlite. I'm more familiar
with PostgreSQL, DB2, and MySQL. I'm also not an expert at database
engine design, so this discussion is from the point of view of an
applications developer who has used databases from his apps.

True ACID compliance demands support at every level:

1) The application has to operate in logical units of work, which -
apart from with DB2 - requires an explicit "BEGIN" query, or
single-statement transactions.

2) The database engine must employ some form of write-ahead log.
Different databases do this somewhat differently (according to the
page I linked to, SQLite does this in reverse, maintaining a log
that's sufficient to *undo* the transaction, while PostgreSQL does
this forwards, maintaining a log that's sufficient to *redo* it as
well - more effort, but it can be used for database replication), but
one way or another, there must be a way to detect half-done
transactions.

3) The operating system and filesystem must support a forced file
synchronization (fsync/fdatasync), so the database engine can wait for
the data to be written to disk.

4) The underlying media (hard disk, SSD, USB stick, etc) must respond
to the fsync call by actually writing the content to persistent
storage before returning.

Failure at any level means the overall system is not ACID compliant.
PostgreSQL has a huge amount of code in it to try to deal with (or at
least recognize) a level-3 failure, but nothing in the database engine
can deal with level 1 or 4 issues.

You'd have to actually test it. The easiest way is to get two
computers, side by side, and run the database engine on one and a
monitor on the other. To test some SSDs at work, I knocked together a
little program that worked somewhat thus:

* Connect to the database over TCP/IP (easy, as we were doing this
with PostgreSQL)
* Create a table with a number of rows with an ID and a counter,
initialized to 0
* Repeatedly, in parallel, perform a transaction:
- Increment the counter on one of the rows (at random)
- Increment a "possible" in-memory counter for that row
- Commit the database transaction
- Increment a "confirmed" in-memory counter for that row
* When an error of "database seems to be down" is detected, wait for
it to come up again, then query the table. The counters must all be at
least their corresponding "possible" value and at most the
"confirmed".

With that running, I simply pulled the plug on the database computer.
With a properly-configured hard disk, every one of the counters was
within its correct range. With a lying SSD, though, they could be
anywhere from "pretty close" (with a low workload - simulated by
having only a single thread doing transactions and having it sleep for
a few ms each iteration) to "pretty appalling" (with a bunch of
threads spinning tightly, keeping the workload high). Once the SSD
starts doing major write reordering, its throughput soars, but at the
cost of trustworthiness.

> Next question: Is it something I should worry about in my own programs (I'm
> not sure, I'm an SQL noob)... ?

Yes, it most certainly is. If you have any data that you care about,
put together some kind of test that will allow you to literally pull
the plug on the database, while still knowing whether or not your
transaction was completed (so you'll most likely need some kind of
"possible" / "confirmed" counter pair as I used above).

ChrisA

Roy Smith

unread,
Apr 13, 2013, 10:36:19 AM4/13/13
to
In article <mailman.551.1365861...@python.org>,
Chris Angelico <ros...@gmail.com> wrote:

> 2) The database engine must employ some form of write-ahead log.
> [...]
> one way or another, there must be a way to detect half-done
> transactions.
>
> 3) The operating system and filesystem must support a forced file
> synchronization (fsync/fdatasync), so the database engine can wait for
> the data to be written to disk.
>
> 4) The underlying media (hard disk, SSD, USB stick, etc) must respond
> to the fsync call by actually writing the content to persistent
> storage before returning.

Some of the early Unix file systems were very fragile. One of the
(often under-appreciated) major advances in BSD (it was certainly in
4.2, not sure how much earlier) was a new filesystem which was much more
robust in the face of hardware failures and system crashes. Prior to
BSD, the on-disk data could be left in an inconsistent state if the
system crashed at the wrong time. In BSD, data was written to disk in
such a way that every operation could either be backed out cleanly or
had enough information to complete the transaction.

someone

unread,
Apr 13, 2013, 10:39:12 AM4/13/13
to
On 04/13/2013 04:03 PM, Chris Angelico wrote:
> On Sat, Apr 13, 2013 at 11:30 PM, someone <news...@gmail.com> wrote:
>> On 04/13/2013 01:39 PM, Chris Angelico wrote:
>>> Note that there's a caveat: You have to tell SQLite to be ACID
>>> compliant, effectively.
>>
>>
>> So, you're saying to me that by default SQLite isn't ACID compliant, if I
>> begin to use it in my own small programs?
>> ...
>> Do I understand you correct, that by "You have to tell SQLite to be ACID
>> compliant, effectively", you're saying that by default SQLite isn't ACID
>> compliant ?
>>
>
> First off: I am NOT inherently familiar with sqlite. I'm more familiar
> with PostgreSQL, DB2, and MySQL. I'm also not an expert at database
> engine design, so this discussion is from the point of view of an
> applications developer who has used databases from his apps.

Ok, would be nice to hear the opinion from an sqlite expert then...

> True ACID compliance demands support at every level:
>
> 1) The application has to operate in logical units of work, which -
> apart from with DB2 - requires an explicit "BEGIN" query, or
> single-statement transactions.
>
> 2) The database engine must employ some form of write-ahead log.
> Different databases do this somewhat differently (according to the
> page I linked to, SQLite does this in reverse, maintaining a log
> that's sufficient to *undo* the transaction, while PostgreSQL does
> this forwards, maintaining a log that's sufficient to *redo* it as
> well - more effort, but it can be used for database replication), but
> one way or another, there must be a way to detect half-done
> transactions.
>
> 3) The operating system and filesystem must support a forced file
> synchronization (fsync/fdatasync), so the database engine can wait for
> the data to be written to disk.
>
> 4) The underlying media (hard disk, SSD, USB stick, etc) must respond
> to the fsync call by actually writing the content to persistent
> storage before returning.

Ok.

> Failure at any level means the overall system is not ACID compliant.

Roger... But google says sqlite is supposed to be ACID compliant
(although maybe not "fully" as you indicate, I'm not sure about this)...

> PostgreSQL has a huge amount of code in it to try to deal with (or at
> least recognize) a level-3 failure, but nothing in the database engine
> can deal with level 1 or 4 issues.
>
> You'd have to actually test it. The easiest way is to get two
> computers, side by side, and run the database engine on one and a
> monitor on the other. To test some SSDs at work, I knocked together a
> little program that worked somewhat thus:
>
> * Connect to the database over TCP/IP (easy, as we were doing this
> with PostgreSQL)
> * Create a table with a number of rows with an ID and a counter,
> initialized to 0
> * Repeatedly, in parallel, perform a transaction:
> - Increment the counter on one of the rows (at random)
> - Increment a "possible" in-memory counter for that row
> - Commit the database transaction
> - Increment a "confirmed" in-memory counter for that row
> * When an error of "database seems to be down" is detected, wait for
> it to come up again, then query the table. The counters must all be at
> least their corresponding "possible" value and at most the
> "confirmed".

Ok, that doesn't sound to be so simple after all...

> With that running, I simply pulled the plug on the database computer.
> With a properly-configured hard disk, every one of the counters was
> within its correct range. With a lying SSD, though, they could be
> anywhere from "pretty close" (with a low workload - simulated by
> having only a single thread doing transactions and having it sleep for
> a few ms each iteration) to "pretty appalling" (with a bunch of
> threads spinning tightly, keeping the workload high). Once the SSD
> starts doing major write reordering, its throughput soars, but at the
> cost of trustworthiness.

Ok, it would be nice to hear/read the opinion from another in here
who've been working (a lot?) with sqlite...

>> Next question: Is it something I should worry about in my own programs (I'm
>> not sure, I'm an SQL noob)... ?
>
> Yes, it most certainly is. If you have any data that you care about,
> put together some kind of test that will allow you to literally pull
> the plug on the database, while still knowing whether or not your
> transaction was completed (so you'll most likely need some kind of
> "possible" / "confirmed" counter pair as I used above).

I'm not so rich, so I prefer to go for a free database solution rather
than an expensive license... I've heard good things about oracle and
that's also what they used at my previous company, but it's not
something I am willing to pay for, from my private/own money for my
sparetime-projects...

Maybe what you've written explains why somebody got corrupted firefox
sqlite files... I'll just practice a bit more and remember your advice
about testing - at least for "important" projects, I'll remember how you
tested this with pulling out the plug and monitoring the data...


Walter Hurry

unread,
Apr 13, 2013, 10:56:49 AM4/13/13
to
On Sat, 13 Apr 2013 16:39:12 +0200, someone wrote:

> I'm not so rich, so I prefer to go for a free database solution rather
> than an expensive license
(<paraphrasing> but I do care about ACID compliance)

Sounds to me that PostgreSQL is your man, then.

Chris Angelico

unread,
Apr 13, 2013, 12:15:55 PM4/13/13
to pytho...@python.org
On Sun, Apr 14, 2013 at 12:39 AM, someone <news...@gmail.com> wrote:
> On 04/13/2013 04:03 PM, Chris Angelico wrote:
>> Failure at any level means the overall system is not ACID compliant.
>
> Roger... But google says sqlite is supposed to be ACID compliant (although
> maybe not "fully" as you indicate, I'm not sure about this)...

What your Google hits are telling you is that sqlite can (if
configured correctly) pass level 2. But it doesn't guarantee anything
about the other levels, so it's easy to have an, uhh, ACID leak.

>> You'd have to actually test it. The easiest way is to get two
>> computers, side by side, and run the database engine on one and a
>> monitor on the other.
>
> Ok, that doesn't sound to be so simple after all...

I gave a fairly wordy run-down of what I tested, but it's actually
fairly simple in concept: Do a huge bunch of transactions, and keep a
log of what's returned from the COMMIT query; then pull the power out.

> Ok, it would be nice to hear/read the opinion from another in here who've
> been working (a lot?) with sqlite...

Agreed. I'm sure someone will chime in.

> I'm not so rich, so I prefer to go for a free database solution rather than
> an expensive license... I've heard good things about oracle and that's also
> what they used at my previous company, but it's not something I am willing
> to pay for, from my private/own money for my sparetime-projects...

I concur with Walter's assessment: You want PostgreSQL. It's free/open
source software (highly permissive MIT-like license), massively
trusted, and scales up beautifully. (That last one may not be
significant to you, but it's still good to know your database can
handle hundreds or thousands of tps on basic hardware.)

ChrisA

rusi

unread,
Apr 13, 2013, 1:02:18 PM4/13/13
to
On Apr 13, 9:15 pm, Chris Angelico <ros...@gmail.com> wrote:
Dunno why you guys are ACIDing a hapless python+SQL noob.
As far as I can see he did not even know what ACID was... Just
happened to start with mysql (without evidently knowing the DBMS area)
and Cousin Stanley's recommendation to step a notch down from mysql to
sqlite seems to me to be spot-on for his requirement.

To the OP:
Steven is welcome to his views about use of databases. Good to
remember that everyone does not agree with him. This includes the
firefox devs as well as python devs.

In particular, sqlite in python is quite special. All the other
databases have bridge modules to talk from python to the database.
Which means that python runs and the database runs and the two talk
asynchronously across the bridge using what is called a 'client-server
model'. Now client-server is powerful and sophisticated and earlier it
was the only option. That is the noob database programmer had to
grapple with sql (the basic stuff) along with the transaction/ACID
advanced stuff.

Sqlite changed the rules of the game. Sqlite allows programmers to
play with sql without having to deal with client server headaches at
the same time.
Python amplified that change by bundling it with python.

In short Python+Sqlite is a boon for beginners to programming+DBMS

someone

unread,
Apr 13, 2013, 3:25:08 PM4/13/13
to
On 04/13/2013 04:36 PM, Roy Smith wrote:
> In article <mailman.551.1365861...@python.org>,
> Chris Angelico <ros...@gmail.com> wrote:
>
>> 2) The database engine must employ some form of write-ahead log.
>> [...]
>> one way or another, there must be a way to detect half-done
>> transactions.
>>
>> 3) The operating system and filesystem must support a forced file
>> synchronization (fsync/fdatasync), so the database engine can wait for
>> the data to be written to disk.
>>
>> 4) The underlying media (hard disk, SSD, USB stick, etc) must respond
>> to the fsync call by actually writing the content to persistent
>> storage before returning.
>
> Some of the early Unix file systems were very fragile. One of the
> (often under-appreciated) major advances in BSD (it was certainly in
> 4.2, not sure how much earlier) was a new filesystem which was much more
> robust in the face of hardware failures and system crashes. Prior to

Are you talking about (journaling?) filesystems such as ext3, ext4, JFS,
ReiserFS and XFS ?

http://en.wikipedia.org/wiki/Journaling_file_system

> BSD, the on-disk data could be left in an inconsistent state if the
> system crashed at the wrong time. In BSD, data was written to disk in
> such a way that every operation could either be backed out cleanly or
> had enough information to complete the transaction.

Journaling filesystems? I myself use ext4... There's a comparison here:

http://en.wikipedia.org/wiki/Comparison_of_file_systems

?

someone

unread,
Apr 13, 2013, 3:34:38 PM4/13/13
to
Oh, ok. Thanks! BTW: I just read: "Yahoo runs a multi-petabyte modified
PostgreSQL database that processes billions of events per day" - that's
truely amazing, I think...

I think maybe I'll experiment a bit with both mySql (small/medium sized
databases) and for critical/important stuff I should go with
PostgreSQL... Glad to hear this... Then I know what to look at...

someone

unread,
Apr 13, 2013, 3:42:45 PM4/13/13
to
On 04/13/2013 06:15 PM, Chris Angelico wrote:
> On Sun, Apr 14, 2013 at 12:39 AM, someone <news...@gmail.com> wrote:
>> On 04/13/2013 04:03 PM, Chris Angelico wrote:
>>> Failure at any level means the overall system is not ACID compliant.
>>
>> Roger... But google says sqlite is supposed to be ACID compliant (although
>> maybe not "fully" as you indicate, I'm not sure about this)...
>
> What your Google hits are telling you is that sqlite can (if
> configured correctly) pass level 2. But it doesn't guarantee anything
> about the other levels, so it's easy to have an, uhh, ACID leak.

Ok, thank you very much, this is something I couldn't easily see in the
first place... I think what I should do now is to play a bit with sqlite
and then afterwards, when I'm happy I would begin to play with
postgresql and be very happy with it, knowing that I can (hopefully) use
that for all important projects in the rest of my life :-)

I might also play a bit with mySql, because it's my impression that it
also have a big user-group. But I read that postgresql is MUCH more
"safe" to use (and a bit slower) than postgresql which on the other hand
is VERY safe, being fully ACID-compliant...

>>> You'd have to actually test it. The easiest way is to get two
>>> computers, side by side, and run the database engine on one and a
>>> monitor on the other.
>>
>> Ok, that doesn't sound to be so simple after all...
>
> I gave a fairly wordy run-down of what I tested, but it's actually
> fairly simple in concept: Do a huge bunch of transactions, and keep a
> log of what's returned from the COMMIT query; then pull the power out.

I'll try it (or something similar) out one day in the future and see
what happens with the "corrupted" changes due to pulling out the network
cable while transmitting data...

>> Ok, it would be nice to hear/read the opinion from another in here who've
>> been working (a lot?) with sqlite...
>
> Agreed. I'm sure someone will chime in.
>
>> I'm not so rich, so I prefer to go for a free database solution rather than
>> an expensive license... I've heard good things about oracle and that's also
>> what they used at my previous company, but it's not something I am willing
>> to pay for, from my private/own money for my sparetime-projects...
>
> I concur with Walter's assessment: You want PostgreSQL. It's free/open
> source software (highly permissive MIT-like license), massively
> trusted, and scales up beautifully. (That last one may not be
> significant to you, but it's still good to know your database can
> handle hundreds or thousands of tps on basic hardware.)

I understand that scaling is VERY important and if I could choose
between two "equally" opensource systems and one of them scales better
than the other, I would definately work with the one that scales the
most - that means that I don't have to learn how to use a whole new
system, if I already learnt the system that scales best...

And I just found on google that yahoo runs a HUGE PostgreSQL database...
Very interesting - I'll definately try to play around with postgreSQL at
some time in the future...

someone

unread,
Apr 13, 2013, 3:49:55 PM4/13/13
to
On 04/13/2013 07:02 PM, rusi wrote:
> On Apr 13, 9:15 pm, Chris Angelico <ros...@gmail.com> wrote:
>> On Sun, Apr 14, 2013 at 12:39 AM, someone <newsbo...@gmail.com> wrote:
>>> On 04/13/2013 04:03 PM, Chris Angelico wrote:
.....
.....
>>>> Failure at any level means the overall system is not ACID compliant.
>>> Ok, it would be nice to hear/read the opinion from another in here who've
>>> been working (a lot?) with sqlite...
>>
>> Agreed. I'm sure someone will chime in.
>>
>>> I'm not so rich, so I prefer to go for a free database solution rather than
>>> an expensive license... I've heard good things about oracle and that's also
>>> what they used at my previous company, but it's not something I am willing
>>> to pay for, from my private/own money for my sparetime-projects...
>>
>> I concur with Walter's assessment: You want PostgreSQL. It's free/open
>> source software (highly permissive MIT-like license), massively
>> trusted, and scales up beautifully. (That last one may not be
>> significant to you, but it's still good to know your database can
>> handle hundreds or thousands of tps on basic hardware.)
>>
>> ChrisA
>
> Dunno why you guys are ACIDing a hapless python+SQL noob.

That's ok - I'm very interested in hearing/reading this, so don't worry :-)

> As far as I can see he did not even know what ACID was... Just

I think I know it know (maybe not all the details, but generally I know
that it should be ACID-compliant for critical data to avoid corruption
and bad data) :-)

> happened to start with mysql (without evidently knowing the DBMS area)
> and Cousin Stanley's recommendation to step a notch down from mysql to
> sqlite seems to me to be spot-on for his requirement.

Agree - but after that I would like to play with a client/server-system,
so that's also interesting to hear about...

> To the OP:
> Steven is welcome to his views about use of databases. Good to
> remember that everyone does not agree with him. This includes the
> firefox devs as well as python devs.

Yes, I think I understand this discussion. I'm sorry to hear that the
sqlite-database-files sometimes become corrupted. I haven't experienced
this problem myself (AFAIR), because ~90% of the time I'm on chromium.

> In particular, sqlite in python is quite special. All the other
> databases have bridge modules to talk from python to the database.
> Which means that python runs and the database runs and the two talk
> asynchronously across the bridge using what is called a 'client-server
> model'. Now client-server is powerful and sophisticated and earlier it

Yes, got it :-)

> was the only option. That is the noob database programmer had to
> grapple with sql (the basic stuff) along with the transaction/ACID
> advanced stuff.

Yep, I understand your intentions...

> Sqlite changed the rules of the game. Sqlite allows programmers to
> play with sql without having to deal with client server headaches at
> the same time.
> Python amplified that change by bundling it with python.
>
> In short Python+Sqlite is a boon for beginners to programming+DBMS

I completely agree with you that Python+Sqlite is really really great...
But soon I'll also move on to using a client/server model and therefore
I also appreciate the other comments/discussion related to e.g. failure
or non-"fully-ACID compliance" of sqlite, which maybe can explain this
firefox problem with corrupted database(s)...

I think I learned a lot from this thread and know what I should be
working on now...


Message has been deleted

someone

unread,
Apr 13, 2013, 5:36:07 PM4/13/13
to
On 04/13/2013 10:01 PM, Dennis Lee Bieber wrote:
> On Sun, 14 Apr 2013 00:03:25 +1000, Chris Angelico <ros...@gmail.com>
> declaimed the following in gmane.comp.python.general:

[ ....]

>> * Create a table with a number of rows with an ID and a counter,
>> initialized to 0
>> * Repeatedly, in parallel, perform a transaction:
>> - Increment the counter on one of the rows (at random)
>> - Increment a "possible" in-memory counter for that row
>> - Commit the database transaction
>> - Increment a "confirmed" in-memory counter for that row
>> * When an error of "database seems to be down" is detected, wait for
>> it to come up again, then query the table. The counters must all be at
>> least their corresponding "possible" value and at most the
>> "confirmed".
>>
> SQLite is a "file server" database (like M$ JET engine [aka:
> "Access"]). It's locking system is multi-stage. It allows multiple
> concurrent readers on a "shared" lock state. Only one connection can
> perform write operations ("reserved" lock) alongside the readers. A
> second connection attempting to perform a write will be rejected with a
> database locked condition. Then it really gets nasty -- the writer
> attempts to commit the update: The first step is to block other
> connections from even entering the read state (the "pending" lock).
> However, the writer itself is blocked until all remaining readers have
> exited; only then does it have exclusive access to and SQLite makes
> changes to the database file itself (prior to that, the writer
> connection is changing page images in memory)

Ok, this makes sense... It's not something I'll bother about to begin
with, but maybe later (for critical apps) I can see that this is important.

[ ....]

> In the commit phase, SQLite first tries to ensure the rollback
> journal is flushed to disk -- but that apparently is out of its control;
> it can submit a sync command to the OS, but has to rely on what the OS
> tells it about the state of the writes to disk (the book indicates that
> some IDE drives would lie when queried about sync status, while still
> having unwritten data in the on-board buffers). After the rollback
> journal it submits the data to the database. I

I agree, this must be a problem, when the OS is lying...

> Crash during journal write: restart finds no journal, that transaction
> is lost but the database itself is clean
>
> Crash after journal during database update, restart finds journal,
> assumes database is suspect, and rolls back the pages, database is
> restored to pre-transaction state
>
> Crash after database sync during removal of journal, restart either
> finds journal still there and rolls back the pages restoring to
> pretransaction state, or the file was removed from the directory and
> SQLite determines database file is good with the last transaction in
> place.

Ok, this is a bit more advanced - I'll try to make my own experiments
now and then after some time I guess I can dig more into these details,
thanks.

Roy Smith

unread,
Apr 13, 2013, 5:38:07 PM4/13/13
to
In article <kkcb9f$pei$1...@dont-email.me>, someone <news...@gmail.com>
wrote:

> > Some of the early Unix file systems were very fragile. One of the
> > (often under-appreciated) major advances in BSD (it was certainly in
> > 4.2, not sure how much earlier) was a new filesystem which was much more
> > robust in the face of hardware failures and system crashes. Prior to
>
> Are you talking about (journaling?) filesystems such as ext3, ext4, JFS,
> ReiserFS and XFS ?
>
> http://en.wikipedia.org/wiki/Journaling_file_system

No, I'm talking about

http://en.wikipedia.org/wiki/Berkeley_Fast_File_System

Journaling came along later.

Walter Hurry

unread,
Apr 13, 2013, 6:22:05 PM4/13/13
to
If it were me I wouldn't use MySQL for anything at all. I'd use sqlite
for little non-critical local applications, and Postgres for the rest.

Postgres is not difficult at all, provided you RTFM and follow the
instructions (the documentation is superb). And whichever you use, you
need to learn SQL anyway.

someone

unread,
Apr 13, 2013, 6:31:05 PM4/13/13
to
On 04/14/2013 12:22 AM, Walter Hurry wrote:
> On Sat, 13 Apr 2013 21:34:38 +0200, someone wrote:
>
>> On 04/13/2013 04:56 PM, Walter Hurry wrote:
>>> On Sat, 13 Apr 2013 16:39:12 +0200, someone wrote:
>>>
>>>> I'm not so rich, so I prefer to go for a free database solution rather
>>>> than an expensive license
>>> (<paraphrasing> but I do care about ACID compliance)
>>>
>>> Sounds to me that PostgreSQL is your man, then.
>>
>> Oh, ok. Thanks! BTW: I just read: "Yahoo runs a multi-petabyte modified
>> PostgreSQL database that processes billions of events per day" - that's
>> truely amazing, I think...
>>
>> I think maybe I'll experiment a bit with both mySql (small/medium sized
>> databases) and for critical/important stuff I should go with
>> PostgreSQL... Glad to hear this... Then I know what to look at...
>
> If it were me I wouldn't use MySQL for anything at all. I'd use sqlite
> for little non-critical local applications, and Postgres for the rest.

Ok, thank you. I just came across a blog that said pytables is also a
very good option?

http://www.pytables.org/moin/PyTables?action=AttachFile&do=view&target=non-indexed.png

> Postgres is not difficult at all, provided you RTFM and follow the
> instructions (the documentation is superb). And whichever you use, you
> need to learn SQL anyway.

Good to hear... I'll dig more into it, thank you...


Chris Angelico

unread,
Apr 13, 2013, 6:34:51 PM4/13/13
to pytho...@python.org
On Sun, Apr 14, 2013 at 5:34 AM, someone <news...@gmail.com> wrote:
> I think maybe I'll experiment a bit with both mySql (small/medium sized
> databases) and for critical/important stuff I should go with PostgreSQL

PostgreSQL isn't majorly slower than MySQL, and it's a lot more
trustworthy in terms of database constraints and so on. MySQL is
designed as a place for a single application to store its data, and it
assumes that the application is king; PostgreSQL is designed as a
database against which application(s) may execute queries, therefore
it assumes that the database administrator is king.

With heavy read/write workloads, I'd put my money on PostgreSQL every
time; MySQL has a much greater problem with wide locks (eg
table-level) and consequent loss of concurrency.

ChrisA

Chris Angelico

unread,
Apr 13, 2013, 6:44:28 PM4/13/13
to pytho...@python.org
On Sun, Apr 14, 2013 at 6:01 AM, Dennis Lee Bieber
<wlf...@ix.netcom.com> wrote:
> On Sun, 14 Apr 2013 00:03:25 +1000, Chris Angelico <ros...@gmail.com>
> declaimed the following in gmane.comp.python.general:
>
>> True ACID compliance demands support at every level:
>>
>> 1) The application has to operate in logical units of work, which -
>> apart from with DB2 - requires an explicit "BEGIN" query, or
>> single-statement transactions.
>>
> While SQLite3 normally runs in an auto-commit mode, the Python
> DB-API spec, in general, requires that auto-commit be turned off. "The
> Definitive Guide to SQLite" states that the Python adapter scans
> queries, and will start a transaction if the query is one that will
> change data (insert/replace/update). Read-only queries stay auto-commit
> until one of the data change queries is submitted and not committed.

Okay, that's good. Point still stands, though, that the application
has to use BEGIN/COMMIT correctly; the size of the logical unit of
work should be defined by what's one logical action, not by what gives
the best performance.

>> * Connect to the database over TCP/IP (easy, as we were doing this
>> with PostgreSQL)
>
> You don't with SQLite -- or, properly, it is not to an SQLite
> port... It would be something like an NFS mounted file share -- and we
> all know how uncertain file locking is over NFS. <G>

Sure, but you could easily make a tiny "SQLite server" that accepts
socket connections, reads integers, and writes back "OK" when the
transaction's committed. The only difference is that you have to write
two halves instead of letting the DB itself be the other half.

>> * Create a table with a number of rows with an ID and a counter,
>> initialized to 0
>> * Repeatedly, in parallel, perform a transaction:
>> - Increment the counter on one of the rows (at random)
> So in your example above, the first process to submit an update
> command is going to lock all the others from submitting updates AND will
> itself be held from committing the update until all the other processes
> have closed (commit or rollback their "read sessions").

Ah, that'd be a problem. What if each row is in its own file, though?
Would that work? That is, instead of:

UPDATE durability_test_table SET counter=counter+1 WHERE id=:random_value

you use:

UPDATE durability_test_:random_value SET counter=counter+1

(except, of course, that SQL parameterization wouldn't work there, so
it'd be Python string manipulation) - this way, transactions will lock
only against other transactions manipulating the same entry, which is
effectively the same as row-level locking. With 2-3 times as many
"rows" as threads, there should be very little lock contention.

ChrisA

Chris Angelico

unread,
Apr 13, 2013, 6:54:28 PM4/13/13
to pytho...@python.org
On Sun, Apr 14, 2013 at 8:31 AM, someone <news...@gmail.com> wrote:
> Ok, thank you. I just came across a blog that said pytables is also a very
> good option?
>
> http://www.pytables.org/moin/PyTables?action=AttachFile&do=view&target=non-indexed.png

>From what I gather, that's looking at performance of a non-indexable
query on a 10,000,000-row table. That's going to suck whatever you do,
and the exact level of suckitude doesn't really prove much. (Note that
even the best options are taking half a second for this single query.)

A better test of a database is transactions per second of something
that approximates to your real workload. For instance, English
Wikipedia has roughly a hundred edits per minute (assessed by me just
now by looking at the Recent Changes), and some ridiculous number of
page reads per minute (not assessed, but believed to be somewhere
between 11 and Graham's number); so a test of a proposed new database
would have to mimic this ratio. Most of the queries involved should be
able to be answered using indexes; in some cases, ONLY using the index
(eg if you just want to know whether or not a row exists).

PyTables may well outperform PostgreSQL in real usage, but that one
graph doesn't tell me that. (Not to mention that it's measuring a
somewhat old PG.)

ChrisA
Message has been deleted

someone

unread,
Apr 13, 2013, 8:06:26 PM4/13/13
to
On 04/14/2013 12:54 AM, Chris Angelico wrote:
> On Sun, Apr 14, 2013 at 8:31 AM, someone <news...@gmail.com> wrote:
>> Ok, thank you. I just came across a blog that said pytables is also a very
>> good option?
>>
>> http://www.pytables.org/moin/PyTables?action=AttachFile&do=view&target=non-indexed.png
>
>>From what I gather, that's looking at performance of a non-indexable
> query on a 10,000,000-row table. That's going to suck whatever you do,
> and the exact level of suckitude doesn't really prove much. (Note that
> even the best options are taking half a second for this single query.)

Interesting... Thank you very much for that information...

> A better test of a database is transactions per second of something
> that approximates to your real workload. For instance, English
> Wikipedia has roughly a hundred edits per minute (assessed by me just
> now by looking at the Recent Changes), and some ridiculous number of
> page reads per minute (not assessed, but believed to be somewhere
> between 11 and Graham's number); so a test of a proposed new database
> would have to mimic this ratio. Most of the queries involved should be
> able to be answered using indexes; in some cases, ONLY using the index
> (eg if you just want to know whether or not a row exists).
>
> PyTables may well outperform PostgreSQL in real usage, but that one
> graph doesn't tell me that. (Not to mention that it's measuring a
> somewhat old PG.)

Ok, thank you very much... Sounds to me like PostgreSQL it is, then :-)


someone

unread,
Apr 13, 2013, 8:10:54 PM4/13/13
to
Ok, thank you very much... Sounds like PostgreSQL is the best option for
me to go on to from here, after I've played a bit my sqlite...

Steven D'Aprano

unread,
Apr 14, 2013, 3:56:21 AM4/14/13
to
On Sat, 13 Apr 2013 10:02:18 -0700, rusi wrote:

> To the OP:
> Steven is welcome to his views about use of databases.

I haven't given any views about databases. I've given my view on
application developers -- specifically, Firefox -- using a not-quite ACID
database in a way that is fragile, can cause data loss, and adds lots
more complexity to the application AND the end-user experience. And for
what? Simple data that would be much better in a simpler format, such as
bookmarks.


> Good to remember
> that everyone does not agree with him. This includes the firefox devs as
> well as python devs.

I don't see what the Python devs have to do with it. They don't use
Sqlite for Python's internals, and the fact that there is a module for
sqlite doesn't mean squat. There's a module for parsing Sun AU audio
files, that doesn't mean the Python devs recommend that they are the best
solution to your audio processing and multimedia needs.

I'm not saying that Sqlite doesn't have it's uses, although I personally
haven't found them yet. And as for the Firefox devs, well, I'll just let
Jamie Zawinski show their l33t des1gn ski11z in context:

http://www.jwz.org/blog/2003/01/more-khtml/

Okay, that's ten years old. What do you think the odds are that Firefox
has a nice, clean design by now? Well, I suppose it's possible, but when
it takes a minimum of NINE files to do the equivalent of "Hello World" in
Firefox, I wouldn't put money on it:

http://kb.mozillazine.org/Getting_started_with_extension_development

I mean, really -- bookmarks, in a single-user application, and they store
it in a database. You can't even have two instances of Firefox running at
the same time.

The consequences of this over-engineered solution is that Firefox is more
complex and fragile than it needs be, and less reliable than it could be.
When your bookmarks database gets corrupt, which is easy, the browser
History and Back button stop working, which then pushes responsibility
for fixing the database corruption back on the user. So the Firefox
developers actually end up paying the costs of a non-lightweight
implementation, but without the benefits. They don't even get to remove
the old bookmarks to HTML code, since they still need it for manual
exports and backups.

Considering the rest of the Firefox architecture (XUL, XUL everywhere!),
using sqlite probably feels like a lightweight solution to the devs.

"The Mork database structure used by Mozilla Firefox v1-2 is unusual to
say the least. It was originally developed by Netscape for their browser
(Netscape v6) and the format was later adopted by Mozilla to be used in
Firefox. It is a plain text format which is not easily human readable
and is not efficient in its storage structures. For example, a single
Unicode character can take many bytes to store. The developers
themselves complained it was extremely difficult to parse correctly and
from Firefox v3, it was replaced by MozStorage which is based on an
SQLite database."


http://wordpress.bladeforensics.com/?p=357

http://en.wikipedia.org/wiki/Mork_%28file_format%29




--
Steven

rusi

unread,
Apr 14, 2013, 7:17:28 AM4/14/13
to
On Apr 14, 12:56 pm, Steven D'Aprano <steve
+comp.lang.pyt...@pearwood.info> wrote:
> On Sat, 13 Apr 2013 10:02:18 -0700, rusi wrote:
> > To the OP:
> > Steven is welcome to his views about use of databases.
>
> I haven't given any views about databases.

You are twisting "use of databases" to just "about databases"

And heres what you said:

> Using a database for such lightweight data as bookmarks is, in my
> opinion, gross overkill and adds to the complexity of Firefox. More
> complexity leads to more bugs…

Not that I would disagree with that for general databases, just for
something as atypical as sqlite.
In short, you are being hypnotized by the word 'database' and not
seeing that sqlite is a very strange instance of that species.
http://en.wikipedia.org/wiki/Etymological_fallacy
+
http://en.wikipedia.org/wiki/Accident_%28fallacy%29

> I've given my view on
> application developers -- specifically, Firefox -- using a not-quite ACID
> database in a way that is fragile, can cause data loss,

FUD
Are you saying that flat-files dont lose data?

> and adds lots
> more complexity to the application AND the end-user experience. And for
> what?

Strange argument: If I call a one line re.match(..) that hooks into
5000 arcane lines of the re module, on whose account is the complexity
-- mine or python's?

From a programmer's POV if 10 lines of flat-file munging are reduced
to two lines of SQL its a reduction of 10 to 2.

> Simple data that would be much better in a simpler format, such as
> bookmarks.
>
> > Good to remember
> > that everyone does not agree with him. This includes the firefox devs as
> > well as python devs.
>
> I don't see what the Python devs have to do with it. They don't use
> Sqlite for Python's internals, and the fact that there is a module for
> sqlite doesn't mean squat. There's a module for parsing Sun AU audio
> files, that doesn't mean the Python devs recommend that they are the best
> solution to your audio processing and multimedia needs.

Python made a choice to include AU file support when Sun existed and
looked more respectable than MS. Today the support continues to exist
probably for backward compatibility reasons. "The code's already
written. Why remove it?"
Sure but it has its costs -- memory footprint, sources-size etc --
which are deemed negligible enough to not bother.

Likewise python 2.5 made a choice to include sqlite. Following RoR's D
Hansson we may call it an 'opinionated choice.' That choice implies
that the devs decided that a fixed-cost of bundling sqlite with python
is deemed better than each programmer installing/rolling-his-own etc


>
> I'm not saying that Sqlite doesn't have it's uses, although I personally
> haven't found them yet. And as for the Firefox devs, well, I'll just let
> Jamie Zawinski show their l33t des1gn ski11z in context:
>
> http://www.jwz.org/blog/2003/01/more-khtml/
>

Faulty generalization fallacy:
http://en.wikipedia.org/wiki/Faulty_generalization
Because some code in firefox is bad, every choice of firefox is bad?
[Actually I am surprised that you agree with *that example*: Would you
claim that a void returning, no-argument function is better than one
with arguments and return values? Anyways thats really far away from
this discussion…]

To the OP:
Lets deconstruct ACID.

Consistency+Atomicity:
Lets say you write some stack code like this
stack[top] = newvalue
top += 1

And if you catch the machine state between the two assignments, you
will find an *inconsistent* stack because that code is *non-atomic*
Should you bother? Yes if you have concurrency, no if not.

Likewise Isolation is vacuously guaranteed if you are the sole guy
running your code.

As for Durability, if you randomly turn off your machine when your
program is running, yes you may lose the results of your program. You
may lose much else!

IOW if you are alone on your machine, all discussion of ACID is moot

Chris Angelico

unread,
Apr 14, 2013, 9:22:03 AM4/14/13
to pytho...@python.org
On Sun, Apr 14, 2013 at 9:17 PM, rusi <rusto...@gmail.com> wrote:
> On Apr 14, 12:56 pm, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
>> I've given my view on
>> application developers -- specifically, Firefox -- using a not-quite ACID
>> database in a way that is fragile, can cause data loss,
>
> FUD
> Are you saying that flat-files dont lose data?

If they do, a human being can easily open them up and see what's
inside. Suppose bookmarks are stored like this:

r"""Some-Browser-Name web bookmarks file - edit with care
url: http://www.google.com/
title: Search engine
icon: whatever-format-you-want-to-use

url: http://www.duckduckgo.com/
title: Another search engine

url: http://www.python.org/

url: ftp://192.168.0.12/
title: My FTP Server
desc: Photos are in photos/, videos are in videos/
Everything else is in other/
user: root
pass: secret
"""

The parsing of this file is pretty simple. Blank line marks end of
entry; indented line continues the previous attribute (like RFC822),
everything else is "attribute: value". (You might even be able to
abuse an RFC822 parser/compositor for the job.) The whole file has to
be read and rewritten for any edits, so it's unsuited to gigabytes of
content; but we're talking about *web browser bookmarks* here. I know
some people have a lot of them, but hardly gigs and gigs. And if you
think they will, then all you need to do is have multiple files, eg
one for each folder in the bookmark tree.

Now suppose it gets damaged somehow. Firstly, that's a lot less likely
with a simple file format and a "write to temp file, then move temp
file over main file" setup; but mainly, it's very easy to
resynchronize - maybe there'll be one bookmark (or a group of
bookmarks) that get flagged as corrupted, but everything after that
can be parsed just fine - as soon as you get to a blank line, you
start parsing again. Very simple. Well suited to a simple task. (Note,
however, that the uber-simple concept I've posited here would have the
same concurrency problems that Firefox has. At very least, it'd rely
on some sort of filesystem-level lock when it starts rewriting the
file. But this is approximately similar to running two instances of a
text editor and trying to work with the same file.)

> From a programmer's POV if 10 lines of flat-file munging are reduced
> to two lines of SQL its a reduction of 10 to 2.

The complexity exists in a variety of places. The two lines of SQL
hide a morass of potential complexity; so would a massive regex. The
file itself is way harder for external tools to manage. And all of it
can be buggy. With a simple flat-file system, chances are you can turn
it into a nested list structure and a dict for indexing (or possibly a
collections.OrderedDict), and then you have the same reduction - it's
just simple in-memory operations, possibly followed by a save() call.
All the options available will do that, whether flat-file or database.

>> I don't see what the Python devs have to do with it. They don't use
>> Sqlite for Python's internals, and the fact that there is a module for
>> sqlite doesn't mean squat. There's a module for parsing Sun AU audio
>> files, that doesn't mean the Python devs recommend that they are the best
>> solution to your audio processing and multimedia needs.
>
> Python made a choice to include AU file support when Sun existed and
> looked more respectable than MS. Today the support continues to exist
> probably for backward compatibility reasons. "The code's already
> written. Why remove it?"
> Sure but it has its costs -- memory footprint, sources-size etc --
> which are deemed negligible enough to not bother.

Actually, this is one place where I disagree with the current decision
of the Python core devs: I think bindings for other popular databases
(most notably PostgreSQL, and probably MySQL since it's so widely
used) ought to be included in core, rather than being shoved off to
PyPI. Databasing is so important to today's world that it would really
help if people had all the options right there in core, if only so
they're more findable (if you're browsing docs.python.org, you won't
know that psycopg is available). Currently the policy seems to be "we
don't include the server so why should we include the client"; I
disagree, I think the client would stand nicely on its own. (Does
Python have a DNS server module? DNS client? I haven't dug deep, but
I'm pretty sure I can do name lookups in Python, yet running a DNS
server is sufficiently arcane that it can, quite rightly, be pushed
off to PyPI.) But this is minor, and tangential to this discussion.

> Faulty generalization fallacy:
> http://en.wikipedia.org/wiki/Faulty_generalization
> Because some code in firefox is bad, every choice of firefox is bad?

It's a matter of windows into the philosophy, rather than specific
examples. Requiring nine files to do a "Hello World" extension
suggests a large corpus of mandatory boilerplate; imagine, for
instance, that my example bookmarks file structure had demanded
_every_ attribute be provided for _every_ bookmark, instead of
permitting the defaults. That would demonstrate overkill in design,
and the sort of person who would produce that is probably unable to
simplify code for the same reasons.

> As for Durability, if you randomly turn off your machine when your
> program is running, yes you may lose the results of your program. You
> may lose much else!
>
> IOW if you are alone on your machine, all discussion of ACID is moot

No, no, a thousand times no! If I am doing financial transactions,
even if I'm alone on my machine, I will demand full ACID compliance.
Randomly turning off the machine is a simulation of the myriad
possible failures - incoming power failure (or UPS failure, if you
have one), power supply goes boom, motherboard gets fried, operating
system encounters a hard failure condition, cleaning lady unplugs the
server to put her vacuum cleaner onto the UPS... anything. The point
of ACID compliance is that you might lose the results of *this run* of
the program, but nothing more; and if any other program has been told
"That's committed", then it really has been. Without some such
guarantee, you might lose *all the data you have stored*, because
something got corrupted. Partial guarantees of acidity are
insufficient; imagine if power failure during ALTER TABLE can result
in your whole database being unreadable.

With the setup I described above, everything works beautifully if the
OS guarantees an atomic mv() operation. Even if it doesn't, you can
probably figure out what's going on by inspecting the file state; for
instance, you can assume that a non-empty main file should be kept
(discarding the temporary), but if the main file is empty or absent
AND the temporary is readable and parseable, use the temporary. (This
assumes that a fresh install creates a non-empty file, otherwise
there's ambiguity at initial file creation which would need to be
resolved. But you get the idea.)

Of course, that uber-simple option does require a full file rewrite
for every edit. But like I said, it's designed for simplicity, not
concurrent writing.

ChrisA

Ned Deily

unread,
Apr 14, 2013, 12:40:53 PM4/14/13
to pytho...@python.org
In article
<CAPTjJmrP_9saiG89DKse-P6D...@mail.gmail.com>,
Chris Angelico <ros...@gmail.com> wrote:
> Actually, this is one place where I disagree with the current decision
> of the Python core devs: I think bindings for other popular databases
> (most notably PostgreSQL, and probably MySQL since it's so widely
> used) ought to be included in core, rather than being shoved off to
> PyPI. Databasing is so important to today's world that it would really
> help if people had all the options right there in core, if only so
> they're more findable (if you're browsing docs.python.org, you won't
> know that psycopg is available). Currently the policy seems to be "we
> don't include the server so why should we include the client"; I
> disagree, I think the client would stand nicely on its own. (Does
> Python have a DNS server module? DNS client? I haven't dug deep, but
> I'm pretty sure I can do name lookups in Python, yet running a DNS
> server is sufficiently arcane that it can, quite rightly, be pushed
> off to PyPI.) But this is minor, and tangential to this discussion.

For the bindings to be useful, Python batteries-included distributions
(like python.org installers) would either need to also ship the various
DB client libraries for all supported platforms (including Windows),
which adds complexity and potentially intractable license issues, or
there would need to be reverse-engineered implementations of the client
libs or wire protocols, either option adding fragility and complex
testing issues. DNS client lookups use published, well-understood
Internet-standard protocols, not at all like talking to a third-party
database, be it open-source or not. Sqlite3 is certainly an anomaly in
that it is not-only open source but designed to be a lightweight,
compatible library that runs on just about everything, and with a
fanatical devotion to compatibility and documentation. These days just
about every major product or operating system platform ships with or
uses a copy of sqllite3 for something.

--
Ned Deily,
n...@acm.org

Cousin Stanley

unread,
Apr 14, 2013, 2:20:17 PM4/14/13
to
Steven D'Aprano wrote:

> On Fri, 12 Apr 2013 23:26:05 +0000, Cousin Stanley wrote:
>
>> The firefox browser keeps different sqlite database files for various
>> uses ....
>
> Yes, and I *really* wish they wouldn't.
>
> It's my number 1 cause of major problems with Firefox.

Problems with software of any flavor,
especially software that is used regularly
and upon which we are somewhat dependent,
are always a source of frustration ....

My own personal use of firefox over the years
has been limited as I have not used it
for my primary browser and have not experienced
any problems with its bookmarks ....

I use opera as my primary browser
and would very much like to convert
the plain-vanilla bookmark.adr file
that opera uses into an sqlite data base
for diversity in bookmark searches
that would be independent of reglular
browser usage ....

$ grep FOLDER ~/.opera/bookmarks.adr | wc -l
631

$ grep URL ~/.opera/bookmarks.adr | wc -l
14944


> http://kb.mozillazine.org/Bookmarks_history_and_toolbar_buttons_not_working_-_Firefox

Although there have been many reports entailing corruption
of the places.sqlite file, it isn't apparent to me
from the link above that sqlite itself is the culprit ....

Could the complexity/bugginess of the firefox code
possibly be the cause instead ?

"If Firefox works normally when you first open it
after starting up the computer but multiple symptoms arise
after you close and later reopen Firefox, it's likely
that a Firefox process from a previous session
did not close properly and the Places database
( "places.sqlite" file ) is locked."


If you check the headers of any of my posts here
you will find that I post with a python-based news client
named XPN that also uses sqlite for persistent storage,
one sqlite database for each different newsgroup ....

I've used xpn daily for many years and have never experienced
a corrupted sqlite database file ....

firefox + sqlite ----> buggy ? ....... :-(

python + sqlite ----> ok, hooray .... :-)


> Using a database for such lightweight data as bookmarks is, in my
> opinion, gross overkill and adds to the complexity of Firefox.
>
> More complexity leads to more bugs, e.g.:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=465684#c11
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=431558

These pages show problems that are 4 and 5 years old
from 2008 & 2009 and are marked as Status: RESOLVED FIXED
at the top of the page ....

Are you still having firefox bookmark problems today ?


--
Stanley C. Kitching
Human Being
Phoenix, Arizona

Tim Chase

unread,
Apr 14, 2013, 4:16:08 PM4/14/13
to Ned Deily, pytho...@python.org
On 2013-04-14 09:40, Ned Deily wrote:
> DNS client lookups use published, well-understood
> Internet-standard protocols, not at all like talking to a
> third-party database, be it open-source or not.

That said, even though DNS is a publicly documented standard, I've
reached for DNS code in the Python stdlib on multiple occasions
(usually to try and snag the MX record for a customer, so smtplib can
send stuff to it), and get disappointed each time. I'd really love
if there was a simple DNS-lookup module available in the stdlib,
especially if it allowed overriding the server to ask. I mean...POP,
IMAP and SMTP are all publicly documented standards that Python makes
easily accessible. DNS would be a good addition.

-tkc



Chris Angelico

unread,
Apr 14, 2013, 5:43:47 PM4/14/13
to pytho...@python.org
On Mon, Apr 15, 2013 at 2:40 AM, Ned Deily <n...@acm.org> wrote:
> In article
> <CAPTjJmrP_9saiG89DKse-P6D...@mail.gmail.com>,
> Chris Angelico <ros...@gmail.com> wrote:
> > Actually, this is one place where I disagree with the current decision
>> of the Python core devs: I think bindings for other popular databases
>> (most notably PostgreSQL, and probably MySQL since it's so widely
>> used) ought to be included in core, rather than being shoved off to
>> PyPI. Databasing is so important to today's world that it would really
>> help if people had all the options right there in core, if only so
>> they're more findable (if you're browsing docs.python.org, you won't
>> know that psycopg is available). Currently the policy seems to be "we
>> don't include the server so why should we include the client"; I
>> disagree, I think the client would stand nicely on its own. (Does
>> Python have a DNS server module? DNS client? I haven't dug deep, but
>> I'm pretty sure I can do name lookups in Python, yet running a DNS
>> server is sufficiently arcane that it can, quite rightly, be pushed
>> off to PyPI.) But this is minor, and tangential to this discussion.
>
> For the bindings to be useful, Python batteries-included distributions
> (like python.org installers) would either need to also ship the various
> DB client libraries for all supported platforms (including Windows),
> which adds complexity and potentially intractable license issues, or
> there would need to be reverse-engineered implementations of the client
> libs or wire protocols, either option adding fragility and complex
> testing issues. DNS client lookups use published, well-understood
> Internet-standard protocols, not at all like talking to a third-party
> database, be it open-source or not. Sqlite3 is certainly an anomaly in
> that it is not-only open source but designed to be a lightweight,
> compatible library that runs on just about everything, and with a
> fanatical devotion to compatibility and documentation. These days just
> about every major product or operating system platform ships with or
> uses a copy of sqllite3 for something.

Understandable, but I'm actually referencing a discussion on either
python-dev or python-ideas where the statement was made that it didn't
make sense to include the client for something that the server for
wasn't included. I can't find the discussion thread off-hand, but
that, rather than the portability/complication issues, seemed to be
the primary line of argument.

I don't know about any others, but PostgreSQL's wire protocol isn't
all that difficult to work with, and since we're talking about
something where the far end is almost certainly going to consume some
time, it wouldn't hurt to implement it in pure Python. Based on
http://wiki.postgresql.org/wiki/Python it seems there are a few
modules that do just that (unchecked, but if they work on any platform
and don't require libpq, I strongly suspect they use Python's own
networking); if one of those is of sufficient code quality for the
stdlib, I think it would be an excellent addition. However, I am not a
core dev, therefore sqlite is the only one included.

ChrisA

Roy Smith

unread,
Apr 14, 2013, 5:48:17 PM4/14/13
to
In article <mailman.605.1365973...@python.org>,
Tim Chase <pytho...@tim.thechases.com> wrote:

> I'd really love if there was a simple DNS-lookup module available in
> the stdlib, especially if it allowed overriding the server to ask.

pip install dnspython

rusi

unread,
Apr 15, 2013, 7:45:25 AM4/15/13
to
I am trying to understand your points Chris. On the one hand you say:

On Apr 14, 6:22 pm, Chris Angelico <ros...@gmail.com> wrote:
> No, no, a thousand times no! If I am doing financial transactions,
> even if I'm alone on my machine, I will demand full ACID compliance.



On the other you describe a bookmark storage scheme (which it seems
you are recommending); to wit

> Suppose bookmarks are stored like this:
>
> r"""Some-Browser-Name web bookmarks file - edit with care
> url:http://www.google.com/
> title: Search engine
> icon: whatever-format-you-want-to-use
>
> url:http://www.duckduckgo.com/
> title: Another search engine
>
> url:http://www.python.org/
>
> url:ftp://192.168.0.12/
> title: My FTP Server
> desc: Photos are in photos/, videos are in videos/
>  Everything else is in other/
> user: root
> pass: secret
> """
>
> The parsing of this file is pretty simple. Blank line marks end of
> entry;…

So are you saying that if one switches from the non-ACID compliant
sqlite to your simple-text data-format, the new 'database' (note the
quote marks) will now become ACID compliant?

Chris Angelico

unread,
Apr 15, 2013, 8:28:45 AM4/15/13
to pytho...@python.org
On Mon, Apr 15, 2013 at 9:45 PM, rusi <rusto...@gmail.com> wrote:
> I am trying to understand your points Chris. On the one hand you say:
>
> On Apr 14, 6:22 pm, Chris Angelico <ros...@gmail.com> wrote:
>> No, no, a thousand times no! If I am doing financial transactions,
>> even if I'm alone on my machine, I will demand full ACID compliance.
>
> On the other you describe a bookmark storage scheme (which it seems
> you are recommending); to wit
> ...
> So are you saying that if one switches from the non-ACID compliant
> sqlite to your simple-text data-format, the new 'database' (note the
> quote marks) will now become ACID compliant?

Unlikely. It theoretically could be made ACID compliant (all it needs
is an OS-guaranteed atomic move/rename operation), but my point is
that some things don't _need_ full-on databases. Financial work *does*
(if I'm accepting money from people, I'd better make pretty sure I
know who's paid me and how much); bookmarks usually don't. Also,
bookmarks are the exclusive property of the person who creates them,
so it's helpful to store them in a way that can be edited; with money
movements, you often want some kind of indelibility guarantee, too
(you can't go back and edit a previous transaction, you have to put in
a correcting transaction). Different tasks demand different storage
schemes.

ChrisA
0 new messages