
Proposal: Database Schema Migration API


Shawn Wilsher

Oct 20, 2009, 1:52:33 PM
to dev-pl...@lists.mozilla.org, Marco Bonardo, asuth...@asutherland.org, Paul O’Shannessy
One of the problems that mozStorage doesn't currently try to address is
migrating database schemas between changes. As a result, every consumer
of the API handles it in their own way, each having the possibility to
introduce subtle bugs.

As a result, I'd like to propose an API to handle this. I've
determined, however, that using xpidl for this would be rather painful
for both C++ and JS consumers, so I'm actually going to create one API
for each (using jsapi to define a js-friendly version for JS consumers,
and a notxpcom method on mozIStorageConnection for C++ callers).

The best part about this API is that we can properly handle downgrades
of database schemas, which existing consumers handle poorly, if at
all. We will store all known schema
versions in the database, and when someone installs an older version of
the application, the downgrade logic will be embedded in the database.

The rest of this post is going to be a bit lengthy, but it contains the
specification as I've worked it out so far. In both APIs, I'm aware
that the methods are synchronous, and in the past I've advocated not
doing anything synchronous. However, all database migrations are
currently performed synchronously, so nothing major is changing. This
API is the first step to create an asynchronous API to open a database
file, and getting consumers using this will make opening databases
asynchronously easier.

== Native Code API ==

Consumers of this API will pass in a statically allocated struct that
defines the schema. This is a bit verbose, but it can be shoved off
into a header file and just #included.

interface mozIStorageConnection : nsISupports {
  /**
   * Synchronously checks the schema version of the database and
   * performs migration if necessary.
   *
   * @param aSchema
   *        The schema definition of this database.
   */
  [notxpcom] nsresult checkAndMigrateSchema(struct Schema &aSchema);
};

The structures used for this are a bit verbose, but I've included them
below (and in a syntax highlighted wiki form here:
https://developer.mozilla.org/User:sdwilsh/SchemaProposal/CPP). I've
worked with Marco Bonardo on this to make sure it has all the needed
options to implement the Places schema, which is arguably the most
complex one in the tree. Additionally, I've implemented the Cookies
database schema with this, and that example can be found on my MDC talk
page
(https://developer.mozilla.org/User:sdwilsh/SchemaProposal/CookiesSample#Native_Code_Sample).

/**
 * Describes the type of a column.
 */
enum ColumnType {
  INTEGER_COLUMN,
  REAL_COLUMN,
  TEXT_COLUMN,
  BLOB_COLUMN,
};

/**
 * Describes the type of default for the column.
 */
enum DefaultType {
  NO_DEFAULT,
  INTEGER_DEFAULT,
  REAL_DEFAULT,
  TEXT_DEFAULT,
  BLOB_DEFAULT,
  NULL_DEFAULT,
};

struct Column
{
  /**
   * The name of the column.
   */
  const char * const name;

  /**
   * The type of the column.
   */
  const ColumnType columnType;

  /**
   * Indicates if the column should automatically increment or not.
   * Only valid for INTEGER_COLUMN columns.
   */
  const bool autoIncrement;

  /**
   * The default value for the column.
   */
  union {
    const int64_t integer;
    const double real;
    const char * const text;
    const unsigned char * const blob;
  } defaultValue;

  /**
   * The type of the default value for this column.
   */
  const DefaultType defaultType;

  /**
   * If the column can be null or not.
   */
  const bool allowNull;
};

/**
 * Describes when the trigger should be executed in relation to the
 * trigger's event type.
 */
enum TriggerEventTime {
  BEFORE,
  AFTER,
  INSTEAD_OF,
};

/**
 * Describes the type of event the trigger acts on.
 */
enum TriggerEventType {
  DELETE,
  INSERT,
  UPDATE,
};

struct Trigger
{
  /**
   * The trigger's name. If this trigger changes between versions, its
   * name should also change.
   */
  const char * const name;

  /**
   * The time when the trigger should be executed in relation to
   * eventType.
   */
  const TriggerEventTime eventTime;

  /**
   * The event that should cause this trigger to run.
   */
  const TriggerEventType eventType;

  /**
   * The list of columns that this trigger should run for. This is
   * only used for UPDATE triggers.
   */
  const Column * const columns;

  /**
   * Array of conditions that must be satisfied for the trigger to
   * run. Must be null terminated strings. Can be null if
   * numConditions is zero.
   */
  const char * const * const conditions;

  /**
   * The number of elements in the conditions array.
   */
  const size_t numConditions;

  /**
   * Array of SQL statements that define the work the trigger does.
   */
  const char * const * const statements;

  /**
   * The number of elements in the statements array. Must be greater
   * than zero.
   */
  const size_t numStatements;
};

/**
 * Describes the sort direction of the index.
 */
enum IndexDirection {
  ASC,
  DESC,
};

struct Index
{
  /**
   * The name of the index. If this index changes between versions,
   * its name should also change.
   */
  const char * const name;

  /**
   * Indicates if this index should enforce uniqueness of its entries
   * or not.
   */
  const bool unique;

  /**
   * Array of columns that make up this index. Must not be null.
   */
  const Column * const columns;

  /**
   * The number of elements in the columns array. Must be greater than
   * zero.
   */
  const size_t numColumns;

  /**
   * The sort order of this index.
   */
  const IndexDirection direction;
};

struct Table
{
  /**
   * The name of the table.
   */
  const char * const name;

  /**
   * The primary key of the table.
   */
  const Index primaryKey;

  /**
   * Array of columns making up this table. Must not be null.
   */
  const Column * const columns;

  /**
   * The number of elements in the columns array. Must be greater than
   * zero.
   */
  const size_t numColumns;

  /**
   * Array of triggers acting on this table. Can be null if
   * numTriggers is zero.
   */
  const Trigger * const triggers;

  /**
   * The number of elements in the triggers array.
   */
  const size_t numTriggers;

  /**
   * Array of indexes on the table. Can be null if numIndexes is zero.
   */
  const Index * const indexes;

  /**
   * The number of elements in the indexes array.
   */
  const size_t numIndexes;
};

struct View
{
  /**
   * The name of the view.
   */
  const char * const name;

  /**
   * The SELECT statement used to populate the view.
   */
  const char * const statement;

  /**
   * Array of triggers acting on this view. Can be null if numTriggers
   * is zero.
   */
  const Trigger * const triggers;

  /**
   * The number of elements in the triggers array.
   */
  const size_t numTriggers;
};

struct MigrationSQL
{
  /**
   * Array of SQL statements to perform at the upgrade step. Must not
   * be null.
   */
  const char * const * const statements;

  /**
   * The number of elements in the statements array. Must be greater
   * than zero.
   */
  const size_t numStatements;
};

struct Version
{
  /**
   * The schema version number of this version. Each version must have
   * a unique number.
   */
  const size_t version;

  /**
   * Array of tables in this version. Must not be null.
   */
  const Table * const tables;

  /**
   * The number of elements in the tables array. Must be greater than
   * zero.
   */
  const size_t numTables;

  /**
   * Array of views in this version. Can be null if numViews is zero.
   */
  const View * const views;

  /**
   * The number of elements in the views array.
   */
  const size_t numViews;

  /**
   * Set of SQL statements to perform to migrate from the previous
   * version to this version. Can be null. These statements are run
   * after the tables have been modified.
   */
  MigrationSQL * const migrateUp;

  /**
   * Set of SQL statements to perform to migrate from this version to
   * the previous version. Can be null. These statements are run
   * after the tables have been modified.
   */
  MigrationSQL * const migrateDown;
};

struct Schema
{
  /**
   * The most current version of the schema.
   */
  const size_t currentVersion;

  /**
   * Array of schema versions for this database. Must not be null.
   */
  const Version * const versions;

  /**
   * The number of elements in the versions array. Must be greater
   * than zero.
   */
  const size_t numVersions;
};

== JavaScript API ==

The JavaScript API will be like the native code one, except that it will
take a JSON string of the schema (or a JS object that looks the same -
it doesn't really matter). The structure of the JSON looks a lot like
the structs in the native code API, but optional things can simply be
left out. Additionally, instead of having objects that then specify
their names inside them, we use the name of the property that holds
the reference to the object. This is hard to explain with words, but I
think it's pretty clear in this sample object (also in syntax
highlighted wiki form here:
https://developer.mozilla.org/User:sdwilsh/SchemaProposal/CookiesSample#JavaScript_Sample).
I worked with Paul O'Shannessy to make this API more JavaScript-friendly.

var schema = {
  currentVersion: 2,
  versions: [
    { version: 1,
      tables: {
        moz_cookies: {
          primaryKey: {
            name: "primary_key_v1",
            columns: [
              "id",
            ],
            direction: StorageIndexDirection.ASC, /* optional */
          },
          columns: {
            id: {
              type: StorageType.INTEGER,
              /* default: someValue, */ /* optional */
            },
            name: {
              type: StorageType.TEXT,
            },
            value: {
              type: StorageType.TEXT,
            },
            host: {
              type: StorageType.TEXT,
            },
            path: {
              type: StorageType.TEXT,
            },
            expiry: {
              type: StorageType.INTEGER,
            },
            isSecure: {
              type: StorageType.INTEGER,
            },
            isHttpOnly: {
              type: StorageType.INTEGER,
            },
          },
          triggers: null, /* optional */
          indexes: null, /* optional */
        },
      },
      views: null, /* optional */
      migrateUp: null, /* optional */
      migrateDown: null, /* optional */
    }, /* end version 1 */
    { version: 2,
      tables: {
        moz_cookies: {
          primaryKey: {
            name: "primary_key_v1",
            columns: [
              "id",
            ],
            direction: StorageIndexDirection.ASC, /* optional */
          },
          columns: {
            id: {
              type: StorageType.INTEGER,
            },
            name: {
              type: StorageType.TEXT,
            },
            value: {
              type: StorageType.TEXT,
            },
            host: {
              type: StorageType.TEXT,
            },
            path: {
              type: StorageType.TEXT,
            },
            expiry: {
              type: StorageType.INTEGER,
            },
            lastAccessed: {
              type: StorageType.INTEGER,
            },
            isSecure: {
              type: StorageType.INTEGER,
            },
            isHttpOnly: {
              type: StorageType.INTEGER,
            },
          },
        },
      },
    }, /* end version 2 */
  ],
};
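For what it's worth, the expected shape of such an object can be
sanity-checked with a few lines of plain JS. This is purely
illustrative and not part of the proposed API:

```javascript
// Illustrative only: checks the basic shape of a schema object like
// the sample above. Not part of the proposed API.
function validateSchema(schema) {
  if (!Array.isArray(schema.versions) || schema.versions.length === 0)
    throw new Error("versions must be a non-empty array");
  var seen = {};
  schema.versions.forEach(function (v) {
    if (seen[v.version])
      throw new Error("duplicate version number: " + v.version);
    seen[v.version] = true;
    if (!v.tables || Object.keys(v.tables).length === 0)
      throw new Error("each version needs at least one table");
  });
  if (!seen[schema.currentVersion])
    throw new Error("currentVersion must match a defined version");
}
```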

== Summary ==

I know this is a lot to digest, and I'm sorry for the length. I'd love
to hear thoughts and feedback on this, as this is something that the
Places team would like to use in 1.9.3, so I need to start working on
implementing parts of this soon.

Cheers,

Shawn

Andrew Sutherland

Oct 20, 2009, 3:31:58 PM
As an aside, the downgrading strategy sounds like a fantastic idea.

Can you provide a brief overview of the actual migration logic that will
be used and the planned limitations?

For example, SQLite's ALTER TABLE support is fairly limited; added
columns are added to the end of the list and dropping columns is
entirely not supported. I assume the sqlite limitation on adds impacts
the order of results that "SELECT * FROM table" returns. The limitation
on dropped columns implies that you are going to have to create a new
table and use an INSERT/SELECT to funnel the data across (which could
also be used to address the add situation too.)

From a consumer perspective, Thunderbird's global database (gloda) is
unlikely to make use of the migration API because when the schema
changes it usually means some piece of data is now critical and we need
to re-process all messages and so blowing away the database is the
correct thing to do.

Thunderbird's address book is looking at a migration to SQLite and it is
a good candidate for this API as it does not have the luxury of
imploding at the drop of a hat. I'll take a look at the schema
definition from that perspective.

Andrew

David Dahl

Oct 20, 2009, 5:13:23 PM
to dev-pl...@lists.mozilla.org
Can you point to any schema migration tools you have researched and may
be influencing your design?

Most SQL migration tools I am familiar with work hand-in-glove with some
kind of DataObject/ORM/Wrapper that the framework uses to do "NoSQL".
From what I can tell, you could end up halfway to some sort of
rudimentary ORM, no?

David

On 10/20/2009 10:52 AM, Shawn Wilsher wrote:
> One of the problems that mozStorage doesn't currently try to address
> is migrating database schemas between changes. As a result, every
> consumer of the API handles it in their own way, each having the
> possibility to introduce subtle bugs.
>
> As a result, I'd like to propose an API to handle this. I've
> determined, however, that using xpidl for this would be rather painful
> for both C++ and JS consumers, so I'm actually going to create one API
> for each (using jsapi to define a js-friendly version for JS
> consumers, and a notxpcom method on mozIStorageConnection for C++
> callers).

...


Mike Shaver

Oct 20, 2009, 5:15:01 PM
to dd...@mozilla.com, dev-pl...@lists.mozilla.org
On Tue, Oct 20, 2009 at 5:13 PM, David Dahl <dd...@mozilla.com> wrote:
> From
> what I can tell, you could end up halfway to some sort of rudimentary ORM,
> no?

Not if we're careful!

Mike

Shawn Wilsher

Oct 20, 2009, 6:26:13 PM
to dev-pl...@lists.mozilla.org
On 10/20/09 2:13 PM, David Dahl wrote:
> Can you point to any schema migration tools you have researched and may
> be influencing your design?
The only tools that I'm aware of are ORM based, which mozStorage isn't.
As a result, I didn't really look to any of them because we don't have
the abstraction tools that they do. While it's true that there is
virtually no SQL in this spec, it doesn't mean this is ORM based. It's
not even really an abstraction since it mirrors the SQL for creating
tables and indexes.

> Most SQL migration tools I am familiar with work hand-in-glove with some
> kind of DataObject/ORM/Wrapper that the framework uses to do "NoSQL".

> From what I can tell, you could end up halfway to some sort of
> rudimentary ORM, no?

I don't feel like we are anywhere near an ORM right now.

Cheers,

Shawn

Shawn Wilsher

Oct 20, 2009, 6:49:40 PM
to dev-pl...@lists.mozilla.org
On 10/20/09 12:31 PM, Andrew Sutherland wrote:
> Can you provide a brief overview of the actual migration logic that will
> be used and the planned limitations?
In general, adding or dropping columns/tables/indexes is pretty simple
between versions (with the issues you brought up, which I'll address
shortly). The upgrade logic I'm looking at goes something like this:
for each version upgrade:
1) Add any tables missing from the database for the new schema.
2) Add any columns missing from each table for the new schema.
3) Add any indexes missing from each table for the new schema.
4) Run SQL statements in migrateUp.
5) Drop any indexes that are no longer needed.
6) Drop any columns that are no longer needed.
7) Drop any tables that are no longer needed.

And the downgrade logic looks something like this:
for each version downgrade:
1) Add any tables missing from the database for the new schema.
2) Add any columns missing from each table for the new schema.
3) Add any indexes missing from each table for the new schema.
4) Run SQL statements in migrateDown.
5) Drop any indexes that are no longer needed.
6) Drop any columns that are no longer needed.
7) Drop any tables that are no longer needed.

This allows for the migration SQL to access data that will be going away
before it does in both instances. I still need to deal with triggers in
this logic, and I'm not totally sure where they should go (likely adding
after step 3 and removing after step 4 in both cases).
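To pin down what the add/drop steps have to compute, here is a rough
plain-JS sketch (purely illustrative; diffTables is not part of the
proposed API) of diffing the tables between two adjacent version
objects shaped like the JS schema sample:

```javascript
// Illustrative only: computes which tables one version step must add
// and which it must drop, given two version objects shaped like the
// JS schema sample. Additions are applied before the migration SQL
// runs and drops after it, so the SQL can still read data that is
// about to go away.
function diffTables(prevVersion, nextVersion) {
  var prev = Object.keys(prevVersion.tables);
  var next = Object.keys(nextVersion.tables);
  return {
    add: next.filter(function (name) { return prev.indexOf(name) < 0; }),
    drop: prev.filter(function (name) { return next.indexOf(name) < 0; }),
  };
}
```

The same kind of diff would be computed for the columns within each
table and for the indexes.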

> For example, SQLite's ALTER TABLE support is fairly limited; added
> columns are added to the end of the list and dropping columns is
> entirely not supported. I assume the sqlite limitation on adds impacts
> the order of results that "SELECT * FROM table" returns. The limitation
> on dropped columns implies that you are going to have to create a new
> table and use an INSERT/SELECT to funnel the data across (which could
> also be used to address the add situation too.)

The ALTER TABLE issue with adding columns is already a concern.
Consumers now just make sure to select the columns they need in the
order they need. I think this isn't really an issue to worry about, and
encouraging the behavior of explicitly calling out the columns you want
to select is a good practice anyway. It means your query is easier to
read and understand, and it also means that if you only use columns that
are indexes, SQLite can optimize it properly. Maybe I'm wrong here, but
I suspect not.

As for dropping a column, I agree that it's an issue. I've talked to
drh in the past, and he indicated they could probably add support for it
even if it just creates a new table internally. I don't think that's a
terribly big deal since column dropping isn't common, and migrating
database schemas is even less common (the common case is going to be
that your database is up to date), so if it's a little expensive, it's
not a terribly big deal.

Cheers,

Shawn

Andrew Sutherland

Oct 20, 2009, 7:46:41 PM
On 10/20/2009 03:49 PM, Shawn Wilsher wrote:
> The ALTER TABLE issue with adding columns is already a concern.
> Consumers now just make sure to select the columns they need in the
> order they need. I think this isn't really an issue to worry about, and
> encouraging the behavior of explicitly calling out the columns you want
> to select is a good practice anyway. It means your query is easier to
> read and understand, and it also means that if you only use columns that
> are indexes, SQLite can optimize it properly. Maybe I'm wrong here, but
> I suspect not.

I agree that it's good practice and a reasonable constraint. Just
wanted to elicit your stance and make it stand out as something to
briefly mention in the documentation. It's moot when accessing results
via column name anyways, which I believe is now possible and advisable
for async too.

> As for dropping a column, I agree that it's an issue. I've talked to drh
> in the past, and he indicated they could probably add support for it
> even if it just creates a new table internally. I don't think that's a
> terribly big deal since column dropping isn't common, and migrating
> database schemas is even less common (the common case is going to be
> that your database is up to date), so if it's a little expensive, it's
> not a terribly big deal.

This also sounds reasonable. So for the purposes of the proposal, is
the answer something like this?:

"Dropping columns is not fully supported; they will be left intact on
upgrade/downgrade. As long as you are explicitly calling out your
columns in your SELECT statements (as we require) this should not pose a
problem to your code when downgrading occurs. If you absolutely need to
drop a column, please file an enhancement request and/or work around the
problem by creating a new table, using an INSERT/SELECT statement to
transfer the data to the new table, and remove the old table."
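As a sketch, that workaround expressed as a set of migration
statements might look like this (the moz_example table and its columns
are made up for illustration):

```javascript
// Illustrative only: dropping a column by rebuilding the table, since
// SQLite's ALTER TABLE cannot drop columns. The table and column
// names here are hypothetical.
var migrateUp = {
  statements: [
    // 1. Create the replacement table without the dropped column.
    "CREATE TABLE moz_example_new (id INTEGER PRIMARY KEY, name TEXT)",
    // 2. Funnel the surviving data across.
    "INSERT INTO moz_example_new (id, name) SELECT id, name FROM moz_example",
    // 3. Remove the old table and take over its name.
    "DROP TABLE moz_example",
    "ALTER TABLE moz_example_new RENAME TO moz_example",
  ],
};
```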

And presumably if places needs it you can log the enhancement and have
drh do the legwork.

Andrew

Shawn Wilsher

Oct 20, 2009, 8:04:17 PM
to dev-pl...@lists.mozilla.org
On 10/20/09 4:46 PM, Andrew Sutherland wrote:
> I agree that it's good practice and a reasonable constraint. Just wanted
> to elicit your stance and make it stand out as something to briefly
> mention in the documentation. It's moot when accessing results via
> column name anyways, which I believe is now possible and advisable for
> async too.
Sort of. It's a bit painful to do in native code still since you get an
nsIVariant back. This is fixable though.

> This also sounds reasonable. So for the purposes of the proposal, is the
> answer something like this?:
>
> "Dropping columns is not fully supported; they will be left intact on
> upgrade/downgrade. As long as you are explicitly calling out your
> columns in your SELECT statements (as we require) this should not pose a
> problem to your code when downgrading occurs. If you absolutely need to
> drop a column, please file an enhancement request and/or work around the
> problem by creating a new table, using an INSERT/SELECT statement to
> transfer the data to the new table, and remove the old table."

I think that the answer is that if you drop a column it may be slow
because we have to copy table data. I think having the right effects
here is more important than performance since this is such an edge case
anyway (and long run, will be done async!).

Cheers,

Shawn

Justin Dolske

Oct 21, 2009, 4:32:16 PM
On 10/20/09 10:52 AM, Shawn Wilsher wrote:
> One of the problems that mozStorage doesn't currently try to address is
> migrating database schemas between changes. As a result, every consumer
> of the API handles it in their own way, each having the possibility to
> introduce subtle bugs.

I agree the current situation isn't ideal, but I'm not entirely
convinced a complicated new API is the solution. Indeed, I think most of
the risk of schema upgrade/downgrades is just the inherent risk of
having code with version X looking at data with version Y, and the
subtle interactions therein.

> The rest of this post is going to be a bit lengthy, but it contains the
> specification as I've worked it out so far.

tl;dr -- Is there a way for code to specify custom data values for each
row when upgrading/downgrading?

> This API is the
> first step to create an asynchronous API to open a database file, and
> getting consumers using this will make opening databases asynchronously
> easier.

If the plan is to make this async eventually, it would probably be less
pain to just do that now, instead of rewriting this all again later and
breaking code using it (or maintaining more code).

Justin

Justin Dolske

Oct 21, 2009, 4:35:25 PM
On 10/20/09 3:49 PM, Shawn Wilsher wrote:

> The upgrade logic I'm looking at goes something like this:
> for each version upgrade:

...


> 5) Drop any indexes that are no longer needed.
> 6) Drop any columns that are no longer needed.
> 7) Drop any tables that are no longer needed.

Doesn't this break forwards compatibility for old code?

EG, User upgrades from Firefox 5 to Firefox 6, and you drop some stuff
from some DB. Now they go back to Firefox 5...

> And the downgrade logic looks something like this:

> 1) Add any tables missing from the database for the new schema.
> 2) Add any columns missing from each table for the new schema.
> 3) Add any indexes missing from each table for the new schema.

...how do you restore the missing *data*?

Justin

Shawn Wilsher

Oct 21, 2009, 4:50:38 PM
to dev-pl...@lists.mozilla.org
On 10/21/09 1:32 PM, Justin Dolske wrote:
> I agree the current situation isn't ideal, but I'm not entirely
> convinced a complicated new API is the solution. Indeed, I think most of
> the risk of schema upgrade/downgrades is just the inherent risk of
> having code with version X looking at data with version Y, and the
> subtle interactions therein.
The API is more verbose than it is complicated (and it is substantially
less verbose for JavaScript than it is for C++). Additionally, it
removes lots of duplicated code that is all subtly different. Also, the
fact that no consumer handles downgrades now is a bit of a problem.

> tl;dr -- Is there a way for code to specify custom data values for each
> row when upgrading/downgrading?

I presume you are concerned about a specific password manager use-case
with populating a secret key or something, right? I talked to zpao
about this when I was drafting this, and there are a few solutions to
this problem, but it'd have to be done manually (either in a migrateUp
SQL statement, or post-migration).

> If the plan is to make this async eventually, it would probably be less
> pain to just do that now, instead of rewriting this all again later and
> breaking code using it (or maintaining more code).

It'd make it hard to implement in any consumer right now. The async
method will take the same data structure for the schema, but in an
openDatabase call. Doing this + async open is too much work to bite off
for now, so this is a natural first step.

Cheers,

Shawn

Shawn Wilsher

Oct 21, 2009, 4:52:34 PM
to dev-pl...@lists.mozilla.org
On 10/21/09 1:35 PM, Justin Dolske wrote:
> Doesn't this break forwards compatibility for old code?
>
> EG, User upgrades from Firefox 5 to Firefox 6, and you drop some stuff
> from some DB. Now they go back to Firefox 5...
It shouldn't. This is what the migrateUp/migrateDown SQL steps are for.

>> And the downgrade logic looks something like this:
>> 1) Add any tables missing from the database for the new schema.
>> 2) Add any columns missing from each table for the new schema.
>> 3) Add any indexes missing from each table for the new schema.
>
> ...how do you restore the missing *data*?

You'd handle that in migrateDown SQL statements.

Cheers,

Shawn

Justin Dolske

Oct 21, 2009, 7:47:31 PM
On 10/21/09 1:50 PM, Shawn Wilsher wrote:
> Also, the
> fact that no consumer handles downgrades now is a bit of a problem.

Aroo? How are you defining "downgrades"?

Password Manager and Satchel both handle it fine, you can take their DBs
from a trunk profile and use them in FF3.5 (or even recent FF3.0
updates, in the case of satchel). I thought this was also true of Places?

> I presume you are concerned about a specific password manager use-case
> with populating a secret key or something, right?

Well, more generally about setting values derived from other things. IE,
where a default column value isn't appropriate.

> It'd make it hard to implement in any consumer right now. The async
> method will take the same data structure for the schema, but in an
> openDatabase call. Doing this + async open is too much work to bite off
> for now, so this is a natural first step.

Why not do async open first, then? It's an independent problem, no?

It sounds rather painful to rewrite consumers to use this API, and then
rewrite them yet again for the async API. Code churn inevitably leads to
bugs, which is a little ironic given the reason for doing this. :)

Justin

Justin Dolske

Oct 21, 2009, 7:51:54 PM
On 10/21/09 1:52 PM, Shawn Wilsher wrote:

>> ...how do you restore the missing *data*?
> You'd handle that in migrateDown SQL statements.

How? The data is gone, and may not be derivable from other columns.

Justin

Andrew Sutherland

Oct 21, 2009, 8:40:47 PM
On 10/21/2009 04:47 PM, Justin Dolske wrote:
> It sounds rather painful to rewrite consumers to use this API, and then
> rewrite them yet again for the async API. Code churn inevitably leads to
> bugs, which is a little ironic given the reason for doing this. :)

I would assume that in the async case the client code would only be
using storage's async API to interact with the database. I additionally
assume that storage would make sure that all async statements issued by
client code would not hit SQLite until the migration is complete. This
would allow client code to be completely ignorant of the migration.

Presumably the async migration API would also provide support for a
callback when migration is supported so that more clever client code
could provide some user indication that functionality dependent on the
database will have to wait a bit.

Andrew

Andrew Sutherland

Oct 21, 2009, 8:42:49 PM
On 10/21/2009 05:40 PM, Andrew Sutherland wrote:
> Presumably the async migration API would also provide support for a
> callback when migration is supported so that more clever client code
> could provide some user indication that functionality dependent on the
> database will have to wait a bit.

s/migration is supported/migration is completed/.

Andrew

Shawn Wilsher

Oct 21, 2009, 10:14:45 PM
to dev-pl...@lists.mozilla.org
On 10/21/09 4:47 PM, Justin Dolske wrote:
> Aroo? How are you defining "downgrades"?
Well, downgrades are highly limited right now. You can't drop columns
in future versions because once you downgrade, it'd be useless. Same
with tables. There are other constraints as well. This migrator allows
for more flexible changes to databases.

> Password Manager and Satchel both handle it fine, you can take their DBs
> from a trunk profile and use them in FF3.5 (or even recent FF3.0
> updates, in the case of satchel). I thought this was also true of Places?

Yeah, but places can't make some changes because of backwards
compatibility, whereas this would mean more options are open.

> Well, more generally about setting values derived from other things. IE,
> where a default column value isn't appropriate.

Right, this is what zpao and I had talked about and made sure was
solvable. It's not a common case, so it's not exactly easy, but it's
doable.

> Why not do async open first, then? It's an independent problem, no?

Andrew answered this already in later posts.

> It sounds rather painful to rewrite consumers to use this API, and then
> rewrite them yet again for the async API. Code churn inevitably leads to
> bugs, which is a little ironic given the reason for doing this. :)

Honestly, the rewriting here is trivial in most cases. Almost all
consumers have some "migrateDatabase" function, and instead of calling
that, we'd just call the new method on the mozIStorageConnection object
and pass in the schema. The most work will be creating the schema
object, which is pretty trivial IMO.

Cheers,

Shawn

Shawn Wilsher

Oct 21, 2009, 10:16:30 PM
to dev-pl...@lists.mozilla.org
On 10/21/09 4:51 PM, Justin Dolske wrote:
> On 10/21/09 1:52 PM, Shawn Wilsher wrote:
> How? The data is gone, and may not be derivable from other columns.
If that data is not derivable from other columns, you lose, but that's
already the case now. This new API gives you a bit more flexibility for
downgrades than what is currently available.

Cheers,

Shawn

Marco Bonardo

Oct 22, 2009, 3:53:26 AM
On 10/22/2009 1:47 AM, Justin Dolske wrote:
> On 10/21/09 1:50 PM, Shawn Wilsher wrote:
> Password Manager and Satchel both handle it fine, you can take their DBs
> from a trunk profile and use them in FF3.5 (or even recent FF3.0
> updates, in the case of satchel). I thought this was also true of Places?

The Places schema is complex, and its migrations are even more complex.
You could ask why; well, the Places database schema sucks: it was badly
designed with the old history/bookmarks system in mind, with more
thought given to code details than to database normalization rules and
performance. So it is not as performant and good as it could be.

Changing it is clearly what we want, but supporting downgrades will be a
pain, because old versions don't know about new data and tables.
For example, we would like to have a separate table for tags, but if we
move tags out, downgrading to an old version will just lose all tags.

If you asked me, for FX4 I'd just throw away places.sqlite and build new
history.sqlite and bookmarks.sqlite files, without too much care for
downgrades; that's what was done when moving from FX2 to FX3. But that
does not look like the direction most people want.

Hence the request for a nice API to migrate data, making old versions
aware of changes.

Marco

David Dahl

Oct 22, 2009, 2:25:42 PM
to dev-pl...@lists.mozilla.org
Sounds like you might want to use some kind of JSON-based log to handle
resurrection of deleted data, or something like that :)


David Dahl

Oct 22, 2009, 2:32:14 PM
to dev-pl...@lists.mozilla.org
On 10/22/2009 12:53 AM, Marco Bonardo wrote:
>
> If you'd ask me for FX4 i'd just throw away places.sqlite and build
> new history.sqlite and bookmarks.sqlite, without too much care at
> downgrades, that's what was done when moving from FX2 to FX3. But that
> does not look like the direction most people want.
>

I am in the minority as well, I think a major iteration of Places is in
order for Fx 4. With the UI changing so radically I am not sure if we
will be able to keep the existing schema. We will see. I think the
existing schema is going to hold back any kind of bold moves in Places.

d

Justin Dolske

unread,
Oct 25, 2009, 6:47:43 PM10/25/09
to
On 10/21/09 7:16 PM, Shawn Wilsher wrote:

>> How? The data is gone, and may not be derivable from other columns.
> If that data is not derivable from other columns, you lose, but that's
> already the case now. This new API gives you a bit more flexibility for
> downgrades than what is currently available.

This isn't correct at all. For example:

The form (and soon password) DBs added a "time created" timestamp for
each entry, which is useful for frecency stuff. If you use the DB with
an older version of Firefox, we down-rev the user_version and just
ignore that extra column -- it's not dropped. When you return to a
current version of Firefox, we up-rev user_version and migrate any newly
added entries (ie, where "time created" is null). Other entries retain
their original timestamps. This all works today, and allows safely using
Firefox 3.0 with a formhistory.sqlite from Firefox 3.5 [and then
returning to 3.5 seamlessly].
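
The round-trip Justin describes can be sketched as follows. This is a plain simulation, not real mozStorage or Satchel code; the version numbers, column name, and backfill rule are illustrative assumptions, with `user_version` modeled as a simple property.

```javascript
// Sketch of the down-rev / up-rev pattern: an older app lowers the
// version and ignores the extra column (without dropping it); a newer
// app raises the version and backfills only rows the old app added.

const CURRENT_VERSION = 2; // hypothetical schema that knows timeCreated

// Simulated database: a user_version plus rows that may carry the
// extra column added in the newer schema.
let db = {
  userVersion: 2,
  rows: [{ value: "foo", timeCreated: 1000 }],
};

// Older app: down-rev user_version and simply ignore the extra column.
// Crucially, the column is NOT dropped, so existing data survives.
function openWithOldVersion(db) {
  if (db.userVersion > 1) {
    db.userVersion = 1;
  }
  // Old code inserts rows knowing nothing about timeCreated.
  db.rows.push({ value: "bar" });
}

// Newer app: up-rev user_version and migrate only the newly added
// entries, i.e. those where timeCreated is missing/null.
function openWithNewVersion(db, now) {
  if (db.userVersion < CURRENT_VERSION) {
    for (const row of db.rows) {
      if (row.timeCreated == null) {
        row.timeCreated = now;
      }
    }
    db.userVersion = CURRENT_VERSION;
  }
}

openWithOldVersion(db);
openWithNewVersion(db, 2000);
```

The key design point is that the downgrade path is purely passive: nothing is rewritten, so the original timestamps survive the excursion to the old version.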

Having an API that drops columns when downgrading would be a big step
backward. I wouldn't use it because it's destroying user data.

Justin

Justin Dolske

unread,
Oct 25, 2009, 7:10:00 PM10/25/09
to
On 10/21/09 7:14 PM, Shawn Wilsher wrote:

>> Password Manager and Satchel both handle it fine, you can take their DBs
>> from a trunk profile and use them in FF3.5 (or even recent FF3.0
>> updates, in the case of satchel). I thought this was also true of Places?
> Yeah, but places can't make some changes because of backwards
> compatibility, whereas this would mean more options are open.

I would argue that something with Places' complexity and uniqueness
could be expected to handle its own schema migration. Designing an API
to fit Places needs will likely result in an API that's great for Places
but not for anything else. :)

>> Well, more generally about setting values derived from other things. IE,
>> where a default column value isn't appropriate.
> Right, this is what zpao and I had talked about and made sure was
> solvable. It's not a common case, so it's not exactly easy, but it's
> doable.

I think these are fairly basic things people would want to do, so I
wouldn't be so quick to dismiss it.

> Honestly, the rewriting here is trivial in most cases.

Famous last words, especially after saying above that switching to this
API is "not exactly easy" for 2 consumers in Firefox!

Justin

Andrew Sutherland

unread,
Oct 25, 2009, 7:30:32 PM10/25/09
to
On 10/25/2009 04:10 PM, Justin Dolske wrote:
> I would argue that something with Places' complexity and uniqueness
> could be expected to handle its own schema migration. Designing an API
> to fit Places needs will likely result in an API that's great for Places
> but not for anything else. :)

The 'API' is basically a machine-readable (without SQL parsing)
definition of the schema. The only thing that is not schema are the
opaque SQL statements used to upgrade or downgrade.
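
As a rough sketch of what "machine-readable schema plus opaque migration SQL" could look like, here is a hypothetical declarative definition in JS. All property names and the table layout are invented for illustration; the actual proposal's struct layout is not specified in this part of the thread.

```javascript
// Hypothetical declarative schema: pure structure, plus opaque SQL
// strings for the version transitions. The engine never needs to parse
// the SQL, only to run the right statements for a given version step.
const schema = {
  version: 3,
  tables: {
    moz_formhistory: {
      columns: {
        id: "INTEGER PRIMARY KEY",
        fieldname: "TEXT NOT NULL",
        value: "TEXT NOT NULL",
        timeCreated: "INTEGER", // added in version 3
      },
    },
  },
  // Opaque SQL run when moving up to a given version.
  migrateUp: {
    3: ["ALTER TABLE moz_formhistory ADD COLUMN timeCreated INTEGER"],
  },
  // Opaque SQL run when moving down to a given version; an empty list
  // means the newer additions are left in place untouched.
  migrateDown: {
    2: [],
  },
};
```

Nothing in a definition like this is Places-specific; anything a consumer cannot express structurally goes into the opaque statements.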

What about that is Places specific and how would you avoid it crippling
the API?

Andrew

Andrew Sutherland

unread,
Oct 25, 2009, 7:43:52 PM10/25/09
to
On 10/25/2009 03:47 PM, Justin Dolske wrote:
> The form (and soon password) DBs added a "time created" timestamp for
> each entry, which is useful for frecency stuff. If you use the DB with
> an older version of Firefox, we down-rev the user_version and just
> ignore that extra column -- it's not dropped. When you return to a
> current version of Firefox, we up-rev user_version and migrate any newly
> added entries (ie, where "time created" is null). Other entries retain
> their original timestamps. This all works today, and allows safely using
> Firefox 3.0 with a formhistory.sqlite from Firefox 3.5 [and then
> returning to 3.5 seamlessly].

This sounds like you would like the ability to annotate that certain new
columns / tables are harmless to previous schema versions that you
explicitly call out.

So Firefox 3.5's schema version could indicate that the "time created"
column is harmless to Firefox 3.0. However, if the user managed to
downgrade to 2.0 (this is all obviously hypothetical and impossible),
that column might get dropped because the assumption is that you would
have marked it harmless if it was harmless.

For your upgrade to 3.5, would a SQL statement be sufficient to upgrade
"time created", or do you need to run C++/JS logic?

Andrew

Shawn Wilsher

unread,
Oct 25, 2009, 11:38:37 PM10/25/09
to dev-pl...@lists.mozilla.org
On 10/25/09 4:10 PM, Justin Dolske wrote:
> I would argue that something with Places' complexity and uniqueness
> could be expected to handle its own schema migration. Designing an API
> to fit Places needs will likely result in an API that's great for Places
> but not for anything else. :)
The API was designed for the general case, and then Marco made sure
Places could actually use it. He found a few things that I forgot to
support, but nothing major changed.

> I think these are fairly basic things people would want to do, so I
> wouldn't be so quick to dismiss it.

Sure, but we have one concrete case where this comes up, and that's it.
Right now it sounds like an edge case, and it's an edge case with a
workaround.

> Famous last words, especially after saying above that switching to this
> API is "not exactly easy" for 2 consumers in Firefox!

As Andrew mentioned, this is just a schema specification. The same
data structures will be used with the new API, so there really is no
major change needed here.

Cheers,

Shawn

Dave Townsend

unread,
Nov 10, 2009, 4:38:29 PM11/10/09
to
On 25/10/2009 16:10, Justin Dolske wrote:
> On 10/21/09 7:14 PM, Shawn Wilsher wrote:
>>> Well, more generally about setting values derived from other things. IE,
>>> where a default column value isn't appropriate.
>> Right, this is what zpao and I had talked about and made sure was
>> solvable. It's not a common case, so it's not exactly easy, but it's
>> doable.
>
> I think these are fairly basic things people would want to do, so I
> wouldn't be so quick to dismiss it.

As best I can tell, the current proposal is only useful if your
migrations (up or down) involve nothing but changing data structures.
As it stands, it is not something I'd be likely to use for the extension
manager. We have changed the data schema in the RDF there a couple of
times in the past, and this proposal wouldn't have helped in any of
those cases, since each time we were adding new data that we had to
extract from the extensions during migration.

Automatic application of a schema is nice, but for it to really be
useful to me I'd probably need the ability to pass a callback to be
run at the migrateUp or migrateDown stage.
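
The hook Dave asks for could look like the following; the function and property names (`runMigrations`, `onMigrateUp`) are invented for illustration, and the extension-scanning step stands in for arbitrary consumer logic that cannot be expressed as SQL.

```javascript
// Sketch of per-step migration callbacks: alongside any declarative
// schema changes, the consumer can run arbitrary code at each upgrade
// step, e.g. to compute new data from sources outside the database.
function runMigrations(db, steps, targetVersion) {
  while (db.userVersion < targetVersion) {
    const next = db.userVersion + 1;
    const step = steps[next];
    if (step && typeof step.onMigrateUp === "function") {
      step.onMigrateUp(db); // consumer-supplied logic, not just SQL
    }
    db.userVersion = next;
  }
}

const db = { userVersion: 1, extensions: [{ id: "example@addon" }] };
runMigrations(
  db,
  {
    2: {
      onMigrateUp(db) {
        // e.g. extract extra data from each installed extension,
        // which no SQL statement over the old tables could derive.
        for (const ext of db.extensions) {
          ext.scanned = true;
        }
      },
    },
  },
  2
);
```

A symmetric `onMigrateDown` hook would serve the downgrade direction in the same way.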
