Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Databases schemas

57 views
Skip to first unread message

Fox on the run

unread,
Jul 27, 2008, 11:44:27 AM7/27/08
to
Are there any references out there (authoritative ones especially, but
will take whatever's out there) on the various databases used by FF3
and the relationship between the fields from the various tables within
each of those databases and ideally how they get populated? I
searched current postings for database schema and found one reference
to a URL from a user but that link is now dead.

Thanks,

JB

Shawn Wilsher

unread,
Jul 27, 2008, 2:12:52 PM7/27/08
to Fox on the run
As far as I know, this doesn't exist. If you have any specific
questions/issues/concerns, I'd be happy to help you get the information
that you need.

Cheers,

Shawn

Fox on the run

unread,
Jul 27, 2008, 4:58:10 PM7/27/08
to

I'm posting my reply to Shawn. I replied to it via my email account
but it replied to him directly, not to the list. Including my reply
on the list in case others have some contributions to add to the
discussion.
-----------
Thanks Shawn. Are you one of the developers by any chance? I'm
trying to understand the significance of the various fields within the
tables, and the ones that point to a unique field in another table
(such as moz_annos.places_id=moz_places.id). Some are obvious like my
previous example. Others are less obvious (I figured out today that
moz_bookmarks.ftk = moz_places.id). As for the field contents, most
are fairly obvious. But some aren't such as visit_type in
moz_historyvisits. I've did some testing and have noted some pattern,
but not enough to make any definitive conclusions yet. I noticed that
a typed URI in moz_places will have typed=1. One that is not typed
will have typed=0. But if it is typed at some point in time, typed
becomes 1. However that does not allow one to determine if it ever
was 0 before thus giving the impression that all entries in
moz_historyvisits pointing to that moz_places entry are typed URI's.
Having said that this is where I notice that
moz_historyvisits.visit_type differs between a typed URI and one that
is not typed. moz_historyvisits.session appears to increment
everytime you restart FF3. I haven't done any testing yet to see what
would happen if I had a second instance of FF 3 running. I'll test
that later. I haven't quite figured out moz_anno_attributes and
moz_items_annos. And I can't seem to figure out what sqlite_sequence
does.

I'll also be attacking the other database files in a similar fashion.
I've looked at them briefly and note that most are straight forward.
But I'm sure as I dig into some of them other questions will arise.

I must say that overall I am really enjoying FF 3. I like digging
under the hood to understand what is happening behind the scenes.
What activities caused what entries. I did quite a bit of digging
around with FF 2 and had a pretty good handle on things. FF 3 brings
new challenges but I'm enjoying learning about sqlite.

The other thing I noted was that FF 3 appears to be using Epoch time
in UTC format when storing such info in places.sqlite (and I expect
the same will apply for the others but I won't know until I dig deeper
in the other databases). However it uses it to a greater precision,
with 6 numbers beyond the standard Epoch time, so to the millionth of
a second I guess. Is this done to ensure greater likelihood of having
unique values in those date/time fields?

I've found the link on the Mozilla site that explains
moz_places.frecency. I'd love to see such explanations for the other
fields within the tables. The moz_places.hidden field is another
example. I can see essentially what causes it. I haven't dug deep
enough to see if it's possible to link those hidden URIs to the main
URI a person would have visited to populate those hidden entries. And
although I can see how moz_places.rev_host is obtained, am I correct
in guessing that it's used to improving indexing and thus seeking
information in places. Makes sense given that if you sorted in the
normal order, with most everything starting with http://www that you'd
likely end up taking a performance hit in trying to sort and seek on
the URI in that fashion.

Thanks for any help you can provide.

Jacques

Edward Lee

unread,
Jul 31, 2008, 3:55:20 AM7/31/08
to
At least for places, there's this schema diagram which might not be too
up-to-date, but should give a better picture:

http://dietrich.ganx4.com/mozilla/places-erd.png

--
Ed

Fox on the run

unread,
Aug 1, 2008, 1:35:42 AM8/1/08
to

Thanks Ed. I've seen that one before. One thing it lacks (for my
needs) is the relationship between the fields from the various tables
(i.e. moz_historyvisits.place_id = moz_places.id) in order to writer
proper SQL join statements.

Thanks,
JB

Fox on the run

unread,
Aug 16, 2008, 1:19:39 AM8/16/08
to
On Jul 27, 2:12 pm, Shawn Wilsher <sdwi...@mozilla.com> wrote:
> Fox on the run wrote:> Are there any references out there (authoritative ones especially, but
> > will take whatever's out there) on the various databases used by FF3
> > and the relationship between the fields from the various tables within
> > each of those databases and ideally how they get populated?  I
> > searched current postings for databaseschemaand found one reference

> > to a URL from a user but that link is now dead.
>
> As far as I know, this doesn't exist.  If you have any specific
> questions/issues/concerns, I'd be happy to help you get the information
> that you need.
>
> Cheers,
>
> Shawn

Hi Shawn,

Any luck digging out answers to some of my questions? I tried
replying to your email you sent me directly but it came back as a
permanent failure whereas it was working at the end of July.

Thanks,

JB

Shawn Wilsher

unread,
Aug 23, 2008, 12:36:39 AM8/23/08
to Fox on the run
Fox on the run wrote:
Wow, it's almost been a month since you posted. Sorry this took so long.

So, most of our databases are pretty easy to follow along with. The
most difficult one has got to be places. Before I go any further
though, I just want to say that nobody should modify the data in these
tables and should stick with the API's given by the interface(s) that
use the database. If it's lacking something, file a bug to get
something added.

With that out of the way, let's talk about places, shall we? The
starting point is right here:
http://dietrich.ganx4.com/mozilla/places-erd.png
I know Edward already posted about this, but it's been recently updated.
Let's move on to some of your more specific questions now.

> The other thing I noted was that FF 3 appears to be using Epoch time
> in UTC format when storing such info in places.sqlite (and I expect
> the same will apply for the others but I won't know until I dig deeper
> in the other databases). However it uses it to a greater precision,
> with 6 numbers beyond the standard Epoch time, so to the millionth of
> a second I guess. Is this done to ensure greater likelihood of having
> unique values in those date/time fields?

Since the code is written in C++ (well, most of it), we just use a
PRTime value:
http://developer.mozilla.org/en/PRTime

> I've found the link on the Mozilla site that explains
> moz_places.frecency. I'd love to see such explanations for the other
> fields within the tables. The moz_places.hidden field is another
> example. I can see essentially what causes it. I haven't dug deep
> enough to see if it's possible to link those hidden URIs to the main
> URI a person would have visited to populate those hidden entries. And
> although I can see how moz_places.rev_host is obtained, am I correct
> in guessing that it's used to improving indexing and thus seeking
> information in places. Makes sense given that if you sorted in the
> normal order, with most everything starting with http://www that you'd
> likely end up taking a performance hit in trying to sort and seek on
> the URI in that fashion.

Hidden uri's are ones that the user didn't specifically navigate to.
Those tend to include embedded pages and images. It might include
something else, but I'm not 100% sure.

As for rev_host - I was wondering myself the other day what it's used
for. I can't give you an answer at this time - sorry!

If you have any other questions, please feel free to post. I promise
not to take so long to get back to you!

Cheers,

Shawn

Fox on the run

unread,
Aug 23, 2008, 10:41:23 AM8/23/08
to
>  > normal order, with most everything starting withhttp://wwwthat you'd

>  > likely end up taking a performance hit in trying to sort and seek on
>  > the URI in that fashion.
> Hidden uri's are ones that the user didn't specifically navigate to.
> Those tend to include embedded pages and images.  It might include
> something else, but I'm not 100% sure.
>
> As for rev_host - I was wondering myself the other day what it's used
> for.  I can't give you an answer at this time - sorry!
>
> If you have any other questions, please feel free to post.  I promise
> not to take so long to get back to you!
>
> Cheers,
>
> Shawn

Thanks for all that info Shawn. I appreciate that it is recommended
to go through the APIs to pull data out. Currently I'm using SQLite
Manager and learning sqlite to pull the information using sqlite
statements. Eventually I'd like to take the time to learn a suitable
language and possibly use the APIs in that case to access the
information (what language would you suggest). The advantage I
suspect you have (and please correct me if I am wrong) by doing it
directly via sqlite statements is that you can query the database in
different ways not possible by the API. For example if you wanted to
see all visited URIs excluding hidden ones for a given time frame.

Regards that .png image, what do the fk represents (I'm guess the k is
for key). I can see that it's used to indicate keys that have a
relationship to a key in another table (again, correct me if I'm wrong
on that conclusion).

Regards hidden URIs, thanks for confirming what I suspected it was.
As for rev_host, there is an index created on it as follows "CREATE
INDEX moz_places_hostindex ON moz_places (rev_host)" which is why I'm
guessing it's done to make each entry more unique than sorting on the
URI in regular order (rather than reverse order) thus making it more
useful and more effective. But I'm only making an educated guess on
that one.

Thanks,

JB

Shawn Wilsher

unread,
Aug 27, 2008, 2:42:00 PM8/27/08
to Fox on the run
Fox on the run wrote:
> Thanks for all that info Shawn. I appreciate that it is recommended
> to go through the APIs to pull data out. Currently I'm using SQLite
> Manager and learning sqlite to pull the information using sqlite
> statements. Eventually I'd like to take the time to learn a suitable
> language and possibly use the APIs in that case to access the
> information (what language would you suggest). The advantage I
> suspect you have (and please correct me if I am wrong) by doing it
> directly via sqlite statements is that you can query the database in
> different ways not possible by the API. For example if you wanted to
> see all visited URIs excluding hidden ones for a given time frame.
The grounding for my remark is mostly regarding writing to the database.
Once you start reading with sql queries, it get's pretty easy to start
writing too, and that can break what the code expects.

> Regards that .png image, what do the fk represents (I'm guess the k is
> for key). I can see that it's used to indicate keys that have a
> relationship to a key in another table (again, correct me if I'm wrong
> on that conclusion).

fk is for foreign key.

Cheers,

Shawn

Dietrich Ayala

unread,
Sep 4, 2008, 2:55:35 PM9/4/08
to Fox on the run, dev-apps...@lists.mozilla.org

This allows us to do flexible partial match searches on hostnames while still being able to utilize
the index on that column. More details are available in the code:

http://mxr.mozilla.org/mozilla-central/search?string=tfosorcim

0 new messages