Thanks,
JB
Cheers,
Shawn
I'm posting my reply to Shawn. I replied to it via my email account
but it replied to him directly, not to the list. Including my reply
on the list in case others have some contributions to add to the
discussion.
-----------
Thanks Shawn. Are you one of the developers by any chance? I'm
trying to understand the significance of the various fields within the
tables, and the ones that point to a unique field in another table
(such as moz_annos.places_id=moz_places.id). Some are obvious like my
previous example. Others are less obvious (I figured out today that
moz_bookmarks.ftk = moz_places.id). As for the field contents, most
are fairly obvious. But some aren't such as visit_type in
moz_historyvisits. I've did some testing and have noted some pattern,
but not enough to make any definitive conclusions yet. I noticed that
a typed URI in moz_places will have typed=1. One that is not typed
will have typed=0. But if it is typed at some point in time, typed
becomes 1. However that does not allow one to determine if it ever
was 0 before thus giving the impression that all entries in
moz_historyvisits pointing to that moz_places entry are typed URI's.
Having said that this is where I notice that
moz_historyvisits.visit_type differs between a typed URI and one that
is not typed. moz_historyvisits.session appears to increment
everytime you restart FF3. I haven't done any testing yet to see what
would happen if I had a second instance of FF 3 running. I'll test
that later. I haven't quite figured out moz_anno_attributes and
moz_items_annos. And I can't seem to figure out what sqlite_sequence
does.
I'll also be attacking the other database files in a similar fashion.
I've looked at them briefly and note that most are straight forward.
But I'm sure as I dig into some of them other questions will arise.
I must say that overall I am really enjoying FF 3. I like digging
under the hood to understand what is happening behind the scenes.
What activities caused what entries. I did quite a bit of digging
around with FF 2 and had a pretty good handle on things. FF 3 brings
new challenges but I'm enjoying learning about sqlite.
The other thing I noted was that FF 3 appears to be using Epoch time
in UTC format when storing such info in places.sqlite (and I expect
the same will apply for the others but I won't know until I dig deeper
in the other databases). However it uses it to a greater precision,
with 6 numbers beyond the standard Epoch time, so to the millionth of
a second I guess. Is this done to ensure greater likelihood of having
unique values in those date/time fields?
I've found the link on the Mozilla site that explains
moz_places.frecency. I'd love to see such explanations for the other
fields within the tables. The moz_places.hidden field is another
example. I can see essentially what causes it. I haven't dug deep
enough to see if it's possible to link those hidden URIs to the main
URI a person would have visited to populate those hidden entries. And
although I can see how moz_places.rev_host is obtained, am I correct
in guessing that it's used to improving indexing and thus seeking
information in places. Makes sense given that if you sorted in the
normal order, with most everything starting with http://www that you'd
likely end up taking a performance hit in trying to sort and seek on
the URI in that fashion.
Thanks for any help you can provide.
Jacques
http://dietrich.ganx4.com/mozilla/places-erd.png
--
Ed
Thanks Ed. I've seen that one before. One thing it lacks (for my
needs) is the relationship between the fields from the various tables
(i.e. moz_historyvisits.place_id = moz_places.id) in order to writer
proper SQL join statements.
Thanks,
JB
Hi Shawn,
Any luck digging out answers to some of my questions? I tried
replying to your email you sent me directly but it came back as a
permanent failure whereas it was working at the end of July.
Thanks,
JB
So, most of our databases are pretty easy to follow along with. The
most difficult one has got to be places. Before I go any further
though, I just want to say that nobody should modify the data in these
tables and should stick with the API's given by the interface(s) that
use the database. If it's lacking something, file a bug to get
something added.
With that out of the way, let's talk about places, shall we? The
starting point is right here:
http://dietrich.ganx4.com/mozilla/places-erd.png
I know Edward already posted about this, but it's been recently updated.
Let's move on to some of your more specific questions now.
> The other thing I noted was that FF 3 appears to be using Epoch time
> in UTC format when storing such info in places.sqlite (and I expect
> the same will apply for the others but I won't know until I dig deeper
> in the other databases). However it uses it to a greater precision,
> with 6 numbers beyond the standard Epoch time, so to the millionth of
> a second I guess. Is this done to ensure greater likelihood of having
> unique values in those date/time fields?
Since the code is written in C++ (well, most of it), we just use a
PRTime value:
http://developer.mozilla.org/en/PRTime
> I've found the link on the Mozilla site that explains
> moz_places.frecency. I'd love to see such explanations for the other
> fields within the tables. The moz_places.hidden field is another
> example. I can see essentially what causes it. I haven't dug deep
> enough to see if it's possible to link those hidden URIs to the main
> URI a person would have visited to populate those hidden entries. And
> although I can see how moz_places.rev_host is obtained, am I correct
> in guessing that it's used to improving indexing and thus seeking
> information in places. Makes sense given that if you sorted in the
> normal order, with most everything starting with http://www that you'd
> likely end up taking a performance hit in trying to sort and seek on
> the URI in that fashion.
Hidden uri's are ones that the user didn't specifically navigate to.
Those tend to include embedded pages and images. It might include
something else, but I'm not 100% sure.
As for rev_host - I was wondering myself the other day what it's used
for. I can't give you an answer at this time - sorry!
If you have any other questions, please feel free to post. I promise
not to take so long to get back to you!
Cheers,
Shawn
Thanks for all that info Shawn. I appreciate that it is recommended
to go through the APIs to pull data out. Currently I'm using SQLite
Manager and learning sqlite to pull the information using sqlite
statements. Eventually I'd like to take the time to learn a suitable
language and possibly use the APIs in that case to access the
information (what language would you suggest). The advantage I
suspect you have (and please correct me if I am wrong) by doing it
directly via sqlite statements is that you can query the database in
different ways not possible by the API. For example if you wanted to
see all visited URIs excluding hidden ones for a given time frame.
Regards that .png image, what do the fk represents (I'm guess the k is
for key). I can see that it's used to indicate keys that have a
relationship to a key in another table (again, correct me if I'm wrong
on that conclusion).
Regards hidden URIs, thanks for confirming what I suspected it was.
As for rev_host, there is an index created on it as follows "CREATE
INDEX moz_places_hostindex ON moz_places (rev_host)" which is why I'm
guessing it's done to make each entry more unique than sorting on the
URI in regular order (rather than reverse order) thus making it more
useful and more effective. But I'm only making an educated guess on
that one.
Thanks,
JB
> Regards that .png image, what do the fk represents (I'm guess the k is
> for key). I can see that it's used to indicate keys that have a
> relationship to a key in another table (again, correct me if I'm wrong
> on that conclusion).
fk is for foreign key.
Cheers,
Shawn
This allows us to do flexible partial match searches on hostnames while still being able to utilize
the index on that column. More details are available in the code:
http://mxr.mozilla.org/mozilla-central/search?string=tfosorcim