Global Database Proposal Draft

Andrew Sutherland

Jun 10, 2008, 5:14:11 AM
To help drive Bryan Clark's proposed Experimental Message View (and
because it would be nice to have one :), creating a global database has
been under discussion.

I have posted a draft proposal here:
http://wiki.mozilla.org/User:Andrew_Sutherland/MailNews/GlobalDatabase

It is at once both too long and not long enough. (I have tried to be
thorough, but the goal is to get a global database in time for 3.0, not
just write a long document that no one will read in time for 3.0.)

My goal is to move forward with developing a global database extension
in the near future (accompanied by an 'experimental message view'
extension), so my greatest interest is in specific feedback about the
schema. Namely, the attribute model, its scalability, and
representation of contacts/identities (for which an arbitrary decision,
though informed by practical experience, has been made). Comments about
potential features built on such a database are probably best made in a
separate thread or the experimental message view thread.

I think responses on the newsgroup are probably the easiest way for
everyone to be part of the conversation, but the wiki discussion pages
or well-annotated in-line changes would probably also work.

Thanks,
Andrew

Kent James

Jun 12, 2008, 2:48:37 PM
The document is a little overwhelming, but it addresses a really
important issue, so let me try to make some small comments.

"Because we are building a layer on top of the folder-based storage,
each message in the global database only needs to 1) store enough
information to locate the message in folder-centric storage plus 2)
whatever information we need about it to act on/filter/group by/etc. in
the global database without touching the message store."

The "whatever information" really needs to be "all information" if you
really want the database to drive saved search views. I've been slowly
fixing various cases where the "all information" was not being
accurately reflected in saved searches (new flag, keywords, junk
status). If you don't include all information that is already
searchable, you're going to create another variation of what I call the
"scope problem". What you describe ends up completely duplicating the
existing Mork folder database - is that your intention?

"Explicit attributes are attributes that are the result of a user action
... This means we should probably go out of our way to find a way to
ensure that explicit attributes (that are not already preserved
else-where) can be backed-up and restored."

We really need to decide if the existing folder databases will be
expanded to be reliable stores of metadata, or subsumed by the global
database. As things stand now, I am sorely tempted to start adding a
SQLite database at the folder level, in parallel with the Mork database,
to backup information that we aren't keeping in the messages themselves
- like junkstatusorigin or junkscore - that are needed for reliably
adding more complex views in the current database/search environment. I
would like to add better views for junk management based on saved
searches, but I cannot currently do that because the data is all blown
away regularly.

Andrew Sutherland

Jun 12, 2008, 5:04:26 PM
Kent James wrote:
> "Because we are building a layer on top of the folder-based storage,
> each message in the global database only needs to 1) store enough
> information to locate the message in folder-centric storage plus 2)
> whatever information we need about it to act on/filter/group by/etc. in
> the global database without touching the message store."
>
> The "whatever information" really needs to be "all information" if you
> really want the database to drive saved search views. I've been slowly
> fixing various cases where the "all information" was not being
> accurately reflected in saved searches (new flag, keywords, junk
> status). If you don't include all information that is already
> searchable, you're going to create another variation of what I call the
> "scope problem". What you describe ends up completely duplicating the
> existing Mork folder database - is that your intention?

Yes, my intention is that the global database index everything that Mork
indexes. If we can efficiently perform a search on a single folder,
there is no reason that we shouldn't be able to do it globally[1]. My
intention is also that we not actually store the message body or
headers/portions of headers that the system does not understand, which
is why I say "whatever information" rather than "all information". I
would expect "whatever information" to grow as we bulk up the features
of the core global database and as other extensions make other data more
relevant.

1: This does not include full-text search, but I don't believe the Mork
index is doing full-text search either.

> "Explicit attributes are attributes that are the result of a user action
> ... This means we should probably go out of our way to find a way to
> ensure that explicit attributes (that are not already preserved
> else-where) can be backed-up and restored."
>
> We really need to decide if the existing folder databases will be
> expanded to be reliable stores of metadata, or subsumed by the global
> database. As things stand now, I am sorely tempted to start adding a
> SQLite database at the folder level, in parallel with the Mork database,
> to backup information that we aren't keeping in the messages themselves
> - like junkstatusorigin or junkscore - that are needed for reliably
> adding more complex views in the current database/search environment. I
> would like to add better views for junk management based on saved
> searches, but I cannot currently do that because the data is all blown
> away regularly.

I think we should be storing the information in the message. I feel it
is an admirable quality of software to try and avoid creating lock-in
situations. We could do worse than SQLite for avoiding lock-in (for
example, Mork :), but I don't see a real reason to not just put things
that have meaning to the user (and I think SPAM flags count) in the
message itself. And spam flags are sufficiently fundamental to
Thunderbird that I think they could be added just like tag flags, so
that we don't need to completely re-write the mbox, just patch the
specific header lines.
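
To make that concrete, here is a rough sketch of the in-place patching
idea (the function and the reserved width are invented for
illustration; only the padded X-Mozilla-Keys header convention is
real):

  // Patch a fixed-width header line in place, the way tag flags live
  // in a padded X-Mozilla-Keys header, so a flag change never forces
  // a full rewrite of the mbox.
  function patchKeywordsHeader(mboxText, headerOffset, keywords) {
    var width = 80; // width reserved when the header was first written
    var line = "X-Mozilla-Keys: " + keywords.join(" ");
    if (line.length > width)
      throw new Error("keywords no longer fit; full rewrite required");
    while (line.length < width)
      line += " "; // pad so the message's byte length is unchanged
    return mboxText.substring(0, headerOffset) +
           line +
           mboxText.substring(headerOffset + width);
  }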

Andrew

Kent James

Jun 12, 2008, 6:14:59 PM
Andrew Sutherland wrote:
>
> I think we should be storing the information in the message.
>
> Andrew

I'm not sure if this is even feasible in the IMAP case. We are currently
using custom IMAP flags, but that pretty much limits you to binary data.
Then there is news, where we don't modify the message at all AFAIK.

Looking ahead to the kinds of things that I would be interested in:
messages get additional metadata from triage activities, association
with projects, and statistical analysis - in addition to the contact and
conversation metadata that you refer to. There could be many kinds of
metadata, each used by only a fraction of the userbase.

It's pretty clear to me that this issue of where and how to store the
authoritative version of metadata needs additional discussion and
planning. If you really expect to store all metadata in messages, then
we need to be planning how to do that. I sincerely doubt, though, that
it is the best approach.

Ron K.

Jun 12, 2008, 6:46:21 PM
Kent James keyboarded, on 6/12/2008 6:14 PM:

> Andrew Sutherland wrote:
>>
>> I think we should be storing the information in the message.
>>
>> Andrew
>
> I'm not sure if this is even feasible in the IMAP case. We are
> currently using custom IMAP flags, but that pretty much limits you to
> binary data. Then there is news, where we don't modify the message at
> all AFAIK.

Right, because NNTP has the messages on the client side only if the
user has selected them for offline reading or has copied them to a
local archival folder. News is like a broadcast, while mail is a
narrowcast where the client takes ownership once delivered. About all
that should be tracked is Date, Subject, Sender, and Message-ID, along
with the References header strings for thread reconstruction (at least
for the portion of the thread a message lived in).

> Looking ahead to the kinds of things that I would be interested in:
> messages get additional metadata from triage activities, association
> with projects, and statistical analysis - in addition to the contact and
> conversation metadata that you refer to. There could be many kinds of
> metadata, each used by only a fraction of the userbase.


--
Ron K.
Who is General Failure, and why is he searching my HDD?
Kernel Restore reported Major Error used BSOD to msg the enemy!

Robert Kaiser

Jun 12, 2008, 6:50:07 PM
Andrew Sutherland wrote:
> Yes, my intention is that the global database index everything that Mork
> indexes. If we can efficiently perform a search on a single folder,
> there is no reason that we shouldn't be able to do it globally[1].

In previous discussions, taken from the POV of replacing Mork with
SQLite, most commenters seemed to prefer per-account storage over
global storage, especially as SQLite can reportedly do reasonably fast
searches across multiple databases.
It would also allow different account types to store different metadata
in addition to some set of common fields, and would allow accounts to
e.g. be backed up separately with all metadata intact.
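
For what it's worth, the SQLite feature in question is ATTACH DATABASE.
A rough sketch, with invented file and table names, of querying two
per-account databases through one connection:

  var Cc = Components.classes, Ci = Components.interfaces;
  var file = Cc["@mozilla.org/file/directory_service;1"]
               .getService(Ci.nsIProperties).get("ProfD", Ci.nsIFile);
  file.append("account1.sqlite"); // invented file name
  var conn = Cc["@mozilla.org/storage/service;1"]
               .getService(Ci.mozIStorageService).openDatabase(file);
  // bring a second account's database into the same connection
  // (in practice a full path to the file)
  conn.executeSimpleSQL("ATTACH DATABASE 'account2.sqlite' AS account2");
  var stmt = conn.createStatement(
    "SELECT subject FROM messages " +
    "UNION ALL SELECT subject FROM account2.messages");
  while (stmt.executeStep())
    dump(stmt.getString(0) + "\n"); // one cross-database result set
  stmt.reset();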

What are the reasons to go global on this now? Is it really better than
per-account?

Robert Kaiser

Andrew Sutherland

Jun 13, 2008, 2:10:41 AM
Robert Kaiser wrote:
> In previous discussions, taken from the POV of replacing Mork with
> SQLite, most commenters seemed to prefer per-account storage over
> global storage, especially as SQLite can reportedly do reasonably fast
> searches across multiple databases.
> It would also allow different account types to store different metadata
> in addition to some set of common fields, and would allow accounts to
> e.g. be backed up separately with all metadata intact.
>
> What are the reasons to go global on this now? Is it really better than
> per-account?

I think the cross-account case is the interesting case for new
functionality and capabilities. The fusion of information from e-mail,
newsgroups, IRC, etc. The intent here is not to replace Mork, but to
have a global database.

I don't think the schema I have proposed has any problem with varying
meta-data per originating account type. Are you thinking of a model
where the database schema is varied based on the account type/user's
preferred meta-data? (In other words, dynamically adding/removing columns.)

My gut reaction to the one-database-per-account for the global database
fusion goal is that it would complicate things too much for a single
layer of abstraction. However, I think we could instantiate one of what
I am calling the 'global database' per account, and then introduce an
additional layer on top that scatters/gathers information from all
those account-global databases. But that's still not going to be a Mork
replacement...

Andrew

Andrew Sutherland

Jun 13, 2008, 3:01:35 AM
Robert Kaiser wrote:
> It would also allow different account types to store different metadata
> in addition to some set of common fields, and would allow accounts to
> e.g. be backed up separately with all metadata intact.

(splitting this into its own issue sub-thread... not sure how the
subject change is going to pan out)

So, I think there are really 3 separate issues coming into play here,
and we want to solve as many of them with as few implementations as possible.
1) File-system back-ups.
2) Profile migration of individual accounts, and technical users who use
this as a means of back-up.
3) Historical archiving.

Elaborations...

1) File-system back-ups. I think a file-system back-up is the least
likely to be affected by how we go about implementing things (ignoring
disk usage for now). If the user's drive crashes, the entire profile is
presumably restored, and possibly the specific version of Thunderbird in
use. How the data is arranged doesn't matter.

Coming back to disk usage, there are things we could do to reduce the
number of deltas perceived by incremental back-up mechanisms (including
disk snapshots).


2) Profile migration of individual accounts, and technical users who use
this as a means of back-up. Examples: A user wants to copy their
personal e-mail account and its meta-data from their desktop to their
laptop, avoiding copying a work-related account. A technical user wants
to rsync only their personal e-mail account to a back-up location,
without backing up all of their junk e-mail accounts, etc.

I think this is the case you are most specifically talking about.


3) Historical archiving. Examples: A user switching to another e-mail
client wants to get their e-mail and their meta-data out of Thunderbird.
Historians from the future recover a Thunderbird e-mail profile and
want to see what e-mails humans used to share, and what meta-data they
associated with it. Corporate governance/government regulations require
e-mail to be archived at intervals for historians/others many years down
the road, when Thunderbird 3.x no longer uses SQLite, etc.

I think historical archiving should be the driving problem that, by
solving it, assists us in solving the 2nd case. I do not think a SQLite
database addresses this problem in an ideal manner. I especially think
a SQLite database schema whose goals are performance and features, and
which is not concerned with maintaining a single schema forever, does
not address this problem at all.

That's why I said in my previous message that I think we should be
storing the information in the message. Unfortunately, I neglected to
elaborate on things (but my message was short enough that no one would
fear to read it...).


Here's my (likely contentious) solution:

Let's store everything on disk in the appropriate standard, but
marked-up using the relevant extensions (and of course augmented by
indices, as Mork does for us today):
* POP/Local Message Folders: easy, we already do this. Perhaps we add
MailDir to the mix.
* IMAP: Save the messages locally, just like it was POP. Add a header
that tells us where it came from.
* RSS/Atom: Save the RSS/Atom entries to disk similar to how we pulled
them down from the web. Use our own namespace and/or extension
mechanisms supported by the standard to inject our annotations and any
origin information (if not explicit in the data-stream).
* IRC/Instant Messaging: Save the logs however we got'em, and/or using a
standard that falls out. (Perhaps XMPP reduced to message payloads.)
* Other: However it came to us, or a dominant standard in the domain.
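
To make the IMAP bullet concrete, a rough sketch (the header name is
invented, not a real Thunderbird header):

  // When saving an IMAP message into local mbox-style storage, prepend
  // a header recording where the copy came from, so it can be traced
  // back to its source folder later.
  function annotateWithOrigin(rawMessage, imapUri) {
    // e.g. imapUri = "imap://user@mail.example.com/INBOX;UID=42"
    return "X-Origin-Location: " + imapUri + "\r\n" + rawMessage;
  }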

I think one could make the argument that this is not completely insane.
Non-mobile computing devices should not be hurting for resources so
badly that this is untenable. Mobile computing devices, by their very
nature, should not be authoritative information stores. What we cannot
push upstream based on the underlying protocol (IMAP) should perhaps use
a Weave-like mechanism to propagate the information to an authoritative
data-store (which could itself be a server and not just another
Thunderbird client).

Andrew

Andrew Sutherland

Jun 13, 2008, 3:09:17 AM
Andrew Sutherland wrote:
> Let's store everything on disk in the appropriate standard, but
> marked-up using the relevant extensions (and of course augmented by
> indices, as Mork does for us today):
> * POP/Local Message Folders: easy, we already do this. Perhaps we add
> MailDir to the mix.
> * IMAP: Save the messages locally, just like it was POP. Add a header
> that tells us where it came from.
> * RSS/Atom: Save the RSS/Atom entries to disk similar to how we pulled
> them down from the web. Use our own namespace and/or extension
> mechanisms supported by the standard to inject our annotations and any
> origin information (if not explicit in the data-stream).
> * IRC/Instant Messaging: Save the logs however we got'em, and/or using a
> standard that falls out. (Perhaps XMPP reduced to message payloads.)
> * Other: However it came to us, or a dominant standard in the domain.

Just realized I forgot NNTP, as raised by Ron K. elsewhere on the thread
(news://news.mozilla.org:119/OKSdnUwM67ehOszV...@mozilla.org).

For archival purposes, unless the user has requested the entire
newsgroup to be archived/made available offline, I would propose
archiving only articles that the user has explicitly annotated with
specific meta-data (and reading the message doesn't count). We could
even go so far as to remove all data that isn't annotation or required
to locate the message again in the newsgroup.

Andrew

Andrew Sutherland

Jun 13, 2008, 3:20:54 AM
Kent James wrote:
> Looking ahead to the kinds of things that I would be interested in:
> messages get additional metadata from triage activities, association
> with projects, and statistical analysis - in addition to the contact and

What kind of statistical analysis? One delineation I've tried to make
in the global database proposal is to treat data that can be
automatically re-derived specially.

Namely, the global database only needs to pre-calculate it and store it
because that's the only way to make search based on the analysis
possible. In the 'aggregate data' case, the idea is that analysis is
performed/cached based on demand and usage.

In general, my opinion is that we would only want to treat statistical
data as something we want reliably archived if the analysis concludes
something that the user is implicitly agreeing with. For example, if
Thunderbird automatically marks a message as spam and the user doesn't
correct it, there's a good chance that message is really spam and we
should save that state for all eternity until corrected. Likewise,
automated topic analysis becomes a de facto trait of the message and
should be persisted; it would be confusing for a message to change
topics based on profile migration.

Andrew

Kent James

Jun 13, 2008, 4:18:57 AM
Andrew Sutherland wrote:

> What kind of statistical analysis?
> ...
> In general, my opinion is that we would only want to treat statistical
> data as something we want reliably archived if the analysis concludes
> something that the user is implicitly agreeing with.

I think you are mostly considering junk mail when you say this. I don't
think you can generally say that only metadata with an implicit
agreement has real value. Yes, you didn't say "real value", but you did
say "reliably archived."

There are degrees of "reliable". We are not talking about offline backup
of critical data here, we are talking about the data used for day-to-day
operation of some future sophisticated email program. If my views rely
on statistical data for operation, then I should be able to reasonably
rely on that data being available once created. The fact that I can
regenerate it with 2 hours of computer time does not really help me show
the user my sophisticated statistical view whose data just got blown away.

Just so you understand my experience: the average lifetime of my Mork
folder databases on one account (IMAP to an Exchange server) is about
one day. On my main Local folder, the database was last blown away on
April 25, 2008 - six weeks ago. There is currently no expectation of
reliable storage of data there. We need reliable storage for message
metadata.

But to answer your specific question, one example of per-message
statistical data would be content tokens that exceed some measure of
significance, given the types of classification that I am currently
trying to do on messages. This does not need to be "reliably archived", but it
does need to be actually available without being regenerated daily.

Joshua Cranmer

Jun 13, 2008, 8:30:51 AM
Andrew Sutherland wrote:
> For archival purposes, unless the user has requested the entire
> newsgroup to be archived/made available offline, I would propose
> archiving only articles that the user has explicitly annotated with
> specific meta-data (and reading the message doesn't count). We could
> even go so far as to remove all data that isn't annotation or required
> to locate the message again in the newsgroup.

FWIW, I've considered implementing spam filtering for news, which would
force offline storage if enabled. Unfortunately, I think offline news
has its own can of worms not yet considered...

David Huynh

Jun 17, 2008, 5:04:10 PM
Andrew, I think my Seek 2.0 schemas turned out to be pretty similar to
yours. One thing I haven't seen in the wiki page is how the messages are
connected to the contacts. Maybe I'm missing something...

By the way, the database used in Seek 1.0 is a "graph store" (salvaged
from Exhibit), which is basically a semantic network, and it makes it
super easy to handle the kind of problem you're describing with contacts
with multiple identities. I also have a little expression language to
traverse that graph. You can see some expressions in the facet
definitions here

http://code.google.com/p/simile-seek/source/browse/seek/trunk/src/extension/content/scripts/facet-configurations.js

For example, the "from domain" facet is defined with the expression
".author.domain", which basically hops from the message to the author
and from the author to the domain (of their email address).
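
In case it helps to picture the evaluation, a rough sketch (the graph
API here is invented, not Seek's actual code):

  // Each dot-segment of an expression like ".author.domain" is one hop
  // through the graph, starting from the message node.
  function evaluatePath(graph, startNode, expression) {
    var segments = expression.split(".").filter(function(s) {
      return s.length > 0; // drop the empty leading segment
    });
    var nodes = [startNode];
    for (var i = 0; i < segments.length; i++) {
      var next = [];
      for (var j = 0; j < nodes.length; j++)
        next = next.concat(graph.getObjects(nodes[j], segments[i]));
      nodes = next; // e.g. message -> author(s) -> domain(s)
    }
    return nodes;
  }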

The reason why I'm using SQLite for Seek 2.0 is just performance.
Really, relational databases are so 1990s :) We should all be moving on
to graph stores now. They are conceptually much simpler to deal with,
since the machine takes care of the normalization.

----

On a different note, I expect another big challenge in Seek 2.0 will be
to keep a usable, responsive UI even while the global database is
getting itself sync'ed up with the folder message stores. There should
be a robust notification framework for the UI to keep itself in touch
with the global database's state. Any thought on that?

David

Andrew Sutherland

Jun 17, 2008, 11:25:43 PM
David Huynh wrote:
> Andrew, I think my Seek 2.0 schemas turned out to be pretty similar to
> yours. One thing I haven't seen in the wiki page is how the messages are
> connected to the contacts. Maybe I'm missing something...

The idea is that they go in the messageAttributes table, which is
basically a poor-man's RDF/triple-store. The subject in each row is
always a message (and the conversation it belongs to), the predicate is
the integer reference to a row in the attributeDefinitions table, and
the object is the numeric 'value'.

So we define FROM/TO/CC/etc. attributes and have the value be the id of
the contact/identity. We lose the chance for referential integrity
enforcement and perhaps the query optimizer loses some insight into the
distribution of 'value', but it should work.

The various proposed uses of the attribute tables start from here:
http://wiki.mozilla.org/User:Andrew_Sutherland/MailNews/GlobalDatabase#Fundamental_Attributes
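
As a rough sketch of what that might look like at the storage level
(the column names are my shorthand, not the canonical schema from the
proposal):

  var Cc = Components.classes, Ci = Components.interfaces;
  var file = Cc["@mozilla.org/file/directory_service;1"]
               .getService(Ci.nsIProperties).get("ProfD", Ci.nsIFile);
  file.append("global-messages-db.sqlite"); // invented file name
  var conn = Cc["@mozilla.org/storage/service;1"]
               .getService(Ci.mozIStorageService).openDatabase(file);
  conn.executeSimpleSQL(
    "CREATE TABLE IF NOT EXISTS attributeDefinitions (" +
    " id INTEGER PRIMARY KEY," +
    " name TEXT)"); // the predicate, e.g. 'from', 'to', 'cc'
  conn.executeSimpleSQL(
    "CREATE TABLE IF NOT EXISTS messageAttributes (" +
    " conversationID INTEGER," + // subject: the owning conversation
    " messageID INTEGER," +      // subject: the message itself
    " attributeID INTEGER," +    // predicate: attributeDefinitions id
    " value NUMERIC)");          // object, e.g. a contact/identity id
  // e.g., all messages from contact #7 would then be:
  //   SELECT messageID FROM messageAttributes
  //    WHERE attributeID = :fromAttributeId AND value = 7;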

> By the way, the database used in Seek 1.0 is a "graph store" (salvaged
> from Exhibit), which is basically a semantic network, and it makes it
> super easy to handle the kind of problem you're describing with contacts
> with multiple identities. I also have a little expression language to
> traverse that graph. You can see some expressions in the facet
> definitions here

> The reason why I'm using SQLite for Seek 2.0 is just performance.
> Really, relational databases are so 1990s :) We should all be moving on
> to graph stores now. They are conceptually much simpler to deal with,
> since the machine takes care of the normalization.

Yes, graph stores are awesome, but I have been avoiding them. I must
admit to playing it (somewhat) conservative with the design so far... I
both want to avoid biting off more than we can chew for a Thunderbird
3.0 timeline and to avoid creating a stigma associated with
Thunderbird's global database due to (perceived) over-engineering.

I think you are causing me to re-visit my fear of providing more general
graph-store capabilities, if only because my contact/identity problem
would likely be resolved in a cleaner fashion by a general solution than
the special-case I am basically proposing right now. (With the same SQL
query complexity.)

So then, in your Seek 2.0 graph store schema, do you specialize the
triple-store tables to specific object types (with precedence/object
constraints for ambiguous cases of which table to use), or do you just
have a single table for all uses (where the predicate implies the object
types, or some other form of name-spacing occurs)?


> On a different note, I expect another big challenge in Seek 2.0 will be
> to keep a usable, responsive UI even while the global database is
> getting itself sync'ed up with the folder message stores. There should
> be a robust notification framework for the UI to keep itself in touch
> with the global database's state. Any thought on that?

In terms of keeping in touch:

My tentative plan is to have the global database layer track outstanding
queries it has issued with weak references. When we are making a change
to the global database (due to indexing, explicit user actions,
notifications from Thunderbird proper, etc.), we see if the change
intersects any of the outstanding queries. Since the queries are all
going to be factored to get their data via asynchronous notifications,
they should not be surprised when a new result trickles in. They may be
surprised if something disappears, but hopefully that case will be easy
enough to code for that it actually gets coded for.
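
In very rough pseudo-code (every name here is invented, not an actual
API):

  var liveQueries = []; // the real thing would hold weak references
  function onItemsChanged(changedMessages) {
    // see which outstanding queries the change intersects, and let
    // their listeners hear about it asynchronously
    for (var i = 0; i < liveQueries.length; i++) {
      var query = liveQueries[i];
      var hits = changedMessages.filter(function(msg) {
        return query.test(msg); // is the change within the query?
      });
      if (hits.length > 0)
        query.listener.onItemsModified(hits); // results trickle in
    }
  }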

To avoid pathological performance, we would make sure that our
notifications are batched with some reasonable granularity. We are
aided in this by an increasing number of batch notifications from the
Thunderbird core.

This solution dovetails with my plan for implementation of the proposed
'aggregate data' support. For example, if I want a sparkline graph of
my communication with a given contact by week over the course of several
years (and we e-mail a lot), this may be something that it's advisable
to cache. If the (hand-waving here) aggregate data code decides to
cache it, we will then maintain the cache using the same mechanism. If
we are adding/removing a message involving me and that contact, we update
the aggregate data. Let's assume a decision tree, a user not under
assault by e-mail, etc.
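
A rough sketch of the cache-maintenance half (names invented, and week
granularity assumed):

  // Keep a cached per-week message count for one contact up to date
  // as messages come and go, instead of rescanning for the sparkline.
  var weeklyCounts = {}; // week-start date string -> message count
  function onContactMessageChange(msgDate, delta) { // delta is +1/-1
    var week = new Date(msgDate.getTime());
    week.setDate(week.getDate() - week.getDay()); // back up to Sunday
    var key = week.getFullYear() + "-" + (week.getMonth() + 1) + "-" +
              week.getDate();
    weeklyCounts[key] = (weeklyCounts[key] || 0) + delta;
  }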


In terms of responsiveness:

The reality of the situation is that we have to do most of our work on
the user interface thread. This means that the best we can do is to try
and keep our operations at a sufficiently small granularity and
load-level when the user is actively trying to use Thunderbird. On the
other hand, we should try and go crazy when they aren't doing much. I
suspect there is no nsIUserActivity notification interface with a
'goCrazy' method, yet, but I don't see why we couldn't add one...
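
The slicing half could look something like this (the scheduling
details are invented; only the nsITimer API is real):

  // Run queued indexing jobs in small time slices on the UI thread,
  // yielding between slices so the interface stays responsive.
  function runInSlices(jobQueue) {
    var timer = Components.classes["@mozilla.org/timer;1"]
                  .createInstance(Components.interfaces.nsITimer);
    var slice = {
      notify: function(aTimer) {
        var deadline = Date.now() + 50; // ~50 ms of work per slice
        while (jobQueue.length > 0 && Date.now() < deadline)
          jobQueue.shift()(); // each job indexes one message
        if (jobQueue.length > 0) // yield, then come back for more
          timer.initWithCallback(slice, 100,
            Components.interfaces.nsITimer.TYPE_ONE_SHOT);
      }
    };
    slice.notify(timer);
  }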


Andrew

David Huynh

Jun 20, 2008, 3:14:01 PM
Andrew Sutherland wrote:
> David Huynh wrote:
>> Andrew, I think my Seek 2.0 schemas turned out to be pretty similar to
>> yours. One thing I haven't seen in the wiki page is how the messages are
>> connected to the contacts. Maybe I'm missing something...
>
> The idea is that they go in the messageAttributes table, which is
> basically a poor-man's RDF/triple-store. The subject in each row is
> always a message (and the conversation it belongs to), the predicate is
> the integer reference to a row in the attributeDefinitions table, and
> the object is the numeric 'value'.
>
> So we define FROM/TO/CC/etc. attributes and have the value be the id of
> the contact/identity. We lose the chance for referential integrity
> enforcement and perhaps the query optimizer loses some insight into the
> distribution of 'value', but it should work.
>
> The various proposed uses of the attribute tables start from here:
> http://wiki.mozilla.org/User:Andrew_Sutherland/MailNews/GlobalDatabase#Fundamental_Attributes

I see. Regarding the data schemas, my top criteria are simply
- to have unique and stable identifiers for messages across folders and
across accounts
- to have unique and stable identifiers for threads also across folders
and accounts
- to have first-class support for what you call contacts and identities
I think I can manage everything else.
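
For the first criterion, something as simple as keying on the
Message-ID header would do. A rough sketch, assuming an nsIMsgDBHdr is
at hand:

  // Message-ID survives moves across folders and accounts, unlike
  // msgKey, which is only stable within a single folder's database.
  function globalMessageKey(msgHdr) {
    return "msg:" + msgHdr.messageId; // the RFC 2822 Message-ID
  }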


>> [snip]


>
> I think you are causing me to re-visit my fear of providing more general
> graph-store capabilities, if only because my contact/identity problem
> would likely be resolved in a cleaner fashion by a general solution than
> the special-case I am basically proposing right now. (With the same SQL
> query complexity.)
>
> So then, in your Seek 2.0 graph store schema, do you specialize the
> triple-store tables to specific object types (with precedence/object
> constraints for ambiguous cases of which table to use), or do you just
> have a single table for all uses (where the predicate implies the object
> types, or some other form of name-spacing occurs)?

In Seek 1.0 I did have a graph store, but in Seek 2.0, in order to use
SQLite efficiently, I'm afraid I'll have to use normal relational
tables. Then I'll have to build a graph abstraction on top of that.


>> On a different note, I expect another big challenge in Seek 2.0 will be
>> to keep a usable, responsive UI even while the global database is
>> getting itself sync'ed up with the folder message stores. There should
>> be a robust notification framework for the UI to keep itself in touch
>> with the global database's state. Any thought on that?
>
> In terms of keeping in touch:
>
> My tentative plan is to have the global database layer track outstanding
> queries it has issued with weak references. When we are making a change
> to the global database (due to indexing, explicit user actions,
> notifications from Thunderbird proper, etc.), we see if the change
> intersects any of the outstanding queries. Since the queries are all
> going to be factored to get their data via asynchronous notifications,
> they should not be surprised when a new result trickles in. They may be
> surprised if something disappears, but hopefully that case will be easy
> enough to code for that it actually gets coded for.

I think in some cases async queries are necessary and in others sync
queries are convenient. I'd vote for supporting both and trusting your API
users (like me) to be sane :)


> [snip]


>
> In terms of responsiveness:
>
> The reality of the situation is that we have to do most of our work on
> the user interface thread. This means that the best we can do is to try
> and keep our operations at a sufficiently small granularity and
> load-level when the user is actively trying to use Thunderbird. On the
> other hand, we should try and go crazy when they aren't doing much. I
> suspect there is no nsIUserActivity notification interface with a
> 'goCrazy' method, yet, but I don't see why we couldn't add one...

Seek 1.0 actually breaks its indexing into small jobs, which allows the
progress bar to get updated. It's a little tricky to make sure that
changes to the folder during the indexing process don't get lost.

So, it seems that my new job is getting demanding and I don't have much
free time to do the whole architecture of Seek 2.0. I'd love to be able
to get your early alpha/beta global database extension and build on top
of that. Any prediction on when that might be available for testing?


On a slightly different topic: one of the pain points for building Seek
1.0 was to re-implement the view for the thread tree. This was because
the default implementation was in C++, and so it was impossible for me
to override it partially in Javascript. So I had to pretty much re-do
the whole thing, trying to figure out how to render each column, how to
hook in the shortcut accelerators, etc. It'd be nice if that thread tree
were a lot easier to customize. :)

David

Andrew Sutherland

Jun 24, 2008, 4:03:32 PM
David Huynh wrote:
> So, it seems that my new job is getting demanding and I don't have much
> free time to do the whole architecture of Seek 2.0. I'd love to be able
> to get your early alpha/beta global database extension and build on top
> of that. Any prediction on when that might be available for testing?

Milestone 1 will likely happen July 1-2, with subsequent milestones
hopefully occurring no less frequently than every 2 weeks. I will post
here in the group when I believe it is reached. I think you can also
get an RSS feed from the mercurial repositories I am using, which are
listed in my Milestone 0 announcement
(news://news.mozilla.org:119/Q56dnYF4XIBxpMnV...@mozilla.org).

> On a slightly different topic: one of the pain points for building Seek
> 1.0 was to re-implement the view for the thread tree. This was because
> the default implementation was in C++, and so it was impossible for me
> to override it partially in Javascript. So I had to pretty much re-do
> the whole thing, trying to figure out how to render each column, how to
> hook in the shortcut accelerators, etc. It'd be nice if that thread tree
> were a lot easier to customize. :)

The experimental message view is in the same boat. What has been
created thus far is all JavaScript, and will likely stay that way, owing
to the strict alignment of the C++ code with folder-based message views.
I believe the plan is to not do a strict re-implementation, but to
leverage some of the excellent work already done on bug 213945
(https://bugzilla.mozilla.org/show_bug.cgi?id=213945) to support
multi-line displays in the tree view. Of course, we will want to
maintain as much UI consistency as possible, and perhaps to that end may
be able to build on your also excellent work :)

I will try and keep customization in mind as I do things, but
milestone-wise, we're probably talking milestone 3 or 4 before explicit
intent to support that. Longer term, I would like to provide the
ability for custom columns to support some form of Canvas rendering,
allowing visualizations to be embedded. (SVG is also an option, but
seems contrary to the performance goals of a tree view with
potentially thousands of items present.)

Andrew
