
How to access the TBird message composition content


Jason Smith

Mar 20, 2009, 4:29:55 AM
I need access to the plain text content of the email body in
Thunderbird. This is my first XUL project so I'd appreciate any
feedback as to whether I'm going in the right direction. I posted on
Stack Overflow but I don't think they have the domain knowledge for
this stuff.

Here is my experimental code, which I run in the Extension Developer
JavaScript console.

    // Wrapped in a function so the early `return` is legal when the
    // snippet is evaluated in the console.
    (function () {
        var composer = document.getElementById('msgcomposeWindow');
        var frame = composer.getElementsByAttribute('id', 'content-frame').item(0);
        if (frame.editortype != 'textmail') {
            print('Sorry, you are not composing in plain text.');
            return;
        }

        var contentDoc = frame.contentDocument;
        var doc = contentDoc.documentElement;

        // XXX: This does not work because newlines are not in the string!
        var text = doc.textContent;
        print('Message content:');
        print(text);
        print('');

        // Do a TreeWalker through the composition window DOM instead.
        // The walker is created by the message document that owns `body`,
        // not by the XUL window's `document`.
        var body = doc.getElementsByTagName('body').item(0);
        var acceptAllNodes = function (node) {
            return NodeFilter.FILTER_ACCEPT;
        };
        var walker = contentDoc.createTreeWalker(body,
            NodeFilter.SHOW_TEXT | NodeFilter.SHOW_ELEMENT,
            { acceptNode: acceptAllNodes }, false);

        var lines = [];
        var justDidNewline = false;
        while (walker.nextNode()) {
            var node = walker.currentNode;
            if (node.nodeName == '#text') {
                lines.push(node.nodeValue);
                justDidNewline = false;
            } else if (node.nodeName == 'BR') {
                if (justDidNewline) {
                    // This indicates back-to-back newlines in the message text.
                    lines.push('');
                }
                justDidNewline = true;
            }
        }

        // Index-based loop instead of for..in, which leaked a global `a`.
        for (var i = 0; i < lines.length; i++) {
            print(i + ': ' + lines[i]);
        }
    })();

I would appreciate any feedback as to whether I'm on the right track.
I also have some specific questions:

* Does `doc.textContent` really not have newlines? How stupid is
that? I'm hoping it's just a bug in the JavaScript console, but I
suspect not. A previous message on this board suggests as much, but
without the newlines the text is useless to me.

* Is my TreeWalker usage correct? I first tried `NodeFilter.SHOW_TEXT`,
but it did not traverse into the `<SPAN>`s that contain the quoted
material in a reply. Similarly, it seems odd to `FILTER_ACCEPT` every
node and then cherry-pick manually afterwards, but I had the same
problem: if I rejected a `SPAN` node, the walker would not step
inside it.

* Consecutive `<BR>`s break the naive implementation because there
is no `#text` node between them, so I detect them manually and push
empty lines onto my array. Is it really necessary to do that much
manual work to access the message content?
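For comparison, here is a minimal sketch of a traversal that avoids the special case for consecutive `<BR>`s: it builds up the current line from every text node (including those inside quoted `<SPAN>`s) and flushes it at each `<BR>`. It assumes, as observed above, that line breaks in the plain-text editor are `<BR>` elements, and it walks any DOM-like object with `nodeName`, `nodeValue` and `childNodes`; `collectLines`, `el` and `txt` are hypothetical names, not Thunderbird APIs. Descending into every non-`BR` element corresponds to returning `NodeFilter.FILTER_SKIP` rather than `FILTER_REJECT` from a TreeWalker filter: per the DOM Traversal spec, `FILTER_REJECT` also skips a node's children, while `FILTER_SKIP` skips only the node itself.

```javascript
// Hypothetical helpers for building a tiny DOM-like tree outside a browser.
function txt(value) {
  return { nodeName: '#text', nodeValue: value, childNodes: [] };
}
function el(name, children) {
  return { nodeName: name, nodeValue: null, childNodes: children || [] };
}

// Collect the visual lines of a plain-text compose body.
// Assumes line breaks are <BR> elements and that element names are
// upper case, as they are in an HTML document.
function collectLines(body) {
  var lines = [];
  var current = '';

  function visit(node) {
    if (node.nodeName === '#text') {
      // Text inside a quoted <SPAN> still belongs to the current line.
      current += node.nodeValue;
    } else if (node.nodeName === 'BR') {
      // Every <BR> ends a line, even an empty one, so back-to-back
      // <BR>s yield empty lines without a special case.
      lines.push(current);
      current = '';
    } else {
      // Any other node (SPAN, BODY, comments...): look at its children,
      // like FILTER_SKIP in a TreeWalker.
      var kids = node.childNodes || [];
      for (var i = 0; i < kids.length; i++) {
        visit(kids[i]);
      }
    }
  }

  visit(body);
  // Keep trailing text that is not followed by a <BR>.
  if (current !== '') {
    lines.push(current);
  }
  return lines;
}

// "Hi<br><br><span>&gt; quoted</span><br>"
var body = el('BODY', [txt('Hi'), el('BR'), el('BR'),
                       el('SPAN', [txt('> quoted')]), el('BR')]);
var lines = collectLines(body); // ['Hi', '', '> quoted']
```

In the compose window this would presumably be called as `collectLines(frame.contentDocument.body)`; that call is untested.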

Thanks very much.

Neil

Mar 20, 2009, 6:59:29 AM
Jason Smith wrote:

>Does `doc.textContent` really not have newlines? How stupid is that?
>

That's 100% spec. foo<br>bar's text content has no newlines, although it
displays as two lines.
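To make the spec point concrete, this sketch mimics how `textContent` is computed: it concatenates the data of every descendant text node, and an element such as `<br>` contributes nothing. It runs on a hand-built, DOM-like object rather than a real document, and `textContentOf` is a hypothetical name, not a DOM method.

```javascript
// Concatenate the data of all descendant text nodes, as textContent does.
// Elements (including <br>) add nothing of their own.
function textContentOf(node) {
  if (node.nodeName === '#text') {
    return node.nodeValue;
  }
  var out = '';
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    out += textContentOf(kids[i]);
  }
  return out;
}

// foo<br>bar as a tiny hand-built tree (a stand-in for a real DOM).
var p = { nodeName: 'P', childNodes: [
  { nodeName: '#text', nodeValue: 'foo' },
  { nodeName: 'BR', childNodes: [] },
  { nodeName: '#text', nodeValue: 'bar' }
] };
textContentOf(p); // "foobar": no newline where the <br> was
```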

>Is it really necessary to do that much manual work to access the message content?
>
>

I think the easiest way is to create a range over the body and convert
that to a string, which should convert <br> to newlines. It also gives
you a textual representation of HTML content, which you may find useful.

--
Warning: May contain traces of nuts.

Jason Smith

Mar 20, 2009, 7:31:04 AM
On Mar 20, 5:59 pm, Neil <n...@parkwaycc.co.uk> wrote:
> Jason Smith wrote:
> >Does `doc.textContent` really not have newlines?  How stupid is that?
>
> That's 100% spec. foo<br>bar's text content has no newlines, although it
> displays as two lines.

Of course, you're right. For some reason I had assumed that
Thunderbird would have a convenient hook for accessing the message
content, and I was just venting.

> >Is it really necessary to do that much manual work to access the message content?
>
> I think the easiest way is to create a range over the body and convert
> that to a string, which should convert <br> to newlines. It also gives
> you a textual representation of HTML content, which you may find useful.

Thank you very much. I hadn't considered a range. I will look into
that.

Arivald

Mar 20, 2009, 7:39:25 AM
Jason Smith wrote:

> I need access to the plain text content of the email body in
> Thunderbird. This is my first XUL project so I'd appreciate any
> feedback as to whether I'm going in the right direction. I posted on
> Stack Overflow but I don't think they have the domain knowledge for
> this stuff.
>
> Here is my experimentation code in the Extension Developer Javascript
> console.
>

In the compose window:

window.gMsgCompose.editor.outputToString('text/plain', FLAGS);

See MDC for the possible FLAGS values.
Another valid MIME type for outputToString is 'text/html'.
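A minimal sketch of that call, assuming it runs as chrome JavaScript in the compose window, where `window.gMsgCompose.editor` provides `outputToString(format, flags)` and the flag constants are on `Components.interfaces.nsIDocumentEncoder`. The editor and the constants are passed in as parameters only so the helper can be exercised outside Thunderbird; `getComposeText` is a hypothetical name. `OutputLFLineBreak` requests LF line breaks; MDC lists the other flags.

```javascript
// Return the compose body as plain text with "\n" line breaks.
// `editor` is expected to be window.gMsgCompose.editor and `encoder`
// Components.interfaces.nsIDocumentEncoder (assumptions; see above).
function getComposeText(editor, encoder) {
  // Other nsIDocumentEncoder flags can be OR-ed in here as needed.
  var flags = encoder.OutputLFLineBreak;
  return editor.outputToString('text/plain', flags);
}

// Inside the compose window (untested):
// var text = getComposeText(window.gMsgCompose.editor,
//                           Components.interfaces.nsIDocumentEncoder);
```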


> I would appreciate any feedback as to whether I'm on the right track.
> I also have some specific questions:
>
> * Does `doc.textContent` really not have newlines?

Yes and no. There are no CR or LF characters, but there are lots of <BR>s.
The editor uses an (HTML) DOM document internally, so in most cases it
just ignores CR and LF.
If you want CR and/or LF, use outputToString with the proper flags.
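Once the serializer has emitted real line breaks, getting back to an array of lines like the one in the original experiment is a one-liner. This sketch (with the hypothetical name `splitLines`) accepts CRLF, CR-only or LF-only output, so it does not matter which line-break flags were chosen.

```javascript
// Split serialized text into lines, whatever line-break style was used.
// "\r\n" is listed first so that a CRLF pair counts as one break, not two.
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

var lines = splitLines('one\r\ntwo\n\nfour');
// lines is ['one', 'two', '', 'four']
```

A trailing line break yields a final empty string, which you may want to drop.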


BTW, what kind of extension are you working on?
Anyway, for most tasks it is better to use existing high-level objects,
like gMsgCompose, instead of fighting with the DOM tree.


--
Arivald

Jason Smith

Mar 20, 2009, 12:16:48 PM
On Mar 20, 6:39 pm, Arivald <arivald_@AT_interia_DOT_pl> wrote:
> In the compose window:
>
> window.gMsgCompose.editor.outputToString('text/plain', FLAGS);
>
> See MDC for the possible FLAGS values.
> Another valid MIME type for outputToString is 'text/html'.

That sounds great. Thank you very much.

> > I would appreciate any feedback as to whether I'm on the right track.
> > I also have some specific questions:
>
> >   * Does `doc.textContent` really not have newlines?
>
> Yes and no. There are no CR or LF characters, but there are lots of <BR>s.
> The editor uses an (HTML) DOM document internally, so in most cases it
> just ignores CR and LF.
> If you want CR and/or LF, use outputToString with the proper flags.

Yes, now that I have learned more about Mozilla and XUL, it makes
sense that the composer is much more sophisticated than a plain
<textarea>.

> BTW, what kind of extension are you working on?
> Anyway, for most tasks it is better to use existing high-level objects,
> like gMsgCompose, instead of fighting with the DOM tree.

Thanks. I've been slightly frustrated by the lack of introductory
material on Thunderbird extensions, especially in a task-oriented
format. The API coverage is nice, but there is a learning curve at
the start. So I fall back on my bad Firebug+DOM-browser habits of just
querying the DOM with JavaScript until I get something that works!

I am trying to make an extension that lets me compose in Markdown
format, with the mail sent as multipart/alternative and the HTML
version rendered from the Markdown, perhaps by the Showdown
library. If I can accomplish this task, or even get close, I'd like
to get it into the extension registry, but first things first!
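For reference, the multipart/alternative structure such an extension would ultimately need to produce can be sketched as follows: the Markdown source as the text/plain part and the rendered HTML as the text/html part. RFC 2046 orders alternatives from least to most faithful, so the HTML goes last. The function name `buildAlternative` and the boundary string are hypothetical, the HTML is assumed to come from a Markdown renderer such as Showdown, and the sketch ignores Content-Transfer-Encoding, line-length limits and checking that the boundary does not occur in either part, all of which a real message needs.

```javascript
// Build a multipart/alternative entity from two already-rendered bodies.
// Both bodies are assumed to use CRLF line endings already.
function buildAlternative(plainText, html, boundary) {
  var CRLF = '\r\n';
  return [
    'Content-Type: multipart/alternative; boundary="' + boundary + '"',
    '',
    '--' + boundary,                 // first part: the Markdown source
    'Content-Type: text/plain; charset=UTF-8',
    '',
    plainText,
    '--' + boundary,                 // second part: the rendered HTML
    'Content-Type: text/html; charset=UTF-8',
    '',
    html,
    '--' + boundary + '--',          // closing delimiter
    ''
  ].join(CRLF);
}

var entity = buildAlternative('*hello*', '<p><em>hello</em></p>',
                              'hypothetical-boundary-42');
```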
