mime::initialize has to perform better

Cameron Laird

Nov 8, 2001, 3:39:43 PM

Does no one else [package require mime]? When items get over
a megabyte, the time to complete
mime::initialize -string $item
on the host available to me appears to diverge. I'm astonished
that no one else is complaining about this.

mime is set up to handle lots of small items; it's definitely
not optimized for the multi-megabyte e-mail messages my customers
regularly generate. The initialize implementation, for example,
cuts an item up with
while 1 {
    ...
    set pos [string first "\n" $string]
    set line [string range $string 0 [expr {$pos-1}]]
    set string [string range $string [expr {$pos+1}] end]
    ...
}

This takes MINUTES even with a fast CPU and lots of memory.
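To see why that loop slows down so sharply, here is a small instrumented sketch. The proc name `bytes_copied_by_rescan` is hypothetical, not part of tcllib. It applies the same chop-off-the-first-line pattern and adds up the size of the remainder produced on each pass. Because each `[string range ... end]` yields a new string holding everything after the newline, the total grows with the square of the number of lines. For n lines of 5 bytes each, the total is 5n(n-1)/2.

```tcl
# Hypothetical instrumentation of the "chop off the first line" pattern.
# Returns the total number of bytes held in the successive remainders.
proc bytes_copied_by_rescan {text} {
    set copied 0
    while 1 {
        set pos [string first "\n" $text]
        if {$pos < 0} break
        # Everything after the newline becomes a brand-new string.
        set text [string range $text [expr {$pos + 1}] end]
        incr copied [string length $text]
    }
    return $copied
}

foreach n {1000 2000 4000} {
    set text [string repeat abcd\n $n]
    puts "$n lines: [bytes_copied_by_rescan $text] bytes copied"
}
# prints:
# 1000 lines: 2497500 bytes copied
# 2000 lines: 9995000 bytes copied
# 4000 lines: 39990000 bytes copied
```

Doubling the number of lines roughly quadruples the bytes copied, which matches the "diverging" run times described above.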

'Anyone else want to get involved in a performance-enhancing
optimization, or should I just code for my own needs?
--

Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://starbase.neosoft.com/~claird/home.html

Darren New

Nov 8, 2001, 3:43:46 PM

Cameron Laird wrote:
> Does no one else [package require mime]? When items get over
> a megabyte, the time to complete
> mime::initialize -string $item
> on the host available to me appears to diverge. I'm astonished
> that no one else is complaining about this.

Actually, my original work on a Tcl MIME package was to leave the
message in the file (if that's where it was) and just parse the headers
and boundary markers and such, with offsets into the file. Composing a
multipart could also leave the data in the file until you asked to
serialize it (either over an SMTP socket or to a string or some other
channel).
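The offset-based approach can be sketched in a few lines. The sketch scans the file once with `gets`, uses `tell` to remember where each boundary line starts, and leaves the bodies on disk. This is a minimal sketch, not Darren's code. It assumes a single boundary string that is already known, and the usage assumes Tcl 8.6 or later for `file tempfile`. `boundary_offsets` is a hypothetical name, not part of the mime package. A real parser would also have to record where headers end and descend into nested multiparts.

```tcl
# Hypothetical helper: return the byte offset at which each boundary
# delimiter line ("--boundary" or "--boundary--") starts in an open file.
# Part bodies are never read into a single Tcl string.
proc boundary_offsets {chan boundary} {
    # Binary translation makes [tell] report plain byte offsets.
    fconfigure $chan -translation binary
    set delim "--$boundary"
    set offsets {}
    while 1 {
        set start [tell $chan]
        if {[gets $chan line] < 0} break
        # Binary mode keeps the CR of CRLF line endings; ignore it here.
        set line [string trimright $line \r]
        if {$line eq $delim || $line eq "$delim--"} {
            lappend offsets $start
        }
    }
    return $offsets
}

# Usage (Tcl 8.6 for [file tempfile]): scan a tiny multipart body.
set f [file tempfile path]
fconfigure $f -translation binary
puts -nonewline $f "preamble\n--XYZ\nheader\n\nbody\n--XYZ--\n"
seek $f 0
puts [boundary_offsets $f XYZ]   ;# prints "9 28"
close $f
file delete $path
```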

Really large messages (like, bigger than your RAM) are a lot
harder to deal with.
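If the data stays in the file until serialization time, sending a part reduces to copying a byte range from one channel to another (an SMTP socket, say) without ever building a Tcl string. That is what makes the larger-than-RAM case tractable at all. A minimal sketch using the core `seek` and `fcopy -size` commands follows. `copy_range` is a hypothetical helper, and the usage assumes Tcl 8.6 for `file tempfile`.

```tcl
# Hypothetical helper: copy bytes [start, start+length) of an open file
# to another channel without reading them into a Tcl variable.
proc copy_range {in out start length} {
    fconfigure $in -translation binary
    fconfigure $out -translation binary
    seek $in $start
    # A synchronous fcopy (no -command) returns the number of bytes copied.
    return [fcopy $in $out -size $length]
}

# Usage (Tcl 8.6 for [file tempfile]): copy "3456" out of "0123456789".
set src [file tempfile srcpath]
puts -nonewline $src 0123456789
flush $src
set dst [file tempfile dstpath]
puts [copy_range $src $dst 3 4]   ;# prints 4
seek $dst 0
puts [read $dst]                  ;# prints 3456
close $src
close $dst
file delete $srcpath $dstpath
```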

That said, there's probably a variety of micro-improvements that can
speed up the MIME code, assuming your message isn't a significant
percentage of your RAM size.

--
Darren New
San Diego, CA, USA (PST). Cryptokeys on demand.
You will soon read a generic fortune cookie.

Cameron Laird

Nov 8, 2001, 4:02:02 PM

In article <3BEAEE6F...@san.rr.com>, Darren New <dn...@san.rr.com> wrote:
>Cameron Laird wrote:
>> Does no one else [package require mime]? When items get over
>> a megabyte, the time to complete
>> mime::initialize -string $item
>> on the host available to me appears to diverge. I'm astonished
>> that no one else is complaining about this.
>
>Actually, my original work on a Tcl MIME package was to leave the
>message in the file (if that's where it was) and just parse the headers
>and boundary markers and such, with offsets into the file. Composing a
>multipart could also leave the data in the file until you asked to
>serialize it (either over an SMTP socket or to a string or some other
>channel).
>
>Really large messages (like, bigger than your RAM) are a lot
>harder to deal with.
>
>That said, there's probably a variety of micro-improvements that can
>speed up the MIME code, assuming your message isn't a significant
>percentage of your RAM size.

.
.
.
Oh, leaving the item bodies alone makes an ENORMOUS difference;
don't even start me on that one.

I'll say it this way: when I replace the above [string first ...]
sorts of parsing with a [split ... \n], a typical (for my needs)
e-mail filter takes ten seconds instead of twenty-five minutes.
Those who like to look at global consequences are welcome to
contemplate how this appears to the Perl crowd.
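The substitution described above can be checked in isolation. The sketch below uses hypothetical proc names. It splits the same text both ways, confirms that the two line lists agree, and times each with the core `time` command. Timings depend on the machine and Tcl version, so none are quoted here.

```tcl
# Original pattern: find a newline, take the line, keep a copy of the rest.
proc lines_by_rescan {text} {
    set lines {}
    while 1 {
        set pos [string first "\n" $text]
        if {$pos < 0} {
            # Whatever follows the last newline is the final line.
            lappend lines $text
            break
        }
        lappend lines [string range $text 0 [expr {$pos - 1}]]
        set text [string range $text [expr {$pos + 1}] end]
    }
    return $lines
}

# Replacement: a single linear pass through the string.
proc lines_by_split {text} {
    return [split $text \n]
}

set item [string repeat abcd\n 5000]
if {[lines_by_rescan $item] ne [lines_by_split $item]} {
    error "the two methods disagree"
}
puts "rescan: [time {lines_by_rescan $item}]"
puts "split:  [time {lines_by_split $item}]"
```

Both procs return the same list, including the empty final element produced by a trailing newline, so the swap does not change what the parser sees.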

So: is it worth my time to make a nice patch? Is anyone else
working in this area?

We can leave aside for a moment the larger-than-memory cases.

Jeff Hobbs

Nov 8, 2001, 5:21:13 PM

Cameron Laird wrote:
>
> Does no one else [package require mime]? When items get over
> a megabyte, the time to complete
> mime::initialize -string $item
> on the host available to me appears to diverge. I'm astonished
> that no one else is complaining about this.
>
> mime is set up to handle lots of small items; it's definitely
> not optimized for the multi-megabyte e-mail messages my customers
> regularly generate. The initialize implementation, for example,
> cuts an item up with
> while 1 {
> ...
> set pos [string first "\n" $string]
> set line [string range $string 0 [expr {$pos-1}]]
> set string [string range $string [expr {$pos+1}] end]
> ...
> }
>
> This takes MINUTES even with a fast CPU and lots of memory.
>
> 'Anyone else want to get involved in a performance-enhancing
> optimization, or should I just code for my own needs?

Can you give a more specific example of something that takes a
while for you? I'm trying some basic stuff and I can't make it
take more than a second, but I'm sure that I must be avoiding
some basic part of the expensive algorithms.

(mime) 63 % catch {string repeat abcd 4096000} mb16
0
(mime) 64 % time {mime::finalize [mime::initialize \
-canonical text/plain -string $mb16]} 100
767 microseconds per iteration
(mime) 65 % time {mime::finalize [mime::initialize \
-canonical multipart/mime -string $mb16]} 100
778 microseconds per iteration
(mime) 66 % string length $mb16
16384000

This was on a PII/450 with 256MB physmem (IOW, no speed demon).
This was with ActiveTcl 8.3.4.1.

--
Jeff Hobbs The Tcl Guy
Senior Developer http://www.ActiveState.com/
Tcl Support and Productivity Solutions

Jeff Hobbs

Nov 8, 2001, 5:22:20 PM

Cameron Laird wrote:
...

> So: is it worth my time to make a nice patch? Is anyone else
> working in this area?

Patches are always good. You don't have to do it yourself, just
submit the patch to SF and mod it high for quick response.

Cameron Laird

Nov 8, 2001, 7:33:31 PM

In article <3BEB0559...@ActiveState.com>,
Jeff Hobbs <Je...@ActiveState.com> wrote:
>Cameron Laird wrote:
>>
>> Does no one else [package require mime]? When items get over
>> a megabyte, the time to complete
>> mime::initialize -string $item
>> on the host available to me appears to diverge. I'm astonished
>> that no one else is complaining about this.
>>
>> mime is set up to handle lots of small items; it's definitely
>> not optimized for the multi-megabyte e-mail messages my customers
>> regularly generate. The initialize implementation, for example,
>> cuts an item up with
>> while 1 {
>> ...
>> set pos [string first "\n" $string]
>> set line [string range $string 0 [expr {$pos-1}]]
>> set string [string range $string [expr {$pos+1}] end]
>> ...
>> }
>>
>> This takes MINUTES even with a fast CPU and lots of memory.

.
.
.

>Can you give a more specific example of something that takes a
>while for you? I'm trying some basic stuff and I can't make it
>take more than a second, but I'm sure that I must be avoiding
>some basic part of the expensive algorithms.
>
>(mime) 63 % catch {string repeat abcd 4096000} mb16
>0
>(mime) 64 % time {mime::finalize [mime::initialize \
> -canonical text/plain -string $mb16]} 100
>767 microseconds per iteration
>(mime) 65 % time {mime::finalize [mime::initialize \
> -canonical multipart/mime -string $mb16]} 100
>778 microseconds per iteration
>(mime) 66 % string length $mb16
>16384000
>
>This was on a PII/450 with 256MB physmem (IOW, no speed demon).
>This was with ActiveTcl 8.3.4.1.

.
.
.
Absolutely.

First, thanks for your leadership in making your
own experiments explicit. This is a model for
clear communication.

My complaints have to do with *parsing*, so an
example is more like
package require mime

proc small_test size {
    set head "To: someone\nSubject: something\n\n"
    set body [string repeat abcd\n $size]
    set item $head$body
    set length [string length $item]
    set result [time {mime::finalize [mime::initialize \
            -string $item]} 10]
    puts "$size ($length): $result"
}

small_test 1000
small_test 10000
small_test 100000

There's more to it than this, though; I'm still
tracking down the specifics. I'll be back with
details as I'm able to generate them.

Jeffrey Hobbs

Nov 9, 2001, 1:32:54 AM

Cameron Laird wrote:
...

> My complaints have to do with *parsing*, so an
> example is more like

...

Cameron and I have gone offline with this, because the simple
example he gave is also OK. You need to get more complex before
you hit the heavy speed hit.

lvi...@yahoo.com

Nov 9, 2001, 5:57:26 AM

According to Cameron Laird <cla...@starbase.neosoft.com>:
:So: is it worth my time to make a nice patch? Is anyone else
:working in this area?

Another place to check for activity is the newly created tcllib-developers
mailing list. Subscription details are available at http://sf.net/projects/tcllib/

--
"I know of vanishingly few people ... who choose to use ksh." "I'm a minority!"
<URL: mailto:lvi...@cas.org> <URL: http://www.purl.org/NET/lvirden/>
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.

Cameron Laird

Nov 9, 2001, 1:42:04 PM

In article <3BEB7988...@ActiveState.com>,
Jeffrey Hobbs <Je...@ActiveState.com> wrote:
>Cameron Laird wrote:
> ...
>> My complaints have to do with *parsing*, so an
>> example is more like
> ...
>
>Cameron and I have gone offline with this, because the simple
>example he gave is also OK. You need to get more complex before
>you hit the heavy speed hit.

.
.
.
Demonstration code follows.

I'll review. My distress got the better of me, and I sent
a couple of confusing posts. There *is* a serious performance
problem, and it *can* be improved; I apologize for sending off
earlier stuff that so obscured those realities. The key point
that I completely failed to explain is that the current mime
package treats multipart items differently: it parses them to
the bitter end, and does so excruciatingly slowly.

What follows is executable code that demonstrates the bad
performance. Comments at the end suggest the idea for a fix.

There are lots of other issues with mime. This is by far the
most urgent, in my opinion.
========================================================================

package require mime

########
#
# A typical return value from this proc looks like
# MIME-Version: 1.0
# Content-ID: <11074.100...@ecunet.org>
# Content-Type: multipart/mixed;
# boundary="----- =_MTAwNTMyODkwMDo6bWltZTo6M2VjdW5ldC5vcmdwYXI="
#
# ------- =_MTAwNTMyODkwMDo6bWltZTo6M2VjdW5ldC5vcmdwYXI=
# MIME-Version: 1.0
# Content-ID: <11074.100...@ecunet.org>
# Content-Type: text/plain
#
# This is a first part.
#
# ------- =_MTAwNTMyODkwMDo6bWltZTo6M2VjdW5ldC5vcmdwYXI=
# MIME-Version: 1.0
# Content-ID: <11074.100...@myhost.com>
# Content-Type: application/octet-stream
#
# abcd
# abcd
# abcd
# ...
#
#
########
proc construct_item_with_attachment size {
    set message_token [mime::initialize -canonical text/plain \
            -string "This is a first part."]
    # Each "abcd\n" line is 5 bytes, so the body is roughly $size bytes.
    set attachment_body [string repeat abcd\n [expr {$size / 5}]]
    set attachment_token [mime::initialize \
            -canonical application/octet-stream \
            -string $attachment_body]
    set multi_token [mime::initialize -canonical multipart/mixed \
            -parts [list $message_token $attachment_token]]

    set packaged [mime::buildmessage $multi_token]
    # Also release the two parts that were passed in with -parts.
    mime::finalize $multi_token -subordinates all
    return $packaged
}

proc small_test size {
    set item [construct_item_with_attachment $size]

    set length [string length $item]
    set result [time {mime::finalize [mime::initialize \
            -string $item]} 10]
    puts "$size ($length): $result"
}

small_test 100
small_test 1000
small_test 10000
small_test 100000

########
#
# Typical results:
# 100 (654): 12207 microseconds per iteration
# 1000 (1554): 28418 microseconds per iteration
# 10000 (10554): 682226 microseconds per iteration
# 100000 (100557): 57002148 microseconds per iteration
#
# Interpretation: a modest item of barely over 100,000 bytes
# requires 57 seconds (!) to parse. By the time we get
# to the multi-megabyte traffic typical for my customers
# ... well, it's an ugly sight.
#
# Observation: an enormous sink is the
#     while 1 {
#         set pos [string first "\n" $string]
#         set line [string range $string 0 [expr {$pos-1}]]
#         set string [string range $string [expr {$pos+1}] end]
#         ...
#     }
# loop in mime::parsepart. Performance improves enormously
# when
#     foreach line [split $string \n] {
#         ...
#     }
# replaces this.
#
########
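Applying the same [split] idea to the multipart case, a single linear pass can group lines into parts as it meets each boundary delimiter. This is a simplified sketch, not the mime package's parser. `split_parts` is a hypothetical name, and the boundary is assumed to be already known from the Content-Type header. The preamble and epilogue are dropped, nested multiparts are not descended into, and a part left open at the end of the input is discarded. Joining each part's lines without a trailing newline treats the line break before a delimiter as part of the delimiter, as RFC 2046 specifies.

```tcl
# Hypothetical linear-time splitter for a multipart body whose boundary
# is already known. Returns the raw text (headers and body) of each part.
proc split_parts {body boundary} {
    set delim "--$boundary"
    set parts {}
    set current {}
    set inside 0
    # One pass over the lines; no remainder of the string is re-copied.
    foreach line [split $body \n] {
        set bare [string trimright $line \r]
        if {$bare eq $delim || $bare eq "$delim--"} {
            # A delimiter closes the part in progress, if any.
            if {$inside} {
                lappend parts [join $current \n]
            }
            set current {}
            set inside 1
            # The closing delimiter ends the multipart; skip the epilogue.
            if {$bare eq "$delim--"} break
            continue
        }
        # Lines before the first delimiter (the preamble) are skipped.
        if {$inside} {
            lappend current $line
        }
    }
    return $parts
}

set body "preamble\n--XYZ\nA: 1\n\nfirst\n--XYZ\nB: 2\n\nsecond\n--XYZ--\nepilogue"
puts [llength [split_parts $body XYZ]]   ;# prints 2
```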

Pat Thoyts

Nov 9, 2001, 4:43:37 PM

Cameron Laird <cla...@starbase.neosoft.com> wrote:
[snip]

>Those who like to look at global consequences are welcome to
>contemplate how this appears to the Perl crowd.

Use the right tool for the job? Tcl is a glue language after all.

>
>So: is it worth my time to make a nice patch? Is anyone else
>working in this area?

Yes.
--
Pat Thoyts http://www.zsplat.freeserve.co.uk/resume.html
To reply, rot13 the return address or read the X-Address header.
PGP fingerprint 2C 6E 98 07 2C 59 C8 97 10 CE 11 E6 04 E0 B9 DD

Frank Pilhofer

Nov 10, 2001, 9:11:58 AM

Cameron Laird <cla...@starbase.neosoft.com> wrote:
> # Typical results:
> # 100 (654): 12207 microseconds per iteration
> # 1000 (1554): 28418 microseconds per iteration
> # 10000 (10554): 682226 microseconds per iteration
> # 100000 (100557): 57002148 microseconds per iteration
> #
> # Interpretation: a modest item of barely over 100,000 bytes
> # requires 57 seconds (!) to parse. By the time we get
> # to the multi-megabyte traffic typical for my customers
> # ... well, it's an ugly sight.

Another interpretation: what you're seeing is quadratic behavior, where
the runtime scales with the square of the input length. The obvious
solution is to replace the algorithm with a linear one.
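The quadratic diagnosis can be made concrete. Suppose an item of N bytes consists of n lines of average length ℓ, so N ≈ nℓ. After the k-th line is chopped off, the loop holds a fresh copy of the remaining N − kℓ bytes, so the total copied over the whole item is approximately:

```latex
\sum_{k=1}^{n} (N - k\ell) \;=\; nN - \ell\,\frac{n(n+1)}{2}
\;\approx\; nN - \frac{nN}{2} \;=\; \frac{N^{2}}{2\ell}
```

Doubling the input therefore roughly quadruples the work. The figures quoted above fit this. From the 10000 to the 100000 run, the item grows from 10554 to 100557 bytes, a factor of about 9.5 whose square is about 91. Over the same step, the time per iteration grows from 682226 to 57002148 microseconds, a factor of about 84.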

Apart from your [split], you could also use

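set curpos 0
while {1} {
    set pos [string first "\n" $string $curpos]
    if {$pos < 0} {
        # No more newlines: the rest is the final line.
        set line [string range $string $curpos end]
        # ... process $line ...
        break
    }
    set line [string range $string $curpos [expr {$pos-1}]]
    # ... process $line ...
    set curpos [expr {$pos+1}]
}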

This has the advantage of lower memory consumption, because the string
is not duplicated.

Frank

--
Frank Pilhofer ........................................... f...@fpx.de
I'm a pessimist so that I can be positively surprised by reality. - FP

lvi...@yahoo.com

Nov 28, 2001, 9:08:48 AM

According to Pat Thoyts <Cng.G...@ovtsbbg.pbz>:

:Cameron Laird <cla...@starbase.neosoft.com> wrote:
:[snip]

:>So: is it worth my time to make a nice patch? Is anyone else
:>working in this area?
:
:Yes.

Okay, I'll byte. Please let us know who else is working in this area...

Cameron Laird

Nov 28, 2001, 9:52:43 AM

In article <9u2r5g$g7o$7...@srv38.cas.org>, <lvi...@yahoo.com> wrote:
>
>According to Pat Thoyts <Cng.G...@ovtsbbg.pbz>:
>:Cameron Laird <cla...@starbase.neosoft.com> wrote:
>:[snip]
>:>So: is it worth my time to make a nice patch? Is anyone else
>:>working in this area?
>:
>:Yes.
>
>
>Okay, I'll byte. Please let us know who else is working in this area...

.
.
.
Darren New. Andreas. I. Donal, although as an intellectual
exercise; while he doesn't have a practical interest in MIME
attachments of over a megabyte, he certainly helps out. Maybe
Reinhard. Frank Pilhofer. That's about everybody--which
astounds me, 'cause, if I didn't know better, I'd expect lots
of people to be manipulating MIME stuff with Tcl, and to care
about its performance.

We'll probably discuss it in tcllib-dev from here on.