Bug#1004740: exim4: SIGSEGV (maybe attempt to write to immutable memory) when sending a mail; message frozen

Vincent Lefevre

Feb 1, 2022, 9:10:03 AM
On 2022-02-01 14:44:21 +0100, Vincent Lefevre wrote:
> 2022-02-01 14:23:23 1nEt2b-0008jG-97 SIGSEGV (maybe attempt to write to immutable memory)
> 2022-02-01 14:23:23 1nEt2b-0008jG-97 Delivery status for xxx@yyy: got 0 of 7 bytes (pipeheader) from transport process 35015 for transport smtp
>
> 2022-02-01 14:23:23 1nEt2b-0008jG-97 == xxx@yyy R=dnslookup T=remote_smtp defer (-1): smtp transport process returned non-zero status 0x000b: terminated by signal 11
> 2022-02-01 14:23:23 1nEt7z-00096o-IX <= <> R=1nEt2b-0008jG-97 U=Debian-exim P=local S=783
> 2022-02-01 14:23:23 1nEt2b-0008jG-97 Frozen
> 2022-02-01 14:23:23 1nEt7z-00096o-IX => vlefevre <postm...@cventin.lip.ens-lyon.fr> R=local_user T=maildir_home
> 2022-02-01 14:23:23 1nEt7z-00096o-IX Completed
> 2022-02-01 14:23:23 End queue run: pid=35012
> 2022-02-01 14:28:19 Start queue run: pid=35460
> 2022-02-01 14:28:19 1nEt2b-0008jG-97 Message is frozen
> 2022-02-01 14:28:19 End queue run: pid=35460
> [...]
>
> The consequence is that the mail is not sent, there are no retries,
> and the end user is not warned. So, from the end user point of view,
> the message is *silently lost*.

Some clarification: by "end user", I mean the author of the message.
There is an error message sent to postmaster, but not to the address
of the author of the message.

--
Vincent Lefèvre <vin...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Andreas Metzler

Feb 2, 2022, 12:30:03 PM
On 2022-02-01 Vincent Lefevre <vin...@vinc17.net> wrote:
> On 2022-02-01 14:44:21 +0100, Vincent Lefevre wrote:
> > 2022-02-01 14:23:23 1nEt2b-0008jG-97 SIGSEGV (maybe attempt to write to immutable memory)
> > 2022-02-01 14:23:23 1nEt2b-0008jG-97 Delivery status for xxx@yyy: got 0 of 7 bytes (pipeheader) from transport process 35015 for transport smtp
> >
> > 2022-02-01 14:23:23 1nEt2b-0008jG-97 == xxx@yyy R=dnslookup T=remote_smtp defer (-1): smtp transport process returned non-zero status 0x000b: terminated by signal 11
> > 2022-02-01 14:23:23 1nEt7z-00096o-IX <= <> R=1nEt2b-0008jG-97 U=Debian-exim P=local S=783
> > 2022-02-01 14:23:23 1nEt2b-0008jG-97 Frozen
> > 2022-02-01 14:23:23 1nEt7z-00096o-IX => vlefevre <postm...@cventin.lip.ens-lyon.fr> R=local_user T=maildir_home
> > 2022-02-01 14:23:23 1nEt7z-00096o-IX Completed
> > 2022-02-01 14:23:23 End queue run: pid=35012
> > 2022-02-01 14:28:19 Start queue run: pid=35460
> > 2022-02-01 14:28:19 1nEt2b-0008jG-97 Message is frozen
> > 2022-02-01 14:28:19 End queue run: pid=35460
> > [...]
> >
> > The consequence is that the mail is not sent, there are no retries,
> > and the end user is not warned. So, from the end user point of view,
> > the message is *silently lost*.

> Some clarification: by "end user", I mean the author of the message.
> There is an error message sent to postmaster, but not to the address
> of the author of the message.

Yeah, from exim's POV this works out as designed. Something very much
unexpected happened, so let's freeze the message and tell the admin. The
message is not lost. It is still in the queue.

Is this reproducible, happening with a specific host? Any chance of
getting a coredump?

cu Andreas

--
`What a good friend you are to him, Dr. Maturin. His other friends are
so grateful to you.'
`I sew his ears on from time to time, sure'

Vincent Lefevre

Feb 2, 2022, 9:40:13 PM
On 2022-02-02 18:20:48 +0100, Andreas Metzler wrote:
> Yeah, from exim's POV this works out as designed. Something very much
> unexpected happened, so let's freeze the message and tell the admin. The
> message is not lost. It is still in the queue.

Yes, once the issue occurs, exim works as designed. But this is what
makes the issue particularly problematic. Note that on a multi-user
machine, the user typically doesn't have access to the status of the
queue, so they don't know that the mail has not been sent, and will
probably never know the reason.

> Is this reproducible, happening with a specific host?

Not reproduced yet. I wonder whether this is due to the "SMTP error
from remote mail server after DATA: 450 [S09] Try Again Later".

> Any chance of getting a coredump?

exim didn't leave a coredump. I could see that Mutt disables coredumps
after some time. I don't know under which condition.
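
Whether a crashing process may write a core file depends in part on its RLIMIT_CORE soft limit, which is one thing worth checking when no coredump appears. The following is a minimal sketch in C, not taken from exim or Mutt; the helper names core_dumps_allowed and set_core_soft_limit are made up for illustration. It only looks at the resource limit. Other settings (for example the kernel's policy for setuid programs, or where core files are written) can also suppress a dump and are not covered here.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if RLIMIT_CORE permits a non-empty core
 * file for the calling process, 0 if the soft limit is zero, -1 on error. */
int core_dumps_allowed(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur > 0 ? 1 : 0;
}

/* Hypothetical helper: sets the soft core limit to `want`, clamped to the
 * current hard limit. Returns 0 on success, -1 on error. */
int set_core_soft_limit(rlim_t want)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    if (rl.rlim_max == RLIM_INFINITY || want < rl.rlim_max)
        rl.rlim_cur = want;
    else
        rl.rlim_cur = rl.rlim_max;   /* cannot exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling set_core_soft_limit(RLIM_INFINITY) early in a process raises the soft limit as far as the hard limit permits; core_dumps_allowed() can then be used to log whether a dump is possible at all.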

Gedalya

May 10, 2022, 9:50:03 AM
Got another variant now.

On the receiving side, I set my primary MX server (exim) to defer and the secondary one to "drop".

2022-05-10 13:32:18 1noPyD-0005sx-Mb ARC: no Authentication-Results header for signing
2022-05-10 13:32:18 1noPyD-0005sx-Mb H=mail.gedalya.net [**.**.**.**]: SMTP error from remote mail server after pipelined end of data: 451 Temporary local problem - please try later
2022-05-10 13:32:19 1noPyD-0005sx-Mb H=mx2.gedalya.net [****:****:****:....] Network is unreachable
2022-05-10 13:32:19 1noPyD-0005sx-Mb ARC: no Authentication-Results header for signing
2022-05-10 13:32:20 1noPyD-0005sx-Mb H=mx2.gedalya.net [**.**.**.**] TLS error on connection (recv): The TLS connection was non-properly terminated.
2022-05-10 13:32:20 1noPyD-0005sx-Mb H=mx2.gedalya.net [**.**.**.**] TLS error on connection (recv): The specified session has been invalidated for some reason.
2022-05-10 13:32:20 1noPyD-0005sx-Mb SIGSEGV (maybe attempt to write to immutable memory)
2022-05-10 13:32:20 1noPyD-0005sx-Mb Delivery status for ged...@gedalya.net: got 0 of 7 bytes (pipeheader) from transport process 22646 for transport smtp
2022-05-10 13:32:20 1noPyD-0005sx-Mb == ged...@gedalya.net R=dnslookup T=remote_smtp defer (-1): smtp transport process returned non-zero status 0x008b: terminated by signal 11
2022-05-10 13:32:20 1noPyD-0005sx-Mb Frozen

Now exim seems to crash while closing the TLS connection.

# gdb /usr/sbin/exim4 /var/spool/exim4/core
GNU gdb (Debian 10.1-2+b1) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/exim4...
Reading symbols from /usr/lib/debug/.build-id/b0/ba38f1cd15529b233aa41d2b313ad815319a3e.debug...

warning: core file may not match specified executable file.
[New LWP 22646]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `exim -q'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa016193af9 in gnutls_x509_trust_list_deinit (list=0x5597d74c9160, all=1) at ../../../lib/x509/verify-high.c:213
213    ../../../lib/x509/verify-high.c: No such file or directory.
(gdb) set pagination off
(gdb) bt full
#0  0x00007fa016193af9 in gnutls_x509_trust_list_deinit (list=0x5597d74c9160, all=1) at ../../../lib/x509/verify-high.c:213
        i = <optimized out>
        j = 0
#1  0x00007fa0161020cb in gnutls_certificate_free_credentials (sc=0x5597d74ca360) at ../../lib/cert-cred.c:403
No locals.
#2  0x00005597d5941480 in tls_close (ct_ctx=0x5597d723aff0, do_shutdown=do_shutdown@entry=2) at ./b-exim4-daemon-custom/build-Linux-x86_64/tls-gnu.c:3752
        state = 0x5597d723aff0
        tlsp = 0x5597d59d23c0 <tls_out>
        __FUNCTION__ = "tls_close"
#3  0x00005597d596d9f8 in smtp_deliver (addrlist=addrlist@entry=0x5597d722a760, host=host@entry=0x5597d778e8e8, host_af=host_af@entry=2, defport=<optimized out>, interface=<optimized out>, tblock=tblock@entry=0x5597d723a380, message_defer=<optimized out>, suppress_tls=<optimized out>) at ./b-exim4-daemon-custom/build-Linux-x86_64/transports/smtp.c:4819
        n = <optimized out>
        ob = <optimized out>
        yield = <optimized out>
        save_errno = 0
        rc = <optimized out>
        message = 0x5597d771d718 "SMTP error from remote mail server after pipelined end of data: 550 Administrative prohibition"
        new_message_id = "\360\201-\203\375\177\000\000\000\000\000\000\000\000\000", <incomplete sequence \320>
        sx = 0x5597d77051e0
        __FUNCTION__ = "smtp_deliver"
        pass_message = 1
        dane_held = <optimized out>
        tcw_done = 1
        tcw = 0
        SEND_MESSAGE = <optimized out>
#4  0x00005597d596f742 in smtp_transport_entry (tblock=<optimized out>, addrlist=<optimized out>) at ./b-exim4-daemon-custom/build-Linux-x86_64/transports/smtp.c:5636
        thost = <optimized out>
        first_addr = 0x5597d722a760
        host_is_expired = 0
        some_deferred = 0
        interface = 0x0
        rc = <optimized out>
        host_af = 2
        message_defer = 0
        retry_host_key = 0x0
        retry_message_key = 0x0
        serialize_key = 0x0
        nexthost = 0x0
        unexpired_hosts_tried = 3
        continue_host_tried = 0
        cutoff_retry = <optimized out>
        defport = 25
        hosts_defer = 0
        hosts_fail = 0
        hosts_looked_up = <optimized out>
        hosts_retry = 0
        hosts_serial = 0
        hosts_total = <optimized out>
        total_hosts_tried = <optimized out>
        expired = 0
        expanded_hosts = <optimized out>
        pistring = 0x5597d598be71 ""
        tid = <optimized out>
        __FUNCTION__ = "smtp_transport_entry"
        ob = 0x5597d723a4b8
        hostlist = 0x5597d778e928
        host = 0x5597d778e8e8
#5  0x00005597d58cd682 in do_remote_deliveries (fallback=fallback@entry=0) at ./b-exim4-daemon-custom/build-Linux-x86_64/deliver.c:4736
        fd = 9
        h = <optimized out>
        address_count_max = <optimized out>
        use_initgroups = 0
        tp = 0x5597d723a380
        gid = 110
        pfd = {8, 9}
        anchor = <optimized out>
        addr = <optimized out>
        pid = 0
        multi_domain = 1
        pipe_done = 1
        last = <optimized out>
        panicmsg = <optimized out>
        uid = 106
        address_count = <optimized out>
        next = <optimized out>
        serialize_key = 0x0
        delivery_count = 0
        parmax = 2
        poffset = <optimized out>
        __FUNCTION__ = "do_remote_deliveries"
#6  0x00005597d58d3579 in deliver_message (id=id@entry=0x5597d722a2d9 "1noPyD-0005sx-Mb", forced=forced@entry=0, give_up=give_up@entry=0) at ./b-exim4-daemon-custom/build-Linux-x86_64/deliver.c:7255
        i = <optimized out>
        rc = <optimized out>
        final_yield = 0
        now = <optimized out>
        addr_last = <optimized out>
        filter_message = 0x0
        process_recipients = <optimized out>
        dbblock = {dbptr = 0x5597d74d3160, lockfd = 7}
        dbm_file = <optimized out>
        info = <optimized out>
        __FUNCTION__ = "deliver_message"
        RECIP_QUEUE_FAILED = <optimized out>
#7  0x00005597d5905a27 in queue_run (start_id=start_id@entry=0x0, stop_id=stop_id@entry=0x0, recurse=recurse@entry=0) at ./b-exim4-daemon-custom/build-Linux-x86_64/queue.c:675
        rc = <optimized out>
        pid = 0
        status = 0
        statbuf = {st_dev = 51744, st_ino = 131107, st_nlink = 1, st_mode = 33184, st_uid = 106, st_gid = 110, __pad0 = 0, st_rdev = 0, st_size = 2410, st_blksize = 4096, st_blocks = 8, st_atim = {tv_sec = 1652189531, tv_nsec = 109884967}, st_mtim = {tv_sec = 1652189529, tv_nsec = 697845398}, st_ctim = {tv_sec = 1652189529, tv_nsec = 701845510}, __glibc_reserved = {0, 0, 0}}
        buffer = "\000\210-\203\375\177\000\000pX\230\325\227U\000\000 O\230\325\227U\000\000Ј-\203\375\177\000\000pX\230\325\227U\000\000Ј-\203\375\177\000\000\006\000\000\000\000\000\000\000\\\237\215\325\227U\000\000~\001\000\000+\000\000\000\030\222\"חU\000\000\b\000\000\000\060\000\000\000\340\211-\203\375\177\000\000\000\211-\203\375\177\000\000\000\274\275jt\346\341\315\001\000\000\000\000\000\000\000q\276\230\325\227U\000\000\001\000\000\000\000\000\000\000\347\221\353\025\240\177\000\000\a\000\000\000\000\000\000\000\020\023\"חU\000\000\350\216\061\203\375\177\000\000\264\271\344\025\240\177\000\000\257\310\"חU\000\000\000\274\275jt\346\341\315acl_checx\377\377\377\377\377\377\377", '\000' <repeats 16 times>...
        pfd = {3, 5}
        fq = 0x5597d722a2d0
        reset_point1 = 0x5597d722a228
        i = 0
        force_delivery = 0
        selectstring_regex = 0x0
        selectstring_regex_sender = 0x0
        log_detail = 0x5597d722a218 "pid=22640"
        subcount = 0
        subdirs = "\000\000\000\000\000\000\000\000\277\000\000\000\227U\000\000\240^\236\325\227U\000\000\240\372\234\325\227U\000\000x\000\000\000P\000\000\000\000\274\275jt\346\341\315\070\026#חU\000\000\035\225\223\325\227U\000"
        qpid = {0, 0, 0, 0}
        single_id = 0
        __FUNCTION__ = "queue_run"
        single_item_retry = <optimized out>
#8  0x00005597d58b6e7a in main (argc=2, cargv=0x7ffd83318ee8) at ./b-exim4-daemon-custom/build-Linux-x86_64/exim.c:4797
        argv = 0x7ffd83318ee8
        arg_receive_timeout = -1
        arg_smtp_receive_timeout = -1
        arg_error_handling = 0
        filter_sfd = -1
        filter_ufd = -1
        group_count = <optimized out>
        i = <optimized out>
        rv = <optimized out>
        list_queue_option = <optimized out>
        msg_action = 0
        msg_action_arg = -1
        namelen = <optimized out>
        queue_only_reason = 0
        recipients_arg = <optimized out>
        sender_address_domain = 0
        test_retry_arg = -1
        test_rewrite_arg = -1
        original_egid = <optimized out>
        arg_queue_only = <optimized out>
        bi_option = <optimized out>
        checking = <optimized out>
        count_queue = <optimized out>
        expansion_test = <optimized out>
        extract_recipients = <optimized out>
        flag_G = <optimized out>
        flag_n = <optimized out>
        forced_delivery = 0
        f_end_dot = <optimized out>
        deliver_give_up = 0
        list_queue = 0
        list_options = <optimized out>
        list_config = <optimized out>
        local_queue_only = <optimized out>
        more = 1
        one_msg_action = 0
        opt_D_used = <optimized out>
        queue_only_set = <optimized out>
        receiving_message = <optimized out>
        sender_ident_set = <optimized out>
        session_local_queue_only = <optimized out>
        unprivileged = 0
        removed_privilege = <optimized out>
        usage_wanted = <optimized out>
        verify_address_mode = <optimized out>
        verify_as_sender = <optimized out>
        rcpt_verify_quota = <optimized out>
        version_printed = <optimized out>
        alias_arg = <optimized out>
        called_as = 0x5597d598be71 ""
        cmdline_syslog_name = <optimized out>
        start_queue_run_id = <optimized out>
        stop_queue_run_id = <optimized out>
        expansion_test_message = <optimized out>
        ftest_domain = <optimized out>
        ftest_localpart = <optimized out>
        ftest_prefix = <optimized out>
        ftest_suffix = <optimized out>
        log_oneline = <optimized out>
        malware_test_file = <optimized out>
        real_sender_address = <optimized out>
        originator_home = 0x5597d597f0bd "/"
        sz = <optimized out>
        pw = 0x5597d5a27900 <pwcopy>
        statbuf = {st_dev = 22, st_ino = 3, st_nlink = 1, st_mode = 8576, st_uid = 0, st_gid = 5, __pad0 = 0, st_rdev = 34816, st_size = 0, st_blksize = 1024, st_blocks = 0, st_atim = {tv_sec = 1652189536, tv_nsec = 425246092}, st_mtim = {tv_sec = 1652189536, tv_nsec = 425246092}, st_ctim = {tv_sec = 1652165838, tv_nsec = 445246653}, __glibc_reserved = {0, 0, 0}}
        passed_qr_pid = <optimized out>
        passed_qr_pipe = <optimized out>
        group_list = <error reading variable group_list (value requires 262144 bytes, which is more than max-value-size)>
        info_flag = <optimized out>
        info_stdout = <optimized out>
        rsopts = {0x5597d5989b3d "f", 0x5597d59b1368 "ff", 0x5597d59a59f4 "r", 0x5597d5983d6e "rf", 0x5597d5983d71 "rff"}
        __FUNCTION__ = "main"

Gedalya

May 11, 2022, 11:20:04 AM
On 5/10/22 21:30, Gedalya wrote:
> I forgot to mention: Google seems to be closing the connection immediately after the deferral, causing the logged TLS error lines, and this seems to be a necessary component for this issue.

No. That's incorrect.

I've reproduced this with plain Debian-built exim4-daemon-light 4.95-5 and 4.96~RC0-1, with the remote setup being my two MX servers running exim with a simple "defer" acl verb, not closing the connection. The behavior seems quite similar with both exim versions. See attachments.

I'm a little dazzled by the variety of crashes I've seen so far: smtp_setup_conn > tls_client_start > verify_certificate, and during ARC signing, but it could be just noise so I'll leave it alone for now.

exim4_4.96~RC0-1.txt
exim4_4.95-5.txt

Matt Corallo

May 16, 2022, 1:20:03 AM

On 5/11/22 8:09 AM, Gedalya wrote:
> I'm a little dazzled by the variety of crashes I've seen so far: smtp_setup_conn > tls_client_start > verify_certificate, and during ARC signing, but it could be just noise so I'll leave it alone for now.


As a passer-by, might I suggest valgrind or building with the address and
undefined-behavior sanitizers? I don't see any mention of them in this issue,
and "it keeps crashing in random places that may or may not be related"
screams "memory corruption".
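
To illustrate the kind of lifetime bug these tools report, here is a small hypothetical sketch in C. It is not exim or GnuTLS code and is not a diagnosis of this crash; struct session, session_open and session_close are invented names. It shows a teardown function that can safely be called twice because it clears the pointer after freeing it. Without that assignment, a second call would be a double free, which a build with `-g -fsanitize=address,undefined` (GCC or Clang) would be expected to report at the faulting call rather than at some later, unrelated place.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a connection object that owns heap memory. */
struct session {
    char *creds;   /* owned by the session; NULL once released */
};

/* Allocates n zeroed bytes for the session. Returns 0 on success, -1 on
 * allocation failure. */
int session_open(struct session *s, size_t n)
{
    s->creds = calloc(n, 1);
    return s->creds ? 0 : -1;
}

/* Releases s->creds at most once. Returns 1 if memory was released by this
 * call, 0 if the session had already been closed. */
int session_close(struct session *s)
{
    if (s->creds == NULL)
        return 0;
    free(s->creds);
    s->creds = NULL;   /* without this, a second call would double-free */
    return 1;
}
```

A caller that opens a session and then calls session_close twice gets 1 from the first call and 0 from the second, with no invalid free.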

Matt