broken midx files

Greg Troxel

May 14, 2026, 1:24:17 PM (11 days ago) May 14
to bup-...@googlegroups.com
I'm using bup 0.33.10 with python3.13 on NetBSD 9 and 10. Mostly things
are working well. The bup repo described below is on zfs, so it's
probably not a disk issue.

I periodically do a set of backups to online bup repos, where each
machine being backed up goes to 2-3 places, and for historical reasons
and continuing reasons there are repos per subset of the machines.

Yesterday, I got stack backtraces about midx files from 2 machines that
were backing up to the same repo, but the backups finished, with a
commit added to the ref anyway. The backtrace was lost because of other
issues.

I looked at the repo, and running bup midx --check showed that one of
the 3 midx files had lots of complaints.

I removed all the midx files and ran

bup -d /b0/bup-linuxpal16 midx -a -f --max-files=500

because a prior run had hung, but this seems to get into a state where
~nothing is happening (tmp file being written stops growing) and a very
little bit of cpu time is used.

Repo is 129 GB, 491 packfiles.

I reduced to 200, and got 3 files, 82 666 and 408 MB.

A new backup ran without errors, 2 of the 3 midx files were fetched, and
the old ones went away from index-cache.

I have a dim memory of having to do this before.

I realize this isn't really a useful bug report, but I wonder if anyone
else is seeing midx corruption.

There's also that creating midx seems to not really work, at least in
large repos, with --max-files.

I do wonder how close we are to it being reasonable to run 0.34-alpha at
least, and whether enough has changed that debugging 0.33.10 isn't sensible.

Mark Hewitt

May 14, 2026, 2:13:07 PM (11 days ago) May 14
to bup-...@googlegroups.com
Hi Greg,

This sounds a little like an issue I had and was discussed with Rob on
1st April or so on this list.
I was able to reproduce it for a while, and then, as I added diagnostics
to the code, not - and I still have not been able to identify exactly
what changed that behaviour.

Mark.

On 14/05/2026 18:24, Greg Troxel wrote:
> [...]
>
> I looked at the repo, and running bup midx --check showed that one of
> the 3 midx files had lots of complaints.
>
> I removed all the midx files and ran
>
> bup -d /b0/bup-linuxpal16 midx -a -f --max-files=500
>
> because a prior run had hung, but this seems to get into a state where
> ~nothing is happening (tmp file being written stops growing) and a very
> little bit of cpu time is used.
>
> [...]


Rob Browning

May 16, 2026, 6:08:34 PM (9 days ago) May 16
to Mark Hewitt, bup-...@googlegroups.com
Mark Hewitt <mjh.br...@gmail.com> writes:

> Hi Greg,
>
> This sounds a little like an issue I had and was discussed with Rob on
> 1st April or so on this list.
> I was able to reproduce it for a while, and then, as I added diagnostics
> to the code, not - and I still have not been able to identify exactly
> what changed that behaviour.

Right, this one I think:

https://groups.google.com/g/bup-list/c/DrPnmT6TiqM

Definitely like to figure out what's going on...

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Rob Browning

May 16, 2026, 6:11:20 PM (9 days ago) May 16
to Greg Troxel, bup-...@googlegroups.com
Greg Troxel <g...@lexort.com> writes:

> I do wonder how close we are to it being reasonable to run 0.34-alpha at
> least, and whether enough has changed that debugging 0.33.10 isn't sensible.

I have a few more patches to add to main, and then I think we'll be
about ready to call for testing for 0.34.

So, hopefully close.

Greg Troxel

May 16, 2026, 7:53:08 PM (9 days ago) May 16
to Rob Browning, Mark Hewitt, bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> Mark Hewitt <mjh.br...@gmail.com> writes:
>
>> This sounds a little like an issue I had and was discussed with Rob on
>> 1st April or so on this list.
>> I was able to reproduce it for a while, and then, as I added diagnostics
>> to the code, not - and I still have not been able to identify exactly
>> what changed that behaviour.
>
> Right, this one I think:
>
> https://groups.google.com/g/bup-list/c/DrPnmT6TiqM
>
> Definitely like to figure out what's going on...

That seems to at least somewhat match what I am seeing.

Somehow, I end up with a midx file that is broken, where it claims to
cover some number of idx files but objects (all objects?) in those files
are missing. The symptom is that during a backup from another host
(which uses -r back to the repo) I get backtraces, but the backup
completes.

I am also finding that for many repos, if I remove all midx files and
try to midx -a, I get the very-slow-cpu-time nothing-really-happening
behavior. If I limit to some hundreds of idx, each run finishes very
quickly.

My repos and most of my backups are on NetBSD. Some on ZFS, some on
UFS2.

The repo I most recently had this problem was on ZFS:

129G total

491 pack/idx pairs

hashsplit 16

idx files are 1.3G (from "tar cf - > /tmp/IDX", which was easier than
using awk)

now has 3 midx files with sizes and mod dates
82M May 14 12:56
666M May 14 12:56
408M May 14 12:56

git count-objects -v
count: 0
size: 0
in-pack: 50621786
packs: 491
size-pack: 134667640
prune-packable: 0
garbage: 4
size-garbage: 1248684

Moving aside the 3 midx files and running

bup -d ../.. midx -a
I get an in-progress midx-[hash]/pending of size
1211171452
fairly quickly (not more than 10s of seconds)
160 ms was accumulated over 60s

Running again with --max-files 500, it seems to hang, and ^C won't kill
it, but SIGTERM will.

With --max-files=400
midx: writing 99.58% (17924200/18000000) real 0m8.505s
432528799
779191451

450 hangs

With --max-files=425
312265405
899192701

440:
240264655
971193451

445 hangs

443:
985593601
225864505

444:
221064455
990393651

This is suspiciously like 1E9 bytes is trouble. (Yes, I know that isn't
a round number in binary!)
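
The observation can be checked by tabulating the sizes reported in this message. This is only a sketch over the numbers quoted above; `LIMIT` is the hypothesised 1E9 threshold, not a confirmed limit in bup, and the variable names are made up for illustration.

```python
# max_files -> sizes in bytes of the two midx files produced (from this message)
runs = {
    400: (432528799, 779191451),
    425: (312265405, 899192701),
    440: (240264655, 971193451),
    443: (985593601, 225864505),
    444: (221064455, 990393651),
}
hung = (445, 450, 500)  # --max-files values reported as hanging

LIMIT = 10**9  # hypothesised threshold, an assumption for this check

for max_files, sizes in sorted(runs.items()):
    largest = max(sizes)
    print(f'{max_files}: largest={largest} total={sum(sizes)} '
          f'headroom={LIMIT - largest}')

# Largest single midx seen in any run that completed
largest_ok = max(max(sizes) for sizes in runs.values())
print(f'largest midx in a completed run: {largest_ok}')
```

Every completed run keeps its largest midx under 10^9 bytes while the two files together stay near 1.21e9 bytes. That matches the pattern described, but it does not by itself show that size is the cause.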


Different repo, hashsplit 13.

midx hangs, pending is
1113426892

The highest max-files is 669, and the resulting midx sizes are

988659937
124932287

Third repo, hashsplit 16, 153G, 583 packs,
Biggest max-files is 562;
990660479
96132083


It would be maybe interesting if others have the same hang that is
avoidable with max-files and the resulting largest midx is similar.




Overall, assuming my problem is a match:

argues for some counter/sum/etc. overflowing some limit

argues against it being a bug in $some_particular_fs

Rob Browning

May 17, 2026, 12:38:11 PM (8 days ago) May 17
to Greg Troxel, Mark Hewitt, bup-...@googlegroups.com
Greg Troxel <g...@lexort.com> writes:

> I am also finding that for many repos, if I remove all midx files and
> try to midx -a, I get the very-slow-cpu-time nothing-really-happening
> behavior. If I limit to some hundreds of idx, each run finishes very
> quickly.

If you have the time, and if (sounds like?) it's not too hard to
reproduce, might try adding some log() statements to narrow down where
it's stuck.

Also, when it hangs, did you mean that it's not swamping the CPU or it
is? Wondering if it's CPU bound, IO bound, or blocking.

Thanks

Greg Troxel

May 17, 2026, 2:30:58 PM (8 days ago) May 17
to Rob Browning, Mark Hewitt, bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> Greg Troxel <g...@lexort.com> writes:
>
>> I am also finding that for many repos, if I remove all midx files and
>> try to midx -a, I get the very-slow-cpu-time nothing-really-happening
>> behavior. If I limit to some hundreds of idx, each run finishes very
>> quickly.
>
> If you have the time, and if (sounds like?) it's not too hard to
> reproduce, might try adding some log() statements to narrow down where
> it's stuck.

It is reproducible.

> Also, when it hangs, did you mean that it's not swamping the CPU or it
> is? Wondering if it's CPU bound, IO bound, or blocking.

The process appears not to be doing IO, but I should check harder. It
uses a very small amount of CPU time, apparently blocked on something.
I should look at the wchan.

If you can rm all midx and rerun midx -a on a repo with well over 1 GB
of idx, and that works, I'd find that very interesting.

Johannes Berg

May 17, 2026, 3:11:56 PM (8 days ago) May 17
to Greg Troxel, Rob Browning, Mark Hewitt, bup-...@googlegroups.com
On Sun, 2026-05-17 at 14:30 -0400, Greg Troxel wrote:
>
> > Also, when it hangs, did you mean that it's not swamping the CPU or it
> > is? Wondering if it's CPU bound, IO bound, or blocking.
>
> The process appears not to be doing IO, but I should check harder. It
> uses a very small amount of CPU time, apparently blocked on something.

If it's doing any IO, then it'd be via mmap(), maybe that would be
accounted differently?

I'd think that if the whole *.idx size doesn't fit into RAM _twice_
(roughly) then it's going to be really painful because it mmaps all the
inputs _and_ the output, and then keeps shuffling the output data
around, but I can't explain any persistent corruption of the file right
now.

johannes

Johannes Berg

May 17, 2026, 4:03:54 PM (8 days ago) May 17
to Greg Troxel, Rob Browning, Mark Hewitt, bup-...@googlegroups.com
On Sun, 2026-05-17 at 21:11 +0200, Johannes Berg wrote:
> but I can't explain any persistent corruption of the file right
> now.

Well, Rob and I were just chatting about it, and he surfaced this
ancient thread about write() vs. writes through mmap, and a similar
problem that was there back then is perhaps in the midx code too?

Perhaps we need to explicitly call msync() for some reason, like below,
but that does msync(..., MS_SYNC)

johannes

--- a/lib/bup/cmd/midx.py
+++ b/lib/bup/cmd/midx.py
@@ -168,6 +168,7 @@ def _do_midx(outdir, outfilename, infilenames, prefixstr,

with mmap_readwrite(f, close=False) as fmap:
count = merge_into(fmap, bits, total, inp)
+ fmap.flush()
f.seek(0, os.SEEK_END)
f.write(b'\0'.join(allfilenames))
f.flush()
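
The idea behind the patch can be illustrated with the standard library alone. This is a minimal sketch, assuming a file that is first filled through a shared mapping and then extended with ordinary write() calls. The function name `write_table_then_trailer` and the sample data are hypothetical, and it uses `mmap.mmap` directly rather than bup's `mmap_readwrite` helper, so it shows the ordering only, not bup's actual code path.

```python
import mmap
import os
import tempfile

def write_table_then_trailer(path, table, trailer):
    """Fill a file through a shared mmap, flush it, then append with write()."""
    with open(path, 'w+b') as f:
        f.truncate(len(table))            # reserve the region that will be mapped
        with mmap.mmap(f.fileno(), len(table)) as fmap:
            fmap[:] = table               # stores go through the mapping
            fmap.flush()                  # push dirty pages out before any write()
        f.seek(0, os.SEEK_END)
        f.write(trailer)                  # ordinary file write after the mapped part
        f.flush()

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'pending')
    write_table_then_trailer(path, b'\x01' * 8192, b'a.idx\0b.idx')
    with open(path, 'rb') as f:
        data = f.read()

print(len(data))
```

On a system with a unified page cache the flush should make no visible difference, so a small test like this cannot confirm or refute the theory; it only shows where the explicit flush sits relative to the later write().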

Greg Troxel

May 17, 2026, 7:11:25 PM (8 days ago) May 17
to Johannes Berg, Rob Browning, Mark Hewitt, bup-...@googlegroups.com
Johannes Berg <joha...@sipsolutions.net> writes:

> On Sun, 2026-05-17 at 14:30 -0400, Greg Troxel wrote:
>>
>> > Also, when it hangs, did you mean that it's not swamping the CPU or it
>> > is? Wondering if it's CPU bound, IO bound, or blocking.
>>
>> The process appears not to be doing IO, but I should check harder. It
>> uses a very small amount of CPU time, apparently blocked on something.
>
> If it's doing any IO, then it'd be via mmap(), maybe that would be
> accounted differently?

I should try again and watch what the SSD is doing, xfers/s MB/S.

> I'd think that if the whole *.idx size doesn't fit into RAM _twice_
> (roughly) then it's going to be really painful because it mmaps all the
> inputs _and_ the output, and then keeps shuffling the output data
> around, but I can't explain any persistent corruption of the file right
> now.

The box has 32GB.

I ran midx -a, and it hung. I left it for a few minutes short of 3
hours.

The process looks like:

 UID  PID  PPID   CPU PRI NI     VSZ     RSS WCHAN   STAT TTY       TIME COMMAND
1001 7133 18489 22752  80  0 2473700 1808820 pipe_wr Il+  pts/31 0:34.88 bup -d ../.. midx -a

System load is 0.06 and responsiveness is fine. top says enough RAM:

CPU states: 0.1% user, 0.0% nice, 0.1% system, 0.0% interrupt, 99.7% idle
Memory: 13G Act, 1970M Inact, 327M Wired, 893M Exec, 10G File, 5351M Free

The bup process has lots of pipes, all to itself.

$ fstat -n|egrep ' 7133 '|egrep pipe
gdt bup 7133 1* pipe 0xffff806ba2a43ee8 -> 0xffff8066eb11da10 w
gdt bup 7133 2* pipe 0xffff806cb7568628 -> 0xffff80671115e9f0 w
gdt bup 7133 3* pipe 0xffff8066eb11da10 <- 0xffff806ba2a43ee8 r
gdt bup 7133 4* pipe 0xffff806ba2a43ee8 -> 0xffff8066eb11da10 w
gdt bup 7133 6* pipe 0xffff80671115e9f0 <- 0xffff806cb7568628 r
gdt bup 7133 7* pipe 0xffff806cb7568628 -> 0xffff80671115e9f0 w
gdt bup 7133 9* pipe 0xffff80673ae17928 <- 0xffff8068eb620af8 r
gdt bup 7133 10* pipe 0xffff8068eb620af8 -> 0xffff80673ae17928 w

There are no processes with a ppid of 7133.

There are two threads:

 PID   LID USERNAME PRI STATE     TIME  WCPU   CPU NAME COMMAND
7133 14190 gdt       43 parked/6  0:26 0.00% 0.00% -    bup
7133  7133 gdt       80 pipe_w/4  0:09 0.00% 0.00% -    bup



I conclude that "bup is confused" more than that it has run out of some
resource. I can't rule out some resource request denied by ulimit and
that being unhandled.
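
The pipe_wr wait channel, together with pipes that lead back into the same process, is consistent with a writer blocked on a full pipe that nothing is draining. The sketch below only illustrates that general mechanism with a plain `os.pipe()`. It is not bup's code, `fill_pipe` is a made-up helper, and nothing at this point in the thread establishes that this is what happens inside bup.

```python
import os

def fill_pipe(chunk=b'x' * 4096):
    """Write to a pipe that nobody reads until the kernel buffer is full."""
    r, w = os.pipe()
    os.set_blocking(w, False)   # non-blocking so the demo reports instead of hanging
    written = 0
    try:
        while True:
            written += os.write(w, chunk)
    except BlockingIOError:
        pass                    # a blocking writer would sleep here indefinitely
    finally:
        os.close(r)
        os.close(w)
    return written

capacity = fill_pipe()
print(f'pipe accepted {capacity} bytes before a write would block')
```

The capacity differs between operating systems. The point is only that it is finite, so a process that writes to a pipe whose read end it also holds, and never reads, will eventually sleep in the write with almost no CPU use.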

Reading ktrace output, I am even more befuddled.

(As for missing flush, my quick take is that interleaving normal IO and
mmap is asking for trouble even if POSIX says it has to work. But it's
interesting we are seeing issues on multiple operating systems that
likely do not have common code.)

Greg Troxel

May 17, 2026, 7:24:03 PM (8 days ago) May 17
to Johannes Berg, Rob Browning, Mark Hewitt, bup-...@googlegroups.com
Running again with --debug, I see

midx: 686 indexes; want no more than 5.
midx: b''creating from 685 files (46349096 objects).
midx: table size: 1048576 (18 bits)

and that value is suspiciously close to trouble.

I wonder if the estimate of needed space is too low and we're not
finding a place to put entries?
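
One way to sanity-check the debug line is to see whether 18 bits and 1048576 fit a simple sizing rule. The sketch below is an assumption-laden reconstruction, not code taken from bup: it supposes the table has 2^bits entries of 4 bytes each, and that the bit count is chosen so that each entry covers roughly one 4096-byte page of 20-byte SHA-1 ids. The constants and the helper name `table_params` are illustrative.

```python
import math

PAGE_SIZE = 4096     # assumed page size in bytes
SHA_LEN = 20         # bytes per SHA-1 object id
ENTRY_BYTES = 4      # assumed size of one lookup-table entry

def table_params(total_objects):
    """Return (bits, table size in bytes) under the assumptions above."""
    pages = int(total_objects / (PAGE_SIZE / SHA_LEN)) or 1
    bits = int(math.ceil(math.log(pages, 2)))
    return bits, (2 ** bits) * ENTRY_BYTES

bits, table_bytes = table_params(46349096)
print(f'{bits} bits, table size {table_bytes} bytes')
```

Under these assumptions 46349096 objects give 18 bits and a 1048576-byte table, which agrees with the debug output. If that reading is right, 1048576 is a byte count for a lookup table rather than a number of slots that entries compete for, but that interpretation is not confirmed anywhere in this thread.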

Mark Hewitt

May 18, 2026, 6:17:13 AM (7 days ago) May 18
to Greg Troxel, Johannes Berg, Rob Browning, bup-...@googlegroups.com
And I can reproduce this too. In my case there are 79Gb of .idx files in
this repository, and the machine has 128Gb RAM.

Everything Greg says aligns with my experience - there are two processes
(actually I see 3, but only because I'm running this with sudo here),
there is no IO according to iotop, there is no CPU usage according to
htop, and strace of the deepest child only shows the terminal message write:

> mjh@Rocinante:~$ sudo strace -p 43169
> strace: Process 43169 attached
> write(2, "midx: writing 9.92% (158040232/1"..., 43

although the actual output I see is:

> midx: writing 99.98% (68009536/68019772)

More interesting is its parent:

> strace: Process 43168 attached
> ppoll([{fd=-1}, {fd=12, events=POLLIN}, {fd=9, events=POLLIN}], 3,
> NULL, NULL, 8

And this state has been the case for about 20 hours.

When I last investigated this, adding practically any logging caused
midx to complete, so I'm on board with some flush or IO race condition
here - but I have not had the time to do much more detailed analysis
than that yet.

Mark

Johannes Berg

May 18, 2026, 6:21:08 AM (7 days ago) May 18
to Mark Hewitt, Greg Troxel, Rob Browning, bup-...@googlegroups.com
On Mon, 2026-05-18 at 11:17 +0100, Mark Hewitt wrote:
> On 18/05/2026 00:24, Greg Troxel wrote:
> > Running again with --debug, I see
> >
> > midx: 686 indexes; want no more than 5.
> > midx: b''creating from 685 files (46349096 objects).
> > midx: table size: 1048576 (18 bits)
> >
> > and that value is suspiciously close to trouble.
> >
> > I wonder if the estimate of needed space is too low and we're not
> > finding a place to put entries?
> >
> And I can reproduce this too. In my case there are 79Gb of .idx files in
> this repository, and the machine has 128Gb RAM.
>
> Everything Greg says aligns with my experience - there are two processes
> (actually I see 3, but only because I'm running this with sudo here),
> there is no IO according to iotop, there is no CPU usage according to
> htop, and strace of the deepest child only shows the terminal message write:
>
> > mjh@Rocinante:~$ sudo strace -p 43169
> > strace: Process 43169 attached
> > write(2, "midx: writing 9.92% (158040232/1"..., 43

But this actually does something? Or is it basically stuck at this
point? If I'm interpreting this correctly, the write() is just not
finishing?

> More interesting is its parent:
>
> > strace: Process 43168 attached
> > ppoll([{fd=-1}, {fd=12, events=POLLIN}, {fd=9, events=POLLIN}], 3,
> > NULL, NULL, 8
>
> And this state has been the case for about 20 hours.

I think we're looking at (another) output filter bug, but in main we
just removed all that code IIRC.

johannes

Johannes Berg

May 18, 2026, 6:26:30 AM (7 days ago) May 18
to Mark Hewitt, Greg Troxel, Rob Browning, bup-...@googlegroups.com
Right:

commit 0d473c6c1fa2a663312b333f349aea4a6367c1d0
Author: Rob Browning <r...@defaultvalue.org>
Date: Tue Mar 18 16:24:02 2025 -0500

Drop stdin/stderr output filtering

johannes

Mark Hewitt

May 18, 2026, 7:08:12 AM (7 days ago) May 18
to Johannes Berg, Greg Troxel, Rob Browning, bup-...@googlegroups.com
On 18/05/2026 11:21, Johannes Berg wrote:
>>> mjh@Rocinante:~$ sudo strace -p 43169
>>> strace: Process 43169 attached
>>> write(2, "midx: writing 9.92% (158040232/1"..., 43
> But this actually does something? Or is it basically stuck at this
> point? If I'm interpreting this correctly, the write() is just not
> finishing?

No, I don't think so - more likely that strace is buffering (via stdio
even though this is to a terminal??) and has not yet received enough to
write the next few lines.

Everything here is apparently just waiting.

> I think we're looking at (another) output filter bug, but in main we
> just removed all that code IIRC.

I could try the same thing on a test box with the same data and main,
though that only has 16Gb RAM, so would introduce other variables.

I did test main on there recently and saw an issue due to a str/byte
typing problem in a path, but I have not followed that up as I expect
standard testing would identify that.

Mark

Johannes Berg

May 18, 2026, 7:11:45 AM (7 days ago) May 18
to Mark Hewitt, Greg Troxel, Rob Browning, bup-...@googlegroups.com
On Mon, 2026-05-18 at 12:08 +0100, Mark Hewitt wrote:
> On 18/05/2026 11:21, Johannes Berg wrote:
> > > > mjh@Rocinante:~$ sudo strace -p 43169

Just noticed that machine name :-)

> Everything here is apparently just waiting.

Right, that's what I thought.

> I could try the same thing on a test box with the same data and main,
> though that only has 16Gb RAM, so would introduce other variables.

I guess it'd be _more_ stressful, but it would certainly change timing
so perhaps not be all that useful, unless you could reproduce this issue
there as well first.

> I did test main on there recently and saw an issue due to a str/byte
> typing problem in a path, but I have not followed that up as I expect
> standard testing would identify that.

It seems it hasn't so far, maybe if you see it again send it to the
list.

johannes

Mark Hewitt

May 19, 2026, 8:58:22 AM (6 days ago) May 19
to Johannes Berg, Greg Troxel, Rob Browning, bup-...@googlegroups.com
On 18/05/2026 12:11, Johannes Berg wrote:
> On Mon, 2026-05-18 at 12:08 +0100, Mark Hewitt wrote:
>> On 18/05/2026 11:21, Johannes Berg wrote:
>>>>> mjh@Rocinante:~$ sudo strace -p 43169
> Just noticed that machine name :-)

I'll leave you to decide if it were Cervantes or Corey that inspired that !

>> Everything here is apparently just waiting.
> Right, that's what I thought.

FYI, two days later (it is 19th May as I write this), this is what the
directory looks like where the idx and midx live:

-rw-rw-r-- 1 mjh  mjh  36487381225 Apr  1 17:07 midx-f20258f451b0dce6ee866fd3e27805f03754c552.midx
-rw-r--r-- 1 root root       37019 May 17 22:09 midx-757b4572bd2f77f1cf337f11143b0210f2e69e00.midx
-rw-r--r-- 1 root root       37019 May 17 22:09 midx-b415562d1e07a1820c764797fcab1613c95b3442.midx
-rw-r--r-- 1 root root       37019 May 17 22:09 midx-819b9bef57b35221489950bb8bd55c1943066faa.midx
-rw-r--r-- 1 root root       37019 May 17 22:09 midx-929751c99fe5873f829731fd0301dc8308d15b41.midx
-rw-r--r-- 1 root root       55955 May 17 22:09 midx-8481c9f5a65a8d401606dd4227a1f65c4d19d3ac.midx
-rw-r--r-- 1 root root     9612467 May 17 22:09 midx-9ae873a01bf58d190d4d8c8e041395141ab19bfc.midx
-rw-r--r-- 1 root root  1160488075 May 17 22:09 midx-932aee33acfef7a9c7b52799df975d4be55fd56f.midx
-rw-r--r-- 1 root root  1580648747 May 17 22:09 midx-5259f93c8c475817e700d4d7fa15d083b62e5873.midx
-rw-r--r-- 1 root root  1584014339 May 17 22:09 midx-d291d1661e5a4421f32cae599c9efb227dd87360.midx
-rw-r--r-- 1 root root  1585668611 May 17 22:10 midx-c8122380a6e4a86a70770596532b5608bf0da6b9.midx
-rw-r--r-- 1 root root  1586838179 May 17 22:10 midx-411eef4ff59ea28b379dcbc381346b0b0abb99ed.midx
-rw-r--r-- 1 root root  1587860483 May 17 22:10 midx-ffd51b478eff8b4cc4e4e48be40bfe90f75e97bd.midx
-rw-r--r-- 1 root root  1588879811 May 17 22:10 midx-84fc0c713270b1800edbd9f3f2d1f06c8103ad3c.midx
-rw-r--r-- 1 root root  1589754371 May 17 22:10 midx-595bf6ef81ef20a8b346dbf8853f0ac0d155881f.midx
-rw-r--r-- 1 root root  1590610451 May 17 22:11 midx-74ba71e370f4f92c115b4ed0290fbf90f51e5f9e.midx
-rw-r--r-- 1 root root  1591421675 May 17 22:11 midx-29ef18dd2a4a0542f1dde8d40ce78872848f9418.midx
-rw-r--r-- 1 root root  1592312795 May 17 22:11 midx-23f0811a8802931b77be21b17784a72fd05014c4.midx
-rw-r--r-- 1 root root  1593225083 May 17 22:11 midx-2aa7b4ce46dd2d7cefe0d9104d49b495a240c5c2.midx
-rw-r--r-- 1 root root  1594182275 May 17 22:11 midx-f2f96648daefda8bf0f912e834d0dc13322c5b4f.midx
-rw-r--r-- 1 root root  1595254739 May 17 22:12 midx-64bb4c4944c3ba3a5e4b4829587126e828178568.midx
-rw-r--r-- 1 root root  1596512747 May 17 22:12 midx-56f9dd59eb289dfd1d1a89abc380fc57968b7fe8.midx
-rw-r--r-- 1 root root  1598448923 May 17 22:12 midx-b2c5a2e18345a7de0e3f7d808b858cdd5ebc6c46.midx
-rw-r--r-- 1 root root  1601214755 May 17 22:12 midx-04a622f9c89a34e5d74eaecab1b495bdf1bf6b58.midx
-rw-r--r-- 1 root root  1605553091 May 17 22:12 midx-47ee7027d0f4c552d8e7d4a03b222ecfa578e838.midx
-rw-r--r-- 1 root root  1609932035 May 17 22:13 midx-3636f442fa84f965dd69d2c7736e0efbf8d71591.midx
-rw-r--r-- 1 root root  1614319787 May 17 22:13 midx-5eb47db4f5e29cb215102d7034a9322690dc8e55.midx
-rw-r--r-- 1 root root  1619919779 May 17 22:13 midx-ab846bd85a8275f6ff6ea5ddb5f5dc6b72fa6085.midx
-rw-r--r-- 1 root root  1634596691 May 17 22:13 midx-2493ccdcfea573b72bbac68979a3c7c8704eba62.midx
drwx------ 2 root root        4096 May 17 22:13 midx-22c570376eb4e16d939f70c522f866f4d7f372bc.midx-jmh1q6c3

with that last tempdir containing:

-rw-r--r-- 1 root root 38274083900 May 17 22:13 pending

And note that to emulate Greg's experience, the midx command line
included "-a -f --max-files=500".

Mark.

Greg Troxel

May 19, 2026, 9:12:50 AM (6 days ago) May 19
to Mark Hewitt, Johannes Berg, Rob Browning, bup-...@googlegroups.com
Mark Hewitt <mjh.br...@gmail.com> writes:

>
> And note that to emulate Greg's experience, the midx command line
> included "-a -f --max-files=500".

interesting. Well, sorry you're having trouble but glad it's not just
me...

I am thinking that 500 is not a universal magic, but the point is to
make creating of any one midx small enough to fit in whatever stealth
limit/bug we are hitting.

But, midx is run in the background when saving from a remote (or
locally?), and that doesn't get the --max-files.


I wonder if you static-patch max-files to 100 (so the code does that
without an arg) and rm your midx, if things will be ok for you.

Rob Browning

May 19, 2026, 1:24:49 PM (6 days ago) May 19
to Mark Hewitt, Johannes Berg, Greg Troxel, bup-...@googlegroups.com
Mark Hewitt <mjh.br...@gmail.com> writes:

> On 18/05/2026 11:21, Johannes Berg wrote:

>> I think we're looking at (another) output filter bug, but in main we
>> just removed all that code IIRC.
>
> I could try the same thing on a test box with the same data and main,
> though that only has 16Gb RAM, so would introduce other variables.

To evaluate whether it might be the filtering (won't be at all
surprised), could one of you try the same command but with
BUP_FORCE_TTY=3 set in the environment?

BUP_FORCE_TTY=3 bup ...

or equivalent? I think that should disable it.
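
For anyone driving bup from a script rather than a shell, the same test can be set up by passing a modified environment to the child process. This is a sketch only: `midx_command` and `run_midx` are hypothetical helper names, the repository path is the one mentioned earlier in the thread, and the only thing taken from the suggestion above is the variable name and value BUP_FORCE_TTY=3.

```python
import os
import subprocess

def midx_command(repo, extra_args=('-a',)):
    """Return (argv, env) for running bup midx with BUP_FORCE_TTY=3 set."""
    env = dict(os.environ, BUP_FORCE_TTY='3')   # copy, do not mutate os.environ
    argv = ['bup', '-d', repo, 'midx', *extra_args]
    return argv, env

def run_midx(repo, extra_args=('-a',)):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    argv, env = midx_command(repo, extra_args)
    return subprocess.run(argv, env=env, check=True)

argv, env = midx_command('/b0/bup-linuxpal16', ('-a', '-f'))
print('BUP_FORCE_TTY=' + env['BUP_FORCE_TTY'], ' '.join(argv))
```

Only `midx_command` is exercised here; `run_midx` would actually start bup and is left for the reader to call against a real repository.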

Greg Troxel

May 19, 2026, 7:26:47 PM (6 days ago) May 19
to Rob Browning, Mark Hewitt, Johannes Berg, bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> Mark Hewitt <mjh.br...@gmail.com> writes:
>
>> On 18/05/2026 11:21, Johannes Berg wrote:
>
>>> I think we're looking at (another) output filter bug, but in main we
>>> just removed all that code IIRC.
>>
>> I could try the same thing on a test box with the same data and main,
>> though that only has 16Gb RAM, so would introduce other variables.
>
> To evaluate whether it might be the filtering (won't be at all
> surprised), could one of you try the same command but with
> BUP_FORCE_TTY=3 set in the environment?
>
> BUP_FORCE_TTY=3 bup ...
>
> or equivalent? I think that should disable it.

Magic! -- with that, it writes a single midx, with progress indication,
no drama.

without, it just sits there.

Rob Browning

May 19, 2026, 8:18:45 PM (6 days ago) May 19
to Greg Troxel, Mark Hewitt, Johannes Berg, bup-...@googlegroups.com
Greg Troxel <g...@lexort.com> writes:

> Magic! -- with that, it writes a single midx, with progress indication,
> no drama.

Nice (and not nice). Thanks for testing.

Now we'll have to figure out what to do about it in 0.33.x. One option
would be to just disable output filtering, but I don't know offhand how
big a mess that might make in the terminal.

I also don't relish yet another round of trying to fix the filtering,
particularly since we've dropped it in main (assuming that sticks).

But at least we have a workaround for now.

Mark Hewitt

May 20, 2026, 3:43:02 AM (5 days ago) May 20
to Rob Browning, Johannes Berg, Greg Troxel, bup-...@googlegroups.com
I was just waiting to ensure that the midx --check passed after trying
this, and sure enough, the midx generation seems to complete, and the
check passes.

I did expect that the midx already in situ would have been removed due to
the "-f", but in fact it was left behind.
And interestingly, that midx and the new one have very different sizes
(new is almost exactly double) and both pass the midx check: this is a
static copy of the .idx files, so I can't ascribe further saves to this
discrepancy.

-rw-rw-r-- 1 mjh  mjh  36487381225 Apr  1 17:07 midx-f20258f451b0dce6ee866fd3e27805f03754c552.midx
-rw-r--r-- 1 root root 72974762439 May 19 19:30 midx-6242ec6abbc00a2fb751243e904db658e4a2d6a4.midx
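
How close "almost exactly double" is can be worked out from the two sizes in the listing. This is plain arithmetic on the numbers shown; the variable names are illustrative, and no explanation for the difference is implied.

```python
old_size = 36487381225   # midx dated Apr 1
new_size = 72974762439   # midx dated May 19

ratio = new_size / old_size
shortfall = 2 * old_size - new_size   # bytes by which the new file misses 2x

print(f'ratio = {ratio:.10f}, 2*old - new = {shortfall} bytes')
```

The new file is 11 bytes short of exactly twice the old one.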

Mark