
Re: 64bit startup


Samuel Thibault
Oct 1, 2023, 6:20:03 PM

Hello,

Good news! It seems I fixed the bug that was making my 64bit VM
crash quite often. The problem was that when receiving a message from
userland, ipc_kmsg_get would allocate a kernel buffer with the same size
as the userland message. But since we may expand the 32bit port names
into 64bit port addresses, that is not large enough.

I'm almost finished building the base set of Debian packages
required to build & upload packages to the debian-ports archive (just
missing cmake, which exposes some bootstrap bugs).

Before building the hurd-amd64 world, are we sure we are all set with
the ABI details? It'll be way harder to fix later on.

Samuel

Samuel Thibault
Oct 24, 2023, 6:30:04 PM

Hello,

Some update on the 64bit port:

- The debian-ports archive now has enough packages to bootstrap a
chroot.
- A 64bit debian buildd is getting set up, not much work is left there.
- The hurd-amd64 wanna-build infrastructure is to be set up in the
coming days.

*but*

Building packages is not very stable. I have been trying to build
gcc-13 for a couple of weeks, without success so far. There are various
failures, most often odd errors in the libtool script, which are a sign
that the system itself is not behaving correctly. One way to reproduce
the issue is to repeatedly build a package that uses libtool; sooner
or later the build will fail very oddly.
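A reproduction loop along those lines might look like this (the commands are placeholders; the loop is capped here so it always terminates):

```shell
# Sketch of a stress loop: rerun a build step until it fails or we hit a
# cap. 'true' stands in for the real build, e.g. dpkg-buildpackage.
i=0
max=50
while [ $i -lt $max ] && true; do
  i=$((i + 1))
done
echo "completed $i runs without failure"
```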

This means that while the buildd will be ready, I'm really not at ease
with letting it start, knowing that it can behave erratically. When I
built the initial set of packages for debian-ports (~100 packages), I
got something like 5-10 such failures, which is quite a high rate :/

Samuel

Jeffrey Walton
Oct 24, 2023, 9:30:04 PM

On Tue, Oct 24, 2023 at 6:21 PM Samuel Thibault <samuel....@gnu.org> wrote:
>
> Some update on the 64bit port:
>
> - The debian-ports archive now has enough packages to bootstrap a
> chroot.
> - A 64bit debian buildd is getting set up, not much work is left there.
> - The hurd-amd64 wanna-build infrastructure is to be set up in the
> coming days.

Congrats

> *but*
>
> Building packages is not very stable. I have been trying to build
> gcc-13 for a couple of weeks, without success so far. There are various
> failures, most often odd errors in the libtool script, which are a sign
> that the system itself is not behaving correctly. A way to reproduce
> the issue is to just repeatedly build a package that is using libtool,
> sooner or later that will fail very oddly.

lol... <https://harmful.cat-v.org/software/GCC> and
<https://harmful.cat-v.org/software/GNU/auto-hell>.

Jeff

Jessica Clarke
Oct 24, 2023, 9:40:04 PM

Yeah can we not spread this kind of vile rhetoric here? Regardless of
how much truth is in that, and whether it holds today, that kind of
language isn’t something we should be celebrating and encouraging
others to read. Let’s keep things more civil and on topic.

Jess

> <https://harmful.cat-v.org/software/GNU/auto-hell>.
>
> Jeff
>

Jeffrey Walton
Oct 24, 2023, 9:50:04 PM

My apologies for offending your delicate sensibilities.

Jeff

Jessica Clarke
Oct 24, 2023, 10:00:04 PM

1. I did not say I was offended. I said it was vile rhetoric. It does
not personally offend me, but that does not mean I want to see it
being circulated on these kinds of mailing lists.

2. Even if it did, so what? This list is covered by the Debian Mailing
Lists’ Code of Conduct, a superset of Debian’s; off-topic and
unwelcoming content is against those. Even if it wasn’t explicitly
written down as a rule,
though, the decent response to “don’t send unwelcoming content” isn’t
“I’m sorry you’re so sensitive” but “sorry, I won’t do it again”. So
kindly act decently or don’t contribute.

Jess

Jeffrey Walton
Oct 24, 2023, 10:10:04 PM

That's an overreaction. It looks like the stuff I would expect to see on
social media, like one of those binary confused persons crying someone
is perpetrating a hate crime because the wrong pronoun (subject?) was
used.

> 2. Even if it did, so what? Per the Debian Mailing Lists’ Code of
> Conduct, a superset of Debian’s. Off-topic and unwelcoming content is
> against those. Even if it wasn’t explicitly written down as a rule,
> though, the decent response to “don’t send unwelcoming content” isn’t
> “I’m sorry you’re so sensitive” but “sorry, I won’t do it again”. So
> kindly act decently or don’t contribute.

Exactly. So what?

Some folks have witty humor and appreciate the nostalgia. Others don't.

If you don't like my posts, then plonk me.

Jeff

jbr...@dismail.de
Oct 25, 2023, 12:10:04 AM

>> lol... <https://harmful.cat-v.org/software/GCC> and
>
> Exactly. So what?
>
> Some folks have witty humor and appreciate the nostalgia. Others don't.
>
> If you don't like my posts, then plonk me.
>
> Jeff

I followed the link, and I did think it was a little funny. Perhaps
it was a little off topic. Or maybe GCC is partly at fault for the
Hurd's X86_64 building troubles?

But can we all remember that we are friends here? We all want the
Hurd to succeed right?

Thanks,

Joshua

Martin Steigerwald
Oct 25, 2023, 3:40:04 AM

Hi Samuel, hi,

Samuel Thibault - 25.10.23, 00:04:33 CEST:
> Some update on the 64bit port:
>
> - The debian-ports archive now has enough packages to bootstrap a
> chroot.
> - A 64bit debian buildd is getting set up, not much work is left there.
> - The hurd-amd64 wanna-build infrastructure is to be set up in the
> coming days.

Congratulations! This is a great achievement. I appreciate it.

> *but*

Even though there are still challenges ahead, this is a great
accomplishment.

Thank you.

Best,
--
Martin

Samuel Thibault
Oct 25, 2023, 4:10:03 AM

Jeffrey Walton, on Tue 24 Oct 2023 22:00:54 -0400, wrote:
*You* are overreacting here.

Samuel

Samuel Thibault
Oct 25, 2023, 4:10:04 AM

jbr...@dismail.de, on Wed 25 Oct 2023 03:40:16 +0000, wrote:
> >> lol... <https://harmful.cat-v.org/software/GCC> and
> >
> > Exactly. So what?
> >
> > Some folks have witty humor and appreciate the nostalgia. Others don't.
> >
> > If you don't like my posts, then plonk me.
> >
> > Jeff
>
> I followed the link, and I did think it was a little funny. Perhaps
> it was a little off topic. Or maybe GCC is partly at fault for the
> Hurd's X86_64 building troubles?

It's not at all. Nor is libtool.

I occasionally had issues in ./configure, too.

You'll say that's "yeah, it's all about auto-crap". No.

It's *very* probably simply about bash.

(And while the gcc page makes sense (all very complex software has
a lot of bugs, so compilers do too, but the message has to be heavily
moderated, considering the huge amount of testing that gcc receives), the
auto-hell page doesn't: nowhere was it ever suggested that it's normal
for software to run several ./configure instances.)

So, yes, it was essentially off-topic.

Samuel

jbr...@dismail.de
Oct 25, 2023, 8:20:03 AM

October 25, 2023 3:43 AM, "Samuel Thibault" <samuel....@gnu.org> wrote:

> jbr...@dismail.de, le mer. 25 oct. 2023 03:40:16 +0000, a ecrit:
>
>> Or maybe GCC is partly at fault for the
>> Hurd's X86_64 building troubles?
>
> It's not at all. Nor is libtool.
>
> I occasionally had issues in ./configure, too.
>
> You'll say that's "yeah, it's all about auto-crap". No.
>
> It's *very* most probably about bash, simply.
>
> Samuel

Hmmm. I guess in the long term, then, the bash issues should be fixed.

Could we change the default shell on X86_64 Debian Hurd in the meantime,
as a temporary solution? Or is that a silly suggestion? I assume the ksh
shell is simpler, mainly because it is the default shell in OpenBSD.
Also, this is coming from someone who has yet to try the 64-bit Hurd. I
should really fire up qemu and give it a try.

Joshua

Samuel Thibault
Oct 25, 2023, 8:30:04 AM

jbr...@dismail.de, on Wed 25 Oct 2023 11:52:02 +0000, wrote:
> October 25, 2023 3:43 AM, "Samuel Thibault" <samuel....@gnu.org> wrote:
> > jbr...@dismail.de, le mer. 25 oct. 2023 03:40:16 +0000, a ecrit:
> >
> >> Or maybe GCC is partly at fault for the
> >> Hurd's X86_64 building troubles?
> >
> > It's not at all. Nor is libtool.
> >
> > I occasionally had issues in ./configure, too.
> >
> > You'll say that's "yeah, it's all about auto-crap". No.
> >
> > It's *very* most probably about bash, simply.
>
> Hmmm. I guess in the long-term then, the bash issues should be fixed.

It had really better be short-term, because currently we cannot really
trust the built packages: what if, due to shell script misbehavior,
./configure misdetects features, forgets to enable some support,
etc.? That'd lead to subtle incompatibilities that'll be hard to hunt
down.

> Could we change the default shell on X86_64 Debian Hurd in the meantime,
> as a temporary solution?

Don't get me wrong: I'm not saying the concern is *because* of bash,
but *concerning* bash. Another shell could very well face exactly
the same problems.

And no, we cannot just switch it: libtool uses bash features, so we
have to fix the behavior of the system for bash (and /bin/sh already
points to dash).

Samuel

Samuel Thibault
Oct 25, 2023, 9:20:05 AM

Samuel Thibault, on Wed 25 Oct 2023 14:05:35 +0200, wrote:
> jbr...@dismail.de, le mer. 25 oct. 2023 11:52:02 +0000, a ecrit:
> > October 25, 2023 3:43 AM, "Samuel Thibault" <samuel....@gnu.org> wrote:
> > > jbr...@dismail.de, le mer. 25 oct. 2023 03:40:16 +0000, a ecrit:
> > >
> > >> Or maybe GCC is partly at fault for the
> > >> Hurd's X86_64 building troubles?
> > >
> > > It's not at all. Nor is libtool.
> > >
> > > I occasionally had issues in ./configure, too.
> > >
> > > You'll say that's "yeah, it's all about auto-crap". No.
> > >
> > > It's *very* most probably about bash, simply.
> >
> > Hmmm. I guess in the long-term then, the bash issues should be fixed.
>
> It'd really better be short-term, because currently we cannot really
> trust the built packages: what if due to shell script misbehavior
> ./configure misdetects features, forgets enabling some support
> etc. That'd lead to subtle incompatibilities that'll be hard to hunt
> down.

Today's gcc attempt:

Comparing stages 2 and 3
Bootstrap comparison failure!
libbacktrace/.libs/sort.o differs

Samuel

Sergey Bugaev
Oct 25, 2023, 9:50:03 AM

On Wed, Oct 25, 2023 at 2:52 PM <jbr...@dismail.de> wrote:
>
> October 25, 2023 3:43 AM, "Samuel Thibault" <samuel....@gnu.org> wrote:
>
> > jbr...@dismail.de, le mer. 25 oct. 2023 03:40:16 +0000, a ecrit:
> >
> >> Or maybe GCC is partly at fault for the
> >> Hurd's X86_64 building troubles?
> >
> > It's not at all. Nor is libtool.
> >
> > I occasionally had issues in ./configure, too.
> >
> > You'll say that's "yeah, it's all about auto-crap". No.
> >
> > It's *very* most probably about bash, simply.
> >
> > Samuel
>
> Hmmm. I guess in the long-term then, the bash issues should be fixed.
>
> Could we change the default shell on X86_64 Debian Hurd in the meantime,
> as a temporary solution?

I would rather ask, would it not be possible to set up a continuous
build server (buildd? I know next to nothing about the Debian infra)
that itself runs on a more stable architecture (amd64, or hurd-i386)
and cross-compiles the packages?

Sergey

Samuel Thibault
Oct 25, 2023, 10:30:04 AM

Sergey Bugaev, on Wed 25 Oct 2023 16:29:29 +0300, wrote:
Cross-compiling *very* often produces slightly bogus packages. They are
enough to bootstrap something you can build upon, but you cannot hope
for more.

We do have a cross-compiling bootstrap set up on
https://jenkins.debian.net/view/rebootstrap/
but we don't want to upload the result, since when cross-compiling
there are various ./configure tests that you cannot run (they need
execution-time results).

Samuel

Samuel Thibault
Oct 26, 2023, 7:10:04 PM

Samuel Thibault, on Wed 25 Oct 2023 14:55:36 +0200, wrote:
Here is a good example:

f="internal/reflectlite.o"; if test ! -f $f; then f="internal/.libs/reflectlite.o"; fi; x86_64-gnu-objcopy -j .go_export $f internal/reflectlite.s-gox.tmp; /bin/bash ../../../src/libgo/mvifdiff.sh internal/reflectlite.s-gox.tmp `echo internal/reflectlite.s-gox | sed -e 's/s-gox/gox/'`
mv: cannot move 'internal/reflectlite.s-gox.tmp' to '': No such file or directory

It looks like the echo|sed part didn't work (or the command parameter
passing to mvifdiff.sh, but I doubt that).
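To illustrate, the substitution in question can be run in isolation (paths taken from the failing command above); under a correctly behaving shell it always yields the .gox name:

```shell
# The echo|sed substitution from the failing libgo rule; when the system
# behaves, this prints internal/reflectlite.gox, never an empty string.
f="internal/reflectlite.s-gox"
g=`echo $f | sed -e 's/s-gox/gox/'`
echo "$g"
```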

Samuel

Samuel Thibault
Oct 26, 2023, 7:30:05 PM

Samuel Thibault, on Wed 25 Oct 2023 00:04:33 +0200, wrote:
> Building packages is not very stable. I have been trying to build
> gcc-13 for a couple of weeks, without success so far. There are various
> failures, most often odd errors in the libtool script, which are a sign
> that the system itself is not behaving correctly. A way to reproduce
> the issue is to just repeatedly build a package that is using libtool,
> sooner or later that will fail very oddly.
>
> This means that while the buildd will be ready, I'm really not at ease
> with letting it start, knowing that it can behave erratically.

Actually, until gcc-13 actually builds, nothing else can build since
libc depends on libgcc.

Samuel

Damien Zammit
Oct 26, 2023, 9:12:59 PM

Please check the locore.S on 64-bit. I think the int stack checks may not be pointing to the right location. I remember making some changes a long time ago without updating 64-bit, because I had no way to test.

Damien



Samuel Thibault
Oct 27, 2023, 3:10:04 AM

Samuel Thibault, on Fri 27 Oct 2023 00:42:06 +0200, wrote:
Indeed,

while [ "$(echo -n `echo internal/reflectlite.s-gox | sed -e 's/s-gox/gox/' ` )" = internal/reflectlite.gox ] ; do : ; done

does stop.

Samuel

Samuel Thibault
Oct 29, 2023, 6:50:04 PM

Samuel Thibault, on Fri 27 Oct 2023 08:48:19 +0200, wrote:
> while [ "$(echo -n `echo internal/reflectlite.s-gox | sed -e 's/s-gox/gox/' ` )" = internal/reflectlite.gox ] ; do : ; done

For now, I could reproduce with

time while [ "$(echo -n `echo a` )" = a ] ; do : ; done

by running two of them in parallel, along with an apt install loop in
parallel. It takes a few hours to reproduce (sometimes 1, sometimes
3...)

Samuel

Samuel Thibault
Oct 30, 2023, 2:00:04 PM

Samuel Thibault, on Sun 29 Oct 2023 23:27:22 +0100, wrote:
It seems to happen more often when running inside a chroot (possibly
because of the intermediate firmlink redirection?), and possibly
eatmydata also makes it more frequent.

Samuel

Samuel Thibault
Oct 31, 2023, 12:00:04 AM

Samuel Thibault, on Mon 30 Oct 2023 18:35:03 +0100, wrote:
(it looks like there are memory leaks in proc, its vminfo keeps
increasing).

Samuel

Sergey Bugaev
Oct 31, 2023, 3:33:31 AM

This could be [0], considering [ is a Bash built-in and not /bin/[, so
it's Bash that both compares strings and receives SIGCHLDs.

[0]: https://lists.gnu.org/archive/html/bug-hurd/2023-06/msg00105.html
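As a quick check of that hypothesis's premise, one can confirm that bash resolves `[` to a builtin rather than forking /usr/bin/[:

```shell
# '[' is a bash builtin, so the comparison in the loop runs inside the
# same bash process that receives the SIGCHLDs from the command
# substitutions.
bash -c 'type -t ['
```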

Sergey

Samuel Thibault
Oct 31, 2023, 9:10:04 PM

Samuel Thibault, on Tue 31 Oct 2023 04:40:43 +0100, wrote:
It seems 64bit-specific: the program below makes proc leak memory, 100
vminfo lines at a time. Possibly __mach_msg_destroy doesn't actually
properly parse messages to be destroyed, so that in the error case the
server leaks non-inline data? Flavio, perhaps you have an idea?

Samuel


#include <hurd.h>
#include <stdio.h>
#include <errno.h>

#define N 1024

int main(void) {
    mach_port_t port = getproc();
    mach_port_t ports[N];
    int ints[N];

    for (int i = 0; i < N; i++)
        ports[i] = MACH_PORT_DEAD;

    for (int i = 0; i < 100; i++) {
        int ret = proc_setexecdata(port, ports, MACH_MSG_TYPE_COPY_SEND, N, ints, N);
        if (ret) {
            errno = ret;
            perror("setexecdata");
        }
    }
    return 0;
}

Samuel Thibault
Oct 31, 2023, 9:40:04 PM

Samuel Thibault, on Wed 01 Nov 2023 01:50:40 +0100, wrote:
> Samuel Thibault, le mar. 31 oct. 2023 04:40:43 +0100, a ecrit:
> > Samuel Thibault, le lun. 30 oct. 2023 18:35:03 +0100, a ecrit:
> > > Samuel Thibault, le dim. 29 oct. 2023 23:27:22 +0100, a ecrit:
> > > > Samuel Thibault, le ven. 27 oct. 2023 08:48:19 +0200, a ecrit:
> > > > > while [ "$(echo -n `echo internal/reflectlite.s-gox | sed -e 's/s-gox/gox/' ` )" = internal/reflectlite.gox ] ; do : ; done
> > > >
> > > > For now, I could reproduce with
> > > >
> > > > time while [ "$(echo -n `echo a` )" = a ] ; do : ; done
> > > >
> > > > by running two of them in parallel, along with an apt install loop in
> > > > parallel. It takes a few hours to reproduce (sometimes 1, sometimes
> > > > 3...)
> > >
> > > It seems to happen more often when running inside a chroot (possibly
> > > because of the intermediate firmlink redirection?), and possibly
> > > eatmydata also makes it more frequent.
> >
> > (it looks like there are memory leaks in proc, its vminfo keeps
> > increasing).
>
> It seems 64bit-specific: the program below makes proc leak memory, 100
> vminfo lines at a time. Possibly __mach_msg_destroy doesn't actually
> properly parse messages to be destroyed, so that in the error case the
> server leaks non-inline data? Flavio, perhaps you have an idea?

Realizing only now by reading the __mach_msg_destroy assembly...

    unsigned int msgt_inline : 1,
                 msgt_longform : 1,
                 msgt_deallocate : 1,
                 msgt_name : 8,
                 msgt_size : 16,
                 msgt_unused : 5;

This field ordering makes reading them overly complex... It'll be a pain
to rebootstrap, but perhaps we do want to put msgt_size and msgt_name
first?

Samuel

Samuel Thibault
Nov 1, 2023, 8:30:05 AM

Samuel Thibault, on Wed 01 Nov 2023 01:50:40 +0100, wrote:
> Samuel Thibault, le mar. 31 oct. 2023 04:40:43 +0100, a ecrit:
> > Samuel Thibault, le lun. 30 oct. 2023 18:35:03 +0100, a ecrit:
> > > Samuel Thibault, le dim. 29 oct. 2023 23:27:22 +0100, a ecrit:
> > > > Samuel Thibault, le ven. 27 oct. 2023 08:48:19 +0200, a ecrit:
> > > > > while [ "$(echo -n `echo internal/reflectlite.s-gox | sed -e 's/s-gox/gox/' ` )" = internal/reflectlite.gox ] ; do : ; done
> > > >
> > > > For now, I could reproduce with
> > > >
> > > > time while [ "$(echo -n `echo a` )" = a ] ; do : ; done
> > > >
> > > > by running two of them in parallel, along with an apt install loop in
> > > > parallel. It takes a few hours to reproduce (sometimes 1, sometimes
> > > > 3...)
> > >
> > > It seems to happen more often when running inside a chroot (possibly
> > > because of the intermediate firmlink redirection?), and possibly
> > > eatmydata also makes it more frequent.
> >
> > (it looks like there are memory leaks in proc, its vminfo keeps
> > increasing).
>
> It seems 64bit-specific: the program below makes proc leak memory, 100
> vminfo lines at a time. Possibly __mach_msg_destroy doesn't actually
> properly parse messages to be destroyed, so that in the error case the
> server leaks non-inline data? Flavio, perhaps you have an idea?

I don't think we have the kernel-to-user equivalent for
adjust_msg_type_size? So that we end up pushing twice too much data to
userland for port arrays?

Samuel Thibault
Nov 1, 2023, 11:00:05 AM

Samuel Thibault, on Wed 01 Nov 2023 13:14:17 +0100, wrote:
> Samuel Thibault, le mer. 01 nov. 2023 01:50:40 +0100, a ecrit:
> > Samuel Thibault, le mar. 31 oct. 2023 04:40:43 +0100, a ecrit:
> > > (it looks like there are memory leaks in proc, its vminfo keeps
> > > increasing).
> >
> > It seems 64bit-specific: the program below makes proc leak memory, 100
> > vminfo lines at a time. Possibly __mach_msg_destroy doesn't actually
> > properly parse messages to be destroyed, so that in the error case the
> > server leaks non-inline data? Flavio, perhaps you have an idea?
>
> I don't think we have the kernel-to-user equivalent for
> adjust_msg_type_size? So that we end up pushing twice too much data to
> userland for port arrays?

I found and fixed the allocation issue in the kernel. We however still
probably need some adjust_msg_type_size in copyoutmsg, otherwise
userland will see a 64bit size for ports?

Samuel

Samuel Thibault
Nov 1, 2023, 11:10:04 AM

Samuel Thibault, on Wed 01 Nov 2023 15:35:00 +0100, wrote:
Ah, it's already done within copyout_unpack_msg_type.

Samuel

Samuel Thibault
Nov 1, 2023, 11:30:04 AM

Samuel Thibault, on Wed 01 Nov 2023 15:35:00 +0100, wrote:
> Samuel Thibault, le mer. 01 nov. 2023 13:14:17 +0100, a ecrit:
> > Samuel Thibault, le mer. 01 nov. 2023 01:50:40 +0100, a ecrit:
> > > Samuel Thibault, le mar. 31 oct. 2023 04:40:43 +0100, a ecrit:
> > > > (it looks like there are memory leaks in proc, its vminfo keeps
> > > > increasing).
> > >
> > > It seems 64bit-specific: the program below makes proc leak memory, 100
> > > vminfo lines at a time. Possibly __mach_msg_destroy doesn't actually
> > > properly parse messages to be destroyed, so that in the error case the
> > > server leaks non-inline data? Flavio, perhaps you have an idea?
> >
> > I don't think we have the kernel-to-user equivalent for
> > adjust_msg_type_size? So that we end up pushing twice too much data to
> > userland for port arrays?
>
> I found and fixed the allocation issue in the kernel.

It seems proc is still leaking, but on the heap this time. This is not
64bit-specific, the same simple reproducer triggers it:

while [ "$(echo -n `echo a` )" = a ] ; do : ; done

or more simply:

while true ; do echo $(echo -n $(echo a)) > /dev/null ; done

Samuel

Samuel Thibault
Nov 1, 2023, 2:40:03 PM

Samuel Thibault, on Wed 01 Nov 2023 16:06:57 +0100, wrote:
I tracked it a bit; it seems that libports is not always cleaning
structures from the proc class. Below is the tracing that we get for
instance with the while loop above. Alloc is the allocation of pi, free
is the freeing from the point of view of the proc server, and clean is
the actual cleanup done by libports. I tell proc to print them whenever
one of them crosses a hundred boundary:

proc: alloc 651 free 600 clean 520
proc: alloc 700 free 648 clean 568
proc: alloc 731 free 679 clean 600
proc: alloc 751 free 700 clean 620
proc: alloc 800 free 748 clean 668
proc: alloc 831 free 779 clean 700
proc: alloc 851 free 800 clean 720
proc: alloc 900 free 848 clean 768
proc: alloc 931 free 879 clean 800
proc: alloc 951 free 900 clean 820
proc: alloc 1000 free 948 clean 868
proc: alloc 1031 free 979 clean 900
proc: alloc 1051 free 1000 clean 920
proc: alloc 1100 free 1048 clean 968
[...]
proc: alloc 2251 free 2200 clean 2120
proc: alloc 2300 free 2248 clean 2168
proc: alloc 2331 free 2279 clean 2200
proc: alloc 2351 free 2300 clean 2220
proc: alloc 2400 free 2348 clean 2268
proc: alloc 2431 free 2379 clean 2300
proc: alloc 2451 free 2400 clean 2320
proc: alloc 2500 free 2448 clean 2368
proc: alloc 2551 free 2500 clean 2368
proc: alloc 2600 free 2548 clean 2368
proc: alloc 2651 free 2600 clean 2368
[...]
proc: alloc 3400 free 3348 clean 2368
proc: alloc 3451 free 3400 clean 2368
proc: alloc 3500 free 3448 clean 2368
proc: alloc 3551 free 3500 clean 2368
proc: alloc 3600 free 3548 clean 2368

I.e. after a few seconds the cleaning stops. I stopped the loop
there, waited a few seconds, and restarted it, and got:

proc: alloc 3649 free 3597 clean 2400
proc: alloc 3651 free 3600 clean 2402
proc: alloc 3700 free 3648 clean 2450
proc: alloc 3749 free 3697 clean 2500
proc: alloc 3751 free 3700 clean 2502
proc: alloc 3800 free 3748 clean 2550
proc: alloc 3849 free 3797 clean 2600
proc: alloc 3851 free 3800 clean 2602
proc: alloc 3900 free 3848 clean 2650
proc: alloc 3949 free 3897 clean 2700
proc: alloc 3951 free 3900 clean 2702
proc: alloc 4000 free 3948 clean 2750

i.e. it restarts cleaning properly, but after some time the cleaning
stops again. Also, if I restart too quickly, the cleaning doesn't start
again. So it looks like the cleaning work somehow gets jammed.

Could it be that proc is flooded with dead-port notifications? That's
not many procs, but still. Maybe Sergey has an idea?

Samuel

Sergey Bugaev
Nov 1, 2023, 4:40:03 PM

Hello,

On Wed, Nov 1, 2023 at 9:17 PM Samuel Thibault <samuel....@gnu.org> wrote:
> I tracked it a bit, it seems that libport is not always cleaning
> structures from the proc class. Below is the tracing that we get for
> instance with the while loop above. Alloc is the allocation of pi, free
> is the freeing from the point of view of the proc server, and clean is
> the actual cleanup done by libports.
>
> Could it be that proc is overflown with dead port notifications? That's
> not many procs, but still. Maybe Sergey has an idea?

I don't think I understood what "freeing from the point of view of the
proc server" and "actual cleanup done by libports" mean (for one
thing, proc_class's clean_routine is NULL).

Perhaps you could post this as a patch to clear things up, and also
for me to try and reproduce (& debug) this?

Sergey

Samuel Thibault
Nov 1, 2023, 4:50:05 PM

Sergey Bugaev, on Wed 01 Nov 2023 23:18:01 +0300, wrote:
> On Wed, Nov 1, 2023 at 9:17 PM Samuel Thibault <samuel....@gnu.org> wrote:
> > I tracked it a bit, it seems that libport is not always cleaning
> > structures from the proc class. Below is the tracing that we get for
> > instance with the while loop above. Alloc is the allocation of pi, free
> > is the freeing from the point of view of the proc server, and clean is
> > the actual cleanup done by libports.
> >
> > Could it be that proc is overflown with dead port notifications? That's
> > not many procs, but still. Maybe Sergey has an idea?
>
> I don't think I understood what "freeing from the point of view of the
> proc server" and "actual cleanup done by libports" mean (for one
> thing, proc_class's clean_routine is NULL).

From the point of view of the proc server = complete_exit was called.
Actual cleanup = clean_routine called (I added one just to track it).

The difference between the two is probably some remaining port
reference for whatever reason.

> Perhaps you could post this as a patch to clear things up, and also
> for me to try and reproduce (& debug) this?

Here it is.

Samuel
[attachment: patch]

Flávio Cruz
Nov 5, 2023, 11:20:04 PM

Hi Samuel

I only moved msgt_size to the end of the struct, since a small number of (deprecated) RPC types
require msgt_size to be 2 bytes long, and those can be shortened to fit in 1 byte.
We could end up with a much larger contiguous msgt_unused that could be used for
other things in the future.

In relation to other changes we have to do for finishing the ABI... I think we have a reasonable
ABI now. However, I wonder if it would be simpler to just ask the user land to pass port names using
the following struct:

#ifdef KERNEL
union mach_rpc_port {
    mach_port_name_t name;
    mach_port_t kernel_port;
};
#else
struct mach_rpc_port {
    mach_port_name_t name;
    int unused;
};
#endif

It would make the kernel simpler since no message resizing is necessary and most of the code using this
would be MiG generated.

Flavio


Samuel

Samuel Thibault
Nov 6, 2023, 5:40:03 PM

Hello,

Flávio Cruz, on Sun 05 Nov 2023 23:17:49 -0500, wrote:
> On Tue, Oct 31, 2023 at 9:14 PM Samuel Thibault <[1]samuel....@gnu.org>
> wrote:
>
> > Realizing only now by reading the __mach_msg_destroy assembly...
> >
> >     unsigned int        msgt_inline : 1,
> >                         msgt_longform : 1,
> >                         msgt_deallocate : 1,
> >                         msgt_name : 8,
> >                         msgt_size : 16,
> >                         msgt_unused : 5;
> >
> > This field ordering makes reading them overly complex... It'll be a pain
> > to rebootstrap, but perhaps we do want to put msgt_size and msgt_name
> > first?
>
> I only moved msgt_size to the end of the struct

OK, so we might as well align the fields to make bit mangling simpler.

> However, I wonder if it would be simpler to just ask the user land to
> pass port names using the following struct:
>
> #ifdef KERNEL
> union mach_rpc_port {
>    mach_port_name_t name;
>    mach_port_t kernel_port;
> };
> #else
> struct mach_rpc_port {
>    mach_port_name_t name;
>    int unused;
> };
> #endif
>
> It would make the kernel simpler since no message resizing is
> necessary

I was thinking about this suggestion today, and I think that'll be
better for the long run indeed. There are questions about holes being
uninitialized, but:

> and most of the code using this would be MiG generated.

indeed.

I'm almost done with the ground set of Debian packages. Will wait until
this is settled before building the hurd-amd64 Debian world :)

Samuel

Samuel Thibault
Nov 26, 2023, 2:50:04 PM

Hello,

Samuel Thibault, on Wed 01 Nov 2023 16:06:57 +0100, wrote:
I found the issue: it's because of the quiescence support in libports,
which assumes that all threads will sooner or later go through a
quiescent state (after finishing processing a message). But that's not
true: proc doesn't set a thread timeout, and thus some threads can stay
indefinitely stuck receiving messages, so the deferred dereferencing
used by ports_destroy_right never gets performed.

I'll push a fix.

Samuel

Richard Braun
Dec 1, 2023, 8:30:04 AM

On Sun, Nov 26, 2023 at 08:32:30PM +0100, Samuel Thibault wrote:
> I found the issue, it's because of the quiescence support in libports,
> which assumes that all threads will sooner or later go through a
> quiescent state (because it finished processing a message). But that's
> not true, proc doesn't set a thread timeout, and thus some threads can
> stay indefinitely stuck in receiving messages. And thus the deferred
> dereferencing used by ports_destroy_right never gets achieved.
>
> I'll push a fix.

Very nice catch.

--
Richard Braun

Samuel Thibault
Jan 3, 2024, 3:20:04 AM

Hello,

I'm still stuck without being able to start packages building for
hurd-amd64 due to this unreliability.

Sergey Bugaev, on Tue 31 Oct 2023 10:09:17 +0300, wrote:
I tried

time while /usr/bin/\[ "$(echo -n `echo a` )" = a ] ; do : ; done

with the same result.

Samuel

Sergey Bugaev
Jan 3, 2024, 3:40:04 AM

Hello,

I guess this is where I ask (consistent with the subject line) about
how I would run the x86_64 system (to reproduce & debug this).

I've tried debootstrapping from
https://people.debian.org/~sthibault/tmp/hurd-amd64 as the wiki page
says; but that doesn't proceed beyond the rumpdisk. Rumpdisk just sits
there, slowly spitting out logs; ext2fs gives up waiting for it after
several minutes. (I can attach more logs/details if needed.) hurd-i386
w/ rumpdisk boots fine on the same (virtual) hardware.

How are you running it? Should I still be using a ramdisk image and
not rumpdisk?

Sergey

Samuel Thibault
Jan 3, 2024, 3:50:04 AM

Sergey Bugaev, on Wed 03 Jan 2024 11:17:53 +0300, wrote:
> I guess this is where I ask (consistent with the subject line) about
> how I would run the x86_64 system (to reproduce & debug this).

You probably want to start with the pre-built images I have linked from
the wiki page.

> I've tried debootstrapping from
> https://people.debian.org/~sthibault/tmp/hurd-amd64 as the wiki page
> says; but that doesn't proceed beyond the rumpdisk. Rumpdisk just sits
> there, slowly spitting out logs;

Does it detect disks? What qemu parameters are you using?

I'm using a mere

kvm -M q35 -drive file=disk-amd64.img -m 1

Samuel

Sergey Bugaev
Jan 3, 2024, 2:20:04 PM

On Wed, Jan 3, 2024 at 11:27 AM Samuel Thibault <samuel....@gnu.org> wrote:
> Sergey Bugaev, on Wed, Jan 3, 2024, 11:17:53 +0300, wrote:
> > I guess this is where I ask (consistent with the subject line) about
> > how I would run the x86_64 system (to reproduce & debug this).
>
> You probably want to start with the pre-built images I have linked from
> the wiki page.

Ah... I have been reading the wrong version of the wiki page again. It
doesn't help that there are many of them [0][1][2][3].

[0] https://www.gnu.org/software/hurd/faq/64-bit.html
[1] https://www.gnu.org/software/hurd/open_issues/64-bit_port.html
[2] https://darnassus.sceen.net/~hurd-web/faq/64-bit/
[3] https://darnassus.sceen.net/~hurd-web/open_issues/64-bit_port/

But your disk image works *great*! \o/ I don't know what is different
compared to what I was trying, but yours just works.

I haven't been able to reproduce your bug in a few hours of testing;
perhaps I need to try two of them in parallel and some I/O-heavy
workload in the background, as you're saying. Even if I do manage to
reproduce this, I don't immediately know how to debug it; maybe we
could try to use qemu's record/replay functionality to debug it
backwards from where we can detect it? (I have found the rr(1) tool
*immensely* useful for debugging GTK issues.)

Could it be that the two strings are actually different (something
being wrong with pipes perhaps)?

Sergey

Luca

Jan 3, 2024, 2:30:04 PM
Hi Sergey,

On 03/01/24 09:17, Sergey Bugaev wrote:
> How are you running it? Should I still be using a ramdisk image and
> not rumpdisk?

Recently I've been installing hurd-amd64 on another disk of my hurd-i386
vm and booting from that. Basically I prepare the disk with debootstrap
--foreign, then I reuse the i386 grub install to boot the 64 bit kernel
with a custom entry, then run the --second-stage, configure login, fstab
and network, and reboot. I can give you the exact commands and setup I'm
using if you want (I need to reinstall it anyway due to latest changes).

I'm currently using qemu via virt-manager, mostly with the default
configuration for an x86_64 vm; that means a virtual SATA disk
controller and Q35 chipset.

The only issue I see is that sometimes at shutdown rumpdisk hangs and I
can't halt the system; however, this seems to be the same with hurd-i686,
and it doesn't happen if I force the shutdown with halt-hurd (or
reboot-hurd). I haven't had a deeper look at this so far, but I don't
have issues booting or connecting via ssh. I haven't tried heavy builds,
which is probably why I don't see Samuel's issue, but I've been building
gdb and the Hurd itself (for a fix for the crash server that I have in
my queue).


Luca

Samuel Thibault

Jan 3, 2024, 2:40:05 PM
Luca, on Wed, Jan 3, 2024, 20:07:00 +0100, wrote:
> On 03/01/24 09:17, Sergey Bugaev wrote:
> > How are you running it? Should I still be using a ramdisk image and
> > not rumpdisk?
>
> Recently I've been installing hurd-amd64 on another disk of my hurd-i386 vm
> and booting from that. Basically I prepare the disk with debootstrap
> --foreign, then I reuse the i386 grub install to boot the 64 bit kernel with
> a custom entry, then run the --second-stage, configure login, fstab and
> network and reboot. I can give you the exact commands and setup I'm using if
> you want (I need to reinstall it anyway due to latest changes),

It could be useful to merge information into the wiki page.

Samuel

Samuel Thibault

Jan 4, 2024, 3:20:03 AM
Sergey Bugaev, on Wed, Jan 3, 2024, 21:56:54 +0300, wrote:
> perhaps I need to try two of them in parallel and some I/O-heavy
> workload in the background, as you're saying.

Yes, that's needed to raise the probability of the bug.

> Could it be that the two strings are actually different (something
> being wrong with pipes perhaps)?

I tried

A=a ; time while /usr/bin/\[ "$A" = a ] ; do A="$(echo -n `echo a` )" ; done ; echo $A

The output is empty. But yes, that could be some missing flush or such
in pipes.

Samuel

Sergey Bugaev

Jan 4, 2024, 5:20:04 AM
On Wed, Jan 3, 2024 at 10:07 PM Luca <lu...@orpolo.org> wrote:
> Hi Sergey,

Hi,

> Recently I've been installing hurd-amd64 on another disk of my hurd-i386
> vm and booting from that. Basically I prepare the disk with debootstrap
> --foreign, then I reuse the i386 grub install to boot the 64 bit kernel
> with a custom entry,

That (debootstrap + reusing existing GRUB from the i686 installation)
is what I was doing, yes, in one of the two setups that I've tried. On
the other (on a different host) I was doing grub2-install myself. In
both cases I got the same result, with GRUB working fine, but then
rumpdisk apparently misbehaving.

I could reproduce this if we want to debug it further, but Samuel's
image works great for now.

> then run the --second stage, configure login, fstab
> and network and reboot. I can give you the exact commands and setup I'm
> using if you want (I need to reinstall it anyway due to latest changes),
>
> I'm currently using qemu via virt-manager, mostly with the default
> configuration for an x86_64 vm; that means a virtual SATA disk
> controller and Q35 chipset.

Yes, I'd like to use libvirt eventually too, like I'm doing for my
i686 Hurd VM. But I need greater control over how I invoke QEMU for
now.

Sergey

P.S. I have posted all of my patches, so if you're interested in
hacking on aarch64-gnu Mach, you should be able to build the full
toolchain now.

Sergey Bugaev

Jan 4, 2024, 2:40:04 PM
On Thu, Jan 4, 2024 at 10:57 AM Samuel Thibault <samuel....@gnu.org> wrote:
>
> Sergey Bugaev, on Wed, Jan 3, 2024, 21:56:54 +0300, wrote:
> > perhaps I need to try two of them in parallel and some I/O-heavy
> > workload in the background, as you're saying.
>
> Yes, that's needed to raise the probability of the bug.

I'm still unable to reproduce this; it's been running for 10+ hours at
this point. That's two copies of it, plus unrelated activity in the
background.

> > Could it be that the two strings are actually different (something
> > being wrong with pipes perhaps)?
>
> I tried
>
> A=a ; time while /usr/bin/\[ "$A" = a ] ; do A="$(echo -n `echo a` )" ; done ; echo $A
>
> The output is empty. But yes, that could be some missing flush or such
> in pipes.

Try

A=abcd ; time while /usr/bin/\[ "$A" = abcd ] ; do A="$(echo -n `echo a``echo b`)$(echo -n `echo c``echo d`)" ; done ; echo $A

perhaps?

Sergey

Samuel Thibault

Jan 4, 2024, 5:10:03 PM
Hello,

Sergey Bugaev, on Thu, Jan 4, 2024, 22:21:11 +0300, wrote:
> On Thu, Jan 4, 2024 at 10:57 AM Samuel Thibault <samuel....@gnu.org> wrote:
> > Sergey Bugaev, on Wed, Jan 3, 2024, 21:56:54 +0300, wrote:
> > > perhaps I need to try two of them in parallel and some I/O-heavy
> > > workload in the background, as you're saying.
> >
> > Yes, that's needed to raise the probability of the bug.
>
> I'm still unable to reproduce this, it's been running for 10+ hours at
> this point. That's two copies of it, and unrelated activity in the
> background.

Which kind of activity? I use

while true ; do apt install --reinstall wdiff ; done

> > > Could it be that the two strings are actually different (something
> > > being wrong with pipes perhaps)?
> >
> > I tried
> >
> > A=a ; time while /usr/bin/\[ "$A" = a ] ; do A="$(echo -n `echo a` )" ; done ; echo $A
> >
> > The output is empty. But yes, that could be some missing flush or such
> > in pipes.
>
> Try
>
> A=abcd ; time while /usr/bin/\[ "$A" = abcd ] ; do A="$(echo -n `echo a``echo b`)$(echo -n `echo c``echo d`)" ; done ; echo $A
>
> perhaps?

got acd :)

I'll be testing with dash over the next hours.

Samuel

Sergey Bugaev

Jan 5, 2024, 4:30:03 AM
On Fri, Jan 5, 2024 at 12:52 AM Samuel Thibault <samuel....@gnu.org> wrote:
> Which kind of activity?

I just had a loop spawning /bin/true — this should've triggered it
assuming it was related to some state getting corrupted on
context-switching.

> I use
>
> while true ; do apt install --reinstall wdiff ; done

That did it! I can now reliably reproduce this.

(I assume you don't mind my box constantly banging on your repo.)

> got acd :)

In the six times that I've reproduced it so far, I got "acd" in all cases. Hmmm.

Sergey

Samuel Thibault

Jan 5, 2024, 5:10:04 AM
Sergey Bugaev, on Fri, Jan 5, 2024, 12:06:13 +0300, wrote:
> > I use
> >
> > while true ; do apt install --reinstall wdiff ; done
>
> That did it! I can now reliably reproduce this.
>
> (I assume you don't mind my box constantly banging on your repo.)

It's people.debian.org, it's meant for this :)

You can probably also pass apt an option to keep the downloaded .debs.

> > got acd :)
>
> In the six times that I've reproduced it so far, I got "acd" in all cases. Hmmm.

"interesting" :)

Samuel

Samuel Thibault

Jan 5, 2024, 9:20:06 AM
Samuel Thibault, on Thu, Jan 4, 2024, 08:57:51 +0100, wrote:
It happens with dash too.

Samuel

Sergey Bugaev

Jan 5, 2024, 1:40:03 PM
I'm not seeing hurd-dbg / hurd-libs-dbg packages in your repo. Could
you please either teach me where to look, or if they're indeed
missing, upload them?

Also I can't help but notice that the hurd package (i.e. the translator
binaries) is still not being built as PIE, unlike basically all the
other binaries. This actually helps with debugging for now, but please
remember to ensure it does end up as PIE when the system is ready for
broader use.

/servers/crash-dump-core crashes on the memset () call in
hurd:exec/elfcore.c:fetch_thread_fpregset (); the (*fpregs) pointer is
NULL. The caller passes fpregs = &note.data.pr_fpreg, where note.data
is of type struct elf_lwpstatus, defined in hurd:include/sys/procfs.h,
whose pr_fpreg field is of type prfpregset_t, which is a typedef to
fpregset_t, which was an actual struct on i386, but is a pointer on
x86_64. This would've been easier to debug if I had debuginfo :)

Sergey

Samuel Thibault

Jan 5, 2024, 4:20:04 PM
Sergey Bugaev, on Fri, Jan 5, 2024, 21:12:48 +0300, wrote:
> I'm not seeing hurd-dbg / hurd-libs-dbg packages in your repo.

Yes, my repo is built from the rebootstrap scripts, which drop debug
packages etc., since they are only meant for creating a booting system.

For proper packages, use the usual deb.debian.org debian-ports mirror.

Samuel

Samuel Thibault

Jan 5, 2024, 6:10:04 PM
Sergey Bugaev, on Fri, Jan 5, 2024, 21:12:48 +0300, wrote:
> Also I can't help but notice that the hurd package (i.e the translator
> binaries) is still not being built as PIE,

This is not actually specific to 64bit. This was set explicitly in 2016
in debian/rules; tests are welcome to check whether building the package
without it works now.

Samuel

Luca

Jan 5, 2024, 7:10:04 PM
On 04/01/24 10:55, Sergey Bugaev wrote:
> P.S. I have posted all of my patches, so if you're interested in
> hacking on aarch64-gnu Mach, you should be able to build the full
> toolchain now.

Sure, I've started looking into it, but it will take a while before I
can run something in userspace. I'm working on top of your gnumach patch
for now.


Luca

Luca

Jan 5, 2024, 7:10:04 PM
On 05/01/24 19:12, Sergey Bugaev wrote:
> /servers/crash-dump-core crashes on the memset () call in
> hurd:exec/elfcore.c:fetch_thread_fpregset (); the (*fpregs) pointer is
> NULL. The caller passes fpregs = &note.data.pr_fpreg, where note.data
> is of type struct elf_lwpstatus, defined in hurd:include/sys/procfs.h,
> whose pr_fpreg field is of type prfpregset_t, which is a typedef to
> fpregset_t, which was an actual struct on i386, but is a pointer on
> x86_64. This would've been easier to debug if I had debuginfo :)

I had this small patch applied; apparently it is enough for me to get
some kind of core dump, though I'm not sure it's a good solution:

diff --git a/exec/elfcore.c b/exec/elfcore.c
index c6aa2bc8b..405fa8e0c 100644
--- a/exec/elfcore.c
+++ b/exec/elfcore.c
@@ -544,6 +544,11 @@ dump_core (task_t task, file_t file, off_t corelimit,
       note.data.pr_info.si_code = sigcode;
       note.data.pr_info.si_errno = sigerror;
 
+#ifdef __x86_64__
+      struct _libc_fpstate fpstate;
+      memset(&fpstate, 0, sizeof(fpstate));
+      note.data.pr_fpreg = &fpstate;
+#endif
       fetch_thread_regset (threads[i], &note.data.pr_reg);
       fetch_thread_fpregset (threads[i], &note.data.pr_fpreg);


HTH
Luca

Samuel Thibault

Jan 5, 2024, 7:20:05 PM
Luca, on Sat, Jan 6, 2024, 00:42:35 +0100, wrote:
> On 05/01/24 19:12, Sergey Bugaev wrote:
> > /servers/crash-dump-core crashes on the memset () call in
> > hurd:exec/elfcore.c:fetch_thread_fpregset (); the (*fpregs) pointer is
> > NULL. The caller passes fpregs = &note.data.pr_fpreg, where note.data
> > is of type struct elf_lwpstatus, defined in hurd:include/sys/procfs.h,
> > whose pr_fpreg field is of type prfpregset_t, which is a typedef to
> > fpregset_t, which was an actual struct on i386, but is a pointer on
> > x86_64. This would've been easier to debug if I had debuginfo :)
>
> I had this small patch applied that apparently is enough for me to have some
> kind of core dump, I'm not sure if it's a good solution:

You probably rather want to fix fetch_thread_fpregset, so as to properly
put the floating-point state into pr_fpreg.

This probably needs to actually copy over explicit fields, but that's
what we need anyway.

Sergey Bugaev

Jan 6, 2024, 8:40:03 AM
On Sat, Jan 6, 2024 at 2:42 AM Luca <lu...@orpolo.org> wrote:
> I had this small patch applied that apparently is enough for me to have
> some kind of core dump, I'm not sure if it's a good solution:

> +#ifdef __x86_64__
> + struct _libc_fpstate fpstate;
> + memset(&fpstate, 0, sizeof(fpstate));
> + note.data.pr_fpreg = &fpstate;
> +#endif
> fetch_thread_regset (threads[i], &note.data.pr_reg);
> fetch_thread_fpregset (threads[i], &note.data.pr_fpreg);

Well, that should surely prevent the crash, but so will commenting out
the fetch_thread_fpregset call completely (perhaps this is what we
should actually do for the time being).

note.data is what we're writing out into the ELF core dump file; it
doesn't make any sense for it to be a pointer to a stack-allocated
variable in the dumper's address space. It doesn't make much sense for
it to be a pointer at all.

We need to figure out what kind of note GDB expects there to be, and
write that out. It doesn't necessarily have to match our definition of
fpregset_t; in fact I think it'd be better if builds of GDB without
any explicit support for x86_64-gnu would be able to read our core
files. And this means copying what Linux does, instead of apparently
Solaris, as the current elfcore tries to. Doing this might fix issues
with reading core files on i686-gnu, too.

> sure, I've started looking into it,

\o/

> but it will take a while before I
> can run something in userspace.

Sure, I don't expect you to produce a working kernel overnight :)

I kind of want to participate in kernel-side hacking too, but I'm far
from having the understanding required.

How does the initial boot protocol work, for example? I mean, where and
how control is transferred, what gets passed in registers, that kind
of thing. I've found many detailed explanations of board-specific boot
details (e.g. for Raspberry Pi 3), and [0] has a nice explanation of
how to make U-Boot load a custom kernel instead of Linux. I haven't
been able to find much info on GRUB; it sounds like Multiboot still
exists on AArch64, but it's unclear exactly how it works.

[0]: https://krinkinmu.github.io/2023/08/21/how-u-boot-loads-linux-kernel.html

Then, we're supposed to parse & use the device tree, aren't we? Do I
understand it right that both Mach and userland would have to deal
with the device tree? In Mach's case, this would be needed to get
info on CPU cores, RAM, and serial / console / UART, right?

Do the questions at least make sense? Or am I talking nonsense? Can
you recommend any useful resources?

Are you targeting (or: do you think it's realistic to target) many
platforms / boards in a generic manner, or would we have to have
platform-specific gnumach builds? Does it make sense to start with
some specific platform (qemu "virt"?), and expand from there, or is it
better to build in the genericity from the start?

Sergey

Joshua Branson

Jan 7, 2024, 12:10:03 PM
Luca <lu...@orpolo.org> writes:

> Hi Sergey,
>
> On 03/01/24 09:17, Sergey Bugaev wrote:
>> How are you running it? Should I still be using a ramdisk image and
>> not rumpdisk?
>
> Recently I've been installing hurd-amd64 on another disk of my
> hurd-i386 vm and booting from that. Basically I prepare the disk with
> debootstrap --foreign, then I reuse the i386 grub install to boot the
> 64 bit kernel with a custom entry, then run the --second-stage,
> configure login, fstab and network and reboot. I can give you the
> exact commands and setup I'm using if you want (I need to reinstall it
> anyway due to latest changes),

If you do send an email detailing how to install hurd-amd64, please CC
me, and I will edit the wiki.

>
> I'm currently using qemu via virt-manager, mostly with the default
> configuration for an x86_64 vm; that means a virtual SATA disk
> controller and Q35 chipset.
>
> The only issue I see is that sometimes at shutdown rumpdisk hangs and
> I can't halt the system; however, this seems to be the same with
> hurd-i686, and it doesn't happen if I force the shutdown with halt-hurd
> (or reboot-hurd). I haven't had a deeper look at this so far, but I
> don't have issues booting or connecting via ssh. I haven't tried heavy
> builds, which is probably why I don't see Samuel's issue, but I've been
> building gdb and the Hurd itself (for a fix for the crash server that
> I have in my queue).
>
>
> Luca
>

--

Joshua Branson
Sent from the Hurd