Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

PCI arbiter crash on last qemu image

0 views
Skip to first unread message

Joan Lledó

unread,
Aug 15, 2020, 11:29:19 AM8/15/20
to bug-...@gnu.org
Hello,

I downloaded and tried the last qemu image "debian-hurd-20200731.img".
When I try to read the memory mapped content of region files in the
arbiter, it crashes and shows the message "Real-time signal 0".

This happens when executing "hexdump -Cn 256
/servers/bus/pci/0000/00/02/0/region0"

Region files for IO-ports seems to work fine, it only fails with
/dev/mem mapped regions.

Any ideas on why?

Damien Zammit

unread,
Aug 15, 2020, 10:46:38 PM8/15/20
to jll...@member.fsf.org, bug-...@gnu.org
Hi there,

On 15/8/20 9:49 pm, Joan Lledó wrote
> I downloaded and tried the last qemu image "debian-hurd-20200731.img".
> When I try to read the memory mapped content of region files in the
> arbiter, it crashes and shows the message "Real-time signal 0".

I am also getting this on my latest hurd system that I have been working on.

I ran gdb on pci-arbiter pid, put breakpoints on S_pci_conf_read and S_pci_dev_get_regions
but seemed to have no effect, and when I continue and then run the hexdump,
I get no useful backtrace, could it be a recursion problem with stack overflow?

Thread 1 received signal ?, Unknown signal.
memcpy () at ../sysdeps/i386/i686/memcpy.S:71
71 ../sysdeps/i386/i686/memcpy.S: No such file or directory.
(gdb) bt
#0 memcpy () at ../sysdeps/i386/i686/memcpy.S:71
#1 0x08059588 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Any ideas?

Damien


Joan Lledó

unread,
Aug 16, 2020, 11:04:36 AM8/16/20
to Damien Zammit, bug-...@gnu.org


El 16/8/20 a les 4:46, Damien Zammit ha escrit:
I found the same issue, investigating a bit more I found that in
func_files.c:201[1], the value of region->memory is 0x0, so reading from
there raises a segfault. That pointer should be filled in libpciacces,
at x86_pci.c:601[2] during the startup, but for some reason it seems it
doesn't. Regrettably I don't have the time to go further right know.

Could it be some issue with /dev/mem?

-------------------
[1]
http://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/pci-arbiter/func_files.c#n201
[2]
https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/blob/master/src/x86_pci.c#L601

Damien Zammit

unread,
Aug 16, 2020, 7:51:38 PM8/16/20
to jll...@member.fsf.org, bug-...@gnu.org
Hi there,

On 17/8/20 1:04 am, Joan Lledó wrote:
> I found the same issue, investigating a bit more I found that in
> func_files.c:201[1], the value of region->memory is 0x0, so reading from
> there raises a segfault. That pointer should be filled in libpciacces,
> at x86_pci.c:601[2] during the startup, but for some reason it seems it
> doesn't. Regrettably I don't have the time to go further right know.
>
> Could it be some issue with /dev/mem?

It's probably due to this patch:

https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/merge_requests/12/diffs?commit_id=952f4553f91da3f3eb8615a8d53a6da7f7bc0080

(We are not using master of pciaccess).

I removed the mapping and set it to NULL because there was a bug that caused
the initial mapping within "probe" to prevent further mappings of the same region later on.

Perhaps a better way to fix the mapping problem I encountered
is by removing the check for previous mappings when trying to map regions,
but no one commented on my PR upstream (yet).

Damien

Joan Lledó

unread,
Aug 17, 2020, 4:51:45 PM8/17/20
to Damien Zammit, bug-...@gnu.org
El 17/8/20 a les 1:51, Damien Zammit ha escrit:
> It's probably due to this patch:

It's surely for that

> Perhaps a better way to fix the mapping problem I encountered
> is by removing the check for previous mappings when trying to map regions,

I could check the pointer before reading from it at func_files.c, and if
it happens to be null, then call the libpciaccess mapping function from
there, i.e. mapping the memory in the first access instead of doing it
during the startup. So there's no need to make any changes in your
patch, do you think that'd work?

Damien Zammit

unread,
Aug 18, 2020, 12:28:39 AM8/18/20
to jll...@member.fsf.org, bug-...@gnu.org
On 18/8/20 6:51 am, Joan Lledó wrote:
> El 17/8/20 a les 1:51, Damien Zammit ha escrit:
>> Perhaps a better way to fix the mapping problem I encountered
>> is by removing the check for previous mappings when trying to map regions,
>
> I could check the pointer before reading from it at func_files.c, and if
> it happens to be null, then call the libpciaccess mapping function from
> there, i.e. mapping the memory in the first access instead of doing it
> during the startup. So there's no need to make any changes in your
> patch, do you think that'd work?
>

That would probably work, but I don't want to break other usages of pciaccess
by including my change upstream, doing it your way would mean every other use case
for pciaccess would need to do the same thing as you are suggesting.

Perhaps we can just remove the check instead after all in pciaccess,
as that would be compatible with existing code.

Damien

Damien Zammit

unread,
Aug 22, 2020, 2:35:24 AM8/22/20
to jll...@member.fsf.org, bug-...@gnu.org
Joan,

On 18/8/20 6:51 am, Joan Lledó wrote:
> El 17/8/20 a les 1:51, Damien Zammit ha escrit:
>> Perhaps a better way to fix the mapping problem I encountered
>> is by removing the check for previous mappings when trying to map regions,

I have removed my latest patch from my upstream merge request and replaced it
with a patch that fixes the problem:

https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/merge_requests/12/commits

Samuel, would you be able to fix the debian package of libpciaccess based on the above?
(ie, remove one patch and add one patch).

I now get the following with my version of pciaccess (and rumpdisk still works with x86 method):

root@zamhurd:~# hexdump -C -n16 /servers/bus/pci/0000/00/02/0/region0
00000000 45 02 14 18 00 00 00 00 83 02 00 00 00 00 00 00 |E...............|
00000010
root@zamhurd:~#

Damien

Joan Lledó

unread,
Aug 22, 2020, 6:38:28 AM8/22/20
to bug-...@gnu.org
Hi,

> I have removed my latest patch from my upstream merge request and
replaced it
> with a patch that fixes the problem:

I took a look at your patch.

> mappings[devp->num_mappings].flags = map_flags;
> mappings[devp->num_mappings].memory = NULL;
>
> - if (dev->regions[region].memory == NULL) {
> - err = (*pci_sys->methods->map_range)(dev,
> -
&mappings[devp->num_mappings]);
> - }
> + err = (*pci_sys->methods->map_range)(dev,
&mappings[devp->num_mappings]);
>
> if (err == 0) {
> *addr = mappings[devp->num_mappings].memory;

I've spent some time studying memory management in libpciaccess and
haven't found a reason to make range mappings and region mappings
mutually exclusive. This completely disables range mappings in the x86
backend, since a region mapping for each region is done during the bus
scan.

However, I think the problem here is the x86 backend, not the common
interface. If we take a look at all other backends we'll see that:

1.- Neither of them call its probe() from its create(). So it's the
client who must call pci_device_probe(), it's our arbiter who should do
it and it doesn't.

2.- Neither of them map memory from its probe(). So again it's the
client who must call either pci_device_map_range() or
pci_device_map_region(), and again it's our arbiter who is not doing it.

This is due to our transition from having a libpciaccess copy of x86
backend embedded in the arbiter to use libpciaccess. When it was
embedded, I modified it to probe and map everything during the bus scan,
but the original code in libpciaccess[1] didn't do it. So we are not
using libpciaccess properly and we modified libpciaccess to fit us
instead of the other way around.

It's the commit in [2] that introduced the problem. And that affected to
all clients who use the x86 backend and mapped memory, though it seems
we are the only ones or at least the first ones to worry.

I don't think modifying the common interface of libpciaccess is the
solution, b/c that would affect all clients, not only those using the
x86 backend, I don't see why range and region mappings are mutually
exclusive but they are and clients assume that. So what we should do
instead is modify the x86 backend to behave as others and adapt the
arbiter to use libpciaccess right.

Would you like to fix the backend and I fix the arbiter?

-----
[1]
https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/blob/a167bd6474522a709ff3cbb00476c0e4309cb66f/src/x86_pci.c
[2]
https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/commit/95fbfeeacfd054de1037d6a10dee03b2b2cbc290


El 22/8/20 a les 8:35, Damien Zammit ha escrit:

Damien Zammit

unread,
Aug 22, 2020, 8:43:23 AM8/22/20
to jll...@member.fsf.org, bug-...@gnu.org
On 22/8/20 8:38 pm, Joan Lledó wrote:
> However, I think the problem here is the x86 backend, not the common
> interface. If we take a look at all other backends we'll see that:
>
> 1.- Neither of them call its probe() from its create(). So it's the
> client who must call pci_device_probe(), it's our arbiter who should do
> it and it doesn't.

OK, so we need to add this patch upstream:
---
src/x86_pci.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/src/x86_pci.c b/src/x86_pci.c
index ac598b1..1614729 100644
--- a/src/x86_pci.c
+++ b/src/x86_pci.c
@@ -814,10 +814,6 @@ pci_system_x86_scan_bus (uint8_t bus)

d->base.device_class = reg >> 8;

- err = pci_device_x86_probe (&d->base);
- if (err)
- return err;
-
pci_sys->devices = devices;
pci_sys->num_devices++;

--
2.25.1


> 2.- Neither of them map memory from its probe(). So again it's the
> client who must call either pci_device_map_range() or
> pci_device_map_region(), and again it's our arbiter who is not doing it.

And so my original patch for this part was correct:
---
src/x86_pci.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/src/x86_pci.c b/src/x86_pci.c
index 0d9f2e7..ac598b1 100644
--- a/src/x86_pci.c
+++ b/src/x86_pci.c
@@ -630,9 +630,6 @@ pci_device_x86_region_probe (struct pci_device *dev, int reg_num)
if (err)
return err;
}
-
- /* Clear the map pointer */
- dev->regions[reg_num].memory = 0;
}
else if (dev->regions[reg_num].size > 0)
{
@@ -649,15 +646,11 @@ pci_device_x86_region_probe (struct pci_device *dev, int reg_num)
if (err)
return err;
}
-
- /* Map the region in our space */
- if ( (err = map_dev_mem(&dev->regions[reg_num].memory,
- dev->regions[reg_num].base_addr,
- dev->regions[reg_num].size,
- 1)) )
- return err;
}

+ /* Clear the map pointer */
+ dev->regions[reg_num].memory = 0;
+
return 0;
}

--
2.25.1


> This is due to our transition from having a libpciaccess copy of x86
> backend embedded in the arbiter to use libpciaccess. When it was
> embedded, I modified it to probe and map everything during the bus scan,
> but the original code in libpciaccess[1] didn't do it. So we are not
> using libpciaccess properly and we modified libpciaccess to fit us
> instead of the other way around.

I see now. Sorry, I think I added the probe() in create()!

> Would you like to fix the backend and I fix the arbiter?
Can you confirm I have it correct this time? If so I will alter my merge request upstream.

Thanks,
Damien

Damien Zammit

unread,
Aug 22, 2020, 9:10:57 AM8/22/20
to jll...@member.fsf.org, bug-...@gnu.org
Hi Joan,

I found another probe() call in hurd_pci.c that should not be there.
(So I dropped a second incorrect patch).
Can you please confirm this final branch looks correct?

http://git.zammit.org/libpciaccess.git/log/?h=rumpdisk-upstream

Thanks,
Damien

Joan Lledó

unread,
Aug 22, 2020, 11:15:24 AM8/22/20
to bug-...@gnu.org
Hi,

El 22/8/20 a les 15:10, Damien Zammit ha escrit:
Yes, it looks correct for me. Since neither hurd nor x86 backends probe
on create() or map memory on probe() I'd say that's OK. Please give me
some days to write the arbiter part and test it over your version of
libpciaccess to ensure it works before sending patches to upstream.

Joan Lledó

unread,
Aug 23, 2020, 6:47:15 AM8/23/20
to bug-...@gnu.org
Hi, I made my changes on the arbiter and works fine, you can check my
code at

http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=jlledom-pciaccess-map

On the other hand, I found a couple of issues in your patch

In map_dev_mem():

+ memfd = open("/dev/mem", flags | O_CLOEXEC);
+ if (memfd == -1)
+ return errno;
+
+ *dest = mmap(NULL, mem_size, prot, MAP_SHARED, memfd, mem_offset);
+ if (*dest == MAP_FAILED) {
+ close(memfd);
+ *dest = NULL;
+ return errno;
+ }
+
+ return 0;

here close() is only called when the map fails, it should be called also
before returning 0, when the map success.

Also in map_dev_mem(), it seems to be some problem when mapping the rom.
I tried to read the rom with hexdump:

hexdump -Cn 16 /servers/bus/pci/0000/00/03/0/rom

When running this command, it sometimes returns all zeroes and other
times returns the correct values, I checked it with the debugger and
found that is the call to vm_map who not always sets *dest correctly.
You can checkout my branch and try yourself.

In pci_device_x86_read_rom() the memory is mapped and unmapped for each
read. I wonder if it's correct to unmap with munmap() something mapped
with vm_map()

El 22/8/20 a les 15:10, Damien Zammit ha escrit:
> Hi Joan,
>
> I found another probe() call in hurd_pci.c that should not be there.
> (So I dropped a second incorrect patch).
> Can you please confirm this final branch looks correct?
>
> http://git.zammit.org/libpciaccess.git/log/?h=rumpdisk-upstream
>
> Thanks,
> Damien
>

Damien Zammit

unread,
Aug 26, 2020, 5:13:48 AM8/26/20
to jll...@member.fsf.org, bug-...@gnu.org
Hi,

On 23/8/20 8:47 pm, Joan Lledó wrote:
> http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=jlledom-pciaccess-map

Thanks for doing this, I tried it locally and fixed two bugs in my libpciaccess patches:

diff --git a/src/x86_pci.c b/src/x86_pci.c
index 1614729..1e70f35 100644
--- a/src/x86_pci.c
+++ b/src/x86_pci.c
@@ -275,6 +275,7 @@ map_dev_mem(void **dest, size_t mem_offset, size_t mem_size, int write)
return errno;
}

+ close(memfd);
return 0;
#endif
}
@@ -505,7 +506,7 @@ pci_nfuncs(struct pci_device *dev, uint8_t *nfuncs)
static error_t
pci_device_x86_read_rom(struct pci_device *dev, void *buffer)
{
- void *bios;
+ void *bios = NULL;
struct pci_device_private *d = (struct pci_device_private *)dev;

int err;


> Also in map_dev_mem(), it seems to be some problem when mapping the rom.
> I tried to read the rom with hexdump:
>
> hexdump -Cn 16 /servers/bus/pci/0000/00/03/0/rom

This command now works (every time).

> In pci_device_x86_read_rom() the memory is mapped and unmapped for each
> read. I wonder if it's correct to unmap with munmap() something mapped
> with vm_map()

I think it's okay to munmap memory mapped with vm_map
but I forgot to initialise the rom memory pointer to NULL.

See http://git.zammit.org/libpciaccess.git/log/?h=rumpdisk-upstream again for updated branch.
If you think everything is okay with this, I will squash the last patch and submit patches upstream.

Thanks,
Damien

Joan Lledó

unread,
Aug 26, 2020, 5:17:45 PM8/26/20
to bug-...@gnu.org
Hi,

El 26/8/20 a les 11:13, Damien Zammit ha escrit:
> If you think everything is okay with this, I will squash the last patch and submit patches upstream.

Yes it's OK for me

0 new messages