OSv page_fault assert during "bsd: initializing"?


David Smith

Feb 15, 2021, 6:51:42 AM
to OSv Development
I am running a Golang application on OSv under firecracker. Occasionally (it's proving difficult to reproduce reliably) I see an assertion failure early in the OSv startup sequence. The console output for my release build is as follows:

OSv v0.55.0-168-g31a04239
1 CPUs detected
Firmware vendor: Unknown
bsd: initializingAssertion failed: sched::preemptable() (arch/x64/mmu.cc: page_fault: 37)

[backtrace]

It appears to be hitting the assert well before my go application is launched, so my impression is that this isn't being caused by anything my application is doing, though I haven't been able to prove that the same behaviour occurs with a base image (no apps included).

As you can see the backtrace is empty (perhaps that's the norm for a release build?), so there are no additional clues. If I can find a way to reproduce with a debug build and/or with a base image, I will update this thread.

In most cases, my OSv image starts up without issue and in those cases, the console output around the same area looks as follows:

OSv v0.55.0-168-g31a04239
1 CPUs detected
Firmware vendor: Unknown
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done

I can see there are numerous previous discussions around the page fault when preemption is disabled assertion, but these seem to be tied to application behaviour.
Is an assert during "bsd: Initializing" a known issue for which there is already a patch?

Otherwise, if there is additional info. that I can capture to help isolate the cause, let me know.

Nadav Har'El

Feb 15, 2021, 7:16:55 AM
to David Smith, OSv Development
On Mon, Feb 15, 2021 at 1:51 PM David Smith <d787...@gmail.com> wrote:
I am running a Golang application on OSv under firecracker. Occasionally (it's proving difficult to reproduce reliably) I see an assertion failure early in the OSv startup sequence. The console output for my release build is as follows:

OSv v0.55.0-168-g31a04239
1 CPUs detected
Firmware vendor: Unknown
bsd: initializingAssertion failed: sched::preemptable() (arch/x64/mmu.cc: page_fault: 37)

[backtrace]

I wonder why the backtrace is empty. If you run with qemu, not firecracker, you can perhaps get a better backtrace with gdb - see the instructions in https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb. Does it fail in a similar way in qemu too, or just in firecracker?
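
Roughly, the workflow from that wiki page looks like this (just a sketch - see the page for the exact setup, e.g. making gdb load the helper commands from scripts/loader.py):

# Terminal 1: run OSv under QEMU (by default QEMU is started with a gdb stub
# listening on localhost:1234):
./scripts/run.py

# Terminal 2: attach gdb from the OSv source root:
gdb build/release/loader.elf
(gdb) connect     # attach to the QEMU gdb stub
(gdb) osv syms    # load OSv symbols
(gdb) bt          # backtrace of the thread that hit the assert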


It appears to be hitting the assert well before my go application is launched, so my impression is that this isn't being caused by anything my application is doing,

That's my impression too - bsd_init() finishes (and prints "done()") before starting to run the application.
 
though I haven't been able to prove that the same behaviour occurs with a base image (no apps included).

This assert appears to happen in bsd_init() (although it's hard to know without a backtrace) - and that runs with IRQs disabled, so it indeed should not have had page faults, and the fact that we got one might indicate a bug. Hopefully the debugger or a backtrace can explain why/where this page fault is happening.


As you can see the backtrace is empty (perhaps that's the norm for a release build?), so there are no additional clues. If I can find a way to reproduce with a debug build and/or with a base image, I will update this thread.

In most cases, my OSv image starts up without issue and in those cases, the console output around the same area looks as follows:

OSv v0.55.0-168-g31a04239
1 CPUs detected
Firmware vendor: Unknown
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done

I can see there are numerous previous discussions around the page fault when preemption is disabled assertion, but these seem to be tied to application behaviour.
Is an assert during "bsd: Initializing" a known issue for which there is already a patch?

Unfortunately, I don't remember this specific case being seen before. We recently had https://github.com/cloudius-systems/osv/issues/1123, where bsd_init() hangs, though, and I wonder if it's related.


Otherwise, if there is additional info. that I can capture to help isolate the cause, let me know.


David Smith

Feb 16, 2021, 3:59:22 AM
to OSv Development
Thanks for the quick reply. I haven't seen this failure when running under Qemu, but since I'm almost always running under Firecracker and it happens so infrequently under Firecracker, that's no surprise. I will try to reproduce under Qemu and hopefully collect more information.

On a related point, when this occurs (and I assume the behaviour would be the same with any kernel assert), the OS seems to be left in a hung state. What I'd like to ensure with my build is that if a fatal error occurs, the VM is powered off. Is there an option I can use to achieve this, or would it require code modification (perhaps injecting a call to poweroff() at an appropriate location)?

Waldek Kozaczuk

Feb 16, 2021, 5:26:29 PM
to OSv Development
The #1123 issue has been reported on the aarch64 port and (as far as I have seen) only happens in SMP mode (#vCPUs >= 2). In your case, you seem to be getting it with 1 CPU. But who knows if it is related.

As for automatic power off, have you tried prepending '--power-off-on-abort' to your command line?

Waldek

David Smith

Feb 17, 2021, 7:10:27 AM
to OSv Development
Thanks for the power-off tip, I'll give that a try.
I ran 2,500 cycles of starting the VM under Qemu yesterday and saw no occurrences of this issue.
I'll repeat the same test using firecracker and see if I can reproduce. Last time I saw the issue, it happened once in 12 VM launches. 

David Smith

Feb 17, 2021, 10:31:35 AM
to OSv Development
I ran the same test using firecracker. After 1,865 successful executions (OSv VM startup, my app dumping usage info to stdout then quitting, OSv VM shutdown/poweroff), the 1,866th failed with the same assert. I'll keep trying to find a more easily reproducible test (I think I'll try the sample Go app next, which cuts out anything I'm developing).
With a 1-in-1,866 failure rate, I don't think that's a clear-cut "it only happens under firecracker", so I'll also repeat both the firecracker and Qemu tests to get a more certain picture.

David Smith

Feb 26, 2021, 7:36:43 AM
to OSv Development
The power-off-on-abort flag worked, thanks for that tip.

On the crash issue, I built an OSv image with the standard native example: ./scripts/build image=native-example
Then ran continuous launches under firecracker using a bash loop: while true; do ./scripts/firecracker.py; done
After 1249 iterations, I got the same Assert at startup.

Then I ran continuous launches under qemu:  while true; do ./scripts/run.py; done
So far, I'm over 6,500 iterations without failure, so I think this issue is occurring under firecracker only, but that may be due to startup timing.

I tried the patch supplied for a similar issue ("[PATCH] pthread_create() - prefault top of stack") in case it helped, but the Assert still happens as described in my original post.

David Smith

Mar 9, 2021, 3:54:35 AM
to OSv Development
To add weight to the timing-issue theory, we are seeing the crash much more often (est. 1 in 20) when:
  • The -verbose OSv startup argument is supplied
or
  • We TCP connect to the OSv instance whilst it is starting up
Is there somewhere I can raise a bug report for this?

Waldek Kozaczuk

Mar 9, 2021, 2:14:59 PM
to OSv Development
Is there a chance I could get hold of the test app you are running on OSv? On the other hand, given it happens so early in the boot process (it seems, at least), it probably does not matter.

Also, which version of firecracker are you using? I am using v0.23.0 and the latest OSv master (you seem to be using a version of OSv that is ~30 commits behind - 168). It probably does not matter.

I think the backtrace does not work because the error happens pretty early in the boot process.

Can you try to modify OSv with this patch to dump registers at least:

diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
index 1af268c0..5fd0ee2e 100644
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -16,6 +16,8 @@
 #include <osv/elf.hh>
 #include "exceptions.hh"
 
+void dump_registers(exception_frame* ef);
+
 void page_fault(exception_frame *ef)
 {
     sched::fpu_lock fpu;
@@ -34,6 +36,9 @@ void page_fault(exception_frame *ef)
     }
     // The following code may sleep. So let's verify the fault did not happen
     // when preemption was disabled, or interrupts were disabled.
+    if (!sched::preemptable()) {
+        dump_registers(ef);
+    }
     assert(sched::preemptable());
     assert(ef->rflags & processor::rflags_if);
 
and send us the output? It would be best if you can use the RIP value to find which function was called last.

BTW I ran simple golang-pie-example (./scripts/build -j4 image=golang-pie-example fs=rofs) over 1,500 times and I could not see the error:
OSv v0.55.0-203-gbd2fa23f
1 CPUs detected
Firmware vendor: Unknown
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
Detected virtio-mmio device: (2,0)
virtio-blk: Add blk device instances 0 as vblk0, devsize=8432640
random: intel drng, rdrand registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
Booted up in 11.17 ms
Cmdline: /hello.so 
random: blocking on read.
random: device unblocked.
Hello, 世界
Go version: go1.12.6
program exited with status 0
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 4.84 ms reading from disk
ROFS: read 3760 512-byte blocks from disk
ROFS: allocated 3804 512-byte blocks of cache memory
ROFS: hit ratio is 86.20%
Powering off.

BTW are you using ROFS or default ZFS image?

BTW I used this modified firecracker.py script to run your example:
diff --git a/scripts/firecracker.py b/scripts/firecracker.py
index 7137b21c..5fa2a599 100755
--- a/scripts/firecracker.py
+++ b/scripts/firecracker.py
@@ -329,7 +329,16 @@ def main(options):
         if not options.api:
             if options.verbose:
                 print(client.firecracker_config_json())
-            firecracker, config_file_path = start_firecracker_with_no_api(firecracker_path, client.firecracker_config_json())
+            N = 1
+            while True:
+                print_time("-----------------------------------")
+                print_time("Running the %d-th time" % N)
+                firecracker, config_file_path = start_firecracker_with_no_api(firecracker_path, client.firecracker_config_json())
+                try:
+                    firecracker.wait()
+                except KeyboardInterrupt:
+                    os.kill(firecracker.pid, signal.SIGINT)
+                N = N + 1
         else:
             client.start_instance()
             print_time("Booted OSv VM")

On Tuesday, March 9, 2021 at 3:54:35 AM UTC-5 d787...@gmail.com wrote:
To add weight to the timing-issue theory, we are seeing the crash much more often (est. 1 in 20) when:
  • The -verbose OSv startup argument is supplied
or
  • We TCP connect to the OSv instance whilst it is starting up
Is there somewhere I can raise a bug report for this?

Sure. You can always create a new issue at https://github.com/cloudius-systems/osv/issues.

David Smith

Mar 10, 2021, 6:04:53 AM
to OSv Development
Thanks for the reply. Initially I was using firecracker 0.22.0, but recently updated to 0.23.1 and ROFS. The OSv issue appears to be the same with these two firecracker versions and I've seen the same behaviour with ZFS.

I'll upgrade to latest OSv source, add the dump registers patch and see what the result is.

I imagine using one of the httpserver sample apps and curl'ing the URL as OSv is starting would result in something more reproducible. My guess is that this is a timing related issue and so could be very sensitive to individual system configurations (host platform specs.).

David Smith

Mar 11, 2021, 8:23:02 AM
to OSv Development
OK, I reproduced a crash (after 517 iterations) on a clean Ubuntu 20.04 system with the latest OSv source (cloned yesterday evening) and Firecracker v0.23.3. I can't be sure it's the same issue (though it looks to have failed at the same point, so it's reasonable to assume it is), because the modifications you requested to the mmu.cc:page_fault() function changed the output that was produced.

Here's how I reproduced the issue:

Applied the two patches you supplied in your most recent post to the OSv source.
Made a small change to the provided "golang-httpexample" sample app to force it to quit upon receiving an HTTP request:

diff --git a/golang-httpserver/httpserver.go b/golang-httpserver/httpserver.go
index 49e5909..7c8487e 100644
--- a/golang-httpserver/httpserver.go
+++ b/golang-httpserver/httpserver.go
@@ -13,7 +13,8 @@ func hello(w http.ResponseWriter, r *http.Request) {
        hostname, err := os.Hostname()
         if( err == nil ) {
                io.WriteString(w, "Hello world from " + runtime.Version() + " at " + hostname)
-        }
+               os.Exit(1)      // Horrible hack for OSv startup test purposes only - Force quit the application
+       }
 }

Built OSv image using ./scripts/build -j4 image=golang-httpexample

Created a shell script 'curltest.sh' containing the following:
while true; do curl -m 1 -s http://172.16.0.2:8000; done

In shell session 1, executed the following command:
./scripts/firecracker.py -m 64M -n -V > osvstartuptests.log

In shell session 2, monitored the log:
tail -f osvstartuptests.log

In shell session 3, executed the following command:
curltest.sh

After 517 iterations, the log output contained the following:
OSv v0.55.0-205-g36432c3a
1 CPUs detected
Firmware vendor: Unknown
bsd: initializingexception nested too deeply
[backtrace]

Hopefully this is more reproducible for you.

David Smith

Mar 11, 2021, 8:24:56 AM
to OSv Development
Apologies, in the previous post, replace "golang-httpexample" references with "golang-httpserver".

Waldek Kozaczuk

Mar 15, 2021, 11:44:12 AM
to OSv Development
I noticed you pass 64M of memory which, given you use a ZFS image (based on what I see you build) that uses quite a few threads, may be a bit low (we have plans to further optimize memory usage). In my case, when I executed the app exactly like you did, I would sometimes get OSv hung in another place later, or get an OOM crash. Please note that sometimes OSv does not gracefully handle OOM crashes, but it is hard to tell if "bsd: initializingexception nested too deeply" is really caused by an out-of-memory scenario.

In any case, when I bumped memory to 256M - ./scripts/firecracker.py -m 256M -n -V > osvstartuptests.log - (btw, ROFS uses much less memory, so 64M would be just enough for the Golang httpserver app; the C native example app requires only 15M on firecracker), I could not reproduce your crash after 10K (ten thousand) runs.

I have also tested the ROFS image (fs=rofs) and could not see the crash there either, with 10K iterations and 64M of memory passed to firecracker.

Golang version: 1.12.17 (btw, I noticed that this httpserver app hangs when built with golang 1.15, apparently due to some unhandled syscall, so it would be nice to look into why).
Firecracker version: v.0.23.0

I am not sure what to tell you. Is your case an out-of-memory issue or something completely different? Maybe gcc version? Firecracker 0.23.0 vs 0.23.3?

Also, I am testing it on Fedora 33. I do have Ubuntu 20.10 machine I have not had a chance to test on.

Waldek

David Smith

Mar 31, 2021, 11:20:08 AM
to OSv Development

The patch you supplied to obtain more detailed information isn't working as you intended, I think, since we're seeing 'exception nested too deeply' instead of the exception data. So I modified dump.cc:dump_registers() to set 'name' to an empty string and comment out the call to osv::lookup_name_demangled().

The image was built with fs=rofs and I ran with default firecracker.py memory settings (128M). 

I was then able to capture the register information at the time of the Assert:

OSv v0.55.0-237-g7cd33926


1 CPUs detected
Firmware vendor: Unknown

bsd: initializing[registers]
RIP: 0x00000000403ebffe <>
RFL: 0x0000000000010246  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000000000000001  RBX: 0xfffffffffffff900  RCX: 0x0000000000000020  RDX: 0x0000000000000000
RSI: 0xffff800000a31000  RDI: 0x0000000000000000  RBP: 0xffff80000015acd0  R8:  0x0000000000000007
R9:  0xffff80000000d518  R10: 0x0000000000000000  R11: 0xffff80000014a170  R12: 0xffff8000001cb040
R13: 0xffff80000000d518  R14: 0x0000000000000000  R15: 0xffff80000014a170  RSP: 0xffff80000015aca0
Assertion failed: sched::preemptable() (arch/x64/mmu.cc: page_fault: 42)

[backtrace]
[registers]
RIP: 0x00000000403e943a <>
RFL: 0x0000000000010046  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000000000000000  RBX: 0xfffffffffffff900  RCX: 0x0000000000000000  RDX: 0x0000000000000001
RSI: 0x0000000000000007  RDI: 0x0000000000000001  RBP: 0xffff800000149120  R8:  0x0000000000000008
R9:  0x0000000000000001  R10: 0x0000000000000040  R11: 0x0000000000000002  R12: 0xffff8000001cb040
R13: 0x0000000000000001  R14: 0x0000000000000040  R15: 0x0000000000000002  RSP: 0xffff800000149100


I used the approach below to look up the offending function from the supplied RIP; please advise if the approach I've used is incorrect:

1. readelf -s build/release.x64/loader.elf > symbols.txt

2. Lookup RIP in symbols.txt to find the function this address is in

Which gives me, for the first RIP (0x00000000403ebffe):

  6828: 00000000403ebdd0  1054 FUNC    GLOBAL DEFAULT    4 _ZN6memory9page_pool2l16r

And for the second RIP (0x00000000403e943a):

  8594: 00000000403e93f0   203 FUNC    GLOBAL DEFAULT    4 _ZN6memory9page_pool2l116
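
(As an aside, assuming binutils is installed, addr2line against the same unstripped loader.elf should resolve and demangle both addresses in one step:

# -f prints the enclosing function, -C demangles it, -e selects the ELF to search
addr2line -f -C -e build/release.x64/loader.elf 0x00000000403ebffe 0x00000000403e943a
)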


Responses below to some of the points from your previous reply.

Firecracker version differences are out I think (see my earlier comment where I mention that I was previously running FC v0.22.0 and saw the same issue). Now I’m running the very latest (v0.24.2).

Here's my gcc and go version information:

gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

go version go1.16.2 linux/amd64

I agree with your comment. This assert occurs so early in the kernel startup sequence that it seems to me unlikely to be an OOM-related issue. In your tests, did you run the curltest shell script (as provided in my last post)? This, along with the verbose kernel arg, certainly seems to induce a failure more frequently, though it still happens without these.

Is there anything further that can be done to trace the cause?

Waldek Kozaczuk

Apr 1, 2021, 12:08:45 AM
to OSv Development
The first RIP (0x00000000403ebffe, which demangles to memory::page_pool::l1::refill()) indicates an error that happened here:

void l1::refill()
{
    SCOPE_LOCK(preempt_lock);
    auto& pbuf = get_l1();
    if (pbuf.nr + page_batch::nr_pages < pbuf.max / 2) { 
        auto* pb = global_l2.alloc_page_batch();
        if (pb) {
            // Other threads might have filled the array while we waited for
            // the page batch.  Make sure there is enough room to add the pages
            // we just acquired, otherwise return them.
            if (pbuf.nr + page_batch::nr_pages <= pbuf.max) {
                for (auto& page : pb->pages) {
                    pbuf.push(page);
                }    
            } else {
                global_l2.free_page_batch(pb);
            }    
        }    
    }    

}

 

And for the second RIP (0x00000000403e943a):

  8594: 00000000403e93f0   203 FUNC    GLOBAL DEFAULT    4 _ZN6memory9page_pool2l116

This one would be here:
 void* l1::alloc_page_local()
{
    SCOPE_LOCK(preempt_lock);
    auto& pbuf = get_l1();
    if (pbuf.nr < pbuf.watermark_lo) {
        pbuf.wake_thread();
    }
    if (pbuf.nr == 0) {
        return nullptr;
    }
    return pbuf.pop();
}

The common thing about those (if the root cause is common) is that both call get_l1(), which returns a per-cpu variable by reference. I wonder if there is some smart assert() we could add in those methods.

Alternatively, you could try to put an assert in here:

PERCPU(l1*, percpu_l1);
static sched::cpu::notifier _notifier([] () {
    *percpu_l1 = new l1(sched::cpu::current());
    // N per-cpu threads for L1 page pool, 1 thread for L2 page pool
    // Switch to smp_allocator only when all the N + 1 threads are ready
    if (smp_allocator_cnt++ == sched::cpus.size()) {
        smp_allocator = true;
    }
    l1_pool_stats.resize(sched::cpus.size());
});
static inline l1& get_l1()
{
    return **percpu_l1;
}

Could it be that percpu_l1 is not properly initialized?
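
For example, a minimal (untested) sketch of such a sanity check in get_l1() could look like this:

static inline l1& get_l1()
{
    // Untested sketch: fail loudly if the per-cpu l1 pool pointer was never
    // set by the notifier above, instead of faulting on a garbage pointer.
    l1* pool = *percpu_l1;
    assert(pool != nullptr);
    return *pool;
}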

David Smith

Apr 1, 2021, 12:18:01 PM
to OSv Development
Thanks.

I'm not sure what to look for to be able to verify that percpu_l1 is initialised correctly. I can see that it is a structure with various fields and function references, but I'm not clear how much of the field content should be initialised at the point of the failure. Regardless, I added some additional debug to at least show the percpu_l1 pointer content at the time of initialisation (so we can see it has been set up by the time the assert occurs).

Another possible clue might be that I have occasionally also seen a failure like this:

 OSv v0.55.0-237-g7cd33926
1 CPUs detected
Firmware vendor: Unknown
trying to execute or access missing symbol
[backtrace]

Looking in the page_fault() function (where we made the change to call dump_registers()), I can see that the "trying to execute....missing symbol" message is output for a different error condition. Perhaps this is a separate issue, but I bring it to your attention just in case it is a clue to a single root cause.

I added some basic additional debug output for both conditions (the changes I've made to the base repository file set are included below in git diff form) and re-ran the tests. I captured the new output for both the original assert and the missing-symbol conditions; these are included below. Note that the debug message displaying the percpu_l1 value is being split by the "bsd: initializing" message; is that another clue, or expected behaviour?

*** MISSING SYMBOL ***
2021-04-01T11:37:33.831033: Start
2021-04-01T11:37:33.835916: API socket-less: True
2021-04-01T11:37:33.835940: Firecracker ready
2021-04-01T11:37:33.836200: Configured VM
2021-04-01T11:37:33.836210: Added disk
2021-04-01T11:37:33.836221: Created OSv VM with cmdline: --power-off-on-abort --verbose --ip=eth0,172.16.0.2,255.255.255.252 --defaultgw=172.16.0.1 --nameserver=172.16.0.1 --nopci --rootfs=zfs /go.so /hello.so
{
   "machine-config": {
      "vcpu_count": 1,
      "mem_size_mib": 128,
      "ht_enabled": false
   },
   "drives": [
      {
         "drive_id": "rootfs",
         "path_on_host": "/home/david/osv/scripts/../build/last/usr.raw",
         "is_root_device": false,
         "is_read_only": false
      }
   ],
   "network-interfaces": [
      {
         "iface_id": "eth0",
         "host_dev_name": "fc_tap0",
         "guest_mac": "52:54:00:12:34:56",
         "rx_rate_limiter": {
            "bandwidth": {
               "size": 0,
               "refill_time": 0
            },
            "ops": {
               "size": 0,
               "refill_time": 0
            }
         },
         "tx_rate_limiter": {
            "bandwidth": {
               "size": 0,
               "refill_time": 0
            },
            "ops": {
               "size": 0,
               "refill_time": 0
            }
         }
      }
   ],
   "boot-source": {
      "kernel_image_path": "/home/david/osv/scripts/../build/last/kernel.elf",
      "boot_args": "--power-off-on-abort --verbose --ip=eth0,172.16.0.2,255.255.255.252 --defaultgw=172.16.0.1 --nameserver=172.16.0.1 --nopci --rootfs=zfs /go.so /hello.so"
   }
}
2021-04-01T11:37:33.842643: Waiting for firecracker process to terminate
2021-04-01T11:37:34.789093: End
OSv v0.55.0-237-g7cd33926
1 CPUs detected
Firmware vendor: Unknown

*percpu_l1 = 0x174040
addr: 0x0000000000000008    mmu::page_size: 4096     elf:mspa: 0x0000000000000000
trying to execute or access missing symbol
[backtrace]
addr: 0x0000000000000008    mmu::page_size: 4096     elf:mspa: 0x0000000000000000
*** MISSING SYMBOL ***

*** ASSERT ***
2021-04-01T11:28:15.610862: Start
2021-04-01T11:28:15.612978: API socket-less: True
2021-04-01T11:28:15.612998: Firecracker ready
2021-04-01T11:28:15.613394: Configured VM
2021-04-01T11:28:15.613430: Added disk
2021-04-01T11:28:15.613473: Created OSv VM with cmdline: --power-off-on-abort --verbose --ip=eth0,172.16.0.2,255.255.255.252 --defaultgw=172.16.0.1 --nameserver=172.16.0.1 --nopci --rootfs=zfs /go.so /hello.so
{
   "machine-config": {
      "vcpu_count": 1,
      "mem_size_mib": 128,
      "ht_enabled": false
   },
   "drives": [
      {
         "drive_id": "rootfs",
         "path_on_host": "/home/david/osv/scripts/../build/last/usr.raw",
         "is_root_device": false,
         "is_read_only": false
      }
   ],
   "network-interfaces": [
      {
         "iface_id": "eth0",
         "host_dev_name": "fc_tap0",
         "guest_mac": "52:54:00:12:34:56",
         "rx_rate_limiter": {
            "bandwidth": {
               "size": 0,
               "refill_time": 0
            },
            "ops": {
               "size": 0,
               "refill_time": 0
            }
         },
         "tx_rate_limiter": {
            "bandwidth": {
               "size": 0,
               "refill_time": 0
            },
            "ops": {
               "size": 0,
               "refill_time": 0
            }
         }
      }
   ],
   "boot-source": {
      "kernel_image_path": "/home/david/osv/scripts/../build/last/kernel.elf",
      "boot_args": "--power-off-on-abort --verbose --ip=eth0,172.16.0.2,255.255.255.252 --defaultgw=172.16.0.1 --nameserver=172.16.0.1 --nopci --rootfs=zfs /go.so /hello.so"
   }
}
2021-04-01T11:28:15.619818: Waiting for firecracker process to terminate
2021-04-01T11:28:16.510262: End
OSv v0.55.0-237-g7cd33926
1 CPUs detected
Firmware vendor: Unknown

*percpu_l1 = 0xbsd: initializing174040
[registers]
RIP: 0x00000000403ebffe <>
RFL: 0x0000000000010246  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000000000000001  RBX: 0xfffffffffffff900  RCX: 0x0000000000000020  RDX: 0x0000000000000000
RSI: 0xffff8000009ee000  RDI: 0x0000000000000000  RBP: 0xffff80000015acd0  R8:  0x0000000000000006
R9:  0xffff80000000d518  R10: 0x0000000000000000  R11: 0xffff80000014a170  R12: 0xffff800000174040
R13: 0xffff80000000d518  R14: 0x0000000000000000  R15: 0xffff80000014a170  RSP: 0xffff80000015aca0
addr: 0x0000000000000008
Assertion failed: sched::preemptable() (arch/x64/mmu.cc: page_fault: 48)

[backtrace]
[registers]
RIP: 0x00000000403e943a <>
RFL: 0x0000000000010046  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000000000000000  RBX: 0xfffffffffffff900  RCX: 0x0000000000000000  RDX: 0x0000000000000001
RSI: 0x0000000000000007  RDI: 0x0000000000000001  RBP: 0xffff800000149120  R8:  0x0000000000000008
R9:  0x0000000000000001  R10: 0x0000000000000040  R11: 0x0000000000000002  R12: 0xffff800000174040
R13: 0x0000000000000001  R14: 0x0000000000000040  R15: 0x0000000000000002  RSP: 0xffff800000149100
addr: 0x0000000000000008
*** ASSERT ***


Code changes were as follows:

diff --git a/arch/x64/dump.cc b/arch/x64/dump.cc
index 293fd214..2a46f4f4 100644
--- a/arch/x64/dump.cc
+++ b/arch/x64/dump.cc
@@ -14,8 +14,11 @@ void dump_registers(exception_frame* ef)
 {
     char name[1024];
 
-    osv::lookup_name_demangled(
-        reinterpret_cast<void *>(ef->rip), name, sizeof(name));
+    name[0] = '\0'; // DS initialise to empty
+    // DS comment out for test START
+    // osv::lookup_name_demangled(
+    //     reinterpret_cast<void *>(ef->rip), name, sizeof(name));
+    // DS comment out for test END
     debug_ll("[registers]\n");
     debug_ll("RIP: 0x%016lx <%s>\n", ef->rip, name);
     debug_ll("RFL: 0x%016lx  CS:  0x%016x  SS:  0x%016x\n", ef->rflags, ef->cs, ef->ss);
diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
index 1af268c0..500e6c4a 100644
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -16,6 +16,8 @@
 #include <osv/elf.hh>
 #include "exceptions.hh"
 
+void dump_registers(exception_frame* ef);
+
 void page_fault(exception_frame *ef)
 {
     sched::fpu_lock fpu;
@@ -30,10 +32,19 @@ void page_fault(exception_frame *ef)
         abort("trying to execute null pointer");
     }
     if (reinterpret_cast<void*>(addr & ~(mmu::page_size - 1)) == elf::missing_symbols_page_addr) {
+// DS debug start
+        debug_ll("addr: 0x%016lx    mmu::page_size: %ld     elf:mspa: 0x%016lx\n", addr, mmu::page_size, elf::missing_symbols_page_addr);
+// DS debug end
         abort("trying to execute or access missing symbol");
     }
     // The following code may sleep. So let's verify the fault did not happen
     // when preemption was disabled, or interrupts were disabled.
+    if (!sched::preemptable()) {
+        dump_registers(ef);
+// DS debug start
+        debug_ll("addr: 0x%016lx\n", addr);
+// DS debug end
+    }
     assert(sched::preemptable());
     assert(ef->rflags & processor::rflags_if);
 
diff --git a/core/mempool.cc b/core/mempool.cc
index bd8e2fcf..ad11a716 100644
--- a/core/mempool.cc
+++ b/core/mempool.cc
@@ -1326,6 +1326,9 @@ private:
 PERCPU(l1*, percpu_l1);
 static sched::cpu::notifier _notifier([] () {
     *percpu_l1 = new l1(sched::cpu::current());
+
+    debug_ll("\n*percpu_l1 = 0x%x\n", *percpu_l1);    // DS debug
+
     // N per-cpu threads for L1 page pool, 1 thread for L2 page pool
     // Switch to smp_allocator only when all the N + 1 threads are ready
     if (smp_allocator_cnt++ == sched::cpus.size()) {
diff --git a/scripts/firecracker.py b/scripts/firecracker.py
index 7137b21c..9f4d71cd 100755
--- a/scripts/firecracker.py
+++ b/scripts/firecracker.py
@@ -297,6 +297,10 @@ def main(options):
     if options.verbose:
         cmdline = '--verbose ' + cmdline
 
+    # DS debug start
+    cmdline = '--power-off-on-abort ' + cmdline
+    # DS debug end
+
     # Create API client and make API calls
     if options.api:
         client = ApiClient(socket_path.replace("/", "%2F"))

Waldek Kozaczuk

Apr 2, 2021, 5:52:42 PM
to OSv Development
On Thursday, April 1, 2021 at 12:18:01 PM UTC-4 d787...@gmail.com wrote:
Thanks.

I'm not sure what to look for to be able to verify that percpu_l1 is initialised correctly. I can see that it is a structure with various fields and function references, but I'm not clear how much of the field content should be initialised at the point of the failure. Regardless, I added some additional debug to at least show the percpu_l1 pointer content at the time of initialisation (so we can see it has been set up by the time the assert occurs).

Another possible clue might be that I have occasionally also seen a failure like this:

 OSv v0.55.0-237-g7cd33926
1 CPUs detected
Firmware vendor: Unknown
trying to execute or access missing symbol
[backtrace]

Looking in the page_fault() function (where we made the change to call dump_registers()), I can see that the "trying to execute....missing symbol" message is output for a different error condition. Perhaps this is a separate issue, but I bring it to your attention just in case it is a clue to a single root cause.

This one is a completely bizarre error to happen so early. The point of elf::missing_symbols_page_addr is to set up a page which, when hit, would indicate a missing symbol when the dynamic linker loads the app, and that happens way later. Something is wrong with the memory setup.
These addresses of 0x0000000000000008 are so wrong. I think I have some idea of what might be going on.

Can you try to disable the missing_symbols_page_addr logic? Please comment out the call to setup_missing_symbols_detector().
Can you post your kernel.elf somewhere? Meanwhile, could you try one of the kernel.elf builds prebuilt on Ubuntu 20.10 from here - https://github.com/osvunikernel/osv-nightly-releases/releases/tag/ci-master-250? I wonder if somehow the gcc you are using produces code that misbehaves or exposes some bug in OSv.

Waldek Kozaczuk

Apr 2, 2021, 5:53:37 PM
to OSv Development
I think you can use the '-k' option of firecracker.py to point to an arbitrary kernel.elf.

David Smith

Apr 6, 2021, 8:34:20 AM
to OSv Development
I renamed my kernel.elf and kernel-stripped.elf, then copied in the kernel.elf from https://github.com/osvunikernel/osv-nightly-releases/releases/tag/ci-master-250, then ran the same test. Just as a reminder, the test consists of:

One terminal session running the following script, with output redirected to a log file:
while true; do ./scripts/firecracker.py -n -V; done

A second terminal session running the following script:
while true; do curl https://172.16.0.2 --insecure --no-keepalive; done

I don't think the fact that it's https is important; I imagine curl'ing http://172.16.0.2 would have the same result.

The missing symbol condition occurred first (at iteration 650):

OSv v0.55.0-240-g9d1f5111
1 CPUs detected
Firmware vendor: Unknown
trying to execute or access missing symbol
[backtrace]

Then the assert condition occurred (at iteration 3844):

OSv v0.55.0-240-g9d1f5111
1 CPUs detected
Firmware vendor: Unknown
bsd: initializingAssertion failed: sched::preemptable() (arch/x64/mmu.cc: page_fault: 37)

[backtrace]

I did notice that my kernel.elf was slightly larger (7120032 bytes total) than the CI generated kernel.elf (7074976 bytes total). On comparing the readelf output of the two, I could also see some differences in functions included and in the size of some functions.

I reverted to my original kernel.elf, commented out "elf::setup_missing_symbols_detector()" in main_cont() in loader.cc, built and started the test.
At iteration 3653:

OSv v0.55.0-237-g7cd33926
1 CPUs detected
Firmware vendor: Unknown

*percpu_l1 = 0x1740bsd: initializing40
addr: 0x0000000000000008    mmu::page_size: 4096     elf:mspa: 0x0000000000000000
trying to execute or access missing symbol
[backtrace]
addr: 0x0000000000000008    mmu::page_size: 4096     elf:mspa: 0x0000000000000000

Followed by the same fault (missing symbol) at iteration 3966 and again at iteration 4628.
The setup_missing_symbols_detector() change seems to have changed the behaviour, but not overcome the failures. Is this what you expected?

Let me know if you'd still like me to supply a copy of my kernel.elf.

David Smith

Apr 7, 2021, 8:30:10 AM
to OSv Development
Re: Your earlier point that it could be due to my gcc version revealing a gcc/OSv bug, I thought I would try upgrading to the latest gcc/g++ tools. So I'm now running:

gcc (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0

I rebuilt my OSv image as follows:

./scripts/build clean
./scripts/build -j4 image=golang-example

Then re-ran my test. At iterations 591, 688 and 9326, I saw the missing symbol failure, but so far, no sign of the Assert failure.

Waldek Kozaczuk

Apr 10, 2021, 11:45:59 PM
to OSv Development
Interesting. What happens if you apply that other change to disable setting up that missing symbols page and compile with gcc 10.2.0?

David Smith

Apr 11, 2021, 12:43:17 PM
to OSv Development
The gcc 10.2.0 build already included the missing symbol change to disable setting up the missing symbols page.

Did you see my April 6th reply?

Waldek Kozaczuk

Apr 15, 2021, 5:04:47 PM
to OSv Development
Sorry, I was away on vacation, so I am responding somewhat late.

Yes, I did see your April 6th reply, which seems to prove that the issue is unlikely to be related to the GCC version or other build specifics.

It is still most likely a bug in OSv that is triggered by some runtime-specific element that we do not know about. My hunch is that it is memory-related, possibly in the memory setup, given we see two types of crashes - the "bsd .." page fault and the missing-symbol page error. The second one is very weird, as the symbols from the kernel should always be found, so most likely something is reading from the wrong part of memory.

Having said that I do not know where to go from there especially given I cannot reproduce it.

Today I tried to reproduce it on my Ubuntu 20.10 machine, to no avail. I looked at your email from Feb 26 where you wrote:

"On the crash issue, I built an OSv image with the standard native example: ./scripts/build image=native-example
Then ran continuous launches under firecracker using a bash loop: while true; do ./scripts/firecracker.py; done
After 1249 iterations, I got the same Assert at startup."

which seems to prove that this error is not app-specific, and I used exactly the same commands to build and run the example:

With the original command:

while true; do ./scripts/firecracker.py >> /tmp/running; done

I could run it 71,700 times (over 71K) with no errors.


With verbose enabled:

while true; do ./scripts/firecracker.py -V >> /tmp/running2; done

I could run it 46,691 times (over 46K) with no errors.


Let me share some info about my host machine (a 5-6 year old MacBook Pro):

more /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 70
model name : Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
stepping : 1
microcode : 0x1c
cpu MHz : 876.525
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple shadow_vmcs
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips : 4589.68
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:


uname -a
Linux wkoMacBookPro 5.8.0-49-generic #55-Ubuntu SMP Wed Mar 24 14:45:45 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

So my machine comes with Intel CPUs.

Is yours Intel as well or AMD? I know that Firecracker had to add some special support for AMD.

Finally, given it might be a memory-setup related issue, I wonder what physical memory ranges you see.

This patch:
git diff core/
diff --git a/core/mempool.cc b/core/mempool.cc
index bd8e2fcf..52f94774 100644
--- a/core/mempool.cc
+++ b/core/mempool.cc
@@ -1732,6 +1732,8 @@ void free_initial_memory_range(void* addr, size_t size)
     if (!size) {
         return;
     }
+    debug_early_u64("mempool: add range at:      ", (u64)addr);
+    debug_early_u64("mempool: add range of size: ", size);
     auto a = reinterpret_cast<uintptr_t>(addr);
     auto delta = align_up(a, page_size) - a;
     if (delta > size) {

produces this extra info with verbose output:

2021-04-15T13:40:50.493125: Start
2021-04-15T13:40:50.493200: API socket-less: True
2021-04-15T13:40:50.493204: Firecracker ready
2021-04-15T13:40:50.493293: Configured VM
2021-04-15T13:40:50.493298: Added disk
2021-04-15T13:40:50.493301: Created OSv VM with cmdline: --verbose --nopci --rootfs=zfs /hello
{
   "machine-config": {
      "vcpu_count": 1,
      "mem_size_mib": 128,
      "ht_enabled": false
   },
   "drives": [
      {
         "drive_id": "rootfs",
         "path_on_host": "/home/wkozaczuk/projects/osv-master/scripts/../build/last/usr.raw",
         "is_root_device": false,
         "is_read_only": false
      }
   ],
   "boot-source": {
      "kernel_image_path": "/home/wkozaczuk/projects/osv-master/scripts/../build/last/kernel.elf",
      "boot_args": "--verbose --nopci --rootfs=zfs /hello"
   }
}
OSv v0.55.0-240-g9d1f5111
mempool: add range at:      ffff800000957434
mempool: add range of size: 00000000076a8bcc
mempool: add range at:      ffff800000000001
mempool: add range of size: 000000000009fbff
mempool: add range at:      ffff800000100000
mempool: add range of size: 0000000000100000
1 CPUs detected
Firmware vendor: Unknown
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
Detected virtio-mmio device: (2,0)
virtio-blk: Add blk device instances 0 as vblk0, devsize=268435456
random: intel drng, rdrand registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting zfs at /zfs
zfs: mounting osv/zfs from device /dev/vblk0.1
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
BSD shrinker: event handler list found: 0xffffa000009ad080
        BSD shrinker found: 1
BSD shrinker: unlocked, running
Booted up in 47.39 ms
Cmdline: /hello 
Hello from C code
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
Powering off.
2021-04-15T13:40:50.495752: Waiting for firecracker process to terminate
2021-04-15T13:40:50.594645: End

Also, in all 46,691 runs with verbose output I would always see the same memory ranges.

I wonder what the ranges are in your case. I know we used to have some edge-case bugs in the memory setup around parsing e820 ranges.

Maybe we have a bug where we read the e820 ranges incorrectly and then create a wrong mapping, or end up with less memory than was provided.

Are the ranges the same in your case, or do they possibly fluctuate and turn out somewhat different when the crash happens?

Otherwise, I am out of any more ideas.

Waldek

PS. Do you possibly run your tests in a nested virtualization setup, or on EC2 i3 metal instances?

Waldek Kozaczuk

Apr 15, 2021, 6:06:37 PM
to OSv Development
Another related possibility is that we have a bug somewhere in reading a configuration value - some memory address, size parameter, etc. - and we read it incorrectly: instead of, say, 4 bytes we read 8 bytes and get a garbage value, which is then used to set things up incorrectly, and we end up reading from or writing to the wrong places.

David Smith

Apr 20, 2021, 12:08:23 PM
to OSv Development
Again, I appreciate you continuing to make suggestions. I was also on vacation yesterday and catching up today. I will respond to your various points in more detail as soon as possible.

On the question of nested virtualisation, the answer is yes, my dev environment is Ubuntu 20.04 under VMWare Fusion, with Intel VT-x/EPT and Code Profiling CPU options enabled in the VM settings.

David Smith

Apr 21, 2021, 10:15:08 AM
to OSv Development
Some info. for my host (Intel CPU based) Mac system (on which I'm running MacOS 10.15.7):

Darwin 19.6.0 Darwin Kernel Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; root:xnu-6153.141.16~1/RELEASE_X86_64 x86_64

machdep.cpu.max_basic: 27
machdep.cpu.max_ext: 2147483656
machdep.cpu.vendor: GenuineIntel
machdep.cpu.brand_string: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
machdep.cpu.family: 6
machdep.cpu.model: 126
machdep.cpu.extmodel: 7
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 5
machdep.cpu.feature_bits: 9221959987971750911
machdep.cpu.leaf7_feature_bits: 4072613871 1077960526
machdep.cpu.leaf7_feature_bits_edx: 3154117648
machdep.cpu.extfeature_bits: 1241984796928
machdep.cpu.signature: 460517
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 FDPEO SMEP BMI2 ERMS INVPCID FPU_CSDS AVX512F AVX512DQ RDSEED ADX SMAP AVX512IFMA CLFSOPT IPT AVX512CD SHA AVX512BW AVX512VL AVX512VBMI UMIP PKU GFNI VAES VPCLMULQDQ AVX512VNNI AVX512BITALG AVX512VPOPCNTDQ RDPID SGXLC FSREPMOV MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI
machdep.cpu.logical_per_package: 16
machdep.cpu.cores_per_package: 8
machdep.cpu.microcode_version: 160
machdep.cpu.processor_flag: 7
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.linesize_max: 64
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.sub_Cstates: 286396448
machdep.cpu.thermal.sensor: 1
machdep.cpu.thermal.dynamic_acceleration: 1
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.thresholds: 2
machdep.cpu.thermal.ACNT_MCNT: 1
machdep.cpu.thermal.core_power_limits: 1
machdep.cpu.thermal.fine_grain_clock_mod: 1
machdep.cpu.thermal.package_thermal_intr: 1
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.energy_policy: 1
machdep.cpu.xsave.extended_state: 743 2688 2696 0
machdep.cpu.xsave.extended_state1: 15 2432 8448 0
machdep.cpu.arch_perf.version: 5
machdep.cpu.arch_perf.number: 8
machdep.cpu.arch_perf.width: 48
machdep.cpu.arch_perf.events_number: 8
machdep.cpu.arch_perf.events: 0
machdep.cpu.arch_perf.fixed_number: 4
machdep.cpu.arch_perf.fixed_width: 48
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.L2_associativity: 8
machdep.cpu.cache.size: 256
machdep.cpu.address_bits.physical: 39
machdep.cpu.address_bits.virtual: 48
machdep.cpu.core_count: 4
machdep.cpu.thread_count: 8
machdep.cpu.tsc_ccc.numerator: 104
machdep.cpu.tsc_ccc.denominator: 2

I would add that this is my development system. We have a test system which utilises VMWare ESXi and runs the OSv VM under firecracker on an Alpine Linux VMWare VM (so nested virtualisation again), and we also occasionally see the "bsd: initializing" crash on startup in this configuration.

I added the patch to dump the physical memory region information as you requested. I built again with build fs=rofs image=native-example,
then tested with while true; do ./scripts/firecracker.py >> /tmp/running; done (my firecracker.py script is slightly modified to add the '--power-off-on-abort' cmd line argument, which I think is not important, but I mention it just in case).
On the 996th iteration, I hit the missing-symbol failure. Below is the output for the last successful invocation followed by the failed invocation. The physical memory information dumped is consistent across all 996 invocations.

OSv v0.55.0-237-g7cd33926
mempool: add range at:      ffff800000956434
mempool: add range of size: 00000000076a9bcc
mempool: add range at:      ffff800000000001
mempool: add range of size: 000000000009fbff
mempool: add range at:      ffff800000100000
mempool: add range of size: 0000000000100000
1 CPUs detected
Firmware vendor: Unknown

*percpu_l1 = 0x175040
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
Detected virtio-mmio device: (2,0)
virtio-blk: Add blk device instances 0 as vblk0, devsize=6474752
random: intel drng, rdrand registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
Booted up in 121.29 ms
Cmdline: /hello 
Hello from C code
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 1.78 ms reading from disk
ROFS: read 64 512-byte blocks from disk
ROFS: allocated 61 512-byte blocks of cache memory
ROFS: hit ratio is 93.88%
Powering off.
2021-04-21T13:22:17.382768: Start
2021-04-21T13:22:17.382851: API socket-less: True
2021-04-21T13:22:17.382855: Firecracker ready
2021-04-21T13:22:17.382940: Configured VM
2021-04-21T13:22:17.382945: Added disk
2021-04-21T13:22:17.382948: Created OSv VM with cmdline: --power-off-on-abort --verbose --nopci --rootfs=rofs /hello
{
   "machine-config": {
      "vcpu_count": 1,
      "mem_size_mib": 128,
      "ht_enabled": false
   },
   "drives": [
      {
         "drive_id": "rootfs",
         "path_on_host": "/home/david-griffin/projects/firecracker/osv/scripts/../build/last/usr.raw",
         "is_root_device": false,
         "is_read_only": false
      }
   ],
   "boot-source": {
      "kernel_image_path": "/home/david-griffin/projects/firecracker/osv/scripts/../build/last/kernel.elf",
      "boot_args": "--power-off-on-abort --verbose --nopci --rootfs=rofs /hello"
   }
}
2021-04-21T13:22:17.385972: Waiting for firecracker process to terminate
2021-04-21T13:22:17.586150: End
OSv v0.55.0-237-g7cd33926
mempool: add range at:      ffff800000956434
mempool: add range of size: 00000000076a9bcc
mempool: add range at:      ffff800000000001
mempool: add range of size: 000000000009fbff
mempool: add range at:      ffff800000100000
mempool: add range of size: 0000000000100000
1 CPUs detected
Firmware vendor: Unknown

*percpu_l1 = 0x175040
addr: 0x0000000000000008    mmu::page_size: 4096     elf:mspa: 0x0000000000000000
trying to execute or access missing symbol
[backtrace]
addr: 0x0000000000000008    mmu::page_size: 4096     elf:mspa: 0x0000000000000000

Prior to adding the physical memory info patch, I set up a Fedora 33 environment (again under my VMWare Fusion), installed the same OSv sources, built the same image with the same commands and ran the same test. I was unable to reproduce either startup failure over two attempts, each running approx. 50K iterations.
Fedora 33 is running a later kernel (5.11.11) compared to both my Ubuntu 20.04 (5.8.0) and Alpine (5.10.13). Could there be some subtle kernel interaction? 

David Smith

Apr 22, 2021, 11:42:54 AM
to OSv Development
Further to my previous reply, I've now been able to reproduce the original bsd: initializing Assert failure on my Fedora 33 system.
It did require some specific actions, which seem to invoke the failure relatively quickly. These more closely align the firecracker instance launch method with our test system, where we launch a few different OSv VMs under firecracker (on Alpine) in quick succession. On our test system we see a much higher failure rate, as high as 1 in 20 launches.

Using latest OSv source (pulled today), with no modifications (i.e. none of the new instrumenting changes you've provided in relation to this issue applied):

build -j4 fs=rofs image=native-example
firecracker.py -V (just to prove that it runs and to obtain the converted raw ROFS image file needed by firecracker).

I then duplicated the kernel.elf and usr.raw files to:
kernel1.elf
kernel2.elf
kernel3.elf
usr1.raw
usr2.raw
usr3.raw

This probably isn't necessary, but for the avoidance of doubt, I used this approach rather than sharing the same files across multiple firecracker instances.

I then extracted the firecracker json config generated by the firecracker.py -V command into 3 separate json config files. In each file, I modified the paths for the kernel.elf and usr.raw to reference one of the numbered versions of the same file. Here's an example of the resulting content:

{
   "machine-config": {
      "vcpu_count": 1,
      "mem_size_mib": 128,
      "ht_enabled": false
   },
   "drives": [
      {
         "drive_id": "rootfs",
         "path_on_host": "/home/test/usr1.raw",
         "is_root_device": false,
         "is_read_only": false
      }
   ],
   "boot-source": {
      "kernel_image_path": "/home/test/kernel1.elf",
      "boot_args": "--verbose --nopci --rootfs=rofs /hello"
   }
}

Finally, in three different terminal windows, I ran a script which performs the usual forever loop launching the firecracker instance, using a different numbered json config in each terminal window and logging to a different log file per window. Here's an example of the script content:

Script 1
#!/bin/bash
rm test1.log; while true; do firecracker --no-api --config-file firecracker1.json >> test1.log ; done

Script 2
#!/bin/bash
rm test2.log; while true; do firecracker --no-api --config-file firecracker2.json >> test2.log ; done

Script 3
#!/bin/bash
rm test3.log; while true; do firecracker --no-api --config-file firecracker3.json >> test3.log ; done


On my system, running these three scripts concurrently invokes the Assert condition within 30 seconds of the start of the 3rd script.
Interestingly, when one fails, all 3 seem to fail, though I can't be sure this is happening at exactly the same time. 

Hopefully this provides a mechanism by which you can reproduce the issue.

Waldek Kozaczuk

Apr 27, 2021, 7:34:43 PM
to OSv Development
I have run the exact same test - 3 terminals each firing the command as you have it above. And I could NOT reproduce it after 50K runs or so in each terminal.

Now the difference is that my setup does NOT involve nested virtualization. Both of my MBPros have Ubuntu and Fedora installed natively (dual or triple boot mode). 

I have never run firecracker, nor OSv on firecracker, in such a setup. Until this point, I was not even aware that such a possibility exists on a Mac (this page - https://kb.vmware.com/s/article/80785 - seems to indicate there are some limitations). Given all that, it could be that this is a bug in OSv or in some other layer involved in this nested virtualization setup.

I know that Firecracker is used in production on the so-called 'bare-metal' EC2 instances (i3, etc) that in reality involve nested virtualization (Nitro hypervisor) but I am not sure how well firecracker is supported in nested virtualization setup in general besides that.

Do your experiments target nested virtualization by design or is it just a matter of convenience?

Have you tried to reproduce this problem in a non-nested virtualization setup?

Waldek

PS. I do not have VMware Fusion handy on any of my MBPro machines, but I guess I could install it and see if I can reproduce your issue. Is VMware Fusion free?

David Smith

Apr 28, 2021, 3:34:18 AM
to OSv Development
That's disappointing, I'd hoped this would be reproducible on your setup. As you say, perhaps this is an indication that it is somehow connected with the fact that our environment is using nested virtualisation.

The VMWare KB article you referenced isn't relevant to my setup, as it relates to MacOS 11.x and I'm running MacOS 10.15.7 (as the article indicates, there were some significant MacOS changes in 11.x which impacted VMWare's virtualisation capabilities, so I'm avoiding the upgrade for the moment).

Our target environment will be using nested virtualisation (Firecracker launched OSv VMs running on an Alpine Linux VM under a VMWare ESXi host) and as indicated previously, we see this same failure in our test environment which replicates the target environment.

I have not tried to reproduce this on a non-nested virtualisation setup. I'll look into this.

VMWare Fusion Pro is available as a fully-functional 30-day free trial from https://www.vmware.com/products/fusion/fusion-evaluation.html
If you do go down the route of trying it under Fusion, to enable nested virtualisation you will need to modify the following settings for the Ubuntu/Fedora VM:

Settings->Processors & Memory->Advanced:
    Enable hypervisor applications in this virtual machine
    Enable code profiling application in this virtual machine (can be enabled optionally for Ubuntu, but seems to be required for Fedora)

David

Waldek Kozaczuk

Apr 29, 2021, 7:04:02 PM
to OSv Development
I have installed VMWare Fusion Pro on my other Mac (OSX - 10.15.7, 2.3GHz, 8-core, i9) with Fedora 33 server in it, enabled the 2 settings to allow KVM in nested virtualization and I was able to replicate the crashes.

So there is most likely something about nested virtualization that possibly triggers some bug in OSv.

Now that I can re-produce it, I will try to figure out what is wrong.

Waldek

Waldek Kozaczuk

Apr 29, 2021, 9:28:05 PM
to OSv Development
And I was also able to replicate it on QEMU running OSv in PVH mode on the microvm machine type (similar to how firecracker runs it), again under nested virtualization.

./scripts/run.py -k -p qemu_microvm --nics=0 -c 1


I used the same recipe - 3 terminals running similar commands to start the native C example on a ROFS image. After connecting with gdb I saw this:


(gdb) osv info threads
  id  address             name            cpu  status         vruntime  total_cpu_time location
   1 (0xffff800000014040) reclaimer       cpu0 waiting     6.51405e-24            1229 condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:43
   2 (0xffff80000004b040) kvm_wall_clock_ cpu0 waiting     4.76921e-24            8226 std::_Function_handler<void(), kvmclock::kvmclock()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/10/bits/std_function.h:291
   3 (0xffff800000068040) page-access-sca cpu0 queued      1.92449e-23           33196 pagecache::access_scanner::run() at core/pagecache.cc:695
   4 (0xffff800000080040) page_pool_l2    cpu0 queued      1.40362e-23           24193 memory::page_pool::l2::fill_thread() at core/mempool.cc:1435
   5 (0xffff80000008d040) itimer-real     cpu? unstarted             0               0 ?? at arch/x64/entry.S:112
   6 (0xffff800000094040) itimer-virt     cpu? unstarted             0               0 ?? at arch/x64/entry.S:112
   7 (0xffff800000150040) balancer0       cpu0 queued      4.63251e-21         8154897 memory::page_pool::l1::alloc_page_local() at core/mempool.cc:1403
   8 (0xffff800000168040) rcu0            cpu0 waiting     2.04856e-23           25420 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:178
   9 (0xffff800000180040) idle0           cpu0 queued              inf           12481 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/10/bits/std_function.h:291
  10 (0xffff800000197040) >init           cpu0 waiting     7.75482e-24           13338 std::_Function_handler<void(), sched::thread::reaper::reaper()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/10/bits/std_function.h:291
  11 (0xffff8000001b3040) thread taskq    cpu0 waiting     1.16969e-23           10158 synch_port::_msleep(void*, mtx*, int, char const*, int) at bsd/porting/synch.cc:101
  12 (0xffff8000001bf040) callout         cpu0 waiting     1.15418e-23            9891 _callout_thread() at bsd/porting/callout.cc:118
  13 (0xffff8000001d9040) page_pool_l1_0  cpu0 running     -2.4551e-21          709254 sched::thread::switch_to() at arch/x64/arch-switch.hh:108
Number of threads: 13

(gdb) bt
#0  abort (fmt=fmt@entry=0x4063f860 "Assertion failed: %s (%s: %s: %d)\n") at runtime.cc:106
#1  0x0000000040237f29 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at runtime.cc:140
#2  0x0000000040390970 in page_fault (ef=0xffff80000094a048) at arch/x64/mmu.cc:37
#3  <signal handler called>
#4  0x00000000403e6266 in memory::page_pool::l1::pop (this=0xffff8000001d7040) at core/mempool.cc:1191
#5  memory::page_pool::l1::alloc_page_local () at core/mempool.cc:1411
#6  0x00000000403e8cfa in memory::page_pool::l1::alloc_page () at core/mempool.cc:1174
#7  memory::untracked_alloc_page () at core/mempool.cc:1671
#8  memory::pool::add_page (this=0x4090c968 <memory::malloc_pools+72>) at core/mempool.cc:254
#9  0x00000000403e8e18 in memory::pool::alloc (this=0x4090c968 <memory::malloc_pools+72>) at core/mempool.cc:221
#10 0x00000000403e8f9c in std_malloc (size=8, alignment=8) at core/mempool.cc:1782
#11 0x00000000403e9457 in malloc (size=8) at core/mempool.cc:2001
#12 0x00000000404b0439 in operator new(unsigned long) ()
#13 0x0000000040354018 in __gnu_cxx::new_allocator<elf::object*>::allocate (__n=1, this=<optimized out>) at /usr/include/c++/10/ext/new_allocator.h:103
#14 std::allocator_traits<std::allocator<elf::object*> >::allocate (__a=..., __n=1) at /usr/include/c++/10/bits/alloc_traits.h:460
#15 std::_Vector_base<elf::object*, std::allocator<elf::object*> >::_M_allocate (__n=1, this=0xffff8000001dd2e0) at /usr/include/c++/10/bits/stl_vector.h:346
#16 std::vector<elf::object*, std::allocator<elf::object*> >::reserve (__n=1, this=0xffff8000001dd2e0) at /usr/include/c++/10/bits/vector.tcc:78
#17 elf::program::modules_get (this=0xffffa0000009adf0) at core/elf.cc:1810
#18 0x00000000403548d0 in elf::program::with_modules<elf::program::lookup_addr(void const*)::<lambda(const elf::program::modules_list&)> > (f=..., this=0xffffa0000009adf0) at include/osv/elf.hh:702
#19 elf::program::lookup_addr (this=0xffffa0000009adf0, addr=addr@entry=0x40237f28 <__assert_fail(char const*, char const*, unsigned int, char const*)+24>) at core/elf.cc:1621
#20 0x00000000404273ec in osv::lookup_name_demangled (addr=0x40237f28 <__assert_fail(char const*, char const*, unsigned int, char const*)+24>, buf=0xffff8000001dd7a0 "", len=1024) at core/demangle.cc:47
#21 0x0000000040237a1a in print_backtrace () at runtime.cc:86
#22 0x0000000040237edc in abort (fmt=fmt@entry=0x4063f860 "Assertion failed: %s (%s: %s: %d)\n") at runtime.cc:122
#23 0x0000000040237f29 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at runtime.cc:140
#24 0x0000000040390970 in page_fault (ef=0xffff8000001de068) at arch/x64/mmu.cc:37
#25 <signal handler called>
#26 0x00000000403e860e in memory::page_pool::l1::push (page=0xffff800000a27000, this=0xffff8000001d7040) at core/mempool.cc:1198
#27 memory::page_pool::l1::refill () at core/mempool.cc:1379
#28 0x00000000403e88f5 in memory::page_pool::l1::fill_thread () at core/mempool.cc:1356
#29 0x00000000403eb579 in memory::page_pool::l1::l1(sched::cpu*)::{lambda()#1}::operator()() const (__closure=<optimized out>) at core/mempool.cc:1164
#30 std::__invoke_impl<void, memory::page_pool::l1::l1(sched::cpu*)::{lambda()#1}&>(std::__invoke_other, memory::page_pool::l1::l1(sched::cpu*)::{lambda()#1}&) (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#31 std::__invoke_r<void, memory::page_pool::l1::l1(sched::cpu*)::{lambda()#1}&>(memory::page_pool::l1::l1(sched::cpu*)::{lambda()#1}&) (__fn=...) at /usr/include/c++/10/bits/invoke.h:153
#32 std::_Function_handler<void (), memory::page_pool::l1::l1(sched::cpu*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/10/bits/std_function.h:291
#33 0x00000000403fbc0a in sched::thread::main (this=0xffff8000001d9040) at core/sched.cc:1267
#34 sched::thread_main_c (t=0xffff8000001d9040) at arch/x64/arch-switch.hh:325
#35 0x0000000040390733 in thread_main () at arch/x64/entry.S:113

(gdb) frame 26
#26 0x00000000403e860e in memory::page_pool::l1::push (page=0xffff800000a27000, this=0xffff8000001d7040) at core/mempool.cc:1198
1198         l1_pool_stats[cpu_id]._nr = nr;
(gdb) p cpu_id
$1 = 0
(gdb) p l1_pool_stats
$2 = std::vector of length 0, capacity 0
(gdb) p l1_pool_stats[0]
Cannot access memory at address 0x0
(gdb) p smp_allocator
$3 = true

The page fault happens at frame 26 - core/mempool.cc:1198. It seems that the l1_pool_stats global variable has an address of 0. I do not know how that is possible and I do not see any obvious bug yet (I have not had time to study it). In any case, l1_pool_stats is a vector that keeps statistics about the L1 memory pool.

So my hunch was to comment out 2 lines where we access this variable and see what happens:

diff --git a/core/mempool.cc b/core/mempool.cc
index bd8e2fcf..8f843e99 100644
--- a/core/mempool.cc
+++ b/core/mempool.cc
@@ -1188,14 +1188,14 @@ struct l1 {
     void* pop()
     {
         assert(nr);
-        l1_pool_stats[cpu_id]._nr = nr - 1;
+        //l1_pool_stats[cpu_id]._nr = nr - 1;
         return _pages[--nr];
     }
     void push(void* page)
     {
         assert(nr < 512);
         _pages[nr++] = page;
-        l1_pool_stats[cpu_id]._nr = nr;
+        //l1_pool_stats[cpu_id]._nr = nr;
 
     }
     void* top() { return _pages[nr - 1]; }

And miraculously the crash seems to go away. I ran the test with over 10K iterations in each terminal on QEMU and over 20K iterations on Firecracker. And I could not reproduce it after this patch.

I wonder if you will see the same behavior. I wonder what the root cause of this crash is. And who knows whether this is a symptom of something weirder and we will see other crashes even with these 2 lines commented out.

Waldek

PS. On a side note, nested virtualization makes the OSv boot time 7-9 times slower (~40ms vs 5-7ms). QEMU microvm is actually faster than firecracker as far as boot time goes - on average it takes ~22ms to boot. In both cases there is no networking device, which would add some time.

David Smith

unread,
Apr 30, 2021, 9:28:46 AM4/30/21
to OSv Development
Waldek, that's great news and again I really appreciate your efforts on this. If I'm interpreting your test results correctly, you seem to have also ruled out firecracker as playing a part in this issue.

On my side, I was also busy yesterday:

a. I set up Fedora 33 natively on an Intel platform and ran the same parallel tests to see if it still failed without nested virtualisation. I could not reproduce the failure (more than 350K iterations without issue).

b. I then tried another nested virtualisation setup to try to rule out a VMWare (ESXi/Fusion)-related issue. I used a Fedora 33 VM under Proxmox on an Intel platform and repeated the same parallel tests. I was able to easily reproduce the failure, with the very same symptoms.

c. Back on the native Fedora 33 platform, I then tried creating a libvirtd VM with Fedora 33 and running the same parallel tests in that VM. Again I was able to easily reproduce the failure, with the same symptoms.

This morning, I applied your 2-line change to core/mempool.cc on both my development system and configuration c above. After ~13K iterations on configuration c and ~6K iterations on my development system, I did not see any failures.

David

Nadav Har'El

unread,
Apr 30, 2021, 12:27:47 PM4/30/21
to Waldek Kozaczuk, OSv Development
I have a guess.

I think that the vector is valid - it was simply never resize()ed to the right size, so it has zero size. This is a vector, not a hash_map, so doing l1_pool_stats[cpu_id]._nr = nr does *NOT* create the cpu_id'th element if it does not already exist - it must have existed previously.
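
To make that concrete, here is a tiny standalone snippet (my own illustration, not OSv code) showing why the write faults - operator[] on an empty vector never allocates anything, the resize() has to happen first:

#include <cassert>
#include <vector>

struct stats { unsigned _nr; };

int main()
{
    std::vector<stats> pool_stats;   // size 0 - like l1_pool_stats before any notifier ran
    // pool_stats[0]._nr = 5;        // undefined behaviour: element 0 does not exist
    pool_stats.resize(4);            // now elements 0..3 exist, value-initialized
    pool_stats[0]._nr = 5;           // safe only after the resize()
    assert(pool_stats.size() == 4 && pool_stats[0]._nr == 5);
    return 0;
}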

I see the vector is resized in

static sched::cpu::notifier _notifier([] () {
   ...
    l1_pool_stats.resize(sched::cpus.size());
});

One guess I have on a problem here is that it seems you call this l1_pool_stats.resize(sched::cpus.size()) again and again, for each of the CPUs. Maybe it even happens in parallel on two CPUs and causes a big mess when it does (which would explain why this problem happens rarely and not every time)? Can you try doing it only once - perhaps in the same place where you decide to set smp_allocator = true, or using some other mutual exclusion trick?
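
Something along these lines, for example (just a sketch of the idea, not a tested patch - std::call_once and the flag name are my own, and the real fix may prefer to hook into the existing SMP-initialization path instead):

#include <mutex>

static std::once_flag l1_pool_stats_resized;   // hypothetical flag, defined next to l1_pool_stats

static sched::cpu::notifier _notifier([] () {
    // Resize the per-CPU stats vector exactly once, even though the
    // notifier runs on every CPU and possibly concurrently.
    std::call_once(l1_pool_stats_resized, [] {
        l1_pool_stats.resize(sched::cpus.size());
    });
    // ... rest of the per-CPU pool initialization stays as it is ...
});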

If this doesn't help, please also take a look at the initialization order. The l1_pool_stats initialization doesn't have a __attribute__((init_priority(...))), and I don't know how it is ordered compared to other initializations and to that notifier thing. Maybe you can add printouts to debug if there's a problem there.


Waldek Kozaczuk

unread,
May 3, 2021, 4:51:14 PM5/3/21
to OSv Development
Nadav,
I have sent a patch "mempool: resize l1_pool_stats once" that should address this.

David,
Could you also please test this patch to see if it fixes the crashes?

Waldek 

David Smith

unread,
May 4, 2021, 11:52:14 AM5/4/21
to OSv Development
Good news.
I tested the patch on my Ubuntu 20.04 system under VMWare Fusion and on my Fedora 33 libvirtd VM under Fedora 33 on a physical host.
The two systems completed 90K and 136K iterations respectively of the same test without issue.
