Can't download "large" files using python3 on top of Firecracker: ...virtio/net.rs:257] Receiving buffer is too small to hold frame of current size


Henrique Fingler

Sep 18, 2019, 7:06:46 PM
to OSv Development
 First of all, thank you for being active and helping out users!

 Here's my setup: I'm building a python3 image, with a script that does

 response = urllib.request.urlopen("http://<a 1mb file>")

 The execution hangs for a few seconds, then a storm of warnings from Firecracker shows up:

<A lot of the same warning>
2019-09-18T17:50:36.841517975 [anonymous-instance:WARN:devices/src/virtio/net.rs:257] Receiving buffer is too small to hold frame of current size
2019-09-18T17:50:36.841529410 [anonymous-instance:WARN:devices/src/virtio/net.rs:257] Receiving buffer is too small to hold frame of current size
2019-09-18T17:50:36.841569665 [anonymous-instance:WARN:devices/src/virtio/net.rs:257] Receiving buffer is too small to hold frame of current size
2019-09-18T17:50:36.841584097 [anonymous-instance:WARN:devices/src/virtio/net.rs:257] Receiving buffer is too small to hold frame of current size
2019-09-18T17:50:36.841656060 [anonymous-instance:WARN:devices/src/virtio/net.rs:257] Receiving buffer is too small to hold frame of current size


 If the file is smaller, say 256 B, it works fine.

 Could this be a bug in the virtio implementation of OSv, or is it a Firecracker thing?
 I'll start investigating the issue. I'm asking because you might have seen this problem before.

 Thanks!

Asias He

Sep 18, 2019, 9:23:21 PM
to Henrique Fingler, OSv Development
Try disabling GSO/TSO in the OSv virtio-net driver.

 




--
Asias

Henrique Fingler

Sep 19, 2019, 12:09:19 AM
to OSv Development
 How do I go about disabling GSO?
 I think I found how to disable TSO (diff below), but I can't find where to disable GSO. Disabling just TSO didn't fix it.
 
 The loop where Firecracker gets stuck (fn rx_single_frame) tries to write an entire frame (7318 bytes) and notices it doesn't fit into all the descriptors of the guest.
 It seems that if it fails to write the entire frame, it marks the descriptors as used but then retries delivering the whole frame again. Maybe the OSv buffer isn't big enough and Firecracker just loops forever?
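If that reading is right, the failure mode can be sketched like this (a hypothetical Python model, not Firecracker code; the retry loop is bounded here only so the sketch terminates, whereas the real one appeared unbounded):

```python
PAGE_SIZE = 4096

def try_deliver(frame_len, rx_buffer_len, max_attempts=5):
    """Model of a device retrying one frame against fixed-size rx buffers."""
    for attempt in range(max_attempts):
        if frame_len <= rx_buffer_len:
            return attempt  # frame fits; delivered on this attempt
        # buffer too small: warn, mark descriptor used, re-queue same frame
    return None  # never fits; the real loop would spin forever

assert try_deliver(7318, PAGE_SIZE) is None   # the 7318-byte frame never fits
assert try_deliver(1514, PAGE_SIZE) == 0      # an MTU-sized frame fits at once
```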


virtio-net.cc:

                   | (1 << VIRTIO_NET_F_STATUS)     \
                   | (1 << VIRTIO_NET_F_CSUM)       \
                   | (1 << VIRTIO_NET_F_GUEST_CSUM) \
-                  | (1 << VIRTIO_NET_F_GUEST_TSO4) \
+                  | (0 << VIRTIO_NET_F_GUEST_TSO4) \
                   | (1 << VIRTIO_NET_F_HOST_ECN)   \
-                  | (1 << VIRTIO_NET_F_HOST_TSO4)  \
+                  | (0 << VIRTIO_NET_F_HOST_TSO4)  \
                   | (1 << VIRTIO_NET_F_GUEST_ECN)
                   | (1 << VIRTIO_NET_F_GUEST_UFO)


 Thanks!

Waldek Kozaczuk

Sep 19, 2019, 12:02:53 PM
to OSv Development
Most likely it is a bug on the OSv side. It could be in the virtio-net feature negotiation logic - https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L351-L378 or https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L283-L297


This section of the VirtIO spec would then apply:

"5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers
  • If VIRTIO_NET_F_MRG_RXBUF is not negotiated:
    • If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD populate the receive queue(s) with buffers of at least 65562 bytes.
    • Otherwise, the driver SHOULD populate the receive queue(s) with buffers of at least 1526 bytes.
  • If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer MUST be at least the size of the struct virtio_net_hdr."
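A sketch of that sizing rule (my reading of the spec text above; the constants 65562 and 1526 come straight from it, the 12-byte header size under MRG_RXBUF is an assumption):

```python
def min_rx_buffer_size(mrg_rxbuf, guest_tso4=False, guest_tso6=False,
                       guest_ufo=False):
    """Minimum rx buffer size per VirtIO spec 5.1.6.3.1 (sketch)."""
    if not mrg_rxbuf:
        if guest_tso4 or guest_tso6 or guest_ufo:
            return 65562  # must hold a full TSO/UFO frame plus header
        return 1526       # max ethernet frame plus virtio_net_hdr
    return 12             # assumed sizeof(struct virtio_net_hdr) with MRG_RXBUF

# Firecracker's case: MRG_RXBUF off but TSO negotiated, so OSv's
# single 4096-byte pages are far too small for the required buffers.
assert 4096 < min_rx_buffer_size(mrg_rxbuf=False, guest_tso4=True)
assert min_rx_buffer_size(mrg_rxbuf=False) == 1526
```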

This makes me think that our receive buffers are only 1 page (4096 bytes) large, so whenever Firecracker tries to send a frame bigger than that, OSv bounces. I think this OSv code applies - https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L587-L609. It seems the virtio ring buffers are always 1 page big - see the alloc_page call.

So maybe on the OSv side we need to allow for bigger buffers (64K) when VIRTIO_NET_F_MRG_RXBUF is off, which would require changes to drivers/virtio-vring.cc. I wonder if this feature is on under QEMU and that is why we never see this issue there. It would be nice to run the same Python program under QEMU and see if VIRTIO_NET_F_MRG_RXBUF is on or off.

This is all speculation on my part and I might be off, so maybe others can shed more light on it.

Waldek

Henrique Fingler

Sep 19, 2019, 6:59:22 PM
to OSv Development
 I'm trying to check if it works on QEMU, but scripts/run and capstan run set up the network differently than Firecracker's script.
 With the regular user networking (no "-n") it works. When I try running it with "-n -b br0" or just "-n", the execution hangs after printing the OSv version.

 I'm trying to manually hack in a single, larger allocation size for the receive queue and disable VIRTIO_NET_F_MRG_RXBUF in the driver just to check what Firecracker does, but it seems that during compilation a QEMU instance of the unikernel is launched. Is this a test? Can it be disabled?

 Also, is there a way to find out which hypervisor OSv is running on top of? It would help with switching between feature sets in virtio-net.

 Thanks!

Waldek Kozaczuk

Sep 19, 2019, 10:58:49 PM
to OSv Development
This patch seems to do the job:

diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc
index e78fb3af..fe5f1ae0 100644
--- a/drivers/virtio-net.cc
+++ b/drivers/virtio-net.cc
@@ -375,6 +375,8 @@ void net::read_config()
     net_i("Features: %s=%d,%s=%d", "Host TSO ECN", _host_tso_ecn, "CSUM", _csum);
     net_i("Features: %s=%d,%s=%d", "Guest_csum", _guest_csum, "guest tso4", _guest_tso4);
     net_i("Features: %s=%d", "host tso4", _host_tso4);
+
+    printf("VIRTIO_NET_F_MRG_RXBUF: %d\n", _mergeable_bufs);
 }
 
 /**
@@ -591,16 +593,19 @@ void net::fill_rx_ring()
     vring* vq = _rxq.vqueue;
 
     while (vq->avail_ring_not_empty()) {
-        auto page = memory::alloc_page();
+        //auto page = memory::alloc_page();
+        auto page = malloc(16 * memory::page_size);
 
         vq->init_sg();
-        vq->add_in_sg(page, memory::page_size);
+        vq->add_in_sg(page, memory::page_size * 16);
         if (!vq->add_buf(page)) {
-            memory::free_page(page);
+            //memory::free_page(page);
+            free(page);
             break;
         }
         added++;
     }
+    printf("net: Allocated %d pages\n", added * 16);
 
     trace_virtio_net_fill_rx_ring_added(_ifn->if_index, added);
 

But it is just a hack, for sure. I am not sure whether we should allocate 16 pages in one shot (which I am doing here) or create a single chained buffer made of 16 pages. I'm also not sure how we would extract the data if chained.

I have also found this (based on a comment in the Firecracker code) - https://bugs.chromium.org/p/chromium/issues/detail?id=753630. As you can see, VIRTIO_NET_F_MRG_RXBUF is much more memory-efficient and flexible, which is what QEMU implements.

I am interested in what others think about how we should handle this properly.

Either way, I think it would not hurt to create an issue against Firecracker asking for VIRTIO_NET_F_MRG_RXBUF support.

Waldek

Henrique Fingler

Sep 19, 2019, 11:34:56 PM
to OSv Development
 I agree that this is mostly something that should be fixed in Firecracker. For now, if there's a way to detect the hypervisor, we can switch on that. Personally I'm only using Firecracker, so I'll leave this in.

 I wrote pretty much the same code, but instead of malloc I used memory::alloc_hugepage(); however, the build got stuck at compilation when QEMU was started. Do you happen to know the reason? I thought we also had to force the length of the receive queue to one; maybe that part was what broke OSv under QEMU.
 And the size I was allocating was 17 pages, because the spec says 65562 bytes, which is 16 pages plus 26 bytes.
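That arithmetic checks out:

```python
PAGE_SIZE = 4096
SPEC_MIN = 65562  # minimum rx buffer size from VirtIO spec 5.1.6.3.1

pages_needed = -(-SPEC_MIN // PAGE_SIZE)    # ceiling division
assert pages_needed == 17                    # 17 pages are required
assert SPEC_MIN - 16 * PAGE_SIZE == 26       # "16 pages plus 26 bytes"
```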
 Did you also disable VIRTIO_NET_F_MRG_RXBUF in the feature mask, or not, since Firecracker just ignores it?

 I'll patch that in and test it out.

 Thanks!

Waldek Kozaczuk

Sep 20, 2019, 8:56:35 AM
to OSv Development
See my answers below.


On Thursday, September 19, 2019 at 11:34:56 PM UTC-4, Henrique Fingler wrote:
 I agree that this is mostly a thing that should be done on Firecracker. For now, if there's a way to detect the hypervisor we can switch that. Personally I'm only using Firecracker so I'll leave this in.
Instead of detecting which hypervisor we are dealing with, we should simply act based on which features have been negotiated and agreed between OSv (the driver) and the hypervisor (the device). We should simply follow the VirtIO spec, as it says here:
"5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers
  • If VIRTIO_NET_F_MRG_RXBUF is not negotiated:
    • If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD populate the receive queue(s) with buffers of at least 65562 bytes.
    • Otherwise, the driver SHOULD populate the receive queue(s) with buffers of at least 1526 bytes.
  • If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer MUST be at least the size of the struct virtio_net_hdr."
Something similar to what Linux does here - https://github.com/torvalds/linux/blob/0445971000375859008414f87e7c72fa0d809cf8/drivers/net/virtio_net.c#L3075-L3080. So we would only use 17-page buffers when we have to. One outstanding question: should we allocate a single contiguous block of 17 pages of memory as one slot in the vring, or a chain of 17 single-page buffers acting as one large buffer? (The latter is what Linux seems to be doing.) The slight advantage of the chained approach is that under memory pressure it is easier to find 17 individual pages than one contiguous 68K block. But handling a chained buffer is going to be more complicated. I think the memory waste is the same either way.

Pekka, Nadav,
What do you think we should do?



 I wrote pretty much the same code but instead of malloc I used memory::alloc_hugepage( but it got stuck at compilation when qemu was started, do you happen to know the reason? I thought we also had to force the length of the receiving queue to one, maybe that part was the one breaking osv under qemu.
 
Most likely you built the image with the ZFS filesystem, which at least for now requires OSv to boot so that files can be uploaded to it. You can avoid this by using the Read-Only FS (fs=rofs). Either way, we should use VIRTIO_NET_F_MRG_RXBUF if QEMU offers it (which happens right now), and your patch should not affect this.

 

 And the size I was allocating was 17 pages because the spec says 65562, which is 16 pages plus 26 bytes.
You are right about 17 pages. 
 Did you also disable VIRTIO_NET_F_MRG_RXBUF in the feature mask or no, since Firecracker just ignores it?
Firecracker "ignores" it in the sense that it is part of how features are negotiated.

Waldek Kozaczuk

Sep 20, 2019, 9:04:38 AM
to OSv Development
So instead of what we do here - https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L593-L603 - for a "chained buffer", if enabled, we would do something like this (pseudo-code, most likely buggy):

    while (vq->avail_ring_not_empty()) {
        vq->init_sg();
        void* pages[17];
        for (int i = 0; i < 17; i++) {
            pages[i] = memory::alloc_page();
            vq->add_in_sg(pages[i], memory::page_size);
        }

        if (!vq->add_buf(pages[0])) {
            for (int i = 0; i < 17; i++)  // free all 17 pages
                memory::free_page(pages[i]);
            break;
        }
        added++;
    }

We would also have to modify the receiving logic accordingly.
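To make the chained option concrete, here is a hypothetical Python model (not OSv code) of what the receive side would then have to do: reassemble one frame scattered across a chain of single-page buffers.

```python
PAGE_SIZE = 4096

def gather(chain, frame_len):
    """Copy frame_len bytes spread across a chain of pages into one buffer."""
    out = bytearray()
    for page in chain:
        take = min(frame_len - len(out), PAGE_SIZE)
        out += page[:take]
        if len(out) == frame_len:
            break
    return bytes(out)

# A 17-page chain easily holds the 7318-byte frame Firecracker was retrying.
chain = [bytes([i]) * PAGE_SIZE for i in range(17)]
frame = gather(chain, 7318)
assert len(frame) == 7318
assert frame[PAGE_SIZE] == 1  # the frame spills into the second page
```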


Waldek Kozaczuk

Sep 20, 2019, 10:30:33 AM
to OSv Development


On Friday, September 20, 2019 at 9:04:38 AM UTC-4, Waldek Kozaczuk wrote:
More specifically: in net::init(), after read_config() has executed and features have been negotiated, but before the virtqueues are set up, we would need to determine the size of the buffers (4K or 68K) based on the values of _mergeable_bufs, _guest_tso4 and _guest_ufo, per what the virtio spec says. We could either add a flag that says "use large buffers" or save the size of the buffer to be used.

Then net::fill_rx_ring() can allocate buffers based on the "use large buffers" flag. We would also need to modify the places where those buffers are deallocated - free_buffer_and_refcnt() (maybe others?).

Also, I have noticed that with my simple patch OSv ends up allocating 256 buffers on Firecracker in fill_rx_ring() on startup, which is roughly 16MB and seems excessive. So maybe vring::avail_ring_not_empty() needs to be adjusted to account for the large buffer size.

All in all, it would be nice to experiment and see how much memory OSv uses on Firecracker with all kinds of apps - simple REST API apps serving tiny payloads (<4K) and apps that, for example, download large files. We want both cases to work reasonably well.

Waldek Kozaczuk

Sep 20, 2019, 2:58:42 PM
to OSv Development


On Friday, September 20, 2019 at 8:56:35 AM UTC-4, Waldek Kozaczuk wrote:

Here is a capstan doc that should somewhat explain all 3 filesystems OSv offers - https://github.com/cloudius-systems/capstan/blob/master/Documentation/OsvFilesystem.md

Henrique Fingler

Sep 20, 2019, 3:45:29 PM
to OSv Development
 I'll check that out.

Instead of detecting what hypervisor we are dealing with, we should simply act accordingly based on what features have been negotiated and agreed

 Yep, you're right. Five minutes after I hit Post I remembered what "negotiate" means. Whoops.

Also, I have noticed with my simple patch OSv ends up allocating 256 buffers on Firecracker

 That's why I was trying to force the size of the recv queue to one. But this can be done in a smarter way in net::fill_rx_ring(), as you said. I'll hack around and see what comes up.
 It also seems that Firecracker has the machinery to implement VIRTIO_NET_F_MRG_RXBUF, but I don't know how complicated finishing it would be. I might check that out in a few weeks when I have some free time.

 Thanks for all the pointers!

zhiting zhu

Sep 20, 2019, 3:54:11 PM
to Henrique Fingler, OSv Development
Is there any difference in boot time between ZFS and RoFS?


Waldek Kozaczuk

Sep 20, 2019, 5:01:45 PM
to zhiting zhu, Henrique Fingler, OSv Development
Yes, quite substantial. On Firecracker, ZFS needs at least 50-60 ms to initialize on my machine, whereas a RoFS image takes 1 millisecond. The smallest native example takes 5-6 ms to boot including the RoFS mount, and ~10 ms in total to execute (the 10 ms includes that 5-6 ms of boot time).

Sent from my iPhone

Henrique Fingler

Sep 26, 2019, 6:12:27 PM
to OSv Development
 Waldek, I'm getting a general protection fault when doing some HTTP requests from OSv. Do you think it might be related to the hack to make it work on Firecracker?

 Here's the MWE, a few requests go through, then it faults.

import urllib.request
for i in range(10):
    response = urllib.request.urlopen("http://192.168.0.20:9999/1.bin")
    response.read()

 Here's the trace:

[registers]
RIP: 0x00000000403e9fd6 <memory::page_range_allocator::remove_huge(memory::page_range&)+38>
RFL: 0x0000000000010202  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x000000000000000d  RBX: 0xffff800003f2f000  RCX: 0x6d314e7578313731  RDX: 0x6270415369447065
RSI: 0xffff800003f2f000  RDI: 0xffff800003f2f008  RBP: 0xffff800000074e20  R8:  0x00000000000000fc
R9:  0xffff80000094d7e8  R10: 0x0000000000003f40  R11: 0x1144029210842110  R12: 0x0000000040911300
R13: 0xffff80000094d7e8  R14: 0x0000000000003f40  R15: 0x1144029210842110  RSP: 0xffff800000074e00
general protection fault


[backtrace]
0x000000004039cabc <general_protection+140>
0x0000000040399fa2 <???+1077518242>
0x00000000403e4d66 <memory::page_range_allocator::free(memory::page_range*)+166>
0x00000000403e4ecb <memory::page_pool::l2::free_batch(memory::page_pool::page_batch&)+91>
0x00000000403e5118 <memory::page_pool::l2::unfill()+504>
0x00000000403e6776 <memory::page_pool::l2::fill_thread()+358>
0x00000000403ea7db <std::_Function_handler<void (), memory::page_pool::l2::l2()::{lambda()#1}>::_M_invoke(std::_Any_data const&)+11>
0x00000000403f9746 <thread_main_c+38>
0x000000004039af62 <???+1077522274>


 The hack in virtio-net.cc is similar to yours:

diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc
index e78fb3af..0065e8d7 100644
--- a/drivers/virtio-net.cc
+++ b/drivers/virtio-net.cc
@@ -590,13 +590,31 @@ void net::fill_rx_ring()
     int added = 0;
     vring* vq = _rxq.vqueue;
 
+#define HACKQ 1
+
     while (vq->avail_ring_not_empty()) {
-        auto page = memory::alloc_page();
+        #if HACKQ
+            auto page = malloc(17 * memory::page_size);
+        #else
+            auto page = memory::alloc_page();
+        #endif
 
         vq->init_sg();
-        vq->add_in_sg(page, memory::page_size);
+        #if HACKQ
+            vq->add_in_sg(page, 17 * memory::page_size);
+        #else
+            vq->add_in_sg(page, memory::page_size);
+        #endif
+
         if (!vq->add_buf(page)) {
-            memory::free_page(page);
+            #if HACKQ
+                free(page);
+            #else
+                memory::free_page(page);
+            #endif
             break;
         }
         added++;

 Maybe it's related to the size of the buffer allocated for virtio? I'll try forcing it to size one and see what happens.

 Best.


Waldek Kozaczuk

Oct 4, 2019, 3:28:01 PM
to OSv Development
Hi,

So there were a couple of issues with my patch and yours:
1) We should NOT be using plain malloc() to allocate 17 pages of memory. The buffer needs to be page-aligned, so aligned_alloc() is the correct choice.
2) I missed that we also need to free() instead of free_page() in the right places.
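Why alignment matters for point 1: the driver's free path recovers the buffer start by aligning the data pointer down to a page boundary (the align_down(buffer, page_size) call in do_free_buffer), which is only valid if the allocation itself started on a page boundary. A small model with hypothetical addresses:

```python
PAGE_SIZE = 4096

def align_down(addr, align):
    """Round an address down to the nearest multiple of align."""
    return addr & ~(align - 1)

# malloc() may return an address that is not page-aligned...
malloc_buf = 7 * PAGE_SIZE + 100
data = malloc_buf + 10                 # pointer a few bytes into the buffer
assert align_down(data, PAGE_SIZE) != malloc_buf  # free() would get a bad pointer

# ...while aligned_alloc(PAGE_SIZE, n) guarantees page alignment
aligned_buf = 7 * PAGE_SIZE
data = aligned_buf + 10
assert align_down(data, PAGE_SIZE) == aligned_buf  # recovers the allocation
```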

Please see the improved patch - a better hack ;-). I still think we might be allocating too many 17-page buffers. But at least we now recognize whether VIRTIO_NET_F_MRG_RXBUF is off or on and choose the correct buffer size accordingly.

diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc
index e78fb3af..0df45dce 100644
--- a/drivers/virtio-net.cc
+++ b/drivers/virtio-net.cc
@@ -63,6 +63,7 @@ extern int maxnic;
 namespace virtio {
 
 int net::_instance = 0;
+bool net::use_large_buffer = false;
 
 #define net_tag "virtio-net"
 #define net_d(...)   tprintf_d(net_tag, __VA_ARGS__)
@@ -375,6 +376,9 @@ void net::read_config()
     net_i("Features: %s=%d,%s=%d", "Host TSO ECN", _host_tso_ecn, "CSUM", _csum);
     net_i("Features: %s=%d,%s=%d", "Guest_csum", _guest_csum, "guest tso4", _guest_tso4);
     net_i("Features: %s=%d", "host tso4", _host_tso4);
+
+    printf("VIRTIO_NET_F_MRG_RXBUF: %d\n", _mergeable_bufs);
+    use_large_buffer = !_mergeable_bufs;
 }
 
 /**
@@ -473,7 +477,10 @@ void net::receiver()
             // Bad packet/buffer - discard and continue to the next one
             if (len < _hdr_size + ETHER_HDR_LEN) {
                 rx_drops++;
-                memory::free_page(page);
+                if (use_large_buffer)
+                   free(page);
+                else
+                   memory::free_page(page);
 
                 continue;
             }
@@ -581,7 +588,13 @@ void net::free_buffer_and_refcnt(void* buffer, void* refcnt)
 void net::do_free_buffer(void* buffer)
 {
     buffer = align_down(buffer, page_size);
-    memory::free_page(buffer);
+    if (use_large_buffer) {
+        printf("--> Freeing 17 pages: %p\n", buffer);
+        free(buffer);
+    } else {
+        printf("--> Freeing single page: %p\n", buffer);
+        memory::free_page(buffer);
+    }
 }
 
 void net::fill_rx_ring()
@@ -591,12 +604,23 @@ void net::fill_rx_ring()
     vring* vq = _rxq.vqueue;
 
     while (vq->avail_ring_not_empty()) {
-        auto page = memory::alloc_page();
+        void *page;
+        int pages_num = use_large_buffer ? 17 : 1;
+        if (use_large_buffer) {
+            page = aligned_alloc(memory::page_size, pages_num * memory::page_size);
+            printf("--> Allocated 17 pages: %p\n", page);
+        } else {
+            page = memory::alloc_page();
+            printf("--> Allocated single page: %p\n", page);
+        }
 
         vq->init_sg();
-        vq->add_in_sg(page, memory::page_size);
+        vq->add_in_sg(page, pages_num * memory::page_size);
         if (!vq->add_buf(page)) {
-            memory::free_page(page);
+            if (use_large_buffer)
+               free(page);
+            else
+               memory::free_page(page);
             break;
         }
         added++;
diff --git a/drivers/virtio-net.hh b/drivers/virtio-net.hh
index adc93b39..e6725231 100644
--- a/drivers/virtio-net.hh
+++ b/drivers/virtio-net.hh
@@ -220,6 +220,7 @@ public:
     static void free_buffer_and_refcnt(void* buffer, void* refcnt);
     static void free_buffer(iovec iov) { do_free_buffer(iov.iov_base); }
     static void do_free_buffer(void* buffer);
+    static bool use_large_buffer;
 
     bool ack_irq();
 

I have tested it with your Python example downloading a 5MB file 100 times, and everything seems to be working fine. Feel free to remove the debug statements :-)

Waldek

Waldek Kozaczuk

Oct 5, 2019, 1:46:29 PM
to OSv Development
I have opened a new issue on OSv side - https://github.com/cloudius-systems/osv/issues/1058 and one on the firecracker side - https://github.com/firecracker-microvm/firecracker/issues/1314.

Nadav Har'El

Oct 6, 2019, 4:39:40 AM
to Waldek Kozaczuk, OSv Development
On Fri, Oct 4, 2019 at 10:28 PM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
Hi,

So there were a couple of issues with my and your patch:
1) We should NOT be using straight malloc() to allocate 17 pages of memory. It needs to be page-aligned, so aligned_alloc() is the correct choice.

Even better, please use alloc_phys_contiguous_aligned() (and the matching free_phys_contiguous_aligned()), which guarantees that the allocated memory is physically contiguous (you need that, don't you?). Normal malloc() or aligned_alloc() does this, but doesn't guarantee it - we may want to change this in the future (see for example https://github.com/cloudius-systems/osv/issues/854) - and IIRC, it doesn't hold even today when memory debugging is enabled in the build.
 

Henrique Fingler

Oct 6, 2019, 2:14:09 PM
to OSv Development
 That makes sense. I did try using the hugepage alloc that I had before, but it still crashed. Thank you for the response!
 I'll try it out with alloc_phys_contiguous_aligned().

Waldek Kozaczuk

Oct 6, 2019, 2:16:01 PM
to Henrique Fingler, OSv Development
Did you try my latest patch from Friday that also frees memory?


Henrique Fingler

Oct 6, 2019, 2:30:11 PM
to OSv Development
 Not yet, I'll probably do it today and post the results on this thread.

Waldek Kozaczuk

Oct 7, 2019, 10:26:31 PM
to OSv Development
I have just sent a proper patch addressing this issue. Feel free to apply it and test it.

Waldek Kozaczuk

Nov 2, 2019, 10:49:56 AM
to OSv Development
Hi,

I have just applied this patch to master. It seems to be working based on a number of workloads I have tested, including with the new automated scripts I have just added.
