[RFC] Make ZFS optional as a shared library

Waldemar Kozaczuk

Dec 10, 2021, 11:05:56 PM
to osv...@googlegroups.com, Waldemar Kozaczuk
Originally I thought that extracting ZFS out of the kernel
as a shared library would be much harder than it has turned out to
be, at least once I had figured out a couple of important gotchas,
which I describe below and in the code comments.

The advantages of moving ZFS to a separate library are the following:
- the kernel becomes ~900K smaller
- at least 10 fewer threads are needed to run a non-ZFS image
(running a ROFS image on 1 cpu requires only 25 threads)

I also hope this patch provides a blueprint for how we could implement
other filesystem drivers, such as ext2/3/4 (see #1179), or other true
kernel modules.

The essence of this patch is the changes to the main makefile to build
the new libsolaris.so, and to various ZFS-related parts of the kernel
(the pagecache, arc_shrinker and the ZFS dev driver) so that they call
into libsolaris.so through a handful of dynamically registered callbacks.
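The callback registration described above can be sketched in a few lines of C. This is only an illustration of the pattern, not the patch itself: register_lowmem_fun(), request_memory_hard() and fake_arc_lowmem() are hypothetical names, and the NULL guard in the caller is an addition of this sketch that the patch does not have.

```c
#include <assert.h>
#include <stddef.h>

/* "Kernel" side: the pointer stays NULL until the library registers itself. */
static size_t (*arc_lowmem_fun)(void *arg, int howto) = NULL;

/* Called from the library's init function; modelled loosely on
 * register_shrinker_funs() from the patch (hypothetical name). */
void register_lowmem_fun(size_t (*fun)(void *, int))
{
    assert(fun != NULL);
    arc_lowmem_fun = fun;
}

/* "Kernel" side caller. Returns 0 if no library has registered a callback.
 * The NULL check is an assumption of this sketch, not part of the patch. */
size_t request_memory_hard(void)
{
    if (arc_lowmem_fun == NULL) {
        return 0;
    }
    return (*arc_lowmem_fun)(NULL, 0);
}

/* "Library" side: a stand-in for the real arc_lowmem(). */
static size_t fake_arc_lowmem(void *arg, int howto)
{
    (void)arg;
    (void)howto;
    return 4096; /* pretend that one page was reclaimed */
}
```

Before registration request_memory_hard() reports that nothing was freed; after register_lowmem_fun(fake_arc_lowmem) it forwards to the library function through the pointer.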

The new libsolaris.so is mainly composed of the solaris and zfs sets
as defined in the makefile (which are no longer part of the kernel),
plus the bsd RPC code (xdr*), kobj and finally the new
fs/zfs/zfs_initialize.c, which provides the main INIT function,
zfs_initialize(). zfs_initialize() initializes various ZFS resources
like threads and memory and registers various callback functions with
the main kernel (see comments in zfs_initialize.c).
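As a rough sketch of how such an INIT function works, the GCC/Clang constructor attribute makes a function run automatically when the object containing it is loaded. The step names and helper functions below are made up for illustration; only the ordering (resources first, then callbacks, then the filesystem init) follows the description above.

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 4

static const char *init_log[MAX_STEPS];
static int init_steps = 0;

/* Record one initialization step so the order can be inspected later. */
static void record(const char *step)
{
    assert(init_steps < MAX_STEPS);
    init_log[init_steps++] = step;
}

/* Runs automatically at load time: before main() for an executable,
 * or before dlopen() returns for a shared library. */
__attribute__((constructor))
static void example_initialize(void)
{
    record("resources"); /* e.g. thread pools and memory */
    record("callbacks"); /* hand function pointers to the kernel */
    record("fs_init");   /* finally initialize the filesystem code */
}

int example_init_step_count(void)
{
    return init_steps;
}

const char *example_init_step(int i)
{
    return (i >= 0 && i < init_steps) ? init_log[i] : NULL;
}
```

No explicit call to example_initialize() is needed; by the time any other code in the object runs, all three steps have been recorded.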

Two important gotchas I have discovered are:
1) libsolaris.so needs to be built with BIND_NOW so that all symbols are
resolved eagerly. Otherwise a lazy symbol resolution could page-fault
while the ZFS code in libsolaris.so is itself being called to resolve
another fault, which would cause a deadlock.
2) libsolaris.so needs the osv-mlock note so that the dynamic linker
populates its mappings on load. As above, this avoids later page
faults that would lead to deadlocks.
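For the first gotcha, one way to confirm that a library really requests eager binding is to look at its ELF dynamic section. The helper below is a hypothetical sketch that assumes a Linux <elf.h> and a 64-bit ELF; it only scans an already loaded, DT_NULL-terminated dynamic array and does not read a file from disk.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Returns 1 if the dynamic section asks for eager symbol binding, else 0.
 * `dyn` must point at a DT_NULL-terminated array of dynamic entries. */
int dynamic_requests_bind_now(const Elf64_Dyn *dyn)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        /* Older style: a standalone DT_BIND_NOW entry. */
        if (dyn->d_tag == DT_BIND_NOW) {
            return 1;
        }
        /* Flag bits in DT_FLAGS / DT_FLAGS_1, as set by linking with -z now. */
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_BIND_NOW)) {
            return 1;
        }
        if (dyn->d_tag == DT_FLAGS_1 && (dyn->d_un.d_val & DF_1_NOW)) {
            return 1;
        }
    }
    return 0;
}
```

The same information is what readelf -d prints as BIND_NOW in the FLAGS entry or NOW in the FLAGS_1 entry.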

We also change loader.cc to dlopen("/libsolaris.so") before we mount
the ZFS filesystem (for that reason libsolaris.so needs to be part of
the bootfs for ZFS images). Because ZFS is the root filesystem, we cannot
use the approach we used for nfs, which is also implemented as a shared
library but is loaded in pivot_rootfs(), which happens much later.
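The patch leaves checking the dlopen() result as a TODO. A minimal sketch of such a check, using only the standard dlopen()/dlerror() calls, could look like the following; load_fs_library() is a hypothetical helper, not something the patch adds, and on older glibc versions the program needs to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Try to load a filesystem driver library.
 * Returns 0 and stores the handle on success, -1 on failure.
 * On failure *handle_out is left untouched. */
int load_fs_library(const char *path, void **handle_out)
{
    assert(path != NULL && handle_out != NULL);

    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        /* dlerror() describes the most recent dlopen() failure. */
        fprintf(stderr, "failed to load %s: %s\n", path, dlerror());
        return -1;
    }
    *handle_out = handle;
    return 0;
}
```

A caller in the boot path could then refuse to attempt the ZFS mount when the function returns -1, instead of falling through to the dummy vfsops.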

In theory we could build a mixed disk with two partitions: a 1st ROFS
one with libsolaris.so on it and a 2nd ZFS one, which would be mounted
after we mount ROFS and load and initialize libsolaris.so from it.

Please note that the osv_c_wrappers.* and mkfs.cc changes are not really
essential to this patch; they are part of an unfinished effort to make
the ZFS library work with the kernel when most symbols are hidden.

I have tested this patch by running the unit tests (all pass), by
running tests/misc-zfs-io.cc, and by running a MySQL stress test on a
ZFS image.

Fixes #1009

Signed-off-by: Waldemar Kozaczuk <jwkoz...@gmail.com>
---
Makefile | 51 +++++++++++++++----
bootfs.manifest.skel | 1 +
bsd/init.cc | 7 ---
bsd/porting/shrinker.cc | 30 +++++++++---
core/osv_c_wrappers.cc | 17 +++++++
core/pagecache.cc | 45 ++++++++++++-----
drivers/zfs.cc | 12 ++++-
fs/zfs/zfs_initialize.c | 94 ++++++++++++++++++++++++++++++++++++
fs/zfs/zfs_null_vfsops.cc | 54 +++++++++++++++++++++
include/osv/osv_c_wrappers.h | 4 ++
libc/misc/uname.c | 2 +-
loader.cc | 19 ++++++++
tools/mkfs/mkfs.cc | 27 +++++++----
13 files changed, 314 insertions(+), 49 deletions(-)
create mode 100644 fs/zfs/zfs_initialize.c
create mode 100644 fs/zfs/zfs_null_vfsops.cc

diff --git a/Makefile b/Makefile
index 7acf130c..7949ea4d 100644
--- a/Makefile
+++ b/Makefile
@@ -568,7 +568,7 @@ bsd += bsd/porting/kthread.o
bsd += bsd/porting/mmu.o
bsd += bsd/porting/pcpu.o
bsd += bsd/porting/bus_dma.o
-bsd += bsd/porting/kobj.o
+#bsd += bsd/porting/kobj.o
bsd += bsd/sys/netinet/if_ether.o
bsd += bsd/sys/compat/linux/linux_socket.o
bsd += bsd/sys/compat/linux/linux_ioctl.o
@@ -618,9 +618,6 @@ bsd += bsd/sys/netinet/cc/cc_cubic.o
bsd += bsd/sys/netinet/cc/cc_htcp.o
bsd += bsd/sys/netinet/cc/cc_newreno.o
bsd += bsd/sys/netinet/arpcache.o
-bsd += bsd/sys/xdr/xdr.o
-bsd += bsd/sys/xdr/xdr_array.o
-bsd += bsd/sys/xdr/xdr_mem.o
bsd += bsd/sys/xen/evtchn.o

ifeq ($(arch),x64)
@@ -644,6 +641,11 @@ bsd += bsd/sys/dev/random/live_entropy_sources.o

$(out)/bsd/sys/%.o: COMMON += -Wno-sign-compare -Wno-narrowing -Wno-write-strings -Wno-parentheses -Wno-unused-but-set-variable

+xdr :=
+xdr += bsd/sys/xdr/xdr.o
+xdr += bsd/sys/xdr/xdr_array.o
+xdr += bsd/sys/xdr/xdr_mem.o
+
solaris :=
solaris += bsd/sys/cddl/compat/opensolaris/kern/opensolaris.o
solaris += bsd/sys/cddl/compat/opensolaris/kern/opensolaris_atomic.o
@@ -799,7 +801,7 @@ libtsm += drivers/libtsm/tsm_screen.o
libtsm += drivers/libtsm/tsm_vte.o
libtsm += drivers/libtsm/tsm_vte_charsets.o

-drivers := $(bsd) $(solaris)
+drivers := $(bsd)
drivers += core/mmu.o
drivers += arch/$(arch)/early-console.o
drivers += drivers/console.o
@@ -1849,6 +1851,7 @@ fs_objs += virtiofs/virtiofs_vfsops.o \
fs_objs += pseudofs/pseudofs.o
fs_objs += procfs/procfs_vnops.o
fs_objs += sysfs/sysfs_vnops.o
+fs_objs += zfs/zfs_null_vfsops.o

objects += $(addprefix fs/, $(fs_objs))
objects += $(addprefix libc/, $(libc))
@@ -2035,11 +2038,11 @@ $(out)/empty_bootfs.o: ASFLAGS += -I$(out)

$(out)/tools/mkfs/mkfs.so: $(out)/tools/mkfs/mkfs.o $(out)/libzfs.so
$(makedir)
- $(call quiet, $(CC) $(CFLAGS) -o $@ $(out)/tools/mkfs/mkfs.o -L$(out) -lzfs, LINK mkfs.so)
+ $(call quiet, $(CC) $(CFLAGS) -o $@ $(out)/tools/mkfs/mkfs.o -L$(out) -lzfs -lstdc++, LINK mkfs.so)

$(out)/tools/cpiod/cpiod.so: $(out)/tools/cpiod/cpiod.o $(out)/tools/cpiod/cpio.o $(out)/libzfs.so
$(makedir)
- $(call quiet, $(CC) $(CFLAGS) -o $@ $(out)/tools/cpiod/cpiod.o $(out)/tools/cpiod/cpio.o -L$(out) -lzfs, LINK cpiod.so)
+ $(call quiet, $(CC) $(CFLAGS) -o $@ $(out)/tools/cpiod/cpiod.o $(out)/tools/cpiod/cpio.o -L$(out) -lzfs -lstdc++, LINK cpiod.so)

################################################################################
# The dependencies on header files are automatically generated only after the
@@ -2117,6 +2120,34 @@ libzfs-objects = $(foreach file, $(libzfs-file-list), $(out)/bsd/cddl/contrib/op
libzpool-file-list = util kernel
libzpool-objects = $(foreach file, $(libzpool-file-list), $(out)/bsd/cddl/contrib/opensolaris/lib/libzpool/common/$(file).o)

+solaris-objects = $(foreach file, $(solaris), $(out)/$(file))
+xdr-objects = $(foreach file, $(xdr), $(out)/$(file))
+comma:=,
+#build libsolaris.so with -z,now so that all symbols get resolved eagerly (BIND_NOW)
+#also make sure libsolaris.so has osv-mlock note (see zfs_initialize.c) so that
+# the file segments get loaded eagerly as well when mmapped
+$(out)/libsolaris.so: $(solaris-objects) $(xdr-objects) $(out)/bsd/porting/kobj.o $(out)/fs/zfs/zfs_initialize.o
+ $(makedir)
+ $(call quiet, $(CC) $(CFLAGS) -Wl$(comma)-z$(comma)now -o $@ $(solaris-objects) $(xdr-objects) $(out)/bsd/porting/kobj.o $(out)/fs/zfs/zfs_initialize.o -L$(out), LINK libsolaris.so)
+$(solaris-objects): kernel-defines = -D_KERNEL $(source-dialects)
+$(xdr-objects): kernel-defines = -D_KERNEL $(source-dialects)
+$(out)/bsd/porting/kobj.o: kernel-defines = -D_KERNEL $(source-dialects)
+$(out)/fs/zfs/zfs_initialize.o: kernel-defines = -D_KERNEL $(source-dialects)
+$(out)/fs/zfs/zfs_initialize.o: CFLAGS+= \
+ -DBUILDING_ZFS \
+ -Wno-array-bounds \
+ -Ibsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs \
+ -Ibsd/sys/cddl/contrib/opensolaris/common/zfs \
+ -fno-strict-aliasing \
+ -Wno-unknown-pragmas \
+ -Wno-unused-variable \
+ -Wno-switch \
+ -Wno-maybe-uninitialized \
+ -Ibsd/sys/cddl/compat/opensolaris \
+ -Ibsd/sys/cddl/contrib/opensolaris/common \
+ -Ibsd/sys/cddl/contrib/opensolaris/uts/common \
+ -Ibsd/sys
+
libzfs-objects += $(libzpool-objects)
libzfs-objects += $(out)/bsd/cddl/compat/opensolaris/misc/mkdirp.o
libzfs-objects += $(out)/bsd/cddl/compat/opensolaris/misc/zmount.o
@@ -2167,9 +2198,9 @@ $(out)/bsd/cddl/contrib/opensolaris/lib/libzfs/common/zprop_common.o: bsd/sys/cd
$(makedir)
$(call quiet, $(CC) $(CFLAGS) -c -o $@ $<, CC $<)

-$(out)/libzfs.so: $(libzfs-objects) $(out)/libuutil.so
+$(out)/libzfs.so: $(libzfs-objects) $(out)/libuutil.so $(out)/libsolaris.so
$(makedir)
- $(call quiet, $(CC) $(CFLAGS) -o $@ $(libzfs-objects) -L$(out) -luutil, LINK libzfs.so)
+ $(call quiet, $(CC) $(CFLAGS) -o $@ $(libzfs-objects) -L$(out) -luutil -lsolaris, LINK libzfs.so)

#include $(src)/bsd/cddl/contrib/opensolaris/cmd/zpool/build.mk:
zpool-cmd-file-list = zpool_iter zpool_main zpool_util zpool_vdev
@@ -2211,6 +2242,6 @@ $(zfs-cmd-objects): CFLAGS += -Wno-switch -D__va_list=__builtin_va_list '-DTEXT_
-Wno-maybe-uninitialized -Wno-unused-variable -Wno-unknown-pragmas -Wno-unused-function


-$(out)/zfs.so: $(zfs-cmd-objects) $(out)/libzfs.so
+$(out)/zfs.so: $(zfs-cmd-objects) $(out)/libzfs.so $(out)/libsolaris.so
$(makedir)
$(call quiet, $(CC) $(CFLAGS) -o $@ $(zfs-cmd-objects) -L$(out) -lzfs, LINK zfs.so)
diff --git a/bootfs.manifest.skel b/bootfs.manifest.skel
index a819f1a1..aad2bbeb 100644
--- a/bootfs.manifest.skel
+++ b/bootfs.manifest.skel
@@ -3,6 +3,7 @@
/libuutil.so: libuutil.so
/zpool.so: zpool.so
/libzfs.so: libzfs.so
+/libsolaris.so: libsolaris.so
/zfs.so: zfs.so
/tools/mkfs.so: tools/mkfs/mkfs.so
/tools/cpiod.so: tools/cpiod/cpiod.so
diff --git a/bsd/init.cc b/bsd/init.cc
index f0e8e32c..e2c8c564 100644
--- a/bsd/init.cc
+++ b/bsd/init.cc
@@ -15,10 +15,6 @@
#include <bsd/sys/sys/eventhandler.h>

extern "C" {
- extern void system_taskq_init(void *arg);
- extern void opensolaris_load(void *arg);
- extern void callb_init(void *arg);
-
// taskqueue
#include <bsd/sys/sys/taskqueue.h>
#include <bsd/sys/sys/priority.h>
@@ -49,9 +45,6 @@ void bsd_init(void)

arc4_init();
eventhandler_init(NULL);
- opensolaris_load(NULL);
- callb_init(NULL);
- system_taskq_init(NULL);

debug(" - done\n");
}
diff --git a/bsd/porting/shrinker.cc b/bsd/porting/shrinker.cc
index 3fb7aff1..7ed274cf 100644
--- a/bsd/porting/shrinker.cc
+++ b/bsd/porting/shrinker.cc
@@ -45,14 +45,17 @@ arc_shrinker::arc_shrinker()
{
}

-extern "C" size_t arc_lowmem(void *arg, int howto);
-extern "C" size_t arc_sized_adjust(int64_t to_reclaim);
+//These two function pointers will be set dynamically in INIT function of
+//libsolaris.so by calling register_shrinker_funs() below. The arc_lowmem()
+//and arc_sized_adjust() are functions defined in libsolaris.so.
+size_t (*arc_lowmem_fun)(void *arg, int howto);
+size_t (*arc_sized_adjust_fun)(int64_t to_reclaim);

size_t arc_shrinker::request_memory(size_t s, bool hard)
{
size_t ret = 0;
if (hard) {
- ret = arc_lowmem(nullptr, 0);
+ ret = (*arc_lowmem_fun)(nullptr, 0);
// ARC's aggressive mode will call arc_adjust, which will reduce the size of the
// cache, but won't necessarily free as much memory as we need. If it doesn't,
// keep going in soft mode. This is better than calling arc_lowmem() again, since
@@ -67,7 +70,7 @@ size_t arc_shrinker::request_memory(size_t s, bool hard)
// minimum of 16 M.
s = std::max(s, (16ul << 20));
do {
- size_t r = arc_sized_adjust(s);
+ size_t r = (*arc_sized_adjust_fun)(s);
if (r == 0) {
break;
}
@@ -81,21 +84,32 @@ void bsd_shrinker_init(void)
struct eventhandler_list *list = eventhandler_find_list("vm_lowmem");
struct eventhandler_entry *ep;

- debug("BSD shrinker: event handler list found: %p\n", list);
+ kprintf("BSD shrinker: event handler list found: %p\n", list);

TAILQ_FOREACH(ep, &list->el_entries, ee_link) {
- debug("\tBSD shrinker found: %p\n",
+ kprintf("\tBSD shrinker found: %p\n",
((struct eventhandler_entry_generic *)ep)->func);

auto *_ee = (struct eventhandler_entry_generic *)ep;

- if ((void *)_ee->func == (void *)arc_lowmem) {
+ if ((void *)_ee->func == (void *)arc_lowmem_fun) {
new arc_shrinker();
+ kprintf("Created arc_shrinker\n");
} else {
new bsd_shrinker(_ee);
+ kprintf("Created bsd_shrinker\n");
}
}
EHL_UNLOCK(list);

- debug("BSD shrinker: unlocked, running\n");
+ kprintf("BSD shrinker: unlocked, running\n");
+}
+
+//This needs to be a C-style function so it can be called
+//from libsolaris.so
+extern "C" void register_shrinker_funs(
+ size_t (*_arc_lowmem_fun)(void *, int),
+ size_t (*_arc_sized_adjust_fun)(int64_t)) {
+ arc_lowmem_fun = _arc_lowmem_fun;
+ arc_sized_adjust_fun = _arc_sized_adjust_fun;
}
diff --git a/core/osv_c_wrappers.cc b/core/osv_c_wrappers.cc
index 137f2c6f..f762d34e 100644
--- a/core/osv_c_wrappers.cc
+++ b/core/osv_c_wrappers.cc
@@ -3,10 +3,14 @@
#include <osv/debug.hh>
#include <osv/sched.hh>
#include <osv/app.hh>
+#include <osv/run.hh>
+#include <osv/export.h>
+#include "drivers/zfs.hh"

using namespace osv;
using namespace sched;

+OSV_LIBC_API
int osv_get_all_app_threads(pid_t tid, pid_t** tid_arr, size_t *len) {
thread* app_thread = tid==0? thread::current(): thread::find_by_id(tid);
if (app_thread == nullptr) {
@@ -28,3 +32,16 @@ int osv_get_all_app_threads(pid_t tid, pid_t** tid_arr, size_t *len) {
}
return 0;
}
+
+OSV_LIBC_API
+int osv_run(const char *cmdpath, int argc, char **argv) {
+ int ret;
+ auto ok = run(cmdpath, argc, argv, &ret);
+ assert(ok && ret == 0);
+ return ret;
+}
+
+OSV_LIBC_API
+void osv_zfsdev_init() {
+ zfsdev::zfsdev_init();
+}
diff --git a/core/pagecache.cc b/core/pagecache.cc
index b58a97fb..e5e9bcd1 100644
--- a/core/pagecache.cc
+++ b/core/pagecache.cc
@@ -19,11 +19,26 @@
#include <osv/prio.hh>
#include <chrono>

-extern "C" {
-void arc_unshare_buf(arc_buf_t*);
-void arc_share_buf(arc_buf_t*);
-void arc_buf_accessed(const uint64_t[4]);
-void arc_buf_get_hashkey(arc_buf_t*, uint64_t[4]);
+//These four function pointers will be set dynamically in INIT function of
+//libsolaris.so by calling register_arc_funs() below. The arc_unshare_buf(),
+//arc_share_buf(), arc_buf_accessed() and arc_buf_get_hashkey()
+//are functions defined in libsolaris.so.
+void (*arc_unshare_buf_fun)(arc_buf_t*);
+void (*arc_share_buf_fun)(arc_buf_t*);
+void (*arc_buf_accessed_fun)(const uint64_t[4]);
+void (*arc_buf_get_hashkey_fun)(arc_buf_t*, uint64_t[4]);
+
+//This needs to be a C-style function so it can be called
+//from libsolaris.so
+extern "C" void register_arc_funs(
+ void (*_arc_unshare_buf_fun)(arc_buf_t*),
+ void (*_arc_share_buf_fun)(arc_buf_t*),
+ void (*_arc_buf_accessed_fun)(const uint64_t[4]),
+ void (*_arc_buf_get_hashkey_fun)(arc_buf_t*, uint64_t[4])) {
+ arc_unshare_buf_fun = _arc_unshare_buf_fun;
+ arc_share_buf_fun = _arc_share_buf_fun;
+ arc_buf_accessed_fun = _arc_buf_accessed_fun;
+ arc_buf_get_hashkey_fun = _arc_buf_get_hashkey_fun;
}

namespace std {
@@ -270,7 +285,7 @@ public:
cached_page_arc(hashkey key, void* page, arc_buf_t* ab) : cached_page(key, page), _ab(ref(ab, this)) {}
virtual ~cached_page_arc() {
if (!_removed && unref(_ab, this)) {
- arc_unshare_buf(_ab);
+ (*arc_unshare_buf_fun)(_ab);
}
}
arc_buf_t* arcbuf() {
@@ -439,7 +454,7 @@ void map_arc_buf(hashkey *key, arc_buf_t* ab, void *page)
SCOPE_LOCK(arc_read_lock);
cached_page_arc* pc = new cached_page_arc(*key, page, ab);
arc_read_cache.emplace(*key, pc);
- arc_share_buf(ab);
+ (*arc_share_buf_fun)(ab);
}

void map_read_cached_page(hashkey *key, void *page)
@@ -656,7 +671,7 @@ void sync(vfs_file* fp, off_t start, off_t end)
}

TRACEPOINT(trace_access_scanner, "scanned=%u, cleared=%u, %%cpu=%g", unsigned, unsigned, double);
-static class access_scanner {
+class access_scanner {
static constexpr double _max_cpu = 20;
static constexpr double _min_cpu = 0.1;
static constexpr unsigned _freq = 1000;
@@ -673,7 +688,7 @@ private:
return false;
}
for (auto&& arc_hashkey: accessed) {
- arc_buf_accessed(arc_hashkey.key);
+ (*arc_buf_accessed_fun)(arc_hashkey.key);
}
accessed.clear();
return true;
@@ -708,7 +723,7 @@ private:
auto cp = p.second;
if (cp->clear_accessed()) {
arc_hashkey arc_hashkey;
- arc_buf_get_hashkey(arcbuf, arc_hashkey.key);
+ (*arc_buf_get_hashkey_fun)(arcbuf, arc_hashkey.key);
accessed.emplace(arc_hashkey);
cleared++;
}
@@ -746,10 +761,18 @@ private:
cleared /= 2;
}
}
-} s_access_scanner;
+};
+
+static access_scanner *s_access_scanner = nullptr;

constexpr double access_scanner::_max_cpu;
constexpr double access_scanner::_min_cpu;

+}

+//The access_scanner thread is ZFS specific so it
+//is initialized by calling the function below if libsolaris.so
+//is loaded.
+extern "C" void start_access_scanner() {
+ pagecache::s_access_scanner = new pagecache::access_scanner();
}
diff --git a/drivers/zfs.cc b/drivers/zfs.cc
index ef7f7812..6fad299b 100644
--- a/drivers/zfs.cc
+++ b/drivers/zfs.cc
@@ -11,7 +11,10 @@

namespace zfsdev {

-extern "C" int osv_zfs_ioctl(unsigned long req, void* buffer);
+//The osv_zfs_ioctl_fun will be set dynamically in INIT function of
+//libsolaris.so by calling register_osv_zfs_ioctl() below. The osv_zfs_ioctl()
+//is a function defined in libsolaris.so.
+int (*osv_zfs_ioctl_fun)(unsigned long req, void* buffer);

struct zfs_device_priv {
zfs_device* drv;
@@ -24,7 +27,7 @@ static zfs_device_priv *to_priv(device *dev)

static int zfs_ioctl(device* dev, ulong req, void* buffer)
{
- return osv_zfs_ioctl(req, buffer);
+ return (*osv_zfs_ioctl_fun)(req, buffer);
}

static devops zfs_device_devops = {
@@ -63,3 +66,8 @@ void zfsdev_init(void)
}

}
+
+//Needs to be a C-style function so it can be called from libsolaris.so
+extern "C" void register_osv_zfs_ioctl( int (*osv_zfs_ioctl_fun)(unsigned long, void*)) {
+ zfsdev::osv_zfs_ioctl_fun = osv_zfs_ioctl_fun;
+}
diff --git a/fs/zfs/zfs_initialize.c b/fs/zfs/zfs_initialize.c
new file mode 100644
index 00000000..7db2c615
--- /dev/null
+++ b/fs/zfs/zfs_initialize.c
@@ -0,0 +1,94 @@
+/*
+ * Copyright (C) 2021 Waldemar Kozaczuk
+ *
+ * This work is open source software, licensed under the terms of the
+ * BSD license as described in the LICENSE file in the top-level directory.
+ */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <osv/mount.h>
+#include <sys/arc.h>
+
+//This file gets linked as part of libsolaris.so to
+//provide an INIT function to initialize ZFS filesystem
+//code
+
+extern void system_taskq_init(void *arg);
+extern void opensolaris_load(void *arg);
+extern void callb_init(void *arg);
+
+extern int osv_zfs_ioctl(unsigned long req, void* buffer);
+//The function below is part of kernel and is used to
+//register osv_zfs_ioctl() as a callback
+extern void register_osv_zfs_ioctl( int (*osv_zfs_ioctl_fun)(unsigned long, void*));
+
+extern size_t arc_lowmem(void *arg, int howto);
+extern size_t arc_sized_adjust(long to_reclaim);
+//The function below is part of kernel and is used to
+//register arc_lowmem() and arc_sized_adjust() as callbacks
+extern void register_shrinker_funs(size_t (*_arc_lowmem_fun)(void *, int), size_t (*_arc_sized_adjust_fun)(long));
+
+extern void arc_unshare_buf(arc_buf_t*);
+extern void arc_share_buf(arc_buf_t*);
+extern void arc_buf_accessed(const uint64_t[4]);
+extern void arc_buf_get_hashkey(arc_buf_t*, uint64_t[4]);
+//The function below is part of kernel and is used to
+//register for functions above - arc_*() - as callbacks
+extern void register_arc_funs(
+ void (*_arc_unshare_buf_fun)(arc_buf_t*),
+ void (*_arc_share_buf_fun)(arc_buf_t*),
+ void (*_arc_buf_accessed_fun)(const uint64_t[4]),
+ void (*_arc_buf_get_hashkey_fun)(arc_buf_t*, uint64_t[4]));
+
+extern struct vfsops zfs_vfsops;
+//The function below is part of kernel and is used to
+//update ZFS vfsops in the vfssw configuration struct
+extern void zfs_update_vfsops(struct vfsops* _vfsops);
+
+extern void start_access_scanner();
+
+extern int zfs_init(void);
+
+//This init function gets called on loading of libsolaris.so
+//and it initializes all necessary resources (threads, etc) used by the code in
+//libsolaris.so. This initialization is necessary before ZFS can be mounted.
+void __attribute__((constructor)) zfs_initialize(void) {
+ // These 3 functions used to be called at the end of bsd_init()
+ // and are intended to initialize various resources, mainly thread pools
+ // (threads named 'system_taskq_*' and 'solthread-0x*')
+ opensolaris_load(NULL);
+ callb_init(NULL);
+ system_taskq_init(NULL);
+
+ //Register osv_zfs_ioctl() as callback in drivers/zfs.cc
+ register_osv_zfs_ioctl(&osv_zfs_ioctl);
+ //Register arc_lowmem() and arc_sized_adjust() as callbacks in arc_shrinker
+ //implemented as part of bsd/porting/shrinker.cc
+ register_shrinker_funs(&arc_lowmem, &arc_sized_adjust);
+ //Register arc_unshare_buf(), arc_share_buf(), arc_buf_accessed() and arc_buf_get_hashkey()
+ //as callbacks in the page cache layer implemented in core/pagecache.cc
+ register_arc_funs(&arc_unshare_buf, &arc_share_buf, &arc_buf_accessed, &arc_buf_get_hashkey);
+
+ //Register vfsops and vnops ...
+ zfs_update_vfsops(&zfs_vfsops);
+ //Start ZFS access scanner (part of pagecache)
+ start_access_scanner();
+
+ //Finally call zfs_init() which is what would normally have been called by vfs_init()
+ //The dummy zfs_init() defined in kernel does not do anything so
+ //we have to call the real one here as a last step after everything else above
+ //was called to initialize various ZFS resources and register relevant callback
+ //functions in the kernel
+ zfs_init();
+
+ printf("zfs_initialize: --> libsolaris.so initialized!\n");
+}
+
+//This is important to make sure that OSv dynamic linker will
+//pre-fault (populate) all segments of libsolaris.so on load
+//before any of its code is executed. This makes it so that ZFS
+//code does not trigger any faults which is important
+//when handling map() or unmap() on ZFS files for example.
+//Without it we would encounter deadlocks in such scenarios.
+asm(".pushsection .note.osv-mlock, \"a\"; .long 0, 0, 0; .popsection");
diff --git a/fs/zfs/zfs_null_vfsops.cc b/fs/zfs/zfs_null_vfsops.cc
new file mode 100644
index 00000000..679fa40c
--- /dev/null
+++ b/fs/zfs/zfs_null_vfsops.cc
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2021 Waldemar Kozaczuk
+ *
+ * This work is open source software, licensed under the terms of the
+ * BSD license as described in the LICENSE file in the top-level directory.
+ */
+
+#include <osv/mount.h>
+
+#define zfs_mount ((vfsop_mount_t)vfs_nullop)
+#define zfs_umount ((vfsop_umount_t)vfs_nullop)
+#define zfs_sync ((vfsop_sync_t)vfs_nullop)
+#define zfs_vget ((vfsop_vget_t)vfs_nullop)
+#define zfs_statfs ((vfsop_statfs_t)vfs_nullop)
+
+static int zfs_noop_mount(struct mount *mp, const char *dev, int flags,
+ const void *data)
+{
+ printf("ZFS is inactive. Please add libsolaris.so to the image.\n");
+ return -1;
+}
+
+/*
+ * File system operations
+ *
+ * This provides dummy vfsops when libsolaris is not loaded and ZFS filesystem
+ * is not active.
+ */
+struct vfsops zfs_vfsops = {
+ zfs_noop_mount, /* mount */
+ zfs_umount, /* umount */
+ zfs_sync, /* sync */
+ zfs_vget, /* vget */
+ zfs_statfs, /* statfs */
+ nullptr, /* vnops */
+};
+
+extern "C" int zfs_init(void)
+{
+ return 0;
+}
+
+//Normally (without ZFS enabled) the zfs_vfsops points to dummy
+//noop functions. So when libsolaris.so is loaded, we provide the
+//function below to be called to register real vfsops for ZFS
+extern "C" void zfs_update_vfsops(struct vfsops* _vfsops) {
+ zfs_vfsops.vfs_mount = _vfsops->vfs_mount;
+ zfs_vfsops.vfs_unmount = _vfsops->vfs_unmount;
+ zfs_vfsops.vfs_sync = _vfsops->vfs_sync;
+ zfs_vfsops.vfs_mount = _vfsops->vfs_mount;
+ zfs_vfsops.vfs_vget = _vfsops->vfs_vget;
+ zfs_vfsops.vfs_statfs = _vfsops->vfs_statfs;
+ zfs_vfsops.vfs_vnops = _vfsops->vfs_vnops;
+}
diff --git a/include/osv/osv_c_wrappers.h b/include/osv/osv_c_wrappers.h
index 94f07ad1..3823a7fb 100644
--- a/include/osv/osv_c_wrappers.h
+++ b/include/osv/osv_c_wrappers.h
@@ -22,6 +22,10 @@ Returns 0 on success, error code on error.
*/
int osv_get_all_app_threads(pid_t tid, pid_t** tid_arr, size_t* len);

+int osv_run(const char *cmdpath, int argc, char **argv);
+
+void osv_zfsdev_init();
+
#ifdef __cplusplus
}
#endif
diff --git a/libc/misc/uname.c b/libc/misc/uname.c
index 3f1bf754..016d74a5 100644
--- a/libc/misc/uname.c
+++ b/libc/misc/uname.c
@@ -24,7 +24,7 @@ _Static_assert(KERNEL_VERSION(LINUX_MAJOR, LINUX_MINOR, LINUX_PATCH)
#define str(s) #s
#define str2(s) str(s)

-struct utsname utsname OSV_HIDDEN = {
+struct utsname utsname = {
.sysname = "Linux", /* lie, to avoid confusing the payload. */
.nodename = "osv.local",
.release = str2(LINUX_MAJOR) "." str2(LINUX_MINOR) "." str2(LINUX_PATCH),
diff --git a/loader.cc b/loader.cc
index 44c0e754..dd58b388 100644
--- a/loader.cc
+++ b/loader.cc
@@ -57,6 +57,7 @@

#include "libc/network/__dns.hh"
#include <processor.hh>
+#include <dlfcn.h>

using namespace osv;
using namespace osv::clock::literals;
@@ -421,6 +422,10 @@ void* do_main_thread(void *_main_args)
}
boot_time.event("ROFS mounted");
} else if (opt_rootfs.compare("zfs") == 0) {
+ //Initialize ZFS filesystem driver implemented in libsolaris.so
+ //TODO: Check if dlopen() of libsolaris.so succeeded
+ //TODO: Consider calling dlclose() somewhere after ZFS is unmounted
+ dlopen("/libsolaris.so", RTLD_LAZY);
zfsdev::zfsdev_init();
auto error = mount_zfs_rootfs(opt_pivot, opt_extra_zfs_pools);
if (error) {
@@ -454,6 +459,10 @@ void* do_main_thread(void *_main_args)
} else if (mount_virtiofs_rootfs(opt_pivot) == 0) {
boot_time.event("Virtio-fs mounted");
} else {
+ //Initialize ZFS filesystem driver implemented in libsolaris.so
+ //TODO: Check if dlopen() of libsolaris.so succeeded
+ //TODO: Consider calling dlclose() somewhere after ZFS is unmounted
+ dlopen("/libsolaris.so", RTLD_LAZY);
zfsdev::zfsdev_init();
auto error = mount_zfs_rootfs(opt_pivot, opt_extra_zfs_pools);
if (error) {
@@ -469,6 +478,16 @@ void* do_main_thread(void *_main_args)
}
}
}
+ } else {
+ //TODO: This should not really be necessary but due to some bugs
+ //or shortcomings of dynamic linker when libzfs.so is initialized
+ //we end up with libsolaris.so with some symbols resolved to wrong functions
+ //(more details in separate email)
+ //For that reason we explicitly dlopen libsolaris.so before the mkfs.so
+ //is loaded and executed
+ //Unless we fix the dynamic linker bugs, we should probably consider
+ //moving this to mkfs.cc maybe?
+ dlopen("/libsolaris.so", RTLD_LAZY);
}

bool has_if = false;
diff --git a/tools/mkfs/mkfs.cc b/tools/mkfs/mkfs.cc
index 1983cf83..13f2be23 100644
--- a/tools/mkfs/mkfs.cc
+++ b/tools/mkfs/mkfs.cc
@@ -7,22 +7,29 @@

#include <assert.h>
#include <string.h>
-#include <osv/device.h>
-#include <osv/run.hh>
-#include <fs/vfs/vfs.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <osv/osv_c_wrappers.h>
#include <iostream>
-#include "drivers/zfs.hh"
+#include <vector>
+#include <osv/export.h>

-using namespace osv;
+//using namespace osv;
using namespace std;

// Created to guarantee that shared objects resources will
// be surely released at the function prologue.
static void run_cmd(const char *cmdpath, vector<string> args)
{
- int ret;
- auto ok = run(cmdpath, args, &ret);
- assert(ok && ret == 0);
+ char *argv[args.size()];
+ int i = 0;
+ for (auto &arg : args) {
+ argv[i++] = const_cast<char*>(arg.c_str());
+ }
+ auto ret = osv_run(cmdpath, args.size(), argv);
+ assert(ret == 0);
}

// Get extra blk devices for pool creation.
@@ -54,7 +61,7 @@ static void get_blk_devices(vector<string> &zpool_args)
static void mkfs(void)
{
// Create zfs device, then /etc/mnttab which is required by libzfs
- zfsdev::zfsdev_init();
+ osv_zfsdev_init();

// Manually create /etc/mnttab, a file required by libzfs.
mkdir("/etc", 0755);
@@ -82,7 +89,7 @@ static void mkfs(void)
run_cmd("/zfs.so", {"zfs", "set", "compression=lz4", "osv"});
}

-int main(int ac, char** av)
+OSV_LIBC_API int main(int ac, char** av)
{
cout << "Running mkfs...\n";
mkfs();
--
2.31.1
