[PATCH 0/2] fallocate hole punch clear partial pages

Mike Kravetz

Jul 12, 2022, 6:27:26 PM
to Eric B Munson, libhug...@googlegroups.com, Mike Kravetz
Hi Eric,
Not sure if libhugetlbfs is still being maintained. If so, I would like
to make the following update to align with kernel changes.

Starting in Linux v5.19, fallocate() hole punch will clear partial hugetlb
pages in the specified range. This was added with:
https://lore.kernel.org/linux-mm/YqeiMlZDKI1Kabfe@monkey/

Prior to this change, hugetlb hole punch only operated on whole hugetlb
pages. The test fallocate_align tested whole page alignment. Rename
this test to fallocate_align_whole and create a new test named
fallocate_align_partial to test partial page operations. Update the
wrapper scripts to run the correct script based on kernel version.

Mike Kravetz (2):
fallocate-align: rename to fallocate-align-whole
fallocate-align-partial: add test for partial page zeroing

tests/Makefile | 6 +-
tests/fallocate_align_partial.c | 225 ++++++++++++++++++
tests/fallocate_align_partial.sh | 16 ++
...locate_align.c => fallocate_align_whole.c} | 4 +-
...cate_align.sh => fallocate_align_whole.sh} | 9 +-
tests/run_tests.py | 3 +-
6 files changed, 256 insertions(+), 7 deletions(-)
create mode 100644 tests/fallocate_align_partial.c
create mode 100755 tests/fallocate_align_partial.sh
rename tests/{fallocate_align.c => fallocate_align_whole.c} (98%)
rename tests/{fallocate_align.sh => fallocate_align_whole.sh} (50%)

--
2.35.3

Mike Kravetz

Jul 12, 2022, 6:27:31 PM
to Eric B Munson, libhug...@googlegroups.com, Mike Kravetz
In Linux v5.19, hugetlbfs fallocate hole punching will be modified to
zero partial pages. This aligns with fallocate documentation.
Prior to v5.19, hugetlbfs fallocate hole punch only operated on whole
pages.

Retain the current test, renamed to fallocate-align-whole, to cover
pre-v5.19 kernels. A subsequent patch will add testing for partial page
operations in v5.19 and later kernels.

Also fix/update copyright.

Signed-off-by: Mike Kravetz <mike.k...@oracle.com>
---
tests/Makefile | 4 ++--
tests/{fallocate_align.c => fallocate_align_whole.c} | 4 ++--
tests/{fallocate_align.sh => fallocate_align_whole.sh} | 9 +++++++--
tests/run_tests.py | 2 +-
4 files changed, 12 insertions(+), 7 deletions(-)
rename tests/{fallocate_align.c => fallocate_align_whole.c} (98%)
rename tests/{fallocate_align.sh => fallocate_align_whole.sh} (50%)

diff --git a/tests/Makefile b/tests/Makefile
index 073df96..78f7989 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -14,7 +14,7 @@ LIB_TESTS = gethugepagesize test_root find_path unlinked_fd misalign \
mremap-expand-slice-collision \
mremap-fixed-normal-near-huge mremap-fixed-huge-near-normal \
corrupt-by-cow-opt noresv-preserve-resv-page noresv-regarded-as-resv \
- fallocate_basic fallocate_align fallocate_stress
+ fallocate_basic fallocate_align_whole fallocate_stress
LIB_TESTS_64 =
LIB_TESTS_64_STATIC = straddle_4GB huge_at_4GB_normal_below \
huge_below_4GB_normal_above
@@ -28,7 +28,7 @@ STRESS_TESTS = mmap-gettest mmap-cow shm-gettest shm-getraw shm-fork
WRAPPERS = quota counters madvise_reserve fadvise_reserve \
readahead_reserve mremap-expand-slice-collision \
mremap-fixed-normal-near-huge mremap-fixed-huge-near-normal \
- fallocate_basic fallocate_align fallocate_stress
+ fallocate_basic fallocate_align_whole fallocate_stress
HELPERS = get_hugetlbfs_path compare_kvers
HELPER_LIBS = libheapshrink.so
BADTOOLCHAIN = bad-toolchain.sh
diff --git a/tests/fallocate_align.c b/tests/fallocate_align_whole.c
similarity index 98%
rename from tests/fallocate_align.c
rename to tests/fallocate_align_whole.c
index 1ab2e94..004b760 100644
--- a/tests/fallocate_align.c
+++ b/tests/fallocate_align_whole.c
@@ -1,6 +1,6 @@
/*
* libhugetlbfs - Easy use of Linux hugepages
- * Copyright (C) 20015 Mike Kravetz, Oracle Corporation
+ * Copyright (c) 2015, 2022, Oracle and/or its affiliates.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -31,7 +31,7 @@

#include "hugetests.h"

-#define P "fallocate-align"
+#define P "fallocate-align-whole"
#define DESC \
"* Test alignment of fallocate arguments. fallocate will take *\n"\
"* non-huge page aligned offsets and addresses. However, *\n"\
diff --git a/tests/fallocate_align.sh b/tests/fallocate_align_whole.sh
similarity index 50%
rename from tests/fallocate_align.sh
rename to tests/fallocate_align_whole.sh
index 5105151..e9a823a 100755
--- a/tests/fallocate_align.sh
+++ b/tests/fallocate_align_whole.sh
@@ -10,6 +10,11 @@ if [ $? -eq 1 ]; then
echo "FAIL no fallocate support in kernels before 4.3.0"
exit $RC_FAIL
else
- EXP_RC=$RC_PASS
- exec_and_check $EXP_RC fallocate_align "$@"
+ compare_kvers `uname -r` "5.19.0"
+ if [ $? -eq 1 ]; then
+ EXP_RC=$RC_PASS
+ exec_and_check $EXP_RC fallocate_align_whole "$@"
+ else
+ echo "FAIL fallocate zeros partial pages in kernels 5.19.0 and later"; exit $RC_FAIL
+ fi
fi
diff --git a/tests/run_tests.py b/tests/run_tests.py
index 018264d..d9626dd 100755
--- a/tests/run_tests.py
+++ b/tests/run_tests.py
@@ -558,7 +558,7 @@ def functional_tests():
do_test_with_rlimit(resource.RLIMIT_MEMLOCK, -1, "mlock")
do_test("misalign")
do_test("fallocate_basic.sh")
- do_test("fallocate_align.sh")
+ do_test("fallocate_align_whole.sh")

# Specific kernel bug tests
do_test("ptrace-write-hugepage")
--
2.35.3

Mike Kravetz

Jul 12, 2022, 6:27:35 PM
to Eric B Munson, libhug...@googlegroups.com, Mike Kravetz
In Linux v5.19, hugetlbfs fallocate hole punching will be modified to
zero partial pages. This aligns with fallocate documentation.

Add new test (fallocate-align-partial) to test this change in
functionality.

Signed-off-by: Mike Kravetz <mike.k...@oracle.com>
---
tests/Makefile | 6 +-
tests/fallocate_align_partial.c | 225 +++++++++++++++++++++++++++++++
tests/fallocate_align_partial.sh | 16 +++
tests/run_tests.py | 1 +
4 files changed, 246 insertions(+), 2 deletions(-)
create mode 100644 tests/fallocate_align_partial.c
create mode 100755 tests/fallocate_align_partial.sh

diff --git a/tests/Makefile b/tests/Makefile
index 78f7989..e6220ee 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -14,7 +14,8 @@ LIB_TESTS = gethugepagesize test_root find_path unlinked_fd misalign \
mremap-expand-slice-collision \
mremap-fixed-normal-near-huge mremap-fixed-huge-near-normal \
corrupt-by-cow-opt noresv-preserve-resv-page noresv-regarded-as-resv \
- fallocate_basic fallocate_align_whole fallocate_stress
+ fallocate_basic fallocate_align_whole fallocate_align_partial \
+ fallocate_stress
LIB_TESTS_64 =
LIB_TESTS_64_STATIC = straddle_4GB huge_at_4GB_normal_below \
huge_below_4GB_normal_above
@@ -28,7 +29,8 @@ STRESS_TESTS = mmap-gettest mmap-cow shm-gettest shm-getraw shm-fork
WRAPPERS = quota counters madvise_reserve fadvise_reserve \
readahead_reserve mremap-expand-slice-collision \
mremap-fixed-normal-near-huge mremap-fixed-huge-near-normal \
- fallocate_basic fallocate_align_whole fallocate_stress
+ fallocate_basic fallocate_align_whole fallocate_align_partial \
+ fallocate_stress
HELPERS = get_hugetlbfs_path compare_kvers
HELPER_LIBS = libheapshrink.so
BADTOOLCHAIN = bad-toolchain.sh
diff --git a/tests/fallocate_align_partial.c b/tests/fallocate_align_partial.c
new file mode 100644
index 0000000..c6a9941
--- /dev/null
+++ b/tests/fallocate_align_partial.c
@@ -0,0 +1,225 @@
+/*
+ * libhugetlbfs - Easy use of Linux hugepages
+ * Copyright (c) 2015, 2022, Oracle and/or its affiliates.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#define _GNU_SOURCE
+
+#include <linux/falloc.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <signal.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <linux/falloc.h>
+
+#include <hugetlbfs.h>
+
+#include "hugetests.h"
+
+#define P "fallocate-align-partial"
+#define DESC \
+ "* Test alignment of fallocate arguments. fallocate will take *\n"\
+ "* non-huge page aligned offsets and lengths. Hole punch frees *\n"\
+ "* whole huge pages in the range and zeros partial huge pages, *\n"\
+ "* as fallocate does in \"normal\" filesystems. *"
+
+#define FILL_CHAR 'a'
+
+static void write_entire_file(int fd, off_t size)
+{
+ static void *addr;
+ unsigned long i;
+ int err;
+
+ addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (addr == MAP_FAILED)
+ FAIL("mmap(): %s", strerror(errno));
+
+ for (i = 0; i < size; i++)
+ *((char *)(addr + i)) = FILL_CHAR;
+
+ err = munmap(addr, size);
+ if (err)
+ FAIL("munmap(): %s", strerror(errno));
+}
+
+static void verify_hole(int fd, off_t start, off_t length, off_t f_size)
+{
+ static void *addr;
+ unsigned long i;
+ int err;
+ off_t end = start + length;
+
+ addr = mmap(NULL, f_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (addr == MAP_FAILED)
+ FAIL("mmap(): %s", strerror(errno));
+
+ for (i = 0; i < f_size; i++) {
+ if (i < start || i >= end) {
+ if (*((char *)(addr + i)) != FILL_CHAR) {
+ printf("\nUnexpected char outside hole\n");
+ printf(" offset %lx %c != %c\n",
+ i, *((char *)(addr + i)), FILL_CHAR);
+ FAIL("Unexpected char outside hole\n");
+ }
+ } else {
+ if (*((char *)(addr + i)) != 0) {
+ printf("\nUnexpected char in hole\n");
+ printf(" offset %lx %c != %c\n",
+ i, *((char *)(addr + i)), 0);
+ FAIL("\nNon-zero char in hole\n");
+ }
+ }
+ }
+
+ err = munmap(addr, f_size);
+ if (err)
+ FAIL("munmap(): %s", strerror(errno));
+}
+
+int main(int argc, char *argv[])
+{
+ long hpage_size;
+ int fd;
+ int err;
+ unsigned long free_before, free_after;
+
+ test_init(argc, argv);
+
+ hpage_size = check_hugepagesize();
+
+ fd = hugetlbfs_unlinked_fd();
+ if (fd < 0)
+ FAIL("hugetlbfs_unlinked_fd()");
+
+ free_before = get_huge_page_counter(hpage_size, HUGEPAGES_FREE);
+
+ /*
+ * First preallocate file with just 1 byte. Allocation sizes
+ * are rounded up, so we should get an entire huge page.
+ */
+ err = fallocate(fd, 0, 0, 1);
+ if (err) {
+ if (errno == EOPNOTSUPP)
+ IRRELEVANT();
+ if (err)
+ FAIL("fallocate(): %s", strerror(errno));
+ }
+
+ free_after = get_huge_page_counter(hpage_size, HUGEPAGES_FREE);
+ if (free_before - free_after != 1)
+ FAIL("fallocate 1 byte did not preallocate entire huge page\n");
+
+ write_entire_file(fd, hpage_size);
+
+ /*
+ * Now punch a hole of just 1 byte. The huge page should not be
+ * freed, but the single byte in the range should be zeroed.
+ */
+ err = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ 0, 1);
+ if (err)
+ FAIL("fallocate(FALLOC_FL_PUNCH_HOLE): %s", strerror(errno));
+
+ free_after = get_huge_page_counter(hpage_size, HUGEPAGES_FREE);
+ if (free_after == free_before)
+ FAIL("fallocate hole punch 1 byte free'ed a huge page\n");
+
+ verify_hole(fd, 0, 1, hpage_size);
+
+ /* Make sure file has 2 huge pages */
+ err = fallocate(fd, 0, 0, 2 * hpage_size);
+ if (err) {
+ if (errno == EOPNOTSUPP)
+ IRRELEVANT();
+ if (err)
+ FAIL("fallocate(): %s", strerror(errno));
+ }
+ free_after = get_huge_page_counter(hpage_size, HUGEPAGES_FREE);
+ if (free_before - free_after != 2)
+ FAIL("fallocate 2 pages did not preallocate pages\n");
+
+ write_entire_file(fd, 2 * hpage_size);
+
+ /*
+ * Now punch a hole of 2 * hpage_size - 1 bytes. The first huge page
+ * should be freed and the partial range in the second page zeroed.
+ */
+ err = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ 0, (2 * hpage_size) - 1);
+ if (err)
+ FAIL("fallocate(FALLOC_FL_PUNCH_HOLE): %s", strerror(errno));
+
+ free_after = get_huge_page_counter(hpage_size, HUGEPAGES_FREE);
+ if (free_before - free_after != 1)
+ FAIL("fallocate hole punch 2 * hpage_size - 1 byte did not free huge page\n");
+
+ verify_hole(fd, 0, (2 * hpage_size) - 1, 2 * hpage_size);
+
+ /*
+ * Perform a preallocate operation with offset 1 and size of
+ * hpage_size. The offset should be rounded down and the
+ * size rounded up to preallocate the missing huge page.
+ */
+ err = fallocate(fd, 0, 1, hpage_size);
+ if (err)
+ FAIL("fallocate(): %s", strerror(errno));
+
+ free_after = get_huge_page_counter(hpage_size, HUGEPAGES_FREE);
+ if (free_before - free_after != 2)
+ FAIL("fallocate 1 byte offset, huge page size did not preallocate two huge pages\n");
+
+ write_entire_file(fd, 2 * hpage_size);
+
+ /*
+ * The hole punch code will only delete 'whole' huge pages that are
+ * in the specified range. The offset is rounded up, and (offset
+ * + size) is rounded down to determine the huge pages to be deleted.
+ * In this case, after rounding the range is (hpage_size, hpage_size).
+ * So, no pages should be deleted, but the range should be zeroed.
+ */
+ err = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ 1, hpage_size);
+ if (err)
+ FAIL("fallocate(FALLOC_FL_PUNCH_HOLE): %s", strerror(errno));
+
+ free_after = get_huge_page_counter(hpage_size, HUGEPAGES_FREE);
+ if (free_before - free_after != 2)
+ FAIL("fallocate hole punch 1 byte offset, huge page size incorrectly deleted a huge page\n");
+
+ verify_hole(fd, 1, hpage_size, 2 * hpage_size);
+
+ write_entire_file(fd, 2 * hpage_size);
+
+ /*
+ * To delete both huge pages, the range passed to hole punch must
+ * overlap the allocated pages
+ */
+ err = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ 0, 2 * hpage_size);
+ if (err)
+ FAIL("fallocate(FALLOC_FL_PUNCH_HOLE): %s", strerror(errno));
+
+ free_after = get_huge_page_counter(hpage_size, HUGEPAGES_FREE);
+ if (free_after != free_before)
+ FAIL("fallocate hole punch did not delete two huge pages\n");
+
+ verify_hole(fd, 0, 2 * hpage_size, 2 * hpage_size);
+
+ PASS();
+}
diff --git a/tests/fallocate_align_partial.sh b/tests/fallocate_align_partial.sh
new file mode 100755
index 0000000..ec421c6
--- /dev/null
+++ b/tests/fallocate_align_partial.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+
+. wrapper-utils.sh
+
+#
+# hugetlbfs fallocate support was not available until 4.3
+# Partial page support added in 5.19
+#
+compare_kvers `uname -r` "5.19.0"
+if [ $? -eq 1 ]; then
+ echo "FAIL no fallocate partial page support in kernels before 5.19.0"
+ exit $RC_FAIL
+else
+ EXP_RC=$RC_PASS
+ exec_and_check $EXP_RC fallocate_align_partial "$@"
+fi
diff --git a/tests/run_tests.py b/tests/run_tests.py
index d9626dd..cab89c0 100755
--- a/tests/run_tests.py
+++ b/tests/run_tests.py
@@ -559,6 +559,7 @@ def functional_tests():
do_test("misalign")
do_test("fallocate_basic.sh")
do_test("fallocate_align_whole.sh")
+ do_test("fallocate_align_partial.sh")

Eric B Munson

Aug 4, 2022, 2:02:32 PM
to Mike Kravetz, libhug...@googlegroups.com
On 2022-07-12 18:27, Mike Kravetz wrote:
> Hi Eric,
> Not sure if libhugetlbfs is still being maintained. If so, I would like
> to make the following update to align with kernel changes.

Hi Mike,

I am not maintaining it any longer, but I would be happy to hand it over
to anyone who wants to maintain the code base.