[RFC PATCH] rust: doctests: Number tests by line numerically, not lexicographically.

David Gow

Dec 19, 2025, 4:25:53 AM

to Brendan Higgins, Rae Moar, Gary Guo, Miguel Ojeda, Shuah Khan, Guillaume Gomez, David Gow, linux-k...@vger.kernel.org, kuni...@googlegroups.com, rust-fo...@vger.kernel.org, linux-...@vger.kernel.org

The Rust doctests are numbered -- instead of named with the line number
-- in order to keep them moderately consistent even as the source file
changes.

However, the test numbers are generated by sorting the file/line
strings, and so the line numbers are sorted as strings, not integers.
So, for instance, a test on line 7 would sort in-between one on line 65
and one on line 75.

Instead, parse each line number as an integer, and sort based on that.
This is a bit slower, uglier, and will break things once, but I suspect
it is worth it (at least until we have a better solution).
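
To make the ordering problem concrete, here is a minimal standalone sketch (not the patch itself). It assumes names that follow the `{file}_{line}_{number}` pattern; `split_name`, `compare_names`, and the `example_rs_*` names are hypothetical and exist only for illustration.

```rust
use std::cmp::Ordering;

/// Splits a `{file}_{line}_{number}` name into `(file, line)`, discarding
/// `number`. Returns `None` (rather than panicking) if the name does not
/// match the pattern or the line is not a valid integer.
fn split_name(name: &str) -> Option<(&str, u64)> {
    let (rest, _number) = name.rsplit_once('_')?;
    let (file, line) = rest.rsplit_once('_')?;
    Some((file, line.parse::<u64>().ok()?))
}

/// Orders names by file, then by line number compared as an integer.
/// Names that fail to parse compare as `None`, which sorts before any `Some`.
fn compare_names(a: &str, b: &str) -> Ordering {
    split_name(a).cmp(&split_name(b))
}

fn main() {
    // Hypothetical doctest names for lines 75, 7 and 65 of the same file.
    let mut names = vec!["example_rs_75_0", "example_rs_7_0", "example_rs_65_0"];

    // Plain string sort: line 7 ends up after line 65, since '7' > '6'.
    names.sort();
    println!("string sort:  {names:?}");

    // Numeric sort: line 7 comes first.
    names.sort_by(|a, b| compare_names(a, b));
    println!("numeric sort: {names:?}");
}
```

Returning `Option` instead of calling `unwrap()` is a choice made for this sketch only; the generator script may well prefer to panic on a malformed name.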

Signed-off-by: David Gow <davi...@google.com>
---

This is a pretty unpolished, likely-unidiomatic patch to work around the
test numbering being horrible.

I have three questions before I decide if this is worth continuing with:

1. Is it worth renumbering all of the tests (hopefully just once), or
would that break too many people's test histories?

2. Is there a better way of doing this in Rust? I can think of ways
which might be nicer if the whole thing is refactored somewhat
seriously, but if there's an easy numeric sort on strings, that'd be
much easier.

3. Should we wait until after all or some of the changes to the test
generation? Does the new --output-format=doctest option make this
easier/harder/different?

Does anyone have opinions/advice on those (or, indeed, on anything
else)?

Cheers,
-- David

---
scripts/rustdoc_test_gen.rs | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index be0561049660..60b0bbfb1896 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -116,7 +116,19 @@ fn main() {
         .collect::<Vec<_>>();

     // Sort paths.
-    paths.sort();
+    paths.sort_by(|a, b|{
+        let a_name = a.file_name().unwrap().to_str().unwrap().to_string();
+        let (a_file, a_line) = a_name.rsplit_once('_').unwrap().0.rsplit_once('_').unwrap();
+        let a_line_no = a_line.parse::<u64>().unwrap();
+        let b_name = b.file_name().unwrap().to_str().unwrap().to_string();
+        let (b_file, b_line) = b_name.rsplit_once('_').unwrap().0.rsplit_once('_').unwrap();
+        let b_line_no = b_line.parse::<u64>().unwrap();
+
+        match a_file.cmp(b_file) {
+            std::cmp::Ordering::Equal => a_line_no.cmp(&b_line_no),
+            order => order,
+        }
+    });

     let mut rust_tests = String::new();
     let mut c_test_declarations = String::new();
--
2.52.0.322.g1dd061c0dc-goog

Miguel Ojeda

Dec 19, 2025, 5:04:40 AM

to David Gow, Brendan Higgins, Rae Moar, Gary Guo, Miguel Ojeda, Shuah Khan, Guillaume Gomez, linux-k...@vger.kernel.org, kuni...@googlegroups.com, rust-fo...@vger.kernel.org, linux-...@vger.kernel.org

On Fri, Dec 19, 2025 at 10:25 AM David Gow <davi...@google.com> wrote:
>
> 1. Is it worth renumbering all of the tests (hopefully just once), or
> would that break too many people's test histories?

Personally I don't have such histories just yet (and anyway the tests
generally work), and even if someone does, it may be best to pay the
price sooner rather than later.

> 2. Is there a better way of doing this in Rust? I can think of ways
> which might be nicer if the whole thing is refactored somewhat
> seriously, but if there's an easy numeric sort on strings, that'd be
> much easier.

We do essentially the same in the main loop (which is where I suppose
you picked it up), so it isn't too bad:

    // The `name` follows the `{file}_{line}_{number}` pattern (see description in
    // `scripts/rustdoc_test_builder.rs`). Discard the `number`.
    let name = path.file_name().unwrap().to_str().unwrap().to_string();

    // Extract the `file` and the `line`, discarding the `number`.
    let (file, line) = name.rsplit_once('_').unwrap().0.rsplit_once('_').unwrap();

However, we could perhaps save the information so that the main loop
is cleaner instead of redoing it.
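
As a rough illustration of the "save the information" idea, the names could be parsed once into a small record that is then both sorted and reused by the later loop. This is only a sketch under assumptions: `DoctestEntry`, `parse_entry`, and the example paths are hypothetical and do not come from `scripts/rustdoc_test_gen.rs`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical record holding what was parsed from one doctest path.
/// Field order matters: the derived `Ord` compares `file`, then `line`
/// (as an integer), then `path` as a tie-breaker.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct DoctestEntry {
    file: String,
    line: u64,
    path: PathBuf,
}

/// Parses a path whose file name follows `{file}_{line}_{number}`.
/// Returns `None` if the name does not match the pattern.
fn parse_entry(path: &Path) -> Option<DoctestEntry> {
    let name = path.file_name()?.to_str()?;
    let (rest, _number) = name.rsplit_once('_')?;
    let (file, line) = rest.rsplit_once('_')?;
    Some(DoctestEntry {
        file: file.to_string(),
        line: line.parse::<u64>().ok()?,
        path: path.to_path_buf(),
    })
}

fn main() {
    // Made-up paths standing in for the generated doctest files.
    let paths = [
        "doctests/example_rs_75_0",
        "doctests/example_rs_7_0",
        "doctests/example_rs_65_0",
    ]
    .map(PathBuf::from);

    // Parse each name exactly once. Note that `filter_map` silently skips
    // names that do not parse; a real script might prefer to fail loudly.
    let mut entries: Vec<DoctestEntry> =
        paths.iter().filter_map(|p| parse_entry(p)).collect();
    entries.sort();

    // The later loop can use `file` and `line` without re-parsing.
    for entry in &entries {
        println!("{} line {} ({})", entry.file, entry.line, entry.path.display());
    }
}
```

The trade-off is an extra allocation per entry in exchange for doing the string splitting in one place.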

Having said that, given we are migrating anyway, it may be simpler to
take this patch as it is and avoid reworking this code. So I am happy
either way.

> 3. Should we wait until after all or some of the changes to the test
> generation? Does the new --output-format=doctest option make this
> easier/harder/different?

We could do it there -- it would be easier in the sense that we have
the proper data already with the proper types etc.

On the other hand, it may be best to define the order we want to
follow (independently of the approaches), and then the migration would
be a smaller change conceptually, i.e. one less thing to decide then.

(I have to send the version to finally integrate the migration soon,
by the way -- I would like to put it in this cycle if possible).

Thanks!

Cheers,
Miguel
