[llvm-dev] RFC: Support for preferring paths with forward slashes on Windows


Martin Storsjö via llvm-dev

Oct 14, 2021, 8:22:40 AM
to llvm...@lists.llvm.org, mat...@gmail.com, git...@jdrake.com
Hi,

When using Clang on Windows as a drop-in replacement for GCC, one issue
that crops up fairly soon is that not all callers can tolerate paths
spelled out with backslashes.

This is an issue when e.g. libtool parses the output of "$CC -v" (where
clang passes an absolute path to compiler-rt libraries) and uses parts of
that in shell script contexts that don't tolerate backslashes, or when
callers invoke "$CC --print-search-dirs", etc.

This is also one of the most important things that MSYS2 patches in their
distribution of Clang/LLVM according to their patch tracker [1].

(I've locally worked around this in my distribution without patching, by
filtering clang's stdout in a wrapper, when options like "-v" or
"--print-search-dirs" are detected, but that's essentially the same as
patching.)

I've finally taken the plunge and tried to implement this properly. I've
got a decent patch set [2] that I could start sending for review, but
before doing that, I'd like to discuss the overall design.


The main idea is that I add a third alternative to path::Style - in
addition to the existing Windows and Posix path styles, I'm adding
Windows_forward, which otherwise parses and handles Windows paths like
before (i.e. accepting and interpreting both separators), but with a
different preferred separator (as returned by get_separator()).

This allows any code on any platform to handle paths in all three forms,
just like in the existing design, when explicitly giving a path::Style
argument.
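
As a rough illustration of the three-style idea (a standalone sketch, not
code from the patch set; the enum and function names below are hypothetical
stand-ins for the ones in LLVMSupport), the two Windows flavours could share
their parsing rules and differ only in the separator they emit:

```cpp
#include <cassert>
#include <string>

// Hypothetical names modelled on the RFC text; not LLVM's actual declarations.
enum class Style { posix, windows, windows_forward };

// Both Windows flavours accept either separator when parsing a path.
inline bool is_separator(char c, Style s) {
  if (c == '/')
    return true;
  return c == '\\' && s != Style::posix;
}

// Only the preferred separator differs between the two Windows flavours.
inline char get_separator(Style s) {
  return s == Style::windows ? '\\' : '/';
}

// Append a component, inserting the style's preferred separator if needed.
inline std::string append(std::string base, const std::string &component,
                          Style s) {
  assert(!component.empty() && "component must not be empty");
  if (!base.empty() && !is_separator(base.back(), s))
    base += get_separator(s);
  return base + component;
}
```

With this sketch, appending "file.c" to "C:\dir" gives "C:\dir/file.c" under
windows_forward and "C:\dir\file.c" under windows; the existing part of the
path is left in whatever form it was given.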

To actually make it take effect, one can make path::Style::native act like
Windows_forward instead of plain Windows. I'm not entirely sure what the
best strategy is for when to do that - one could do it when LLVM itself
was built for a MinGW target (which kind of breaks the assumption that the
tools work pretty much the same as long as one passes the right --target
options etc), or one could maybe set it up as a configure time CMake
option? Or even make it a globally settable option in the process, to
allow changing it e.g. depending on the tool's target configuration?

I also faintly remember that Reid at some point implied that it could be
an option to switch all Windows builds outright to such a behaviour?

Most of the code is entirely independent of the policy decision of
when/where to enable the behaviour - the decision is centralised to one
single spot in LLVMSupport.
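
To make the "one single spot" point concrete, here is a minimal standalone
sketch that models a Windows host. The run-time switch and all names are
assumptions for illustration only; which mechanism to use (MinGW build,
CMake option, or process-wide setting) is exactly the open question above.

```cpp
#include <cassert>

// Hypothetical style enum; "native" must be resolved before use.
enum class Style { native, posix, windows, windows_forward };

// Hypothetical process-wide switch. It could equally be a compile-time
// constant derived from a CMake option or from the host being MinGW.
inline bool &prefer_forward_slashes() {
  static bool value = false;
  return value;
}

// The single place where "native" is mapped to a concrete style.
// This sketch assumes the host is Windows.
inline Style real_style(Style s) {
  if (s != Style::native)
    return s;
  return prefer_forward_slashes() ? Style::windows_forward : Style::windows;
}

// Everything else only ever asks for the resolved style.
inline char get_separator(Style s) {
  Style real = real_style(s);
  assert(real != Style::native && "native must have been resolved");
  return real == Style::windows ? '\\' : '/';
}
```

Callers that pass an explicit style are unaffected by the switch; only code
that relies on the native style changes behaviour.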

In any case, with this design and a quite moderate amount of fixups, most
of the tests in check-all seem to pass, if switching the preference.

There are a couple of tests that fail due to checking e.g. the literal paths %s
or %t (as output by llvm-lit, with backslashes) against paths that the
tools output. There are also a dozen or so tests in Clang (mainly
regarding PCH) that seem to misbehave when the same paths are referred to
with varying kinds of slashes, e.g. stored with a forward slash in the PCH
but referred to with backslashes in arguments to Clang, where the paths are
essentially equal but the strings differ. (For actual use with PCH, Clang
built this way seems to work - and MSYS2 have been running with tools
patched this way for quite some time, and I haven't heard of any bug
reports relating to that patch.)
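
The "paths are equal but the strings differ" problem can be shown with a
small standalone sketch. This is not how Clang compares paths; it is a
hypothetical helper, and it deliberately ignores case folding, "." and ".."
components, and repeated separators.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

inline bool is_windows_separator(char c) { return c == '/' || c == '\\'; }

// Compare two Windows paths, treating '/' and '\\' as the same character.
inline bool same_path_ignoring_slashes(const std::string &a,
                                       const std::string &b) {
  return std::equal(a.begin(), a.end(), b.begin(), b.end(),
                    [](char x, char y) {
                      if (is_windows_separator(x) && is_windows_separator(y))
                        return true;
                      return x == y;
                    });
}

// The PCH situation from the text: same file, different spelling.
inline void demo_pch_paths() {
  const std::string stored = "C:/src/pch.h";    // as stored in the PCH
  const std::string given = "C:\\src\\pch.h";   // as given on the command line
  assert(stored != given);                      // plain string compare fails
  assert(same_path_ignoring_slashes(stored, given));
}
```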

If the design seems sane (see [2] for my whole series at the moment), I'd
start sending the initial patches for review.

// Martin

[1] https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/README-patches.md

[2] https://github.com/llvm/llvm-project/compare/main...mstorsjo:path-separator

Chris Tetreault via llvm-dev

Oct 14, 2021, 12:43:21 PM
to Martin Storsjö, llvm-dev, mat...@gmail.com, git...@jdrake.com
I could be mistaken, but I believe that since the dawn of time, Windows has just secretly supported forward slashes. A quick Google search does not turn up any Microsoft docs stating that this is true, but I've heard rumors that it's been this way since DOS. On my Windows 10 machine, Powershell accepts /, cmd.exe accepts /, and Visual Studio accepts /. Whoever takes it upon themselves to work on this should test extensively before committing code. I would probably feel better if somebody could dig up some authoritative source on this.

Assuming that this is the case, it would probably be nice if any paths we take in were just immediately canonicalized to use /, so that all paths have forward slashes. I know we have a ton of tests that have this `{{(/|\\)}}` regex in them, and it would be nice if we could just not do that.

Thanks,
Chris Tetreault

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Jameson Nash via llvm-dev

Oct 14, 2021, 2:53:04 PM
to Chris Tetreault, llvm-dev, mat...@gmail.com, git...@jdrake.com
The Win32 userspace (usually) supports them, but the underlying NT kernel does not. So they are normally converted, but not always. This can affect a few places, such as paths whose names begin with the character sequence `\\?\` or, if I recall correctly, a few odd places such as LoadLibrary(Ex) not supporting paths containing `/` (explicitly documented at https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibrarya)

Martin Storsjö via llvm-dev

Oct 14, 2021, 4:54:06 PM
to Chris Tetreault, llvm-dev, mat...@gmail.com, git...@jdrake.com
On Thu, 14 Oct 2021, Chris Tetreault wrote:

> I could be mistaken, but I believe that since the dawn of time, Windows
> has just secretly supported forward slashes. A quick google search does
> not turn up any Microsoft docs stating that this is true, but I've heard
> rumors that it's been this way since DOS. On my Windows 10 machine,
> Powershell accepts /, cmd.exe accepts /, and Visual Studio accepts /.

Yes, overall most APIs that take paths can take either form, but in my
experience, cmd.exe pretty exclusively requires backslashes.

> Whomever takes it upon themselves to work on this should test
> extensively before committing code. I would probably feel better if
> somebody could dig up some authoritative source on this.

I don't think this aspect is anything new/controversial wrt LLVM so far;
it can take paths that use forward slashes (if given such paths) in a
number of places and pass them through pretty much as-is to the underlying
APIs.

But most cases where we take a path and feed it to an underlying API are well
centralised to a single function, which takes our char based UTF-8 paths
and widens them to UTF-16 wchar_t, before passing them to the actual Win32
APIs in that form. Currently, that function forces the paths to backslash
form in certain cases (when it needs to prepend a \\?\ prefix for long
paths), but if we felt wary about it we could make it always force them
to backslash form.
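
For illustration only, the shape of that logic might look like the standalone
sketch below. It is not the actual LLVMSupport function: it skips the
UTF-8 to UTF-16 widening, the names are hypothetical, and the length
threshold is an assumption (Windows' MAX_PATH of 260), since the exact
condition is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Assumed threshold for this sketch; MAX_PATH on Windows is 260.
constexpr std::size_t kMaxPath = 260;

// Rewrite every '/' to '\\'.
inline std::string to_backslashes(std::string path) {
  std::replace(path.begin(), path.end(), '/', '\\');
  return path;
}

// Prepare an absolute drive-letter path for a Win32 call.
// forceBackslashes models the "always force them" option from the text.
inline std::string prepare_for_win32(const std::string &absPath,
                                     bool forceBackslashes) {
  assert(absPath.size() >= 3 && absPath[1] == ':' && "expects e.g. C:/dir");
  // Win32 does not normalise paths that carry the \\?\ prefix, so long
  // paths must be converted to backslashes before the prefix is added.
  if (absPath.size() >= kMaxPath)
    return "\\\\?\\" + to_backslashes(absPath);
  return forceBackslashes ? to_backslashes(absPath) : absPath;
}
```

With forceBackslashes set to false, a short path such as C:/dir/file.c is
passed through unchanged, which matches the existing behaviour described
above; setting it to true corresponds to the more conservative option.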

So the fact that we can pass paths with forward slashes to Win32 APIs is a
preexisting condition and nothing that my patch set would change,
essentially - it'd just do it more often than before.

> Assuming that this is the case, it would probably be nice if any paths
> we take in were just immediately canonicalized to use / and all paths
> just have forward slash. I know we have a ton of tests that have this
> `{(/|\\)}` regex in them, and it would be nice if we could just not do
> that.

If desired, that could be a later goal - that's a couple steps further
than what I aimed for so far though.

Right now, my patchset canonicalizes paths that are made up internally
(functions like current_path(), getMainExecutable(), findProgramByName(),
and how InitLLVM() sets argv[0]) and uses the preferred separator wherever
paths are assembled in code, but in many cases, paths are taken in and
passed around in the user-provided form too. Given the full interface of
e.g. Clang, there's a huge number of different places where paths can be
provided (there's dozens of various command line options that take paths
as arguments).

Also, judging from both GCC and MSVC, neither of them seems to canonicalize
paths on input. If I call either of them with e.g. c:\dir\source.c or
c:/dir/source.c, then the warnings emitted from that file are printed with
slashes in the exact form I input.

But in any case, regardless of how far we want to go with
canonicalizations in either form, the patchset I've started on, given that
others agree on the design, is a first step towards being able to use
forward slashes. It works quite well to apply it gradually until switching
the preference.

// Martin

Fangrui Song via llvm-dev

Oct 15, 2021, 12:56:16 AM
to Martin Storsjö, llvm...@lists.llvm.org, mat...@gmail.com, git...@jdrake.com

Big thanks to you for investigating this area!

clang/test/Driver tests suffer the most from Windows backslashes. MC and
DebugInfo suffer a bit as well.
I have many times seen a new test fail on Windows and need a follow-up fix.
Sometimes the author adjusts the test and slightly degrades the test
quality if they cannot figure out the best way to support both / and \
(using {{/|\\\\}} multiple times on one line can clutter it up).

Michael Kruse via llvm-dev

Oct 15, 2021, 10:17:06 AM
to Martin Storsjö, llvm-dev, mat...@gmail.com, git...@jdrake.com
Thanks for working on this.

I also noticed that the paths printed by clang -v or -### have their
backslashes escaped and are put in quotes, i.e.
"C:\\path\\to\\clang.exe" -cc1 "..\\special'^`character .c"
Interestingly, it still works when copy&pasting it to the Windows command
line [1], but cmd.exe's escape character is ^ and PowerShell's is the
backtick `. What would the correct output be?

[1] https://stackoverflow.com/questions/33027024/documented-behavior-for-multiple-backslashes-in-windows-paths

Michael

On Thu, Oct 14, 2021 at 07:22, Martin Storsjö via llvm-dev
<llvm...@lists.llvm.org> wrote:

Martin Storsjö via llvm-dev

Oct 15, 2021, 4:53:11 PM
to Michael Kruse, llvm-dev, mat...@gmail.com, git...@jdrake.com
On Fri, 15 Oct 2021, Michael Kruse wrote:

> I also noticed is that the paths printed by clang -v or -### are
> escaping the backslashes and put them into quotes, i.e.
> "C:\\path\\to\\clang.exe" -cc1 "..\\special'^`character .c"
> Interestingly, it still works copy&pasting it to the Windows command
> line [2], but cmd.exe's escape character is ^ and PowerShell's is the
> backtick `. What would the correct output be?

I wasn't aware that cmd.exe had an escape char per se (other than ^ for
line continuations?).

The fact that such backslashes are printed doubled is, iirc, an intentional
quirk, so that the command lines are copy-pasteable in a variety of
contexts: cmd.exe doesn't need them doubled (but tolerates them), and bash
unescapes them so it can also execute them.
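
A simplified standalone sketch of that kind of quoting (not Clang's actual
printing code; the function name is hypothetical and the rule is reduced to
"always quote, and backslash-escape quotes and backslashes"):

```cpp
#include <cassert>
#include <string>

// Quote one argument for "-v" / "-###" style output.
inline std::string quote_arg(const std::string &arg) {
  std::string out = "\"";
  for (char c : arg) {
    if (c == '"' || c == '\\')
      out += '\\';  // escape so a POSIX shell reads the original character
    out += c;
  }
  out += '"';
  return out;
}

inline void demo_quote_arg() {
  // C:\path\to\clang.exe is printed as "C:\\path\\to\\clang.exe".
  assert(quote_arg("C:\\path\\to\\clang.exe") ==
         "\"C:\\\\path\\\\to\\\\clang.exe\"");
  // A forward-slash path has nothing to double.
  assert(quote_arg("C:/path/to/clang.exe") == "\"C:/path/to/clang.exe\"");
}
```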

This is actually one downside of using forward slashes, as cmd.exe
wouldn't be able to execute such a command straight out (only the slash
direction of the command executable itself matters though).

// Martin

Reid Kleckner via llvm-dev

Oct 18, 2021, 5:04:06 PM
to Martin Storsjö, llvm-dev, mat...@gmail.com, git...@jdrake.com
Thanks for working on this! I think this mode is a long time coming.

On Fri, Oct 15, 2021 at 1:53 PM Martin Storsjö via llvm-dev <llvm...@lists.llvm.org> wrote:
> On Fri, 15 Oct 2021, Michael Kruse wrote:
>
>> I also noticed is that the paths printed by clang -v or -### are
>> escaping the backslashes and put them into quotes, i.e.
>> "C:\\path\\to\\clang.exe" -cc1 "..\\special'^`character .c"
>> Interestingly, it still works copy&pasting it to the Windows command
>> line [2], but cmd.exe's escape character is ^ and PowerShell's is the
>> backtick `. What would the correct output be?
>
> I wasn't aware that cmd.exe had an escape char per se (other than ^ for
> line continuations?).
>
> The fact that such slashes are printed double is, iirc, an intentional
> quirk, so that the command lines are copypasteable in a variety of
> contexts: cmd.exe don't need them doubled (but tolerates them), bash
> unescapes them so it also can execute them.

Yep, I was going to say that.

> This is actually one downside of using forward slashes, as cmd.exe
> wouldn't be able to execute such a command straight out (only the slash
> direction of the command executable itself matters though).

I think as long as the user has a way to choose between the styles, they've got what they need to unblock themselves.

Michael Kruse via llvm-dev

Oct 19, 2021, 12:40:24 AM
to Reid Kleckner, llvm-dev, mat...@gmail.com, git...@jdrake.com
On Mon, Oct 18, 2021 at 16:04, Reid Kleckner <r...@google.com> wrote:
> On Fri, Oct 15, 2021 at 1:53 PM Martin Storsjö via llvm-dev <llvm...@lists.llvm.org> wrote:
>> I wasn't aware that cmd.exe had an escape char per se (other than ^ for
>> line continuations?).

Some MS docs recommend using it to escape redirection (<, >) and pipe (|)
symbols [1]. Interestingly, ^ does NOT escape within double-quotes.
The best explanation I found about it is [2].

[1] https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/echo
[2] https://ss64.com/nt/syntax-esc.html

Michael

Martin Storsjö via llvm-dev

Oct 26, 2021, 4:59:21 AM
to Reid Kleckner, llvm-dev, mat...@gmail.com, git...@jdrake.com
On Mon, 18 Oct 2021, Reid Kleckner wrote:

> Thanks for working on this! I think this mode is a long time coming.

Just FTR, as there weren't any specific comments on the implementation
strategy, I went ahead and posted the initial couple of patches for review,
at https://reviews.llvm.org/D111879 and https://reviews.llvm.org/D111880.
