Hello LLVM Developers,
Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.
Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need, and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.
To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.
We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:
The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.
The libc should support static non-PIE and static-PIE linking. This means providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.
If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.
Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.
The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.
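To make the fuzzing and sanitizer goal concrete, here is a minimal sketch of a libFuzzer target in C. The function `my_strlen` is a hypothetical stand-in for a routine under test and is not part of any proposed code; only the `LLVMFuzzerTestOneInput` entry point is libFuzzer's documented interface. The target checks the stand-in against the system `strlen` on each input.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function under test; stands in for a libc routine. */
size_t my_strlen(const char *s) {
  const char *p = s;
  while (*p != '\0')
    ++p;
  return (size_t)(p - s);
}

/* libFuzzer entry point, called repeatedly with generated inputs.
 * One way to build it: clang -g -fsanitize=fuzzer,address target.c */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  /* Copy into an exactly sized, NUL-terminated heap buffer so that
   * AddressSanitizer can flag any read past the terminator. */
  char *buf = malloc(size + 1);
  if (buf == NULL)
    return 0;
  if (size > 0)
    memcpy(buf, data, size);
  buf[size] = '\0';

  /* Differential check against the reference implementation. */
  if (my_strlen(buf) != strlen(buf))
    abort();

  free(buf);
  return 0;
}
```

The heap copy is a deliberate choice: a tightly sized allocation gives the sanitizer a precise boundary to check, which a large static buffer would not.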
There are also a few areas which we do not intend to invest in at this point:
Implement dynamic loading and linking support.
Support for more architectures (we'll start with just x86-64 for simplicity).
For these areas, the community is of course free to contribute. Our hope is that preserving the "as a library" design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren't needed.
We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.
So, what do you think about incorporating this new libc under the LLVM project?
Thank you,
Siva Chandra and the rest of the Google LLVM contributors
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Currently clang-cl has several dependencies on having a Visual Studio
installation present on your machine, and one of these is that it
provides an implementation of the CRT (i.e. libc). So having a libc
implementation which supports Windows and is compatible with MSVCRT
would be useful for people using clang on Windows as well.
Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need,
+1 - This has also been my experience: Many people over many years have expressed a desire to have a libc as part of the LLVM project. It is currently a large gap in our LLVM toolchain offering. Moreover, from the standpoint of my organization, an LLVM libc could provide benefits on both production platforms and research/experimental hardware.
The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.
Great.
There are also a few areas which we do not intend to invest in at this point:
Implement dynamic loading and linking support.
It will be useful to have a design document that describes the kind of system and capabilities that you're targeting, and then we can discuss how the libc might have a modular design that can be adapted for other use cases. I mention modularity because, for example, we have accelerator hardware and various kinds of low-variability/embedded environments where many, but not all, POSIX/libc capabilities make sense.
So, what do you think about incorporating this new libc under the LLVM project?
This is something that I'd like to see.
-Hal
-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
On Jun 24, 2019, at 3:23 PM, Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:
> We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:
> The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.
> For these areas, the community is of course free to contribute. Our hope is that preserving the "as a library" design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren't needed.
> We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.
> So, what do you think about incorporating this new libc under the LLVM project?
> Hello LLVM Developers,
> Within Google, we have a growing range of needs that existing libc
> implementations don't quite address. This is pushing us to start working on
> a new libc implementation.
Are you able to share what some of these needs are? My reason for
asking is to see if there is a particular niche where existing libc
designs are not working, or if there is an approach that will handle
many use cases better than existing libc implementations.
>
> Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need, and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.
>
>
> To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.
>
I'm definitely interested in hearing more. Assembling an LLVM based
toolchain when there isn't an obvious native platform C library that
can be used could in theory benefit greatly from something like this.
As you point out, this might not be in your set of needs though.
>
> We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:
>
>
> The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.
>
There can be good reasons for designs to be monolithic though, for
example https://wiki.musl-libc.org/design-concepts.html . I'm not
enough of a C-library expert to say that this is always true, but it
does at least highlight that there is a risk that a toolkit suitable
for many libraries becomes too cumbersome to use in practice.
> The libc should support static non-PIE and static-PIE linking. This means providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.
>
Interesting. I've seen a static-PIE loader embedded into an image so
that it could relocate itself. As all the dependencies were
statically linked there were only simple relative relocations to
resolve. Are you thinking of something along those lines or an
external loader program?
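For readers unfamiliar with what "simple relative relocations" involve, a rough sketch in C follows. It assumes the x86-64 ELF case, where each relative relocation stores `base + addend` at `base + offset`. The struct and macro names are local, hypothetical mirrors of what `<elf.h>` normally provides (`Elf64_Rela`, `ELF64_R_TYPE`, `R_X86_64_RELATIVE`), defined here only to keep the example self-contained. It is not a description of any particular loader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the ELF64 "Rela" record layout (normally Elf64_Rela). */
typedef struct {
  uint64_t r_offset; /* where to patch, relative to the load base */
  uint64_t r_info;   /* symbol index (high 32 bits), type (low 32 bits) */
  int64_t r_addend;  /* constant added to the load base */
} rela64;

#define RELOC_TYPE(info) ((uint32_t)((info) & 0xffffffffu))
#define RELOC_X86_64_RELATIVE 8u /* same value as R_X86_64_RELATIVE */

/* Apply every relative relocation in `rela` to the image loaded at `base`.
 * Other relocation types are skipped. Returns how many were applied. */
size_t apply_relative_relocs(unsigned char *base, const rela64 *rela,
                             size_t count) {
  size_t applied = 0;
  for (size_t i = 0; i < count; ++i) {
    if (RELOC_TYPE(rela[i].r_info) != RELOC_X86_64_RELATIVE)
      continue;
    /* A relative relocation needs no symbol lookup: value = base + addend. */
    uint64_t value = (uint64_t)(uintptr_t)base + (uint64_t)rela[i].r_addend;
    memcpy(base + rela[i].r_offset, &value, sizeof value);
    ++applied;
  }
  return applied;
}
```

Because nothing here needs a symbol table, a loop like this can in principle run very early in startup, which is what makes the self-relocating approach workable for fully static images.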
> If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.
>
I'm interested in what sort of platform the libc could run on and
what would need to be provided externally. In particular, I'm
interested in whether a platform OS is required. I'm also interested
in where the boundaries of the libc lie; for example, I'm thinking of
something like the separation of newlib and libgloss here.
> Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.
>
> The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.
>
>
> There are also a few areas which we do not intend to invest in at this point:
>
>
> Implement dynamic loading and linking support.
>
> Support for more architectures (we'll start with just x86-64 for simplicity).
>
I strongly recommend you choose at least one other architecture and
build cross-platform support in from the beginning. I suspect that
trying to put this in retroactively will put huge stress on the design
and the supporting infrastructure such as the build system. There is
also a danger of baking design decisions favouring one architecture
into the system; 32-bit vs 64-bit support is one obvious case. I'm
thinking that this is one area where the community could contribute.
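As a small illustration of the 32-bit vs 64-bit point, here is a sketch in C of helpers that avoid assuming `long` is as wide as a pointer. That assumption holds on 64-bit Linux but not on 64-bit Windows, where `long` is 32 bits. The function names are hypothetical, chosen only for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Round `addr` up to the next multiple of `align`, which must be a power
 * of two. uintptr_t is wide enough for an address on every target that
 * provides it, whereas unsigned long is not guaranteed to be. */
uintptr_t align_up(uintptr_t addr, uintptr_t align) {
  return (addr + (align - 1)) & ~(align - 1);
}

/* Pointer width in bits, derived from the target instead of hard-coded. */
unsigned pointer_bits(void) {
  return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

Code written against `uintptr_t` and `size_t` compiles unchanged for 32-bit and 64-bit targets, so at least this class of assumption does not need to be revisited when a second architecture is added.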
>
> For these areas, the community is of course free to contribute. Our hope is that preserving the "as a library" design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren't needed.
>
>
> We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.
>
I'm interested to see which system libc and existing platforms you
intend to support. Does this go as low as embedded systems, where the
platform is more like a board support package, or is this purely a
libc for platforms?
>
> So, what do you think about incorporating this new libc under the LLVM project?
>
Personally I think that if it can satisfy the needs of a sufficiently
broad segment of the community then I'm in favour. I'm looking forward
to seeing more.
Peter
>
> Thank you,
>
> Siva Chandra and the rest of the Google LLVM contributors
>
>
Many of the headers prescribed by the C specification itself have some degree of architecture-dependence. I count 11 headers that rely on some architecture details, or minimally need to have some knowledge of ABI details. These are complex, fenv, float, inttypes, limits, math, stdarg, stdatomic, stddef, stdint, and setjmp. About 10 headers have some degree of operating-system specific details (ctype, errno, signal, stdio, stdlib, wchar, wctype, threads, time, locale), although most of them have a fairly minimal abstraction base. The remaining 8 headers are truly agnostic to any implementation details and have their contents more or less mandated by the specification (assert, iso646, stdalign, stdbool, stdnoreturn, string, tgmath, and uchar), although there may be room for libraries to have accelerated implementations on particular architectures.
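To show what that architecture dependence looks like in practice, here is a short sketch in C that collects a few values from the headers named above (limits, float, stddef, setjmp). The `abi_facts` struct and `query_abi_facts` function are hypothetical names used only for this illustration. The values differ between targets, which is why these headers cannot be written once for all architectures.

```c
#include <float.h>
#include <limits.h>
#include <setjmp.h>
#include <stddef.h>

/* A handful of target-dependent facts exposed by standard headers. */
typedef struct {
  int char_bits;                 /* <limits.h>: CHAR_BIT */
  size_t long_size;              /* 4 on some ABIs, 8 on others */
  size_t pointer_size;           /* data pointer width in bytes */
  size_t max_alignment;          /* <stddef.h>: alignment of max_align_t */
  size_t jmp_buf_size;           /* <setjmp.h>: saved register state */
  int long_double_mantissa_bits; /* <float.h>: LDBL_MANT_DIG */
} abi_facts;

abi_facts query_abi_facts(void) {
  abi_facts facts;
  facts.char_bits = CHAR_BIT;
  facts.long_size = sizeof(long);
  facts.pointer_size = sizeof(void *);
  facts.max_alignment = _Alignof(max_align_t);
  facts.jmp_buf_size = sizeof(jmp_buf);
  facts.long_double_mantissa_bits = LDBL_MANT_DIG;
  return facts;
}
```

Printing the fields of the returned struct on two different targets is a quick way to see which parts of a header set would need per-architecture definitions.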
Since I have a little experience in this area, I'd like to chime in on
it. :-) TL;DR I think it's a really, REALLY bad idea.
First, writing and maintaining a correct, compatible, high-quality
libc is a monumental task. The amount of code needed is not all that
large, but the subtleties of how it behaves and the difficulties of
implementing various interfaces that have no capacity to fail or
report failure, and the astronomical "compatibility surface" of
interfacing with all C and C++ software ever written as well as a
large amount of software written in other languages whose runtimes
"pass through" the behavior of libc to the applications they host, all
contribute to the scale of work, and of knowledge/expertise, involved
in making something of even decent quality. (As an aside, note that I
love to see hobby libc projects even if they have major problems, but
that's totally different from proposing something that lots of people
will end up stuck using.)
Second, corporate development teams are uniquely qualified to utterly
botch a libc, yet still push it into widespread use, and the cost is
painful compatibility hacks in all applications. Apple did this with
their fork of BSD libc code. Google has done it once already with
their fork of musl in Fuchsia -- a project which I contributed
significant amounts of free labor to in terms of tracking down folks
for license clarification their lawyers wanted, only to have them
never bother to ask me why technical things were done the way they
were before making random useless and broken changes in their fork. A
corporate-led project does not have to answer to the community, and
will leave whatever bugs they introduce in place for the sake of
bug-compatibility with their own software rather than fixing them.
Third, there is tremendous value in non-monoculture of libc
implementations, or implementations of any important library
interfaces or language runtimes. Likewise there's tremendous value in
non-monoculture of tooling (compilers, linkers, etc.). Avoiding
monoculture preserves the motivation for consensus-based standards
processes rather than single-party control (see also: Chrome and what
it's done to the web) and the motivation for people writing software
to write to the standards rather than to a particular implementation.
A big part of making that possible is clear delineation of roles
between parts of the toolchain and runtime, with well-defined
interface boundaries. Some folks have told me that I should press LLVM
to make musl the "LLVM libc" instead of whatever Google wants to do,
but that misses the point: there *shouldn't be* an "LLVM libc", or any
one library implementation that's "first class" for use with LLVM
while others are only "second class".
So, in summary:
Point 1 is why making a libc for real-world use is not to be taken
lightly.
Point 2 is why, if it is done, it shouldn't be a Google project.
Point 3 is why there should not be an "LLVM libc".
Hope this is all helpful.
Regards,
Rich
Also, I didn't read the proposal as segregating the world into first
class and second class libc implementations. For example, libc++
currently works fine with non-LLVM-based toolchains, and libstdc++
currently works fine with LLVM-based toolchains. Do you see libc as
fundamentally different in this regard?
Regarding your second point, if Google were to write a libc
implementation and then upstream it in bulk, I would agree with you.
But being done in the open appears to solve the exact problem you are
concerned about, which is that corporate interests will lead to
lasting design decisions that aren't in the best interest of the
general public. By doing it in the open, such problems can be
addressed before the code is ever committed.
Disclaimer: I work at Google, so don't take my +1 as an independent vote.
We would like to use this on Fuchsia and I am particularly interested in creating a dynamic linking library for ELF with Roland McGrath's guidance. We spoke about creating a library for writing dynamic linkers internally and I don't see why this can't be upstreamed.
On Fuchsia we critically need support for AArch64. What do you expect to be architecture dependent? I struggled to think of where the architecture and not the operating system was the issue.
why is answering these questions at a general level important?
What do you expect the support for Windows to be? Certainly, I don't
expect you to provide Windows support personally if you don't need it,
but given that LLVM supports Windows, it should at least be done in
such a way that the design lends itself to interested parties
contributing Windows support.
The most immediate thing I think we will run into is that you
mentioned wanting this to take shape as something that sits in between
system libc and application. Given that Windows' libc and other
versions of libc are so different, I expect this to lead to some
interesting problems.
Can you elaborate more on how you envision this working with llvm libc
in between application and system libc?
On Tue, Jun 25, 2019 at 4:20 PM Siva Chandra <sivac...@google.com> wrote:
>
Syscalls are operating system specific and architecture dependent so I think we'll want an abstraction layer around the fundamental operations the syscalls support anyway. Some things like open aren't even syscalls on all operating systems. There might be a generic syscall layer added that would be architecture and not operating system specific but even on x86_64 there are two different ways to do syscalls I think. Loading, startup, and linking are all both format and operating system specific and a few of these details involved are determined by the architecture but they're trivially abstracted away.
> why is answering these questions at a general level important?
Because I wanted to make sure I understood the direction and the restriction stated. The restriction on what architecture will be used without stating a restriction on the operating system seemed like an odd statement. I'd very much like operating system abstractions to be considered right out of the gate and this seems like a bigger issue than the architecture to me.
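A rough sketch in C of what such an abstraction layer could look like is below. The names `os_write` and `raw_syscall3` are hypothetical. The first branch applies only to Linux on x86-64 and uses the `syscall` instruction with that platform's register convention; every other target falls back to the host's `write`. Callers see one interface: a byte count on success or a negated errno value on failure.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__linux__) && defined(__x86_64__) && !defined(__ILP32__)
/* Architecture- and OS-specific part: the Linux x86-64 syscall convention
 * passes the number in rax and arguments in rdi, rsi, rdx. The kernel
 * clobbers rcx and r11 and returns a result or -errno in rax. */
static long raw_syscall3(long number, long a1, long a2, long a3) {
  long ret;
  __asm__ volatile("syscall"
                   : "=a"(ret)
                   : "a"(number), "D"(a1), "S"(a2), "d"(a3)
                   : "rcx", "r11", "memory");
  return ret;
}

long os_write(int fd, const void *buf, size_t len) {
  /* 1 is the write syscall number on Linux x86-64 only. */
  return raw_syscall3(1, (long)fd, (long)buf, (long)len);
}
#else
/* Fallback for other targets: defer to the host's write(). */
long os_write(int fd, const void *buf, size_t len) {
  ssize_t written = write(fd, buf, len);
  return written < 0 ? -(long)errno : (long)written;
}
#endif
```

The design point is that the syscall number and the register convention both stay behind `os_write`, so code above this layer would not change when an operating system or architecture is added.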
The main concern I have is that Windows is so different from
everything else that there is a high likelihood of decisions being
baked in early on that make things very difficult for people to come
along later and contribute a Windows implementation. This happened
with sanitizers for example (lack of support for weak functions on
Windows), LLDB (POSIX API calls scattered throughout the codebase),
and I worry with libc it will be even more difficult to correctly
design the abstraction because we have to deal with executable file
format, syscalls, operating system loaders, and various linkage
models.
On Mon, Jun 24, 2019 at 3:45 PM Finkel, Hal J. <hfi...@anl.gov> wrote:
It will be useful to have a design document that describes the kind of system and capabilities that you're targeting, and then we can discuss how the libc might have a modular design that can be adapted for other use cases. I mention modularity because, for example, we have accelerator hardware and various kinds of low-variability/embedded environments where many, but not all, POSIX/libc capabilities make sense.
I am of the opinion that modularity should be as fine-grained as possible. For example, one should be able to pick and package individual functions into a libc as suitable for their platform. That said, I am open to other ideas you might have about modularity. I am also open to being convinced that function-level granularity is overkill.
This sounds like a good starting position to me. We can adjust over time.
Thanks again,
Hal
How do you guarantee that if you implement method A and forward method
B, that B will behave the same as it would have if you had forwarded A
also? It might not even work at all. Where can you safely draw this
boundary?
Users can set errno for example, and in many cases they must set errno
to 0 before invoking a call if they want to reliably detect an error.
So let's say they set errno to 0, then call a method which our libc
implementation decides to forward. What do we do? We could propagate
errno on every single call, but my point is that there are going to be
a ton of subtle issues that arise from this approach that are hard to
foresee, precisely because the implementation details of a libc
implementation are supposed to be just that - implementation details.
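To make the errno concern concrete, here is a sketch in C. `parse_long` shows the clear-then-check protocol that callers of `strtol` rely on. `forwarded_strtol` and `layer_errno` are hypothetical and show one possible shape of "propagate errno on every single call", assuming the layered libc keeps its own errno storage separate from the system libc's.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Caller side: strtol reports overflow only through errno, and leaves
 * errno untouched on success, so the caller must clear it first.
 * Returns 0 on success and -1 on any parse error. */
int parse_long(const char *text, long *out) {
  char *end;
  errno = 0;
  long value = strtol(text, &end, 10);
  if (errno == ERANGE)
    return -1; /* out of range: value was clamped to LONG_MAX/LONG_MIN */
  if (end == text || *end != '\0')
    return -1; /* no digits at all, or trailing junk */
  *out = value;
  return 0;
}

/* Hypothetical: errno storage owned by the layered libc. */
int layer_errno;

/* Hypothetical forwarding wrapper: copy the caller-visible errno down
 * before the call and copy the result back up afterwards. */
long forwarded_strtol(const char *text, char **end, int base) {
  errno = layer_errno;
  long value = strtol(text, end, base);
  layer_errno = errno;
  return value;
}
```

Even this small wrapper has to know that `strtol` communicates through errno. Every forwarded function would need the same treatment, and errno is only one piece of hidden shared state among several.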
On Tue, Jun 25, 2019 at 4:05 PM Jake Ehrlich <jakehe...@google.com> wrote:
> Syscalls are operating system specific and architecture dependent so I think we'll want an abstraction layer around the fundamental operations the syscalls support anyway. Some things like open aren't even syscalls on all operating systems.
Right, syscalls are OS _and_ architecture dependent. So yes, one will have to build abstraction layers over fundamental operations in general.
> There might be a generic syscall layer added that would be architecture and not operating system specific but even on x86_64 there are two different ways to do syscalls I think. Loading, startup, and linking are all both format and operating system specific and a few of these details involved are determined by the architecture but they're trivially abstracted away.
>> why is answering these questions at a general level important?
> Because I wanted to make sure I understood the direction and the restriction stated. The restriction on what architecture will be used without stating a restriction on the operating system seemed like an odd statement. I'd very much like operating system abstractions to be considered right out of the gate and this seems like a bigger issue than the architecture to me.
Ah, I see what happened. So, we are definitely not restricting anything by design here. All we are saying is that we do not intend to contribute beyond x86_64 and Linux to begin with. The community is free to contribute and widen the scope as suitable.
With respect to how exactly we want to build the abstractions, I am of the opinion that we have to go on a case by case basis. The scope of the project is so large that I think it is more meaningful to discuss designs at a more narrow level based on the area that is being worked on. Sure, we might end up discovering patterns down the road and choose to unify certain things eventually.
On Tue, Jun 25, 2019 at 4:49 PM Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:
> So, we are definitely not restricting anything by design here. All we are saying is that we do not intend to contribute beyond x86_64 and Linux to begin with. The community is free to contribute and widen the scope as suitable.
IMO it is perfectly fine to have a favorite target in mind that you want to put your effort into supporting.
However, if the project is not started from the ground up by involving people who care about other platforms (and you have enough variety of these), then it is likely that assumptions about your favorite platform will be baked into the foundations of the project, and it will be technically hard for the community to re-use these pieces in the future or contribute support for their platform (I'm making a similar point to Zach here).
If we must have a libc in LLVM, I hope it will be designed and implemented from the beginning with multiple OSes and at least two architectures in mind. Even if you only really care about X86/Linux, you may have to put in some minimal amount of effort to support Windows just to prove your design (ideally there would be enough support in the community so that the effort to support Windows isn't only on you).
First, writing and maintaining a correct, compatible, high-quality
libc is a monumental task.
Point 1 is why making a libc for real-world use is not to be taken
lightly.
Point 2 is why, if it is done, it shouldn't be a Google project.
Point 3 is why there should not be an "LLVM libc".
I find it interesting that a reimplementation of libc is being
discussed without clearly stating the differences and benefits of the
new implementation.
Or did I miss the discussion about the differences and benefits?
Sincerely
Jan
Hi
> I find it interesting that a reimplementation of libc is being
> discussed without clearly stating the differences and benefits of the
> new implementation.
> Or did I miss the discussion about the differences?
Siva touched on them at the top of his message, but it maybe got lost
a bit in the ensuing discussion. Plus the differences and benefits
will to some extent depend on where people want to go with it.
As someone who has spent the last several years hammering the round
peg of glibc into the square hole that is Google production
infrastructure (it does work, breadcrumbs to the glibc branches here -
https://sourceware.org/glibc/wiki/GlibcGit/google_namespace ), I find
a couple opportunities especially appealing:
1) Static linking, which opens up opportunities for whole-program
analysis. In theory, one could do this with glibc, but even aside
from the LGPL issue, the code is built to present a versioned-symbol
ABI. Imagine being able to trace all the way to the bottom of a
printf call, and only incorporate the bits that you actually use, or
being able to elide locks in a known safe zone.
2) Updated build machinery and coding style. I'm sure it's urban
legend that glibc was written as a test case for every GNU Make
feature :-) but its makefiles are pretty intricate, there are a bunch
of cases where chunks of code are synthesized by make rules, and a
bunch more where important code is in the bodies of multi-page
multi-level C macros, in the best programming style of the 1980s.
I really have nothing to do with this project, and no insight on the thoughts behind it, but I think you and several other people on this thread have missed a significant issue: the thread is conflating whether it is a good idea to "create yet another libc" with whether it is a good idea to "contribute that code to LLVM". I don’t think arguing whether or not someone should build a project is on-topic for this list. Given that they appear motivated to build it, the question is whether this fits into the LLVM umbrella.
With my LLVM hat on (I also work for Google, but am unaffiliated and uninvolved with this proposal), it appears clearly beneficial for LLVM to have a libc if it were done well. That said, clang shouldn’t/couldn't *require* one specific libc, just like we don’t require libc++ as the standard library. We want LLVM components to be mixable and matchable.
I appreciate the comments on this thread that are throwing in ideas for how to make the project better, how to ensure it grows to being a successful and widely useful component of LLVM, etc. I for one think that this could be very useful for people building custom micro targets, and being able to build custom configs of a libc without (e.g.) stdio or libm would be a nice way to shed weight.
-Chris
On Jun 24, 2019, at 3:23 PM, Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:
> So, what do you think about incorporating this new libc under the LLVM project?
There have been a lot of questions about our reasons for opting to
build a new libc and why an existing libc implementation does not meet
our needs. I will try to address these questions in a general fashion
in this email. I will answer individual concerns separately. Before I
start, I also want to apologize if I am late in answering, or
appear to be ignoring some of the emails. I am not trying to ignore
or avoid anyone or any question - it is just that I need time to
process your questions and compose meaningful answers.
So, we have a bunch of reasons for a new libc and why we prefer it to
be a part of the LLVM project:
1. Static linking without the complexity of dynamic linking - Most
libc implementations end up being complicated because they support
dynamic loading/linking. This is not bad by itself, but we want to be
able to take out dynamic linking capability where possible and get the
benefits of the much simpler system. We believe that building
everything in an "as a library" fashion would facilitate this.
2. As somebody else has pointed out in the list, we want to have a
libc with as much fine grained modularity as possible. This not only
helps one to pick and choose what they want, but also makes it easy to
adapt to different build systems. Moreover, such a modular system will
also facilitate deploying chunks of functionality during the
transition from another libc to this new libc.
3. Sanitizer supported testing and fuzz testing from the start - Doing
this from the start will impact a few design choices non-trivially. For
example, sanitizers require that a target be rebuilt with
sanitizer-specific options. We want to develop the new libc in such
a fashion that it will work with these specialized options as well.
4. ABI independent implementation as far as possible - There will be
places where it would not be possible to implement in an ABI
independent fashion. However, wherever possible, we want to use normal
source code so that compiler-based changes to the ABI are easy. Our
reasons for ABI independent implementations fall into two categories:
a) Long term changes to the ABI for security like SCADS, and for
performance tuning like caller/callee register ratios to better match
software and hardware.
b) Rapid deployment of specific ABI changes as part of security
mitigation strategies such as those for Spectre. For example,
speculative load hardening would have vastly benefitted from being
able to change the calling convention.
5. Avoid assembly language as far as possible - Again, there will be
places where one cannot avoid assembly level implementations. But,
wherever possible, we want to avoid assembly level implementations.
There are a few reasons here as well:
a) We want to leverage the compiler for performance wherever possible,
and as part of the LLVM project, fix compiler bugs rather than use
assembly.
b) Enable sanitizers and coverage-based fuzzing to work well across
the implementation of libc.
c) Allow deploying compiler-based security mitigations such as those
we needed for Spectre.
6. Having the support of the LLVM community, project, and
infrastructure - From access to the broad platform expertise in the
community to the strong license and project structure, we think the
project will be significantly more successful as part of LLVM than
elsewhere.
All this does not mean we want to implement everything from scratch.
If someone has implementations for parts of the libc ready, and would
like to contribute to this project under the LLVM license, we will
certainly welcome it.
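As a rough illustration of how points 3 and 5 fit together, consider the sketch below. A routine written in ordinary C can be instrumented by the sanitizers and driven by a fuzzer, which is much harder for a hand-written assembly version. The name sketch_memchr is hypothetical; LLVMFuzzerTestOneInput is the standard libFuzzer entry point. This is only a sketch under those assumptions, not code from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plain-C implementation with no inline assembly, so the
 * compiler can instrument it for sanitizers and coverage. */
void *sketch_memchr(const void *s, int c, size_t n) {
  const unsigned char *p = s;
  for (size_t i = 0; i < n; ++i)
    if (p[i] == (unsigned char)c)
      return (void *)(p + i);
  return NULL;
}

/* libFuzzer entry point: the first input byte is the needle, the rest is
 * the haystack. The system memchr serves as the reference result. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 2)
    return 0;
  int needle = data[0];
  const uint8_t *hay = data + 1;
  size_t n = size - 1;
  assert(sketch_memchr(hay, needle, n) == memchr(hay, needle, n));
  return 0;
}
```

Built with something like clang -fsanitize=fuzzer,address, the fuzzer would call the entry point repeatedly with generated inputs and report any mismatch or memory error.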
> I really have nothing to do with this project, and no insight on the thoughts behind it, but I think you and several other people on this thread have missed a significant issue: the thread is conflating whether it is a good idea to "create yet another libc" with whether it is a good idea to "contribute that code to LLVM". I don’t think arguing whether or not someone should build a project is on-topic for this list. Given that they appear motivated to build it, the question is whether this fits into the LLVM umbrella.
I don't understand your reasoning here. If there's reason to believe it
should not be built at all, wouldn't that also imply that it shouldn't
be taken under LLVM's umbrella? The LLVM community (including myself)
will be responsible for maintaining this software and to do that we must
figure out the specifications, trade-offs, and use cases. How should we
determine the requirements of something that has no reason to exist?
+1 for what Hal Finkel has said below about switching from redirectors
to implementations: There will be certain groups of functions which
will have to be switched all together. We will not be able to do it
one function at a time for such groups.
> > How do you guarantee that if you implement method A and forward method
> > B, that B will behave the same as it would have if you had forwarded A
> > also? It might not even work at all. Where can you safely draw this
> > boundary?
Are you talking about a scenario wherein implementation of B in the
system libc calls its A? If yes, most libc implementations do a good
job of using internal names in such scenarios. That is, B would call A
with an internal name. This ensures that B from the system libc calls
A also from the system libc and not the redirector/forwarder.
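A small sketch of the internal-name pattern, with every name hypothetical (real implementations typically use reserved double-underscore names and symbol aliases rather than a static helper):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Internal name for "A": only the library itself refers to it. */
static size_t internal_strlen(const char *s) {
  size_t n = 0;
  while (s[n] != '\0')
    ++n;
  return n;
}

/* Public "A": the symbol that a redirector in another libc could shadow. */
size_t syslibc_strlen(const char *s) { return internal_strlen(s); }

/* Public "B": reaches A through the internal name, so shadowing the
 * public symbol elsewhere cannot change what B does. */
char *syslibc_strdup(const char *s) {
  assert(s != NULL);
  size_t n = internal_strlen(s) + 1;
  char *copy = malloc(n);
  if (copy != NULL)
    memcpy(copy, s, n);
  return copy;
}
```

Here syslibc_strdup keeps working the same way whether or not some other library provides its own strlen.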
> > Users can set errno for example, and in many cases they must set errno
> > to 0 before invoking a call if they want to reliably detect an error.
> > So let's say they set errno to 0, then call a method which our libc
> > implementation decides to forward. What do we do? We could propagate
> > errno on every single call, but my point is that there are going to be
> > a ton of subtle issues that arise from this approach that are hard to
> > foresee, precisely because the implementation details of a libc
> > implementation are supposed to be just that - implementation details.
Dealing with errno in particular is probably not as nasty as it seems.
The standard allows errno to be a macro. Hence, for the transitory
phase, implementations and redirectors in our libc can make use of the
errno from the system libc. Something like this:
$> cat llvm-errno.cpp
#include <errno.h> // This is the system-libc header file
int *__llvm_errno() {
  return &errno;
}
$> cat errno.h # This is the llvm libc's errno.h
int *__llvm_errno();
#define errno (*__llvm_errno())
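The same idea as a single self-contained file, for anyone who wants to try it. This is only a sketch: the accessor and macro names are hypothetical, and the macro is deliberately not called errno so that it can coexist with the system header in one translation unit.

```c
#include <assert.h>
#include <errno.h> /* the system libc's errno */
#include <stdlib.h>

/* Hypothetical accessor: hands out the address of the system libc's errno. */
int *sketch_errno_location(void) { return &errno; }

/* In the new libc's own header this macro would be spelled errno. */
#define SKETCH_ERRNO (*sketch_errno_location())

/* Redirector-style wrapper: it forwards to the system strtol, and any
 * error the system libc records is visible through SKETCH_ERRNO. */
long sketch_strtol(const char *text) {
  assert(text != NULL);
  SKETCH_ERRNO = 0;
  return strtol(text, NULL, 10);
}
```

After sketch_strtol is called on an out-of-range value, SKETCH_ERRNO reads ERANGE, because both names refer to the same object.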
On Tue, Jun 25, 2019 at 6:20 PM Finkel, Hal J. <hfi...@anl.gov> wrote:
> You certainly can't mix-and-match on a per-function level, in general. I
> suspect that there are some subsystems that can be substituted. Using
> open from one libc and close from another seems problematic. Using open
> and close from one libc and qsort from another is probably fine. And, as
> you point out, the library might need to be configurable to use an
> externally-provided errno.
> a) We want to leverage the compiler for performance wherever possible,
> and as part of the LLVM project, fix compiler bugs rather than use
> assembly.
Some natural questions:
1) Will libm be included?
2) How will llvm libc be different from musl from a design perspective?
Then another natural question is how the kernel differences will be effectively isolated. The platform specific macros in compiler-rt may be a bit messy now. I hope we can prevent that situation.
On 26/06/2019 17:02, Andrew Kelley via llvm-dev wrote:
> Finally, I'm only aware of 2 operating systems where the libc is not an
> integral part of the system, which is Linux and Windows. For example on
> macOS, FreeBSD, OpenBSD, and DragonFlyBSD, the libc is guaranteed to be
> available, and must be dynamically linked, because this is the stable
> syscall ABI.
Solaris and macOS (kind-of) belong on this list, but FreeBSD does not
and I don't believe other BSDs do, though the situation is somewhat more
complex. On FreeBSD, the system call ABI is stable and there are compat
layers that allow foreign or legacy system call interfaces to be exposed
to userspace processes (e.g. a FreeBSD 7 system call table on FreeBSD
12, or a Linux system call table on any FreeBSD. The Capsicum sandbox
mode is also implemented in part by pivoting the system call layer: once
you call cap_enter, some system calls are simply not exposed to you at
all).
There is even CloudABI, which uses a mostly musl-derived libc and a
Capsicum-derived system call table. This is used for statically linked
applications with a custom launcher that gives strong security guarantees.
That said, the relationship between FreeBSD's libc, libthr (pthreads)
and rtld are quite complex, as are their interactions with the kernel.
Supporting dlopen of libthr turned out to be incredibly hard
in practice, but even without that, there is some complexity from the
fact that libc must allow libthr to preempt a number of its symbols (and
must provide implementations of things like pthread_mutex for programs
that do not start threads). In the 5.x time frame, we did support two
different pthreads implementations. This was, in hindsight, an
absolutely terrible idea and not something that I'd ever recommend
anyone do ever again.
On macOS, libSystem is actually the public interface to the kernel, so
you can bring along your own libc if you want to, you just have to
dynamically link to libSystem to get access to system calls (or you do
what Go did, try to make them without going via libSystem, and watch
every single program written in your language die when the kernel's
gettimeofday interface changes...). This, however, makes it somewhere
between difficult and effectively impossible to bring your own dyld
replacement to macOS, because it must be able to load libSystem without
making any system calls...
> So it would only make sense for an LLVM libc to be for
> Linux and Windows. It seems reasonable to assume that Google is only
> interested in Linux. In this case I have to re-iterate my original
> question, what are the needs that are not being met by existing Linux
> libcs, such as musl?
I am also unconvinced that it is possible to design a clean platform
abstraction layer for libc that would work over even Linux and FreeBSD
without imposing significant penalties for one or the other. If you add
Windows into the mix, then it gets a lot harder. POSIX's decision to
use int, rather than a pointer type, for file descriptors and to make
specific guarantees about reuse order (rather than just providing dup2
as a moderately sane interface) means that userspace code will need to
implement the file descriptor table. Do we build higher-level layering
on top of file descriptors or do we support Windows HANDLEs natively for
internal usage and use fds only for public APIs?
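To show what a userspace file descriptor table involves, here is a minimal sketch. All names are hypothetical, the table is fixed-size, and it is not thread-safe. The handle type is only a stand-in for something like a Windows HANDLE.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FD_MAX 16

/* Stand-in for a native handle such as a Windows HANDLE. */
typedef void *sketch_handle;

/* Slot i holds the handle behind descriptor i; NULL marks a free slot. */
static sketch_handle sketch_fd_table[SKETCH_FD_MAX];

/* Bind a handle to the lowest-numbered free descriptor, which is the
 * reuse order POSIX guarantees. Returns -1 when the table is full. */
int sketch_fd_alloc(sketch_handle h) {
  assert(h != NULL);
  for (int fd = 0; fd < SKETCH_FD_MAX; ++fd) {
    if (sketch_fd_table[fd] == NULL) {
      sketch_fd_table[fd] = h;
      return fd;
    }
  }
  return -1;
}

/* Translate a descriptor back to its handle; NULL if it is not open. */
sketch_handle sketch_fd_lookup(int fd) {
  if (fd < 0 || fd >= SKETCH_FD_MAX)
    return NULL;
  return sketch_fd_table[fd];
}

/* Free a descriptor for reuse. Returns -1 if it was not open. */
int sketch_fd_release(int fd) {
  if (sketch_fd_lookup(fd) == NULL)
    return -1;
  sketch_fd_table[fd] = NULL;
  return 0;
}
```

Releasing descriptor 1 and then allocating again hands descriptor 1 back out. Supporting that reuse order is exactly what forces the library to keep the table itself.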
The idea of an LLVM libc has been proposed a few times and generally the
pushback has been that it doesn't make sense because libc is so
intimately tied to the host kernel that it's very hard to consider it as
a portable component.
David
I think it's becoming uncommon to find cases like that today; the
person who thinks they have a magic assembly hack finds that it works
well for one microbenchmark on one architecture variant, but
disappoints when used in real code. In fact, glibc has been throwing
out a bunch of assembly code in recent years, as testing shows much of
it not to have any noticeable advantage.
If the customized calling convention scheme works out, it's going to
be a huge incentive to fix the compiler in case of performance
lossage; it will be quite difficult to write assembly that is equally
performant for all possible calling conventions, and if you try to
assume a convention, then the assumption propagates up through the
program, possibly defeating more important optimizations.
On Jun 27, 2019, at 8:16 AM, Stan Shebs <stan...@google.com> wrote:
> > For example, if someone says “I can shave 1 cycle out of this important thing if I write it in asm” and you know that a suitably capable compiler engineer can achieve the same thing given enough time, how do you plan to push back?
> I think it's becoming uncommon to find cases like that today; the
> person who thinks they have a magic assembly hack finds that it works
> well for one microbenchmark on one architecture variant, but
> disappoints when used in real code.
Zhang
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
On Jun 27, 2019, at 2:53 PM, Saleem Abdulrasool via llvm-dev <llvm...@lists.llvm.org> wrote:
> > So, what do you think about incorporating this new libc under the LLVM project?
> As stated, I really feel that this is far too specialised to certain use cases that are pertinent to Google. I think that this needs to be broadened to allow a general purpose libc much as libc++ is a general C++ implementation. I think that the project has a different set of requirements and seems like it would be extremely interesting to see how it would develop over time. This could really be an interesting choice for a certain type of project but as described feels like it is best explored outside of the umbrella of LLVM.
Let me change the direction here a little bit. Let's say, for Windows,
you can develop the new libc starting from a clean slate without
having to worry about the redirectors/forwarders. Is that a good
enough place for you to start?
What I am getting to is this: redirectors are probably an
implementation detail at this point. We think they will allow us to
develop and phase-in this libc in a gradual manner. But, if they end
up being a problem on other platforms, we will build them in such a
way that they only stay as Linux specific implementation details. If
other platforms can benefit from them, they are of course free to
adopt them.
> Then there are more immediate issues. On Windows specifically, I’m not even sure it’s going to be physically possible to link in two copies of the CRT and have one forward to the other. If it is possible, it’s very non obvious how to make it work and will likely require a ton of additional machinery.
No, I do not think we want to mix up CRTs on any platform. At the
least, it will be disruptive to the compiler drivers. Our goal is to
build a CRT which supports statically linked executables on Linux. We
do not intend to mix this new CRT with the CRT from the system libc.
The new CRT might only be useful after a non-trivial part of the libc
has been built. Until then, we have to use the CRT from the system
libc.
Maybe my email listing our goals is being misinterpreted as being the
bounding set of goals for the project. So, let me make it clear again:
The goals I have listed are just our initial set of goals for the
project. Members of the community are of course free to add their own
goals to this set, implement them, and make it a "full solution." I
have also mentioned in some of my earlier emails that we do not intend
to design out any particular feature or platform. For example, I have
said that we do not intend to work on dynamic linking/loading at least
to begin with. This does not mean that the scope of the project is
curtailed to static linking. The members of the community are free to
add support for dynamic linking/loading. In fact, if dynamic
linking/loading support is added in a modular/"as a library" fashion,
it makes it a win-win situation as we will be able to take it out if
we do not require it.
On Thu, Jun 27, 2019 at 9:06 AM Zachary Turner <ztu...@roblox.com> wrote:
> I guess let me make this concrete: can you propose a specific separation that you have in mind?
>
> Keep in mind that even if A doesn’t depend on B, that doesn’t mean that A and B can be separated. You mentioned that open() and close() would obviously have to be done at the same time, but it’s much worse than this: The *entire transitive closure* of open() and close() must be done at the same time, and my hypothesis is that this is going to a) be much larger than you expect, and b) be different with different underlying libc implementations.
> No, I do not think we want to mix up CRTs on any platform. At the
> least, it will be disruptive to the compiler drivers. Our goal is to
> build a CRT which supports statically linked executables on Linux. We
> do not intend to mix this new CRT with the CRT from the system libc.
> The new CRT might only be useful after a non-trivial part of the libc
> has been built. Until then, we have to use the CRT from the system
> libc.
How would you perform redirection if both copies are not linked in? Some sort of out-of-process mechanism? Or maybe I'm misunderstanding the nature of the redirection you're referring to.
Saleem, Owen, others on the thread who are concerned about this: it seems that some of the concern is that the project goals are too narrow, and thus the eventual result may not serve the full community well over time.
Would any of you be interested in what we should consider as the list of requirements for such a full solution? It would make it much easier to evaluate initial steps if we were to have a big picture of the problem to solve over time.
-Chris
On Jun 27, 2019, at 2:53 PM, Saleem Abdulrasool via llvm-dev <llvm...@lists.llvm.org> wrote:
> > So, what do you think about incorporating this new libc under the LLVM project?
> As stated, I really feel that this is far too specialised to certain use cases that are pertinent to Google. I think that this needs to be broadened to allow a general purpose libc much as libc++ is a general C++ implementation. I think that the project has a different set of requirements and seems like it would be extremely interesting to see how it would develop over time. This could really be an interesting choice for a certain type of project but as described feels like it is best explored outside of the umbrella of LLVM.
I don't have a strong stake in this decision, but Saleem's commentary matches my thoughts on the topic. Maybe some of this is related to messaging - would the proposed project be *an* LLVM libc or *the* LLVM libc? There is already at least one instance within the LLVM umbrella where a subproject designed and built to a particular set of constraints became *the* LLVM solution, and ended up disincentivizing investment from contributors whose priorities didn't match those constraints. Staking the blessed-by-LLVM slot for a piece of the toolchain is not free.
To turn the question around, why should *this* libc (assuming it will be built whether or not LLVM accepts it) be *the* LLVM libc?
--Owen
On Thu, Jun 27, 2019 at 3:56 PM Zachary Turner <ztu...@roblox.com> wrote:
>> No, I do not think we want to mix up CRTs on any platform. At the
>> least, it will be disruptive to the compiler drivers. Our goal is to
>> build a CRT that supports statically linked executables on Linux. We
>> do not intend to mix this new CRT with the CRT from the system libc.
>> The new CRT might only be useful after a non-trivial part of the libc
>> has been built. Until then, we have to use the CRT from the system
>> libc.
> How would you perform redirection if both copies are not linked in? Some sort of out-of-process mechanism? Or maybe I'm misunderstanding the nature of the redirection you're referring to.
There is probably a difference in what we mean by CRT _and_ redirectors. Let me try to make my meaning clear.
By CRT, I am referring to the [r]crt*.o files on Linux which handle program startup and termination logic. I do not know if CRT means something else on Windows.
With respect to "redirectors", I do not want to get locked into an implementation discussion here, so let me just say that they are simply functions in the new libc which merely call into the system libc.
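To make the redirector idea above a little more concrete, here is a minimal C sketch of a function that has no logic of its own and simply forwards to the system libc. The name newlibc_strlen is hypothetical; the thread does not specify any naming or symbol scheme, and how such a function would be exposed under the standard name is exactly the implementation discussion being deferred here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical redirector: the name and the choice of strlen are
 * illustrative assumptions only. The function contains no logic of
 * its own; it merely calls into the system libc's strlen.
 */
size_t newlibc_strlen(const char *s) {
    return strlen(s);
}
```

A caller would use newlibc_strlen("hello") exactly as it would use strlen, and the system libc still does the actual work until a native implementation replaces the body.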
> I’d love to hear what you have in mind with point 3 above, and see it expanded. libc++ implements C++11 and subsequent standards, and that makes me wonder:
>
> Which standards would this libc implement?
We need parts of the C standard library, parts of the POSIX
extensions, and also the Linux headers. The community is of course
free to widen the surface as needed.
> Would you implement upcoming C standards, and how would you manage “experimental” features (API changes, ABI changes, etc)?
We will probably take this up on an as-needed basis.
> What parts of the standard wouldn’t you follow, why, how would the LLVM community determine this?
I would think what we (the "we" here is for the developer community
and not my company or my team) communicate would depend on how the
project evolves. For example, at the very beginning, we will probably
only say "large parts of the standards A, B, C are still
unimplemented." When the implemented surface becomes large enough, we
might start explicitly listing the unimplemented parts. There might be
parts which require qualification with version numbers.
> Which parts aren’t worth implementing?
> Which parts cannot be safely used in modern coding practice? How would you remedy what’s perceived as “the bad parts”?
At a certain level, what is worth and what is safe/unsafe is a
subjective matter. So, instead of listing my opinions here, let me say
this: If we build sufficient modularity into the libc, one will be
able to pick and choose what they want, and omit what they do not
want.
> I’d love it if the C Standards Committee, WG14, got renewed involvement through this project. Is that an explicit goal? Who will join WG14 in this effort?
> What part of C do you see this project help improve over time?
The answer to this question also depends on how the project and the
community around it evolves.
> How do you intend to test this C library? Fuzzing and all that is nice, but just straight conformance testing is what I’d like to hear about.
What kind of testing we want to do depends on what exactly is getting
tested. But in general, we want to do conformance tests for sure. We
also want to do some amount of differential testing between this new
libc and an existing, battle-tested libc. Depending on what is getting
tested, we also want to be able to test against the test suite of an
existing libc.
> On Thu, Jun 27, 2019 at 2:05 PM Chris Lattner via llvm-dev
> <llvm...@lists.llvm.org> wrote:
> >
> > Saleem, Owen, others on the thread who are concerned about this: it seems
> > that some of the concern is that the project goals are too narrow, and
> > thus the eventual result may not serve the full community well over time.
>
> Maybe my email listing our goals is being misinterpreted as being the
> bounding set of goals for the project. So, let me make it clear again:
> The goals I have listed are just our initial set of goals for the
> project. Members of the community are of course free to add their own
> goals to this set, implement them, and make it a "full solution." I
> have also mentioned in some of my earlier emails that we do not intend
> to design out any particular feature or platform. For example, I have
> said that we do not intend to work on dynamic linking/loading at least
> to begin with. This does not mean that the scope of the project is
> curtailed to static linking. The members of the community are free to
> add support for dynamic linking/loading. In fact, if dynamic
> linking/loading support is added in a modular/"as a library" fashion,
> it makes it a win-win situation as we will be able to take it out if
> we do not require it.
The discussion here makes me strongly suspect that this libc will remain a
Linux-only implementation.
OpenBSD, and I think most other BSDs, OSX, Solaris, and others consider libc
an integral part of the system, and modify the ABI between the kernel and libc
with varying frequency. How would you want llvm libc to handle, for example,
OpenBSD's 64-bit time_t transition? There will be other situations like it.
I don't think a Linux-only solution should be adopted by LLVM, and I think that
using a non-system libc is something that will cause more pain than it's worth
outside of cases where someone has full platform control.
I wouldn't mind being proven wrong, maybe people will jump in, port it, and
maintain it on multiple platforms. I'd like to see this happen *before* this
libc was put under the LLVM umbrella.
Libcs can be written outside of LLVM, and code can be imported after it's
in wider use.
But then again, I'm mostly an observer.
--
Ori Bernstein <o...@eigenstate.org>
For what it is worth, I do believe that these files do really belong in the libc project because they are so intricately tied to the implementation of the language. I just think that the fact these files will be part of the project is merely an implementation detail and should not even be part of the discussion here.
Makes sense, but beyond “capabilities” of the eventual result, I think it is important to focus on *design* points, since they are the things that will shape the result as the effort goes from nothing to something.
For example, I think that subset-ability is really important, so imo a build system that allows compiling different configurations is really important. This implies some configuration description, an internal dependence graph, a way to depend on targets with multiple possible implementations, etc.
Getting that right would also make compiler_rt a bit nicer.
Now I understand why you want to start with a clean slate for Windows.
As I said in my earlier email, it should be OK for a platform to not
use the redirector strategy if it cannot be implemented in a
straightforward manner.
I was of the opinion that you were asking me to elaborate point #3 of
mine from above.
> I don’t think that's a good objective for such a project. For practical purposes that’s the implementation approach that makes sense to start with, but I’m looking for what the charter of this LLVM project should be.
I want to refrain from talking as if this libc project has already
been accepted by LLVM. But yes, if this libc project is indeed
accepted and takes off, we will definitely want a charter written down
for this as an LLVM project. And I also agree that this charter cannot
limit itself to Google's use cases.
> Compare with libc++: https://libcxx.llvm.org
Yes. Our aspirations for this libc are to be like libc++.
> I think you want to fill out a proposed set of documentation pages, like libc++’s, and answer the questions libc++ answers. Not where you’ll start or in what order (though that’s useful for this discussion!), but what your proposed libc aspires to be.
Absolutely!
> Same as above, IMO an LLVM project should aspire to something bigger, even if practical concerns guide the initial implementation.
Again, I want to wait for some sort of confirmation that we can
actually start work on this as an LLVM project.
> Personally I’m really interested in a project that increases the quality of all C libraries, and of the C standard. I therefore think champions of this project signing up to collaborate with WG14 is important.
I do not disagree. At the same time, I am of the opinion that such a
champion should grow out of this project rather than getting
volunteered or nominated. This is my personal opinion and I am ready
to be corrected.
> I think you need to write a design for how this C library will be tested.
I can assure you that all this will happen once we take off.
> I suggest you have a chat with Marshall Clow (CC’ed). He does a lot of really good work with libc++ and the C++ Standards Committee. I’d like this C library to be similar to libc++ in many ways, and I’d like a leader like Marshall involved in leading this C library. Talking to Marshall will help understand the type of leadership I’d like to see in this project.
Experienced guidance is most welcome. And, thanks a lot for bringing
up everything you have done in this email. I also apologize for the
delay in my response, so thanks for your patience as well.
On Thu, Jun 27, 2019 at 5:19 PM JF Bastien <jfba...@apple.com> wrote:
> Which standard specifically? So far the responses sound like “the standard Google uses”.
On Jun 26, 2019, at 2:20 PM, Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:
5. Avoid assembly language as far as possible - Again, there will be
places where one cannot avoid assembly level implementations. But,
wherever possible, we want to avoid assembly level implementations.
There are a few reasons here as well:
a) We want to leverage the compiler for performance wherever possible,
and as part of the LLVM project, fix compiler bugs rather than use
assembly.
After my first step, my first email to this thread, I was waiting for
someone to drive me towards a process. Your email now has given me
sufficient guidance on how to proceed forward. So thank you for that.
In the coming days, I will start sharing/discussing the information
you are expecting.
I'm no math expert, but I tangle with clang vs glibc's math code
regularly, and have discussed all this with Siva.
It's too early to say exactly what the implementation will look like,
but I anticipate it will be a combination of 1) and 2). There's
really no alternative to having a mode that does accurate flag
handling, but if the compiler has both library sources and call sites
in hand, it should be able to determine whether it needs to include,
say, underflow handling, and only compile in those parts. We've
handicapped ourselves somewhat by having shifted to a model where the
library functions are black boxes because of dynamic linking, and I
think we can do better than just introducing more and more ifuncs or
whatever.
I also expect there will be more work to do in the compiler, both for
builtins and for additional optimizations, and to me that is part of
the rationale to put the libc project under LLVM in general. There
won't be any secrets - if GCC folks want to try their hand at
compiling this libc, they're welcome to it - but there will be some
opportunities to co-develop library code that takes advantage of new
compiler abilities and vice versa.
> - For most platforms, there are significant performance wins available for some of the core strings and memory functions using assembly, even as compared to the best compiler auto-vectorization output. There are a few reasons for this, but one of the major ones is that—in assembly, on most architectures—we can safely do aligned memory accesses that are partially outside the buffer that has been passed in, and mask off or ignore the bytes that are invalid. This is a hugely significant optimization for edging around core vector loops, and it’s simply unavailable in C and C++ because of the abstract memory models they define. A compiler could do this for you automatically, but this is not yet implemented in LLVM (and you don’t want to be tightly coupled to LLVM, anyway?) In practice, on many systems, the small-buffer case dominates usage for these functions, so getting the most efficient edging code is basically the only thing that matters.
Google does have a little experience in this area, mem* being the libc
functions that perennially show up at the top of fleetwide performance
profiles. (Lots of protobufs to move, I guess. :-) ) I imagine there
will be both assembly and high-level versions in libc, and it will be
the compiler's challenge to meet or beat the assembly code.
>
> 1. Are you going to teach LLVM to perform these optimizations? If so, awesome, but this is not at all a small project—you’re not just fixing an isolated perf bug, you’re fundamentally reworking autovectorization. What about other compilers?
> 2. Are you going to simply write off performance in these cases and let the autovectorizer do what it does?
> 3. Will you use assembly instead purely for optimization purposes?
>
> A bunch of other questions will probably come to me around the math library, but I would encourage you to think very carefully about what specifications you want to have for a libm before you start building one. All that said, I think having more libc implementations is great, but I would be very careful to define what design tradeoffs you’re making around these choices and to what spec(s) you plan to conform, and why they necessitate a new libc rather than adapting an existing one.
>
> – Steve
> On 28 June 2019, at 02:30, Saleem Abdulrasool via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> I feel like I have an even stronger “claim”: independent of any OS/architecture, unless you are developing a freestanding libc for an embedded device, you will need this at some point and you cannot borrow them from another source
They are not required if your kernel and dynamic linker are smart enough to perform the required initialisation and termination.
On macOS, there was a time when each binary included a crt.o file that defined the _start symbol, which was the software entry point and performed the required initialisation before calling main. For some time now, however, dyld has called "main" directly rather than "_start", and the compiler no longer has to include a crt.o file in each binary.
I apologize for the delay. I will try to address the above questions
in this email. I will shortly follow up with answers to other
questions.
Below is a write up which I think would qualify as the "charter" for
the new libc. It is also answering questions like, "where we’ll get
with this libc?", "what's this libc actually going to be?" and similar
ones. I have used the libcxx.llvm.org landing page as a template to write
it down.
###############################################
"llvm-libc" C Standard Library
========================
llvm-libc is an implementation of the C standard library targeting C11
and above. It also provides platform specific extensions as relevant.
For example, on Linux it also provides pthreads, librt and other POSIX
extension libraries.
Documentation
============
The llvm-libc project is still in the planning phase. Stay tuned for
updates soon.
Features and Goals
================
* C11 and upwards conformant.
* A modular libc with individual pieces implemented in the "as a
library" philosophy of the LLVM project.
* Ability to layer this libc over the system libc.
* Provide C symbols as specified by the standards, but take advantage
and use C++ language facilities for the core implementation.
* Provides POSIX extensions on POSIX compliant platforms.
* Provides system-specific extensions as appropriate. For example,
provides the Linux API on Linux.
* Vendor extensions if and only if necessary.
* Designed and developed from the start to work with LLVM tooling and
testing like fuzz testing and sanitizer-supported testing.
* ABI independent implementation as far as possible.
* Use source based implementations as far as possible rather than
assembly. Will try to “fix” the compiler rather than use assembly
language workarounds.
Why a new C Standard Library?
=========================
Implementing a libc is no small task and is not to be taken lightly. A
natural question to ask is, "why a new implementation of the C
standard library?" There is no single answer to this question, but
some of the major reasons are as follows:
* Most libc implementations are monolithic. It is a non-trivial
porting task to pick and choose only the pieces relevant to one's
platform. The new libc will be developed with sufficient modularity to
make picking and choosing a straightforward task.
* Most libc implementations break when built with sanitizer specific
compiler options. The new libc will be developed from the start to
work with those specialized compiler options.
* The new libc will be developed to support and employ fuzz testing
from the start.
* Most libc implementations use a good amount of assembly language,
and assume specific ABIs (may be platform dependent). With the new
libc implementation, we want to use normal source code as much as
possible so that compiler-based changes to the ABI are easy. Moreover,
as part of the LLVM project, we want to use this opportunity to fix
performance related compiler bugs rather than using assembly
workarounds.
* A large hole in the LLVM toolchain will be plugged with this new
libc. With the broad platform expertise in the LLVM community, and the
strong license and project structure, we think that the new libc will
be more tunable and robust, without sacrificing the simplicity and
accessibility typical of the LLVM project.
Platform Support
==============
llvm-libc development is still in the planning phase. However, we
envision that it will support a variety of platforms in the coming
years. Interested parties are encouraged to participate in the design
and implementation, and add support for their favorite platforms.
Current Status
============
llvm-libc development is still in the planning phase.
Build Bots
=========
Coming soon.
Get involved!
===========
First please review our Developer's Policy. Stay tuned for llvm-libc
specific information.
Design Documents
===============
Coming soon.
Any particular reason for C11 as opposed to C17?
~Aaron
When the need arises, I do not mind being a "champion" like this. To
begin with though, we (as in the team at Google I am representing) do
not intend to participate beyond what we already do (like the C++
committee). Let me point out that I said "to begin with". So,
depending on how things evolve, we might in future increase our
participation with the committees.
Personally, it feels like it is early days - before one goes to the
committee, they should first develop some experience implementing the
standard library. If there is already one such person in the
community, and they would like to take the lead and engage with the
committee from the start, it would be most welcome. I would only be a
hand-waving participant if I were to do it today.
> I think you need to write a design for how this C library will be tested.
> You want a plan before it takes off. Testing standardized stuff has enough precedent that you should be able to look at what others have done, and come up with a plan up front. I really like that you want to fuzz, use sanitizers, etc. That’s pretty novel for this kind of project. Basic standards testing isn’t novel, so it should be pretty easy to figure out.
Beyond fuzz and sanitizer based testing, at a general level, this
would be covered:
1. Extensive unit testing.
2. Standards conformance testing.
3. If relevant and possible, differential testing: We want to be able
to test llvm-libc against another battle-tested libc. This is
essentially to understand how we differ from other libcs.
4. If relevant and possible, test against the testsuite of an existing
libc implementation.
One could go into details here, but I think it is best to take them up
on a case-by-case basis: when we are implementing X, we will discuss
what exact kind of testing makes sense for X.
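As a rough illustration of item 3 (differential testing), the C sketch below runs a candidate function and the system libc's strlen on the same inputs and counts disagreements. Everything here is an assumption made for illustration: candidate_strlen stands in for a function from the new libc, count_strlen_mismatches is a hypothetical helper name, and the input list is arbitrary. A real harness would presumably draw inputs from a fuzzer or a much larger corpus.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a function implemented by the new libc. */
size_t candidate_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/*
 * Differential check: feed the same inputs to the candidate and to the
 * system libc's strlen, and return how many inputs produced different
 * results. Zero means the two implementations agreed on every input.
 */
size_t count_strlen_mismatches(size_t (*candidate)(const char *)) {
    static const char *const inputs[] = {
        "", "a", "hello", "embedded\ttab", "a longer string with spaces"
    };
    size_t mismatches = 0;
    for (size_t i = 0; i < sizeof(inputs) / sizeof(inputs[0]); ++i) {
        if (candidate(inputs[i]) != strlen(inputs[i]))
            ++mismatches;
    }
    return mismatches;
}
```

Calling count_strlen_mismatches(candidate_strlen) reports how far the candidate differs from the reference on this input set; it says nothing about inputs outside the set, which is why this complements rather than replaces conformance tests.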
Sorry, I split that part into another email:
http://lists.llvm.org/pipermail/llvm-dev/2019-July/133867.html
>> * Ability to layer this libc over the system libc.
Echoing others, this seems dubious to me. Why not build up small pieces
at a time and write tests for them? This library doesn't need to
support all existing programs out of the gate. I don't think libc++ was
layered on top of existing standard C++ libraries, so why would libc
need to?
>> * Provide C symbols as specified by the standards, but take advantage
>> and use C++ language facilities for the core implementation.
Does this mean C programs would require a C++ runtime? If not, how will
the project ensure that?
-David
I don't think this is really possible, without tooling designed
specifically to do it (remapping symbols, etc.), which clang/LLVM,
being the compiler/tooling, *could* do. But even with the right
tooling, you're going to find that it's *a lot* harder than you
expect, likely almost impossible without making assumptions about the
internals of the underlying libc that are not public contracts.
> * Provide C symbols as specified by the standards, but take advantage
> and use C++ language facilities for the core implementation.
This was done in Fuchsia's fork of musl too, and was one of my major
criticisms of it -- makes no sense except satisfying developers who
want to use C++ for the sake of it being C++. It's very hard to do
"freestanding" C++ that doesn't even rely on the underlying libc, and
if you do rely on libc, it's a circular dependency. Moreover there's
really *very little* in libc that benefits from C++ (much less a pure
freestanding C++ with no library) for implementing it. And there's
huge potential to get things wrong by using C++ in ways that have
hidden failure cases/exceptions in places where the C interface you're
implementing either cannot be allowed to fail, or where introducing
the possibility of failure would be a huge QoI flaw.
> * Provides POSIX extensions on POSIX compliant platforms.
> * Provides system-specific extensions as appropriate. For example,
> provides the Linux API on Linux.
> * Vendor extensions if and only if necessary.
> * Designed and developed from the start to work with LLVM tooling and
> testing like fuzz testing and sanitizer-supported testing.
> * ABI independent implementation as far as possible.
> * Use source based implementations as far as possible rather than
> assembly. Will try to “fix” the compiler rather than use assembly
> language workarounds.
>
> Why a new C Standard Library?
> =========================
>
> Implementing a libc is no small task and is not to be taken lightly. A
Indeed.
> natural question to ask is, "why a new implementation of the C
> standard library?" There is no single answer to this question, but
> some of the major reasons are as follows:
>
> * Most libc implementations are monolithic. It is a non-trivial
> porting task to pick and choose only the pieces relevant to one's
> platform. The new libc will be developed with sufficient modularity to
> make picking and choosing a straightforward task.
Have you given any thought to what it would mean to make this kind of
porting practical? The reason we haven't done it in musl is because
it's highly nontrivial. You have to either find an existing point
amenable to abstraction that's reasonably common to all existing
systems and hope it will apply to future ones too -- for musl, this
means the concept of syscalls, which are presently assumed to be Linux
ones but could be abstracted *somewhat* further, and might be in the
future.
If you can't find a suitable point amenable to abstraction that
encompasses everything you want to support, then instead you end up
making your own abstraction layer in between, and now you're stuck
with the task of porting your abstraction layer to every new system
you want to support. Now you have an extra layer of bloat, and haven't
saved any significant amount of porting work.
All of this aside, I agree that it would be rather nice to be
"non-monolithic", especially for the parts of libc that are "pure
library code" (not depending on any underlying system facilities) to
be kept separate and easy to reuse in ports to weird/bare-metal/etc.
systems.
It'd also be nice for things like stdio that do depend on a system
facility, but where the underlying system facility is understood to be
"common" at a higher level than syscalls (actual functions on fds) to
be able to work with arbitrary implementations of the underlying
functions. The reason we didn't do this from the beginning in musl is
namespacing; plain C symbols can't depend on symbols in the POSIX
namespace.
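To make the "arbitrary implementations of the underlying functions" idea concrete, here is a minimal sketch of a stream whose only contact with the system is a caller-supplied write callback. It is not how musl or the proposed libc does it; `sketch_stream`, `sketch_puts`, `buffer_sink` and `buffer_write` are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stream type: the write callback is the only system dependency.
struct sketch_stream {
    // Returns the number of bytes consumed, in the spirit of write() on an fd.
    std::size_t (*write_fn)(void *cookie, const char *data, std::size_t len);
    void *cookie;  // opaque state owned by whoever supplied write_fn
};

// "Pure library" code: it knows nothing about fds or syscalls.
inline std::size_t sketch_puts(sketch_stream &s, const char *str) {
    return s.write_fn(s.cookie, str, std::strlen(str));
}

// One possible backend: append to a fixed-size memory buffer.
struct buffer_sink {
    char data[64];
    std::size_t used;
};

inline std::size_t buffer_write(void *cookie, const char *data, std::size_t len) {
    buffer_sink *sink = static_cast<buffer_sink *>(cookie);
    assert(sink->used <= sizeof(sink->data));
    std::size_t room = sizeof(sink->data) - sink->used;
    std::size_t n = len < room ? len : room;  // short write when the buffer is full
    std::memcpy(sink->data + sink->used, data, n);
    sink->used += n;
    return n;
}
```

A POSIX port would supply a callback that forwards to write() on an fd, and a bare-metal port might forward to a UART instead; the library-side code stays the same either way.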
> * Most libc implementations break when built with sanitizer specific
> compiler options. The new libc will be developed from the start to
> work with those specialized compiler options.
This is a nice goal but invites all sorts of circular dependency
problems. At some point this will likely be possible with musl too,
with the exception of certain components that need to operate at early
entry time.
> * The new libc will be developed to support and employ fuzz testing
> from the start.
> * Most libc implementations use a good amount of assembly language,
> and assume specific ABIs (may be platform dependent). With the new
> libc implementation, we want to use normal source code as much as
> possible so that compiler-based changes to the ABI are easy. Moreover,
> as part of the LLVM project, we want to use this opportunity to fix
> performance related compiler bugs rather than using assembly
> workarounds.
This is particularly wrong about musl, where use of asm (especially
extern asm files as opposed to inline asm) is mostly limited to places
where something fundamentally can't be implemented without asm. We
don't use asm as a workaround for poor compiler codegen, unless you
count things like single-instruction math functions, where it would be
really hard for a compiler to pattern-recognize the whole function and
reduce it down to the instruction. (Note also that use of __builtin_*
doesn't help here because it can create circular definitions if the
compiler chooses not to inline the single instruction.)
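As a rough illustration of the circular-definition problem, consider a single-instruction square root. This is a sketch only, assuming a GCC-compatible compiler; `sketch_sqrt` and `sketch_sqrt_check` are hypothetical names, chosen so that the example does not collide with the real `sqrt`.

```cpp
#include <cassert>

// If a libc defined sqrt itself as
//     double sqrt(double x) { return __builtin_sqrt(x); }
// and the compiler chose not to inline the instruction, it would emit a call
// to sqrt, i.e. the function would call itself. Inline asm avoids that.
extern "C" double sketch_sqrt(double x) {
#if defined(__x86_64__)
    double result;
    __asm__("sqrtsd %1, %0" : "=x"(result) : "x"(x));
    return result;
#elif defined(__aarch64__)
    double result;
    __asm__("fsqrt %d0, %d1" : "=w"(result) : "w"(x));
    return result;
#else
    // Harmless here only because this function is not itself named sqrt:
    // any library call the compiler emits resolves to the system's sqrt.
    return __builtin_sqrt(x);
#endif
}

// Small self-check; square roots of perfect squares are exact in IEEE 754.
inline void sketch_sqrt_check() {
    assert(sketch_sqrt(9.0) == 3.0);
}
```

The asm branches never depend on how the compiler decides to lower a builtin, which is the property the paragraph above is describing.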
> * A large hole in the llvm toolchain will be plugged with this new
> libc.
I read this as a confirmation of my concerns from my previous post and
tweets, that this looks like you're trying to make "LLVM libc" (or
rather "Google libc") the first-class libc for use with clang/LLVM,
radically altering the boundaries between tooling and platform, and
relegating the existing libc implementations on LLVM's targets to
second-class.
If this is not the case, can you explain what guarantees we have that
this is not what's going on?
> With the broad platform expertise in the LLVM community, and the
> strong license and project structure, we think that the new libc will
> be more tunable and robust, without sacrificing the simplicity and
> accessibility typical of the LLVM project.
Tunable and robust are usually opposites; see also: uclibc.
In summary, I think you're still massively underestimating what an
undertaking this is, mistaken about various choices/tradeoffs and
whether they make sense, and either not thinking about consequences on
ecosystem/monoculture of tightly coupling library with tooling, or
intentionally trying to bring about those consequences, contrary to
what I see as the best interests of the communities affected.
Rich
I do not think we are saying we "need to". We are only saying it "has
the ability to". One can choose not to use this ability, but having
this ability might be crucial for adoption for some users.
> >> * Provide C symbols as specified by the standards, but take advantage
> >> and use C++ language facilities for the core implementation.
>
> Does this mean C programs would require a C++ runtime? If not, how will
> the project ensure that?
This is a very good question. I think, to keep things simple and sane,
llvm-libc should not require a C++ runtime. If this needs some form of
relaxation later, then we can take it up on a case by case basis.
As for ensuring that there is no accidental dependence on a C++
runtime: at a very high level, I think suitably configured bots would
suffice? There could be some tooling, but that might again have to be
taken up on a case-by-case basis.
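To illustrate what "C symbols, C++ internals, no C++ runtime" could look like, here is a small sketch. It is an assumption about the general shape, not the project's actual code: `sketch_internal`, `bounded_length`, `sketch_strnlen` and `sketch_strnlen_check` are hypothetical names. The idea is that templates and namespaces are resolved at compile time, so the object file only needs the unmangled C entry point.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch_internal {
// A template is a compile-time facility: no exceptions, no RTTI, no
// operator new, hence nothing that pulls in a C++ runtime library.
template <typename CharT>
std::size_t bounded_length(const CharT *s, std::size_t limit) {
    std::size_t n = 0;
    while (n < limit && s[n] != CharT()) ++n;
    return n;
}
}  // namespace sketch_internal

// C-linkage entry point: the symbol is unmangled and callable from C.
extern "C" std::size_t sketch_strnlen(const char *s, std::size_t maxlen) {
    return sketch_internal::bounded_length(s, maxlen);
}

// Small self-check of the C entry point.
inline void sketch_strnlen_check() {
    assert(sketch_strnlen("abc", 8) == 3);
}
```

A bot along the lines suggested above might build such code with flags like -fno-exceptions and -fno-rtti and then fail if the resulting library has undefined references to C++ runtime symbols (for example names beginning with `__cxa_`); the exact flags and checks are an assumption here, not something the thread specifies.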
sorry, but this charter does not make much sense.
i think you should get some c runtime developers involved.
>
> ###############################################
>
> "llvm-libc" C Standard Library
> ========================
>
> llvm-libc is an implementation of the C standard library targeting C11
> and above. It also provides platform specific extensions as relevant.
> For example, on Linux it also provides pthreads, librt and other POSIX
> extension libraries.
>
> Documentation
> ============
>
> The llvm-libc project is still in the planning phase. Stay tuned for
> updates soon.
>
> Features and Goals
> ================
>
> * C11 and upwards conformant.
> * A modular libc with individual pieces implemented in the "as a
> library" philosophy of the LLVM project.
this is wrong on many levels:
LLVM is a bad example of library design, as it has library
safety and interface stability issues.
libc is necessarily a library so of course it is implemented
"as a library".
"modular libc" does not make much sense: with static linking
you only link what you use (libc internal dependencies have
to be minimized if you static link anyway), with dynamic
linking, multiple modules is a huge mistake for other reasons
(it creates internal abi between the components that is
difficult to manage)
if modularity means configurability, then you have a much
bigger problem: maintenance and testing becomes harder
and if components are actually interchangeable then you need
stable libc internal interfaces. (this is easy to do in
existing libcs, just not done because it's a disaster,
..except in uclibc ..which is a disaster)
> * Ability to layer this libc over the system libc.
> * Provide C symbols as specified by the standards, but take advantage
> and use C++ language facilities for the core implementation.
this mistake was already made in bionic. why people want to
rely on underspecified freestanding c++ language semantics
and make assumptions about c++ implementation internals,
as well as deal with subtle language incompatibilities,
when writing a libc is a mystery.
> * Provides POSIX extensions on POSIX compliant platforms.
> * Provides system-specific extensions as appropriate. For example,
> provides the Linux API on Linux.
> * Vendor extensions if and only if necessary.
> * Designed and developed from the start to work with LLVM tooling and
> testing like fuzz testing and sanitizer-supported testing.
the difficulty of this is not in the libc, but various
issues in the sanitizer libraries.. e.g. you don't want to
fuzz test a libc with a fuzz runtime that depends on a c++
runtime (or any other external component that can call
back to libc outside of the control of the fuzz runtime).
> * ABI independent implementation as far as possible.
if you mean call abi, then you get this for free, just use
a portable language (such as c), which is what existing
libc implementations do anyway. (glibc has a bit more
target-specific asm than usual, but most of that has generic
fallback code, so it is easy to drop the asm)
if you mean other abis then clarify.
> * Use source based implementations as far as possible rather than
> assembly. Will try to “fix” the compiler rather than use assembly
> language workarounds.
same as above, all libcs do this already.
>
> Why a new C Standard Library?
> =========================
>
> Implementing a libc is no small task and is not to be taken lightly. A
> natural question to ask is, "why a new implementation of the C
> standard library?" There is no single answer to this question, but
> some of the major reasons are as follows:
>
> * Most libc implementations are monolithic. It is a non-trivial
> porting task to pick and choose only the pieces relevant to one's
> platform. The new libc will be developed with sufficient modularity to
> make picking and choosing a straightforward task.
this does not make sense (see above).
> * Most libc implementations break when built with sanitizer specific
> compiler options. The new libc will be developed from the start to
> work with those specialized compiler options.
this does not make sense.
(new libc does not make this easier: e.g. adding asan support
to musl is about a week of work and most issues are sanitizer
related problems where you have to give up correctness or
reliability for sanitizers to work)
> * The new libc will be developed to support and employ fuzz testing
> from the start.
this does not make sense.
(new libc does not make this easier)
> * Most libc implementations use a good amount of assembly language,
> and assume specific ABIs (may be platform dependent). With the new
> libc implementation, we want to use normal source code as much as
> possible so that compiler-based changes to the ABI are easy. Moreover,
> as part of the LLVM project, we want to use this opportunity to fix
> performance related compiler bugs rather than using assembly
> workarounds.
citation needed.
(removing all unnecessary asm from musl and replacing it with
intrinsics is a weekend project)
> * A large hole in the llvm toolchain will be plugged with this new
> libc. With the broad platform expertise in the LLVM community, and the
> strong license and project structure, we think that the new libc will
> be more tunable and robust, without sacrificing the simplicity and
> accessibility typical of the LLVM project.
for this hole to be filled in on linux you need distros to be able
to rebuild everything against the new libc, otherwise you just
have a new libc that nobody can use because applications have
dependencies that have to be built against the same libc.
in some contexts it is enough to have a libc with build scripts to
build the dependencies.. but existing build scripts often don't
work out of the box with a new libc.
creating a software platform around a libc is a much bigger task
than the libc itself, and without that the libc has limited value.
it seems to me that none of the answers make much sense as is.
>
> Platform Support
> ==============
>
> llvm-libc development is still in the planning phase. However, we
> envision that it will support a variety of platforms in the coming
> years. Interested parties are encouraged to participate in the design
> and implementation, and add support for their favorite platforms.
the problem is not porting the new libc to a platform but
porting the platform to a new libc.
i.e. you need to worry a lot more about what's above the libc
(all userspace software) than what's below it (tiny os kernel
interface).
Aaron Ballman via llvm-dev <llvm...@lists.llvm.org> writes:
>> * Ability to layer this libc over the system libc.
Echoing others, this seems dubious to me. Why not build up small pieces
at a time and write tests for them? This library doesn't need to
support all existing programs out of the gate.
I don't think libc++ is layered on top of existing standard C++
libraries, so why would libc need to be?
>> * Provide C symbols as specified by the standards, but take advantage
>> and use C++ language facilities for the core implementation.
Does this mean C programs would require a C++ runtime? If not, how will
the project ensure that?
There are options like musl, which is relatively new and might not have
as much legacy. I am sure you have looked at various system C libraries
out there for consideration; it would be interesting if you could share
your insights on the differences.
Two reasons:
1. The C++17 standard refers to the C11 standard.
2. C11 is sufficiently modern while not closing doors for users
requiring compliance with an "older" standard. That said, we could
choose not to implement certain items removed in the C17 standard. An
obvious example of such a candidate is `gets`.
This is somewhat confusing to me. That's a reason to support *at
least* C11. It doesn't seem like a reason to not support the latest C
standard.
> 2. C11 is sufficiently modern while not closing doors for users
> requiring compliance with an "older" standard. That said, we could
> choose not to implement certain items removed in the C17 standard. An
> obvious example of such a candidate is `gets`.
gets() was removed in C11. ;-)
I strongly think we should support the C17 standard library. C17 was a
bugfix release, so the delta between it and C11 in terms of
functionality is small but the quality is higher.
~Aaron
I think there is some misunderstanding here. My first message said
llvm-libc will target C11 __and above__. Which is to imply that the
lower bound of supported standards is C11.
On Mon, Jul 15, 2019 at 11:31 AM Finkel, Hal J. <hfi...@anl.gov> wrote:
> +1. Aiming for C17 seems better than aiming for only C11.
I interpreted Aaron Ballman's first question as, "why is the lower
bound C11 and not C17?" I answered that by saying the C++17 standard
still refers to the C11 standard, so we need to keep C11 as a lower
bound. Unless there is some technicality of the language and/or
standards which I am not aware of, I did not intend to convey that we
do not intend to support the latest C standards.
Are you saying that the lower bound of standards llvm-libc should
support ought to be C17?
Yes, I'm sorry if I was unclear. I think that there's not much purpose
to supporting C11 as the lower bound given that C17's standard library
is C11's standard library, but with bug fixes. There were no new
features added during C17.
~Aaron