[llvm-dev] A libc in LLVM


Siva Chandra via llvm-dev

Jun 24, 2019, 6:23:39 PM
to llvm...@lists.llvm.org

Hello LLVM Developers,


Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.


Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need, and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.


To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.


We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:


  1. The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.

  2. The libc should support static non-PIE and static-PIE linking. This means providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.

  3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.

  4. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.

  5. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.


There are also a few areas which we do not intend to invest in at this point:


  1. Implement dynamic loading and linking support.

  2. Support for more architectures (we'll start with just x86-64 for simplicity).


For these areas, the community is of course free to contribute. Our hope is that preserving the "as a library" design philosophy will make such extensions easy and allow retaining the simplicity when these features aren't needed.


We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc, at least for some use cases and contexts.
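
One way to picture this layering, as a rough sketch rather than the project's actual design: the application calls entry points owned by the new libc, functions the new libc already implements run its own code, and everything else is forwarded to the system libc. The `newlibc_` prefix and both function names below are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Implemented by the new layer itself: no call into the system libc. */
size_t newlibc_strlen(const char *s)
{
    const char *p = s;
    assert(s != NULL);
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Not yet implemented by the new layer: forward to the system libc. */
int newlibc_puts(const char *s)
{
    assert(s != NULL);
    return puts(s);
}
```

An application would call `newlibc_strlen("libc")` or `newlibc_puts("hello")` without caring which side of the layer does the work; as more functions move into the new libc, fewer calls are forwarded.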


So, what do you think about incorporating this new libc under the LLVM project?


Thank you,

Siva Chandra and the rest of the Google LLVM contributors


Jake Ehrlich via llvm-dev

Jun 24, 2019, 6:38:05 PM
to Siva Chandra, llvm-dev
disclaimer: I work at Google so don't take my +1 as an independent vote forward.

We would like to use this on Fuchsia and I am particularly interested in creating a dynamic linking library for ELF with Roland McGrath's guidance. We spoke about creating a library for writing dynamic linkers internally and I don't see why this can't be upstreamed.

On Fuchsia we critically need support for AArch64. What do you expect to be architecture dependent? I struggled to think of where the architecture, and not the operating system, was the issue.

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Zachary Turner via llvm-dev

Jun 24, 2019, 6:43:32 PM
to Jake Ehrlich, llvm-dev
What do you expect the support for Windows to be? Certainly, I don't
expect you to provide Windows support personally if you don't need it,
but given that LLVM supports Windows, it should at least be done in
such a way that the design lends itself to interested parties
contributing Windows support.

Currently clang-cl has several dependencies on having a Visual Studio
installation present on your machine, and one of these is to
provide an implementation of the CRT (i.e. libc). So having a libc
implementation which supports Windows and is compatible with MSVCRT
would be useful for people using clang on Windows as well.

Finkel, Hal J. via llvm-dev

Jun 24, 2019, 6:45:30 PM
to Siva Chandra, llvm...@lists.llvm.org
On 6/24/19 5:23 PM, Siva Chandra via llvm-dev wrote:

> Hello LLVM Developers,
>
> Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.
>
> Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need,

+1 - This has also been my experience: Many people over many years have expressed a desire to have a libc as part of the LLVM project. It is currently a large gap in our LLVM toolchain offering. Moreover, from the standpoint of my organization, an LLVM libc could provide benefits on both production platforms and research/experimental hardware.

> and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.
>
> To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.
>
> We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:
>
>   1. The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.
>
>   2. The libc should support static non-PIE and static-PIE linking. This means providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.
>
>   3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.
>
>   4. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.
>
>   5. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.

Great.

> There are also a few areas which we do not intend to invest in at this point:
>
>   1. Implement dynamic loading and linking support.

It will be useful to have a design document that describes the kind of system and capabilities that you're targeting, and then we can discuss how the libc might have a modular design that can be adapted for other use cases. I mention modularity because, for example, we have accelerator hardware and various kinds of low-variability/embedded environments where many, but not all, POSIX/libc capabilities make sense.

>   2. Support for more architectures (we'll start with just x86-64 for simplicity).
>
> For these areas, the community is of course free to contribute. Our hope is that preserving the "as a library" design philosophy will make such extensions easy and allow retaining the simplicity when these features aren't needed.
>
> We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.
>
> So, what do you think about incorporating this new libc under the LLVM project?

This is something that I'd like to see.

 -Hal

> Thank you,
>
> Siva Chandra and the rest of the Google LLVM contributors



-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Shawn Webb via llvm-dev

Jun 24, 2019, 8:28:47 PM
to Siva Chandra, llvm...@lists.llvm.org
Hey Siva,

HardenedBSD is a derivative of FreeBSD that aims to perform a
clean-room reimplementation of the publicly-documented bits of the
grsecurity patchset. We're extremely interested in llvm's CFI to fill
the gap of PaX's/grsecurity's patented/GPLv3'd excellent RAP
implementation. We've made measurable and tangible progress in
researching and integrating Cross-DSO CFI (even producing a pre-alpha
Call-For-Testing of Cross-DSO CFI in HardenedBSD base).

One hard problem I need to solve is tight integration of the sanitizer
library into both our libc and our RTLD while also attempting to keep
diffs minimal with our upstream FreeBSD.

Having a libc that was sanitizer-centric (or, at least, aware) and
could serve as a drop-in replacement for our libc would be a major
win and would even enable quicker development of novel security
technologies in the future.

On Mon, Jun 24, 2019 at 03:23:20PM -0700, Siva Chandra via llvm-dev wrote:
> Hello LLVM Developers,
>
> Within Google, we have a growing range of needs that existing libc
> implementations don't quite address. This is pushing us to start working on
> a new libc implementation.
>
> Informal conversations with others within the LLVM community have told us
> that a libc in LLVM is actually a broader need, and we are increasingly
> consolidating our toolchains around LLVM. Hence, we wanted to see if the
> LLVM project would be interested in us developing this upstream as part of
> the project.
>
> To be very clear: we don't expect our needs to exactly match everyone
> else's -- part of our impetus is to simplify things wherever we can, and
> that may not quite match what others want in a libc. That said, we do
> believe that the effort will still be directly beneficial and usable for
> the broader LLVM community, and may serve as a starting point for others in
> the community to flesh out an increasingly complete set of libc
> functionality.
>
> We are still in the early stages, but we do have some high-level goals and
> guiding principles of the initial scope we are interested in pursuing:
>
>
> 1. The project should mesh with the "as a library" philosophy of the LLVM
>    project: even though "the C Standard Library" is nominally "a library,"
>    most implementations are, in practice, quite monolithic.
>
> 2. The libc should support static non-PIE and static-PIE linking. This
>    means providing the CRT (the C runtime) and a PIE loader for static
>    non-PIE and static-PIE linked executables.

Having a portable, permissively-licensed CSU/CRT that supports static
PIE would be a very welcome project, especially if HardenedBSD could
make use of it.

> 3. If there is a specification, we should follow it. The scope that we need
>    includes most of the C Standard Library; POSIX additions; and some
>    necessary, system-specific extensions. This does not mean we should (or
>    can) follow the entire specification -- there will be some parts which
>    simply aren't worth implementing, and some parts which cannot be safely
>    used in modern coding practice.
>
> 4. Vendor extensions must be considered very carefully, and only admitted
>    when necessary. Similar to Clang and libc++, it does seem inevitable that
>    we will need to provide some level of compatibility with other vendors'
>    extensions.
>
> 5. The project should be an exemplar of developing with LLVM tooling. Two
>    examples are fuzz testing from the start, and sanitizer-supported testing.
>
>
> There are also a few areas which we do not intend to invest in at this point:
>
> 1. Implement dynamic loading and linking support.

That is correct. Implementing a runtime linker (RTLD) is orthogonal.
However, it seems to be the next logical (and welcome!) step. Not
within scope of a libc implementation, though.

> 2. Support for more architectures (we'll start with just x86-64 for
>    simplicity).
>
>
> For these areas, the community is of course free to contribute. Our hope is
> that preserving the "as a library" design philosophy will make such
> extensions easy, and allow retaining the simplicity when these features
> aren't needed.
>
> We intend to build the new libc in a gradual manner. To begin with, the
> new libc will be a layer sitting between the application and the system
> libc. Eventually, when the implementation is sufficiently complete, it will
> be able to replace the system libc at least for some use cases and contexts.
>
> So, what do you think about incorporating this new libc under the LLVM
> project?

Even if the new libc isn't merged into llvm, it would be very
interesting to collaborate on. I would hope that Google would remain
interested in keeping it open source, and perhaps maintained in a
fashion that multiple OS vendors can adopt.

Thanks,

--
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

Tor-ified Signal: +1 443-546-8752
Tor+XMPP+OTR: lat...@is.a.hacker.sx
GPG Key ID: 0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9 3633 C85B 0AF8 AB23 0FB2

Chris Lattner via llvm-dev

Jun 24, 2019, 8:33:12 PM
to Siva Chandra, llvm...@lists.llvm.org
<disclaimer: I work at Google, though not on anything related to this project>

On Jun 24, 2019, at 3:23 PM, Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:

> We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:
>
>   1. The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.

This is awesome.  I’d really love to see a corpus of functionality built as a set of libraries that can be sliced and remixed in different ways per the needs of different use-cases.

> For these areas, the community is of course free to contribute. Our hope is that preserving the "as a library" design philosophy will make such extensions easy and allow retaining the simplicity when these features aren't needed.

Fantastic!

> We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.
>
> So, what do you think about incorporating this new libc under the LLVM project?

I would love to see this, and I think it would fill a significant missing piece in the LLVM ecosystem.

-Chris

Zachary Turner via llvm-dev

Jun 24, 2019, 8:41:46 PM
to Chris Lattner, llvm...@lists.llvm.org
I’m not totally sold on the idea of having it be a layer between system libc and application.  I think this is likely to create a split between Windows and non-Windows that will be difficult to overcome.

It also seems like it brings with it its own set of difficulties.  Where can you make a separation in libc such that you’re guaranteed that the two pieces do not share any state, especially given that not everyone is going to be using the same libc?

Have you considered just starting with a blank slate?

Jon Chesterfield via llvm-dev

Jun 25, 2019, 2:16:36 AM
to llvm-dev, llvm-dev...@lists.llvm.org
> Hello LLVM Developers,
>
> Within Google, we have a growing range of needs that existing libc
> implementations don't quite address. This is pushing us to start working on
> a new libc implementation.

It would be convenient for LLVM to bundle libc. I've been shipping a subset of musl with LLVM for a while now and, based on passing comments at conferences, suspect that to be common. The codebase is coherent and adherence to the spec is good.

If we adopt that, even by forking it, then we'd have a solid, multiarch libc with dynamic linking support within a day or so. A bit longer to set up fuzz testing.

I'd be interested to know how that fails to meet the internal requirements at Google, especially in such a way that couldn't be managed as a downstream fork.

Kind regards,

Jon

Fāng-ruì Sòng via llvm-dev

Jun 25, 2019, 4:18:28 AM
to Siva Chandra, LLVM Developers Mailing List
Some natural questions:

1) Will libm be included?
2) How will llvm libc differ from musl from a design perspective?
   musl is another widely used libc implementation, available on many Linux distributions (https://wiki.musl-libc.org/projects-using-musl.html#Linux-distributions-using-musl) and even on Windows (https://midipix.org/). It is often used by prebuilt packages because it is lightweight.

It'd be great if the library were designed with multiple kernels in mind. That could be one reason why another libc implementation is needed. :) Then another natural question is how the kernel differences will be effectively isolated. The platform-specific macros in compiler-rt may be a bit messy now. I hope we can prevent that situation.


> Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.

I'm glad to see this. Many uses of glibc symbol versioning are actually "bug-compatibility".
It'd be good to push applications to fix their own problems.


> Implement dynamic loading and linking support.

Lack of support for dynamic linking circumvents many problems: PLT lazy binding, dlclose, ABI compatibility (newer binary on older loaders), etc. However, it would be good to make clear at an early stage whether the feature will ever be implemented, because that will influence the design of many interfaces.
Forgetting about it entirely may cause trouble if it is eventually decided to implement it.


> Support for more architectures (we'll start with just x86-64 for simplicity).

This is fine. musl has 5 or 6 arch-dependent files for each port (arch/*/*.h) and a few more in the user interface (arch/*/bits/*.h). This shows that a new port does not need a bunch of additional logic. Many optimized routines may inevitably get added, though.
--
宋方睿

Peter Smith via llvm-dev

Jun 25, 2019, 5:54:11 AM
to Siva Chandra, llvm-dev
On Mon, 24 Jun 2019 at 23:23, Siva Chandra via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> Hello LLVM Developers,
>
>
> Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.
>

Are you able to share what some of these needs are? My reason for
asking is to see if there is a particular niche where existing libc
designs are not working, or if there is an approach that will handle
many use cases better than existing libc implementations.

>
> Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need, and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.
>
>
> To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.
>

I'm definitely interested in hearing more. Assembling an LLVM based
toolchain when there isn't an obvious native platform C library that
can be used could in theory benefit greatly from something like this.
As you point out, this might not be in your set of needs though.

>
> We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:
>
>
> The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.
>

There can be good reasons for designs to be monolithic though, for
example https://wiki.musl-libc.org/design-concepts.html . I'm not
enough of a C-library expert to say that this is always true, but it
does at least highlight that there is a risk that a toolkit suitable
for many libraries becomes too cumbersome to use in practice.

> The libc should support static non-PIE and static-PIE linking. This means providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.
>

Interesting. I've seen a static-PIE loader embedded into an
image so that it could relocate itself. As all the dependencies were
statically linked there were only simple relative relocations to
resolve. Are you thinking of something along those lines or an
external loader program?
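
To make "simple relative relocations" concrete, here is a minimal sketch of the patching step such a self-relocating image performs, assuming x86-64 ELF with RELA entries. The `sketch_` names are hypothetical; the struct mirrors the field layout of `Elf64_Rela`, and 8 is the value of `R_X86_64_RELATIVE`. A real loader would first find the table through the dynamic section and would patch memory at the load address directly; here the image is just a byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an ELF64 RELA entry (same fields as Elf64_Rela). */
typedef struct {
    uint64_t r_offset; /* where in the image to patch */
    uint64_t r_info;   /* relocation type lives in the low 32 bits */
    int64_t  r_addend; /* link-time address of the target */
} sketch_rela;

#define SKETCH_R_TYPE(info)      ((uint32_t)((info) & 0xffffffffu))
#define SKETCH_R_X86_64_RELATIVE 8u

/* Patch each relative relocation so the slot holds load_base + addend.
   Returns the number of entries applied; other types are skipped. */
size_t sketch_apply_relative(unsigned char *image, size_t image_size,
                             uint64_t load_base,
                             const sketch_rela *rela, size_t count)
{
    size_t applied = 0;
    assert(image != NULL && rela != NULL);
    for (size_t i = 0; i < count; i++) {
        if (SKETCH_R_TYPE(rela[i].r_info) != SKETCH_R_X86_64_RELATIVE)
            continue;
        if (rela[i].r_offset > image_size ||
            image_size - rela[i].r_offset < sizeof(uint64_t))
            continue; /* would write outside the image */
        uint64_t value = load_base + (uint64_t)rela[i].r_addend;
        memcpy(image + rela[i].r_offset, &value, sizeof value);
        applied++;
    }
    return applied;
}
```

No symbol lookup is involved, which is why this case is so much simpler than general dynamic linking.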

> If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.
>

I'm interested in what sort of platform the libc could run on and
what would need to be provided externally. In particular, I'm
interested in whether a platform OS is required. I'm also interested
in where the boundaries of the libc lie; for example, I'm thinking of
something like the separation of newlib and libgloss here.

> Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.
>
> The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.
>
>
> There are also a few areas which we do not intend to invest in at this point:
>
>
> Implement dynamic loading and linking support.
>
> Support for more architectures (we'll start with just x86-64 for simplicity).
>

I strongly recommend you choose at least one other architecture and
build cross platform support in from the beginning. I suspect that
trying to put this in retroactively will put huge stress on the design
and the supporting infrastructure such as the build system. There is
also a danger of baking design decisions favouring one architecture
into the system, 32-bit vs 64-bit support is one obvious case. I'm
thinking that this is one area where the community could contribute.

>
> For these areas, the community is of course free to contribute. Our hope is that preserving the "as a library" design philosophy will make such extensions easy and allow retaining the simplicity when these features aren't needed.
>
>
> We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.
>

I'm interested to see which system libc and existing platforms you
intend to support. Does this go as low as embedded systems where the
platform is more like a board support package, or is this purely a
libc for platforms?

>
> So, what do you think about incorporating this new libc under the LLVM project?
>

Personally I think that if it can satisfy the needs of a sufficiently
broad segment of the community then I'm in favour. I'm looking forward
to seeing more.

Peter

>
> Thank you,
>
> Siva Chandra and the rest of the Google LLVM contributors
>
>

Alex Brachet-Mialot via llvm-dev

Jun 25, 2019, 11:35:16 AM
to llvm...@lists.llvm.org
> What do you expect to be architecture dependent? I struggled to think of where the architecture and not the operating system was the issue.

I’m also interested in this. All I can think of is longjmp.

> On Fuchsia we critically need support for AArch64

I’d be willing to work on this side of things once work gets under way, but I’m with Jake in that I’m not really sure what will be architecture dependent.

Best,
Alex


Zachary Turner via llvm-dev

Jun 25, 2019, 12:32:57 PM
to Alex Brachet-Mialot, llvm...@lists.llvm.org
fenv.h comes to mind, along with lots of routines in memory.h (e.g. optimized memcpy)

Cranmer, Joshua via llvm-dev

Jun 25, 2019, 1:41:47 PM
to Alex Brachet-Mialot, llvm...@lists.llvm.org

Many of the headers prescribed by the C specification itself have some degree of architecture-dependence. I count 11 headers that rely on some architecture details, or minimally need to have some knowledge of ABI details. These are complex, fenv, float, inttypes, limits, math, stdarg, stdatomic, stddef, stdint, and setjmp. About 10 headers have some degree of operating-system specific details (ctype, errno, signal, stdio, stdlib, wchar, wctype, threads, time, locale), although most of them have a fairly minimal abstraction base. The remaining 8 headers are truly agnostic to any implementation details and have their contents more or less mandated by the specification (assert, iso646, stdalign, stdbool, stdnoreturn, string, tgmath, and uchar), although there may be room for libraries to have accelerated implementations on particular architectures.
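
As a small illustration of the first group, the sketch below reads a few values that limits.h, stddef.h and setjmp.h expose and that are fixed by the target's architecture and ABI rather than by the C specification. The `abi_summary` type and both function names are hypothetical. The results differ between targets: for instance, plain `char` is signed on x86-64 Linux but unsigned on AArch64 Linux, and `long` is 64 bits on x86-64 Linux but 32 bits on 64-bit Windows.

```c
#include <assert.h>
#include <limits.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

/* A few facts the headers must encode per target. */
typedef struct {
    size_t long_bits;      /* from sizeof(long) and CHAR_BIT (limits.h) */
    size_t pointer_bits;   /* width of an object pointer */
    size_t jmp_buf_bytes;  /* room setjmp needs to save registers */
    int    char_is_signed; /* CHAR_MIN < 0 means plain char is signed */
} abi_summary;

abi_summary summarize_abi(void)
{
    abi_summary s;
    s.long_bits = sizeof(long) * CHAR_BIT;
    s.pointer_bits = sizeof(void *) * CHAR_BIT;
    s.jmp_buf_bytes = sizeof(jmp_buf);
    s.char_is_signed = (CHAR_MIN < 0);
    return s;
}

void print_abi_summary(const abi_summary *s)
{
    assert(s != NULL);
    printf("long: %zu bits, pointer: %zu bits, jmp_buf: %zu bytes, "
           "plain char: %s\n",
           s->long_bits, s->pointer_bits, s->jmp_buf_bytes,
           s->char_is_signed ? "signed" : "unsigned");
}
```

Building and running the same source for two targets and comparing the printed lines shows which parts of these headers a port has to supply.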

Rich Felker via llvm-dev

Jun 25, 2019, 5:33:53 PM
to llvm...@lists.llvm.org

Since I have a little experience in this area, I'd like to chime in on
it. :-) TL;DR I think it's a really, REALLY bad idea.

First, writing and maintaining a correct, compatible, high-quality
libc is a monumental task. The amount of code needed is not all that
large, but the subtleties of how it behaves and the difficulties of
implementing various interfaces that have no capacity to fail or
report failure, and the astronomical "compatibility surface" of
interfacing with all C and C++ software ever written as well as a
large amount of software written in other languages whose runtimes
"pass through" the behavior of libc to the applications they host, all
contribute to the scale of work, and of knowledge/expertise, involved
in making something of even decent quality. (As an aside, note that I
love to see hobby libc projects even if they have major problems, but
that's totally different from proposing something that lots of people
will end up stuck using.)

Second, corporate development teams are uniquely qualified to utterly
botch a libc, yet still push it into widespread use, and the cost is
painful compatibility hacks in all applications. Apple did this with
their fork of BSD libc code. Google has done it once already with
their fork of musl in Fuchsia -- a project which I contributed
significant amounts of free labor to in terms of tracking down folks
for license clarification their lawyers wanted, only to have them
never bother to ask me why technical things were done the way they
were before making random useless and broken changes in their fork. A
corporate-led project does not have to answer to the community, and
will leave whatever bugs they introduce in place for the sake of
bug-compatibility with their own software rather than fixing them.

Third, there is tremendous value in non-monoculture of libc
implementations, or implementations of any important library
interfaces or language runtimes. Likewise there's tremendous value in
non-monoculture of tooling (compilers, linkers, etc.). Avoiding
monoculture preserves the motivation for consensus-based standards
processes rather than single-party control (see also: Chrome and what
it's done to the web) and the motivation for people writing software
to write to the standards rather than to a particular implementation.
A big part of making that possible is clear delineation of roles
between parts of the toolchain and runtime, with well-defined
interface boundaries. Some folks have told me that I should press LLVM
to make musl the "LLVM libc" instead of whatever Google wants to do,
but that misses the point: there *shouldn't be* a "LLVM libc", or any
one library implementation that's "first class" for use with LLVM
while others are only "second class".

So, in summary:

Point 1 is why making a libc for real-world use is not to be taken
lightly.

Point 2 is why, if it is done, it shouldn't be a Google project.

Point 3 is why there should not be an "LLVM libc".

Hope this is all helpful.

Regards,

Rich

Zachary Turner via llvm-dev

Jun 25, 2019, 6:13:59 PM
to Rich Felker, llvm-dev
Doesn't having additional libc implementations to choose from
contribute *to* the ideal of not having a monoculture?

Also, I didn't read the proposal as segregating the world into first
class and second class libc implementations. For example, libc++
currently works fine with non LLVM-based toolchains, and libstdc++
currently works fine with LLVM-based toolchains. Do you see libc as
fundamentally different in this regard?


Regarding your second point, if Google were to write a libc
implementation and then upstream it in bulk, I would agree with you.
But being done in the open appears to solve the exact problem you are
concerned about, which is that corporate interests will lead to
lasting design decisions that aren't in the best interest of the
general public. By doing it in the open, such problems can be
addressed before the code is ever committed.

Chandler Carruth via llvm-dev

Jun 25, 2019, 6:39:48 PM
to Rich Felker, llvm-dev
I'm gonna let the folks working on this respond to technical points, but some meta points about discussion on this list...

On Tue, Jun 25, 2019 at 2:33 PM Rich Felker via llvm-dev <llvm...@lists.llvm.org> wrote:
> Since I have a little experience in this area, I'd like to chime in on
> it. :-) TL;DR I think it's a really, REALLY bad idea.

In case there is any confusion, I'm really glad you're participating in the discussion here because of this background.
 
> Second, corporate development teams are uniquely qualified to utterly
> botch a libc, yet still push it into widespread use, and the cost is
> painful compatibility hacks in all applications. Apple did this with
> their fork of BSD libc code. Google has done it once already with
> their fork of musl in Fuchsia

Let's keep this focused on technical issues and LLVM issues. None of the above (or the text in this paragraph I've snipped out) has anything to do with those, and I don't think the LLVM list is the right place to discuss that.

LLVM has a long and effective history of both individuals and corporations working effectively together in the open as part of the project. I don't think this project poses any risk there, much like Zach points out in his reply. Google is specifically discussing this early and trying to participate in the open process of the LLVM community from the outset. =]

Also, I'd suggest using more specific technical language than "botch" and "hacks" to make the discussion more productive.


With that, I'll wander off and let you all dig into the real issues here.
-Chandler

Siva Chandra via llvm-dev

Jun 25, 2019, 6:47:33 PM
to Jake Ehrlich, llvm-dev
On Mon, Jun 24, 2019 at 3:37 PM Jake Ehrlich <jakehe...@google.com> wrote:
disclaimer: I work at Google so don't take my +1 as an independent vote forward.

We would like to use this on Fuchsia and I am particularly interested in creating a dynamic linking library for ELF with Roland McGrath's guidance. We spoke about creating a library for writing dynamic linkers internally and I don't see why this can't be upstreamed.

If dynamic linking support is added in an "as a library" fashion, so that it can easily be excluded if not required without affecting the rest of the system, I do not see any problems adding it.
 
On Fuchsia we critically need support for AArch64; What do you expect to be architecture dependent? I struggled to think of where the architecture and not the operating system was the issue.

I think syscalls are an example of something architecture specific? And items like program startup and the PIE loader are operating system and executable format specific?

Just for my knowledge, why is answering these questions at a general level important? 

Jake Ehrlich via llvm-dev

Jun 25, 2019, 7:05:59 PM
to Siva Chandra, llvm-dev
Syscalls are operating system specific and architecture dependent so I think we'll want an abstraction layer around the fundamental operations the syscalls support anyway. Some things like open aren't even syscalls on all operating systems. There might be a generic syscall layer added that would be architecture and not operating system specific but even on x86_64 there are two different ways to do syscalls I think. Loading, startup, and linking are all both format and operating system specific and a few of these details involved are determined by the architecture but they're trivially abstracted away.


why is answering these questions at a general level important? 

Because I wanted to make sure I understood the direction and the restriction stated. The restriction on what architecture will be used without stating a restriction on the operating system seemed like an odd statement. I'd very much like operating system abstractions to be considered right out of the gate and this seems like a bigger issue than the architecture to me.
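
As a rough illustration of the abstraction layer over fundamental operations mentioned above, here is a minimal sketch in C. Nothing in it comes from the proposal: the `os_ops` table, `demo_puts`, and the in-memory `mem_write` backend are hypothetical names chosen for illustration. The idea is only that portable code calls through a table, and each OS/architecture pair supplies its own table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of fundamental operations. Each OS/architecture
   pair would supply its own instance; nothing here is a real LLVM API. */
struct os_ops {
    long (*write)(void *ctx, int fd, const void *buf, size_t len);
    void *ctx;
};

/* Portable code: written only against os_ops, never a raw syscall.
   Returns the number of bytes written, or a negative value on error. */
long demo_puts(const struct os_ops *os, const char *s)
{
    long n = os->write(os->ctx, 1, s, strlen(s));
    if (n < 0)
        return n;
    long m = os->write(os->ctx, 1, "\n", 1);
    return m < 0 ? m : n + m;
}

/* A fake backend that records output in memory. It stands in for a
   real Linux/x86_64, Fuchsia, or Windows backend. */
struct mem_sink {
    char data[64];
    size_t used;
};

long mem_write(void *ctx, int fd, const void *buf, size_t len)
{
    struct mem_sink *sink = ctx;
    (void)fd; /* the fake backend ignores the descriptor */
    if (len > sizeof sink->data - sink->used)
        return -1;
    memcpy(sink->data + sink->used, buf, len);
    sink->used += len;
    return (long)len;
}
```

A side benefit of this shape is that the portable part can be tested against the in-memory backend without any operating system support at all.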

Siva Chandra via llvm-dev

Jun 25, 2019, 7:20:22 PM
to Zachary Turner, llvm-dev
On Mon, Jun 24, 2019 at 3:43 PM Zachary Turner <ztu...@roblox.com> wrote:
What do you expect the support for Windows to be?  Certainly, I don't
expect you to provide Windows support personally if you don't need it,
but given that LLVM supports Windows, it should at least be done in
such a way that the design lends itself to interested parties
contributing Windows support.

We are not going to disallow support for items/features we do not plan to implement ourselves. Contributions will be welcome.

As I have mentioned in another email, we really want to develop everything in an "as a library" fashion so that adding support for new items/features isn't blocked by design.

Jake Ehrlich via llvm-dev

Jun 25, 2019, 7:25:37 PM
to Siva Chandra, llvm-dev
This may not be the right place or moment to discuss it, but a separate email thread to discuss and gather requirements for an operating system abstraction layer seems to be required, then. We don't want the implementation to be coupled too tightly with Linux if we want to support BSD, Windows, and Fuchsia as well. I also have hopes that hobbyist operating system developers could use this. Libc implementations for hobby OS projects were a pain point for me personally.

Siva Chandra via llvm-dev

Jun 25, 2019, 7:30:12 PM
to Finkel, Hal J., llvm...@lists.llvm.org
On Mon, Jun 24, 2019 at 3:45 PM Finkel, Hal J. <hfi...@anl.gov> wrote:
On 6/24/19 5:23 PM, Siva Chandra via llvm-dev wrote:

Hello LLVM Developers,


Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.


Informal conversations with others within the LLVM community has told us that a libc in LLVM is actually a broader need,


+1 - This has also been my experience: Many people over many years have expressed a desire to have a libc as part of the LLVM project. It is currently a large gap in our LLVM toolchain offering. Moreover, from the standpoint of my organization, an LLVM libc could provide benefits on both production platforms and research/experimental hardware.


and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project. 


To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.


We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:


  1. The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.

  2. The libc should support static non-PIE and static-PIE linking. This means, providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.

  3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.

  4. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.

  5. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.


Great.



There are also few areas which we do not intend to invest in at this point:


  1. Implement dynamic loading and linking support.


It will be useful to have a design document that describes the kind of system and capabilities that you're targeting, and then we can discuss how the libc might have a modular design that can be adapted for other use cases. I mention modularity because, for example, we have accelerator hardware and various kinds of low-variability/embedded environments where many, but not all, POSIX/libc capabilities make sense.

I am of the opinion that modularity should be as fine-grained as possible. For example, one should be able to pick and package individual functions into a libc as suitable for their platform.
That said, I am open to other ideas you might have about modularity. I am also open to being convinced that function-level granularity is overkill.

Zachary Turner via llvm-dev

Jun 25, 2019, 7:32:40 PM
to Siva Chandra, llvm-dev
The main concern I have is that Windows is so different from
everything else that there is a high likelihood of decisions being
baked in early on that make things very difficult for people to come
along later and contribute a Windows implementation. This happened
with sanitizers for example (lack of support for weak functions on
Windows), LLDB (posix api calls scattered throughout the codebase),
and I worry with libc it will be even more difficult to correctly
design the abstraction because we have to deal with executable file
format, syscalls, operating system loaders, and various linkage
models.

The most immediate thing I think we will run into is that you
mentioned wanting this to take shape as something that sits in between
system libc and application. Given that Windows' libc and other
versions of libc are so different, I expect this to lead to some
interesting problems.

Can you elaborate more on how you envision this working with llvm libc
in between application and system libc?


Siva Chandra via llvm-dev

Jun 25, 2019, 7:49:24 PM
to Jake Ehrlich, llvm-dev
On Tue, Jun 25, 2019 at 4:05 PM Jake Ehrlich <jakehe...@google.com> wrote:
Syscalls are operating system specific and architecture dependent so I think we'll want an abstraction layer around the fundamental operations the syscalls support anyway. Some things like open aren't even syscalls on all operating

Right, syscalls are OS _and_ architecture dependent. So yes, one will have to build abstraction layers over fundamental operations in general.
 
systems. There might be a generic syscall layer added that would be architecture and not operating system specific but even on x86_64 there are two different ways to do syscalls I think. Loading, startup, and linking are all both format and operating system specific and a few of these details involved are determined by the architecture but they're trivially abstracted away.

why is answering these questions at a general level important? 

Because I wanted to make sure I understood the direction and the restriction stated. The restriction on what architecture will be used without stating a restriction on the operating system seemed like an odd statement. I'd very much like operating system abstractions to be considered right out of the gate and this seems like a bigger issue than the architecture to me.

Ah, I see what happened.
So, we are definitely not restricting anything by design here. All we are saying is that we do not intend to contribute beyond x86_64 and Linux to begin with. The community is free to contribute and widen the scope as suitable.

With respect to how exactly we want to build the abstractions, I am of the opinion that we have to go on a case by case basis. The scope of the project is so large that I think it is more meaningful to discuss designs at a more narrow level based on the area that is being worked on.  Sure, we might end up discovering patterns down the road and choose to unify certain things eventually.

Siva Chandra via llvm-dev

Jun 25, 2019, 8:01:17 PM
to Zachary Turner, llvm-dev
On Tue, Jun 25, 2019 at 4:32 PM Zachary Turner <ztu...@roblox.com> wrote:
The main concern I have is that Windows is so different from
everything else that there is a high likelihood of decisions being
baked in early on that make things very difficult for people to come
along later and contribute a Windows implementation.  This happened
with sanitizers for example (lack of support for weak functions on
Windows), LLDB (posix api calls scattered throughout the codebase),
and I worry with libc it will be even more difficult to correctly
design the abstraction because we have to deal with executable file
format, syscalls, operating system loaders, and various linkage
models.

The most immediate thing I think we will run into is that you
mentioned wanting this to take shape as something that sits in between
system libc and application.  Given that Windows' libc and other
versions of libc are so different, I expect this to lead to some
interesting problems.

Can you elaborate more on how you envision this working with llvm libc
in between application and system libc?

A typical application uses a large number of pieces from a libc. But, it is not practical to have everything implemented and ready in a new libc from day one. So for that phase, when the new libc is still being built, we want the unimplemented parts of the new libc to essentially redirect to the system libc. This brings two benefits:

1. We can build the new libc in a gradual manner.
2. Applications stay operational while gaining the benefits of the new implementations.

Do you foresee any problems with this approach on Windows?
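
To make the redirect idea concrete, here is a minimal sketch in C, not the actual design. The `demo_` names are hypothetical: one function is implemented locally, and one that is "not implemented yet" simply forwards to the system libc's `qsort`. Both are stateless, which is what makes the split straightforward in this particular case.

```c
#include <stddef.h>
#include <stdlib.h>

/* Implemented in the new libc: no hidden state involved. */
size_t demo_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        ++p;
    return (size_t)(p - s);
}

/* Not implemented yet: redirect to the system libc. */
void demo_qsort(void *base, size_t n, size_t size,
                int (*cmp)(const void *, const void *))
{
    qsort(base, n, size, cmp);
}

/* Comparison helper for the usage example; avoids overflow from a - b. */
int demo_cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}
```

For example, sorting `int v[] = {3, 1, 2}` with `demo_qsort(v, 3, sizeof v[0], demo_cmp_int)` leaves it as `{1, 2, 3}`, with the actual work done by the system libc.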

Finkel, Hal J. via llvm-dev

Jun 25, 2019, 8:18:17 PM
to Siva Chandra, llvm...@lists.llvm.org


On 6/25/19 6:29 PM, Siva Chandra wrote:


On Mon, Jun 24, 2019 at 3:45 PM Finkel, Hal J. <hfi...@anl.gov> wrote:
On 6/24/19 5:23 PM, Siva Chandra via llvm-dev wrote:

Hello LLVM Developers,


Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.


Informal conversations with others within the LLVM community has told us that a libc in LLVM is actually a broader need,


+1 - This has also been my experience: Many people over many years have expressed a desire to have a libc as part of the LLVM project. It is currently a large gap in our LLVM toolchain offering. Moreover, from the standpoint of my organization, an LLVM libc could provide benefits on both production platforms and research/experimental hardware.


and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project. 


To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.


We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:


  1. The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.

  2. The libc should support static non-PIE and static-PIE linking. This means, providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.

  3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.

  4. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.

  5. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.


Great.



There are also few areas which we do not intend to invest in at this point:


  1. Implement dynamic loading and linking support.


It will be useful to have a design document that describes the kind of system and capabilities that you're targeting, and then we can discuss how the libc might have a modular design that can be adapted for other use cases. I mention modularity because, for example, we have accelerator hardware and various kinds of low-variability/embedded environments where many, but not all, POSIX/libc capabilities make sense.

I am of the opinion that modularity should be as fine-grained as possible. For example, one should be able to pick and package individual functions into a libc as suitable for their platform.
That said, I am open to other ideas you might have about modularity. I am also open to being convinced that function-level granularity is overkill.


This sounds like a good starting position to me. We can adjust over time.

Thanks again,

Hal

Zachary Turner via llvm-dev

Jun 25, 2019, 8:23:10 PM
to Siva Chandra, llvm-dev
I foresee problems with this on both Windows and non-Windows. A
typical libc implementation has a lot of internal state that is shared
across API boundaries in a way that is considered an implementation
detail. So making assumptions about which state is shared and which
isn't is going to be a problem.

How do you guarantee that if you implement method A and forward method
B, that B will behave the same as it would have if you had forwarded A
also? It might not even work at all. Where can you safely draw this
boundary?

Users can set errno for example, and in many cases they must set errno
to 0 before invoking a call if they want to reliably detect an error.
So let's say they set errno to 0, then call a method which our libc
implementation decides to forward. What do we do? We could propagate
errno on every single call, but my point is that there are going to be
a ton of subtle issues that arise from this approach that are hard to
foresee, precisely because the implementation details of a libc
implementation are supposed to be just that - implementation details.
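
For readers less familiar with the errno idiom referred to above, a minimal sketch in standard C follows. The wrapper name `demo_parse_long` is hypothetical. The caller clears errno, calls `strtol`, and then inspects errno. This only works if the function that sets errno and the code that reads it agree on which errno object they mean, which is exactly what a partial forwarding scheme has to guarantee.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 0 and stores the parsed value on success, -1 on failure. */
int demo_parse_long(const char *s, long *out)
{
    char *end;

    errno = 0; /* must be cleared first: strtol never resets it */
    long v = strtol(s, &end, 10);

    /* Out-of-range input is reported only through errno (ERANGE);
       the other two checks reject empty input and trailing junk. */
    if (errno != 0 || end == s || *end != '\0')
        return -1;

    *out = v;
    return 0;
}
```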

Finkel, Hal J. via llvm-dev

Jun 25, 2019, 9:20:19 PM
to Zachary Turner, Siva Chandra, llvm-dev

On 6/25/19 7:22 PM, Zachary Turner via llvm-dev wrote:
> I foresee problems with this on both Windows and non-Windows. A
> typical libc implementation has a lot of internal state that is shared
> across API boundaries in a way that is considered an implementation
> detail. So making assumptions about which state is shared and which
> isn't is going to be a problem.
>
> How do you guarantee that if you implement method A and forward method
> B, that B will behave the same as it would have if you had forwarded A
> also? It might not even work at all. Where can you safely draw this
> boundary?
>
> Users can set errno for example, and in many cases they must set errno
> to 0 before invoking a call if they want to reliably detect an error.
> So let's say they set errno to 0, then call a method which our libc
> implementation decides to forward. What do we do? We could propagate
> errno on every single call, but my point is that there are going to be
> a ton of subtle issues that arise from this approach that are hard to
> foresee, precisely because the implementation details of a libc
> implementation are supposed to be just that - implementation details.


You certainly can't mix-and-match on a per-function level, in general. I
suspect that there are some subsystems that can be substituted. Using
open from one libc and close from another seems problematic. Using open
and close from one libc and qsort from another is probably fine. And, as
you point out, the library might need to be configurable to use an
externally-provided errno.
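
One possible reading of "configurable to use an externally-provided errno" is sketched below in C. This is an assumption about what such a hook could look like, not a description of any existing libc: `demo_set_errno_location`, `demo_close_checked`, and `host_errno` are hypothetical names. The library writes errors through a replaceable accessor, which defaults to the system errno.

```c
#include <errno.h>
#include <stddef.h>

/* Default: report errors through the system libc's errno. */
static int *default_errno_location(void)
{
    return &errno;
}

static int *(*errno_location)(void) = default_errno_location;

/* Hypothetical configuration hook; passing NULL restores the default. */
void demo_set_errno_location(int *(*fn)(void))
{
    errno_location = fn != NULL ? fn : default_errno_location;
}

/* A toy library function that reports failure via the configured errno.
   It performs no real I/O; it only rejects negative descriptors. */
int demo_close_checked(int fd)
{
    if (fd < 0) {
        *errno_location() = EBADF;
        return -1;
    }
    return 0;
}

/* An externally provided errno, as a host environment might supply. */
int host_errno;

int *host_errno_location(void)
{
    return &host_errno;
}
```

After `demo_set_errno_location(host_errno_location)`, a failing `demo_close_checked(-1)` stores `EBADF` in `host_errno` and leaves the system errno untouched.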

 -Hal
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Jorg Brown via llvm-dev

Jun 25, 2019, 10:23:36 PM
to Rich Felker, llvm...@lists.llvm.org
I agree with your point 1.

With regards to point 2, I think there's a difference between Fuchsia, which Google controls (where every check-in is authored by a Fuchsia eng and reviewed by another Fuchsia eng), and LLVM, which Google doesn't control.  There's also a difference between Google in general, and the Fuchsia project, which I'd summarize as simply: Google is not a monoculture.  Case in point: Jake, who works at Google, immediately countered Siva's suggestion that "Support for more architectures" is not something Google intends to invest in, by pointing out his need for AArch64 support. I work for Google too, and I personally need RISC-V support. (Separately, I'm sorry to hear about your experience with Fuchsia's musl fork... though I've not worked on Fuchsia and have no knowledge of that situation and therefore won't say anything more about it.)

With regards to point 3, I agree with your points; in particular, I agree that it's important for there to be a variety of libc implementations.  But it seems to me that while GNU has both gcc and glibc, gcc doesn't require the use of glibc, and I would anticipate that clang would never require llvmlibc.  I would anticipate that a user would continue to have their choice of compiler, their choice of STL implementation, their choice of libc implementation.  To the extent that there would be a "library implementation that's first-class for use with LLVM", I think there already is: glibc.  But it would be better if there were two first-class implementations.

Mehdi AMINI via llvm-dev

Jun 26, 2019, 1:42:04 AM
to Siva Chandra, llvm-dev
On Tue, Jun 25, 2019 at 4:49 PM Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:
On Tue, Jun 25, 2019 at 4:05 PM Jake Ehrlich <jakehe...@google.com> wrote:
Syscalls are operating system specific and architecture dependent so I think we'll want an abstraction layer around the fundamental operations the syscalls support anyway. Some things like open aren't even syscalls on all operating

Right, syscalls are OS _and_ architecture dependent. So yes, one will have to build abstraction layers over fundamental operations in general.
 
systems. There might be a generic syscall layer added that would be architecture and not operating system specific but even on x86_64 there are two different ways to do syscalls I think. Loading, startup, and linking are all both format and operating system specific and a few of these details involved are determined by the architecture but they're trivially abstracted away.

why is answering these questions at a general level important? 

Because I wanted to make sure I understood the direction and the restriction stated. The restriction on what architecture will be used without stating a restriction on the operating system seemed like an odd statement. I'd very much like operating system abstractions to be considered right out of the gate and this seems like a bigger issue than the architecture to me.

Ah, I see what happened.
So, we are definitely not restricting anything by design here. All we are saying is that we do not intend to contribute beyond x86_64 and Linux to begin with. The community is free to contribute and widen the scope as suitable.

IMO it is perfectly fine to have a favorite target in mind that you want to put your effort into supporting.
However, if the project is not started from the ground up by involving people who care about other platforms (and you have enough variety of these), then it is likely that assumptions about your favorite platform will be baked into the foundations of the project, and it'll be technically hard for the community to re-use these pieces in the future or contribute support for their platform (I'm making a similar point to Zach here).

If we must have a libc in LLVM, I hope it will be designed and implemented from the beginning with multiple operating systems and at least two architectures. Even if you only really care about X86/Linux, you may have to put some minimal amount of effort into supporting Windows just to prove your design (ideally there would be enough support in the community so that the effort to support Windows isn't only on you).

Cheers,

-- 
Mehdi


 

With respect to how exactly we want to build the abstractions, I am of the opinion that we have to go on a case by case basis. The scope of the project is so large that I think it is more meaningful to discuss designs at a more narrow level based on the area that is being worked on.  Sure, we might end up discovering patterns down the road and choose to unify certain things eventually.

Siva Chandra via llvm-dev

Jun 26, 2019, 2:05:44 AM
to Mehdi AMINI, llvm-dev
On Tue, Jun 25, 2019 at 10:41 PM Mehdi AMINI <joke...@gmail.com> wrote:
On Tue, Jun 25, 2019 at 4:49 PM Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:
On Tue, Jun 25, 2019 at 4:05 PM Jake Ehrlich <jakehe...@google.com> wrote:
Syscalls are operating system specific and architecture dependent so I think we'll want an abstraction layer around the fundamental operations the syscalls support anyway. Some things like open aren't even syscalls on all operating

Right, syscalls are OS _and_ architecture dependent. So yes, one will have to build abstraction layers over fundamental operations in general.
 
systems. There might be a generic syscall layer added that would be architecture and not operating system specific but even on x86_64 there are two different ways to do syscalls I think. Loading, startup, and linking are all both format and operating system specific and a few of these details involved are determined by the architecture but they're trivially abstracted away.

why is answering these questions at a general level important? 

Because I wanted to make sure I understood the direction and the restriction stated. The restriction on what architecture will be used without stating a restriction on the operating system seemed like an odd statement. I'd very much like operating system abstractions to be considered right out of the gate and this seems like a bigger issue than the architecture to me.

Ah, I see what happened.
So, we are definitely not restricting anything by design here. All we are saying is that we do not intend to contribute beyond x86_64 and Linux to begin with. The community is free to contribute and widen the scope as suitable.

IMO it is perfectly fine to have a favorite target in mind that you want to put your effort into supporting.
However, if the project is not started from the ground up by involving people who care about other platforms (and you have enough variety of these), then it is likely that assumptions about your favorite platform will be baked into the foundations of the project, and it'll be technically hard for the community to re-use these pieces in the future or contribute support for their platform (I'm making a similar point to Zach here).

If we must have a libc in LLVM, I hope it will be designed and implemented from the beginning with multiple operating systems and at least two architectures. Even if you only really care about X86/Linux, you may have to put some minimal amount of effort into supporting Windows just to prove your design (ideally there would be enough support in the community so that the effort to support Windows isn't only on you).

Right. If my first email sounded like we care only about x86_64 and Linux, let me improve on it: Our intention is not to design out any particular platform. At this point in time, the team I represent is most interested in Linux and x86_64. We intend to do all development in the open from day one. So, we will be looking to the community members interested in other platforms to bring their views and needs and help build stronger abstractions.
 

Siva Chandra via llvm-dev

Jun 26, 2019, 2:31:20 AM
to Rich Felker, llvm-dev
On Tue, Jun 25, 2019 at 12:12 PM Rich Felker <dal...@libc.org> wrote:
First, writing and maintaining a correct, compatible, high-quality
libc is a monumental task.
Point 1 is why making a libc for real-world use is not to be taken
lightly.

We totally understand the magnitude of this undertaking :)

Point 2 is why, if it is done, it shouldn't be a Google project.

The very point of my first email in this thread was to ask if this can be made part of the LLVM project, developed and maintained by the LLVM community.
 
Point 3 is why there should not be an "LLVM libc".
 
If there can be a C++ standard library and runtime implementation as part of the LLVM project, I do not see a reason why there cannot be a libc implementation as part of the LLVM project.

Jan Ziak via llvm-dev

Jun 26, 2019, 9:48:47 AM
to llvm...@lists.llvm.org
Hi

I find it interesting that a reimplementation of libc is being
discussed without clearly stating the differences and benefits of the
new implementation.

Or did I miss the discussion about the differences and benefits?

Sincerely
Jan

Zachary Turner via llvm-dev

Jun 26, 2019, 10:03:40 AM
to Jan Ziak, llvm...@lists.llvm.org
On Wed, Jun 26, 2019 at 6:48 AM Jan Ziak via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi

I find it interesting that a reimplementation of libc is being
discussed without clearly stating the differences and benefits of the
new implementation.

Or did I miss the discussion about the differences?

I can’t speak for Siva and/or Google, but two obvious differences that come to mind for me personally are:

1) the license

2) there is currently no existing open source implementation of libc for Windows that works with native Windows / MSABI toolchains.  clang on Windows, for example, has a hard dependency on a full Visual Studio installation for exactly this reason.

Stan Shebs via llvm-dev

Jun 26, 2019, 10:58:56 AM
to Jan Ziak, llvm...@lists.llvm.org
On Wed, Jun 26, 2019 at 8:49 AM Jan Ziak via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> Hi
>
> I find it interesting that a reimplementation of libc is being
> discussed without clearly stating the differences and benefits of the
> new implementation.
>
> Or did I miss the discussion about the differences and benefits?

Siva touched on them at the top of his message, but it may have gotten lost
a bit in the ensuing discussion. Plus the differences and benefits
will to some extent depend on where people want to go with it.

As someone who has spent the last several years hammering the round
peg of glibc into the square hole that is Google production
infrastructure (it does work, breadcrumbs to the glibc branches here -
https://sourceware.org/glibc/wiki/GlibcGit/google_namespace ), I find
a couple opportunities especially appealing:

1) Static linking, which opens up opportunities for whole-program
analysis. In theory, one could do this with glibc, but even aside
from the LGPL issue, the code is built to present a versioned-symbol
ABI. Imagine being able to trace all the way to the bottom of a
printf call, and only incorporate the bits that you actually use, or
being able to elide locks in a known safe zone.

2) Updated build machinery and coding style. I'm sure it's urban
legend that glibc was written as a test case for every GNU Make
feature :-) but its makefiles are pretty intricate, there are a bunch
of cases where chunks of code are synthesized by make rules, and a
bunch more where important code is in the bodies of multi-page
multi-level C macros, in the best programming style of the 1980s.

Shawn Webb via llvm-dev

Jun 26, 2019, 11:19:33 AM
to Siva Chandra, llvm-dev
On Tue, Jun 25, 2019 at 03:47:15PM -0700, Siva Chandra via llvm-dev wrote:
> On Mon, Jun 24, 2019 at 3:37 PM Jake Ehrlich <jakehe...@google.com>
> wrote:
>
> > disclaimer: I work at Google so don't take my +1 as an independent vote
> > forward.
> >
> > We would like to use this on Fuchsia and I am particularly interested in
> > creating a dynamic linking library for ELF with Roland McGrath's guidance.
> > We spoke about creating a library for writing dynamic linkers internally
> > and I don't see why this can't be upstreamed.
> >
>
> If dynamic linking support is added in a "as a library" fashion, so that it
> can easily be excluded if not required without affecting the rest of the
> system, I do not see any problems adding it.

It would be very cool to see an RTLD in llvm, especially one that
could support multiple formats, like ELF and LLVM IR. If I understand
llvm's Cross-DSO CFI support properly (which I may not), the entire
address space for a dlopen()ed DSO (as in, where the DSO was loaded in
memory) is added to the cfi-icall whitelist. An RTLD in llvm could
intelligently integrate with the sanitizers to provide the same level
of granularity for dlopen()ed libraries as for ELF::DT_NEEDED ones.

One could also complete the required SafeStack integration with the
RTLD, which requires bringing in the sanitizer framework, anyways.

A portable libc and RTLD provided by llvm that can make integral use
of the sanitizers, especially CFI and SafeStack, would be absolutely
lovely. If not done within llvm, HardenedBSD needs to do it, anyways,
in order to apply SafeStack and Cross-DSO CFI properly to both DSOs
and applications. If not done within llvm, HardenedBSD will simply
continue using and maintaining patches on top of the libc and RTLD
we inherit from our upstream FreeBSD.

Thanks,

--
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

Tor-ified Signal: +1 443-546-8752
Tor+XMPP+OTR: lat...@is.a.hacker.sx
GPG Key ID: 0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9 3633 C85B 0AF8 AB23 0FB2

Andrew Kelley via llvm-dev

Jun 26, 2019, 12:02:50 PM
to llvm...@lists.llvm.org
On 6/24/19 6:23 PM, Siva Chandra via llvm-dev wrote:
> Within Google, we have a growing range of needs that existing libc
> implementations don't quite address.
> To be very clear: we don't expect our needs to exactly match everyone
> else's -- part of our impetus is to simplify things wherever we can, and
> that may not quite match what others want in a libc.
> There are also few areas which we do not intend to invest in at this point:
> Implement dynamic loading and linking support.
> Support for more architectures (we'll start with just x86-64 for
> simplicity).
> So, what do you think about incorporating this new libc under the LLVM
> project?
The null hypothesis is to not add a project to LLVM. In order to add a
project, it should be justified. What are the justifications here? I've
quoted the snippets above where it is made clear that Google's needs do
*not* line up with the needs of the community. But the proposal failed
to mention what the actual needs of Google are.

So what are they?

The current list of C ABI environments which LLVM recognizes is:

none
gnu
gnuabin32
gnuabi64
gnueabi
gnueabihf
gnux32
code16
eabi
eabihf
android
musl
musleabi
musleabihf
msvc
itanium
cygnus
coreclr
simulator

Would this proposed libc be adding a new C ABI environment to this list,
or maintaining API/ABI compatibility with one or more of these?

Finally, I'm only aware of 2 operating systems where the libc is not an
integral part of the system, namely Linux and Windows. For example on
macOS, FreeBSD, OpenBSD, and DragonFlyBSD, the libc is guaranteed to be
available, and must be dynamically linked, because this is the stable
syscall ABI. So it would only make sense for an LLVM libc to be for
Linux and Windows. It seems reasonable to assume that Google is only
interested in Linux. In this case I have to re-iterate my original
question, what are the needs that are not being met by existing Linux
libcs, such as musl?

Regards,
Andrew


Chris Lattner via llvm-dev

Jun 26, 2019, 12:43:09 PM
to Andrew Kelley, llvm...@lists.llvm.org
On Jun 26, 2019, at 9:02 AM, Andrew Kelley via llvm-dev <llvm...@lists.llvm.org> wrote:
> On 6/24/19 6:23 PM, Siva Chandra via llvm-dev wrote:
>> Within Google, we have a growing range of needs that existing libc
>> implementations don't quite address.
>> To be very clear: we don't expect our needs to exactly match everyone
>> else's -- part of our impetus is to simplify things wherever we can, and
>> that may not quite match what others want in a libc.
>> There are also few areas which we do not intend to invest in at this point:
>> Implement dynamic loading and linking support.
>> Support for more architectures (we'll start with just x86-64 for
>> simplicity).
>> So, what do you think about incorporating this new libc under the LLVM
>> project?
> The null hypothesis is to not add a project to LLVM. In order to add a
> project, it should be justified. What are the justifications here? I've
> quoted the snippets above where it is made clear that Google's needs do
> *not* line up with the needs of the community. But the proposal failed
> to mention what the actual needs of Google are.
>
> So what are they?

I really have nothing to do with this project, and no insight on the thoughts behind it, but I think you and several other people on this thread have missed a significant issue: the thread is conflating whether it is a good idea to "create yet another libc" with whether it is a good idea to "contribute that code to LLVM". I don’t think arguing whether or not someone should build a project is on-topic for this list. Given that they appear motivated to build it, the question is whether this fits into the LLVM umbrella.

With my LLVM hat on (I also work for Google, but am unaffiliated and uninvolved with this proposal), it appears clearly beneficial for LLVM to have a libc if it were done well. That said, clang shouldn’t/couldn't *require* one specific libc, just like we don’t require libc++ as the standard library. We want LLVM components to be mixable and matchable.

I appreciate the comments on this thread that are throwing in ideas for how to make the project better, how to ensure it grows to being a successful and widely useful component of LLVM, etc. I for one think that this could be very useful for people building custom micro targets, and being able to build custom configs of a libc without (e.g.) stdio or libm would be a nice way to shed weight.

-Chris

Andrew Kelley via llvm-dev

Jun 26, 2019, 1:17:11 PM
to Chris Lattner, llvm...@lists.llvm.org
On 6/26/19 12:42 PM, Chris Lattner wrote:
> On Jun 26, 2019, at 9:02 AM, Andrew Kelley via llvm-dev <llvm...@lists.llvm.org> wrote:
>> On 6/24/19 6:23 PM, Siva Chandra via llvm-dev wrote:
>>> Within Google, we have a growing range of needs that existing libc
>>> implementations don't quite address.
>>> To be very clear: we don't expect our needs to exactly match everyone
>>> else's -- part of our impetus is to simplify things wherever we can, and
>>> that may not quite match what others want in a libc.
>>> There are also few areas which we do not intend to invest in at this point:
>>> Implement dynamic loading and linking support.
>>> Support for more architectures (we'll start with just x86-64 for
>>> simplicity).
>>> So, what do you think about incorporating this new libc under the LLVM
>>> project?
>> The null hypothesis is to not add a project to LLVM. In order to add a
>> project, it should be justified. What are the justifications here? I've
>> quoted the snippets above where it is made clear that Google's needs do
>> *not* line up with the needs of the community. But the proposal failed
>> to mention what the actual needs of Google are.
>>
>> So what are they?
>
> I really have nothing to do with this project, and no insight on the thoughts behind it, but I think you and several other people on this thread have missed a significant issue: the thread is conflating whether it is a good idea to "create yet another libc" with whether it is a good idea to "contribute that code to LLVM". I don’t think arguing whether or not someone should build a project is on-topic for this list. Given that they appear motivated to build it, the question is whether this fits into the LLVM umbrella.

I don't understand your reasoning here. If there's reason to believe it
should not be built at all, wouldn't that also imply that it shouldn't
be taken under LLVM's umbrella? The LLVM community (including myself)
will be responsible for maintaining this software and to do that we must
figure out the specifications, trade-offs, and use cases. How should we
determine the requirements of something that has no reason to exist?


JF Bastien via llvm-dev

Jun 26, 2019, 1:27:57 PM
to Siva Chandra, llvm...@lists.llvm.org
On Jun 24, 2019, at 3:23 PM, Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:

Hello LLVM Developers,

Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.

Informal conversations with others within the LLVM community has told us that a libc in LLVM is actually a broader need, and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project. 

To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.

We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:

  1. The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.
  2. The libc should support static non-PIE and static-PIE linking. This means, providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.
  3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.

I’d love to hear what you have in mind with point 3 above, and see it expanded. libc++ implements C++11 and subsequent standards, and that makes me wonder:

  • Which standards would this libc implement?
  • Would you implement upcoming C standards, and how would you manage “experimental” features (API changes, ABI changes, etc)?
  • What parts of the standard wouldn’t you follow, why, how would the LLVM community determine this?
  • Which parts aren’t worth implementing?
  • Which parts cannot be safely used in modern coding practice? How would you remedy what’s perceived as “the bad parts”?
  • I’d love it if the C Standards Committee, WG14, got renewed involvement through this project. Is that an explicit goal? Who will join WG14 in this effort?
  • What part of C do you see this project help improve over time?


  1. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.
  2. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.

How do you intend to test this C library? Fuzzing and all that is nice, but just straight conformance testing is what I’d like to hear about.


There are also few areas which we do not intend to invest in at this point:

  1. Implement dynamic loading and linking support.
  2. Support for more architectures (we'll start with just x86-64 for simplicity).

For these areas, the community is of course free to contribute. Our hope is that, preserving the "as a library" design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren't needed.

We intend to build the new libc in a gradual manner. To begin with,  the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.

So, what do you think about incorporating this new libc under the LLVM project?

Thank you,
Siva Chandra and the rest of the Google LLVM contributors

Somewhat off-topic… this last line is unfortunate. It would be great to not sign as “the rest of the Google LLVM contributors” when subsequent responses show that many Google LLVM contributors aren’t co-signing this proposal (even if they’re interested!). Scope and purpose within your organization would have been more helpful; here it sounds like all of Google is in agreement… which never happens 🙂


 - JF (and not the rest of the Apple LLVM contributors 😉)


Siva Chandra via llvm-dev

Jun 26, 2019, 2:20:42 PM
to Peter Smith, llvm-dev
On Tue, Jun 25, 2019 at 2:53 AM Peter Smith <peter...@linaro.org> wrote:
>
> On Mon, 24 Jun 2019 at 23:23, Siva Chandra via llvm-dev
> <llvm...@lists.llvm.org> wrote:
> >
> > Hello LLVM Developers,
> >
> >
> > Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.
> >
>
> Are you able to share what some of these needs are? My reason for
> asking is to see if there is a particular niche where existing libc
> designs are not working, or if there is an approach that will handle
> many use cases better than existing libc implementations.

There have been a lot of questions about our reasons for opting to
build a new libc and why an existing libc implementation does not meet
our needs. I will try to address these questions in a general fashion
in this email. I will answer individual concerns separately. Before I
start, I also want to apologize if I am late to answer or appear to
be ignoring some of the emails. I am not trying to ignore or avoid
anyone or any question; I just need time to process your questions
and compose meaningful answers.

So, we have a bunch of reasons for a new libc and why we prefer it to
be a part of the LLVM project:

1. Static linking without the complexity of dynamic linking - Most
libc implementations end up being complicated because they support
dynamic loading/linking. This is not bad by itself, but we want to be
able to take out dynamic linking capability where possible and get the
benefits of the much simpler system. We believe that building
everything in an "as a library" fashion would facilitate this.

2. As somebody else has pointed out in the list, we want to have a
libc with as much fine grained modularity as possible. This not only
helps one to pick and choose what they want, but also makes it easy to
adapt to different build systems. Moreover, such a modular system will
also facilitate deploying chunks of functionality during the
transition from another libc to this new libc.

3. Sanitizer supported testing and fuzz testing from the start - Doing
this from the start will affect a few design choices non-trivially. For
example, sanitizers require a target to be rebuilt with
sanitizer-specific options. We want to develop the new libc in such
a fashion that it will work with these specialized options as well.

4. ABI independent implementation as far as possible - There will be
places where it would not be possible to implement in an ABI
independent fashion. However, wherever possible, we want to use normal
source code so that compiler-based changes to the ABI are easy. Our
reasons for ABI independent implementations fall into two categories:

a) Long term changes to the ABI for security like SCADS, and for
performance tuning like caller/callee register ratios to better match
software and hardware.
b) Rapid deployment of specific ABI changes as part of security
mitigation strategies such as those for Spectre. For example,
speculative load hardening would have vastly benefitted from being
able to change the calling convention.

5. Avoid assembly language as far as possible - Again, there will be
places where one cannot avoid assembly level implementations. But,
wherever possible, we want to avoid assembly level implementations.
There are a few reasons here as well:

a) We want to leverage the compiler for performance wherever possible,
and as part of the LLVM project, fix compiler bugs rather than use
assembly.
b) Enable sanitizers and coverage-based fuzzing to work well across
the implementation of libc.
c) Allow deploying compiler-based security mitigations such as those
we needed for Spectre.

6. Having the support of the LLVM community, project, and
infrastructure - From access to the broad platform expertise in the
community to the strong license and project structure, we think the
project will be significantly more successful as part of LLVM than
elsewhere.
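
As a rough illustration of point 3, a differential fuzz target for a single libc routine might look like the sketch below. `LLVMFuzzerTestOneInput` is libFuzzer's standard entry point; `candidate_strlen` is a hypothetical stand-in for a function from the new libc (not actual project code), and the system `strlen` is assumed to be a trustworthy reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical implementation under test, written in plain C so that
 * sanitizers and coverage instrumentation can see every branch. */
static size_t candidate_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/* libFuzzer entry point: called repeatedly with arbitrary byte strings. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    assert(data != NULL || size == 0);

    /* Copy the input and add a terminator; the raw buffer is not
     * guaranteed to contain a NUL byte. */
    char *buf = malloc(size + 1);
    if (buf == NULL)
        return 0;
    if (size > 0)
        memcpy(buf, data, size);
    buf[size] = '\0';

    /* Differential check against the system libc. */
    if (candidate_strlen(buf) != strlen(buf))
        abort();

    free(buf);
    return 0;
}
```

Built with something like `clang -g -fsanitize=fuzzer,address`, the same source would serve as both a fuzz target and an AddressSanitizer test, which is only practical when the routine is not hidden behind hand-written assembly.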

All this does not mean we want to implement everything from scratch.
If someone has implementations for parts of the libc ready, and would
like to contribute to this project under the LLVM license, we will
certainly welcome it.

Chris Lattner via llvm-dev

Jun 27, 2019, 12:42:28 AM
to Andrew Kelley, llvm...@lists.llvm.org
On Jun 26, 2019, at 10:16 AM, Andrew Kelley <and...@ziglang.org> wrote:


>> I really have nothing to do with this project, and no insight on the thoughts behind it, but I think you and several other people on this thread have missed a significant issue: the thread is conflating whether it is a good idea to "create yet another libc" with whether it is a good idea to "contribute that code to LLVM".  I don’t think arguing whether or not someone should build a project is on-topic for this list.  Given that they appear motivated to build it, the question is whether this fits into the LLVM umbrella.
>
> I don't understand your reasoning here. If there's reason to believe it
> should not be built at all, wouldn't that also imply that it shouldn't
> be taken under LLVM's umbrella? The LLVM community (including myself)
> will be responsible for maintaining this software and to do that we must
> figure out the specifications, trade-offs, and use cases. How should we
> determine the requirements of something that has no reason to exist?

I don’t see the connection.  The LLVM project exists to foster compiler and toolchain related technologies that align with its developer policy (including license, library based design etc).  This proposal aligns directly with that mission.

I don’t see why something being part of LLVM means that you necessarily need to support it or “maintain” it.  Do you maintain vmkit, which is also part of LLVM?

OTOH, I agree with you that something becoming an LLVM subproject means that it is likely to gain traction over time and become a default answer with new targets that come up.  If there are other existing libc implementations that want to align with the mission (incl. design goals, coding standard, licensing, etc), then I encourage them to step up and provide other compelling alternatives to consider.

If no compelling alternative to consider steps forward, then the primary question (from the LLVM perspective) is whether a new project aligns with the mission or not.  We have no track record of rejecting a proposal based on the theory that some better alternative *could theoretically* exist.

-Chris



Siva Chandra via llvm-dev

Jun 27, 2019, 12:44:55 AM
to Finkel, Hal J., llvm-dev
> On 6/25/19 7:22 PM, Zachary Turner via llvm-dev wrote:
> > I foresee problems with this on both Windows and non-Windows. A
> > typical libc implementation has a lot of internal state that is shared
> > across API boundaries in a way that is considered an implementation
> > detail. So making assumptions about which state is shared and which
> > isn't is going to be a problem.

+1 for what Hal Finkel has said below about switching from redirectors
to implementations: There will be certain groups of functions which
will have to be switched all together. We will not be able to do it
one function at a time for such groups.

> > How do you guarantee that if you implement method A and forward method
> > B, that B will behave the same as it would have if you had forwarded A
> > also? It might not even work at all. Where can you safely draw this
> > boundary?

Are you talking about a scenario wherein implementation of B in the
system libc calls its A? If yes, most libc implementations do a good
job of using internal names in such scenarios. That is, B would call A
with an internal name. This ensures that B from the system libc calls
A also from the system libc and not the redirector/forwarder.
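
A minimal sketch of the internal-name pattern described above, with hypothetical names throughout (`internal_length`, `sketch_length`, `sketch_total_length` are not taken from any real libc). Here the internal name is simply file-local; a real libc would more likely use hidden visibility or symbol aliases, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Internal name for "A": file-local here, so no other library (and no
 * redirector) can interpose it. */
static size_t internal_length(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

/* Public "A": a thin wrapper that a redirector could shadow. */
size_t sketch_length(const char *s) {
    return internal_length(s);
}

/* Public "B": calls A through the internal name, so replacing the
 * public sketch_length elsewhere does not change B's behaviour. */
size_t sketch_total_length(const char *a, const char *b) {
    return internal_length(a) + internal_length(b);
}
```

The point is only that B never goes through the public symbol for A, so forwarding A to a different implementation leaves B untouched.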

> > Users can set errno for example, and in many cases they must set errno
> > to 0 before invoking a call if they want to reliably detect an error.
> > So let's say they set errno to 0, then call a method which our libc
> > implementation decides to forward. What do we do? We could propagate
> > errno on every single call, but my point is that there are going to be
> > a ton of subtle issues that arise from this approach that are hard to
> > foresee, precisely because the implementation details of a libc
> > implementation are supposed to be just that - implementation details.

Dealing with errno in particular is probably not as nasty as it seems.
The standard allows errno to be a macro. Hence, for the transitory
phase, implementations and redirectors in our libc can make use of the
errno from the system libc. Something like this:

$> cat llvm-errno.cpp
#include <errno.h> // This is the system-libc header file

int *__llvm_errno() {
    return &errno;
}

$> cat errno.h # This is the llvm libc's errno.h
int *__llvm_errno();

#define errno (*__llvm_errno())
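
To make the errno point concrete, here is a small sketch of the caller-side pattern (set errno to 0, call, inspect) going through a forwarding function. `sketch_strtol` and `sketch_overflows` are hypothetical names; the sketch assumes the redirector and the caller are both using the system libc's errno, as in the transitional setup above.

```c
#include <assert.h>
#include <errno.h>   /* the system libc's errno */
#include <stdlib.h>  /* the system libc's strtol */

/* Hypothetical redirector: forwards to the system implementation and
 * does not touch errno itself, since caller and callee share one. */
long sketch_strtol(const char *text, char **end, int base) {
    assert(text != NULL);
    return strtol(text, end, base);
}

/* Caller-side pattern: clear errno, call, then inspect it.
 * Returns 1 if the conversion overflowed, 0 otherwise. */
int sketch_overflows(const char *text) {
    errno = 0;
    (void)sketch_strtol(text, NULL, 10);
    return errno == ERANGE;
}
```

Because there is only one errno in play, nothing has to be copied back and forth across the forwarding boundary.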

On Tue, Jun 25, 2019 at 6:20 PM Finkel, Hal J. <hfi...@anl.gov> wrote:
> You certainly can't mix-and-match on a per-function level, in general. I
> suspect that there are some subsystems that can be substituted. Using
> open from one libc and close from another seems problematic. Using open
> and close from one libc and qsort from another is probably fine. And, as
> you point out, the library might need to be configurable to use an
> externally-provided errno.

Chris Lattner via llvm-dev

Jun 27, 2019, 12:52:39 AM
to Siva Chandra, llvm-dev
On Jun 26, 2019, at 11:20 AM, Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:

> a) We want to leverage the compiler for performance wherever possible,
> and as part of the LLVM project, fix compiler bugs rather than use
> assembly.

I love this approach as a way of driving low level performance forward!  How do you anticipate this working in practice? 

For example, if someone says “I can shave 1 cycle out of this important thing if I write it in asm” and you know that a suitably capable compiler engineer can achieve the same thing given enough time, how do you plan to push back?

-Chris


Siva Chandra via llvm-dev

Jun 27, 2019, 1:09:11 AM
to Andrew Kelley, llvm-dev


On Wed, Jun 26, 2019 at 9:02 AM Andrew Kelley via llvm-dev <llvm...@lists.llvm.org> wrote:
> It seems reasonable to assume that Google is only
> interested in Linux. In this case I have to re-iterate my original
> question, what are the needs that are not being met by existing Linux
> libcs, such as musl?

First of all, let me make this clear: musl is great, and I have used it personally to learn about how things work. We also evaluated the option of adopting musl and modifying it to suit our needs (I listed our rough goals here: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133360.html)

However, considering the disruptive nature of some of the changes we want to make (like ABI independence, sanitizer friendly, modularity, avoiding complexity from dynamic linking where possible, etc.), it seemed comparable to building from the ground up. That also offered an opportunity to structure this as part of the overall LLVM project which has lots of advantages on its own.

This does not mean we want to re-implement everything. If community members already have parts of libc implementations ready and want to contribute them to LLVM, that would be great. That could absolutely include parts of musl if the authors are interested in contributing them to LLVM, but some research indicated this wasn't likely (happy to be corrected if wrong though). We would just need to figure out how to add pieces incrementally, and support the goals outlined above.

Siva Chandra via llvm-dev

Jun 27, 2019, 1:26:54 AM
to Fāng-ruì Sòng, LLVM Developers Mailing List
On Tue, Jun 25, 2019 at 1:18 AM Fāng-ruì Sòng <mas...@google.com> wrote:
> Some natural questions:
>
> 1) Will libm be included?

Yes, libm and libpthread are absolutely included. Further, [r]crt*.o files which support static linking of all kinds are also included.

> 2) How will llvm libc be different from musl in design perspectives?

We want llvm libc design to accommodate them from the start.

> Then another natural question is how the kernel differences will be effectively isolated. The platform specific macros in compiler-rt may be a bit messy now. I hope we can prevent that situation.

We definitely want to avoid "mess". But at the same time, it's hard to talk about this in a general fashion. We will have to take this up on a case by case basis and discuss at a narrower scope as to what makes the most practical sense.

David Chisnall via llvm-dev

Jun 27, 2019, 8:26:57 AM
to llvm...@lists.llvm.org
[ I have worked on FreeBSD libc, so a few clarifications here: ]

On 26/06/2019 17:02, Andrew Kelley via llvm-dev wrote:
> Finally, I'm only aware of 2 operating systems where the libc is not an
> integral part of the system, namely Linux and Windows. For example on
> macOS, FreeBSD, OpenBSD, and DragonFlyBSD, the libc is guaranteed to be
> available, and must be dynamically linked, because this is the stable
> syscall ABI.

Solaris and macOS (kind-of) belong on this list, but FreeBSD does not
and I don't believe other BSDs do, though the situation is somewhat more
complex. On FreeBSD, the system call ABI is stable and there are compat
layers that allow foreign or legacy system call interfaces to be exposed
to userspace processes (e.g. a FreeBSD 7 system call table on FreeBSD
12, or a Linux system call table on any FreeBSD). The Capsicum sandbox
mode is also implemented in part by pivoting the system call layer: once
you call cap_enter, some system calls are simply not exposed to you at
all.

There is even CloudABI, which uses a mostly musl-derived libc and a
Capsicum-derived system call table. This is used for statically linked
applications with a custom launcher that gives strong security guarantees.

That said, the relationships between FreeBSD's libc, libthr (pthreads)
and rtld are quite complex, as are their interactions with the kernel.
Supporting dlopen() of libthr turned out to be incredibly hard
in practice, but even without that, there is some complexity from the
fact that libc must allow libthr to preempt a number of its symbols (and
must provide implementations of things like pthread_mutex for programs
that do not start threads). In the 5.x time frame, we did support two
different pthreads implementations. This was, in hindsight, an
absolutely terrible idea and not something that I'd ever recommend
anyone do ever again.

On macOS, libSystem is actually the public interface to the kernel, so
you can bring along your own libc if you want to, you just have to
dynamically link to libSystem to get access to system calls (or you do
what Go did, try to make them without going via libSystem, and watch
every single program written in your language die when the kernel's
gettimeofday interface changes...). This, however, makes it difficult,
if not effectively impossible, to bring your own dyld replacement to macOS,
because it must be able to load libSystem without making any system calls...

> So it would only make sense for an LLVM libc to be for
> Linux and Windows. It seems reasonable to assume that Google is only
> interested in Linux. In this case I have to re-iterate my original
> question, what are the needs that are not being met by existing Linux
> libcs, such as musl?

I am also unconvinced that it is possible to design a clean platform
abstraction layer for libc that would work over even Linux and FreeBSD
without imposing significant penalties for one or the other. If you add
Windows into the mix, then it gets a lot harder. POSIX's decision to
use int, rather than a pointer type, for file descriptors and to make
specific guarantees about reuse order (rather than just providing dup2
as a moderately sane interface) means that userspace code will need to
implement the file descriptor table. Do we build higher-level layering
on top of file descriptors or do we support Windows HANDLEs natively for
internal usage and use fds only for public APIs?
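
To illustrate what "userspace code will need to implement the file descriptor table" could mean, here is a deliberately tiny sketch. All names (`sketch_fd_alloc` and friends) and the fixed table size are hypothetical, the native handle is just an opaque pointer standing in for something like a Windows HANDLE, and there is no locking, so this is nowhere near what a real libc would need.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_FDS 16

/* Opaque native handles indexed by POSIX-style descriptor.
 * NULL marks a free slot. */
static void *sketch_fd_table[SKETCH_MAX_FDS];

/* Returns the lowest unused descriptor, matching the POSIX rule for
 * open(), or -1 if the table is full. */
int sketch_fd_alloc(void *native_handle) {
    assert(native_handle != NULL);
    for (int fd = 0; fd < SKETCH_MAX_FDS; ++fd) {
        if (sketch_fd_table[fd] == NULL) {
            sketch_fd_table[fd] = native_handle;
            return fd;
        }
    }
    return -1;
}

/* Looks up the native handle behind a descriptor; NULL if not open. */
void *sketch_fd_lookup(int fd) {
    if (fd < 0 || fd >= SKETCH_MAX_FDS)
        return NULL;
    return sketch_fd_table[fd];
}

/* Releases a descriptor so the slot can be reused; returns 0 on
 * success and -1 if fd was not open. */
int sketch_fd_release(int fd) {
    if (sketch_fd_lookup(fd) == NULL)
        return -1;
    sketch_fd_table[fd] = NULL;
    return 0;
}
```

The lowest-free-slot scan is what the reuse-order guarantee forces: after closing descriptor 1, the next allocation has to hand back 1, regardless of what the underlying kernel object looks like.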

The idea of an LLVM libc has been proposed a few times and generally the
pushback has been that it doesn't make sense because libc is so
intimately tied to the host kernel that it's very hard to consider it as
a portable component.

David

Stan Shebs via llvm-dev

Jun 27, 2019, 11:16:19 AM
to Chris Lattner, llvm-dev

I think it's becoming uncommon to find cases like that today; the
person who thinks they have a magic assembly hack finds that it works
well for one microbenchmark on one architecture variant, but
disappoints when used in real code. In fact, glibc has been throwing
out a bunch of assembly code in recent years, as testing shows much of
it not to have any noticeable advantage.

If the customized calling convention scheme works out, it's going to
be a huge incentive to fix the compiler in case of performance
lossage; it will be quite difficult to write assembly that is equally
performant for all possible calling conventions, and if you try to
assume a convention, then the assumption propagates up through the
program, possibly defeating more important optimizations.

Zachary Turner via llvm-dev

Jun 27, 2019, 12:06:30 PM
to Siva Chandra, llvm-dev
Errno is an easy example, but perhaps not the best specifically because the standard dictates its behavior.  But an implementation may have implicit assumptions as well.

I guess let me make this concrete: can you propose a specific separation that you have in mind?

Keep in mind that even if A doesn’t depend on B, that doesn’t mean that A and B can be separated.  You mentioned that open() and close() would obviously  have to be done at the same time, but it’s much worse than this: The *entire transitive closure* of open() and close() must be done at the same time, and my hypothesis is that this is going to a) be much larger than you expect, and b) be different with different underlying libc implementations.
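
For what it's worth, the "transitive closure" here is ordinary graph reachability. A sketch, with the loud caveat that the dependency matrix below is made-up placeholder data (the indices are arbitrary labels, not real libc functions), and that a real analysis would likely need to treat shared state as a symmetric relation rather than the directed edges used here:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_N 5

/* sketch_depends[i][j] is true when placeholder group i depends on
 * group j. Purely illustrative data. */
static const bool sketch_depends[SKETCH_N][SKETCH_N] = {
    /* 0 */ {false, true,  false, false, false},
    /* 1 */ {false, false, true,  false, false},
    /* 2 */ {false, false, false, false, false},
    /* 3 */ {false, false, false, false, true },
    /* 4 */ {false, false, false, false, false},
};

/* Marks every group reachable from `start` in `out` (depth-first).
 * `out` must be all-false on the first call. */
void sketch_closure(int start, bool out[SKETCH_N]) {
    assert(start >= 0 && start < SKETCH_N);
    if (out[start])
        return;
    out[start] = true;
    for (int next = 0; next < SKETCH_N; ++next)
        if (sketch_depends[start][next])
            sketch_closure(next, out);
}
```

Everything marked in `out` would have to move from "forwarded" to "implemented" in one step; the open question is how large that set turns out to be for a given system libc.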


Then there are more immediate issues.  On Windows specifically, I’m not even sure it’s going to be physically possible to link in two copies of the CRT and have one forward to the other.  If it is possible, it’s very non obvious how to make it work and will likely require a ton of additional machinery.

Chris Lattner via llvm-dev

Jun 27, 2019, 12:29:59 PM
to Stan Shebs, llvm-dev


On Jun 27, 2019, at 8:16 AM, Stan Shebs <stan...@google.com> wrote:

For example, if someone says “I can shave 1 cycle out of this important thing if I write it in asm” and you know that a suitably capable compiler engineer can achieve the same thing given enough time, how do you plan to push back?

I think it's becoming uncommon to find cases like that today; the
person who thinks they have a magic assembly hack finds that it works
well for one microbenchmark on one architecture variant, but
disappoints when used in real code.  In fact, glibc has been throwing
out a bunch of assembly code in recent years, as testing shows much of
it not to have any noticeable advantage.

If the customized calling convention scheme works out, it's going to
be a huge incentive to fix the compiler in case of performance
lossage; it will be quite difficult to write assembly that is equally
performant for all possible calling conventions, and if you try to
assume a convention, then the assumption propagates up through the
program, possibly defeating more important optimizations.

Yeah, that all matches with my intuition as well.  That said, when it comes to human nature and the drive to optimize very specific things, sometimes what “makes sense” in the big picture gets lost.  In any case, my question isn’t very important; it can be figured out on a case-by-case basis over time.

-Chris

Kamil Rytarowski via llvm-dev

Jun 27, 2019, 1:16:24 PM
to Rich Felker, llvm...@lists.llvm.org
On 25.06.2019 21:12, Rich Felker via llvm-dev wrote:
> Since I have a little experience in this area, I'd like to chime in on
> it. :-) TL;DR I think it's a really, REALLY bad idea.

As a contributor to NetBSD libc, I don't see any benefits in the
proposal. The stated motivations, such as static linking and static PIE,
are already supported natively out of the box.

Tighter sanitizer integration? NetBSD supports in-libc UBSan and
whole-distribution sanitization (ASan, UBSan, TSan, MSan.. in various
degrees of completeness).

Licensing issues? The implementation is (L)GPL-free...

NetBSD libc is an integral and inseparable part of the NetBSD
distribution. We share the same code with the kernel, userland
utilities, bootloader, rumpkernels..

Every kernel has its own specific syscall ABI layer and thus parts like
libpthread need to be implemented largely for each OS separately. Even
every BSD is totally different here. Portable libdl? Not really, as we
must support dynamic loader specifics on a per-OS basis.

Furthermore, downstream OSs like NetBSD need downstream-specific behavior
in the toolchain that is tightly integrated into the loader/libc/kernel.

Reimplementing libc would be a tremendous amount of work for literally no gain.

mayuyu.io via llvm-dev

Jun 27, 2019, 1:49:57 PM
to Kamil Rytarowski, llvm...@lists.llvm.org, Rich Felker
I surely want to see this happen too. But afaik on Darwin the libc (libSystem) is even more complicated and plays an integral part in the Objective-C/dyld runtime. I don't think we’ll be able to achieve what we need without literally reimplementing everything.

Zhang

Alex Brachet-Mialot via llvm-dev

Jun 27, 2019, 2:40:35 PM
to n...@gmx.com, LLVMDev
I think I might share similar concerns with Zachary, however mine is more of a confusion.

How would it work to have two libcs at the same time? If my program is linked against this libc, but I am using a library linked to the system libc, how does shared state work? If I call setlocale(3) in my program, it won't affect the library that is using a different libc, as would otherwise be expected, correct?
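The concern can be pictured with a toy model. This sketch assumes each libc copy keeps its locale in private storage; the thread does not settle whether the proposed libc would behave this way, and the struct and function names are hypothetical stand-ins, not real libc interfaces.

```c
#include <string.h>

/* Toy stand-in for one copy of a libc; each copy owns its locale state. */
struct libc_copy {
    char locale[32];
};

/* Every copy starts out in the "C" locale. */
void copy_init(struct libc_copy *c) {
    strcpy(c->locale, "C");
}

/* Changes the locale of this copy only; the name is truncated if too long. */
void copy_setlocale(struct libc_copy *c, const char *name) {
    strncpy(c->locale, name, sizeof c->locale - 1);
    c->locale[sizeof c->locale - 1] = '\0';
}

/* Reads back the locale this copy currently believes in. */
const char *copy_getlocale(const struct libc_copy *c) {
    return c->locale;
}
```

If the application calls copy_setlocale on its own copy, the copy used by the third-party library still reports "C". Under that assumption, the answer to the question would be yes: the change is invisible to the other libc.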

Saleem Abdulrasool via llvm-dev

Jun 27, 2019, 2:53:36 PM
to Siva Chandra, llvm-dev
On Mon, Jun 24, 2019 at 3:23 PM Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:

Hello LLVM Developers,


Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.


Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need, and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project. 


To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.


We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:


  1. The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.

  2. The libc should support static non-PIE and static-PIE linking. This means, providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.

  3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.


I don’t think that POSIX additions should be part of the core library.  Not all interesting targets are POSIX: e.g. Windows.  I think that POSIX should be a separate standalone library piece as you mention that dynamic loading should be downthread.  I think that the only pieces that should be available in the core should be the C11 core specification.

What parts of the C standard do you consider as not being worth implementing?

If you are looking to implement “extensions” which replace the modern coding practices, does that mean that the surface really should be the MSVCRT implementation then?  Because it does deprecate the “unsafe” routines in favour of safe versions (suffixed with `_s`).  Additionally, you could always just implement the C11 bounds-checking annex (Annex K) and use those interfaces instead.
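To give a flavour of what a `_s`-style routine looks like, here is a minimal sketch of a bounds-checked string copy. The name bounded_copy is hypothetical; this is not the Annex K strcpy_s and it leaves out the runtime-constraint handler machinery that the annex specifies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper in the spirit of the `_s` routines.
 * Returns 0 on success and -1 on failure. Whenever the destination is
 * usable, a failure leaves it holding the empty string. */
int bounded_copy(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || dst_size == 0) return -1;
    if (src == NULL) {
        dst[0] = '\0';
        return -1;
    }
    size_t len = strlen(src);
    if (len >= dst_size) {      /* no room for the terminating NUL */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* copies the NUL as well */
    return 0;
}
```

The caller passes the destination size explicitly, so an oversized source is rejected instead of overflowing the buffer.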
 
  4. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.


How would this work for reasonable bodies of code which are built on Linux?  e.g. Chrome does have Linux specific paths and I would be surprised if Chrome does not depend on any GNU behaviours.
 
  5. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.


There are also few areas which we do not intend to invest in at this point:


  1. Implement dynamic loading and linking support.


If this is done as a “library” layer, then so should POSIX and the C99/C11 annexes.
 
  2. Support for more architectures (we'll start with just x86-64 for simplicity).


I think that AArch64 is pretty core these days and leaving that out is pretty restrictive.  At this point Windows AArch64 is an interesting target.  With Linux AArch64 and Windows AArch64 becoming more mainstream, it seems like a poor design tradeoff to limit the target to Linux x86_64.
 

For these areas, the community is of course free to contribute. Our hope is that, preserving the "as a library" design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren't needed.


We intend to build the new libc in a gradual manner. To begin with,  the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.


This is really tricky and finicky to implement (I have done something like this in the past).  On ELF you can interpose symbols, but on PE/COFF with two-level namespace binding, this needs to be statically resolved.  Would the approach mean that symbols are interposed at compile time to ensure that they are fully redirected?  How will you manage cross-domain memory once a malloc implementation is included in the library?  What happens with threading?

The general libc implementation would require that full threading is under its control - consider cases like the IE model for TLS.  This requires the loader to be aware of the modules and the full spacing.  Another example where this starts to break down is with faulty - it was just a library layer that implemented compressed memory mapped library loading because a previous libc implementation - bionic - suffered from extensive issues including the inability to load more than a handful of modules.  This is far from the only limitation of the bionic libc implementation, but this doesn’t seem like the appropriate forum for discussing the previous libc implementation attempts.

One other point of interest is how the loader integration would work.  With glibc, the loader effectively embeds a copy of libc for itself, and has to dig through the kernel handoff (the ELF auxiliary vector) to get the loader location.  What happens with multiple object file formats?  PE/COFF does not load the same way as ELF, and that difference may ripple through the rest of the library.  The libc integration is needed for the resolution of symbols as well as for TLS.
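For readers unfamiliar with the kernel handoff on Linux, the auxiliary vector can be inspected from an ordinary program. The sketch below assumes a Linux system whose libc provides getauxval in <sys/auxv.h> (glibc and musl both do); the two wrapper function names are made up for illustration.

```c
#include <sys/auxv.h>  /* getauxval and the AT_* constants on Linux */

/* Address at which the kernel mapped the program interpreter (the dynamic
 * loader). getauxval returns 0 when the entry is absent, and the value may
 * also be 0 when no interpreter was loaded. */
unsigned long loader_base(void) {
    return getauxval(AT_BASE);
}

/* Page size as passed by the kernel through the same auxiliary vector. */
unsigned long auxv_page_size(void) {
    return getauxval(AT_PAGESZ);
}
```

A libc that ships its own startup code has to parse this vector itself, before most of the library is usable, which is part of why the loader and the libc end up so tightly coupled.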
 

So, what do you think about incorporating this new libc under the LLVM project?


As stated, I really feel that this is far too specialised to certain use cases that are pertinent to Google.  I think that this needs to be broadened to allow a general purpose libc much as libc++ is a general C++ implementation.  I think that the project has a different set of requirements and seems like it would be extremely interesting to see how it would develop over time.  This could really be an interesting choice for a certain type of project but as described feels like it is best explored outside of the umbrella of LLVM.
 

Thank you,

Siva Chandra and the rest of the Google LLVM contributors


_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


--
Saleem Abdulrasool
compnerd (at) compnerd (dot) org

Owen Anderson via llvm-dev

Jun 27, 2019, 4:19:18 PM
to Saleem Abdulrasool, llvm-dev


On Jun 27, 2019, at 2:53 PM, Saleem Abdulrasool via llvm-dev <llvm...@lists.llvm.org> wrote:


So, what do you think about incorporating this new libc under the LLVM project?

As stated, I really feel that this is far too specialised to certain use cases that are pertinent to Google.  I think that this needs to be broadened to allow a general purpose libc much as libc++ is a general C++ implementation.  I think that the project has a different set of requirements and seems like it would be extremely interesting to see how it would develop over time.  This could really be an interesting choice for a certain type of project but as described feels like it is best explored outside of the umbrella of LLVM.


I don't have a strong stake in this decision, but Saleem's commentary matches my thoughts on the topic.  Maybe some of this is related to messaging - would the proposed project be *an* LLVM libc or *the* LLVM libc.  There is already at least one instance within the LLVM umbrella where a subproject designed and built to a particular set of constraints became *the* LLVM solution, and ended up disincentivizing investment from contributors whose priorities didn't match those constraints.  Staking the blessed-by-LLVM slot for a piece of the toolchain is not free.

To turn the question around, why should *this* libc (assuming it will be built whether or not LLVM accepts it) be *the* LLVM libc?

--Owen

Chris Lattner via llvm-dev

Jun 27, 2019, 5:05:08 PM
to Owen Anderson, llvm-dev
Saleem, Owen, others on the thread who are concerned about this: it seems that some of the concern is that the project goals are too narrow, and thus the eventual result may not serve the full community well over time.

Would any of you be interested in laying out what we should consider as the list of requirements for such a full solution?  It would make it much easier to evaluate initial steps if we were to have a big picture of the problem to solve over time.

-Chris

Siva Chandra via llvm-dev

Jun 27, 2019, 6:39:26 PM
to Zachary Turner, llvm-dev
On Thu, Jun 27, 2019 at 9:06 AM Zachary Turner <ztu...@roblox.com> wrote:
> I guess let me make this concrete: can you propose a specific separation that you have in mind?
>
> Keep in mind that even if A doesn’t depend on B, that doesn’t mean that A and B can be separated. You mentioned that open() and close() would obviously have to be done at the same time, but it’s much worse than this: The *entire transitive closure* of open() and close() must be done at the same time, and my hypothesis is that this is going to a) be much larger than you expect, and b) be different with different underlying libc implementations.

Let me change the direction here a little bit. Let's say, for Windows,
you can develop the new libc starting from a clean slate without
having to worry about the redirectors/forwarders. Is that a good
enough place for you to start?

What I am getting to is this: redirectors are probably an
implementation detail at this point. We think they will allow us to
develop and phase-in this libc in a gradual manner. But, if they end
up being a problem on other platforms, we will build them in such a
way that they only stay as Linux specific implementation details. If
other platforms can benefit from them, they are of course free to
adopt them.

> Then there are more immediate issues. On Windows specifically, I’m not even sure it’s going to be physically possible to link in two copies of the CRT and have one forward to the other. If it is possible, it’s very non obvious how to make it work and will likely require a ton of additional machinery.

No, I do not think we want to mix up CRTs on any platform. At the
least, it will be disruptive to the compiler drivers. Our goal is to
build a CRT which supports statically linked executables on Linux. We
do not intend to mix this new CRT with the CRT from the system libc.
The new CRT might only be useful after a non-trivial part of the libc
has been built. Until then, we have to use the CRT from the system
libc.

Siva Chandra via llvm-dev

Jun 27, 2019, 6:43:34 PM
to Chris Lattner, llvm-dev
On Thu, Jun 27, 2019 at 2:05 PM Chris Lattner via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> Saleem, Owen, others on the thread who are concerned about this: it seems that some of the concern is that the project goals are too narrow, and thus the eventual result may not serve the full community well over time.

Maybe my email listing our goals is being misinterpreted as being the
bounding set of goals for the project. So, let me make it clear again:
The goals I have listed are just our initial set of goals for the
project. Members of the community are of course free to add their own
goals to this set, implement them, and make it a "full solution." I
have also mentioned in some of my earlier emails that we do not intend
to design out any particular feature or platform. For example, I have
said that we do not intend to work on dynamic linking/loading at least
to begin with. This does not mean that the scope of the project is
curtailed to static linking. The members of the community are free to
add support for dynamic linking/loading. In fact, if dynamic
linking/loading support is added in a modular/"as a library" fashion,
it makes it a win-win situation as we will be able to take it out if
we do not require it.

Zachary Turner via llvm-dev

Jun 27, 2019, 6:57:38 PM
to Siva Chandra, llvm-dev
On Thu, Jun 27, 2019 at 3:39 PM Siva Chandra <sivac...@google.com> wrote:
On Thu, Jun 27, 2019 at 9:06 AM Zachary Turner <ztu...@roblox.com> wrote:
> I guess let me make this concrete: can you propose a specific separation that you have in mind?
>
> Keep in mind that even if A doesn’t depend on B, that doesn’t mean that A and B can be separated.  You mentioned that open() and close() would obviously  have to be done at the same time, but it’s much worse than this: The *entire transitive closure* of open() and close() must be done at the same time, and my hypothesis is that this is going to a) be much larger than you expect, and b) be different with different underlying libc implementations.

> Let me change the direction here a little bit. Let's say, for Windows,
> you can develop the new libc starting from a clean slate without
> having to worry about the redirectors/forwarders. Is that a good
> enough place for you to start?

It's probably a good enough place for me to start, yes.  I still have reservations -- even for the Linux case -- about whether it will be possible to make a reasonable separation of library calls in such a way that set A redirects, set B doesn't redirect, and everything works without any issues, but as long as the general community isn't locked into such a model for every platform, then I guess it can be up to the platform owners to work through those issues on their own.
 

> Then there are more immediate issues.  On Windows specifically, I’m not even sure it’s going to be physically possible to link in two copies of the CRT and have one forward to the other.  If it is possible, it’s very non obvious how to make it work and will likely require a ton of additional machinery.

> No, I do not think we want to mix up CRTs on any platform. At the
> least, it will be disruptive to the compiler drivers. Our goal is to
> build a CRT which supports statically linked executables on Linux. We
> do not intend to mix this new CRT with the CRT from the system libc.
> The new CRT might only be useful after a non-trivial part of the libc
> has been built. Until then, we have to use the CRT from the system
> libc.

How would you perform redirection if both copies are not linked in?  Some sort of out-of-process mechanism?  Or maybe I'm misunderstanding the nature of the redirection you're referring to.

Andrew Kelley via llvm-dev

Jun 27, 2019, 7:00:42 PM
to Siva Chandra, llvm-dev
On 6/27/19 1:08 AM, Siva Chandra wrote:
>
>
> On Wed, Jun 26, 2019 at 9:02 AM Andrew Kelley via llvm-dev
> <llvm...@lists.llvm.org <mailto:llvm...@lists.llvm.org>> wrote:
>> It seems reasonable to assume that Google is only
>> interested in Linux. In this case I have to re-iterate my original
>> question, what are the needs that are not being met by existing Linux
>> libcs, such as musl?
>
> First of all, let me make this clear: musl is great, and I have used it
> personally to learn about how things work. We also evaluated the option
> of adopting musl and modifying it to suit our needs (I listed our rough
> goals here: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133360.html)

Thanks for listing those goals. This ABI-independent thing, does that
mean that you plan to add a new C ABI environment to LLVM's list? (See
my second question here:
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133354.html)

What would it be called?

Siva Chandra via llvm-dev

Jun 27, 2019, 7:17:05 PM
to Zachary Turner, llvm-dev
On Thu, Jun 27, 2019 at 3:56 PM Zachary Turner <ztu...@roblox.com> wrote:
No, I do not think we want to mix up CRTs on any platform. At the
least, it will be disruptive to the compiler drivers. Our goal is to
build a CRT which supports statically linked executables on Linux. We
do not intend to mix this new CRT with the CRT from the system libc.
The new CRT might only be useful after a non-trivial part of the libc
has been built. Until then, we have to use the CRT from the system
libc.
 
How would you perform redirection if both copies are not linked in?  Some sort of out-of-process mechanism?  Or maybe I'm misunderstanding the nature of the redirection you're referring to.

There is probably a difference in what we mean by CRT _and_ redirectors. Let me try to make my meaning clear.

By CRT, I am referring to the [r]crt*.o files on Linux which handle program startup and termination logic. I do not know if CRT means something else on Windows.

With respect to "redirectors", I do not want to get locked into an implementation discussion here, so let me just say that they are simply functions in the new libc which merely call into the system libc.
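Read literally, a redirector in this sense could be as small as the sketch below. This is only one possible reading, since the message deliberately avoids committing to an implementation; the `newlibc_` prefix and both function names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry point of the new libc. As a redirector it has no
 * logic of its own and merely calls into the system libc. */
size_t newlibc_strlen(const char *s) {
    return strlen(s);
}

/* What might later take the redirector's place: a native implementation
 * with the same signature, so callers would not notice the switch. */
size_t newlibc_strlen_native(const char *s) {
    const char *p = s;
    while (*p != '\0') p++;
    return (size_t)(p - s);
}
```

Because both functions share a signature and behaviour, a function could move from the redirected set to the native set one at a time, which is the gradual phase-in described earlier in the thread.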

Zachary Turner via llvm-dev

Jun 27, 2019, 7:45:29 PM
to Siva Chandra, llvm-dev
The difference seems to be that in Windows's version of libc (which they dub "The CRT") you can't take one without the other.  You get program startup and termination logic + everything else, and you can't pick and choose.

Saleem Abdulrasool via llvm-dev

Jun 27, 2019, 7:48:24 PM
to Chris Lattner, llvm-dev
On Thu, Jun 27, 2019 at 2:05 PM Chris Lattner <clat...@nondot.org> wrote:
Saleem, Owen, others on the thread who are concerned about this: it seems that some of the concern is that the project goals are too narrow, and thus the eventual result may not serve the full community well over time.

Would any of you be interested in what we should consider as the list of requirements for such a full solution?  It would make it much easier to evaluate initial steps if we were to have a big picture of the problem to solve over time.

Sure, I think that would be a good idea.  Off the top of my head, something like this would be a good starting point:

- a complete C11 standards compliant library (with complete support for dynamic linking - remember __declspec(dllimport))
- bundled dynamic loader which is capable of loading ELF/PE/MachO binaries
- full TLS compatibility (including copy relocation)
- compatible with OSes supported by LLVM (at least Windows, FreeBSD, Darwin, and Linux)
- compatible with popular architectures supported by LLVM (at least x86, arm, arm64, and PPC)
- portable code (e.g. no weak symbols)
- ability to externalise (and even exclude!) locale data
- optional POSIX layer
- optional inclusion of C11 annexes
- complete enough to replace the default system libc
 
Finkel, Hal J. via llvm-dev

Jun 27, 2019, 7:57:24 PM
to Siva Chandra, Chris Lattner, llvm-dev

On 6/27/19 5:43 PM, Siva Chandra via llvm-dev wrote:
> On Thu, Jun 27, 2019 at 2:05 PM Chris Lattner via llvm-dev
> <llvm...@lists.llvm.org> wrote:
>> Saleem, Owen, others on the thread who are concerned about this: it seems that some of the concern is that the project goals are too narrow, and thus the eventual result may not serve the full community well over time.
> Maybe my email listing our goals is being misinterpreted as being the
> bounding set of goals for the project. So, let me make it clear again:
> The goals I have listed are just our initial set of goals for the
> project. Members of the community are of course free to add their own
> goals to this set, implement them, and make it a "full solution." I
> have also mentioned in some of my earlier emails that we do not intend
> to design out any particular feature or platform. For example, I have
> said that we do not intend to work on dynamic linking/loading at least
> to begin with. This does not mean that the scope of the project is
> curtailed to static linking. The members of the community are free to
> add support for dynamic linking/loading. In fact, if dynamic
> linking/loading support is added in a modular/"as a library" fashion,
> it makes it a win-win situation as we will be able to take it out if
> we do not require it.
>

I think that it is important that we not, as a community, exclude from
the project any libc implementation just because it does not aim to be a
glibc or Windows CRT replacement. If people want that, then that's
great, but there is significant value regardless.

One of my primary use cases for an LLVM libc is to take a subset of it
and link it with our OpenMP device-side runtime library, or into code
being compiled for CUDA/HIP/SYCL/etc. (so that we can support compiling
code for accelerators (e.g., GPUs) that happens to call snprintf (or
whatever) across platforms from a variety of vendors). I believe that I
can get this capability with only minor additional effort, so long as
the libc is sufficiently modular. Being part of the LLVM project will
make it much easier to ensure that this configuration is tested and
supported.

 -Hal


--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Saleem Abdulrasool via llvm-dev

Jun 27, 2019, 7:59:30 PM
to Siva Chandra, llvm-dev
On Thu, Jun 27, 2019 at 4:16 PM Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:

On Thu, Jun 27, 2019 at 3:56 PM Zachary Turner <ztu...@roblox.com> wrote:
No, I do not think we want to mix up CRTs on any platform. At the
least, it will be disruptive to the compiler drivers. Our goal is to
build a CRT which supports statically linked executables on Linux. We
do not intend to mix this new CRT with the CRT from the system libc.
The new CRT might only be useful after a non-trivial part of the libc
has been built. Until then, we have to use the CRT from the system
libc.
 
How would you perform redirection if both copies are not linked in?  Some sort of out-of-process mechanism?  Or maybe I'm misunderstanding the nature of the redirection you're referring to.

There is probably a difference in what we mean by CRT _and_ redirectors. Let me try to make my meaning clear.

By CRT, I am referring to the [r]crt*.o files on Linux which handle program startup and termination logic. I do not know if CRT means something else on Windows.

This tends to be a source of confusion.  The files that you are talking about are not required at all.  In fact, you should start with these files not part of the project.  These are meant for one purpose: they provide the initialization required for the libc.  These files are inherently tied to the implementation because they must initialise the data structures for the exact libc implementation that they come from.  The Windows approach here is actually very clever: it is part of the import library that you link against and it provides the equivalent routines.  Yes, at some point, a libc will require initialization, but the files are there as an implementation detail, not as an end goal.  They are implicitly non-portable and not of any material concern to such a project.

For what it is worth, I do believe that these files do really belong in the libc project because they are so intricately tied to the implementation of the language.  I just think that the fact these files will be part of the project is merely an implementation detail and should not even be part of the discussion here.
 

With respect to "redirectors", I do not want to get locked into an implementation discussion here, so let me just say that they are simply functions in the new libc which merely call into the system libc.

I think that this is actually important to understand.  This was one thing that you pointed out as being really important to the implementation for Linux.  I would like to understand what this approach is, because I have at least two different approaches that I can suggest with pros and cons to each which I have used successfully in the past.
 
Chris Lattner via llvm-dev

Jun 27, 2019, 8:04:58 PM
to Saleem Abdulrasool, llvm-dev
Makes sense, but beyond “capabilities” of the eventual result, I think it is important to focus on *design* points, since they are the things that will shape the result as the effort goes from nothing to something.

For example, I think that subset-ability is really important, so imo a build system that allows compiling different configurations is really important.  This implies some configuration description, an internal dependence graph, a way to depend on targets with multiple possible implementations, etc.
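As a rough sketch of "targets with multiple possible implementations", the fragment below selects one of two bodies for the same function at compile time. The macro DEMO_CONFIG_SMALL and the function demo_memset are hypothetical; in a real project the switch would presumably come from the build system's configuration description instead of a default in the source.

```c
#include <stddef.h>

/* Hypothetical configuration switch, defaulted here so the file builds
 * on its own; a build system would normally define it. */
#ifndef DEMO_CONFIG_SMALL
#define DEMO_CONFIG_SMALL 1
#endif

#if DEMO_CONFIG_SMALL
/* Size-oriented variant: a plain byte loop. */
void *demo_memset(void *dst, int value, size_t n) {
    unsigned char *p = dst;
    for (size_t i = 0; i < n; i++) p[i] = (unsigned char)value;
    return dst;
}
#else
/* Speed-oriented variant: four bytes per iteration, then the tail. */
void *demo_memset(void *dst, int value, size_t n) {
    unsigned char *p = dst;
    unsigned char v = (unsigned char)value;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        p[i] = v; p[i + 1] = v; p[i + 2] = v; p[i + 3] = v;
    }
    for (; i < n; i++) p[i] = v;
    return dst;
}
#endif
```

Both variants satisfy the same contract, so anything that depends on demo_memset can be built against either configuration without change.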

Getting that right would also make compiler_rt a bit nicer.

In any case, if there are other design points people are interested in, I’m sure Siva would appreciate knowing about them.

-Chris

Siva Chandra via llvm-dev

Jun 27, 2019, 8:05:41 PM6/27/19
to JF Bastien, llvm-dev
On Wed, Jun 26, 2019 at 10:27 AM JF Bastien <jfba...@apple.com> wrote:
>> 3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.

> I’d love to hear what you have in mind with point 3 above, and see it expanded. libc++ implements C++11 and subsequent standards, and that makes me wonder:
>
> Which standards would this libc implement?

We need parts of the C standard library, parts of the POSIX
extensions, and also the linux headers. The community is of course
free to widen the surface as needed.

> Would you implement upcoming C standards, and how would you manage “experimental” features (API changes, ABI changes, etc)?

We will probably take this up on an as-needed basis.

> What parts of the standard wouldn’t you follow, why, how would the LLVM community determine this?

I would think what we (the "we" here is for the developer community
and not my company or my team) communicate would depend on how the
project evolves. For example, at the very beginning, we will probably
only say "large parts of the standards A, B, C are still
unimplemented." When the implemented surface becomes large enough, we
might start explicitly listing the unimplemented parts. There might be
parts which require qualification with version numbers.

> Which parts aren’t worth implementing?
> Which parts cannot be safely used in modern coding practice? How would you remedy what’s perceived as “the bad parts”?

At a certain level, what is worth and what is safe/unsafe is a
subjective matter. So, instead of listing my opinions here, let me say
this: If we build sufficient modularity into the libc, one will be
able to pick and choose what they want, and omit what they do not
want.

> I’d love it if the C Standards Committee, WG14, got renewed involvement through this project. Is that an explicit goal? Who will join WG14 in this effort?
> What part of C do you see this project help improve over time?

The answer to this question also depends on how the project and the
community around it evolves.

> How do you intend to test this C library? Fuzzing and all that is nice, but just straight conformance testing is what I’d like to hear about.

What kind of testing we want to do depends on what exactly is getting
tested. But in general, we want to do conformance tests for sure. We
also want to do some amount of differential testing between this new
libc and an existing, battle tested libc. Depending on what is getting
tested, we also want to be able to test against the test suite of an
existing libc.
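The differential testing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's test design: `candidate_strlen` is a hypothetical stand-in for a function from the new libc, and the system libc's `strlen` plays the role of the existing, battle-tested reference.

```c
#include <stddef.h>
#include <string.h>  /* strlen from the system libc is the reference */

/* Hypothetical implementation under test. */
static size_t candidate_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Runs both implementations on every input and returns the number of
   inputs on which their results disagree. */
size_t diff_test_strlen(const char *const *inputs, size_t count) {
    size_t mismatches = 0;
    for (size_t i = 0; i < count; i++) {
        if (candidate_strlen(inputs[i]) != strlen(inputs[i]))
            mismatches++;
    }
    return mismatches;
}
```

A result of zero only says that the two implementations agree on the inputs tried, so in practice the input set would come from a fuzzer or an existing test suite rather than a short hand-written list.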

JF Bastien via llvm-dev

Jun 27, 2019, 8:19:44 PM6/27/19
to Siva Chandra, llvm-dev, Marshall Clow
On Jun 27, 2019, at 5:05 PM, Siva Chandra <sivac...@google.com> wrote:

On Wed, Jun 26, 2019 at 10:27 AM JF Bastien <jfba...@apple.com> wrote:
3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.

I’d love to hear what you have in mind with point 3 above, and see it expanded. libc++ implements C++11 and subsequent standards, and that makes me wonder:

Which standards would this libc implement?

We need parts of the C standard library, parts of the POSIX
extensions, and also the linux headers. The community is of course
free to widen the surface as needed.

Which standard specifically? So far the responses sound like “the standard Google uses”. I don’t think that's a good objective for such a project. For practical purposes that’s the implementation approach that makes sense to start with, but I’m looking for what the charter of this LLVM project should be.

Compare with libc++: https://libcxx.llvm.org
I think you want to fill out a proposed set of documentation pages, like libc++’s, and answer the questions libc++ answers. Not where you’ll start or in what order (though that’s useful for this discussion!), but what your proposed libc aspires to be.


Would you implement upcoming C standards, and how would you manage “experimental” features (API changes, ABI changes, etc)?

We will probably take this up on an as-needed basis.

Same as above, IMO an LLVM project should aspire to something bigger, even if practical concerns guide the initial implementation.


What parts of the standard wouldn’t you follow, why, how would the LLVM community determine this?

I would think what we (the "we" here is for the developer community
and not my company or my team) communicate would depend on how the
project evolves. For example, at the very beginning, we will probably
only say "large parts of the standards A, B, C are still
unimplemented." When the implemented surface becomes large enough, we
might start explicitly listing the unimplemented parts. There might be
parts which require qualification with version numbers.

Which parts aren’t worth implementing?
Which parts cannot be safely used in modern coding practice? How would you remedy what’s perceived as “the bad parts”?

At a certain level, what is worth and what is safe/unsafe is a
subjective matter. So, instead of listing my opinions here, let me say
this: If we build sufficient modularity into the libc, one will be
able to pick and choose what they want, and omit what they do not
want.

I’d love it if the C Standards Committee, WG14, got renewed involvement through this project. Is that an explicit goal? Who will join WG14 in this effort?
What part of C do you see this project help improve over time?

The answer to this question also depends on how the project and the
community around it evolves.

Personally I’m really interested in a project that increases the quality of all C libraries, and of the C standard. I therefore think champions of this project signing up to collaborate with WG14 is important.


How do you intend to test this C library? Fuzzing and all that is nice, but just straight conformance testing is what I’d like to hear about.

What kind of testing we want to do depends on what exactly is getting
tested. But in general, we want to do conformance tests for sure. We
also want to do some amount of differential testing between this new
libc and an existing, battle tested libc. Depending on what is getting
tested, we also want to be able to test against the test suite of an
existing libc.

I think again, it’s useful to look at libc++ here, and see its testing strategy. It tests against multiple standards, calling out what it’s testing exactly, and it also tests extensions and other non-standard things, calling out when it does so. This allows, for example, the Microsoft STL implementors to use the libc++ test suite.

I think you need to write a design for how this C library will be tested.


I suggest you have a chat with Marshall Clow (CC’ed). He does a lot of really good work with libc++ and the C++ Standards Committee. I’d like this C library to be similar to libc++ in many ways, and I’d like a leader like Marshall involved in leading this C library. Talking to Marshall will help understand the type of leadership I’d like to see in this project.



Ori Bernstein via llvm-dev

Jun 27, 2019, 8:20:28 PM6/27/19
to Siva Chandra, Siva Chandra via llvm-dev
On Thu, 27 Jun 2019 15:43:08 -0700
Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:

> On Thu, Jun 27, 2019 at 2:05 PM Chris Lattner via llvm-dev
> <llvm...@lists.llvm.org> wrote:
> >
> > Saleem, Owen, others on the thread who are concerned about this: it seems
> > that some of the concern is that the project goals are too narrow, and
> > thus the eventual result may not serve the full community well over time.
>
> Maybe my email listing our goals is being misinterpreted as being the
> bounding set of goals for the project. So, let me make it clear again:
> The goals I have listed are just our initial set of goals for the
> project. Members of the community are of course free to add their own
> goals to this set, implement them, and make it a "full solution." I
> have also mentioned in some of my earlier emails that we do not intend
> to design out any particular feature or platform. For example, I have
> said that we do not intend to work on dynamic linking/loading at least
> to begin with. This does not mean that the scope of the project is
> curtailed to static linking. The members of the community are free to
> add support for dynamic linking/loading. In fact, if dynamic
> linking/loading support is added in a modular/"as a library" fashion,
> it makes it a win-win situation as we will be able to take it out if
> we do not require it.

The discussion here makes me strongly suspect that this libc will remain a
linux-only implementation.

OpenBSD, and I think most other BSDs, OSX, Solaris, and others consider libc
an integral part of the system, and modify the ABI between the kernel and libc
with varying frequency. How would you want llvm libc to handle, for example,
OpenBSD's 64 bit time_t transition? There will be other situations like it.

I don't think a Linux-only solution should be adopted by LLVM, and I think that
using a non-system libc is something that will cause more pain than it's worth
outside of cases where someone has full platform control.

I wouldn't mind being proven wrong, maybe people will jump in, port it, and
maintain it on multiple platforms. I'd like to see this happen *before* this
libc was put under the LLVM umbrella.

Libcs can be written outside of LLVM, and code can be imported after it's
in wider use.

But then again, I'm mostly an observer.

--
Ori Bernstein <o...@eigenstate.org>

Zachary Turner via llvm-dev

Jun 27, 2019, 8:22:49 PM6/27/19
to Saleem Abdulrasool, llvm-dev
On Thu, Jun 27, 2019 at 4:58 PM Saleem Abdulrasool <comp...@compnerd.org> wrote:
For what it is worth, I do believe that these files do really belong in the libc project because they are so intricately tied to the implementation of the language.  I just think that the fact these files will be part of the project is merely an implementation detail and should not even be part of the discussion here.
 
It's relevant in the sense that any libc implementation on Windows will *require* these files to be part of the implementation.  You cannot (as far as I'm aware) borrow the ones from MSVCRT and then implement everything else yourself.

Saleem Abdulrasool via llvm-dev

Jun 27, 2019, 8:30:52 PM6/27/19
to Zachary Turner, llvm-dev
I feel like I have an even stronger “claim”: independent of any OS/architecture, unless you are developing a freestanding libc for an embedded device, you will need this at some point and you cannot borrow them from another source (which has long been the point of contention about adding these files to the compiler-rt project - they are tied entirely to the libc implementation).  As a result, the fact that they will exist is not important to the discussion, it is a given and an implementation detail of the runtime.

Saleem Abdulrasool via llvm-dev

Jun 27, 2019, 8:42:03 PM6/27/19
to Chris Lattner, llvm-dev
On Thu, Jun 27, 2019 at 5:04 PM Chris Lattner <clat...@nondot.org> wrote:
Makes sense, but beyond “capabilities” of the eventual result, I think it is important to focus on *design* points, since they are the things that will shape the result as the effort goes from nothing to something.

Yes, design points are important.  However, I think understanding what it is which is being built is equally important.  Without understanding what exactly the capabilities are, you can design something which doesn’t meet any of the requirements.
 

For example, I think that subset-ability is really important, so imo a build system that allows compiling different configurations is really important.  This implies some configuration description, an internal dependence graph, a way to depend on targets with multiple possible implementations, etc.

I actually am interested in this - it would enable something like a complete dynamically linked libc which can be used on embedded environments.  I imagine that an excellent way to slice the library would be along the subsections of Section 7 of the C11 specification (e.g. stdio, localization, mathematics, etc).
 

Getting that right would also make compiler_rt a bit nicer.

I am afraid I really don’t understand this part.  Could you please explain how this would improve compiler-rt?  I suspect my understanding is incorrect on something as compiler-rt is really three things:

- the sanitizers (which can be split into their own repository/directory/etc)
- a collection of odds and ends (e.g. the Fuchsia C runtime support)
- the builtins (which are extracted functions for performing common operations which the CPU may not support or can become expensive to inline everywhere or be unwieldy to implement inline in the compiler)

Which bit would be influenced by the design decisions for a libc?

Finkel, Hal J. via llvm-dev

Jun 27, 2019, 9:08:55 PM6/27/19
to Ori Bernstein, Siva Chandra, Siva Chandra via llvm-dev
In my experience, this approach always leads to an inferior result.
LLVM's code-review and community-feedback processes, when followed from
the very beginning, often lead to a high-quality implementation with
sufficiently broad applicability (where broad is defined by the
interests of those reviewing the code). Importing code developed outside
of the community sometimes works out well and serves a community need,
but obtaining real integration with the rest of the community takes
significant time and effort (in practice, years). I'd rather have it
start as an LLVM project and, if it fails as a community project, get
spun off, than the other way around.

 -Hal


>
> But then again, I'm mostly an observer.
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Siva Chandra via llvm-dev

Jun 28, 2019, 1:15:59 AM6/28/19
to Zachary Turner, llvm-dev
On Thu, Jun 27, 2019 at 4:45 PM Zachary Turner <ztu...@roblox.com> wrote:
>
> The difference seems to be that in Windows's version of libc (which they dub "The CRT") you can't take one without the other. You get program startup and termination logic + everything else, and you can't pick and choose.

Now I understand why you want to start with a clean slate for Windows.
As I said in my earlier email, it should be OK for a platform to not
use the redirector strategy if it cannot be implemented in a
straightforward manner.

Mehdi AMINI via llvm-dev

Jun 28, 2019, 1:28:46 AM6/28/19
to Siva Chandra, llvm-dev
<disclaimer: I work at Google, though not on anything related to this project>

On Thu, Jun 27, 2019 at 3:43 PM Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:
On Thu, Jun 27, 2019 at 2:05 PM Chris Lattner via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> Saleem, Owen, others on the thread who are concerned about this: it seems that some of the concern is that the project goals are too narrow, and thus the eventual result may not serve the full community well over time.

Maybe my email listing our goals is being misinterpreted as being the
bounding set of goals for the project. So, let me make it clear again:
The goals I have listed are just our initial set of goals for the
project. Members of the community are of course free to add their own
goals to this set, implement them, and make it a "full solution." I
have also mentioned in some of my earlier emails that we do not intend
to design out any particular feature or platform. For example, I have
said that we do not intend to work on dynamic linking/loading at least
to begin with. This does not mean that the scope of the project is
curtailed to static linking. The members of the community are free to
add support for dynamic linking/loading. In fact, if dynamic
linking/loading support is added in a modular/"as a library" fashion,
it makes it a win-win situation as we will be able to take it out if
we do not require it.

When you write that "Members of the community are of course free to add their own goals to this set", it seems that unless others commit to putting immediate effort into expanding the scope, the design will be limited to your use case (Linux/x86_64).

I still have a concern with this: your use case seems fairly restrictive to guide the design of a library that is supposed to generalize (assuming it can; apparently not everyone is convinced).
My take is that your scope is too restrictive to be really useful. While it is perfectly fine for you to focus on the target you care about, I'd like to see other parties that are interested in other targets ready to engage in the development of this library from the beginning before considering this a viable project to be developed under the LLVM umbrella.

This is just my personal opinion; others may very well disagree.

Best,

-- 
Mehdi

Siva Chandra via llvm-dev

Jun 28, 2019, 1:29:46 AM6/28/19
to JF Bastien, llvm-dev, Marshall Clow
On Thu, Jun 27, 2019 at 5:19 PM JF Bastien <jfba...@apple.com> wrote:
> On Jun 27, 2019, at 5:05 PM, Siva Chandra <sivac...@google.com> wrote:
> On Wed, Jun 26, 2019 at 10:27 AM JF Bastien <jfba...@apple.com> wrote:
>
> 3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.
>
>
> I’d love to hear what you have in mind with point 3 above, and see it expanded. libc++ implements C++11 and subsequent standards, and that makes me wonder:
>
> Which standards would this libc implement?
>
>
> We need parts of the C standard library, parts of the POSIX
> extensions, and also the linux headers. The community is of course
> free to widen the surface as needed.
>
>
> Which standard specifically? So far the responses sound like “the standard Google uses”.

I was of the opinion that you were asking me to elaborate point #3 of
mine from above.

> I don’t think that's a good objective for such a project. For practical purposes that’s the implementation approach that makes sense to start with, but I’m looking for what the charter of this LLVM project should be.

I want to refrain from talking as if this libc project has already
been accepted by LLVM. But yes, if this libc project is indeed
accepted and takes off, we will definitely want a charter written down
for this as an LLVM project. And I also agree that this charter cannot
limit itself to Google's use cases.

> Compare with libc++: https://libcxx.llvm.org

Yes. Our aspirations for this libc are to be like libc++.

> I think you want to fill out a proposed set of documentation pages, like libc++’s, and answer the questions libc++ answers. Not where you’ll start or in what order (though that’s useful for this discussion!), but what your proposed libc aspires to be.

Absolutely!

> Same as above, IMO an LLVM project should aspire to something bigger, even if practical concerns guide the initial implementation.

Again, I want to wait for some sort of confirmation that we can
actually start work on this as an LLVM project.

> Personally I’m really interested in a project that increases the quality of all C libraries, and of the C standard. I therefore think champions of this project signing up to collaborate with WG14 is important.

I do not disagree. At the same time, I am of the opinion that such a
champion should grow out of this project rather than getting
volunteered or nominated. This is my personal opinion and I am ready
to be corrected.

> I think you need to write a design for how this C library will be tested.

I can assure you that all this will happen once we take off.

> I suggest you have a chat with Marshall Clow (CC’ed). He does a lot of really good work with libc++ and the C++ Standards Committee. I’d like this C library to be similar to libc++ in many ways, and I’d like a leader like Marshall involved in leading this C library. Talking to Marshall will help understand the type of leadership I’d like to see in this project.

Experienced guidance is most welcome. And, thanks a lot for bringing
up everything you have done in this email. I also apologize for the
delay in my response, so thanks for your patience as well.

Cranmer, Joshua via llvm-dev

Jun 28, 2019, 9:35:44 AM6/28/19
to Finkel, Hal J., Siva Chandra, Chris Lattner, llvm...@lists.llvm.org
From: llvm-dev [mailto:llvm-dev...@lists.llvm.org] On Behalf Of Finkel, Hal J. via llvm-dev
> One of my primary use cases for an LLVM libc is to take a subset of it and link
> it with our OpenMP device-side runtime library, or into code being compiled
> for CUDA/HIP/SYCL/etc. (so that we can support compiling code for
> accelerators (e.g., GPUs) that happens to call snprintf (or
> whatever) across platforms from a variety of vendors).

IMHO, this is a very useful use case for an LLVM libc. But I worry that a libc that is driven by the desire to replace a fully functional POSIX library on architectures running full OSes would not be the sort of libc that is well-placed to be ported to architectures that have minimal to no OS capabilities.

It would be useful to take a step back and collect the features that all of the interested stakeholders would want in an LLVM libc before moving forward on a potential implementation, or even deciding if there is enough common ground to make a libc that would satisfy everyone's needs.

JF Bastien via llvm-dev

Jun 28, 2019, 12:29:57 PM6/28/19
to Siva Chandra, llvm-dev, Marshall Clow
I think I now understand some of the disconnect you and I are having, and I think some of the pushback you’re getting from the community is the same. You’re talking about where you want to start with an LLVM libc. Many in the community (myself included) want to understand where we’ll get with this libc. At steady-state, what does it do? To a certain degree I don’t care about how you get to the steady state: sure the implementation approach is important, and which contributor cares about what parts is important in shaping that evolution, but at the end of the day what matters is where you get.

So here’s what’s missing: there’s no goal. Right now, your proposal is “let’s do an LLVM libc, starting with what I care about, who’s interested?”

That’s an OK place to start! You illustrated your needs, others chimed in with theirs, and now you know there’s some interest. However, you should take time now to come up with a plan. What’s this libc actually going to be? I ask a bunch of questions below that I think you need to answer as a next step. Others asked more questions which I didn’t echo, but which you should answer as well. What does this libc aspire to become?


More below:

On Jun 27, 2019, at 10:29 PM, Siva Chandra <sivac...@google.com> wrote:

On Thu, Jun 27, 2019 at 5:19 PM JF Bastien <jfba...@apple.com> wrote:
On Jun 27, 2019, at 5:05 PM, Siva Chandra <sivac...@google.com> wrote:
On Wed, Jun 26, 2019 at 10:27 AM JF Bastien <jfba...@apple.com> wrote:

3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.


I’d love to hear what you have in mind with point 3 above, and see it expanded. libc++ implements C++11 and subsequent standards, and that makes me wonder:

Which standards would this libc implement?


We need parts of the C standard library, parts of the POSIX
extensions, and also the linux headers. The community is of course
free to widen the surface as needed.


Which standard specifically? So far the responses sound like “the standard Google uses”.

I was of the opinion that you were asking me to elaborate point #3 of
mine from above.

I indeed was. I’d like a list of C / POSIX standards the library will try to conform to.

e.g. libc++ only really implements C++11 and later standards, some libstdc++ compatibility extensions, some experimental stuff from TSes, and a handful of other things. What’s your list?


I don’t think that's a good objective for such a project. For practical purposes that’s the implementation approach that makes sense to start with, but I’m looking for what the charter of this LLVM project should be.

I want to refrain from talking as if this libc project has already
been accepted by LLVM. But yes, if this libc project is indeed
accepted and takes off, we will definitely want a charter written down
for this as an LLVM project. And I also agree that this charter cannot
limit itself to Google's use cases.

You want a charter, before the project is accepted.


Compare with libc++: https://libcxx.llvm.org

Yes. Our aspirations for this libc are to be like libc++.

I think you want to fill out a proposed set of documentation pages, like libc++’s, and answer the questions libc++ answers. Not where you’ll start or in what order (though that’s useful for this discussion!), but what your proposed libc aspires to be.

Absolutely!

Same as above, IMO an LLVM project should aspire to something bigger, even if practical concerns guide the initial implementation.

Again, I want to wait for some sort of confirmation that we can
actually start work on this as an LLVM project.

You have enough tentative support and interested contributors to warrant writing down a plan.


Personally I’m really interested in a project that increases the quality of all C libraries, and of the C standard. I therefore think champions of this project signing up to collaborate with WG14 is important.

I do not disagree. At the same time, I am of the opinion that such a
champion should grow out of this project rather than getting
volunteered or nominated. This is my personal opinion and I am ready
to be corrected.

Having this kind of champion is really important for an LLVM libc. I’m not sure I’d support such a project without such a person. As you come up with a plan, consider who that should be. Maybe it’s you :-)


I think you need to write a design for how this C library will be tested.

I can assure you that all this will happen once we take off.

You want a plan before it takes off. Testing standardized stuff has enough precedent that you should be able to look at what others have done, and come up with a plan up front. I really like that you want to fuzz, use sanitizers, etc. That’s pretty novel for this kind of project. Basic standards testing isn’t novel, so it should be pretty easy to figure out.


I suggest you have a chat with Marshall Clow (CC’ed). He does a lot of really good work with libc++ and the C++ Standards Committee. I’d like this C library to be similar to libc++ in many ways, and I’d like a leader like Marshall involved in leading this C library. Talking to Marshall will help understand the type of leadership I’d like to see in this project.

Experienced guidance is most welcome. And, thanks a lot for bringing
up everything you have done in this email. I also apologize for the
delay in my response, so thanks for your patience as well.

No worries! You’ve got a lot of responses, and that’s good.

Stephen Canon via llvm-dev

Jun 28, 2019, 12:58:28 PM6/28/19
to Siva Chandra, llvm-dev
On Jun 26, 2019, at 2:20 PM, Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:

5. Avoid assembly language as far as possible - Again, there will be
places where one cannot avoid assembly level implementations. But,
wherever possible, we want to avoid assembly level implementations.
There are a few reasons here as well:

a) We want to leverage the compiler for performance wherever possible,
and as part of the LLVM project, fix compiler bugs rather than use
assembly.

As a long time libm and libc developer, and occasional compiler contributor, I will point out that this is either fundamentally in conflict with your other stated goals, entails a commitment to wide-ranging compiler improvements, or requires some very specific choices about your implementation. Much of a libc can be implemented quite easily in C or C++. However:

- You say you want to conform to relevant standards; however, e.g. the UNIX test suite requires that math.h functions not set spurious flags. This is impossible to reliably achieve in C with clang, because clang and LLVM do not precisely model the floating-point environment. On Apple’s platforms, much of the math library is written in assembly as much for this reason as for performance. I see four basic options for you here:

1. You could partially work around this by adding builtins and an extensive conformance suite, making your implementations fragile to compiler optimization but detecting the breakages immediately. 
2. You could do the work of precisely modeling the floating-point environment.
3. You could simply declare that you are not going to care about flags at all, which is fine for 99% of users, but is a clear break from relevant standards (and would make your libc unable to be adopted by some platform maintainers).
4. You could implement significant pieces of the math library in assembly.

None of these is a decision to be undertaken lightly. Have you thought about this issue at all?

I would also be curious what your plans are with regard to reproducible results in the math library: is it your intention to produce the same result on all platforms? On all microarchitectures? If so, and you’re developing for baseline x86_64 first, you’re locking yourself out of using many architectural features that are critical to delivering 30-50% of performance for these functions on other platforms (and even on newer x86), such as static rounding control and FMA. Even if you don’t care about that, implementation choices you make around x86_64 will severely restrict your performance on other platforms if exact reproducibility is a requirement and you don’t carefully choose a set of “required ISA operations” on which to implement your math functions.

- For most platforms, there are significant performance wins available for some of the core strings and memory functions using assembly, even as compared to the best compiler auto-vectorization output. There are a few reasons for this, but one of the major ones is that—in assembly, on most architectures—we can safely do aligned memory accesses that are partially outside the buffer that has been passed in, and mask off or ignore the bytes that are invalid. This is a hugely significant optimization for edging around core vector loops, and it’s simply unavailable in C and C++ because of the abstract memory models they define. A compiler could do this for you automatically, but this is not yet implemented in LLVM (and you don’t want to be tightly coupled to LLVM, anyway?) In practice, on many systems, the small-buffer case dominates usage for these functions, so getting the most efficient edging code is basically the only thing that matters.

1. Are you going to teach LLVM to perform these optimizations? If so, awesome, but this is not at all a small project—you’re not just fixing an isolated perf bug, you’re fundamentally reworking autovectorization. What about other compilers?
2. Are you going to simply write off performance in these cases and let the autovectorizer do what it does?
3. Will you use assembly instead purely for optimization purposes?
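
As an illustration of the edging problem described above (a minimal sketch, not anything proposed in this thread), a portable byte search may only read whole words that lie entirely inside the buffer, so it has to walk the unaligned head and the short tail one byte at a time. The name `sketch_memchr` is hypothetical, and the word-at-a-time zero-byte test is a well-known bit trick rather than code taken from any libc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical name; not part of any real libc.
const void *sketch_memchr(const void *s, int c, std::size_t n) {
  assert(s != nullptr || n == 0);
  const unsigned char *p = static_cast<const unsigned char *>(s);
  const unsigned char target = static_cast<unsigned char>(c);
  constexpr std::size_t W = sizeof(std::uint64_t);

  // Head: byte loop until p is word-aligned (or the buffer ends).
  while (n > 0 && reinterpret_cast<std::uintptr_t>(p) % W != 0) {
    if (*p == target) return p;
    ++p;
    --n;
  }

  // Body: only words that lie entirely inside the buffer are read.
  const std::uint64_t ones = 0x0101010101010101ULL;
  const std::uint64_t highs = 0x8080808080808080ULL;
  const std::uint64_t pattern = ones * target;  // target in every byte
  while (n >= W) {
    std::uint64_t word;
    std::memcpy(&word, p, W);
    const std::uint64_t x = word ^ pattern;  // zero byte where a match is
    if ((x - ones) & ~x & highs) break;      // some byte in this word matches
    p += W;
    n -= W;
  }

  // Tail: remaining bytes, including the word flagged above.
  for (; n > 0; ++p, --n) {
    if (*p == target) return p;
  }
  return nullptr;
}
```

An assembly version can instead issue aligned loads that overlap the buffer edges and mask off the invalid bytes, which removes both byte loops. For small buffers those loops are most of the work.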

A bunch of other questions will probably come to me around the math library, but I would encourage you to think very carefully about what specifications you want to have for a libm before you start building one. All that said, I think having more libc implementations is great, but I would be very careful to define what design tradeoffs you’re making around these choices and to what spec(s) you plan to conform, and why they necessitate a new libc rather than adapting an existing one.

– Steve

Siva Chandra via llvm-dev

Jun 28, 2019, 1:15:13 PM
to JF Bastien, llvm-dev, Marshall Clow
On Fri, Jun 28, 2019 at 9:29 AM JF Bastien <jfba...@apple.com> wrote:
>
> I think I now understand some of the disconnect you and I are having, and I think some of the pushback you’re getting from the community is the same. You’re talking about where you want to start with an LLVM libc. Many in the community (myself included) want to understand where we’ll get with this libc. At steady-state, what does it do? To a certain degree I don’t care about how you get to the steady state: sure the implementation approach is important, and which contributor cares about what parts is important in shaping that evolution, but at the end of the day what matters is where you get.
>
> So here’s what’s missing: there’s no goal. Right now, your proposal is “let’s do an LLVM libc, starting with what I care about, who’s interested?”
>
> That’s an OK place to start! You illustrated your needs, others chimed in with theirs, and now you know there’s some interest. However, you should take time now to come up with a plan. What’s this libc actually going to be? I ask a bunch of questions below that I think you need to answer as a next step. Others asked more questions which I didn’t echo, but which you should answer as well. What does this libc aspire to become?

After my first step, my first email to this thread, I was waiting for
someone to drive me towards a process. Your email now has given me
sufficient guidance on how to proceed forward. So thank you for that.

In the coming days, I will start sharing/discussing the information
you are expecting.

Stan Shebs via llvm-dev

Jun 28, 2019, 2:52:16 PM
to Stephen Canon, llvm-dev
On Fri, Jun 28, 2019 at 11:58 AM Stephen Canon via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> On Jun 26, 2019, at 2:20 PM, Siva Chandra via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> [...]

>
> 1. You could partially work around this by adding builtins and an extensive conformance suite, making your implementations fragile to compiler optimization but detecting the breakages immediately.
> 2. You could do the work of precisely modeling the floating-point environment.
> 3. You could simply declare that you are not going to care about flags at all, which is fine for 99% of users, but is a clear break from relevant standards (and would make your libc unable to be adopted by some platform maintainers).
> 4. You could implement significant pieces of the math library in assembly.

I'm no math expert, but I tangle with clang vs glibc's math code
regularly, and have discussed all this with Siva.

It's too early to say exactly what the implementation will look like,
but I anticipate it will be a combination of 1) and 2). There's
really no alternative to having a mode that does accurate flag
handling, but if the compiler has both library sources and call sites
in hand, it should be able to determine whether it needs to include,
say, underflow handling, and only compile in those parts. We've
handicapped ourselves somewhat by having shifted to a model where the
library functions are black boxes because of dynamic linking, and I
think we can do better than just introducing more and more ifuncs or
whatever.
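
To make "accurate flag handling" concrete, here is a small C++ sketch of what a conforming caller can observe. The helper name is hypothetical and the code is not from this thread. It assumes the platform defines `FE_UNDERFLOW`, and it uses `volatile` only to keep the compiler from folding the multiplication at compile time.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: reports whether computing a * b raises the
// floating-point underflow flag in the current environment.
bool product_raises_underflow(double a, double b) {
  volatile double x = a;
  volatile double y = b;

  // Start from a clean set of exception flags.
  int rc = std::feclearexcept(FE_ALL_EXCEPT);
  assert(rc == 0);
  (void)rc;

  // The multiplication happens at run time because x and y are volatile.
  volatile double product = x * y;
  (void)product;

  return std::fetestexcept(FE_UNDERFLOW) != 0;
}
```

A math routine that honors the flags has to leave the same flags set as the operation it implements would, which is why the compiler needs to know whether a given call site can observe them. Strictly, code that inspects the flags should also enable the `FENV_ACCESS` pragma; compiler support for it varies, so it is left out of this sketch.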

I also expect there will be more work to do in the compiler, both for
builtins and for additional optimizations, and to me that is part of
the rationale to put the libc project under LLVM in general. There
won't be any secrets - if GCC folks want to try their hand at
compiling this libc, they're welcome to it - but there will be some
opportunities to co-develop library code that takes advantage of new
compiler abilities and vice versa.

> - For most platforms, there are significant performance wins available for some of the core strings and memory functions using assembly, even as compared to the best compiler auto-vectorization output. There are a few reasons for this, but one of the major ones is that—in assembly, on most architectures—we can safely do aligned memory accesses that are partially outside the buffer that has been passed in, and mask off or ignore the bytes that are invalid. This is a hugely significant optimization for edging around core vector loops, and it’s simply unavailable in C and C++ because of the abstract memory models they define. A compiler could do this for you automatically, but this is not yet implemented in LLVM (and you don’t want to be tightly coupled to LLVM, anyway?) In practice, on many systems, the small-buffer case dominates usage for these functions, so getting the most efficient edging code is basically the only thing that matters.

Google does have a little experience in this area, mem* being the libc
functions that perennially show up at the top of fleetwide performance
profiles. (Lots of protobufs to move, I guess. :-) ) I imagine there
will be both assembly and high-level versions in libc, and it will be
the compiler's challenge to meet or beat the assembly code.


>
> 1. Are you going to teach LLVM to perform these optimizations? If so, awesome, but this is not at all a small project—you’re not just fixing an isolated perf bug, you’re fundamentally reworking autovectorization. What about other compilers?
> 2. Are you going to simply write off performance in these cases and let the autovectorizer do what it does?
> 3. Will you use assembly instead purely for optimization purposes?
>
> A bunch of other questions will probably come to me around the math library, but I would encourage you to think very carefully about what specifications you want to have for a libm before you start building one. All that said, I think having more libc implementations is great, but I would be very careful to define what design tradeoffs you’re making around these choices and to what spec(s) you plan to conform, and why they necessitate a new libc rather than adapting an existing one.
>
> – Steve

Jean-Daniel via llvm-dev

Jun 28, 2019, 5:31:33 PM
to Saleem Abdulrasool, llvm-dev

> On 28 June 2019, at 02:30, Saleem Abdulrasool via llvm-dev <llvm...@lists.llvm.org> wrote:
>
>
> I feel like I have an even stronger “claim”: independent of any OS/architecture, unless you are developing a freestanding libc for an embedded device, you will need this at some point and you cannot borrow them from another source

They are not required if your kernel and dynamic linker are smart enough to perform the required initialisations / termination.

On macOS, there was a time when each binary included a crt.o file that defined the _start symbol, which was the software entry point and performed the required initialisation before calling main. For some time now, however, dyld has called "main" directly rather than "_start", and the compiler no longer has to include a crt.o file in each binary.

Siva Chandra via llvm-dev

Jul 12, 2019, 11:16:07 AM
to JF Bastien, llvm-dev, Marshall Clow
On Fri, Jun 28, 2019 at 9:29 AM JF Bastien <jfba...@apple.com> wrote:
>
> I think I now understand some of the disconnect you and I are having, and I think some of the pushback you’re getting from the community is the same. You’re talking about where you want to start with an LLVM libc. Many in the community (myself included) want to understand where we’ll get with this libc. At steady-state, what does it do? To a certain degree I don’t care about how you get to the steady state: sure the implementation approach is important, and which contributor cares about what parts is important in shaping that evolution, but at the end of the day what matters is where you get.
>
> So here’s what’s missing: there’s no goal. Right now, your proposal is “let’s do an LLVM libc, starting with what I care about, who’s interested?”
>
> That’s an OK place to start! You illustrated your needs, others chimed in with theirs, and now you know there’s some interest. However, you should take time now to come up with a plan. What’s this libc actually going to be? I ask a bunch of questions below that I think you need to answer as a next step. Others asked more questions which I didn’t echo, but which you should answer as well. What does this libc aspire to become?

I apologize for the delay. I will try to address the above questions
in this email. I will shortly follow up with answers to other
questions.

Below is a write up which I think would qualify as the "charter" for
the new libc. It is also answering questions like, "where we’ll get
with this libc?", "what's this libc actually going to be?" and similar
ones. I have used libcxx.llvm.org landing page as a template to write
it down.

###############################################

"llvm-libc" C Standard Library
========================

llvm-libc is an implementation of the C standard library targeting C11
and above. It also provides platform specific extensions as relevant.
For example, on Linux it also provides pthreads, librt and other POSIX
extension libraries.

Documentation
============

The llvm-libc project is still in the planning phase. Stay tuned for
updates soon.

Features and Goals
================

* C11 and upwards conformant.
* A modular libc with individual pieces implemented in the "as a
library" philosophy of the LLVM project.
* Ability to layer this libc over the system libc.
* Provide C symbols as specified by the standards, but take advantage
and use C++ language facilities for the core implementation.
* Provides POSIX extensions on POSIX compliant platforms.
* Provides system-specific extensions as appropriate. For example,
provides the Linux API on Linux.
* Vendor extensions if and only if necessary.
* Designed and developed from the start to work with LLVM tooling and
testing like fuzz testing and sanitizer-supported testing.
* ABI independent implementation as far as possible.
* Use source based implementations as far as possible rather than
assembly. Will try to “fix” the compiler rather than use assembly
language workarounds.

Why a new C Standard Library?
=========================

Implementing a libc is no small task and is not to be taken lightly. A
natural question to ask is, "why a new implementation of the C
standard library?" There is no single answer to this question, but
some of the major reasons are as follows:

* Most libc implementations are monolithic. It is a non-trivial
porting task to pick and choose only the pieces relevant to one's
platform. The new libc will be developed with sufficient modularity to
make picking and choosing a straightforward task.
* Most libc implementations break when built with sanitizer specific
compiler options. The new libc will be developed from the start to
work with those specialized compiler options.
* The new libc will be developed to support and employ fuzz testing
from the start.
* Most libc implementations use a good amount of assembly language,
and assume specific ABIs (may be platform dependent). With the new
libc implementation, we want to use normal source code as much as
possible so that compiler-based changes to the ABI are easy. Moreover,
as part of the LLVM project, we want to use this opportunity to fix
performance related compiler bugs rather than using assembly
workarounds.
* A large hole in the llvm toolchain will be plugged with this new
libc. With the broad platform expertise in the LLVM community, and the
strong license and project structure, we think that the new libc will
be more tunable and robust, without sacrificing the simplicity and
accessibility typical of the LLVM project.

Platform Support
==============

llvm-libc development is still in the planning phase. However, we
envision that it will support a variety of platforms in the coming
years. Interested parties are encouraged to participate in the design
and implementation, and add support for their favorite platforms.

Current Status
============

llvm-libc development is still in the planning phase.

Build Bots
=========

Coming soon.

Get involved!
===========

First please review our Developer's Policy. Stay tuned for llvm-libc
specific information.

Design Documents
===============

Coming soon.

Aaron Ballman via llvm-dev

Jul 12, 2019, 11:32:14 AM
to Siva Chandra, llvm-dev, Marshall Clow
On Fri, Jul 12, 2019 at 11:16 AM Siva Chandra via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> On Fri, Jun 28, 2019 at 9:29 AM JF Bastien <jfba...@apple.com> wrote:
> >
> > I think I now understand some of the disconnect you and I are having, and I think some of the pushback you’re getting from the community is the same. You’re talking about where you want to start with an LLVM libc. Many in the community (myself included) want to understand where we’ll get with this libc. At steady-state, what does it do? To a certain degree I don’t care about how you get to the steady state: sure the implementation approach is important, and which contributor cares about what parts is important in shaping that evolution, but at the end of the day what matters is where you get.
> >
> > So here’s what’s missing: there’s no goal. Right now, your proposal is “let’s do an LLVM libc, starting with what I care about, who’s interested?”
> >
> > That’s an OK place to start! You illustrated your needs, others chimed in with theirs, and now you know there’s some interest. However, you should take time now to come up with a plan. What’s this libc actually going to be? I ask a bunch of questions below that I think you need to answer as a next step. Others asked more questions which I didn’t echo, but which you should answer as well. What does this libc aspire to become?
>
> I apologize for the delay. I will try to address the above questions
> in this email. I will shortly follow up with answers to other
> questions.
>
> Below is a write up which I think would qualify as the "charter" for
> the new libc. It is also answering questions like, "where we’ll get
> with this libc?", "what's this libc actually going to be?" and similar
> ones. I have used libcxx.llvm.org landing page as a template to write
> it down.
>
> ###############################################
>
> "llvm-libc" C Standard Library
> ========================
>
> llvm-libc is an implementation of the C standard library targeting C11
> and above.

Any particular reason for C11 as opposed to C17?

~Aaron

Sjoerd Meijer via llvm-dev

Jul 12, 2019, 11:44:30 AM
to JF Bastien, Siva Chandra, llvm-dev, Marshall Clow

I think this was mentioned before by JF, but regarding this testing point in the new write-up:

Designed and developed from the start to work with LLVM tooling and testing like fuzz testing and sanitizer-supported testing.

I am also curious why language conformance testing is not mentioned? Are there ideas on that?


Siva Chandra via llvm-dev

Jul 12, 2019, 11:58:39 AM
to JF Bastien, llvm-dev, Marshall Clow
On Fri, Jun 28, 2019 at 9:29 AM JF Bastien <jfba...@apple.com> wrote:
> Personally I’m really interested in a project that increases the quality of all C libraries, and of the C standard. I therefore think champions of this project signing up to collaborate with WG14 is important.
> Having this kind of champion is really important for an LLVM libc. I’m not sure I’d support such a project without such a person. As you come up with a plan, consider who that should be. Maybe it’s you :-)

When the need arises, I do not mind being a "champion" like this. To
begin with though, we (as in the team at Google I am representing) do
not intend to participate beyond what we already do (like the C++
committee). Let me point out that I said "to begin with". So,
depending on how things evolve, we might in future increase our
participation with the committees.

Personally, it feels like it is early days - before one goes to the
committee, they should first develop some experience implementing the
standard library. If there is already one such person in the
community, and they would like to take the lead and engage with the
committee from the start, it would be most welcome. I would only be a
hand-waving participant if I were to do it today.

> I think you need to write a design for how this C library will be tested.

> You want a plan before it takes off. Testing standardized stuff has enough precedent that you should be able to look at what others have done, and come up with a plan up front. I really like that you want to fuzz, use sanitizers, etc. That’s pretty novel for this kind of project. Basic standards testing isn’t novel, so it should be pretty easy to figure out.

Beyond fuzz and sanitizer based testing, at a general level, this
would be covered:

1. Extensive unit testing.
2. Standards conformance testing.
3. If relevant and possible, differential testing: We want to be able
to test llvm-libc against another battle-tested libc. This is
essentially to understand how we differ from other libcs.
4. If relevant and possible, test against the testsuite of an existing
libc implementation.

One could go into details here, but I think it is best to take them up
on a case-by-case basis: when we are implementing X, we will discuss
what exact kind of testing makes sense for X.
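
As a rough picture of item 3, differential testing can be as simple as running a candidate and a reference implementation on the same inputs and counting disagreements. The sketch below is only an illustration: `candidate_strlen` stands in for a new libc's function, the host's `std::strlen` plays the battle-tested reference, and all names and inputs are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical candidate under test, standing in for a new libc's strlen.
std::size_t candidate_strlen(const char *s) {
  const char *p = s;
  while (*p != '\0') ++p;
  return static_cast<std::size_t>(p - s);
}

// Reference: the host libc, wrapped so its address can be taken portably.
std::size_t reference_strlen(const char *s) { return std::strlen(s); }

using StrlenFn = std::size_t (*)(const char *);

// Runs both implementations on the same inputs and counts disagreements.
std::size_t count_strlen_mismatches(StrlenFn candidate, StrlenFn reference) {
  assert(candidate != nullptr && reference != nullptr);
  static const char *const inputs[] = {
      "", "a", "hello", "embedded\0nul",
      "0123456789abcdef0123456789abcdef"};
  std::size_t mismatches = 0;
  for (const char *input : inputs) {
    if (candidate(input) != reference(input)) ++mismatches;
  }
  return mismatches;
}
```

A real harness would generate inputs (for example from a fuzzer) rather than use a fixed table, and a mismatch would be a prompt to check the specification, since either side could be the one that is wrong.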

Siva Chandra via llvm-dev

Jul 12, 2019, 12:03:08 PM
to Sjoerd Meijer, llvm-dev, Marshall Clow
On Fri, Jul 12, 2019 at 8:44 AM Sjoerd Meijer <Sjoerd...@arm.com> wrote:
> I am also curious why language conformance testing is not mentioned? Are there ideas on that?

Sorry, I split that part into another email:
http://lists.llvm.org/pipermail/llvm-dev/2019-July/133867.html


David Greene via llvm-dev

Jul 12, 2019, 12:15:54 PM
to Aaron Ballman via llvm-dev, Marshall Clow
Aaron Ballman via llvm-dev <llvm...@lists.llvm.org> writes:

>> * Ability to layer this libc over the system libc.

Echoing others, this seems dubious to me. Why not build up small pieces
at a time and write tests for them? This library doesn't need to
support all existing programs out of the gate. I don't think libc++ was
layered on top of existing standard C++ libraries, so why would libc
need to?

>> * Provide C symbols as specified by the standards, but take advantage
>> and use C++ language facilities for the core implementation.

Does this mean C programs would require a C++ runtime? If not, how will
the project ensure that?

-David

Sjoerd Meijer via llvm-dev

Jul 12, 2019, 12:16:33 PM
to Siva Chandra, llvm-dev, Marshall Clow
Ah, sorry, I've missed that message.
Thanks.

Rich Felker via llvm-dev

Jul 12, 2019, 3:29:20 PM
to Siva Chandra, llvm...@lists.llvm.org
On Fri, Jul 12, 2019 at 02:54:40PM -0400, Siva Chandra wrote:
> On Fri, Jun 28, 2019 at 9:29 AM JF Bastien <jfbastien at apple.com>
>
> * Ability to layer this libc over the system libc.

I don't think this is really possible, without tooling designed
specifically to do it (remapping symbols, etc.), which clang/LLVM,
being the compiler/tooling, *could* do. But even with the right
tooling, you're going to find that it's *a lot* harder than you
expect, likely almost impossible without making assumptions about the
internals of the underlying libc that are not public contracts.

> * Provide C symbols as specified by the standards, but take advantage
> and use C++ language facilities for the core implementation.

This was done in Fuchsia's fork of musl too, and was one of my major
criticisms of it -- makes no sense except satisfying developers who
want to use C++ for the sake of it being C++. It's very hard to do
"freestanding" C++ that doesn't even rely on the underlying libc, and
if you do rely on libc, it's a circular dependency. Moreover there's
really *very little* in libc that benefits from C++ (much less a pure
freestanding C++ with no library) for implementing it. And there's
huge potential to get things wrong by using C++ in ways that have
hidden failure cases/exceptions in places where the C interface you're
implementing either cannot be allowed to fail, or where introducing
the possibility of failure would be a huge QoI flaw.

> * Provides POSIX extensions on POSIX compliant platforms.
> * Provides system-specific extensions as appropriate. For example,
> provides the Linux API on Linux.
> * Vendor extensions if and only if necessary.
> * Designed and developed from the start to work with LLVM tooling and
> testing like fuzz testing and sanitizer-supported testing.
> * ABI independent implementation as far as possible.
> * Use source based implementations as far as possible rather than
> assembly. Will try to “fix” the compiler rather than use assembly
> language workarounds.
>
> Why a new C Standard Library?
> =========================
>
> Implementing a libc is no small task and is not to be taken lightly. A

Indeed.

> natural question to ask is, "why a new implementation of the C
> standard library?" There is no single answer to this question, but
> some of the major reasons are as follows:
>
> * Most libc implementations are monolithic. It is a non-trivial
> porting task to pick and choose only the pieces relevant to one's
> platform. The new libc will be developed with sufficient modularity to
> make picking and choosing a straightforward task.

Have you given any thought to what it would mean to make this kind of
porting practical? The reason we haven't done it in musl is because
it's highly nontrivial. You have to either find an existing point
amenable to abstraction that's reasonably common to all existing
systems and hope it will apply to future ones too -- for musl, this
means the concept of syscalls, which are presently assumed to be Linux
ones but could be abstracted *somewhat* further, and might be in the
future.

If you can't find a suitable point amenable to abstraction that
encompasses everything you want to support, then instead you end up
making your own abstraction layer in between, and now you're stuck
with the task of porting your abstraction layer to every new system
you want to support. Now you have an extra layer of bloat, and haven't
saved any significant amount of porting work.

All of this aside, I agree that it would be rather nice to be
"non-monolithic", especially for the parts of libc that are "pure
library code" (not depending on any underlying system facilities) to
be kept separate and easy to reuse in ports to weird/bare-metal/etc.
systems.

It'd also be nice for things like stdio that do depend on a system
facility, but where the underlying system facility is understood to be
"common" at a higher level than syscalls (actual functions on fds) to
be able to work with arbitrary implementations of the underlying
functions. The reason we didn't do this from the beginning in musl is
namespacing; plain C symbols can't depend on symbols in the POSIX
namespace.

> * Most libc implementations break when built with sanitizer specific
> compiler options. The new libc will be developed from the start to
> work with those specialized compiler options.

This is a nice goal but invites all sorts of circular dependency
problems. At some point this will likely be possible with musl too,
with the exception of certain components that need to operate at early
entry time.

> * The new libc will be developed to support and employ fuzz testing
> from the start.

> * Most libc implementations use a good amount of assembly language,
> and assume specific ABIs (may be platform dependent). With the new
> libc implementation, we want to use normal source code as much as
> possible so that compiler-based changes to the ABI are easy. Moreover,
> as part of the LLVM project, we want to use this opportunity to fix
> performance related compiler bugs rather than using assembly
> workarounds.

This is particularly wrong about musl, where use of asm (especially
extern asm files as opposed to inline asm) is mostly limited to places
where something fundamentally can't be implemented without asm. We
don't use asm as a workaround for poor compiler codegen, unless you
count things like single-instruction math functions, where it would be
really hard for a compiler to pattern-recognize the whole function and
reduce it down to the instruction. (Note also that use of __builtin_*
doesn't help here because it can create circular definitions if the
compiler chooses not to inline the single instruction.)

> * A large hole in the llvm toolchain will be plugged with this new
> libc.

I read this as a confirmation of my concerns from my previous post and
tweets, that this looks like you're trying to make "LLVM libc" (or
rather "Google libc") the first-class libc for use with clang/LLVM,
radically altering the boundaries between tooling and platform, and
relegating the existing libc implementations on LLVM's targets to
second-class.

If this is not the case, can you explain what guarantees we have that
this is not what's going on?

> With the broad platform expertise in the LLVM community, and the
> strong license and project structure, we think that the new libc will
> be more tunable and robust, without sacrificing the simplicity and
> accessibility typical of the LLVM project.

Tunable and robust are usually opposites; see also: uclibc.

In summary, I think you're still massively underestimating what an
undertaking this is, mistaken about various choices/tradeoffs and
whether they make sense, and either not thinking about consequences on
ecosystem/monoculture of tightly coupling library with tooling, or
intentionally trying to bring about those consequences, contrary to
what I see as the best interests of the communities affected.

Rich

Siva Chandra via llvm-dev

Jul 12, 2019, 3:37:31 PM
to David Greene, Aaron Ballman via llvm-dev, Marshall Clow
On Fri, Jul 12, 2019 at 9:15 AM David Greene <d...@cray.com> wrote:
>
> Aaron Ballman via llvm-dev <llvm...@lists.llvm.org> writes:
>
> >> * Ability to layer this libc over the system libc.
>
> Echoing others, this seems dubious to me. Why not build up small pieces
> at a time and write tests for them? This library doesn't need to
> support all existing programs out of the gate. I don't think libc++ was
> layered on top of existing standard C++ libraries, so why would libc
> need to?

I do not think we are saying we "need to". We are only saying it "has
the ability to". One can choose not to use this ability, but having
this ability might be crucial for adoption for some users.

> >> * Provide C symbols as specified by the standards, but take advantage
> >> and use C++ language facilities for the core implementation.
>
> Does this mean C programs would require a C++ runtime? If not, how will
> the project ensure that?

This is a very good question. I think, to keep things simple and sane,
llvm-libc should not require a C++ runtime. If this needs some form of
relaxation later, then we can take it up on a case by case basis.

About ensuring that there is no accidental dependence on a C++
runtime, at a very high level, I think suitably configured bots would
suffice? There could be some tooling, but might have to be taken up on
a case by case basis again.
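
One way to picture "C symbols, C++ implementation, no C++ runtime" is the following sketch. All names are hypothetical. The internals use templates and constexpr, which are compile-time features; the exported function has C linkage; and nothing uses exceptions, RTTI, `new`, or standard library containers, which are the usual sources of a runtime dependency.

```cpp
#include <cstddef>

namespace sketch_libc {  // hypothetical internal namespace

// Internal implementation: a template, so it can be reused for other
// character types, and constexpr, so it can be checked at compile time.
template <typename CharT>
constexpr std::size_t string_length(const CharT *s) {
  std::size_t n = 0;
  while (s[n] != CharT(0)) ++n;
  return n;
}

}  // namespace sketch_libc

// Public entry point with C linkage and an unmangled name. It is called
// sketch_strlen here so that it does not clash with the host's strlen.
extern "C" std::size_t sketch_strlen(const char *s) {
  return sketch_libc::string_length(s);
}

// Compile-time check of the internal helper; no runtime support needed.
static_assert(sketch_libc::string_length("libc") == 4,
              "string_length is usable in constant expressions");
```

A bot of the kind mentioned above could build such files with `-fno-exceptions -fno-rtti` and fail if the resulting objects reference any C++ runtime symbols. That is an assumption about how such a check might be set up, not a description of an existing configuration.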

Szabolcs Nagy via llvm-dev

Jul 12, 2019, 4:19:46 PM
to Siva Chandra, llvm-dev, Marshall Clow
* Siva Chandra via llvm-dev <llvm...@lists.llvm.org> [2019-07-12 08:15:40 -0700]:

> Below is a write up which I think would qualify as the "charter" for
> the new libc. It is also answering questions like, "where we’ll get
> with this libc?", "what's this libc actually going to be?" and similar
> ones. I have used libcxx.llvm.org landing page as a template to write
> it down.

sorry, but this charter does not make much sense.

i think you should get some c runtime developers involved.

>
> ###############################################
>
> "llvm-libc" C Standard Library
> ========================
>
> llvm-libc is an implementation of the C standard library targeting C11
> and above. It also provides platform specific extensions as relevant.
> For example, on Linux it also provides pthreads, librt and other POSIX
> extension libraries.
>
> Documentation
> ============
>
> The llvm-libc project is still in the planning phase. Stay tuned for
> updates soon.
>
> Features and Goals
> ================
>
> * C11 and upwards conformant.
> * A modular libc with individual pieces implemented in the "as a
> library" philosophy of the LLVM project.

this is wrong on many levels:

LLVM is a bad example of library design, as it has library
safety and interface stability issues.

libc is necessarily a library so of course it is implemented
"as a library".

"modular libc" does not make much sense: with static linking
you only link what you use (libc internal dependencies have
to be minimized if you static link anyway), with dynamic
linking, multiple modules is a huge mistake for other reasons
(it creates internal abi between the components that is
difficult to manage)

if modularity means configurability, then you have a much
bigger problem: maintenance and testing becomes harder
and if components are actually interchangeable then you need
stable libc internal interfaces. (this is easy to do in
existing libcs, just not done because it's a disaster,
..except in uclibc ..which is a disaster)

> * Ability to layer this libc over the system libc.
> * Provide C symbols as specified by the standards, but take advantage
> and use C++ language facilities for the core implementation.

this mistake was already made in bionic. why people want to
rely on underspecified freestanding c++ language semantics,
make assumptions about c++ implementation internals,
and deal with subtle language incompatibilities
when writing a libc is a mystery.

> * Provides POSIX extensions on POSIX compliant platforms.
> * Provides system-specific extensions as appropriate. For example,
> provides the Linux API on Linux.
> * Vendor extensions if and only if necessary.
> * Designed and developed from the start to work with LLVM tooling and
> testing like fuzz testing and sanitizer-supported testing.

the difficulty of this is not in the libc, but various
issues in the sanitizer libraries.. e.g. you don't want to
fuzz test a libc with a fuzz runtime that depends on a c++
runtime (or any other external component that can call
back to libc outside of the control of the fuzz runtime).

> * ABI independent implementation as far as possible.

if you mean call abi, then you get this for free, just use
a portable language (such as c), which is what existing
libc implementations do anyway. (glibc has a bit more
target specific asm than usual, but most of that has generic
fallback code so easy to drop the asm)

if you mean other abis then clarify.

> * Use source based implementations as far as possible rather than
> assembly. Will try to “fix” the compiler rather than use assembly
> language workarounds.

same as above, all libcs do this already.

>
> Why a new C Standard Library?
> =========================
>
> Implementing a libc is no small task and is not to be taken lightly. A
> natural question to ask is, "why a new implementation of the C
> standard library?" There is no single answer to this question, but
> some of the major reasons are as follows:
>
> * Most libc implementations are monolithic. It is a non-trivial
> porting task to pick and choose only the pieces relevant to one's
> platform. The new libc will be developed with sufficient modularity to
> make picking and choosing a straightforward task.

this does not make sense (see above).

> * Most libc implementations break when built with sanitizer specific
> compiler options. The new libc will be developed from the start to
> work with those specialized compiler options.

this does not make sense.

(new libc does not make this easier: e.g. adding asan support
to musl is about a week's work and most issues are sanitizer
related problems where you have to give up correctness or
reliability for sanitizers to work)

> * The new libc will be developed to support and employ fuzz testing
> from the start.

this does not make sense.

(new libc does not make this easier)

> * Most libc implementations use a good amount of assembly language,
> and assume specific ABIs (may be platform dependent). With the new
> libc implementation, we want to use normal source code as much as
> possible so that compiler-based changes to the ABI are easy. Moreover,
> as part of the LLVM project, we want to use this opportunity to fix
> performance related compiler bugs rather than using assembly
> workarounds.

citation needed.

(removing all unnecessary asm from musl and replacing it with
intrinsics is a weekend project)
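
To make the intrinsics point concrete, here is a minimal sketch, not taken from musl or any other libc. The name `sketch_ctz32` is hypothetical. It counts trailing zero bits with the GCC/Clang builtin `__builtin_ctz` where that is available and with a plain loop otherwise, so the compiler chooses the instruction and no inline assembly is needed.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of any existing libc.
// Counts the trailing zero bits of a 32-bit value without inline assembly.
inline int sketch_ctz32(std::uint32_t x) {
  if (x == 0) return 32;  // the builtin is undefined for 0, so handle it first
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_ctz(x);  // the compiler selects the best target instruction
#else
  int n = 0;  // portable fallback: shift right until the low bit is set
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
#endif
}
```

Whether the generated code matches hand-written assembly on a given target is something to measure rather than assume.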

> * A large hole in the llvm toolchain will be plugged with this new
> libc. With the broad platform expertise in the LLVM community, and the
> strong license and project structure, we think that the new libc will
> be more tunable and robust, without sacrificing the simplicity and
> accessibility typical of the LLVM project.

for this hole to be filled in on linux you need distros to be able
to rebuild everything against the new libc, otherwise you just
have a new libc that nobody can use because applications have
dependencies that have to be built against the same libc.

in some contexts it is enough to have a libc with build scripts to
build the dependencies.. but existing build scripts often don't
work out of the box with a new libc.

creating a software platform around a libc is a much bigger task
than the libc itself, and without that the libc has limited value.


it seems to me that none of the answers make much sense as is.

>
> Platform Support
> ==============
>
> llvm-libc development is still in the planning phase. However, we
> envision that it will support a variety of platforms in the coming
> years. Interested parties are encouraged to participate in the design
> and implementation, and add support for their favorite platforms.

the problem is not porting the new libc to a platform but
porting the platform to a new libc.

i.e. you need to worry a lot more about what's above the libc
(all userspace software) than what's below it (tiny os kernel
interface).

David Jones via llvm-dev

Jul 12, 2019, 5:07:01 PM7/12/19
to David Greene, Aaron Ballman via llvm-dev, Marshall Clow
On Fri, Jul 12, 2019 at 9:15 AM David Greene via llvm-dev <llvm...@lists.llvm.org> wrote:
Aaron Ballman via llvm-dev <llvm...@lists.llvm.org> writes:

>> * Ability to layer this libc over the system libc.

Echoing others, this seems dubious to me.  Why not build up small pieces
at a time and write tests for them?  This library doesn't need to
support all existing programs out of the gate. 

I can *definitely* appreciate that not everybody wants a Frankenstein's-monster-of-a-library.

For what it's worth... for our usage (within Google, and specifically, within the production fleet), it is much more imperative to show broad usage sooner rather than later. It might make more sense from this perspective: we want to develop in the open (i.e., not just throw some code over the fence when we're "done enough"); and we want it to be of high enough quality that others might use it. In order to do this, we actually need to deliver the "low-hanging fruit" first, because that is a large part of what justifies the rest. Realities being what they are, it would be a non-starter to have no such facility; and it would be somewhat antithetical to carry out such a substantial part of our development effort behind closed doors.
 
I don't think libc++ is
layered on top of existing standard C++ libraries, so why would libc
need to? 

There will almost certainly be other platforms where a top-to-bottom implementation would be impossible to implement. Not because of the libc parts, but functionality that -- for whatever reason -- is only available in a library that is also a libc.

(Examples off the top of my head: sandboxed environments, platforms which only allow syscalls through a provided DSO, or platforms which embed some critical functionality within the same binary blob as a vendor-provided libc.)
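
One way to picture the layering idea is the sketch below. It is only an illustration under stated assumptions: the namespace `sketch_libc` and both function names are hypothetical, and the thread does not describe an actual mechanism. One function is implemented directly, while another is forwarded to whatever libc the platform already provides.

```cpp
#include <cstddef>
#include <cstdlib>

namespace sketch_libc {

// Implemented by the new library itself.
inline std::size_t sketch_strlen(const char *s) {
  const char *p = s;
  while (*p != '\0') ++p;
  return static_cast<std::size_t>(p - s);
}

// Not implemented yet in this sketch: forward to the underlying system libc.
inline void sketch_qsort(void *base, std::size_t count, std::size_t size,
                         int (*cmp)(const void *, const void *)) {
  std::qsort(base, count, size, cmp);
}

}  // namespace sketch_libc

// Comparator used in the usage example: orders ints ascending.
inline int sketch_compare_ints(const void *a, const void *b) {
  const int lhs = *static_cast<const int *>(a);
  const int rhs = *static_cast<const int *>(b);
  return (lhs > rhs) - (lhs < rhs);
}
```

A caller would use `sketch_libc::sketch_qsort(values, 3, sizeof(int), sketch_compare_ints)` and get the system implementation until a native one replaces the forwarding body.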
 
>> * Provide C symbols as specified by the standards, but take advantage
>> and use C++ language facilities for the core implementation.

Does this mean C programs would require a C++ runtime?  If not, how will
the project ensure that?

Shooting from the hip: no. Turning off exceptions, RTTI, and static initializers (i.e., things which require a guard variable) is probably enough to obviate the need for the runtime.

Of course, this raises platform-specific concerns.
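
A minimal sketch of what that could look like, with hypothetical names (`sketch_internal::length`, `sketch_strlen`) that are not from the proposal: the implementation uses a template internally but exports a plain C symbol, and uses no exceptions, RTTI, or dynamically initialized statics. With GCC or Clang it would be built with `-fno-exceptions -fno-rtti`.

```cpp
#include <cstddef>

namespace sketch_internal {

// Template helper: a C++ facility resolved entirely at compile time,
// so it adds no dependency on a C++ runtime library.
template <typename CharT>
inline std::size_t length(const CharT *s) {
  std::size_t n = 0;
  while (s[n] != CharT(0)) ++n;
  return n;
}

}  // namespace sketch_internal

// C-visible entry point: unmangled name, callable from C programs.
// No throw, no dynamic_cast/typeid, and no function-local static with a
// non-constant initializer (which would need a guard variable).
extern "C" std::size_t sketch_strlen(const char *s) {
  return sketch_internal::length(s);
}
```

A C caller would only need the declaration `size_t sketch_strlen(const char *);` and would link against the object file as usual. Whether this holds for every platform is exactly the platform-specific concern noted above.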

Michael Spencer via llvm-dev

Jul 12, 2019, 5:25:06 PM7/12/19
to Rich Felker, LLVMDev
I'm surprised that you would think this given the rest of the llvm project.

We have:

* clang, yet all of llvm still builds with gcc, msvc, icc, edg, etc...
* lldb, yet you can still debug llvm produced binaries with gdb, and even msvc's debugger.
* lld, yet you can still link with gnu-ld, gold, link.exe, ld64.
* libc++, yet clang still works with libstdc++, dinkumware, and the msvc stdlib.
* libc++abi, yet libc++ still works with libsupc++ and other c++ abi libs.
* libunwind, yet libc++abi works with other unwinders.
* compiler-rt (builtins), yet you can still use libgcc.
* compiler-rt (sanitizers), yet gcc uses them.

It has been well proven that llvm project alternatives do not push out or harm non-llvm compatibility.  We even cooperate with non-llvm projects on changes and new features where it makes sense.

What makes you think a llvm libc would be any different?

- Michael Spencer 

James Y Knight via llvm-dev

Jul 12, 2019, 5:35:09 PM7/12/19
to Szabolcs Nagy, llvm-dev, Marshall Clow
On Fri, Jul 12, 2019 at 4:19 PM Szabolcs Nagy via llvm-dev <llvm...@lists.llvm.org> wrote:
> * Designed and developed from the start to work with LLVM tooling and
> testing like fuzz testing and sanitizer-supported testing.

the difficulty of this is not in the libc, but various
issues in the sanitizer libraries.. e.g. you don't want to
fuzz test a libc with a fuzz runtime that depends on a c++
runtime (or any other external component that can call
back to libc outside of the control of the fuzz runtime).

FWIW, I was able to fuzz-test some functions in musl using the AFL fuzzer with only small changes (yes, this is not LLVM libFuzzer -- I haven't gotten around to trying that yet). It required only minor modifications to the musl makefile and to AFL.

The existing "NOSSP_OBJS" listed in the musl Makefile were exactly the correct ones that needed to be excluded from fuzz coverage instrumentation. I compiled the rest of the library with "afl-gcc", and those files with plain "gcc".

Additionally, a 2-line change was needed in AFL to avoid trying to recursively re-enter initialization while it was already in progress (startup initialization calls getenv which then tries to call back into initialization, because initialization hasn't completed yet). Other than initialization, the instrumentation doesn't call back into libc, so just suppressing that recursion is sufficient.
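
The recursion described above can be sketched as follows. This is not AFL's actual code: every name here (`sketch::runtime_init`, `sketch::instrumentation_hook`, the `SKETCH_SHM_ID` variable) is hypothetical, and the guard is just one simple way to suppress re-entry while initialization is still in progress.

```cpp
#include <cstdlib>

namespace sketch {

enum class InitState { NotStarted, InProgress, Done };

InitState g_state = InitState::NotStarted;
int g_init_runs = 0;   // how many times the setup body actually ran
int g_hook_calls = 0;  // how many times the coverage hook was entered

void instrumentation_hook();

// Stand-in for an instrumented libc function such as getenv.
const char *instrumented_getenv(const char *name) {
  instrumentation_hook();  // coverage callback inserted by the compiler
  return std::getenv(name);
}

void runtime_init() {
  // The guard: if initialization has started (or finished), do nothing.
  if (g_state != InitState::NotStarted) return;
  g_state = InitState::InProgress;
  ++g_init_runs;
  // Initialization reads the environment, which re-enters the hook.
  (void)instrumented_getenv("SKETCH_SHM_ID");
  g_state = InitState::Done;
}

void instrumentation_hook() {
  ++g_hook_calls;
  runtime_init();  // lazy initialization; a no-op once it has started
}

}  // namespace sketch
```

Without the early return in `runtime_init`, the first hook call would recurse through `instrumented_getenv` indefinitely.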

Khem Raj via llvm-dev

Jul 14, 2019, 8:46:29 PM7/14/19
to Siva Chandra, llvm-dev, Marshall Clow

There are options like musl, which is relatively new and might not have
as much legacy. I am sure you have looked at various system C libraries
out there for consideration; it would be interesting to share the
insights on the differences.

Siva Chandra via llvm-dev

Jul 15, 2019, 1:02:55 PM7/15/19
to Aaron Ballman, llvm-dev, Marshall Clow
On Fri, Jul 12, 2019 at 8:32 AM Aaron Ballman <aa...@aaronballman.com> wrote:
> > llvm-libc is an implementation of the C standard library targeting C11
> > and above.
>
> Any particular reason for C11 as opposed to C17?

Two reasons:
1. The C++17 standard refers to the C11 standard.
2. C11 is sufficiently modern while not closing doors for users
requiring compliance with an "older" standard. That said, we could
choose not to implement certain items removed in the C17 standard. An
obvious example of such a candidate is `gets`.

Aaron Ballman via llvm-dev

Jul 15, 2019, 2:22:20 PM7/15/19
to Siva Chandra, llvm-dev, Marshall Clow
On Mon, Jul 15, 2019 at 1:02 PM Siva Chandra <sivac...@google.com> wrote:
>
> On Fri, Jul 12, 2019 at 8:32 AM Aaron Ballman <aa...@aaronballman.com> wrote:
> > > llvm-libc is an implementation of the C standard library targeting C11
> > > and above.
> >
> > Any particular reason for C11 as opposed to C17?
>
> Two reasons:
> 1. The C++17 standard refers to the C11 standard.

This is somewhat confusing to me. That's a reason to support *at
least* C11. It doesn't seem like a reason to not support the latest C
standard.

> 2. C11 is sufficiently modern while not closing doors for users
> requiring compliance with an "older" standard. That said, we could
> choose not to implement certain items removed in the C17 standard. An
> obvious example of such a candidate is `gets`.

gets() was removed in C11. ;-)

I strongly think we should support the C17 standard library. C17 was a
bugfix release, so the delta between it and C11 in terms of
functionality is small but the quality is higher.

~Aaron

Finkel, Hal J. via llvm-dev

Jul 15, 2019, 2:35:37 PM7/15/19
to Aaron Ballman, Siva Chandra, llvm-dev, Marshall Clow

On 7/15/19 1:22 PM, Aaron Ballman via llvm-dev wrote:
> On Mon, Jul 15, 2019 at 1:02 PM Siva Chandra <sivac...@google.com> wrote:
>> On Fri, Jul 12, 2019 at 8:32 AM Aaron Ballman <aa...@aaronballman.com> wrote:
>>>> llvm-libc is an implementation of the C standard library targeting C11
>>>> and above.
>>> Any particular reason for C11 as opposed to C17?
>> Two reasons:
>> 1. The C++17 standard refers to the C11 standard.
> This is somewhat confusing to me. That's a reason to support *at
> least* C11. It doesn't seem like a reason to not support the latest C
> standard.
>
>> 2. C11 is sufficiently modern while not closing doors for users
>> requiring compliance with an "older" standard. That said, we could
>> choose not to implement certain items removed in the C17 standard. An
>> obvious example of such a candidate is `gets`.
> gets() was removed in C11. ;-)
>
> I strongly think we should support the C17 standard library. C17 was a
> bugfix release, so the delta between it and C11 in terms of
> functionality is small but the quality is higher.


+1. Aiming for C17 seems better than aiming for only C11.

 -Hal


>
> ~Aaron

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Siva Chandra via llvm-dev

Jul 15, 2019, 2:43:29 PM7/15/19
to Finkel, Hal J., llvm-dev, Marshall Clow
> On 7/15/19 1:22 PM, Aaron Ballman via llvm-dev wrote:
> > On Mon, Jul 15, 2019 at 1:02 PM Siva Chandra <sivac...@google.com> wrote:
> >> On Fri, Jul 12, 2019 at 8:32 AM Aaron Ballman <aa...@aaronballman.com> wrote:
> >>>> llvm-libc is an implementation of the C standard library targeting C11
> >>>> and above.
> >>> Any particular reason for C11 as opposed to C17?
> >> Two reasons:
> >> 1. The C++17 standard refers to the C11 standard.
> > This is somewhat confusing to me. That's a reason to support *at
> > least* C11. It doesn't seem like a reason to not support the latest C
> > standard.

I think there is some misunderstanding here. My first message said
llvm-libc will target C11 __and above__. Which is to imply that the
lower bound of supported standards is C11.

On Mon, Jul 15, 2019 at 11:31 AM Finkel, Hal J. <hfi...@anl.gov> wrote:
> +1. Aiming for C17 seems better than aiming for only C11.

I interpreted Aaron Ballman's first question as, "why is the lower
bound C11 and not C17?" I answered that by saying that the C++17
standard still refers to the C11 standard, so we need to keep C11 as a
lower bound.

Unless there is some technicality of the language and/or standards
which I am not aware of, I did not intend to convey that we do not
intend to support the latest C standards.

Are you saying that the lower bound of standards llvm-libc should
support ought to be C17?

Aaron Ballman via llvm-dev

Jul 15, 2019, 2:48:04 PM7/15/19
to Siva Chandra, llvm-dev, Marshall Clow

Yes, I'm sorry if I was unclear. I think that there's not much purpose
to supporting C11 as the lower bound given that C17's standard library
is C11's standard library, but with bug fixes. There were no new
features added during C17.

~Aaron
