A while back llvm::format() was introduced that made it possible to combine printf-style formatting with llvm streams. However, this still comes with all the risks and pitfalls of printf. Everyone is no-doubt familiar with these problems, but here are just a few anyway:

1. Not type-safe. Not all compilers warn when you mess up the format specifier. And when you're writing your own Printf-like functions, you need to tag them with __attribute__(format, printf) which again not all compilers have. If you change a const char * to a StringRef, it can silently succeed while passing your StringRef object to printf. It should fail to compile!
2. Not security safe. Functions like sprintf() will happily smash your stack for you if you're not careful.
3. Not portable (well kinda). Quick, how do you print a size_t? You probably said %z. Well MSVC didn't even support %z until 2015, which we aren't even officially requiring yet. So you've gotta write (uint64_t)x and then use PRIx64. Ugh.
4. Redundant. If you're giving it an integer, why do you need to specify %d? It's an integer! We should be able to use the type system to our advantage.
5. Not flexible. How do you print a std::chrono::time_point with llvm::format()? You can't. You have to resort to providing an overloaded streaming operator or formatting it some other way.
So I've been working on a library that will solve all of these problems and more.
The high level design of my library is borrowed heavily from C#. But if you're not familiar with C#, I believe boost has something similar in spirit. The best way to show it off is with some examples:

1. os << format_string("Test"); // writes "Test"
2. os << format_string("{0}", 7); // writes "7"

Immediately we can see one big difference between this and llvm::format() / printf. You don't have to specify the type. If you pass in an int, it formats it as an int.

3. os << format_string("{0} {0}", 7); // writes "7 7"

#3 is an example of something that cannot be done elegantly with printf. Sure, you can pass it in twice, but if it's expensive to compute, this means you have to save it into a temporary.

4. os << format_string("{0:X}", 255); // writes "0xFF"
5. os << format_string("{0:X7}", 255); // writes "0x000FF"
6. os << format_string("{0}", foo_object); // fails to compile!

Here is another example of an improvement over traditional formatting mechanisms. If you pass an object for which it cannot find a formatter, it fails to compile.

However, you can always define custom formatters for your own types. If you write:

    namespace llvm {
    template<>
    struct format_provider<Foo> {
      static void format(raw_ostream &S, const Foo &F, int Align, StringRef Options) {
      }
    };
    }

Then #6 will magically compile, and invoke the function above to do the formatting. There are other ways to customize the formatting behavior, but I'll keep going with some more examples:

7. os << format_string("{0:N}", -1234567); // writes "-1,234,567". Note the commas.
8. os << format_string("{0:P}", 0.76); // writes "76.00%"

You can also left justify and right justify. For example:

9. os << format_string("{0,8:P}", 0.76); // writes "  76.00%"
10. os << format_string("{0,-8:P}", 0.76); // writes "76.00%  "

And you can also format complicated types. For example:

11. os << format_string("{0:DD/MM/YYYY hh:mm:ss}", std::chrono::system_clock::now()); // writes "10/11/2016 18:19:11"
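For readers wondering what the comma grouping in example 7 involves, here is a minimal sketch for integers only. The helper name `groupThousands` is made up for illustration and is not part of the proposed library; it hard-codes ',' and groups of three, so it is not locale-aware.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the proposed patch): inserts ',' between
// groups of three decimal digits, counting from the right.
std::string groupThousands(long long Value) {
  const bool Negative = Value < 0;
  // Work on the magnitude as an unsigned value so the most negative
  // long long does not overflow when negated.
  const unsigned long long Magnitude =
      Negative ? 0ULL - static_cast<unsigned long long>(Value)
               : static_cast<unsigned long long>(Value);
  const std::string Digits = std::to_string(Magnitude);

  std::string Out;
  for (std::size_t I = 0; I < Digits.size(); ++I) {
    // A separator goes before every digit whose distance from the end is a
    // non-zero multiple of three.
    if (I != 0 && (Digits.size() - I) % 3 == 0)
      Out += ',';
    Out += Digits[I];
  }
  return Negative ? "-" + Out : Out;
}
```

With this sketch, `groupThousands(-1234567)` yields the same text as example 7.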
I already have a working proof of concept that supports most of the fundamental data types and formatting options such as percents, exponents, comma grouping, fixed point, hex, etc.

To summarize, the advantages of this approach are:

1) Safe. If it can't format your type, it won't even compile.
2) Concise. You can re-use parameters multiple times without re-specifying them.
3) Simple. You don't have to remember whether to use %llu or PRIx64 or %z, because format specifiers don't exist!
4) Flexible. You can format types in a multitude of different ways while still having the nice format-string style syntax.
5) Extensible. If you don't like the behavior of a built-in formatter, you can override it with your own. If you have your own type which you'd like to be able to format, you can add formatting support for it in multiple different ways.

I am hoping to have something ready for submitting later this week. If this interests you, please help me out by reviewing my patch! And if you think this would not be helpful for LLVM and I should not worry about this, let me know as well!
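To make the "Safe" and "Extensible" points concrete, here is a rough, self-contained sketch of how index-based placeholders and a `format_provider` specialization could fit together. It is an assumption-laden toy, not the actual patch: it uses `std::ostream` and `std::string` in place of `raw_ostream` and `StringRef`, drops the `Align` parameter, passes the text after ':' through without interpreting it, and the names `formatString` and `formatNth` are invented here.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Stand-in for the proposed provider template. The primary template is only
// declared, so formatting a type without a specialization fails to compile.
template <typename T, typename Enable = void> struct format_provider;

// Built-in provider for integral types; the options string is ignored here.
template <typename T>
struct format_provider<
    T, typename std::enable_if<std::is_integral<T>::value>::type> {
  static void format(std::ostream &S, const T &V, const std::string &) {
    S << V;
  }
};

// A user-defined type and its provider (both made up for the example).
struct Foo {
  int X;
  int Y;
};

template <> struct format_provider<Foo> {
  static void format(std::ostream &S, const Foo &F, const std::string &) {
    S << '(' << F.X << ", " << F.Y << ')';
  }
};

// Base case: the index is past the last argument, so nothing is written.
inline void formatNth(std::ostream &, std::size_t, const std::string &) {}

// Walks the argument pack until it reaches the requested index.
template <typename T, typename... Ts>
void formatNth(std::ostream &S, std::size_t Index, const std::string &Options,
               const T &First, const Ts &...Rest) {
  if (Index == 0)
    format_provider<T>::format(S, First, Options);
  else
    formatNth(S, Index - 1, Options, Rest...);
}

// Replaces each "{index}" or "{index:options}" with the matching argument.
// A malformed index makes std::stoul throw; an unterminated '{' is copied
// through verbatim.
template <typename... Ts>
std::string formatString(const std::string &Fmt, const Ts &...Args) {
  std::ostringstream Out;
  std::size_t Pos = 0;
  while (Pos < Fmt.size()) {
    const std::size_t Open = Fmt.find('{', Pos);
    const std::size_t Close =
        Open == std::string::npos ? Open : Fmt.find('}', Open);
    if (Close == std::string::npos) {
      Out << Fmt.substr(Pos);
      break;
    }
    Out << Fmt.substr(Pos, Open - Pos);
    const std::string Spec = Fmt.substr(Open + 1, Close - Open - 1);
    const std::size_t Colon = Spec.find(':');
    const std::string Options =
        Colon == std::string::npos ? std::string() : Spec.substr(Colon + 1);
    formatNth(Out, std::stoul(Spec.substr(0, Colon)), Options, Args...);
    Pos = Close + 1;
  }
  return Out.str();
}
```

Because the argument is looked up by index each time a placeholder is seen, `formatString("{0} {0}", 7)` can reuse one argument, and removing the `Foo` specialization turns `formatString("{0}", Foo{1, 2})` into a compile error rather than a runtime surprise.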
Hi,

On Oct 11, 2016, at 6:22 PM, Zachary Turner via llvm-dev <llvm...@lists.llvm.org> wrote:
> A while back llvm::format() was introduced that made it possible to combine printf-style formatting with llvm streams. However, this still comes with all the risks and pitfalls of printf. Everyone is no-doubt familiar with these problems, but here are just a few anyway:
> 1. Not type-safe. Not all compilers warn when you mess up the format specifier. And when you're writing your own Printf-like functions, you need to tag them with __attribute__(format, printf) which again not all compilers have.

I'm not very sensitive to the "not all compilers have" argument, however it is worth mentioning that the format may not be a string literal, which defeats the "sanitizer".

> If you change a const char * to a StringRef, it can silently succeed while passing your StringRef object to printf. It should fail to compile!

llvm::format now fails to compile as well :)
However this does not address other issues, like: `format("%d", float_var)`

> 2. Not security safe. Functions like sprintf() will happily smash your stack for you if you're not careful.
> 3. Not portable (well kinda). Quick, how do you print a size_t? You probably said %z. Well MSVC didn't even support %z until 2015, which we aren't even officially requiring yet. So you've gotta write (uint64_t)x and then use PRIx64. Ugh.
> 4. Redundant. If you're giving it an integer, why do you need to specify %d? It's an integer! We should be able to use the type system to our advantage.
> 5. Not flexible. How do you print a std::chrono::time_point with llvm::format()? You can't. You have to resort to providing an overloaded streaming operator or formatting it some other way.

It seems to me that there is no silver bullet for that: be it for llvm::format() or your new proposal, there is some sort of glue/helpers that need to be provided for each and every non-standard type.

> So I've been working on a library that will solve all of these problems and more.

Great! I appreciate the effort, and talking about that with Duncan last week he was mentioning that we should do it :)

> The high level design of my library is borrowed heavily from C#. But if you're not familiar with C#, I believe boost has something similar in spirit. The best way to show it off is with some examples:
> 1. os << format_string("Test"); // writes "Test"
> 2. os << format_string("{0}", 7); // writes "7"
> Immediately we can see one big difference between this and llvm::format() / printf. You don't have to specify the type. If you pass in an int, it formats it as an int.
> 3. os << format_string("{0} {0}", 7); // writes "7 7"
> #3 is an example of something that cannot be done elegantly with printf. Sure, you can pass it in twice, but if it's expensive to compute, this means you have to save it into a temporary.

What about: printf("%0$ %0$", 7);

> 4. os << format_string("{0:X}", 255); // writes "0xFF"
> 5. os << format_string("{0:X7}", 255); // writes "0x000FF"
> 6. os << format_string("{0}", foo_object); // fails to compile!
> Here is another example of an improvement over traditional formatting mechanisms. If you pass an object for which it cannot find a formatter, it fails to compile.
> However, you can always define custom formatters for your own types. If you write:
> namespace llvm {
> template<>
> struct format_provider<Foo> {
>   static void format(raw_ostream &S, const Foo &F, int Align, StringRef Options) {}
> };
> }
> Then #6 will magically compile, and invoke the function above to do the formatting. There are other ways to customize the formatting behavior, but I'll keep going with some more examples:
> 7. os << format_string("{0:N}", -1234567); // Writes "-1,234,567". Note the commas.

Why add commas? Because of the ":N"?
This seems localization-dependent: how do you handle that?

What happens with the following?

    os << format_string("{0:N}", -123.455);
On Oct 11, 2016, at 9:47 PM, Zachary Turner <ztu...@google.com> wrote:

Ok, well another example would be if you pass a pointer. The only valid options are various flavors of hex. You wouldn't want to print a pointer in scientific notation, for example.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
David
On 12 Oct 2016, at 07:29, Chandler Carruth via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> I'm generally favorable on the core idea of having a type-safe and friendly format-string-like formatting utility
I’m also generally in favour, but I wonder what the key motivations for designing our own, rather than importing something like FastFormat, fmtlib, or one of the other tried-and-tested C++ typesafe I/O libraries is. Has someone done an analysis of why these designs are a bad fit for LLVM, or are we just reinventing the wheel because we feel like it?
A reimplementation is likely to be no less complex than any of the originals. Both fmtlib and FastFormat are under BSD / MIT-style licenses and are both small enough that it would be possible to embed copies of either in the LLVM tree if eliminating a dependency were desired.
Even if the implementation is not useable, adopting similar interfaces to an existing C++ solution is likely to be more friendly to C++ developers than designing something based on C# or Python.
> Either way, rolling our own has some advantages: LLVM may be able to make simplifying tradeoffs other libraries cannot realistically make due to narrower use cases and needs.
If that is the case, I would be totally in favour of rolling our own, but it seems that rolling our own was a decision made before investigating the alternatives.
> Provided we're only talking about very low level utilities like this, the cost doesn't seem terribly high to rolling our own, so I'm generally comfortable doing it.
>
> Doesn't mean we shouldn't look at all the existing ones and learn everything we can from them.
Completely agreed.
> On 12 Oct 2016, at 09:34, Chandler Carruth <chan...@gmail.com> wrote:
> Given the tendency of utilities like this to become used pervasively in the project, it would seem a fairly heavy weight dependency to grow.
A reimplementation is likely to be no less complex than any of the originals. Both fmtlib and FastFormat are under BSD / MIT-style licenses and are both small enough that it would be possible to embed copies of either in the LLVM tree if eliminating a dependency were desired.
2-clause BSD and MIT licenses (the relevant ones here) do address this. They are as permissive as the most permissive license used in LLVM (and far more permissive than the proposed new license) and carry no binary attribution clauses.
What would happen if I accidentally type "ps" instead of "ms" (I am
assuming we will not support picoseconds here)?
Will this abort at runtime?
I would prefer if *all* arguments to the format were checkable at compile time:
I.e. something like:
os << "blah blah" << format<std::milli>(end-start) << "blah blah";
I understand this may clash a bit with the desire for a compact
representation, but maybe with some clever design we could achieve
both?
pl
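One way to read the suggestion above is to carry the unit in a template argument instead of in the format string. The following is only a sketch under that assumption: `formatAs`, `FormattedDuration` and `UnitSuffix` are invented names, `std::ostream` stands in for `raw_ostream`, and the set of supported units is arbitrary.

```cpp
#include <chrono>
#include <ostream>
#include <ratio>
#include <sstream>

// Maps a std::ratio to a printable suffix. The primary template is left
// undefined on purpose: asking for an unsupported unit (for example
// std::pico) is then a compile-time error instead of a runtime abort.
template <typename Period> struct UnitSuffix;
template <> struct UnitSuffix<std::ratio<1>> {
  static const char *get() { return "s"; }
};
template <> struct UnitSuffix<std::milli> {
  static const char *get() { return "ms"; }
};
template <> struct UnitSuffix<std::micro> {
  static const char *get() { return "us"; }
};

// Streamable wrapper holding the count already converted to Period.
template <typename Period> struct FormattedDuration {
  double Count;
};

// Converts any chrono duration to the unit named by the template argument.
template <typename Period, typename Rep, typename FromPeriod>
FormattedDuration<Period> formatAs(std::chrono::duration<Rep, FromPeriod> D) {
  using Target = std::chrono::duration<double, Period>;
  return FormattedDuration<Period>{
      std::chrono::duration_cast<Target>(D).count()};
}

template <typename Period>
std::ostream &operator<<(std::ostream &OS, FormattedDuration<Period> F) {
  return OS << F.Count << UnitSuffix<Period>::get();
}
```

Usage would look like `os << "took " << formatAs<std::milli>(end - start);`. The cost is the one already noted in the message: the unit lives outside the format string, so the call site is less compact.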
I wonder if we could use UDLs instead?
os << "Test" << "{0}"_fs << 7;
~Aaron
>
> I'm not a huge fan of streaming, but if we want to go this route, I'd very
> much like to keep the syntax short and sweet. "format" is pretty great for
> that. If this is going to fully subsume its use cases, can we eventually get
> that to be the name?
>
> (While I don't like streaming, I'm not trying to fight that battle here...)
>
> Also, you should probably look at what is quickly becoming a popular C++
> library in this space: https://github.com/fmtlib/fmt
>
This is awesome. +1

Copying a time-tested design like C#'s (and which also Python uses) seems like a really sound approach.

Do you have any particular plans w.r.t. converting existing uses of the other formatting constructs? At the very least we can hopefully get rid of format_hex/format_hex_no_prefix since I don't think there are too many uses of those functions.

Also, since the format string already can embed the surrounding literal strings, do you anticipate the use case where you would want to use `OS << format_string(...) << ...something else...`?

Would `print(OS, "....", ....)` make more sense?
I'm generally favorable on the core idea of having a type-safe and friendly format-string-like formatting utility. Somewhat minor comments below:

On Tue, Oct 11, 2016 at 6:22 PM Zachary Turner via llvm-dev <llvm...@lists.llvm.org> wrote:
> The high level design of my library is borrowed heavily from C#.

My only big hesitation here is that the substitution specifier seems heavily influenced by C#. I'd prefer to model this after a format string syntax folks are fairly familiar with. IMO, Python's is probably the best bet here and has had a lot of hammering on it over the years. So I'd suggest that the pattern syntax be mapped to be as similar to Python's as possible or at least built on top of it.

> 1. os << format_string("Test"); // writes "Test"
> 2. os << format_string("{0}", 7); // writes "7"

The "<< format_string(..." is ... really verbose for me. It also makes me strongly feel like this produces a string rather than a streamable entity.

I'm not a huge fan of streaming, but if we want to go this route, I'd very much like to keep the syntax short and sweet. "format" is pretty great for that. If this is going to fully subsume its use cases, can we eventually get that to be the name?

(While I don't like streaming, I'm not trying to fight that battle here...)
I wonder what use cases you envision for this? Why does LLVM need a super extensible flexible formatting library? I mean -- if you were developing this as a standalone project, that seems like maybe a nice feature. But I see no rationale as to why LLVM should include it.
That is to say: wouldn't a much-simpler printf replacement, implemented with variadic templates instead of C varargs (and which therefore doesn't require size/signedness prefixes on %d) be sufficient for LLVM?
You can do that as a drop-in improvement for llvm::format, replacing the call to snprintf inside the implementation with a new implementation that actually uses the type information.
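A minimal sketch of what such a variadic-template printf replacement might look like follows. It is not how llvm::format is implemented: the name `tprintf` is invented, `std::ostream` stands in for `raw_ostream`, and width, precision and the conversion letter itself are skipped rather than honoured, so the argument's static type alone decides how it is printed.

```cpp
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>

// True for characters that end a printf conversion specification.
inline bool isConversionLetter(char C) {
  return C != '\0' && std::strchr("diouxXcspfFeEgGaA", C) != nullptr;
}

// Base case: no arguments left. Copies the rest of the format string,
// collapsing "%%" to a single '%'.
inline void tprintf(std::ostream &OS, const char *Fmt) {
  for (; *Fmt; ++Fmt) {
    if (Fmt[0] == '%' && Fmt[1] == '%')
      ++Fmt;
    OS << *Fmt;
  }
}

// Recursive case: prints literal text up to the next conversion, streams one
// argument using its real type, then recurses on the remaining arguments.
// Surplus arguments are silently ignored in this sketch.
template <typename T, typename... Ts>
void tprintf(std::ostream &OS, const char *Fmt, const T &Value,
             const Ts &...Rest) {
  for (; *Fmt; ++Fmt) {
    if (*Fmt != '%') {
      OS << *Fmt;
      continue;
    }
    if (Fmt[1] == '%') {
      OS << '%';
      ++Fmt;
      continue;
    }
    // Skip flags and length modifiers ("ll", "z", ...) up to the letter.
    do {
      ++Fmt;
    } while (*Fmt && !isConversionLetter(*Fmt));
    OS << Value;
    tprintf(OS, *Fmt ? Fmt + 1 : Fmt, Rest...);
    return;
  }
}
```

With this shape, `%d`, `%zu` and `%llu` all behave the same for an integer argument, which is the "no size/signedness prefixes" property described above. It does not by itself give back-references or per-type format options.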
On Oct 12, 2016, at 7:12 AM, Zachary Turner via llvm-dev <llvm...@lists.llvm.org> wrote:

Ahh, UDLs also wouldn't permit non literal format strings, which is a deal breaker imo
On Wed, Oct 12, 2016 at 10:13 AM James Y Knight <jykn...@google.com> wrote:
> I wonder what use cases you envision for this? Why does LLVM need a super extensible flexible formatting library? I mean -- if you were developing this as a standalone project, that seems like maybe a nice feature. But I see no rationale as to why LLVM should include it.

We were discussing this on IRC chat the other night, but I believe many people underestimate the need for string formatting. Here are some data points:

1. There are currently 1,637 calls to llvm::format() across the codebase, and this doesn't include calls to format_hex(), format_decimal(), and the other variants.
2. LLVM consists of a large number (20+ at a minimum) of focused tools (llc, lli, llvm-dwarfdump, llvm-objdump, etc) whose sole purpose is to output formatted text. Consider the use case of printing a verbose disassembly listing which is fed into FileCheck.
3. Even the "flagship" tools such as clang have need for string formatting when writing diagnostic messages.
4. LLDB in particular has this kind of thing *everywhere*. I'm talking about anywhere from 3-50+ times per function (and that's not an exaggeration) for logging purposes.

That said, LLVM already includes a formatting library. llvm::format(). So what would be the rationale *against* a better, safer, and easier version of the same thing?

> That is to say: wouldn't a much-simpler printf replacement, implemented with variadic templates instead of C varargs (and which therefore doesn't require size/signedness prefixes on %d) be sufficient for LLVM?
> You can do that as a drop-in improvement for llvm::format, replacing the call to snprintf inside the implementation with a new implementation that actually uses the type information.

How would you format user-defined types using this? I gave an example earlier: Consider you have a start time and an end time in std::chrono types, and you want to print the start, end, and duration. The code to do this using llvm::format() or stream operators is horrible.
On Wed, Oct 12, 2016 at 1:28 PM, Zachary Turner <ztu...@google.com> wrote:
> On Wed, Oct 12, 2016 at 10:13 AM James Y Knight <jykn...@google.com> wrote:
> > I wonder what use cases you envision for this? Why does LLVM need a super extensible flexible formatting library? I mean -- if you were developing this as a standalone project, that seems like maybe a nice feature. But I see no rationale as to why LLVM should include it.
>
> We were discussing this on IRC chat the other night, but I believe many people underestimate the need for string formatting. Here are some data points:
> 1. There are currently 1,637 calls to llvm::format() across the codebase, and this doesn't include calls to format_hex(), format_decimal(), and the other variants.
> 2. LLVM consists of a large number (20+ at a minimum) of focused tools (llc, lli, llvm-dwarfdump, llvm-objdump, etc) whose sole purpose is to output formatted text. Consider the use case of printing a verbose disassembly listing which is fed into FileCheck.
> 3. Even the "flagship" tools such as clang have need for string formatting when writing diagnostic messages.
> 4. LLDB in particular has this kind of thing *everywhere*. I'm talking about anywhere from 3-50+ times per function (and that's not an exaggeration) for logging purposes.
> That said, LLVM already includes a formatting library. llvm::format(). So what would be the rationale *against* a better, safer, and easier version of the same thing?

The arguments against for me are roughly:
1. It introduces a new formatting language that people need to learn.
2. People will still continue using printf-style formatting strings, too, because everyone **always** does, whenever anyone's ever introduced another formatting language anywhere.
3. The extensible formatting support is a) not obviously necessary, and b) will be more difficult to understand for readers, versus calling a function with normal function arguments.

> > That is to say: wouldn't a much-simpler printf replacement, implemented with variadic templates instead of C varargs (and which therefore doesn't require size/signedness prefixes on %d) be sufficient for LLVM?
> > You can do that as a drop-in improvement for llvm::format, replacing the call to snprintf inside the implementation with a new implementation that actually uses the type information.
>
> How would you format user-defined types using this? I gave an example earlier: Consider you have a start time and an end time in std::chrono types, and you want to print the start, end, and duration. The code to do this using llvm::format() or stream operators is horrible.

I'd call a function that returns a string, and print the string. E.g.:

    format("Started at %s, ended at %s",
           format_date("%d/%m/%Y %T", start_time),
           format_date("%d/%m/%Y %T", end_time));
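For illustration, a helper along the lines of the `format_date` call above could be sketched with the standard library as follows. The signature is a guess based on the example in the message; this version formats in UTC through `std::gmtime` (which uses a shared static buffer and so is not thread-safe) and truncates output to a fixed-size buffer.

```cpp
#include <chrono>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper matching the call shape in the message above:
// renders a system_clock time point through a strftime pattern.
std::string format_date(const char *Fmt,
                        std::chrono::system_clock::time_point TP) {
  const std::time_t T = std::chrono::system_clock::to_time_t(TP);
  std::tm Parts = {};
  // UTC keeps the result independent of the machine's time zone.
  if (const std::tm *UTC = std::gmtime(&T))
    Parts = *UTC;
  char Buffer[64];
  // strftime returns 0 if the result does not fit; that yields "".
  const std::size_t Len = std::strftime(Buffer, sizeof(Buffer), Fmt, &Parts);
  return std::string(Buffer, Len);
}
```

The returned `std::string` would then be passed to the outer formatting call via `.c_str()` or a string-aware `%s`. Each nested call allocates a temporary string, which is part of the trade-off being debated in this exchange.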
Boost.Format:
http://www.boost.org/doc/libs/1_62_0/libs/format/doc/format.html
I used it extensively in a past gig. IIRC, it's type safe, more convenient than usual operator<<, and faster than printf. I would love for something like this to be in tree... I don't really care which one as long as it's convenient enough that it's "obviously better".
(IOW, +1.)
> The best way to show it off is with some examples:
>
> 1. os << format_string("Test"); // writes "Test"
> 2. os << format_string("{0}", 7); // writes "7"
>
On Oct 12, 2016, at 11:38 AM, Zachary Turner <ztu...@google.com> wrote:

I don't object to compile time checking *as long as it doesn't severely detract from brevity*.
At the same time, I do object to *preventing* runtime format strings.

On Oct 12, 2016, at 12:08 PM, Zachary Turner <ztu...@google.com> wrote:

I thought I did. :) Passing format strings between functions is very useful. For example, imagine wanting to write a function like printRange(const char *Fmt, std::vector<int> Items);
This isn't possible if your format string MUST be a string literal
Equally importantly, I don't see a good reason to disallow runtime format strings.

On Oct 12, 2016, at 12:35 PM, Zachary Turner <ztu...@google.com> wrote:

You get compile time checking automatically when we can use c++14 though. If you use it with a string literal, you'll get compile time checking, otherwise you won't.
Here's a different example though. Suppose you're writing a tool which prints formatted output, and the field width is specified by the user.
Now you NEED to build the format string at runtime, there's no other way
You get compile time checking automatically when we can use c++14 though. If you use it with a string literal, you'll get compile time checking, otherwise you won't.
Internationalization is often one common reason for a format string to
not be a string literal. I could see us wanting to translate our
diagnostic messages, for instance.
~Aaron
Couldn't you define a class FormatString like this:

    class FormatString {
    public:
      template<int N>
      constexpr FormatString(const char (&S)[N]) {
        tokenize();
      }
      FormatString(const char *s) {}
    };

Then define the format function as format(const FormatString &S, Ts &&...Args)

The implicit conversion from string literal would go to the constexpr constructor which could tokenize the string at compile time, while implicit conversion from non-literal would be tokenized at runtime.
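A compilable sketch of this two-constructor idea is given here, with one adjustment that is my own assumption rather than part of the suggestion. As far as I can tell, with a plain non-template `const char *` constructor overload resolution picks that constructor even for string literals (array-to-pointer conversion ranks as an exact match, and a non-template beats a template on a tie), so the pointer constructor below is itself a constrained template. "Tokenizing" is reduced to counting '{' characters, and all member names are invented.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

class FormatString {
public:
  // Chosen for string literals: the array bound N (including the trailing
  // NUL) is known, so the scan can run inside a constant expression.
  // Caveat: any other array of char also lands here.
  template <std::size_t N>
  constexpr FormatString(const char (&S)[N])
      : Text(S), Placeholders(countPlaceholders(S, N - 1)), Literal(true) {}

  // Chosen for runtime pointers. Binding by `const T &` keeps arrays from
  // decaying, so a literal never deduces T as a pointer type and cannot
  // select this constructor.
  template <typename T,
            typename = typename std::enable_if<
                std::is_same<T, const char *>::value ||
                std::is_same<T, char *>::value>::type>
  FormatString(const T &S)
      : Text(S), Placeholders(countPlaceholders(S, std::strlen(S))),
        Literal(false) {}

  constexpr std::size_t placeholders() const { return Placeholders; }
  constexpr bool fromLiteral() const { return Literal; }
  constexpr const char *text() const { return Text; }

private:
  // Stand-in for real tokenizing: counts '{' up to Len or the first NUL.
  static constexpr std::size_t countPlaceholders(const char *S,
                                                 std::size_t Len) {
    std::size_t Count = 0;
    for (std::size_t I = 0; I < Len && S[I] != '\0'; ++I)
      if (S[I] == '{')
        ++Count;
    return Count;
  }

  const char *Text;
  std::size_t Placeholders;
  bool Literal;
};
```

The sketch needs C++14 for the loop in a constexpr function. A `constexpr FormatString` built from a literal can be inspected with `static_assert`, while one built from a `const char *` variable is scanned at run time.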
It's quite heavy, e.g.:
https://github.com/fmtlib/fmt#compile-time-and-code-bloat
I've been using that library for a couple of projects in an older
version, I think the newer version would primarily be quite a bit less
verbose. It has a modern BSD license.
Joerg
On Oct 12, 2016, at 12:35 PM, Zachary Turner <ztu...@google.com> wrote:
> You get compile time checking automatically when we can use c++14 though. If you use it with a string literal, you'll get compile time checking, otherwise you won't.

I understand that, but that doesn't really address my concerns.

> Here's a different example though. Suppose you're writing a tool which prints formatted output, and the field width is specified by the user.
> Now you NEED to build the format string at runtime, there's no other way

Maybe the problem is using a string to format this in the first place.
For example, you could wrap the object you want to print with an adaptor in charge of padding to the right till you reach the column width.

    format("{0}", rPad(col_width, my_object));
On Oct 12, 2016, at 8:07 PM, Zachary Turner <ztu...@google.com> wrote:

> On Wed, Oct 12, 2016 at 12:40 PM Mehdi Amini <mehdi...@apple.com> wrote:
> > On Oct 12, 2016, at 12:35 PM, Zachary Turner <ztu...@google.com> wrote:
> > You get compile time checking automatically when we can use c++14 though. If you use it with a string literal, you'll get compile time checking, otherwise you won't.
>
> I understand that, but that doesn't really address my concerns.
>
> > Here's a different example though. Suppose you're writing a tool which prints formatted output, and the field width is specified by the user.
> > Now you NEED to build the format string at runtime, there's no other way
>
> Maybe the problem is using a string to format this in the first place.
> For example, you could wrap the object you want to print with an adaptor in charge of padding to the right till you reach the column width.
> format("{0}", rPad(col_width, my_object));

FWIW I do think that literal format strings will handle 90% or more of uses. I just don't see the benefit of needlessly banning the other cases. Because all that's going to happen is someone is going to resort to using snprintf etc, which is exactly the problem I'm trying to solve.
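The padding adaptor mentioned in the quoted suggestion could be sketched roughly as below. Only the name `rPad` and its argument order come from the message; the wrapper type, the use of `std::ostream` in place of `raw_ostream`, and the padding behaviour (spaces appended up to the width, no truncation) are assumptions.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Wrapper remembering a runtime column width and a reference to the value.
// The reference is only valid within the full expression that created the
// wrapper, which is enough for `OS << rPad(Width, Value)`.
template <typename T> struct RightPadded {
  std::size_t Width;
  const T &Value;
};

template <typename T> RightPadded<T> rPad(std::size_t Width, const T &Value) {
  return {Width, Value};
}

// Prints the value, then pads with spaces on the right up to Width.
template <typename T>
std::ostream &operator<<(std::ostream &OS, const RightPadded<T> &P) {
  std::ostringstream Tmp;
  Tmp << P.Value;
  const std::string Text = Tmp.str();
  OS << Text;
  if (Text.size() < P.Width)
    OS << std::string(P.Width - Text.size(), ' ');
  return OS;
}
```

Here the width is an ordinary runtime value, so the format string itself can stay a literal. The counterpoint in the reply still stands: nothing in this approach requires forbidding runtime format strings.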
It's literally no extra effort to support runtime format strings, and it makes the library more flexible as a result.
On Oct 12, 2016, at 8:33 PM, Zachary Turner <ztu...@google.com> wrote:

AFAICT this appears to be the first time you've clarified that you're talking about a situation where the compile-time checking happens using something other than format strings.

In Pavel's original email, he suggested compile time checking and you mentioned that I didn't object to it. But if you go back and read my response, I said we can do the compile time checking *of the format strings* using C++14. So no I didn't object to it in principle, but I never strayed from the desire to use format strings.

To respond to your other point, no it doesn't make it more flexible than a non-string based solution. But does anyone want a non string-based solution? We already have one, it's called raw_ostream. And STL has another one in iostreams. sprintf and llvm::format are not more flexible than streaming operators either, and yet people still flock to them because it yields the nicest looking code. James Knight pointed out earlier that "any time someone invents a new formatting library, everyone always ends up using printf anyway". There's a reason for that, and it's because printf is string-based. That's what people want.

So if we're talking about string-based versus non string-based, then yes, I'm married to the idea of a string based solution.
That doesn't mean we can't *also* expose the underlying format functionality via an additional set of non format based functions. But string-based formatting is necessary if there is to be any adoption at all.
This may be a good time to point at https://reviews.llvm.org/D25018
But if someone ends up doing a full overhaul of the formatting that
makes that patch unnecessary, I'm happy too.
Cheers,
Nicolai
Hi all,

Tentatively final version is up here: https://reviews.llvm.org/D25587

It has a verbal LGTM, but I plan to wait a bit longer just in case anyone has some additional thoughts. It's a large patch, but if you're interested, one way you can help without doing a full-blown review is to look at the large comment blocks in FormatVariadic.h and FormatProviders.h. Here I provide a formal description of the grammar of the replacement sequences and format syntax. So you can look at this without looking at the code behind it and see if you have comments just on the format language.

Here's a summary of (most) everything contained in this patch:

1) UDL Syntax for outputting to a stream or converting to a string.

    outs() << "{0}"_fmt.stream(1);
    std::string S = "{0}"_fmt.string(1);

* UDL Syntax is removed in the latest version of the patch.
* Name changed to `formatv` since `format_string` is too much to type.
* Added conversion operators for `std::string` and `llvm::SmallString`.

I had some feedback offline (not on this thread, unfortunately) that it might be worth using a printf style syntax instead of this Python-esque syntax. FTR, I actually somewhat object to this, for a couple of reasons:

1) It makes back-reference syntax ugly. "{0} {1} {0}" is much clearer to me than "%0$ %1$ %0$". The latter syntax is also not a very well known feature of printf and so unlikely to be used by people with a printf-style implementation, whereas it's un-missable with the python-style syntax.
2) I don't see why we should need to specify the type of the argument with %d if the compiler knows it's an integer. Even if we can add compile-time checking to make it an error, it seems unnecessary to even encounter this situation in the first place. I believe the compiler should simply format what you give it.
3) One of the most useful aspects of the current approach is the ability to plug in custom formatters for application specific data types. This is not straightforward with a printf-style syntax.

You might be able to hook up a template-specialization-like mechanism to the processing of %s (similar to my current approach), but it's not obvious how you proceed from there to get custom format strings for individual types. For example, a formatter which can print a TimeSpan in different units depending on style options you pass in. This is especially useful when trying to print ranges where you often want to be able to specify a different separator, or control the formatting of the underlying type. (e.g. it's not clear how you would elegantly format a range of integers in hex using this style of approach).
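To illustrate the range case mentioned above, here is a small standalone sketch of a helper that takes both a separator and an element style. It is not the patch's range formatter and does not use its option syntax: `formatHexRange` is an invented name, the element style is fixed to uppercase hex with a "0x" prefix, and it is only meant for ranges of unsigned integers.

```cpp
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins the elements of a range with Separator,
// printing each one in uppercase hexadecimal with a "0x" prefix.
template <typename Range>
std::string formatHexRange(const Range &Items, const std::string &Separator) {
  std::ostringstream Out;
  Out << std::hex << std::uppercase;
  bool First = true;
  for (const auto &Item : Items) {
    if (!First)
      Out << Separator;
    First = false;
    Out << "0x" << Item;
  }
  return Out.str();
}
```

In a format-string design, the separator and the per-element style would travel in the options text of a single placeholder instead of in function parameters, which is the flexibility being argued for.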
FWIW, I'm also not entirely sold that we need a complex formatting
language here. The printf modifiers are easy to remember and are good
enough 90% of the time, whereas with something like this I feel like I'd
need to look up the syntax every time I used it.
Like James though, I'm fine with conceding to the majority on this one.
On Nov 7, 2016, at 2:58 PM, Zachary Turner via llvm-dev <llvm...@lists.llvm.org> wrote:

FWIW, if you're only ever formatting numbers and strings (which I agree is likely a majority of use cases), the syntax should be very easy to remember. Most of the time you don't need to specify anything other than the placeholder index. In that respect I expect it to catch on very quickly as there's really nothing to remember.

Only if you want to customize the behavior will you maybe have to look up the syntax, and in that case you would have to do something equally funky with printf (such as not using it and writing 4 lines of streaming stuff to an ostream instead).