Recently I was coding Perl 5 and quite often I had to change
interpolated strings or C<print> to C<sprintf> or C<printf>.
I began to wonder whether qq strings could allow sprintf-like
formatting directly.
I could imagine an \F escape sequence with the following syntax:
: '\F' printf-format-without-% '(' expr ')'
| '\F' printf-format-without-% '{' string '}'
Examples:
"The value in hex is \Fx($value)."
"You currently have \F020d($dollars) on your account."
"Leave some --\Fs60{space for this $interpolates string}--."
I find this syntax reads very well.
eg. "The value in hex is 'format hex $value'."
I like that it doesn't separate format specifier and
data like s?printf. Please note: I don't want to replace s?printf!
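For comparison, a minimal Perl 5 sketch of what the proposed escape would abbreviate. It assumes that \F simply applies sprintf with the given format to the one value that follows; the variable names and values are made up for illustration.

```perl
use strict;
use warnings;

my $value   = 255;
my $dollars = 42;

# Today: the format string and the data are kept apart.
my $with_sprintf = sprintf "The value in hex is %x.", $value;

# Today, inline: interpolate an anonymous array holding the sprintf result.
my $inline = "The value in hex is @{[ sprintf '%x', $value ]}.";

# Roughly what "\F020d($dollars)" is assumed to mean.
my $padded = "You currently have @{[ sprintf '%020d', $dollars ]} on your account.";

print "$with_sprintf\n$inline\n$padded\n";
```

The @{[ ... ]} form keeps the format next to its data, which is the property the \F proposal is after, at the cost of rather more punctuation.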
Is there something I have missed that would make this impossible or
impractical? Or maybe this feature already exists somewhere and I
don't know it?
best regards,
-Edwin
> "Leave some --\Fs60{space for this $interpolates string}--."
I'm sorry, this should be:
> "Leave some --\F60s{space for this $interpolates string}--."
Now, if you're on a Perl 6 list, you'd better be using Perl 6 patterns
:-)
/ \\F <printf_format_without_percent> \( <expr> \)
| \\F <printf_format_without_percent> \{ <string> \} /
> Examples:
>
> "The value in hex is \Fx($value)."
> "You currently have \F020d($dollars) on your account."
> "Leave some --\Fs60{space for this $interpolates string}--."
>
> I find this syntax reads very well.
> eg. "The value in hex is 'format hex $value'."
I definitely like the idea. It's something like Python's % operator,
but inline, which seems to make more sense.
As far as the syntax, the () and {} don't make a lot of sense with
regard to the rest of the language. We could either utilize the
string/numeric context distinction that already exists in {} and []
for subscripting, or we could always use () in analog to $().
I'd like to have that dollar in there somewhere, actually.
"The value in hex is \Fx$( expression )."
Or something. That is kinda clunky, though. Maybe just a
stringification adverb, albeit verbose (but more versatile):
"The value in hex is $( expression where format('x') )"
No, I actually think that should be a property. In fact, one that has
been discussed before:
"The value in hex is $( expression but formatted('x') )"
That's actually my favorite so far.
> -Edwin
Luke
> As far as the syntax, the () and {} don't make a lot of sense with
> regard to the rest of the language. We could either utilize the
> string/numeric context distinction that already exists in {} and []
> for subscripting, or we could always use () in analog to $().
My idea was to make it like the scoped \L{ } of Apocalypse 2.
The \L also says something about formatting and the { } creates a
scope, which "stays" in the interpolated string.
You are right about the (), however, because there should be a
more visible marker (probably a sigil) when the syntax changes from
<interpolated_string> to <expr>.
> I'd like to have that dollar in there somewhere, actually.
>
> "The value in hex is \Fx$( expression )."
The problem is calculated format (I forgot to mention this):
"The value in the chosen format is \F$format$( expression )."
The compiler cannot know whether $format contains the whole format
specifier or just a part (or nothing), so it does not know whether it
should take $( expression ) as part of the format or as the formatee ;).
With my proposed syntax the first '(' outside any nesting constructs
would clearly mark the beginning of the formatee.
One option would be to only allow
\\F <interpolated_without_unnested_open_brace> \{ <string> \}
so this would work and the dollar is there:
"You wanted to see it like that: \F$format\Q{$that}"
"You have \F${digits}d{$cent}."
...which is less than beautiful (should not be common, though).
Also the formatee would always be converted to a string before
formatting (also see conclusion below).
> Or something. That is kinda clunky, though. Maybe just a
> stringification adverb, albeit verbose (but more versatile):
>
> "The value in hex is $( expression where format('x') )"
>
> No, I actually think that should be a property. In fact, one that has
> been discussed before:
>
> "The value in hex is $( expression but formatted('x') )"
>
> That's actually my favorite so far.
So the value should 'carry' its own format...This makes sense in some
cases, in other cases it does not (Though you always could override
with another C<but>.)
The syntax is clean, but even longer than with sprintf:
"The value in hex is $( expression but formatted('x') )"
"The value in hex is $( sprintf '%x',expression )"
Why not allow both (\F with {} and C<but formatted>)? If we disallow
interpolated formats on the \F it introduces minimal complexity into
the parser and compiler. The only price to pay would be the \F
itself.
Disallowing interpolated formats on \F has the additional advantage of
making the {} unnecessary in the most common cases (also removing the
'force to string').
The best of both worlds:
sub foo(int $x,int $y)
{
# print "fooing $x with $y\n" if $debug;
# change it to hex format temporarily
print "fooing \Fx$x with \Fx$y\n" if $debug;
}
$msg = "The value of \$y is $( $y but formatted($chosen_format || '0d') )."
-Edwin
> Disallowing interpolated formats on \F has the additional advantage of
> making the {} unnecessary in the most common cases (also removing the
> 'force to string').
As an afterthought: This suggests getting rid of the {} entirely.
The rule could be like:
\\F <printf_format_without_percent> <funny_character_expression>
so
"The value in hex is \Fx$value."
"The value in hex is \Fx%lookup{$key}."
"The value in hex is \Fx$(calculate($x,5))."
would all be ok. For more complex formatting you use C<sprintf> or
C<but formatted>.
I really like that. (It's perlish, too, don't you think?)
-Edwin
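The rule above can be approximated in Perl 5 as a plain substitution over a template. This is only a sketch under stated assumptions: expand_F is a hypothetical helper, it handles only simple scalars looked up by name in a hash (not %lookup{$key} or $(...)), and it accepts only the flag characters "-", "+", "0" and space.

```perl
use strict;
use warnings;

# Hypothetical helper: expand "\F<format>$name" inside a template string.
# The format is a printf format without the leading '%'.
sub expand_F {
    my ($template, $vars) = @_;
    $template =~ s{\\F([-+0 ]*\d*(?:\.\d+)?[a-zA-Z])\$(\w+)}
                  {sprintf "%$1", $vars->{$2}}ge;
    return $template;
}

# Single quotes keep \F and $value literal until expand_F sees them.
print expand_F('The value in hex is \Fx$value.', { value => 255 }), "\n";
```

Everything between \F and the sigil is taken as the format, which is what makes the braces unnecessary in the simple case.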
> The rule could be like:
>
> \\F <printf_format_without_percent> <funny_character_expression>
After-afterthought:
We know: Everything between the \F and the next funny character is the
format specifier. This allows extensions to the printf-specifiers:
(These extensions and more could also be used in C<but formatted>.)
rule format_specifier {
('-' | ' ') <fill_character>? <width>? ('.' <precision>)? <conversion>?
|
<fill_character_no_minus>? <width>? ('.' <precision>)? <conversion>?
}
rule fill_character {
'-' | <fill_character_no_minus>
}
rule fill_character_no_minus {
<!before <conversion>> ( <[^-$@%\\1-9. ]> | <escaped_character> )
}
(Hope I got that right.)
If there is no <conversion> specified, just do the alignment and
filling on the value (which is converted to string before that).
Examples:
$x = 3;
" \F6$x" --> 3
" \F-6$x" --> 3
" \F06$x" --> 000003
" \F*6$x" --> *****3
" \F-*6$x" --> 3*****
" \F\$6$x" --> $$$$$3 (yes, it's ugly)
" \F\-6$x" --> -----3
" \F -6$x" --> -----3 (looks better without the backslash, I think)
" \F--6$x" --> 3-----
"\F*20$()" eq ('*' x 20) (don't want to propose special syntax instead of $())
Another possible extension: If there is a <fill_character> specified,
followed by an 'x' and <width>, interpret it as <fill_character> x
<width> and don't expect the <funny_character_expression>:
"\F*x20" eq ('*' x 20)
...quite irregular though.
-Edwin
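The alignment and fill behaviour in these examples can be sketched as an ordinary Perl 5 function. The name pad and its argument order are assumptions made for illustration; the value is stringified first and is never truncated, as with printf widths.

```perl
use strict;
use warnings;

# Hypothetical helper: pad $value to $width using $fill, aligned right by default.
sub pad {
    my ($value, $width, $fill, $align) = @_;
    $fill  = ' '     unless defined $fill;
    $align = 'right' unless defined $align;

    my $string  = "$value";                  # convert to string before padding
    my $missing = $width - length $string;
    return $string if $missing <= 0;         # already wide enough

    my $padding = $fill x $missing;
    return $align eq 'left' ? $string . $padding : $padding . $string;
}

print pad(3, 6, '*'), "\n";          # like \F*6$x
print pad(3, 6, '*', 'left'), "\n";  # like \F-*6$x
print pad('', 20, '*'), "\n";        # like \F*20$()
```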
Cool, Perlish, scary.
> Examples:
> [snip]
> " \F\$6$x" --> $$$$$3 (yes, it's ugly)
> [snip]
> "\F*20$()" eq ('*' x 20) [...]
The Obfuscated Perl Contest people will LOVE this.
--Dks
>
> As far as the syntax, the () and {} don't make a lot of sense with
> regard to the rest of the language. We could either utilize the
> string/numeric context distinction that already exists in {} and []
> for subscripting, or we could always use () in analog to $().
>
> I'd like to have that dollar in there somewhere, actually.
>
> "The value in hex is \Fx$( expression )."
>
> Or something. That is kinda clunky, though. Maybe just a
> stringification adverb, albeit verbose (but more versatile):
>
> "The value in hex is $( expression where format('x') )"
>
> No, I actually think that should be a property. In fact, one that has
> been discussed before:
>
> "The value in hex is $( expression but formatted('x') )"
>
> That's actually my favorite so far.
>
> > -Edwin
> Luke
maybe the analogy with quotelike expressions in p6 could be useful,
so that "$" inside strings are (sort of) quotelike operators.
this is unambiguous if a single ":" cannot be the beginning of a variable
name.
"value is $:3int( $value ) or $:5.3float( $value )"
but maybe a cleaner way is to have a predefined function which can be
passed modifiers
"value is \F:3int[ $value ] or \F:5.3float[ $value ]"
or just
"value is \F[as=>'3int', $value ] or \F[as=>'5.3float', $value ]"
arcadi
> but maybe cleaner way is to have a predefined function which can be
> passed modifyers
>
How about a pre- or user- defined function that just does sprintf?
"The values are $( sprintflike($format-string, @values))"
Now, if you want to talk about the cool amazing formatting syntax
you've conceived for sprintf replacement, that's fine. But I'm getting
that warm cozeny feeling that this is burning unnecessary listmips.
(Note: In the spirit of the "regex as generator" discussion of a few
months back, I'd love to hear about a rule [production?] based approach
to output formatting...)
=Austin
> Now, if you want to talk about the cool amazing formatting syntax
> you've conceived for sprintf replacement, that's fine. But I'm getting
> that warm cozeny feeling that this is burning unnecessary listmips.
Well, it's a bike shed. But it is a bike shed people use all the
time. The world cannot be run by nuclear scientists alone.
If you don't think it's worth talking about such things, install a
mail filter which deletes all mails not containing the word
'paradigm'.
-Edwin
Perhaps best not to have people expend lots of energy painting bike sheds
until the nuclear reactor's anywhere near functional, though.
I think the whole thing can be done, in whatever style people would like,
using whatever natty syntax, by means of $( ), overloaded string constants,
or, heaven forbid, a purpose-built grammar rule override for double-quoted
strings.
When we have any one of those things.
And I would go so far as to say that since we have proposals for three
different ways to allow people to do it precisely how they like, we don't need
to discuss a way to do it in the core language. At least, certainly not yet.
But then I'm one of those freaks who likes the idea of keeping core Perl 6
generic, extensible, clean and small, and letting all the clever stuff go
into extensions, a heretical position which is way out of favour with the
more influential listfolk, so feel free to ignore my opinion.
> But it is a bike shed people use all the time.
Agreed, I suppose.
% grep printf cvs/modules/**/*pm | wc -l
15
% grep -v printf cvs/modules/**/*pm | wc -l
15360
Well, 0.1% agreed, anyway.
--
Putting a square peg into a round hole can be worthwhile if you don't mind a
few shavings. -- Larry Wall
Now, now, that's hardly a fair comparison. Maybe if you grepped for lines
that contain "print" but not "printf", or simply did a grep -l to count the
number of modules that use printf at all anywhere . . .
I think output formatting is a logical thing to have in the core. It was the
first thing Perl was used for, after all.
I don't think we need a special magical way of doing it inside an
interpolation context, though. I think the less interpolation magic,
the better, and there's already a lot which can go away once we
have $( arbitrary expression ). I think $( sprintf ) is more than adequate.
The sprintf syntax could perhaps stand to be shorter. It is unfortunate that
Perl's string/number duality would make it at the very least awkward to
adopt the Python/Ruby overloaded % operator, which I otherwise like.
Perhaps we could, by analogy with uc() and lc(), introduce an sf() alias?
--
Mark REED | CNN Internet Technology
1 CNN Center Rm SW0831G | mark...@cnn.com
Atlanta, GA 30348 USA | +1 404 827 4754
> % grep printf cvs/modules/**/*pm | wc -l
> 15
> % grep -v printf cvs/modules/**/*pm | wc -l
> 15360
>
> Well, 0.1% agreed, anyway.
Could also mean the current printf syntax is not too popular.
Reusable code is also less likely to use it than the day-to-day code
one writes anew each time (being annoyed about printf).
There should be guidelines about what postings are appreciated on
perl6-language. I'd happily obey them. dev.perl.org says
Description: This list is for discussing user-visible changes to
the language.
It's somewhat unnerving to post on topic and (hopefully) politely and
get a cold (less on topic) reply from someone with "warm feelings". On
the other hand the sharks might miss the occasional bite...
regards
-Edwin
I think your post was spot on; the only problem I had with it is that I felt
it was addressing a problem at too low a level. This could be because I'm a
grouchy old-timer, and I carry over a Perl 5 design principle that says that
changes should be made in as general a way as possible.
I *want* to solve the sprintf-interpolation problem, but I think it's possible
to get too bound-up in syntax and miss more generic ways of solving the same
problem.
A quick review of the early archives of this list might serve to exhibit
the phenomenon. :)
I also think it's possible to get bogged down in low-level details right now,
when the same energy could be used to, say, hash out the MMD "big issue" that
Ziggy mentioned earlier today. I think all that needs to happen at this stage
is that we realise that a nicer way to do formatting in strings would be good,
we look at whether or not it can be done (decently, for *someone*'s definition
of decent ;) with the tools we already have proposed, and if not, flag it as
something to come back to when we need to hammer out the details.
> get a cold (less on topic) reply from someone with "warm feelings". On
> the other hand the sharks might miss the occasional bite...
Unfortunately, it cuts both ways; this is the second post in a row you've
ended with an unnecessary barb. I know I'm no saint as far as that's
concerned, but I also know it doesn't necessarily endear people to your point of
view.
--
Facts do not cease to exist because they are ignored.
-- Aldous Huxley
> it was addressing a problem at too low a level. This could be because I'm a
> grouchy old-timer, and I carry over a Perl 5 design principle that says that
> changes should be made in as general a way as possible.
It's a very good principle, I think.
One (tiny) generalization I could think of was to pass everything
between the \F and the funny character as an argument to a method call
on the value. This method then stringifies the value. The default
method just does sprintf or something similar. I see that it would
probably be better to pass something like a general "stringification
context" to the value, which could contain eg. language info.
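A rough Perl 5 sketch of that generalization, under stated assumptions: the format text is handed to a method on the value if the value has one, and otherwise falls back to sprintf. The names stringify_with, Money and formatted are all hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical class whose values decide for themselves how to be shown.
package Money;
sub new { my ($class, $cents) = @_; return bless { cents => $cents }, $class }
sub formatted {
    my ($self, $spec) = @_;
    # This class ignores the printf-style spec and always shows two decimals.
    return sprintf '%d.%02d', int($self->{cents} / 100), $self->{cents} % 100;
}

package main;

# Dispatch: objects with a formatted() method stringify themselves,
# everything else gets the default sprintf treatment.
sub stringify_with {
    my ($value, $spec) = @_;
    if (blessed($value) && $value->can('formatted')) {
        return $value->formatted($spec);
    }
    return sprintf "%$spec", $value;
}

print stringify_with(255, 'x'), "\n";
print stringify_with(Money->new(1999), 'x'), "\n";
```

A richer "stringification context" could replace the plain $spec string without changing the dispatch.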
> Ziggy mentioned earlier today. I think all that needs to happen at this stage
> is that we realise that a nicer way to do formatting in strings
> would be good,
I'm content if this will be revisited (hopefully by someone with
better overview than mine). It just should not be ignored.
> Unfortunately, it cuts both ways; this is the second post in a row you've
> ended with an unnecessary barb. I know I'm no saint as far as that's
> concerned, but I also know it doesn't necessary endear people to your point of
> view.
Look, no barb --> :)
-Edwin
Oh, it definitely won't be ignored. :-) It's come up several times
before -- try searching for "stringification", IIRC -- and has always
sortof fizzled because the higher-ups were never quite ready for it
yet. And there's some primitive type and type conversion questions
that are still unclear -- until those are fleshed out, the
stringification proposals have been a bit "stuck".
But there is broad support for the idea that the somewhat elderly
printf syntax is a PITA, and that printf, in general, should be
completely unnecessary since we already *have* interpolated strings,
fer pete's sake.
If you really want to make your brain hurt, consider this:
stringification can be thought of, obliquely, as the "inverse" of
regexes. One puts strings together, the other takes them apart. And
Perl6 introduces shiny, clean-looking rule syntax:
/here is a <thingy>/
Oooh, pretty.
So if I were in an evil mood, which I almost always am, I'd ask: what's
the inverse of a <thingy> rule? Is it possible that interpolated
strings could benefit from the same angle-bracket syntax? __Is it
possible that there are "output rules" just like there are "input
rules"?__
So what would
"The value of x is <thingy>"
mean, from the interpolation end of things? _Could_ it mean something?
Is it possible that
"The value of x is <expr but formatted(...)>"
is in fact a cleaner, more elegant syntax than:
"The value of x is $(expr but formatted(...))"
Or, if we have "output rules" just like we have "input rules", could
something quite complex be expressed simply as:
"You have <$x as MoneyFormat>"
having previously defined your MoneyFormat "formatting rule" in some
other location?
MikeL
> Or, if we have "output rules" just like we have "input rules", could
> something quite complex be expressed simply as:
>
> "You have <$x as MoneyFormat>"
>
> having previously defined your MoneyFormat "formatting rule" in some
> other location?
"You have <MoneyFormat($x)>", no?
=Austin
Yeah. Though I'd actually hope both forms were acceptable, personally.
I really like the visual karma of the first, representing a "type or
format conversion", more than the second, representing the "creation of
a formatted object" -- though in practice the two notions are of course
identical. :-)
MikeL
Boggle.
I was thinking that the C<as> keyword was reserved for type
transformation, while <Rule($argument)> was already well-defined for
passing arguments to rules.
I'd much rather call a sub than create a temp object and then call a
method.
=Austin
A PITA, yes, but a darned powerful *and concise* PITA.
> Is it possible that
>
> "The value of x is <expr but formatted(...)>"
>
> is in fact a cleaner, more elegant syntax than:
Quite honestly, I'd like to do better. One of the things that makes
regexen so powerful is their concision; they pack a tremendous amount
of meaning into every character and yet, for the most part, they
aren't that hard to understand. I'd like to see the same for output
rules. The vast majority of output rules will probably be on the
order of: "make this an integer and left-pad with 0s to make sure
there are at least 2 digits". I'd much rather write:
"The value of x is \F02d$($x)"
than
"The value of x is $($x as integer but
formatted(<two-digits-left-pad-0>))"
> Or, if we have "output rules" just like we have "input rules", could
> something quite complex be expressed simply as:
>
> "You have <$x as MoneyFormat>"
I like this better; a good compromise between concision and
readability (although the later poster's suggestion of
'MoneyFormat($x)' was even better, IMO).
You still need to define MoneyFormat somewhere, however; I hope that
you will be able to do that with nice concise formatting codes.
--Dks
Not sure I care about that -- the good ones will wind up in core+, or
barely outside, just like useful helper functions.
I do like one other aspect of printf codes, however -- they are
independent of data, or at least they're REALLY late bound. Remember
the signature for printf:
int printf(format_string, ...)
Being able to code a Grammar used for proactive generation of output
makes P6 a real programm(ar)ing language, possibly the first such. :-)
But saying
rule MoneyFormat {
<currency_sigil>?
<digit> {1,3} (<thousands_delimiter><digit> {3})*
(<decimal_delimiter> <digit> {2})?
}
isn't the same as saying
emit MoneyFormat {
my $amt = shift;
my $out;
$amt = int($amt * 100);
$out = $decimal_delimiter _ ($amt % 100);
$amt /= 100;
while $amt > 999 {
$out = $thousands_delimiter _ ($amt % 1000) _ $out;
$amt /= 1000;
}
$out = $currency_sigil _ $amt _ $out;
return $out;
}
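For readers who want something runnable today, here is a Perl 5 sketch of the same output logic. It is not the emit syntax above: money_format and its option names are assumptions, it rounds to whole cents, and it handles non-negative amounts only.

```perl
use strict;
use warnings;

# Hypothetical helper: format a non-negative amount as money.
sub money_format {
    my ($amount, %opt) = @_;
    %opt = (sigil => '$', thousands => ',', decimal => '.', %opt);

    my $cents = sprintf '%.0f', $amount * 100;   # round to whole cents
    my $whole = int($cents / 100);
    my $frac  = $cents % 100;

    # Insert a delimiter before each trailing group of three digits.
    my $digits     = "$whole";
    my $thousands  = $opt{thousands};
    1 while $digits =~ s/^(\d+)(\d{3})/$1$thousands$2/;

    return $opt{sigil} . $digits . $opt{decimal} . sprintf('%02d', $frac);
}

print money_format(1234567.5), "\n";
print money_format(1234.5, sigil => 'EUR ', thousands => '.', decimal => ','), "\n";
```

Unlike the loop in the emit example, this pads the cents to two digits and groups the whole part as a string, so no group loses its leading zeros.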
And as much as I hate to claim that any of the Apocalypsen are
incomplete, I wonder if maybe we need some more thought paid to the
"FORMAT" chapter -- replacing, or merging, formats with emit-rules
seems like an interesting project. (Although one that won't require
*much* in core- other than a standardized "anonymous stream" mechanism
that we've already talked over [inconclusively] once).
=Austin
I dunno, I think it fires my "change for the sake of change" alarm bells. So
far we're already throwing away thirty years of^W^W^W^W^W^Wrationalising one
Unix little language; can't we leave another one alone, please - at least
until 6.1?
We don't have to fix the entire world immediately; fixing the majority of it
is already taking quite long enough. :(
--
Mohandas K. Gandhi often changed his mind publicly. An aide once asked him
how he could so freely contradict this week what he had said just last week.
The great man replied that it was because this week he knew better.
FWIW I agree with you completely, and strongly.
If it can be outside the core it should be, unless there's a very
good reason why not.
But I guess I'm far from being an influential listfolk these days...
I console myself with a high level of trust in the core design team.
Tim [wandering off into the sunset to ruminate on DBI issues...]
I've not seen one (but then I've not been paying attention, so
forgive me if it's been done already, and perhaps point me to a url).
Tim.
Inform has something like this (though printing is overall very
different than in Perl). There are some functions defined by the
standard library, but any function can be used. It works like this
(in Inform):
! Print stuff according to the usual rules:
print foo, bar, " some constant string", baz;
! Print an object using its short name (which may be a
! routine, in which case it is run and is expected to print
! the appropriate thing, or a string):
print (name) foo;
! Print an object, but use the article in addition to the
! short name. (The object can override the article, and
! objects with the proper attribute don't use one, so this
! is not the same as expressly printing "the"):
print (the) foo;
! Same thing, but capitalise the article:
print (The) foo;
! There are some other predefined formatting routines, but here
! is the general case...
! Pass the object to the quux routine, which is expected to print it
! in some fashion:
print (quux) foo;
The parentheses are not the right syntax for Perl, obviously.
This does come in really handy when interpolating objects into
sentences...
Object thief "Thief" somewhere
with react_before [;
Insert:
move noun to self;
"You attempt to put ", (the) noun, " into ", (the) second,
", but ", (the) self, " snatches it away.";
! That looks better with inform-mode syntax highlighting.
OtherAction:
do_stuff();
],
other_properties values, ! Elided here for brevity.
has proper animate;
I suspect an analogous feature would be really convenient in Perl.
If you can make the mental transition from the way Inform does things,
using the object's properties (which may be strings, routines,
whatever, depending on the object) to format it is not really very
different from using a format string to sprintf it. Either way, some
piece of metainformation (which property to use, or the format
string), when applied to a specific item, produces results that look a
certain way.
Hmmm... waitasec, now that I think about the above, we actually have
it, pretty much, even in Perl 5...
$noun->{parent} = $self; UpdateObjectTree($noun);
print "You attempt to put ", the($noun), " into ", the($second),
", but ", the($self), " snatches it away.\n";
return 1;
Okay, the syntax is ugly, but isn't the Perl6 core going to be
flexible enough to allow syntactic sugar to be built on top? So,
can't this be done outside of core? In particular, isn't it going to
be easier in Perl6 to jump into and out of strings, so that the
combination of double quotes and commas can be reduced to a character?
ISTR something like that in one of the Apocalypse articles. Or was
that for regexes? I get strings and regexes confused...
Oh, and of course in Perl the routines would have to return the
strings rather than printing them, but that's definitely the Perlish
way to do it.
Apart from the leaning toothpick, there are several other problems
with the \F approach:
* It's hard to parse visually.
* It's not general enough.
* It doesn't put the important thing out front.
* It's inventing new syntax when we don't actually need it.
The thing everyone is missing here is that methods can now be
interpolated (and that everything in Perl 6 can be treated as an
object if you want it to be). Suppose we have a method .form that
can format any object. Without changing anything, we already have:
"The value in hex is $value.form('x')."
"The value in hex is %lookup{$key}.form('x')."
"The value in hex is $(calculate($x,5).form('x'))."
That doesn't invent new syntax, and it puts the variable out front
where it belongs. It's more general in that it works outside of
interpolations as well as inside. It's also easier to parse visually.
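The same idea can be tried out in Perl 5, with the caveat that Perl 5 does not interpolate method calls, so the call has to be wrapped in @{[ ... ]}. The class name Formattable is made up for this sketch, and .form is assumed to do nothing more than sprintf.

```perl
use strict;
use warnings;

# Hypothetical wrapper class providing a form() method.
package Formattable;
sub new  { my ($class, $value) = @_; return bless { value => $value }, $class }
sub form { my ($self, $format) = @_; return sprintf "%$format", $self->{value} }

package main;

my $value = Formattable->new(255);

# Inside a string (Perl 5 needs the @{[ ]} wrapper around the method call).
my $inside = "The value in hex is @{[ $value->form('x') ]}.";

# Outside a string the very same method works unchanged.
my $outside = $value->form('08.3f');

print "$inside\n$outside\n";
```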
If that is not deemed easy enough to parse visually, then we can talk
about syntactic relief. If we wanted to stick with standard printf
formats, we could go with a pythonesque operator. We can even go
with % if we add dot to keep it unambiguous:
"The value in hex is $value.%02x."
"The value in oct is $value.%03o."
"The value as string is $value.%-20.20s."
Those dots might look ambiguous there, but they're not, at least in
principle. It's basically using formats as a funny quoting construct
starting with .% and ending with a letter. Nevertheless, if it *looks*
ambiguous, that's also a problem.
On the other hand, if we went with a more rules-based formatting system,
this syntax suggests itself:
"The value in hex is $value.<02x>."
"The value in oct is $value.<03o>."
"The value as string is $value.<-20.20s>."
The nice thing about this approach is that the format is visually
encapsulated inside angles, so it's easy to ignore when you want to,
and easy to find the other end of. It's also more amenable to
interpolation:
"The value as string is $value.<-$x.$y s>."
This also gives us the more general
"The value is $value.<money>."
which can presumably take arguments:
"The value is $value.<money(11, 2, comma => 1)>."
But then maybe you might as well write
"The value is $value.as_money(11, 2, comma => 1)."
or some such.
Still, one could consider inverting the standard formatting sequence
"The value in hex is $value.<x 02>."
"The value in oct is $value.<o 03>."
"The value as string is $value.<s -$x.$y>."
where the defaults are just
"The value in hex is $value.<x>."
"The value in oct is $value.<o>."
"The value as string is $value.<s>."
We could even go as far as to reserve single character rule names
for rules that can work both directions. (Not all can.) But that's
probably a bad idea. On the other hand, if there are predefined rules
like <o>, <d>, and <x>, they'd map naturally to people's formatting
skills.
On the gripping hand, I've always loathed scanf() and its ilk...
But none of that needs to be added for 6.0.0, since methods work
just fine. All we need to figure out for sure is the name of the
standard formatting method. (And its arguments...)
Larry