combinator-based text formatting

Alex Shinn

Jun 25, 2007, 2:01:17 AM
Hi,

The first public release of a new combinator-based formatting
library is available at:

http://synthcode.com/scheme/fmt/

It has everything you could want from a format-string library and
more, including multi-line columnar output and a C pretty-printer
that makes it extremely easy to generate C code. In general it
isn't notably more verbose than format strings, is much easier to
read, and is overall more powerful and extensible.

The current implementation is written in portable R[45]RS Scheme
using SRFIs 1, 6 and 13. It has been tested with Chicken, Gauche,
PLT Scheme and Scheme48, and includes packages for each of these.
The Chicken egg should be available shortly, and the PLT package is
at

http://synthcode.com/scheme/fmt/fmt.plt

This is still a preview release - it's stable and fairly well
tested, but I'm looking for feedback and am willing to change some
aspects of the API.

--
Alex

Eric Hanchrow

Jun 25, 2007, 11:04:50 AM
>>>>> "Alex" == Alex Shinn <alex...@gmail.com> writes:

Alex> ... the PLT package is at

Alex> http://synthcode.com/scheme/fmt/fmt.plt

Do you intend to eventually distribute the PLT package via their
"planet" system (http://planet.plt-scheme.org/)? That would be very
convenient for us PLT users.

--
The reason the young care so much about prestige is that the
people they want to impress are not very discerning.
-- Paul Graham

Alex Shinn

Jun 28, 2007, 11:58:04 AM
On Jun 26, 12:04 am, Eric Hanchrow <off...@blarg.net> wrote:

> >>>>> "Alex" == Alex Shinn <alexsh...@gmail.com> writes:
>
> Alex> ... the PLT package is at
>
> Alex> http://synthcode.com/scheme/fmt/fmt.plt
>
> Do you intend to eventually distribute the PLT package via their
> "planet" system (http://planet.plt-scheme.org/)? That would be very
> convenient for us PLT users.

Sure, I'd be happy to provide both planet and non-planet
versions. I've made a planet package but currently am getting

"Oops! An internal error occured."

whenever I try to upload to planet.plt-scheme.org.

--
Alex

RCN Newsgroups

Jun 28, 2007, 1:23:37 PM
Hi, Alex:

Thanks for a great library. I have a suggestion: I think fmt would be
much easier to learn if you named your simple combinators after
CL/SRFI 48 formatting directives (~a for dsp, ~s for wrt, ~y for
pretty, etc.), e.g.:

(fmt #f "Result: " (~d count) " branches in " (~s tree) ~%)

The leading ~ can be used in the names of more advanced combinators
too, so there will be fewer name conflicts and better separation
between regular calculations and formatting.

-Sergei


"Alex Shinn" <alex...@gmail.com> wrote in message
news:1182751277.9...@d30g2000prg.googlegroups.com...

Alex Shinn

Jun 29, 2007, 6:22:15 AM
On Jun 26, 12:04 am, Eric Hanchrow <off...@blarg.net> wrote:
> >>>>> "Alex" == Alex Shinn <alexsh...@gmail.com> writes:
>
> Alex> ... the PLT package is at
>
> Alex> http://synthcode.com/scheme/fmt/fmt.plt
>
> Do you intend to eventually distribute the PLT package via their
> "planet" system (http://planet.plt-scheme.org/)? That would be very
> convenient for us PLT users.

OK, you can now access the library from Planet with
the following:

(require (planet "fmt.ss" ("ashinn" "fmt.plt" 1 0)))

and optionally

(require (planet "fmt-c.ss" ("ashinn" "fmt.plt" 1 0)))
(require (planet "fmt-color.ss" ("ashinn" "fmt.plt" 1 0)))
(require (planet "fmt-unicode.ss" ("ashinn" "fmt.plt" 1 0)))

I haven't converted the docs to text yet, so you'll
still want to use the online manual from synthcode.com.

--
Alex

Alex Shinn

Jun 29, 2007, 6:34:18 AM
Hi,

On Jun 29, 2:23 am, "RCN Newsgroups" <e...@acm.org> wrote:
>
> Thanks for a great library. I have a suggestion: I think fmt would be
> much easier to learn if you named your simple combinators after
> CL/SRFI 48 formatting directives (~a for dsp, ~s for wrt, ~y for
> pretty, etc.), e.g.:
>
> (fmt #f "Result: " (~d count) " branches in " (~s tree) ~%)

Well, this example could just as well be written:

(fmt #f "Result: " count " branches in " tree nl)

The formatters that translate directly from CL formatting
are actually used quite rarely. I'll add some side-by-side
comparisons of CL's format vs. fmt in the next release.

Also, I think this would actually make it harder to learn
for people not familiar with CL's format. One of the
problems with CL's format is no matter how much I use it
(and no matter how many times I've implemented my own
version) I can never remember all the obscure formatters.
I want to take advantage of being able to have readable
names (without making them too long for common cases
since I have to compete with ultra-concise format strings).

On the other hand, I'm not especially happy with the names,
and am willing to consider alternatives. ~ could work as
a general prefix, yielding ~write, ~display, ~pad, etc.
The first version actually used a >> suffix (e.g. write>>),
but the code started to look very ugly when you had lots
of these chained together.

--
Alex

Sergei Egorov

Jun 29, 2007, 3:13:11 PM
"Alex Shinn" <alex...@gmail.com> wrote in message
news:1183113258.8...@e9g2000prf.googlegroups.com...
>
>> ...

>> (fmt #f "Result: " (~d count) " branches in " (~s tree) ~%)
>
> Well, this example could just as well be written:
>
> (fmt #f "Result: " count " branches in " tree nl)
>

Printing things in a reasonable way by default is a desirable
thing, but it is not what this library is about. I can tweak
my example a little to make it more clear:

(fmt #f "Result: #x" (~x count) " branches in " (~d tree) ~%)


> The formatters that translate directly from CL formatting
> are actually used quite rarely. I'll add some side-by-side
> comparisons of CL's format vs. fmt in the next release.
>
> Also, I think this would actually make it harder to learn
> for people not familiar with CL's format. One of the
> problems with CL's format is no matter how much I use it
> (and no matter how many times I've implemented my own
> version) I can never remember all the obscure formatters.
> I want to take advantage of being able to have readable
> names (without making them too long for common cases
> since I have to compete with ultra-concise format strings).

I did not mean mimicking obscure CL formatters, only the
basics (~a ~s ~% ~c ~x ~o ~d ~b) and perhaps ~w and ~y,
since you support circular structures and pretty-printing.
These, I think, are familiar enough to Scheme users with
no CL background.


> On the other hand, I'm not especially happy with the names,
> and am willing to consider alternatives. ~ could work as
> a general prefix, yielding ~write, ~display, ~pad, etc.
> The first version actually used a >> suffix (e.g. write>>),
> but the code started to look very ugly when you had lots
> of these chained together.

The good thing about having a naming convention of this
sort is that it is much easier to distinguish the actual data
from the formatters:

(fmt #f "Result: " foo " in " bar (baz quux) nl)

vs.

(fmt #f "Result: " foo " in " bar (~baz quux) ~nl)

Another reason is that it allows for more readable names,
having less chances of conflict with other names in the
scope of the formatting expression - nl looks nice, but I
used the name in some list processing functions ('l' is
the original list, 'nl' is the new list).

Adopting full names like ~write, ~display, ~pad solves
the last two problems. I think it would be even better if,
for those used to the "format" way of doing things,
your library provided a few common synonyms:

~a = ~display
~s = ~write/unshared
~% = ~nl
~c = ~write-char ; or identity if this is default behavior for chars
~w = ~write ;handles shared structures
~y = ~pretty
~b = (lambda (n) (~num n 2))
~o = (lambda (n) (~num n 8))
~d = (lambda (n) (~num n 10)) ; or identity if this is default behavior for numbers
~x = (lambda (n) (~num n 16))

It makes the notation shorter too...
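
A minimal sketch of how a few of these synonyms could be defined as ordinary procedures. It assumes nothing about fmt itself: `~num` here is a hypothetical stand-in built on the standard `number->string`, and the real library's numeric formatter may behave differently. Names starting with ~ are ordinary Scheme identifiers, so no special syntax is needed.

```scheme
;; Hypothetical stand-in for a numeric formatter; fmt's own procedure may differ.
(define (~num n radix) (number->string n radix))

;; One-letter aliases in the spirit of the table above (illustrative only).
(define (~b n) (~num n 2))    ; binary
(define (~o n) (~num n 8))    ; octal
(define (~d n) (~num n 10))   ; decimal
(define (~x n) (~num n 16))   ; hexadecimal

;; ~% as a plain string holding a newline.
(define ~% (string #\newline))

;; Example: builds "Result: #x10 branches" followed by a newline.
(define example
  (string-append "Result: #x" (~x 16) " branches" ~%))
```

With these definitions, (~b 5) evaluates to "101" and (~o 8) to "10".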

Regards,
Sergei


Alex Shinn

Jul 2, 2007, 9:51:22 AM
On Jun 30, 4:13 am, "Sergei Egorov" <e...@acm.org> wrote:
>
> Printing things in a reasonable way by default is a desirable
> thing, but it is not what this library is about. I can tweak
> my example a little to make it more clear:
>
> (fmt #f "Result: #x" (~x count) " branches in " (~d tree) ~%)

The current way to write this is actually one character shorter:

(fmt #f "Result: " (radix 16 count) " branches in " tree nl)

Note that the #x is included by default as a result of DSP or
WRT, it's only the manual NUM that omits it.

I've also been thinking of adding the HEX, OCT and BIN shortcuts:

(define (hex . args) (radix 16 (apply-cat args)))

which further shortens the example to

(fmt #f "Result: " (hex count) " branches in " tree nl)

which really looks a lot better to me. With these abbreviations,
the CL specifiers you propose would save zero characters in the
case of NL, and only one character per formatter in all other
cases, most of which can be assumed by default (excepting the
less common PRETTY formatter).

The CL format style specifiers also discourage useful idioms such
as setting the default radix for several values. Scanning the
Linux source (which uses a *lot* of format strings), you find that
many messages output multiple pointers or other addresses, all
of which you want in hex. To take one example from
fs/jffs2/gc.c:

printk(KERN_DEBUG "Expanded dnode to write from (0x%x-0x%x) to (0x%x-0x%x)\n",
       orig_start, orig_end, start, end)

That's four numbers in hex, all with a 0x prefix.

(fmt #t (hex "Expanded dnode to write from "
             "(" orig-start "-" orig-end ") to "
             "(" start "-" end ")" nl))

produces the exact same output and looks a lot more readable
to me (at least with syntax-highlighting where the strings
are a different color).
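
To make the idea of a shared default radix concrete, here is a toy model of the technique. It is only a sketch and not fmt's implementation: a formatter is modeled as a procedure from the current radix to a string, and all the names (`toy-fmt`, `toy-cat`, `toy-radix`, `toy-hex`, `->formatter`) are hypothetical. It emits no radix prefix, and it uses `error` from SRFI 23 / R7RS.

```scheme
;; Toy model: a formatter is a procedure taking the current radix
;; and returning a string. Strings and numbers are lifted on demand.
(define (->formatter x)
  (cond ((procedure? x) x)
        ((string? x) (lambda (radix) x))
        ((number? x) (lambda (radix) (number->string x radix)))
        (else (error "unsupported value" x))))

;; Concatenate items, passing the same radix to each of them.
(define (toy-cat . items)
  (lambda (radix)
    (apply string-append
           (map (lambda (item) ((->formatter item) radix)) items))))

;; Run the nested items with a different radix, ignoring the outer one.
(define (toy-radix new-radix . items)
  (let ((body (apply toy-cat items)))
    (lambda (radix) (body new-radix))))

(define (toy-hex . items) (apply toy-radix 16 items))

;; Entry point: the default radix is 10.
(define (toy-fmt . items) ((apply toy-cat items) 10))
```

In this model, (toy-fmt (toy-hex "(" 16 "-" 32 ") to (" 64 "-" 128 ")")) yields "(10-20) to (40-80)", while a number outside the toy-hex wrapper is still rendered in decimal.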

Also, although a consistent prefix (or suffix) looks nice at
first, it's difficult to keep consistent. These are
*combinators*, which encourages you to bind them to temporary
values and build higher-order combinators. You then have to
prefix all your personal formatter utilities with ~, and prefix
all temporary variables with ~, or you destroy the consistency
and the ~ works against readability.

I do appreciate your comments though, and would really like to
hear more opinions - from you and others. If many people prefer
the CL-format-style names I'd consider the switch, but the only
other feedback I've gotten is a request to make the names longer,
and prefix everything in the library with FORMAT:.

> Another reason is that it allows for more readable names,
> having less chances of conflict with other names in the
> scope of the formatting expression - nl looks nice, but I
> used the name in some list processing functions ('l' is
> the original list, 'nl' is the new list).

Is this a local variable name you use, or an exported function?
I do want to avoid name clashes. The other name that worries me
is JOIN which seems way too generic.

--
Alex

Sergei Egorov

Jul 2, 2007, 1:12:15 PM
I agree on all points regarding the functionality of CL-style specifiers.
I like the approach taken by the fmt library; the only things bothering
me are the steep learning curve coming from format and the namespace pollution.

> Also, although a consistent prefix (or suffix) looks nice at
> first, it's difficult to keep consistent. These are
> *combinators*, which encourages you to bind them to temporary
> values and build higher-order combinators. You then have to
> prefix all your personal formatter utilities with ~, and prefix
> all temporary variables with ~, or you destroy the consistency
> and the ~ works against readability.

There is no need to be *that* consistent. If formatting is a significant
part of my code, I will name my combinators accordingly; in this
situation the naming conventions in the library are not important,
the library is in the focus of what I am doing and the context is clear.

It's the other situation I worry about - if I decide to use the library
in, say, a compiler with tons of local variables and main focus on
data transformation, not on formatting, I'll have to remember yet
another big set of names not to use. Scheme's hygiene does not
solve this problem - if I happened to name something in the
scope 'join' or 'nl', I'll have to rename mine or jump through macro
hoops to use yours. In C++, 'nl' is much less of a problem because of
the type-based overloading.
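
A small sketch of the clash, under stated assumptions: the top-level `nl` below is a hypothetical stand-in for a formatter exported by a library (not fmt's actual `nl`), and `fmt:nl` is just an illustrative alias.

```scheme
;; Stand-in for a formatter exported by a library (hypothetical).
(define nl (string #\newline))

;; Here 'nl' means "new list" and shadows the top-level binding.
(define (reverse-list l)
  (let loop ((l l) (nl '()))
    (if (null? l)
        nl                          ; the local list, not the newline above
        (loop (cdr l) (cons (car l) nl)))))

;; Workaround: keep the library binding reachable under another name.
(define fmt:nl nl)

(define (describe-reversed l)
  (let ((nl (reverse-list l)))
    ;; nl is a list in this scope; the newline is only reachable as fmt:nl
    (string-append (number->string (length nl)) " items" fmt:nl)))
```

Inside both procedures the library's `nl` is unreachable by its own name, which is the renaming burden described above.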

In any case, I agree that using ~ to prefix every exported name in the
library is misleading - different conventions might need to be adopted
for higher-order combinators which don't do any formatting per se.

> I do appreciate your comments though, and would really like to
> hear more opinions - from you and others. If many people prefer
> the CL-format-style names I'd consider the switch, but the only
> other feedback I've gotten is a request to make the names longer,
> and prefix everything in the library with FORMAT:.
>

... which is another way to address the namespace pollution problem.
It may make sense to do both: have plain formatters named like ~this,
name higher-order combinators like fmt:this, and add a few 1-letter
aliases (~s etc.) to simplify switching from string-based format.
Being able to write something like this:

(fmt #t "Expanded dnode to write from ("
        (~x orig-start) "-" (~x orig-end) ") to ("
        (~x start) "-" (~x end) ")\n")

is, in my opinion, a natural first step from format; advanced users can write

(fmt #t (~hex "Expanded dnode to write from " ; tilde is mine, could not resist :)
              "(" orig-start "-" orig-end ") to "
              "(" start "-" end ")" nl))

or make something of their own from the provided building blocks.

>> Another reason is that it allows for more readable names,
>> having less chances of conflict with other names in the
>> scope of the formatting expression - nl looks nice, but I
>> used the name in some list processing functions ('l' is
>> the original list, 'nl' is the new list).
>
> Is this a local variable name you use, or an exported function?
> I do want to avoid name clashes. The other name that worries me
> is JOIN which seems way too generic.

It's a local variable name, standing for "new list". I don't usually
name my local variables after verbs, but I would hate to see a clear
name like 'join' renamed or mangled; fmt:join seems to be a better
way out.

-Sergei

Emilio Lopes

Jul 3, 2007, 1:54:22 PM
Alex Shinn writes:

> I do appreciate your comments though, and would really like to
> hear more opinions - from you and others. If many people prefer
> the CL-format-style names I'd consider the switch,

I really appreciate your efforts here. Trying to do something
differently, instead of just reinventing the old "printf" or CL's
"format", is not easy, and people need time to gather experience and
get used to it.

Incidentally, Drew V. McDermott in his Ytools[1] includes alternative
Common Lisp macros for formatted I/O and iteration (which was the subject
of another posting of yours that I enjoyed, BTW).

He also noted that "format stinks".[2]

> but the only other feedback I've gotten is a request to make the
> names longer, and prefix everything in the library with FORMAT:.

Longer names are a matter of taste, as long as you don't write as if you
were composing an SMS :-). But the prefixing idea is just evil. This is
the job of the implementation's module system.

Footnotes:
[1] http://www.cs.yale.edu/homes/dvm/index.html#YTools
[2] http://www.cs.yale.edu/homes/dvm/format-stinks.html


--
Emílio C. Lopes Ich leb und weiß nit wie lang,
Munich, Germany ich stirb und weiß nit wann,
ich fahr und weiß nit wohin,
(Martinus von Biberach) mich wundert, dass ich fröhlich bin!

Sergei Egorov

Jul 3, 2007, 3:40:12 PM

"Emilio Lopes" <ec...@gmx.net> wrote in message
news:mnr6np9...@freenet.de...

> [...] But the prefixing idea is just evil. This is
> the job of the implementation's module system.

I suppose this means that having, say, R6RS module system, we can rename
'string-length' and 'vector-length' to just 'length'? I bet you never tried
this idea in practice...

-Sergei

Emilio Lopes

Jul 3, 2007, 3:48:02 PM
Sergei Egorov writes:

> "Emilio Lopes" <ec...@gmx.net> wrote in message
> news:mnr6np9...@freenet.de...

>> [...] But the prefixing idea is just evil. This is
>> the job of the implementation's module system.

> I suppose this means that having, say, R6RS module system, we can rename
> 'string-length' and 'vector-length' to just 'length'?

Sorry, it seems I was not clear enough. I meant that if I want all
identifiers exported by the module "fmt" to be prefixed with "format:",
I can use the module system of my Scheme implementation of choice to
achieve that.

In Scheme48 for example I would say:

(open (modify fmt (prefix format:)))

The same functionality is provided by PLT Scheme, for example, and
surely by others.
