
snprintf alternatives in iso9899


Morten W. Petersen

Dec 8, 2015, 12:17:52 AM
Hi there.

I'm wondering what the alternatives for snprintf are in ISO 9899, as
it looks like C99 is the earliest standard that includes it.

What I'm trying to do, is print an ASCII string to the console, and
I need to generate the entire string first, before I convert it to
for example UTF-8 wide char. In other words, replace the printf
things that are already in a program with something that can output
Unicode.

Any suggestion here?

Thanks,

Morten

Malcolm McLean

Dec 8, 2015, 5:14:24 AM
snprintf() should work with UTF-8, except that any %c calls will only
accept ascii, and you get the number of bytes not the number of characters
(which is probably what you want).

If you use a wide encoding, I can't find a standard equivalent of wnsprintf(),
which is the Microsoft function. I've found, however, that the wide sprintfs
don't support floating point, presumably on the basis that Europeans use a
comma as a decimal point.

Ben Bacarisse

Dec 8, 2015, 5:39:16 AM
Malcolm McLean <malcolm...@btinternet.com> writes:

> On Tuesday, December 8, 2015 at 5:17:52 AM UTC, Morten W. Petersen wrote:
>> Hi there.
>>
>> I'm wondering what the alternatives for snprintf are in ISO 9899, as
>> it looks like C99 is the earliest standard that includes it.
>>
>> What I'm trying to do, is print an ASCII string to the console, and
>> I need to generate the entire string first, before I convert it to
>> for example UTF-8 wide char. In other words, replace the printf
>> things that are already in a program with something that can output
>> Unicode.
>>
> snprintf() should work with UTF-8, except that any %c calls will only
> accept ascii,

Not with the l modifier.

> and you get the number of bytes not the number of characters
> (which is probably what you want).
>
> If you use a wide encoding, I can't find a standard equivalent of wnsprintf(),
> which is the Microsoft function

swprintf?

> - I've found that the wide sprints don't
> support floating point, presumably on the basis that Europeans use a
> comma as a decimal point, however.

swprintf does floating point correctly on my system.
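
For reference, a minimal C99 sketch of what I mean. swprintf is the standard
wide counterpart of snprintf: it takes the buffer size in wide characters and
returns a negative value when the result does not fit. The helper name
format_wide_double is made up for this example, and the '.' decimal point
assumes the default "C" locale.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Format a double with two decimals into a wide-character buffer of n
   elements. Returns the number of wide characters written (excluding
   the terminating null), or a negative value if the result does not
   fit. */
int format_wide_double(wchar_t *buf, size_t n, double value)
{
    assert(buf != NULL && n > 0);
    return swprintf(buf, n, L"%.2f", value);
}
```

Calling format_wide_double(buf, 16, 3.14159) with a 16-element wchar_t buffer
should leave L"3.14" in buf. Note that, unlike snprintf, swprintf does not
report how long the output would have been on truncation.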

--
Ben.

Ben Bacarisse

Dec 8, 2015, 9:52:31 AM
"Morten W. Petersen" <mor...@gmail.com> writes:

> I'm wondering what the alternatives for snprintf are in ISO 9899, as
> it looks like C99 is the earliest standard that includes it.

snprintf was implemented "in the wild" before standardisation, so some
pre-C99 systems had it. You could just assume C90+snprintf.

But there are alternatives. Some systems have an allocating printf (for
example asprintf) and you can always fprintf to a temp file and then
read back only 'n' bytes.
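
A rough sketch of the temp-file route, using only C90 library functions.
The name tmp_snprintf is invented here, and the approach is slow (one
temporary file per call), so treat it as an illustration rather than a
recommendation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* C90-only stand-in for snprintf (hypothetical name): format into a
   temporary file, then read back at most size-1 bytes into buf.
   Returns the full formatted length, or a negative value on error.
   buf must point to at least size bytes when size is nonzero. */
int tmp_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    FILE *tmp;
    int len;
    size_t got;

    assert(fmt != NULL);
    tmp = tmpfile();
    if (tmp == NULL)
        return -1;

    va_start(ap, fmt);
    len = vfprintf(tmp, fmt, ap);
    va_end(ap);

    if (len >= 0 && size > 0) {
        rewind(tmp);
        got = fread(buf, 1, size - 1, tmp);   /* truncates long output */
        buf[got] = '\0';
    }
    fclose(tmp);                              /* also removes the file */
    return len;
}
```

As with snprintf, a return value greater than or equal to size tells the
caller that the output was truncated.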

> What I'm trying to do, is print an ASCII string to the console, and
> I need to generate the entire string first, before I convert it to
> for example UTF-8 wide char.

I don't follow this at all. When you say "to the console" do you mean
"to the standard output stream"? If so, printing ASCII is usually
trivial (just use printf). If you mean something else by the console,
you'll need to ask in a system-specific group (for example, on a
Unix-like system you will probably be advised to use syslog rather than
the actual console).

And what is UTF-8 wide char? Those two terms are at odds with each
other in that UTF-8 is usually an alternative to "wide char" output.

> In other words, replace the printf
> things that are already in a program with something that can output
> Unicode.

You can't output Unicode -- you need to output some encoding of
Unicode. I know this will sound like a nit-pick answer but I'm trying
to find the actual question that is obscured by your use of words. It
can't be what you seem to be saying, because if you are printing ASCII,
and you are using UTF-8 as the Unicode encoding, then there is no
conversion to do.

--
Ben.

supe...@casperkitty.com

Dec 8, 2015, 10:15:18 AM
On Tuesday, December 8, 2015 at 8:52:31 AM UTC-6, Ben Bacarisse wrote:
> snprintf was implemented "in the wild" before standardisation, so some
> pre-C99 systems had it. You could just assume C90+snprintf.

It's too bad there's no standard for a general-purpose printf which takes
a pointer to a structure with one or two function pointers and a void*, and
uses one of the passed-in functions for output (passing the void* to the
function, which can then use it as it sees fit). While there may be some
question as to whether it's better to use one function pointer for a function
that takes a char, or have separate versions that take individual characters
or a pointer+length combo, a general-purpose function could absorb all the
functions of sprintf, vprintf, fprintf, etc. plus other system-specific
variations; given such an arrangement, things like snprintf could easily be
built on top of them. Efficiency would be better if ANSI C had allowed a
pointer to one structure type to point to any other with the same initial
sequence (saving a level of indirection, since the wrapper function could
simply pass one pointer to a structure containing the functions and the
information they needed, rather than having to have a single structure type
with the pointers and a void* which then had to be indirected to get the
necessary info) but such a system would work decently in any event.
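
To make the idea concrete, here is a minimal sketch of the single-struct
variant (one function pointer plus a void*). Every name in it (out_sink,
sink_printf, mem_buf, mem_put) is hypothetical, and the formatter handles
only %s, %d, %c and %%, nowhere near the full printf grammar.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical sink: one output function plus an opaque context. */
struct out_sink {
    void (*put)(int ch, void *ctx);
    void *ctx;
};

static void sink_puts(const struct out_sink *s, const char *str)
{
    while (*str != '\0')
        s->put((unsigned char)*str++, s->ctx);
}

static void sink_putint(const struct out_sink *s, int value)
{
    char digits[sizeof(int) * 3 + 1];   /* ample room for the digits */
    size_t n = 0;
    unsigned int u = (unsigned int)value;

    if (value < 0) {
        s->put('-', s->ctx);
        u = 0u - u;                     /* magnitude, safe for INT_MIN */
    }
    do {
        digits[n++] = (char)('0' + u % 10u);
        u /= 10u;
    } while (u != 0u);
    while (n > 0)
        s->put(digits[--n], s->ctx);
}

/* Tiny printf over a sink; understands only %s, %d, %c and %%. */
void sink_printf(const struct out_sink *s, const char *fmt, ...)
{
    va_list ap;

    assert(s != NULL && s->put != NULL && fmt != NULL);
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            s->put((unsigned char)*fmt, s->ctx);
            continue;
        }
        fmt++;
        switch (*fmt) {
        case 's': sink_puts(s, va_arg(ap, const char *)); break;
        case 'd': sink_putint(s, va_arg(ap, int)); break;
        case 'c': s->put(va_arg(ap, int), s->ctx); break;
        case '%': s->put('%', s->ctx); break;
        case '\0': fmt--; break;        /* lone '%' at the end: stop */
        default:                        /* unknown: pass through as-is */
            s->put('%', s->ctx);
            s->put((unsigned char)*fmt, s->ctx);
            break;
        }
    }
    va_end(ap);
}

/* Example sink: a bounded buffer, which gives snprintf-like truncation. */
struct mem_buf {
    char *data;
    size_t size;    /* capacity, including the terminating null */
    size_t used;
};

void mem_put(int ch, void *ctx)
{
    struct mem_buf *b = ctx;

    if (b->used + 1 < b->size) {
        b->data[b->used++] = (char)ch;
        b->data[b->used] = '\0';
    }
}
```

With a struct out_sink whose put member is mem_put, sink_printf behaves
roughly like a cut-down snprintf; pointing put at a wrapper around putc or a
socket write would cover the fprintf-style cases with the same formatter.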

Keith Thompson

Dec 8, 2015, 11:41:52 AM
"Morten W. Petersen" <mor...@gmail.com> writes:
> I'm wondering what the alternatives for snprintf are in ISO 9899, as
> it looks like C99 is the earliest standard that includes it.

ISO 9899 is the C standard. The current edition was published in 2011.
That edition supersedes the 1990 and 1999 editions.

I think you meant to ask about C90.

> What I'm trying to do, is print an ASCII string to the console, and
> I need to generate the entire string first, before I convert it to
> for example UTF-8 wide char. In other words, replace the printf
> things that are already in a program with something that can output
> Unicode.

C90 didn't have snprintf; that's why it was added to the language
in 1999. If you need to use snprintf, you can probably just go
ahead and use it. Most C implementations provide it, even if they
don't fully support C99.

If you can't use snprintf, you might be able to compute a maximum
size for the target array. sprintf doesn't check for a buffer
overrun but you can get away without that check if you're *very*
careful.
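
As a sketch of what "very careful" might look like in C90: bound every
conversion, then size the array from those bounds. The macro and function
names below are invented for the example, and the format string is an
arbitrary one chosen for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Upper bound on the characters "%d" can produce for an int: a value of
   n bits needs at most n/3 + 1 decimal digits, plus one for the sign. */
#define INT_DEC_MAX (sizeof(int) * CHAR_BIT / 3 + 2)

/* Longest label that format_line will copy; must match the "%.16s"
   precision in the format string below. */
#define LABEL_MAX 16

/* Label, ": ", the digits, and the trailing newline. */
#define LINE_BOUND (LABEL_MAX + 2 + INT_DEC_MAX + 1)

/* Writes "<label>: <value>\n" into out, which must have room for
   LINE_BOUND + 1 chars. Returns the number of characters written. */
int format_line(char out[LINE_BOUND + 1], const char *label, int value)
{
    assert(out != NULL && label != NULL);
    /* The precision on %s caps the label, so the bound holds even if
       the caller passes a longer string. */
    return sprintf(out, "%.16s: %d\n", label, value);
}
```

The weak point is that the bound and the format string have to be kept in
step by hand; change one without the other and the overrun is back.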

But I think you're a bit confused about UTF-8. There's no such
thing as a "UTF-8 wide char". UTF-8 is a representation of
Unicode that uses one to four octets for each Unicode code point.
ASCII is a 7-bit character set, usually represented using one
octet per character. ASCII text *is* UTF-8 text (that was a major
requirement for the design of UTF-8). If you have an ASCII string
that you want to print to the console, just print it.

You'll need to be more specific about what you're trying to do.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ben Bacarisse

Dec 8, 2015, 11:47:58 AM
r...@zedat.fu-berlin.de (Stefan Ram) writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>>But there are alternatives. Some systems have an allocating printf (for
>>example asprintf) and you can always fprintf to a temp file and then
>>read back only 'n' bytes.
>
> I have published the source code for an allocating printf here:
>
> www.purl.org/stefan_ram/pub/c_faq_de
>
> (in case of a 403 HTTP error, one can try a Google referrer).
>
> The function is called »salfmt«. An example client:

Did you ever show it to a compiler? There's no way that code got
through one!

(Note that since it's just a wrapper around vsnprintf, it won't help
Morten.)

<snip>
--
Ben.

Keith Thompson

Dec 8, 2015, 12:19:00 PM
r...@zedat.fu-berlin.de (Stefan Ram) writes:
> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>>Did you ever show it to a compiler? There's no way that code got
>
> The last version of salfmt was not tested, indeed.
> (I think, after I inserted »va_copy«, I did not
> test again.)
>
> But previous versions were tested, and I hope to
> find the time to one day maintain that page again.
>
> If you refer to a specific location in the code,
> feel free to point out where you see the problem.

www.purl.org/stefan_ram/pub/c_faq_de

I haven't tried compiling it myself, but a couple of things jump out.

s_type salfmt( s_type const f, ... )
{ va_list a;
char * b = 0;
va_copy(a,f); { int const s = vsnprintf( 0, 0, f, a ); va_end(a); }
if( s >= 0 )
{ size_t const k = 1 + s; if( b = malloc( k ))
{ va_start(a,f); { vsprintf( b, f, a ); va_end(a); }}}
return b; }

The code layout makes it difficult to read.

s_type is a typedef for char*. (IMHO the code would be much easier to
read if it used char* directly.)

You call va_copy with an argument of type char*. That might happen to
compile depending on how va_list is defined, but it's clearly wrong.

You define `s` inside a block and then refer to it outside the block.

(Personally I wouldn't publish code on a web page without at least
compiling it first.)

Ben Bacarisse

Dec 8, 2015, 12:27:44 PM
r...@zedat.fu-berlin.de (Stefan Ram) writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>>Did you ever show it to a compiler? There's no way that code got
>
> The last version of salfmt was not tested, indeed.
> (I think, after I inserted »va_copy«, I did not
> test again.)
>
> But previous versions were tested, and I hope to
> find the time to one day maintain that page again.
>
> If you refer to a specific location in the code,
> feel free to point out where you see the problem.

I couldn't see the problem. I'm usually good at that, but I find your
layout almost unreadable. gcc spotted the two main problems but I can
now fill in some details.

The va_copy is wrong in various ways: the second argument must be a
va_list, you need to call va_start in order to have something to copy,
and you need a separate va_list to copy into. I'm not sure, though, that
you need va_copy at all.

There is also an unused int (s) and an undeclared name (s) which are
obviously intended to be the same thing.
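
For what it's worth, a version along those lines might look like the
following. It is only a sketch: it assumes a C99 vsnprintf, the name
alloc_fmt is made up here, and it is not the code from the linked page.
Because the function is itself variadic, it can simply call va_start a
second time instead of copying the argument list.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocating printf (hypothetical name). Returns a malloc'd string that
   the caller must free, or NULL on a formatting or allocation failure. */
char *alloc_fmt(const char *fmt, ...)
{
    va_list ap;
    char *buf = NULL;
    int len;

    assert(fmt != NULL);

    /* First pass: measure the output without storing it. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    if (len >= 0) {
        buf = malloc((size_t)len + 1);
        if (buf != NULL) {
            /* Second pass: restart the argument list and format. */
            va_start(ap, fmt);
            vsnprintf(buf, (size_t)len + 1, fmt, ap);
            va_end(ap);
        }
    }
    return buf;
}
```

va_copy would only become necessary in a variant that receives a va_list
from its caller and needs to traverse it twice.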

--
Ben.

Morten W. Petersen

Dec 8, 2015, 8:55:29 PM
Ah yes. Well I want to output the parsed XML on stdout, to aid
in the development and debugging process.

I have been a bit confused by this whole console/stdout/encoding/
character type thing yes, and my thought initially was that it was
better to use the same function to output all kinds of strings, and
that converting something that is only ASCII characters to a wide char
and then printing it was better than mixing the (C) types of characters
that are sent to stdout.

Have I understood it correctly when I say that to output what is UTF-32
internally, I need to break down each 32-bit character to a sequence
of 1-4 regular char and send those to stdout using for example putc,
if the encoding used in stdout is UTF-8?

I guess this printing function will be used to output XML to
files as well, so might just as well do it properly.

-Morten

Ian Collins

Dec 8, 2015, 9:04:25 PM
Morten W. Petersen wrote:
>
> Ah yes. Well I want to output the parsed XML on stdout, to aid
> in the development and debugging process.
>
> I have been a bit confused by this whole console/stdout/encoding/
> character type thing yes, and my thought initially was that it was
> better to use the same function to output all kinds of strings, and
> that converting something that is only ASCII characters to a wide char
> and then printing it was better than mixing the (C) types of characters
> that are sent to stdout.
>
> Have I understood it correctly when I say that to output what is UTF-32
> internally, I need to break down each 32-bit character to a sequence
> of 1-4 regular char and send those to stdout using for example putc,
> if the encoding used in stdout is UTF-8?

I guess that's what c32rtomb() is there for...

--
Ian Collins

Ben Bacarisse

Dec 8, 2015, 9:16:19 PM
He's decided to limit himself to C90 so there is no c32rtomb. But even
if only C99 were assumed he could just use printf with %lc and %ls
formats (provided wchar_t is also suitable).

To answer Morten: yes, in C90 you need to write your own code to output
the UTF-8 encoding of a character or a string.
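
Something like the following would do, written to stay within C90. The
function names are made up for the example; the bit layout is the standard
UTF-8 one (1 to 4 bytes, surrogates and values above U+10FFFF rejected).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a valid
   scalar value (a surrogate, or beyond U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0UL | (cp >> 6));
        out[1] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 2;
    }
    if (cp < 0x10000UL) {
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0UL | (cp >> 12));
        out[1] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (unsigned char)(0xF0UL | (cp >> 18));
        out[1] = (unsigned char)(0x80UL | ((cp >> 12) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[3] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 4;
    }
    return 0;
}

/* Write one code point to a stream as UTF-8.
   Returns 0 on success, EOF on invalid input or write failure. */
int put_utf8(unsigned long cp, FILE *fp)
{
    unsigned char bytes[4];
    size_t i, n = utf8_encode(cp, bytes);

    if (n == 0)
        return EOF;
    for (i = 0; i < n; i++)
        if (putc(bytes[i], fp) == EOF)
            return EOF;
    return 0;
}
```

Printing a UTF-32 string is then a loop calling put_utf8 on each element,
with stdout or any other FILE * as the target.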

--
Ben.

Malcolm McLean

Dec 9, 2015, 7:49:58 AM
Yes.

You probably want a unicode output stream.
It's built on fput_unich(), that is a function which takes a single unicode code point
(probably as a long), and outputs it.
It's then not hard to have modes, so it can output UTF-8, or big or little endian
UTF-32, or UTF-16. At the same time you can set it up to output to memory
streams or a stdio FILE *, or a socket.
You can of course also add convenience functions to it, e.g. to open or close
a tag, or check that a close tag is valid, or a unicode printf().
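
A rough sketch of the core, under some assumptions: the struct unistream
type, the byte-sink callback and the membuf sink are all invented for the
example, and only the UTF-16 and UTF-32 modes are shown (a UTF-8 mode would
slot into fput_unich in the same way).

```c
#include <assert.h>
#include <stddef.h>

enum uni_mode { UNI_UTF16BE, UNI_UTF16LE, UNI_UTF32BE, UNI_UTF32LE };

/* Hypothetical stream: an encoding mode plus a byte sink, so the same
   code can target memory, a FILE *, or a socket. */
struct unistream {
    enum uni_mode mode;
    int (*putbyte)(int byte, void *ctx);    /* returns 0 on success */
    void *ctx;
};

/* Emit the low nbytes bytes of unit in the requested byte order. */
static int put_unit(struct unistream *us, unsigned long unit,
                    int nbytes, int big_endian)
{
    int i;

    for (i = 0; i < nbytes; i++) {
        int shift = 8 * (big_endian ? nbytes - 1 - i : i);
        if (us->putbyte((int)((unit >> shift) & 0xFFUL), us->ctx) != 0)
            return -1;
    }
    return 0;
}

/* Write one code point. Returns 0 on success, -1 on an invalid code
   point or a sink failure. */
int fput_unich(long ch, struct unistream *us)
{
    unsigned long cp = (unsigned long)ch;
    int be;

    assert(us != NULL && us->putbyte != NULL);
    if (ch < 0 || cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
        return -1;

    be = (us->mode == UNI_UTF16BE || us->mode == UNI_UTF32BE);
    if (us->mode == UNI_UTF32BE || us->mode == UNI_UTF32LE)
        return put_unit(us, cp, 4, be);

    if (cp < 0x10000UL)
        return put_unit(us, cp, 2, be);

    /* Outside the BMP: UTF-16 needs a surrogate pair. */
    cp -= 0x10000UL;
    if (put_unit(us, 0xD800UL | (cp >> 10), 2, be) != 0)
        return -1;
    return put_unit(us, 0xDC00UL | (cp & 0x3FFUL), 2, be);
}

/* Example sink: append bytes to a small fixed-size memory buffer. */
struct membuf {
    unsigned char data[16];
    size_t len;
};

int membuf_put(int byte, void *ctx)
{
    struct membuf *m = ctx;

    if (m->len >= sizeof m->data)
        return -1;
    m->data[m->len++] = (unsigned char)byte;
    return 0;
}
```

A FILE * sink is a two-line wrapper around putc, and the XML convenience
functions can all be written in terms of fput_unich without knowing which
encoding or destination is in use.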

Tim Rentsch

Dec 9, 2015, 4:14:20 PM
supe...@casperkitty.com writes:

> On Tuesday, December 8, 2015 at 8:52:31 AM UTC-6, Ben Bacarisse wrote:
>> snprintf was implemented "in the wild" before standardisation, so some
>> pre-C99 systems had it. You could just assume C90+snprintf.
>
> It's too bad there's no standard for a general-purpose printf
> which takes a pointer to a structure with one or two function
> pointers and a void*, and uses one of the passed-in functions for
> output (passing the void* to the function, which can then use it
> as it sees fit). [...]

If the lack bothers you, why don't you write one? I did.

Eric Sosman

Dec 9, 2015, 4:28:44 PM
It seems to me that a more flexible and powerful approach would
be to implement such things at the FILE* level. It's easy to imagine
things along the lines of

FILE *stream = fopen("http://www.c-faq.com/", "r");

... and it wouldn't be a huge departure to offer something like

FILE *funcopen(int (*reader)(void*), int (*writer)(void*),
void *userarg);
FILE *stream = funcopen(NULL, mywriter, buffptr);

Something in this vein would be, I think, superior to running around
endlessly adding capabilities to printf(), scanf(), fseek(), ...

--
eso...@comcast-dot-net.invalid
"Don't be afraid of work. Make work afraid of you." -- TLM

Ian Collins

Dec 9, 2015, 4:35:18 PM
This is similar in spirit to the C++ approach of associating a stream
buffer object with a stream. Both provide a convenient decoupling
between the internal and external representation of data as well as the
external source.

--
Ian Collins

Eric Sosman

Dec 9, 2015, 4:49:17 PM
If it's "similar to C++", I withdraw the suggestion. C++, ick! ;)

*I* thought I was following Unix style, where all manner of non-I/O
things have file-ish interfaces: /dev/null, /dev/random, /proc, ...
These interfaces make it possible to employ all existing file-using
tools on the non-file thingummies, at a stroke and without (much)
additional effort. I've always thought the FILE* abstraction could be
used in much the same way.

Ian Collins

Dec 9, 2015, 5:00:53 PM
Eric Sosman wrote:
> On 12/9/2015 4:35 PM, Ian Collins wrote:
>> Eric Sosman wrote:
>>>
>>> It seems to me that a more flexible and powerful approach would
>>> be to implement such things at the FILE* level. It's easy to imagine
>>> things along the lines of
>>>
>>> FILE *stream = fopen("http://www.c-faq.com/", "r");
>>>
>>> ... and it wouldn't be a huge departure to offer something like
>>>
>>> FILE *funcopen(int (*)reader(void*), int (*)writer(void*),
>>> void *userarg);
>>> FILE *stream = funcopen(NULL, mywriter, buffptr);
>>>
>>> Something in this vein would be, I think, superior to running around
>>> endlessly adding capabilities to printf(), scanf(), fseek(), ...
>>
>> This is similar in spirit to the C++ approach of associating a stream
>> buffer object with a stream. Both provide a convenient decoupling
>> between the internal and external representation of data as well as the
>> external source.
>
> If it's "similar to C++", I withdraw the suggestion. C++, ick! ;)

:)

> *I* thought I was following Unix style, where all manner of non-I/O
> things have file-ish interfaces: /dev/null, /dev/random, /proc, ...
> These interfaces make it possible to employ all existing file-using
> tools on the non-file thingummies, at a stroke and without (much)
> additional effort. I've always thought the FILE* abstraction could be
> used in much the same way.

C++ is following Unix style! In C++, iostreams handle the formatting of
data for I/O and the stream buffer associated with the stream handles
the low level raw I/O. Thus in C++ you can employ standard I/O on the
non-file thingummies.

--
Ian Collins

supe...@casperkitty.com

Dec 9, 2015, 5:02:26 PM
On Wednesday, December 9, 2015 at 3:28:44 PM UTC-6, Eric Sosman wrote:
> It seems to me that a more flexible and powerful approach would
> be to implement such things at the FILE* level. It's easy to imagine
> things along the lines of

The difficulty with that is that the C Standard regards FILE as an opaque
type. While many implementations provide means of creating a FILE object
with callback functions, they all do so differently. I would propose a
single type which would behave like an output file, but have a defined
format to allow application code to extend its capabilities in a standard
way.

Richard Heathfield

Dec 9, 2015, 5:02:49 PM
[Tangent!]

On 09/12/15 21:28, Eric Sosman wrote:

<snip>

> It seems to me that a more flexible and powerful approach would
> be to implement such things at the FILE* level.

In general, the FILE * interface for streams makes a good general
pattern for resource management (although I would have preferred for
fclose to take FILE **, such that after fclose(&fp) fp would have a null
pointer value).

The pattern can be adapted for all kinds of things. For example:

canvas *canvas_create(size_t width, size_t height, int bgcol);
canvas *canvas_load(const char *filename, int type);
int canvas_setpixel(canvas *c, size_t x, size_t y, unsigned long colour);
int canvas_setpixelrgb(canvas *c, size_t x, size_t y, int r, int g, int b);
int canvas_linedraw(canvas *c, size_t left, size_t top, size_t right,
size_t bottom);
/* etc */
void canvas_destroy(canvas **pc);

This method can be adapted to use constructors, iterators, and
destructors (C style, not C++ style!), and works well with opaque types.
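
To show the shape of it, here is a minimal sketch of three of those
functions. The struct layout and all of the internals are assumptions made
up for the example; in a real library the struct definition would live in
the implementation file and the header would carry only the typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Callers see only this incomplete type. */
typedef struct canvas canvas;

/* Private to the implementation (hypothetical layout). */
struct canvas {
    size_t width, height;
    unsigned long *pixels;
};

/* Returns a new canvas filled with bgcol, or NULL on failure. */
canvas *canvas_create(size_t width, size_t height, int bgcol)
{
    canvas *c;
    size_t i, count;

    if (width == 0 || height == 0)
        return NULL;
    count = width * height;
    if (count / width != height ||
        count > (size_t)-1 / sizeof(unsigned long))
        return NULL;                    /* size would overflow */

    c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->pixels = malloc(count * sizeof *c->pixels);
    if (c->pixels == NULL) {
        free(c);
        return NULL;
    }
    c->width = width;
    c->height = height;
    for (i = 0; i < count; i++)
        c->pixels[i] = (unsigned long)bgcol;
    return c;
}

/* Returns 0 on success, -1 if the coordinates are out of range. */
int canvas_setpixel(canvas *c, size_t x, size_t y, unsigned long colour)
{
    assert(c != NULL);
    if (x >= c->width || y >= c->height)
        return -1;
    c->pixels[y * c->width + x] = colour;
    return 0;
}

/* Frees the canvas and nulls the caller's pointer, so a stale handle
   cannot be used or destroyed twice by accident. */
void canvas_destroy(canvas **pc)
{
    if (pc != NULL && *pc != NULL) {
        free((*pc)->pixels);
        free(*pc);
        *pc = NULL;
    }
}
```

After canvas_destroy(&c) the caller's c is a null pointer, and a second
canvas_destroy(&c) is a harmless no-op, which is the behaviour I would have
liked from fclose.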

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

Keith Thompson

Dec 9, 2015, 5:49:19 PM
Eric Sosman <eso...@comcast-dot-net.invalid> writes:
> On 12/9/2015 4:14 PM, Tim Rentsch wrote:
>> supe...@casperkitty.com writes:
>>
>>> On Tuesday, December 8, 2015 at 8:52:31 AM UTC-6, Ben Bacarisse wrote:
>>>> snprintf was implemented "in the wild" before standardisation, so some
>>>> pre-C99 systems had it. You could just assume C90+snprintf.
>>>
>>> It's too bad there's no standard for a general-purpose printf
>>> which takes a pointer to a structure with one or two function
>>> pointers and a void*, and uses one of the passed-in functions for
>>> output (passing the void* to the function, which can then use it
>>> as it sees fit). [...]
>>
>> If the lack bothers you, why don't you write one? I did.
>
> It seems to me that a more flexible and powerful approach would
> be to implement such things at the FILE* level. It's easy to imagine
> things along the lines of
>
> FILE *stream = fopen("http://www.c-faq.com/", "r");

The string "http://www.c-faq.com/" could *also* be a valid local file
name. On a POSIX system, there could be a directory named "http:" with
a subdirectory named "www.c-faq.com". The trailing "/" means that the
whole thing refers to a directory, but "http://www.c-faq.com/index.html"
could be an ordinary file.

You could set up a "/url" directory, for example, so a file name like
"/url/http://www.c-faq.com/" would unambiguously refer to that URL.

I haven't heard of anyone actually doing that. The POSIX popen()
function, along with a command like "curl", can do pretty much the same
thing.

One issue is that the things URLs refer to only partially act like
files.

> ... and it wouldn't be a huge departure to offer something like
>
> FILE *funcopen(int (*)reader(void*), int (*)writer(void*),
> void *userarg);
> FILE *stream = funcopen(NULL, mywriter, buffptr);
>
> Something in this vein would be, I think, superior to running around
> endlessly adding capabilities to printf(), scanf(), fseek(), ...

There's the GNU-specific fopencookie().
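
A small sketch of how that looks in practice, assuming glibc (it is not
standard C and will not build everywhere). The struct capture type and the
capture_* names are made up for the example; fopencookie,
cookie_io_functions_t and the write callback's signature are as documented
in the glibc manual.

```c
#define _GNU_SOURCE             /* fopencookie is a GNU extension */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Cookie: a bounded memory buffer that collects whatever is written. */
struct capture {
    char data[64];
    size_t len;
};

/* Write callback: copy what fits and silently drop the rest. Reporting
   the full size as consumed stops stdio from retrying or flagging an
   error on truncation. */
static ssize_t capture_write(void *cookie, const char *buf, size_t size)
{
    struct capture *cap = cookie;
    size_t room = sizeof cap->data - 1 - cap->len;
    size_t n = size < room ? size : room;

    memcpy(cap->data + cap->len, buf, n);
    cap->len += n;
    cap->data[cap->len] = '\0';
    return (ssize_t)size;
}

/* Open a write-only stream whose output lands in *cap.
   Returns NULL if the stream cannot be created. */
FILE *capture_open(struct capture *cap)
{
    cookie_io_functions_t io = { .write = capture_write };

    assert(cap != NULL);
    cap->len = 0;
    cap->data[0] = '\0';
    return fopencookie(cap, "w", io);
}
```

Any stdio output function can then be pointed at the stream, for example
fprintf(fp, "%s=%d", "x", 42); the data reaches the buffer once the stream
is flushed or closed.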

Morten W. Petersen

Dec 10, 2015, 1:59:23 AM
Hm, OK. As for the convenience functions, I don't see them as
necessary. An integrator can build the internal (memory)
representation and manipulate that. When it comes to turning
that into XML, one function can be called with any given element
whether it is the root element or one nested further down and
print that.

I managed to create a printer that works with UTF-8 terminals

https://github.com/morphex/smash_xml/blob/9585a3e5a932541d17d63640b6f4b075467bd818/decode_xml.c#L249

it's an adaptation of the algorithm from the Unicode consortium,
I'm not sure about copyright etc. there but it is different and
there are only so many ways to build UTF-8 I guess.

Took me way too long to figure out how to do it, as I had to turn
on UTF-8 mode in screen as well as the terminal emulator.

But there it is, happy about that. :)

-Morten

Malcolm McLean

Dec 10, 2015, 5:41:47 AM
/*
Create a user FILE object
putch - function for writing character to stream (can be null)
getch - function for reading character from stream (can be null)
close - function to destroy user context
userpointer - context pointer for callbacks
*/
FILE *userfp( int (*putch)(int ch, void *ptr), int (*getch)(void *ptr), void (*close)(void *ptr),
void *userpointer);

However you're adding a check to every stdio call if you add this function.

Tim Rentsch

Jan 29, 2016, 2:57:47 PM
After seeing this and re-reading the earlier comments I think I
misconstrued what the previous discussion was trying to say.
I withdraw my comment.