
scanf style parsing


Bruce Dawson

Sep 26, 2001, 1:42:45 AM
to
I love programming in Python, but there are some things I have not found
the easy way to do. I understand that Python is supposed to be good at
text parsing, but I am having trouble with this simple task. Given this
text (the output from VisualC++) I want to find out how many errors and
warnings there were:

smtpmail.exe - 0 error(s), 0 warning(s)

In C/C++ that would be something like:
sscanf(buffer, "smtpmail.exe - %d error(s), %d warning(s)", &errors,
&warnings);

It's not that I think the sscanf syntax is particularly elegant, but it
sure is compact! I saw the discussion about adding scanf to Python

http://mail.python.org/pipermail/python-dev/2001-April/014027.html

but I need to know what people do right now when faced with this task.

Let me know. E-mail answer as well would be nice, since this newsgroup
is rather high-traffic.


Andrew Dalke

Sep 26, 2001, 2:28:37 AM
to
Bruce Dawson:

>Given this text (the output from VisualC++) I want to find out
>how many errors and warnings there were:
>
>smtpmail.exe - 0 error(s), 0 warning(s)
>
>In C/C++ that would be something like:
>sscanf(buffer, "smtpmail.exe - %d error(s), %d warning(s)", &errors,
>&warnings);

The usual solution is to use regular expressions for this.

>>> import re
>>> text = "smtpmail.exe - 0 error(s), 5 warning(s)"
>>> m = re.match(r"smtpmail.exe - (\d+) error\(s\), (\d+) warning\(s\)",
... text)
>>> m.group(1), m.group(2)
('0', '5')
>>>

See the documentation for the 're' module.
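The captured groups come back as strings; a small, illustrative extension of the same example converts them on the spot (`int()` does the conversion that sscanf's %d would have done):

```python
import re

text = "smtpmail.exe - 0 error(s), 5 warning(s)"
m = re.match(r"smtpmail.exe - (\d+) error\(s\), (\d+) warning\(s\)", text)
# each group is a string such as '0'; int() converts it
errors, warnings = (int(g) for g in m.groups())
print(errors, warnings)  # 0 5
```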

Andrew
da...@dalkescientific.com

Skip Montanaro

Sep 26, 2001, 2:14:58 AM
to Bruce Dawson, pytho...@python.org

Bruce> I want to find out how many errors and warnings there were:

Bruce> smtpmail.exe - 0 error(s), 0 warning(s)

Bruce> In C/C++ that would be something like:
Bruce> sscanf(buffer, "smtpmail.exe - %d error(s), %d warning(s)", &errors,
Bruce> &warnings);

Bruce> It's not that I think the sscanf syntax is particularly elegant,
Bruce> but it sure is compact! I saw the discussion about adding scanf
Bruce> to Python

Bruce> but I need to know what people do right now when faced with this
Bruce> task.

This is a task often relegated to regular expressions, which probably
explains (in part) why scanf is not found in python. You could define a
regular expression like

errpat = re.compile(r'(?P&lt;exe&gt;[^\s]+)\s+-\s+'
                    r'(?P&lt;err&gt;\d+)\s+error\(s\),\s+'
                    r'(?P&lt;warn&gt;\d+)\s+warning\(s\)')

then use it to extract the program name, number of errors and number of
warnings like so:

mat = errpat.match(line)
if mat is not None:
    exe = mat.group('exe')
    nerrs = int(mat.group('err'))
    nwarns = int(mat.group('warn'))

It's not nearly as compact as a similar scanf pattern, but regular
expressions are considerably more powerful.
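Since the groups are named, groupdict() also pulls everything out in one call; a sketch along the lines of the pattern above:

```python
import re

# same named-group pattern idea as in the post above
errpat = re.compile(r'(?P<exe>\S+)\s+-\s+'
                    r'(?P<err>\d+)\s+error\(s\),\s+'
                    r'(?P<warn>\d+)\s+warning\(s\)')

mat = errpat.match("smtpmail.exe - 3 error(s), 7 warning(s)")
if mat is not None:
    info = mat.groupdict()  # {'exe': 'smtpmail.exe', 'err': '3', 'warn': '7'}
    print(info['exe'], int(info['err']), int(info['warn']))  # smtpmail.exe 3 7
```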

HTH,

--
Skip Montanaro (sk...@pobox.com)
http://www.mojam.com/
http://www.musi-cal.com/

Richard Jones

Sep 26, 2001, 2:09:29 AM
to Bruce Dawson, pytho...@python.org
On Wednesday 26 September 2001 15:42, Bruce Dawson wrote:
> I love programming in Python, but there are some things I have not found
> the easy way to do.

That's what we're here for :)


> I understand that Python is supposed to be good at
> text parsing, but I am having trouble with this simple task. Given this
> text (the output from VisualC++) I want to find out how many errors and
> warnings there were:
>
> smtpmail.exe - 0 error(s), 0 warning(s)
>
> In C/C++ that would be something like:
> sscanf(buffer, "smtpmail.exe - %d error(s), %d warning(s)", &errors,
> &warnings);
>
> It's not that I think the sscanf syntax is particularly elegant, but it
> sure is compact! I saw the discussion about adding scanf to Python
>
> http://mail.python.org/pipermail/python-dev/2001-April/014027.html
>
> but I need to know what people do right now when faced with this task.

Right now, people use regular expressions, which are more flexible than
sscanf, but don't do the type conversions (everything comes out as a string)
and are a little more verbose, code-wise.

[richard@ike ~]% python
Python 2.1.1 (#1, Jul 20 2001, 22:37:24)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.58mdk)] on linux-i386
Type "copyright", "credits" or "license" for more information.
>>> import re
>>> scan = re.compile(r'smtpmail.exe - (\d+) error\(s\), (\d+) warning\(s\)')
>>> result = scan.match("smtpmail.exe - 0 error(s), 0 warning(s)")
>>> errors, warnings = map(int, result.groups())
>>> errors
0
>>> warnings
0
>>>

... or something similar. The RE can be made more flexible to allow, e.g., the
non-existence of the ", %d warning(s)" part:

>>> scan = re.compile(r'foo.exe - (\d+) error\(s\)(, (\d+) warning\(s\))?')
>>> result = scan.match("foo.exe - 0 error(s)")
>>> result.groups()
('0', None, None)
>>> result = scan.match("foo.exe - 0 error(s), 0 warning(s)")
>>> result.groups()
('0', ', 0 warning(s)', '0')

... and so on...

Richard

Andrei Kulakov

Sep 26, 2001, 6:21:05 PM
to
On Wed, 26 Sep 2001 05:42:45 GMT, Bruce Dawson <comm...@cygnus-software.com> wrote:
> I love programming in Python, but there are some things I have not found
> the easy way to do. I understand that Python is supposed to be good at
> text parsing, but I am having trouble with this simple task. Given this
> text (the output from VisualC++) I want to find out how many errors and
> warnings there were:
>
> smtpmail.exe - 0 error(s), 0 warning(s)

Seeing that there should only be 2 words here that are numbers, you can do
this:

words = my_str.split()  # my_str is where the text to parse is
lst = []
for word in words:
    try:
        result = int(word)
        lst.append(result)
    except ValueError:
        pass

errors = lst[0]
warnings = lst[1]

I'm not sure how this format can vary, but if these numbers are always at
the same place, you could have just done:

errors, warnings = int(words[2]), int(words[4])

- Andrei

--
Cymbaline: intelligent learning mp3 player - python, linux, console.
get it at: cy.silmarill.org

Bruce Dawson

Sep 27, 2001, 2:34:08 AM
to
Wow! Great answers. And incredibly fast.

Thanks to all.

Rather than the long and complicated PEP route of adding scanf
style functionality to Python it would probably be enough to just add some
more examples to the Python regexp documentation. I searched the Python
documentation for scanf() and I looked at the regexp documentation, so if
it had contained the examples in this reply or others I would have done
my parsing properly long ago. For Perl hackers it is easy to figure out
regexps, but for us old C/C++ types, it's *tough*.

Then again, I *always* say that the documentation is the problem...

Thanks again.

Tim Hammerquist

Sep 27, 2001, 6:48:24 AM
to
It seems to me that Bruce Dawson &lt;comm...@cygnus-software.com&gt; said:

> For Perl hackers it is easy to figure out
> regexp, but for us old C/C++ types, it's *tough*

It's not usually easy to learn regexps, no matter what your background.
I come from C/C++ roots (Turbo C++ 3.0) and TRS-80 BASIC before that,
and I certainly had no idea what regex's were really for until I looked
at Perl.

I struggled with regex's for months. I even had to take some time away
from Perl and regex's to calm down and not be so intimidated. Many
Pythonistas I've heard in this ng had a lot of difficulty with regular
expressions, and with *good reason*.

Of course, by the time I finally grasped regular expressions, I would
be looking at my mail and catch myself mentally writing a regex to
parse my gas bill! This is part of why Perlers are a bit overzealous
with regex's. Python's syntax tames this pretty quick tho, and that's a
good thing.

Regex's are useful and powerful. But they're also very easy to abuse.
I've actually seen the following Perl code:

if ($filename =~ /\.txt$/) { ... }

Which would be roughly equivalent to:

m = re.search(r'\.txt$', filename)
if m:
    ...

or, much more preferably:

if filename[-4:] == '.txt':
    ...

I think another reason for Perlers overusing regex's is Perl's shortage
of convenient string indexing operators.

The equivalent of the last Python code in Perl is:

if (substr($filename, -4) eq '.txt') { ... }

But don't think regex's are disposable just because Python's string type
is more convenient. Consider the following:

# perl
if ($filename =~ /\.([ps]?html?|cgi|php[\d]?|pl)$/) { ... }
# python
re_web_files = re.compile(r'\.([ps]?html?|cgi|php[\d]?|pl)$')
m = re_web_files.search(filename)
if m:
    ...

This is a very complicated (but relatively efficient) way to match files
with all the following extensions:
.htm .html .shtm .shtml .phtm .phtml
.cgi
.php .php2 .php3 .php4
.pl

Even with Python's less convenient class implementation of regex's (as
opposed to Perl's operator implementation), not a bad example, and half
of the power of regular expressions hasn't even been displayed here.

If you don't need a regex, don't feel obligated. (You very rarely *need*
a regex, but workarounds can get pretty ugly.)
Use them sparingly and they can save your butt. They did mine. <wink>

--
In 1968 it took the computing power of 2 C-64's to fly a rocket to the moon.
Now, in 1998 it takes the Power of a Pentium 200 to run Microsoft Windows 98.
Something must have gone wrong.

Duncan Booth

Sep 27, 2001, 7:32:35 AM
to
t...@vegeta.ath.cx (Tim Hammerquist) wrote in
news:slrn9r61o...@vegeta.ath.cx:

> But don't think regex's are disposable just because Python's string type
> is more convenient. Consider the following:
>
> # perl
> if ($filename =~ /\.([ps]?html?|cgi|php[\d]?|pl)$/) { ... }
> # python
> re_web_files = re.compile(r'\.([ps]?html?|cgi|php[\d]?|pl)$')
> m = re_web_files.search(filename)
&gt; if m:
&gt;     ...
&gt;
&gt; This is a very complicated (but relatively efficient) way to match files
&gt; with all the following extensions:
> .htm .html .shtm .shtml .phtm .phtml
> .cgi
> .php .php2 .php3 .php4
> .pl

Wouldn't you be happier with this?:

extensions = ['.htm', '.html', '.shtm', '.shtml', '.phtm',
              '.phtml', '.cgi', '.php', '.php2', '.php3', '.php4', '.pl']
ext = os.path.splitext(filename)[1]
if ext in extensions:
    ...

which has the arguable advantage of matching what your description says
instead of what your original code does.

regexes are wonderful: in moderation.

--
Duncan Booth dun...@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Jon Nicoll

Sep 27, 2001, 10:48:14 AM
to
trivial followup: isn't it ... amazing ... the number of programs
which still have this kind of output?

> >
> > smtpmail.exe - 0 error(s), 0 warning(s)



jon N

Oleg Broytmann

Sep 27, 2001, 10:59:41 AM
to pytho...@python.org

No, it is not. In Russian, for example, it is very hard to automatically
change a word - Russian is very computer-unfriendly :) There are far too
many rules, conditions, exceptions and so on.
On the lighter side - it has very simple rules of pronunciation. Unlike
that damn English, which has exactly One Rule of Pronunciation: every word
is an exception :)))

Oleg.
----
Oleg Broytmann http://phd.pp.ru/ p...@phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.

Skip Montanaro

Sep 27, 2001, 11:32:15 AM
to Bruce Dawson, pytho...@python.org

Bruce> For Perl hackers it is easy to figure out regexp, but for us old
Bruce> C/C++ types, it's *tough*

Indeed. Regular expressions have been around for ages, but they always
seemed so clumsy to use from C. I think their full integration into tools
like Emacs and Perl has made them much more widely available than they
otherwise would have been.

Skip

Skip Montanaro

Sep 27, 2001, 12:24:42 PM
to ti...@cpan.org, pytho...@python.org

Tim> It's not usually easy to learn regexps, no matter what your
Tim> background. I come from C/C++ roots (Turbo C++ 3.0) and TRS-80
Tim> BASIC before that, and I certainly had no idea what regex's were
Tim> really for until I looked at Perl.

I think the best way to learn about regular expressions is to use
incremental regular expression searching in Emacs/XEmacs. Just bind C-s and
C-r to isearch-forward-regexp and isearch-backward-regexp. Then, every time
you search you're using re's. Initially you'll just use plain strings, but
eventually start mixing in "." and character classes. Before you know it
"*" and "+" will be your buddies too. Once you start adding "\(", "\|" and
"\)" to your repertoire, you will attain enlightenment. ;-)

You'll generally never cook up complex regular expressions using
incremental search because you have no convenient way to correct mistakes
and retry, but you will use all the pieces and build up more complex stuff
when you're programming Perl or Python. Making the leap from Emacs's
old-style re's to POSIX-style re's as Perl and Python use now is fairly
straightforward. Mostly it involves getting rid of backslashes and learning
about {m,n}, \d, \s and other little shortcuts. (I still almost never use
\d. My fingers just type [0-9] automatically.)

maybe-the-best-argument-against-vi-ly, yr's

Chris Barker

Sep 27, 2001, 1:53:57 PM
to
Bruce Dawson wrote:

> Rather than the long and complicated PEP route of adding scanf
> style functionality to Python it would probably be enough to just add some
> more examples to the Python regexp documentation. I searched the Python
> documentation for scanf() and I looked at the regexp documentation, so if
> it had contained the examples in this reply or others I would have done
> my parsing properly long ago.

I completely disagree: scanf-type functionality is by no means a
replacement for REs or vice-versa. I'm not all that familiar with scanf,
but from what I have seen, it is really only useful for very simple text
scanning, mostly to extract numbers from strings (or files). When scanf
would work well, Python's string functions would generally work just
fine (see the examples given in this thread--much easier than the re
methods!). When regexs are warranted, scanf would be useless.

That being said, I am a strong advocate of adding some kind of
scanf-type functionality to Python, so that it can be used for what it is
good for: extracting numbers from fairly rigidly formatted text. This can
be done well with string methods. I have a lot of code similar to:

for line in file.xreadlines():
    data.append(map(float, line.split()))

This is kind of clunky, and very slow if you have a lot of data. I'd
really like a cleaner, faster way of doing this kind of thing. I know
there are some routines in SciPy that help, but a more native approach
would be nice.
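One sketch of the same numeric-table job without the explicit loop is a list comprehension; it is fed from an in-memory buffer here purely so the example is self-contained (the file name and contents are made up):

```python
import io

# stands in for a file of whitespace-separated numbers, one row per line
buf = io.StringIO("1.5 2.5 3.5\n4.0 5.0 6.0\n")
# split each line on whitespace and convert every field to float
data = [[float(x) for x in line.split()] for line in buf]
print(data)  # [[1.5, 2.5, 3.5], [4.0, 5.0, 6.0]]
```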

-Chris

--
Christopher Barker,
Ph.D.
ChrisH...@home.net --- --- ---
http://members.home.net/barkerlohmann ---@@ -----@@ -----@@
------@@@ ------@@@ ------@@@
Oil Spill Modeling ------ @ ------ @ ------ @
Water Resources Engineering ------- --------- --------
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------

Andrei Kulakov

Sep 27, 2001, 1:53:48 PM
to
On Thu, 27 Sep 2001 11:32:35 +0000 (UTC), Duncan Booth <dun...@NOSPAMrcp.co.uk> wrote:
> t...@vegeta.ath.cx (Tim Hammerquist) wrote in
> news:slrn9r61o...@vegeta.ath.cx:
>
>> But don't think regex's are disposable just because Python's string type
>> is more convenient. Consider the following:
>>
>> # perl
>> if ($filename =~ /\.([ps]?html?|cgi|php[\d]?|pl)$/) { ... }
>> # python
>> re_web_files = re.compile(r'\.([ps]?html?|cgi|php[\d]?|pl)$')
>> m = re_web_files.search(filename)
&gt;&gt; if m:
&gt;&gt;     ...
&gt;&gt;
&gt;&gt; This is a very complicated (but relatively efficient) way to match files
&gt;&gt; with all the following extensions:
>> .htm .html .shtm .shtml .phtm .phtml
>> .cgi
>> .php .php2 .php3 .php4
>> .pl
>
> Wouldn't you be happier with this?:
>
&gt; extensions = ['.htm', '.html', '.shtm', '.shtml', '.phtm',
&gt;               '.phtml', '.cgi', '.php', '.php2', '.php3', '.php4', '.pl']
&gt; ext = os.path.splitext(filename)[1]
&gt; if ext in extensions:
It would be even better to:

    if ext.lower() in extensions:

> ...
>
> which has the arguable advantage of matching what your description says
> instead of what your original code does.
>
> regexes are wonderful: in moderation.

I hate them! :/

Fredrik Lundh

Sep 27, 2001, 3:14:32 PM
to
Andrei Kulakov wrote:
> > regexes are wonderful: in moderation.
>
> I hate them! :/

me too ;-)

</F>


Oleg Broytmann

Sep 27, 2001, 3:22:01 PM
to pytho...@python.org
On Thu, Sep 27, 2001 at 07:14:32PM +0000, Fredrik Lundh wrote:
> > > regexes are wonderful: in moderation.
> >
> > I hate them! :/
>
> me too ;-)
>
> </F>

ROFL :)))))

Tim Hammerquist

Sep 27, 2001, 6:28:16 PM
to
It seems to me that Skip Montanaro &lt;sk...@pobox.com&gt; said:

>
> Tim> It's not usually easy to learn regexps, no matter what your
> Tim> background. I come from C/C++ roots (Turbo C++ 3.0) and TRS-80
> Tim> BASIC before that, and I certainly had no idea what regex's were
> Tim> really for until I looked at Perl.
>
> I think the best way to learn about regular expressions is to use
> incremental regular expression searching in Emacs/XEmacs. Just bind C-s and
> C-r to isearch-forward-regexp and isearch-backward-regexp. Then, every time
> you search you're using re's. Initially you'll just use plain strings, but
> eventually start mixing in "." and character classes. Before you know it
> "*" and "+" will be your buddies too. Once you start adding "\(", "\|" and
> "\)" to your repertoire, you will attain enlightenment. ;-)

I used Emacs briefly and couldn't get the hang of it; besides, vi's
command mode keys were just a bit more mnemonic for me.

&gt; You'll generally never cook up complex regular expressions using
> incremental search because you have no convenient way to correct mistakes
> and retry, but you will use all the pieces and build up more complex stuff
> when you're programming Perl or Python. Making the leap from Emacs's
> old-style re's to POSIX-style re's as Perl and Python use now is fairly
> straightforward. Mostly it involves getting rid of backslashes and learning
> about {m,n}, \d, \s and other little shortcuts. (I still almost never use
> \d. My fingers just type [0-9] automatically.)

All true.

> maybe-the-best-argument-against-vi-ly, yr's

This I don't understand. Where was the argument against vi? vi (at
least vim) uses regex's for its search; it just uses '/' and '?'
instead of the C-s/C-r mapping you mentioned. It is also a good way for
Perlers to get used to "old-style" re syntax (ie, the one with a lot of
backslashes). &lt;wink&gt;

Cheers,
Tim
--
The two surviving chocolate people copulate desperately, losing
themselves in a melting frenzy of lust, spending the last of their
brief, borrowed lives in a spasm of raspberry cream and fear.
-- Narrator, The Sandman

Tim Hammerquist

Sep 27, 2001, 6:42:30 PM
to
It seems to me that Duncan Booth &lt;dun...@NOSPAMrcp.co.uk&gt; said:

The main point of the example was to demonstrate my own peeve (Python's
clumsy re implementation), not to show an example of good idiomatic
Python. As I said, this is a good thing; it keeps
those of us with Perl experience in check. <wink>

Your solution is quite good, and probably one I'd use; in practice, I
would add the .lower() method to the ext var, just as I would add the
re.I flag in the re.compile() statement...unless I wanted to put IIS
servers through Hell. &lt;wink&gt;

> regexes are wonderful: in moderation.

That bore repeating. Thank you. =)

--
Destinations are often a surprise to the destined.
-- Thessaly, The Sandman

Skip Montanaro

Sep 27, 2001, 7:51:01 PM
to ti...@cpan.org, pytho...@python.org

>> maybe-the-best-argument-against-vi-ly, yr's

Tim> This I don't understand. Where was the argument against vi?

I guess you didn't see the implied smiley.

Skip

Greg Ewing

Sep 27, 2001, 9:31:33 PM
to
Tim Hammerquist wrote:
>
> I struggled with regex's for months.

Was it the concepts you found difficult, or the syntax?
The syntax conventionally used for REs is atrocious for
anything beyond the simplest cases, IMO.

It doesn't have to be that way. I used a different
approach in Plex, for example:

http://www.cosc.canterbury.ac.nz/~greg/python/Plex/

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
To get my email address, please visit my web page:
http://www.cosc.canterbury.ac.nz/~greg

Tim Hammerquist

Sep 27, 2001, 9:57:24 PM
to
It seems to me that Greg Ewing &lt;gr...@cosc.canterbury.ac.nz&gt; said:

> Tim Hammerquist wrote:
> >
> > I struggled with regex's for months.
>
> Was it the concepts you found difficult, or the syntax?
> The syntax conventionally used for REs is atrocious for
> anything beyond the simplest cases, IMO.

Actually, it was the concepts, IIRC. I think I had a leg up as far as
syntax was concerned...Perl code makes sense to me. &lt;wink&gt;

The only thing that throws me in syntax is when I switch from
Perl/Python.re syntax to Emacs/Vim syntax. ("Ok, the + uses a backslash,
but the * doesn't. Remember that, self.")

It's kind of like moving from Oregon to Los Angeles: you can do a lot
more, but you're likely to kill brain cells in the process. <wink>

--
We're dumber than squirrels. We hear voices and do
what they command. I have broccoli in my socks.
-- Dilbert's boss

Bruce Dawson

Sep 27, 2001, 10:39:47 PM
to
And a good thing too - otherwise the scanf/regexp job would
be a bit more complicated.

Skip Montanaro

Sep 27, 2001, 10:39:47 PM
to Greg Ewing, pytho...@python.org

Greg> The syntax conventionally used for REs is atrocious for anything
Greg> beyond the simplest cases, IMO.

Yeah, that's why Ping wrote the module he did (can't remember the name at
the moment) that sits as a layer above the old regex module. In my own
stuff I wrote an application-specific higher-level encoding that "compiles"
to regular expressions:

http://musi-cal.mojam.com/help-patterns.shtml

It can generate some truly horrendous regular expressions, and I'm not doing
anything particularly fancy.

Andrew Dalke

Sep 27, 2001, 11:05:27 PM
to
Greg Ewing wrote:
>It doesn't have to be that way. I used a different
>approach in Plex, for example:
>
>http://www.cosc.canterbury.ac.nz/~greg/python/Plex/

Which I corrupted and use for Martel
http://www.dalkescientific.com/Martel/

Without the Plex syntax, it requires horrendous pattern strings:
http://www.dalkescientific.com/Martel/ebi-talk/img25.htm

:)

Andrew
da...@dalkescientific.com


Steve Clift

Sep 27, 2001, 11:54:30 PM
to
A decade or two ago I wrote a sscanf module for just these reasons. It
used to be buried in the depths of python.org, but I haven't checked on
its whereabouts lately. I mostly use it for converting tabular stuff
produced by other programs - the standard approaches (re, split etc) can
be relatively gruesome for this sort of thing. The version currently at
large has a minor bug or two. If anyone wants an updated version, drop
me a line.
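A toy version of such an sscanf-style helper can be sketched by translating the format into a regex; this handles only %d and %s and is purely illustrative (it is not the module described above):

```python
import re

def sscanf(s, fmt):
    """Tiny sscanf-style helper: supports %d and %s only.

    Translates fmt into a regex, matches it against s, and returns a
    list of converted values, or None if the string does not match.
    """
    pattern_parts = []
    converters = []
    i = 0
    while i < len(fmt):
        c = fmt[i]
        if c == '%':
            spec = fmt[i + 1]
            if spec == 'd':
                pattern_parts.append(r'([-+]?\d+)')  # optionally signed integer
                converters.append(int)
            elif spec == 's':
                pattern_parts.append(r'(\S+)')       # run of non-whitespace
                converters.append(str)
            i += 2
        else:
            pattern_parts.append(re.escape(c))       # literal character
            i += 1
    m = re.match(''.join(pattern_parts), s)
    if m is None:
        return None
    return [conv(g) for conv, g in zip(converters, m.groups())]

print(sscanf("smtpmail.exe - 0 error(s), 2 warning(s)",
             "smtpmail.exe - %d error(s), %d warning(s)"))  # [0, 2]
```

Real scanf also treats whitespace in the format specially and supports many more conversions; this sketch deliberately ignores all of that.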

I wasn't aware of the Python-dev suggestion, but it invoked a strong
feeling of "I think I've been here before". When I first cooked this
thing up, I suggested to Guido that 'string' / 'format' -> [list] was an
obvious move. He wasn't impressed with the notion, but I don't recall
why.

-Steve

Boyd Roberts

Sep 28, 2001, 4:37:43 AM
to
"Tim Hammerquist" &lt;t...@vegeta.ath.cx&gt; wrote in the message news: slrn9r7ap...@vegeta.ath.cx...

> Where was the argument against vi?

err, there's an argument _for_ vi?

nah, use sam:

http://netlib.bell-labs.com/magic/netlib_find?db=0&pat=sam+pike

Stefan Schwarzer

Sep 28, 2001, 3:29:38 PM
to
Hi Tim

my posting is not really related to the scanf discussion but maybe helpful ...

Tim Hammerquist wrote:
> or, much more preferably:
>
> if filename[-4:] == '.txt':
> ...

Since (I think:) Python 2.0 it's possible to use

if filename.endswith('.txt'):
    ...

which is less error-prone if the string is a bit longer.

Stefan

Tim Hammerquist

Sep 28, 2001, 5:59:43 PM
to
It seems to me that Stefan Schwarzer &lt;s.sch...@ndh.net&gt; said:

...and much more maintainable. You no longer must change the index when
the length of the string changes. Very nice. I hadn't been
familiar with this method. Thank you for bringing it up.

> Stefan

Tim Hammerquist
--
I would feel infinitely more comfortable in your presence if you would agree
to treat Gravity as a Law, rather than one of a number suggested options.
-- Barnabas, The Sandman

Ralph Corderoy

Sep 29, 2001, 7:09:03 AM
to
Hi Tim,

> > For Perl hackers it is easy to figure out regexp, but for us old
> > C/C++ types, it's *tough*
>
> It's not usually easy to learn regexps, no matter what your
> background. I come from C/C++ roots (Turbo C++ 3.0) and TRS-80 BASIC
> before that, and I certainly had no idea what regex's were really for
> until I looked at Perl.
>
> I struggled with regex's for months. I even had to take some time
> away from Perl and regex's to calm down and not be so intimidated.
> Many Pythonistas I've heard in this ng had a lot of difficulty with
> regular expressions, and with *good reason*.

An easy, sure-fire way for any programmer to learn regex concepts is to
read Kernighan and Plauger's _Software Tools_ where, amongst many other
interesting topics, they implement a regex pattern matcher and proceed
to use it in their versions of grep, ed, etc.

This gives you an immediate understanding of how conceptually simple
the implementation is and also makes issues like greediness, or
pathological patterns, simple to understand.

> Regex's are useful and powerful. But they're also very easy to
> abuse. I've actually seen the following Perl code:
>
> if ($filename =~ /\.txt$/) { ... }
>

> ...
>
> or, much more preferably:
>
> if filename[-4:] == '.txt':

Overall, the Perl code's better. It didn't have to hard-code the
length of the string.

Cheers,


Ralph.

PS. Every programmer should read _Software Tools_ anyway.

Stefan Schwarzer

Sep 29, 2001, 11:05:51 AM
to
Hello Tim

Tim Hammerquist wrote:
> > > or, much more preferably:
> > >
> > > if filename[-4:] == '.txt':
> > > ...
> >
> > Since (I think:) Python 2.0 it's possible to use
> >
> > if filename.endswith('.txt'):
> > ...
> >
> > which is less error-prone if the string is a bit longer.
>
> ..and much more maintainable. You no longer must change the index when
> the length of the string changes.

Yes, that's what I meant by "less error-prone". Up to four characters are
still _relatively_ easy to check, but with more characters it is easy to
mismatch the string and the number.

Because endswith exists, it is the better solution, of course. But otherwise,
it could also be defined like:

def endswith(s, part):
    return s[-len(part):] == part

> Very nice. I hadn't been
> familiarized with this method. Thank you for bringing it up.

:-)

Stefan

ch...@atlantis.rift

unread,
Sep 29, 2001, 11:47:49 AM9/29/01
to
>> Regex's are useful and powerful. But they're also very easy to
>> abuse. I've actually seen the following Perl code:
>>
>> if ($filename =~ /\.txt$/) { ... }
>>
>> ...
>>
>> or, much more preferably:
>>
>> if filename[-4:] == '.txt':
>
>Overall, the Perl code's better. It didn't have to hard-code the
>length of the string.

ext = '.txt'
if filename[-len(ext):] == ext:

etc...

Julian

Grant Edwards

Sep 29, 2001, 12:06:23 PM
to

if filename.endswith(".txt"):

--
Grant Edwards grante Yow! I have many CHARTS
at and DIAGRAMS...
visi.com

Malcolm Tredinnick

Sep 29, 2001, 12:19:41 PM
to pytho...@python.org

Or
if filename.endswith(ext):
    # etc...

which is the reason the startswith() and endswith() string methods
exist. :-)

Cheers,
Malcolm

--
On the other hand, you have different fingers.

Fredrik Lundh

Sep 29, 2001, 12:05:55 PM
to
Ralph Corderoy wrote:
> > or, much more preferably:
> >
> > if filename[-4:] == '.txt':
>
> Overall, the Perl code's better. It didn't have to hard-code the
> length of the string.

if it matters, use endswith instead:

if filename.endswith('.txt'):
    ...

</F>


Aahz Maruch

Sep 30, 2001, 5:56:26 PM
to
In article <Xns91297E5A710...@127.0.0.1>,

Duncan Booth <dun...@rcp.co.uk> wrote:
>t...@vegeta.ath.cx (Tim Hammerquist) wrote in
>news:slrn9r61o...@vegeta.ath.cx:
>
>Wouldn't you be happier with this?:
>
&gt; extensions = ['.htm', '.html', '.shtm', '.shtml', '.phtm',
&gt;               '.phtml', '.cgi', '.php', '.php2', '.php3', '.php4', '.pl']
&gt; ext = os.path.splitext(filename)[1]
&gt; if ext in extensions:
&gt;     ...
>
>which has the arguable advantage of matching what your description says
>instead of what your original code does.

Well, if you're going to do that, extensions should be a dict for real
speed. ;-)

>regexes are wonderful: in moderation.

'Some people, when confronted with a problem, think "I know, I'll use regular
expressions". Now they have two problems.' --Jamie Zawinski, comp.lang.emacs
--
--- Aahz <*> (Copyright 2001 by aa...@pobox.com)

Hugs and backrubs -- I break Rule 6 http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

Aahz Maruch

Sep 30, 2001, 5:59:44 PM
to
In article <mailman.1001607930...@python.org>,

Skip Montanaro <sk...@pobox.com> wrote:
>
&gt;You'll generally never cook up complex regular expressions using
>incremental search because you have no convenient way to correct mistakes
>and retry, but you will use all the pieces and build up more complex stuff
>when you're programming Perl or Python. Making the leap from Emacs's
>old-style re's to POSIX-style re's as Perl and Python use now is fairly
>straightforward. Mostly it involves getting rid of backslashes and learning
>about {m,n}, \d, \s and other little shortcuts. (I still almost never use
>\d. My fingers just type [0-9] automatically.)
>
>maybe-the-best-argument-against-vi-ly, yr's

Actually, what I usually do with vi is open up another editor window to
write my regex and then cut'n'paste to the window where I'm doing real
work. That way I *do* have a convenient way to correct mistakes in
complex regexes. Here's a cutie:

:s/(\([0-9][0-9][0-9]\)) *\([0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]\)/\1-\2/
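The same phone-number rewrite reads like this in Python's re.sub (a sketch for comparison; Python uses bare parens for groups and {3} repetition where vi needs backslashed forms):

```python
import re

# rewrite "(nnn) nnn-nnnn" as "nnn-nnn-nnnn", as in the vi substitution above
text = "(555) 123-4567"
result = re.sub(r'\((\d{3})\) *(\d{3}-\d{4})', r'\1-\2', text)
print(result)  # 555-123-4567
```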

Aahz Maruch

Sep 30, 2001, 6:04:52 PM
to
In article <9p4a4f$kdd$1...@inputplus.demon.co.uk>,

Ralph Corderoy <ra...@inputplus.demon.co.uk> wrote:
>
&gt;An easy, sure-fire way for any programmer to learn regex concepts is to
&gt;read Kernighan and Plauger's _Software Tools_ where, amongst many other
&gt;interesting topics, they implement a regex pattern matcher and proceed
&gt;to use it in their versions of grep, ed, etc.

The problem is that this gives a narrow view of regexes and doesn't show
the full power available in complex regex languages in Perl/Python.
Much better is _Mastering Regular Expressions_ by Jeffrey Friedl, though
it badly needs freshening.

Duncan Booth

Oct 1, 2001, 5:06:14 AM
to
aa...@panix.com (Aahz Maruch) wrote in news:9p84ea$cd7$1...@panix3.panix.com:

>>Wouldn't you be happier with this?:
>>
&gt;&gt; extensions = ['.htm', '.html', '.shtm', '.shtml', '.phtm', '.phtml',
&gt;&gt;               '.cgi', '.php', '.php2', '.php3', '.php4', '.pl']
&gt;&gt; ext = os.path.splitext(filename)[1]
&gt;&gt; if ext in extensions:
&gt;&gt;     ...
>>
>>which has the arguable advantage of matching what your description says
>>instead of what your original code does.
>
> Well, if you're going to do that, extensions should be a dict for real
> speed. ;-)

Not until you have profiled the code and determined that this particular
lookup is a bottleneck, and that your proposed improvement actually has
some benefit. Get the code right first, then get it fast.

I knocked up a quick test program (attached at the end of the post) to do
just that and the results are perhaps surprising:
testdict: 1.290, each loop 161.2uS match 6000, nomatch 2000
testlist: 1.304, each loop 163.0uS match 6000, nomatch 2000
testregex: 0.071, each loop 8.9uS match 6000, nomatch 2000
testlist1: 0.162, each loop 20.2uS match 6000, nomatch 2000

The first time around the dictionary lookup actually came out slower than
the list: I had to optimise the extensions.has_key out of the loop to make
it faster. The real killer though is the call to splitext; replacing that
with a home rolled one that doesn't actually work correctly in all cases,
but does a good enough job for this situation brings the time down
significantly but the regex still wins.

The moral is, never jump to conclusions over speed in Python.

---- test.py ----
import time
import re

TESTFILES = []
TESTEXT = ('.htm', '.html', '.shtm', '.shtml', '.phtm',
           '.phtml', '.cgi', '.php', '.php2', '.php3', '.php4', '.pl',
           '.py', '.txt', '.exe', '')
TESTBASE = ('foo', 'bar', '/usr/local/bin/foo', '/usr/local/bin/bar',
            'd:\\Program Files\\Silly.dir\\fred')
for filename in TESTBASE:
    for ext in TESTEXT:
        TESTFILES.append(filename + ext)
LOOPS = 100

def testregex():
    match, nomatch = 0, 0
    re_web_files = re.compile(r'\.([ps]?html?|cgi|php[\d]?|pl)$')
    start = time.clock()
    for i in range(LOOPS):
        for filename in TESTFILES:
            m = re_web_files.search(filename)
            if m:
                match += 1
            else:
                nomatch += 1
    stend = time.clock()
    results("testregex", start, stend, match, nomatch)

def testlist():
    match, nomatch = 0, 0
    extensions = ['.htm', '.html', '.shtm', '.shtml', '.phtm',
                  '.phtml', '.cgi', '.php', '.php2', '.php3', '.php4', '.pl']
    from os.path import splitext
    start = time.clock()
    for i in range(LOOPS):
        for filename in TESTFILES:
            ext = splitext(filename)[1]
            if ext in extensions:
                match += 1
            else:
                nomatch += 1
    stend = time.clock()
    results("testlist", start, stend, match, nomatch)

def testlist1():
    match, nomatch = 0, 0
    extensions = ['.htm', '.html', '.shtm', '.shtml', '.phtm',
                  '.phtml', '.cgi', '.php', '.php2', '.php3', '.php4', '.pl']
    def splitext(filename):
        '''This doesn't always correctly split the extension,
        but within the requirements of this test it will work,
        as it will give nomatch in the cases where it fails.'''
        pos = filename.rfind('.')
        if pos >= 0:
            return (filename[:pos], filename[pos:])
        return (filename, '')
    start = time.clock()
    for i in range(LOOPS):
        for filename in TESTFILES:
            ext = splitext(filename)[1]
            if ext in extensions:
                match += 1
            else:
                nomatch += 1
    stend = time.clock()
    results("testlist1", start, stend, match, nomatch)

def testdict():
    match, nomatch = 0, 0
    extensions = {'.htm': 0, '.html': 0, '.shtm': 0, '.shtml': 0, '.phtm': 0,
                  '.phtml': 0, '.cgi': 0, '.php': 0, '.php2': 0, '.php3': 0,
                  '.php4': 0, '.pl': 0}
    from os.path import splitext
    matches = extensions.has_key
    start = time.clock()
    for i in range(LOOPS):
        for filename in TESTFILES:
            ext = splitext(filename)[1]
            if matches(ext):
                match += 1
            else:
                nomatch += 1
    stend = time.clock()
    results("testdict", start, stend, match, nomatch)

def results(fn, start, stend, match, nomatch):
    print "%-12s %7.3f, each loop %5.1fuS" % \
        (fn + ':', stend - start,
         (stend - start) * 1000000 / (match + nomatch)),
    print "match %d, nomatch %d" % (match, nomatch)

if __name__ == '__main__':
    testdict()
    testlist()
    testregex()
    testlist1()
---- end of test.py ----
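For comparison, the same kind of measurement can be made with the stdlib timeit module and a set lookup (my own sketch in modern Python syntax, not part of the original post; the helper name is mine):

```python
import timeit

WEB_EXTS = {'.htm', '.html', '.shtm', '.shtml', '.phtm', '.phtml',
            '.cgi', '.php', '.php2', '.php3', '.php4', '.pl'}

def is_web_file(filename, exts=WEB_EXTS):
    # Same home-rolled split as testlist1: good enough here, because a
    # wrongly split "extension" simply fails the membership test.
    pos = filename.rfind('.')
    return pos >= 0 and filename[pos:] in exts

# Total time for 100000 membership checks.
print(timeit.timeit("is_web_file('/usr/local/bin/foo.html')",
                    globals=globals(), number=100000))
```

As the thread notes, the only way to know which variant wins on your data is to run the measurement.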
--
Duncan Booth dun...@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Toby Dickenson

Oct 1, 2001, 5:42:30 AM
"Fredrik Lundh" <fre...@pythonware.com> wrote:

That won't match any .TXT files. Whether that's a problem depends on
whether you use Windows or not.....

Toby Dickenson
tdick...@geminidataloggers.com

Ralph Corderoy

Oct 1, 2001, 6:23:16 AM
Hi,

> > An easy, sure fire way for any programmer to learn regex concepts
> > is to read Kernighan and Plauger's _Software Tools_ where, amongst
> > many other interesting topics, they implement a regex pattern
> > matcher and proceed to use it in their versions of grep, ed, etc.
>
> The problem is that this gives a narrow view of regexes and doesn't
> show the full power available in complex regex languages in
> Perl/Python.

But that's part of its appeal. By just sticking with concatenation,
repetition, and alternation you can see through the regex syntax to the
implementation and the matching engine. Everything from then on is
either syntax enhancements, e.g. r+ == rr*, or stuff that you can
easily learn because you've a good grounding.

> Much better is _Mastering Regular Expressions_ by Jeffrey Friedl,
> though it badly needs freshening.

I've read that and I think it goes too deep too soon and takes too many
pages to do it. It is also cluttered in having to state all the
exceptions at every turn, e.g. `but not in egrep'. _Software Tools_
gives sufficient grounding to a programmer in a chapter.

Also, like you say, a large part of _Mastering Regular Expressions_ is
out of date since Python and Perl have moved on. Given how much hassle
JF said it was to write in the first place, I wouldn't expect to see him
queuing up to write the 2nd Ed. :-)

Cheers,


Ralph.

Tim Hammerquist

Oct 1, 2001, 4:04:30 PM
It seems that Toby Dickenson <tdick...@devmail.geminidataloggers.co.uk> said:

Wouldn't be the first time Windows was a thorn in my side. But:

filename = 'AUTOEXEC.BAT'
if filename.lower().endswith('.bat'):
    # things and stuff
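In later Pythons (2.5 and up) endswith also accepts a tuple of suffixes, which covers a whole extension list in one case-insensitive test (a sketch; the filenames are made up):

```python
web_exts = ('.htm', '.html', '.shtm', '.shtml', '.cgi', '.php', '.pl')

for filename in ('AUTOEXEC.BAT', 'INDEX.HTML', 'script.pl'):
    # lower() first so the check is case-insensitive, as on Windows
    if filename.lower().endswith(web_exts):
        print(filename)
# INDEX.HTML
# script.pl
```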

> Toby Dickenson
> tdick...@geminidataloggers.com

Tim Hammerquist
--
"Supported" is an MS term that means the function exists. The fact
that it always fails means that it is an exercise for the programmer.
-- Sarathy, p5p

Quinn Dunkan

Oct 3, 2001, 11:48:07 PM
On Wed, 26 Sep 2001 22:21:05 GMT, Andrei Kulakov <si...@optonline.net> wrote:

>On Wed, 26 Sep 2001 05:42:45 GMT, Bruce Dawson <comm...@cygnus-software.com> wrote:
>> I love programming in Python, but there are some things I have not found
>> the easy way to do. I understand that Python is supposed to be good at
>> text parsing, but I am having trouble with this simple task. Given this
>> text (the output from VisualC++) I want to find out how many errors and
>> warnings there were:
>>
>> smtpmail.exe - 0 error(s), 0 warning(s)
>
>Seeing that there should only be 2 words here that are numbers, you can do
>this:
>
>my_str # that's where the result is
>words = my_str.split()
>lst = []
>for word in words:
>    try:
>        result = int(word)
>        lst.append(result)
>    except:
>        pass
>
>errors = lst[0]
>warnings = lst[1]

More compactly:

errors, warnings = [ int(w) for w in line.split() if w.isdigit() ]
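Going back to the original question, the same idea can be pushed into a tiny sscanf-alike built on re (my own sketch, handling only %d and %s):

```python
import re

def sscanf(text, fmt):
    """Very small sscanf-alike: supports only %d and %s."""
    casts, pieces = [], []
    for part in re.split(r'(%[ds])', fmt):
        if part == '%d':
            pieces.append(r'([-+]?\d+)')
            casts.append(int)
        elif part == '%s':
            pieces.append(r'(\S+)')
            casts.append(str)
        else:
            pieces.append(re.escape(part))  # literal text, parens included
    m = re.match(''.join(pieces), text)
    return [cast(g) for cast, g in zip(casts, m.groups())] if m else None

print(sscanf("smtpmail.exe - 0 error(s), 5 warning(s)",
             "smtpmail.exe - %d error(s), %d warning(s)"))
# [0, 5]
```

Escaping the literal pieces with re.escape is what lets the "(s)" parts of the format survive as plain text rather than regex groups.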

Hans-Peter Jansen

Oct 4, 2001, 11:25:30 AM
Skip Montanaro <sk...@pobox.com> wrote in message news:<mailman.1001607930...@python.org>...
> Tim> It's not usually easy to learn regexps, no matter what your
> Tim> background. I come from C/C++ roots (Turbo C++ 3.0) and TRS-80
> Tim> BASIC before that, and I certainly had no idea what regex's were
> Tim> really for until I looked at Perl.
>
> I think the best way to learn about regular expressions is to use
> incremental regular expression searching in Emacs/XEmacs. Just bind C-s and
> C-r to isearch-forward-regexp and isearch-backward-regexp. Then, every time
> you search you're using re's. Initially you'll just use plain strings, but
> eventually start mixing in "." and character classes. Before you know it
> "*" and "+" will be your buddies too. Once you start adding "\(", "\|" and
> "\)" to your repertoire, you will attain enlightenment. ;-)

>
> You'll generally never cook up complex regular rexpressions using
> incremental search because you have no convenient way to correct mistakes
> and retry, but you will use all the pieces and build up more complex stuff
> when you're programming Perl or Python. Making the leap from Emacs's
> old-style re's to POSIX-style re's as Perl and Python use now is fairly
> straightforward. Mostly it involves getting rid of backslashes and learning
> about {m,n}, \d, \s and other little shortcuts. (I still almost never use
> \d. My fingers just type [0-9] automatically.)
>
> maybe-the-best-argument-against-vi-ly, yr's

Well, yesterday I tried to parse a simple hexdump produced by
tcpdump -xs1500 port 80. The idea was to filter the hexcodes and display
the 7-bit ASCII codes the way the little advanced hex monitors do.

As I'm fairly new to advanced regex constructs, would somebody enlighten
me how to efficiently parse lines like:

2067 726f 7570 732e 2e2e 3c2f 613e 3c2f
666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c
7472 3e3c 7464 2062 6763 6f6c 6f72 3d23
6666 6363 3333 2063 6f6c 7370 616e 3d34
3e3c 494d 4720 6865 6967 6874 3d31 2073
7263 3d22 2f69 6d61 6765 732f 636c 6561
7264 6f74 2e67 6966 2220 7769 6474 683d
3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74
6162 6c65 3e3c 703e 3c66 6f6e 7420 7369
7a65 3d2d 313e 4172 6520 796f 7520 6120

with respect to varying column counts. I will refrain from showing
my stupid beginnings, but I wasn't able to get that _one_ regex
right, with all columns listed in matchobj.groups().

new-in-regexing-ly, yr's
Hans-Peter

P.S.: I ended up with a "simple" C-based filter...
Please CC me

George Demmy

Oct 4, 2001, 12:54:05 PM
to h...@urpla.net
h...@urpla.net (Hans-Peter Jansen) writes:
> Well, yesterday I tried to parse a simple hexdump produced by
> tcpdump -xs1500 port 80. The idea was to filter the hexcodes and display
> the 7-bit ASCII codes the way the little advanced hex monitors do.
>
> As I'm fairly new to advanced regex constructs, would somebody enlighten
> me how to efficiently parse lines like:
>
> 2067 726f 7570 732e 2e2e 3c2f 613e 3c2f
> 666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c
> 7472 3e3c 7464 2062 6763 6f6c 6f72 3d23
> 6666 6363 3333 2063 6f6c 7370 616e 3d34
> 3e3c 494d 4720 6865 6967 6874 3d31 2073
> 7263 3d22 2f69 6d61 6765 732f 636c 6561
> 7264 6f74 2e67 6966 2220 7769 6474 683d
> 3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74
> 6162 6c65 3e3c 703e 3c66 6f6e 7420 7369
> 7a65 3d2d 313e 4172 6520 796f 7520 6120
>
> with respect to varying column counts. I will refrain from showing
> my stupid beginnings, but I wasn't able to get that _one_ regex
> right, with all columns listed in matchobj.groups().
>
> new-in-regexing-ly, yr's
> Hans-Peter
>
> P.S.: I ended up with a "simple" C-based filter...
> Please CC me

Hi Hans-Peter,

You're asking how to use a regex to parse your hexdump, with an eye
towards displaying the ascii representation. I don't know if a regex is
what you want for the latter. Here is some example code that might
help.

import re

hexpat = re.compile ('[a-f0-9]{4}')

# your first line of the hexdump, stripped

line = '2067 726f 7570 732e 2e2e 3c2f 613e 3c2f'
hexpat.search (line).span ()

-> (0, 4)

hexpat.search (line[4:]).span ()

-> (1, 5)

As to getting your ascii...

import operator

def hex2ascii (hexstr):
    """hex2ascii (hexstr) -> ascii rep of 4 character hex string"""
    # error checking here, please!
    return chr (int (hexstr[:2], 16)) + chr (int (hexstr[2:], 16))

# slurp your hexdump by line (your example is stored in hexdat, by line)
# stripping off the leading whitespace

hexdat = map (lambda x: x.strip (), open ("dumpfile").readlines ())

for i in hexdat:
    print i, reduce (operator.add, map (hex2ascii, i.split ()))

->
2067 726f 7570 732e 2e2e 3c2f 613e 3c2f groups...</a></
666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c font></td></tr><
7472 3e3c 7464 2062 6763 6f6c 6f72 3d23 tr><td bgcolor=#
6666 6363 3333 2063 6f6c 7370 616e 3d34 ffcc33 colspan=4
3e3c 494d 4720 6865 6967 6874 3d31 2073 ><IMG height=1 s
7263 3d22 2f69 6d61 6765 732f 636c 6561 rc="/images/clea
7264 6f74 2e67 6966 2220 7769 6474 683d rdot.gif" width=
3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74 1 ></td></tr></t
6162 6c65 3e3c 703e 3c66 6f6e 7420 7369 able><p><font si
7a65 3d2d 313e 4172 6520 796f 7520 6120 ze=-1>Are you a

Hope this helps, and critique most welcome...

G
--
George Demmy
Layton Graphics, Inc
Marietta, Georgia

Skip Montanaro

Oct 4, 2001, 10:14:41 PM
to Hans-Peter Jansen, pytho...@python.org

Hans-Peter> Well, yesterday I tried to parse a simple hexdump
Hans-Peter> produced by tcpdump -xs1500 port 80. The idea was to filter
Hans-Peter> the hexcodes and display the 7-bit ASCII codes the way the
Hans-Peter> little advanced hex monitors do.

Hans-Peter> As I'm fairly new to advanced regex constructs, would
Hans-Peter> somebody enlighten me how to efficiently parse lines like:

Hans-Peter> 2067 726f 7570 732e 2e2e 3c2f 613e 3c2f
Hans-Peter> 666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c
Hans-Peter> 7472 3e3c 7464 2062 6763 6f6c 6f72 3d23
Hans-Peter> 6666 6363 3333 2063 6f6c 7370 616e 3d34
Hans-Peter> 3e3c 494d 4720 6865 6967 6874 3d31 2073
Hans-Peter> 7263 3d22 2f69 6d61 6765 732f 636c 6561
Hans-Peter> 7264 6f74 2e67 6966 2220 7769 6474 683d
Hans-Peter> 3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74
Hans-Peter> 6162 6c65 3e3c 703e 3c66 6f6e 7420 7369
Hans-Peter> 7a65 3d2d 313e 4172 6520 796f 7520 6120

Hans-Peter> with respect to varying column counts. I will refrain from
Hans-Peter> showing my stupid beginnings, but I wasn't able to get that
Hans-Peter> _one_ regex right, with all columns listed in
Hans-Peter> matchobj.groups().

I'm not sure quite what you're looking for, but this data is so regular I
wouldn't use regular expressions to parse it (no pun intended).

Assuming the above stream is coming in on stdin and I wanted to display
any printable ASCII characters, I'd start with something like this:

import sys

for line in sys.stdin.readlines():
    line = line.strip()
    fields = line.split()
    printing = []
    for pair in fields:
        first = chr(int(pair[:2], 16))
        second = chr(int(pair[2:], 16))
        if first < " " or first > "~":
            first = "."
        if second < " " or second > "~":
            second = "."
        printing.extend([first, second])
    print line, "".join(printing)

The above hex data fed to this code produces

2067 726f 7570 732e 2e2e 3c2f 613e 3c2f groups...</a></
666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c font></td></tr><
7472 3e3c 7464 2062 6763 6f6c 6f72 3d23 tr><td bgcolor=#
6666 6363 3333 2063 6f6c 7370 616e 3d34 ffcc33 colspan=4
3e3c 494d 4720 6865 6967 6874 3d31 2073 ><IMG height=1 s
7263 3d22 2f69 6d61 6765 732f 636c 6561 rc="/images/clea
7264 6f74 2e67 6966 2220 7769 6474 683d rdot.gif" width=
3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74 1 ></td></tr></t
6162 6c65 3e3c 703e 3c66 6f6e 7420 7369 able><p><font si
7a65 3d2d 313e 4172 6520 796f 7520 6120 ze=-1>Are you a

on stdout.
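In current Pythons the pair decoding can also lean on the stdlib binascii module (a sketch in Python 3 syntax, unlike the Python 2 code above):

```python
import binascii

line = '2067 726f 7570 732e 2e2e 3c2f 613e 3c2f'
# unhexlify wants an even-length string of hex digits, so drop the spaces.
raw = binascii.unhexlify(line.replace(' ', ''))
# Map unprintable bytes to '.', as the hand-rolled loops above do.
text = ''.join(chr(b) if 32 <= b < 127 else '.' for b in raw)
print(line, text)
# 2067 726f 7570 732e 2e2e 3c2f 613e 3c2f  groups...</a></
```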
