Pattern matching within ,, and ^^ parser

Carlo Zanziba

Sep 11, 2012, 3:27:58 AM

Hello,

Suppose I have a string like this:

str="first. second; third. lost? maybe! no"

I'd like to capitalize the first character, and the first non-blank character
after every period, question mark, and exclamation mark, so that it becomes

First. Second; third. Lost? Maybe! No

--------------
As an add-on, every word of any text enclosed in double quotes
should be capitalized as well, e.g.

str="first. second; third. \"the prince of arabia\" lost? maybe! no"

turned to

First. Second; third. "The Prince Of Arabia" lost? Maybe! No

but this may prove impossible.
--------------

I tried something like:

shopt -s extglob
str2=${str^$pat}

where pat="+(\.|\?|\!|^)*( )+([a-z])"

but it didn't work; my attempt is to match every combination of
dot + any space + any letter
? + any space + any letter
! + any space + any letter
start of line + any space + any letter.

The problem is that ${parameter^^pattern} seems to work with definite
patterns, like abc or whatever, but if you use compound parameters,
such as [ab][tr] (for any combination of ab with tr, e.g. at, ar, bt,
br), for instance, this doesn't work.

Where is my mistake? Or am I trying to get blood from a rock?

Thanks.

-- Carlo

Thomas 'PointedEars' Lahn

Sep 11, 2012, 5:36:34 AM

Carlo Zanziba wrote:

> Suppose I have such string
>
> str="first. second; third. lost? maybe! no"
>
> I'd want to turn every first character, and every non-blank character
> after dot, question and exclamative marks to capital, so that it turns to
>
> First. Second; third. Lost? Maybe! No
>
> --------------
> As add-on, consider that any inner text enclosed into double quotes
> should be capitalized, e.g.
>
> str="first. second; third. \"the prince of arabia\" lost? maybe! no"
>
> turned to
>
> First. Second; Third. "The Prince Of Arabia" lost? Maybe! No
>
> but this may prove impossible.

If you call a function that only works on the quoted part and prints the
result of the transformation, then it should be possible even with only the
shell.
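
A minimal sketch of such a function, assuming bash 4 or later for the ${var^} expansion. The name title_case is hypothetical and not from the thread, and runs of blanks between words are collapsed to a single space by this approach:

```shell
#!/usr/bin/env bash
# Hypothetical helper: capitalize the first letter of every word in $1.
title_case() {
    local -a words
    local i
    # Split the argument into words on whitespace.
    read -r -a words <<< "$1"
    for i in "${!words[@]}"; do
        # ${var^} uppercases only the first character of the value.
        words[i]=${words[i]^}
    done
    # Join the words again with single spaces.
    printf '%s\n' "${words[*]}"
}

title_case "the prince of arabia"   # prints: The Prince Of Arabia
```

The caller would still have to cut the quoted part out of the string and splice the result back in.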

> --------------
>
> I tried with a figure like:
>
> shopt -s extglob
> str2=${str^$pat}
>
> where pat="+(\.|\?|\!|^)*( )+([a-z])"
>
> but it didn't work; my attermp is to match every combination of
> dot + any space + any letter
> ? + any space + any letter
> ! + any space + any letter
> start of line + any space + any letter.
>
> The problem is that ${parameter^^pattern} seems to work with definite
> patterns, like abc or whatever,

More specifically, it works with patterns that match a *single* *letter*.

> but if you use compound parameters, such as [ab][tr] (for any combination
> of ab with tr, e.g. at, ar, bt, br), for instance, this doesn't work.
>
> Where is my mistake?

| Case modification. This expansion modifies the case of alphabetic
| characters in parameter. The pattern is expanded to produce a pattern
| just as in pathname expansion. The ^ operator converts lowercase letters
                                                                   ^^^^^^^
| matching pattern to uppercase […]
  ^^^^^^^^^^^^^^^^

Note: Letters, _not_ strings. Your pattern matches at least *two* letters,
so it *never* matches a *single* letter. extglob does not help you there.
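
The difference can be seen with a small experiment, sketched here for bash 4 or later; the sample string and variable names are made up for illustration:

```shell
#!/usr/bin/env bash
str="at bat art"

# Each character is tested against the pattern on its own,
# so a one-character bracket expression works as expected.
single=${str^^[at]}

# This pattern needs two characters, so no single character can match it.
double=${str^^[ab][tr]}

printf '%s\n' "$single"   # prints: AT bAT ArT
printf '%s\n' "$double"   # prints: at bat art
```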

> Or I am trying to get blood from a rock?

In a manner of speaking. Use ${…/…}, sed, awk, or perl instead.
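
One possible way to follow the sed suggestion is sketched below. It assumes GNU sed: the \u escape in the replacement (uppercase the next character) is a GNU extension and is not available in POSIX sed. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
str="first. second; third. lost? maybe! no"

# Group 1: start of line or one of . ? !
# Group 2: any run of blanks
# Group 3: the lowercase letter to be capitalized via \u (GNU sed only)
result=$(printf '%s\n' "$str" |
    sed -E 's/(^|[.?!])([[:blank:]]*)([[:lower:]])/\1\2\u\3/g')

printf '%s\n' "$result"
```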

Next time, state the shell you are using, please.

--
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.

Janis Papanagnou

Sep 11, 2012, 5:39:35 AM

Am 11.09.2012 09:27, schrieb Carlo Zanziba:
> Hello,
>
> Suppose I have such string
>
> str="first. second; third. lost? maybe! no"
>
> I'd want to turn every first character, and every non-blank character
> after dot, question and exclamative marks to capital, so that it turns to
>
> First. Second; third. Lost? Maybe! No

awk '{ while (match($0, /(^|[.?!])[[:blank:]]*[[:lower:]]/))
         $0 = substr($0,1,RSTART-1) toupper(substr($0,RSTART,RLENGTH)) \
              substr($0,RSTART+RLENGTH)
       print
     }'

Janis

Ed Morton

Sep 11, 2012, 1:28:18 PM

Simplest (and maybe only) thing is just to do it one char at a time:

$ cat file

first. second; third. "the prince of arabia" lost? maybe! no

$ cat tst.awk
BEGIN{ FS="" }    # empty FS: one character per field (unspecified by POSIX)
{
    prev = lastNonBlank = ""
    for (i=1; i<=NF; i++) {

        curr = $i

        if (curr == "\"") {
            inQuotes = !inQuotes
        }

        if ( (prev == "") ||
             (lastNonBlank ~ /[.?!]/) ||
             (inQuotes && (prev !~ /[[:alpha:]]/)) ) {
            curr = toupper(curr)
        }

        if ( curr !~ /[[:blank:]]/ ) {
            lastNonBlank = curr
        }
        prev = curr

        # %s rather than %c, so that a digit is not printed as a character code
        printf "%s", curr
    }
    print ""
}
$ awk -f tst.awk file
First. Second; third. "The Prince Of Arabia" lost? Maybe! No

Regards,

Ed.

Posted using www.webuse.net

Carlo Zanziba

Sep 12, 2012, 2:13:19 AM

Thanks everybody. I was afraid my supposition was too far from what bash
could give me.

In any case, I appreciated your efforts.

Cheers

Carlo

Ed Morton

Sep 12, 2012, 6:21:24 AM

On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
> Thanks everybody. I was afraid my supposition was too far from what bash could
> give me.
>
> In any case, I appreciated your efforts.

You did see that I gave you a working solution, right?

Ed.

Thomas 'PointedEars' Lahn

Sep 12, 2012, 9:46:45 AM

A solution that requires more than bash (it requires awk). That said, I
still think it can be done in bash, but less efficiently than in awk &
friends.

Thomas 'PointedEars' Lahn

Sep 12, 2012, 10:13:54 AM

Thomas 'PointedEars' Lahn wrote:

> Ed Morton wrote:
>> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
>>> Thanks everybody. I was afraid my supposition was too far from what bash
>>> could give me.
>>>
>>> In any case, I appreciated your efforts.
>> You did see that I gave you a working solution, right?
>
> A solution that requires more than bash (it requires awk). That said, I
> still think it can be done in bash, but less efficiently than in awk &
> friends.

Proof of concept:

str="first. second; third. lost? maybe! no"

IFS_BAK=$IFS
IFS='.?!'

for sentence in $str
do
  sentence=${sentence##[[:space:]]}
  sentence_uppercase=${sentence^[a-z]}
  str=${str/$sentence/$sentence_uppercase}
done

IFS=$IFS_BAK

$ printf '%s\n' "$str"

First. Second; third. Lost? Maybe! No

$ bash --version
GNU bash, version 4.2.37(1)-release (i486-pc-linux-gnu)
[…]
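
For comparison, the character-by-character approach of the awk script posted earlier in the thread can also be sketched with bash builtins only, including the double-quote add-on. This is an illustration under assumptions, not something a participant posted: it needs bash 4 or later, and the function name capitalize_text and its variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: capitalize sentence starts, plus every word
# that appears between double quotes.
capitalize_text() {
    local in=$1 out="" prev="" last_nonblank="" curr
    local -i i in_quotes=0
    for (( i = 0; i < ${#in}; i++ )); do
        curr=${in:i:1}
        # Toggle the quote state on every double quote.
        if [[ $curr == '"' ]]; then
            in_quotes=$(( ! in_quotes ))
        fi
        # Uppercase at the start of the string, after . ? ! (blanks skipped),
        # or at the start of a word while inside quotes.
        if [[ -z $prev || $last_nonblank == [.?!] ]] ||
           { (( in_quotes )) && [[ $prev != [[:alpha:]] ]]; }; then
            curr=${curr^}
        fi
        if [[ $curr != [[:blank:]] ]]; then
            last_nonblank=$curr
        fi
        prev=$curr
        out+=$curr
    done
    printf '%s\n' "$out"
}

capitalize_text 'first. second; third. "the prince of arabia" lost? maybe! no'
# prints: First. Second; third. "The Prince Of Arabia" lost? Maybe! No
```

Unlike the substitution loop above, this variant does not depend on IFS splitting and does not treat the text as a pattern.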

Ed Morton

Sep 12, 2012, 1:06:09 PM

On 9/12/2012 8:46 AM, Thomas 'PointedEars' Lahn wrote:
> Ed Morton wrote:
>
>> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
>>> Thanks everybody. I was afraid my supposition was too far from what bash
>>> could give me.
>>>
>>> In any case, I appreciated your efforts.
>>
>> You did see that I gave you a working solution, right?
>
> A solution that requires more than bash (it requires awk).

I've yet to see a bash installation that didn't come with awk. Also bash, like
all shells, is just an environment from which to call tools. Forcing yourself
to stick with only bash builtins or whatever you think constitutes "doing it
in bash" is just pointless (no offense).

> That said, I
> still think it can be done in bash, but less efficiently than in awk &
> friends.
>

Well, yes, I expect with enough effort anything you can do using sed, grep,
awk, etc. could be done "in bash" but why bother?

Ed.

<OT>
P.S. Apologies to anyone who's recently received a personal email from me in
response to a usenet posting. Thunderbird recently changed their interface for
responding to NGs such that instead of "Reply" replying to the posting in
usenet, "Reply" now replies to the email address of the poster and you need to
click on "Followup" instead of "Reply" for your reply to go to usenet. So I'm
now constantly finding myself replying to people by email instead of replying
on usenet and then having to go copy/paste the email I sent into a NG
"Followup". It's absolutely infuriating and I think after 15 or so years of
using Netscape/Thunderbird I'm going to have to abandon it as a newsreader as
the odds of me ever remembering to go look for a button other than Reply, even
if I wanted to, are pretty remote.
</OT>


Janis Papanagnou

Sep 12, 2012, 2:26:01 PM

On 12.09.2012 19:06, Ed Morton wrote:
>
> <OT>
> P.S. Apologies to anyone who's recently received a personal email from me in
> response to a usenet posting. Thunderbird recently changed their interface for
> responding to NGs such that instead of "Reply" replying to the posting in
> usenet, "Reply" now replies to the email address of the poster and you need to
> click on "Followup" instead of "Reply" for your reply to go to usenet. So I'm
> now constantly finding myself replying to people by email instead of replying
> on usenet and then having to go copy/paste the email I sent into a NG
> "Followup". It's absolutely infuriating and I think after 15 or so years of
> using Netscape/Thunderbird I'm going to have to abandon it as a newsreader as
> the odds of me ever remembering to go look for a button other than Reply, even
> if I wanted to, are pretty remote.
> </OT>

Stumbled into that as well, but noticed that you can change the GUI interface
of Thunderbird; right-click context-menu "customize", and drag the annoying
button onto the window box. Fixed it for me.
That said; switching to another newsreader might be advantageous nonetheless.

Janis

Kaz Kylheku

Sep 12, 2012, 4:01:50 PM

On 2012-09-12, Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> Stumbled into that as well, but noticed that you can change the GUI interface
> of Thunderbird; right-click context-menu "customize", and drag the annoying
> button onto the window box. Fixed it for me.
> That said; switching to another newsreader might be advantageous nonetheless.

You two, of all people, should be using true "Unixy" newsreader of some sort.

Sheesh ... :)

Ed Morton

Sep 12, 2012, 5:21:02 PM

I'm open to any suggestions and I've tried lynx and rn but lynx is pretty clunky
and I've never successfully chanted the right magical incantations to get rn to
work, e.g. from "rn"

Connecting to
EXPORT_NNTPSERVER_TO_SPECIFY_SERVER_NAME...EXPORT_NNTPSERVER_TO_SPECIFY_SERVER_NAME:
Unknown host.
failed.
[Type space to continue]
[Type space to continue] [Type space to continue] [Type space to continue]
Couldn't open any newsrc groups. Is your access file ok?

I'm not looking to debug that, just demonstrating one of the issues. I'm happy
enough with Thunderbird (and webuse.net when I can't use that) assuming I can
get rid of their infuriating new "Followup" button (or get rid of the "Reply"
one if I'm forced to keep "Followup") by following Janis' instructions.

Ed.


Kaz Kylheku

Sep 12, 2012, 6:11:10 PM

On 2012-09-12, Ed Morton <morto...@gmail.com> wrote:
> Kaz Kylheku <k...@kylheku.com> wrote:
>
>> On 2012-09-12, Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>> > Stumbled into that as well, but noticed that you can change the GUI interface
>> > of Thunderbird; right-click context-menu "customize", and drag the annoying
>> > button onto the window box. Fixed it for me.
>> > That said; switching to another newsreader might be advantageous nonetheless.
>>
>> You two, of all people, should be using true "Unixy" newsreader of some sort.
>>
>> Sheesh ... :)
>
> I'm open to any suggestions and I've tried lynx and rn but lynx is pretty clunky
> and I've never successfully chanted the right magical incantations to get rn to
> work, e.g. from "rn"

I last used plain rn in about 1993 on an HP-UX 9 machine.

Around that time I also compiled "strn" and got it running: rn extended with
threading (trn) and scoring (strn). So for a little while I used that.

I loved the thread navigation UI in [s]trn, where you could see a small
slice of the discussion thread as a kind of ASCII diagram:

(1)+-<2>
   \-[3]+-[4]
        \-[5]

or something like that and move around in the tree.

I haven't kept up with the history. Is it just "rn" now, or did it stay forever
forked into "trn" and "strn"?

For years and years now, though, I have been using slrn.

Even if these programs had no other inherent advantages, just being able to use
a decent text editor is worth it.

Chris F.A. Johnson

Sep 12, 2012, 6:16:36 PM

On 2012-09-12, Ed Morton wrote:
>
> On 9/12/2012 8:46 AM, Thomas 'PointedEars' Lahn wrote:
>> Ed Morton wrote:
>>
>>> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
>>>> Thanks everybody. I was afraid my supposition was too far from what bash
>>>> could give me.
>>>>
>>>> In any case, I appreciated your efforts.
>>>
>>> You did see that I gave you a working solution, right?
>>
>> A solution that requires more than bash (it requires awk).
>
> I've yet to see a bash installation that didn't come with awk. Also bash, like
> all shells, is just an environment from which to call tools. Forcing yourself
> to stick with only bash builtins or whatever you think constitutes "doing it
> in bash" is just pointless (no offense).

Nonsense.

> That said, I
>> still think it can be done in bash, but less efficiently than in awk &
>> friends.
>>
>
> Well, yes, I expect with enough effort anything you can do using sed, grep,
> awk, etc. could be done "in bash" but why bother?

Because it is orders of magnitude more efficient to do it in the shell.

--
Chris F.A. Johnson, author <http://shell.cfajohnson.com/>
===================================================================
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
Pro Bash Programming: Scripting the GNU/Linux Shell (2009, Apress)

Ed Morton

Sep 12, 2012, 6:19:25 PM

From work where I use UNIX the most:

$ slrn
slrn 0.9.6.2 (May 22 2000 11:03:47)
You need to set the NNTPSERVER environment variable to your server name.
Example (csh): setenv NNTPSERVER my.news.server
slrn fatal error:
Unable to select server/post object.

which just reminded me that I can't access any external email or news server
from work so I'm still going to be using webuse.net from there anyway.

From home I use cygwin for UNIXy stuff:

$ slrn
bash: slrn: command not found

I daresay I could find a cygwin version of slrn but I expect I'd still have
to go through more setup steps than I care to and at the end of the day I simply
like using the same tool for email and netnews so I'm not going to do any more
investigating unless I find I can no longer make Thunderbird behave reasonably.

Thanks Kaz and Janis for the responses and tips.

Ed.


Ed Morton

Sep 12, 2012, 6:42:11 PM

Chris F.A. Johnson <cfajo...@gmail.com> wrote:

> On 2012-09-12, Ed Morton wrote:
> >
> > On 9/12/2012 8:46 AM, Thomas 'PointedEars' Lahn wrote:
> >> Ed Morton wrote:
> >>
> >>> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
> >>>> Thanks everybody. I was afraid my supposition was too far from what bash
> >>>> could give me.
> >>>>
> >>>> In any case, I appreciated your efforts.
> >>>
> >>> You did see that I gave you a working solution, right?
> >>
> >> A solution that requires more than bash (it requires awk).
> >
> > I've yet to see a bash installation that didn't come with awk. Also bash, like
> > all shells, is just an environment from which to call tools. Forcing yourself
> > to stick with only bash builtins or whatever you think constitutes "doing it
> > in bash" is just pointless (no offense).
>
> Nonsense.
>
> > That said, I
> >> still think it can be done in bash, but less efficiently than in awk &
> >> friends.
> >>
> >
> > Well, yes, I expect with enough effort anything you can do using sed, grep,
> > awk, etc. could be done "in bash" but why bother?
>
> Because it is orders of magnitude more efficient to do it in the shell.
>

Theory and practice are identical in theory but in practice they are quite
different.

1) Any script you write will almost certainly run "fast enough" in awk (or
whatever), and
2) Medium-to-Large shell scripts tend to enforce poor architectural
choices so while individual commands may be faster than their awk (or
whatever) equivalent, the script constructed from them probably won't be
noticeably faster, if at all.
3) If you need it to run that fast, just do it in C or some other compiled
language and it'll probably be faster than your shell script and better structured.

If you're just talking about one-liners like:

$ x="foo/bar"; echo "${x%/*}"
foo

instead of:

$ x="foo/bar"; echo "$x" | sed 's?/.*??'
foo

then I'd do that mainly because it's briefer rather than faster but whatever you
prefer and in any case that wasn't the context in this thread.
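
The two one-liners can be timed against each other with a rough sketch like the following. The wrapper names strip_builtin and strip_sed and the iteration count are arbitrary choices for illustration, and the timings depend entirely on the system, so none are shown here.

```shell
#!/usr/bin/env bash
x="foo/bar"
n=200   # arbitrary iteration count

# Builtin variant: parameter expansion, no extra process.
strip_builtin() { printf '%s\n' "${1%/*}"; }

# External variant: one sed process per call.
strip_sed() { printf '%s\n' "$1" | sed 's?/.*??'; }

# bash's "time" keyword reports on stderr.
time for (( i = 0; i < n; i++ )); do strip_builtin "$x" > /dev/null; done
time for (( i = 0; i < n; i++ )); do strip_sed "$x" > /dev/null; done
```

Note that the two are only equivalent for a single slash: for "a/b/c" the expansion yields "a/b", while the sed command yields "a".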

If you'd care to write a bash-builtins-only equivalent of the posted awk
script so we can compare their performance on a 1,000-line file, that'd be
great, although the awk one runs in about half a second on a file like
that so chances are my point "1" above applies anyway to the OPs real situation.

Ed.


Janis Papanagnou

Sep 12, 2012, 8:38:34 PM

Yeah, and I really feel bad about my laziness to switch. Decades ago I used
nn, and I loved it. At some point the company I worked for at that time made
it impossible to use nn any more (don't recall the details, but I think a
system update made nn ineffective). Switched to... - was it r/tin? - another
NNTP client then; not as fine as nn. Anyway... With the advent of companies
and management that forced us to use WinDOS, all that became history, sadly.
Then with Linux/GNU, Thunderbird worked out of the box, other mail and news
clients had some issues in my environment (originally SuSE, <shudder>). Last
time I tried nn on Xubuntu there was again some issue (too many dependencies,
or something else? Don't recall). Thunderbird is a compromise. Works for me.

Janis

>
> Sheesh ... :)
>

Ed Morton

Sep 12, 2012, 10:38:14 PM

So.... when I moved the "Reply" button off my window I was left with just the
"Followup" button which posted my response to usenet which is what I wanted.
Eureka you might think but sadly no. When I went to read my email afterwards I
found that the "Reply" button was gone from not just usenet but from my email
too and all I had left was the "Reply All" button. A bit of investigation turned
up that what I REALLY had left was the inappropriately named Thunderbird "Smart
Reply" button which was deciding to appear as "Forward" for usenet and "Reply
All" by default for emails with multiple recipients. I can click on a drop-down
next to it to find "Reply".

So, unless someone can tell me how to get rid of that "Smart Reply" button I
have a choice between 2 evils - keep the "Reply" button and remember to look for
a "Smart Reply/Forward" button instead of "Reply" when responding to usenet, or
remove the "Reply" button and remember to click on the drop-down next to "Smart
Reply/Reply All" to find plain old "Reply" when responding to email with
multiple recipients.

Ah, I fondly remember about a week ago before the latest Thunderbird update when
"Reply" just did what I wanted for email and usenet....

I'm following up with a Thunderbird community forum post but not real
optimistic. Thanks for the pointers.

Ed.

Ed Morton

Sep 12, 2012, 10:51:08 PM

Eureka! If I disable every button on the email header window using "Customize"
and then enable them on the "Mail Toolbar" instead then I get the old
functionality back! Now I'm safe until the next Thunderbird update. Sorry for
the OT postings.

Ed.

Martin Vaeth

Sep 13, 2012, 2:55:52 AM

Ed Morton <morto...@gmail.com> wrote:
>
> Theory and practice are identical in theory but in practice they are quite
> different.
>
> 1) Any script you write will almost certainly run "fast enough" in awk (or
> whatever), and
> 2) Medium-to-Large shell scripts tend to enforce poor architectural
> choices so while individual commands may be faster than their awk (or
> whatever) equivalent, the script constructed from them probably won't be
> noticeably faster, if at all.

There are several other problems with awk than only the speed:

1. It is not that standardized; probably there are also versions with
various kind of bugs in the wild.

2. There are several built-in limitations (depending on the awk version).

So, for a more complex task, I would recommend to use immediately a language
which does not have these problems (e.g. perl, python).

> 3) If you need it to run that fast, just do it in C or some other compiled
> language and it'll probably be faster than your shell script and better
> structured.
>
> If you're just talking about one-liners like:
>
> $ x="foo/bar"; echo "${x%/*}"
> foo
>
> instead of:
>
> $ x="foo/bar"; echo "$x" | sed 's?/.*??'
> foo
>
> then I'd do that mainly because it's briefer rather than faster

In such cases, it really is a considerable speed issue.
What is even worse is that the one-liner with external calls is
error-prone. A trivial example is x='-n' in your code.
A less trivial example is that even if you avoid "echo" you
get the wrong result without additional measures:

x="foo

/bar" && y=`printf '%s' "${x}"| sed 's?/.*??'`

The trailing two "\n" are lost in y. So what you have to do
is use something like

y=`printf '%sX' "${x}" | sed 's?/.*??'`
y=${y%X}

Martin Vaeth

Sep 13, 2012, 2:58:02 AM

Ed Morton <morto...@gmail.com> wrote:
>
> Theory and practice are identical in theory but in practice they are quite
> different.
>
> 1) Any script you write will almost certainly run "fast enough" in awk (or
> whatever), and
> 2) Medium-to-Large shell scripts tend to enforce poor architectural
> choices so while individual commands may be faster than their awk (or
> whatever) equivalent, the script constructed from them probably won't be
> noticeably faster, if at all.

There are several other problems with awk than only the speed:

1. It is not that standardized; probably there are also versions with
various kind of bugs in the wild.

2. There are several built-in limitations (depending on the awk version).

So, for a more complex task, I would recommend to use immediately a language
which does not have these problems (e.g. perl, python).

> 3) If you need it to run that fast, just do it in C or some other compiled
> language and it'll probably be faster than your shell script and better
> structured.
>
> If you're just talking about one-liners like:
>
> $ x="foo/bar"; echo "${x%/*}"
> foo
>
> instead of:
>
> $ x="foo/bar"; echo "$x" | sed 's?/.*??'
> foo
>
> then I'd do that mainly because it's briefer rather than faster

In such cases, it really is a considerable speed issue.
What is even worse is that the one-liner with external calls is
error-prone. A trivial example is x='-n' in your code.
A less trivial example is that even if you avoid "echo" you
get the wrong result without additional measures:

x="foo

/bar" && y=`printf '%s' "${x}"| sed 's?/.*??'`

The trailing two "\n" are lost in y. So what you have to do
is use something like

y=`printf '%s' "${x}" | sed 's?/.*??' ; echo X`
y=${y%X}
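
The effect can be reproduced with a short sketch; the variable names y, z and b are illustrative. Whether sed itself appends one more newline to an unterminated last line differs between implementations, so only the comparison between the variants is relied on here.

```shell
#!/usr/bin/env bash
x=$'foo\n\n/bar'

# Plain command substitution: all trailing newlines are stripped.
y=$(printf '%s' "$x" | sed 's?/.*??')

# Sentinel variant: the X printed after sed's output protects the
# newlines before it; the X is then removed again.
z=$(printf '%s' "$x" | sed 's?/.*??'; echo X)
z=${z%X}

# Builtin variant for reference: no command substitution, nothing is lost.
b=${x%/*}

printf 'plain: %d chars, sentinel: %d chars, builtin: %d chars\n' \
    "${#y}" "${#z}" "${#b}"
```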

Carlo Zanziba

Sep 13, 2012, 3:02:05 AM

Sure. It works, and fast!

Thanks.

Carlo

>
> Ed.

Martin Vaeth

Sep 13, 2012, 3:02:40 AM

Ed Morton <morto...@gmail.com> wrote:
>
> Theory and practice are identical in theory but in practice they are quite
> different.
>
> 1) Any script you write will almost certainly run "fast enough" in awk (or
> whatever), and
> 2) Medium-to-Large shell scripts tend to enforce poor architectural
> choices so while individual commands may be faster than their awk (or
> whatever) equivalent, the script constructed from them probably won't be
> noticeably faster, if at all.

There are several other problems with awk than only the speed:

1. It is not that standardized; probably there are also versions with
various kind of bugs in the wild.

2. There are several built-in limitations (depending on the awk version).

So, for a more complex task, I would recommend to use immediately a language
which does not have these problems (e.g. perl, python).

> 3) If you need it to run that fast, just do it in C or some other compiled
> language and it'll probably be faster than your shell script and better
> structured.
>
> If you're just talking about one-liners like:
>
> $ x="foo/bar"; echo "${x%/*}"
> foo
>
> instead of:
>
> $ x="foo/bar"; echo "$x" | sed 's?/.*??'
> foo
>
> then I'd do that mainly because it's briefer rather than faster

In such cases, it really is a considerable speed issue.
What is even worse is that the one-liner with external calls is

error-prone. A trivial example is x='-n' in your code.

Janis Papanagnou

Sep 13, 2012, 3:23:33 AM

On 13.09.2012 09:02, Martin Vaeth wrote:
> Ed Morton <morto...@gmail.com> wrote:
>>
>> Theory and practice are identical in theory but in practice they are quite
>> different.
>>
>> 1) Any script you write will almost certainly run "fast enough" in awk (or
>> whatever), and
>> 2) Medium-to-Large shell scripts tend to enforce poor architectural
>> choices so while individual commands may be faster than their awk (or
>> whatever) equivalent, the script constructed from them probably won't be
>> noticeably faster, if at all.
>
> There are several other problems with awk than only the speed:
>
> 1. It is not that standardized; probably there are also versions with
> various kind of bugs in the wild.

In both of those respects we have more issues with shells than with awk.

>
> 2. There are several built-in limitations (depending on the awk version).

Well, mainly the old Solaris awk has, but there you should and can use
the XPG version instead. What other limitations are you thinking of?

>
> So, for a more complex task, I would recommend to use immediately a language
> which does not have these problems (e.g. perl, python).

*cough* - Can't speak for python, but given the evolution of perl versions
I don't know what you are thinking here.

I agree with you that for more complex tasks shells or awk may most likely
not be the preferred way to go.

Janis

>
>> [...]

Martin Vaeth

Sep 13, 2012, 4:14:12 AM

Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>>
>> There are several other problems with awk than only the speed:
>>
>> 1. It is not that standardized; probably there are also versions with
>> various kind of bugs in the wild.
>
> In both of those respects we have more issues with shells than with awk.

For shells, almost all current systems follow POSIX.
I am not sure how it is with awk versions (e.g. in busybox).
I had some bad experiences, but do not remember details since this
was many years ago: As mentioned, if I have a more complex task,
I prefer to use a more powerful language like zsh or perl.

>> 2. There are several built-in limitations (depending on the awk version).
>
> Well, mainly the old Solaris awk has, but there you should and can use
> the XPG version instead. What other limitations are you thinking of?

Various buffer/array sizes, maximal string length, number of
open files, etc.
IIRC, these "random" limitations were one of the motivations
to develop perl.

>> So, for a more complex task, I would recommend to use immediately a
>> language which does not have these problems (e.g. perl, python).
>
> *cough* - Can't speak for python, but given the evolution of perl versions
> I don't know what you are thinking here.

It is hard to find an old perl script of the "style" and complexity of
typical awk scripts which breaks with a newer perl version.
I had a huge collection of such scripts with perl3 (the oldest version
which I know), and only for one I got once a warning that some
tricky inline expansion of @ might go wrong with perl4.
With perl4->perl5 I had never any issues.
perl6 might be a different story, of course, if it should ever be finished,
but this will probably never "replace" perl3-5 completely.
Sure, theoretically there are several corner cases, but I think the
perl community did a great job to make sure that "normal" programs
are not affected.

Ed Morton

Sep 13, 2012, 8:49:05 AM

On 9/13/2012 2:02 AM, Martin Vaeth wrote:
> Ed Morton <morto...@gmail.com> wrote:
>>
>> Theory and practice are identical in theory but in practice they are quite
>> different.
>>
>> 1) Any script you write will almost certainly run "fast enough" in awk (or
>> whatever), and
>> 2) Medium-to-Large shell scripts tend to enforce poor architectural
>> choices so while individual commands may be faster than their awk (or
>> whatever) equivalent, the script constructed from them probably won't be
>> noticeably faster, if at all.
>
> There are several other problems with awk than only the speed:
>
> 1. It is not that standardized;

There's a POSIX standard for awk
(http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html) and the
differences between shells are far greater than the differences between newer
awks. As long as it's not heavily using the few extensions specific to GNU awk,
any awk script will work just fine on any other newer awk (e.g. gawk, nawk,
/usr/xpg4/bin/awk) with VERY minor tweaks in rare situations.

> probably there are also versions with various kind of bugs in the wild.

The only buggy version I know of is old, broken awk (/usr/bin/awk on Solaris).

> 2. There are several built-in limitations (depending on the awk version).

I've never hit a built in limitation in any awk except old, broken awk. Maybe if
I wrote really sloppy scripts that don't close files as I'm done with them or
unnecessarily fill up arrays with gigabytes of data I would, I don't know, it's
just never come up.

I think maybe you haven't really looked at awk since having a bad experience with
old, broken awk some time in the past.

> So, for a more complex task, I would recommend to use immediately a language
> which does not have these problems (e.g. perl, python).

But they have other problems such as not being available on every UNIX box by
default and a syntax very different from the Algol-based languages that most
programmers are familiar with.

IMHO if you're parsing text files, use awk. If you need to do file or process
operations or other things the shell does well, THEN either use a mixture of
shell+awk or switch to perl. We're getting kinda off the subject here though -
we were discussing whether or not it makes sense to write large-ish scripts
using bash built-ins rather than awk, perl, etc. not whether to use awk vs perl.

>> 3) If you need it to run that fast, just do it in C or some other compiled
>> language and it'll probably be faster than your shell script and better
>> structured.
>>
>> If you're just talking about one-liners like:
>>
>> $ x="foo/bar"; echo "${x%/*}"
>> foo
>>
>> instead of:
>>
>> $ x="foo/bar"; echo "$x" | sed 's?/.*??'
>> foo
>>
>> then I'd do that mainly because it's briefer rather than faster
>
> In such cases, it really is a considerable speed issue.

That's fine but my point is it rarely matters in real world situations. To be
clear: I'm not advocating avoiding using bash built-ins, I'm advocating not
restricting yourself to using bash built-ins for scripting.

> What is even worse is that the one-liner with external calls is
> error-prone. A trivial example is x='-n' in your code.

Yes, I should have used printf instead of echo. Old habits...
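
A minimal sketch of the x='-n' case, assuming bash's builtin echo with default settings (other shells and echo implementations may behave differently); the variable names are illustrative:

```shell
#!/usr/bin/env bash
x='-n'

# bash's builtin echo consumes -n as an option, so nothing is printed.
with_echo=$(echo "$x")

# printf treats the value purely as data for the %s conversion.
with_printf=$(printf '%s\n' "$x")

printf 'echo gave [%s], printf gave [%s]\n' "$with_echo" "$with_printf"
# prints: echo gave [], printf gave [-n]
```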

> A less trivial example is that even if you avoid "echo" you
> get the wrong result without additional measurements:
>
> x="foo
>
> /bar" && y=`printf '%s' "${x}"| sed 's?/.*??'`

You obviously wouldn't use a line-oriented editor to modify a variable that
contains multi-line text.

> The trailing two "\n" are lost in y. So what you have to do
> is use something like
>
> y=`printf '%s' "${x}" | sed 's?/.*??' ; echo X`
> y=${y%X}

OK. Chances are in reality I'd do something completely different if I had
variables containing multi-line text I had to modify but what that would be I
don't know, it'd depend on the context. Like I say, for the normal case I'd use
the bash operation anyway for its brevity.
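
For reference, the newline stripping and the sentinel workaround quoted above can be reproduced without sed at all. This is only a sketch; the value of x and the names y and z are made up for the illustration:

```bash
# A value that ends in two newlines
x='foo

'

# Command substitution strips every trailing newline from the captured output
y=$(printf '%s' "$x")

# Sentinel trick: emit an extra X so the newlines are no longer trailing,
# then remove the X again with a parameter expansion
z=$(printf '%s' "$x"; echo X)
z=${z%X}

printf 'plain:    %d characters\n' "${#y}"
printf 'sentinel: %d characters\n' "${#z}"
```

The sentinel can be any character; it only has to be something other than a newline.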

I think the OP summed things up best in his most recent post
(https://groups.google.com/d/msg/comp.unix.shell/A0ZdnI-QuKs/IZhLrMx-HMgJ) about
the awk script:

"It works, and fast!"

Regards,

Ed.

Ed Morton

Sep 13, 2012, 9:06:19 AM

On 9/13/2012 3:14 AM, Martin Vaeth wrote:
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>>>
>>> There are several other problems with awk than only the speed:
>>>
>>> 1. It is not that standardized; probably there are also versions with
>>> various kind of bugs in the wild.
>>
>> In both of those respects we have more issues with shells than with awk.
>
> For shells, almost all current systems follow POSIX.

Not sure what that means. There's bash, zsh, ksh, etc. all with significant
differences from each other and within versions of themselves. All modern awks,
on the other hand, are very similar and also follow POSIX.

> I am not sure how it is with awk versions (e.g. in busybox).
> I had some bad experiences, but do not remember details since this
> was many years ago: As mentioned, if I have a more complex task,
> I prefer to use a more powerful language like zsh or perl.

awk is a tool/language for text processing. The above mentioned tools/languages
are not more powerful than awk for text processing, they just provide additional
functionality unrelated to text processing. If the job you need to do is not
text processing then use a tool/language other than or in addition to awk.
Personally I use a mixture of shell plus awk in those situations, but I do
understand others might prefer perl or some other tool for that and maybe there
are some problem domains I personally don't come across where a shell+awk mix is
undesirable.

>>> 2. There are several built-in limitations (depending on the awk version).
>>
>> Well, mainly the old Solaris awk has, but there you should and can use
>> the XPG version instead. What other limitations are you thinking of?
>
> Various buffer/array sizes, maximal string length, number of
> open files, etc.
> IIRC, these "random" limitations were one of the motivations
> to develop perl.

And, I suspect, modern awks. I've never hit a limitation on any of those in a
modern awk. Again, I think you've just had a bad experience some time with old,
broken awk (/usr/bin/awk on Solaris).

>>> So, for a more complex task, I would recommend to use immediately a
>>> language which does not have these problems (e.g. perl, python).
>>
>> *cough* - Can't speak for python, but given the evolution of perl versions
>> I don't know what you are thinking here.
>
> It is hard to find an old perl script of the "style" and complexity of
> typical awk scripts which breaks with a newer perl version.
> I had a huge collection of such scripts with perl3 (the oldest version
> which I know), and only for one I got once a warning that some
> tricky inline expansion of @ might go wrong with perl4.
> With perl4->perl5 I had never any issues.
> perl6 might be a different story, of course, if it should ever be finished,
> but this will probably never "replace" perl3-5 completely.
> Sure, theoretically there are several corner cases, but I think the
> perl community did a great job to make sure that "normal" programs
> are not affected.

The same is true for awk. Old awk scripts will run just fine in newer awks.

Ed.

Martin Vaeth

Sep 13, 2012, 10:55:38 AM

Ed Morton <morto...@gmail.com> wrote:
>
> I think maybe you haven't really looked awk since having a bad experience with
> old, broken awk some time in the past.

Yep. Using perl, things just worked, and since perl can also do most shell
tasks very well, it rarely makes sense to use a cumbersome shell+awk
solution if you can have a more powerful one easily.
As one example, just think of the case that you suddenly realize that you
need a perl-extension for a particular regular expression...
IMHO, the only argument to use awk is if you have restrictions on the
tools you are allowed to use.

> IMHO if you're parsing text files, use awk.

The perl regular expressions alone are an important improvement over awk.
Not to speak about real grammar support which in perl5 is available
through packages and in perl6 even natively.

>> In such cases, it really is a considerable speed issue.
>
> That's fine but my point is it rarely matters in real world situations.

In these one-line examples used for trivial text transformations,
I do not agree: Usually, these are called in iterated loops, and starting
thousands of subprocesses slows down the machine. The problem is not
necessarily the slowdown in your script itself but the non-niceness if
processes with high resource requirements are running simultaneously
(even if the script is nice'd, because of IO and memory load).
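
A rough sketch of the kind of loop meant here, assuming paths of the made-up form /tmp/dirN/fileN and an arbitrary count of 200 iterations. Both loops compute the same result, but the second starts a subshell and a sed process on every iteration:

```bash
count=200

# Variant 1: parameter expansion only, no fork and no exec
i=0; n_builtin=0
while [ "$i" -lt "$count" ]; do
    p="/tmp/dir$i/file$i"
    base=${p##*/}
    [ "$base" = "file$i" ] && n_builtin=$((n_builtin + 1))
    i=$((i + 1))
done

# Variant 2: one command substitution plus one sed process per iteration
i=0; n_external=0
while [ "$i" -lt "$count" ]; do
    p="/tmp/dir$i/file$i"
    base=$(printf '%s\n' "$p" | sed 's?.*/??')
    [ "$base" = "file$i" ] && n_external=$((n_external + 1))
    i=$((i + 1))
done

printf 'builtin ok: %s, external ok: %s\n' "$n_builtin" "$n_external"
```

Wrapping each loop in the shell's time keyword shows the difference on a given machine; no figures are claimed here.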

>> A less trivial example is that even if you avoid "echo" you
>> get the wrong result without additional measurements:
>>
>> x="foo
>>
>> /bar" && y=`printf '%s' "${x}"| sed 's?/.*??'`
>
> You obviously wouldn't use a line-oriented editor to modify a variable that
> contains multi-line text.

Actually, I wanted to point out that `...` cuts the trailing newline(s).
But you are right: Even sed will already "fail" with this input, i.e.
the "solution" which I proposed does not work (I had not tried it).
This shows even more what I said: Using external tools is very error-prone.

In most cases, you do not even know whether your variable contains
multi-line text; for filenames you should always be aware that this is
possible. On modern systems, filenames that are not valid UTF-8 might also
break external tools.
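
As a sketch of the builtin alternative for the multi-line value discussed above (the name y is illustrative): the expansion below removes everything from the first "/" onward inside the shell, so no command substitution is involved and the trailing newlines survive.

```bash
x='foo

/bar'

# %%/* removes the longest suffix that starts with "/", i.e. from the first slash on
y=${x%%/*}

# y is now "foo" followed by two newlines: 5 characters
printf 'length of y: %d\n' "${#y}"
```

Note that this is not line-oriented like sed: with several lines after the slash, the expansion removes all of them, whereas sed 's?/.*??' would only cut each line at its own first slash.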

Thomas 'PointedEars' Lahn

Sep 13, 2012, 11:08:43 AM

Ed Morton wrote:

> On 9/12/2012 8:46 AM, Thomas 'PointedEars' Lahn wrote:
>> Ed Morton wrote:
>>> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
>>>> Thanks everybody. I was afraid my supposition was too far from what
>>>> bash could give me.
>>>>
>>>> In any case, I appreciated your efforts.
>>> You did see that I gave you a working solution, right?
>> A solution that requires more than bash (it requires awk).
>
> I've yet to see a bash installation that didn't come with awk.

Perhaps with *a version of* awk, and therein lies a problem already. In any
case, when you encounter one such system you will be glad that you know how
to do it without awk. BTW, (GNU) bash does not need to mean GNU, and
certainly not GNU awk.

> Also bash, like all shells, is just an environment from which to call
> tools.

No, command shells, in particular sh-based and POSIX-compliant shells which
we are discussing here, are programming languages in their own right, and
capable of string manipulation, among other things.

> Forcing yourself to stick with only bash builtins or whatever you think
> constitutes "doing it in bash" is just pointless (no offense).

The OP is asking for a shell solution. You are the one forcing on them an
awk solution, claiming that not using awk is a problem. I fail to see the
logic in that.

> <OT>
> P.S. Apologies to anyone who's recently received a personal email from me
> in response to a usenet posting.

I do not mind getting e-mails as responses to postings, but good to know
that I do not have to reply to this one.

> Thunderbird recently changed their interface for responding to NGs such
> that instead of "Reply" replying to the posting in usenet, "Reply" now
> replies to the email address of the poster and you need to click on
> "Followup" instead of "Reply" for your reply to go to usenet.

I know that several other people have stumbled on that as well. It should
be reported as a bug. The default reply action for any decent newsreader
should be Follow-up (to newsgroup), not Reply (by e-mail). Google Groups
has this wrong as well, but then again what is not broken at Google Groups?

Free advice: Use KNode for NetNews instead.

Thomas 'PointedEars' Lahn

Sep 13, 2012, 11:16:28 AM

Ed Morton wrote:

> On 9/13/2012 3:14 AM, Martin Vaeth wrote:
>> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>>>> There are several other problems with awk than only the speed:
>>>>
>>>> 1. It is not that standardized; probably there are also versions with
>>>> various kind of bugs in the wild.
>>> In both of those respects we have more issues with shells than with awk.
>> For shells, almost all current systems follow POSIX.
>
> Not sure what that means. There's bash, zsh, ksh, etc. all with
> significant differences from each other and within versions of themselves.
> All modern awks, on the other hand, are very similar and also follow
> POSIX.

It means that you can write a POSIX-compliant shell script – like I did –
and all those shells, no matter their special features, will run it, without
extra tools or adaptations.

>> Sure, theoretically there are several corner cases, but I think the
>> perl community did a great job to make sure that "normal" programs
>> are not affected.
>
> The same is true for awk. Old awk scripts will run just fine in newer
> awks.

You appear to have the misconception that there is only one awk.

Ed Morton

Sep 13, 2012, 12:26:49 PM

On 9/13/2012 10:08 AM, Thomas 'PointedEars' Lahn wrote:
> Ed Morton wrote:
>
>> On 9/12/2012 8:46 AM, Thomas 'PointedEars' Lahn wrote:
>>> Ed Morton wrote:
>>>> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
>>>>> Thanks everybody. I was afraid my supposition was too far from what
>>>>> bash could give me.
>>>>>
>>>>> In any case, I appreciated your efforts.
>>>> You did see that I gave you a working solution, right?
>>> A solution that requires more than bash (it requires awk).
>>
>> I've yet to see a bash installation that didn't come with awk.
>
> Perhaps with *a version of* awk, and therein lies a problem already. In any
> case, when you encounter one such system you will be glad that you know how
> to do it without awk. BTW, (GNU) bash does not need to mean GNU, and
> certainly not GNU awk.

I don't believe a UNIX system that doesn't have a newer awk (i.e. not old,
broken awk) available exists and I never said anything about bash meaning GNU,
awk or otherwise.

>> Also bash, like all shells, is just an environment from which to call
>> tools.
>
> No, command shells, in particular sh-based and POSIX-compliant shells which
> we are discussing here, are programming languages in their own right, and
> capable of string manipulation, among other things.

What they're capable of and what they should be used for are 2 different things.
If you have a small shell script doing file movement and/or process creation and
you want to manipulate a string in it then of course you'd use the shell command
to do that, but if you're writing a large program to do text manipulation then
it wouldn't make sense to do it in shell builtins when there are tools/languages
already on your system designed to do the job easier and fast enough.

>> Forcing yourself to stick with only bash builtins or whatever you think
>> constitutes "doing it in bash" is just pointless (no offense).
>
> The OP is asking for a shell solution. You are the one forcing on them an
> awk solution, claiming that not using awk is a problem.

You can call awk from the shell just as you can sed, grep, etc. Posters asking
for a shell solution rarely are asking for a solution only using shell builtins
and even when they are it's usually because they haven't thought about the
alternatives. I'm not claiming that not using awk is a problem, I'm claiming
that using awk is not a problem - there's a big difference.

> I fail to see the
> logic in that.
>
>> <OT>
>> P.S. Apologies to anyone who's recently received a personal email from me
>> in response to a usenet posting.
>
> I do not mind getting e-mails as responses to postings, but good to know
> that I do not have to reply to this one.
>
>> Thunderbird recently changed their interface for responding to NGs such
>> that instead of "Reply" replying to the posting in usenet, "Reply" now
>> replies to the email address of the poster and you need to click on
>> "Followup" instead of "Reply" for your reply to go to usenet.
>
> I know that several other people have stumbled on that as well. It should
> be reported as a bug. The default reply action for any decent newsreader
> should be Follow-up (to newsgroup), not Reply (by e-mail). Google Groups
> has this wrong as well, but then again what is not broken at Google Groups?
>
> Free advice: Use KNode for NetNews instead.
>

Thanks for the tip, I'll google KNode next time Thunderbird removes useful
functionality.

Ed.

Aragorn

Sep 13, 2012, 12:35:21 PM

On Thursday 13 September 2012 18:26, Ed Morton conveyed the following to
comp.unix.shell...

KNode is the native newsreader application of the KDE desktop
environment. It can be installed by itself but it does require parts of
KDE (kde-base) and the Qt libraries.

While it is very good - I've been faithfully using it since 2000 - it
does have a tendency to barf when trying to reply to a thread with a
long reference header. Insofar as I know, this is a problem which was
introduced in the KDE 4.x version of KNode. Previous versions were not
afflicted.

--
= Aragorn =
(registered GNU/Linux user #223157)

Ed Morton

Sep 13, 2012, 12:53:35 PM

On 9/13/2012 9:55 AM, Martin Vaeth wrote:
> Ed Morton <morto...@gmail.com> wrote:
>>
>> I think maybe you haven't really looked awk since having a bad experience with
>> old, broken awk some time in the past.
>
> Yep. Using perl, things just worked, and since perl can also do most shell
> tasks very well, it rarely makes sense to use a cumbersome shell+awk
> solution if you can have a more powerful one easily.

It's not more powerful if it does the same job. It could have more succinct
language constructs maybe? I've never found it in any way cumbersome to use
shell+awk but I suppose YMMV depending on your application. People do use perl,
after all, so there must be some significant attraction to it given the syntax.

> As one example, just think of the case that you suddenly realize that you
> need a perl-extension for a particular regular expressions...

You never need a perl-extension for an RE. I expect there are probably times
when you could make your code more succinct with an extension but again I've
always been able to do what I want in awk with its EREs.

> IMHO, the only argument to use awk is if you have restrictions on the
> tools you are allowed to use.
>
>> IMHO if you're parsing text files, use awk.
>
> The perl regular expressions alone are an important improvement over awk.

Not really, that's just a bit of syntactic sugar AFAIK. I've never missed them
anyway. The only thing related to REs that awk could really stand to have is
back-references in the matched string but that rarely comes up and has a fairly
simple workaround.
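
The workaround is not spelled out in this post. One common approach, shown here only as a sketch and assuming a POSIX awk with match(), RSTART and RLENGTH, is to match the whole pattern and then cut out the wanted part with substr(). The sample line and the name id are invented for the illustration:

```bash
line='user=carlo id=42 shell=bash'

id=$(printf '%s\n' "$line" | awk '
    match($0, /id=[0-9]+/) {
        # RSTART and RLENGTH delimit the whole match "id=42";
        # skip the 3-character prefix "id=" by hand to keep only the digits
        print substr($0, RSTART + 3, RLENGTH - 3)
    }')

printf 'id is %s\n' "$id"
```

In perl or sed the same extraction would typically use a capture group; here the prefix length has to be known.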

> Not to speak about real grammar support which in perl5 is available
> through packages and in perl6 even natively.
>
>>> In such cases, it really is a considerable speed issue.
>>
>> That's fine but my point is it rarely matters in real world situations.
>
> In these one-line examples used for trivial text transformations,
> I do not agree: Usually, these are called in iterated loops, and starting
> thousands of subthreads slows down the machine. The problem is not
> necessarily the slowdown in your script itself but the non-niceness if
> processes with high resources requirements are running simultaneously
> (even if the script is nice'd, because of IO and memory load).

I would also recommend using a shell builtin in that situation, I just don't
recommend writing shell scripts with loops being iterated through thousands of
times in the first place. If you do find yourself in that situation though then
yes, of course you should look at ways to optimize your script, including
changing the way you manipulate strings.

>>> A less trivial example is that even if you avoid "echo" you
>>> get the wrong result without additional measurements:
>>>
>>> x="foo
>>>
>>> /bar" && y=`printf '%s' "${x}"| sed 's?/.*??'`
>>
>> You obviously wouldn't use a line-oriented editor to modify a variable that
>> contains multi-line text.
>
> Actually, I wanted to point out that `...` cuts the trailing newline(s).
> But you are right: Even sed will already "fail" with this input, i.e.
> the "solution" which I proposed does not work (I had not tried it).
> This shows even more what I said: Using external tools is very error-prone.
>
> In most cases, you do not even know whether your variable contains
> multi-line text; for filenames you should always be aware that this is
> possible. On modern systems also filenames in non-valid utf8 might
> be a possibility for breakage with external tools.

Fine, no argument here on relative functionality provided by bash builtins vs
a pipe to a specific command.

Look, all I'm really saying is that for most real world text processing
applications, you don't need to write a script using shell builtins. Write it in
awk (or perl or ruby or whatever) and it'll be easier to write and will almost
certainly run fast enough for your needs.

Ed.

Janis Papanagnou

Sep 13, 2012, 2:08:03 PM

On 13.09.2012 10:14, Martin Vaeth wrote:
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>>>
>>> There are several other problems with awk than only the speed:
>>>
>>> 1. It is not that standardized; probably there are also versions with
>>> various kind of bugs in the wild.
>>
>> In both of those respects we have more issues with shells than with awk.
>
> For shells, almost all current systems follow POSIX.

The prominent ones do. Nonetheless even some basic constructs aren't
standardised, which results in different behaviour; the most prominent
example is the "pipe processes executed in subprocess" discrepancy.
And also you would have to limit yourself to the small POSIX subset,
where the shells provide powerful extensions that you want to use,
e.g. to make your programs more efficient and much more readable
and maintainable, not to mention to implement functionality that you
just cannot implement with POSIX shell.
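
A small sketch of that pipe discrepancy (the variable names and strings are made up). Which value x ends up with depends on whether the shell runs the last element of a pipeline in a subshell:

```bash
x=initial
printf 'from-pipe\n' | read -r x

# bash (default settings) and dash: read ran in a subshell, x is still "initial"
# ksh93 and zsh: read ran in the current shell, x is "from-pipe"
printf 'x=%s\n' "$x"

# A here-document keeps read in the current shell in all of these shells
read -r y <<EOF
from-heredoc
EOF
printf 'y=%s\n' "$y"
```

The here-document form behaves the same everywhere, which is why portable scripts tend to avoid piping into read.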

> I am not sure how it is with awk versions (e.g. in busybox).

The basic awk language is quite terse and has been stable for decades.
There are a couple useful extensions, e.g. by gawk, but you don't
need to use those to do, say, 98% of your tasks - well, at least I
usually don't need them in most cases; I only use them because they
are quite handy.

> I had some bad experiences, but do not remember details since this
> was many years ago: As mentioned, if I have a more complex task,
> I prefer to use a more powerful language like zsh or perl.

I know the advantages (and disadvantages) of using perl. I fully
understand that a lot of people like to use it.

>
>>> 2. There are several built-in limitations (depending on the awk version).
>>
>> Well, mainly the old Solaris awk has, but there you should and can use
>> the XPG version instead. What other limitations are you thinking of?
>
> Various buffer/array sizes, maximal string length, number of
> open files, etc.
> IIRC, these "random" limitations were one of the motivations
> to develop perl.

I don't recall such limitations in older awks, and I've never stumbled
across them.

I know professional perl programmers who find it helpful to read in
the whole data before starting processing; in such cases, with that
approach, one may more often than others encounter buffer limitations
when using some probably old version of any limited text processor.

Janis

> [...]

Janis Papanagnou

Sep 13, 2012, 2:16:52 PM

On 13.09.2012 18:35, Aragorn wrote:
>
> KNode is the native newsreader application of the KDE desktop
> environment. It can be installed by itself but it does require parts of
> KDE (kde-base) and the Qt libraries.
>
> While it is very good - I've been faithfully using it since 2000 - it
> does have a tendency to barf when trying to reply to a thread with a
> long reference header. Insofar as I know, this is a problem which was
> introduced in the KDE 4.x version of KNode. Previous versions were not
> afflicted.

This is a funny advertisement. Usually proponents of their Favourite Tools
don't point out that they operate unreliably and should thus, honestly, be
considered crap. :-)

Janis

Thomas 'PointedEars' Lahn

Sep 13, 2012, 3:48:34 PM

As we are off-topic already:

FWIW, KNode/4.4.11 has a tendency to crash on *submitting* messages, indeed.
I did not know that this was related to long References, but that makes
sense (where I am subscribed, the threads tend to get long, and I do not
remember it crashing on posting a follow-up to an OP). I have read that
KNode/4.8.x is more stable, but AFAICS it has not arrived in Debian yet.

However, knowing that KNode/4.4.x may occasionally crash then, I save often
and early. KNode's scoring and other NetNews-related capabilities by far
outperform those of Thunderbird, which saves me a lot of free time on
Usenet. So I would not want to switch back to Thunderbird for news (I have
used it primarily for several years in the last decade, certainly when for
reasons that now are beyond me I was using Windows regularly) despite some
bugs KNode undoubtedly has.

I am keeping Thunderbird as a good e-mail client, though. It remains to be
seen whether Thunderbird will be positively or negatively affected by its
recent change of development status (Mozilla.com considers it finished –
obviously, I do not –, community volunteers are supposed to take over) in
comparison to KMail and KNode, both of which are still actively developed by
the KDE.org people.

Since I do not know where this could be on-topic, F'up2 poster.
Please modify if you know a fitting newsgroup.

Aragorn

Sep 13, 2012, 8:03:32 PM

On Thursday 13 September 2012 20:16, Janis Papanagnou conveyed the
following to comp.unix.shell...

I was not /advertising/ KNode. The previous poster indicated that he
didn't know what it was because he said "he'd have to Google it", so I
gave him a brief summary of what it is.

Thomas 'PointedEars' Lahn

Sep 13, 2012, 8:45:25 PM

Ed Morton wrote:

> On 9/13/2012 10:08 AM, Thomas 'PointedEars' Lahn wrote:
>> Ed Morton wrote:
>>> On 9/12/2012 8:46 AM, Thomas 'PointedEars' Lahn wrote:
>>>> Ed Morton wrote:
>>>>> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
>>>>>> Thanks everybody. I was afraid my supposition was too far from what
>>>>>> bash could give me.
>>>>>>
>>>>>> In any case, I appreciated your efforts.
>>>>> You did see that I gave you a working solution, right?
>>>> A solution that requires more than bash (it requires awk).
>>> I've yet to see a bash installation that didn't come with awk.
>> Perhaps with *a version of* awk, and therein lies a problem already. In
>> any case, when you encounter one such system you will be glad that you
>> know how to do it without awk. BTW, (GNU) bash does not need to mean
>> GNU, and certainly not GNU awk.
>
> I don't believe a UNIX system that doesn't have a newer awk (i.e. not old,
> broken awk) available exists and I never said anything about bash meaning
> GNU, awk or otherwise.

Your beliefs are of secondary interest here; individual experiences are not
representative. What matters is that it can happen and has happened, and
you will be better off knowing alternatives to awk than not knowing them
then. Preferably those that do not involve other tools; every tool is a
dependency – dependencies are bad[tm].

>>> Also bash, like all shells, is just an environment from which to call
>>> tools.
>> No, command shells, in particular sh-based and POSIX-compliant shells
>> which we are discussing here, are programming languages in their own
>> right, and capable of string manipulation, among other things.
>
> What they're capable of and what they should be used for are 2 different
> things. If you have a small shell script doing file movement and/or
> process creation and you want to manipulate a string in it then of course
> you'd use the shell command to do that, but if you're writing a large
> program to do text manipulation then it wouldn't make sense to do it in
> shell builtins when there are tools/languages already on your system
> designed to do the job easier and fast enough.

Well, when I have a string value in a shell variable and want to manipulate
that string like here, I try to use shell builtins first. I do *not* printf
the quoted variable value so that I can pipe it into awk, let awk do the
wasteful job of processing one line, and then write awk's stdout back to a
shell variable. KISS. I have shown how simple and still efficient it can
be.
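
To make the comparison concrete, here is a sketch (not the solution posted earlier in the thread) that capitalizes the first character of a variable both ways. It assumes bash 4 or later for the ${str^} expansion; the names cap_builtin and cap_awk are illustrative:

```bash
str='first. second'

# Builtin: bash 4+ case-modification expansion, no child process involved
cap_builtin=${str^}

# Round trip: a subshell and an awk process just to handle one line
cap_awk=$(printf '%s\n' "$str" |
    awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }')

printf 'builtin: %s\n' "$cap_builtin"
printf 'awk:     %s\n' "$cap_awk"
```

Both variants give the same string here; the awk one would also lose trailing newlines in the value because of the command substitution.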

>>> Forcing yourself to stick with only bash builtins or whatever you think
>>> constitutes "doing it in bash" is just pointless (no offense).
>> The OP is asking for a shell solution. You are the one forcing on them
>> an awk solution, claiming that not using awk is a problem.
>
> You can call awk from the shell just as you can sed, grep, etc. Posters
> asking for a shell solution rarely are asking for a solution only using
> shell builtins and even when they are it's usually because they haven't
> thought about the alternatives.

Or because they do not care because they have with the command shell a
rather decent programming language already, and are posting to the newsgroup
dealing with *shell* scripting. You are projecting.

> I'm not claiming that not using awk is a problem, I'm claiming that using
> awk is not a problem - there's a big difference.

And I say it is. Your problem appears to be that all you know about shell
scripting is not shell scripting at all but awk scripting, so you use and
recommend only awk – in a rather unnecessarily cryptic way, I might add – to
solve problems, regardless of whether it is necessary to use awk or awk is the
right tool for the job.

This is comp.unix.*shell*, after all, so the possibilities of using shell
syntax and shell builtins should be first on a regular's mind. If you want
to discuss awk, there is comp.lang.*awk* instead. If you want to discuss
perl, there is comp.lang.*perl*.ALL. And so on.

Janis Papanagnou

Sep 14, 2012, 2:58:15 AM

On 14.09.2012 02:03, Aragorn wrote:
> On Thursday 13 September 2012 20:16, Janis Papanagnou conveyed the
> following to comp.unix.shell...
>> On 13.09.2012 18:35, Aragorn wrote:

>>>[...]
>>> While it is very good - I've been faithfully using it since 2000 - [...]
>>
>> This is a funny advertisement. [...] :-)
>
> I was not /advertising/ KNode. [...]

If that statement above was not an advertising statement, what shall I
believe. Anyway...

Janis

Aragorn

Sep 14, 2012, 7:29:53 AM

On Friday 14 September 2012 08:58, Janis Papanagnou conveyed the
following to comp.unix.shell...

> On 14.09.2012 02:03, Aragorn wrote:
>> On Thursday 13 September 2012 20:16, Janis Papanagnou conveyed the
>> following to comp.unix.shell...
>>> On 13.09.2012 18:35, Aragorn wrote:
>>>>[...]
>>>> While it is very good - I've been faithfully using it since 2000 -
>>>> [...]
>>>
>>> This is a funny advertisement. [...] :-)
>>
>> I was not /advertising/ KNode. [...]
>
> If that statement above was not an advertising statement, what shall I
> believe. Anyway...

It was a neutral summary, and my attempt to express that. I do choose
my words poorly every now and then. Has to do with the fact that I'm
both autistic and someone who speaks multiple different languages on an
almost daily basis.

I do not tell lies. You may trust me on my word. I'm a man of honor.

Ed Morton

Sep 14, 2012, 1:02:27 PM

You appear to be advocating avoiding using anything because, on some system
you'll probably never use, something could be missing. Brilliant! I fail to see
how coding in bash and then porting it to a machine whose shell is ksh88i would
be any better.

> >>> Also bash, like all shells, is just an environment from which to call
> >>> tools.
> >> No, command shells, in particular sh-based and POSIX-compliant shells
> >> which we are discussing here, are programming languages in their own
> >> right, and capable of string manipulation, among other things.
> >
> > What they're capable of and what they should be used for are 2 different
> > things. If you have a small shell script doing file movement and/or
> > process creation and you want to manipulate a string in it then of course
> > you'd use the shell command to do that, but if you're writing a large
> > program to do text manipulation then it wouldn't make sense to do it in
> > shell builtins when there are tools/languages already on your system
> > designed to do the job easier and fast enough.
>
> Well, when I have a string value in a shell variable and want to manipulate
> that string like here, I try to use shell builtins first. I do *not* printf
> the quoted variable value so that I can pipe it into awk, let awk do the
> wasteful job of processing one line, and then write awk's stdout back to a
> shell variable. KISS. I have showed how simple and still efficient it can
> be.

If you care to check back earlier in this thread, using the shell builtin for
that is exactly what I recommended and I never mentioned awk as an alternative.
I did show an alternative sed solution that I said I would not use.

> >>> Forcing yourself to stick with only bash builtins or whatever you think
> >>> constitutes "doing it in bash" is just pointless (no offense).
> >> The OP is asking for a shell solution. You are the one forcing on them
> >> an awk solution, claiming that not using awk is a problem.
> >
> > You can call awk from the shell just as you can sed, grep, etc. Posters
> > asking for a shell solution rarely are asking for a solution only using
> > shell builtins and even when they are it's usually because they haven't
> > thought about the alternatives.
>
> Or because they do not care because they have with the command shell a
> rather decent programming language already, and are posting to the newsgroup
> dealing with *shell* scripting. You are projecting.

No, it happens VERY frequently that someone posts "how can I do X in shell" or
"how can I do Y with sed" and after being asked they admit that they really
couldn't care less what tools/language the solution uses as long as it's
something they can call from their shell.

> > I'm not claiming that not using awk is a problem, I'm claiming that using
> > awk is not a problem - there's a big difference.
>
> And I say it is. Your problem appears to be that all you know about shell
> scripting is not shell scripting at all but awk scripting, so you use and
> recommend only awk – in a rather unnecessarily cryptic way, I might add – to
> solve problems, regardless of whether it is necessary to use awk or awk is the
> right tool for the job.

Well, I've been shell scripting for 30 years and awk scripting for 10 so I do
know a bit about shell scripting too and I do on occasion post a response that
isn't related to awk. The reason most of my posts are about awk, though, is that
I tend to only reply to the ones where awk is a reasonable solution as it seems
like that's where I have most to contribute. You may disagree on whether or not
awk is the best tool for any given job, but then you can always post an
alternative and let the OP decide. I do wonder what you think I've posted that's
cryptic though.

> This is comp.unix.*shell*, after all, so the possibilities of using shell
> syntax and shell builtins should be first on a regular's mind. If you want
> to discuss awk, there is comp.lang.*awk* instead. If you want to discuss
> perl, there is comp.lang.*perl*.ALL. And so on.

So you'll be advocating comp.lang.sed, comp.lang.diff, etc. for all the other
commonly used shell commands? No, the first thing on people's minds should be how
to get the job done as quickly and succinctly as possible using the commands
available in their shell and that includes grep, diff, sed, awk, etc. Making use
of only shell builtins a priority would be like a construction worker trying to
do everything with a hammer and screwdriver rather than picking up a nail gun
and drill.

Ed.

Janis Papanagnou

Sep 14, 2012, 5:25:27 PM

On 14.09.2012 13:29, Aragorn wrote:
>
> I do not tell lies.

Oh my god; how could you believe that I intended to express that?!

Janis

Martin Vaeth

Sep 15, 2012, 1:39:05 AM

Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> On 13.09.2012 10:14, Martin Vaeth wrote:
>> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>>>>
>>>> There are several other problems with awk than only the speed:
>>>>
>>>> 1. It is not that standardized; probably there are also versions with
>>>> various kind of bugs in the wild.
>>>
>>> In both of those respects we have more issues with shells than with awk.
>>
>> For shells, almost all current systems follow POSIX.
>
> The prominent ones do. Nonetheless even some basic constructs aren't
> standardised, which result in different behaviour; most prominent
> example the "pipe processes executed in subprocess" discrepancy.
> And also you would have to limit yourself to the small POSIX subset,

So what? If you use incompatible expansions of a particular shell,
this runs in general only on that particular shell. Otherwise it is
standardized. So in which sense does that support your sentence
"In both of those respects we have more issues with shells
than with awk."?

> not to mention to implement functionality that you
> just cannot implement with POSIX shell.

For bash, such functionality has yet to be found
(the only candidate I know of is <(...), but there are FIFOs).
For zsh there are a few like assignment to USER.
Again: So what? If you really need zsh features, program in zsh...

> Don't recall such older awk's limitations, and I've never stumbled
> across them.

See e.g. Section 11.9 in
http://oreilly.com/catalog/unixnut3/chapter/ch11.html

Some practical limits of particular awk implementations are
mentioned here:
http://computer-programming-forum.com/11-awk/88037ec462264097.htm

Just because your gawk implementation perhaps doesn't have it,
it does not mean that your program works reliably on every machine.
OTOH, with perl you do not have to fear such a thing.

> I know professional perl programmers who find it helpful to read in
> the whole data before starting processing;

There might be cases where this is necessary, but this is not related
to awk vs. perl. Only that it is simpler in perl :)
If several passes are required, this is probably the most reasonable approach.
But this is admittedly a phenomenon: Since it is easy to do things in
perl in the wrong way, some people do it in perl in the wrong way.

> in such cases, with that
> approach, one may more often than others encounter buffer limitations

The only limitation in perl is the available memory/swap space:
This was one of the main conceptual goals from the very beginning.

Martin Vaeth

Sep 15, 2012, 2:13:42 AM

Ed Morton <morto...@gmail.com> wrote:
> On 9/13/2012 9:55 AM, Martin Vaeth wrote:
>>
>> Yep. Using perl, things just worked, and since perl can also do most shell
>> tasks very well, it rarely makes no sense to use a cumbersome shell+awk

s/no//

>> solution if you can have a more powerful one easily.
>
> It's not more powerful if it does the same job.

I meant power of the used tools.
The power of the result is that it is probably finished sooner and,
moreover, easier to modify/extend if necessary.

>> As one example, just think of the case that you suddenly realize that you
>> need a perl-extension for a particular regular expressions...
>
> You never need a perl-extension for an RE.

If you have non-trivial tasks to do, you do need one.
Actually, even relatively simple tasks like transformation of
certain TeX expressions (in a specified subset of a grammar)
require much more.
Well, you are right in the sense that you can write a whole
parser manually. In this sense you can also claim that you never
need RE in awk.

> I expect there are probably times
> when you could make your code more succinct with an extension but again I've
> always been able to do what I want in awk with its EREs.

Just because you have never run into such a situation doesn't mean
it doesn't exist. I needed it several times.

>> The perl regular expressions alone are an important improvement over awk.
>
> Not really, that's just a bit of syntactic sugar AFAIK.

No. Awk is only able to parse regular expressions which can run in a FSM.
Things like nested braces, e.g., which you can do with Perl RE,
are far beyond this.
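
To illustrate why nesting is beyond a finite state machine, here is a minimal
sketch in plain POSIX shell. The function name "balanced" is made up for this
example; the point is only that checking braces needs a counter that can grow
without bound, which a fixed set of states cannot provide.

```shell
# balanced STRING: succeed if every "{" has a matching "}".
# Hypothetical helper; POSIX sh has no "local", so s, depth, rest and c
# are plain global variables here.
balanced() {
    s=$1
    depth=0
    while [ -n "$s" ]; do
        rest=${s#?}          # the string without its first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            '{') depth=$((depth + 1)) ;;
            '}') depth=$((depth - 1))
                 [ "$depth" -lt 0 ] && return 1 ;;   # closer without an opener
        esac
    done
    [ "$depth" -eq 0 ]       # balanced only if every opener was closed
}

balanced 'a{b{c}d}e' && echo "balanced"
balanced 'a{b}}{'    || echo "not balanced"
```

The depth counter is exactly the part that a true regular expression cannot
express, whatever syntax is used to write it.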

> I just don't
> recommend writing shell scripts with loops being iterated through thousands of
> times in the first place.

This is not so rare, especially for scripts which are meant to "synchronize"
certain files/dirs/archives in some way. In my experience, it is not unusual
that several thousand files (or at least lines in "control" files) are involved,
and yet the shell may be the appropriate language, because mainly external
calls (copy, link, special utilities for the files) are involved.
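
A minimal sketch of such a loop, with made-up file names and temporary
directories standing in for the real trees (every name here is an assumption
for illustration, and mktemp -d is assumed to be available). The shell only
reads the control file and dispatches to an external command:

```shell
# Set up a throwaway source tree, a destination, and a "control" file.
src=$(mktemp -d) || exit 1
dst=$(mktemp -d) || exit 1
for n in 1 2 3; do
    printf 'data %s\n' "$n" > "$src/file$n"
done
printf '%s\n' file1 file3 > "$src/control.txt"

# One iteration per control line; the real work is done by cp.
copied=0
while IFS= read -r name; do
    cp -p "$src/$name" "$dst/$name" && copied=$((copied + 1))
done < "$src/control.txt"

echo "copied $copied files"
synced=$(ls "$dst")          # remember what arrived before cleaning up
rm -rf "$src" "$dst"
```

With several thousand control lines the loop body stays the same; the cost is
dominated by the external calls, not by the shell.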

Martin Vaeth

Sep 15, 2012, 2:17:24 AM

Ed Morton <morto...@gmail.com> wrote:
>
> You appear to be advocating avoiding using anything because on some system
> you'll probably never use something could be missing.

Apparently, you have never written anything which is supposed to be
included in a distribution which should run on thousands of systems.

Aragorn

Sep 15, 2012, 4:11:28 AM

On Friday 14 September 2012 23:25, Janis Papanagnou conveyed the
following to comp.unix.shell...

> On 14.09.2012 13:29, Aragorn wrote:
>>
>> I do not tell lies.
>
> Oh my god; how could you believe that I intended to express that?!

I didn't say you did. I was just attempting to clarify myself, since
you mistook my description of KNode as a form of advocacy, even after I
wrote that it was not intended that way.

Janis Papanagnou

Sep 15, 2012, 7:24:35 AM

On 15.09.2012 07:39, Martin Vaeth wrote:
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>> On 13.09.2012 10:14, Martin Vaeth wrote:
>>> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>>>>>
>>>>> There are several other problems with awk than only the speed:
>>>>>
>>>>> 1. It is not that standardized; probably there are also versions with
>>>>> various kind of bugs in the wild.
>>>>
>>>> In both of those respects we have more issues with shells than with awk.
>>>
>>> For shells, almost all current systems follow POSIX.
>>
>> The prominent ones do. Nonetheless even some basic constructs aren't
>> standardised, which result in different behaviour; most prominent
>> example the "pipe processes executed in subprocess" discrepancy.
>> And also you would have to limit yourself to the small POSIX subset,
>
> So what? If you use incompatible expansions of a particular shell,

The "pipe processes executed in subprocess" issue is not standardized
(probably because shells do behave differently; but anyway); it's bad
to have different behaviour and to need to work around that with ugly
constructs to make things work reliably across POSIX shells.
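
A small sketch of the discrepancy and of one common workaround (the variable
names are arbitrary). Which value the first echo prints depends on whether the
shell runs the last pipeline element in a subshell:

```shell
# Typically prints 0 in bash and dash (the loop runs in a subshell, so the
# assignment is lost) and 3 in ksh93 and zsh (the loop runs in the current shell).
count=0
printf '%s\n' a b c | while read -r line; do count=$((count + 1)); done
echo "pipeline: $count"

# Workaround: keep the loop in the current shell and feed it from a
# here-document filled by command substitution.
total=0
while read -r line; do
    total=$((total + 1))
done <<EOF
$(printf '%s\n' a b c)
EOF
echo "here-document: $total"
```

The second form gives 3 everywhere, at the price of a less obvious data flow.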

(WRT the limitations by POSIX subset see below.)

> this runs in general only on the particular shell. Otherwise it is
> standardized. So in which sense does that support your sentence
> "In both of those respects we have more issues with shells
> than with awk."?

It's the amount of available extensions in shells compared to awk, and
it's the amount of necessary extensions in shells compared to awk.

I think I've explained that already in other words: you can write
most of your programs in a well maintainable way in awk, and you don't
need to use any of the few extensions to do so. Keeping shell programs
readable while abstaining from non-POSIX extensions is hardly possible
in the same way; consider (for example) the ${ // / } or ${ : : } constructs,
just to name some that are available in the prominent shells but not
in the standard.
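
For comparison, a short sketch of those two constructs next to POSIX-only
counterparts. The first pair assumes a shell that supports them (bash, ksh93
or zsh); the variable names are arbitrary:

```shell
s='foo/bar/baz'

# Non-POSIX extensions: global replacement and substring extraction.
replaced=${s//\//-}          # every "/" becomes "-"
sub=${s:4:3}                 # 3 characters starting at offset 4

# POSIX-only counterparts need external tools (or hand-written loops).
replaced_posix=$(printf '%s\n' "$s" | tr '/' '-')
sub_posix=$(printf '%s\n' "$s" | cut -c5-7)

echo "$replaced $sub"
echo "$replaced_posix $sub_posix"
```

Both echo lines print "foo-bar-baz bar"; the difference is only in how much
machinery each form needs.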

>
>> not to mention to implement functionality that you
>> just cannot implement with POSIX shell.
>
> For bash such a functionality remains yet to be found
> (only candidate I know is <(...), but there are FIFOs).

The <(...) construct also depends on support by the underlying OS.
But if supported by the OS, it's at least available in all the prominent
shells.
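
A sketch of the FIFO route mentioned in the quoted text, for a case where one
would otherwise write <(...). It assumes mkfifo and mktemp -d are available;
the file names are made up:

```shell
tmpdir=$(mktemp -d) || exit 1
mkfifo "$tmpdir/sorted"

# The writer runs in the background and blocks until a reader opens the FIFO.
printf '%s\n' pear apple | sort > "$tmpdir/sorted" &

# With process substitution this would read:
#   paste -s -d , <(printf '%s\n' pear apple | sort)
joined=$(paste -s -d , "$tmpdir/sorted")

wait                         # reap the background writer
rm -rf "$tmpdir"
echo "$joined"               # apple,pear
```

The FIFO version works without the extension, but the setup and cleanup have
to be written by hand.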

> For zsh there are a few like assignment to USER.
> Again: So what? If you really need zsh features, program in zsh...

Yes. And leaving the standard path.

>
>> Don't recall such older awk's limitations, and I've never stumbled
>> across them.
>
> See e.g. Section 11.9 in
> http://oreilly.com/catalog/unixnut3/chapter/ch11.html

Those, AFAICT, are limits you have depending on OS-restrictions, limits
that you find also in shells, and limits typical for non-contemporary
ancient systems. Anything more specific so that we can judge?

>
> Some practical limits of particular awk implementations are
> mentioned here:
> http://computer-programming-forum.com/11-awk/88037ec462264097.htm

The quoted thread seems to be Windows/DOS oriented. It's beyond me which
limits that environment will impose on the applications.

It would be helpful if you could quote a reference of any actual limits,
so that we can see if any is an OS issue (WinDOS), a historic awk limit,
an academic limit, or actually one that is relevant for awk's of the
last - since you like the comparison with perl - 20/25 years (or so).

>
> Just because your gawk implementation perhaps doesn't have it,
> it does not mean that your program works reliably on every machine.

Here I agree with you.

> OTOH, with perl you do not have to fear such a thing.

Don't compare apples with bananas. Or is there a POSIX or ANSI standard
for perl, as opposed to POSIX shell and POSIX awk?

Janis

> [...]

Janis Papanagnou

Sep 15, 2012, 7:55:20 AM

On 15.09.2012 08:13, Martin Vaeth wrote:
> Ed Morton <morto...@gmail.com> wrote:
>> On 9/13/2012 9:55 AM, Martin Vaeth wrote:

[...]

>
>>> As one example, just think of the case that you suddenly realize that you
>>> need a perl-extension for a particular regular expressions...
>>
>> You never need a perl-extension for an RE.
>
> If you have non-trivial tasks to do you do need.

We have to differentiate between Regular Expressions (the language class),
extensions to the syntax of regular expressions (shortcuts such as \s,
{n,m}, etc.), and extensions beyond the regular language class
(non-regular features supported by "regexp" parsers, like
back-references or counting of matching quotes/brackets).
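
A small illustration of the third category, assuming a grep that implements
POSIX basic regular expressions (where \1 back-references are specified). The
pattern matches lines that consist of some string written twice, which no
finite state machine can recognise for arbitrary lengths:

```shell
# Count the lines of the form "ww" (a string immediately followed by itself).
doubled=$(printf '%s\n' abcabc abcabd xyxy | grep -c '^\(.*\)\1$')
echo "$doubled"              # 2: abcabc and xyxy match, abcabd does not
```

The syntax looks like an ordinary regular expression, but the matcher has to
remember an arbitrarily long substring to decide the match.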

> Actually, even relatively simple tasks like transformation of
> certain TeX expressions (in a specified subset of a grammar)
> require much more.
> Well, you are right in the sense that you can write a whole
> parser manually. In this sense you can also claim that you never
> need RE in awk.

Not only in that sense. (And I suppose Ed hasn't meant it that way
either.)

>
>> I expect there are probably times
>> when you could make your code more succinct with an extension but again I've
>> always been able to do what I want in awk with its EREs.
>
> Just because you have never run into such a situation doesn't mean
> it doesn't exist. I needed it several times.

Did you, when you needed it, leave the FSM/Chomsky-3 language level then?
I have to suppose so.

The point is: if you need more than Regular Expressions, say
back-references, then you can use perl or gawk, since both languages
support this non-regular extension. But what if you need
context-free parsing (like back-references and bracket counting, also
a feature of a Chomsky-2 language)? Then you are out of luck with
perl (or gawk). Where do you draw the line?

>
>>> The perl regular expressions alone are an important improvement over awk.
>>
>> Not really, that's just a bit of syntactic sugar AFAIK.
>
> No. Awk is only able to parse regular expressions which can run in a FSM.
> Things like nested braces, e.g., which you can do with Perl RE,
> are far beyond this.

See above. Gawk with back-references is already beyond a FSM. Perl
with nested structures counting is also already beyond a FSM. It may
be handy to have such non-regular extensions available, undoubtedly.
But saying that this or that non-regular extension is okay or necessary
while another one is not? Well, I consider this quite arbitrary.

In addition, in this context it should be mentioned that perl regexps
had (still have?) severe performance issues with such constructs, as
I had been informed by a perl professional; depending on the construct
and parsed data you seem to be able to observe exponential time demands.

The fine thing with Regular Expressions is that you can parse in O(N)
complexity. Leaving the Regular Expression class requires knowing what
you buy with it, generally, and in the context of the specific parser.

Janis

> [...]

Ed Morton

Sep 15, 2012, 8:42:30 AM

No, I have (in fact you almost certainly are using my software on a daily basis)
but some of those boxes have ksh, some have bourne shell, most do not have perl,
etc. so I wrote the non-realtime parts in awk for portability.

Ed.

Thomas 'PointedEars' Lahn

Sep 18, 2012, 5:37:32 AM

Yes, that only appears to be so.

> Brilliant! I fail to see how coding in bash and then porting it to a
> machine that's OS is ksh88i would be any better.

Argumentum ad ridiculum.

>> >>> Also bash, like all shells, is just an environment from which to call
>> >>> tools.
>> >> No, command shells, in particular sh-based and POSIX-compliant shells
>> >> which we are discussing here, are programming languages in their own
>> >> right, and capable of string manipulation, among other things.
>> >
>> > What they're capable of and what they should be used for are 2
>> > different things. If you have a small shell script doing file movement
>> > and/or process creation and you want to manipulate a string in it then
>> > of course you'd use the shell command to do that, but if you're writing
>> > a large program to do text manipulation then it wouldn't make sense to
>> > do it in shell builtins when there are tools/languages already on your
>> > system designed to do the job easier and fast enough.
>>
>> Well, when I have a string value in a shell variable and want to
>> manipulate
>> that string like here, I try to use shell builtins first. I do *not*
>> printf the quoted variable value so that I can pipe it into awk, let awk
>> do the wasteful job of processing one line, and then write awk's stdout
>> back to a
>> shell variable. KISS. I have showed how simple and still efficient it
>> can be.
>
> If you care to check back earlier in this thread, using the shell builtin
> for that is exactly what I recommended and I never mentioned awk as an
> alternative. I did show an alternative sed solution that I said I would
> not use.

But *first*, in <news:201209111...@webuse.net>, you recommended using
awk. You have been insisting on using awk in that subthread ever since.

Only *after* I posted a shell-only solution and Chris F.A. Johnson disputed
your unfounded statement there, you conceded that it could be done in the
shell.

Why?

>> >>> Forcing yourself to stick with only bash builtins or whatever you
>> >>> think constitutes "doing it in bash" is just pointless (no offense).
>> >> The OP is asking for a shell solution. You are the one forcing on
>> >> them an awk solution, claiming that not using awk is a problem.
>> >
>> > You can call awk from the shell just as you can sed, grep, etc. Posters
>> > asking for a shell solution rarely are asking for a solution only using
>> > shell builtins and even when they are it's usually because they haven't
>> > thought about the alternatives.
>>
>> Or because they do not care because they have with the command shell a
>> rather decent programming language already, and are posting to the
>> newsgroup dealing with *shell* scripting. You are projecting.
>
> No, it happens VERY frequently that someone posts "how can I do X in
> shell" or "how can I do Y with sed" and after being asked they admit that
> they really couldn't care less what tools/language the solution uses as
> long as it's something they can call from their shell.

Prove that it happens more frequently than the opposite. Prove that this
newsgroup exists primarily not to discuss the shell, but solutions that
*require* more than the shell. Hint: Read the newsgroup name, tagline and
charter. I have already pointed out that there is a newsgroup for awk
programming, which should indicate to you that awk programming is not the
primary topic of this newsgroup.

>> This is comp.unix.*shell*, after all, so the possibilities of using shell
>> syntax and shell builtins should be first on a regular's mind. If you
>> want to discuss awk, there is comp.lang.*awk* instead. If you want to
>> discuss perl, there is comp.lang.*perl*.ALL. And so on.
>
> So you'll be advocating comp.lang.sed, comp.lang.diff, etc. for all the
> other commonly used shell commands? No, the first thing on people's minds
> should be how to get the job done as quickly and succinctly as possible
> using the commands available in their shell and that includes grep, diff,
> sed, awk, etc. Making use of only shell builtins a priority would be like
> a construction worker trying to do everything with a hammer and
> screwdriver rather than picking up a nail gun and drill.

Argument at ridicule again.

Ed Morton

Sep 18, 2012, 9:56:06 AM

On 9/18/2012 4:37 AM, Thomas 'PointedEars' Lahn wrote:
> Ed Morton wrote:
>
>> Thomas 'PointedEars' Lahn <Point...@web.de> wrote:

<snip>

>>> Well, when I have a string value in a shell variable and want to
>>> manipulate
>>> that string like here, I try to use shell builtins first. I do *not*
>>> printf the quoted variable value so that I can pipe it into awk, let awk
>>> do the wasteful job of processing one line, and then write awk's stdout
>>> back to a
>>> shell variable. KISS. I have showed how simple and still efficient it
>>> can be.
>>
>> If you care to check back earlier in this thread, using the shell builtin
>> for that is exactly what I recommended and I never mentioned awk as an
>> alternative. I did show an alternative sed solution that I said I would
>> not use.
>
> But *first*, in <news:201209111...@webuse.net>, you recommended to use
> awk. You have been insisting to use awk in that subthread ever since.
>
> Only *after* I posted a shell-only solution and Chris F.A. Johnson disputed
> your unfounded statement there, you conceded that it could be done in the
> shell.
>
> Why?

Nonsense, read the thread again as you're confusing 2 very different things. I
advocated using awk for non-trivial text processing applications but using a
shell construct if you just want to tweak a string in the middle of a shell
script. I never once advocated doing something like echoing a string to awk to
modify the string and then reading the result back into a variable as you're
suggesting I did.

>>>>>> Forcing yourself to stick with only bash builtins or whatever you
>>>>>> think constitutes "doing it in bash" is just pointless (no offense).
>>>>> The OP is asking for a shell solution. You are the one forcing on
>>>>> them an awk solution, claiming that not using awk is a problem.
>>>>
>>>> You can call awk from the shell just as you can sed, grep, etc. Posters
>>>> asking for a shell solution rarely are asking for a solution only using
>>>> shell builtins and even when they are it's usually because they haven't
>>>> thought about the alternatives.
>>>
>>> Or because they do not care because they have with the command shell a
>>> rather decent programming language already, and are posting to the
>>> newsgroup dealing with *shell* scripting. You are projecting.
>>
>> No, it happens VERY frequently that someone posts "how can I do X in
>> shell" or "how can I do Y with sed" and after being asked they admit that
>> they really couldn't care less what tools/language the solution uses as
>> long as it's something they can call from their shell.
>
> Prove that it happens more frequently than the opposite.

Read posts to the NG over the past several years. QED.

> Prove that this
> newsgroup exists primarily not to discuss the shell, but solutions that
> *require* more than the shell.

"the shell" does not mean "only shell builtins", it means "the shell".

> Hint: Read the newsgroup name, tagline and
> charter.

comp.unix.shell. It's about UNIX shell which includes tools available to call
from the shell. You aren't seriously suggesting we exclude all
non-shell-builtins from discussions in this NG are you? If the NG was named
comp.shell.builtins you might have a case but then it wouldn't matter because
no-one would bother posting to that NG.

> I have already pointed out that there is a newsgroup for awk
> programming, which should indicate to you that awk programming is not the
> primary topic of this newsgroup.

There is no NG for sed. sed is not a shell builtin. Therefore by your logic
no-one must ever discuss sed in any NG, right?

>>> This is comp.unix.*shell*, after all, so the possibilities of using shell
>>> syntax and shell builtins should be first on a regular's mind. If you
>>> want to discuss awk, there is comp.lang.*awk* instead. If you want to
>>> discuss perl, there is comp.lang.*perl*.ALL. And so on.
>>
>> So you'll be advocating comp.lang.sed, comp.lang.diff, etc. for all the
>> other commonly used shell commands? No, the first thing on people's minds
>> should be how to get the job done as quickly and succinctly as possible
>> using the commands available in their shell and that includes grep, diff,
>> sed, awk, etc. Making use of only shell builtins a priority would be like
>> a construction worker trying to do everything with a hammer and
>> screwdriver rather than picking up a nail gun and drill.
>
> Argument at ridicule again.

You're embarrassing yourself, give it up.

Ed.

Thomas 'PointedEars' Lahn

Sep 18, 2012, 10:23:31 AM

That is correct; you actually suggested creating from the string value a
text file to be processed by awk instead, which is even more illogical than
piping the string echo into awk, or using shell built-ins instead. And you
called that "a working solution", showing that you did not understand the
OP's problem correctly in the first place.
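
For reference, the three approaches being argued about look roughly like this.
This is only a sketch: the awk program is a stand-in that upper-cases just the
first character (it is not the script posted in this thread), the variable
names are made up, and the last form assumes bash 4 or later.

```shell
str='first. second'

# 1. Write the string to a temporary file and let awk process the file.
tmp=$(mktemp) || exit 1
printf '%s\n' "$str" > "$tmp"
via_file=$(awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }' "$tmp")
rm -f "$tmp"

# 2. Pipe the string into awk and capture awk's stdout again.
via_pipe=$(printf '%s\n' "$str" |
    awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }')

# 3. Use a shell parameter expansion (bash 4+), no external process at all.
via_builtin=${str^}

printf '%s\n' "$via_file" "$via_pipe" "$via_builtin"
```

All three print "First. second"; they differ in the number of processes and
files involved, which is what the disagreement is about.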

>>>>>>> Forcing yourself to stick with only bash builtins or whatever you
>>>>>>> think constitutes "doing it in bash" is just pointless (no offense).
>>>>>> The OP is asking for a shell solution. You are the one forcing on
>>>>>> them an awk solution, claiming that not using awk is a problem.
>>>>> You can call awk from the shell just as you can sed, grep, etc.
>>>>> Posters asking for a shell solution rarely are asking for a solution
>>>>> only using shell builtins and even when they are it's usually because
>>>>> they haven't thought about the alternatives.
>>>> Or because they do not care because they have with the command shell a
>>>> rather decent programming language already, and are posting to the
>>>> newsgroup dealing with *shell* scripting. You are projecting.
>>> No, it happens VERY frequently that someone posts "how can I do X in
>>> shell" or "how can I do Y with sed" and after being asked they admit
>>> that they really couldn't care less what tools/language the solution
>>> uses as long as it's something they can call from their shell.
>> Prove that it happens more frequently than the opposite.
>
> Read posts to the NG over the past several years. QED.

(Fallacy: Shifting the burden of proof.)

A *sound* argumentation does not work that way. *You* claim, *you* prove.
Granted, you have not claimed that the majority of problems would require a
not-shell-only solution; you have used the weasel words "VERY frequently".
Which is fallacious, of course. So I gave you an opportunity here to
quantify the postings and thereby substantiate your argument. You did not
use it.

>> Prove that this newsgroup exists primarily not to discuss the shell, but
>> solutions that *require* more than the shell.
>
> "the shell" does not mean "only shell builtins", it means "the shell".

I concur. And it does not mean "awk only" either. There is the flaw in
your approach and your argument.

>> Hint: Read the newsgroup name, tagline and charter.
>
> comp.unix.shell. It's about UNIX shell which includes tools available to
> call from the shell. You aren't seriously suggesting we exclude all
> non-shell-builtins from discussions in this NG are you?

That is correct, I am not. You are still misconstruing what I said.

> If the NG was named comp.shell.builtins you might have a case but then it
> wouldn't matter because no-one would bother posting to that NG.

Argument at ridicule again.

>> I have already pointed out that there is a newsgroup for awk programming,
>> which should indicate to you that awk programming is not the primary
>> topic of this newsgroup.
>
> There is no NG for sed. sed is not a shell builtin. Therefore by your
> logic no-one must ever discuss sed in any NG, right?

No, because you continue to misconstrue my argument by ignoring the
qualifiers I have always put in there.

>>>> This is comp.unix.*shell*, after all, so the possibilities of using
>>>> shell syntax and shell builtins should be first on a regular's mind.
>>>> If you want to discuss awk, there is comp.lang.*awk* instead. If you
>>>> want to discuss perl, there is comp.lang.*perl*.ALL. And so on.
>>> So you'll be advocating comp.lang.sed, comp.lang.diff, etc. for all the
>>> other commonly used shell commands? No, the first thing on people's minds
>>> should be how to get the job done as quickly and succinctly as possible
>>> using the commands available in their shell and that includes grep,
>>> diff, sed, awk, etc. Making use of only shell builtins a priority would
>>> be like a construction worker trying to do everything with a hammer and
>>> screwdriver rather than picking up a nail gun and drill.
>>
>> Argument at ridicule again.
>
> You're embarrassing yourself, give it up.

Argumentum ad hominem, at last.

I am pointing out to you the flaws in your approach and your argumentation
defending it. If I stop doing that, they will still be in there.

Ed Morton

Sep 18, 2012, 10:48:38 AM

What the heck are you talking about "creating from the string value a text file
to be processed by awk"??? Also, the OP specifically posted that the awk script
solved his problem (see below) so I think that's a pretty good indication that I
did understand it.

Here's exactly what I said in the posts I think you're referring to:

#On 9/11/2012 12:28 PM, Ed Morton wrote:
# Carlo Zanziba <zanzib...@NOSPAMlibero.it> wrote:
#
#> Hello,
#>
#> Suppose I have such string
#>
#> str="first. second; third. lost? maybe! no"
#>
#> I'd want to turn every first character, and every non-blank character
#> after dot, question and exclamative marks to capital, so that it turns to
#>
#> First. Second; third. Lost? Maybe! No
#>
#> --------------
#> As add-on, consider that any inner text enclosed into double quotes
#> should be capitalized, e.g.
#>
#> str="first. second; third. \"the prince of arabia\" lost? maybe! no"
#>
#> turned to
#>
#> First. Second; Third. "The Prince Of Arabia" lost? Maybe! No
#>
#
# Simplest (and maybe only) thing is just to do it one char at a time:
#
# $ cat file
# first. second; third. "the prince of arabia" lost? maybe! no
# $ cat tst.awk
# BEGIN{ FS="" }
# {
#     prev = lastNonBlank = ""
#     for (i=1; i<=NF; i++) {
#
#         curr = $i
#
#         if (curr == "\"") {
#             inQuotes = !inQuotes
#         }
#
#         if ( (prev == "") ||
#              (lastNonBlank ~ /[.?!]/) ||
#              (inQuotes && (prev !~ /[[:alpha:]]/)) ) {
#             curr = toupper(curr)
#         }
#
#         if ( curr !~ /[[:blank:]]/ ) {
#             lastNonBlank = curr
#         }
#         prev = curr
#
#         printf "%c", curr
#     }
#     print ""
# }
# $ awk -f tst.awk file
# First. Second; third. "The Prince Of Arabia" lost? Maybe! No
#
#On 9/12/2012 5:42 PM, Ed Morton wrote:
# If you're just talking about one-liners like:
#
# $ x="foo/bar"; echo "${x%/*}"
# foo
#
# instead of:
#
# $ x="foo/bar"; echo "$x" | sed 's?/.*??'
# foo
#
# then I'd do that mainly because it's briefer rather than faster

and here's the OP's final response to the thread:

#On 9/13/2012 2:02 AM, Carlo Zanziba wrote:
# Il 12/09/2012 12:21, Ed Morton ha scritto:
#> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
#>> Thanks everybody. I was afraid my supposition was too far from what
#>> bash could
#>> give me.
#>>
#>> In any case, I appreciated your efforts.
#>
#> You did see that I gave you a working solution, right?
#
# Sure. It works, and fast!
#
# Thanks.
#
# Carlo

Are you confusing my responses with someone else's? I don't recall seeing an
alternative shell-builtin-only solution posted by you or anyone else but if you
have one then please do share, otherwise just stop whining as you're really
making no sense trying to attribute statements to me that I never made and
complaining about perfectly reasonable working solutions, especially when no
alternative has been posted.

Ed.

Thomas 'PointedEars' Lahn

Sep 18, 2012, 11:15:47 AM

Ed Morton wrote:

> On 9/18/2012 9:23 AM, Thomas 'PointedEars' Lahn wrote:
>> Ed Morton wrote:
>>> On 9/18/2012 4:37 AM, Thomas 'PointedEars' Lahn wrote:
>>>> Ed Morton wrote:
>>>>> Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
>>> <snip>

>>>> But *first*, in <news:201209111...@webuse.net>, you recommended
>>>> to use awk. You have been insisting to use awk in that subthread ever
>>>> since.
>>>>
>>>> Only *after* I posted a shell-only solution and Chris F.A. Johnson
>>>> disputed your unfounded statement there, you conceded that it could be
>>>> done in the shell.
>>>>
>>>> Why?
>>> Nonsense, read the thread again as you're confusing 2 very different
>>> things. I advocated using awk for non-trivial text processing
>>> applications but using a shell construct if you just want to tweak a
>>> string in the middle of a shell script. I never once advocated doing
>>> something like echoing a string to awk to modify the string and then
>>> reading the result back into a variable as you're suggesting I did.
>> That is correct; you actually suggested creating from the string value a
>> text file to be processed by awk instead, which is even more illogical
>> than piping the string echo into awk, or using shell built-ins instead.
>> And you called that "a working solution", showing that you did not
>> understand the OP's problem correctly in the first place.
>
> What the heck are you talking about "creating from the string value a text
> file to be processed by awk"???

See below.

> Also, the OP specifically posted that the awk script solved his problem
> (see below) so I think that's a pretty good indication that I did
> understand it.

No, it is a pretty good indication that the OP thought that, as you were
resorting to awk immediately, there were no other possibilities,
particularly none with only the shell. Which they made clear in their other
followup (see below). Are you simply not paying attention, or whom are you
trying to fool here?

> Here's exactly what I said in the posts I think you're referring to:
>
> #On 9/11/2012 12:28 PM, Ed Morton wrote:
> # Carlo Zanziba <zanzib...@NOSPAMlibero.it> wrote:
> #
> #> Hello,
> #>
> #> Suppose I have such string
> #>
> #> str="first. second; third. lost? maybe! no"

> #> […]

So they have a string value in a sh-based shell (which, as it turned out, is
bash). And they want to transform that string:

| I'd want to turn every first character, and every non-blank character
| after dot, question and exclamative marks to capital, so that it turns to
|
| First. Second; third. Lost? Maybe! No

A plain and simple problem. From which, in your solution, a text file
containing the string value emerges out of thin air …

> # $ cat file
> # first. second; third. "the prince of arabia" lost? maybe! no

… that you can process with awk then, of course:

> # $ cat tst.awk
> # BEGIN{ FS="" }
> # {
> […]

> and here's the OPs final response to the thread:
>
> #On 9/13/2012 2:02 AM, Carlo Zanziba wrote:
> # Il 12/09/2012 12:21, Ed Morton ha scritto:
> #> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
> #>> Thanks everybody. I was afraid my supposition was too far from
> #>> what bash could
> #>> give me.
> #>>
> #>> In any case, I appreciated your efforts.
> #>
> #> You did see that I gave you a working solution, right?
> #
> # Sure. It works, and fast!
> […]

Of course it works in certain situations. Nobody doubted that. The problem
that you still fail to see lies elsewhere: If all you know is a hammer,
every problem looks like a nail.

> Are you confusing my responses with someone else's? I don't recall seeing
> an alternative shell-builtin-only solution posted by you or anyone else
> but if you have one then please do share, otherwise just stop whining as
> you're really making no sense trying to attribute statements to me that I
> never made and complaining about perfectly reasonable working solutions,
> especially when no alternative has been posted.

Here it comes:

| From: Carlo Zanziba <zanzib...@NOSPAMlibero.it>
| Newsgroups: comp.unix.shell
| Subject: Re: Pattern matching within ,, and ^^ parser
| Date: Wed, 12 Sep 2012 08:13:19 +0200
| […]
| Message-ID: <k2p961$6f4$1...@tdi.cu.mi.it>
| References: <k2mp5u$edm$1...@tdi.cu.mi.it>
| […]
|
| Thanks everybody. I was afraid my supposition was too far from what bash
| could give me.
| […]

And that is simply not true. However, your posting and others, which did not
even mention that there was a solution without external tools, misled the
OP into believing so. Moreover, the OP has been misled into believing that
the not-shell-only solutions were without any compatibility issues because
compatibility issues also were conveniently left out in their presentation.

As a result, not only have they not learned to leverage the power of
sh-based shells, in particular bash, they also now believe they are using
issue-free code. Fortunately in this case there were other people to point
out that there was a shell/bash solution, and that the awk solution was not
without issues itself – as there is not only one awk, whereas there is only
one (GNU) bash, albeit in different versions, which is not relevant here.
But what if there had not been?

*That* is a Bad Thing, particularly when it originates in comp.unix.*shell*.
The tools are there, and should be discussed *here* as such, to *complement*
the shell command language; _not_ to replace it.

To point that out to the obviously extremely short-sighted, and to ask for
reconsideration of approaches, constitutes anything but whining.

BTW: Trim your quotes to the relevant minimum next time, please.

--
PointedEars

Ed Morton

Sep 18, 2012, 11:36:48 AM

On 9/18/2012 10:15 AM, Thomas 'PointedEars' Lahn wrote:
> Ed Morton wrote:

<snip>

>> Also, the OP specifically posted that the awk script solved his problem
>> (see below) so I think that's a pretty good indication that i did
>> understand it.
>
> No, it is pretty good indication that the OP thought that, as you were
> resorting to awk immediately, there were no other possibilities,
> particularly none with only the shell. Which they made clear in their other
> followup (see below). Are you simply not paying attention, or whom are you
> trying to fool here?

Now who's projecting?

At the end of the day - no-one has posted a solution using just shell builtins.
If you think such a solution is in some way better than the awk one, then please
do post it. The "proof of concept" you posted elsethread doesn't even come close
to solving the problem (particularly the part about capitalizing the first
letter of every word within quotes) but I'm sure you know that very well or
you'd have produced a working solution based on that.

>> Here's exactly what I said in the posts I think you're referring to:
>>
>> #On 9/11/2012 12:28 PM, Ed Morton wrote:
>> # Carlo Zanziba <zanzib...@NOSPAMlibero.it> wrote:
>> #
>> #> Hello,
>> #>
>> #> Suppose I have such string
>> #>
>> #> str="first. second; third. lost? maybe! no"
>> #> […]
>
> So they have a string value in a sh-based shell (which, as it turned out, is
> bash). And they want to transform that string:
>
> | I'd want to turn every first character, and every non-blank character
> | after dot, question and exclamative marks to capital, so that it turns to
> |
> | First. Second; third. Lost? Maybe! No
>
> A plain and simple problem. From which in your solution, out of thin air,
> it emerges a text file containing the string value …
>
>> # $ cat file
>> # first. second; third. "the prince of arabia" lost? maybe! no

THAT'S what you're whining about? That I used a file to contain the input rather
than a here document or a pipe or just passing a string to the awk script??? I
assumed the OP wasn't pulling strings out of thin air or hard-coding them into
his script otherwise it'd be a pretty pointless exercise. If that had been any
kind of issue it's obviously trivial to use some other way of inputting the
string to the awk script.
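
A minimal sketch of what such alternatives might look like, assuming bash and
any POSIX awk. The one-line awk program here is only a stand-in that
capitalizes the first character; it is not the script from the thread, and the
variable names are made up for illustration:

```bash
#!/usr/bin/env bash
# Illustrative only: three ways to hand a shell string to awk without a file.
str='first. second; third.'

# Stand-in awk program: uppercase the first character of the record.
prog='{ print toupper(substr($0, 1, 1)) substr($0, 2) }'

# 1) Here-string (bash, ksh93, zsh; not plain POSIX sh).
out_herestring=$(awk "$prog" <<<"$str")

# 2) Pipe from printf (portable to any POSIX shell).
out_pipe=$(printf '%s\n' "$str" | awk "$prog")

# 3) Environment variable read via ENVIRON in a BEGIN block; unlike -v,
#    this does not interpret backslash escapes in the value.
out_env=$(STR=$str awk 'BEGIN { s = ENVIRON["STR"]
                                print toupper(substr(s, 1, 1)) substr(s, 2) }')

printf '%s\n' "$out_herestring" "$out_pipe" "$out_env"
```

All three variants should print the same line; which one fits best depends
mostly on where the string comes from and on how portable the script has to be.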

> … that you can process with awk then, of course:
>
>> # $ cat tst.awk
>> # BEGIN{ FS="" }
>> # {
>> […]
>> and here's the OPs final response to the thread:
>>
>> #On 9/13/2012 2:02 AM, Carlo Zanziba wrote:
>> # Il 12/09/2012 12:21, Ed Morton ha scritto:
>> #> On 9/12/2012 1:13 AM, Carlo Zanziba wrote:
>> #>> Thanks everybody. I was afraid my supposition was too far from
>> #>> what bash could
>> #>> give me.
>> #>>
>> #>> In any case, I appreciated your efforts.
>> #>
>> #> You did see that I gave you a working solution, right?
>> #
>> # Sure. It works, and fast!
>> […]
>
> Of course it works in certain situations. Nobody doubted that. The problem
> that you still fail to see lies elsewhere: If all you know is a hammer,
> every problem looks like a nail.

Again, please post the better solution.

<further pointless ramblings snipped>

> BTW: Trim your quotes to the relevant minimum next time, please.

Done. Look, if you think a shell-builtin-only solution is appropriate to solve
the OP's problem then just post it so we can compare or stop whining about the
awk solution.

Ed.

Thomas 'PointedEars' Lahn

Sep 18, 2012, 2:45:05 PM

Ed Morton wrote:

> On 9/18/2012 10:15 AM, Thomas 'PointedEars' Lahn wrote:
>> Ed Morton wrote:
> <snip>
>>> Also, the OP specifically posted that the awk script solved his problem
>>> (see below) so I think that's a pretty good indication that i did
>>> understand it.
>>
>> No, it is pretty good indication that the OP thought that, as you were
>> resorting to awk immediately, there were no other possibilities,
>> particularly none with only the shell. Which they made clear in their
>> other followup (see below). Are you simply not paying attention, or whom
>> are you trying to fool here?
>
> Now who's projecting?

Not me. People who are not versed enough to know the shell command language
will necessarily be satisfied with any response perceived as a solution that
comes along, even if that includes using external tools.

You can observe that kind of beginner's behavior all over the Net. For
example, if you give a JavaScript novice an unnecessarily bloated jQuery
solution to a trivial DOM problem, they will use that. They will not
question the underlying bad approach and code quality, because they do not –
they cannot – know about that at this point in the learning curve. And of
course they will say "it works" – in the handful of environments (if you are
lucky) and the one circumstance they have tested it with.

> At the end of the day - no-one has posted a solution using just shell
> builtins.

I have. It just was not a Bourne Shell-compatible or POSIX:2008 shell
builtin; it was one that has been supported by bash, ksh, and probably other
shells for several versions, though. And I had posted it particularly as
followup to the OP's false statement that *bash*'s shell command language
alone could not do it.

> If you think such a solution is in some way better than the awk
> one, then please do post it.

Maybe later. Whether I post it or not is very well beside the point,
though.

> The "proof of concept" you posted elsethread doesn't even come close to
> solving the problem

That is why it is a proof of *concept* and not a complete solution. It
gives the dedicated reader something to think about and build on. What
matters is that it shows how it could be done *without* further assumptions
and dependencies.

> (particularly the part about capitalizing the first letter of every word
> within quotes)

That was optional, but I have also indicated in my first followup how that
could be done with shell builtins.

> but I'm sure you know that very well or you'd have produced a working
> solution based on that.

Yet another fallacy from you.

>>> Here's exactly what I said in the posts I think you're referring to:
>>>
>>> #On 9/11/2012 12:28 PM, Ed Morton wrote:
>>> # Carlo Zanziba <zanzib...@NOSPAMlibero.it> wrote:
>>> #
>>> #> Hello,
>>> #>
>>> #> Suppose I have such string
>>> #>
>>> #> str="first. second; third. lost? maybe! no"
>>> #> […]
>>
>> So they have a string value in a sh-based shell (which, as it turned out,
>> is bash). And they want to transform that string:
>>
>> | I'd want to turn every first character, and every non-blank character
>> | after dot, question and exclamative marks to capital, so that it turns
>> | to
>> |
>> | First. Second; third. Lost? Maybe! No
>>
>> A plain and simple problem. From which in your solution, out of thin
>> air, it emerges a text file containing the string value …
>>
>>> # $ cat file
>>> # first. second; third. "the prince of arabia" lost? maybe! no
>
> THAT'S what you're whining about? That I used a file to contain the input
> rather than a here document or a pipe or just passing a string to the awk
> script???

I am pointing out to you again, because despite all my efforts so far you
still fail (or refuse?) to see it, that your so-called solution adds more
dependencies than the problem originally had: a text file, and a specific
version or range of versions of a specific implementation or a specific
range of implementations of awk. And it does so *unnecessarily*, in a
newsgroup where awk should be merely an additional tool.

> I assumed the OP wasn't pulling strings out of thin air

There you go. Where is the basis for that assumption? And where is the
basis for the assumption that, assuming that they are using a text file, the
text file contains only that line, or is in a suitable format?

> or hard-coding them into his script otherwise it'd be a pretty pointless
> exercise.

They may very well have retrieved the string value in the script from
elsewhere. Have you even considered that?

> If that had been any kind of issue it's obviously trivial to use
> some other way of inputting the string to the awk script.

And there we have the previously criticized pipe-into-awk approach as your
so-called solution for a problem that does not even call for using awk in
the first place. Which is not the first time (in fact, it is hard to find a
posting here where you do not suggest *only* awk), and *that* is what I do
mind in comp.unix.*shell*. The newsgroup is about using the *shell*, _not_
only awk. If I want to read only awk-only postings, I can read
comp.lang.awk already.

Ed Morton

Sep 18, 2012, 5:28:51 PM

Thomas 'PointedEars' Lahn <Point...@web.de> wrote:

<all rehashed nonsense snipped>

1) I posted an awk script that solves the OP's problem (see below).

2) You posted a few trivial lines of bash builtins that do not solve the OP's
problem (also see below) and cannot be expanded upon to do so without major
rework and enhancement.

3) We agree that the NG comp.unix.shell is about all shell solutions, not just
those involving shell builtins. The fact that I choose to respond mainly to the
threads where an awk solution is reasonable and others choose to respond to
threads where a perl solution is reasonable, etc. based on where we think we
have most to contribute in no way diminishes the NG.

Have a nice day.

Ed.

---------------------------------------
Input:

first. second; third. "the prince of arabia" lost? maybe! no

---------------------------------------
Desired Output:

First. Second; third. "The Prince Of Arabia" lost? Maybe! No

---------------------------------------
Working awk solution (not using a file for input at your request):
$ cat awk.bash
str='first. second; third. "the prince of arabia" lost? maybe! no'

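str=$(awk '
# FS="" puts each character in its own field (gawk; unspecified by POSIX)
BEGIN{ FS="" }
{
    prev = lastNonBlank = ""

    for (i=1; i<=NF; i++) {
        curr = $i

        # every double quote toggles the in-quotes state
        if (curr == "\"") {
            inQuotes = !inQuotes
        }

        # capitalize at start of line, after . ? !, and at word starts in quotes
        if ( (prev == "") ||
             (lastNonBlank ~ /[.?!]/) ||
             (inQuotes && (prev !~ /[[:alpha:]]/)) ) {
            curr = toupper(curr)
        }

        if ( curr !~ /[[:blank:]]/ ) {
            lastNonBlank = curr
        }
        prev = curr

        printf "%c", curr
    }
    print ""
}
' <<!
$str
!
)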

printf '%s\n' "$str"

$ ./awk.bash
First. Second; third. "The Prince Of Arabia" lost? Maybe! No

---------------------------------------
Non-working bash builtins solution:
$ cat builtins.bash
str='first. second; third. "the prince of arabia" lost? maybe! no'

IFS_BAK=$IFS
IFS='.?!'

for sentence in $str
do
    sentence=${sentence##[[:space:]]}
    sentence_uppercase=${sentence^[a-z]}
    str=${str/$sentence/$sentence_uppercase}
done

IFS=$IFS_BAK

printf '%s\n' "$str"

$ ./builtins.bash
First. Second; third. "the prince of arabia" lost? Maybe! No


Thomas 'PointedEars' Lahn

Sep 18, 2012, 8:33:59 PM

Ed Morton wrote:

> Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
> <all rehashed nonsense snipped>

Score adjusted.

> 1) I posted an awk script that solves the OPs problem (see below).

You really don't get it, do you?

> 2) You posted a few trivial lines of bash builtins

Trivial, indeed. To show how easy it can be if you know how.

> that do not solve the OPs problem (also see below) and cannot be expanded
> upon to do so without major rework and enhancement.

Neither can your awk solution. That is a non-argument. Another one.

> 3) We agree that the NG comp.unix.shell is about all shell solutions, not
> just those involving shell builtins. The fact that I choose to respond
> mainly to the threads where an awk solution is reasonable

Where *you* *think* an awk solution is reasonable because, basically, you
know nothing else; particularly you don't know the *primary* subject of this
newsgroup, the shell. You only know a hammer, and what you see is a nail.

> […] in no way diminishes the NG.

Yes, it does, if the features that the shell already has are left out of the
equation and the – basically – non-shell solution is presented as the best
and only one. A craftsman should learn more than how to swing a hammer.
Much more.

> […]

> ---------------------------------------
> Non-working bash builtins solution:

Since you prefer the hammer: Do you need "concept" hammered on your forehead
before you get it? It was never intended as a complete solution, and the
in-quotes part was defined to be *optional* ("add-on") from the start:

| As add-on, consider that any inner text enclosed into double quotes
| should be capitalized, e.g.
|
| str="first. second; third. \"the prince of arabia\" lost? maybe! no"
|
| turned to
|
| First. Second; Third. "The Prince Of Arabia" lost? Maybe! No
|
| but this may prove impossible.

Of course, it is not impossible even with shell builtins.
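
A minimal sketch of one way this might be done with bash builtins only,
assuming bash 4 or later (for `${var^}`). The function name `capitalize` and
the character-by-character loop are illustrative assumptions; this is not the
proof of concept posted elsethread:

```bash
#!/usr/bin/env bash
# Hypothetical helper (name and structure are assumptions for illustration).
# Capitalizes the first character, the first non-blank character after
# '.', '?' or '!', and the start of every word inside double quotes.
capitalize() {
    local str=$1 out= prev= last_nonblank= in_quotes=0 curr i

    for ((i = 0; i < ${#str}; i++)); do
        curr=${str:i:1}

        # Every double quote toggles the in-quotes state.
        if [[ $curr == '"' ]]; then
            in_quotes=$((1 - in_quotes))
        fi

        # ${curr^} leaves non-letters (blanks, quotes) unchanged.
        if [[ -z $prev || $last_nonblank == [.?!] ]] ||
           { ((in_quotes)) && [[ $prev != [[:alpha:]] ]]; }; then
            curr=${curr^}
        fi

        # Remember the last character that was not a blank.
        [[ $curr != [[:blank:]] ]] && last_nonblank=$curr
        prev=$curr
        out+=$curr
    done

    printf '%s\n' "$out"
}

capitalize 'first. second; third. "the prince of arabia" lost? maybe! no'
# prints: First. Second; third. "The Prince Of Arabia" lost? Maybe! No
```

No external process is started and no temporary file is needed; the price is
a loop over every character, which may be slow for very long strings.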

> $ cat builtins.bash
> str='first. second; third. "the prince of arabia" lost? maybe! no'
>
> IFS_BAK=$IFS
> IFS='.?!'
>
> for sentence in $str
> do
> sentence=${sentence##[[:space:]]}
> sentence_uppercase=${sentence^[a-z]}
> str=${str/$sentence/$sentence_uppercase}
> done
>
> IFS=$IFS_BAK
>
> printf '%s\n' "$str"
>
> $ ./builtins.bash
> First. Second; third. "the prince of arabia" lost? Maybe! No

No, you just don't get it.

Ed Morton

Sep 19, 2012, 10:49:55 AM

Thomas 'PointedEars' Lahn <Point...@web.de> wrote:

<a whole bunch of fascinating stuff snipped>

Apologies to the community, I know I should just stop responding but it's like
picking a scab, I just can't seem to stop myself. You don't often get to see
inside a mind like PointedEars' now that Sidney Lambe has stopped posting here.

PointedEars -

a) when is it appropriate to post awk solutions in this NG?
b) who should be allowed to post awk solutions in this NG?
c) which other tools, if any, should be avoided?

Regards,

Ed.


Janis Papanagnou

Sep 19, 2012, 11:15:27 AM

Am 19.09.2012 16:49, schrieb Ed Morton:
> Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
> <a whole bunch of fascinating stuff snipped>
>
> Apologies to the community, I know I should just stop responding but it's like
> picking a scab, I just can't seem to stop myself. You don't often get to see
> inside a mind like PointedEars' now that Sidney Lambe has stopped posting here.

It's better _for you_ as well, to stop replying to pointless postings.

Janis

>
> [...]

Ed Morton

Sep 19, 2012, 11:36:52 AM

I know, I know. This is the end of it, honest.

I did make an interesting discovery just a minute ago though - as I mentioned
above, PointedEars' ramblings reminded me a lot of Sidney Lambe, even the
anti-awk rhetoric. After making that connection I then noticed that in one of
his responses PointedEars had said "Score adjusted":

https://groups.google.com/d/msg/comp.unix.shell/A0ZdnI-QuKs/5JZVA_9Yz0UJ

which vaguely rang a bell that that was something Sid used to say often so I
just did a quick search and found that I was remembering correctly:

https://groups.google.com/d/msg/comp.os.linux.networking/JBePPlk5ASk/m5lbnhRu5OUJ

I do believe Sid may not have stopped posting here after all, he just has a new
name to add to his list
(https://groups.google.com/forum/?fromgroups=#!msg/alt.os.linux.slackware/3WC8aHaJGaw/1q3YzeJkgycJ).

Anyway, I'm done responding now. You're right, it's way past time to let it lie.

Thomas 'PointedEars' Lahn

Sep 20, 2012, 8:06:00 AM

Ed Morton wrote:

> a) when is it appropriate to post awk solutions in this NG?

That depends on what an awk solution is. What I think matters is that what
the shell command language already can do should not be neglected over the
understandable excitement that using awk and other tools can create. AISB,
one needs to know more than just a hammer.

So the approach should be "You can do this like that in the shell, but you
can also do it like so with sed, awk, etc. Choose what you think, given the
arguments presented so far, is best for you."

> b) who should be allowed to post awk solutions in this NG?

Everyone, with the aforementioned in mind.

> c) which other tools, if any, should be avoided?

You are still missing the point, and I am getting the impression here that
you do so *intentionally*. EOD.