On 29.04.2016 01:25, George Caswell wrote:
> On Monday, March 14, 2016 at 8:50:42 PM UTC-4, John Doe wrote: ("The joy of
> shell scripting is..")
>> infinite! Do you agree?
>
> Somewhat. For a programming language that's built around the concept of
> using a diverse set of single-purpose programs together to solve problems
> it seems to me that it doesn't actually do all that much to facilitate
> this. I think the features offered by the shell were a good fit for the era
> that spawned them, and even into the 1990s as Linux was gaining a lot of
> ground, but I think a lot of the philosophy and design has become a bit
> outdated. The shells have continued to gain new features, but I think people
> have largely stopped thinking of the shell as a design that can (and should!)
> grow and evolve over time.
>
> As an example, one of the classic tenets of the "Unix Philosophy",
> attributed to Doug McIlroy, is that text streams are the "universal
> interface", and that the environment should be independent of any specific
> system of data organization. Another (also attributed to McIlroy) is that
> programs should have a single job, and do it well, and that the aim should
> be to make these tools work well together: Simple tools combine to solve
> greater problems.
Both of those concepts are still valid and helpful; moreover, I think that
those concepts are one of the reasons why Unix (even without being advertised
by a powerful marketing division) has remained a major, and probably the most
important, OS for decades.
>
> The problem, IMO, is that the two goals become incompatible as the problems
> grow in complexity. The overall "format-agnostic" approach can be seen as a
> "lack of restrictions" but it's also a "lack of structure".
The point is that you can define the structure yourself. The advantage of the
text-based approach is manifold: in a piped stream of text you can, at any
place, inspect the data, manipulate it, and channel it onward; you need no
extra tools for that, and you do not depend on (often proprietary, often
incompatible) binary formats. Where I agree is that many of the old components
(programs) did not have a formally clean data format (ps and ls being the
well-known infamous examples), though modern versions often let you define an
output format that supports parsing the field structure.
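For illustration, any stage of a pipe can be tapped or rewritten with the
standard tools alone; a small sketch (the file names are arbitrary, and ps
is one of the just-mentioned infamous examples):

  $ ps ax | tee /tmp/raw.txt | awk 'NR > 1 { print $1 }' | sort -n > pids.txt

Here tee inspects the stream (keeping a verbatim copy in /tmp/raw.txt), awk
manipulates it (dropping the header line and extracting the first field),
and the final redirection channels the result; no format-specific tooling
is involved.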
>
> Even today there is no "universal" data format. There were various attempts
> over the years to create one - fascinating stuff, IMO, to look back with
> the benefit of hindsight and see these earnest attempts to actually unify
> all computer data in one meta-format (IFF, for instance). It's
> simultaneously amazing to look at the ambition of those efforts, and a bit
> sad to reflect on how hopeless the idea was. At this point, it's a concept
> we've outgrown. There are different formats for different purposes.
It's interesting that you mention IFF but not ASN.1; both are from around
the same time, but the latter is internationally standardized, and despite
competing with the "simple" approaches it is a good example of a "universal"
data format. (It's also present in some of the internet protocols.) Relying
on such structured [binary] data makes it much less transparent what's going
on, and much more difficult to manipulate the data. But it's there.
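To the point about transparency: even just looking at such data requires a
dedicated decoder. A sketch using a TLS server certificate, whose DER
encoding is ASN.1 (the host is an arbitrary example):

  $ openssl s_client -connect example.com:443 </dev/null 2>/dev/null |
    openssl x509 -outform der | openssl asn1parse -inform der | head -5

Compare that with grep'ing a text stream; the structure is rigorously
defined, but it is no longer something you can eyeball or edit mid-pipe.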
>
> But at the same time, these days there's things like XML and JSON -
> meta-formats that have been around for decades, and they're widely used for
> a huge variety of different jobs, and will probably be around for decades
> to come. None of these formats is "the one true format"; they won't last
> forever or unite all the world's data, but they are a huge part of
> present-day computing.
Well, actually this bulky XML seems to have become something like "the one
true format" if you consider its broad application. But given its deficiencies
it's understandable that there are alternative formats that try to overcome
them (while introducing other deficiencies of their own).
>
> As such, IMO, they are languages that the shell and its tools should know
> how to speak quite well. As it stands, this isn't the case.
>
> I think shell tools could be quite a bit better at dealing with these
> structured formats if they provided the concept of processing a "record"
> with "fields". Instead of this, for instance:
>
> $ get_records | sed -field 4 -e 's/narf/zort/g;'
>
> you get something more like this:
>
> $ get_records | sed -E -e 's/^([^:]*:[^:]*:[^:]*:[^:]*)narf/\1zort/g;'
Well, yes. But there are such tools if you want to address "fields": cut
(as the most primitive one), or awk (as a quite universal one). In awk you
can define the record separator as the ASCII RS character and the field
separator as the ASCII FS character if you like, or as anything else you
need, and do the substitution on the respective field. (Your example would
be gsub(/narf/,"zort",$4), with appropriately defined separators, say
RS="\036" and FS="\034".) And you are free not to restrict yourself to those
ASCII definitions (not least if you are actually operating in another
character-set domain).
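Spelled out, first with the colon-separated records of your sed example:

  $ printf 'a:b:c:narf and narf\n' |
    awk 'BEGIN { FS = OFS = ":" } { gsub(/narf/, "zort", $4); print }'
  a:b:c:zort and zort

and the same with the ASCII control characters (octal escapes, since "\x"
escapes are not portable across awk implementations; cat -v is only there
to make the separators visible):

  $ printf 'a\034b\034c\034narf\036' |
    awk 'BEGIN { RS = "\036"; FS = OFS = "\034" }
         { gsub(/narf/, "zort", $4); print }' | cat -v
  a^\b^\c^\zort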
>
> The "sed" command gets muddled in the syntax of the record you're trying to
> process, the low-level details of how the record is stored, just to make
> sure it applies 's/narf/zort/g;' to the intended field. (It gets more
> complicated if you want to implement escape characters, so the delimiter
> character can actually be present in a field...) And this is just one tool.
> Doing the job, and doing it right in several different tools with various
> subtle differences (in regex syntax, etc.) can be a real problem. These
> "simple tools" could be much "simpler" if they followed some convention for
> how to separate records and fields.
Actually there are conventions: usually the Unix record is a line, and the
standard field delimiters are sequences of white-space. You can adjust those
if you want to deviate from that convention (in awk, for example, by setting
the field and record separators). Many Unix text-processing tools allow you
to redefine the field separator and thus support the colon-, pipe-, comma-,
or semicolon-separated fields that are often used in practice.
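Two everyday instances of that convention, using the colon-separated
/etc/passwd as data:

  $ cut -d: -f1,7 /etc/passwd            # login name and shell
  $ awk -F: '$7 ~ /sh$/ { print $1 }' /etc/passwd   # shells ending in "sh"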
>
> In a sense, we already have that: there are control characters in low-ASCII
> (carried over into Unicode) that are meant to do things like separate
> fields and records. But they're not commonly supported in tools and even
> being as rarely used as they are I don't think I'd want to assume they'd
> never appear within a field. Even the null character (supported as a record
> separator already by some tools) might be needed within a field at some
> point.
Well, if you are familiar with the Unix text-based data interchange
philosophy, you might not want to rely on control characters (other than TAB
and NL), since you want the data structure to be easy to create (from the
keyboard) and easy to read. We should also note, when comparing the
flexibility of this data structuring to, say, ASN.1, that those structuring
ASCII characters are themselves just a very limited concept.
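TAB-delimited data, for instance, needs no options at all, since the TAB
character is cut's default delimiter:

  $ printf 'alpha\tbeta\tgamma\n' | cut -f2
  beta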
>
> This is the kind of problem I'm interested in solving, personally. If the
> tools followed a common convention on how to structure data, and how to
> work with structured data, I think they'd be much better-equipped to deal
> meaningfully with data like XML files.
I wonder how you'd then address the presentation problem if you advocate,
e.g., the ASCII FS, RS, GS, and US characters.
I'm not quite sure where you're coming from, but maybe Microsoft's
proprietary .NET-based PowerShell concept might serve you better; as I
understand it, there you can exchange objects without any visibility or
direct accessibility of the transfer syntax at all. In that sense it goes
even further than just structuring the transfer stream in a different way.
Janis