On 29.04.2016 01:25, George Caswell wrote:
> On Monday, March 14, 2016 at 8:50:42 PM UTC-4, John Doe wrote: ("The joy of
> shell scripting is..")
>> infinite! Do you agree?
>
> Somewhat. For a programming language that's built around the concept of
> using a diverse set of single-purpose programs together to solve problems
> it seems to me that it doesn't actually do all that much to facilitate
> this. I think the features offered by the shell were a good fit for the era
> that spawned them, and even into the 1990s as Linux was gaining a lot of
> ground, but I think a lot of the philosophy and design has become a bit
> outdated. The shells have continued to gain new features, but I think people
> have largely stopped thinking of the shell as a design that can (and should!)
> grow and evolve over time.
>
> As an example, one of the classic tenets of the "Unix Philosophy",
> attributed to Doug McIlroy, is that text streams are the "universal
> interface", and that the environment should be independent of any specific
> system of data organization. Another (also attributed to McIlroy) is that
> programs should have a single job, and do it well, and that the aim should
> be to make these tools work well together: Simple tools combine to solve
> greater problems.
Both of those concepts are still valid and helpful; moreover, I think that
those concepts are one of the reasons why Unix (even without being advertised
by a powerful marketing division) has remained a major, and probably the most
important, OS for decades.
>
> The problem, IMO, is that the two goals become incompatible as the problems
> grow in complexity. The overall "format-agnostic" approach can be seen as a
> "lack of restrictions" but it's also a "lack of structure".
The point is that you can define the structure yourself. The advantage of the
text-based approach is manifold: in a piped stream of text you can, at any
place, inspect the data, manipulate it, and channel it onward; you need no
extra tools for that, and you do not depend on (often proprietary, often
incompatible) binary formats. Where I agree is that many of the old components
(programs) did not have a formally clean data format (ps and ls being the
well-known infamous examples), though modern versions often let you define an
output format that supports parsing the field structure.
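For illustration, any stage of a pipe can be tapped or rewritten with the
standard tools alone; a small sketch (the file names are arbitrary, and ps
is one of the just-mentioned infamous examples):

  $ ps ax | tee /tmp/raw.txt | awk 'NR > 1 { print $1 }' | sort -n > pids.txt

Here tee inspects the stream (keeping a verbatim copy in /tmp/raw.txt), awk
manipulates it (dropping the header line and extracting the first field),
and the final redirection channels the result; no format-specific tooling
is involved.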
>
> Even today there is no "universal" data format. There were various attempts
> over the years to create one - fascinating stuff, IMO, to look back with
> the benefit of hindsight and see these earnest attempts to actually unify
> all computer data in one meta-format (IFF, for instance). It's
> simultaneously amazing to look at the ambition of those efforts, and a bit
> sad to reflect on how hopeless the idea was. At this point, it's a concept
> we've outgrown. There are different formats for different purposes.
It's interesting that you mention IFF but not ASN.1; both are from around
the same time, but the latter is internationally standardized, and despite
competing with the "simple" approaches it is a good example of a "universal"
data format. (It's also present in some of the internet protocols.) Relying
on such structured [binary] data makes it much less transparent what's going
on, and much more difficult to manipulate the data. But it's there.
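To the point about transparency: even just looking at such data requires a
dedicated decoder. A sketch using a TLS server certificate, whose DER
encoding is ASN.1 (the host is an arbitrary example):

  $ openssl s_client -connect example.com:443 </dev/null 2>/dev/null |
    openssl x509 -outform der | openssl asn1parse -inform der | head -5

Compare that with grep'ing a text stream; the structure is rigorously
defined, but it is no longer something you can eyeball or edit mid-pipe.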
>
> But at the same time, these days there's things like XML and JSON -
> meta-formats that have been around for decades, and they're widely used for
> a huge variety of different jobs, and will probably be around for decades
> to come. None of these formats is "the one true format"; they won't last
> forever or unite all the world's data, but they are a huge part of
> present-day computing.
Well, actually this bulky XML seems to have become something like "the one
true format" if you consider its broad application. But given its deficiencies
it's understandable that there are alternative formats that try to overcome
them (while introducing other deficiencies of their own).
>
> As such, IMO, they are languages that the shell and its tools should know
> how to speak quite well. As it stands, this isn't the case.
>
> I think shell tools could be quite a bit better at dealing with these
> structured formats if they provided the concept of processing a "record"
> with "fields". Instead of this, for instance:
>
> $ get_records | sed -field 4 -e 's/narf/zort/g;'
>
> you get something more like this:
>
> $ get_records | sed -E -e 's/^([^:]*:[^:]*:[^:]*:[^:]*)narf/\1zort/g;'
Well, yes. But there are such tools if you want to address "fields": cut
(as the most primitive one), or awk (as a quite universal one). In awk you
can define the record separator as the ASCII RS character and the field
separator as the ASCII FS character if you like, or as anything else you
need, and do the substitution on the respective field. (Your example would
be gsub(/narf/,"zort",$4), with appropriately defined separators, say
RS="\036" and FS="\034".) And you are free not to restrict yourself to those
ASCII definitions (not least if you are actually operating in another
character-set domain).
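Spelled out, first with the colon-separated records of your sed example:

  $ printf 'a:b:c:narf and narf\n' |
    awk 'BEGIN { FS = OFS = ":" } { gsub(/narf/, "zort", $4); print }'
  a:b:c:zort and zort

and the same with the ASCII control characters (octal escapes, since "\x"
escapes are not portable across awk implementations; cat -v is only there
to make the separators visible):

  $ printf 'a\034b\034c\034narf\036' |
    awk 'BEGIN { RS = "\036"; FS = OFS = "\034" }
         { gsub(/narf/, "zort", $4); print }' | cat -v
  a^\b^\c^\zort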
>
> The "sed" command gets muddled in the syntax of the record you're trying to
> process, the low-level details of how the record is stored, just to make
> sure it applies 's/narf/zort/g;' to the intended field. (It gets more
> complicated if you want to implement escape characters, so the delimiter
> character can actually be present in a field...) And this is just one tool.
> Doing the job, and doing it right in several different tools with various
> subtle differences (in regex syntax, etc.) can be a real problem. These
> "simple tools" could be much "simpler" if they followed some convention for
> how to separate records and fields.
Actually there are conventions: usually the Unix record is a line, and the
standard field delimiters are sequences of white-space. You can adjust those
if you want to deviate from that convention (in awk, for example, by setting
the field and record separators). Many Unix text-processing tools allow you
to redefine the field separator and thus support the colon-, pipe-, comma-,
or semicolon-separated fields that are often used in practice.
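Two everyday instances of that convention, using the colon-separated
/etc/passwd as data:

  $ cut -d: -f1,7 /etc/passwd            # login name and shell
  $ awk -F: '$7 ~ /sh$/ { print $1 }' /etc/passwd   # shells ending in "sh"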
>
> In a sense, we already have that: there are control characters in low-ASCII
> (carried over into Unicode) that are meant to do things like separate
> fields and records. But they're not commonly supported in tools and even
> being as rarely used as they are I don't think I'd want to assume they'd
> never appear within a field. Even the null character (supported as a record
> separator already by some tools) might be needed within a field at some
> point.
Well, if you are familiar with the Unix text-based data interchange
philosophy, you might not want to rely on control characters (other than TAB
and NL), since you want the data structure to be easy to create (from the
keyboard) and easy to read. We should also note, when comparing the
flexibility of this data structuring to, say, ASN.1, that those structuring
ASCII characters are themselves just a very limited concept.
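TAB-delimited data, for instance, needs no options at all, since the TAB
character is cut's default delimiter:

  $ printf 'alpha\tbeta\tgamma\n' | cut -f2
  beta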
>
> This is the kind of problem I'm interested in solving, personally. If the
> tools followed a common convention on how to structure data, and how to
> work with structured data, I think they'd be much better-equipped to deal
> meaningfully with data like XML files.
I wonder how you'd then address the presentation problem if you advocate,
e.g., the ASCII FS, RS, GS, and US characters.
I'm not quite sure where you're coming from, but maybe Microsoft's
proprietary .NET-based PowerShell concept might serve you better; as I
understand it, there you can exchange objects without any visibility or
direct accessibility of the transfer syntax at all. In that sense it goes
even further than just structuring the transfer stream in a different way.
Janis