On 28.07.2012 05:11, Jonadab the Unsightly One wrote:
> On Jul 27, 8:40 am, Janis Papanagnou <janis_papanag...@hotmail.com>
> wrote:
>>>> ASCII defines a character set, no more no less.
>>
>>> Yes, but it defines *meanings* for some of those
>>> characters,
>>
>> You cannot isolate the "meaning" from the interpreting
>> device.
>
> That's the stupidest thing you've said in this entire
> thread. ASCII and Unicode would both have no reason to
> exist if that were so.
What I wrote is so fundamentally "a Truth" that I wonder why
you comment in such a personal way, since I usually got the
impression that you think about issues more deeply. I suggest
you rethink that.
I would therefore prefer to abstain from commenting on the
rest of your posting, since you seem to have missed even some
of the very basics here.
Also remember: we have been looking for a general standard
for a "text file" specification in general, or for "text file"
line terminators specifically. And yet no one has provided one.
That should suffice for most readers. The rest is for the [OT]
hard-liners...
>
> A character set certainly can assign a particular meaning to
> a given number.
Example: 'I' (ASCII 73)
Meaning: Capital Letter I
Meaning: Roman Numeral 1
Meaning: Chemical Element Iodine
etc.
The fact that the ASCII description is "CAPITAL LETTER I" does
not restrict the meanings for other devices (human, algorithmic,
hardware, software). The meaning depends on the processing device.
Send 'I' to one device and you get a different reaction than from
another device. Don't ever assume that 'I' will have a generic
meaning for all devices.
Now back to "text files"; you want to terminate a line, so you
have to choose some line terminator. You need one unambiguous
character, so you choose a control code. You want to choose one
that resembles the entity "line", so you may choose a "line feed",
LF. Others may have thought differently and considered a "carriage"
appropriate, because they had the picture in mind that a text is
something to be printed out; on a typewriter you have the carriage
lever with which you position the carriage to the left (and the
line feed is done implicitly), so a simple
CR is sufficient. And again others will decide to use CR and LF.
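To make the three choices concrete, here is a minimal Python sketch. The sample lines and the helper names encode() and decode() are made up for illustration; the sketch only shows that each convention is self-consistent, and that trouble starts when writer and reader assume different ones.

```python
# Hypothetical sample data: the same three lines of text.
LINES = [b"alpha", b"beta", b"gamma"]

def encode(lines, terminator):
    """Write every line (including the last) followed by the terminator."""
    return b"".join(line + terminator for line in lines)

def decode(data, terminator):
    """Split data into lines, assuming each line ends with the terminator."""
    parts = data.split(terminator)
    # Properly terminated data leaves one empty piece after the last terminator.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts

lf_file = encode(LINES, b"\n")        # LF only
cr_file = encode(LINES, b"\r")        # CR only
crlf_file = encode(LINES, b"\r\n")    # CR followed by LF

# Each convention round-trips as long as both sides agree on it.
assert decode(lf_file, b"\n") == LINES
assert decode(cr_file, b"\r") == LINES
assert decode(crlf_file, b"\r\n") == LINES

# A reader that assumes LF gets stray CR bytes from CR-LF data,
# and sees CR-only data as one single unterminated line.
print(decode(crlf_file, b"\n"))
print(decode(cr_file, b"\n"))
```

None of the three is more "correct" than the others here; the result depends only on which terminator the decoding side was told to expect.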
So what is the truth? What is a "false" design? What is right?
It depends on the interpreting device what you get. And for that
standards are helpful. And you want a standard for "text files".
But we don't have one! We have many that define character sets,
including control codes. ASCII for example. Do you think we
have to assume that all ASCII codes shall be used (if somehow
appropriate)? Why shall we use, for "text files", a "carriage
concept"? But why shall we *not* use a STX (Start of text) at
the beginning of a file, or a RS (Record Separator) between
lines? - Clear now? - If not, I really can't help, I'll bite.
"If all you have is a hammer everything looks like a nail."
Obviously, your hammer is ASCII. And a very specific view of how
to use it.
> [snip meaningless and already covered text]
>> Devices, ASCII (or other) control codes, and interpreting
>> control characters, are related. That's covered by most of
>> what I posted in the thread.
>
> Of course they're related. They're related in a very
> particular way. You seem, however, to be under the
> impression that an operating system does not qualify as a
> device (it certainly does), or that it is normal and
> expected that devices claiming to support the same standard
> do so in contradictory, mutually incompatible ways that
> violate the spirit and letter of the standard in question.
No, I (basically) said that you cannot assume a "text file"
line terminator to contain a 'CR' ASCII control code. You
seem to see some contradictory behaviour in the Unix OSes.
You claimed that CR-LF is right and everything else wrong.
I think you should ponder 'STX' again here, and why,
for you, an STX or RS may be invalid (per "meaning") and why
a CR would be necessary to have a "correct" text file.
Actually, you don't need STX, ETX, CR, etc. for the definition
of a "text file" structure. And there's no general standard
for the latter. Assuming that using CR is "right" and not using
it is "wrong" is arbitrary.
> [snip stuff that has already been answered]
>
> [CR+LF] Unix (and
> C for that matter; the two were developed together) uses the
> line feed character to signify the sum of both these things
> together.
Well, no. Unix uses a control character LF to indicate that a
line in a "text file" is complete. (Ready to feed a new line,
if you like comprehensive pictures.)
Your mental model seems to be somehow focussed on using CR as
well.
Both are arbitrary, both fit to some degree, depending on the
mindset. Neither of them is even necessary. Neither of them is
even standard for "text files".
(And even ASCII defines other codes for data structuring; see
below.)
> [snip some more that has already been answered]
>> You will also get errors if you are sending text files (either
>> LF, CR-LF, or CR terminated) over a BER encoded
>> ASN.1 defined X.400 mail system. So what?!
>
> Now you're just being deliberately obtuse.
Verbal attacks won't help you to understand the issue. Better
think more deeply about what has been said. I will elaborate...
> BER-encoded
> ASN.1 defined X.400 mail systems aren't based on the
> ASCII specification in the same way that telnet and
> SMTP are. These protocols handle the linefeed character
> the way they do because ASCII says it means a certain
> thing.
The point that has been made was: those protocols *define* what
they expect as the data format to be exchanged; they explicitly
specify, e.g., that the underlying character set is ASCII, and
that the separator is CR-LF. Every communication protocol
has to define those issues for interoperability. The same is
true for X.400; it also defines the transfer syntax for valid
X.400 messages (a binary format in the case of BER). If they did
not do that, they would just *not work* in practice.
You said in the posting I responded to that you will get errors
if you feed (only-)LF-terminated text directly into an SMTP (or
NNTP) message body. And of course you may get errors! Because
those protocols define other data formats to use. If you feed
something into a protocol that it is not defined for, you will
get errors; non-conforming text in SMTP as well as in X.400.
That was the point! Okay?
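As a rough illustration of "the protocol defines the format", here is a minimal Python sketch of adapting LF-terminated text to a CR-LF wire format. The function names to_wire_format() and has_bare_lf() are hypothetical, and this is in no way a complete SMTP or NNTP encoder; it covers only the line terminator.

```python
def to_wire_format(text: bytes) -> bytes:
    """Convert LF-terminated text to CR-LF termination.

    Existing CR-LF pairs are normalised to LF first, so that they
    do not end up as CR-CR-LF after the conversion.
    """
    return text.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def has_bare_lf(data: bytes) -> bool:
    """Report whether data contains an LF that is not preceded by CR."""
    # Remove all CR-LF pairs; any LF that is left over was a bare one.
    return b"\n" in data.replace(b"\r\n", b"")

unix_text = b"first line\nsecond line\n"

# The local file is perfectly fine as it is, but it does not conform
# to a protocol that specifies CR-LF as its separator.
assert has_bare_lf(unix_text)

wire = to_wire_format(unix_text)
assert not has_bare_lf(wire)
```

The conversion belongs to the boundary between the file and the protocol. It says something about what the protocol demands, and nothing about whether the file was a "correct" text file beforehand.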
You have specified communication protocols on one side, OTOH
you still have unspecified "text files".
(Remember? We were talking about "text file" specifications!)
All references to communication protocols help to understand,
implement, or use those protocols; but for the issue here that
is nothing but a red herring.
Clear now?
>
>> (( $(wc -l < A) + $(wc -l < B) == $(cat A B | wc -l) ))
>>
>> should always be an invariant. ALWAYS!
>
> Now you're talking about POSIX in the modern (post-BSD)
*sigh* - no, I am *not* talking about POSIX. I am talking about
_a functional invariant_.
> world, which is the way it is precisely because Unix was the
> way it was and did what it did.
I described a functional property that is independent of Unix.
The consequence of not considering functional invariants is one
major reason for inconsistent systems (AKA "broken by design").
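One way to read the invariant, sketched in Python with made-up sample data: if a "line" is counted by its terminator (which is what wc -l does, it counts newline characters), concatenation preserves the sum. If the same character is instead treated as a separator between lines, the sum breaks. The helper names below are hypothetical.

```python
def count_terminated(data: bytes) -> int:
    """Count lines as the number of LF terminators, like wc -l."""
    return data.count(b"\n")

def count_separated(data: bytes) -> int:
    """Count lines as the pieces between LF separators (alternative view)."""
    return len(data.split(b"\n")) if data else 0

# Hypothetical contents of the two files A and B.
a = b"one\ntwo\n"
b = b"three\n"

# Terminator view: lines(A) + lines(B) == lines(A concatenated with B).
assert count_terminated(a) + count_terminated(b) == count_terminated(a + b)

# Separator view: the trailing LF produces an extra empty "line" per file,
# so the two sides of the equation no longer agree.
print(count_separated(a) + count_separated(b), count_separated(a + b))
```

The property is stated purely in terms of counting and concatenation, which is why it does not depend on Unix, POSIX, or any particular choice of terminator character.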
>
> It is certainly true that why we can't now go back and
> implement a Unix system that interprets linefeeds according
> to the ASCII spec.
There's no need to do so.
> It's far too late to fix it now.
There's nothing to "fix" [in Unix], since there's no bug.
But it's certainly too late to have a unique standard for "text
files".
> [...]
>
>> So let me repeat also; for line termination a
>> single control character is sufficient,
>
> In the abstract, yes.
>
> But no single ASCII character by itself was intended to be
> used in this way, because ASCII (very much unlike Unicode)
> went out of its way not to define redundant characters, in
> order to fit in seven bits.
You should explain this opinion on ASCII.
> You will notice, if you study the matter closely,
LOL! - Be assured I did, for decades, very closely. :-)
> that ASCII does not for example define
> distinct characters for single quote and apostrophe. One
> character for both was sufficient. There's also no ellipsis
> character, because series of three periods conveyed the same
> information. There's no copyright symbol, because you can
> just say (C).
See, you must take into account the time when it was developed;
at that time there were still 5 bit (Telex) and 6 bit (mainframes)
character sets in use. But even with 7 bit (ASCII) you can
represent only a fraction of what's needed worldwide. The problem,
from my perspective, was not the missing ellipsis or copyright
symbol (well, maybe you see it that way, since the 'A' in ASCII
clearly indicates the country context of the development); rather,
not even the characters that are used and necessary in the European
countries are available. The point is: 7 bit is just limited, and
the US of 'A' standards bodies primarily took care of their own
(limited) needs. So you count the characters and code positions you
have (52 letters + 10 digits + 16 [necessary] punctuation/space)
and you fill the free positions with codes that *may* be useful.
ASCII control codes are for "Information Interchange", with
everything that may be necessary, including, for example, codes
for signalling, which are completely unnecessary for "text file"
definitions. But not all control codes are necessary for every
purpose; that depends on the actual communication devices.
> There's no "end of line" character, since
> that would basically just be the sum of carriage return and
> linefeed anyhow, so just use that.
For data structuring I see (at least) four(!) ASCII codes that
could be used; one of them RS ('Record Separator', ASCII 30)
even seems to have been used by some systems (as you see in the
reference that I posted
http://en.wikipedia.org/wiki/Line_feed).
So, as I see it, those claims and assumptions that you make are
arbitrary, and partly not even correct. None of the four codes
for data structuring (FS, GS, RS, US) is used by the established
systems for line endings; orthogonality of codes can hardly be
seen as a reason for defining two device control codes, CR and LF,
while ignoring the existing data structuring codes, when it comes to a...
..."text file" specification.
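For illustration, a minimal Python sketch of a text format that uses the ASCII data structuring codes instead of CR or LF. The layout (RS ends a record, US separates fields within it) and the names pack() and unpack() are assumptions chosen for this example, not an established file format.

```python
# The four ASCII data structuring codes (decimal 28, 29, 30, 31).
FS, GS, RS, US = b"\x1c", b"\x1d", b"\x1e", b"\x1f"

def pack(records):
    """Encode records (lists of byte-string fields); RS ends each record."""
    return b"".join(US.join(fields) + RS for fields in records)

def unpack(data):
    """Decode data produced by pack()."""
    chunks = data.split(RS)
    # Every record is RS-terminated, so the last piece is empty.
    if chunks and chunks[-1] == b"":
        chunks.pop()
    return [chunk.split(US) for chunk in chunks]

# Hypothetical sample: a field may even contain LF or CR-LF as plain data,
# because neither of them carries structural meaning in this format.
records = [
    [b"1", b"a field with\nan embedded LF"],
    [b"2", b"a field with\r\nan embedded CR-LF"],
]

assert unpack(pack(records)) == records
```

Whether such a file counts as a "text file" is again a matter of what the reading side expects; the codes themselves are as much a part of ASCII as CR and LF are.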
If you have problems understanding those points, I fear we have
to agree that we disagree. Nothing wrong with that.
PS: If you want to extend the communication on that (off-)topic,
please abstain from calling me "stupid" and "obtuse" again.
With your knowledge and reasoning I think you're not in a
particularly good position for offences and insults like that
either. Thanks for your understanding.
Janis
> [...]