How To Patch Nethack??

William

Jul 19, 2012, 5:46:40 AM
Today I tried to add
Convict Role 0.7 and AnyPet 1.0
(Both on http://bilious.alt.org/)
to vanilla NetHack, but although I followed all the instructions on NetHackWiki
(http://nethackwiki.com/wiki/Patching), it still wouldn't work. When I start patch.exe using a batch file, I only got a blank command line screen with the title C:\***\patch.exe. Five minutes passed, still a blank window. What is the problem? I am using a Win7 system. Is it because the software doesn't work on the system? (http://nethackwiki.com/wiki/Patch says that it has problems on Vista.)
Please help me.

Nephi

Jul 19, 2012, 11:15:40 AM
On Thursday, July 19, 2012 3:46:40 AM UTC-6, William wrote:
> Today I tried to add
> Convict Role 0.7 and AnyPet 1.0
> (Both on http://bilious.alt.org/)
> to vanilla NetHack, but although I followed all the instructions on NetHackWiki
> (http://nethackwiki.com/wiki/Patching), it still wouldn't work. When I start patch.exe using a batch file, I only got a blank command line screen with the title C:\***\patch.exe. Five minutes passed, still a blank window. What is the problem? I am using a Win7 system. Is it because the software doesn't work on the system? (http://nethackwiki.com/wiki/Patch says that it has problems on Vista.)
> Please help me.

Exactly what syntax are you using? Try something like this:
patch -p0 < whatever.patch

Though I like to do this first:
patch --dry-run -p0 < whatever.patch
just to make sure everything is OK before making real changes.

Jorgen Grahn

Jul 19, 2012, 2:41:23 PM
On Thu, 2012-07-19, Nephi wrote:
> On Thursday, July 19, 2012 3:46:40 AM UTC-6, William wrote:
>> Today I tried to add
>> Convict Role 0.7 and AnyPet 1.0
>> (Both on http://bilious.alt.org/)
>> to vanilla NetHack, but although I followed all the
>> instructions on NetHackWiki
>
> Exactly what syntax are you using? Try something like this:
> patch -p0 < whatever.patch

He said he followed the instructions, which are:

patch -p1 < nh343-menucolor.diff

but it seems to me he must have made an error. The symptom is exactly
what you get if you forget the '<' redirection, and patch starts
waiting for text from the terminal.
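
One way to sidestep a lost or mangled '<' altogether is to name the patch file with GNU patch's -i option, so patch never reads from the terminal. The following is only a minimal sketch in Python: it assumes a GNU patch executable is on the PATH, and the helper names build_patch_argv and run_patch are made up for illustration.

```python
import subprocess


def build_patch_argv(patch_file, strip=1, dry_run=True):
    """Build an argument list for GNU patch that names the patch file explicitly."""
    argv = ["patch", f"-p{strip}"]
    if dry_run:
        argv.append("--dry-run")  # report what would happen, change nothing
    argv += ["-i", patch_file]    # -i: read the patch from this file, not from stdin
    return argv


def run_patch(patch_file, source_dir, strip=1, dry_run=True):
    """Run patch inside source_dir and return (exit_code, combined_output).

    patch_file should be an absolute path, because the command runs with
    source_dir as its working directory.
    """
    result = subprocess.run(
        build_patch_argv(patch_file, strip, dry_run),
        cwd=source_dir,
        stdin=subprocess.DEVNULL,   # never sit waiting for keyboard input
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout
```

Calling run_patch with dry_run=True first, and only then with dry_run=False, mirrors the --dry-run habit suggested earlier in the thread; the captured output stays available even if no console window stays open.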

>> it still wouldn't work. When I start patch.exe using a
>> batch file,

Hm, that's not what the instructions say ... OP, did you do *both*, or
didn't you follow the instructions after all?

>> I only got a blank command line screen with the title
>> C:\***\patch.exe. Five minutes passed, still a blank window.

>> What is the problem? I am using a Win7 system. Is it because the
>> software doesn't work on the system?
>> (http://nethackwiki.com/wiki/Patch says that it has problems on Vista)

Seems unlikely that that would give that problem ... but I don't know
much about Windows.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

pillow13...@gmail.com

Jul 20, 2012, 2:13:48 AM
On Thursday, July 19, 2012 at 11:15:40 PM UTC+8, Nephi wrote:
> On Thursday, July 19, 2012 3:46:40 AM UTC-6, William wrote:
> > Today I tried to add
> > Convict Role 0.7 and AnyPet 1.0
> > (Both on http://bilious.alt.org/)
> > to vanilla NetHack, but although I followed all the instructions on NetHackWiki
> > (http://nethackwiki.com/wiki/Patching), it still wouldn't work. When I start patch.exe using a batch file, I only got a blank command line screen with the title C:\***\patch.exe. Five minutes passed, still a blank window. What is the problem? I am using a Win7 system. Is it because the software doesn't work on the system? (http://nethackwiki.com/wiki/Patch says that it has problems on Vista.)
> > Please help me.
>
> Exactly what syntax are you using? Try something like this:
> patch -p0 < whatever.patch
>
> Though I like to do this first:
> patch --dry-run -p0 < whatever.patch
> just to make sure everything is OK before making real changes.

I used "patch.exe -p1 < file"

pillow13...@gmail.com

Jul 20, 2012, 2:19:48 AM
>> it still wouldn't work. When I start patch.exe using a
>> batch file,

Hm, that's not what the instructions say ... OP, did you do *both*, or
didn't you follow the instructions after all?

I tried using a cmd window, but it says "can not find the file"
when I look at the batch window carefully, I see that the "patch.exe -p1 < file"
has turned into "patch.exe -p1 <0file"
I wonder if that is the problem..

William

Jul 20, 2012, 2:37:00 AM
I even tried to use the patch command on ophcrack (based on some kind of Linux),
but it said that it cannot find include/decl.h.

Jorgen Grahn

Jul 20, 2012, 7:23:53 AM
On Fri, 2012-07-20, pillow13...@gmail.com wrote:
>>> it still wouldn't work. When I start patch.exe using a
>>> batch file,
>
> Hm, that's not what the instructions say ... OP, did you do *both*, or
> didn't you follow the instructions after all?

Please quote properly! I wrote the paragraph above, in reply to the
topmost one.

> I tried using a cmd window, but it says "can not find the file"

Which one -- the patch executable or the patch?

> when I look at the batch window carefully, I see that the "patch.exe -p1 < file"
> has turned into "patch.exe -p1 <0file"
> I wonder if that is the problem..

Your newsreader (IIRC the new Google Groups with even more bugs)
mangles your text, but not that part, it seems. So it said:

patch.exe -p1 <0file

Then yes of course that's a problem. Why should it work? You're telling
it the patch it should read is called 0file, and I suspect the file
doesn't have that name.

SoothSayer

Jul 20, 2012, 11:26:37 PM
Have you tried making a DOSBox environment and running it from within
that?

I run all my old legacy PCB layout and CAD apps within that.

You can point it to a directory and call that "c:", and everything under
it, including the directory structures from those "file not found" errors,
should then be reachable, if you start out with it all under one directory, that is.

William

Jul 21, 2012, 4:16:23 AM
I tried to patch it again in the cmd line today.
It says things like "Assertion failed, hunk, file patch.c, line 343,"
I read http://nethackwiki.com/wiki/Patching
"note: On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given. "
and tried the command "patch.exe --binary -p1 < file"
It jumped out a blank window which disappeared quickly.
I compiled it after that, but it was like I have never patched it.
Am I doing anything wrong?
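
Since the wiki note quoted above hinges on which line endings the patch file uses, it can help to look before reaching for '--binary'. This is a minimal sketch in Python, not part of the wiki instructions; the function names are hypothetical.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LFs and bare CRs in a block of bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def count_line_endings_in_file(path) -> dict:
    """Read a file in binary mode (no newline translation) and count its endings."""
    with open(path, "rb") as handle:
        return count_line_endings(handle.read())
```

A patch file that reports only "LF" is a Unix-style file; one that reports a mix has probably been partially converted, which is worth fixing before running patch at all.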

SoothSayer

Jul 21, 2012, 12:04:12 PM
On Sat, 21 Jul 2012 01:16:23 -0700 (PDT), William
<pillow13...@gmail.com> wrote:

snip

>I compiled it after that, but it was like I have never patched it.
>Am I doing anything wrong?

The patch doesn't take but a second to do. All it does is change the
files needed to alter the code so that the subsequent compile has the
changes in it.

The old way was a list of files and the edited strings, etc., and hand
edit sessions.

All the "patch" does is perform those edits for you so all you need to
do is "patch" and "compile" like you normally do.

It is because "compilers" were programmers in the early days, and they
knew what was going on. Nowadays most "compilers" are "users" who have
limited understanding of what is actually going on, but they know how to
go through the simplified compile steps, which the developers set up so
that it would be easy for a simple user-level person to compile.

pillow13...@gmail.com

Jul 23, 2012, 1:55:37 AM
On Sunday, July 22, 2012 at 12:04:12 AM UTC+8, SoothSayer wrote:
> On Sat, 21 Jul 2012 01:16:23 -0700 (PDT), William
> <pillow13...@gmail.com> wrote:
>
> snip
>
> >I compiled it after that, but it was like I have never patched it.
> >Am I doing anything wrong?
>
> The patch doesn't take but a second to do. All it does is change the
> files needed to alter the code so that the subsequent compile has the
> changes in it.
>
> The old way was a list of files and the edited strings, etc., and hand
> edit sessions.
>
> All the "patch" does is perform those edits for you so all you need to
> do is "patch" and "compile" like you normally do.
>
> It is because "compilers" were programmers in the early days, and they
> knew what was going on. Nowadays most "compilers" are "users" who have
> limited understanding of what is actually going on, but they know how to
> go through the simplified compile steps, which the developers set up so
> that it would be easy for a simple user-level person to compile.

But what do I have to do to make it work?

TheQuickBrownFox

Jul 23, 2012, 11:32:24 PM
On Sun, 22 Jul 2012 22:55:37 -0700 (PDT), pillow13...@gmail.com
wrote:

>On Sunday, July 22, 2012 at 12:04:12 AM UTC+8, SoothSayer wrote:

Read and try my original response.

Might work. might not.

I play Vulture's Eye on my iPad!

Since it has mouse awareness, it has tap awareness!

So, I use an RDP to open my windows desktop remotely, and get pure pixel
for pixel performance.

It is actually the best RDP app I have yet used.

It was $12, called "iTap RDP". Pretty awesome. Of course, I have a bt
keyboard too!

I should take a screen shot of Vulture's Eye on my iPad (a photo
actually) and put it on an iPad chat site or nethack site and watch the
queries for how I did it flow in.

Hell, I could troll the hell out of them by running impossible apps on
it and posting videos or such. Hehehehehee. *This* RDP app is pretty
tight.

Jonadab the Unsightly One

Jul 25, 2012, 8:21:34 AM
On Jul 19, 5:46 am, William <pillow1301300...@gmail.com> wrote:
> I followed all the instructions on NetHackWiki
> (http://nethackwiki.com/wiki/Patching),
> it still wouldn't work.

Patching and compiling on Windows is certainly possible, but it's
more work (than on other platforms). Among other things...

Windows doesn't come with patch, so you have to obtain and
install that. I assume you believe you have done this, but have
you verified that the version of patch you installed works on
your system (e.g., by applying a simple patch to a simple text
file and checking that it applies correctly)? The wiki article
you cite suggests getting patch from GnuWin32 as a zip archive,
but it neglects to mention that you can't necessarily run
anything from a zip archive. You have to actually install it.
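
For the "simple patch to a simple text file" check, something like the following could generate the two files. It is only a sketch in Python using the standard difflib module; the file names hello.txt and hello.diff and both function names are made up for illustration.

```python
import difflib
import os

OLD_LINES = ["first line\n", "second line\n", "third line\n"]
NEW_LINES = ["first line\n", "second line (patched)\n", "third line\n"]


def make_smoke_test():
    """Return {filename: text} for a tiny text file and a unified diff against it."""
    diff = difflib.unified_diff(
        OLD_LINES, NEW_LINES, fromfile="hello.txt", tofile="hello.txt"
    )
    return {"hello.txt": "".join(OLD_LINES), "hello.diff": "".join(diff)}


def write_smoke_test(directory, newline="\r\n"):
    """Write both files into directory, using CRLF line endings by default."""
    for name, text in make_smoke_test().items():
        path = os.path.join(directory, name)
        # newline="\r\n" makes Python write every "\n" as a CRLF pair
        with open(path, "w", newline=newline) as handle:
            handle.write(text)
```

After write_smoke_test has filled an empty directory, running "patch -p0 < hello.diff" from a command window opened in that directory should change the second line of hello.txt. If even that fails, the problem is the patch installation rather than the NetHack patches.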

Windows doesn't come with a compiler either, so you'll need to
obtain and install that too.

Windows doesn't come with basic build tools either (e.g., make),
so you'll need to obtain and install those as well, plus any
other basic system tools that the NetHack build process takes for
granted. (NetHack comes from the Unix world, so it may just
_assume_ that every computer has stuff like sed and cat and pwd
and touch and tail and so on. I don't happen to know exactly
what its build process uses, but you may run into things that you
need.)

PATH management on Windows is a royal pain in the neck (due to
fundamental features of the way the directory structure is
organized, which is very different from a Unix layout), so you'll
probably end up having to type full paths to executables on the
command line sometimes. The instructions may not always indicate
this when it's necessary, because the people who wrote the
instructions either weren't using Windows or had heavily
customized their system, putting dozens of hours into getting
their PATH exactly just so. (This is difficult because of the
draconian limits Windows places on the length of environment
variables, in combination with the fact that every program you
need adds another directory to the path, not to mention the whole
Progra~1 problem. When I used Windows on a regular basis, I had
an elaborate set of batch files that managed my PATH for various
kinds of tasks. The SUBST command was heavily involved. It was
a nightmare to maintain. Every time I installed a new utility --
something you do constantly on Windows because it doesn't come
with anything -- I had to figure out how to make room for it in
the PATH. Fun times.)

NetHack being from the Unix world, you'll probably be working
with some files that have Unix-style line endings (a single
linefeed character instead of the standard ASCII CRLF pair).
These may need to be converted. The instructions for doing so in
the Wiki article you cited are almost valid but are error prone
and in any case will result in files that have the wrong filename
extension, which an out-of-the-box Windows install will hide from
you by default. (Anyone who has used Windows extensively will
have turned that particularly ill-conceived feature of Windows
Explorer off a long time ago, but if you don't know what I'm
talking about it implies that you probably haven't done so.)
Besides turning the extension hiding behavior off in the Folder
Options, you may also want to consider using a more reliable
line-ending conversion method in the first place, such as
unix2dos or a proper text editor.
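
If unix2dos is not at hand, the conversion itself is small enough to sketch. The following Python is an illustration under stated assumptions, not a replacement for a tested tool: the function names are hypothetical, a lone CR is treated as a line ending too, and it should only be pointed at text files.

```python
def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending (LF, CR or CRLF) turned into CRLF."""
    # Normalize to bare LF first so existing CRLF pairs are not doubled.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", b"\r\n")


def convert_file_in_place(path):
    """Rewrite one text file with CRLF endings, keeping its name and extension."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = to_crlf(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
```

Working on bytes and writing back to the same path avoids both of the problems mentioned above: no newline translation happens behind your back, and the file never picks up a different extension.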

> tried the command "patch.exe --binary -p1 < file"
> It jumped out a blank window which disappeared quickly.

Two things here:

1. The patch was almost certainly not intended to be used
in binary mode. The correct solution is to convert the
line endings of all the files involved (both the patch
itself and the source files that it patches) to full
CRLF pairs. (They *should* be that way in the first
place, but stuff from the Unix world often is not,
especially older stuff, and NetHack is old.)

2. If you're getting a blank window that disappears
quickly, something is wrong. You should not be getting
a separate window, other than the command window that
you already have open and are using to run the commands
in question, which should remain open. Surely you're
not trying to do command-line stuff by double-clicking
on batch files in Windows Explorer? Because, if you do
that, you'll never be able to read any error messages to
find out what's going wrong. If you're already using a
command window and a second one opens up, I'm not sure
what could be causing that.

Janis Papanagnou

Jul 25, 2012, 10:51:26 AM
On 25.07.2012 14:21, Jonadab the Unsightly One wrote:
>
> NetHack being from the Unix world, you'll probably be working
> with some files that have Unix-style line endings (a single
> linefeed character instead of the standard ASCII CRLF pair).

ITYM; "instead of the CRLF pair that is standard on Windows".

Just to be sure no one will falsely assume that CRLF would be
some general standard for text file line terminations.

Janis

MrTallyman

Jul 25, 2012, 9:14:39 PM
Well, SOMEONE should have made a decision that EVERYONE followed about
4 decades ago.

Same thing happens with idiots behind the wheel of a car.

Pick any ten, and I can guarantee that you will get ten different
reactions to any given scenario.

This is the reason why folks buy foreign cars.

Nobody over here can make up their mind, and any that do will not share
any choices made for fear of being competed with.

Somebody should come out with a stainless car that lasts for decades.

No, DMC wasn't it.

Jonadab the Unsightly One

Jul 25, 2012, 10:09:10 PM
On Jul 25, 10:51 am, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:
> > NetHack being from the Unix world, you'll probably be working
> > with some files that have Unix-style line endings (a single
> > linefeed character instead of the standard ASCII CRLF pair).
>
> ITYM; "instead of the CRLF pair that is standard on Windows".

No, actually, Unix gets this one wrong. I'm not a big Windows fan,
but their interpretation of the ASCII spec on this issue is absolutely
correct -- and, I might add, matches the interpretation of most
pre-PostScript printer manufacturers AND most application-layer
network protocols, especially ones that typically run over TCP/IP.

> Just to be sure no one will falsely assume that CRLF would be
> some general standard for text file line terminations.

Technically, per the ASCII spec (on which almost all current
text encoding specifications, including Unicode, are based) a
linefeed by itself is a control character that means move directly
down one line, to the character directly under the previous position.

Jonadab the Unsightly One

Jul 25, 2012, 10:15:50 PM
On Jul 25, 9:14 pm, MrTallyman <MrTally...@BananaCountersRUs.org>
wrote:
[Line-ending conventions]
> Well, SOMEONE should have made a decision that
> EVERYONE followed about 4 decades ago.

The good news is, CR-only is pretty well dead at this point.
A lot of software (especially text editors) still supports it
as an option, but it's no longer the default on pretty much
anything.

Unix did have a reason for deviating from the spec:
saving one byte per line of text was actually considered
significant at the time. The good news is, almost all
Unix software can handle standard CRLFs, so bringing
files over from Windows to Unix is seldom a problem.

However, a lot of Unix software still generates LF-only
line endings by default, and some Windows-only software
(very notably, Notepad) does not support that, so taking
Unix files over to a Windows system often necessitates
conversion.
Message has been deleted

Janis Papanagnou

Jul 26, 2012, 2:29:09 AM
On 26.07.2012 04:09, Jonadab the Unsightly One wrote:
> On Jul 25, 10:51 am, Janis Papanagnou <janis_papanag...@hotmail.com>
> wrote:
>>> NetHack being from the Unix world, you'll probably be working
>>> with some files that have Unix-style line endings (a single
>>> linefeed character instead of the standard ASCII CRLF pair).
>>
>> ITYM; "instead of the CRLF pair that is standard on Windows".
>
> No, actually, Unix gets this one wrong. I'm not a big Windows fan,
> but their interpretation of the ASCII spec on this issue is absolutely

ASCII defines a character set, no more no less. What you would need
is a specification for "text files"; there isn't any. There isn't
any standard for _text file line endings_.

All you derive from that view is arbitrary, so...
> [snip]

>> Just to be sure no one will falsely assume that CRLF would be
>> some general standard for text file line terminations.
>
> Technically, per the ASCII spec (on which almost all current
> text encoding specifications, including Unicode, are based) a
> linefeed by itself is a control character that means move directly
> down one line, to the character directly under the previous position.

This is also not correct; you have to distinguish control character
handling of devices from data format specifications. The first one
depends on the device interpreting the control characters; e.g. ^G
(ASCII BEL, ASCII code 7) rings a bell on some devices, flashes on
other devices, does nothing on yet other devices, and may fire
missile weapons on still others. WRT text file formats, I
repeat, there is no general specification; the most extreme example, but
good enough to make the issue apparent, is mainframes, which often
had no line terminators at all. You certainly know the contemporary
CR, LF, and CRLF system variants. Some programs even omit a final
line termination at the end of the file; stupid programmers!

ASCII has defined a couple of codes that may be used to control
hardware devices; you gave printers as an example. In some old printers
there could be interest in controlling line feeds individually. But not
generally. E.g. for a teletypewriter with a carriage there's a need
for some carriage control code, and for some line feed control code.
For the line printers common at that time (which had one chain with
letter types for each print column!) there is no need for a CR code.
Even for contemporary laser printers there is no necessity for a CR
and no need for LF. But that is about (ASCII or other) control codes
for devices; the issue here is about text file line endings, where
there is no standard. CR, LF, and CR-LF are just used because there was
a feeling that it somehow "matches" and (in modern architectures) you
will need to use some non-printable code.

Janis

Janis Papanagnou

Jul 26, 2012, 2:43:19 AM
On 26.07.2012 04:15, Jonadab the Unsightly One wrote:
> On Jul 25, 9:14 pm, MrTallyman <MrTally...@BananaCountersRUs.org>
> wrote:
> [Line-ending conventions]
>> Well, SOMEONE should have made a decision that
>> EVERYONE followed about 4 decades ago.
>
> The good news is, CR-only is pretty well dead at this point.
> A lot of software (especially text editors) still supports it
> as an option, but it's no longer the default on pretty much
> anything.
>
> Unix did have a reason for deviating from the spec:
> saving one byte per line of text was actually considered
> significant at the time. The good news is, almost all
> Unix software can handle standard CRLFs, so bringing
> files over from Windows to Unix is seldom a problem.

Many wrong claims. You should check your sources of wisdom.

> However, a lot of Unix software still generates LF-only
> line endings by default, [...]

Of course; LF is standard on Unix, as CR-LF is standard on
WinDOS, and CR was standard on Macs.

There's still no general standard.

WRT data format specifications; protocols have to define
their data formats for purpose of separation or termination
of records. There's no such beast for text files.

And there's certainly no need to define *two* control codes
for unambiguous separation of text units. And no need to
assume a text file's main purpose is to send it to a printer
with a carriage that you need to position by a CR control
code.

Janis

Janis Papanagnou

Jul 26, 2012, 4:09:11 AM
On 26.07.2012 07:26, Jukka Lahtinen wrote:
> Jonadab the Unsightly One <jonadab.the...@gmail.com> writes:
>
>> [Line-ending conventions]
>
>> Unix did have a reason for deviating from the spec:
>
> Err.. which spec? Who has defined one and when? Any reference to any
> standard about line-endings?

In the past decades I made two attempts to find one that would
only clarify the line terminator vs. line separator issue; but
to no avail!

(The CR/LF non-standard issue, OTOH, was (to me) quite apparent
given the history of computer systems technology.)

Yes, if Jonadab (or someone else) would be able to provide such
a normative standard (or if only a "quasi-standard") document,
I'd welcome that as well. I doubt, though, that there would be
any attempt recently, since, with the given status quo, you could
never satisfy both of those two established worlds, Unix and MS.

Janis

PS: I notice we're discussing this in the Nethack group; [OT] added.

ais523

Jul 26, 2012, 7:58:38 AM
On Thu, 26 Jul 2012 10:09:11 +0200, Janis Papanagnou wrote:
> On 26.07.2012 07:26, Jukka Lahtinen wrote:
>> Err.. which spec? Who has defined one and when? Any reference to any
>> standard about line-endings?
>
> In the past decades I made two attempts to find one that would only
> clarify the line terminator vs. line separator issue; but to no avail!
>
> (The CR/LF non-standard issue, OTOH, was (to me) quite apparent given
> the history of computer systems technology.)

CR/LF /is/ standard for network transmissions (just like "network byte
order" is), and you'll find POSIXy systems using it even though they use
plain LF for everything else.

I've heard, but don't know of a reliable source, that the original reason
that CR was split from LF is that some old printers couldn't move back to
the start of the line fast enough, and using a two-byte code gave enough
of a delay for them to be able to process the code. It's interesting how
old hacks have such a tendency to propagate.

It's also worth noting that EBCDIC, that famous competitor to ASCII which
ended up losing, had /three/ line-ending-like control codes; linefeed,
carriage return, and newline. (I don't know which were most commonly
used.)

--
ais523

Janis Papanagnou

Jul 26, 2012, 12:37:32 PM
On 26.07.2012 13:58, ais523 wrote:
>
> I've heard, but don't know of a reliable source, that the original reason
> that CR was split from LF is that some old printers couldn't move back to
> the start of the line fast enough, and using a two-byte code gave enough
> of a delay for them to be able to process the code. It's interesting how
> old hacks have such a tendency to propagate.

Not too astonishing, I think. Consider the teletype devices; they
have a carriage and a roller, and the carriage can be triggered by the CR and
the roller by the LF independently. But I presume you won't gain too much
by that parallelism, since you have to synchronize output of subsequent
characters with a completely returned carriage anyway. But as I mentioned
elsethread, a carriage is not the only output device we had at that time
(or that we have now), and the typical line printers in computer operating
centres were built and worked differently.

> It's also worth noting that EBCDIC, that famous competitor to ASCII which
> ended up losing, had /three/ line-ending-like control codes; linefeed,
> carriage return, and newline. (I don't know which were most commonly
> used.)

There's an article in Wikipedia that gives some good insights - wait... -
here it is: http://en.wikipedia.org/wiki/Line_feed
that has even more information about the issue, including the NEL code
that you probably mean.

Also interesting, as I just remembered after my last posting, old printers
(I think it was in a CDC 176 context) used the *ordinary printable character*
in the first column as control code. So a plain character '1' or '2' (don't
recall) in column 1 would issue, say, a paper feed (i.e. ejecting/switching
to the next page); imagine what happens if you route your numerical tables
(by bypassing the driver) to the raw printer device! - Fun, an angry
operator, and a lot of useless paper, maybe to be used in the lavatory only.

Janis

Jonadab the Unsightly One

Jul 27, 2012, 4:31:12 AM
On Jul 26, 1:26 am, Jukka Lahtinen <jtfjd...@hotmail.com.invalid>
wrote:
> Jonadab the Unsightly One <jonadab.theunsightly...@gmail.com> writes:
>
> > [Line-ending conventions]
> > Unix did have a reason for deviating from the spec:
>
> Err.. which spec?

The American Standard Code for Information Interchange

> Who has defined one and when?

It was published in the sixties by some standards organization or
another and is used by, basically, everything.

Janis Papanagnou

Jul 27, 2012, 4:47:20 AM
On 27.07.2012 10:31, Jonadab the Unsightly One wrote:
> On Jul 26, 1:26 am, Jukka Lahtinen <jtfjd...@hotmail.com.invalid>
> wrote:
>> Jonadab the Unsightly One <jonadab.theunsightly...@gmail.com> writes:
>>
>>> [Line-ending conventions]
>>> Unix did have a reason for deviating from the spec:
>>
>> Err.. which spec?
>
> The American Standard Code for Information Interchange

You should read the details of that spec and ask yourself what it
has to do with a spec for text file line terminators/separators.

In addition I suggest to have a closer look into
http://en.wikipedia.org/wiki/Line_feed
which has very good information and helps to understand the issue.

Janis

Jonadab the Unsightly One

Jul 27, 2012, 5:24:05 AM
On Jul 26, 2:29 am, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:
> ASCII defines a character set, no more no less.

Yes, but it defines *meanings* for some of those characters,
including the carriage return and linefeed. As I noted
upthread, ASCII specifically indicates that the linefeed
means move straight down one line to the position directly
below the previous position. That's how ASCII printers
treat the character -- which is why you can chuck a DOS text
file out the parallel port and expect the printer to handle
it correctly, but this will not work with a Unix text file.
Dumb terminals (back when people still used such devices)
interpreted the line feed character in the same way, because
that's what the spec says it means.

ASCII doesn't specifically say that you move to the next
line by issuing CR and then LF. (In fact, LF and then CR
would have exactly the same meaning, per the spec; putting
the CR first is purely a convention.) What it does specify
is that LF means go straight down one line, without changing
columns, and that CR means go to the beginning of the
current line. There isn't any single control character in
ASCII for "go to the beginning of the next line".
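
To make that reading concrete, here is a toy model in Python of a device that treats LF as "down one line, same column" and CR as "back to column zero". It is purely an illustration of the interpretation described above, not a model of any particular printer or terminal, and the function name render is made up.

```python
def render(stream):
    """Lay out characters on a grid, treating CR and LF as separate motions."""
    rows = {}            # row number -> list of characters on that row
    row = col = 0
    for ch in stream:
        if ch == "\r":
            col = 0      # carriage return: back to the first column
        elif ch == "\n":
            row += 1     # line feed: down one line, column unchanged
        else:
            line = rows.setdefault(row, [])
            while len(line) <= col:
                line.append(" ")   # pad with blanks up to the cursor
            line[col] = ch         # a real printer would overstrike here
            col += 1
    return ["".join(rows.get(r, [])) for r in range(max(rows, default=-1) + 1)]
```

Under this model render("ab\ncd") gives a staircase, with "cd" starting in the third column of the second row, while "ab\r\ncd" and "ab\n\rcd" both put "cd" at the start of the second row.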

> you have to distinguish control character
> handling of devices from data format specifications.

Obviously, a device can choose to implement or not implement
any given control character, or to substitute equivalent
behavior (such as flashing instead of beeping when no
speaker is available). That's largely irrelevant to the
question of what the spec says the control character
*means*.

Some newsreaders implement the pagefeed character by
requiring some scroll-down action on the part of the user in
order to view what follows. Other newsreaders ignore it.
This does not change the fact that the pagefeed character
*means* that the following content is on the next page.

The real clincher for my point is that the people who wrote
almost all of the application-layer network protocols we use
every single day seem to agree with me, since they pretty
much universally specify both CR and LF as necessary to
terminate a line in said protocols. Most of these people
used Unix and were not unaware of the Unix text file
convention. You can't send a Unix text file as the body of
an email message without at some point converting it, for
example. The mail server will send you error codes. This
goes back ultimately to telnet.
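
As a small illustration of the email case, Python's standard email package converts an LF-only body to CRLF when a message is serialized with its SMTP policy. This is a sketch of what that one library does, not a statement about any particular mail server; the helper name serialize_for_smtp and the subject text are made up.

```python
from email import policy
from email.message import EmailMessage


def serialize_for_smtp(body: str) -> bytes:
    """Build a minimal message and serialize it the way it would go on the wire."""
    msg = EmailMessage()
    msg["Subject"] = "line ending demo"
    msg.set_content(body)                    # body uses Unix-style "\n" here
    # policy.SMTP uses "\r\n" as its line separator for headers and body
    return msg.as_bytes(policy=policy.SMTP)
```

Passing "first\nsecond\n" as the body yields bytes in which every line, headers included, ends in a CRLF pair, so the conversion happens somewhere between the Unix text file and the network.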

> good enough to make the issue apparent, are
> main frames which often had no line terminators at all.

Some mainframe file formats did not have line terminators,
because they were not streams, and there was no transition
from one line to the next -- each line was logically
separate. (In particular, many record-based formats
specified the length of each field, so no delimiter was
necessary between fields. The closest modern equivalent is
probably an SQLite DB file, where each "line" of text is a
record.) This is neither here nor there. It certainly has
no bearing on the question of what the linefeed character
means, since the files in question did not even use
linefeeds.

> Some programs even omit a final line termination
> at the end of the file;

Technically, it's not required; however, I would apply here
the rule, "Be strict with what output you generate, liberal
in what input you accept". The greater mistake is writing
software that does not work correctly if the last line isn't
blank, but since such software is known to exist it's also a
mistake (in most cases) to generate files with a non-blank
last line.
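
On the generating side, that rule costs almost nothing to follow. A minimal sketch in Python, with a hypothetical function name:

```python
def ensure_final_newline(text, newline="\n"):
    """Return text unchanged if it is empty or already ends with a line ending."""
    if text == "" or text.endswith(("\n", "\r")):
        return text
    return text + newline   # terminate the last line before writing it out
```

An empty string is left alone on purpose, so that an empty file stays empty instead of gaining a blank line.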

The thing is, the people who wrote Unix were *aware* that
technically a line should end with carriage return and
linefeed. They chose to omit the carriage return to save a
byte per line. At the time, this made some sense, because
disk space was a scarce resource -- extremely so, by our
modern standards in this decadent era of seventy-dollar
consumer-grade hard drives that hold half a terabyte or more
and fit in a microcomputer's 3.5-inch internal drive slot,
when companies give out USB Flash drives that hold half a
gigabyte or more and fit in the palm of your hand as free
promotional items. You tell young people these days that
you once had a ten-megabyte hard drive, and they just assume
you've got your unit sizes confused -- but when Unix was
designed, a drive that large would have supported dozens of
users.

Jonadab the Unsightly One

Jul 27, 2012, 5:42:07 AM
On Jul 26, 12:37 pm, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:

> There's an article in Wikipedia that gives some good insights - wait... -
> here it is: http://en.wikipedia.org/wiki/Line_feed

Interesting. I was not aware that some systems did indeed
use LFCR. Logically it makes just as much sense as CRLF.
Either way you are both moving to the next line and also to
the first column -- assuming you're doing both together as
a single "beginning of next line" operation, the order of the
two components thereof is unimportant.
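
To make the ordering point concrete, here is a minimal Python sketch of the teletype-style reading discussed in this thread, where CR only resets the column and LF only advances the row. The function name `apply_controls` and the (row, col) cursor model are assumptions made for illustration, not something taken from a standard or a real terminal driver.

```python
def apply_controls(row, col, controls):
    """Apply CR/LF to a (row, col) cursor, teletype style."""
    for ch in controls:
        if ch == "\r":      # carriage return: back to the first column
            col = 0
        elif ch == "\n":    # line feed: straight down, same column
            row += 1
        else:               # any other character advances the cursor
            col += 1
    return row, col


start = (3, 17)
print(apply_controls(*start, "\r\n"))  # (4, 0)
print(apply_controls(*start, "\n\r"))  # (4, 0)
print(apply_controls(*start, "\n"))    # (4, 17)
```

Under this model CRLF and LFCR end at the same position, while a lone LF stays in column 17.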

Janis Papanagnou

Jul 27, 2012, 7:36:11 AM
The order could be important, as mentioned somewhere in the thread,
if you trigger the subdevices (carriage and roller) separately; the
rotation of the roller by ~1/6" (IIRC) would be possible much faster
than returning the carriage 80/6" (or was it 80/8"?). So we could
invoke CR earlier than the LF, because executing a CR would require
more time. But that would mean that we are speaking of slooow devices,
and that the transmission of individual characters would require time
that is non-negligible. So my conclusion would be to agree with
you that the CR/LF order is unimportant in practice, but (as someone
mentioned) it could have influenced thinking during the
"design" of a CR/LF sequence.

Janis

Janis Papanagnou

Jul 27, 2012, 8:40:05 AM
On 27.07.2012 11:24, Jonadab the Unsightly One wrote:
> On Jul 26, 2:29 am, Janis Papanagnou <janis_papanag...@hotmail.com>
> wrote:
>> ASCII defines a character set, no more no less.
>
> Yes, but it defines *meanings* for some of those characters,

You cannot isolate the "meaning" from the interpreting device.

(Please read the provided reference, which elaborates clearly
also on that issue.)

> including the carriage return and linefeed. As I noted
> upthread, ASCII specifically indicates that the linefeed
> means move straight down one line to the position directly
> below the previous position. That's how ASCII printers
> treat the character -- which is why you can chuck a DOS text
> file out the parallel port and expect the printer to handle
> it correctly, but this will not work with a Unix text file.
> Dumb terminals (back when people still used such devices)
> interpreted the line feed character in the same way, because
> that's what the spec says it means.

Now what is your opinion about sending the _text files_ (for which
we have been seeking a standard) to a non-"ASCII printer"?

(I've got the impression that you are building your view of "the
world" around ASCII, without understanding its role.)

> ASCII doesn't specifically say that you move to the next
> line by issuing CR and then LF. (In fact, LF and then CR
> would have exactly the same meaning, per the spec; putting
> the CR first is purely a convention.) What it does specify
> is that LF means go straight down one line, without changing
> columns, and that CR means go to the beginning of the
> current line. There isn't any single control character in
> ASCII for "go to the beginning of the next line".

(You seem not to have read the references carefully enough.)

>
>> you have to distinguish control character
>> handling of devices from data format specifications.
>
> Obviously, a device can choose to implement or not implement
> any given control character, or to substitute equivalent
> behavior (such as flashing instead of beeping when no
> speaker is available). That's largely irrelevant to the
> question of what the spec says the control character
> *means*.

Devices, ASCII (or other) control codes, and interpreting
control characters, are related. That's covered by most of
what I posted in the thread. The other point was that ASCII
control codes are not suitable to unambiguously define what
a new-line constitutes, let alone provide a standard; ASCII
doesn't define that. This is also very clearly explained in
the reference that I provided.

>
> Some newsreaders [...]. Other newsreaders [...].
> This does not change the fact that the pagefeed character
> *means* that the following content is on the next page.

This means nothing, actually. You need a definition for an
output device how it handles control codes or other characters.

I repeat; per se a "carriage return" is meaningless on a
chain-line-printer or a laser printer, as a page-feed may be
irrelevant on a scrollable window on any windowing system.

>
> The real clincher for my point is that the people who wrote
> almost all of the application-layer network protocols we use
> every single day seem to agree with me, since they pretty
> much universally specify both CR and LF as necessary to
> terminate a line in said protocols.

In your argument you now mix the well-defined "application-layer
network protocols" with the [non-existing] "text file"
standards.

(Also, again, read carefully the references which comment
also on those protocols.)

> Most of these people
> used Unix and were not unaware of the Unix text file
> convention.

I don't see any evidence for that. (Rather, if you inspect
the authors' names you may get the opposite view.) - So, what
makes you think so? - Basically it's even irrelevant, given
what was said in the previous paragraph and earlier in the thread.

> You can't send a Unix text file as the body of
> an email message without at some point converting it, for
> example. The mail server will send you error codes. This
> goes back ultimately to telnet.

You will also get errors if you are sending text files (either
LF, CR-LF, or CR terminated) over a BER-encoded ASN.1 defined
X.400 mail system. So what?!

Again you are talking about a protocol with its data definition.
(I already commented on that very early in the thread.) We have
been talking about standards for "text file" line terminators.
It's not helpful if you try to extend, say, RFC-822 definitions
(or any other protocols) to the question at issue.
Please refrain from the red herrings and provide a specification
for what we were discussing; if you know any.

>
>> good enough to make the issue apparent, are
>> main frames which often had no line terminators at all.
>
> Some mainframe file formats did not have line terminators,
> because they were not streams, and there was no transition
> from one line to the next -- each line was logically
> separate.

Wrong. (Read the references! *sigh*)

> (In particular, many record-based formats
> specified the length of each field, so no delimiter was
> necessary between fields. The closest modern equivalent is
> probably an SQLite DB file, where each "line" of text is a
> record.) This is neither here nor there. It certainly has
> no bearing on the question of what the linefeed character
> means, since the files in question did not even use
> linefeeds.
>
>> Some programs even omit a final line termination
>> at the end of the file;
>
> Technically, it's not required;

Technically neither a LF nor a CR is necessary.

> however, I would apply here
> the rule, "Be strict with what output you generate, liberal
> in what input you accept". The greater mistake is writing
> software that does not work correctly if the last line isn't
> blank, but since such software is known to exist it's also a
> mistake (in most cases) to generate files with a non-blank
> last line.

I fear, you completely missed the issue with such omissions.

(( $(wc -l < A) + $(wc -l < B) == $(cat A B | wc -l) ))

should always be an invariant. ALWAYS! Don't start fiddling
with artificial "final blank lines" or similar nonsense; that
view results in exactly the kind of inconsistent software
behaviour that we meet more often than is good for one's
health. Try to understand the implications of such "details"!
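
One way to read the invariant above, sketched in Python rather than shell: counting LF terminators (which is what `wc -l` reports) is always additive under concatenation, whereas counting "visible" lines, where an unterminated last line still counts as a line, is not. The helper names and sample byte strings below are assumptions made up for this illustration.

```python
def count_terminators(data: bytes) -> int:
    """Count LF bytes, which is what `wc -l` reports."""
    return data.count(b"\n")


def count_visible_lines(data: bytes) -> int:
    """Count lines as a reader sees them, including an unterminated last one."""
    return len(data.splitlines())


a_unterminated = b"one\ntwo"    # last line lacks its terminator
a_terminated = b"one\ntwo\n"    # every line is terminated
b = b"three\n"

for a in (a_unterminated, a_terminated):
    joined = a + b              # what `cat A B` would produce
    print(
        count_terminators(a) + count_terminators(b) == count_terminators(joined),
        count_visible_lines(a) + count_visible_lines(b) == count_visible_lines(joined),
    )
```

With the unterminated file the second comparison fails, because "two" and "three" fuse into a single line after concatenation; with properly terminated files both counts agree.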

>
> The thing is, the people who wrote Unix were *aware* that
> technically a line should end with carriage return and
> linefeed. They chose to omit the carriage return to save a
> byte per line.

You repeated that opinion. If you think that would be crucial
for the question - which I don't think it is - it would be
appropriate to give some evidence.

So let me repeat also; for line termination a single control
character is sufficient, using CR, or LF, or anything else is
arbitrary and (sadly) not standardized for text files.

Janis

> At the time, this made some sense, because
> disk space was a scarce resource -- [...]

Jonadab the Unsightly One

Jul 27, 2012, 11:11:29 PM
On Jul 27, 8:40 am, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:
> >> ASCII defines a character set, no more no less.
>
> > Yes, but it defines *meanings* for some of those
> > characters,
>
> You cannot isolate the "meaning" from the interpreting
> device.

That's the stupidest thing you've said in this entire
thread. ASCII and Unicode would both have no reason to
exist if that were so.

A character set certainly can assign a particular meaning to
a given number. That's what character sets do. That's
*all* they do. A number that is defined to have a
particular meaning is the very *definition* of a character
(in the context of a character set; obviously linguists and
authors both use the word "character" to mean other things).

The interpreting device is supposed to interpret whatever
characters it interprets in a fashion consistent with the
meaning assigned by the character set. That's what it means
for a device to "support" a given character set.

> (Please read the provided reference, which elaborates
> clearly also on that issue.)

If by "the provided reference" you mean the article on line
feeds, it explains very clearly why you're wrong, several
times, in several different paragraphs. I don't see how you
missed it.

> Now what is your opinion about sending the _text files_ (for which
> we have been seeking a standard) to a non-"ASCII printer"?

Then they have to be converted, obviously.

If Unix had been an EBCDIC system, or used some other
encoding, then of course we would *expect* its text files to
need to be converted to be sent to an ASCII printer. But
Unix used ASCII as its character set. Mostly. Or such was
the claim at any rate.

> Devices, ASCII (or other) control codes, and interpreting
> control characters, are related. That's covered by most of
> what I posted in the thread.

Of course they're related. They're related in a very
particular way. You seem, however, to be under the
impression that an operating system does not qualify as a
device (it certainly does), or that it is normal and
expected that devices claiming to support the same standard
do so in contradictory, mutually incompatible ways that
violate the spirit and letter of the standard in question.

Okay, to some extent that is normal and expected, because we
live in a complicated and imperfect world, and sometimes
there are even good reasons for it, but when it happens I
like to acknowledge the fact that it has in fact happened.

DOS and Windows certainly have their share of deviations
from various specifications. Too many to list. I am not
aware of any operating system that is completely without.

> The other point was that ASCII control codes are
> not suitable to unambiguously define what a new-line
> constitutes, let alone provide a standard;

You keep saying this, but you have yet to offer any
reasoning for why it is so. Setting to one side the
irrelevant terminology "newline", ASCII *does* define very
clearly what an ASCII carriage return character signifies
and what an ASCII line feed character signifies. Unix (and
C for that matter; the two were developed together) uses the
line feed character to signify the sum of both these things
together. That's clearly quite different from what ASCII
says it means, and it's different from how every single
other thing in the history of computing understands it
(except for things that were implemented after Unix and
followed its example because they were designed to be
compatible with it, e.g., BeOS).

> I repeat; per se a "carriage return" is meaningless on a
> chain-line-printer or a laser printer,

An ASCII carriage return is per se meaningless on a
PostScript printer, perhaps, but that's not the same
statement you appear to be trying to make.

By analogy, an ASCII carriage return is not applicable
on an EBCDIC system. If Unix were an EBCDIC system,
your reasoning would be valid. Such is not the case.

> You will also get errors if you are sending text files (either
> LF, CR-LF, or CR terminated) over a BER-encoded
> ASN.1 defined X.400 mail system. So what?!

Now you're just being deliberately obtuse. BER-encoded
ASN.1 defined X.400 mail systems aren't based on the
ASCII specification in the same way that telnet and
SMTP are. These protocols handle the linefeed character
the way they do because ASCII says it means a certain
thing.

> (( $(wc -l < A) + $(wc -l < B) == $(cat A B | wc -l) ))
>
> should always be an invariant. ALWAYS!

Now you're talking about POSIX in the modern (post-BSD)
world, which is the way it is precisely because Unix was the
way it was and did what it did.

It is certainly true that we can't now go back and
implement a Unix system that interprets linefeeds according
to the ASCII spec. It's far too late to fix it now. Too
much would break. There would be great pain and sorrow and
weeping and gnashing of teeth. I'll concede that readily.

We're stuck with the Unix interpretation of the linefeed
character for the foreseeable future, possibly even forever.

(Indeed, I think it likely that Microsoft or their logical
successor will eventually switch over to storing text files
in the Unix fashion. It may be decades yet, but it will
happen sooner or later. Sometimes de facto standards are
more important than restoring a correct interpretation of a
long-neglected point in an old standard, and ASCII is in the
process of being displaced by Unicode now anyway.)

> So let me repeat also; for line termination a
> single control character is sufficient,

In the abstract, yes.

But no single ASCII character by itself was intended to be
used in this way, because ASCII (very much unlike Unicode)
went out of its way not to define redundant characters, in
order to fit in seven bits. You will notice, if you study
the matter closely, that ASCII does not for example define
distinct characters for single quote and apostrophe. One
character for both was sufficient. There's also no ellipsis
character, because a series of three periods conveyed the same
information. There's no copyright symbol, because you can
just say (C). There's no "end of line" character, since
that would basically just be the sum of carriage return and
linefeed anyhow, so just use that. The list of such choices
that ASCII makes for the sake of keeping the number of
characters small is lengthy. It was a fundamental design
principle of the character set. In some ways, it's
analogous to a RISC chip set or, more to the point, the Unix
philosophy of providing a number of simple tools that each
do just one thing and allowing the user to glue them
together with shell constructs like backticks and pipes.

Janis Papanagnou

Jul 28, 2012, 5:38:21 AM
On 28.07.2012 05:11, Jonadab the Unsightly One wrote:
> On Jul 27, 8:40 am, Janis Papanagnou <janis_papanag...@hotmail.com>
> wrote:
>>>> ASCII defines a character set, no more no less.
>>
>>> Yes, but it defines *meanings* for some of those
>>> characters,
>>
>> You cannot isolate the "meaning" from the interpreting
>> device.
>
> That's the stupidest thing you've said in this entire
> thread. ASCII and Unicode would both have no reason to
> exist if that were so.

What I wrote is so fundamentally "a Truth" that I wonder why
you comment in such a personal way, since I usually got the
impression that you think about issues more deeply. I suggest
you rethink that.

And I would therefore prefer to abstain from commenting on the
rest of your posting, since you seem to have missed even some
of the very basics here.

Also remember; we have been looking for a general standard
for a "text file" specification generally or for "text file"
line terminators, specifically. And yet no one provided one.

That should suffice for most readers. The rest for the [OT]
hard-liners...

>
> A character set certainly can assign a particular meaning to
> a given number.

Example: 'I' (ASCII 75)
Meaning: Capital Letter I
Meaning: Roman Number 1
Meaning: Chemical Element Iodine
etc.
The fact that the ASCII description is "CAPITAL LETTER I" does
not restrict meanings for other devices (human, algorithmic,
hardware, software). It depends on the processing device.
Send 'I' to one device and you get a different reaction than
from another. Don't ever assume that 'I' will have a generic
meaning for all devices.

Now back to "text files"; you want to terminate a line, so you
have to choose some line terminator. You need one unambiguous
character, so you choose a control code. You want to choose one
that resembles the entity "line", so you may choose a "line feed",
LF. Others may have thought differently, and thought a "carriage"
may be appropriate because they had the picture in mind that a
text is something to be printed out, and since on a typewriter
you have the carriage lever where you can position the carriage
to the left (and the line feed is done implicitly), so a simple
CR is sufficient. And again others will decide to use CR and LF.
So what is the truth? What is a "false" design? What is right?

It depends on the interpreting device what you get. And for that
standards are helpful. And you want a standard for "text files".
But we don't have one! We have many that define character sets,
including control codes. ASCII for example. Do you think we
have to assume that all ASCII codes shall be used (if somehow
appropriate)? Why shall we use, for "text files", a "carriage
concept"? But why shall we *not* use a STX (Start of text) at
the beginning of a file, or a RS (Record Separator) between
lines? - Clear now? - If not, I really can't help, I'll bite.

"If all you have is a hammer everything looks like a nail."
Obviously, your hammer is ASCII. And a very specific view how
to use it.

> [snip meaningless and already covered text]

>> Devices, ASCII (or other) control codes, and interpreting
>> control characters, are related. That's covered by most of
>> what I posted in the thread.
>
> Of course they're related. They're related in a very
> particular way. You seem, however, to be under the
> impression that an operating system does not qualify as a
> device (it certainly does), or that it is normal and
> expected that devices claiming to support the same standard
> do so in contradictory, mutually incompatible ways that
> violate the spirit and letter of the standard in question.

No, I (basically) said that you cannot assume a "text file"
line terminator to contain a 'CR' ASCII control code. You
seem to see some contradictory behaviour of the Unix OS'es.
You claimed that CR-LF is right and everything else wrong.
I think, here you should ponder about 'STX', again. And why,
for you, an STX or RS may be invalid (per "meaning") and why
a CR would be necessary to have a "correct" text file.
Actually, you don't need STX, ETX, CR, etc. for definition
of a "text file" structure. And there's no general standard
for the latter. Assuming using CR is "right" and not using
it "wrong" is arbitrary.

> [snip stuff that has already been answered]
>
> [CR+LF] Unix (and
> C for that matter; the two were developed together) uses the
> line feed character to signify the sum of both these things
> together.

Well, no. Unix uses a control character LF to indicate that a
line in a "text file" is complete. (Ready to feed a new line,
if you like comprehensive pictures.)

Your mental model seems to somehow be focussed on using CR as
well.

Both are arbitrary, both fit to some degree, depending on the
mindset. None of them is even necessary. None of them is even
standard for "text files".

(And even ASCII defines other codes for data structuring; see
below.)

> [snip some more that has already been answered]

>> You will also get errors if you are sending text files (either
>> LF, CR-LF, or CR terminated) over a BER-encoded
>> ASN.1 defined X.400 mail system. So what?!
>
> Now you're just being deliberately obtuse.

Verbal attacks won't help you to understand the issue. Better
think about what has been said more deeply. I will elaborate...

> BER-encoded
> ASN.1 defined X.400 mail systems aren't based on the
> ASCII specification in the same way that telnet and
> SMTP are. These protocols handle the linefeed character
> the way they do because ASCII says it means a certain
> thing.

The point that has been made was; those protocols *define* what
they expect as data format to be exchanged, they explicitly
specify, e.g., that the underlying character set is ASCII, and
that the separator is CR-LF. Every communication protocol will
have to define those issues for interoperability. The same is
true for X.400; they also define the transfer syntax for valid
X.400 messages (a binary format in case of BER). If they did
not do that, they would just *not work* in practice.

You said in the posting I responded to that you will get errors
if you feed (only-)LF terminated text directly into an SMTP (or
NNTP) message body. And of course you may get errors! Because
those protocols define other data formats to use. If you feed
something into a protocol that it is not defined for, you will
get errors; non-conforming text in SMTP as well as in X.400.
That was the point! Okay?

You have specified communication protocols on one side, OTOH
you still have unspecified "text files".
(Remember? We were talking about "text file" specifications!)
All references to communication protocols help to understand,
implement, or use those protocols; but for the issue here that
is nothing but a red herring.
Clear now?

>
>> (( $(wc -l < A) + $(wc -l < B) == $(cat A B | wc -l) ))
>>
>> should always be an invariant. ALWAYS!
>
> Now you're talking about POSIX in the modern (post-BSD)

*sigh* - no, I am *not* talking about POSIX. I am talking about
_a functional invariant_.

> world, which is the way it is precisely because Unix was the
> way it was and did what it did.

I described a functional property that is independent of Unix.
The consequence of not considering functional invariants is one
major reason for inconsistent systems (AKA "broken by design").

>
> It is certainly true that we can't now go back and
> implement a Unix system that interprets linefeeds according
> to the ASCII spec.

There's no need to do so.

> It's far too late to fix it now.

There's nothing to "fix" [in Unix], since there's no bug.

But it's certainly too late to have a unique standard for "text
files".

> [...]
>
>> So let me repeat also; for line termination a
>> single control character is sufficient,
>
> In the abstract, yes.
>
> But no single ASCII character by itself was intended to be
> used in this way, because ASCII (very much unlike Unicode)
> went out of its way not to define redundant characters, in
> order to fit in seven bits.

You should explain this opinion on ASCII.

> You will notice, if you study the matter closely,

LOL! - Be assured I did, for decades, very closely. :-)

> that ASCII does not for example define
> distinct characters for single quote and apostrophe. One
> character for both was sufficient. There's also no ellipsis
> character, because a series of three periods conveyed the same
> information. There's no copyright symbol, because you can
> just say (C).

See, you must take into account the time when it was developed;
at that time there were still 5 bit (Telex) and 6 bit (mainframes)
character sets in use. But also with 7 bit (ASCII) you can just
represent a fraction of what's needed worldwide. The problem, from
my perspective, was not the missing ellipsis or copyright symbol -
well, maybe you see it that way, since the 'A' in ASCII indicates
clearly the country context of the development -; but there are
not even the characters available that are used and necessary in
the European countries. The point is; 7 bit is just limited, and
the US of 'A' standards bodies primarily took care of their own
(limited) needs. So count the characters and code positions you
have (52 letters + 10 digits + 16 [necessary] punctuation/space)
and you fill the free positions with codes that *may* be useful.
ASCII control codes are for "Information Interchange", with
everything that may be necessary for it, including codes for
signalling, which, for example, are completely unnecessary for
"text file" definitions. But not all control codes are necessary
for every purpose; that depends on the actual communication
devices.

> There's no "end of line" character, since
> that would basically just be the sum of carriage return and
> linefeed anyhow, so just use that.

For data structuring I see (at least) four(!) ASCII codes that
could be used; one of them RS ('Record Separator', ASCII 30)
even seems to have been used by some systems (as you see in the
reference that I posted http://en.wikipedia.org/wiki/Line_feed).

So, as I see it, those claims and assumptions that you make are
arbitrary, and partly not even correct. None of the four codes
for data structuring (FS, GS, RS, US) is used by the established
systems for line endings; orthogonality of codes can hardly be
seen as reason for defining two device control codes CR and LF
but ignore existing data structuring codes, when it comes to a...

..."text file" specification.
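
For illustration, a minimal Python sketch of how two of the ASCII data structuring codes could be used: RS between records and US between the fields of a record. The functions `encode` and `decode` and the sample rows are hypothetical, invented for this example, and do not describe any established file format.

```python
RS = "\x1e"  # ASCII 30, Record Separator
US = "\x1f"  # ASCII 31, Unit Separator


def encode(records):
    """Join fields with US and records with RS (used as separators here)."""
    return RS.join(US.join(fields) for fields in records)


def decode(text):
    """Inverse of encode() for a non-empty list of records."""
    return [record.split(US) for record in text.split(RS)]


rows = [
    ["name", "note"],
    ["Janis", "line one\r\nline two"],  # CR and LF are ordinary data here
]
blob = encode(rows)
print(decode(blob) == rows)
```

Because CR and LF play no structural role in this scheme, a field can contain them without any escaping.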

If you have problems understanding those points, I fear we have
to agree that we disagree. Nothing wrong with that.

PS: If you want to extend the communication on that (off-)topic,
please abstain from calling me "stupid" and "obtuse" again.
With your knowledge and reasoning I think you're also not in too
good a position for offences and insults like that. Thanks for
your understanding.

Janis

> [...]

pillow13...@gmail.com

Jul 28, 2012, 10:32:55 PM
And how can I convert LF to CRLF?

rpresser

Jul 28, 2012, 10:44:24 PM
Janis, you're trying to say Unix is "right" because
it uses a linefeed to end lines in text files.

ASCII was designed LONG BEFORE any such thing as
a text file had even been thought of. ASCII was
derived from other codes and standards describing
the behavior of physical devices for which the
distinction between a carriage return and a line
feed was exceedingly important.

ASCII was optimized for transmitting messages over
wires to objects that produced marks printed on paper.
Unix was optimized for saving files on spinning
magnetic discs. Of COURSE the interpretation is different.
And Unix's decision to use one character instead of
two makes sense when you look at what Unix's designers
wanted to accomplish.

It does not change the fact that ASCII was designed
and optimized for a completely different purpose.

Denying that the ASCII code defines CR and LF for
different purposes is a denial of history. Denying
that Unix's convention is workable for text files
is a denial of the present. Neither denial is
supportable among reasonable individuals.

Janis Papanagnou

Jul 29, 2012, 3:31:03 AM
On 29.07.2012 04:44, rpresser wrote:
> Janis, you're trying to say Unix is "right" because
> it uses a linefeed to end lines in text files.

No, I didn't say that. In the long lasting discussion
you may have missed what had been the issue; I quote:

> On 25.07.2012 14:21, Jonadab the Unsightly One wrote:
> >
> > [...] (a single
> > linefeed character instead of the standard ASCII CRLF pair).
>
> ITYM; "instead of the CRLF pair that is standard on Windows".
>
> Just to be sure no one will falsely assume that CRLF would be
> some general standard for text file line terminations.

There is a difference between saying that line terminators
for text files are not standardized and claiming that they
have to be a CR-LF sequence to be "right".

>
> ASCII was designed LONG BEFORE any such thing as
> a text file had even been thought of. ASCII was
> derived from other codes and standards describing
> the behavior of physical devices for which the
> distinction between a carriage return and a line
> feed was exceedingly important.

Yes, that has already been said before; I explained
also about such devices (teletype), and other devices
(line printers), where a CR can make sense or not.

>
> [...]
> And Unix's decision to use one character instead of
> two makes sense when you look at what Unix's designers
> wanted to accomplish.

Which goal of their work do you have in mind here?

>
> It does not change the fact that ASCII was designed
> and optimized for a completely different purpose.
>
> Denying that the ASCII code defines CR and LF for
> different purposes is a denial of history.

I don't think anyone here did that, but I'm not sure.

> Denying
> that Unix's convention is workable for text files
> is a denial of the present.

Yes it's "workable". The other poster just said that
it's (sort of) conceptually broken.

> Neither denial is
> supportable among reasonable individuals.

Janis

Janis Papanagnou

Jul 29, 2012, 3:35:52 AM
On 29.07.2012 04:32, pillow13...@gmail.com wrote:
> And how can I convert LF to CRLF?

There are tools for that, like unix2dos, and other programs;
see, for example: http://en.wikipedia.org/wiki/Unix2dos.
On WinDOS platforms I use Cygwin with the necessary Unix tools.
But by googling you will find WinDOS-specific binaries as well.

Janis

Capt. Cave Man

Jul 29, 2012, 4:30:39 AM
On Sat, 28 Jul 2012 19:44:24 -0700 (PDT), rpresser <rpre...@gmail.com>
wrote:
An inspirationally intelligent post.

jim in austin

Jul 29, 2012, 8:03:10 AM

Jonadab the Unsightly One

Jul 29, 2012, 6:31:40 PM
On Jul 28, 10:32 pm, pillow1301300...@gmail.com wrote:
> And how can I convert LF to CRLF?

I recommend a utility program designed for the purpose, such as
unix2dos.

There are other ways (e.g., most reasonably capable
text editors can do it), but all the other ways are either
more complicated to use or require you to install a
significantly larger piece of software -- often both.
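
For anyone who already has Python installed, here is a minimal sketch of one of those "other ways". It assumes the whole file fits in memory, and the names `lf_to_crlf` and `convert_file` are made up for this example; it is not a substitute for unix2dos.

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    """Turn every bare LF into CRLF, leaving existing CRLF pairs alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path in binary mode and write the converted bytes to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(lf_to_crlf(data))


sample = b"first\nsecond\r\nthird\n"
print(lf_to_crlf(sample))
```

Binary mode matters here: opening the files in text mode could let Python translate line endings on its own and defeat the purpose. A call such as `convert_file("some.patch", "some-crlf.patch")` (both file names hypothetical) would then write a converted copy and leave the original untouched.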

Jonadab the Unsightly One

Jul 29, 2012, 7:26:18 PM
On Jul 28, 5:38 am, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:

> >>>> ASCII defines a character set, no more no less.
> >>> Yes, but it defines *meanings* for some of those
> >>> characters,
> > A character set certainly can assign a particular
> > meaning to a given number.
>
> Example: 'I' (ASCII 75)

Decimal 73 you mean.

> Meaning: Capital Letter I
> Meaning: Roman Number 1
> Meaning: Chemical Element Iodine
> etc.

ASCII defines decimal 73 to mean capital letter I. The
fact that the capital letter I itself has various meanings
(as a roman numeral, as a symbol for an element, as
a first-person singular pronoun in English, etc.) is
neither here nor there. Decimal 73 doesn't mean those
things directly. It means capital I, and capital I means
those things. Another character set (say, EBCDIC)
might define decimal 201 to mean capital I. People
using systems that use that encoding would still
use capital I (now represented in data files by a
different number, decimal 201) to mean the roman
numeral for 1 and the first-person singular pronoun
and so on and so forth.
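
The code points are easy to check with a short Python sketch. It assumes Python's "cp037" codec (IBM code page 037) as a representative EBCDIC variant; other EBCDIC code pages exist and may differ for some characters.

```python
# Code point of capital I in ASCII and in one EBCDIC variant.
# "cp037" is Python's name for IBM code page 037; picking this
# particular code page is an assumption made for the example.
ascii_code = "I".encode("ascii")[0]
ebcdic_code = "I".encode("cp037")[0]
print(ascii_code, ebcdic_code)       # 73 201

# Decoding goes the other way: the same number maps to different
# characters depending on which character set is in force.
print(bytes([73]).decode("ascii"))   # I
print(bytes([75]).decode("ascii"))   # K
print(bytes([201]).decode("cp037"))  # I
```

So decimal 73 is capital I in ASCII, decimal 75 is capital K, and decimal 201 is capital I in this EBCDIC code page.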

And yes, ASCII does give multiple meanings to
some characters. For example, it defines decimal 39 to
mean either apostrophe or single quotation mark. This was
done to reduce the number of characters and save bits.

However, it does NOT define decimal 10 (hex 0A) to
mean either line feed (move directly down one line to
the position directly under the previous position) or
beginning-of-next-line, nor did any convention prior
to Unix ever make an association between those
two meanings like in your roman numeral example.
The designers of Unix and C made that meaning for
it out of whole cloth. It was never previously used
in that manner, nor is its specification consistent
with that usage.

> No, I (basically) said that you cannot assume a "text file"
> line terminator to contain a 'CR' ASCII control code. You
> seem to see some contradictory behaviour of the Unix OS'es.
> You claimed that CR-LF is right and everything else wrong.

I specifically said that LFCR makes just as much sense
as CRLF. The order is purely a matter of convention.

> > BER-encoded
> > ASN.1 defined X.400 mail systems aren't based on the
> > ASCII specification in the same way that telnet and
> > SMTP are. These protocols handle the linefeed character
> > the way they do because ASCII says it means a certain
> > thing.
>
> The point that has been made was; those protocols *define* what
> they expect as data format to be exchanged, they explicitly
> specify, e.g., that the underlying character set is ASCII, and
> that the separator is CR-LF.

Yes, and they specify the separator as CR LF because their
understanding of ASCII -- their *correct* understanding thereof --
demanded that both of those characters would be required.

It is true that a protocol not based on ASCII might well
use a different line separator, but it is not germane.

> (Remember? We were talking about "text file" specifications!)

It doesn't matter if we're talking about space laser
specifications. Either the space laser is based on
ASCII, or it is not -- and either its interpretation of
ASCII control codes is consistent with the ASCII spec,
or it is not.

> All references to communication protocols help to understand,
> implement, or use those protocols; but for the issue here that
> is nothing but a red herring.

My point was that the people who wrote the network
protocols understood the ASCII linefeed character in
a certain way, based on the wording of that spec,
which they used as a building block for their own.

> See, you must take into account the time when it was developed;

I believe I have.

> The problem, from my perspective, was not the missing
> ellipsis or copyright symbol - well, maybe you see it that
> way, since the 'A' in ASCII indicates clearly the country
> context of the development -; but there are not even
> the characters available that are used and necessary
> in the European countries.

It is true that ASCII was originally intended to be used in
North America. The historical fact that American technology
based on it was subsequently adopted more or less worldwide
is why ASCII was then used as the basis for later, larger
character sets, such as ISO-8859 and, ultimately, Unicode.
I'm not sure what bearing that has on the meaning of the
linefeed character, however. I believe the meaning of the
linefeed character is largely independent of language (with
the possible exception of some imagined science-fiction
space alien languages that cannot be represented in a
line-based writing system, e.g. due to being based on a
DNA-altering virus that must be ingested and permitted
to do as it likes in your body for the message to be
understood, but even Unicode in its most expansive
form does not provide for those).

> the US of 'A' standards bodies primarily took care of
> their own (limited) needs.

Granted.

> ASCII control codes are for "Information Interchange", with all
> what may be necessary, including codes for signalling, which are
> for "text file" definitions completely unnecessary, for example.
> But not all control codes are necessary for every purpose; that
> depends on the actual communication devices.

Also granted. There is no need for a printer, for example,
to implement the BEL character, and although some news
readers choose to implement formfeed, it isn't strictly
necessary. A text file probably does not need to contain
any Cancel or Delete characters.

> > There's no "end of line" character, since that would
> > basically just be the sum of carriage return and
> > linefeed anyhow, so just

> PS: If you want to extend the communication on
> that (off-)topic, please abstain from calling me
> "stupid" and "obtuse" again.

I did not call you either of those things, nor do I think
that you are. (If I did, I wouldn't bother to discuss
anything with you.) I did say that one particular claim
you made was stupid, and that another was obtuse.
Surely you are intelligent enough to understand the
distinction between yourself as a person and some
words that you wrote. I do not believe there can have
been more than half a dozen (non-mute) men in the
history of the world who never said anything stupid
nor advanced an obtuse argument. I consider that
figure to be a generous estimate.

pillow13...@gmail.com

Aug 2, 2012, 11:24:39 PM
I got unix2dos for Windows at http://www.bastet.com/uddu.zip
I think it worked OK, because after I ran unix2dos.exe and opened the diff file, it no longer looked the way it did before. But when I tried C:\***\patch.exe -p1 < C:\***\file.diff, a blank cmd window appeared. The window stayed completely blank, even after a few minutes.
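
For reference, the conversion itself is simple enough to check independently. Below is a minimal Python sketch of what a unix2dos-style tool does; it is not the code of the tool linked above, and the function names are hypothetical.

```python
import re


def to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CR LF, leaving existing CR LF alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def count_endings(data: bytes) -> dict:
    """Count CR LF pairs and bare LFs, to check which convention a file uses."""
    crlf = data.count(b"\r\n")
    return {"crlf": crlf, "bare_lf": data.count(b"\n") - crlf}


sample = b"--- a/file.c\n+++ b/file.c\r\n@@ -1 +1 @@\n"
print(count_endings(sample))           # {'crlf': 1, 'bare_lf': 2}
print(count_endings(to_crlf(sample)))  # {'crlf': 3, 'bare_lf': 0}
```

To apply it to a real diff, read and write the file in binary mode (`open(path, "rb")` / `open(path, "wb")`) so Python does no newline translation of its own.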

Mark Lambert

Aug 29, 2012, 9:15:22 PM
The issue is that Windows 7 assumes that anything named patch.exe needs to be run under UAC. There are a couple of solutions available on the net. Most require a compiler. You can try renaming the EXE.
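
A rough Python sketch of the renaming workaround follows. The keyword list is an assumption based on commonly reported triggers for Windows' installer detection, not an official or complete list, and the helper names and the default name "gnupch" are made up for illustration.

```python
import shutil
from pathlib import Path

# Assumed keyword list: substrings commonly reported to trigger the
# installer-detection heuristic. Not exhaustive, not official.
TRIGGER_WORDS = ("patch", "setup", "install", "update")


def looks_like_installer(filename: str) -> bool:
    """True if the file name contains one of the assumed trigger words."""
    name = Path(filename).stem.lower()
    return any(word in name for word in TRIGGER_WORDS)


def copy_with_neutral_name(exe: Path, new_stem: str = "gnupch") -> Path:
    """Copy exe next to itself under a name without trigger words."""
    if looks_like_installer(new_stem):
        raise ValueError("new name would still look like an installer")
    target = exe.with_name(new_stem + exe.suffix)
    shutil.copy2(exe, target)
    return target


print(looks_like_installer("patch.exe"))   # True
print(looks_like_installer("gnupch.exe"))  # False
```

After copying, you would run the renamed executable with the same arguments as before (for example `-p1 < file.diff`).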

Marc Espie

Sep 5, 2012, 7:42:46 AM
In article <m3fw8f5...@ipa.eternal-september.org>,
Jukka Lahtinen <jtfj...@hotmail.com.invalid> wrote:
>Jonadab the Unsightly One <jonadab.the...@gmail.com> writes:
>
>> [Line-ending conventions]
>
>> Unix did have a reason for deviating from the spec:
>
>Err.. which spec? Who has defined one and when? Any reference to any
>standard about line-endings?
>
>> saving one byte per line of text was actually considered
>> significant at the time. The good news is, almost all
>> Unix software can handle standard CRLFs, so bringing
>
>Where is CRLF supposed to be defined as standard?

Most RFCs (Requests for Comments), dating back to telnet and earlier, do say
that text data should be sent with CRLF.

That's a common text interchange standard that dates back to BEFORE
there were standards.

As for the original reason, well, a teletype will use LF to advance
the paper, and CR to bring the "carriage" back to the beginning of a line.
Note that a mechanical typewriter works exactly the same...

If you want bold characters, that's simple: you can do a CR, then make a
second pass over the line, with blanks where you want "normal" weight and
the characters retyped where you want bold. With enough bold characters, it's
going to be faster and better than the usual "go back a character and retype
it" (^H). I would venture it will also give better vertical alignment results,
unless your "go back one character" is really very precise...
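
The two overstrike techniques can be sketched as byte sequences in Python. The function names are hypothetical, and this only builds the strings a printing terminal would receive; it says nothing about how a modern terminal emulator would display them.

```python
def cr_bold(text: str, bold: set) -> str:
    """Print the line, send CR, then retype only the bold positions."""
    second_pass = "".join(ch if i in bold else " " for i, ch in enumerate(text))
    # Trailing blanks in the second pass would print nothing, so drop them.
    return text + "\r" + second_pass.rstrip()


def backspace_bold(text: str, bold: set) -> str:
    """Retype each bold character right after a backspace (^H)."""
    return "".join(ch + "\b" + ch if i in bold else ch for i, ch in enumerate(text))


line = "go north"
bold_positions = {3, 4, 5, 6, 7}  # the word "north"

print(repr(cr_bold(line, bold_positions)))      # 'go north\r   north'
print(len(cr_bold(line, bold_positions)))       # 17
print(len(backspace_bold(line, bold_positions)))  # 18
```

The backspace form costs two extra characters per bold letter, while the CR form costs one CR plus a second pass over the line, so the CR form pulls ahead as the share of bold characters grows.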

C talks about text files and binary files.

At the time Unix came out, there were some radical ideas in its design
related to file handling. The LF stuff was just the tip of the iceberg,
compared to having "unformatted files" that were just streams of bytes
instead of formatted records (only).
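
The text-file versus binary-file distinction can be illustrated with Python's io module, used here only as an analogue of C's text-mode streams (that substitution is mine). The helper name `write_text` is hypothetical.

```python
import io


def write_text(lines, newline):
    """Write lines through a text layer and return the raw bytes stored."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    for line in lines:
        text.write(line + "\n")  # the program always writes "\n"
    text.flush()
    data = raw.getvalue()
    text.detach()  # leave the underlying byte stream open
    return data


print(write_text(["a", "b"], "\n"))    # b'a\nb\n'      stored as written
print(write_text(["a", "b"], "\r\n"))  # b'a\r\nb\r\n'  translated on output
```

The program writes the same "\n" in both cases; only the text layer decides what bytes reach the file, and on a plain byte stream nothing is translated at all.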