Robin
You'd need to use RandomAccessFile, seek to the end of the file, work your
way back looking for a linefeed/CR, and then slurp forward again into a
buffer. While seeking backwards you can count characters and thus know
exactly how big to make the StringBuilder for maximum efficiency.
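A minimal sketch of that backward scan, in case it helps. It assumes the file uses an ASCII-compatible encoding in which the bytes for CR and LF never occur inside a multi-byte character (true of ISO-8859-1 and UTF-8, not of UTF-16), and that lines end in LF, CR, or CRLF. The class and method names are made up for illustration, and a byte array stands in for the StringBuilder because the count gathered on the way back is a byte count.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class LastLineReader {
    /** Returns the last line of the file, or "" if the file is empty. */
    static String readLastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Ignore one trailing line terminator (LF, CR, or CRLF).
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;
            // Walk backwards to the previous terminator or the start of the file.
            long start = end;
            while (start > 0) {
                int b = byteAt(raf, start - 1);
                if (b == '\n' || b == '\r') break;
                start--;
            }
            // The distance walked is exactly the buffer size needed.
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            // Assumption: the text is UTF-8 (or plain ASCII).
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```

Seeking one byte at a time is slow on a long last line; reading backwards in blocks would be the obvious refinement.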
You could use a RandomAccessFile and search backwards from the end for a
linefeed. Depending on the size of the line and the size of the file,
it might not be more efficient than reading the whole file.
--
Knute Johnson
s/nospam/knute2011/
Yes, but it's tricky. You need to open a random-access file and seek
backwards to a newline.
--
Lew
Honi soit qui mal y pense.
$ echo RandomAccessFile | hivemind | cut
> Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?
Yes.
tom
--
What? Yeah!
bash: hivemind: command not found
--
Luuk
I don't see a read last line. It seems you have to know your end of
line character and check for it yourself.
http://download.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html
cljp: hivemind: ooh yeah!
;)
> Is it possible to read the last text line from a text file WITHOUT
> reading the previous (n-1) lines?
>
> Robin
>
>
Yes, under certain circumstances. For example, if you know "n" and know that
all of the lines are of some fixed length (also known). There are other
situations as well.
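For the fixed-length case the offset of the last line is plain arithmetic. A rough sketch, assuming every line occupies exactly bytesPerLine bytes including its terminator and that the content is ASCII; the names are hypothetical:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class FixedLengthLines {
    /**
     * Reads line number lineCount (1-based) from a file in which every line
     * occupies exactly bytesPerLine bytes, terminator included.
     */
    static String readLastLine(String path, long lineCount, int bytesPerLine) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // Line k starts at byte (k - 1) * bytesPerLine.
            raf.seek((lineCount - 1) * bytesPerLine);
            byte[] buf = new byte[bytesPerLine];
            raf.readFully(buf);
            // Drop the trailing CR/LF bytes, if any.
            int len = buf.length;
            while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r')) len--;
            return new String(buf, 0, len, StandardCharsets.US_ASCII);
        }
    }
}
```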
Others have mentioned using RandomAccessFile to work backward from the
end of the file until you find the penultimate line-ending. This can
work, but it can also fail. Consider a file with context-sensitive
encoding, for example, where the meaning of a byte depends on the values
of bytes that precede it. If you read an isolated byte of value 91 from
such a file, without knowing whether it's a free-standing character or a
part of a multi-byte sequence or possibly preceded by a "shift-out," you
won't know what that byte value means.
One strategy is to estimate a typical line length of N characters,
seek to 100*N (say) bytes before the end, and start reading from
there. A nice feature of most multi-byte encoding schemes is that they
tend to self-synchronize: You may get misinterpreted garbage for a
while, but things are likely to get back on track eventually. If you
want to get fancy you can apply reasonability tests to what you (think
you've) read, and restart at END-1000*N if things seem unreasonable.
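A sketch of that strategy, for what it's worth. The only "reasonability test" here is whether a line break turned up in the window; if not, the window is widened tenfold (100*N, then 1000*N, and so on). The names are hypothetical, and the sketch leans on the decoder resynchronizing after a bad start, which holds for UTF-8 but is not guaranteed for stateful encodings.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

final class TailWindow {
    static String lastLine(String path, Charset cs, int typicalLineBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            // Start 100*N bytes before the end; widen tenfold on each retry.
            for (long window = 100L * Math.max(1, typicalLineBytes); ; window *= 10) {
                long start = Math.max(0, length - window);
                byte[] buf = new byte[(int) (length - start)];
                raf.seek(start);
                raf.readFully(buf);
                // Decoding may yield garbage at the front if we landed mid-character.
                String text = new String(buf, cs);
                // Ignore one trailing terminator (LF, CR, or CRLF).
                int end = text.length();
                if (end > 0 && text.charAt(end - 1) == '\n') end--;
                if (end > 0 && text.charAt(end - 1) == '\r') end--;
                int brk = Math.max(text.lastIndexOf('\n', end - 1),
                                   text.lastIndexOf('\r', end - 1));
                // Accept if a break was found or the window already covers the whole file.
                if (brk >= 0 || start == 0) return text.substring(brk + 1, end);
            }
        }
    }
}
```

The sketch does not try to handle a tail larger than a Java array can hold.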
--
Eric Sosman
eso...@ieee-dot-org.invalid
In general no.
All the RandomAccessFile tricks are based on assumptions about lines
being separated by something - they do not work with record formats
that contain a line length instead of a delimiter.
If Unix/Linux/Windows/MacOS X is all you need to support then try:
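// needs java.io.IOException and java.io.RandomAccessFile
public static String readLastLineUnSup(String fnm) throws IOException {
    // try-with-resources closes the file even if a read throws
    try (RandomAccessFile raf = new RandomAccessFile(fnm, "r")) {
        StringBuilder res = new StringBuilder();
        // stop at the start of the file rather than seeking to -1
        for (long ix = raf.length() - 1; ix >= 0; ix--) {
            raf.seek(ix);
            int c = raf.read();
            if (c == '\r' || c == '\n') {
                // skip line ends at the very end of the file; any later one ends the scan
                if (res.length() > 0) break;
            } else {
                res.append((char) c); // collected back to front
            }
        }
        return res.reverse().toString();
    }
}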
Arne
> On 23-02-2011 10:59, Robin Wenger wrote:
>> Is it possible to read the last text line from a text file WITHOUT
>> reading the previous (n-1) lines?
>
> In general no.
>
> All the RandomAccessFile tricks are based on assumptions about lines
> being separated by something - they do not work with record formats that
> contains a line length instead of a delimiter.
"Record formats" are not relevant here, nor was someone else's concern
about compressed formats -- the OP clearly said "a text file", by which
is generally understood flat ASCII with CR, LF, or CRLF as line delimiter.
OpenVMS supports many record formats, but the "native" one for
text files is VAR: A two-byte binary count, the payload characters,
and if necessary a padding byte to make the total byte count even.
The "next most native" format is VFC, which is sort of like VAR
except that the first N (fixed) bytes of the payload are metadata
(line numbers, carriage control, ...) instead of line content.
Then come the easy formats: STREAM, STREAM-LF, STREAM-CR, and
FIXED. Oh, yes, and UNDEF; let's not forget UNDEF (although, to be
honest, UNDEF is more commonly used for "binary" than "text" files).
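To make the VAR point concrete, here is a sketch of finding the last record in such a layout. It is based only on the description above (two-byte count, payload, pad byte to an even total) and assumes the count is little-endian and excludes itself and the pad; it is not a faithful reader of real RMS files, and the names are invented. The thing to notice is that it has to walk forward from the start: a count stored only in front of each record gives nothing reliable to find when scanning backward.

```java
import java.nio.charset.StandardCharsets;

final class VarRecords {
    /** Returns the payload of the last complete record, or "" if there is none. */
    static String lastRecord(byte[] file) {
        int pos = 0;
        String last = "";
        while (pos + 2 <= file.length) {
            // Assumed: two-byte little-endian count of payload bytes.
            int count = (file[pos] & 0xFF) | ((file[pos + 1] & 0xFF) << 8);
            pos += 2;
            if (pos + count > file.length) break; // truncated record
            last = new String(file, pos, count, StandardCharsets.ISO_8859_1);
            // Skip the payload, plus one pad byte after an odd-length payload.
            pos += count + (count & 1);
        }
        return last;
    }
}
```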
(Strangest text file format I ever ran into used line-*bracketing*
characters: a CR before and an LF after. The rationale for this format
caused me to shake my head and sigh: It was said that as you printed
such a file on a typewriter-like console, possibly with long pauses
between lines for progress messages and the like, then the LF at end-
of-line would move the paper so the print head wouldn't interfere with
reading it. As I said, shake the head.)
In short, all I'm asking is that you delete the word "generally"
because your experience is insufficiently general.
--
Eric Sosman
eso...@ieee-dot-org.invalid
Obsolete systems do not interest me. Since those days, the world has
standardized on ASCII flat files for text files. I just wish it had
standardized on one canonical end-of-line character too!
then…
> Since those days, the world has standardized on ASCII flat files for text files.
LOL!
Windows text files are flat ASCII files (with CRLF line ends). Mac text
files are flat ASCII files (with CR line ends). Unix text files are flat
ASCII files (with LF line ends). And that exhausts 99.99% of the
operating system market share right there, if not more, not counting
smartphones which are all too modern to be using weird legacy formats for
text files.
I can't remember the last time I had to interoperate with any machine
that had anything other than standard ASCII as the native format for text
files. It's gotta be decades.
ASCII character values are limited to the 0-127 range. That's an
outdated "standard".
I remember when we used a seven-bit character code to write my native
language. We could toggle the way we viewed the character codes where
we had put those characters that were not in ASCII. It was either
brackets and braces or those letters, but never both.
V{nkyr{-{{kk|si{. It's not a happy memory.
Used by "obsolete systems". A key point in my amusement. :)
I have the same experience. C code wasn't very readable with "Swedish
ASCII". At least Finnish doesn't use "å", except when quoting Swedish words.
I thought so, but Ken seemed to need an explanation.
Actually I find that, nowadays, lots of text files on Windows are
so-called 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16
with BOM).
Even on my ancient XP boxes, Notepad offers only ANSI, Unicode, Unicode
big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns out to be
CP-1252), Text-Document DOS format (turns out to be CP-850) and
Unicode. No ASCII.
--
RGB
Ah, the warm blanket of provincialism.
>> OpenVMS supports many record formats, but the "native" one for
>> text files is VAR: A two-byte binary count, the payload characters, and
>> if necessary a padding byte to make the total byte count even.
>> ...
>> In short, all I'm asking is that you delete the word "generally"
>> because your experience is insufficiently general.
On the IBM i machines (formerly i Series, formerly System i, formerly
AS/400, successor to the System/3x), using the default filesystem, a
text "file" is actually a series of records in a "member" of a
"physical file". The i operating system hides implementation details,
but access to the contents of the "file" is record-oriented, not
byte-oriented.
In the alternate Hierarchical File System supported by the i machines
for POSIX compatibility, text files are byte-oriented, but usually
EBCDIC, not ASCII.
On IBM and other EBCDIC mainframe systems, there are a variety of
formats for text files, but flat byte-oriented ASCII isn't one of
them, unless you're running Linux.
> Obsolete systems do not interest me.
Apparently, neither do prominent ones that you don't happen to know
about. What a surprise.
> Since those days, the world has
> standardized on ASCII flat files for text files.
Only for sufficiently small values of "the world".
--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
> Ken Wesson wrote:
>> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>>> "Record formats" are not relevant here, nor was someone else's
>>>> concern about compressed formats -- the OP clearly said "a text
>>>> file", by which is generally understood flat ASCII with CR, LF, or
>>>> CRLF as line delimiter.
>
> Ah, the warm blanket of provincialism.
Who asked you for your opinions of others here?
>>> OpenVMS supports many record formats, but the "native" one for
>>> text files is VAR: A two-byte binary count, the payload characters,
>>> and if necessary a padding byte to make the total byte count even. ...
>>> In short, all I'm asking is that you delete the word "generally"
>>> because your experience is insufficiently general.
>
> On the IBM i machines (formerly i Series, formerly System i, formerly
> AS/400, successor to the System/3x), blah blah blah
You're one to talk about provincialism. Who the hell uses these ancient
museum pieces any more?
>> Obsolete systems do not interest me.
>
> Apparently, neither do prominent ones that you don't happen to know
> about.
There is nothing at all prominent about those IBM dinosaurs. They may
have been prominent 30 years ago, but not now.
>> Since those days, the world has
>> standardized on ASCII flat files for text files.
>
> Only for sufficiently small values of "the world".
Fine, then -- corporate America and home computers in America then.
Perhaps you live in a place where they're 30 years behind us, but you're
the unusual ones in that case.
> On 24/02/2011 14:00, Ken Wesson wrote:
>> Windows text files are flat ASCII files (with CRLF line ends).
>
> Actually I find that, nowadays, lots of text files on Windows are
> so-called 'ANSI' (mostly CP-1252)
Same difference. The files are plain text, with CRLF line ends.
> RTF etc.
Not text files. RTF is more akin to word processor document files than
text files. Nobody would use RTF to encode source code or a shell script.
> I remember when we used a seven-bit character code to write my native
> language. etc
That's why we now actually use that 8th bit for something useful, if need
be.
Well, these days we use the 8th bit for accented characters instead of
just wasting it. Technically it's not your granddaddy's ASCII with that
in use, but it's close enough for government work, and certainly close
enough not to mess with using tests for CR/LF to detect line boundaries.
Yes, and it was a good explanation. Unfortunately, I don't think he
understood the explanation, nor do I think he will understand further
clarification. I think it more likely that the harder anyone tries to
explain to him these points, the more dug in his heels will be.
To do otherwise would necessarily require an admission that there's no
single "text file" format, and that even if there were, ASCII or any of
the single-byte derivatives thereof ain't it. I don't see any way such
an admission would ever be produced.
Pete
> On 2/24/11 10:42 PM, Lars Enderin wrote:
>> 2011-02-24 15:26, Peter Duniho skrev:
>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>> ASCII character values are limited to the 0-127 range. That's an
>>>> outdated "standard".
>>>
>>> Used by "obsolete systems". A key point in my amusement. :)
>>
>> I thought so, but Ken seemed to need an explanation.
>
> Yes, and it was a good explanation. Unfortunately, I don't think he
> understood the explanation, nor do I think he will understand further
> clarification. I think it more likely that the harder anyone tries to
> explain to him these points, the more dug in his heels will be.
You know, that's what you can expect when you are unpleasant, nasty, and
rude about things -- other people display a curious unwillingness to
listen to anything you have to say. An old adage comes to mind --
something about honey and vinegar?
(It doesn't help when your "counterexamples" are obscure formats used on
dinosaurian machines of yesteryear; the fact is that text files with
CR/LF line delimiters are standard on a set of operating systems that have
the overwhelming majority of the market share for such these days.)
Spot on.
hopefully you live in another country than i do....
--
Luuk
> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>
>> Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>>>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>>>> "Record formats" are not relevant here, nor was someone else's
>>>>> concern about compressed formats -- the OP clearly said "a text
>>>>> file", by which is generally understood flat ASCII with CR, LF, or
>>>>> CRLF as line delimiter.
>>
>> Ah, the warm blanket of provincialism.
>
> Who asked you for your opinions of others here?
>
>>>> OpenVMS supports many record formats, but the "native" one for
>>>> text files is VAR: A two-byte binary count, the payload characters,
>>>> and if necessary a padding byte to make the total byte count even. ...
>>>> In short, all I'm asking is that you delete the word "generally"
>>>> because your experience is insufficiently general.
>>
>> On the IBM i machines (formerly i Series, formerly System i, formerly
>> AS/400, successor to the System/3x), blah blah blah
>
> You're one to talk about provincialism. Who the hell uses these ancient
> museum pieces any more?
Um, that would be me, or rather my employer's customers.
Indirectly, anyone who has an account with a bank or credit union is
likely using an EBCDIC-based machine. There are some that don't, but
it's not the way to bet.
--
Jim Janney
There is a single text file format: lines of characters in some encoding,
terminated by an end-of-line sequence which is distinguishable from any
other characters.
It's merely the case that some current mainframes, and some obscure or
historical systems, do not store text in text files!
tom
--
everything from live chats and the Web, to the COOLEST DISGUSTING
PORNOGRAPHY AND RADICAL MADNESS!!
> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>
>> Ken Wesson wrote:
>>> Obsolete systems do not interest me.
>>
>> Apparently, neither do prominent ones that you don't happen to know
>> about.
>
> There is nothing at all prominent about those IBM dinosaurs. They may
> have been prominent 30 years ago, but not now.
>
You know, you sound exactly like a character who surfaced in a Y2K
newsgroup back in 1998/99. He refused to believe that any computers apart
from PCs were in use at the time.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
No one. I offer them out of sheer generosity. No thanks are necessary.
In the twenty years I've been on Usenet, I've found offering my
opinions on the local idiots to be immensely useful. At least to me.
>> On the IBM i machines (formerly i Series, formerly System i, formerly
>> AS/400, successor to the System/3x), blah blah blah
>
> You're one to talk about provincialism. Who the hell uses these ancient
> museum pieces any more?
Thousands of organizations, which is why they still enjoy healthy sales.
>>> Obsolete systems do not interest me.
>> Apparently, neither do prominent ones that you don't happen to know
>> about.
>
> There is nothing at all prominent about those IBM dinosaurs. They may
> have been prominent 30 years ago, but not now.
Tell that to the many thousands of organizations that still use them.
And the majority of business transactions still run on IBM mainframe
and midrange systems, and similar offerings from other companies.
IBM had just shy of $100B in sales last year. A good chunk of that was
from mainframes: mainframe sales were up 68% from 2009, to the best
level in six years. MIPS capacity (mainframe processing capacity owned
by customers) rose 58%, and IBM acquired a couple dozen new mainframe
customers - businesses that bought their first mainframes.[1]
As usual, you don't know what the hell you're talking about, and
clearly can't be bothered to do even a moment of research before
posting something else that demonstrates your ignorance. Not that
you'll learn anything from this exchange, either, I suppose.
[1] http://www.theregister.co.uk/2011/01/18/ibm_q4_2010_numbers/
Windows hasn't used ASCII in decades.
--
Lew
Honi soit qui mal y pense.
They are - because the record format determines whether RandomAccessFile
has a chance of working or not.
> nor was someone else's concern
> about compressed formats -- the OP clearly said "a text file", by which
> is generally understood flat ASCII with CR, LF, or CRLF as line delimiter.
That is probably true among non IT pros.
But this group is for IT pros.
They know that there are other character sets and other
record formats.
Arne
Yep.
NOS/VE (it may not be relevant here because I don't think there
exists a Java for NOS/VE) used 6 byte length + data + 6 byte length.
The trailing 6 byte length made it possible to securely read the
file backwards which the VMS format does not.
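A sketch of why the trailing length helps. The layout below (six-byte big-endian length, data, the same six-byte length again) is a hypothetical stand-in for illustration, not the real NOS/VE on-disk format, and all names are invented. With the length repeated at the end, the last record can be located from the end of the file without looking at the data at all.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class BracketedRecords {
    static final int LEN_BYTES = 6;

    /** Builds a file image: length + data + length for each record. */
    static byte[] encode(String... records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String r : records) {
            byte[] data = r.getBytes(StandardCharsets.ISO_8859_1);
            writeLength(out, data.length);
            out.write(data, 0, data.length);
            writeLength(out, data.length);
        }
        return out.toByteArray();
    }

    /** Finds the last record by reading only the trailing length field. */
    static String lastRecord(byte[] file) {
        if (file.length < 2 * LEN_BYTES) throw new IllegalArgumentException("no complete record");
        int trailerPos = file.length - LEN_BYTES;
        int count = (int) readLength(file, trailerPos);
        return new String(file, trailerPos - count, count, StandardCharsets.ISO_8859_1);
    }

    // Assumed big-endian; the real byte order is not stated in the post.
    private static long readLength(byte[] file, int pos) {
        long v = 0;
        for (int i = 0; i < LEN_BYTES; i++) v = (v << 8) | (file[pos + i] & 0xFF);
        return v;
    }

    private static void writeLength(ByteArrayOutputStream out, long n) {
        for (int shift = 8 * (LEN_BYTES - 1); shift >= 0; shift -= 8) {
            out.write((int) (n >>> shift) & 0xFF);
        }
    }
}
```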
Arne
Whether a solution works in general or not depends on whether
it is guaranteed to work on all platforms or not.
Using RandomAccessFile to search for CR and LF is not.
Whether it works on platforms that interest you is completely
irrelevant.
> Since those days, the world has
> standardized on ASCII flat files for text files.
Not really.
Windows uses CP-1252, UTF-8 and UTF-16.
Unix/Linux/VMS use ISO-8859-1 and UTF-8.
IBM mainframes use EBCDIC.
There are really very few systems today that use just ASCII.
Arne
No.
They are CP-1252, UTF-8 or UTF-16.
> Mac text
> files are flat ASCII files (with CR line ends). Unix text files are flat
> ASCII files (with LF line ends).
No.
They are ISO-8859-1 or UTF-8.
> And that exhausts 99.99% of the
> operating system market share right there, if not more,
No.
z/OS, i, OpenVMS, MPE have a lot more market share than 0.01%.
> I can't remember the last time I had to interoperate with any machine
> that had anything other than standard ASCII as the native format for text
> files. It's gotta be decades.
Possible that you only work with 20+ year old Unix and OpenVMS
systems with 7 bit VT100 access.
But that is not very common.
Arne
I am pretty sure that Ken completely missed your joke.
Arne
No.
There are also count prefix (and sometimes suffix) formats.
They have the advantage of being able to actually have
all possible values in lines.
And the disadvantage that various hacks assuming all records
use delimiters do not work.
Arne
Then it is not ASCII.
> Technically it's not your granddaddy's ASCII with that
> in use, but it's close enough for government work, and certainly close
> enough not to mess with using tests for CR/LF to detect line boundaries.
The character set and the record format are independent of each other.
Arne
Good old ISO 646 NRC.
Horrible by today's standards.
But back then it was what we had.
Arne
Well - you are the one that has been claiming that everybody is using
a 7 bit standard (ASCII) today.
Arne
Completely different char set.
Arne
I don't think it ever has.
DOS used CP-437, CP-850 etc..
32/64 bit Windows uses CP-1252 (which is practically the
same as ISO-8859-1) and some UTF-16.
.NET added UTF-8.
I don't remember 16 bit Windows, but I am pretty sure
that it did not use ASCII.
Arne
PS: CP-850 and CP-1252 are for western countries - other
countries use other char sets.
Yep.
>>> OpenVMS supports many record formats, but the "native" one for
>>> text files is VAR: A two-byte binary count, the payload characters, and
>>> if necessary a padding byte to make the total byte count even.
>>> ...
>>> In short, all I'm asking is that you delete the word "generally"
>>> because your experience is insufficiently general.
>
> On the IBM i machines (formerly i Series, formerly System i, formerly
> AS/400, successor to the System/3x), using the default filesystem, a
> text "file" is actually a series of records in a "member" of a
> "physical file". The i operating system hides implementation details,
> but access to the contents of the "file" is record-oriented, not
> byte-oriented.
And it is a pretty good guess that the RandomAccessFile searching
for CR and LF will fail on i also then.
> In the alternate Hierarchical File System supported by the i machines
> for POSIX compatibility, text files are byte-oriented, but usually
> EBCDIC, not ASCII.
>
> On IBM and other EBCDIC mainframe systems, there are a variety of
> formats for text files, but flat byte-oriented ASCII isn't one of
> them, unless you're running Linux.
Linux will be either ISO-8859-1 or UTF-8 not ASCII.
Arne
Anyone posting to usenet gives the entire world the
opportunity to comment on them.
The smart people try to post something smart.
>>>> OpenVMS supports many record formats, but the "native" one for
>>>> text files is VAR: A two-byte binary count, the payload characters,
>>>> and if necessary a padding byte to make the total byte count even. ...
>>>> In short, all I'm asking is that you delete the word "generally"
>>>> because your experience is insufficiently general.
>>
>> On the IBM i machines (formerly i Series, formerly System i, formerly
>> AS/400, successor to the System/3x), blah blah blah
>
> You're one to talk about provincialism. Who the hell uses these ancient
> museum pieces any more?
Lots of places.
Retail sector, public sector, financial sector
>>> Obsolete systems do not interest me.
>>
>> Apparently, neither do prominent ones that you don't happen to know
>> about.
>
> There is nothing at all prominent about those IBM dinosaurs. They may
> have been prominent 30 years ago, but not now.
Both z/OS and i are widely used today.
>
>>> Since those days, the world has
>>> standardized on ASCII flat files for text files.
>>
>> Only for sufficiently small values of "the world".
>
> Fine, then -- corporate America and home computers in America then.
OK - neither z/OS nor i is common on home computers.
But they are very common in corporate America.
If all z/OS systems disappeared overnight then everything
would break down, because so many critical systems are
running on them.
Arne
The biggest chunk of IBM's revenue is services.
But they still sell a lot of big iron.
They don't publicize numbers at the OS level, but I would guess that
at least 10 B$ was mainframe HW & SW.
Arne
Interesting. On the one hand he retreats from his earlier claim
that "ASCII" encoding is universal, and on the other he advances the
notion that CR/LF is The One True Delimiter. So, which hand advances
and which retreats? Is he spinning clockwise or counterclockwise?
Well, maybe his rotation will make him a sort of human eggbeater,
better at mixing the vinegar with the honey. (Ugh.)
--
Eric Sosman
eso...@ieee-dot-org.invalid
I think it's amusing that he says "All the world's ASCII," and
posts his assertion in a message whose Content-Type says otherwise.
--
Eric Sosman
eso...@ieee-dot-org.invalid
> On 2/24/2011 2:10 PM, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 02:52:56 +0800, Peter Duniho wrote:
>>
>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>> outdated "standard".
>>>>>
>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>
>>>> I thought so, but Ken seemed to need an explanation.
>>>
>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>> understood the explanation, nor do I think he will understand further
>>> clarification. I think it more likely that the harder anyone tries to
>>> explain to him these points, the more dug in his heels will be.
>>
>> You know, that's what you can expect when you are unpleasant, nasty,
>> and rude about things -- other people display a curious unwillingness
>> to listen to anything you have to say. An old adage comes to mind --
>> something about honey and vinegar?
>>
>> (It doesn't help when your "counterexamples" are obscure formats used
>> on dinosaurian machines of yesteryear; the fact is that text files with
>> CR/ LF line delimiters are standard on a set of operating systems that
>> have the overwhelming majority of the market share for such these
>> days.)
>
> Interesting. On the one hand he retreats from
Didn't anyone ever tell you that it was rude to discuss someone in the
third person right in front of him like that?
If you have something to say about me you can address it directly to me,
Sosman.
> human eggbeater
Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?
Those aren't text files. Text is, notionally, a string of characters,
including perhaps spaces and line-end characters. A text file is
therefore a file whose content is a string of characters, including
perhaps spaces and line-end characters. Such a thing is, logically, the
only native way to represent raw text. Anything more structured is
obviously not a plain text file. It may be a text-containing file of some
kind but it is not a text file.
> They have the advantage of begin able to actually have all possible
> values in lines.
That's nonsense. The only character a normal text file cannot have in
lines is a line break, and in actual fact you cannot have a line break in
the middle of a line *by definition*. Wherever there is a line break one
line ENDS and another one BEGINS, *by definition*. If that weren't the
case then it wouldn't be a line break!
So there is no "advantage" here. What you are actually describing is a
"list-of-strings" file, not a text file (which is representable as a
single string). Your "list-of-strings" file is in fact NOT representable
as a single string, without resort to some escaping mechanism or the use
of a data structure such as ArrayList<String>. For instance, suppose you
have a file of two test records, one of which is
foo
bar
and the other of which is
baz
where the first one has a literal linebreak after the second o. To
represent this whole file in a string requires either a string
serialization of an in-memory record format (e.g., ArrayList<String> ->
ObjectOutputStream -> ByteBuffer -> BinHex converter -> String) or a
string with delimiters. Say you use newlines as the delimiters. Then you
need to escape the literal newline after that o, say,
foo\nbar
baz
is the string. And you also need to escape the escape, so, here you'd
have to escape backslashes. Any other delimiter you choose instead will
have the same effect, so long as you are storing this into a String. (You
can use an ArrayList<Object> with Characters and a delimiter object that
is not equal to any Character, but then once again this is not a String.)
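The escaping scheme described above can be sketched as follows, assuming newline as the record delimiter and backslash as the escape character. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RecordEscaping {
    /** Joins records into one String, one record per line. */
    static String join(List<String> records) {
        StringBuilder sb = new StringBuilder();
        for (String r : records) {
            // Escape the escape character first, then the literal newlines.
            sb.append(r.replace("\\", "\\\\").replace("\n", "\\n")).append('\n');
        }
        return sb.toString();
    }

    /** Reverses join(); text after the last delimiter is ignored. */
    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length()) {
                // "\n" stands for a newline; "\\" stands for one backslash.
                char next = text.charAt(++i);
                cur.append(next == 'n' ? '\n' : next);
            } else if (c == '\n') {
                out.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        return out;
    }
}
```

With the two records from the example, join produces the text foo\nbar on the first line and baz on the second, and split recovers the original list.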
Face it: those record-oriented file formats are not text files. They have
additional structure that cannot be represented natively in a String,
therefore represent more than just a String, such as a collection of
Strings, and therefore are not text files but something else -- archives
of multiple text files bundled into single files.
The main use for such a thing over plain ordinary text files that I can
think of is storing a mailbox without resorting to hacks that behave
oddly when lines in the bodies start with the word "from". And these days
filesystems work fine with large directories full of tiny files, so
there's less need for that sort of thing than there once was.
> And the disadvantage of various hacks assuming all records use
> delimiters does not work.
Nobody is assuming records use delimiters. They are assuming text files
are text files. The lines in text files use delimiters as an inherent
property. If you have a text in a String, seeking backward from the end
until a newline character (or the beginning of the String, whichever you
hit first) will reliably find the start of the last line in the String.
The same is true of any disk file format that faithfully represents the
String as a flat string of text, and in particular of the formats
commonly used to store, e.g., C source files.
> I am pretty sure that Ken completely missed your joke.
What did I just tell Sosman about talking about people as if they aren't
there?
And what does this have to do with Java? This is
comp.lang.java.programmer, Arne, not rec.humor.did.ken.get.the.joke.
> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>> it's not (...) ASCII (...).
Alleged by whom? That distorted quote is most certainly not what I wrote.
> I think it's amusing that he says "All the world's ASCII,"
Who says "all the world's ASCII", Sosman? I can't recall anybody doing so
in this group recently.
It is true that almost all the world seems to use encodings that contain
ASCII as a subset. That is not quite the same thing.
> On 24-02-11 19:46, Ken Wesson wrote:
>> but it's close enough for government work,
>
> hopefully you live in another country than i do....
> On 24-02-2011 13:46, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 15:14:44 +0100, Lars Enderin wrote:
>>
>>> 2011-02-24 15:00, Ken Wesson skrev:
>>>> I can't remember the last time I had to interoperate with any machine
>>>> that had anything other than standard ASCII as the native format for
>>>> text files. It's gotta be decades.
>>>
>>> ASCII character values are limited to the 0-127 range. That's an
>>> outdated "standard".
>>
>> Well, these days we use the 8th bit for accented characters instead of
>> just wasting it.
>
> Then it is not ASCII.
It contains ASCII as a subset.
So it is ASCII. And more.
>> Technically it's not your granddaddy's ASCII with that
>> in use, but it's close enough for government work, and certainly close
>> enough not to mess with using tests for CR/LF to detect line
>> boundaries.
>
> The character set and the record format are independent of each other.
Record formats are not relevant here, since text files do not have record
formats; they are raw sequences in some character set more or less by
definition. Anything with additional structure over and above that is
something other than a text file. Generically we call such things "binary
files" though commonly binary files do *contain* text. But all contain
additional structure that cannot be represented in, say, a
java.lang.String without resort to some form of escaping or encoding. And
that makes them not pure text, but text-and-some-other-stuff or some-
other-stuff-that-happens-to-contain-text.
Technically they are, since the various more recent standards they use
contain ASCII as a subset and generally reduce to ASCII if you strip the
high bit off (code pages) or the high byte and highest remaining bit (16-
bit encodings). So they are using ASCII and sometimes some additional
stuff that encloses and contains ASCII.
Funny that something so "completely different" intersects with ASCII in
the entirety of ASCII's range (0-127). It just specifies what 128-255
mean instead of leaving those values undefined. Unicode specifies what
128-65535 mean and still intersects with ASCII on 0-127.
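The subset question is easy to check at the byte level. A small sketch using only the charsets the JDK guarantees (the class and method names are hypothetical): for 7-bit text, ISO-8859-1 and UTF-8 produce the same bytes as US-ASCII, while UTF-16 does not, since it spends two bytes on every character, which matters to anything scanning raw bytes for CR or LF.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiSubsetCheck {
    /** True if encoding s in cs yields the same bytes as encoding it in US-ASCII. */
    static boolean sameBytesAsAscii(String s, Charset cs) {
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII), s.getBytes(cs));
    }

    public static void main(String[] args) {
        String plain = "foo\r\n"; // 7-bit characters only
        System.out.println(sameBytesAsAscii(plain, StandardCharsets.ISO_8859_1)); // true
        System.out.println(sameBytesAsAscii(plain, StandardCharsets.UTF_8));      // true
        System.out.println(sameBytesAsAscii(plain, StandardCharsets.UTF_16LE));   // false

        String accented = "\u00e9"; // e with acute accent
        System.out.println(accented.getBytes(StandardCharsets.ISO_8859_1).length); // 1
        System.out.println(accented.getBytes(StandardCharsets.UTF_8).length);      // 2
    }
}
```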
> On 24-02-2011 19:12, Lew wrote:
>> On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>
>>> Actually I find that, nowadays, lots of text files on Windows are
>>> so-called
>>> 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 with
>>> BOM).
>>>
>>> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode,
>>> Unicode big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns
>>> out to be CP-1252), Text-Document DOS format (turns out to be CP-850)
>>> and Unicode. No
>>> ASCII.
>>
>> Windows hasn't used ASCII in decades.
>
> I don't think it ever have.
Funny then that bog-standard ASCII files seem to read and write just fine
in Notepad on the occasions that I use Windows computers.
> DOS used CP-437, CP-850 etc..
>
> 32/64 bit Windows uses CP-1252 (which is practically the same as
> ISO-8859-1) and some UTF-16.
All of those seem to be ASCII plus another up to 128 characters, or in
the case of UTF-16, another up to 65408 characters.
Saying that a 7-bit-clean file interpreted in one of those is not ASCII
is like saying that humans are not mammals.
> On 24-02-2011 09:00, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
>>
>>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>>> [...]
>>>> Obsolete systems do not interest me.
>>>
>>> then…
>>>
>>>> Since those days, the world has standardized on ASCII flat files for
>>>> text files.
>>>
>>> LOL!
>>
>> Windows text files are flat ASCII files (with CRLF line ends).
>
> No.
>
> They are CP-1252, UTF-8 or UTF-16.
All of which are ASCII++, for all intents and purposes.
>> Mac text
>> files are flat ASCII files (with CR line ends). Unix text files are
>> flat ASCII files (with LF line ends).
>
> No.
>
> They are ISO-8859-1 or UTF-8.
Which are ASCII++, for all intents and purposes.
>> And that exhausts 99.99% of the
>> operating system market share right there, if not more,
>
> No.
>
> z/OS, i, OpenVMS, MPE has a lot more market share than 0.01%.
Nonsense. There are *at least* ten thousand PCs running Windows for every
one machine running one of those operating systems.
Ten thousand *PCs running Windows*.
If you throw in Unix and MacOS you get a lot more, especially given how
heavily Unix is used in server racks.
>> I can't remember the last time I had to interoperate with any machine
>> that had anything other than standard ASCII as the native format for
>> text files. It's gotta be decades.
>
> Possible that you only work with 20+ year old Unix and OpenVMS systems
> with 7 bit VT100 access.
>
> But that is not very common.
I work with what nearly everyone in the field works with these days: a
mix of Unix, MacOS, and Windows, mainly Unix server blades whose services
are accessed by mainly Windows desktop/netbook users with a smattering of
Mac users and a small but growing contingent of smartphone users.
> Ken Wesson <kwe...@gmail.com> writes:
>
>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>
>>> On the IBM i machines (formerly i Series, formerly System i, formerly
>>> AS/400, successor to the System/3x), blah blah blah
>>
>> You're one to talk about provincialism. Who the hell uses these ancient
>> museum pieces any more?
>
> Um, that would be me, or rather my employer's customers.
Your employer may happen to be using such legacy systems, but I very much
doubt that very many people deal with them in an IT capacity. Far, *far*
fewer than deal with Unix, Windows, and Mac boxes in such a capacity.
How many end-users interact indirectly with these systems is of course
irrelevant.
> On Thu, 24 Feb 2011 19:42:19 +0100, Ken Wesson wrote:
>
>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>
>>> Ken Wesson wrote:
>>>> Obsolete systems do not interest me.
>>>
>>> Apparently, neither do prominent ones that you don't happen to know
>>> about.
>>
>> There is nothing at all prominent about those IBM dinosaurs. They may
>> have been prominent 30 years ago, but not now.
>>
> You know, you sound exactly like a character who surfaced in a Y2K
> newsgroup back in 1998/99. He refused to believe that any computers
> apart from PCs were in use at the time.
I doubt that. He may have correctly pointed out that the vast *majority*
of computers were PCs at the time. (Now, laptops and smartphones may have
the slight edge, or perhaps even server blades, now that typical servers
are racks full of small computers instead of single big computers.)
If he did claim they *all* were then he was an idiot.
> Ken Wesson wrote:
>> Who asked you for your opinions of others here?
>
> No one. I offer them out of sheer generosity.
Calling other people names is hardly what I would call "generosity", nor
is polluting a newsgroup with off-topic traffic.
> the local idiots
Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?
>>> On the IBM i machines (formerly i Series, formerly System i, formerly
>>> AS/400, successor to the System/3x), blah blah blah
>>
>> You're one to talk about provincialism. Who the hell uses these ancient
>> museum pieces any more?
>
> Thousands of organizations, which is why they still enjoy healthy sales.
Ah, must be vendor lockin. Sucks to be them. Soon they'll be outcompeted
by newer, nimbler firms that use modern things like the free Unixes on
commodity hardware. Of course, they might last a while if they can keep
convincing the government to give them "bailouts" or other protectionist
help in the face of competitors and their own screwups.
Still, your "thousands" of organizations are outweighed by the *hundreds*
of thousands that don't use such systems and *they* are outweighed by the
hundreds of *millions* of individuals who collectively possess *billions*
of personal computers (often two or three desktop/laptop/netbook
machines, one current and one or two older units, *plus* a smartphone or
an iPad or whatever, and that's not even counting routers or other
gadgets with general-purpose microprocessors but non-general-purpose
applications).
Less than 1 in 10,000 computers, and probably *far* less, don't store
text files in a form that consists of CR, CRLF, or LF delimited lines. A
very large number of those that do in fact use one or another ASCII-
superset character set and they pretty much all intersect on using
characters 10 and 13 to represent the potential line-end characters.
>> There is nothing at all prominent about those IBM dinosaurs. They may
>> have been prominent 30 years ago, but not now.
>
> Tell that to the many thousands of organizations that still use them.
Perhaps ten thousand aging dinosaurian computers using them. Over one
billion using some variation on the theme of Windows, Unix, MacOS, iOS,
or Android. Probably more devices use phone OS also-rans like SymbianOS
and PalmOS than use those oddball IBM operating systems.
> And the majority of business transactions
have no bearing on this discussion, which has to do with the majority of
*computers* and, secondarily, what will be encountered routinely by the
majority of *IT workers*.
> IBM had just shy of $100B in sales last year.
Vendor lockin has been good to them.
> you don't know what the hell
Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?
> can't be bothered to do even a moment of research
Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?
> your ignorance
Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?
> Not that you'll learn
> On 24-02-2011 13:42, Ken Wesson wrote:
>> Who asked you for your opinions of others here?
>
> Anyone posting to usenet gives the entire world the opportunity to
> comment on them.
Only people ignorant about etiquette, the newsgroup's topic, or both will
actually do so.
> The smart people
Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?
>> You're one to talk about provincialism. Who the hell uses these ancient
>> museum pieces any more?
>
> Lots of places.
>
> Retail sector, public sector, financial sector
If you're counting it that way, that's 3 places. Hardly "lots". :)
See other posts. Perhaps a collected few tens of thousands of computers
using museum-worthy OSes like those versus a collected *billion* or more
of machines running Windows, MacOS, iOS, Android, and Unix.
>> There is nothing at all prominent about those IBM dinosaurs. They may
>> have been prominent 30 years ago, but not now.
>
> Both z/OS and i are widely used today.
If by "widely used" you mean on one in ten thousand or fewer computers.
>> Fine, then -- corporate America and home computers in America then.
>
> OK - neither z/OS or i are common on home computers.
>
> But they are very common in corporate America.
If by "very common" you mean used on one in ten thousand or fewer of
their computers. For every single z/OS machine in corporate America there
are probably a thousand blade servers and ten thousand office PCs and
employer-provided laptops and God alone knows how many employee
smartphones with plans and/or handsets paid for by their company.
> If all z/OS systems disappeared over night then everything would break
> down, because so many critical systems are running on them.
A somewhat scary thought, but hardly relevant unless you're trying to
stir up enough public alarm to foment a general movement to replace these
legacy systems with more modern ones.
> On 24-02-2011 09:18, Michael Wojcik wrote:
>> Ah, the warm blanket of provincialism.
>
> Yep.
Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?
> And it is a pretty good guess that the RandomAccessFile searching for CR
> and LF will fail on i also then.
How fortunate that i runs on fewer than one in ten thousand machines.
Does Java even run on i?
> Linux will be either ISO-8859-1 or UTF-8 not ASCII.
Both contain ASCII as a subset -- if you take a pure-ASCII file and
reencode it in either the result is the identical byte sequence.
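The subset claim is easy to check from Java. The following is a minimal sketch, assuming Java 7 or later for StandardCharsets; the class and method names are made up for illustration. It compares the bytes a pure-ASCII string produces in US-ASCII against the bytes it produces in another charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiSubsetDemo {
    /** Returns true if text encodes to the same bytes in US-ASCII and in the other charset. */
    static boolean sameBytesAsAscii(String text, Charset other) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(ascii, text.getBytes(other));
    }

    public static void main(String[] args) {
        String pureAscii = "last line\r\n"; // every char is in the 0-127 range
        System.out.println(sameBytesAsAscii(pureAscii, StandardCharsets.ISO_8859_1)); // true
        System.out.println(sameBytesAsAscii(pureAscii, StandardCharsets.UTF_8));      // true
        System.out.println(sameBytesAsAscii(pureAscii, StandardCharsets.UTF_16LE));   // false
    }
}
```

The comparison only says something about input that is 7-bit clean to begin with; as soon as the text contains a character above 127, the three encodings diverge.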
> On 24-02-2011 08:06, Ken Wesson wrote:
>> Obsolete systems do not interest me.
>
> Whether a solution works in general or not depends on whether it is
> guaranteed to work on all platforms or not.
Actually, "in general" tends to have some kind of implicit scope that is
usually less than "all platforms". For instance, when discussing a Java
solution, we can exclude platforms that Java doesn't run on. The last I
heard that even includes one prominent one: iOS, the platform of iPhones
and iPads.
> The RandomAccessFile and search for CR and LF does not.
It probably runs on all platforms Java is normally used on. It certainly
runs on 99.99% or more of the machines anyone is likely to run Java on,
AND the remaining less than .01% are ones sufficiently oddball that their
operators will *know* to expect common crossplatform software to often
break on them. Typical C code using I/O will probably not work on such
machines without heavy modification, even C code that compiles and works
fine on every POSIX-compliant system and every Windows box and most other
machines. Hell, these machines may not even be able to represent C source
trees normally, requiring the compiler vendor to jump through hoops and
requiring unusual tools and IDEs be used to hack C sources and not just
the system text editor. Hell, I wouldn't be surprised if there were no
working C implementations on some of these systems -- and I'd be
surprised if many, if any, of them ran Java at all, let alone had a fully
compliant JavaSE 5/6 implementation.
> Whether it works on platforms that interest you are completely
> irrelevant.
On the contrary, whether software works on platforms that interest its
developer and user base is 100% relevant and whether it works on
platforms that *don't* interest its developer and user base is irrelevant.
>> Since those days, the world has
>> standardized on ASCII flat files for text files.
>
> Not really.
>
> Windows uses CP-1252, UTF-8 and UTF-16 Unix/Linux/VMS uses ISO-8859-1
> and UTF-8
All ASCII supersets. Which means the common denominator among all those
is ... ta-da! ASCII. :)
> IBM mainframe uses EBCDIC
And hardly anyone uses IBM mainframe (sic). What was that figure again?
0.01% of all computers? Or fewer. And shrinking. Even if the number of
IBM mainframes is actually growing (for the love of God, *why*?), the
number of non-IBM-mainframe computers is growing *exponentially* faster.
There was a time when IBM mainframes may have been over 50% and were
surely over 20% of all computers; the trend has been one of exponential
decay of that percentage ever since, with the knee of the curve
corresponding quite closely with the beginning of widespread adoption of
the PC.
> There are really very few systems today that uses just ASCII.
But many that use ASCII.
> On 23-02-2011 22:23, Ken Wesson wrote:
>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>> Is it possible to read the last text line from a text file WITHOUT
>>>> reading the previous (n-1) lines?
>>>
>>> In general no.
>>>
>>> All the RandomAccessFile tricks are based on assumptions about lines
>>> being separated by something - they do not work with record formats
>>> that contains a line length instead of a delimiter.
>>
>> "Record formats" are not relevant here,
>
> They are
They are not, since files in record formats are not text files.
>> nor was someone else's concern
>> about compressed formats -- the OP clearly said "a text file", by which
>> is generally understood flat ASCII with CR, LF, or CRLF as line
>> delimiter.
>
> non IT pros
Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?
> They know that there are other character sets and other record formats.
Other character sets mostly intersect in ASCII. Nearly all in any kind of
widespread use intersect in using characters 10 and 13 as the potential-
line-end characters. And "other record formats" are not relevant in a
discussion of text files, as has been explained already.
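For reference, the backward search being argued about might look roughly like the sketch below. It is not anyone's posted code: the class and method names are hypothetical, and it assumes a delimited file in an encoding where bytes 10 and 13 only ever mean LF and CR (ASCII, ISO-8859-1, CP-1252, UTF-8). It would not be correct for UTF-16, EBCDIC, or count-prefixed record formats.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

public class LastLineReader {
    /** Reads the last line of a delimited text file without reading the earlier lines. */
    static String readLastLine(String path, Charset charset) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long end = file.length();
            // Ignore one trailing terminator (LF, CR, or CRLF), if present.
            if (end > 0 && byteAt(file, end - 1) == '\n') end--;
            if (end > 0 && byteAt(file, end - 1) == '\r') end--;
            // Walk backwards until the byte before 'start' is a terminator
            // or the beginning of the file is reached.
            long start = end;
            while (start > 0) {
                int b = byteAt(file, start - 1);
                if (b == '\n' || b == '\r') break;
                start--;
            }
            // Assumes the last line is shorter than 2 GB.
            byte[] line = new byte[(int) (end - start)];
            file.seek(start);
            file.readFully(line);
            return new String(line, charset);
        }
    }

    private static int byteAt(RandomAccessFile file, long pos) throws IOException {
        file.seek(pos);
        return file.read();
    }
}
```

Seeking one byte at a time keeps the sketch short but is slow; reading a block from the end of the file and scanning it in memory would be the obvious refinement.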
> On 24-02-2011 15:49, Tom Anderson wrote:
>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>>
>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>> outdated "standard".
>>>>>
>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>
>>>> I thought so, but Ken seemed to need an explanation.
>>>
>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>> understood the explanation, nor do I think he will understand further
>>> clarification. I think it more likely that the harder anyone tries to
>>> explain to him these points, the more dug in his heels will be.
>>>
>>> To do otherwise would necessarily require an admission that there's no
>>> single "text file" format, and that even if there were, ASCII or any
>>> of the single-byte derivatives thereof ain't it. I don't see any way
>>> such an admission would ever be produced.
>>
>> There is a single text file format: lines of characters in some
>> encoding, terminated by an end-of-line sequence which is distinguishable
>> from any other characters.
>>
>> It's merely the case that some current mainframes, and some obscure or
>> historical systems, do not store text in text files!
>
> No.
Yes.
> There are also count prefix (and sometimes suffix) formats.
Which, although they may be used to store text, are not text files.
> They have the advantage of being able to actually have
> all possible values in lines.
True. I wish we used more formats like this.
tom
--
If it ain't Alberta, it ain't beef.
> On 24-02-2011 17:11, Michael Wojcik wrote:
>> Ken Wesson wrote:
>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>> have been prominent 30 years ago, but not now.
>>
>> Tell that to the many thousands of organizations that still use them.
>>
>> And the majority of business transactions still runs on IBM mainframe
>> and midrange systems, and similar offerings from other companies.
>>
>> IBM had just shy of $100B in sales last year. A good chunk of that was
>> from mainframes: mainframe sales were up 68% from 2009, to the best
>> level in six years.
>
> The biggest chunk of IBM's revenue is services.
>
> But they still sell a lot of big iron.
Do they actually sell them? What happened to the leasing model?
Good question.
I don't know if they sell or lease them out.
IBM delivers boxes to customers and gets a ton of money
in return.
Arne
Of course they are text files.
If I edit Foobar.java in a text editor, write a Java program
and save it, then why should it be any less of a text file just because
the record format used on that system is not delimited?
Arne
He did and he was.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
Given that:
data + LF
data + CR + LF
are also record formats, that is nonsense.
>> They know that there are other character sets and other record formats.
>
> Other character sets mostly intersect in ASCII. Nearly all in any kind of
> widespread use intersect in using characters 10 and 13 as the potential-
> line-end characters. And "other record formats" are not relevant in a
> discussion of text files, as has been explained already.
As has been proven not to be the case.
Arne
This is an IT group.
Not a group for hairdressers or chefs.
This means that we use exact terms.
ASCII is a very well defined standard specified by ANSI and ISO.
There is no such thing as ASCII++.
>>> Mac text
>>> files are flat ASCII files (with CR line ends). Unix text files are
>>> flat ASCII files (with LF line ends).
>>
>> No.
>>
>> They are ISO-8859-1 or UTF-8.
>
> Which are ASCII++, for all intents and purposes.
>
>>> And that exhausts 99.99% of the
>>> operating system market share right there, if not more,
>>
>> No.
>>
>> z/OS, i, OpenVMS, MPE has a lot more market share than 0.01%.
>
> Nonsense. There are *at least* ten thousand PCs running Windows for every
> one machine running one of those operating systems.
>
> Ten thousand *PCs running Windows*.
The PC/mainframe ratio is probably like 100000:1.
But the relevance is not that big, because mainframes happen
to be a lot more expensive than PCs.
>>> I can't remember the last time I had to interoperate with any machine
>>> that had anything other than standard ASCII as the native format for
>>> text files. It's gotta be decades.
>>
>> Possible that you only work with 20+ year old Unix and OpenVMS systems
>> with 7 bit VT100 access.
>>
>> But that is not very common.
>
> I work with what nearly everyone in the field works with these days: a
> mix of Unix, MacOS, and Windows, mainly Unix server blades whose services
> are accessed by mainly Windows desktop/netbook users with a smattering of
> Mac users and a small but growing contingent of smartphone users.
Then you won't have any users using ASCII.
Arne
> If by "very common" you mean used on one in ten thousand or fewer of
> their computers. For every single z/OS machine in corporate America
> there are probably a thousand blade servers and ten thousand office PCs
> and employer-provided laptops and God alone knows how many employee
> smartphones with plans and/or handsets paid for by their company.
>
By that standard PCs, in which let's include desktops and laptops, are
also a tiny proportion of all computers once you count phones and
all the embedded computers in vehicles.
IMO it's a silly argument because very many PCs are used for only a small
part of the day and do very little apart from using electricity and
occasionally receiving and sending a few e-mails. A better measure is the
number of transactions and documents handled by each machine per year.
I have news for you - the number of business entities in those
3 sectors is a lot higher than 3.
We already understand that you have no knowledge about businesses.
But I assume that you have seen a world map. Are you not aware
that other countries have public sectors?
> See other posts. Perhaps a collected few tens of thousands of computers
> using museum-worthy OSes like those versus a collected *billion* or more
> of machines running Windows, MacOS, iOS, Android, and Unix.
There are also more flies than humans on earth.
That does not make flies more important.
>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>> have been prominent 30 years ago, but not now.
>>
>> Both z/OS and i are widely used today.
>
> If by "widely used" you mean on one in ten thousand or fewer computers.
But a lot more in revenue.
>>> Fine, then -- corporate America and home computers in America then.
>>
>> OK - neither z/OS or i are common on home computers.
>>
>> But they are very common in corporate America.
>
> If by "very common" you mean used on one in ten thousand or fewer of
> their computers. For every single z/OS machine in corporate America there
> are probably a thousand blade servers and ten thousand office PCs and
> employer-provided laptops and God alone knows how many employee
> smartphones with plans and/or handsets paid for by their company.
And?
If a company buys a mainframe for 20 M$ and 10000 PCs for 10 M$,
then by spending it is 2/3 mainframe.
>> If all z/OS systems disappeared over night then everything would break
>> down, because so many critical systems are running on them.
>
> A somewhat scary thought, but hardly relevant unless you're trying to
> stir up enough public alarm to foment a general movement to replace these
> legacy systems with more modern ones.
It is relevant because the point is that most of the world's
important data is processed by mainframes.
Some claim 80% of all financial data is stored on mainframes.
Sure they can be replaced. 10-20 years and 10-20 trillion dollars.
Arne
Yes.
>> Linux will be either ISO-8859-1 or UTF-8 not ASCII.
>
> Both contain ASCII as a subset -- if you take a pure-ASCII file and
> reencode it in either the result is the identical byte sequence.
Yes, but that does not change that they do not use ASCII. They
use ISO-8859-1 or UTF-8.
Arne
Bad argument: a text file contains records. They are variable length
records with a 'newline' encoding as the delimiter.
BTW, you can use C to handle iSeries text files through the usual gets()
and puts() functions despite the iSeries holding text in what are
effectively database rows. They have three fields per row - a line
number, a fixed length text field and an 8 byte ID. The latter is
equivalent to the way the last few columns of punched cards were often
used. I don't know why an OS/400 text file would need an ID field, but
it's there. The reason that C's standard text handling works on these
files is down to the standard library, which is written to inter-convert
between C's internal null delimited string representations of lines and
the external fixed field representation.
Getting back on topic, I haven't used Java on an OS/400 but it's available
and will almost certainly work the same way and, in addition, will
probably manage the mapping between EBCDIC and Unicode. It has to or it
would break WORA.
True.
But Java does run on some of these platforms.
>> The RandomAccessFile and search for CR and LF does not.
>
> It probably runs on all platforms Java is normally used on. It certainly
> runs on 99.99% or more of the machines anyone is likely to run Java on,
If you are counting machines: yes.
If you are counting dollars: no.
> AND the remaining less than .01% are ones sufficiently oddball that their
> operators will *know* to expect common crossplatform software to often
> break on them. Typical C code using I/O will probably not work on such
> machines without heavy modification, even C code that compiles and works
> fine on every POSIX-compliant system and every Windows box and most other
> machines.
C code, just like Java code, works if the code has well-defined
behavior according to the standard.
But this functionality is not guaranteed to work in C either.
fgetpos and fsetpos do not work on offsets but on an opaque type
that can contain more than an offset.
fseek and ftell work on offsets for binary files, but for text
files the position value is opaque.
POSIX/SUS then adds lseek, which will either work with
offsets or return an error.
> Hell, these machines may not even be able to represent C source
> trees normally, requiring the compiler vendor to jump through hoops and
> requiring unusual tools and IDEs be used to hack C sources and not just
> the system text editor.
Text editors are by definition able to create text files, and source
code consists of text files.
Try to think logically.
> Hell, I wouldn't be surprised if there were no
> working C implementations on some of these systems
They do have C.
> -- and I'd be
> surprised if many, if any, of them ran Java at all, let alone had a fully
> compliant JavaSE 5/6 implementation.
I am not surprised that you would be surprised - you don't seem to know
much about systems.
z/OS, i and OpenVMS all have certified Java versions.
>> Whether it works on platforms that interest you are completely
>> irrelevant.
>
> On the contrary, whether software works on platforms that interest its
> developer and user base is 100% relevant and whether it works on
> platforms that *don't* interest its developer and user base is irrelevant.
No.
Not if the discussion is about general usage.
And it is bad Java programming to write code that only works
on some Java platforms, even if the expectation is that the
program will only be used on platforms where it does work.
>>> Since those days, the world has
>>> standardized on ASCII flat files for text files.
>>
>> Not really.
>>
>> Windows uses CP-1252, UTF-8 and UTF-16 Unix/Linux/VMS uses ISO-8859-1
>> and UTF-8
>
> All ASCII supersets. Which means the common denominator among all those
> is ... ta-da! ASCII. :)
That does not make them use ASCII.
>> IBM mainframe uses EBCDIC
>
> And hardly anyone uses IBM mainframe (sic). What was that figure again?
> 0.01% of all computers?
I think the number was 80% of financial data.
:-)
>> There are really very few systems today that uses just ASCII.
>
> But many that use ASCII.
Very few.
Most support ASCII because they use something that
is compatible with ASCII.
Arne
And with what do you support your claim for this definition of "text
file"?
I hope it's something more solid than KW's flailing appeals to
"notion" and the like, which are unsupported by contemporary or
historical uses of the term "text", in the computing disciplines or
more broadly. Have you something better to offer?
--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
Embedded computers have a huge majority over all general-purpose
computers, by orders of magnitude, if we're counting CPUs. The line
between "smartphones" and other mobile phones is fuzzy; but among
computing devices that support at least some general-purpose
applications (as opposed to dedicated controllers), phones are far and
away in the majority, by number of CPUs.
In other words, wrong again, Ken.
No, you wouldn't.
> nor
> is polluting a newsgroup with off-topic traffic.
Unlike polluting a newsgroup with ignorance and dull repetition, eh?
> Your personal opinions of others are not the topic of this newsgroup.
Actually, they are. Check the charter.
> Do you have anything Java-related to say?
Yes.
>>>> On the IBM i machines (formerly i Series, formerly System i, formerly
>>>> AS/400, successor to the System/3x), blah blah blah
>>> You're one to talk about provincialism. Who the hell uses these ancient
>>> museum pieces any more?
>> Thousands of organizations, which is why they still enjoy healthy sales.
>
> Ah, must be vendor lockin. Sucks to be them. Soon they'll be outcompeted
> by newer, nimbler firms that use modern things like the free Unixes on
> commodity hardware.
Yes, soon, no doubt. O glorious day, when we are ushered into the Age
of Wessonism! Free unicorns for all!
> Of course, they might last a while if they can keep
> convincing the government to give them "bailouts" or other protectionist
> help in the face of competitors and their own screwups.
Careful, Ken - you'll short out your keyboard with all that spittle.
> Still, your "thousands" of organizations are outweighed by the *hundreds*
> of thousands that don't use such systems
No, they aren't. But do let us know when your cognitive abilities pass
beyond counting.
(On second thought - don't.)
>> And the majority of business transactions
>
> have no bearing on this discussion, which has to do with the majority of
> *computers*
No, it doesn't. You don't have the power to determine what the
discussion is about; it's about whatever the participants - all the
participants - decide to discuss.
I'm pleased to see that my prediction of your failure to learn was
right on the money.
It is still a different char set.
Arne
They are not using ASCII.
They are using something that is backwards compatible
with ASCII.
Arne
That makes it not ASCII.
A Java 1.6 app is not a Java 1.2.2 app just because
some of the functionality was present in Java 1.2.2
as well.
>>> Technically it's not your granddaddy's ASCII with that
>>> in use, but it's close enough for government work, and certainly close
>>> enough not to mess with using tests for CR/LF to detect line
>>> boundaries.
>>
>> The character set and the record format are independent of each other.
>
> Record formats are not relevant here, since text files do not have record
> formats;
Lines are a record format.
> they are raw sequences in some character set more or less by
> definition. Anything with additional structure over and above that is
> something other than a text file. Generically we call such things "binary
> files" though commonly binary files do *contain* text. But all contain
> additional structure that cannot be represented in, say, a
> java.lang.String without resort to some form of escaping or encoding. And
> that makes them not pure text, but text-and-some-other-stuff or some-
> other-stuff-that-happens-to-contain-text.
Not true.
Which you can easily verify by having a Java program read
such a file: those lines are read fine into a String.
Arne
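As a rough illustration of the "read lines into a String" point, the sketch below uses BufferedReader.readLine, which is documented to treat LF, CR, and CRLF as line terminators and to return each line without its terminator. The class and method names are hypothetical, and a StringReader stands in for a real file. It says nothing about how a JVM on z/OS or i maps native record formats onto a stream; that part is not demonstrated here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesDemo {
    /** Splits text into lines the way BufferedReader.readLine sees them. */
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line); // terminator already stripped by readLine
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Mixed Unix, old Mac, and Windows line ends in one input.
        System.out.println(lines("a\nb\rc\r\nd")); // [a, b, c, d]
    }
}
```

Code that only ever consumes lines through readLine never sees the delimiter at all, which is why it keeps working regardless of which of the three conventions the file uses.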
Occasionally backwards compatibility is a design goal.
If you knew about programming, then you would have
seen that before.
Arne
That just means that it uses something ASCII compatible - not that
it uses ASCII.
And you can easily verify that it indeed supports characters
not part of ASCII.
>> DOS used CP-437, CP-850 etc..
>>
>> 32/64 bit Windows uses CP-1252 (which is practically the same as
>> ISO-8859-1) and some UTF-16.
>
> All of those seem to be ASCII plus another up to 128 characters, or in
> the case of UTF-16, another up to 65408 characters.
>
> Saying that a 7-bit-clean file interpreted in one of those is not ASCII
> is like saying that humans are not mammals.
And?
No one is saying that such a file is not ASCII.
We are saying that the system is not reading ASCII. It reads
a character set that is backwards compatible with ASCII.
Arne
PS: UTF-16 is *not* ASCII compatible.
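One way to see why this matters for the search-for-byte-10 trick is the small sketch below (assuming Java 7 or later; the class and method names are made up for the example). It encodes a single character that is not a line break and counts how many bytes equal to 0x0A come out.

```java
import java.nio.charset.StandardCharsets;

public class Utf16NewlineDemo {
    /** Counts the bytes equal to 0x0A (the LF byte value) in the array. */
    static int countLfBytes(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if (b == 0x0A) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+010A, LATIN CAPITAL LETTER C WITH DOT ABOVE: a letter, not a line break.
        String text = "\u010A";
        // UTF-16BE encodes it as 0x01 0x0A, so a naive byte scan sees an "LF".
        System.out.println(countLfBytes(text.getBytes(StandardCharsets.UTF_16BE))); // 1
        // UTF-8 encodes it as 0xC4 0x8A, with no byte below 0x80.
        System.out.println(countLfBytes(text.getBytes(StandardCharsets.UTF_8)));    // 0
    }
}
```

So a byte-wise backward scan is only safe in encodings that reserve byte 10 for LF; for UTF-16 the scan would have to work on aligned 16-bit units instead.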
Well the topic was market share.
Market share is counted in dollars.
And since somebody is willing to pay a lot more for a mainframe
running an entire bank than for somebody to be able to read email,
then counting computers does not really reflect market share.
Arne
Not really - the high number of end users means that the company
is willing to pay a lot of money for those systems, which impacts
the market share.
Arne
A text file is something you read and write as lines of text.
Whether the system uses LF delimiters or CR LF delimiters or
a counted approach does not matter.
>> They have the advantage of being able to actually have all possible
>> values in lines.
>
> That's nonsense. The only character a normal text file cannot have in
> lines is a line break, and in actual fact you cannot have a line break in
> the middle of a line *by definition*. Wherever there is a line break one
> line ENDS and another one BEGINS, *by definition*. If that weren't the
> case then it wouldn't be a line break!
No.
Newline is code 10 in many char sets.
It is perfectly valid as content in the middle of a line on
older MacOS systems (because they use another line delimiter)
and on all systems using count prefixes (no line delimiter at
all).
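To make the count-prefix idea concrete, here is a minimal sketch of such a format. The layout (a 4-byte big-endian length followed by the UTF-8 bytes of the line) is invented for illustration and is not the on-disk format of any real operating system; the class and method names are hypothetical too.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CountPrefixedLines {
    /** Writes each line as a 4-byte length followed by its bytes; no delimiter is used. */
    static byte[] write(List<String> lines) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (String line : lines) {
                byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        }
        return buffer.toByteArray();
    }

    /** Reads the records back; line boundaries come from the counts alone. */
    static List<String> read(byte[] data) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                lines.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return lines;
    }
}
```

A record such as "one\ntwo" round-trips as a single line, because code 10 is just content here. It also shows why scanning backwards for byte 10 cannot find line boundaries in this kind of file.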
> So there is no "advantage" here. What you are actually describing is a
> "list-of-strings" file, not a text file
A text file is a list of strings.
> Face it: those record-oriented file formats are not text files. They have
> additional structure that cannot be represented natively in a String,
Neither can delimited files
> therefore represent more than just a String, such as a collection of
> Strings, and therefore are not text files but something else -- archives
> of multiple text files bundled into single files.
>
> The main use for such a thing over plain ordinary text files that I can
> think of is storing a mailbox without resorting to hacks that behave
> oddly when lines in the bodies start with the word "from". And these days
> filesystems work fine with large directories full of tiny files, so
> there's less need for that sort of thing than there once was.
>
>> And the disadvantage that various hacks assuming all records use
>> delimiters do not work.
>
> Nobody is assuming records use delimiters. They are assuming text files
> are text files. The lines in text files use delimiters as an inherent
> property.
No.
That is an illusion that you seem to have.
> If you have a text in a String, seeking backward from the end
> until a newline character (or the beginning of the String, whichever you
> hit first) will reliably find the start of the last line in the String.
No.
It will not work on systems that use CR as the line delimiter or on
systems using count-prefixed lines.
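A sketch of the CR failure, with made-up names. The first method is the "seek back to a newline" idea taken literally. The second accepts LF, CR, or CR LF, which assumes that neither character ever appears as content inside a line. That assumption does not hold in general, and nothing along these lines can help with count-prefixed data, where the bytes contain no delimiter at all.

```java
// Hypothetical helper for illustration only.
final class LastLine {
    // Naive version: looks only for LF (code 10).
    static String lastLineLfOnly(String text) {
        return text.substring(text.lastIndexOf('\n') + 1);
    }

    // Accepts LF, CR, or CR LF as terminators (assumption: neither occurs as content).
    static String lastLineAnyTerminator(String text) {
        int end = text.length();
        // Drop one trailing terminator, if present.
        if (end > 0 && text.charAt(end - 1) == '\n') {
            end--;
            if (end > 0 && text.charAt(end - 1) == '\r') {
                end--; // the terminator was CR LF
            }
        } else if (end > 0 && text.charAt(end - 1) == '\r') {
            end--;
        }
        // Walk backwards to the previous terminator or the start of the text.
        int start = end;
        while (start > 0) {
            char c = text.charAt(start - 1);
            if (c == '\n' || c == '\r') {
                break;
            }
            start--;
        }
        return text.substring(start, end);
    }
}
```

On CR-delimited text such as "one\rtwo\rthree", the LF-only version finds no newline and hands back the entire text, while the second version returns "three".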
> The same is true of any disk file format that faithfully represents the
> String as a flat string of text rather, and in particular of the formats
> commonly used to store, e.g., C source files.
Wrong.
C source files are stored using a count-prefixed line format on systems
that use such.
Arne
Somebody with the name of Ken Wesson wrote:
# Since those days, the world has standardized on ASCII flat files for
# text files.
# Windows text files are flat ASCII files (with CRLF line ends). Mac text
# files are flat ASCII files (with CR line ends). Unix text files are flat
# ASCII files (with LF line ends).
Arne
Alleged by my Usenet provider.
I was trying to extract the wisdom in your postings. Give me some credit
here. That quote is most certainly what you (pertinently) wrote, minus
the fluff.
And please, I beg of you sincerely and benevolently, stop acting like
such a loonie.
--
DF.
Dear Mr. Vajhøj,
We'll see you in court.
Yours frivolously,
D. Futtorovic
Chief Representative of the Local (Pubic) Hairdresser's Union
"Your grand-daddy's ASCII" is exactly today's ASCII. Ergo, "It's not
your grand-daddy's ASCII" is exactly "It's not ASCII".
--
Lew
Indeed. And let's not forget where a lot of Eclipse funding comes from.
--
Jim Janney