
ReadLine vs. ReadAll


Dave "Crash" Dummy

Apr 19, 2012, 9:56:49 AM
Many posts deal with editing text files. There are generally two ways to
do this: read/write one line at a time (Method 1, below) or read/write
the content all at once (Method 2, below). I generally use Method 1. I
only use Method 2 when I want to write the modified content back to the
original file, or I need random access to lines. My feeling is the fewer
operations, the better.

I have heard it argued that reading the whole file into memory in a
single read operation is faster than multiple read operations, but that
is a myth. Unless the file is really huge, the whole thing will be read
into memory as soon as it is opened for reading with the OpenTextFile
operation. After that, content is read from memory. The VBScript
interpreter has put the lines in an array for you, so why repeat the
process? Write operations are also buffered, so there is no penalty for
writing one line at a time. No actual disk writing occurs until the file
is closed.

That's my story and I'm sticking to it!
I'd like to hear some discussion, pro or con.

'==================================
Set fso=CreateObject("Scripting.FileSystemObject")
Set objFile1=fso.OpenTextFile("file1.txt")
Set objFile2=fso.CreateTextFile("file2.txt")

'--------- Method 1:
do until objFile1.AtEndOfStream
line=objFile1.ReadLine
line=LineEdit(line)
objFile2.WriteLine line
loop

'--------- Method 2:
txt=objFile1.ReadAll
lines=split(txt,vbCRLF)
for x=0 to ubound(lines)
lines(x)=LineEdit(lines(x))
next
txt=join(lines,vbCRLF) ' Join defaults to a space delimiter, so pass vbCRLF
objFile2.Write txt
'---------------------------

Function LineEdit(strng)
' LineEdit=edited string
End Function

--
Crash

"Something there is that doesn't love a wall, that wants it down."
~ Robert Frost ~

Mayayana

Apr 19, 2012, 10:35:14 AM
I thought about this with your first character removal
script, but didn't say anything. I prefer ReadAll.
I like the idea of closing files as soon as possible. (Your
two samples don't include the Textstream.Close line, so
the issue is not so obvious in the way you've presented
the two options.)

Also, I don't know where you got your info about buffering,
but I don't think it's right. (I certainly *hope* that scrrun.dll
is not so poorly designed that it immediately loads all opened
files into a string array of lines! ...Though I guess I wouldn't
put that past the original scrrun.dll writers. It's a surprisingly
inefficient library.)

I wrote the following simple script:

'-----------------------------------------------
Dim FSO, TS
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TS = FSO.CreateTextFile("C:\Windows\desktop\test2.txt", True)

For i = 0 to 9
WScript.sleep 1000
TS.WriteLine Now
Next

TS.Close
Set TS = Nothing
Set FSO = Nothing
'-----------------------------------------

I then ran it with Filemon running and got lines like this:

10:16:27 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 115 Length: 21
10:16:27 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 136 Length: 2
10:16:28 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 138 Length: 21
10:16:28 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 159 Length: 2
10:16:29 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 161 Length: 21
10:16:29 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 182 Length: 2
10:16:30 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 184 Length: 21
10:16:30 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 205 Length: 2
10:16:31 AM WScript.exe:1044 WRITE C:\Windows\desktop\test2.txt SUCCESS
Offset: 207 Length: 21

As you can see, the file is held open while wscript writes
a few bytes to disk each second. (Oddly, it seems to do
a separate write just for the 2-byte CR/LF line ending.)

Close does not mean "I'm done with the file now, you
can save it to disk." Close really does mean "close the file
and release my lock on it." You can write at any time while
the file is held open. As far as I know, the behavior
you're talking about is only used with INI file operations
via API.

So, it does make more sense to read out the file, close
it, then work on the string. The bigger the file and the more
involved the operations, the more efficient that will be. (It
also reduces the risk of a file locked until next reboot if your
script crashes before calling Close.)
The one exception to that, for me, is when I'm dealing with
binary files, which require special handling of null characters.
With a binary file ReadAll will stop at the first Chr(0). The
way around that is to first get the file size and then use
Read(filesize).
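
A minimal sketch of that Read(filesize) workaround. The path is a placeholder, and this is only an illustration of the idea, not a tested binary reader; how the bytes map to characters still depends on the system code page.

```vbscript
' Sketch: read a whole file with Read(size) instead of ReadAll.
Const ForReading = 1
Dim FSO, TS, sPath, nSize, sData

sPath = "C:\Windows\desktop\test.bin"   ' placeholder path, not from the thread
Set FSO = CreateObject("Scripting.FileSystemObject")

nSize = FSO.GetFile(sPath).Size          ' file size in bytes
sData = ""
If nSize > 0 Then
  Set TS = FSO.OpenTextFile(sPath, ForReading)
  sData = TS.Read(nSize)                 ' ask for exactly nSize characters
  TS.Close                               ' release the file right away
  Set TS = Nothing
End If
Set FSO = Nothing

WScript.Echo Len(sData) & " characters read"
```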

I also disagree with you about using Mid instead of
Right, by the way, but I don't have any technical reason
for that. :) It's just my preference. I find it confusing to
think of Mid as a substitute for Right. I like to keep them
separate for clarity and only use Mid with all 3 parameters,
when I need a string from within a string. But, as I said,
that's just my personal style of coding. Mid is documented
to work just as you use it.




-----------------------------------
--
--
"Dave "Crash" Dummy" <inv...@invalid.invalid> wrote in message
news:jmp5j1$voq$1...@dont-email.me...

Dave "Crash" Dummy

Apr 19, 2012, 1:11:46 PM
Perhaps I should have been more explicit. The material is written to the
disk when the script is closed or there is some other break in the
action. Your test script "cheats" because it uses the "sleep" method.
Try it without that operation.

I didn't close either text stream explicitly because they are closed and
released when the script closes, which is immediately after execution. I
also wanted to keep the example simple.

I'm a lousy typist. I would much rather type
"mid(strng,n)" than
"right(strng,len(strng)+1-n)"
also, what happens when string length less than n is encountered?
--
Crash

Today is the first day of the rest of your life,
and there's not a damned thing you can do about it.

Mayayana

Apr 19, 2012, 3:35:28 PM
| Perhaps I should have been more explicit. The material is written to the
| disk when the script is closed or there is some other break in the
| action. Your test script "cheats" because it uses the "sleep" method.
| Try it without that operation.
|
That still isn't right. If I remove WScript.Sleep I get
all the same writes. It just does them all at the same
time. In other words, every time I call WriteLine it writes
to disk, whether that's 10 times in 100 ms or once every
second. (And it does them all before the file is closed.)

Where did you get the information you're basing this
on? The idea of writing to a buffer doesn't make sense.
The idea that scrrun puts the entire file into a vbCrLf-
delimited array when the file is opened is even more crazy.
Scrrun doesn't know that you plan to read the file by line!
And what if you have a Unix file that's vbLf-delimited?
But aside from all that, why would it be so incredibly
wasteful? If you don't believe me, try this:

Dim FSO, TS
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TS = FSO.OpenTextFile("C:\Windows\desktop\test2.txt", 1)
TS.Close
Set TS = Nothing
Set FSO = Nothing

This is the result in Filemon:

3:02:36 PM wscript.exe:908 OPEN C:\Windows\desktop\test2.txt SUCCESS
Options: Open Access: All
3:02:36 PM wscript.exe:908 CLOSE C:\Windows\desktop\test2.txt SUCCESS

The file was opened and then closed. To say that opening
includes a full read doesn't make sense. (If I skip TS.Close
I still get the same result.)

You don't have to take my word for all this. You can
run the tests yourself. I just can't imagine how you arrived
at your conclusions in the first place. Maybe "The
Scripting Guys"? I can imagine "The Scripting Guys" could
have said something that could have been misconstrued,
but I can't think where else you might have seen that info.

(The Scripting Guys should be ignored in general, from my
point of view. From what I've seen they tend to assume
stupidity on the part of their audience, then they say
what they think the audience should hear, rather than
stating the facts clearly... I don't know why it is that
most official Microsoft bloggers seem to be smug and
condescending, but in my experience they are. There's
a tendency to tell the audience only what they think the
audience should hear -- which may be partial truth or
even falsehood.

One of my favorite examples is the WinSxS folder on
Vista/7. According to MS President Sinofsky there are
almost no actual files in WinSxS:

"nearly every file in the WinSxS directory is a "hard link" to the physical
files elsewhere on the system"
http://blogs.msdn.com/b/e7/archive/2008/11/19/disk-space.aspx

According to the "Windows Server Core Team", everything
is in WinSxS and nowhere else:

"All of the components in the operating system are found in the WinSxS
folder - in fact we call this location the component store."
http://blogs.technet.com/b/askcore/archive/2008/09/17/what-is-the-winsxs-directory-in-windows-2008-and-windows-vista-and-why-is-it-so-large.aspx

Who's right? Who knows! I just wanted to know how to
clean out unnecessary WinSxS files safely. Silly me, I tried
to get that information from Microsoft, and they don't think
we should worry our pretty little heads poking around in
WinSxS. :)

It reminds me of a friend I used to have who was an MD.
One of his favorite sayings was, "a little knowledge is a
dangerous thing". His point was that if "laypeople" tried
to understand health issues they'd only get themselves into
trouble. ...Oddly enough, he once asked me for advice about
vitamins. It turned out that he didn't think vitamins was
a topic relevant to an MD's job!)

Fortunately, with issues like Textstream ops, we don't have to
trust the "experts". We can all easily test what's true and
what isn't.

| I didn't close either text stream explicitly because they are closed and
| released when the script closes, which is immediately after execution.

I wouldn't depend on that. At best it's a bad habit to
depend on WScript to clean up one's mess. But also,
in most actual cases, at least in my scripts, it doesn't
make sense to leave files open. It's sloppy. It uses more
memory. It's hard to keep track. There are cases where I
might open 100 large files, one at a time. Is WScript
closing each one before I open the next? Maybe. Either
way it's a problem. If WScript is closing the last file when
I reuse the TS variable then it's second guessing me when
it should raise an error. If it's not closing the file then it's either
going to cause an error or cause 100 files to be stuck in
memory.

Alternatively, I might be opening 10 files and writing back
to one or more. To not close after a read or write would be
asking for trouble. Try this:

Dim FSO, TS
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TS = FSO.OpenTextFile("C:\Windows\desktop\test2.txt", 1)
s = TS.ReadLine
'-- Do lots of stuff here.
TS.WriteLine "okay"

It's another example of the limitations of your
initial example. Writing after CreateTextFile is
a simple operation, but if you open an existing file
you have to specify the mode. The above causes
an error with WriteLine because the file was opened
for reading.

If I'm doing anything more involved than one read
and write I usually write short subs for ReadFile and
WriteFile. The Textstream can then be created and
released within the sub and I can safely deal with
file content, not needing to keep track of what's open
in what mode.
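
A rough sketch of what such a pair of helpers might look like. The names ReadFile and WriteFile come from the paragraph above, but the bodies, parameter names, and file names here are assumptions for illustration, not the poster's actual code.

```vbscript
Const ForReading = 1

' Returns the whole file as one string; the stream is closed before returning.
Function ReadFile(sPath)
  Dim FSO, TS
  Set FSO = CreateObject("Scripting.FileSystemObject")
  Set TS = FSO.OpenTextFile(sPath, ForReading)
  If TS.AtEndOfStream Then
    ReadFile = ""            ' ReadAll raises an error on an empty file
  Else
    ReadFile = TS.ReadAll
  End If
  TS.Close
  Set TS = Nothing
  Set FSO = Nothing
End Function

' Creates or overwrites the file with sText and closes it immediately.
Sub WriteFile(sPath, sText)
  Dim FSO, TS
  Set FSO = CreateObject("Scripting.FileSystemObject")
  Set TS = FSO.CreateTextFile(sPath, True)   ' True = overwrite if it exists
  TS.Write sText
  TS.Close
  Set TS = Nothing
  Set FSO = Nothing
End Sub

' Usage: no Textstream is left open while the text is being edited.
Dim s
s = ReadFile("file1.txt")                    ' hypothetical file names
s = Replace(s, "old", "new")
WriteFile "file2.txt", s
```

Because each Textstream lives only inside its own procedure, the calling code never has to remember which file is open or in which mode.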

| I'm a lousy typist. I would much rather type
| "mid(strng,n)" than
| "right(strng,len(strng)+1-n)"
| also, what happens when string length less than n is encountered?

It causes an error. I don't have a problem with that.
Errors are informative and can be trapped.
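
A small sketch of the difference, using a made-up three-character string. The Right form raises a trappable error once n is more than one past the string length, while Mid(strng,n) simply returns an empty string.

```vbscript
Dim strng, n, r
strng = "abc"     ' hypothetical sample value
n = 5             ' start position beyond the end of the string

' Mid with a start past the end returns an empty string, no error.
WScript.Echo "Mid:   [" & Mid(strng, n) & "]"

' Right gets a negative length here (3 + 1 - 5 = -1), which is an error.
On Error Resume Next
r = Right(strng, Len(strng) + 1 - n)
If Err.Number <> 0 Then
  WScript.Echo "Right: error " & Err.Number & " - " & Err.Description
  Err.Clear
End If
On Error GoTo 0
```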

But as I said, I'm not arguing with your method. It's
perfectly "legal". I just don't like it in my own code
because I find it ambiguous. I think Mid should mean
Mid.... just my personal quirk.


Dave "Crash" Dummy

Apr 19, 2012, 3:55:32 PM
I concede that Method 2 is faster than Method 1. I wrote two test files
that included a timer and used them to edit a long log file (49111
lines). They are shown below. Method 2 is also more versatile when
needed, but I do not concede that the file handling is any safer. I will
probably continue to use Method 1 unless Method 2 is really necessary
for a particular task.

'================= Method1.vbs =================
time1=timer

Set fso=CreateObject("Scripting.FileSystemObject")
Set objFile1=fso.OpenTextFile("easeus.log")
Set objFile2=fso.CreateTextFile("file1.txt")
Set fso=nothing

do until objFile1.AtEndOfStream
line=objFile1.ReadLine
objFile2.WriteLine mid(line,2)
loop

objFile1.close
objFile2.close
Set objFile1=nothing
Set objFile2=nothing

timex=round(timer-time1,3)
msgbox timex

'================= Method2.vbs =================
time1=timer

Set fso=CreateObject("Scripting.FileSystemObject")

Set objFile1=fso.OpenTextFile("easeus.log")
txt=objFile1.ReadAll
objFile1.close
Set objFile1=nothing

lines=split(txt,vbCRLF)
for x=0 to ubound(lines)
lines(x)=mid(lines(x),2)
next
txt=join(lines,vbCRLF)

Set objFile2=fso.CreateTextFile("file2.txt")
Set fso=nothing
objFile2.Write txt
objFile2.close
Set objFile2=nothing

timex=round(timer-time1,3)
msgbox timex
'=====================================

GS

Apr 19, 2012, 6:51:57 PM
Dave,
FWIW.., I do lots of text file processing and I must say that I'm with
Mayayana on every point.

One thing I'll add is that file I/O processing should be done via
dedicated procedures that are passed a filename and a variable that
holds the text. This leaves you free to manage the code that processes
the text without the added overhead of I/O code that's otherwise
reusable.

The read procedure needs to be a function if you don't pass it a
variable to hold the text.

The write procedure should include an option to append to the target
file when necessary.

Both procedures should read or write in a single line of code *and*
close the file immediately.

An additional option could be whether the ending crlf is omitted or
not.
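
One possible shape for the write procedure described above, with the append and ending-crlf options. The procedure name, parameter names, and file name are hypothetical; this is a sketch of the idea, not code from the thread.

```vbscript
Const ForWriting = 2, ForAppending = 8

' Writes sText to sPath in one call and closes the file immediately.
'   bAppend      - True to add to the end of the file, False to overwrite
'   bEndWithCrLf - True to finish with a line break, False to omit it
Sub WriteTextToFile(sPath, sText, bAppend, bEndWithCrLf)
  Dim FSO, TS, nMode
  If bAppend Then nMode = ForAppending Else nMode = ForWriting

  Set FSO = CreateObject("Scripting.FileSystemObject")
  Set TS = FSO.OpenTextFile(sPath, nMode, True)   ' True = create if missing
  If bEndWithCrLf Then
    TS.WriteLine sText       ' text followed by a line break
  Else
    TS.Write sText           ' text only, no trailing line break
  End If
  TS.Close
  Set TS = Nothing
  Set FSO = Nothing
End Sub

' Usage: start a fresh log, then append a second entry to it.
WriteTextToFile "test.log", "first entry", False, True
WriteTextToFile "test.log", "second entry", True, True
```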

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


Dave "Crash" Dummy

Apr 20, 2012, 8:38:48 AM
GS wrote:
> Dave,
> FWIW.., I do lots of text file processing and I must say that I'm with
> Mayayana on every point.
>
> One thing I'll add is that file I/O processing should be done via
> dedicated procedures that are passed a filename and a variable that
> holds the text. This leaves you free to manage the code that processes
> the text without the added overhead of I/O code that's otherwise reusable.
>
> The read procedure needs to be a function if you don't pass it a
> variable to hold the text.
>
> The write procedure should include an option to append to the target
> file when necessary.
>
> Both procedures should read or write in a single line of code *and*
> close the file immediately.
>
> An additional option could be whether the ending crlf is omitted or not.

Okay. No argument. I'm here to learn.

--
Crash

"When you want to fool the world, tell the truth."
~ Otto von Bismarck ~

Mayayana

Apr 20, 2012, 9:49:00 AM
| Okay. No argument. I'm here to learn.
|
That's a generous attitude. I never think of
introducing discussions, like you did, but often
new information ends up coming out of it. (Not
to mention that it helps keep the group alive.)


Dave "Crash" Dummy

Apr 20, 2012, 12:15:06 PM
I may be forced to change some bad habits. :-)

--
Crash

"I am not young enough to know everything."
~ Oscar Wilde ~

GS

Apr 20, 2012, 1:31:35 PM
Dave "Crash" Dummy used his keyboard to write :
> Mayayana wrote:
>> | Okay. No argument. I'm here to learn.
>> |
>> That's a generous attitude. I never think of
>> introducing discussions, like you did, but often
>> new information ends up coming out of it. (Not
>> to mention that it helps keep the group alive.)
>
> I may be forced to change some bad habits. :-)

And that is what validates using this NG..! Ditto for me!