
writelines puzzle


William R. Wing (Bill Wing)

Aug 22, 2012, 11:38:55 AM
to pytho...@python.org, William R. Wing (Bill Wing)
In the middle of a longer program that reads and plots data from a log file, I have added the following five lines (rtt_data is fully qualified file name):

wd = open(rtt_data, 'w')
stat = wd.write(str(i))
stat = wd.writelines(str(x_dates[:i]))
stat = wd.writelines(str(y_rtt[:i]))
wd.close()

The value of i is unknown before I have read through the input log file, but is typically in the neighborhood of 2500. x_dates is a list of time stamps from the date2num method, that is values of the form 734716.72445602, day number plus decimal fraction of a day. y_rtt is a list of three- or four-digit floating point numbers. The x_dates and y_rtt lists are complete and plot correctly using matplotlib. Reading and parsing the input log file and extracting the data I need is time consuming, so I decided to save the data for further analysis without the overhead of reading and parsing it every time.

Much to my surprise, when I looked at the output file, it only contained 160 characters. Catting produces:

StraylightPro:Logs wrw$ cat RTT_monitor.dat
2354[ 734716.72185185 734716.72233796 734716.72445602 ..., 734737.4440162
734737.45097222 734737.45766204][ 240. 28.5 73.3 ..., 28.4 27.4 26.4]

Clearly I'm missing something fundamental about using the writelines method, and I'm sure it will be a DUH moment for me, but I'd sure appreciate someone telling me how to get that data all written out. I certainly don't insist on writelines, but I would like the file to be human-readable.

Python 2.7.3
OS-X 10.8

Thanks,
Bill

Chris Kaynor

Aug 22, 2012, 12:48:41 PM
to pytho...@python.org
Reading your post, I do not see for sure what your actual issue is, so
I am taking my best guess: that the file does not contain as much data
as would be expected.

On Wed, Aug 22, 2012 at 8:38 AM, William R. Wing (Bill Wing)
<w...@mac.com> wrote:
> In the middle of a longer program that reads and plots data from a log file, I have added the following five lines (rtt_data is fully qualified file name):
>
> wd = open(rtt_data, 'w')

Here, you are opening the data for write ("w"), which will replace the
contents of the file each time the file is opened. I am guessing you
want append ("a").

> stat = wd.write(str(i))
> stat = wd.writelines(str(x_dates[:i]))
> stat = wd.writelines(str(y_rtt[:i]))
> wd.close()

Also, rather than opening the file, writing to it, then closing it
manually, you would be better off using the with statement (presuming
Python 2.5+), like so:

with open(rtt_data, 'w') as wd:
    wd.write(str(i))
    wd.writelines(str(x_dates[:i]))
    wd.writelines(str(y_rtt[:i]))

In this case, you can be absolutely certain that the file will be
closed at the end, even if one of the commands in the middle fails.
The way you had it written, if one of the write/writelines calls, or any
of the formatting, fails, the file would be left open for an
indeterminate amount of time (the wd variable may be kept around as
part of the exception, preventing it from being garbage collected and
thus closed). Not a big deal, but more important if you are doing more
work with the file open.
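
As a minimal sketch of the point above (the temporary path and the deliberately raised error are made up for the demonstration, not taken from the original program), the with statement closes the file even when something inside the block fails:

```python
import os
import tempfile

# Hypothetical output path, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "RTT_monitor.dat")

try:
    with open(path, "w") as wd:
        wd.write("2354\n")
        # Simulate a formatting failure partway through the writes.
        raise ValueError("formatting failed")
except ValueError:
    pass

# The with block closed the file on the way out, despite the exception.
print(wd.closed)

# Whatever was written before the failure has been flushed to disk.
with open(path) as rd:
    print(repr(rd.read()))
```

With the manual open/close version, the close() call would simply never be reached in this situation.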

>
> The value of i is unknown before I have read through the input log file, but is typically in the neighborhood of 2500. x_dates is a list of time stamps from the date2num method, that is values of the form 734716.72445602, day number plus decimal fraction of a day. y_rtt is a list of three- or four-digit floating point numbers. The x_dates and y_rtt lists are complete and plot correctly using matplotlib. Reading and parsing the input log file and extracting the data I need is time consuming, so I decided to save the data for further analysis without the overhead of reading and parsing it every time.
>
> Much to my surprise, when I looked at the output file, it only contained 160 characters. Catting produces:
>
> StraylightPro:Logs wrw$ cat RTT_monitor.dat
> 2354[ 734716.72185185 734716.72233796 734716.72445602 ..., 734737.4440162
> 734737.45097222 734737.45766204][ 240. 28.5 73.3 ..., 28.4 27.4 26.4]
>
> Clearly I'm missing something fundamental about using the writelines method, and I'm sure it will be a DUH moment for me, but I'd sure appreciate someone telling me how to get that data all written out. I certainly don't insist on writelines, but I would like the file to be human-readable.
>
> Python 2.7.3
> OS-X 10.8
>
> Thanks,
> Bill
> --
> http://mail.python.org/mailman/listinfo/python-list

Joel Goldstick

Aug 22, 2012, 12:49:13 PM
to William R. Wing (Bill Wing), pytho...@python.org
On Wed, Aug 22, 2012 at 11:38 AM, William R. Wing (Bill Wing)
<w...@mac.com> wrote:
> In the middle of a longer program that reads and plots data from a log file, I have added the following five lines (rtt_data is fully qualified file name):
>
> wd = open(rtt_data, 'w')
> stat = wd.write(str(i))
> stat = wd.writelines(str(x_dates[:i]))
> stat = wd.writelines(str(y_rtt[:i]))
> wd.close()
>
> The value of i is unknown before I have read through the input log file, but is typically in the neighborhood of 2500. x_dates is a list of time stamps from the date2num method, that is values of the form 734716.72445602, day number plus decimal fraction of a day. y_rtt is a list of three- or four-digit floating point numbers. The x_dates and y_rtt lists are complete and plot correctly using matplotlib. Reading and parsing the input log file and extracting the data I need is time consuming, so I decided to save the data for further analysis without the overhead of reading and parsing it every time.
>
> Much to my surprise, when I looked at the output file, it only contained 160 characters. Catting produces:
>
> StraylightPro:Logs wrw$ cat RTT_monitor.dat
> 2354[ 734716.72185185 734716.72233796 734716.72445602 ..., 734737.4440162
> 734737.45097222 734737.45766204][ 240. 28.5 73.3 ..., 28.4 27.4 26.4]
>
> Clearly I'm missing something fundamental about using the writelines method, and I'm sure it will be a DUH moment for me, but I'd sure appreciate someone telling me how to get that data all written out. I certainly don't insist on writelines, but I would like the file to be human-readable.
>
> Python 2.7.3
> OS-X 10.8
>
> Thanks,
> Bill

writelines writes a list of strings to a file.
You are using this:
> stat = wd.writelines(str(x_dates[:i]))
which is the same as my second line below: str() turns the whole list
into one single string.
If you use map, it applies its first argument (here, str) to each
element of the list, giving you a list of strings instead.
See if that works for you.

>>> l = [1,2,3]
>>> str(l)
'[1, 2, 3]'
>>> s = map(str, l)
>>> s
['1', '2', '3']
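
Building on that, here is a small sketch of writing one value per line with writelines(). The sample values are made up, and io.StringIO stands in for a real file object. Note that writelines() does not add line endings itself, so each string needs its own newline:

```python
import io

values = [734716.72185185, 734716.72233796, 240.0]  # made-up sample data

# str() on the whole list yields ONE string.
print(str(values))

# Converting element by element yields one string per value.
# writelines() adds no separators, so the newline is appended here.
buf = io.StringIO()  # stands in for a real file object
buf.writelines("%s\n" % v for v in values)
print(buf.getvalue())
```

writelines() accepts any iterable of strings, so a generator expression works as well as a list built with map.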



--
Joel Goldstick

Jerry Hill

Aug 22, 2012, 1:28:38 PM
to pytho...@python.org
On Wed, Aug 22, 2012 at 11:38 AM, William R. Wing (Bill Wing)
<w...@mac.com> wrote:
> Much to my surprise, when I looked at the output file, it only contained 160 characters. Catting produces:
>
> StraylightPro:Logs wrw$ cat RTT_monitor.dat
> 2354[ 734716.72185185 734716.72233796 734716.72445602 ..., 734737.4440162
> 734737.45097222 734737.45766204][ 240. 28.5 73.3 ..., 28.4 27.4 26.4]

If that's the full output, then my guess is that x_dates and y_rtt are
not actual python lists. I bet they are, in fact, numpy arrays and
that the string representation of those arrays (what you're getting
from str(x_dates), etc) include the '...' in the middle instead of the
full contents.

Am I close?

--
Jerry

Peter Otten

Aug 22, 2012, 1:44:02 PM
to pytho...@python.org
William R. Wing (Bill Wing) wrote:

> In the middle of a longer program that reads and plots data from a log
> file, I have added the following five lines (rtt_data is fully qualified
> file name):
>
> wd = open(rtt_data, 'w')
> stat = wd.write(str(i))
> stat = wd.writelines(str(x_dates[:i]))
> stat = wd.writelines(str(y_rtt[:i]))
> wd.close()
>
> The value of i is unknown before I have read through the input log file,
> but is typically in the neighborhood of 2500. x_dates is a list of time
> stamps from the date2num method, that is values of the form
> 734716.72445602, day number plus decimal fraction of a day. y_rtt is a
> list of three- or four-digit floating point numbers. The x_dates and
> y_rtt lists are complete and plot correctly using matplotlib. Reading and
> parsing the input log file and extracting the data I need is time
> consuming, so I decided to save the data for further analysis without the
> overhead of reading and parsing it every time.
>
> Much to my surprise, when I looked at the output file, it only contained
> 160 characters. Catting produces:
>
> StraylightPro:Logs wrw$ cat RTT_monitor.dat
> 2354[ 734716.72185185 734716.72233796 734716.72445602 ...,
> 734737.4440162
> 734737.45097222 734737.45766204][ 240. 28.5 73.3 ..., 28.4
> 27.4 26.4]
>
> Clearly I'm missing something fundamental about using the writelines
> method, and I'm sure it will be a DUH moment for me, but I'd sure
> appreciate someone telling me how to get that data all written out. I
> certainly don't insist on writelines, but I would like the file to be
> human-readable.

When you apply str() to a numpy array, big arrays are helpfully truncated,
probably because the functionality is meant to be used in the interactive
interpreter rather than to write to a file.
The default threshold is 1000 entries. One way to get the desired output is
to increase the threshold:

>>> numpy.set_printoptions(threshold=4)
>>> print numpy.arange(10)
[0 1 2 ..., 7 8 9]
>>> numpy.set_printoptions(threshold=10)
>>> print numpy.arange(10)
[0 1 2 3 4 5 6 7 8 9]

Also, in

file.writelines(some_str)

writelines iterates over the characters of some_str, so you should
instead write the above as

file.write(some_str)
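
A quick way to convince yourself of this, using io.StringIO as a stand-in for a real file and a made-up string in place of str(array):

```python
import io

some_str = "[ 240.   28.5  73.3]"  # made-up stand-in for str(array)

a = io.StringIO()
a.writelines(some_str)   # iterates over the string, character by character
b = io.StringIO()
b.write(some_str)        # writes the whole string in one call

# Both buffers end up with identical contents.
print(a.getvalue() == b.getvalue())
```

The output is the same either way, which is why writelines() appeared to "work"; write() just says what is meant.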

Your code will become

assert numpy.get_printoptions()["threshold"] >= i
wd.write(str(x_dates[:i]))

If you intended to write one array entry at a time with writelines() here's
how to do that:

wd.write("[")
wd.writelines("%s " % x for x in x_dates[:i])
wd.write("]\n")

numpy.savetxt() may also suit your needs.
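
A minimal sketch of the savetxt() route, assuming two equal-length arrays; the sample values, the temporary path, and the chosen formats are illustrative only:

```python
import os
import tempfile

import numpy as np

# Made-up sample data standing in for x_dates[:i] and y_rtt[:i].
x_dates = np.array([734716.72185185, 734716.72233796, 734716.72445602])
y_rtt = np.array([240.0, 28.5, 73.3])

path = os.path.join(tempfile.mkdtemp(), "RTT_monitor.dat")  # hypothetical path

# One row per sample, two columns; one format per column sets the precision.
np.savetxt(path, np.column_stack((x_dates, y_rtt)), fmt=("%.8f", "%.1f"))

# Read it back later without re-parsing the original log.
x_back, y_back = np.loadtxt(path, unpack=True)
print(np.allclose(x_back, x_dates), np.allclose(y_back, y_rtt))
```

The resulting file is plain text with one timestamp/value pair per line, so it stays human-readable, and the row count no longer needs to be stored separately.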

William R. Wing (Bill Wing)

Aug 22, 2012, 2:00:52 PM
to Chris Kaynor, pytho...@python.org, William R. Wing (Bill Wing)
On Aug 22, 2012, at 12:48 PM, Chris Kaynor <cka...@zindagigames.com> wrote:

> Reading your post, I do not see for sure what your actual issue is, so
> I am taking my best guess: that the file does not contain as much data
> as would be expected.
>

Sorry, I should have been more explicit. The value of "i" in this instance is 2354, so the file should (I thought) have contained the value of "i" followed by 2 x 2354 values of the data.

> On Wed, Aug 22, 2012 at 8:38 AM, William R. Wing (Bill Wing)
> <w...@mac.com> wrote:
>> In the middle of a longer program that reads and plots data from a log file, I have added the following five lines (rtt_data is fully qualified file name):
>>
>> wd = open(rtt_data, 'w')
>
> Here, you are opening the data for write ("w"), which will replace the
> contents of the file each time the file is opened. I am guessing you
> want append ("a").
>

No, I really do want 'w'. In the final package, this will be invoked once at the end of the month, before the log file is compressed and rolled.

[big snip]

>>
>> Much to my surprise, when I looked at the output file, it only contained 160 characters [instead of ~2700 multi-character data values]. Catting produces:
>>
>> StraylightPro:Logs wrw$ cat RTT_monitor.dat
>> 2354[ 734716.72185185 734716.72233796 734716.72445602 ..., 734737.4440162
>> 734737.45097222 734737.45766204][ 240. 28.5 73.3 ..., 28.4 27.4 26.4]
>>

Please look at what cat found in the file. First there is the value of "i" as expected. This is then followed by the first 3 values of the time stamp x-data, then a comma, an ellipsis, another comma, and the last three data values. That pattern (3 values, an ellipsis, and 3 final values) is repeated again for the y-data array/list. It is almost as though "writelines" had written an editorial summary of the data rather than the data.

>> Clearly I'm missing something fundamental about using the writelines method, and I'm sure it will be a DUH moment for me, but I'd sure appreciate someone telling me how to get that data all written out. I certainly don't insist on writelines, but I would like the file to be human-readable.
>>
>> Python 2.7.3
>> OS-X 10.8
>>
>> Thanks,
>> Bill

Dave Angel

Aug 22, 2012, 3:28:28 PM
to William R. Wing (Bill Wing), pytho...@python.org
On 08/22/2012 02:00 PM, William R. Wing (Bill Wing) wrote:
> On Aug 22, 2012, at 12:48 PM, Chris Kaynor <cka...@zindagigames.com> wrote:
>
>> Reading your post, I do not see for sure what your actual issue is, so
>> I am taking my best guess: that the file does not contain as much data
>> as would be expected.
>>
> Sorry, I should have been more explicit. The value of "i" in this instance is 2354, so the file should (I thought) have contained the value of "i" followed by 2 x 2354 values of the data.
>
>> On Wed, Aug 22, 2012 at 8:38 AM, William R. Wing (Bill Wing)
>> <w...@mac.com> wrote:
>>> In the middle of a longer program that reads and plots data from a log file, I have added the following five lines (rtt_data is fully qualified file name):
>>>
>>> wd = open(rtt_data, 'w')
>> Here, you are opening the data for write ("w"), which will replace the
>> contents of the file each time the file is opened. I am guessing you
>> want append ("a").
>>
> No, I really do want 'w'. In the final package, this will be invoked once at the end of the month, before the log file is compressed and rolled.
>
> [big snip]
>
>>> Much to my surprise, when I looked at the output file, it only contained 160 characters [instead of ~2700 multi-character data values]. Catting produces:
>>>
>>> StraylightPro:Logs wrw$ cat RTT_monitor.dat
>>> 2354[ 734716.72185185 734716.72233796 734716.72445602 ..., 734737.4440162
>>> 734737.45097222 734737.45766204][ 240. 28.5 73.3 ..., 28.4 27.4 26.4]
>>>
> Please look at what cat found in the file. First there is the value of "i" as expected. This is then followed by the first 3 values of the time stamp x-data, then a comma, an ellipsis, another comma, and the last three data values. That pattern (3 values, an ellipsis, and 3 final values) is repeated again for the y-data array/list. It is almost as though "writelines" had written an editorial summary of the data rather than the data.

This problem has nothing to do with writelines(). You pass writelines()
a simple string, and it writes it to the file. writelines() normally
EXPECTS a list, but you convert it with the str() function. So it
iterates through the characters, rather than through the elements of a
list.

Clearly, the value x_dates is not a list either, as you claimed in your
original message. Calling str() on a list would have produced a single
string representing a list, and assuming the elements are floats, it'd
look pretty much like what you had except for the ellipses. Is that
really what you wanted? If so, I'd fix the code that produced a non-list
for that value (and similarly for y_rtt).
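
As a rough illustration of the difference, assuming the values really are numpy arrays (the array below is a made-up stand-in of the same length as in the thread), converting with tolist() gives a genuine Python list whose str() is never abbreviated:

```python
import numpy as np

x_dates = np.arange(2354, dtype=float)  # made-up stand-in for the real data

print(type(x_dates))          # shows that this is an ndarray, not a list
print("..." in str(x_dates))  # the array's str() is summarised with an ellipsis

as_list = x_dates.tolist()    # plain Python list of floats
text = str(as_list)           # a list's str() always shows every element
print(text.count(",") + 1)    # number of values present in the string
```

Checking type() first is the quickest way to confirm what kind of object is actually being written.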

On the other hand, perhaps you expected the values to be one or two per
line, judging from your use of the writelines() function.

I suspect you have a numpy array or similar object, not a list. So
confirm what you've actually got, and what you want/expect the output to
look like, and somebody will help out (or already has).



--

DaveA

William R. Wing (Bill Wing)

Aug 22, 2012, 2:36:34 PM
to Jerry Hill, pytho...@python.org, William R. Wing (Bill Wing)
On Aug 22, 2012, at 1:28 PM, Jerry Hill <malac...@gmail.com> wrote:

> On Wed, Aug 22, 2012 at 11:38 AM, William R. Wing (Bill Wing)
> <w...@mac.com> wrote:
>> Much to my surprise, when I looked at the output file, it only contained 160 characters. Catting produces:
>>
>> StraylightPro:Logs wrw$ cat RTT_monitor.dat
>> 2354[ 734716.72185185 734716.72233796 734716.72445602 ..., 734737.4440162
>> 734737.45097222 734737.45766204][ 240. 28.5 73.3 ..., 28.4 27.4 26.4]
>
> If that's the full output, then my guess is that x_dates and y_rtt are
> not actual python lists. I bet they are, in fact, numpy arrays and
> that the string representation of those arrays (what you're getting
> from str(x_dates), etc) include the '...' in the middle instead of the
> full contents.
>
> Am I close?
>
> --
> Jerry

Yes - bingo. They are numpy arrays. And that was the hint I needed to go look in the numpy docs. Found numpy.set_printoptions(threshold='nan'), which has indeed allowed me to get all the data into the file.
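
For anyone reading this later: the threshold='nan' form worked with the NumPy of that era, but as far as I know more recent NumPy releases reject a NaN threshold and suggest sys.maxsize instead. A hedged sketch of an alternative that avoids changing the global print options at all, using numpy.array2string() with made-up data:

```python
import sys

import numpy as np

y_rtt = np.linspace(20.0, 240.0, 2354)  # made-up stand-in for the real data

# Per-call threshold: nothing is summarised, and the global print
# options are left untouched.
full = np.array2string(y_rtt, threshold=sys.maxsize)

print("..." in str(y_rtt))   # the default str() still summarises
print("..." in full)         # the explicit conversion does not

# Every element is present in the full string.
print(len(full.strip("[]").split()))
```

The string in full can then be passed to a single write() call.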

I now see Peter Otten made the same suggestion.

Thanks to you both.

Bill
