django cron job - stops after reading some portion of huge file - why is this?


doniyor

Oct 15, 2013, 6:55:31 AM
to django...@googlegroups.com
I am using a cron job to read a file from a URL, parse it, and save some of its information into the db.

I am currently testing this in my local dev environment.

The problem: the job reads the file and saves into the db without any trouble at first, but after some time it stops doing anything and freezes, without giving any error. The file is very large, approximately 8 GB or more.

I am using Django 1.4, Python 2.7, and PostgreSQL. Is there any limit on writing into the db? Why is it freezing?


Bill Freeman

Oct 15, 2013, 9:40:42 AM
to django-users
One possibility is that your code keeps all that is read (or something derived from it) in memory, and you are running out.

For example, is your database code trying to do all of this in a single transaction?
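On the memory point: if the input turns out to be XML, a streaming parser is one way to keep memory use flat. The following is only a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`; the tag name `item`, the helper name `iter_records`, and the sample document are made up for illustration and are not taken from the original code.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield one dict per <tag> element without keeping the whole tree in memory."""
    # Ask for "start" events too, so the first event hands us the root element.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield dict((child.tag, child.text) for child in elem)
            # Drop finished elements from the root so parsed records do not pile up.
            root.clear()


# Tiny in-memory stand-in for the real multi-gigabyte file (hypothetical data).
sample = io.BytesIO(
    b"<feed><item><name>a</name></item><item><name>b</name></item></feed>"
)
records = list(iter_records(sample, "item"))
```

In real use you would consume the generator one record at a time rather than calling `list()` on it, since building the list brings back the memory problem.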

Another possibility is that something in the file at that spot triggers a bug in your code that leads to an infinite loop.

There are other possibilities.  But there's no diagnosing it with the information you've given.

Can you, in Python, read through the file while doing nothing with the data?  E.g.:

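    # Read the file in 1 MiB chunks, discard the data, and print a running byte count.
    f = open('your/file/path/here', 'rb')  # binary mode: we only count bytes
    n = 0
    s = True
    while s:
        s = f.read(1024*1024)  # returns an empty string at end of file
        n += len(s)
        print(n)
    f.close()
    print('done')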

That should work.  If not, does your O/S not correctly handle files that big?

Bill



doniyor

Oct 15, 2013, 9:59:52 AM
to django...@googlegroups.com
Yes, the db code is doing all of these calls in a single transaction. I mean, I am not using transactions. Maybe this is the reason?

This is my cron code: http://pastebin.com/Lrym1z8E (very ugly code, I know). It does save at least some objects into the db.

I also noticed just now that some objects in the db have fields that are not fully filled in, even though the XML file does contain that information. That means this is a transaction issue, right?

Could you please take a look at the code? Would transactions solve this issue?

Bill Freeman

Oct 15, 2013, 11:04:02 AM
to django-users
Yes, you should split the db activity into sensible transactions, since information about how to roll back is being stored somewhere (though some DBs may not have a problem with this).
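To illustrate what "sensible transactions" could look like, here is a rough sketch that commits after every fixed-size batch of rows. It uses the standard library's `sqlite3` with an in-memory database purely as a stand-in for PostgreSQL so that it runs on its own; the `product` table, the helper name `save_in_batches`, and the batch size of 1000 are all assumptions. In Django 1.4 the same idea would probably be expressed by wrapping each batch in `transaction.commit_on_success`.

```python
import sqlite3


def save_in_batches(conn, rows, batch_size=1000):
    """Insert rows, committing after every batch_size rows; return the commit count."""
    commits = 0
    pending = 0
    for name, price in rows:
        conn.execute("INSERT INTO product (name, price) VALUES (?, ?)", (name, price))
        pending += 1
        if pending >= batch_size:
            conn.commit()  # end this transaction; the next INSERT starts a new one
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the final, partially filled batch
        commits += 1
    return commits


# In-memory database and generated rows, both hypothetical stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
rows = (("item-%d" % i, i * 1.5) for i in range(2500))
commit_count = save_in_batches(conn, rows, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

With batches like this, a failure part of the way through loses at most one batch of work, and the database never has to track rollback information for the whole file at once.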

You've added a whole new dimension when you say that this data is not, in fact, a local file that you are reading, but a network request.  There are many more things between you and the data source that could have trouble with the large data size.  I suspect that the most likely is that the server limits the time allowed for the request to complete.  Hopefully a server with such a limit provides for restarting the transfer from other than the beginning.
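If the server does support restarting a transfer, the usual mechanism is the HTTP `Range` request header. The sketch below shows the idea under several assumptions: it uses Python 3's `urllib.request` (on Python 2.7 the corresponding module is `urllib2`), the helper names and the `example.com` URL are placeholders, and it has not been tried against the actual server in question. The explicit `timeout` is there so that a stalled connection raises an error instead of hanging silently.

```python
import urllib.request


def build_resume_request(url, offset):
    """Return a request asking the server for bytes from offset to the end."""
    request = urllib.request.Request(url)
    if offset > 0:
        # Servers that support ranges answer 206 Partial Content; others
        # ignore the header and send the whole body again with 200.
        request.add_header("Range", "bytes=%d-" % offset)
    return request


def download_chunks(url, offset=0, timeout=60, chunk_size=1024 * 1024):
    """Yield (new_offset, chunk) pairs; a stalled socket raises after timeout seconds."""
    response = urllib.request.urlopen(build_resume_request(url, offset), timeout=timeout)
    try:
        if offset > 0 and response.status != 206:
            raise IOError("server ignored the Range header; cannot resume")
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            offset += len(chunk)
            yield offset, chunk
    finally:
        response.close()


# Building the request does not touch the network; the URL is a placeholder.
request = build_resume_request("http://example.com/feed.xml", 4096)
```

A caller would remember the last offset it received and, after a timeout or dropped connection, call `download_chunks` again with that offset.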

I'm sorry, but I don't have the spare cycles to debug this for you.  Try instrumenting things to confirm whether it is a read on the source that is hanging or something else.  Since it's hard to get data from a hung process, this requires some imagination.  You could write to a file an indication of the point in the code when you are about to read the source, when the source completes, when you are about to talk to the database, when that completes, etc., but note that you must close the file after each write (and open it anew before the next) since otherwise the write may be buffered in the process when it hangs.  All those opens and closes will be slow, so if you feel adventurous, a write to a piece of shared memory, shared with a monitoring process, might be better.
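The open-write-close approach might look roughly like this. It is a minimal sketch: the helper name `mark`, the log file name, and the messages are only illustrations, and the temporary directory is used just so the example runs anywhere.

```python
import os
import tempfile
import time


def mark(log_path, message):
    """Append one timestamped line and close the file so it survives a later hang."""
    with open(log_path, "a") as log:
        log.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), message))
        log.flush()
        os.fsync(log.fileno())  # push the line past the OS cache as well
    # Leaving the with-block closes the file.


# Hypothetical checkpoints around the steps of the cron job.
log_path = os.path.join(tempfile.mkdtemp(), "cron_progress.log")
mark(log_path, "about to read from source")
mark(log_path, "read complete")
mark(log_path, "about to write to database")
with open(log_path) as log:
    lines = log.read().splitlines()
```

When the job freezes, the last line in the log file tells you which step was started but never finished.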

If you find that something other than the read on the source is not returning, write again and I, or someone else, will think with you some more.



doniyor

Oct 15, 2013, 11:07:27 AM
to django...@googlegroups.com
Awesome, then let me try the things you mentioned. I will let you know how it goes. Thanks a ton for now.