nzbperl parsing nzb files before completely written to disk


Torawk

Dec 10, 2009, 5:52:57 PM
to nzbperl
Hi,

I had an issue where nzbperl.pl was finding nzb files before
they were completely written out. This caused nzbperl to fail to
parse the XML and forget about the file. The root cause is that I run
nzbperl in daemon mode and copy files to its machine remotely
via samba (or even scp over my ISP connection), so nzbperl sees
the files before the transfer is complete.

I fixed this quite a while ago but thought others could use this
patch. Instead of checking only for the existence of a file, it
checks the file's modification time: nzbperl ignores the file until
the modified time is more than 2 seconds in the past, which gives a
slower connection time to fully transfer the file before any parsing
is done. I'm using this on Linux (Ubuntu 9.10) so I'm not sure how
portable it is, but any Linux/Unix-like system should be fine.

Also, thanks to Jason for writing nzbperl in the first place... I
have been using it for years. I might have some future patches, as
there are a couple of other things I want to do: an enhancement for
the remote port summary, and a way to queue an nzb named the same as
one already downloaded but since removed from the file system.

-Brian

patch is simply:

--- orig/nzbperl.pl 2006-03-01 01:46:10.000000000 -0400
+++ nzbperl.pl 2009-12-07 23:17:30.867202784 -0400
@@ -1029,7 +1029,8 @@
 opendir(QDIR, $queuedir);
 my @candidates = grep(/\.nzb$/, readdir(QDIR));
 foreach my $file (@candidates){
-	if( !defined($nzbfiles{'files'}->{$file})){ # not queued yet
+	my $mtime = time() - (stat($queuedir . "/" . $file))[9];
+	if( !defined($nzbfiles{'files'}->{$file}) && ($mtime > 2)){ # not queued yet && last modified more than 2 seconds ago
 		statMsg("Queueing new nzb file found on disk: $file");
 		$nzbfiles{'files'}->{$file}->{'read'} = 0;
 		$retCt++;

Jason Plumb

Dec 16, 2009, 3:04:09 AM
to nzb...@googlegroups.com
Torawk wrote:
> I had an issue where nzbperl.pl was finding the nzb files before
> they were completely written out [when uploading on a somewhat slow
> link while using daemon mode].
<snip>

<snip patch that filters out files modified less than 2 seconds ago>

Hi Torawk! Thanks for the info and the patch!

This is a totally common scenario in the file process automation world.
The common, trivial solution is to upload the file to a temporary name
(usually via an extension -- like a.nzb would upload as a.nzb.tmp) and
then to rename the file after the entire upload has finished (mv
a.nzb.tmp a.nzb).

Similarly, you could just upload to a different directory on the same
filesystem and then remotely move into your queue dir after the upload
completes.

A move/rename within the same filesystem is atomic on almost all
operating systems and filesystems, and solves the partial read problem.
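A minimal shell sketch of the rename trick (the directory name is illustrative, and the `printf` stands in for whatever slow transfer -- scp, samba -- actually writes the file):

```shell
#!/bin/sh
set -e
QUEUE="/tmp/nzbqueue-demo"
mkdir -p "$QUEUE"

# Simulate a slow upload landing under a temporary name.
printf '<nzb/>\n' > "$QUEUE/example.nzb.tmp"

# The rename is a single rename(2) call on the same filesystem,
# so a daemon scanning for *.nzb never sees a partial file.
mv "$QUEUE/example.nzb.tmp" "$QUEUE/example.nzb"
```

The only requirement is that the temporary name and the final name live on the same filesystem; otherwise `mv` falls back to copy-and-delete, which is not atomic.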

While I think your patch is awesome and I'm totally grateful for it and
appreciate having it in the archives, I can see this code change as a
slippery slope that probably unnecessarily leads to additional
configuration. Think about the guy who might say "Well, I like to
upload my nzb files over packet radio and it commonly stalls for 15 or
30 seconds". Sure...contrived, but a totally viable use case.

The better solution still, IMO, is to do the queueing atomically.

-jason
http://noisybox.net

Torawk

Jan 4, 2010, 7:13:00 AM
to nzbperl

Sure, no problem... this works well for me, so as updates to
nzbperl come out I can just apply the patch again. But yeah, keeping
it out of your main nzbperl code makes sense from a support
standpoint -- otherwise you'd get emails asking "why does it take so
long to see an nzb" from someone who set the value incorrectly (if it
were a config var).

-Brian
