How to continuously read a file while it is being written


宓俊

Feb 2, 2012, 5:30:07 AM
to nod...@googlegroups.com
I want the user to be able to download a file while it is still being uploaded, so I wrote this code to test:

var fs = require('fs');
var writeStream = fs.createWriteStream(__dirname + "/uploading.txt");
writeStream.on('open', function(){
  console.log('Write stream has opened');
  // Write a number every second.
  var i = 1;
  var repeat = function(){
    writeStream.write(String(i)); // write() takes a string or Buffer, not a number
    console.log('Number ' + i + ' has been written to the file');
    i = i + 1;
    setTimeout(repeat, 1000);
  };
  repeat();
  var readStream = fs.createReadStream(__dirname + "/uploading.txt");
  readStream.on('data', function(data){
    console.log('Data is coming: ' + data);
  });
  readStream.on('end', function(){
    console.log('Stream END');
  });
});

But the result looks like this:

Write stream has opened
Number 1 has been written to the file
Data is coming: 1
Stream END
Number 2 has been written to the file
Number 3 has been written to the file
Number 4 has been written to the file

When the reading speed is faster than the writing speed, I receive an 'end' event rather than the stream pausing.

How should I deal with this problem?

Diogo Resende

Feb 2, 2012, 8:54:18 AM
to nod...@googlegroups.com
I think that's because the read stream reaches EOF. I don't know if
you can avoid this. One way might be to track the file size as it
grows and pause the read stream before it reaches EOF.

Not sure if it's feasible.. :)

---
Diogo R.


Matt

Feb 2, 2012, 9:04:04 AM
to nod...@googlegroups.com
Yeah, ReadStream can't do it without re-opening the file, because you need seek() to be able to clear the EOF flag, and Node doesn't implement seek() (partly because of the integer problem with getting up to 64 bits, but that's a bit of a cop-out, since createReadStream can take start/end params, so why was it OK there?). The "tail" module on npm just re-opens the file when the size changes, using fs.watchFile().

Probably easier (and faster) to spawn a child process running tail -f, since opening and closing the file all those times has overhead.
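
Untested, but something along these lines (the path is just an example):

var spawn = require('child_process').spawn;

// Follow the growing file by spawning `tail -f` and reading its stdout.
var tail = spawn('tail', ['-f', __dirname + '/uploading.txt']);
tail.stdout.on('data', function (chunk) {
  console.log('Data is coming: ' + chunk);
});

// Stop following once the upload completes:
// tail.kill();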

Matt.



Diogo Resende

Feb 2, 2012, 9:18:41 AM
to nod...@googlegroups.com

Yeah... I was thinking about tail... :) Just two "questions":

- Does Windows have any tail-like tool?
- Are binary files a problem to tail in any way?

---
Diogo R.

fent

Feb 2, 2012, 9:41:13 AM
to nodejs

Matt

Feb 2, 2012, 10:26:05 AM
to nod...@googlegroups.com
> - Does Windows have any tail-like tool?

There's one in the Windows Resource Kit, but nothing ships with Windows.

> - Are binary files a problem to tail in any way?

There's no concept of "binary files" on Unix systems (so basically, no problem). I have no idea about the Windows version.

Matt.

fent

Feb 2, 2012, 3:00:07 PM
to nodejs
BTW, I was thinking of building some kind of service like this that lets
users download files as they're being uploaded. I'd like to see what
you have, if you don't mind.


Marcel Laverdet

Feb 2, 2012, 6:44:56 PM
to nod...@googlegroups.com
You can do this easily with fs.watchFile() and fs.read(). I have a gist that uses fs.watchFile() on a MySQL binlog to stream queries off as they are written to the binary log:


The file handling logic starts at line 174 of the gist. I don't see why everyone is so quick to reach for tail when you can write this in pure Node in 12 lines or so.
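
The core of it is something like this (a bare-bones sketch, assuming the file only ever grows; the path and polling interval are just examples):

var fs = require('fs');

var path = __dirname + '/uploading.txt';
var fd = fs.openSync(path, 'r');
var position = 0;

// Poll the file; whenever it grows, read only the new bytes.
fs.watchFile(path, { interval: 500 }, function (curr, prev) {
  if (curr.size <= position) return; // nothing new yet
  var buffer = new Buffer(curr.size - position);
  fs.read(fd, buffer, 0, buffer.length, position, function (err, bytesRead) {
    if (err) throw err;
    position += bytesRead;
    console.log('New data: ' + buffer.slice(0, bytesRead).toString());
  });
});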


Mi Jun

Feb 2, 2012, 11:31:06 PM
to nod...@googlegroups.com
I am using formidable to upload files, so I have access to file.bytesReceived and file.bytesExpected.

I don't think tail is a good solution. Since I have to pipe the read stream to the response, I need to end the response when the file has been completely uploaded, rather than reading continuously.

node-growing-file only supports ending by timeout. If it supported ending at bytesExpected, that would be great. I will see if I can submit a patch.

Marcel Laverdet

Feb 3, 2012, 1:16:24 AM
to nod...@googlegroups.com
Seriously, what you want is 12 lines with fs.watchFile(). Just try it; you will be shocked by how easy it is to get this running.


Mi Jun

Feb 3, 2012, 2:56:06 AM
to nod...@googlegroups.com
Using fs.watchFile() and fs.read() would force me to re-implement stream.pipe(response). Using fs.createReadStream(path, {start: offset}) is much easier.
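
Roughly like this (an untested sketch; streamGrowingFile is just an illustrative name, and bytesExpected comes from formidable):

var fs = require('fs');

// Re-open the file at the last offset and pipe only the new bytes into
// the response, ending the response once bytesExpected bytes are sent.
function streamGrowingFile(path, bytesExpected, response) {
  var offset = 0;
  (function readMore() {
    fs.stat(path, function (err, stats) {
      if (err) return response.end();
      if (stats.size > offset) {
        var chunk = fs.createReadStream(path, { start: offset, end: stats.size - 1 });
        chunk.pipe(response, { end: false }); // keep the response open
        chunk.on('end', function () {
          offset = stats.size;
          if (offset >= bytesExpected) return response.end();
          setTimeout(readMore, 500);
        });
      } else if (offset >= bytesExpected) {
        response.end();
      } else {
        setTimeout(readMore, 500);
      }
    });
  })();
}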

Shin Suzuki

Feb 6, 2012, 2:39:57 AM
to nod...@googlegroups.com
Marcel,

I wrote a simple 12-line script that reads newly added lines of a file when changes happen, as you do in your sample.


However, it couldn't read newly added data (Node v0.6.9, Mac OS X).

Is something wrong with my code?
How can your code work without re-opening the file?




Marcel Laverdet

Feb 6, 2012, 1:32:54 PM
to nod...@googlegroups.com
Shin, I commented on your gist with code you can use. You just need to modify it to catch up with existing data. Please treat this only as a proof of concept, and be aware that you need to handle unexpected errors like truncated files, deleted files, and so on. Also keep in mind that fs.watchFile() is still kind of slow on OS X; on Linux you will get much faster results.

Matt

Feb 6, 2012, 11:37:23 PM
to nod...@googlegroups.com
Also keep in mind that this code is horribly broken and full of race conditions :)

Adam Pritchard

Feb 7, 2012, 8:20:26 PM
to nod...@googlegroups.com
I've also been working on a little log-file-following (tail -f) node module. 

I need it to work on Windows, so it's cross-platform. It works okay already, but I still have some stuff to do.

The behaviour of fs.watch is a bit sketchy. For example...

If I didn't need/want to support Windows I'd probably use fs.watchFile.

Matt

Feb 8, 2012, 11:40:31 AM
to nod...@googlegroups.com
Still has a race condition in it.


Adam Pritchard

Feb 8, 2012, 4:36:57 PM
to nod...@googlegroups.com
Can you elaborate? I'm pretty new to JS/Node, so maybe I'm not seeing it.

I recognize that the file reading isn't one-to-one with the notifications (i.e., many rapid file modifications will probably be processed in a single read, and subsequent notifications won't have any effect until the file grows again), but I don't see why that's a problem -- the data should still get processed in a timely fashion.

On the other hand... my tests aren't running through successfully on two of three OSes, so there's certainly something not right...

I'd appreciate any bugs or flaws you can point out.

Adam

Matt

Feb 8, 2012, 6:08:05 PM
to nod...@googlegroups.com
There's a race between when you get the results from stat and when you re-open the file. You know the file has changed, and you assume it has grown, so you re-open it at the last-byte-read position. But what if, in between, the file was truncated to zero, meaning you need to read from the start again?
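
Guarding against that particular case is easy enough (a rough sketch along the lines of the earlier watchFile examples; the path is just an example):

var fs = require('fs');

var path = __dirname + '/uploading.txt';
var position = 0;

fs.watchFile(path, { interval: 500 }, function (curr, prev) {
  if (curr.size < position) {
    // The file shrank since the last read: it was truncated or
    // replaced, so start over from the beginning.
    position = 0;
  }
  if (curr.size === position) return; // nothing new
  var stream = fs.createReadStream(path, { start: position, end: curr.size - 1 });
  stream.on('data', function (chunk) {
    position += chunk.length;
    console.log('New data: ' + chunk);
  });
});

Note a window still remains between the stat result and the read, so a truly robust follower also has to handle read errors.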

Mark Hahn

Feb 8, 2012, 6:09:33 PM
to nod...@googlegroups.com
That's not really a race, just a condition not covered.

Marcel Laverdet

Feb 8, 2012, 10:13:01 PM
to nod...@googlegroups.com
Matt keeps screaming "race condition" because conditions that were explicitly stated as not handled are, indeed, not handled.

Mark Hahn

Feb 8, 2012, 10:21:37 PM
to nod...@googlegroups.com
I'm curious why not handling this condition would be considered "horribly broken", since it's a condition I can't imagine happening. Why would a file being uploaded ever be zeroed out?

Shin Suzuki

Feb 8, 2012, 10:37:46 PM
to nod...@googlegroups.com
Marcel,

Thanks for the sample code.
Unfortunately, it didn't work in my environment (Mac OS X and CentOS).
Essentially, the code you've shown and mine are the same, in that neither re-opens the file.
It seems we cannot fetch the added data without re-opening.
Considering the overhead of opening the file on each "change" event, just spawning "tail -f" would be better, as Matt says.
What's difficult is that we cannot know at which line the "tail -f" child process starts reading when the file is growing.



Marcel Laverdet

Feb 8, 2012, 11:04:56 PM
to nod...@googlegroups.com
You certainly can do this without re-opening the file. I'm using very similar code to tail MySQL's binlogs. Are you sure that you're appending to the file? As others mentioned, this does not handle re-opening the file if it's removed or replaced; if that is a constraint, you will have to handle it accordingly.

I'm running the code I posted on your gist just fine on OS X: I run it against a file, then in another terminal session run "echo hello >> file", and the output shows up in the first terminal. There is a delay of two seconds or so.

Adam Pritchard

Feb 8, 2012, 11:05:47 PM
to nodejs
I was aware that my shrinking-file behaviour wasn't well defined yet (except if/when watchit emits 'create' or 'unlink' events), but I agree that there's a bad race condition between the statSync() and createReadStream() calls. I didn't want to open and hold a file descriptor in case it messes up any "primary" (log-writing) processes, but I'll have to be careful not to crash utterly. I want to learn how to write JavaScript that's as robust as possible, so I'd better start handling more exceptions (like the one from calling createReadStream() on a nonexistent file... which will probably/certainly crash any application using it).

Thanks for your help.

Adam


Shin Suzuki

Feb 8, 2012, 11:14:29 PM
to nod...@googlegroups.com
Sorry, my mistake. It works fine.
Some programs replace the file instead of appending to it, though...



Marcel Laverdet

Feb 9, 2012, 3:36:18 AM
to nod...@googlegroups.com
Yeah, if the file is being replaced or truncated, you will need to handle that differently. The code I posted is fine for the simple case of a unique file being uploaded, but obviously your case may be more complex. This is enough to get you started, though! Ganbatte~ (good luck!)