While this isn't an Erlang-specific question, the problem arises from my using Richard Carlsson's file_monitor (https://github.com/richcarl/eunit/blob/master/src/file_monitor.erl), which sends messages when a file or directory is changed. I have found that it is not unusual to get a message about a new file before the file has been completely written.
I had thought that by doing a file:open(Filepath, [read]) and making sure I got back {ok, _} rather than {error, eacces} I could avoid those cases, but that approach has failed for me: this morning, I got back {ok, _}, but the file was not completely written yet.
Another approach I tried was to attempt to obtain an exclusive lock (I think it was file:open(Filepath, [read, exclusive])), but in my testing I came across the bizarre scenario where I would copy a file into the monitored directory, the file_monitor would send the message, but the Erlang process that does the file-open didn't see it, so created the file (the documentation says it creates the file if it does not exist), and then I got a message in my window where I was copying that the file already exists, do I want to overwrite it.
Another approach I tried was renaming the file to itself. All my tests indicated that that approach would work, but all my tests also indicated that just doing the file:open(Filepath, [read]) would work, too, so I chose it, as it seemed cleaner. I could revert to the rename approach, but I'm not even sure now that that will work.
I imagine others among us have encountered this issue, and rather than reinvent the wheel, what is the favored approach to handling this issue?
> Cheers,
> David Mercer
Hey, a user! I haven't had any reports about this module before (and the fact that it's still in my development branch of eunit is more of a historical accident; it's not shipped with OTP). I don't know of any real issues with it though.
In this case, I think the problem is just the underlying file system semantics. I presume it's Linux, and in Unix:y file systems a file can be seen to exist and can be opened for reading as soon as it has been created. Trying to fiddle with exclusive locks is probably always going to have corner cases. The only techniques you can trust to practically always work and be portable across file systems are directory creation and file renaming. So what Tony suggested is likely to be the best solution: create the file under another name or in a separate directory, and when it's completely written, rename it.
I'm not the one writing the file. I'm the one reading it. I have no control over the writing.
Thanks for the thoughts, though.
DBM
From: Tony Rogvall [mailto:t...@rogvall.se]
Sent: Wednesday, March 07, 2012 11:40 AM
To: David Mercer
Cc: erlang-questi...@erlang.org
Subject: Re: [erlang-questions] Reading a file before it has been completely written
- Create and open a file with a temporary name.
- Write the file content.
- Close the file.
- Rename the file to the name/place you want.
Does that work?
/Tony
On 7 mar 2012, at 18:25, David Mercer wrote:
While this isn't an Erlang-specific question, the problem arises from my using Richard Carlsson's file_monitor (https://github.com/richcarl/eunit/blob/master/src/file_monitor.erl), which sends messages when a file or directory is changed. I have found that it is not unusual to get a message about a new file before the file has been completely written.
I had thought that by doing a file:open(Filepath, [read]) and making sure I got back {ok, _} rather than {error, eacces} I could avoid those cases, but that approach has failed for me: this morning, I got back {ok, _}, but the file was not completely written yet.
Another approach I tried was to attempt to obtain an exclusive lock (I think it was file:open(Filepath, [read, exclusive])), but in my testing I came across the bizarre scenario where I would copy a file into the monitored directory, the file_monitor would send the message, but the Erlang process that does the file-open didn't see it, so created the file (the documentation says it creates the file if it does not exist), and then I got a message in my window where I was copying that the file already exists, do I want to overwrite it.
Another approach I tried was renaming the file to itself. All my tests indicated that that approach would work, but all my tests also indicated that just doing the file:open(Filepath, [read]) would work, too, so I chose it, as it seemed cleaner. I could revert to the rename approach, but I'm not even sure now that that will work.
I imagine others among us have encountered this issue, and rather than reinvent the wheel, what is the favored approach to handling this issue?
"Installing applications can lead to corruption over time. Applications gradually write over each other's libraries, partial upgrades occur, user and system errors happen, and minute changes may be unnoticeable and difficult to fix"
On Wednesday, March 07, 2012, Richard Carlsson wrote:
> Hey, a user! I haven't had any reports about this module before (and the fact that it's still in my development branch of eunit is more of a historical accident; it's not shipped with OTP). I don't know of any real issues with it though.
It works fine. If you know of a better one, I'd be OK switching. This was just the one that came up when I Googled.
> In this case, I think the problem is just the underlying file system semantics. I presume it's Linux, and in Unix:y file systems a file can be seen to exist and can be opened for reading as soon as it has been created. Trying to fiddle with exclusive locks is probably always going to have corner cases. The only techniques you can trust to practically always work and be portable across file systems are directory creation and file renaming. So what Tony suggested is likely to be the best solution: create the file under another name or in a separate directory, and when it's completely written, rename it.
I might go back to my renaming approach, which also had no failures during testing. I just attempt to rename the file to itself. If it fails, I try again 5 seconds later.
(Sorry, Richard, for getting this multiple times; I forgot to include the list.)
Just as an added caveat to what Richard mentions: you probably want to make sure that the temp file is created on the same file system as your test directory. A rename between file systems is really just a copy and delete, with the exact same problems you had previously (depending on OS I guess), while a rename within a file system is a link/unlink with the contents being untouched.
/Daniel
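For anyone who does control the writing side, the steps Tony lists, combined with the same-file-system caveat above, can be sketched roughly as follows. This is only an illustration: the module name `atomic_write`, the function `write_file/2`, and the `.tmp` suffix are made up for this example, and `Path` is assumed to be a string (list), not a binary.

```erlang
-module(atomic_write).
-export([write_file/2]).

%% Write Data to Path so that the final name only ever appears
%% once the content is complete.
%% Assumption: Path is a string. The temporary name lives in the same
%% directory as Path, so the rename stays within one file system.
write_file(Path, Data) ->
    Tmp = Path ++ ".tmp",                 % hypothetical naming scheme
    case file:write_file(Tmp, Data) of    % creates, writes and closes Tmp
        ok ->
            case file:rename(Tmp, Path) of
                ok ->
                    ok;
                {error, RenameReason} ->
                    _ = file:delete(Tmp), % best-effort cleanup
                    {error, RenameReason}
            end;
        {error, WriteReason} ->
            _ = file:delete(Tmp),         % best-effort cleanup
            {error, WriteReason}
    end.
```

Because the temporary file sits in the monitored directory in this sketch, the reader would also be notified about it and would need to ignore names ending in `.tmp`; writing into a sibling directory on the same file system avoids that.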
On 7 March 2012 12:54, Richard Carlsson <carlsson.rich...@gmail.com> wrote:
> Hey, a user! I haven't had any reports about this module before (and the fact that it's still in my development branch of eunit is more of a historical accident; it's not shipped with OTP). I don't know of any real issues with it though.
> In this case, I think the problem is just the underlying file system semantics. I presume it's Linux, and in Unix:y file systems a file can be seen to exist and can be opened for reading as soon as it has been created. Trying to fiddle with exclusive locks is probably always going to have corner cases. The only techniques you can trust to practically always work and be portable across file systems are directory creation and file renaming. So what Tony suggested is likely to be the best solution: create the file under another name or in a separate directory, and when it's completely written, rename it.
> On Wednesday, March 07, 2012, Richard Carlsson wrote:
>> Hey, a user! I haven't had any reports about this module before (and the fact that it's still in my development branch of eunit is more of a historical accident; it's not shipped with OTP). I don't know of any real issues with it though.
> It works fine. If you know of a better one, I'd be OK switching. This was just the one that came up when I Googled.
No, I think it's pretty good and I don't know any other portable implementation. I'd just like to add optional inotify support (and whatever it's called on Windows) on supported platforms. Right now it only works by polling. Which is actually good enough in a lot of cases.
On Wednesday, March 07, 2012, I wrote:
> On Wednesday, March 07, 2012, Richard Carlsson wrote:
>> Trying to fiddle with exclusive locks is probably always going to have corner cases. The only techniques you can trust to practically always work and be portable across file systems are directory creation and file renaming. So what Tony suggested is likely to be the best solution: create the file under another name or in a separate directory, and when it's completely written, rename it.
> I might go back to my renaming approach, which also had no failures during testing. I just attempt to rename the file to itself. If it fails, I try again 5 seconds later.
For closure here, I went back to my approach of attempting to rename the file to itself before reading it. I'll let y'all know if I encounter any more corner cases.
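A minimal sketch of that rename-to-itself check, for readers who want to try it. The module name `wait_ready`, the function `wait_until_renamable/3`, and the retry limit are assumptions added for illustration; the thread only mentions retrying after 5 seconds. Note that the check relies on the platform refusing to rename a file that a writer still has open. On Unix-like systems a rename normally succeeds regardless, so there it would report `ok` immediately.

```erlang
-module(wait_ready).
-export([wait_until_renamable/3]).

%% Try to rename Path to itself. If that fails, sleep IntervalMs
%% milliseconds and try again, up to Retries additional attempts.
%% Returns ok once a rename succeeds, or the last {error, Reason}.
wait_until_renamable(Path, IntervalMs, Retries) ->
    case file:rename(Path, Path) of
        ok ->
            ok;
        {error, _Reason} when Retries > 0 ->
            timer:sleep(IntervalMs),
            wait_until_renamable(Path, IntervalMs, Retries - 1);
        {error, Reason} ->
            {error, Reason}
    end.
```

With the interval from the thread, a call might look like `wait_ready:wait_until_renamable(Filepath, 5000, 12)`, which gives up after about a minute.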