Hi Jelmer,
We are using Dulwich in an embedded-style environment (Windows, SSD storage, no UPS).
The system can be powered off or hard-reset at any time without a proper shutdown.
We found that if power is cut shortly after a commit has completed (up to about 30 seconds afterwards), several files end up corrupted: HEAD, refs/heads/xxx, the commit object, and some of the other objects involved. The files have the correct length, but their content consists entirely of zero bytes.
We ran some tests to see whether we could work around the problem outside Dulwich. After the commit, we reopened some of the files in append mode, called fsync on them, and closed them again. All files treated this way were intact.
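For reference, the workaround we tested looks roughly like the following minimal sketch. `resync_files` is a hypothetical helper name, and the caller has to collect the list of paths (HEAD, the updated ref, the new objects) itself. Append mode is used so the handle has write access (Windows' FlushFileBuffers, which `os.fsync` relies on there, requires it) without truncating or modifying the file.

```python
import os


def resync_files(paths):
    """Hypothetical helper: force already-written files to stable storage.

    Each file is reopened in append mode so the handle is writable but the
    existing content is left untouched, then fsync'ed and closed again.
    """
    for path in paths:
        with open(path, "ab") as f:
            # Nothing is written; we only ask the OS to persist the file.
            os.fsync(f.fileno())
```

The drawback is the bookkeeping: every file Dulwich writes during a commit has to end up in `paths`.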
This would be a possible solution, but it requires us to track every file Dulwich touches and fsync it again.
In our next experiment, we inserted a flush right before the fsync inside Dulwich's GitFile class. With that change, we saw no further data loss.
It seems that the fsync in 'GitFile.close()' runs before Python's write buffer has been flushed, so it only syncs the data handed to the OS so far. The implicit flush inside 'file.close()' then passes the rest of the buffer to the OS, which writes it to disk only at some later point.
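To illustrate the ordering we are proposing, here is a minimal sketch of a lock-file writer in the style of GitFile. `LockedFileWriter` is a hypothetical, simplified class, not Dulwich's actual implementation; the relevant part is that `close()` calls `flush()` before `os.fsync()`. It only uses calls that exist on both Python 2 and Python 3.

```python
import os


class LockedFileWriter(object):
    """Hypothetical, simplified stand-in for a lock-file writer like GitFile.

    Data is written to "<path>.lock" and renamed over <path> on close().
    """

    def __init__(self, path):
        self._path = path
        self._lockpath = path + ".lock"
        # O_EXCL makes creation fail if another writer already holds the lock;
        # O_BINARY (Windows only) prevents newline translation.
        flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
        fd = os.open(self._lockpath, flags)
        self._file = os.fdopen(fd, "wb")

    def write(self, data):
        self._file.write(data)

    def close(self):
        # Hand Python's userspace buffer to the OS first; os.fsync() can only
        # persist data the OS already knows about.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._file.close()
        if hasattr(os, "replace"):
            os.replace(self._lockpath, self._path)
        else:
            # Python 2: os.rename cannot overwrite an existing file on Windows.
            if os.name == "nt" and os.path.exists(self._path):
                os.remove(self._path)
            os.rename(self._lockpath, self._path)
```

Without the `flush()` call, any bytes still sitting in the buffer are written by `close()` after the sync, which matches the corruption pattern we observed.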
Given these findings, it would be great if you could add the flush to the code to improve robustness.
(Please include this in the Python 2 support as well.)
Regards,
Hans
P.S.:
Opening the file with bufsize=0 might have the same effect, but we have not tested this.
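A minimal sketch of that untested bufsize=0 idea, assuming a plain write-then-fsync without any lock file. `write_unbuffered` is a hypothetical name. With buffering disabled, every `write()` goes straight to the OS, so no Python-level buffer remains for `close()` to flush after the fsync. `io.open` is used because it behaves the same on Python 2.6+ and Python 3.

```python
import io
import os


def write_unbuffered(path, data):
    """Hypothetical sketch: write bytes without a userspace buffer, then fsync."""
    # buffering=0 returns a raw io.FileIO object (binary mode is required).
    with io.open(path, "wb", buffering=0) as f:
        view = memoryview(data)
        while view:
            # Raw writes may be partial, so keep writing until all bytes are out.
            n = f.write(view)
            view = view[n:]
        # Everything has already reached the OS, so this sync covers all data.
        os.fsync(f.fileno())
```

The trade-off is one system call per `write()`, which may be noticeably slower when many small writes are issued.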