|Requesting code review: Asynchronous filehasher||Robert Nitsch||20.01.13 13:56|
I have recently started developing with Go. As a first project I have chosen an asynchronous filehasher. I would like to request a code review for the current state. I'm sure we can do better together. The project is hosted at bitbucket.org: https://bitbucket.org/rsnitsch/filehasher
The requirements are as follows:
- the library/application is for hashing files (e.g. SHA1).
- the interface should support that the caller specify the type of hash function.
- the application should be able to do I/O and hashing in parallel to optimize the overall performance.
- it should be possible to pause/resume the hashing at any time.
- it should also be possible to shutdown the concurrent goroutines completely at any time (not just pause them).
- later it should be possible to implement advanced optimizations like hashing files from different physical devices in parallel.
What I like about the project is that the requirements are easy to understand and it offers a lot to learn about Go's concurrency features. At the same time there are some non-trivial caveats involved. Maybe this project could later serve as an example project for beginners.
There are some things that I would like to discuss about in particular:
- currently, in the worker goroutine read(...) there is duplicate code when it comes to handling the pause/resume/abort signals. The first signal handler is at line 210 and it is used to handle signals while the goroutine is idle. The second is at line 271 and it is used to handle signals while the goroutine is busy. I would like to eliminate the duplicate code as far as possible.
- currently, in the worker goroutine read(...) I cannot use a defer statement to guarantee that the file handle's Close() method is called, because the goroutine may never quit. To work around this issue I have tried to encapsulate this part (starting at line 236) in a closure, but this conflicts with the requirement that the goroutine can be shutdown completely at any time. I.e. when an abort signal is received, the read(...) function should be quit immediately. However, from an inner function, I cannot issue a return for the outer function.
- how do you like the worker structs?
- how do you like the GetResult() and GetResultHash() functions? Do you think it might be better to expose a receive channel to the outside, such that the user may code his own select statements (e.g. with timeouts)?
I am glad for all the feedback I can get.
Thanks and best regards,