On 08.10.2012, at 07:38, ptone <pre...@ptone.com> wrote:
> so after scanning this thread and the ticket again - it is still unclear that there could be a completely universal solution.
>
> While it would be nice if the storage API had a checksum(name) or md5(name) method - not all custom storage backends are going to support a single checksum standard. S3 doesn't explicitly support MD5 (apparently it unofficially does through ETags). Without a universal checksum - you can't use it to compare files across arbitrary backends.
You're able to ask S3 for the date of last modification, so I don't see why an additional comparison by hashing the file content is needed. Django would have to download the full file to do that on its side, and I'm not aware of an API for getting a hash from cloudfiles, S3, etc.
I beg to differ, returning a datetime object makes absolute sense for comparing it to another datetime object. What I meant before is that the modified_time method can be written however the user wants as long as it returns a datetime object, even a date that is known to be older than the file on disk.
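
To make the point about modified_time concrete, here is a minimal sketch in plain Python. PinnedTimeStorage and should_copy are hypothetical names, not Django's Storage class or collectstatic internals; the only thing carried over from the discussion is that modified_time returns a datetime and that the caller compares it with the source file's datetime. The "copy when the target is older" rule is an assumption about how the comparison is used.

```python
from datetime import datetime


class PinnedTimeStorage:
    """Hypothetical backend that only illustrates the modified_time contract."""

    def __init__(self, known_times):
        # Maps a file name to the datetime the backend reports for it.
        self._known_times = dict(known_times)

    def modified_time(self, name):
        # Any datetime is acceptable. For unknown files we return a date that
        # is certainly older than anything on disk, which forces a copy.
        return self._known_times.get(name, datetime(1970, 1, 1))


def should_copy(source_mtime, storage, name):
    """Assumed rule: copy only when the stored file is older than the source."""
    return storage.modified_time(name) < source_mtime
```

With a storage that reports 2012-10-08 for "a.css", a source file dated 2012-10-01 is skipped and one dated 2012-10-09 is copied; a name the backend does not know is always copied.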
> On Sunday, October 7, 2012 8:59:16 PM UTC-7, Dan Loewenherz wrote:
> This issue just got me again tonight, so I'll try to push once more on this issue. It seems right now most people don't care that this is broken, which is a bummer, but in which case I'll just continue using my working solution.
>
> Dan
>
> On Sat, Oct 6, 2012 at 10:48 AM, Dan Loewenherz <d...@dlo.me> wrote:
> Hey Jannis,
>
> On Mon, Oct 1, 2012 at 12:47 AM, Jannis Leidel <lei...@gmail.com> wrote:
>
> On 30.09.2012, at 23:41, Dan Loewenherz <d...@dlo.me> wrote:
>
> > Many backends don't support last modified times, and even if they all did, it's incorrect to assume that last modified time is an accurate heuristic for whether a file has already been uploaded or not.
>
> Well, but it's an accurate way to decide whether a file has been changed on the filesystem, and that's what collectstatic cares about. The storage backend *is* the API to extend that when needed, so feel free to use it.
>
> It's accurate *only* in certain situations. And on a distributed development team, I've run into a lot of issues with developers re-uploading files that have already been uploaded because they just recently updated their repo.
>
> A checksum is the only true accurate method to determine if a file has changed.
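
A minimal sketch of the checksum comparison, assuming the backend can hand back an MD5 hex digest for the stored file (for example, as mentioned elsewhere in this thread, S3 reportedly exposes one through ETags, at least for simple uploads). file_md5 and content_changed are hypothetical helper names; stripping surrounding double quotes is an assumption about how such a digest might be delivered.

```python
import hashlib


def file_md5(path, chunk_size=64 * 1024):
    """Return the MD5 hex digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_changed(path, remote_digest):
    """Compare the local digest with one reported by the remote backend."""
    # Assumption: the remote value may be wrapped in double quotes.
    return file_md5(path) != remote_digest.strip('"').lower()
```

Unlike a timestamp, this comparison gives the same answer after a fresh checkout that resets modification times, but it only works if the backend can report a digest without downloading the file.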
>
> Additionally, you didn't address my point that I quoted from. Storage backends don't just reflect filesystems--they could reflect files stored in a database, S3, etc. And some of these filesystems don't support last modified times.
>
> > It might be a better idea to let the backends decide when a file has been changed (instead of just calling the backend's last modified method).
>
> I don't understand, you can easily implement exactly that in the last_modified method if you'd like.
>
> This is a bit confusing...why call it last_modified when that doesn't necessarily reflect what it's doing? It would be more flexible to create two methods:
>
> def modification_identifier(self):
>     ...
>
> def has_changed(self):
>     ...
>
> Then, any backend could implement these however they might like, and collectstatic would have no excuse for uploading the same file more than once. Overloading last_modified to also do things like calculate MD5s seems a bit hacky to me, and confusing for any developer maintaining a custom storage backend that doesn't support last modified.
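
One way the two proposed hooks might look, as a sketch only. ChecksumStorage is a hypothetical in-memory backend, not part of Django, and the name and local_content parameters are assumptions added for illustration; the proposal above only gives the method names. Here the identifier is an MD5 digest, but a backend could return a timestamp or version tag instead.

```python
import hashlib


class ChecksumStorage:
    """Hypothetical in-memory backend exposing the two proposed hooks."""

    def __init__(self):
        self._files = {}  # file name -> stored bytes

    def save(self, name, content):
        self._files[name] = content

    def modification_identifier(self, name):
        # Backend-defined token describing the stored content;
        # None means the file has not been stored yet.
        content = self._files.get(name)
        if content is None:
            return None
        return hashlib.md5(content).hexdigest()

    def has_changed(self, name, local_content):
        # The file needs uploading when the stored identifier differs
        # from the identifier of the local content (or is missing).
        local_id = hashlib.md5(local_content).hexdigest()
        return self.modification_identifier(name) != local_id
```

A caller would then upload a file only when has_changed returns True, leaving the decision about what "changed" means entirely to the backend.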
>
> Dan
>
>