Feature request: collectstatic shouldn't recopy files that already exist in destination
From: Dan Loewenherz <d...@dlo.me>
Date: Sat, 6 Oct 2012 10:48:56 -0700
Message-ID: <CAOKWLrGhKnhR7tiBhCO1UPp2-v-Ri5J1bTPyn-m2oPJtU1K...@mail.gmail.com>
Subject: Re: Feature request: collectstatic shouldn't recopy files that already exist in destination
To: django-developers@googlegroups.com
Content-Type: multipart/alternative; boundary=f46d0438eba99d51fb04cb679c13
--f46d0438eba99d51fb04cb679c13
Content-Type: text/plain; charset=UTF-8
Hey Jannis,
On Mon, Oct 1, 2012 at 12:47 AM, Jannis Leidel <lei...@gmail.com> wrote:
>
> On 30.09.2012, at 23:41, Dan Loewenherz <d...@dlo.me> wrote:
>
> > Many backends don't support last modified times, and even if they all
> > did, it's incorrect to assume that last modified time is an accurate
> > heuristic for whether a file has already been uploaded or not.
>
> Well but it's an accurate way to decide whether a file has been changed on
> the filesystem, and that's what collectstatic cares about. The storage
> backend *is* the API to extend that when needed, so feel free to use it.
>
It's accurate *only* in certain situations. On a distributed development
team, I've run into a lot of issues with developers re-uploading files that
have already been uploaded, just because they recently updated their repo,
which gives the files new modification times.

A checksum is the only truly accurate way to determine whether a file has
changed.

Additionally, you didn't address the point in the passage I quoted above.
Storage backends don't just reflect filesystems; they could reflect files
stored in a database, S3, etc. And some of these backends don't support
last modified times.
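
To make the checksum point concrete, here is a minimal sketch using only the Python standard library. It rewrites a file's timestamp without touching its bytes, roughly what a fresh checkout can do. The helper names (md5_of, mtime_vs_checksum_demo) and the file name are made up for illustration and are not part of Django.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mtime_vs_checksum_demo():
    """Give a file a newer mtime without changing its content.

    Returns ((mtime, md5) before, (mtime, md5) after).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "site.css")  # hypothetical static file
        with open(path, "wb") as fh:
            fh.write(b"body { margin: 0; }\n")
        before = (os.path.getmtime(path), md5_of(path))

        # Simulate a checkout: identical bytes, timestamp one hour later.
        later = before[0] + 3600
        os.utime(path, (later, later))
        after = (os.path.getmtime(path), md5_of(path))
    return before, after
```

A comparison based on modification time would treat this file as changed and upload it again, while a comparison of the two digests would not.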
> > It might be a better idea to let the backends decide when a file has been
> > changed (instead of just calling the backend's last modified method).
>
> I don't understand, you can easily implement exactly that in the
> last_modified method if you'd like.
>
This is a bit confusing... why call it last_modified when that doesn't
necessarily reflect what it's doing? It would be more flexible to create
two methods:

    def modification_identifier(self):
        ...

    def has_changed(self):
        ...

Then, any backend could implement these however it likes, and
collectstatic would have no excuse for uploading the same file more than
once. Overloading last_modified to also do things like calculate MD5
checksums seems a bit hacky to me, and confusing for any developer
maintaining a custom storage backend that doesn't support last modified
times.
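
A rough sketch of how the two methods could fit together, under several assumptions: InMemoryStorage is a hypothetical stand-in and not Django's Storage API, the `name` and `content` parameters are added here for illustration (the proposal above only names the methods), and collect() is a toy loop, not the real collectstatic command.

```python
import hashlib


class InMemoryStorage:
    """Hypothetical backend that keeps files in a dict (name -> bytes)."""

    def __init__(self):
        self._files = {}

    def save(self, name, content):
        self._files[name] = content

    def exists(self, name):
        return name in self._files

    def modification_identifier(self, name):
        """Return a token that changes whenever the stored content changes.

        This backend uses an MD5 digest; another backend could return a
        timestamp, an ETag, or a version number instead.
        """
        return hashlib.md5(self._files[name]).hexdigest()

    def has_changed(self, name, content):
        """True if `name` is missing or its stored bytes differ from `content`."""
        if not self.exists(name):
            return True
        return self.modification_identifier(name) != hashlib.md5(content).hexdigest()


def collect(storage, sources):
    """Upload only the files the backend reports as changed.

    `sources` maps file names to bytes; returns the names that were uploaded.
    """
    uploaded = []
    for name, content in sorted(sources.items()):
        if storage.has_changed(name, content):
            storage.save(name, content)
            uploaded.append(name)
    return uploaded
```

With this shape, a second collect() over unchanged sources uploads nothing, and the decision about what counts as "changed" stays entirely inside the backend.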
Dan
--f46d0438eba99d51fb04cb679c13--