[Django] #19021: collectstatic should only copy files if they have changed or don't exist in destination


Django

Sep 24, 2012, 8:03:05 PM
to django-...@googlegroups.com
#19021: collectstatic should only copy files if they have changed or don't exist in
destination
-------------------------------+-------------------------
Reporter: dloewenherz | Owner: dloewenherz
Type: New feature | Status: new
Component: Uncategorized | Version: 1.4
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+-------------------------
When running `./manage.py collectstatic`, all files in all static
directories are copied to the destination managed by the storage backend
named in `STATICFILES_STORAGE`, regardless of whether they have already
been copied.

I propose that collectstatic should only copy files to the destination if
they have changed or don't yet exist. I wrote my own solution which
doesn't incorporate staticfiles, but I'd like to see this in Django
proper. Without this feature, it can take ages to upload static media for
a large project. It makes sense to only update those assets which have
changed between deploys.

I currently solve this problem by creating a file containing metadata of
all the static media at the root of the destination. This file is a JSON
object that contains file paths as keys and checksums as values. When an
upload is started, the uploader checks to see if the file path exists as a
key in the dictionary. If it does, it checks to see if the checksums have
changed. If they haven't changed, the uploader skips the file. At the end
of the upload, the checksum file is updated on the destination.
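
The manifest approach described above could be sketched roughly as
follows. This is a minimal illustration, not the reporter's actual code:
the function names (`file_checksum`, `plan_upload`), the choice of SHA-256,
and the manifest being a plain dict are all assumptions.

```python
import hashlib
import os


def file_checksum(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path` (assumed algorithm)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large assets are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_upload(static_root, old_manifest):
    """Compare local files against the previous manifest.

    Returns (to_upload, new_manifest); paths are relative to `static_root`.
    """
    to_upload = []
    new_manifest = {}
    for dirpath, _dirnames, filenames in os.walk(static_root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            rel_path = os.path.relpath(full_path, static_root).replace(os.sep, "/")
            checksum = file_checksum(full_path)
            new_manifest[rel_path] = checksum
            # Upload only when the path is new or its checksum differs.
            if old_manifest.get(rel_path) != checksum:
                to_upload.append(rel_path)
    return sorted(to_upload), new_manifest
```

After the files in `to_upload` have been sent, `new_manifest` would be
serialized (for example with `json.dump`) and written to the root of the
destination, replacing the previous manifest.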

--
Ticket URL: <https://code.djangoproject.com/ticket/19021>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Sep 27, 2012, 1:54:58 PM
to django-...@googlegroups.com
#19021: collectstatic should only copy files if they have changed or don't exist in
destination
-------------------------------+---------------------------------------
Reporter: dloewenherz | Owner: dloewenherz
Type: New feature | Status: new
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+---------------------------------------
Changes (by anonymous):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

This does not sound right. When I run collectstatic it does not copy
unmodified files:
{{{
$ ./manage.py collectstatic
You have requested to collect static files at the destination
location as specified in your settings file.

This will overwrite existing files.
Are you sure you want to do this?

Type 'yes' to continue, or 'no' to cancel: yes

0 static files copied to '/var/www/html/static' (83 unmodified).
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:1>

Django

Sep 27, 2012, 2:24:24 PM
to django-...@googlegroups.com
#19021: collectstatic should only copy files if they have changed or don't exist in
destination
-------------------------------+---------------------------------------
Reporter: dloewenherz | Owner: dloewenherz
Type: New feature | Status: closed
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+---------------------------------------
Changes (by aaugustin):

* status: new => closed
* resolution: => needsinfo


Comment:

I agree with comment #1.

--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:2>

Django

Sep 27, 2012, 2:35:31 PM
to django-...@googlegroups.com
#19021: collectstatic should only copy files if they have changed or don't exist in
destination
-------------------------------+---------------------------------------
Reporter: dloewenherz | Owner: dloewenherz
Type: New feature | Status: closed
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+---------------------------------------

Comment (by dloewenherz):

Oy...I feel dumb now.

Must have been using an older version. Sorry for wasting everyone's time!

--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:3>

Django

Sep 27, 2012, 3:37:00 PM
to django-...@googlegroups.com
#19021: collectstatic should only copy files if they have changed or don't exist in
destination
-------------------------------+---------------------------------------
Reporter: dloewenherz | Owner: dloewenherz
Type: New feature | Status: closed
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+---------------------------------------
Changes (by streeter):

* cc: django@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:4>

Django

Sep 27, 2012, 6:23:46 PM
to django-...@googlegroups.com
#19021: collectstatic should only copy files if they have changed or don't exist in
destination
-------------------------------+---------------------------------------
Reporter: dloewenherz | Owner: dloewenherz
Type: New feature | Status: reopened
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+---------------------------------------
Changes (by dloewenherz):

* status: closed => reopened
* resolution: needsinfo =>


Comment:

Weird, I can't reproduce this (initially I commented saying I might have
overlooked something).

First of all, when working on a team with multiple people, the current
solution doesn't work. Every computer is on its own. Here's a real world
scenario:

1. Person A runs collectstatic. All files are uploaded.
2. Person B makes a change to one file and runs collectstatic. Again, all
files are uploaded.

This is untenable for a team size N > 1.

Secondly, there is a bug in the heuristic used to identify changed files.
I just changed a JS file in one of my static folders after running
collectstatic earlier. When I run collectstatic again, nothing is
re-uploaded, even though the file has changed.

If you look at the source, this is because collectstatic only checks
whether a file has changed by checking its path name. The contents of the
files themselves are ignored.

--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:5>

Django

Sep 27, 2012, 11:47:43 PM
to django-...@googlegroups.com
#19021: collectstatic should only copy files if they have changed or don't exist in
destination
-------------------------------+---------------------------------------
Reporter: dloewenherz | Owner: dloewenherz
Type: New feature | Status: reopened
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+---------------------------------------

Comment (by dloewenherz):

Alright--I'm being educated here...I missed a bunch of things, but there
are still a couple of problems. I noted them in an email to the developers
list.

The heuristic is not file name. Last modified time is the heuristic, but
some backends don't have a reliable implementation of it (or don't support
it at all) and therefore this feature doesn't work for those backends.
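
The last-modified heuristic amounts to a comparison like the one below.
This is a simplified sketch over local paths, not Django's actual source:
`should_copy` is a hypothetical helper, and the real command asks each
storage backend for a modified time rather than calling
`os.path.getmtime`.

```python
import os


def should_copy(source_path, target_path):
    """Return True when the target is missing or older than the source."""
    if not os.path.exists(target_path):
        return True
    # A target at least as new as the source is treated as unmodified.
    return os.path.getmtime(target_path) < os.path.getmtime(source_path)
```

A backend that cannot report a trustworthy modified time gives a check of
this kind nothing reliable to compare.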

Additionally, in any sort of source control, when a user updates their
repo, local files that were updated remotely show up as modified at the
time the repo is cloned or updated, not when the file was actually last
saved by the last author. You then have the same scenario I pointed to
earlier: when multiple people work on a project, they will re-upload the
same files multiple times.

For the reasons noted above, I would propose moving towards checksums and
away from last modified times to check if a file has been modified.
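
To illustrate the difference, the sketch below rewrites a file with
identical bytes and a newer timestamp, roughly what a checkout or pull can
do, and then compares both signals. The helper names (`sha256_of`,
`simulate_checkout`), the timestamps, and the use of SHA-256 are
assumptions made for the example.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def simulate_checkout(path, new_mtime):
    """Rewrite a file with identical bytes and a new timestamp."""
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(data)
    os.utime(path, (new_mtime, new_mtime))


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "app.js")
    with open(path, "w") as fh:
        fh.write("console.log('hello');\n")
    os.utime(path, (1_000_000_000, 1_000_000_000))

    before = (os.path.getmtime(path), sha256_of(path))
    simulate_checkout(path, 1_100_000_000)
    after = (os.path.getmtime(path), sha256_of(path))

    mtime_changed = before[0] != after[0]
    checksum_changed = before[1] != after[1]
    print(mtime_changed, checksum_changed)  # prints: True False
```

A timestamp-based check would re-upload this file; a checksum-based check
would skip it.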

--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:6>

Django

Oct 8, 2012, 12:54:52 PM
to django-...@googlegroups.com
#19021: collectstatic should support checksums as method to determine a file's
changed state
-------------------------------+---------------------------------------
Reporter: dloewenherz | Owner: dloewenherz
Type: New feature | Status: closed
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+---------------------------------------
Changes (by ptone):

* status: reopened => closed
* resolution: => wontfix


Comment:

In an ideal world, checksums would be a perfectly viable way to compare
files for collectstatic. In practice, however, they can't be well
supported by the collectstatic API.

While md5 may be somewhat common, it is neither universal nor standard.
For cloud-based storage backends to support a comparison metric other than
modification times for use by collectstatic, they would need to provide
that value as a remote/API call. It would do no good if the only way to
support this involved retrieving the remote object to get a hash of it
(even if you had drastically asymmetrical bandwidth, this is just poor
design).

Checksums have a compute cost that modification dates don't, so a checksum
comparison would always need to be an alternate, not the primary,
comparison.
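
One way to read "alternate, not primary" is a two-stage check: the cheap
timestamp comparison runs first, and a checksum is computed only when the
timestamps suggest a change. The sketch below assumes the destination's
checksum is already known (for example from a backend API call or a stored
manifest); `needs_upload` and its parameters are hypothetical and not part
of collectstatic.

```python
import hashlib
import os


def needs_upload(source_path, target_mtime, target_checksum):
    """Decide whether `source_path` must be uploaded.

    `target_mtime` and `target_checksum` describe the destination copy;
    pass None for both when the destination has no such file.
    """
    if target_mtime is None:
        return True  # nothing at the destination yet
    if os.path.getmtime(source_path) <= target_mtime:
        return False  # cheap path: timestamps say the target is current
    if target_checksum is None:
        return True  # no fingerprint available, so trust the timestamp
    # Expensive path: hash only when the timestamp suggests a change.
    with open(source_path, "rb") as fh:
        source_checksum = hashlib.sha256(fh.read()).hexdigest()
    return source_checksum != target_checksum
```

With this ordering, files whose timestamps already match never pay the
hashing cost, and the checksum only overrides a timestamp that claims a
change.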

Without a good universal way for a range of storage backends to provide
some sort of fingerprint/hash, there is no good way for collectstatic to
take advantage of that approach.

In cases where your modification dates are rendered invalid because of
some specific environment setup (like the git-based team issue), there are
a couple of workarounds. Perhaps the best is to use collectstatic locally,
where the performance of copying every file isn't as bad, and then use a
sync tool (such as rsync --checksum) or something home-built that can do
the checksum-based comparison knowing the specific remote storage you are
working with in your project.
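
The collect-locally-then-sync workaround might be scripted along these
lines. This is a sketch under assumptions: `build_commands` and
`deploy_static` are hypothetical helpers, `STATIC_ROOT` is assumed to
point at the local directory passed in, and the remote target is a
placeholder for whatever host and path your project uses.

```python
import subprocess


def build_commands(local_static_root, remote_target):
    """Return the two commands for the collect-then-sync workaround."""
    # --noinput skips collectstatic's interactive confirmation prompt.
    collect = ["python", "manage.py", "collectstatic", "--noinput"]
    # The trailing slash makes rsync copy the directory's contents,
    # not the directory itself.
    source = local_static_root.rstrip("/") + "/"
    # --checksum compares file contents instead of size and mtime.
    sync = ["rsync", "--archive", "--checksum", source, remote_target]
    return collect, sync


def deploy_static(local_static_root, remote_target):
    """Run collectstatic locally, then push files whose checksums differ."""
    for command in build_commands(local_static_root, remote_target):
        subprocess.run(command, check=True)
```

For example, `deploy_static("build/static", "deploy@example.com:/var/www/html/static/")`
would collect into the local directory and then transfer only the files
whose contents differ on the remote side.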

A couple of other links that might prove useful for those working with git
(provided without endorsement or review):

http://gitorious.org/sstamp

http://repo.or.cz/w/metastore.git

--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:7>

Django

Aug 27, 2024, 2:08:12 PM
to django-...@googlegroups.com
#19021: collectstatic should support checksums as method to determine a file's
changed state
--------------------------------+------------------------------------------
Reporter: Dan Loewenherz | Owner: Dan Loewenherz
Type: New feature | Status: closed
Component: Uncategorized | Version: 1.4
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+------------------------------------------
Comment (by Natalia Bidart):

Ticket #35709 was a duplicate.
--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:8>