I propose that collectstatic should only copy files to the destination if
they have changed or don't yet exist. I wrote my own solution which
doesn't incorporate staticfiles, but I'd like to see this in Django
proper. Without this feature, it can take ages to upload static media for
a large project. It makes sense to only update those assets which have
changed between deploys.
I currently solve this problem by creating a file containing metadata of
all the static media at the root of the destination. This file is a JSON
object that contains file paths as keys and checksums as values. When an
upload is started, the uploader checks to see if the file path exists as a
key in the dictionary. If it does, it checks to see if the checksums have
changed. If they haven't changed, the uploader skips the file. At the end
of the upload, the checksum file is updated on the destination.
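The manifest approach described above can be sketched roughly as follows. This is an illustration only, not the reporter's actual code and not part of Django: the manifest name `checksums.json`, the helper names, the choice of MD5, and the `upload` callable are all assumptions made for the example, and the destination is treated as a local directory so the sketch stays runnable.

```python
import hashlib
import json
import shutil
from pathlib import Path

# Hypothetical manifest file name; the ticket does not name the file.
MANIFEST_NAME = "checksums.json"


def file_checksum(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_file(source, target):
    """Stand-in uploader: copy to a local destination directory."""
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)


def sync_changed(source_dir, dest_dir, upload=copy_file):
    """Upload new or changed files and return their relative paths."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    manifest_path = dest_dir / MANIFEST_NAME

    # Load the path -> checksum mapping left behind by the previous run.
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())

    uploaded = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(source_dir).as_posix()
        checksum = file_checksum(path)
        if manifest.get(key) == checksum:
            continue  # Same path, same checksum: skip the upload.
        upload(path, dest_dir / key)
        manifest[key] = checksum
        uploaded.append(key)

    # Write the updated manifest back to the destination at the end.
    dest_dir.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return uploaded
```

Because the manifest lives at the destination rather than on any one machine, a second person running the same sync would skip files that someone else already uploaded.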
--
Ticket URL: <https://code.djangoproject.com/ticket/19021>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Comment:
This does not sound right. When I run collectstatic it does not copy
unmodified files:
{{{
$ ./manage.py collectstatic
You have requested to collect static files at the destination
location as specified in your settings file.
This will overwrite existing files.
Are you sure you want to do this?
Type 'yes' to continue, or 'no' to cancel: yes
0 static files copied to '/var/www/html/static' (83 unmodified).
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:1>
* status: new => closed
* resolution: => needsinfo
Comment:
I agree with comment #1.
--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:2>
Comment (by dloewenherz):
Oy...I feel dumb now.
Must have been using an older version. Sorry for wasting everyone's time!
--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:3>
* cc: django@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:4>
* status: closed => reopened
* resolution: needsinfo =>
Comment:
Weird, I can't reproduce this (initially I commented saying I might have
overlooked something).
First of all, when working on a team with multiple people, the current
solution doesn't work. Every computer is on its own. Here's a real world
scenario:
1. Person A runs collectstatic. All files are uploaded.
2. Person B makes a change to one file and runs collectstatic. Again, all
files are uploaded.
This is untenable for a team size N > 1.
Secondly, there is a bug in the heuristic used to identify changed files.
I just changed a js file in one of my static folders after running
collectstatic earlier. When I run collectstatic again, nothing is
re-uploaded, even though the file has changed.
If you look at the source, this is because collectstatic only checks if a
file has changed by checking its path name. The contents of the files
themselves are ignored.
--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:5>
Comment (by dloewenherz):
Alright--I'm being educated here...I missed a bunch of things, but there
are still a couple of problems. I noted them in an email to the developers
list.
The heuristic is not file name. Last modified time is the heuristic, but
some backends don't have a reliable implementation of it (or don't support
it at all) and therefore this feature doesn't work for those backends.
Additionally, in any sort of source control, when a user updates their
repo, local files that were updated remotely show up as modified at the
time the repo is cloned or updated, not when the file was actually last
saved by the last author. You then have the same scenario I pointed to
earlier: when multiple people work on a project, they will re-upload the
same files multiple times.
For the reasons noted above, I would propose moving towards checksums and
away from last modified times to check if a file has been modified.
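To illustrate the source-control point, here is a small self-contained sketch. It is not Django code: the helper names are made up for the example, and a fresh checkout is only simulated by rewriting identical bytes and moving the modification time forward.

```python
import hashlib
import os
import tempfile


def content_checksum(path):
    """Return the hex MD5 digest of the file's contents."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def simulate_fresh_checkout(path):
    """Rewrite the same bytes and bump the mtime, as a checkout might."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(data)
    later = os.stat(path).st_mtime + 3600  # one hour later
    os.utime(path, (later, later))


def compare_heuristics():
    """Return (mtime_says_changed, checksum_says_changed) for one file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.js")
        with open(path, "w") as handle:
            handle.write("console.log('hi');\n")

        mtime_before = os.stat(path).st_mtime
        checksum_before = content_checksum(path)

        simulate_fresh_checkout(path)

        mtime_changed = os.stat(path).st_mtime > mtime_before
        checksum_changed = content_checksum(path) != checksum_before
        return mtime_changed, checksum_changed


if __name__ == "__main__":
    print(compare_heuristics())
```

In this simulation the modification time reports the file as changed while the checksum does not, which is the mismatch that leads to repeated uploads across a team.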
--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:6>
* status: reopened => closed
* resolution: => wontfix
Comment:
In an ideal world, checksums would be a perfectly viable way to compare
files for collectstatic. In practice, however, they can't be well
supported by the collectstatic API.
While md5 may be somewhat common, it is neither universal nor standard.
For cloud-based storage backends to support a comparison metric other
than modification times for use by collectstatic, they would need to
provide that value through a remote API call. It would do no good if the
only way to support this involved retrieving the remote object to get a
hash on it (even if you had drastically asymmetrical bandwidth, this is
just poor design).
Checksums have a compute cost that modification dates don't, so a
checksum comparison would always need to be an alternative, not the
primary comparison.
Without a good universal way for a range of storage backends to provide
some sort of fingerprint/hash, there is no good way for collectstatic to
take advantage of that approach.
In cases where your modification dates are rendered invalid because of
some specific environment setup (like the git-based team issue), there
are a couple of workarounds. Perhaps the best is to use collectstatic
locally, where the performance of copying every file isn't as bad, and
then use a sync tool (such as rsync --checksum) or something home-built
that can do the checksum-based comparison knowing the specific remote
storage you are working with in your project.
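As a rough sketch of the rsync workaround, assuming collectstatic has already written to a local directory and the destination is reachable by rsync (this does not apply to cloud storage backends without rsync access). The function names, the local path, and the host `deploy@example.com` are hypothetical; only standard rsync options are used.

```python
import subprocess


def build_rsync_command(local_static_root, remote_target, dry_run=False):
    """Build an rsync command that compares files by checksum, not mtime."""
    # A trailing slash makes rsync copy the directory's contents rather
    # than the directory itself.
    source = local_static_root.rstrip("/") + "/"
    command = ["rsync", "--archive", "--checksum", "--compress"]
    if dry_run:
        command.append("--dry-run")  # Report what would change, copy nothing.
    command += [source, remote_target]
    return command


def push_static(local_static_root, remote_target, dry_run=False):
    """Run rsync; raises CalledProcessError if rsync exits non-zero."""
    command = build_rsync_command(local_static_root, remote_target, dry_run)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths and host, shown only for illustration.
    print(build_rsync_command(
        "build/static", "deploy@example.com:/var/www/html/static", dry_run=True
    ))
```

The idea is to run `./manage.py collectstatic` against the local directory first, then call something like `push_static(...)`, so only files whose contents differ are transferred.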
A couple other links that might prove useful for those working with git
(provided without endorsement or review)
http://repo.or.cz/w/metastore.git
--
Ticket URL: <https://code.djangoproject.com/ticket/19021#comment:7>