Error building from a clean state and other cache issues

23 views
Skip to first unread message

Mariano Draghi

unread,
Apr 27, 2015, 10:48:27 PM4/27/15
to nikola-...@googlegroups.com
Hi,

I'm having a couple of issues with nikola's cache. I suspect there's something weird in my particular configuration, but I can't find what it is. Whenever I build the site from scratch, either after running "nikola clean" and "nikola forget", or after deleting the doit database and the cache/ and output/ folders, I get the following error:

[...]
. render_tags:output/assets/js/tag_cloud_data.json
. render_posts:cache/posts/rearrancando.html
. render_posts:cache/pages/home.html
Traceback (most recent call last):
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/site-packages/doit/doit_cmd.py", line 121, in run
return command.parse_execute(args)
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/site-packages/doit/cmd_base.py", line 84, in parse_execute
return self.execute(params, args)
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/site-packages/doit/cmd_base.py", line 278, in execute
return self._execute(**exec_params)
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/site-packages/doit/cmd_run.py", line 215, in _execute
return runner.run_all(self.control.task_dispatcher())
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/site-packages/doit/runner.py", line 237, in run_all
self.run_tasks(task_dispatcher)
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/site-packages/doit/runner.py", line 204, in run_tasks
self.process_task_result(node, catched_excp)
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/site-packages/doit/runner.py", line 175, in process_task_result
self.dep_manager.save_success(task)
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/site-packages/doit/dependency.py", line 413, in save_success
timestamp = os.path.getmtime(dep)
File "/home/mariano/.virtualenvs/nikola/lib/python3.4/genericpath.py", line 55, in getmtime
return os.stat(filename).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: 'cache/wordpress-migrated/de-tastil-a-leon.html'

The file (in this particular example "de-tastil-a-leon") is semi-random. It's always the first file doit picks from the wordpress-migrated folder, which is a folder I'm using for the posts I migrated from WordPress, and it's at the same level of posts and pages.

Anyway, if I run "nikola build" again immeditately after the error, it builds ok. The error appears only on the first run (after a clean start).

I get exactly the same behavior (i.e., the same error) both on my laptop and on the VPS.

The second issue is that Nikola always seems to be rebuilding more than it's necessary. I seems to rebuild almost the whole site as far as HTML rendering is concerned, even if I make two consecutive builds without touching anything. It is not exactly a problem (I can live with that, it's just a few more seconds), but it's weird and I think that's not the way it's supposed to work, right?

Any ideas?

Here's the code: https://github.com/chaghi/elblogdechaghi

And these are the details of the virtualenv I'm using:
blinker (1.3)
certifi (14.5.14)
docutils (0.12)
doit (0.27.0)
ghp-import (0.4.1)
ipython (3.1.0)
Jinja2 (2.7.3)
jsonschema (2.4.0)
livereload (2.3.2)
Logbook (0.9.0)
lxml (3.4.3)
Mako (1.0.1)
Markdown (2.6.2)
MarkupSafe (0.23)
micawber (0.3.2)
mistune (0.5.1)
natsort (3.5.6)
Nikola (7.3.1.1)
phpserialize (1.3)
Pillow (2.8.1)
pip (6.1.1)
ptyprocess (0.4)
pygal (1.7.0)
Pygments (2.0.2)
pyinotify (0.9.5)
Pyphen (0.9.1)
PyRSS2Gen (1.1)
python-dateutil (2.4.2)
pyzmq (14.6.0)
requests (2.6.2)
setuptools (15.1)
six (1.9.0)
smartypants (1.8.6)
terminado (0.5)
tornado (4.1)
typogrify (2.0.7)
Unidecode (0.4.17)
webassets (0.10.1)
Yapsy (1.11.23)

Thank you!

Chris Warrick

unread,
Apr 28, 2015, 10:19:05 AM4/28/15
to Nikola—Discuss
Cannot reproduce. You may want to run:

$ nikola build
$ nikola dumpdb > run1
$ nikola build
$ nikola dumpdb > run2
$ diff -u run1 run2

and read the diff output, to see what dependent files are the troublemakers.

Also, the issue is most likely caused by `pages/home.rst` and its post
list. You may want to set the date of that page to 1970-01-01
00:00:00 and avoid race conditions.


--
Chris Warrick <https://chriswarrick.com/>
PGP: 5EAAEA16

Roberto Alsina

unread,
Apr 28, 2015, 10:43:49 AM4/28/15
to nikola-...@googlegroups.com
I am starting to worry about post list, since it seems to cause lots of trouble.

Maybe we can just make it so pages with post lists are always marked as dirty?

That will mean you rebuild them every time, but at least it will avoid races.


--
You received this message because you are subscribed to the Google Groups "nikola-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nikola-discus...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mariano Draghi

unread,
Apr 28, 2015, 12:26:54 PM4/28/15
to nikola-...@googlegroups.com
On Tuesday, April 28, 2015 at 11:19:05 AM UTC-3, Chris Warrick wrote:
On Tue, Apr 28, 2015 at 4:48 AM, Mariano Draghi <mdr...@gmail.com> wrote:
> Hi,
>
> I'm having a couple of issues with nikola's cache. I suspect there's
> [...]
Cannot reproduce.  You may want to run:

$ nikola build
$ nikola dumpdb > run1
$ nikola build
$ nikola dumpdb > run2
$ diff -u run1 run2

and read the diff output, to see what dependent files are the troublemakers.


The diff is HUGE! (like 9MB). I uploaded a compressed version, if it helps... I really not sure what I should be looking for there:
http://chaghi.com.ar/blog/wp-content/uploads/build.diff.tgz

 
Also, the issue is most likely caused by `pages/home.rst` and its post
list.  You may want to set the date of that page to 1970-01-01
00:00:00 and avoid race conditions.


That did the trick!
Changing that date took care of the cache errors when building from scratch.

But nikola is still re-rendering almost the whole site's html output even if I change just one existing post. For example:
(nikola)mariano@chaghi:~/Devel/elblogdechaghi/site$ nikola build
[... end of successful build ...]
.  sitemap:output/sitemap.xml
.  sitemap:output/sitemapindex.xml
.  robots_file:output/robots.txt
(nikola)mariano@chaghi:~/Devel/elblogdechaghi/site$ echo -e "\n" >> posts/rearrancando.rst
(nikola)mariano@chaghi:~/Devel/elblogdechaghi/site$ nikola build
Scanning posts.....done!
.  render_sources:output/blog/rearrancando/index.rst
.  render_tags:output/tags/index.html
.  render_posts:cache/posts/rearrancando.html
.  copy_assets:output/assets/css/code.css
.  render_archive:output/2014/index.html
.  render_archive:output/2015/index.html
[...]
.  render_pages:output/blog/el-gran-truco/index.html
.  render_pages:output/blog/fede-de-visita/index.html
.  sitemap:output/sitemap.xml
(nikola)mariano@chaghi:~/Devel/elblogdechaghi/site$

I'm attaching the whole 2nd build output, as you can see, it re-renders al the html for the indexes, tags, posts, and the sitemap, apart from the expected things (the edited post, the RSS, and the indexes for the tag(s) affected).

--
Mariano

nikola-rebuild.txt

Mariano Draghi

unread,
Apr 28, 2015, 12:32:32 PM4/28/15
to nikola-...@googlegroups.com, ral...@kde.org
On Tuesday, April 28, 2015 at 11:43:49 AM UTC-3, Roberto Alsina wrote:
I am starting to worry about post list, since it seems to cause lots of trouble.

Maybe we can just make it so pages with post lists are always marked as dirty?

That will mean you rebuild them every time, but at least it will avoid races.


Sounds ok! I don't think anyone should be using that feature a lot, or on a lot of pages, so having those pages always treated as dirty shouldn't be a problem.

In fact, that problematic page (`pages/home.rst` in my case) was not being rebuilt, and because of that when I added the last post (`posts/rearrancando.rst`) it didn't get linked from the home until y manually made some dummy edit int `pages/home.rst` to force a rebuild of it.

--
Mariano

Mariano Draghi

unread,
Apr 28, 2015, 1:33:51 PM4/28/15
to Roberto Alsina, nikola-...@googlegroups.com, ral...@kde.org
On Tue, Apr 28, 2015 at 2:04 PM Roberto Alsina <ral...@netmanagers.com.ar> wrote:
On 28/04/15 13:32, Mariano Draghi wrote:
In fact, that problematic page (`pages/home.rst` in my case) was not being rebuilt, and because of that when I added the last post (`posts/rearrancando.rst`) it didn't get linked from the home until y manually made some dummy edit int `pages/home.rst` to force a rebuild of it.


That looks like a huge bug. If someone can provide me steps to reproduce this sort of things from scratch I will take a look in my next free time :-(



I'll try to give it a shot later using a new site, and see if it's reproducible in a predictable way.

Mariano Draghi

unread,
Apr 28, 2015, 6:09:07 PM4/28/15
to Roberto Alsina, nikola-...@googlegroups.com, ral...@kde.org
Hi,

I could reproduce the problem starting a new site from scratch.

1) `nikola init site`, accepting all the defaults, no comments system.

2) Change the configuration as follows:
a) use "" as the destination folder for stories
b) change `INDEX_PATH` to "posts"

3) Add a couple off dummy posts

4) Add a new "home page" for the site
a) Set its slug to `index`, so it works as a proper home page
b) Change its date to "1970-01-01 00:00:00" to avoid the race condition that was mentioned before (maybe this is unrelated... but still).
c) Add a `post-list` directive to include the last 5 posts

5) `nikola build`
The whole site is built as expected.

6) Add a new post

7) `nikola build`
The file output/index.html (i.e., the home.rst rendering) is NOT rebuilt, and as such, the post list in it is not updated;

I'm attaching:
- conf.py
- test.log: the stdout of the whole process (stripping some of the first build), with a couple of comments highlighting the points where I made changes
- packages.txt: my list of packages (`pip list` from the virtualenv I'm using). This is on Ubuntu 4.10, with Python 3.4, with `nikola[extras]` installed in a virtualenv.

If you confirm this is a bug and you want me to open an issue in GitHub, please let me know.

Also, I *might* get at shot at it if someone give me some pointers. My python-fu is way too rusty nowadays :'(, but who knows...

Thanks!

conf.py
test.log
packages.txt

Mariano Draghi

unread,
Apr 28, 2015, 9:37:42 PM4/28/15
to nikola-...@googlegroups.com
Aaaaaand it's even easier to reproduce. Which is good news, I suppose.

Forget steps 2) and 4.a), i.e., the bug is reproducible with a 100% default `nikola init`, and the page that has the `post-list` directive does not need to have its slug set to "index".

So, in a nutshell:
- have some posts
- have a page/story with a post-list
- add some more posts
- the page with the post-list is not updated to reflex the new posts

I can confirm that you still need the "1970-01-01 00:00:00" workaround in the page that has the `post-list` to avoid the race condition error on the first build after a forget/clean.

Chris Warrick

unread,
Apr 29, 2015, 11:48:26 AM4/29/15
to Nikola—Discuss
On Wed, Apr 29, 2015 at 3:37 AM, Mariano Draghi <mdr...@gmail.com> wrote:
> Aaaaaand it's even easier to reproduce. Which is good news, I suppose.
>
> Forget steps 2) and 4.a), i.e., the bug is reproducible with a 100% default
> `nikola init`, and the page that has the `post-list` directive does not need
> to have its slug set to "index".
>
> So, in a nutshell:
> - have some posts
> - have a page/story with a post-list
> - add some more posts
> - the page with the post-list is not updated to reflex the new posts
>
> I can confirm that you still need the "1970-01-01 00:00:00" workaround in
> the page that has the `post-list` to avoid the race condition error on the
> first build after a forget/clean.

For this post-list bug, you may open a GitHub issue. A fix would
require a few architectural changes (like adding a meta field to
specify “always rebuild”).

For the bug with files being rebuilt when not needed: please try
reproducing it with all your posts in a clean site — maybe your config
or theme is doing something bad?

Roberto Alsina

unread,
Apr 29, 2015, 1:11:37 PM4/29/15
to nikola-...@googlegroups.com
I think I can do a simple fix by using rst's record-dependencies to pass along some extra stuff.

Mariano Draghi

unread,
Apr 29, 2015, 1:20:32 PM4/29/15
to nikola-...@googlegroups.com


On Wednesday, April 29, 2015 at 12:48:26 PM UTC-3, Chris Warrick wrote:
On Wed, Apr 29, 2015 at 3:37 AM, Mariano Draghi <mdr...@gmail.com> wrote:
> Aaaaaand it's even easier to reproduce. Which is good news, I suppose.
[...]
For this post-list bug, you may open a GitHub issue.  A fix would
require a few architectural changes (like adding a meta field to
specify “always rebuild”).


Done
* Issue #1671: Pages with `post-list` directives are not being rebuilt when adding new posts - https://github.com/getnikola/nikola/issues/1671
* Issue #1672: Race condition building site from a clean state when there is a page with a `post-list` directive - https://github.com/getnikola/nikola/issues/1672
 
 
For the bug with files being rebuilt when not needed: please try
reproducing it with all your posts in a clean site — maybe your config
or theme is doing something bad?


Ok, I'll give it a shot later and report back.

Thank you!

--
Mariano

Mariano Draghi

unread,
Apr 29, 2015, 9:48:47 PM4/29/15
to nikola-...@googlegroups.com

I've found the problem... kind of... but can't explain it. It's definetely related to the FAVICONS set, when it has more than one tuple.

I copied my posts and pages to a clean site as suggested, and tried to reproduce the problem. I couldn't. I started to copy bit by bit other things (files, assets, the custom theme, etc.), and adjusting the bits of `conf.py` accordingly. At one point the "copied" site was like 90% of my real site, but still couldn't reproduce the problem. So I started to go through all the differences between my real `conf.py` and the "clean" `conf.py`, and eventually I was able to find out that the FAVICONS set was the culprit.

If I commented out the FAVICONS set, everything worked as expected. As soon as I uncommented the set, the bug was triggered. Then I found out that with just one tuple in the set I could not reproduce the problem either.

So I started a new test from scratch, to come up with a very minimal set of conditions to reproduce the problem 100% of the time... and I couldn't :( I found some situations in which the bug *might* be triggered. With a small-ish site you can run N consecutive builds without triggering the problem, and suddenly with the consecutive build N+1 (almost) all the html is regenerated. With a bigger site like mine (~500 posts), the unnecessary rebuild is triggered almost always. Sounds like another weird race condition, don't you think?

Anyway, after multiple tests both with my real site and minimal test sites created from scratch, I've found this:
- I could not trigger the bug with an empty FAVICONS set;
- I could not trigger the bug with a FAVICONS set with just 1 tuple;
- When the FAVICONS set had more than 1 tuple, the bug could be triggered almost always in large-ish sites, and sometimes with small-ish sites
- And for the fun part: I could NEVER trigger the bug if I turned the FAVICONS set in a tuple of tuples (like POSTS or PAGES), no matter how many tuples I put inside, and no matter the size of the site being built. Don't ask me why I tried this, even less why the type of the collection affects the result. :p Well, yes, sets are mutable, tuples are not, so there's that... but I can't explain it.

So, to sum this up: it would be nice if someone else tried this:
- Init a new site
- Add a couple of posts
- Add a couple of icons/images in the files folder

Then, for each one of this cases, do several consecutive `nikola build`:
- FAVICONS set empty/commented out
- FAVICONS set with just one tuple (pointing to one of the images)
- FAVICONS set with 2 tuples (pointing to both images)
- same as the previous case, but with FAVICONS defined as a tuple (replacing the "{}" by "()")

The size of the site seems to affect the result, so with a small site you might need to run several consecutive `nikola build` until the bug is triggered (i.e., most of the html is re-rendered even if you didn't touch anything in between).

Thanks!


Roberto Alsina

unread,
May 1, 2015, 4:41:37 PM5/1/15
to nikola-...@googlegroups.com
And just a FYI if someone else was reading about this here: https://github.com/getnikola/nikola/pull/1673#partial-pull-merging
Reply all
Reply to author
Forward
0 new messages