Parallelism?


Joel Dueck

Apr 22, 2019, 11:21:32 AM
to Pollen
Question for Matthew: the docs for raco setup [1] mention something interesting:

--jobs n, --workers n, or -j n

use up to n parallel processes. By default, raco setup uses (processor-count) jobs, which typically uses all of the machine’s processing cores.


Indeed when raco setup is rebuilding all the scribble docs, it maxes out my CPU. By contrast, when rendering a large batch of Pollen documents, my CPU util% is generally between 10–20%.

Would it be feasible to add some optional parallelism to Pollen, either as an argument to raco pollen render or as an alternate form of render-batch?

Authors would have to take care not to shoot themselves in the feet with it, of course, but the same is true of programming generally.

I can see where there may be something in Pollen's design that prevents any Pollen project from being amenable to parallel processing, but I don't know enough about the subject to figure out what that would be.

If you think it could work but you'd rather not add it or have to support it right now (please don’t do anything that would noticeably interrupt your work on Quad!), that would be useful information as well; it would mean it might be worth my while to try rolling my own experimental render-batch-parallel.

Matthew Butterick

Apr 22, 2019, 11:43:13 AM
to Joel Dueck, Pollen


On Apr 22, 2019, at 8:21 AM, Joel Dueck <dueck...@gmail.com> wrote:

Would it be feasible to add some optional parallelism to Pollen, either as an argument to raco pollen render or as an alternate form of render-batch? 

`raco pollen setup` operates in parallel, to a maximum of however many cores are on your machine (= `(processor-count)`). Run that before `raco pollen render`, and it's usually faster than running `raco pollen render` alone.

`raco pollen setup` only precompiles & caches the `doc` & `metas`. At one point I looked into parallelizing the rest of the render (that is, putting `doc` and `metas` into a template and writing output). IIRC what I discovered is that part is relatively fast. Also, within a project, rendered pages can depend intricately on each other (templates and preprocessor files etc). So if you do your parallel render in the wrong order, you can end up being slower (because you're re-rendering work you don't need to).  

Still, parallel rendering isn't hard to implement. So if `raco pollen setup` is underwhelming, I can take another look at it.

Joel Dueck

Apr 22, 2019, 2:26:28 PM
to Pollen
On my `thenotepad` repo [1], which has 72 `.poly.pm` files in its `posts/` subfolder, I tried the following sets of commands. Before each set, I did `rm posts/*.html`, `raco pollen reset` and `touch pollen.rkt`:
  1. Cold-start render: `raco pollen render -t html posts/*.poly.pm` = 166 seconds
  2. Setup + render:
    1. `raco pollen setup posts/` = 65 seconds
    2. `raco pollen render -t html posts/*.poly.pm` = 112 seconds
    3. Total time: 177 seconds
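
The measurement procedure above could be scripted roughly as follows. This is only a sketch: `timed`, `clean_state` and `run_benchmark` are hypothetical helper names (not part of Pollen), and it assumes it is run from the project root, where `pollen.rkt` and `posts/` live.

```shell
# timed LABEL CMD...: run CMD, then report elapsed wall-clock seconds on stderr.
timed() {
  label=$1; shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  end=$(date +%s)
  echo "$label: $((end - start))s" >&2
  return "$status"
}

# Reset to a cold state, using the same three commands listed above.
clean_state() {
  rm -f posts/*.html
  raco pollen reset
  touch pollen.rkt
}

# Run both scenarios back to back (assumes raco and Pollen are installed).
run_benchmark() {
  clean_state
  timed "cold-start render" raco pollen render -t html posts/*.poly.pm

  clean_state
  timed "setup" raco pollen setup posts/
  timed "render after setup" raco pollen render -t html posts/*.poly.pm
}
```

Calling `run_benchmark` prints one timing line per step; the whole-second resolution of `date +%s` is coarse but adequate for runs that take a minute or more.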
My CPU was maxed out during `raco pollen setup` and I could see it churning through them roughly four at a time, so that part is great.

But based on what you said I would have expected the subsequent render to take less time than the preheat.

Maybe there's something about my template file [2] that makes it especially slow?

I get what you're saying about tricky interdependencies. We have separate tools for handling this issue though (makefiles) so as long as it's possible to specify which files are included in a given batch, it might still be good to have the option of doing the whole process for the batch in parallel. (I.e., keep the current behavior the default)

(Incidentally: through trial and [non-] error I discovered I can specify a subdirectory after `raco pollen setup` and limit it to those files. I was glad to see this was possible, but it isn't in the docs!)

Matthew Butterick

Apr 22, 2019, 11:40:23 PM
to Joel Dueck, Pollen

On Apr 22, 2019, at 11:26 AM, Joel Dueck <dueck...@gmail.com> wrote:

But based on what you said I would have expected the subsequent render to take less time than the preheat. 


I sent you a PR with a fix. I think the problem is that in your `setup` module, you're using relative paths instead of absolute paths made with `define-runtime-path`. [1] The result is that the cache keys don't match when they should, so you're getting cache misses.




(Incidentally: through trial and [non-] error I discovered I can specify a subdirectory after `raco pollen setup` and limit it to those files. I was glad to see this was possible, but it isn't in the docs!)


The docs for `raco pollen setup` do say that it takes a directory argument. But you're right that it calls it a "different project directory" when it will in fact work on any directory. I will clarify the docs.

Matthew Butterick

Apr 23, 2019, 1:37:04 AM
to Joel Dueck, Pollen
Separately, I pushed a fix that makes the preheat operation go faster generally, so try that and see if it helps. 

dueck...@gmail.com

Apr 23, 2019, 12:02:20 PM
to Pollen
Thanks for these things! After updating Pollen and merging your PR, a big improvement:
  1. Cold-start render: `raco pollen render -t html posts/*.poly.pm` = 111 seconds (previously 166)
  2. Setup + render:
    1. `raco pollen setup posts/` = 42 sec (previously 65)
    2. `raco pollen render -t html posts/*.poly.pm` = 56 sec (previously 112)
    3. Total time: 98 seconds (previously 177)
Basically almost a 2x speedup :-o   Clearly my cache misses were a big part of the problem.

I'm still interested in possible gains from parallelizing the template step. My current project will likely start out with around 400 separate documents and grow from there, hence my interest in improving build times.

Matthew Butterick

Apr 23, 2019, 2:10:30 PM
to dueck...@gmail.com, Pollen
OK, I've pushed a parallel-rendering routine which you can access by passing the `-p` or `--parallel` flag to any invocation of `raco pollen render`. (I will leave it undocumented for now until it has proven itself).

If you just do `raco pollen render -p` instead of `raco pollen render`, then, as I would expect, it's more wasteful in terms of overall rendering time (because it can't rely on the cache as heavily), but the net time is shorter (because the waste is spread across multiple cores).

That being so, I got the best performance by doing `raco pollen setup` (which preheats the cache in parallel) followed by `raco pollen render -p` (which renders in parallel). By setting up the cache before starting any rendering, you minimize the possibility of duplication of work between parallel renders.
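
As a minimal sketch, that two-step sequence can be wrapped in a small shell function. The name `pollen_build_parallel` is hypothetical (it is not part of Pollen); the two `raco pollen` invocations are the ones described above.

```shell
# Preheat the cache in parallel, then render in parallel.
# The render step only runs if setup succeeds.
pollen_build_parallel() {
  raco pollen setup && raco pollen render -p
}
```

Run `pollen_build_parallel` from the project directory. Chaining with `&&` keeps a failed setup from being followed by a render against a half-built cache.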

I have seen "cache lock" warnings from time to time in the console — AFAIK these are just notices from the cache subsystem that it's doing its job by blocking attempts at simultaneous disk writes by parallel jobs. It shouldn't affect the result.

Try it and see what you think.

Matthew Butterick

Apr 23, 2019, 2:27:29 PM
to dueck...@gmail.com, Pollen
PS a couple quick tests, in both cases showing the parallel render takes about 2/3 the time.


`raco pollen setup` then `raco pollen render` = 141s
`raco pollen setup` then `raco pollen render -p` = 88s


`raco pollen setup` then `raco pollen render` = 285s
`raco pollen setup` then `raco pollen render -p` = 190s
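
The "about 2/3" figure can be checked from the timings above with a quick calculation. This is just a sketch; `ratio` is a hypothetical helper name.

```shell
# ratio PARALLEL SERIAL: print the parallel time as a fraction of the serial time.
ratio() {
  awk -v p="$1" -v s="$2" 'BEGIN { printf "%.2f\n", p / s }'
}

ratio 88 141    # first test
ratio 190 285   # second test
```

This prints 0.62 and 0.67, so the parallel render takes roughly two thirds as long in both tests.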

dueck...@gmail.com

Apr 23, 2019, 3:46:14 PM
to Pollen
PS a couple quick tests, in both cases showing the parallel render takes about 2/3 the time.

My results are very similar. After a `raco pollen setup`, doing `raco pollen render -p -t html posts/*.poly.pm` now takes 32 seconds as opposed to 56 (shorter than the setup time!). So the setup+render method is down to 74 seconds from 98 in my previous post. And I can do a `make all` from scratch in 82 seconds.

I'll have to do a little more reading/work on the PDF rendering side. On first attempt it looks like calling `system` to launch xelatex from inside a template doesn't do anything when rendering in parallel. But even if there's no way around that I can live with it.

dueck...@gmail.com

Apr 26, 2019, 1:59:41 PM
to Pollen
Further notes. So far this has been working great for me on the HTML side.

On the PDF side, if I use `render -p` to try and make PDFs in parallel, the processes finish really fast with no errors, and no output of any kind.

My PDF template follows the approach in the tutorials of making a temp folder, writing the LaTeX into it, running xelatex and then spitting out the bytes.

From testing and some unanswered threads in the Racket Users group, it seems like `current-directory` (and maybe `current-project-root`) evaluate to "/" specifically when being evaluated inside a template from within a Racket "place". Which means there is probably a permissions error happening pretty quick when the temporary folder is created, short-circuiting the render before it even gets to the `system` call. I need to do more work to figure out how to fix it, but I've been busy. It just seems like a caveat that should probably be mentioned in the docs eventually.


Matthew Butterick

Apr 26, 2019, 2:44:56 PM
to dueck...@gmail.com, Pollen
After some more fiddling, I think the problem was that the `current-poly-target` was not being communicated to the parallel rendering places, so they were just regenerating HTML when you asked for PDF. I just pushed a fix that addresses this. I don't have LaTeX on my machine right now, but let me know if it works for you.

dueck...@gmail.com

Apr 26, 2019, 10:59:19 PM
to Pollen
That was it!

On this same repo, after `raco pollen setup posts/`, doing `render -p -t pdf posts/*.pdf` now takes me 148 sec as opposed to 292 sec. (And most importantly, I get real PDFs again.)