Pyppeteer for automatic static previews of interactive elements

Rob Beezer

Jun 28, 2022, 10:44:05 PM
to prete...@googlegroups.com
I've swapped out the node/JavaScript "pageres" package for the Python
"pyppeteer" package. That's progress! No npm and it'll play nice with the CLI.
;-)

A two-step install into your Python virtual environment:

https://pretextbook.org/doc/guide/html/pip-install.html

Then take off your author hat, put on your developer hat, and test with the
pretext/pretext script with something like

pretext/pretext -vv -d /tmp/foo -c preview -p \
    examples/sample-article/publication.xml examples/sample-article/sample-article.xml

Only that won't work, since three CalcPlot3D interactives consistently crash the
process (hmmm, maybe I need to catch exceptions at that point?). So try on your
own project and we can see what else is bad before a new CLI ships. Or debug
why CalcPlot3D is a problem.

Rob
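
On the question of catching exceptions: a minimal sketch of the idea, not the code in the pretext script. The names snapshot_one and snapshot_all are hypothetical, and snapshot_one is a stand-in that simulates a crash instead of driving a real browser. The point is only that a failure on one interactive is recorded and the loop moves on.

```python
import asyncio


async def snapshot_one(url):
    # Hypothetical stand-in for the real browser work; it fails for
    # some URLs to imitate an interactive that crashes the page.
    if "calcplot3d" in url:
        raise RuntimeError("page crashed: " + url)
    return url + ".png"


def snapshot_all(urls):
    """Try every URL; collect failures instead of stopping at the first."""
    done, failed = [], []
    for url in urls:
        try:
            done.append(asyncio.run(snapshot_one(url)))
        except Exception as exc:  # deliberately broad: any crash is recorded
            failed.append((url, str(exc)))
    return done, failed
```

The failed list could then be reported at the end of the run, so one bad interactive does not hide problems with the others.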

D. Brian Walton

Jun 29, 2022, 9:22:02 AM
to pretext-dev
I wanted to report that this worked for me, and the CalcPlot3D actually generated images on my machine without error. The Wolfram CDF examples were blank, but I see they are broken on the Sample Article page for my regular browser as well.

- Brian

--
You received this message because you are subscribed to the Google Groups "PreTeXt development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pretext-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pretext-dev/6cd36ac0-2952-725a-317c-62229ac3c842%40ups.edu.

Rob Beezer

Jun 29, 2022, 12:55:29 PM
to prete...@googlegroups.com
Thanks very much for the testing, Brian.

Interesting about CalcPlot3D! I'll have to look closer at mine.

Yes, I'm not sure Wolfram ever worked. It was for a project at the University
of Cape Town, maybe I should ping them.

I think I know what the problem was with math (on old thread) - I'm going to
investigate right now.

Rob

Steven Clontz

Jun 29, 2022, 10:35:01 PM
to PreTeXt development
Trying to implement this for the CLI. Looking at the source, I think it requires setting https://pretextbook.org/doc/guide/html/publication-file-online.html#online-baseurl, which feels off to me: in order to snapshot a preview of an interactive, the work needs to be deployed somewhere already? Why don't we just spin up https://docs.python.org/3/library/http.server.html with what we want to snapshot locally, then point pyppeteer to localhost?
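
A rough sketch of what the local-server idea could look like, using only the standard library. The helper names start_local_server and stop_local_server are made up for illustration, and nothing here is taken from the pretext script.

```python
import functools
import http.server
import threading


def start_local_server(directory, port=0):
    """Serve `directory` on localhost from a background thread.

    Port 0 asks the operating system for any free port; the port
    actually used is available as server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def stop_local_server(server):
    """Stop serving and release the port."""
    server.shutdown()
    server.server_close()
```

With the server running, a headless browser could be pointed at "http://127.0.0.1:<port>/<file>.html" for each per-interactive file, and stop_local_server called once the snapshots are done.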

Rob Beezer

Jun 29, 2022, 10:45:20 PM
to prete...@googlegroups.com
On 6/29/22 19:35, Steven Clontz wrote:
> Why don't we

Because I really had not considered a cheap local server instantiated from the
Python code when I first worked this up?

Can you make that change? All Python, I think. Except two lines in the XSL
"extraction" stylesheet that sneak in the base URL as a bit of expedient
programming.

We still use the base URL for some other things.

Rob


D. Brian Walton

Jun 29, 2022, 11:12:22 PM
to pretext-dev
I had started exploring this before Rob did this most recent work.

Just by way of clarification, are you (Steven) assuming that the html build has already been completed locally, or that the local iframe page is generated just in time for the local server to offer up?

The second method is roughly what I am needing to do for the dynamic exercises, with some extra work to pull out dynamic text for the static version as well. The first is clearly easier if it is available, but it seems the script doesn't really need to require that the HTML build be complete.

Also, something I noticed while playing with this: there can be multiple Python servers running on separate ports. But if you accidentally open a second thread trying to play with the same port, it's (not surprisingly) unhappy but doesn't seem to complain.

- Brian
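
One possible way to make a port clash loud rather than silent, offered as a sketch only. StrictServer and bind_server are hypothetical names. The assumption is that with address reuse switched off, the operating system refuses a second bind to a port that is already listening, so the constructor raises OSError and the code can fall back to a free port.

```python
import http.server


class StrictServer(http.server.ThreadingHTTPServer):
    # Do not request address reuse, so a port that is already in use
    # is reported as an error when binding.
    allow_reuse_address = False


def bind_server(preferred_port, handler=http.server.SimpleHTTPRequestHandler):
    """Bind to preferred_port, or to an OS-chosen free port if it is taken."""
    try:
        return StrictServer(("127.0.0.1", preferred_port), handler)
    except OSError as exc:
        print("port {} unavailable ({}); asking for a free one".format(
            preferred_port, exc))
        return StrictServer(("127.0.0.1", 0), handler)
```

The port finally in use is server.server_address[1]. A side effect of switching off address reuse is that a fixed port may stay unavailable for a short while after a previous server on it has closed.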



Steven Clontz

Jun 30, 2022, 9:01:47 AM
to prete...@googlegroups.com
I was thinking just in time: we create a minimal HTML file for each (all?) interactive, and serve them locally to get our snapshots.


D. Brian Walton

Jun 30, 2022, 10:57:47 AM
to pretext-dev
Would it make sense to open the browser and page outside of the loop over interactives and then iterate using the same page on different URLs? I'm not sure how much overhead each call to create a new browser instance creates. Certainly the 5 second delay for the page to finish loading is the dominant bottleneck.

I'm just starting to learn how this asyncio works, but it seems like this should be possible.

- Brian
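
A sketch of the one-browser, one-page idea, under some assumptions: snapshot_urls is a made-up name, the output file naming is invented, and the browser is passed in as a launch coroutine so the same code can be exercised without pyppeteer. The calls on the browser and page objects (newPage, goto, screenshot, close) are the pyppeteer ones; everything else is illustrative.

```python
import asyncio


async def snapshot_urls(launch, urls, out_dir, delay=5.0):
    """One browser and one page, reused for every URL.

    `launch` is a coroutine function that returns a browser object;
    with pyppeteer installed, pyppeteer.launch can be passed here.
    """
    browser = await launch()
    try:
        page = await browser.newPage()
        paths = []
        for i, url in enumerate(urls):
            await page.goto(url)
            # Crude fixed wait for the interactive to finish rendering.
            await asyncio.sleep(delay)
            path = "{}/preview-{}.png".format(out_dir, i)
            await page.screenshot({"path": path})
            paths.append(path)
        return paths
    finally:
        # Close the browser even if a page fails part way through.
        await browser.close()
```

A real run would look something like asyncio.run(snapshot_urls(pyppeteer.launch, urls, "/tmp/foo")). The fixed delay is still paid once per interactive, so this only saves the browser start-up cost.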

Rob Beezer

Jun 30, 2022, 11:14:29 AM
to prete...@googlegroups.com
On 6/30/22 06:01, Steven Clontz wrote:
> I was thinking just in time: we create a minimal HTML file for each (all?)
> interactive

We already make such files. One per interactive. Not absolutely minimal, but
close.

Steven Clontz

Jun 30, 2022, 11:53:11 AM
to PreTeXt development
>  We already make such files. 

@Rob thanks for clarifying - I thought that might be the case in my skim of your code yesterday (though I guess these are made as part of the HTML build, so we'd need to figure out how to just do those files as part of the generation step).

My next question is whether for performance reasons it's better to make one all-inclusive HTML file with all interactives, wait for that to load, then take all the snapshots from a single request. Maybe that's something to try later though if we already have the separate file solution ready to go.

> Would it make sense to open the browser and page outside of the loop over interactives and then iterate using the same page on different URLs? 

Using the same headless browser on different URLs you mean? That was my first thought - the optimal solution might actually be spinning up a dozen browsers and divvying up however many interactives we have between them. But I figure we should start with the simplest solution to implement (whether that's one browser total or one browser per interactive) even if it's not performant - we can refactor later.

Rob Beezer

Jun 30, 2022, 12:09:48 PM
to prete...@googlegroups.com
On 6/30/22 08:53, Steven Clontz wrote:
> > We already make such files.
>
> @Rob thanks for clarifying - I thought that might be the case in my skim of your
> code yesterday (though I guess these are made as part of the HTML build, so we'd
> need to figure out how to just do those files as part of the generation step).

Files are built as a by-product of the HTML build. It would not be a trivial
matter to split that out. But it would be an interesting stylesheet to build
and maintain.

Rob

D. Brian Walton

Jun 30, 2022, 2:08:57 PM
to pretext-dev
Steven,

I think we are talking about the same thing.

Model 1 (easy, currently in place):

Loop on interactives:
  - open browser
  - open page in browser
  - send page to URL
  - wait for render
  - snapshot
  - wait for completion before advancing (asyncio run until complete)

Model 2 (What I was thinking: one browser, iterate only over URLs):

Open browser
Open page
Loop on interactives:
  - goto URL
  - wait for render
  - snapshot
  - wait for completion before advancing (asyncio run until complete)

Model 3 (What I think Steven described):

Gather on interactives (rather than loop?):
  - open browser
  - open page in browser
  - send page to URL
  - wait for render
  - snapshot
  - add to the io queue
Wait for completion outside of the loop so that all of the different browsers are threaded (??)

That sounds like a good idea and why asyncio was invented. I'd worry about whether there is a limit to how many browsers should be instantiated, and if so how to queue up the interactives when there are more than this limit.

- Brian
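
On the worry about a limit in Model 3: one common asyncio pattern is to gather everything but guard each snapshot with a semaphore, so only a fixed number run at once and the rest wait their turn. A sketch under assumptions: snapshot_all and max_browsers are hypothetical names, and snapshot_one stands for whatever coroutine opens a browser and takes a single snapshot.

```python
import asyncio


async def snapshot_all(urls, snapshot_one, max_browsers=4):
    """Run snapshots concurrently, with at most max_browsers active at once."""
    limit = asyncio.Semaphore(max_browsers)

    async def bounded(url):
        # Waits here while max_browsers snapshots are already running.
        async with limit:
            return await snapshot_one(url)

    # gather returns results in the same order as urls; with
    # return_exceptions=True a failure is returned in place of its
    # result instead of being raised.
    return await asyncio.gather(
        *(bounded(u) for u in urls), return_exceptions=True
    )
```

The semaphore acts as the queue: every interactive is scheduled up front, but only max_browsers of them hold a browser at any moment. A sensible value for the cap would have to come from experiment.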


Steven Clontz

Jul 30, 2022, 10:42:33 AM
to PreTeXt development
The upcoming 1.0 release of PreTeXt-CLI will have pyppeteer as a dependency, but we need to update the core script to run a local server and capture images from there before I can enable that feature with the CLI (whose workflow assumes assets can be generated before a project is built or deployed).