StaticPublishQueue and Subsites modules don't seem to work together.

Nedmas

Nov 21, 2013, 4:36:13 AM
to silverst...@googlegroups.com
Hi all,

I'm working on a project that uses the Subsites module, and I'm trying to add support for the StaticPublishQueue module. I've been following the suggestions in https://github.com/silverstripe-labs/silverstripe-staticpublishqueue/blob/master/docs/en/stale-static-main.example.php and http://doc.silverstripe.org/framework/en/3.0/reference/staticpublisher; however, StaticPublishQueue only creates output for the main site.

I'm happy to work on a solution and provide PRs, but I wanted to check first: has anyone else encountered this problem, or is anyone already working on a solution?

Mateusz Uzdowski

Nov 21, 2013, 3:42:23 PM
to silverst...@googlegroups.com
Hi Nedmas,

We are just working on staticpublishqueue-subsites compatibility, started about a week ago :-)

Currently we have taken a minimal approach, trying to use the existing staticpublisher APIs. We change the behaviour of the staticpublishqueue triggers so that they queue URLs with an explicitly specified SubsiteID. On the other side of things, during queue processing, when we detect the ?SubsiteID=X parameter the task kicks into subsite-aware mode and generates the cache into the cache/<domain> directory.

The main site is enqueued as SubsiteID=0 and goes into the FilesystemPublisher::static_base_url directory, so effectively nothing will be generated at the top level of the cache dir.
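To make the mapping concrete, here is a minimal sketch of the idea in Python rather than the module's PHP. It is an illustration only: the function name, the SUBSITE_DOMAINS lookup table, the example domains and the "cache" root are all assumptions, and treating a URL without SubsiteID as the main site is a simplification.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup table: SubsiteID -> primary domain of that subsite.
SUBSITE_DOMAINS = {3: "shop.example.com"}


def cache_dir_for(queued_url, main_domain, cache_root="cache"):
    """Return (cache directory, path to render) for a queued URL."""
    parts = urlsplit(queued_url)
    params = parse_qs(parts.query)
    # Simplification: a missing SubsiteID is treated like the main site (0).
    subsite_id = int(params.get("SubsiteID", ["0"])[0])
    # SubsiteID=0 is the main site; it gets its own domain directory too,
    # so nothing is written at the top level of the cache root.
    domain = main_domain if subsite_id == 0 else SUBSITE_DOMAINS[subsite_id]
    return f"{cache_root}/{domain}", parts.path
```

With these assumed values, "/test/?SubsiteID=3" would be rendered into cache/shop.example.com, while "/about/?SubsiteID=0" would go into the directory for the main domain.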


See if that works for you - we'd definitely appreciate your input ;-) Also, I'm focusing mostly on the generation side here, because I'm serving these directories from nginx. So it's possible that the .htaccess or Apache example configs are broken now. What is your plan for serving these?

m

Nedmas

Nov 22, 2013, 4:59:59 AM
to silverst...@googlegroups.com
Hi Mateusz,

Thanks for the reply. I actually stumbled upon your work after posting. It's looking really good :-)

The only thing I noticed is that the RebuildStaticCacheTask task doesn't seem to be subsite-aware yet. If I get round to looking at this before you do, I'll send a PR.

Will this be getting merged into the main repo? Also, I'm serving it from Nginx too, so I'm not fussed about the Apache configs either.

Cheers for your work.

Nedmas

Nedmas

Nov 22, 2013, 5:32:55 AM
to silverst...@googlegroups.com
Hi Mateusz,

I see that you've already started work on RebuildStaticCacheTask - nice one!

I was just wondering how you have configured Nginx to serve the different caches depending on hostname?

Thanks,

Nedmas

Mateusz Uzdowski

Nov 24, 2013, 7:53:23 PM
to silverst...@googlegroups.com
Hey Nedmas,

Yeah, still refactoring it :-) Stig fixed up the build-all task, and we also dumped the staticpublisher dependency. A lot of doubled-up classes went away, and it turned out the only class we really used was FilesystemPublisher. We copied that over and all seems to work fine. Now I'm working on the one remaining issue: generating the error pages using the standard naming pattern, so the webserver can refer to them directly (via error-page-404.html) instead of delegating this to the PHP layer (which doesn't exist on the frontend machine :P).

We are currently naughtily sneaking GET params into the build queue (/test/?SubsiteID=3) to fix the subsite problem, because Director::test has no way of figuring out which subsite the URL refers to - it has a hard time mapping the URL to an object explicitly. This seems to be generally useful, so I'm experimenting (https://github.com/stojg/silverstripe-staticpublishqueue/commit/d63cf39e66c29836509411ed7dd4d0581e477437) with attaching more specific metadata to the URL - "/test/?_ID=23&_ClassName=Page". This will allow us to grab the object directly while dequeueing, without relying on SiteTree::get_by_link (which is not Subsite-aware, and cannot be made so). I suspect that this will allow us to do awesome things ;-)
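As a rough illustration of that round trip (written in Python, not the module's PHP; the function names are made up for this sketch, and only the _ID and _ClassName parameter names come from the experiment above):

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def attach_metadata(url, object_id, class_name):
    """Append the owning object's ID and class to a URL before enqueueing."""
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({"_ID": object_id, "_ClassName": class_name})


def extract_metadata(queued_url):
    """Recover (path, object ID, class name) from a dequeued URL."""
    parts = urlsplit(queued_url)
    params = parse_qs(parts.query)
    return parts.path, int(params["_ID"][0]), params["_ClassName"][0]
```

The dequeueing side can then load the object by ID and class name instead of resolving the path, which is the step that cannot be made subsite-aware.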

I'm not entirely happy with how this is architected - sneaking GET params into URLs seems hackish. But the general approach of passing more metadata into the queue seems to be a good direction, so maybe we could rewrite the queue to accept a JSON-encoded structure, or simply give the database table more fields to accommodate the metadata? Would that be something you'd like to help with? Maybe you can see a better approach?
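For the JSON option, a queue entry might look something like the sketch below. This is Python for illustration only, and every field name here is an assumption; nothing like this exists in the module yet.

```python
import json


def encode_queue_entry(url, object_id, class_name, subsite_id=0):
    """Serialise one queue entry; the field names are illustrative only."""
    entry = {
        "url": url,
        "id": object_id,
        "class": class_name,
        "subsite": subsite_id,
    }
    return json.dumps(entry, sort_keys=True)


def decode_queue_entry(payload):
    """Turn a stored payload back into a dict when dequeueing."""
    return json.loads(payload)
```

The URL stays clean this way, and new metadata can be added later without changing how URLs are parsed.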

For nginx configs, you can just define one "server" block per subsite, map that to the hostname via "server_name", and then use the "try_files" directive to map $uri to the cached file. This is still relevant: https://github.com/stojg/silverstripe-staticpublishqueue/blob/master/docs/en/nginx.vhost

m

Mateusz Uzdowski

Nov 27, 2013, 9:33:27 PM
to silverst...@googlegroups.com
Hey,

Another update for staticpublishqueue. We have moved to a new branch, 2.0: https://github.com/stojg/silverstripe-staticpublishqueue/blob/2.0/

Trying to use this module with subsites highlighted for us that the API was unclear, and the legacy baggage didn't really help. 

We have gone back to the drawing board and concluded that objects actually have two distinct behaviours:
- some of them want to be publishable (generating HTML files)
- some of them want to trigger publishing of others (have dependencies)

We have come up with two interfaces that define exactly these (https://github.com/stojg/silverstripe-staticpublishqueue/tree/2.0/code/interfaces). 

Objects can be publishable by providing the urlsToCache method (implementing StaticallyPublishable) - they get to define the URLs they wish to "own". This explicitly establishes which object is responsible for a specific URL and makes it easy to generate the full cache - just find everything that is StaticallyPublishable and ask for its URLs. Also, because URLs always have an associated object, it's possible to use the ORM to find out the context - such as "which subsite does this URL belong to?" or "does this URL belong to an ErrorPage?"

On the other hand, objects can request updates of some group of objects (not URLs!) by providing the objectsToDelete and objectsToUpdate methods (implementing StaticPublishingTrigger). It's up to the specific implementation to decide when to ask for this information though - it's as if the object were simply saying "In some general sense, I have dependencies on these other guys, and when I change, I'm likely to impact them".

We are providing the following basic implementations for the SiteTree use case:

- PublishableSiteTree which implements both interfaces for your most usual use case - a hierarchical, publishable page.
- PublishableRedirectorPage which partly replaces the above to make sure its own link is used as the URL (not the redirect target)

Then we have some concrete engines that consume these:

- SiteTreePublishingEngine that keeps the SiteTree objects updated as pages are published and unpublished (consumes both interfaces)
- SiteTreeFullBuildEngine which republishes the entire tree (consumes only the publishable one)
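The split between the two interfaces and the engines that consume them can be sketched roughly as follows. This is a Python illustration, not the module's PHP: the method names mirror urlsToCache, objectsToUpdate and objectsToDelete, but the signatures, the Page class and the two helper functions are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class StaticallyPublishable(ABC):
    """An object that owns one or more URLs in the static cache."""

    @abstractmethod
    def urls_to_cache(self):
        """Return the URLs this object is responsible for."""


class StaticPublishingTrigger(ABC):
    """An object whose changes affect other objects' cached output."""

    @abstractmethod
    def objects_to_update(self):
        """Return the objects to regenerate when this one changes."""

    @abstractmethod
    def objects_to_delete(self):
        """Return the objects whose cache files should be removed."""


class Page(StaticallyPublishable, StaticPublishingTrigger):
    """Hypothetical hierarchical page implementing both interfaces."""

    def __init__(self, link, parent=None):
        self.link = link
        self.parent = parent

    def urls_to_cache(self):
        return [self.link]

    def objects_to_update(self):
        # Assumed dependency: a page affects itself and its parent.
        return [self] + ([self.parent] if self.parent else [])

    def objects_to_delete(self):
        return []


def full_build_urls(objects):
    """Full rebuild: ask every publishable object for its URLs."""
    urls = []
    for obj in objects:
        if isinstance(obj, StaticallyPublishable):
            urls.extend(obj.urls_to_cache())
    return urls


def urls_to_refresh(changed):
    """Incremental update: resolve the trigger's objects into URLs."""
    urls = []
    for obj in changed.objects_to_update():
        if isinstance(obj, StaticallyPublishable):
            urls.extend(obj.urls_to_cache())
    return urls
```

Note that the trigger returns objects, not URLs; the engine only turns them into URLs at the last step, by asking each affected object what it owns.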

Publishing is now working with subsites, and the engine can also generate error pages with names you can rely upon (error-<code>.html). I haven't tested this without subsites, VirtualPages probably won't work yet, and deletions are not hooked up for subsites. Also, there are no other implementations apart from the one that couples with the SiteTree, so if you want to associate a URL with a Controller, you'd need to write your own implementation and engine for it :-) Ah yeah, and the queue implementation could get a refresh so we don't have to smuggle ID/ClassName in the URL...

Now, I wonder if this all works with Translatable <succumbs into mad laughter>

If you want to help - you know where to find the code :-)
m

Stig Lindqvist

Nov 28, 2013, 2:53:48 PM
to silverst...@googlegroups.com
I'm really pleased that someone other than me realised that the API from the old publisher was not decoupled enough and had to change.

As far as I understand, the high-level changes are:

The module no longer depends on the staticpublisher module. It was only using one method from FilesystemPublisher but had to disable and override a lot of other behaviour, so that class has been copied into this module.

In the future I would like to see this component pluggable, so we could implement something like the old rsync publisher, a varnish cache or maybe pushing the data to a document store or any other weird infrastructure requirements.

The module relies on overriding extensions for different objects. For example, it adds a catch-all extension for SiteTree, but needs to 'override' it for redirector pages. It does this by using the Config system and the before/after options. IMHO it feels hacky, but as Mateusz mused, 'it's not a hack, just a less explicit API' (I will quote that from now on).

There is still a lot of cleanup worth doing.

Off the top of my head, some suggestions for improvements:
 - Make the event system easier to use. The use case for the event system is to collect URLs from controllers that aren't Page_Controllers (and so don't have a DataObject failover).
 - Clean up the queue so it is possible to implement a queue in storage other than the default DB. Examples: redis, memcache, mongo, webscale!!
 - More tests, a lot more tests, like 100% coverage :)
 - Better reports and notifications to the content author about the state of the cache.
 - Documentation and tutorials. 
 - Triggering cache updates from the QueuedJobs and/or the resque module.

But I think the most important thing is to make the system easy to use and provide good documentation on how to set it up on various frameworks. Cache invalidation is hard as it is.

Awesome work Mateusz!

Mateusz Uzdowski

Dec 15, 2013, 9:49:52 PM
to silverst...@googlegroups.com
Just to let you know, this has just been merged into the master of https://github.com/silverstripe-labs/silverstripe-staticpublishqueue . If you are using the unstable version of this module and you do not wish to upgrade, switch to the 2.0 branch we have just created. 

Since last time, we have fixed up cache deletions for subsites, restored the "stale.html" cache functionality, added VirtualPage support, fixed and cleaned up the tests by converting URLArrayObject into an injectable object, and added a little more documentation.

Let me know if you have any questions regarding the usage of the new interfaces :-)

Many thanks to Stig for architectural oversight and all the useful discussions!

m