Silverstripe (Assets) vs. Scaling in the Cloud


Stefan

Mar 5, 2013, 9:29:25 AM
to silverst...@googlegroups.com
Hi everybody!

I want to start a new discussion about the handling of assets in Silverstripe.

The first part describes our experience and how we currently use Silverstripe; the second part offers input for further discussion (we tried to implement and use a storage API for the assets).

First a short summary:
Currently it is not possible to use external storage providers (like Amazon S3) for assets, because the assets have to be present on the local filesystem. The only options are to mount the external storage into the local filesystem (often buggy) or to set up a self-hosted NFS. Neither is possible on services like Heroku, where you only get a temporary filesystem for your code base.
We did some research and tried to extract all file accesses (concerning /assets only!) into a new storage API based on Django's approach and APIs (https://docs.djangoproject.com/en/1.4/ref/files/storage/).
Although our API is far from perfect (to be honest, we just wanted a quick result) and some core functionality needs to be rewritten, we think it is time to improve the whole asset handling and allow Silverstripe to be deployed in the cloud.

Part 1:
Our Experiences:

We (http://wunderweiss.com) use Silverstripe intensively, including for some big community-based sites in Austria. The biggest problem we are facing is how to scale a Silverstripe site horizontally.
Since the whole asset management is tied to the local filesystem (fopen, fwrite, mkdir, ..., and absolute paths like "/assets" nearly everywhere in the core), it is nearly impossible to store and deliver assets through an external provider (e.g. Amazon S3).

One way we found was to mount the external storage into the local filesystem (so that the CMS backend keeps working) and, to increase performance on the frontend, rewrite all URLs in the output HTML to point at the external storage (like http://werner.mundraeuber.de/archives/148-Silverstripe-and-Amazon-Cloudfront-CDN).
But in the case of Amazon S3 we had a lot of bad experiences with the filesystem drivers: they are buggy and tend to crash very often.
We were not able to use this setup, since Silverstripe's intense use of the local filesystem was too much for a mounted S3 storage.
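
The URL-rewriting step mentioned above can be sketched roughly as follows. This is Python purely for illustration (in Silverstripe it would be PHP applied to the rendered response); the CDN host `cdn.example.com` and the function name are assumptions, not taken from the linked article.

```python
import re

# Hypothetical CDN host; replace with the real CloudFront/S3 domain.
CDN_BASE = "https://cdn.example.com"

# Matches src="..." / href="..." values that begin with /assets/ or assets/.
# Absolute URLs (http://...) are left alone because the path must start
# directly after the opening quote.
_ASSET_ATTR = re.compile(
    r'''(?P<attr>\b(?:src|href))=(?P<q>["'])/?(?P<path>assets/[^"']+)(?P=q)'''
)


def rewrite_asset_urls(html: str, cdn_base: str = CDN_BASE) -> str:
    """Point every local /assets/ reference in rendered HTML at the CDN."""

    def repl(match: "re.Match[str]") -> str:
        quote = match.group("q")
        return f'{match.group("attr")}={quote}{cdn_base}/{match.group("path")}{quote}'

    return _ASSET_ATTR.sub(repl, html)
```

Note that this only changes what the browser fetches; the CMS still reads and writes the files locally, which is exactly why the mounted storage remained the bottleneck.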

Where we've ended up:
  • self-hosted and maintained NFS to scale Silverstripe's assets
  • Amazon's Cloudfront which accesses the assets through a dedicated webserver for static files
  • Other frameworks to handle actual frontend rendering

So at this time we were not able to create a Silverstripe-only site because of these limitations.

Part 2:

How are other frameworks solving the problem of integrating different storages?

Django (https://www.djangoproject.com), for example, provides a basic storage API which is used throughout the framework and allows the storage handling to be extended with new providers. The plugin django-storages (https://bitbucket.org/david/django-storages) extends Django with a lot of different storage backends.

What has to be changed in the Silverstripe core to allow different storages?
  • File access has to be abstracted away from PHP's file functions to allow future extensions.
  • To avoid unnecessary file accesses (assume a remote storage), the database has to be used as much as possible. From version 2.4 to 3 a lot of improvements took place, but there are still some bottlenecks (sync, deleteFormattedImages, ...). For example, the image size is retrieved by accessing the image file instead of a metadata field in the database. It could be determined once, on upload or resize, and stored.
  • The use of absolute paths has to be replaced with absolute URLs to allow external storage.
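
To illustrate the metadata point, here is a minimal sketch (Python, for illustration only) of recording the image size once at upload time so that later lookups never touch the file. The `ImageRecord` class is a hypothetical stand-in for a database row, and only PNG is handled to keep the example short.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_size(data: bytes) -> Tuple[int, int]:
    """Read width and height from a PNG's IHDR chunk (bytes 16 to 24)."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


@dataclass
class ImageRecord:
    """Hypothetical stand-in for a database row describing an image."""
    filename: str
    width: Optional[int] = None
    height: Optional[int] = None


def on_upload(filename: str, data: bytes) -> ImageRecord:
    # The file is inspected exactly once, while the bytes are at hand.
    width, height = read_png_size(data)
    return ImageRecord(filename, width, height)


def get_dimensions(record: ImageRecord) -> Tuple[Optional[int], Optional[int]]:
    # Later lookups use the stored fields; the storage backend is not touched.
    return record.width, record.height
```

With a remote backend, each avoided file read is an avoided network round trip, so this matters far more than it does on a local disk.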

What we've done:

On our own we've tried the following:
Based on Django's storage API, we extracted all file accesses (only those concerning the assets) into an extendable interface. Basically, we took the existing Filesystem class, made it abstract, and added abstract methods (to replace PHP's file access) for saving, removing and renaming files, and for exchanging files between the local (transient) filesystem and the active (remote) storage.
New storage backends extend this abstract class and can then be injected into the system.

So we replaced all file accesses with our new Storage-API and implemented a storage backend for the local filesystem.
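
The shape of such an interface can be sketched like this. It is written in Python to mirror the Django API it is modeled on, and it is not the prototype's actual code: the class and method names (`AssetStorage`, `LocalStorage`, `save`, `rename`, ...) are assumptions, and the exchange between the transient local filesystem and the remote storage is left out.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class AssetStorage(ABC):
    """Hypothetical storage interface; names are illustrative only."""

    @abstractmethod
    def save(self, name: str, content: bytes) -> None:
        """Write content under the given asset name."""

    @abstractmethod
    def read(self, name: str) -> bytes:
        """Return the stored bytes of an asset."""

    @abstractmethod
    def delete(self, name: str) -> None:
        """Remove an asset."""

    @abstractmethod
    def rename(self, old: str, new: str) -> None:
        """Move an asset to a new name."""

    @abstractmethod
    def exists(self, name: str) -> bool:
        """Report whether an asset is stored."""

    @abstractmethod
    def url(self, name: str) -> str:
        """Return the public URL of an asset."""


class LocalStorage(AssetStorage):
    """Backend for the local filesystem, rooted at a single directory."""

    def __init__(self, root: str, base_url: str = "/assets/"):
        self.root = root
        self.base_url = base_url

    def _path(self, name: str) -> str:
        return os.path.join(self.root, name)

    def save(self, name, content):
        path = self._path(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(content)

    def read(self, name):
        with open(self._path(name), "rb") as fh:
            return fh.read()

    def delete(self, name):
        os.remove(self._path(name))

    def rename(self, old, new):
        target = self._path(new)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.rename(self._path(old), target)

    def exists(self, name):
        return os.path.exists(self._path(name))

    def url(self, name):
        return self.base_url + name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        storage: AssetStorage = LocalStorage(root)
        storage.save("Uploads/logo.png", b"fake image bytes")
        storage.rename("Uploads/logo.png", "Uploads/brand.png")
        print(storage.exists("Uploads/brand.png"), storage.url("Uploads/brand.png"))
```

Calling code only ever sees `AssetStorage`, so an S3 backend would implement the same six methods and return a bucket URL from `url()` instead of a local path.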

After that we did some basic tests with the CMS and ran the unit test suite. It basically worked, but since we wanted to see results quickly, the resulting API is far from perfect. Still, within one day it was possible to replace the basic file access with our new API and use the CMS like before.

Final words:

The research has shown that replacing the PHP file accesses with a new interface is not as hard as we had thought. We also found many places where optimizations or refactorings are needed (basic metadata has to be available in the database!).
We hope this starts a new discussion on the topic, and that we may see changes in upcoming versions of Silverstripe.
Without this limitation, many people could use services like Heroku, Amazon, etc. to deploy Silverstripe!

If requested, we could put the resulting code on GitHub (but it is only a dirty prototype for our research and may need some cleanup ;) ). Our main goal, though, was to start a discussion, and we hope it ends in plans for upcoming releases.

Thanks
Stefan
wunderweiss.com

Marcus Nyeholt

Mar 5, 2013, 8:55:50 PM
to silverst...@googlegroups.com
I've been working on a few interrelated modules that are designed around a similar idea: content read/write should be hidden behind an interface that can be implemented by various storage backends, all of which use a single way of referring to that content. So

file:||path/on/disk.txt 

would refer to something stored via the FileWriter/Reader pair, whereas

mys3:||folder/filename.txt 

would refer to something stored in the configured 'mys3' S3Writer/Reader location 
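
As a rough illustration of how such identifiers can be resolved (Python, for illustration only; this is not the module's code, and `parse_content_id`, `InMemoryReader` and `read_content` are hypothetical names), the store name in front of `:||` selects the backend and the rest is passed through as the path:

```python
from typing import Dict, Tuple

SEPARATOR = ":||"


def parse_content_id(content_id: str) -> Tuple[str, str]:
    """Split 'store:||path' into (store, path)."""
    store, sep, path = content_id.partition(SEPARATOR)
    if not sep or not store or not path:
        raise ValueError(f"not a valid content id: {content_id!r}")
    return store, path


class InMemoryReader:
    """Stand-in backend; a real reader would fetch from disk or S3."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = files

    def read(self, path: str) -> bytes:
        return self.files[path]


def read_content(content_id: str, readers: Dict[str, InMemoryReader]) -> bytes:
    """Look up the reader registered for the store and read the path from it."""
    store, path = parse_content_id(content_id)
    return readers[store].read(path)
```

For example, with readers registered under `file` and `mys3`, `read_content("mys3:||folder/filename.txt", readers)` goes to the S3 reader while the calling code stays the same.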

It's still experimental, and at present configured to work alongside the local storage of assets, but I have been playing with the idea of gutting all the filesystem interaction and replacing it with this, so that you don't even need the additional local storage of assets. An alternative is that all locally stored assets are access-protected, and when they're published/readable by the public, they're pushed up to their relevant CDN (even where that CDN might be another part of the local filesystem).


See some background at https://groups.google.com/forum/?hl=en&fromgroups=#!topic/silverstripe-dev/Z7CmioND5Ow


Add the extensions:

Object::add_extension('Folder', 'CDNFolder');
Object::add_extension('File', 'CDNFile');

And in mysite/_config/s3.yml:

Injector:
  S3Service:
    constructor:
      - {your_api_key}
      - {your_api_secret}
  S3ContentReader:
    type: prototype
    properties:
      s3service: %$S3Service
      bucket: {your_bucket_name}
  S3ContentWriter:
    type: prototype
    properties:
      s3service: %$S3Service
      bucket: {your_bucket_name}







--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to silverstripe-d...@googlegroups.com.
To post to this group, send email to silverst...@googlegroups.com.
Visit this group at http://groups.google.com/group/silverstripe-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Stefan

Mar 6, 2013, 4:29:32 AM
to silverst...@googlegroups.com
Nice work Marcus.
But as you pointed out, you still have the local storage of assets. To deploy Silverstripe on Heroku for example, all those local file accesses have to be abstracted to be able to use a different storage engine. And therefore there have to be changes in the core.



Your way of referring to files is interesting, since it can be used to add different storage types, and I think the core would need fewer changes than with the shortcode approach. But maybe it could be combined with shortcodes to ease the lookup of all file usages.

xeraa

Mar 6, 2013, 10:02:25 AM
to silverst...@googlegroups.com
Minor remark: I've heard quite a lot of praise for TYPO3's File Abstraction Layer (FAL), see the (unfinished) documentation: http://docs.typo3.org/typo3cms/FileAbstractionLayerReference/
It seems to heavily rely on meta information in the database and supports pluggable storage (local, S3, Dropbox,...).

Cheers,
Philipp

Mark Guinn

Mar 6, 2013, 10:21:29 AM
to silverst...@googlegroups.com
Thanks for your work Marcus and Stefan. We've been talking about a similar approach with Rackspace Cloud Files + servers. I'll be interested to see how this progresses. It'll probably be a few months before our company really moves that direction but when we do hopefully we can contribute to the effort on the Rackspace front.

Stefan

Mar 8, 2013, 5:07:15 AM
to silverst...@googlegroups.com
I would like to know from Silverstripe whether there are any plans to add this feature in the next 3.X releases.



dpde

Jun 27, 2013, 10:47:00 AM
to silverst...@googlegroups.com
This would be great, I am also interested in this.

con...@webtorque.co.nz

Jun 29, 2013, 10:21:45 PM
to silverst...@googlegroups.com
I've decided on a different approach on a current project.

There are 4 servers on a private network, so no outside access. Originally I planned to use network shares to sync the files between servers, but this requires extra setup. Instead, I plan to use curl to update the files on the other servers after they are uploaded, deleted, etc. Setting up another server only requires adding its IP address to a list.

You end up storing all the files multiple times, but to me this approach seems simple and platform-independent, so it can be rolled out easily no matter what the hosting environment is.
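
A rough sketch of that push, in Python for illustration only: the peer addresses, the `/asset-sync/` endpoint and all function names are assumptions, and the real setup would use curl from PHP.

```python
import urllib.request
from typing import Callable, Dict, Iterable
from urllib.parse import quote

# Hypothetical peer list; adding a server means adding its address here.
PEERS = ["10.0.0.2", "10.0.0.3", "10.0.0.4"]


def sync_url(peer: str, filename: str) -> str:
    """Build the (hypothetical) endpoint a peer exposes for receiving files."""
    return f"http://{peer}/asset-sync/{quote(filename)}"


def http_put(url: str, data: bytes) -> None:
    """Send the file body with an HTTP PUT; raises OSError on failure."""
    request = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(request, timeout=10):
        pass


def replicate(
    filename: str,
    data: bytes,
    peers: Iterable[str] = PEERS,
    send: Callable[[str, bytes], None] = http_put,
) -> Dict[str, bool]:
    """Push one uploaded file to every peer and report which pushes succeeded."""
    results = {}
    for peer in peers:
        try:
            send(sync_url(peer, filename), data)
            results[peer] = True
        except OSError:
            # A peer that is down must not block the others; retry it later.
            results[peer] = False
    return results
```

The returned map makes failed pushes visible, which matters here because a server that misses an update silently serves stale or missing files.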