Storing files in dataobject fields


Hamish Friedlander

Oct 22, 2014, 5:28:25 PM
to silverst...@googlegroups.com
Hello,

So I'd like to kick off a discussion about a feature I'd like to see go into framework in a future major version, http://silverstripe.uservoice.com/forums/251266-new-features/suggestions/6425928-files-should-be-stored-in-dataobject-fields

It's designed to fix these issues with the current assets system

- They aren't versioned
- They always have to be stored on the local filesystem
- There's no way to have files be specific to a DataObject, or otherwise not globally shared

This all occurs because the current solution has a File class which doesn't actually store the file at all - SilverStripe assumes that each File instance has an equivalent actual file at a known location on the local filesystem.

There have been various efforts to write modules to solve these issues individually, but nothing to try and solve all three (that I know of anyway). None of them have really taken off as the "recommended" solution, I think mostly because of the limitations of trying to do it as a module.

The solution I'd like to propose is to add an extra $db field type, 'File'. To the developer you'd just use it like any other field type (https://gist.github.com/hafriedlander/49a38ba86a334bef8f30)
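As a sketch of what that developer-facing usage might look like (the class and field names here are illustrative assumptions, not a confirmed API):

```php
<?php
// Hypothetical usage of the proposed 'File' $db field type.
// 'Brochure' and the bare 'File' type name are assumptions for illustration.
class ProductPage extends Page {
    private static $db = array(
        'Title'    => 'Varchar(255)',
        'Brochure' => 'File', // DB stores only a hash key; bytes live in a backend
    );
}
```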

Internally, we'd just store a generated SHA hash in the database, which we'd use as a key, and then have an API that defines a backend for storing files by key. We'd probably start off writing filesystem and S3 backends (or actually, taking other people's already written backends & modifying them).

The current File DataObject would stay, but we'd change it to explicitly keep its contents in a File field on the object. We'd have to write a migration script for existing assets. The Folder class would remain, but probably wouldn't derive from File anymore.

This would solve all three problems above:

- Any File fields on a Versioned DataObject would be automatically versioned along with the rest of the DataObject
- By using a defined backend API to store files by hash we allow defining extra backends
- And obviously, the files can live right on the DataObject, although we'd retain the existing File object too

There would be a lot of related changes:

- The "assets" path is explicitly referenced in lots of places (in the Content HTML when inserting an image, in CSS sometimes, etc) and we'd have to eliminate those usages (by using short codes when inserting images - which will need some fixes to the short code system, providing handler APIs, etc).

- We'd need to update the existing upload fields to understand how to upload to a File field and not just a has_one reference to a File object.

- Because we'd probably use composite fields (to store both the File itself and its metadata in the same "field") we'd need to clean up any issues they have (they're not often used at the moment, so could have rotted).

Probably the most controversial part of this is that there's no longer a direct correlation between any filesystem structure and the Files & Images section in SilverStripe. I've done this on purpose because:

- No common server filesystem handles versioning in a native format

- Limiting backends to just "store this data with this key", "give me this data with this key if you have it" and (for some backends) "give me the link to serve the file with this key directly if you have it" makes them easier to write and compatible with more storage systems (they could be used with a KV store like MongoDB, for example)

- By just storing by key, there's no difficulty keeping multiple backends in sync. When using multiple servers and storing locally, rsyncing between servers will never raise a clash. And we can just ask each registered backend in turn "got this key?", so when we're using S3 we can serve from S3 once the async replication has uploaded the file, and off the local filesystem before then.
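The "ask each registered backend in turn" lookup could be sketched like this (a minimal sketch; the link($key) method and backend objects are assumptions about an API that doesn't exist yet):

```php
// Hypothetical cascading lookup: return the first backend's link for $key.
// Each backend is assumed to expose link($key), returning a URL string
// if it can serve the key directly, or null if it doesn't hold the key.
function linkForKey(array $backends, $key) {
    foreach ($backends as $backend) {
        $link = $backend->link($key);
        if ($link !== null) {
            return $link; // e.g. S3 once async replication has uploaded it
        }
    }
    return null; // no registered backend holds this key
}
```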

However, we do want to ensure that people don't lose their files if their database becomes corrupt, so I'm imagining we'd also store all the file metadata in each backend as a backup, and write a tool to rebuild the "live" version of the assets from those backed-up metadata files in an emergency. We'll also want to make sure it's easy to add REST APIs or similar to replace situations where people are uploading files via FTP, etc.

Another tricky bit of this would be garbage collection - we'd need a process to archive or remove old versions once they were no longer referenced. The metadata backup might help there too.

Thoughts? Objections?

Hamish Friedlander

Conrad Dobbs

Oct 22, 2014, 7:28:30 PM
to silverst...@googlegroups.com
Sounds promising.

While current has_one relationships wouldn't be too bad to migrate, many_many relationships might be a bit trickier, as you'll need to create a new DataObject to handle the relationship (and store extra fields like sorting). Or would this be handled by having a separate field for multiple files, so the file set gets versioned on the main record as well?

The idea of having the filesystem backup a S3 implementation is interesting, could be setup to work like Zend_Cache, where you can use different backends with one option being to use the Two Level backend, which can combine multiple backends.

Hamish Friedlander

Oct 22, 2014, 7:37:53 PM
to silverst...@googlegroups.com
For existing code, you wouldn't have to migrate anything if you didn't want to, as the File object itself wouldn't be going away.

But for updates or new code, yeah, you'd need to create a new DataObject class with the File field, and then have a has_many from the owner object to that new DataObject. I think in most situations where you're using a has_many that would lead to a more natural data model anyway, but I'm interested to see specific examples if you've got any.

The backend system as described is definitely designed to handle cascading / multiple active backends. S3 is tricky to work with if you stick to the rules (there's a delay after uploading before the file is available, which theoretically could be several minutes; it might fail and never become available at all; etc). None of those really seem to happen in practice, but still, if we can design a system that handles them nicely, that's better.

Hamish Friedlander

Conrad Dobbs

Oct 22, 2014, 8:09:45 PM
to silverst...@googlegroups.com
My one problem with moving to a DataObject and a has_many to store multiple files is that you then normally end up using a GridField, plus another module to manage bulk uploading. You can currently use the UploadField to handle multiple files, but it doesn't do sorting unless you use a module which adds that in. It would be great if the default setup handled uploading single/multiple images with sorting, as this probably covers 90% or more of use cases.

If you had something like a MultipleFiles field, this could work like a DataList, giving you First, Last, etc. A simple implementation might be to return a hash which is the combined hash of its files in order. So if any of the files change or are reordered, the hash of the MultipleFiles field would change, creating a new version.
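A minimal sketch of that combined hash (the function name and hash choice are assumptions):

```php
// Hypothetical: version hash for an ordered set of files, derived from
// the member file hashes. Replacing or reordering any file changes it,
// so the MultipleFiles field would get a new version.
function combinedHash(array $memberHashes) {
    return sha1(implode("\n", $memberHashes));
}
```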

Marcus Nyeholt

Oct 22, 2014, 9:38:27 PM
to silverst...@googlegroups.com
The content-services + cdncontent module does a lot of this already (including the idea of a 'file pointer' field for binding a file reference to a data object directly). Some more historical discussion on this topic is at https://groups.google.com/forum/?hl=en&fromgroups=#!topic/silverstripe-dev/Z7CmioND5Ow



It currently works by binding an extension to the File class, so is not a complete replacement for that just yet; we've just started using it for a few real projects to see how it fares in the wild, and so far it's letting us do everything we need (including with the versionedfiles module, and resampling images etc). Documentation is sparse at present though, as the code is somewhat prone to change. 

So, while this wouldn't be a drop-in for something to go into framework, I'd hope that the underlying file management functionality would use an established project for doing things - Flysystem (https://github.com/thephpleague/flysystem) came out after I started playing around with things, but would probably be worth looking into. 

--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to silverstripe-d...@googlegroups.com.
To post to this group, send email to silverst...@googlegroups.com.
Visit this group at http://groups.google.com/group/silverstripe-dev.
For more options, visit https://groups.google.com/d/optout.

Zauberfisch

Oct 22, 2014, 10:33:27 PM
to silverst...@googlegroups.com
OK, so I have just spoken with Hamish about this; here is a quick summary of what we talked about:

First, I was afraid he might be proposing to actually store raw files in the database. But I am relieved to hear: no. The hash goes in the database; the file itself lives "somewhere" - either on the filesystem, on S3, or in some other backend.

Some backends take care of storing and naming files for us (like S3) to provide unique URLs. In the case of the filesystem, we would need to do that ourselves, and it might look something like this:

eg: /assets/deadbeef1234/my-photo.png
or: /assets/d/e/deadbeef1234/my-photo.png

so it basically is "assets/$hash/$base_filename", where $hash is a unique hash that identifies a version of a file, and $base_filename is the name of the file, kept intact for downloading / viewing.
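For the sharded second variant, path construction might look like this (a sketch of the scheme above, not an agreed design):

```php
// Hypothetical "assets/$hash/$base_filename" layout, with the first two
// hash characters as shard directories to keep any one folder small.
function assetPath($hash, $baseFilename) {
    return 'assets/' . $hash[0] . '/' . $hash[1] . '/' . $hash . '/' . $baseFilename;
}
// assetPath('deadbeef1234', 'my-photo.png') → 'assets/d/e/deadbeef1234/my-photo.png'
```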


the API for a backend might look something like:

interface FileBackend {
    public function store($key, $data, $filename);
    public function get($key);
    public function link($key);
}

- store puts the file where it belongs (filesystem, S3, FTP, ...)
- get returns the file object, or null if no file with that hash exists
- link returns the link for the file if the backend is able to provide links, or null otherwise (we might have to discuss whether a provider has to be able to provide a link or not)

This backend is not to be confused with the file handling itself, so a developer using SilverStripe (building a website) probably never touches the backend, because the File class (or whatever name it has) will just utilise the backend to store the file.
Getting a file probably still looks like this:

$myFile = File::get()->byID(42);
$link = $myFile->link(); // this in turn calls $fileBackend->link('deadbeef1234') and returns its result


<°(((-<

Sam Minnée

Oct 22, 2014, 10:56:07 PM
to silverst...@googlegroups.com
I think this is a good idea, but ideally I'd like to see the file management system refactored such that there was an alternative "backend" that maintained something closer to the current structure.

I see this as useful for a couple of reasons:

 - It allows for backwards compatibility without holding us back too much - the new approach can be implemented as a separate backend
 - The current approach is a lot simpler, and might still have its use-cases
 - We could potentially start your more involved implementation as a module, and pull it into core once we've seen it in action. Compared to raising the pre-release work as a pull request, it will be easier for developers to try things out.

I'd advocate that approach for ongoing development generally: refactor current behaviour to be one backend implementation that plugs into an interface, and then do the Next Cool New Thing as an alternative implementation.

If it's not feasible to do what I suggest, well, then we can reconsider, but I see a lot of value in what I'm proposing. One way to assess that would be to sketch out the PHP interface that the file saver backend would take.

A specific impact I can see is that you wouldn't be able to say your file GUID is always a SHA of its content. In the case of a backwards compatible system, the GUID would probably be the filename, relative to either the webroot or the assets/ folder. As long as the file backend is responsible for generating the GUIDs, I don't see this as excessively problematic.

We might, however, make a recommendation that for future backend implementations, developers use content SHAs as GUIDs. We might also have some mechanism for the backend implementation to indicate whether the content referenced by a GUID is immutable.
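That immutability hint might surface in the backend API as something like this (the method names are purely illustrative):

```php
// Hypothetical backend interface additions: the backend owns GUID
// generation and declares whether the content behind a GUID can change.
interface FileStoreBackend {
    public function generateGuid($relativeUrl, $content); // filename- or SHA-based
    public function isImmutable($guid); // true for content-addressed backends
}
```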

Sam Minnée

Oct 22, 2014, 11:01:50 PM
to silverst...@googlegroups.com
Given the backends it already has available, the high standard of code that comes out of The League, and the low level of scaffolding/setup needed to make use of the library, I think that Flysystem would be a great tool for the core persistence piece.

However, I still believe we'd need to have some additional interface that sits between Flysystem and the rest of SilverStripe, to decide how values from file fields are mapped to filesystem locations.

So, the refactoring would look something like:

File DataObject, AssetAdmin, etc <--> new interface <--> Flysystem

Hamish Friedlander

Oct 22, 2014, 11:57:24 PM
to silverst...@googlegroups.com
We could refactor the current system to just have pluggable filesystem backends for Files and Images without any of the rest of the design, but that doesn't solve the versioning or other issues. It also doesn't seem to get us significantly closer to the system I described (although we could always just not do that) - although it will require fixing the same "core is tied to assets" issues.

Specifically, my design relies on the filename being the same on all backends. Otherwise, when using cascading backends, you need to store the filename for each backend in the database. You might not even know the filename when the file is first uploaded - I'm imagining the filesystem -> S3 synchronisation would happen in the background, so the files will just "appear" on S3 at some point in the future, without any opportunity to query the filename or write it back to the database.

So I can't see an easy bridge between "the backend for Files and Images is pluggable" to "it's easy to have versioned files", or at least no easier than it would be now.

Hamish Friedlander

Sam Minnée

Oct 23, 2014, 12:34:28 AM
to silverst...@googlegroups.com

We could refactor the current system to just have pluggable filesystem backends for Files and Images without any of the rest of the design, but that doesn't solve the versioning or other issues. It also doesn't seem to get us significantly closer to the system I described (although we could always just not do that) - although it will require fixing the same "core is tied to assets" issues.

My suggestion is that this refactoring means that all of the work that you'd like to do—which is good work—is the creation of the new backend rather than the mandatory replacement of current core functionality.

What I am recommending is to design the backend API such that it is possible to make a backwards-compatible back-end that doesn't support versioning.

As a quick sketch, something like this: https://gist.github.com/anonymous/ea30f2d02f5bc8230216
It's missing a lot of methods to see if files exist, delete, etc.

Key points:

 - GUIDs are generated by the backend and assumptions aren't made about their format (could be a filename or a SHA).
 - "relative URL" is passed to the create files methods (setContent and transferFile) to assist in deciding how the content should be saved.
 - Deciding what the link is should be the responsibility of the back-end.

If you have those three features in your API we'll be able to make a backend that saves to the assets directory, that doesn't support versioning. Specifically, it would be a mutable backend.

Specifically, my design relies on the filename being the same on all backends. Otherwise when using cascading backends, you need to store the filename for each backend in the database. You might not even know that when first having the file uploaded - I'm imagining the filesystem -> s3 synchronisation would happen in the background, so the files will just "appear" on S3 at some point in the future without any opportunity to query the filename or write it back to the database.
 
If I understand correctly, what this would mean for my API above is that the *GUIDs* would need to be consistent across backends, so that code of this form would work:

if ($s3Backend->hasFile($guid)) {
    return $this->redirectTo($s3Backend->getLinkForGUID($guid));
} else {
    return $this->redirectTo($filesystemBackend->getLinkForGUID($guid));
}

Thinking about it more, I think that content-SHA is an inappropriate GUID. The problem is that uploading the same file into two different places is something that is quite likely to happen from time to time, and if the download URL isn't able to be different for those two download links, it will confuse people as to what's going on.

A simple solution would be to pack the user-expected URL into the GUID, making it something like "sha:relativeURL". I'm sure there are better solutions too, but it's worth considering this problem independently from my previous commentary about backwards compatibility.
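Packing and unpacking such a GUID could be as simple as the following (a sketch of the suggestion only; the "sha:relativeURL" scheme is not a decision):

```php
// Hypothetical GUID combining a content SHA with the user-expected URL,
// so identical content uploaded to two places gets two distinct GUIDs.
function packGuid($sha, $relativeUrl) {
    return $sha . ':' . $relativeUrl;
}
function unpackGuid($guid) {
    // limit 2 so a ':' inside the relative URL survives
    list($sha, $relativeUrl) = explode(':', $guid, 2);
    return array($sha, $relativeUrl);
}
```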

If we're going to drop the whole notion of the Files & Images section letting you manage a hierarchy of files, where the hierarchy corresponds to the URL, that's probably a bridge too far.

Michael Strong

Oct 23, 2014, 1:54:01 AM
to silverst...@googlegroups.com
Personally, I don't see why versioning needs to be in core. I think the core API should expose enough to easily add this functionality as a module, which at the moment it doesn't do (at least not without causing pain and torment to those involved).

Just so I understand, the File data field will store a hash, right? In which case, what would the job of the File DataObject be?

As is already being discussed on GitHub, I've been working on the abstraction of the filesystem backend, which will solve one of the issues here. I'm also going to talk to Hamish at some point in the future to ensure we're not going to cause any conflicts and we're both heading in the same direction. Any decisions that come out of that will be relayed back to the dev list for discussion.

Michael


On 23 October 2014 17:34, Sam Minnée <s...@silverstripe.com> wrote: <snip>




--
Michael Strong | Platform Support Developer
SilverStripe
http://silverstripe.com/

Phone: +64 4 978 7330
Skype: micmania1

Hamish Friedlander

Oct 23, 2014, 2:30:23 AM
to silverst...@googlegroups.com
 
My suggestion <snip>

Thanks for the detail. It's basically what I thought you were suggesting.

I guess the question is - what value are you trying to retain by keeping the existing assets structure? Is it for backwards compatibility, or to allow people to still browse the file structure outside SilverStripe, or some other reason?

File fields, as opposed to File objects, don't have a default relativePath. They have a base name (a filename with no folders) but they live on an object, not in a folder structure. We could invent one, like "fields/{$ClassName}-{$ID}/basename.jpg", but that's entirely synthetic, so it's still not useful for retaining browse-ability.

You're also still breaking versioning, because the backwards-compatible backend will return the same GUID for multiple versions of the file, as long as its relativePath is the same.
 
The problem is that uploading the same file into two different places is something that is quite likely to happen from time to time

It shouldn't ever happen - I'm assuming we're making the backend storage locations off-limits to CMS users except through SilverStripe. This is required for any sort of workflow - no project owner is going to accept "workflow is always enforced, except when you use this common method to bypass it". It's also required for versioning - there's no file structure that clearly indicates versions, so editing the same file in two different places would end up with mistakes being made.

A simple solution would be to pack the user-expected URL into the GUID, making it something like "sha:relativeURL". I'm sure there are better solutions too, but it's worth considering this problem independently from my previous commentary about backwards compatibility.

I'm already sort of suggesting that - the key would actually be "{$sha}/{$basename}". It just wouldn't include the relative path. (We could include it, but since the hash comes first, it wouldn't help with backwards compatibility or browse-ability.)
 
If we're going to drop the whole notion of the Files & Images section letting you manage a hierarchy of files, where the hierarchy corresponds to the URL, that's probably a bridge too far.

I'm only suggesting dropping that last part - i.e. that the full folder hierarchy corresponds to the URL. And as I say, we don't really even need to do that - just stick the hash in first. I am suggesting that the hierarchy visible in Files & Images isn't replicated anywhere else - it's in the database only.

Hamish Friedlander


Hamish Friedlander

Oct 23, 2014, 2:33:23 AM
to silverst...@googlegroups.com
On 23 October 2014 18:53, Michael Strong <mst...@silverstripe.com> wrote:
Personally I don't see why versioning needs to be in core. I think the core API should expose itself enough to easily add in this functionality as a module, which at the moment it doesn't do (at least not without causing pain and torment to those involved).

The whole "File as field" concept could be pulled out into a module, but the CMS would rely on it (like it would on Versioned if that were pulled out into a module). Without the CMS, there's no Files & Images, so this change would affect the same audience either way.

Hamish Friedlander

Daniel Hensby

Oct 23, 2014, 5:07:29 AM
to silverst...@googlegroups.com
For a long time I've felt that the file management in SS needs to be decoupled from the filesystem so that the File objects can run off of any storage medium (remote or local), so this is a great discussion to be having.

I agree with Sam, though. I think the changes Hamish is proposing are exciting, but they feel like far too much of an overhaul and complete re-invention rather than a slightly more incremental change (which is what I think is needed, at least to start). The biggest thing missing at the moment is the ability for developers to control the way files are stored; it's extremely tightly coupled to the filesystem, and that needs to be abstracted away into a Backend that can be replaced.

The current behaviour is suitable; versioned files aren't a strict requirement of core, IMO, at least no more than other types of versioned DataObjects. There is a lot to be said for simplicity, and for the fact that the logical filesystem is the same in the CMS GUI as it is on the filesystem. Storing files in `assets/[hash]/[filename]` is not a way to make things intuitive or obvious to a developer or CMS user. I'd also have thought that, to make it obvious to CMS users, you'd have to abstract away the filesystem structure to present files based on where they think they are in an imaginary filesystem.

I'm not sure of the benefit of storing files in a folder named after the hash of the file they contain; the hash can just be stored in the DB as a field, and that way the structure of the DB and the filesystem can easily be rebuilt if required. Also, having control over URLs and the names of files is important, so assuming we can just use hashes for addressing files by URL is a bit nasty, IMO.


Daniel Hensby
Director

Better Brief

e: daniel...@betterbrief.co.uk
t:  020 7183 9266
w: http://www.betterbrief.co.uk


Sam Minnée

Oct 23, 2014, 5:26:29 PM
to silverst...@googlegroups.com
 
Personally I don't see why versioning needs to be in core. I think the core API should expose itself enough to easily add in this functionality as a module, which at the moment it doesn't do (at least not without causing pain and torment to those involved).

That's kind of what I was getting at with "the current behaviour should be 1 backend, and then we can make another back-end that is versionable". I tried to clarify what requirements the API would need to have in my previous comment.

Just so I understand, the File data field will store a hash, right? In which case, what would the job of the File DataObject be?

It would store the file hierarchy. It doesn't really change the responsibilities of the File object much - right now we're storing a filename, in the new model we'd just store a hash.

As is already being discussed on Github i've been working on the abstraction of the filesystem backend which will solve one of the issues here. I'm also going to talk to Hamish at some point in the future to ensure we're not going to cause any conflicts and we're both heading in the same direction. Any decisions that come out of that will be relayed back to the dev list for discussion.

Yeah - agreed. It seems like these pieces of work are in a similar space, so it would be good to build on the work you've already done. Looking at your code, one refactoring would be to push all of the calls to the file backend from File.php into the File DB field.

Sam Minnée

Oct 23, 2014, 5:45:32 PM
to silverst...@googlegroups.com

I am suggesting that the hierarchy visible in Files & Images isn't replicated anywhere else - it's in the database only.


My view is that I have strong opinions about the API, and people can do whatever the heck they like in the backends. So, as long as it's *possible* to make a backend that's more like the current system, I'm relaxed about the storage scheme we take for a new versionable system.

If you boil it down, what we're talking about is whether we index files by "<hash>:<basename>" or "<hash>:<relativeURL>". I see benefits in the latter, because:

 - it gives us more options when building backends
 - it makes it easier to have friendlier URLs for free
 - it lets us build a backward-compatible backend that a few people at least seem to see value in

However, I haven't considered all of the disadvantages or limitations. Some that come to mind:

 - If you just chuck a file field on a data object, there's not going to be a notion of a relative URL—you'll only have a base name. So, if you plugged the "regular old assets/ dir" backend into it, you'd get a lot of files in the root.
 - The whole idea of filename conflicts is supposed to be meaningless: you should be able to upload 2 files of the same name to different file fields and not have one of them say "tsk tsk! you've used that name already!"

So you might need to have the expectation that relativeURLs are just a suggestion, and that the backend needs to be able to store multiple files with the same relativeURL. Alternatively, you could give the backend some mechanism for resolving conflicts as it sees fit, such as having the "save object" method return the relativeURL, even though it was included as input. You could then have a simple backend chuck everything without a base dir into an Uploads/ directory.
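A minimal sketch of that idea, assuming a hypothetical backend class (not framework API) whose save method treats the relativeURL as a suggestion and returns the URL it actually used:

```php
<?php
// Hypothetical backend: the relativeURL is only a suggestion, and the
// backend reports back what it actually stored under.

class SimpleConflictResolvingBackend
{
    private $stored = array();

    /**
     * @param string      $content     Raw file contents
     * @param string|null $relativeUrl Suggested URL, relative to the repo root
     * @return string The relativeURL actually used
     */
    public function save($content, $relativeUrl = null)
    {
        if (!$relativeUrl) {
            // Context-free files (e.g. a bare File field on a DataObject)
            // get chucked into Uploads/ rather than cluttering the root
            $relativeUrl = 'Uploads/' . sha1($content);
        }
        // Resolve clashes by suffixing, instead of rejecting the upload
        $candidate = $relativeUrl;
        $i = 2;
        while (isset($this->stored[$candidate])) {
            $candidate = preg_replace('/(\.[^.]+)?$/', '-' . $i . '$1', $relativeUrl, 1);
            $i++;
        }
        $this->stored[$candidate] = $content;
        return $candidate;
    }
}
```

So two uploads suggesting `logo.png` would come back as `logo.png` and `logo-2.png`, and the caller records whichever it got.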

Or, as a complete alternative, the filesystem backend could be passed a parent object, which could be the Folder object that you're uploading to, or the DataObject on which the file-upload object sits. This would let the system be a bit more flexible in giving context to what the URL of the file should be - e.g. the URL of the file could be based on $parent->Link().

Regardless, I think the idea of context-free files floating in a large SHA-hashed bucket is better in theory than in practice, and that building some kind of context into the files in the bucket will help more than harm.

Will Morgan

unread,
Oct 24, 2014, 4:29:47 PM10/24/14
to silverst...@googlegroups.com
This is a great discussion to be having! There are definitely a few ugly parts of the file system that need modification.

Sam's idea of using Flysystem or <insert pre-rolled component here> appeals to me, and I think we should be taking that approach for more things in core instead of re-inventing the wheel. One other piece of functionality I'd like to have in core is the ability to download files from remote sources and add them to this new location-agnostic filesystem via a URL, S3 reference, etc.

I also see this as being a piece of work where we could really replace a lot of functionality without breaking much, if any, BC at all.

I guess the question is... who's working on something now? And if you are, post your fork so I can help :)

Hamish Friedlander

unread,
Oct 27, 2014, 4:06:03 AM10/27/14
to silverst...@googlegroups.com
"Just replicate exactly what we've got, except it can go in S3 too" would be an improvement over the current system, but there's no particular technical difficulty in it, it's just work.

However I can't see any way to solve versioning or replication across backends without some core support. So I'd oppose any solution that didn't at least have a clear idea of how to allow those problems to be solved.

Versioning is already an optional part of Framework. However I see it as an integral part of CMS. I've never worked on a CMS project where versioning wasn't desirable, and can count on one hand the number where it wasn't an absolute requirement. Currently it's almost inevitable that part way through a project we find ourselves saying to a client "this part is versioned, and won't affect the page until you click Save & Publish, but this other part affects the live site immediately". It's confusing to the client, and one of the most significant UX issues in the CMS in my opinion.

And we're basically *only* discussing CMS here, since the File class and the Files & Images section are in CMS.

Sam, I think we can probably design a backend interface API that lets us write both "versions and everything" backends, and "no features, backwards compatible" backends, but it will increase the size (and therefore complexity) of that interface. A good example: we'd need to figure out how to indicate to a user whether the backend they were using supported versioning or not. I'm still not sure what's driving the desire to keep the current system though, and figuring that out would help a lot. For instance, if it was to allow direct view of the current live assets, we could do what we do now with tables, where we have assets_versions and assets_Live.

Will, the reason I've brought up this discussion, and we've marked it as "planned" in UserVoice, is that we're looking to work on this (at least, as long as we can get general agreement on a solution we'd still like to see implemented). Current plans are to start fairly soon with technical experiments. Happy to involve you in that work if you'd like once it kicks off.

Hamish Friedlander

Simon J Welsh

unread,
Oct 27, 2014, 4:28:46 AM10/27/14
to silverst...@googlegroups.com
On 27/10/2014, at 19:06, Hamish Friedlander <ham...@silverstripe.com> wrote:
> "Just replicate exactly what we’ve got, except it can go in S3 too" would be an improvement over the current system, but there's no particular technical difficulty in it, it's just work.

Something shouldn’t need to be technically difficult before it gets solved.

> And we’re basically *only* discussing CMS here, since the File class and the Files & Images section are in CMS.

File, Folder, Filesystem and Image are all in framework, not CMS. AssetAdmin is the only part in the CMS.

---
Simon Welsh
Admin of http://91carriage.com/

Hamish Friedlander

unread,
Oct 27, 2014, 4:35:23 AM10/27/14
to silverst...@googlegroups.com

> "Just replicate exactly what we’ve got, except it can go in S3 too" would be an improvement over the current system, but there's no particular technical difficulty in it, it's just work.

> Something shouldn’t need to be technically difficult before it gets solved.

No, but I've got some people who are willing to solve the other bit. And if you ignore a feature because it's technically challenging when solving something, the risk is you'll have to do the work again if you want to include the feature later.
 
> And we’re basically *only* discussing CMS here, since the File class and the Files & Images section are in CMS.

> File, Folder, Filesystem and Image are all in framework, not CMS. AssetAdmin is the only part in the CMS.

Huh, was sure I checked this last week, and was semi-surprised to see they were in CMS. But you're right, they are in framework.

Hamish Friedlander

Ingo Schommer

unread,
Oct 27, 2014, 4:07:53 PM10/27/14
to silverst...@googlegroups.com
Awesome discussion! Somebody (Hamish?) should summarise this into a design doc soon though, it becomes quite time intensive to read ;)

Hamish wrote:
- By just storing by key, there's no difficulty keeping multiple backends in sync. When using multiple servers and storing locally, rsyncing between servers will never raise a clash. And we can just ask each registered backend in turn "got this key?", so when we're using S3 we can serve from S3 once the async replication has uploaded it, and off a local filesystem before then.

I like the idea of "cascading backends", but I can't see from the above discussion if the design also considers parallel backends? For example, you might want to store all images uploaded by content editors on S3, but get documents they reference out of their own readonly document management system (DMS). This could be solved by adding a backend type to each File record. While you could go and ask each backend for availability of this key, I'd say that's a performance issue, particularly for remote backends where this check is a relatively slow HTTP call.
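For what it's worth, the "ask each registered backend in turn" lookup could be sketched like this (all names hypothetical; a real implementation would need the per-record backend type or caching suggested above to avoid slow HTTP probes on remote backends):

```php
<?php
// Hypothetical sketch of cascading backends: first backend that has the
// key wins, e.g. local filesystem before async replication, S3 after.

interface KeyedFileBackend
{
    public function has($key);
    public function read($key);
}

class InMemoryBackend implements KeyedFileBackend
{
    private $files;
    public function __construct(array $files) { $this->files = $files; }
    public function has($key) { return isset($this->files[$key]); }
    public function read($key) { return $this->files[$key]; }
}

class CascadingFileStore
{
    private $backends = array();

    public function addBackend(KeyedFileBackend $backend)
    {
        $this->backends[] = $backend;
    }

    public function read($key)
    {
        // Ask each registered backend in turn: "got this key?"
        foreach ($this->backends as $backend) {
            if ($backend->has($key)) {
                return $backend->read($key);
            }
        }
        return null;
    }
}
```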

The case of a DMS backend might be a misuse of the API though, since other consumers are expected to contribute to the file repository there (create, remove and change keys as well as content), causing sync issues. Do we expect to be the only consumer and contributor to repositories used by the File API? We haven't discussed what happens to the Folder API much in here. I guess it would be only one way to structure file records, while others might rely on HTTP calls to list files and traverse hierarchies. Yet others might rely on non-hierarchical local organisation through tags (many-many). So far I haven't seen anything in the presented interfaces that would contradict these use cases, but we should consider these aspects when designing, right?

Regarding a smooth transition to this new system, I don't really see how this can gain adoption as "the way to manage files in SS" without core modifications. There are 72 uses of the ASSETS_PATH constant in core, and all of them hardcode assumptions about local filesystem usage.

Another big question: do we provide stream-based APIs? Flysystem does: http://flysystem.thephpleague.com/api/

I was wondering about asynchronous APIs to better support ReactPHP-style serving, but it seems the Flysystem guys decided it's not viable at the moment, so I'd be inclined to follow their decision: https://github.com/thephpleague/flysystem/issues/198
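Assuming we stay synchronous, a stream-based store in the style of Flysystem's readStream()/writeStream() could look something like this toy in-memory version (hypothetical class, just to show the shape; a real backend would stream straight to disk or to S3's multipart upload API so large files never live in memory as strings):

```php
<?php
// Hypothetical stream-based store, mirroring Flysystem's
// readStream()/writeStream() method pair.

class StreamingStore
{
    private $streams = array();

    public function writeStream($key, $resource)
    {
        // Copy the incoming stream into a seekable temp stream
        $dest = fopen('php://temp', 'r+');
        stream_copy_to_stream($resource, $dest);
        rewind($dest);
        $this->streams[$key] = $dest;
    }

    public function readStream($key)
    {
        rewind($this->streams[$key]);
        return $this->streams[$key];
    }
}
```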

Ingo


Conrad Dobbs

unread,
Oct 27, 2014, 5:55:10 PM10/27/14
to silverst...@googlegroups.com

On Tuesday, October 28, 2014 9:07:53 AM UTC+13, Ingo Schommer wrote:
> Awesome discussion! Somebody (Hamish?) should summarise this into a design doc soon though, it becomes quite time intensive to read ;)
>
> Hamish wrote:
> - By just storing by key, there's no difficulty keeping multiple backends in sync. When using multiple servers and storing locally, rsyncing between servers will never raise a clash. And we can just ask each registered backend in turn "got this key?", so when we're using S3 we can serve from S3 once the async replication has uploaded it, and off a local filesystem before then.
>
> I like the idea of "cascading backends", but I can't see from the above discussion if the design also considers parallel backends? For example, you might want to store all images uploaded by content editors on S3, but get documents they reference out of their own readonly document management system (DMS). This could be solved by adding a backend type to each File record. While you could go and ask each backend for availability of this key, I'd say that's a performance issue, particularly for remote backends where this check is a relatively slow HTTP call.

Originally I didn't see the benefit of having a standardised key, thinking it would be better if the backend returned the key based on the information passed to it when storing the file. I can see the benefit though if you're implementing multiple backends that need to store the same file under the same key.
 

> The case of a DMS backend might be a misuse of the API though, since other consumers are expected to contribute to the file repository there (create, remove and change keys as well as content), causing sync issues. Do we expect to be the only consumer and contributor to repositories used by the File API? We haven't discussed what happens to the Folder API much in here. I guess it would be only one way to structure file records, while others might rely on HTTP calls to list files and traverse hierarchies. Yet others might rely on non-hierarchical local organisation through tags (many-many). So far I haven't seen anything in the presented interfaces that would contradict these use cases, but we should consider these aspects when designing, right?

I'd been thinking about the possibility of using tags, but wasn't sure if it would be worthwhile as it is an edge case. Sounded cool in my head, just not sure how useful it would be. I pictured it being separated into two interfaces, a File interface, and a FileContainer interface. A Folder would implement the FileContainer interface.
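A rough sketch of that split (all names hypothetical, nothing here is framework API; a tag-based container could implement the same interface as Folder):

```php
<?php
// Hypothetical File / FileContainer split: Folder becomes just one
// implementation of FileContainer.

interface FileRecord
{
    public function getKey();   // e.g. the SHA hash used by the storage backend
    public function getTitle();
}

interface FileContainer
{
    /** @return FileRecord[] */
    public function getFiles();
}

class SimpleFileRecord implements FileRecord
{
    private $key;
    private $title;

    public function __construct($key, $title)
    {
        $this->key = $key;
        $this->title = $title;
    }

    public function getKey() { return $this->key; }
    public function getTitle() { return $this->title; }
}

// A hierarchical container, as Files & Images presents today
class FolderContainer implements FileContainer
{
    private $files = array();

    public function addFile(FileRecord $file)
    {
        $this->files[] = $file;
    }

    public function getFiles()
    {
        return $this->files;
    }
}
```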

What might be more/equally challenging is how this is integrated into the separate parts of the CMS, e.g. inserting media.

Sam Minnée

unread,
Oct 27, 2014, 11:44:40 PM10/27/14
to silverst...@googlegroups.com
On Monday, 27 October 2014 21:06:03 UTC+13, Hamish Friedlander wrote:
> "Just replicate exactly what we've got, except it can go in S3 too" would be an improvement over the current system, but there's no particular technical difficulty in it, it's just work.
>
> However I can't see any way to solve versioning or replication across backends without some core support. So I'd oppose any solution that didn't at least have a clear idea how to allow those problems to be solved.

I agree. However, I still see no evidence that we need to introduce an API that makes it impossible to build a backend that replicates the current behaviour.

My expectation for how this refactoring will play out is that in the short term, we'd see two backends:

 - A backward compatible, feature-restricted backend that either only saves files to assets, or saves files in the current simple structure to, say, a Flysystem endpoint
 - A new exciting backend with a different way of storing files that supports versioning and other fun stuff.

In general, I think we should design the API by which the File/FileDBField object talks to the backend to support a broad range of persistence approaches.

I would want some evidence that allowing the backward-compatible backend is going to make things really complicated before accepting that dropping it is the best way forward.

To boil it down: I will be happy if you include one of the following in the "file reference key" that gets passed to the backend when looking things up.

 - Either URLs-relative-to-the-file-repository-root
 - Or an optional reference to a parent object (such as a Folder, but for future use-cases this could be a SiteTree)

I think all of the other arguments about whether or not a storage scheme is too complex can then be side-stepped, because they come down to the implementation details of specific asset storage backends, which can be provided with modules, etc.
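As a sketch of what that "file reference key" might carry (all names hypothetical, just to make the two options concrete):

```php
<?php
// Hypothetical file reference: always a hash and basename, plus either an
// optional repository-relative URL or an optional parent object.

class FileReference
{
    public $hash;         // SHA of the content
    public $basename;     // always present
    public $relativeUrl;  // optional: URL relative to the repository root
    public $parent;       // optional: e.g. a Folder or SiteTree record

    public function __construct($hash, $basename, $relativeUrl = null, $parent = null)
    {
        $this->hash = $hash;
        $this->basename = $basename;
        $this->relativeUrl = $relativeUrl;
        $this->parent = $parent;
    }

    /** Where a backward-compatible backend might choose to put the file */
    public function legacyPath()
    {
        if ($this->relativeUrl) {
            return $this->relativeUrl;
        }
        // No context: fall back to an Uploads/ dir rather than the root
        return 'Uploads/' . $this->basename;
    }
}
```

A hash-bucket backend would simply ignore everything but `$hash`, which is what keeps the storage-scheme argument out of the core API.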

Mateusz Uzdowski

unread,
Jan 13, 2015, 11:47:11 PM1/13/15
to silverst...@googlegroups.com
Hi,

As discussed in the RFC thread, we have rolled this discussion into an RFC.

The "RFC-1 Asset abstraction" can be found at:
https://github.com/silverstripe/silverstripe-framework/issues/3792

If you strongly feel your opinion has not been heard, please let me know!

I will submit this tomorrow to core-committers to discuss if it can be approved in the current shape.

Best regards,
m

Mateusz Uzdowski

unread,
Jan 14, 2015, 4:56:28 PM1/14/15
to silverst...@googlegroups.com
Dear dev-mailinglisters,

There is some discussion going on in the comments on the GitHub issue for the RFC, in case you didn't notice :-)

m

Damian Mooyman

unread,
Sep 13, 2015, 9:09:46 PM9/13/15
to SilverStripe Core Development
Hi all,

Just to give an update on the progress of this feature, myself and others in my team have been developing a proof of concept for this RFC (at https://github.com/silverstripe/silverstripe-framework/issues/3792). I invite you all to review and comment on the following pull requests, before we merge this into core.


Part of the foundation work has been focused on refactoring and improving support for composite database fields which, up until now, has been little more than a sparse interface. I ended up refactoring the polymorphic foreign key class, partly into this base class, and moved the classname enum into a proper DBField class (with more appropriate detection of legacy classnames). This new field type is also used by the DataObject ClassName field.

The actual core of the asset abstraction work in the subsequent pull request focuses on these parts of the RFC:
  • Development of the APL (Asset Persistence Layer) interface
  • Implement a default APL using flysystem
  • Create a proof-of-concept database field to store and present assets
This initial proof of concept is quite limited, and does not implement any of the following:
  • UploadField support
  • Support for file variants (such as generated images / documents)
  • Integration with File dataobject (or AssetAdmin).
  • File versioning
  • Support for the fourth tuple variable ($parent object)
I hope that this makes sense to those who have contributed to the discussion up until now. ;) If I have missed anything which has been previously decided in relation to this feature, please drop a note on the pull request and we can continue discussion there.
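For readers following along, here's a guess at the rough shape of such an APL, with a toy in-memory implementation (method names are illustrative, not the actual pull request's API; variant and $parent support are omitted, per the list above):

```php
<?php
// Hypothetical Asset Persistence Layer interface, keyed on the RFC's
// (filename, hash) tuple. Variants are accepted but ignored in this toy.

interface AssetPersistenceLayer
{
    /** @return array Tuple identifying the stored asset: Filename and Hash */
    public function setFromString($data, $filename);

    public function getAsString($filename, $hash, $variant = null);

    public function getAsURL($filename, $hash, $variant = null);
}

class InMemoryAPL implements AssetPersistenceLayer
{
    private $store = array();

    public function setFromString($data, $filename)
    {
        $hash = sha1($data);
        $this->store[$hash . '|' . $filename] = $data;
        return array('Filename' => $filename, 'Hash' => $hash);
    }

    public function getAsString($filename, $hash, $variant = null)
    {
        return $this->store[$hash . '|' . $filename];
    }

    public function getAsURL($filename, $hash, $variant = null)
    {
        // Hash-prefixed URL so republishing a changed file gets a new URL
        return '/assets/' . substr($hash, 0, 10) . '/' . $filename;
    }
}
```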

Cheers!

Damian Mooyman
Reply all
Reply to author
Forward
0 new messages