SilverStripe 4.0 File management and DBFile validation

330 views
Skip to first unread message

Damian Mooyman

unread,
Oct 7, 2015, 7:30:37 PM10/7/15
to SilverStripe Core Development
Over the last few weeks I have been working with the Open Sourcerers team to develop a set of new features for SilverStripe 4.0. The goal is to implement the RFC-1 asset abstraction functionality, as presented by Mateusz at https://github.com/silverstripe/silverstripe-framework/issues/3792

Just to give an indication of progress, we have already merged a prototype DBFile field into master, and am currently in the process of updating File dataobject to use this field, rather than itself being the manager for the physical underlying files.


Ok, so here's the problem I have run into, and am hoping that I could find help with.

Apart from FormField validation, it's also necessary to provide data-object level validation (triggered on write) to ensure that the underlying save of any file (whether to a DBFile field or a File dataobject) is saving the correct file type. This validation is extremely important when user code is relying on image content; Front-end templates sometimes rely on certain files being images, so that they can be resized, and we don't want to force template-level code to deal with that validation.

At the moment, the current status of the API looks similar to the below:

class File extends DataObject {
   
private static $db = array(
       
'File' => 'DBFile'
   
);
}

/** @deprecated */
class Image extends File{}

class DBFile extends DBField {
   
// contains code that used to be in File.php
}

class ExampleObject extends DataObject {
   
private static $db = array(
       
'BannerField' => 'DBFile'
   
);
   
private static $db = array(
       
'BannerObject' => 'File'
   
);
}


The issue with the current state of this API is that there is no way to force fields to be allowed as images only. Image ended up deprecated for the time being, in lieu of a solution for specifying a "DBFile which can only be an image".

A potential solution has been proposed, but would rely on certain other problems being solved. This API could look similar to the below:



class File extends DataObject {
   
private static $db = array(
       
'File' => 'DBFile'
   
);
}

class Image extends File{
   
public function __construct() {
        $this
->File->setAllowedCategories('image');
   
}
}

class DBFile extends DBField {
   
public function construct($types){
        $this
->types = $types;
   
}
   
public function validate() {
       
// some kind of error handling
   
}
}

class ExampleObject extends DataObject {
   
private static $db = array(
       
'BannerField' => 'DBFile(image)'
   
);
   
private static $db = array(
       
'BannerObject' => 'Image'
   
);
}



In this case, Image means "a file which can only be an image" and DBFile() or File means "a file of any type, including images". 

The issue with this code is that as DBFile is a DBField, not a FormField, thus the DBFile::validate method is never called as a part of normal processing. Because the model exists at a lower level to the form subsystem, it's expected that this error will not be able to be run through the normal form validation, which surfaces errors to the user on the submitted form.

In this case, correct assignment of images to these fields rely on:

1. The user of this code creating an UploadField with the correct allowed file types.
2. If the user doesn't do this, then some mechanism exists to force validation to be checked on all DBFields prior to saving a dataobject.

It is necessary to ensure that, regardless of the formfield validation, the underlying data model can never have necessary constraints violated.

As it stands, DBFields such as SS_Datetime, will simply set "NULL" as the value if assigned an invalid value, without surfacing any notices.

There are a couple of directions we could take to ensure data consistency:

Option 1: maintain the status quo

DBFields will validate themselves on setValue, and set themselves to null if they do not have a valid value. If the user does NOT create a formfield with the necessary validation, this means that errors could be nullified without useful front-end errors. DBFile('image') would simply nullify any images assigned with extensions that aren't image types, as would the Image dataobject.

If the user sets up their formfields properly, errors will be properly displayed there before this validation step is reached.

We can still bake these rules into the necessary auto-scaffolding, and if a developer follows the same rules for generating custom fields, they should expect that they never run into fields being nullified on error.

Option 2: Hook DBField validation into DataObject::validate

On saving of dataobject, it will have to loop through each database field in order to validate itself.

class DataObject extends ViewableData {
        public function validate() {
         $result = ValidationResult::create();
foreach ($this->config()->db as $field => $spec) {
$field = $this->dbObject($field);
$field->validate($result);
}
$this->extend('validate', $result);
return $result;
}
}

This would sacrifice some performance, since we could potentially have many fields on a DataObject, and is quite a complicated solution to ensure DBFiles assigned directly to a DataObject are validated. There is also the risk that subclasses of DataObject could override this incorrectly, blocking validation on DBFields.

Option 3: Exception on invalid type

This is similar to option 1, but instead of nullifying, an exception would be generated.

This could cause disruption to workflow in a small number of cases, meaning errors will surface as CMS ajax popups rather than via the normal form validation (possibly). Throwing an exception (even if it's a ValidationException) does not guarantee it will be rendered as a part of normal form validation. However, this is the most robust for the sake of data validation, as invalid values can never be saved.

If user-code is diligent in the generation of their formfields, such that the correct file type filters are assigned to an UploadField, they should be able to avoid this. However, we could expect that a user could neglect to set these properly when not using the default form scaffolder.

To a degree, this is already a problem in the current 3.x, as users can create UploadFields for Image dataobject without setting the type validation rules.

Option 4: Something I haven't thought of that you have.

Fill in the blanks. :)





Of these solution, my preference is probably option 3. Does anyone else have a feeling on how this issue should be addressed?

Thanks for all your assistance. :)

Kind regards,

Damian Mooyman


Marcus Nyeholt

unread,
Oct 7, 2015, 8:44:06 PM10/7/15
to silverst...@googlegroups.com
I'd lean towards #3 too, as the data validation being done is _model layer_ validation, regardless of how the data is being added/saved; handling that exception is something the UI layer should manage as a separate responsibility, but it's a model layer data issue. 

I put together a constraints module (https://github.com/nyeholt/silverstripe-constraints) to handle model data verification at the model level rather than the form level, which in my opinion is the wrong spot for it (not to say that _form data_ verification is not useful). 

Configuration looks like

private static $constraints = array(
'Notes' => array('MaxLengthConstraint' => 'length=1000'),
// or expanded as
'Status' => array(
'RegexConstraint' => array(
'regex' => '^([A-Za-z0-9 \@\(\)\*\/\:\.\,\&-]+)$',
'message' => 'Please only use alphanumeric characters, and "@()-*/:.,&"',
)
),
);

A simple constraint check ends up being something like

public function holds() {
$length = $this->opt('length', 0);
$convertNewlines = $this->opt('convertnewlines', 1);
$val = $convertNewlines ? str_replace("\r\n", "\n", $this->getValue()) : $this->getValue();
return mb_strlen($val, 'utf-8') >= $length;
}
or a little more complex

public function holds() {
$perm = $this->opt('permission');
if (!$perm) {
return true;
}
$object = $this->object;
if ($object instanceof \DataObject) {
$changed = $this->object->isChanged($this->field, 2);
if ($changed) {
return \Permission::check($perm);
}
return true;
}

so for a DBField, I'd have a constraint holds() method check the extension of $this->getValue() (or however that's resolved to the actual filename) so that config would be something like

static $db = array('Image' => 'DBFile');
static $constraints = array(
'ImageFile' => array('AllowedExtension' => 'ext=jpg,png,bmp')
);

Cheers,

Marcus

--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to silverstripe-d...@googlegroups.com.
To post to this group, send email to silverst...@googlegroups.com.
Visit this group at http://groups.google.com/group/silverstripe-dev.
For more options, visit https://groups.google.com/d/optout.

Damian Mooyman

unread,
Oct 7, 2015, 9:31:22 PM10/7/15
to silverst...@googlegroups.com
Thanks Marcus, it's really good to see some alternative solutions. I'll see if I can use ideas from your work in mine.

Damian Mooyman | Senior Platform Developer
SilverStripe

Skype: tractorcow


--
You received this message because you are subscribed to a topic in the Google Groups "SilverStripe Core Development" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/silverstripe-dev/cMRZ8HVe4Os/unsubscribe.
To unsubscribe from this group and all its topics, send an email to silverstripe-d...@googlegroups.com.

Marcus Nyeholt

unread,
Oct 7, 2015, 9:42:30 PM10/7/15
to silverst...@googlegroups.com
I ended up not having the time back then to push it further, but I'd really like to see some kind of model layer level constraints management as a core part of SS, so if you're going to look at the problem from that aspect, happy to help! 


Jonathon Menz

unread,
Oct 8, 2015, 1:25:01 PM10/8/15
to SilverStripe Core Development
I might be thinking a little broadly here so feel free to ignore. But it occurs to me that when we say Image in SilverStripe we mean a Raster Graphic, specifically a GIF, PNG or JPEG. These are the most common image formats traditionally seen on the web. But SVGs (scalable vector graphics) are getting a lot more popular and that's only going to increase. This is the tip of the iceberg too, ImageMagick alone support over 100 'major' formats.

Depending on whether you use GD or ImageMagick, and what version of those modules you're running, and what version of a file format spec a file uses, will all impact on whether or not we can treat a particular file as an image. Who knows what formats will be invented and supported in the future too... I imagine it won't be long before Live Photos from the new batch of iPhones are being uploaded to SilverStripe websites. We might not be able to treat those files as images right now, but we surely will at some point. The status of a file as an image is context sensitive, so I feel like it should be up to developers on individual projects to decide which formats they want to consider images, and they should be able to update that list as technology changes. Casting files specifically as Images in the database could limit this.

So I'm wondering if we could ditch the Image dataobject type, and where an image selection needs to be enforced we just have file format validation on the form. Where we want to list images only we could filter on file extension, and provide an easy way of accessing this based on what the config system says are valid image types (and may have been overridden by a developer). Side note: File extension alone might be a risky way to determine whether a file is an image. I know I've seen users mess up file names plenty of times and have seen the framework get confused when it expects .jpg but gets .JPG, .jpeg or .JPEG. Or what if someone renames my-word-doc.docx to my-word-doc.jpg?

To plug the gap we could perhaps create a RasterGraphic object that subclasses ViewableData and contains the methods currently found on Image. Then on File we have a getRaster() method, which attempts to create and return an Image object. Some helper methods like isRaster(), canRasterise() and isVector() on File could help with this and take environmental variables in to account. This could form the base for a flexible rendering system. One example use case is that you could measure the file size of an SVG and return a rasterised PNG instead if it will create a smaller file.

Anyway just the start of an idea - but if it is worth considering, probably best to do so now amidst this major File refactor.

Daniel Hensby

unread,
Oct 9, 2015, 9:53:45 AM10/9/15
to SilverStripe Core Development
I think I agree with Jono here.

I'm not sure that we should make a distinction between a "file" and an "image" - an image is just a file that we want to assign some special behaviour to, right? It's not actually important to the DB layer if this is or is not an image nor really to the ORM either.

It's only important when we want to make some kind of assumptions about how we can manipulate it in templates. Personally, I'd prefer if instead of relying on some magic on a file-like object which happens to be an image, we instead pass the file to an image manipulator and use that to perform manipulations instead.

The responsibility of ensuring that Images are uploaded when they are expected would remain the role of the developer (as it is now).

Perhaps we could register "manipulators" to the file object to allow us to do $File.ManipulateAsImage.SetSize or $File.ManipulateAsDoc.toPDF as some examples of how this could work. (sure the syntax here is verbose, but I'm doing that on purpose for readability of my example).

Alternatively a manipulator could inject methods onto object if they meet the criteria the manipulators expect (effectively just a DataExtension, I suppose) so that the "traditional" $File.SetSize() would still work.

Dan

Patrick Nelson

unread,
Oct 9, 2015, 10:44:40 AM10/9/15
to silverst...@googlegroups.com
I'd be interested in a simple API. I can genuinely appreciate the simplicity of having something like an "Announcement" object and then simply getting the "Image" associated with it. I understand the necessity for the behind the scenes complexity of needing to abstract away the Image'yness at a lower level. Just when I get to the template, I'd like to preserve the ability to simply <% loop $Announcements %>$Image $Title<% end_loop %> whilst also being fairly DRY and just defining the fact that I have a ::$has_one = ["Image" => "Image"] (or => "DBFileImage" or what-have-you). Calling something like ->SetRatio() should be available immediately off my ->Image() relation. Stuff like that.

Would any of these suggestions take away from that in any form? I know this is a pretty in-depth discussion so please excuse my shortcut taking :)

Jonathon Menz

unread,
Oct 9, 2015, 12:43:04 PM10/9/15
to SilverStripe Core Development
@patrick it would depend how far we went. For the most flexible foundation I think you would need to add one extra step (i.e. an extra method in the chain) in templates, because the Image methods wouldn't be available initially. But it could open up some really powerful possibilities for converting files in to different formats - like rendering a PDF as an image for instance, or fetching a frame from a video and creating a thumbnail. It wouldn't need to stop at Images either - as Dan suggested (and with the appropriate module installed) you could potentially convert a Word Doc to a PDF, or maybe extract the audio from a MOV file and return it as an MP3.

We could also add a forTemplate() method to File which looks at the file format and renders it appropriately if it can. This would allow you to keep using your basic $File $Title example and get an image tag if $File is an image. For manipulations though you'd probably have to do something like $File.Raster.ScaleWidth(200). The cool change here is that $File might be a JPG or it might be a PDF or an SVG - if it's possible to rasterise the resource you'll get an image file back. Might look like getRaster($format='auto') so you can specify an image format if you want to.

I think an approach like this would be more future proof and environment proof than what we're currently doing and could set a pattern for manipulating files beyond just raster graphics.

Patrick Nelson

unread,
Oct 9, 2015, 6:59:04 PM10/9/15
to silverst...@googlegroups.com
Personally I'm not too keen on the type insensitivity. I have a custom extension setup for responsive images which I've setup and incorporated the ability to chain multiple "Media()" methods which facilitates definition of multiple possible media queries, different aspect ratios and etc. Doing this wouldn't make sense on files; only images and this is used throughout the site. Once that gets rendered (i.e. reaches the end of the chain and the template automatically triggers ->forTemplate() method) it then figures out how to generate the appropriate HTML.

I like the idea of defining constraints/structure in terms of schema definition (e.g. static ::$db), not just in terms of whatever I happen to be setting on CMS fields in ->getCMSFields(). Perhaps I'm not understanding why the below code example would get in the way of some of the desired features defined in this RFC? What about this class definition (and subsequent implied template capability) would obstruct those needs?


class MenuItem extends DataObject {

private static $db = [
"Title" => "Varchar",
];

private static $has_one = [
"Image" => "Image",
"PDFDownload" => "PDFFile", // Custom extension of File object
"Category" => "MenuCategory",
];

// ... standard stuff below.

}

In the example above, the Image or PDF files (by proxy of File) could still be extensions of the proposed "FileReferenceField" which handles the nitty gritty details of where/how the file exists and what to do to generate URL's to that file and etc. I can still point to an "Image" object, maybe custom or maybe SS core, that defines a specific set of Image-like functionality that I can use (e.g. ->SetSize or my own extensions) which are naturally exclusive to that type of thing. That way you're not going to get unexpected results with calling SetSize on a PDF file or calling "GeneratePreviewImageFromPDF" (facetious example) in the template or in some random supporting layer behind the template. When I do a ->forTemplate() (implicitly by just putting the object reference in the template), I may not always want to do it the same way; I may want or need to customize my object in the template/view without being force to abstract everything way to yet another new method, but that's lesser to my point.

Does that make any sense? Am I off base in thinking the above example (or some variant of it) will not be possible? I'd be fine with calling those fields something different, as long as I can still simple have $item->Image() and $item->PDFDownload() and access my convenience methods of ->URL() and/or ->SetSize() and etc on the proper objects where it is relevant. I don't want too much magic behind the scenes; just a well abstraction to simplify my work (which this is well on its way toward anyway).

Marcus Nyeholt

unread,
Oct 9, 2015, 7:24:28 PM10/9/15
to silverst...@googlegroups.com

That actually raises the issue of end user (developer) intent, versus the actual underlying framework handling of things.  In that declaring

has_one Item => Type

at the moment is binding Item to a specific PHP class type. Whereas the actual intent is that Item is an object that behaves like a Type - but should that meab that it has to be that PHP Type?

What I'm getting at is that maybe the framework should be looking at

has_one Pdf => PDFFile

And interpreting that as

has_one Pdf => File
constraints Pdf => filetypeconstraint(pdf)

This probably also has crossovers with trait implementation in some way

Patrick Nelson

unread,
Oct 9, 2015, 7:40:00 PM10/9/15
to silverst...@googlegroups.com
Well, is there no reason why you cannot have a class (assuming it didn't already exist) called "Image" that had a protected property of  $type = "Image" (or just inferred via get_called_class()) which itself extended FileTypeConstraint?

This really avoids the mess of having magical methods which only exist or are relevant depending on a property value and instead can just be defined using PHP's built in capability of soloing methods/functionality on a per class basis. Like we already do now. But yes -- you basically need to get a jack/lift, overhaul the interstitial layers of abstraction and put it back into place so that it's (as it's supposed to be) transparent at the DataObject relation level. I don't see any issues with these files being represented as DataObjects. I think it is however the responsibility of the "File" abstraction (as it currently stands in v3.x) to no longer be so tightly bound to the file system as it is now and instead begin storing your polymorphic relationships there (under File) as inferred/set via current class relationships and etc. It can (and should) still create an actual instance of the class you're pointing to, but now it's transparently handling information about where/how it's being stored (e.g. filesystem vs. AWS vs. RackSpace files)

Sorry I'm a bit tired (long day) and this isn't coherent. 

Jonathon Menz

unread,
Oct 9, 2015, 9:01:20 PM10/9/15
to SilverStripe Core Development
I don't know specifically how something like this should be implemented - all I'm trying to say at this point is that I don't think the classification of a file as an image is black and white, so maybe we shouldn't be making the call about how to treat a file at the database level. If a file has any kind of visual component, maybe you can treat it as an image with the right tools - I don't think one set of validation rules will work for every environment. Plus consider this - if you rename a txt file to words.txt.jpg and upload it to SilverStripe, all the File functions will work fine because it's still a file. But it's going to be stored as an Image, which it just isn't. So if we're talking about validation, I think it's better to do validation when you want to actually work with the file and say "Can I interpret this file as a raster graphic right now in this environment?"
...
Reply all
Reply to author
Forward
0 new messages