How to get resource size information from Trellis


Christoph Knabe

Mar 11, 2021, 9:28:10 AM
to Trellis LDP
Hello folks,
for accounting purposes at our start-up spoter.ME we need to sum up the size of all resources belonging to a POD.

For example, look at the source view of https://chris31.inrupt.net/public/

You will find an st:size property for each entry in this container.
How can I make Trellis or the Trellis ResourceService return such information?

The Solid HTTPS REST API Spec notes:
"For every resource in a container, a Solid server may include additional metadata, such as the time the resource was modified, the size of the resource, and more importantly any other RDF type specified for the resource in its metadata."
Judging by the Trellis source code, Trellis itself does not store the resource size. So what would be the recommended way to get this information while keeping the flexibility of different ResourceService implementations?

Thanks in advance, Christoph

Aaron Coburn

Mar 11, 2021, 9:47:26 AM
to trell...@googlegroups.com
Hi Christoph,

Thanks for the question. You are correct that Trellis does not return this information.

The important thing to note is that Trellis implements the Linked Data Platform specification, not the Solid Specification.

As for the st:size property, while the Node Solid Server uses that property, it's a rather filesystem-specific property, and I'm really not sure how relevant that is for servers that don't use filesystem-based resources. One might be able to make an argument for calculating size for nonRDF resources, but for RDF, byte size is highly variable depending on the serialization used. There are even more difficulties for containers given that containment triples are dynamically computed. Membership triples are similarly problematic.
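To illustrate the serialization point, here is a minimal sketch that encodes one and the same triple as N-Triples and as Turtle and compares the UTF-8 byte lengths. The subject IRI and the class name are made up for illustration; nothing here uses Trellis itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: one triple, two serializations, two byte sizes.
final class SerializationSizeSketch {
    // The triple written as N-Triples (full IRIs, no prefixes).
    static final String N_TRIPLES =
        "<https://example.org/doc> <http://purl.org/dc/terms/title> \"Report\" .\n";

    // The same triple written as Turtle with a prefix declaration.
    static final String TURTLE =
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        + "<https://example.org/doc> dcterms:title \"Report\" .\n";

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("N-Triples: " + utf8Length(N_TRIPLES) + " bytes");
        System.out.println("Turtle:    " + utf8Length(TURTLE) + " bytes");
    }
}
```

Both strings describe the same graph, so any "size" reported for an RDF resource depends on which representation is measured.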

There is also a security angle -- producing metadata about resources that one might not have access to is, arguably, quite problematic. Effectively: I wouldn't recommend supporting this even if those data were available.

Still, if you really want that information, your option would basically be to implement your own persistence layer (ResourceService). There are two existing ResourceService implementations in the main trellis repository (TripleStore and JDBC) and a Cassandra-based one in the trellis-extensions repository that you could use as inspiration.

I hope that helps!
Aaron






Christoph Knabe

Mar 12, 2021, 5:31:53 AM
to Trellis LDP
Hi Aaron,

thanks for your quick and detailed answer.
I agree that the size of an RDF triple set is not exactly defined and probably will not be the cost-deciding factor.

But binary resources can easily cause significant storage costs. That is why I would like to capture the resource sizes. My approach would be to store the size of each Binary resource in its BinaryMetadata.

1) In FileBinaryService.setContent, java.nio.file.Files.copy is called and returns the number of bytes transferred. Do you think I could store its return value in the BinaryMetadata of the Non-RDF resource?

2) Another variant would be to intercept the PUT/POST requests for binary resources and modify the BinaryMetadata accordingly.
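Variant 1 could be sketched roughly as follows. This is not the actual FileBinaryService code, only an illustration of capturing the value that java.nio.file.Files.copy returns; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Trellis.
final class CopySizeSketch {
    // Writes the stream to the target file and returns the byte count
    // reported by Files.copy, so the caller can record it as metadata.
    static long storeAndMeasure(InputStream content, Path target) {
        try {
            return Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("binary-", ".bin");
        long size = storeAndMeasure(new ByteArrayInputStream(new byte[1024]), target);
        System.out.println("stored " + size + " bytes");
        Files.delete(target);
    }
}
```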

Do you think my approach is feasible, and which variant would be easier to implement?

Thanks and best wishes,

Christoph

Aaron Coburn

Mar 12, 2021, 2:57:06 PM
to trell...@googlegroups.com
Hi, Christoph,
You can definitely store a binary's size in the BinaryMetadata object. Take a look at the BinaryMetadata::getHints method, which a binary service can use to store arbitrary data about a given NonRDF resource.

That data can then be used to inform the ResourceService about the size of a particular NonRDF resource.
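As a rough sketch of that hand-off, the example below models the hints as a plain Map<String, List<String>>. That shape, the "size" key, and the class name are assumptions made for illustration; check the BinaryMetadata Javadoc of your Trellis version for the actual return type of getHints.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper: hints are modelled as a plain map (an assumption).
final class SizeHintSketch {
    // Hypothetical hint key; Trellis does not define it.
    static final String SIZE_HINT = "size";

    // What a binary service could put into the hints after a write.
    static Map<String, List<String>> withSize(long bytes) {
        return Map.of(SIZE_HINT, List.of(Long.toString(bytes)));
    }

    // What a resource service could do when it receives the hints.
    static OptionalLong readSize(Map<String, List<String>> hints) {
        List<String> values = hints.getOrDefault(SIZE_HINT, List.of());
        if (values.isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(values.get(0)));
        } catch (NumberFormatException e) {
            // A malformed hint is treated as "size unknown".
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(readSize(withSize(2048L)));
    }
}
```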

This will certainly involve extending some of the existing classes, but it shouldn't be too difficult. The security concerns I mentioned earlier still apply, so I wouldn't want that to go into the core Trellis code, but for a downstream application, you could extend the existing classes easily enough.

In terms of intercepting the PUT/POST requests, the challenge has more to do with streaming that into the back-end system. The core of Trellis allows for direct streaming into a backend binary storage system. It also allows for capturing that data, buffering/staging it somewhere and then passing it through to a backend. That is all the job of the BinaryService implementation. If you are implementing your own BinaryService (this is not difficult), there are a lot of different ways to do this -- all of that is really just implementation detail.

Personally, if I were implementing this feature using a file-based store, I wouldn't worry about capturing the data on writes. I would instead extract it on read operations directly from the underlying store. If the concern is about enforcing quotas, that would suggest a different set of considerations, but either way, I would rely on the data provided by the underlying blob store rather than trying to manage it manually based on inflight HTTP bodies.
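For a file-based store, reading sizes from the store itself could look like the sketch below. It assumes that each binary is a regular file and that all binaries of one pod live under a single directory; that layout and the names used are hypothetical, not how any particular BinaryService arranges its files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Trellis.
final class StoreSizeSketch {
    // Size of one stored binary, as reported by the file system.
    static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sum of the sizes of all regular files below the given root directory.
    static long totalSize(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(StoreSizeSketch::sizeOf)
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(totalSize(root) + " bytes under " + root);
    }
}
```

Because the numbers come from the store at read time, they stay correct even if a write was interrupted or a file was changed outside the HTTP layer.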

Cheers, Aaron



Christoph Knabe

Mar 15, 2021, 7:31:29 AM
to Trellis LDP
Hi Aaron,
thank you very much for your explanations.
Now I have some material to consider.
Best wishes,
Christoph
