External Content: Redirect or Proxy?


Peter Matthew Eichman

Jan 9, 2018, 11:49:22 AM
to fedor...@googlegroups.com
Hello all,

Are binaries with external content (i.e., Content-Type: message/external-body; URL=https://...) served via an HTTP redirect (thus exposing the external URL) or are they proxied by the fcrepo server?

If they are not proxied by default, is there a way to set that up? I feel like proxying would allow you to rely on fcrepo for resource authorization without having to duplicate access rules on the web server serving the actual binaries. In addition, that server could be locked down to only accept requests from the fcrepo server.

Thanks,
-Peter

--
Peter Eichman
Senior Software Developer
University of Maryland Libraries

Esmé Cowles

Jan 9, 2018, 1:07:13 PM
to fedor...@googlegroups.com
Peter-

Currently, it just sends an HTTP redirect. But the spec says it may proxy the content or copy it locally. And for the reasons you outline, I think that proxying would be better in many ways. There is a ticket for adding support for local-file external content, but that doesn't look like it will necessarily enable proxying for remote HTTP content. Though once the local-file proxying is in place, adapting it to support proxying HTTP would probably be a pretty small enhancement.
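
To make the distinction concrete, here is a minimal Python sketch of how a client could tell the two behaviours apart from the response alone. The function name and the labels it returns are illustrative assumptions, not part of fcrepo or the spec; the sketch relies only on the standard meaning of 3xx status codes and the Location header.

```python
def classify_binary_response(status: int, headers: dict) -> str:
    """Guess how a server delivered an external-content binary.

    Hypothetical helper: 'redirect' means the client was sent on to the
    external URL (which is therefore exposed); 'served' means the server
    answered with the bytes itself (stored, copied, or proxied).
    """
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status in (301, 302, 303, 307, 308) and "location" in lowered:
        return "redirect"
    if 200 <= status < 300:
        return "served"
    return "other"


# Example with made-up responses.
print(classify_binary_response(307, {"Location": "https://files.example.org/a.mpg"}))
print(classify_binary_response(200, {"Content-Type": "video/mpeg"}))
```

Note that a client cannot distinguish a proxied binary from a locally stored one this way, which is the property Peter is asking for.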

-Esmé
> --
> You received this message because you are subscribed to the Google Groups "Fedora Tech" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to fedora-tech...@googlegroups.com.
> To post to this group, send email to fedor...@googlegroups.com.
> Visit this group at https://groups.google.com/group/fedora-tech.
> For more options, visit https://groups.google.com/d/optout.

Benjamin Pennell

Jan 10, 2018, 9:21:33 AM
to Fedora Tech
To follow on what Esme said, I am currently working on the ticket to add support for local-file proxying, and following that it should be pretty simple to add proxying of http external body resources as well.  There's just not a way to request it in the API at this point to the best of my knowledge, so I'm not clear on how to expose it.  I haven't looked into that too much yet, but if others have suggestions on how to request that behavior I would be interested.

Peter Matthew Eichman

Jan 10, 2018, 11:02:23 AM
to fedor...@googlegroups.com
Thanks Esmé and Ben! I'm glad to hear this is a feature that is being worked on.

Regarding how to request proxy vs. redirect behavior (is that your question, Ben?), I feel like that is something that should be specified by the resource in the repository, and not a behavior controllable by the client via the REST API. For instance, for our potential use case, we would not want to expose the HTTP URIs for the externally stored binaries, nor would we want to give an arbitrary client the ability to override that decision. Thus it seems to me like a parameter on the MIME type would be the most appropriate; something like "message/external-body; access-type=URL; URL=https://...; retrieval-strategy=proxy". But there does not appear to be any standard for this. Maybe it could also be stored in a triple on the binary's metadata.
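
As a rough illustration of the MIME-parameter idea, the sketch below parses such a Content-Type value with Python's standard email package. The "retrieval-strategy" parameter is the non-standard one suggested above; the function name, the example URL, and the fallback to "redirect" when the parameter is absent are assumptions made for this sketch only.

```python
from email.message import Message


def parse_external_body(content_type: str) -> dict:
    """Split a message/external-body Content-Type into its parameters."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body Content-Type")
    return {
        "access_type": msg.get_param("access-type"),
        "url": msg.get_param("url"),
        # Non-standard parameter from the suggestion above; defaulting
        # to "redirect" is an assumption of this sketch.
        "retrieval_strategy": msg.get_param("retrieval-strategy", "redirect"),
    }


# The URL is quoted so that any ';' inside it cannot be mistaken for a
# parameter separator.
header = ('message/external-body; access-type=URL; '
          'URL="https://example.org/video.mpg"; retrieval-strategy=proxy')
info = parse_external_body(header)
print(info["url"], info["retrieval_strategy"])
```

Because the strategy travels with the stored Content-Type, it stays a property of the resource rather than something each client request can override.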

-Peter



Benjamin Pennell

Jan 10, 2018, 12:57:30 PM
to fedor...@googlegroups.com
Thanks Peter, yes that is what I meant in a much better articulated way :)  And I agree that whether something is proxied or not should be controlled by the server.  Including an extra parameter in the content type seems like a good approach to me, and keeps the details of the external body together.

Similarly I've added a non-standard subfield to allow clients to specify a mimetype for the referenced content (since local-files aren't behind a web server to provide a content-type response header) since submitting two Content-Type headers wasn't an option.  Once I open the PR, feedback would be very much appreciated on the design if you're interested.  Hopefully that should be later today.

--
You received this message because you are subscribed to a topic in the Google Groups "Fedora Tech" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/fedora-tech/kMVBaX5kViU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to fedora-tech+unsubscribe@googlegroups.com.

Benjamin Armintor

Jan 10, 2018, 2:10:08 PM
to fedor...@googlegroups.com
Hi,

I'd be remiss not to mention that "whether something is proxied or not should be controlled by the server" is not uncontroversial: Even administrative clients or privileged ones are still clients, and the variability of networked storage might make proxying punitive in performance terms (cf questions historically about how to map an FCR3 datastream URI to a file path, or the expressed interest in using HEAD to a remote-storage LDP-NR to get the "real" path).

Also, for what it's worth, submitting two Content-Type headers *should* be an option (it's a behavior that is explicitly described in RFC2017 https://datatracker.ietf.org/doc/rfc2017/) and (in the event that your external body includes a local file path) there's the "local-file" external body type in RFC 1521.

- Ben

Andrew Woods

Jan 10, 2018, 2:51:15 PM
to fedor...@googlegroups.com
Hello Peter and Bens,
Maybe this has not been completely resolved, but after a variety of previous conversations in GitHub and otherwise, I believe we are moving away from Redirects in favor of Proxying content.

Related:

Regards,
Andrew

Benjamin Armintor

Jan 10, 2018, 3:03:27 PM
to fedor...@googlegroups.com
Right, but critically that proxying behavior [SHOULD] expose the external location- I might have read too much into the original post, though!

Benjamin Pennell

Jan 10, 2018, 3:09:56 PM
to Fedora Tech
Ben A,

In terms of multiple Content-Type headers, I am going to include a note about that in my PR, but basically I found out the hard way that some web containers strictly require there to be a single Content-Type header on requests, including the one being used for integration tests in fcrepo.

I don't have an opinion really on whether access-type=url resources should be proxied or how that should be controlled.  I am leaving the proxying aspect intact at present, but could change that if it comes up in feedback.

Benjamin Pennell

Jan 10, 2018, 3:52:02 PM
to Fedora Tech
Here is the PR if anyone is interested.  There are definitely decisions that could use outside opinions:

Aaron Birkland

Jan 10, 2018, 5:05:14 PM
to fedor...@googlegroups.com

Hi all,

A little worried that too much information is being encoded in the external body mime type; it gives the impression of hack-upon-hack-upon-hack. For me, things started to get a little uncomfortable with using the ‘expires’ header to mean “the server MUST copy this”; now the concern is encoding mime types in mime types. It can be done if the spec is explicit and clear about it, but it makes me wonder whether a different approach might be more straightforward.

I wonder if it’s possible to make things a bit more manageable (and explicit) by defining a number of _Prefer_ headers to specify a client’s intent (which the server may or may not obey, but has the Preference-Applied mechanism by which to communicate what it ended up doing). I think such an approach might be more flexible going forward for implementations that would _like_ to approach external content via redirects where it makes sense, or for implementations that only do proxying, etc.

Imagine if a request could use one of these Prefer headers (or something like them):

Prefer: external-content; method="copy"
Prefer: external-content; method="proxy"
Prefer: external-content; method="redirect"

.. then used the Content-Location header to provide the external content location, and Content-Type to specify *its* content type.

So the meaning of the Prefer header is something like “Please handle this as external content using the requested mechanism”.

So something like:

POST /path/to/container
Content-Type: video/mpeg
Prefer: external-content; method="proxy"
Content-Location: file:/path/to/file

Then the response:

HTTP/1.1 201 Created
Preference-Applied: external-content; method="proxy"
Content-Type: video/mpeg
Content-Location: file:/path/to/file
Location: http://my.server.example.org/path/to/container/abc123

.. or let’s say we wanted to ingest-by-reference:

POST /path/to/container
Content-Type: video/mpeg
Prefer: external-content; method="copy"
Content-Location: http://example.org/videos/funny_cat_video.mpg

The server could suitably respond:

HTTP/1.1 202 Accepted
Preference-Applied: external-content; method="copy"
Location: http://my.server.example.org/path/to/container/abc123

(meaning “yes, you asked me to copy. I’m attempting it, but have no idea if it’ll be successful or how long it’ll take. Check the messaging bus for indication of success”)

 

I don’t think the “access-type” part of the external body spec brings much to the table; the server would use the URI scheme of the provided Content-Location to know how to dereference it.
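
A minimal Python sketch of the proposal above, for illustration only: it builds the request headers and derives the access type from the URI scheme of Content-Location. The "external-content" preference is the one proposed in this message and is not registered or implemented anywhere as far as this thread shows; the function names and the set of accepted schemes are assumptions.

```python
from urllib.parse import urlsplit

# Methods named in the proposal above.
ALLOWED_METHODS = {"copy", "proxy", "redirect"}


def external_content_request_headers(location: str, content_type: str, method: str) -> dict:
    """Build the headers a client would send under the proposal."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown method: {method}")
    return {
        "Content-Type": content_type,      # type of the external content itself
        "Content-Location": location,      # where the external content lives
        "Prefer": f'external-content; method="{method}"',
    }


def access_kind(location: str) -> str:
    """Derive what access-type would have stated from the URI scheme."""
    scheme = urlsplit(location).scheme
    if scheme == "file":
        return "local-file"
    if scheme in ("http", "https"):
        return "url"
    raise ValueError(f"unsupported scheme: {scheme!r}")


headers = external_content_request_headers("file:/path/to/file", "video/mpeg", "proxy")
print(headers["Prefer"])
print(access_kind("http://example.org/videos/funny_cat_video.mpg"))
```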

 

Would exploring this sort of approach further be helpful?

 

  -Aaron

Benjamin Armintor

Jan 11, 2018, 8:54:20 AM
to fedor...@googlegroups.com
Aaron,

A lot of what you're suggesting went through proposal and commentary in the process towards the candidate spec. Here are some things people pointed out (I'm trying to summarize a lot of back and forth, please forgive terseness):

1. The expires parameter is ambiguous on mutating requests, so we stopped referring to/relying on it.
2. No matter how much we wanted it to, Content-Location seemed specifically defined not to act as a pointer in lieu of the entity (whereas message/external-body specifically is); we also discussed using a Location header on PUT/POST but if I remember correctly rejected as being an undefined use of a specified header.
3. Content-Location on PUT/POST is also undefined, and we tried to minimize unspecified behaviors in the spec
4. see also the Prefer headers
5. the access-type part of external-body only matters in the understanding of where the identified location is resolvable, and there was an earlier proposal to use the behavior Ben P discusses (local-file means proxy, url means redirect) but not a consensus around it, so it too was rejected

I think for my part not specifying whether the content is available as redirect or proxy in the spec is probably a good idea, and including Content-Location in HEAD requests is a sufficient reproduction of the old FCR 3 external datastream behaviors for me, too- but I know you had some concerns about this part of the spec that you wanted to resurface, a while back?

- Ben


Aaron Birkland

Jan 11, 2018, 10:25:05 AM
to fedor...@googlegroups.com
Hi Ben,  

Thanks for the summary.  External content itself is an unusual/nonstandard concept, and trying to make it work in an API is a really hard problem.  I appreciate the effort of the spec team and those that have contributed to conversation.  That being said, I don't think that it has really 'landed' yet (see the desire for expressing the content type of the external resource, for example), and we as a community are still trying to understand what it means to implement this part of the spec, and use it in clients.  

With regard to a summary of past discussion on this issue:
 
> 1. The expires parameter is ambiguous on mutating requests, so we stopped referring to/relying on it.

Section 3.9.2 of the spec currently states that the 'expiration' parameter determines whether the server must proxy or copy?

See also:

 An example would help greatly; my reading of the spec suggests that it is used in mutating requests, but maybe I'm not understanding something.
 
> 2. No matter how much we wanted it to, Content-Location seemed specifically defined not to act as a pointer in lieu of the entity (whereas message/external-body specifically is); we also discussed using a Location header on PUT/POST but if I remember correctly rejected as being an undefined use of a specified header.

Right - BUT I don't think the spec is really using the 'pointer in lieu of the entity', except in the case of copying content (when there is an expiration parameter).   In other words, there really is only one case where the server behaves "as if the entity were included in the request".  

In any case, I *do* believe that Content-Location in a request is fine, but only when coupled with Prefer.  

Quoth RFC7231:
> A user agent that sends Content-Location in a request message is stating that its value refers to where the user agent originally obtained the content of the enclosed representation (prior to any modifications made by that user agent).
>
> An origin server that receives a Content-Location field in a request message MUST treat the information as transitory request context rather than as metadata to be saved verbatim as part of the representation. An origin server MAY use that context to guide in processing the request or to save it for other uses, such as within source links or versioning metadata. However, an origin server MUST NOT use such context information to alter the request semantics.

So that's all good.  Content-Location is not altering request semantics.  Upon a PUT or POST containing an empty entity body, it'll create the (empty) corresponding LDP resource as usual, and can ignore Content-Location for all we're concerned.

Now comes the Prefer header, which is specifically intended for a client "to request that certain behaviours be employed by a server while processing a request" (which the server may ignore, or satisfy)

I see a preference that means "use Content-Location in this particular way" to be permissible.  The presence of Content-Location isn't altering the semantics of the request, the preference stated in the Prefer header is.  And the server can ignore it (and the client can tell it ignored it).  That really sounds nice to me, which is why I'm making a last-ditch effort to consider stepping back again, with more experience and perspectives under our collective belts, and think about it.
 
> 3. Content-Location on PUT/POST is also undefined, and we tried to minimize unspecified behaviors in the spec

Correct, by definition.. but the presence of Prefer headers does define what the client wants.
 
> 4. see also the Prefer headers

Prefer headers are an extensible mechanism for a client to express its intent.  It's natural that there are no preferences related to external content in the IANA registry - it is an unusual use case.  But we *do* have a use case, and a compelling case for an "external content" preference.   This sounds like exactly what the Prefer header is supposed to do.

 
> I think for my part not specifying whether the content is available as redirect or proxy in the spec is probably a good idea,

As a client, if I'm putting external content in the repository I'd want to specify what I wanted the server to do with that external content (ingest by reference?  redirect?  proxy?).  Comparing to Fedora 3, I'd see this preference as kind of analogous to the datastream control group (E, M, R, X).  Back in the Fedora 3 days, a client would ingest foxml, which explicitly states how datastreams are to be managed.  A Prefer header would be an extensible mechanism of doing that.  There may be more ways of managing external content than the spec imagines, and I think a Prefer header is the most flexible and straightforward way to express this concept, and a repository's ability (or non-ability) to comply.  
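
To illustrate how a client could learn whether the repository complied, here is a small Python sketch that reads Preference-Applied from a response. It assumes the hypothetical "external-content" preference discussed in this thread; the function names and the regular expression are illustrative, not taken from any implementation.

```python
import re

# Matches e.g.: external-content; method="proxy"
_APPLIED = re.compile(r'external-content\s*;\s*method="?([A-Za-z-]+)"?')


def applied_method(response_headers: dict):
    """Return the method named in Preference-Applied, or None if absent."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    match = _APPLIED.search(lowered.get("preference-applied", ""))
    return match.group(1) if match else None


def honoured(requested: str, response_headers: dict) -> bool:
    """True only when the server reports applying the requested method."""
    return applied_method(response_headers) == requested


response = {
    "Preference-Applied": 'external-content; method="proxy"',
    "Location": "http://my.server.example.org/path/to/container/abc123",
}
print(honoured("proxy", response))
print(honoured("copy", response))
```

A server that ignores the preference simply omits the header, so the client sees None and can decide whether that outcome is acceptable.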

  -Aaron 

Esmé Cowles

Jan 11, 2018, 11:27:56 AM
to fedor...@googlegroups.com
Aaron-

Regarding Content-Location in particular, I agree that it seems like an attractive option, and we discussed it at length. But I think the RFC is very clear that it can only be an identifier for the body, not a replacement for it.

https://tools.ietf.org/html/rfc7231#section-3.1.4.2:
> The "Content-Location" header field references a URI that can be used
> as an identifier for a specific resource corresponding to the
> representation in this message's payload

So, I really don't think we can use it as a pointer to content instead of including it in the PUT/POST body.

More broadly, I'd say that nobody (AFAIK) is particularly happy with the message/external-body approach. But it's the only mechanism for specifically referring to remote content instead of including it in a message that we've been able to find.

-Esmé

Joshua Allan Westgard

Jan 11, 2018, 4:08:05 PM
to Fedora Tech, Aaron Birkland, esco...@princeton.edu, armi...@gmail.com, Andrew Woods, bbpe...@email.unc.edu, Bethany Seeger, Daniel Lamb, Elliot Metsger, Peter Eichman, Este Pope, Stefano Cossu, Aaron Coburn
Aaron and all,

For what it is worth, our specific use-case for raising this topic is that we are looking to move from an on-premises Fedora hosted on filesystem storage in our datacenter to cloud-object hosted storage.

As we make this transition, it seemed like some period of mixed storage (some local, some cloud) was going to be inevitable, and one way to manage that would be to use proxied content stored elsewhere. So yes there is some sense in which we are trying to centralize decentralized content, but this "undecentralization" is not really something we are trying to intrude into the LDP universe (if that makes sense), but rather is more of a backend storage concern.

Perhaps that is not really the intent of the external content feature, but I think the mixed storage use-case is reasonable and not likely to be an uncommon need, and given that the REST API is really the sole sanctioned method of interacting with the system, this seems like the logical way to accomplish this. I'd be happy to know if there are other recommended ways to achieve this.

Josh


> On Jan 11, 2018, at 15:20, Aaron Coburn <aco...@amherst.edu> wrote:
>
> Note: I am sending this off-list (as part of this topic: https://groups.google.com/d/topic/fedora-tech/kMVBaX5kViU/discussion) because my messages to fedora-tech get moderated, which to me is a strong disincentive to post to that forum at all. -A
>
>
> Hi,
>
> I generally think about external content in the context of "decentralization",
> which is a pretty hot topic these days in linked data systems.
>
> Curiously, the mechanism proposed by the Fedora API, from what I can tell, is
> precisely the opposite of "decentralization": it takes content that is already,
> in a sense, decentralized and makes it appear centralized.
>
> Is this really the intention? And if so, is that really such a good idea?
>
> I realize that external content has historically been a part of Fedora systems,
> but it also is a usage pattern that originated at a time when applications and
> systems were designed around stronger notions of centralization.
>
> On a more practical level, the Fedora specification requires two parts of the
> external-body spec to be implemented: local-file and URL. It is worth noting
> that "local-file" implies the existence of a local filesystem, which in cloud
> and distributed implementations makes very little sense. That leaves the URL
> portion of this, but (basically) the same mechanism already exists natively with
> LDP: links. To me, this raises the question: why specify a non-standard behavior
> that can already be accomplished with existing mechanisms within LDP?
>
> Yes, Fedora has historically supported such externally managed content,
> and this does seem to be the closest specification for defining that behavior, but
> it is an uncomfortable match at best. In fact, if you look into who else uses this
> external content mechanism, you'll find that it's used by folks who implement
> email clients. It feels a lot like the Fedora spec is "forcing" the external-body
> mechanism into an LDP context (round peg in a square hole).
>
> There are some other issues with this that haven't (AFAIK) been discussed.
>
> 1. If a client can define arbitrary URLs that act as proxies, what happens if
> a client defines the target URL itself (i.e. a self-referential reference)?
> Depending on how a server handles these requests, one would have to ensure that
> a client can't create a recursive HTTP request loop. On a self-reference, this
> is easy to control for. It becomes harder if two or more nodes are used to
> create this loop: /A proxies content from /B which proxies content from /C which
> proxies content from /A. If proxies are able to refer to other repository resources,
> then an implementation would need to be _very_ careful about how this is handled.
> And even if there is special handling for in-domain resources, unless a server
> is willing to do full DNS resolution of hostnames, it isn't always so easy to
> tell if a given host is "local". At the very least, this is easy to circumvent.
> Plus, providing support for an in-domain URL-based message/external-body request
> is an explicit recommendation (per Fedora 4.5.3).
>
> 2. If a client provides an external URL and that URL responds in such a way that
> an HTTP connection is held open for an extended period of time (e.g. hours or days),
> how hard would it be for a small number of clients to exhaust the server's internal HTTP
> connection pool? Again, this condition is not hard to create -- in fact, only a
> few lines of code in Node.js or Erlang. Once the server's internal connection pool
> is exhausted, what then? It seems like a DDoS would be way too easy.
>
> One counter argument to this is: our Fedora runs inside our firewall and we control
> which clients interact with it. That is a fine and sensible approach. But does that
> mean that it is _generally not advisable_ to ever put a Fedora server directly on the web?
>
> Because if the server is running directly on the web, these sorts of conditions
> will need to be considered.
>
> In my work attempting to implement the Fedora API specification (Trellis), I
> have simply ignored this portion of the spec -- which makes Trellis non conforming
> by definition, but I have (I think) good reasons not to implement this.
>
> Aaron C
>
>
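
The proxy-loop concern in point 1 of the quoted message can be sketched in a few lines of Python. This assumes a hypothetical in-memory mapping from repository resources to the URLs they proxy; as the quoted message notes, a real server cannot always tell whether a URL is "local", so a check like this only covers chains the server can see, with the hop limit as a fallback.

```python
def find_proxy_loop(proxies: dict, start: str, max_hops: int = 10):
    """Follow proxy targets from `start`; return the loop path if one exists.

    `proxies` maps a repository resource to the URL it proxies. URLs that
    are missing from the mapping are treated as real external content.
    """
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        target = proxies.get(current)
        if target is None:
            return None              # chain ends at non-proxied content
        path.append(target)
        if target in seen:
            return path              # the chain has closed on itself
        seen.add(target)
        current = target
    raise RuntimeError("proxy chain longer than max_hops")


# The /A -> /B -> /C -> /A case from the quoted message.
proxies = {"/A": "/B", "/B": "/C", "/C": "/A"}
print(find_proxy_loop(proxies, "/A"))
```

The connection-exhaustion concern in point 2 is a separate problem; the usual mitigation is a timeout on outbound requests, which this sketch does not cover.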

Stefano Cossu

Jan 11, 2018, 4:27:56 PM
to Joshua Allan Westgard, Fedora Tech, Aaron Birkland, esco...@princeton.edu, armi...@gmail.com, Andrew Woods, bbpe...@email.unc.edu, Bethany Seeger, Daniel Lamb, Elliot Metsger, Peter Eichman, Este Pope, Aaron Coburn, Kevin Ford
Thanks, this is an interesting discussion. Shall we bring it to the
fedora-tech list? We should be able to bypass moderation.

Looping in Kevin F. from the AIC team.

Best,
Stefano
--
Stefano Cossu
Director of Application Services, Collections

The Art Institute of Chicago
116 S. Michigan Ave.
Chicago, IL 60603
312-499-4026

Simeon Warner

Jan 12, 2018, 7:15:29 AM
to fedor...@googlegroups.com
Just to pile on here, I too really wanted to use Content-Location but by
my reading of the HTTP specs we would be redefining its meaning if we did.

Cheers,
Simeon

Joshua Allan Westgard

Jan 12, 2018, 3:08:36 PM
to Fedora Tech, Benjamin Armintor, Peter Matthew Eichman, Esme Cowles, Aaron Birkland, Andrew Woods, Ben Pennell, Bethany Seeger, Daniel Lamb, Elliot Metsger, Este Pope, Stefano Cossu, Kevin Ford, Aaron Coburn
To Aaron's point about taking a wait-and-see approach on this, for what it is worth the compromise that was arrived at with respect to transactions (that those behaviors became their own spec) has worked fine for us. We continue to make heavy use of that feature in the current Modeshape implementation, but recognize that not every future Fedora will necessarily be able to implement it, nor should that be a requirement for compliance. Perhaps a similar compromise could be reached with external content.

But that having been said, I think it important to keep reminding ourselves -- and honestly, I don't think anyone is in fact suggesting otherwise in this thread -- that the API spec, while it may be necessary, is not by itself sufficient for building a repository system that is going to meet all the requirements of many of the project stakeholders. There are a host of other decisions that need to be made in order to create a useful repository, and it seems that there should be scope in the context of this community for working together to solve some of these problems even when the approaches taken might not gain acceptance in the API spec. I'm not sure precisely what shape such efforts should take, since the more out-of-spec features get implemented and used, the further away we will move from the vision of any sort of interoperability between Fedora implementations. It is increasingly clear to us, however, that we will need to consider other (out-of-spec) approaches to certain problems, in order to achieve the sort of scale we need.

If possible I would like those solutions to be community developed because I think there is a lot of collective wisdom in this community, but perhaps local circumstances and preferences are just too disparate. Also, I can certainly see where a community reference implementation that includes a lot of features not in the spec could create other problems.

Josh

> On Jan 12, 2018, at 13:40, Aaron Coburn <aco...@amherst.edu> wrote:
>
> Ben,
>
> I really appreciate the additional context describing Columbia's use case.
>
> To be clear, I am not questioning Columbia's particular needs for external content representations in a Fedora server. I am asking whether the semantics of "external content" is a feature sine qua non of Fedora. That is: is Columbia's use case everyone's use case?
>
> And pursuant to Esme's point that the Fedora API does not technically require support for "URL" or "local-file" access-types, that is quite true. But that is orthogonal to the point I'm making. The point I am making is that the Fedora specification is giving special significance to "message/external-body" HTTP entities and doing so in ways that are neither well understood nor widely used.
>
> Furthermore, if one were to imagine the "message/external-body" sections of the Fedora specification to be entirely removed, there would be nothing stopping Cavendish or any other HTTP server from supporting such semantics.
>
> I have not been the only one to raise concerns about "message/external-body". Nor do the editors seem entirely at ease with it. So why is it still given such significance in the specification? Why not describe it in an external wiki page -- leaving it out of the spec -- and see what happens over the next few years? Given the lack of experience that exists with these semantics (not just within the Fedora community but among web developers in general), why insist on keeping this feature in the specification at all?
>
> I think that most parts of the specification are solid and well thought out, but this is a weak point. In my opinion, by keeping it in the specification, you will weaken the entire specification.
>
> Aaron C
>
>
>
> On 1/11/18, 5:09 PM, "Benjamin Armintor" <armi...@gmail.com> wrote:
>
> We've had to date similar concerns at Columbia, but come to different conclusions: we also use external storage, take fedora off the Web, restrict access to the application servers, which handle the requisite authorization, and require the application
> servers to negotiate the storage service.
>
>
> For us, the repository is a system of information management, and our Fedora is not generally used to manage storage (it coordinates the output of a digital archivist and archivematica, etc.). We also (in the case of media) rarely refer to the large files
> again after they've been derived into streamable forms, and if we did would again not want them (in all likelihood) arduously proxied through Fedora.
>
>
> I bring this up to say that very similar concerns can lead to very different practices, and a circumspect specification allows the implementations to sort this kind of thing out for their more specific audiences.
>
>
> Another example: I share many of Aaron C's concerns, but concluded that an approach of universal redirection was the right way to go with Cavendish. I feel like I should hasten to add that we tried to be careful about not making support for external content
> required with implementations like Trellis in mind, and for making that support detectable from client interactions (advertisement in Accept-Post, 415 if not supported (which I'd expect to generally be true regardless of whether the unsupported Content-Type
> is external-body)).
>
>
> - Ben
>
>
>
>
>
>
> On Thu, Jan 11, 2018 at 4:34 PM, Peter Matthew Eichman
> <peic...@umd.edu> wrote:
>
> In our (UMD) use case, we foresee having large amounts of A/V resources in our repository, but we would like to not be bottlenecked by being forced to ingest large binaries via HTTP onto the filesystem (or other "internal" storage backend). At
> the same time, though, we do not want to expose the external storage directly, since that would complicate access control (i.e., the binary storage system would need to have some way of checking for authorized access at the per-resource level, rather than
> just being locked down to just respond to requests from the repository server, which has already done the requisite authorization.)
>
>
> -Peter
>
>
> On Thu, Jan 11, 2018 at 4:00 PM, Esme Cowles
> <esco...@princeton.edu> wrote:
>
> Aaron-
>
> I think your posts to fedora-tech are moderated because you're not subscribed.
>
> And, yes, I think your characterization is right: the point of the external content pattern is to have content stored separately (either because of storage architecture, or integration with filesystem-based workflows) but still appear centralized, so it integrates
> better with tools using Fedora.
>
> For what it's worth,
>
>> On a more practical level, the Fedora specification requires two parts of the external-body spec to be implemented: local-file and URL.
>
> Isn't accurate. Support for access-type=url is SHOULD, and local-file is only mentioned in a non-normative note.
>
> -Esmé
>
>> On Jan 11, 2018, at 3:20 PM, Aaron Coburn <aco...@amherst.edu> wrote:
>>
>> Note: I am sending this off-list (as part of this topic:
> https://groups.google.com/d/topic/fedora-tech/kMVBaX5kViU/discussion) because my messages to fedora-tech get moderated, which to me is a strong disincentive to post to that forum at all. -A

Aaron Birkland

Jan 12, 2018, 3:59:50 PM
to fedor...@googlegroups.com, Benjamin Armintor, Peter Matthew Eichman, Esme Cowles, Andrew Woods, Ben Pennell, Bethany Seeger, Daniel Lamb, Elliot Metsger, Este Pope, Stefano Cossu, Kevin Ford, Aaron Coburn

Hi all,

 

In general, I’m a fan of placing specialized, controversial, or new/untested aspects of the spec into ‘sidecar’ specs (like transactions, external content) and seeing where things land.  At the very least, this means that behaviour is written down in a public place for reference, comment, and evolution.  Some day, it might make sense for these sidecar specs to become an essential part of Fedora; other times they may die.  Done this way, there’s at least a chance that another implementation will align on the same sidecar spec (or maybe somebody writes an API-X extension to implement a sidecar spec).  Maybe calling them RFCs (and numbering them) is better.

 

Another thing to consider is that the spec aims to be released once there are *two* implementations.  For one reason or another several implementations have no specific intention of complying with the Fedora spec fully:

 

 

It would be great if these were all fedoras too; in my mind they basically are. Exploring different aspects of linked data/repository space from different angles is a good thing.  Although specs are all about technicalities, it would be a shame that these aren’t fedoras *because* of a technicality.  The work by the spec editors has been great, maybe shifting things around a bit and making the core a little bit leaner is all we need.  A dialog between the various implementers of fedoras and the spec team (maybe just a thread on fcrepo-tech, or the spec GitHub repo, or whatever) with the goal of negotiating and landing N compliant implementations would be fruitful?

 

Too bad it’s so close to the OR deadline – a panel of fedora implementers (focusing on different priorities, lessons learned) would be a great proposal.

 

  -Aaron

--

Andrew Woods

Jan 15, 2018, 1:04:41 PM
to fedor...@googlegroups.com, Aaron Birkland, Benjamin Armintor, Peter Matthew Eichman, Esme Cowles, Ben Pennell, Bethany Seeger, Daniel Lamb, Elliot Metsger, Este Pope, Stefano Cossu, Kevin Ford, Aaron Coburn, fedora-specifi...@googlegroups.com
Hello All,
Thanks for continuing to provide input into the Fedora API Specification. We are seeking refinement and polish as we transition from Candidate Recommendation to full Recommendation.

As it relates to the suggestions in this thread around 'message/external-body', it seems like a rehashing of a topic that has been previously discussed in detail:

The compromise on which we landed is to:
1. Make the feature optional.
"In the case that a Fedora server does not support external LDP-NR content, all message/external-body messages must be rejected with 415 (Unsupported Media Type)." - https://fedora.info/2017/11/27/spec/#external-content

2. Provide guidance for consistency of approach across implementations in support of the goal to minimize the need for modifications to client applications in order to work with different implementations of the Fedora API Specification.

Fundamentally, this thread once again reflects a difference in philosophy on the role of the Fedora Specification.

The Fedora API Specification Charter [1] lays out the following three goals:
1. Define the characteristics and expectations of how clients interact with Fedora implementations
2. Define such interactions such that an implementation’s conformance is testable
3. Enable interoperability by striving to minimize the need for modifications to client applications in order to work with different implementations of the Fedora API specification

The specification editors have sought, and continue to seek, a balance between enabling innovation in implementation and providing guidance towards interoperability.

The present sections in the specification are very intentional.

Part of the validation process for the specification is to have multiple implementations. If we find that certain aspects of the specification are unreasonable to implement, that is good reason to revisit the appropriateness of a given requirement. In the meantime, the specification continues to be guided by the goals above.

Regards,
Andrew


Aaron Coburn

Jan 15, 2018, 4:44:21 PM
to Andrew Woods, fedor...@googlegroups.com, Aaron Birkland, Benjamin Armintor, Peter Matthew Eichman, Esme Cowles, Ben Pennell, Bethany Seeger, Daniel Lamb, Elliot Metsger, Este Pope, Stefano Cossu, Kevin Ford, fedora-specifi...@googlegroups.com
Hello,

> The compromise on which we landed is to:
> 1. Make the feature optional.
> "In the case that a Fedora server does not support external LDP-NR content, all message/external-body messages must be rejected with 415 (Unsupported Media Type)." - https://fedora.info/2017/11/27/spec/#external-content

I disagree that the feature is optional. It is still *required* that an implementation provide special handling for the "message/external-body" type -- even if that special handling is to throw an exception. Nor have I (in any of these conversations) heard any response to the very real performance and security concerns I have about the semantics of this feature. That strikes me as problematic.

> 2. Provide guidance for consistency of approach across implementations in support of the goal to minimize the need for modifications to client applications in order to work with different implementations of the Fedora API Specification.

I would frame the discussion differently. To me, the key question is: what are the essential elements of a Fedora repository? According to the specification, every MUST statement is, by definition, a fundamental feature of Fedora.

> Part of the validation process for the specification is to have multiple implementations. If we find that certain aspects of the specification are unreasonable to implement, that is good reason to revisit the appropriateness of a given requirement. In the meantime, the specification continues to be guided by the goals above.

Here I can speak as someone who has attempted to implement the Fedora specification (Trellis). I have encountered areas that are well thought out. I have also encountered areas that are problematic. My question is: are the editors interested in working with the people who have worked on implementations and/or who express concerns about aspects of the specification? In principle, having more conforming implementations would seem to be in everyone's best interest.


As some background to those who haven't followed the details of the specification effort:

At a high level, the Fedora specification requires conformance with five existing specifications: LDP, Memento, Instance Digests, WebAC and Activity Streams. This, I think, is a very solid foundation. It is, in fact, what Trellis does: Trellis conforms to all of these specifications.

Beyond this, there are many additional requirements in the Fedora specification, and they can be placed into three broad categories.


1. Strengthening. (I.e. requiring optional features from an underlying spec)

For example, PUT and PATCH are optional features of LDP, but Fedora requires them. All of these requirements are implemented in Trellis. These requirements do eliminate the possibility of a read-only Fedora repository, but I don't see anything fundamentally wrong with these requirements. This class of requirements also builds on existing interaction patterns defined in the underlying specifications.


2. Narrowing. (I.e. requiring one particular pattern when many are possible)

One feature of these underlying specifications is that they provide flexibility for implementations while retaining a semantically consistent model for interaction. An example of this is the Memento Datetime negotiation. The Memento specification describes several semantically equivalent patterns for discovering a Memento resource from an Original Resource (section 4 -- there are eight patterns described). But Fedora requires that one particular pattern MUST be implemented. It so happens that this pattern is the same one that Trellis implements, and while one can make an argument as to why that particular pattern makes sense for a repository, it also eliminates other possibilities. In particular, imagine the case where your data outlives your software (in my experience, this is pretty common): in this case, it may make good sense to separate your version management system from the repository resources. However, that implementation pattern is simply not possible with the Fedora specification, and there is no explanation for why such a pattern is required. In my mind, this category of requirements isn't exactly problematic, but it is questionable.


3. Inventing. (I.e. creating new behaviors or semantics)

These are requirements where there is no precedent, no prior art and no clear justification. There are quite a few requirements in this category, and in writing Trellis, I ignore almost all of these. This is the category that I find explicitly problematic. Here are a few examples:

a) The definition of an interaction pattern for defining ACL locations upon resource creation via link headers. There is no released software anywhere that does this. And there are problems with this, since it allows one to completely circumvent the acl:Control mode of an ACL resource, thereby escalating a user's permission such that they can become an administrator for a segment of the repository (this is a *serious* security hole that the Fedora spec opens up).

b) The Fedora specification requires this header for TimeGate resources: 

    Link: <http://mementoweb.org/ns#TimeGate>; rel="type"

It is worth noting that there is already a mechanism for finding TimeGate resources by following this header on any OriginalResource:

    Link: <TimeGate URL>; rel="timegate"

and then noting that the linked resource (whether it is the same as the OriginalResource or not) contains a "Vary: Accept-Datetime" header. So why is this new header a requirement? And wouldn't such a feature encourage clients to bypass the already well-defined Datetime interaction model of Memento?

I see this category as a new type of "Fedora-ism": something that will need to be written specially for Fedora clients and which will likely exist nowhere outside of the Fedora community. I wonder why these are strict requirements (MUST) and not optional features (MAY). As optional features, they can be given time to mature in production settings before they move into hard requirements of all Fedora implementations.


Given that Trellis already doesn't conform to the Fedora specification (because of the requirements that are in conflict with a distributed implementation), there is no incentive for me to implement any of the requirements in the third category above. I would note, however, that it would be fairly easy to write an API-X extension or even a JAX-RS filter that would convert a Trellis-based server into a conforming Fedora server (modulo the requirements that the repository be strongly consistent and atomic). Is there a middle ground on these features? Yes, of course, but as it is, I have not heard a clear rationale for any of these ("Inventing category") features.

Regards,

Aaron Coburn


Esme Cowles

Jan 15, 2018, 5:16:00 PM
to fedor...@googlegroups.com
On Jan 15, 2018, at 4:00 PM, Aaron Coburn <aco...@amherst.edu> wrote:
>
> Hello,
>
>> The compromise on which we landed is to:
>> 1. Make the feature optional.
>> "In the case that a Fedora server does not support external LDP-NR content, all message/external-body messages must be rejected with 415 (Unsupported Media Type)." - https://fedora.info/2017/11/27/spec/#external-content
>
> I disagree that the feature is optional. It is still *required* that an implementation provide special handling for the "message/external-body" type -- even if that special handling is to throw an exception.

This is a compromise position — support for the feature is required, but an implementation could simply respond with a 415 Unsupported Media Type for any message/external-body content type. So in essence, implementing support for external content is optional, but properly notifying the client that their request hasn't been complied with is required.

It is entirely fine with me if an implementation doesn't need external content, and doesn't implement it. But it's such a common use case in the repositories I'm familiar with (about as common as access control or versioning) that I think it's important not to silently ignore clients' attempts to create external content (which would probably result in creating unwanted 0-byte LDP-NRs).

> Nor have I (in any of these conversations) heard any response to the very real performance and security concerns I have about the semantics of this feature. That strikes me as problematic.

I'd welcome a PR to add those concerns to the security section of the spec (https://fcrepo.github.io/fcrepo-specification/#security). In general, I would expect an implementation to handle the issues you've raised in this area the same way as any other DOS attack: enforce access controls first, limit the number of threads/buffers/etc. you would load, and respond with 503 Service Unavailable if those limits were exceeded. Other limits (such as limiting the domains allowed for external content) might be appropriate.
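The handling described above (access controls first, a cap on concurrent loads, 503 once the cap is exceeded) might be sketched roughly as follows in Python. The class name `ExternalFetchGuard`, the limit value, and the choice of 403 for unauthorized requests are assumptions made for illustration; none of them come from the spec or from an existing Fedora implementation.

```python
import threading


class ExternalFetchGuard:
    """Hypothetical guard around fetching external content on a client's behalf."""

    def __init__(self, max_concurrent):
        # Cap on simultaneous external fetches (assumed to be configurable).
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, is_authorized, fetch):
        """Return (status, body) for one request to external content."""
        # Enforce access control before spending any resources on the fetch.
        if not is_authorized:
            return 403, None
        # Do not queue when the cap is reached; tell the client to retry later.
        if not self._slots.acquire(blocking=False):
            return 503, None
        try:
            return 200, fetch()
        finally:
            self._slots.release()
```

A real server would also want timeouts and size limits on the fetch itself; those are left out here to keep the sketch short.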

>> 2. Provide guidance for consistency of approach across implementations in support of the goal to minimize the need for modifications to client applications in order to work with different implementations of the Fedora API Specification.
>
> I would frame the discussion differently. To me, the key question is: what are the essential elements of a Fedora repository? According to the specification, every MUST statement is, by definition, a fundamental feature of Fedora.

Right, and the MUST statements here are:
* MUST advertise supported access-type values
* MUST respond with 415 if you can't handle the specific message/external-body header
* MUST include appropriate headers in HEAD/GET responses for external content LDP-NRs

In brief, you have to provide reasonable external content support, or at the very least fail with reasonable status codes if you don't.
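As a rough server-side sketch of the 415 requirement, assuming a Python implementation: the function below inspects a request's Content-Type and decides whether an external-body request can be honored. The names `SUPPORTED_ACCESS_TYPES` and `external_body_status`, and the use of 201 to stand for "accepted", are hypothetical. An implementation with no external content support would pass an empty set and so reject every message/external-body request.

```python
from email.message import Message

# Hypothetical configuration: access-type values this server can handle.
SUPPORTED_ACCESS_TYPES = frozenset({"url"})


def external_body_status(content_type, supported=SUPPORTED_ACCESS_TYPES):
    """Return None for ordinary content, 201 if the external-body request
    can be honored, or 415 if its access-type is not supported."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "message/external-body":
        return None  # not an external content request; handle normally
    access_type = msg.get_param("access-type")
    if access_type is None or access_type.lower() not in supported:
        # Reject explicitly rather than silently creating an empty LDP-NR.
        return 415
    return 201
```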

>> Part of the validation process for the specification is to have multiple implementations. If we find that certain aspects of the specification are unreasonable to implement, that is good reason to revisit the appropriateness of a given requirement. In the meantime, the specification continues to be guided by the goals above.
>
> Here I can speak as someone who has attempted to implement the Fedora specification (Trellis). I have encountered areas that are well thought out. I have also encountered areas that are problematic. My question is: are the editors interested in working with the people who have worked on implementations and/or who express concerns about aspects of the specification? In principle, having more conforming implementations would seem to be in everyone's best interest.

We're very interested in working with implementers. And I think a fair review of the last 6 months would show us to have been very open to suggestions, and working very hard to accommodate your position.

I think this is an excellent example: I would much rather just require support for at least access-type=url. But in an effort to accommodate your position, we haven't done that. We've made it possible to completely avoid implementing support for external content, and only require that reasonable responses are given to clients when an implementation doesn't support external content.


[...]
> 3. Inventing. (I.e. creating new behaviors or semantics)
>
>
> These are requirements where there is no precedent, no prior art and no clear justification. There are quite a few requirements in this category, and in writing Trellis, I ignore almost all of these. This is the category that I find explicitly problematic. Here are a few examples:
>
>
> a) The definition of an interaction pattern for defining ACL locations upon resource creation via link headers. There is no released software anywhere that does this. And there are problems with this, since it allows one to completely circumvent the acl:Control mode of an ACL resource, thereby escalating a user's permission such that they can become an administrator for a segment of the repository (this is a *serious* security hole that the Fedora spec opens up).

Except that implementations are free to refuse such requests with an appropriate 4xx or 5xx response code. In theory, an implementation could refuse any request which requests using a specific ACL. But I would hope that implementations would support this feature, which is intended to allow large numbers of resources to be governed by a small number of ACLs.

> b) The Fedora specification requires this header for TimeGate resources:
>
> Link: <http://mementoweb.org/ns#TimeGate>; rel="type"
>
> It is worth noting that there is already a mechanism for finding TimeGate resources by following this header on any OriginalResource:
>
>
> Link: <TimeGate URL>; rel="timegate"

Indeed, the purpose of this header isn't to find the TimeGate; it's to tell the client that the current resource is the TimeGate. This is the same pattern that LDP uses elsewhere, and we're simply specifying similar behavior here, so that Memento resources behave like other LDP resources.

> and then noting that the linked resource (whether it is the same as the OriginalResource or not) contains a "Vary: Accept-Datetime" header. So why is this new header a requirement? And wouldn't such a feature encourage clients to bypass the already well-defined Datetime interaction model of Memento?

Again, we are specifying that the Vary header advertise when you can use the Memento Accept-Datetime header — as with other LDP requirements for advertising support for other kinds of features (like LDP's requirements for Accept-Post and Accept-Patch advertisement).
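From a client's point of view, the two signals discussed here could be checked roughly as in the Python sketch below. The helper names are hypothetical, `headers` is assumed to be a plain dict keyed by canonical header names, and the Link parsing is deliberately simplified: it only recognizes `rel` when it is the first parameter after the target.

```python
import re

TIMEGATE_TYPE = "http://mementoweb.org/ns#TimeGate"

# Simplified Link header matcher: <target>; rel="one or more relation types"
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^",;]+)"?')


def link_targets(link_header, rel):
    """Return the targets of all links carrying the given relation type."""
    return [uri for uri, rels in _LINK_RE.findall(link_header)
            if rel in rels.split()]


def is_timegate(headers):
    """True if the response declares itself a TimeGate via Link rel="type"."""
    return TIMEGATE_TYPE in link_targets(headers.get("Link", ""), "type")


def varies_on_accept_datetime(headers):
    """True if the Vary header advertises Accept-Datetime negotiation."""
    fields = [f.strip().lower() for f in headers.get("Vary", "").split(",")]
    return "accept-datetime" in fields
```

The same `link_targets` helper can follow `rel="timegate"` from an original resource, which is the discovery mechanism quoted above.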

> I see this category as a new type of "Fedora-ism": something that will need to be written specially for Fedora clients and which will likely exist nowhere outside of the Fedora community. I wonder why these are strict requirements (MUST) and not optional features (MAY). As optional features, they can be given time to mature in production settings before they move into hard requirements of all Fedora implementations.
>
>
>
>
> Given that Trellis already doesn't conform to the Fedora specification (because of the requirements that are in conflict with a distributed implementation), there is no incentive for me to implement any of the requirements in the third category above. I would note, however, that it would be fairly easy to write an API-X extension or even a JAX-RS filter that would convert a Trellis-based server into a conforming Fedora server (modulo the requirements that the repository be strongly consistent and atomic). Is there a middle ground on these features? Yes, of course, but as it is, I have not heard a clear rationale for any of these ("Inventing category") features.
>
> Regards,
>
> Aaron Coburn

The underlying principle at work in all of these instances is giving clients a consistent experience: features are advertised when they are available, and when a feature isn't available, the server responds with a reasonable failure status code. The goal here is interoperability, so clients can look for features that are advertised, and response codes to requests, and adapt to differences in implementations. As much as possible, we hope clients won't need to sniff server headers to figure out which implementation they are working with and trigger custom behavior.
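As an illustration of that kind of feature detection, a client might check the Accept-Post header before attempting to create external content, along the lines of the Python sketch below. The function name is hypothetical, and the exact shape of the advertised value (a comma-separated list of media types, optionally with an access-type parameter) is an assumption here.

```python
def advertises_external_content(accept_post):
    """True if an Accept-Post header value lists message/external-body."""
    for entry in accept_post.split(","):
        # Ignore parameters such as access-type; compare the media type only.
        media_type = entry.split(";", 1)[0].strip().lower()
        if media_type == "message/external-body":
            return True
    return False
```

A client that skips this check can still adapt afterwards: a 415 response to a message/external-body request carries the same information.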


-Esmé

Benjamin Pennell

Jan 21, 2018, 12:47:56 PM
to fedor...@googlegroups.com, aco...@amherst.edu
Thank you all for the discussion on this topic, I hadn't realized how involved the issue was.  A lot of valuable ideas came up in this discussion, and this is a feature that my institution is very interested in having present in 5.0.

While I don't have anything useful to add to the Content-Location conversation, the suggestion to use a Prefer header to specify copy/proxy/redirect behavior seems valuable regardless.  It addresses the need of the original question in this thread, while being easily ignorable by non-implementing servers.

I worry that I'm missing something or oversimplifying, but it seems like:

Content-Type: message/external-body; access-type="url"; url="file:/..."
Prefer: external-content; method="proxy"

would effectively provide the same functionality as access-type="local-file", in addition to filling in how to specify that an HTTP URL should be proxied and making copying more explicit.  So I wonder if it would make sense to propose just supporting URL access types, and being flexible about the protocols accepted?
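To make the proposal concrete, here is a minimal Python sketch of how a server might read such a Prefer header. The preference name `external-content` and its `method` parameter come from the example above and are not registered anywhere. The function names, the set of accepted methods, and the fallback to redirect are assumptions. The parser is also naive: it does not handle commas or semicolons inside quoted strings.

```python
ALLOWED_METHODS = {"copy", "proxy", "redirect"}


def parse_preference(prefer_header, name):
    """Return the parameters of preference `name` as a dict, or None if absent."""
    for pref in prefer_header.split(","):
        parts = [p.strip() for p in pref.split(";")]
        token = parts[0].split("=", 1)[0].strip().lower()
        if token != name:
            continue
        params = {}
        for part in parts[1:]:
            if not part:
                continue
            key, _, value = part.partition("=")
            params[key.strip().lower()] = value.strip().strip('"')
        return params
    return None


def resolution_method(prefer_header, default="redirect"):
    """Pick copy/proxy/redirect from the header, ignoring unknown values."""
    params = parse_preference(prefer_header, "external-content")
    if params is None:
        return default
    method = params.get("method", default)
    return method if method in ALLOWED_METHODS else default
```

Because Prefer is advisory, a server that does not implement the preference can ignore it and the request still succeeds with that server's default behavior.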

I am still interested in working out how to specify the mimetype of the external body.  The approach in my PR of adding a mime-type subfield to message/external-body content types simply seemed in line with the API definition I started from, and multiple Content-Type headers caused issues.  'Prefer: external-content; mimetype="text/plain"' seems like an alternative, and feels slightly less circular than nesting a mimetype in a mimetype.  It would still need to be translated into some other structure in fedora though, unlike storing it in the message/external-body type, and having multiple ebucore:hasMimeType fields could be problematic.  I'm interested to know what others feel would be valid approaches, and how we might decide this.

Also, I generally support the idea of having a few common extension specifications, such as for handling external bodies in an fcrepo server.  Particularly if written with a mind towards graceful degradation when migrating from an implementation that supports it to one that doesn't.  I don't have a strong opinion on this, and others have already made arguments, but it seems more flexible in an environment where many flavors of fedora are expected.

Anyways, thanks again for the discussion so far, I feel like progress is being made and I'm interested in seeing how we can work this out in our multi-implementation future.



Esmé Cowles

Jan 22, 2018, 8:10:22 AM
to fedor...@googlegroups.com
Ben-

I like the idea of using a Prefer header to indicate whether you'd like the content to be ingested, proxied, or redirected. This seems like it would be more explicit than using the expires parameter (which could be used for other purposes).

We should think through the implications of different Prefer headers on the creation of the external resource and subsequent retrieval. For example, what if you do a POST with a Prefer header requesting that the content be ingested, and then do a GET asking for it to be proxied?

-Esmé

Benjamin Pennell

Jan 22, 2018, 9:59:17 AM
to fedor...@googlegroups.com
Esme,
I had been thinking of the header primarily in the context of POST/PUT, in part since 'copy' seems like an ingest-only request.  There are combinations of protocols and this Prefer header that could be problematic on POST/PUT too, though; for example, a redirect to a file URI would generally not work with HTTP clients.  I don't have a use case locally for picking proxy/redirect at GET request time versus at ingest time, but I am open to that if others do.


Benjamin Armintor

Jan 22, 2018, 1:17:36 PM
to fedor...@googlegroups.com
Ben P,

For what it's worth, I think a file URI + redirect behavior is fine, or at least within the bounds of "trouble you can choose to get into if you can create resources".

The Prefer header also makes more sense to me in the context of this thread as a POST/PUT request header - but that assumes the Content-Location responses.

If you and Esmé want to pursue that approach further, I would really encourage trying to think of a Prefer type that could be feasibly proposed to https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#preferences - maybe "storage" or "message-resolution", but not something quite so local in nomenclature as "external-content". Even if you want to depart from the IANA list (yikes), I think you want to aspire to inclusion there.

- Ben

Benjamin Pennell

Jan 23, 2018, 9:21:04 AM
to fedor...@googlegroups.com
Thanks Ben A.

I feel a bit out of my depth, but here is a gist of a first draft proposal for a Prefer header for this copy/proxy/redirect behavior:

It's probably way off, so any and all feedback would be appreciated.  I tried to keep it broader than external-body content, but that may make it less precise.  And the name of the header is definitely not final.

Aaron Birkland

Jan 23, 2018, 9:54:44 AM
to fedor...@googlegroups.com

Hi Bens A and P,

 

I really like the suggestion that we ought to aspire to inclusion in the IANA list of preferences.  What we’re doing with external content is quite different from anything on the IANA list at present, so I think that taking a rigorous approach to defining a new preference is the right way to proceed.  It would be ideal if we actually submitted a proposal for inclusion on that list, even if it isn’t taken up due to the specialized nature of external content (but if it *is*, that would be fantastic; it would really broaden the communities that are interested in these sorts of concepts).

 

I also think we ought to get feedback from some of the members of the broader IETF community.  I don’t think there is broad agreement within our *own* community that external content has fully landed in a manner that meets our various disparate needs, but I think this conversation gets us closer.  Some “disinterested” external input might be helpful.  I’d love, for example, to float some of the ideas related to external content by the IETF HTTP working group.  That could help answer the Content-Location vs message/external-body questions, as well as get feedback on the proposed Prefer headers.

 

Anyway I like where this is going.  I also like the name “content-resolution” for naming the proposed preference.  I haven’t had time to fully review and think about the entirety of the proposal in the gist, but plan to have more feedback.

 

  -Aaron


Peter Matthew Eichman

Jan 25, 2018, 3:38:29 PM
to fedor...@googlegroups.com
Hi Ben P.,

Thanks for putting together that draft of a prefer header proposal. In the proxy case (which is what we are most interested in here at UMD), I think it would make sense to call out that the server may use the content-type header of the upstream content (e.g., "image/png" in your example) to define the content-type header in its response. While there would be security concerns in allowing an arbitrary third-party server to set your http response headers for you, I think in the use case where we'd likely be proxying to a locked down server that we also control, it wouldn't be a major risk. I could envision a server-side config of "trusted proxy hosts" as another potential security measure.
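The "trusted proxy hosts" idea could look something like the following Python sketch, in which the upstream Content-Type is passed through only when the external URL points at a host the repository administrator has listed. The host name, the function name, and the `application/octet-stream` fallback are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical server-side configuration of hosts we also control.
TRUSTED_PROXY_HOSTS = frozenset({"media.example.org"})


def proxied_content_type(url, upstream_content_type,
                         trusted_hosts=TRUSTED_PROXY_HOSTS,
                         fallback="application/octet-stream"):
    """Choose the Content-Type to send when proxying external content."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None for file: URLs
    if host is not None and host in trusted_hosts:
        # Trusted upstream: reuse its Content-Type if it sent one.
        return upstream_content_type or fallback
    # Untrusted or non-HTTP source: do not let it set our response headers.
    return fallback
```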

-Peter


Benjamin Pennell

Jan 25, 2018, 6:55:50 PM
to fedor...@googlegroups.com
Hi Peter,

Hmm, I was naively thinking of a proxied binary resource simply being a resolvable URL that you could open a stream from, and therefore a proxied binary could be treated like a local binary for the most part by the rest of the application.  That wouldn't leave any space for passing along headers from the host of the resources, unlike with redirected resources.   Instead this would imply that a proxied http URL would behave like an actual http proxy (crazy, right?).  

I'm not opposed to this, still thinking through some of the implications, but I am interested in other people's opinions on if this is the behavior they would expect and if it is a route that they would like to see fedora take with proxying.  Establishing fedora as a true proxy for http resources could open up a number of follow up issues that don't come up with redirects since the client is left dealing with them in that case (like authentication), and possible inconsistencies in headers that show up on proxied binaries vs local ones.

Peter Matthew Eichman

Jan 26, 2018, 11:47:03 AM
to fedor...@googlegroups.com
Hi Ben,

I think my biggest concern was over losing the content-type of proxied resources. If there is a good way to do it without Fedora being a full-fledged HTTP proxy server, in principle I am fine with that. I would worry, though, about the use of the term "proxy" in that case being confusing to someone who *was* expecting a more full-featured HTTP reverse proxy.

-Peter