Discovery


Pádraic Brady

Dec 3, 2014, 2:11:13 PM
to php-fig-psr-9-.
Bits of confusion from the original thread made me think to summarise
a few thoughts in a separate thread.

First, discovery is basically the rules that an implementor of the
standard would use to locate formatted vulnerability data
given a basic starting point. A starting point in this case could be a
website URL or a git repository URI. We could also go the dependency
route, and use a project reference used by Packagist (where one assumes
they uniquely identify a project name with a URI), but I'm loath to
create dependencies on external services as a basic requirement.

Assuming we go for a web based format and a repository based format
(roughly aligning with some projects publishing data online and others
not having a website but a handy free git repo), I can make two
suggestions (heavily summarised):

0. Figure out if the starting URI refers to a website or a repository
or something completely unpredicted.

1. HTTP(S) based files

Starting Point: The base URL of the website
Discovery Process:
- Fetch the base URL
- Locate the relevant header, e.g. Link:
<http://example.com/disclosures.xml>; rel="php-vuln-disclosures"
- Alternatively, check the fetched HTML for an embedded Link
- Fetch the URL indicated by the link
- Verify that the fetched file is in an acceptable vulnerability data format
- Parse the formatted file and make merry
- Perhaps worry a bit about pagination if you need more than the
10 most recent disclosures...
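
As a rough sketch of the HTTP steps above (Python, standard library only): the relation name "php-vuln-disclosures" comes from the example header, while the function names and the simplified header parsing are assumptions for illustration, not part of any agreed format.

```python
import re
from html.parser import HTMLParser

# Relation name taken from the example header in this message; not a registered value.
REL = "php-vuln-disclosures"


def link_from_header(header_value, rel=REL):
    """Return the target of the first Link header entry carrying `rel`, or None."""
    # Each entry looks like: <target>; rel="value"
    # Simplified on purpose: commas inside quoted parameter values are not handled.
    for match in re.finditer(r'<([^>]*)>([^,]*)', header_value):
        target, params = match.groups()
        rel_match = re.search(r'\brel\s*=\s*"?([^";]*)"?', params, re.IGNORECASE)
        if rel_match and rel in rel_match.group(1).split():
            return target
    return None


class _LinkFinder(HTMLParser):
    """Collects the href of the first <link> element whose rel contains `rel`."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        if self.rel in (attributes.get("rel") or "").split():
            self.href = attributes.get("href")


def link_from_html(html, rel=REL):
    """Return the href of an embedded <link> element carrying `rel`, or None."""
    finder = _LinkFinder(rel)
    finder.feed(html)
    return finder.href


def discover(headers, body, rel=REL):
    """Prefer the Link header, then fall back to an embedded <link> element."""
    return link_from_header(headers.get("Link", ""), rel) or link_from_html(body, rel)
```

Fetching the base URL, resolving a relative href against it, and validating the fetched file are left out; they depend on decisions not yet made here.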

2. Repository files

Starting Point: The base repository URI
Discovery Process:
- Identify what branch passes as the primary working branch using
a given list of conventions
- Clone or fetch that branch
- Look for a specific file in the root directory of the branch.
- Verify that the file is in an acceptable vulnerability data format
- Parse the formatted file and make merry
- Or not
- Check any supported alternative relative paths as necessary
- Feel free to make fetching the file more efficient using web
based interfaces to the repository
- Pagination/Collection navigation? More worrying.
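
A minimal sketch of the branch and path lookups above, in Python. Both lists are invented placeholders (no convention has been agreed), and the functions work on an already fetched checkout, so no particular VCS is assumed.

```python
from pathlib import Path

# Assumed conventions for illustration only; neither list has been agreed anywhere.
BRANCH_CONVENTIONS = ["master", "develop", "trunk"]
CANDIDATE_PATHS = ["disclosures.xml", "security/disclosures.xml"]


def pick_primary_branch(available, conventions=BRANCH_CONVENTIONS):
    """Return the first conventional branch name present in `available`, or None."""
    for name in conventions:
        if name in available:
            return name
    return None


def find_disclosure_file(checkout_root, candidates=CANDIDATE_PATHS):
    """Return the first candidate path that exists as a file under the checkout."""
    root = Path(checkout_root)
    for relative in candidates:
        path = root / relative
        if path.is_file():
            return path
    return None
```

The root-directory file is simply the first candidate; the "supported alternative relative paths" are the rest of the list, tried in order.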

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Zend Framework PHP-FIG Representative

enygma

Dec 8, 2014, 11:08:39 AM
to php-fig-psr-...@googlegroups.com
I like the idea of splitting it up like this...essentially adding in the extra step of determining if it's a website or repo. I'm not sure on the checkout/clone part on the repo side though. Wouldn't it just make more sense for them to give the direct URL to the file for discovery rather than us trying to figure it out? 

For example, in a composer.json URL from GitHub, we get the branch name along with it in the raw URL: https://raw.githubusercontent.com/enygma/yubikey/master/composer.json

The custom link tag in the HTML fetch is nice but I think the simpler the better for a first shot. Having them directly specify the URL seems like a better option to me.
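
To illustrate the raw-URL point, a small Python sketch that rebuilds the example URL from its parts. The owner/repo/branch/path layout is inferred only from that one URL, and the function name is made up.

```python
from urllib.parse import quote


def github_raw_url(owner, repo, branch, path):
    """Build a raw-file URL following the layout of the example URL above."""
    # quote() keeps "/" as-is by default, so nested paths stay readable.
    return "https://raw.githubusercontent.com/" + quote(f"{owner}/{repo}/{branch}/{path}")


print(github_raw_url("enygma", "yubikey", "master", "composer.json"))
```

Other hosts construct raw-file URLs differently, which is one more reason to let the project state the URL outright.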

-chris

Larry Garfield

Dec 8, 2014, 11:22:39 AM
to php-fig-psr-...@googlegroups.com
I don't think the split between "website" and "repo" is necessary. As
enygma notes, a file in Github is easily accessible directly if one were
so inclined. That's a needless distinction.

As long as it's a web-accessible URI, we're good. The only question in
my mind is if we recommend a particular way of publicizing that URI (eg,
Link tag, composer.json, billboards on I-80 in Nebraska, etc.)

--Larry Garfield

Pádraic Brady

Dec 8, 2014, 2:08:16 PM
to Larry Garfield, php-fig-psr-9-.
On 8 December 2014 at 16:22, Larry Garfield <la...@garfieldtech.com> wrote:
> I don't think the split between "website" and "repo" is necessary. As
> enygma notes, a file in Github is easily accessible directly if one were so
> inclined. That's a needless distinction.

GitHub isn't the whole git ecosystem, however. If you remove it from
the equation, and assume a git repository hosted on a personal server
which doesn't have the GH web interface...how do we describe
discovery? Yes, you can use git over HTTP, but HTTP is just used as a
transport for the git transfer protocols. Without some web-based
interface (which may vary in URL construction to raw files), that
person can only make the file available across that git transfer
protocol requiring a client to at least checkout a branch. So there is
an actual distinction here to be considered.

> As long as it's a web-accessible URI, we're good. The only question in my
> mind is if we recommend a particular way of publicizing that URI (eg, Link
> tag, composer.json, billboards on I-80 in Nebraska, etc.)

A Link header is best suited for anything available over the web -
it's already used for similar purposes. We could use composer.json but
control over its format/syntax is entirely at the mercy of Composer
so I'm reluctant to use it.

Paddy

Korvin Szanto

Dec 8, 2014, 2:12:52 PM
to Pádraic Brady, Larry Garfield, php-fig-psr-9-.

Shouldn’t we also remember that not every project is hosted with VCS? We need to make sure that we’re not closing this down to only one set of projects; I’d hate to see PSR-9 be so easily obsoleted.



--
Korvin Szanto

Pádraic Brady

Dec 8, 2014, 2:50:14 PM
to Korvin Szanto, Larry Garfield, php-fig-psr-9-.
On 8 December 2014 at 19:12, Korvin Szanto <korvin...@gmail.com> wrote:
> Shouldn’t we also remember that not every project is hosted with VCS? We
> need to make sure that we’re not closing this down to only one set of
> projects, I’d hate to see PSR–9 be so easily obsoleted.

If anything is distributed outside of a VCS, it's likely still going
to have a directory structure. So solving the isolated git issue would
likely be reusable.

Rooting out the basics, I'm differentiating between web-based and
directory-based structures (more basic than even referring to a
specific VCS).

Larry Garfield

Dec 9, 2014, 12:24:54 PM
to Pádraic Brady, Korvin Szanto, php-fig-psr-9-.
On 12/08/2014 01:50 PM, Pádraic Brady wrote:
> On 8 December 2014 at 19:12, Korvin Szanto <korvin...@gmail.com> wrote:
>> Shouldn’t we also remember that not every project is hosted with VCS? We
>> need to make sure that we’re not closing this down to only one set of
>> projects, I’d hate to see PSR–9 be so easily obsoleted.
> If anything is distributed outside of a VCS, it's likely still going
> to have a directory structure. So solving the isolated git issue would
> likely be reusable.
>
> Rooting out the basics, I'm differentiating between web-based and
> directory-based structures (more basic than even referring to a
> specific VCS).

Here's the key distinction: It sounds like you're assuming that the URI
is to a file that is packaged up or stored somewhere alongside the code
(possibly in a VCS). I'm not. The only requirement I think we should
make is that it's a web-accessible URI.

That URI could be on a 3rd party "SA publishing service" like Sensio
Labs runs. It could be a Feed Burner URL. (Do they still exist?) It
could be a file in your git root. That URI could be to a file, or it
could be to a dynamic feed generated off of a database. (Drupal, for
instance, would almost certainly do that rather than a static file; our
SA list for core and contrib over the last 15 years is not short.)

"There's a URI I can GET" is the bare minimum requirement we can ask of
someone working on the web. Whether that URI is served from the same
continent as the source code it refers to is entirely irrelevant. That
makes the "private repo" question moot; as long as it's a URI I can GET,
I don't care if the code is even public in the first place. (Even a
proprietary system could make use of it.)
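
In Python terms, the whole client-side requirement could be as small as the sketch below. The function name, the http/https restriction, and the injectable opener (there so it can be exercised without a network) are all assumptions of the sketch.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_feed(uri, opener=urlopen, timeout=10):
    """GET the feed at `uri` and return its raw bytes.

    Whether the bytes come from a static file, a third-party service, or a
    database-backed endpoint makes no difference to the caller.
    """
    scheme = urlparse(uri).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"not a web-accessible URI: {uri!r}")
    with opener(uri, timeout=timeout) as response:
        return response.read()
```

Parsing and validating the returned bytes is a separate question that depends on the data format.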

--Larry Garfield

Larry Garfield

Dec 9, 2014, 12:27:33 PM
to Pádraic Brady, php-fig-psr-9-.
On 12/08/2014 01:08 PM, Pádraic Brady wrote:
>> As long as it's a web-accessible URI, we're good. The only question in my
>> mind is if we recommend a particular way of publicizing that URI (eg, Link
>> tag, composer.json, billboards on I-80 in Nebraska, etc.)
> A Link header is best suited for anything available over the web -
> it's already used for similar purposes. We could use composer.json but
> control over its format/syntax is entirely at the mercy of Composer
> so I'm reluctant to use it.
>
> Paddy

If we establish a standard link relationship (like these:
http://www.iana.org/assignments/link-relations/link-relations.xhtml)
then that can be specified via a <link> tag, a LINK header, or anywhere
else you want. Composer could adopt the same key name too, if it were
so inclined.

Then the question becomes where we tell people to put that LINK. Project
home page? (Again, assuming Composer goes along I expect that to be the
de facto standard but we shouldn't bank on that.)

--Larry Garfield

Chris Cornutt

Dec 9, 2014, 1:26:38 PM
to Larry Garfield, Pádraic Brady, Korvin Szanto, php-fig-psr-9-.
+1 on this... I'd think that requiring some kind of URI would definitely be good. It reduces complexity on this side and gives a project a good opportunity for separation if they don't necessarily want their code to be checked out (or if that isn't even allowed).

-chris


--
Senior Editor
PHPDeveloper.org
ccor...@phpdeveloper.org
@enygma

Pádraic Brady

Dec 9, 2014, 2:47:49 PM
to Larry Garfield, Korvin Szanto, php-fig-psr-9-.
On 9 December 2014 at 17:24, Larry Garfield <la...@garfieldtech.com> wrote:
> On 12/08/2014 01:50 PM, Pádraic Brady wrote:
>> Rooting out the basics, I'm differentiating between web-based and
>> directory-based structures (more basic than even referring to a
>> specific VCS).
>
>
> Here's the key distinction: It sounds like you're assuming that the URI is
> to a file that is packaged up or stored somewhere alongside the code
> (possibly in a VCS). I'm not. The only requirement I think we should make
> is that it's a web-accessible URI.

So we're eliminating the "bung in git" option and requiring all
libraries, etc to have web based hosting somewhere? That's fine for
many projects, but there are also those which aren't linked to any
specific web hosting whatsoever. Mockery, one of mine, has no actual
web host. I can reuse a catchall domain and a server, but that's a
convenience I happen to have access to. If I didn't have access to
that convenience, you'd be eliminating me from publishing security
data because I'm not allowed to use the obvious option: sticking it in
the git repo (or whatever VCS/tarball folk prefer).

It's not guaranteed that every library out there has controllable
hosting (or wants to pay for it), but it is guaranteed that they have
some known distribution mechanism. There's a reason people use Github
for open source stuff and one part of that is that it's completely
free. Personal hosting and the maintenance that goes with it is not a
requirement.

This doesn't eliminate the web based URL option that you're
considering by any means, but it does become optional. The repository
route is also far from perfect: there's cloning involved
and other messiness that implementers will need to get over by seeking
efficiencies (e.g. GitHub's raw file access through the website).
However, it's easy. It's cheap. A monkey could do it.

Paddy

Pádraic Brady

Dec 9, 2014, 2:54:49 PM
to Larry Garfield, php-fig-psr-9-.
On 9 December 2014 at 17:27, Larry Garfield <la...@garfieldtech.com> wrote:
> If we establish a standard link relationship (like these:
> http://www.iana.org/assignments/link-relations/link-relations.xhtml) then
> that can be specified via a <link> tag, a LINK header, or anywhere else you
> want. Composer could adopt the same key name too, if it were so inclined.
>
> Then the question becomes where we tell people to put that LINK. Project
> home page? (Again, assuming Composer goes along I expect that to be the de
> facto standard but we shouldn't bank on that.)
>
> --Larry Garfield

A homepage would be ideal. We could use an equivalent for non-HTML
formats - and ensure all keys/values are indeed equivalent. There are
rules to be followed, such as basing a custom link relation on a
followable URI rather than a short tag (which needs to be registered -
not likely!), e.g. http://www.php-fig.org/relation/php-disclosures.
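
A short Python sketch of both points: telling a URI-based (extension) relation from a short name, and emitting the same rel/target pair as a header value or an HTML element. The helper names are invented, and the relation URI is only the example given above.

```python
import html
from urllib.parse import urlparse

# Example relation URI from this message; nothing has been published at it.
REL_URI = "http://www.php-fig.org/relation/php-disclosures"


def is_extension_relation(rel):
    """True when `rel` is an absolute URI rather than a short tag."""
    parsed = urlparse(rel)
    return bool(parsed.scheme and parsed.netloc)


def as_link_header(target, rel=REL_URI):
    """Value for an HTTP Link header carrying the relation."""
    return f'<{target}>; rel="{rel}"'


def as_html_link(target, rel=REL_URI):
    """Equivalent <link> element for an HTML page."""
    return f'<link rel="{html.escape(rel)}" href="{html.escape(target)}">'
```

Because both forms are generated from the same two values, the keys and values stay equivalent by construction.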

enygma

Dec 9, 2014, 3:01:50 PM
to php-fig-psr-...@googlegroups.com, la...@garfieldtech.com, korvin...@gmail.com
Mockery's not a prime example since GitHub files can be accessed by web URL, but I get what you're saying. I'm still worried about other issues that might come with the cloning of repos as a part of the discovery process. Would the system doing the discovery be liable for the code that lives on the server, no matter how briefly? I'm assuming if a project is looking to publish security issues, they're going to at least be a little concerned about how their code is handled.

I'm still on the side of a published URL personally....that puts the onus back on the project to maintain that link rather than just assuming the discovery mechanism will be reliable enough to always get their latest updates.

-chris

Pádraic Brady

Dec 9, 2014, 3:12:48 PM
to enygma, php-fig-psr-9-., Larry Garfield, Korvin Szanto
Hey Chris,

On 9 December 2014 at 20:01, enygma <eny...@phpdeveloper.org> wrote:
> Mockery's not a prime example since GitHub files can be accessed by web URL,
> but I get what you're saying. I'm still worried about other issues that
> might come with the cloning of repos as a part of the discovery process.
> Would the system doing the discovery be liable for the code that lives on
> the server, no matter how briefly? I'm assuming if a project is looking to
> publish security issues, they're going to at least be a little concerned
> about how their code is handled.

Depends on the specific concerns involved. Anything we execute on a
machine could do anything. All the implementation has to do, at the
very worst with a VCS, is clone a branch, read a file or two, and
remove the clone once done. Done correctly and obviously not on a
mission critical machine (one hopes), it shouldn't become an issue. I
guess it depends on whether we want to go as far as making
recommendations directly on implementation (i.e. use temp dirs, don't
meddle with existing branches, cleanup properly, perform
anti-overwrite checks, etc.). Is that down the same track as your
concerns?
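
To make those recommendations concrete, here is one possible shape in Python. It assumes the git command-line client is installed; the function names and the injectable `clone` parameter are illustrative, not a proposed API.

```python
import subprocess
import tempfile
from pathlib import Path


def git_clone(uri, branch, destination):
    """Shallow-clone a single branch; requires the git CLI on PATH."""
    subprocess.run(
        ["git", "clone", "--depth", "1", "--single-branch", "--branch", branch,
         "--", uri, str(destination)],
        check=True,
    )


def read_from_repository(uri, branch, relative_path, clone=git_clone):
    """Clone into a throwaway directory, read one file, and always clean up."""
    # TemporaryDirectory removes the clone on exit, even if an exception is raised,
    # and never touches an existing working copy.
    with tempfile.TemporaryDirectory(prefix="disclosure-") as workdir:
        checkout = Path(workdir) / "checkout"
        clone(uri, branch, checkout)
        target = (checkout / relative_path).resolve()
        # Refuse paths (or symlinks) that resolve outside the checkout.
        if checkout.resolve() not in target.parents:
            raise ValueError("path escapes the checkout")
        return target.read_bytes() if target.is_file() else None
```

Nothing from the repository is executed; the clone is only read from and then deleted.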

enygma

Dec 9, 2014, 3:22:56 PM
to php-fig-psr-...@googlegroups.com, eny...@phpdeveloper.org, la...@garfieldtech.com, korvin...@gmail.com
Yup, those are the same kinds of things I'm worried about. I guess that would be up to the ones implementing the process, but it still gives me some pause. I get what you're saying about not everyone having access to a web URL to publish at, but I wonder how many projects that would be. It seems like between things like GitHub/Bitbucket/etc. and project web pages themselves, that might be a relatively small number (though I have no figures to back that up).

-chris

Larry Garfield

Dec 9, 2014, 4:02:16 PM
to php-fig-psr-9-.
On 12/09/2014 02:12 PM, Pádraic Brady wrote:
> Hey Chris,
>
> On 9 December 2014 at 20:01, enygma <eny...@phpdeveloper.org> wrote:
>> Mockery's not a prime example since GitHub files can be accessed by web URL,
>> but I get what you're saying. I'm still worried about other issues that
>> might come with the cloning of repos as a part of the discovery process.
>> Would the system doing the discovery be liable for the code that lives on
>> the server, no matter how briefly? I'm assuming if a project is looking to
>> publish security issues, they're going to at least be a little concerned
>> about how their code is handled.
> Depends on the specific concerns involved. Anything we execute on a
> machine could do anything. All the implementation has to do, at the
> very worst with a VCS, is clone a branch, read a file or two, and
> remove the clone once done. Done correctly and obviously not on a
> mission critical machine (one hopes), it shouldn't become an issue. I
> guess it depends on whether we want to go as far as making
> recommendations directly on implementation (i.e. use temp dirs, don't
> meddle with existing branches, cleanup properly, perform
> anti-overwrite checks, etc.). Is that down the same track as your
> concerns?
>
> Paddy

There's a very important phrase you used there. "Clone a branch". Which
branch? Which branch has the "latest" SA feed? master? develop? trunk?
highest semver branch name? Specified in something you publish along
with the project somehow?

A VCS works on branches. But SAs are a project-level feed. I don't
know any modern VCS that lets you have a file that crosses branches (by
design). An SA, however, inherently is multi-version because it's
saying "There's a sec hole in version X, please use version Y instead."
So do you look for that SA on version X or Y?
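
A toy Python illustration of why the feed is project-level: one list answers the question for any installed version, regardless of branch. The record fields, identifiers, and version numbers are all made up for the example and imply nothing about the eventual data format.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn "1.2.5" into (1, 2, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Advisory:
    identifier: str
    first_affected: str  # inclusive lower bound
    fixed_in: str        # first version containing the fix (exclusive bound)

    def affects(self, installed):
        version = parse_version(installed)
        return parse_version(self.first_affected) <= version < parse_version(self.fixed_in)


# Hypothetical project-wide feed covering two release lines.
FEED = [
    Advisory("EXAMPLE-2014-001", "1.0.0", "1.2.5"),
    Advisory("EXAMPLE-2014-002", "2.0.0", "2.1.1"),
]


def advisories_for(installed, feed=FEED):
    """Return the identifiers of every advisory affecting the installed version."""
    return [a.identifier for a in feed if a.affects(installed)]
```

A single feed like this cannot sit naturally on "the 1.x branch" or "the 2.x branch", since it describes both at once.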

However, I do not feel that "you need to have access to some URI
somewhere on the Internet that you can edit" is a particularly high
bar. Between public Git hosting (which all have web-accessible URIs for
files), trivial web sites, 3rd party services that already exist and
more may exist, etc, I think it's entirely reasonable to set a
requirement of "you can somehow control the value of a URI somewhere on
the web" for participation. It's perhaps the lowest possible bar we can
set.

--Larry Garfield

Larry Garfield

Dec 9, 2014, 4:04:55 PM
to php-fig-psr-9-.
On 12/09/2014 01:54 PM, Pádraic Brady wrote:
> On 9 December 2014 at 17:27, Larry Garfield <la...@garfieldtech.com> wrote
>> If we establish a standard link relationship (like these:
>> http://www.iana.org/assignments/link-relations/link-relations.xhtml) then
>> that can be specified via a <link> tag, a LINK header, or anywhere else you
>> want. Composer could adopt the same key name too, if it were so inclined.
>>
>> Then the question becomes where we tell people to put that LINK. Project
>> home page? (Again, assuming Composer goes along I expect that to be the de
>> facto standard but we shouldn't bank on that.)
>>
>> --Larry Garfield
> A homepage would be ideal. We could use an equivalent for non-HTML
> formats - and ensure all keys/values are indeed equivalent. There are
> rules to be followed, such as basing a custom link relation on a
> followable URI rather than a short tag (which needs to be registered -
> not likely!), e.g. http://www.php-fig.org/relation/php-disclosures.
>
> Paddy

There's no need to go through IETF. Other organizations have published
"standard" link relationships. Such a relationship may even be highly
useful to non-PHP projects, too. (Nothing we've discussed so far is PHP
specific; only FIG is.)

There's a dozen and one ways to leverage hypermedia links, which is one
reason I like that as an approach. We could just use a bunch of
"SHOULDs" for suggested places to link from (eg, project home page,
download page, etc.), but not provide an exhaustive list.

--Larry Garfield

Pádraic Brady

Dec 9, 2014, 4:15:31 PM
to Larry Garfield, php-fig-psr-9-.
Hi Larry,

On 9 December 2014 at 21:02, Larry Garfield <la...@garfieldtech.com> wrote:
> There's a very important phrase you used there. "Clone a branch". Which
> branch? Which branch has the "latest" SA feed? master? develop? trunk?
> highest semver branch name? Specified in something you publish along with
> the project somehow?
>
> A VCS works on branches. But SAs are a project-level feed. I don't know
> any modern VCS that lets you have a file that crosses branches (by design).
> An SA, however, inherently is multi-version because it's saying "There's a
> sec hole in version X, please use version Y instead." So do you look for
> that SA on version X or Y?
>
> However, I do not feel that "you need to have access to some URI somewhere
> on the Internet that you can edit" is a particularly high bar. Between
> public Git hosting (which all have web-accessible URIs for files), trivial
> web sites, 3rd party services that already exist and more may exist, etc, I
> think it's entirely reasonable to set a requirement of "you can somehow
> control the value of a URI somewhere on the web" for participation. It's
> perhaps the lowest possible bar we can set.

My concern would be that we set a minimum bar, and then find a lot of
people unwilling to jump over it. Having a URL "somewhere" doesn't tell
us where that somewhere actually is - it's not a discovery solution.
That's the motivation of at least examining the repository/point of
distribution option and seeing whether it is viable. We're already
seeing links back to composer being suggested - which is by definition
repo based itself if we're relying on some given composer.json to
carry a signpost. If we find it's completely unviable, then by all
means we can still go with a purely web based URL option which we're
already bound to have in place regardless.

enygma

Dec 17, 2014, 10:21:55 AM
to php-fig-psr-...@googlegroups.com, la...@garfieldtech.com
Some kind of bar has to be set here, honestly. The same problem comes up regardless of the type of discovery (repo or URL): we have to be told where to look. This "where" is either going to be a repo location or a URL to the consistently named file. The project will have to define that somewhere, either in a centralized tool or in their documentation, otherwise people won't know where to look. What if the project has both a website and a repo that are publicly accessible? Would someone check in both locations? I think determining a single method for discovery would be good, regardless of where the actual content is located. I'm an advocate for the URL personally.