Thanks for all the responses so far! Comments inline...
On 30 May 2015 at 21:09, Jonas Sicking <jo...@sicking.cc> wrote:
> We should use whatever formats people are using to mark up pages. If that
> is microdata we should use that. If it's RDFa we should use that. If it's
> JSON-LD we should use that.
Agree, that's what I'd like to find out.
On 30 May 2015 at 21:31, Gordon Brander <gbra...@mozilla.com> wrote:
> We should consider a series of fallbacks for this internal API.
> The metadata story for things like icon, title, description, hero images,
> is complicated. Implementation in the wild follows real-world use cases like
> posting rich snippets to Facebook or getting an image to show up on
> Twitter, rather than some standard.
> I think it would be best to think of this kind of API as a sort of light
> "scraper" that crawls through a collection of known in-the-wild patterns to
> provide a good-enough answer.
Agree, I think we will at least initially need to support multiple formats,
ideally translating them all internally into a single format that Gaia can
consume via the Browser API.
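To make that concrete, here's a rough sketch of the "light scraper" fallback idea in JavaScript. The function name and the regex-based extraction are illustrative assumptions only; a real implementation would walk the parsed DOM and cope with attribute order, but the priority-list shape is the point:

```javascript
// Sketch only: extract a page title by checking known in-the-wild
// metadata patterns in priority order. The regex approach is an
// illustrative shortcut; a real scraper would query the parsed DOM.
function extractTitle(html) {
  const patterns = [
    // Facebook Open Graph
    /<meta[^>]*property=["']og:title["'][^>]*content=["']([^"']*)["']/i,
    // Twitter Cards
    /<meta[^>]*name=["']twitter:title["'][^>]*content=["']([^"']*)["']/i,
    // Plain <title> element as a last resort
    /<title[^>]*>([^<]*)<\/title>/i,
  ];
  for (const pattern of patterns) {
    const match = html.match(pattern);
    if (match) return match[1];
  }
  return null;
}
```

Supporting an additional format would then just mean appending another pattern to the priority list, without changing whatever consumes the result.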
On 2 June 2015 at 00:53, Karl Dubost <kdu...@mozilla.com> wrote:
> if not done yet, you might want to talk with Dan Brickley. He is working
> at Google on everything related to schema.org. danbri -AT- google.com
Thanks, I will!
> It might be possible to have conversion tables in between the different
> markups, making it easier to start with one markup and build up little by
> little.
The JSON-LD spec gives examples of how Turtle, RDFa, Microformats and
Microdata can be expressed in JSON-LD. Given JSON is an ideal format
for us to use in Gaia, I like the idea of internally translating into that.
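To illustrate what that internal translation might look like, here's a minimal sketch that maps scraped Open Graph key/value pairs onto a single schema.org-flavoured JSON-LD object. The og:* → schema.org mapping table is an assumption for illustration only, not a settled vocabulary:

```javascript
// Sketch: translate scraped Open Graph key/value pairs into one
// JSON-LD object that Gaia-side code could consume. The og:* ->
// schema.org mapping below is illustrative, not a standard table.
const OG_TO_SCHEMA = {
  'og:title': 'name',
  'og:description': 'description',
  'og:image': 'image',
  'og:url': 'url',
};

function ogToJsonLd(ogPairs) {
  const result = { '@context': 'http://schema.org', '@type': 'WebPage' };
  for (const [key, value] of Object.entries(ogPairs)) {
    const mapped = OG_TO_SCHEMA[key];
    if (mapped) result[mapped] = value; // silently drop unmapped keys
  }
  return result;
}
```

Other input formats (microdata, microformats, RDFa) would get their own translation step, but everything downstream would only ever see the one JSON shape.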
On 2 June 2015 at 01:34, Jonas Sicking <jo...@sicking.cc> wrote:
> I think we're already talking about reverse-engineering what search
> engines and twitter/facebook/etc do.
Exactly, this is about getting more value out of the content that already
exists on the web, not defining new ways to create content.
> But given how small a market share browsers in general have as metadata
> consumers, I think any standardization efforts would have to be driven
> by the current metadata consumers, like search engines and social
> networks.
Agree. Though that doesn't prevent us from contributing to that discussion
if we have something to say.
On 2 June 2015 at 01:42, Gordon Brander <gbra...@mozilla.com> wrote:
> Yup. We’re really talking about 2 things in parallel:
> 1. Defining a standards-based approach to marking these things up (using
> pre-existing patterns where it makes sense). Encouraging authors to use it.
> 2. Creating internal APIs that will leverage this metadata, and in cases
> where the standards-based metadata does not exist, scraping reasonable
> results from other common metadata or markup patterns.
Agree, except I don't want to solve the problem of multiple formats by
creating another format. I'd like to either pick one of the existing
formats or (more likely) hedge our bets and support multiple popular
formats, giving developer warnings for non-standard usage where necessary.
If we find we have suggestions of how to improve the existing formats, then
we should participate in the groups that already exist to make that happen.
On 3 June 2015 at 00:45, Tantek Çelik <tan...@cs.stanford.edu> wrote:
> The summary among all the myriad proprietary (read: single corp /
> oligopoly controlled) proposals is that Facebook OGP meta tags have a
> strong lead over all the other proprietary approaches
That seems to match our anecdotal experience in building a prototype. Open
Graph is quite primitive in comparison to other formats in terms of what
can be expressed (and it's not clear to me whether it validates as either
valid HTML5 or valid RDFa), but it does seem like a clear contender.
> (for various
> reasons we can get into offline if desired),
I would like to understand those reasons. Are there reasons beyond
"Facebook and Twitter make use of this data so people add it to their web
pages"?
> while among the "open
> standards community" options - i.e. per Mozilla open web principles,
> microformats have the lead.
That is the answer I would expect from the person whose name happens to be
used as an hCard example in a W3C spec under the heading of "Microformats".
> This analysis and conclusion matches what we've been figuring out with
> implementations and deployments in the IndieWebCamp community as well
> (which has several use-cases similar to pins/cards for providing
> summaries/link-previews of pages on the web). In short, the general
> approach involves parsing for two general sets of published data / markup:
> 1. Pragmatic parsing of what's most out there:
> a) according to anecdotal sampling by the indieweb community,
> Facebook OGP, and
> b) according to studies / open/common crawl datasets: classic microformats.
> 2. Simplest open standards based approach (so we can recommend the
> easiest, least work, and most openly developed/maintained approach to
> authors and site owners) - microformats2.
I'd like to understand more about the lessons learned here; please email me
off-thread if that makes more sense.
> I'm happy to provide citations/specs for these, as well as follow-up
> on any detailed questions.
This is what I'd really like to get more of, particularly usage data.