We just wanted to introduce ourselves to the group. We are Embedly (Art
Gibson and Sean Creeley), a couple of web developers funded by
Y Combinator to bring embedding to the masses.
We have put together a fairly extensive oEmbed API that wraps some of the
existing oEmbed API endpoints (we try to follow the spec, whereas a lot
of the sites we've seen don't).
We'd love for you guys to take a look and give us any feedback. This
API is very similar to Deepak's oohEmbed in that we try to provide one
endpoint for all oEmbed calls; kudos to him for the inspiration.
You can find it at http://api.embed.ly. If you decide to use us for
your sites, we'd love to promote them.
(Hopefully this didn't come off as spammy; we really love oEmbed and
believe there are endless use cases for it.)
Art
@embedly
a...@embed.ly
Samin,
We do support oEmbed discovery; we just don't expose it to users
quite yet. We will at some point soon, though.
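For context, oEmbed discovery means a page advertises its own endpoint with a <link rel="alternate" type="application/json+oembed" href="..."> (or text/xml+oembed) tag, as described in the oEmbed spec. Here is a rough sketch of a consumer-side discovery step; `discoverOEmbed` is a hypothetical helper, not Embedly's code, and it uses a regex instead of a real HTML parser purely for brevity:

```javascript
// Find oEmbed discovery links in a page's HTML.
// Regex-based for brevity only; real pages deserve a proper HTML parser.
function discoverOEmbed(html) {
  const found = [];
  const tagRe = /<link\b[^>]*>/gi;
  const attrRe = /([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  for (const tag of html.match(tagRe) || []) {
    const attrs = {};
    for (const m of tag.matchAll(attrRe)) {
      // m[2] holds a double-quoted value, m[3] a single-quoted one
      attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
    }
    const rel = (attrs.rel || "").toLowerCase();
    const type = (attrs.type || "").toLowerCase();
    const isOEmbed = type === "application/json+oembed" || type === "text/xml+oembed";
    if (rel === "alternate" && isOEmbed && attrs.href) {
      // hrefs in HTML usually encode "&" as "&amp;"
      found.push({ type, href: attrs.href.replace(/&amp;/g, "&") });
    }
  }
  return found;
}
```

A gateway could use something like this as a fallback for URLs that match no known provider.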
Thanks,
Sean.
> --
> You received this message because you are subscribed to the Google Groups "OEmbed" group.
> To post to this group, send email to oem...@googlegroups.com.
> To unsubscribe from this group, send email to oembed+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/oembed?hl=en.
>
>
--
603.547.0509
e-mail: sean.c...@gmail.com
On Apr 3, 4:38 am, art_embedly <arthur.gib...@gmail.com> wrote:
> http://api.embed.ly .... if you decide to use us for your sites ..
You provide a single interface to "all" (known) providers out there.
This works around oEmbed's worst design decision ever. But...
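The "single interface" boils down to a lookup from URL schemes to provider endpoints, which every plain-oEmbed consumer otherwise has to hardcode. A minimal sketch, assuming oEmbed-style `*` wildcards in the schemes; the registry entries and the `resolveEndpoint` name are illustrative, not Embedly's actual table:

```javascript
// Hypothetical registry: URL schemes ("*" wildcards) -> provider endpoint.
// Entries are illustrative; a real gateway maintains a much longer list.
const PROVIDERS = [
  { schemes: ["http://www.youtube.com/watch*", "http://youtu.be/*"],
    endpoint: "http://www.youtube.com/oembed" },
  { schemes: ["http://vimeo.com/*"],
    endpoint: "http://vimeo.com/api/oembed.json" },
];

// Turn e.g. "http://vimeo.com/*" into /^http:\/\/vimeo\.com\/.*$/
function schemeToRegExp(scheme) {
  const escaped = scheme.replace(/[.+?^${}()|[\]\\\/]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Map a content URL to the provider's oEmbed request URL, or null if unknown.
function resolveEndpoint(url) {
  for (const p of PROVIDERS) {
    if (p.schemes.some((s) => schemeToRegExp(s).test(url))) {
      return p.endpoint + "?format=json&url=" + encodeURIComponent(url);
    }
  }
  return null;
}
```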
1) do you cache the provider's responses, or pass every request along?
(speed)
2) do you have fallbacks in place? (redundancy, high-availability)
Your service won't be worth much if a bit of high traffic kills your
server (`dig api.embed.ly` returns only a single IP…).
You inject data omitted by the original provider:
youtube.com: http://hurl.it/hurls/bafba0740c01a69e40a800c08150bc7bef286de2/da6fb9a0ca329b80f81f2d3e87ae6e8f7a51cff9
embed.ly: http://hurl.it/hurls/6821cb07e41dac90847cf91f7f5d573b9d80114a/0b74d384f1690dc3f56373412cdd9c69a51cb144
Here the missing thumbnail is added by your API. Very nice! You should
(prominently) mention that.
Since you already have a pretty comprehensive list of providers, and
offer some desperately needed features (single endpoint, fixing
missing data), I'm pretty sure you'll get some attention. There is one
more feature you might want to consider, though.
You could offer a registration form, so providers can register their
service. And while you're at it, introduce a signing feature (you
know, "network of trust"…). Users can then sign providers to state
their trust in them. The TrustRank™ could be added to your API
responses. Any implementor can then choose to accept your response
based on their individual TrustRank threshold. This would enable
implementors like WordPress to offer the oEmbed service, without the
"sorry, that endpoint was not hardcoded into our core" hassle. (Or do
something altogether different. Who cares - as long as something is
done…)
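A minimal sketch of the consumer side of that proposal, assuming the gateway added a hypothetical numeric `trust_rank` field (0 to 1) to each response. Nothing like it exists in the oEmbed spec; the field name and scale are made up for illustration:

```javascript
// Accept an embed only if its (hypothetical) trust_rank meets the
// implementor's own threshold. Missing ranks count as untrusted.
function acceptEmbed(response, threshold) {
  const rank = typeof response.trust_rank === "number" ? response.trust_rank : 0;
  return rank >= threshold;
}
```

A WordPress-style implementor would then pick its own threshold, e.g. `acceptEmbed(resp, 0.7)`, instead of relying on a hardcoded list of endpoints.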
Best Regards,
Rod
> 1) do you cache the provider's responses, or pass every request along?
> (speed)
Heavily cached. Our stack is nginx, tornado, memcache, cassandra. Once
we have seen a link, we tend to be twice as fast as calling the
provider directly (except YouTube, where we are about the same).
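Roughly, "once we have seen a link" means a read-through lookup: try the fast tier, then the persistent one, and only hit the provider when both miss. A sketch with plain `Map`s standing in for memcache and cassandra (an assumption for illustration; neither real client API is shown) and a hypothetical `fetchFromProvider` callback:

```javascript
// Two-tier read-through cache; Maps stand in for memcache and cassandra.
class TieredCache {
  constructor(fetchFromProvider) {
    this.hot = new Map();        // memcache's role: fast, may evict
    this.persistent = new Map(); // cassandra's role: durable
    this.fetchFromProvider = fetchFromProvider;
  }

  get(url) {
    if (this.hot.has(url)) return { value: this.hot.get(url), source: "hot" };
    if (this.persistent.has(url)) {
      const value = this.persistent.get(url);
      this.hot.set(url, value); // promote back into the fast tier
      return { value, source: "persistent" };
    }
    // Slow path: only taken the first time a URL is seen.
    const value = this.fetchFromProvider(url);
    this.persistent.set(url, value);
    this.hot.set(url, value);
    return { value, source: "provider" };
  }
}
```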
> 2) do you have fallbacks in place? (redundancy, high-availability)
> Your service won't be worth much, if a bit of high traffic kills your
> server. (`dig api.embed.ly` names only a single IP…)
All cloud based. We can easily add nodes if our traffic spikes.
> You inject data omitted by the original provider
> youtube.com: http://hurl.it/hurls/bafba0740c01a69e40a800c08150bc7bef286de2/da6fb9a0ca329b80f81f2d3e87ae6e8f7a51cff9
> embed.ly: http://hurl.it/hurls/6821cb07e41dac90847cf91f7f5d573b9d80114a/0b74d384f1690dc3f56373412cdd9c69a51cb144
> Here the missing thumbnail is added by your API. Very nice! You should
> (prominently) mention that.
Thanks! We will add something to the docs that mentions this.
> You could offer a registration form, so providers can register their
> service.
We will add this as well; it's on our soon-to-be-launched embed.ly
homepage, but not on the API. We will link to it.
> And while you're at it, introduce a signing feature (you
> know, "network of trust"…). Users can then sign providers to state
> their trust in them. The TrustRank™ could be added to your API
> responses. Any implementor can then choose to accept your response
> based on their individual TrustRank threshold. This would enable
> implementors like WordPress to offer the oEmbed service, without the
> "sorry, that endpoint was not hardcoded into our core" hassle. (Or do
> something altogether different. Who cares - as long something is
> done…)
I like this idea, but it probably won't be added for a bit. We need to
work out a few other things first.
For http://api.embed.ly/tools/generator you might consider attaching
the click handler to the <li> (or adding a <label>, which would be
semantically correct):
    // Toggle the <li>'s checkbox; ignore clicks on children such as the checkbox itself
    jQuery('ul.generator li').click(function (e) {
      if (e.target !== this) return;
      var c = jQuery(this).find(':checkbox');
      c.prop('checked', !c.prop('checked')); // .prop() needs jQuery 1.6+
    });
On Apr 5, 8:32 pm, Sean Creeley <sean.cree...@gmail.com> wrote:
> Heavily cached. Our stack is nginx, tornado, memcache, cassandra. Once
> we have seen a link we tend to be twice as fast as directly making the
> call to the provider (save youtube where we are about the same).
Sounds delicious.
What's the reason for cassandra (or any database, for that matter)?
I mean, for how long do you want to cache results? How much traffic
are you expecting?
> > And while you're at it, introduce a signing feature (you
> > know, "network of trust"…). Users can then sign providers to state
> > their trust in them. The TrustRank™ could be added to your API
> > responses. Any implementor can then choose to accept your response
> > based on their individual TrustRank threshold. This would enable
> > implementors like WordPress to offer the oEmbed service, without the
> > "sorry, that endpoint was not hardcoded into our core" hassle. (Or do
> > something altogether different. Who cares - as long something is
> > done…)
>
> I like this idea, but it probably won't be added for a bit. We need
> work out a few other things first.
Take your time - but don't lose focus. Your centralized gateway
already is one heck of an improvement over plain oEmbed. A network-of-
trust feature would simply be the sugar coating.
Anyways, after a closer (and pretty satisfying) look at embed.ly and
your explanation of the setup, I guess I'll be switching back to
oEmbed after all. Thanks!
Best Regards,
Rod
Nice. Love the extra development help!
>> Heavily cached. Our stack is nginx, tornado, memcache, cassandra. Once
>> we have seen a link we tend to be twice as fast as directly making the
>> call to the provider (save youtube where we are about the same).
>
> Sounds delicious.
> What's the reason for cassandra (or any database, for that matter)?
> I mean, for how long do you want to cache results? How much traffic
> are you expecting?
Speed, mostly. Once a link falls out of memcache, it's best if we don't
have to fetch it from the provider again at request time. We prefer to
do that asynchronously in the backend.
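The "async in the backend" idea is essentially serve-stale-then-refresh: answer from the persisted copy right away and queue the URL for a background re-fetch. A synchronous sketch; the Maps stand in for memcache and cassandra, and `drainRefreshQueue` is a hypothetical stand-in for whatever worker or timer would run it in production:

```javascript
const hot = new Map();          // stand-in for memcache
const persistent = new Map();   // stand-in for cassandra
const refreshQueue = new Set(); // a Set de-duplicates repeated misses

function lookup(url) {
  if (hot.has(url)) return hot.get(url);
  if (persistent.has(url)) {
    refreshQueue.add(url); // refresh later; don't block this request
    const value = persistent.get(url);
    hot.set(url, value);
    return value;
  }
  return null; // never seen: the caller has to fetch synchronously
}

// Run periodically by a background worker in a real deployment.
function drainRefreshQueue(fetchFromProvider) {
  for (const url of refreshQueue) {
    const fresh = fetchFromProvider(url);
    persistent.set(url, fresh);
    hot.set(url, fresh);
  }
  refreshQueue.clear();
}
```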
> Anyways, after a closer (and pretty satisfying) look at embed.ly and
> your explanation of the setup, I guess I'll be switching back to
> oEmbed after all. Thanks!
Awesome. Look forward to it!
On Apr 5, 9:47 pm, Sean Creeley <sean.cree...@gmail.com> wrote:
> > What's the reason for cassandra (or any database, for that matter)?
> > I mean, for how long do you want to cache results? How much traffic
> > are you expecting?
>
> Speed mostly. Once a link falls out of memcache it's best if we don't
> have to go get it again at the time of the request. We like to do that
> async in the backend.
The question regarding cache time remains unanswered. (This should be
in your docs, too, IMHO.)
I don't know how likely it is, but what if the description / title /
whatever of some content was modified? How long would the stale
information remain in your system? Or are we not talking about strict
caching, but rather caching based on the If-Modified-Since HTTP request
header? (Sorry, I don't know the exact term at the moment.)
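For reference, the mechanism described here is a conditional GET: keep the validators (`ETag`, `Last-Modified`) from the provider's last response, send them back as `If-None-Match` / `If-Modified-Since`, and treat a 304 as "unchanged". A sketch of just that bookkeeping; the entry fields are hypothetical, and response headers are assumed to be a plain object with lowercase keys (a real `fetch` response exposes a `Headers` object instead):

```javascript
// Build the request headers for a conditional re-check of a cached entry.
function conditionalHeaders(entry) {
  const headers = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// Fold the provider's reply into the cache entry.
function applyResponse(entry, res) {
  if (res.status === 304) return entry; // not modified: keep the cached body
  return {
    body: res.body,
    etag: res.headers["etag"] || null,
    lastModified: res.headers["last-modified"] || null,
  };
}
```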
Best Regards,
Rod
We re-check URLs based on how frequently they are accessed and whether
they changed the previous times we checked them.
Right now we do that at most once a day per URL. If you have
any strong opinions on this I'd love to hear them, because as of now we
don't have any.
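One way to express that policy as code, assuming per-URL stats for hits per day and how many past checks found a change. The weighting and the 30-day ceiling are invented for illustration; only the once-a-day floor comes from the description above:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_INTERVAL_MS = 30 * DAY_MS; // illustrative ceiling

// stats: { hitsPerDay, checks, changes }
function nextCheckInterval(stats) {
  let interval = MAX_INTERVAL_MS;
  // Popular URLs get re-checked sooner (log scale so huge counts don't dominate).
  if (stats.hitsPerDay > 0) interval /= 1 + Math.log10(1 + stats.hitsPerDay);
  // URLs that changed on earlier checks also get re-checked sooner.
  const changeRate = stats.checks > 0 ? stats.changes / stats.checks : 0;
  interval *= 1 - 0.9 * changeRate;
  return Math.max(DAY_MS, interval); // never more than once a day
}

function isDue(stats, lastCheckedMs, nowMs) {
  return nowMs - lastCheckedMs >= nextCheckInterval(stats);
}
```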
Thanks,
Sean
I'm currently working on a system "similar" to embed.ly. No worries,
it's not aiming for rich content integration (which I'll most likely
be doing via embed.ly now ;). In my world content changes a bit
more frequently, as I'm not working with "immutable" content (such as
videos), but with websites in general, especially websites that
are at the technical and SEO level of 1998. Since I can't rely on
assumptions like yours, I cache my results for a flat 6h. I would have
implemented some If-Modified-Since caching that checked
modification times (though not on every request), but… 1998… again.
On Apr 5, 10:20 pm, Sean Creeley <sean.cree...@gmail.com> wrote:
> There is no set time for caching at this point. Content changes rather
> infrequently when we talk about videos and images. Generally once they
> exist they aren't going to change.
That's a fair assumption. It should definitely be noted in your docs,
though.
> We check on urls based on how frequently they are accessed and if they
> have changed in the proceeding times we have checked on them.
So you keep count of which URIs have been requested via embed.ly. Based
on the count / time ratio, you re-request the data from the original
provider. If I got you right here, I wonder on what basis you assume
that frequently requested content is more likely to change. In general I
don't see any correlation between $numberOfRequests, $numberOfChanges and
$probablyChanged.
> We are doing that once a day per url at most right now. If you have
> any strong opinions on this I'd love to hear them because as of now we
> don't have any.
I just had a look at the oEmbed providers of Vimeo and YouTube. Both are
incapable of proper If-Modified-Since handling. So much for »web 2.0 -
we don't even know the basics, the hell yeah«™
Under the assumption that the content won't change (much), and not
being afraid of a little stale content, I'd stick to your approach:
cache results for 24h, flat. You'd re-check your data then anyway.
Since most providers (an assumption based on YouTube and Vimeo) don't
know their HTTP, If-Modified-Since and ETag are out of the question,
so you have to re-request the whole shebang anyway. So there's no
reason to cache your data for longer than 24h. You should really note
this in your docs, though.
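The flat-TTL approach is simple enough to sketch directly. This assumes an in-process map with an injectable clock; in production, memcache's own expiration time on `set` would do the same job:

```javascript
const TTL_MS = 24 * 60 * 60 * 1000; // flat 24h, as suggested above

class FlatTtlCache {
  constructor(now = Date.now) {
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }
  set(url, value) {
    this.entries.set(url, { value, storedAt: this.now() });
  }
  get(url) {
    const e = this.entries.get(url);
    if (!e) return undefined;
    if (this.now() - e.storedAt >= TTL_MS) {
      // Expired: the caller re-requests the whole response from the provider.
      this.entries.delete(url);
      return undefined;
    }
    return e.value;
  }
}
```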
I would offer some sort of PURGE interface, though. This could be a
simple form (similar to the query form) that removes the content
for a URL from memcache / cassandra and sends a PURGE request to the
proxy (see http://labs.frickle.com/nginx_ngx_cache_purge/). This way
someone stumbling across a stale cache could easily "fix" the issue
themselves. If PURGE requests from the outside can be piped from nginx to
your app, anyone could send a »PURGE http://api.embed.ly/v1/api/...
HTTP/1.1« request. This would enable automatic purging by providers.
I'm pretty sure that's not gonna happen, though ;)
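A hypothetical sketch of the app side of such a PURGE interface: a PURGE for an API URL drops the cached response, and the next GET re-fetches it. The `handle` function, the cache keyed by request URL, and the `fetchFromProvider` stub are all assumptions for illustration, not embed.ly's routes:

```javascript
const responseCache = new Map(); // full API request URL -> cached JSON

function fetchFromProvider(requestUrl) {
  return JSON.stringify({ fetched: requestUrl }); // stub for the real provider call
}

function handle(method, requestUrl) {
  if (method === "PURGE") {
    // Map.prototype.delete reports whether anything was removed.
    return { status: responseCache.delete(requestUrl) ? 200 : 404 };
  }
  if (method === "GET") {
    const cached = responseCache.has(requestUrl);
    if (!cached) responseCache.set(requestUrl, fetchFromProvider(requestUrl));
    return { status: 200, cached, body: responseCache.get(requestUrl) };
  }
  return { status: 405 }; // Method Not Allowed
}
```

The proxy layer would still need its own purge (e.g. the ngx_cache_purge module linked above) so nginx's cache is cleared as well.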
Best Regards,
Rod
There is not a high correlation between the number of requests and the
probability of change. But for the most requested content, there is a
high correlation between stale content and anger.
> Cache results for 24h - flat. Then you'd re-check your data anyways.
> Since most (assumption based on youtube and vimeo) providers don't
> know their HTTP, IF-Modified-Since and etag are out of the question.
> So you have to re-request the whole shebang anyways. So there's no
> reason to cache your data for longer than 24h. You should really note
> this in your docs, though.
We also save URLs in cassandra for statistics purposes (the most embedded
video, request counts, things like that), so purging is not
really an option for us, but I understand where you are going with
that. I'll add it to the docs.
> your app, anyone could send a »PURGE http://api.embed.ly/v1/api/...
> HTTP/1.1« request. This would enable automatic purging by providers.
> I'm pretty sure that that's not gonna happen though ;)
We have an internal purge, but I guess opening it up to the outside
world wouldn't be bad. I'm a little worried about abuse on that one,
but the benefits could outweigh my concerns.
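On the abuse concern, one common mitigation is to rate-limit PURGE per client. A fixed-window counter sketch; the window length, the limit, and keying by IP or API key are all assumptions:

```javascript
const WINDOW_MS = 60 * 1000;     // illustrative window
const MAX_PURGES_PER_WINDOW = 10; // illustrative limit
const windows = new Map();        // clientKey -> { start, count }

function allowPurge(clientKey, nowMs) {
  const w = windows.get(clientKey);
  if (!w || nowMs - w.start >= WINDOW_MS) {
    // First request, or the previous window has expired: start a new one.
    windows.set(clientKey, { start: nowMs, count: 1 });
    return true;
  }
  if (w.count >= MAX_PURGES_PER_WINDOW) return false;
  w.count += 1;
  return true;
}
```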
We will work on better support for ETags and If-Modified-Since headers as
well. I think that's beneficial.
Thanks for the input.
Sean
On Apr 6, 12:39 am, Sean Creeley <sean.cree...@gmail.com> wrote:
> There is not a high correlation in the number of requests and the
> probably of change. But for the the most requested content there is
> high correlation between stale content and anger.
Point taken.
> We save url in cassandra for statistics purposes as well. What's the
> most embedded video, requests things like that, so purging is not
> really an option for us, but i understand where you are going with
> that. I'll add it to the docs.
I assume that you don't save the number of hits and the JSON response
in the same tuple, so purging the response cache should be completely
independent of your statistical stuff. (You might want to count the
PURGEs, too?)
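The idea of keeping the two apart, sketched with two separate stores (plain `Map`s standing in for whatever cassandra column families are actually used); purging drops only the cached response and counts the purge itself:

```javascript
const responses = new Map(); // url -> cached JSON response
const stats = new Map();     // url -> { hits, purges }

function statsFor(url) {
  if (!stats.has(url)) stats.set(url, { hits: 0, purges: 0 });
  return stats.get(url);
}

function recordHit(url, json) {
  statsFor(url).hits += 1;
  if (!responses.has(url)) responses.set(url, json);
}

function purge(url) {
  responses.delete(url);     // only the cached response goes away
  statsFor(url).purges += 1; // and the purge itself is counted
}
```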
> We will work on better support for etags and if-modified headers as
> well. I think that's beneficial.
If you're talking about your client communication: yes, please.
If you're talking about communication with providers: why bother?
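For the client-facing side, a minimal ETag sketch: hash the JSON body into a strong ETag and answer 304 when the client's `If-None-Match` matches. It uses Node's built-in `crypto` module; `respond` is a hypothetical helper, and full `If-None-Match` parsing (weak validators, comma-separated lists, `*`) is deliberately skipped:

```javascript
const crypto = require("crypto");

// Strong ETag derived from the response body (quoted, as HTTP requires).
function etagFor(body) {
  return '"' + crypto.createHash("sha1").update(body).digest("hex") + '"';
}

function respond(body, ifNoneMatch) {
  const etag = etagFor(body);
  if (ifNoneMatch === etag) return { status: 304, headers: { ETag: etag } };
  return { status: 200, headers: { ETag: etag }, body };
}
```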
Best Regards,
Rod