dropbox.js


Jan-Christoph Borchardt

Sep 2, 2012, 4:31:28 PM
to Unhosted
Dropbox just released dropbox.js
(https://github.com/dropbox/dropbox-js ), a client-side library for
the Dropbox API. For that to work, they also added CORS headers to
their API. It seems very similar to remoteStorage.js.

More info in the article: https://tech.dropbox.com/?p=345
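
For a rough idea of what client-side usage looks like, here is a minimal sketch based on my reading of their README (the app key/secret are placeholders, and the exact option and method names may vary between library versions):

  // Untested sketch: talking to the Dropbox API straight from the browser,
  // over CORS. Replace the placeholders with your app's credentials.
  var client = new Dropbox.Client({
    key: "YOUR_APP_KEY", secret: "YOUR_APP_SECRET",
    sandbox: true  // app-folder access only, if the app is configured that way
  });

  client.authenticate(function (error, client) {
    if (error) { console.log("auth failed: " + error); return; }

    client.writeFile("hello.txt", "Hello from the browser!", function (error, stat) {
      if (error) { console.log("write failed: " + error); return; }

      client.readFile("hello.txt", function (error, data) {
        console.log("read back: " + data);
      });
    });
  });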

Now that they have CORS headers, it would be interesting to integrate
it with remoteStorage.js, so that you can use your Dropbox account as
your remoteStorage. That would be part of the adoption ramp, or
»legacy support« for data silos. Then there could be a liberation app,
moving the data from your Dropbox to another remoteStorage.

Michiel de Jong

Sep 2, 2012, 4:59:44 PM
to unho...@googlegroups.com
On Sun, Sep 2, 2012 at 10:31 PM, Jan-Christoph Borchardt
<h...@jancborchardt.net> wrote:
> Dropbox just released dropbox.js
> (https://github.com/dropbox/dropbox-js ), a client-side library for
> the Dropbox API. For that to work, they also added CORS headers to
> their API. It seems very similar to remoteStorage.js.
>
> More info in the article: https://tech.dropbox.com/?p=345
>
> Now that they have CORS headers, it would be interesting to integrate
> it with remoteStorage.js, so that you can use your Dropbox account as
> your remoteStorage. That would be part of the adoption ramp, or
> »legacy support« for data silos.

that's great news!

> Then there could be a liberation app,

by liberator app i've always meant an app that sucks your data out of
existing web 2.0 platforms. is that what you mean?

> moving the data from your Dropbox to another remoteStorage.

that already exists, it's called Seven20. You can export data from one
remoteStorage account into a zip file, and then import that into
another account.

do keep in mind though that Dropbox.js will, as i understand it, get
its own shielded-off subfolder under apps/ on your dropbox account –
it won't be able to access any existing data on there, will it?


ciao,
Michiel

Ian Bicking

Sep 2, 2012, 11:40:53 PM
to unho...@googlegroups.com
On Sun, Sep 2, 2012 at 3:59 PM, Michiel de Jong <mic...@unhosted.org> wrote:
>> Then there could be a liberation app,

> by liberator app i've always meant an app that sucks your data out of
> existing web 2.0 platforms. is that what you mean?

As an aside to this, but on the topic of data liberation, I've been working on an experiment which might interest people on this list: https://github.com/ianb/seeitsaveit

I haven't really talked about it anywhere since it's not really "live" (the server component isn't properly deployed, for instance), but a lot of the pieces are working at this point.

Quick summary: it's a Firefox extension for running data-extraction scripts on a page you are viewing, with a centralized server for hosting those scripts (i.e., crowdsourcing the scraping process), and a development tool for creating those scripts.  Each extraction script produces typed data, and consuming services would advertise that they accept that type of data (ideally that would be handled by Web Intents/Activities, but right now I have what's essentially placeholder functionality with a central registry of apps).  Users ultimately just select the service to which they want to send the data from the current page.  And because it works in the browser, on the DOM (not over HTTP), and doesn't use APIs, it doesn't rely on special relationships between services that can be revoked.
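
To make that flow concrete, here's a toy sketch – not the actual script format or wire protocol, and the function names and URL are made up – just the shape of the idea:

  // 1. An extraction script runs against the DOM of the page you're viewing
  //    and returns typed data (here, type "product").
  function extractProduct(doc) {
    var title = doc.querySelector("h1");
    return {
      type: "product",
      url: doc.location.href,
      title: title ? title.textContent.trim() : null
    };
  }

  // 2. The result is handed to a consuming service that advertised it accepts
  //    "product" data (e.g. a wish list app the user picked from a menu).
  function sendToConsumer(consumerUrl, data) {
    var req = new XMLHttpRequest();
    req.open("POST", consumerUrl, true);
    req.setRequestHeader("Content-Type", "application/json");
    req.send(JSON.stringify(data));
  }

  sendToConsumer("https://wishlist.example/accept", extractProduct(document));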

I don't have many docs for getting started, and there's lots of incomplete things, so it's really only interesting right now to developers who are okay digging around.  But if anyone is interested, find me on IRC if you have questions about the code or approach.

  Ian

Michiel de Jong

Sep 3, 2012, 2:37:12 AM
to unho...@googlegroups.com
cool! how do you determine the data types out of the html blurb that
the script sees? from semantic markup maybe? it seems tricky at first
sight to extract reusable data from html pages?

when you say crowd-source the scraping process you mean develop the
scripts as free software, right? or do you also mean that people
scrape (publicly viewable) pages to the same public repository? in
that case it becomes something more like scraperwiki probably?


ciao,
Michiel

☮ elf Pavlik ☮

Sep 3, 2012, 2:45:00 AM
to Ian Bicking, unhosted
Excerpts from Ian Bicking's message of 2012-09-03 03:40:53 +0000:
do you know: https://scraperwiki.com ?
they've been active for quite a while now :)

Ian Bicking

Sep 3, 2012, 3:37:47 AM
to unho...@googlegroups.com
On Mon, Sep 3, 2012 at 1:37 AM, Michiel de Jong <mic...@unhosted.org> wrote:
> cool! how do you determine the data types out of the html blurb that
> the script sees? from semantic markup maybe?

The extraction script indicates what sites it works for and what kinds of data it produces.  Here's an example of getting product data from an Amazon page: https://github.com/ianb/seeitsaveit/blob/master/seeit-services/prefill-data/ianb%2540mozilla.com/www.amazon.com.js – the script indicates what domain it works for; it should (but doesn't) indicate what URLs it might work for (since it can extract product information, but can't extract information from a shopping cart page, for instance); and it indicates that it produces data of type "product".  Another service says it can accept "product" information (e.g., a wish list application); the user can then just say "add to wishlist", the extraction script runs, and the data is sent on to the wish list app.
 
> it seems tricky at first
> sight to extract reusable data from html pages?

Usually there's a CSS class or other indication of what you want to get from the page.  On something like Craigslist you have to use regular expressions, as there's practically no markup at all – but that's also an option, and it works okay.  It's a challenge to keep these scripts up to date, but I'm hoping that if the scripts are more widely useful, ongoing maintenance will be more feasible.  The app for developing new scripts is pretty important as well, and I think it makes that a lot easier.
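
As a toy illustration of the two approaches (the selector and pattern below are invented, not what the real scripts use):

  // Markup-rich page: lean on a CSS class or id.
  var titleEl = document.querySelector(".product-title");
  var title = titleEl ? titleEl.textContent.trim() : null;

  // Markup-poor page (the Craigslist case): fall back to a regular expression
  // over the visible text.
  var priceMatch = document.body.textContent.match(/\$\s*(\d+(?:\.\d{2})?)/);
  var price = priceMatch ? priceMatch[1] : null;

  console.log({ title: title, price: price });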
 

> when you say crowd-source the scraping process you mean develop the
> scripts as free software, right?

Yes; the data that is extracted is private to the user, but the script to extract that data is public.
 
> or do you also mean that people
> scrape (publicly viewable) pages to the same public repository? in
> that case it becomes something more like scraperwiki probably?

It's a bit different from scraperwiki, particularly in that it runs in the browser itself.  So the data comes from the authenticated page the user is on; e.g., I can scrape my Facebook news feed, which I happen to read at http://www.facebook.com – but if you went to the same page you'd of course get different content.  Intranet content is also accessible, as you aren't sending a URL off to another foreign service.  And dynamic pages work fine too – scripts get the DOM as it is displayed at that moment, not as it was first served up.

One thing scraperwiki is good at that I don't know what to do about is spidering sites.  For lots of fairly simple tasks you need to get more information than is on one page itself.  That could be as simple as needing to page through a long list of contacts.  I haven't figured out how to approach that problem.

