Any progress on allowing automated downloads of archived bookmarks?
cf. http://pinboard.in/faq/#download_archived
It seems like it would be really simple to implement by publishing a
page that has a list of links to all of a user's archived bookmarks,
then people could just run "wget -np -m" to grab a copy. (and
incremental updates would be fast + cheap for both sides)
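
As a rough illustration of the index page described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name render_index, the username, and the example archive paths are hypothetical, and nothing here reflects how Pinboard actually stores or serves archived copies.

```python
from html import escape


def render_index(username, archive_urls):
    """Return a minimal HTML page linking to each archived copy.

    `archive_urls` is a hypothetical list of URLs (or paths) for one
    user's archived bookmarks; the real storage layout is unknown.
    """
    items = "\n".join(
        # Escape the URL for both the attribute and the visible link text.
        f'<li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in archive_urls
    )
    return (
        "<!DOCTYPE html>\n"
        f"<title>Archived bookmarks for {escape(username)}</title>\n"
        f"<ul>\n{items}\n</ul>\n"
    )
```

A client could then point "wget -np -m" at wherever such a page is published. Mirroring turns on timestamping, so files that have not changed would not need to be transferred again.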
Alternatively, has anyone set up automated crawling/scraping of
their archived bookmarks?
--
Gerald Oskoboiny <ger...@impressive.net>
http://impressive.net/people/gerald/
> > Any progress on allowing automated downloads of archived bookmarks?
> > cf. http://pinboard.in/faq/#download_archived
> My plan is to let people queue downloads and get emailed a link when
> those are ready. You'll be able to request a full download, or just a
> tarball of stuff crawled since a certain date.
I would really like an implementation that makes it easy to
automate downloads, so I can set it up once and forget about it,
with no manual intervention needed.
> Eventually it will also be possible to have Pinboard regularly
> sync stuff to a Dropbox or S3 account of your choice.
That sounds better, though it would obligate me to pay for
storage elsewhere.
> > It seems like it would be really simple to implement by publishing a
> > page that has a list of links to all of a user's archived bookmarks,
> > then people could just run "wget -np -m" to grab a copy. (and
> > incremental updates would be fast + cheap for both sides)
> The problem with the approach you outline is that the archive servers
> are not set up for a high volume of read traffic, and will bog down
> quickly if people try to crawl their stuff.
I think you could handle those issues easily by adding a cache
and/or rate-limiting requests to archived content. I'd be happy
to chat about implementation details if that would help.
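
To make the rate-limiting suggestion concrete, here is a minimal token-bucket sketch in Python. It is only one possible approach, not a description of anything Pinboard runs; the class name TokenBucket and the parameters rate, capacity, and clock are hypothetical choices made for illustration.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if one more request may be served right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One bucket per account (or per client IP) in front of the archive servers would cap how fast any single crawler can read, and a request for which allow() returns False could be answered with HTTP 429 so that well-behaved clients back off.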
> > > It seems like it would be really simple to implement by publishing a
> > > page that has a list of links to all of a user's archived bookmarks,
> > > then people could just run "wget -np -m" to grab a copy. (and
> > > incremental updates would be fast + cheap for both sides)
>
> > The problem with the approach you outline is that the archive servers
> > are not set up for a high volume of read traffic, and will bog down
> > quickly if people try to crawl their stuff.
Another idea:
Allow archival account holders to request a one-time download, as
you do now, then publish a page for each user that has a list of
links to the most recent 24 hours of crawled content. Then people
could do a full download once, then run "wget -m" daily to keep
their copy up to date with minimal server load.
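
The "most recent 24 hours" page could be driven by a filter along these lines. This is a sketch under stated assumptions: the function name recent_crawls and the (url, crawled_at) record shape are hypothetical, and it assumes each crawl is stored with a timezone-aware timestamp.

```python
from datetime import datetime, timedelta, timezone


def recent_crawls(crawls, now=None, window=timedelta(hours=24)):
    """Return URLs of archived copies crawled within `window` of `now`.

    `crawls` is a hypothetical iterable of (url, crawled_at) pairs,
    where crawled_at is a timezone-aware datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - window
    # Keep items crawled at or after the cutoff, preserving input order.
    return [url for url, crawled_at in crawls if crawled_at >= cutoff]
```

Using a window somewhat longer than 24 hours would give a daily job some slack, so a late or skipped run is less likely to miss items.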