How do you use online data sources?


Denis Defreyne

Feb 11, 2010, 11:45:02 AM
to na...@googlegroups.com
Hi,

I’ve got a few questions for you.

In a recent commit, I removed the online data sources that came with nanoc 3.0 (the Last.fm, Delicious and Twitter ones), since these data sources were mostly intended as examples and not for production use. There are three reasons why I removed these data sources:

1. They slow down the compilation process. During every compilation, the data source needs to request data from a remote location, and it can take a while before this request is completed. nanoc 3.0 uses a caching HTTP client, but the delay is noticeable nonetheless.

2. They prevent compilation if the remote location is inaccessible. A site isn’t compilable if data is missing. Pretending that the data source returned no items/layouts instead of generating an error is possible, but it would most likely make nanoc generate a broken site anyway, so that is not a good solution.

3. The data returned by the bundled data sources is incomplete. All three data sources could be modified to provide additional information; for example, Twitter could return users as items and Delicious could return tags as items as well. Which data is necessary depends on the use case; a data source that returns all possible information will also be needlessly slow.

A solution to these three problems is to make nanoc *not* use those data sources for fetching data from the web. Instead, data from online sources should be fetched using scripts or rake tasks; this would allow data to be fetched whenever you deem it necessary, eliminating issues 1 and 2. Issue 3 cannot easily be fixed, because it likely requires customized code.

(I’m sort-of regretting that I removed these data sources, because it breaks backwards compatibility. I’ll likely re-add these data sources before the final 3.1 release.)

My question to you: have you used online data sources? Do you use the ones that are bundled with nanoc 3? Have you written custom online data sources? Have you stumbled on the same issues as I have (any of the above three)? Would you like nanoc 3.1 to retain online data sources?

Thanks,

Denis

--
Denis Defreyne
denis.d...@stoneship.org

Alexander Mankuta

Feb 11, 2010, 12:19:39 PM
to na...@googlegroups.com
Short answers are...
No. No. No. Obviously, no. No.


--
Best regards,
Alex

John Schofield

Feb 11, 2010, 12:51:20 PM
to na...@googlegroups.com
I believe these data sources work by grabbing data from the live site and outputting it (according to rules, layouts, etc.) in the output folder.

That is not how I think it should work.

I'm interested in using this (for example) for backing up AND mirroring my Twitter account. So it should run (as you suggest) before the compile step, and should download Twitter messages to a location in the *content* folder, where Nanoc would handle them with nothing other than the existing page/item functionality.
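
Something along these lines is what I have in mind. Just a sketch, untested; the task name, paths and JSON fields are illustrative:

require 'json'
require 'open-uri'
require 'fileutils'

# Hypothetical pre-compile task: mirror recent tweets into content/,
# one file per tweet, so nanoc picks them up as ordinary items.
task :mirror_twitter do
  username = 'example_user'   # placeholder
  statuses = JSON.parse(
    open("http://twitter.com/statuses/user_timeline/#{username}.json").read
  )

  FileUtils.mkdir_p('content/tweets')
  statuses.each do |status|
    File.open("content/tweets/#{status['id']}.html", 'w') do |io|
      io.puts '---'
      io.puts "created_at: '#{status['created_at']}'"
      io.puts '---'
      io.puts status['text']
    end
  end
end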

I fully support the changes you're discussing.


John

---------------------------------------------------
John Mark Schofield
ro...@sudosu.net
jscho...@gmail.com
http://www.sudosu.net
http://blog.sudosu.net
(310) 751-0022



Seth Falcon

Feb 11, 2010, 6:19:25 PM
to na...@googlegroups.com
Hi Denis,

On 2/11/10 8:45 AM, Denis Defreyne wrote:
> My question to you: have you used online data sources? Do you use the
> ones that are bundled with nanoc 3? Have you written custom online
> data sources? Have you stumbled on the same issues as I have (any of
> the above three)? Would you like nanoc 3.1 to retain online data
> sources?

For a site I'm working on, I wrote a custom data source using the
delicious data source as an example. My data source fetches RSS from
GMANE and presents "recent activity of a mailing list".

Just this afternoon, I finally got around to adding data caching and
conditional GET, so that site compilation will not be slow and so that
I'm nicer to GMANE.
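
For the curious, the core of it is plain conditional GET with
net/http. A trimmed-down sketch; the cache paths are made up, and the
real code has more error handling:

require 'net/http'
require 'uri'

# Fetch a URL, sending the ETag from the previous fetch; on
# 304 Not Modified, reuse the locally cached copy.
def fetch_with_conditional_get(url, cache_path)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  etag_path = cache_path + '.etag'
  request['If-None-Match'] = File.read(etag_path) if File.exist?(etag_path)

  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  if response.is_a?(Net::HTTPNotModified)
    File.read(cache_path)   # remote unchanged; use the cached copy
  else
    File.open(cache_path, 'w') { |io| io.write(response.body) }
    File.open(etag_path, 'w') { |io| io.write(response['ETag']) } if response['ETag']
    response.body
  end
end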

While I didn't use the delicious data source directly, I found the
example useful when I was getting going with nanoc. I like being able
to "plug in" to nanoc instead of writing an ad-hoc rake task.

Especially for a nanoc newbie, it made accessing this sort of "online"
data easier. For example, it led me to the notion of using
Nanoc3::Item.new, gave me easy access to site config (not that this is
hard), and gave me some structure to help keep things modular. It also
helped convince me to use nanoc in the first place, as it gave me the
impression that nanoc would be easy to extend for my needs.
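
To give an idea of the shape such a data source takes, here is a
stripped-down sketch of mine; entries_from_feed stands in for the
actual RSS fetching and parsing:

class MailingListActivity < Nanoc3::DataSource
  identifier :mailing_list_activity

  def items
    # entries_from_feed is a placeholder for the real fetch/parse code.
    entries_from_feed.map do |entry|
      Nanoc3::Item.new(
        entry[:content],                  # item content
        { :title => entry[:title] },      # item attributes
        "/list-activity/#{entry[:id]}/"   # item identifier
      )
    end
  end
end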

I also like that the custom data source items are made available to
other parts of the nanoc compile process without writing content files.
Could a rake-based solution do that? This is useful because in my
case, I don't want to render the separate items (they get rendered on
the parent page) and writing actual files to content/ would force me to
add git ignore rules, etc.

Did I run into performance issues with site compilation? *YES*.
Especially when using the autocompiler, and some warnings in the docs
would have been helpful (a remote call is made for every autocompiler request).
[aside: I've found the autocompiler slow for other reasons and have
started using a file-watch type approach with nginx].

Issues 1 and 2 are not really solved just by using rake, but by making
the code smarter: caching, error handling.


Cheers,

+ seth

--
Seth Falcon | @sfalcon | http://userprimary.net/user

Nicky Peeters

Feb 12, 2010, 4:43:57 AM
to nanoc
Hi Denis, thanks for starting this discussion!

I use the Delicious, Last.fm and Twitter data sources to aggregate the
content of those sites on my blog/landing page in a deliberately
static manner. I own the data they retrieve and want it on my personal
site without having to resort to JavaScript widgets. This way the data
also gets picked up by spiders as a part of my domain.

I consider the availability of the remote sources at compile time an
integral dependency of my final static data, and one of the reasons
why I switched to a site compiler instead of a dynamic web
application. I consider the current implementation 'good enough' to do
what I want.

I agree with you that they might have to be a little more forgiving
when they could cause problems (downtime, stale data, caching, etc.)
so as not to disrupt compilation when you _really_ have to get your
site compiled and pushed. But I'm guessing there are other features
you personally consider incomplete but leave in nonetheless.

I'm not familiar enough with the code to suggest a way to do it, but a
simple skip-remote-datasources option could do the trick on a manual
compile operation when you're in a bind.
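
Roughly, I picture each remote data source checking a flag before
fetching anything. A hypothetical sketch; the config key and
environment variable are made up, and I don't know the internals:

def items
  # Escape hatch: return no items instead of fetching remotely.
  return [] if @config[:skip_remote] || ENV['NANOC_SKIP_REMOTE']

  fetch_items_from_remote   # placeholder for the real fetching code
end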

And people using an auto-compiled nanoc setup shouldn't complain about
remote datasources failing!

They should switch to a dynamic web application instead ;-)

Nicky

Denis Defreyne

Feb 14, 2010, 6:47:41 AM
to na...@googlegroups.com
Hi,

An idea I had in mind for solving issues 1 and 2 is to use a rake task that fetches the data when requested. Something like this:

require 'open-uri'
require 'fileutils'

task :fetch_twitter do
  # Configure
  username = 'ddfreyne'

  # Read
  data = open("http://twitter.com/statuses/user_timeline/#{username}.json").read

  # Write
  FileUtils.mkdir_p('tmp')
  File.open("tmp/twitter-#{username}.json", 'w') { |io| io.write(data) }
end

A “rake fetch_twitter” invocation would then fetch the data to tmp/twitter-ddfreyne.json. This may take a while (a few seconds here) and may even fail if Twitter is down (which is not unlikely, hehe), but at least the data from the previous “rake fetch_twitter” invocation remains available.

The Twitter data source would then have to be adjusted to read data from tmp/twitter-username.json. Something along the lines of http://pastie.org/pastes/824269 would do the trick (I haven’t tested that piece of code, so it may contain bugs). The same technique can be applied to the Last.fm, Delicious, … data sources.
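
In essence, the data source’s items method would boil down to something like this (a rough, equally untested sketch; attribute names and identifiers are illustrative):

require 'json'

class StaticTwitterDataSource < Nanoc3::DataSource
  identifier :static_twitter

  def items
    username = config[:username] || 'ddfreyne'
    filename = "tmp/twitter-#{username}.json"
    # No pre-fetched data yet? Then there are simply no items.
    return [] unless File.exist?(filename)

    statuses = JSON.parse(File.read(filename))
    statuses.map do |status|
      Nanoc3::Item.new(
        status['text'],                           # item content
        { :created_at => status['created_at'] },  # item attributes
        "/#{status['id']}/"                       # item identifier
      )
    end
  end
end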

This pre-fetching approach solves issues #1 and #2 pretty nicely, but not #3. I’m not sure #3 can ever be solved entirely, though: data formats change, new data is added, etc. In any case, compiling the site will be fast again, and the autocompiler will be fast as well (or at least *acceptably* slow). (As far as the autocompiler is concerned: I’m considering rewriting it to use a file-watcher approach that compiles the site as soon as any files have changed.)
