Search integration with static website?

Michael Below

Nov 1, 2011, 12:28:35 PM
to picky...@googlegroups.com
Hi,

I'm wondering if picky is the right tool for a website search
engine:

I am building the website in nanoc, so I have the data in
categories like title, body, tag. Ideally, the indexer should build the
index when nanoc generates the site (offline), while all that category
information is still available, and let me deploy the index to the server
together with the static website.

I imagine I can integrate a Sinatra server with the static pages to find
& display the results.

Do you think this makes sense?

Cheers
Michael

--
Michael Below <be...@judiz.de>

Niko Dittmann

Nov 1, 2011, 3:29:38 PM
to picky...@googlegroups.com
Hi Michael,

that's an interesting question. It seems to me there are two different answers.

1) The technical answer is: yes, that's easily possible, and it fits really well with the classical method of first generating an index and then loading it once when the Picky server starts. You'll also benefit greatly from Picky returning categorized results, and doing so fast.

2) The rather philosophical answer derives from the premise that you're using a static site engine for a good reason. The two most obvious ones would be: your hosting company only supports static HTML, or you have a really large amount of traffic (as in millions of page views per day). If the latter is the case, Picky would be a good match, as it can handle a lot of traffic, fast. In the former case you would lose the benefit of being able to just deploy static pages. Then you'd perhaps rather look for a solution like a Google site search or (if your index is small) a JavaScript-based search.

This gets me thinking... (perhaps you should stop reading here) could all responses for, let's say, *every single-word search* be generated with Picky and put into static files? How many files would that be? I hacked together a small script to download a Wikipedia page and tokenize it with one of Picky's tokenizers:

https://gist.github.com/1331581

In the first case the text has 5799 words, 768 of them unique. This results in 3816 substrings, which would mean as many separate static JSON documents to generate and deploy. I tried another Wikipedia text with 17778 words and 1911 uniques: 9000 substrings. Mind that the number of documents wouldn't matter, only the total number of words. And this certainly only works for single-word searches. It would be fun to play with larger text bodies.
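
A rough sketch of that kind of count, in plain Ruby rather than with one of Picky's tokenizers. The `prefix_stats` helper and its simple regex tokenizer are assumptions for illustration, and it counts only prefixes (substrings starting at the first character), which may differ from what the gist does:

```ruby
require 'set'

# Hypothetical helper: counts words, unique words and distinct prefixes of a text.
def prefix_stats(text)
  words = text.downcase.scan(/[[:alnum:]]+/)
  uniques = words.uniq
  prefixes = Set.new
  uniques.each do |word|
    # Every prefix of a word is one possible "search as you type" query.
    1.upto(word.length) { |len| prefixes << word[0, len] }
  end
  { words: words.size, uniques: uniques.size, prefixes: prefixes.size }
end

p prefix_stats("Picky picks pages quickly")
```

Each distinct prefix would correspond to one static JSON file to generate and deploy.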

OK. Crazy. Sorry for hijacking your thread for this sort of craziness.

Niko.

Picky

Nov 1, 2011, 6:30:05 PM
to picky...@googlegroups.com
Hi Michael,

It makes sense. I think it is one of the right tools, but I am wondering whether it is the best tool for the job.
That's why I still have a few questions:
- Where is the nanoc data? Are the categories only there while nanoc generates the site?
- What kind of search interface do you want? The Picky one, or one that simply shows a list of URLs together with a short description?

Cheers,
   Florian

As a fun exercise, and maybe as an inspiration I wrote a small Picky search that indexes all html pages that are in the same directory and returns a simple list of result URLs: 
(Thanks to Niko for the idea)

Michael Below

Nov 1, 2011, 6:43:07 PM
to picky...@googlegroups.com
Hi,
sounds good...
About the philosophical part: I started using nanoc because installing a "real" CMS with all the bells and whistles seemed like overkill for the project. Now the site has grown and some of those whistles are starting to look good... But nanoc is very flexible, so I am trying to extend my existing setup.

A JavaScript-based solution would probably be enough for my project, but I haven't found anything convincing. IIRC the Compass documentation builds a JSON-based index from within nanoc and searches it with JavaScript, but that solution is reaching its limits: the index is bloated with partial words etc. So I looked for a solution with a better indexer and found Picky...

Michael

Michael Below

Nov 1, 2011, 7:07:15 PM
to picky...@googlegroups.com
Hi Florian,

the nanoc data is in text files; they have a YAML header with title, date etc. and Markdown content. While the nanoc compiler runs, the content is represented as Ruby objects (each item has title, date etc.). I think that would be the right time to run the indexer (as pre- or postprocessing).

I have not really decided on the output question. I like the Picky way of showing the categories, limiting queries etc., but it's not really necessary. The website addresses the general public, not library research assistants, so many of those possibilities probably won't be used. My goal is more like this: the interface should be user-friendly, and search should distinguish between words in the main content or tags of a blog article and words in a tag cloud that happens to be on the same page.
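
The distinction described above is roughly what per-category indexes provide. A minimal plain-Ruby sketch (the `CategorizedIndex` class is hypothetical and far simpler than Picky itself) keeps one inverted index per category, so a hit in a title can be told apart from a hit in a tag or the body:

```ruby
# Hypothetical, much simplified stand-in for a categorized index; not Picky's implementation.
class CategorizedIndex
  def initialize(*categories)
    # One inverted index (word => page ids) per category.
    @indexes = {}
    categories.each { |category| @indexes[category] = Hash.new { |h, k| h[k] = [] } }
  end

  # fields maps a category name to the text belonging to that category.
  def add(id, fields)
    fields.each do |category, text|
      text.downcase.scan(/[[:alnum:]]+/).uniq.each do |word|
        @indexes.fetch(category)[word] << id
      end
    end
  end

  # Returns { category => [ids] } for every category in which the word occurs.
  def search(word)
    @indexes.each_with_object({}) do |(category, index), hits|
      ids = index.fetch(word.downcase, [])
      hits[category] = ids unless ids.empty?
    end
  end
end

index = CategorizedIndex.new(:title, :tag, :body)
index.add("/kosten/", { title: "Kosten", tag: "recht", body: "Was kostet ein Anwalt?" })
index.add("/impressum/", { title: "Impressum", tag: "kontakt", body: "Kosten entstehen keine." })
p index.search("kosten")
```

Text that is never passed to `add` (a tag cloud in the layout, say) simply never reaches any index.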

Michael

Picky / Florian Hanke

Nov 2, 2011, 3:06:55 AM
to picky...@googlegroups.com, Michael Below
Hi Michael,

That sounds good. Regarding the data: Can you hook into the compilation somehow?

You could define your index in a separate file, and store it in a constant, like so:

PagesIndex = Picky::Index.new(:pages) do
  source { nanoc_items }
  category :title
  category :body
  # ...
end

Then run the indexing process when nanoc is compiling. One way to do that is this:

require File.expand_path('../path/to/your/index/definition', __FILE__)
PagesIndex.source { your_nanoc_items }
PagesIndex.index

Then, start up your Picky server, where you also use this index definition:

require File.expand_path('../path/to/your/index/definition', __FILE__)
pages_search = Picky::Search.new PagesIndex

I do not know how well you know Ruby, so this is just a bare-bones skeleton to give you an idea of how you could do it.
Note that Picky is less of a one-size-fits-all solution that you can just throw in; in return, it is easy to see how to change just about everything.

Regarding the interface:
It might be a good idea to just use simple JavaScript and do a JSON request to the Picky server itself. It would then display the results in a very simple, clickable way. Again, I do not know how well versed you are in JavaScript.
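
Whatever ends up sending the request, it boils down to a GET against the search URL. A small Ruby sketch of building that URL, assuming the server reads the search text from a `query` parameter (an assumption to check against your own app.rb) and that the search is mounted at `/pages`; `search_url` is a hypothetical helper:

```ruby
require 'uri'

# Hypothetical helper: builds the URL a search form or script would request.
# The "query" parameter name is an assumption about how the server is set up.
def search_url(base, text)
  uri = URI(base)
  uri.query = URI.encode_www_form({ query: text })
  uri.to_s
end

puts search_url("http://localhost:4567/pages", "kosten anwalt")
```

The server's JSON answer can then be parsed on the client and rendered as a list of links.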

All the best,
   Florian

Picky / Florian Hanke

Nov 2, 2011, 3:10:38 AM
to picky...@googlegroups.com, Michael Below
Correction:
PagesIndex = Picky::Index.new(:pages) do
  source { } # <- This source would be empty, as we do not have the nanoc items yet.
  category :title

Michael Below

Nov 2, 2011, 8:50:11 AM
to picky...@googlegroups.com
Hi Florian,

On Wednesday, 02.11.2011, at 00:06 -0700, Picky / Florian Hanke wrote:

> That sounds good. Regarding the data: Can you hook into the compilation
> somehow?

Denis, the author of nanoc, advised me on IRC to do this via a Rake
file. That way I don't have to run the indexer on every recompile of the
site. That would be something like:

site = Nanoc3::Site.new('.')

PagesIndex = Picky::Index.new(:pages) do
  source { site.items }
  category :title
  category :tag
  category :description
  # ...
end

This sounds like a good idea, will try it... I will report back as soon
as I get that far.

> I do not know how well you know Ruby.

My Ruby knowledge is mostly how to plug some elements together --
luckily, Ruby seems to be good for this approach, there are a lot of
building blocks I can use... :-)

> Regarding the interface:
> It might be a good idea to just use simple javascript and do a JSON request
> to the Picky server itself. Then, it would display the results in a very
> simple way, clickable. Again, I do not know how well you are versed in
> Javascript.

Not very much, I still have to get to the stage to plug something
together that makes sense... But probably that means I should learn it
some time.

Picky / Florian Hanke

Nov 2, 2011, 4:53:19 PM
to picky...@googlegroups.com
Hi Michael,

On Wednesday, 2 November 2011 23:50:11 UTC+11, below wrote:
> Hi Florian,
>
> On Wednesday, 02.11.2011, at 00:06 -0700, Picky / Florian Hanke wrote:
>
> > That sounds good. Regarding the data: Can you hook into the compilation
> > somehow?
>
> Denis, the author of nanoc, advised me on IRC to do this via a Rake
> file. That way I don't have to run the indexer on every recompile of the
> site. That would be something like:
>
> site = Nanoc3::Site.new('.')
> PagesIndex = Picky::Index.new(:pages) do
>   source { site.items }
>   category :title
>   category :tag
>   category :description
>   # ...
> end
>
> This sounds like a good idea, will try it... I will report back as soon
> as I get that far.

That is perfect, and also a good idea. Good luck!

> > I do not know how well you know Ruby.
>
> My Ruby knowledge is mostly how to plug some elements together --
> luckily, Ruby seems to be good for this approach, there are a lot of
> building blocks I can use... :-)

Yes indeed :)
 

> > Regarding the interface:
> > It might be a good idea to just use simple javascript and do a JSON request
> > to the Picky server itself. Then, it would display the results in a very
> > simple way, clickable. Again, I do not know how well you are versed in
> > Javascript.
>
> Not very much, I still have to get to the stage to plug something
> together that makes sense... But probably that means I should learn it
> some time.

Alright. It's probably a good idea to do the first step first (get a running server with good results) and then think about the interface.
Btw, did you know that you can search a server directly from the terminal? "picky search http://localhost:4567/pages" Also see
http://florianhanke.com/blog/2011/04/11/searching-with-picky-rake-search.html
(I need to add that you might need to give the whole URL, not just the path.)

Cheers,
   Florian

Michael Below

Nov 7, 2011, 11:07:08 AM
to picky...@googlegroups.com
Hi,

On Wednesday, 02.11.2011, at 13:53 -0700, Picky / Florian Hanke wrote:

> > site = Nanoc3::Site.new('.')
> > PagesIndex = Picky::Index.new(:pages) do
> > source { site.items }
> > category :title
> > category :tag
> > category :description
> > # ...
> > end
> >
> > This sounds like a good idea, will try it... I will report back as soon
> > as I got that far.
> >
> That is perfect, and also a good idea. Good luck!

I didn't get too far: I have added the above to app.rb (in the
all_in_one config). When I try to build an index with rake, it throws an
error because #id is no longer defined. The friendly people on
#ruby-lang are telling me: "Use #object_id, if you really must"

$ rake index
Loaded picky with environment 'development' in /home/mbelow/html/judiz
on Ruby 1.9.3.
:public is no longer used to avoid overloading Module#public,
use :public_folder instead
from /home/mbelow/html/judiz/app.rb:55:in `<class:CommentSearch>'
Application loaded.
16:27:18: Indexing using 4 processors, in random order.
16:27:23: "development:pages": Starting parallel data preparation.
/home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexers/parallel.rb:41:in `block in process': undefined method `id' for <Nanoc3::Item:0x12209ec identifier=/stylesheet/ binary?=false>:Nanoc3::Item (NoMethodError)

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexers/parallel.rb:40:in `each'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexers/parallel.rb:40:in `process'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexers/base.rb:23:in `index'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/index_indexing.rb:78:in `index_in_parallel'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/index_indexing.rb:27:in `index'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:53:in `call'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:53:in `block (2 levels) in forked'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:51:in `fork'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:51:in `block in forked'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:41:in `loop'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:41:in `forked'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexes_indexing.rb:30:in `index'
from (__DELEGATION__):2:in `index'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/tasks/index.rake:10:in `block in <top (required)>'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `call'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `block in execute'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `each'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `execute'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain'

from /home/mbelow/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:151:in `invoke_with_call_chain'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:144:in `invoke'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:116:in `invoke_task'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block (2 levels) in top_level'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `each'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block in top_level'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:88:in `top_level'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:66:in `block in run'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:63:in `run'

from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/bin/rake:33:in `<top (required)>'
from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/bin/rake:19:in `load'
from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/bin/rake:19:in `<main>'
16:27:23: Indexing finished.

> Alright. It's probably a good idea to do the first step first (get a
> running server with good results) and then think about the interface.

I have been thinking about something like this: I could probably add a
simple search form to my site template that forwards the user to
something like your search page. If I understand things right, that
all-in-one solution takes one server process, just like a backend server
listening for JavaScript JSON requests would, right?

> Btw, did you know that you can search a server directly from the terminal?
> "picky search http://localhost:4567/pages" Also see
> http://florianhanke.com/blog/2011/04/11/searching-with-picky-rake-search.html
> (I need to add that you might need to define the whole URL, not just the
> path)

Sounds useful, will try that...

Michael Below

Nov 7, 2011, 12:14:22 PM
to picky...@googlegroups.com
On Monday, 07.11.2011, at 17:07 +0100, Michael Below wrote:

> I didn't get too far: I have added the above to app.rb (in the
> all_in_one config). When I try to build an index with rake, it throws an
> error because #id is no longer defined. The friendly people on
> #ruby-lang are telling me: "Use #object_id, if you really must"

Further idea: maybe it's better to build an index based on the canonical
URL for a page, i.e. url_for(item), instead of runtime IDs? I don't see
how Picky would record that ID 4711 is actually
http://do.main.com/impressum/index.html

I guess usually there is a content server running that knows the IDs,
but does this work with static pages? (Or maybe I am missing something?)

Picky / Florian Hanke

Nov 8, 2011, 5:45:52 AM
to picky...@googlegroups.com
Hi Michael,


On Tuesday, 8 November 2011 03:07:08 UTC+11, below wrote:
> Hi,
>
> On Wednesday, 02.11.2011, at 13:53 -0700, Picky / Florian Hanke wrote:
>
> > > site = Nanoc3::Site.new('.')
> > > PagesIndex = Picky::Index.new(:pages) do
> > >    source { site.items }
> > >    category :title
> > >    category :tag
> > >    category :description
> > >    # ...
> > >  end
> > >
> > > This sounds like a good idea, will try it... I will report back as soon
> > > as I got that far.
> > >
> > That is perfect, and also a good idea. Good luck!
>
> I didn't get too far: I have added the above to app.rb (in the
> all_in_one config). When I try to build an index with rake, it throws an
> error because #id is no longer defined. The friendly people on
> #ruby-lang are telling me: "Use #object_id, if you really must"

Picky uses the id method to identify/index the objects. That depends on what you want it to be identified with.
Maybe the url is a good idea?

If yes, you can extend the Nanoc Items class (I have no idea what it is called, I'm sorry) like this, for example:
module Nanoc
   class Item
      def id
         url
      end
   end
end

Before indexing, Picky will load this and the Nanoc class will automatically return the url as its id.
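
A self-contained sketch of that idea. The error message earlier in the thread names the class `Nanoc3::Item` and shows an `identifier` attribute, so this stand-in uses those; `FakeNanoc3` is a hypothetical stub so the snippet runs without nanoc installed, and in the real app you would reopen `Nanoc3::Item` itself:

```ruby
# Hypothetical stand-in for Nanoc3::Item, only so this runs without nanoc.
module FakeNanoc3
  class Item
    attr_reader :identifier

    def initialize(identifier)
      @identifier = identifier
    end
  end
end

# Reopen the class and add the #id method that the indexer calls on each object.
module FakeNanoc3
  class Item
    def id
      identifier
    end
  end
end

item = FakeNanoc3::Item.new("/impressum/")
puts item.id
```

Any value that uniquely names the page (identifier, path or full URL) would do, as long as the search front end can turn it back into a link.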

Please tell me when you need more detailed help. I am happy to provide it.
 

> > Alright. It's probably a good idea to do the first step first (get a
> > running server with good results) and then think about the interface.
>
> I have been thinking about something like this: I could probably add a
> simple search form to my site template that forwards the user to
> something like your search page. If I understand things right, that
> all-in-one solution takes one server process, just like a backend server
> listening for JavaScript JSON requests would, right?

Yes. To try it you could quickly do a
picky generate all_in_one testsearch
to see how it works.
 

> > Btw, did you know that you can search a server directly from the terminal?
> > "picky search http://localhost:4567/pages" Also see
> > http://florianhanke.com/blog/2011/04/11/searching-with-picky-rake-search.html
> > (I need to add that you might need to define the whole URL, not just the
> > path)
>
> Sounds useful, will try that...

Cheers,
   Florian 

Picky / Florian Hanke

Nov 8, 2011, 5:49:06 AM
to picky...@googlegroups.com
Usually there is, but with the all_in_one server both are contained in one. It's best to just generate one as described in the last post and look at what it does.

With static pages I think it's best to use the url as the id.

Don't hesitate to ask for more details if you need it.

Cheers,
   Florian

Michael Below

Nov 8, 2011, 9:22:50 AM
to picky...@googlegroups.com
On Tuesday, 08.11.2011, at 02:45 -0800, Picky / Florian Hanke wrote:

> If yes, you can extend the Nanoc Items class (I have no idea what it is
> called, I'm sorry) like this, for example:
> module Nanoc
> class Item
> def id
> url
> end
> end
> end
>
> Before indexing, Picky will load this and the Nanoc class will
> automatically return the url as its id.

I tried something like that using the path. But somehow it looks like
picky stores a 0 instead of the path string, the json files look like:

{"kosten":[0],"seite":[0],"nicht":[0],...

Any idea why?

Florian Hanke

Nov 8, 2011, 9:28:54 AM
to picky...@googlegroups.com
Yes. Picky does not know the id type. You can tell it to treat the ids as symbols by setting
  key_format :to_sym
inside the index definition.
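
The zeros are what Ruby's `String#to_i` returns for a string that does not start with digits, so they are consistent with the ids being converted to integers by default (an assumption about the default, inferred from the output above). A quick illustration with made-up paths; `convert_ids` is a hypothetical helper that only mimics applying a key format:

```ruby
# Hypothetical helper: applies a conversion method, given as a symbol, to every id.
def convert_ids(ids, format)
  ids.map(&format)
end

ids = ["/kosten/", "/impressum/", "42"]

p convert_ids(ids, :to_i)   # path strings collapse to 0, only "42" survives
p convert_ids(ids, :to_sym) # every path stays distinguishable
```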

Cheers!

(Message from mobile, hence short)

Michael Below

Nov 8, 2011, 3:10:47 PM
to picky...@googlegroups.com
On Wednesday, 09.11.2011, at 01:28 +1100, Florian Hanke wrote:
> Yes. Picky does not know the id type - you can tell it that it should assume it's symbols by setting
> key_format :to_sym
> inside the index definition.

Yes, that does it. Nice, indexing works!

Now I am also indexing the item content (before layout, i.e. just the
article text), and I have noticed that words are indexed every time they
appear: some words have three or four entries for the same item, and
maybe one more for another item.

I guess that can make sense if the results are weighted, like "this is
90% relevant" - "this is 30% relevant". Does Picky do that? If this is
more of an unintended consequence of my strange use case, should I
try to "clean" the index somehow?

Picky / Florian Hanke

Nov 8, 2011, 6:52:27 PM
to picky...@googlegroups.com
On Wednesday, 9 November 2011 07:10:47 UTC+11, below wrote:
> On Wednesday, 09.11.2011, at 01:28 +1100, Florian Hanke wrote:
> > Yes. Picky does not know the id type - you can tell it that it should assume it's symbols by setting
> >   key_format :to_sym
> > inside the index definition.
>
> Yes, that does it. Nice, indexing works!

Good to hear!

> Now I am also indexing the item content (before layout, i.e. just the
> article text), and I have noticed that words are indexed every time they
> appear: some words have three or four entries for the same item, and
> maybe one more for another item.
>
> I guess that can make sense if the results are weighted, like "this is
> 90% relevant" - "this is 30% relevant". Does Picky do that? If this is
> more of an unintended consequence of my strange use case, should I
> try to "clean" the index somehow?

I am not perfectly sure what you mean. Did you look at the indexes and see that one word references the same id multiple times, like so:
:word => [1, 1, 1, 3, 1]
Or something like that?

Can you give us an example if this isn't it?

Cheers,
   Florian

Michael Below

Nov 9, 2011, 4:56:56 AM
to picky...@googlegroups.com
Hi,

On Tuesday, 08.11.2011, at 15:52 -0800, Picky / Florian Hanke wrote:

> > Now I am also indexing the item content (before layout, i.e. just the
> > article text), and i have noticed that words are indexed every time they
> > appear: some words have three or four entries for the same item, and
> > maybe one more for another item.
> >
> > I guess that can make sense if the results are weighted, like "this is
> > 90% relevant" - "this is 30% relevant". Does picky do that? If this is
> > more like a unintended consequence from my strange use case, should I
> > try to "clean" the index somehow?
> >
> I am not perfectly sure what you mean. Did you look at the indexes and see
> that one word references the same id multiple times, like so:
> :word => [1, 1, 1, 3, 1]
> Or something like that?

Yes, the JSON file body_exact_inverted.memory.json contains entries
like: "der":["page1","page1","page1","page1","page2","page2","page3"]

Now this tells me that I should tweak the list of stop words, but it
also makes me wonder if this shouldn't be:
"der":["page1","page2","page3"]

Best
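
To make the expectation concrete, here is a minimal plain-Ruby sketch of an inverted index that lists each page at most once per word. `build_inverted_index` is a hypothetical helper with a simple regex tokenizer, not Picky's indexer, and the page texts are made up:

```ruby
# Hypothetical helper, not Picky's indexer: maps each word to the ids of pages containing it.
def build_inverted_index(pages)
  index = Hash.new { |hash, word| hash[word] = [] }
  pages.each do |id, text|
    # uniq ensures a page is listed once per word, however often the word occurs.
    text.downcase.scan(/[[:alnum:]]+/).uniq.each { |word| index[word] << id }
  end
  index
end

pages = {
  "page1" => "der Anwalt und der Mandant und der Richter",
  "page2" => "der Vertrag, der gilt",
  "page3" => "der Termin"
}
p build_inverted_index(pages)["der"]
```

Dropping the `uniq` call reproduces the repeated ids shown above.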

Florian Hanke

Nov 9, 2011, 7:51:45 AM
to picky...@googlegroups.com
Hello again :)

You are absolutely right on both counts. I am wondering what's happening here. How do you index? Using a source and "rake index"?

Feel free to post your app.rb so we can try to reproduce the problem (or send it to my email address if it is too public for you).

Thanks for your perseverance!
Florian

Michael Below

Nov 9, 2011, 10:18:08 AM
to picky...@googlegroups.com
On Wednesday, 09.11.2011, at 23:51 +1100, Florian Hanke wrote:

> > Yes, the JSON file body_exact_inverted.memory.json contains entries
> > like: "der":["page1","page1","page1","page1","page2","page2","page3"]
> >
> > Now this tells me that I should tweak the list of stop words, but it
> > also makes me wonder if this shouldn't be:
> > "der":["page1","page2","page3"]
>
> You are absolutely right on both accounts. I am wondering what's happening here. How do you index? Using a source and "rake index"?
>
> Feel free to post your app.rb so we can try to reproduce the problem (or send it to my email address if it is too public for you).

I am attaching the app.rb (the search part doesn't work yet). The
interesting bit is probably how I get the body content: the text is in
Markdown files. Those are processed through ERB and RDiscount
(compiled_content), but no layout is added (that is why it's called
the :pre snapshot). I am running that through Nokogiri to retrieve the
text.

Cheers

app.rb

Picky / Florian Hanke

Nov 10, 2011, 2:35:18 AM
to picky...@googlegroups.com
Hi Michael,


On Thursday, 10 November 2011 02:18:08 UTC+11, below wrote:
> On Wednesday, 09.11.2011, at 23:51 +1100, Florian Hanke wrote:
>
> > You are absolutely right on both counts. I am wondering what's happening here. How do you index? Using a source and "rake index"?
> >
> > Feel free to post your app.rb so we can try to reproduce the problem (or send it to my email address if it is too public for you).
>
> I am attaching the app.rb (the search part doesn't work yet).
>
> The interesting bit is probably how I get the body content: the text is in
> Markdown files. Those are processed through ERB and RDiscount
> (compiled_content), but no layout is added (therefore it's called
> the :pre snapshot). I am running that through Nokogiri to retrieve the
> text.

It's very interesting indeed how you are getting the data. Kudos!

I was able to reproduce the problem and am now fixing it.
The interesting thing is that the problem does not show up in the search results. That is probably why nobody noticed it.
I probably introduced the error a few versions back and am adding a regression test for it.
Thanks!

Please update to 3.4.2 in half an hour.

Cheers

Picky / Florian Hanke

Nov 10, 2011, 3:19:14 AM
to picky...@googlegroups.com
P.S: Or better, 3.4.3.

Michael Below

Nov 10, 2011, 8:08:56 AM
to picky...@googlegroups.com
Hi again,

On Thursday, 10.11.2011, at 00:19 -0800, Picky / Florian Hanke wrote:
> P.S: Or better, 3.4.3.

Should we take this off-list? Maybe this thread is getting a bit long for a
public mailing list...

Anyway, I installed 3.4.3 (instead of 3.4.0) and now I have an
interesting new problem:

13:55:54: Indexing using 4 processors, in random order.
13:55:54: "development:pages": Starting parallel data preparation.
/home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/nanoc3-3.2.3/lib/nanoc3/base/result_data/item_rep.rb:243:in `compiled_content': The current item cannot be compiled yet because of an unmet dependency on the “/kosten/” item (rep “default”). (Nanoc3::Errors::UnmetDependency)

This is the same error I had when I didn't call the site.compile method
before the index definitions: nanoc can't output the first item because
it isn't compiled yet.

I don't understand how this error is coming back now, when I am
explicitly calling the compile method. It looks like the body method is
used before the compile is done. Any ideas? Is there a place to get the
site compilation going earlier?

(Wild guess: maybe this is somehow caused by the 4 parallel indexing
threads? Can I tell Picky to go parallel only after the compile is
done?)

Picky / Florian Hanke

Nov 10, 2011, 8:26:08 AM
to picky...@googlegroups.com
Hi Michael,


On Friday, 11 November 2011 00:08:56 UTC+11, below wrote:
> Hi again,
>
> On Thursday, 10.11.2011, at 00:19 -0800, Picky / Florian Hanke wrote:
> > P.S: Or better, 3.4.3.
>
> Should we take this off-list? Maybe this thread is getting a bit long for a
> public mailing list...

It's fine for me. The list needs a bit of life ;)
 

> Anyway, I installed 3.4.3 (instead of 3.4.0) and now I have an
> interesting new problem:
>
> 13:55:54: Indexing using 4 processors, in random order.
> 13:55:54: "development:pages": Starting parallel data preparation.
> /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/nanoc3-3.2.3/lib/nanoc3/base/result_data/item_rep.rb:243:in `compiled_content': The current item cannot be compiled yet because of an unmet dependency on the “/kosten/” item (rep “default”). (Nanoc3::Errors::UnmetDependency)
>
> This is the same error I had when I didn't call the site.compile method
> before the index definitions: nanoc can't output the first item because
> it isn't compiled yet.
>
> I don't understand how this error is coming back now, when I am
> explicitly calling the compile method. It looks like the body method is
> used before the compile is done. Any ideas? Is there a place to get the
> site compilation going earlier?
>
> (Wild guess: maybe this is caused by the 4 parallel indexing threads
> somehow? can I tell picky to get parallel only after the compile is
> done?)

It is surprising that it didn't occur in 3.4.0, as nothing groundbreaking has been changed. However, that might just have been luck.

If it is the parallel indexing (in separate processes through forking, not threads), and assuming that site.compile returns before it is finished (if it didn't, the class would not finish loading until the site was compiled, and the forks would only be made later), then maybe a simple sleep X after the site.compile would help.
That would just be for testing whether this is the problem, though.
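
For what it is worth, a forked child starts with a copy of the parent's memory, so anything finished before the fork is visible in the child. A minimal sketch of that general behaviour on a Unix-like system (plain Ruby, not Picky's forking code; the `compiled` hash merely stands in for a compiled site and `read_in_child` is a hypothetical helper):

```ruby
# Runs the given block in a forked child process and returns what the child wrote back.
def read_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield) # evaluated in the child, using the memory copied at fork time
    writer.close
  end
  writer.close
  result = reader.read
  Process.wait(pid)
  reader.close
  result
end

compiled = { "/kosten/" => "compiled text" } # finished before forking
puts read_in_child { compiled["/kosten/"] }
```

So if the compile really is complete before the fork, each indexing process should see the compiled items.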

Another idea would be to specifically start indexing an index or even category. This does not use multiple processes to index.
Call it as follows:
rake index[pages]
or
rake index[pages,title] && rake index[pages,tags] etc.

I hope that helps! Cheers

Michael Below

Nov 10, 2011, 10:23:27 AM
to picky...@googlegroups.com
Hi Florian,

On Thursday, 10.11.2011, at 05:26 -0800, Picky / Florian Hanke wrote:

> It is surprising that it didn't occur in 3.4.0, as nothing groundbreaking
> has been changed. However, that might just have been luck.

Probably. I went back to 3.4.0, but the error is still there. There was
also an update for yajl when I ran gem update this morning, but I don't
think this is related...

> If it is the parallel indexing (in separate processes through forking, not
> threads), and assuming that site.compile does return before it is finished
> (if it wouldn't, the class would not finish loading until it was compiled
> and the forks would only be made later) – then maybe a simple sleep X after
> the site.compile is of help.

No, this doesn't help... I tried up to 60 seconds of sleep, and there is
a notable pause, but the same error.

> rake index[pages,title] && rake index[pages,tags] etc.

For the title and the tags it works fine, but for the body I am still
getting that error.

** Execute index
16:21:53: "development:pages": Starting parallel data preparation.
rake aborted!


The current item cannot be compiled yet because of an unmet dependency
on the “/kosten/” item (rep “default”).

Best

Picky / Florian Hanke

Nov 10, 2011, 7:36:43 PM
to picky...@googlegroups.com
Hi Michael,


On Friday, 11 November 2011 02:23:27 UTC+11, below wrote:
> Hi Florian,
>
> On Thursday, 10.11.2011, at 05:26 -0800, Picky / Florian Hanke wrote:
>
> > It is surprising that it didn't occur in 3.4.0, as nothing groundbreaking
> > has been changed. However, that might just have been luck.
>
> Probably. I went back to 3.4.0, but the error is still there. There was
> also an update for yajl when I ran gem update this morning, but I don't
> think this is related...

I don't think so either. It's good to know it occurs with 3.4.0 as well.

So: Does it sometimes occur and sometimes not, or is it all the time now?
 

> > If it is the parallel indexing (in separate processes through forking, not
> > threads), and assuming that site.compile does return before it is finished
> > (if it wouldn't, the class would not finish loading until it was compiled
> > and the forks would only be made later), then maybe a simple sleep X after
> > the site.compile is of help.
>
> No, this doesn't help... I tried up to 60 seconds of sleep, and there is
> a notable pause, but the same error.

Good to know. So I think Nanoc compiles synchronously, i.e. compilation is finished by the time site.compile returns.
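To illustrate why a sleep should make no difference if the compile step is synchronous, here is a minimal sketch in plain Ruby (no Picky or Nanoc involved, and the `state` hash is only a stand-in for the compiled site). A forked child gets a copy of the parent's memory as it was at the moment of the fork, so anything the parent finished beforehand is already visible to the child. It assumes a platform where `fork` is available.

```ruby
# Sketch: a forked child sees the parent's memory as it was at fork time.
# Requires a platform with fork (not Windows). `state` is a hypothetical
# stand-in for the compiled site, not a Picky or Nanoc object.
state = { compiled: false }
state[:compiled] = true # stands in for a synchronous compile step

reader, writer = IO.pipe
pid = fork do
  reader.close
  writer.puts state[:compiled] # the child reads its own copy of the state
  writer.close
end

writer.close
child_saw = reader.read.strip # what the child reported back
reader.close
Process.wait(pid)
```

If the state were only set some time after the fork, the child would never see it, no matter how long the parent sleeps.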
 

> rake index[pages,title] && rake index[pages,tags] etc.

For the title and the tags it works fine, but for the body I am still
getting that error.

** Execute index
16:21:53: "development:pages": Starting parallel data preparation.
rake aborted!
The current item cannot be compiled yet because of an unmet dependency
on the “/kosten/” item (rep “default”).

I don't know Nanoc very well. When a Nanoc item cannot meet a dependency, it pushes the item to the back of its compilation queue and continues. If it then reaches the end, and the item still cannot meet dependencies, it will raise this error.
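The queue behaviour described above can be sketched roughly like this. It is a toy model only, not Nanoc's actual compiler code, and the names `compile_all` and `UnmetDependency` are made up for illustration.

```ruby
# Toy model of a compile queue that defers items with unmet dependencies.
# All names here are hypothetical; this is not Nanoc's implementation.
class UnmetDependency < StandardError; end

# deps maps an item identifier to the identifiers it depends on.
def compile_all(deps)
  queue    = deps.keys
  compiled = []
  deferred = 0 # consecutive deferrals since the last successful compile

  until queue.empty?
    item = queue.shift
    if (deps[item] - compiled).empty?
      compiled << item
      deferred = 0
    else
      # Push the item to the back of the queue and carry on.
      queue.push(item)
      deferred += 1
      # Every remaining item was deferred in a row: no progress is possible.
      if deferred >= queue.size
        raise UnmetDependency, "unmet dependency for #{item}"
      end
    end
  end

  compiled
end
```

For example, `compile_all({ "/" => ["/kosten/"], "/kosten/" => [] })` defers "/" once and then compiles both items, whereas two items that depend on each other end in the error.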

I don't think this is a Picky problem (but let's try to test this assumption later). It just occurs at a time when Picky tries to access the site.items (in the source block).

Perhaps following this helps?

Can you maybe just run this script?
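  require 'nanoc3'
  # Load and compile the site in the current directory.
  site = Nanoc3::Site.new('.')
  site.compile
  # Touch each item's body (skipping the stylesheet), as the Picky source block would.
  site.items.reject { |item| item.identifier == "/stylesheet/" }.each { |item| item.body[1..10] }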

This basically simulates what Picky does, but without Picky. So if this runs into problems, we have to look within Nanoc. If it doesn't, we have to keep looking.

Could it be that you updated the site? Did you already try to compile in the usual Nanoc way? (Using a rake task, I assume)

Cheers and much success,
   Florian

Michael Below

Nov 11, 2011, 6:34:06 AM11/11/11
to picky...@googlegroups.com
Hi,

Am Donnerstag, den 10.11.2011, 16:36 -0800 schrieb Picky / Florian
Hanke:

> So: Does it sometimes occur and sometimes not, or is it all the time now?

Yes, it's all the time now... And all I remember doing in between on
that project was turning off the machine, turning it on again and doing
a gem update...

> I don't know Nanoc very well. When a Nanoc item cannot meet a dependency,
> it pushes the item to the back of its compilation queue and continues. If
> it then reaches the end, and the item still cannot meet dependencies, it
> will raise this error.

No, that error isn't produced during compilation, it happens in the
method that accesses the compiled content after compilation, see
http://nanoc.stoneship.org/docs/api/3.2/Nanoc3/Item.html#compiled_content-instance_method

In the source snippet there, it looks like nanoc calls a check
"compiled?" before returning the compiled content for an item. That
makes sense, but somehow this test seems to fail.
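As a rough picture of how such a guard behaves, here is a toy model. It is not Nanoc's source, and `ToyRep`, `forget!` and `UnmetDependencyError` are hypothetical names. If the "compiled" flag is never set or gets reset after compilation, reading the content fails even though the content itself is there.

```ruby
# Toy model of a "compiled?" guard; hypothetical names, not Nanoc's source.
class UnmetDependencyError < StandardError; end

class ToyRep
  def initialize(content)
    @content  = content
    @compiled = false
  end

  def compiled?
    @compiled
  end

  def compile!
    @compiled = true
  end

  # Simulates the compiled state being lost after the compile run.
  def forget!
    @compiled = false
  end

  # Only hands out the content while the rep is marked as compiled.
  def compiled_content
    raise UnmetDependencyError, "cannot be compiled yet" unless compiled?
    @content
  end
end
```

Calling `compiled_content` works right after `compile!`, but raises again once `forget!` has been called.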

> I don't think this is a Picky problem (but let's try to test this
> assumption later). It just occurs at a time when Picky tries to access the
> site.items (in the source block).

Yes, looks like it.

Hm, yes, the problem looks similar, but there seems to be no reply on
that question...

> Can you maybe just run this script?
> require 'nanoc3'
> site = Nanoc3::Site.new('.')
> site.compile
> site.items.reject { |item| item.identifier=="/stylesheet/"}.each { |item|
> item.body[1..10] }

Yes, same error there. Picky seems to be innocent :-)

I reduced the example to:

require 'nanoc3'
site = Nanoc3::Site.new('.')
site.compile

site.items.each { |myitem| myitem.compiled_content(:snapshot => :pre) }

That fails in a fresh nanoc site, which contains just two items
(/stylesheet/ and /). I sent a question on this to the nanoc mailing
list, let's see what they say.

> Could it be that you updated the site? Did you already try to compile in
> the usual Nanoc way? (Using a rake task, I assume)

No, I didn't update the site, and yes, "nanoc compile" runs fine...

Cheers

Picky / Florian Hanke

Nov 11, 2011, 6:42:40 AM11/11/11
to picky...@googlegroups.com
Hi Michael,

On Friday, 11 November 2011 22:34:06 UTC+11, below wrote:
Hi,

Am Donnerstag, den 10.11.2011, 16:36 -0800 schrieb Picky / Florian
Hanke:

> So: Does it sometimes occur and sometimes not, or is it all the time now?

Yes, it's all the time now... And all I remember doing in between on
that project was turning off the machine, turning it on again and doing
a gem update...

Also of the Nanoc gem?
 

<snip>

Yes, same error there. Picky seems to be innocent :-)

I reduced the example to:

require 'nanoc3'
site = Nanoc3::Site.new('.')
site.compile
site.items.each { |myitem| myitem.compiled_content(:snapshot => :pre) }

That fails in a fresh nanoc site, which contains just two items
(/stylesheet/ and /). I sent a question on this to the nanoc mailing
list, let's see what they say.

Ok, I wish you all the best, and don't hesitate to ask more questions (should you have them) if you get back to running Picky.

Cheers,
   Florian

Michael Below

Nov 11, 2011, 6:54:52 AM11/11/11
to picky...@googlegroups.com
Hi Florian,

Am Freitag, den 11.11.2011, 03:42 -0800 schrieb Picky / Florian Hanke:

> > Yes, it's all the time now... And all I remember doing in between on
> > that project was turning off the machine, turning it on again and doing
> > a gem update...
> >
> Also of the Nanoc gem?

No, only picky and yajl...

> Ok, I wish you all the best, and don't hesitate to ask more questions (should
> you have them) if you get back to running Picky.

Thanks for your help, I hope I will get back on that soon...

Michael Below

Nov 11, 2011, 2:14:26 PM11/11/11
to picky...@googlegroups.com
Hi Florian,

Am Mittwoch, den 09.11.2011, 23:35 -0800 schrieb Picky / Florian Hanke:

> I was able to reproduce the problem and am now fixing it.
> The interesting thing here is that in the results, the problem does not
> occur anymore. That is probably why nobody noticed it.
> I have probably introduced the error a few versions back and am adding a
> regression test for it.

I just got a solution for the nanoc problem (a tweak in compiler.rb
so that it doesn't forget which pages are compiled), so now I can confirm
this bit: your fix works, there are only single results in the index
now.
