Search integration with static website? | below | 11/1/11 9:28 AM | Hi, I'm wondering if Picky is the right tool for a website search [...] I am building the website in nanoc, so I have the data in [...] I imagine I can integrate a Sinatra server with the static pages to find [...] Do you think this makes sense? Cheers |
Re: Search integration with static website? | Niko Dittmann | 11/1/11 12:29 PM | Hi Michael, that's an interesting question. It seems to me there are two different answers. 1) The technical answer is: Yes, that's easily possible, and it fits really well with the classical method of first generating an index and then loading this index once when the Picky server is started. And you'll greatly benefit from Picky returning categorized results and doing so fast. 2) The rather philosophical answer derives from the premise that you're using a static site engine for a good reason. The two most obvious ones would be: your hosting company only supports static HTML, or you have a really large amount of traffic (millions of page views per day, say). If the latter is the case, Picky would be a good match, as it can handle a lot of traffic, fast. In the former case you would lose the benefit of being able to just deploy static pages. Then you'd perhaps rather try a solution like Google site search or (if your index is small) a JavaScript-based search. This gets me thinking... (perhaps you should stop reading here) could all responses for, let's say, *every single-word search* be generated with Picky and put into static files? How many files would that be? I hacked together a small script to download a Wikipedia page and tokenize it with one of Picky's tokenizers: https://gist.github.com/1331581 In the first case the text has 5799 words with 768 unique ones. This results in 3816 substrings, which would mean as many separate static JSON documents that you would have to generate and deploy. I tried another Wikipedia text with 17778 words and 1911 unique ones: 9000 substrings. Mind that the number of documents wouldn't matter, only the total number of words. And this certainly only works for single-word searches. Would be fun to play with larger text bodies. OK. Crazy. Sorry for hijacking your thread for this sort of craziness. Niko. |
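Niko's "one static file per single-word query" idea can be sketched in plain Ruby. This is a rough approximation, not his gist and not Picky's tokenizer. It assumes a naive tokenizer (lowercase, letters only) and that every prefix of every unique word becomes its own precomputed answer. The helpers prefix_index and write_static_responses are hypothetical: the first builds the prefix-to-pages map, and the second writes one JSON file per prefix, so the file count equals the number of distinct prefixes.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: maps every prefix of every word to the pages that
# contain it. The tokenizer (lowercase, letters only) and the "every prefix"
# rule are assumptions, not Picky's actual partial-search behaviour.
def prefix_index(pages, min_length: 1)
  index = Hash.new { |hash, prefix| hash[prefix] = [] }
  pages.each do |url, text|
    text.downcase.scan(/[[:alpha:]]+/).uniq.each do |word|
      (min_length..word.length).each do |len|
        urls = index[word[0, len]]
        urls << url unless urls.include?(url)
      end
    end
  end
  index
end

# Writes one static JSON response per prefix into dir and returns the
# number of files written, i.e. how many documents would need deploying.
def write_static_responses(index, dir)
  index.each do |prefix, urls|
    File.write(File.join(dir, "#{prefix}.json"), JSON.generate(urls))
  end
  index.size
end

pages = { "/a/" => "Picky picks", "/b/" => "pickles" }
index = prefix_index(pages)
Dir.mktmpdir do |dir|
  puts write_static_responses(index, dir)  # prints 9
end
```

Raising min_length trades shorter query support for fewer files. With min_length: 3, the same two pages produce 7 files instead of 9.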
Re: Search integration with static website? | Picky / Florian Hanke | 11/1/11 3:30 PM | Hi Michael, It makes sense. I think it is one of the right tools, but I am wondering if it is the best tool for the job. That's why I still have a few questions: - Where is the nanoc data? Are the categories only there while nanoc generates the site? - What kind of search interface did you want? The Picky one, or one that simply shows a list of URLs together with a short description? Cheers, Florian As a fun exercise, and maybe as an inspiration, I wrote a small Picky search that indexes all HTML pages in the same directory and returns a simple list of result URLs: [...] (Thanks to Niko for the idea) |
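Florian's example code is not preserved in this thread. As a stand-in, here is a plain-Ruby approximation of the same idea that does not use Picky: index every .html file in one directory and answer a query with a list of matching URLs. The crude tag stripping, the letters-only tokenizer and the AND semantics for multi-word queries are all assumptions made for the sketch.

```ruby
require 'tmpdir'

# Builds word => [urls] from every .html file in dir (not recursive).
def index_html_dir(dir)
  index = Hash.new { |hash, word| hash[word] = [] }
  Dir.glob(File.join(dir, "*.html")).sort.each do |path|
    text = File.read(path).gsub(/<[^>]*>/, " ")  # crude tag stripping
    url = "/" + File.basename(path)
    text.downcase.scan(/[[:alpha:]]+/).uniq.each { |word| index[word] << url }
  end
  index
end

# Returns the URLs that contain every query word (AND semantics).
def search(index, query)
  words = query.downcase.scan(/[[:alpha:]]+/)
  return [] if words.empty?
  words.map { |word| index.fetch(word, []) }.reduce(:&)
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.html"), "<h1>Nanoc search</h1>")
  File.write(File.join(dir, "b.html"), "<p>Picky search</p>")
  index = index_html_dir(dir)
  p search(index, "search")        # => ["/a.html", "/b.html"]
  p search(index, "picky search")  # => ["/b.html"]
end
```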
AW: Re: Search integration with static website? | below | 11/1/11 3:43 PM | Hi, sounds good... About the philosophical part: I started using nanoc because installing a "real" CMS with all the bells & whistles seemed like overkill for the project. Now the site has grown and some of those whistles are starting to look good... But nanoc is very flexible, so I am trying to extend my existing setup. A JavaScript-based solution would probably be enough for my project, but I haven't found anything convincing: IIRC the Compass documentation builds a JSON-based index from within nanoc and searches it with JavaScript, but that solution has its limits (the index is bloated with partial words etc.), so I looked for a solution with a better indexer and found Picky... Michael |
AW: Re: Search integration with static website? | below | 11/1/11 4:07 PM | Hi Florian, the nanoc data is in text files; they have a YAML header with title, date etc. and markdown content. While the nanoc compiler runs, the content is represented as Ruby objects (each item has title, date etc.). I think that would be the right time to run the indexer (pre- or postprocessing). I haven't really decided about the output question. I like the Picky way of showing the categories, limiting queries etc., but it's not really necessary. The website addresses the general public, not library research assistants, so many of the possibilities probably won't be used. My goal is more like this: the interface should be user friendly, and the search should distinguish between words in the main content or tags of a blog article and words in a tag cloud that happens to be on the same page. Michael |
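The two requirements above can be sketched in plain Ruby: items consist of a YAML header plus markdown, and words in titles or tags must be kept apart from the body and from layout elements such as a tag cloud. The sketch assumes the usual "---" delimited header and the field names title and tags. parse_item and categorized_index are hypothetical helpers, not nanoc or Picky API. Indexing the raw item rather than the laid-out page is what keeps tag-cloud words out of the index.

```ruby
require 'yaml'
require 'date'

# Splits a nanoc-style source file into its YAML header and markdown body.
# Assumes the header is enclosed by two lines consisting of "---".
def parse_item(source)
  _, header, body = source.split(/^---[ \t]*$/, 3)
  meta = YAML.safe_load(header.to_s, permitted_classes: [Date]) || {}
  { title: meta.fetch("title", "").to_s, tags: Array(meta["tags"]), body: body.to_s }
end

# Builds category => word => [ids], so a hit in :tags can be told apart
# from a hit in :body. The layout (and any tag cloud in it) is never indexed.
def categorized_index(items)
  index = Hash.new { |h, category| h[category] = Hash.new { |hh, word| hh[word] = [] } }
  items.each do |id, item|
    texts = { title: item[:title], tags: item[:tags].join(" "), body: item[:body] }
    texts.each do |category, text|
      text.downcase.scan(/[[:alpha:]]+/).uniq.each { |word| index[category][word] << id }
    end
  end
  index
end

source = "---\ntitle: Picky intro\ndate: 2011-11-01\ntags: [search, ruby]\n---\nNanoc builds static pages.\n"
index = categorized_index({ "/intro/" => parse_item(source) })
p index[:tags]["search"]  # => ["/intro/"]
```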
Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/2/11 12:06 AM | Hi Michael, That sounds good. Regarding the data: Can you hook into the compilation somehow? You could define your index in a separate file and store it in a constant, like so:
    PagesIndex = Picky::Index.new(:pages) do
      source { nanoc_items }
      category :title
      category :body
      # ...
    end
Then run the indexing process while nanoc is compiling. One way to do that is this:
    require File.expand_path('../path/to/your/index/definition', __FILE__)
    PagesIndex.source { your_nanoc_items }
    PagesIndex.index
Then start up your Picky server, where you also use this index definition:
    require File.expand_path('../path/to/your/index/definition', __FILE__)
    pages_search = Picky::Search.new PagesIndex
I do not know how well you know Ruby, so this is just a bare-bones skeleton to give you an idea of how you could do it. Note that Picky is less of a one-size-fits-all solution that you can just throw in; in return, it is very easy to see how to change just about everything. Regarding the interface: It might be a good idea to just use simple JavaScript and do a JSON request to the Picky server itself. It would then display the results in a very simple, clickable way. Again, I do not know how well versed you are in JavaScript. All the best, Florian
|
Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/2/11 12:10 AM | Correction:
    source { } # <- This source would be empty, as we do not have the nanoc items yet.
    category :title
|
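A tiny plain-Ruby model shows why an empty source in the definition is harmless. MiniIndex below is hypothetical. It only mimics the shape of Florian's snippet (a constant defined with a block, plus source, category and index) and is not Picky's implementation. Because the source is stored as a block and only called when index runs, the definition can be loaded before any nanoc items exist, and the real source can be plugged in during compilation.

```ruby
# Hypothetical miniature of the "define now, fill later" pattern.
class MiniIndex
  def initialize(name, &definition)
    @name = name
    @categories = []
    @source = proc { [] }
    @data = {}
    instance_eval(&definition) if definition
  end

  def category(name)
    @categories << name
  end

  # Stores the block without calling it; it only runs when #index does.
  def source(&block)
    @source = block if block
    @source
  end

  def index
    @data = Hash.new { |h, cat| h[cat] = Hash.new { |hh, word| hh[word] = [] } }
    Array(@source.call).each do |item|  # an empty source block yields nil -> []
      @categories.each do |cat|
        item.fetch(cat, "").to_s.downcase.scan(/[[:alpha:]]+/).uniq.each do |word|
          @data[cat][word] << item.fetch(:id)
        end
      end
    end
    self
  end

  def ids(category, word)
    @data.fetch(category, {}).fetch(word, [])
  end
end

# Definition file: loaded before nanoc has produced any items.
PagesIndex = MiniIndex.new(:pages) do
  source { }  # empty for now
  category :title
end

# Later, e.g. during compilation, with stand-in items (plain hashes here).
PagesIndex.source { [{ id: "/intro/", title: "Picky and nanoc" }] }
PagesIndex.index
p PagesIndex.ids(:title, "picky")  # => ["/intro/"]
```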
Re: Re: Search integration with static website? | below | 11/2/11 5:50 AM | Hi Florian, On Wednesday, 02.11.2011 at 00:06 -0700, Picky / Florian Hanke wrote:
> That sounds good. Regarding the data: Can you hook into the compilation [...]
Denis, the author of nanoc, advised me on IRC to do this via a Rake [...]
    site = Nanoc3::Site.new('.')
    [...]
      source { site.items }
      category :title
      category :tag
      category :description
      # ...
    end
This sounds like a good idea; I will try it... I will report back as soon [...]
> I do not know how well you know Ruby.
My Ruby knowledge is mostly how to plug some elements together [...]
> Regarding the interface: [...]
Not very much; I still have to get to the stage of plugging something [...]
|
Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/2/11 1:53 PM | Hi Michael,
That is perfect, and also a good idea. Good luck!
Yes indeed :)
Alright. It's probably a good idea to do the first step first (get a running server with good results) and then think about the interface. Btw, did you know that you can search a server directly from the terminal? "picky search http://localhost:4567/pages" Also see [...] (I should add that you might need to give the whole URL, not just the path.) Cheers, Florian |
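The same query can also be sent from Ruby with the standard library. This sketch assumes that the server's route reads the parameters query, ids and offset and answers with JSON. That is an assumption about the Picky example apps, so check your own app.rb. picky_uri and picky_search are hypothetical helpers, and they take the full URL, in line with the note about passing the whole URL rather than just the path.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Builds the request URL. The parameter names query, ids and offset are an
# assumption about the server's route; adjust them to match your app.rb.
def picky_uri(base_url, query, ids: 20, offset: 0)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(query: query, ids: ids, offset: offset)
  uri
end

# Sends the query and parses the JSON answer (needs a running server).
def picky_search(base_url, query, **options)
  response = Net::HTTP.get_response(picky_uri(base_url, query, **options))
  raise "Picky server answered #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end

puts picky_uri("http://localhost:4567/pages", "nanoc title:search")
# http://localhost:4567/pages?query=nanoc+title%3Asearch&ids=20&offset=0
# results = picky_search("http://localhost:4567/pages", "nanoc")
```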
Re: Re: Search integration with static website? | below | 11/7/11 8:07 AM | Hi, On Wednesday, 02.11.2011 at 13:53 -0700, Picky / Florian Hanke wrote:
> > site = Nanoc3::Site.new('.')
I didn't get too far: I have added the above to app.rb (in the [...]
    $ rake index
    [...]
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexers/parallel.rb:40:in `each'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexers/parallel.rb:40:in `process'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexers/base.rb:23:in `index'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/index_indexing.rb:78:in `index_in_parallel'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/index_indexing.rb:27:in `index'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:53:in `call'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:53:in `block (2 levels) in forked'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:51:in `fork'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:51:in `block in forked'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:41:in `loop'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/cores.rb:41:in `forked'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/picky/indexes_indexing.rb:30:in `index'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0/gems/picky-3.3.3/lib/tasks/index.rake:10:in `block in <top (required)>'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `call'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `block in execute'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `each'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `execute'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain'
    from /home/mbelow/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/task.rb:144:in `invoke'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:116:in `invoke_task'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block (2 levels) in top_level'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `each'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block in top_level'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:88:in `top_level'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:66:in `block in run'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/lib/rake/application.rb:63:in `run'
    from /home/mbelow/.rvm/gems/ruby-1.9.3-p0@global/gems/rake-0.9.2.2/bin/rake:33:in `<top (required)>'
> Alright. It's probably a good idea to do the first step first (get a [...]
I have been thinking about something like this: Probably I could add a [...]
> Btw, did you know that you can search a server directly from the terminal?
Sounds useful, will try that...
|
Re: Re: Search integration with static website? | below | 11/7/11 9:14 AM | On Monday, 07.11.2011 at 17:07 +0100, Michael Below wrote:
> I didn't get too far: I have added the above to app.rb (in the [...]
Further idea: maybe it's better to build an index based on the canonical [...]
I guess usually there is a content server running that knows the IDs, [...]
|
Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/8/11 2:45 AM | Hi Michael,
Picky uses the id method to identify/index the objects. That depends on what you want it to be identified with. Maybe the URL is a good idea? If yes, you can extend the Nanoc Items class (I have no idea what it is called, I'm sorry) like this, for example:
    module Nanoc
      class Item
        def id
          url
        end
      end
    end
Before indexing, Picky will load this, and the Nanoc class will automatically return the URL as its id. Please tell me if you need more detailed help. I am happy to provide it.
Yes. To try it, you could quickly run "picky generate all_in_one testsearch" to see how it works.
Cheers, Florian |
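To see the reopen-the-class mechanism without nanoc installed, here is the same trick applied to a stand-in class. FakeNanoc::Item and its path attribute are hypothetical. With nanoc, the real class name and the method that returns the page URL still need to be checked, since the thread itself is unsure of the name. Per the explanation above, all Picky needs is an id method that returns something unique per page.

```ruby
# Stand-in for nanoc's item class; the name and attributes are assumptions.
module FakeNanoc
  class Item
    attr_reader :path, :title

    def initialize(path, title)
      @path = path
      @title = title
    end
  end
end

# Reopening the class later adds an id method to every existing and future
# item; here the URL path doubles as the id, which suits static pages.
module FakeNanoc
  class Item
    def id
      path
    end
  end
end

items = [FakeNanoc::Item.new("/kosten/", "Kosten"), FakeNanoc::Item.new("/impressum/", "Impressum")]
p items.map(&:id)  # => ["/kosten/", "/impressum/"]
```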
Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/8/11 2:49 AM | Usually there is, but with the all_in_one server both are contained in one. It's best to just generate one as described in the last post and look at what it does. With static pages I think it's best to use the URL as the id. Don't hesitate to ask for more details if you need them. Cheers, Florian |
Re: [picky:24] Re: Re: Search integration with static website? | below | 11/8/11 6:22 AM | On Tuesday, 08.11.2011 at 02:45 -0800, Picky / Florian Hanke wrote:
> If yes, you can extend the Nanoc Items class (I have no idea what it is [...]
I tried something like that using the path. But somehow it looks like [...]
    {"kosten":[0],"seite":[0],"nicht":[0],...
Any idea why?
|
Re: [picky:25] Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/8/11 6:28 AM | Yes. Picky does not know the id type; you can tell it to assume symbols by setting key_format :to_sym inside the index definition. Cheers! (Message from mobile, hence short) |
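The [0] entries in the previous message are consistent with path ids being converted to integers. In Ruby, String#to_i returns 0 for a string that does not start with a digit, so every page collapses onto the same id, while to_sym keeps the pages apart. That the default key format was to_i is an inference from that output, not something stated in the thread. index_with_key_format is a hypothetical helper.

```ruby
# Builds word => [ids], converting each id with the given key format method.
def index_with_key_format(docs, key_format)
  index = Hash.new { |hash, word| hash[word] = [] }
  docs.each do |id, text|
    key = id.public_send(key_format)  # e.g. :to_i or :to_sym
    text.downcase.scan(/[[:alpha:]]+/).uniq.each { |word| index[word] << key }
  end
  index
end

docs = { "/kosten/" => "Kosten Seite", "/impressum/" => "Seite nicht" }
p index_with_key_format(docs, :to_i)["seite"]    # => [0, 0]
p index_with_key_format(docs, :to_sym)["seite"]  # => [:"/kosten/", :"/impressum/"]
```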
Re: [picky:26] Re: Re: Search integration with static website? | below | 11/8/11 12:10 PM | On Wednesday, 09.11.2011 at 01:28 +1100, Florian Hanke wrote: [...]
Yes, that does it. Nice, indexing works! Now I am also indexing the item content (before layout, i.e. just the [...]
I guess that can make sense if the results are weighted, like "this is [...]
|
Re: [picky:26] Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/8/11 3:52 PM | On Wednesday, 9 November 2011 07:10:47 UTC+11, below wrote: On Wednesday, 09.11.2011 at 01:28 +1100, Florian Hanke wrote: [...] Good to hear!
I am not perfectly sure what you mean. Did you look at the indexes and see that one word references the same id multiple times, like so: :word => [1, 1, 1, 3, 1] Or something like that? Can you give us an example if this isn't it? Cheers, Florian |
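A few lines of Ruby are enough to check an exported index for the pattern described here. The sketch assumes that an inverted-index file such as body_exact_inverted.memory.json is a JSON object mapping each word to an array of ids, which matches the excerpt posted earlier in the thread. duplicate_ids is a hypothetical helper, and the file path in the comment is a placeholder.

```ruby
require 'json'

# Returns word => [ids listed more than once] for every affected word.
def duplicate_ids(inverted)
  inverted.each_with_object({}) do |(word, ids), duplicates|
    repeated = ids.select { |id| ids.count(id) > 1 }.uniq
    duplicates[word] = repeated unless repeated.empty?
  end
end

# With a real file (placeholder path):
# duplicate_ids(JSON.parse(File.read("path/to/body_exact_inverted.memory.json")))
sample = JSON.parse('{"word": [1, 1, 1, 3, 1], "other": [2, 3]}')
p duplicate_ids(sample)  # only "word" is reported, with the repeated id 1
```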
Re: [picky:29] Re: Re: Search integration with static website? | below | 11/9/11 1:56 AM | Hi, On Tuesday, 08.11.2011 at 15:52 -0800, Picky / Florian Hanke wrote:
> > Now I am also indexing the item content (before layout, i.e. just the [...]
Yes, the JSON file body_exact_inverted.memory.json contains entries [...]
Now this tells me that I should tweak the list of stop words, but it [...]
Best
|
Re: [picky:29] Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/9/11 4:51 AM | Hello again :) You are absolutely right on both accounts. I am wondering what's happening here. How do you index? Using a source and "rake index"? Feel free to post your app.rb so we can try to reproduce the problem (or send it to my email address if it is too public for you). Thanks for your perseverance! |
Re: [picky:30] Re: Re: Search integration with static website? | below | 11/9/11 7:18 AM | On Wednesday, 09.11.2011 at 23:51 +1100, Florian Hanke wrote: [...]
I am attaching the app.rb (the search part doesn't work yet). The [...]
Cheers
|
Re: [picky:30] Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/9/11 11:35 PM | Hi Michael,
On Wednesday, 09.11.2011 at 23:51 +1100, Florian Hanke wrote:
> You are absolutely right on both accounts. I am wondering what's happening here. How do you index? Using a source and "rake index"?
It's very interesting indeed how you are getting the data. Kudos! I was able to reproduce the problem and am now fixing it. The interesting thing here is that the problem no longer shows up in the results. That is probably why nobody noticed it. I probably introduced the error a few versions back and am adding a regression test for it. Thanks! Please update to 3.4.2 in half an hour. Cheers |
Re: [picky:30] Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/10/11 12:19 AM | P.S.: Or better, 3.4.3. |
Re: [picky:34] Re: Re: Search integration with static website? | below | 11/10/11 5:08 AM | Hi again, On Thursday, 10.11.2011 at 00:19 -0800, Picky / Florian Hanke wrote: [...]
Should we take this off-list? Maybe this thread is getting a bit long for a [...]
Anyway, I installed 3.4.3 (instead of 3.4.0) and now I have an [...]
    13:55:54: Indexing using 4 processors, in random order.
    [...]
This is the same error I had when I didn't call the site.compile method [...]
I don't understand how this error is coming back now, when I am [...]
(Wild guess: maybe this is caused by the 4 parallel indexing threads [...]
|
Re: [picky:34] Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/10/11 5:26 AM | Hi Michael,
It's fine for me. The list needs a bit of life ;)
It is surprising that it didn't occur in 3.4.0, as nothing groundbreaking has changed. However, that might just have been luck. If it is the parallel indexing (in separate processes through forking, not threads), and assuming that site.compile returns before it is finished (if it didn't, the class would not finish loading until the site was compiled, and the forks would only be made later), then maybe a simple sleep X after the site.compile would help. That would only be for testing whether this is the problem, though. Another idea would be to explicitly index a single index, or even a single category. This does not use multiple processes to index. Call it as follows: rake index[pages] or rake index[pages,title] && rake index[pages,tags] etc. I hope that helps! Cheers |
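A throwaway task defined through Rake's own API shows how the bracket syntax reaches a task. demo_index is hypothetical and only records its arguments. Picky's real index task lives in the gem's lib/tasks/index.rake (the file that appears in the earlier stack trace) and may handle its arguments differently. Some shells, zsh for instance, need the argument quoted: rake 'index[pages,title]'.

```ruby
require 'rake'

INDEX_CALLS = []

# Equivalent to `task :demo_index, [:index_name, :category] do ... end`
# in a Rakefile; arguments that are left out arrive as nil.
Rake::Task.define_task(:demo_index, [:index_name, :category]) do |_task, args|
  INDEX_CALLS << [args[:index_name], args[:category]]
end

# Same as running: rake demo_index[pages,title]
Rake::Task[:demo_index].invoke("pages", "title")
p INDEX_CALLS  # => [["pages", "title"]]
```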
Re: [picky:36] Re: Re: Search integration with static website? | below | 11/10/11 7:23 AM | Hi Florian, On Thursday, 10.11.2011 at 05:26 -0800, Picky / Florian Hanke wrote:
> It is surprising that it didn't occur in 3.4.0, as nothing groundbreaking
> has been changed. However, that might just have been luck.
Probably. I went back to 3.4.0, but the error is still there. There was [...]
> If it is the parallel indexing (in separate processes through forking, not [...]
No, this doesn't help... I tried up to 60 seconds of sleep, and there is [...]
> rake index[pages,title] && rake index[pages,tags] etc.
For the title and the tags it works fine, but for the body I am still [...]
    ** Execute index
    [...]
Best
|
Re: [picky:36] Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/10/11 4:36 PM | Hi Michael, On Friday, 11 November 2011 02:23:27 UTC+11, below wrote: Hi Florian, I don't think so either. It's good to know it occurs with 3.4.0 as well. So: Does it sometimes occur and sometimes not, or is it all the time now?
Good to know. Thus I think Nanoc compiles this synchronously, i.e. it is finished with it when it returns.
I don't know Nanoc very well. When an item cannot meet a dependency, Nanoc pushes it to the back of its compilation queue and continues. If it then reaches the end and the item still cannot meet its dependencies, it raises this error. I don't think this is a Picky problem (but let's try to test this assumption later). It just occurs at a time when Picky tries to access site.items (in the source block). Perhaps following this [...] helps? Can you maybe just run this script?
    require 'nanoc3'
    site = Nanoc3::Site.new('.')
    site.compile
    site.items.reject { |item| item.identifier == "/stylesheet/" }.each { |item| item.body[1..10] }
This basically simulates what Picky does, but without Picky. So if this runs into problems, we have to look within Nanoc. If it doesn't, we have to keep looking. Could it be that you updated the site? Did you already try to compile in the usual Nanoc way? (Using a rake task, I assume) Cheers and much success, Florian
|
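The error behaviour described here can be modelled in a few lines. This is a simplified model based on that description, not nanoc's compiler. An item whose dependencies are not compiled yet moves to the back of the queue, and if a full pass over the queue makes no progress, an error is raised. compile_order and UnmetDependencyError are hypothetical names.

```ruby
class UnmetDependencyError < StandardError; end

# Returns the items in an order in which every item's dependencies come
# first. deps maps each item to the items it depends on.
def compile_order(deps)
  queue = deps.keys
  done = []
  stalled = 0  # consecutive deferrals without any item being compiled
  until queue.empty?
    item = queue.shift
    if deps[item].all? { |dep| done.include?(dep) }
      done << item
      stalled = 0
    else
      queue.push(item)  # defer: move to the back of the queue
      stalled += 1
      # A whole pass over the queue without progress: give up.
      raise UnmetDependencyError, "cannot compile #{queue.join(', ')}" if stalled >= queue.size
    end
  end
  done
end

p compile_order({ "/a/" => ["/b/"], "/b/" => [] })  # => ["/b/", "/a/"]
```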
Re: [picky:38] Re: Re: Search integration with static website? | below | 11/11/11 3:34 AM | Hi, On Thursday, 10.11.2011 at 16:36 -0800, Picky / Florian Hanke wrote:
> So: Does it sometimes occur and sometimes not, or is it all the time now?
Yes, it's all the time now... And all I remember doing in between on [...]
> I don't know Nanoc very well. When a Nanoc item cannot meet a dependency, [...]
No, that error isn't produced during compilation, it happens in the [...]
In the source snippet there, it looks like nanoc calls a check [...]
> I don't think this is a Picky problem (but let's try to test this [...]
Yes, looks like it.
> Perhaps following this helps?
Hm, yes, the problem looks similar, but there seems to be no reply on [...]
> Can you maybe just run this script?
Yes, same error there. Picky seems to be innocent :-) I reduced the example to:
    require 'nanoc3'
    [...]
    site.items.each { |myitem| myitem.compiled_content(:snapshot => :pre) }
That fails in a fresh nanoc site, which contains just two items [...]
> Could it be that you updated the site? Did you already try to compile in [...]
No, I didn't update the site, and yes, "nanoc compile" runs fine... Michael
|
Re: [picky:38] Re: Re: Search integration with static website? | Picky / Florian Hanke | 11/11/11 3:42 AM | Hi Michael,
Also of the Nanoc gem?
Ok, I wish you all the best, and don't hesitate to ask more questions (should you have any) if you get back to running Picky. Cheers, Florian |
Re: [picky:40] Re: Re: Search integration with static website? | below | 11/11/11 3:54 AM | Hi Florian, On Friday, 11.11.2011 at 03:42 -0800, Picky / Florian Hanke wrote:
> > Yes, it's all the time now... And all I remember doing in between on [...]
No, only picky and yajl...
> Ok, I wish you all the best and don't hesitate to ask more question (should [...]
Thanks for your help, I hope I will get back to that soon...
|
Re: [picky:33] Re: Re: Search integration with static website? | below | 11/11/11 11:14 AM | Hi Florian, On Wednesday, 09.11.2011 at 23:35 -0800, Picky / Florian Hanke wrote:
> I was able to reproduce the problem and am now fixing it.
I just got a solution for the nanoc problem (tweaking in the compiler.rb [...]
|