In lieu of a dynamic search facility, some sites work better (I think)
with a site index, although these are not so often seen. It's an older
metaphor, but still a very familiar one, so users ought not to have
difficulty with it. It's also conceptually easier to set up, and lower
cost to run.
I don't know of any existing routine which could be used as a final
filter in Webby, to set up this index - is there one? I'm amusing
myself thinking about the fun of writing one. To me, as a beginning Ruby
programmer (I have more experience with several other languages), it
looks like a fairly easy and useful project.
> Any thoughts, anyone?
There was this little challenge that I threw out a week or two ago.
I don't know if a sitemap is the same concept as your site index, but they sound very similar. If you feel like coding this up, I'm sure others would find it useful, too.
On Thu, Apr 24, 2008 at 9:51 PM, Tim Pease <tim.pe...@gmail.com> wrote:
> On Apr 22, 2008, at 5:16 PM, tomcloyd wrote:
This looks interesting, and useful, BUT it's not a site index, it's a
site *map*. I will likely make use of it - thanks!
What I have in mind would work like this:
1. All page files in a target directory would be processed, except
for those on an "ignore" list.
2. All content between a list of tags would be processed. Such a list
might look like
<h1>
<div id="maincontent">
<div id="sidebarRight">
etc...
3. Each word in the target areas on each page processed would, if not
already there, become a key in the index hash. The associated value for
each key would be an array containing the relative URL and title of
each page where the word is found. Obviously, one would need to create
and extend, over time, a "stop list" of words which do NOT go into
the index (because they are trivial, irrelevant, etc.).
4. The index hash is then output in HTML as an alphabetized list of
words, with associated page title links.
I would expect to run this routine, and aggressively move indexed
words to the stop list, leaving a selected list of important words to
be in the index.
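The steps above could be sketched roughly as follows. This is only a minimal illustration, not an existing Webby filter: the page hashes (with :url, :title and :html keys), the sample pages, the starter stop list, and the method names build_index and index_to_html are all made up for the example, and the tag stripping is a crude regex rather than a real extraction of <h1> or <div id="maincontent">.

```ruby
require 'cgi'

# Hypothetical starter stop list; in practice it would grow as the index is pruned.
STOP_WORDS = %w[a an and at the of to in is it].freeze

# Build a hash mapping each word to the [url, title] pairs of the pages
# that contain it. The shape of `pages` is an assumption for this sketch.
def build_index(pages, stop_words = STOP_WORDS)
  index = Hash.new { |hash, word| hash[word] = [] }
  pages.each do |page|
    text  = page[:html].gsub(/<[^>]+>/, ' ')        # crude tag stripping
    words = text.downcase.scan(/[a-z][a-z'-]*/).uniq # one entry per page per word
    (words - stop_words).each do |word|
      index[word] << [page[:url], page[:title]]
    end
  end
  index
end

# Render the index as an alphabetized HTML definition list.
def index_to_html(index)
  entries = index.keys.sort.map do |word|
    links = index[word].map do |url, title|
      %(<a href="#{CGI.escapeHTML(url)}">#{CGI.escapeHTML(title)}</a>)
    end
    "  <dt>#{CGI.escapeHTML(word)}</dt>\n  <dd>#{links.join(', ')}</dd>"
  end
  "<dl>\n#{entries.join("\n")}\n</dl>"
end

# Invented sample pages, just to exercise the two methods.
pages = [
  { url: 'about.html',   title: 'About',   html: '<h1>About the Clinic</h1>' },
  { url: 'therapy.html', title: 'Therapy', html: '<h1>Therapy at the clinic</h1>' }
]
index = build_index(pages)
puts index_to_html(index)
```

Moving a word to the stop list and re-running is then the whole pruning workflow; nothing is stored between runs.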
The advantages of this are that it's simple, could be set up on any
site (no server database needed), uses a metaphor (a book
index) with which people are familiar, can easily be updated at any
time, and gives one complete control over content, both via the "stop
list" and manual editing.
So...if I don't find something that does this, and nobody beats me to
it, I'll probably take a stab at doing this myself. It's within my
capability, which I certainly cannot say about many things I'd like to
do with Ruby (which is one of many reasons I really, really like Webby -
'cause it's so much better than anything I might have even conceived
of doing).
This sort of thing would work best on a site which is focused on
written content - the kind of sites I run and build for others. It
wouldn't be appropriate for all sites, surely.
On Sun, Apr 27, 2008 at 6:49 AM, tomcloyd <t...@tomcloyd.com> wrote:
> On Apr 24, 9:10 pm, "Bruce Williams" <br...@codefluency.com> wrote:
> Any thoughts or reactions, anyone?
> Tom
Sounds like a good idea :-)
I'd use an attribute on each page you'd like to ignore to flag it (vs maintaining a separate ignore list), Hpricot to yank out content to process, etc. Looks like a fun little project!
Thanks very much for the suggestions - I'm not familiar with Hpricot,
though I've heard of it. I'll check into it.
Can you explain what you mean by using an "attribute" to flag a page?
Would that be something like inserting <!-- noindex --> in the <head>
tag? That's all I can imagine you might mean. That *would* appear to
be a simpler way to stop page indexing than what I'd proposed.
> > > > > Blessings,
> > > > > TwP
> > > > Incidentally, I had to do this just today for some documentation at work:
> > > > http://pastie.caboo.se/186595
Tom,
I'm talking about the metadata at the top of each page (in content/**); I wouldn't process the output files in output/** directly.
For example you could do something like the following:
and simply check for the `ignore' attribute on page objects.
Also, rather than just writing a script that processed content/** files directly, I'd try to do it programmatically (probably in a Rake task; Tim might have some tips here) by loading webby and using Webby::Resources::DB#find to grab all the pages (see http://webby.rubyforge.org/rdoc/classes/Webby/Resources/DB.html#M000056), and checking for page.ignore -- and you could get the HTML output of each page for processing by calling page.render and the URL by calling page.url (see http://webby.rubyforge.org/manual/#h2_1_1).
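As a rough illustration of the metadata-flag idea (not a reconstruction of the example from this message, and not Webby's own API), the sketch below assumes each page source starts with a YAML block delimited by "---" lines and that an `ignore: true` entry marks pages to skip. The parse_page helper, the file names and the page contents are all hypothetical; inside Webby one would check the attribute on the page objects instead of parsing files by hand.

```ruby
require 'yaml'

# Split a page source into its leading YAML metadata and its body.
# The "---" delimited header is an assumption about the page layout.
def parse_page(source)
  match = source.match(/\A---\s*\n(.*?)\n---\s*\n(.*)\z/m)
  return [{}, source] unless match

  [YAML.safe_load(match[1]) || {}, match[2]]
end

# Invented page sources standing in for files under content/.
sources = {
  'index.txt'   => "---\ntitle: Home\n---\nWelcome to the site.\n",
  'archive.txt' => "---\ntitle: Archive\nignore: true\n---\nOld listings.\n"
}

# Keep only the pages whose metadata does not set the ignore flag.
indexable = sources.reject do |_name, source|
  meta, _body = parse_page(source)
  meta['ignore']
end

puts indexable.keys.inspect
```

Keeping the flag in each page's own metadata means there is no separate ignore list to fall out of date when pages are renamed or removed.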
Wow - that's a far more interesting approach than I had in mind. I
tend to keep things very simple - I often have no choice. Often for me
the question is not HOW to do something in Ruby but whether I can do it at
all. I don't have much time to work on things, and have to learn WHILE
I'm trying to get some piece of work accomplished. It's a luxury to
have time to read other people's code, or to study some aspect of the
language simply to learn more about it. Just a fact of my life.
So, I'm fascinated with your suggestions, as they open up whole new
paths of exploration and learning for me, and will likely lead to
better results as well.
What I originally had in mind was simply a routine which would act
directly on a set of HTML files, regardless of origin. That would be
usable by all sorts of folks, should they wish.
At this point, I think I'd like to have it operate as you suggest,
from within Webby, because this will assist me in learning Webby more
quickly. Later, I can write another version which can use some of the
same code to realize my original concept.
Thanks again so much for your suggestions. I benefit greatly from
them.
to the metadata of some of my pages when I don't want them to be
indexed by google, but where I can't use a robots.txt file to exclude
the entire directory. For example, I don't want google to index the
year and month archive pages of a blog, I only want the blog posts
themselves to show up.
In my layout I have:
<% unless p['index'] || p['index'].nil? -%>
<!-- Don't index pages with index: false in their metadata. -->
<!-- For example, don't index blog year and month listings. -->
<meta name="robots" content="noindex, follow" />
<% end -%>
I use this same code in my RSS feed
@pages.find(...).each do |p|
next unless p['index'] || p['index'].nil?
to skip things which aren't blog posts.
This might be a useful convention if someone wants to write a Sitemap
(http://www.sitemaps.org/) generator before I get around to it. :-)
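For anyone wanting to reuse that convention, here is a small sketch of the test on its own. Plain hashes stand in for page objects, and the indexable? helper and the sample titles are invented for the example; the point is only that a missing `index` key counts the same as `index: true`.

```ruby
# A page counts as indexable unless its metadata explicitly sets index to false.
# A missing key (nil) is treated the same as true.
def indexable?(page)
  page['index'] || page['index'].nil?
end

# Invented pages: plain hashes standing in for page metadata.
pages = [
  { 'title' => 'A blog post' },                         # no index key at all
  { 'title' => 'Another post', 'index' => true },
  { 'title' => 'April 2008 archive', 'index' => false } # explicitly excluded
]

pages.each do |page|
  next unless indexable?(page)
  puts page['title']
end
```

Because only an explicit `index: false` excludes a page, existing pages need no changes to stay in the index.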
On Apr 28, 2:35 am, tomcloyd <t...@tomcloyd.com> wrote:
> > > > > > On Thu, Apr 24, 2008 at 9:51 PM, Tim Pease <tim.pe...@gmail.com> wrote:
> > > > > > > On Apr 22, 2008, at 5:16 PM, tomcloyd wrote:
> > > > > > > > In lieu of a dynamic search facility, some sites work better (I think)
> > > > > > > > with a site index, although these are not so often seen. It's an older
> > > > > > > > metaphor, but still a very familiar one, so users ought not to have
> > > > > > > > difficulty with it. It's also conceptually easier to set up, and lower
> > > > > > > > cost to run.
> > > > > > > > I don't know of any existing routine which could be used as a final
> > > > > > > > filter in Webby, to set up this index - is there one? I'm amusing
> > > > > > > > myself thinking about the fun of writing one. As a beginning Ruby
> > > > > > > > programmer (I have more experience with several other languages), it
> > > > > > > > looks like a fairly easy, and useful project.
> > > > > > > > Any thoughts, anyone?
> > > > > > > There was this little challenge that I threw a week or two ago.
> > > > > > > Don't know if asitemapis the same concept of your site index, but they
> > > > > > > sound very similar. If you feel like coding this up, I'm sure others would
> > > > > > > find it useful, too.
> > > > > > > Blessings,
> > > > > > > TwP
> > > > > > Incidently, I had to do this just today for some documentation at work:
> > > > > > http://pastie.caboo.se/186595
> > > > > This looks interesting, and useful, BUT it's not a site index, it's a
> > > > > site *map*. I will likely make use of it - thanks!
> > > > > What I have in mind would work like this:
> > > > > 1. All pages files in a target directory would be processed, except
> > > > > for those one an "ignore" list.
> > > > > 2. All content between a list of tags would be processed. Such a list
> > > > > might look like
> > > > > <h1>
> > > > > <div id="maincontent">
> > > > > <div id="sidebarRight">
> > > > > etc...
> > > > > 3. Each word in the target areas on each page processed would, if not
> > > > > already there become a key in the index hash. The associated value for
> > > > > each key would be an array containing the relative URL and title of
> > > > > each page where the word is found. Obviously, one would need to create
> > > > > and increment, over time, a "stop list" of words which do NOT go into
> > > > > the index (because they are trivial, irrelevant, etc.
> > > > > 4. The index hash is then output in HTML as an alphabetized list of
> > > > > words, with associated page title links.
> > > > > I would expect to run this routine, and aggressively move indexed
> > > > > words to the stop list, leaving a selected list of important words to
> > > > > be in the index.
> > > > > The advantages of this is that it's simple, could be set up on any
> > > > > site (no server database needed), and it uses a metaphor (a book
> > > > > index) with which people are familiar, can easily be updated at any
> > > > > time, and one has complete control over content, both via the "stop
> > > > > list" and manual editing.
> > > > > So...if I don't find something that does this, or someone beats me to
> > > > > it, I'll probably take a stab at doing this myself. It's within my
> > > > > capability, which I certainly cannot say about many things I'd like to
> > > > > do with Ruby (which is one of may reason I really, really like Webby -
> > > > > 'cause it's so much better than anything I might have even conceived
> > > > > of doing).
> > > > > This sort of thing would work best on a site which is focused on
> > > > > written content - the kind of sites I run and build for others. It
> > > > > wouldn't be appropriate for all sites, surely.
> > > > > Any thoughts or reactions, anyone?
> > > > > Tom
> > > > Sounds like a good idea :-)
> > > > I'd use an attribute on each page you'd like to ignore to flag it (vs
> > > > maintaining a separate ignore list), Hpricot to yank out content to
> > > > process, etc. Looks like a fun little project!
> > > Thanks very much for the suggestions - I'm not familiar with Hpricot,
> > > though I've heard of it. I'll check into it.
> > > Can you explain what you mean by using an "attribute" to flag a page?
> > > Would that be something like inserting <!-- noindex --> in the <head>
> > > tag? That's all I can imagine you might mean. That *would* appear to
> > > be a simpler way to stop page indexing than what I'd proposed.
> > > Tom
> > Tom,
> > I'm talking about the metadata at the top of each page (in
> > content/**); I wouldn't process the output files in output/**
> > directly.
> > For example you could do something like the following:
> > and simply check for the `ignore' attribute on page objects.
> > Also, rather than just writing a script that processed content/**
> > files directly, I'd try to do it programmatically (probably in a Rake
> > task; Tim might have some tips here) by loading webby and using
> > Webby::Resources::DB#find to grab all the pages (seehttp://webby.rubyforge.org/rdoc/classes/Webby/Resources/DB.html#M000056),
> > and checking for page.ignore -- and you could get the HTML output of
> > each page for processing by calling page.render and the URL by calling
> > page.url (seehttp://webby.rubyforge.org/manual/#h2_1_1).
> Wow - that's a far more interesting approach than I had in mind. I
> tend to keep things very simple - I often have no choice. Often for me
> the question is not HOW to do something in ruby but can I do it at
> all. I don't have much time to work on things, and have to learn WHILE
> I'm trying to get some piece of work accomplished. It's a luxery to
> have time to read other people's code, or to study some aspect of the
> language simple to learn more about it. Just a fact of my life.
> So, I'm fascinated with your suggestions, as they open up who new
> paths of exploration and learning for me, and will likely result in
> better results as well.
> What I originally had in mind was simply a routine which would act
> directly on a set of HTML files, regardless of origin. That would be
> usable by all sorts of folks, should they wish.
> At this point, I think I'd like to have it operate as you suggest,
> from within Webby, because this will assist me in learning Webby more
> quickly. Later, I can write another version which can use some of the
> same code to realize my original concept.
> Thanks again so much for your suggestions. I benefit greatly from
> them.
> to the metadata of some of my pages when I don't want them to be
> indexed by google, but where I can't use a robots.txt file to exclude
> the entire directory. For example, I don't want google to index the
> year and month archive pages of a blog, I only want the blog posts
> themselves to show up.
> In my layout I have:
> <% unless p['index'] || p['index'].nil? -%>
> <!-- Don't index pages with index: false in their metadata. -->
> <!-- For example, don't index blog year and month listings. -->
> <meta name="robots" content="noindex, follow" />
> <% end -%>
> I use this same code in my RSS feed
> @pages.find(...).each do |p|
> next unless p['index'] || p['index'].nil?
> to skip things which aren't blog posts.
> This might be a useful convention if someone wants to write a Sitemap
> (http://www.sitemaps.org/) generator before I get around to it. :-)
I've tried :sort_by => 'path' but that doesn't seem to do much. It
would be nice, for us people at least, to have the site's root come
first and the rest be sorted alphabetically.
Just now, though, I see that if I sort by created_at, pages where it is
nil are dropped, so I wouldn't have to skip them manually. Maybe
that's a better way to do it, since the robots don't care about
the ordering.
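For the human-readable case, here is a rough sketch of the ordering described above: root first, everything else alphabetical, and pages without a created_at skipped. It works on plain Ruby objects rather than on Webby itself. The Page struct, the sitemap_order helper and the '/index.html' root URL are all made up for illustration, so treat this as one possible approach and not as how Webby's :sort_by behaves.

```ruby
# Hypothetical stand-in for a page object; only the two fields used here.
Page = Struct.new(:url, :created_at)

# Drop pages that have no created_at, then put the site root first and
# sort the remaining pages alphabetically by URL.
def sitemap_order(pages, root_url = '/index.html')
  dated = pages.reject { |page| page.created_at.nil? }
  # Sorting on [0 or 1, url] keeps the root ahead of every other page.
  dated.sort_by { |page| [page.url == root_url ? 0 : 1, page.url] }
end

pages = [
  Page.new('/blog/2008/index.html', nil),            # archive listing, no date
  Page.new('/contact.html', Time.utc(2008, 6, 1)),
  Page.new('/index.html', Time.utc(2008, 1, 1)),
  Page.new('/about.html', Time.utc(2008, 3, 1))
]

sitemap_order(pages).each { |page| puts page.url }
# prints:
#   /index.html
#   /about.html
#   /contact.html
```

The two-element sort key is the only trick here: arrays compare element by element, so the root always wins on the first element and the URL only breaks ties among the rest.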
On Jun 22, 2:21 pm, Tim Pease <tim.pe...@gmail.com> wrote:
>> On Jun 18, 2008, at 3:18 PM, Ana Nelson wrote:
>>> This might be a useful convention if someone wants to write a Sitemap
>>> (http://www.sitemaps.org/) generator before I get around to it. :-)
>> A site indexer would be fantastic! If you're willing to share the code
>> when you're done, I'll gladly include it with the next release of webby.
>> Blessings,
>> TwP
I'm way, way below you folks in skills, but I just have to say that I do NOT grasp the idea of an autogenerated site map. I don't see how that information is contained in the sparse matrix of hyperlinks IN a set of pages, and it cannot reliably be obtained from directory structure, since many of us don't use that notion for site organization.
But beyond all that, when I do a site map, I want the page groupings listed in MY order, not alphabetical order, and I often want some kind of brief description accompanying each page listing. I can envision how that might be all set up with metadata, but it seems easier to me to just keep a running outline of the conceptual organization of your site, and expand that into a site map.
I keep wondering if I'm missing something here. Must be.
FINALLY - when I started this thread, what I was referring to was the production of something akin to a book index, but for a website. Static search output, if you will, but browsable. Tim had some comments about how best to do this, and I liked them (and need to review them). Here's a description I recently wrote to one of my website design customers (and I expect to start on this this week - ASAP) - it describes a standalone program, but I can see this as a part of Webby, easily enough:
"One sets up, as an option and not a necessity, a set of tags (keywords, we call them in other contexts) which are associated with a page, and which are put IN the page, but styled to be invisible in a browser. But my program can find them. The point of the tags is to call special attention to principal content. The tag words will appear in the index output in bold font, indicating a MAIN source of information - the first place a user might want to browse to.
"Regardless of whether or not a given page is tagged, all other words on the page are indexed. The results are reviewed, and meaningless words are put on a "stop" list, which causes them NOT to appear in the index.
"The output then generated shows main entries (the tags aforementioned), and all others, alphabetically, grouped by letter. Following each entry is a link to the page where this entry appears.
"It's that simple. The webmaster can direct the output by use of the tags, or not. Either way, the site user can see better with this tool than with any other way the range of topics available, all on one page. Browsable. Formatted as the webmaster desires."
It might be feasible to set this up as a rake task. That'd be cool, but it's hardly my first priority, and besides I don't yet know how to do that.
So...if someone beats me to this, cool. If not, I'll be happy to put my code out for massaging by some more capable hands, if they so wish. I just want the bloody functionality, yesterday.
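To make the description above concrete, here is a minimal standalone sketch of the book-style index: every word on a page is indexed unless it is on the stop list, tag words are marked as main entries, and the output is grouped alphabetically by letter. Everything named here (IndexedPage, STOP_WORDS, build_index, render_index) is hypothetical, the page text is assumed to be already extracted from the HTML, and bolding the link of the tagged page is just one reading of "main entries in bold".

```ruby
require 'set'

# Hypothetical page record. In a real run, text would be pulled out of the
# rendered HTML and tags out of the hidden tag markup.
IndexedPage = Struct.new(:url, :title, :text, :tags)

# Starter stop list; the idea is to grow it after reviewing the output.
STOP_WORDS = Set.new(%w[a an and the of to in is for with])

ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Minimal HTML escaping for titles and URLs.
def h(text)
  text.gsub(/[&<>"]/, ESCAPES)
end

# Builds { word => [[page, main_entry?], ...] } for all pages.
def build_index(pages, stop_words = STOP_WORDS)
  index = Hash.new { |hash, word| hash[word] = [] }
  pages.each do |page|
    tags = page.tags.map(&:downcase)
    # Each word is recorded once per page; tags count even if not in the text.
    words = (page.text.downcase.scan(/[a-z][a-z'-]*/) + tags).uniq
    words.each do |word|
      next if stop_words.include?(word)
      index[word] << [page, tags.include?(word)]
    end
  end
  index
end

# Renders the index as HTML: one heading per letter, one list item per word,
# with links to tagged (main) pages wrapped in <b>.
def render_index(index)
  lines = []
  index.keys.sort.group_by { |word| word[0].upcase }.each do |letter, words|
    lines << "<h2>#{letter}</h2>" << '<ul>'
    words.each do |word|
      links = index[word].map do |page, main|
        link = %(<a href="#{h(page.url)}">#{h(page.title)}</a>)
        main ? "<b>#{link}</b>" : link
      end
      lines << "<li>#{h(word)}: #{links.join(', ')}</li>"
    end
    lines << '</ul>'
  end
  lines.join("\n")
end

sample_pages = [
  IndexedPage.new('/anxiety.html', 'Understanding Anxiety',
                  'Anxiety is common and therapy helps', ['anxiety']),
  IndexedPage.new('/therapy.html', 'About Therapy',
                  'Therapy for anxiety and the mind', ['therapy'])
]

puts render_index(build_index(sample_pages))
```

The review step from the description maps onto STOP_WORDS: run it, look at the output, add the meaningless words to the set, and run it again.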
t.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tom Cloyd, MS MA, LMHC Private practice Psychotherapist Bellingham, Washington, U.S.A: (360) 920-1226 << t...@tomcloyd.com >> (email) << TomCloyd.com >> (website & psychotherapy weblog) << sleightmind.wordpress.com >> (mental health issues weblog) << DirectPathDesign.TomCloyd.com >> (web site design & consultation) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hi,
Such an XML-based sitemap is actually meant to be used by search engines. In addition to providing a complete list of all pages on a web site (which makes hard-to-discover pages easy to find), it also allows you to set priorities for pages and can also give a hint about a page's update frequency, so spiders can fine-tune their crawl rates for a site with an XML sitemap.
My site has an auto-generated XML sitemap (meant for spiders) as well as an (auto-generated) HTML sitemap (meant for humans), and they're generated in quite different ways (they have different purposes after all).
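As an illustration of the spider-facing kind, here is a small sketch that writes a sitemaps.org-style XML file with a priority and a change-frequency hint per page. It is not the code behind the site mentioned above. SitemapPage, sitemap_xml and the example.com base URL are invented for the example, the index field borrows the "index: false" metadata convention from earlier in the thread, and URLs are assumed to contain no characters that need XML escaping.

```ruby
# Hypothetical page record. index follows the earlier convention:
# nil or true means "list this page", false means "leave it out".
SitemapPage = Struct.new(:url, :index, :changefreq, :priority)

# Builds a sitemaps.org 0.9 document as a string.
def sitemap_xml(pages, base_url)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  pages.each do |page|
    next unless page.index || page.index.nil?   # skip pages with index: false
    lines << '  <url>'
    lines << "    <loc>#{base_url}#{page.url}</loc>"
    lines << "    <changefreq>#{page.changefreq}</changefreq>"
    lines << "    <priority>#{format('%.1f', page.priority)}</priority>"
    lines << '  </url>'
  end
  lines << '</urlset>'
  lines.join("\n")
end

site_pages = [
  SitemapPage.new('/index.html', nil, 'weekly', 1.0),
  SitemapPage.new('/blog/2008/', false, 'monthly', 0.1),   # archive listing
  SitemapPage.new('/blog/2008/06/sitemaps.html', true, 'yearly', 0.5)
]

xml = sitemap_xml(site_pages, 'http://www.example.com')
puts xml
```

A human-facing HTML sitemap would be built separately, since it needs titles, grouping and descriptions, none of which the XML format carries.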
> Hope this helps!
> Denis
Thanks a bunch. I now officially 'have a clue'. Guess I'm still in the game!
Tom