static site indexer filter
From: tomcloyd <t...@tomcloyd.com>
Date: Tue, 22 Apr 2008 16:16:09 -0700 (PDT)
Local: Tues, Apr 22 2008 7:16 pm
Subject: static site indexer filter
In lieu of a dynamic search facility, some sites work better (I think)
with a site index, although these are not so often seen. It's an older
metaphor, but still a very familiar one, so users ought not to have
difficulty with it. It's also conceptually easier to set up, and lower
cost to run.

I don't know of any existing routine which could be used as a final
filter in Webby, to set up this index - is there one? I'm amusing
myself thinking about the fun of writing one. As a beginning Ruby
programmer (I have more experience with several other languages), it
looks like a fairly easy, and useful project.

Any thoughts, anyone?


 
From: Tim Pease <tim.pe...@gmail.com>
Date: Thu, 24 Apr 2008 20:51:19 -0600
Local: Thurs, Apr 24 2008 10:51 pm
Subject: Re: [webby] static site indexer filter
On Apr 22, 2008, at 5:16 PM, tomcloyd wrote:

> In lieu of a dynamic search facility, some sites work better (I think)
> with a site index, although these are not so often seen. It's an older
> metaphor, but still a very familiar one, so users ought not to have
> difficulty with it. It's also conceptually easier to set up, and lower
> cost to run.

> I don't know of any existing routine which could be used as a final
> filter in Webby, to set up this index - is there one? I'm amusing
> myself thinking about the fun of writing one. As a beginning Ruby
> programmer (I have more experience with several other languages), it
> looks like a fairly easy, and useful project.

> Any thoughts, anyone?

There was this little challenge that I threw out a week or two ago:

<http://groups.google.com/group/webby-forum/browse_thread/thread/b3469...>

Don't know if a sitemap is the same concept as your site index, but
they sound very similar. If you feel like coding this up, I'm sure
others would find it useful, too.

Blessings,
TwP


 
From: "Bruce Williams" <br...@codefluency.com>
Date: Thu, 24 Apr 2008 23:10:04 -0500
Local: Fri, Apr 25 2008 12:10 am
Subject: Re: [webby] Re: static site indexer filter

Incidentally, I had to do this just today for some documentation at work:
  http://pastie.caboo.se/186595

Cheers,
Bruce

---
Bruce Williams
http://codefluency.com
twitter: wbruce


 
From: tomcloyd <t...@tomcloyd.com>
Date: Sun, 27 Apr 2008 04:49:40 -0700 (PDT)
Local: Sun, Apr 27 2008 7:49 am
Subject: Re: static site indexer filter

On Apr 24, 9:10 pm, "Bruce Williams" <br...@codefluency.com> wrote:

This looks interesting, and useful, BUT it's not a site index, it's a
site *map*. I will likely make use of it - thanks!

What I have in mind would work like this:

1. All page files in a target directory would be processed, except
for those on an "ignore" list.
2. All content between a list of tags would be processed. Such a list
might look like
<h1>
<div id="maincontent">
<div id="sidebarRight">
etc...
3. Each word in the target areas on each page processed would, if not
already there, become a key in the index hash. The associated value for
each key would be an array containing the relative URL and title of
each page where the word is found. Obviously, one would need to create
and extend, over time, a "stop list" of words which do NOT go into
the index (because they are trivial, irrelevant, etc.).
4. The index hash is then output in HTML as an alphabetized list of
words, with associated page title links.
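
The four steps above can be sketched in plain Ruby. This is only an illustration under some assumptions: the pages are given as an in-memory array of hashes (the `url`, `title`, and `text` keys are made up here), the text has already been pulled out of the target tags, and `STOP_WORDS` is a tiny placeholder list rather than a real stop list.

```ruby
require 'set'

# Placeholder stop list; a real one would grow over time.
STOP_WORDS = Set.new(%w[a an and the of to in is it])

# Hypothetical input: text already extracted from the target tags.
PAGES = [
  { url: 'anxiety.html', title: 'Anxiety', text: 'Anxiety and the body' },
  { url: 'sleep.html',   title: 'Sleep',   text: 'Sleep, anxiety, and rest' }
]

# Step 3: map each word to the pages ([url, title]) where it occurs.
def build_index(pages, stop_words)
  index = Hash.new { |hash, word| hash[word] = [] }
  pages.each do |page|
    words = page[:text].downcase.scan(/[a-z']+/).uniq
    words.each do |word|
      next if stop_words.include?(word)
      index[word] << [page[:url], page[:title]]
    end
  end
  index
end

# Step 4: emit an alphabetized HTML list of words with page links.
# Titles and URLs are assumed to be HTML-safe in this sketch.
def index_to_html(index)
  items = index.keys.sort.map do |word|
    links = index[word].map { |url, title| %(<a href="#{url}">#{title}</a>) }
    "  <li>#{word}: #{links.join(', ')}</li>"
  end
  "<ul>\n#{items.join("\n")}\n</ul>"
end

INDEX = build_index(PAGES, STOP_WORDS)
puts index_to_html(INDEX)
```

Moving a word to the stop list and rerunning is then enough to drop it from the output.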

I would expect to run this routine, and aggressively move indexed
words to the stop list, leaving a selected list of important words to
be in the index.

The advantages of this are that it's simple, it could be set up on any
site (no server database needed), it uses a metaphor (a book
index) with which people are familiar, it can easily be updated at any
time, and one has complete control over content, both via the "stop
list" and manual editing.

So...if I don't find something that does this, or someone beats me to
it, I'll probably take a stab at doing this myself. It's within my
capability, which I certainly cannot say about many things I'd like to
do with Ruby (which is one of many reasons I really, really like Webby -
'cause it's so much better than anything I might have even conceived
of doing).

This sort of thing would work best on a site which is focused on
written content - the kind of sites I run and build for others. It
wouldn't be appropriate for all sites, surely.

Any thoughts or reactions, anyone?

Tom


 
From: "Bruce Williams" <br...@codefluency.com>
Date: Sun, 27 Apr 2008 10:07:14 -0500
Local: Sun, Apr 27 2008 11:07 am
Subject: Re: [webby] Re: static site indexer filter

Sounds like a good idea :-)

I'd use an attribute on each page you'd like to ignore to flag it (vs
maintaining a separate ignore list), Hpricot to yank out content to
process, etc.  Looks like a fun little project!

Cheers,
Bruce

---
Bruce Williams
http://codefluency.com
twitter: wbruce


 
From: tomcloyd <t...@tomcloyd.com>
Date: Sun, 27 Apr 2008 08:59:51 -0700 (PDT)
Local: Sun, Apr 27 2008 11:59 am
Subject: Re: static site indexer filter

On Apr 27, 8:07 am, "Bruce Williams" <br...@codefluency.com> wrote:

Thanks very much for the suggestions - I'm not familiar with Hpricot,
though I've heard of it. I'll check into it.

Can you explain what you mean by using an "attribute" to flag a page?
Would that be something like inserting <!-- noindex --> in the <head>
tag? That's all I can imagine you might mean. That *would* appear to
be a simpler way to stop page indexing than what I'd proposed.

Tom


 
From: "Bruce Williams" <br...@codefluency.com>
Date: Sun, 27 Apr 2008 11:15:37 -0500
Local: Sun, Apr 27 2008 12:15 pm
Subject: Re: [webby] Re: static site indexer filter

Tom,

I'm talking about the metadata at the top of each page (in
content/**); I wouldn't process the output files in output/**
directly.

For example you could do something like the following:

---
title:      Foo Bar
created_at: 2008-04-18 22:40:00 -06:00
ignore: true
filter:
  - textile

and simply check for the `ignore' attribute on page objects.
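
Outside of Webby, the same check can be roughed out with Ruby's standard `yaml` library. This is a stand-alone sketch, not Webby's own loader: it assumes the metadata block is closed by a second line of three dashes, and `split_metadata` and `ignored?` are hypothetical helper names.

```ruby
require 'yaml'

# Hypothetical helper: split a page source into [metadata, body].
# Assumes the metadata block starts and ends with a line of three dashes.
def split_metadata(source)
  match = source.match(/\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\z/m)
  return [{}, source] unless match
  [YAML.safe_load(match[1]) || {}, match[2]]
end

# A page is skipped only when its metadata sets ignore to true.
def ignored?(metadata)
  metadata['ignore'] == true
end

SAMPLE = <<~PAGE
  ---
  title: Foo Bar
  ignore: true
  filter:
    - textile
  ---
  h1. Foo Bar
PAGE

META, BODY = split_metadata(SAMPLE)
puts "skipping #{META['title']}" if ignored?(META)
```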

Also, rather than just writing a script that processed content/**
files directly, I'd try to do it programmatically (probably in a Rake
task; Tim might have some tips here) by loading webby and using
Webby::Resources::DB#find to grab all the pages (see
http://webby.rubyforge.org/rdoc/classes/Webby/Resources/DB.html#M000056),
and checking for page.ignore -- and you could get the HTML output of
each page for processing by calling page.render and the URL by calling
page.url (see http://webby.rubyforge.org/manual/#h2_1_1).

Cheers,
Bruce

---
Bruce Williams
http://codefluency.com
twitter: wbruce


 
From: tomcloyd <t...@tomcloyd.com>
Date: Sun, 27 Apr 2008 18:35:38 -0700 (PDT)
Local: Sun, Apr 27 2008 9:35 pm
Subject: Re: static site indexer filter

On Apr 27, 9:15 am, "Bruce Williams" <br...@codefluency.com> wrote:

Wow - that's a far more interesting approach than I had in mind. I
tend to keep things very simple - I often have no choice. Often for me
the question is not HOW to do something in ruby but can I do it at
all. I don't have much time to work on things, and have to learn WHILE
I'm trying to get some piece of work accomplished. It's a luxury to
have time to read other people's code, or to study some aspect of the
language simply to learn more about it. Just a fact of my life.

So, I'm fascinated with your suggestions, as they open up whole new
paths of exploration and learning for me, and will likely lead to
better results as well.

What I originally had in mind was simply a routine which would act
directly on a set of HTML files, regardless of origin. That would be
usable by all sorts of folks, should they wish.

At this point, I think I'd like to have it operate as you suggest,
from within Webby, because this will assist me in learning Webby more
quickly. Later, I can write another version which can use some of the
same code to realize my original concept.

Thanks again so much for your suggestions. I benefit greatly from
them.

Tom


 
From: Ana Nelson <nelson....@gmail.com>
Date: Wed, 18 Jun 2008 14:18:09 -0700 (PDT)
Local: Wed, Jun 18 2008 5:18 pm
Subject: Re: static site indexer filter
Sort of related to this thread, I have added

index: false

to the metadata of some of my pages when I don't want them to be
indexed by google, but where I can't use a robots.txt file to exclude
the entire directory. For example, I don't want google to index the
year and month archive pages of a blog, I only want the blog posts
themselves to show up.

In my layout I have:

   <% unless p['index'] || p['index'].nil? -%>
   <!-- Don't index pages with index: false in their metadata. -->
   <!-- For example, don't index blog year and month listings. -->
   <meta name="robots" content="noindex, follow" />
   <% end -%>

I use this same check in my RSS feed:
   @pages.find(...).each do |p|
    next unless p['index'] || p['index'].nil?

to skip things which aren't blog posts.
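
The effect of that condition is that a page counts as indexable unless its metadata says `index: false` explicitly. A minimal sketch of the convention, using plain hashes as hypothetical stand-ins for page objects (the `indexable?` helper is made up for illustration):

```ruby
# Hypothetical stand-ins for page objects: plain hashes of metadata.
ALL_PAGES = [
  { 'title' => 'A post', 'index' => true },
  { 'title' => 'Another post' },                       # no index key at all
  { 'title' => 'June 2008 archive', 'index' => false } # explicitly excluded
]

# True when index is set to true or is missing; false only for index: false.
def indexable?(page)
  page['index'] || page['index'].nil?
end

INDEXABLE = ALL_PAGES.select { |page| indexable?(page) }
puts INDEXABLE.map { |page| page['title'] }
```

Treating a missing key as "index this page" means only the exceptions need to be marked.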

This might be a useful convention if someone wants to write a Sitemap
(http://www.sitemaps.org/) generator before I get around to it. :-)



 
From: Ana Nelson <nelson....@gmail.com>
Date: Wed, 18 Jun 2008 14:29:01 -0700 (PDT)
Local: Wed, Jun 18 2008 5:29 pm
Subject: Re: static site indexer filter
And of course that should be @page['index'] not p['index'] in my
layout.

(This is SO ironic. I should have generated my email text using webby
and live code.)



 
From: Tim Pease <tim.pe...@gmail.com>
Date: Sun, 22 Jun 2008 07:21:38 -0600
Local: Sun, Jun 22 2008 9:21 am
Subject: Re: [webby] Re: static site indexer filter
On Jun 18, 2008, at 3:18 PM, Ana Nelson wrote:

> This might be a useful convention if someone wants to write a Sitemap
> (http://www.sitemaps.org/) generator before I get around to it. :-)

A site indexer would be fantastic! If you're willing to share the code  
when you're done, I'll gladly include it with the next release of webby.

Blessings,
TwP


 
From: Ana Nelson <nelson....@gmail.com>
Date: Mon, 23 Jun 2008 13:56:03 -0700 (PDT)
Local: Mon, Jun 23 2008 4:56 pm
Subject: Re: static site indexer filter
I've had a first go at a sitemap:
http://github.com/ananelson/webby/commit/5f4a64df3f48479a5e6448e6253e...

Results look something like this:
http://pastie.org/220691

I've tried :sort_by => 'path' but that doesn't seem to do much. It
would be nice, for us people at least, to have the site's root come
first and the rest be sorted alphabetically.
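
In plain Ruby, one way to get that ordering is a two-part sort key. This is only a sketch on a made-up list of URLs, independent of how Webby's own `:sort_by` option behaves:

```ruby
# Hypothetical page URLs; in practice these would come from the site's pages.
URLS = ['/blog/', '/about.html', '/', '/contact.html']

# Sort on a two-part key: 0 for the root and 1 for everything else,
# then the URL itself for alphabetical order within each group.
SORTED = URLS.sort_by { |url| [url == '/' ? 0 : 1, url] }
puts SORTED
```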

Just now, though, I see that if I sort by created_at this would drop
pages where it is nil, and I wouldn't have to skip them manually, so
maybe that's a better way to do it since the robots don't care about
the ordering.



 
From: Tom Cloyd <tomcl...@comcast.net>
Date: Mon, 23 Jun 2008 14:59:18 -0700
Local: Mon, Jun 23 2008 5:59 pm
Subject: Re: [webby] Re: static site indexer filter

I'm way, way below you folks in skills, but I just have to say that I do
NOT grasp the idea of an autogenerated site map. I don't see how that
information is contained in the sparse matrix of hyperlinks IN a set of
pages, and it cannot reliably be obtained from directory structure,
since many of us don't use that notion for site organization.

But beyond all that, when I do a site map, I want the page groupings
listed in MY order, not alphabetical order, and I often want some kind
of brief description accompanying each page listing.  I can envision how
that might be all set up with metadata, but it seems easier to me to
just keep a running outline of the conceptual organization of your site,
and expand that into a site map.

I keep wondering if I'm missing something here. Must be.

FINALLY - when I started this thread, what I was referring to was the
production of something akin to a book index, but for a website. Static
search output, if you will, but browsable. Tim had some comments about
how best to do this, and I liked them (and need to review them). Here's
a description I recently wrote to one of my website design customers
(and I expect to start on it this week - ASAP) - it describes a
standalone program, but I can see this as a part of Webby, easily enough:

"One sets up, as an option and not a necessity, a set of tags (keywords,
we call them in other contexts) which are associated with a page, and
which are put IN the page, but styled to be invisible in a browser. But
my program can find them. The point of the tags is to call special
attention to principal content. The tag words will appear in the index
output in bold font, indicating a MAIN source of information - the first
place a user might want to browse to.

"Regardless of whether or not a given page is tagged, all other words on
the page are indexed. The results are reviewed, and meaningless words
are put on a "stop" list, which causes them NOT to appear in the index.

"The output then generated shows main entries (the tags aforementioned),
and all others, alphabetically, grouped by letter. Following each entry
is a link to the page where this entry appears.

"It's that simple. The webmaster can direct the output by use of the
tags, or not. Either way, the site user can see better with this tool
than with any other way the range of topics available, all on one page.
Browsable. Formatted as the webmaster desires."
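
The output stage described above might look roughly like this in Ruby. It is a sketch under assumptions: the index has already been built as a hash (the `url`, `title`, and `main` keys are invented here), and "bold" is read as wrapping the link to a page where the word is one of that page's tags.

```ruby
# Hypothetical index entries: word => list of pages, with main: true when
# the word is one of that page's tags.
ENTRIES = {
  'trauma'  => [{ url: 'trauma.html',  title: 'Trauma',  main: true },
                { url: 'sleep.html',   title: 'Sleep',   main: false }],
  'therapy' => [{ url: 'about.html',   title: 'About',   main: false }],
  'anxiety' => [{ url: 'anxiety.html', title: 'Anxiety', main: true }]
}

# Render one index entry; tagged (main) occurrences are wrapped in <b>.
def render_entry(word, pages)
  links = pages.map do |page|
    link = %(<a href="#{page[:url]}">#{page[:title]}</a>)
    page[:main] ? "<b>#{link}</b>" : link
  end
  "<li>#{word}: #{links.join(', ')}</li>"
end

# Group the alphabetized words by first letter, one heading per letter.
def render_index(entries)
  groups = entries.keys.sort.group_by { |word| word[0].upcase }
  groups.map do |letter, words|
    items = words.map { |word| render_entry(word, entries[word]) }
    "<h2>#{letter}</h2>\n<ul>\n#{items.join("\n")}\n</ul>"
  end.join("\n")
end

puts render_index(ENTRIES)
```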

It might be feasible to set this up  as a rake task.  That'd be cool,
but it's hardly my first priority, and besides I don't yet know how to
do that.

So...if someone beats me to this, cool. If not, I'll be happy to put my
code out for massaging by some more capable hands, if they so wish. I
just want the bloody functionality, yesterday.

t.

--

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tom Cloyd, MS MA, LMHC
Private practice Psychotherapist
Bellingham, Washington, U.S.A: (360) 920-1226
<< t...@tomcloyd.com >> (email)
<< TomCloyd.com >> (website & psychotherapy weblog)
<< sleightmind.wordpress.com >> (mental health issues weblog)
<< DirectPathDesign.TomCloyd.com >> (web site design & consultation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


 
From: Denis Defreyne <denis.defre...@stoneship.org>
Date: Tue, 24 Jun 2008 10:54:46 +0200
Local: Tues, Jun 24 2008 4:54 am
Subject: Re: [webby] Re: static site indexer filter
On 23 Jun 2008, at 23:59, Tom Cloyd wrote:

> I'm way, way below you folks in skills, but I just have to say that  
> I do NOT grasp the idea of an autogenerated site map. I don't see  
> how that information is contained in the sparse matrix of
> hyperlinks IN a set of pages, and it cannot reliably be obtained  
> from directory structure, since many of us don't use that notion for  
> site organization.

> But beyond all that, when I do a site map, I want the page groupings  
> listed in MY order, not alphabetical order, and I often want some  
> kind of brief description accompanying each page listing.  I can  
> envision how that might be all set up with metadata, but it seems  
> easier to me to just keep a running outline of the conceptual  
> organization of your site, and expand that into a site map.

Hi,

Such an XML-based sitemap is actually meant to be used by search  
engines. In addition to providing a complete list of all pages on a web
site (which makes hard-to-discover pages easy to find), it also allows  
you to set priorities for pages and can also give a hint about a  
page's update frequency, so spiders can fine-tune their crawl rates  
for a site with an XML sitemap.
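
For illustration, a file in the sitemaps.org format can be produced with a few lines of Ruby. This is a sketch, not the generator used on any site mentioned in this thread: the page list, URLs, and values are made up, and only the `loc`, `changefreq`, and `priority` elements are written.

```ruby
# Characters that must be escaped in XML text.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}

def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical page list; URLs, frequencies, and priorities are made up.
SITEMAP_PAGES = [
  { loc: 'http://example.com/',            changefreq: 'weekly',  priority: 1.0 },
  { loc: 'http://example.com/blog/?p=1&x', changefreq: 'monthly', priority: 0.5 }
]

# Build the sitemap as a string, one <url> element per page.
def sitemap_xml(pages)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  pages.each do |page|
    lines << '  <url>'
    lines << "    <loc>#{xml_escape(page[:loc])}</loc>"
    lines << "    <changefreq>#{page[:changefreq]}</changefreq>"
    lines << "    <priority>#{page[:priority]}</priority>"
    lines << '  </url>'
  end
  lines << '</urlset>'
  lines.join("\n") + "\n"
end

puts sitemap_xml(SITEMAP_PAGES)
```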

My site has an auto-generated XML sitemap (meant for spiders) as well  
as an (auto-generated) HTML sitemap (meant for humans), and they're  
generated in quite different ways (they have different purposes after  
all).

Hope this helps!

Denis

--
Denis Defreyne
denis.defre...@stoneship.org


 
From: Tom Cloyd <tomcl...@comcast.net>
Date: Tue, 24 Jun 2008 05:39:51 -0700
Local: Tues, Jun 24 2008 8:39 am
Subject: Re: [webby] Re: static site indexer filter

Thanks a bunch. I now officially 'have a clue'. Guess I'm still in the game!

Tom



 