more on sitemaps

Billy Gray

Jun 1, 2010, 5:54:52 PM
to stati...@googlegroups.com
An update on my previous inquiry. Admittedly, 'tis a bit long-winded, as it involves a lot of poking at staticmatic internals.

I was able to hack out a sitemap building scheme by adding some methods and a small modification to Andrew Neal's navigation_helper.rb, and staticmatic does properly generate a 'sitemap.xml' document from a 'sitemap.xml.haml' doc, but in the end I find myself playing a bit of whack-a-mole with staticmatic between the render and the build process. Stephen, if you have any advice, I'd be much obliged.

In navigation_helper.rb:

Had to mess with the scan_directory method to ignore partials, and it's quite possible that this screws up the intended use of navigation_helper, which is not a concern for me, as I set up my own nav helpers:

  def scan_directory(path, parent=nil, depth=1)
    # find or create index.haml file
    dir = index_file(path, depth-1)
    dir.parent = parent
    
    Dir[path].each do |item|
      if File.directory?(item)
        index = scan_directory(File.join(item,"*"), dir, depth+1)
        dir.add_child(index)
      else
        # be sure to skip the index file!
        if item.index(INDEX_PATTERN)
          next
        end
        
        if item.split('/').last =~ /^_.*$/ #ignore partials...
          puts "ignoring partial #{item}"
          next
        end
        # next if item.index(INDEX_PATTERN)
        page = create_page(item, depth)
        dir.add_child(page)
      end
    end
    dir.child.sort
    connect_siblings(dir.child)
    return dir
  end

This is important because navigation_helper sets up a @@pages variable that looks like it should be an Array, but in practice @@pages is often itself a NavigationHelper::Page object, which may or may not have children. Each child can be a Page object or another array of children, which makes sense, and is very useful for creating a sitemap!
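As an aside, the partial check in scan_directory can be written a bit more directly with File.basename. This is only a sketch: partial? is a hypothetical helper name, not part of navigation_helper.rb, and the paths are made up for illustration.

```ruby
# Hypothetical helper: true when a source path names a Haml partial,
# i.e. the file name itself (not a directory in the path) starts with "_".
def partial?(path)
  File.basename(path).start_with?('_')
end

# Made-up example paths.
paths = [
  'src/pages/index.haml',
  'src/pages/_footer.haml',
  'src/pages/_drafts/post.haml'
]

# Keep everything that is not a partial.
pages = paths.reject { |path| partial?(path) }
puts pages
```

Like the regex in the original, this only looks at the last path segment, so a page inside a directory whose name starts with an underscore is still kept.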

Adding the following methods to navigation_helper.rb finishes the job:

  # top-level method to output one big sitemap xml string
  def xml_sitemap
    setup
    tag(:url_set, :xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') do
      output = "\n" + xml_traverse(@@pages) { |page| xml_url_node(page) }
    end
  end
  
  # recursive traversal of child nodes
  def xml_traverse(page, &block)
    return unless block_given?
    output = yield( page ) + "\n"
    if (page.has_children?)
      output << page.child.collect{ |child| xml_traverse(child, &block) }.join
    end
    output
  end
  
  def xml_url_node(page)
    tag(:url) do
      "\n\t" + 
      [
        tag(:loc) { 'http://www.zetetic.net' + page.relative },
        tag(:lastmod) { Time.now.strftime('%Y-%m-%d') },
        tag(:changefreq) { 'weekly' },
      ].join("\n\t") + "\n"
    end
  end
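For anyone who wants to try the traversal outside of staticmatic, here is a minimal standalone sketch of the same idea. Page is a stand-in Struct (an assumption, not the real NavigationHelper::Page), and each_page, sitemap_xml, the example.com base URL and the fixed date are all made up for illustration.

```ruby
# Stand-in for NavigationHelper::Page (an assumption): a node with a
# relative URL and a list of child nodes.
Page = Struct.new(:relative, :children)

# Depth-first walk: yield the node itself, then every descendant.
def each_page(page, &block)
  yield page
  page.children.each { |child| each_page(child, &block) }
end

# Build the sitemap as one string. base_url and lastmod are passed in
# rather than hard-coded.
def sitemap_xml(root, base_url, lastmod)
  urls = []
  each_page(root) do |page|
    urls << "  <url>\n" \
            "    <loc>#{base_url}#{page.relative}</loc>\n" \
            "    <lastmod>#{lastmod}</lastmod>\n" \
            "  </url>\n"
  end
  %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n#{urls.join}</urlset>\n)
end

# Made-up page tree.
root = Page.new('/', [
  Page.new('/about.html', []),
  Page.new('/blog/', [Page.new('/blog/first.html', [])])
])

xml = sitemap_xml(root, 'http://www.example.com', '2010-06-01')
puts xml
```

One thing to watch: the sitemaps.org schema names the root element urlset, whereas tag(:url_set, ...) emits url_set. The sketch also does not escape special characters in URLs, which a real sitemap would need.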

Then I created the following document at src/pages/sitemap.haml:

- @layout = 'blank'
= xml_sitemap

I had to create a literally blank layout document that simply contains '= yield'. This gives me:

<url> 
<lastmod>2010-06-01</lastmod>
<changefreq>weekly</changefreq> 
</url> 
...
</url_set>

Great success! All my pages are properly listed.

The only problem here is that when I build, staticmatic will build sitemap.html, and I really want sitemap.xml. Now we get into the monkey-patching. I was trying to do as little as possible from within config/site.rb without having to re-write entire methods. But the further I go into this, the more I think it would require some serious refactoring to support. Anyway:

First I set about messing with the build process to get what I wanted: if the template was named 'sitemap.xml.haml', then it should build 'sitemap.xml'. To do this, I modified StaticMatic::BuildMixin so that generate_site_file wouldn't do any messing with the extension, allowing save_page to make the call:

module StaticMatic::BuildMixin
  def save_page(filename, content)
    extension = File.extname(filename)
    filename << '.html' if extension == ''
    generate_site_file(filename, content)
  end

  def save_stylesheet(filename, content)
    generate_site_file(File.join('stylesheets', filename) + '.css', content)
  end

  def generate_site_file(filename, content)
    path = File.join(@site_dir,filename)
    FileUtils.mkdir_p(File.dirname(path))
    File.open(path, 'w+') do |f|
      f << content
    end
    
    puts "created #{path}"
  end
end

It's cheap, but it works. I re-name sitemap.haml to sitemap.xml.haml and voila, I've got what I need.
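The extension rule in the patched save_page boils down to a one-liner. The sketch below isolates it; output_filename is a hypothetical name, and unlike the patch it returns a new string instead of appending to the argument in place.

```ruby
# Hypothetical helper mirroring the patched save_page rule: a name that
# already carries an extension is kept as-is; a bare name gets ".html".
def output_filename(name)
  File.extname(name).empty? ? "#{name}.html" : name
end

puts output_filename('about')
puts output_filename('sitemap.xml')
puts output_filename('blog/index')
```

Returning a fresh string avoids surprising the caller, since filename << '.html' modifies the string that was passed in.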

That takes care of building. When you run 'staticmatic build <dir>', the file site/sitemap.xml will be properly generated. But it won't ever get updated again until you remove it yourself, because now we've created a static file whose name doesn't match the 'filename' recognized internally by staticmatic during render and subsequent builds. In preview mode the module attempts to find a template matching the path name after chopping off the extension, and in build mode the module tries to avoid obliterating truly static files like your images, etc.

So, instead of just deleting the file before each re-build, I figured I'd try and get staticmatic to do the right thing. I started down that road by modifying the template_exists? method just to see how this would all break down:

module StaticMatic  
  class Base    
    def template_exists?(name, dir = '')
      File.exist?(File.join(@src_dir, 'pages', dir, "#{name}.haml")) || 
      File.exist?(File.join(@src_dir, 'stylesheets', "#{name}.sass")) || 
      File.exist?(File.join(@src_dir, 'stylesheets', "#{name}.scss")) ||
      File.exist?(File.join(@src_dir, 'pages', dir, "#{name}.xml.haml")) || # kludge to support sitemap.xml.haml :-/
      File.exist?(File.join(@src_dir, 'pages', dir, "#{name}.html.haml")) # more kludge-in' just to try it
    end
  end
end

With this, staticmatic in build mode successfully determines that, yes, "sitemap.xml.haml" does exist. But it breaks down during render:

StaticMatic::Server#call chops off the extension name of the request (so localhost:3000/sitemap.xml => sitemap), so then I monkey-patched generate_html_with_layout to accept a file extension param:

module StaticMatic::RenderMixin
  def generate_html_with_layout(source, extension = '', source_dir = '')
    source = "#{source}.#{extension}" if extension != ''
    @current_page = File.join(source_dir, "#{source}.html")
    @current_file_stack.unshift(File.join(source_dir, "#{source}.haml"))
    begin 
      template_content = generate_html(source, source_dir)
      generate_html_from_template_source(source_for_layout) { template_content }
    rescue Exception => e
      render_rescue_from_error(e)
    ensure
      clear_template_variables!
      @current_page = nil
      @current_file_stack.shift
    end
  end
end

This doesn't work, despite modifying Server#call to pass it in:

module StaticMatic
  class Server
    def call(env)
      @staticmatic.load_helpers
      path_info = env["PATH_INFO"]

      file_dir, file_name, file_ext = expand_path(path_info)

     ... skip...

      begin
        if file_ext == "css"
          res.write @staticmatic.generate_css(file_name, file_dir)
        else
          res.write @staticmatic.generate_html_with_layout(file_name, file_ext, file_dir)
        end
      rescue StaticMatic::Error => e
        res.write e.message
      end

      res.finish
    end
  end
end

It's a bad kludge, but it does get me closer to seeing the problem, as requesting 'localhost:3000/sitemap' (no extension) gives me:

Errno::ENOENT
No such file or directory - ./src/pages/sitemap.html.haml
Stack Trace
/opt/local/lib/ruby/gems/1.8/gems/staticmatic-0.11.0.alpha.5/bin/../lib/staticmatic/template_error.rb:10:in `read'
/opt/local/lib/ruby/gems/1.8/gems/staticmatic-0.11.0.alpha.5/bin/../lib/staticmatic/template_error.rb:10:in `initialize'
/opt/local/lib/ruby/gems/1.8/gems/staticmatic-0.11.0.alpha.5/bin/../lib/staticmatic/mixins/render.rb:30:in `new'
/opt/local/lib/ruby/gems/1.8/gems/staticmatic-0.11.0.alpha.5/bin/../lib/staticmatic/mixins/render.rb:30:in `generate_html'
(eval):64:in `generate_html_with_layout'
(eval):103:in `call'
... snip ...

Note that it is looking for the wrong template file. 

I think to really make this happen, I'll have to end up doing a LOT of re-wiring under the hood here. I'm not sure it's entirely worth it. Stephen, if you have a moment, can you tell me if I am barking up the right tree here?

I'm starting to think maybe it would be better if I added a structure like the NavigationHelper::Page/@@pages to staticmatic itself, and used it from within the rest of the build and render process to determine what the basename of the file is, if there is a secondary extension indicating content type, etc.
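To make that idea concrete, here is a rough sketch of what such a structure might record for each template. Everything in it is hypothetical (TemplateName, parse_template_name, the 'html' default); none of it exists in staticmatic.

```ruby
# Hypothetical value object: the pieces of a template file name.
TemplateName = Struct.new(:basename, :output_ext, :engine) do
  # Name of the file a build would write, e.g. "sitemap.xml".
  def output_name
    "#{basename}.#{output_ext}"
  end
end

# Split "sitemap.xml.haml" into basename, output extension and engine.
# A template without a secondary extension falls back to default_ext.
def parse_template_name(path, default_ext = 'html')
  parts = File.basename(path).split('.')
  engine = parts.pop
  output_ext = parts.size > 1 ? parts.pop : default_ext
  TemplateName.new(parts.join('.'), output_ext, engine)
end

sitemap = parse_template_name('src/pages/sitemap.xml.haml')
puts sitemap.output_name

index = parse_template_name('src/pages/index.haml')
puts index.output_name
```

With the basename and output extension kept separately, both the build step and the preview server could ask the same object which file a template produces, instead of each guessing from the request path.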

Sorry for the long story, and thanks for reading!

--
Billy Gray
http://zetetic.net

Stephen Bartholomew

Jun 1, 2010, 6:08:02 PM
to StaticMatic
Hey Billy,

You're absolutely right that the internal structure of staticmatic
makes this kind of thing a real pita - I've been finding the same. If
you're prepared to wait a week or two, I'll be pushing up some *major*
changes to the way things work which will make this really easy:

1) Proper internal representation of source files - you'll have access
to something like: @staticmatic.pages which will give you an array of
pages represented as objects that you can pull data from.
2) Build file extensions - this is a long time coming, but you'll be
able to name your file: sitemap.xml.haml - and that's the file that
will be built.

Your site map will be dead simple then: create a file called
sitemap.xml.haml and iterate @staticmatic.pages in haml to produce
your XML sitemap.

I'm working on documentation for the next few days, but these features
are top of my list after that. Seems a shame to waste all your hard
work, but hopefully this will give you a more elegant solution!

Any thoughts?

Steve

Billy Gray

Jun 2, 2010, 9:47:24 AM
to stati...@googlegroups.com
Hi Stephen,

That sounds like just the thing! I'm not in any hurry over here, but if you think there's any small part of it you'd like some help with, don't be afraid to ask.

Thanks so much!
Billy
