Alternative to preserve() helper?


Wincent Colaiuta

Feb 8, 2008, 5:11:16 AM
to Haml
Hi all!

I'm working on a fast wikitext-to-HTML translator and I'm embedding
the output in Haml templates. The trouble is that if the translated
text has a <pre> block in it:

<pre>foo
bar</pre>

And appears inside a Haml template:

.outer
  .inner
    =wikitext

The <pre> formatting is broken:

<div class="outer">
  <div class="inner">
    <pre>foo
    bar</pre>
  </div>
</div>

Note how the second line of the embedded <pre> block is indented and
unfortunately that means it will appear in the browser as:

foo
    bar

Rather than the intended:

foo
bar
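
The effect is easy to reproduce outside Haml. This is a minimal sketch, assuming indentation is applied by prefixing every line with two spaces per nesting level; the `indent` helper is hypothetical and not part of Haml's API:

```ruby
# Hypothetical helper: prefix every line of +text+ with two spaces per level.
def indent(text, levels)
  text.gsub(/^/, '  ' * levels)
end

wikitext = "<pre>foo\nbar</pre>"

# The second line of the <pre> block picks up the same prefix as the first,
# and inside <pre> that leading whitespace is significant to the browser.
puts indent(wikitext, 2)
```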

So I looked at using the preserve() helper and that does indeed
produce output that renders correctly in the browser but looks like
this in the source:

<div class="outer">
  <div class="inner">
    <pre>foo&#x000A;bar</pre>
  </div>
</div>

While this would be fine for small blocks it isn't so nice for large
slabs of text, and large slabs of text are the most common use in a
wiki, unfortunately. The wikitext translator goes to some effort to
emit nicely aligned, easy-to-debug output text (much like Haml!) so it
is a shame to see it transformed into a big blob of hard to decipher
gibberish.
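
For reference, the idea behind the escaped output can be sketched in a few lines. This is not Haml's implementation; `escape_pre_newlines` and `indent` are hypothetical names, and the sketch only handles plain `<pre>` tags without attributes:

```ruby
# Hypothetical stand-in for the idea behind preserve(): escape the newlines
# inside each <pre> block so that later indentation cannot touch them.
def escape_pre_newlines(html)
  html.gsub(%r{<pre>.*?</pre>}m) { |block| block.gsub("\n", '&#x000A;') }
end

# Hypothetical helper: prefix every line with two spaces per level.
def indent(text, levels)
  text.gsub(/^/, '  ' * levels)
end

escaped = escape_pre_newlines("<pre>foo\nbar</pre>")

# The block is now a single line, so only its first line can be indented.
puts indent(escaped, 2)
```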

The other thing is that the wikitext parser is a C extension written
with speed as its number one design goal. It seems a shame to slow
things down by having Haml make another sweep over that text when it's
already been processed. Basically I am looking for a way of outputting
a block of literal text without Haml doing any manipulation of it at
all.

So I tried to investigate alternatives to the preserve() helper. I
tried to write my own helper method which basically did the following
(in pseudo-code):

def preserving &block
  tab_down back to 0
  yield
  tab_up back to where you were before
end

But that didn't work. Inspection of the Haml source code revealed that
there's "tabulation" but also "real_tabs", which gets prepended. So my
next attempt was the following hack which directly manipulated the
@tabulation and @real_tabs instance variables (hideous I know):

def preserving &block
  tabs = buffer.instance_variable_get :@tabulation
  real_tabs = buffer.instance_variable_get :@real_tabs
  buffer.instance_variable_set :@tabulation, 0
  buffer.instance_variable_set :@real_tabs, 0
  yield
  buffer.instance_variable_set :@tabulation, tabs
  buffer.instance_variable_set :@real_tabs, real_tabs
end

But unfortunately that didn't work either. There was still some
additional whitespace coming in from somewhere.

That was at 2:30 AM, so my investigations ended there.

Now it's a new day though, so I thought I'd ask here about
alternatives to preserve(). Is there a non-hacky way to do what I
want?

And ultimately, seeing as what I am looking for is a way to say to
Haml "render the following text without touching it _at all_", would
you accept a patch to introduce such a method if I can come up with
one?

Cheers,
Wincent

Wincent Colaiuta

Feb 8, 2008, 8:27:36 AM
to Haml
Ok, I've now done a bit more investigation and now understand what
Haml is doing under the covers.

The reason why my hack manipulating the tabulation didn't work is
because of the way templates are nested in Rails with Haml.

Basically, let's say you suppress the tabulation in your "show"
template using the hack shown above. I've found that it's only
necessary to set @real_tabs to 0, as in the end what's going to get
called is push_script() and that only cares about @real_tabs, not
@tabulation.

But what happens next is that your application layout gets rendered,
and inside it is a yield call that gets the output of the precompiled
"show" template. This means that the output goes through push_script()
a second time and you get unwanted indentation again.

This is a tricky issue: you need some way for nested templates to
communicate upwards to the templates that enclose them, and advise
them not to meddle with the tabulation.
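
The double pass can be modelled with plain strings. This is a rough sketch only: `indent` is a hypothetical helper standing in for whatever the enclosing template does to the text it receives, not Haml's own code:

```ruby
# Hypothetical helper: prefix every line with two spaces per level.
def indent(text, levels)
  text.gsub(/^/, '  ' * levels)
end

# The "show" template suppresses its own indentation (level 0)...
show_output = indent("<pre>foo\nbar</pre>", 0)

# ...but the layout indents whatever its yield returns, <pre> block included.
layout_output = "<body>\n" + indent(show_output, 1) + "\n</body>"
puts layout_output
```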

I've got a new hack that works, but it is terribly ugly. Here it is:

Here's a snippet of my "show" template:

-preserving do
  =@the_wikitext

And here's the updated preserving helper:

def preserving &block
  real_tabs = buffer.instance_variable_get :@real_tabs
  buffer.instance_variable_set :@real_tabs, 0
  buffer.buffer << "<!-- Haml: start pre -->\n"
  yield
  buffer.buffer << "<!-- Haml: end pre -->\n"
  buffer.instance_variable_set :@real_tabs, real_tabs
end

Notice how it not only suppresses indentation but also inserts
HTML comments into the buffer, which will be visible to any
higher-level template that encloses it.

And finally, here is the patch to the push_script method that scans
for these HTML comments and acts accordingly:

diff --git a/vendor/plugins/haml/lib/haml/buffer.rb b/vendor/plugins/haml/lib/haml/buffer.rb
index 4bdac1d..5e8638c 100644
--- a/vendor/plugins/haml/lib/haml/buffer.rb
+++ b/vendor/plugins/haml/lib/haml/buffer.rb
@@ -78,8 +78,22 @@ module Haml
         @buffer << "\n"
       end

-      result = result.gsub(/^/m, tabs(tabulation))
-      @buffer << "#{result}\n"
+      tab = tabs(tabulation)
+      pre = false
+      result.each do |line|
+        case line
+        when /<!-- Haml: start pre -->/
+          pre = true
+        when /<!-- Haml: end pre -->/
+          pre = false
+        else
+          if pre
+            @buffer << "#{line}"
+          else
+            @buffer << "#{tab}#{line}"
+          end
+        end
+      end

       if close_tag
         @buffer << "#{tabs(tabulation-1)}</#{close_tag}>\n"

Yes, it is a horrible hack, and probably quite slow, but it works. I'd
still like to find a more elegant solution though. I can't think of a
way to communicate this stuff back up into enclosing templates without
embedding some kind of marker like that in the text.
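
The scanning logic in that patch can be tried in isolation. The following is a standalone sketch of the same idea on plain strings; `indent_outside_markers` is a hypothetical name, and nothing here touches Haml's real buffer:

```ruby
START_MARK = '<!-- Haml: start pre -->'
END_MARK   = '<!-- Haml: end pre -->'

# Indent every line of +text+ except those between the two markers.
# The marker lines themselves are dropped, as in the patch.
def indent_outside_markers(text, tab)
  out = +''
  pre = false
  text.each_line do |line|
    if line.include?(START_MARK)
      pre = true
    elsif line.include?(END_MARK)
      pre = false
    else
      out << (pre ? line : "#{tab}#{line}")
    end
  end
  out
end

inner = "<p>intro</p>\n#{START_MARK}\n<pre>foo\nbar</pre>\n#{END_MARK}\n<p>outro</p>\n"
puts indent_outside_markers(inner, '  ')
```

Only the two paragraph lines gain a prefix; the lines between the markers come through byte for byte.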

And on a side note, a minor bug found while inspecting the code in that
file:

diff --git a/vendor/plugins/haml/lib/haml/buffer.rb b/vendor/plugins/haml/lib/haml/buffer.rb
index 4bdac1d..5e8638c 100644
--- a/vendor/plugins/haml/lib/haml/buffer.rb
+++ b/vendor/plugins/haml/lib/haml/buffer.rb
@@ -152,7 +166,7 @@ module Haml
     # Gets <tt>count</tt> tabs. Mostly for internal use.
     def tabs(count)
       tabs = count + @tabulation
-      ' ' * tabs
+      ' ' * tabs # BUG: <-- this line has no effect
       @@tab_cache[tabs] ||= ' ' * tabs
     end

Note how the second line has no side effects so is really just wasting
time. It effectively renders the optimization that that method tries
to implement (avoiding the recalculation of the tab text) ineffective,
because it recalculates it anyway and just throws it away.
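
For comparison, here is what the cache looks like without the stray line. This is an illustrative sketch rather than the Haml source: `TAB_CACHE` and `cached_tabs` are hypothetical names, and two spaces per level is an assumption:

```ruby
# Hypothetical stand-in for the tab cache: build each indentation string
# once and hand back the same object on later calls.
TAB_CACHE = {}

def cached_tabs(count)
  TAB_CACHE[count] ||= '  ' * count
end

first  = cached_tabs(3)
second = cached_tabs(3)

# Both calls return the same String object, so nothing is rebuilt.
puts first.equal?(second)
```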

Cheers,
Wincent

Nathan Weizenbaum

Feb 8, 2008, 9:32:21 AM
to ha...@googlegroups.com
This is why we use the newline escapes: nested templates don't touch them at all. Note that your solution still falls down if you have a three-deep nested template.

If you're using Rails, the best solution would probably be to post-process the template and change all newline escapes to literal newlines.
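
A minimal sketch of that post-processing step, assuming the rendered page is available as a string; `unescape_newlines` is a hypothetical name, and where to hook it into Rails is left open:

```ruby
# Turn the newline escapes back into literal newlines once all templates,
# including the layout, have finished rendering.
def unescape_newlines(html)
  html.gsub('&#x000A;', "\n")
end

rendered = "<div class=\"outer\">\n  <pre>foo&#x000A;bar</pre>\n</div>"
puts unescape_newlines(rendered)
```

Note that this replaces every escape in the page, not only those inside `<pre>` blocks.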

As for the extra work in tabs, that's certainly not supposed to be there. I've gotten rid of it in the repo.

- Nathan

Wincent Colaiuta

Feb 8, 2008, 12:05:57 PM
to Haml
On 8 feb, 15:32, "Nathan Weizenbaum" <nex...@gmail.com> wrote:
> This is why we use the newline escapes: nested templates don't touch them at
> all. Note that your solution still falls down if you have a three-deep
> nested template.

I hadn't tested that so I'll take your word for it. I imagine I could
avoid that by echoing the markers through rather than suppressing
them; I probably should have done that anyway. Of course, even if it
works with any level of nesting it's still a hack.
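
To illustrate the difference between dropping and echoing the markers, here is a sketch on plain strings. `indent_level` and its `echo:` flag are hypothetical, and each call stands for one enclosing template scanning the text it receives:

```ruby
MARKS = { start: '<!-- Haml: start pre -->', stop: '<!-- Haml: end pre -->' }

# One level of indentation. Lines between the markers are left alone.
# With echo: true the marker lines are passed on for the next level to see.
def indent_level(text, tab, echo:)
  pre = false
  text.each_line.map do |line|
    if line.include?(MARKS[:start])
      pre = true
      echo ? line : ''
    elsif line.include?(MARKS[:stop])
      pre = false
      echo ? line : ''
    else
      pre ? line : "#{tab}#{line}"
    end
  end.join
end

inner = "#{MARKS[:start]}\n<pre>foo\nbar</pre>\n#{MARKS[:stop]}\n"

# Dropping the markers protects the block for one level only.
dropped = indent_level(indent_level(inner, '  ', echo: false), '  ', echo: false)

# Echoing them keeps the block protected at every level.
echoed = indent_level(indent_level(inner, '  ', echo: true), '  ', echo: true)

puts dropped
puts echoed
```

The cost of echoing is that the marker comments remain in the final page.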

I have another nascent idea that may work and which I like much better
than this hacky one I've shown so far. Basically if I can train
push_script to accept an Array parameter rather than a straight String
then it could act in a way that preserves whitespace when appropriate
and leaves it untouched otherwise through any number of nested
template levels.

Basically it would act as follows:

* if param is a String, act just like it does now
* if param is an Array, iterate and handle each item as it does now
* if param (or any item during iteration) is a special String
subclass, say PreservingString, suppress the tabulation and instead of
appending the result to the existing @buffer, start a new one marked
as preserving whitespace and stick the result in there (not sure yet
whether this should be a Haml::Buffer subclass or just a normal
instance with an attribute set)
* this would be accompanied by a helper method that instantiates one
of these PreservingString instances

Then elsewhere in the code (at the Engine level I guess, haven't
explored it yet) I'd need to make changes so that it was prepared to
handle multiple buffers instead of just one. Evidently I'd need to
change the code wherever it assumed a single buffer and replace that
with an Array. Probably haven't explained my idea very well but
hopefully you get some idea of what I'm suggesting.
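
A very rough sketch of the core of that idea, leaving out the multiple-buffer bookkeeping entirely. `PreservingString` is the name proposed above; `push_chunks` is a hypothetical stand-in for push_script and works on plain arrays rather than Haml buffers:

```ruby
# Marker class from the proposal: a String whose whitespace must survive
# every level of template nesting.
class PreservingString < String; end

# Indent plain strings; pass PreservingString items through untouched.
# Accepts a single String or an Array of chunks and always returns an Array,
# so an enclosing template can apply the same rule again.
def push_chunks(param, tab)
  Array(param).map do |chunk|
    chunk.is_a?(PreservingString) ? chunk : chunk.gsub(/^/, tab)
  end
end

show = push_chunks(
  ["<p>intro</p>\n", PreservingString.new("<pre>foo\nbar</pre>\n")],
  '  '
)
layout = push_chunks(show, '  ')

# The paragraph is indented twice; the preserved chunk is never touched.
puts layout.join
```

Because the preserved chunk keeps its class as it travels upwards, no marker text has to be embedded in the output.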

It's just an idea which I haven't tried to put into practice yet so it
might be flawed.

Cheers,
Wincent

Nathan Weizenbaum

Feb 10, 2008, 1:31:37 AM
to ha...@googlegroups.com
This seems like a lot of extra work for a relatively minor issue, and it
might significantly affect performance - code needs to be really tight
when dealing with the buffer stuff. I really think the best way to deal
with this is to monkeypatch a post-processor into #render.
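
One way such a monkeypatch might look, shown against a stand-in class rather than Haml itself. `DemoEngine` and `UnescapeNewlines` are hypothetical names, and the sketch uses Module#prepend, which only exists in later Ruby versions than the one current at the time of this thread:

```ruby
# Stand-in for a template engine; NOT Haml's real class.
class DemoEngine
  def render
    "<div>\n  <pre>foo&#x000A;bar</pre>\n</div>"
  end
end

# Post-processor wrapped around #render: call the original method, then
# turn the newline escapes back into literal newlines.
module UnescapeNewlines
  def render(*args, &block)
    super.gsub('&#x000A;', "\n")
  end
end

DemoEngine.prepend(UnescapeNewlines)
puts DemoEngine.new.render
```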

- Nathan

Wincent Colaiuta

Feb 10, 2008, 9:12:51 AM
to Haml
On 10 feb, 07:31, Nathan Weizenbaum <nex...@gmail.com> wrote:
> This seems like a lot of extra work for a relatively minor issue, and it
> might significantly affect performance - code needs to be really tight
> when dealing with the buffer stuff. I really think the best way to deal
> with this is to monkeypatch a post-processor into #render.

I've never done that before but I'll look into it.

Still, a bit of a shame. I've gone to a lot of effort to make the
wikitext-to-HTML translator very fast and produce very nicely
formatted output. I haven't finished optimizing it yet but it's
currently capable of translating about 10 megabytes of wikitext markup
per second on my lowly 1.83 GHz Core Duo iMac. So it's a bit of a
shame to see that effort counteracted by two essentially redundant
additional phases (one to insert the newline escapes and another to
remove them), but if it's the way it has to be, then I guess it's the
way it has to be!

Cheers,
Wincent

Nathan Weizenbaum

Feb 10, 2008, 2:23:26 PM
to ha...@googlegroups.com
The insertion and removal of the escapes aren't really redundant -
they're a way to avoid Haml indenting your pres. It's not a lot
different than adding and removing comments for the same purpose.

- Nathan
