wiki markup pain

blinder

Jun 23, 2009, 2:11:41 PM

to Gitorious

trying to make a wiki page, that includes markup. first i was
disappointed to see there's no support for tables. makes making
sophisticated pages a little impossible, but oh well, it is what it
is.

but what has been a real pain is dealing with code markup. i'm trying
to make a wiki page that includes XML markup (which i want to display
as code) well, nothing doing.

if i use HTML entities (&lt; &gt;) they don't get translated to their
appropriate characters, and if i don't use entities, well then you
can't see the tags.

I just wish this worked like mediawiki or even (gasp) google code's
wiki system. any hope of getting a better markup system than markdown
(which seems rather incomplete) ???

Johan Sørensen

Jun 24, 2009, 4:11:37 AM

to gito...@googlegroups.com

On Tue, Jun 23, 2009 at 8:11 PM, blinder<blinde...@gmail.com> wrote:
> but what has been a real pain is dealing with code markup. i'm trying
> to make a wiki page that includes XML markup (which i want to display
> as code) well, nothing doing.
>
> if i use HTML entities (&lt; &gt;) they don't get translated to their
> appropriate characters, and if i don't use entities, well then you
> can't see the tags.

In what format are you inserting it?

JS

Waylan Limberg

Jun 24, 2009, 8:54:09 AM

to Gitorious

On Jun 23, 2:11 pm, blinder <blinder.d...@gmail.com> wrote:
> I just wish this worked like mediawiki or even (gasp) google code's
> wiki system. any hope of getting a better markup system than markdown
> (which seems rather incomplete)

Actually, (at least IMO) markdown is great. It's just that the
Gitorious implementation is broken. For obvious safety reasons, they
are not allowing raw HTML (although markdown does by default). The
problem is that code blocks should allow HTML, XML and its variants, as
markdown will escape those. However, whatever Gitorious is using to
strip raw HTML is a little too zealous and strips the code blocks as
well. In fact, it is so zealous that code blocks (python, ruby,
whatever) which contain a lot of greater-than and less-than logic will
have whole sections of code missing.

Now, I will note that some of those greater-than and less-than signs
will make it through. I suspect that the logic of the stripper could
use some tweaking. However, why is it even looking at the code blocks?

Until this is fixed, I consider the wiki unusable.

Waylan Limberg

Jun 24, 2009, 12:00:59 PM

to Gitorious

On Jun 24, 8:54 am, Waylan Limberg <way...@gmail.com> wrote:
>
> Now, I will note that some of those greater-than and less-than signs
> will make it through. I suspect that the logic of the stripper could
> use some tweaking. However, why is it even looking at the code blocks?
>

As an example of the problem, some time ago I made this page which
demonstrates the simplest case where the problem appears:

http://gitorious.org/python-markdown/pages/Sandbox

The source of that page looks like this:

foo bar

    some code '<'

blah foo

    '<more' code

baz bar

    blah >

As you can see, everything between and including `<more` and `blah >`
is stripped. It even strips across multiple blocks and paragraphs.
Again, why is it even looking at the code blocks?!?
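
A minimal sketch of how this kind of loss can happen, assuming the sanitizer behaves roughly like a regex-based tag stripper. The `naive_strip_tags` helper is hypothetical and written only for this illustration; it is not the actual Gitorious or Rails code. Run against the page source shown in this post, the lone `<` survives (no letter follows it), while everything from `<more` through `blah >` is treated as one long "tag" and removed:

```ruby
# Hypothetical stand-in for a sanitizer that knows nothing about Markdown:
# anything from "<" plus a letter up to the next ">" is treated as a tag
# and dropped, even when that span crosses lines and paragraphs.
def naive_strip_tags(text)
  text.gsub(/<[a-zA-Z][^>]*>/, "")
end

# The Sandbox page source from this post (code lines indented four spaces).
def sandbox_source
  <<~MARKDOWN
    foo bar

        some code '<'

    blah foo

        '<more' code

    baz bar

        blah >
  MARKDOWN
end

puts naive_strip_tags(sandbox_source)
```

Because the stripper runs on the raw text, it has no way of knowing that the `<` and `>` characters sit inside code blocks.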

Johan Sørensen

Jun 30, 2009, 5:20:36 AM

to gito...@googlegroups.com

On Wed, Jun 24, 2009 at 6:00 PM, Waylan Limberg<way...@gmail.com> wrote:
> http://gitorious.org/python-markdown/pages/Sandbox
>
> The source of that page looks like this:
>
> foo bar
>
>     some code '<'
>
> blah foo
>
>     '<more' code
>
> baz bar
>
>     blah >
>
> As you can see, everything between and including `<more` and `blah >`
> is stripped. It even strips across multiple blocks and paragraphs.
> Again, why is it even looking at the code blocks?!?

You tell me:

$ echo "    code '<'" > foo.txt
$ cat foo.txt | Markdown.pl; # Gruber's Markdown.pl
<pre><code>code '<'
</code></pre>
$ markdown foo.txt; # python-markdown
<pre><code>code '<'
</code></pre>

With your example text:
$ diff -u python.markdown gruber.markdown
--- python.markdown 2009-06-30 11:17:51.000000000 +0200
+++ gruber.markdown 2009-06-30 11:17:57.000000000 +0200
@@ -1,9 +1,14 @@
 foo bar
+
 <pre><code>some code '<'
 </code></pre>
+
 blah foo
+
 <pre><code>'<more' code
 </code></pre>
+
 baz bar
+
 <pre><code>blah >
-</code></pre>
\ No newline at end of file
+</code></pre>

But I agree it doesn't make sense really, if it's ending up wrapped in
<pre><code> combo browsers will afaik escape it. But you probably know
more about Markdown than I do.

JS

Waylan Limberg

Jul 1, 2009, 1:14:54 PM

to Gitorious

On Jun 30, 5:20 am, Johan Sørensen <jo...@johansorensen.com> wrote:
>
> With your example text:
> $ diff -u python.markdown gruber.markdown
> --- python.markdown 2009-06-30 11:17:51.000000000 +0200
> +++ gruber.markdown 2009-06-30 11:17:57.000000000 +0200

[snip]

The issue is not between the various markdown implementations (which
in your diff consists of a few differences with insignificant
whitespace), but rather, between what Gitorious outputs and what is
expected. Consider this diff using the same example source:

--- gruber.markdown 2009-07-01 13:02:23.000000000 -0400
+++ gitorious.markdown 2009-07-01 12:59:36.000000000 -0400
@@ -1,15 +1,10 @@
 foo bar

-<pre><code>some code '<'
+<pre><code>some code '&lt;'
 </code></pre>

 blah foo

-<pre><code>'<more' code
-</code></pre>
-
-baz bar
-
-<pre><code>blah >
+<pre><code>'
 </code></pre>

I have entire sections of paragraphs completely missing from my
documents because those sections are between code blocks with a few <
and > signs in them. This is unacceptable and renders the Gitorious
wiki unusable.

> But I agree it doesn't make sense really, if it's ending up wrapped in
> <pre><code> combo browsers will afaik escape it.

Well, actually the markdown parser will take care of escaping it in a
way that the browser can understand, but yes, no other code should be
trying to escape the code blocks before or after markdown runs on the
text.

Waylan Limberg

Jul 1, 2009, 2:14:39 PM

to Gitorious

On Jun 30, 5:20 am, Johan Sørensen <jo...@johansorensen.com> wrote:

> But you probably know more about Markdown than I do.
>

I do know that most implementations have an HTML sanitizer built in to
avoid these kinds of problems. In fact, it appears that RDiscount does
- which you should be using instead of sanitizing it yourself. I
finally looked at the code to see what Gitorious was doing.

Now, I'm not all that familiar with ruby and haven't tested this, but
I believe you want to make this change:

--- app/helpers/pages_helper.rb 2009-06-23 08:54:31.000000000 -0400
+++ app/helpers/pages_helper.rb 2009-07-01 14:04:08.000000000 -0400
@@ -22,7 +22,7 @@

   def wikize(content)
     content = wiki_link(content)
-    rd = RDiscount.new(sanitize(content), :smart, :generate_toc)
+    rd = RDiscount.new(content, :smart, :filter_html, :generate_toc)
     content = content_tag(:div, rd.to_html, :class => "page-content")
     toc_content = rd.toc_content
     if !toc_content.blank?

Notice that I removed the call to `sanitize` (which is not markdown
aware and is causing the problem) and added the `:filter_html` attribute.
The builtin html filter should accomplish the same thing as sanitize,
but in a way that won't break the markdown syntax.

Johan Sørensen

Jul 2, 2009, 6:34:45 AM

to gito...@googlegroups.com

On Wed, Jul 1, 2009 at 7:14 PM, Waylan Limberg<way...@gmail.com> wrote:
> On Jun 30, 5:20 am, Johan Sørensen <jo...@johansorensen.com> wrote:
>>
>> With your example text:
>> $ diff -u python.markdown gruber.markdown
>> --- python.markdown 2009-06-30 11:17:51.000000000 +0200
>> +++ gruber.markdown 2009-06-30 11:17:57.000000000 +0200
> [snip]
>
> The issue is not between the various markdown implementations (which
> in your diff consists of a few differences with insignificant
> whitespace), but rather, between what Gitorious outputs and what is
> expected. Consider this diff using the same example source:
>
> --- gruber.markdown 2009-07-01 13:02:23.000000000 -0400
> +++ gitorious.markdown 2009-07-01 12:59:36.000000000 -0400
> @@ -1,15 +1,10 @@
> foo bar
>
> -<pre><code>some code '<'
> +<pre><code>some code '&lt;'
> </code></pre>

[snip]

> I have entire sections of paragraphs completely missing from my
> documents because those sections are between code blocks with a few <
> and > signs in them. This is unacceptable and renders the Gitorious
> wiki unusable.

Waylan, first off, thanks for spelling it out like this for me
(again!), really appreciate the effort. It seems I completely
misunderstood what the issue was: I thought your beef was with the
encoding of "<" to a "&lt;", but now I understand what you mean.

>
>> But I agree it doesn't make sense really, if it's ending up wrapped in
>> <pre><code> combo browsers will afaik escape it.
>
> Well, actually the markdown parser will take care of escaping it in a
> way that the browser can understand, but yes, no other code should be
> trying to escape the code blocks before or after markdown runs on the
> text.

Actually, the issue is that we sanitize _before_ markdownizing it. We
should allow most html as per markdown, but we want to filter out any
XSS <script>'s etc. The correct way to do it is of course to pass it
through sanitize() _after_ we've run it through markdown. I'll deploy
it to Gitorious shortly.
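
The ordering described here, Markdown first and sanitizing second, can be sketched as below. This is only an illustration under stated assumptions: `toy_markdown` and `toy_sanitize` are hypothetical stand-ins written for this example, not RDiscount or the Rails `sanitize` helper, and the script filter is far too weak to serve as real XSS protection.

```ruby
# Escape the three characters that matter inside a code block.
def escape_html(text)
  text.gsub(/[&<>]/, { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" })
end

# Toy renderer (NOT RDiscount): blocks are separated by blank lines. A block
# whose lines all start with four spaces becomes an escaped <pre><code>;
# anything else becomes a <p> with raw HTML passed through untouched.
def toy_markdown(source)
  blocks = source.split(/\n{2,}/).reject { |block| block.strip.empty? }
  blocks.map do |block|
    lines = block.lines.map(&:chomp)
    if lines.all? { |line| line.start_with?("    ") }
      code = lines.map { |line| line[4..-1] }.join("\n")
      "<pre><code>#{escape_html(code)}\n</code></pre>"
    else
      "<p>#{block.strip}</p>"
    end
  end.join("\n\n")
end

# Toy sanitizer (NOT Rails' sanitize): removes <script> elements only.
def toy_sanitize(html)
  html.gsub(%r{<script\b.*?</script>}mi, "")
end

# Render first, so code blocks are already escaped when the sanitizer runs.
def render_wiki(source)
  toy_sanitize(toy_markdown(source))
end

puts render_wiki("intro <script>alert(1)</script>\n\n    '<more' code\n\nbaz bar\n\n    blah >\n")
```

By the time the sanitizer sees the text, the `<` and `>` inside code blocks are already entities, so only raw HTML in ordinary paragraphs is left for it to filter.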

Another thing I think I have to do, which unfortunately isn't valid
markdown, is to convert a single newline to a <br> tag (well,
space-space-newline and let discount make the <br>). This is something
I've gotten numerous reports from frustrated users about, and to be
honest I kinda agree with it. What's your opinion on such a change?

Cheers,
JS

Waylan Limberg

Jul 2, 2009, 9:39:23 AM

to Gitorious

On Jul 2, 6:34 am, Johan Sørensen <jo...@johansorensen.com> wrote:
>
> Actually, the issue is that we sanitize _before_ markdownizing it. We
> should allow most html as per markdown, but we want to filter out any
> XSS <script>'s etc. The correct way to do it is of course to pass it
> through sanitize() _after_ we've run it through markdown. I'll deploy
> it to Gitorious shortly.

Hmm, I'm not sure what this `sanitize()` does exactly (where is it in
the Gitorious source?), but I think you may need to do more than just
filter out <script> tags. See http://ha.ckers.org/xss.html. That's why
many markdown implementations provide a builtin html filter. The setup
generally recommended is to strip all html from any publicly editable
content and only allow html in markdown text provided by trusted
users. Or perhaps as all users must be logged in to edit pages you see
them as trusted enough. Obviously, that's your choice.

> Another thing I think I have to do, which unfortunately isn't valid
> markdown, is to convert a single newline to a <br> tag (well,
> space-space-newline and let discount make the <br>). This is something
> I've gotten numerous reports from frustrated users about, and to be
> honest I kinda agree with it. What's your opinion on such a change?

I'm not sure I understand what you're suggesting here. Are you
suggesting converting a single newline to <br>, or space-space-newline
to <br>? The latter is already supported and a part of the
markdown syntax. However, the former is not markdown syntax and is
generally *not* considered an acceptable addon by the markdown
community.

Remember that some of us still use text editors like vim or emacs and
don't have word-wrapping turned on. We have manual breaks throughout
every paragraph. If those newlines were to suddenly turn into <br>s we
would be as annoyed, if not more so, than others are now by the
current behavior. The current behavior (ignoring a single newline) is
*a feature* and one of the things I love about markdown over other
markup languages.

That said, I do agree that the current markdown syntax for manual
<br>s is not optimal as it is only defined by whitespace and therefore
completely non-obvious to the editor of an existing document.
Therefore, my suggestion would be to introduce a secondary syntax as
an alternate way to define a <br> that does not rely on whitespace
only (perhaps a slash followed by a newline as some other markup
languages use). That way, those of us familiar with markdown syntax
can continue to use the standard markdown syntax, while others can use
the added syntax if they desire.
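
One way such a secondary syntax could be layered on top of standard markdown is a small preprocessing pass, sketched below. The assumptions are mine: the marker is taken to be a trailing backslash, `expand_backslash_breaks` is a hypothetical helper name, and any line indented by four spaces or a tab is treated as code, which is a simplification (list continuations are indented too).

```ruby
# Rewrites "text\" at the end of a line into Markdown's standard hard break
# (two trailing spaces), leaving indented code lines untouched.
def expand_backslash_breaks(source)
  source.lines.map do |line|
    if line.start_with?("    ") || line.start_with?("\t")
      line                           # assumed to be code: keep verbatim
    else
      line.sub(/\\\n\z/, "  \n")     # trailing backslash -> two-space break
    end
  end.join
end

puts expand_backslash_breaks("first line\\\nsecond line\n\n    code \\\n")
```

Text without the marker passes through unchanged, so documents written in standard markdown are unaffected.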

Waylan Limberg

Diego Algorta

Jul 2, 2009, 12:02:20 PM

to gito...@googlegroups.com

I think what Johan says is to do the same (or very similar) as what
github does. Nicely explained here:
http://github.github.com/github-flavored-markdown/

--
Diego Algorta
www.oboxodo.com

Waylan Limberg

Jul 3, 2009, 11:53:31 AM

to Gitorious

I hope that doesn't happen -- for the previously stated reasons.

Diego Algorta

Jul 3, 2009, 12:02:10 PM

to gito...@googlegroups.com

I see your point on supporting the standard and I think you're right.
But github (and johan) have a point too. Maybe gitorious could support
the "github-flavored-markdown" (maybe johan can talk to the github
guys and agree to rename it as developers-flavored-markdown) as an
opt-in?

I think the best option would be to have a default option at a
user/project/whatever level and then be able to override it at the
wiki page level.

What do you both say?

Diego

Yuri Takhteyev

Jul 3, 2009, 12:19:58 PM

to gito...@googlegroups.com

> I see your point on supporting the standard and I think you're right.
> But github (and johan) have a point too.

Do they? The standard behavior not only has a benefit of being
standard, but also makes it possible to edit the files using editors
that do not make it easy to work with long lines, which means nearly all
terminal editors.

Call me a nerd, but it seems to me that not being able to easily edit
the files through a terminal is such a huge minus, that I don't see
what can compensate for it. Plus, in cases where you really care about
line breaks, wouldn't you want the text to show <pre> anyway?

> guys and agree to rename it as developers-flavored-markdown) as an

More like "developers-who-do-not-believe-in-wrapping-at-78-characters-flavored-markdown".
:)

- yuri

--
http://spu.tnik.org/

Johan Sørensen

Jul 3, 2009, 5:35:56 PM

to gito...@googlegroups.com

On Fri, Jul 3, 2009 at 6:19 PM, Yuri Takhteyev<takh...@gmail.com> wrote:
>> guys and agree to rename it as developers-flavored-markdown) as an
>
> More like "developers-who-do-not-believe-in-wrapping-at-78-characters-flavored-markdown".
> :)

Seems like I typed my previous email a bit too fast. I was actually
going to say that it would be limited to comments, since that's where
people tend to not think so much about the markdown format, as opposed
to wiki pages, where I, at least, am more in the "markdown zone"
rather than just reacting to something and needing to type it down
without worrying too much about formatting beyond links and code
snippets.

So, the newline hack would only be limited to comments, whereas the
wikipages would be standard markdown (with the sanitize fixes
discussed previously)
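
A rough sketch of that split, assuming a preprocessing step that runs before the markdown renderer. The names `add_hard_breaks` and `prepare_markdown` and the `:comment` / `:wiki_page` symbols are hypothetical, chosen only for this illustration; they are not taken from the Gitorious source.

```ruby
# Turns a single newline inside a paragraph into Markdown's two-space hard
# break. Blank lines, the last line of a paragraph, and lines indented by
# four spaces (code) are left alone.
def add_hard_breaks(text)
  lines = text.lines
  lines.each_with_index.map do |line, index|
    next_line = lines[index + 1]
    inside_paragraph = !line.strip.empty? && !next_line.nil? && !next_line.strip.empty?
    if inside_paragraph && !line.start_with?("    ")
      line.rstrip + "  \n"
    else
      line
    end
  end.join
end

# Comments get the newline hack; wiki pages stay standard markdown.
def prepare_markdown(text, kind)
  kind == :comment ? add_hard_breaks(text) : text
end

puts prepare_markdown("one\ntwo\n\nthree\n", :comment)
```

Keeping the hack in a separate step means wiki page source written with manual line wrapping renders exactly as standard markdown would.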

Yuri Takhteyev

Jul 6, 2009, 5:44:11 PM

to gito...@googlegroups.com

> Seems like I typed my previous email a bit too fast, I was actually
> going to say that it would be limited to comments, since that's where
> people tend to not think so much about the markdown format. As opposed
> to wiki pages where, at least I, are more in the "markdown zone",
> rather than just reacting to something and need to type it down
> without worrying too much about formatting beyond links and code
> snippets.

This makes more sense.

BTW, in either case, you might want to look into MarkItUp, which has
support for Markdown:

http://markitup.jaysalvat.com/examples/markdown/

I find it to be a good compromise, offering a toolbar with buttons for
users who want them, while allowing power users to just type in the
markup.

We've integrated it into our wiki engine (http://spu.tnik.org) with
some simple customization (adding a button for wikilinks, removing
some buttons, etc), and this was fairly straightforward.

Antono Vasiljev

Jul 23, 2009, 1:42:56 AM

to gito...@googlegroups.com

On Tue, 2009-06-23 at 11:11 -0700, blinder wrote:

> I just wish this worked like mediawiki or even (gasp) google code's
> wiki system. any hope of getting a better markup system than markdown
> (which seems rather incomplete) ???

Just want to mention some nice libs:

http://rubyforge.org/projects/wikicreole/
http://rubyforge.org/projects/mediacloth/
