Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Haskell-cafe] Optional EOF in Parsec.

73 views
Skip to first unread message

Kannan Goundan

unread,
Apr 3, 2009, 9:25:18 PM4/3/09
to haskel...@haskell.org

I'm writing a parser with Parsec. In the input language, elements of a sequence
are separated by commas:

[1, 2, 3]

However, instead of a comma, you can also use an EOL:

[1, 2
3]

Anywhere else, EOL is considered ignorable whitespace. So it's not as simple as
just making EOL a token and looking for (comma | eol).

I've implemented this functionality in a hand-written parser (basically a hack
that keeps track of whether the last read token was preceded by an EOL,
without making EOL itself a token). Does anybody have ideas about how to
do this with Parsec?

_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Antoine Latter

unread,
Apr 4, 2009, 12:13:19 AM4/4/09
to Kannan Goundan, haskel...@haskell.org
On Fri, Apr 3, 2009 at 8:17 PM, Kannan Goundan <kan...@cakoose.com> wrote:
>
> I'm writing a parser with Parsec.  In the input language, elements of a sequence
> are separated by commas:
>
>   [1, 2, 3]
>
> However, instead of a comma, you can also use an EOL:
>
>  [1, 2
>  3]
>
> Anywhere else, EOL is considered ignorable whitespace.  So it's not as simple as
> just making EOL a token and looking for (comma | eol).
>

Untested, but hopefully enough so you get an idea of where to start:

> -- End of line parser. Consumes the carriage return, if present.
> eol :: Parser ()
> eol = eof <|> char '\n'

> -- list-element separator.
> listSep :: Parser ()
> listSep = eol <|> (char ',' >> spaces)

> -- list parser. The list may be empty - denoted by "[]"
> myListOf :: Parser a -> Parser [a]
> myListOf p =
> char '[' >>
> sepBy p listSep >>= \vals ->
> char ']' >>
> return vals

This would probably be better off with a custom version of the
'spaces' parser that didn't parse newlines.

Antoine

Martijn van Steenbergen

unread,
Apr 4, 2009, 5:59:07 AM4/4/09
to Kannan Goundan, haskel...@haskell.org
Kannan Goundan wrote:
> I've implemented this functionality in a hand-written parser (basically a hack
> that keeps track of whether the last read token was preceded by an EOL,
> without making EOL itself a token). Does anybody have ideas about how to
> do this with Parsec?

You can do exactly the same with Parsec:

* create a lexer that yields a [Token], including EOL tokens;
* write a function of type [Token] -> [(Token, Bool)] that discards EOLs
and tells for each token whether it was preceded by a (now discarded) EOL;
* write your pToken :: Token -> Parsec Token function (I omitted some
type variables there) that recognises one (Token, Bool)-tuple from the
input stream.

Or, perhaps easier:

* create a lexer that yields a [Token], including EOL tokens;
* write a function of type [Token] -> [Token] that discards only those
EOL tokens that aren't needed -- for example, those EOL tokens that
occur when there are no open ['s, then parse those EOL's explicitly in
your parser.

Hope this helps,

Martijn.

Stephan Friedrichs

unread,
Apr 4, 2009, 6:01:42 AM4/4/09
to haskel...@haskell.org
Kannan Goundan wrote:
> I'm writing a parser with Parsec. In the input language, elements of a sequence
> are separated by commas:
>
> [1, 2, 3]
>
> However, instead of a comma, you can also use an EOL:
>
> [1, 2
> 3]
>
> Anywhere else, EOL is considered ignorable whitespace. So it's not as simple as
> just making EOL a token and looking for (comma | eol).

Hi Kannan,

let's construct the parser top-down. On the top level, you have opening
and closing characters, '[' and ']'. Parsec has a function for that:

between (char '[') (char '])

And what's in between? A list of elements separated by something. Parsec
provides a sepBy function for that:

element `sepBy` separator

which parses a list of elements separated by separator. What's your
separator? Well it's either ',' or a new line and spaces before and
after that:

mySpaces >> (newline <|> char ',') >> mySpaces -- [1]

Let's combine what we've got:

myListOf :: (Parsec String () a) -> Parsec String () [a]
myListOf elem = between
(char '[')
(char ']')
(elem `sepBy` (mySpaces >> (newline <|> char ',') >> mySpaces))
where
mySpaces = many (oneOf (" \t"))

And test it in ghci:

*Main> parseTest (myListOf anyChar) "[a , b, d ,d\np]"
"abddp"

Hope this helps!

Stephan

PS: The important thing is that there are a lot solutions for tricky
situations (like yours) in Text.Parsec.Combinator (especially the sepBy
and many families). Knowing them can save a lot of work :)

[1] I don't use parsec's spaces function because it also accepts newline
characters.

>
> I've implemented this functionality in a hand-written parser (basically a hack
> that keeps track of whether the last read token was preceded by an EOL,
> without making EOL itself a token). Does anybody have ideas about how to
> do this with Parsec?
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskel...@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>

--

Früher hieß es ja: Ich denke, also bin ich.
Heute weiß man: Es geht auch so.

- Dieter Nuhr

0 new messages