Pipes.Attoparsec skip invalid input

75 views
Skip to first unread message

Michael Whitehead

unread,
Oct 24, 2015, 1:30:57 AM10/24/15
to Haskell Pipes
Not sure if I should try to solve my problem with attoparsec itself or with pipes. Either way I don't really know where to start so I would appreciate it if someone could help me a little.

{-# LANGUAGE OverloadedStrings #-}

module Main where

import qualified Data.ByteString as B
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.Attoparsec as P
import Data.Attoparsec.ByteString.Char8 as A


data Foo = Foo deriving Show
data Bar = Bar deriving Show
data Baz = Baz deriving Show
data Command = Command Foo Bar Baz deriving Show
data Escape = Escape deriving Show

fooParser :: Parser Foo
fooParser = string "foo" >> return Foo

barParser :: Parser Bar
barParser = string "bar" >> return Bar

bazParser :: Parser Baz
bazParser = string "baz" >> return Baz

exitParser :: Parser Escape
exitParser = string "escape" >> return Escape

commandParser :: Parser Command
commandParser =
    Command <$> fooParser
            <*> barParser
            <*> bazParser


example :: Producer B.ByteString IO ()
example = do
  yield "foobarbaz"
  yield "foo"
  yield "bar"
  yield "invalid" -- skip over invalid input
  yield "baz"
  yield "foo"
  yield "escape" -- exit early from parse based on specific value
  yield "foobarbaz"

main :: IO ()
main = do
  runEffect $ P.parsed commandParser example >-> P.print
  return ()


Basically I want to make it so that when there is an invalid input it will just skip it and continue where it left off. Also I would like to be able to handle special values that will allow an exit from the parsing so I can start fresh.

Daniel Díaz

unread,
Oct 24, 2015, 5:30:07 AM10/24/15
to Haskell Pipes
In your use case, do different commads have a clear delimiter character, like a newline? That would simplify things.

Michael Whitehead

unread,
Oct 24, 2015, 11:04:36 AM10/24/15
to Haskell Pipes
If you mean like "foo\nbar\nbaz\n" then yes I can do that. I don't think I could do a delimiter for the command as a whole. Something like "foobarbaz\n" would not work.

Daniel Díaz

unread,
Oct 24, 2015, 4:54:05 PM10/24/15
to Haskell Pipes
I have another doubt. When you say:

> Basically I want to make it so that when there is an invalid input it will just skip it and continue where it left off.

Do you mean that if a command is being parsed, and some invalid input appears, after skipping the invalid inputs we should continue parsing that same command, instead of starting to parse a new one?

In other words: should your example return one parsed command, or two?

Michael Whitehead

unread,
Oct 24, 2015, 9:41:13 PM10/24/15
to Haskell Pipes
> Do you mean that if a command is being parsed, and some invalid input appears, after skipping the invalid inputs we should continue parsing that same command, instead of starting to parse a new one?

Yes

> In other words: should your example return one parsed command, or two?

So in this example

example :: Producer B.ByteString IO ()
example = do
  yield "foobarbaz"
  yield "foo"
  yield "bar"
  yield "invalid" -- skip over invalid input
  yield "baz"
  yield "foo"
  yield "escape" -- exit early from parse based on specific value
  yield "foobarbaz"

I would like there to be 3 parsed commands that get returned.

Daniel Díaz

unread,
Oct 25, 2015, 11:47:50 AM10/25/15
to Haskell Pipes
I have a possible implementation of in this Gist: https://gist.github.com/danidiaz/4445282a4fcc7ee8c61e It assumes that each token is separated by a newline, like "foo\nbar\nbaz\n"

What I did is to have tree nested layers of parsers: first, one layer for parsing individual command tokens + invalid tokens + escape tokens.

Another layer (stopAtEscape) for stopping the parsing after encountering an Escape token.

And a final layer (parseCommands)  for actually parsing the Commands from the command tokens.

The gist uses the "hoist lift" trick described by Gabriel here: https://github.com/Gabriel439/Haskell-Pipes-Parse-Library/issues/31

The idea is to have Parsers whose underlying monad (and not just the state) is a Producer of parsed results, so that after running them you end with a Producer -> Producer function.

The type signature for "completerParser" ended up overly complex, with all those nested producers in the return type. Maybe there is a simpler way.

Daniel Díaz

unread,
Oct 25, 2015, 12:19:23 PM10/25/15
to Haskell Pipes
Actually, I think I misunderstood the purpose of "escape". It doesn't abort the parsing of the whole stream of commands, only the parsing of the current command. So my solution is invalid :(


On Sunday, October 25, 2015 at 2:41:13 AM UTC+1, Michael Whitehead wrote:

Gabriel Gonzalez

unread,
Oct 25, 2015, 5:45:47 PM10/25/15
to haskel...@googlegroups.com, whiteh...@gmail.com
The solution I came up with was to push as much logic into the parser as possible:


    {-# LANGUAGE OverloadedStrings #-}

    module Main where

    import Control.Applicative

    import qualified Data.ByteString as B
    import Pipes
    import qualified Pipes.Prelude as P
    import qualified Pipes.Attoparsec as P
    import Data.Attoparsec.ByteString.Char8 as A


    data Foo = Foo deriving Show
    data Bar = Bar deriving Show
    data Baz = Baz deriving Show
    data Command = Command Foo Bar Baz deriving Show
    data Escape = Escape deriving Show

    wrap :: Parser a -> Parser (Maybe a)
    wrap p
        =   fmap Just p
        <|> ("escape" *> pure Nothing)
        <|> anyChar *> wrap p


    fooParser :: Parser Foo
    fooParser = string "foo" >> return Foo

    barParser :: Parser Bar
    barParser = string "bar" >> return Bar

    bazParser :: Parser Baz
    bazParser = string "baz" >> return Baz

    exitParser :: Parser Escape
    exitParser = string "escape" >> return Escape

    commandParser :: Parser (Maybe Command)
    commandParser =
        liftA3 (liftA3 Command)
            (wrap fooParser)
            (wrap barParser)
            (wrap bazParser)


    example :: Producer B.ByteString IO ()
    example = do
        yield "foobarbaz"
        yield "foo"
        yield "bar"
        yield "invalid" -- skip over invalid input
        yield "baz"
        yield "foo"
        yield "escape" -- exit early from parse based on specific value
        yield "foobarbaz"

    main :: IO ()
    main = do
        runEffect $ P.parsed commandParser example >-> P.print
        return ()

This gives the following output:

    Just (Command Foo Bar Baz)
    Just (Command Foo Bar Baz)
    Nothing

Then you can tell the downstream stages to stop looping when you get a `Nothing` since that indicates that it parsed the `escape` token.

Michael Whitehead

unread,
Oct 26, 2015, 10:54:06 AM10/26/15
to Haskell Pipes, whiteh...@gmail.com
> The solution I came up with was to push as much logic into the parser as possible:

Thanks for your solution. That is pretty much what I was looking for

Just one last thing that I am trying to figure out is that the command that immediately follows an "escape" gets swallowed somehow.

So in this example

    example :: Producer B.ByteString IO ()
    example = do
        yield "foobarbaz"
        yield "foo"
        yield "escape" -- exit early from parse based on specific value
        yield "foobarbaz"

It should give
    
    Just (Command Foo Bar Baz)
    Nothing
    Just (Command Foo Bar Baz)

But it only gives this

    Just (Command Foo Bar Baz)
    Nothing

I know that there is probably something really simple that I am missing.

Michael Whitehead

unread,
Oct 26, 2015, 4:18:07 PM10/26/15
to Haskell Pipes, whiteh...@gmail.com
Nevermind. If I change the wrap function then I get what I want.

wrap :: Parser a -> Parser a
wrap p = p
         <|> ("escape" *> p)
         <|> anyChar *> wrap p

Thanks for the help!

Gabriel Gonzalez

unread,
Oct 26, 2015, 4:18:48 PM10/26/15
to haskel...@googlegroups.com, whiteh...@gmail.com
You're welcome!
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages