Parser puzzlingly seems to be losing the final element.


Sean Parsons

Feb 13, 2016, 7:51:39 PM
to Haskell Pipes
I'm trying to parse a very large XML file, so I've hooked up tagsoup to create a minimalistic streaming XML parser.

I've been using the following Parser to do a little bit of cleanup, but when I introduce it into the pipeline with parseForever_ the last element disappears.

textCombiner :: (Monad m) => PP.Parser TTag m (Maybe TTag)
textCombiner = do
  first   <- PP.draw
  second  <- PP.peek
  case (first, second) of
    (Just firstTag@(TS.TagText firstText), (Just secondTag@(TS.TagOpen _ _))) -> trace "1" $ if T.all (== ' ') firstText then PP.draw >> return (Just secondTag) else return (Just firstTag)
    (Just (TS.TagText firstText), Just (TS.TagText secondText))               -> trace "2" $ PP.draw >> (return $ Just $ TS.TagText $ append firstText secondText)
    (Just firstTag, _)                                                        -> trace "3" $ return (Just firstTag)
    (Nothing, _)                                                              -> trace "4" $ return Nothing 

With a TagOpen, TagText and then TagClose test combination I see the trace "3" twice and only the TagOpen and TagText elements make it through the pipeline.

Originally this code used draw twice, but I changed the second one to use peek as I thought that might make a difference, but nothing has changed.

What do I do?
Sean.

Michael Thompson

Feb 13, 2016, 11:08:26 PM
to Haskell Pipes
Do you get better results if, instead of making the parser into a pipe, you use something like this (named for the similar operation in `pipes-attoparsec`)? It might reduce to more fundamental pipes-parse things; I'm not sure:


    parsed :: Monad m 
           => PP.Parser a m (Maybe b) 
           -> Producer a m r 
           -> Producer b m (Producer a m r)
    parsed (StateT s) = loop where
      loop p = do
        (a,rest) <- lift (s p)
        case a of
          Nothing -> return rest
          Just b -> do 
            yield b
            loop rest



Michael Thompson

Feb 13, 2016, 11:14:01 PM
to Haskell Pipes
Sorry, maybe I should have made it clear that I meant to use `parsed` (`parseRepeatedly`?) in place of `parseForever_`.
Then, if I have

     test :: [Tag T.Text]
     test = [TagOpen "open" [], TagText "text", TagClose "close"]

I see

    >>> runEffect $  each test  >-> PP.parseForever_ textCombiner >-> P.print
    3
    TagOpen "open" []
    3
    TagText "text"

    >>> rest <- runEffect $ parsed textCombiner (each test)  >-> P.print
    3
    TagOpen "open" []
    3
    TagText "text"
    3
    TagClose "close"
    4

    >>> runEffect $ rest >-> P.print

    >>> 

Gabriel Gonzalez

Feb 14, 2016, 12:04:54 AM
to haskel...@googlegroups.com, seantp...@gmail.com
I should really just delete the `parseForever` and `parseForever_` functions since they will silently terminate when you run out of input as you just experienced.  They were contributed by a user, but I think I should not have accepted them.

The correct function to use is `parsed`, which will do the right thing.
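
For anyone who wants to try this without tagsoup, here is a minimal self-contained sketch of the same idea. The `Item` type, `combiner`, `parsedLoop` and `demo` are hypothetical names made up for illustration: `Item` stands in for the tag type (`Left` plays the role of a text chunk), and `parsedLoop` is a hand-written loop in the spirit of the `parsed` function posted earlier in the thread, not the library's own code. The difference from the pipe-based approach appears to be that the parser here reads from a Producer and gets `Nothing` from `draw`/`peek` at end of input, rather than awaiting on an upstream that has already finished.

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Monad (void)
import Control.Monad.Trans.State.Strict (runStateT)
import Pipes
import qualified Pipes.Parse as PP
import qualified Pipes.Prelude as P

-- Hypothetical stand-in for the tag type: 'Left' is a text chunk,
-- 'Right' is any other kind of element.
type Item = Either String Int

-- Merge two adjacent text chunks; otherwise pass the first item through.
combiner :: Monad m => PP.Parser Item m (Maybe Item)
combiner = do
  first  <- PP.draw
  second <- PP.peek
  case (first, second) of
    (Just (Left a), Just (Left b)) -> PP.draw >> return (Just (Left (a ++ b)))
    (Just item, _)                 -> return (Just item)
    (Nothing, _)                   -> return Nothing

-- Run the parser repeatedly against the Producer, yielding every result,
-- and stop (returning the leftovers) once the parser returns Nothing.
parsedLoop :: Monad m
           => PP.Parser a m (Maybe b)
           -> Producer a m r
           -> Producer b m (Producer a m r)
parsedLoop parser = loop
  where
    loop p = do
      (result, rest) <- lift (runStateT parser p)
      case result of
        Nothing -> return rest
        Just b  -> yield b >> loop rest

-- Pure driver: collect everything the loop yields, discarding the leftovers.
demo :: [Item] -> [Item]
demo xs = P.toList (void (parsedLoop combiner (each xs)))

main :: IO ()
main = print (demo [Right 1, Left "te", Left "xt", Right 2])
-- prints [Right 1,Left "text",Right 2]
```

Note that the final `Right 2` comes through even though the `peek` after it finds no more input.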

Gabriel Gonzalez

Feb 14, 2016, 12:22:15 AM
to haskel...@googlegroups.com, practica...@gmail.com, seantp...@gmail.com
Alright, I added `parsed`/`parsed_` to `pipes-parse` and deprecated `parseForever`/`parseForever_` and I'll remove them after one release cycle.
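
A usage sketch for the library versions, under some assumptions worth checking against the Haddocks of your installed `pipes-parse`: that `Pipes.Parse` exports `parsed_`, and that `parsed_` is the variant taking a parser that returns `Maybe` (with `parsed` taking one that returns `Either`). The `pairSum` parser and `demoSums` helper are hypothetical examples, not part of the library.

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Monad (void)
import Pipes
import qualified Pipes.Parse as PP
import qualified Pipes.Prelude as P

-- Hypothetical parser: add up adjacent pairs of numbers, passing a
-- trailing unpaired number through unchanged.
pairSum :: Monad m => PP.Parser Int m (Maybe Int)
pairSum = do
  a <- PP.draw
  b <- PP.draw
  return $ case (a, b) of
    (Just x, Just y)  -> Just (x + y)
    (Just x, Nothing) -> Just x
    (Nothing, _)      -> Nothing

-- Assumes PP.parsed_ has the Maybe-based signature described above.
-- 'void' discards the leftover Producer that parsed_ returns.
demoSums :: [Int] -> [Int]
demoSums xs = P.toList (void (PP.parsed_ pairSum (each xs)))

main :: IO ()
main = runEffect $ void (PP.parsed_ pairSum (each [1 .. 5 :: Int])) >-> P.print
```

With the input `[1..5]` the trailing unpaired `5` should still be emitted, since the parser sees end of input as `Nothing` instead of the pipeline shutting down.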

Sean Parsons

Feb 19, 2016, 4:55:02 PM
to Haskell Pipes, practica...@gmail.com, seantp...@gmail.com
This is just what I needed, thanks!