Filtering out block comments from source code?

45 views
Skip to first unread message

Mitchell Rosen

unread,
Dec 12, 2015, 4:47:35 PM12/12/15
to Haskell Pipes
Hi,

I'm wondering if it's possible to use a pipes-like stream to express "contextual" filtering of elements. For example, taking a simplistic view of a block comment as a groups lines such that the first line that begins with "{-" and the last line begins with "-}", I'd like to be able to filter a stream like:

[ "foo"
, "{- hi"
, "there -}"
, "bar"
, "{- whoops"
]


into

[ "foo"
, "bar"
]

Using the streaming library (similar to pipes + pipes-group + pipes-parse), my first attempt was something like:

import Streaming
import qualified Streaming.Prelude as S

filterBlockComments :: forall m r. Stream (Of String) m r -> Stream (Of String) m r
filterBlockComments s0 = do
    -- s0 == ["foo", "{- hi", "there -}", "bar", "{- whoops"]

    -- yield ["foo"], leaving
    -- s1 == ["{- hi", "there -}", "bar", "{- whoops"]
    s1 :: Stream (Of String) m r
        <- S.span (\line -> not ("{-" `isPrefixOf` line)) s0

    -- break s1 into (["{- hi"], ["there -}", "bar", "{- whoops"]),
    -- then drop the first element of the second half, leaving
    -- (["{- hi"], ["bar", "{- whoops"])
    let s2 :: Stream (Of String) m (Stream (Of String) m r)
        s2 = fmap (S.drop 1) (S.break (\line -> "-}" `isSuffixOf` line) s1)

    -- run s2 without yielding its elements, leaving
    -- s3 == ["bar", "{- whoops"]
    s3 <- lift (S.effects s2)

    -- loop
    filterBlockComments s3
    
However, this doesn't terminate, for reasons I haven't quite figured out other than I'm doing streaming "wrong". So, is there a more idiomatic way to write such a function?

Thanks,
Mitchell
    
    
    

Michael Thompson

unread,
Dec 13, 2015, 2:11:19 PM12/13/15
to Haskell Pipes
Right, I think you are looping because you aren't dealing with the eof case, so it keeps spanning and breaking an empty stream.

You need something like 'next' in there somewhere. I think there are a number of places to do the inspection. The following
is I think the same idea you had, formulated with pipes and streaming.


import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.Parse as P
import Streaming
import qualified Streaming.Prelude as S
import Lens.Simple
import Data.List


pFiltered start = do
  s0 <- view (P.span (\line -> not ("{-" `isPrefixOf` line))) start
  let s1 = view (P.span (\line -> not ("-}" `isSuffixOf` line))) s0
  s2 <- lift $ runEffect $ s1 >-> P.drain  
  e  <- lift $ P.next s2
  case e of
    Left r       -> return r
    Right (_,s3) -> pFiltered s3

sFiltered start = do
  s0 <- S.break (\line -> "{-" `isPrefixOf` line) start
  let s1 = S.break (\line -> "-}" `isSuffixOf` line) s0
  s2 <- lift $ S.effects s1 
  e  <- lift $ S.next s2
  case e of
    Left r -> return r
    Right (_,s3) ->  sFiltered s3

main = do 
  putStrLn "--------"
  runEffect $ pFiltered (each a) >-> P.stdoutLn
  putStrLn "--------"
  S.stdoutLn $ sFiltered (S.each a) 
  putStrLn "--------"
  runEffect $ pFiltered (each b) >-> P.stdoutLn
  putStrLn "--------"
  S.stdoutLn $ sFiltered (S.each b) 
    where
    a = 
        [ "good"
        , "{- hi"
        , "bad"
        , "there -}"
        , "good"
        , "{- whoops"
        , "bad"
        , "unwhoops -}"
        , "good"
        , "{- start"
        , "bad"
        , "end -}"
        , "good"
        ]
    b = []

This gives me

> main

--------

good

good

good

good

--------

good

good

good

good

--------

--------

Michael Thompson

unread,
Dec 13, 2015, 2:56:42 PM12/13/15
to Haskell Pipes
Oh, I meant to devise a Data.List equivalent of the problem. The first of these loops unproductively for the same reason. 

    list1 ls = 
      let (good, other) = break ("{-" `isPrefixOf`) ls
          (bad, rest) =   break ("{-" `isPrefixOf`) other
      in good ++ list1 rest
   
    list2 ls = 
      let (good, other) = break ("{-" `isPrefixOf`) ls
          (bad, rest) =   break ("-}" `isSuffixOf`) other
      in case rest of 
         [] -> good 
         x:xs -> good ++ list2 xs


Michael Thompson

unread,
Dec 13, 2015, 3:02:17 PM12/13/15
to Haskell Pipes
Bah, the first, looping one of those, which corresponds to the original problem program, should have been

list1 ls = 
  let (good, other) = break ("{-" `isPrefixOf`) ls
      (bad, rest) =   break ("-}" `isSuffixOf`) other
  in good ++ list1 (drop 1 rest)

Mitchell Rosen

unread,
Dec 13, 2015, 3:50:06 PM12/13/15
to Haskell Pipes
Fantastic, thank you!

Michael Thompson

unread,
Dec 13, 2015, 4:12:30 PM12/13/15
to Haskell Pipes
By the way, my pipes program had a pointless 'lift.runEffect' in it. Since I am building a producer, I can just write

    s2 <- s1 >-> P.drain  

rather than

     s2 <- lift $ runEffect $ s1 >-> P.drain  


The meaning is a little clearer that way.

Daniel Díaz

unread,
Jan 1, 2016, 6:29:54 AM1/1/16
to Haskell Pipes
Just for kicks, I've implemented the comment-filtering functionality using foldl-transduce:

textWithComments :: [T.Text]
textWithComments = [
      "foo"
    , "{- hi"
    , "there -}"
    , "bar"
    , "{- another"
    , "comment -}"
    , "baz"
    ]

removeComments :: L.Fold T.Text T.Text
removeComments =
    groups sectionSplitter ignoreAlternating L.mconcat
  where
    sectionSplitter = sections (cycle ["{-","-}"])
    ignoreAlternating = Moore $
        unfold
        (\b -> (bool (reify (transduce ignore)) (reify id) b,
                     \_ -> not b))
        True

main :: IO ()
main = do
    T.putStrLn (L.fold removeComments textWithComments)
Reply all
Reply to author
Forward
0 new messages