elm-tools/parser: how to access intermediate parsed objects downstream in parsing pipeline?


Dave Doty

Aug 1, 2017, 11:17:57 PM
to Elm Discuss
The elm-tools/parser documentation recommends using parsing pipelines such as 

type alias Point = { x : Float, y : Float}

point : Parser Point
point =
  succeed Point
    |. symbol "("
    |. spaces
    |= float
    |. spaces
    |. symbol ","
    |. spaces
    |= float
    |. spaces
    |. symbol ")"

spaces : Parser ()
spaces =
  ignore zeroOrMore (\c -> c == ' ')

I am parsing text in this way, but it is much longer than just two floats. The high-level parser parses text with five major parts in order (describing portions of a finite automaton) and it looks like this (and uses five "subroutine" parsers that I've not shown):

type alias State = String

type alias DFATransition = State -> Char -> Result String State

type alias DFA =
    { states : List State
    , inputAlphabet : List Char
    , startState : State
    , acceptStates : List State
    , delta : DFATransition
    }

dfaParser : Parser.Parser DFA
dfaParser =
    Parser.succeed DFA
        |. spaces
        |. Parser.keyword "states:"
        |. spaces
        |= statesParser
        |. spaces
        |. Parser.keyword "input_alphabet:"
        |. spaces
        |= alphabetParser
        |. spaces
        |. Parser.keyword "start_state:"
        |. spaces
        |= startStateParser
        |. spaces
        |. Parser.keyword "accept_states:"
        |. spaces
        |= statesParser
        |. spaces
        |. Parser.keyword "delta:"
        |. spaces
        |= deltaParser
        |. spaces

to parse text such as, for instance, 

"""
states:          {q,q0,q00,q000}
input_alphabet:  {0,1}
start_state:     q
accept_states:   {q,q0,q00}
delta:
q,1    -> q
q0,1   -> q
q00,1  -> q
q000,1 -> q
q,0    -> q0
q0,0   -> q00
q00,0  -> q000
q000,0 -> q000
"""

Here's what I want to do: insert code in the middle of the pipeline that can reference the data that has been parsed so far.

For example, after this portion of the pipeline has succeeded: 

dfaParser =
    Parser.succeed DFA
        |. spaces
        |. Parser.keyword "states:"
        |. spaces
        |= statesParser
        |. spaces
        |. Parser.keyword "input_alphabet:"
        |. spaces
        |= alphabetParser
        ...

then the data for states and alphabet have been successfully parsed into two Lists. I would like to access those lists by name, later down the pipeline. 

One reason is that I would like to pass those lists as input to subsequent parsers (startStateParser, acceptStatesParser, and deltaParser), to help them do error-checking. 

For example, the next thing parsed is a String parsed by startStateParser, and I want to ensure that the parsed String is an element of the List parsed by statesParser. But at the time I put the line |= startStateParser in the pipeline, the parsed result of statesParser doesn't have a name that I can refer to.

Another reason is that I want to do error-checking in the middle of a pipeline. For example, my implementation of deltaParser reads the lines such as "q,0 -> q0" and "q0,1 -> q" one at a time, and I would like to access data parsed by previous lines when looking for errors on the current line. (For example, it is an error to have duplicates on the left side of -> such as the line "q,1 -> q" followed later by "q,1 -> q0", but to indicate this error and reference the correct line number, I need access to the lines parsed so far as I am processing the line with the error.)

I get the feeling that perhaps I'm structuring this incorrectly, so I welcome advice on a better way to structure the parser.

Yosuke Torii

Aug 2, 2017, 1:49:47 PM
to Elm Discuss
You cannot do that in the middle of a pipeline, but you can use `andThen` to build a new parser from a value you have already parsed.

For example, you can pass the `states` value to the parser for `acceptStates` like this. (Simplified a lot, not tested.)

dfaParser : Parser.Parser DFA
dfaParser =
  statesParser
    |> andThen (\states ->
      succeed (DFA states)
        |= alphabetParser
        |= acceptStateParser states
        |= ...
    )
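
To make the start-state check from the question concrete, here is a minimal, untested sketch of a parser that takes the already-parsed states as an argument. The names `stateName` and `validatedStartState` are hypothetical, and it assumes state names consist only of letters and digits. It relies only on `keep`, `oneOrMore`, `andThen`, `succeed`, and `fail` from elm-tools/parser.

```elm
import Char
import Parser exposing (Parser, andThen, fail, keep, oneOrMore, succeed)


-- Hypothetical helper: a state name is one or more letters or digits.
-- (Assumption: the real statesParser may accept a different character set.)
stateName : Parser String
stateName =
    keep oneOrMore (\c -> Char.isUpper c || Char.isLower c || Char.isDigit c)


-- Parse a state name, then reject it unless it appears in the
-- list of states that was parsed earlier.
validatedStartState : List String -> Parser String
validatedStartState states =
    stateName
        |> andThen
            (\name ->
                if List.member name states then
                    succeed name
                else
                    fail ("start state " ++ name ++ " is not in the list of states")
            )
```

Inside the `andThen` callback above, this would be used as `|= validatedStartState states`, since `states` is in scope there.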

You can also make a recursive parser and pass the intermediate state to the later parser.

deltaListParser : Context -> Parser.Parser (List Delta)
deltaListParser context =
  oneOf
    [ deltaParser
        |> andThen (\delta ->
           if checkDuplication delta context then
             deltaListParser (updateContext delta context)
               |> map (\rest -> delta :: rest)
           else
             fail "found duplicated values"
        )
    , succeed []
    ]
    
deltaParser : Parser.Parser Delta

That said, I don't think validation is necessary during the parsing process. You can check it after everything is parsed. That is much simpler.


Dave Doty

Aug 2, 2017, 1:59:03 PM
to Elm Discuss
Thanks! I assumed andThen would be involved somehow but didn't quite see how to use it.

> That said, I don't think validation is necessary during the parsing process. You can check it after everything is parsed. That is much simpler.

If I check everything after it is parsed, how do I know which row/col of the input String the mistake came from? If the parsing succeeds, the result is simply an Ok value holding the DFA, but if I then find a mistake I want to be able to report which specific part of the input String caused it.

Yosuke Torii

Aug 2, 2017, 2:09:11 PM
to elm-d...@googlegroups.com
> how do I know what row/col the mistake came from in the input String?

Have you looked at the Parser.LowLevel module? I haven't tried it yet, but it should help with tracking the error position.
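
As a rough, untested sketch of that idea: `Parser.LowLevel` exposes `getPosition`, which yields the current (row, column) pair, so a parsed value could be stored together with the position where it started. The `Located` record and the `located` wrapper below are hypothetical names, not part of the library.

```elm
import Parser exposing ((|=), Parser, succeed)
import Parser.LowLevel exposing (getPosition)


-- Hypothetical wrapper: a parsed value paired with the (row, column)
-- at which it began in the source string.
type alias Located a =
    { start : ( Int, Int )
    , value : a
    }


-- Record the position first, then run the given parser.
located : Parser a -> Parser (Located a)
located parser =
    succeed Located
        |= getPosition
        |= parser
```

With something like `located startStateParser`, a validation pass that runs after parsing could still report the row and column of the offending piece of input.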

