I've read through the other threads in this group about approaches to indentation-based languages, but I'm still stuck on how to approach parsing what I'm trying to do.
Basically, I'd like to be able to parse this:
h1#morp
  div
  span funfun
div#fun single line content
#fun
My language is a superset of this, but just getting past this chunk would probably carry me to the finish line. I'd like the above to parse into:
[ { tagName: 'h1', attrs: { id: 'morp', classes: [] },
    content: [
      { tagName: 'div', attrs: {}, content: null },
      { tagName: 'span', attrs: {}, content: 'funfun' }
    ]
  },
  { tagName: 'div', attrs: { id: 'fun', classes: [] }, content: 'single line content' },
  { tagName: null, attrs: { id: 'fun', classes: [] }, content: null } ]
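One part of that mapping can be pinned down independently of the grammar: turning a selector such as `h1#morp`, `div` or `#fun` into the `tagName`/`attrs` part of each node. The post doesn't show the `htmlTag` rule, so this is only a hypothetical sketch: `parseSelector` is an illustrative name, and treating `.name` as a class is an assumption, since only `#id` appears in the example. It mirrors the expected output above, where a bare `div` gets `attrs: {}` and a tag with an id gets `{ id, classes: [] }`.

```javascript
// Hypothetical helper: map a selector such as "h1#morp", "div" or "#fun"
// onto the { tagName, attrs, content } shape from the expected output.
// Assumption: ".name" adds a class; the post only shows "#id".
function parseSelector(selector) {
  const match = /^([A-Za-z][\w-]*)?((?:[#.][\w-]+)*)$/.exec(selector);
  if (match === null) {
    throw new Error(`Not a selector: ${selector}`);
  }
  const tagName = match[1] ?? null; // "#fun" has no tag name
  let id = null;
  const classes = [];
  // Walk the "#id" / ".class" parts left to right; a later "#id" wins.
  for (const [, sigil, name] of match[2].matchAll(/([#.])([\w-]+)/g)) {
    if (sigil === '#') id = name;
    else classes.push(name);
  }
  // Mirror the expected output: a bare tag like "div" gets an empty attrs object.
  const attrs = id === null && classes.length === 0 ? {} : { id, classes };
  return { tagName, attrs, content: null };
}
```

For example, `parseSelector('div#fun')` returns `{ tagName: 'div', attrs: { id: 'fun', classes: [] }, content: null }`, and the parser would then fill in `content`.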
The tricky part for me (and I don't know whether this falls into the context-sensitive vs. context-free problem) is that the indented block following a line is optional, but it can't be used if single-line content has been specified.
I've already preprocessed the input into INDENT, DEDENT, and TERM tokens. The input to the parser looks like this:
h1#morpTERMINDENTdivTERMspan funfunTERMDEDENTdiv#fun single line contentTERM#funTERM
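For reference, a pre-pass along these lines produces exactly that token string from the indented sample. It is a sketch, not the preprocessor from the post: `tokenize` is a hypothetical name, and it assumes space-only indentation where every dedent returns to a previously used level (the usual indentation-stack approach).

```javascript
// Sketch of an INDENT/DEDENT/TERM pre-pass using a stack of indentation widths.
// Assumptions: spaces only, and each dedent must land on an earlier level.
function tokenize(source) {
  const out = [];
  const stack = [0]; // widths of the currently open indentation levels
  for (const rawLine of source.split('\n')) {
    if (rawLine.trim() === '') continue; // blank lines carry no structure
    const width = rawLine.length - rawLine.trimStart().length;
    if (width > stack[stack.length - 1]) {
      // Deeper than the current level: open a block.
      stack.push(width);
      out.push('INDENT');
    } else {
      // Shallower: close blocks until we are back at a known level.
      while (width < stack[stack.length - 1]) {
        stack.pop();
        out.push('DEDENT');
      }
      if (width !== stack[stack.length - 1]) {
        throw new Error(`Inconsistent indentation: ${JSON.stringify(rawLine)}`);
      }
    }
    out.push(rawLine.trim(), 'TERM');
  }
  // Close any blocks still open at the end of the input.
  while (stack.length > 1) {
    stack.pop();
    out.push('DEDENT');
  }
  return out.join('');
}
```

Joining everything into one string matches the format above, but it makes a line whose own text contains `TERM` ambiguous; returning an array of tokens would sidestep that.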
I don't want to riddle this place with too much code if it's not relevant, but here are the first few lines of the parser:
start = content

content = statements

statements
  = statement*

statement
  = !TERM t:htmlTag TERM INDENT c:content TERM DEDENT { t.content = c; return t; }
  / !TERM t:htmlTag _ TERM { return t; }
  / !TERM t:htmlTag c:(singleLineContent)? TERM { t.content = c[1]; return t; }
  / !TERM htmlTag TERM
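On the greediness: a PEG repetition is greedy and never gives input back, so `.+`, or a character class such as `[A-Za-z0-9#]+`, will run straight through `TERM`. The usual remedy is a negative lookahead inside the repeated expression, e.g. `(!TERM .)+`. The hand-written parser below mirrors that idea over the token string above so it can run without PEG.js; it is a sketch under assumptions, not PEG.js output. `parse`, `toNode` and `KEYWORDS` are hypothetical names, `singleLineContent` (not shown in the post) is taken to be everything up to the next `TERM`, and `.name` is assumed to mark a class. It reads a block as `INDENT statement* DEDENT`, because in the stream above the nested block ends with `span funfunTERMDEDENT`, with no second `TERM` before `DEDENT`.

```javascript
// Hypothetical sketch of a recursive-descent parser over the token string.
// It roughly follows these PEG-style rules:
//   statement         = htmlTag (" " singleLineContent)? TERM block?
//   singleLineContent = (!TERM .)+
//   block             = INDENT statement* DEDENT
// plus a check that a block may only follow a tag without single-line content.
const KEYWORDS = ['TERM', 'INDENT', 'DEDENT'];

// Assumption: ".name" adds a class; the post only shows "#id".
function toNode(sel) {
  const tagName = /^[A-Za-z][\w-]*/.exec(sel)?.[0] ?? null;
  let id = null;
  const classes = [];
  for (const [, sigil, name] of sel.matchAll(/([#.])([\w-]+)/g)) {
    if (sigil === '#') id = name;
    else classes.push(name);
  }
  const attrs = id === null && classes.length === 0 ? {} : { id, classes };
  return { tagName, attrs, content: null };
}

function parse(input) {
  let pos = 0;
  const at = (word) => input.startsWith(word, pos);
  const atKeyword = () => KEYWORDS.some((word) => at(word));
  const expect = (word) => {
    if (!at(word)) throw new Error(`Expected ${word} at offset ${pos}`);
    pos += word.length;
  };

  // (!keyword [\w#.-])+ : the lookahead keeps "divTERM" from becoming one tag.
  function htmlTag() {
    const start = pos;
    while (pos < input.length && !atKeyword() && /[\w#.-]/.test(input[pos])) pos++;
    if (pos === start) throw new Error(`Expected a tag at offset ${pos}`);
    return toNode(input.slice(start, pos));
  }

  // (!TERM .)+ : single-line content stops right before the next TERM.
  function singleLineContent() {
    const start = pos;
    while (pos < input.length && !at('TERM')) pos++;
    if (pos === start) throw new Error(`Expected content at offset ${pos}`);
    return input.slice(start, pos);
  }

  function statement() {
    const node = htmlTag();
    if (at(' ')) {
      pos += 1;
      node.content = singleLineContent();
    }
    expect('TERM');
    if (at('INDENT')) {
      // The context-dependent part: no block after single-line content.
      if (node.content !== null) {
        throw new Error(`Unexpected block after single-line content at offset ${pos}`);
      }
      expect('INDENT');
      node.content = statements(() => at('DEDENT'));
      expect('DEDENT');
    }
    return node;
  }

  function statements(atEnd) {
    const list = [];
    while (!atEnd()) list.push(statement());
    return list;
  }

  return statements(() => pos >= input.length);
}
```

In a grammar, the same effect comes from putting the `!TERM` guard inside the repeated expression itself; a `!TERM` placed once in front of a whole statement is only checked at the statement's first character and does nothing to stop a repetition further along.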
------
I don't necessarily need hand-holding all the way to a solution, just a sanity check: is what I'm trying to do possible (and sensible) with PegJS, and am I doing something obviously stupid in these first attempts with the parser? My general problem is that the parser seems to be extremely greedy: it eats up more input than it's supposed to instead of stopping at the TERMs. I apologize if this isn't enough information, and I'll provide more if needed, but I suspect I may be going about this in an obviously wrong way.
Thanks in advance for any help you can offer,
Alex