Using JavaScript to write parsers

jerome

Mar 31, 2013, 1:31:39 PM
to js-t...@googlegroups.com
Hi, I'm interested in defining some new data formats and learning how to parse them into valid JSON, CSS and HTML. I would like to write my own parsers in JavaScript to do this. Can any of you recommend a good starting point? I've read some of Crockford's writing on the parse tree used in JSLint and I have read through some of the codebase for PegJS, and now I need more. Suggestions, please!

Yusuke SUZUKI

Mar 31, 2013, 3:15:29 PM
to js-t...@googlegroups.com
Hello,

I recommend reading the Esprima parser [1]; it's a simple example of a recursive descent parser written in JS.
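
For readers who have not met the term, here is a minimal sketch of the recursive descent technique, where each grammar rule becomes one function that may call the others. It is not Esprima's code: the toy arithmetic grammar, the function name `parseExpression`, and the node shapes (loosely modeled on the kind of tree Esprima emits) are all assumptions made for illustration.

```javascript
// Hypothetical toy grammar, for illustration only:
//   expression := term (('+' | '-') term)*
//   term       := factor (('*' | '/') factor)*
//   factor     := NUMBER | '(' expression ')'
function parseExpression(source) {
  let pos = 0; // current offset into the source string

  function skipWhitespace() {
    while (pos < source.length && /\s/.test(source[pos])) pos++;
  }

  // Look at the next significant character without consuming it.
  function peek() {
    skipWhitespace();
    return source[pos];
  }

  function expression() {
    let node = term();
    while (peek() === '+' || peek() === '-') {
      const operator = source[pos++];
      node = { type: 'BinaryExpression', operator, left: node, right: term() };
    }
    return node;
  }

  function term() {
    let node = factor();
    while (peek() === '*' || peek() === '/') {
      const operator = source[pos++];
      node = { type: 'BinaryExpression', operator, left: node, right: factor() };
    }
    return node;
  }

  function factor() {
    if (peek() === '(') {
      pos++; // consume "("
      const node = expression();
      if (peek() !== ')') throw new SyntaxError('Expected ")" at offset ' + pos);
      pos++; // consume ")"
      return node;
    }
    const match = /^\d+(\.\d+)?/.exec(source.slice(pos));
    if (!match) throw new SyntaxError('Expected a number at offset ' + pos);
    pos += match[0].length;
    return { type: 'Literal', value: Number(match[0]) };
  }

  const ast = expression();
  if (peek() !== undefined) throw new SyntaxError('Unexpected input at offset ' + pos);
  return ast;
}

console.log(JSON.stringify(parseExpression('2 * (3 + 4)'), null, 2));
```

Operator precedence falls out of the call structure: `expression` calls `term`, which calls `factor`, so `*` binds tighter than `+` without any precedence table.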




--
--
http://clausreinke.github.com/js-tools/resources.html - tool information
http://groups.google.com/group/js-tools - mailing list information
 
---
You received this message because you are subscribed to the Google Groups "js-tools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to js-tools+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
Regards,
Yusuke Suzuki

Michal Kuklis

Mar 31, 2013, 3:20:15 PM
to js-t...@googlegroups.com
Another good tool to check out is JISON [1]

[1] http://zaach.github.com/jison

Ariya Hidayat

Mar 31, 2013, 3:55:01 PM
to js-t...@googlegroups.com

Cheney, Edward A SFC MIL USA FORSCOM

Mar 31, 2013, 10:28:36 PM
to js-t...@googlegroups.com
Classification: UNCLASSIFIED
Jerome,

I strongly recommend reading and examining the parsers mentioned. My strategy for writing a parser would be these steps:

1) Tokenizer - Parse the input into tokens, which are independent atomic pieces. It is absolutely essential that this step come first.

2) Analyzer - Examine the relationships between tokens to provide meaning and syntax validation. You have to know if something is being compared or assigned. What exactly is the current token doing in relation to the surrounding tokens? If you want to account for automatic semicolon insertion (ASI), you would do that in this step, but if you wish to ignore ASI, be prepared to output an error in the case of a missing semicolon.

3) Builder - Once the analysis phase is complete, you have to organize the tokens into some kind of output. This output must be a structured, well-defined form that combines the tokens with the analysis so that the code can be executed merely by reading it. Esprima, for example, produces JSON output in which named node types describe the tokens and the nesting of the nodes describes the relationships between them.

4) Evaluator - The code that uses the output from the previous step to execute the code and return a result. It is possible to combine steps 3 and 4, but I do not recommend this as it would make your application a bit more challenging to extend later. By keeping steps 3 and 4 separate you can output the parsed code for evaluation and troubleshooting of your parser, which helps find and remove bugs.
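
The four steps above can be sketched end to end on a very small input language. Everything here is an assumption made for illustration: the `name = value;` data format, the function names `tokenize`, `analyze`, `build` and `evaluate`, and the token and node shapes. It is a sketch of the separation of stages, not a definitive design.

```javascript
// Step 1: tokenizer. Splits the input into independent atomic tokens.
function tokenize(input) {
  const rules = [
    ['name', /^[A-Za-z_]\w*/],
    ['number', /^\d+/],
    ['string', /^"[^"]*"/],
    ['punctuator', /^[=;]/],
  ];
  const tokens = [];
  let rest = input.trimStart();
  while (rest.length > 0) {
    const rule = rules.find(([, pattern]) => pattern.test(rest));
    if (!rule) throw new SyntaxError('Unexpected character: ' + rest[0]);
    const text = rule[1].exec(rest)[0];
    tokens.push({ type: rule[0], text });
    rest = rest.slice(text.length).trimStart();
  }
  return tokens;
}

// Step 2: analyzer. Checks each token against its position in a statement
// (name, "=", value, ";") and records the role it plays.
function analyze(tokens) {
  const roles = ['target', 'assign', 'value', 'terminator'];
  const analyzed = tokens.map((token, index) => {
    const role = roles[index % 4];
    const ok =
      (role === 'target' && token.type === 'name') ||
      (role === 'assign' && token.type === 'punctuator' && token.text === '=') ||
      (role === 'value' && (token.type === 'number' || token.type === 'string')) ||
      (role === 'terminator' && token.type === 'punctuator' && token.text === ';');
    if (!ok) throw new SyntaxError('Unexpected token "' + token.text + '" at index ' + index);
    return { ...token, role };
  });
  // No semicolon insertion here: an unterminated statement is an error.
  if (analyzed.length % 4 !== 0) throw new SyntaxError('Incomplete statement (missing ";"?)');
  return analyzed;
}

// Step 3: builder. Organizes the analyzed tokens into a structured tree.
function build(analyzed) {
  const tree = { type: 'Document', body: [] };
  for (let i = 0; i < analyzed.length; i += 4) {
    const value = analyzed[i + 2];
    tree.body.push({
      type: 'Assignment',
      name: analyzed[i].text,
      value: value.type === 'number' ? Number(value.text) : value.text.slice(1, -1),
    });
  }
  return tree;
}

// Step 4: evaluator. Reads only the tree and produces the final result.
function evaluate(tree) {
  const result = {};
  for (const node of tree.body) result[node.name] = node.value;
  return result;
}

const tree = build(analyze(tokenize('title = "Parsers"; pages = 42;')));
console.log(JSON.stringify(evaluate(tree))); // {"title":"Parsers","pages":42}
```

Because `build` and `evaluate` are separate, the intermediate `tree` can be printed and inspected on its own when the final output looks wrong.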


I imagine the hardest part of writing a parser is the QA piece at the end. JavaScript allows a fair amount of sloppiness, and you have to be prepared to test a wide variety of code samples to ensure you have not missed anything. Be prepared to push hundreds of samples through your parser to ensure it does exactly what you expect.
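
One way to organize that kind of bulk testing is a table of samples fed through the parser in a loop. The sketch below is only an illustration: `runSamples` and the sample table are hypothetical names, and the built-in `JSON.parse` stands in for whatever parser is actually under test.

```javascript
// Runs every sample through `parse` and returns the sources that misbehaved.
// A sample either names the expected result or states that parsing must fail.
function runSamples(parse, samples) {
  const failures = [];
  for (const { source, expected, shouldThrow } of samples) {
    let actual;
    let threw = false;
    try {
      actual = parse(source);
    } catch (error) {
      threw = true;
    }
    // Comparing serialized forms is a simplification; it is sensitive to key order.
    const ok = shouldThrow
      ? threw
      : !threw && JSON.stringify(actual) === JSON.stringify(expected);
    if (!ok) failures.push(source);
  }
  return failures;
}

// Include invalid inputs too: a parser must reject bad input, not just accept good input.
const samples = [
  { source: '{"a": 1}', expected: { a: 1 } },
  { source: '[1, 2, 3]', expected: [1, 2, 3] },
  { source: '"text"', expected: 'text' },
  { source: '{"a": 1,}', shouldThrow: true }, // trailing comma
  { source: "{'a': 1}", shouldThrow: true }, // single-quoted key
];

// JSON.parse is only a stand-in for the parser under test.
const failures = runSamples(JSON.parse, samples);
console.log(failures.length === 0 ? 'all samples passed' : failures);
```

Growing the table to hundreds of entries is then a matter of adding rows, and each bug found in the wild can be added as a new row so it stays fixed.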

Austin


Peter van der Zee

Apr 1, 2013, 6:43:21 AM
to js-t...@googlegroups.com
All I can say is "Be the parser". The rest will come naturally.

- peter