Status of GOLD Engine Version 5


Christopher Wells

Sep 19, 2012, 3:14:24 PM
to gold-pars...@googlegroups.com
I am looking at the files in http://goldparser.org/engine/5/net/index.htm including "GOLD-Engine-DLL-5.0.0.zip" and "GOLD-Engine-Source-5.0.0-VB.zip".

What is the status of these files:

- Are they meant more as a demo for other engine builders, or are they intended for actual use?
- Do you use this engine yourself, or do you use a different or modified engine?
- Are you making, or will you be making, any more changes to the engine?

I would like to get and contribute any improvements to the engine.

I have made some changes, which I hope are improvements:

- Convert the source from VB to C#
- Use generic List<> everywhere instead of untyped ArrayList
- Simplify or avoid some string and char conversions
- Change some methods, like Count(), into properties, like Count
- Make the EGTReader disposable
- Use StringBuilder instead of string concatenation
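The kinds of changes listed above can be sketched roughly as follows. This is an illustration only, not the engine's code: SymbolList and SimpleReader are hypothetical names invented for this example, and the real engine classes may look quite different.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for an engine collection class; not the engine's real code.
public sealed class SymbolList
{
    // A typed List<string> replaces an untyped ArrayList, so reads need no casts.
    private readonly List<string> _items = new List<string>();

    public void Add(string name) { _items.Add(name); }

    // Exposed as a property rather than a Count() method.
    public int Count { get { return _items.Count; } }

    public string this[int index] { get { return _items[index]; } }

    // StringBuilder replaces repeated concatenation inside the loop.
    public string Text(string separator)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < _items.Count; i++)
        {
            if (i > 0) sb.Append(separator);
            sb.Append(_items[i]);
        }
        return sb.ToString();
    }
}

// Hypothetical reader that owns a stream and releases it through IDisposable.
public sealed class SimpleReader : IDisposable
{
    private readonly BinaryReader _reader;

    public SimpleReader(Stream stream) { _reader = new BinaryReader(stream); }

    public byte ReadByte() { return _reader.ReadByte(); }

    // Closing the BinaryReader also closes the underlying stream.
    public void Dispose() { _reader.Close(); }
}
```

A disposable reader can then be wrapped in a using block, so that the file handle is released even if reading throws.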

Do you have any thoughts about whether and where to put the Engine source code so that someone like me can contribute to it?

Jay B

Sep 20, 2012, 6:22:56 PM
to gold-pars...@googlegroups.com
I'm curious if you made these changes with the benefit of profiling?

There are cases where StringBuilder can actually be more expensive than simple string concatenation, for instance.
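One way to check this for a given call site is to time both variants. The harness below is a minimal sketch with invented names (ConcatTiming, Measure); it is not a rigorous benchmark, and results will vary by runtime and input size. For a small, fixed number of pieces, the compiler turns `a + b + c` into a single String.Concat call, which is why a StringBuilder may not win there.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTiming
{
    // Plain concatenation of a fixed number of pieces.
    public static string WithConcat(string a, string b, string c)
    {
        return a + b + c;
    }

    // The same result built through a StringBuilder.
    public static string WithBuilder(string a, string b, string c)
    {
        return new StringBuilder().Append(a).Append(b).Append(c).ToString();
    }

    // Runs the action repeatedly and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Func<string> action, int iterations)
    {
        action(); // warm-up call, so JIT compilation is not included in the timing
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For example, comparing `ConcatTiming.Measure(() => ConcatTiming.WithConcat("a", "b", "c"), 1000000)` with the WithBuilder equivalent gives a rough indication; a profiler run against the real parser input would be more trustworthy.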

Jay

Christopher Wells

Sep 21, 2012, 3:33:55 PM
to gold-pars...@googlegroups.com
I made the changes because the downloadable engine.dll wasn't built with a strong name, so I had to rebuild it before using it.

To rebuild it, I converted the VB source to C#.

The auto-converted C# had several compile-time errors, caused by various type conversions (char to string, and Token.Data), so I fixed those.

I would like to make slight further changes, but I posted here to ask whether I should share what I'm doing.

jasonp

Sep 21, 2012, 3:51:40 PM
to gold-pars...@googlegroups.com
I had modified some things in the engine myself, which allow you to grab parsing exceptions and attempt to skip to the next token. It worked well for me when attempting to parse complex languages like C++ (where GOLD has a hard time dealing with all the ambiguities). I am out of town at the moment, but I can send it your way when I get back. Just let me know if you're interested.

Christopher Wells

Sep 21, 2012, 4:26:01 PM
to gold-pars...@googlegroups.com
I too want to implement parser exception handling.

I am parsing CSS, where the input can be malformed, and the specification defines rules for handling parsing errors, which I should implement.

Sure, I would like to see what your modifications are.

Devin Cook

Sep 21, 2012, 4:44:11 PM
to gold-pars...@googlegroups.com
I would definitely be interested in your work. The VB code was
created on an older version of VB, so I didn't get to take advantage
of the new language features.
> --
> You received this message because you are subscribed to the Google Groups
> "GOLD Parsing System" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/gold-parsing-system/-/2O389fqgmAoJ.
>
> To post to this group, send email to gold-pars...@googlegroups.com.
> To unsubscribe from this group, send email to
> gold-parsing-sy...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/gold-parsing-system?hl=en.

jasonp

Sep 23, 2012, 3:36:59 PM
to gold-pars...@googlegroups.com, mi...@devincook.com
Just realized it's not exception handling, but it wouldn't be too hard to change it to that (just throw the object). It's been a while since I looked at this code, so sorry for the misinformation.

Here is how I got the parser to gracefully handle errors:

Add a parameter to the ParseLALR routine which is a boolean I called allowErrors (pretty self explanatory).

Created a class which is passed from the parser:

Public Class ParserError
  Public Token As Token
  Public Position As Position
End Class

Of course, initialize the class somewhere in the code.

  '===== The Parser Error information.
  Private m_ParserError As New ParserError    'Holds the Parser Error information

I wanted to keep the same accessibility to what you (Devin) used. So somewhere in the code, make the object accessible:

  <Description("Get the Parse Error Information")> _
  Public Function ParserError() As ParserError
    Return m_ParserError
  End Function

Now the next part can most likely be modified to be more efficient. This approach seems to work well for me, but I have no problem if you all want to modify it.

In the ParseLALR routine, you had already set up the syntax error portion. What I did was add a while loop that skips tokens until a new line starts, which SHOULD (depending on the grammar) be the start of a new rule. The good thing is that even if that is not the case, it will fail again and repeat until it finds a good starting point.

Remember I wanted to populate the object with useful information which allowed my application to seek to the source code given the location. So you will see all this in the code:

Result = ParseResult.SyntaxError
If AllowErrors Then
    ' Discard stack entries that belong to the current line, then resume on the next line.
    Dim NextLine As Boolean
    NextLine = False
    While Not NextLine
        Try
            If TypeOf m_Stack.Top().Data Is GOLD.Reduction AndAlso m_CurrentPosition.Line = CType(m_Stack.Top().Data, GOLD.Reduction).Item(0).Position().Line Then
                ' Top of stack is a reduction that starts on the current line: discard it.
                m_Stack.Pop()
                m_CurrentLALR = m_Stack.Top().State
            ElseIf m_CurrentPosition.Line = m_Stack.Top().Position().Line Then
                ' Top of stack is a token on the current line: discard it.
                m_Stack.Pop()
                m_CurrentLALR = m_Stack.Top().State
            Else
                ' Nothing left from this line: record the error and move to the next line.
                NextLine = True
                m_ParserError.Position = New Position
                m_ParserError.Position.CharPos = m_CurrentPosition.CharPos
                m_ParserError.Position.Line = m_CurrentPosition.Line
                m_ParserError.Position.Column = m_CurrentPosition.Column
                m_ParserError.Token = m_InputTokens.Pop()
                m_CurrentPosition.Line = m_CurrentPosition.Line + 1
                m_CurrentPosition.Column = 0
                m_CurrentPosition.CharPos = 0
                m_SysPosition.Column = 0
                m_SysPosition.CharPos = 0
                m_SysPosition.Line = m_SysPosition.Line + 1
            End If
        Catch ex As Exception
            ' Inspecting the top of the stack failed: record the error and move on.
            NextLine = True
            m_ParserError.Position = New Position
            m_ParserError.Position.CharPos = m_CurrentPosition.CharPos
            m_ParserError.Position.Line = m_CurrentPosition.Line
            m_ParserError.Position.Column = m_CurrentPosition.Column
            m_ParserError.Token = m_Stack.Pop()
            m_CurrentPosition.Line = m_CurrentPosition.Line + 1
            m_CurrentPosition.Column = 0
            m_CurrentPosition.CharPos = 0
            m_SysPosition.Column = 0
            m_SysPosition.CharPos = 0
            m_SysPosition.Line = m_SysPosition.Line + 1
        End Try
    End While
End If

I will attach the Parser.vb file. Again, this was my attempt to understand what the Engine was actually doing. There is most likely a better way to check the location than looking at the first item of the stack (which is why there is a try-catch surrounding the majority of the loop).
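The core idea (drop what was already shifted from the offending line, then resume at the first token of a later line) can be sketched in C# without the try-catch by checking bounds explicitly. This is a simplified illustration, not the attached Parser.vb: LineToken and LineRecovery are hypothetical names, and the sketch ignores LALR state handling entirely.

```csharp
using System.Collections.Generic;

// Hypothetical token type: only the text and the source line matter here.
public sealed class LineToken
{
    public readonly string Text;
    public readonly int Line;

    public LineToken(string text, int line)
    {
        Text = text;
        Line = line;
    }
}

public static class LineRecovery
{
    // Pops stack entries that came from the error token's line, and returns the
    // index of the first input token on a later line (or input.Count if none).
    public static int Recover(Stack<LineToken> stack, IList<LineToken> input, int errorIndex)
    {
        int errorLine = input[errorIndex].Line;

        // Discard already-shifted entries from the offending line.
        while (stack.Count > 0 && stack.Peek().Line == errorLine)
        {
            stack.Pop();
        }

        // Skip the remaining input tokens on that line.
        int next = errorIndex;
        while (next < input.Count && input[next].Line <= errorLine)
        {
            next++;
        }
        return next;
    }
}
```

The explicit `stack.Count > 0` and `next < input.Count` checks stand in for the exception handler, so an empty stack or the end of input ends the loop normally.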

In my own code which used Gold Parser, I created a class which holds parsing info. This is bound to a UI object so the user can view the error on the source itself:
 
      Model.Output output = new Model.Output();

      output.FilePath = mSource;
      output.Message = "Unexpected Token: '" + args.UnexpectedToken.Text + "'";
      output.LineNumber = args.UnexpectedToken.Location.LineNr;
      output.ColumnNumber = args.UnexpectedToken.Location.ColumnNr;
      output.Position = args.UnexpectedToken.Location.Position;

Let me know if you want any more clarification.

Thanks,

Jason
Parser.txt

Christopher Wells

Sep 24, 2012, 10:38:18 AM
to gold-pars...@googlegroups.com, mi...@devincook.com
# Published changes

I published a version-controlled copy of the project, converted to C#, at https://github.com/cwellsx/GOLDEngine  

The commit history for the "master" branch (which is the only branch at the moment) is at https://github.com/cwellsx/GOLDEngine/commits/master

The only substantive change to date is https://github.com/cwellsx/GOLDEngine/commit/9f090f362b7f9ff211c122c9e686aa8cc791548f (it contains only trivial changes in order to fix compiler syntax errors in the auto-converted C#, which I inspected).

I have tested this using https://github.com/cwellsx/GOLDEngine/tree/master/TestGOLDEngine (also by using and regression-testing it in my own project).

This is a good baseline for any further changes: refactoring, new features, and/or changes to the public API.


# How to test-drive the engine

Do you have an example grammar which uses the new "Groups" feature, which I could use for testing?

I would add such a grammar, and corresponding test file, to https://github.com/cwellsx/GOLDEngine/tree/master/TestGOLDEngine

The only example grammar which I am using now, to test-drive the engine, is your older "GOLD Meta-Language (2.6.0).grm".


# Future changes

I'd like to make further changes (git calls each change a "commit").

The only branch in the repository at the moment is "master".

Instead of creating new branches to contain my commits, I will push my commits to the (one and only) "master" branch.

The advantage of the "master" branch is that it's the default branch, which is visible to new visitors: so if I commit to the "master" branch, those changes are more easily discoverable.

There are other ways to use branches, e.g. as follows, which I will NOT use unless other people want to use them:

- separate branches for each user or each feature
- peer review of changes made in a branch, before merging the changes
- using pull requests, from branches and/or forked repositories: https://help.github.com/articles/using-pull-requests

In summary, I suggest I put my commits in the "master" branch of the cwellsx/GOLDEngine/ repository.

Other people can share that branch with me, or create a new branch of their own in the repository, or fork the repository, or discuss with me how to use branches.

I look forward to seeing your future suggestions and changes.

Dave Dolan

Sep 24, 2012, 10:57:32 AM
to gold-pars...@googlegroups.com
Plain old forking and pull requests are often the standard fare at
github. This amounts to users forking the repo to their own accounts,
making changes to that, and submitting a pull request which you can
either choose to merge or not. Same effect as branching per user, so
to speak.



--
---------------------------------------------------------------
Dave Dolan
http://davedolan.com/blog

Christopher Wells

Sep 25, 2012, 4:02:22 PM
to gold-pars...@googlegroups.com, mi...@devincook.com
I refactored it a lot, for example Reduction and Terminal are now subclasses of Token.

Your parsing algorithms are unchanged (and it still works).
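As a rough picture of what such a hierarchy can look like, here is a minimal sketch. It is an assumption-laden illustration, not the code in the repository: the member names (SymbolName, Text, Children) and the TokenText helper are invented for this example.

```csharp
using System.Collections.Generic;
using System.Text;

// Common base: every node on the parse stack has a symbol name.
public abstract class Token
{
    public string SymbolName { get; private set; }

    protected Token(string symbolName) { SymbolName = symbolName; }
}

// A terminal carries the text that the tokenizer read.
public sealed class Terminal : Token
{
    public string Text { get; private set; }

    public Terminal(string symbolName, string text) : base(symbolName) { Text = text; }
}

// A reduction carries the child tokens that a rule consumed.
public sealed class Reduction : Token
{
    private readonly List<Token> _children = new List<Token>();

    public Reduction(string ruleName, IEnumerable<Token> children) : base(ruleName)
    {
        _children.AddRange(children);
    }

    public IList<Token> Children { get { return _children; } }
}

public static class TokenText
{
    // Concatenates the text of all terminals under a token, left to right.
    public static string Flatten(Token token)
    {
        var sb = new StringBuilder();
        Append(token, sb);
        return sb.ToString();
    }

    private static void Append(Token token, StringBuilder sb)
    {
        var terminal = token as Terminal;
        if (terminal != null)
        {
            sb.Append(terminal.Text);
            return;
        }
        foreach (Token child in ((Reduction)token).Children)
        {
            Append(child, sb);
        }
    }
}
```

With a shared base class, a tree walk such as Flatten can branch on the subclass instead of testing and casting an untyped data field.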




On Friday, September 21, 2012 4:44:12 PM UTC-4, Devin Cook wrote:

Dave Dolan

Sep 25, 2012, 10:12:13 PM
to gold-pars...@googlegroups.com
We still love you Devin, but generics are not "newer language
features" anymore and haven't been for a long time! Keep up the good
work :)



Christopher Wells

Oct 12, 2012, 11:20:35 AM
to gold-pars...@googlegroups.com, mi...@devincook.com
> In the ParseLALR routine, you did setup the syntax error portion, what I did was add a while loop to basically skip tokens till it started a new line in which it SHOULD (depending on the grammar that is) be a new rule.

I find I need to do more than that.

I might get a syntax error in a subproduction, after some tokens have already been shifted and pushed onto the stack.

For example, a statement may contain an expression, which contains a term: and if the syntax of a term is bad, I need to pop the bad term and also "roll back" the enclosing expression and partially-completed statement, and tokenize the whole thing as a "bad expression" token.

To implement that "popping" and "rolling back", I need to pop already-shifted tokens from the state stack until I find a state I can recover from (i.e. a state which expects a whole "statement", or a whole "bad statement").

I think I must modify the engine's stack so that its items contain not only the "token" and the "goto state", but now also remember the "previous state", so that I can restore the state when the recovery routine pops erroneously-shifted items.
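A minimal sketch of that idea follows, under stated assumptions: StackEntry, Rollback, and the set of "recoverable" state numbers are hypothetical, and how the engine would decide which states can accept a "bad statement" token is left open.

```csharp
using System.Collections.Generic;

// Hypothetical stack item: the shifted symbol, the state entered after the
// shift, and the state the parser was in before the shift.
public sealed class StackEntry
{
    public readonly string Symbol;
    public readonly int GotoState;
    public readonly int PreviousState;

    public StackEntry(string symbol, int gotoState, int previousState)
    {
        Symbol = symbol;
        GotoState = gotoState;
        PreviousState = previousState;
    }
}

public static class Rollback
{
    // Pops entries until popping one restores a state from which recovery is
    // possible. Returns that state, or -1 if the stack empties first.
    public static int PopToRecoverable(Stack<StackEntry> stack, ISet<int> recoverableStates)
    {
        while (stack.Count > 0)
        {
            StackEntry popped = stack.Pop();
            if (recoverableStates.Contains(popped.PreviousState))
            {
                return popped.PreviousState;
            }
        }
        return -1;
    }
}
```

The recovery routine would then continue from the returned state, feeding it a synthetic error token in place of the popped input.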