mixing lexers with camlp4
Pietro Abate  
Newsgroups: fa.caml
From: Pietro Abate <Pietro.Ab...@anu.edu.au>
Date: Fri, 02 Feb 2007 01:39:04 UTC
Local: Thurs, Feb 1 2007 8:39 pm
Subject: [Caml-list] mixing lexers with camlp4
Hi all,
I want to parse a language like this one:
l := l & l | l % l | Id

where the symbols & , % , ... are almost arbitrary.
This is my first step toward the idea of extending the camlp4 language on
the fly. For the moment I'm only parsing the language; later I'll add the
actions that extend the grammar. For now I'm happy to return a list
of type stype.

I've written the following camlp4 extension:

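type stype = Lid | Symbol of string ;;
(* s =~ re is true when the regexp re matches a prefix of s *)
let (=~) s re = Str.string_match (Str.regexp re) s 0;;
(* the identifier patterns are regexps; the symbols are literals, so they are
   escaped with Str.quote because some of them are regexp metacharacters *)
let tok = ["[a-z][A-Z]*[a-z]*";"[A-Z][A-Z]*[a-z]*"] @
          List.map Str.quote ["%";"&";"*";"?";"~";"[";"]";"<";">"] ;;
let symbex s = List.exists (fun e -> s =~ e) tok ;;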

let grammar = Grammar.gcreate (Plexer.gmake ());;
let symbol strm =
    match Stream.peek strm with
    | Some(_,s) when (symbex s) -> Stream.junk strm; s
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser grammar "symbol" symbol ;;
let gram_list = Grammar.Entry.create grammar "gram_list";;

EXTEND
GLOBAL: gram_list;

gram_list: [[ grams = LIST1 gram; EOI -> grams ]];

gram: [[ p = LIDENT; ":="; rules = LIST1 rule SEP "|" -> (p,rules) ]];

rule: [[ psl = LIST1 psymbol -> psl ]];

psymbol: [[
     "Id" -> Lid
    | e = symbol -> Symbol(e)
]];
END
;;

Now my problem is with the production symbol, which I'd like to parse not with
the standard camlp4 lexer but with one of my own. This is because I want to allow
almost arbitrary symbols in my language, and Plexer is too restrictive. My
solution above works, but it's very clumsy. The easiest way I can think of is
to use the Genlex module, to get something like:

let lexer = Genlex.make_lexer [
    "+";"-";"*";"/";"=";
    "[";"]";"<";">";
    "%";"&";"*";"?";"~"
];;

let symbgrammar = Grammar.gcreate (lexer);;
let symbol strm =
    match Stream.peek strm with
    | Some (Kwd s) -> Stream.junk strm; s
    | Some (Ident i) -> ....
    .........
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser symbgrammar "symbol" symbol ;;

of course the Genlex module is not immediately compatible with the Plexer
interface so I'm a bit lost...

- Is this the best way of doing it ?

- How can I make the Genlex module compatible with the Plexer
  interface (example ?) ?

- Does camlp4 allow me to mix lexers for different productions in the same
  extension ?

I believe this kind of thing is going to be much easier with the new
camlp4 version...
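
[Editorial sketch] To make the lexing requirement above concrete, here is a
minimal hand-written tokenizer in plain OCaml (no camlp4, Genlex or Str) that
accepts almost arbitrary one-character symbols. The names tokenize, is_letter
and is_space, the "IDENT" tag and the (constructor, text) token shape are
assumptions made for this illustration; the shape only imitates the
(string * string) pairs used in the post and is not taken from camlp4 itself.

```ocaml
(* Token shape imitating (constructor, text) pairs; illustrative only. *)
type token = string * string

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_space c = c = ' ' || c = '\t' || c = '\n'

(* Runs of letters become identifiers, ":=" is kept as one keyword, and
   any other non-blank character is a one-character symbol. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev (("EOI", "") :: acc)
    else
      let c = s.[i] in
      if is_space c then go (i + 1) acc
      else if is_letter c then begin
        (* scan to the end of the identifier *)
        let j = ref i in
        while !j < n && is_letter s.[!j] do incr j done;
        go !j (("IDENT", String.sub s i (!j - i)) :: acc)
      end
      else if c = ':' && i + 1 < n && s.[i + 1] = '=' then
        go (i + 2) (("", ":=") :: acc)
      else go (i + 1) (("", String.make 1 c) :: acc)
  in
  go 0 []
```

Because every non-letter character falls through to the last branch, new
symbols need no change to the tokenizer, which is the property the post asks
for; multi-character symbols would need an extra rule.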

:)
p

--
++ Blog: http://blog.rsise.anu.edu.au/?q=pietro
++
++ "All great truths begin as blasphemies." -George Bernard Shaw
++ Please avoid sending me Word or PowerPoint attachments.
   See http://www.fsf.org/philosophy/no-word-attachments.html

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


 
Pietro Abate  
Newsgroups: fa.caml
From: Pietro Abate <Pietro.Ab...@anu.edu.au>
Date: Fri, 02 Feb 2007 06:25:13 UTC
Local: Fri, Feb 2 2007 1:25 am
Subject: Re: [Caml-list] mixing lexers with camlp4
In the best tradition, I'm partially answering my own question (below), but I
have a new one:

> - Does camlp4 allow me to mix lexers for different productions in the same
>   extension ?

Well, it seems it doesn't. Now I get this error:

Error: entries "psymbol" and "symbol" do not belong to the same grammar.
Fatal error: exception Failure("Grammar.extend error")

- Is there a deep reason why I cannot mix different grammars ?
- Is there a way of forcing this behaviour ?
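
[Editorial sketch] One way to picture the error above is a toy model in which
each entry remembers the grammar it was created for, and a rule may only refer
to entries of that same grammar. This is NOT camlp4's real implementation: the
types grammar and entry and the functions create and check_same_grammar are
hypothetical, and only the wording of the message is taken from the post.

```ocaml
(* Toy model only: not camlp4's real data structures. *)
type grammar = { lexer_name : string }
type entry = { ename : string; egram : grammar }

let create (g : grammar) (name : string) : entry = { ename = name; egram = g }

(* A rule of [caller] may refer to [callee] only when both entries were
   created for the physically same grammar value, since one grammar means
   one lexer and therefore one token stream. *)
let check_same_grammar (caller : entry) (callee : entry) : unit =
  if caller.egram != callee.egram then
    failwith
      (Printf.sprintf
         "entries \"%s\" and \"%s\" do not belong to the same grammar"
         caller.ename callee.ename)
```

In this picture, psymbol (created for the Plexer grammar) referring to symbol
(created for the Genlex-based grammar) fails the check, whereas two entries
created for the same grammar pass it.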

On Fri, Feb 02, 2007 at 12:40:11PM +1100, Pietro Abate wrote:
> Hi all,
> I want to parse a language like this one:
> l := l & l | l % l | Id
[...]
> of course the Genlex module is not immediately compatible with the Plexer
> interface so I'm a bit lost...

> - Is this the best way of doing it ?

I don't know; maybe not.

> - How can I make the Genlex module compatible with the Plexer
>   interface (example ?) ?

This should do the job (I think), even though it ignores the location...

open Genlex
let lexer = Genlex.make_lexer [
    "+";"-";"*";"/";"=";
    "[";"]";"<";">";
    "%";"&";"*";"?";"~"
];;
let getkwd = function Kwd s -> s | _ -> failwith "aa" ;;
let rec glexer = parser
    [< 'Kwd ("+" | "-" | "*" | "/"
            |"=" | "[" | "]" | "<"
            |">" | "%" | "&" | "?" | "~" ) as s >] -> ("", getkwd s)
    | [< 'Ident s >] -> ("LIDENT",s)
    | [< >] -> ("EOI","")
;;
let lexer_gmake () = {
    Token.tok_func =
    Token.lexer_func_of_parser (fun s -> (glexer (lexer s), Token.dummy_loc));
    Token.tok_using = (fun _ -> ());
    Token.tok_removing = (fun _ -> ());
    Token.tok_match = Token.default_match;
    Token.tok_text = Token.lexer_text;
    Token.tok_comm = None

}

;;

The full code of my example:

to compile:
#> camlp4o pa_extend.cmo pr_o.cmo pa_test.ml > test.ml
#> ocamlfind ocamlc -package camlp4 camlp4.cma str.cma test.ml

------------ pa_test.ml ------------
open Genlex
type stype = Lid | Symbol of string ;;

let lexer = Genlex.make_lexer [
    "+";"-";"*";"/";"=";
    "[";"]";"<";">";
    "%";"&";"*";"?";"~"
];;
let getkwd = function Kwd s -> s | _ -> failwith "fail getkwd" ;;
let rec glexer = parser
    [< 'Kwd ("+" | "-" | "*" | "/"
            |"=" | "[" | "]" | "<"
            |">" | "%" | "&" | "?" | "~" ) as s >] -> ("", getkwd s)
    | [< 'Ident s >] -> ("LIDENT",s)
    | [< >] -> ("EOI","")
;;
let lexer_gmake () = {
    Token.tok_func =
    Token.lexer_func_of_parser (fun s -> (glexer (lexer s), Token.dummy_loc));
    Token.tok_using = (fun _ -> ());
    Token.tok_removing = (fun _ -> ());
    Token.tok_match = Token.default_match;
    Token.tok_text = Token.lexer_text;
    Token.tok_comm = None

}

;;

let symbgrammar = Grammar.gcreate (lexer_gmake ());;
let symbol strm =
    match Stream.peek strm with
    |Some("",s) -> Stream.junk strm; s
    |Some("LIDENT",s) -> Stream.junk strm; s
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser symbgrammar "symbol" symbol ;;
let grammar = Grammar.gcreate (Plexer.gmake ());;
let gram_list = Grammar.Entry.create grammar "gram_list";;

EXTEND
GLOBAL: gram_list;

gram_list: [[ grams = LIST1 gram; EOI -> grams ]];

gram: [[ p = LIDENT; ":="; rules = LIST1 rule SEP "|" -> (p,rules) ]];

rule: [[ psl = LIST1 psymbol -> psl ]];

psymbol: [[
     "VAR" -> Lid
    | e = symbol -> Symbol(e)
]];

END
;;

let apply s = Grammar.Entry.parse gram_list (Stream.of_string s);;
(apply "l := VAR");;
(apply "l := VAR & VAR");;
(apply "l := VAR U VAR");;
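
[Editorial sketch] For readers without the old camlp4 at hand, the grammar
above can be approximated by a small hand-written parser over a list of
(constructor, text) tokens like the ones the lexer above produces. This is
only a sketch in plain OCaml: parse_rule, parse_rules and parse_gram are
hypothetical names, the token list is built by hand, and a single gram is
parsed rather than LIST1 gram.

```ocaml
type stype = Lid | Symbol of string

(* Tokens imitate (constructor, text) pairs; illustrative only. *)
type token = string * string

(* rule: one or more psymbols, ending at "|" or at the end of input. *)
let rec parse_rule (toks : token list) (acc : stype list) =
  match toks with
  | ("", "|") :: _ | ("EOI", _) :: _ | [] ->
      if acc = [] then failwith "empty rule" else (List.rev acc, toks)
  | (_, "VAR") :: rest -> parse_rule rest (Lid :: acc)
  | (_, s) :: rest -> parse_rule rest (Symbol s :: acc)

(* rules: one or more rules separated by "|". *)
let rec parse_rules (toks : token list) =
  let r, rest = parse_rule toks [] in
  match rest with
  | ("", "|") :: rest' ->
      let rs, rest'' = parse_rules rest' in
      (r :: rs, rest'')
  | _ -> ([ r ], rest)

(* gram: LIDENT ":=" rules, followed by the end-of-input token. *)
let parse_gram (toks : token list) : string * stype list list =
  match toks with
  | ("LIDENT", p) :: ("", ":=") :: rest ->
      (match parse_rules rest with
       | rules, [ ("EOI", _) ] -> (p, rules)
       | _ -> failwith "trailing tokens")
  | _ -> failwith "expected: LIDENT := ..."
```

Since everything here reads from one token list, there is no question of
mixing lexers: the symbol-specific handling lives in the tokenizer, and the
parser only sees its output.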

Thank you very much for your help.

:)
p

Pietro Abate  
Newsgroups: fa.caml
From: Pietro Abate <Pietro.Ab...@anu.edu.au>
Date: Sun, 04 Feb 2007 23:40:21 UTC
Local: Sun, Feb 4 2007 6:40 pm
Subject: Re: [Caml-list] mixing lexers with camlp4
On Fri, Feb 02, 2007 at 05:26:06PM +1100, Pietro Abate wrote:
> well, it seems it doesn't. Now I get this error:
> Error: entries "psymbol" and "symbol" do not belong to the same grammar.
> Fatal error: exception Failure("Grammar.extend error")
> - Is there a way of forcing this behaviour ?

I don't think this is possible as the type "grammar 'te" in gramext.ml
is not exposed in the mli and there are no functions to modify its
value. In particular I don't think gram_reinit and reinit_gram are of
any use in this case.

OK, I'll stop here with this problem.

p
