Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Caml-list] mixing lexers with camlp4

20 views
Skip to first unread message

Pietro Abate

unread,
Feb 1, 2007, 8:39:04 PM2/1/07
to ocaml ml
Hi all,
I want to parsa a language like this one:
l := l & l | l % l | Id

where the symbols & , % , ... are almost arbitrary.
This my first step toward the idea of expanding the camlp4 language on
the fly. So for the moment I'm parsing the language, then I'll add the
actions to extend the grammar. For the moment I'm happy to return a list
of type stype.

I've written the following camlp4 extension:

type stype = Lid | Symbol of string ;;
let (=~) s re = Str.string_match (Str.regexp re) s 0;;
let tok = ["[a-z][A-Z]*[a-z]*";"[A-Z][A-Z]*[a-z]*";
"%";"&";"*";"?";"~";"[";"]";"<";">"] ;;
let symbex s = List.exists (fun e -> s =~ e) tok ;;

let grammar = Grammar.gcreate (Plexer.gmake ());;
let symbol strm =
match Stream.peek strm with
| Some(_,s) when (symbex s) -> Stream.junk strm; s
| _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser grammar "symbol" symbol ;;
let gram_list = Grammar.Entry.create grammar "gram_list";;

EXTEND
GLOBAL: gram_list;

gram_list: [[ grams = LIST1 gram; EOI -> grams ]];

gram: [[ p = LIDENT; ":="; rules = LIST1 rule SEP "|" -> (p,rules) ]];

rule: [[ psl = LIST1 psymbol -> psl ]];

psymbol: [[
"Id" -> Lid
| e = symbol -> Symbol(e)
]];
END
;;

now my problem is with the production symbol, that I'd like to parse not using
the standard camlp4 lexer, but one of my own. This is because I want to allow
almost arbitrary symbols in my language and the Plexer is to restrictive. My
solution above works but it's very clumsy. The easiest way I can think of is
to use the Genlex module. So to have something like:

let lexer = Genlex.make_lexer [
"+";"-";"*";"/";"=";
"[";"]";"<";">";
"%";"&";"*";"?";"~"
];;

let symbgrammar = Grammar.gcreate (lexer);;
let symbol strm =
|Kwd s -> Stream.junk strm; s
|Ident i -> ....
.........
| _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser symbgrammar "symbol" symbol ;;

of course the Genlex module is not immediately compatible with the Plexer
interface so I'm a bit lost...

- Is this the best way of doing it ?

- How can I make the Genlex module compatible with the Plexer
interface (example ?) ?

- Does camlp4 allows me to mix lexers for different productions in the same
extension ?

I believe this kind of things are going to be much easier with the new
camlp4 version...

:)
p

--
++ Blog: http://blog.rsise.anu.edu.au/?q=pietro
++
++ "All great truths begin as blasphemies." -George Bernard Shaw
++ Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Pietro Abate

unread,
Feb 2, 2007, 1:25:13 AM2/2/07
to caml...@yquem.inria.fr, ocaml ml
In the best traditions, I partially answer to myself (below) but I've a new
question:

> - Does camlp4 allows me to mix lexers for different productions in the same
> extension ?

well, it seems it doesn't. Now I get this error:

Error: entries "psymbol" and "symbol" do not belong to the same grammar.
Fatal error: exception Failure("Grammar.extend error")

- Is there a deep reason why I cannot mix different grammars ?
- Is there a way of forcing this behaviour ?

On Fri, Feb 02, 2007 at 12:40:11PM +1100, Pietro Abate wrote:
> Hi all,
> I want to parsa a language like this one:
> l := l & l | l % l | Id

[...]


> of course the Genlex module is not immediately compatible with the Plexer
> interface so I'm a bit lost...
>
> - Is this the best way of doing it ?

don't know, maybe not.

> - How can I make the Genlex module compatible with the Plexer
> interface (example ?) ?

This should do the job (I think) even if ignore the location...

open Genlex


let lexer = Genlex.make_lexer [
"+";"-";"*";"/";"=";
"[";"]";"<";">";
"%";"&";"*";"?";"~"
];;

let getkwd = function Kwd s -> s | _ -> failwith "aa" ;;
let rec glexer = parser
[< 'Kwd ("+" | "-" | "*" | "/"
|"=" | "[" | "]" | "<"
|">" | "%" | "&" | "?" | "~" ) as s >] -> ("", getkwd s)
| [< 'Ident s >] -> ("LIDENT",s)
| [< >] -> ("EOI","")
;;
let lexer_gmake () = {
Token.tok_func =
Token.lexer_func_of_parser (fun s -> (glexer (lexer s), Token.dummy_loc));
Token.tok_using = (fun _ -> ());
Token.tok_removing = (fun _ -> ());
Token.tok_match = Token.default_match;
Token.tok_text = Token.lexer_text;
Token.tok_comm = None
}
;;

The full code of my example:

to compile:
#> camlp4o pa_extend.cmo pr_o.cmo pa_test.ml >> test.ml
#> ocamlfind ocamlc -package camlp4 camlp4.cma str.cma test.ml

------------ pa_test.ml ------------
open Genlex


type stype = Lid | Symbol of string ;;

let lexer = Genlex.make_lexer [


"+";"-";"*";"/";"=";
"[";"]";"<";">";
"%";"&";"*";"?";"~"
];;

let getkwd = function Kwd s -> s | _ -> failwith "fail getkwd" ;;
let rec glexer = parser
[< 'Kwd ("+" | "-" | "*" | "/"
|"=" | "[" | "]" | "<"
|">" | "%" | "&" | "?" | "~" ) as s >] -> ("", getkwd s)
| [< 'Ident s >] -> ("LIDENT",s)
| [< >] -> ("EOI","")
;;
let lexer_gmake () = {
Token.tok_func =
Token.lexer_func_of_parser (fun s -> (glexer (lexer s), Token.dummy_loc));
Token.tok_using = (fun _ -> ());
Token.tok_removing = (fun _ -> ());
Token.tok_match = Token.default_match;
Token.tok_text = Token.lexer_text;
Token.tok_comm = None
}
;;

let symbgrammar = Grammar.gcreate (lexer_gmake ());;


let symbol strm =
match Stream.peek strm with

|Some("",s) -> Stream.junk strm; s
|Some("LINDENT",s) -> Stream.junk strm; s
| _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser symbgrammar "symbol" symbol ;;


let grammar = Grammar.gcreate (Plexer.gmake ());;

let gram_list = Grammar.Entry.create grammar "gram_list";;

EXTEND
GLOBAL: gram_list;

gram_list: [[ grams = LIST1 gram; EOI -> grams ]];

gram: [[ p = LIDENT; ":="; rules = LIST1 rule SEP "|" -> (p,rules) ]];

rule: [[ psl = LIST1 psymbol -> psl ]];

psymbol: [[
"VAR" -> Lid


| e = symbol -> Symbol(e)
]];

END
;;

let apply s = Grammar.Entry.parse gram_list (Stream.of_string s);;
(apply "l := VAR");;
(apply "l := VAR & VAR");;
(apply "l := VAR U VAR");;

Je vous remercie énormément pour votre aide.

Pietro Abate

unread,
Feb 4, 2007, 6:40:21 PM2/4/07
to caml...@yquem.inria.fr, ocaml ml
On Fri, Feb 02, 2007 at 05:26:06PM +1100, Pietro Abate wrote:
> well, it seems it doesn't. Now I get this error:
> Error: entries "psymbol" and "symbol" do not belong to the same grammar.
> Fatal error: exception Failure("Grammar.extend error")
> - Is there a way of forcing this behaviour ?
I don't think this is possible as the type "grammar 'te" in gramext.ml
is not exposed in the mli and there are no functions to modify its
value. In particular I don't think gram_reinit and reinit_gram are of
any use in this case.

Ok, I stop here with this problem.

0 new messages