
[Haskell-cafe] What *is* a DSL?


Günther Schmidt

Oct 7, 2009, 11:11:43 AM
to haskel...@haskell.org
Hi all,


for people that have followed my posts on the DSL subject this question
probably will seem strange, especially asking it now.

I have read quite a lot lately on the subject, most of it written by the
great old ones (come on guys, you know whom I mean :)).

What I could gather from their papers is that a DSL is basically
something entirely abstract as such, i.e. it allows you to build and
combine expressions in a language which is specific to your problem domain.
Regardless of further details on how to do that, and there are quite a
few, the crux is that the expressions as such are abstracted from their
"meaning".

The meaning depends on how you *evaluate* the expression, which can be in
more than merely one way, which is where, as far as I understand it, the
true power lies.


So, you might wonder, since I have figured it out this far, why ask what
a DSL is?

Because out there I see quite a lot of stuff that is labeled as a DSL,
for example packages on Hackage, quite useful ones too, where I don't
see the split between assembling an expression tree and evaluating it;
to me those seem more like combinator libraries.

Thus:

What is a DSL?


Günther


_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Emil Axelsson

Oct 7, 2009, 11:25:11 AM
to Günther Schmidt, haskel...@haskell.org
Hi,

A DSL is just a domain-specific language. It doesn't imply any specific
implementation technique.

An *embedded* DSL is a library implemented in a more general language,
which has been designed to give the "feeling" of a stand-alone language.
Still nothing about implementation.

A *shallow embedding* of a DSL is when the "evaluation" is done
immediately by the functions and combinators of the DSL. I don't think
it's possible to draw a line between a combinator library and a
shallowly embedded DSL.

A *deep embedding* is when interpretation is done on an intermediate
data structure.
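
For instance, both embeddings of a toy arithmetic language might look like
this (an illustrative sketch; all names are made up):

-- Shallow embedding: the combinators *are* the evaluation.
type ExprS = Int

litS :: Int -> ExprS
litS = id

addS :: ExprS -> ExprS -> ExprS
addS = (+)

-- Deep embedding: the combinators build an intermediate data structure,
-- and separate interpreters evaluate it, print it, optimise it, ...
data ExprD = Lit Int | Add ExprD ExprD

evalD :: ExprD -> Int
evalD (Lit n)   = n
evalD (Add x y) = evalD x + evalD y

printD :: ExprD -> String
printD (Lit n)   = show n
printD (Add x y) = "(" ++ printD x ++ " + " ++ printD y ++ ")"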

/ Emil

Günther Schmidt wrote:

Joe Fredette

Oct 7, 2009, 11:26:44 AM
to Günther Schmidt, haskel...@haskell.org
Let me add to this, as I've used the term "DSL" without (*gasp*) fully
understanding it before.

In addition to "What is a DSL", I'd like to ask:

"How is a DSL different from an API?" -- in the sense that an API is a
set of, say, combinators to filter email + a monad in which to combine
them. Or even the API in the more traditional sense of the set of
exposed operations on a given type. Is an API a kind of DSL? A kind of
Embedded DSL?

Also,

"What is the difference between an EDSL and a DSL?" -- I've got a
vague intuition of the difference, but am unsure how to particularly
delineate them.

Also, any good introductory papers/books/other resources on DSLs and
how to design, build and use them would be _lovely_.

/Joe

On Oct 7, 2009, at 11:10 AM, Günther Schmidt wrote:

> Hi all,
>
>
> for people that have followed my posts on the DSL subject this
> question probably will seem strange, especially asking it now.
>
> I have read quite a lot lately on the subject, most of it written by
> the great old ones, (come on guys you know whom I mean :)).
>
> What I could gather from their papers is that a DSL is basically
> something entirely abstract as such, i.e. it allows you to build and
> combine expressions in a language which is specific to your problem
> domain.
> Regardless of further details on how to do that, and there are
> quite a few, the crux is that the expressions as such are abstracted
> from their "meaning".
>
> The meaning depends on how you *evaluate* the expression, which can be
> in more than merely one way, which is where, as far as I understand
> it, the true power lies.
>
>
> So, you might wonder, since I figured it out this far, why ask what
> a DSL is?
>
> Because out there I see quite a lot of stuff that is labeled as a DSL,
> for example packages on Hackage, quite useful ones too,
> where I don't see the split between assembling an expression tree and
> evaluating it; to me those seem more like combinator libraries.
>
> Thus:
>
> What is a DSL?
>
>

> Günther

Günther Schmidt

Oct 7, 2009, 11:31:45 AM
to Emil Axelsson, haskel...@haskell.org
Hi Emil,

now that is an interpretation I could live with!

Glad I posted the question.

Günther

On 07.10.2009 17:24, Emil Axelsson <em...@chalmers.se> wrote:

> Hi,
>
> A DSL is just a domain-specific language. It doesn't imply any specific
> implementation technique.
>
> An *embedded* DSL is a library implemented in a more general language,
> which has been designed to give the "feeling" of a stand-alone language.
> Still nothing about implementation.
>
> A *shallow embedding* of a DSL is when the "evaluation" is done
> immediately by the functions and combinators of the DSL. I don't think
> it's possible to draw a line between a combinator library and a
> shallowly embedded DSL.
>
> A *deep embedding* is when interpretation is done on an intermediate
> data structure.
>
> / Emil
>
>


Joe Fredette

Oct 7, 2009, 11:33:21 AM
to Emil Axelsson, haskel...@haskell.org, Günther Schmidt
So, if I understand this:

Parsec is a DSL, I'm going to venture it's a "Deep embedding" -- I
don't understand the internals, but if I were to build something like
Parsec, I would probably build up a "Parser" data structure and then
apply optimizations to it, then "run" it with another function.

Am I on the right track here?

/Joe


On Oct 7, 2009, at 11:24 AM, Emil Axelsson wrote:

> Hi,
>
> A DSL is just a domain-specific language. It doesn't imply any
> specific implementation technique.
>
> An *embedded* DSL is a library implemented in a more general
> language, which has been designed to give the "feeling" of a stand-
> alone language. Still nothing about implementation.
>
> A *shallow embedding* of a DSL is when the "evaluation" is done
> immediately by the functions and combinators of the DSL. I don't
> think it's possible to draw a line between a combinator library and
> a shallowly embedded DSL.
>
> A *deep embedding* is when interpretation is done on an intermediate
> data structure.
>
> / Emil
>
>
>

> Günther Schmidt wrote:


>> Hi all,
>> for people that have followed my posts on the DSL subject this
>> question probably will seem strange, especially asking it now.
>> I have read quite a lot lately on the subject, most of it written
>> by the great old ones, (come on guys you know whom I mean :)).
>> What I could gather from their papers is that a DSL is basically
>> something entirely abstract as such, i.e. it allows you to build and
>> combine expressions in a language which is specific to your
>> problem domain.
>> Regardless of further details on how to do that, and there are
>> quite a few, the crux is that the expressions as such are abstracted
>> from their "meaning".
>> The meaning depends on how you *evaluate* the expression, which can
>> be in more than merely one way, which is where, as far as I
>> understand it, the true power lies.
>> So, you might wonder, since I figured it out this far, why ask what
>> a DSL is?
>> Because out there I see quite a lot of stuff that is labeled as
>> a DSL, for example packages on Hackage, quite useful ones
>> too, where I don't see the split between assembling an expression
>> tree and evaluating it; to me those seem more like combinator
>> libraries.
>> Thus:
>> What is a DSL?

>> Günther

Günther Schmidt

Oct 7, 2009, 11:42:38 AM
to Joe Fredette, haskel...@haskell.org
Hi Joe

On 07.10.2009 17:26, Joe Fredette <jfre...@gmail.com> wrote:

> Let me add to this, as I've used the term "DSL" without (*gasp*) fully
> understanding it before.
>

Welcome to the club then! :)


> In addition to "What is a DSL", I'd like to ask:
>
> "How is a DSL different from an API?" -- in the sense that an API is a
> set of, say, combinators to filter email + a monad in which to combine
> them. Or even the API in the more traditional sense of the set of
> exposed operations on a given type. Is an API a kind of DSL? A kind of
> Embedded DSL?
>
> Also,
>
> "What is the difference between an EDSL and a DSL?" -- I've got a vague
> intuition of the difference, but am unsure how to particularly delineate
> them.

Well, that part I think I can answer.

An EDSL is when you don't start from scratch, i.e. when you do not,
let's say, build a compiler that parses a String and then eventually
"executes" it.

Rather, you define the "terms", i.e. primitive terms (terminals) and
non-terminals, with the means of the "host" language (Haskell in my case).


>
> Also, any good introductory papers/books/other resources on DSLs and how
> to design, build and use them would be _lovely_.
>

Well, as a book I could recommend Paul Hudak's "The Haskell School of
Expression". The way he abstracts is by means of using a DSL. He
assembles objects (geometric regions: triangles, circles, squares etc.),
combines them with the help of functions and *later* evaluates them. Now
he is definitely using a DSL here, but that is by no means the only way
of implementing the abstract through a DSL. Once that has sunk in I
suggest papers from Oleg and others on the subject, but to get started
SOE would be a good idea.


Günther

minh thu

Oct 7, 2009, 11:49:43 AM
to Joe Fredette, haskel...@haskell.org, Günther Schmidt
Hi,

Some random observations:

A (E)DSL and an API fall on the same plane when they just expose
functionality of a library.

The difference between an EDSL and a DSL is really just the E, which
means embedded in a host language, so the embedded language can be built
on top of some existing machinery; in Haskell, typically the type system.

Haskell is particularly good for EDSLs (but so are Scheme or CL) because
Haskell's syntax lets the embedded language have a nice syntax, and the
type system makes it possible to have, with more or less simplicity,
typing guarantees for the specific language.

A regular expression library often comprises a regexp language, which
is considered part of the API. That language is (or can be) parsed,
compiled and executed.

Some EDSLs require executing the Haskell program to output some
"object" code; others require only the execution of some function
equivalent to runState for the particular monad the EDSL uses.

Providing a specialised language on top of a library is quite common,
for instance command line tools to process images. Those command line
tools can later be used in other programs (think scripting languages).
For instance, the "dot" program of the Graphviz suite can be run with
unsafePerformIO to get Graphviz features inside Haskell.

Parsing a String into some data structure is just a special case of
transforming one data structure into another data structure because it
is easier to process that way. For instance, HOAS into de Bruijn and
vice versa.

So for me, there is not so strong a distinction between an API and a language.

Cheers,
Thu

2009/10/7 Joe Fredette <jfre...@gmail.com>:

Robert Atkey

Oct 7, 2009, 12:30:10 PM
to Joe Fredette, Günther Schmidt, haskel...@haskell.org
On Wed, 2009-10-07 at 11:32 -0400, Joe Fredette wrote:
> So, if I understand this:
>
> Parsec is a DSL, I'm going to venture it's a "Deep embedding" -- I
> don't understand the internals, but if I were to build something like
> Parsec, I would probably build up a "Parser" data structure and then
> apply optimizations to it, then "run" it with another function.
>
> Am I on the right track here?

Parsec, like most other parser combinator libraries, is a shallowly
embedded DSL. The "Parser a" type is a Haskell function that does
parsing, i.e. a function of type String -> Maybe (String, a).
(Obviously, the real Parsec library allows more than strings, and has
better error reporting than this type, but this is the basic idea).

You can't analyse it further---you can't transform it into another
grammar to optimise it or print it out---because the information about
what things it accepts has been locked up into a non-analysable Haskell
function. The only thing you can do with it is feed it input and see
what happens.
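
Concretely, a toy version of this shallow embedding, using the simplified
type above (charP is an illustrative stand-in, not Parsec's actual char):

newtype Parser a = Parser { runParser :: String -> Maybe (String, a) }

-- The grammar is locked up inside an opaque Haskell function; all we
-- can do is run it on some input.
charP :: Char -> Parser Char
charP c = Parser $ \s -> case s of
  (x:xs) | x == c -> Just (xs, c)
  _               -> Nothing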

A deep embedding of a parsing DSL (really a context-sensitive grammar
DSL) would look something like the following. I think I saw something
like this in the Agda2 code somewhere, but I stumbled across it when I
was trying to work out what "free" applicative functors were.

First we define what a production with a semantic action is,
parameterised by the type of non-terminals in our grammar and the result
type:

> {-# LANGUAGE GADTs, ExistentialQuantification, RankNTypes #-}
> import Control.Applicative  -- extensions and imports the code below needs
> import Data.Char (intToDigit)

> data Production nt a
>   = Stop a
>   | Terminal Char (Production nt a)
>   | forall b. NonTerminal (nt b) (Production nt (b -> a))

You can think of a production as a list of either terminals or
non-terminals, terminated by the "value" of that production. The
non-regular nested type argument in NonTerminal means that the final
value can depend on the values that will be returned when parsing the
strings that match other non-terminals.

Productions are functors:

> instance Functor (Production nt) where
>   fmap f (Stop a)           = Stop (f a)
>   fmap f (Terminal c p)     = Terminal c (fmap f p)
>   fmap f (NonTerminal nt p) = NonTerminal nt (fmap (fmap f) p)

They are also applicative functors:

> instance Applicative (Production nt) where
>   pure = Stop
>   (Stop f)           <*> a = fmap f a
>   (Terminal c t)     <*> a = Terminal c (t <*> a)
>   (NonTerminal nt t) <*> a = NonTerminal nt (fmap flip t <*> a)

A rule in one of our grammars is just a list of alternative productions:

> newtype Rule nt a = Rule [Production nt a]

Since lists are (applicative) functors and (applicative) functors
compose, Rule nt is also a Functor and Applicative functor:

> instance Functor (Rule nt) where
>   fmap f (Rule l) = Rule (fmap (fmap f) l)

> instance Applicative (Rule nt) where
>   pure x = Rule $ pure (pure x)
>   (Rule lf) <*> (Rule la) = Rule $ (<*>) <$> lf <*> la

It is also an instance of Alternative, because we composed with lists:

> instance Alternative (Rule nt) where
>   empty = Rule []
>   (Rule r1) <|> (Rule r2) = Rule $ r1 <|> r2

A grammar is a map from nonterminals to rules, which are lists of
alternative productions, which may themselves refer back to nonterminals
in the grammar:

> type Grammar nt = forall a. nt a -> Rule nt a

Given a value of type "Grammar nt", and a starting nonterminal in "nt a"
for some "a", one can easily write a function that translates it into a
Parsec grammar to do actual parsing, or implement a different parsing
strategy using memoisation or something similar. The translation to a
traditional parser combinator library is actually a
(indexed-)homomorphism of applicative functors + extra operations, which
is pretty cool.

If you also know some extra facts about the "nt" type (e.g. that it is
finite), then it should be possible implement an CYK or Earley parser
using this, or to print out the grammar (for documentation purposes, or
for telling another node in a distributed network what things you
accept, for instance).

Note that these grammars are strictly less powerful than the ones that
can be expressed using Parsec because we only have a fixed range of
possibilities for each rule, rather than allowing previously parsed
input to determine what the parser will accept in the future. This is
the fundamental reason for using the applicative functor interface
rather than the monad interface here.


I'll give an example grammar for parsing expressions modelled by the
following data type:

> data Expr = ENum Int
>           | ESum Expr Expr
>           | EProduct Expr Expr
>           deriving Show

To define a grammar in this formalism, one first has to define the set
of nonterminals that one wants to use:

> data NT a where
>   Value   :: NT Expr
>   Product :: NT Expr
>   Sum     :: NT Expr

Now, a grammar is simply a function from members of this type to
productions. We use the applicative/alternative functor interface to
build up the productions. Conor's SHE would make this look a lot nicer,
using idiom brackets.

> myGrm :: Grammar NT
> myGrm Value   = ENum <$> posInt
>             <|> id <$ char '(' <*> nt Sum <* char ')'
>
> myGrm Product = EProduct <$> nt Value <* char '*' <*> nt Product
>             <|> id <$> nt Value
>
> myGrm Sum     = ESum <$> nt Product <* char '+' <*> nt Sum
>             <|> id <$> nt Product

This needs a couple of simple functions to make things look nice:

> char :: Char -> Rule nt ()
> char c = Rule [Terminal c $ Stop ()]

> nt :: nt a -> Rule nt a
> nt nonterminal = Rule [NonTerminal nonterminal $ Stop id]

And a general definition for parsing single-digit numbers. This works
for any set of non-terminals, so it is a reusable component that works
for any grammar:

> choice :: Alternative f => [f a] -> f a
> choice = foldl (<|>) empty
>
> digit :: Rule nt Int
> digit = choice [ x <$ char (intToDigit x) | x <- [0..9] ]
>
> posInt :: Rule nt Int
> posInt = fix 1 . reverse <$> some digit
>   where fix n []     = 0
>         fix n (d:ds) = d*n + fix (n*10) ds

Bob


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Dan Piponi

Oct 7, 2009, 12:35:42 PM
to Joe Fredette, haskel...@haskell.org, Günther Schmidt
2009/10/7 Joe Fredette <jfre...@gmail.com>:

> Let me add to this, as I've used the term "DSL" without (*gasp*) fully
> understanding it before.
>
> In addition to "What is a DSL", I'd like to ask:
>
> "How is a DSL different from an API?"

I don't think there is a sharp divide here. A nice example was given
by Pat Hanrahan at the recent nvidia GPU conference. He proposed the
idea that OpenGL was a DSL. His reasoning was that he could give a
formal grammar that accurately captured the structure of many
fragments of code making calls to OpenGL. For example you have blocks
of code bracketed by glBegin() and glEnd() with sequences of
primitives in between. In fact, some people indent their code to
reflect this structure as if glBegin() and glEnd() were control
structures within the host language.

I've argued that every monad gives a DSL. They all have the same
syntax - do-notation, but each choice of monad gives quite different
semantics for this notation. For example the list monad gives a DSL
for non-determinism.
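
For instance (a standard sketch, not from Dan's post):

import Control.Monad (guard)

-- do-notation is the fixed syntax; the list monad supplies the
-- semantics: non-deterministic choice, with failure via guard.
pythagoreanTriples :: Int -> [(Int, Int, Int)]
pythagoreanTriples n = do
  a <- [1 .. n]
  b <- [a .. n]
  c <- [b .. n]
  guard (a*a + b*b == c*c)
  return (a, b, c)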
--
Dan

Don Stewart

Oct 7, 2009, 1:30:40 PM
to Dan Piponi, Günther Schmidt, haskel...@haskell.org
dpiponi:

> 2009/10/7 Joe Fredette <jfre...@gmail.com>:
> > Let me add to this, as I've used the term "DSL" without (*gasp*) fully
> > understanding it before.
> >
> > In addition to "What is a DSL", I'd like to ask:
> >
> > "How is a DSL different from an API?"
>
> I don't think there is a sharp divide here. A nice example was given
> by Pat Hanrahan at the recent nvidia GPU conference. He proposed the
> idea that OpenGL was a DSL. His reasoning was that he could give a
> formal grammar that accurately captured the structure of many
> fragments of code making calls to OpenGL. For example you have blocks
> of code bracketed by glBegin() and glEnd() with sequences of
> primitives in between. In fact, some people indent their code to
> reflect this structure as if glBegin() and glEnd() were control
> structures within the host language.
>
> I've argued that every monad gives a DSL. They all have the same
> syntax - do-notation, but each choice of monad gives quite different
> semantics for this notation. For example the list monad gives a DSL
> for non-determinism.

I've informally argued that a true DSL -- separate from a good API --
should have semantic characteristics of a language: binding forms,
control structures, abstraction, composition. Some have type systems.

Basic DSLs may only have a few characteristics of languages though -- a
(partial) grammar. That's closer to a well-defined API in my books.

-- Don

minh thu

Oct 7, 2009, 2:06:56 PM
to Günther Schmidt, haskel...@haskell.org
2009/10/7 Günther Schmidt <gue.s...@web.de>:
> Hi Don,

>
>> I've informally argued that a true DSL -- separate from a good API --
>> should have semantic characteristics of a language: binding forms,
>> control structures, abstraction, composition. Some have type systems.
>>
>
> That is one requirement that confuses me, abstraction.
>
> I thought of DSLs as "special purpose" languages, ie. you give your DSL
> everything it needs for that purpose.
>
> Why would it also need the ability to express even further abstractions, it
> is supposed to *be* the abstraction.
>
> Günther


Hi,

I meant programming abstractions at the DSL level, not further
abstracting what the DSL covers.

Functions, for instance, are typical abstraction means offered by
programming languages. Even if your language is specific to some
domain, being able to create your own functions, and not only rely on
those provided by the DSL implementation, is important.

Imagine an (E)DSL for 3D programming (e.g. a shading language): the
language is designed to fit the problem well (in this case, 3D
linear algebra, color operations, ...), but you'll agree it would be a
shame not to be able to provide your own functions.
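
For instance (an illustrative sketch, with made-up Vec primitives standing
in for whatever such an EDSL would provide):

data Vec = Vec Double Double Double

add :: Vec -> Vec -> Vec
add (Vec a b c) (Vec x y z) = Vec (a + x) (b + y) (c + z)

scale :: Double -> Vec -> Vec
scale k (Vec x y z) = Vec (k * x) (k * y) (k * z)

-- A user-defined abstraction over DSL values: not provided by the DSL
-- implementation, but definable because the host language allows it.
lerp :: Double -> Vec -> Vec -> Vec
lerp t v w = add (scale (1 - t) v) (scale t w)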

Cheers,
Thu

Ben Franksen

Oct 7, 2009, 2:53:44 PM
to haskel...@haskell.org
minh thu wrote:
> 2009/10/7 Günther Schmidt <gue.s...@web.de>:

>>> I've informally argued that a true DSL -- separate from a good API --
>>> should have semantic characteristics of a language: binding forms,
>>> control structures, abstraction, composition. Some have type systems.
>>>
>>
>> That is one requirement that confuses me, abstraction.
>>
>> I thought of DSLs as "special purpose" languages, ie. you give your DSL
>> everything it needs for that purpose.
>>
>> Why would it also need the ability to express even further abstractions,
>> it is supposed to *be* the abstraction.
>
> Programming abstractions at the DSL level, not to further abstract
> what the DSL covers.
>
> Functions, for instance, are typical abstraction means offered by
> programming languages. Even if your language is specific to some
> domain, being able to create your own functions, and not only rely on
> those provided by the DSL implementation, is important.
>
> Imagine a (E)DSL for 3D programming (e.g. shading language): the
> language is designed to fit well the problem (e.g. in this case, 3D
> linear algebra, color operations, ...) but you'll agree it would be a
> shame to not be able to provide your own functions.

But isn't one of the advantages of an _E_DSL that we can use the host
language (Haskell) as a meta or macro language for the DSL? I would think
that this greatly reduces the need to provide abstraction
facilities /inside/ the DSL. In fact most existing (and often cited
examples of) EDSLs in Haskell do not provide abstraction.
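
For instance, with the deep-embedded Expr type from Bob's message, plain
Haskell functions already act as the macro layer (a sketch):

-- Host-level abstractions over DSL terms: neither 'square' nor 'sumOf'
-- is a construct of the DSL itself.
square :: Expr -> Expr
square e = EProduct e e

sumOf :: [Expr] -> Expr
sumOf = foldr ESum (ENum 0)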

Cheers
Ben

John Van Enk

Oct 7, 2009, 3:00:37 PM
to Ben Franksen, haskel...@haskell.org
On Wed, Oct 7, 2009 at 2:52 PM, Ben Franksen

>
> But isn't one of the advantages of an _E_DSL that we can use the host
> language (Haskell) as a meta or macro language for the DSL?
>

Substantially so. I've used brief examples where the EDSL syntax is
basically the data declaration (perhaps with some operators overloading
constructors) to demonstrate Haskell's fitness as a host language for EDSLs.

This is also a credit to the expressiveness of Haskell's data declarations.

/jve

minh thu

Oct 7, 2009, 3:04:15 PM
to Ben Franksen, haskel...@haskell.org
2009/10/7 Ben Franksen <ben.fr...@online.de>:
> minh thu wrote:
>> 2009/10/7 Günther Schmidt <gue.s...@web.de>:

>>>> I've informally argued that a true DSL -- separate from a good API --
>>>> should have semantic characteristics of a language: binding forms,
>>>> control structures, abstraction, composition. Some have type systems.
>>>>
>>>
>>> That is one requirement that confuses me, abstraction.
>>>
>>> I thought of DSLs as "special purpose" languages, ie. you give your DSL
>>> everything it needs for that purpose.
>>>
>>> Why would it also need the ability to express even further abstractions,
>>> it is supposed to *be* the abstraction.
>>
>> Programming abstractions at the DSL level, not to further abstract
>> what the DSL covers.
>>
>> Functions, for instance, are typical abstraction means offered by
>> programming languages. Even if your language is specific to some
>> domain, being able to create your own functions, and not only rely on
>> those provided by the DSL implementation, is important.
>>
>> Imagine a (E)DSL for 3D programming (e.g. shading language): the
>> language is designed to fit well the problem (e.g. in this case, 3D
>> linear algebra, color operations, ...) but you'll agree it would be a
>> shame to not be able to provide your own functions.
>
> But isn't one of the advantages of an _E_DSL that we can use the host
> language (Haskell) as a meta or macro language for the DSL?

It is.

> I would think
> that this greatly reduces the need to provide abstraction
> facilities /inside/ the DSL. In fact most existing (and often cited
> examples of) EDSLs in Haskell do not provide abstraction.

Even when you have good macro support, you don't code everything at
the macro level. But it all depends on the particular EDSL we talk
about. If the EDSL is close to a regular programming language, it is
likely to provide the ability to create functions.

Cheers,
Thu

Robert Atkey

Oct 7, 2009, 3:18:31 PM
to Günther Schmidt, haskel...@haskell.org

> What is a DSL?

How about this as a formal-ish definition, for at least a pretty big
class of DSLs:

A DSL is an algebraic theory in the sense of universal algebra, i.e. it
is an API of a specific form, which consists of:
 a) a collection of abstract types, the carriers; these need not all be
    of kind *.
 b) a collection of operations, of type
        t1 -> t2 -> ... -> tn
    where tn must be one of the carrier types from (a), but the others
    can be any types you like.
 c) (Optional) a collection of properties about the operations (e.g.
    equations that must hold).

Haskell has a nice way of specifying such things (except part (c)): type
classes.

Examples of type classes that fit this schema include Monad, Applicative
and Alternative. Ones that don't include Eq, Ord and Show. The Num type
class would be, if it didn't specify Eq and Show as superclasses.
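
A minimal example of a type class fitting this schema (essentially Monoid
spelled out; the names are illustrative):

class MonoidDSL m where          -- (a) one carrier: m
  unit :: m                      -- (b) operations targeting the carrier
  mult :: m -> m -> m
  -- (c) laws, stated only as comments:
  --       mult unit x == x,  mult x unit == x
  --       mult x (mult y z) == mult (mult x y) z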

An implementation of a DSL is just an implementation of the corresponding
type class. Shallowly embedded DSLs dispense with the type class step
and just give a single implementation. Deeply embedded implementations
are *initial* implementations: there is a unique function from the deep
embedding to any of the other implementations that preserves all the
operations. The good thing about this definition is that anything we do
to the deep embedding, we can do to any of the other implementations via
the unique map.

Thanks to Church and Reynolds, we can always get a deep embedding for
free (free as in "Theorems for Free"). If our DSL is defined by some
type class T, then the deep embedding is:
type DeepT = forall a. T a => a
(and so on, for multiple carrier types, possibly with type
parameterisation).

Of course, there is often an easier and more efficient way of
representing the initial algebra using algebraic data types.
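
Continuing the MonoidDSL sketch above, extended with a generator so that
there are interesting closed terms (needs RankNTypes; all names are
illustrative):

class MonoidDSL m => CharGenDSL m where
  gen :: Char -> m

-- The deep embedding "for free":
type Deep = forall m. CharGenDSL m => m

-- The easier, more efficient algebraic-data-type representation of the
-- initial algebra: up to the monoid laws it is just a string.
newtype Str = Str String deriving Show

instance MonoidDSL Str where
  unit = Str ""
  mult (Str x) (Str y) = Str (x ++ y)

instance CharGenDSL Str where
  gen c = Str [c]

-- The unique operation-preserving map out of the deep embedding is
-- just instantiation:
runDeep :: CharGenDSL m => Deep -> m
runDeep d = d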

Conor McBride often goes on about how the initial algebra (i.e. the deep
embedding) of a given specification is the one you should be worrying
about, because it often has a nice concrete representation and gives you
all you need to reason about any of the other implementations.

Bob


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________

Emil Axelsson

Oct 7, 2009, 3:54:15 PM
to Ben Franksen, haskel...@haskell.org
Ben Franksen wrote:

I would say that the DSL is what the user sees. In this view, I think
it's correct to say that many (or most) DSLs need function abstraction.
Whether or not the internal data structure has function abstraction is
an implementation detail.

/ Emil

Creighton Hogg

Oct 7, 2009, 8:15:00 PM
to Robert Atkey, haskel...@haskell.org, Günther Schmidt
2009/10/7 Robert Atkey <bob....@ed.ac.uk>:

It's funny, because I wouldn't have thought about this in terms of
type classes off the top of my head. What I've been thinking about a
lot lately (because I'm trying to prepare notes on it) is building
classifying categories from signatures, then considering the category
of all possible functorial "models" (read: "DSL embeddings") into the
target category. I guess we're essentially talking about the same
thing. The difference from looking at it as type classes is that you
really do get all your equations preserved with product-preserving
functors from your classifying category. However, the topic came up
earlier today of what a language would look like if it had a built-in
notion of functorial semantics -- my guess is that it'd be like a
stronger version of ML functors, but I don't really know.

Cheers,
C

George Pollard

Oct 8, 2009, 7:01:20 AM
to haskel...@haskell.org
I'd also like to note that the canonical pronunciation of DSL ends in "-izzle".

Colin Paul Adams

Oct 8, 2009, 7:08:30 AM
to George Pollard, haskel...@haskell.org
>>>>> "George" == George Pollard <por...@porg.es> writes:

George> I'd also like to note that the canonical pronunciation of
George> DSL ends in "-izzle".

Whose canon?

Interestingly, I have always assumed the canonical pronunciation of
DSSSL was diesel, as JADE stands for JAmes's DSSSL Engine.

I don't see why removing extra S-es should shorten the vowel.
--
Colin Adams
Preston Lancashire

Leonel Fonseca

Oct 9, 2009, 1:29:52 AM
to
Hi,

> > What is a DSL?

A small language with just the objects and operations required to
specify behaviour in a narrow domain.

What I mean is, a DSL is a tool for an end-user who may not be
interested in the technological underpinnings of the DSL's
implementation.

Are simplicity and regularity the critical success factors when
implementing a DSL?

Regards.

Ben Franksen

Oct 9, 2009, 6:00:02 PM
to haskel...@haskell.org

If it is a stand-alone DSL (i.e. with its own parser), then yes. But I was
referring to Embedded DSLs, i.e. DSL as a library in a host language (eg
Haskell). In this case the user sees the host language by construction,
which means she has less need of function abstraction /inside/ the DSL.

Cheers
Ben

Gregg Reynolds

Oct 9, 2009, 6:12:10 PM
to Colin Paul Adams, haskel...@haskell.org
On Thu, Oct 8, 2009 at 6:08 AM, Colin Paul Adams
<co...@colina.demon.co.uk> wrote:

> >>>>> "George" == George Pollard <por...@porg.es> writes:
>
> George> I'd also like to note that the canonical pronunciation of
> George> DSL ends in "-izzle".
>
> Whose canon?
>
> Interestingly, I have always assumed the canonical pronunciation of
> DSSSL was diesel, as JADE stands for JAmes's DSSSL Engine.
>
> I don't see why removing extra S-es should shorten the vowel.
>

Wht vwl? U mst b Englsh. 2 n Amrcn, DSSSL is "dissel"; all short vowels.
DSL is "dee-ess-ell". "Dizzle" is a brbrzm.


-grgg

Colin Paul Adams

unread,
Oct 9, 2009, 6:48:27 PM10/9/09
to Gregg Reynolds, haskel...@haskell.org
>>>>> "Gregg" == Gregg Reynolds <d...@mobileink.com> writes:

Gregg> On Thu, Oct 8, 2009 at 6:08 AM, Colin Paul Adams
Gregg> <co...@colina.demon.co.uk>wrote:

> >>>>> "George" == George Pollard <por...@porg.es> writes:
>>
George> I'd also like to note that the canonical pronunciation of
George> DSL ends in "-izzle".
>>
>> Whose canon?
>>
>> Interestingly, I have always assumed the canonical
>> pronunciation of DSSSL was diesel, as JADE stands for JAmes's
>> DSSSL Engine.
>>
>> I don't see why removing extra S-es should shorten the vowel.
>>
>> Wht vwl? U mst b Englsh. 2 n Amrcn, DSSSL is "dissel"; all
>> short vowels.

Certainly I am English, and so is James Clark.

Ben Franksen

Oct 10, 2009, 2:12:46 PM
to haskel...@haskell.org
Robert Atkey wrote:
> A deep embedding of a parsing DSL (really a context-sensitive grammar
> DSL) would look something like the following. I think I saw something
> like this in the Agda2 code somewhere, but I stumbled across it when I
> was trying to work out what "free" applicative functors were.
>
> [snip code & explanation]

This is extremely cool. I tried to understand in my head how this all works
but it just didn't click. It all seemed like magic.

Then I went ahead and tried to write a printer for your example grammar and
now everything is much clearer. Although I had to fight the type checker
quite a bit. This is the generic part:

> import Data.List (intersperse)

> class Print f where
>   pr :: f a -> String

> instance Print nt => Print (Production nt) where
>   pr = printProduction

> printProduction :: Print nt => Production nt a -> String
> printProduction (Stop _) = ""
> printProduction (Terminal t (Stop _)) = show t
> printProduction (Terminal t p) = show t ++ " " ++ printProduction p
> printProduction (NonTerminal nt (Stop _)) = pr nt
> printProduction (NonTerminal nt p) = pr nt ++ " " ++ printProduction p

> instance Print nt => Print (Rule nt) where
>   pr (Rule ps) = printPs ps where
>     printPs []     = ""
>     printPs [p]    = printProduction p
>     printPs (p:ps) = printProduction p ++ " | " ++ printPs ps

> data Any f = forall a. Any (f a)

> class Enumerable f where
>   enumeration :: [Any f]

> printRule :: Print nt => (nt a -> Rule nt a) -> nt a -> String
> printRule g nt = pr nt ++ " ::= " ++ pr (g nt)

> printGrammar :: (Print nt, Enumerable nt) => Grammar nt -> String
> printGrammar g = foldr (++) "" (intersperse "\n" rules) where
>   rules = map printAnyRule enumeration
>   printAnyRule (Any nt) = printRule g nt

We must also provide instances for the concrete types:

> instance Enumerable NT where
>   enumeration = [Any Sum, Any Product, Any Value]

> instance Print NT where
>   pr Value   = "Value"
>   pr Product = "Product"
>   pr Sum     = "Sum"

So far so good. This even works... almost ;-)

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Interrupted. -- had to hit Ctrl-C here

When I replace 'posInt' with 'digit' in the rule for Value

> myGrm Value = ENum <$> digit
>           <|> id <$ char '(' <*> nt Sum <* char ')'

then the printer terminates just fine:

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '(' Sum ')'

I found that the problem is the use of function 'some' from
Control.Applicative in

> posInt :: Rule nt Int
> posInt = fix 1 . reverse <$> some digit where
>   fix n []     = 0
>   fix n (d:ds) = d*n + fix (n*10) ds

Since 'some' is defined recursively, this creates an infinite production for
numbers that you can neither print nor otherwise analyse in finite time.

I can see at least two solutions: One is to parameterize everything over the
type of terminals, too. A type suitable for the example would be

> data T = TNum Int | TPlus | TMult | TOParen | TCParen

and leave token recognition to a separate scanner.

The second solution (which I followed) is to break the recursion by adding
another nonterminal to the NT type:

> data NT a where
>   Sum     :: NT Expr
>   Product :: NT Expr
>   Value   :: NT Expr
>   Number  :: NT [Int]
>   Digit   :: NT Int

> instance Enumerable NT where
>   enumeration = [Any Sum, Any Product, Any Value, Any Number, Any Digit]

> instance Print NT where
>   pr Sum     = "Sum"
>   pr Product = "Product"
>   pr Value   = "Value"
>   pr Number  = "Number"
>   pr Digit   = "Digit"

(Adding Digit /and/ Number is not strictly necessary, but it makes for a
nicer presentation.)

> myGrm :: Grammar NT
> myGrm Sum     = ESum <$> nt Product <* char '+' <*> nt Sum
>             <|> id <$> nt Product
>
> myGrm Product = EProduct <$> nt Value <* char '*' <*> nt Product
>             <|> id <$> nt Value
>
> myGrm Value   = (ENum . toNat) <$> nt Number
>             <|> id <$ char '(' <*> nt Sum <* char ')'
>
> myGrm Number  = extend <$> nt Digit <*> optional (nt Number)
>
> myGrm Digit   = digit

> extend d Nothing   = [d]
> extend d (Just ds) = d:ds

> toNat :: [Int] -> Int
> toNat = fix 1 . reverse where
>   fix n []     = 0
>   fix n (d:ds) = d*n + fix (n*10) ds

With this I get

*Main> putStrLn $ printGrammar myGrm
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Number | '(' Sum ')'
Number ::= Digit Number | Digit
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

Moral: Be careful with recursive functions when constructing a data
representation (e.g. for a deep DSL). You might get an infinite
representation which isn't what you want in this case.

Oh, and another point: there should be a distinguished "start" nonterminal,
otherwise this is not really a grammar. This suggests something like

> type Grammar nt a = (nt a, forall b. nt b -> Rule nt b)

Next thing I'll try is to transform such a grammar into an actual parser...

Cheers
Ben

Ben Franksen

Oct 11, 2009, 3:55:10 PM
to haskel...@haskell.org
Ben Franksen wrote:
> Next thing I'll try is to transform such a grammar into an actual
> parser...

Which I also managed to get working. However, this exposed yet another
problem I am not sure how to solve.

The problem manifests itself with non-left-factored rules like

Number ::= Digit Number | Digit

Translating such a grammar directly into a Parsec parser leads to parse
errors because Parsec's choice operator is predictive: if a production has
consumed any input the whole choice fails, even if alternative productions
would not:

*Main> P.parseTest (parseGrammar myGrm) "2"
parse error at (line 1, column 2):
unexpected end of input
expecting Number

Of course, one solution is to apply Parsec's try combinator to all choices
in a rule. But this rather defeats the purpose of using a (by default)
predictive parser in the first place which is to avoid unnecessary
backtracking.

So, a better solution is to left-factor the grammar before translating to
Parsec. Since we have a data representation of the grammar that we can
readily analyse and transform, this should be possible given some suitable
algorithm. But how is this transformation to be typed?
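
(For concreteness: left-factoring the offending rule introduces a fresh
nonterminal, here called NumberTail purely for illustration,

 Number ::= Digit NumberTail
 NumberTail ::= Number | epsilon

so the common prefix Digit is shared, and the choice between the two
original productions is deferred until after it has been consumed.)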

My first naive attempt was to define (recap: nt :: * -> * is the type of
nonterminals, t :: * is the type of terminals a.k.a. tokens, and a is the
result type):

> leftFactor :: Grammar nt t a -> Grammar nt t a

Of course, this is wrong: left-factoring is expected to introduce new
nonterminals, so on the right-hand side of the type we should not have the
same 'nt' as on the left. Instead we should have some other type that
is "'nt' extended with new constructors". Moreover, we cannot statically
know how many new nonterminals are added, so we cannot simply apply a type
function to nt. Is this solvable at all in Haskell or do I need proper
dependent types to express this?

I have very vague ideas that revolve around setting up some recursive type
function that on each level adds one constructor, define a common interface
with a (multiparam) type class and then use existential quantification in
the result type to hide the resulting type of nonterminals.

The next question is: Even if this turns out to be possible, isn't it
overkill? Maybe it is better to use an infinite type for the nonterminals
in the first place and let the grammar be a partial function? OTOH, the
formulation of the grammar as a function that pattern matches on the
nonterminals is very elegant.

Ben Franksen

Oct 11, 2009, 6:01:43 PM
to haskel...@haskell.org
Ben Franksen wrote:
> Ben Franksen wrote:
>> Next thing I'll try is to transform such a grammar into an actual
>> parser...
>
> Which I also managed to get working.

First, before all this talking to myself here is boring you to death, please
shout and I'll go away. Anyway, at least one person has privately expressed
interest, so I'll post my code for the translation.(*)

> {-# LANGUAGE ExistentialQuantification, GADTs, Rank2Types #-}
> {-# LANGUAGE TypeSynonymInstances, MultiParamTypeClasses,
>              ImpredicativeTypes #-}
> import qualified Text.ParserCombinators.Parsec as P

Note that I have parameterized everything on the token (terminal) type. Here
are the data types; adapting the rest of the code is completely mechanical.

> data Production nt t a
>   = Stop a
>   | Terminal t (Production nt t a)
>   | forall b. NonTerminal (nt b) (Production nt t (b -> a))

> newtype Rule nt t a = Rule [Production nt t a]

> type RuleSet nt t = forall a. nt a -> Rule nt t a

> type Grammar nt t b = (nt b, RuleSet nt t)

I should probably turn this into a proper data type, which would BTW also
make the ImpredicativeTypes extension unnecessary.

Translation to Parsec
---------------------

We restrict ourselves to Char as terminals for simplicity. The
generalization to arbitrary token types would need another three arguments:
showTok :: (tok -> String), nextPos :: (SourcePos -> tok -> [tok] ->
SourcePos), and testTok :: (tok -> Maybe a), which are needed by
P.tokenPrim.

> parseGrammar :: Print nt => Grammar nt Char a -> P.Parser a
> parseGrammar (start,rules) = parseNonTerminal rules start

> parseNonTerminal :: Print nt => RuleSet nt Char -> nt a -> P.Parser a
> parseNonTerminal rs nt = parseRule rs (rs nt) P.<?> pr nt

> parseRule :: Print nt => RuleSet nt Char -> Rule nt Char a -> P.Parser a
> parseRule rs (Rule ps) =
>   P.choice (map ({- P.try . -} parseProduction rs) ps)

> parseProduction :: Print nt
>                 => RuleSet nt Char -> Production nt Char a -> P.Parser a
> parseProduction _  (Stop x)           = return x
> parseProduction rs (Terminal c p)     = P.char c >> parseProduction rs p
> parseProduction rs (NonTerminal nt p) = do
>   vnt <- parseNonTerminal rs nt
>   vp  <- parseProduction rs p
>   return (vp vnt)

This is really not difficult, once you understand how the list-like
Production type works. The trick is that a NonTerminal forces the "rest" of
the production to return a function type, so you can apply its result to
the result of parsing the nonterminal. Whereas the result of parsing
terminals gets ignored by the "rest" of the production. You might wonder
how the code manages to return the correct integer values inside an ENum.
Well, I did, at least. I don't yet understand it completely but I think the
answer is in the Functor and Applicative instances: all the code that
interprets syntactic elements (up to the abstract syntax) inside the myGrm
function gets pushed down through the elements of a production until it
ends up at a Stop, where we can finally pull it out (see the first clause
of parseProduction).
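
(To make that concrete: a rule fragment like "ENum <$> nt Number"
elaborates, via the nt helper and the Functor instance, to the production

 NonTerminal Number (Stop ENum)

up to eta-expansion, so the semantic action ENum sits in the final Stop,
waiting to be applied to the value parsed for the Number nonterminal.)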

Note also the (commented-out) use of P.try in function parseRule. Let's try
it:

*Main> putStrLn (printGrammar myGrm)
*Start ::= Sum
Sum ::= Product '+' Sum | Product
Product ::= Value '*' Product | Value
Value ::= Number | '(' Sum ')'
Number ::= Digit Number | Digit
Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

*Main> P.parseTest (parseGrammar myGrm) "2*(2+52)"
parse error at (line 1, column 2):
unexpected "*"
expecting Number

After re-inserting the P.try call, I can actually parse expressions (yay!):

*Main> :r
[1 of 1] Compiling Main ( Grammar.lhs, interpreted )
Ok, modules loaded: Main.
*Main> P.parseTest (parseGrammar myGrm) "2*(2+52)"
EProduct (ENum 2) (ESum (ENum 2) (ENum 52))

BTW, does anyone know a source (books, papers, blogs, whatever) about
algorithms for automatic left-factoring? I searched with google and found
some interesting papers on eliminating left recursion but nothing so far on
left-factoring. Have these problems all been solved before the internet
age?

Cheers
Ben

(*) One of these days I really should get my hands dirty and set up a
weblog; suggestions for how to proceed are appreciated. I would especially
like something where I can just upload a literate Haskell file and it gets
formatted automagically. Bonus points for beautifying operator symbols a la
lhs2tex ;-)

Brandon S. Allbery KF8NH

Oct 11, 2009, 6:30:37 PM
to Ben Franksen, haskel...@haskell.org
On Oct 11, 2009, at 18:00 , Ben Franksen wrote:
> Ben Franksen wrote:
>> Ben Franksen wrote:
>>> Next thing I'll try is to transform such a grammar into an actual
>>> parser...
>>
>> Which I also managed to get working.
>
> First, before all this talking to myself here is boring you to
> death, please
> shout and I'll go away. Anyway, at least one person has privately
> expressed
> interest, so I'll post my code for the translation.(*)

It's -cafe, so let 'er rip. And maybe write it up for TMR, if you
don't want to set up a blog with all the goodies?

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] all...@kf8nh.com
system administrator [openafs,heimdal,too many hats] all...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH



Sjoerd Visscher

Oct 12, 2009, 9:50:17 AM
to Robert Atkey, Haskell Cafe mailing list
Hi Bob,

I tried to understand this by applying what you said here to your deep
embedding of a parsing DSL. But I can't figure out how to do that.
What things become the type class T?

greetings,
Sjoerd

--
Sjoerd Visscher
sjo...@w3future.com

S. Doaitse Swierstra

unread,
Oct 12, 2009, 11:26:25 AM10/12/09
to Ben Franksen, haskel...@haskell.org
This problem of dynamically transforming grammars and building parsers
out of them is addressed in:

@inproceedings{1411296,
  author    = {Viera, Marcos and Swierstra, S. Doaitse and Lempsink, Eelco},
  title     = {Haskell, do you read me?: constructing and composing
               efficient top-down parsers at runtime},
  booktitle = {Haskell '08: Proceedings of the first ACM SIGPLAN
               symposium on Haskell},
  year      = {2008},
  isbn      = {978-1-60558-064-7},
  pages     = {63--74},
  location  = {Victoria, BC, Canada},
  doi       = {http://doi.acm.org/10.1145/1411286.1411296},
  publisher = {ACM},
  address   = {New York, NY, USA},
}

and the code can be found on Hackage under the name ChristmasTree.
The left-factorisation is explained in a paper we presented at the
last LDTA, which will appear in ENTCS. Since we have signed some
copyright form I do not think I can attach it here, but if you send me
a mail, I can definitely send you the paper.

Doaitse

Robert Atkey

Oct 12, 2009, 1:13:03 PM
to Sjoerd Visscher, Haskell Cafe mailing list
On Mon, 2009-10-12 at 15:49 +0200, Sjoerd Visscher wrote:
> Hi Bob,
>
> I tried to understand this by applying what you said here to your deep
> embedding of a parsing DSL. But I can't figure out how to do that.
> What things become the type class T?

Here's the "API" version of the grammar DSL:

class GrammarDSL grammar where
  type Rule grammar :: (* -> *) -> * -> *

  pure  :: a -> Rule grammar nt a
  (<*>) :: Rule grammar nt (a -> b) -> Rule grammar nt a ->
           Rule grammar nt b

  empty :: Rule grammar nt a
  (<|>) :: Rule grammar nt a -> Rule grammar nt a ->
           Rule grammar nt a

  char :: Char -> Rule grammar nt ()
  nt   :: nt a -> Rule grammar nt a

  grammar :: forall nt a. nt a ->
             (forall a. nt a -> Rule grammar nt a) -> grammar nt a


The language of typed-grammars-with-actions is composed of:

* two sorts: "grammar"s and "rule"s

* "rule"s support the applicative and alternative interfaces, and also
two special operators for incorporating terminals and nonterminals into
rules.

* "grammar"s support a single operation: taking a nonterminal-indexed
collection of rules, and a starting non-terminal (as Ben Franksen
pointed out), producing a grammar.

Basically, the idea is to think 1) "what are the syntactic categories of
my DSL?", these become the sorts; 2) "what are the basic syntactic
constructions of my DSL?", these become the operations of the type
class. Because we are embedded in Haskell, we can have infinite syntax,
as demonstrated by the "grammar" operation.

WRT the recipe for getting deep embeddings in my previous post, it isn't
quite true that the type

Grammar nt a = forall grammar. GrammarDSL grammar => grammar nt a

is isomorphic to the deep embedding I posted before, because it doesn't
guarantee that the applicative functor or alternative laws hold, while
the deep embedding does (and it also ensures that <*> and <|>
distribute). It isn't hard to come up with a deep embedding that is
initial for the completely free version though. The deep embedding from
the previous post is an instance of this type class. So is, as Ben
Franksen showed, a Parsec parser.


I ended up having to inline the applicative and alternative interfaces
into the class definition above. I wanted to write:

class (Applicative (Rule grammar nt), Alternative (Rule grammar nt)) =>
Grammar grammar where ...

but GHC wouldn't let me, complaining that 'nt' wasn't bound. Is there
any reason this couldn't be made to work?

Ben Franksen

Oct 12, 2009, 5:06:31 PM
to haskel...@haskell.org
S. Doaitse Swierstra wrote:
> This problem of dynamically transforming grammars and bulding parsers
> out of it is addressed in:
>
> @inproceedings{1411296,
> author = {Viera, Marcos and Swierstra, S. Doaitse and Lempsink,
> Eelco},
> title = {Haskell, do you read me?: constructing and composing
> efficient top-down parsers at runtime},
> [...]
> }

Indeed, it looks as if you solved exactly the problem that vexed me! I had
just found the presentation that corresponds to the paper you mention, and
I also found a preprint for "Typed transformations of Typed Abstract
Syntax" which I am studying at the moment. I must say your construction is,
well, involved...

Not that I want to belittle this really astounding and clever achievement...
but one disadvantage of your approach (which it shares with almost all
examples I have seen for dependently typed or heterogeneous collections) is
that (IIUC) the typed map from references to abstract syntactic terms is
operationally an association list, indexed by unary numbers. I would expect
this to scale poorly if the number of references (e.g. nonterminals) gets
large.

I think it would make for quite an interesting research project to study
whether it is possible to achieve the same level of precise static typing
with more efficient data structures. I imagine that using some 'fake
dependent type' variant of [Bit] for the key and the equivalent of a
patricia tree for the map could perhaps be made to work???

Brent Yorgey

Oct 12, 2009, 8:19:28 PM
to haskel...@haskell.org
On Sun, Oct 11, 2009 at 06:29:58PM -0400, Brandon S. Allbery KF8NH wrote:
> On Oct 11, 2009, at 18:00 , Ben Franksen wrote:
>> Ben Franksen wrote:
>>> Ben Franksen wrote:
>>>> Next thing I'll try is to transform such a grammar into an actual
>>>> parser...
>>>
>>> Which I also managed to get working.
>>
>> First, before all this talking to myself here is boring you to death,
>> please
>> shout and I'll go away. Anyway, at least one person has privately
>> expressed
>> interest, so I'll post my code for the translation.(*)
>
> It's -cafe, so let 'er rip. And maybe write it up for TMR, if you don't

Yes please! =)

-Brent

Nils Anders Danielsson

Oct 13, 2009, 8:29:37 AM
to Robert Atkey, haskel...@haskell.org
On 2009-10-07 17:29, Robert Atkey wrote:
> A deep embedding of a parsing DSL (really a context-sensitive grammar
> DSL) would look something like the following. I think I saw something
> like this in the Agda2 code somewhere, but I stumbled across it when I
> was trying to work out what "free" applicative functors were.

The Agda code you saw may have been due to Ulf Norell and me. There is a
note about it on my web page:

http://www.cs.nott.ac.uk/~nad/publications/danielsson-norell-parser-combinators.html

> Note that these grammars are strictly less powerful than the ones that
> can be expressed using Parsec because we only have a fixed range of
> possibilities for each rule, rather than allowing previously parsed
> input to determine what the parser will accept in the future.

Previously parsed input /can/ determine what the parser will accept in
the future (as pointed out by Peter Ljunglöf in his licentiate thesis).
Consider the following grammar for the context-sensitive language
{aⁿbⁿcⁿ | n ∈ ℕ}:

data NT a where
  Start :: NT ()                 -- Start ∷= aⁿbⁿcⁿ
  ABC   :: Nat -> NT ()          -- ABC n ∷= aˡbⁿ⁺ˡcⁿ⁺ˡ
  X     :: Char -> Nat -> NT ()  -- X x n ∷= xⁿ

g :: Grammar NT
g Start   = nt (ABC 0)
g (ABC n) = char 'a' <* nt (ABC (succ n))
        <|> nt (X 'b' n) <* nt (X 'c' n)
g (X c n)
  | n == 0    = pure ()
  | otherwise = char c <* nt (X c (pred n))

> And a general definition for parsing single-digit numbers. This works
> for any set of non-terminals, so it is a reusable component that works
> for any grammar:

Things become more complicated if the reusable component is defined
using non-terminals which take rules (defined using an arbitrary
non-terminal type) as arguments. Exercise: Define a reusable variant of
the Kleene star, without using grammars of infinite depth.

--
/NAD



Robert Atkey

Oct 22, 2009, 9:45:03 AM
to Ben Franksen, haskel...@haskell.org
On Sat, 2009-10-10 at 20:12 +0200, Ben Franksen wrote:

> Since 'some' is defined recursively, this creates an infinite production for
> numbers that you can neither print nor otherwise analyse in finite time.

Yes, sorry, I should have been more careful there. One has to be careful
to handle EDSLs that have potentially infinite syntax properly.

> I can see at least two solutions: One is to parameterize everything over the
> type of terminals, too.

> The second solution (which I followed) is to break the recursion by adding
> another nonterminal to the NT type:

A third solution is to add the Kleene star to the grammar DSL, so the
representation of productions becomes:

> data Production nt a
>   = Stop a
>   | Terminal Char (Production nt a)
>   | forall b. NonTerminal (nt b) (Production nt (b -> a))
>   | forall b. Star (Production nt b) (Production nt ([b] -> a))

You also need to add the necessary parts for Alternative to the
Production type too, because they may be nested inside Star
constructors:

>   | Zero
>   | Choose (Production nt a) (Production nt a)

The type Production nt is now directly an Applicative and an Alternative,
and also has the combinator:

> star :: Production nt a -> Production nt [a]
> star p = Star p $ Stop id

The type of grammars is changed to (with the addition of the starting
nonterminal, as you point out):

> type Grammar nt = forall a. nt a -> Production nt a

It is probably also possible to write a function that converts grammars
with "Star"s into ones without, by introducing new non-terminals in the
way you did.

Bob

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Robert Atkey

unread,
Oct 22, 2009, 9:47:25 AM10/22/09
to Ben Franksen, haskel...@haskell.org
On Sun, 2009-10-11 at 21:54 +0200, Ben Franksen wrote:
> Ben Franksen wrote:
> > Next thing I'll try is to transform such a grammar into an actual
> > parser...
>
> Which I also managed to get working. However, this exposed yet another
> problem I am not sure how to solve.

Another option is to not use a recursive descent parser, and to switch to
a parsing algorithm that handles any context-free grammar, such as CYK or
Earley's algorithm. A little test implementation of a well-typed version
of the CYK algorithm seems to work and seems to be as efficient as the
normal imperative one if enough memoisation is used. I'm trying to see if
I can get Earley's algorithm to work nicely in the well-typed setting.

Bob


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Robert Atkey

Oct 22, 2009, 9:57:31 AM
to Nils Anders Danielsson, haskel...@haskell.org
On Tue, 2009-10-13 at 13:28 +0100, Nils Anders Danielsson wrote:
> On 2009-10-07 17:29, Robert Atkey wrote:
> > A deep embedding of a parsing DSL (really a context-sensitive grammar
> > DSL) would look something like the following. I think I saw something
> > like this in the Agda2 code somewhere, but I stumbled across it when I
> > was trying to work out what "free" applicative functors were.
>
> The Agda code you saw may have been due to Ulf Norell and me. There is a
> note about it on my web page:
>
> http://www.cs.nott.ac.uk/~nad/publications/danielsson-norell-parser-combinators.html

Yes, it might have been that, OTOH I'm sure I saw it in some Haskell
code. Maybe I was imagining it.

> > Note that these grammars are strictly less powerful than the ones that
> > can be expressed using Parsec because we only have a fixed range of
> > possibilities for each rule, rather than allowing previously parsed
> > input to determine what the parser will accept in the future.
>
> Previously parsed input /can/ determine what the parser will accept in
> the future (as pointed out by Peter Ljunglöf in his licentiate thesis).
> Consider the following grammar for the context-sensitive language
> {aⁿbⁿcⁿ| n ∈ ℕ}:

Yes, sorry, I was sloppy in what I said there. Do you know of a
characterisation of which languages having a possibly infinite number of
nonterminals gives you? Is it all context-sensitive languages, or a
subset?

> > And a general definition for parsing single-digit numbers. This works
> > for any set of non-terminals, so it is a reusable component that works
> > for any grammar:
>
> Things become more complicated if the reusable component is defined
> using non-terminals which take rules (defined using an arbitrary
> non-terminal type) as arguments. Exercise: Define a reusable variant of
> the Kleene star, without using grammars of infinite depth.

I see that you have an answer in the paper you linked to above. Another
possible answer is to consider open sets of rules in a grammar:

> data OpenRuleSet inp exp =
>   forall hidden. OpenRuleSet
>     (forall a. (exp :+: hidden) a -> Rule (exp :+: hidden :+: inp) a)

> data (f :+: g) a = Left2 (f a) | Right2 (g a)

So OpenRuleSet inp exp exports definitions of the nonterminals in
'exp', imports definitions of the nonterminals in 'inp', and has a
collection of hidden nonterminals.

It is then possible to combine them with a function of type:

> combineG :: (inp1 :=> exp1 :+: inp) ->
>             (inp2 :=> exp2 :+: inp) ->
>             OpenRuleSet inp1 exp1 ->
>             OpenRuleSet inp2 exp2 ->
>             OpenRuleSet inp (exp1 :+: exp2)

One can then give a reusable Kleene star by stating it as an open rule
set:

> star :: forall a nt. Rule nt a -> OpenRuleSet nt (Equal [a])

where Equal is the usual equality GADT.
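
For reference, a minimal sketch of Equal (with GADTs), together with
one plausible reading of (:=>) as a nonterminal renaming; the
definition of (:=>) was not given above, so that part is a guess:

> data Equal a b where
>   Refl :: Equal a a

> infixr 0 :=>
> type f :=> g = forall a. f a -> g a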

Obviously, this would be a bit clunky to use in practice, but maybe more
specialised versions of combineG could be given.

Bob



_______________________________________________

Nils Anders Danielsson

unread,
Oct 27, 2009, 9:44:47 PM10/27/09
to Robert Atkey, haskel...@haskell.org
On 2009-10-22 14:56, Robert Atkey wrote:
> Yes, it might have been that, OTOH I'm sure I saw it in some Haskell
> code. Maybe I was imagining it.

There is some related Haskell code in the Agda repository.

> Do you know of a characterisation of which languages having a possibly
> infinite number of nonterminals gives you? Is it all context-sensitive
> languages, or a subset?

I found a PhD thesis by Marvin Solomon (Cornell, 1977) which mentions
the following, in retrospect obvious, fact: With an infinite set of
non-terminals you can represent /any/ (countable) language, by using one
non-terminal for every string in the language.
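
A minimal sketch of this trick in the Grammar/Production embedding from
earlier in the thread (N and anyCountable are invented names; enum is
assumed to enumerate the language): nonterminal N i derives either the
i-th string or whatever N (i+1) derives, and since every derivation is
finite, N 0 derives exactly the range of enum.

> data NT a where
>   N :: Int -> NT ()   -- one nonterminal per position in the enumeration

> anyCountable :: (Int -> String) -> Grammar NT
> anyCountable enum (N i) =
>   Choose (foldr Terminal (Stop ()) (enum i))   -- derive the i-th string,
>          (NonTerminal (N (i + 1)) (Stop id))   -- or defer to the next one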

I adapted this argument to show that a total parser combinator library
which I have implemented in Agda has exactly the same expressive power
as (total) functions of type List Bool → List R (assuming the token type
is Bool):

Parser combinators are as expressive as possible
http://sneezy.cs.nott.ac.uk/fplunch/weblog/?p=271

--
/NAD



_______________________________________________

Nils Anders Danielsson

unread,
Oct 27, 2009, 10:14:49 PM10/27/09
to Robert Atkey, Ben Franksen, haskel...@haskell.org
On 2009-10-22 14:44, Robert Atkey wrote:
> On Sat, 2009-10-10 at 20:12 +0200, Ben Franksen wrote:
>
>> Since 'some' is defined recursively, this creates an infinite production for
>> numbers that you can neither print nor otherwise analyse in finite time.
>
> Yes, sorry, I should have been more careful there. One has to be careful
> to handle EDSLs that have potentially infinite syntax properly.

I find it useful to carefully distinguish between inductive and
coinductive components of the syntax. Consider the following recogniser
combinator language, implemented in Agda, for instance:

data P : Bool → Set where
  ∅   : P false
  ε   : P true
  tok : Bool → P false
  _∣_ : ∀ {n₁ n₂} → P n₁ → P n₂ → P (n₁ ∨ n₂)
  _·_ : ∀ {n₁ n₂} → P n₁ → ∞? n₁ (P n₂) → P (n₁ ∧ n₂)

The recognisers are indexed on their nullability; the index is true iff
the recogniser accepts the empty string. The definition of P is
inductive, except that the right argument of the sequencing combinator
(_·_) is allowed to be coinductive when the left argument does not
accept the empty string:

∞? : Bool → Set → Set
∞? true  A = A
∞? false A = ∞ A

(You can read ∞ A as a suspended computation of type A.) The limitations
imposed upon coinduction in the definition of P ensure that recognition
is decidable. For more details, see
http://www.cs.nott.ac.uk/~nad/listings/parser-combinators/TotalRecognisers.html.
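
A minimal sketch of what the index rules out, as a naive
list-of-successes recogniser in Haskell (my own sketch, unrelated to
the Agda code): the Kleene star of a nullable recogniser makes no
progress, so recognition is no longer decidable.

> newtype Rec = Rec { run :: String -> [String] }   -- list of leftovers

> eps :: Rec                    -- accepts the empty string
> eps = Rec (\s -> [s])

> tok' :: Char -> Rec           -- accepts a single given token
> tok' c = Rec (\s -> [ cs | (c' : cs) <- [s], c == c' ])

> (|||) :: Rec -> Rec -> Rec    -- choice
> Rec p ||| Rec q = Rec (\s -> p s ++ q s)

> (&&&) :: Rec -> Rec -> Rec    -- sequencing
> Rec p &&& Rec q = Rec (\s -> concatMap q (p s))

> many0 :: Rec -> Rec           -- Kleene star
> many0 p = eps ||| (p &&& many0 p)

> -- run (many0 (tok' 'a')) "aa" = ["aa","a",""], but run (many0 eps) "a"
> -- is an infinite list: eps consumes nothing, so there is no progress.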

--
/NAD


_______________________________________________

S. Doaitse Swierstra

unread,
Oct 28, 2009, 5:19:20 PM10/28/09
to Haskell Cafe

On 22 Oct 2009, at 15:56, Robert Atkey wrote:
>>> ....

>>
>> Previously parsed input /can/ determine what the parser will accept
>> in
>> the future (as pointed out by Peter Ljunglöf in his licentiate
>> thesis).
>> Consider the following grammar for the context-sensitive language
>> {aⁿbⁿcⁿ| n ∈ ℕ}:
>
> Yes, sorry, I was sloppy in what I said there. Do you know of a
> characterisation of which languages having a possibly infinite number
> of nonterminals gives you? Is it all context-sensitive languages, or a
> subset?

The answer is: all context-sensitive languages. This is a very old
insight which has come back in various forms in computer science. The
earliest conception in CS terms is the concept of an affix grammar, in
which the infinite number of nonterminals is generated by
parameterising non-terminals by trees. Affix grammars were invented by
Kees Koster and Lambert Meertens (who applied them to generate music:
http://en.wikipedia.org/wiki/index.html?curid=5314967) in the early
1960s. There is a long line of follow-up work on this idea, of which
the two most well-known versions are the so-called two-level grammars,
used in the Algol68 report, and the attribute grammar formalism first
described by Knuth. The full Algol68 language is defined in terms of a
two-level grammar. Key publications/starting points if you want to
learn more about these are:

- the Algol68 report:
  http://burks.brighton.ac.uk/burks/language/other/a68rr/rrtoc.htm
- the Wikipedia article on affix grammars:
  http://en.wikipedia.org/wiki/Affix_grammar
- a nice book about the basics of two-level grammars is the Cleaveland &
  Uzgalis book, "Grammars for programming languages", which may be hard
  to get, but there is hope:
  http://www.amazon.com/Grammars-Programming-Languages-languages/dp/0444001875
- http://www.agfl.cs.ru.nl/papers/agpl.ps
- http://comjnl.oxfordjournals.org/cgi/content/abstract/32/1/36

Doaitse Swierstra
