Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Grammar for roman numerals

1,250 views
Skip to first unread message

msull...@gmail.com

unread,
Mar 27, 2007, 9:27:36 AM3/27/07
to
I am doing a self-taught independent study in compiler design through
my school using the Red Dragon book as a text. One of the exercises I
am doing is writing a grammar for roman numerals. I wanted to check my
grammar's correctness, but could not find any grammars on the internet
that covered all of the letters (up to M).

Here is my grammar (I allow an arbitrary number of Ms)

numeral -> thousands
thousands -> thous_part hundreds | thous_part | hundreds
thous_part -> thous_part M | M
hundreds -> hun_part tens | hun_part | tens
hun_part -> hun_rep | CD | D | D hun_rep | CM
hun_rep -> C | CC | CCC
tens -> tens_part ones | tens_part | ones
tens_part -> tens_rep | XL | L | L tens_rep | XC
tens_rep -> X | XX | XXX
ones -> ones_rep | IV | V | V ones_rep | IX
ones_rep -> I | II | III

Comments?

Thanks,
Mike Sullivan

Martin Ward

unread,
Mar 29, 2007, 12:59:10 AM3/29/07
to
On Tuesday 27 Mar 2007 14:27, msull...@gmail.com wrote:
> Here is my grammar (I allow an arbitrary number of Ms)
>
> numeral -> thousands
> thousands -> thous_part hundreds | thous_part | hundreds
> thous_part -> thous_part M | M
> hundreds -> hun_part tens | hun_part | tens
> hun_part -> hun_rep | CD | D | D hun_rep | CM
> hun_rep -> C | CC | CCC
> tens -> tens_part ones | tens_part | ones
> tens_part -> tens_rep | XL | L | L tens_rep | XC
> tens_rep -> X | XX | XXX
> ones -> ones_rep | IV | V | V ones_rep | IX
> ones_rep -> I | II | III
>
> Comments?

This doesn't accept IIII for 4 (as found on many clocks with Roman
Numeral faces, for example), nor does it accept the "shorthand"
forms: IC for 99, IIC for 98, MVM for 1995 and so on.
The rule is that any smaller number placed before a larger
number is subtracted from the larger number.
I know of no examples where the "smaller number"
consists of other than a single numeral, or the two identical numerals
II, XX or CC. However, constructions such as IIIII for "five", IIX for "eight"
or VV for "ten" have been discovered in manuscripts.

A bar placed over a number multiplies it by one thousand,
and a double bar multiplies it by one million.
This could be implemented in your system by using parentheses
to denote the bar: thus (I) would represent 1,000.
(In the Middle Ages, 500, usually D, was sometimes written as
I followed by an apostrophus, resembling a backwards C, while 1,000
was written as CI followed by an apostrophus.)

The more general question raised by this discussion (and more relevant
to comp.compilers) is how "forgiving" should a parser be in the case
where the language being parsed has no formal definition: or where
there are several, conflicting formal definitions?
Do you accept anything that can possibly be interpreted,
or do you place "arbitrary" restrictions in order to simplify
the grammar, at the expense of rejecting existing files?

--
Martin

mar...@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/

Ivan Boldyrev

unread,
Mar 29, 2007, 11:05:56 PM3/29/07
to
On 9791 day of my life msully wrote:
> One of the exercises I am doing is writing a grammar for roman
> numerals.
...
> Comments?

Now rewrite the grammar as a regular expression :)

--
Ivan Boldyrev

Dmitry A. Kazakov

unread,
Mar 30, 2007, 8:31:20 AM3/30/07
to

Roman numerals (in their standard form without IIII and other stuff) don't
much differ from decimal positional system. The only problem is that glyphs
of the decimal places depend on the position. The grammar could look like

numeral := [ numeral ] decimal-place(n)
decimal-place(n) := 0(n) | 1(n) | 2(n) | 3(n) | ... | 9(n)
0(n) := <empty>
1(n) := x(n)
2(n) := x(n)x(n)
3(n) := x(n)x(n)x(n)
4(n) := x(n)v(n)
5(n) := v(n)
6(n) := v(n)x(n)
7(n) := v(n)x(n)x(n)
8(n) := v(n)x(n)x(n)x(n)
9(n) := x(n)x(n+1)

Here n is the number of the decimal place and finally:

x(1) = I
x(2) = X
x(3) = C
x(4) = M

v(1) = V
v(2) = L
v(3) = D

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Martin Ward

unread,
Mar 30, 2007, 11:05:47 PM3/30/07
to
On Friday 30 Mar 2007 04:05, Ivan Boldyrev wrote:
> On 9791 day of my life msully wrote:
> > One of the exercises I am doing is writing a grammar for roman
> > numerals.
>
> Now rewrite the grammar as a regular expression :)

Following the "Robustness Principle" attributed to John Postal,
("Be liberal in what you accept, and conservative in what you send."
http://ironick.typepad.com/ironick/2005/05/my_history_of_t.html),
there is a very simple regular expression for Roman Numerals

[MDCLXVI]+

Taking the most liberal interpretation of the rule "a smaller number
placed before a larger numeral should be subtracted from it,
instead of added to it", every string of numerals has a valid interpretation.
For example, "IXM" is interpreted as as IX (9) subtraced from M (1000)
to give 991.

This method has the advantage of providing representations for zero
and negative numbers: for example VVX is one representation of zero,

If only the Romans had followed the Robustness Principle, they might
have discovered zero and negative numbers, and changed the course
of mathematical history!

Dmitry A. Kazakov

unread,
Apr 1, 2007, 8:55:42 AM4/1/07
to

Not in "regulars", but the following more or less obvious pattern

NOEMPTY
( ['M'['M'['M']]]
['C'['D'|'M'|'C'['C']] | 'D'['C'['C'['C']]]]
['X'['L'|'C'|'X'['X']] | 'L'['X'['X'['X']]]]
['I'['V'|'X'|'I'['I']] | 'V'['I'['I'['I']]]]
)

should do the trick. More complex were to use immediate assignments for
evaluation of the numeral's value it gets matched. But it is also doable.
Though a hand-written scanner would be faster and cleaner.

Hans-Peter Diettrich

unread,
Apr 1, 2007, 8:56:07 AM4/1/07
to
Martin Ward wrote:

> Taking the most liberal interpretation of the rule "a smaller number
> placed before a larger numeral should be subtracted from it,
> instead of added to it", every string of numerals has a valid interpretation.
> For example, "IXM" is interpreted as as IX (9) subtraced from M (1000)
> to give 991.

That's my understanding as well. Nonetheless I'm not sure of the full
algorithm, with regards to the total sum.

What about "IXIM"? Intuitively this would be 990 (-(-I+X+I)+M), but it
might be calculated as (-I+X)+(-I+M) = 1008 as well.

Similarly: "IXX" vs. "XIX", where "IX" is a smaller number, and "XI" is
not a smaller number, than the following numeral "X".


> This method has the advantage of providing representations for zero
> and negative numbers: for example VVX is one representation of zero,
>
> If only the Romans had followed the Robustness Principle, they might
> have discovered zero and negative numbers, and changed the course
> of mathematical history!

Great ideas :-)

DoDi

whiskey

unread,
Apr 6, 2007, 12:00:11 AM4/6/07
to
> What about "IXIM"? Intuitively this would be 990 (-(-I+X+I)+M), but it
> might be calculated as (-I+X)+(-I+M) = 1008 as well.

Why (-I+X)+(-I+M) and not -(-I+X)+(-I+M) ? Following the same rule as
before ("a smaller number placed before a larger numeral should be
subtracted from it, instead of added to it") the result will be -(-I+X)
+(-I+M) = -(9)+(999) = 999 - 9 = 990. Since (-I+X) is smaller than (-I
+M), it will be substracted, not added.

Thomas Dickey

unread,
Apr 6, 2007, 12:03:58 AM4/6/07
to
Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:
> Roman numerals (in their standard form without IIII and other stuff) don't

fwiw, I've read that IIII was the standard form which the Romans used,
and that IV was a later innovation.

googling "roman iiii iv" gives this for instance:

http://elginwatches.org/help/roman_IIII.html
http://www.bhi.co.uk/hints/roman.htm

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net

Dmitry A. Kazakov

unread,
Apr 6, 2007, 10:57:30 PM4/6/07
to
On 6 Apr 2007 00:03:58 -0400, Thomas Dickey wrote:

> Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:
>> Roman numerals (in their standard form without IIII and other stuff) don't
>
> fwiw, I've read that IIII was the standard form which the Romans used,
> and that IV was a later innovation.

I meant modern standard usage of.

But my point was about Roman numerals as a decimal positional system with
glyphs varying with the position. This drastically simplifies understanding
of how they are built up as well as the implementation of input / output.
Compare this with four rules as described in Webster.

Actually whether IIII and VIIII glyphs should be used for 4 and 9 is
irrelevant as long as IIII is not mixed with VI. On the contrary, other
constructs like MIM (for 1999) and IC (for 99), which were never used
otherwise than mistakenly, contradict to very positional representation.
Which makes me think that the system was positional at least in the minds
of Romans.

Hans-Peter Diettrich

unread,
Apr 8, 2007, 9:51:48 AM4/8/07
to
Dmitry A. Kazakov wrote:

> constructs like MIM (for 1999) and IC (for 99), which were never used
> otherwise than mistakenly, contradict to very positional representation.
> Which makes me think that the system was positional at least in the minds
> of Romans.

It might be necessary to distinguish between the usage in calculations,
and the usage in documents. In documents a shorthand notation might have
been used, whereas I dont know how Romans did their calculations. Known
is the spelling of numbers, and there of course we find your assumed
positional system.

A grammar for the shorthand notation even may be context sensitive, also
in an cultural/educational sense, where the most elaborated form was
used by people which had to cast numbers into stone...

DoDi

0 new messages