Looking for Unix lex for modern systems


Aharon Robbins

Jan 6, 2022, 7:09:53 PM
Can anyone point me at a version of Unix lex that will run on Linux?

Thanks,

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
[I wouldn't hold my breath. Perhaps someone has a retrocomputing
Vax or PDP-11 that can run an antique lex and then you can use the
output. Or maybe it might be easier to dig into the ugly lex
application and figure out what it's doing to the insides of
the old lex scanner. -John]


gah4

Jan 6, 2022, 9:37:45 PM
On Thursday, January 6, 2022 at 4:09:53 PM UTC-8, Aharon Robbins wrote:
> Can anyone point me at a version of Unix lex that will run on Linux?

On my Linux system, /usr/bin/lex is a symbolic link to /usr/bin/flex

On FreeBSD, they are both hard links to the same file.

On OS X, they are two different files (cmp -l shows differences)
of the same size.

A web search shows the Oracle lex man page for Solaris, which does not mention
flex, and so might not be a link of any kind.

I have hardware that can run SunOS and Solaris. (It should be easy to find
hardware to run Solaris-x86 versions.)

As to the actual AT&T-copyrighted lex, that might be a little harder.
[Flex can take the same input as lex but its internals are totally different.

Bell Labs long ago released the code to early Unix systems. The source
for lex is here:
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/lex or on
the 4.2BSD src archive at
https://www.tuhs.org/Archive/Distributions/UCB/4.2BSD/
I tried to compile the 4.2BSD version on FreeBSD and the errors were
ugly. -John]

gah4

Jan 7, 2022, 8:22:56 PM
On Thursday, January 6, 2022 at 4:09:53 PM UTC-8, Aharon Robbins wrote:
> Can anyone point me at a version of Unix lex that will run on Linux?

A web search for lex source found this:

http://heirloom.sourceforge.net/devtools.html

which sounds like exactly what you want. It is supposed to compile on Linux,
and seems to be derived from Solaris source, and has the CDDL license:

http://www.opensolaris.org/os/licensing

Otherwise, as noted previously, Solaris-x86 should run on easily found x86 systems.
(Or in a virtual machine on such systems, if you don't have one available.)

gah4

Jan 7, 2022, 8:28:34 PM
(snip, our moderator wrote)

> [Flex can take the same input as lex but its internals are totally different.
>
> Bell Labs long ago released the code to early Unix systems. The source
> for lex is here:
> https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/lex or on
> the 4.2BSD src archive at
> https://www.tuhs.org/Archive/Distributions/UCB/4.2BSD/
> I tried to compile the 4.2BSD version on FreeBSD and the errors were
> ugly. -John]

It seems that real lex knows about RATFOR, and I suspect that flex doesn't.
Is that a good test for which source you have?

In any case, with

gcc -std=c89 -Dunix

there aren't so many errors (that aren't warnings).

The warnings are from conversion of either the wrong pointer type,
or between integer and pointer. I am not so sure how well current
systems do the latter. (That seems to be usual for C from those years.)

Fixing the actual errors, including removing the initialization
of *errorf with stdout, and not declaring calloc, it compiles and
(with the -t option) runs.
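
For readers hitting the same two errors: on glibc, stdout is not a compile-time constant, so a file-scope initializer that uses it is rejected, and a local declaration of calloc conflicts with the one in <stdlib.h>. A minimal sketch of the kind of change involved, assuming the original line looked roughly like `FILE *errorf = stdout;` (the exact text in the lex source may differ); `init_streams` is a hypothetical helper name, not something from lex:

```c
#include <stdio.h>
#include <stdlib.h>   /* declares calloc; replaces an old local "char *calloc();" */

/* Old form (assumed), rejected where stdout is not a constant expression:
 *     FILE *errorf = stdout;
 */
FILE *errorf;         /* no initializer at file scope */

/* Hypothetical helper: call once at the start of main(). */
void init_streams(void)
{
    errorf = stdout;  /* a run-time assignment is always valid */
}
```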

It then stops with:

(Error) output table overflow
5/1000 nodes(%e), 10/2500 positions(%p), 3/500 (%n), 254 transitions
, 2/1000 packed char classes(%k), 3/2000 packed transitions(%a), 0/0 output slots(%o)

(I have the sample file from the Wikipedia page for input.)

Reminds me, in the days of OS/2 1.0, I was compiling the GNU utilities,
and especially grep and diff, for OS/2. In many cases, they would mix integer
and (char*), especially in function arguments. Replacing 0 with (char*)0 fixed
those, but I also complained to the GNU people. The reply was that, pretty much,
any system with sizeof(int) not equal to sizeof(char*) was broken, and it
wasn't their problem to fix.
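
The 0 versus (char*)0 problem is easiest to see in a variadic (or unprototyped) call, where the compiler has no parameter type to convert the argument to. A small sketch, using a hypothetical function `total_length` that is not taken from grep or diff:

```c
#include <stdarg.h>
#include <string.h>

/* Hypothetical example: sum the lengths of a list of strings that is
 * terminated by a null char pointer. */
size_t total_length(const char *first, ...)
{
    va_list ap;
    size_t n = 0;
    const char *s;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        n += strlen(s);
    va_end(ap);
    return n;
}

/* Correct call:  total_length("ab", "cde", (char *)0);
 * Risky call:    total_length("ab", "cde", 0);
 * The plain 0 is passed as an int, which need not have the same size
 * or representation as a char pointer. */
```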
[If the comments in the source code say "written by Eric Schmidt", it's lex,
otherwise, it's flex. Yes, that Eric Schmidt. -John]

Aharon Robbins

Jan 9, 2022, 4:45:11 PM
In article <22-0...@comp.compilers>, gah4 <ga...@u.washington.edu> wrote:
>On Thursday, January 6, 2022 at 4:09:53 PM UTC-8, Aharon Robbins wrote:
>> Can anyone point me at a version of Unix lex that will run on Linux?
>
>A web search for lex source found this:
>
>http://heirloom.sourceforge.net/devtools.html
>
>which sounds like exactly what you want.

I got this to build and run, but it ran out of buffer space. :-(

I have since made good progress with flex. The original lexer
was doing its own token buffering. I moved to using yytext, and
also changed YY_INPUT to get one character of input at a time
as lex used to do. These two together have allowed me to make
real progress.

Performance isn't an issue, so doing one character at a time is fine.
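
For anyone attempting a similar conversion, the flex manual documents redefining YY_INPUT in the definitions section. The following is a minimal sketch of a one-character-at-a-time version, not the code used in the port described above. The `yyin` and `YY_NULL` stand-ins are declared only so the fragment compiles outside a flex-generated scanner (flex supplies both), and `read_via_yy_input` is a hypothetical wrapper added purely to show the macro's behavior:

```c
#include <stdio.h>

/* Stand-ins so this compiles on its own; in a real scanner flex
 * provides yyin and YY_NULL, and these lines would be dropped. */
#ifndef YY_NULL
#define YY_NULL 0
#endif
static FILE *yyin;

/* Hand the scanner exactly one character per refill, or YY_NULL at EOF.
 * In a flex spec this definition goes inside the %{ ... %} block. */
#define YY_INPUT(buf, result, max_size) \
    do { \
        int c_ = getc(yyin); \
        (result) = (c_ == EOF) ? YY_NULL : ((buf)[0] = (char)c_, 1); \
    } while (0)

/* Hypothetical wrapper, only to show how the macro behaves. */
int read_via_yy_input(FILE *in, char *buf, int max_size)
{
    int result;

    (void)max_size;   /* this version never reads more than one char */
    yyin = in;
    YY_INPUT(buf, result, max_size);
    return result;
}
```

Each call yields at most one character, so the scanner never reads ahead of what it has matched, at the cost of one getc per refill.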

Thanks everyone for the help.

gah4

Jan 9, 2022, 8:46:03 PM
On Sunday, January 9, 2022 at 1:45:11 PM UTC-8, Aharon Robbins wrote:
> In article <22-0...@comp.compilers>, gah4 <ga...@u.washington.edu> wrote:
> >On Thursday, January 6, 2022 at 4:09:53 PM UTC-8, Aharon Robbins wrote:
> >> Can anyone point me at a version of Unix lex that will run on Linux?

> >A web search for lex source found this:

> >http://heirloom.sourceforge.net/devtools.html

> >which sounds like exactly what you want.
> I got this to build and run, but it ran out of buffer space. :-(

I compiled what I believe is actual lex on Linux. There were two compile
time errors to fix, and a bunch of warnings that I didn't fix.

The warnings are related to pointer conversions, so I hope it
does it right.

I then ran it with the sample program in the Wikipedia lex article,
and it ran out of buffer space. It isn't very big, either.

But then I ran it with the sample from the Solaris lex man page,
and it works. It even works with -r to generate ratfor output.
(As far as I know, flex doesn't have the -r option.)

In any case, I don't understand the buffer space message.
[AT&T lex was a student summer project and it has a bunch of fixed
size buffers. -John]

gah4

Jan 12, 2022, 6:00:30 PM
On Sunday, January 9, 2022 at 5:46:03 PM UTC-8, gah4 wrote:

(snip on running actual lex)

> I then ran it with the sample program in the Wikipedia lex article,
> and it ran out of buffer space. It isn't very big, either.

(snip)

> In any case, I don't understand the buffer space message.
> [AT&T lex was a student summer project and it has a bunch of fixed
> size buffers. -John]

OK, the sample in the Wikipedia lex article has the lines:

/* This tells flex to read only one input file */
%option noyywrap

It turns out that if you give that line to lex, it sets the size of the
output buffer to zero. (I got suspicious when the comment
mentioned flex, but had already found the output buffer
size was zero.)

Since I have the O'Reilly "Lex & Yacc" book, I could look up
lex options. It seems that

%o (number)

sets the output buffer size in lex, and sets it to zero if there is no number.

The rest of the syntax might be the same between lex and flex,
but the option syntax is not! (Hint to those working with old files.)
[It's on page 159, %e %p %n %k %a %o. That last flag is the number
of "output slots" whatever they were. -John]
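
One way to keep a spec acceptable to both tools is to drop the `%option noyywrap` line and supply yywrap by hand in the user-code section (or link with the lex library, -ll or -lfl, which provides a default). A minimal sketch of the hand-written version, assuming the scanner reads a single input:

```c
/* Returning 1 tells the scanner there is no further input once the
 * current file ends, which is what "%option noyywrap" arranges in flex. */
int yywrap(void)
{
    return 1;
}
```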