A Totally Clueless Newbie Tilting At Windmills

clueless newbie

May 7, 2014, 1:57:07 PM

to marpa-...@googlegroups.com

Digression: Company "A" has lots of COBOL. Attempts are being made to "translate" some of this COBOL to Perl. Currently, the approach is to use Perl's regexes as a means of generating helper code.

I suspect that the process is doomed and will result in failure. I have no idea if the goal of getting off the mainframes is just a public goal or a real goal. -- But enough of that.

Question: given Marpa and the existence of a BNF for COBOL, can the problem of parsing COBOL and generating passable Perl be done? If so, where does one start, and where does one go for assistance? It's not like Joe down the street knows anything about Marpa.

Thanks,

A Totally Clueless Newbie

Jeffrey Kegler

May 7, 2014, 2:19:55 PM

to marpa-...@googlegroups.com

In dealing with questions of this sort, I assume that it's my technical advice being sought, as opposed to reflections on project strategy. (Though I have completed far more successful projects than most project managers out there.) But I can't resist a few comments, so I hope you'll indulge me for a few sentences. COBOL to X translation (for X fill in the language of your choice) can be done. In fact, when the US government first started handing out big UNIX contracts, I naively imagined the government contractors would hire some folks who knew UNIX and C. Instead they bought COBOL translation tools and used their current staff.

How well this works is another question. The software failures we hear about are the exception. Most totally failed projects are hidden. All I can say is that I've heard about a lot of these projects getting started, but I've never heard about one being successful. On the other hand, success doesn't get talked about all that much either. Perhaps the biggest clue is that after a while I did not hear any more about projects that planned to use COBOL-to-C translators.

OK, now for the kind of answer that's closer to what you really wanted from me -- about Marpa. One way or another, Marpa can do this, given that it also allows procedural hacks in addition to general BNF parsing. So if it's out there and being parsed, Marpa can parse it. I frankly don't know much about COBOL's particular quirks. As for assistance, there are the usual channels, including this mailing list. How effective the assistance can be depends a lot on how much of the code those responsible are willing to share. A COBOL-to-X translator would probably engage my interest a bit, if only for nostalgic reasons, and if it were an open source project, it'd make an interesting example.

-- jeffrey

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

clueless newbie

May 7, 2014, 2:33:24 PM

to marpa-...@googlegroups.com

Since no one in authority at Company A is going to say "Let's use the Marpa approach," I'm going to have to attack this on my own time. And since I'm certainly not capable of accomplishing it on my own, it would very much have to be open source.

Jeffrey Kegler

May 7, 2014, 2:39:17 PM

to marpa-...@googlegroups.com

By the way, if you prefer IRC, there's an IRC channel -- irc.freenode.net: #marpa

You've gotten me curious enough to go to the COBOL Wikipedia pages. And others in the Marpa community have done transpilers -- we have an ECMAScript transpiler and a C compiler frontend, so there is a fair amount of relevant experience out there. Though I suspect the COBOL expertise is a bit thin. :-)

-- jeffrey

clueless newbie

May 7, 2014, 3:01:37 PM

to marpa-...@googlegroups.com

With COBOL being as "verbose" as it is, just being able to parse the classic "Hello, World" program would be a great start. Obviously the first thing one does is kill the meaningless chaff (the sequence number area and the program name area), then concatenate continuation lines with their predecessors.
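A minimal sketch of that preprocessing step, written in Python rather than Perl, assuming the standard fixed reference format: columns 1-6 are the sequence number area, column 7 is the indicator area ('*' or '/' marks a comment, '-' a continuation), columns 8-72 hold the program text, and columns 73-80 are the identification (program name) area. The function name `preprocess_fixed_format` is hypothetical, and continued alphanumeric literals, which have extra quoting rules, are deliberately not handled.

```python
from typing import Iterable, List

def preprocess_fixed_format(lines: Iterable[str]) -> List[str]:
    """Strip the sequence number area (cols 1-6) and identification area
    (cols 73-80), drop comment lines, and join continuation lines."""
    out: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n").ljust(80)   # pad short lines so slicing is safe
        indicator = line[6]                 # column 7: indicator area
        text = line[7:72].rstrip()          # columns 8-72: areas A and B
        if indicator in "*/":
            continue                        # comment (or page-eject) line
        if indicator == "-" and out:
            # Simplified rule: the first non-blank character of the
            # continuation line follows the last non-blank character of the
            # previous line. Continued literals are NOT handled here.
            out[-1] += text.lstrip()
            continue
        if text.strip():
            out.append(text)
    return out
```

For example, a `PROGRAM-ID. HEL` line followed by a continuation line whose text is `LO.` comes out as the single line `PROGRAM-ID. HELLO.`.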

(Links: a COBOL BNF; another COBOL BNF)

Jeffrey Kegler

May 7, 2014, 3:04:54 PM

to marpa-...@googlegroups.com

I was about to advise an incremental approach, so you're ahead of me. You'll find that Marpa lends itself very nicely to an incremental approach.

Is the COBOL standard freely available? And which one are you targeting?

-- jeffrey

clueless newbie

May 7, 2014, 3:50:17 PM

to marpa-...@googlegroups.com

There are ANSI standards for COBOL. I assume that, since the DOD was a heavyweight in its development, the standards should be in the public domain.

Any recent standard should do. COBOL programmers don't tend to use advanced features.

(I never did get to thank you for responding so quickly. I figured everyone would say COBOL ... get out of here!)

Aristotle Pagaltzis

May 7, 2014, 3:54:06 PM

to marpa-...@googlegroups.com

* clueless newbie <li.han...@gmail.com> [2014-05-07 20:00]:

> Question given Marpa and the existence of a BNF for COBOL, can the
> problem of parsing COBOL and generating passable Perl be done?

Without knowing COBOL in depth I’ll venture that strictly speaking, the
answer is, Yes, it is just a matter of sufficient effort.

However, everything hangs on that “passable”. What is passable?

You can, if COBOL is not particularly crazy in some way, certainly produce
Perl code that corresponds mechanically to the original COBOL code. But
that Perl code will not look anything like code that a Perl programmer
would write. It will look like COBOL code expressed in Perl, because it
is, and not just in the trivially tautological sense: it will be built
on top of the exact semantics of the COBOL code, and it will work exactly
like the COBOL code it came from. Short of a miracle, it is
not going to be a useful basis for refactoring it into *good* Perl code,
because all the choices for interfaces, abstractions and distribution of
responsibilities in the code were made when it was written in COBOL, all
of which you’d choose differently if you were writing Perl. Conversion
will just carry those over mechanically, because it cannot rethink them.
The only thing that can is called a programmer.

Note that the consequence is that if you want to hire Perl programmers
to work on this code base you will not find them very effective, because
it won’t be written the way good Perl code is.

So the question is what the project’s ultimate goal is. It is highly
likely that management is imagining it will achieve something that it
cannot, something much different from the very narrow kind of goal that
this approach is limited to.

It is almost certain that the only way to achieve what management really
wants is a piecemeal rewrite of the system, by first breaking it down to
isolated components if need be, and then swapping them out over time for
rewritten pieces. It’s a lot of effort, but I know of no other approach
that has ever been successful. Conversions between radically different
languages never work, for the above reasons, and complete rewrites fail
because they are by definition waterfall projects, and almost always
beyond the scale where that model can work. The only hope for a rewrite
to succeed is to break it into chunks that can be finished and then
proven in the crucible of production use after finite amounts of effort
each, to get away from failure being an all-or-nothing proposition for
the effort as a whole. Even if the budget runs low before the conclusion
of the project, there are still working deliverables instead of a total
write-off.

(Yes, in the absolute worst case this will require ultimately rewriting
every part of the system two or three times in order to get rid of all
the structures imposed by the original system’s design without changing
more than one or two pieces at a time. But there is no faster way to get
there, not successfully. The upside is that you get new code running in
production at each step along the way. I.e. it transforms “huge project”
into “long-term commitment”.)

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

Michael Roberts

May 7, 2014, 4:14:50 PM

to marpa-...@googlegroups.com

I would totally be on board to help with an open-source COBOL parser - but Aristotle's reply is sadly the most accurate assessment I've ever heard of this entire approach to modernization. It can be done, but you're not going to end up with code that's based on the same mental constructs that it would have been had it been written (or rewritten) in Perl.

However, it *would* run on non-mainframes, and you'd have a path forward. Eventually.

I'll bet you could end up with code that was *accessible* to a Perl programmer - COBOL's mostly about record and string manipulation to start with, and so is Perl. It would most definitely be a fascinating project to hack at.

clueless newbie

May 7, 2014, 4:37:06 PM

to marpa-...@googlegroups.com

In earlier days the same thing was said of compilers: that a good assembler programmer could always outcode the compiler. NOTE the keyword: good! The problem is the lack of good programmers!

Given that the requirement for the people on this project is one or more years of Perl, the Perl produced will likely vary widely, from very bad to passable, with the majority being indifferent.

Aristotle Pagaltzis

May 7, 2014, 5:14:25 PM

to marpa-...@googlegroups.com

* clueless newbie <li.han...@gmail.com> [2014-05-07 22:40]:

> In earlier days the same thing was said of compilers. That a good
> assembler programmer could always outcode the compiler. NOTE the
> keyword good! The problem is lack of good programmers!

I’m afraid you picked the wrong metaphor.

1. The goals are fundamentally different.

Nobody uses a compiler to do a one-time switch from a C codebase to
an assembler codebase so that they can hire assembler programmers to
maintain it. In fact no one even cares what the output of the
compiler looks like specifically, just that it accurately reflects
the semantics of the code in the source language and runs as fast as
possible.

So what is asked of a compiler is much less demanding than the kind
of project you have been tasked with.

2. Much more fundamentally, assembler offers fewer abstractions than any
non-esoteric language. In fact, every real compiler always translates
in the direction of fewer abstractions.

But Perl has a lot *more* abstractions than COBOL.

You are really trying to go in the other direction – i.e. to write
a decompiler, essentially. Those do exist, but they are very limited.
All the useful ones work only because they assume they are looking at
the output of some (quite particular) compiler, and they translate
that back to the original language. There is no decompiler that can
take arbitrary assembler and spit readable and maintainable C out the
other end.

If the ultimate goal of your project really is just accurate mechanical
translation, of the sort a compiler does, then yes absolutely, that can
be achieved.

If the goal is to get readable maintainable Perl out of the exercise, …

clueless newbie

May 7, 2014, 5:36:39 PM

to marpa-...@googlegroups.com

I can only judge management's goals by management's selection of resources it is willing to dedicate to the project.

Few Perl programmers with only a year under their belt can code stuff worthy of a Burke or a Conway.

Michael Roberts

May 7, 2014, 5:41:50 PM

to marpa-...@googlegroups.com, clueless newbie

Here you go: http://sourceforge.net/projects/open-cobol/

GnuCOBOL actually works by translating COBOL into intermediate C. It would be relatively easy to convert that into intermediate Perl instead of C. You're done. Sure, it would be crappy Perl, but it would be Perl.


clueless newbie

May 7, 2014, 5:55:41 PM

to marpa-...@googlegroups.com, clueless newbie

I think the Marpa approach has more appeal than bootstrapping from open-cobol.

Michael Roberts

May 7, 2014, 5:57:08 PM

to marpa-...@googlegroups.com, clueless newbie

You could definitely steal their code generation as a start, though. (I have to admit Marpa would be much cooler.)

clueless newbie

May 7, 2014, 6:04:25 PM

to marpa-...@googlegroups.com, clueless newbie

Plus all but the XS is Perl.

Michael Roberts

May 7, 2014, 6:06:08 PM

to marpa-...@googlegroups.com, clueless newbie

So? You'll get Perl out no matter what you write the hypothetical translator in. Maybe I'm missing the point.

clueless newbie

May 7, 2014, 6:09:32 PM

to marpa-...@googlegroups.com, clueless newbie

Except for the XS part, the COBOL-to-Perl "translator" would be Perl.

Ron Savage

May 7, 2014, 6:24:30 PM

to marpa-...@googlegroups.com, clueless newbie

Hi

Firstly, as an aside, I have to say, about your choice of topic:

A Totally Clueless Newbie Tilting At Windmills parses (using my wetware) as:

1) Totally Clueless => Beginner

2) Newbie => Beginner

3) Tilting At Windmills => Beginner

So I think you've coined a triple tautology there!? Well done! But of course it could be parsed by Marpa as well.....

As for (shudder) Cobol, hmm, yes, I was a Cobol contractor for 11 years, but protective amnesia, umm, protects me from recalling too many details. Although I still do remember the day a manager came into our office and told us that, due to the economy drive, we could only take 2 pencils at a time from the stationery cabinet. (I swear this is a true story). He must have had a memory issue (too), because he forgot to offer to take a pay cut.

Ahh, those were the days....

As for your project, I strongly advise you to forget it, and accept that hand-writing the corresponding Perl is the only way to go.

A massive problem IMHO is the part of Cobol where you declare the picture clause of each variable. For Cobol newbies: this is the storage space, decimal-point alignment, etc. type of stuff, per variable.

This matters because the declarations of the variables involved affect the results of calculations. Possibly intermediate values are affected too, but I forget that detail.

The pain I predict means I must warn you not to go there. Now, you might think that doing it by hand would involve the same sort of thing, so why not automate it? I still say no. This is a way of saying your brain will, after doing a couple of programs, develop the skill to effectively automate it, i.e. get faster and faster. Perhaps with some experience you might return to the Marpa-based approach to handle just the logic (which I discuss next).

What I would do is: Start by concentrating on the productivity gain of writing fresh Perl to emulate the logic, /without/ caring about the accuracy of the results. I.e., make the Perl embody the spirit of what the Cobol is trying to do. Then study the representation of the data as declared in the Cobol picture clauses, in order to make the code do exactly the right thing.

For example, if it's high-quality code, it will follow the rule we did: only ever PERFORM a section, never just a paragraph. This means that, as the first step, you turn the sections (or, at worst, the paragraphs) into Perl subs. Then it's back to the logic.

Or, just get another job :-(.

Ron Savage

May 7, 2014, 6:27:58 PM

to marpa-...@googlegroups.com

On Thursday, 8 May 2014 07:36:39 UTC+10, clueless newbie wrote:

I can only judge management's goals by management's selection of resources it is willing to dedicate to the project

But you also said: "I'm going to have to attack this on my own time." I.e. No official resources.

Surely this means management's attitude is that the project is (perhaps implicitly) worthless (to put it bluntly)?

Ron Savage

May 7, 2014, 6:41:22 PM

to marpa-...@googlegroups.com, clueless newbie

For non-Cobolers, I should add that one point of pictures is the implicit declaration of the position of the decimal point. All of that has to be made explicit in the Perl.
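To make the implied decimal point concrete, here is a small Python sketch (not Perl, and not taken from any real translator). It handles only numeric pictures built from `S`, `9`, `V` and `(n)` repeat counts: `V` marks the assumed decimal point and occupies no storage, so the DISPLAY digits `0012345` under `9(5)V99` mean 123.45. The `store` function imitates how a MOVE aligns on the decimal point and drops excess digits at both ends (an unsigned receiving field keeps only the magnitude); `rounded=True` imitates the ROUNDED phrase. Signed DISPLAY encodings and size-error handling, which vary by compiler and options, are not modeled. All function names are hypothetical.

```python
import re
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP
from typing import Tuple

def parse_picture(pic: str) -> Tuple[bool, int, int]:
    """Return (signed, integer_digits, decimal_digits) for a numeric picture
    made only of S, 9, V and (n) repeat counts, e.g. 'S9(5)V99'."""
    # Expand repeat counts: '9(5)V99' -> '99999V99'
    expanded = re.sub(r"(.)\((\d+)\)",
                      lambda m: m.group(1) * int(m.group(2)), pic.upper())
    signed = expanded.startswith("S")
    int_part, _, dec_part = expanded.lstrip("S").partition("V")
    if set(int_part + dec_part) - {"9"}:
        raise ValueError(f"unsupported picture: {pic}")
    return signed, len(int_part), len(dec_part)

def decode_display(pic: str, digits: str) -> Decimal:
    """Interpret the raw digits of an unsigned DISPLAY field; the V only
    marks where the decimal point is assumed."""
    signed, int_digits, dec_digits = parse_picture(pic)
    if signed or len(digits) != int_digits + dec_digits or not digits.isdigit():
        raise ValueError("only unsigned DISPLAY fields are handled here")
    return Decimal(digits).scaleb(-dec_digits)

def store(pic: str, value: Decimal, rounded: bool = False) -> Decimal:
    """Fit value into a receiving field: align on the decimal point, then
    drop excess digits at both ends (round the low end if rounded=True)."""
    signed, int_digits, dec_digits = parse_picture(pic)
    quantum = Decimal(1).scaleb(-dec_digits)            # e.g. 0.01 for V99
    mode = ROUND_HALF_UP if rounded else ROUND_DOWN     # ties away from zero / truncate
    v = value.quantize(quantum, rounding=mode)          # low-order digits
    magnitude = abs(v) % (Decimal(10) ** int_digits)    # high-order digits
    return -magnitude if (signed and v < 0) else magnitude
```

With these helpers, `store("9(3)V99", Decimal("12345.678"))` yields `Decimal("345.67")`: both the leading "12" and the trailing "8" are silently lost, which is exactly the kind of behavior a hand-written Perl translation has to reproduce explicitly.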

clueless newbie

May 7, 2014, 6:46:24 PM

to marpa-...@googlegroups.com, clueless newbie

Hi, Ron,

As to your aside, I've seen seasoned programmers who were totally clueless.
Too much in-the-box thinking.
One can tilt at windmills with the right tools --- primacord and c4 come readily to mind.


Ron Savage

May 7, 2014, 6:47:33 PM

to marpa-...@googlegroups.com

As for finding work, I should add that a couple of months ago now I found a marvellous contract (writing PHP unfortunately) via oDesk.com.

Companies post jobs and request applications. You register and become eligible to apply.

I work from home, about 25 hours a week. No commuting. Time to run the washing machine, etc. Very well paid (ask via email for details if desired). And the co. I work for is Australian! And they are in the same time zone, which is extremely fortuitous.


clueless newbie

May 7, 2014, 6:57:25 PM

to marpa-...@googlegroups.com

No official resources for the Marpa approach ... (Just remember: victory has a thousand fathers; defeat is an orphan.)

So if it were to work ... there would be all sorts of "secret" sponsors!

Jeffrey Kegler

May 7, 2014, 7:03:38 PM

to marpa-...@googlegroups.com

Aristotle's summary is excellent as to why the classic model, where an
employer who has lots of COBOL programmers hopes to complete C projects
by buying a program to translate, didn't work. (Except in the sense that the
contractors who tried it probably got paid handsomely before the
government wised up.)

A more open question is whether or not a COBOL-to-Perl utility might
have some limited usefulness in specific contexts. For example, if
you're about to translate a COBOL program to Perl, having the output of
such a program might sometimes be a good place to start. If (as will
often be the case), the output of the automatic translator is simply not
useful at all, you can always throw it away.

For a utility with restricted usefulness, you want the investment
required to be very limited as well. Also, you want the investment
required to be incremental, so that you can give up when the point of no
return is reached without writing down all of your efforts to date.
Marpa does allow you to get started quickly and to approach the task
incrementally.

Note also that a Marpa-powered COBOL transpiler would be a useful basis
for other utilities, such as one to enforce a shop's standards on the
code, a literate programming frontend, etc., etc.

But as Aristotle points out, code is an expression of a mental model.
The mental model gets lost in automatic translation, so the result is
unreadable and unmaintainable, although it may run correctly.

-- jeffrey

John Alvord

May 7, 2014, 7:11:02 PM

to marpa-...@googlegroups.com

I did about 6 months of COBOL in early 1970. It wasn't the worst experience of my career. I remember even then the manuals were massive. Assembler was much more rewarding.

Before tackling it I would suggest you define a success metric. If you get something 90% working after a year, will that benefit you in some way? How much source code do you have to deal with? In 1970 at John Hancock it was maybe 800,000 lines of code. Goodness knows what it is now!!! It has libraries of course, in various languages - sort of like glib. And much of the code runs in specific environments, like batch jobs and CICS transactions. These are not standalone utilities, at all.

If you are just tackling it for fun, I am sure you will learn a lot.

John Alvord


clueless newbie

May 7, 2014, 8:03:44 PM

to marpa-...@googlegroups.com

Company "A" is in the same business and larger than John Hancock. I'd bet we're talking about tens of millions of lines of code. But remember that, generally speaking, only 60 characters of a COBOL line are useful.

Ron Savage

May 7, 2014, 11:14:09 PM

to marpa-...@googlegroups.com, clueless newbie

On Thursday, 8 May 2014 08:46:24 UTC+10, clueless newbie wrote:

Hi, Ron,

As to your aside, I've seen seasoned programmers who were totally clueless.

Too true.

Too much in-the-box thinking.
One can tilt at windmills with the right tools --- primacord and c4 come readily to mind.

I am mercifully ignorant of primacord and c4......

Aristotle Pagaltzis

May 7, 2014, 11:51:03 PM

to marpa-...@googlegroups.com

* clueless newbie <li.han...@gmail.com> [2014-05-08 00:55]:
> At any rate I want to learn how to use Marpa.

And please do not let my cautions keep you from doing that. There is
a big difference between doing a project as a means to a business end
vs. doing it to learn, in fact the wise choices among them are almost
diametrically opposed. (Reinventing the wheel? As a business decision,
a terrible idea, just as conventional wisdom has it; for learning, one
of the best approaches, contra the conventional wisdom. Etc.)

clueless newbie

May 8, 2014, 2:04:31 AM

to marpa-...@googlegroups.com

It's just my guess that the first step is to get the SLIF to successfully parse COBOL. After that, pull statistics on how often each kind of statement occurs, which would suggest where the biggest bang for the buck is to be found.
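Until a full SLIF grammar exists, a crude lexical count can give a first approximation of those statistics. The Python sketch below is not a Marpa parse: it blanks out quoted literals, splits the remaining text into COBOL words (hyphens included, so `END-IF` is not counted as `IF`), and tallies the words that appear in a small, hand-picked verb list. The `VERBS` set and the function name `verb_frequencies` are assumptions for illustration, not the reserved-word list of any particular standard.

```python
import re
from collections import Counter
from typing import Iterable

# Hand-picked subset of COBOL statement verbs (illustrative only).
VERBS = {"ACCEPT", "ADD", "CALL", "CLOSE", "COMPUTE", "DISPLAY", "DIVIDE",
         "EVALUATE", "GO", "IF", "INITIALIZE", "MOVE", "MULTIPLY", "OPEN",
         "PERFORM", "READ", "REWRITE", "STOP", "STRING", "SUBTRACT", "WRITE"}

LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"")      # quoted literals
WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")    # COBOL words keep hyphens

def verb_frequencies(procedure_lines: Iterable[str]) -> Counter:
    """Count statement verbs in PROCEDURE DIVISION text (lexical heuristic)."""
    counts: Counter = Counter()
    for line in procedure_lines:
        for word in WORD.findall(LITERAL.sub(" ", line)):
            if word.upper() in VERBS:
                counts[word.upper()] += 1
    return counts
```

Calling `.most_common()` on the result ranks the verbs, which is one rough way to decide which statements to handle first.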

Ruslan Shvedov

May 8, 2014, 2:39:15 AM

to marpa-...@googlegroups.com

On Thu, May 8, 2014 at 9:04 AM, clueless newbie <li.han...@gmail.com> wrote:

It's just my guess that the first step is to get the SLIF to successfully parse COBOL. After that, pull statistics on how often each kind of statement occurs, which would suggest where the biggest bang for the buck is to be found.

This can be the cart before the horse.

From my experience, a translation (esp. a non-trivial, idiomatic one) frequently needs a different AST than, e.g., pretty-printing does, because they have different semantics.

So, trivial as it may be, if you need to translate, start with a dictionary. A pattern dictionary, that is. A pattern is: this piece of COBOL translates to that chunk of Perl, with the needed substitutions. A piece or chunk can be a module, a sub, or a code fragment. You can define patterns based on frequency (the Sequitur algorithm can help here), idiomaticity, or priority.

Once you have patterns, you can build your parser around them and build your translator as a filter: parse what you can translate, and pass through (comment out) what you can't. Thus development can be split into more manageable dictionary/parser/translator refinement cycles.

That looks sort of like what they're doing now with Perl regexps, but more structured and future-proof, so to say.
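A toy Python sketch of the pattern-dictionary-plus-filter idea, working on one statement at a time: each entry pairs a COBOL pattern with a function that emits a Perl chunk, and anything no pattern matches is passed through as a Perl comment. This uses regexes rather than a Marpa parser built around the patterns, and the patterns, the `perl_var` naming rule, and the emitted Perl are all hypothetical. PICTURE-driven MOVE semantics (padding, truncation) are ignored entirely.

```python
import re
from typing import Callable, List, Tuple

def perl_var(cobol_name: str) -> str:
    """Hypothetical naming rule: WS-TOTAL -> $ws_total."""
    return "$" + cobol_name.lower().replace("-", "_")

def operand(tok: str) -> str:
    """Literals pass through (quotes normalized to '...'); names become vars."""
    if tok[0] in "'\"":
        return "'" + tok[1:-1] + "'"
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", tok):
        return tok
    return perl_var(tok)

NAME_OR_LIT = r"('[^']*'|\"[^\"]*\"|[\w-]+)"

# The pattern dictionary: COBOL statement shape -> Perl chunk.
PATTERNS: List[Tuple[re.Pattern, Callable[[re.Match], str]]] = [
    (re.compile(rf"MOVE {NAME_OR_LIT} TO ([\w-]+)\.?$", re.I),
     lambda m: f"{perl_var(m[2])} = {operand(m[1])};"),
    (re.compile(rf"ADD {NAME_OR_LIT} TO ([\w-]+)\.?$", re.I),
     lambda m: f"{perl_var(m[2])} += {operand(m[1])};"),
    (re.compile(rf"DISPLAY {NAME_OR_LIT}\.?$", re.I),
     lambda m: f'print {operand(m[1])}, "\\n";'),
]

def translate(statement: str) -> str:
    """Translate what a pattern covers; comment out everything else."""
    stmt = statement.strip()
    for pattern, emit in PATTERNS:
        m = pattern.match(stmt)
        if m:
            return emit(m)
    return "# UNTRANSLATED: " + stmt
```

For instance, `translate("ADD 1 TO WS-COUNT.")` gives `$ws_count += 1;`, while `COMPUTE X = Y * 2.` comes back commented out. Each refinement cycle would then add patterns for the most frequent untranslated statements.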

Hope this helps, -- rns.

Durand Jean-Damien

May 9, 2014, 1:16:22 PM

to marpa-...@googlegroups.com

Being a COBOL noob, I just want to know whether the language is perfectly standardized, or whether programmers are tempted to use extensions that make a general parser more difficult, as with C code that uses GCC-only extensions.
As a corollary to my question: is the COBOL source code you are targeting subject to some extensions, or is it written in perfect (ANSI?) COBOL?
Thanks,

Btw I found this link interesting.

clueless newbie

May 9, 2014, 1:56:12 PM

to marpa-...@googlegroups.com

Durand,

Thanks for the link!

COBOL programmers, in general, aren't adventurous: most will stick to the standard they're familiar with. Large COBOL shops, on the other hand, won't allow you to use an extension unless you can get written sign-offs from each of your sixteen grandmothers. Too bad if you have fewer!

I've started by playing with amon's transforming syntax. Once I had a reworked version (I wanted the SLIF BNF in the __DATA__ section and the input to come from a file), I needed to pre-process the raw COBOL by stripping comments, the sequence number area, and the program name area. So now I can get to the "guts" of the COBOL program. Simple regexes allow me to split that into divisions.
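The division split can be sketched in a few lines. This is a Python illustration of the same idea, not the poster's Perl code: it assumes comments and the sequence/program-name areas have already been stripped, treats any line that begins with `IDENTIFICATION`, `ENVIRONMENT`, `DATA` or `PROCEDURE` followed by `DIVISION` as a header, and groups the following lines under it. Lines before the first header are dropped. The name `split_divisions` is hypothetical.

```python
import re
from typing import Dict, List, Optional

DIVISION = re.compile(
    r"^\s*(IDENTIFICATION|ENVIRONMENT|DATA|PROCEDURE)\s+DIVISION\b", re.I)

def split_divisions(lines: List[str]) -> Dict[str, List[str]]:
    """Group preprocessed lines under the division header that precedes
    them; each header line is kept with its own division."""
    divisions: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        m = DIVISION.match(line)
        if m:
            current = m.group(1).upper()
            divisions[current] = []
        if current is not None:
            divisions[current].append(line)
    return divisions
```

Starting with the IDENTIFICATION DIVISION, as planned below, then means feeding `split_divisions(...)["IDENTIFICATION"]` to the first, smallest grammar.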

My plan is to begin with the IDENTIFICATION DIVISION and go (or fall down) from there.
