True, but not entirely accurate:
The all-purpose nature of the STRIP.TEC macro completely cleans up all sorts of irregularities, including LF without CR, CR without LF and LF/CR instead of CR/LF. In short, just on line termination alone it cleans up all mistakes, etc. But it does not case convert. Thus, all of the files are fine in that regard.
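To make the line-termination cases concrete, here is a rough Python sketch of just that part of the cleanup. It is not the STRIP.TEC macro itself (which is TECO and does more than this); the function name is made up for illustration, and the handling of odd mixtures such as LF followed by CR/LF is my own assumption.

```python
import re

# Any one of: CR LF, LF CR, lone CR, lone LF. The two-character
# forms are listed first so a pair is never counted as two line ends.
_LINE_END = re.compile(r"\r\n|\n\r|\r|\n")


def normalize_line_endings(text: str) -> str:
    """Rewrite every line terminator as CR/LF; case is left untouched.

    Hypothetical helper: it only illustrates the line-termination
    cases named above, not the full behaviour of STRIP.TEC.
    """
    return _LINE_END.sub("\r\n", text)
```

Note that nothing here touches letter case, which matches the point that the cleaned files are still mixed-case.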
I just went further than that in making a case-folded version from the already cleaned-up input by using OS8CON, the P?S/8 conversion program. It is not quite as "smart" with regard to the other problems, but generally comes close. However, in this case, it was a post-process after what was already established in the STRIP.TEC pass.
When an OS/8 file [the mixed-case but otherwise OK file with CR/LF ending every line] is passed through OS8CON towards P?S/8, all of the following conversions occur:
1) [This will be fixed in the next release]. A couple of characters are totally ignored. In the next release they will be converted to the # character instead. The characters are @ and ` which are in violation of the P?S/8 6-bit character set. Technically, they have no proper function in a six-bit word, as in six-bit they become 000000, which is universally accepted as some form of null situation, admittedly system-specific as to exactly what. But at the minimum, the way the TEXT and SIXBIT directives work in PAL, this has to be an end of string at the very least. Thus, any practical implementation must not allow these characters directly.
DIAL does its internal reckoning somewhat differently, and in the process winds up slightly more restricted, in part because the file format has no internal word restrictions. This is also part of the reason that editing with DIAL has pathological cases where you lose control for minutes at a time as it pathetically starts a "fix-up" process. I may in the future do a P?S/8 SHELL editor for the PDP-12 screen, which by nature really has to be all upper-case, but there are better ways to do it than DIAL, and many have soundly criticized both it and the Vista editor for not being "smart". I would use the technique invented by Stewart Duwar, except I would do a final clearout which, as far as I know, he never implemented. [This is getting off-topic, but the point is you have to have ground rules, and the P?S/8 ones are consistent with the TEXT and SIXBIT directives of PAL, etc.]
In any case, the next release of OS8CON will, instead of tossing these characters, arbitrarily convert them into #, because that allows the one valid case where they can be tolerated.
Originally, the TEXT directive allowed only the " character as the delimiter of the text. This was then extended to " or ' as delimiters. In TOPS10 PAL10, it was generalized to any innocuous character as a delimiter; most notably ; is not allowed because that indicates multiple statements on a line, but just about anything else could be tried. Since this is a 7-bit character set situation, instead of flagging both ` and @ as illegal characters, they allowed both of them [not interchangeably]. But this is only true of the delimiters of TEXT directives; nowhere else in the file are these characters valid. Since P?S/8 cannot support this, they were deleted. But the new change will convert them to # because this will allow statements of the form
TEXT #HELLO#
which is perfectly fine. As a slight superset, P?S/8 PAL does not require the second delimiter IF and only IF the TEXT directive is the last statement on the line. [The presence of a ; character cannot be considered as a delimiter, etc.]
These conversions are not affected by this situation, but it is germane to the topic.
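For anyone who wants to see why @ and ` are the troublemakers, here is a small Python sketch. It assumes the usual way of getting from 7-bit ASCII to six bits (fold the lower-case column onto the upper-case column, then keep the low six bits); the P?S/8 internals may differ in detail, and both function names are invented for illustration only.

```python
def sixbit(ch: str) -> int:
    """Six-bit code of a 7-bit ASCII character (assumed packing rule)."""
    code = ord(ch) & 0o177      # restrict to 7-bit ASCII
    if code >= 0o140:           # lower-case column, which includes `
        code -= 0o40            # fold onto the upper-case column
    return code & 0o77          # keep only the low six bits


def replace_null_chars(line: str) -> str:
    """Turn @ and ` into # when they would pack to six-bit 000000.

    Hypothetical stand-in for the planned OS8CON behaviour; only the
    substitution is modelled, not the rest of the conversion.
    """
    return "".join("#" if c in "@`" and sixbit(c) == 0 else c for c in line)
```

With this, a line such as TEXT `HELLO` comes out as TEXT #HELLO#, which is the one tolerated case described above.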
2) Since P?S/8 text files cannot handle FF, the syntax EJECT with an optional horizontal tab in front is treated as the equivalent. The conversion will provide the HT for neatness, but when going towards OS/8 it can be missing. In most six-bit languages this is the original syntax, and it is an accommodation to the seven-bit character set that the FF character is equivalent to that sequence. DIAL also supports this feature, as does OS/8. OS8CON thus converts FF to <HT>EJECT<CR><LF>. This is important because both PAL10 and P?S/8 PAL also support the TITLE directive and its alternative syntax:
EJECT STRING
If there is a string after the EJECT, it becomes the title line on the next printed page. OS/8 PAL8 does not support at least one of these methods, but P?S/8 is dual-compatible with PAL10 in this regard. The default is to build the title line string from the first line of every input file, but P?S/8 also supports a command-line switch to turn off the secondary changes, because the source program may be the concatenated contents of many files and it would be pointless to change it on every file in that situation. When a program is converted to a shorter series of P?S/8 extended-length files, this is a valid use of the feature. For example, the proper source for FOCAL, 1969 is two files, normally called FOCAL.ZZM and FLOAT.ZZM. When separately converted to P?S/8 extended-length files, each can participate as intended; the resultant listing is 100% authentic when compared to PAL10.
Again, this is an issue not germane here, but this is to be thorough, as these issues invariably do come up, especially when converting source code from DIAL. P?S/8 supports another similar conversion program between DIAL tapes and P?S/8. P?S/8 L6DCON uses internal LINCtape routines and can only run on a LINC-8 or PDP-12. The stated problem above does not apply, but there are other considerations. [Note: Both programs work in BOTH DIRECTIONS!].
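As an illustration of the FF handling in item 2, here is a minimal Python sketch of both directions. The forward rule is the one stated above (FF becomes <HT>EJECT<CR><LF>). The reverse rule is my assumption from the description: only a bare EJECT line, with or without the leading HT, goes back to FF, while an EJECT followed by a title string is left alone. The function names are hypothetical and this is not the OS8CON code.

```python
import re

# A line holding only EJECT, with an optional leading horizontal tab.
_BARE_EJECT = re.compile(r"(?m)^\t?EJECT\r\n")


def ff_to_eject(text: str) -> str:
    """OS/8 towards P?S/8: each form feed becomes <HT>EJECT<CR><LF>."""
    return text.replace("\f", "\tEJECT\r\n")


def eject_to_ff(text: str) -> str:
    """P?S/8 towards OS/8 (assumed rule): a bare EJECT line becomes FF.

    EJECT with a title string after it is kept as written, since a
    lone FF could not carry the title.
    """
    return _BARE_EJECT.sub("\f", text)
```

A page break therefore survives a round trip, and the HT is supplied on the way in even if it was missing on the way out.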
3) CR/LF is converted into the P?S/8 internal <EOL>, which can be likened to the unix usage of LF alone, which was the original complaint. Thus, if the files were imaged directly into OS/8 complete with the inappropriate end-of-line, OS8CON would also have fixed the same problem. However, it would also fold the file characters to upper-case, as this is the character set of P?S/8 files.
Thus, without having to care, OS8CON likely could have been the only program necessary, but the line endings were clearly already resolved by the STRIP.TEC macro, reducing the additional conversion to just folding to upper-case.
Thus, as a post-process I decided to create all UC versions after the fact. But since the files were as you described, P?S/8 OS8CON could have done that all by itself [I just didn't care because STRIP.TEC does EVERYTHING!].
Thus, in hindsight, this is what Warren was referring to: not liking it, but dealing with it from a unix position, which was not helpful. Since MS-DOS text conventions are the same as DEC conventions, the PEPS package for Windows will always solve these problems.
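Item 3 can be pictured with a few lines of Python. This is only a sketch under stated assumptions: the real <EOL> is a P?S/8 internal marker, which I stand in for with a single LF here, and the function name is invented. It accepts either CR/LF or a bare LF on input, which is why the unix-style files would have come out right as well.

```python
import re

EOL = "\n"  # stand-in for the P?S/8 internal <EOL> (assumption)


def to_ps8_text(text: str) -> str:
    """Model of item 3: CR/LF (or bare LF) becomes EOL, then fold to UC."""
    unified = re.sub(r"\r?\n", EOL, text)
    return unified.upper()
```

The case folding is the only part that STRIP.TEC had not already taken care of.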
Always keep me in the loop when attempting any of these situations. It's likely I've already been there and done that regardless of the particulars. That way, a lot less time is wasted reinventing wheels. If there is one criticism of Warren, it was that he was arguably TOO independent and self-sufficient and didn't ask if that was the case. Since we are carrying on without him, let's all cooperate to help each other out where appropriate, etc.
In the meantime, I should be done with another pass on the PEPS user guide soon. I will announce it in the pepswin group; it will still be a "beta" but should be closer to final [sans resolving the appendixes].
I had a snag I fixed last week: the AnyPDF converter turned out to have a bug as used out-of-the-box, where it misformatted lines wider than 120 characters. This seldom occurs, but not never. I created output that it ruined. Fortunately, I studied their package more thoroughly and found obscure margin property settings that corrected the problem. [Largely undocumented, but it worked! The settings exist, but the parameter numbering system is not documented, so I just started experimenting until I saw what might work. Worked first time.] So this is no longer an issue.
cjl