
Turbo Pascal to Delphi


Philippe Ranger

Sep 16, 1999
To lurkers: this is the answer to a question posted under the same Subject
in non-tech.

<<Jeff:
What's the best way to read comma delimited strings?
>>

CommaText property in TstringList, but that's a big bite right now. So, back
to the old ways.
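
(For the curious, here's a minimal taste of what CommaText buys you later on -- untested scratch code, literals made up --

--------------
Uses classes;
Var
  sls: TstringList;
Begin
  sls := TstringList.create;
  try
    sls.commaText := 'one,two,"three, with a comma"';
    writeln(sls.count);   //3 -- the quoted comma doesn't split the field
    writeln(sls[2]);      //three, with a comma
  finally
    sls.free;
  end;
End;
--------------

-- but as I said, no need for it yet.)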

<<
What I'm doing now is:
ReadLn-ing a single string (which is one line) from the file
While s[n]<> ',' (where s is the entire string, and n is just an integer
for
location) and making another string:= that (doing this via a function)
>>

Don't pseudo-code. Post your exact code. The general method would be --

Const
  CnFields = 16; // or whatever your count is
Var
  aFields: array[1..CnFields] of string;
  t: textFile;
  s: string;
  ja: integer; //index into aFields
  js: integer; //index into s
Begin
  assignFile(t, fileName);
  reset(t);   //open before the try, so the finally only runs on an open file
  try
    while (not eof(t)) do begin
      readln(t, s);
      ja := 1;
      js := pos(',', s);
      while (js <> 0) do begin
        aFields[ja] := copy(s, 1, js-1);   //don't copy the comma
        inc(ja);
        s := copy(s, js+1, $FFFF);         //copy the whole tail of s back to s
        js := pos(',', s);
      end;
      //here, you process aFields
    end;
  finally
    closeFile(t);
  end;
End;

There are only two new things in this. One, strings of indefinite length
(dynamic strings). Two, the "exception handling" that makes sure that,
whatever happens, the file gets closed. You don't *need* either, but better
to start off on the right foot. The ...File suffixes aren't necessary
either, but now sometimes there may be ambiguity otherwise, so it's better
to use them.

<<
will a "packed record" accept multiple types? (eg string, chr, integer,
real)
>>

Yes, of course. But you don't have "records" in the Pascal sense, you have
lines in a text file. So forget that aspect of it.

<<
I was going to use the AnsiQuotedStr function to
convert the non-quoted input data into quoted output data for those fields
which must be quoted (strings only -- numbers are unquoted except for ssn and
phone numbers).
>>

Why does anything need quoting? The only purpose of it would be to include
commas in the quotes. Also, ansiQuotedStr is there mostly to serve with the
Win API. You can just as well do --

myStr := '"' + myStr + '"';

<<
And of course, the output file has to be comma delimited as well.
>>

If you want to do a lot of changes on those string fields, then the above
code would be nicer to use if it was based on a TstringList, which easily
supports Insert and Delete, instead of an array.

This goes this way --

--------------
Uses sysUtils, classes;

Procedure getFields(s: string; sls: TstringList); forward;
(*Pre: sls is empty.
  Post: sls is filled with the comma-delimited fields of s, minus commas*)

Procedure writeFields(var t: textFile; sls: TstringList); forward;
(*Pre: t is open for writing.
  Post: One line is added to t, holding the contents of sls, comma-delimited.*)

Procedure processFile(srcName, destName: string);
Var
  tSrc, tDest: textFile;
  s: string;
  sls: TstringList;
Begin
  sls := TstringList.create;
  assignFile(tSrc, srcName);
  reset(tSrc);
  try
    assignFile(tDest, destName);
    rewrite(tDest);
    try
      while (not eof(tSrc)) do begin
        readln(tSrc, s);
        getFields(s, sls);
        writeFields(tDest, sls);
        sls.clear;
      end;
    finally
      closeFile(tDest);
    end;
  finally
    closeFile(tSrc);
    sls.free;
  end;
End;

Procedure getFields(s: string; sls: TstringList);
Var
  js: integer; //index into s
Begin
  js := pos(',', s);
  while (js <> 0) do begin
    sls.add(copy(s, 1, js-1));
    s := copy(s, js+1, $FFFF);
    js := pos(',', s);
  end;
End;
----------------------

As you can see, using the stringList only simplifies the code. I think you
can see your way to doing writeFields on the same model.
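
If you get stuck, here's roughly the shape I have in mind (one guess at it, untested) --

--------------
Procedure writeFields(var t: textFile; sls: TstringList);
Var
  j: integer;
Begin
  for j := 0 to sls.count - 1 do begin
    if (j > 0) then write(t, ',');   //comma between fields, none at the end
    write(t, sls[j]);
  end;
  writeln(t);                        //terminate the line
End;
--------------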

PhR

Jeff Buffington

Sep 16, 1999
Philippe Ranger wrote:

> To lurkers: this is the answer to a question posted under the same Subject
> in non-tech.

Had a feeling this would be moving here <g>

> CommaText property in TstringList, but that's a big bite right now. So, back
> to the old ways.

Ok..KISS is the rule for now

> Don't pseudo-code. Post your exact code. The general method would be --

Ok will do.

> Why does anything need quoting? The only purpose of it would be to include
> commas in the quotes. Also, ansiQuotedStr is there mostly to serve with the
> Win API. You can just as well do --
>
> myStr := '"' + myStr + '"';

Re: quotes -- Well, the program that will be importing my new data requires
text fields (including phone and SSN numbers) be quoted, numeric fields do not,
and may include decimals (not applicable for this specific data file, but other
files will have money involved). What I'm actually doing is porting data out of
an old medical billing package into a format that the new package can import..
the new package utilizes Btrieve, and thus would be much more complicated to
directly access, so they created an import/export program..it just requires
files be formatted in a specific manner (quotes, commas, dashes [SSN], YYYYMMDD
dates [old just has YYMMDD]).

Re: myStr:='"'+myStr+'"'; -- If you say it's easier, then that's what I'll do
(was doing that for test purposes to write what was read to screen anyway).

> If you want to do a lot of changes on those string fields, then the above
> code would be nicer to use if it was based on a TstringList, which easily
> supports Insert and Delete, instead of an array.

> As you can see, using the stringList only simplifies the code. I think you
> can see your way to doing writeFields on the same model.
>
> PhR

Just so you can follow my logic -- here's what I've been doing:

Function MedStrings(s: string; i:integer):string;
var
txtstr : string;
Begin
txtstr:='';
While s[i]<>',' do
begin
if s[i]<>',' then txtstr:=txtstr+s[i];
inc(i);x:=i;
end;
x:=x+1;
MedStrings:=txtstr;
End;

Procedure ReadPLData;
var
medacc,medltr,medlname,medfname,medmi,meddob,medsex,medrel1,medrel2,medrel3,

medrel4,medrel5,medrel6,medpro,medlastact,medrclcyl,medlang : string;
f1, f2: text;
s : string;
Begin
AssignFile(f1, 'datpl.dat');
{AssignFile(f2, 'lm01.txt');}
FileMode:=2;
Reset(f1);{Reset(f2);}
While not EOF(f1) do begin
x:=1;
ReadLn(f1, s);
MedAcc :=MedStrings(s, x);Write('"',MedAcc,'" ');
MedLtr :=MedStrings(s, x);Write('"',MedLtr,'" ');
MedLname :=MedStrings(s, x);Write('"',MedLname,'" ');
MedFname :=MedStrings(s, x);Write('"',MedFname,'" ');
MedMI :=MedStrings(s, x);Write('"',MedMI,'" ');
MedDOB :=MedStrings(s, x);Write('"',MedDOB,'" ');
MedSex :=MedStrings(s, x);Write('"',MedSex,'" ');
MedRel1 :=MedStrings(s, x);Write('"',MedRel1,'" ');
MedRel2 :=MedStrings(s, x);Write('"',MedRel2,'" ');
MedRel3 :=MedStrings(s, x);Write('"',MedRel3,'" ');
MedRel4 :=MedStrings(s, x);Write('"',MedRel4,'" ');
MedRel5 :=MedStrings(s, x);Write('"',MedRel5,'" ');
MedRel6 :=MedStrings(s, x);Write('"',MedRel6,'" ');
MedPro :=MedStrings(s, x);Write('"',MedPro,'" ');
MedLastAct:=MedStrings(s, x);Write('"',MedLastAct,'" ');
MedRclCyl :=MedStrings(s, x);Write('"',MedRclCyl,'" ');
MedLang :=MedStrings(s, x);Write('"',MedLang,'" ');
WriteLn;
end;
CloseFile(f1); {CloseFile(f2);}
End;
---
The Write's are just temporary, so I can see what it's reading while running.
My output file code hasn't been written yet..wanted to make sure all the data
gets read properly first. As for the output file, it will contain 186 fields
(many will be empty), luckily (?) all of them are fixed length and I know the
exact lengths -- weren't they kind?

I will also have to open at least one other file to get additional information
due to differences between the way each program stores its data. The new system
uses one file for all its patient information, the old system uses one for
account information and another for patient (sub-accounts if you will). In the new
system each account is unique to one patient, in the old each account can have 6
patients -- thus the account information is common to all the patients in each
account, and the patient information is unique to each patient. Hope that makes
sense.
As for the calls made to the procedure, it's currently the only procedure being
called in the main part of the program.

At any rate, I thought I oughta thank you for all the help you've provided -- I
really appreciate it.


Philippe Ranger

Sep 16, 1999
<<Jeff:

Re: quotes -- Well, the program that will be importing my new data requires
text fields (including phone and SSN numbers) be quoted, numeric fields do not,
and may include decimals (not applicable for this specific data file, but other
files will have money involved).
>>

Ok.

Now, you've said you hadn't touched TP since 1990. I don't know how rusty
you really are. But, at least for some lurkers if not for you, I'll try to
provide some pointers regarding your current code.

The first and most godawful problem is that global x. The main proc starts
like this --

<<
x:=1;
ReadLn(f1, s);
MedAcc :=MedStrings(s, x);Write('"',MedAcc,'" ');
>>

And goes on through many near-identical lines. So, x begins at 1, and never
gets used in the main proc except as a parameter for medStrings. Now, check
medStrings --

<<
Function MedStrings(s: string; i:integer):string;
var
txtstr : string;
Begin
txtstr:='';
While s[i]<>',' do
begin
if s[i]<>',' then txtstr:=txtstr+s[i];
inc(i);x:=i;
end;
x:=x+1;
MedStrings:=txtstr;
End;
>>

Jeff, you should not use globals, and especially not when all they're for is
to be passed as a parameter (but diddled with nonetheless in the func
they're passed to).

So, x is passed in as i, which is used as an index into s, and the contents
of s up to the next comma, exclusive, are returned by the function. If there
is no comma, this loops until crash. On return, x is one position past the
comma.

First comment -- Commenting helps!!

Second comment -- Obviously, x is a return parameter, and since it's also an
input param, all you have to do is pass i as Var, and keep the x local to
the main proc.

Third comment -- Let's KISS, right? So, using the above as the spec for the
function, we first get --

------------
Function MedStrings(s: string; var i: integer): string;
Begin
result :='';
While (i <= length(s)) and (s[i] <> ',') do
begin
result := result + s[i];
inc(i);
end;
delete(result, length(result), 1);
inc(i);
End;
-------------

Note that I've made the function stop at the end of string.

By the way, the only thing I have against AnsiQuoted is that you don't have
to learn everything. But, what you would *love* to explore is HyperString,
which has all kinds of functions, including everything you need to parse
your lines. It's free, source $39, at
http://efd.home.mindspring.com/tools.htm .

Fourth comment --

<<
Procedure ReadPLData;
var

medacc,medltr,medlname,medfname,medmi,meddob,medsex,medrel1,medrel2,medrel3,

medrel4,medrel5,medrel6,medpro,medlastact,medrclcyl,medlang : string;
>>

Anything like this should ring a loud bell. That many strings, and you want
to do the same thing on all of them? Use an array (or a TstringList). Just
not to lose your way in the index, you can transform this into an enumerated
type, and index on that. So --

-----------
Procedure ReadPLData;
var
  aFields: array[TmedField] of string;
  jF: TmedField;
  f1, f2: text;
  s : string;
  x: integer;
Begin
  AssignFile(f1, 'datpl.dat');
  {AssignFile(f2, 'lm01.txt');}
  FileMode:=2;
  Reset(f1);{Reset(f2);}
  While not EOF(f1) do begin
    x:=1;
    ReadLn(f1, s);
    for jF := low(aFields) to high(aFields) do begin
      aFields[jF] := MedStrings(s, x);
      Write('"', aFields[jF],'" ');
    end;
    WriteLn;
  end;
  CloseFile(f1); {CloseFile(f2);}
End;

---------------

Somewhat more KISS than it was -- arrays are for vars that you want to loop
over. I've used the low() and high() pseudo-functions, which are a bit
younger than TP5, but you don't need them. Only, using them, you don't have
to worry about your loop limits, they self-define over the array.

I still think my previous code is a better model, though.

Good luck, and don't hesitate to ask!

PhR

Philippe Ranger

Sep 16, 1999
<<PhR:

Use an array (or a TstringList). Just
not to lose your way in the index, you can transform this into an enumerated
type, and index on that. So --
>>

Sorry, Jeff, I forgot to post the type declaration --

Type
TmedField = (medacc,medltr,medlname,medfname,medmi,meddob,
medsex,medrel1,medrel2,medrel3,medrel4,medrel5,medrel6,
medpro,medlastact,medrclcyl,medlang);

PhR

Jeff Buffington

Sep 16, 1999
Philippe Ranger wrote:

> Now, you've said you hadn't touched TP since 1990. I don't know how rusty
> you really are. But, at least for some lurkers if not for you, I'll try to
> provide some pointers regarding your current code.

A lot has changed in 9 years -- Windows for example. I started off with
TurboBASIC before that (circa 1988); tried QuickBasic first but liked Borland
*much* better -- thus when I started learning Pascal in high school, Borland it
was. In 1990 my school (at the time) sold TP 5.5 as part of the course
requirements.

> The first and most godawful problem is that global x. The main proc starts
> like this --

I know -- whip me with a wet spaghetti string <g>

> First comment -- Commenting helps!!
>

// Comment to self -- don't forget to comment! :){ I should know better }

> Second comment -- Obviously, x is a return parameter, and since it's also an
> input param, all you have to do is pass i as Var, and keep the x local to
> the main proc.

I seem to recall trying that and having problems with it being "not declared" in
the function -- but at that point the sheets were still on the furniture, let
alone the cobwebs, in my brain.

> Function MedStrings(s: string; var i: integer): string;
> Begin
> result :='';
> While (i <= length(s)) and (s[i] <> ',') do
> begin
> result := result + s[i];

Ok presently I'm getting an error of "Incompatible types" on the above line.
Result is new to me, but according to the 'help' it means the same as the function
itself. Perhaps the problem is that the function itself requires 2 params?

> inc(i);
> end;
> delete(result, length(result), 1);
> inc(i);
> End;
>

But I did understand your code -- :) Like I said, still cleaning those
cobwebs...

> Anything like this should ring a loud bell. That many strings, and you want

> to do the same thing on all of them? Use an array (or a TstringList). Just


> not to lose your way in the index, you can transform this into an enumerated
> type, and index on that. So --

I have fond memories of telling a professor that I could indeed make 4
dimensional arrays...

> I still think my previous code is a better model, though.

I did print it and will study it to learn from it.

Jeff Buffington

Sep 16, 1999
Jeff Buffington wrote:

> > Function MedStrings(s: string; var i: integer): string;
> > Begin
> > result :='';
> > While (i <= length(s)) and (s[i] <> ',') do
> > begin
> > result := result + s[i];
>
> Ok presently I'm getting an error of "Incompatible types" on the above line.
> Result is new to me, but according the 'help' it means the same as the function
> itself. Perhaps the problem is that the function itself requires 2 params?

Found the problem -- I was trying to tell it "result:=length(result)+s[i];" -- my
bad (more wet spaghetti strings)

> > inc(i);
> > end;
> > delete(result, length(result), 1);

Problem here was it would remove a character from the string..(perhaps to remove the
comma?) ...the result was that single character fields were effectively nullified.
Solution was to remove the above line (or comment it out). It now reads all the
fields and can display them..and the code is "cleaner".

> > inc(i);
> > End;
> >

Tnx -- Jeff


Philippe Ranger

Sep 17, 1999
<<Jeff:

I seem to recall trying that and having problems with it being "not declared"
in the function -- but at that point the sheets were still on the furniture,
let alone the cobwebs, in my brain.
>>

This is what you get if you use x in the func. The point of making i a var
param is not to use x at all in the function. That's parameters for you --
you pass the function anything, hippopotamus if it's your cup of tea, and
inside the function the hippo has only one name, i, the name you declared
for the parameter. The hippo is inaccessible under "hippopotamus" (unless it's
global). So, in the code I gave you, i is the parameter, and x is the var
passed to it -- shorter than hippo.

<<
Result is new to me, but according to the 'help' it means the same as the
function itself. Perhaps the problem is that the function itself requires 2 params?
>>

Result is the same as what you used to do with the function name --
something to assign the result to. But, contrary to the func name, you can
use it as a plain value, as I do in --

result := result + s[i];

So, compared to your original code, Result does double duty. It's txtstr,
and it's the func name, MedStrings.
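
A throwaway illustration (doubleIt is a made-up name, not from your program) --

-------------
Function doubleIt(x: integer): integer;
Begin
  result := x;             //assign to it, as you would have assigned to doubleIt
  result := result + x;    //and read it back as a plain value, which the func name won't let you do
End;
-------------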

<<
> > delete(result, length(result), 1);

Problem here was it would remove a character from the string..(perhaps to
remove the comma?) ...the result was that single character fields were
effectively nullified. Solution was to remove the above line (or comment it
out). It now reads all the fields and can display them..and the code is
"cleaner".
>>

Hmm... You're perfectly right, that's a good call! I over-corrected. My loop
already quits at the comma.

I should have checked -- if I have to do something unneat at the end, I
should make sure the main part is well-designed. In this case, it was
perfect, the unneat part was unneeded. At the time, I was thinking of a
further simplification, so here it is --

-----------


Function MedStrings(s: string; var i: integer): string;

Var
i0: integer;
Begin
i0 := i;
While (i <= length(s)) and (s[i] <> ',') do inc(i);
result := copy(s, i0, i - i0);
inc(i);
End;
-------------------

Your main proc assumes a specific number of fields, *all* comma-terminated,
so the length test is just defensive programming that ensures the function
will work in other contexts. Note that if we reach the end of the string, i
is returned = length + 2, and on the next call, if that happens, the loop
won't run and the function will return an empty string.
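
To make that end-of-string behavior concrete, a quick trace (scratch code, not part of your program) --

-------------
Var
  s: string;
  x: integer;
Begin
  s := 'ABC,,D,';
  x := 1;
  writeln(MedStrings(s, x));   //ABC     (x is now 5)
  writeln(MedStrings(s, x));   //        (empty field; x is now 6)
  writeln(MedStrings(s, x));   //D       (x is now 8, past the end)
  writeln(MedStrings(s, x));   //        (empty again -- the loop never runs)
End;
-------------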

PhR

Jeff Buffington

Sep 17, 1999
Philippe Ranger wrote:

> > > delete(result, length(result), 1);
> Problem here was it would remove a character from the string..(perhaps to

> Hmm... You're perfectly right, that's a good call! I over-corrected. My loop
> already quits at the comma.

It's coming back to me..slowly.. at least I knew where to look for a problem.

> Function MedStrings(s: string; var i: integer): string;
> Var
> i0: integer;
> Begin
> i0 := i;
> While (i <= length(s)) and (s[i] <> ',') do inc(i);
> result := copy(s, i0, i - i0);
> inc(i);
> End;

What is the benefit of the above vs. below? Perhaps the "empty string...won't
run" you mentioned later? Hmm..---


Function MedStrings(s: string; var i:integer):string;

Begin
While (i <=length(s)) and (s[i] <> ',') do begin
if not IsDelimiter(',',s,i) then Result := result + s[i];
inc(i);
end;
inc(i);
End;
---
Also -- noticed last time you declared the function (appears both examples
above) as such:
"Function FunctionName(var i:InputType):OutputType;" as opposed to "Function
FunctionName(i:InputType):OutputType;"
Any advantage there? ---^ I always thought it was repetitive...but I'm
probably wrong.. :)

Have another question but it's somewhat unrelated to this part of the
code...will post it separately.

Tnx, Jeff


Philippe Ranger

Sep 17, 1999
<<Jeff:

What is the benefit of the above vs. below? Perhaps the "empty string...won't
run" you mentioned later? Hmm..---
Function MedStrings(s: string; var i:integer):string;
Begin
While (i <=length(s)) and (s[i] <> ',') do begin
if not IsDelimiter(',',s,i) then Result := result + s[i];
inc(i);
end;
inc(i);
End;
>>

Three problems with this. One, the If can never be true when we reach it -- the
While has already caught that condition. So it's redundant. And redundant code is
compost for bugs.

Two, isDelimiter() only has a purpose when you're looking for more than one
delim. Otherwise, the plain char comparison is faster, and simpler -- no
risk of inverting the delims and string args.

Three, result MUST be initialized (to ''). Local strings don't need
initializing, but result strings do. This is phenomenally well-hidden in the
docs. For my part I've never been bitten by it, because not initializing
something makes me nervous, so I do it "for documentation" anyway. Well, I now
know I have a reason for the policy. The code I gave you sets result in one
go, by assigning it a Copy().

<<
Also -- noticed last time you declared the function (appears both examples
above) as such:
"Function FunctionName(var i:InputType):OutputType;" as opposed to
"Function FunctionName(i:InputType):OutputType;"
Any advantage there? ---^ I always thought it was repetitive...but I'm
probably wrong.. :)
>>

THIS is important! When you pass i without "var", your func will work on a
local copy of it, and the var passed by the caller will be unaffected. When
you put var, the func works with the var the caller passed, though renamed
as the parameter. The need to *return* i was what drove you to use the
global x in the first place. I'm returning it by making it var.

Try this --

----------------------
Program vartest;
(*$apptype console*)

Procedure novar(i: integer);
Begin
i := 555;
End;

Procedure yesvar(var i: integer);
Begin
i := 555;
End;

Var
x: integer;
Begin
x := 1;
writeln('X init: ':20, x);
novar(x);
writeln('X after novar: ':20, x);
yesvar(x);
writeln('X after yesvar: ':20, x);
readln;
End.
----------------

PhR

Jeff Buffington

Sep 17, 1999

Philippe Ranger wrote:

> Three problems with this. One, we'll never reach the If if it is true -- the
> condition is caught by the While. So it's redundant. And redundant code is
> compost for bugs.

Sounds logical.

> Two, isDelimiter() only has a purpose when you're looking for more than one
> delim. Otherwise, the plain char comparison is faster, and simpler -- no
> risk of inverting the delims and string args.

Indeed it does appear to run slightly faster.

> Three, result MUST be inilialized (to ''). Local strings don't need
> initializing, but result strings do. This is phenomenally well-hidden in the
> docs, but for my part I never hit on a bug because of it because not
> initializing something makes me nervous, so I do it "for documentation"
> anyway. Well, I now know I have reason for the policy. The code I gave you
> sets result in one go, by assigning it a Copy().

In my code it did "Result:=' '" just for good measure...forgot to copy that
line. I compared the two (commenting out one or the other), and both seem to
function properly..but again, that's assuming it runs into no errors in the
*.dat file. And as I said before, there was a slight increase in performance.

> THIS is important! When you pass i without "var", your func will work on a
> local copy of it, and the var passed by the caller will be unaffected. When
> you put var, the func works with the var the caller passed, though renamed
> as the parameter. The need to *return* i was what drove you to use the
> global x in the first place. I'm returning it by making it var.

Indeed the importance made itself blatantly clear to me.. I was having problems
adding data to a record dataset in nested procedures. After reading this
message I used this option and *poof* now it (the data) remains constant. I
haven't tried to move the type declaration from global to the main procedure
yet, but will be doing that next. The main reason I put anything in globally is
simply to see if it will work first -- the old conventional "like riding a
bicycle" saying may apply to Pascal in general, but I went from a Schwinn to a Harley..
:)

For my data to output, I've been looking at using records as opposed to
arrays...which would be more ideal? I kinda like the window that pops up when
you type "NameOfRecord." -- makes it easier to find the field I'm looking for
(there are 186 of them). I still need to write an initialization routine -- if
the "for jF := low(aFields) to high(aFields) do begin ..." trick will work for
record types (which I'll find out shortly), I'll likely use it. (Noticed while
testing my current code the random bits of data in the variables of the record
type -- "ALWAYS initialize your variables" my teacher used to say [and you for
that matter]).

Tnx, Jeff


Philippe Ranger

Sep 18, 1999
<<Jeff:

In my code it did "Result:=' '" just for good measure...forgot to copy that
line. I compared the two (commenting out one or the other), and both seem to
function properly
>>

Trust me on this one, uninitialized string results can bite, and when you
don't expect it. Try this --

--------------
Program noinit;
(*$apptype console*)

Function test: string;
Begin
result := result + '123';
End;

Var
s: string;
Begin
s := test;
writeln('After test 1: ', s);
s := test;
writeln('After test 2: ', s);
End.
--------------------

<<
For my data to output, I've been looking at using records as opposed to
arrays...which would be more ideal?
>>

Toldja -- TstringList. But a record is a very bad choice -- you can't loop
through it the way I showed you. If you don't want a TstringList, use an
array using a enumeration for the index, as I showed. In any case, if I were
you and I had 186 strings in there, I would give up using names for them,
I'd use numbers. TstringList indexes from 0 to count - 1. All you need is a
piece of paper noting what you've stored at 0, 1 ... 184, 185. Check this
(strlist.dpr) --

----------------
Program strlist;
(*$apptype console*)
Uses classes;

Procedure loadList(sls: TstringList);
(*Pre: sls is created.
  Post: all lines from current program source added to sls, one item per line*)
Var
  t: textFile;
  s: string;
Begin
  assign(t, 'strList.dpr');
  reset(t);
  try
    while not eof(t) do begin
      readln(t, s);
      sls.add(s);
    end;
  finally
    closeFile(t);
  end;
End;

Procedure showList(sls: TstringList);
Var
  j: integer;
Begin
  for j := 0 to sls.count - 1 do begin
    writeln(j:3, ' ', sls[j]);
  end;
End;

Var
  sls: TstringList;
Begin
  sls := TstringList.create;
  try
    loadList(sls);
    showList(sls)
  finally
    sls.free;
  end;
End.
-----------------

<<
if the "for jF := low(aFields) to high(aFields) do begin ..." trick will
work for
record types (which I'll find out shortly),
>>

It will not. Do keep it simple and use a TstringList -- after an hour,
you'll find it incredibly convenient, a dream. Just read the Help for
TstringList.
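
To prime the pump, the handful of calls you'll use most, in a scratch sketch (untested) --

-----------------
Uses classes;
Var
  sls: TstringList;
Begin
  sls := TstringList.create;
  try
    sls.add('last');            //goes in at index 0
    sls.insert(0, 'first');     //pushes 'last' down to index 1
    sls.delete(1);              //removes 'last'
    writeln(sls.count);         //1
    writeln(sls[0]);            //first
  finally
    sls.free;
  end;
End;
-----------------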

PhR


Jeff Buffington

Sep 20, 1999

Philippe Ranger wrote:

> Toldja -- TstringList. But a record is a very bad choice -- you can't loop
> through it the way I showed you. If you don't want a TstringList, use an
> array using a enumeration for the index, as I showed. In any case, if I were
> you and I had 186 strings in there, I would give up using names for them,
> I'd use numbers. TstringList indexes from 0 to count - 1. All you need is a
> piece of paper noting what you've stored at 0, 1 ... 184, 185. Check this
> (strlist.dpr) --

> PhR

Well, I've been trying to read the help file about TstringList. I can create a
list, and destroy it..but the problem is, how can I specify the exact length of
each string in the list (they vary anywhere from 1 to 40)? Furthermore, I won't
be putting data into the list sequentially..rather in my own bizarre order.
Each list will always have 186 fields, and they will always be in the same
order. The data I'm reading, however, is not in the same order as it needs to
be. So what I'm wanting to do is assign each field or sets of fields, when
they've been read.

Think of it like this...I have 10 fields in my list, and the first field of old
data will actually be located in the 3rd field of the new list, the 2nd in the
10th, 3rd in the 5th, etc. As I understand it, the list gets created as you go,
so if you ".add" it creates the new field. Perhaps I'm way off base...but I
tried assigning NewList[1]:=OldArray[Field3]...it really doesn't like
that..because the fields haven't been created yet? I like the functionality
that it has...but how can I put data in the 186th field if the first through
185th haven't been created yet ?

Should I .Add 186 [0-185] null fields in the list first, then try to assign the
data to the proper places? Or should I use the
Capacity property to make 186 fields and use Tstringlist.Insert to put my data
in the fields I want? Oh, what do you actually call the "fields" in a list?
Are they still fields, with a specific index...or a string with an index of a
list? Only reason I've been using the term fields, is because it's a familiar
term.

Tnx de Jeff


Philippe Ranger

Sep 20, 1999
<<Jeff:

Well, I've been trying to read the help file about TstringList. I can create a
list, and destroy it..but the problem is, how can I specify the exact length of
each string in the list (they vary anywhere from 1 to 40)?
>>

You don't. These are dynamic strings, the normal string type in Delphi 32.
Will happily jump from size 1 to size 10000000.

Every time we've used strings in code here, you or I, you'll notice length was
not specified. Those were dynamic strings (aka ansiStrings).
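
If you want to see it with your own eyes, a two-minute test (nothing from your program in it) --

-------------
Var
  s: string;
Begin
  s := 'x';
  writeln(length(s));    //1
  s := s + s + s + s;
  writeln(length(s));    //4 -- it grew, no declared size anywhere
  setLength(s, 40);
  writeln(length(s));    //40 -- or set the length directly
End;
-------------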

<<
. The data I'm reading, however, is not in the same order as it needs to
be. So what I'm wanting to do is assign each field or sets of fields, when
they've been read.
>>

You just switch the strings around. The values are actually pointers. Check
this variation on my last demo. Add this procedure --

Procedure reverseList(sls: TstringList);
Var
s: string;
jo, jm: integer;
Begin
jo := 0;
jm := sls.count - 1;
while (jo < jm) do begin
s := sls[jo];
sls[jo] := sls[jm];
sls[jm] := s;
inc(jo);
dec(jm);
end;
End;

And change the main program to --

Var
sls: TstringList;
Begin
sls := TstringList.create;
try
loadList(sls);

reverseList(sls);


showList(sls)
finally
sls.free;
end;
End.

<<


Think of it like this...I have 10 fields in my list, and the first field of old
data will actually be located in the 3rd field of the new list, the 2nd in the
10th, 3rd in the 5th, etc. As I understand it, the list gets created as you go,
so if you ".add" it creates the new field.
>>

Yes. But, once it's filled, you can treat it as an array. This is what I'm
doing in reverseList and showList.

<<
Perhaps I'm way off base...but I
tried assigning NewList[1]:=OldArray[Field3]...it really doesn't like
that..because the fields haven't been created yet?
>>

Exactly. But, once you've done "newList := TstringList.create" then you can
do --

newList.add(oldArray[field3]);

Two problems with this. One, it assumes you're going to add in sequence.
Two, it has you going from an array to a list. You should choose one type
and keep to it (KISS).

<<
I like the functionality
that it has...but how can I put data in the 186th field if the first through
185th haven't been created yet ?
>>

This isn't the same question as the one the preceding quote asked. But it's
simply my "problem one". So, about problem one -- If you add to the new list
in sequence (which is pretty sensible anyhow), you'll have no problem. If
you don't, then to set up an empty list of some size, you have to do --

Procedure setCount (sls: TstringList; n: integer);
(*Pre: sls has just been created or cleared. It is empty.
Post: sls has n elements, all empty strings*)
Var
j: integer;
Begin
sls.capacity := n;
for j := 0 to n - 1 do sls.add('');
End;

<<
Should I .Add 186 [0-185] null fields in the list first, then try to assign the
data to the proper places? Or should I use the Capacity property to make 186
fields and use Tstringlist.Insert to put my data in the fields I want?
>>

SetCount clearly chooses the first option. The second won't work. Capacity
is capacity (like the set size of an old string[80]) while the actual,
current number of elements is registered in Count, and that's read-only
(like the old length value of a string -- and by the way you now have
SetLength for strings, any strings, short or ansi).

Doing SetCount isn't as neat as having a ready-made array, but on the other
hand there are all kinds of conveniences in TstringList. For instance, my
ReverseList could have been simplified by using Exchange.
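
Like so (untested, but Exchange is all it takes) --

-------------
Procedure reverseList(sls: TstringList);
Var
  jo, jm: integer;
Begin
  jo := 0;
  jm := sls.count - 1;
  while (jo < jm) do begin
    sls.exchange(jo, jm);   //swaps the two elements (and their objects) in one call
    inc(jo);
    dec(jm);
  end;
End;
-------------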

<<
Oh, what do you actually call the "fields" in a list? Are they still fields,
with a specific index...or a string with an index of a list? Only reason I've
been using the term fields, is because it's a familiar term.
>>

Officially, you call them "elements". But another trick that might interest
you is that you can feed in strings *with a name* --

myList.add('field83=Timbuktu');

Then you can use the list as an array indexed on the names --

myList.values['field83']

-- will return 'Timbuktu'! So you get the list indexed on normal indices, 0
to count-1, but also on any collection of names (field names) you please.
There is no relation between the two systems, 'field83' can be added at index
0 if you want. Values aren't super-quick, but try them out if you think
they'd help you -- don't assume you need more speed.

PhR

Jeff Buffington

Sep 20, 1999

Jeff Buffington wrote:

> Should I .Add 186 [0-185] null fields in the list first, then try to assign the
> data to the proper places? Or should I use the
> Capacity property to make 186 fields and use Tstringlist.Insert to put my data

> in the fields I want? Oh, what do you actually call the "fields" in a list?


> Are they still fields, with a specific index...or a string with an index of a
> list? Only reason I've been using the term fields, is because it's a familiar
> term.

> Tnx de Jeff

Ok..discovered the (a ?) solution on my own...following my suspicions I did this:

Procedure InitLytecList(var Lytec:TstringList);
var
i: integer;
Begin
For i := 0 to 186 do Lytec.Insert(i, '');
End;

I still haven't found the solution to making each list item a specific length
yet..but I'm looking. :)

Philippe Ranger

Sep 20, 1999
<<Jeff:

Procedure InitLytecList(var Lytec:TstringList);
var
i: integer;
Begin
For i := 0 to 186 do Lytec.Insert(i, '');
End;

I still haven't found the solution to making each list item a specific length
yet..but I'm looking. :)
>>

Insert will raise an exception if (i >count). You're actually doing an
add(), but through a more fragile call.

I've been thinking that, up to this point, everything you've given me pleads
in favor of array types (arrays of strings, of course), one per record type,
each indexed on an enumeration for the field names. I thought you'd be doing
stuff with your stringLists, but it doesn't look that way. It's neat you
should learn at least one class, but perhaps my last two messages have been
leading you up a primrose path.

As for strings, ansiStrings don't have a specific size, just a current
length.

PhR


Jeffrey A. Wormsley

Sep 22, 1999
Philippe Ranger wrote:

> I'd use numbers. TstringList indexes from 0 to count - 1. All you need is a
> piece of paper noting what you've stored at 0, 1 ... 184, 185. Check this
> (strlist.dpr) --

Or use numbers with Constants for the names, as in:

Const
FFirstName = 0;
FLastName = 1;
FInitial = 2;
FTitle = 3;

etc...

Jeff.

Jeffrey A. Wormsley

Sep 22, 1999
Philippe & Jeff:

Been lurking on the thread a bit, and I see two things that
may help out here. First, that MedString function that
searches through the whole input one character at a time is
a bit unclear. For this type of parsing, while it may not be
as efficient, I usually use Pos and destroy the input string
as I go, as in:

Function GetFirstFieldFromCommaFile(Var S: String): String;
Var I : Integer;
Begin
I := Pos(',', S);
If I = 0 then // Comma not found, last field or S empty
Begin
Result := S;
S := '';
End
Else
Begin
Result := Copy(S, 1, I - 1);
S := Copy(S, I+1, Length(S));
End;
End;

This should let you break out each field, and return an
empty string for inputs that are like
Jim,Wilson,,,,Fresno,,,23 ... and so on, as Copy returns a
null string if you pass it a 0 for the number of characters
to copy, as a string beginning with a ',' would do.
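
For example (a scratch trace, not your data) --

S := 'Jim,Wilson,,23';
Writeln(GetFirstFieldFromCommaFile(S));   // Jim
Writeln(GetFirstFieldFromCommaFile(S));   // Wilson
Writeln(GetFirstFieldFromCommaFile(S));   // (empty -- the ",," field)
Writeln(GetFirstFieldFromCommaFile(S));   // 23, and S is now empty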

Second, the reason for temporary storage is only for the
purpose of rearranging the order of the fields between the
input format and the output format. For such a simple job,
then an array is fine for holding the data. Don't worry
about the string length until you output the data. Use
Constants to assign names to the array elements, such as the
code below:

Const OutField1 = 37; // use meaningful names, obviously
OutField2 = 12;
...
OutField186 = 76;

Type TPLDataRow = Array[0..185] of String;

Procedure TranslatePLData;
Var PLDataRow : TPLDataRow;
f1, f2: text;
I : Byte;
s : string;


Begin
AssignFile(f1, 'datpl.dat');
AssignFile(f2, 'lm01.txt');
FileMode:=2;

Reset(f1); // Really need a FileExists check for f1 and f2
Reset(f2); // This may need to be ReWrite, not Reset if f2 doesn't exist
While not EOF(f1) do
Begin
ReadLn(f1, s);
For I := 0 to 185 Do
PLDataRow[I] := GetFirstFieldFromCommaFile(S);
PLDataRow := TranslateRow(PLDataRow); // Rearrange the fields, fix sizes, etc. as needed
For I := 0 to 185 Do
Write(F2, PLDataRow[I]);
Writeln(F2);
End;
CloseFile(F2);
CloseFile(f1);
End;

This needs a helper function TranslateRow which you can pass
the PLDataRow and rearrange it. Here is where you could use
your constants. It might look like the following:

Function TranslateRow(Row: TPLDataRow): TPLDataRow;
Var I : Integer;
Begin
// Fieldxxx constants handle the field shuffling
Result[0] := Row[Field1]; // Do your conversions and
formatting here
Result[1] := '"' + Row[Field2] + '"';
Result[2] := MakeSureTenCharactersExactly(Row[Field3]);
Result[3] := ConvertToSocialSecurityNumber(Row[Field4]);
Result[4] := Row[Field6] + Row[Field7];
...
End;

Even all of this isn't the height of efficiency, but it
should be a bit more readable and work quite well.

Jeff.

Philippe Ranger

Sep 22, 1999
<<Jeffrey:

Or use numbers with Constants for the names, as in:
>>

Horror! There's a circle of Hell for people who suggest this. What are
enumerations for? (Which was my alternative suggestion. But at some point an
enum brings a grey-matter memory overload.)

By the way, the Condemned walk around numerating the NYC white pages. Every
mistake, and they start over again. "20,281 D'Amato, Luigi; 20,282, D'Amato,
Louis" SLASH! Back to the beginning!

PhR


Philippe Ranger

Sep 22, 1999
<<Jeff:
Been lurking on the thread a bit, and I see two things that
may help out here. First, that MedString function that
searches through the whole input one character at a time is
a bit unclear. For this type parsing, while it may not be
as efficient, I usually use Pos and destroy the input string
as I go, as in:
>>

Very early in the thread, I posted my option, and that's what it does. Next
post, I pointed your homonym to HyperString, which avoids having to dump the
part already traversed.

<<
For such a simple job,
then an array is fine for holding the data. Don't worry
about the string length until you output the data.
>>

I came to the same conclusion.

<<
Use Constants to assign names to the array elements, such as the code below:
>>

See my other reply. I'm sending you to Hell. 2513, Abramovicz, Irving.

PhR

Jeffrey A. Wormsley

Sep 23, 1999
Philippe Ranger wrote:
>
> <<Jeffrey:
> Or use numbers with Constants for the names, as in:
> >>
>
> Horror! There's a circle of Hell for people who suggest this. What are
> enumerations for? (Which was my alternative suggestion. But at some point an
> enum brings a grey-matter memory overload.)

Enumerations are fine, as long as there is a continuous
discrete range that begins with zero. If you need to start
somewhere other than zero, have duplicate identifiers, or
skip identifiers, then they are not appropriate. Especially
in an instance where the first field is a row type, and the
remaining fields are interpreted differently for each row
type. Here, you would have many duplicates. Granted this
is more common with fixed column formats than with comma
delimited ones. Example: EPROM programmer S record files.



> By the way, the Condemned walk around numerating the NYC white pages. Every
> mistake, and they start over again. "20,281 D'Amato, Luigi; 20,282, D'Amato,
> Louis" SLASH! Back to the beginning!

If I am to go to the eternal fire sale, I believe my
punishment will be hand compiling obfuscated C code to VAX
or PDP machine code. In braille.

Jeff.

Philippe Ranger

Sep 23, 1999
<<Jeffrey:

Enumerations are fine, as long as there is a continuous
discrete range that begins with zero.
>>

We're numbering fields! They're definitely continuous, and it doesn't matter
where the numbering starts, it's arbitrary.

Also, in the case in point, we want to use the enums as indices for arrays.
This allows typechecking, each array can only be called with indices of the
same enum type it was declared for.
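
A made-up illustration of that typechecking (the field names are invented) --

-------------
Type
  TnameField = (nfFirst, nfLast);
  TaddrField = (afStreet, afCity);
Var
  aName: array[TnameField] of string;
  aAddr: array[TaddrField] of string;
Begin
  aName[nfFirst] := 'Luigi';
  //aName[afCity] := 'Fresno';   //won't compile -- wrong index type
  aAddr[afCity] := 'Fresno';
End;
-------------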

PhR

Jeff Buffington

Sep 23, 1999
Just for the record -- I ran my 'beta' version of the program through the entire
set of datafiles (PT, PL, IN and RC [that's a new one]).
If it just read PL I'd be in great shape...here's how it's working:
Repeat
Read line in PL
Search for line with same acct number in PT then read
Search for line with same acct number in IN then read
Search for next line with same account number in IN then read
..
Write data to output file
Until EOF

It took 23 hours, 30 minutes (roughly) to complete the process... <ahem> but it
worked... unfortunately, because of a few bugs, some of the data got clipped.
Damn social security numbers... I used Copy, and after having been up for 27
hours at the time made the mistake of copying only 3 digits for the last part
(should be nnn-nn-nnnn instead got nnn-nn-nnn ooops!). Oh, and the book noting
which field is what in the new system mixed up two fields..

Now I can prune the data files some (it goes back to 1986), I only need the last
2-3 years of data (but I'd like to think I might have future use for the program
and some Doctors might want all their data). So, I need a better way to read
this data, or rather not read the lines that it no longer needs. My next post
will include some of my code so you can literally see how I'm doing it now.
Tnx de Jeff


Jeff Buffington

Sep 23, 1999
Here's the stuff that's way too slow at reading. This is called by ReadPLData
after reading a line in the PL data file. After this has run, ReadPLData calls
a read to ReadINData. Which does essentially the same thing (in triplicate) to
another file. As you can see, they're currently re-reading the entire data
files each time. Perhaps there's a way to index the datafiles first? I don't
have another 24 hours to not be able to use my computer..or 23:23:27 to be
exact. :)

Procedure ReadPTData(account:string; var lytec:TstringList);
// PT contain account info and has *NO DUPLICATES*
// Read data from Patient file
Type {Fields for PT file}
  TMedFieldPT = (medacc,medzip,medpymttype,medbalance,medcurrent,med30_59,
    med60_89,med90_120,med120over,medlastpd,medacctopened,medpatLbill,medLpymt,
    medLact,medInsLbill,meddummy,medDisc,medAddr1,medAddr2,medCity,medSt,
    medDayPh,medEvePh,medSSN,medRef,medFlags,medStmt,meddummy2);
var
aFields: array[TmedFieldPT] of string;
jF : TmedFieldPT;
datpt: text;
x : integer;
s : string;
Begin
{Open File}
AssignFile(datpt,'datpt.dat');
FileMode:=2;
Reset(datpt);
// While not EOF(datpt) do begin {way to speed up? Replaced w/ 'Repeat'}
Repeat
x:=1;
ReadLn(datpt, s);
// Search for matching account number
  if copy(s,1,12)=account then {modify fields only if matches} begin
    for jF := low(aFields) to high(aFields) do begin
      aFields[jF] := MedStrings(s, x);
    end; {for}
  end; {if}
// end; {while}
  Until EOF(datpt) or (copy(s,1,12)=account); // added this to speed up somewhat -- ends if it found the data
// Assign more values to Lytec record
// Values read so far:
{medacc,medzip,medpymttype,medbalance,medcurrent,med30_59,
 med60_89,med90_120,med120over,medlastpd,medacctopened,medpatLbill,medLpymt,
 medLact,medInsLbill,meddummy,medDisc,medAddr1,medAddr2,medCity,medSt,
 medDayPh,medEvePh,medSSN,medRef,medFlags,medStmt,meddummy2}
{MedAcc -- should have already}
  Lytec[9]  := '"'+aFields[medzip]+'"';
  ... {several assignments} ...
  Lytec[36] := aFields[medlastpd]; {These fields were reversed in manual}
  Lytec[37] := '19'+aFields[medLpymt]; {convert to Y2k compliant}
  if aFields[medDayPh]='' then Lytec[10] := '"'+aFields[medEvePh]+'"'
    else Lytec[10] := '"'+aFields[medDayPh]+'"'; {Work Phone} // Added this 9/23/99
  Lytec[12] := '"'+aFields[medEvePh]+'"'; {Home Phone}
  Lytec[13] := '"'+Copy(aFields[medSSN],1,3)+'-'+Copy(aFields[medSSN],4,2)+'-'+
               Copy(aFields[MedSSN],6,4)+'"'; {Formatted ###-##-####} // Fixed mistake in reading 9/23/99
// End of current assignments
CloseFile(datpt);
End;

Philippe Ranger

Sep 24, 1999
<<Jeff:

Perhaps there's a way to index the datafiles first?
>>

Yes, there is, and it would solve everything. Opening the files as
textFiles, it can be done, but it requires a "text file device driver".
There may be some around that work. I have one, but it's not updated to
Delphi.

However, there's also a way-simpler method -- load the whole file to a
TstringList (this part is not fast). Then you have all the lines neatly
indexed. You just copy the account numbers to a second TstringList --

indexList.addObject(accountNum, Tobject(lineNumber));

-- so that it keeps the account nums as strings, and the line numbers
(indices) as fake Tobjects. Then you sort the indexList. Voilà!
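
Spelled out, the build step would go roughly like this (a sketch, assuming accountList is already loaded and the account number is the first 12 characters of each line, as in your copy(s,1,12) test) --

indexList := TstringList.create;
indexList.capacity := accountList.count;
for j := 0 to accountList.count - 1 do
  indexList.addObject(copy(accountList[j], 1, 12), Tobject(j));
indexList.sorted := true;   //sorts the list, so Find can do a binary search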

To find an account, you do --

indexList.Find(thisAccount, myIndexNum);
myAccountIndex := integer(indexList.objects[myIndexNum]);
myAccountLine := accountList[myAccountIndex];

You can save the indexList to file afterwards. Of course, you save the
accountList back to file (accountList.saveToFile) after modifications.

If you choose this path, you'll really be using a TstringList! The lists can
be as large as your virtual memory will allow.

Now, if the files are really too large, you could still index them, but it
would be work. Let's hope you don't need it.

Finally, for the record, a few points about your current method.

Point one (essential!) --

-----------------


var
aFields: array[TmedFieldPT] of string;
jF : TmedFieldPT;
datpt: text;
x : integer;
s : string;

aBuf: array[1..$8000] of char; // <<<<<


Begin
{Open File}
AssignFile(datpt,'datpt.dat');

setTextBuf(datpt, aBuf); // <<<<<<<<<<<<
----------------------

(You can forget about fileMode, it has nothing to do with text files.) The
above will very much speed up your textFile access. Do it on ALL your files.

Point two -- If you're going to scan the file for accounts, at least leave
it open, don't open and close on each scan! To return to the start, reset
without closing. (Not that this will change much.)

In passing --

<<
// While not EOF(datpt) do begin {way to speed up? Replaced w/ 'Repeat'}
>>

Makes no speed difference, but the repeat will raise an exception if the
file is empty.

Point three --

<<
if copy(s,1,12)=account then
>>

Faster: if (0 = strLcomp(pChar(account), pChar(s), 12)) then

Point four --

<<
Until EOF(datpt) or (copy(s,1,12)=account); // added this to speed up
>>

Don't check a second time! Just call Break in the block you execute when you
find the account. Check the Help, Break and Continue are neat stuff to know.
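
In your loop, roughly (untested, reusing your own names) --

while not eof(datpt) do begin
  readln(datpt, s);
  if copy(s, 1, 12) = account then begin   //or the strLcomp test from point three
    x := 1;
    for jF := low(aFields) to high(aFields) do
      aFields[jF] := MedStrings(s, x);
    Break;   //found it -- quit the loop, no second test of the condition
  end;
end;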

<<
// Values read so far:
{medacc,medzip,medpymttype,medbalance,medcurrent,med30_59,
>>

From this point onwards, I don't quite know what you're doing.

PhR

Jeff Buffington

Sep 24, 1999
Philippe Ranger wrote:

> Yes, there is, and it would solve everything. Opening the files as
> textFiles, it can be done, but it requires a "text file device driver".
> There may be some around that work. I have one, but it's not updated to
> Delphi.

What exactly do they do? Haven't found much about them in help yet.
Eventually, I'd like to expand my program's ability to cover different types of
data sets (ie- those from other systems)..since my output is always the same,
perhaps I could create a set of drivers that would handle the input data sets
(based on user input - selecting source type of data). Hmm..this of course
would be down the road -- perhaps my next encounter with a similar situation
(potentially possible). Of course, I hope to have implemented a GUI
first (complete with nifty little progress meters).

> However, there's also a way-simpler method -- load the whole file to a
> TstringList (this part is not fast). Then you have all the lines neatly
> indexed. You just copy the account numbers to a second TstringList --

> If you choose this path, you'll really be using a TstringList! The lists can
> be as large as your virtual memory will allow. Now, if the files are really
> too large, you could still index them, but it would be work. Let's hope you
> don't need it.

The largest input file is roughly 2 megs, and the output file winds up being 5
megs (after data is pulled from the 3 old input files).

> Finally, for the record, a few points about your current method.
> Point one (essential!) --
> -----------------

> aBuf: array[1..$8000] of char; // <<<<<

> setTextBuf(datpt, aBuf); // <<<<<<<<<<<<
> ----------------------

From 'Help':"When SetTextBuf is called on an open file once I/O operations have
taken place, data could be lost because of the change of buffer.
Delphi does not ensure that the buffer exists for the entire duration of I/O
operations on the file. >>A common error is to install a local variable as a
buffer, then use the file outside the procedure that declared the buffer.<<"

Will this be affected by the nested routines?

> (You can forget about fileMode, it has nothing to do with text files.) The
> above will very much speed up your textFile access. Do it on ALL your files.

Speed, SPEED, need more speed...! 8)

> Point two -- If you're going to scan the file for accounts, at least leave
> it open, don't open and close on each scan! To return to the start, reset
> without closing. (Not that this will change much.)

Just Reset it? With over 5000 records..any time conservation is good.. Should
I open the files in my main procedure, instead of each individual one? Here I
was trying to conserve memory by keeping declarations local... <g>

> In passing --
> // While not EOF(datpt) do begin {way to speed up? Replaced w/
> 'Repeat'}
> Makes no speed difference, but the repeat will raise an exception if the
> file is empty.

Actually, the main thing here that did make a difference was: "Until
EOF(datpt) or (copy(s,1,12)=account);" Basically, it ends the looping once it
has read the account information (or if it reaches the end of file). So it kept
reading the rest of the file regardless of whether it had its data or not. I
tried "While not (EOF(datpt) or (copy(s,1,12)=account));", but it didn't like
that, I think... multiple boolean operators..not sure why tho.

> Point three --
> if copy(s,1,12)=account then
> Faster: if (0 = strLcomp(pChar(account), pChar(s), 12)) then

I *will* have to try this one.. :o Faster is better.. much better -- Umm..I
added a routine in the main procedure (the first one called) that gets the
current position, divides the length into it and writeln's that information to
the screen (though I MISS my GotoXY and ClrScr routines.. I hate scrolling
text.... *I WANT MY C-R-T!!* [Unit, that is..] <g> I really miss my "While not
KeyPressed Do") Any chance of it degrading the overall speed?

> Point four --
> > Until EOF(datpt) or (copy(s,1,12)=account); // added this to speed up
> Don't check a second time! Just call Break in the block you execute when you
> find the account. Check the Help, Break and Continue are neat stuff to know.

I thought about this one..was afraid I'd be committing a mortal sin <g> in
Pascal programming...somehow I have a feeling this _might_ increase speed
(instead of having to start at the beginning each time), but I'll have to check
the data to see if records in the PT and IN files appear throughout the files or
sequentially.

> From this point onwards, I don't quite know what you're doing.
> PhR

The rest are just assignments (those that can be made at this point) to the
output array. For example:


Lytec[13] :='"'+Copy(aFields[medSSN],1,3)+'-'+Copy(aFields[medSSN],4,2)+'-'+
Copy(aFields[MedSSN],6,4)+'"'; {Formated ###-##-####}

This part takes the unformatted data from the old data set, copies the first 3,
adds the '-', middle 2, adds '-' and last 4 digits of a social security number
-- assigning them to the 13th field of the array. Basically, last minute
formatting that is not universally done (the SSN is read only once per each
account). Incidentally, this is one of the places I screwed up accidentally by
forgetting to copy 4 digits -- I only coppied 3..leaving me with 8 digit SSN's
(not good, very important to the doctors, especially if the patients insurance
is the federal gov't ...Medicare, Medicade, etc).


Jeff Buffington

Sep 24, 1999
Well, I reran it (stopped it after 15 minutes)...it did seem to run faster. 680k
after 15 minutes.
My '% meter' ("writeln((FilePos(datpl)/FileSize(datpl)*100):6:2);") seemed to jump
in 6.6% increments.
-Jeff


Philippe Ranger

Sep 24, 1999
<<Jeff:

Here's the stuff that's way too slow at reading.
>>

Adding to previous reply. When I said indexList could be saved to file, it
can, but not simply by calling SaveToFile. That won't save the line numbers
from the account list. To do that, the fastest way is to go through
indexList and add the line numbers, cast to four chars, at the end of the
account number strings. After reading back, remove the appendages and put
them back to the "objects" --

On writing --

------------
Type
Ta = array[1..4] of char;
Var
...
j: integer;
s: string;
a: Ta;
Begin
for j := 0 to indexList.count - 1 do begin
s := indexList[j] + Ta(indexList.objects[j]);
indexList[j] := s;
end;
indexList.saveToFile('indexFile.ind');
------------

On reading, same declarations --

----------------
indexList.loadFromFile('indexFile.ind');
for j := 0 to indexList.count - 1 do begin
s := indexList[j];
move(s[length(s) - 3], a, 4);
s := copy(s, 1, length(s) - 4);
indexList[j] := s;
indexList.objects[j] := Tobject(a);
end;
----------------------

This is fancy typecasting, it's not something you're supposed to know after
three months of Delphi. But it works and it's fast.

Another note -- when you build indexList, first set --

indexList.capacity := accountList.capacity;

This will make the add for each account line go faster.

PhR

Jeff Buffington

Sep 24, 1999
Jeff Buffington wrote:

> Philippe Ranger wrote:
> > Point four --
> > > Until EOF(datpt) or (copy(s,1,12)=account); // added this to speed up
> > Don't check a second time! Just call Break in the block you execute when you
> > find the account. Check the Help, Break and Continue are neat stuff to know.
>

> I thought about this one..was afraid I'd be committing a mortal sin <g> in
> Pascal programming...somehow I have a feeling this _might_ increase speed
> (instead of having to start at the beginning each time), but I'll have to check
> the data to see if records in the PT and IN files appear throughout the files or
> sequentially.

Re: mortal sin -- sort of the same line of thinking when Borland added the "Goto"
routine (I believe sometime between TP 4.0 and 5.0). Oddly enough it survived the
scrutiny and made its way into Delphi (ran into it by accident). "Procedure good,
Function good, Label bad, Goto bad - very bad." Of course, I still have my
TurboBASIC book around here.. :) The binding is cracked so it opens to the ASCII
table..I love those tables.


Philippe Ranger

Sep 24, 1999
<<Jeff:

What exactly do they do?
>>

Anything you please, including launch moon rockets on Readln. (Truth.) In
the case in point, they can allow you to read a TextFile and have FilePos
and Seek too -- as in eating your cake and having it too. For your current
needs, though, this is likely not to be the simplest solution. If
TstringList will do, use that! This applies to the generalized case too --

<<
Eventually, I'd like to expand my program's ability to cover different types of
data sets (ie- those from other systems)..since my output is always the same,
perhaps I could create a set of drivers that would handle the input data sets
(based on user input - selecting source type of data).
>>

<<
The largest input file is roughly 2 megs, and the output file winds up being
5 megs (after data is pulled from the 3 old input files).
>>

Ah! Small stuff! TstringList, and forget the rest!

<<
From 'Help':"When SetTextBuf is called on an open file once I/O operations
have
taken place, data could be lost because of the change of buffer.
Delphi does not ensure that the buffer exists for the entire duration of I/O
operations on the file. >>A common error is to install a local variable as a
buffer, then use the file outside the procedure that declared the buffer.<<"

Will this be affected by the nested routines?
>>

No. As long as the buffer var is declared in the same block as the textFile
var, everything's sweet.
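
Something on this order (a rough sketch; the 16k is an arbitrary size and the
file name is only a placeholder) --

--------
Var
  t: textFile;
  buf: array[1..16384] of char;
    //declared in the same block as t, so it outlives all I/O on t
Begin
  assignFile(t, 'yourdata.fil');  //placeholder name
  setTextBuf(t, buf);
    //install the buffer before Reset, i.e. before any I/O takes place
  reset(t);
  try
    //readln(t, ...) as usual, now through a 16k buffer instead of the
    //default 128 bytes
  finally
    closeFile(t);
  end;
End;
--------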

<<
Just Reset it? With over 5000 records..any time conservation is good..
>>

Yeah, but this is trimming corners on a square wheel with a nail file. You
need indexing. I was just pointing this out, one, in case of desperation
(but 5M files are nothing to despair of) and, two, just to tell you about
the trick.

<<
I *will* have to try this one.. :o Faster is better.. much better
>>

Yeah, but this is still reading through the file for each record. StrLcomp
is better than the nail file, but it's still filing the corners on a large,
iron square wheel.

<<
I added a routine in the main procedure (the first one called) that gets the
current position, divides the length into it and writeln's that information
to the screen (though I MISS my GotoXY and ClrScr routines.. I hate scrolling
text.... *I WANT MY C-R-T!!* [Unit, that is..] <g> I really miss my "While
not KeyPressed Do") Any chance of it degrading the overall speed?
>>

Even without a Crt unit, you don't have to scroll. Just ^M to return to
margin. Try this --

--------
VAR
  nPercent: integer;
BEGIN
  for nPercent := 0 to 100 do begin
    write(^M, nPercent, '% completed... ');
      //^M returns the cursor to the left margin without starting a new line
    sleep(250);  //Sleep lives in the Windows unit
  end;
END;
----------

There's a free crt_efd unit at http://efd.home.mindspring.com/tools.htm .
With source. But you don't need it now.

<<
I thought about this one..was afraid I'd be committing a mortal sin <g> in
Pascal programming...somehow I have a feeling this _might_ increase speed
(instead of having to start at the beginning each time),
>>

The Break will do no more and no less than what you're doing now: quit
reading when the account is found. Only it will do it without repeating the
whole account-number check on every record! Another nail file taken to the
big, square wheel.
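
In code, that's all it is (a rough sketch, reusing the datpt, s and account
names from your own loop) --

--------
while (not eof(datpt)) do begin
  readln(datpt, s);
  if (copy(s, 1, 12) = account) then begin
    //process the matching record here, then stop reading
    BREAK;
  end;
end;
//the Until clause, with its second copy() test, disappears entirely
--------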

PhR

Philippe Ranger

unread,
Sep 24, 1999, 3:00:00 AM9/24/99
to
<<Jeff:

Re: mortal sin -- sort of the same line of thinking when Borland added the
"Goto"
routine (I believe sometime between TP 4.0 and 5.0).
>>

Goto belongs in the original Pascal, c. 1970, and has always been with us
since TP1. Break, Continue and Exit are all helps to clarity, imho. I use
them often. In fact, I'm so bad, by default I code my loops --

while true do begin
  ...
  if exit condition found then BREAK;
  ...
end;

If afterwards (or sometimes ahead of time) I find the exit condition fits
well on top or at bottom, then I modify the While or use a Repeat.

PhR

Philippe Ranger

unread,
Sep 24, 1999, 3:00:00 AM9/24/99
to
<<Jeff:
Here's the stuff that's way too slow at reading.
>>

Still another addendum to the TstringList plan. The only reason for
indexList is to maintain the original order of accountList. I'm not clear if
you're modifying that file or not. If you are not, or if you can write it
back in a new order, then all you'd have to do is load it (sorted false),
set Sorted true and then use Find directly on it.
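
That is (a rough sketch; 'accounts.fil' is only a placeholder name, and it
assumes each line really is the bare account number) --

--------
Var
  accountList: TstringList;
  account: string;  //the account number you're looking for
  j: integer;
Begin
  accountList := TstringList.create;
  accountList.loadFromFile('accounts.fil');
  accountList.sorted := true;
    //sorted once; Find then does a binary search instead of a linear scan
  if accountList.find(account, j) then
    writeln('found at line ', j);
End;
--------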

BTW, if you have to use a separate index instead, the better scheme for
setting it up would be --

indexList := TstringList.create;
indexList.capacity := accountList.capacity;
//copy account nums and line nums
indexList.sorted := true;

PhR

Jeff Buffington

unread,
Oct 3, 1999, 3:00:00 AM10/3/99
to
Any ideas on how to separate city, state and zip from a single (address) string
into their respective counterparts?
examples ("Original City ST Zip-Code" --> "Original City", "ST", "Zip-Code"):
"New York NY 10036-2034" --> "New York", "NY", "10036-2034"
"Tampa FL 33631" --> "Tampa", "FL", "33631"
"Kissimmee Florida 34741" --> "Kissimmee", "FL", "34741"
"Attn Claims Dept" --> "Attn Claims Dept", "", ""

Note the last two examples...there are some cases in which the state may be
spelled out, and other cases where all the data isn't what it ought to be (blame
the old program on this one).

My thoughts were to do it in reverse..start from the end, if it's numeric read
until it gets a space (if not don't process), make sure it's either 5 or 10
characters long (5 numbers and possibly a '-' with four additional numbers
[Zip+4]). Presume the next string is the state (could pose a problem for
spelled out state names [eg North Carolina]..hopefully most will be 2 letters),
then the rest can be the city name.

I don't think I can start from the front due to the possibility of too many
possible 2 (or more) part names for cities [eg Saint Cloud]. I suppose I could
use a case to check the substring read against a list and if it matched, look
for the next part...eg "Dakota", "Carolina" look for next string in front of
it..could be "North" or "South" etc. Heaven forbid I run into "District of
Columbia"..<g>
--Jeff


Greg Lorriman

unread,
Oct 3, 1999, 3:00:00 AM10/3/99
to

Jeff Buffington <a...@netzero.net> wrote in message
news:37F7301E...@netzero.net...

> Any ideas on how to separate city, state and zip from a single (address)
> string into their respective counterparts?
>
> examples ("Original City ST Zip-Code" --> "Original City", "ST",
> "Zip-Code"):
> "New York NY 10036-2034" --> "New York", "NY", "10036-2034"
> "Tampa FL 33631" --> "Tampa", "FL", "33631"
> "Kissimmee Florida 34741" --> "Kissimmee", "FL", "34741"
> "Attn Claims Dept" --> "Attn Claims Dept", "", ""

I agree with your approach (starting in reverse) and would add some basic
validation (is state all uppercase) and an exception block to catch problem
records to be output to an error log of some sort so that the user can deal
with them.

This would mean that you can make certain assumptions and not have to write
complicated code to catch problematic records. Writing that sort of code is
prone to logic errors, which means that you would quite likely incorrectly
split some records.

Philippe Ranger

unread,
Oct 3, 1999, 3:00:00 AM10/3/99
to
I'm sorry, Jeff, but you can't split strings sideways (an array of char
yielding two arrays of half-chars).

Any other question?

<<
examples ("Original City ST Zip-Code" --> "Origional City", "ST",
"Zip-Code"):
"New York NY 10036-2034" --> "New York", "NY", "10036-2034"
"Tampa FL 33631" --> "Tampa", "FL", "33631"
"Kissimmee Florida 34741" --> "Kissimmee", "FL", "34741"
"Attn Claims Dept" --> "Attn Claims Dept", "", ""

Note the last two examples...there are some cases in which the state may be


spelled out, and other cases where all the data isn't what it ought to be
(blame
the old program on this one).
>>

The short answer is -- DO THE SPECS!

Eventually, you'll have a routine. Look at it as a black box. You put a
string in, you get three strings out, some possibly empty, and forming a
complete split of the original. Right?

Well, then, either your black box is totally random, or there is a way of
saying what it does, simpler than saying "you put this in, you get this out"
for all possible "this". That description is the spec! Examples can make a
spec clearer, but they are not part of a spec, unless it's for arbitrary
behavior.

Your first line, rephrased gives the sentence I asked "right?" about. After
that, you give nothing but examples, and an incomplete list of "may be"s.
Jeff, remember, once you do have the code, it does follow a spec. So you can
write the spec in English now, check it for completeness, THEN worry about
implementation.

"Start from the end" is an implementation detail. The routine does the same
thing (fulfils the same spec) whether it starts from the end or it dials the
State Department for an answer.

Another way of saying this is, this should be the *question* for a math-type
problem. I'll tell you what I have, but I can't tell you if it's what you
want, you'll have to correct this --

-- Routine takes a string copy param and returns three string var (or out)
params which together hold the original string, minus possibly some spaces
or commas.

-- The out params are City, State, Zip. Any may be empty. But
City+State+Zip, in that order, yields the input string, minus separators.

-- There is at least one separator between each element present in the input
string.

-- A comma is always an element separator. A space may separate elements, or
it may be part of one. Combinations of space(s) and a single comma are
treated as one comma separator. Strings of spaces are treated as one space.
There are no other possible separators.

-- Zip is a collection of digits, with possibly a hyphen somewhere, and
nothing else.

-- The other two fields hold no digits at all.

-- If there is a Zip field, then there must be a state field.

-- We can list all state names that have two words. No state name has more
than two words. If a two-word state name from the list appears as the last
digit-less substring, then it is held to be a state. Example, "New York
01234" vs. "New York NY 01234".

-- Inputs not meeting these rules will be put into a separate exception file
for human perusal.

Jeff, if this is the problem setting, then it can be solved. Whether you go
in frontwards or backwards, it's about equally complicated each way. So, is
this right?

PhR

Peter N Roth

unread,
Oct 3, 1999, 3:00:00 AM10/3/99
to

Jeff Buffington <a...@netzero.net> wrote in message
news:37F7301E...@netzero.net...
> Any ideas on how to seperate city, state and zip from a single (address)
string
> into their respective counterparts?

How about listing all the states in an array?

Fullname = 50 lines, all uppercase
Abbreviations = another 50 lines, all uppercase

search for the state on each line.

The rest of the pattern should fall right out of the string.

[ unless someone doesn't know how to spell Vajenya, for example ]

The question of how many records there are could determine
whether you actually decode the strings in your lifetime. But at least you
could get this algorithm going until you figure out how to
make it faster. And you're only going to use this program
ONE TIME, so speed isn't _that_ important...
--
Grace + Peace Peter N Roth
Engineering Objects International
http://inconresearch.com/eoi
"I wonder what you would expect except to catch from finally?"


Philippe Ranger

unread,
Oct 3, 1999, 3:00:00 AM10/3/99
to
<<Greg:

I agree with your approach (starting in reverse)
>>

It makes absolutely no difference. If the first digit in the string occurs
at the start, after a space or after a comma, then it starts the zip code,
and the tail can be validated on that model. If the first digit occurs
anywhere else, the string needs a human to decode it. In other words,
digit=zip, and zip=easy part. Parsing backwards just complicates things a
bit. Getting rid of the easy part helps you not one whit for the hard part.

PhR

Philippe Ranger

unread,
Oct 3, 1999, 3:00:00 AM10/3/99
to
<<Peter:

Fullname = 50 lines, all uppercase
Abbreviations = another 50 lines, all uppercase
>>

I was thinking of that, but you need more than one collection of
abbreviations. "Minn.", "Miss." etc. Since the state *must* come at the end
or just before the zip, it's not all that hard. Especially if you first
check any two-letter word in that position against the standard
abbreviations (Pos of it in the string of them).

PhR

Jeff Buffington

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to

Philippe Ranger wrote:

There is always the possibility it would run into a non-standard format for
the state names. Hopefully, this won't occur (at least not frequently) since
the US Postal Service abandoned the older formats ('Fla' should be 'FL'
nowadays). Where the biggest problem emanates from is the format in which the
old program stored this data vs. the new program's method (which is significantly
more logical). The old one stored this information as a single string of some
length (perhaps as much as 255 chars, but most likely less), whereas the new
system has these fields separated into specified-length strings.

This is the crux of the problem that demands my attention: my 'quick-n-dirty'
method put the info into the only available field of any significant
length..the 'city' field. Unfortunately, if this string is longer than 25 chars,
it gets truncated. Ultimately, the customer was already notified that this
data needed close inspection prior to its use...but you know how well stern
warnings work on end users. <g>

I will start working on the code and post it here for 'peer-review'.. I always
wanted to use that phrase :) Tho no doubt most of my 'peers' (ie - Ya'll [to
use a local term]) are 'su-PEER-ior' to myself.
--Jeff--


Jeff Buffington

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to
Peter N Roth wrote:

> How about listing all the states in an array?

> Fullname = 50 lines, all uppercase
> Abbreviations = another 50 lines, all uppercase
>

> search for the state on each line.

This could present a problem tho with some city names also being state
names...the classic New York New York, or even Kansas City Kansas...

>
>
> The rest of the pattern should fall right out of the string.
>
> [ unless someone doesn't know how to spell Vajenya, for example ]
>
> The question: how many records are there? could determine
> if you actually decode the strings in your lifetime. But at least you
> could get this algorithm going until you figure out how to
> make it faster. And you're only going to use this program
> ONE TIME, so speed isn't _that_ important...

There are roughly 1000 records, but since each one is unique and in only one
file (both of which were not true for the other data being converted), it only
needs to run through the file once.

Oh..I know about speed problems...just ask Phil...the entire program took
nearly 24 hours to convert the entire data set... this being just a small
subset, I've made temporary modifications to skip the other data (gotta love
//)..those procedures involved are being ignored for this fix.. Eventually I
plan on allowing the end user to select, via check boxes in a GUI, which datasets
need converting..but I haven't progressed that far yet..which is why this app
is currently console based. I'm still relatively new to the whole Win95/98/NT
programming thing...and I _still_ want my Crt unit back.. <G>

"Write('Press a key to continue...'); While Not Keypressed Do;"

--Jeff--

Greg Lorriman

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to
Parsing backwards takes care of names with two parts. Neither the code nor
the state abbreviation is likely to have two parts. But the name is.

--
Greg Lorriman
Handy, free utils at http://www.lorriman.demon.co.uk


Philippe Ranger <.> wrote in message news:7t90t5$5j...@forums.borland.com...

Jeffrey A. Wormsley

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to
Save yourself a lot of trouble and get a copy of the ZIP
code database from the postal service, or on many CD Roms.
Just parse out the ZIP code, and use that to look up the
city and state.

Jeff.

Jeff Buffington

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to
Actually, I split the difference... searched for the state, then read the zip
code (which follows), then read the rest of the string as the city (from start
to position of state). This solved a few problems, such as the multi-part names
for cities. The original field can be up to 30 characters, but the converted
data's city field can only be 25 characters. By removing the state and zip
codes, the remaining data will fit. The only problems remaining are human
error..typos (a non-valid state abbreviation 'OF' instead of 'OR' for example),
spelled out names and older abbreviations (such as 'FLORIDA' and 'Mass'). Will
post the code in my next message..
Greg Lorriman wrote:

> Parsing backwards takes care of names with two parts. Neither the code nor
> the state abbreviation is likely to have two parts. But the name is.

> Greg Lorriman

Jeff Buffington

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to
This procedure is actually declared in another procedure, and called
locally. As stated in my previous message, it only takes into account
those states which it knows (only 2-letter state abbreviations).
---
Procedure ProcessAddress(var medadd2, medadd3, city, state, zip: string);
var
  USStates: TstringList;
  st, ptr: integer;
Begin
  USStates := TstringList.Create;
  City := ''; State := ''; Zip := ''; {Initialize variables}
  ptr := 0; {so the check in Finally works even when no state is found}
  Try
    With USStates do begin
      {State names -- will be preceded and followed by a space}
      {Does not account for misspelled, full or old-style state names}
      Add(' AL ');Add(' AK ');Add(' AZ ');Add(' AR ');Add(' CA ');Add(' CO ');Add(' CT ');
      Add(' DE ');Add(' DC ');Add(' FL ');Add(' GA ');Add(' HI ');Add(' ID ');Add(' IL ');
      Add(' IN ');Add(' IA ');Add(' KS ');Add(' KY ');Add(' LA ');Add(' ME ');Add(' MD ');
      Add(' MA ');Add(' MI ');Add(' MN ');Add(' MS ');Add(' MO ');Add(' MT ');Add(' NE ');
      Add(' NV ');Add(' NH ');Add(' NJ ');Add(' NM ');Add(' NY ');Add(' NC ');Add(' ND ');
      Add(' OH ');Add(' OK ');Add(' OR ');Add(' PA ');Add(' RI ');Add(' PR ');Add(' SC ');
      Add(' SD ');Add(' TN ');Add(' TX ');Add(' UT ');Add(' VT ');Add(' VA ');Add(' WA ');
      Add(' WV ');Add(' WI ');Add(' WY ');
    End; {with/do}
    For St := 0 to USStates.Count - 1 do begin
      If AnsiPos(USStates[St], medadd2) > 0 then begin
        ptr   := AnsiPos(USStates[St], medadd2);
        city  := trim(copy(medadd2, 1, ptr));
        state := trim(copy(medadd2, ptr, 4));
        zip   := trim(copy(medadd2, ptr + 4, length(medadd2)));
        medadd2 := '';
        break;
      end {if/then}
      Else if AnsiPos(USStates[St], medadd3) > 0 then begin
        ptr   := AnsiPos(USStates[St], medadd3);
        city  := trim(copy(medadd3, 1, ptr));
        state := trim(copy(medadd3, ptr, 4));
        zip   := trim(copy(medadd3, ptr + 4, length(medadd3)));
        break;
      end;
    end; {for/do}
  Finally
    if ptr = 0 then city := medadd3;
    USStates.Free;
  End; {try/finally}
End;

Philippe Ranger

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to
<<Jeff:

There is always the possibility it would run into a non-standard format
for
the state names. Hopefully, this won't occur (at least not frequently)
since
the US Postal Service abandoned the older formats ('Fla' should be 'FL' now
days).
>>

What I'm thinking of is that you have two parts, set by the first digit. If
that digit comes after an alpha char, then you reject the whole record. If
not, it starts the zip field, you check that part in the obvious way, and
the preceding part goes to a tempCity field, lowcased and trimmed to the
last alpha char.

Now, if the tempCity holds one comma, that gives you the state part, which
you can check for validity, but anyhow you're done. If it has two commas,
it's wrong. Remaining problem is tempCity with no comma.

Now, with the above quote from you, you can have one list of two-letter
abbrev to check any two-letter value preceding the zip field (if present),
and another list of full state names to check the "temp city" field against,
when the first check fails. That's --

for j := 1 to 50 do begin
  if (pos(states[j], tempCity) =
      length(tempCity) - length(states[j]) + 1) then begin
    //accept only when the (first) occurrence sits exactly at the end of tempCity
    state := states[j];
    BREAK;
  end;
end;

If you have the states in reverse alpha order, you'll catch West Virginia
before Virginia, which is what you want. If you find Kansas with the above
code, then it's not Kansas City. If you find New York, then you have to use
the rule I proposed, that state overrides city.
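
The loop assumes a states array holding the full names in reverse alphabetical
order, declared something like this (only five of the fifty entries shown here;
in real code CnStates would be 50 and the loop would run j := 1 to CnStates) --

--------
const
  CnStates = 5;
  states: array[1..CnStates] of string =
    ('Wyoming', 'West Virginia', 'Virginia', 'New York', 'Kansas');
--------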

Jeff Wormsley has an excellent suggestion too (get states and cities from
zip codes). It still remains that you have to have a sanity check against
tempCity, as it's easier to mess up a zip code than a state name.

<<
unfortunately if this string is longer than 25 chars,
it gets truncated.
>>

Why only 25 chars? Is storage space that scarce?

PhR


oz

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to
Jeff Buffington <a...@netzero.net> wrote:

>Peter N Roth wrote:

>> How about listing all the states in an array?
>> Fullname = 50 lines, all uppercase
>> Abbreviations = another 50 lines, all uppercase
>>
>> search for the state on each line.

>This could present a problem tho with some city names also being state
>names...the classic New York New York, or even Kansas City Kansas...

<snip>
Jeff...
See how your routine goes with the following name/address:
MS FLORIDA WASHINGTON
1600 PENNSYLVANIA CT
OKLAHOMA CITY OKLAHOMA 12345

(MS being a frequently used title, as opposed to Mrs or
Miss, also an abbreviation for Mississippi, and CT being
an abbreviation for "Court", also an abbreviation for
Connecticut).

Oz

Philippe Ranger

unread,
Oct 4, 1999, 3:00:00 AM10/4/99
to
<<Oz:

MS FLORIDA WASHINGTON
1600 PENNSYLVANIA CT
OKLAHOMA CITY OKLAHOMA 12345
>>

The routine I last suggested would check the END of the whole string before
12345 against the list of all state names, and correctly find Oklahoma. But
it would blow CT if you had --

MS FLORIDA WASHINGTON
1600 PENNSYLVANIA CT

12345

-- then only validation against zip code would show an error.

PhR

Jeff Buffington

unread,
Oct 5, 1999, 3:00:00 AM10/5/99
to

Philippe Ranger wrote:

> <<Jeff:
> There is always the possibility it would run into a non-standard format
> for
> the state names. Hopefully, this won't occur (at least not frequently)
> since
> the US Postal Service abandoned the older formats ('Fla' should be 'FL' now
> days).
> >>
>
> What I'm thinking of is that you have two parts, set by the first digit. If
> that digit comes after an alpha char, then you reject the whole record. If
> not, it starts the zip field, you check that part in the obvious way, and
> the preceding part goes to a tempCity field, lowcased and trimmed to the
> last alpha char.
>
> Now, if the tempCity holds one comma, that gives you the state part, which
> you can check for validity, but anyhow you're done. If it has two commas,
> it's wrong. Remaining problem is tempCity with no comma.

Hmm..I didn't deal with the comma issue... while the data input file needs to be
quoted and comma delimited, the commas are stripped anyway by the billing
software on output (commas, periods and other punctuation marks are no-no's on
the HCFA forms..that even includes money, not that it's involved in this part of
the conversion).

> Now, with the above quote from you, you can have one list of two-letter
> abbrev to check any two-letter value preceding the zip field (if present),
> and another list of full state names to check the "temp city" field against,
> when the first check fails. That's --

> ...If you have the states in reverse alpha order, you'll catch West Virginia


> before Virginia, which is what you want. If you find Kansas with the above
> code, then it's not Kansas City. If you find New York, then you have to use
> the rule I proposed, that state overrides city.

You're suggesting this as far as spelled out names go? I haven't added that to
my code presently..it essentially ignores the data if it doesn't fit some sort
of standard order... 'City Name ST Zip-Code' ... then just passes it out the
old way (if it gets truncated...c'est la vie).

> Jeff Wormsley has an excellent suggestion too (get states and cities from
> zip codes). It still remains that you have to have a sanity check against
> tempCity, as it's easier to mess up a zip code than a state name.

This is a good idea, but you're absolutely right...zip codes are very easy to
make a mistake on.. then again, did you know Oregon's new abbreviation is 'OF' ?
<g> Suffice it to say, I haven't added AI to look for human errors... nor do I
think I will :-)

> << unfortunately if this string is longer than 25 chars,
> it gets truncated.>>
> Why only 25 chars? Is storage space that scarce?
>
> PhR

It's the way the software that imports the data is set up.. they're using a
Btrieve database engine, and the program was written in MS Visual C. Most
likely (and I'm guessing here), they used records and preset the length of each
field. Thus, when it reads the data, it only copies x number of characters
(like saying S:string[5] := '123456' ... the value of S would be '12345'..the 6
gets nixed).

The other problem that comes into play with long strings is the forms it prints
on...government approved forms printed with red ink.. technically you're supposed
to "stay between the lines." Tho most insurance companies use optical scanners
that generally ignore the red ink anyways (they filter out the red spectrum).
Still, the data needs to be in essentially the right places...and addresses that
are too long might not be visible through the non-standard windowed envelopes
commonly used in these circumstances. If you went to your nearest office supply
store (in the states) and looked for medical billing envelopes...you'd see the
cost difference (perhaps 4x more than regular windowed envelopes). Gotta love
the government... <g> On the bright side, they did create a quasi-standard
(known as HCFA-1500), that most non-government insurance companies now use.
(Imagine having to have a different box of forms for each one of 1000 insurance
carriers)
--Jeff--


Jeff Buffington

unread,
Oct 5, 1999, 3:00:00 AM10/5/99
to

oz wrote:

> Jeff...
> See how your routine goes with the following name/address:

> MS FLORIDA WASHINGTON
> 1600 PENNSYLVANIA CT
> OKLAHOMA CITY OKLAHOMA 12345
>

> (MS being a frequently used title, as opposed to Mrs or
> Miss, also an abbreviation for Mississippi, and CT being
> and abbreviation for "Court", also an abbreviation for
> Conneticut).
>
> Oz

Ahh...but here's the interesting thing about the input... taking your
example:
MedAddr1 = 'MS FLORIDA WASHINGTON'
MedAddr2 = '1600 PENNSYLVANIA CT'
MedAddr3 = 'OKLAHOMA CITY OKLAHOMA 12345'

MedAddr1 is known to be either a name/dept/etc or an address (be it a
street address, or PO Box)..it gets passed as is.
Next it searches for a valid state identifier in MedAddr2; if none is found,
it then looks in MedAddr3.

In your example, it would search the first, but will find none. While 'CT'
does stand for Connecticut, the program is actually looking for ' CT ' (in
this case). Since my rule is the state will always be followed by a zip
code, logic dictates that at least one space would separate them.*

Then it would search MedAddr3 for a valid state identifier... hmm, none
found there either *...ok, pass it as it is and go on to the next record
(line).

As for my little *'s :
1) I did actually run into this problem while testing, and furthermore,
one you might not have thought of (I certainly hadn't). In my first
attempt of establishing my states, I tried using only 2 characters.. this
one I figured wouldn't work, but I let it try anyways. Okay, so I added a
space behind (thus 'FL' became 'FL '). This worked great...or so I
thought, until I noticed that it thought Philadelphia was in Iowa...why?
Well... it looked and found the 'IA ' in Philly...cute, so realizing this
was a dumb thing for it to do, I added another space, this time after the
state abbreviation.. thus Iowa's ' IA' became ' IA '.

Still, there is one situation where it would fail even if it uses the
proper 2 letter abbreviation...
MedAddr3 = 'OKLAHOMA CITY,OK 12345' -- since there is no leading space,
it wouldn't know what to do with it, and it'd pass it on without
extracting the city, state or zip. Possible workaround .. have it look
for a comma, and read two chars afterwards as the state "Copy(S,
AnsiPos(',', S)+1, 2);", and of course after the state should be a zip
"Copy(S, AnsiPos(',', S)+3, Length(S));", before the state "Copy(S, 1,
AnsiPos(',', S)-1);" [Guess I'm writing code and email at the same time..<g>]

2) I'm still formulating a plan to tackle the numerous ways to abbreviate
or not abbreviate states, and futhermore deal with issues such as the 'New
York New York 10023'.

--Jeff--

Philippe Ranger

unread,
Oct 5, 1999, 3:00:00 AM10/5/99
to
<<Jeff:

Hmm..I didn't deal with the comma issue... while the data input file needs
to be
quoted and comma delimited, the commas are stripped anyway by the billing
software on output
>>

The examples you gave showed commas were possible at least between city and
state, but not (obviously) inside either field. I put that in the specs I
proposed in my first message.

<<
You're suggesting this as far as spelled out names go?
>>

Yep, from your quote (no old-type abbreviations), I concluded that the state
(the last part before zip, if present) is either --

-- missing
-- a two-letter code
-- or a full state name

That's only two 50-item lists to check against. The two-letter check is
easy, since you need a last, two-letter, word before the zip. The full-name
check requires going through an array the way I showed. If nothing is found
on either run, then state is missing. (And, if the zip is there, you can use
it to fill the state in.)

PhR

Philippe Ranger

unread,
Oct 5, 1999, 3:00:00 AM10/5/99
to
<<Jeff:

Still, there is one situation where it would fail even if it uses the
proper 2 letter abbreviation...
MedAddr3 = 'OKLAHOMA CITY,OK 12345' -- since there is no leading space,
it wouldn't know what to do with it, and it'd pass it on without
extracting the city, state or zip. Possible workaround .. have it look
for a comma, and read two chars afterwards as the state "Copy(S,
AnsiPos(',', S)+1, 2);", and of course after the state should be a zip
"Copy(S, AnsiPos(',', S)+3, Length(S));", before the state "Copy(S, 1,
AnsiPos(',', S)-1);" [Guess I'm writing code and email at the same time..<g>]
>>

If you'll look at my original specs (which aren't really correct either),
you'll see that I don't presume I'll get spaces if there are commas. Jeff,
we'll never agree on this, but to my mind you're much too much
implementation-driven. You figure out a way to do something, and then that
becomes the thing to be done.

From the start, I never thought of actually calling Copy() on anything until
I had it parsed out.

Actually, I'm coming to Greg's idea of going backwards -- but that's simply
indexing backwards. Something like this (a rough code sketch follows the list) --

-- while digit, move back
-- if start of string or preceding char is non-alpha, then tail part is zip
candidate. Copy and pass to zip-checker.
-- if preceding char is alpha, junk the record.
-- move back through non-alpha chars.
-- move back one char (if possible, as always); if not alpha, state is
missing; whole string is city; stop scanning.
-- move back one char; if not alpha, then do state-abbrev check on two chars
just passed
-- if check passed, then we have state; remove two last chars and
preceding non-alphas.
-- if check not passed, then state is missing.
-- in both cases, remaining string is city; stop scanning
-- if current char is alpha, pass whole string to full-state-name check I
showed previously
-- if check passed, get state field, remove it from string, as well as
preceding non-alphas.
-- in any case, remaining string is city; stop scanning.
-- last, do pos(',', s) on remaining string for sanity check; a city does not
have commas.
-- I still suggest checking the state (if present) against the zip, to
reduce errors.
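
Spelled out, the scan is really a small state machine; in code it might look
something like this (a rough sketch only, not tested against your data --
IsStateAbbrev is just the two-letter list packed into a helper, and the
full-name check and the zip cross-check are left as comments) --

----------------
//uses SysUtils (for Trim and UpperCase)

function IsStateAbbrev(const s: string): boolean;
begin
  //the same two-letter list as before, packed into one string
  result := pos(' ' + upperCase(s) + ' ',
    ' AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD' +
    ' MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI PR SC' +
    ' SD TN TX UT VT VA WA WV WI WY ') > 0;
end;

procedure BackSplit(const src: string; var city, state, zip: string);
var
  i, tail: integer;
begin
  city := ''; state := ''; zip := '';
  i := length(src);
  while (i > 0) and not (src[i] in ['A'..'Z', 'a'..'z', '0'..'9']) do
    dec(i);                       //trim trailing separators
  tail := i;
  while (i > 0) and (src[i] in ['0'..'9', '-']) do
    dec(i);                       //walk back over the zip candidate
  if (i < tail) then begin
    if (i > 0) and (src[i] in ['A'..'Z', 'a'..'z']) then
      exit;                       //digit glued to a letter: junk (all fields left empty)
    zip := copy(src, i + 1, tail - i);  //pass this to the zip-checker
  end;
  while (i > 0) and not (src[i] in ['A'..'Z', 'a'..'z']) do
    dec(i);                       //skip separators before the state
  if (i >= 2) and ((i = 2) or not (src[i - 2] in ['A'..'Z', 'a'..'z']))
    and IsStateAbbrev(copy(src, i - 1, 2)) then begin
    state := upperCase(copy(src, i - 1, 2));
    i := i - 2;
  end;
  //(when the two-letter check fails, the full-state-name loop shown
  // earlier would go in here instead)
  while (i > 0) and not (src[i] in ['A'..'Z', 'a'..'z']) do
    dec(i);                       //drop separators between city and state
  city := trim(copy(src, 1, i));
  //the sanity checks -- pos(',', city) and the zip-vs-state cross-check --
  //go here
end;
----------------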

PhR


Philippe Ranger

unread,
Oct 5, 1999, 3:00:00 AM10/5/99
to
<<PhR:

-- while digit, move back
>>

-- while digit or hyphen, move back.

Also, once past this, moving backwards, any digit should cause the record to
be junked.

(It's all very simple with an FSA, i.e. a finite state automaton...)

PhR

oz

unread,
Oct 6, 1999, 3:00:00 AM10/6/99
to
Jeff...

I don't remember the specifics of the trial anymore, but
there was that guy who wanted to change his name to
a single digit (he lost, as I recall).

Just a reminder :0)

Oz

Jeff Buffington

unread,
Oct 7, 1999, 3:00:00 AM10/7/99
to
oz wrote:

Well...I'm glad he lost..<G> Seriously tho, that part of the code (if you
check the entire thread, you'll see what the whole scope of the program
is) is targeting insurance companies' addresses. Sometimes they can be
quite obscure... but the primary reason for adding the check for
lengthy state names and old abbreviations would be to 'idiot-proof' the
code (perhaps from people who want to change their name to a single
digit.. <g>). Of course we all know about Murphy's Law..
In further revision, I'm sure I'll likely wind up improving the code even
more to identify zip codes even if the proper state abbreviation isn't
there. And, as best as I can figure backwards scanning will probably be
the key. Of course, there is always that possibility of an odd 5 digit
number showing up that isn't a zip code (a very likely possibility when
reviewing the source data files).

--Jeff--


Jeff Buffington

unread,
Oct 7, 1999, 3:00:00 AM10/7/99
to

Philippe Ranger wrote:

What about something like -- 'PO BOX 55121' ? Since large post offices can have
5 digit po boxes..this could pose a problem. Of course, the next field would
likely contain the city/state/zip... 'NEW YORK NY 10026'...but not necessarily.

--Jeff--


Jeff Buffington

unread,
Oct 7, 1999, 3:00:00 AM10/7/99
to
Philippe Ranger wrote:

> <<Jeff:
> Still, there is one situation where it would fail even if it uses the
> proper 2 letter abbreviation...
> MedAddr3 = 'OKLAHOMA CITY,OK 12345' -- since there is no leading space,
> it wouldn't know what to do with it, and it'd pass it on without
> extracting the city, state or zip. Possible workaround .. have it look
> for a comma, and read two chars afterwards as the state "Copy(S,
> AnsiPos(',', S)+1, 2);", and of course after the state should be a zip
> "Copy(S, AnsiPos(',', S)+3, Length(S));", before the state "Copy(S, 1,
> AnsiPos(',', S)-1);" [Guess I'm writing code and email at the same time..<g>]
> >>
>
> If you'll look at my original specs (which aren't really correct either),
> you'll see that I don't presume I'll get spaces if there are commas. Jeff,

There shouldn't be commas, but it could happen. And of course, if I came
across something like 'KISSIMMEE,FL 34741' then it wouldn't find the state. My
only thinking behind the ' FL ' type of search is to do a preliminary scan. I'll
have to do something to catch those that would slip through this 'crack'.

> we'll never agree on this, but to my mind you're much too much
> implementation-driven. You figure out a way to do something, and then that
> becomes the thing to be done.

I'm open to constructive criticism... please let me know what I can do to
improve. Have I left you with the impression that I would think the way I've
solved the problem would be the only way? (I hope not) Or that my line of
thinking may wind up creating more work than necessary..? I can see the latter
as a possibility..were I to continue to expand my model, I'd have to look for
each of 50+ fully spelled state/territory/etc names, 50+ old abbreviations.. and
would miss any typos, such as that 'OF' instead of 'OR' or 'FLO' (should be
'FLA')...a lot of additional typing. However, I'll still have to convert them
to regular abbreviations if possible..remember this data needs to have the state
field be only two characters (or they get truncated).

> From the start, I never thought of actually calling Copy() on anything until
> I had it parsed out.

Not sure how else to parse it out..? Help?

> Actually, I'm coming to Greg's idea of going backwards -- but that's simply
> indexing backwards. Something like this --

I thought that was my idea? :)

> -- while digit, move back

> -- if start of string or preceding char is non-alpha, then tail part is zip
> candidate. Copy and pass to zip-checker.

Again...from looking at the data (via Excel and the old program)..there are some
5 digit numbers oddly (and somewhat randomly) placed in the third (MedAddr3)
field... so something would have to be done to make sure it is indeed a zip
code and not just this number. (I think they're ID numbers assigned to the
doctor but it's not really where they belong).

> -- if preceding char is alpha, junk the record.

When you say 'junk', you do mean to just copy as is..? Since it was part of the
address (right or not) in the original data.

> -- move back trough non-alpha chars.
> -- move back one char (if possible, as always); if not alpha, state is
> missing; whole string is city; stop scanning.
> -- move back one char; if not alpha, then do state-abbrev check on two chars
> just passed
> -- if check passed, then we have state; remove two last chars and
> preceding non-alphas.
> -- if check not passed, then state is missing.
> -- in both cases, remaining string is city; stop scannning
> --if current char is alpha, pass whole string to full-state-name check I
> showed previously
> -- if check passed, get state field, remove it from string, as well as
> preceding non-alphas.
> -- in any case, remaining string is city; stop scanning.

> --last, do pos(',', s) on remaining string for sanity check; a city does not
> have commas.

I like your reference to it... 'sanity check'... :) A good idea
nonetheless..

> -- I still suggest checking the state (if present) against the zip, to
> reduce errors.

This would require a lot more space wouldn't it? Plus I'd have to include it
with any distributions... was hoping to keep it on a single floppy..

Tnx,
--Jeff--

Philippe Ranger

unread,
Oct 7, 1999, 3:00:00 AM10/7/99
to
<<Jeff:

What about something like -- 'PO BOX 55121' ?
>>

Jeff, for the third time -- why don't you write the full specs out in
English? NO EXAMPLES, no pseudo-code, just specs. You specifically said you
had three fields, city, state, zip, always in that order. Where does address
come in? And as the last item present? Write the whole mess in English, then
we'll see. As it stands, that yet-another-example is like a fifth ace. Right
now, you're the only guy who knows what you want to know, and it looks like
even under torture you won't tell.

PhR

Philippe Ranger

unread,
Oct 7, 1999, 3:00:00 AM10/7/99
to
<<Jeff:

. Have I left you with the impression that I would think the way I've
solved the problem would be the only way? (I hope not)
>>

You've left me with the absolute conviction that you will not spell out, in
English, in full, what the problem is. I really, really, don't believe in
solutions to unstated problems. And in this case, I think spelling out the
problem in full, no solution, no hint of solution and no examples, will
easily translate into either, "here's the solution" (FSA) or "this is not
soluble unless you agree we dump such and such cases".

<<
I thought that was my idea? :)
>>

Sorry, missed that.

<<
Again...from looking at the data (via Excel and the old program)..there are
some
5 digit numbers oddly (and somewhat randomly) placed in the third (MedAddr3)
field...
>>

How about at least showing the old record definition?

<<
When you say 'junk', you do mean to just copy as is..?
>>

I mean, copy to a separate file for human decoding.

<<<PhR:


-- I still suggest checking the state (if present) against the zip, to
reduce errors.
>>>

<<Jeff:


This would require a lot more space wouldn't it? Plus I'd have to include
it
with any distributions... was hoping to keep it on a single floppy..
>>

Correct me if I'm wrong, I don't live in the States. There are fewer than
100000 zip codes. If you have a table of 2-char state values, with XX for
invalid-zip, that takes up only 200k. And that's the worst approach
possible. Actually, the zip header gives the state, so perhaps as few as 99
values (the first 2 digits of the zip) would suffice -- 200 bytes.
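
Something on this order (a rough sketch; the table is deliberately tiny, using
only prefixes that have already shown up in this thread, and keys on the first
three digits rather than two) --

--------
function StateFromZipPrefix(const zip: string): string;
const
  CnPrefixes = 4;  //a real table would have an entry per prefix
  CPrefix: array[1..CnPrefixes] of string = ('100', '322', '336', '347');
  CState:  array[1..CnPrefixes] of string = ('NY',  'FL',  'FL',  'FL');
var
  j: integer;
begin
  result := 'XX';  //XX = prefix not in the table
  for j := 1 to CnPrefixes do
    if copy(zip, 1, 3) = CPrefix[j] then begin
      result := CState[j];
      break;
    end;
end;
--------

Then the cross-check is just: if a state was parsed out and StateFromZipPrefix
comes back with something different (and not 'XX'), send the record to the
human-perusal file.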

PhR

Rick Rogers (TeamB)

unread,
Oct 7, 1999, 3:00:00 AM10/7/99
to
> I really, really, don't believe in solutions to unstated problems

To quote a friend and colleague (Steve Schafer): "Human users become
frustrated because they cannot figure out how to get the computer to do what
they want it to do, and so the computer persists in providing precisely the
right answer to precisely the wrong question."

- Rick

Jeff Buffington

unread,
Oct 8, 1999, 3:00:00 AM10/8/99
to
Philippe Ranger wrote:

Please don't get upset with me..I'm trying..and I do want to learn. And
remember, I posted my code...hoping you (and others) would see what I was
doing..as you told me I should. As a side note, I bought a book (paid $15 for
it)..unfortunately, it's only as current as Delphi 3, but it should help ('Delphi
3 Superbible' published by Waite Group Press). I'm used to relying on a
reference manual...that's the only book I used when I did TP5.5 programming..and
not having something in print was driving me nuts (and I don't have enough toner
to print out the entire PDF file).

I do have the actual specs for my output, and as best as I have decoded
them..the specs from the input. Remember, I'm reading a file and rewriting that
information in the "proper" format...my program is a conversion utility. It
converts from an old, outdated (DOS-based and not Y2K-compliant) billing program
to a new Windows-based program. This portion of that program deals with just
one file, and writes just one file.. the only problem is the way the old program
stored information.. it didn't force the user to do it the 'right way.'

Ok... I have a file consisting of 20 fields per line, this file is named
FILCR.FIL. The output file is LM05.TXT, and is 51 fields per line.

Unfortunately, the only way I can think of to give you the specs is to show you
how I deciphered the fields. Straight from my Excel worksheet that I used to
decode which fields were what, call them my best guess at that files specs
(First line my Excel labels):

> Code| Name |Type|Mods | Pct Coverage |Deductable|Form| Address 1 | Address 2 | Address 3 | Phone |Contact |Balance |Current| 30-day|60-day |90-day |120-day |Life Benefits|Yearly Ben
> MED |MEDICARE B |2 |XXXXX|€€€€€€€€€€€€€€€€€| 75 |15E |PARTICIPATING PHYSCIAN CLAIMS | PO BOX 44117 |JACKSONVILLE FL 32231 |9046344988| |49238.131|177 |1278.08|1154.58|1093.39|45535.081|0 |0
>
In the FILCR.FIL file that data appears as :

> MED,MEDICARE B,2,XXXXX,€€€€€€€€€€€€€€€€€,75,15E,PARTICIPATING PHYSCIAN CLAIMS,PO BOX 44117,JACKSONVILLE FL 32231,9046344988,,49238.131,177,1278.08,1154.58,1093.39,45535.081,0,0,
>
It is important to note that the information in the three address fields *DOES
NOT* appear in the same order. IOW, it's entirely possible that a PO Box or
street address might appear in the first field, and the city/st/zip in the
second...the third could be blank, or contain that obscure number I was telling
you about. It's not a zip code, more likely a provider ID number (an account
number for Doctors)..but it was put there BY the end-user entering that as part
of the address, not (directly) by the program that created the file. The
original program simply gives you 3 fields, each 30 characters long, to put an
address in..what the user types in it could be anything. This is key to the
problem....trying to decipher what a user put in these three fields and split
them into 5 fields (Address1, Address2, City, State and Zip). They could enter
literally anything they wanted...thus the case for my examples..they could spell
out the state, abbreviate it in the older style, forget it, misspell it, who
knows.

I've made some assumptions about the three address fields:
   (1) Address1 likely contains a street address/PO Box, or department
name/number and should always be copied over as is;
   (2) Address2 and/or Address3 _will_ contain/complete a legitimate address,
one of which will have a city, state and zipcode;
   (3) A zipcode will follow a state, and whatever precedes the state will be a
city, whether or not it contains other information (since the destination 'City'
field has 25 characters to work with, once the state and zip are removed, it
should fit);
   (4) If Address2 doesn't contain a state, then it should get copied over as
is, and Address3 should contain that information..and vice versa;
   (5) If it can't find a state..'junk it' (to use your phrase):
Address1=Address1, Address2=Address2 and Address3=City (if it's longer than 25
characters it gets trunc'd).. in essence, this information will have to be
edited by the user in the new billing program, but the data still gets copied.

Each line like the above is read, then resequenced (rearranged), and written to
a comma delimited, quoted file (as such):

> "MED","","MEDICARE B","PARTICIPATING PHYSCIAN CLAIMS","PO BOX 44117","JACKSONVILLE FL 32231","","","9046344988","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
>

When the new billing program prints out data on forms, it concatenates city,
state and zip for its output (separating each with spaces)...so even if the
information was 'junked' it should still print out. The reason the customer
called me back was that many of the forms were missing part of the address
because of the 25 character limit on the 'city' field (most of the problems were
cases of missing digits off of the zipcode). Prior to revision (and the start
of this mess) my code simply copied Address3 over to City... characters got
truncated in those cases.

Note that field lengths are maximums; they can be shorter (or null), as long as
the quotes/commas exist. Here are the file output specs (51 fields, total; this
information straight out their manual):

> Field# Description Type Field Length >> Comes from input field
> 1 Insurance Code Text 9 >> Code
> 2 Insurance Type Text 2
> Valid values are 0=Commercial, 1=Medicaid Phys, 2=Medicare with Crossover, 3=Medicare, 4=Blue Shield, 5=Champus, 6=Workman's Comp., 7=Blue Cross-Phys.
> 3 Insurance Name Text 40 >> Name
> 4 Address Line 1 Text 40 >> Address1
> 5 Address Line 2 Text 40 >> Address2 or Address3 (no city/st/zip)
> 6 City Text 25 >> Address2 or Address3
> 7 State Text 2 >> (Same)
> 8 Zipcode Text 10 >> (Same)
> 9 Phone Text 14 >> Phone
> Telephone numbers must not have any brackets or dashes and must be contained in quotes ("8015620111").
> 10 Extension Text 5
> 11 Fax Text 14
>
The rest of the fields cannot be obtained from the original data, but just so
you know I'm being complete, here they are:

> 12 Assignment Text 1
> Valid Values are: Y=Yes, N=No
> 13 Practice ID Text 20
> 14 Provider ID Numeric 2
> 15 Medigap ID Text 20
> 16 Processing Type Text 2
> 17 Commercial ID# Part 1 Text 6
> 18 Commercial ID# Part 2 Text 5
> 19 Destination ID Text 4
> 20 Carrier ID (Part 1) Text 20
> 21 Carrier ID (Part 2) Text 20
> 22 Special Reference Name Text 20
> ..
> 26 Special Reference Name Text 20
> 27 Special Reference Number Text 4
> ..
> 31 Special Reference Number Text 4
> 32 Custom Field Text 100
> ..
> 51 Custom Field Text 100
>

_Please _ tell me I've clarified things... :)
--Jeff--


Jeffrey A. Wormsley

unread,
Oct 8, 1999, 3:00:00 AM10/8/99
to
Philippe Ranger wrote:
>
> I mean, copy to a separate file for human decoding.
>
For this type of application, this is a great idea.
Hopefully, only a small percentage won't fit the rule (zip,
two digit state with either ',ST ' or ' ST ' syntax, rest
City). Those that do not fit go into a "manual process
table" where you present those fields as they currently
exist, offer the user a chance to edit them, and show the
proposed output. When it is to the user's satisfaction, they
commit the record and go on to the next one.

Jeff.

Philippe Ranger

unread,
Oct 8, 1999, 3:00:00 AM10/8/99
to
<<Jeff:

Please don't get upset with me..I'm trying..and I do want to learn. And
remember, I posted my code...hoping you (and others) would see what I was
doing..as you told me I should.
>>

Some people here are used to these kinds of db. I'm not. I don't know where
this one comes from, nor what it's for. Sometimes, it sounds like a scribble
pad for the entry clerk.

I'm not getting upset -- at all. But, for my part, I still do not know what
you actually want to do. It sounds as if you have a larger record than you
said, and you're only worried about the tail end of it -- for which you
posted code, but never spelled out *all* the problems to check for.

So, I'm back to my broken record -- let's see the problem, the whole of it,
just as a problem, no examples, no solutions, and then we'll see about the
solutions. "Is this a good solution to..." where the dots aren't fully
filled in, is a hard question to answer.

<<
I'm used to relying on a
reference manual...that's the only book I used when I did TP5.5
programming..and
not having something in print was driving me nuts (and I don't have enough
toner
to print out the entire PDF file).
>>

This is indeed a side note, as you said. Right now, you don't have an OP
problem, you have a routine-design problem. Point A is the records as they
are (and only you know much about that), point B is the records you're
supposed to output (and you've said a bit about this, but not the complete
picture), and the problem is moving from A to B. Even if it were "coded" as
instructions to a clerk, it would still be the same design problem.

<<
The
original program simply gives you 3 fields, each 30 characters long, to put
an
address in..what the user types in it could be anything.
>>

Ok. And I presume those 3 input fields match 3 specific fields in the
original file, right?

The 17 other fields are not a worry, only these three user-scribble fields?

And nowhere else in the original record can you get city, state and zip,
right?

And, from these fields, is there anything else you need to keep? Only city,
state, zip, or what might be a street address too? Question answered --

<<
This is key to the
problem....trying to decipher what a user put in these three fields and
split
them into 5 fields (Address1, Address2, City, State and Zip).
>>

<<


I've make some assumptions about the three address fields:
(1) Address1 likely contains a street address/PO Box, or department
name/number and should always be coppied over as is;
(2) Address2 and/or Address3 _will_ contain/complete a legitimate
address,
one of which will have a city, state and zipcode;

[etc]
>>

This is where you start jumping ahead of yourself, I think.

Presumably, your output Addr1 and 2 are simply split in order to print as
short lines. Now, are you reasonably sure that the original three fields,
when printed one to a line on an address label, were enough for the postman
to deliver a letter? Yes or no?

If yes, we know that the last info in those 3 fields was either zip code or
city+state. Right?

<<
The reason the customer
called me back was that many of the forms were missing part of the address
because of the 25 character limit on the 'city' field (most of the problems
were
cases of missing digits off of the zipcode).
>>

IF the answers to the last two questions are Yes and Right, then the first
rule should be, no matter how many chars it takes, your three fields should
keep in full the zip (easy to tell) and the city and state (not so easy to
tell). Also, as zip is mostly there, it should be used to cross-check the
state, or to find it if absent. In any case, if you have a "clear" state and
a zip, and they conflict, you junk the record. In all other cases, where you
have either a clear state or a zip, you put the state in two-letter form on
the output.

So, you have five fields for output, and you know fairly surely if you have
a zip and what it is -- last field done. You also know the state somehow,
else you have to junk the record, and that takes care of the next-to-last
field. Now, is there a reason for the "City" field to really, absolutely,
only hold city names?

If not, you can just put in there the last part of whatever is left from the
input field, once you've cut off state and zip. It'll be a city name (plus
possibly needless info) about 98% of the time.

Whatever you don't put in those last three fields goes into the two first
ones, and you just try to split it the way it was in the original, but first
ensure that you're not cutting stuff off.

The process I'm looking at seems different from the one you used. I'm not
assigning source fields to output fields. Rather, I go --

-- find zip, or whether zip absent

-- find state, accepting anything that matches a 2-letter or full-size
name, and is in a reasonable position. Accept close matches for full state
names (there are functions for that).

-- if both zip and state found, check one against the other.

-- if so far so good (i.e. record not junked), remove from the original
source fields the data thus parsed

-- Make reasonable cut from end of remaining source to fill in your city
field.

-- dump the rest of the source in your first address field. If the
"rest" still covers two fields in the source, use two fields in the output.
If the "rest" doesn't fit in one field, also use two. If it still doesn't
fit, prepend the overflow to the city field.

<<
_Please _ tell me I've clarified things... :)
>>

Definitely! Can you answer my questions?

PhR

Jeff Buffington

unread,
Oct 10, 1999, 3:00:00 AM10/10/99
to
Philippe Ranger wrote:

> Some people here are used to these kinds of db. I'm not. I don't know where
> this one comes from, nor what it's for. Sometimes, it sounds like a scribble
> pad for the entry clerk.

Frankly, I'd be surprised if anybody was used to this (the original) kind of
db.. :)

> It sounds as if you have a larger record than you
> said, and you're only worried about the tail end of it -- for which you
> posted code, but never spelled out *all* the problems to check for.

That's true...I took the approach of not going into detail about anything that
wasn't a problem (a specific field to process or a procedure/function). I had seen
previous comments from those in countries that have to pay-per-minute for
access, complaining about excessively long messages..although I only download
the headers..but then, I also stay online half the time.

> So, I'm back to my broken record -- let's see the problem, the whole of it,
> just as a problem, no examples, no solutions, and then we'll see about the
> solutions. "Is this a good solution to..." where the dots aren't fully
> filled in, is a hard question to answer.

Ok.. I'll try my best to give all the details (at least what's possible) from
now on.

> Ok. And I presume those 3 input fields match 3 specific fields in the
> original file, right?

Correct... This file is a list of insurance companies... it's a separate entity
from the other files that were processed, and all the information is in this one
file. The link between all of these files is performed by the program that will
import my output files. Most patients will have insurance..but not all (some
may pay cash), and the patient's info file would have a code corresponding to (at
least) one of the records listed in this file. It tells the billing software
which company to bill for services, and what address to put on the billing forms
(and labels if they are used).

> The 17 other fields are not a worry, only these three user-scribble fields?

Yes...at least for now..the data pulled from the rest I've been able to deal
with..even the one with the Ansi characters..which btw correspond to a value
between 1 and 100 percent (just in case you were curious).

> And nowhere else in the original record can you get city, state and zip,
> right?

Unfortunately...those three fields are the only place where that info can be
found.

> And, from these fields, is there anything else you need to keep? Only city,
> state, zip, or what might be a street address too? Question answered --

If all that the Addr2 field consists of is a City, State and Zip, then in the
new file it would make Addr2 null (no need to duplicate the information).
However, Addr2 may contain a street or box address, and thus Addr3 should
contain the rest (since the street address should precede the city/st/zip, but
in a separate line/field).

> This is where you start jumping ahead of yourself, I think.

> Presumably, your output Addr1 and 2 are simply split in order to print as
> short lines. Now, are you reasonably sure that the original three fields,
> when printed one to a line on an address label, were enough for the postman
> to deliver a letter? Yes or no?

Yes..as long as the Carrier (Insurance Company) Name field is there..and that is
simply making CarrierName=CarrierName.. no special processing (aside from the
quotes/commas) needs to be done. The information is also used for viewing said
info in the billing program for verification...presumably the end-user would
call the phone number and verify the patient's insurance information is correct
(get benefits, etc), and (hopefully) that the address is too.

> If yes, we know that the last info in those 3 fields was either zip code or
> city+state. Right?

Well...this should be 'Right'...but there's no guarantee..remember those oddly
placed numbers I spoke of? They seem to usually appear in the last field..but
again, that is just for the dataset that I'm converting now...It's where the end
user wound up putting that information, and it doesn't really belong there.
Also, there are cases when that third field was used for internal routing
information ('Attention: blah blah'). So that is why I had it look for the
state, then read the zip, and assume the previous as city.

> IF the answers to the last two questions are Yes and Right, then the first
> rule should be, no matter how many chars it takes, your three fields should
> keep in full the zip (easy to tell) and the city and state (not so easy to
> tell). Also, as zip is mostly there, it should be used to cross-check the
> state, or to find it if absent. In any case, if you have a "clear" state and
> a zip, and they conflict, you junk the record. In all other cases, where you
> have either a clear state or a zip, you put the state in two-letter form on
> the output.

Well..I'm reading these fields as Tstrings, so I will have read the entire zip,
it would just be a substring of one of those address fields. Of course it could
be either a 5 or a 5+4 zip.

> So, you have five fields for output, and you know fairly surely if you have
> a zip and what it is -- last field done. You also know the state somehow,
> else you have to junk the record, and that takes care of the next-to-last
> field. Now, is there a reason for the "City" field to really, absolutely,
> only hold city names?

No, not really...which is why in the first generation all I did was copy
from/to. However, you can sort the list, and print claims or statements, by
code, state, zip, etc. So to encourage the 'user friendly' concept, and as an
added bonus strip at least 7 characters (which could have been trunc'd
anyways)...the next revision attempts to fill in whatever blanks it can (St/Zip)
by moving them from the old string into the new proper one.

> If not, you can just put in there the last part of whatever is left from the
> input field, once you've cut off state and zip. It'll be a city name (plus
> possibly needless info) about 98% of the time.

Yep...once those are gone, even if it isn't just a city name, it should fit in
that field. However, if none of that is there, it probably

> Whatever you don't put in those last three fields goes into the two first
> ones, and you just try to split it the way it was in the original, but first
> ensure that you're not cutting stuff off.

If you look at that list of (output) fields, you'll see that they allot 40
characters for Addr1 & 2, but only 25 for City, whereas the fields from the
input file are 30 chars for Addr1-3.

> The process I'm looking at seems different from the one you used. I'm not
> assigning source fields to output fields. Rather, I go --
>
> -- find zip, or whether zip absent

If a zip is absent it cannot be mailed...the Postal Service won't deliver
non-zipped mail. I suppose, however, the record could be a 'dummy' used for
some other strange purpose.

> -- find state, accepting anything that matches a 2-letter or full-size
> name, and is in a reasonable position. Accept close matches for full state
> names (there are functions for that).

Close matches? Not sure what you mean by that...

> -- if both zip and state found, check one against the other.

This would be the little routine for checking zip-codes against their states, I
presume?

> -- if so far so good (i.e. record not junked), remove from the original
> source fields the data thus parsed

> -- Make reasonable cut from end of remaining source to fill in your city

> field.
> -- dump the rest of the source in your first address field. If the
> "rest" still covers two fields in the source, use two fields in the output.
> If the "rest" doesn't fit in one field, also use two. If it still doesn't
> fit, prepend the overflow to the city field.

I think I see what you're doing... since the output has 40+40+25 chars to work with,
find the state and zip first and cut (if they exist), then move the rest into
Addr1 & 2 and City (as appropriate).
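
Something like this, maybe, for the stuffing end of it -- a rough, untested
sketch, with the widths taken from that field list. Finding and cutting the
state and zip is the part we're still sorting out, so here I just assume that's
already been done --

Uses sysUtils;

Const
  CnAddrW = 40; //output width of Addr1 and Addr2
  CnCityW = 25; //output width of City

Procedure SplitRemainder(rest: string; var addr1, addr2, city: string);
(*Pre: rest is whatever is left of the three source fields, in original
  order, once the state and zip have been cut out.
  Post: city holds the tail of rest (cut at a space, at most CnCityW
  chars); the front goes to addr1, the overflow to addr2, and anything
  still left over is prepended to city.*)
Var
  jCut: integer;
Begin
  rest := trim(rest);
  if (Length(rest) <= CnCityW) then begin
    city := rest; //short remainder: treat it all as "city-ish"
    rest := '';
  end
  else begin
    //cut at the first space inside the last CnCityW chars,
    //so a word doesn't get chopped in two
    jCut := Length(rest) - CnCityW + 1;
    while (jCut <= Length(rest)) and (rest[jCut] <> ' ') do
      inc(jCut);
    city := trim(copy(rest, jCut, CnCityW));
    rest := trim(copy(rest, 1, jCut - 1));
  end;
  addr1 := copy(rest, 1, CnAddrW);
  addr2 := copy(rest, CnAddrW + 1, CnAddrW);
  if (Length(rest) > 2 * CnAddrW) then //emergency overflow to City
    city := trim(copy(rest, 2 * CnAddrW + 1, Length(rest))) + ' ' + city;
End;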

> Definitely! Can you answer my questions?
> PhR

Whew! Thank goodness. I hope I've answered them all, lemme know if I missed
anything. Oh, and here's some information about the zip-code database I
found (from www.usps.gov):

> City State File
> The City State File is a comprehensive list of ZIP Codes with corresponding city and county name. This file also contains any other names by which a post office
> may be known. The City State File can assist mailers with city, state, and five-digit validation, ZIP Code assignment, finance number assignment, and county
> code/name identification.
> This file is available on computer tape, cartridge, and CD-ROM. It contains approximately 100,000 records. Updates to this file are issued on a monthly or
> quarterly basis. Technical guides are shipped automatically with a subscription to this product or are available upon request.
> Note: The City State File does not contain detailed street data. It only contains city place names and their associated ZIP Codes. In order to
> perform complete five-digit ZIP coding of address files, the City State File must be used in conjunction with either the Five-Digit ZIP Code File, the
> ZIP+4 File, or the CRIS File.
>
I guess they want me to buy it..and that's not likely to happen.. perhaps if I
alleged FOIA (Freedom Of Information Act).. <G>..all I want to do is download
it. I'll keep looking for other sources though, and I've sent email to them for
more info.

Tnx, Jeff

Philippe Ranger

unread,
Oct 10, 1999, 3:00:00 AM10/10/99
to
<<Jeff:

If all that the Addr2 field consists of is a City, State and Zip, then in
the
new file it would make Addr2 null (no need to duplicate information twice).
However, Addr2 may contain a street or box address, and thus Addr3 should
contain the rest (since the street address should precede the city/st/zip,
but
in a separate line/field).
>>

Where we differ here is that you're field-oriented and I am not. I know you
need either the zip field or both city and state, but much preferably all
three. Where you get them from is another question. Aside from that, you
have two address fields, which are only split for label formatting purposes.
Once we've assigned some of the original info (no matter where from) to
City, State and Zip (and some assignments may be empty), then the rest of
the original info, no matter what original fields it was put in, gets
stuffed into those two address fields, with emergency overflow to the city
field.

However, I should note that there's no need to duplicate info even once. <g>

<<<PhR:


Now, are you reasonably sure that the original three fields, when printed
one to a line on an address label, were enough for the postman to deliver a
letter? Yes or no?
>>>

<<Jeff:
Yes..
>>

<<<PhR:


If yes, we know that the last info in those 3 fields was either zip code or
city+state. Right?
>>>

<<Jeff:


Well...this should be 'Right'...but there's no guarantee..remember those
oddly placed numbers I spoke of? They seem to usually appear in the last
field..but again, that is just for the dataset that I'm converting
now...It's where the end
user wound up putting that information, and it doesn't really belong there.
>>

Well, the whole three fields went on the three last lines of a label, and
the mail was delivered. Just how much can you put on the bottom line of an
address, besides city, state, zip, and have the mail delivered anyhow? Or,
rather, could it be that in these cases the last line held no address info,
and city, state and zip were above it?

I don't have all that much db experience, but using the last line for office
info is not uncommon. Over here in Canada, the Post Office has gone on
campaigns over this kind of mess, especially in order to use machine
readers. Dunno about your neck of the woods.

<<
Also, there are cases when that third field was used for internal routing
information ('Attention: blah blah'). So that is why I had it look for the
state, then read the zip, and assume the previous as city.
>>

Ah yes, the great "Attention:"! *IF* we go by the notion that in all cases
there was a generally deliverable address label, THEN we can assume that if
the last line fails to parse as either city and state, or zip, or both, then
you can forget it (put it in your Addr1 field), and the line to parse
becomes the second line. Right?

<<
Well..I'm reading these fields as Tstrings, so I will have read the entire
zip,
it would just be a substring of one of those address fields. Of course it
could
be either a 5 or a 5+4 zip.
>>

First, you can't read them as Tstrings, it's an abstract class. Second, I don't
want to know this, it's implementation. We're still trying to figure out what we
actually mean to do, i's dotted and t's crossed.

That said, the 5+4 zip is actually a variety of easy case. Perhaps you can
have oddball five-digit numbers in there, but you can't seriously have
12345-1234 unless it IS a zip code.
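
For the record, testing for that shape takes no more than this (a sketch, not
run against real data) --

Function IsZipShape(const s: string): Boolean;
(*True if s is exactly 5 digits, or 5 digits + '-' + 4 digits.
  Says nothing about whether the zip actually exists.*)
Var
  j: integer;
Begin
  Result := false;
  if (Length(s) <> 5) and (Length(s) <> 10) then
    exit;
  for j := 1 to Length(s) do
    if (j = 6) then begin
      if (s[j] <> '-') then exit;
    end
    else if not (s[j] in ['0'..'9']) then
      exit;
  Result := true;
End;

You'd run it on the last space-delimited chunk of the last non-empty source
line, then on the one before, and so on.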

BTW, and totally no help to you, Canada copied the British system. This uses
"a1b 2c3", with space compulsory. It's impossible to have this by accident,
the system yields so many strings even a small apartment block rates its own
postal code, and there's a lot of redundancy and sanity-checks built in.

<<<PhR:


Now, is there a reason for the "City" field to really, absolutely, only hold
city names?
>>>

<<
No, not really...which is why in the first generation all I did was copy
from/to. However, you can sort the list, and print claims or statements, by
code, state, zip, etc.
>>

Nice wish, but that assumes that cities are always written the same way. No
LA, just Los Angeles (anyhow, the real name is Lalaland). No NYC. No Wash.
DC (a good suggestion anyhow). I say, getting a safe zip and state is hard
enough, let's leave some looseness for the City field, as long as it doesn't
confuse the mailman.

<<
If you look at that list of (output) fields, you'll see that they allot 40
characters for Addr1 & 2, but only 25 for City, whereas the fields from the
input file are 30 chars for Addr1-3.
>>

?? So you're "stuffing" 90 chars into 105? Let all your problems be of this
kind! Also, I agree that a city that can't be identified in the first 25
chars of its name doesn't fit on the map either. If, once you have settled
on state and zip, and removed any relevant chars from the original 3-line
source, you take the 25 chars before that, surely the full city name is in
there (the problem is to find where it begins).

<<
If a zip is absent it cannot be mailed...the Postal Service won't deliver
non-zipped mail. I suppose, however, the record could be a 'dummy' used
for
some other strange purpose.
>>

So, if a zip is not so obvious in there that a mailman would find it, then the
record goes to the user-perusal file, definitely.

<<<PhR:


Accept close matches for full state names (there are functions for that).
>>>

<<Jeff:


Close matches? Not sure what you mean by that...
>>

No mailperson will have problems with New Jresey, especially as probably
only the zip counts. However, if you have the zip, you may wish to
sanity-check it against the state. In any case, you *have* to figure out
what passes for state info in the original, else you'll be mailing to --

New Jresey NJ
12345

You have to remove the supposed state info from the original before starting
to fill city and address. So, you need something that tells you New Jresey
is New Jersey misspelled. As usual, HyperString comes to the rescue --

---------
function Similar(const S1,S2:AnsiString):Integer;

Ratcliff/Obershelp pattern matching (DDJ, July 88). Conducts an exhaustive
examination of the two provided strings and returns a percentage ratio (0 -
100) representing their similarity.
------------
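
Used on the state problem, it would go something like this -- a sketch; I'm
guessing at the HyperString unit name, only three states are listed to keep it
short, and the 85 cutoff is just something to tune against your data --

Uses sysUtils; //plus the HyperString unit, whatever its real name is

Const
  CnStates = 3; //three here only for the example; you'd list all 50-odd
  CaStateNames: array[1..CnStates] of string =
    ('NEW JERSEY', 'NEW YORK', 'FLORIDA');
  CaStateCodes: array[1..CnStates] of string[2] =
    ('NJ', 'NY', 'FL');

Function GuessState(const s: string): string;
(*Returns the two-letter code of the state name s most resembles,
  '' if nothing scores at least 85.*)
Var
  j, best, score: integer;
Begin
  Result := '';
  best := 84;
  for j := 1 to CnStates do begin
    score := Similar(upperCase(s), CaStateNames[j]); //HyperString
    if (score > best) then begin
      best := score;
      Result := CaStateCodes[j];
    end;
  end;
End;

GuessState('New Jresey') should come back 'NJ', which is the whole point.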

<<<PhR:


-- if both zip and state found, check one against the other.
>>>

<<
This would be the little routine for checking zip-codes against their
states, I
presume?
>>

Yes, the zip-code-to-state list. If the state you've found (see above)
doesn't match the zip, then you can test the zip to see if it's a valid one.
If valid, then either that or the state is wrong. If the state was a
two-letter code, then it may well be wrong, less likely otherwise. In any
case, you may choose to send the record to the user-check file. Bad info is
worse than no info.
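
The check itself is just a lookup on the first three digits of the zip. A
sketch, with only two ranges filled in as examples -- verify even those, and
the rest of the table would come out of the USPS list --

Uses sysUtils;

Function StateForZip(const zip: string): string;
(*Returns the two-letter code of the state the zip prefix belongs to,
  '' if unknown here.*)
Var
  pfx: integer;
Begin
  Result := '';
  if (Length(zip) < 3) then exit;
  pfx := strToIntDef(copy(zip, 1, 3), -1);
  if (pfx >= 70) and (pfx <= 89) then
    Result := 'NJ' //rough range, to be double-checked
  else if (pfx >= 320) and (pfx <= 349) then
    Result := 'FL'; //rough range, to be double-checked
  //...and so on for the rest of the country
End;

If the state you parsed and StateForZip(zip) both come back non-empty and
disagree, off to the user-check file.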

<<
I think I see what you're doing... since the output has 40+40+25 chars to work
with,
find the state and zip first and cut (if they exist), then move the rest
into
Addr1 & 2 and City (as appropriate).
>>

That's it.

<<
I guess they want me to buy it..
>>

Well, how much is it?

PhR


Jeff Buffington

unread,
Oct 11, 1999, 3:00:00 AM10/11/99
to
Philippe Ranger wrote:

> <<Jeff:
> If all that the Addr2 field consists of is a City, State and Zip, then in
> the
> new file it would make Addr2 null (no need to duplicate information twice).
> However, Addr2 may contain a street or box address, and thus Addr3 should
> contain the rest (since the street address should precede the city/st/zip,
> but
> in a separate line/field).
> >>
>
> Where we differ here is that you're field-oriented and I am not. I know you
> need either the zip field or both city and state, but much preferably all
> three. Where you get them from is another question. Aside from that, you
> have two address fields, which are only split for label formatting purposes.
> Once we've assigned some of the original info (no matter where from) to
> City, State and Zip (and some assignments may be empty), then the rest of
> the original info, no matter what original fields it was put in, gets
> stuffed into those two address fields, with emergency overflow to the city
> field.
>
> However, I should note that there's no need to duplicate info even once. <g>

Unless we really want to mess with the postal service.. <g>

> <<<PhR:
> Now, are you reasonably sure that the original three fields, when printed
> one to a line on an address label, were enough for the postman to deliver a
> letter? Yes or no?
> >>>
>
> <<Jeff:
> Yes..
> >>
>
> <<<PhR:
> If yes, we know that the last info in those 3 fields was either zip code or
> city+state. Right?
> >>>

> Well, the whole three fields went on the three last lines of a label, and
> the mail was delivered. Just how much can you put on the bottom line of an
> address, besides city, state, zip, and have the mail delivered anyhow? Or,
> rather, could it be that in these cases the last line held no address info,
> and city, state and zip were above it?

From looking at the data, I can tell that quite often additional information was
passed onto labels or forms (windowed envelopes are commonly used) that
technically shouldn't be there.. technically, as long as a street address and a
zip+four exist, the mail should be able to reach its intended destination; the
city and state are only fallbacks, for bad zip codes...and even then, my father
has gotten mail that should've gone to Alaska..or elsewhere..simply because of a
bad zip (even if the city and state were correct). After speaking with him, I
would gather that city/st/zip should, as a general rule, be the last line of
information. If something were to exist in the old records after that
information, it should be moved to the Addr2 field, if possible, since it could
throw off the OCRs.

> I don't have all that much db experience, but using the last line for office
> info is not uncommon. Over here in Canada, the Post Office has gone on
> campaigns over this kind of mess, especially in order to use machine
> readers. Dunno about your neck of the woods.

Oh, yes..they've gone on and on about this.. I hear it all from my father (he's
a mailman himself). Their goal is to eliminate the human element from the
address interpretation..via OCR and handwriting recognition. In fact, while
searching for that zip info, I ran across a schedule of remote facility
closings...apparently, all non-recognized printing (handwriting, bad formatting,
etc) is read from a half-dozen locations across the country..even tho the actual
letter may be 1500 miles away; after identification their quasi-AI is updated.

> Ah yes, the great "Attention:"! *IF* we go by the notion that in all cases
> there was a generally deliverable address label, THEN we can assume that if
> the last line fails to parse as either city and state, or zip, or both, then
> you can forget it (put it in your Addr1 field), and the line to parse
> becomes the second line. Right?

Right...as I noted above, if something besides a city/st/zip exists on the last
non-null old address line, it should be relocated to somewhere else in the
address fields. As long as we find the city/st/zip, the new software will
always make them the last line printed, and in that order. If we don't, then st
and zip are empty strings, and whatever was put into city would be the last
visible thing printed.

> <<
> Well..I'm reading these fields as Tstrings, so I will have read the entire
> zip,

I should correct myself...(I was really tired)..they're dynamic strings..

> That said, the 5+4 zip is actually a variety of easy case. Perhaps you can
> have oddball five-digit numbers in there, but you can't seriously have
> 12345-1234 unless it IS a zip code.
>
> BTW, and totally no help to you, Canada copied the British system. This uses
> "a1b 2c3", with space compulsory. It's impossible to have this by accident,
> the system yields so many strings even a small apartment block rates its own
> postal code, and there's a lot of redundancy and sanity-checks built in.

Well, it IS possible to have a Canadian zip code.. since the field is merely a
text string of 10 chars; and Florida being a common destination of
"snow-birds".. I suppose it is possible that for the 6 odd months that they
reside here, if they needed medical attention, a Doctor here would bill someone
for their services..even if that patient's insurance company was in Canada... I
just looked, and indeed there is at least one in the data I'm looking at now :)
Though, how to add the country might pose a problem. Luckily, two letter
abbreviations for Provinces/Territories are used.. but, I suppose a different
form would be used as well. Question is, do you have to put the country name on
an envelope for it to make it up there? Or is the odd zip code enough..?
Again, this could be part of the 'City' field. Here's one (of many) I found:

> PMD,PAY-MED BLUE CROSS CANADA,1,,äääääääääääääääää,0,15E,BLUE CROSS CLAIMS,10 BENTON ROAD,TORONTO ONTARIO CANADA M6M 3G4,8002681479,CLAIMS,0,0,0,0,0,0,0,0,
>

Note that the prov/terr isn't abbreviated properly either. Perhaps just passing
this as is would be the best solution...hmm..but the Addr3 field exceeds 25
chars. Something would need to be done to fix this.. I'll have to check with my
father on Canadian addresses, to get his perspective.. what's yours? Do you
think it would work if I took 'Toronto' and put 'Canada' afterwards? (It would
look like 'TORONTO, CANADA' for city, then 'ON' for state and 'M6M 3G4'
for zip.)

> <<No, not really...which is why in the first generation all I did was copy
> from/to. However, you can sort the list, and print claims or statements, by
> code, state, zip, etc.
> >>

An addendum to this... common practice is to print claims by code... since each
set of claims with identical insurance carriers would be going to the same
place, you can save money on postage by printing all of your 'Aetna' claims
together..and stick 'em in a single envelope (albeit perhaps a manila
envelope).

> Nice wish, but that assumes that cities are always written the same way. No
> LA, just Los Angeles (anyhow, the real name is Lalaland). No NYC. No Wash.
> DC (a good suggestion anyhow). I say, getting a safe zip and state is hard
> enough, let's leave some looseness for the City field, as long as it doesn't
> confuse the mailman.

I know..I lived in LV,NV for a year... aka 'Lost Wages, NV'... and washing DC is
perhaps the BEST idea yet.. <g> I agree, insofar as we get a qualified state
and zip, we're in business. If I can get that data from the USPS, then I could
correctly provide the city and state, based solely on zip, even if it were
incorrect to start with.

> <<
> If you look at that list of (output) fields, you'll see that they allot 40
> characters for Addr1 & 2, but only 25 for City, whereas the fields from the
> input file are 30 chars for Addr1-3.
> >>
>
> ?? So you're "stuffing" 90 chars into 105? Let all your problems be of this
> kind! Also, I agree that a city that can't be identified in the first 25
> chars of its name, doesn't fit on the map either. If, once you have settled
> on state and zip, and removed any relevant chars from the original 3-line
> source, you take the 25 chars before that, surely the full city name is in
> there (the problem is to find where it begins).

Which is what my program is currently doing...assuming it finds the state.

> So, if a zip is not so obvious in there that a mailman would find it, then the
> record goes to the user-perusal file, definitely.

Yah, at the very least, a list of items that need to be checked ought to be
generated...since the user, once equipped with the knowledge that these records
are flawed, could edit them within the new software.

> function Similar(const S1,S2:AnsiString):Integer;
>
> Ratcliff/Obershelp pattern matching (DDJ, July 88). Conducts an exhaustive
> examination of the two provided strings and returns a percentage ratio (0 -
> 100) representing their similarity.

I'll have to do some searching for this. I haven't read DDJ in years (shame on
me).

> Yes, the zip-code-to-state list. If the state you've found (see above)
> doesn't match the zip, then you can test the zip to see if it's a valid one.
> If valid, then either that or the state is wrong. If the state was a
> two-letter code, then it may well be wrong, less likely otherwise. In any
> case, you may choose to send the record to the user-check file. Bad info is
> worse than no info.

I'd like to think that the zip would not be on a line by itself...as this would
increase problems when dealing with those oddball 5 digit codes that aren't
zips. There are many cases where because of the 911 implementation, city names
have changed... cases where the address was not really within city limits, or
between two cities, etc. There are also cases where the zip may have
changed...mine did a few years ago (it was 32741, but it became 34741), tho I'm
not sure it's worth the trouble of trying to fix those cases. It's the user's
fault if they never bothered to correct these... of course I could apply that
logic to all bad records...in which case, I wouldn't fix any problems and they'd
be mad at me.. :)

> <<I guess they want me to buy it..>>
> Well, how much is it?
> PhR

I emailed them..but since today is a holiday (Columbus Day)...I have yet to hear
from them. I noticed a 'subscription'...which entitles you to monthly
updates... sounds expensive. They don't list pricing information,
unfortunately. I still might be able to get the data from the Census Bureau,
tho.. as I believe that branch of the gov't is involved in that as well. The
USPS is a quasi-gov't entity (like Amtrak), whereas the Census Bureau is a
regular division of the dept of Treasury; call them the ugly step-sister to the
IRS. As a side note, the postal service here is actually now turning a profit,
unlike most gov't entities which operate in the red...hopefully, they won't try
to make that profit off of me... Lord knows they keep raising the postage rates
(and haven't given my dad a raise lately) :)

Tnx, Jeff


Philippe Ranger

unread,
Oct 11, 1999, 3:00:00 AM10/11/99
to
Jeff, please don't quote my whole reply, or nearly so, my own quotes
included. Your message was 12 K!

<<
Question is, do you have to put the country name on
an envelope for it to make it up there? Or is the odd zip code enough..?
>>

Oh, if you have Canadian addresses, then it's another ball game! Your state
field is two letters, you can't put 'Can.' in there. And forget about
putting provinces unless you also decode from Canadian postal codes --
Americans have no idea what the two-letter codes for Canadian provinces are,
you'll find just anything in your original data. BTW, there's also a
two-letter code for Canada, CN, but I'm not sure the US Post Office
recognizes it.

The odd zip code is also the British zip code, so it's not enough for the
Post Office to tell the country. And lots of Canadian cities have Brit. city
names too, so no help there. You HAVE to fit "Can." in there, and the
postal code before that. So, your target db not having a country field (that
is really smart), you'll have to fill the city field with post code and
Can., total 13 chars (space prefix included). An accurate post code and an
accurate street number are enough for mail to be delivered in Canada, so you
can cut the city name to fit in there. You can skip the province -- but
that's tougher, first you'd have to figure out what stood for the province
name in the original.

So, in practice, if you're so lucky that the original ends in "Canada" or
"Can.", then you know you can't fill either zip or state, and you just fill
the city field with what you find, from the end, then the Addr2 field, then
the Addr1 field.

<<
I'll have to do some searching for this. I haven't read DDJ in years (shame
on
me).
>>

Just get HyperString for free, http://efd.home.mindspring.com/tools.htm ,
and pay $39 for the source once satisfied. Obviously, the source gives the
algorithm.

<<
There are many cases where because of the 911 implementation, city names
have changed... cases where the address was not really within city limits,
or
between two cities, etc.
>>

I'm not giving any attention to matching city and zip, for that reason among
others, and as you said the USPS trusts the zip, and only uses the city when
there is no zip (bad sender, extra sticker, extra charge).

PhR


Jeff Buffington

unread,
Oct 11, 1999, 3:00:00 AM10/11/99
to
Philippe Ranger wrote:

> Jeff, please don't quote my whole reply, or nearly so, my own quotes

I thought I cut some of it.. sorry :) You don't know how hard it is not to
post in Ansi... I keep fighting the urge (Netscape asks me each time I hit
send)...<g>

> Americans have no idea what the two-letter codes for Canadian provinces are,

I thought we were all Americans...North, that is.. :) Last time I crossed the
border, I felt like an idiot for responding with 'America', to the question of
"Country of Origin"..I quickly modified my answer to be "United States." Well,
I thought I knew most of them.. though I'd have to look some of them up (NW Terr
for example, no clue). And while I'm on that bandwagon with them, they should
have an area for Puerto Rican Urbanization...or better yet, just another frigin'
address field.. :)

> you'll find just anything in your original data. BTW, there's also a

Tell me about it.. I hadn't even thought about that in my original design..and
it does appear to be a potential problem..similar to the other one, same
result...cut off zip codes (be they US or Can.). I'll have to chastise the mfr
of the billing software for not allotting an additional field for
country...perhaps they'll get it in for their next release (or update). It
obviously doesn't matter for US addresses, but I know for a fact that many
Canadians, and others, frequent the US, especially this area during winter
months, for periods upwards of 6 months.

> So, in practice, if you're so lucky that the original ends in "Canada" or

Ahh.. I have an idea here... since the zip field isn't really numerical, cut the
'Canada' and put it in the zip field; that should shorten the input field enough
to fit in the City field. My other thinking was to follow the city name with
'Canada', and then put prov/terr abbreviations in the State field, and zip in
zip, but then your postal service might bounce it back.

> I'm not giving any attention to matching city and zip, for that reason among

Actually, a 5+4 zip is enough information to get it to the correct state, city,
street, and even the correct side of the street...but not necessarily the
correct building (be it a house or office building). Still, it's not a good idea to
drop any of the other info..since the carriers themselves still have to sort
their own flats of mail.
--Jeff--


Philippe Ranger

unread,
Oct 11, 1999, 3:00:00 AM10/11/99
to
<<Jeff:

Ahh.. I have an idea here... since the zip field isn't really numerical, cut
the
'Canada' and put it in the zip field, that should shorten the input field
enough
to fit in the City field.
>>

That's a very good idea (it will wind up on a separate line on the label),
but it may mess up anything that tries to go by zip and assumes digits only --
such as mailings preparation -- unless you yourself can code that too, and
recognize "Canada" as a zip code.

<<
My other thinking was to follow the city name with
'Canada', and then put prov/terr abbreviations in the State field, and zip
in
zip, but then your postal service might bounce it back.
>>

Canada? What about the USPS? They'll love you for this!

PhR

Dr John Stockton

unread,
Oct 12, 1999, 3:00:00 AM10/12/99
to
JRS: In article <7ttbqe$pa...@forums.borland.com> of Mon, 11 Oct 1999
14:32:18 in news:borland.public.delphi.objectpascal, Philippe Ranger
<?.?@?.?> wrote:

>Oh, if you have Canadian addresses, then it's another ball game! Your state
>field is two letters, you can't put 'Can.' in there. And forget about
>putting provinces unless you also decode from Canadian postal codes --

>Americans have no idea what the two-letter codes for Canadian provinces are,

>you'll find just anything in your original data. BTW, there's also a

>two-letter code for Canada, CN, but I'm not sure the US Post Office
>recognizes it.

They'd need to be careful, since CN is also an international code for
China. It reminds me that I at first wondered why there were so many
helpful Californians on the Net, till I realised what an E-mail address
ending in ".ca" really means.

>The odd zip code is also the British zip code, so it's not enough for the
>Post Office to tell the country.

I don't recall ever seeing a Canadian code matching the British
patterns; and I'd expect the USPS sorters to be familiar in general with
postcode patterns. Could, for example, N3 2QQ or EC1M 5UJ possibly be
Canadian? I believe that ours all end in <letter><digit><digit>;
exceptions are certainly rare. TQ5 6KF and DE34 8RW represent the
predominant patterns.

--
© John Stockton, Surrey, UK. j...@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
Web <URL: http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.

In MS OE, choose Tools, Options, Send; select Plain Text for News and E-mail.

Philippe Ranger

unread,
Oct 12, 1999, 3:00:00 AM10/12/99
to
<<John:

I believe that ours all end in <letter><digit><digit>;
exceptions are certainly rare. TQ5 6KF and DE34 8RW represent the
predominant patterns.
>>

I was all wet, then, about the codes using the same format. But you mean
digit-letter-letter, not the reverse. Canadian postal codes all have one
format --

letter-digit-letter space digit-letter-digit
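
So a cheap test for trapping those, if you ever want one (a sketch) --

Function LooksCanadian(const s: string): Boolean;
(*True if s matches letter-digit-letter space digit-letter-digit,
  e.g. 'M6M 3G4'.*)
Begin
  Result := (Length(s) = 7)
    and (upcase(s[1]) in ['A'..'Z']) and (s[2] in ['0'..'9'])
    and (upcase(s[3]) in ['A'..'Z']) and (s[4] = ' ')
    and (s[5] in ['0'..'9']) and (upcase(s[6]) in ['A'..'Z'])
    and (s[7] in ['0'..'9']);
End;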

PhR

Jeff Buffington

unread,
Oct 13, 1999, 3:00:00 AM10/13/99
to

Philippe Ranger wrote:

> I was all wet, then, about the codes using the same format. But you mean
> digit-letter-letter, not the reverse. Canadian postal codes all have one
> format --
> letter-digit-letter space digit-letter-digit
> PhR

Boy, that makes me glad ours are all numbers... <g> I personally think
numbers are easier to memorize... Drivers license, Social Security, Sales
Tax ID, Telephone, Zip+4... all numbers (tho DL's in Florida do have the
first letter of our last names in the beginning). Of course, being a 'ham'
I'm used to call signs which are alpha-numeric (mine is KC4POD), but that
number of letters/digits in the prefix and suffix can vary from country to
country. In the US, only a single numeric designator is used..but some of
those former Soviet countries... S51, etc.. I hafta spend 10 minutes staring
at my world map just to locate their prefix. Ironically, I also have a
tendency to remember callsigns long before I can remember names.

I remember decoding the format Florida used for licenses when I was a kid...
I thought it was tough...but compared to decoding what I've been dealing
with lately, that _was_ child's play..

On another note.. what do you (or anyone for that matter) think about the
book I picked up for $15? 'Delphi 3 Superbible' .. it appears to be
thorough (roughly 1000 pages), but I've not even made a dent in it yet. I
realize it's a little outdated, since the first version of Delphi I've
touched is 5, but anything is better than nothing, and it makes quite a deal
about the fact that ver 3 is 32 bit/Win95/NT compliant..so I hope it's not
too outdated.

And second.. what's the best way to get started turning my app from a
console based app, into a Win-based app...or would that not be the best
idea? What about creating new Win based apps? My Pascal syntax is flooding
back into my brain..and I'm getting used to some of the new concepts of
Object-Pascal (and of OOP for that matter)...where do I go from here?

At this point, I've done little more than create generic.."do nothing"
Win-apps, just playing with the interface. And, while I'm used to the
concept of Units, I'm not so familiar with DLL's, OCX's and the like, nor
with the whole DPR idea; although I think I get the general idea. Correct
me if I'm wrong, but as I understand it, the DPR is essentially the entire
'program' (or sets of programs that work together), and units are sort of a
cross between the old TPL (libraries of units) and your actual code, a lot
of which may be generated by Delphi itself. I vaguely remember creating my
own units, and libraries, but I do recall thinking the concept was 'kewl' (I
was barely 17 back then <g>).
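
For reference, the DPR of one of those "do nothing" apps comes out as just
about this (Project1/Unit1 being whatever names the IDE picked) -- so am I
right that this little bit is the whole 'program'?

program Project1;

uses
  Forms,
  Unit1 in 'Unit1.pas' {Form1}; //the form unit the IDE added for me

{$R *.RES} //project resource (icon, version info)

begin
  Application.Initialize;
  Application.CreateForm(TForm1, Form1); //creates the main form
  Application.Run;                       //the event loop
end.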

I know I've got a lot of reading to do..but any pointers on getting
started..and which direction to go?

--Jeff--


Philippe Ranger

unread,
Oct 13, 1999, 3:00:00 AM10/13/99
to
<<Jeff:

Boy, that makes me glad ours are all numbers... <g> I personally think
numbers are easier to memorize...
>>

Ok, so Canada has 24 * 10 * 24 * 10 * 24 * 10 = 13 824 000 different codes,
for a tenth of the US population. (24 letters, because I think there are no
i's nor o's.) So, in the US, to achieve equal discrimination, you'd need a
nine-digit zip code. Any bidders?

Oh, and note that alternating digits and letters makes the system quite
resilient to the most common error, exchanging neighbors. No all-digit
system comes near, the best you can do is have a check code, but the user
can't notice their error themselves.

PhR

Philippe Ranger

unread,
Oct 13, 1999, 3:00:00 AM10/13/99
to
I'm answering in a second post so as to separate the new topic from the old.

<<Jeff:


On another note.. what do you (or anyone for that matter) think about the
book I picked up for $15? 'Delphi 3 Superbible' .. it appears to be
thorough (roughly 1000 pages), but I've not even made a dent in it yet. I
realize it's a little outdated, since the first version of Delphi I've
touched is 5, but anything is better than nothing,
>>

I have a very downmarket opinion of the Super Bible series, but I'm not even
sure that the title belongs just to one series. The DSB may be totally
foreign to this. Anyhow, I don't know it (but I've seen posts here that
hinted the way of my prejudice).

About versions -- there isn't all that much that changes from one to the
next, in the core product, especially since D3. And with each version the
new books have been fewer -- also, they often assume you've read the
previous version of the book, they're more advanced. So there is absolutely
nothing wrong with using a D3 book for basics.

<<
And second.. what's the best way to get started turning my app from a
console based app, into a Win-based app...or would that not be the best
idea?
>>

If it works, don't fix it. If it's a console app, don't move it to GUI (real
GUI, not just a one-dialog shell) for the pleasure. And **do not mix the app
you're working on with your interest in building true Winapps**

It's two different concerns, and you are much, much better off finding
something you'd really like to do, NEW, as a Winapp, and setting out on that
project. It has got to be something you want to have, because that'll drive
you, and because it's most important not to just go along with the flow, and
code whatever the IDE makes easy to code. If you want something real, you'll
work steadily towards it, and you'll be able to tell at each point how close
you are to target, what was wrong with the notion to begin with, and what
are the hard and the easy points in doing a Winapp.

<<
What about creating new Win based apps? My Pascal syntax is flooding
back into my brain..and I'm getting used to some of the new concepts of
Object-Pascal (and of OOP for that matter)...where do I go from here?
>>

You have my answer above. Your most immediate problems will be with grokking
the RAD environment, and the general logic of Windows apps, starting with
event-driven structure.

Hope others pipe in with more to-the-point suggestions.

PhR

Mike Orriss (TeamB)

unread,
Oct 13, 1999, 3:00:00 AM10/13/99
to
In article <7u0dba$lp...@forums.borland.com>, Philippe Ranger wrote:
> Canadian postal codes all have one
> format
>

Lucky you - UK postcode formats are a nightmare.

Mike Orriss (TeamB)
(Unless stated otherwise, my replies relate to Delphi 4.03/5.00)
(Unsolicited e-mail replies will most likely be ignored)


Rudy Velthuis

unread,
Oct 13, 1999, 3:00:00 AM10/13/99
to
Philippe Ranger wrote:

<<John:


TQ5 6KF and DE34 8RW represent the predominant patterns.
>>

<<Phr:
letter-digit-letter space digit-letter-digit
>>

One wonders why these are so complicated. In Germany they use 5
digits, period. In the Netherlands they use 4 digits for the city (or
part of it), 2 letters for the street or part of it. Is there any
logic in the zip/postal codes above?
--
Rudy Velthuis
