
Pulling Specific Records from a Text File


Thomas M

Apr 20, 2002, 2:58:08 AM
I work with a large medical database. I need to develop an extract file
to send to one of our customers, which they will use for claims payment.
Our system automatically generates extract files on a nightly basis, but
the extracted data is for ALL of our customers, so I can't simply send
those files because doing so would violate legal mandates to protect
confidentiality.

Quick Background
----------------
Each night our system produces 9 delimited text files. These can be
combined into 3 files for each day--admissions, services, and referrals.
The fields in each record are separated by the pipe symbol (vertical
bar). One of the fields for each record is called "Plan" (as in
insurance plan). The plan code for the records that I need to extract is
"1041".

Goals
-----
1) Copy the 9 files to another folder and combine them into 3 files.
2) Parse each of the 3 files and pull out the records where the plan code
is "1041", and write those records to a new text file.
3) On a regular interval combine the files for each day into one file.
For example, a 5-day work week would give us 15 total files (5 files each
for admissions, services, and referrals). The 5 files for each type
would be combined into 1 file, producing 3 files for the week. The
interval has not been decided yet--it could be daily, weekly, or monthly-
-but there would always be 3 resulting files.
4) Upload the finished extract file to an FTP server. The name of the
uploaded file should contain the current date (e.g., yourdata042202.txt).
5) Archive the delivered extract files for a period of time, then delete
them.

Where I'm At
------------
I will be starting this project next week (4/22). I shouldn't have any
problems with goals 1 and 3. I have no clue how to do goals 2 and 4
without using a manual process. Goal 5 I might be able to figure out,
but wouldn't mind some help there too.

Also, I'm thinking that goals 2 and 3 can be achieved with one operation
by simply writing each record with the right plan code to files that
would just sit there until they get uploaded, eliminating the need to
achieve the 3rd goal with a separate process. After the files are
uploaded, they would be archived, and 3 new files would be created to
receive the next batch of data.
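
For what it's worth, here is a rough outline of how goals 1, 3, and 5 might hang
together in plain batch (an added sketch, not from the original post; every folder
and file name below is a made-up placeholder, and the goal-2 filtering step is
covered later in the thread):

@echo off & setlocal
set SRC=C:\nightly
set WORK=C:\extract\work
set ARCH=C:\extract\archive

:: Goal 1: combine the nightly files into three working files
copy /b %SRC%\admit*.txt %WORK%\admissions.txt
copy /b %SRC%\svc*.txt %WORK%\services.txt
copy /b %SRC%\ref*.txt %WORK%\referrals.txt

:: Goal 3: append each day's filtered output to a running weekly file
:: (admissions_1041.txt stands in for whatever the goal-2 filter produces)
type %WORK%\admissions_1041.txt >> %WORK%\week_admissions.txt

:: Goal 5: after delivery, move the uploaded files to an archive folder,
:: to be purged on whatever schedule is finally agreed on
move %WORK%\week_*.txt %ARCH%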

Any help will be greatly appreciated.

--

Please reply to the newsgroup so that all may learn from your wisdom.

--Tom

Jim Speiser

Apr 20, 2002, 9:29:58 AM
For goal 2 I recommend Grep32 from www.oakroadsystems.com. You would
implement as:
Grep32 "1041" sourcefile > targetfile.
Tell Stan I sent ya.
Or you could simply use FIND. Type FIND /? for help.
For goal 4, use an FTP script. Create a file yourscript.txt and place in it
the ftp commands you would normally issue at the ftp> prompt. Then
implement as:
ftp -s:yourscript.txt.
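
To make that concrete (an added illustration; file names, host and credentials here
are only placeholders): FIND does a plain substring match, and quoting the search
string keeps the pipes literal, e.g.

type combined.txt | find "|1041|" > extract1041.txt

(Piping through TYPE avoids the "---------- FILENAME" banner FIND prints when it is
given a file name directly.) A yourscript.txt for the FTP step could be as simple as:

open ftp.example.com
youruserid
yourpassword
binary
put yourdata042202.txt
quit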


==JJS==


"Thomas M" <ReplyToN...@Please.com> wrote in message
news:MPG.172abfe56...@news.mt.net...

Clay Calvert

Apr 20, 2002, 11:50:22 AM
On Sat, 20 Apr 2002 06:58:08 GMT, Thomas M
<ReplyToN...@Please.com> wrote:

2 and 4 are definitely doable. Can you please post an example of what
is in one of these files? Change the names and numbers, for privacy
reasons, but retain any symbols, especially the pipes. About 4 or 5
lines should do, along with any header rows.

Clay Calvert
Replace "W" with "L" in email.

Thomas M

Apr 22, 2002, 11:33:43 AM
Thanks for the feedback. I'll definitely look at those options. I do
have WS_FTP Pro installed, but have only used it once. I'll see if that
can create an FTP script.

At least now I know where to start on goals 2 and 4.

--

Please reply to the newsgroup so that all may learn
from your wisdom.

--Tom

In article <3cc16...@newsa.ev1.net>, jimsp...@yahoo.com says...

Thomas M

Apr 22, 2002, 11:36:14 AM
Clay,

It will take me a while to build the file because I've got some other
issues going today. However, once I get to that task, it shouldn't
take too long. I will try to post that by the end of the day (4/22).

--

Please reply to the newsgroup so that all may learn
from your wisdom.

--Tom

In article <3us2cu48q62memgob...@4ax.com>,
ccal...@Wanguru.com says...

Thomas M

Apr 22, 2002, 3:23:51 PM
> For goal 2 I recommend Grep32 from www.oakroadsystems.com. You would
> implement as:
> Grep32 "1041" sourcefile > targetfile.
> Tell Stan I sent ya.
> Or you could simply use FIND. Type FIND /? for help.
> For goal 4, use an FTP script. Create a file yourscript.txt and place in it
> the ftp commands you would normally issue at the ftp> prompt. Then
> implement as:
> ftp -s:yourscript.txt.

One thing I neglected to consider, and therefore to mention in my original
post, is that our corporate policy is to PGP-encrypt files that contain
confidential data before uploading them to the FTP server. Can an FTP
script also handle PGP encryption?
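
(An aside added here, not part of the original post: the FTP script itself can't
encrypt anything, but the same batch job can run a command-line PGP tool over the
extract before it calls ftp. A rough sketch using GnuPG as a stand-in for whatever
PGP product corporate policy mandates; the recipient ID and file names are
placeholders:

gpg --batch --yes -r claims@customer.example -o yourdata042202.txt.pgp -e yourdata042202.txt
ftp -s:yourscript.txt

The put line inside the script would then name the .pgp file rather than the plain one.)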

--

Please reply to the newsgroup so that all may learn
from your wisdom.

--Tom

Thomas M

Apr 22, 2002, 8:06:52 PM
Clay,

Here is a sample file containing 20 records, 5 of which have the 1041
plan code (benefit_plan column out near the end of each record). The
names you see in the file belong to case managers and NOT members. Just
wanted to make that clear.

--

Please reply to the newsgroup so that all may learn
from your wisdom.

--Tom

In article <3us2cu48q62memgob...@4ax.com>,
ccal...@Wanguru.com says...

Clay Calvert

Apr 22, 2002, 8:51:48 PM
On Mon, 22 Apr 2002 18:06:52 -0600, Thomas M <mis...@hotmail.com>
wrote:

>Clay,
>
>Here is a sample file containing 20 records, 5 of which have the 1041
>plan code (benefit_plan column out near the end of each record). The
>names you see in the file belong to case managers and NOT members. Just
>wanted to make that clear.

Tom,

I don't think the attachment made it to my server. Many servers block
attachments to non-binary groups. The message is too new to be in
Google. Can you just post the text here in the body of the message?
You can separate records with a double carriage return, if necessary.

Thomas M

Apr 22, 2002, 9:23:20 PM
In article <bib9cus9oa37n5uno...@4ax.com>,
ccal...@Wanguru.com says...

Yeah, I noticed that. I think it was the server that I use. My employer
does not provide access to a good news server, so I have to hunt for free
servers, and I think the one I use doesn't allow attachments. My
newsreader gave me an error immediately. So I'm trying it again right
now (from home--better server). If it doesn't work this time, I'll just
post the data in the body of the message as you suggested.

--

Please reply to the newsgroup so that all may learn from your wisdom.

--Tom

Clay Calvert

Apr 22, 2002, 9:43:01 PM
On Tue, 23 Apr 2002 01:23:20 GMT, Thomas M
<ReplyToN...@Please.com> wrote:

>Yeah, I noticed that. I think it was the server that I use. My employer
>does not provide access to a good news server, so I have to hunt for free
>servers, and I think the one I use doesn't allow attachments. My
>newsreader gave me an error immediately. So I'm trying it again right
>now (from home--better server). If it doesn't work this time, I'll just
>post the data in the body of the message as you suggested.

Didn't get it.

Alvin

Apr 22, 2002, 9:55:43 PM
This is something which I do for FTP with a date stamp.
I don't know whether it's able to do encryption.

:: Pull the month and day out of DATE /T (e.g. "Sat 04/20/2002" on a US-format system)
cd /d c:\temp
DATE /T > DATE.TXT
FOR /F "eol=; tokens=2,3 delims=/ " %%k in (c:\temp\date.txt) do set mymth=%%k
FOR /F "eol=; tokens=2,3 delims=/ " %%k in (c:\temp\date.txt) do set myday=%%l
del date.txt

:: Stamp the outgoing copy with the computer name and date
cd /d d:\Medical\out
copy <Your_Customer.file> Your_Customer%computername%_%mymth%%myday%.alv
:: Build the ftp command script on the fly, then hand it to ftp -s:
if exist ftp.txt del ftp.txt
echo open <IP address> >> ftp.txt
echo user <user_ID> >> ftp.txt
echo <Password> >> ftp.txt
echo cd <directory to deposit> >> ftp.txt
echo cd <Sub directory> >> ftp.txt
echo prompt >> ftp.txt
echo bin >> ftp.txt
echo put Your_Customer%computername%_%mymth%%myday%.alv >> ftp.txt
echo quit >> ftp.txt
cmd /c ftp -s:ftp.txt

If you need more clarification you can mail me; I'd be more than happy to help you.

Thomas M <mis...@hotmail.com> wrote in message news:<MPG.172ddbd5a...@News.CIS.DFN.DE>...

Thomas M

Apr 23, 2002, 2:34:56 PM
In article <k0f9cukaeoc3ctukr...@4ax.com>,
ccal...@Wanguru.com says...

Could this be any MORE difficult! :-) I was actually able to post it
successfully from home last night. I could even see the file after it
was posted. Anyway, here's the data. I've separated each record with a
double carriage return.

transaction_type|member_id|admission_date|event_status|external_cm_id|fa
cility_id|row_create_date|row_last_updated|temp_authorization_number|act
ual_discharge_date|actual_los|admission_comment|admission_diagnosis1
|admission_diagnosis2|admission_diagnosis3|admission_diagnosis4
|admission_procedure1|admission_procedure1_date|admission_procedure2
|admission_procedure2_date|admission_procedure3|admission_procedure3
_date|admission_procedure4|admission_procedure4
_date|admission_reason|admission_source|admission_type|admitting_provide
r|approved_clinical_days|approved_financial_days|authorization_date|auth
orization_number|authorized_admission_date|authorized_los|benefit_level|
communication_type|criteria_not_met_days|decision_sent_date|denied_clini
cal_days|denied_financial_days|discharge_diagnosis1|discharge_diagnosis2
|discharge_diagnosis3|discharge_diagnosis4
|discharge_status|drg_admission|drg_discharge|estimated_cost|event_read_
only|event_status_reason|facility_mrn|leakage_reason|los_variance_reason
|attending_provider|patient_room|readmission|request_date|target_dischar
ge_date|target_los|user_admission1|user_admission2
|original_auth_number|last_approved_day|user_admission3|user_admission4
|user_admission5|user_admission6|user_admission7|user_admission_note1
|user_admission_note2|user_admission_note3|user_admission_note4
|user_admission_note5
|event_mgmt_status|date_closed|close_reason|days_requested|benefit_plan|
benefit_product|benefit_carrier|benefit_region|benefit_subprogram|benefi
t_group|enrollment_date

AE|2195623|2002-04-08 12:00:00|6|RChristians|14458|2002-03-29 03:03:02|
2002-04-11 08:33:56|Y|2002-04-10 12:00:00|2|2 days app'd.|555.9|556.9|
729.1||||||||||EL|02|SU|2006145|0|0||A43430|2002-04-08 12:00:00|2|65|P||
2002-04-09 10:29:00|0|0|||||22||||N|11||||||N|2002-03-29 03:00:00
|||||A43430|2002-04-09||||||||||||||2|1020|1040|***|***|***|37020|2001-
06-01

AE|1423779|2002-04-02 12:00:00|4|RChristians|13959|2002-04-01 10:43:09|
2002-04-11 02:48:44|Y||||205.0||||96414|2002-04-02|||||||EM|02|ME|
1344147|0|0||A43480|2002-04-02 12:00:00|5|65|P||2002-04-01 02:10:44|0|0
|||||||||N|41||||||N|2002-04-01 10:30:00|||||A43480|||||||||||||||10|
1020|1040|***|***|***|37147|2002-02-01

AE|1194818|2002-03-30 12:00:00|6|SAber|13879|2002-04-01 10:59:13|2002-
04-11 09:45:47|Y|2002-04-06 12:00:00|7|approved 9 days....transferred 2d
early|431.|802.8|512.|864.01|01.24|2002-03-30|||||||EM|03|ME|2004645|0|0
||A43484|2002-03-30 12:00:00|9|65|F||2002-04-02 02:36:50|0|0|||||6||||N|
11||||||N|2002-04-01 10:45:00|2002-04-06 12:00:00|7|||A43484||||8|5
||||||||||10|1020|1077|***|***|***|17392|2002-01-23

AE|2392083|2002-03-30 12:00:00|6|TVanDuren|18751|2002-04-01 11:42:17|
2002-04-11 08:41:31|Y|2002-04-10 12:00:00|11|11 days app|490.|433.1|
964.2|790.|31629|2002-04-04|88.41|2002-04-09|||||EM|03|ME|2223646|11|11
||A43490|2002-03-30 12:00:00|3|65|P||2002-04-01 05:28:16|0|0|||||22
||||N|11||||||N|2002-04-01 11:30:00|||||A43490|||||10||||||||||11|1042|
1150|***|***|***|38959|2002-01-01

AE|1691707|2002-03-29 12:00:00|4|CReagor|14458|2002-04-01 02:26:43|2002-
04-11 08:28:10|Y||||428.0|415.0|401.9|780.57|||||||||EM|03|ME|1353865|12
|12||A43529|2002-03-29 12:00:00|4|65|P||2002-04-01 02:37:53|0|0
|||||||||N|41||||||N|2002-04-01 02:15:00|||||A43529|||||17||||||||||12|
1020|1040|***|***|***|14449|2001-07-01

AE|2287152||6|CMcDonald|19791|2002-04-01 02:59:18|2002-04-10 09:46:24
|Y|||cancelled rehab admission|803.3||||||||||||EL|05|ME|2380323
||||A43534|2002-04-02 12:00:00||60|P|||||||||||||N|19||||||N|2002-04-01
02:45:00|||||A43534|||||5|||||||||||1041|1140|***|***|***|1329515|2000-
10-01

AE|2287152|2002-04-01 12:00:00|6|CMcDonald|14246|2002-04-01 03:25:29|
2002-04-11 03:21:16|Y|2002-04-11 12:00:00|10|10d los approved|803.3|
873.1|881.12|872.79|||||||||EL|05|SU|2380324|10|10||A43539|2002-04-01
12:00:00|8|65|P||2002-04-02 11:00:00|0|0|||||3||||N|11||||||Y|2002-04-01
03:15:00|||||A43539|||||5||||||||||10|1041|1140|***|***|***|1329515|
2000-10-01

AE|1358364|2002-03-24 12:00:00|6||18067|2002-03-25 09:32:20|2002-04-10
09:42:06|Y|2002-03-27 12:00:00|3|3 d approved|414.0|410.90|||||||||||EM|
03|ME|2376523|0|0||A43072|2002-03-24 12:00:00|3|65|P||2002-03-25
12:17:53|0|0|||||22||||N|11||||||N|2002-03-25 09:15:00|||||A43072
|||||||||||||||3|1024|1049|***|***|***|23530|2001-10-01

AE|2351244|2002-03-25 12:00:00|4|BMerkley|19715|2002-03-25 10:46:11|
2002-04-11 09:00:46|Y||||436.|250.|401.9||||||||||EL|05|ME|2380158|17|17
||A43090|2002-03-25 12:00:00|5|60|P|||0|0|||||||||N|41||||||N|2002-03-25
10:30:00|||||A43090|||||5||||||||||30|1042|1150|***|***|***|1329559|
2002-01-01

AE|2284879|2002-03-22 12:00:00|6|AGail|16238|2002-03-25 12:13:08|2002-
04-11 12:33:10|Y|2002-03-24 12:00:00|2|4d equal day approved, 3d
los|V22.2||||74.99|2002-03-22|||||||EM|02|OB|2379991|0|0||A43112|2002-
03-22 12:00:00|4|65|P|||0|0|||||22||||N|11||||||N|2002-03-25 12:00:00
|||||A43112||||1|20||||||||||4|1041|1140|***|***|***|1329518|2001-01-01

AE|2391855|2002-03-23 12:00:00|6|AGail|19717|2002-03-25 01:02:46|2002-
04-10 09:46:03|Y|2002-03-25 12:00:00|2|2 days approved|644.2||||73.59|
2002-03-25|||||||EM|03|OB|2380157|0|0||A43119|2002-03-23 12:00:00|2|65
|P||2002-03-26 03:09:30|0|0|||||4||||N|11||||||N|2002-03-25 01:00:00
|||||A43119|||||20||||||||||2|1042|1156|***|***|***|1329576|2002-02-01

AE|2257698||4|KBuffingt|13912|2002-03-26 03:31:48|2002-04-11 11:31:19
|Y||||562.11||||45.74|2002-04-08|||||||EL|02|SU|2005227||||A43215|2002-
04-08 12:00:00|4|65|P||2002-03-26 04:25:40|||||||||||N|41||||||N|2002-
03-26 03:15:00|||||A43215|||||||||||||||5|1020|1040|***|***|***|37882|
2001-07-01

AE|2287108||4|MNelson|19698|2002-03-27 09:21:26|2002-04-10 09:45:49
|Y||||724.02|722.52|||22612|2002-04-09|22842|2002-04-09|22614|2002-04-09
|20937|2002-04-09|EL|02|SU|2380110|0|0||A43235|2002-04-09 12:00:00|3|65
|P||2002-03-27 09:31:07|0|0|||||||||N|41||||||N|2002-03-27 09:15:00
|||||A43235|||||||||||||||3|1041|1140|***|***|***|1329519|2001-02-01

AE|1722397|2002-04-03 12:00:00|4|AGail|13890|2002-04-04 11:52:59|2002-
04-11 10:41:55|Y||||644.||||||||||||EM|05|OB|2006250|8|8||A43734|2002-
04-03 12:00:00|3|65|P||2002-04-04 12:22:48|0|0|||||||||N|41||||||N|2002-
04-04 11:45:00|||||A43734|||||20||||||||||8|1020|1077|***|***|***|17396|
2000-08-30

AE|2348626|2002-04-04 12:00:00|6|Boutsikaris|19202|2002-04-04 01:11:53|
2002-04-11 03:13:57|Y|2002-04-09 12:00:00|5|5 days app|578.9|287.5|285.9
||86999|2002-04-04|86999|2002-04-06|86999|2002-04-07|86999|2002-04-08
|EM|03|ME|1791930|0|0||A43742|2002-04-04 12:00:00|5|65|P||2002-04-09
05:02:23|0|0|||||17||||N|11||||||N|2002-04-04 01:00:00|||||A43742|||||18
||||||||||5|1042|1152|***|***|***|38972|2002-01-01

AE|1417836|2002-01-12 12:00:00|6|DBirrer|19156|2002-04-04 02:48:42|2002-
04-11 02:09:48|Y|2002-01-18 12:00:00|6|6 days approved/RETRO
REVIEW,adult ip psych, discharged 1/18/02|311.||||||||||||EM|03|PS|
2376739|6|6|2002-04-09 12:00:00|A43750|2002-01-12 12:00:00|6|01|0|||0|0
|||||19||||N|39||||2376739||N|2002-04-04 02:45:00|||||A43750|2002-01-17
||||||||||||||6|||||||

AE|1396170|2002-04-05 12:00:00|6|JMoody|14458|2002-04-04 03:21:07|2002-
04-11 08:31:03|Y|2002-04-10 12:00:00|5|5 days app'd.|486.||||||||||||EM|
05|ME|2005286|5|5||A43758|2002-04-05 12:00:00|1|65|P||2002-04-05
08:19:44|0|0|||||22||||N|11||||||N|2002-04-04 03:15:00|||||A43758|2002-
04-09||||||||||||||5|1020|1040|***|***|***|26017|1999-04-01

AE|2276380|2002-04-05 12:00:00|4|TVanDuren|14229|2002-04-04 04:53:47|
2002-04-11 08:21:11|Y||||806.2|344.0|599.0||93.89|2002-04-05|||||||EL|05
|ME|2014528|0|0||A43761|2002-04-05 12:00:00|7|60|P||2002-04-05 10:58:36|
0|0|||||||||N|41||||||N|2002-04-04 04:45:00|||||A43761|||||1||||||||||37
|1040|1139|***|***|***|1329465|2001-09-01

AE|1616879|2002-04-03 12:00:00|4|CReagor|13872|2002-04-05 09:02:11|2002-
04-11 10:22:41|Y||||780.09|V44.3|998.6|V15.2|||||||||EL|02|ME|1345498|7|
7||A43769|2002-04-03 12:00:00|3|65|P||2002-04-05 04:02:55|0|0|||||||||N|
41||||||N|2002-04-05 08:45:00|||||A43769|||||2||||||||||7|1020|1053
|***|***|***|15759|1997-11-01

AE|2316842|2002-04-04 12:00:00|6||19453|2002-04-05 09:19:19|2002-04-11
10:45:29|Y|2002-04-05 12:00:00|1|1d approved|786.5||||||||||||EM|03|ME|
2399627|0|0||A43776|2002-04-04 12:00:00|1|65|P||2002-04-11 10:08:03|0|0
|||||4||||N|11||||||N|2002-04-05 09:15:00|||||A43776|||||||||||||||1|
1041|1140|***|***|***|1329519|2001-12-01

--

Please reply to the newsgroup so that all may learn
from your wisdom.

--Tom

Phil Robyn

Apr 23, 2002, 8:12:52 PM

Simply doing a 'findstr /c:"|1041|" is probably not sufficient, because you really only
want the records where the value of the 78th data field is '1041'. So you will have to
go to the trouble to examine all of the fields in each record and only save those records
whose 78th field is '1041'.

Based on your example data, I get four hits on '1041' in the 78th field.

saverec1=AE\2287152\2002-04-01 12:00:00\6\CMcDonald\14246\2002-04-01 03:25:29\2002-04-11 03:21:16\Y\2002-04-11 12:00:00\10\10d los approved\
803.3\873.1\881.12\872.79\\\\\\\\\EL\05\SU\2380324\10\10\\A43539\2002-04-01 12:00:00\8\65\P\\2002-04-02 11:00:00\0\0\\\\\3\\\\N\11\\\\\\Y\20
02-04-01 03:15:00\\\\\A43539\\\\\5\\\\\\\\\\10\1041\1140\***\***\***\1329515\2000-10-01

saverec2=AE\2284879\2002-03-22 12:00:00\6\AGail\16238\2002-03-25 12:13:08\2002-04-11 12:33:10\Y\2002-03-24 12:00:00\2\4d equal day approved,
3d los\V22.2\\\\74.99\2002-03-22\\\\\\\EM\02\OB\2379991\0\0\\A43112\2002-03-22 12:00:00\4\65\P\\\0\0\\\\\22\\\\N\11\\\\\\N\2002-03-25 12:0
0:00\\\\\A43112\\\\1\20\\\\\\\\\\4\1041\1140\***\***\***\1329518\2001-01-01

saverec3=AE\2287108\\4\MNelson\19698\2002-03-27 09:21:26\2002-04-10 09:45:49\Y\\\\724.02\722.52\\\22612\2002-04-09\22842\2002-04-09\22614\20
02-04-09\20937\2002-04-09\EL\02\SU\2380110\0\0\\A43235\2002-04-09 12:00:00\3\65\P\\2002-03-27 09:31:07\0\0\\\\\\\\\N\41\\\\\\N\2002-03-27 09
:15:00\\\\\A43235\\\\\\\\\\\\\\\3\1041\1140\***\***\***\1329519\2001-02-01

saverec4=AE\2316842\2002-04-04 12:00:00\6\\19453\2002-04-05 09:19:19\2002-04-11 10:45:29\Y\2002-04-05 12:00:00\1\1d approved\786.5\\\\\\\\\\
\\EM\03\ME\2399627\0\0\\A43776\2002-04-04 12:00:00\1\65\P\\2002-04-11 10:08:03\0\0\\\\\4\\\\N\11\\\\\\N\2002-04-05 09:15:00\\\\\A43776\\\\\\
\\\\\\\\\1\1041\1140\***\***\***\1329519\2001-12-01

C:\cmd>rlist demo\read1041.cmd
=====begin C:\cmd\demo\read1041.cmd====================
01. @echo off
02. setlocal
03. set saverec_ctr=0
04. for /f "tokens=*" %%a in (
05. 'findstr /c:"|1041|" c:\temp\yourdata.txt'
06. ) do call :loadfields "%%a"
07. set /a saverec_ctr-=1
08. for /l %%a in (1,1,%saverec_ctr%) do call :display %%a
09. goto :EOF
10. :loadfields
11. set field_ctr=0
12. set rec=%*
13. set rec=%rec:|=\#%
14. set rec=%rec:"=%
15. set this_rec=%rec:#=%
16. :loadloop
17. set /a field_ctr+=1
18. for /f "tokens=1* delims=\" %%a in (
19. 'echo %rec%'
20. ) do (
21. set one_field=%%a
22. set rec=%%b
23. )
24. if %field_ctr% NEQ 78 goto :continue
25. set one_field=%one_field:#=%
26. if "%one_field%" EQU "1041" (
27. set /a saverec_ctr+=1
28. set saverec%saverec_ctr%=%this_rec%
29. )
30. :continue
31. if defined rec goto :loadloop
32. goto :EOF
33. :display
34. set num=%1
35. echo.
36. set saverec%num%
37. goto :EOF
=====end C:\cmd\demo\read1041.cmd====================


--

u n z i p m y a d d r e s s t o s e n d e - m a i l

Thomas M

Apr 23, 2002, 9:25:03 PM
In article <3CC5F884...@uclink.berkzipeley.edu>,
pro...@uclink.berkzipeley.edu says...

> Simply doing a 'findstr /c:"|1041|" is probably not sufficient, because you really only
> want the records where the value of the 78th data field is '1041'. So you will have to
> go to the trouble to examine all of the fields in each record and only save those records
> whose 78th field is '1041'.
>
> Based on your example data, I get four hits on '1041' in the 78th field.
>
> saverec1=AE\2287152\2002-04-01 12:00:00\6\CMcDonald\14246\2002-04-01 03:25:29\2002-04-11 03:21:16\Y\2002-04-11 12:00:00\10\10d los approved\
> 803.3\873.1\881.12\872.79\\\\\\\\\EL\05\SU\2380324\10\10\\A43539\2002-04-01 12:00:00\8\65\P\\2002-04-02 11:00:00\0\0\\\\\3\\\\N\11\\\\\\Y\20
> 02-04-01 03:15:00\\\\\A43539\\\\\5\\\\\\\\\\10\1041\1140\***\***\***\1329515\2000-10-01
>
> saverec2=AE\2284879\2002-03-22 12:00:00\6\AGail\16238\2002-03-25 12:13:08\2002-04-11 12:33:10\Y\2002-03-24 12:00:00\2\4d equal day approved,
> 3d los\V22.2\\\\74.99\2002-03-22\\\\\\\EM\02\OB\2379991\0\0\\A43112\2002-03-22 12:00:00\4\65\P\\\0\0\\\\\22\\\\N\11\\\\\\N\2002-03-25 12:0
> 0:00\\\\\A43112\\\\1\20\\\\\\\\\\4\1041\1140\***\***\***\1329518\2001-01-01
>
> saverec3=AE\2287108\\4\MNelson\19698\2002-03-27 09:21:26\2002-04-10 09:45:49\Y\\\\724.02\722.52\\\22612\2002-04-09\22842\2002-04-09\22614\20
> 02-04-09\20937\2002-04-09\EL\02\SU\2380110\0\0\\A43235\2002-04-09 12:00:00\3\65\P\\2002-03-27 09:31:07\0\0\\\\\\\\\N\41\\\\\\N\2002-03-27 09
> :15:00\\\\\A43235\\\\\\\\\\\\\\\3\1041\1140\***\***\***\1329519\2001-02-01
>
> saverec4=AE\2316842\2002-04-04 12:00:00\6\\19453\2002-04-05 09:19:19\2002-04-11 10:45:29\Y\2002-04-05 12:00:00\1\1d approved\786.5\\\\\\\\\\
> \\EM\03\ME\2399627\0\0\\A43776\2002-04-04 12:00:00\1\65\P\\2002-04-11 10:08:03\0\0\\\\\4\\\\N\11\\\\\\N\2002-04-05 09:15:00\\\\\A43776\\\\\\
> \\\\\\\\\1\1041\1140\***\***\***\1329519\2001-12-01

Phil,

This almost works. The sample data actually contains 5 records where
the 78th field is 1041, so the code is not picking up one of them. I
haven't had a chance to study your code yet (I'll work on breaking it
down tomorrow so I can understand it), but I have an idea for possibly
fixing the problem.

Our system generates two files--one for new admissions and one for
changed admissions. The sample data I posted contained both because I
had combined the files. I could just keep the files separate until
AFTER they've been processed by your code and THEN combine them. But
I'll have to understand your code first before I'll know whether or not
that will solve the issue.

Thanks for your help! Once I get into it, I might have a few questions
on how the heck your code works!

Phil Robyn

Apr 24, 2002, 1:02:27 AM

OK, try this version instead:

=====begin c:\cmd\demo\read1041.cmd====================


01. @echo off
02. setlocal
03. set saverec_ctr=0
04. for /f "tokens=*" %%a in (
05. 'findstr /c:"|1041|" c:\temp\yourdata.txt'
06. ) do call :loadfields "%%a"

07. for /l %%a in (1,1,%saverec_ctr%) do call :display %%a
08. goto :EOF
09. :loadfields
10. set field_ctr=0
11. set rec=%*
12. set rec=%rec:|=\#%
13. set rec=%rec:"=%
14. set this_rec=%rec:#=%
15. :loadloop
16. set /a field_ctr+=1
17. for /f "tokens=1* delims=\" %%a in (
18. 'echo %rec%'
19. ) do (
20. set one_field=%%a
21. set rec=%%b
22. )
23. if %field_ctr% NEQ 78 goto :continue
24. set one_field=%one_field:#=%
25. if "%one_field%" EQU "1041" call :saverec
26. :continue
27. if defined rec goto :loadloop
28. goto :EOF
29.
30. :saverec
31. set /a saverec_ctr+=1
32. set saverec%saverec_ctr%=%this_rec%
33. goto :EOF
34.
35. :display
36. set num=%1
37. echo.
38. set saverec%num%
39. goto :EOF
=====end c:\cmd\demo\read1041.cmd====================

>
> Our system generates two files--one for new admissions and one for
> changed admissions. The sample data I posted contained both because I
> had combined the files. I could just keep the files separate until
> AFTER they've been processed by your code and THEN combine them. But
> I'll have to understand your code first before I'll know whether or not
> that will solve the issue.
>
> Thanks for your help! Once I get into it, I might have a few questions
> on how the heck your code works!
>
> --
>
> Please reply to the newsgroup so that all may learn
> from your wisdom.
>
> --Tom

Tom:

I hope that these are not big files, because doing this in a batch file
is very slow. I only posted my version to indicate how it might be done
and to show that it is indeed possible to do this in a batch file. But
if I were doing this 'for real' on large files I probably would not want
to do it with a batch file.

In order to escape from the vexation of having to deal with a myriad pipe
'|' characters, I have changed them to backslashes (line 12). If it is
important to you to retain the '|' characters as field delimiters, then
you will have to change the backslashes back to pipe characters at some
point.

Have fun!

--
Phil Robyn
Univ. of California, Berkeley

Ritchie Lawrence

Apr 24, 2002, 5:54:23 AM
========================== start =============================
@echo off&setlocal
for /f "tokens=*" %%a in (dat.pip) do set rec="%%a"&call:go
goto:eof

:go
set dat=%rec:|=|~%
for /f "tokens=31* delims=|" %%a in (%dat%) do (
for /f "tokens=31* delims=|" %%c in ("%%b") do (
for /f "tokens=16 delims=|" %%e in ("%%d") do (
if {%%e}=={~1041} echo %rec%
)
)
)
======================= end =====================================
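
A quick usage note, added here rather than in the original post: saved as, say,
filter1041.cmd next to the data file, the whole run can be redirected to capture
the matches, e.g.

filter1041.cmd > extract1041.txt

Because of the set rec="%%a" step, each echoed record keeps the surrounding
quotation marks, a point Herbert picks up on further down the thread.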

--
Ritchie
Undo address for email.

"Thomas M" <mis...@hotmail.com> wrote in message news:MPG.172f573f6...@News.CIS.DFN.DE...

Thomas M

Apr 24, 2002, 5:21:13 PM
In article <3CC63C63...@uclink.berkzipeley.edu>,
pro...@uclink.berkzipeley.edu says...

> Tom:
>
> I hope that these are not big files, because doing this in a batch file
> is very slow. I only posted my version to indicate how it might be done
> and to show that it is indeed possible to do this in a batch file. But
> if I were doing this 'for real' on large files I probably would not want
> to do it with a batch file.
>
> In order to escape from the vexation of having to deal with a myriad pipe
> '|' characters, I have changed them to backslashes (line 12). If it is
> important to you to retain the '|' characters as field delimiters, then
> you will have to change the backslashes back to pipe characters at some
> point.
>
> Have fun!

The system generates these files nightly, which keeps the file size
down. The average file seems to be running between 1KB and 50KB, with
approximately half of those being smaller than 25KB. There is also a
demographics file which usually runs around 100KB.

I looked at the files that the system has generated over the last two
months and there was only one thing that concerned me. One day, the
system produced a 29MB demographics file. I suspect that is a fluke
because none of the other demographics files were anywhere close to
that size. Nonetheless, if it happened once, it could happen again.
The good news is that the system generates the files at 5:10 PM every
night, so the batch file would have all night to do the processing.
Maybe I'll copy that 29MB file over to my machine, start processing it
with your code at 5:00 tonight, and see if it's done by tomorrow
morning.

Anyway, my short-term goal is to get this working in a relatively
straightforward manner. Once I get to that point, I will look at
writing some VB code, or maybe trying to develop an automated process
for pulling the files into Access, running the query there, and
outputting the results. I'm not very familiar with VB or Access, and I
thought those options would take me longer to get up and running, which
is why I decided to pursue the batch method first.

With any luck, I'll have this running with VB or Access before the
system decides to spit out another 29MB file! :-)

Thomas M

Apr 24, 2002, 5:19:52 PM
> Tom:
>
> I hope that these are not big files, because doing this in a batch file
> is very slow. I only posted my version to indicate how it might be done
> and to show that it is indeed possible to do this in a batch file. But
> if I were doing this 'for real' on large files I probably would not want
> to do it with a batch file.

Phil, I ran your code on a somewhat large file, and it didn't take all
that long, so I went ahead and tested it on the 29MB file. I inserted a
command at the beginning to capture the start time, and the same command
at the end to record the end time, and it processed the entire file in
22 minutes. Since this will definitely be an overnight process, I think
we'll be okay with that, especially since a file that large is very
rare.

BTW, I intended to mention in my earlier post that your new code picks
up all the records in the sample data. Thanks! I'm still working on
figuring it all out.

> In order to escape from the vexation of having to deal with a myriad pipe
> '|' characters, I have changed them to backslashes (line 12). If it is
> important to you to retain the '|' characters as field delimiters, then
> you will have to change the backslashes back to pipe characters at some
> point.

Is there any particular advantage to one or the other? Our system
generates the files with the '|' so I just kept them, but I don't see
any reason why backslashes couldn't be used.

Phil Robyn

Apr 24, 2002, 7:35:21 PM
Ritchie Lawrence wrote:
>
> ========================== start =============================
> @echo off&setlocal
> for /f "tokens=*" %%a in (dat.pip) do set rec="%%a"&call:go
> goto:eof
>
> :go
> set dat=%rec:|=|~%
> for /f "tokens=31* delims=|" %%a in (%dat%) do (
> for /f "tokens=31* delims=|" %%c in ("%%b") do (
> for /f "tokens=16 delims=|" %%e in ("%%d") do (
> if {%%e}=={~1041} echo %rec%
> )
> )
> )
> ======================= end =====================================
>
> --
> Ritchie
> Undo address for email.
>

Very, very nice! Thanks for posting this.

--

Thomas M

Apr 24, 2002, 7:52:47 PM
In article <3cc68...@mk-nntp-1.news.uk.worldonline.com>,
rlaw...@commanddoline.co.uk says...

> ========================== start =============================
> @echo off&setlocal
> for /f "tokens=*" %%a in (dat.pip) do set rec="%%a"&call:go
> goto:eof
>
> :go
> set dat=%rec:|=|~%
> for /f "tokens=31* delims=|" %%a in (%dat%) do (
> for /f "tokens=31* delims=|" %%c in ("%%b") do (
> for /f "tokens=16 delims=|" %%e in ("%%d") do (
> if {%%e}=={~1041} echo %rec%
> )
> )
> )
> ======================= end =====================================
>
> --
> Ritchie
> Undo address for email.

Ritchie,

I have two quick questions:

1) Is this an alternative to Phil's code? I haven't quite figured out
all the commands and the syntax, so I'm not 100% sure what I am looking
at, but I'm working on improving my understanding.

2) What exactly is a token? When I type "for /?" I get information that
includes:

" Processing consists of reading in the file, breaking it up into
individual lines of text and then parsing each line into zero or more
tokens."

But it never defines the term "token". Would I be correct in assuming
that a token is equivalent to a field of data, so that if a text file
has 84 fields per record, each line would have 84 tokens?

Ritchie Lawrence

Apr 25, 2002, 3:25:24 AM
"Thomas M" <mis...@hotmail.com> wrote in message news:MPG.1730f33cc...@News.CIS.DFN.DE...

> In article <3cc68...@mk-nntp-1.news.uk.worldonline.com>,
> rlaw...@commanddoline.co.uk says...
> > ========================== start =============================
> > @echo off&setlocal
> > for /f "tokens=*" %%a in (dat.pip) do set rec="%%a"&call:go
> > goto:eof
> >
> > :go
> > set dat=%rec:|=|~%
> > for /f "tokens=31* delims=|" %%a in (%dat%) do (
> > for /f "tokens=31* delims=|" %%c in ("%%b") do (
> > for /f "tokens=16 delims=|" %%e in ("%%d") do (
> > if {%%e}=={~1041} echo %rec%
> > )
> > )
> > )
> > ======================= end =====================================
> >
> > --
> > Ritchie
> > Undo address for email.
>
> Ritchie,
>
> I have two quick questions:
>
> 1) Is this an alternative to Phil's code? I haven't quite figured out
> all the commands and the syntax, so I'm not 100% sure what I am looking
> at, but I'm working on improving my understanding.


Pardon me, my description of what I posted was rather brief <g>. Replace
the text 'dat.pip' with the name of your data file. Calculations suggested
it would parse your 29MB file in 10.57 seconds (which I do find hard to
believe, could you confirm?).


> 2) What exactly is a token? When I type "for /?" I get information that
> includes:
>
> " Processing consists of reading in the file, breaking it up into
> individual lines of text and then parsing each line into zero or more
> tokens."
>
> But it never defines the term "token". Would I be correct in assuming
> that a token is equivalent to a field of data, so that if a text file
> has 84 fields per record, each line would have 84 tokens?


You can think of tokens as fields (in other words, the info between delimiters).
The FOR /F command allows you to specify the delimiter character(s) and the tokens
of interest. It's probably NT's most powerful batch command; you'll find it used
in virtually every other solution.
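
A throwaway illustration of that (added here, not from the original reply): with
delims=| the numbered tokens map straight onto the pipe-separated fields.

@echo off
:: token 2 goes to %%a, token 4 to the next letter along, %%b
for /f "tokens=2,4 delims=|" %%a in ("one|two|three|four") do echo %%a %%b
:: prints: two four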

In addition to the CMD help, also see the GUI help, in NT4, click the Start menu and
select help. If not already selected, choose the Contents tab, then double-click 'Windows NT Commands'. In W2K, Start | Help |
Contents | Reference | MS-DOS Commands.

Both O/S's allow you to bookmark pages.

--
Ritchie
Undo address for email.

Herbert Kleebauer

Apr 25, 2002, 6:16:28 AM

Thomas M wrote:
>
> In article <3CC63C63...@uclink.berkzipeley.edu>,
> pro...@uclink.berkzipeley.edu says...

> > In order to escape from the vexation of having to deal with a myriad pipe
> > '|' characters, I have changed them to backslashes (line 12). If it is
> > important to you to retain the '|' characters as field delimiters, then
> > you will have to change the backslashes back to pipe characters at some
> > point.

> Anyway, my short-term goal is to get this working in a relatively
> straightforward manner. Once I get to that point, I will look at
> writing some VB code, or maybe trying to develop an automated process
> for pulling the files into Access, running the query there, and
> outputting the results. I'm not very familiar with VB or Access, and I
> thought those options would take me longer to get up and running, which
> is why I decided to pursue the batch method first.

If you still want to use the | or you can't accept the extra quotation marks
(as with Ritchie's solution), you can also embed a small assembly program in
your batch. It reads the file character by character, so it is even slower
than Ritchie's batch solution, but it can easily be modified to use block I/O
if higher speed is necessary:


::-------------------------------------------------------------------------
@echo off
echo Bj@jzh`0X-`/PPPPPPa(DE(DM(DO(Dh(Ls(Lu(LX(LeZRR]EEEUYRX2Dx=>filter.com
echo 0DxFP,0Xx.t0P,=XtGsB4o@$?PIyU!WvX0GwUY Wv;ovBX2Gv0ExGIuht6>>filter.com
echo ~pep{EEr=?@_g?Lp{?1@asICt{{StLtVKgq1=e?Csop?OtpCB}?At{ASEE>>filter.com
echo HK@@=B{E}r?w@GyHAxxHexh@e=ZpOog`Lre?eOEuFL~@{rsDs`LB?t}?`G>>filter.com
echo I@~Bjgt=EEdM?@=B@=`Mx`Lbj?Do@xBG~s?G@~yMLZp{bq`st@x[`ecfB@>>filter.com
echo ~rK??O0>>filter.com

filter <infile >outfile
if not errorlevel 1 echo error or no line found
::-------------------------------------------------------------------------


The source code:

; write all lines with 1041 between the 77th and 78th | to stdout
; errorlevel = 0 if error or no 1041 found
; 1 if 1041 found
@=$100
eor.l r6,r6 ; file pos

_20: move.b #0,found
move.w #77,r4
_30: bsr.w get_c
cmp.b #$0a,r0
beq.b _40
cmp.b #'|',r0
bne.b _30
dec.w r4
bne.b _30

bsr.w get_5c
cmp.l #'1401',r0
bne.b _40
cmp.b #'|',buf+4
bne.b _40
move.b #1,flag
inc.b found

_40: move.w #$4200,r0
move.w r6,r1
move.l r6,r2
lsr.l #16,r2
eor.w r3,r3
trap #$21

_60: bsr.w get_c
inc.l r6
tst.b #$ff,found
beq.b _50
move.b #$40,m0
inc.w r3
trap #$21
_50: cmp.b #$0a,buf
bne.b _60
br.b _20

error: move.b #0,flag
exit: move.w flag,r0
trap #$21

get_5c: move.w #5,r2
br.b gc
get_c: move.w #1,r2
gc: move.b #$3f,m0
move.w #buf,r1
eor.w r3,r3
trap #$21
bcs.b error
cmp.w r0,r2
bne.b exit
move.l buf,r0
rts.w

found: dc.b 0
flag: dc.b 0,$4c
buf:

Thomas M

Apr 25, 2002, 2:31:53 PM
In article <3cc7b...@mk-nntp-1.news.uk.worldonline.com>,
rlaw...@commanddoline.co.uk says...

Don't know if I'll be able to get to it today, but I'll post the results
once I do test it.

This actually raises another issue. My method for capturing the start
and end times of the job is very rudimentary. I just use:

time /t > StartTime.txt
time /t > EndTime.txt

Unfortunately, that only measures down to the minute. How would I
capture the time down to the second? Of course, if it really will
process the file in about 10 seconds, I can just time it with my watch.

> > 2) What exactly is a token? When I type "for /?" I get information that
> > includes:
> >
> > " Processing consists of reading in the file, breaking it up into
> > individual lines of text and then parsing each line into zero or more
> > tokens."
> >
> > But it never defines the term "token". Would I be correct in assuming
> > that a token is equivalent to a field of data, so that if a text file
> > has 84 fields per record, each line would have 84 tokens?
>
>
> You can think of tokens as fields (in other words, the info between delimiters).
> The FOR /F command allows you to specify the delimiter character(s) and the tokens
> of interest. It's probably NT's most powerful batch command; you'll find it used
> in virtually every other solution.

I gathered that from looking at the command line help yesterday. My
head was swimming looking at all those backslashes, percent signs, quote
marks, and asterisks. Put enough of those together (%%"*|@!#\) and you've
got cartoon cursing--which I do frequently when programming! :-)

> In addition to the CMD help, also see the GUI help, in NT4, click the Start menu and
> select help. If not already selected, choose the Contents tab, then double-click 'Windows NT Commands'. In W2K, Start | Help |
> Contents | Reference | MS-DOS Commands.

Good idea. I actually have that already bookmarked, but I just wasn't
thinking along the lines of GUI Help.

--

Phil Robyn

Apr 25, 2002, 2:39:51 PM
Thomas M wrote:
>
<<<snip>>>

> Don't know if I'll be able to get to it today, but I'll post the results
> once I do test it.
>
> This actually raises another issue. My method for capturing the start
> and end times of the job is very rudimentary. I just use:
>
> time /t > StartTime.txt
> time /t > EndTime.txt
>
> Unfortunately, that only measures down to the minute. How would I
> capture the time down to the second? Of course, if it really will
> process the file in about 10 seconds, I can just time it with my watch.

C:\cmd>grabtime read1041_1started_ && demo\read1041 && grabtime read1041_2ended_ && set read1041
:
:
:

read1041_1started_at=11:36:30.16
read1041_2ended_at=11:36:35.85

C:\cmd>currver grabtime qblist
===== begin file c:\cmd\TEST\grabtime.cmd =====
1. @echo off
2. for /f "tokens=5" %%a in ('echo.^|time^|find "current"') do set %1at=%%a
===== end file c:\cmd\TEST\grabtime.cmd =====
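
(Another option, added as an aside: on Windows 2000 and later, cmd also exposes a
dynamic %TIME% variable with centisecond resolution, so a bare-bones wrapper can be
as short as the lines below; on NT 4 the echo.|time|find trick above is the safer
bet.)

set started=%TIME%
call demo\read1041.cmd
echo started %started%, ended %TIME%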

<<<<snip>>>>


--
Phil Robyn
Univ. of California, Berkeley

Thomas M

Apr 25, 2002, 7:06:29 PM

Phil,

I've been studying your code. I've documented it below (line for line
in some places). I was wondering if you would be willing to look over
my comments and correct any misunderstandings I might have. My comments
are left justified and the code is indented.

==========
@echo off

Localizes environment variables.
setlocal

Initializes saverec_ctr.
set saverec_ctr=0

Searches the specified file for records containing "|1041|". When
findstr encounters a hit, the entire record is placed in %%a. Then
passes %%a to :loadfields.


for /f "tokens=*" %%a in (

'findstr /c:"|1041|" c:\temp\yourdata.txt'

) do call :loadfields "%%a"

Runs :display for each record that has been saved. Passes %%a to
:display.


for /l %%a in (1,1,%saverec_ctr%) do call :display %%a

goto :EOF

:loadfields

Initializes field_ctr.
set field_ctr=0

Fills rec with the record as a string.
set rec=%*

Replaces "|" with "\#" in the record.
set rec=%rec:|=\#%

Removes the quote marks from rec.
set rec=%rec:"=%

Removes the "#" from rec. At this point, the "|" has been replaced by
the "\" as the delimiter.
set this_rec=%rec:#=%

:loadloop

The "/a" signifies an expression. Increments field_ctr by 1.
set /a field_ctr+=1

Specifies that the entire record should be processed one token at a time
and that "\" is the field delimiter.


for /f "tokens=1* delims=\" %%a in (

'echo %rec%'
) do (

Sets one_field to the contents of the current field being processed by
the For loop.
set one_field=%%a

Sets rec to everything in the record after one_field.
set rec=%%b
)

Branch to :continue if field_ctr is not equal to 78. In other words, stop
processing the current field.


if %field_ctr% NEQ 78 goto :continue

If field_ctr = 78, remove the "#" from one_field.
set one_field=%one_field:#=%

Branch to :saverec if one_field = 1041.


if "%one_field%" EQU "1041" call :saverec

:continue

Continues the loop.


if defined rec goto :loadloop

goto :EOF

:saverec

Increments saverec_ctr by 1 and saves the records as saverec1, saverec2,
etc.
set /a saverec_ctr+=1
set saverec%saverec_ctr%=%this_rec%
goto :EOF

:display

Creates num as a dynamic variable (value changes for each saved record)
and echoes each record to the screen.
set num=%1
echo.
set saverec%num%
goto :EOF
==========

There are a couple of things I'm not quite sure about. I found it hard
to visualize the value of %%a at various points in the code. The way I
think it works is that the entire record is passed to %%a in the first
For loop, and that then gets passed to :loadfields. In :loadloop, %%a
contains the value of the current field being processed. Then, in the
final For loop that echoes the records to the screen, %%a is equal to
the iteration of the For loop and gets passed to :display where it
supplies that value for num. Is that about right?

Also, how does %%b work? I inserted an ECHO REC command at that point
and saw that the first field was getting truncated with each successive
iteration. I understand why that is required (because the FOR loop is
looking at the first token on each pass), but I don't understand HOW %%b
truncates the first field.

Thanks for all the help you've given me! I've learned a lot.

Phil Robyn

Apr 25, 2002, 7:45:29 PM


My comments are inserted below.

Or, if you wanted to use this same batch file to access other data fields
(not just the 78th field), you could do

set field_%field_ctr%=%%a

>
> Sets rec to everything in the record after one_field.
> set rec=%%b
> )
>
> Branch to :continue if field_ctr is not equal to 78. In others, stop
> processing the current field.
> if %field_ctr% NEQ 78 goto :continue
>
> If field_ctr = 78, remove the "#" from one_field.
> set one_field=%one_field:#=%
>
> Branch to :saverec if one_field = 1041.

Not 'branch' (i.e. 'goto'); CALL

> if "%one_field%" EQU "1041" call :saverec

Here, you could do what Ritchie did in his version and simply
compare if "%one_field%" EQU "#1041" ....

Then you would not have to remove the '#' (set one_field=%one_field:#=%)
before making the comparison.

>
> :continue
>
> Continues the loop.

Continues the loop IF rec still contains something (i.e., is still defined)

> if defined rec goto :loadloop
> goto :EOF
>
> :saverec
>
> Increments saverec_ctr by 1 and saves the records as saverec1, saverec2,
> etc.
> set /a saverec_ctr+=1
> set saverec%saverec_ctr%=%this_rec%
> goto :EOF
>
> :display
>
> Creates num as a dynamic variable (value changes for each saved record)
> and echoes each record to the screen.
> set num=%1
> echo.
> set saverec%num%
> goto :EOF
> ==========
>
> There are a couple of things I'm not quite sure about. I found it hard
> to visualize the value of %%a at various points in the code. The way I
> think it works is that the entire record is passed to %%a in the first
> For loop, and that then gets passed to :loadfields. In :loadloop, %%a
> contains the value of the current field being processed. Then, in the
> final For loop that echoes the records to the screen, %%a is equal to
> the iteration of the For loop and gets passed to :display where it
> supplies that value for num. Is that about right?

The '%%a' variables are 'transient' variables or 'pseudo-variables' that
only exist in the context of the immediate 'for' statements in which they
occur. Thus, the '%%a' variables in lines 6, 7, and 17 (see version with
line numbers above) are not related to each other in any way.

>
> Also, how does %%b work? I inserted an ECHO REC command at that point
> and saw that the first field was getting truncated with each successive
> iteration. I understand why that is required (because the FOR loop is
> looking at the first token on each pass), but I don't understand HOW %%b
> truncates the first field.

%%b doesn't truncate the first field. The key to understanding this statement

17. for /f "tokens=1* delims=\" %%a in (
18. 'echo %rec%'
19. ) do (
20. set one_field=%%a
21. set rec=%%b
22. )

is the 'tokens=1*'. There will be two tokens each time we look at 'rec'; the
first token (whatever comes before the first occurrence of the delimiter '\')
will be assigned to the transient variable %%a and then to the environment
variable one_field, and the second token, '*' which means everything that
comes after the first occurrence of the first delimiter '\', in other words,
the entire remaining content of rec, will be assigned to the transient variable
%%b and then to environment variable rec. So every time this statement is
executed, the leftmost data field (from the beginning of rec up to but not
including the first delimiter '\') is 'peeled off' and assigned to one_field,
and everything ('*') after the first delimiter is then reassigned to %%b and
then to rec. This way of doing it is slower than Ritchie's version, but it
provides the framework for accessing any of the data fields, not just the 78th
data field.
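
Here is a stripped-down illustration of that peel-off pattern (added for clarity;
dummy data, and none of the empty-field padding the real script needs):

@echo off & setlocal
set rec=alpha\beta\gamma\delta
:peel
for /f "tokens=1* delims=\" %%a in ("%rec%") do (
set one_field=%%a
set rec=%%b
)
echo %one_field%
if defined rec goto :peel

Run as a batch file, this prints alpha, beta, gamma and delta on separate lines,
with rec shrinking by one field per pass until it is no longer defined.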

>
> Thanks for all the help you've given me! I've learned a lot.

You're very welcome. I'm sure it will pay big dividends when you post your own
useful batch files to the newsgroup and/or help other people who come here with
questions of their own! Happy batching!

>
> --
>
> Please reply to the newsgroup so that all may learn
> from your wisdom.
>
> --Tom

--

Thomas M

Apr 25, 2002, 8:09:01 PM

I ran a test and it processed the file in 14 minutes. However, it did
not echo anything to the screen, and I believe it should have had a hit
on several hundred records. I will study your code tomorrow to see if I
can figure out how it works and why it didn't echo any records to the
screen.

--

Ritchie Lawrence

Apr 26, 2002, 4:27:37 AM
"Thomas M" <mis...@hotmail.com> wrote in message news:MPG.1732488d7...@News.CIS.DFN.DE...

Hi Tom,
This is how the script was tested. I copied the data you posted into notepad and
removed surplus CR/LF's. Resulting file had twenty records, each followed by CR/LF
(including record twenty). Each record had 84 fields. The field of interest was
the 78th. Total file size was 6755 bytes. Under W2K(SP2) there were five hits.

Since your post, I copied the data file onto itself multiple times until it was
29,598,859 bytes. The same O/S produced 21870 hits in 7.4 minutes (PII 333MHz)

The only NT4(SP6a) machine I have access to is a 486 with 32MB, but it also produced
five hits with the twenty record file.

Maybe with this info you can determine why we got different results.

--
Ritchie
Undo my address for email.


Thomas M

Apr 26, 2002, 6:19:41 PM
In article <3cc91...@mk-nntp-1.news.uk.worldonline.com>,

Considering where I'm at on the learning curve, I would trust your
results more than mine. Maybe I inadvertently pointed the code to the
wrong file. I did check that, but you know how sometimes you can look
at something over and over and see nothing wrong, yet when you come back
the next day, the error is obvious.

I got thrown a couple of ad hoc projects today, so I didn't end up
getting around to studying your code. I'll do that this weekend. Just
glancing at it, it looks like the code should work, so I'm pretty sure
that I must have goofed up my test somehow.

Thomas M

Apr 28, 2002, 8:51:12 PM
> "Thomas M" <mis...@hotmail.com> wrote in message news:MPG.1730f33cc...@News.CIS.DFN.DE...
> > In article <3cc68...@mk-nntp-1.news.uk.worldonline.com>,
> > rlaw...@commanddoline.co.uk says...
> > > ========================== start =============================
> > > @echo off&setlocal
> > > for /f "tokens=*" %%a in (dat.pip) do set rec="%%a"&call:go
> > > goto:eof
> > >
> > > :go
> > > set dat=%rec:|=|~%
> > > for /f "tokens=31* delims=|" %%a in (%dat%) do (
> > > for /f "tokens=31* delims=|" %%c in ("%%b") do (
> > > for /f "tokens=16 delims=|" %%e in ("%%d") do (
> > > if {%%e}=={~1041} echo %rec%
> > > )
> > > )
> > > )
> > > ======================= end =====================================

Ritchie,

I've had a chance to study your code, and I think I know how it works.
Please let me know if I have any of this wrong.

The first FOR statement puts the entire record into %%a and then puts
%%a into rec. When :go is called, the delimiter is changed from "|" to
"|~".

In the second FOR statement %%a gets set to the value of the 31st token.
By default, %%b gets set to everything that comes after the 31st token.
The third FOR statement sets %%c to the value of the 31st token in %%b,
or the 62nd token in the record. %%d is set by default to everything
after the 62nd token. The final FOR statement sets %%e to the value of
the 16th token in %%d, or the 78th token in the record, which is the
field we want to evaluate. Finally, if %%e = "~1041" it echoes rec to
the screen.

I have a couple of questions:

1) Why bother changing the delimiter? Is there a programming reason
for that, or is it just a personal preference because it makes the data
easier to look at?

2) In the :go routine, why not make the first FOR look directly at the
78th token? Is there a limitation of the FOR command that makes it
necessary to use the approach that you used, or is there a programming
reason that I'm missing?

3) In the IF statement you use { }. I didn't see anything in the GUI
Help or in the command line Help about using those brackets. Are those
required, or do they have a special syntax meaning?

--

Thomas M

Apr 28, 2002, 9:01:54 PM
In article <MPG.1733806bf...@News.CIS.DFN.DE>, misc516@hotmail.com says...

I've tested the code some more. It does indeed seem to work properly on
the sample data that I posted. However, the 29MB file makes it choke
for some reason. It won't echo anything back to the screen, and I see
messages like the following.

Run 1: The system cannot find the file 145zW.

Run 2: The system cannot find the file |||ůW?|????|||||||||||||||121.

I'm not sure what to make of that. It must be hitting some screwy data
in that file that it doesn't know how to process. Is there a way to do
a little error trapping so that I could maybe identify which record in
the file is tripping up the code? Maybe I could setup a counter and
just echo that to the screen on each iteration.

Another option might be for me to load the 29MB file into Access and
scan the data to see if there is anything obviously amiss, but it would
take a while to manually scan a file that size, and the process would be
open to human error.

Clay Calvert

Apr 28, 2002, 9:12:28 PM
On Sun, 28 Apr 2002 19:01:54 -0600, Thomas M <mis...@hotmail.com>
wrote:

>I've tested the code some more. It does indeed seem to work properly on

>the sample data that I posted. However, the 29MB file makes it choke
>for some reason. It won't echo anything back to the screen, and I see
>messages like the following.
>
> Run 1: The system cannot find the file 145zW.
>
> Run 2: The system cannot find the file |||ůW?|????|||||||||||||||121.
>
>I'm not sure what to make of that. It must be hitting some screwy data
>in that file that it doesn't know how to process. Is there a way to do
>a little error trapping so that I could maybe identify which record in
>the file is tripping up the code? Maybe I could setup a counter and
>just echo that to the screen on each iteration.

If the data contains <, > or &, then there is a good chance for
problems.

Sorry, I didn't reply earlier last week, but my home ISP was not
available for three days. You're definitely in good hands with Phil
and Ritchie.

Phil Robyn

Apr 29, 2002, 2:23:45 AM

You would probably also have problems if the data contains '(' or ')'.

If you have the QBASIC interpreter (part of standard NT 4.0 install,
not included with Win2000 but you can simply copy QBASIC.* from NT 4.0
\winnt\system32 to install it), you can try the following on your 29MB
file; it's MUCH faster than a batch file, and you still have access to
all of the data fields in each record:

=====begin c:\qbaspgms\read1041.bas====================
01. DIM datafields(84) AS STRING
02. DIM i AS INTEGER, ctr AS INTEGER
03. OPEN "c:\temp\yourdata.txt" FOR INPUT AS #1
04. OPEN "c:\temp\1041recs.txt" FOR OUTPUT AS #2
05. DO WHILE NOT EOF(1)
06. ctr = 0
07. LINE INPUT #1, rec$
08. saverec$ = rec$
09. DO WHILE INSTR(rec$, "|") > 0
10. ctr = ctr + 1
11. datafields(ctr) = LEFT$(rec$, INSTR(rec$, "|") - 1)
12. rec$ = MID$(rec$, INSTR(rec$, "|") + 1)
13. LOOP
14. IF datafields(78) = "1041" THEN
15. PRINT #2, saverec$
16. datafields(78) = ""
17. END IF
18. LOOP
19. CLOSE
20. SYSTEM
=====end c:\qbaspgms\read1041.bas====================

To run this from the cmd prompt:

start /min /wait qbasic /run c:\qbaspgms\read1041.bas

Ritchie Lawrence

Apr 29, 2002, 4:38:50 AM
> I've tested the code some more. It does indeed seem to work properly on
> the sample data that I posted. However, the 29MB file makes it choke
> for some reason. It won't echo anything back to the screen, and I see
> messages like the following.
>
> Run 1: The system cannot find the file 145zW.
>
> Run 2: The system cannot find the file |||ůW?|????|||||||||||||||121.
>
> I'm not sure what to make of that. It must be hitting some screwy data
> in that file that it doesn't know how to process. Is there a way to do
> a little error trapping so that I could maybe identify which record in
> the file is tripping up the code? Maybe I could setup a counter and
> just echo that to the screen on each iteration.
>
> Another option might be for me to load the 29MB file into Access and
> scan the data to see if there is anything obviously amiss, but it would
> take a while to manually scan a file that size, and the process would be
> open to human error.

Hi Thomas,
If you're not already doing so, I'd seriously consider Phil's QBasic code -
it's likely to save you a lot of time. Anyway, if you're still considering
the batch option or just curious which characters are causing problems,
below is some code with a counter. Hopefully it'll display record numbers
up to the crash. Maybe you could use Access to extract the records near the
crash point and post them.

I tried out a few combinations of known 'troublesome' characters. I got an
error using a ditto character ("), but the message was:-

'~XXXXXXX' is not recognized as an internal or external command,
operable program or batch file.

The error message you received can be generated by:-

for /f %%a in (no_such_file) do ...

Which leads me to think the problem is with a ditto character, but in a
position other than the ones I tested.

======================== start ===========================
@echo off&setlocal
set i=0
for /f "tokens=*" %%a in (yourdata) do set rec="%%a"&call:go
goto:eof

:go
set/a i+=1


set dat=%rec:|=|~%
for /f "tokens=31* delims=|" %%a in (%dat%) do (
for /f "tokens=31* delims=|" %%c in ("%%b") do (
for /f "tokens=16 delims=|" %%e in ("%%d") do (

if {%%e}=={~1041} (echo [%i%] %rec%) else (echo [%i%])
)
)
)
========================= end ===============================

Ritchie Lawrence

Apr 29, 2002, 5:48:48 AM
"Thomas M" <mis...@hotmail.com> wrote in message news:MPG.173646e6f...@News.CIS.DFN.DE...

I could not have put it better.

> I have a couple of questions:
>
> 1) Why bother changing the delimiter? Is there a programming reason
> for that, or is it just a personal preference because it makes the data
> easier to look at?

It's necessary so that empty fields are not discounted. Consider this data:

one|two|three

Once processed by FOR using | as delim, token1=one, token2=two etc... But...

one||three

token1=one, token2=three. Now substituting the delim for two characters...

one|~|~three

token1=one, token2=~, token3=~three

Any reasonable character could have been used instead of the tilde. In fact, I
wonder whether the tilde itself could cause problems, as it does carry some
special meaning in cmd.
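
If you want to see that in action, here's a tiny self-contained demo
(the data is just for illustration):

@echo off&setlocal
rem a record with an empty second field
set rec="one||three"
rem change every pipe to pipe-tilde so the empty field still yields a token
set dat=%rec:|=|~%
for /f "tokens=1-3 delims=|" %%a in (%dat%) do (
    echo token1=%%a   token2=%%b   token3=%%c
)
rem prints:  token1=one   token2=~   token3=~three
rem without the substitution, token2 would have been "three"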

> 2) In the :go routine, why not make the first FOR look directly at the
> 78th token? Is there a limitation of the FOR command that makes it
> necessary to use the approach that you used, or is there a programming
> reason that I'm missing?

AFAIA there is a limit of 32 tokens per FOR /F (and the '*' counts as one of
them), which is why field 78 can't be reached in a single pass.

> 3) In the IF statement you use { }. I didn't see anything in the GUI
> Help or in the command line Help about using those brackets. Are those
> required, or do they have a special syntax meaning?

Normally reserved for emoticons, they have no particular special meaning
here :} It's good practice to enclose variables when making comparisons, to
avoid 'XXX was unexpected at this time' errors when a variable is null (in
other words, undefined). Any reasonable character can be used. Dittos are the
usual choice, but they don't look as nice as wiggly brackets <g>.
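
A quick demo of the failure mode ('plan' is just an illustrative
variable name):

@echo off&setlocal
set plan=

rem with plan empty the unguarded form expands to  if ==1041 echo match
rem and cmd stops with an '... was unexpected at this time.' error:
rem     if %plan%==1041 echo match

rem wrapped in some character, both sides stay non-empty, so it's safe
if {%plan%}=={1041} (echo match) else (echo no match, or plan is undefined)

set plan=1041
if {%plan%}=={1041} (echo match) else (echo no match, or plan is undefined)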

Thomas M

unread,
Apr 29, 2002, 2:52:56 PM4/29/02
to
In article <3ccd1...@mk-nntp-1.news.uk.worldonline.com>,

Ah! Now I get it.

> > 2) In the :go routine, why not make the first FOR look directly at the
> > 78th token? Is there a limitation of the FOR command that makes it
> > necessary to use the approach that you used, or is there a programming
> > reason that I'm missing?
>
> AFAIA there is a limit of 32 tokens (the '*' counts as a token).

I figured it must be something like that. Computers often break things
into multiples of 8, so the use of "tokens=31*" suggested a limit of 32 to
me, but in retrospect it wouldn't make sense for the FOR statement itself
to carry that limit, so a cap on the number of tokens fits.

Thomas M

unread,
Apr 29, 2002, 3:08:23 PM4/29/02
to

Ritchie,

I've figured out the problem--it was user error on my part. The
solution just kind of leapt into my brain while I was in the shower this
morning. The sample data that I posted was from an admissions file,
whereas the 29MB file is a demographics file. The two files are very
similar, but have one important difference. The plan code that we are
keying on is not in the same place. In a demographics file it's the
40th field, not the 78th. That explains why your code ran perfectly for
the sample data I posted, but not for the 29MB file.

I actually DID think about this when I was testing Phil's code, and I
made the necessary adjustments to his code to account for the
difference, but for some reason, I just completely spaced it out when
testing your code.

I will still need your original code for processing the admissions file;
I just forgot to make the adjustment for the demographics file. I made
that adjustment this morning so that the code looks at the 40th token when
running against a demographics file, and it works just fine.
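
In case it helps anyone following along, the change is just to the token
arithmetic in the :go routine. Roughly, it looks like this (treat it as a
sketch; the input name is made up, and it reuses your pipe-to-pipe-tilde
substitution, walking 31 + 9 = 40 tokens instead of 31 + 31 + 16 = 78):

@echo off&setlocal
set i=0
for /f "tokens=*" %%a in (demodata.txt) do set rec="%%a"&call:go
goto:eof

:go
set/a i+=1
set dat=%rec:|=|~%
rem token 31, then token 9 of what's left, is field 40 (the plan code in
rem a demographics file)
for /f "tokens=31* delims=|" %%a in (%dat%) do (
    for /f "tokens=9 delims=|" %%c in ("%%b") do (
        if {%%c}=={~1041} (echo [%i%] %rec%) else (echo [%i%])
    )
)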

I apologize for all the confusion. I should have thought about this
right from the start. Now, if you'll excuse me, I have to go clean this
egg off my face!

Thomas M

unread,
Apr 29, 2002, 3:13:44 PM4/29/02
to
In article <3CCCE6F1...@uclink.berkzipeley.edu>,
pro...@uclink.berkzipeley.edu says...

Phil,

Thanks for this. As fortune would have it, I copied QBasic onto my
system just this weekend. Talk about a blast from the past! I remember
dabbling in QBasic back before the introduction of Windows 95. I'll
definitely give this a try.

FYI: I found the problem that I was having with Ritchie's code. It was
a rather embarrassing bit of user error on my part. I posted a follow-
up to his last message if you are interested.

Phil Robyn

unread,
Apr 29, 2002, 3:32:28 PM4/29/02
to

Hi, Tom,

I just read your follow-up. Looks like you _do_ need access to different
data fields after all. . . . =0)

CMD batch files are always going to have problems with data containing
the 'poison' character set (!<^>|~), and it is usually not worth the trouble
it takes to handle records containing these characters in CMD batch files.
The QBASIC program is just one example of using a different (and more appropriate)
language for problems of this type. I suggested QBASIC because it came
standard as part of NT4.0 and because you mentioned in a previous post that
you were considering using VB. Fortuitously, you have just copied it onto
your system. I also highly recommend Rexx (free from various sources),
especially Regina Rexx from sourceforge.net.
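
Just to illustrate one of those batch traps (the record below is made up):

@echo off&setlocal enabledelayedexpansion
rem made-up record containing two of the 'poison' characters
set rec="SMITH|JOHN|A1B2!C3|1041"

rem the lone ! is swallowed when the SET line above is parsed with delayed
rem expansion enabled, so the third field comes out as A1B2C3
echo !rec!

rem and if the surrounding dittos weren't kept in the value, the pipes would
rem be taken as pipe operators the moment the value was percent-expanded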

Thomas M

unread,
Apr 29, 2002, 7:35:21 PM4/29/02
to
In article <3CCD9FCC...@uclink.berkzipeley.edu>,
pro...@uclink.berkzipeley.edu says...

Yeah, I probably do need to use something more powerful than batch
files, but I thought that I could get the batch file option working more
quickly than QBasic or VB. However, since you posted the QBasic code, I
went ahead and tested it and it processed the 29MB demographics file in
about 30 seconds! I have compared the results with the results of your
original batch code and Ritchie's code, and all 3 methods return 559
records, which I then verified by pulling the raw text file into Access
and running a query on the plan code. So it looks like we've got good,
solid results.

I think I'll end up using the QBasic code simply because it runs faster,
but I certainly do not regret the time spent on working with the batch
commands because I have learned a lot, and that knowledge is actually
tying in to an unrelated project.

The only caveat I see to using QBasic is that my version is 1.0, which
does not include a compiler. Without compiled code, QBasic must be on
the machine that is processing the files. If that machine gets upgraded
to an OS that does not include QBasic, there could be a disruption in
the delivery of the files to our customers. I did download a freeware
compiler called FastBas, which I have not yet tested. Hopefully, I can
use that to compile the code into an .EXE, which will remove the need to
have QBasic on the machine that is running the process. I think that
would just make the whole process a little more reliable.

Phil Robyn

unread,
Apr 29, 2002, 7:49:48 PM4/29/02
to
Thomas M wrote:
>
> In article <3CCD9FCC...@uclink.berkzipeley.edu>,
> pro...@uclink.berkzipeley.edu says...

<<<<<snip>>>>>

> Yeah, I probably do need to use something more powerful than batch
> files, but I thought that I could get the batch file option working more
> quickly than QBasic or VB. However, since you posted the QBasic code, I
> went ahead and tested it and it processed the 29MB demographics file in
> about 30 seconds! I have compared the results with the results of your
> original batch code and Ritchie's code, and all 3 methods return 559
> records, which I then verified by pulling the raw text file into Access
> and running a query on the plan code. So it looks like we've got good,
> solid results.

That's great! Glad you were so thorough in using all three methods to
test and compare results.

>
> I think I'll end up using the QBasic code simply because it runs faster,
> but I certainly do not regret the time spent on working with the batch
> commands because I have learned a lot, and that knowledge is actually
> tying in to an unrelated project.

Yes, that usually happens. . . ;-)

>
> The only caveat I see to using QBasic is that my version is 1.0, which
> does not include a compiler.

There is no version of QBASIC with a compiler; it's interpreted only. The
one with the compiler is QuickBASIC, not QBASIC.

> Without compiled code, QBasic must be on
> the machine that is processing the files. If that machine gets upgraded
> to an OS that does not include QBasic, there could be a disruption in
> the delivery of the files to our customers. I did download a freeware
> compiler called FastBas, which I have not yet tested. Hopefully, I can
> use that to compile the code into an .EXE, which will remove the need to
> have QBasic on the machine that is running the process. I think that
> would just make the whole process a little more reliable.

Since you mentioned using VB in an earlier post, you could easily take
that QBASIC code and compile it in VB. Then you would pretty much have
a standalone EXE file (except the VB runtime module [DLL] must be present
on the machine you're running on).

Alternatively, just make sure that the three QBASIC.* files get copied to
\winnt\system32 during any upgrade, at least until 64-bit Windows anyway.
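
Something as simple as this in a post-upgrade checklist would do (the
share name is made up):

rem re-seed QBASIC from a network share after an OS rebuild
copy \\fileserver\tools\qbasic\QBASIC.* "%SystemRoot%\system32\"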

>
> --
>
> Please reply to the newsgroup so that all may learn
> from your wisdom.
>
> --Tom
