I have a file that contains 15,00,000 records. I run a SELECT command
on this file, which puts all the record IDs in an active list. Now I need
every 5,00,000 records to be written to a different file. So I wrote the
code below.
EXECUTE "SELECT MY.LARGE.FILE" TO SELECT.LIST
TOT.NO.RECS = @SELECTED
CHUNK.SIZE = 500000
TOT.NO.AGENTS = TOT.NO.RECS/CHUNK.SIZE
START.VALUE = 1 ;* First field of the first chunk
FOR AGENT.IDX = 1 TO TOT.NO.AGENTS
   R.WORK.FILE = FIELD(SELECT.LIST,FM,START.VALUE,CHUNK.SIZE)
   WRITE R.WORK.FILE TO F.DIFFERENT.FILE,AGENT.IDX
   START.VALUE = (CHUNK.SIZE * AGENT.IDX) + 1
NEXT AGENT.IDX
This 5,00,000 may vary according to the size of the file.
My question is this: I am building an array R.WORK.FILE which consists of
5,00,000 fields from the list SELECT.LIST.
Will this array overflow?
Is the code optimized?
Is there a better way?
Thanks
Malai
FNAME = 'MY.LISTS'
EXECUTE 'CREATE-FILE ':FNAME:' TYPE=UD' ;* Cannot be hashed file
OPEN FNAME TO FILE ELSE STOP
OPEN 'MY.LARGE.FILE' TO MYFILE ELSE STOP
SELECT MYFILE ;* Keeps in memory only a group at a time
CC.LIST = 0 ;* List number
CHUNK.SIZE = 500000 ;* Maximum number of records in each list
CC.ID = 0 ;* Record counter
MAX.SIZE = 10000 ;* Maximum record size
SIZE = 0 ;* Record size
LIST.NAME = ''
BUFF = ''
GOSUB NEW.LIST ;* Open a new list
LOOP
   READNEXT ID ELSE EXIT
   CC.ID += 1
   SIZE += LEN(ID) + 1
   BUFF<-1> = ID
   IF CC.ID = CHUNK.SIZE THEN
      GOSUB OUTPUT ;* Save end of list
      GOSUB NEW.LIST
   END ELSE
      IF SIZE > MAX.SIZE THEN GOSUB OUTPUT
   END
REPEAT
IF SIZE THEN GOSUB OUTPUT ;* At this point the
CRT 'Done. Last list is ':LIST.NAME
STOP
**********************************************************************
OUTPUT:
   WRITESEQ BUFF ON SQFILE ELSE STOP
   SIZE = 0
   BUFF = ''
   RETURN
**********************************************************************
NEW.LIST:
   IF CC.LIST THEN CLOSESEQ SQFILE
   CC.LIST += 1
   LIST.NAME = 'MY.LIST-':CC.LIST
   WRITE '' ON FILE, LIST.NAME ;* Create empty list
   OPENSEQ FNAME, LIST.NAME TO SQFILE ELSE STOP
   CC.ID = 0
   RETURN
END
Thanks for your effort.
But in my case F.DIFFERENT.FILE is a type J4 hashed file.
I think that in the end I will need to issue a COPY command to copy the
records to the hashed file.
Is there any other way?
DIM V(10000)
MAT V = ''
I = 1
J = 1
LOOP
   READNEXT ID ELSE EXIT
   V(I)<-1> = ID
   J += 1
   IF J > 50 THEN
      I += 1
      IF I * 50 >= CHUNK.SIZE THEN
         LIST.NAME = .....
         MATWRITE V ON FILE, LIST.NAME
         MAT V = ''
         I = 1
      END
      J = 1
   END
REPEAT
IF I > 1 OR J > 1 THEN
   LIST.NAME = ...
   MATWRITE V ON FILE, LIST.NAME
END
I have a few doubts here. How does SELECT.LIST handle 15,00,000
records? Is SELECT.LIST separated by FM? If it is, then I don't think
R.WORK.FILE is going to overflow, because SELECT.LIST already holds
FM-separated values for 15,00,000 records and R.WORK.FILE is only going
to hold 5,00,000 of them.
Correct me if I am wrong. There is one more command you can play around with:
SELECT FBNK.ACCOUNT SAMPLE 10
On its own this will select the same sample again and again, so it is
useful only if the processed records can be deleted from MY.LARGE.FILE
after each pass and the next SAMPLE select is then made on the same file.
Hope this helps ;-)
Best Regards,
Karthik Sekar
A few comments...
1) You will not overflow the array; that is purely a function of
memory handling.
2) This will be very slow. If the array SELECT.LIST has 15M records,
each with an ID of say 100 chars, then SELECT.LIST is 1500 MB in size.
Each time you execute a FIELD statement, jBASE has to parse up to
1500 MB to extract the required portion. You would get a faster result
using READNEXT VALUE and R.WORK.FILE := VALUE:FM (if you understand my
shorthand code).
3) It may even be faster to use a temp file, like this:
FOR AGENT.IDX = 1 TO TOT.NO.AGENTS
   FOR J = 1 TO CHUNK.SIZE
      READNEXT ID ELSE EXIT
      WRITE "X" ON F.TEMPFILE,ID
   NEXT J
   EXECUTE "SELECT TEMPFILE"
   EXECUTE "SAVE-LIST MYTEMP"
   DATA "(":DIFFERENT.FILE:" ":AGENT.IDX
   EXECUTE "COPY-LIST MYTEMP"
NEXT AGENT.IDX
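Item 2 above, spelled out as a minimal and untested sketch. It assumes
F.DIFFERENT.FILE has already been opened as in the original post, that a
whole chunk of IDs fits in one record, and it uses @FM as the field mark.
F.LARGE, REC.COUNT and WRITE.CHUNK are names made up for this illustration.

   OPEN 'MY.LARGE.FILE' TO F.LARGE ELSE STOP
   SELECT F.LARGE ;* Internal select, as in the earlier reply
   CHUNK.SIZE = 500000
   AGENT.IDX = 0 ;* Chunk number, used as the record ID
   REC.COUNT = 0 ;* IDs collected in the current chunk (hypothetical name)
   R.WORK.FILE = ''
   LOOP
      READNEXT ID ELSE EXIT
      R.WORK.FILE := ID:@FM ;* Append only, no FIELD() scan of the whole list
      REC.COUNT += 1
      IF REC.COUNT = CHUNK.SIZE THEN GOSUB WRITE.CHUNK
   REPEAT
   IF REC.COUNT THEN GOSUB WRITE.CHUNK ;* Last, partial chunk
   STOP
WRITE.CHUNK:
   AGENT.IDX += 1
   R.WORK.FILE = R.WORK.FILE[1, LEN(R.WORK.FILE) - 1] ;* Drop the trailing field mark
   WRITE R.WORK.FILE ON F.DIFFERENT.FILE, AGENT.IDX
   R.WORK.FILE = ''
   REC.COUNT = 0
   RETURN

Each ID is touched once here, so the cost grows with the number of IDs
rather than with the number of chunks times the size of the full list.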
A select list stores the selected IDs in a SAVEDLIST, not in an
array. That is:
The SAVEDLIST sits on the hard disk.
The ARRAY sits in your RAM.
R.WORK.FILE is an array which holds 5,00,000 values, so surely in some
cases it will exceed the limit.
Selecting and deleting samples is also expensive in time.
Let's consider what micke proposed.
Thanks
Malai
I really don't understand your EXECUTE statements. I tried some
sample code, as below.
MY.UD.FILE is a temp file which contains 5M records.
0001 PROGRAM TEST.LIST
0002
0003 DIFFERENT.FILE = "MY.J4"
0004 AGENT.IDX = "1"
0005 EXECUTE "SELECT MY.UD.FILE"
0006 EXECUTE "SAVE-LIST MY.LIST"
0007 DATA "(":DIFFERENT.FILE:" ":AGENT.IDX
0008 EXECUTE "COPY-LIST MY.LIST"
0009
0010 STOP
Could you explain lines 7 and 8 in more detail?
Thanks
Malai
In the same program, if I change line 7 to
0007 DATA "(":DIFFERENT.FILE
then this writes the list to my J4 file with the ID "MY.LIST".
I am running 4.1. Is there any change between 4.0 and 4.1?
Tks
Malai
I was just copying from the 4.1 manual that I had but this seems to be
telling porkies...
EXAMPLE 3
:COPY-LIST A.SALES
TO: (SALES.LISTS APRIL.SALES
List "A.SALES" written to record "APRIL.SALES" in file
"SALES.LISTS"
Copies A.SALES (a previously saved list) to record APRIL.SALES,
in file SALES.LISTS.
--------
So it would seem that you must save the original list under your AGENT.IDX
as its name and then COPY-LIST it, which would give you the same result.
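A minimal, untested sketch of that, reusing the test program from earlier
in the thread. It assumes, as reported above, that answering the TO: prompt
with only "(" and the file name makes COPY-LIST write the list under its
own name:

   DIFFERENT.FILE = "MY.J4"
   AGENT.IDX = "1"
   EXECUTE "SELECT MY.UD.FILE"
   EXECUTE "SAVE-LIST ":AGENT.IDX ;* The list name is the agent number
   DATA "(":DIFFERENT.FILE ;* Answers the TO: prompt of COPY-LIST
   EXECUTE "COPY-LIST ":AGENT.IDX ;* Record ID in MY.J4 should then be AGENT.IDX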
rgds
Mike