LIST vs ARRAY

Malai

Mar 7, 2006, 9:16:54 PM

to jBASE

Hi,

I have a file which contains 1,500,000 records. I run a SELECT command
on this file, which puts all the record IDs in an active list. Now I need
each batch of 500,000 record IDs to be written to a different file, so I
wrote the code below.

EXECUTE "SELECT MY.LARGE.FILE" TO SELECT.LIST
TOT.NO.RECS = @SELECTED

CHUNK.SIZE = 500000

TOT.NO.AGENTS = TOT.NO.RECS/CHUNK.SIZE

START.VALUE = 1 ;* First field of the first chunk
FOR AGENT.IDX = 1 TO TOT.NO.AGENTS
    R.WORK.FILE = FIELD(SELECT.LIST,FM,START.VALUE,CHUNK.SIZE)
    WRITE R.WORK.FILE TO F.DIFFERENT.FILE,AGENT.IDX
    START.VALUE = (CHUNK.SIZE * AGENT.IDX) + 1
NEXT AGENT.IDX

The chunk size of 500,000 may vary according to the size of the file.

My question is: I am building an array, R.WORK.FILE, which consists of
500,000 fields from the list SELECT.LIST.

Will this array overflow?
Is this code optimized?
Is there a better way?

Thanks
Malai

Lucian

Mar 7, 2006, 10:52:32 PM

to jBASE

Please be more specific about the key size.
If your keys are, let's say, 10 bytes each, the list will grow to about
5.5 MB, which is still manageable.
For long keys there is a better way to build the list.
So what is the key size: minimum, maximum, and average?
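
The 5.5 MB estimate above can be reproduced with a few lines of jBC. This is only a rough sketch: KEY.LEN is an assumed average key length, and the extra byte per key is the delimiter that separates IDs in the list.

```jbc
* Rough size estimate for one list of record IDs (sketch only)
CHUNK.SIZE = 500000                      ;* IDs per list, from the original post
KEY.LEN = 10                             ;* Assumed average key length in bytes
LIST.BYTES = CHUNK.SIZE * (KEY.LEN + 1)  ;* +1 for the delimiter after each key
CRT 'Approximate list size: ':LIST.BYTES:' bytes'
```

With 10-byte keys this works out to 5,500,000 bytes; changing KEY.LEN shows how the total scales for longer keys.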

Malai

Mar 7, 2006, 11:02:49 PM

to jBASE

The key size may vary from 10 to 20 characters.

Lucian

Mar 7, 2006, 11:58:36 PM

to jBASE

Malai,
The program below creates a directory-type file named MY.LISTS.
A directory-type file is necessary for OPENSEQ / WRITESEQ to
work.
The program then writes to this file every time the buffer
BUFF fills to about 10 KB worth of record IDs.
When it reaches the CHUNK.SIZE of 500,000 record IDs, it closes the current
list and opens a new one.
For OPENSEQ to work, the list record must already exist, so it is
created by writing a null record into MY.LISTS.

FNAME = 'MY.LISTS'
EXECUTE 'CREATE-FILE ':FNAME:' TYPE=UD' ;* Cannot be a hashed file
OPEN FNAME TO FILE ELSE STOP
OPEN 'MY.LARGE.FILE' TO MYFILE ELSE STOP
SELECT MYFILE ;* Keeps in memory only a group at a time

CC.LIST = 0 ;* List number
CHUNK.SIZE = 500000 ;* Maximum number of records in each list
CC.ID = 0 ;* Record counter
MAX.SIZE = 10000 ;* Maximum record size
SIZE = 0 ;* Record size
LIST.NAME = ''
BUFF = ''

GOSUB NEW.LIST ;* Open a new list

LOOP
    READNEXT ID ELSE EXIT
    CC.ID += 1
    SIZE += LEN(ID) + 1
    BUFF<-1> = ID
    IF CC.ID = CHUNK.SIZE THEN
        GOSUB OUTPUT ;* Save end of list
        GOSUB NEW.LIST
    END ELSE
        IF SIZE > MAX.SIZE THEN GOSUB OUTPUT
    END
REPEAT

IF SIZE THEN GOSUB OUTPUT ;* Flush any IDs still in the buffer
CRT 'Done. Last list is ':LIST.NAME
STOP

**********************************************************************
OUTPUT:
    WRITESEQ BUFF ON SQFILE ELSE STOP
    SIZE = 0
    BUFF = ''
    RETURN

**********************************************************************
NEW.LIST:
    IF CC.LIST THEN CLOSESEQ SQFILE
    CC.LIST += 1
    LIST.NAME = 'MY.LIST-':CC.LIST
    WRITE '' ON FILE, LIST.NAME ;* Create empty list
    OPENSEQ FNAME, LIST.NAME TO SQFILE ELSE STOP
    CC.ID = 0
    RETURN

END

Lucian

Mar 8, 2006, 12:04:41 AM

to jBASE

Sorry for the formatting; that's some Google feature.

Malai

Mar 8, 2006, 12:28:07 AM

to jBASE

Lucian,

Thanks for your effort.

But in my case F.DIFFERENT.FILE is a J4 hashed file.

I think that, in the end, I would need to issue a COPY command to copy the
records to the hashed file.

Is there any other way?

Lucian

Mar 8, 2006, 1:01:46 AM

to jBASE

Sorry, I don't know of any other solution.
One avenue is to use dimensioned arrays for storing the list while
processing.
For example, use the first program, but instead of doing BUFF<-1> = ID
use V(I)<-1> = ID.
Because each element of the vector V is processed in a linear way, it's
better to keep it within a few KB.
In this case 50 record IDs would total somewhere between 0.5 and 1 KB.
If memory serves me, the maximum number of elements in a dimensioned
array is 500,000, so 10,000 is manageable.
I don't know the inner workings of jBASE, but it seems to keep in
memory only the matrix elements it is working with, so the fact that the
matrix totals 6 to 11 MB should not be an issue.

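DIM V(10000)
MAT V = ''

I = 1 ;* Current element of V
J = 1 ;* Position of the next ID within V(I)

LOOP
    READNEXT ID ELSE EXIT

    V(I)<-1> = ID
    J += 1
    IF J > 50 THEN
        I += 1
        * I - 1 elements of 50 IDs each are now full
        IF (I - 1) * 50 >= CHUNK.SIZE THEN
            LIST.NAME = .....
            MATWRITE V ON FILE, LIST.NAME
            MAT V = ''
            I = 1
        END
        J = 1
    END
REPEAT

* Write whatever is left in a partly filled matrix
IF I > 1 OR J > 1 THEN
    LIST.NAME = ...
    MATWRITE V ON FILE, LIST.NAME
END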

Karthik Sekar

Mar 8, 2006, 3:46:13 AM

to jB...@googlegroups.com

Hi Malai,

I have a few doubts here about how SELECT.LIST handles 1,500,000
records: is SELECT.LIST separated by FM? If it is, then I don't think
R.WORK.FILE will overflow, because SELECT.LIST already holds FM-separated
values for 1,500,000 records and R.WORK.FILE will hold only 500,000 of them.

Correct me if I am wrong. There is one more command you can play around with:

SELECT FBNK.ACCOUNT SAMPLE 10

However, this will select the same sample again and again, so it is only
useful if the processed records in your MY.LARGE.FILE can be deleted after
each pass, so that consecutive sample selects can be made on the same file.

Hope this helps ;-)

Best Regards,
Karthik Sekar

mike ryder

Mar 8, 2006, 9:37:08 AM

to jBASE

Hi Malai,

A few comments...
1) You will not overflow the array; that is purely a function of
memory handling.
2) This will be very slow. If the array SELECT.LIST has 1.5M records,
each with an ID of say 100 chars, then SELECT.LIST is 150 MB in size.
Each time you do a FIELD statement jBASE has to parse through up to 150 MB
to extract the required portion. You would get a faster result using
READNEXT VALUE and R.WORK.FILE := VALUE:FM (if you understand my
shorthand code).
3) It may even be faster to have a temp file so that

FOR AGENT.IDX = 1 TO TOT.NO.AGENTS
    FOR J = 1 TO CHUNK.SIZE
        READNEXT ID ELSE EXIT
        WRITE "X" ON F.TEMPFILE,ID
    NEXT J
    EXECUTE "SELECT TEMPFILE"
    EXECUTE "SAVE-LIST MYTEMP"
    DATA "(":DIFFERENT.FILE:" ":AGENT.IDX
    EXECUTE "COPY-LIST MYTEMP"
NEXT AGENT.IDX
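
Point 2 above (READNEXT plus append, instead of repeated FIELD calls on one huge string) could be sketched roughly as follows. This is an illustration, not tested code from the thread: the target file name 'DIFFERENT.FILE', the counter CNT and the label FLUSH.CHUNK are assumptions made for the example, and it does not address whether a record of several MB suits the target file.

```jbc
* Sketch only: build each chunk by appending IDs as they are read
OPEN 'MY.LARGE.FILE' TO F.LARGE ELSE STOP
OPEN 'DIFFERENT.FILE' TO F.DIFFERENT.FILE ELSE STOP   ;* Assumed file name

CHUNK.SIZE = 500000
AGENT.IDX = 1        ;* Record ID of the chunk being built
CNT = 0              ;* IDs collected in the current chunk
R.WORK.FILE = ''

SELECT F.LARGE
LOOP
    READNEXT ID ELSE EXIT
    * Appending is cheap; the existing string is never re-scanned
    IF CNT = 0 THEN R.WORK.FILE = ID ELSE R.WORK.FILE := @FM : ID
    CNT += 1
    IF CNT = CHUNK.SIZE THEN GOSUB FLUSH.CHUNK
REPEAT
IF CNT THEN GOSUB FLUSH.CHUNK   ;* Write the final, partly filled chunk
STOP

FLUSH.CHUNK:
    WRITE R.WORK.FILE ON F.DIFFERENT.FILE, AGENT.IDX
    AGENT.IDX += 1
    CNT = 0
    R.WORK.FILE = ''
    RETURN
```

Because the final partial chunk is written after the loop, the number of chunks does not need to be computed up front from @SELECTED.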

Malai

Mar 8, 2006, 8:40:17 PM

to jBASE

Hi Karthik,

A select list stores the selected IDs in a saved list, not in an
array.

That is:
The saved list sits on the hard disk.
The array sits in RAM.

R.WORK.FILE is an array which holds 500,000 values; surely in some
cases it will exceed the limit.

Selecting and deleting samples is also expensive in time.

Let's consider what Mike proposed.

Thanks
Malai

Malai

Mar 8, 2006, 8:56:56 PM

to jBASE

Hi Mike,

I really don't understand your EXECUTE statements. I tried some
sample code, as below.
MY.UD.FILE is a temp file which contains 5M records.

0001 PROGRAM TEST.LIST
0002
0003 DIFFERENT.FILE = "MY.J4"
0004 AGENT.IDX = "1"
0005 EXECUTE "SELECT MY.UD.FILE"
0006 EXECUTE "SAVE-LIST MY.LIST"
0007 DATA "(":DIFFERENT.FILE:" ":AGENT.IDX
0008 EXECUTE "COPY-LIST MY.LIST"
0009
0010 STOP

Could you explain more about lines 7 and 8?

Thanks
Malai

Malai

Mar 8, 2006, 9:35:53 PM

to jBASE

Hi,

In the same program, if I change line 7 to

0007 DATA "(":DIFFERENT.FILE

then this writes the list to my J4 file with the ID "MY.LIST".

I am running 4.1; is there any change between 4.0 and 4.1?

Tks
Malai

mike ryder

Mar 9, 2006, 7:28:27 AM

to jBASE

Hi,

I was just copying from the 4.1 manual that I had, but it seems to be
telling porkies...

EXAMPLE 3
:COPY-LIST A.SALES
TO: (SALES.LISTS APRIL.SALES
List "A.SALES" written to record "APRIL.SALES" in file
"SALES.LISTS"
Copies A.SALES (a previously saved list) to record APRIL.SALES,
in file SALES.LISTS.

--------
So it would seem that you must save the original list under your AGENT.IDX
as its name and then COPY-LIST it, which would give you the same answer.
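
A minimal sketch of that workaround, assuming the behaviour Malai reported on 4.1 (COPY-LIST given only a target file stores the list under the saved list's own name). TEMPFILE, DIFFERENT.FILE and AGENT.IDX are the names used earlier in this thread, and the fragment is meant to sit inside the FOR AGENT.IDX loop shown before.

```jbc
* Sketch only: name the saved list after AGENT.IDX so that COPY-LIST
* files it under that ID in the target file
EXECUTE "SELECT TEMPFILE"
EXECUTE "SAVE-LIST ":AGENT.IDX    ;* Saved list is named 1, 2, 3, ...
DATA "(":DIFFERENT.FILE           ;* Answer to the TO: prompt: target file only
EXECUTE "COPY-LIST ":AGENT.IDX    ;* Record ID in the target file is the list name
```

This relies on the active select list carrying over from one EXECUTE to the next, as it did in Malai's test program.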

rgds
Mike
