Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

TIntegerList?

253 views
Skip to first unread message

Jim Andrews

unread,
Sep 13, 2001, 12:32:33 AM9/13/01
to
Anybody know of a tIntegerList component that acts like a tStringList?
I need the sorted, unique, find, add (if unique), etc. functions of a
sorted tStringList but with Integers rather than strings. Don't want to
use inttostr() on the data as it may be a fairly large list and speed is
important.


M.H. Avegaart

unread,
Sep 13, 2001, 4:12:52 AM9/13/01
to
You can use a TList. TList stored pointer, but because pointers have the
same size as integers (both 4 bytes) you can typecast between them, e.g.

var
List: TList;
V: Integer;
begin
List := TList.Create;
try
List.Add(Pointer(1));
List.Add(Pointer(2));
List.Add(Pointer(3));
List.Add(Pointer(4));
V := Integer(List[3]);
finally
List.Free;
end;
end;


"Jim Andrews" <j...@azdogs.com> wrote in message
news:3BA036E1...@azdogs.com...

Jim Andrews

unread,
Sep 13, 2001, 4:43:47 AM9/13/01
to
I may be wrong but I didn't find the "sorted, unique and find" features that
I need to avoid adding a value more than once. I need to be able to try and
add integers to a list and only end up with a list with one of each possible
values. The data would be 2,000 to 20,000 unique values out of perhaps
millions of values.

Bjørge Sæther

unread,
Sep 13, 2001, 7:03:48 AM9/13/01
to
Hi !

Here's a couple of old classes of mine (can't remember if thorougly tested or
not...):
with TIntegerList, all you need to do is create it with Sorted=true. Whenever
you add an item or set an item, the list is sorted. I can see there may be a
use for a variable FUpdating, and methods BeginUpdate, EndUpdate, Updating to
prevent sorting when a lot of items are added or altered.

Enjoy !
-------------------

TSortedList = class(TList)
private
FCompare: TListSortCompare;
FSearch: TListSortCompare;
FDuplicates: TDuplicates;
function GetSorted: boolean;
procedure SetCompare(const Value: TListSortCompare);
function GetSearchable: boolean;
function InternalFind(PValue: Pointer; var Index: integer): boolean; //
Locates the Item using FCompare
public
constructor Create(ASortCompare: TListSortCompare; ASearchCompare:
TListSortCompare; ADuplicates: TDuplicates = dupIgnore);
function Add(Item: Pointer): integer; reintroduce;
procedure Insert(Index: integer; Item: Pointer); reintroduce;
function Find(PValue: Pointer; var Index: integer): boolean; // Locates
the Item using FSearch
procedure Put(Index: integer; Item: Pointer); reintroduce;
property Compare: TListSortCompare read FCompare write SetCompare;
property Search: TListSortCompare read FSearch write FSearch;
property Duplicates: TDuplicates read FDuplicates write FDuplicates;
property Items[index: integer]: Pointer read get write Put; default; //
necessary to get the autosort functionality
property Searchable: boolean read GetSearchable;
property Sorted: boolean read GetSorted;
end;

TIntegerList = class(TSortedList)
private
function GetIntegers(index: integer): integer;
procedure PutIntegers(index: integer; const Value: integer);
public
constructor Create(Sorted: boolean; ADuplicates: TDuplicates =
dupIgnore);
function Add(Item: integer): integer; reintroduce;
procedure Insert(Index: integer; Item: integer); reintroduce;
function Find(Value: Integer; var Index: integer): boolean;
reintroduce;// Locates the Item using FSearch
property Integers[index: integer]: integer read GetIntegers write
PutIntegers; default;
end;


implementation

{ TSortedList }

procedure DuplicateError(List: TObject);
begin
raise Exception.Create('Attempt to store duplicate values in
'+List.ClassName);
end;

function TSortedList.Add(Item: Pointer): integer;
begin
if Sorted then begin
if InternalFind(Item, result) then begin
case Duplicates of
dupIgnore:
begin
result:=-1;
exit;
end;
dupAccept: ;
dupError : DuplicateError(Self);
end;
end;
inherited Insert(result, Item);
end
else
result:=Inherited Add(Item);
end;

constructor TSortedList.Create(ASortCompare: TListSortCompare;
ASearchCompare: TListSortCompare; ADuplicates: TDuplicates = dupIgnore);
begin
inherited Create;
FCompare:=ASortCompare;
FSearch:=ASearchCompare;
FDuplicates:=ADuplicates;
end;

function TSortedList.Find(PValue: Pointer; var Index: integer): boolean;
var
AList: PPointerList;
L, H, I: Integer; C: Int64;
begin
Result := False;
Index:=0;
if not Searchable then exit;
AList:=List;
L := 0;
H := Count - 1;
while L <= H do
begin
I := (L + H) shr 1;
C := FSearch(AList^[I], PValue);
if C < 0 then
L := I + 1
else begin
H := I - 1;
if C = 0 then
Result := True;
end;
end;
Index := L;
end;

function TSortedList.InternalFind(PValue: Pointer; var Index: integer):
boolean;
var
AList: PPointerList;
L, H, I: Integer; C: Int64;
begin
Result := False;
Index:=0;
if not Searchable then exit;
AList:=List;
L := 0;
H := Count - 1;
while L <= H do
begin
I := (L + H) shr 1;
C := FCompare(AList^[I], PValue);
if C < 0 then
L := I + 1
else begin
H := I - 1;
if C = 0 then
Result := True;
end;
end;
Index := L;
end;

function TSortedList.GetSearchable: boolean;
begin
result:=Assigned(FSearch);
end;

function TSortedList.GetSorted: boolean;
begin
result:=Assigned(FCompare);
end;

procedure TSortedList.Insert(Index: integer; Item: Pointer);
begin
if Sorted then
Add(Item)
else
inherited Insert(Index, Item);
end;

procedure TSortedList.Put(Index: integer; Item: Pointer);
var
TmpIndex: integer;
begin
if Duplicates = dupError then begin
if Find(Item, TmpIndex) then begin
if (TmpIndex <> Index) then
DuplicateError(Self);
end;
end;
inherited Put(Index, Item);
if Sorted then begin
if ((Index > 0) and (FCompare(Items[Index-1], Item) > 0) or ((Index <
Count-1) and (FCompare(Items[Index+1], Item) < 0))) then
Sort(FCompare);
end;
end;


procedure TSortedList.SetCompare(const Value: TListSortCompare);
begin
FCompare := Value;
Sort(FCompare);
end;

{ TIntegerList }

function TIntegerList.Add(Item: integer): integer;
begin
result:=Inherited Add(Pointer(Item));
end;

function ComparePointerIntegers(Item1, Item2: Pointer): integer;
begin
result:=Integer(Item1) - Integer(Item2);
end;

constructor TIntegerList.Create(Sorted: boolean; ADuplicates: TDuplicates);
begin
inherited Create(nil, nil, ADuplicates);
if Sorted then begin
Compare:=ComparePointerIntegers;
Search:=ComparePointerIntegers;
end;
end;

function TIntegerList.Find(Value: Integer; var Index: integer): boolean;
begin
result:=Inherited Find(Pointer(Value), Index);
end;

function TIntegerList.GetIntegers(index: integer): integer;
begin
result:=Integer(Inherited Get(Index));
end;

procedure TIntegerList.Insert(Index, Item: integer);
begin
Inherited Insert(Index, Pointer(Item));
end;

procedure TIntegerList.PutIntegers(index: integer; const Value: integer);
begin
Inherited Put(Index, Pointer(Value));
end;

---------------------

--
Bjoerge Saether
Consultant / Developer
http://www.itte.no
Asker, Norway
bjorge@takethisaway_itte.no (remve the obvious)
"Jim Andrews" <j...@azdogs.com> skrev i melding
news:3BA036E1...@azdogs.com...

Jan Philips

unread,
Sep 13, 2001, 9:08:39 AM9/13/01
to
Jim Andrews <j...@azdogs.com> wrote:

>I may be wrong but I didn't find the "sorted, unique and find" features that
>I need to avoid adding a value more than once. I need to be able to try and
>add integers to a list and only end up with a list with one of each possible
>values. The data would be 2,000 to 20,000 unique values out of perhaps
>millions of values.

Here's another approach, which may be better. Set up a boolean
array of the size of the maximum value, or use TBits for a bit
array. Initialize it to zero or false, then go through the data
and set the corresponding boolean or bit to true or 1. Then a
quick pass through the array gives the unique values, in order.

Sundial Services International, Inc.

unread,
Sep 13, 2001, 11:39:24 AM9/13/01
to
The viability of this approach really depends on how "sparse" the matrix
is expected to be. If the data is clustered, then one approach is to
create a list of records that each contain (say):

baseValue: Longint;
bits: set of 0..255;

A search is made for a record with baseValue <= n <= baseValue+255, and
the corresponding bit in the "bits" set is [n-basevalue].

However, these techniques are all just alternatives ... it truly depends
on the expected characteristics of the data.


In article <clb1qtc12tdpj2vmm...@4ax.com>,

Jan Philips

unread,
Sep 13, 2001, 2:09:13 PM9/13/01
to
in...@sundialservices.com (Sundial Services International, Inc.)
wrote:

>The viability of this approach really depends on how "sparse" the matrix
>is expected to be.

Yes, well, we don't know what distribution to expect. Not
knowing that, I'd assume roughly uniform dist. The running time
is O(n+m) where n is the number of items and m is the maximum.
Space is O(m), but he said that was only perhaps a few million.
10,000,000 bytes isn't too bad, and 10,000,000 bits is even
smaller. I've used this approach several times.

Jim Andrews

unread,
Sep 13, 2001, 10:45:39 PM9/13/01
to
Am I right in assuming that an variable array of n int64 would be smaller
than an array o 8*n bytes since there would be 1/8 the number of pointers?
If that is true then it might not be bad to use Sundial's approach and read
an set the bits with and/or commands in a predefined [0..63] array =
(1,2,4,8,16.....).

The database is a pedigree database that lists sire & dam. Part of the
"math" includes looking back in an ancestor list that could be 2^n and
identifying the all of the ancestors or forward x^n identifying all of the
descendants where n = number of generations and x = number of offspring in a
single generation. Since some dogs are very prolific, called matador
breeding, I have see x values in the hundreds.

Since the databases tend to be less than 60,000 dogs or so using Sundial's
approach with int64 values would only be about 1,000 int64 (8KB) and would be
very manageable vs a variable array of 60,000 integers (4 bytes data & 4
bytes pointer?). Would making an assumption that it would not exceed 120,000
records and using a [0..2000] array of integers be faster? I assume that
variable arrays use a pointer to each value.

Should I only use [0..62] to avoid negative number problems for the and/or
mask?

I know that this doesn't seem like it should be a speed problem but this is
only part of the calculation and often I need to do it for the entire
database so it may be repeated 60,000 times.

...Jim

Jim Andrews

unread,
Sep 13, 2001, 11:32:23 PM9/13/01
to
Answering my own question but as long as I am very sure that the max value of n <
262,143 (expected max < 100,000) thoughts on the following? Since I am setting
and reading bits will I see any problem with int64 being a signed integer?

This should be an incredibly fast answer to my problem and 32 Kbytes for the
array is certainly not a problem.

var
bits : array[0..63] of int64;
vIsUsed : array[0..4095] of int64;

procedure setbits;
begin
bits[0] := 1;
for j := 1 to 63 do
bits[j] := bits[j-1] shr 1;
end;

procedure clearused; // big advantage of fixed length array
begin
fillchar(visused,sizeof(visused),0);
end;

procedure AddUsed(n : integer);
begin
visused[n shr 5] := visused[n shr 5] or bits[n and 63];
end;

function isused(n : integer) : boolean;
begin
result := visused[n shr 5] and bits[n and 63];
end;

Jim Andrews

unread,
Sep 13, 2001, 11:38:57 PM9/13/01
to
oops,

Two small code changes but logic the same

procedure setbits;
var
j : integer; // needs to have a loop variable


begin
bits[0] := 1;
for j := 1 to 63 do
bits[j] := bits[j-1] shr 1;
end;

procedure clearused;
begin
fillchar(visused,sizeof(visused),0);
end;

procedure AddUsed(n : integer);
begin
visused[n shr 5] := visused[n shr 5] or bits[n and 63];
end;

function isused(n : integer) : boolean;
begin

result := visused[n shr 5] and bits[n and 63] = 1; // forgot the = 1
end;

Jan Philips

unread,
Sep 14, 2001, 3:08:06 PM9/14/01
to
Jim Andrews <j...@azdogs.com> wrote:

>The database is a pedigree database that lists sire & dam. Part of the
>"math" includes looking back in an ancestor list that could be 2^n and
>identifying the all of the ancestors or forward x^n identifying all of the
>descendants where n = number of generations and x = number of offspring in a
>single generation. Since some dogs are very prolific, called matador
>breeding, I have see x values in the hundreds.

I don't fully understand what you're doing, but some other type
of data structure might be better.

>Since the databases tend to be less than 60,000 dogs or so using Sundial's
>approach with int64 values would only be about 1,000 int64 (8KB) and would be
>very manageable vs a variable array of 60,000 integers

You said that the maximum value was a several million. A
boolean array for up to 10,000,000 would take only 10,000,000
bytes. If you use the bit array, that would be a tine 1.25
million bytes. Assuming you're using Delphi 2 or later, and a
computer with at least 64MB, 10 MB of data is OK.

> Would making an assumption that it would not exceed 120,000
>records and using a [0..2000] array of integers be faster?

It may depend on the data. The boolean array is probably the
fastest, but it uses the most memory. But if you have enough
memory ...

>I know that this doesn't seem like it should be a speed problem but this is
>only part of the calculation and often I need to do it for the entire
>database so it may be repeated 60,000 times.

I probably need a clearer picture of what the program is to
accomplish.

Jan Philips

unread,
Sep 14, 2001, 3:11:07 PM9/14/01
to
Jim Andrews <j...@azdogs.com> wrote:

>This should be an incredibly fast answer to my problem and 32 Kbytes for the
>array is certainly not a problem.

It looks to me like TBits would do the same thing, and free you
from having to write the overhead stuff.

Emil Beli

unread,
Sep 14, 2001, 8:29:37 PM9/14/01
to
In article <3BA036E1...@azdogs.com>, j...@azdogs.com says...

Try TRPCBuffer from my site. Even if it is a database-like with full
features, it is still around 10 times faster than any list and just
above 2 times faster then operating with pure String.
If a performance is crucial, that is the choice.

Note: 4 days ago I needed to make a large query cursor in memory
executing many times. Locating records with locate (and process) took
around 4273 ms for the whole process while everything with Buffer
locating did it in 771 ms.

--
Best regards,
Emil Beli
http://www.greenhousemm.com

0 new messages