Any ideas?
Kind regards,
Maurice
Maurice, It's a classic problem with a classic solution.
(I think I first read about it in a Hardy Boys book, in which
they were solving a crime by working with collections of
license plate numbers.) Just sort and search from tail
of list to front, deleting duplicates. Have fun. JohnH
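A minimal sketch of that scheme (my own variable names; assumes list is the TStringList and yourmemo receives the duplicates):

  list.Sort;
  // Walk from the tail so Delete never shifts the index
  // of an item we still have to examine.
  for i := list.Count - 1 downto 1 do
    if list[i] = list[i - 1] then
    begin
      yourmemo.Lines.Add(list[i] + ' is a duplicate');
      list.Delete(i);
    end;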
--
Brian Slack
http://www.depicus.com
"Wake On Lan" and "Remote Shutdown" Software
"...never was I so scared as when I saw a Formula 1 car controlled by Visual
Basic..."
"John Herbster" <herb...@swbell.net> wrote in message
news:3cf94925$1_1@dnews...
try
  yourstringlist.Add(somenewstring);
except
  on EStringListError do begin
    yourmemo.Lines.Add(somenewstring + ' is a duplicate');
  end;
end;
If you know that your TStringList is going to be that
large, it would probably be a good idea to set the
Capacity property as well, to avoid frequent reallocations.
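For instance (a sketch; 200,000 is the figure from the thread):

  yourstringlist.Capacity := 200000; // one allocation up front instead of repeated grows
  yourstringlist.Sorted := True;
  yourstringlist.Duplicates := dupError;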
Oz
Brian, Exceptions are expensive in terms of CPU cycles
and maybe memory fragmentation. However, you could
write your own insertion sort and avoid exceptions. But,
until I knew more about the need, like the expected
percentage of duplicates, I would go with the search-and-delete
scheme. Rgds, JohnH
Oz, I had forgotten about the Duplicates property.
Why don't you just set it to dupIgnore and then
do your Add()s. I don't think that Maurice wanted
to print messages about duplicates and if he did
there are faster ways than using exceptions to
detect a failure to add. Regards, JohnH
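The dupIgnore suggestion boils down to something like this (a sketch; GetNextString and inputCount are hypothetical stand-ins for wherever the strings come from):

  yourstringlist.Sorted := True;
  yourstringlist.Duplicates := dupIgnore;
  // Duplicate Adds are now silently discarded; no exception is raised.
  for i := 1 to inputCount do
    yourstringlist.Add(GetNextString);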
>Set the Duplicates property to dupError,
>the Sorted property to True,
>and the CaseSensitive property to true or false
>depending on whether "cat" is supposed to be
>treated as a duplicate for "Cat". Then your
>code boils down to...
>
>try
>  yourstringlist.Add(somenewstring);
>except
>  on EStringListError do begin
>    yourmemo.Lines.Add(somenewstring + ' is a duplicate');
>  end;
>end;
In my limited tests, this took about 50 seconds (32 if I just muted
the exception), while with the same stringlist source, my code took
about 3 seconds.
var
  src, tmp, outp: TStringList;
  s, so: string;
  i, lines: Integer;
  dupe, reported: Boolean;

src.Sort;
i := 0;
reported := false;
lines := src.Count;
while (i < lines) do
begin
  s := src[i];
  dupe := (i > 0) and (so = s); // so holds the previous unique string
  if not dupe then
  begin
    reported := false;
    tmp.Add(s);
    so := s;
  end
  else
  begin
    if not reported then
    begin
      outp.Add(s); // report each duplicate value only once
      reported := true;
    end;
  end;
  inc(i);
end;
(src is the input, tmp is the duplicate-free list, outp is the
stringlist that holds the duplicate values (I assumed Maurice only
wanted each string reported once); assign it to the memo when the loop
is done.)
- Asbjørn
var
  slItemList: TStringList;    // your list with 200,000 entries
  memDups: TMemo;             // your memo for duplicates
  slTempCleaned: TStringList; // temporary stringlist for cleaned items
  slTempDups: TStringList;    // temporary stringlist to store duplicates
  I: Integer;                 // iterator
// First, create the temporary string lists.
slTempDups := TStringList.Create;
slTempDups.Duplicates := dupIgnore;
slTempDups.Sorted := True;
slTempCleaned := TStringList.Create;
slTempCleaned.Duplicates := dupError;
slTempCleaned.Sorted := True;
try
  slTempDups.BeginUpdate;
  slTempCleaned.BeginUpdate;
  for I := 0 to Pred(slItemList.Count) do
  begin
    try
      slTempCleaned.Add(slItemList[I]);
    except
      on EStringListError do slTempDups.Add(slItemList[I]);
    end;
  end;
  slTempDups.EndUpdate;
  slTempCleaned.EndUpdate;
  slItemList.Assign(slTempCleaned);
  memDups.Lines.Assign(slTempDups);
finally
  slTempDups.Free;
  slTempCleaned.Free;
end;
Regards,
Dustin Campbell
Microsys Technologies, Inc.
On Sat, 1 Jun 2002 23:37:49 +0200, "M.Wasbauer" <m.was...@nedpro.nl>
wrote:
Only one big problem, deleting from the D5 and probably
other Delphi TStringLists can be *very* slow. The time
required appears to grow at about the 2.15 power of the
number of items in the list when deleting about 22% of
the items.
A solution is to create a new TStringList and Add
the items that are *not* to be deleted. This is very fast
compared to the sorting. The TStringList sort is
pretty fast (about 2000 CPU cycles per item when
doing about 200,000 items, each a string like
"234152"). --JohnH
You might try using Names[] and Values[]. These work like key/value pairs.
You can then say Add('First Item=1'); // where 1 means it's the first time
this item has been added to the list.
Then use IndexOfName('First Item') to see if it already exists - I imagine
this uses hashing techniques, so this would probably be faster than the
standard TStringList search, maybe.
You can iterate through the list using Names[i] to get what's on the left of
the = sign and Values[] gives you what's on the right.
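A sketch of that idea (sl and item are my own names; note that in classic Delphi a plain TStringList's IndexOfName is a linear scan, not a hash lookup, so for 200,000 items a sorted list may still win):

  if sl.IndexOfName(item) = -1 then
    sl.Add(item + '=1') // first time this item has been seen
  else
    yourmemo.Lines.Add(item + ' is a duplicate');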
Hope this helps,
sincerely,
John
"M.Wasbauer" <m.was...@nedpro.nl> wrote in message
news:3cf93ee2_2@dnews...
Hmmm...he said he wanted the duplicates added to a memo.
There are many solutions to this problem, and the "best"
one will be predicated on how the original input data
is obtained. In some cases presorting a list and copying
the unique items to another list might be preferable
as in LordCRC's solution.
I supplied a general solution where all the original
string data might not be available (e.g., it is arriving
a string at a time from a socket or is being entered by
a user from a form), so presorting isn't an option.
The general idea is to not add them in the first place rather
than go through all the machinery of adding only to delete
them later if they are a duplicate.
If the number of -expected- duplicates is small in relation to
the 200,000 input strings, it might be better to add the
duplicates to an auxiliary TStringList and just do a
Memo.Lines.Assign to get them to the memo.
Oz
On rereading the problem, I see he did!
Did you see my note about deleting from the D5 and
probably other Delphi TStringLists being *very* slow?
The time required appears to grow at about the 2.15
power of the number of items in the list when deleting
about 22% of the items.
I thought that deleting would just leave empty space at
the end of the list. But there is more to it than that.
Regards, JohnH
var
  cnt: Integer;

slTempCleaned.Duplicates := dupIgnore;
cnt := 0;
for I := 0 to Pred(slItemList.Count) do
begin
  slTempCleaned.Add(slItemList[I]);
  // With dupIgnore, a duplicate Add is silently discarded,
  // so an unchanged Count means the string was a duplicate.
  if cnt = slTempCleaned.Count then
    slTempDups.Add(slItemList[I])
  else
    cnt := slTempCleaned.Count;
end;