Removing array elements...


Pete

Jul 20, 2021, 1:08:24 PM
to Harbour Users
Hello,

Let's say we have an array with filenames.
Our array is created/populated by using hb_DirScan()
and might be quite a big array! ;-)
f.e.:
    
    aFiles := hb_DirScan( "C:\", "*.*" ) // --> (hundreds of thousands of files)
    
Now we want to filter out (remove from the array) all the files with the extension ".bak",
and the question is: what's the most efficient (or the not so efficient) way to do that?

regards,
Pete

David Field

Jul 20, 2021, 2:48:01 PM
to Harbour Users
Pete,

aFiles := hb_DirScan( "C:\", "*.*" ) // --> (hundreds of thousands of files)
nPosition := 1
While nPosition > 0
   nPosition := hb_AScan(aFiles, <SearchValue>, nPosition,  [<nCount>], [<lExact>]) 
   If nPosition > 0
      hb_aDel(aFiles,  nPosition, .T.)
   Endif
Enddo

Cheers,
David Field

Pete

Jul 20, 2021, 3:25:02 PM
to Harbour Users
Hi David,

Thanks for the reply!
Yes, your code is more or less what I was thinking about.
The only problem is that aFiles (as returned by hb_DirScan())
is a multi-dimensional array, so hb_AScan() can't be applied to it directly.
We would have to split it. And that's the challenge: can we avoid the split or not?

regards,
Pete

Klas Engwall

Jul 20, 2021, 4:40:28 PM
to harbou...@googlegroups.com
Hi Pete,

Replacing <SearchValue> with a codeblock should take care of that problem:

nPosition := hb_AScan( aFiles, { |aVal| ".bak" $ lower( aVal[ F_NAME ] ) }, nPosition )

Whether it is efficient with hundreds of thousands of files is a
different question :-)

Regards,
Klas


> --
> --
> You received this message because you are subscribed to the Google
> Groups "Harbour Users" group.
> Unsubscribe: harbour-user...@googlegroups.com
> Web: http://groups.google.com/group/harbour-users
>
> ---
> You received this message because you are subscribed to the Google
> Groups "Harbour Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to harbour-user...@googlegroups.com
> <mailto:harbour-user...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/harbour-users/19685df2-04dc-4356-8c9e-7c76facd1425n%40googlegroups.com
> <https://groups.google.com/d/msgid/harbour-users/19685df2-04dc-4356-8c9e-7c76facd1425n%40googlegroups.com?utm_medium=email&utm_source=footer>.

Pete

Jul 20, 2021, 5:23:20 PM
to Harbour Users

Hi Klas,

On Tuesday, 20 July 2021 at 23:40:28 UTC+3 Klas Engwall wrote:
> Replacing <SearchValue> with a codeblock should take care of that problem:
>
> nPosition := hb_AScan( aFiles, { |aVal| ".bak" $ lower( aVal[ F_NAME ] ) }, nPosition )

Yes and yes! That's the missing piece, which I was sure
had been lost in the mists of the past (read "I had forgotten it for good").
Thank you very much for the reminder.

> Whether it is efficient with hundreds of thousands of files is a
> different question :-)

Well, I suppose (and hope) there won't usually be that many.
Most probably tens of thousands, or just a few thousand.
I just felt the need to give the case a bit of a dramatic tone. ;-)
Anyway, I'm going to test it tomorrow morning and see ...

regards,
Pete

Klas Engwall

Jul 20, 2021, 6:25:33 PM
to harbou...@googlegroups.com
Hi Pete,

Oh, by the way, you should probably adjust the condition a little or
petes.bakery.jpg will be lost :-)

Regards,
Klas
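
The difference between a substring test and an extension test can be sketched as follows. This is only an illustration with made-up file names; it relies on hb_FNameExt() returning the extension including the leading dot, as used elsewhere in this thread.

```harbour
PROCEDURE Main()

   LOCAL cName

   // sample names are invented for the illustration
   FOR EACH cName IN { "notes.bak", "petes.bakery.jpg", "OLD.BAK" }
      // 2nd column: substring test, 3rd column: exact extension test
      ? cName, ".bak" $ Lower( cName ), Lower( hb_FNameExt( cName ) ) == ".bak"
   NEXT

   RETURN
```

The substring test is true for all three names, while the extension test is true only for notes.bak and OLD.BAK, so petes.bakery.jpg survives.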

Pete

Jul 21, 2021, 5:44:18 AM
to Harbour Users
On Wednesday, 21 July 2021 at 01:25:33 UTC+3 Klas Engwall wrote:
> Hi Pete,
>
> Oh, by the way, you should probably adjust the condition a little or
> petes.bakery.jpg will be lost :-)

... not to mention that this "generic" condition, besides losing `petes.bakery.jpg`,
might also send `The.baKlast.Harbourian.gif` into the galaxy's central black hole;
far too many grievous losses to allow! ;-)

what about :
----------------------------------------------
            cExt := Lower( hb_FNameExt( cX ) )  // extension to filter out
            nLen := Len( cExt )
            WHILE (nPos := hb_AScan( aList, {|f| cExt == Right( Lower( f[1] ), nLen) } )) > 0
                aList := hb_ADel( aList, nPos, .T. )
            END

----------------------------------------------
But now our filter suffers the overhead of the repeated Right() calls; I don't see how that could be avoided, though.
What makes me skeptical is something else: this WHILE ... END loop means that the array is scanned
and rebuilt (delete/resize) as many times as there are instances of the extension we are filtering out.
Perhaps we could attempt to do the trick in one pass over the array?
I'm going to try and see whether it is feasible or not...
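
One possible shape for such a single pass is the read/write index sketch below. It is not code from this thread: StripExt() is a hypothetical helper name, the sample rows are invented, and it assumes each row holds the file name at position F_NAME, as in the arrays returned by hb_DirScan().

```harbour
#include "directry.ch"   // F_NAME

PROCEDURE Main()

   // invented sample rows; real rows from hb_DirScan() carry more columns
   LOCAL aFiles := { { "a.txt" }, { "b.bak" }, { "c.BAK" }, { "d.prg" } }

   StripExt( aFiles, ".bak" )
   ? Len( aFiles )          // two rows are left: a.txt and d.prg

   RETURN

// Hypothetical helper: keeps every row whose extension differs from cExt.
// Survivors are moved to the front, then the array is resized once.
STATIC FUNCTION StripExt( aList, cExt )

   LOCAL nRead
   LOCAL nWrite := 0

   cExt := Lower( cExt )

   FOR nRead := 1 TO Len( aList )
      IF !( Lower( hb_FNameExt( aList[ nRead ][ F_NAME ] ) ) == cExt )
         aList[ ++nWrite ] := aList[ nRead ]
      ENDIF
   NEXT

   ASize( aList, nWrite )

   RETURN aList
```

Each element is examined once and the array is resized once at the end, instead of being shifted and shrunk for every match.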

regards,
Pete

David Field

Jul 21, 2021, 1:54:00 PM
to Harbour Users
Hi Pete,

The biggest problem you have with:
 WHILE (nPos := hb_AScan( aList, {|f| cExt == Right( Lower( f[1] ), nLen) } )) > 0
is that every hb_AScan() call searches from the beginning of the array; that is why I put nPosition in bold in my first example.

You can check hb_A · Petewg/harbour-core Wiki · GitHub for parameter options to hb_Ascan

Maybe try:
----------------------------------------------
            cExt := Lower( hb_FNameExt( cX ) )  // extension to filter out
            nLen := Len( cExt )
            nPos := 1
            WHILE (nPos := hb_AScan( aList, {|f| cExt == Lower(Right( f[1], nLen))}, nPos )) > 0
                aList := hb_ADel( aList, nPos, .T. )
            END
----------------------------------------------

Changes:
1. Specify the beginning of the search in ASCAN
2. Change the order of Right(Lower(...   to Lower(Right(... since performing a lower to a small number of characters should be faster than to a large number of characters and we are looking for speed!

Cheers,
David Field

Klas Engwall

Jul 21, 2021, 4:49:23 PM
to harbou...@googlegroups.com
Hi Pete,

> what about :
> ----------------------------------------------
>            cExt := Lower( hb_FNameExt( cX ) )  // extension to filter out
>             nLen := Len( cExt )
>             WHILE (nPos := hb_AScan( aList, {|f| cExt == Right( Lower(
> f[1] ), nLen) } )) > 0
>                 aList := hb_ADel( aList, nPos, .T. )
>             END
> ----------------------------------------------

Ouch, f[1], I can smell a magic number there. Give me a minute to
recover from the shock ... :-)

> but now our filter suffers the overhead of the repeated Right() calls. I
> see not how it could be avoided, though.

Yes, I am sure it is very expensive when applied to a large array, but I
can't see either how to avoid it.

> then, what makes me skeptical is another thing: this WHILE ... END loop
> means that the array is parsed
> and reconstructed  (delete/resize) so much times as the number of the
> extension instances we try to filter out.
> Perhaps we could attempt to do the trick in one array pass?
> I'm going to try and see whether it is feasible or not...

Maybe AEval() could be of help there?

One thing you haven't mentioned yet is what you plan to do with the
array after you have cleaned it. Would it be possible to replace the
sub-arrays that you don't want to keep with NIL, for example, and then
ignore the NIL elements in whatever your next step is?

Another thought might be to skip Directory() altogether and replicate
what it does using FileFindFirst() and FileFindNext() from
contrib\hbmisc\ffind.c and check with FileFindName() if the name of each
file found meets your non-".bak" criteria. After all, the combination of
FileFindFirst() and FileFindNext() does the same thing as Directory()
does regarding the search. But you would have to pick up the file names
etc yourself and build the array. You can pre-allocate empty array
elements in chunks of 100 or 1000 or whatever you like instead of
AAdd()ing each element separately. There will of course be a lot of
calls to HB_FUNC level functions, but there will be no need to delete
elements and shrink the array. You would have to first build an array of
directory names using hb_DirScan(). In the end, I am not sure if all the
work will pay off, but at least it should be doable :-)

Regards,
Klas

Auge & Ohr

Jul 21, 2021, 9:19:15 PM
to Harbour Users
Hi,

   aList := hb_ADel( aList, nPos, .T. )

Is hb_ADel() the same as hb_arrayDel()?
If so, as far as I can tell it does NOT "remove" the item; the slot just becomes NIL.

Use ADel() instead:
   ADel( <aArray>, <nPos> )

Jimmy
P.S. Why not "include" matching files instead of "removing" them?

      aDir := HB_DirScanM( cPath, { "*.bmp", "*.jpg", "*.jpeg", "*.png", "*.gif", "*.tif", "*.tiff", "*.wmf", "*.emf" }, "", .F. )
      aDir := ASORT( aDir,,, { | aX, aY | aX[ 1 ] < aY[ 1 ] } )

// Code from EDK ( Edward ), user of the HMG Forum
// Searches for an array of extensions ( optional attribute type, recursive )
//
#include "common.ch"    // DEFAULT ... TO
#include "directry.ch"  // F_NAME, F_ATTR

FUNCTION hb_DirScanM( cPath, aFileMask, cAttr, lrecursiv )
RETURN hb_DoScanM( hb_DirSepAdd( hb_defaultValue( cPath, "" ) ), ;
                   IIF( HB_ISARRAY( aFileMask ), aFileMask, IIF( HB_ISSTRING( aFileMask ), { aFileMask }, { hb_osFileMask() } ) ), ;
                   hb_defaultValue( cAttr, "" ), ;
                   hb_ps(), lrecursiv )

FUNCTION hb_doScanM( cPath, aMask, cAttr, cPathSep, lrecursiv )
LOCAL aFile
LOCAL lMatch
LOCAL aResult := {}

   DEFAULT lrecursiv TO .T.

   FOR EACH aFile IN hb_vfDirectory( cPath + hb_osFileMask(), cAttr + IF( lrecursiv, "D", "" ) )
      lMatch := .F.
      AEVAL( aMask, { | x | IIF( HB_ISSTRING( x ), lMatch := hb_FileMatch( aFile[ F_NAME ], x ) .OR. lMatch, Nil ) } )
      IF "D" $ aFile[ F_ATTR ]
         IF lMatch .AND. "D" $ cAttr
            IF lrecursiv = .T.
               AADD( aResult, aFile )
            ENDIF
         ENDIF
         IF !( aFile[ F_NAME ] == "." .OR. aFile[ F_NAME ] == ".." .OR. aFile[ F_NAME ] == "" )
            AEVAL( hb_DoScanM( cPath + aFile[ F_NAME ] + cPathSep, aMask, cAttr, cPathSep ), ;
                   { | x | x[ F_NAME ] := aFile[ F_NAME ] + cPathSep + x[ F_NAME ], ;
                   AADD( aResult, x ) } )
         ENDIF
      ELSEIF lMatch
         AADD( aResult, aFile )
      ENDIF
   NEXT

RETURN aResult


Appliserver

Jul 22, 2021, 1:04:03 PM
to harbou...@googlegroups.com

Hi Pete, try this:

procedure main
  setmode(25,80)
  clear
  aDir := DIRECTORY("v:\sorgenti\hmg\test\*.*", "HSD")
  adirlen:=len(aDir)
  nDeleted:=0
  altd()
  ? "aDir, elements:"+str(adirlen)
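  // this does not work: x is the block's own copy of the element, so x:=nil
  // only changes the block parameter and leaves aDir untouched
  //aeval(aDir,{|x|iif(lower(right(x[1],4))==".bak",x:=nil,nil)} )
  aeval(aDir,{|x,y|mydel(y)} )
  ? "deleted:"+str(nDeleted)
  // caution: asize() cuts from the end, so this is only right if the
  // nil entries are the last ones in the array
  asize(aDir,adirlen-nDeleted)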
  ? "new size:"+str(len(aDir))
  wait
return

function mydel(n)
  if lower(right(aDir[n,1],4))==".bak"
    ? aDir[n,1]
    aDir[n]=nil
    ++nDeleted
  endif
return nil
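
Why the commented-out AEval() line has no effect can be seen in a small sketch with invented data. It assumes only standard behaviour: the codeblock receives each element as its own parameter, so assigning to that parameter does not touch the array, while assigning through the array and the index does.

```harbour
PROCEDURE Main()

   // invented sample rows, file name in the first column
   LOCAL aDir := { { "a.bak" }, { "b.txt" } }

   // Assigning to the block parameter only rebinds the block's own copy.
   AEval( aDir, {| x | x := NIL } )
   ? HB_ISARRAY( aDir[ 1 ] )      // still an array: nothing was cleared

   // Assigning through the array and the index does change the element.
   AEval( aDir, {| x, n | iif( Lower( hb_FNameExt( x[ 1 ] ) ) == ".bak", aDir[ n ] := NIL, NIL ) } )
   ? HB_ISNIL( aDir[ 1 ] ), HB_ISARRAY( aDir[ 2 ] )

   RETURN
```

After the second AEval() the first element is NIL and the second is still a row, which is the same effect mydel() achieves through the index.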

Dan

Pete

Jul 22, 2021, 5:08:32 PM
to Harbour Users
Hi David,


On Wednesday, 21 July 2021 at 20:54:00 UTC+3 david...@gmail.com wrote:
> You can check hb_A · Petewg/harbour-core Wiki · GitHub for parameter options to hb_Ascan

So you think, huh? OK, I'll probably follow the link, and maybe I'll try to get in touch with the "author" for details,
although it isn't easy to find anybody at their desk at the end of July. :-)

> Changes:
> 1. Specify the beginning of the search in ASCAN
> 2. Change the order of Right(Lower(...   to Lower(Right(... since performing a lower to a small number of characters should be faster than to a large number of characters and we are looking for speed!

You are perfectly right on both points! And it is inexcusable that I didn't read your first post more carefully.
Many thanks!

regards,
Pete

Pete

Jul 22, 2021, 5:32:24 PM
to Harbour Users
Hi Klas,
On Wednesday, 21 July 2021 at 23:49:23 UTC+3 Klas Engwall wrote:
> Ouch, f[1], I can smell a magic number there. Give me a minute to
> recover from the shock ... :-)

Good old "nitpicking". I wish to assure you that it was not totally unintentional: half of it
was due to laziness (saving some typing), the other half was a sneaky attempt to
trigger such a nitpicking reaction from any respectable perfectionist user. ;-)

> Maybe AEval() could be of help there?

Already went for it. What sometimes makes me reluctant to use AEval() is the difficulty
of escaping from it once it's fired. Anyway, I have found a way to do it.

> One thing you haven't mentioned yet is what you plan to do with the
> array after you have cleaned it.

Well, in fact I do not know what the potential user (me or anyone else) may want to do with the array.
What I'm trying to do is refactor/refine an old, simple Directory Listing Class [1], aiming firstly at speed
and secondly at gathering some useful info about the listed files/directories (the second without a noticeable
loss of the first). In the next few days I'll try to be back, hopefully, with a first sample of the "product" :-).

[1] ... it's almost certain that I am trying to re-invent a wheel here,
    but even re-inventions keep the "art of doing" alive, I think
    (all the more so when "the doing" is done using our
    excellent Harbour toolset).

> Would it be possible to replace the
> sub-arrays that you don't want to keep with NIL, for example, and then
> ignore the NIL elements in whatever your next step is?

I don't think so.

> Another thought might be to skip Directory() altogether and replicate
> what it does using FileFindFirst() and FileFindNext() from
> contrib\hbmisc\ffind.c and check with FileFindName() if the name of each
> file found meets your non-".bak" criteria. After all, the combination of
> FileFindFirst() and FileFindNext() does the same thing as Directory()
> does regarding the search. But you would have to pick up the file names
> etc yourself and build the array. You can pre-allocate empty array
> elements in chunks of 100 or 1000 or whatever you like instead of
> AAdd()ing each element separately. There will of course be a lot of
> calls to HB_FUNC level functions, but there will be no need to delete
> elements and shrink the array. You would have to first build an array of
> directory names using hb_DirScan(). In the end, I am not sure if all the
> work will pay off, but at least it should be doable :-)

Interesting hints, all of the above! I'll test them and see whether they are applicable
to the case. Thanks for the suggestions.

regards,
Pete
 

Pete

Jul 22, 2021, 5:44:14 PM
to Harbour Users
Hi Jimmy

Quite interesting code!  thanks for posting.

On Thursday, 22 July 2021 at 04:19:15 UTC+3 Auge & Ohr wrote:
> is hb_ADel() the same as  hb_arrayDel() ?
> if yes as i can say it does NOT "remove" Item ... it will be NIL

No! hb_ADel() does permanently remove elements when the third parameter is specified as .T. (true);
otherwise it behaves the same as ADel().
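
The difference is easy to check with a tiny sketch (sample values are invented): ADel() keeps the array length and leaves a NIL in the last slot, while hb_ADel() with .T. as the third parameter also shrinks the array.

```harbour
PROCEDURE Main()

   LOCAL a := { "a", "b", "c" }
   LOCAL b := { "a", "b", "c" }

   ADel( a, 2 )             // "c" moves down, last slot becomes NIL
   ? Len( a ), HB_ISNIL( a[ 3 ] )

   hb_ADel( b, 2, .T. )     // the slot itself is removed
   ? Len( b )

   RETURN
```

The first line reports a length of 3 with a NIL in the last position, the second a length of 2.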
 
> p.s. why not "include" matching files instead of "remove" ...

It's simple: because I may want to include every file except those with certain extensions.

regards,
Pete

Pete

Jul 22, 2021, 6:00:50 PM
to Harbour Users
Hi Dan,
Oh yes! Good Clipper code! ;-) Plain and pure, and a nostalgic coding trip
back to the glorious eighties and nineties; much appreciated and enjoyed, indeed.
Thanks!

regards,
Pete

Appliserver

Jul 22, 2021, 7:00:21 PM
to harbou...@googlegroups.com


On 23/07/2021 00:00, Pete wrote:
> Hi Dan,
> Oh yes! Good clipper code! ;-) plain and pure and a nostalgic coding-trip
> back to glorious eighties and nineties,  much appreciated and enjoyed, indeed.
> Thanks!

By the way, I'm not sure that the elements set to nil end up at the end of the array, as they do with aDel(), so an aSort(...) may be needed before the aSize().

The problem with aeval() is that you can't delete an element directly with hb_adel(arr,pos,.t.): that would skip an element and raise an array access error, because the array is then shorter but aeval() still tries to process the original number of elements.

Maybe aeval(aDir,{|x,y|iif(lower(right(x[1],4))==".bak",adel(aDir,y),nil)} )

The trick is to set the elements containing .bak to nil, and then remove all of them.

It would be interesting to check which code is faster with BIG arrays.

Dan


José Quintas

Jul 22, 2021, 7:54:52 PM
to harbou...@googlegroups.com
FOR nCont := Len( oArray ) TO 1 STEP -1
   // index the row first: oArray[ nCont ] is one file entry
   IF Lower( Right( oArray[ nCont ][ F_NAME ], 4 ) ) == ".bak"
      hb_ADel( oArray, nCont, .T. )
   ENDIF
NEXT

José M. C. Quintas