Performance of the empty() function


Itamar Lins

Nov 26, 2025, 12:45:18 PM (2 days ago) Nov 26
to Harbour Developers
Hi!

I noticed that when I use the empty() function on simple fields up to 3 characters in size, it returns the result very quickly.
But if I use it on large strings and/or in a field in an FTP(memo) file "empty(field->attach)"
"(My PDF file)", it takes a long time to return true or false.
It greatly degrades the speed of browse(), for example on a network.

I believe that "empty()" reads the entire field; the larger the field to be read, the slower "empty()" becomes.

Best regards,
Itamar M. Lins Jr.

Itamar Lins

Nov 26, 2025, 12:58:09 PM (2 days ago) Nov 26
to Harbour Developers
Correcting my typo: FTP(memo) should be FPT(memo).

Francesco Perillo

Nov 26, 2025, 2:52:37 PM (2 days ago) Nov 26
to harbou...@googlegroups.com
Hi Itamar,
you may have a look at the source code for the empty function at

Probably the interesting code is this:
      case HB_IT_STRING:
      case HB_IT_MEMO:
         hb_retl( hb_strEmpty( hb_itemGetCPtr( pItem ), hb_itemGetCLen( pItem ) ) );
         break;


The definition of empty() for strings is that it should return .T. for a string like "" or a string like space(1000000).
And the only way to check whether a string of a given length is really "empty" is to check every byte...

HB_BOOL hb_strEmpty( const char * szText, HB_SIZE nLen )
{
   HB_TRACE( HB_TR_DEBUG, ( "hb_strEmpty(%s, %" HB_PFS "u)", szText, nLen ) );
   while( nLen-- )
   {
      char c = szText[ nLen ];
      if( ! HB_ISSPACE( c ) )
         return HB_FALSE;
   }
   return HB_TRUE;
}

So, in the case of empty(""), nLen is 0, the while loop never runs, and HB_TRUE is returned immediately.

In the case of empty( space(10000000) ), nLen is 10000000 and each byte is checked by the HB_ISSPACE macro, which expands to:
#define HB_ISSPACE( c )         ( ( c ) == ' ' || \
                                  ( c ) == HB_CHAR_HT || \
                                  ( c ) == HB_CHAR_LF || \
                                  ( c ) == HB_CHAR_CR )

then we have:
#define HB_CHAR_HT              '\t'    /*   9 - Tab horizontal */
#define HB_CHAR_LF              '\n'    /*  10 - Linefeed */
#define HB_CHAR_CR              '\r'    /*  13 - Carriage return */

So spaces, tabs, linefeeds and carriage returns are all considered "blanks".
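
To experiment with this logic outside Harbour, here is a minimal standalone C sketch of the same scan. The names is_blank and str_empty are hypothetical stand-ins for HB_ISSPACE and hb_strEmpty; they are not part of Harbour's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for HB_ISSPACE: space, tab, LF and CR count as blank. */
static bool is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical stand-in for hb_strEmpty: walks from the last byte backwards
   and stops at the first byte that is not blank. */
static bool str_empty(const char *text, size_t len)
{
    while (len--) {
        if (!is_blank(text[len]))
            return false;
    }
    return true;
}
```

With these definitions, str_empty("", 0) and str_empty(" \t\r\n", 4) both return true, while str_empty("  x ", 4) returns false.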

There are several tricks to speed up these checks on "recent" CPUs, but the code would be architecture dependent.
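
As a rough illustration of the idea, and not one of the architecture-specific tricks themselves, the sketch below compares 8 bytes at a time against a word of plain spaces and falls back to the byte-by-byte check only for chunks that contain anything else. It is portable C, it scans from the front rather than from the end, and the names is_blank and str_empty_fast are hypothetical, not Harbour functions. Whether it is actually faster would have to be measured.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for HB_ISSPACE. */
static bool is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical word-at-a-time variant of the emptiness check. */
static bool str_empty_fast(const char *text, size_t len)
{
    const uint64_t spaces = UINT64_C(0x2020202020202020); /* eight ' ' bytes */
    size_t i = 0;

    while (len - i >= 8) {
        uint64_t word;
        memcpy(&word, text + i, 8);          /* safe unaligned load */
        if (word != spaces) {
            /* The chunk may still hold tabs, CRs or LFs: check each byte. */
            for (size_t k = 0; k < 8; k++) {
                if (!is_blank(text[i + k]))
                    return false;
            }
        }
        i += 8;
    }

    /* Remaining tail of fewer than 8 bytes. */
    for (; i < len; i++) {
        if (!is_blank(text[i]))
            return false;
    }
    return true;
}
```

Because every byte of the comparison word is the same, the result does not depend on the byte order of the machine.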

Despite all these checks I think it is almost impossible to see a difference between the two calls, but we should time it. Probably most of the time is spent retrieving the data from the db. In the case of browse you should check how many times empty() is called.

Francesco




--
You received this message because you are subscribed to the Google Groups "Harbour Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to harbour-deve...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/harbour-devel/a6c1b2a6-8b3e-4c8b-9115-d948b3468286n%40googlegroups.com.

Itamar Lins

Nov 26, 2025, 10:18:45 PM (2 days ago) Nov 26
to Harbour Developers
Hi!

Thank you for replying.
In this case, it's a file with a little over 3,000 records, and this DBF has an FPT field with attached PDFs, GIFs, and PNGs.
It's quite simple to use. I do it like this in the browse: {||iif(empty(cx->file),"YES","NO")}, but it degrades a lot this way.
I used a trick by getting the file extension that I store in another field of the DBF itself.

{||iif(empty(cx->suffix),"YES","NO")} and this way the speed doesn't degrade.
I don't know if it's because it reads all the files inside the FPT or because it does a "LOAD" of the specific field in the FPT in question.


Best regards,
Itamar M. Lins Jr.

hmpaquito

Nov 27, 2025, 2:46:42 AM (yesterday) Nov 27
to Harbour Developers
Hi,
 
 It’s safe to say that storing a lot of information in an FPT file with DbfCdx drastically reduces performance. Years ago, I also tested this with MySQL (BLOB fields), and the exact same thing happened. The best approach is to save the files on disk and store only the full path in a character field.  

Regards

Francesco Perillo

Nov 27, 2025, 3:19:15 AM (yesterday) Nov 27
to harbou...@googlegroups.com
Is the cx->file a MEMO record that contains the full file ?
In this case you need to read the whole file from disk, store it in memory, and then check whether it is empty. The check itself is really quick since the while( nLen-- ) loop runs just once (or at most a few times). The delay comes from retrieving the file from the memo field.

The trick you used is fine.

Personally I'd never insert complete, binary files into memo fields but it is doable... why not ?

Francesco Perillo

Nov 27, 2025, 3:28:46 AM (yesterday) Nov 27
to harbou...@googlegroups.com

DBF access via Harbour and SQL queries are very different beasts working in a really different way. It seems strange to me that mysql reduced performance adding a lot of data into BLOBs. It may be. It may depend on the SELECT.

Anyway, in a case similar to Itamar's, I'd create a new boolean field "attachment_present", or a numeric "attachment_length", then run a script to fill it and amend the codebase to support it. In his case, using the extension is fine, of course.

Itamar Lins

Nov 27, 2025, 9:13:36 AM (21 hours ago) Nov 27
to Harbour Developers
Hi! 
IMHO, the empty() function shouldn't iterate through the entire contents of the slot/container to return true or false.
It just needs to check whether, after the start marker (I don't know what it is), the end marker of the container/slot is found; otherwise, it returns .F.
Let's imagine I save a 5 GiB movie in an FPT field. Does it need to traverse all 5 GiB to answer a simple question?

Best regards,
Itamar M. Lins Jr.

Francesco Perillo

Nov 27, 2025, 9:23:33 AM (21 hours ago) Nov 27
to harbou...@googlegroups.com

empty() returns FALSE as soon as it finds a char that is not defined as HB_ISSPACE.

So if you 
? empty( memoread( "film.avi" ) )
it will spend ONE loop to return FALSE, but you spend a lot of time to retrieve and load the avi into memory.... since you are not "chunking" the load.
If you ask
? empty ( space( 5000000000000 ) )
it MUST iterate on the whole string.

See this special case:
? empty( "X" + space( 500000000000 ) )

Since empty() starts the check from the end of the string, it traverses and checks all the chars, and only when it arrives at the first one (the last to be checked) does it find a char not in HB_ISSPACE and return FALSE.
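
The three cases above can be made concrete with a small C sketch that counts how many bytes the backward scan inspects before it can decide. The helper names is_blank and bytes_checked are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for HB_ISSPACE. */
static bool is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical helper: returns how many bytes a backward scan like the one
   in hb_strEmpty has to look at before the answer is known. */
static size_t bytes_checked(const char *text, size_t len)
{
    size_t checked = 0;

    while (len--) {
        checked++;
        if (!is_blank(text[len]))
            break;              /* first non-blank byte ends the scan */
    }
    return checked;
}
```

For "    X" (non-blank last byte, like most binary files) the scan inspects 1 byte; for "     " it inspects all 5; and for "X    " it also inspects all 5 before finding the X.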

My personal idea: you shouldn't insert a big binary blob inside a memo field without storing the metadata somewhere else. If you know that you may need to know if the memo is empty, store the length, calculated once.


Itamar Lins

Nov 27, 2025, 10:00:47 AM (20 hours ago) Nov 27
to Harbour Developers
Hi!
The empty function must be smart enough to know if I'm asking about the content of a variable in memory, or if I'm querying a database, in this case a DBF field.
In any case, there are start and end markers.
I understand your point of view; the blank spaces that may be stored are empty data and need to be read.
It is up to the programmer to put a search (while next if = space, etc.) if they find these spaces at the beginning, within the marking, that is, the slot/container.


>My personal idea: you shouldn't insert a big binary blob inside a memo field without storing the metadata somewhere else. If you know that you may need to know if the memo is empty, store the length, calculated once.
Yes!

Best regards,
Itamar M. Lins Jr.

Francesco Perillo

2:34 AM (3 hours ago) 2:34 AM
to harbou...@googlegroups.com
Hi Itamar,
I can't understand your message; what do you mean by "start and end markers"?

And how can empty() be a "smart" function? In Harbour there isn't a "FIELD" data type, so the function can't know.

And, as far as I know, there is no way to retrieve memo fields in chunks (for example, the first N bytes); you can only retrieve them in full.


