Performance of the empty() function

Itamar Lins

Nov 26, 2025, 12:45:18 PM
to Harbour Developers
Hi!

I noticed that when I use the empty() function on simple fields up to 3 characters in size, it returns the result very quickly.
But if I use it on large strings and/or on a field in an FTP(memo) file, e.g. "empty(field->attach)" (my PDF file), it takes a long time to return true or false.
It greatly degrades the speed of browse(), for example on a network.

I believe that "empty()" reads the entire field; the larger the field to be read, the slower "empty()" becomes.

Best regards,
Itamar M. Lins Jr.

Itamar Lins

Nov 26, 2025, 12:58:09 PM
to Harbour Developers
Correction of my typo: FTP(memo) should read FPT(memo).

Francesco Perillo

Nov 26, 2025, 2:52:37 PM
to harbou...@googlegroups.com
Hi Itamar,
you may have a look at the source code for the empty function at

Probably the interesting code is this:
      case HB_IT_STRING:
      case HB_IT_MEMO:
         hb_retl( hb_strEmpty( hb_itemGetCPtr( pItem ), hb_itemGetCLen( pItem ) ) );
         break;


The definition of empty() for strings is that it must return .T. both for a string like "" and for a string like space(1000000).
And the only way to confirm that a string of a given length is really "empty" is to check every byte...

HB_BOOL hb_strEmpty( const char * szText, HB_SIZE nLen )
{
   HB_TRACE( HB_TR_DEBUG, ( "hb_strEmpty(%s, %" HB_PFS "u)", szText, nLen ) );
   while( nLen-- )
   {
      char c = szText[ nLen ];
      if( ! HB_ISSPACE( c ) )
         return HB_FALSE;
   }
   return HB_TRUE;
}

So, in the case of empty(""), nLen is 0: the while loop never runs and HB_TRUE is returned immediately.

In the case of empty( space(10000000) ), nLen is 10000000 and each byte is checked by the HB_ISSPACE macro, which expands to:
#define HB_ISSPACE( c )         ( ( c ) == ' ' || \
                                  ( c ) == HB_CHAR_HT || \
                                  ( c ) == HB_CHAR_LF || \
                                  ( c ) == HB_CHAR_CR )

then we have:
#define HB_CHAR_HT              '\t'    /*   9 - Tab horizontal */
#define HB_CHAR_LF              '\n'    /*  10 - Linefeed */
#define HB_CHAR_CR              '\r'    /*  13 - Carriage return */

So spaces, tabs, linefeeds and carriage returns are all considered "blanks".

There are several tricks to speed up these checks on "recent" CPUs, but the code would be architecture dependent.

Despite all these checks, I think it is almost impossible to see a difference between the two calls, but we should time it. Most of the time is probably spent retrieving the data from the DB. In the case of browse, you should check how many times empty() is called.

Francesco




--
You received this message because you are subscribed to the Google Groups "Harbour Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to harbour-deve...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/harbour-devel/a6c1b2a6-8b3e-4c8b-9115-d948b3468286n%40googlegroups.com.

Itamar Lins

Nov 26, 2025, 10:18:45 PM
to Harbour Developers
Hi!

Thank you for replying.
In this case, it's a file with a little over 3,000 records, and this DBF has an FPT field with attached PDFs, GIFs, and PNGs.
It's quite simple. In the browse I do this: {||iif(empty(cx->file),"YES","NO")}, but performance degrades a lot this way.
As a workaround, I test the file extension, which I store in another field of the same DBF:

{||iif(empty(cx->suffix),"YES","NO")}, and this way the speed doesn't degrade.
I don't know whether that is because it reads all the files inside the FPT or because it loads the whole content of that specific field from the FPT.


Best regards,
Itamar M. Lins Jr.

hmpaquito

Nov 27, 2025, 2:46:42 AM
to Harbour Developers
Hi,
 
 It’s safe to say that storing a lot of information in an FPT file with DbfCdx drastically reduces performance. Years ago, I also tested this with MySQL (BLOB fields), and the exact same thing happened. The best approach is to save the files on disk and store only the full path in a character field.  

Regards

Francesco Perillo

Nov 27, 2025, 3:19:15 AM
to harbou...@googlegroups.com
Is cx->file a MEMO field that contains the full file?
In that case the whole file has to be read from disk and stored in memory, and only then checked for emptiness. The check itself is really quick, since the while( nLen-- ) loop runs just once (or at most a few times). The delay comes from retrieving the file from the memo field.

The trick you used is fine.

Personally, I'd never insert complete binary files into memo fields, but it is doable... why not?

Francesco Perillo

Nov 27, 2025, 3:28:46 AM
to harbou...@googlegroups.com

DBF access via Harbour and SQL queries are very different beasts that work in really different ways. It seems strange to me that MySQL performance dropped after adding a lot of data to BLOBs. It may be so; it may depend on the SELECT.

Anyway, in a case similar to Itamar's, I'd create a new boolean field "attachment_present", or a numeric "attachment_length", then run a script to fill it and amend the codebase to support it. In his case, using the extension is fine, of course.

Itamar Lins

Nov 27, 2025, 9:13:36 AM
to Harbour Developers
Hi! 
IMHO, the empty() function shouldn't iterate through the entire contents of the slot/container to return true or false.
It just needs to check whether the end marker of the container/slot comes right after the start marker (I don't know what that marker is); otherwise, it returns .F.
Let's imagine I save a 5 GiB movie in an FPT field. Does it need to traverse all 5 GiB to answer a simple question?

Best regards,
Itamar M. Lins Jr.

Francesco Perillo

Nov 27, 2025, 9:23:33 AM
to harbou...@googlegroups.com

empty() returns FALSE as soon as it finds a char for which HB_ISSPACE is false.

So if you ask
? empty( memoread( "film.avi" ) )
it will typically run just ONE loop iteration before returning FALSE, but you spend a lot of time retrieving and loading the AVI into memory... since you are not "chunking" the load.
If you ask
? empty ( space( 5000000000000 ) )
it MUST iterate over the whole string.

See this special case:
? empty( "X" + space( 500000000000 ) )

Since empty() starts the check from the end of the string, it traverses and checks all the chars, and when it arrives at the first one (the last to be checked) it finds a char that is not in HB_ISSPACE and must return FALSE.

My personal idea: you shouldn't insert a big binary blob inside a memo field without storing the metadata somewhere else. If you know that you may need to know if the memo is empty, store the length, calculated once.


Itamar Lins

Nov 27, 2025, 10:00:47 AM
to Harbour Developers
Hi!
The empty() function should be smart enough to know whether I'm asking about the content of a variable in memory or querying a database, in this case a DBF field.
In either case, there are start and end markers.
I understand your point of view: blank spaces that may be stored are empty data and need to be read.
It is up to the programmer to add a search (a while loop, next, if = space, etc.) when such spaces are found at the beginning, inside the marked area, that is, the slot/container.


>My personal idea: you shouldn't insert a big binary blob inside a memo field without storing the metadata somewhere else. If you know that you may need to know if the memo is empty, store the length, calculated once.
Yes!

Best regards,
Itamar M. Lins Jr.

Francesco Perillo

2:34 AM
to harbou...@googlegroups.com
Hi Itamar,
I can't understand your message: what do you mean by "start and end markers"?

And how can empty() be a "smart" function? In Harbour there isn't a "FIELD" data type, so the function can't know.

And, as far as I know, there is no way to retrieve memo fields in chunks (for example, the first N bytes), but you may only retrieve them in full.




Itamar Lins

10:02 AM
to Harbour Developers
Hi!
>but you may only retrieve them in full.
I'm asking: retrieve them from where?
Where does the file stored in a DBF or in memory begin and end?
E.g. in cMyVar := "ANY TEXT", the " character marks the start and the end (so do [] and ''); {} marks an array, etc. A DBF follows the same process: a MARK of START and END.
SLOT, CONTAINER, etc. In nMyVar := 1 or dDta := ctod(''), every variable has a MARK, a type, a START and an END, in memory and in DBF/CSV/XLS, etc.


>In Harbour there isn't a "FIELD" data type so the function can't know.
"FIELD->" exists: it is a MARK (alias) that indicates the variable is stored in a DBF.
"M->" is a MARK (alias) that indicates the variable is stored in MEMORY.


Best regards,
Itamar M. Lins Jr.

Itamar Lins

10:18 AM
to Harbour Developers
Hi!

Where does the file stored in a DBF or in memory begin and end?
Not just any file, but any data or information itself.
Data is never stored without knowing its location (address), with a beginning and an end (markers), a type and a size, whether in a DBF or in memory.
E.g. in the DBF structure aStru := {"name","c",20,0} I specify the type and the size (markers with start and end)...
In the DBF file everything is organized, record by record...
With an FPT, aStru := {"attached","M",10,0}, the content has a variable size and can hold any type of data.
How does it know where it starts and ends? Without start and end markers, it would get lost when searching for the information.


Best regards,
Itamar M. Lins Jr.

Marcos Jarrin

10:24 AM
to Harbour Developers

To create a function that returns whether a memo field is empty, you need to understand the structure of how the DBF and FPT files are designed.

In the DBF file, each memo field in this table occupies 10 bytes per record. These bytes identify the memo's first block in the FPT file (its byte offset is the block number multiplied by the block size, typically 64 bytes). In a 10-byte memo field, Harbour's DBF driver stores that block number as right-aligned ASCII digits padded with spaces, so a blank (or zero) value means the field is empty; only 4-byte memo fields hold it as a 32-bit binary integer. The DBF holds no length or end marker for the memo.

FPT File Structure

The FPT file begins with a header (512 bytes, i.e. 8 blocks of 64 bytes, in the classic FoxPro layout) where the first 4 bytes hold the next free block and bytes 6-7 define the block size (default 64); both values are stored big-endian. Each memo starts on a block boundary after the header with this 8-byte header:

Bytes 0-3: record type (big-endian uint32; for example 1 = text).

Bytes 4-7: length in bytes of the memo's data (big-endian uint32).

The actual data follows immediately after; a long memo simply occupies consecutive blocks, with no per-block headers, trees or link pointers, only linear position.

Size and Empty Field Management

To detect whether a memo is empty, or to get its exact size, without loading its data: read the block number from the DBF field; if it is blank or 0, the memo is empty. Otherwise, seek to block number × block size in the FPT and read only the 8-byte memo header, whose bytes 4-7 give the exact length. This touches only the DBF record and 8 bytes of the FPT, however large the memo is, but at least that header must be read, since the DBF doesn't store the length.

To implement this, you need to create a function in C that performs this procedure.

Attached is a DBF/FPT file examined with a hexadecimal editor.

HxD - Freeware Hex Editor and Disk Editor
https://mh-nexus.de/en/hxd/

(Attached images: p01.jpg, p02.jpg)

Francesco Perillo

11:06 AM
to harbou...@googlegroups.com
Marcos gave you the answer.

The information is stored somewhere; Harbour, via the RDDs, knows how to retrieve it, but there is no way (known to me) to get that information via Harbour commands.

The runtime library, where empty() belongs, can't know that its parameter is a field, even if you use the FIELD->x syntax. It is the compiler that adapts the generated code to handle that specific case; FIELD-> is interpreted as a sort of "visibility filter" for cases where a field and a variable share the same name.
The runtime retrieves the value (from the DB, from memory, from wherever), applying all the visibility rules imposed by the semantics and precedence rules of the language, then pushes it on the stack... and finally calls empty().


