hb_serialize() is hanging - sample


Miroslav Georgiev

Nov 30, 2017, 7:46:01 AM
to Harbour Developers
I have intermittent problems with hb_serialize() when used with a little more data than usual.
It's hanging (no Alt-C) and using 100% CPU.

I made a sample to demonstrate the problem.
It hangs very easily - usually on the 1st or 2nd loop.

PROCEDURE Main()
local a, c, i

    while .t.
        a := {}
        for i := 1 to 1000000
            aAdd( a, { "asdfasdfasdf", 50, 20, 10, {} } )
        next
        ?hb_DateTime(), "before"
        c := hb_serialize( a )
        ?hb_DateTime(), "after - press alt-C for exit"
        InKey( 1 )
    enddo

return

Miroslav Georgiev

Nov 30, 2017, 7:57:56 AM
to Harbour Developers
Tested with both the latest hb32 and the latest hb34 (32- and 64-bit).

Alain Aupeix

Nov 30, 2017, 8:24:25 AM
to harbou...@googlegroups.com
I don't know exactly how hb_serialize is used, but combining it with a sample
found at

http://www.creasolgroup.com/xOraclipLanguageReferenceGuide/xOraClip%20Language%20Reference/Functions/Hb_serialize_f.en.html

I made it work without problems.
______________________________________________________________________

PROCEDURE Main()
local a, i, bBlock := {|| alert( pstring ) }

private pstring := "asdfasdfasdf", c

    while .t.
        a := {}
        for i := 1 to 1000000
            aAdd( a, { pstring, 50, 20, 10, {} } )
        next
        ?hb_DateTime(), "before"
        c := hb_serialize( bBlock )
        ?hb_DateTime(), "after - press alt-C for exit"
        InKey( 1 )
    enddo

return
______________________________________________________________________

A+
--
------------------------------------------------------------------------
Alain Aupeix
http://jujuland.pagesperso-orange.fr/
http://pissobi-lacassagne.pagesperso-orange.fr/
------------------------------------------------------------------------
U.buntu 12.04 | G.ramps 3.4.9-1 | H.arbour 3.2.0dev (2016-12-16 10:05) |
Hw.Gui 2.20-3 (2630)
------------------------------------------------------------------------

Teo Fonrouge

Nov 30, 2017, 8:54:01 AM
to Harbour Developers
Hi Miroslav,

It seems to be related to the internal C function HB_SIZE hb_itemSerialSize(), which
checks for cyclic references in the items to be serialized; it hangs in the
HB_BOOL hb_itemSerialValueRef() function.

For the moment you can use the HB_SERIALIZE_IGNOREREF flag when calling the
hb_Serialize() function to get the process done. However, you need to be aware of
the following:

2014-08-27 18:19 UTC+0200 Przemyslaw Czerpak (druzus/at/poczta.onet.pl)

  * include/hbserial.ch
  * src/rtl/itemseri.c
    + added HB_SERIALIZE_IGNOREREF flag.
      This flag fully disables the logic used to detect multiple references to
      the same complex (sub)items like arrays or hashes. It increases the speed
      of serialization, but the serialized data does not contain any information
      about references, i.e. aVal[ 1 ] and aVal[ 2 ] in the code below:

         aSub := { 1, 2, 3 }
         aVal := { aSub, aSub }

      are serialized as separate arrays. Additionally, items with cyclic
      references like:

         aSub[ 2 ] := aSub

      cannot be serialized at all with the HB_SERIALIZE_IGNOREREF flag because
      it would create an infinite serialization loop and crash with an out of
      memory message.
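
A minimal sketch (my own illustration, not code from the ChangeLog) of what that
warning means in practice, assuming the usual hb_Serialize()/hb_Deserialize()
round trip:

#include "hbserial.ch"

PROCEDURE Main()
   LOCAL aSub := { 1, 2, 3 }
   LOCAL aVal := { aSub, aSub }      // both elements point to the same array
   LOCAL aCopy

   aCopy := hb_Deserialize( hb_Serialize( aVal, HB_SERIALIZE_IGNOREREF ) )
   aCopy[ 1 ][ 1 ] := 99
   ? aCopy[ 2 ][ 1 ]                 // prints 1: the deserialized sub-arrays are
                                     // separate copies; without the flag the shared
                                     // reference would be preserved and 99 printed

   RETURN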



Your sample works as follows:

#include "hbserial.ch"


PROCEDURE Main()

local a, c, i


    while .t.

        a := {}

        for i := 1 to 1000000

            aAdd( a, { "asdfasdfasdf", 50, 20, 10, {} } )

        next

        ?hb_DateTime(), "before"

        c := hb_serialize( a, HB_SERIALIZE_IGNOREREF )

        ?hb_DateTime(), "after - press alt-C for exit"

        InKey( 1 )

    enddo


return



It seems this will need to be checked more deeply by Przemek.

best regards,

Teo

Pete

Nov 30, 2017, 9:26:23 AM
to Harbour Developers
As an alternative to the solution provided by Teo,
you could use:

PROCEDURE Main()
local a, c, i

    while .t.
        a := {}
        for i := 1 to 1000000
            aAdd( a, { "asdfasdfasdf", 50, 20, 10, {} } )
        next
        ?hb_DateTime(), "before"
        c := hb_serialize( a )
        ?hb_DateTime(), "after - press alt-C for exit"

        a := NIL  // NOTE This line added


        InKey( 1 )
    enddo

return
 
It seems to work fine here...
(Tested with hb34, but I don't believe it'd be any different in hb32.)

Aleksander Czajczynski

Dec 1, 2017, 11:28:24 AM
to harbou...@googlegroups.com
Thanks all for sharing observations. It seems to me that the app doesn't halt, nor is it an infinite loop or corruption.
I think it's something about memory management - the OS struggles much more with the memmove() operation in the subsequent loop.
On a somewhat older and busy system I can't wait long enough to see the first loop finish. But on a newer machine, hb_Serialize() is blazingly fast the first time, yet the second loop is a struggle! a := NIL may help some, but not everywhere.

I've made a comparison with AMF3 serialization from contrib/hbamf, which I happen to have created. It's much slower, yet it finishes the second loop on my older test machine (a few minutes slower), where hb_Serialize() struggles even on the first loop.


PROCEDURE Main()
local a, c, i

    while .t.
        a := {}
        for i := 1 to 1000000
            aAdd( a, { "asdfasdfasdf", 50, 20, 10, {} } )
        next
        ?hb_DateTime(), "before"
        c := amf3_encode( a )
        ? Len( c )

        ?hb_DateTime(), "after - press alt-C for exit"
        InKey( 1 )
        a := NIL
    enddo

return

results:
01.12.2017 16:55:38.289 before
  14000017
01.12.2017 16:57:05.064 after - press alt-C
01.12.2017 16:57:06.732 before
  14000017
01.12.2017 17:06:21.523 after - press alt-C

From under two minutes on the first pass to over nine minutes on the second...

Best regards, Aleksander Czajczynski


Aleksander Czajczyński

Dec 2, 2017, 4:19:47 PM
to harbou...@googlegroups.com
Got some spare minutes this evening to make a little visualization:

Test #1, like the original report.

PROCEDURE Main()
   LOCAL a, c, i

   WAIT
   
   WHILE .T.
      a := {}
      FOR i := 1 TO 1000000
         AAdd( a, { "asdfasdfasdf", 50, 20, 10, {} } )
      NEXT
      ? hb_DateTime(), "before"
      c := hb_Serialize( a )
      ? Len( c )
      ? hb_DateTime(), "after - press alt-C for exit"
      Inkey( 1 )
   ENDDO

   RETURN

Test #2, with forced garbage collection - apparently the block is freed back to the OS.

PROCEDURE Main()
   LOCAL a, c, i

   WAIT
   
   WHILE .T.
      a := {}
      FOR i := 1 TO 1000000
         AAdd( a, { "asdfasdfasdf", 50, 20, 10, {} } )
      NEXT
      ? hb_DateTime(), "before"
      c := hb_Serialize( a )
      ? Len( c )
      ? hb_DateTime(), "after - press alt-C for exit"
      a := NIL
      Inkey( 1 )
      hb_gcAll()
   ENDDO

   RETURN


Test #3, forced GC plus slightly optimized array allocation, since the array size is known in advance.

PROCEDURE Main()
   LOCAL a, c, i

   WAIT
   
   WHILE .T.
      a := Array( 1000000 )
      FOR i := 1 TO 1000000
         a[ i ] := { "asdfasdfasdf", 50, 20, 10, {} }
      NEXT
      ? hb_DateTime(), "before"
      c := hb_Serialize( a )
      ? Len( c )
      ? hb_DateTime(), "after - press alt-C for exit"
      a := NIL
      Inkey( 1 )
      hb_gcAll()
   ENDDO

   RETURN

While I'm not exactly sure that it is an OS kernel memory fragmentation issue, it could be; on the other hand the machine still has a significant amount of memory free. The layout graphed is of course the virtual one; the real one may differ.
Could it be that in the first scenario Harbour tries to reuse blocks already reserved by the GC?

After a while I realized that this variant, with the outer array allocated once and without forced GC, also seems to perform without significant bottlenecks.


PROCEDURE Main()
   LOCAL a, c, i

   WAIT
   
   WHILE .T.
      a := Array( 1000000 )
      FOR i := 1 TO 1000000
         a[ i ] := { "asdfasdfasdf", 50, 20, 10, {} }
      NEXT
      ? hb_DateTime(), "before"
      c := hb_Serialize( a )
      ? Len( c )
      ? hb_DateTime(), "after - press alt-C for exit"
      Inkey( 1 )
   ENDDO

   RETURN

Best regards, Aleksander Czajczyński


Miroslav Georgiev

Dec 3, 2017, 3:51:25 PM
to Harbour Developers
Hi Aleksander.

Very nice visualization. How did you do it?



Aleksander Czajczyński

Dec 4, 2017, 4:33:04 PM
to harbou...@googlegroups.com

> On 3 Dec 2017, at 21:51, Miroslav Georgiev <miro....@gmail.com> wrote:
>
> Very nice visualization. How did you do it?

https://www.softwareverify.com/cpp-virtual-memory.php

Another similar tool is VMMap from Sysinternals.

There is also the very interesting RAMMap from Sysinternals, which shows the physical layout and therefore the real memory allocations (but that is of course dependent on the underlying OS).
Unfortunately RAMMap doesn't show a graphical map out of the box, nor does it refresh in real time, but its log file can be parsed and you can build a map yourself if you really want.

I was changing the reference-list resize scheme in Harbour's rtl/itemseri.c,
at https://github.com/harbour/core/blob/master/src/rtl/itemseri.c#L317
but without significant change, so my guess is that the OS memory manager is already stressed with millions of arrays allocated.

I personally wouldn't serialize such large structures, but if you want to anyway, you should be aware that multidimensional arrays in Harbour (like in Clipper) are technically arrays of arrays, not matrices.
It means that allocating such an array produces a million separate small allocations (in fact two million, because there is another array inside each element in the example) which have to be handled. A matrix structure, on the other hand, can be allocated once.

It'd be better to split such serialization into chunks, storing one serialized "row" per database row/record, or just use the DB directly.
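
A minimal sketch of that chunked approach (my own illustration, not code from this thread); the chunk size of 10000 is an arbitrary choice for demonstration:

PROCEDURE Main()
   LOCAL a := Array( 1000000 ), aChunks := {}, i, nLen
   LOCAL nChunk := 10000              // arbitrary chunk size, for illustration only

   FOR i := 1 TO Len( a )
      a[ i ] := { "asdfasdfasdf", 50, 20, 10, {} }
   NEXT

   FOR i := 1 TO Len( a ) STEP nChunk
      nLen := Min( nChunk, Len( a ) - i + 1 )
      // copy rows i .. i + nLen - 1 into a fresh array and serialize only that slice
      AAdd( aChunks, hb_Serialize( ACopy( a, Array( nLen ), i, nLen ) ) )
   NEXT

   ? Len( aChunks ), "serialized chunks"

   RETURN

Each hb_Serialize() call then only has to track references for a modest sub-array instead of the whole million-row structure.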

Some other programming languages are allowing to use either type array, depending on needs:
https://stackoverflow.com/questions/597720/what-are-the-differences-between-a-multidimensional-array-and-an-array-of-arrays

I think that there is nothing wrong with hb_serialize(). The way it internally stores reference-table entries requires memmove(), which adds to the problem of scattered allocations. I think it is by design very fast, up to a point. Using the HB_SERIALIZE_IGNOREREF flag is also a good idea in such a specific use case (millions of unique arrays).

In my tests a single "stressed" memmove() takes 3 ms or even more; a simple calculation, 3 ms * 1,000,000, easily gets to 3000 s.

Best regards, Aleksander Czajczyński


Miroslav Georgiev

Dec 5, 2017, 5:20:42 AM
to Harbour Developers
Thank you very much - I'll check these.

About my serialization scenario: when I commit changes to DBF files I log "undo" information via RDDI_TRIGGER, so every changed field/record is logged.

In case of an error I roll back all changes step by step. This works well. Recently I tested writing much more data, so the log grew much larger, and I noticed the problem.

I use data serialization in this case for compression purposes - to store more data in memory. As you suggested, I will cut the data into smaller pieces for serialization & compression.
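
A minimal sketch of that serialize-then-compress idea (my own illustration, not code from this thread); it assumes the core zlib wrappers hb_ZCompress() / hb_ZUncompress() are available in the build:

#include "hbserial.ch"

PROCEDURE Main()
   LOCAL aLog := Array( 100000 ), i, cPlain, cPacked

   // build one "chunk" of the undo log (dummy rows for illustration)
   FOR i := 1 TO Len( aLog )
      aLog[ i ] := { "asdfasdfasdf", 50, 20, 10, {} }
   NEXT

   // serialize without reference tracking, then compress to keep it small in memory
   cPlain  := hb_Serialize( aLog, HB_SERIALIZE_IGNOREREF )
   cPacked := hb_ZCompress( cPlain )
   ? Len( cPlain ), "bytes serialized,", Len( cPacked ), "bytes compressed"

   // restore: uncompress (original size passed as a hint), then deserialize
   aLog := hb_Deserialize( hb_ZUncompress( cPacked, Len( cPlain ) ) )
   ? Len( aLog ), "rows restored"

   RETURN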

Przemyslaw Czerpak

Dec 8, 2017, 10:22:19 AM
to harbou...@googlegroups.com
Hi,

The time problem is the cost of memmove() in the current code used
to detect multiple references.
In a usage like yours there are no multiple references, so you can
safely use HB_SERIALIZE_IGNOREREF, which resolves the time problem.

When I was writing this code I was too lazy to create a binary tree
to keep the references. Maybe I'll find a moment to implement it this
month.

best regards,
Przemek



Miroslav Georgiev

Dec 9, 2017, 2:59:13 AM
to Harbour Developers
Many thanks.