
Fast memory allocation for short-lived structures


Dmitry Solomennikov
Sep 18, 2017, 3:35:10 AM

Hello, guys!

I have code that uses data structures heavily.
It is part of a translator backend.

;; Everything is in SBCL

(defstruct Flow
  (id 0 :type fixnum :read-only t))

(defstruct (TreeNode (:include Flow))
  (fkey   nil :type T :read-only T)
  (fvalue nil :type T :read-only T)
  (fleft  nil :type T :read-only T)
  (fright nil :type T :read-only T)
  (fdepth nil :type FIXNUM :read-only T))

(defstruct (SyntaxTree (:include Flow))
  (frule     nil :type (SIMPLE-ARRAY CHARACTER (*)) :read-only T)
  (fchoice   nil :type FIXNUM :read-only T)
  (fstart    nil :type FIXNUM :read-only T)
  (fend      nil :type FIXNUM :read-only T)
  (fchildren nil :type (SIMPLE-VECTOR *) :read-only T))

I also have a lot of other structures, but these two are used much more than the others.

(profile) shows:

measuring PROFILE overhead..done
 seconds  |   gc    |    consed     |   calls    | sec/call | name
----------------------------------------------------------------
   2.613  |  0.896  | 1,777,617,264 | 27,607,929 | 0.000000 | MAKE-TREENODE
   0.228  |  0.019  |   197,294,080 |  3,035,461 | 0.000000 | MAKE-SYNTAXTREE
----------------------------------------------------------------
   2.841  |  0.915  | 1,974,911,344 | 30,643,390 |          | Total


sb-sprof (by time, top lines) shows:

          Self        Total        Cumul
  Nr  Count     %  Count     %  Count     %    Calls  Function
------------------------------------------------------------------------
   1    269   9.0    275   9.2    269   9.0        -  MAKE-TREENODE
   2    135   4.5    336  11.2    404  13.5        -  SB-KERNEL:%MAKE-ARRAY
   3    135   4.5    135   4.5    539  18.0        -  (LAMBDA (SB-PCL::.ARG0. SB-PCL::.ARG1.) :IN "/private/tmp/sbcl-20170628-89938-8798f8/sbcl-1.3.19/src/pcl/dlisp3.fasl")
...

I have 30M requests to create data structures that live a very short time; also, almost a second is spent collecting garbage.

There are two questions:
1) Can memory allocation be made cheaper/faster, and how? It looks like Java can do the same thing about twice as fast (I also have a Java backend which is structurally the same).
2) What does the third line in sb-sprof's output mean?

For the first question, manardb looks promising, but it is Linux-only at the moment. I hope there is a cross-OS solution (if not, I'll think about fixing osicat and manardb).

Dmitry.

Madhu
Sep 18, 2017, 4:51:21 AM

* Dmitry Solomennikov Wrote on Mon, 18 Sep 2017 00:35:02 -0700 (PDT):

> I have 30M requests for creating data structures, which live very
> short time, also, almost a second spent cleaning the garbage.

The usual answer to this is to resource your objects, so that the overhead of
creation is avoided. You can find various implementations of
USING-RESOURCE, like the one in CLIM. (But these are useful when object creation
is expensive. The sort of structs you showed ought to be inexpensive to
create and easily collected, if they are really ephemeral objects.
Check a different Lisp implementation, like LW, to see whether you get the same
performance characteristics.)

It is not immediately clear to me that resourcing will pay off if your
objects are nodes of a rapidly changing tree. But if only a fraction of
the 30 million objects are in use at any time, you can certainly avoid the
overhead of creation by reusing the objects (i.e. resourcing them).
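
For illustration, here is a minimal sketch of the resourcing idea using a plain
free list. The names (POOLED-NODE, *NODE-POOL*, ALLOC-NODE, FREE-NODE) are made
up for the example and are not CLIM's USING-RESOURCE; the slots are also made
writable (no :read-only) so that instances can be recycled.

(defstruct pooled-node
  (key   nil)
  (value nil)
  (left  nil)
  (right nil)
  (depth 0 :type fixnum))

(defvar *node-pool* '()
  "A free list of recycled POOLED-NODE instances.")

(defun alloc-node (key value left right depth)
  "Return a node, reusing one from the pool when possible."
  (let ((node (or (pop *node-pool*) (make-pooled-node))))
    (setf (pooled-node-key node)   key
          (pooled-node-value node) value
          (pooled-node-left node)  left
          (pooled-node-right node) right
          (pooled-node-depth node) depth)
    node))

(defun free-node (node)
  "Give NODE back to the pool once the caller is done with it."
  (push node *node-pool*)
  nil)

Whether this pays off depends on how many nodes are live at once, as noted above.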

Pascal J. Bourguignon
Sep 18, 2017, 7:47:29 PM

Dmitry Solomennikov <99dm...@gmail.com> writes:

> Hello, guys!
>
> I have code, which use data structures heavily.
> It is a part of translator backend.
>
> ;; Everything is in SBCL
>
> (defstruct Flow
> (id 0 :type fixnum :read-only t))
>
> (defstruct (TreeNode (:include Flow))
> (fkey nil :type T :read-only T)
> (fvalue nil :type T :read-only T)
> (fleft nil :type T :read-only T)
> (fright nil :type T :read-only T)
> (fdepth nil :type FIXNUM :read-only T))

Type error: NIL is not a FIXNUM.

>
> (defstruct (SyntaxTree (:include Flow))
> (frule nil :type (SIMPLE-ARRAY CHARACTER (*)) :read-only T)
Type error: NIL is not a SIMPLE-ARRAY.
> (fchoice nil :type FIXNUM :read-only T)
Type error: NIL is not a FIXNUM.
> (fstart nil :type FIXNUM :read-only T)
Type error: NIL is not a FIXNUM.
> (fend nil :type FIXNUM :read-only T)
Type error: NIL is not a FIXNUM.
> (fchildren nil :type (SIMPLE-VECTOR *) :read-only T))
Type error: NIL is not a SIMPLE-ARRAY.


Don't you run your code thru a Common Lisp compiler?


> I also have a lot of another structures, but this two used much more then others.
> [...]

> I have 30M requests for creating data structures, which live very
> short time, also, almost a second spent cleaning the garbage.
>
> There are two questions:

> 1) May (and how) memory allocation be done cheaper/faster? Looks like
> Java can do the same twice as faster (I have Java backend also which
> is structural the same).

You can pre-allocate all the objects you will ever need, and never
deallocate them. This is particularly useful if you need to allocate such
short-lived objects again and again.


Also, in the case of structures having slots of different types, you can
improve memory usage by splitting them over specialized arrays.
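
As a rough sketch of that idea (the fixed capacity and all the names here are
invented for the example), the TreeNode slots could be split over parallel
specialized vectors, with a node identified by its index:

(defvar *node-count*  0)
(defvar *node-fdepth* (make-array 1000000 :element-type 'fixnum))
(defvar *node-fkey*   (make-array 1000000 :initial-element nil))
(defvar *node-fvalue* (make-array 1000000 :initial-element nil))
(defvar *node-fleft*  (make-array 1000000 :element-type 'fixnum)) ; index of left child
(defvar *node-fright* (make-array 1000000 :element-type 'fixnum)) ; index of right child

(defun add-node (key value left right depth)
  "Store one node's slots across the parallel vectors; return its index."
  (let ((i *node-count*))
    (setf (aref *node-fkey*   i) key
          (aref *node-fvalue* i) value
          (aref *node-fleft*  i) left
          (aref *node-fright* i) right
          (aref *node-fdepth* i) depth)
    (incf *node-count*)
    i))

(defun node-fdepth (i)
  (aref *node-fdepth* i))

The fixnum vectors contain no heap references, and the two general vectors are
scanned as two large objects instead of millions of small ones.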


--
__Pascal J. Bourguignon
http://www.informatimago.com

Spiros Bousbouras
Sep 18, 2017, 9:21:47 PM

From CLHS on DEFSTRUCT

Each slot-initform supplied for a defstruct component, when used by the
constructor function for an otherwise unsupplied component, is re-evaluated
on every call to the constructor function. The slot-initform is not evaluated
unless it is needed in the creation of a particular structure instance. If it
is never needed, there can be no type-mismatch error, even if the type of the
slot is specified; no warning should be issued in this case. For example, in
the following sequence, only the last call is an error.

(defstruct person (name 007 :type string))
(make-person :name "James")
(make-person)

Which compiler did you use? It seems it has a bug.

Madhu
Sep 18, 2017, 9:50:27 PM

* Spiros Bousbouras <r9Z1NS8GeEmZYgwD...@bongo-ra.co> :
Wrote on Tue, 19 Sep 2017 01:21:40 GMT:

>
> From CLHS on DEFSTRUCT
>
> Each slot-initform supplied for a defstruct component, when used
> by the constructor function for an otherwise unsupplied component,
> is re-evaluated on every call to the constructor function. The
> slot-initform is not evaluated unless it is needed in the creation
> of a particular structure instance. If it is never needed, there
> can be no type-mismatch error, even if the type of the slot is
> specified; no warning should be issued in this case. For example,
> in the following sequence, only the last call is an error.
>
> (defstruct person (name 007 :type string))
> (make-person :name "James")
> (make-person)
>
> Which compiler did you use ? It seems it has a bug.

CMUCL used to issue this warning. Personally I think the warning is
advisable, because the actual runtime type error could be ugly to stare
at in the debugger if the compiler uses the type information when
compiling the defstruct.

BTW, the CMUCL implementation also used a pattern where a form that
signals an error is supplied as the slot-initform for required
arguments; it provided EXTENSIONS:REQUIRED-ARGUMENT for use in user
code.

(defun required-argument ()
  "This function can be used as the default value for keyword arguments that
must be always be supplied. Since it is known by the compiler to never
return, it will avoid any compile-time type warnings that would result from a
default value inconsistent with the declared type. When this function is
called, it signals an error indicating that a required keyword argument was
not supplied. This function is also useful for DEFSTRUCT slot defaults
corresponding to required arguments."
  (error "A required keyword argument was not supplied."))

Pascal J. Bourguignon
Sep 18, 2017, 10:12:32 PM

Spiros Bousbouras <spi...@gmail.com> writes:

> From CLHS on DEFSTRUCT
>
> Each slot-initform supplied for a defstruct component, when used by the
> constructor function for an otherwise unsupplied component, is re-evaluated
> on every call to the constructor function. The slot-initform is not evaluated
> unless it is needed in the creation of a particular structure instance. If it
> is never needed, there can be no type-mismatch error, even if the type of the
> slot is specified; no warning should be issued in this case. For example, in
> the following sequence, only the last call is an error.
>
> (defstruct person (name 007 :type string))
> (make-person :name "James")
> (make-person)
>
> Which compiler did you use ? It seems it has a bug.

Right, I was already instantiating structures in my head.

But if that's what was intended, then it would have been better written
as:

(defstruct (SyntaxTree (:include Flow))
  (frule     (error "Missing :frule argument.")     :type (SIMPLE-ARRAY CHARACTER (*)) :read-only T)
  (fchoice   (error "Missing :fchoice argument.")   :type FIXNUM :read-only T)
  (fstart    (error "Missing :fstart argument.")    :type FIXNUM :read-only T)
  (fend      (error "Missing :fend argument.")      :type FIXNUM :read-only T)
  (fchildren (error "Missing :fchildren argument.") :type (SIMPLE-VECTOR *) :read-only T))

(make-syntaxtree)
> Debug: Missing :frule argument.

Paul Rubin
Sep 19, 2017, 3:41:56 AM

Dmitry Solomennikov <99dm...@gmail.com> writes:
> I have 30M requests for creating data structures, which live very
> short time, also, almost a second spent cleaning the garbage.

On a modern PC, 30M objects isn't that many. Use a bigger memory region
so they all get allocated without any GC'ing. Then, when you do GC, only
the live data gets copied. Is there a way to shut off the ephemeral GC for
an operation like this?
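
On SBCL, one rough way to approximate this is to enlarge the nursery so the
whole batch is allocated between collections and then collect once at the end.
This is only a sketch; the 2 GB figure is arbitrary and would need tuning.

;; Let roughly 2 GB be consed between collections.
(setf (sb-ext:bytes-consed-between-gcs) (* 2 1024 1024 1024))

(defun call-with-deferred-gc (thunk)
  "Run THUNK, then do a single full collection afterwards."
  (unwind-protect (funcall thunk)
    (sb-ext:gc :full t)))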

Pascal Costanza
Sep 19, 2017, 5:35:53 AM

On 18/09/2017 09:35, Dmitry Solomennikov wrote:
> I have 30M requests for creating data structures, which live very short time, also, almost a second spent cleaning the garbage.
>
> There are two questions:
> 1) May (and how) memory allocation be done cheaper/faster? Looks like Java can do the same twice as faster (I have Java backend also which is structural the same).

Depending on how your code is organized, it can be a good idea to learn
about dynamic-extent declarations. This allows you to allocate objects
on the stack rather than on the heap, which is typically significantly
faster. However, you have to be sure that you understand the lifetime of
such objects, because if used incorrectly, a wrong dynamic-extent
declaration can crash your programs. Different compilers vary to what
extent they support dynamic-extent, so you have to check the compiler
documentation. (To the best of my understanding, no CL compiler does
escape analysis to automatically and safely put objects on the stack, so
you have to manually assist the compiler with dynamic-extent declarations.)
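
For illustration only, a minimal sketch assuming the TreeNode definition from
the original post; whether SBCL really stack-allocates it has to be verified
(e.g. by looking at the compiler notes or the disassembly):

(defun scratch-depth (key value left right depth)
  ;; NODE is used only inside this function, so we promise the compiler
  ;; it never escapes; SBCL may then allocate it on the control stack.
  (let ((node (make-treenode :fkey key :fvalue value
                             :fleft left :fright right :fdepth depth)))
    (declare (dynamic-extent node))
    (treenode-fdepth node)))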

Pascal

--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/

Spiros Bousbouras
Sep 19, 2017, 9:04:51 AM

On Tue, 19 Sep 2017 11:35:45 +0200
Pascal Costanza <p...@p-cos.net> wrote:
> On 18/09/2017 09:35, Dmitry Solomennikov wrote:
> > I have 30M requests for creating data structures, which live very short time, also, almost a second spent cleaning the garbage.
> >
> > There are two questions:
> > 1) May (and how) memory allocation be done cheaper/faster? Looks like Java can do the same twice as faster (I have Java backend also which is structural the same).
>
> Depending on how your code is organized, it can be a good idea to learn
> about dynamic-extent declarations. This allows you to allocate objects
> on the stack rather than on the heap, which is typically significantly
> faster.

What makes you think it will be faster? The most straightforward way to
allocate on the heap, if we assume that the heap grows upwards, is to have
a pointer pointing to the highest used address and simply increase the
pointer to allocate more objects. If we need to increase the heap size then
extra work will be needed, but if we only need to increase the pointer then
it will be no slower than allocating items on the stack.

> However, you have to be sure that you understand the lifetime of
> such objects, because if used incorrectly, a wrong dynamic-extent
> declaration can crash your programs.

Or even worse, corrupt the heap. If it crashes your programme then you
will at least know that something is wrong, but if it corrupts the heap
then you may be getting wrong results and not realise it. With that
in mind, and considering also that the rules mentioned in the CLHS for
correctly using DYNAMIC-EXTENT seem complicated to me, that a compiler
may not use the declaration in any useful way, and that stack allocation
won't necessarily be faster than heap allocation, DYNAMIC-EXTENT is
the last thing I would use. If nothing else works, I would consider
writing part of the code in C, or trying to understand enough of the
internals of the CL implementation to see what can be improved about my
speed issues.

> Different compilers vary to what
> extent they support dynamic-extent, so you have to check the compiler
> documentation. (To the best of my understanding, no CL compiler does
> escape analysis to automatically and safely put objects on the stack, so
> you have to manually assist the compiler with dynamic-extent declarations.)

--
It is a page-turner, but only backwards as you try to re read the
previous page to try and figure out what happened!!
Reader amazon review on "The algebraist" by Iain M. Banks
http://www.amazon.co.uk/review/R2FP0JKG5VIVEO

Pascal J. Bourguignon
Sep 19, 2017, 10:14:53 AM

Dmitry Solomennikov <99dm...@gmail.com> writes:

> Hello, guys!
>
> I have code, which use data structures heavily.
> It is a part of translator backend.
>
> ;; Everything is in SBCL
>
> (defstruct Flow
> (id 0 :type fixnum :read-only t))
>
> (defstruct (TreeNode (:include Flow))
> (fkey nil :type T :read-only T)


Also, I don't know if any implementation does this, but it may be worth
a try: if you can avoid references in your structure slots (i.e. not
using type T, but something more specific that can be stored directly
in the slot, like FIXNUMs), this would allow the implementation to know
that it doesn't have to scan the structure to check for object
references, so it would be able to garbage collect the structure more
quickly. This would be worth it as well if you chose to split your
structures over vectors as indicated in my other message (then it could
eventually collect the whole vectors without scanning them).
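
A tiny illustration of such a pointer-free structure (hypothetical slots;
whether a given collector actually exploits this is, as said, another matter):

(defstruct span
  ;; Every slot holds an immediate or raw value, so there are no object
  ;; references for the collector to trace inside a SPAN.
  (start 0   :type fixnum)
  (end   0   :type fixnum)
  (score 0d0 :type double-float))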

Dmitry Solomennikov
Sep 20, 2017, 4:11:15 AM

On Tuesday, September 19, 2017, at 6:47:29 UTC+7, informatimago wrote:
>
> Type error: NIL is not a FIXNUM.
>
> >
> > (defstruct (SyntaxTree (:include Flow))
> > (frule nil :type (SIMPLE-ARRAY CHARACTER (*)) :read-only T)
> Type error: NIL is not a SIMPLE-ARRAY.
> > (fchoice nil :type FIXNUM :read-only T)
> Type error: NIL is not a FIXNUM.
> > (fstart nil :type FIXNUM :read-only T)
> Type error: NIL is not a FIXNUM.
> > (fend nil :type FIXNUM :read-only T)
> Type error: NIL is not a FIXNUM.
> > (fchildren nil :type (SIMPLE-VECTOR *) :read-only T))
> Type error: NIL is not a SIMPLE-ARRAY.
>
>
> Don't you run your code thru a Common Lisp compiler?
>

Sure. SBCL 1.3.19 Mac OS. No errors here.

Dmitry Solomennikov
Sep 20, 2017, 4:16:21 AM

On Tuesday, September 19, 2017, at 21:14:53 UTC+7, informatimago wrote:
>
> Also, I don't know if any implementation does this, but it may be worth
> a try: if you can avoid references in your structure slots (ie. not
> using type T, but something more specific that could be stored directly
> in the slot like FIXNUMs), this would allow the implementation to know
> that it doesn't have to scan the structure to check for object

Unfortunately, I can't follow this advice. This is an ML-like language, and structures
can be of any type. It is impossible to make the types more precise.

> references, so it would be able to garbage collect the structure more
> rapidly. This would be worth it as well if you chose to splice your
> structure in vectors as indicated in my other message (then it could
> eventually collect the whole vectors without scanning them).

This is another idea I want to check; thanks for the advice.

Dmitry Solomennikov
Sep 20, 2017, 4:22:02 AM

On Monday, September 18, 2017, at 14:35:10 UTC+7, Dmitry Solomennikov wrote:
>
> ;; Everything is in SBCL
>

The code runs on SBCL 1.3.19 on macOS, but it should work on Windows and Linux as well.

I've measured the runtime more precisely, and I know that out of the 30M objects, 25M are garbage, and the overall volume consed is 20 GB. Due to the structure of the code (it is mostly autogenerated), manual preallocation is hard (if even possible).

Getting at objects via (sb-ext:finalize ...) looks promising (I used it for measurement), but I have no idea whether it is possible to catch objects as they are thrown away by the GC. Collecting them directly in the finalizer hangs the image.
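
For reference, a guess at the kind of finalizer-based counting described above
(the wrapper and the counter are invented; note that the closure must not
capture the node itself, or it can never become garbage):

(defvar *collected-nodes* 0)

(defun make-counted-treenode (&rest args)
  "Like MAKE-TREENODE, but counts instances as the GC reclaims them."
  (let ((node (apply #'make-treenode args)))
    (sb-ext:finalize node (lambda () (incf *collected-nodes*)))
    node))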

Pascal J. Bourguignon
Sep 20, 2017, 12:11:54 PM

Why don't you read the user manual?

Günther Thomsen
Sep 21, 2017, 11:58:20 AM

On Tuesday, September 19, 2017 at 3:04:51 PM UTC+2, Spiros Bousbouras wrote:
> On Tue, 19 Sep 2017 11:35:45 +0200
> Pascal Costanza <p...@p-cos.net> wrote:
> > On 18/09/2017 09:35, Dmitry Solomennikov wrote:
> > > I have 30M requests for creating data structures, which live very short time, also, almost a second spent cleaning the garbage.
> > >
> > > There are two questions:
> > > 1) May (and how) memory allocation be done cheaper/faster? Looks like Java can do the same twice as faster (I have Java backend also which is structural the same).
> >
> > Depending on how your code is organized, it can be a good idea to learn
> > about dynamic-extent declarations. This allows you to allocate objects
> > on the stack rather than on the heap, which is typically significantly
> > faster.
>
> What makes you think it will be faster ? The most straightforward way to
> allocate on the heap , if we assume that the heap grows upwards , is to have
> a pointer pointing to the highest used address and simply increase the
> pointer to allocate more objects. If we need to increase the heap size then
> extra work will be needed but if we only need to increase the pointer then
> it will be no slower than allocating items on the stack.
>
That would be the most primitive allocator. Chances are it's not well suited for the general case (internal fragmentation). Stack allocation might not be much (or perhaps not even significantly) faster than a specialized heap allocator, but GC typically will be (if there must be GC at all, which isn't obvious here).

Pascal Costanza
Sep 23, 2017, 8:19:42 PM

On 19/09/2017 15:04, Spiros Bousbouras wrote:
> On Tue, 19 Sep 2017 11:35:45 +0200
> Pascal Costanza <p...@p-cos.net> wrote:
>> On 18/09/2017 09:35, Dmitry Solomennikov wrote:
>>> I have 30M requests for creating data structures, which live very short time, also, almost a second spent cleaning the garbage.
>>>
>>> There are two questions:
>>> 1) May (and how) memory allocation be done cheaper/faster? Looks like Java can do the same twice as faster (I have Java backend also which is structural the same).
>>
>> Depending on how your code is organized, it can be a good idea to learn
>> about dynamic-extent declarations. This allows you to allocate objects
>> on the stack rather than on the heap, which is typically significantly
>> faster.
>
> What makes you think it will be faster ?

Experience. It's not only the allocation itself, but also the additional
pressure on the garbage collector, plus in a multi-threaded system, you
eventually need to coordinate memory management between the threads. All
that becomes irrelevant with stack allocation. In my experience, it's
one of the most important things to optimize, just after type
declarations. YMMV, as usual.

Pascal

--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
The views expressed are my own, and not those of my employer.

Albert van der Horst
Jan 31, 2018, 8:10:02 AM

In article <90e0d4b5-2e3e-4606...@googlegroups.com>,
Dmitry Solomennikov <99dm...@gmail.com> wrote:
>On Tuesday, September 19, 2017, at 21:14:53 UTC+7, informatimago wrote:
>>
>> Also, I don't know if any implementation does this, but it may be worth
>> a try: if you can avoid references in your structure slots (ie. not
>> using type T, but something more specific that could be stored directly
>> in the slot like FIXNUMs), this would allow the implementation to know
>> that it doesn't have to scan the structure to check for object
>
>Unfortunately, I can't follow this advice. This is an ML-like language, and structures
>can be of any type. It is impossible to make the types more precise.
>
>> references, so it would be able to garbage collect the structure more
>> rapidly. This would be worth it as well if you chose to splice your
>> structure in vectors as indicated in my other message (then it could
>> eventually collect the whole vectors without scanning them).
>
>This is another idea I want to check; thanks for the advice.

I'm thinking about a Lisp implemented in Forth.
This gave me an idea.
Apart from the short-lived data on the stack, which is restricted to
single machine words (items that fit in 32 bits), there are two memory
spaces in Forth.
One is allocated strictly incrementally (ALLOT), effectively as a stack, and
the other is random-access and can be garbage collected (ALLOCATE).

With FORGET, which resets a pointer, the data that was
ALLOTted can be removed very fast.

Now the idea is, in e.g. a calculation, to ALLOT all intermediate
results and only ALLOCATE the final result, then FORGET all the
intermediate stuff.

I think this mechanism could be added to any Lisp implementation,
and could be worthwhile for e.g. the infamous recursive
Fibonacci calculation. Only the result is made permanent and
the O(phi^N) intermediate objects are just discarded in one go.

Note that the types of the objects don't come into play.
You just ALLOCATE the objects that remain after the calculation,
and maybe a few objects they refer to. All the rest is garbage.

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Pascal J. Bourguignon
Jan 31, 2018, 9:36:06 AM

This is why there is the DYNAMIC-EXTENT declaration.

The objects bound to variables (or local functions) that are declared
dynamic-extent are considered to never escape the dynamic scope of those
variables, and therefore the implementation can allocate them on the
stack and free them when the variable is destroyed.


That said, I consider DYNAMIC-EXTENT one of the most difficult features
to implement correctly. Most compilers are sophisticated enough to make
only a half-assed try at it; I'd feel safer if they avoided trying (and
I refuse to use this declaration in my code: either the compiler is
smart enough to infer it automatically and correctly, or, if it's not
smart enough to infer it, it's probably not smart enough to implement it
correctly).

Kaz Kylheku
Jan 31, 2018, 12:41:40 PM

On 2018-01-31, Albert van der Horst <alb...@cherry.spenarnc.xs4all.nl> wrote:
>>This is another idea I want to check; thanks for the advice.
>
> I'm thinking about a lisp implemented in Forth.

Not this shit again.