I understand that I can adjust an array like this:
* (setf x (make-array 3 :initial-element 100))
#(100 100 100)
* (setf y (adjust-array x 6 :initial-element 42))
#(100 100 100 42 42 42)
My question is: This adjustment happens at the _end_ of the array, can I
expand the array at the beginning?
Could I do an adjustment on x where the final result is:
#(42 42 42 100 100 100)
Or do I have to write the code to allocate the larger array and manually copy
the elements myself?
Thank you.
-pete
> My question is: This adjustment happens at the _end_ of the array, can I
> expand the array at the beginning?
> [...]
> Or do I have to write the code to allocate the larger array and manually
> copy the elements myself?
I think that you have to copy yourself - but that is rather easy to
do, see eg REPLACE.
In general, if you know that you will not prepend more than a given
number of elements (or you are willing to copy elements occasionally),
you can create an array that is large enough, and displace to it using an
offset. Then if you need extra elements at the beginning, just displace
to a smaller offset.
Tamas
You want a displaced (and adjustable) array.
? (setf a1 (make-array 30))
#(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
? (dotimes (i 30) (setf (aref a1 i) i))
...
? a1
#(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
27 28 29)
? (setf a2 (make-array 10 :displaced-to a1 :displaced-index-offset 15
:adjustable t))
#(15 16 17 18 19 20 21 22 23 24)
? (adjust-array a1 '(15) :displaced-to a1 :displaced-index-offset 10)
#(10 11 12 13 14 15 16 17 18 19 20 21 22 23 24)
rg
I see.
If I knew how big the final sizes of the arrays were I could use
this method, but sadly I don't. I'll file away the method for another
day. It looks like under my circumstances, a new array into which I copy
all of my disparate pieces looks to be one of the only solutions.
Thank you for your time.
-pete
You can make the underlying array adjustable (but not displaced) so that
it can grow too. If you need to grow it at both ends you can use two
adjustable arrays and roll your own displaced array class to do the
indirection.
rg
Sometimes I face a similar problem (collecting an number of items that
are either unknown a prior, or I am lazy to calculate). I usually
solve it the following way: create array with extra elements, keep
using them until I run out, then create a new array with a buffer of
extra elements, copy and continue. This way, you will copy
occasionally, but not all the time. Example:
(use-package '(:bind))
(defun prepend-elements (vector n &optional (extra 256))
(bind (((:values displaced-to index-offset) (array-displacement vector))
(new-length (+ n (length vector))))
(if (and displaced-to (<= n index-offset))
(make-array new-length
:displaced-index-offset (- index-offset n)
:displaced-to displaced-to)
(let ((new-vector (make-array (+ extra new-length))))
(replace new-vector vector :start1 (+ extra n))
(make-array new-length
:displaced-index-offset extra
:displaced-to new-vector)))))
(defparameter *a* #(1 2 3))
(setf *a* (prepend-elements *a* 7))
(array-displacement *a*) ; #(0 0 0 ...), 256
You can add bells and whistles, eg prepended element initialization,
etc.
Tamas
Yeah, I thought of that. But it turns out the layering in my code base makes
that difficult to acheive.
My use case is I'm writing a master/worker style communication system
which can handle 10,000 to 50,000 clients. Buffer management for me is
a serious issue and I need to really conserve memory and not grind the
allocator too much.
The context is that when writing a data array to a client, the buffer
management layer needs to wrap a header around the data before sending it.
I thought about not doing the wrapping at all--just write the small header
then the payload, but then that leads to alternating writes of a very
small amount of data and then a large amount of data. That oscillation
sucks from a network scheduling/bandwidth perspective. I need to write
as many bytes as I can as often as I can.
If it really looks like the write buffering will be problematic from
a memory allocation point of view, I can make pools of buffers which
are reusable for the assembling of the data. I don't want to do that
initially since it is more code.
I was just wondering if I had missed something in Lisp's array management
which would allow me to grow an array at the head of the array. The
answer is somewhat yes, but I'd have to preallocate the array and that
means the application code would have to use a special abstraction of
make-array. There are problems with that I'd rather avoid.
I guess at this point, I'll just assemble the arrays naively and if the
profiler tells me there is a problem there, I'll look into it deeper.
Thank you.
-pete
Naive idea:
what about building a list (you can cons to the beginning of that) and
mapping it to an array only when you are done?
Could it be that a different data structure would be more
appropriate? What about a tree? (also, until recently, I had not
realized how many different kinds of tree there are (2-3, 2-3-4, left
leaning, etc). That would impact your access time, but in your case it
may be manageable, or you could first store in tree, and then transfer
into array.
Mirko
> Tamas K Papp <tkp...@gmail.com> wrote:
> > Sometimes I face a similar problem (collecting an number of items that
> > are either unknown a prior, or I am lazy to calculate). I usually
> > solve it the following way: create array with extra elements, keep
> > using them until I run out, then create a new array with a buffer of
> > extra elements, copy and continue. This way, you will copy
> > occasionally, but not all the time. Example:
>
> Yeah, I thought of that. But it turns out the layering in my code base makes
> that difficult to acheive.
>
> My use case is I'm writing a master/worker style communication system
> which can handle 10,000 to 50,000 clients. Buffer management for me is
> a serious issue and I need to really conserve memory and not grind the
> allocator too much.
>
> The context is that when writing a data array to a client, the buffer
> management layer needs to wrap a header around the data before sending it.
> I thought about not doing the wrapping at all--just write the small header
> then the payload, but then that leads to alternating writes of a very
> small amount of data and then a large amount of data. That oscillation
> sucks from a network scheduling/bandwidth perspective. I need to write
> as many bytes as I can as often as I can.
Seems to me then that what you want is not a growable array but rather
smarter write buffering.
> If it really looks like the write buffering will be problematic from
> a memory allocation point of view, I can make pools of buffers which
> are reusable for the assembling of the data. I don't want to do that
> initially since it is more code.
>
> I was just wondering if I had missed something in Lisp's array management
> which would allow me to grow an array at the head of the array. The
> answer is somewhat yes, but I'd have to preallocate the array and that
> means the application code would have to use a special abstraction of
> make-array. There are problems with that I'd rather avoid.
>
> I guess at this point, I'll just assemble the arrays naively and if the
> profiler tells me there is a problem there, I'll look into it deeper.
That sounds like a fine plan. Premature optimization and all that.
Memcpy is pretty frickin' fast, particularly when compared to network
I/O.
rg
I know a fair number of data structures, but most of them require consing
which is something I need to minimize. Luckily, the storage requirements
of my data aren't tree-like. Thank you for your suggesstion.
To give an example of the problems I'm facing, suppose I have 10,000 clients:
If I have a 16k initial read buffer for each client which can grow in
size up to 128k as needed, then just for the read buffers alone I'll need:
156MB up to 1.250GB of memory needed.
If I have about 20k data buffers I need for each client to write, then
the buffers required for writing are 195MB.
If I have about 20K in memory structures for each client, then another
195MB.
So, under worst case conditions (the clients send me 128KB results all the
time instead of averaging around 16K) for 10,000 clients, I need in total
546MB up to 1.640GB.
Now this needed memory for which I can easily account!
That is a lot of memory to keep resident and perform churn on and hence
why I'm very paranoid about consing or calling make-array. I use other
libraries and things and who knows what their memory usages are. But
at the scaling levels I'm desiring, I have to pay attention to it.
-pete
That doesn't solve the problem, suppose:
(#(1 2 3) (#4 5 6))
I still have to allocate a 6 element array somewhere to map all of that
into a single array: #(1 2 3 4 5 6)
-pete
I do not deny that, have any references? The nice thing about lisp is I
can just go back into that layer and bash the code into something else
with relative ease.
>> I guess at this point, I'll just assemble the arrays naively and if the
>> profiler tells me there is a problem there, I'll look into it deeper.
>
> That sounds like a fine plan. Premature optimization and all that.
> Memcpy is pretty frickin' fast, particularly when compared to network
> I/O.
Speaking of memcpy, in lisp what is the equivalent of memcpy? replace?
Hrm... I should use that in a few places I stupidly wrote the byte copies
with a do loop.
Thanks for the oblique suggestsion. :)
-pete
> RG <rNOS...@flownet.com> wrote:
> > In article <4bd9afe9$0$9898$8026...@spool.cs.wisc.edu>,
> > Peter Keller <psi...@merlin.cs.wisc.edu> wrote:
> >> The context is that when writing a data array to a client, the buffer
> >> management layer needs to wrap a header around the data before sending it.
> >> I thought about not doing the wrapping at all--just write the small header
> >> then the payload, but then that leads to alternating writes of a very
> >> small amount of data and then a large amount of data. That oscillation
> >> sucks from a network scheduling/bandwidth perspective. I need to write
> >> as many bytes as I can as often as I can.
> >
> > Seems to me then that what you want is not a growable array but rather
> > smarter write buffering.
>
> I do not deny that, have any references?
Sorry, no. But surely it's not that hard?
rg
Yeah, it shouldn't be that hard. When it comes to it, I'll just look in the
usual place for papers or descriptions of what people already do.
Thanks for your help.
Later,
-pete
> Speaking of memcpy, in lisp what is the equivalent of memcpy? replace?
> Hrm... I should use that in a few places I stupidly wrote the byte
> copies with a do loop.
Pretty much. Usually, if you want top speed, it pays to look into
your implementations sources and see what they use, then call that
directly. Or, alternatively, declare array types so that the compiler
will do that for you. Eg look at the transforms related to REPLACE in
SBCL, you will see that it will be heavily optimized when given enough
information.
Tamas
Do you know for sure that you can do a better job managing all these
bits than the garbage collector can. If many/most of incoming chunks of
data are short-lived, GC might do better than all the copying.
paul working first, fast later
> I was just wondering if I had missed something in Lisp's array management
> which would allow me to grow an array at the head of the array. The
> answer is somewhat yes, but I'd have to preallocate the array and that
> means the application code would have to use a special abstraction of
> make-array. There are problems with that I'd rather avoid.
Can't you just use the array as a "growable ring buffer"? If the ring
buffer is full, you grow the array, and you can move the start or the
end pointers into your buffer depending on where you want to add data.
(I use one ring buffer per client in a server I've written, but they're
fixed-size with 30000 elements, because in my case if the clients have
a lag of that many transactions, it's better for the clients if the
server just flushes part of its buffer)
--
(espen)
This idea I might check out in more detail.... Thanks!
-pete
I don't know for sure, it is just a feeling I have for where a nsaty thing
might happen.
> paul working first, fast later
That is very true, I don't so much want fast, as resident in memory. :)
Thank you all for your suggestions! They have been very helpful!
-pete
I suggest that you define a new ADT (Abstract Data Type) which
provides *exactly* the functionality you want. Once that's well
defined, we can brainstorm how to best implement it. For example,
you might implement a virtual array as multiple chunks, each a
fixed-size (non-adjustable) array, collected together by a
self-balancing binary-search-tree whose leaves are the chunks. This
would let you insert new segments (chunks) at the start, at the
end, or anywhere in the middle, with no need to ever copy blocks of
data from an old array to a new array.
With so many clients you must synchronize, why even *try* to store
the data in Lisp arrays? Why not use MySQL as your datastore, or if
you need higher performance and have lots of $cash$ then use Oracle
instead?
> 546MB up to 1.640GB.
>
> Now this needed memory for which I can easily account!
>
> That is a lot of memory to keep resident and perform churn on and hence
> why I'm very paranoid about consing or calling make-array. I use other
> libraries and things and who knows what their memory usages are. But
> at the scaling levels I'm desiring, I have to pay attention to it.
10 years ago this was a lot of memory. Now it's small change. 50 or
100G is something to think about, but less than 4 doesn't count for
much unless you're planning on some vast number of instances.
> 10 years ago this was a lot of memory. Now it's small change. 50 or
> 100G is something to think about, but less than 4 doesn't count for
> much unless you're planning on some vast number of instances.
It depends a bit on the life time of the memory and the requirements of
your server. I have one server application instance that uses 6-7GB of
memory, and which needs to run a full GC freeing most of that once a
day. That takes a about 5 seconds (I think, I haven't looked at it in
detail for a while, it runs for months without problems now), which is
not critical at all for *this* server, since I can run the GC at 2 in
the morning when the server rarely is needed. But other applications may
have other requirements.
--
(espen)
> It depends a bit on the life time of the memory and the requirements of
> your server. I have one server application instance that uses 6-7GB of
> memory, and which needs to run a full GC freeing most of that once a
> day. That takes a about 5 seconds (I think, I haven't looked at it in
> detail for a while, it runs for months without problems now), which is
> not critical at all for *this* server, since I can run the GC at 2 in
> the morning when the server rarely is needed. But other applications may
> have other requirements.
I think that if GCs have problems which make using large heaps
difficult (and 6 or 7GB is not really large), then GC implementation
need to get better - perhaps the Azul people are right about that.
> I think that if GCs have problems which make using large heaps
> difficult (and 6 or 7GB is not really large), then GC implementation
> need to get better - perhaps the Azul people are right about that.
I looked at the real numbers for last night, it freed 1.76GB out of 4.48
and used 7.6 seconds cpu time. At least it's way better than a server
app I struggled with some 10 years ago, which used 15 hours for a couple
of hundred megabytes ;-)
--
(espen)
So, I got my master/worker codes working and the epilogue to this was I
would run out of heap memory (and get a heap exhaustion error!) because
I was creating and interning too many keyword symbols(!!) which permanently
consumed all of the memory.
I was using them for unique identification purposes for objects in
the codes.... I see now the folly of that. I've changed the unique
identification to some other kind of object which is garbage collected. :)
Later,
-pete
> So, I got my master/worker codes working and the epilogue to this was I
> would run out of heap memory (and get a heap exhaustion error!) because
> I was creating and interning too many keyword symbols(!!) which permanently
> consumed all of the memory.
Yikes! How many keyword symbols did you create before this happened?
--
(espen)
About 1.5 million. :) I am using SBCL 1.0.38.something with the default
heap sizes and whatnot.
The master codes can generate tasks into the hundreds of millions range
with about 1 million in memory at once depending upon size of the forms
in the tasks being sent to the workers and the size of the results when
they come back.
I changed the unique id to be short strings created with format, which
can be collected, instead of keyword symbols. Integers might be better,
and if there is some more problem in scaling I'll move to that. I wanted
something descriptive for the various objects in the system for tracking
purposes.
The thing about the identifiers is that they have to be valid across
process boundaries, so really using keyword symbols for that in the
first place was a terrible idea. :)
Later,
-pete
> About 1.5 million. :) I am using SBCL 1.0.38.something with the default
> heap sizes and whatnot.
Hmm, my LispWorks 6.0 (mac 64-bit) didn't think that was too hard - it
created 1.5 million keywords in 25 seconds (including some formatting)
- but it did grow to 1.4GB ;-)
But I agree, it's not a good idea anyway.
--
(espen)
Symbols are a heavyweight structure: they contain a name reference, a
function slot, a value slot, a property list, and a package reference
- at least 40 bytes (64-bit) plus a header if the implementation uses
headers.
Doesn't add up to 1.4GB though. Lispworks is likely expanding the
whole heap so that the symbol space is a constant fraction of it.
George
> Symbols are a heavyweight structure: they contain a name reference, a
> function slot, a value slot, a property list, and a package reference
> - at least 40 bytes (64-bit) plus a header if the implementation uses
> headers.
Symbols don't actually need to contain these slots - for instance an
implementation could have very lightweight symbols whose "slots" were
actually obtained by having hashtables keyed on some unique property of
the symbol (this probably has to be something like address I think,
because names do not need to be unique - (eq (make-symbol "FOO")
(make-symbol "FOO")) is false.
So uninterned symbols with no slots used could be quite small - maybe
two or three words + storage for the name (which would dominate the
space).
I wonder if any implementations have done this? I suspect not.
Yup.
+---------------
| I wonder if any implementations have done this? I suspect not.
+---------------
Well, I know of one that at least *partially* does it... ;-} ;-}
CMUCL has no SYMBOL-FUNCTION slot in its symbols, only -VALUE, -PLIST,
-NAME, & -PACKAGE. [And a non-standard HASH slot, used only internally.]
SYMBOL-FUNCTION is actually implemented as a call to the CMUCL-idiosyncratic
internal "info" database:
cmu> (info function definition '+)
#<FDEFINITION object for +>
T
cmu> (kernel:fdefn-function *)
#<Function + {10112289}>
cmu> (eq * #'+)
T
cmu> (eq ** (symbol-function '+))
T
cmu>
In practice, most functions are resolved at compile time, so this path
is used mainly for FUNCALLs of non-constants which evaluate to symbols
at runtime.
-Rob
-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607